This patchset is a reduced scope version of the
patchset "fs/dcache: Track & limit # of negative dentries"
(https://lkml.org/lkml/2018/7/12/586). Only the first 2 patches are
included to track the number of negative dentries in the system as well
as making negative dentries more easily reclaimed than positive ones.

There are controversies on limiting number of negative dentries as it may
make negative dentries special in term of how memory resources are to
be managed in the kernel. However, I don't believe I heard any concern
about tracking the number of negative dentries in the system. So it is
better to separate that out and get it done with. We can deal with the
controversial part later on.

Patch 1 adds tracking to the number of negative dentries in the LRU list.

Patch 2 makes negative dentries to be added at the head end of the LRU
list so that they are first to go when a shrinker is running if those
negative dentries are never referenced again.

Waiman Long (2):
  fs/dcache: Track & report number of negative dentries
  fs/dcache: Make negative dentries easier to be reclaimed

 Documentation/sysctl/fs.txt | 19 ++++++++++-----
 fs/dcache.c                 | 56 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/dcache.h      |  8 ++++---
 include/linux/list_lru.h    | 17 ++++++++++++++
 mm/list_lru.c               | 16 +++++++++++--
 5 files changed, 104 insertions(+), 12 deletions(-)

--
1.8.3.1
The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.

As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the /proc/sys/fs/dentry-state
file. The number, however, does not include negative dentries that are
in flight but not in the LRU yet.

The number of positive dentries in the LRU lists can be roughly found
by subtracting the number of negative dentries from the total.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/sysctl/fs.txt | 19 +++++++++++++------
 fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h      |  7 ++++---
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 819caf8..118bb93 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -63,19 +63,26 @@ struct {
 	int nr_unused;
 	int age_limit;         /* age in seconds */
 	int want_pages;        /* pages requested by system */
-	int dummy[2];
+	int nr_negative;       /* # of unused negative dentries */
+	int dummy;
 } dentry_stat = {0, 0, 45, 0,};
---------------------------------------------------------------
+--------------------------------------------------------------
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.

-Dentries are dynamically allocated and deallocated, and
-nr_dentry seems to be 0 all the time. Hence it's safe to
-assume that only nr_unused, age_limit and want_pages are
-used. Nr_unused seems to be exactly what its name says.
 Age_limit is the age in seconds after which dcache entries
 can be reclaimed when memory is short and want_pages is
 nonzero when shrink_dcache_pages() has been called and the
 dcache isn't pruned yet.

+nr_negative shows the number of unused dentries that are also
+negative dentries which do not mapped to actual files if negative
+dentries tracking is enabled.
+
 ==============================================================

 dquot-max & dquot-nr:

diff --git a/fs/dcache.c b/fs/dcache.c
index 2e7e8d8..69f5541 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -119,6 +119,7 @@ struct dentry_stat_t dentry_stat = {

 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);

 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)

@@ -152,11 +153,22 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }

+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char

 #endif

+static inline void __neg_dentry_dec(struct dentry *dentry)
+{
+	this_cpu_dec(nr_dentry_neg);
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+static inline void __neg_dentry_inc(struct dentry *dentry)
+{
+	this_cpu_inc(nr_dentry_neg);
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -331,6 +365,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_inc(dentry);
 }

 static void dentry_free(struct dentry *dentry)
@@ -395,6 +431,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }

 static void d_lru_del(struct dentry *dentry)
@@ -403,6 +440,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }

 static void d_shrink_del(struct dentry *dentry)
@@ -433,6 +471,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }

 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -441,6 +480,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }

 /**
@@ -1840,6 +1880,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));

 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index ef4b70f..df942e5 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -62,9 +62,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;		/* age in seconds */
-	long want_pages;	/* pages requested by system */
-	long dummy[2];
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of unused negative dentries */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;

--
1.8.3.1
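With the patch above applied, /proc/sys/fs/dentry-state grows a fifth meaningful field, nr_negative, before the remaining dummy slot. As a rough userspace sketch (not part of the patch; the helper names are invented for illustration), the six fields can be parsed and the approximate positive-dentry count derived by subtraction, as the changelog describes:

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the six longs exported via /proc/sys/fs/dentry-state after
 * this patch: nr_dentry, nr_unused, age_limit, want_pages,
 * nr_negative, dummy. */
struct dentry_state {
	long nr_dentry;
	long nr_unused;
	long age_limit;
	long want_pages;
	long nr_negative;
	long dummy;
};

/* Parse one line in the format the proc file emits; returns 0 on
 * success, -1 on a short or garbled line. */
static int parse_dentry_state(const char *line, struct dentry_state *st)
{
	int n = sscanf(line, "%ld %ld %ld %ld %ld %ld",
		       &st->nr_dentry, &st->nr_unused, &st->age_limit,
		       &st->want_pages, &st->nr_negative, &st->dummy);
	return n == 6 ? 0 : -1;
}

/* Approximate number of positive dentries on the LRU lists, as the
 * changelog suggests: total unused minus unused negative. */
static long lru_positive_estimate(const struct dentry_state *st)
{
	return st->nr_unused - st->nr_negative;
}
```

The estimate is only approximate because, as the changelog notes, in-flight negative dentries not yet on an LRU list are not counted in nr_negative.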
For negative dentries that are accessed once and never used again, they
should be removed first before other dentries when shrinker is running.
This is done by putting negative dentries at the head of the LRU list
instead at the tail.

A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
is initially created. When such a dentry is added to the LRU, it will be
added to the head so that it will be the first to go when a shrinker is
running if it is never accessed again (DCACHE_REFERENCED bit not set).
The flag is cleared after the LRU list addition.

Suggested-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 25 +++++++++++++++++--------
 include/linux/dcache.h   |  1 +
 include/linux/list_lru.h | 17 +++++++++++++++++
 mm/list_lru.c            | 16 ++++++++++++++--
 4 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 69f5541..ab6a4cf 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -242,12 +242,6 @@ static inline void __neg_dentry_inc(struct dentry *dentry)
 	this_cpu_inc(nr_dentry_neg);
 }

-static inline void neg_dentry_inc(struct dentry *dentry)
-{
-	if (unlikely(d_is_negative(dentry)))
-		__neg_dentry_inc(dentry);
-}
-
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -353,7 +347,7 @@ static inline void __d_set_inode_and_type(struct dentry *dentry,

 	dentry->d_inode = inode;
 	flags = READ_ONCE(dentry->d_flags);
-	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
+	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU | DCACHE_NEW_NEGATIVE);
 	flags |= type_flags;
 	WRITE_ONCE(dentry->d_flags, flags);
 }
@@ -430,8 +424,20 @@ static void d_lru_add(struct dentry *dentry)
 	D_FLAG_VERIFY(dentry, 0);
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
+	if (d_is_negative(dentry)) {
+		__neg_dentry_inc(dentry);
+		if (dentry->d_flags & DCACHE_NEW_NEGATIVE) {
+			/*
+			 * Add the negative dentry to the head once, it
+			 * will be added to the tail next time.
+			 */
+			WARN_ON_ONCE(!list_lru_add_head(
+				&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+			dentry->d_flags &= ~DCACHE_NEW_NEGATIVE;
+			return;
+		}
+	}
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
-	neg_dentry_inc(dentry);
 }

 static void d_lru_del(struct dentry *dentry)
@@ -2620,6 +2626,9 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
 		__d_set_inode_and_type(dentry, inode, add_flags);
 		raw_write_seqcount_end(&dentry->d_seq);
 		fsnotify_update_flags(dentry);
+	} else {
+		/* It is a negative dentry, add it to LRU head initially. */
+		dentry->d_flags |= DCACHE_NEW_NEGATIVE;
 	}
 	__d_rehash(dentry);
 	if (dir)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index df942e5..03a1918 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -214,6 +214,7 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
 #define DCACHE_OP_REAL			0x04000000
+#define DCACHE_NEW_NEGATIVE		0x08000000 /* New negative dentry */

 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index aa5efd9..bfac057 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -90,6 +90,23 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
 bool list_lru_add(struct list_lru *lru, struct list_head *item);

 /**
+ * list_lru_add_head: add an element to the lru list's head
+ * @list_lru: the lru pointer
+ * @item: the item to be added.
+ *
+ * This is similar to list_lru_add(). The only difference is the location
+ * where the new item will be added. The list_lru_add() function will add
+ * the new item to the tail as it is the most recently used one. The
+ * list_lru_add_head() will add the new item into the head so that it
+ * will the first to go if a shrinker is running. So this function should
+ * only be used for less important item that can be the first to go if
+ * the system is under memory pressure.
+ *
+ * Return value: true if the list was updated, false otherwise
+ */
+bool list_lru_add_head(struct list_lru *lru, struct list_head *item);
+
+/**
  * list_lru_del: delete an element to the lru list
  * @list_lru: the lru pointer
  * @item: the item to be deleted.
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 5b30625..133f41c 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -124,7 +124,8 @@ static inline bool list_lru_memcg_aware(struct list_lru *lru)
 }
 #endif /* CONFIG_MEMCG_KMEM */

-bool list_lru_add(struct list_lru *lru, struct list_head *item)
+static inline bool __list_lru_add(struct list_lru *lru, struct list_head *item,
+				  const bool add_tail)
 {
 	int nid = page_to_nid(virt_to_page(item));
 	struct list_lru_node *nlru = &lru->node[nid];
@@ -134,7 +135,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 	spin_lock(&nlru->lock);
 	if (list_empty(item)) {
 		l = list_lru_from_kmem(nlru, item, &memcg);
-		list_add_tail(item, &l->list);
+		(add_tail ? list_add_tail : list_add)(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
 			memcg_set_shrinker_bit(memcg, nid,
@@ -146,8 +147,19 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 	spin_unlock(&nlru->lock);
 	return false;
 }
+
+bool list_lru_add(struct list_lru *lru, struct list_head *item)
+{
+	return __list_lru_add(lru, item, true);
+}
 EXPORT_SYMBOL_GPL(list_lru_add);

+bool list_lru_add_head(struct list_lru *lru, struct list_head *item)
+{
+	return __list_lru_add(lru, item, false);
+}
+EXPORT_SYMBOL_GPL(list_lru_add_head);
+
 bool list_lru_del(struct list_lru *lru, struct list_head *item)
 {
 	int nid = page_to_nid(virt_to_page(item));
--
1.8.3.1
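The mechanics the patch above relies on can be sketched in userspace with a minimal circular doubly-linked list in the style of the kernel's list.h (the helpers here are simplified stand-ins, not the kernel implementations). A head insert makes an entry the first one a front-to-back reclaim walk encounters, while a tail insert makes it the most-recently-used entry:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, just enough to illustrate why
 * a head insert makes an entry the first reclaim candidate: the
 * shrinker walks from the front of the list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }

static void __list_insert(struct list_head *item,
			  struct list_head *prev, struct list_head *next)
{
	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
}

/* head insert: item becomes the first entry a reclaim walk scans */
static void list_add(struct list_head *item, struct list_head *head)
{
	__list_insert(item, head, head->next);
}

/* tail insert: item becomes the most-recently-used entry */
static void list_add_tail(struct list_head *item, struct list_head *head)
{
	__list_insert(item, head->prev, head);
}
```

Adding two entries with list_add_tail() and one with list_add() leaves the head-inserted entry at the front, mirroring how a DCACHE_NEW_NEGATIVE dentry becomes the first shrink candidate.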
On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
> @@ -134,7 +135,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
> spin_lock(&nlru->lock);
> if (list_empty(item)) {
> l = list_lru_from_kmem(nlru, item, &memcg);
> - list_add_tail(item, &l->list);
> + (add_tail ? list_add_tail : list_add)(item, &l->list);
> /* Set shrinker bit if the first element was added */
> if (!l->nr_items++)
> memcg_set_shrinker_bit(memcg, nid,
That's not OK. Write it out properly, ie:
if (add_tail)
list_add_tail(item, &l->list);
else
list_add(item, &l->list);
On 08/28/2018 06:13 PM, Matthew Wilcox wrote:
> On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
>> @@ -134,7 +135,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
>> spin_lock(&nlru->lock);
>> if (list_empty(item)) {
>> l = list_lru_from_kmem(nlru, item, &memcg);
>> - list_add_tail(item, &l->list);
>> + (add_tail ? list_add_tail : list_add)(item, &l->list);
>> /* Set shrinker bit if the first element was added */
>> if (!l->nr_items++)
>> memcg_set_shrinker_bit(memcg, nid,
> That's not OK. Write it out properly, ie:
>
> if (add_tail)
> list_add_tail(item, &l->list);
> else
> list_add(item, &l->list);
>
Yes, I can rewrite it. What is the problem with the abbreviated form?
Cheers,
Longman
On Tue, 28 Aug 2018 13:19:38 -0400 Waiman Long <longman@redhat.com> wrote:
> This patchset is a reduced scope version of the
> patchset "fs/dcache: Track & limit # of negative dentries"
> (https://lkml.org/lkml/2018/7/12/586). Only the first 2 patches are
> included to track the number of negative dentries in the system as well
> as making negative dentries more easily reclaimed than positive ones.
>
> There are controversies on limiting number of negative dentries as it may
> make negative dentries special in term of how memory resources are to
> be managed in the kernel. However, I don't believe I heard any concern
> about tracking the number of negative dentries in the system. So it is
> better to separate that out and get it done with. We can deal with the
> controversial part later on.
Seems reasonable.
It would be nice to see testing results please. Quite comprehensive
ones.
And again, an apparently permanent feature of this patchset is that the
changelogs fail to provide descriptions of real-world problems with the
existing code. Please do provide those (comprehensive) descriptions and
demonstrate that these changes resolve those problems.
Also, a grumpynit: with 100% uniformity, the vfs presently refers to
negative dentries with the string "negative" in the identifier. This
patchset abbreviates that to "neg".
On 08/28/2018 06:50 PM, Andrew Morton wrote:
> On Tue, 28 Aug 2018 13:19:38 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> This patchset is a reduced scope version of the
>> patchset "fs/dcache: Track & limit # of negative dentries"
>> (https://lkml.org/lkml/2018/7/12/586). Only the first 2 patches are
>> included to track the number of negative dentries in the system as well
>> as making negative dentries more easily reclaimed than positive ones.
>>
>> There are controversies on limiting number of negative dentries as it may
>> make negative dentries special in term of how memory resources are to
>> be managed in the kernel. However, I don't believe I heard any concern
>> about tracking the number of negative dentries in the system. So it is
>> better to separate that out and get it done with. We can deal with the
>> controversial part later on.
> Seems reasonable.
>
> It would be nice to see testing results please. Quite comprehensive
> ones.
>
> And again, an apparently permanent feature of this patchset is that the
> changelogs fail to provide descriptions of real-world problems with the
> existing code. Please do provide those (comprehensive) descriptions and
> demonstrate that these changes resolve those problems.
>
> Also, a grumpynit: with 100% uniformity, the vfs presently refers to
> negative dentries with the string "negative" in the identifier. This
> patchset abbreviates that to "neg".
>
Will do.
Cheers,
Longman
Another pet peeve ;)

On Tue, 28 Aug 2018 13:19:40 -0400 Waiman Long <longman@redhat.com> wrote:

>  /**
> + * list_lru_add_head: add an element to the lru list's head
> + * @list_lru: the lru pointer
> + * @item: the item to be added.
> + *
> + * This is similar to list_lru_add(). The only difference is the location
> + * where the new item will be added. The list_lru_add() function will add

People often use the term "the foo() function". I don't know why -
just say "foo()"!

> + * the new item to the tail as it is the most recently used one. The
> + * list_lru_add_head() will add the new item into the head so that it

Ditto.

"to the head"

> + * will the first to go if a shrinker is running. So this function should

"will be the"

> + * only be used for less important item that can be the first to go if

"items"

> + * the system is under memory pressure.
> + *
> + * Return value: true if the list was updated, false otherwise
> + */
On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
>
> Yes, I can rewrite it. What is the problem with the abbreviated form?
Either gcc rewrites it for you, or you end up _actually_ using a
function pointer and calling through it.
The latter would be absolutely horribly bad for something like
"list_add()", which should expand to just a couple of instructions.
And the former would be ok, except for the "you wrote code the garbage
way, and then depended on the compiler fixing it up". Which we
generally try to avoid in the kernel.
(Don't get me wrong - we definitely depend on the compiler doing a
good job at CSE and dead code elimination etc, but generally we try to
avoid the whole "compiler has to rewrite code to be good" model).
Linus
On Tue, 28 Aug 2018 16:10:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
> >
> > Yes, I can rewrite it. What is the problem with the abbreviated form?
>
> Either gcc rewrites it for you, or you end up _actually_ using a
> function pointer and calling through it.
>
> The latter would be absolutely horribly bad for something like
> "list_add()", which should expand to just a couple of instructions.
>
> And the former would be ok, except for the "you wrote code the garbage
> way, and then depended on the compiler fixing it up". Which we
> generally try to avoid in the kernel.
>
> (Don't get me wrong - we definitely depend on the compiler doing a
> good job at CSE and dead code elimination etc, but generally we try to
> avoid the whole "compiler has to rewrite code to be good" model).
>
And the "abbreviated form" will surely explode if one or both of those
"functions" happens to be implemented (or later reimplemented) as a macro.
It's best not to unnecessarily make such assumptions.
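As a userspace illustration of the two forms being discussed (the function names here are fakes, not the kernel's list helpers): both compute the same result, but the abbreviated form makes the call go through a function value selected by the ternary, which only becomes direct calls if the compiler rewrites it, and which stops compiling at all if either name is a macro.

```c
#include <assert.h>

static int add_head_calls, add_tail_calls;

static void fake_list_add(int *slot)      { add_head_calls++; *slot = 1; }
static void fake_list_add_tail(int *slot) { add_tail_calls++; *slot = 2; }

/* The abbreviated form from the patch: the ternary yields a function
 * value, so unless the compiler proves both arms and rewrites it,
 * this is an indirect call through a pointer. */
static void insert_abbreviated(int add_tail, int *slot)
{
	(add_tail ? fake_list_add_tail : fake_list_add)(slot);
}

/* The spelled-out form the reviewers ask for: two direct calls the
 * compiler can trivially inline, and it keeps working if either
 * callee is later reimplemented as a macro. */
static void insert_spelled_out(int add_tail, int *slot)
{
	if (add_tail)
		fake_list_add_tail(slot);
	else
		fake_list_add(slot);
}
```

The observable behaviour is identical; the objection is about generated code and robustness, not semantics.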
On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
> The current dentry number tracking code doesn't distinguish between
> positive & negative dentries. It just reports the total number of
> dentries in the LRU lists.
>
> As excessive number of negative dentries can have an impact on system
> performance, it will be wise to track the number of positive and
> negative dentries separately.
>
> This patch adds tracking for the total number of negative dentries in
> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
> file. The number, however, does not include negative dentries that are
> in flight but not in the LRU yet.
>
> The number of positive dentries in the LRU lists can be roughly found
> by subtracting the number of negative dentries from the total.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dcache.h      |  7 ++++---
>  3 files changed, 62 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index 819caf8..118bb93 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -63,19 +63,26 @@ struct {
>  	int nr_unused;
>  	int age_limit;         /* age in seconds */
>  	int want_pages;        /* pages requested by system */
> -	int dummy[2];
> +	int nr_negative;       /* # of unused negative dentries */
> +	int dummy;
>  } dentry_stat = {0, 0, 45, 0,};

That's not a backwards compatible ABI change. Those dummy fields
used to represent some metric we no longer calculate, and there are
probably still monitoring apps out there that think they still have
the old meaning. i.e. they are still visible to userspace:

$ cat /proc/sys/fs/dentry-state
83090	67661	45	0	0	0
$

IOWs, you can add new fields for new metrics to the end of the
structure, but you can't re-use existing fields even if they aren't
calculated anymore.

[....]

> @@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
>
>  #endif
>
> +static inline void __neg_dentry_dec(struct dentry *dentry)
> +{
> +	this_cpu_dec(nr_dentry_neg);
> +}
> +
> +static inline void neg_dentry_dec(struct dentry *dentry)
> +{
> +	if (unlikely(d_is_negative(dentry)))
> +		__neg_dentry_dec(dentry);

unlikely() considered harmful. The workload you are trying to
optimise is when negative dentries are the common case. IOWs, static
branch prediction hints like this will be wrong exactly when we want
the branch to be predicted correctly by the hardware.

> +}
> +
> +static inline void __neg_dentry_inc(struct dentry *dentry)
> +{
> +	this_cpu_inc(nr_dentry_neg);
> +}
> +
> +static inline void neg_dentry_inc(struct dentry *dentry)
> +{
> +	if (unlikely(d_is_negative(dentry)))
> +		__neg_dentry_inc(dentry);
> +}

These wrappers obfuscate the code - they do not do what the name
suggests and instead are conditional on dentry state. I'd just open
code this stuff - the code is much better without the wrappers.

> +
>  static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
>  {
>  	/*
> @@ -331,6 +365,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>  	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>  	WRITE_ONCE(dentry->d_flags, flags);
>  	dentry->d_inode = NULL;
> +	if (dentry->d_flags & DCACHE_LRU_LIST)
> +		__neg_dentry_inc(dentry);
>  }
>
>  static void dentry_free(struct dentry *dentry)
> @@ -395,6 +431,7 @@ static void d_lru_add(struct dentry *dentry)
>  	dentry->d_flags |= DCACHE_LRU_LIST;
>  	this_cpu_inc(nr_dentry_unused);
>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +	neg_dentry_inc(dentry);

Like this - why on earth would we increment the negative dentry
count for every dentry that is added to the LRU? Open coding

 	this_cpu_inc(nr_dentry_unused);
+	if (d_is_negative(dentry))
+		this_cpu_inc(nr_dentry_neg);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));

makes it obvious to the reader what we are doing, and it aggregates
all the accounting in a single location. Same for the rest of the
code.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
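The aggregated accounting Dave suggests can be sketched in userspace as follows. This is a mock, not kernel code: plain globals stand in for the per-cpu counters, and a small struct stands in for the dentry flags and d_is_negative().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the per-cpu nr_dentry_unused / nr_dentry_neg counters. */
static long nr_dentry_unused;
static long nr_dentry_neg;

struct fake_dentry {
	bool negative;	/* stand-in for d_is_negative(dentry) */
	bool on_lru;	/* stand-in for DCACHE_LRU_LIST */
};

/* All the LRU accounting sits together at the add site, so a reader
 * sees both counters change in one place. */
static void lru_add(struct fake_dentry *d)
{
	d->on_lru = true;
	nr_dentry_unused++;
	if (d->negative)
		nr_dentry_neg++;
}

/* Mirror image at the delete site. */
static void lru_del(struct fake_dentry *d)
{
	d->on_lru = false;
	nr_dentry_unused--;
	if (d->negative)
		nr_dentry_neg--;
}
```

Compared with the neg_dentry_inc()/neg_dentry_dec() wrappers, the condition and the counter update are visible at every call site, which is the readability point being made.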
On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
> For negative dentries that are accessed once and never used again, they
> should be removed first before other dentries when shrinker is running.
> This is done by putting negative dentries at the head of the LRU list
> instead at the tail.
>
> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
> is initially created. When such a dentry is added to the LRU, it will be
> added to the head so that it will be the first to go when a shrinker is
> running if it is never accessed again (DCACHE_REFERENCED bit not set).
> The flag is cleared after the LRU list addition.

This exposes internal LRU list functionality to callers. I carefully
hid what end of the list was the most or least recent from
subsystems interacting with LRUs precisely because "head" and "tail"
are completely confusing when interacting with a LRU.

LRUs are about tracking relative object age, not heads and tails.
IOWs, "track this object as most recently used", not "add this
object to the list tail". The opposite is "track this object as
least recently used", not "add object at head of list".

IOWs, the interface should be list_lru_add_oldest() or maybe
list_lru_add_least_recent() to indicate that these objects are
considered to be the oldest and therefore prime immediate reclaim
candidates.

Which points out a problem: the most recent negative dentry will be
the first to be reclaimed. That's not LRU behaviour, and it prevents
a working set of negative dentries from being retained when the
shrinker rotates through the dentries in LRU order. i.e. this patch
turns the LRU into a MRU list for negative dentries.

And then there's shrinker behaviour. What happens if the shrinker
isolate callback returns LRU_ROTATE on a negative dentry? It gets
moved to the most recent end of the list, so it won't have an
attempt to reclaim it again until it's tried to reclaim all the real
dentries. IOWs, it goes back to behaving like LRUs are supposed to
behave.

IOWs, reclaim behaviour of negative dentries will be highly
unpredictable: it will not easily retain a working set, nor will the
working set it does retain be predictable or easy to eject from
memory when the workload changes. Is this the behaviour we want for
negative dentries?

Perhaps a second internal LRU list in the list_lru for "immediate
reclaim" objects would be a better solution. i.e. similar to the
active/inactive lists used for prioritising the working set over
single use pages in page reclaim.

Negative dentries go onto the immediate list, real dentries go on
the existing list. Both are LRU, and the shrinker operates on the
immediate list first. When we rotate referenced negative dentries on
the immediate list, promote them to the active list with all the
real dentries so they age out with the rest of the working set.

That way single use negative dentries get removed in LRU order in
preference to the working set of real dentries. Being able to make
changes to the list implementation easily was one of the reasons I
hid the implementation of the list_lru from the interface callers
use....

[...]

> @@ -430,8 +424,20 @@ static void d_lru_add(struct dentry *dentry)
>  	D_FLAG_VERIFY(dentry, 0);
>  	dentry->d_flags |= DCACHE_LRU_LIST;
>  	this_cpu_inc(nr_dentry_unused);
> +	if (d_is_negative(dentry)) {
> +		__neg_dentry_inc(dentry);

/me notes this patch now open codes this, like suggested in the
previous patch.

> +		if (dentry->d_flags & DCACHE_NEW_NEGATIVE) {
> +			/*
> +			 * Add the negative dentry to the head once, it
> +			 * will be added to the tail next time.
> +			 */
> +			WARN_ON_ONCE(!list_lru_add_head(
> +				&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +			dentry->d_flags &= ~DCACHE_NEW_NEGATIVE;
> +			return;
> +		}
> +	}
>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> -	neg_dentry_inc(dentry);
>  }
>
>  static void d_lru_del(struct dentry *dentry)
> @@ -2620,6 +2626,9 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
>  		__d_set_inode_and_type(dentry, inode, add_flags);
>  		raw_write_seqcount_end(&dentry->d_seq);
>  		fsnotify_update_flags(dentry);
> +	} else {
> +		/* It is a negative dentry, add it to LRU head initially. */
> +		dentry->d_flags |= DCACHE_NEW_NEGATIVE;

Comments about LRU behaviour should not be put anywhere but in the
lru code. Otherwise it ends up stale and wrong, assuming it is
correct to start with....

>  	}
>  	__d_rehash(dentry);
>  	if (dir)
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index df942e5..03a1918 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -214,6 +214,7 @@ struct dentry_operations {
>  #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
>  #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
>  #define DCACHE_OP_REAL			0x04000000
> +#define DCACHE_NEW_NEGATIVE		0x08000000 /* New negative dentry */
>
>  #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
>  #define DCACHE_DENTRY_CURSOR		0x20000000
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index aa5efd9..bfac057 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -90,6 +90,23 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
>  bool list_lru_add(struct list_lru *lru, struct list_head *item);
>
>  /**
> + * list_lru_add_head: add an element to the lru list's head
> + * @list_lru: the lru pointer
> + * @item: the item to be added.
> + *
> + * This is similar to list_lru_add(). The only difference is the location
> + * where the new item will be added. The list_lru_add() function will add
> + * the new item to the tail as it is the most recently used one. The
> + * list_lru_add_head() will add the new item into the head so that it
> + * will the first to go if a shrinker is running. So this function should
> + * only be used for less important item that can be the first to go if
> + * the system is under memory pressure.

As mentioned above, this API needs to reference object ages, not
internal list ordering and shrinker implementation details.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
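The immediate/active split Dave floats can be sketched in userspace as follows. This is a rough mock under invented names: the arrays stand in for the two internal lists, and a single-object shrink pass drains the immediate list in LRU order before touching the active list.

```c
#include <assert.h>

/* Rough sketch of a two-list LRU: an "immediate" list for prime
 * reclaim candidates (e.g. new negative dentries) and an "active"
 * list for the working set. Arrays here stand in for the linked
 * lists and per-node locking of the real list_lru. */
#define MAX_OBJS 16

struct two_list_lru {
	int immediate[MAX_OBJS], n_immediate;
	int active[MAX_OBJS], n_active;
};

static void lru_track_immediate(struct two_list_lru *lru, int obj)
{
	lru->immediate[lru->n_immediate++] = obj;
}

static void lru_track_active(struct two_list_lru *lru, int obj)
{
	lru->active[lru->n_active++] = obj;
}

/* Reclaim one object: drain the immediate list oldest-first; only
 * touch the active list once the immediate list is empty. Returns
 * the reclaimed object's id, or -1 if both lists are empty. */
static int lru_shrink_one(struct two_list_lru *lru)
{
	int i, obj;

	if (lru->n_immediate) {
		obj = lru->immediate[0];
		for (i = 1; i < lru->n_immediate; i++)
			lru->immediate[i - 1] = lru->immediate[i];
		lru->n_immediate--;
		return obj;
	}
	if (lru->n_active) {
		obj = lru->active[0];
		for (i = 1; i < lru->n_active; i++)
			lru->active[i - 1] = lru->active[i];
		lru->n_active--;
		return obj;
	}
	return -1;
}
```

A promotion step (moving a referenced immediate object onto the active list instead of reclaiming it) is omitted for brevity; that is the part that would let a working set of negative dentries survive, which the single-list head-insert scheme cannot do.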
On 08/28/2018 07:10 PM, Linus Torvalds wrote:
> On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
>> Yes, I can rewrite it. What is the problem with the abbreviated form?
> Either gcc rewrites it for you, or you end up _actually_ using a
> function pointer and calling through it.

Yes, a function pointer would be really bad.

>
> The latter would be absolutely horribly bad for something like
> "list_add()", which should expand to just a couple of instructions.
>
> And the former would be ok, except for the "you wrote code the garbage
> way, and then depended on the compiler fixing it up". Which we
> generally try to avoid in the kernel.
>
> (Don't get me wrong - we definitely depend on the compiler doing a
> good job at CSE and dead code elimination etc, but generally we try to
> avoid the whole "compiler has to rewrite code to be good" model).
>
> Linus

I see your point here. I will rewrite it to use a regular if-then-else.

Thanks,
Longman
On 08/28/2018 07:22 PM, Andrew Morton wrote:
> On Tue, 28 Aug 2018 16:10:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
>>> Yes, I can rewrite it. What is the problem with the abbreviated form?
>> Either gcc rewrites it for you, or you end up _actually_ using a
>> function pointer and calling through it.
>>
>> The latter would be absolutely horribly bad for something like
>> "list_add()", which should expand to just a couple of instructions.
>>
>> And the former would be ok, except for the "you wrote code the garbage
>> way, and then depended on the compiler fixing it up". Which we
>> generally try to avoid in the kernel.
>>
>> (Don't get me wrong - we definitely depend on the compiler doing a
>> good job at CSE and dead code elimination etc, but generally we try to
>> avoid the whole "compiler has to rewrite code to be good" model).
>>
> And the "abbreviated form" will surely explode if one or both of those
> "functions" happens to be implemented (or later reimplemented) as a macro.
> It's best not to unnecessarily make such assumptions.
>
Yes, that is true.
Thanks,
Longman
On Tue 28-08-18 13:19:40, Waiman Long wrote:
> For negative dentries that are accessed once and never used again, they
> should be removed first before other dentries when shrinker is running.
> This is done by putting negative dentries at the head of the LRU list
> instead at the tail.
>
> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
> is initially created. When such a dentry is added to the LRU, it will be
> added to the head so that it will be the first to go when a shrinker is
> running if it is never accessed again (DCACHE_REFERENCED bit not set).
> The flag is cleared after the LRU list addition.
Placing object to the head of the LRU list can be really tricky as Dave
pointed out. I am not familiar with the dentry cache reclaim so my
comparison below might not apply. Let me try anyway.
Negative dentries sound very similar to MADV_FREE pages from the reclaim
POV. They are primary candidate for reclaim, yet you want to preserve
aging to other easily reclaimable objects (including other MADV_FREE
pages). What we do for those pages is to move them from the anonymous
LRU list to the inactive file LRU list. Now you obviously do not have
anon/file LRUs but something similar to active/inactive LRU lists might
be a reasonably good match. Have easily reclaimable dentries on the
inactive list including negative dentries. If negative entries are
heavily used then they can promote to the active list because there is
no reason to reclaim them soon.
Just my 2c
--
Michal Hocko
SUSE Labs
On 08/28/2018 08:11 PM, Dave Chinner wrote:
> On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
>> The current dentry number tracking code doesn't distinguish between
>> positive & negative dentries. It just reports the total number of
>> dentries in the LRU lists.
>>
>> As excessive number of negative dentries can have an impact on system
>> performance, it will be wise to track the number of positive and
>> negative dentries separately.
>>
>> This patch adds tracking for the total number of negative dentries in
>> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
>> file. The number, however, does not include negative dentries that are
>> in flight but not in the LRU yet.
>>
>> The number of positive dentries in the LRU lists can be roughly found
>> by subtracting the number of negative dentries from the total.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
>>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dcache.h      |  7 ++++---
>>  3 files changed, 62 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
>> index 819caf8..118bb93 100644
>> --- a/Documentation/sysctl/fs.txt
>> +++ b/Documentation/sysctl/fs.txt
>> @@ -63,19 +63,26 @@ struct {
>>  	int nr_unused;
>>  	int age_limit;		/* age in seconds */
>>  	int want_pages;		/* pages requested by system */
>> -	int dummy[2];
>> +	int nr_negative;	/* # of unused negative dentries */
>> +	int dummy;
>>  } dentry_stat = {0, 0, 45, 0,};
> That's not a backwards compatible ABI change. Those dummy fields
> used to represent some metric we no longer calculate, and there are
> probably still monitoring apps out there that think they still have
> the old meaning. i.e. they are still visible to userspace:
>
> $ cat /proc/sys/fs/dentry-state
> 83090	67661	45	0	0	0
> $
>
> IOWs, you can add new fields for new metrics to the end of the
> structure, but you can't re-use existing fields even if they
> aren't calculated anymore.
>
> [....]

I looked up the git history and the state of the dentry_stat structure
hadn't changed since it was first put into git in 2.6.12-rc2 on Apr 16,
2005. That was over 13 years ago. Even adding an extra argument can have
the potential of breaking old applications depending on how the parsing
code was written. Given that systems still using some very old tools are
not likely to upgrade to the latest kernel anyway, I don't see that as a
big problem.

>> @@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
>>
>>  #endif
>>
>> +static inline void __neg_dentry_dec(struct dentry *dentry)
>> +{
>> +	this_cpu_dec(nr_dentry_neg);
>> +}
>> +
>> +static inline void neg_dentry_dec(struct dentry *dentry)
>> +{
>> +	if (unlikely(d_is_negative(dentry)))
>> +		__neg_dentry_dec(dentry);
> unlikely() considered harmful.
>
> The workload you are trying to optimise is when negative dentries are
> the common case. IOWs, static branch prediction hints like this will
> be wrong exactly when we want the branch to be predicted correctly
> by the hardware.

You are right. I will remove the branch hints.

>> +}
>> +
>> +static inline void __neg_dentry_inc(struct dentry *dentry)
>> +{
>> +	this_cpu_inc(nr_dentry_neg);
>> +}
>> +
>> +static inline void neg_dentry_inc(struct dentry *dentry)
>> +{
>> +	if (unlikely(d_is_negative(dentry)))
>> +		__neg_dentry_inc(dentry);
>> +}
> These wrappers obfuscate the code - they do not do what the
> name suggests and instead are conditional on dentry state.
>
> I'd just open code this stuff - the code is much better without
> the wrappers.

Sure. I will open code the counter updates.

>> +
>>  static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
>>  {
>>  	/*
>> @@ -331,6 +365,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>>  	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>>  	WRITE_ONCE(dentry->d_flags, flags);
>>  	dentry->d_inode = NULL;
>> +	if (dentry->d_flags & DCACHE_LRU_LIST)
>> +		__neg_dentry_inc(dentry);
>>  }
>>
>>  static void dentry_free(struct dentry *dentry)
>> @@ -395,6 +431,7 @@ static void d_lru_add(struct dentry *dentry)
>>  	dentry->d_flags |= DCACHE_LRU_LIST;
>>  	this_cpu_inc(nr_dentry_unused);
>>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>> +	neg_dentry_inc(dentry);
> Like this - why on earth would we increment the negative dentry
> count for every dentry that is added to LRU? Open coding
>
> 	this_cpu_inc(nr_dentry_unused);
> +	if (d_is_negative(dentry))
> +		this_cpu_inc(nr_dentry_neg);
> 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>
> That's obvious to the reader what we are doing, and it aggregates
> all the accounting in a single location. Same for the rest of the
> code.
>
> Cheers,
>
> Dave.

Cheers,
Longman
On Tue, Aug 28, 2018 at 04:01:50PM -0700, Andrew Morton wrote:
>
> Another pet peeve ;)
>
> On Tue, 28 Aug 2018 13:19:40 -0400 Waiman Long <longman@redhat.com> wrote:
>
> > /**
> > + * list_lru_add_head: add an element to the lru list's head
> > + * @list_lru: the lru pointer
> > + * @item: the item to be added.
> > + *
> > + * This is similar to list_lru_add(). The only difference is the location
> > + * where the new item will be added. The list_lru_add() function will add
>
> People often use the term "the foo() function". I don't know why -
> just say "foo()"!
For whatever it is worth...
I tend to use "The foo() function ..." instead of "foo() ..." in order
to properly capitalize the first word of the sentence. So I might say
"The call_rcu() function enqueues an RCU callback." rather than something
like "call_rcu() enqueues an RCU callback." Or I might use some other
trick to keep "call_rcu()" from being the first word of the sentence.
But if the end of the previous sentence introduced call_rcu(), you
usually want the next sentence's first use of "call_rcu()" to be very
early in the sentence, because otherwise the flow will seem choppy.
And no, I have no idea what I would do if I were writing in German,
where nouns are capitalized, given that function names tend to be used
as nouns. Probably I would get yelled at a lot for capitalizing my
function names. ;-)
Thanx, Paul
On 08/28/2018 09:02 PM, Dave Chinner wrote:
> On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
>> For negative dentries that are accessed once and never used again, they
>> should be removed first before other dentries when shrinker is running.
>> This is done by putting negative dentries at the head of the LRU list
>> instead at the tail.
>>
>> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
>> is initially created. When such a dentry is added to the LRU, it will be
>> added to the head so that it will be the first to go when a shrinker is
>> running if it is never accessed again (DCACHE_REFERENCED bit not set).
>> The flag is cleared after the LRU list addition.
> This exposes internal LRU list functionality to callers. I carefully
> hid what end of the list was the most or least recent from
> subsystems interacting with LRUs precisely because "head" and "tail"
> are completely confusing when interacting with a LRU.
>
> LRUs are about tracking relative object age, not heads and tails.
>
> IOWs, "track this object as most recently used", not "add this
> object to the list tail". The opposite is "track this object as
> least recently used", not "add object at head of list".
>
> IOWs, the interface should be list_lru_add_oldest() or maybe
> list_lru_add_least_recent() to indicate that these objects are
> considered to be the oldest and therefore prime immediate reclaim
> candidates.

Good point. I will use age in the function name instead.

> Which points out a problem. That the most recent negative dentry
> will be the first to be reclaimed. That's not LRU behaviour, and
> prevents a working set of negative dentries from being retained
> when the shrinker rotates through the dentries in LRU order. i.e.
> this patch turns the LRU into a MRU list for negative dentries.
>
> And then there's shrinker behaviour. What happens if the shrinker
> isolate callback returns LRU_ROTATE on a negative dentry? It gets
> moved to the most recent end of the list, so it won't have an
> attempt to reclaim it again until it's tried to reclaim all the real
> dentries. IOWs, it goes back to behaving like LRUs are supposed to
> behave.
>
> IOWs, reclaim behaviour of negative dentries will be
> highly unpredictable, it will not easily retain a working set, nor
> will the working set it does retain be predictable or easy to eject
> from memory when the workload changes.
>
> Is this the behaviour we want for negative dentries?

I am aware that the behavior is not strictly LRU for negative dentries.
This is a compromise for using one LRU list for 2 different classes of
dentries. The basic idea is that negative dentries that are used only
once will go first irrespective of their age.

> Perhaps a second internal LRU list in the list_lru for "immediate
> reclaim" objects would be a better solution. i.e. similar to the
> active/inactive lists used for prioritising the working set over
> single use pages in page reclaim. Negative dentries go onto the
> immediate list, real dentries go on the existing list. Both are LRU,
> and the shrinker operates on the immediate list first. When we
> rotate referenced negative dentries on the immediate list, promote
> them to the active list with all the real dentries so they age out
> with the rest of the working set. That way single use negative
> dentries get removed in LRU order in preference to the working set
> of real dentries.
>
> Being able to make changes to the list implementation easily was one
> of the reasons I hid the implementation of the list_lru from the
> interface callers use....
>
> [...]

I have thought about using 2 lists for dentries. That will require much
more extensive changes to the code and hence much more testing will be
needed to verify their correctness. That is the main reason why I try to
avoid doing that.

As you have suggested, we could implement this 2-level LRU list in the
list_lru API. But it is used by other subsystems as well. An extensive
change like that will have a similar issue in terms of testing and
verification effort.

>> @@ -430,8 +424,20 @@ static void d_lru_add(struct dentry *dentry)
>>  	D_FLAG_VERIFY(dentry, 0);
>>  	dentry->d_flags |= DCACHE_LRU_LIST;
>>  	this_cpu_inc(nr_dentry_unused);
>> +	if (d_is_negative(dentry)) {
>> +		__neg_dentry_inc(dentry);
> /me notes this patch now open codes this, like suggested in the
> previous patch.
>
>> +		if (dentry->d_flags & DCACHE_NEW_NEGATIVE) {
>> +			/*
>> +			 * Add the negative dentry to the head once, it
>> +			 * will be added to the tail next time.
>> +			 */
>> +			WARN_ON_ONCE(!list_lru_add_head(
>> +				&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>> +			dentry->d_flags &= ~DCACHE_NEW_NEGATIVE;
>> +			return;
>> +		}
>> +	}
>>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>> -	neg_dentry_inc(dentry);
>>  }
>>
>>  static void d_lru_del(struct dentry *dentry)
>> @@ -2620,6 +2626,9 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
>>  		__d_set_inode_and_type(dentry, inode, add_flags);
>>  		raw_write_seqcount_end(&dentry->d_seq);
>>  		fsnotify_update_flags(dentry);
>> +	} else {
>> +		/* It is a negative dentry, add it to LRU head initially. */
>> +		dentry->d_flags |= DCACHE_NEW_NEGATIVE;
> Comments about LRU behaviour should not be put anywhere but in the
> lru code. Otherwise it ends up stale and wrong, assuming it is
> correct to start with....

I will document all that in the LRU code.

>>  	}
>>  	__d_rehash(dentry);
>>  	if (dir)
>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>> index df942e5..03a1918 100644
>> --- a/include/linux/dcache.h
>> +++ b/include/linux/dcache.h
>> @@ -214,6 +214,7 @@ struct dentry_operations {
>>  #define DCACHE_FALLTHRU		0x01000000 /* Fall through to lower layer */
>>  #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
>>  #define DCACHE_OP_REAL		0x04000000
>> +#define DCACHE_NEW_NEGATIVE	0x08000000 /* New negative dentry */
>>
>>  #define DCACHE_PAR_LOOKUP	0x10000000 /* being looked up (with parent locked shared) */
>>  #define DCACHE_DENTRY_CURSOR	0x20000000
>> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
>> index aa5efd9..bfac057 100644
>> --- a/include/linux/list_lru.h
>> +++ b/include/linux/list_lru.h
>> @@ -90,6 +90,23 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
>>  bool list_lru_add(struct list_lru *lru, struct list_head *item);
>>
>>  /**
>> + * list_lru_add_head: add an element to the lru list's head
>> + * @list_lru: the lru pointer
>> + * @item: the item to be added.
>> + *
>> + * This is similar to list_lru_add(). The only difference is the location
>> + * where the new item will be added. The list_lru_add() function will add
>> + * the new item to the tail as it is the most recently used one. The
>> + * list_lru_add_head() will add the new item into the head so that it
>> + * will the first to go if a shrinker is running. So this function should
>> + * only be used for less important item that can be the first to go if
>> + * the system is under memory pressure.
> As mentioned above, this API needs reference object ages, not
> internal list ordering and shrinker implementation details.

Sure. Thanks for the suggestion.

Cheers,
Longman
On 08/29/2018 03:51 AM, Michal Hocko wrote:
> On Tue 28-08-18 13:19:40, Waiman Long wrote:
>> For negative dentries that are accessed once and never used again, they
>> should be removed first before other dentries when shrinker is running.
>> This is done by putting negative dentries at the head of the LRU list
>> instead at the tail.
>>
>> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
>> is initially created. When such a dentry is added to the LRU, it will be
>> added to the head so that it will be the first to go when a shrinker is
>> running if it is never accessed again (DCACHE_REFERENCED bit not set).
>> The flag is cleared after the LRU list addition.
> Placing object to the head of the LRU list can be really tricky as Dave
> pointed out. I am not familiar with the dentry cache reclaim so my
> comparison below might not apply. Let me try anyway.
>
> Negative dentries sound very similar to MADV_FREE pages from the reclaim
> POV. They are primary candidate for reclaim, yet you want to preserve
> aging to other easily reclaimable objects (including other MADV_FREE
> pages). What we do for those pages is to move them from the anonymous
> LRU list to the inactive file LRU list. Now you obviously do not have
> anon/file LRUs but something similar to active/inactive LRU lists might
> be a reasonably good match. Have easily reclaimable dentries on the
> inactive list including negative dentries. If negative entries are
> heavily used then they can promote to the active list because there is
> no reason to reclaim them soon.
>
> Just my 2c
As mentioned in my reply to Dave, I did consider using a two-LRU-list
solution. However, that will add more complexity to the dcache LRU
management code than my current approach and probably more potential for
slowdown.
One point that I forgot to mention is that the current dcache LRU list
will only update the order of the dentries when the list is being
shrunk. If a dentry was referenced again beforehand (the
DCACHE_REFERENCED bit set), it will be rotated to the tail of the list
via LRU rotation instead of being removed. This is not strictly LRU
anyway. The way my patch works will just move it further away from true
LRU behavior.
Cheers,
Longman
On 08/29/2018 01:54 PM, Paul E. McKenney wrote:
> On Tue, Aug 28, 2018 at 04:01:50PM -0700, Andrew Morton wrote:
>> Another pet peeve ;)
>>
>> On Tue, 28 Aug 2018 13:19:40 -0400 Waiman Long <longman@redhat.com> wrote:
>>
>>> /**
>>> + * list_lru_add_head: add an element to the lru list's head
>>> + * @list_lru: the lru pointer
>>> + * @item: the item to be added.
>>> + *
>>> + * This is similar to list_lru_add(). The only difference is the location
>>> + * where the new item will be added. The list_lru_add() function will add
>> People often use the term "the foo() function". I don't know why -
>> just say "foo()"!
> For whatever it is worth...
>
> I tend to use "The foo() function ..." instead of "foo() ..." in order
> to properly capitalize the first word of the sentence. So I might say
> "The call_rcu() function enqueues an RCU callback." rather than something
> like "call_rcu() enqueues an RCU callback." Or I might use some other
> trick to keep "call_rcu()" from being the first word of the sentence.
> But if the end of the previous sentence introduced call_rcu(), you
> usually want the next sentence's first use of "call_rcu()" to be very
> early in the sentence, because otherwise the flow will seem choppy.
>
> And no, I have no idea what I would do if I were writing in German,
> where nouns are capitalized, given that function names tend to be used
> as nouns. Probably I would get yelled at a lot for capitalizing my
> function names. ;-)
>
> Thanx, Paul
>
Yes, doing proper capitalization of the first letter of a sentence is
the main reason I used "The foo() function" in a sentence.
Cheers,
Longman
On Wed, Aug 29, 2018 at 10:54:05AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 28, 2018 at 04:01:50PM -0700, Andrew Morton wrote:
> > People often use the term "the foo() function". I don't know why -
> > just say "foo()"!
>
> For whatever it is worth...
>
> I tend to use "The foo() function ..." instead of "foo() ..." in order
> to properly capitalize the first word of the sentence. So I might say
> "The call_rcu() function enqueues an RCU callback." rather than something
> like "call_rcu() enqueues an RCU callback." Or I might use some other
> trick to keep "call_rcu()" from being the first word of the sentence.
I tend to write 'Use call_rcu() to enqueue a callback." or "When
call_rcu() returns, the callback will have been enqueued".
On Wed, Aug 29, 2018 at 03:34:05PM -0400, Waiman Long wrote:
> On 08/28/2018 09:02 PM, Dave Chinner wrote:
> > On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
> > And then there's shrinker behaviour. What happens if the shrinker
> > isolate callback returns LRU_ROTATE on a negative dentry? It gets
> > moved to the most recent end of the list, so it won't have an
> > attempt to reclaim it again until it's tried to reclaim all the real
> > dentries. IOWs, it goes back to behaving like LRUs are supposed to
> > behave.
> >
> > IOWs, reclaim behaviour of negative dentries will be
> > highly unpredictable, it will not easily retain a working set, nor
> > will the working set it does retain be predictable or easy to eject
> > from memory when the workload changes.
> >
> > Is this the behaviour we want for negative dentries?
>
> I am aware that the behavior is not strictly LRU for negative dentries.
> This is a compromise for using one LRU list for 2 different classes of
> dentries.

Thus demonstrating just enough knowledge to be dangerous. We already
have 3 different classes of dentries on the LRU list:

	- in use dentries (because we use lazy removal to avoid lru
	  list lock contention on cache hit lookups)
	- unused, referenced dentries
	- unused, unreferenced dentries.

Each of these classes of dentries are treated differently by the
shrinker, but the key point is that they are all aged the same way
and so there's consistent maintenance of the working set under
memory pressure.

Putting negative dentries at the head of the list doesn't create a
new "class" of object on the LRU, it just changes the ordering of
the lru list. This will cause unpredictable behaviour because
objects haven't had a chance to age gracefully before they are
reclaimed.

FYI, the inode cache has the same list_lru setup, object types and
shrinker algorithm as the dentry cache, so this isn't a one-off.
Indeed, the XFS buffer cache has a multi-reference hierarchy of 13
different types of {unused, referenced} buffers in its list_lru to
implement a quasi aging-NFU reclaim algorithm in its shrinker.

i.e. the list_lru infrastructure has never provided or enforced a
pure LRU algorithm. It is common infrastructure intended to provide
a scalable, flexible and memcg-aware FIFO-like object tracking
system that integrates tightly with memory reclaim to allow
subsystems to implement cache reclaim algorithms that are optimal
for that subsystem.

IOWs, the list_lru doesn't define the reclaim algorithm the
subsystem uses and there's no reason why we can't extend the
infrastructure to support more complex algorithms without impacting
existing subsystem reclaim algorithms at all. Only the subsystems
that use the new infrastructure and algorithms would need careful
examination. Of course, the overall system cache balancing behaviour
under different and changing workloads would still need to be
verified, but you have to do that for any cache reclaim algorithm
change that is made....

> The basic idea is that negative dentries that are used only
> once will go first irrespective of their age.

Using MRU for negative dentries, as I've previously explained, is a
flawed concept. It might be expedient to solve your specific
problem, but it's not a solution to the general problem of negative
dentry management.

> > Perhaps a second internal LRU list in the list_lru for "immediate
> > reclaim" objects would be a better solution. i.e. similar to the
> > active/inactive lists used for prioritising the working set over
> > single use pages in page reclaim. Negative dentries go onto the
> > immediate list, real dentries go on the existing list. Both are LRU,
> > and the shrinker operates on the immediate list first. When we
> > rotate referenced negative dentries on the immediate list, promote
> > them to the active list with all the real dentries so they age out
> > with the rest of the working set. That way single use negative
> > dentries get removed in LRU order in preference to the working set
> > of real dentries.
> >
> > Being able to make changes to the list implementation easily was one
> > of the reasons I hid the implementation of the list_lru from the
> > interface callers use....
> >
> > [...]
>
> I have thought about using 2 lists for dentries. That will require much
> more extensive changes to the code and hence much more testing will be
> needed to verify their correctness. That is the main reason why I try to
> avoid doing that.

i.e. expediency. However, you're changing the behaviour of core
caching and memory reclaim algorithms. The amount and level of
testing and verification you need to do is the same regardless of
whether it's a small change or a large change. Sure, you've shown
that *one* artificial micro-benchmark improves, but what about
everything else?

> As you have suggested, we could implement this 2-level LRU list in the
> list_lru API. But it is used by other subsystems as well. An extensive
> change like that will have a similar issue in terms of testing and
> verification effort.

I know what changing a reclaim algorithm involves, how difficult it
is to balance the competing caches quickly and to the desired ratios
for acceptable performance, how difficult it is to measure and
control how the system reacts to transient and impulse memory
pressure events, etc. I also know that "simple" and/or "obviously
correct" subsystem changes can cause very unexpected system level
effects, and that it's almost never what you think it is that caused
the unexpected behaviour.

IOWs, getting anything even slightly wrong in these algorithms will
adversely affect system performance and balance significantly.
Hence the bar is /always/ set high for core caching algorithm
changes like this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
On Wed, Aug 29, 2018 at 01:11:08PM -0400, Waiman Long wrote:
> On 08/28/2018 08:11 PM, Dave Chinner wrote:
> > On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
> >> The current dentry number tracking code doesn't distinguish between
> >> positive & negative dentries. It just reports the total number of
> >> dentries in the LRU lists.
> >>
> >> As excessive number of negative dentries can have an impact on system
> >> performance, it will be wise to track the number of positive and
> >> negative dentries separately.
> >>
> >> This patch adds tracking for the total number of negative dentries in
> >> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
> >> file. The number, however, does not include negative dentries that are
> >> in flight but not in the LRU yet.
> >>
> >> The number of positive dentries in the LRU lists can be roughly found
> >> by subtracting the number of negative dentries from the total.
> >>
> >> Signed-off-by: Waiman Long <longman@redhat.com>
> >> ---
> >>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
> >>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/dcache.h      |  7 ++++---
> >>  3 files changed, 62 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> >> index 819caf8..118bb93 100644
> >> --- a/Documentation/sysctl/fs.txt
> >> +++ b/Documentation/sysctl/fs.txt
> >> @@ -63,19 +63,26 @@ struct {
> >>  	int nr_unused;
> >>  	int age_limit;		/* age in seconds */
> >>  	int want_pages;		/* pages requested by system */
> >> -	int dummy[2];
> >> +	int nr_negative;	/* # of unused negative dentries */
> >> +	int dummy;
> >>  } dentry_stat = {0, 0, 45, 0,};
> > That's not a backwards compatible ABI change. Those dummy fields
> > used to represent some metric we no longer calculate, and there are
> > probably still monitoring apps out there that think they still have
> > the old meaning. i.e. they are still visible to userspace:
> >
> > $ cat /proc/sys/fs/dentry-state
> > 83090	67661	45	0	0	0
> > $
> >
> > IOWs, you can add new fields for new metrics to the end of the
> > structure, but you can't re-use existing fields even if they
> > aren't calculated anymore.
> >
> > [....]
>
> I looked up the git history and the state of the dentry_stat structure
> hadn't changed since it was first put into git in 2.6.12-rc2 on Apr 16,
> 2005. That was over 13 years ago. Even adding an extra argument can have
> the potential of breaking old applications depending on how the parsing
> code was written.

I'm pretty sure we've had this discussion many times before w.r.t.
/proc/self/mount* and other multi-field proc files. IIRC, the answer
has always been that it's OK to extend lines with new fields as
existing apps /should/ ignore them, but it's not OK to remove or
redefine existing fields in the line because existing apps /will/
misinterpret what that field means.

> Given that systems still using some very old tools are not likely to
> upgrade to the latest kernel anyway, I don't see that as a big problem.

I don't think that matters when it comes to changing what information
we expose in proc files.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
On Wed 29-08-18 15:58:52, Waiman Long wrote:
> On 08/29/2018 03:51 AM, Michal Hocko wrote:
> > On Tue 28-08-18 13:19:40, Waiman Long wrote:
> >> For negative dentries that are accessed once and never used again, they
> >> should be removed first before other dentries when shrinker is running.
> >> This is done by putting negative dentries at the head of the LRU list
> >> instead at the tail.
> >>
> >> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
> >> is initially created. When such a dentry is added to the LRU, it will be
> >> added to the head so that it will be the first to go when a shrinker is
> >> running if it is never accessed again (DCACHE_REFERENCED bit not set).
> >> The flag is cleared after the LRU list addition.
> > Placing object to the head of the LRU list can be really tricky as Dave
> > pointed out. I am not familiar with the dentry cache reclaim so my
> > comparison below might not apply. Let me try anyway.
> >
> > Negative dentries sound very similar to MADV_FREE pages from the reclaim
> > POV. They are primary candidate for reclaim, yet you want to preserve
> > aging to other easily reclaimable objects (including other MADV_FREE
> > pages). What we do for those pages is to move them from the anonymous
> > LRU list to the inactive file LRU list. Now you obviously do not have
> > anon/file LRUs but something similar to active/inactive LRU lists might
> > be a reasonably good match. Have easily reclaimable dentries on the
> > inactive list including negative dentries. If negative entries are
> > heavily used then they can promote to the active list because there is
> > no reason to reclaim them soon.
> >
> > Just my 2c
>
> As mentioned in my reply to Dave, I did consider using a two-LRU-list
> solution. However, that will add more complexity to the dcache LRU
> management code than my current approach and probably more potential
> for slowdown.
I completely agree with Dave here. This is not easy but trying to sneak
in something that works for an _artificial_ workload is simply a no go.
So if it takes coming up with a more complex solution to cover more
general workloads, then so be it. Someone has to bite the bullet and
explore that direction. It won't be a simple project but well, if
negative dentries really matter then it is worth making the reclaim
design robust and comprehensible rather than ad hoc and unpredictable.
--
Michal Hocko
SUSE Labs
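For reference, the head-vs-tail insertion scheme from the quoted patch
description might look roughly like the following. This is a hypothetical,
simplified sketch; the names (NEW_NEGATIVE, lru_add, struct entry) are
illustrative and not the kernel's DCACHE_* flags or list_lru API.

```c
#include <assert.h>
#include <stddef.h>

#define NEW_NEGATIVE 0x1

struct entry {
	int flags;
	struct entry *prev, *next;
};

struct lru {
	struct entry *head, *tail;	/* head end is reclaimed first */
};

static void lru_add(struct lru *l, struct entry *e)
{
	e->prev = e->next = NULL;
	if (!l->head) {
		l->head = l->tail = e;
	} else if (e->flags & NEW_NEGATIVE) {
		/* new negative dentry: first in line for the shrinker */
		e->next = l->head;
		l->head->prev = e;
		l->head = e;
	} else {
		/* everything else ages from the tail as usual */
		e->prev = l->tail;
		l->tail->next = e;
		l->tail = e;
	}
	e->flags &= ~NEW_NEGATIVE;	/* flag is cleared after LRU addition */
}
```

A new negative entry thus lands at the reclaim end of the list; if it is
later referenced, nothing moves it back, which is exactly the ordering
concern Dave and Michal raise below.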
On 08/30/2018 03:20 AM, Michal Hocko wrote:
> On Wed 29-08-18 15:58:52, Waiman Long wrote:
>> On 08/29/2018 03:51 AM, Michal Hocko wrote:
>>> On Tue 28-08-18 13:19:40, Waiman Long wrote:
>>>> For negative dentries that are accessed once and never used again, they
>>>> should be removed first, before other dentries, when the shrinker is
>>>> running. This is done by putting negative dentries at the head of the
>>>> LRU list instead of at the tail.
>>>>
>>>> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
>>>> is initially created. When such a dentry is added to the LRU, it will be
>>>> added to the head so that it will be the first to go when a shrinker is
>>>> running if it is never accessed again (DCACHE_REFERENCED bit not set).
>>>> The flag is cleared after the LRU list addition.
>>> Placing objects at the head of the LRU list can be really tricky as Dave
>>> pointed out. I am not familiar with the dentry cache reclaim so my
>>> comparison below might not apply. Let me try anyway.
>>>
>>> Negative dentries sound very similar to MADV_FREE pages from the reclaim
>>> POV. They are primary candidates for reclaim, yet you want to preserve
>>> aging relative to other easily reclaimable objects (including other MADV_FREE
>>> pages). What we do for those pages is to move them from the anonymous
>>> LRU list to the inactive file LRU list. Now you obviously do not have
>>> anon/file LRUs but something similar to active/inactive LRU lists might
>>> be a reasonably good match. Have easily reclaimable dentries on the
>>> inactive list including negative dentries. If negative entries are
>>> heavily used then they can promote to the active list because there is
>>> no reason to reclaim them soon.
>>>
>>> Just my 2c
>> As mentioned in my reply to Dave, I did consider using a two-LRU-list
>> solution. However, that would add more complexity to the dcache LRU
>> management code than my current approach, and probably more potential
>> for slowdown.
> I completely agree with Dave here. This is not easy but trying to sneak
> in something that works for an _artificial_ workload is simply a no go.
> So if it takes coming up with a more complex solution to cover more
> general workloads, then so be it. Someone has to bite the bullet and
> explore that direction. It won't be a simple project but well, if
> negative dentries really matter then it is worth making the reclaim
> design robust and comprehensible rather than ad hoc and unpredictable.
OK, I will need to spend more time to think about a better way of doing
that.
Cheers,
Longman
On 08/29/2018 09:43 PM, Dave Chinner wrote:
> On Wed, Aug 29, 2018 at 01:11:08PM -0400, Waiman Long wrote:
>> On 08/28/2018 08:11 PM, Dave Chinner wrote:
>>> On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
>>>> The current dentry number tracking code doesn't distinguish between
>>>> positive & negative dentries. It just reports the total number of
>>>> dentries in the LRU lists.
>>>>
>>>> As an excessive number of negative dentries can have an impact on
>>>> system performance, it is wise to track the number of positive and
>>>> negative dentries separately.
>>>>
>>>> This patch adds tracking for the total number of negative dentries in
>>>> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
>>>> file. The number, however, does not include negative dentries that are
>>>> in flight but not in the LRU yet.
>>>>
>>>> The number of positive dentries in the LRU lists can be roughly found
>>>> by subtracting the number of negative dentries from the total.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>> Documentation/sysctl/fs.txt | 19 +++++++++++++------
>>>> fs/dcache.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>>> include/linux/dcache.h | 7 ++++---
>>>> 3 files changed, 62 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
>>>> index 819caf8..118bb93 100644
>>>> --- a/Documentation/sysctl/fs.txt
>>>> +++ b/Documentation/sysctl/fs.txt
>>>> @@ -63,19 +63,26 @@ struct {
>>>> int nr_unused;
>>>> int age_limit; /* age in seconds */
>>>> int want_pages; /* pages requested by system */
>>>> - int dummy[2];
>>>> + int nr_negative; /* # of unused negative dentries */
>>>> + int dummy;
>>>> } dentry_stat = {0, 0, 45, 0,};
>>> That's not a backwards compatible ABI change. Those dummy fields
>>> used to represent some metric we no longer calculate, and there are
>>> probably still monitoring apps out there that think they still have
>>> the old meaning. i.e. they are still visible to userspace:
>>>
>>> $ cat /proc/sys/fs/dentry-state
>>> 83090 67661 45 0 0 0
>>> $
>>>
>>> IOWs, you can add new fields for new metrics to the end of the
>>> structure, but you can't re-use existing fields even if they
>>> aren't calculated anymore.
>>>
>>> [....]
>> I looked up the git history and the state of the dentry_stat structure
>> hadn't changed since it was first put into git in 2.6.12-rc2 on Apr 16,
>> 2005. That was over 13 years ago. Even adding an extra field has the
>> potential of breaking old applications, depending on how their parsing
>> code was written.
> I'm pretty sure we've had this discussion many times before w.r.t.
> /proc/self/mount* and other multi-field proc files.
>
> IIRC, The answer has always been that it's OK to extend lines with
> new fields as existing apps /should/ ignore them, but it's not OK to
> remove or redefine existing fields in the line because existing apps
> /will/ misinterpret what that field means.
>
>> Given that systems still using some very old tools are not likely to
>> upgrade to the latest kernel anyway, I don't see this as a big
>> problem.
> I don't think that matters when it comes to changing what
> information we expose in proc files.
>
> Cheers,
>
> Dave.
I am not against appending the new count to the end. I just want to make
sure that it is the right thing to do.
Cheers,
Longman
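The compatibility rule Dave describes — append new fields, never repurpose
existing ones — rests on how consumers typically parse multi-field proc
files. A minimal sketch of such a consumer (hypothetical; parse_dentry_state
is not a real API) shows why appending is safe:

```c
#include <assert.h>
#include <stdio.h>

/* A well-behaved consumer of /proc/sys/fs/dentry-state parses only the
 * leading fields it knows about and never looks at trailing ones. A new
 * field appended at the end therefore goes unnoticed, while repurposing
 * an existing field would be silently misread as the old metric. */
static int parse_dentry_state(const char *line, long *nr_dentry,
			      long *nr_unused)
{
	/* sscanf stops after the requested conversions; any extra
	 * trailing fields in the line are simply never examined */
	return sscanf(line, "%ld %ld", nr_dentry, nr_unused) == 2;
}
```

Such a consumer keeps working whether the line has six fields or seven,
which is the crux of the append-only argument.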
On 08/29/2018 09:12 PM, Dave Chinner wrote:
> On Wed, Aug 29, 2018 at 03:34:05PM -0400, Waiman Long wrote:
>> On 08/28/2018 09:02 PM, Dave Chinner wrote:
>>> On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
>>> And then there's shrinker behaviour. What happens if the shrinker
>>> isolate callback returns LRU_ROTATE on a negative dentry? It gets
>>> moved to the most recent end of the list, so it won't see another
>>> reclaim attempt until the shrinker has tried all the real
>>> dentries. IOWs, it goes back to behaving like LRUs are supposed to
>>> behave.
>>>
>>> IOWs, reclaim behaviour of negative dentries will be
>>> highly unpredictable; it will not easily retain a working set, nor
>>> will the working set it does retain be predictable or easy to eject
>>> from memory when the workload changes.
>>>
>>> Is this the behaviour we want for negative dentries?
>> I am aware that the behavior is not strictly LRU for negative dentries.
>> This is a compromise for using one LRU list for 2 different classes of
>> dentries.
> Thus demonstrating just enough knowledge to be dangerous.
>
> We already have 3 different classes of dentries on the LRU list:
>
> - in use dentries (because we use lazy removal to avoid
> lru list lock contention on cache hit lookups)
> - unused, referenced dentries
> - unused, unreferenced dentries.
>
> Each of these classes of dentries are treated differently by the
> shrinker, but the key point is that they are all aged the same way
> and so there's consistent maintenance of the working set under
> memory pressure. Putting negative dentries at the head of the list
> doesn't create a new "class" of object on the LRU, it just changes
> the ordering of the lru list. This will cause unpredictable
> behaviour because objects haven't had a chance to age gracefully
> before they are reclaimed.
>
> FYI, the inode cache has the same list_lru setup, object types and
> shrinker algorithm as the dentry cache, so this isn't a one-off.
> Indeed, the XFS buffer cache has a multi-reference hierarchy of 13
> different types of {unused, referenced} buffers in its list_lru to
> implement a quasi aging-NFU reclaim algorithm in its shrinker.
>
> i.e. the list_lru infrastructure has never provided or enforced a
> pure LRU algorithm. It is common infrastructure intended to provide
> a scalable, flexible and memcg-aware FIFO-like object tracking
> system that integrates tightly with memory reclaim to allow
> subsystems to implement cache reclaim algorithms that are optimal
> for that subsystem.
>
> IOWs, the list_lru doesn't define the reclaim algorithm the
> subsystem uses and there's no reason why we can't extend the
> infrastructure to support more complex algorithms without impacting
> existing subsystem reclaim algorithms at all. Only the subsystems
> that use the new infrastructure and algorithms would need careful
> examination. Of course, the overall system cache balancing
> behaviour under different and changing workloads would still need to
> be verified, but you have to do that for any cache reclaim algorithm
> change that is made....
>
>> The basic idea is that negative dentries that are used only
>> once will go first irrespective of their age.
> Using MRU for negative dentries, as I've previously explained, is a
> flawed concept. It might be expedient to solve your specific
> problem, but it's not a solution to the general problem of negative
> dentry management.
>
>>> Perhaps a second internal LRU list in the list_lru for "immediate
>>> reclaim" objects would be a better solution. i.e. similar to the
>>> active/inactive lists used for prioritising the working set over
>>> single-use pages in page reclaim. Negative dentries go onto the
>>> immediate list, real dentries go on the existing list. Both are LRU,
>>> and the shrinker operates on the immediate list first. When we
>>> rotate referenced negative dentries on the immediate list, promote
>>> them to the active list with all the real dentries so they age out
>>> with the rest of the working set. That way single use negative
>>> dentries get removed in LRU order in preference to the working set
>>> of real dentries.
>>>
>>> Being able to make changes to the list implementation easily was one
>>> of the reasons I hid the implementation of the list_lru from the
>>> interface callers use....
>>>
>>> [...]
>> I have thought about using 2 lists for dentries. That would require
>> much more extensive changes to the code, and hence much more testing
>> would be needed to verify correctness. That is the main reason I have
>> tried to avoid doing it.
> i.e. expediency.
>
> However, you're changing the behaviour of core caching and memory
> reclaim algorithms. The amount and level of testing and verification
> you need to do is the same regardless of whether it's a small change
> or a large change. Sure, you've shown that *one* artificial
> micro-benchmark improves, but what about everything else?
>
>> As you have suggested, we could implement this 2-level LRU list in the
>> list_lru API. But it is used by other subsystems as well. An extensive
>> change like that will have similar issues in terms of testing and
>> verification effort.
> I know what changing a reclaim algorithm involves, how difficult
> it is to balance the competing caches quickly and to the desired
> ratios for acceptable performance, how difficult it is to measure
> and control how the system reacts to transient and impulse memory
> pressure events, etc.
>
> I also know that "simple" and/or "obviously correct" subsystem
> changes can cause very unexpected system level effects, and that
> it's almost never what you think it is that caused the unexpected
> behaviour. IOWs, getting anything even slightly wrong in these
> algorithms will adversely affect system performance and balance
> significantly. Hence the bar is /always/ set high for core caching
> algorithm changes like this.
>
> Cheers,
>
> Dave.
Thanks for the comments. I will need more time to think about it.
Cheers,
Longman
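The two-list scheme Dave and Michal propose above might be sketched roughly
as follows. This is a hypothetical simplification, not kernel code: all
names (struct two_lru, shrink_one, the IMMEDIATE/ACTIVE states) are
illustrative, and real list_lru integration would be considerably more
involved.

```c
#include <assert.h>
#include <stddef.h>

enum which { IMMEDIATE, ACTIVE, RECLAIMED };

struct obj {
	int referenced;
	enum which state;
	struct obj *next;
};

struct two_lru {
	struct obj *immediate;	/* single-use candidates, scanned first */
	struct obj *active;	/* established working set */
};

static void push(struct obj **list, struct obj *o, enum which w)
{
	o->next = *list;
	*list = o;
	o->state = w;
}

/* Shrink one object, trying the immediate list first. A referenced
 * object found there is promoted to the active list instead of being
 * reclaimed, so heavily used negative dentries would age with the rest
 * of the working set rather than being evicted out of order. */
static void shrink_one(struct two_lru *l)
{
	struct obj *o;

	if (l->immediate) {
		o = l->immediate;
		l->immediate = o->next;
		if (o->referenced) {
			o->referenced = 0;
			push(&l->active, o, ACTIVE);	/* promote */
		} else {
			o->state = RECLAIMED;
		}
	} else if (l->active) {
		o = l->active;
		l->active = o->next;
		o->state = RECLAIMED;
	}
}
```

The key property is that both lists remain LRU-ordered internally, so
single-use negative entries go first while referenced ones earn a place in
the working set — addressing the unpredictability objection to plain
head-insertion.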
On Wed, Aug 29, 2018 at 10:11:53AM +1000, Dave Chinner wrote:
> > +++ b/Documentation/sysctl/fs.txt
> > @@ -63,19 +63,26 @@ struct {
> > int nr_unused;
> > int age_limit; /* age in seconds */
> > int want_pages; /* pages requested by system */
> > - int dummy[2];
> > + int nr_negative; /* # of unused negative dentries */
> > + int dummy;
> > } dentry_stat = {0, 0, 45, 0,};
>
> That's not a backwards compatible ABI change. Those dummy fields
> used to represent some metric we no longer calculate, and there are
> probably still monitoring apps out there that think they still have
> the old meaning. i.e. they are still visible to userspace:
I believe you are incorrect. dentry_stat was introduced in 2.1.60 with
this hunk:
+struct {
+ int nr_dentry;
+ int nr_unused;
+ int age_limit; /* age in seconds */
+ int want_pages; /* pages requested by system */
+ int dummy[2];
+} dentry_stat = {0, 0, 45, 0,};
+
Looking through the rest of the dentry_stat changes in the 2.1.60 release,
it's not replacing anything, it's adding new information.
On 08/31/2018 10:31 AM, Matthew Wilcox wrote:
> On Wed, Aug 29, 2018 at 10:11:53AM +1000, Dave Chinner wrote:
>>> +++ b/Documentation/sysctl/fs.txt
>>> @@ -63,19 +63,26 @@ struct {
>>> int nr_unused;
>>> int age_limit; /* age in seconds */
>>> int want_pages; /* pages requested by system */
>>> - int dummy[2];
>>> + int nr_negative; /* # of unused negative dentries */
>>> + int dummy;
>>> } dentry_stat = {0, 0, 45, 0,};
>> That's not a backwards compatible ABI change. Those dummy fields
>> used to represent some metric we no longer calculate, and there are
>> probably still monitoring apps out there that think they still have
>> the old meaning. i.e. they are still visible to userspace:
> I believe you are incorrect. dentry_stat was introduced in 2.1.60 with
> this hunk:
>
> +struct {
> + int nr_dentry;
> + int nr_unused;
> + int age_limit; /* age in seconds */
> + int want_pages; /* pages requested by system */
> + int dummy[2];
> +} dentry_stat = {0, 0, 45, 0,};
> +
>
> Looking through the rest of the dentry_stat changes in the 2.1.60 release,
> it's not replacing anything, it's adding new information.
Thanks for looking up the earlier non-git source tree. If that is the case,
the dummy[2] was there just for future extension purposes. It should be
perfectly fine to reuse one of the dummy entries for the negative dentry
count then, as no sane application would have checked the last 2 entries of
dentry-state and done dummy things if they are non-zero.
Cheers,
Longman