* [PATCH RFC 0/6] fix the negative dentres bloating system memory usage
@ 2021-01-21 13:19 Gautham Ananthakrishna
From: Gautham Ananthakrishna @ 2021-01-21 13:19 UTC
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: viro, matthew.wilcox, khlebnikov, gautham.ananthakrishna

For most filesystems, the result of every negative lookup is cached, and
directory contents are usually cached too, so production of negative
dentries isn't limited by disk speed. It's really easy to generate millions
of them if the system has enough memory.

Getting this memory back isn't that easy, because the slab allocator frees a
page only when all objects on that page are gone, while the dcache shrinker
works in LRU order, which is unrelated to how objects are placed in pages.

A typical scenario is an idle system where some process periodically creates
temporary files and removes them. After some time, memory fills up with
negative dentries for these random file names.

Simple lookups of random names also generate negative dentries very fast.
A constant flow of such negative dentries drains all other inactive caches.
Too many negative dentries in the system can cause memory fragmentation
and trigger memory compaction.

Negative dentries are linked into the siblings list along with normal positive
dentries. Some operations walk the dcache tree but look only for positive
dentries; the most important one is fsnotify/inotify. Hordes of negative
dentries slow down these operations significantly.

Dentry lookup time is usually unaffected, because the hash table grows with
the size of memory, unless somebody deliberately crafts hash collisions.

This patch set solves all of these problems:

Move negative dentries to the end of the siblings list, so walkers can skip
them all at first sight (patches 1-4).

Keep at most three unreferenced negative dentries in a row in each dcache
hash bucket (patches 5-6).

We tested this patch set recently and found that it limits negative dentries
to a small part of total memory. We ran the test on two types of servers: one
with 256G of memory and 24 CPUs, and another with 3T of memory and 384 CPUs.
The test case uses many processes to generate negative dentries in parallel.
The results below were taken after 72 hours; the number of negative dentries
stays stable around these values even when the test runs much longer.
Without the patch set, negative dentries took 197G on the 256G system in less
than half an hour, and 2.4T on the 3T system within one day.

System memory   Negative dentries   Memory used by negative dentries
256G            55259084            10.6G
3T              202306756           38.8G

For performance testing, we ran the following workloads and found no
regression:

1. create 1M negative dentries, then touch them to convert them into positive
   dentries

2. create 10K/100K/1M files

3. remove 10K/100K/1M files

4. kernel compile

To verify the fsnotify fix, we used inotifywait to watch file create/open
events in a directory containing many negative dentries. Without the patch
set, the system ran into a soft lockup; with it, no soft lockup was found.

We also tried to defeat the limit by having different processes generate
negative dentries with the same names. Then a single negative dentry is
accessed several times at around the same time, so DCACHE_REFERENCED gets set
on it and it cannot be trimmed easily.

There have been many customer cases on this issue. It makes no sense to keep
so many negative dentries: they cause memory fragmentation and compaction
and do not help much.

Konstantin Khlebnikov (6):
  dcache: sweep cached negative dentries to the end of list of siblings
  fsnotify: stop walking child dentries if remaining tail is negative
  dcache: add action D_WALK_SKIP_SIBLINGS to d_walk()
  dcache: stop walking siblings if remaining dentries all negative
  dcache: push releasing dentry lock into sweep_negative
  dcache: prevent flooding with negative dentries

 fs/dcache.c            | 135 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/libfs.c             |   3 ++
 fs/notify/fsnotify.c   |   6 ++-
 include/linux/dcache.h |   6 +++
 4 files changed, 145 insertions(+), 5 deletions(-)

-- 
1.8.3.1


* Re: [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings
@ 2021-01-21 16:17 kernel test robot
From: kernel test robot @ 2021-01-21 16:17 UTC
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 13817 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <1611235185-1685-2-git-send-email-gautham.ananthakrishna@oracle.com>
References: <1611235185-1685-2-git-send-email-gautham.ananthakrishna@oracle.com>
TO: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>

Hi Gautham,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext3/fsnotify]
[cannot apply to linux/master linus/master v5.11-rc4 next-20210121]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Gautham-Ananthakrishna/fix-the-negative-dentres-bloating-system-memory-usage/20210121-212603
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
:::::: branch date: 3 hours ago
:::::: commit date: 3 hours ago
config: nios2-randconfig-s032-20210121 (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-208-g46a52ca4-dirty
        # https://github.com/0day-ci/linux/commit/65ea9583a5de81f48ddc324932035ce2a4f0f8db
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Gautham-Ananthakrishna/fix-the-negative-dentres-bloating-system-memory-usage/20210121-212603
        git checkout 65ea9583a5de81f48ddc324932035ce2a4f0f8db
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   fs/dcache.c:577:28: sparse: sparse: context imbalance in '__dentry_kill' - unexpected unlock
   fs/dcache.c:622:9: sparse: sparse: context imbalance in '__lock_parent' - wrong count at exit
>> fs/dcache.c:640:13: sparse: sparse: context imbalance in 'sweep_negative' - different lock contexts for basic block
   fs/dcache.c:674:20: sparse: sparse: context imbalance in 'recycle_negative' - unexpected unlock
   fs/dcache.c:767:20: sparse: sparse: context imbalance in 'dentry_kill' - different lock contexts for basic block
   fs/dcache.c:929:17: sparse: sparse: context imbalance in 'dput' - unexpected unlock
   fs/dcache.c:958:20: sparse: sparse: context imbalance in 'dput_to_list' - unexpected unlock
   fs/dcache.c:1097:18: sparse: sparse: context imbalance in 'd_prune_aliases' - different lock contexts for basic block
   fs/dcache.c:1127:13: sparse: sparse: context imbalance in 'shrink_lock_dentry' - different lock contexts for basic block
   fs/dcache.c:1170:27: sparse: sparse: context imbalance in 'shrink_dentry_list' - different lock contexts for basic block
   fs/dcache.c:1277:24: sparse: sparse: context imbalance in 'dentry_lru_isolate_shrink' - different lock contexts for basic block
   fs/dcache.c:1339:13: sparse: sparse: context imbalance in 'd_walk' - different lock contexts for basic block
   fs/dcache.c:1573:24: sparse: sparse: context imbalance in 'select_collect2' - different lock contexts for basic block
   fs/dcache.c:1632:44: sparse: sparse: context imbalance in 'shrink_dcache_parent' - unexpected unlock
   fs/dcache.c: note: in included file (through include/linux/dcache.h, include/linux/fs.h, include/linux/huge_mm.h, include/linux/mm.h):
   include/linux/rculist_bl.h:24:33: sparse: sparse: incompatible types in comparison expression (different address spaces):
   include/linux/rculist_bl.h:24:33: sparse:    struct hlist_bl_node [noderef] __rcu *
   include/linux/rculist_bl.h:24:33: sparse:    struct hlist_bl_node *
   include/linux/rculist_bl.h:24:33: sparse: sparse: incompatible types in comparison expression (different address spaces):
   include/linux/rculist_bl.h:24:33: sparse:    struct hlist_bl_node [noderef] __rcu *
   include/linux/rculist_bl.h:24:33: sparse:    struct hlist_bl_node *
   include/linux/rculist_bl.h:17:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   include/linux/rculist_bl.h:17:9: sparse:    struct hlist_bl_node [noderef] __rcu *
   include/linux/rculist_bl.h:17:9: sparse:    struct hlist_bl_node *
   include/linux/rculist_bl.h:17:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   include/linux/rculist_bl.h:17:9: sparse:    struct hlist_bl_node [noderef] __rcu *
   include/linux/rculist_bl.h:17:9: sparse:    struct hlist_bl_node *

vim +/sweep_negative +640 fs/dcache.c

ba65dc5ef16f82f Al Viro               2016-06-10  549  
e55fd011549eae0 Al Viro               2014-05-28  550  static void __dentry_kill(struct dentry *dentry)
77812a1ef139d84 Nick Piggin           2011-01-07  551  {
41edf278fc2f042 Al Viro               2014-05-01  552  	struct dentry *parent = NULL;
41edf278fc2f042 Al Viro               2014-05-01  553  	bool can_free = true;
41edf278fc2f042 Al Viro               2014-05-01  554  	if (!IS_ROOT(dentry))
77812a1ef139d84 Nick Piggin           2011-01-07  555  		parent = dentry->d_parent;
31e6b01f4183ff4 Nick Piggin           2011-01-07  556  
0d98439ea3c6ffb Linus Torvalds        2013-09-08  557  	/*
0d98439ea3c6ffb Linus Torvalds        2013-09-08  558  	 * The dentry is now unrecoverably dead to the world.
0d98439ea3c6ffb Linus Torvalds        2013-09-08  559  	 */
0d98439ea3c6ffb Linus Torvalds        2013-09-08  560  	lockref_mark_dead(&dentry->d_lockref);
0d98439ea3c6ffb Linus Torvalds        2013-09-08  561  
f0023bc617ba600 Sage Weil             2011-10-28  562  	/*
f0023bc617ba600 Sage Weil             2011-10-28  563  	 * inform the fs via d_prune that this dentry is about to be
f0023bc617ba600 Sage Weil             2011-10-28  564  	 * unhashed and destroyed.
f0023bc617ba600 Sage Weil             2011-10-28  565  	 */
2926620145095ff Al Viro               2014-05-30  566  	if (dentry->d_flags & DCACHE_OP_PRUNE)
61572bb1f40b9be Yan, Zheng            2013-04-15  567  		dentry->d_op->d_prune(dentry);
61572bb1f40b9be Yan, Zheng            2013-04-15  568  
01b6035190b0242 Al Viro               2014-04-29  569  	if (dentry->d_flags & DCACHE_LRU_LIST) {
01b6035190b0242 Al Viro               2014-04-29  570  		if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
01b6035190b0242 Al Viro               2014-04-29  571  			d_lru_del(dentry);
01b6035190b0242 Al Viro               2014-04-29  572  	}
77812a1ef139d84 Nick Piggin           2011-01-07  573  	/* if it was on the hash then remove it */
77812a1ef139d84 Nick Piggin           2011-01-07  574  	__d_drop(dentry);
ba65dc5ef16f82f Al Viro               2016-06-10  575  	dentry_unlist(dentry, parent);
03b3b889e79cdb6 Al Viro               2014-04-29  576  	if (parent)
03b3b889e79cdb6 Al Viro               2014-04-29 @577  		spin_unlock(&parent->d_lock);
550dce01dd606c8 Al Viro               2016-05-29  578  	if (dentry->d_inode)
550dce01dd606c8 Al Viro               2016-05-29  579  		dentry_unlink_inode(dentry);
550dce01dd606c8 Al Viro               2016-05-29  580  	else
550dce01dd606c8 Al Viro               2016-05-29  581  		spin_unlock(&dentry->d_lock);
03b3b889e79cdb6 Al Viro               2014-04-29  582  	this_cpu_dec(nr_dentry);
03b3b889e79cdb6 Al Viro               2014-04-29  583  	if (dentry->d_op && dentry->d_op->d_release)
03b3b889e79cdb6 Al Viro               2014-04-29  584  		dentry->d_op->d_release(dentry);
03b3b889e79cdb6 Al Viro               2014-04-29  585  
41edf278fc2f042 Al Viro               2014-05-01  586  	spin_lock(&dentry->d_lock);
41edf278fc2f042 Al Viro               2014-05-01  587  	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
41edf278fc2f042 Al Viro               2014-05-01  588  		dentry->d_flags |= DCACHE_MAY_FREE;
41edf278fc2f042 Al Viro               2014-05-01  589  		can_free = false;
41edf278fc2f042 Al Viro               2014-05-01  590  	}
41edf278fc2f042 Al Viro               2014-05-01  591  	spin_unlock(&dentry->d_lock);
41edf278fc2f042 Al Viro               2014-05-01  592  	if (likely(can_free))
b4f0354e968f5fa Al Viro               2014-04-29  593  		dentry_free(dentry);
9c5f1d30199d09f Al Viro               2018-04-15  594  	cond_resched();
e55fd011549eae0 Al Viro               2014-05-28  595  }
e55fd011549eae0 Al Viro               2014-05-28  596  
8b987a46a1e0e93 Al Viro               2018-02-23  597  static struct dentry *__lock_parent(struct dentry *dentry)
046b961b45f93a9 Al Viro               2014-05-29  598  {
8b987a46a1e0e93 Al Viro               2018-02-23  599  	struct dentry *parent;
046b961b45f93a9 Al Viro               2014-05-29  600  	rcu_read_lock();
c2338f2dc7c1e9f Al Viro               2014-06-12  601  	spin_unlock(&dentry->d_lock);
046b961b45f93a9 Al Viro               2014-05-29  602  again:
66702eb59064f10 Mark Rutland          2017-10-23  603  	parent = READ_ONCE(dentry->d_parent);
046b961b45f93a9 Al Viro               2014-05-29  604  	spin_lock(&parent->d_lock);
046b961b45f93a9 Al Viro               2014-05-29  605  	/*
046b961b45f93a9 Al Viro               2014-05-29  606  	 * We can't blindly lock dentry until we are sure
046b961b45f93a9 Al Viro               2014-05-29  607  	 * that we won't violate the locking order.
046b961b45f93a9 Al Viro               2014-05-29  608  	 * Any changes of dentry->d_parent must have
046b961b45f93a9 Al Viro               2014-05-29  609  	 * been done with parent->d_lock held, so
046b961b45f93a9 Al Viro               2014-05-29  610  	 * spin_lock() above is enough of a barrier
046b961b45f93a9 Al Viro               2014-05-29  611  	 * for checking if it's still our child.
046b961b45f93a9 Al Viro               2014-05-29  612  	 */
046b961b45f93a9 Al Viro               2014-05-29  613  	if (unlikely(parent != dentry->d_parent)) {
046b961b45f93a9 Al Viro               2014-05-29  614  		spin_unlock(&parent->d_lock);
046b961b45f93a9 Al Viro               2014-05-29  615  		goto again;
046b961b45f93a9 Al Viro               2014-05-29  616  	}
65d8eb5a8f54807 Al Viro               2018-02-23  617  	rcu_read_unlock();
65d8eb5a8f54807 Al Viro               2018-02-23  618  	if (parent != dentry)
9f12600fe425bc2 Linus Torvalds        2014-05-31  619  		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
65d8eb5a8f54807 Al Viro               2018-02-23  620  	else
046b961b45f93a9 Al Viro               2014-05-29  621  		parent = NULL;
046b961b45f93a9 Al Viro               2014-05-29  622  	return parent;
046b961b45f93a9 Al Viro               2014-05-29  623  }
046b961b45f93a9 Al Viro               2014-05-29  624  
8b987a46a1e0e93 Al Viro               2018-02-23  625  static inline struct dentry *lock_parent(struct dentry *dentry)
8b987a46a1e0e93 Al Viro               2018-02-23  626  {
8b987a46a1e0e93 Al Viro               2018-02-23  627  	struct dentry *parent = dentry->d_parent;
8b987a46a1e0e93 Al Viro               2018-02-23  628  	if (IS_ROOT(dentry))
8b987a46a1e0e93 Al Viro               2018-02-23  629  		return NULL;
8b987a46a1e0e93 Al Viro               2018-02-23  630  	if (likely(spin_trylock(&parent->d_lock)))
8b987a46a1e0e93 Al Viro               2018-02-23  631  		return parent;
8b987a46a1e0e93 Al Viro               2018-02-23  632  	return __lock_parent(dentry);
8b987a46a1e0e93 Al Viro               2018-02-23  633  }
8b987a46a1e0e93 Al Viro               2018-02-23  634  
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  635  /*
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  636   * Move cached negative dentry to the tail of parent->d_subdirs.
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  637   * This lets walkers skip them all together at first sight.
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  638   * Must be called at dput of negative dentry.
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  639   */
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21 @640  static void sweep_negative(struct dentry *dentry)
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  641  {
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  642  	struct dentry *parent;
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  643  
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  644  	if (!d_is_tail_negative(dentry)) {
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  645  		parent = lock_parent(dentry);
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  646  		if (!parent)
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  647  			return;
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  648  
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  649  		if (!d_count(dentry) && d_is_negative(dentry) &&
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  650  		    !d_is_tail_negative(dentry)) {
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  651  			dentry->d_flags |= DCACHE_TAIL_NEGATIVE;
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  652  			list_move_tail(&dentry->d_child, &parent->d_subdirs);
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  653  		}
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  654  
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  655  		spin_unlock(&parent->d_lock);
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  656  	}
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  657  }
65ea9583a5de81f Konstantin Khlebnikov 2021-01-21  658  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 21060 bytes --]

Thread overview: 15+ messages
2021-01-21 13:19 [PATCH RFC 0/6] fix the negative dentres bloating system memory usage Gautham Ananthakrishna
2021-01-21 13:19 ` [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings Gautham Ananthakrishna
2021-04-14  3:00   ` Al Viro
2021-04-15 16:50     ` Al Viro
2021-04-14  3:41   ` Al Viro
2021-04-15 16:25     ` Al Viro
2021-01-21 13:19 ` [PATCH RFC 2/6] fsnotify: stop walking child dentries if remaining tail is negative Gautham Ananthakrishna
2021-01-21 13:19 ` [PATCH RFC 3/6] dcache: add action D_WALK_SKIP_SIBLINGS to d_walk() Gautham Ananthakrishna
2021-01-21 13:19 ` [PATCH RFC 4/6] dcache: stop walking siblings if remaining dentries all negative Gautham Ananthakrishna
2021-01-21 13:19 ` [PATCH RFC 5/6] dcache: push releasing dentry lock into sweep_negative Gautham Ananthakrishna
2021-01-21 13:19 ` [PATCH RFC 6/6] dcache: prevent flooding with negative dentries Gautham Ananthakrishna
2021-04-14  3:56   ` Al Viro
2021-03-31 14:23 ` [PATCH RFC 0/6] fix the negative dentres bloating system memory usage Matthew Wilcox
2021-04-14  2:40 ` Al Viro
2021-01-21 16:17 [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings kernel test robot