All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qi Zheng <qi.zheng@linux.dev>
To: akpm@linux-foundation.org, tkhai@ya.ru, roman.gushchin@linux.dev,
	vbabka@suse.cz, viro@zeniv.linux.org.uk, brauner@kernel.org,
	djwong@kernel.org, hughd@google.com, paulmck@kernel.org,
	muchun.song@linux.dev
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH 0/8] make unregistration of super_block shrinker more faster
Date: Wed, 31 May 2023 09:57:34 +0000	[thread overview]
Message-ID: <20230531095742.2480623-1-qi.zheng@linux.dev> (raw)

From: Qi Zheng <zhengqi.arch@bytedance.com>

Hi all,

This patch series aims to make unregistration of super_block shrinker more
faster.

1. Background
=============

The kernel test robot noticed a -88.8% regression of stress-ng.ramfs.ops_per_sec
on commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). More
details can be seen from the link[1] below.

[1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/

We can just use the following command to reproduce the result:

stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &

1) before commit f95bdb700bc6b:

stress-ng: info:  [11023] dispatching hogs: 9 ramfs
stress-ng: info:  [11023] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [11023]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [11023] ramfs            774966     60.00     10.18    169.45     12915.89        4314.26
stress-ng: info:  [11023] for a 60.00s run time:
stress-ng: info:  [11023]    1920.11s available CPU time
stress-ng: info:  [11023]      10.18s user time   (  0.53%)
stress-ng: info:  [11023]     169.44s system time (  8.82%)
stress-ng: info:  [11023]     179.62s total time  (  9.35%)
stress-ng: info:  [11023] load average: 8.99 2.69 0.93
stress-ng: info:  [11023] successful run completed in 60.00s (1 min, 0.00 secs)

2) after commit f95bdb700bc6b:

stress-ng: info:  [37676] dispatching hogs: 9 ramfs
stress-ng: info:  [37676] stressor       bogo ops real time  usrtime  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [37676]                           (secs)    (secs)   (secs)   (real time) (usr+sys time)
stress-ng: info:  [37676] ramfs            168673     60.00     1.61    39.66      2811.08        4087.47
stress-ng: info:  [37676] for a 60.10s run time:
stress-ng: info:  [37676]    1923.36s available CPU time
stress-ng: info:  [37676]       1.60s user time   (  0.08%)
stress-ng: info:  [37676]      39.66s system time (  2.06%)
stress-ng: info:  [37676]      41.26s total time  (  2.15%)
stress-ng: info:  [37676] load average: 7.69 3.63 2.36
stress-ng: info:  [37676] successful run completed in 60.10s (1 min, 0.10 secs)

The root cause is that SRCU has to be careful to not frequently check for srcu
read-side critical section exits. Paul E. McKenney gave a detailed explanation:

```
In practice, the act of checking to see if there is anyone in an SRCU
read-side critical section is a heavy-weight operation, involving at
least one cache miss per CPU along with a number of full memory barriers.
```

Therefore, even if no one is currently in the SRCU read-side critical section,
synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has
become slower.

2. Idea
=======

2.1 use synchronize_srcu_expedited() ?
--------------------------------------

The synchronize_srcu_expedited() will let SRCU to be much more aggressive.
If we use it to replace synchronize_srcu() in the unregister_shrinker(), the
ops/s will return to previous levels:

stress-ng: info:  [13159] dispatching hogs: 9 ramfs
stress-ng: info:  [13159] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [13159]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [13159] ramfs            710062     60.00      9.63    157.26     11834.18        4254.75
stress-ng: info:  [13159] for a 60.00s run time:
stress-ng: info:  [13159]    1920.14s available CPU time
stress-ng: info:  [13159]       9.62s user time   (  0.50%)
stress-ng: info:  [13159]     157.26s system time (  8.19%)
stress-ng: info:  [13159]     166.88s total time  (  8.69%)
stress-ng: info:  [13159] load average: 9.49 4.02 1.65
stress-ng: info:  [13159] successful run completed in 60.00s (1 min, 0.00 secs)

But because SRCU (Sleepable RCU) is used here, the reader is allowed to sleep in
the read-side critical section, so synchronize_srcu_expedited() may cause a lot
of CPU consumption, so this is not a good choice.

2.2 move synchronize_srcu() to the asynchronous delayed work
------------------------------------------------------------

Kirill Tkhai proposed a better idea[2] in 2018: move synchronize_srcu() to the
asynchronous delayed work, then it doesn't affect on user-visible unregistration
speed.

[2]. https://lore.kernel.org/lkml/153365636747.19074.12610817307548583381.stgit@localhost.localdomain/

After applying his patches ([PATCH RFC 04/10]~[PATCH RFC 10/10], with few
conflicts), the ops/s is of course back to the previous levels:

stress-ng: info:  [11506] setting to a 60 second run per stressor
stress-ng: info:  [11506] dispatching hogs: 9 ramfs
stress-ng: info:  [11506] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [11506]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [11506] ramfs            829462     60.00     10.81    174.25     13824.14        4482.08
stress-ng: info:  [11506] for a 60.00s run time:
stress-ng: info:  [11506]    1920.12s available CPU time
stress-ng: info:  [11506]      10.81s user time   (  0.56%)
stress-ng: info:  [11506]     174.25s system time (  9.07%)
stress-ng: info:  [11506]     185.06s total time  (  9.64%)
stress-ng: info:  [11506] load average: 8.96 2.60 0.89
stress-ng: info:  [11506] successful run completed in 60.00s (1 min, 0.00 secs)

In order to continue to advance this patch set, I rebase these patches onto the
next-20230525. Any comments and suggestions are welcome.

Note: This patch serise is only for super_block shrinker, all further
time-critical for unregistration places may be written in the same conception.

Thanks,
Qi

Kirill Tkhai (7):
  mm: vmscan: split unregister_shrinker()
  fs: move list_lru_destroy() to destroy_super_work()
  fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan()
  fs: introduce struct super_operations::destroy_super() callback
  xfs: introduce xfs_fs_destroy_super()
  shmem: implement shmem_destroy_super()
  fs: use unregister_shrinker_delayed_{initiate, finalize} for
    super_block shrinker

Qi Zheng (1):
  mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()

 fs/super.c               | 32 ++++++++++++++++++--------------
 fs/xfs/xfs_super.c       | 25 ++++++++++++++++++++++---
 include/linux/fs.h       |  6 ++++++
 include/linux/shrinker.h |  2 ++
 mm/shmem.c               |  8 ++++++++
 mm/vmscan.c              | 26 ++++++++++++++++++++------
 6 files changed, 76 insertions(+), 23 deletions(-)

-- 
2.30.2


             reply	other threads:[~2023-05-31  9:58 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-31  9:57 Qi Zheng [this message]
2023-05-31  9:57 ` [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu() Qi Zheng
2023-05-31 10:49   ` Christian Brauner
2023-05-31 12:52     ` Qi Zheng
2023-05-31  9:57 ` [PATCH 2/8] mm: vmscan: split unregister_shrinker() Qi Zheng
2023-05-31 22:57   ` Dave Chinner
2023-05-31  9:57 ` [PATCH 3/8] fs: move list_lru_destroy() to destroy_super_work() Qi Zheng
2023-05-31 23:00   ` Dave Chinner
2023-05-31  9:57 ` [PATCH 4/8] fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan() Qi Zheng
2023-05-31 23:12   ` Dave Chinner
2023-05-31  9:57 ` [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback Qi Zheng
2023-05-31 11:19   ` Christian Brauner
2023-05-31 12:54     ` Qi Zheng
2023-05-31  9:57 ` [PATCH 6/8] xfs: introduce xfs_fs_destroy_super() Qi Zheng
2023-05-31 23:48   ` Dave Chinner
2023-06-01  8:43     ` Qi Zheng
2023-06-01  9:58       ` Christian Brauner
2023-06-01 23:06       ` Dave Chinner
2023-06-02  3:13         ` Qi Zheng
2023-06-02 15:15           ` Darrick J. Wong
2023-06-05 11:50             ` Christian Brauner
2023-06-05 12:16               ` Qi Zheng
2023-05-31  9:57 ` [PATCH 7/8] shmem: implement shmem_destroy_super() Qi Zheng
2023-05-31  9:57 ` [PATCH 8/8] fs: use unregister_shrinker_delayed_{initiate, finalize} for super_block shrinker Qi Zheng
     [not found] ` <20230531114054.bf077db642aa9c58c0831687@linux-foundation.org>
2023-06-01  8:46   ` [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230531095742.2480623-1-qi.zheng@linux.dev \
    --to=qi.zheng@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=djwong@kernel.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=paulmck@kernel.org \
    --cc=roman.gushchin@linux.dev \
    --cc=tkhai@ya.ru \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.