linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/24] Swapin path refactor for optimization and bugfix
@ 2023-11-19 19:47 Kairui Song
  2023-11-19 19:47 ` [PATCH 01/24] mm/swap: fix a potential undefined behavior issue Kairui Song
                   ` (24 more replies)
  0 siblings, 25 replies; 93+ messages in thread
From: Kairui Song @ 2023-11-19 19:47 UTC
  To: linux-mm
  Cc: Andrew Morton, Huang, Ying, David Hildenbrand, Hugh Dickins,
	Johannes Weiner, Matthew Wilcox, Michal Hocko, linux-kernel,
	Kairui Song

From: Kairui Song <kasong@tencent.com>

This series tries to unify and clean up the swapin path, fixing a few
issues and adding some optimizations:

1. Memcg leak issue: when a process that previously swapped out some
   pages migrates to another cgroup and the original cgroup is dead, a
   swapoff will account the swapped in pages to the cgroup of the
   process doing the swapoff instead of the new cgroup. This easily
   allows the migrated process to use more memory than expected.

   This can be easily reproduced by:
   - Set up a swap device.
   - Create memory cgroup A, B and C.
   - Spawn process P1 in cgroup A and make it swap out some pages.
   - Move process P1 to memory cgroup B.
   - Destroy cgroup A.
   - Do a swapoff in cgroup C.
   - Swapped in pages are accounted to cgroup C.

   This series fixes it so that the swapped in pages are accounted to
   cgroup B.

2. When there are multiple swap devices configured, if one of these
   devices is not an SSD, VMA readahead will be globally disabled.

   This series will make the readahead policy check per swap entry.

3. This series also includes many refactors and optimizations:
   - Swap readahead policy check is unified for page-fault/swapoff/shmem,
     so swapin from ramdisk (e.g. ZRAM) will always bypass swapcache.
     Previously shmem and swapoff had different behavior on this.
   - Some micro optimizations (e.g. skipping duplicated xarray lookups)
     for the swapin path while doing the refactor.
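
To make items 1 to 3 concrete, here is a toy userspace model of the
behavior described above. It is only a sketch, not kernel code: every
struct, field and function name in it is hypothetical, and the
`synchronous_io` flag merely stands in for ramdisk-like devices such as
ZRAM. The real swapin path may apply further conditions that this model
ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all names are hypothetical, not kernel API. */

struct cgroup { const char *name; bool alive; };

struct swapped_page {
	struct cgroup *recorded;  /* cgroup charged at swap-out time */
	struct cgroup *owner_now; /* cgroup the owning process is in now */
};

/* Item 1, old behavior: fall back to whoever is running swapoff. */
struct cgroup *charge_target_old(const struct swapped_page *p,
				 struct cgroup *swapoff_caller)
{
	if (p->recorded && p->recorded->alive)
		return p->recorded;
	return swapoff_caller;
}

/* Item 1, intended behavior: fall back to the owner's current cgroup. */
struct cgroup *charge_target_new(const struct swapped_page *p)
{
	assert(p->owner_now != NULL);
	if (p->recorded && p->recorded->alive)
		return p->recorded;
	return p->owner_now;
}

struct swap_dev { bool solid_state; bool synchronous_io; };

enum swapin_policy {
	SWAPIN_NO_READAHEAD,      /* read one page, bypass the swap cache */
	SWAPIN_VMA_READAHEAD,
	SWAPIN_CLUSTER_READAHEAD,
};

/* Item 2, old: one non-SSD device disables VMA readahead everywhere. */
bool vma_readahead_old(const struct swap_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!devs[i].solid_state)
			return false;
	return true;
}

/* Items 2 and 3, new: decide from the device backing this entry only,
 * regardless of whether the caller is page fault, swapoff or shmem. */
enum swapin_policy swapin_policy_new(const struct swap_dev *dev)
{
	if (dev->synchronous_io)
		return SWAPIN_NO_READAHEAD;
	if (dev->solid_state)
		return SWAPIN_VMA_READAHEAD;
	return SWAPIN_CLUSTER_READAHEAD;
}
```

With the cgroups from the reproduction steps (A dead, P1 now in B,
swapoff run from C), charge_target_old() picks C while
charge_target_new() picks B. With one SSD and one rotational device,
vma_readahead_old() is false for both, while swapin_policy_new() still
selects VMA readahead for entries on the SSD.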

Some benchmarks:

1. fio test for shmem (within a 2G memcg limit and using lzo-rle ZRAM swap):
fio -name=tmpfs --numjobs=16 --directory=/tmpfs --size=256m --ioengine=mmap \
  --iodepth=128 --rw=randrw --random_distribution=<RANDOM> --time_based\
  --ramp_time=1m --runtime=1m --group_reporting

RANDOM=zipf:1.2 ZRAM
Before (R/W, bw): 7339MiB/s / 7341MiB/s
After  (R/W, bw): 7305MiB/s / 7308MiB/s (-0.5%)

RANDOM=zipf:0.5 ZRAM
Before (R/W, bw): 770MiB/s / 770MiB/s
After  (R/W, bw): 775MiB/s / 774MiB/s (+0.6%)

RANDOM=random ZRAM
Before (R/W, bw): 537MiB/s / 537MiB/s
After  (R/W, bw): 552MiB/s / 552MiB/s (+2.7%)

We can see readahead barely helps, and for random RW there is an observable performance gain.
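
The percentages quoted for the fio runs are relative bandwidth changes.
A minimal sketch of that arithmetic, using the read bandwidth figures
from the results above (the helper and struct names are illustrative
only); it should agree with the quoted numbers to within about 0.1
percentage point:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Read bandwidth in MiB/s, copied from the fio results quoted above. */
struct fio_result { const char *dist; double before, after; };

const struct fio_result fio_results[] = {
	{ "zipf:1.2", 7339, 7305 },
	{ "zipf:0.5",  770,  775 },
	{ "random",    537,  552 },
};

/* Print one line per distribution with the recomputed change. */
void print_fio_changes(void)
{
	size_t n = sizeof(fio_results) / sizeof(fio_results[0]);

	for (size_t i = 0; i < n; i++)
		printf("%-8s %+.2f%%\n", fio_results[i].dist,
		       percent_change(fio_results[i].before,
				      fio_results[i].after));
}
```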

2. A micro benchmark which uses madvise to swap out 10G of zero-filled
   data to ZRAM and then reads it back in shows a performance gain for
   the swapin path:

Before: 12480532 us
After:  12013318 us (+3.8%)
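
For readers who want to try something similar, a benchmark of this
shape could look roughly like the sketch below. It is not the program
used for the numbers above. It assumes MADV_PAGEOUT (Linux 5.4 and
later) is the madvise hint used to push pages out, and it times only
the read-back loop; both are assumptions, and time_swapin_us() is a
made-up helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper: fill `len` bytes of anonymous memory with zeros,
 * ask the kernel to reclaim it, then time reading one byte per page.
 * Returns the elapsed microseconds of the read-back, or -1 if mmap
 * fails. *paged_out is set to 1 if the MADV_PAGEOUT hint was accepted.
 */
long time_swapin_us(size_t len, int *paged_out)
{
	assert(len > 0 && paged_out != NULL);

	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0, len); /* write to every page so it is populated */
	*paged_out = madvise(buf, len, MADV_PAGEOUT) == 0;

	struct timespec t0, t1;
	volatile unsigned char sink = 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t off = 0; off < len; off += (size_t)page)
		sink += buf[off]; /* touch each page, faulting it back in */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, len);
	return (long)(t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000L;
}
```

The run above used 10G of data with ZRAM as swap. Without any swap
configured, anonymous pages have nowhere to go, so the loop would
mostly measure reads of resident memory rather than the swapin path.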

4. The final vmlinux is also a little bit smaller (gcc 8.5.0):
./scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 8/7 grow/shrink: 5/6 up/down: 5737/-5789 (-52)
Function                                     old     new   delta
unuse_vma                                      -    3204   +3204
swapin_page_fault                              -    1804   +1804
swapin_page_non_fault                          -     437    +437
swapin_no_readahead                            -     165    +165
swap_cache_get_folio                         291     326     +35
__pfx_unuse_vma                                -      16     +16
__pfx_swapin_page_non_fault                    -      16     +16
__pfx_swapin_page_fault                        -      16     +16
__pfx_swapin_no_readahead                      -      16     +16
read_swap_cache_async                        179     191     +12
swap_cluster_readahead                       912     921      +9
__read_swap_cache_async                      669     673      +4
zswap_writeback_entry                       1463    1466      +3
__do_sys_swapon                             4923    4920      -3
nr_rotate_swap                                 4       -      -4
__pfx_unuse_pte_range                         16       -     -16
__pfx_swapin_readahead                        16       -     -16
__pfx___swap_count                            16       -     -16
__x64_sys_swapoff                           1347    1312     -35
__ia32_sys_swapoff                          1346    1311     -35
__swap_count                                  72       -     -72
shmem_swapin_folio                          1697    1535    -162
do_swap_page                                2404    1942    -462
try_to_unuse                                1867     880    -987
swapin_readahead                            1377       -   -1377
unuse_pte_range                             2604       -   -2604
Total: Before=30085393, After=30085341, chg -0.00%

Kairui Song (24):
  mm/swap: fix a potential undefined behavior issue
  mm/swapfile.c: add back some comment
  mm/swap: move no readahead swapin code to a stand alone helper
  mm/swap: avoid setting page lock bit and doing extra unlock check
  mm/swap: move readahead policy checking into swapin_readahead
  swap: rework swapin_no_readahead arguments
  mm/swap: move swap_count to header to be shared
  mm/swap: check readahead policy per entry
  mm/swap: inline __swap_count
  mm/swap: remove nr_rotate_swap and related code
  mm/swap: also handle swapcache lookup in swapin_readahead
  mm/swap: simplify arguments for swap_cache_get_folio
  swap: simplify swap_cache_get_folio
  mm/swap: do shadow lookup as well when doing swap cache lookup
  mm/swap: avoid an duplicated swap cache lookup for SYNCHRONOUS_IO
    device
  mm/swap: reduce scope of get_swap_device in swapin path
  mm/swap: fix false error when swapoff race with swapin
  mm/swap: introduce a helper non fault swapin
  shmem, swap: refactor error check on OOM or race
  swap: simplify and make swap_find_cache static
  swap: make swapin_readahead result checking argument mandatory
  swap: make swap_cluster_readahead static
  swap: fix multiple swap leak when after cgroup migrate
  mm/swap: change swapin_readahead to swapin_page_fault

 include/linux/swap.h |   7 --
 mm/memory.c          | 109 +++++++--------------
 mm/shmem.c           |  55 ++++-------
 mm/swap.h            |  34 ++++---
 mm/swap_state.c      | 222 ++++++++++++++++++++++++++++++++-----------
 mm/swapfile.c        |  70 ++++++--------
 mm/zswap.c           |   2 +-
 7 files changed, 269 insertions(+), 230 deletions(-)

-- 
2.42.0



Thread overview: 93+ messages
2023-11-19 19:47 [PATCH 00/24] Swapin path refactor for optimization and bugfix Kairui Song
2023-11-19 19:47 ` [PATCH 01/24] mm/swap: fix a potential undefined behavior issue Kairui Song
2023-11-19 20:55   ` Matthew Wilcox
2023-11-20  3:35     ` Chris Li
2023-11-20 11:14       ` Kairui Song
2023-11-20 17:34         ` Chris Li
2023-11-19 19:47 ` [PATCH 02/24] mm/swapfile.c: add back some comment Kairui Song
2023-11-19 19:47 ` [PATCH 03/24] mm/swap: move no readahead swapin code to a stand alone helper Kairui Song
2023-11-19 21:00   ` Matthew Wilcox
2023-11-20 11:14     ` Kairui Song
2023-11-20 14:55   ` Dan Carpenter
2023-11-21  5:34   ` Chris Li
2023-11-22 17:33     ` Kairui Song
2023-11-19 19:47 ` [PATCH 04/24] mm/swap: avoid setting page lock bit and doing extra unlock check Kairui Song
2023-11-20  4:17   ` Chris Li
2023-11-20 11:15     ` Kairui Song
2023-11-20 17:44       ` Chris Li
2023-11-22 17:32         ` Kairui Song
2023-11-22 20:57           ` Chris Li
2023-11-24  8:14             ` Kairui Song
2023-11-24  8:37               ` Christopher Li
2023-11-19 19:47 ` [PATCH 05/24] mm/swap: move readahead policy checking into swapin_readahead Kairui Song
2023-11-21  6:15   ` Chris Li
2023-11-21  6:35     ` Kairui Song
2023-11-21  7:41       ` Chris Li
2023-11-21  8:32         ` Kairui Song
2023-11-21 15:24           ` Chris Li
2023-11-19 19:47 ` [PATCH 06/24] swap: rework swapin_no_readahead arguments Kairui Song
2023-11-20  0:20   ` kernel test robot
2023-11-21  6:44   ` Chris Li
2023-11-23 10:51     ` Kairui Song
2023-11-19 19:47 ` [PATCH 07/24] mm/swap: move swap_count to header to be shared Kairui Song
2023-11-21  6:51   ` Chris Li
2023-11-21  7:03     ` Kairui Song
2023-11-19 19:47 ` [PATCH 08/24] mm/swap: check readahead policy per entry Kairui Song
2023-11-20  6:04   ` Huang, Ying
2023-11-20 11:17     ` Kairui Song
2023-11-21  1:10       ` Huang, Ying
2023-11-21  5:20         ` Chris Li
2023-11-21  5:13       ` Chris Li
2023-11-21  7:54   ` Chris Li
2023-11-23 10:52     ` Kairui Song
2023-11-19 19:47 ` [PATCH 09/24] mm/swap: inline __swap_count Kairui Song
2023-11-20  7:41   ` Huang, Ying
2023-11-21  8:02     ` Chris Li
2023-11-19 19:47 ` [PATCH 10/24] mm/swap: remove nr_rotate_swap and related code Kairui Song
2023-11-21 15:45   ` Chris Li
2023-11-19 19:47 ` [PATCH 11/24] mm/swap: also handle swapcache lookup in swapin_readahead Kairui Song
2023-11-20  0:47   ` kernel test robot
2023-11-21 16:06   ` Chris Li
2023-11-24  8:42     ` Kairui Song
2023-11-24  9:10       ` Chris Li
2023-11-19 19:47 ` [PATCH 12/24] mm/swap: simplify arguments for swap_cache_get_folio Kairui Song
2023-11-21 16:36   ` Chris Li
2023-11-19 19:47 ` [PATCH 13/24] swap: simplify swap_cache_get_folio Kairui Song
2023-11-21 16:50   ` Chris Li
2023-11-19 19:47 ` [PATCH 14/24] mm/swap: do shadow lookup as well when doing swap cache lookup Kairui Song
2023-11-21 16:55   ` Chris Li
2023-11-19 19:47 ` [PATCH 15/24] mm/swap: avoid an duplicated swap cache lookup for SYNCHRONOUS_IO device Kairui Song
2023-11-21 17:15   ` Chris Li
2023-11-22 18:08     ` Kairui Song
2023-11-19 19:47 ` [PATCH 16/24] mm/swap: reduce scope of get_swap_device in swapin path Kairui Song
2023-11-19 21:12   ` Matthew Wilcox
2023-11-20 11:14     ` Kairui Song
2023-11-21 17:25   ` Chris Li
2023-11-22  0:36   ` Huang, Ying
2023-11-23 11:13     ` Kairui Song
2023-11-24  0:40       ` Huang, Ying
2023-11-19 19:47 ` [PATCH 17/24] mm/swap: fix false error when swapoff race with swapin Kairui Song
2023-11-19 19:47 ` [PATCH 18/24] mm/swap: introduce a helper non fault swapin Kairui Song
2023-11-20  1:07   ` kernel test robot
2023-11-22  4:40   ` Chris Li
2023-11-28 11:22     ` Kairui Song
2023-12-13  2:22       ` Chris Li
2023-11-19 19:47 ` [PATCH 19/24] shmem, swap: refactor error check on OOM or race Kairui Song
2023-11-20  7:04   ` Chris Li
2023-11-20 11:17     ` Kairui Song
2023-11-19 19:47 ` [PATCH 20/24] swap: simplify and make swap_find_cache static Kairui Song
2023-11-22  5:01   ` Chris Li
2023-11-19 19:47 ` [PATCH 21/24] swap: make swapin_readahead result checking argument mandatory Kairui Song
2023-11-22  5:15   ` Chris Li
2023-11-24  8:14     ` Kairui Song
2023-11-19 19:47 ` [PATCH 22/24] swap: make swap_cluster_readahead static Kairui Song
2023-11-22  5:20   ` Chris Li
2023-11-19 19:47 ` [PATCH 23/24] swap: fix multiple swap leak when after cgroup migrate Kairui Song
2023-11-20  7:35   ` Huang, Ying
2023-11-20 11:17     ` Kairui Song
2023-11-22  5:34       ` Chris Li
2023-11-19 19:47 ` [PATCH 24/24] mm/swap: change swapin_readahead to swapin_page_fault Kairui Song
2023-11-20 19:09 ` [PATCH 00/24] Swapin path refactor for optimization and bugfix Yosry Ahmed
2023-11-20 20:22   ` Chris Li
2023-11-22  6:46     ` Kairui Song
2023-11-22  6:43   ` Kairui Song
