From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Chris Li <chrisl@kernel.org>, Barry Song <v-songbaohua@oppo.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Neil Brown <neilb@suse.de>,
	Minchan Kim <minchan@kernel.org>, Hugh Dickins <hughd@google.com>,
	David Hildenbrand <david@redhat.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Kairui Song <kasong@tencent.com>
Subject: [PATCH 0/8] mm/swap: optimize swap cache search space
Date: Thu, 18 Apr 2024 00:08:34 +0800
Message-ID: <20240417160842.76665-1-ryncsn@gmail.com>

From: Kairui Song <kasong@tencent.com>

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention. This is like having a set of smaller swap files inside one
big swap file. But when doing a swap cache lookup or insert, we are
still using the offset within the whole large swap file. This is OK for
correctness, as the offset (key) is unique.

But Xarray is specially optimized for small indexes: it creates the
radix tree levels lazily, just enough to fit the largest key
stored in one Xarray. So we are wasting tree nodes unnecessarily.

For a 64M chunk it should take at most 3 levels to contain everything.
But we are using the offset from the whole swap file, so the offset (key)
value will be way beyond 64M, and so will the tree level.
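
To make the level arithmetic concrete, here is a rough userspace sketch
(not kernel code). It assumes a simplified model where each tree level
consumes chunk_shift bits of the largest key, and it ignores the Xarray
special case of a single entry stored at index 0. tree_levels() is a
hypothetical name used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (an assumption, not the kernel implementation):
 * every tree level covers chunk_shift bits of the key, so the number
 * of levels is driven by the largest index stored in the tree.
 */
static unsigned int tree_levels(uint64_t max_index, unsigned int chunk_shift)
{
	unsigned int levels = 1;

	/* A zero shift would never terminate; 64 or more is undefined. */
	assert(chunk_shift > 0 && chunk_shift < 64);

	/* Add one level for each extra group of chunk_shift bits. */
	while (max_index >> chunk_shift) {
		max_index >>= chunk_shift;
		levels++;
	}
	return levels;
}
```

With 4K pages and chunk_shift = 6, a 64M chunk has 2^14 pages, so
tree_levels(16383, 6) gives 3. The last page of a 128G swap file has
offset 2^25 - 1, and tree_levels((1 << 25) - 1, 6) gives 5 in this model.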

Optimize this by reducing the swap cache search space to a 64M scope.

Test with `time memhog 128G` inside an 8G memcg using 128G swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable. The
test result is similar but the improvement is smaller if SWP_SYNCHRONOUS_IO
is enabled, as the swap out path can never skip swap cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (+1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (+0.8% faster):
94834.91 qps

There is also a very slight drop in radix tree node slab usage:
Before: 303952K
After:  302224K

For this series:

There are multiple places that expect mixed types of pages (page cache or
swap cache), e.g. migration and huge memory split. There are four helpers
for that:

- page_index
- page_file_offset
- folio_index
- folio_file_pos

So this series first cleans up usage of page_index and
page_file_offset, then converts folio_index and folio_file_pos to be
compatible with separate offsets. It then introduces a new helper,
swap_cache_index, for swap internal usage, and replaces swp_offset with
swap_cache_index when retrieving a folio from the swap cache.
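
As a rough illustration of the separate offsets, the sketch below models
the split in userspace. It assumes the chunk-local key is simply the low
SWAP_ADDRESS_SPACE_SHIFT bits of the swap offset; the model_* names are
hypothetical and the real helper in patch 8/8 may differ:

```c
#include <assert.h>
#include <stdint.h>

/* 2^14 pages per swap address space, i.e. 64M with 4K pages. */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (UINT64_C(1) << SWAP_ADDRESS_SPACE_SHIFT)
#define SWAP_ADDRESS_SPACE_MASK  (SWAP_ADDRESS_SPACE_PAGES - 1)

/* Hypothetical model: which 64M chunk a swap file offset falls into. */
static uint64_t model_space_nr(uint64_t swap_offset)
{
	return swap_offset >> SWAP_ADDRESS_SPACE_SHIFT;
}

/* Hypothetical model of swap_cache_index: the key local to that chunk. */
static uint64_t model_swap_cache_index(uint64_t swap_offset)
{
	return swap_offset & SWAP_ADDRESS_SPACE_MASK;
}

/* Inverse mapping, to show that no information is lost by the split. */
static uint64_t model_swap_offset(uint64_t space_nr, uint64_t index)
{
	/* A chunk-local key must stay inside the chunk. */
	assert(index < SWAP_ADDRESS_SPACE_PAGES);
	return (space_nr << SWAP_ADDRESS_SPACE_SHIFT) | index;
}
```

In this model the key stored in each tree never exceeds 16383, no matter
how large the swap file is, while the pair (chunk number, local key)
still identifies the original offset uniquely.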

And ideally, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT from 14 to
12: the default Xarray chunk shift is 6, so we have 3 level trees instead
of 2 level trees just for 2 extra bits. But swap cache is based on the
address_space struct; with 4 times more metadata sparsely distributed
in memory it wastes more cachelines, and the performance gain from this
series is almost canceled. So as a first step, just have a cleaner
separation of offsets.
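
The "4 times more metadata" figure can be checked with a small sketch.
It assumes one address_space per chunk with the count rounded up;
nr_address_spaces() is a hypothetical name for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of per-chunk address spaces needed to
 * cover nr_pages swap pages when each space covers 2^space_shift pages.
 */
static uint64_t nr_address_spaces(uint64_t nr_pages, unsigned int space_shift)
{
	uint64_t pages_per_space;

	assert(space_shift < 64);
	pages_per_space = UINT64_C(1) << space_shift;

	/* Round up so that a partial last chunk still gets its own space. */
	return (nr_pages + pages_per_space - 1) >> space_shift;
}
```

For a 128G swap file with 4K pages (2^25 pages), this gives 2048 spaces
at shift 14 and 8192 spaces at shift 12.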

Patch 1/8 - 6/8: Clean up usage of page_index and page_file_offset
Patch 7/8: Convert folio_index and folio_file_pos to be compatible with
  separate offset.
Patch 8/8: Introduce swap_cache_index and use it when doing lookup in
  swap cache.

This series is part of an effort to reduce swap cache overhead, and ultimately
remove SWP_SYNCHRONOUS_IO and unify swap cache usage, as proposed before:
https://lore.kernel.org/lkml/20240326185032.72159-1-ryncsn@gmail.com/

Kairui Song (8):
  NFS: remove nfs_page_lengthg and usage of page_index
  nilfs2: drop usage of page_index
  f2fs: drop usage of page_index
  ceph: drop usage of page_index
  cifs: drop usage of page_file_offset
  mm/swap: get the swap file offset directly
  mm: drop page_index/page_file_offset and convert swap helpers to use
    folio
  mm/swap: reduce swap cache search space

 fs/ceph/dir.c           |  2 +-
 fs/ceph/inode.c         |  2 +-
 fs/f2fs/data.c          |  5 ++---
 fs/nfs/internal.h       | 19 -------------------
 fs/nilfs2/bmap.c        |  3 +--
 fs/smb/client/file.c    |  2 +-
 include/linux/mm.h      | 13 -------------
 include/linux/pagemap.h | 19 +++++++++----------
 mm/huge_memory.c        |  2 +-
 mm/memcontrol.c         |  2 +-
 mm/mincore.c            |  2 +-
 mm/page_io.c            |  6 +++---
 mm/shmem.c              |  2 +-
 mm/swap.h               | 12 ++++++++++++
 mm/swap_state.c         | 12 ++++++------
 mm/swapfile.c           | 17 +++++++++++------
 16 files changed, 51 insertions(+), 69 deletions(-)

-- 
2.44.0

