* [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
@ 2023-12-06  9:46 Chengming Zhou
  2023-12-06  9:46 ` [PATCH 1/7] mm/zswap: make sure each swapfile always have " Chengming Zhou
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

Hi everyone,

This patch series is based on the linux-next 20231205, which depends on
the "workload-specific and memory pressure-driven zswap writeback" series
from Nhat Pham.

When testing zswap performance using kernel build -j32 in a tmpfs
directory, I found the scalability of the zswap rb-tree is not good: the
tree is protected by a single spinlock, which causes heavy lock contention
when multiple tasks zswap_store/load concurrently.

So a simple solution is to split the single zswap rb-tree into multiple
rb-trees, each corresponding to SWAP_ADDRESS_SPACE_PAGES (64M). This idea
is from commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
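
To illustrate: each swap device gets an array of trees indexed by the high
bits of the swap offset. A minimal sketch equivalent to the swap_zswap_tree()
macro added in patch 2 (written as a function here only for illustration):

static struct zswap_tree *zswap_trees[MAX_SWAPFILES];

/* Pick the tree covering the 64M swap range this entry falls into. */
static struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
{
	return &zswap_trees[swp_type(swp)]
		[swp_offset(swp) >> SWAP_ADDRESS_SPACE_SHIFT];
}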

Although this method can't solve the spinlock contention completely, it
can mitigate much of that contention.

Another problem when testing zswap with our default zsmalloc backend is
that zswap_load() and zswap_writeback_entry() have to allocate a temporary
buffer to support the !zpool_can_sleep_mapped() case.

Optimize this by reusing the percpu crypto_acomp_ctx->dstmem, which is also
used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.
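
Roughly, the decompression paths become the following; this is only a
sketch of the change, see patch 3 for the actual diff:

	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	mutex_lock(acomp_ctx->mutex);	/* also serializes dstmem users */

	zpool = zswap_find_zpool(entry);
	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
	if (!zpool_can_sleep_mapped(zpool)) {
		/* reuse the percpu buffer instead of a kmalloc'ed temporary */
		memcpy(acomp_ctx->dstmem, src, entry->length);
		src = acomp_ctx->dstmem;
		zpool_unmap_handle(zpool, entry->handle);
	}

	/* ... decompress from src into the page ... */
	mutex_unlock(acomp_ctx->mutex);

	if (zpool_can_sleep_mapped(zpool))
		zpool_unmap_handle(zpool, entry->handle);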

Thanks for review and comment!

To: Andrew Morton <akpm@linux-foundation.org>
To: Seth Jennings <sjenning@redhat.com>
To: Dan Streetman <ddstreet@ieee.org>
To: Vitaly Wool <vitaly.wool@konsulko.com>
To: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
To: Yosry Ahmed <yosryahmed@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

---
Chengming Zhou (7):
      mm/zswap: make sure each swapfile always have zswap rb-tree
      mm/zswap: split zswap rb-tree
      mm/zswap: reuse dstmem when decompress
      mm/zswap: change dstmem size to one page
      mm/zswap: refactor out __zswap_load()
      mm/zswap: cleanup zswap_load()
      mm/zswap: cleanup zswap_reclaim_entry()

 include/linux/zswap.h |   4 +-
 mm/swapfile.c         |  10 ++-
 mm/zswap.c            | 233 +++++++++++++++++++++-----------------------------
 3 files changed, 106 insertions(+), 141 deletions(-)
---
base-commit: 0f5f12ac05f36f117e793656c3f560625e927f1b
change-id: 20231206-zswap-lock-optimize-06f45683b02b

Best regards,
-- 
Chengming Zhou <zhouchengming@bytedance.com>


* [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-08 15:17   ` kernel test robot
  2023-12-08 16:45   ` kernel test robot
  2023-12-06  9:46 ` [PATCH 2/7] mm/zswap: split " Chengming Zhou
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

Not all zswap interfaces can handle the absence of the zswap rb-tree;
actually only zswap_store() handles it for now.

To keep things simple, make sure each swapfile always has its zswap rb-tree
prepared before being enabled and used. The preparation is unlikely to fail
in practice; this patch just makes the error handling explicit.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/zswap.h |  4 ++--
 mm/swapfile.c         | 10 +++++++---
 mm/zswap.c            |  7 ++++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 08c240e16a01..7cccc02cb9e9 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -30,7 +30,7 @@ struct zswap_lruvec_state {
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
 void zswap_invalidate(int type, pgoff_t offset);
-void zswap_swapon(int type);
+int zswap_swapon(int type);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
@@ -50,7 +50,7 @@ static inline bool zswap_load(struct folio *folio)
 }
 
 static inline void zswap_invalidate(int type, pgoff_t offset) {}
-static inline void zswap_swapon(int type) {}
+static inline int zswap_swapon(int type) { return 0; }
 static inline void zswap_swapoff(int type) {}
 static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
 static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8be70912e298..939e7590feda 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2346,8 +2346,6 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
 				unsigned char *swap_map,
 				struct swap_cluster_info *cluster_info)
 {
-	zswap_swapon(p->type);
-
 	spin_lock(&swap_lock);
 	spin_lock(&p->lock);
 	setup_swap_info(p, prio, swap_map, cluster_info);
@@ -3165,6 +3163,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap_unlock_inode;
 
+	error = zswap_swapon(p->type);
+	if (error)
+		goto free_swap_address_space;
+
 	/*
 	 * Flush any pending IO and dirty mappings before we start using this
 	 * swap device.
@@ -3173,7 +3175,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	error = inode_drain_writes(inode);
 	if (error) {
 		inode->i_flags &= ~S_SWAPFILE;
-		goto free_swap_address_space;
+		goto free_swap_zswap;
 	}
 
 	mutex_lock(&swapon_mutex);
@@ -3197,6 +3199,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 
 	error = 0;
 	goto out;
+free_swap_zswap:
+	zswap_swapoff(p->type);
 free_swap_address_space:
 	exit_swap_address_space(p->type);
 bad_swap_unlock_inode:
diff --git a/mm/zswap.c b/mm/zswap.c
index 0f086ffd7b6a..5e2b8d5ee33b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1560,7 +1560,7 @@ bool zswap_store(struct folio *folio)
 	if (folio_test_large(folio))
 		return false;
 
-	if (!zswap_enabled || !tree)
+	if (!zswap_enabled)
 		return false;
 
 	/*
@@ -1850,19 +1850,20 @@ void zswap_invalidate(int type, pgoff_t offset)
 	spin_unlock(&tree->lock);
 }
 
-void zswap_swapon(int type)
+int zswap_swapon(int type)
 {
 	struct zswap_tree *tree;
 
 	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
 	if (!tree) {
 		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
-		return;
+		return -ENOMEM;
 	}
 
 	tree->rbroot = RB_ROOT;
 	spin_lock_init(&tree->lock);
 	zswap_trees[type] = tree;
+	return 0;
 }
 
 void zswap_swapoff(int type)

-- 
b4 0.10.1


* [PATCH 2/7] mm/zswap: split zswap rb-tree
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
  2023-12-06  9:46 ` [PATCH 1/7] mm/zswap: make sure each swapfile always have " Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-06  9:46 ` [PATCH 3/7] mm/zswap: reuse dstmem when decompress Chengming Zhou
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

Each swapfile has one rb-tree that maps swp_entry_t to zswap_entry and is
protected by a single spinlock, which can cause heavy lock contention if
multiple tasks zswap_store/load concurrently.

Address the scalability problem by splitting the zswap rb-tree into
multiple rb-trees, each corresponding to SWAP_ADDRESS_SPACE_PAGES (64M),
just as was done for the swap cache address_space splitting.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/zswap.h |  4 +--
 mm/swapfile.c         |  2 +-
 mm/zswap.c            | 69 ++++++++++++++++++++++++++++++++-------------------
 3 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 7cccc02cb9e9..d3a8bc300b70 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -30,7 +30,7 @@ struct zswap_lruvec_state {
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
 void zswap_invalidate(int type, pgoff_t offset);
-int zswap_swapon(int type);
+int zswap_swapon(int type, unsigned long nr_pages);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
@@ -50,7 +50,7 @@ static inline bool zswap_load(struct folio *folio)
 }
 
 static inline void zswap_invalidate(int type, pgoff_t offset) {}
-static inline int zswap_swapon(int type) { return 0; }
+static inline int zswap_swapon(int type, unsigned long nr_pages) { return 0; }
 static inline void zswap_swapoff(int type) {}
 static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
 static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 939e7590feda..da8367a3e076 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3163,7 +3163,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap_unlock_inode;
 
-	error = zswap_swapon(p->type);
+	error = zswap_swapon(p->type, maxpages);
 	if (error)
 		goto free_swap_address_space;
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 5e2b8d5ee33b..a6b4859a0164 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -234,6 +234,7 @@ struct zswap_tree {
 };
 
 static struct zswap_tree *zswap_trees[MAX_SWAPFILES];
+static unsigned int nr_zswap_trees[MAX_SWAPFILES];
 
 /* RCU-protected iteration */
 static LIST_HEAD(zswap_pools);
@@ -260,6 +261,10 @@ static bool zswap_has_pool;
 * helpers and fwd declarations
 **********************************/
 
+#define swap_zswap_tree(entry)					\
+	(&zswap_trees[swp_type(entry)][swp_offset(entry)	\
+		>> SWAP_ADDRESS_SPACE_SHIFT])
+
 #define zswap_pool_debug(msg, p)				\
 	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,		\
 		 zpool_get_type((p)->zpools[0]))
@@ -885,7 +890,7 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 	 * until the entry is verified to still be alive in the tree.
 	 */
 	swpoffset = swp_offset(entry->swpentry);
-	tree = zswap_trees[swp_type(entry->swpentry)];
+	tree = swap_zswap_tree(entry->swpentry);
 	list_lru_isolate(l, item);
 	/*
 	 * It's safe to drop the lock here because we return either
@@ -1535,10 +1540,9 @@ static void zswap_fill_page(void *ptr, unsigned long value)
 bool zswap_store(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
-	int type = swp_type(swp);
 	pgoff_t offset = swp_offset(swp);
 	struct page *page = &folio->page;
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry, *dupentry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
@@ -1610,7 +1614,7 @@ bool zswap_store(struct folio *folio)
 		src = kmap_local_page(page);
 		if (zswap_is_page_same_filled(src, &value)) {
 			kunmap_local(src);
-			entry->swpentry = swp_entry(type, offset);
+			entry->swpentry = swp;
 			entry->length = 0;
 			entry->value = value;
 			atomic_inc(&zswap_same_filled_pages);
@@ -1688,7 +1692,7 @@ bool zswap_store(struct folio *folio)
 	mutex_unlock(acomp_ctx->mutex);
 
 	/* populate entry */
-	entry->swpentry = swp_entry(type, offset);
+	entry->swpentry = swp;
 	entry->handle = handle;
 	entry->length = dlen;
 
@@ -1748,10 +1752,9 @@ bool zswap_store(struct folio *folio)
 bool zswap_load(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
-	int type = swp_type(swp);
 	pgoff_t offset = swp_offset(swp);
 	struct page *page = &folio->page;
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
@@ -1835,7 +1838,7 @@ bool zswap_load(struct folio *folio)
 
 void zswap_invalidate(int type, pgoff_t offset)
 {
-	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_tree *tree = swap_zswap_tree(swp_entry(type, offset));
 	struct zswap_entry *entry;
 
 	/* find */
@@ -1850,37 +1853,53 @@ void zswap_invalidate(int type, pgoff_t offset)
 	spin_unlock(&tree->lock);
 }
 
-int zswap_swapon(int type)
+int zswap_swapon(int type, unsigned long nr_pages)
 {
-	struct zswap_tree *tree;
+	struct zswap_tree *trees, *tree;
+	unsigned int nr, i;
 
-	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
-	if (!tree) {
+	nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
+	trees = kvcalloc(nr, sizeof(*tree), GFP_KERNEL);
+	if (!trees) {
 		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
 		return -ENOMEM;
 	}
 
-	tree->rbroot = RB_ROOT;
-	spin_lock_init(&tree->lock);
-	zswap_trees[type] = tree;
+	for (i = 0; i < nr; i++) {
+		tree = trees + i;
+		tree->rbroot = RB_ROOT;
+		spin_lock_init(&tree->lock);
+	}
+
+	nr_zswap_trees[type] = nr;
+	zswap_trees[type] = trees;
 	return 0;
 }
 
 void zswap_swapoff(int type)
 {
-	struct zswap_tree *tree = zswap_trees[type];
-	struct zswap_entry *entry, *n;
+	struct zswap_tree *trees = zswap_trees[type];
+	unsigned int i;
 
-	if (!tree)
+	if (!trees)
 		return;
 
-	/* walk the tree and free everything */
-	spin_lock(&tree->lock);
-	rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
-		zswap_free_entry(entry);
-	tree->rbroot = RB_ROOT;
-	spin_unlock(&tree->lock);
-	kfree(tree);
+	for (i = 0; i < nr_zswap_trees[type]; i++) {
+		struct zswap_tree *tree = trees + i;
+		struct zswap_entry *entry, *n;
+
+		/* walk the tree and free everything */
+		spin_lock(&tree->lock);
+		rbtree_postorder_for_each_entry_safe(entry, n,
+						     &tree->rbroot,
+						     rbnode)
+			zswap_free_entry(entry);
+		tree->rbroot = RB_ROOT;
+		spin_unlock(&tree->lock);
+	}
+
+	kvfree(trees);
+	nr_zswap_trees[type] = 0;
 	zswap_trees[type] = NULL;
 }
 

-- 
b4 0.10.1


* [PATCH 3/7] mm/zswap: reuse dstmem when decompress
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
  2023-12-06  9:46 ` [PATCH 1/7] mm/zswap: make sure each swapfile always have " Chengming Zhou
  2023-12-06  9:46 ` [PATCH 2/7] mm/zswap: split " Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-12 22:58   ` Nhat Pham
  2023-12-06  9:46 ` [PATCH 4/7] mm/zswap: change dstmem size to one page Chengming Zhou
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

In the !zpool_can_sleep_mapped() case, such as zsmalloc, we have to first
copy the entry->handle memory into a temporary buffer, which is allocated
with kmalloc on every call.

Instead, we can reuse the percpu crypto_acomp_ctx->dstmem to avoid the
allocation every time, since it is percpu and already protected by the
acomp_ctx mutex.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index a6b4859a0164..d93a7b58b5af 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1758,9 +1758,9 @@ bool zswap_load(struct folio *folio)
 	struct zswap_entry *entry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
-	u8 *src, *dst, *tmp;
+	unsigned int dlen = PAGE_SIZE;
+	u8 *src, *dst;
 	struct zpool *zpool;
-	unsigned int dlen;
 	bool ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
@@ -1782,27 +1782,18 @@ bool zswap_load(struct folio *folio)
 		goto stats;
 	}
 
-	zpool = zswap_find_zpool(entry);
-	if (!zpool_can_sleep_mapped(zpool)) {
-		tmp = kmalloc(entry->length, GFP_KERNEL);
-		if (!tmp) {
-			ret = false;
-			goto freeentry;
-		}
-	}
-
 	/* decompress */
-	dlen = PAGE_SIZE;
-	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
 
+	zpool = zswap_find_zpool(entry);
+	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
 	if (!zpool_can_sleep_mapped(zpool)) {
-		memcpy(tmp, src, entry->length);
-		src = tmp;
+		memcpy(acomp_ctx->dstmem, src, entry->length);
+		src = acomp_ctx->dstmem;
 		zpool_unmap_handle(zpool, entry->handle);
 	}
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
@@ -1813,15 +1804,13 @@ bool zswap_load(struct folio *folio)
 
 	if (zpool_can_sleep_mapped(zpool))
 		zpool_unmap_handle(zpool, entry->handle);
-	else
-		kfree(tmp);
 
 	ret = true;
 stats:
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
 		count_objcg_event(entry->objcg, ZSWPIN);
-freeentry:
+
 	spin_lock(&tree->lock);
 	if (ret && zswap_exclusive_loads_enabled) {
 		zswap_invalidate_entry(tree, entry);

-- 
b4 0.10.1


* [PATCH 4/7] mm/zswap: change dstmem size to one page
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
                   ` (2 preceding siblings ...)
  2023-12-06  9:46 ` [PATCH 3/7] mm/zswap: reuse dstmem when decompress Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-06 17:12   ` Nhat Pham
  2023-12-06  9:46 ` [PATCH 5/7] mm/zswap: refactor out __zswap_load() Chengming Zhou
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

Maybe I missed something, but the dstmem size of 2 * PAGE_SIZE is very
confusing, since we only need at most one page when compressing, and the
"dlen" passed to acomp_request_set_params() is also PAGE_SIZE.

So change it to one page and fix the comments.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index d93a7b58b5af..999671dcb469 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -699,7 +699,7 @@ static int zswap_dstmem_prepare(unsigned int cpu)
 	struct mutex *mutex;
 	u8 *dst;
 
-	dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
 	if (!dst)
 		return -ENOMEM;
 
@@ -1649,8 +1649,7 @@ bool zswap_store(struct folio *folio)
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
 
-	/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
-	sg_init_one(&output, dst, PAGE_SIZE * 2);
+	sg_init_one(&output, dst, PAGE_SIZE);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
 	/*
 	 * it maybe looks a little bit silly that we send an asynchronous request,

-- 
b4 0.10.1


* [PATCH 5/7] mm/zswap: refactor out __zswap_load()
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
                   ` (3 preceding siblings ...)
  2023-12-06  9:46 ` [PATCH 4/7] mm/zswap: change dstmem size to one page Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-12 23:13   ` Nhat Pham
  2023-12-06  9:46 ` [PATCH 6/7] mm/zswap: cleanup zswap_load() Chengming Zhou
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

zswap_load() and zswap_writeback_entry() share the same code that
decompresses the data from a zswap_entry into a page, so refactor the
common part out as __zswap_load(entry, page).

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 108 ++++++++++++++++++++++---------------------------------------
 1 file changed, 39 insertions(+), 69 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 999671dcb469..667b66a3911b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1380,6 +1380,42 @@ static int zswap_enabled_param_set(const char *val,
 	return ret;
 }
 
+static void __zswap_load(struct zswap_entry *entry, struct page *page)
+{
+	struct scatterlist input, output;
+	unsigned int dlen = PAGE_SIZE;
+	struct crypto_acomp_ctx *acomp_ctx;
+	struct zpool *zpool;
+	u8 *src;
+	int ret;
+
+	/* decompress */
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
+
+	zpool = zswap_find_zpool(entry);
+	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+	if (!zpool_can_sleep_mapped(zpool)) {
+		memcpy(acomp_ctx->dstmem, src, entry->length);
+		src = acomp_ctx->dstmem;
+		zpool_unmap_handle(zpool, entry->handle);
+	}
+
+	sg_init_one(&input, src, entry->length);
+	sg_init_table(&output, 1);
+	sg_set_page(&output, page, PAGE_SIZE, 0);
+	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
+	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
+	dlen = acomp_ctx->req->dlen;
+	mutex_unlock(acomp_ctx->mutex);
+
+	if (zpool_can_sleep_mapped(zpool))
+		zpool_unmap_handle(zpool, entry->handle);
+
+	BUG_ON(ret);
+	BUG_ON(dlen != PAGE_SIZE);
+}
+
 /*********************************
 * writeback code
 **********************************/
@@ -1401,23 +1437,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	swp_entry_t swpentry = entry->swpentry;
 	struct page *page;
 	struct mempolicy *mpol;
-	struct scatterlist input, output;
-	struct crypto_acomp_ctx *acomp_ctx;
-	struct zpool *pool = zswap_find_zpool(entry);
 	bool page_was_allocated;
-	u8 *src, *tmp = NULL;
-	unsigned int dlen;
 	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
 
-	if (!zpool_can_sleep_mapped(pool)) {
-		tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
-		if (!tmp)
-			return -ENOMEM;
-	}
-
 	/* try to allocate swap cache page */
 	mpol = get_task_policy(current);
 	page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
@@ -1450,33 +1475,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	}
 	spin_unlock(&tree->lock);
 
-	/* decompress */
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	dlen = PAGE_SIZE;
-
-	src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
-	if (!zpool_can_sleep_mapped(pool)) {
-		memcpy(tmp, src, entry->length);
-		src = tmp;
-		zpool_unmap_handle(pool, entry->handle);
-	}
-
-	mutex_lock(acomp_ctx->mutex);
-	sg_init_one(&input, src, entry->length);
-	sg_init_table(&output, 1);
-	sg_set_page(&output, page, PAGE_SIZE, 0);
-	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
-	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-	dlen = acomp_ctx->req->dlen;
-	mutex_unlock(acomp_ctx->mutex);
-
-	if (!zpool_can_sleep_mapped(pool))
-		kfree(tmp);
-	else
-		zpool_unmap_handle(pool, entry->handle);
-
-	BUG_ON(ret);
-	BUG_ON(dlen != PAGE_SIZE);
+	__zswap_load(entry, page);
 
 	/* page is up to date */
 	SetPageUptodate(page);
@@ -1496,9 +1495,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	return ret;
 
 fail:
-	if (!zpool_can_sleep_mapped(pool))
-		kfree(tmp);
-
 	/*
 	 * If we get here because the page is already in swapcache, a
 	 * load may be happening concurrently. It is safe and okay to
@@ -1755,11 +1751,7 @@ bool zswap_load(struct folio *folio)
 	struct page *page = &folio->page;
 	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
-	struct scatterlist input, output;
-	struct crypto_acomp_ctx *acomp_ctx;
-	unsigned int dlen = PAGE_SIZE;
-	u8 *src, *dst;
-	struct zpool *zpool;
+	u8 *dst;
 	bool ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
@@ -1781,29 +1773,7 @@ bool zswap_load(struct folio *folio)
 		goto stats;
 	}
 
-	/* decompress */
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
-
-	zpool = zswap_find_zpool(entry);
-	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
-	if (!zpool_can_sleep_mapped(zpool)) {
-		memcpy(acomp_ctx->dstmem, src, entry->length);
-		src = acomp_ctx->dstmem;
-		zpool_unmap_handle(zpool, entry->handle);
-	}
-
-	sg_init_one(&input, src, entry->length);
-	sg_init_table(&output, 1);
-	sg_set_page(&output, page, PAGE_SIZE, 0);
-	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
-	if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
-		WARN_ON(1);
-	mutex_unlock(acomp_ctx->mutex);
-
-	if (zpool_can_sleep_mapped(zpool))
-		zpool_unmap_handle(zpool, entry->handle);
-
+	__zswap_load(entry, page);
 	ret = true;
 stats:
 	count_vm_event(ZSWPIN);

-- 
b4 0.10.1


* [PATCH 6/7] mm/zswap: cleanup zswap_load()
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
                   ` (4 preceding siblings ...)
  2023-12-06  9:46 ` [PATCH 5/7] mm/zswap: refactor out __zswap_load() Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-06  9:46 ` [PATCH 7/7] mm/zswap: cleanup zswap_reclaim_entry() Chengming Zhou
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

After the common decompression part has moved into __zswap_load(), we can
clean up zswap_load() a little.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 667b66a3911b..50405811cd7b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1752,7 +1752,6 @@ bool zswap_load(struct folio *folio)
 	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
 	u8 *dst;
-	bool ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 
@@ -1769,19 +1768,16 @@ bool zswap_load(struct folio *folio)
 		dst = kmap_local_page(page);
 		zswap_fill_page(dst, entry->value);
 		kunmap_local(dst);
-		ret = true;
-		goto stats;
+	} else {
+		__zswap_load(entry, page);
 	}
 
-	__zswap_load(entry, page);
-	ret = true;
-stats:
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
 		count_objcg_event(entry->objcg, ZSWPIN);
 
 	spin_lock(&tree->lock);
-	if (ret && zswap_exclusive_loads_enabled) {
+	if (zswap_exclusive_loads_enabled) {
 		zswap_invalidate_entry(tree, entry);
 		folio_mark_dirty(folio);
 	} else if (entry->length) {
@@ -1791,7 +1787,7 @@ bool zswap_load(struct folio *folio)
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);
 
-	return ret;
+	return true;
 }
 
 void zswap_invalidate(int type, pgoff_t offset)

-- 
b4 0.10.1


* [PATCH 7/7] mm/zswap: cleanup zswap_reclaim_entry()
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
                   ` (5 preceding siblings ...)
  2023-12-06  9:46 ` [PATCH 6/7] mm/zswap: cleanup zswap_load() Chengming Zhou
@ 2023-12-06  9:46 ` Chengming Zhou
  2023-12-06 17:24 ` [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Nhat Pham
  2023-12-06 20:08 ` Nhat Pham
  8 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-06  9:46 UTC (permalink / raw)
  To: Vitaly Wool, Nhat Pham, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, Yosry Ahmed
  Cc: linux-mm, linux-kernel, Chengming Zhou

Likewise, after the common decompression part has moved into __zswap_load(),
we can clean up zswap_reclaim_entry() a little.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 50405811cd7b..d3fedda0d774 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1438,7 +1438,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct page *page;
 	struct mempolicy *mpol;
 	bool page_was_allocated;
-	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
@@ -1447,16 +1446,13 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	mpol = get_task_policy(current);
 	page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
 				NO_INTERLEAVE_INDEX, &page_was_allocated, true);
-	if (!page) {
-		ret = -ENOMEM;
-		goto fail;
-	}
+	if (!page)
+		return -ENOMEM;
 
 	/* Found an existing page, we raced with load/swapin */
 	if (!page_was_allocated) {
 		put_page(page);
-		ret = -EEXIST;
-		goto fail;
+		return -EEXIST;
 	}
 
 	/*
@@ -1470,8 +1466,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
 		spin_unlock(&tree->lock);
 		delete_from_swap_cache(page_folio(page));
-		ret = -ENOMEM;
-		goto fail;
+		return -ENOMEM;
 	}
 	spin_unlock(&tree->lock);
 
@@ -1492,15 +1487,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	__swap_writepage(page, &wbc);
 	put_page(page);
 
-	return ret;
-
-fail:
-	/*
-	 * If we get here because the page is already in swapcache, a
-	 * load may be happening concurrently. It is safe and okay to
-	 * not free the entry. It is also okay to return !0.
-	 */
-	return ret;
+	return 0;
 }
 
 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)

-- 
b4 0.10.1


* Re: [PATCH 4/7] mm/zswap: change dstmem size to one page
  2023-12-06  9:46 ` [PATCH 4/7] mm/zswap: change dstmem size to one page Chengming Zhou
@ 2023-12-06 17:12   ` Nhat Pham
  2023-12-07  2:59     ` Chengming Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Nhat Pham @ 2023-12-06 17:12 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> Maybe I missed something, but the dstmem size of 2 * PAGE_SIZE is
> very confusing, since we only need at most one page when compress,
> and the "dlen" is also PAGE_SIZE in acomp_request_set_params().
>
> So change it to one page, and fix the comments.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index d93a7b58b5af..999671dcb469 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -699,7 +699,7 @@ static int zswap_dstmem_prepare(unsigned int cpu)
>         struct mutex *mutex;
>         u8 *dst;
>
> -       dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
> +       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
>         if (!dst)
>                 return -ENOMEM;
>
> @@ -1649,8 +1649,7 @@ bool zswap_store(struct folio *folio)
>         sg_init_table(&input, 1);
>         sg_set_page(&input, page, PAGE_SIZE, 0);
>
> -       /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
> -       sg_init_one(&output, dst, PAGE_SIZE * 2);
> +       sg_init_one(&output, dst, PAGE_SIZE);

Hmm. This is very weird. It looks very intentional though, so perhaps
we should consult the maintainer or the original author of this logic
to double check this?
My best guess is for cases where the compression algorithm fails - i.e
the output (header + payload) is somehow bigger than the original
data. But not sure if this happens at all, and if the size > PAGE_SIZE
we don't wanna store the output in zswap anyway.

>         acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
>         /*
>          * it maybe looks a little bit silly that we send an asynchronous request,
>
> --
> b4 0.10.1


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
                   ` (6 preceding siblings ...)
  2023-12-06  9:46 ` [PATCH 7/7] mm/zswap: cleanup zswap_reclaim_entry() Chengming Zhou
@ 2023-12-06 17:24 ` Nhat Pham
  2023-12-06 20:41   ` Yosry Ahmed
  2023-12-06 20:08 ` Nhat Pham
  8 siblings, 1 reply; 30+ messages in thread
From: Nhat Pham @ 2023-12-06 17:24 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel, Chris Li

+ Chris Li

Chris, I vaguely remember from our last conversation that you have
some concurrent efforts to use xarray here right?

On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> Hi everyone,
>
> This patch series is based on the linux-next 20231205, which depends on
> the "workload-specific and memory pressure-driven zswap writeback" series
> from Nhat Pham.
>
> When testing the zswap performance by using kernel build -j32 in a tmpfs
> directory, I found the scalability of zswap rb-tree is not good, which
> is protected by the only spinlock. That would cause heavy lock contention
> if multiple tasks zswap_store/load concurrently.
>
> So a simple solution is to split the only one zswap rb-tree into multiple
> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>
> Although this method can't solve the spinlock contention completely, it
> can mitigate much of that contention.
>
> Another problem when testing the zswap using our default zsmalloc is that
> zswap_load() and zswap_writeback_entry() have to malloc a temporary memory
> to support !zpool_can_sleep_mapped().
>
> Optimize it by reusing the percpu crypto_acomp_ctx->dstmem, which is also
> used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.
>
> Thanks for review and comment!
>
> To: Andrew Morton <akpm@linux-foundation.org>
> To: Seth Jennings <sjenning@redhat.com>
> To: Dan Streetman <ddstreet@ieee.org>
> To: Vitaly Wool <vitaly.wool@konsulko.com>
> To: Nhat Pham <nphamcs@gmail.com>
> To: Johannes Weiner <hannes@cmpxchg.org>
> To: Yosry Ahmed <yosryahmed@google.com>
> To: Michal Hocko <mhocko@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> ---
> Chengming Zhou (7):
>       mm/zswap: make sure each swapfile always have zswap rb-tree
>       mm/zswap: split zswap rb-tree
>       mm/zswap: reuse dstmem when decompress
>       mm/zswap: change dstmem size to one page
>       mm/zswap: refactor out __zswap_load()
>       mm/zswap: cleanup zswap_load()
>       mm/zswap: cleanup zswap_reclaim_entry()
>
>  include/linux/zswap.h |   4 +-
>  mm/swapfile.c         |  10 ++-
>  mm/zswap.c            | 233 +++++++++++++++++++++-----------------------------
>  3 files changed, 106 insertions(+), 141 deletions(-)
> ---
> base-commit: 0f5f12ac05f36f117e793656c3f560625e927f1b
> change-id: 20231206-zswap-lock-optimize-06f45683b02b
>
> Best regards,
> --
> Chengming Zhou <zhouchengming@bytedance.com>


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
                   ` (7 preceding siblings ...)
  2023-12-06 17:24 ` [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Nhat Pham
@ 2023-12-06 20:08 ` Nhat Pham
  2023-12-07  3:13   ` Chengming Zhou
  8 siblings, 1 reply; 30+ messages in thread
From: Nhat Pham @ 2023-12-06 20:08 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
> When testing the zswap performance by using kernel build -j32 in a tmpfs
> directory, I found the scalability of zswap rb-tree is not good, which
> is protected by the only spinlock. That would cause heavy lock contention
> if multiple tasks zswap_store/load concurrently.
>
> So a simple solution is to split the only one zswap rb-tree into multiple
> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>
> Although this method can't solve the spinlock contention completely, it
> can mitigate much of that contention.

By how much? Do you have any stats to estimate the amount of
contention and the reduction by this patch?

I do think lock contention could be a problem here, and it will be
even worse with the zswap shrinker enabled (which introduces a
theoretically unbounded number of concurrent reclaimers hammering on
the zswap rbtree and its lock). I am generally a bit wary about
architectural change though, especially if it is just a bandaid. We
have tried to reduce the lock contention somewhere else (multiple
zpools), and as predicted it just shifts the contention point
elsewhere. Maybe we need a deeper architectural re-think.

Not an outright NACK of course - just food for thought.

>
> Another problem when testing the zswap using our default zsmalloc is that
> zswap_load() and zswap_writeback_entry() have to malloc a temporary memory
> to support !zpool_can_sleep_mapped().
>
> Optimize it by reusing the percpu crypto_acomp_ctx->dstmem, which is also
> used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.

It'd be nice to reduce the (temporary) memory allocation on these
paths, but would this introduce contention on the per-cpu dstmem and
the mutex that protects it, if there are too many concurrent
store/load/writeback requests?


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-06 17:24 ` [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Nhat Pham
@ 2023-12-06 20:41   ` Yosry Ahmed
  2023-12-07  0:43     ` Chris Li
  0 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2023-12-06 20:41 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Chengming Zhou, Vitaly Wool, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, linux-mm,
	linux-kernel, Chris Li

On Wed, Dec 6, 2023 at 9:24 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> + Chris Li
>
> Chris, I vaguely remember from our last conversation that you have
> some concurrent efforts to use xarray here right?

If I recall correctly, the xarray already reduces the lock contention
as lookups are lockless, but Chris knows more here. As you mentioned
in a different email, it would be nice to get some data so that we can
compare different solutions.

>
> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
> >
> > Hi everyone,
> >
> > This patch series is based on the linux-next 20231205, which depends on
> > the "workload-specific and memory pressure-driven zswap writeback" series
> > from Nhat Pham.
> >
> > When testing the zswap performance by using kernel build -j32 in a tmpfs
> > directory, I found the scalability of zswap rb-tree is not good, which
> > is protected by the only spinlock. That would cause heavy lock contention
> > if multiple tasks zswap_store/load concurrently.
> >
> > So a simple solution is to split the only one zswap rb-tree into multiple
> > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> >
> > Although this method can't solve the spinlock contention completely, it
> > can mitigate much of that contention.
> >
> > Another problem when testing the zswap using our default zsmalloc is that
> > zswap_load() and zswap_writeback_entry() have to malloc a temporary memory
> > to support !zpool_can_sleep_mapped().
> >
> > Optimize it by reusing the percpu crypto_acomp_ctx->dstmem, which is also
> > used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.
> >
> > Thanks for review and comment!
> >
> > To: Andrew Morton <akpm@linux-foundation.org>
> > To: Seth Jennings <sjenning@redhat.com>
> > To: Dan Streetman <ddstreet@ieee.org>
> > To: Vitaly Wool <vitaly.wool@konsulko.com>
> > To: Nhat Pham <nphamcs@gmail.com>
> > To: Johannes Weiner <hannes@cmpxchg.org>
> > To: Yosry Ahmed <yosryahmed@google.com>
> > To: Michal Hocko <mhocko@kernel.org>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >
> > ---
> > Chengming Zhou (7):
> >       mm/zswap: make sure each swapfile always have zswap rb-tree
> >       mm/zswap: split zswap rb-tree
> >       mm/zswap: reuse dstmem when decompress
> >       mm/zswap: change dstmem size to one page
> >       mm/zswap: refactor out __zswap_load()
> >       mm/zswap: cleanup zswap_load()
> >       mm/zswap: cleanup zswap_reclaim_entry()
> >
> >  include/linux/zswap.h |   4 +-
> >  mm/swapfile.c         |  10 ++-
> >  mm/zswap.c            | 233 +++++++++++++++++++++-----------------------------
> >  3 files changed, 106 insertions(+), 141 deletions(-)
> > ---
> > base-commit: 0f5f12ac05f36f117e793656c3f560625e927f1b
> > change-id: 20231206-zswap-lock-optimize-06f45683b02b
> >
> > Best regards,
> > --
> > Chengming Zhou <zhouchengming@bytedance.com>


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-06 20:41   ` Yosry Ahmed
@ 2023-12-07  0:43     ` Chris Li
  2023-12-07  3:25       ` Chengming Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Li @ 2023-12-07  0:43 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Nhat Pham, Chengming Zhou, Vitaly Wool, Johannes Weiner,
	Michal Hocko, Seth Jennings, Dan Streetman, Andrew Morton,
	linux-mm, linux-kernel, Matthew Wilcox

Hi Nhat and Yosry,

On Wed, Dec 6, 2023 at 12:42 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Dec 6, 2023 at 9:24 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > + Chris Li
> >
> > Chris, I vaguely remember from our last conversation that you have
> > some concurrent efforts to use xarray here right?

Yes, I do have the zswap xarray for older versions of the kernel. The
recent mm-unstable tree has a lot of zswap-related updates. Give me 2
days to refresh and post it. The zswap invalid entry and reference
count changes are causing a good portion of the code to be updated. That
is the price to pay for keeping out-of-tree patches. My fault for not
getting to it sooner.

>
> If I recall correctly, the xarray already reduces the lock contention
> as lookups are lockless, but Chris knows more here. As you mentioned

Yes. To be exact, the xarray can use a spin lock (same as the current
RB tree) or take the RCU read lock on the lookup path (depending on how
you call the xarray API). Not completely lockless, but the RCU read lock
should have less contention than a normal spinlock. +Matthew
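
For example, the read side with a plain xarray looks like this (generic
xarray API usage with hypothetical names, not the zswap patch itself):

	/* xa_load() takes and releases the RCU read lock internally,
	 * so concurrent lookups don't contend on a tree spinlock.
	 */
	entry = xa_load(&tree->xarray, offset);

	/* The write side still serializes: xa_store() takes and
	 * releases the internal xa_lock.
	 */
	old = xa_store(&tree->xarray, offset, new_entry, GFP_KERNEL);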

> in a different email, it would be nice to get some data so that we can
> compare different solutions.

Yes, it is certainly welcome to see more data points. If I recall
correctly, the zswap xarray makes the lookup similar to the swap
cache lookup. It has a noticeable difference in the long tail end.

Stay tuned.

Chris


* Re: [PATCH 4/7] mm/zswap: change dstmem size to one page
  2023-12-06 17:12   ` Nhat Pham
@ 2023-12-07  2:59     ` Chengming Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-07  2:59 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On 2023/12/7 01:12, Nhat Pham wrote:
> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> Maybe I missed something, but the dstmem size of 2 * PAGE_SIZE is
>> very confusing, since we only need at most one page when compress,
>> and the "dlen" is also PAGE_SIZE in acomp_request_set_params().
>>
>> So change it to one page, and fix the comments.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  mm/zswap.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index d93a7b58b5af..999671dcb469 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -699,7 +699,7 @@ static int zswap_dstmem_prepare(unsigned int cpu)
>>         struct mutex *mutex;
>>         u8 *dst;
>>
>> -       dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
>> +       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
>>         if (!dst)
>>                 return -ENOMEM;
>>
>> @@ -1649,8 +1649,7 @@ bool zswap_store(struct folio *folio)
>>         sg_init_table(&input, 1);
>>         sg_set_page(&input, page, PAGE_SIZE, 0);
>>
>> -       /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
>> -       sg_init_one(&output, dst, PAGE_SIZE * 2);
>> +       sg_init_one(&output, dst, PAGE_SIZE);
> 
> Hmm. This is very weird. It looks very intentional though, so perhaps
> we should consult the maintainer or the original author of this logic
> to double check this?

Yes, it's also weird to me. But it seems to have been 2 pages ever since
the zswap code was merged into mm 10 years ago. Hope the maintainer or
author can shed some light on it. :)

> My best guess is for cases where the compression algorithm fails - i.e
> the output (header + payload) is somehow bigger than the original
> data. But not sure if this happens at all, and if the size > PAGE_SIZE
> we don't wanna store the output in zswap anyway.
> 

Agree, so I think one page is enough.

>>         acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);

And here we just use PAGE_SIZE in the parameter actually.

Thanks!

>>         /*
>>          * it maybe looks a little bit silly that we send an asynchronous request,
>>
>> --
>> b4 0.10.1


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-06 20:08 ` Nhat Pham
@ 2023-12-07  3:13   ` Chengming Zhou
  2023-12-07 15:18     ` Chengming Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-12-07  3:13 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On 2023/12/7 04:08, Nhat Pham wrote:
> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>> When testing the zswap performance by using kernel build -j32 in a tmpfs
>> directory, I found the scalability of zswap rb-tree is not good, which
>> is protected by the only spinlock. That would cause heavy lock contention
>> if multiple tasks zswap_store/load concurrently.
>>
>> So a simple solution is to split the only one zswap rb-tree into multiple
>> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
>> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>>
>> Although this method can't solve the spinlock contention completely, it
>> can mitigate much of that contention.
> 
> By how much? Do you have any stats to estimate the amount of
> contention and the reduction by this patch?

Actually, I did some test using the linux-next 20231205 yesterday.

Testcase: memory.max = 2G, zswap enabled, make -j32 in tmpfs.

			20231205	+patchset
1. !shrinker_enabled:   156s		126s
2.  shrinker_enabled:   79s		70s

I think your zswap shrinker fix patch can solve !shrinker_enabled case.

So will test again today using the new mm-unstable branch.

> 
> I do think lock contention could be a problem here, and it will be
> even worse with the zswap shrinker enabled (which introduces a
> theoretically unbounded number of concurrent reclaimers hammering on
> the zswap rbtree and its lock). I am generally a bit wary about
> architectural change though, especially if it is just a bandaid. We
> have tried to reduce the lock contention somewhere else (multiple
> zpools), and as predicted it just shifts the contention point
> elsewhere. Maybe we need a deeper architectural re-think.
> 
> Not an outright NACK of course - just food for thought.
> 

Right, I think the xarray is good for the lockless read side, and
multiple trees are complementary, since they can also reduce the lock
contention on the write side.

>>
>> Another problem when testing the zswap using our default zsmalloc is that
>> zswap_load() and zswap_writeback_entry() have to malloc a temporary memory
>> to support !zpool_can_sleep_mapped().
>>
>> Optimize it by reusing the percpu crypto_acomp_ctx->dstmem, which is also
>> used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.
> 
> It'd be nice to reduce the (temporary) memory allocation on these
> paths, but would this introduce contention on the per-cpu dstmem and
> the mutex that protects it, if there are too many concurrent
> store/load/writeback requests?

I think the mutex holding time is not changed, right? So the contention
on the per-cpu mutex should be the same. We just reuse percpu dstmem more.

Thanks!


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-07  0:43     ` Chris Li
@ 2023-12-07  3:25       ` Chengming Zhou
  2023-12-12 23:26         ` Yosry Ahmed
  0 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-12-07  3:25 UTC (permalink / raw)
  To: Chris Li, Yosry Ahmed, Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, linux-mm, linux-kernel,
	Matthew Wilcox

On 2023/12/7 08:43, Chris Li wrote:
> Hi Nhat and Yosry,
> 
> On Wed, Dec 6, 2023 at 12:42 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>
>> On Wed, Dec 6, 2023 at 9:24 AM Nhat Pham <nphamcs@gmail.com> wrote:
>>>
>>> + Chris Li
>>>
>>> Chris, I vaguely remember from our last conversation that you have
>>> some concurrent efforts to use xarray here right?
> 
> Yes, I do have the zswap xarray for older versions of the kernel. The
> recent mm-unstable tree has a lot of zswap-related updates. Give me 2
> days to refresh and post it. The zswap invalid entry and reference
> count changes are causing a good portion of the code to be updated. That
> is the price to pay for keeping out-of-tree patches. My fault for not
> getting to it sooner.
> 
>>
>> If I recall correctly, the xarray already reduces the lock contention
>> as lookups are lockless, but Chris knows more here. As you mentioned
> 
> Yes. To be exact, the xarray can use a spin lock (same as the current
> RB tree) or take the RCU read lock on the lookup path (depending on how
> you call the xarray API). Not completely lockless, but the RCU read lock
> should have less contention than a normal spinlock. +Matthew
> 

Great! Lockless lookup in zswap_load() should reduce spinlock contention.
And multiple trees (multiple xarrays) can further reduce the contention
on the concurrent zswap_store() side. So it's complementary IMHO.

>> in a different email, it would be nice to get some data so that we can
>> compare different solutions.
> 
> Yes, it is certainly welcome to see more data points. If I recall
> correctly, the zswap xarray makes the lookup similar to the swap
> cache lookup. It has a noticeable difference in the long tail end.
> 

Right, I posted some data from yesterday in another reply.
I will test again and update the data since Nhat's zswap shrinker fix patch
has been merged into mm-unstable today.

Thanks!


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-07  3:13   ` Chengming Zhou
@ 2023-12-07 15:18     ` Chengming Zhou
  2023-12-07 18:15       ` Nhat Pham
  0 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-12-07 15:18 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On 2023/12/7 11:13, Chengming Zhou wrote:
> On 2023/12/7 04:08, Nhat Pham wrote:
>> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
>> <zhouchengming@bytedance.com> wrote:
>>> When testing the zswap performance by using kernel build -j32 in a tmpfs
>>> directory, I found the scalability of zswap rb-tree is not good, which
>>> is protected by the only spinlock. That would cause heavy lock contention
>>> if multiple tasks zswap_store/load concurrently.
>>>
>>> So a simple solution is to split the only one zswap rb-tree into multiple
>>> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
>>> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>>>
>>> Although this method can't solve the spinlock contention completely, it
>>> can mitigate much of that contention.
>>
>> By how much? Do you have any stats to estimate the amount of
>> contention and the reduction by this patch?
> 
> Actually, I did some test using the linux-next 20231205 yesterday.
> 
> Testcase: memory.max = 2G, zswap enabled, make -j32 in tmpfs.
> 
> 			20231205	+patchset
> 1. !shrinker_enabled:   156s		126s
> 2.  shrinker_enabled:   79s		70s
> 
> I think your zswap shrinker fix patch can solve !shrinker_enabled case.
> 
> So will test again today using the new mm-unstable branch.
> 

Updated test data based on today's mm-unstable branch:

			mm-unstable	+patchset
1. !shrinker_enabled:	86s		74s
2.  shrinker_enabled:	63s		61s

Shows much less optimization for the shrinker_enabled case, but still
much optimization for the !shrinker_enabled case.

Thanks!


* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-07 15:18     ` Chengming Zhou
@ 2023-12-07 18:15       ` Nhat Pham
  2023-12-07 18:57         ` Nhat Pham
  2023-12-08 15:41         ` Chengming Zhou
  0 siblings, 2 replies; 30+ messages in thread
From: Nhat Pham @ 2023-12-07 18:15 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On Thu, Dec 7, 2023 at 7:18 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> On 2023/12/7 11:13, Chengming Zhou wrote:
> > On 2023/12/7 04:08, Nhat Pham wrote:
> >> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> >> <zhouchengming@bytedance.com> wrote:
> >>> When testing the zswap performance by using kernel build -j32 in a tmpfs
> >>> directory, I found the scalability of zswap rb-tree is not good, which
> >>> is protected by the only spinlock. That would cause heavy lock contention
> >>> if multiple tasks zswap_store/load concurrently.
> >>>
> >>> So a simple solution is to split the only one zswap rb-tree into multiple
> >>> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> >>> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> >>>
> >>> Although this method can't solve the spinlock contention completely, it
> >>> can mitigate much of that contention.
> >>
> >> By how much? Do you have any stats to estimate the amount of
> >> contention and the reduction by this patch?
> >
> > Actually, I did some test using the linux-next 20231205 yesterday.
> >
> > Testcase: memory.max = 2G, zswap enabled, make -j32 in tmpfs.
> >
> >                       20231205        +patchset
> > 1. !shrinker_enabled:   156s          126s
> > 2.  shrinker_enabled:   79s           70s
> >
> > I think your zswap shrinker fix patch can solve !shrinker_enabled case.
> >
> > So will test again today using the new mm-unstable branch.
> >
>
> Updated test data based on today's mm-unstable branch:
>
>                         mm-unstable     +patchset
> 1. !shrinker_enabled:   86s             74s
> 2.  shrinker_enabled:   63s             61s
>
> This shows a much smaller improvement in the shrinker_enabled case, but
> still a significant improvement in the !shrinker_enabled case.
>
> Thanks!

I'm gonna assume this is build time since it makes the zswap shrinker
look pretty good :)
I think this just means some of the gains between this patchset and
the zswap shrinker overlaps. But on the positive note:

a) Both are complementary, i.e enable both (bottom right corner) gives
us the best result.
b) Each individual change improves the runtime. If you disable the
shrinker, then this patch helps tremendously, so we're onto something.
c) The !shrinker_enabled is no longer *too* bad - once again, thanks
for noticing the regression and help me fix it! In fact, every cell
improves compared to the last run. Woohoo!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-07 18:15       ` Nhat Pham
@ 2023-12-07 18:57         ` Nhat Pham
  2023-12-08 15:41         ` Chengming Zhou
  1 sibling, 0 replies; 30+ messages in thread
From: Nhat Pham @ 2023-12-07 18:57 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On Thu, Dec 7, 2023 at 10:15 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Thu, Dec 7, 2023 at 7:18 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
> >
> > Updated test data based on today's mm-unstable branch:
> >
> >                         mm-unstable     +patchset
> > 1. !shrinker_enabled:   86s             74s
> > 2.  shrinker_enabled:   63s             61s
> >
> > This shows a much smaller improvement in the shrinker_enabled case, but
> > still a significant improvement in the !shrinker_enabled case.
> >
> > Thanks!
>
> I'm gonna assume this is build time since it makes the zswap shrinker
> look pretty good :)
> I think this just means some of the gains between this patchset and
> the zswap shrinker overlaps. But on the positive note:
>
> a) Both are complementary, i.e enable both (bottom right corner) gives
> us the best result.
> b) Each individual change improves the runtime. If you disable the
> shrinker, then this patch helps tremendously, so we're onto something.
> c) The !shrinker_enabled is no longer *too* bad - once again, thanks
> for noticing the regression and help me fix it! In fact, every cell
> improves compared to the last run. Woohoo!

Oh and, another thing that might be helpful for observing the reduction
in lock contention (and comparing approaches if necessary) is the analysis
that Yosry performed for the multiple zpools change:
https://lore.kernel.org/lkml/20230620194644.3142384-1-yosryahmed@google.com/

We could look at the various paths that use the rb-tree and see how
long we're spinning on the lock(s), etc.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
  2023-12-06  9:46 ` [PATCH 1/7] mm/zswap: make sure each swapfile always have " Chengming Zhou
@ 2023-12-08 15:17   ` kernel test robot
  2023-12-08 15:45     ` Chengming Zhou
  2023-12-08 16:45   ` kernel test robot
  1 sibling, 1 reply; 30+ messages in thread
From: kernel test robot @ 2023-12-08 15:17 UTC (permalink / raw)
  To: Chengming Zhou, Vitaly Wool, Nhat Pham, Johannes Weiner,
	Michal Hocko, Seth Jennings, Dan Streetman, Andrew Morton,
	Yosry Ahmed
  Cc: llvm, oe-kbuild-all, Linux Memory Management List, linux-kernel,
	Chengming Zhou

Hi Chengming,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 0f5f12ac05f36f117e793656c3f560625e927f1b]

url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/mm-zswap-make-sure-each-swapfile-always-have-zswap-rb-tree/20231206-174717
base:   0f5f12ac05f36f117e793656c3f560625e927f1b
patch link:    https://lore.kernel.org/r/20231206-zswap-lock-optimize-v1-1-e25b059f9c3a%40bytedance.com
patch subject: [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
config: i386-buildonly-randconfig-002-20231208 (https://download.01.org/0day-ci/archive/20231208/202312082309.xvC0Rdd9-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231208/202312082309.xvC0Rdd9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312082309.xvC0Rdd9-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/x86/kernel/asm-offsets.c:9:
   In file included from include/linux/crypto.h:17:
   In file included from include/linux/slab.h:16:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:25:
>> include/linux/zswap.h:53:43: warning: non-void function does not return a value [-Wreturn-type]
   static inline int zswap_swapon(int type) {}
                                             ^
   1 warning generated.
--
   In file included from arch/x86/mm/pat/set_memory.c:6:
   In file included from include/linux/highmem.h:5:
   In file included from include/linux/fs.h:13:
   In file included from include/linux/list_lru.h:14:
   In file included from include/linux/xarray.h:15:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:25:
>> include/linux/zswap.h:53:43: warning: non-void function does not return a value [-Wreturn-type]
   static inline int zswap_swapon(int type) {}
                                             ^
   arch/x86/mm/pat/set_memory.c:217:1: warning: unused function 'within_inclusive' [-Wunused-function]
   within_inclusive(unsigned long addr, unsigned long start, unsigned long end)
   ^
   2 warnings generated.
--
   In file included from arch/x86/kernel/asm-offsets.c:9:
   In file included from include/linux/crypto.h:17:
   In file included from include/linux/slab.h:16:
   In file included from include/linux/gfp.h:7:
   In file included from include/linux/mmzone.h:25:
>> include/linux/zswap.h:53:43: warning: non-void function does not return a value [-Wreturn-type]
   static inline int zswap_swapon(int type) {}
                                             ^
   1 warning generated.


vim +53 include/linux/zswap.h

    51	
    52	static inline void zswap_invalidate(int type, pgoff_t offset) {}
  > 53	static inline int zswap_swapon(int type) {}
    54	static inline void zswap_swapoff(int type) {}
    55	static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
    56	static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
    57	static inline void zswap_page_swapin(struct page *page) {}
    58	#endif
    59	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-07 18:15       ` Nhat Pham
  2023-12-07 18:57         ` Nhat Pham
@ 2023-12-08 15:41         ` Chengming Zhou
  1 sibling, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-08 15:41 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On 2023/12/8 02:15, Nhat Pham wrote:
> On Thu, Dec 7, 2023 at 7:18 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> On 2023/12/7 11:13, Chengming Zhou wrote:
>>> On 2023/12/7 04:08, Nhat Pham wrote:
>>>> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
>>>> <zhouchengming@bytedance.com> wrote:
>>>>> When testing the zswap performance by using kernel build -j32 in a tmpfs
>>>>> directory, I found the scalability of zswap rb-tree is not good, which
>>>>> is protected by the only spinlock. That would cause heavy lock contention
>>>>> if multiple tasks zswap_store/load concurrently.
>>>>>
>>>>> So a simple solution is to split the only one zswap rb-tree into multiple
>>>>> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
>>>>> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>>>>>
>>>>> Although this method can't solve the spinlock contention completely, it
>>>>> can mitigate much of that contention.
>>>>
>>>> By how much? Do you have any stats to estimate the amount of
>>>> contention and the reduction by this patch?
>>>
>>> Actually, I did some test using the linux-next 20231205 yesterday.
>>>
>>> Testcase: memory.max = 2G, zswap enabled, make -j32 in tmpfs.
>>>
>>>                       20231205        +patchset
>>> 1. !shrinker_enabled:   156s          126s
>>> 2.  shrinker_enabled:   79s           70s
>>>
>>> I think your zswap shrinker fix patch can solve !shrinker_enabled case.
>>>
>>> So will test again today using the new mm-unstable branch.
>>>
>>
>> Updated test data based on today's mm-unstable branch:
>>
>>                         mm-unstable     +patchset
>> 1. !shrinker_enabled:   86s             74s
>> 2.  shrinker_enabled:   63s             61s
>>
>> This shows a much smaller improvement in the shrinker_enabled case, but
>> still a significant improvement in the !shrinker_enabled case.
>>
>> Thanks!
> 
> I'm gonna assume this is build time since it makes the zswap shrinker
> look pretty good :)
> I think this just means some of the gains between this patchset and
> the zswap shrinker overlaps. But on the positive note:
> 
> a) Both are complementary, i.e enable both (bottom right corner) gives
> us the best result.

Right, both optimizations are complementary and make zswap perform better :)

> b) Each individual change improves the runtime. If you disable the
> shrinker, then this patch helps tremendously, so we're onto something.
> c) The !shrinker_enabled is no longer *too* bad - once again, thanks
> for noticing the regression and help me fix it! In fact, every cell
> improves compared to the last run. Woohoo!

It's my pleasure! Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
  2023-12-08 15:17   ` kernel test robot
@ 2023-12-08 15:45     ` Chengming Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-08 15:45 UTC (permalink / raw)
  To: kernel test robot, Vitaly Wool, Nhat Pham, Johannes Weiner,
	Michal Hocko, Seth Jennings, Dan Streetman, Andrew Morton,
	Yosry Ahmed
  Cc: llvm, oe-kbuild-all, Linux Memory Management List, linux-kernel

On 2023/12/8 23:17, kernel test robot wrote:
> Hi Chengming,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on 0f5f12ac05f36f117e793656c3f560625e927f1b]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/mm-zswap-make-sure-each-swapfile-always-have-zswap-rb-tree/20231206-174717
> base:   0f5f12ac05f36f117e793656c3f560625e927f1b
> patch link:    https://lore.kernel.org/r/20231206-zswap-lock-optimize-v1-1-e25b059f9c3a%40bytedance.com
> patch subject: [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
> config: i386-buildonly-randconfig-002-20231208 (https://download.01.org/0day-ci/archive/20231208/202312082309.xvC0Rdd9-lkp@intel.com/config)
> compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231208/202312082309.xvC0Rdd9-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202312082309.xvC0Rdd9-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
>    In file included from arch/x86/kernel/asm-offsets.c:9:
>    In file included from include/linux/crypto.h:17:
>    In file included from include/linux/slab.h:16:
>    In file included from include/linux/gfp.h:7:
>    In file included from include/linux/mmzone.h:25:
>>> include/linux/zswap.h:53:43: warning: non-void function does not return a value [-Wreturn-type]
>    static inline int zswap_swapon(int type) {}

Ah, will fix to return 0 when !CONFIG_ZSWAP here.
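
That is, something like the following for the !CONFIG_ZSWAP stub (just a
sketch of the fix described above, a later version may differ):

static inline int zswap_swapon(int type)
{
	return 0;
}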

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
  2023-12-06  9:46 ` [PATCH 1/7] mm/zswap: make sure each swapfile always have " Chengming Zhou
  2023-12-08 15:17   ` kernel test robot
@ 2023-12-08 16:45   ` kernel test robot
  1 sibling, 0 replies; 30+ messages in thread
From: kernel test robot @ 2023-12-08 16:45 UTC (permalink / raw)
  To: Chengming Zhou, Vitaly Wool, Nhat Pham, Johannes Weiner,
	Michal Hocko, Seth Jennings, Dan Streetman, Andrew Morton,
	Yosry Ahmed
  Cc: oe-kbuild-all, Linux Memory Management List, linux-kernel,
	Chengming Zhou

Hi Chengming,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 0f5f12ac05f36f117e793656c3f560625e927f1b]

url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/mm-zswap-make-sure-each-swapfile-always-have-zswap-rb-tree/20231206-174717
base:   0f5f12ac05f36f117e793656c3f560625e927f1b
patch link:    https://lore.kernel.org/r/20231206-zswap-lock-optimize-v1-1-e25b059f9c3a%40bytedance.com
patch subject: [PATCH 1/7] mm/zswap: make sure each swapfile always have zswap rb-tree
config: i386-randconfig-012-20231208 (https://download.01.org/0day-ci/archive/20231209/202312090030.s5y0gXnv-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231209/202312090030.s5y0gXnv-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312090030.s5y0gXnv-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/mmzone.h:25,
                    from include/linux/gfp.h:7,
                    from include/linux/slab.h:16,
                    from include/linux/crypto.h:17,
                    from arch/x86/kernel/asm-offsets.c:9:
   include/linux/zswap.h: In function 'zswap_swapon':
>> include/linux/zswap.h:53:1: warning: no return statement in function returning non-void [-Wreturn-type]
      53 | static inline int zswap_swapon(int type) {}
         | ^~~~~~
--
   In file included from include/linux/mmzone.h:25,
                    from include/linux/gfp.h:7,
                    from include/linux/umh.h:4,
                    from include/linux/kmod.h:9,
                    from include/linux/module.h:17,
                    from drivers/clk/clkdev.c:9:
   include/linux/zswap.h: In function 'zswap_swapon':
>> include/linux/zswap.h:53:1: warning: no return statement in function returning non-void [-Wreturn-type]
      53 | static inline int zswap_swapon(int type) {}
         | ^~~~~~
   drivers/clk/clkdev.c: In function 'vclkdev_alloc':
   drivers/clk/clkdev.c:173:17: warning: function 'vclkdev_alloc' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
     173 |                 vscnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
         |                 ^~~~~~~~~~
--
   In file included from include/linux/mmzone.h:25,
                    from include/linux/gfp.h:7,
                    from include/linux/slab.h:16,
                    from include/linux/crypto.h:17,
                    from arch/x86/kernel/asm-offsets.c:9:
   include/linux/zswap.h: In function 'zswap_swapon':
>> include/linux/zswap.h:53:1: warning: no return statement in function returning non-void [-Wreturn-type]
      53 | static inline int zswap_swapon(int type) {}
         | ^~~~~~


vim +53 include/linux/zswap.h

    51	
    52	static inline void zswap_invalidate(int type, pgoff_t offset) {}
  > 53	static inline int zswap_swapon(int type) {}
    54	static inline void zswap_swapoff(int type) {}
    55	static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
    56	static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
    57	static inline void zswap_page_swapin(struct page *page) {}
    58	#endif
    59	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] mm/zswap: reuse dstmem when decompress
  2023-12-06  9:46 ` [PATCH 3/7] mm/zswap: reuse dstmem when decompress Chengming Zhou
@ 2023-12-12 22:58   ` Nhat Pham
  2023-12-13  2:41     ` Chengming Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Nhat Pham @ 2023-12-12 22:58 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first
> copy the entry->handle memory to a temporary memory, which is allocated
> using kmalloc.
>
> Obviously we can reuse the per-compressor dstmem to avoid allocating
> every time, since it's percpu-compressor and protected in mutex.

Ah this sounds like a good idea. We have to grab that mutex anyway -
might as well use the memory slot that is protected by that mutex.

>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index a6b4859a0164..d93a7b58b5af 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1758,9 +1758,9 @@ bool zswap_load(struct folio *folio)
>         struct zswap_entry *entry;
>         struct scatterlist input, output;
>         struct crypto_acomp_ctx *acomp_ctx;
> -       u8 *src, *dst, *tmp;
> +       unsigned int dlen = PAGE_SIZE;
> +       u8 *src, *dst;
>         struct zpool *zpool;
> -       unsigned int dlen;
>         bool ret;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
> @@ -1782,27 +1782,18 @@ bool zswap_load(struct folio *folio)
>                 goto stats;
>         }
>
> -       zpool = zswap_find_zpool(entry);
> -       if (!zpool_can_sleep_mapped(zpool)) {
> -               tmp = kmalloc(entry->length, GFP_KERNEL);
> -               if (!tmp) {
> -                       ret = false;
> -                       goto freeentry;
> -               }
> -       }
> -
>         /* decompress */
> -       dlen = PAGE_SIZE;
> -       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> +       mutex_lock(acomp_ctx->mutex);
>
> +       zpool = zswap_find_zpool(entry);
> +       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>         if (!zpool_can_sleep_mapped(zpool)) {
> -               memcpy(tmp, src, entry->length);
> -               src = tmp;
> +               memcpy(acomp_ctx->dstmem, src, entry->length);
> +               src = acomp_ctx->dstmem;

We're moving handle (un)mapping and the memory copying inside the
critical section protected by the mutex. Seems fine to me -
zswap_store() already did this IIUC.

>                 zpool_unmap_handle(zpool, entry->handle);
>         }
>
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       mutex_lock(acomp_ctx->mutex);
>         sg_init_one(&input, src, entry->length);
>         sg_init_table(&output, 1);
>         sg_set_page(&output, page, PAGE_SIZE, 0);
> @@ -1813,15 +1804,13 @@ bool zswap_load(struct folio *folio)
>
>         if (zpool_can_sleep_mapped(zpool))
>                 zpool_unmap_handle(zpool, entry->handle);
> -       else
> -               kfree(tmp);
>
>         ret = true;
>  stats:
>         count_vm_event(ZSWPIN);
>         if (entry->objcg)
>                 count_objcg_event(entry->objcg, ZSWPIN);
> -freeentry:

So it reduces the chance of zswap_load() failure due to memory
allocation failure? Nice!

> +
>         spin_lock(&tree->lock);
>         if (ret && zswap_exclusive_loads_enabled) {
>                 zswap_invalidate_entry(tree, entry);
>
> --
> b4 0.10.1

Reviewed-by: Nhat Pham <nphamcs@gmail.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] mm/zswap: refactor out __zswap_load()
  2023-12-06  9:46 ` [PATCH 5/7] mm/zswap: refactor out __zswap_load() Chengming Zhou
@ 2023-12-12 23:13   ` Nhat Pham
  2023-12-13  2:46     ` Chengming Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Nhat Pham @ 2023-12-12 23:13 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> The zswap_load() and zswap_writeback_entry() have the same part that
> decompress the data from zswap_entry to page, so refactor out the
> common part as __zswap_load(entry, page).

I love this refactoring a lot :) No reason why we should duplicate the
decompression logic shared between load and writeback.

>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 108 ++++++++++++++++++++++---------------------------------------
>  1 file changed, 39 insertions(+), 69 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 999671dcb469..667b66a3911b 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1380,6 +1380,42 @@ static int zswap_enabled_param_set(const char *val,
>         return ret;
>  }
>
> +static void __zswap_load(struct zswap_entry *entry, struct page *page)
> +{
> +       struct scatterlist input, output;
> +       unsigned int dlen = PAGE_SIZE;
> +       struct crypto_acomp_ctx *acomp_ctx;
> +       struct zpool *zpool;
> +       u8 *src;
> +       int ret;
> +
> +       /* decompress */

nit: I guess all this function does is decompression right? Doesn't
seem like this comment is necessary anymore... But this is just
nitpicking.

> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> +       mutex_lock(acomp_ctx->mutex);
> +
> +       zpool = zswap_find_zpool(entry);
> +       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> +       if (!zpool_can_sleep_mapped(zpool)) {
> +               memcpy(acomp_ctx->dstmem, src, entry->length);
> +               src = acomp_ctx->dstmem;
> +               zpool_unmap_handle(zpool, entry->handle);
> +       }
> +
> +       sg_init_one(&input, src, entry->length);
> +       sg_init_table(&output, 1);
> +       sg_set_page(&output, page, PAGE_SIZE, 0);
> +       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> +       ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> +       dlen = acomp_ctx->req->dlen;
> +       mutex_unlock(acomp_ctx->mutex);
> +
> +       if (zpool_can_sleep_mapped(zpool))
> +               zpool_unmap_handle(zpool, entry->handle);
> +
> +       BUG_ON(ret);
> +       BUG_ON(dlen != PAGE_SIZE);
> +}
> +
>  /*********************************
>  * writeback code
>  **********************************/
> @@ -1401,23 +1437,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         swp_entry_t swpentry = entry->swpentry;
>         struct page *page;
>         struct mempolicy *mpol;
> -       struct scatterlist input, output;
> -       struct crypto_acomp_ctx *acomp_ctx;
> -       struct zpool *pool = zswap_find_zpool(entry);
>         bool page_was_allocated;
> -       u8 *src, *tmp = NULL;
> -       unsigned int dlen;
>         int ret;
>         struct writeback_control wbc = {
>                 .sync_mode = WB_SYNC_NONE,
>         };
>
> -       if (!zpool_can_sleep_mapped(pool)) {
> -               tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
> -               if (!tmp)
> -                       return -ENOMEM;
> -       }
> -

Sweet. Less allocation == more efficient + less failure case :)

>         /* try to allocate swap cache page */
>         mpol = get_task_policy(current);
>         page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
> @@ -1450,33 +1475,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         }
>         spin_unlock(&tree->lock);
>
> -       /* decompress */
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       dlen = PAGE_SIZE;
> -
> -       src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
> -       if (!zpool_can_sleep_mapped(pool)) {
> -               memcpy(tmp, src, entry->length);
> -               src = tmp;
> -               zpool_unmap_handle(pool, entry->handle);
> -       }
> -
> -       mutex_lock(acomp_ctx->mutex);
> -       sg_init_one(&input, src, entry->length);
> -       sg_init_table(&output, 1);
> -       sg_set_page(&output, page, PAGE_SIZE, 0);
> -       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> -       ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> -       dlen = acomp_ctx->req->dlen;
> -       mutex_unlock(acomp_ctx->mutex);
> -
> -       if (!zpool_can_sleep_mapped(pool))
> -               kfree(tmp);
> -       else
> -               zpool_unmap_handle(pool, entry->handle);
> -
> -       BUG_ON(ret);
> -       BUG_ON(dlen != PAGE_SIZE);
> +       __zswap_load(entry, page);
>
>         /* page is up to date */
>         SetPageUptodate(page);
> @@ -1496,9 +1495,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         return ret;
>
>  fail:
> -       if (!zpool_can_sleep_mapped(pool))
> -               kfree(tmp);
> -
>         /*
>          * If we get here because the page is already in swapcache, a
>          * load may be happening concurrently. It is safe and okay to
> @@ -1755,11 +1751,7 @@ bool zswap_load(struct folio *folio)
>         struct page *page = &folio->page;
>         struct zswap_tree *tree = swap_zswap_tree(swp);
>         struct zswap_entry *entry;
> -       struct scatterlist input, output;
> -       struct crypto_acomp_ctx *acomp_ctx;
> -       unsigned int dlen = PAGE_SIZE;
> -       u8 *src, *dst;
> -       struct zpool *zpool;
> +       u8 *dst;
>         bool ret;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
> @@ -1781,29 +1773,7 @@ bool zswap_load(struct folio *folio)
>                 goto stats;
>         }
>
> -       /* decompress */
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       mutex_lock(acomp_ctx->mutex);
> -
> -       zpool = zswap_find_zpool(entry);
> -       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> -       if (!zpool_can_sleep_mapped(zpool)) {
> -               memcpy(acomp_ctx->dstmem, src, entry->length);
> -               src = acomp_ctx->dstmem;
> -               zpool_unmap_handle(zpool, entry->handle);
> -       }
> -
> -       sg_init_one(&input, src, entry->length);
> -       sg_init_table(&output, 1);
> -       sg_set_page(&output, page, PAGE_SIZE, 0);
> -       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> -       if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
> -               WARN_ON(1);
> -       mutex_unlock(acomp_ctx->mutex);
> -
> -       if (zpool_can_sleep_mapped(zpool))
> -               zpool_unmap_handle(zpool, entry->handle);
> -
> +       __zswap_load(entry, page);
>         ret = true;
>  stats:
>         count_vm_event(ZSWPIN);
>
> --
> b4 0.10.1

Can't find anything wrong with this patch, so:
Reviewed-by: Nhat Pham <nphamcs@gmail.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-07  3:25       ` Chengming Zhou
@ 2023-12-12 23:26         ` Yosry Ahmed
  2023-12-12 23:33           ` Nhat Pham
  0 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2023-12-12 23:26 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Chris Li, Nhat Pham, Vitaly Wool, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, linux-mm,
	linux-kernel, Matthew Wilcox

On Wed, Dec 6, 2023 at 7:25 PM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> On 2023/12/7 08:43, Chris Li wrote:
> > Hi Nhat and Yosry,
> >
> > On Wed, Dec 6, 2023 at 12:42 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >>
> >> On Wed, Dec 6, 2023 at 9:24 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >>>
> >>> + Chris Li
> >>>
> >>> Chris, I vaguely remember from our last conversation that you have
> >>> some concurrent efforts to use xarray here right?
> >
> > Yes, I do have the zswap xarray for older versions of the kernel. The
> > recent mm-unstable tree has  a lot of zswap related updates. Give me 2
> > days to refresh and post it. The zswap invalid entry and the reference
> > count change is causing a good portion of the code to be updated. That
> > is the price to pay keeping out of tree patches. My fault is not
> > getting to it sooner.
> >
> >>
> >> If I recall correctly, the xarray already reduces the lock contention
> >> as lookups are lockless, but Chris knows more here. As you mentioned
> >
> > Yes. To be exact, xarray can use spin lock (same as current RB tree)
> > or take RCU read lock on the lookup path (depending on how you call
> > the xarray API). Not completely lockless but the RCU read lock should
> > have less lock contention than normal spinlock. +Matthew
> >
>
> Great! Lockless lookup in zswap_load() should reduce spinlock contention.
> And multiple trees (multiple xarrays) can further reduce the contention
> on the concurrent zswap_store() side. So it's complementary IMHO.
>
> >> in a different email, it would be nice to get some data so that we can
> >> compare different solutions.
> >
> > Yes, it is certainly welcome to see more data points. If I recall
> > correctly, the zswap xarray array makes the lookup similar to the swap
> > cache lookup. It has a noticeable difference in the long tail end.
> >
>
> Right, I post some data from yesterday in another reply.
> Will test again and update the data since Nhat's zswap shrinker fix patch
> has been merged into mm-unstable today.
>
> Thanks!

Let's split the rbtree breakdown into a separate series. This series
has unrelated (and very nice) cleanups and optimizations, let's get
them separately and defer the rbtree breakdown part until we get data
about the xarray implementation. Perhaps the tree breakdown is not
needed as much with an xarray, or at the very least the implementation
would look different on top of an xarray.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-12 23:26         ` Yosry Ahmed
@ 2023-12-12 23:33           ` Nhat Pham
  2023-12-13  2:57             ` Chengming Zhou
  0 siblings, 1 reply; 30+ messages in thread
From: Nhat Pham @ 2023-12-12 23:33 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Chengming Zhou, Chris Li, Vitaly Wool, Johannes Weiner,
	Michal Hocko, Seth Jennings, Dan Streetman, Andrew Morton,
	linux-mm, linux-kernel, Matthew Wilcox

On Tue, Dec 12, 2023 at 3:27 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Let's split the rbtree breakdown into a separate series. This series
> has unrelated (and very nice) cleanups and optimizations, let's get
> them separately and defer the rbtree breakdown part until we get data
> about the xarray implementation. Perhaps the tree breakdown is not
> needed as much with an xarray, or at the very least the implementation
> would look different on top of an xarray.

Actually, kinda agree - I quite like the cleanup/optimization done
w.r.t dstmem reuse :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] mm/zswap: reuse dstmem when decompress
  2023-12-12 22:58   ` Nhat Pham
@ 2023-12-13  2:41     ` Chengming Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-13  2:41 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On 2023/12/13 06:58, Nhat Pham wrote:
> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first
>> copy the entry->handle memory to a temporary memory, which is allocated
>> using kmalloc.
>>
>> Obviously we can reuse the per-compressor dstmem to avoid allocating
>> every time, since it's percpu-compressor and protected in mutex.
> 
> Ah this sounds like a good idea. We have to grab that mutex anyway -
> might as well use the memory slot that is protected by that mutex.
> 
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  mm/zswap.c | 29 +++++++++--------------------
>>  1 file changed, 9 insertions(+), 20 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index a6b4859a0164..d93a7b58b5af 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1758,9 +1758,9 @@ bool zswap_load(struct folio *folio)
>>         struct zswap_entry *entry;
>>         struct scatterlist input, output;
>>         struct crypto_acomp_ctx *acomp_ctx;
>> -       u8 *src, *dst, *tmp;
>> +       unsigned int dlen = PAGE_SIZE;
>> +       u8 *src, *dst;
>>         struct zpool *zpool;
>> -       unsigned int dlen;
>>         bool ret;
>>
>>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>> @@ -1782,27 +1782,18 @@ bool zswap_load(struct folio *folio)
>>                 goto stats;
>>         }
>>
>> -       zpool = zswap_find_zpool(entry);
>> -       if (!zpool_can_sleep_mapped(zpool)) {
>> -               tmp = kmalloc(entry->length, GFP_KERNEL);
>> -               if (!tmp) {
>> -                       ret = false;
>> -                       goto freeentry;
>> -               }
>> -       }
>> -
>>         /* decompress */
>> -       dlen = PAGE_SIZE;
>> -       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>> +       mutex_lock(acomp_ctx->mutex);
>>
>> +       zpool = zswap_find_zpool(entry);
>> +       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>>         if (!zpool_can_sleep_mapped(zpool)) {
>> -               memcpy(tmp, src, entry->length);
>> -               src = tmp;
>> +               memcpy(acomp_ctx->dstmem, src, entry->length);
>> +               src = acomp_ctx->dstmem;
> 
> We're moving handle (un)mapping and the memory copying inside the
> critical section protected by the mutex. Seems fine to me -
> zswap_store() already did this IIUC.
> 
>>                 zpool_unmap_handle(zpool, entry->handle);
>>         }
>>
>> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>> -       mutex_lock(acomp_ctx->mutex);
>>         sg_init_one(&input, src, entry->length);
>>         sg_init_table(&output, 1);
>>         sg_set_page(&output, page, PAGE_SIZE, 0);
>> @@ -1813,15 +1804,13 @@ bool zswap_load(struct folio *folio)
>>
>>         if (zpool_can_sleep_mapped(zpool))
>>                 zpool_unmap_handle(zpool, entry->handle);
>> -       else
>> -               kfree(tmp);
>>
>>         ret = true;
>>  stats:
>>         count_vm_event(ZSWPIN);
>>         if (entry->objcg)
>>                 count_objcg_event(entry->objcg, ZSWPIN);
>> -freeentry:
> 
> So it reduces the chance of zswap_load() failure due to memory
> allocation failure? Nice!
> 

Yes, no need to allocate temporary memory anymore :)

>> +
>>         spin_lock(&tree->lock);
>>         if (ret && zswap_exclusive_loads_enabled) {
>>                 zswap_invalidate_entry(tree, entry);
>>
>> --
>> b4 0.10.1
> 
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>

Thanks for review!


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] mm/zswap: refactor out __zswap_load()
  2023-12-12 23:13   ` Nhat Pham
@ 2023-12-13  2:46     ` Chengming Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-13  2:46 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
	Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm,
	linux-kernel

On 2023/12/13 07:13, Nhat Pham wrote:
> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> The zswap_load() and zswap_writeback_entry() have the same part that
>> decompress the data from zswap_entry to page, so refactor out the
>> common part as __zswap_load(entry, page).
> 
> I love this refactoring a lot :) No reason why we should duplicate the
> decompression logic shared between load and writeback.
> 
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  mm/zswap.c | 108 ++++++++++++++++++++++---------------------------------------
>>  1 file changed, 39 insertions(+), 69 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 999671dcb469..667b66a3911b 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1380,6 +1380,42 @@ static int zswap_enabled_param_set(const char *val,
>>         return ret;
>>  }
>>
>> +static void __zswap_load(struct zswap_entry *entry, struct page *page)
>> +{
>> +       struct scatterlist input, output;
>> +       unsigned int dlen = PAGE_SIZE;
>> +       struct crypto_acomp_ctx *acomp_ctx;
>> +       struct zpool *zpool;
>> +       u8 *src;
>> +       int ret;
>> +
>> +       /* decompress */
> 
> nit: I guess all this function does is decompression right? Doesn't
> seem like this comment is necessary anymore... But this is just
> nitpicking.
> 

Ah, right. I will remove it.

Thanks!

>> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>> +       mutex_lock(acomp_ctx->mutex);
>> +
>> +       zpool = zswap_find_zpool(entry);
>> +       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>> +       if (!zpool_can_sleep_mapped(zpool)) {
>> +               memcpy(acomp_ctx->dstmem, src, entry->length);
>> +               src = acomp_ctx->dstmem;
>> +               zpool_unmap_handle(zpool, entry->handle);
>> +       }
>> +
>> +       sg_init_one(&input, src, entry->length);
>> +       sg_init_table(&output, 1);
>> +       sg_set_page(&output, page, PAGE_SIZE, 0);
>> +       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>> +       ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
>> +       dlen = acomp_ctx->req->dlen;
>> +       mutex_unlock(acomp_ctx->mutex);
>> +
>> +       if (zpool_can_sleep_mapped(zpool))
>> +               zpool_unmap_handle(zpool, entry->handle);
>> +
>> +       BUG_ON(ret);
>> +       BUG_ON(dlen != PAGE_SIZE);
>> +}
>> +
>>  /*********************************
>>  * writeback code
>>  **********************************/
>> @@ -1401,23 +1437,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>>         swp_entry_t swpentry = entry->swpentry;
>>         struct page *page;
>>         struct mempolicy *mpol;
>> -       struct scatterlist input, output;
>> -       struct crypto_acomp_ctx *acomp_ctx;
>> -       struct zpool *pool = zswap_find_zpool(entry);
>>         bool page_was_allocated;
>> -       u8 *src, *tmp = NULL;
>> -       unsigned int dlen;
>>         int ret;
>>         struct writeback_control wbc = {
>>                 .sync_mode = WB_SYNC_NONE,
>>         };
>>
>> -       if (!zpool_can_sleep_mapped(pool)) {
>> -               tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> -               if (!tmp)
>> -                       return -ENOMEM;
>> -       }
>> -
> 
> Sweet. Less allocation == more efficient + less failure case :)
> 
>>         /* try to allocate swap cache page */
>>         mpol = get_task_policy(current);
>>         page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
>> @@ -1450,33 +1475,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>>         }
>>         spin_unlock(&tree->lock);
>>
>> -       /* decompress */
>> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>> -       dlen = PAGE_SIZE;
>> -
>> -       src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
>> -       if (!zpool_can_sleep_mapped(pool)) {
>> -               memcpy(tmp, src, entry->length);
>> -               src = tmp;
>> -               zpool_unmap_handle(pool, entry->handle);
>> -       }
>> -
>> -       mutex_lock(acomp_ctx->mutex);
>> -       sg_init_one(&input, src, entry->length);
>> -       sg_init_table(&output, 1);
>> -       sg_set_page(&output, page, PAGE_SIZE, 0);
>> -       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>> -       ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
>> -       dlen = acomp_ctx->req->dlen;
>> -       mutex_unlock(acomp_ctx->mutex);
>> -
>> -       if (!zpool_can_sleep_mapped(pool))
>> -               kfree(tmp);
>> -       else
>> -               zpool_unmap_handle(pool, entry->handle);
>> -
>> -       BUG_ON(ret);
>> -       BUG_ON(dlen != PAGE_SIZE);
>> +       __zswap_load(entry, page);
>>
>>         /* page is up to date */
>>         SetPageUptodate(page);
>> @@ -1496,9 +1495,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>>         return ret;
>>
>>  fail:
>> -       if (!zpool_can_sleep_mapped(pool))
>> -               kfree(tmp);
>> -
>>         /*
>>          * If we get here because the page is already in swapcache, a
>>          * load may be happening concurrently. It is safe and okay to
>> @@ -1755,11 +1751,7 @@ bool zswap_load(struct folio *folio)
>>         struct page *page = &folio->page;
>>         struct zswap_tree *tree = swap_zswap_tree(swp);
>>         struct zswap_entry *entry;
>> -       struct scatterlist input, output;
>> -       struct crypto_acomp_ctx *acomp_ctx;
>> -       unsigned int dlen = PAGE_SIZE;
>> -       u8 *src, *dst;
>> -       struct zpool *zpool;
>> +       u8 *dst;
>>         bool ret;
>>
>>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>> @@ -1781,29 +1773,7 @@ bool zswap_load(struct folio *folio)
>>                 goto stats;
>>         }
>>
>> -       /* decompress */
>> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>> -       mutex_lock(acomp_ctx->mutex);
>> -
>> -       zpool = zswap_find_zpool(entry);
>> -       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>> -       if (!zpool_can_sleep_mapped(zpool)) {
>> -               memcpy(acomp_ctx->dstmem, src, entry->length);
>> -               src = acomp_ctx->dstmem;
>> -               zpool_unmap_handle(zpool, entry->handle);
>> -       }
>> -
>> -       sg_init_one(&input, src, entry->length);
>> -       sg_init_table(&output, 1);
>> -       sg_set_page(&output, page, PAGE_SIZE, 0);
>> -       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>> -       if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
>> -               WARN_ON(1);
>> -       mutex_unlock(acomp_ctx->mutex);
>> -
>> -       if (zpool_can_sleep_mapped(zpool))
>> -               zpool_unmap_handle(zpool, entry->handle);
>> -
>> +       __zswap_load(entry, page);
>>         ret = true;
>>  stats:
>>         count_vm_event(ZSWPIN);
>>
>> --
>> b4 0.10.1
> 
> Can't find anything wrong with this patch, so:
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
  2023-12-12 23:33           ` Nhat Pham
@ 2023-12-13  2:57             ` Chengming Zhou
  0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-12-13  2:57 UTC (permalink / raw)
  To: Nhat Pham, Yosry Ahmed
  Cc: Chris Li, Vitaly Wool, Johannes Weiner, Michal Hocko,
	Seth Jennings, Dan Streetman, Andrew Morton, linux-mm,
	linux-kernel, Matthew Wilcox

On 2023/12/13 07:33, Nhat Pham wrote:
> On Tue, Dec 12, 2023 at 3:27 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>
>> Let's split the rbtree breakdown into a separate series. This series
>> has unrelated (and very nice) cleanups and optimizations, let's get
>> them separately and defer the rbtree breakdown part until we get data

Ok, will split and just send the cleanups/optimizations with dstmem reuse.

>> about the xarray implementation. Perhaps the tree breakdown is not
>> needed as much with an xarray, or at the very least the implementation
>> would look different on top of an xarray.

Yeah, will retest on top of Chris's xarray version; the implementation is
easy anyway.
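
Just to illustrate what is being compared here, the load-side lookup on
top of an xarray could be as simple as the sketch below (illustrative
only, not Chris's actual patch; the 'entries' field name is assumed):

/*
 * Sketch only: assumes the rb-tree inside struct zswap_tree is replaced
 * by an xarray indexed by the swap offset.  xa_load() takes the RCU
 * read lock internally, so the per-tree spinlock is avoided on lookup.
 */
static struct zswap_entry *zswap_xa_lookup(struct zswap_tree *tree,
					   pgoff_t offset)
{
	return xa_load(&tree->entries, offset);
}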

> 
> Actually, kinda agree - I quite like the cleanup/optimization done
> w.r.t dstmem reuse :)

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-12-13  2:58 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06  9:46 [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Chengming Zhou
2023-12-06  9:46 ` [PATCH 1/7] mm/zswap: make sure each swapfile always have " Chengming Zhou
2023-12-08 15:17   ` kernel test robot
2023-12-08 15:45     ` Chengming Zhou
2023-12-08 16:45   ` kernel test robot
2023-12-06  9:46 ` [PATCH 2/7] mm/zswap: split " Chengming Zhou
2023-12-06  9:46 ` [PATCH 3/7] mm/zswap: reuse dstmem when decompress Chengming Zhou
2023-12-12 22:58   ` Nhat Pham
2023-12-13  2:41     ` Chengming Zhou
2023-12-06  9:46 ` [PATCH 4/7] mm/zswap: change dstmem size to one page Chengming Zhou
2023-12-06 17:12   ` Nhat Pham
2023-12-07  2:59     ` Chengming Zhou
2023-12-06  9:46 ` [PATCH 5/7] mm/zswap: refactor out __zswap_load() Chengming Zhou
2023-12-12 23:13   ` Nhat Pham
2023-12-13  2:46     ` Chengming Zhou
2023-12-06  9:46 ` [PATCH 6/7] mm/zswap: cleanup zswap_load() Chengming Zhou
2023-12-06  9:46 ` [PATCH 7/7] mm/zswap: cleanup zswap_reclaim_entry() Chengming Zhou
2023-12-06 17:24 ` [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree Nhat Pham
2023-12-06 20:41   ` Yosry Ahmed
2023-12-07  0:43     ` Chris Li
2023-12-07  3:25       ` Chengming Zhou
2023-12-12 23:26         ` Yosry Ahmed
2023-12-12 23:33           ` Nhat Pham
2023-12-13  2:57             ` Chengming Zhou
2023-12-06 20:08 ` Nhat Pham
2023-12-07  3:13   ` Chengming Zhou
2023-12-07 15:18     ` Chengming Zhou
2023-12-07 18:15       ` Nhat Pham
2023-12-07 18:57         ` Nhat Pham
2023-12-08 15:41         ` Chengming Zhou
