* [PATCH v6 0/6] Implement writeback for zsmalloc
@ 2022-11-19  0:15 Nhat Pham
  2022-11-19  0:15 ` [PATCH v6 1/6] zswap: fix writeback lock ordering " Nhat Pham
                   ` (6 more replies)
  0 siblings, 7 replies; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

Changelog:
v6:
  * Move the move-to-front logic into zs_map_object (patch 4)
    (suggested by Minchan Kim).
  * Small clean up for free_zspage at free_handles() call site
    (patch 6) (suggested by Minchan Kim).
v5:
  * Add a new patch that eliminates unused code in zpool and simplify
    the logic for storing evict handler in zbud/z3fold (patch 2)
  * Remove redundant fields in zs_pool (previously required by zpool)
    (patch 3)
  * Wrap under_reclaim and deferred handle freeing logic in CONFIG_ZPOOL
    (patch 6) (suggested by Minchan Kim)
  * Move a small piece of refactoring from patch 6 to patch 4.
v4:
  * Wrap the new LRU logic in CONFIG_ZPOOL (patch 3).
    (suggested by Minchan Kim)
v3:
  * Set pool->ops = NULL when pool->zpool_ops is null (patch 4).
  * Stop holding pool's lock when calling lock_zspage() (patch 5).
    (suggested by Sergey Senozhatsky)
  * Stop holding pool's lock when checking pool->ops and retries.
    (patch 5) (suggested by Sergey Senozhatsky)
  * Fix formatting issues (.shrink, extra spaces in casting removed).
    (patch 5) (suggested by Sergey Senozhatsky)
v2:
  * Add missing CONFIG_ZPOOL ifdefs (patch 5)
    (detected by kernel test robot).

Unlike zswap's other allocators, such as zbud or z3fold, zsmalloc
currently lacks a writeback mechanism. This means that when the zswap
pool is full, it simply rejects further allocations, and the pages
are written directly to swap.

This series of patches implements writeback for zsmalloc. When the zswap
pool becomes full, zsmalloc will attempt to evict all the compressed
objects in the least recently used zspages.

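For a high-level picture, the call chain this series enables looks roughly
like this (an illustration only, not part of the patches; zpool_shrink() is
the existing zpool entry point that zswap invokes when its pool is full, the
zs_* functions are added in patch 6, and zswap_writeback_entry() is the evict
handler whose lock ordering patch 1 fixes):

  zpool_shrink(zpool, nr_pages, &reclaimed)
    zs_zpool_shrink()                              /* new .shrink hook */
      zs_reclaim_page(pool, 8)                     /* takes the tail of pool->lru */
        pool->zpool_ops->evict(pool->zpool, handle)  /* per allocated object */
          zswap_writeback_entry()  /* decompress, write to swap; freeing the
                                      entry eventually ends up in zs_free() */
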
There are 6 patches in this series:

Johannes Weiner (2):
  zswap: fix writeback lock ordering for zsmalloc
  zpool: clean out dead code

Nhat Pham (4):
  zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks
  zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  zsmalloc: Add zpool_ops field to zs_pool to store evict handlers
  zsmalloc: Implement writeback mechanism for zsmalloc

 mm/z3fold.c   |  36 +-----
 mm/zbud.c     |  32 +----
 mm/zpool.c    |  10 +-
 mm/zsmalloc.c | 325 ++++++++++++++++++++++++++++++++++++++++----------
 mm/zswap.c    |  37 +++---
 5 files changed, 295 insertions(+), 145 deletions(-)

--
2.30.2


* [PATCH v6 1/6] zswap: fix writeback lock ordering for zsmalloc
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
@ 2022-11-19  0:15 ` Nhat Pham
  2022-11-22  1:43   ` Sergey Senozhatsky
  2022-11-19  0:15 ` [PATCH v6 2/6] zpool: clean out dead code Nhat Pham
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

From: Johannes Weiner <hannes@cmpxchg.org>

zswap's customary lock order is tree->lock before pool->lock, because
the tree->lock protects the entries' refcount, and the free callbacks in
the backends acquire their respective pool locks to dispatch the backing
object. zsmalloc's map callback takes the pool lock, so zswap must not
grab the tree->lock while a handle is mapped. This currently only
happens during writeback, which isn't implemented for zsmalloc. In
preparation for it, move the tree->lock section out of the mapped entry
section.

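For reference, the resulting order in zswap_writeback_entry() is roughly
the following (a simplified sketch of the function after this patch, error
handling omitted):

	zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
	swpentry = zhdr->swpentry;
	zpool_unmap_handle(pool, handle);	/* drop the backend mapping ... */

	spin_lock(&tree->lock);			/* ... before taking the tree lock */
	/* find and ref the zswap entry */
	spin_unlock(&tree->lock);

	/* re-map only for the decompression step, with tree->lock not held */
	zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
	src = (u8 *)zhdr + sizeof(struct zswap_header);
	/* ... decompress ... */
	zpool_unmap_handle(pool, handle);
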
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/zswap.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 2d48fd59cc7a..2d69c1d678fe 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -958,7 +958,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	};

 	if (!zpool_can_sleep_mapped(pool)) {
-		tmp = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+		tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
 		if (!tmp)
 			return -ENOMEM;
 	}
@@ -968,6 +968,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	swpentry = zhdr->swpentry; /* here */
 	tree = zswap_trees[swp_type(swpentry)];
 	offset = swp_offset(swpentry);
+	zpool_unmap_handle(pool, handle);

 	/* find and ref zswap entry */
 	spin_lock(&tree->lock);
@@ -975,20 +976,12 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	if (!entry) {
 		/* entry was invalidated */
 		spin_unlock(&tree->lock);
-		zpool_unmap_handle(pool, handle);
 		kfree(tmp);
 		return 0;
 	}
 	spin_unlock(&tree->lock);
 	BUG_ON(offset != entry->offset);

-	src = (u8 *)zhdr + sizeof(struct zswap_header);
-	if (!zpool_can_sleep_mapped(pool)) {
-		memcpy(tmp, src, entry->length);
-		src = tmp;
-		zpool_unmap_handle(pool, handle);
-	}
-
 	/* try to allocate swap cache page */
 	switch (zswap_get_swap_cache_page(swpentry, &page)) {
 	case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
@@ -1006,6 +999,14 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 		acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
 		dlen = PAGE_SIZE;

+		zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
+		src = (u8 *)zhdr + sizeof(struct zswap_header);
+		if (!zpool_can_sleep_mapped(pool)) {
+			memcpy(tmp, src, entry->length);
+			src = tmp;
+			zpool_unmap_handle(pool, handle);
+		}
+
 		mutex_lock(acomp_ctx->mutex);
 		sg_init_one(&input, src, entry->length);
 		sg_init_table(&output, 1);
@@ -1015,6 +1016,11 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 		dlen = acomp_ctx->req->dlen;
 		mutex_unlock(acomp_ctx->mutex);

+		if (!zpool_can_sleep_mapped(pool))
+			kfree(tmp);
+		else
+			zpool_unmap_handle(pool, handle);
+
 		BUG_ON(ret);
 		BUG_ON(dlen != PAGE_SIZE);

@@ -1045,7 +1051,11 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 		zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);

-	goto end;
+	return ret;
+
+fail:
+	if (!zpool_can_sleep_mapped(pool))
+		kfree(tmp);

 	/*
 	* if we get here due to ZSWAP_SWAPCACHE_EXIST
@@ -1054,17 +1064,10 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	* if we free the entry in the following put
 	* it is also okay to return !0
 	*/
-fail:
 	spin_lock(&tree->lock);
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);

-end:
-	if (zpool_can_sleep_mapped(pool))
-		zpool_unmap_handle(pool, handle);
-	else
-		kfree(tmp);
-
 	return ret;
 }

--
2.30.2


* [PATCH v6 2/6] zpool: clean out dead code
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
  2022-11-19  0:15 ` [PATCH v6 1/6] zswap: fix writeback lock ordering " Nhat Pham
@ 2022-11-19  0:15 ` Nhat Pham
  2022-11-22  1:46   ` Sergey Senozhatsky
  2022-11-19  0:15 ` [PATCH v6 3/6] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks Nhat Pham
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

From: Johannes Weiner <hannes@cmpxchg.org>

There is a lot of provision for flexibility that isn't actually needed
or used. Zswap (the only zpool user) always passes zpool_ops with an
.evict method set. The backends that reclaim only do so for zswap, so
they can also call zpool_ops directly, without indirection or checks.

Finally, there is no need to check the retries parameter and bail
with -EINVAL in the reclaim function, when that's called just a few
lines below with a hard-coded 8. There is no need to duplicate the
evictable and sleep_mapped attrs from the driver in struct zpool.

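For example, in z3fold the eviction call goes from bouncing through a local
wrapper (sketch based on the hunks below):

	static int z3fold_zpool_evict(struct z3fold_pool *pool, unsigned long handle)
	{
		if (pool->zpool && pool->zpool_ops && pool->zpool_ops->evict)
			return pool->zpool_ops->evict(pool->zpool, handle);
		else
			return -ENOENT;
	}

	...
	ret = pool->ops->evict(pool, first_handle);

to calling the zswap-provided ops directly:

	ret = pool->zpool_ops->evict(pool->zpool, first_handle);
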
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/z3fold.c | 36 +++++-------------------------------
 mm/zbud.c   | 32 +++++---------------------------
 mm/zpool.c  | 10 ++--------
 3 files changed, 12 insertions(+), 66 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index cf71da10d04e..a4de0c317ac7 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -68,9 +68,6 @@
  * Structures
 *****************/
 struct z3fold_pool;
-struct z3fold_ops {
-	int (*evict)(struct z3fold_pool *pool, unsigned long handle);
-};

 enum buddy {
 	HEADLESS = 0,
@@ -138,8 +135,6 @@ struct z3fold_header {
  * @stale:	list of pages marked for freeing
  * @pages_nr:	number of z3fold pages in the pool.
  * @c_handle:	cache for z3fold_buddy_slots allocation
- * @ops:	pointer to a structure of user defined operations specified at
- *		pool creation time.
  * @zpool:	zpool driver
  * @zpool_ops:	zpool operations structure with an evict callback
  * @compact_wq:	workqueue for page layout background optimization
@@ -158,7 +153,6 @@ struct z3fold_pool {
 	struct list_head stale;
 	atomic64_t pages_nr;
 	struct kmem_cache *c_handle;
-	const struct z3fold_ops *ops;
 	struct zpool *zpool;
 	const struct zpool_ops *zpool_ops;
 	struct workqueue_struct *compact_wq;
@@ -907,13 +901,11 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
  * z3fold_create_pool() - create a new z3fold pool
  * @name:	pool name
  * @gfp:	gfp flags when allocating the z3fold pool structure
- * @ops:	user-defined operations for the z3fold pool
  *
  * Return: pointer to the new z3fold pool or NULL if the metadata allocation
  * failed.
  */
-static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp,
-		const struct z3fold_ops *ops)
+static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp)
 {
 	struct z3fold_pool *pool = NULL;
 	int i, cpu;
@@ -949,7 +941,6 @@ static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp,
 	if (!pool->release_wq)
 		goto out_wq;
 	INIT_WORK(&pool->work, free_pages_work);
-	pool->ops = ops;
 	return pool;

 out_wq:
@@ -1230,10 +1221,6 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 	slots.pool = (unsigned long)pool | (1 << HANDLES_NOFREE);

 	spin_lock(&pool->lock);
-	if (!pool->ops || !pool->ops->evict || retries == 0) {
-		spin_unlock(&pool->lock);
-		return -EINVAL;
-	}
 	for (i = 0; i < retries; i++) {
 		if (list_empty(&pool->lru)) {
 			spin_unlock(&pool->lock);
@@ -1319,17 +1306,17 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 		}
 		/* Issue the eviction callback(s) */
 		if (middle_handle) {
-			ret = pool->ops->evict(pool, middle_handle);
+			ret = pool->zpool_ops->evict(pool->zpool, middle_handle);
 			if (ret)
 				goto next;
 		}
 		if (first_handle) {
-			ret = pool->ops->evict(pool, first_handle);
+			ret = pool->zpool_ops->evict(pool->zpool, first_handle);
 			if (ret)
 				goto next;
 		}
 		if (last_handle) {
-			ret = pool->ops->evict(pool, last_handle);
+			ret = pool->zpool_ops->evict(pool->zpool, last_handle);
 			if (ret)
 				goto next;
 		}
@@ -1593,26 +1580,13 @@ static const struct movable_operations z3fold_mops = {
  * zpool
  ****************/

-static int z3fold_zpool_evict(struct z3fold_pool *pool, unsigned long handle)
-{
-	if (pool->zpool && pool->zpool_ops && pool->zpool_ops->evict)
-		return pool->zpool_ops->evict(pool->zpool, handle);
-	else
-		return -ENOENT;
-}
-
-static const struct z3fold_ops z3fold_zpool_ops = {
-	.evict =	z3fold_zpool_evict
-};
-
 static void *z3fold_zpool_create(const char *name, gfp_t gfp,
 			       const struct zpool_ops *zpool_ops,
 			       struct zpool *zpool)
 {
 	struct z3fold_pool *pool;

-	pool = z3fold_create_pool(name, gfp,
-				zpool_ops ? &z3fold_zpool_ops : NULL);
+	pool = z3fold_create_pool(name, gfp);
 	if (pool) {
 		pool->zpool = zpool;
 		pool->zpool_ops = zpool_ops;
diff --git a/mm/zbud.c b/mm/zbud.c
index 6348932430b8..3acd26193920 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -74,10 +74,6 @@

 struct zbud_pool;

-struct zbud_ops {
-	int (*evict)(struct zbud_pool *pool, unsigned long handle);
-};
-
 /**
  * struct zbud_pool - stores metadata for each zbud pool
  * @lock:	protects all pool fields and first|last_chunk fields of any
@@ -90,8 +86,6 @@ struct zbud_ops {
  * @lru:	list tracking the zbud pages in LRU order by most recently
  *		added buddy.
  * @pages_nr:	number of zbud pages in the pool.
- * @ops:	pointer to a structure of user defined operations specified at
- *		pool creation time.
  * @zpool:	zpool driver
  * @zpool_ops:	zpool operations structure with an evict callback
  *
@@ -110,7 +104,6 @@ struct zbud_pool {
 	};
 	struct list_head lru;
 	u64 pages_nr;
-	const struct zbud_ops *ops;
 	struct zpool *zpool;
 	const struct zpool_ops *zpool_ops;
 };
@@ -212,12 +205,11 @@ static int num_free_chunks(struct zbud_header *zhdr)
 /**
  * zbud_create_pool() - create a new zbud pool
  * @gfp:	gfp flags when allocating the zbud pool structure
- * @ops:	user-defined operations for the zbud pool
  *
  * Return: pointer to the new zbud pool or NULL if the metadata allocation
  * failed.
  */
-static struct zbud_pool *zbud_create_pool(gfp_t gfp, const struct zbud_ops *ops)
+static struct zbud_pool *zbud_create_pool(gfp_t gfp)
 {
 	struct zbud_pool *pool;
 	int i;
@@ -231,7 +223,6 @@ static struct zbud_pool *zbud_create_pool(gfp_t gfp, const struct zbud_ops *ops)
 	INIT_LIST_HEAD(&pool->buddied);
 	INIT_LIST_HEAD(&pool->lru);
 	pool->pages_nr = 0;
-	pool->ops = ops;
 	return pool;
 }

@@ -419,8 +410,7 @@ static int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 	unsigned long first_handle = 0, last_handle = 0;

 	spin_lock(&pool->lock);
-	if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) ||
-			retries == 0) {
+	if (list_empty(&pool->lru)) {
 		spin_unlock(&pool->lock);
 		return -EINVAL;
 	}
@@ -444,12 +434,12 @@ static int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)

 		/* Issue the eviction callback(s) */
 		if (first_handle) {
-			ret = pool->ops->evict(pool, first_handle);
+			ret = pool->zpool_ops->evict(pool->zpool, first_handle);
 			if (ret)
 				goto next;
 		}
 		if (last_handle) {
-			ret = pool->ops->evict(pool, last_handle);
+			ret = pool->zpool_ops->evict(pool->zpool, last_handle);
 			if (ret)
 				goto next;
 		}
@@ -524,25 +514,13 @@ static u64 zbud_get_pool_size(struct zbud_pool *pool)
  * zpool
  ****************/

-static int zbud_zpool_evict(struct zbud_pool *pool, unsigned long handle)
-{
-	if (pool->zpool && pool->zpool_ops && pool->zpool_ops->evict)
-		return pool->zpool_ops->evict(pool->zpool, handle);
-	else
-		return -ENOENT;
-}
-
-static const struct zbud_ops zbud_zpool_ops = {
-	.evict =	zbud_zpool_evict
-};
-
 static void *zbud_zpool_create(const char *name, gfp_t gfp,
 			       const struct zpool_ops *zpool_ops,
 			       struct zpool *zpool)
 {
 	struct zbud_pool *pool;

-	pool = zbud_create_pool(gfp, zpool_ops ? &zbud_zpool_ops : NULL);
+	pool = zbud_create_pool(gfp);
 	if (pool) {
 		pool->zpool = zpool;
 		pool->zpool_ops = zpool_ops;
diff --git a/mm/zpool.c b/mm/zpool.c
index 68facc193496..fc3a9893e107 100644
--- a/mm/zpool.c
+++ b/mm/zpool.c
@@ -21,9 +21,6 @@
 struct zpool {
 	struct zpool_driver *driver;
 	void *pool;
-	const struct zpool_ops *ops;
-	bool evictable;
-	bool can_sleep_mapped;
 };

 static LIST_HEAD(drivers_head);
@@ -177,9 +174,6 @@ struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp,

 	zpool->driver = driver;
 	zpool->pool = driver->create(name, gfp, ops, zpool);
-	zpool->ops = ops;
-	zpool->evictable = driver->shrink && ops && ops->evict;
-	zpool->can_sleep_mapped = driver->sleep_mapped;

 	if (!zpool->pool) {
 		pr_err("couldn't create %s pool\n", type);
@@ -380,7 +374,7 @@ u64 zpool_get_total_size(struct zpool *zpool)
  */
 bool zpool_evictable(struct zpool *zpool)
 {
-	return zpool->evictable;
+	return zpool->driver->shrink;
 }

 /**
@@ -391,7 +385,7 @@ bool zpool_evictable(struct zpool *zpool)
  */
 bool zpool_can_sleep_mapped(struct zpool *zpool)
 {
-	return zpool->can_sleep_mapped;
+	return zpool->driver->sleep_mapped;
 }

 MODULE_LICENSE("GPL");
--
2.30.2


* [PATCH v6 3/6] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
  2022-11-19  0:15 ` [PATCH v6 1/6] zswap: fix writeback lock ordering " Nhat Pham
  2022-11-19  0:15 ` [PATCH v6 2/6] zpool: clean out dead code Nhat Pham
@ 2022-11-19  0:15 ` Nhat Pham
  2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

Currently, zsmalloc has a hierarchy of locks, which includes a
pool-level migrate_lock, and a lock for each size class. We have to
obtain both locks in the hotpath in most cases anyway, except for
zs_malloc. This exception will no longer exist when we introduce a LRU
into the zs_pool for the new writeback functionality - we will need to
obtain a pool-level lock to synchronize LRU handling even in zs_malloc.

In preparation for zsmalloc writeback, consolidate these locks into a
single pool-level lock, which drastically reduces the complexity of
synchronization in zsmalloc.

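For illustration, the zs_free() hot path goes from nesting two locks to
taking a single one (a simplified sketch based on the hunks below):

	/* before */
	read_lock(&pool->migrate_lock);
	/* look up the zspage and its class from the handle */
	spin_lock(&class->lock);
	read_unlock(&pool->migrate_lock);
	obj_free(class->size, obj);
	/* ... */
	spin_unlock(&class->lock);

	/* after */
	spin_lock(&pool->lock);
	/* look up the zspage and its class from the handle */
	obj_free(class->size, obj);
	/* ... */
	spin_unlock(&pool->lock);
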
We have also benchmarked the lock consolidation to see the performance
effect of this change on zram.

First, we ran a synthetic FS workload on a server machine with 36 cores
(same machine for all runs), using

fs_mark  -d  ../zram1mnt  -s  100000  -n  2500  -t  32  -k

before and after for btrfs and ext4 on zram (FS usage is 80%).

Here are the results (in files/second):

With lock consolidation (btrfs):
Average: 13520.2, Median: 13531.0, Stddev: 137.5961482019028

Without lock consolidation (btrfs):
Average: 13487.2, Median: 13575.0, Stddev: 309.08283679298665

With lock consolidation (ext4):
Average: 16824.4, Median: 16839.0, Stddev: 89.97388510006668

Without lock consolidation (ext4):
Average: 16958.0, Median: 16986.0, Stddev: 194.7370021336469

As you can see, we observe a 0.3% regression for btrfs, and a 0.9%
regression for ext4. This is a small, barely measurable difference in my
opinion.

For a more realistic scenario, we also tried building the kernel on zram.
Here is the time it takes (in seconds):

With lock consolidation (btrfs):
real
Average: 319.6, Median: 320.0, Stddev: 0.8944271909999159
user
Average: 6894.2, Median: 6895.0, Stddev: 25.528415540334656
sys
Average: 521.4, Median: 522.0, Stddev: 1.51657508881031

Without lock consolidation (btrfs):
real
Average: 319.8, Median: 320.0, Stddev: 0.8366600265340756
user
Average: 6896.6, Median: 6899.0, Stddev: 16.04057355583023
sys
Average: 520.6, Median: 521.0, Stddev: 1.140175425099138

With lock consolidation (ext4):
real
Average: 320.0, Median: 319.0, Stddev: 1.4142135623730951
user
Average: 6896.8, Median: 6878.0, Stddev: 28.621670111997307
sys
Average: 521.2, Median: 521.0, Stddev: 1.7888543819998317

Without lock consolidation (ext4):
real
Average: 319.6, Median: 319.0, Stddev: 0.8944271909999159
user
Average: 6886.2, Median: 6887.0, Stddev: 16.93221781102523
sys
Average: 520.4, Median: 520.0, Stddev: 1.140175425099138

The difference is entirely within the noise of a typical run on zram. This
hardly justifies the complexity of maintaining both the pool lock and
the class lock. In fact, for writeback, we would need to introduce yet
another lock to prevent data races on the pool's LRU, further
complicating the lock handling logic. IMHO, it is just better to
collapse all of these into a single pool-level lock.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/zsmalloc.c | 87 ++++++++++++++++++++++-----------------------------
 1 file changed, 37 insertions(+), 50 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index d03941cace2c..326faa751f0a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -33,8 +33,7 @@
 /*
  * lock ordering:
  *	page_lock
- *	pool->migrate_lock
- *	class->lock
+ *	pool->lock
  *	zspage->lock
  */

@@ -192,7 +191,6 @@ static const int fullness_threshold_frac = 4;
 static size_t huge_class_size;

 struct size_class {
-	spinlock_t lock;
 	struct list_head fullness_list[NR_ZS_FULLNESS];
 	/*
 	 * Size of objects stored in this class. Must be multiple
@@ -247,8 +245,7 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct work_struct free_work;
 #endif
-	/* protect page/zspage migration */
-	rwlock_t migrate_lock;
+	spinlock_t lock;
 };

 struct zspage {
@@ -355,7 +352,7 @@ static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
 	kmem_cache_free(pool->zspage_cachep, zspage);
 }

-/* class->lock(which owns the handle) synchronizes races */
+/* pool->lock(which owns the handle) synchronizes races */
 static void record_obj(unsigned long handle, unsigned long obj)
 {
 	*(unsigned long *)handle = obj;
@@ -452,7 +449,7 @@ static __maybe_unused int is_first_page(struct page *page)
 	return PagePrivate(page);
 }

-/* Protected by class->lock */
+/* Protected by pool->lock */
 static inline int get_zspage_inuse(struct zspage *zspage)
 {
 	return zspage->inuse;
@@ -597,13 +594,13 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		if (class->index != i)
 			continue;

-		spin_lock(&class->lock);
+		spin_lock(&pool->lock);
 		class_almost_full = zs_stat_get(class, CLASS_ALMOST_FULL);
 		class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY);
 		obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
 		obj_used = zs_stat_get(class, OBJ_USED);
 		freeable = zs_can_compact(class);
-		spin_unlock(&class->lock);
+		spin_unlock(&pool->lock);

 		objs_per_zspage = class->objs_per_zspage;
 		pages_used = obj_allocated / objs_per_zspage *
@@ -916,7 +913,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,

 	get_zspage_mapping(zspage, &class_idx, &fg);

-	assert_spin_locked(&class->lock);
+	assert_spin_locked(&pool->lock);

 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(fg != ZS_EMPTY);
@@ -1247,19 +1244,19 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	BUG_ON(in_interrupt());

 	/* It guarantees it can get zspage from handle safely */
-	read_lock(&pool->migrate_lock);
+	spin_lock(&pool->lock);
 	obj = handle_to_obj(handle);
 	obj_to_location(obj, &page, &obj_idx);
 	zspage = get_zspage(page);

 	/*
-	 * migration cannot move any zpages in this zspage. Here, class->lock
+	 * migration cannot move any zpages in this zspage. Here, pool->lock
 	 * is too heavy since callers would take some time until they calls
 	 * zs_unmap_object API so delegate the locking from class to zspage
 	 * which is smaller granularity.
 	 */
 	migrate_read_lock(zspage);
-	read_unlock(&pool->migrate_lock);
+	spin_unlock(&pool->lock);

 	class = zspage_class(pool, zspage);
 	off = (class->size * obj_idx) & ~PAGE_MASK;
@@ -1412,8 +1409,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	size += ZS_HANDLE_SIZE;
 	class = pool->size_class[get_size_class_index(size)];

-	/* class->lock effectively protects the zpage migration */
-	spin_lock(&class->lock);
+	/* pool->lock effectively protects the zpage migration */
+	spin_lock(&pool->lock);
 	zspage = find_get_zspage(class);
 	if (likely(zspage)) {
 		obj = obj_malloc(pool, zspage, handle);
@@ -1421,12 +1418,12 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		fix_fullness_group(class, zspage);
 		record_obj(handle, obj);
 		class_stat_inc(class, OBJ_USED, 1);
-		spin_unlock(&class->lock);
+		spin_unlock(&pool->lock);

 		return handle;
 	}

-	spin_unlock(&class->lock);
+	spin_unlock(&pool->lock);

 	zspage = alloc_zspage(pool, class, gfp);
 	if (!zspage) {
@@ -1434,7 +1431,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		return (unsigned long)ERR_PTR(-ENOMEM);
 	}

-	spin_lock(&class->lock);
+	spin_lock(&pool->lock);
 	obj = obj_malloc(pool, zspage, handle);
 	newfg = get_fullness_group(class, zspage);
 	insert_zspage(class, zspage, newfg);
@@ -1447,7 +1444,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)

 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
-	spin_unlock(&class->lock);
+	spin_unlock(&pool->lock);

 	return handle;
 }
@@ -1491,16 +1488,14 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 		return;

 	/*
-	 * The pool->migrate_lock protects the race with zpage's migration
+	 * The pool->lock protects the race with zpage's migration
 	 * so it's safe to get the page from handle.
 	 */
-	read_lock(&pool->migrate_lock);
+	spin_lock(&pool->lock);
 	obj = handle_to_obj(handle);
 	obj_to_page(obj, &f_page);
 	zspage = get_zspage(f_page);
 	class = zspage_class(pool, zspage);
-	spin_lock(&class->lock);
-	read_unlock(&pool->migrate_lock);

 	obj_free(class->size, obj);
 	class_stat_dec(class, OBJ_USED, 1);
@@ -1510,7 +1505,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)

 	free_zspage(pool, class, zspage);
 out:
-	spin_unlock(&class->lock);
+	spin_unlock(&pool->lock);
 	cache_free_handle(pool, handle);
 }
 EXPORT_SYMBOL_GPL(zs_free);
@@ -1867,16 +1862,12 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	pool = zspage->pool;

 	/*
-	 * The pool migrate_lock protects the race between zpage migration
+	 * The pool's lock protects the race between zpage migration
 	 * and zs_free.
 	 */
-	write_lock(&pool->migrate_lock);
+	spin_lock(&pool->lock);
 	class = zspage_class(pool, zspage);

-	/*
-	 * the class lock protects zpage alloc/free in the zspage.
-	 */
-	spin_lock(&class->lock);
 	/* the migrate_write_lock protects zpage access via zs_map_object */
 	migrate_write_lock(zspage);

@@ -1906,10 +1897,9 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	replace_sub_page(class, zspage, newpage, page);
 	/*
 	 * Since we complete the data copy and set up new zspage structure,
-	 * it's okay to release migration_lock.
+	 * it's okay to release the pool's lock.
 	 */
-	write_unlock(&pool->migrate_lock);
-	spin_unlock(&class->lock);
+	spin_unlock(&pool->lock);
 	dec_zspage_isolation(zspage);
 	migrate_write_unlock(zspage);

@@ -1964,9 +1954,9 @@ static void async_free_zspage(struct work_struct *work)
 		if (class->index != i)
 			continue;

-		spin_lock(&class->lock);
+		spin_lock(&pool->lock);
 		list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages);
-		spin_unlock(&class->lock);
+		spin_unlock(&pool->lock);
 	}

 	list_for_each_entry_safe(zspage, tmp, &free_pages, list) {
@@ -1976,9 +1966,9 @@ static void async_free_zspage(struct work_struct *work)
 		get_zspage_mapping(zspage, &class_idx, &fullness);
 		VM_BUG_ON(fullness != ZS_EMPTY);
 		class = pool->size_class[class_idx];
-		spin_lock(&class->lock);
+		spin_lock(&pool->lock);
 		__free_zspage(pool, class, zspage);
-		spin_unlock(&class->lock);
+		spin_unlock(&pool->lock);
 	}
 };

@@ -2039,10 +2029,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	struct zspage *dst_zspage = NULL;
 	unsigned long pages_freed = 0;

-	/* protect the race between zpage migration and zs_free */
-	write_lock(&pool->migrate_lock);
-	/* protect zpage allocation/free */
-	spin_lock(&class->lock);
+	/*
+	 * protect the race between zpage migration and zs_free
+	 * as well as zpage allocation/free
+	 */
+	spin_lock(&pool->lock);
 	while ((src_zspage = isolate_zspage(class, true))) {
 		/* protect someone accessing the zspage(i.e., zs_map_object) */
 		migrate_write_lock(src_zspage);
@@ -2067,7 +2058,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 			putback_zspage(class, dst_zspage);
 			migrate_write_unlock(dst_zspage);
 			dst_zspage = NULL;
-			if (rwlock_is_contended(&pool->migrate_lock))
+			if (spin_is_contended(&pool->lock))
 				break;
 		}

@@ -2084,11 +2075,9 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 			pages_freed += class->pages_per_zspage;
 		} else
 			migrate_write_unlock(src_zspage);
-		spin_unlock(&class->lock);
-		write_unlock(&pool->migrate_lock);
+		spin_unlock(&pool->lock);
 		cond_resched();
-		write_lock(&pool->migrate_lock);
-		spin_lock(&class->lock);
+		spin_lock(&pool->lock);
 	}

 	if (src_zspage) {
@@ -2096,8 +2085,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		migrate_write_unlock(src_zspage);
 	}

-	spin_unlock(&class->lock);
-	write_unlock(&pool->migrate_lock);
+	spin_unlock(&pool->lock);

 	return pages_freed;
 }
@@ -2200,7 +2188,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		return NULL;

 	init_deferred_free(pool);
-	rwlock_init(&pool->migrate_lock);
+	spin_lock_init(&pool->lock);

 	pool->name = kstrdup(name, GFP_KERNEL);
 	if (!pool->name)
@@ -2271,7 +2259,6 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
-		spin_lock_init(&class->lock);
 		pool->size_class[i] = class;
 		for (fullness = ZS_EMPTY; fullness < NR_ZS_FULLNESS;
 							fullness++)
--
2.30.2


* [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
                   ` (2 preceding siblings ...)
  2022-11-19  0:15 ` [PATCH v6 3/6] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks Nhat Pham
@ 2022-11-19  0:15 ` Nhat Pham
  2022-11-19 16:38   ` Johannes Weiner
                     ` (3 more replies)
  2022-11-19  0:15 ` [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers Nhat Pham
                   ` (2 subsequent siblings)
  6 siblings, 4 replies; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

This helps determine the coldest zspages as candidates for writeback.

Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/zsmalloc.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 326faa751f0a..7dd464b5a6a5 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -239,6 +239,11 @@ struct zs_pool {
 	/* Compact classes */
 	struct shrinker shrinker;

+#ifdef CONFIG_ZPOOL
+	/* List tracking the zspages in LRU order by most recently added object */
+	struct list_head lru;
+#endif
+
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct dentry *stat_dentry;
 #endif
@@ -260,6 +265,12 @@ struct zspage {
 	unsigned int freeobj;
 	struct page *first_page;
 	struct list_head list; /* fullness list */
+
+#ifdef CONFIG_ZPOOL
+	/* links the zspage to the lru list in the pool */
+	struct list_head lru;
+#endif
+
 	struct zs_pool *pool;
 #ifdef CONFIG_COMPACTION
 	rwlock_t lock;
@@ -953,6 +964,9 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
 	}

 	remove_zspage(class, zspage, ZS_EMPTY);
+#ifdef CONFIG_ZPOOL
+	list_del(&zspage->lru);
+#endif
 	__free_zspage(pool, class, zspage);
 }

@@ -998,6 +1012,10 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
 		off %= PAGE_SIZE;
 	}

+#ifdef CONFIG_ZPOOL
+	INIT_LIST_HEAD(&zspage->lru);
+#endif
+
 	set_freeobj(zspage, 0);
 }

@@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	obj_to_location(obj, &page, &obj_idx);
 	zspage = get_zspage(page);

+#ifdef CONFIG_ZPOOL
+	/* Move the zspage to front of pool's LRU */
+	if (mm == ZS_MM_WO) {
+		if (!list_empty(&zspage->lru))
+			list_del(&zspage->lru);
+		list_add(&zspage->lru, &pool->lru);
+	}
+#endif
+
 	/*
 	 * migration cannot move any zpages in this zspage. Here, pool->lock
 	 * is too heavy since callers would take some time until they calls
@@ -1967,6 +1994,9 @@ static void async_free_zspage(struct work_struct *work)
 		VM_BUG_ON(fullness != ZS_EMPTY);
 		class = pool->size_class[class_idx];
 		spin_lock(&pool->lock);
+#ifdef CONFIG_ZPOOL
+		list_del(&zspage->lru);
+#endif
 		__free_zspage(pool, class, zspage);
 		spin_unlock(&pool->lock);
 	}
@@ -2278,6 +2308,10 @@ struct zs_pool *zs_create_pool(const char *name)
 	 */
 	zs_register_shrinker(pool);

+#ifdef CONFIG_ZPOOL
+	INIT_LIST_HEAD(&pool->lru);
+#endif
+
 	return pool;

 err:
--
2.30.2


* [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
                   ` (3 preceding siblings ...)
  2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
@ 2022-11-19  0:15 ` Nhat Pham
  2022-11-19 16:39   ` Johannes Weiner
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
  2022-11-21 19:29 ` [PATCH v6 0/6] Implement writeback " Nhat Pham
  6 siblings, 1 reply; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

This adds a new field to zs_pool to store evict handlers for writeback,
analogous to the zbud allocator.

Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7dd464b5a6a5..9920f3584511 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -242,6 +242,8 @@ struct zs_pool {
 #ifdef CONFIG_ZPOOL
 	/* List tracking the zspages in LRU order by most recently added object */
 	struct list_head lru;
+	struct zpool *zpool;
+	const struct zpool_ops *zpool_ops;
 #endif

 #ifdef CONFIG_ZSMALLOC_STAT
@@ -382,7 +384,14 @@ static void *zs_zpool_create(const char *name, gfp_t gfp,
 	 * different contexts and its caller must provide a valid
 	 * gfp mask.
 	 */
-	return zs_create_pool(name);
+	struct zs_pool *pool = zs_create_pool(name);
+
+	if (pool) {
+		pool->zpool = zpool;
+		pool->zpool_ops = zpool_ops;
+	}
+
+	return pool;
 }

 static void zs_zpool_destroy(void *pool)
--
2.30.2


* [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
                   ` (4 preceding siblings ...)
  2022-11-19  0:15 ` [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers Nhat Pham
@ 2022-11-19  0:15 ` Nhat Pham
  2022-11-19 16:45   ` Johannes Weiner
                     ` (5 more replies)
  2022-11-21 19:29 ` [PATCH v6 0/6] Implement writeback " Nhat Pham
  6 siblings, 6 replies; 39+ messages in thread
From: Nhat Pham @ 2022-11-19  0:15 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

This commit adds the writeback mechanism for zsmalloc, analogous to the
zbud allocator. Zsmalloc will attempt to determine the coldest zspage
(i.e. least recently used) in the pool, and attempt to write back all the
stored compressed objects via the pool's evict handler.

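Roughly, the new reclaim path works as follows (a simplified outline of
zs_reclaim_page() as added below; the evict handler is zswap's writeback
callback):

	/*
	 * zs_reclaim_page():
	 *   under pool->lock, take the zspage at the tail of pool->lru,
	 *   mark it under_reclaim and remove it from its fullness list;
	 *   then, for each allocated object in the zspage:
	 *     pool->zpool_ops->evict(pool->zpool, handle)
	 *       -> the handler writes the object back and calls zs_free();
	 *       -> zs_free() frees the object, but under_reclaim makes it
	 *          defer freeing the handle and the zspage itself;
	 *   finally, if the zspage is now empty, free it (together with the
	 *   deferred handles); otherwise put it back on the fullness list
	 *   and the pool LRU.
	 */
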
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/zsmalloc.c | 193 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 182 insertions(+), 11 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9920f3584511..3fba04e10227 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -271,12 +271,13 @@ struct zspage {
 #ifdef CONFIG_ZPOOL
 	/* links the zspage to the lru list in the pool */
 	struct list_head lru;
+	bool under_reclaim;
+	/* list of unfreed handles whose objects have been reclaimed */
+	unsigned long *deferred_handles;
 #endif

 	struct zs_pool *pool;
-#ifdef CONFIG_COMPACTION
 	rwlock_t lock;
-#endif
 };

 struct mapping_area {
@@ -297,10 +298,11 @@ static bool ZsHugePage(struct zspage *zspage)
 	return zspage->huge;
 }

-#ifdef CONFIG_COMPACTION
 static void migrate_lock_init(struct zspage *zspage);
 static void migrate_read_lock(struct zspage *zspage);
 static void migrate_read_unlock(struct zspage *zspage);
+
+#ifdef CONFIG_COMPACTION
 static void migrate_write_lock(struct zspage *zspage);
 static void migrate_write_lock_nested(struct zspage *zspage);
 static void migrate_write_unlock(struct zspage *zspage);
@@ -308,9 +310,6 @@ static void kick_deferred_free(struct zs_pool *pool);
 static void init_deferred_free(struct zs_pool *pool);
 static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage);
 #else
-static void migrate_lock_init(struct zspage *zspage) {}
-static void migrate_read_lock(struct zspage *zspage) {}
-static void migrate_read_unlock(struct zspage *zspage) {}
 static void migrate_write_lock(struct zspage *zspage) {}
 static void migrate_write_lock_nested(struct zspage *zspage) {}
 static void migrate_write_unlock(struct zspage *zspage) {}
@@ -413,6 +412,27 @@ static void zs_zpool_free(void *pool, unsigned long handle)
 	zs_free(pool, handle);
 }

+static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries);
+
+static int zs_zpool_shrink(void *pool, unsigned int pages,
+			unsigned int *reclaimed)
+{
+	unsigned int total = 0;
+	int ret = -EINVAL;
+
+	while (total < pages) {
+		ret = zs_reclaim_page(pool, 8);
+		if (ret < 0)
+			break;
+		total++;
+	}
+
+	if (reclaimed)
+		*reclaimed = total;
+
+	return ret;
+}
+
 static void *zs_zpool_map(void *pool, unsigned long handle,
 			enum zpool_mapmode mm)
 {
@@ -451,6 +471,7 @@ static struct zpool_driver zs_zpool_driver = {
 	.malloc_support_movable = true,
 	.malloc =		  zs_zpool_malloc,
 	.free =			  zs_zpool_free,
+	.shrink =		  zs_zpool_shrink,
 	.map =			  zs_zpool_map,
 	.unmap =		  zs_zpool_unmap,
 	.total_size =		  zs_zpool_total_size,
@@ -924,6 +945,25 @@ static int trylock_zspage(struct zspage *zspage)
 	return 0;
 }

+#ifdef CONFIG_ZPOOL
+/*
+ * Free all the deferred handles whose objects are freed in zs_free.
+ */
+static void free_handles(struct zs_pool *pool, struct zspage *zspage)
+{
+	unsigned long handle = (unsigned long)zspage->deferred_handles;
+
+	while (handle) {
+		unsigned long nxt_handle = handle_to_obj(handle);
+
+		cache_free_handle(pool, handle);
+		handle = nxt_handle;
+	}
+}
+#else
+static inline void free_handles(struct zs_pool *pool, struct zspage *zspage) {}
+#endif
+
 static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 				struct zspage *zspage)
 {
@@ -938,6 +978,9 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(fg != ZS_EMPTY);

+	/* Free all deferred handles from zs_free */
+	free_handles(pool, zspage);
+
 	next = page = get_first_page(zspage);
 	do {
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1023,6 +1066,8 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)

 #ifdef CONFIG_ZPOOL
 	INIT_LIST_HEAD(&zspage->lru);
+	zspage->under_reclaim = false;
+	zspage->deferred_handles = NULL;
 #endif

 	set_freeobj(zspage, 0);
@@ -1535,12 +1580,26 @@ void zs_free(struct zs_pool *pool, unsigned long handle)

 	obj_free(class->size, obj);
 	class_stat_dec(class, OBJ_USED, 1);
+
+#ifdef CONFIG_ZPOOL
+	if (zspage->under_reclaim) {
+		/*
+		 * Reclaim needs the handles during writeback. It'll free
+		 * them along with the zspage when it's done with them.
+		 *
+		 * Record current deferred handle at the memory location
+		 * whose address is given by handle.
+		 */
+		record_obj(handle, (unsigned long)zspage->deferred_handles);
+		zspage->deferred_handles = (unsigned long *)handle;
+		spin_unlock(&pool->lock);
+		return;
+	}
+#endif
 	fullness = fix_fullness_group(class, zspage);
-	if (fullness != ZS_EMPTY)
-		goto out;
+	if (fullness == ZS_EMPTY)
+		free_zspage(pool, class, zspage);

-	free_zspage(pool, class, zspage);
-out:
 	spin_unlock(&pool->lock);
 	cache_free_handle(pool, handle);
 }
@@ -1740,7 +1799,7 @@ static enum fullness_group putback_zspage(struct size_class *class,
 	return fullness;
 }

-#ifdef CONFIG_COMPACTION
+#if defined(CONFIG_ZPOOL) || defined(CONFIG_COMPACTION)
 /*
  * To prevent zspage destroy during migration, zspage freeing should
  * hold locks of all pages in the zspage.
@@ -1782,6 +1841,24 @@ static void lock_zspage(struct zspage *zspage)
 	}
 	migrate_read_unlock(zspage);
 }
+#endif /* defined(CONFIG_ZPOOL) || defined(CONFIG_COMPACTION) */
+
+#ifdef CONFIG_ZPOOL
+/*
+ * Unlocks all the pages of the zspage.
+ *
+ * pool->lock must be held before this function is called
+ * to prevent the underlying pages from migrating.
+ */
+static void unlock_zspage(struct zspage *zspage)
+{
+	struct page *page = get_first_page(zspage);
+
+	do {
+		unlock_page(page);
+	} while ((page = get_next_page(page)) != NULL);
+}
+#endif /* CONFIG_ZPOOL */

 static void migrate_lock_init(struct zspage *zspage)
 {
@@ -1798,6 +1875,7 @@ static void migrate_read_unlock(struct zspage *zspage) __releases(&zspage->lock)
 	read_unlock(&zspage->lock);
 }

+#ifdef CONFIG_COMPACTION
 static void migrate_write_lock(struct zspage *zspage)
 {
 	write_lock(&zspage->lock);
@@ -2362,6 +2440,99 @@ void zs_destroy_pool(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_destroy_pool);

+#ifdef CONFIG_ZPOOL
+static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
+{
+	int i, obj_idx, ret = 0;
+	unsigned long handle;
+	struct zspage *zspage;
+	struct page *page;
+	enum fullness_group fullness;
+
+	/* Lock LRU and fullness list */
+	spin_lock(&pool->lock);
+	if (list_empty(&pool->lru)) {
+		spin_unlock(&pool->lock);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < retries; i++) {
+		struct size_class *class;
+
+		zspage = list_last_entry(&pool->lru, struct zspage, lru);
+		list_del(&zspage->lru);
+
+		/* zs_free may free objects, but not the zspage and handles */
+		zspage->under_reclaim = true;
+
+		class = zspage_class(pool, zspage);
+		fullness = get_fullness_group(class, zspage);
+
+		/* Lock out object allocations and object compaction */
+		remove_zspage(class, zspage, fullness);
+
+		spin_unlock(&pool->lock);
+
+		/* Lock backing pages into place */
+		lock_zspage(zspage);
+
+		obj_idx = 0;
+		page = zspage->first_page;
+		while (1) {
+			handle = find_alloced_obj(class, page, &obj_idx);
+			if (!handle) {
+				page = get_next_page(page);
+				if (!page)
+					break;
+				obj_idx = 0;
+				continue;
+			}
+
+			/*
+			 * This will write the object and call zs_free.
+			 *
+			 * zs_free will free the object, but the
+			 * under_reclaim flag prevents it from freeing
+			 * the zspage altogether. This is necessary so
+			 * that we can continue working with the
+			 * zspage potentially after the last object
+			 * has been freed.
+			 */
+			ret = pool->zpool_ops->evict(pool->zpool, handle);
+			if (ret)
+				goto next;
+
+			obj_idx++;
+		}
+
+next:
+		/* For freeing the zspage, or putting it back in the pool and LRU list. */
+		spin_lock(&pool->lock);
+		zspage->under_reclaim = false;
+
+		if (!get_zspage_inuse(zspage)) {
+			/*
+			 * Fullness went stale as zs_free() won't touch it
+			 * while the page is removed from the pool. Fix it
+			 * up for the check in __free_zspage().
+			 */
+			zspage->fullness = ZS_EMPTY;
+
+			__free_zspage(pool, class, zspage);
+			spin_unlock(&pool->lock);
+			return 0;
+		}
+
+		putback_zspage(class, zspage);
+		list_add(&zspage->lru, &pool->lru);
+		unlock_zspage(zspage);
+	}
+
+	spin_unlock(&pool->lock);
+	return -EAGAIN;
+}
+#endif /* CONFIG_ZPOOL */
+
 static int __init zs_init(void)
 {
 	int ret;
--
2.30.2


* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
@ 2022-11-19 16:38   ` Johannes Weiner
  2022-11-19 17:34   ` Minchan Kim
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: Johannes Weiner @ 2022-11-19 16:38 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

On Fri, Nov 18, 2022 at 04:15:34PM -0800, Nhat Pham wrote:
> This helps determines the coldest zspages as candidates for writeback.
> 
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

This looks good to me. There are more ifdefs than usual, but in this
case they actually really nicely annotate exactly which hunks need to
move to zswap (as CONFIG_ZPOOL == CONFIG_ZSWAP) when we unify the LRU!
zbud and z3fold don't have those helpful annotations (since they're
zswap-only to begin with), which will make their conversion a bit more
laborious. But zsmalloc can be a (rough) guiding template for them.

Thanks


* Re: [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers
  2022-11-19  0:15 ` [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers Nhat Pham
@ 2022-11-19 16:39   ` Johannes Weiner
  2022-11-22  1:11     ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2022-11-19 16:39 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

On Fri, Nov 18, 2022 at 04:15:35PM -0800, Nhat Pham wrote:
> This adds a new field to zs_pool to store evict handlers for writeback,
> analogous to the zbud allocator.
> 
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> Acked-by: Minchan Kim <minchan@kernel.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Nice, much simpler. This should make Sergey happy too :)


* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
@ 2022-11-19 16:45   ` Johannes Weiner
  2022-11-19 17:35   ` Minchan Kim
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Johannes Weiner @ 2022-11-19 16:45 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

On Fri, Nov 18, 2022 at 04:15:36PM -0800, Nhat Pham wrote:
> This commit adds the writeback mechanism for zsmalloc, analogous to the
> zbud allocator. Zsmalloc will attempt to determine the coldest zspage
> (i.e least recently used) in the pool, and attempt to write back all the
> stored compressed objects via the pool's evict handler.
> 
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Excellent!


* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
  2022-11-19 16:38   ` Johannes Weiner
@ 2022-11-19 17:34   ` Minchan Kim
  2022-11-22  1:52   ` Sergey Senozhatsky
  2022-11-23  3:58   ` Sergey Senozhatsky
  3 siblings, 0 replies; 39+ messages in thread
From: Minchan Kim @ 2022-11-19 17:34 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

On Fri, Nov 18, 2022 at 04:15:34PM -0800, Nhat Pham wrote:
> This helps determines the coldest zspages as candidates for writeback.
> 
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>


* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
  2022-11-19 16:45   ` Johannes Weiner
@ 2022-11-19 17:35   ` Minchan Kim
  2022-11-22  1:40   ` Sergey Senozhatsky
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Minchan Kim @ 2022-11-19 17:35 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

On Fri, Nov 18, 2022 at 04:15:36PM -0800, Nhat Pham wrote:
> This commit adds the writeback mechanism for zsmalloc, analogous to the
> zbud allocator. Zsmalloc will attempt to determine the coldest zspage
> (i.e least recently used) in the pool, and attempt to write back all the
> stored compressed objects via the pool's evict handler.
> 
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>

Thanks.


* Re: [PATCH v6 0/6] Implement writeback for zsmalloc
  2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
                   ` (5 preceding siblings ...)
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
@ 2022-11-21 19:29 ` Nhat Pham
  2022-11-23 19:26   ` Nhat Pham
  6 siblings, 1 reply; 39+ messages in thread
From: Nhat Pham @ 2022-11-21 19:29 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

Hi Andrew, looks like Minchan is on board with the series - the concerns
about the later patches have been resolved. Feel free to cherry-pick
this series back to your mm-unstable branch!


* Re: [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers
  2022-11-19 16:39   ` Johannes Weiner
@ 2022-11-22  1:11     ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  1:11 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Nhat Pham, akpm, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/19 11:39), Johannes Weiner wrote:
> On Fri, Nov 18, 2022 at 04:15:35PM -0800, Nhat Pham wrote:
> > This adds a new field to zs_pool to store evict handlers for writeback,
> > analogous to the zbud allocator.
> > 
> > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> > Acked-by: Minchan Kim <minchan@kernel.org>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Nice, much simpler. This should make Sergey happy too :)

Yup looks good :)


* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
  2022-11-19 16:45   ` Johannes Weiner
  2022-11-19 17:35   ` Minchan Kim
@ 2022-11-22  1:40   ` Sergey Senozhatsky
  2022-11-22  2:00   ` Sergey Senozhatsky
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  1:40 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
> +static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries);
> +
> +static int zs_zpool_shrink(void *pool, unsigned int pages,
> +			unsigned int *reclaimed)
> +{
> +	unsigned int total = 0;
> +	int ret = -EINVAL;
> +
> +	while (total < pages) {
> +		ret = zs_reclaim_page(pool, 8);

Just curious why 8 retries and how was 8 picked?

> +		if (ret < 0)
> +			break;
> +		total++;
> +	}
> +
> +	if (reclaimed)
> +		*reclaimed = total;
> +
> +	return ret;
> +}


* Re: [PATCH v6 1/6] zswap: fix writeback lock ordering for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 1/6] zswap: fix writeback lock ordering " Nhat Pham
@ 2022-11-22  1:43   ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  1:43 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
> @@ -958,7 +958,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
>  	};
> 
>  	if (!zpool_can_sleep_mapped(pool)) {
> -		tmp = kmalloc(PAGE_SIZE, GFP_ATOMIC);
> +		tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
>  		if (!tmp)
>  			return -ENOMEM;
>  	}

I guess this chunk is not related to the zsmalloc lock ordering fix.
Should it be a separate patch? Also, feel free to squash my patch,
which does a similar thing:

https://lore.kernel.org/all/20221122013338.3696079-1-senozhatsky@chromium.org/


* Re: [PATCH v6 2/6] zpool: clean out dead code
  2022-11-19  0:15 ` [PATCH v6 2/6] zpool: clean out dead code Nhat Pham
@ 2022-11-22  1:46   ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  1:46 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
> There is a lot of provision for flexibility that isn't actually needed
> or used. Zswap (the only zpool user) always passes zpool_ops with an
> .evict method set. The backends who reclaim only do so for zswap, so
> they can also directly call zpool_ops without indirection or checks.
> 
> Finally, there is no need to check the retries parameters and bail
> with -EINVAL in the reclaim function, when that's called just a few
> lines below with a hard-coded 8. There is no need to duplicate the
> evictable and sleep_mapped attrs from the driver in zpool_ops.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>


* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
  2022-11-19 16:38   ` Johannes Weiner
  2022-11-19 17:34   ` Minchan Kim
@ 2022-11-22  1:52   ` Sergey Senozhatsky
  2022-11-22 17:42     ` Johannes Weiner
  2022-11-23  3:58   ` Sergey Senozhatsky
  3 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  1:52 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
[..]
> @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
>  	obj_to_location(obj, &page, &obj_idx);
>  	zspage = get_zspage(page);
> 
> +#ifdef CONFIG_ZPOOL
> +	/* Move the zspage to front of pool's LRU */
> +	if (mm == ZS_MM_WO) {
> +		if (!list_empty(&zspage->lru))
> +			list_del(&zspage->lru);
> +		list_add(&zspage->lru, &pool->lru);
> +	}
> +#endif

Do we consider pages that were mapped for MM_RO/MM_RW as cold?
I wonder why; we do use them, so technically they are not exactly
"least recently used".


* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
                     ` (2 preceding siblings ...)
  2022-11-22  1:40   ` Sergey Senozhatsky
@ 2022-11-22  2:00   ` Sergey Senozhatsky
  2022-11-22  2:15   ` Sergey Senozhatsky
  2022-11-22  6:37   ` Sergey Senozhatsky
  5 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  2:00 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
> +static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
> +{
> +	int i, obj_idx, ret = 0;
> +	unsigned long handle;
> +	struct zspage *zspage;
> +	struct page *page;
> +	enum fullness_group fullness;
> +
> +	/* Lock LRU and fullness list */
> +	spin_lock(&pool->lock);
> +	if (list_empty(&pool->lru)) {
> +		spin_unlock(&pool->lock);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < retries; i++) {
> +		struct size_class *class;
> +
> +		zspage = list_last_entry(&pool->lru, struct zspage, lru);
> +		list_del(&zspage->lru);
> +
> +		/* zs_free may free objects, but not the zspage and handles */
> +		zspage->under_reclaim = true;
> +
> +		class = zspage_class(pool, zspage);
> +		fullness = get_fullness_group(class, zspage);
> +
> +		/* Lock out object allocations and object compaction */
> +		remove_zspage(class, zspage, fullness);
> +
> +		spin_unlock(&pool->lock);
> +
> +		/* Lock backing pages into place */
> +		lock_zspage(zspage);
> +
> +		obj_idx = 0;
> +		page = zspage->first_page;

A nit: we usually call get_first_page() in such cases.

> +		while (1) {
> +			handle = find_alloced_obj(class, page, &obj_idx);
> +			if (!handle) {
> +				page = get_next_page(page);
> +				if (!page)
> +					break;
> +				obj_idx = 0;
> +				continue;
> +			}
> +
> +			/*
> +			 * This will write the object and call zs_free.
> +			 *
> +			 * zs_free will free the object, but the
> +			 * under_reclaim flag prevents it from freeing
> +			 * the zspage altogether. This is necessary so
> +			 * that we can continue working with the
> +			 * zspage potentially after the last object
> +			 * has been freed.
> +			 */
> +			ret = pool->zpool_ops->evict(pool->zpool, handle);
> +			if (ret)
> +				goto next;
> +
> +			obj_idx++;
> +		}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
                     ` (3 preceding siblings ...)
  2022-11-22  2:00   ` Sergey Senozhatsky
@ 2022-11-22  2:15   ` Sergey Senozhatsky
  2022-11-22  3:12     ` Johannes Weiner
  2022-11-22  6:37   ` Sergey Senozhatsky
  5 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  2:15 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
> +
> +static int zs_zpool_shrink(void *pool, unsigned int pages,
> +			unsigned int *reclaimed)
> +{
> +	unsigned int total = 0;
> +	int ret = -EINVAL;
> +
> +	while (total < pages) {
> +		ret = zs_reclaim_page(pool, 8);
> +		if (ret < 0)
> +			break;
> +		total++;
> +	}
> +
> +	if (reclaimed)
> +		*reclaimed = total;
> +
> +	return ret;
> +}

A silly question: why do we need a retry loop in zs_reclaim_page()?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  2:15   ` Sergey Senozhatsky
@ 2022-11-22  3:12     ` Johannes Weiner
  2022-11-22  3:42       ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2022-11-22  3:12 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Nhat Pham, akpm, linux-mm, linux-kernel, minchan, ngupta,
	sjenning, ddstreet, vitaly.wool

On Tue, Nov 22, 2022 at 11:15:20AM +0900, Sergey Senozhatsky wrote:
> On (22/11/18 16:15), Nhat Pham wrote:
> > +
> > +static int zs_zpool_shrink(void *pool, unsigned int pages,
> > +			unsigned int *reclaimed)
> > +{
> > +	unsigned int total = 0;
> > +	int ret = -EINVAL;
> > +
> > +	while (total < pages) {
> > +		ret = zs_reclaim_page(pool, 8);
> > +		if (ret < 0)
> > +			break;
> > +		total++;
> > +	}
> > +
> > +	if (reclaimed)
> > +		*reclaimed = total;
> > +
> > +	return ret;
> > +}
> 
> A silly question: why do we need a retry loop in zs_reclaim_page()?

Individual objects in a zspage can be busy (swapped in simultaneously
for example), which will prevent the zspage from being freed. Zswap
currently requests reclaim of one backend page at a time (another
project...), so if we don't retry we're not meeting the reclaim goal
and cause rejections for new stores. Rejections are worse than moving
on to the adjacent LRU item, because a rejected page, which should be
at the head of the LRU, bypasses the list and goes straight to swap.

The number 8 is cribbed from zbud and z3fold. It works well in
practice, but is as arbitrary as MAX_RECLAIM_RETRIES used all over MM.
We may want to revisit it at some point, but we should probably do it
for all backends then.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  3:12     ` Johannes Weiner
@ 2022-11-22  3:42       ` Sergey Senozhatsky
  2022-11-22  6:09         ` Johannes Weiner
  0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  3:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Sergey Senozhatsky, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On (22/11/21 22:12), Johannes Weiner wrote:
> On Tue, Nov 22, 2022 at 11:15:20AM +0900, Sergey Senozhatsky wrote:
> > On (22/11/18 16:15), Nhat Pham wrote:
> > > +
> > > +static int zs_zpool_shrink(void *pool, unsigned int pages,
> > > +			unsigned int *reclaimed)
> > > +{
> > > +	unsigned int total = 0;
> > > +	int ret = -EINVAL;
> > > +
> > > +	while (total < pages) {
> > > +		ret = zs_reclaim_page(pool, 8);
> > > +		if (ret < 0)
> > > +			break;
> > > +		total++;
> > > +	}
> > > +
> > > +	if (reclaimed)
> > > +		*reclaimed = total;
> > > +
> > > +	return ret;
> > > +}
> > 
> > A silly question: why do we need a retry loop in zs_reclaim_page()?
> 
> Individual objects in a zspage can be busy (swapped in simultaneously
> for example), which will prevent the zspage from being freed. Zswap
> currently requests reclaim of one backend page at a time (another
> project...), so if we don't retry we're not meeting the reclaim goal
> and cause rejections for new stores.

What I meant was: if zs_reclaim_page() makes only partial progress
with the current LRU tail zspage and returns -EAGAIN, then we just
don't increment `total` and continue looping in zs_zpool_shrink().
On each iteration zs_reclaim_page() picks the new LRU tail (if any)
and tries to write it back.

> The number 8 is cribbed from zbud and z3fold.

OK.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  3:42       ` Sergey Senozhatsky
@ 2022-11-22  6:09         ` Johannes Weiner
  2022-11-22  6:35           ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2022-11-22  6:09 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Nhat Pham, akpm, linux-mm, linux-kernel, minchan, ngupta,
	sjenning, ddstreet, vitaly.wool

On Tue, Nov 22, 2022 at 12:42:20PM +0900, Sergey Senozhatsky wrote:
> On (22/11/21 22:12), Johannes Weiner wrote:
> > On Tue, Nov 22, 2022 at 11:15:20AM +0900, Sergey Senozhatsky wrote:
> > > On (22/11/18 16:15), Nhat Pham wrote:
> > > > +
> > > > +static int zs_zpool_shrink(void *pool, unsigned int pages,
> > > > +			unsigned int *reclaimed)
> > > > +{
> > > > +	unsigned int total = 0;
> > > > +	int ret = -EINVAL;
> > > > +
> > > > +	while (total < pages) {
> > > > +		ret = zs_reclaim_page(pool, 8);
> > > > +		if (ret < 0)
> > > > +			break;
> > > > +		total++;
> > > > +	}
> > > > +
> > > > +	if (reclaimed)
> > > > +		*reclaimed = total;
> > > > +
> > > > +	return ret;
> > > > +}
> > > 
> > > A silly question: why do we need a retry loop in zs_reclaim_page()?
> > 
> > Individual objects in a zspage can be busy (swapped in simultaneously
> > for example), which will prevent the zspage from being freed. Zswap
> > currently requests reclaim of one backend page at a time (another
> > project...), so if we don't retry we're not meeting the reclaim goal
> > and cause rejections for new stores.
> 
> What I meant was: if zs_reclaim_page() makes only partial progress
> with the current LRU tail zspage and returns -EAGAIN, then we just
> don't increment `total` and continue looping in zs_zpool_shrink().

Hm, but it breaks on -EAGAIN, it doesn't continue.

This makes sense, IMO. zs_reclaim_page() will try to reclaim one page,
but considers up to 8 LRU tail pages until one succeeds. If it does,
it continues (total++). But if one of these calls fails, we exit the
loop, give up and return failure from zs_zpool_shrink().

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  6:09         ` Johannes Weiner
@ 2022-11-22  6:35           ` Sergey Senozhatsky
  2022-11-22  7:10             ` Johannes Weiner
  0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  6:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Sergey Senozhatsky, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On (22/11/22 01:09), Johannes Weiner wrote:
> On Tue, Nov 22, 2022 at 12:42:20PM +0900, Sergey Senozhatsky wrote:
> > On (22/11/21 22:12), Johannes Weiner wrote:
> > > On Tue, Nov 22, 2022 at 11:15:20AM +0900, Sergey Senozhatsky wrote:
> > > > On (22/11/18 16:15), Nhat Pham wrote:

[..]

> > What I meant was: if zs_reclaim_page() makes only partial progress
> > with the current LRU tail zspage and returns -EAGAIN, then we just
> > don't increment `total` and continue looping in zs_zpool_shrink().
> 
> Hm, but it breaks on -EAGAIN, it doesn't continue.

Yes. "What if it would continue". Would it make sense to not
break on EAGAIN?

	while (total < pages) {
		ret = zs_reclaim_page(pool);
		if (ret == -EAGAIN)
			continue;
		if (ret < 0)
			break;
		total++;
	}

Then we don't need retry loop in zs_reclaim_page().

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
                     ` (4 preceding siblings ...)
  2022-11-22  2:15   ` Sergey Senozhatsky
@ 2022-11-22  6:37   ` Sergey Senozhatsky
  2022-11-23 16:30     ` Nhat Pham
  2022-11-23 17:18     ` Johannes Weiner
  5 siblings, 2 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  6:37 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
[..]
> +static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
> +{
> +	int i, obj_idx, ret = 0;
> +	unsigned long handle;
> +	struct zspage *zspage;
> +	struct page *page;
> +	enum fullness_group fullness;
> +
> +	/* Lock LRU and fullness list */
> +	spin_lock(&pool->lock);
> +	if (list_empty(&pool->lru)) {
> +		spin_unlock(&pool->lock);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < retries; i++) {
> +		struct size_class *class;
> +
> +		zspage = list_last_entry(&pool->lru, struct zspage, lru);
> +		list_del(&zspage->lru);
> +
> +		/* zs_free may free objects, but not the zspage and handles */
> +		zspage->under_reclaim = true;
> +
> +		class = zspage_class(pool, zspage);
> +		fullness = get_fullness_group(class, zspage);
> +
> +		/* Lock out object allocations and object compaction */
> +		remove_zspage(class, zspage, fullness);
> +
> +		spin_unlock(&pool->lock);
> +
> +		/* Lock backing pages into place */
> +		lock_zspage(zspage);
> +
> +		obj_idx = 0;
> +		page = zspage->first_page;
> +		while (1) {
> +			handle = find_alloced_obj(class, page, &obj_idx);
> +			if (!handle) {
> +				page = get_next_page(page);
> +				if (!page)
> +					break;
> +				obj_idx = 0;
> +				continue;
> +			}
> +
> +			/*
> +			 * This will write the object and call zs_free.
> +			 *
> +			 * zs_free will free the object, but the
> +			 * under_reclaim flag prevents it from freeing
> +			 * the zspage altogether. This is necessary so
> +			 * that we can continue working with the
> +			 * zspage potentially after the last object
> +			 * has been freed.
> +			 */
> +			ret = pool->zpool_ops->evict(pool->zpool, handle);
> +			if (ret)
> +				goto next;
> +
> +			obj_idx++;
> +		}
> +
> +next:
> +		/* For freeing the zspage, or putting it back in the pool and LRU list. */
> +		spin_lock(&pool->lock);
> +		zspage->under_reclaim = false;
> +
> +		if (!get_zspage_inuse(zspage)) {
> +			/*
> +			 * Fullness went stale as zs_free() won't touch it
> +			 * while the page is removed from the pool. Fix it
> +			 * up for the check in __free_zspage().
> +			 */
> +			zspage->fullness = ZS_EMPTY;
> +
> +			__free_zspage(pool, class, zspage);
> +			spin_unlock(&pool->lock);
> +			return 0;
> +		}
> +
> +		putback_zspage(class, zspage);
> +		list_add(&zspage->lru, &pool->lru);
> +		unlock_zspage(zspage);

We probably better to cond_resched() somewhere here. Or in zs_zpool_shrink()
loop.

> +	}
> +
> +	spin_unlock(&pool->lock);
> +	return -EAGAIN;
> +}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  6:35           ` Sergey Senozhatsky
@ 2022-11-22  7:10             ` Johannes Weiner
  2022-11-22  7:19               ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2022-11-22  7:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Nhat Pham, akpm, linux-mm, linux-kernel, minchan, ngupta,
	sjenning, ddstreet, vitaly.wool

On Tue, Nov 22, 2022 at 03:35:18PM +0900, Sergey Senozhatsky wrote:
> On (22/11/22 01:09), Johannes Weiner wrote:
> > On Tue, Nov 22, 2022 at 12:42:20PM +0900, Sergey Senozhatsky wrote:
> > > On (22/11/21 22:12), Johannes Weiner wrote:
> > > > On Tue, Nov 22, 2022 at 11:15:20AM +0900, Sergey Senozhatsky wrote:
> > > > > On (22/11/18 16:15), Nhat Pham wrote:
> 
> [..]
> 
> > > What I meant was: if zs_reclaim_page() makes only partial progress
> > > with the current LRU tail zspage and returns -EAGAIN, then we just
> > > don't increment `total` and continue looping in zs_zpool_shrink().
> > 
> > Hm, but it breaks on -EAGAIN, it doesn't continue.
> 
> Yes. "What if it would continue". Would it make sense to not
> break on EAGAIN?
> 
> 	while (total < pages) {
> 		ret = zs_reclaim_page(pool);
> 		if (ret == -EAGAIN)
> 			continue;
> 		if (ret < 0)
> 			break;
> 		total++;
> 	}
> 
> Then we don't need retry loop in zs_reclaim_page().

But that's an indefinite busy-loop?

I don't see what the problem with limited retrying in
zs_reclaim_page() is. It's robust and has worked for years.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  7:10             ` Johannes Weiner
@ 2022-11-22  7:19               ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-22  7:19 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Sergey Senozhatsky, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On (22/11/22 02:10), Johannes Weiner wrote:
> > Yes. "What if it would continue". Would it make sense to not
> > break on EAGAIN?
> > 
> > 	while (total < pages) {
> > 		ret = zs_reclaim_page(pool);
> > 		if (ret == -EAGAIN)
> > 			continue;
> > 		if (ret < 0)
> > 			break;
> > 		total++;
> > 	}
> > 
> > Then we don't need retry loop in zs_reclaim_page().
> 
> But that's an indefinite busy-loop?

That would mean that all lru pages constantly have locked objects
and we can only make partial progress.

> I don't see what the problem with limited retrying in
> zs_reclaim_page() is. It's robust and has worked for years.

No problem with it, just asking.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-22  1:52   ` Sergey Senozhatsky
@ 2022-11-22 17:42     ` Johannes Weiner
  2022-11-23  3:50       ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2022-11-22 17:42 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Nhat Pham, akpm, linux-mm, linux-kernel, minchan, ngupta,
	sjenning, ddstreet, vitaly.wool

On Tue, Nov 22, 2022 at 10:52:58AM +0900, Sergey Senozhatsky wrote:
> On (22/11/18 16:15), Nhat Pham wrote:
> [..]
> > @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> >  	obj_to_location(obj, &page, &obj_idx);
> >  	zspage = get_zspage(page);
> > 
> > +#ifdef CONFIG_ZPOOL
> > +	/* Move the zspage to front of pool's LRU */
> > +	if (mm == ZS_MM_WO) {
> > +		if (!list_empty(&zspage->lru))
> > +			list_del(&zspage->lru);
> > +		list_add(&zspage->lru, &pool->lru);
> > +	}
> > +#endif
> 
> Do we consider pages that were mapped for MM_RO/MM_RW as cold?
> I wonder why, we use them, so technically they are not exactly
> "least recently used".

This is a swap LRU. Per definition there are no ongoing accesses to
the memory while the page is swapped out that would make it "hot". A
new entry is hot, then ages to the tail until it gets either written
back or swaps back in. Because of that, the zswap backends have
traditionally had the lru-add in the allocation function (zs_malloc,
zbud_alloc, z3fold_alloc).

Minchan insisted we move it here for zsmalloc, since 'update lru on
data access' is more generic. Unfortunately, one of the data accesses
is when we write the swap entry to disk - during reclaim when the page
is isolated from the LRU! Obviously we MUST NOT put it back on the LRU
mid-reclaim.

So now we have very generic LRU code, and exactly one usecase that
needs exceptions from the generic behavior.

The code is raising questions, not surprisingly. We can add a lengthy
comment to it - a variant of the above text?

My vote would still be to just move it back to zs_malloc, where it
makes sense, is easier to explain, and matches the other backends.
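
For illustration, a rough sketch of what that could look like, assuming
the update moves into zs_malloc() right after the object is allocated,
while pool->lock is still held (the exact placement is an assumption;
the lru fields are the ones added in patch 4):

#ifdef CONFIG_ZPOOL
	/* A newly stored entry is hot: move its zspage to the LRU head */
	if (!list_empty(&zspage->lru))
		list_del(&zspage->lru);
	list_add(&zspage->lru, &pool->lru);
#endif

That would also let the ZS_MM_WO special case in zs_map_object() go away.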

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-22 17:42     ` Johannes Weiner
@ 2022-11-23  3:50       ` Sergey Senozhatsky
  2022-11-23  8:02         ` Yosry Ahmed
  0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-23  3:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Sergey Senozhatsky, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On (22/11/22 12:42), Johannes Weiner wrote:
> On Tue, Nov 22, 2022 at 10:52:58AM +0900, Sergey Senozhatsky wrote:
> > On (22/11/18 16:15), Nhat Pham wrote:
> > [..]
> > > @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> > >  	obj_to_location(obj, &page, &obj_idx);
> > >  	zspage = get_zspage(page);
> > > 
> > > +#ifdef CONFIG_ZPOOL
> > > +	/* Move the zspage to front of pool's LRU */
> > > +	if (mm == ZS_MM_WO) {
> > > +		if (!list_empty(&zspage->lru))
> > > +			list_del(&zspage->lru);
> > > +		list_add(&zspage->lru, &pool->lru);
> > > +	}
> > > +#endif
> > 
> > Do we consider pages that were mapped for MM_RO/MM_RW as cold?
> > I wonder why, we use them, so technically they are not exactly
> > "least recently used".
> 
> This is a swap LRU. Per definition there are no ongoing accesses to
> the memory while the page is swapped out that would make it "hot".

Hmm. Not arguing, just trying to understand some things.

There are no accesses to swapped out pages yes, but zspage holds multiple
objects, which are compressed swapped out pages in this particular case.
For example, zspage in class size 176 (bytes) can hold 93 objects per-zspage,
that is 93 compressed swapped out pages. Consider ZS_FULL zspages which
is at the tail of the LRU list. Suppose that we page-faulted 20 times and
read 20 objects from that zspage, IOW zspage has been in use 20 times very
recently, while writeback still considers it to be "not-used" and will
evict it.

So if this works for you then I'm fine. But we probably, like you suggested,
can document a couple of things here - namely why WRITE access to zspage
counts as "zspage is in use" but READ access to the same zspage does not
count as "zspage is in use".

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
                     ` (2 preceding siblings ...)
  2022-11-22  1:52   ` Sergey Senozhatsky
@ 2022-11-23  3:58   ` Sergey Senozhatsky
  3 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-23  3:58 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, linux-mm, linux-kernel, minchan, ngupta,
	senozhatsky, sjenning, ddstreet, vitaly.wool

On (22/11/18 16:15), Nhat Pham wrote:
> +#ifdef CONFIG_ZPOOL
> +	/* Move the zspage to front of pool's LRU */
> +	if (mm == ZS_MM_WO) {
> +		if (!list_empty(&zspage->lru))
> +			list_del(&zspage->lru);
> +		list_add(&zspage->lru, &pool->lru);
> +	}
> +#endif

Just an idea.

Have you considered having a per-size-class LRU instead of a pool LRU?

Evicting pages from different classes can have a different impact on the
system, in theory. For instance, a ZS_FULL zspage in size class 3264
(bytes) holds 5 compressed objects per zspage, which are 5 compressed
swapped-out pages, while a zspage in size class 176 (bytes) holds 93
compressed objects (swapped pages). Both zspages consist of 4
non-contiguous 0-order physical pages, so when we free a zspage from
either class we release 4 physical pages. However, in terms of major
page faults, evicting a zspage from size class 3264 looks better than
from size class 176: 5 major page faults vs 93 major page faults.
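
Just to illustrate the shape of the idea (the field placement and the
walk order below are assumptions, not a concrete proposal):

struct size_class {
	...
#ifdef CONFIG_ZPOOL
	/* zspages of this class, most recently written first */
	struct list_head lru;
#endif
};

	/*
	 * Reclaim could then walk classes from the largest object size
	 * down, so that freeing one zspage costs the fewest major faults.
	 */
	for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
		class = pool->size_class[i];
		if (!list_empty(&class->lru))
			break;
	}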

Does this make sense?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-23  3:50       ` Sergey Senozhatsky
@ 2022-11-23  8:02         ` Yosry Ahmed
  2022-11-23  8:11           ` Yosry Ahmed
  2022-11-24  3:21           ` Sergey Senozhatsky
  0 siblings, 2 replies; 39+ messages in thread
From: Yosry Ahmed @ 2022-11-23  8:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Johannes Weiner, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On Tue, Nov 22, 2022 at 7:50 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (22/11/22 12:42), Johannes Weiner wrote:
> > On Tue, Nov 22, 2022 at 10:52:58AM +0900, Sergey Senozhatsky wrote:
> > > On (22/11/18 16:15), Nhat Pham wrote:
> > > [..]
> > > > @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> > > >   obj_to_location(obj, &page, &obj_idx);
> > > >   zspage = get_zspage(page);
> > > >
> > > > +#ifdef CONFIG_ZPOOL
> > > > + /* Move the zspage to front of pool's LRU */
> > > > + if (mm == ZS_MM_WO) {
> > > > +         if (!list_empty(&zspage->lru))
> > > > +                 list_del(&zspage->lru);
> > > > +         list_add(&zspage->lru, &pool->lru);
> > > > + }
> > > > +#endif
> > >
> > > Do we consider pages that were mapped for MM_RO/MM_RW as cold?
> > > I wonder why, we use them, so technically they are not exactly
> > > "least recently used".
> >
> > This is a swap LRU. Per definition there are no ongoing accesses to
> > the memory while the page is swapped out that would make it "hot".
>
> Hmm. Not arguing, just trying to understand some things.
>
> There are no accesses to swapped out pages yes, but zspage holds multiple
> objects, which are compressed swapped out pages in this particular case.
> For example, zspage in class size 176 (bytes) can hold 93 objects per-zspage,
> that is 93 compressed swapped out pages. Consider ZS_FULL zspages which
> is at the tail of the LRU list. Suppose that we page-faulted 20 times and
> read 20 objects from that zspage, IOW zspage has been in use 20 times very
> recently, while writeback still considers it to be "not-used" and will
> evict it.
>
> So if this works for you then I'm fine. But we probably, like you suggested,
> can document a couple of things here - namely why WRITE access to zspage
> counts as "zspage is in use" but READ access to the same zspage does not
> count as "zspage is in use".
>

I guess the key here is that we have an LRU of zspages, when we really
want an LRU of compressed objects. In some cases, we may end up
reclaiming the wrong pages.

Assuming we have 2 zspages, Z1 and Z2, and 4 physical pages that we
compress over time, P1 -> P4.

Let's assume P1 -> P4 get compressed in order (P4 is the hottest
page), and they get assigned to zspages as follows:
Z1: P1, P3
Z2: P2, P4

In this case, the zspages LRU would be Z2->Z1, because Z2 was touched
last when we compressed P4. Now if we want to writeback, we will look
at Z1, and we might end up reclaiming P3, depending on the order the
pages are stored in.

A worst case scenario of this would be if we have a large number of
pages, maybe 1000, P1->P1000 (where P1000 is the hottest), and they
all go into Z1 and Z2 in this way:
Z1: P1 -> P499, P1000
Z2: P500 -> P999

In this case, Z1 contains 499 cold pages, but it got P1000 at the end
which caused us to put it on the front of the LRU. Now writeback will
consistently use Z2. This is bad. Now I have no idea how practical
this is, but it seems fairly random, based on the compression size of
pages and access patterns.

Does this mean we should move zspages to the front of the LRU when we
writeback from them? No, I wouldn't say so. The same exact scenario
can happen because of this. Imagine the following assignment of the
1000 pages:
Z1: P<odd> (P1, P3, .., P999)
Z2: P<even> (P2, P4, .., P1000)

Z2 is at the front of the LRU because it has P1000, so the first time
we do writeback we will start at Z1. Once we reclaim one object from
Z1, we will start writeback from Z2 next time, and we will keep
alternating. Now if we are really unlucky, we can end up reclaiming in
this order P999, P1000, P997, P998, ... . So yeah I don't think
putting zspages in the front of the LRU when we writeback is the
answer. I would even say it's completely orthogonal to the problem,
because writing back an object from the zspage at the end of the LRU
gives us 0 information about the state of other objects on the same
zspage.

Ideally, we would have an LRU of objects instead, but this would be
very complicated with the current form of writeback. It would be much
easier if we have an LRU for zswap entries instead, which is something
I am looking into, and is a much bigger surgery, and should be
separate from this work. Today zswap inverts LRU priorities anyway by
sending hot pages to the swapfile when zswap is full, when colder
pages are in zswap, so I wouldn't really worry about this now :)
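
To sketch the direction (purely hypothetical fields, not part of this
series): the LRU linkage would live on the zswap entry rather than on
the backend's pages, e.g.

struct zswap_entry {
	...
	struct list_head lru;	/* hypothetical: linkage on the pool's LRU */
};

struct zswap_pool {
	...
	struct list_head lru;	/* hypothetical: entries, hottest first */
	spinlock_t lru_lock;
};

Writeback would then pick the tail entry and invoke the backend for
that handle directly, instead of asking the allocator which of its
pages is cold.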

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-23  8:02         ` Yosry Ahmed
@ 2022-11-23  8:11           ` Yosry Ahmed
  2022-11-23 16:30             ` Johannes Weiner
  2022-11-24  3:21           ` Sergey Senozhatsky
  1 sibling, 1 reply; 39+ messages in thread
From: Yosry Ahmed @ 2022-11-23  8:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Johannes Weiner, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On Wed, Nov 23, 2022 at 12:02 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Nov 22, 2022 at 7:50 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > On (22/11/22 12:42), Johannes Weiner wrote:
> > > On Tue, Nov 22, 2022 at 10:52:58AM +0900, Sergey Senozhatsky wrote:
> > > > On (22/11/18 16:15), Nhat Pham wrote:
> > > > [..]
> > > > > @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> > > > >   obj_to_location(obj, &page, &obj_idx);
> > > > >   zspage = get_zspage(page);
> > > > >
> > > > > +#ifdef CONFIG_ZPOOL
> > > > > + /* Move the zspage to front of pool's LRU */
> > > > > + if (mm == ZS_MM_WO) {
> > > > > +         if (!list_empty(&zspage->lru))
> > > > > +                 list_del(&zspage->lru);
> > > > > +         list_add(&zspage->lru, &pool->lru);
> > > > > + }
> > > > > +#endif
> > > >
> > > > Do we consider pages that were mapped for MM_RO/MM_RW as cold?
> > > > I wonder why, we use them, so technically they are not exactly
> > > > "least recently used".
> > >
> > > This is a swap LRU. Per definition there are no ongoing accesses to
> > > the memory while the page is swapped out that would make it "hot".
> >
> > Hmm. Not arguing, just trying to understand some things.
> >
> > There are no accesses to swapped out pages yes, but zspage holds multiple
> > objects, which are compressed swapped out pages in this particular case.
> > For example, zspage in class size 176 (bytes) can hold 93 objects per-zspage,
> > that is 93 compressed swapped out pages. Consider ZS_FULL zspages which
> > is at the tail of the LRU list. Suppose that we page-faulted 20 times and
> > read 20 objects from that zspage, IOW zspage has been in use 20 times very
> > recently, while writeback still considers it to be "not-used" and will
> > evict it.
> >
> > So if this works for you then I'm fine. But we probably, like you suggested,
> > can document a couple of things here - namely why WRITE access to zspage
> > counts as "zspage is in use" but READ access to the same zspage does not
> > count as "zspage is in use".
> >
>
> I guess the key here is that we have an LRU of zspages, when we really
> want an LRU of compressed objects. In some cases, we may end up
> reclaiming the wrong pages.
>
> Assuming we have 2 zspages, Z1 and Z2, and 4 physical pages that we
> compress over time, P1 -> P4.
>
> Let's assume P1 -> P4 get compressed in order (P4 is the hottest
> page), and they get assigned to zspages as follows:
> Z1: P1, P3
> Z2: P2, P4
>
> In this case, the zspages LRU would be Z2->Z1, because Z2 was touched
> last when we compressed P4. Now if we want to writeback, we will look
> at Z1, and we might end up reclaiming P3, depending on the order the
> pages are stored in.
>
> A worst case scenario of this would be if we have a large number of
> pages, maybe 1000, P1->P1000 (where P1000 is the hottest), and they
> all go into Z1 and Z2 in this way:
> Z1: P1 -> P499, P1000
> Z2: P500 -> P999
>
> In this case, Z1 contains 499 cold pages, but it got P1000 at the end
> which caused us to put it on the front of the LRU. Now writeback will
> consistently use Z2. This is bad. Now I have no idea how practical
> this is, but it seems fairly random, based on the compression size of
> pages and access patterns.
>
> Does this mean we should move zspages to the front of the LRU when we
> writeback from them? No, I wouldn't say so. The same exact scenario
> can happen because of this. Imagine the following assignment of the
> 1000 pages:
> Z1: P<odd> (P1, P3, .., P999)
> Z2: P<even> (P2, P4, .., P1000)
>
> Z2 is at the front of the LRU because it has P1000, so the first time
> we do writeback we will start at Z1. Once we reclaim one object from
> Z1, we will start writeback from Z2 next time, and we will keep
> alternating. Now if we are really unlucky, we can end up reclaiming in
> this order P999, P1000, P997, P998, ... . So yeah I don't think
> putting zspages in the front of the LRU when we writeback is the
> answer. I would even say it's completely orthogonal to the problem,
> because writing back an object from the zspage at the end of the LRU
> gives us 0 information about the state of other objects on the same
> zspage.
>
> Ideally, we would have an LRU of objects instead, but this would be
> very complicated with the current form of writeback. It would be much
> easier if we have an LRU for zswap entries instead, which is something
> I am looking into, and is a much bigger surgery, and should be
> separate from this work. Today zswap inverts LRU priorities anyway by
> sending hot pages to the swapfile when zswap is full, when colder
> pages are in zswap, so I wouldn't really worry about this now :)

Oh, I didn't realize we reclaim all the objects in the zspage at the
end of the LRU. All the examples are wrong, but the concept still
stands: the problem is that we have an LRU of zspages, not an LRU of
objects.

Nonetheless, the fact that we refaulted an object in a zspage does not
necessarily mean that other objects on the same zspage are hotter than
objects in other zspages IIUC.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  6:37   ` Sergey Senozhatsky
@ 2022-11-23 16:30     ` Nhat Pham
  2022-11-23 17:27       ` Johannes Weiner
  2022-11-23 17:18     ` Johannes Weiner
  1 sibling, 1 reply; 39+ messages in thread
From: Nhat Pham @ 2022-11-23 16:30 UTC (permalink / raw)
  To: senozhatsky
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, akpm, sjenning,
	ddstreet, vitaly.wool

Thanks for the comments, Sergey!

> A nit: we usually call get_first_page() in such cases.

I'll use get_first_page() here in v7.

> We probably better to cond_resched() somewhere here. Or in zs_zpool_shrink()
> loop.

I'll put it right after releasing the pool's lock in the retry loop:

		/* Lock out object allocations and object compaction */
		remove_zspage(class, zspage, fullness);

		spin_unlock(&pool->lock);
		cond_resched();

		/* Lock backing pages into place */
		lock_zspage(zspage);

This will also appear in v7. In the meantime, please feel free to discuss all
the patches - I'll try to batch the changes to minimize the churning.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-23  8:11           ` Yosry Ahmed
@ 2022-11-23 16:30             ` Johannes Weiner
  2022-11-24  3:29               ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2022-11-23 16:30 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sergey Senozhatsky, Nhat Pham, akpm, linux-mm, linux-kernel,
	minchan, ngupta, sjenning, ddstreet, vitaly.wool

On Wed, Nov 23, 2022 at 12:11:24AM -0800, Yosry Ahmed wrote:
> On Wed, Nov 23, 2022 at 12:02 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > On Tue, Nov 22, 2022 at 7:50 PM Sergey Senozhatsky
> > > There are no accesses to swapped out pages yes, but zspage holds multiple
> > > objects, which are compressed swapped out pages in this particular case.
> > > For example, zspage in class size 176 (bytes) can hold 93 objects per-zspage,
> > > that is 93 compressed swapped out pages. Consider ZS_FULL zspages which
> > > is at the tail of the LRU list. Suppose that we page-faulted 20 times and
> > > read 20 objects from that zspage, IOW zspage has been in use 20 times very
> > > recently, while writeback still considers it to be "not-used" and will
> > > evict it.
> > >
> > > So if this works for you then I'm fine. But we probably, like you suggested,
> > > can document a couple of things here - namely why WRITE access to zspage
> > > counts as "zspage is in use" but READ access to the same zspage does not
> > > count as "zspage is in use".

> Nonetheless, the fact that we refaulted an object in a zspage does not
> necessarily mean that other objects on the same are hotter than
> objects in other zspages IIUC.

Yes.

On allocation, we know that there is at least one hot object in the
page. On refault, the connection between objects in a page is weak.

And it's weaker on zsmalloc than with other backends due to the many
size classes making temporal grouping less likely. So I think you're
quite right, Sergey, that a per-class LRU would be more accurate.

It's no-LRU < zspage-LRU < class-LRU < object-LRU.

Like Yosry said, the plan is to implement an object-LRU next as part
of the generalized LRU for zsmalloc, zbud and z3fold.

For now, the zspage LRU is an improvement to no-LRU. Our production
experiments confirmed that.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-22  6:37   ` Sergey Senozhatsky
  2022-11-23 16:30     ` Nhat Pham
@ 2022-11-23 17:18     ` Johannes Weiner
  1 sibling, 0 replies; 39+ messages in thread
From: Johannes Weiner @ 2022-11-23 17:18 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Nhat Pham, akpm, linux-mm, linux-kernel, minchan, ngupta,
	sjenning, ddstreet, vitaly.wool

On Tue, Nov 22, 2022 at 03:37:29PM +0900, Sergey Senozhatsky wrote:
> On (22/11/18 16:15), Nhat Pham wrote:
> [..]
> > +static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
> > +{
> > +	int i, obj_idx, ret = 0;
> > +	unsigned long handle;
> > +	struct zspage *zspage;
> > +	struct page *page;
> > +	enum fullness_group fullness;
> > +
> > +	/* Lock LRU and fullness list */
> > +	spin_lock(&pool->lock);
> > +	if (list_empty(&pool->lru)) {
> > +		spin_unlock(&pool->lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	for (i = 0; i < retries; i++) {
> > +		struct size_class *class;
> > +
> > +		zspage = list_last_entry(&pool->lru, struct zspage, lru);
> > +		list_del(&zspage->lru);
> > +
> > +		/* zs_free may free objects, but not the zspage and handles */
> > +		zspage->under_reclaim = true;
> > +
> > +		class = zspage_class(pool, zspage);
> > +		fullness = get_fullness_group(class, zspage);
> > +
> > +		/* Lock out object allocations and object compaction */
> > +		remove_zspage(class, zspage, fullness);
> > +
> > +		spin_unlock(&pool->lock);
> > +
> > +		/* Lock backing pages into place */
> > +		lock_zspage(zspage);
> > +
> > +		obj_idx = 0;
> > +		page = zspage->first_page;
> > +		while (1) {
> > +			handle = find_alloced_obj(class, page, &obj_idx);
> > +			if (!handle) {
> > +				page = get_next_page(page);
> > +				if (!page)
> > +					break;
> > +				obj_idx = 0;
> > +				continue;
> > +			}
> > +
> > +			/*
> > +			 * This will write the object and call zs_free.
> > +			 *
> > +			 * zs_free will free the object, but the
> > +			 * under_reclaim flag prevents it from freeing
> > +			 * the zspage altogether. This is necessary so
> > +			 * that we can continue working with the
> > +			 * zspage potentially after the last object
> > +			 * has been freed.
> > +			 */
> > +			ret = pool->zpool_ops->evict(pool->zpool, handle);
> > +			if (ret)
> > +				goto next;
> > +
> > +			obj_idx++;
> > +		}
> > +
> > +next:
> > +		/* For freeing the zspage, or putting it back in the pool and LRU list. */
> > +		spin_lock(&pool->lock);
> > +		zspage->under_reclaim = false;
> > +
> > +		if (!get_zspage_inuse(zspage)) {
> > +			/*
> > +			 * Fullness went stale as zs_free() won't touch it
> > +			 * while the page is removed from the pool. Fix it
> > +			 * up for the check in __free_zspage().
> > +			 */
> > +			zspage->fullness = ZS_EMPTY;
> > +
> > +			__free_zspage(pool, class, zspage);
> > +			spin_unlock(&pool->lock);
> > +			return 0;
> > +		}
> > +
> > +		putback_zspage(class, zspage);
> > +		list_add(&zspage->lru, &pool->lru);
> > +		unlock_zspage(zspage);
> 
> We probably better to cond_resched() somewhere here. Or in zs_zpool_shrink()
> loop.

Hm, yeah I suppose that could make sense if we try more than one page.

We always hold either the pool lock or the page locks, and we probably
don't want to schedule with the page locks held. So it would need to
actually lockbreak the pool lock. And then somebody can steal the page
and empty the LRU under us, so we need to check that on looping, too.

Something like this?

for (i = 0; i < retries; i++) {
	spin_lock(&pool->lock);
	if (list_empty(&pool->lru)) {
		spin_unlock(&pool->lock);
		return -EINVAL;
	}
	zspage = list_last_entry(&pool->lru, ...);

	...

	putback_zspage(class, zspage);
	list_add(&zspage->lru, &pool->lru);
	unlock_zspage(zspage);
	spin_unlock(&pool->lock);

	cond_resched();
}
return -EAGAIN;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc
  2022-11-23 16:30     ` Nhat Pham
@ 2022-11-23 17:27       ` Johannes Weiner
  0 siblings, 0 replies; 39+ messages in thread
From: Johannes Weiner @ 2022-11-23 17:27 UTC (permalink / raw)
  To: Nhat Pham
  Cc: senozhatsky, linux-mm, linux-kernel, minchan, ngupta, akpm,
	sjenning, ddstreet, vitaly.wool

On Wed, Nov 23, 2022 at 08:30:44AM -0800, Nhat Pham wrote:
> I'll put it right after releasing the pool's lock in the retry loop:
> 
> 		/* Lock out object allocations and object compaction */
> 		remove_zspage(class, zspage, fullness);
> 
> 		spin_unlock(&pool->lock);
> 		cond_resched();
> 
> 		/* Lock backing pages into place */
> 		lock_zspage(zspage);
> 
> This will also appear in v7. In the meantime, please feel free to discuss all
> the patches - I'll try to batch the changes to minimize the churning.

Oh, our emails collided. This is easier than my version :)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 0/6] Implement writeback for zsmalloc
  2022-11-21 19:29 ` [PATCH v6 0/6] Implement writeback " Nhat Pham
@ 2022-11-23 19:26   ` Nhat Pham
  0 siblings, 0 replies; 39+ messages in thread
From: Nhat Pham @ 2022-11-23 19:26 UTC (permalink / raw)
  To: akpm
  Cc: hannes, linux-mm, linux-kernel, minchan, ngupta, senozhatsky,
	sjenning, ddstreet, vitaly.wool

The suggested changes seem relatively minor, so instead of sending a v7
series of patches, I've just sent the two fixes in a separate thread.

Andrew, would you mind applying those fixes on top of patch 4 and patch
6 respectively? Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-23  8:02         ` Yosry Ahmed
  2022-11-23  8:11           ` Yosry Ahmed
@ 2022-11-24  3:21           ` Sergey Senozhatsky
  1 sibling, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-24  3:21 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sergey Senozhatsky, Johannes Weiner, Nhat Pham, akpm, linux-mm,
	linux-kernel, minchan, ngupta, sjenning, ddstreet, vitaly.wool

On (22/11/23 00:02), Yosry Ahmed wrote:
> > There are no accesses to swapped out pages yes, but zspage holds multiple
> > objects, which are compressed swapped out pages in this particular case.
> > For example, zspage in class size 176 (bytes) can hold 93 objects per-zspage,
> > that is 93 compressed swapped out pages. Consider ZS_FULL zspages which
> > is at the tail of the LRU list. Suppose that we page-faulted 20 times and
> > read 20 objects from that zspage, IOW zspage has been in use 20 times very
> > recently, while writeback still considers it to be "not-used" and will
> > evict it.
> >
> > So if this works for you then I'm fine. But we probably, like you suggested,
> > can document a couple of things here - namely why WRITE access to zspage
> > counts as "zspage is in use" but READ access to the same zspage does not
> > count as "zspage is in use".
> >
> 
> I guess the key here is that we have an LRU of zspages, when we really
> want an LRU of compressed objects. In some cases, we may end up
> reclaiming the wrong pages.

Yes, completely agree.

[..]
> Ideally, we would have an LRU of objects instead, but this would be
> very complicated with the current form of writeback.

Right. So we have two writebacks now: one in zram and one in zsmalloc.
And zram writeback works with objects' access patterns: it simply tracks
timestamps per entry and doesn't know/care about zspages. Writeback
targets in zram are selected by simply looking at timestamps of objects
(compressed normal pages). And that is the right level for an LRU; the
allocator is too low-level for this.

I'm looking forward to seeing the new LRU implementation (at a level
higher than the allocator) :)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
  2022-11-23 16:30             ` Johannes Weiner
@ 2022-11-24  3:29               ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2022-11-24  3:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yosry Ahmed, Sergey Senozhatsky, Nhat Pham, akpm, linux-mm,
	linux-kernel, minchan, ngupta, sjenning, ddstreet, vitaly.wool

On (22/11/23 11:30), Johannes Weiner wrote:
> Like Yosry said, the plan is to implement an object-LRU next as part
> of the generalized LRU for zsmalloc, zbud and z3fold.
> 
> For now, the zspage LRU is an improvement to no-LRU. Our production
> experiments confirmed that.

Sounds good!

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2022-11-24  3:29 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-19  0:15 [PATCH v6 0/6] Implement writeback for zsmalloc Nhat Pham
2022-11-19  0:15 ` [PATCH v6 1/6] zswap: fix writeback lock ordering " Nhat Pham
2022-11-22  1:43   ` Sergey Senozhatsky
2022-11-19  0:15 ` [PATCH v6 2/6] zpool: clean out dead code Nhat Pham
2022-11-22  1:46   ` Sergey Senozhatsky
2022-11-19  0:15 ` [PATCH v6 3/6] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks Nhat Pham
2022-11-19  0:15 ` [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
2022-11-19 16:38   ` Johannes Weiner
2022-11-19 17:34   ` Minchan Kim
2022-11-22  1:52   ` Sergey Senozhatsky
2022-11-22 17:42     ` Johannes Weiner
2022-11-23  3:50       ` Sergey Senozhatsky
2022-11-23  8:02         ` Yosry Ahmed
2022-11-23  8:11           ` Yosry Ahmed
2022-11-23 16:30             ` Johannes Weiner
2022-11-24  3:29               ` Sergey Senozhatsky
2022-11-24  3:21           ` Sergey Senozhatsky
2022-11-23  3:58   ` Sergey Senozhatsky
2022-11-19  0:15 ` [PATCH v6 5/6] zsmalloc: Add zpool_ops field to zs_pool to store evict handlers Nhat Pham
2022-11-19 16:39   ` Johannes Weiner
2022-11-22  1:11     ` Sergey Senozhatsky
2022-11-19  0:15 ` [PATCH v6 6/6] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
2022-11-19 16:45   ` Johannes Weiner
2022-11-19 17:35   ` Minchan Kim
2022-11-22  1:40   ` Sergey Senozhatsky
2022-11-22  2:00   ` Sergey Senozhatsky
2022-11-22  2:15   ` Sergey Senozhatsky
2022-11-22  3:12     ` Johannes Weiner
2022-11-22  3:42       ` Sergey Senozhatsky
2022-11-22  6:09         ` Johannes Weiner
2022-11-22  6:35           ` Sergey Senozhatsky
2022-11-22  7:10             ` Johannes Weiner
2022-11-22  7:19               ` Sergey Senozhatsky
2022-11-22  6:37   ` Sergey Senozhatsky
2022-11-23 16:30     ` Nhat Pham
2022-11-23 17:27       ` Johannes Weiner
2022-11-23 17:18     ` Johannes Weiner
2022-11-21 19:29 ` [PATCH v6 0/6] Implement writeback " Nhat Pham
2022-11-23 19:26   ` Nhat Pham
