linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/7] zram: Support multiple compression streams
@ 2022-09-05  8:15 Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH 1/3] documentation: Add recompression documentation Sergey Senozhatsky
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Hello,

	RFC series that adds support for multiple (per-CPU)
compression streams (at point only 2). The main idea is that
different compression algorithms have different characteristics
and zram may benefit when it uses a combination of algorithms:
a default algorithm that is faster but have lower compression
rate and a secondary algorithm that can use higher compression
rate at a price of slower compression/decompression.

	There are several use-case for this functionality:

- huge pages re-compression: zstd or defalte can successfully
compress huge pages (~50% of huge pages on my synthetic ChromeOS
tests), IOW pages that lzo was not able to compress.

- idle pages re-compression: idle/cold pages sit in the memory
and we may reduce zsmalloc memory usage if we recompress those
idle pages.

	User-space has a number of ways to control the behavior
and impact of zram recompression: what type of pages should be
recompressed, size watermarks, etc. Please refer to documentation
patch.

Sergey Senozhatsky (7):
  zram: Preparation for multi-zcomp support
  zram: Add recompression algorithm sysfs knob
  zram: Factor out WB and non-WB zram read functions
  zram: Introduce recompress sysfs knob
  documentation: Add recompression documentation
  zram: Add recompression algorithm choice to Kconfig
  zram: Add recompress flag to read_block_state()

 Documentation/admin-guide/blockdev/zram.rst |  64 ++-
 drivers/block/zram/Kconfig                  |  51 +++
 drivers/block/zram/zcomp.c                  |   6 +-
 drivers/block/zram/zcomp.h                  |   2 +-
 drivers/block/zram/zram_drv.c               | 432 +++++++++++++++++---
 drivers/block/zram/zram_drv.h               |  15 +-
 6 files changed, 504 insertions(+), 66 deletions(-)

-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] documentation: Add recompression documentation
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 1/7] zram: Preparation for multi-zcomp support Sergey Senozhatsky
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Document user-space visible device attributes that
are enabled by ZRAM_MULTI_COMP.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/admin-guide/blockdev/zram.rst | 55 +++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index c73b16930449..88957fcb6ad7 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -401,6 +401,61 @@ budget in next setting is user's job.
 If admin wants to measure writeback count in a certain period, they could
 know it via /sys/block/zram0/bd_stat's 3rd column.
 
+recompression
+-------------
+
+With CONFIG_ZRAM_MULTI_COMP, zram can recompress idle/huge pages using
+alternative (secondary) compression algorithm. The basic idea is that
+alternative compression algorithm can provide better compression ratio
+at a price of (potentially) slower compression/decompression speeds.
+Alternative compression algorithm can, for example, be more successful
+compressing huge pages (those that default algorithm failed to compress).
+Another application is idle pages recompression - pages that are cold and
+sit in the memory can be recompressed using more effective algorithm and,
+hence, reduce zsmalloc memory usage.
+
+With CONFIG_ZRAM_MULTI_COMP, zram will setup two compression algorithms
+per-CPU: primary and secondary ones. Primary zram compressor is explained
+in "3) Select compression algorithm", the secondary algorithm is configured
+in a similar way, using recomp_algorithm device attribute:
+
+Examples::
+
+	#show supported recompression algorithms
+	cat /sys/block/zramX/recomp_algorithm
+	zstd [lzo]
+
+	#select zstd recompression algorithm
+	echo zstd > /sys/block/zramX/recomp_algorithm
+
+Another device attribute that CONFIG_ZRAM_MULTI_COMP enables is recompress,
+which controls recompression:
+
+Examples::
+
+	#IDLE pages recompression is activated by `idle` mode
+	echo idle > /sys/block/zramX/recompress
+
+	#HUGE pages recompression is activated by `huge` mode
+	echo huge > /sys/block/zram0/recompress
+
+	#HUGE_IDLE pages recompression is activated by `huge_idle` mode
+	echo huge_idle > /sys/block/zramX/recompress
+
+The number of idle pages can be significant, so user-space can pass a size
+watermark value to the recompress knob, to filter out idle pages for
+recompression: zram will recompress only idle pages of equal or greater
+size:::
+
+	#recompress idle pages larger than 3000 bytes
+	echo 3000 > /sys/block/zramX/recompress
+
+	#recompress idle pages larger than 2000 bytes
+	echo 2000 > /sys/block/zramX/recompress
+
+Recompression is mostly focused on idle pages (except for huge pages
+recompression), so it works better in conjunction with memory tracking.
+
 memory tracking
 ===============
 
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 1/7] zram: Preparation for multi-zcomp support
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH 1/3] documentation: Add recompression documentation Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH 2/3] zram: Add recompression algorithm choice to Kconfig Sergey Senozhatsky
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

The patch turns compression streams and compressor algorithm
name struct zram members into arrays, so that we can have
multiple compression streams support (in the next patches).

The patch uses a rather explicit API for compressor selection:

- Get primary (default) compression stream
	zcomp_stream_get(zram->comps[ZRAM_PRIMARY_ZCOMP])
- Get secondary compression stream
	zcomp_stream_get(zram->comps[ZRAM_SECONDARY_ZCOMP])

We use similar API for compression streams put().

At this point we always have just one compression stream,
since CONFIG_ZRAM_MULTI_COMP is not yet defined.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zcomp.c    |  6 +--
 drivers/block/zram/zcomp.h    |  2 +-
 drivers/block/zram/zram_drv.c | 87 ++++++++++++++++++++++++-----------
 drivers/block/zram/zram_drv.h | 14 +++++-
 4 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 0916de952e09..55af4efd7983 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -206,7 +206,7 @@ void zcomp_destroy(struct zcomp *comp)
  * case of allocation error, or any other error potentially
  * returned by zcomp_init().
  */
-struct zcomp *zcomp_create(const char *compress)
+struct zcomp *zcomp_create(const char *alg)
 {
 	struct zcomp *comp;
 	int error;
@@ -216,14 +216,14 @@ struct zcomp *zcomp_create(const char *compress)
 	 * is not loaded yet. We must do it here, otherwise we are about to
 	 * call /sbin/modprobe under CPU hot-plug lock.
 	 */
-	if (!zcomp_available_algorithm(compress))
+	if (!zcomp_available_algorithm(alg))
 		return ERR_PTR(-EINVAL);
 
 	comp = kzalloc(sizeof(struct zcomp), GFP_KERNEL);
 	if (!comp)
 		return ERR_PTR(-ENOMEM);
 
-	comp->name = compress;
+	comp->name = alg;
 	error = zcomp_init(comp);
 	if (error) {
 		kfree(comp);
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 40f6420f4b2e..cdefdef93da8 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -27,7 +27,7 @@ int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node);
 ssize_t zcomp_available_show(const char *comp, char *buf);
 bool zcomp_available_algorithm(const char *comp);
 
-struct zcomp *zcomp_create(const char *comp);
+struct zcomp *zcomp_create(const char *alg);
 void zcomp_destroy(struct zcomp *comp);
 
 struct zcomp_strm *zcomp_stream_get(struct zcomp *comp);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 607f4634c27d..4ad1daa1283e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1018,36 +1018,53 @@ static ssize_t comp_algorithm_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	down_read(&zram->init_lock);
-	sz = zcomp_available_show(zram->compressor, buf);
+	sz = zcomp_available_show(zram->comp_algs[ZRAM_PRIMARY_ZCOMP], buf);
 	up_read(&zram->init_lock);
 
 	return sz;
 }
 
+static void comp_algorithm_set(struct zram *zram, u32 idx, const char *alg)
+{
+	/* Do not kfree() algs that we didn't allocate, IOW the default ones */
+	if (zram->comp_algs[idx] != default_compressor)
+		kfree(zram->comp_algs[idx]);
+	zram->comp_algs[idx] = alg;
+}
+
 static ssize_t comp_algorithm_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
 	struct zram *zram = dev_to_zram(dev);
-	char compressor[ARRAY_SIZE(zram->compressor)];
+	char *compressor;
 	size_t sz;
 
-	strlcpy(compressor, buf, sizeof(compressor));
+	sz = strlen(buf);
+	if (sz >= CRYPTO_MAX_ALG_NAME)
+		return -E2BIG;
+
+	compressor = kstrdup(buf, GFP_KERNEL);
+	if (!compressor)
+		return -ENOMEM;
+
 	/* ignore trailing newline */
-	sz = strlen(compressor);
 	if (sz > 0 && compressor[sz - 1] == '\n')
 		compressor[sz - 1] = 0x00;
 
-	if (!zcomp_available_algorithm(compressor))
+	if (!zcomp_available_algorithm(compressor)) {
+		kfree(compressor);
 		return -EINVAL;
+	}
 
 	down_write(&zram->init_lock);
 	if (init_done(zram)) {
 		up_write(&zram->init_lock);
+		kfree(compressor);
 		pr_info("Can't change algorithm for initialized device\n");
 		return -EBUSY;
 	}
 
-	strcpy(zram->compressor, compressor);
+	comp_algorithm_set(zram, ZRAM_PRIMARY_ZCOMP, compressor);
 	up_write(&zram->init_lock);
 	return len;
 }
@@ -1292,7 +1309,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 	size = zram_get_obj_size(zram, index);
 
 	if (size != PAGE_SIZE)
-		zstrm = zcomp_stream_get(zram->comp);
+		zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 
 	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
 	if (size == PAGE_SIZE) {
@@ -1304,7 +1321,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 		dst = kmap_atomic(page);
 		ret = zcomp_decompress(zstrm, src, size, dst);
 		kunmap_atomic(dst);
-		zcomp_stream_put(zram->comp);
+		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 	}
 	zs_unmap_object(zram->mem_pool, handle);
 	zram_slot_unlock(zram, index);
@@ -1371,13 +1388,13 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	kunmap_atomic(mem);
 
 compress_again:
-	zstrm = zcomp_stream_get(zram->comp);
+	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 	src = kmap_atomic(page);
 	ret = zcomp_compress(zstrm, src, &comp_len);
 	kunmap_atomic(src);
 
 	if (unlikely(ret)) {
-		zcomp_stream_put(zram->comp);
+		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 		pr_err("Compression failed! err=%d\n", ret);
 		zs_free(zram->mem_pool, handle);
 		return ret;
@@ -1405,7 +1422,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 				__GFP_HIGHMEM |
 				__GFP_MOVABLE);
 	if (IS_ERR((void *)handle)) {
-		zcomp_stream_put(zram->comp);
+		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 		atomic64_inc(&zram->stats.writestall);
 		handle = zs_malloc(zram->mem_pool, comp_len,
 				GFP_NOIO | __GFP_HIGHMEM |
@@ -1422,14 +1439,14 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		 * It is necessary that the dereferencing of the zstrm variable below
 		 * occurs correctly.
 		 */
-		zstrm = zcomp_stream_get(zram->comp);
+		zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 	}
 
 	alloced_pages = zs_get_total_pages(zram->mem_pool);
 	update_used_max(zram, alloced_pages);
 
 	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
-		zcomp_stream_put(zram->comp);
+		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 		zs_free(zram->mem_pool, handle);
 		return -ENOMEM;
 	}
@@ -1443,7 +1460,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	if (comp_len == PAGE_SIZE)
 		kunmap_atomic(src);
 
-	zcomp_stream_put(zram->comp);
+	zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 	zs_unmap_object(zram->mem_pool, handle);
 	atomic64_add(comp_len, &zram->stats.compr_data_size);
 out:
@@ -1718,6 +1735,20 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 
+static void zram_destroy_comps(struct zram *zram)
+{
+	u32 idx;
+
+	for (idx = 0; idx < ZRAM_MAX_ZCOMPS; idx++) {
+		struct zcomp *comp = zram->comps[idx];
+
+		zram->comps[idx] = NULL;
+		if (IS_ERR_OR_NULL(comp))
+			continue;
+		zcomp_destroy(comp);
+	}
+}
+
 static void zram_reset_device(struct zram *zram)
 {
 	down_write(&zram->init_lock);
@@ -1735,11 +1766,11 @@ static void zram_reset_device(struct zram *zram)
 	/* I/O operation under all of CPU are done so let's free */
 	zram_meta_free(zram, zram->disksize);
 	zram->disksize = 0;
+	zram_destroy_comps(zram);
 	memset(&zram->stats, 0, sizeof(zram->stats));
-	zcomp_destroy(zram->comp);
-	zram->comp = NULL;
 	reset_bdev(zram);
 
+	comp_algorithm_set(zram, ZRAM_PRIMARY_ZCOMP, default_compressor);
 	up_write(&zram->init_lock);
 }
 
@@ -1750,6 +1781,7 @@ static ssize_t disksize_store(struct device *dev,
 	struct zcomp *comp;
 	struct zram *zram = dev_to_zram(dev);
 	int err;
+	u32 idx;
 
 	disksize = memparse(buf, NULL);
 	if (!disksize)
@@ -1768,22 +1800,25 @@ static ssize_t disksize_store(struct device *dev,
 		goto out_unlock;
 	}
 
-	comp = zcomp_create(zram->compressor);
-	if (IS_ERR(comp)) {
-		pr_err("Cannot initialise %s compressing backend\n",
-				zram->compressor);
-		err = PTR_ERR(comp);
-		goto out_free_meta;
-	}
+	for (idx = 0; idx < ZRAM_MAX_ZCOMPS; idx++) {
+		comp = zcomp_create(zram->comp_algs[idx]);
+		if (IS_ERR(comp)) {
+			pr_err("Cannot initialise %s compressing backend\n",
+			       zram->comp_algs[idx]);
+			err = PTR_ERR(comp);
+			goto out_free_comps;
+		}
 
-	zram->comp = comp;
+		zram->comps[idx] = comp;
+	}
 	zram->disksize = disksize;
 	set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
 	up_write(&zram->init_lock);
 
 	return len;
 
-out_free_meta:
+out_free_comps:
+	zram_destroy_comps(zram);
 	zram_meta_free(zram, disksize);
 out_unlock:
 	up_write(&zram->init_lock);
@@ -1979,7 +2014,7 @@ static int zram_add(void)
 	if (ret)
 		goto out_cleanup_disk;
 
-	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
+	zram->comp_algs[ZRAM_PRIMARY_ZCOMP] = default_compressor;
 
 	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 80c3b43b4828..af3d6f6bfcff 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -90,10 +90,20 @@ struct zram_stats {
 #endif
 };
 
+#ifdef CONFIG_ZRAM_MULTI_COMP
+#define ZRAM_PRIMARY_ZCOMP	0
+#define ZRAM_SECONDARY_ZCOMP	1
+#define ZRAM_MAX_ZCOMPS	2
+#else
+#define ZRAM_PRIMARY_ZCOMP	0
+#define ZRAM_SECONDARY_ZCOMP	0
+#define ZRAM_MAX_ZCOMPS	1
+#endif
+
 struct zram {
 	struct zram_table_entry *table;
 	struct zs_pool *mem_pool;
-	struct zcomp *comp;
+	struct zcomp *comps[ZRAM_MAX_ZCOMPS];
 	struct gendisk *disk;
 	/* Prevent concurrent execution of device init */
 	struct rw_semaphore init_lock;
@@ -108,7 +118,7 @@ struct zram {
 	 * we can store in a disk.
 	 */
 	u64 disksize;	/* bytes */
-	char compressor[CRYPTO_MAX_ALG_NAME];
+	const char *comp_algs[ZRAM_MAX_ZCOMPS];
 	/*
 	 * zram is claimed so open request will be failed
 	 */
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] zram: Add recompression algorithm choice to Kconfig
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH 1/3] documentation: Add recompression documentation Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 1/7] zram: Preparation for multi-zcomp support Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 2/7] zram: Add recompression algorithm sysfs knob Sergey Senozhatsky
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Make (secondary) recompression algorithm selectable just like
we do it for the (primary) default one.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/Kconfig    | 40 +++++++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.c |  2 +-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 81ae4b96ec1a..fc2d4d66c484 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -89,3 +89,43 @@ config ZRAM_MULTI_COMP
 
           echo TIMEOUT > /sys/block/zramX/idle
           echo SIZE > /sys/block/zramX/recompress
+
+choice
+	prompt "Default zram recompression algorithm"
+	default ZRAM_DEF_RECOMP_ZSTD
+	depends on ZRAM && ZRAM_MULTI_COMP
+
+config ZRAM_DEF_RECOMP_LZORLE
+	bool "lzo-rle"
+	depends on CRYPTO_LZO
+
+config ZRAM_DEF_RECOMP_ZSTD
+	bool "zstd"
+	depends on CRYPTO_ZSTD
+
+config ZRAM_DEF_RECOMP_LZ4
+	bool "lz4"
+	depends on CRYPTO_LZ4
+
+config ZRAM_DEF_RECOMP_LZO
+	bool "lzo"
+	depends on CRYPTO_LZO
+
+config ZRAM_DEF_RECOMP_LZ4HC
+	bool "lz4hc"
+	depends on CRYPTO_LZ4HC
+
+config ZRAM_DEF_RECOMP_842
+	bool "842"
+	depends on CRYPTO_842
+
+endchoice
+
+config ZRAM_DEF_RECOMP
+	string
+	default "lzo-rle" if ZRAM_DEF_RECOMP_LZORLE
+	default "zstd" if ZRAM_DEF_RECOMP_ZSTD
+	default "lz4" if ZRAM_DEF_RECOMP_LZ4
+	default "lzo" if ZRAM_DEF_RECOMP_LZO
+	default "lz4hc" if ZRAM_DEF_RECOMP_LZ4HC
+	default "842" if ZRAM_DEF_RECOMP_842
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5f0181d9a69e..f144f31c3c4b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -44,7 +44,7 @@ static int zram_major;
 static const char *default_comp_algs[ZRAM_MAX_ZCOMPS] = {
 	CONFIG_ZRAM_DEF_COMP,
 #ifdef CONFIG_ZRAM_MULTI_COMP
-	"zstd",
+	CONFIG_ZRAM_DEF_RECOMP,
 #endif
 };
 
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 2/7] zram: Add recompression algorithm sysfs knob
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH 2/3] zram: Add recompression algorithm choice to Kconfig Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH 3/3] zram: Add recompress flag to read_block_state() Sergey Senozhatsky
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Introduce recomp_algorithm sysfs knob that controls
secondary algorithm selection used for recompression.
This device attribute works in a similar way with
comp_algorithm attribute.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 111 +++++++++++++++++++++++++++-------
 1 file changed, 90 insertions(+), 21 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 4ad1daa1283e..694c8c426cb2 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -41,7 +41,12 @@ static DEFINE_IDR(zram_index_idr);
 static DEFINE_MUTEX(zram_index_mutex);
 
 static int zram_major;
-static const char *default_compressor = CONFIG_ZRAM_DEF_COMP;
+static const char *default_comp_algs[ZRAM_MAX_ZCOMPS] = {
+	CONFIG_ZRAM_DEF_COMP,
+#ifdef CONFIG_ZRAM_MULTI_COMP
+	"zstd",
+#endif
+};
 
 /* Module params (documentation at end) */
 static unsigned int num_devices = 1;
@@ -1011,31 +1016,37 @@ static ssize_t max_comp_streams_store(struct device *dev,
 	return len;
 }
 
-static ssize_t comp_algorithm_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static void comp_algorithm_set(struct zram *zram, u32 idx, const char *alg)
 {
-	size_t sz;
-	struct zram *zram = dev_to_zram(dev);
+	bool default_alg = false;
+	int i;
 
-	down_read(&zram->init_lock);
-	sz = zcomp_available_show(zram->comp_algs[ZRAM_PRIMARY_ZCOMP], buf);
-	up_read(&zram->init_lock);
+	/* Do not kfree() algs that we didn't allocate, IOW the default ones */
+	for (i = 0; i < ZRAM_MAX_ZCOMPS; i++) {
+		if (zram->comp_algs[idx] == default_comp_algs[i]) {
+			default_alg = true;
+			break;
+		}
+	}
 
-	return sz;
+	if (!default_alg)
+		kfree(zram->comp_algs[idx]);
+	zram->comp_algs[idx] = alg;
 }
 
-static void comp_algorithm_set(struct zram *zram, u32 idx, const char *alg)
+static ssize_t __comp_algorithm_show(struct zram *zram, u32 idx, char *buf)
 {
-	/* Do not kfree() algs that we didn't allocate, IOW the default ones */
-	if (zram->comp_algs[idx] != default_compressor)
-		kfree(zram->comp_algs[idx]);
-	zram->comp_algs[idx] = alg;
+	ssize_t sz;
+
+	down_read(&zram->init_lock);
+	sz = zcomp_available_show(zram->comp_algs[idx], buf);
+	up_read(&zram->init_lock);
+
+	return sz;
 }
 
-static ssize_t comp_algorithm_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+static int __comp_algorithm_store(struct zram *zram, u32 idx, const char *buf)
 {
-	struct zram *zram = dev_to_zram(dev);
 	char *compressor;
 	size_t sz;
 
@@ -1064,11 +1075,55 @@ static ssize_t comp_algorithm_store(struct device *dev,
 		return -EBUSY;
 	}
 
-	comp_algorithm_set(zram, ZRAM_PRIMARY_ZCOMP, compressor);
+	comp_algorithm_set(zram, idx, compressor);
 	up_write(&zram->init_lock);
-	return len;
+	return 0;
+}
+
+static ssize_t comp_algorithm_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return __comp_algorithm_show(zram, ZRAM_PRIMARY_ZCOMP, buf);
+}
+
+static ssize_t comp_algorithm_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf,
+				    size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+	int ret;
+
+	ret = __comp_algorithm_store(zram, ZRAM_PRIMARY_ZCOMP, buf);
+	return ret ? ret : len;
 }
 
+#ifdef CONFIG_ZRAM_MULTI_COMP
+static ssize_t recomp_algorithm_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return __comp_algorithm_show(zram, ZRAM_SECONDARY_ZCOMP, buf);
+}
+
+static ssize_t recomp_algorithm_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf,
+				      size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+	int ret;
+
+	ret = __comp_algorithm_store(zram, ZRAM_SECONDARY_ZCOMP, buf);
+	return ret ? ret : len;
+}
+#endif
+
 static ssize_t compact_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -1770,7 +1825,11 @@ static void zram_reset_device(struct zram *zram)
 	memset(&zram->stats, 0, sizeof(zram->stats));
 	reset_bdev(zram);
 
-	comp_algorithm_set(zram, ZRAM_PRIMARY_ZCOMP, default_compressor);
+	comp_algorithm_set(zram, ZRAM_PRIMARY_ZCOMP,
+			   default_comp_algs[ZRAM_PRIMARY_ZCOMP]);
+	if (IS_ENABLED(CONFIG_ZRAM_MULTI_COMP))
+		comp_algorithm_set(zram, ZRAM_SECONDARY_ZCOMP,
+				   default_comp_algs[ZRAM_SECONDARY_ZCOMP]);
 	up_write(&zram->init_lock);
 }
 
@@ -1912,6 +1971,9 @@ static DEVICE_ATTR_WO(writeback);
 static DEVICE_ATTR_RW(writeback_limit);
 static DEVICE_ATTR_RW(writeback_limit_enable);
 #endif
+#ifdef CONFIG_ZRAM_MULTI_COMP
+static DEVICE_ATTR_RW(recomp_algorithm);
+#endif
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
@@ -1935,6 +1997,9 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_bd_stat.attr,
 #endif
 	&dev_attr_debug_stat.attr,
+#ifdef CONFIG_ZRAM_MULTI_COMP
+	&dev_attr_recomp_algorithm.attr,
+#endif
 	NULL,
 };
 
@@ -2014,7 +2079,11 @@ static int zram_add(void)
 	if (ret)
 		goto out_cleanup_disk;
 
-	zram->comp_algs[ZRAM_PRIMARY_ZCOMP] = default_compressor;
+	zram->comp_algs[ZRAM_PRIMARY_ZCOMP] =
+		default_comp_algs[ZRAM_PRIMARY_ZCOMP];
+	if (IS_ENABLED(CONFIG_ZRAM_MULTI_COMP))
+		zram->comp_algs[ZRAM_SECONDARY_ZCOMP] =
+			default_comp_algs[ZRAM_SECONDARY_ZCOMP];
 
 	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] zram: Add recompress flag to read_block_state()
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH RFC 2/7] zram: Add recompression algorithm sysfs knob Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 3/7] zram: Factor out WB and non-WB zram read functions Sergey Senozhatsky
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Add a new flag to zram block state that shows if the page
was recompressed (using alternative compression algorithm).

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/admin-guide/blockdev/zram.rst | 9 ++++++---
 drivers/block/zram/zram_drv.c               | 5 +++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 88957fcb6ad7..70a3d0243b45 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -466,9 +466,10 @@ pages of the process with*pagemap.
 If you enable the feature, you could see block state via
 /sys/kernel/debug/zram/zram0/block_state". The output is as follows::
 
-	  300    75.033841 .wh.
-	  301    63.806904 s...
-	  302    63.806919 ..hi
+	  300    75.033841 .wh..
+	  301    63.806904 s....
+	  302    63.806919 ..hi.
+	  303    62.801919 ....r
 
 First column
 	zram's block index.
@@ -485,6 +486,8 @@ Third column
 		huge page
 	i:
 		idle page
+	r:
+		recompressed page (secondary compression algorithm)
 
 First line of above example says 300th block is accessed at 75.033841sec
 and the block's state is huge so it is written back to the backing
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f144f31c3c4b..791f798d356d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -936,13 +936,14 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
 
 		ts = ktime_to_timespec64(zram->table[index].ac_time);
 		copied = snprintf(kbuf + written, count,
-			"%12zd %12lld.%06lu %c%c%c%c\n",
+			"%12zd %12lld.%06lu %c%c%c%c%c\n",
 			index, (s64)ts.tv_sec,
 			ts.tv_nsec / NSEC_PER_USEC,
 			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
 			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
 			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.',
-			zram_test_flag(zram, index, ZRAM_IDLE) ? 'i' : '.');
+			zram_test_flag(zram, index, ZRAM_IDLE) ? 'i' : '.',
+			zram_test_flag(zram, index, ZRAM_RECOMP) ? 'r' : '.');
 
 		if (count <= copied) {
 			zram_slot_unlock(zram, index);
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 3/7] zram: Factor out WB and non-WB zram read functions
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH 3/3] zram: Add recompress flag to read_block_state() Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 4/7] zram: Introduce recompress sysfs knob Sergey Senozhatsky
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

We will use non-WB variant in ZRAM page recompression path.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 66 +++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 19 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 694c8c426cb2..de2970865b7b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1325,8 +1325,33 @@ static void zram_free_page(struct zram *zram, size_t index)
 		~(1UL << ZRAM_LOCK | 1UL << ZRAM_UNDER_WB));
 }
 
-static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
-				struct bio *bio, bool partial_io)
+/*
+ * Reads a page from the writeback devices. Corresponding ZRAM slot
+ * should be unlocked.
+ */
+static int zram_read_from_writeback(struct zram *zram,
+				    struct page *page,
+				    u32 index,
+				    struct bio *bio,
+				    bool partial_io)
+{
+	struct bio_vec bvec;
+
+	bvec.bv_page = page;
+	bvec.bv_len = PAGE_SIZE;
+	bvec.bv_offset = 0;
+	return read_from_bdev(zram, &bvec,
+			      zram_get_element(zram, index),
+			      bio, partial_io);
+}
+
+/*
+ * Reads (decompresses if needed) a page from zspool (zsmalloc).
+ * Corresponding ZRAM slot should be locked.
+ */
+static int zram_read_from_zspool(struct zram *zram,
+				 struct page *page,
+				 u32 index)
 {
 	struct zcomp_strm *zstrm;
 	unsigned long handle;
@@ -1334,20 +1359,6 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 	void *src, *dst;
 	int ret;
 
-	zram_slot_lock(zram, index);
-	if (zram_test_flag(zram, index, ZRAM_WB)) {
-		struct bio_vec bvec;
-
-		zram_slot_unlock(zram, index);
-
-		bvec.bv_page = page;
-		bvec.bv_len = PAGE_SIZE;
-		bvec.bv_offset = 0;
-		return read_from_bdev(zram, &bvec,
-				zram_get_element(zram, index),
-				bio, partial_io);
-	}
-
 	handle = zram_get_handle(zram, index);
 	if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
 		unsigned long value;
@@ -1357,7 +1368,6 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 		mem = kmap_atomic(page);
 		zram_fill_page(mem, PAGE_SIZE, value);
 		kunmap_atomic(mem);
-		zram_slot_unlock(zram, index);
 		return 0;
 	}
 
@@ -1379,7 +1389,25 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
 	}
 	zs_unmap_object(zram->mem_pool, handle);
-	zram_slot_unlock(zram, index);
+	return ret;
+}
+
+static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
+			    struct bio *bio, bool partial_io)
+{
+	int ret;
+
+	zram_slot_lock(zram, index);
+	if (!zram_test_flag(zram, index, ZRAM_WB)) {
+		/* Slot should be locked through out the function call */
+		ret = zram_read_from_zspool(zram, page, index);
+		zram_slot_unlock(zram, index);
+	} else {
+		/* Slot should be unlocked before the function call */
+		zram_slot_unlock(zram, index);
+		ret = zram_read_from_writeback(zram, page, index, bio,
+					       partial_io);
+	}
 
 	/* Should NEVER happen. Return bio error if it does. */
 	if (WARN_ON(ret))
@@ -1389,7 +1417,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 }
 
 static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
-				u32 index, int offset, struct bio *bio)
+			  u32 index, int offset, struct bio *bio)
 {
 	int ret;
 	struct page *page;
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 4/7] zram: Introduce recompress sysfs knob
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH RFC 3/7] zram: Factor out WB and non-WB zram read functions Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  9:21   ` Barry Song
  2022-09-05  8:15 ` [PATCH RFC 5/7] documentation: Add recompression documentation Sergey Senozhatsky
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Allow zram to recompress (using secondary compression streams)
pages. We support three modes:

1) IDLE pages recompression is activated by `idle` mode

	echo idle > /sys/block/zram0/recompress

2) Since there may be many idle pages user-space may pass a size
watermark value and we will recompress IDLE pages only of equal
or greater size:

	echo 888 > /sys/block/zram0/recompress

3) HUGE pages recompression is activated by `huge` mode

	echo huge > /sys/block/zram0/recompress

4) HUGE_IDLE pages recompression is activated by `huge_idle` mode

	echo huge_idle > /sys/block/zram0/recompress

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/Kconfig    |  11 ++
 drivers/block/zram/zram_drv.c | 191 +++++++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h |   1 +
 3 files changed, 200 insertions(+), 3 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index d4100b0c083e..81ae4b96ec1a 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -78,3 +78,14 @@ config ZRAM_MEMORY_TRACKING
 	  /sys/kernel/debug/zram/zramX/block_state.
 
 	  See Documentation/admin-guide/blockdev/zram.rst for more information.
+
+config ZRAM_MULTI_COMP
+	bool "Enable multiple per-CPU compression streams"
+	depends on ZRAM
+	help
+	This will enable per-CPU multi-compression streams, so that ZRAM
+	can re-compress IDLE pages, using a potentially slower but more
+	effective compression algorithm.
+
+          echo TIMEOUT > /sys/block/zramX/idle
+          echo SIZE > /sys/block/zramX/recompress
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index de2970865b7b..386e49a13806 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1293,6 +1293,9 @@ static void zram_free_page(struct zram *zram, size_t index)
 		atomic64_dec(&zram->stats.huge_pages);
 	}
 
+	if (zram_test_flag(zram, index, ZRAM_RECOMP))
+		zram_clear_flag(zram, index, ZRAM_RECOMP);
+
 	if (zram_test_flag(zram, index, ZRAM_WB)) {
 		zram_clear_flag(zram, index, ZRAM_WB);
 		free_block_bdev(zram, zram_get_element(zram, index));
@@ -1357,6 +1360,7 @@ static int zram_read_from_zspool(struct zram *zram,
 	unsigned long handle;
 	unsigned int size;
 	void *src, *dst;
+	u32 idx;
 	int ret;
 
 	handle = zram_get_handle(zram, index);
@@ -1373,8 +1377,13 @@ static int zram_read_from_zspool(struct zram *zram,
 
 	size = zram_get_obj_size(zram, index);
 
-	if (size != PAGE_SIZE)
-		zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_ZCOMP]);
+	if (size != PAGE_SIZE) {
+		idx = ZRAM_PRIMARY_ZCOMP;
+		if (zram_test_flag(zram, index, ZRAM_RECOMP))
+			idx = ZRAM_SECONDARY_ZCOMP;
+
+		zstrm = zcomp_stream_get(zram->comps[idx]);
+	}
 
 	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
 	if (size == PAGE_SIZE) {
@@ -1386,7 +1395,7 @@ static int zram_read_from_zspool(struct zram *zram,
 		dst = kmap_atomic(page);
 		ret = zcomp_decompress(zstrm, src, size, dst);
 		kunmap_atomic(dst);
-		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
+		zcomp_stream_put(zram->comps[idx]);
 	}
 	zs_unmap_object(zram->mem_pool, handle);
 	return ret;
@@ -1612,6 +1621,180 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	return ret;
 }
 
+#ifdef CONFIG_ZRAM_MULTI_COMP
+/*
+ * This function will decompress (unless it's ZRAM_HUGE) the page and then
+ * attempt to compress it using secondary compression algorithm (which is
+ * potentially more effective).
+ *
+ * Corresponding ZRAM slot should be locked.
+ */
+static int zram_recompress(struct zram *zram,
+			   u32 index,
+			   struct page *page,
+			   int size_watermark)
+{
+	unsigned long handle_prev;
+	unsigned long handle_next;
+	unsigned int comp_len_next;
+	unsigned int  comp_len_prev;
+	struct zcomp_strm *zstrm;
+	void *src, *dst;
+	int ret;
+
+	handle_prev = zram_get_handle(zram, index);
+	if (!handle_prev)
+		return -EINVAL;
+
+	comp_len_prev = zram_get_obj_size(zram, index);
+	/*
+	 * Do not recompress objects that are already "small enough".
+	 */
+	if (comp_len_prev < size_watermark)
+		return 0;
+
+	ret = zram_read_from_zspool(zram, page, index);
+	if (ret)
+		return ret;
+
+	zstrm = zcomp_stream_get(zram->comps[ZRAM_SECONDARY_ZCOMP]);
+	src = kmap_atomic(page);
+	ret = zcomp_compress(zstrm, src, &comp_len_next);
+	kunmap_atomic(src);
+
+	/*
+	 * Either a compression error or we failed to compressed the object
+	 * in a way that will save us memory. Mark object as "recompressed"
+	 * if it's huge, so that we don't try to recompress it again. Ideally
+	 * we want to set some bit for all such objects, but we for now do so
+	 * only for huge ones (we are out of bits in flags on 32-bit systems).
+	 */
+	if (comp_len_next >= huge_class_size ||
+	    comp_len_next >= comp_len_prev ||
+	    ret) {
+		if (zram_test_flag(zram, index, ZRAM_HUGE))
+			zram_set_flag(zram, index, ZRAM_RECOMP);
+		zram_clear_flag(zram, index, ZRAM_IDLE);
+		zcomp_stream_put(zram->comps[ZRAM_SECONDARY_ZCOMP]);
+		return ret;
+	}
+
+	/*
+	 * No direct reclaim (slow path) for handle allocation and no
+	 * re-compression attempt (unlike in __zram_bvec_write()) since
+	 * we already stored that object in zsmalloc. If we cannot alloc
+	 * memory then me bail out.
+	 */
+	handle_next = zs_malloc(zram->mem_pool, comp_len_next,
+				__GFP_KSWAPD_RECLAIM |
+				__GFP_NOWARN |
+				__GFP_HIGHMEM |
+				__GFP_MOVABLE);
+	if (IS_ERR((void *)handle_next)) {
+		zcomp_stream_put(zram->comps[ZRAM_SECONDARY_ZCOMP]);
+		return -ENOMEM;
+	}
+
+	dst = zs_map_object(zram->mem_pool, handle_next, ZS_MM_WO);
+	memcpy(dst, zstrm->buffer, comp_len_next);
+	zcomp_stream_put(zram->comps[ZRAM_SECONDARY_ZCOMP]);
+
+	zs_unmap_object(zram->mem_pool, handle_next);
+
+	zram_free_page(zram, index);
+	zram_set_handle(zram, index, handle_next);
+	zram_set_obj_size(zram, index, comp_len_next);
+
+	zram_set_flag(zram, index, ZRAM_RECOMP);
+	atomic64_add(comp_len_next, &zram->stats.compr_data_size);
+	atomic64_inc(&zram->stats.pages_stored);
+
+	return 0;
+}
+
+#define RECOMPRESS_IDLE		(1 << 0)
+#define RECOMPRESS_HUGE		(1 << 1)
+
+static ssize_t recompress_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf,
+				size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+	unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
+	unsigned long index;
+	struct page *page;
+	ssize_t ret = 0;
+	int mode, size_watermark = 0;
+
+	if (sysfs_streq(buf, "idle")) {
+		mode = RECOMPRESS_IDLE;
+	} else if (sysfs_streq(buf, "huge")) {
+		mode = RECOMPRESS_HUGE;
+	} else if (sysfs_streq(buf, "huge_idle")) {
+		mode = RECOMPRESS_IDLE | RECOMPRESS_HUGE;
+	} else {
+		/*
+		 * We will re-compress only idle objects equal or greater
+		 * in size than watermark.
+		 */
+		ret = kstrtoint(buf, 10, &size_watermark);
+		if (ret)
+			return ret;
+		mode = RECOMPRESS_IDLE;
+	}
+
+	if (size_watermark > PAGE_SIZE)
+		return ret;
+
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		ret = -EINVAL;
+		goto release_init_lock;
+	}
+
+	page = alloc_page(GFP_KERNEL);
+	if (!page) {
+		ret = -ENOMEM;
+		goto release_init_lock;
+	}
+
+	for (index = 0; index < nr_pages; index++) {
+		zram_slot_lock(zram, index);
+
+		if (!zram_allocated(zram, index))
+			goto next;
+
+		if (mode & RECOMPRESS_IDLE &&
+		    !zram_test_flag(zram, index, ZRAM_IDLE))
+			goto next;
+
+		if (mode & RECOMPRESS_HUGE &&
+		    !zram_test_flag(zram, index, ZRAM_HUGE))
+			goto next;
+
+		if (zram_test_flag(zram, index, ZRAM_WB) ||
+		    zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
+		    zram_test_flag(zram, index, ZRAM_SAME) ||
+		    zram_test_flag(zram, index, ZRAM_RECOMP))
+			goto next;
+
+		ret = zram_recompress(zram, index, page, size_watermark);
+next:
+		zram_slot_unlock(zram, index);
+		if (ret)
+			break;
+	}
+
+	ret = len;
+	__free_page(page);
+
+release_init_lock:
+	up_read(&zram->init_lock);
+	return ret;
+}
+#endif
+
 /*
  * zram_bio_discard - handler on discard request
  * @index: physical block index in PAGE_SIZE units
@@ -2001,6 +2184,7 @@ static DEVICE_ATTR_RW(writeback_limit_enable);
 #endif
 #ifdef CONFIG_ZRAM_MULTI_COMP
 static DEVICE_ATTR_RW(recomp_algorithm);
+static DEVICE_ATTR_WO(recompress);
 #endif
 
 static struct attribute *zram_disk_attrs[] = {
@@ -2027,6 +2211,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_debug_stat.attr,
 #ifdef CONFIG_ZRAM_MULTI_COMP
 	&dev_attr_recomp_algorithm.attr,
+	&dev_attr_recompress.attr,
 #endif
 	NULL,
 };
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index af3d6f6bfcff..b4eecef2a11f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -50,6 +50,7 @@ enum zram_pageflags {
 	ZRAM_UNDER_WB,	/* page is under writeback */
 	ZRAM_HUGE,	/* Incompressible page */
 	ZRAM_IDLE,	/* not accessed page since last idle marking */
+	ZRAM_RECOMP,	/* page was recompressed */
 
 	__NR_ZRAM_PAGEFLAGS,
 };
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 5/7] documentation: Add recompression documentation
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (6 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH RFC 4/7] zram: Introduce recompress sysfs knob Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 6/7] zram: Add recompression algorithm choice to Kconfig Sergey Senozhatsky
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Document user-space visible device attributes that
are enabled by ZRAM_MULTI_COMP.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/admin-guide/blockdev/zram.rst | 55 +++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index c73b16930449..88957fcb6ad7 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -401,6 +401,61 @@ budget in next setting is user's job.
 If admin wants to measure writeback count in a certain period, they could
 know it via /sys/block/zram0/bd_stat's 3rd column.
 
+recompression
+-------------
+
+With CONFIG_ZRAM_MULTI_COMP, zram can recompress idle/huge pages using
+alternative (secondary) compression algorithm. The basic idea is that
+alternative compression algorithm can provide better compression ratio
+at a price of (potentially) slower compression/decompression speeds.
+Alternative compression algorithm can, for example, be more successful
+compressing huge pages (those that default algorithm failed to compress).
+Another application is idle pages recompression - pages that are cold and
+sit in the memory can be recompressed using more effective algorithm and,
+hence, reduce zsmalloc memory usage.
+
+With CONFIG_ZRAM_MULTI_COMP, zram will setup two compression algorithms
+per-CPU: primary and secondary ones. Primary zram compressor is explained
+in "3) Select compression algorithm", the secondary algorithm is configured
+in a similar way, using recomp_algorithm device attribute:
+
+Examples::
+
+	#show supported recompression algorithms
+	cat /sys/block/zramX/recomp_algorithm
+	zstd [lzo]
+
+	#select zstd recompression algorithm
+	echo zstd > /sys/block/zramX/recomp_algorithm
+
+Another device attribute that CONFIG_ZRAM_MULTI_COMP enables is recompress,
+which controls recompression:
+
+Examples::
+
+	#IDLE pages recompression is activated by `idle` mode
+	echo idle > /sys/block/zramX/recompress
+
+	#HUGE pages recompression is activated by `huge` mode
+	echo huge > /sys/block/zram0/recompress
+
+	#HUGE_IDLE pages recompression is activated by `huge_idle` mode
+	echo huge_idle > /sys/block/zramX/recompress
+
+The number of idle pages can be significant, so user-space can pass a size
+watermark value to the recompress knob, to filter out idle pages for
+recompression: zram will recompress only idle pages of equal or greater
+size:::
+
+	#recompress idle pages larger than 3000 bytes
+	echo 3000 > /sys/block/zramX/recompress
+
+	#recompress idle pages larger than 2000 bytes
+	echo 2000 > /sys/block/zramX/recompress
+
+Recompression is mostly focused on idle pages (except for huge pages
+recompression), so it works better in conjunction with memory tracking.
+
 memory tracking
 ===============
 
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 6/7] zram: Add recompression algorithm choice to Kconfig
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (7 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH RFC 5/7] documentation: Add recompression documentation Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:15 ` [PATCH RFC 7/7] zram: Add recompress flag to read_block_state() Sergey Senozhatsky
  2022-09-05  8:21 ` [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Make (secondary) recompression algorithm selectable just like
we do it for the (primary) default one.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/Kconfig    | 40 +++++++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.c |  2 +-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 81ae4b96ec1a..fc2d4d66c484 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -89,3 +89,43 @@ config ZRAM_MULTI_COMP
 
           echo TIMEOUT > /sys/block/zramX/idle
           echo SIZE > /sys/block/zramX/recompress
+
+choice
+	prompt "Default zram recompression algorithm"
+	default ZRAM_DEF_RECOMP_ZSTD
+	depends on ZRAM && ZRAM_MULTI_COMP
+
+config ZRAM_DEF_RECOMP_LZORLE
+	bool "lzo-rle"
+	depends on CRYPTO_LZO
+
+config ZRAM_DEF_RECOMP_ZSTD
+	bool "zstd"
+	depends on CRYPTO_ZSTD
+
+config ZRAM_DEF_RECOMP_LZ4
+	bool "lz4"
+	depends on CRYPTO_LZ4
+
+config ZRAM_DEF_RECOMP_LZO
+	bool "lzo"
+	depends on CRYPTO_LZO
+
+config ZRAM_DEF_RECOMP_LZ4HC
+	bool "lz4hc"
+	depends on CRYPTO_LZ4HC
+
+config ZRAM_DEF_RECOMP_842
+	bool "842"
+	depends on CRYPTO_842
+
+endchoice
+
+config ZRAM_DEF_RECOMP
+	string
+	default "lzo-rle" if ZRAM_DEF_RECOMP_LZORLE
+	default "zstd" if ZRAM_DEF_RECOMP_ZSTD
+	default "lz4" if ZRAM_DEF_RECOMP_LZ4
+	default "lzo" if ZRAM_DEF_RECOMP_LZO
+	default "lz4hc" if ZRAM_DEF_RECOMP_LZ4HC
+	default "842" if ZRAM_DEF_RECOMP_842
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 386e49a13806..8ed41514b8f0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -44,7 +44,7 @@ static int zram_major;
 static const char *default_comp_algs[ZRAM_MAX_ZCOMPS] = {
 	CONFIG_ZRAM_DEF_COMP,
 #ifdef CONFIG_ZRAM_MULTI_COMP
-	"zstd",
+	CONFIG_ZRAM_DEF_RECOMP,
 #endif
 };
 
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 7/7] zram: Add recompress flag to read_block_state()
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (8 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH RFC 6/7] zram: Add recompression algorithm choice to Kconfig Sergey Senozhatsky
@ 2022-09-05  8:15 ` Sergey Senozhatsky
  2022-09-05  8:21 ` [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:15 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky

Add a new flag to zram block state that shows if the page
was recompressed (using alternative compression algorithm).

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/admin-guide/blockdev/zram.rst | 9 ++++++---
 drivers/block/zram/zram_drv.c               | 5 +++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 88957fcb6ad7..70a3d0243b45 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -466,9 +466,10 @@ pages of the process with*pagemap.
 If you enable the feature, you could see block state via
 /sys/kernel/debug/zram/zram0/block_state". The output is as follows::
 
-	  300    75.033841 .wh.
-	  301    63.806904 s...
-	  302    63.806919 ..hi
+	  300    75.033841 .wh..
+	  301    63.806904 s....
+	  302    63.806919 ..hi.
+	  303    62.801919 ....r
 
 First column
 	zram's block index.
@@ -485,6 +486,8 @@ Third column
 		huge page
 	i:
 		idle page
+	r:
+		recompressed page (secondary compression algorithm)
 
 First line of above example says 300th block is accessed at 75.033841sec
 and the block's state is huge so it is written back to the backing
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 8ed41514b8f0..f3948abce2f7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -936,13 +936,14 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
 
 		ts = ktime_to_timespec64(zram->table[index].ac_time);
 		copied = snprintf(kbuf + written, count,
-			"%12zd %12lld.%06lu %c%c%c%c\n",
+			"%12zd %12lld.%06lu %c%c%c%c%c\n",
 			index, (s64)ts.tv_sec,
 			ts.tv_nsec / NSEC_PER_USEC,
 			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
 			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
 			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.',
-			zram_test_flag(zram, index, ZRAM_IDLE) ? 'i' : '.');
+			zram_test_flag(zram, index, ZRAM_IDLE) ? 'i' : '.',
+			zram_test_flag(zram, index, ZRAM_RECOMP) ? 'r' : '.');
 
 		if (count <= copied) {
 			zram_slot_unlock(zram, index);
-- 
2.37.2.789.g6183377224-goog



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 0/7] zram: Support multiple compression streams
  2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
                   ` (9 preceding siblings ...)
  2022-09-05  8:15 ` [PATCH RFC 7/7] zram: Add recompress flag to read_block_state() Sergey Senozhatsky
@ 2022-09-05  8:21 ` Sergey Senozhatsky
  10 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  8:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, Nitin Gupta, linux-kernel, linux-mm

On (22/09/05 17:15), Sergey Senozhatsky wrote:
> Hello,
> 
> 	RFC series that adds support for multiple (per-CPU)
> compression streams (at point only 2). The main idea is that
> different compression algorithms have different characteristics
> and zram may benefit when it uses a combination of algorithms:
> a default algorithm that is faster but have lower compression
> rate and a secondary algorithm that can use higher compression
> rate at a price of slower compression/decompression.
> 
> 	There are several use-case for this functionality:
> 

Please ignore this series. I had some extra patches in my
"outgoing" directory. I'll resend.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 4/7] zram: Introduce recompress sysfs knob
  2022-09-05  8:15 ` [PATCH RFC 4/7] zram: Introduce recompress sysfs knob Sergey Senozhatsky
@ 2022-09-05  9:21   ` Barry Song
  2022-09-05  9:53     ` Sergey Senozhatsky
  0 siblings, 1 reply; 17+ messages in thread
From: Barry Song @ 2022-09-05  9:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, Nitin Gupta, linux-kernel, linux-mm

On Mon, Sep 5, 2022 at 9:00 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> Allow zram to recompress (using secondary compression streams)
> pages. We support three modes:
>
> 1) IDLE pages recompression is activated by `idle` mode
>
>         echo idle > /sys/block/zram0/recompress
>
> 2) Since there may be many idle pages user-space may pass a size
> watermark value and we will recompress IDLE pages only of equal
> or greater size:
>
>         echo 888 > /sys/block/zram0/recompress
>
> 3) HUGE pages recompression is activated by `huge` mode
>
>         echo huge > /sys/block/zram0/recompress

Thanks for developing this interesting feature. It seems reasonable for cold
pages. But for a huge page,  do you have some data to show that the hugepage
is not compressed by lzo/lz4 so we need zstd further? i assume the size of
the huge page you are talking about is 2MB?

what if the huge page is not cold and swapped out/in frequently?

on second thoughts, it seems you mean hugepage is those pages
whose compressed data is big? if so, can you please avoid using
"huge page" as it is quite misleading in linux. we are using hugepage
for pages larger than 4KB.

>
> 4) HUGE_IDLE pages recompression is activated by `huge_idle` mode
>
>         echo huge_idle > /sys/block/zram0/recompress
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  drivers/block/zram/Kconfig    |  11 ++
>  drivers/block/zram/zram_drv.c | 191 +++++++++++++++++++++++++++++++++-
>  drivers/block/zram/zram_drv.h |   1 +
>  3 files changed, 200 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> index d4100b0c083e..81ae4b96ec1a 100644
> --- a/drivers/block/zram/Kconfig
> +++ b/drivers/block/zram/Kconfig
> @@ -78,3 +78,14 @@ config ZRAM_MEMORY_TRACKING
>           /sys/kernel/debug/zram/zramX/block_state.
>
>           See Documentation/admin-guide/blockdev/zram.rst for more information.
> +
> +config ZRAM_MULTI_COMP
> +       bool "Enable multiple per-CPU compression streams"
> +       depends on ZRAM
> +       help
> +       This will enable per-CPU multi-compression streams, so that ZRAM
> +       can re-compress IDLE pages, using a potentially slower but more
> +       effective compression algorithm.
> +
> +          echo TIMEOUT > /sys/block/zramX/idle
> +          echo SIZE > /sys/block/zramX/recompress
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index de2970865b7b..386e49a13806 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1293,6 +1293,9 @@ static void zram_free_page(struct zram *zram, size_t index)
>                 atomic64_dec(&zram->stats.huge_pages);
>         }
>
> +       if (zram_test_flag(zram, index, ZRAM_RECOMP))
> +               zram_clear_flag(zram, index, ZRAM_RECOMP);
> +
>         if (zram_test_flag(zram, index, ZRAM_WB)) {
>                 zram_clear_flag(zram, index, ZRAM_WB);
>                 free_block_bdev(zram, zram_get_element(zram, index));
> @@ -1357,6 +1360,7 @@ static int zram_read_from_zspool(struct zram *zram,
>         unsigned long handle;
>         unsigned int size;
>         void *src, *dst;
> +       u32 idx;
>         int ret;
>
>         handle = zram_get_handle(zram, index);
> @@ -1373,8 +1377,13 @@ static int zram_read_from_zspool(struct zram *zram,
>
>         size = zram_get_obj_size(zram, index);
>
> -       if (size != PAGE_SIZE)
> -               zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_ZCOMP]);
> +       if (size != PAGE_SIZE) {
> +               idx = ZRAM_PRIMARY_ZCOMP;
> +               if (zram_test_flag(zram, index, ZRAM_RECOMP))
> +                       idx = ZRAM_SECONDARY_ZCOMP;
> +
> +               zstrm = zcomp_stream_get(zram->comps[idx]);
> +       }
>
>         src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
>         if (size == PAGE_SIZE) {
> @@ -1386,7 +1395,7 @@ static int zram_read_from_zspool(struct zram *zram,
>                 dst = kmap_atomic(page);
>                 ret = zcomp_decompress(zstrm, src, size, dst);
>                 kunmap_atomic(dst);
> -               zcomp_stream_put(zram->comps[ZRAM_PRIMARY_ZCOMP]);
> +               zcomp_stream_put(zram->comps[idx]);
>         }
>         zs_unmap_object(zram->mem_pool, handle);
>         return ret;
> @@ -1612,6 +1621,180 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
>         return ret;
>  }
>
> +#ifdef CONFIG_ZRAM_MULTI_COMP
> +/*
> + * This function will decompress (unless it's ZRAM_HUGE) the page and then
> + * attempt to compress it using secondary compression algorithm (which is
> + * potentially more effective).
> + *
> + * Corresponding ZRAM slot should be locked.
> + */
> +static int zram_recompress(struct zram *zram,
> +                          u32 index,
> +                          struct page *page,
> +                          int size_watermark)
> +{
> +       unsigned long handle_prev;
> +       unsigned long handle_next;
> +       unsigned int comp_len_next;
> +       unsigned int  comp_len_prev;
> +       struct zcomp_strm *zstrm;
> +       void *src, *dst;
> +       int ret;
> +
> +       handle_prev = zram_get_handle(zram, index);
> +       if (!handle_prev)
> +               return -EINVAL;
> +
> +       comp_len_prev = zram_get_obj_size(zram, index);
> +       /*
> +        * Do not recompress objects that are already "small enough".
> +        */
> +       if (comp_len_prev < size_watermark)
> +               return 0;
> +
> +       ret = zram_read_from_zspool(zram, page, index);
> +       if (ret)
> +               return ret;
> +
> +       zstrm = zcomp_stream_get(zram->comps[ZRAM_SECONDARY_ZCOMP]);
> +       src = kmap_atomic(page);
> +       ret = zcomp_compress(zstrm, src, &comp_len_next);
> +       kunmap_atomic(src);
> +
> +       /*
> +        * Either a compression error or we failed to compressed the object
> +        * in a way that will save us memory. Mark object as "recompressed"
> +        * if it's huge, so that we don't try to recompress it again. Ideally
> +        * we want to set some bit for all such objects, but we for now do so
> +        * only for huge ones (we are out of bits in flags on 32-bit systems).
> +        */
> +       if (comp_len_next >= huge_class_size ||
> +           comp_len_next >= comp_len_prev ||
> +           ret) {
> +               if (zram_test_flag(zram, index, ZRAM_HUGE))
> +                       zram_set_flag(zram, index, ZRAM_RECOMP);
> +               zram_clear_flag(zram, index, ZRAM_IDLE);
> +               zcomp_stream_put(zram->comps[ZRAM_SECONDARY_ZCOMP]);
> +               return ret;
> +       }
> +
> +       /*
> +        * No direct reclaim (slow path) for handle allocation and no
> +        * re-compression attempt (unlike in __zram_bvec_write()) since
> +        * we already stored that object in zsmalloc. If we cannot alloc
> +        * memory then me bail out.
> +        */
> +       handle_next = zs_malloc(zram->mem_pool, comp_len_next,
> +                               __GFP_KSWAPD_RECLAIM |
> +                               __GFP_NOWARN |
> +                               __GFP_HIGHMEM |
> +                               __GFP_MOVABLE);
> +       if (IS_ERR((void *)handle_next)) {
> +               zcomp_stream_put(zram->comps[ZRAM_SECONDARY_ZCOMP]);
> +               return -ENOMEM;
> +       }
> +
> +       dst = zs_map_object(zram->mem_pool, handle_next, ZS_MM_WO);
> +       memcpy(dst, zstrm->buffer, comp_len_next);
> +       zcomp_stream_put(zram->comps[ZRAM_SECONDARY_ZCOMP]);
> +
> +       zs_unmap_object(zram->mem_pool, handle_next);
> +
> +       zram_free_page(zram, index);
> +       zram_set_handle(zram, index, handle_next);
> +       zram_set_obj_size(zram, index, comp_len_next);
> +
> +       zram_set_flag(zram, index, ZRAM_RECOMP);
> +       atomic64_add(comp_len_next, &zram->stats.compr_data_size);
> +       atomic64_inc(&zram->stats.pages_stored);
> +
> +       return 0;
> +}
> +
> +#define RECOMPRESS_IDLE                (1 << 0)
> +#define RECOMPRESS_HUGE                (1 << 1)
> +
> +static ssize_t recompress_store(struct device *dev,
> +                               struct device_attribute *attr,
> +                               const char *buf,
> +                               size_t len)
> +{
> +       struct zram *zram = dev_to_zram(dev);
> +       unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
> +       unsigned long index;
> +       struct page *page;
> +       ssize_t ret = 0;
> +       int mode, size_watermark = 0;
> +
> +       if (sysfs_streq(buf, "idle")) {
> +               mode = RECOMPRESS_IDLE;
> +       } else if (sysfs_streq(buf, "huge")) {
> +               mode = RECOMPRESS_HUGE;
> +       } else if (sysfs_streq(buf, "huge_idle")) {
> +               mode = RECOMPRESS_IDLE | RECOMPRESS_HUGE;
> +       } else {
> +               /*
> +                * We will re-compress only idle objects equal or greater
> +                * in size than watermark.
> +                */
> +               ret = kstrtoint(buf, 10, &size_watermark);
> +               if (ret)
> +                       return ret;
> +               mode = RECOMPRESS_IDLE;
> +       }
> +
> +       if (size_watermark > PAGE_SIZE)
> +               return ret;
> +
> +       down_read(&zram->init_lock);
> +       if (!init_done(zram)) {
> +               ret = -EINVAL;
> +               goto release_init_lock;
> +       }
> +
> +       page = alloc_page(GFP_KERNEL);
> +       if (!page) {
> +               ret = -ENOMEM;
> +               goto release_init_lock;
> +       }
> +
> +       for (index = 0; index < nr_pages; index++) {
> +               zram_slot_lock(zram, index);
> +
> +               if (!zram_allocated(zram, index))
> +                       goto next;
> +
> +               if (mode & RECOMPRESS_IDLE &&
> +                   !zram_test_flag(zram, index, ZRAM_IDLE))
> +                       goto next;
> +
> +               if (mode & RECOMPRESS_HUGE &&
> +                   !zram_test_flag(zram, index, ZRAM_HUGE))
> +                       goto next;
> +
> +               if (zram_test_flag(zram, index, ZRAM_WB) ||
> +                   zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
> +                   zram_test_flag(zram, index, ZRAM_SAME) ||
> +                   zram_test_flag(zram, index, ZRAM_RECOMP))
> +                       goto next;
> +
> +               ret = zram_recompress(zram, index, page, size_watermark);
> +next:
> +               zram_slot_unlock(zram, index);
> +               if (ret)
> +                       break;
> +       }
> +
> +       ret = len;
> +       __free_page(page);
> +
> +release_init_lock:
> +       up_read(&zram->init_lock);
> +       return ret;
> +}
> +#endif
> +
>  /*
>   * zram_bio_discard - handler on discard request
>   * @index: physical block index in PAGE_SIZE units
> @@ -2001,6 +2184,7 @@ static DEVICE_ATTR_RW(writeback_limit_enable);
>  #endif
>  #ifdef CONFIG_ZRAM_MULTI_COMP
>  static DEVICE_ATTR_RW(recomp_algorithm);
> +static DEVICE_ATTR_WO(recompress);
>  #endif
>
>  static struct attribute *zram_disk_attrs[] = {
> @@ -2027,6 +2211,7 @@ static struct attribute *zram_disk_attrs[] = {
>         &dev_attr_debug_stat.attr,
>  #ifdef CONFIG_ZRAM_MULTI_COMP
>         &dev_attr_recomp_algorithm.attr,
> +       &dev_attr_recompress.attr,
>  #endif
>         NULL,
>  };
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index af3d6f6bfcff..b4eecef2a11f 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -50,6 +50,7 @@ enum zram_pageflags {
>         ZRAM_UNDER_WB,  /* page is under writeback */
>         ZRAM_HUGE,      /* Incompressible page */
>         ZRAM_IDLE,      /* not accessed page since last idle marking */
> +       ZRAM_RECOMP,    /* page was recompressed */
>
>         __NR_ZRAM_PAGEFLAGS,
>  };
> --
> 2.37.2.789.g6183377224-goog
>

Thanks
Barry


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 4/7] zram: Introduce recompress sysfs knob
  2022-09-05  9:21   ` Barry Song
@ 2022-09-05  9:53     ` Sergey Senozhatsky
  2022-09-05 10:06       ` Barry Song
  0 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05  9:53 UTC (permalink / raw)
  To: Barry Song
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, Nitin Gupta,
	linux-kernel, linux-mm

On (22/09/05 21:21), Barry Song wrote:
> > 3) HUGE pages recompression is activated by `huge` mode
> >
> >         echo huge > /sys/block/zram0/recompress
> 
> Thanks for developing this interesting feature. It seems reasonable for cold
> pages. But for a huge page,  do you have some data to show that the hugepage
> is not compressed by lzo/lz4 so we need zstd further? i assume the size of
> the huge page you are talking about is 2MB?

Oh, yeah, this is the lingo we use in zram. We used "huge" object and "huge"
size class in zsmalloc and the term "huge" transitioned to zram, but zram
operates with pages not objects, so huge zsmalloc object is "huge zram page".

> on second thoughts, it seems you mean hugepage is those pages
> whose compressed data is big? if so, can you please avoid using
> "huge page" as it is quite misleading in linux. we are using hugepage
> for pages larger than 4KB.

Yes, you are right. And I wish I could use a different term, but...
this is what zram has been using for many years:
Documentation/admin-guide/blockdev/zram.rst

And we already accept "huge" and "huge pages", and so, in sysfs knobs
(zram device attrs), so the confusing term, unfortunately, is there
forever.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 4/7] zram: Introduce recompress sysfs knob
  2022-09-05  9:53     ` Sergey Senozhatsky
@ 2022-09-05 10:06       ` Barry Song
  2022-09-05 10:17         ` Sergey Senozhatsky
  0 siblings, 1 reply; 17+ messages in thread
From: Barry Song @ 2022-09-05 10:06 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, Nitin Gupta, linux-kernel, linux-mm

On Mon, Sep 5, 2022 at 9:53 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (22/09/05 21:21), Barry Song wrote:
> > > 3) HUGE pages recompression is activated by `huge` mode
> > >
> > >         echo huge > /sys/block/zram0/recompress
> >
> > Thanks for developing this interesting feature. It seems reasonable for cold
> > pages. But for a huge page,  do you have some data to show that the hugepage
> > is not compressed by lzo/lz4 so we need zstd further? i assume the size of
> > the huge page you are talking about is 2MB?
>
> Oh, yeah, this is the lingo we use in zram. We used "huge" object and "huge"
> size class in zsmalloc and the term "huge" transitioned to zram, but zram
> operates with pages not objects, so huge zsmalloc object is "huge zram page".
>
> > on second thoughts, it seems you mean hugepage is those pages
> > whose compressed data is big? if so, can you please avoid using
> > "huge page" as it is quite misleading in linux. we are using hugepage
> > for pages larger than 4KB.
>
> Yes, you are right. And I wish I could use a different term, but...
> this is what zram has been using for many years:
> Documentation/admin-guide/blockdev/zram.rst
>
> And we already accept "huge" and "huge pages", and so, in sysfs knobs
> (zram device attrs), so the confusing term, unfortunately, is there
> forever.

make sense! thanks! i assume you will have some benchmark data to compare
three cases,
1. lzo with recompress zstd
2. always use lzo
3. always use zstd

such as power consumption, cpu utilization, available memory, benefits to user
experiences especially to UI smoothness under memory pressure?

Thanks
Barry


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 4/7] zram: Introduce recompress sysfs knob
  2022-09-05 10:06       ` Barry Song
@ 2022-09-05 10:17         ` Sergey Senozhatsky
  2022-09-05 21:27           ` Barry Song
  0 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2022-09-05 10:17 UTC (permalink / raw)
  To: Barry Song
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, Nitin Gupta,
	linux-kernel, linux-mm

On (22/09/05 22:06), Barry Song wrote:
> 
> make sense! thanks! i assume you will have some benchmark data to compare
> three cases,
> 1. lzo with recompress zstd
> 2. always use lzo
> 3. always use zstd
> 
> such as power consumption, cpu utilization, available memory, benefits to user
> experiences especially to UI smoothness under memory pressure?

So I didn't want to include any benchmarks, because this is entirely
specific to device's data sets/patterns. In term of CPU usage, zstd
decompression is really fast [1]; and the way plan to use is battery
aware - e.g. when low on battery do not recompress at all, if AC is
plugged in then recompress more aggressively, etc.

In term of benchmarks... a copy paste from our internal tests. But
*do note* that this is relative only to our specific data sets.
Your millage may vary.

ZSTD recomp algorithm (5.10 kernel, so the last column is the number of
'zram huge pages'):

- Initial state of zram swap partition
localhost ~ # cat /sys/block/zram0/mm_stat 
8955662336 2180671776 2277711872        0 3179720704   798724   469474   118949

- Recompress HUGE objects only
localhost ~ # echo huge > /sys/block/zram0/recompress 
localhost ~ # cat /sys/block/zram0/mm_stat 
8944390144 2106998658 2211835904        0 3179720704   798617   469474    66821

- Recompress IDLE pages that are >= 3000 bytes in size
localhost ~ # echo 3000 > /sys/block/zram0/recompress 
localhost ~ # cat /sys/block/zram0/mm_stat 
8934166528 2085232505 2207690752        0 3179720704   798484   469474    66811

- Recompress the remaining IDLE pages that are >= 2000 bytes in size
localhost ~ # echo 2000 > /sys/block/zram0/recompress 
localhost ~ # cat /sys/block/zram0/mm_stat 
8913981440 1946488434 2145157120        0 3179720704   798130   469474    66498

- Recompress the remaining IDLE pages that are >= 1000 bytes in size
localhost ~ # echo 1000 > /sys/block/zram0/recompress 
localhost ~ # cat /sys/block/zram0/mm_stat 
8905592832 1711533182 1984495616        0 3179720704   797162   469474    66222

[1] https://facebook.github.io/zstd/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 4/7] zram: Introduce recompress sysfs knob
  2022-09-05 10:17         ` Sergey Senozhatsky
@ 2022-09-05 21:27           ` Barry Song
  0 siblings, 0 replies; 17+ messages in thread
From: Barry Song @ 2022-09-05 21:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, Nitin Gupta, linux-kernel, linux-mm

On Mon, Sep 5, 2022 at 10:17 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (22/09/05 22:06), Barry Song wrote:
> >
> > make sense! thanks! i assume you will have some benchmark data to compare
> > three cases,
> > 1. lzo with recompress zstd
> > 2. always use lzo
> > 3. always use zstd
> >
> > such as power consumption, cpu utilization, available memory, benefits to user
> > experiences especially to UI smoothness under memory pressure?
>
> So I didn't want to include any benchmarks, because this is entirely
> specific to device's data sets/patterns. In term of CPU usage, zstd
> decompression is really fast [1]; and the way plan to use is battery
> aware - e.g. when low on battery do not recompress at all, if AC is
> plugged in then recompress more aggressively, etc.
>
> In term of benchmarks... a copy paste from our internal tests. But
> *do note* that this is relative only to our specific data sets.
> Your millage may vary.
>
> ZSTD recomp algorithm (5.10 kernel, so the last column is the number of
> 'zram huge pages'):
>
> - Initial state of zram swap partition
> localhost ~ # cat /sys/block/zram0/mm_stat
> 8955662336 2180671776 2277711872        0 3179720704   798724   469474   118949
>
> - Recompress HUGE objects only
> localhost ~ # echo huge > /sys/block/zram0/recompress
> localhost ~ # cat /sys/block/zram0/mm_stat
> 8944390144 2106998658 2211835904        0 3179720704   798617   469474    66821
>
> - Recompress IDLE pages that are >= 3000 bytes in size
> localhost ~ # echo 3000 > /sys/block/zram0/recompress
> localhost ~ # cat /sys/block/zram0/mm_stat
> 8934166528 2085232505 2207690752        0 3179720704   798484   469474    66811
>
> - Recompress the remaining IDLE pages that are >= 2000 bytes in size
> localhost ~ # echo 2000 > /sys/block/zram0/recompress
> localhost ~ # cat /sys/block/zram0/mm_stat
> 8913981440 1946488434 2145157120        0 3179720704   798130   469474    66498
>
> - Recompress the remaining IDLE pages that are >= 1000 bytes in size
> localhost ~ # echo 1000 > /sys/block/zram0/recompress
> localhost ~ # cat /sys/block/zram0/mm_stat
> 8905592832 1711533182 1984495616        0 3179720704   797162   469474    66222
>
> [1] https://facebook.github.io/zstd/

Thanks very much. I guess the difficulty is that we need to
comprehensively evaluate its
effect on a real product such as android. there is always a trade-off
between cpu utilization
and more available memory.

Best Regards
Barry


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-09-05 21:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-05  8:15 [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH 1/3] documentation: Add recompression documentation Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH RFC 1/7] zram: Preparation for multi-zcomp support Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH 2/3] zram: Add recompression algorithm choice to Kconfig Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH RFC 2/7] zram: Add recompression algorithm sysfs knob Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH 3/3] zram: Add recompress flag to read_block_state() Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH RFC 3/7] zram: Factor out WB and non-WB zram read functions Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH RFC 4/7] zram: Introduce recompress sysfs knob Sergey Senozhatsky
2022-09-05  9:21   ` Barry Song
2022-09-05  9:53     ` Sergey Senozhatsky
2022-09-05 10:06       ` Barry Song
2022-09-05 10:17         ` Sergey Senozhatsky
2022-09-05 21:27           ` Barry Song
2022-09-05  8:15 ` [PATCH RFC 5/7] documentation: Add recompression documentation Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH RFC 6/7] zram: Add recompression algorithm choice to Kconfig Sergey Senozhatsky
2022-09-05  8:15 ` [PATCH RFC 7/7] zram: Add recompress flag to read_block_state() Sergey Senozhatsky
2022-09-05  8:21 ` [PATCH RFC 0/7] zram: Support multiple compression streams Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).