* [PATCH v1 0/7] writeback incompressible pages to storage
@ 2017-06-26  6:52 Minchan Kim
  2017-06-26  6:52 ` [PATCH v1 1/9] zram: clean up duplicated codes in __zram_bvec_write Minchan Kim
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

zram is useful for saving memory when pages are compressible, but the
workload can change and the system can end up with lots of
incompressible pages, which are very harmful for zram.

This patch series adds a writeback feature to zram: the admin can set
up a backing block device and, with it, zram saves memory by writing
incompressible pages (worse than a 1/4 compression ratio) out to that
device instead of keeping them in memory.

Patches [1-3] are just clean-ups and [4-8] enable the feature step by
step. [4-8] are split by logical unit rather than being bisectable,
although I tried to keep each step compiling; I think this split makes
them easier to review.

Minchan Kim (9):
  [1] zram: clean up duplicated codes in __zram_bvec_write
  [2] zram: inlining zram_compress
  [3] zram: rename zram_decompress_page with __zram_bvec_read
  [4] zram: add interface to specify backing device
  [5] zram: add free space management in backing device
  [6] zram: identify asynchronous IO's return value
  [7] zram: write incompressible pages to backing device
  [8] zram: read page from backing device
  [9] zram: add config and doc file for writeback feature

 Documentation/ABI/testing/sysfs-block-zram |   8 +
 Documentation/blockdev/zram.txt            |  11 +
 drivers/block/zram/Kconfig                 |  12 +
 drivers/block/zram/zram_drv.c              | 537 ++++++++++++++++++++++++-----
 drivers/block/zram/zram_drv.h              |   9 +
 5 files changed, 496 insertions(+), 81 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v1 1/9] zram: clean up duplicated codes in __zram_bvec_write
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
@ 2017-06-26  6:52 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 2/9] zram: inlining zram_compress Minchan Kim
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

__zram_bvec_write has some duplicated logic for zram metadata handling
of same-filled and compressed pages.  This patch cleans it up without
changing behavior.

Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 55 +++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 33 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d3e3af2..5d3ea405 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -453,30 +453,6 @@ static bool zram_same_page_read(struct zram *zram, u32 index,
 	return false;
 }
 
-static bool zram_same_page_write(struct zram *zram, u32 index,
-					struct page *page)
-{
-	unsigned long element;
-	void *mem = kmap_atomic(page);
-
-	if (page_same_filled(mem, &element)) {
-		kunmap_atomic(mem);
-		/* Free memory associated with this sector now. */
-		zram_slot_lock(zram, index);
-		zram_free_page(zram, index);
-		zram_set_flag(zram, index, ZRAM_SAME);
-		zram_set_element(zram, index, element);
-		zram_slot_unlock(zram, index);
-
-		atomic64_inc(&zram->stats.same_pages);
-		atomic64_inc(&zram->stats.pages_stored);
-		return true;
-	}
-	kunmap_atomic(mem);
-
-	return false;
-}
-
 static void zram_meta_free(struct zram *zram, u64 disksize)
 {
 	size_t num_pages = disksize >> PAGE_SHIFT;
@@ -685,14 +661,23 @@ static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
 static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 {
 	int ret;
-	unsigned long handle;
-	unsigned int comp_len;
-	void *src, *dst;
+	unsigned long handle = 0;
+	unsigned int comp_len = 0;
+	void *src, *dst, *mem;
 	struct zcomp_strm *zstrm;
 	struct page *page = bvec->bv_page;
+	unsigned long element = 0;
+	enum zram_pageflags flags = 0;
 
-	if (zram_same_page_write(zram, index, page))
-		return 0;
+	mem = kmap_atomic(page);
+	if (page_same_filled(mem, &element)) {
+		kunmap_atomic(mem);
+		/* Free memory associated with this sector now */
+		atomic64_inc(&zram->stats.same_pages);
+		flags = ZRAM_SAME;
+		goto out;
+	}
+	kunmap_atomic(mem);
 
 	zstrm = zcomp_stream_get(zram->comp);
 	ret = zram_compress(zram, &zstrm, page, &handle, &comp_len);
@@ -712,19 +697,23 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 
 	zcomp_stream_put(zram->comp);
 	zs_unmap_object(zram->mem_pool, handle);
-
+out:
 	/*
 	 * Free memory associated with this sector
 	 * before overwriting unused sectors.
 	 */
 	zram_slot_lock(zram, index);
 	zram_free_page(zram, index);
-	zram_set_handle(zram, index, handle);
-	zram_set_obj_size(zram, index, comp_len);
+	if (flags == ZRAM_SAME) {
+		zram_set_flag(zram, index, ZRAM_SAME);
+		zram_set_element(zram, index, element);
+	} else {
+		zram_set_handle(zram, index, handle);
+		zram_set_obj_size(zram, index, comp_len);
+	}
 	zram_slot_unlock(zram, index);
 
 	/* Update stats */
-	atomic64_add(comp_len, &zram->stats.compr_data_size);
 	atomic64_inc(&zram->stats.pages_stored);
 	return 0;
 }
-- 
2.7.4

* [PATCH v1 2/9] zram: inlining zram_compress
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
  2017-06-26  6:52 ` [PATCH v1 1/9] zram: clean up duplicated codes in __zram_bvec_write Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 3/9] zram: rename zram_decompress_page with __zram_bvec_read Minchan Kim
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

zram_compress does several things: compression, entry allocation and
limit checking. I split it out just for readability, but it hurts
modularization. :( So this patch removes zram_compress and inlines it
into __zram_bvec_write for upcoming patches.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 64 +++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 42 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5d3ea405..8138822 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -589,25 +589,38 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 	return ret;
 }
 
-static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
-			struct page *page,
-			unsigned long *out_handle, unsigned int *out_comp_len)
+static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 {
 	int ret;
-	unsigned int comp_len;
-	void *src;
 	unsigned long alloced_pages;
 	unsigned long handle = 0;
+	unsigned int comp_len = 0;
+	void *src, *dst, *mem;
+	struct zcomp_strm *zstrm;
+	struct page *page = bvec->bv_page;
+	unsigned long element = 0;
+	enum zram_pageflags flags = 0;
+
+	mem = kmap_atomic(page);
+	if (page_same_filled(mem, &element)) {
+		kunmap_atomic(mem);
+		/* Free memory associated with this sector now. */
+		flags = ZRAM_SAME;
+		atomic64_inc(&zram->stats.same_pages);
+		goto out;
+	}
+	kunmap_atomic(mem);
 
 compress_again:
+	zstrm = zcomp_stream_get(zram->comp);
 	src = kmap_atomic(page);
-	ret = zcomp_compress(*zstrm, src, &comp_len);
+	ret = zcomp_compress(zstrm, src, &comp_len);
 	kunmap_atomic(src);
 
 	if (unlikely(ret)) {
+		zcomp_stream_put(zram->comp);
 		pr_err("Compression failed! err=%d\n", ret);
-		if (handle)
-			zs_free(zram->mem_pool, handle);
+		zs_free(zram->mem_pool, handle);
 		return ret;
 	}
 
@@ -639,7 +652,6 @@ static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
 		handle = zs_malloc(zram->mem_pool, comp_len,
 				GFP_NOIO | __GFP_HIGHMEM |
 				__GFP_MOVABLE);
-		*zstrm = zcomp_stream_get(zram->comp);
 		if (handle)
 			goto compress_again;
 		return -ENOMEM;
@@ -649,43 +661,11 @@ static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
 	update_used_max(zram, alloced_pages);
 
 	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
+		zcomp_stream_put(zram->comp);
 		zs_free(zram->mem_pool, handle);
 		return -ENOMEM;
 	}
 
-	*out_handle = handle;
-	*out_comp_len = comp_len;
-	return 0;
-}
-
-static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
-{
-	int ret;
-	unsigned long handle = 0;
-	unsigned int comp_len = 0;
-	void *src, *dst, *mem;
-	struct zcomp_strm *zstrm;
-	struct page *page = bvec->bv_page;
-	unsigned long element = 0;
-	enum zram_pageflags flags = 0;
-
-	mem = kmap_atomic(page);
-	if (page_same_filled(mem, &element)) {
-		kunmap_atomic(mem);
-		/* Free memory associated with this sector now */
-		atomic64_inc(&zram->stats.same_pages);
-		flags = ZRAM_SAME;
-		goto out;
-	}
-	kunmap_atomic(mem);
-
-	zstrm = zcomp_stream_get(zram->comp);
-	ret = zram_compress(zram, &zstrm, page, &handle, &comp_len);
-	if (ret) {
-		zcomp_stream_put(zram->comp);
-		return ret;
-	}
-
 	dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
 
 	src = zstrm->buffer;
-- 
2.7.4

* [PATCH v1 3/9] zram: rename zram_decompress_page with __zram_bvec_read
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
  2017-06-26  6:52 ` [PATCH v1 1/9] zram: clean up duplicated codes in __zram_bvec_write Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 2/9] zram: inlining zram_compress Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 4/9] zram: add interface to specify backing device Minchan Kim
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

The name zram_decompress_page is misleading because the function does
not decompress if the page was a dedup hit or was stored uncompressed.
Use a more abstract name that is consistent with the write-path
function __zram_bvec_write.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 8138822..5c92209 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -518,7 +518,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 	zram_set_obj_size(zram, index, 0);
 }
 
-static int zram_decompress_page(struct zram *zram, struct page *page, u32 index)
+static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index)
 {
 	int ret;
 	unsigned long handle;
@@ -570,7 +570,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 			return -ENOMEM;
 	}
 
-	ret = zram_decompress_page(zram, page, index);
+	ret = __zram_bvec_read(zram, page, index);
 	if (unlikely(ret))
 		goto out;
 
@@ -717,7 +717,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		if (!page)
 			return -ENOMEM;
 
-		ret = zram_decompress_page(zram, page, index);
+		ret = __zram_bvec_read(zram, page, index);
 		if (ret)
 			goto out;
 
-- 
2.7.4

* [PATCH v1 4/9] zram: add interface to specify backing device
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (2 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 3/9] zram: rename zram_decompress_page with __zram_bvec_read Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 5/9] zram: add free space management in " Minchan Kim
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

For the writeback feature, the user should set up a backing device
before zram starts working. This patch adds an interface for that via
/sys/block/zramX/backing_dev.

Currently it supports only a block device, but it could be extended to
support a regular file as well.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 142 ++++++++++++++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h |   5 ++
 2 files changed, 147 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5c92209..eb20655 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -270,6 +270,141 @@ static ssize_t mem_used_max_store(struct device *dev,
 	return len;
 }
 
+#ifdef CONFIG_ZRAM_WRITEBACK
+static bool zram_wb_enabled(struct zram *zram)
+{
+	return zram->backing_dev;
+}
+
+static void reset_bdev(struct zram *zram)
+{
+	struct block_device *bdev;
+
+	if (!zram_wb_enabled(zram))
+		return;
+
+	bdev = zram->bdev;
+	if (zram->old_block_size)
+		set_blocksize(bdev, zram->old_block_size);
+	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+	/* hope filp_close flush all of IO */
+	filp_close(zram->backing_dev, NULL);
+	zram->backing_dev = NULL;
+	zram->old_block_size = 0;
+	zram->bdev = NULL;
+}
+
+static ssize_t backing_dev_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+	struct file *file = zram->backing_dev;
+	char *p;
+	ssize_t ret;
+
+	down_read(&zram->init_lock);
+	if (!zram_wb_enabled(zram)) {
+		memcpy(buf, "none\n", 5);
+		up_read(&zram->init_lock);
+		return 5;
+	}
+
+	p = file_path(file, buf, PAGE_SIZE - 1);
+	if (IS_ERR(p)) {
+		ret = PTR_ERR(p);
+		goto out;
+	}
+
+	ret = strlen(p);
+	memmove(buf, p, ret);
+	buf[ret++] = '\n';
+out:
+	up_read(&zram->init_lock);
+	return ret;
+}
+
+static ssize_t backing_dev_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	char *file_name;
+	struct file *backing_dev = NULL;
+	struct inode *inode;
+	struct address_space *mapping;
+	unsigned int old_block_size = 0;
+	struct block_device *bdev = NULL;
+	int err;
+	struct zram *zram = dev_to_zram(dev);
+
+	file_name = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!file_name)
+		return -ENOMEM;
+
+	down_write(&zram->init_lock);
+	if (init_done(zram)) {
+		pr_info("Can't setup backing device for initialized device\n");
+		err = -EBUSY;
+		goto out;
+	}
+
+	strlcpy(file_name, buf, len);
+
+	backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE, 0);
+	if (IS_ERR(backing_dev)) {
+		err = PTR_ERR(backing_dev);
+		backing_dev = NULL;
+		goto out;
+	}
+
+	mapping = backing_dev->f_mapping;
+	inode = mapping->host;
+
+	/* Support only block device in this moment */
+	if (!S_ISBLK(inode->i_mode)) {
+		err = -ENOTBLK;
+		goto out;
+	}
+
+	bdev = bdgrab(I_BDEV(inode));
+	err = blkdev_get(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
+	if (err < 0)
+		goto out;
+
+	old_block_size = block_size(bdev);
+	err = set_blocksize(bdev, PAGE_SIZE);
+	if (err)
+		goto out;
+
+	reset_bdev(zram);
+
+	zram->old_block_size = old_block_size;
+	zram->bdev = bdev;
+	zram->backing_dev = backing_dev;
+	up_write(&zram->init_lock);
+
+	pr_info("setup backing device %s\n", file_name);
+	kfree(file_name);
+
+	return len;
+out:
+	if (bdev)
+		blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+
+	if (backing_dev)
+		filp_close(backing_dev, NULL);
+
+	up_write(&zram->init_lock);
+
+	kfree(file_name);
+
+	return err;
+}
+
+#else
+static bool zram_wb_enabled(struct zram *zram) { return false; }
+static inline void reset_bdev(struct zram *zram) {};
+#endif
+
+
 /*
  * We switched to per-cpu streams and this attr is not needed anymore.
  * However, we will keep it around for some time, because:
@@ -952,6 +1087,7 @@ static void zram_reset_device(struct zram *zram)
 	zram_meta_free(zram, disksize);
 	memset(&zram->stats, 0, sizeof(zram->stats));
 	zcomp_destroy(comp);
+	reset_bdev(zram);
 }
 
 static ssize_t disksize_store(struct device *dev,
@@ -1077,6 +1213,9 @@ static DEVICE_ATTR_WO(mem_limit);
 static DEVICE_ATTR_WO(mem_used_max);
 static DEVICE_ATTR_RW(max_comp_streams);
 static DEVICE_ATTR_RW(comp_algorithm);
+#ifdef CONFIG_ZRAM_WRITEBACK
+static DEVICE_ATTR_RW(backing_dev);
+#endif
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
@@ -1087,6 +1226,9 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_mem_used_max.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
+#ifdef CONFIG_ZRAM_WRITEBACK
+	&dev_attr_backing_dev.attr,
+#endif
 	&dev_attr_io_stat.attr,
 	&dev_attr_mm_stat.attr,
 	&dev_attr_debug_stat.attr,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e34e44d..113a411 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -115,5 +115,10 @@ struct zram {
 	 * zram is claimed so open request will be failed
 	 */
 	bool claim; /* Protected by bdev->bd_mutex */
+#ifdef CONFIG_ZRAM_WRITEBACK
+	struct file *backing_dev;
+	struct block_device *bdev;
+	unsigned int old_block_size;
+#endif
 };
 #endif
-- 
2.7.4

* [PATCH v1 5/9] zram: add free space management in backing device
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (3 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 4/9] zram: add interface to specify backing device Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 6/9] zram: identify asynchronous IO's return value Minchan Kim
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

With a backing device, zram needs to manage the free space of that
device. This patch adds bitmap logic for free space management, which
is very naive. However, it should be simple enough considering how
infrequently incompressible pages occur in zram.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 48 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h |  3 +++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index eb20655..e31fef7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -292,6 +292,9 @@ static void reset_bdev(struct zram *zram)
 	zram->backing_dev = NULL;
 	zram->old_block_size = 0;
 	zram->bdev = NULL;
+
+	kvfree(zram->bitmap);
+	zram->bitmap = NULL;
 }
 
 static ssize_t backing_dev_show(struct device *dev,
@@ -330,7 +333,8 @@ static ssize_t backing_dev_store(struct device *dev,
 	struct file *backing_dev = NULL;
 	struct inode *inode;
 	struct address_space *mapping;
-	unsigned int old_block_size = 0;
+	unsigned int bitmap_sz, old_block_size = 0;
+	unsigned long nr_pages, *bitmap = NULL;
 	struct block_device *bdev = NULL;
 	int err;
 	struct zram *zram = dev_to_zram(dev);
@@ -369,16 +373,27 @@ static ssize_t backing_dev_store(struct device *dev,
 	if (err < 0)
 		goto out;
 
+	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
+	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
+	bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
+	if (!bitmap) {
+		err = -ENOMEM;
+		goto out;
+	}
+
 	old_block_size = block_size(bdev);
 	err = set_blocksize(bdev, PAGE_SIZE);
 	if (err)
 		goto out;
 
 	reset_bdev(zram);
+	spin_lock_init(&zram->bitmap_lock);
 
 	zram->old_block_size = old_block_size;
 	zram->bdev = bdev;
 	zram->backing_dev = backing_dev;
+	zram->bitmap = bitmap;
+	zram->nr_pages = nr_pages;
 	up_write(&zram->init_lock);
 
 	pr_info("setup backing device %s\n", file_name);
@@ -386,6 +401,9 @@ static ssize_t backing_dev_store(struct device *dev,
 
 	return len;
 out:
+	if (bitmap)
+		kvfree(bitmap);
+
 	if (bdev)
 		blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
 
@@ -399,6 +417,34 @@ static ssize_t backing_dev_store(struct device *dev,
 	return err;
 }
 
+static unsigned long get_entry_bdev(struct zram *zram)
+{
+	unsigned long entry;
+
+	spin_lock(&zram->bitmap_lock);
+	/* skip 0 bit to confuse zram.handle = 0 */
+	entry = find_next_zero_bit(zram->bitmap, zram->nr_pages, 1);
+	if (entry == zram->nr_pages) {
+		spin_unlock(&zram->bitmap_lock);
+		return 0;
+	}
+
+	set_bit(entry, zram->bitmap);
+	spin_unlock(&zram->bitmap_lock);
+
+	return entry;
+}
+
+static void put_entry_bdev(struct zram *zram, unsigned long entry)
+{
+	int was_set;
+
+	spin_lock(&zram->bitmap_lock);
+	was_set = test_and_clear_bit(entry, zram->bitmap);
+	spin_unlock(&zram->bitmap_lock);
+	WARN_ON_ONCE(!was_set);
+}
+
 #else
 static bool zram_wb_enabled(struct zram *zram) { return false; }
 static inline void reset_bdev(struct zram *zram) {};
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 113a411..707aec0 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -119,6 +119,9 @@ struct zram {
 	struct file *backing_dev;
 	struct block_device *bdev;
 	unsigned int old_block_size;
+	unsigned long *bitmap;
+	unsigned long nr_pages;
+	spinlock_t bitmap_lock;
 #endif
 };
 #endif
-- 
2.7.4

* [PATCH v1 6/9] zram: identify asynchronous IO's return value
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (4 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 5/9] zram: add free space management in " Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 7/9] zram: write incompressible pages to backing device Minchan Kim
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

For upcoming asynchronous IO like writeback, zram_rw_page should be
able to tell whether the requested IO was completed, was submitted
successfully, or failed.

For that, zram_bvec_rw now has three return values.

-errno: an error occurred
     0: IO request was done synchronously
     1: IO request was submitted successfully

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e31fef7..896867e2 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -772,7 +772,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 
 static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 {
-	int ret;
+	int ret = 0;
 	unsigned long alloced_pages;
 	unsigned long handle = 0;
 	unsigned int comp_len = 0;
@@ -876,7 +876,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 
 	/* Update stats */
 	atomic64_inc(&zram->stats.pages_stored);
-	return 0;
+	return ret;
 }
 
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
@@ -958,6 +958,11 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 	}
 }
 
+/*
+ * Returns errno if it has some problem. Otherwise return 0 or 1.
+ * Returns 0 if IO request was done synchronously
+ * Returns 1 if IO request was successfully submitted.
+ */
 static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 			int offset, bool is_write)
 {
@@ -979,7 +984,7 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	generic_end_io_acct(rw_acct, &zram->disk->part0, start_time);
 
-	if (unlikely(ret)) {
+	if (unlikely(ret < 0)) {
 		if (!is_write)
 			atomic64_inc(&zram->stats.failed_reads);
 		else
@@ -1072,7 +1077,7 @@ static void zram_slot_free_notify(struct block_device *bdev,
 static int zram_rw_page(struct block_device *bdev, sector_t sector,
 		       struct page *page, bool is_write)
 {
-	int offset, err = -EIO;
+	int offset, ret;
 	u32 index;
 	struct zram *zram;
 	struct bio_vec bv;
@@ -1081,7 +1086,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 
 	if (!valid_io_request(zram, sector, PAGE_SIZE)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		err = -EINVAL;
+		ret = -EINVAL;
 		goto out;
 	}
 
@@ -1092,7 +1097,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
 
-	err = zram_bvec_rw(zram, &bv, index, offset, is_write);
+	ret = zram_bvec_rw(zram, &bv, index, offset, is_write);
 out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
@@ -1102,9 +1107,20 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	 * bio->bi_end_io does things to handle the error
 	 * (e.g., SetPageError, set_page_dirty and extra works).
 	 */
-	if (err == 0)
+	if (unlikely(ret < 0))
+		return ret;
+
+	switch (ret) {
+	case 0:
 		page_endio(page, is_write, 0);
-	return err;
+		break;
+	case 1:
+		ret = 0;
+		break;
+	default:
+		WARN_ON(1);
+	}
+	return ret;
 }
 
 static void zram_reset_device(struct zram *zram)
-- 
2.7.4

* [PATCH v1 7/9] zram: write incompressible pages to backing device
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (5 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 6/9] zram: identify asynchronous IO's return value Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 8/9] zram: read page from " Minchan Kim
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

This patch enables write IO to transfer data to the backing device.
For that, it implements a write_to_bdev function which creates a new
bio and chains it to the parent bio so the parent becomes
asynchronous.
For rw_page, which has no parent bio, it submits its own bio and
handles IO completion via zram_page_end_io.

This patch also defines a new flag, ZRAM_WB, to mark written-back
pages for later read IO.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 113 +++++++++++++++++++++++++++++++++++++-----
 drivers/block/zram/zram_drv.h |   1 +
 2 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 896867e2..99e46ae 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -445,9 +445,76 @@ static void put_entry_bdev(struct zram *zram, unsigned long entry)
 	WARN_ON_ONCE(!was_set);
 }
 
+void zram_page_end_io(struct bio *bio)
+{
+	struct page *page = bio->bi_io_vec[0].bv_page;
+
+	page_endio(page, op_is_write(bio_op(bio)),
+			blk_status_to_errno(bio->bi_status));
+	bio_put(bio);
+}
+
+static int write_to_bdev(struct zram *zram, struct bio_vec *bvec,
+					u32 index, struct bio *parent,
+					unsigned long *pentry)
+{
+	struct bio *bio;
+	unsigned long entry;
+
+	bio = bio_alloc(GFP_ATOMIC, 1);
+	if (!bio)
+		return -ENOMEM;
+
+	entry = get_entry_bdev(zram);
+	if (!entry) {
+		bio_put(bio);
+		return -ENOSPC;
+	}
+
+	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
+	bio->bi_bdev = zram->bdev;
+	if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len,
+					bvec->bv_offset)) {
+		bio_put(bio);
+		put_entry_bdev(zram, entry);
+		return -EIO;
+	}
+
+	if (!parent) {
+		bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
+		bio->bi_end_io = zram_page_end_io;
+	} else {
+		bio->bi_opf = parent->bi_opf;
+		bio_chain(bio, parent);
+	}
+
+	submit_bio(bio);
+	*pentry = entry;
+
+	return 0;
+}
+
+static void zram_wb_clear(struct zram *zram, u32 index)
+{
+	unsigned long entry;
+
+	zram_clear_flag(zram, index, ZRAM_WB);
+	entry = zram_get_element(zram, index);
+	zram_set_element(zram, index, 0);
+	put_entry_bdev(zram, entry);
+}
+
 #else
 static bool zram_wb_enabled(struct zram *zram) { return false; }
 static inline void reset_bdev(struct zram *zram) {};
+static int write_to_bdev(struct zram *zram, struct bio_vec *bvec,
+					u32 index, struct bio *parent,
+					unsigned long *pentry)
+
+{
+	return -EIO;
+}
+static void zram_wb_clear(struct zram *zram, u32 index) {}
 #endif
 
 
@@ -672,7 +739,13 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
  */
 static void zram_free_page(struct zram *zram, size_t index)
 {
-	unsigned long handle = zram_get_handle(zram, index);
+	unsigned long handle;
+
+	if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) {
+		zram_wb_clear(zram, index);
+		atomic64_dec(&zram->stats.pages_stored);
+		return;
+	}
 
 	/*
 	 * No memory is allocated for same element filled pages.
@@ -686,6 +759,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 		return;
 	}
 
+	handle = zram_get_handle(zram, index);
 	if (!handle)
 		return;
 
@@ -770,7 +844,8 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 	return ret;
 }
 
-static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
+static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
+				u32 index, struct bio *bio)
 {
 	int ret = 0;
 	unsigned long alloced_pages;
@@ -781,6 +856,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 	struct page *page = bvec->bv_page;
 	unsigned long element = 0;
 	enum zram_pageflags flags = 0;
+	bool allow_wb = true;
 
 	mem = kmap_atomic(page);
 	if (page_same_filled(mem, &element)) {
@@ -805,8 +881,20 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 		return ret;
 	}
 
-	if (unlikely(comp_len > max_zpage_size))
+	if (unlikely(comp_len > max_zpage_size)) {
+		if (zram_wb_enabled(zram) && allow_wb) {
+			zcomp_stream_put(zram->comp);
+			ret = write_to_bdev(zram, bvec, index, bio, &element);
+			if (!ret) {
+				flags = ZRAM_WB;
+				ret = 1;
+				goto out;
+			}
+			allow_wb = false;
+			goto compress_again;
+		}
 		comp_len = PAGE_SIZE;
+	}
 
 	/*
 	 * handle allocation has 2 paths:
@@ -865,10 +953,11 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 	 */
 	zram_slot_lock(zram, index);
 	zram_free_page(zram, index);
-	if (flags == ZRAM_SAME) {
-		zram_set_flag(zram, index, ZRAM_SAME);
+
+	if (flags) {
+		zram_set_flag(zram, index, flags);
 		zram_set_element(zram, index, element);
-	} else {
+	}  else {
 		zram_set_handle(zram, index, handle);
 		zram_set_obj_size(zram, index, comp_len);
 	}
@@ -880,7 +969,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 }
 
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
-				u32 index, int offset)
+				u32 index, int offset, struct bio *bio)
 {
 	int ret;
 	struct page *page = NULL;
@@ -913,7 +1002,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		vec.bv_offset = 0;
 	}
 
-	ret = __zram_bvec_write(zram, &vec, index);
+	ret = __zram_bvec_write(zram, &vec, index, bio);
 out:
 	if (is_partial_io(bvec))
 		__free_page(page);
@@ -964,7 +1053,7 @@ static void zram_bio_discard(struct zram *zram, u32 index,
  * Returns 1 if IO request was successfully submitted.
  */
 static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
-			int offset, bool is_write)
+			int offset, bool is_write, struct bio *bio)
 {
 	unsigned long start_time = jiffies;
 	int rw_acct = is_write ? REQ_OP_WRITE : REQ_OP_READ;
@@ -979,7 +1068,7 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 		flush_dcache_page(bvec->bv_page);
 	} else {
 		atomic64_inc(&zram->stats.num_writes);
-		ret = zram_bvec_write(zram, bvec, index, offset);
+		ret = zram_bvec_write(zram, bvec, index, offset, bio);
 	}
 
 	generic_end_io_acct(rw_acct, &zram->disk->part0, start_time);
@@ -1023,7 +1112,7 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 			bv.bv_len = min_t(unsigned int, PAGE_SIZE - offset,
 							unwritten);
 			if (zram_bvec_rw(zram, &bv, index, offset,
-					op_is_write(bio_op(bio))) < 0)
+					op_is_write(bio_op(bio)), bio) < 0)
 				goto out;
 
 			bv.bv_offset += bv.bv_len;
@@ -1097,7 +1186,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
 
-	ret = zram_bvec_rw(zram, &bv, index, offset, is_write);
+	ret = zram_bvec_rw(zram, &bv, index, offset, is_write, NULL);
 out:
 	/*
 	 * If I/O fails, just return error(ie, non-zero) without
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 707aec0..5573dc2 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -63,6 +63,7 @@ enum zram_pageflags {
 	/* Page consists entirely of zeros */
 	ZRAM_SAME = ZRAM_FLAG_SHIFT,
 	ZRAM_ACCESS,	/* page is now accessed */
+	ZRAM_WB,	/* page is stored on backing_device */
 
 	__NR_ZRAM_PAGEFLAGS,
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 8/9] zram: read page from backing device
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (6 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 7/9] zram: write incompressible pages to backing device Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-26  6:53 ` [PATCH v1 9/9] zram: add config and doc file for writeback feature Minchan Kim
  2017-06-28 15:41 ` [PATCH v1 0/7] writeback incompressible pages to storage Sergey Senozhatsky
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

This patch enables read IO from the backing device. For the feature,
it implements two read functions to transfer data from the backing
storage: an asynchronous one and a synchronous one.

The synchronous IO is needed for partial writes, which must complete
the read IO before overwriting part of the page with the new data.

We could make the partial IO case asynchronous, too, but at the
moment I don't feel like adding more complexity to support such a
rare case, so I want to keep it simple.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 123 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 118 insertions(+), 5 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 99e46ae..a1e8c73 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -454,6 +454,95 @@ void zram_page_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
+/*
+ * Returns 1 if the submission is successful.
+ */
+static int read_from_bdev_async(struct zram *zram, struct bio_vec *bvec,
+			unsigned long entry, struct bio *parent)
+{
+	struct bio *bio;
+
+	bio = bio_alloc(GFP_ATOMIC, 1);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
+	bio->bi_bdev = zram->bdev;
+	if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, bvec->bv_offset)) {
+		bio_put(bio);
+		return -EIO;
+	}
+
+	if (!parent) {
+		bio->bi_opf = REQ_OP_READ;
+		bio->bi_end_io = zram_page_end_io;
+	} else {
+		bio->bi_opf = parent->bi_opf;
+		bio_chain(bio, parent);
+	}
+
+	submit_bio(bio);
+	return 1;
+}
+
+struct zram_work {
+	struct work_struct work;
+	struct zram *zram;
+	unsigned long entry;
+	struct bio *bio;
+	struct bio_vec bvec;
+};
+
+#if PAGE_SIZE != 4096
+static void zram_sync_read(struct work_struct *work)
+{
+	struct zram_work *zw = container_of(work, struct zram_work, work);
+	struct zram *zram = zw->zram;
+	unsigned long entry = zw->entry;
+	struct bio *bio = zw->bio;
+
+	read_from_bdev_async(zram, &zw->bvec, entry, bio);
+}
+
+/*
+ * The block layer wants one ->make_request_fn to be active at a time,
+ * so if we chain the IO with the parent IO in the same context,
+ * it's a deadlock. To avoid that, submit the chained IO from a
+ * worker thread context.
+ */
+static int read_from_bdev_sync(struct zram *zram, struct bio_vec *bvec,
+				unsigned long entry, struct bio *bio)
+{
+	struct zram_work work;
+
+	work.bvec = *bvec;
+	work.zram = zram;
+	work.entry = entry;
+	work.bio = bio;
+
+	INIT_WORK_ONSTACK(&work.work, zram_sync_read);
+	queue_work(system_unbound_wq, &work.work);
+	flush_work(&work.work);
+	destroy_work_on_stack(&work.work);
+
+	return 1;
+}
+#else
+static int read_from_bdev_sync(struct zram *zram, struct bio_vec *bvec,
+				unsigned long entry, struct bio *bio)
+{
+	WARN_ON(1);
+	return -EIO;
+}
+#endif
+
+static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
+			unsigned long entry, struct bio *parent, bool sync)
+{
+	if (sync)
+		return read_from_bdev_sync(zram, bvec, entry, parent);
+	else
+		return read_from_bdev_async(zram, bvec, entry, parent);
+}
+
 static int write_to_bdev(struct zram *zram, struct bio_vec *bvec,
 					u32 index, struct bio *parent,
 					unsigned long *pentry)
@@ -514,6 +603,12 @@ static int write_to_bdev(struct zram *zram, struct bio_vec *bvec,
 {
 	return -EIO;
 }
+
+static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
+			unsigned long entry, struct bio *parent, bool sync)
+{
+	return -EIO;
+}
 static void zram_wb_clear(struct zram *zram, u32 index) {}
 #endif
 
@@ -773,13 +868,31 @@ static void zram_free_page(struct zram *zram, size_t index)
 	zram_set_obj_size(zram, index, 0);
 }
 
-static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index)
+static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
+				struct bio *bio, bool partial_io)
 {
 	int ret;
 	unsigned long handle;
 	unsigned int size;
 	void *src, *dst;
 
+	if (zram_wb_enabled(zram)) {
+		zram_slot_lock(zram, index);
+		if (zram_test_flag(zram, index, ZRAM_WB)) {
+			struct bio_vec bvec;
+
+			zram_slot_unlock(zram, index);
+
+			bvec.bv_page = page;
+			bvec.bv_len = PAGE_SIZE;
+			bvec.bv_offset = 0;
+			return read_from_bdev(zram, &bvec,
+					zram_get_element(zram, index),
+					bio, partial_io);
+		}
+		zram_slot_unlock(zram, index);
+	}
+
 	if (zram_same_page_read(zram, index, page, 0, PAGE_SIZE))
 		return 0;
 
@@ -812,7 +925,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index)
 }
 
 static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
-				u32 index, int offset)
+				u32 index, int offset, struct bio *bio)
 {
 	int ret;
 	struct page *page;
@@ -825,7 +938,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 			return -ENOMEM;
 	}
 
-	ret = __zram_bvec_read(zram, page, index);
+	ret = __zram_bvec_read(zram, page, index, bio, is_partial_io(bvec));
 	if (unlikely(ret))
 		goto out;
 
@@ -987,7 +1100,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		if (!page)
 			return -ENOMEM;
 
-		ret = __zram_bvec_read(zram, page, index);
+		ret = __zram_bvec_read(zram, page, index, bio, true);
 		if (ret)
 			goto out;
 
@@ -1064,7 +1177,7 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	if (!is_write) {
 		atomic64_inc(&zram->stats.num_reads);
-		ret = zram_bvec_read(zram, bvec, index, offset);
+		ret = zram_bvec_read(zram, bvec, index, offset, bio);
 		flush_dcache_page(bvec->bv_page);
 	} else {
 		atomic64_inc(&zram->stats.num_writes);
-- 
2.7.4

* [PATCH v1 9/9] zram: add config and doc file for writeback feature
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (7 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 8/9] zram: read page from " Minchan Kim
@ 2017-06-26  6:53 ` Minchan Kim
  2017-06-28 15:41 ` [PATCH v1 0/7] writeback incompressible pages to storage Sergey Senozhatsky
  9 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-26  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Juneho Choi, Sergey Senozhatsky, kernel-team, Minchan Kim

This patch adds documentation and a Kconfig option for the writeback feature.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram |  8 ++++++++
 Documentation/blockdev/zram.txt            | 11 +++++++++++
 drivers/block/zram/Kconfig                 | 12 ++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 451b6d8..c1513c7 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -90,3 +90,11 @@ Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
 		device's debugging info useful for kernel developers. Its
 		format is not documented intentionally and may change
 		anytime without any notice.
+
+What:		/sys/block/zram<id>/backing_dev
+Date:		June 2017
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The backing_dev file is read-write and sets up the backing
+		device for zram to write out incompressible pages.
+		To use it, the user should enable CONFIG_ZRAM_WRITEBACK.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 4fced8a..257e657 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -168,6 +168,7 @@ max_comp_streams  RW    the number of possible concurrent compress operations
 comp_algorithm    RW    show and change the compression algorithm
 compact           WO    trigger memory compaction
 debug_stat        RO    this file is used for zram debugging purposes
+backing_dev	  RW	set up backing storage for zram to write out
 
 
 User space is advised to use the following files to read the device statistics.
@@ -231,5 +232,15 @@ The stat file represents device's mm statistics. It consists of a single
 	resets the disksize to zero. You must set the disksize again
 	before reusing the device.
 
+* Optional Feature
+
+= writeback
+
+With incompressible pages, there is no memory saving with zram.
+Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible pages
+to the backing storage rather than keeping them in memory.
+The user should set up the backing device via /sys/block/zramX/backing_dev
+before setting the disksize.
+
 Nitin Gupta
 ngupta@vflare.org
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index b8ecba6..7cd4a8e 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -13,3 +13,15 @@ config ZRAM
 	  disks and maybe many more.
 
 	  See zram.txt for more information.
+
+config ZRAM_WRITEBACK
+       bool "Write back incompressible page to backing device"
+       depends on ZRAM
+       default n
+       help
+	 With an incompressible page, there is no memory saving by keeping
+	 it in memory. Instead, write it out to a backing device.
+	 For this feature, the admin should set up a backing device via
+	 /sys/block/zramX/backing_dev.
+
+	 See zram.txt for more information.
-- 
2.7.4

* Re: [PATCH v1 0/7] writeback incompressible pages to storage
  2017-06-26  6:52 [PATCH v1 0/7] writeback incompressible pages to storage Minchan Kim
                   ` (8 preceding siblings ...)
  2017-06-26  6:53 ` [PATCH v1 9/9] zram: add config and doc file for writeback feature Minchan Kim
@ 2017-06-28 15:41 ` Sergey Senozhatsky
  2017-06-29  8:47   ` Minchan Kim
  9 siblings, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2017-06-28 15:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Juneho Choi, Sergey Senozhatsky,
	kernel-team

Hello,

On (06/26/17 15:52), Minchan Kim wrote:
[..]
> zRam is useful for memory saving with compressible pages but sometime,
> workload can be changed and system has lots of incompressible pages
> which is very harmful for zram.

could do. that makes zram quite complicated, to be honest. no offense,
but the whole zram's "good compression" margin looks to me completely
random and quite unreasonable. building a complex logic atop of random
logic is a bit tricky. but I see what problem you are trying to address.

> This patch supports writeback feature of zram so admin can set up
> a block device and with it, zram can save the memory via writing
> out the incompressile pages once it found it's incompressible pages
> (1/4 comp ratio) instead of keeping the page in memory.

hm, alternative idea. just an idea. can we try compressing the page
with another algorithm? example: downcast from lz4 to zlib? we can
set up a fallback "worst case" algorithm, so each entry can contain
additional flag that would tell if the src page was compressed with
the fast or slow algorithm. that sounds to me easier than "create a
new block device and bond it to zram, etc". but I may be wrong.

	-ss

* Re: [PATCH v1 0/7] writeback incompressible pages to storage
  2017-06-28 15:41 ` [PATCH v1 0/7] writeback incompressible pages to storage Sergey Senozhatsky
@ 2017-06-29  8:47   ` Minchan Kim
  2017-06-29  9:17     ` Sergey Senozhatsky
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2017-06-29  8:47 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel, Juneho Choi, kernel-team

Hi Sergey,

On Thu, Jun 29, 2017 at 12:41:57AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (06/26/17 15:52), Minchan Kim wrote:
> [..]
> > zRam is useful for memory saving with compressible pages but sometime,
> > workload can be changed and system has lots of incompressible pages
> > which is very harmful for zram.
> 
> could do. that makes zram quite complicated, to be honest. no offense,
> but the whole zram's "good compression" margin looks to me completely
> random and quite unreasonable. building a complex logic atop of random
> logic is a bit tricky. but I see what problem you are trying to address.
> 
> > This patch supports writeback feature of zram so admin can set up
> > a block device and with it, zram can save the memory via writing
> > out the incompressile pages once it found it's incompressible pages
> > (1/4 comp ratio) instead of keeping the page in memory.
> 
> hm, alternative idea. just an idea. can we try compressing the page
> with another algorithm? example: downcast from lz4 to zlib? we can
> set up a fallback "worst case" algorithm, so each entry can contain
> additional flag that would tell if the src page was compressed with
> the fast or slow algorithm. that sounds to me easier than "create a
> new block device and bond it to zram, etc". but I may be wrong.

We tried it, although it was a static adaptation, not the dynamic one
you suggested. However, the problem was media-stream data, so zlib and
lzma just added pointless overhead.

Thanks.

* Re: [PATCH v1 0/7] writeback incompressible pages to storage
  2017-06-29  8:47   ` Minchan Kim
@ 2017-06-29  9:17     ` Sergey Senozhatsky
  2017-06-29  9:29       ` Minchan Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2017-06-29  9:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Juneho Choi,
	kernel-team

Hello,

On (06/29/17 17:47), Minchan Kim wrote:
[..]
> > > This patch supports writeback feature of zram so admin can set up
> > > a block device and with it, zram can save the memory via writing
> > > out the incompressile pages once it found it's incompressible pages
> > > (1/4 comp ratio) instead of keeping the page in memory.
> > 
> > hm, alternative idea. just an idea. can we try compressing the page
> > with another algorithm? example: downcast from lz4 to zlib? we can
> > set up a fallback "worst case" algorithm, so each entry can contain
> > additional flag that would tell if the src page was compressed with
> > the fast or slow algorithm. that sounds to me easier than "create a
> > new block device and bond it to zram, etc". but I may be wrong.
> 
> We tried it although it was static not dynamic adatation you suggested.

could you please explain more? I'm not sure I understand what
was the configuration (what is static adaptation?).

> However problem was media-stream data so zlib, lzam added just pointless
> overhead.

would that overhead be bigger than a full-blown I/O request to
another block device (potentially slow, or under load, etc. etc.)?

	-ss

* Re: [PATCH v1 0/7] writeback incompressible pages to storage
  2017-06-29  9:17     ` Sergey Senozhatsky
@ 2017-06-29  9:29       ` Minchan Kim
  0 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2017-06-29  9:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Juneho Choi,
	kernel-team

On Thu, Jun 29, 2017 at 06:17:13PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (06/29/17 17:47), Minchan Kim wrote:
> [..]
> > > > This patch supports writeback feature of zram so admin can set up
> > > > a block device and with it, zram can save the memory via writing
> > > > out the incompressile pages once it found it's incompressible pages
> > > > (1/4 comp ratio) instead of keeping the page in memory.
> > > 
> > > hm, alternative idea. just an idea. can we try compressing the page
> > > with another algorithm? example: downcast from lz4 to zlib? we can
> > > set up a fallback "worst case" algorithm, so each entry can contain
> > > additional flag that would tell if the src page was compressed with
> > > the fast or slow algorithm. that sounds to me easier than "create a
> > > new block device and bond it to zram, etc". but I may be wrong.
> > 
> > We tried it although it was static not dynamic adatation you suggested.
> 
> could you please explain more? I'm not sure I understand what
> was the configuration (what is static adaptation?).

        echo inflate > /sys/block/zramX/comp_algorithm

> 
> > However problem was media-stream data so zlib, lzam added just pointless
> > overhead.
> 
> would that overhead be bigger than a full-blown I/O request to
> another block device (potentially slow, or under load, etc. etc.)?

The problem is not the overhead but the memory saving.
Although we used stronger compression algorithms like zlib and lzma,
the comp ratio was no different from lzo and lz4, so they just
added pointless overhead without any saving.
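The point can be reproduced outside zram: already-high-entropy data (a
stand-in for the media-stream workload; this is an illustrative sketch,
not a measurement from that workload) stays near 1:1 even under stronger
compressors, so they only add CPU cost:

```shell
# Generate 1 MiB of high-entropy data as a stand-in for media-stream pages.
head -c 1048576 /dev/urandom > /tmp/incompressible.bin
orig=$(wc -c < /tmp/incompressible.bin)

# Both a fast (gzip) and a strong (xz) compressor land near the input size.
for comp in gzip xz; do
	out=$($comp -c /tmp/incompressible.bin | wc -c)
	echo "$comp: $out of $orig bytes"
done
```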
