linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/1] zram: Allow rw_page when page isn't written back.
@ 2022-08-08 16:50 Brian Geffon
  2022-08-08 16:50 ` [RFC PATCH 1/1] " Brian Geffon
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Geffon @ 2022-08-08 16:50 UTC
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, Sergey Senozhatsky, linux-kernel, Suleiman Souhlal,
	linux-mm, Brian Geffon

Today when a zram device has a backing device we change the ops to
a new set which does not expose a rw_page method. This prevents the
upper layers from trying to issue a synchronous rw. This has the
downside that we penalize every rw even when it could possibly
still be performed as a synchronous rw.

This is just a proposal; I would like feedback on whether people
feel it is worthwhile.

The motivation comes from what Minchan noted in the original change
that introduced the synchronous behavior: it enhances swap-in
performance by about 45% [1]. It would be great if we could still
get this benefit while using writeback.

1. https://lore.kernel.org/all/1505886205-9671-5-git-send-email-minchan@kernel.org/

Brian Geffon (1):
  zram: Allow rw_page when page isn't written back.

 drivers/block/zram/zram_drv.c | 65 +++++++++++++++++++++--------------
 drivers/block/zram/zram_drv.h |  1 +
 2 files changed, 41 insertions(+), 25 deletions(-)

-- 
2.37.1.559.g78731f0fdb-goog



* [RFC PATCH 1/1] zram: Allow rw_page when page isn't written back.
  2022-08-08 16:50 [RFC PATCH 0/1] zram: Allow rw_page when page isn't written back Brian Geffon
@ 2022-08-08 16:50 ` Brian Geffon
  2022-08-09  1:38   ` Sergey Senozhatsky
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Geffon @ 2022-08-08 16:50 UTC
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, Sergey Senozhatsky, linux-kernel, Suleiman Souhlal,
	linux-mm, Brian Geffon

Today when a zram device has a backing device we change the ops to
a new set which does not expose a rw_page method. This prevents the
upper layers from trying to issue a synchronous rw. This has the
downside that we penalize every rw even when it could possibly
still be performed as a synchronous rw.

This change will always expose a rw_page function and if the page
has been written back it will return -EOPNOTSUPP which will force the
upper layers to try again with bio.

To safely allow a synchronous read to proceed for pages which have not
yet been written back we introduce a new flag, ZRAM_NO_WB. On the first
synchronous read, if the page is not written back, we set the
ZRAM_NO_WB flag. This flag, which is cleared only when the slot is
freed, prevents writeback of that page from then on.

This approach works because in the case of zram as a swap backing device
the page is going to be removed from zram shortly thereafter so
preventing writeback is fine. However, if zram is being used as a
generic block device then this might prevent writeback of the page.
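
To make the intended behaviour concrete, here is a rough userspace
sketch of the check described above. It is not the driver code: the
slot table, the SLOT_* names and the helper names are made up for
illustration, and the per-slot locking that the real patch takes
around the check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Illustrative flag bits. The names echo the zram page flags in the
 * patch, but the values and this table are assumptions of the sketch.
 */
enum slot_flag {
	SLOT_WB       = 1u << 0, /* data lives on the backing device */
	SLOT_UNDER_WB = 1u << 1, /* writeback is in flight */
	SLOT_NO_WB    = 1u << 2, /* writeback must skip this slot */
};

enum slot_op { SLOT_READ, SLOT_WRITE };

#define NR_SLOTS 8
static unsigned int slot_flags[NR_SLOTS];

/*
 * Model of the check added to zram_rw_page(). Returns 0 when the
 * synchronous path may proceed, or -EOPNOTSUPP when the caller has to
 * fall back to a bio. A successful read pins the slot with SLOT_NO_WB.
 */
static int sync_rw_allowed(size_t index, enum slot_op op)
{
	assert(index < NR_SLOTS);

	if (slot_flags[index] & (SLOT_WB | SLOT_UNDER_WB))
		return -EOPNOTSUPP;

	if (op == SLOT_READ)
		slot_flags[index] |= SLOT_NO_WB;

	return 0;
}

/* Model of the zram_free_page() hunk: freeing the slot drops the pin. */
static void slot_free(size_t index)
{
	assert(index < NR_SLOTS);
	slot_flags[index] &= ~SLOT_NO_WB;
}
```

With this model, a read of a slot that has no writeback flags returns 0
and leaves SLOT_NO_WB set, a read of a slot with SLOT_WB set returns
-EOPNOTSUPP without touching the flags, and a write never sets
SLOT_NO_WB.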

Signed-off-by: Brian Geffon <bgeffon@google.com>
---
 drivers/block/zram/zram_drv.c | 65 +++++++++++++++++++++--------------
 drivers/block/zram/zram_drv.h |  1 +
 2 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 92cb929a45b7..196392353bd3 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,9 +52,6 @@ static unsigned int num_devices = 1;
 static size_t huge_class_size;
 
 static const struct block_device_operations zram_devops;
-#ifdef CONFIG_ZRAM_WRITEBACK
-static const struct block_device_operations zram_wb_devops;
-#endif
 
 static void zram_free_page(struct zram *zram, size_t index);
 static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
@@ -439,7 +436,6 @@ static void reset_bdev(struct zram *zram)
 	filp_close(zram->backing_dev, NULL);
 	zram->backing_dev = NULL;
 	zram->bdev = NULL;
-	zram->disk->fops = &zram_devops;
 	kvfree(zram->bitmap);
 	zram->bitmap = NULL;
 }
@@ -543,17 +539,6 @@ static ssize_t backing_dev_store(struct device *dev,
 	zram->backing_dev = backing_dev;
 	zram->bitmap = bitmap;
 	zram->nr_pages = nr_pages;
-	/*
-	 * With writeback feature, zram does asynchronous IO so it's no longer
-	 * synchronous device so let's remove synchronous io flag. Othewise,
-	 * upper layer(e.g., swap) could wait IO completion rather than
-	 * (submit and return), which will cause system sluggish.
-	 * Furthermore, when the IO function returns(e.g., swap_readpage),
-	 * upper layer expects IO was done so it could deallocate the page
-	 * freely but in fact, IO is going on so finally could cause
-	 * use-after-free when the IO is really done.
-	 */
-	zram->disk->fops = &zram_wb_devops;
 	up_write(&zram->init_lock);
 
 	pr_info("setup backing device %s\n", file_name);
@@ -722,7 +707,8 @@ static ssize_t writeback_store(struct device *dev,
 
 		if (zram_test_flag(zram, index, ZRAM_WB) ||
 				zram_test_flag(zram, index, ZRAM_SAME) ||
-				zram_test_flag(zram, index, ZRAM_UNDER_WB))
+				zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
+				zram_test_flag(zram, index, ZRAM_NO_WB))
 			goto next;
 
 		if (mode & IDLE_WRITEBACK &&
@@ -1226,6 +1212,10 @@ static void zram_free_page(struct zram *zram, size_t index)
 		goto out;
 	}
 
+	if (zram_test_flag(zram, index, ZRAM_NO_WB)) {
+		zram_clear_flag(zram, index, ZRAM_NO_WB);
+	}
+
 	/*
 	 * No memory is allocated for same element filled pages.
 	 * Simply clear same page flag.
@@ -1654,6 +1644,40 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
 	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
 
+#ifdef CONFIG_ZRAM_WRITEBACK
+	/*
+	 * With writeback feature, zram does asynchronous IO so it's no longer
+	 * synchronous device so let's remove synchronous io flag. Otherwise,
+	 * upper layer(e.g., swap) could wait IO completion rather than
+	 * (submit and return), which will cause system sluggish.
+	 * Furthermore, when the IO function returns(e.g., swap_readpage),
+	 * upper layer expects IO was done so it could deallocate the page
+	 * freely but in fact, IO is going on so finally could cause
+	 * use-after-free when the IO is really done.
+	 *
+	 * If the page is not currently written back then we may proceed to
+	 * read the page synchronously, otherwise, we must fail with
+	 * -EOPNOTSUPP to force the upper layers to use a normal bio.
+	 */
+	zram_slot_lock(zram, index);
+	if (zram_test_flag(zram, index, ZRAM_WB) ||
+			zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
+		zram_slot_unlock(zram, index);
+		/* We cannot proceed with synchronous read */
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Don't allow the page to be written back while we read it,
+	 * this flag is never cleared. It shouldn't be a problem that
+	 * we don't clear this flag because in the case of swap this
+	 * page will be removed shortly after this read anyway.
+	 */
+	if (op == REQ_OP_READ)
+		zram_set_flag(zram, index, ZRAM_NO_WB);
+	zram_slot_unlock(zram, index);
+#endif
+
 	bv.bv_page = page;
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
@@ -1827,15 +1851,6 @@ static const struct block_device_operations zram_devops = {
 	.owner = THIS_MODULE
 };
 
-#ifdef CONFIG_ZRAM_WRITEBACK
-static const struct block_device_operations zram_wb_devops = {
-	.open = zram_open,
-	.submit_bio = zram_submit_bio,
-	.swap_slot_free_notify = zram_slot_free_notify,
-	.owner = THIS_MODULE
-};
-#endif
-
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 158c91e54850..20e4c6a579e0 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -50,6 +50,7 @@ enum zram_pageflags {
 	ZRAM_UNDER_WB,	/* page is under writeback */
 	ZRAM_HUGE,	/* Incompressible page */
 	ZRAM_IDLE,	/* not accessed page since last idle marking */
+	ZRAM_NO_WB,	/* Do not allow page to be written back */
 
 	__NR_ZRAM_PAGEFLAGS,
 };
-- 
2.37.1.559.g78731f0fdb-goog



* Re: [RFC PATCH 1/1] zram: Allow rw_page when page isn't written back.
  2022-08-08 16:50 ` [RFC PATCH 1/1] " Brian Geffon
@ 2022-08-09  1:38   ` Sergey Senozhatsky
  2022-08-10 19:18     ` Brian Geffon
  0 siblings, 1 reply; 4+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09  1:38 UTC
  To: Brian Geffon
  Cc: Andrew Morton, Minchan Kim, Nitin Gupta, Sergey Senozhatsky,
	linux-kernel, Suleiman Souhlal, linux-mm

On (22/08/08 12:50), Brian Geffon wrote:
[..]
>  
>  	pr_info("setup backing device %s\n", file_name);
> @@ -722,7 +707,8 @@ static ssize_t writeback_store(struct device *dev,
>  
>  		if (zram_test_flag(zram, index, ZRAM_WB) ||
>  				zram_test_flag(zram, index, ZRAM_SAME) ||
> -				zram_test_flag(zram, index, ZRAM_UNDER_WB))
> +				zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
> +				zram_test_flag(zram, index, ZRAM_NO_WB))
>  			goto next;

mark_idle() probably should also test ZRAM_NO_WB bit.


* Re: [RFC PATCH 1/1] zram: Allow rw_page when page isn't written back.
  2022-08-09  1:38   ` Sergey Senozhatsky
@ 2022-08-10 19:18     ` Brian Geffon
  0 siblings, 0 replies; 4+ messages in thread
From: Brian Geffon @ 2022-08-10 19:18 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Minchan Kim, Nitin Gupta, LKML, Suleiman Souhlal,
	linux-mm

Thanks Sergey,

On Mon, Aug 8, 2022 at 9:38 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (22/08/08 12:50), Brian Geffon wrote:
> [..]
> >
> >       pr_info("setup backing device %s\n", file_name);
> > @@ -722,7 +707,8 @@ static ssize_t writeback_store(struct device *dev,
> >
> >               if (zram_test_flag(zram, index, ZRAM_WB) ||
> >                               zram_test_flag(zram, index, ZRAM_SAME) ||
> > -                             zram_test_flag(zram, index, ZRAM_UNDER_WB))
> > +                             zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
> > +                             zram_test_flag(zram, index, ZRAM_NO_WB))
> >                       goto next;
>
> mark_idle() probably should also test ZRAM_NO_WB bit.

We can definitely add that check to mark_idle(), but it doesn't
actually hurt to allow the page to be marked idle. NO_WB only controls
the writeback aspect: as long as the page is marked NO_WB it won't be
written back, idle or not. I'm happy to add the check in later
versions if people like this approach in general.
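
To illustrate the point, here is a small standalone sketch of the
writeback filter as I understand it. The PG_* names and
should_write_back() are hypothetical stand-ins for the flag tests in
writeback_store(), and the idle-mode test is my assumption about the
part of that function the quoted hunk cuts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bits only; not the real zram_pageflags values. */
enum {
	PG_WB       = 1 << 0, /* already on the backing device */
	PG_SAME     = 1 << 1, /* same-element filled page */
	PG_UNDER_WB = 1 << 2, /* writeback in flight */
	PG_IDLE     = 1 << 3, /* marked idle */
	PG_NO_WB    = 1 << 4, /* pinned by a synchronous read */
};

/*
 * Decide whether a slot is a writeback candidate. The NO_WB test runs
 * before the idle test, so an idle mark on a NO_WB page has no effect
 * on the outcome.
 */
static bool should_write_back(unsigned int flags, bool idle_mode)
{
	assert((flags & ~0x1fu) == 0); /* only the five bits above */

	if (flags & (PG_WB | PG_SAME | PG_UNDER_WB | PG_NO_WB))
		return false;

	if (idle_mode && !(flags & PG_IDLE))
		return false;

	return true;
}
```

So should_write_back(PG_IDLE | PG_NO_WB, true) is false while
should_write_back(PG_IDLE, true) is true, which is why marking a NO_WB
page idle is harmless.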
