linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Brian Geffon <bgeffon@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Nitin Gupta <ngupta@vflare.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	linux-kernel@vger.kernel.org,
	Suleiman Souhlal <suleiman@google.com>,
	Rom Lemarchand <romlem@google.com>,
	linux-mm@kvack.org
Subject: Re: [RESEND RFC] zram: Allow rw_page when page isn't written back.
Date: Fri, 23 Sep 2022 19:31:30 +0000	[thread overview]
Message-ID: <Yy4JkpZ/SnXtrVRf@google.com> (raw)
In-Reply-To: <20220908125037.1119114-1-bgeffon@google.com>

On Thu, Sep 08, 2022 at 08:50:37AM -0400, Brian Geffon wrote:
> Today when a zram device has a backing device we change the ops to
> a new set which does not expose a rw_page method. This prevents the
> upper layers from trying to issue a synchronous rw. This has the
> downside that we penalize every rw even when it could possibly

Do you mean addiontal bio alloc/free?
Please specify something more detail.

> still be performed as a synchronous rw. By the very nature of

Even though zram go though the block layer in the case, it's still
synchronous operation against on in-memory compressed data. Only
asynchrnous IO happens for the data in backing device.

> zram all writes are synchronous so it's unfortunate to have to
> accept this limitation.
> 
> This change will always expose a rw_page function and if the page
> has been written back it will return -EOPNOTSUPP which will force the
> upper layers to try again with bio.

Sounds a good idea.

> 
> To safely allow a synchronous read to proceed for pages which have not
> yet written back we introduce a new flag ZRAM_NO_WB. On the first
> synchronous read if the page is not written back we will set the
> ZRAM_NO_WB flag. This flag, which is never cleared, prevents writeback
> from ever happening to that page.

Why do we need a addtional flag?
Why couldn't we do?

1. expose the rw_page all the time.
2. If the page was written back, just return an error in rw_page to make
upper layer retry it with bio.

> 
> This approach works because in the case of zram as a swap backing device
> the page is going to be removed from zram shortly thereafter so
> preventing writeback is fine. However, if zram is being used as a
> generic block device then this might prevent writeback of the page.
> 
> This proposal is still very much RFC, feedback would be appreciated.
> 
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> ---
>  drivers/block/zram/zram_drv.c | 68 +++++++++++++++++++++--------------
>  drivers/block/zram/zram_drv.h |  1 +
>  2 files changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 92cb929a45b7..22b69e8b6042 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -52,9 +52,6 @@ static unsigned int num_devices = 1;
>  static size_t huge_class_size;
>  
>  static const struct block_device_operations zram_devops;
> -#ifdef CONFIG_ZRAM_WRITEBACK
> -static const struct block_device_operations zram_wb_devops;
> -#endif
>  
>  static void zram_free_page(struct zram *zram, size_t index);
>  static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
> @@ -309,7 +306,8 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
>  		 */
>  		zram_slot_lock(zram, index);
>  		if (zram_allocated(zram, index) &&
> -				!zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
> +				!zram_test_flag(zram, index, ZRAM_UNDER_WB) &&
> +				!zram_test_flag(zram, index, ZRAM_NO_WB)) {
>  #ifdef CONFIG_ZRAM_MEMORY_TRACKING
>  			is_idle = !cutoff || ktime_after(cutoff, zram->table[index].ac_time);
>  #endif
> @@ -439,7 +437,6 @@ static void reset_bdev(struct zram *zram)
>  	filp_close(zram->backing_dev, NULL);
>  	zram->backing_dev = NULL;
>  	zram->bdev = NULL;
> -	zram->disk->fops = &zram_devops;
>  	kvfree(zram->bitmap);
>  	zram->bitmap = NULL;
>  }
> @@ -543,17 +540,6 @@ static ssize_t backing_dev_store(struct device *dev,
>  	zram->backing_dev = backing_dev;
>  	zram->bitmap = bitmap;
>  	zram->nr_pages = nr_pages;
> -	/*
> -	 * With writeback feature, zram does asynchronous IO so it's no longer
> -	 * synchronous device so let's remove synchronous io flag. Othewise,
> -	 * upper layer(e.g., swap) could wait IO completion rather than
> -	 * (submit and return), which will cause system sluggish.
> -	 * Furthermore, when the IO function returns(e.g., swap_readpage),
> -	 * upper layer expects IO was done so it could deallocate the page
> -	 * freely but in fact, IO is going on so finally could cause
> -	 * use-after-free when the IO is really done.
> -	 */
> -	zram->disk->fops = &zram_wb_devops;
>  	up_write(&zram->init_lock);
>  
>  	pr_info("setup backing device %s\n", file_name);
> @@ -722,7 +708,8 @@ static ssize_t writeback_store(struct device *dev,
>  
>  		if (zram_test_flag(zram, index, ZRAM_WB) ||
>  				zram_test_flag(zram, index, ZRAM_SAME) ||
> -				zram_test_flag(zram, index, ZRAM_UNDER_WB))
> +				zram_test_flag(zram, index, ZRAM_UNDER_WB) ||
> +				zram_test_flag(zram, index, ZRAM_NO_WB))
>  			goto next;
>  
>  		if (mode & IDLE_WRITEBACK &&
> @@ -1226,6 +1213,10 @@ static void zram_free_page(struct zram *zram, size_t index)
>  		goto out;
>  	}
>  
> +	if (zram_test_flag(zram, index, ZRAM_NO_WB)) {
> +		zram_clear_flag(zram, index, ZRAM_NO_WB);
> +	}
> +
>  	/*
>  	 * No memory is allocated for same element filled pages.
>  	 * Simply clear same page flag.
> @@ -1654,6 +1645,40 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  	index = sector >> SECTORS_PER_PAGE_SHIFT;
>  	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
>  
> +#ifdef CONFIG_ZRAM_WRITEBACK
> +	/*
> +	 * With writeback feature, zram does asynchronous IO so it's no longer
> +	 * synchronous device so let's remove synchronous io flag. Othewise,
> +	 * upper layer(e.g., swap) could wait IO completion rather than
> +	 * (submit and return), which will cause system sluggish.
> +	 * Furthermore, when the IO function returns(e.g., swap_readpage),
> +	 * upper layer expects IO was done so it could deallocate the page
> +	 * freely but in fact, IO is going on so finally could cause
> +	 * use-after-free when the IO is really done.
> +	 *
> +	 * If the page is not currently written back then we may proceed to
> +	 * read the page synchronously, otherwise, we must fail with
> +	 * -EOPNOTSUPP to force the upper layers to use a normal bio.
> +	 */
> +	zram_slot_lock(zram, index);
> +	if (zram_test_flag(zram, index, ZRAM_WB) ||
> +			zram_test_flag(zram, index, ZRAM_UNDER_WB)) {
> +		zram_slot_unlock(zram, index);
> +		/* We cannot proceed with synchronous read */
> +		return -EOPNOTSUPP;
> +	}
> +
> +	/*
> +	 * Don't allow the page to be written back while we read it,
> +	 * this flag is never cleared. It shouldn't be a problem that
> +	 * we don't clear this flag because in the case of swap this
> +	 * page will be removed shortly after this read anyway.
> +	 */
> +	if (op == REQ_OP_READ)
> +		zram_set_flag(zram, index, ZRAM_NO_WB);
> +	zram_slot_unlock(zram, index);
> +#endif
> +
>  	bv.bv_page = page;
>  	bv.bv_len = PAGE_SIZE;
>  	bv.bv_offset = 0;
> @@ -1827,15 +1852,6 @@ static const struct block_device_operations zram_devops = {
>  	.owner = THIS_MODULE
>  };
>  
> -#ifdef CONFIG_ZRAM_WRITEBACK
> -static const struct block_device_operations zram_wb_devops = {
> -	.open = zram_open,
> -	.submit_bio = zram_submit_bio,
> -	.swap_slot_free_notify = zram_slot_free_notify,
> -	.owner = THIS_MODULE
> -};
> -#endif
> -
>  static DEVICE_ATTR_WO(compact);
>  static DEVICE_ATTR_RW(disksize);
>  static DEVICE_ATTR_RO(initstate);
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index 158c91e54850..20e4c6a579e0 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -50,6 +50,7 @@ enum zram_pageflags {
>  	ZRAM_UNDER_WB,	/* page is under writeback */
>  	ZRAM_HUGE,	/* Incompressible page */
>  	ZRAM_IDLE,	/* not accessed page since last idle marking */
> +	ZRAM_NO_WB,	/* Do not allow page to be written back */
>  
>  	__NR_ZRAM_PAGEFLAGS,
>  };
> -- 
> 2.37.2.789.g6183377224-goog
> 
> 

  parent reply	other threads:[~2022-09-23 19:31 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-08 12:50 [RESEND RFC] zram: Allow rw_page when page isn't written back Brian Geffon
2022-09-09  8:30 ` Sergey Senozhatsky
2022-09-12  4:37   ` Sergey Senozhatsky
2022-09-12  6:07     ` Sergey Senozhatsky
2022-09-23 19:31 ` Minchan Kim [this message]
2022-09-30 19:33   ` Brian Geffon
2022-09-30 19:52   ` [PATCH] zram: Always expose rw_page Brian Geffon
2022-10-03  2:59     ` Sergey Senozhatsky
2022-10-03 14:46       ` Brian Geffon
2022-10-03 14:48   ` [PATCH v2] " Brian Geffon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yy4JkpZ/SnXtrVRf@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ngupta@vflare.org \
    --cc=romlem@google.com \
    --cc=senozhatsky@chromium.org \
    --cc=suleiman@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).