* [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
@ 2021-12-01 10:59 Huangzhaoyang
  2021-12-01 11:12 ` Zhaoyang Huang
  0 siblings, 1 reply; 7+ messages in thread
From: Huangzhaoyang @ 2021-12-01 10:59 UTC (permalink / raw)
  To: Nitin Gupta, Sergey Senozhatsky, Jens Axboe, Johannes Weiner,
	Minchan Kim, Zhaoyang Huang, linux-mm, linux-kernel

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Have zram reading/writing be counted in PSI_IO_WAIT.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 drivers/block/zram/zram_drv.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fcaf275..b0e4766 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -34,6 +34,7 @@
 #include <linux/debugfs.h>
 #include <linux/cpuhotplug.h>
 #include <linux/part_stat.h>
+#include <linux/psi.h>
 
 #include "zram_drv.h"
 
@@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 				zram_get_element(zram, index),
 				bio, partial_io);
 	}
-
+#ifdef CONFIG_PSI
+	psi_task_change(current, 0, TSK_IOWAIT);
+#endif
 	handle = zram_get_handle(zram, index);
 	if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
 		unsigned long value;
@@ -1257,6 +1260,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 		zram_fill_page(mem, PAGE_SIZE, value);
 		kunmap_atomic(mem);
 		zram_slot_unlock(zram, index);
+#ifdef CONFIG_PSI
+		psi_task_change(current, TSK_IOWAIT, 0);
+#endif
 		return 0;
 	}
 
@@ -1284,6 +1290,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 	if (WARN_ON(ret))
 		pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
 
+#ifdef CONFIG_PSI
+	psi_task_change(current, TSK_IOWAIT, 0);
+#endif
 	return ret;
 }
 
@@ -1471,7 +1480,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		vec.bv_offset = 0;
 	}
 
+#ifdef CONFIG_PSI
+	psi_task_change(current, 0, TSK_IOWAIT);
+#endif
 	ret = __zram_bvec_write(zram, &vec, index, bio);
+#ifdef CONFIG_PSI
+	psi_task_change(current, TSK_IOWAIT, 0);
+#endif
 out:
 	if (is_partial_io(bvec))
 		__free_page(page);
@@ -1607,7 +1622,6 @@ static blk_qc_t zram_submit_bio(struct bio *bio)
 		atomic64_inc(&zram->stats.invalid_io);
 		goto error;
 	}
-
 	__zram_make_request(zram, bio);
 	return BLK_QC_T_NONE;
 
-- 
1.9.1



* Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
  2021-12-01 10:59 [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT Huangzhaoyang
@ 2021-12-01 11:12 ` Zhaoyang Huang
  2021-12-02 16:28   ` Johannes Weiner
  0 siblings, 1 reply; 7+ messages in thread
From: Zhaoyang Huang @ 2021-12-01 11:12 UTC (permalink / raw)
  To: Nitin Gupta, Sergey Senozhatsky, Jens Axboe, Johannes Weiner,
	Minchan Kim, Zhaoyang Huang, open list:MEMORY MANAGEMENT, LKML

There is currently no chance for zram reads/writes to be counted in
PSI_IO_WAIT, because zram handles the request directly in the current
context, without ever invoking submit_bio and io_schedule.
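
Roughly speaking (a simplified sketch, not verbatim kernel code; the
exact PSI hook differs between kernel versions), ordinary block IO only
shows up in PSI_IO_WAIT because of this path:

void io_schedule(void)			/* kernel/sched/core.c, simplified */
{
	int token;

	token = io_schedule_prepare();	/* sets current->in_iowait = 1 */
	/*
	 * While we sleep here, the scheduler's PSI hooks (psi_dequeue() /
	 * psi_task_switch(), depending on kernel version) see in_iowait
	 * and account the time as TSK_IOWAIT.
	 */
	schedule();
	io_schedule_finish(token);	/* restores the previous in_iowait */
}

Since zram finishes the request synchronously in the caller's context,
none of the above runs for it, which is why the patch calls
psi_task_change() explicitly.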

On Wed, Dec 1, 2021 at 6:59 PM Huangzhaoyang <huangzhaoyang@gmail.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Have zram reading/writing be counted in PSI_IO_WAIT.
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>


* Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
  2021-12-01 11:12 ` Zhaoyang Huang
@ 2021-12-02 16:28   ` Johannes Weiner
  2021-12-03  9:16     ` Zhaoyang Huang
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2021-12-02 16:28 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Nitin Gupta, Sergey Senozhatsky, Jens Axboe, Minchan Kim,
	Zhaoyang Huang, open list:MEMORY MANAGEMENT, LKML

On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> There is currently no chance for zram reads/writes to be counted in
> PSI_IO_WAIT, because zram handles the request directly in the current
> context, without ever invoking submit_bio and io_schedule.

Hm, but you're also not waiting for a real IO device - time during
which the CPU could be doing something else. You're waiting for
decompression. The thread also isn't in D-state during that time. What
scenario would benefit from this accounting? How is IO pressure from
comp/decomp paths actionable to you?

What about when you use zram with disk writeback enabled, and you see
a mix of decompression and actual disk IO? Wouldn't you want to be
able to tell the two apart, to see if you're short on CPU or short on
IO bandwidth in this setup? Your patch would make that impossible.

This needs a much more comprehensive changelog.

> > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> >                                 zram_get_element(zram, index),
> >                                 bio, partial_io);
> >         }
> > -
> > +#ifdef CONFIG_PSI
> > +       psi_task_change(current, 0, TSK_IOWAIT);
> > +#endif

Add psi_iostall_enter() and psi_iostall_leave() helpers that encapsulate the ifdefs.
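
Something along these lines (just a sketch of what I mean; the names
and placement are made up, not an existing API):

#ifdef CONFIG_PSI
static inline void psi_iostall_enter(void)
{
	/* current task starts waiting on (zram) IO */
	psi_task_change(current, 0, TSK_IOWAIT);
}

static inline void psi_iostall_leave(void)
{
	/* end of the IO stall window */
	psi_task_change(current, TSK_IOWAIT, 0);
}
#else
static inline void psi_iostall_enter(void) { }
static inline void psi_iostall_leave(void) { }
#endif

Then the zram code can call these unconditionally instead of sprinkling
#ifdef CONFIG_PSI around every call site.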


* Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
  2021-12-02 16:28   ` Johannes Weiner
@ 2021-12-03  9:16     ` Zhaoyang Huang
  2021-12-08 16:38       ` Johannes Weiner
  2021-12-08 17:45       ` Chris Down
  0 siblings, 2 replies; 7+ messages in thread
From: Zhaoyang Huang @ 2021-12-03  9:16 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Nitin Gupta, Sergey Senozhatsky, Jens Axboe, Minchan Kim,
	Zhaoyang Huang, open list:MEMORY MANAGEMENT, LKML

On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > There is currently no chance for zram reads/writes to be counted in
> > PSI_IO_WAIT, because zram handles the request directly in the current
> > context, without ever invoking submit_bio and io_schedule.
>
> Hm, but you're also not waiting for a real IO device - time during
> which the CPU could be doing something else. You're waiting for
> decompression. The thread also isn't in D-state during that time. What
> scenario would benefit from this accounting? How is IO pressure from
> comp/decomp paths actionable to you?
No. Block-device-related D-state time is already counted, via
psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
non-productive time caused by huge numbers of in-context swap
in/out operations (zram-like). This can help make IO pressure more
accurate and consistent with the PSWPIN/PSWPOUT counters. It is like
counting the IO time within filemap_fault->wait_on_page_bit_common as
psi_mem_stall, which makes memory pressure go high because of IO.
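
The existing pattern I'm drawing the analogy to looks roughly like this
(simplified sketch; the real call site in mm/filemap.c has more
conditions around it):

unsigned long pflags;

/* refault of a workingset page: the wait gets charged to memory pressure */
psi_memstall_enter(&pflags);
/* ... wait for the page IO, e.g. in wait_on_page_bit_common() ... */
psi_memstall_leave(&pflags);

What I am asking for is the IO-side equivalent of that annotation
around zram compression/decompression.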
>
> What about when you use zram with disk writeback enabled, and you see
> a mix of decompression and actual disk IO? Wouldn't you want to be
> able to tell the two apart, to see if you're short on CPU or short on
> IO bandwidth in this setup? Your patch would make that impossible.
OK. Would it be better to start the IO counting from pageout? Both
bdev-backed and ram-backed swap would benefit from it.
>
> This needs a much more comprehensive changelog.
>
> > > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> > >                                 zram_get_element(zram, index),
> > >                                 bio, partial_io);
> > >         }
> > > -
> > > +#ifdef CONFIG_PSI
> > > +       psi_task_change(current, 0, TSK_IOWAIT);
> > > +#endif
>
> Add psi_iostall_enter() and psi_iostall_leave() helpers that encapsulate the ifdefs.
OK.


* Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
  2021-12-03  9:16     ` Zhaoyang Huang
@ 2021-12-08 16:38       ` Johannes Weiner
  2021-12-08 17:45       ` Chris Down
  1 sibling, 0 replies; 7+ messages in thread
From: Johannes Weiner @ 2021-12-08 16:38 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Nitin Gupta, Sergey Senozhatsky, Jens Axboe, Minchan Kim,
	Zhaoyang Huang, open list:MEMORY MANAGEMENT, LKML

On Fri, Dec 03, 2021 at 05:16:47PM +0800, Zhaoyang Huang wrote:
> On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > > There is currently no chance for zram reads/writes to be counted in
> > > PSI_IO_WAIT, because zram handles the request directly in the current
> > > context, without ever invoking submit_bio and io_schedule.
> >
> > Hm, but you're also not waiting for a real IO device - time during
> > which the CPU could be doing something else. You're waiting for
> > decompression. The thread also isn't in D-state during that time. What
> > scenario would benefit from this accounting? How is IO pressure from
> > comp/decomp paths actionable to you?
> No. Block-device-related D-state time is already counted, via
> psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
> non-productive time caused by huge numbers of in-context swap
> in/out operations (zram-like). This can help make IO pressure more
> accurate and consistent with the PSWPIN/PSWPOUT counters. It is like
> counting the IO time within filemap_fault->wait_on_page_bit_common as
> psi_mem_stall, which makes memory pressure go high because of IO.

It's not ignored, it shows up as memory pressure. Because those delays
occur due to a lack of memory.

On the other hand, having a faster IO device would make no difference
to the time spent on compression and decompression. Counting this time
as IO pressure makes no sense to me.

I'm against merging this patch.


* Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
  2021-12-03  9:16     ` Zhaoyang Huang
  2021-12-08 16:38       ` Johannes Weiner
@ 2021-12-08 17:45       ` Chris Down
  2021-12-08 17:47         ` Chris Down
  1 sibling, 1 reply; 7+ messages in thread
From: Chris Down @ 2021-12-08 17:45 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Johannes Weiner, Nitin Gupta, Sergey Senozhatsky, Jens Axboe,
	Minchan Kim, Zhaoyang Huang, open list:MEMORY MANAGEMENT, LKML

Zhaoyang Huang writes:
>No. Block-device-related D-state time is already counted, via
>psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
>non-productive time caused by huge numbers of in-context swap
>in/out operations (zram-like). This can help make IO pressure more
>accurate and consistent with the PSWPIN/PSWPOUT counters. It is like
>counting the IO time within filemap_fault->wait_on_page_bit_common as
>psi_mem_stall, which makes memory pressure go high because of IO.

I think part of the confusion here is that the name "io" doesn't really just 
mean "io", it means "disk I/O". As in, we are targeting real, physical or 
network disk I/O. Of course, we can only do what's reasonable if the device 
we're accounting for is layers upon layers eventually leading to a 
memory-backed device, but _intentionally_ polluting that with more memory-bound 
accesses doesn't make any sense when we already have separate accounting for 
memory. Why would anyone want that?

I'm with Johannes here, I think this would actively make memory pressure 
monitoring less useful. This is a NAK from my perspective as someone who 
actually uses these things in production.


* Re: [RFC PATCH] mm: count zram read/write into PSI_IO_WAIT
  2021-12-08 17:45       ` Chris Down
@ 2021-12-08 17:47         ` Chris Down
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Down @ 2021-12-08 17:47 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Johannes Weiner, Nitin Gupta, Sergey Senozhatsky, Jens Axboe,
	Minchan Kim, Zhaoyang Huang, open list:MEMORY MANAGEMENT, LKML

Chris Down writes:
>I'm with Johannes here, I think this would actively make memory 
>pressure monitoring less useful. This is a NAK from my perspective as 
>someone who actually uses these things in production.

s/memory/io/

