All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
@ 2022-03-16  6:39 cgel.zte
  2022-03-16  8:48 ` Christoph Hellwig
  2022-03-21 14:33 ` Johannes Weiner
  0 siblings, 2 replies; 18+ messages in thread
From: cgel.zte @ 2022-03-16  6:39 UTC (permalink / raw)
  To: axboe, viro, hannes
  Cc: linux-block, linux-kernel, linux-fsdevel, akpm, Yang Yang, Ran Xiaokai

From: Yang Yang <yang.yang29@zte.com.cn>

psi tracks the time spent on submitting the IO of refaulting file pages
and anonymous pages[1]. But after we tracks refaulting anonymous pages
in swap_readpage[2][3], there is no need to track refaulting anonymous
pages in submit_bio.

So this patch can reduce redundant calling of psi_memstall_enter. And
make it easier to track refaulting file pages and anonymous pages
separately.

[1] commit b8e24a9300b0 ("block: annotate refault stalls from IO submission")
[2] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[3] commit 2b413a1a728f ("mm: page_io: fix psi memory pressure error on cold swapins")

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 block/bio.c               | 9 +++++----
 block/blk-core.c          | 2 +-
 fs/direct-io.c            | 2 +-
 include/linux/blk_types.h | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3c57b3ba727d..54b60be4f3b0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	bio->bi_iter.bi_size += len;
 	bio->bi_vcnt++;
 
-	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
-		bio_set_flag(bio, BIO_WORKINGSET);
+	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
+	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
+		bio_set_flag(bio, BIO_WORKINGSET_FILE);
 }
 EXPORT_SYMBOL_GPL(__bio_add_page);
 
@@ -1254,7 +1255,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  * is returned only if 0 pages could be pinned.
  *
  * It's intended for direct IO, so doesn't do PSI tracking, the caller is
- * responsible for setting BIO_WORKINGSET if necessary.
+ * responsible for setting BIO_WORKINGSET_FILE if necessary.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1274,7 +1275,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	/* don't account direct I/O as memory stall */
-	bio_clear_flag(bio, BIO_WORKINGSET);
+	bio_clear_flag(bio, BIO_WORKINGSET_FILE);
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/block/blk-core.c b/block/blk-core.c
index ddac62aebc55..9a955323734b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -918,7 +918,7 @@ void submit_bio(struct bio *bio)
 	 * part of overall IO time.
 	 */
 	if (unlikely(bio_op(bio) == REQ_OP_READ &&
-	    bio_flagged(bio, BIO_WORKINGSET))) {
+	    bio_flagged(bio, BIO_WORKINGSET_FILE))) {
 		unsigned long pflags;
 
 		psi_memstall_enter(&pflags);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index aef06e607b40..7cdec50fb27b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -420,7 +420,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 
 	bio->bi_private = dio;
 	/* don't account direct I/O as memory stall */
-	bio_clear_flag(bio, BIO_WORKINGSET);
+	bio_clear_flag(bio, BIO_WORKINGSET_FILE);
 
 	spin_lock_irqsave(&dio->bio_lock, flags);
 	dio->refcount++;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0c7a9a1f06c8..a1aaba4767e9 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -314,7 +314,7 @@ enum {
 	BIO_NO_PAGE_REF,	/* don't put release vec pages */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
-	BIO_WORKINGSET,		/* contains userspace workingset pages */
+	BIO_WORKINGSET_FILE,	/* contains userspace workingset file pages */
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
 	BIO_REFFED,		/* bio has elevated ->bi_cnt */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-16  6:39 [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages cgel.zte
@ 2022-03-16  8:48 ` Christoph Hellwig
  2022-03-17  2:18   ` CGEL
  2022-03-21 14:33 ` Johannes Weiner
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-03-16  8:48 UTC (permalink / raw)
  To: cgel.zte
  Cc: axboe, viro, hannes, linux-block, linux-kernel, linux-fsdevel,
	akpm, Yang Yang, Ran Xiaokai

> @@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
>  	bio->bi_iter.bi_size += len;
>  	bio->bi_vcnt++;
>  
> -	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
> -		bio_set_flag(bio, BIO_WORKINGSET);
> +	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
> +	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
> +		bio_set_flag(bio, BIO_WORKINGSET_FILE);

This needs to go out of the block I/O fast path, not grow even more
checks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-16  8:48 ` Christoph Hellwig
@ 2022-03-17  2:18   ` CGEL
  0 siblings, 0 replies; 18+ messages in thread
From: CGEL @ 2022-03-17  2:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, viro, hannes, linux-block, linux-kernel, linux-fsdevel,
	akpm, Yang Yang, Ran Xiaokai

On Wed, Mar 16, 2022 at 01:48:05AM -0700, Christoph Hellwig wrote:
> > @@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
> >  	bio->bi_iter.bi_size += len;
> >  	bio->bi_vcnt++;
> >  
> > -	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
> > -		bio_set_flag(bio, BIO_WORKINGSET);
> > +	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
> > +	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
> > +		bio_set_flag(bio, BIO_WORKINGSET_FILE);
> 
> This needs to go out of the block I/O fast path, not grow even more
> checks.

Thanks for your replying.

First, Johannes Weiner had made his state, see:
https://lore.kernel.org/all/Yio17pXawRuuVJFO@cmpxchg.org/

Second, I understand your concern, but actually there seems no better
way to do file pages workingset delay accounting, because this is no
unique function to track file pages submitting, kernel do this in
multiple sub-system.

Thirdly, this patch doesn't make it worse, indeed it reduce unnecessary
psi accounting in submit_bio.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-16  6:39 [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages cgel.zte
  2022-03-16  8:48 ` Christoph Hellwig
@ 2022-03-21 14:33 ` Johannes Weiner
  2022-03-22  2:47   ` CGEL
  2022-03-22  3:44   ` CGEL
  1 sibling, 2 replies; 18+ messages in thread
From: Johannes Weiner @ 2022-03-21 14:33 UTC (permalink / raw)
  To: cgel.zte
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
> 
> psi tracks the time spent on submitting the IO of refaulting file pages
> and anonymous pages[1]. But after we tracks refaulting anonymous pages
> in swap_readpage[2][3], there is no need to track refaulting anonymous
> pages in submit_bio.
> 
> So this patch can reduce redundant calling of psi_memstall_enter. And
> make it easier to track refaulting file pages and anonymous pages
> separately.

I don't think this is an improvement.

psi_memstall_enter() will check current->in_memstall once, detect the
nested call, and bail. Your patch checks PageSwapBacked for every page
being added. It's more branches for less robust code.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-21 14:33 ` Johannes Weiner
@ 2022-03-22  2:47   ` CGEL
  2022-03-22 13:27     ` Johannes Weiner
  2022-03-22  3:44   ` CGEL
  1 sibling, 1 reply; 18+ messages in thread
From: CGEL @ 2022-03-22  2:47 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > From: Yang Yang <yang.yang29@zte.com.cn>
> > 
> > psi tracks the time spent on submitting the IO of refaulting file pages
> > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > pages in submit_bio.
> > 
> > So this patch can reduce redundant calling of psi_memstall_enter. And
> > make it easier to track refaulting file pages and anonymous pages
> > separately.
> 
> I don't think this is an improvement.
> 
> psi_memstall_enter() will check current->in_memstall once, detect the
> nested call, and bail. Your patch checks PageSwapBacked for every page
> being added. It's more branches for less robust code.

We are also working for a new patch to classify different reasons cause
psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
user to tuning sysctl, for example, if user see high compact delay, he
may try do adjust THP sysctl to reduce the compact delay.

To support that, we should distinguish what's the reason cause psi in
submit_io(), this patch does the job.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-21 14:33 ` Johannes Weiner
  2022-03-22  2:47   ` CGEL
@ 2022-03-22  3:44   ` CGEL
  1 sibling, 0 replies; 18+ messages in thread
From: CGEL @ 2022-03-22  3:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > From: Yang Yang <yang.yang29@zte.com.cn>
> > 
> > psi tracks the time spent on submitting the IO of refaulting file pages
> > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > pages in submit_bio.
> > 
> > So this patch can reduce redundant calling of psi_memstall_enter. And
> > make it easier to track refaulting file pages and anonymous pages
> > separately.
> 
> I don't think this is an improvement.
> 
> psi_memstall_enter() will check current->in_memstall once, detect the
> nested call, and bail. Your patch checks PageSwapBacked for every page
> being added. It's more branches for less robust code.

And PageSwapBacked checking is after unlikely(PageWorkingset(page), so I think
the impact is little.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-22  2:47   ` CGEL
@ 2022-03-22 13:27     ` Johannes Weiner
  2022-03-23  6:10       ` CGEL
       [not found]       ` <20220323061058.GA2343452@cgel.zte@gmail.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Johannes Weiner @ 2022-03-22 13:27 UTC (permalink / raw)
  To: CGEL
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > 
> > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > pages in submit_bio.
> > > 
> > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > make it easier to track refaulting file pages and anonymous pages
> > > separately.
> > 
> > I don't think this is an improvement.
> > 
> > psi_memstall_enter() will check current->in_memstall once, detect the
> > nested call, and bail. Your patch checks PageSwapBacked for every page
> > being added. It's more branches for less robust code.
> 
> We are also working for a new patch to classify different reasons cause
> psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> user to tuning sysctl, for example, if user see high compact delay, he
> may try do adjust THP sysctl to reduce the compact delay.
> 
> To support that, we should distinguish what's the reason cause psi in
> submit_io(), this patch does the job.

Please submit these patches together then. On its own, this patch
isn't desirable.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-22 13:27     ` Johannes Weiner
@ 2022-03-23  6:10       ` CGEL
       [not found]       ` <20220323061058.GA2343452@cgel.zte@gmail.com>
  1 sibling, 0 replies; 18+ messages in thread
From: CGEL @ 2022-03-23  6:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> > On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > > 
> > > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > > pages in submit_bio.
> > > > 
> > > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > > make it easier to track refaulting file pages and anonymous pages
> > > > separately.
> > > 
> > > I don't think this is an improvement.
> > > 
> > > psi_memstall_enter() will check current->in_memstall once, detect the
> > > nested call, and bail. Your patch checks PageSwapBacked for every page
> > > being added. It's more branches for less robust code.
> > 
> > We are also working for a new patch to classify different reasons cause
> > psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> > user to tuning sysctl, for example, if user see high compact delay, he
> > may try do adjust THP sysctl to reduce the compact delay.
> > 
> > To support that, we should distinguish what's the reason cause psi in
> > submit_io(), this patch does the job.
> 
> Please submit these patches together then. On its own, this patch
> isn't desirable.
I think this patch has it's independent value, I try to make a better
explain.

1) This patch doesn't work it worse or even better
After this patch, swap workingset handle is simpler, file workingset
handle just has one more check, as below.
Before this patch handling swap workingset:
	a) in swap_readpage() call psi_memstall_enter() ->
	b) in __bio_add_page() test if should call bio_set_flag(), true ->
	c) in __bio_add_page() call bio_set_flag()
	d) in submit_bio() test if should call psi_memstall_enter(), true ->
	e) call psi_memstall_enter, detect the nested call, and bail.
	f) call bio_clear_flag if needed.
Before this patch handling file page workingset:
	a) in __bio_add_page() test if should call bio_set_flag(), true ->
	...
	b) call bio_clear_flag if needed.
After this patch handling swap workingset:
	a) in swap_readpage() call psi_memstall_enter() ->
	b) in __bio_add_page() test if should call bio_set_flag(), one more check, false and return.
	c) in submit_bio() test if should call psi_memstall_enter(), false and return.
After this patch handling file pages workingset:
	a) in __bio_add_page() test if should call bio_set_flag(), one more check, true ->
	...
	b) call bio_clear_flag if needed.

2) This patch help tools like kprobe to trace different workingset
After this patch we know workingset in submit_io() is only cause by file pages.

3) This patch will help code evolution
Such as psi classify, getdelays supports counting file pages workingset submit.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
       [not found]       ` <20220323061058.GA2343452@cgel.zte@gmail.com>
@ 2022-03-30  8:34         ` CGEL
  2022-03-30 13:00           ` Johannes Weiner
  0 siblings, 1 reply; 18+ messages in thread
From: CGEL @ 2022-03-30  8:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On Wed, Mar 23, 2022 at 06:11:03AM +0000, CGEL wrote:
> On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> > On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> > > On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > > > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > > > 
> > > > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > > > pages in submit_bio.
> > > > > 
> > > > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > > > make it easier to track refaulting file pages and anonymous pages
> > > > > separately.
> > > > 
> > > > I don't think this is an improvement.
> > > > 
> > > > psi_memstall_enter() will check current->in_memstall once, detect the
> > > > nested call, and bail. Your patch checks PageSwapBacked for every page
> > > > being added. It's more branches for less robust code.
> > > 
> > > We are also working for a new patch to classify different reasons cause
> > > psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> > > user to tuning sysctl, for example, if user see high compact delay, he
> > > may try do adjust THP sysctl to reduce the compact delay.
> > > 
> > > To support that, we should distinguish what's the reason cause psi in
> > > submit_io(), this patch does the job.
> > 
> > Please submit these patches together then. On its own, this patch
> > isn't desirable.
> I think this patch has it's independent value, I try to make a better
> explain.
> 
> 1) This patch doesn't work it worse or even better
> After this patch, swap workingset handle is simpler, file workingset
> handle just has one more check, as below.
> Before this patch handling swap workingset:
> 	a) in swap_readpage() call psi_memstall_enter() ->
> 	b) in __bio_add_page() test if should call bio_set_flag(), true ->
> 	c) in __bio_add_page() call bio_set_flag()
> 	d) in submit_bio() test if should call psi_memstall_enter(), true ->
> 	e) call psi_memstall_enter, detect the nested call, and bail.
> 	f) call bio_clear_flag if needed.
> Before this patch handling file page workingset:
> 	a) in __bio_add_page() test if should call bio_set_flag(), true ->
> 	...
> 	b) call bio_clear_flag if needed.
> After this patch handling swap workingset:
> 	a) in swap_readpage() call psi_memstall_enter() ->
> 	b) in __bio_add_page() test if should call bio_set_flag(), one more check, false and return.
> 	c) in submit_bio() test if should call psi_memstall_enter(), false and return.
> After this patch handling file pages workingset:
> 	a) in __bio_add_page() test if should call bio_set_flag(), one more check, true ->
> 	...
> 	b) call bio_clear_flag if needed.
> 
> 2) This patch help tools like kprobe to trace different workingset
> After this patch we know workingset in submit_io() is only cause by file pages.
> 
> 3) This patch will help code evolution
> Such as psi classify, getdelays supports counting file pages workingset submit.

Looking forward to your review, thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30  8:34         ` CGEL
@ 2022-03-30 13:00           ` Johannes Weiner
  2022-03-30 13:04             ` Christoph Hellwig
  2022-03-31  2:21             ` CGEL
  0 siblings, 2 replies; 18+ messages in thread
From: Johannes Weiner @ 2022-03-30 13:00 UTC (permalink / raw)
  To: CGEL
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai, Christoph Hellwig

On Wed, Mar 30, 2022 at 08:34:08AM +0000, CGEL wrote:
> On Wed, Mar 23, 2022 at 06:11:03AM +0000, CGEL wrote:
> > On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> > > On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> > > > On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > > > > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > > > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > > > > 
> > > > > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > > > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > > > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > > > > pages in submit_bio.
> > > > > > 
> > > > > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > > > > make it easier to track refaulting file pages and anonymous pages
> > > > > > separately.
> > > > > 
> > > > > I don't think this is an improvement.
> > > > > 
> > > > > psi_memstall_enter() will check current->in_memstall once, detect the
> > > > > nested call, and bail. Your patch checks PageSwapBacked for every page
> > > > > being added. It's more branches for less robust code.
> > > > 
> > > > We are also working for a new patch to classify different reasons cause
> > > > psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> > > > user to tuning sysctl, for example, if user see high compact delay, he
> > > > may try do adjust THP sysctl to reduce the compact delay.
> > > > 
> > > > To support that, we should distinguish what's the reason cause psi in
> > > > submit_io(), this patch does the job.
> > > 
> > > Please submit these patches together then. On its own, this patch
> > > isn't desirable.
> > I think this patch has it's independent value, I try to make a better
> > explain.

You missed the point about it complicating semantics.

Right now, the bio layer annotates stalls from queue contention. This
is very simple. The swap code has relied on it in the past. It doesn't
now, but that doesn't change what the concept is at the bio layer.

You patch explicitly codifies that the MM handles swap IOs, and the
lower bio layer handles files. This asymmetry is ugly and error prone.

If you want type distinction, we should move it all into MM code, like
Christoph is saying. Were swap code handles anon refaults and the page
cache code handles file refaults. This would be my preferred layering,
and my original patch did that: https://lkml.org/lkml/2019/7/22/1070.

But it was NAKed, and I had to agree with the argument. The page cache
code is not very centralized, and the place where we deal with
individual pages (and detect refaults) and where we submit bios (where
the stalls occur) is spread out into multiple filesystems. There are
180 submit_bio() calls in fs/; you'd have to audit which ones are for
page cache submissions, and then add stall annotations or use a new
submit_bio_cache() wrapper that handles it. Changes in the filesystem
could easily miss this protocol and silently break pressure detection.

I would prefer the annotations to be at this level, I just don't see
how to do it cleanly/robustly. Maybe Christoph has an idea, he
understands the fs side much better than I do.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 13:00           ` Johannes Weiner
@ 2022-03-30 13:04             ` Christoph Hellwig
  2022-03-30 13:49               ` Jens Axboe
  2022-03-30 15:45               ` Johannes Weiner
  2022-03-31  2:21             ` CGEL
  1 sibling, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2022-03-30 13:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: CGEL, axboe, viro, linux-block, linux-kernel, linux-fsdevel,
	akpm, Yang Yang, Ran Xiaokai, Christoph Hellwig

On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
> If you want type distinction, we should move it all into MM code, like
> Christoph is saying. Were swap code handles anon refaults and the page
> cache code handles file refaults. This would be my preferred layering,
> and my original patch did that: https://lkml.org/lkml/2019/7/22/1070.

FYI, I started redoing that version and I think with all the cleanups
to filemap.c and the readahead code this can be done fairly nicely now:

http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 13:04             ` Christoph Hellwig
@ 2022-03-30 13:49               ` Jens Axboe
  2022-03-30 15:45               ` Johannes Weiner
  1 sibling, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2022-03-30 13:49 UTC (permalink / raw)
  To: Christoph Hellwig, Johannes Weiner
  Cc: CGEL, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai

On 3/30/22 7:04 AM, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
>> If you want type distinction, we should move it all into MM code, like
>> Christoph is saying. Were swap code handles anon refaults and the page
>> cache code handles file refaults. This would be my preferred layering,
>> and my original patch did that: https://lkml.org/lkml/2019/7/22/1070.
> 
> FYI, I started redoing that version and I think with all the cleanups
> to filemap.c and the readahead code this can be done fairly nicely now:
> 
> http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4

This looks way better than hiding it deep down in the block layer.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 13:04             ` Christoph Hellwig
  2022-03-30 13:49               ` Jens Axboe
@ 2022-03-30 15:45               ` Johannes Weiner
  2022-03-30 15:54                 ` Christoph Hellwig
  1 sibling, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2022-03-30 15:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: CGEL, axboe, viro, linux-block, linux-kernel, linux-fsdevel,
	akpm, Yang Yang, Ran Xiaokai

On Wed, Mar 30, 2022 at 06:04:08AM -0700, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
> > If you want type distinction, we should move it all into MM code, like
> > Christoph is saying. Were swap code handles anon refaults and the page
> > cache code handles file refaults. This would be my preferred layering,
> > and my original patch did that: https://lkml.org/lkml/2019/7/22/1070.
> 
> FYI, I started redoing that version and I think with all the cleanups
> to filemap.c and the readahead code this can be done fairly nicely now:
> 
> http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4

Yes, it's definitely much nicer now with the MM instantiating the
pages for ->readpage(s).

But AFAICS this breaks compressed btrfs (and erofs?) because those
still do additional add_to_page_cache_lru() and bio submissions.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 15:45               ` Johannes Weiner
@ 2022-03-30 15:54                 ` Christoph Hellwig
  2022-03-30 16:17                   ` Johannes Weiner
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-03-30 15:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Christoph Hellwig, CGEL, axboe, viro, linux-block, linux-kernel,
	linux-fsdevel, akpm, Yang Yang, Ran Xiaokai

On Wed, Mar 30, 2022 at 11:45:56AM -0400, Johannes Weiner wrote:
> > FYI, I started redoing that version and I think with all the cleanups
> > to filemap.c and the readahead code this can be done fairly nicely now:
> > 
> > http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4
> 
> Yes, it's definitely much nicer now with the MM instantiating the
> pages for ->readpage(s).
> 
> But AFAICS this breaks compressed btrfs (and erofs?) because those
> still do additional add_to_page_cache_lru() and bio submissions.

In btrfs, add_ra_bio_pages only passed freshly allocated pages to
add_to_page_cache_lru.  These can't really have PageWorkingSet set,
can they?  In erofs they can also come from a local page pool, but
I think otherwise the same applies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 15:54                 ` Christoph Hellwig
@ 2022-03-30 16:17                   ` Johannes Weiner
  2022-03-31  5:15                     ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2022-03-30 16:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: CGEL, axboe, viro, linux-block, linux-kernel, linux-fsdevel,
	akpm, Yang Yang, Ran Xiaokai

On Wed, Mar 30, 2022 at 08:54:09AM -0700, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 11:45:56AM -0400, Johannes Weiner wrote:
> > > FYI, I started redoing that version and I think with all the cleanups
> > > to filemap.c and the readahead code this can be done fairly nicely now:
> > > 
> > > http://git.infradead.org/users/hch/block.git/commitdiff/666abb29c6db870d3941acc5ac19e83fbc72cfd4
> > 
> > Yes, it's definitely much nicer now with the MM instantiating the
> > pages for ->readpage(s).
> > 
> > But AFAICS this breaks compressed btrfs (and erofs?) because those
> > still do additional add_to_page_cache_lru() and bio submissions.
> 
> In btrfs, add_ra_bio_pages only passed freshly allocated pages to
> add_to_page_cache_lru.  These can't really have PageWorkingSet set,
> can they?  In erofs they can also come from a local page pool, but
> I think otherwise the same applies.

It's add_to_page_cache_lru() that sets the flag.

Basically, when a PageWorkingset (hot) page gets reclaimed, the bit is
stored in the vacated tree slot. When the entry is brought back in,
add_to_page_cache_lru() transfers it to the newly allocated page.

add_to_page_cache_lru()
  filemap_add_folio()
    __filemap_add_folio(mapping, folio, index, gfp, &shadow)
      *shadow = *slot
      *slot = folio
  if (shadow)
    workingset_refault(folio, shadow)
      folio_set_workingset()

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 13:00           ` Johannes Weiner
  2022-03-30 13:04             ` Christoph Hellwig
@ 2022-03-31  2:21             ` CGEL
  1 sibling, 0 replies; 18+ messages in thread
From: CGEL @ 2022-03-31  2:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: axboe, viro, linux-block, linux-kernel, linux-fsdevel, akpm,
	Yang Yang, Ran Xiaokai, Christoph Hellwig

On Wed, Mar 30, 2022 at 09:00:46AM -0400, Johannes Weiner wrote:
> On Wed, Mar 30, 2022 at 08:34:08AM +0000, CGEL wrote:
> > On Wed, Mar 23, 2022 at 06:11:03AM +0000, CGEL wrote:
> > > On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> > > > On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> > > > > On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > > > > > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > > > > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > > > > > 
> > > > > > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > > > > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > > > > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > > > > > pages in submit_bio.
> > > > > > > 
> > > > > > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > > > > > make it easier to track refaulting file pages and anonymous pages
> > > > > > > separately.
> > > > > > 
> > > > > > I don't think this is an improvement.
> > > > > > 
> > > > > > psi_memstall_enter() will check current->in_memstall once, detect the
> > > > > > nested call, and bail. Your patch checks PageSwapBacked for every page
> > > > > > being added. It's more branches for less robust code.
> > > > > 
> > > > > We are also working for a new patch to classify different reasons cause
> > > > > psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> > > > > user to tuning sysctl, for example, if user see high compact delay, he
> > > > > may try do adjust THP sysctl to reduce the compact delay.
> > > > > 
> > > > > To support that, we should distinguish what's the reason cause psi in
> > > > > submit_io(), this patch does the job.
> > > > 
> > > > Please submit these patches together then. On its own, this patch
> > > > isn't desirable.
> > > I think this patch has it's independent value, I try to make a better
> > > explain.
> 
> You missed the point about it complicating semantics.
> 
> Right now, the bio layer annotates stalls from queue contention. This
> is very simple. The swap code has relied on it in the past. It doesn't
> now, but that doesn't change what the concept is at the bio layer.
> 
> You patch explicitly codifies that the MM handles swap IOs, and the
> lower bio layer handles files. This asymmetry is ugly and error prone.
>
Yes this asymmetry is bad, we also don't think this patch is perfect.

But just as you said below, page cache creating is spread out multiple
filesystems, if we move psi_memstall_enter to the upper layer it would
maybe horrible repetition and also error prone. I think the primary
question is page cache code is not centralized, before it's be solved
(not likely?), we may have to make a compromise, and I think this
patch is a simple one.

Thanks.

> If you want type distinction, we should move it all into MM code, like
> Christoph is saying. Were swap code handles anon refaults and the page
> cache code handles file refaults. This would be my preferred layering,
> and my original patch did that: https://lkml.org/lkml/2019/7/22/1070.
> 
> But it was NAKed, and I had to agree with the argument. The page cache
> code is not very centralized, and the place where we deal with
> individual pages (and detect refaults) and where we submit bios (where
> the stalls occur) is spread out into multiple filesystems. There are
> 180 submit_bio() calls in fs/; you'd have to audit which ones are for
> page cache submissions, and then add stall annotations or use a new
> submit_bio_cache() wrapper that handles it. Changes in the filesystem
> could easily miss this protocol and silently break pressure detection.
>
> I would prefer the annotations to be at this level, I just don't see
> how to do it cleanly/robustly. Maybe Christoph has an idea, he
> understands the fs side much better than I do.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-30 16:17                   ` Johannes Weiner
@ 2022-03-31  5:15                     ` Christoph Hellwig
  2022-03-31 19:07                       ` Johannes Weiner
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-03-31  5:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Christoph Hellwig, CGEL, axboe, viro, linux-block, linux-kernel,
	linux-fsdevel, akpm, Yang Yang, Ran Xiaokai

On Wed, Mar 30, 2022 at 12:17:09PM -0400, Johannes Weiner wrote:
> It's add_to_page_cache_lru() that sets the flag.
> 
> Basically, when a PageWorkingset (hot) page gets reclaimed, the bit is
> stored in the vacated tree slot. When the entry is brought back in,
> add_to_page_cache_lru() transfers it to the newly allocated page.

Ok.  In this case my patch didn't quite do the right thing for readahead
either.  But that does leave a question for the btrfs compressed
case, which only adds extra pages to a read to readahad a bigger
cluster size - that is these pages are not read at the request of the
VM.  Does it really make sense to do PSI accounting for them in that
case?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
  2022-03-31  5:15                     ` Christoph Hellwig
@ 2022-03-31 19:07                       ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2022-03-31 19:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: CGEL, axboe, viro, linux-block, linux-kernel, linux-fsdevel,
	akpm, Yang Yang, Ran Xiaokai

On Wed, Mar 30, 2022 at 10:15:32PM -0700, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 12:17:09PM -0400, Johannes Weiner wrote:
> > It's add_to_page_cache_lru() that sets the flag.
> > 
> > Basically, when a PageWorkingset (hot) page gets reclaimed, the bit is
> > stored in the vacated tree slot. When the entry is brought back in,
> > add_to_page_cache_lru() transfers it to the newly allocated page.
> 
> Ok.  In this case my patch didn't quite do the right thing for readahead
> either.  But that does leave a question for the btrfs compressed
> case, which only adds extra pages to a read to readahad a bigger
> cluster size - that is these pages are not read at the request of the
> VM.  Does it really make sense to do PSI accounting for them in that
> case?

I think it does.

I suppose it's an argument about readahead pages in general, which
technically the workload itself doesn't commission explicitly. But
those pages are still triggered by a nearby access, their reads
contribute to device utilization, and if they're PageWorkingset it
means they're only being read because there is a lack of memory.

In a perfect world, readahead would stop when memory or IO are
contended. But it doesn't, and the stalls it can inject into the
workload are as real as stalls from directly requested reads.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-03-31 19:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-16  6:39 [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages cgel.zte
2022-03-16  8:48 ` Christoph Hellwig
2022-03-17  2:18   ` CGEL
2022-03-21 14:33 ` Johannes Weiner
2022-03-22  2:47   ` CGEL
2022-03-22 13:27     ` Johannes Weiner
2022-03-23  6:10       ` CGEL
     [not found]       ` <20220323061058.GA2343452@cgel.zte@gmail.com>
2022-03-30  8:34         ` CGEL
2022-03-30 13:00           ` Johannes Weiner
2022-03-30 13:04             ` Christoph Hellwig
2022-03-30 13:49               ` Jens Axboe
2022-03-30 15:45               ` Johannes Weiner
2022-03-30 15:54                 ` Christoph Hellwig
2022-03-30 16:17                   ` Johannes Weiner
2022-03-31  5:15                     ` Christoph Hellwig
2022-03-31 19:07                       ` Johannes Weiner
2022-03-31  2:21             ` CGEL
2022-03-22  3:44   ` CGEL

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.