linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/vmscan: don't try to reclaim freed folios
@ 2022-05-27  8:04 Miaohe Lin
  2022-05-27 15:02 ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Miaohe Lin @ 2022-05-27  8:04 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, linmiaohe

If folios were freed from under us, there's no need to reclaim them. Skip
these folios to save lots of cpu cycles and avoid possible unnecessary
disk IO.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/vmscan.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f7d9a683e3a7..646dd1efad32 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		folio = lru_to_folio(page_list);
 		list_del(&folio->lru);
 
+		nr_pages = folio_nr_pages(folio);
+		if (folio_ref_count(folio) == 1) {
+			/* folio was freed from under us. So we are done. */
+			WARN_ON(!folio_put_testzero(folio));
+			goto free_it;
+		}
+
 		if (!folio_trylock(folio))
 			goto keep;
 
 		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
 
-		nr_pages = folio_nr_pages(folio);
 
 		/* Account the number of base pages */
 		sc->nr_scanned += nr_pages;
-- 
2.23.0



* Re: [PATCH] mm/vmscan: don't try to reclaim freed folios
  2022-05-27  8:04 [PATCH] mm/vmscan: don't try to reclaim freed folios Miaohe Lin
@ 2022-05-27 15:02 ` Matthew Wilcox
  2022-05-28  2:52   ` Miaohe Lin
  2022-06-08 14:09   ` Miaohe Lin
  0 siblings, 2 replies; 6+ messages in thread
From: Matthew Wilcox @ 2022-05-27 15:02 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, linux-mm, linux-kernel

On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote:
> If folios were freed from under us, there's no need to reclaim them. Skip
> these folios to save lots of cpu cycles and avoid possible unnecessary
> disk IO.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/vmscan.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f7d9a683e3a7..646dd1efad32 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  		folio = lru_to_folio(page_list);
>  		list_del(&folio->lru);
>  
> +		nr_pages = folio_nr_pages(folio);
> +		if (folio_ref_count(folio) == 1) {
> +			/* folio was freed from under us. So we are done. */
> +			WARN_ON(!folio_put_testzero(folio));

What?  No.  This can absolutely happen.  We have a refcount on the folio,
which means that any other thread can temporarily raise the refcount,
so this WARN_ON can trigger.  Also, we don't hold the folio locked,
or an extra reference, so nr_pages is unstable because it can be split.

> +			goto free_it;
> +		}
> +
>  		if (!folio_trylock(folio))
>  			goto keep;
>  
>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>  
> -		nr_pages = folio_nr_pages(folio);
>  
>  		/* Account the number of base pages */
>  		sc->nr_scanned += nr_pages;
> -- 
> 2.23.0
> 
> 


* Re: [PATCH] mm/vmscan: don't try to reclaim freed folios
  2022-05-27 15:02 ` Matthew Wilcox
@ 2022-05-28  2:52   ` Miaohe Lin
  2022-05-28  3:13     ` Matthew Wilcox
  2022-06-08 14:09   ` Miaohe Lin
  1 sibling, 1 reply; 6+ messages in thread
From: Miaohe Lin @ 2022-05-28  2:52 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: akpm, linux-mm, linux-kernel

On 2022/5/27 23:02, Matthew Wilcox wrote:
> On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote:
>> If folios were freed from under us, there's no need to reclaim them. Skip
>> these folios to save lots of cpu cycles and avoid possible unnecessary
>> disk IO.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/vmscan.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f7d9a683e3a7..646dd1efad32 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>>  		folio = lru_to_folio(page_list);
>>  		list_del(&folio->lru);
>>  
>> +		nr_pages = folio_nr_pages(folio);
>> +		if (folio_ref_count(folio) == 1) {
>> +			/* folio was freed from under us. So we are done. */
>> +			WARN_ON(!folio_put_testzero(folio));
> 
> What?  No.  This can absolutely happen.  We have a refcount on the folio,
> which means that any other thread can temporarily raise the refcount,

IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, page tables or
otherwise in use. So there should be no way that any other thread can temporarily raise the refcount when
folio_ref_count == 1. Or am I missing something?

> so this WARN_ON can trigger.  Also, we don't hold the folio locked,
> or an extra reference, so nr_pages is unstable because it can be split.

Yes, you're right. When folio_ref_count != 1, nr_pages is unstable. Will fix it if a v2 is needed. :)

Thanks a lot for review and comment!

> 
>> +			goto free_it;
>> +		}
>> +
>>  		if (!folio_trylock(folio))
>>  			goto keep;
>>  
>>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>>  
>> -		nr_pages = folio_nr_pages(folio);
>>  
>>  		/* Account the number of base pages */
>>  		sc->nr_scanned += nr_pages;
>> -- 
>> 2.23.0
>>
>>
> 
> 



* Re: [PATCH] mm/vmscan: don't try to reclaim freed folios
  2022-05-28  2:52   ` Miaohe Lin
@ 2022-05-28  3:13     ` Matthew Wilcox
  2022-05-28  6:24       ` Miaohe Lin
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2022-05-28  3:13 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, linux-mm, linux-kernel

On Sat, May 28, 2022 at 10:52:11AM +0800, Miaohe Lin wrote:
> On 2022/5/27 23:02, Matthew Wilcox wrote:
> > What?  No.  This can absolutely happen.  We have a refcount on the folio,
> > which means that any other thread can temporarily raise the refcount,
> 
> IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, page tables or
> otherwise in use. So there should be no way that any other thread can temporarily raise the refcount when
> folio_ref_count == 1. Or am I missing something?

Take a look at something like GUP (fast).  If this page _was_ mapped to
userspace, something like this can happen:

Thread A	Thread B
load PTE
		unmap page
		refcount goes to 1
		vmscan sees the page
try_get_ref
		refcount is now 2.  WARN_ON.

Thread A will see that the PTE has changed and will now drop its
reference, but Thread B already spat out the WARN.

A similar thing can happen with the page cache.

If this is a worthwhile optimisation (does it happen often that we find
a refcount == 1 page?), we could do something like ...

		if (folio_ref_freeze(folio, 1)) {
			nr_pages = folio_nr_pages(folio);
			goto free_it;
		}

... or ...

		if (folio_ref_count(folio) == 1 &&
		    folio_ref_freeze(folio, 1)) {

... if we want to test-and-test-and-clear

But this function is far too complicated already.  I really want to
see numbers that prove the extra complexity is worth it.



* Re: [PATCH] mm/vmscan: don't try to reclaim freed folios
  2022-05-28  3:13     ` Matthew Wilcox
@ 2022-05-28  6:24       ` Miaohe Lin
  0 siblings, 0 replies; 6+ messages in thread
From: Miaohe Lin @ 2022-05-28  6:24 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: akpm, linux-mm, linux-kernel

On 2022/5/28 11:13, Matthew Wilcox wrote:
> On Sat, May 28, 2022 at 10:52:11AM +0800, Miaohe Lin wrote:
>> On 2022/5/27 23:02, Matthew Wilcox wrote:
>>> What?  No.  This can absolutely happen.  We have a refcount on the folio,
>>> which means that any other thread can temporarily raise the refcount,
>>
>> IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, page tables or
>> otherwise in use. So there should be no way that any other thread can temporarily raise the refcount when
>> folio_ref_count == 1. Or am I missing something?
> 
> Take a look at something like GUP (fast).  If this page _was_ mapped to
> userspace, something like this can happen:
> 
> Thread A	Thread B
> load PTE
> 		unmap page
> 		refcount goes to 1
> 		vmscan sees the page
> try_get_ref
> 		refcount is now 2.  WARN_ON.
> 
> Thread A will see that the PTE has changed and will now drop its
> reference, but Thread B already spat out the WARN.
> 
> A similar thing can happen with the page cache.

Oh, I see. Many thanks for your patient explanation! :)

> 
> If this is a worthwhile optimisation (does it happen often that we find
> a refcount == 1 page?), we could do something like ...

No, it should be rare.

> 
> 		if (folio_ref_freeze(folio, 1)) {
> 			nr_pages = folio_nr_pages(folio);
> 			goto free_it;
> 		}
> 
> ... or ...
> 
> 		if (folio_ref_count(folio) == 1 &&
> 		    folio_ref_freeze(folio, 1)) {
> 
> ... if we want to test-and-test-and-clear

These proposed code changes look good to me.

> 
> But this function is far too complicated already.  I really want to
> see numbers that prove the extra complexity is worth it.

This optimization can save lots of CPU cycles and avoid possible disk I/O in
that specific case. But that case is somewhat rare, so there are no numbers
proving the extra complexity is worth it.

Should I drop this patch or proceed with the proposed code changes above in
next version? :)

Many thanks!

> 
> 
> 



* Re: [PATCH] mm/vmscan: don't try to reclaim freed folios
  2022-05-27 15:02 ` Matthew Wilcox
  2022-05-28  2:52   ` Miaohe Lin
@ 2022-06-08 14:09   ` Miaohe Lin
  1 sibling, 0 replies; 6+ messages in thread
From: Miaohe Lin @ 2022-06-08 14:09 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: akpm, linux-mm, linux-kernel

On 2022/5/27 23:02, Matthew Wilcox wrote:
> On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote:
>> If folios were freed from under us, there's no need to reclaim them. Skip
>> these folios to save lots of cpu cycles and avoid possible unnecessary
>> disk IO.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/vmscan.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f7d9a683e3a7..646dd1efad32 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>>  		folio = lru_to_folio(page_list);
>>  		list_del(&folio->lru);
>>  
>> +		nr_pages = folio_nr_pages(folio);
>> +		if (folio_ref_count(folio) == 1) {
>> +			/* folio was freed from under us. So we are done. */
>> +			WARN_ON(!folio_put_testzero(folio));
> 
> What?  No.  This can absolutely happen.  We have a refcount on the folio,
> which means that any other thread can temporarily raise the refcount,
> so this WARN_ON can trigger.  Also, we don't hold the folio locked,
> or an extra reference, so nr_pages is unstable because it can be split.

When I reread the code, I found that the caller holds an extra reference to the
folio when calling isolate_lru_pages(), so the folio can't be split and thus
nr_pages should in fact be stable. Or am I missing something again?

Thanks!

> 
>> +			goto free_it;
>> +		}
>> +
>>  		if (!folio_trylock(folio))
>>  			goto keep;
>>  
>>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>>  
>> -		nr_pages = folio_nr_pages(folio);
>>  
>>  		/* Account the number of base pages */
>>  		sc->nr_scanned += nr_pages;
>> -- 
>> 2.23.0
>>
>>
> 
> 


