Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
@ 2019-11-01 13:50 zhong jiang
  2019-11-01 14:46 ` Johannes Weiner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: zhong jiang @ 2019-11-01 13:50 UTC (permalink / raw)
  To: akpm, minchan; +Cc: mhocko, hannes, linux-mm, zhongjiang

Recently, I hit the following issue when running in the upstream.

kernel BUG at mm/vmscan.c:1521!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
Code: de f5 ff ff e8 ab 79 eb ff 4c 89 f7 e8 43 33 0d 00 e9 cc f5 ff ff e8 99 79 eb ff 48 c7 c6 a0 34 2b a0 4c 89 f7 e8 1a 4d 05 00 <0f> 0b e8 83 79 eb ff 48 89 d8 48 c1 e8 03 42 80 3c 38 00 0f 85 74
RSP: 0018:ffff88819a3df5a0 EFLAGS: 00010286
RAX: 0000000000040000 RBX: ffffea00061c3980 RCX: ffffffff814fba36
RDX: 00000000000056f7 RSI: ffffc9000c02c000 RDI: ffff8881f70268cc
RBP: ffff88819a3df898 R08: ffffed103ee05de0 R09: ffffed103ee05de0
R10: 0000000000000001 R11: ffffed103ee05ddf R12: ffff88819a3df6f0
R13: ffff88819a3df6f0 R14: ffffea00061c3980 R15: dffffc0000000000
FS:  00007f21b9d8e700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2d621000 CR3: 00000001c8c46004 CR4: 00000000007606f0
DR0: 0000000020000140 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
 reclaim_pages+0x499/0x800 mm/vmscan.c:2188
 madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
 walk_pmd_range mm/pagewalk.c:53 [inline]
 walk_pud_range mm/pagewalk.c:112 [inline]
 walk_p4d_range mm/pagewalk.c:139 [inline]
 walk_pgd_range mm/pagewalk.c:166 [inline]
 __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
 walk_page_range+0x179/0x310 mm/pagewalk.c:349
 madvise_pageout_page_range mm/madvise.c:506 [inline]
 madvise_pageout+0x1f0/0x330 mm/madvise.c:542
 madvise_vma mm/madvise.c:931 [inline]
 __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
 do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

madvise_pageout access the specified range of the vma and isolate
them, then run shrink_page_list to reclaim its memory. But It also
isolate the unevictable page to reclaim. Hence, we can catch the
cases in shrink_page_list.

The root cause is that we scan the page tables instead of specific
LRU list. and so we need to filter out the unevictable lru pages
from our end.

Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
Cc: stable@vger.kernel.org
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
v2 -> v3:
    Modified the mail address to hannes@cmpxchg.org

v1 -> v2:
    Remove BUG_ON is not an good solutions. Becuase it indeed can
    see !page_evictable() in shrink_page_list due to some race.  

 mm/madvise.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 99dd06f..63e1308 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -363,8 +363,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		ClearPageReferenced(page);
 		test_and_clear_page_young(page);
 		if (pageout) {
-			if (!isolate_lru_page(page))
-				list_add(&page->lru, &page_list);
+			if (!isolate_lru_page(page)) {
+				if (PageUnevictable(page))
+					putback_lru_page(page);
+				else
+					list_add(&page->lru, &page_list);
+			}
 		} else
 			deactivate_page(page);
 huge_unlock:
@@ -441,8 +445,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		ClearPageReferenced(page);
 		test_and_clear_page_young(page);
 		if (pageout) {
-			if (!isolate_lru_page(page))
-				list_add(&page->lru, &page_list);
+			if (!isolate_lru_page(page)) {
+				if (PageUnevictable(page))
+					putback_lru_page(page);
+				else
+					list_add(&page->lru, &page_list);
+			}
 		} else
 			deactivate_page(page);
 	}
-- 
1.7.12.4



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-01 13:50 [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout zhong jiang
@ 2019-11-01 14:46 ` Johannes Weiner
  2019-11-01 15:08 ` Minchan Kim
  2019-11-01 18:32 ` Michal Hocko
  2 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2019-11-01 14:46 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, minchan, mhocko, linux-mm

On Fri, Nov 01, 2019 at 09:50:45PM +0800, zhong jiang wrote:
> Recently, I hit the following issue when running in the upstream.
> 
> kernel BUG at mm/vmscan.c:1521!
> invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
> Code: de f5 ff ff e8 ab 79 eb ff 4c 89 f7 e8 43 33 0d 00 e9 cc f5 ff ff e8 99 79 eb ff 48 c7 c6 a0 34 2b a0 4c 89 f7 e8 1a 4d 05 00 <0f> 0b e8 83 79 eb ff 48 89 d8 48 c1 e8 03 42 80 3c 38 00 0f 85 74
> RSP: 0018:ffff88819a3df5a0 EFLAGS: 00010286
> RAX: 0000000000040000 RBX: ffffea00061c3980 RCX: ffffffff814fba36
> RDX: 00000000000056f7 RSI: ffffc9000c02c000 RDI: ffff8881f70268cc
> RBP: ffff88819a3df898 R08: ffffed103ee05de0 R09: ffffed103ee05de0
> R10: 0000000000000001 R11: ffffed103ee05ddf R12: ffff88819a3df6f0
> R13: ffff88819a3df6f0 R14: ffffea00061c3980 R15: dffffc0000000000
> FS:  00007f21b9d8e700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2d621000 CR3: 00000001c8c46004 CR4: 00000000007606f0
> DR0: 0000000020000140 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> PKRU: 55555554
> Call Trace:
>  reclaim_pages+0x499/0x800 mm/vmscan.c:2188
>  madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
>  walk_pmd_range mm/pagewalk.c:53 [inline]
>  walk_pud_range mm/pagewalk.c:112 [inline]
>  walk_p4d_range mm/pagewalk.c:139 [inline]
>  walk_pgd_range mm/pagewalk.c:166 [inline]
>  __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
>  walk_page_range+0x179/0x310 mm/pagewalk.c:349
>  madvise_pageout_page_range mm/madvise.c:506 [inline]
>  madvise_pageout+0x1f0/0x330 mm/madvise.c:542
>  madvise_vma mm/madvise.c:931 [inline]
>  __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
>  do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> madvise_pageout access the specified range of the vma and isolate
> them, then run shrink_page_list to reclaim its memory. But It also
> isolate the unevictable page to reclaim. Hence, we can catch the
> cases in shrink_page_list.
> 
> The root cause is that we scan the page tables instead of specific
> LRU list. and so we need to filter out the unevictable lru pages
> from our end.
> 
> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> Cc: stable@vger.kernel.org
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-01 13:50 [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout zhong jiang
  2019-11-01 14:46 ` Johannes Weiner
@ 2019-11-01 15:08 ` Minchan Kim
  2019-11-01 18:32 ` Michal Hocko
  2 siblings, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2019-11-01 15:08 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, mhocko, hannes, linux-mm

On Fri, Nov 01, 2019 at 09:50:45PM +0800, zhong jiang wrote:
> Recently, I hit the following issue when running in the upstream.
> 
> kernel BUG at mm/vmscan.c:1521!
> invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
> Code: de f5 ff ff e8 ab 79 eb ff 4c 89 f7 e8 43 33 0d 00 e9 cc f5 ff ff e8 99 79 eb ff 48 c7 c6 a0 34 2b a0 4c 89 f7 e8 1a 4d 05 00 <0f> 0b e8 83 79 eb ff 48 89 d8 48 c1 e8 03 42 80 3c 38 00 0f 85 74
> RSP: 0018:ffff88819a3df5a0 EFLAGS: 00010286
> RAX: 0000000000040000 RBX: ffffea00061c3980 RCX: ffffffff814fba36
> RDX: 00000000000056f7 RSI: ffffc9000c02c000 RDI: ffff8881f70268cc
> RBP: ffff88819a3df898 R08: ffffed103ee05de0 R09: ffffed103ee05de0
> R10: 0000000000000001 R11: ffffed103ee05ddf R12: ffff88819a3df6f0
> R13: ffff88819a3df6f0 R14: ffffea00061c3980 R15: dffffc0000000000
> FS:  00007f21b9d8e700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2d621000 CR3: 00000001c8c46004 CR4: 00000000007606f0
> DR0: 0000000020000140 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> PKRU: 55555554
> Call Trace:
>  reclaim_pages+0x499/0x800 mm/vmscan.c:2188
>  madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
>  walk_pmd_range mm/pagewalk.c:53 [inline]
>  walk_pud_range mm/pagewalk.c:112 [inline]
>  walk_p4d_range mm/pagewalk.c:139 [inline]
>  walk_pgd_range mm/pagewalk.c:166 [inline]
>  __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
>  walk_page_range+0x179/0x310 mm/pagewalk.c:349
>  madvise_pageout_page_range mm/madvise.c:506 [inline]
>  madvise_pageout+0x1f0/0x330 mm/madvise.c:542
>  madvise_vma mm/madvise.c:931 [inline]
>  __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
>  do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> madvise_pageout access the specified range of the vma and isolate
> them, then run shrink_page_list to reclaim its memory. But It also
> isolate the unevictable page to reclaim. Hence, we can catch the
> cases in shrink_page_list.
> 
> The root cause is that we scan the page tables instead of specific
> LRU list. and so we need to filter out the unevictable lru pages
> from our end.
> 
> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> Cc: stable@vger.kernel.org
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>

Thanks for the fix.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-01 13:50 [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout zhong jiang
  2019-11-01 14:46 ` Johannes Weiner
  2019-11-01 15:08 ` Minchan Kim
@ 2019-11-01 18:32 ` Michal Hocko
  2019-11-02  7:36   ` zhong jiang
  2 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-11-01 18:32 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, minchan, hannes, linux-mm

On Fri 01-11-19 21:50:45, zhong jiang wrote:
> Recently, I hit the following issue when running in the upstream.
> 
> kernel BUG at mm/vmscan.c:1521!
> invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
> Code: de f5 ff ff e8 ab 79 eb ff 4c 89 f7 e8 43 33 0d 00 e9 cc f5 ff ff e8 99 79 eb ff 48 c7 c6 a0 34 2b a0 4c 89 f7 e8 1a 4d 05 00 <0f> 0b e8 83 79 eb ff 48 89 d8 48 c1 e8 03 42 80 3c 38 00 0f 85 74
> RSP: 0018:ffff88819a3df5a0 EFLAGS: 00010286
> RAX: 0000000000040000 RBX: ffffea00061c3980 RCX: ffffffff814fba36
> RDX: 00000000000056f7 RSI: ffffc9000c02c000 RDI: ffff8881f70268cc
> RBP: ffff88819a3df898 R08: ffffed103ee05de0 R09: ffffed103ee05de0
> R10: 0000000000000001 R11: ffffed103ee05ddf R12: ffff88819a3df6f0
> R13: ffff88819a3df6f0 R14: ffffea00061c3980 R15: dffffc0000000000
> FS:  00007f21b9d8e700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2d621000 CR3: 00000001c8c46004 CR4: 00000000007606f0
> DR0: 0000000020000140 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> PKRU: 55555554
> Call Trace:
>  reclaim_pages+0x499/0x800 mm/vmscan.c:2188
>  madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
>  walk_pmd_range mm/pagewalk.c:53 [inline]
>  walk_pud_range mm/pagewalk.c:112 [inline]
>  walk_p4d_range mm/pagewalk.c:139 [inline]
>  walk_pgd_range mm/pagewalk.c:166 [inline]
>  __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
>  walk_page_range+0x179/0x310 mm/pagewalk.c:349
>  madvise_pageout_page_range mm/madvise.c:506 [inline]
>  madvise_pageout+0x1f0/0x330 mm/madvise.c:542
>  madvise_vma mm/madvise.c:931 [inline]
>  __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
>  do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> madvise_pageout access the specified range of the vma and isolate
> them, then run shrink_page_list to reclaim its memory. But It also
> isolate the unevictable page to reclaim. Hence, we can catch the
> cases in shrink_page_list.
> 
> The root cause is that we scan the page tables instead of specific
> LRU list. and so we need to filter out the unevictable lru pages
> from our end.
> 
> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> Cc: stable@vger.kernel.org
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>

Acked-by: Michal Hocko <mhocko@suse.com>

But I would really appreciate to add a comment for the BUG_ON and
explain why do we care about PageUnevictable so much when there is an
explicit page_evictable check in the reclaim path. In other words a
short summary of what Johannes explained in
http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
separate patch. Care to send one or should I send it?

> ---
> v2 -> v3:
>     Modified the mail address to hannes@cmpxchg.org
> 
> v1 -> v2:
>     Remove BUG_ON is not an good solutions. Becuase it indeed can
>     see !page_evictable() in shrink_page_list due to some race.  
> 
>  mm/madvise.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 99dd06f..63e1308 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -363,8 +363,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		ClearPageReferenced(page);
>  		test_and_clear_page_young(page);
>  		if (pageout) {
> -			if (!isolate_lru_page(page))
> -				list_add(&page->lru, &page_list);
> +			if (!isolate_lru_page(page)) {
> +				if (PageUnevictable(page))
> +					putback_lru_page(page);
> +				else
> +					list_add(&page->lru, &page_list);
> +			}
>  		} else
>  			deactivate_page(page);
>  huge_unlock:
> @@ -441,8 +445,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		ClearPageReferenced(page);
>  		test_and_clear_page_young(page);
>  		if (pageout) {
> -			if (!isolate_lru_page(page))
> -				list_add(&page->lru, &page_list);
> +			if (!isolate_lru_page(page)) {
> +				if (PageUnevictable(page))
> +					putback_lru_page(page);
> +				else
> +					list_add(&page->lru, &page_list);
> +			}
>  		} else
>  			deactivate_page(page);
>  	}
> -- 
> 1.7.12.4

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-01 18:32 ` Michal Hocko
@ 2019-11-02  7:36   ` zhong jiang
  2019-11-05  6:33     ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: zhong jiang @ 2019-11-02  7:36 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, minchan, hannes, linux-mm

On 2019/11/2 2:32, Michal Hocko wrote:
> On Fri 01-11-19 21:50:45, zhong jiang wrote:
>> Recently, I hit the following issue when running in the upstream.
>>
>> kernel BUG at mm/vmscan.c:1521!
>> invalid opcode: 0000 [#1] SMP KASAN PTI
>> CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
>> RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
>> Code: de f5 ff ff e8 ab 79 eb ff 4c 89 f7 e8 43 33 0d 00 e9 cc f5 ff ff e8 99 79 eb ff 48 c7 c6 a0 34 2b a0 4c 89 f7 e8 1a 4d 05 00 <0f> 0b e8 83 79 eb ff 48 89 d8 48 c1 e8 03 42 80 3c 38 00 0f 85 74
>> RSP: 0018:ffff88819a3df5a0 EFLAGS: 00010286
>> RAX: 0000000000040000 RBX: ffffea00061c3980 RCX: ffffffff814fba36
>> RDX: 00000000000056f7 RSI: ffffc9000c02c000 RDI: ffff8881f70268cc
>> RBP: ffff88819a3df898 R08: ffffed103ee05de0 R09: ffffed103ee05de0
>> R10: 0000000000000001 R11: ffffed103ee05ddf R12: ffff88819a3df6f0
>> R13: ffff88819a3df6f0 R14: ffffea00061c3980 R15: dffffc0000000000
>> FS:  00007f21b9d8e700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000001b2d621000 CR3: 00000001c8c46004 CR4: 00000000007606f0
>> DR0: 0000000020000140 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>> PKRU: 55555554
>> Call Trace:
>>  reclaim_pages+0x499/0x800 mm/vmscan.c:2188
>>  madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
>>  walk_pmd_range mm/pagewalk.c:53 [inline]
>>  walk_pud_range mm/pagewalk.c:112 [inline]
>>  walk_p4d_range mm/pagewalk.c:139 [inline]
>>  walk_pgd_range mm/pagewalk.c:166 [inline]
>>  __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
>>  walk_page_range+0x179/0x310 mm/pagewalk.c:349
>>  madvise_pageout_page_range mm/madvise.c:506 [inline]
>>  madvise_pageout+0x1f0/0x330 mm/madvise.c:542
>>  madvise_vma mm/madvise.c:931 [inline]
>>  __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
>>  do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> madvise_pageout access the specified range of the vma and isolate
>> them, then run shrink_page_list to reclaim its memory. But It also
>> isolate the unevictable page to reclaim. Hence, we can catch the
>> cases in shrink_page_list.
>>
>> The root cause is that we scan the page tables instead of specific
>> LRU list. and so we need to filter out the unevictable lru pages
>> from our end.
>>
>> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
>> Cc: stable@vger.kernel.org
>> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> But I would really appreciate to add a comment for the BUG_ON and
> explain why do we care about PageUnevictable so much when there is an
> explicit page_evictable check in the reclaim path. In other words a
> short summary of what Johannes explained in
> http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
> separate patch. Care to send one or should I send it?
Hi,  Michal

Actually,  I am not very clear about the words Johannes had said.   How the race to
tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.

Could you elaborate the race in detail further ?

Thanks,
zhong jiang
>> ---
>> v2 -> v3:
>>     Modified the mail address to hannes@cmpxchg.org
>>
>> v1 -> v2:
>>     Remove BUG_ON is not an good solutions. Becuase it indeed can
>>     see !page_evictable() in shrink_page_list due to some race.  
>>
>>  mm/madvise.c | 16 ++++++++++++----
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 99dd06f..63e1308 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -363,8 +363,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>>  		ClearPageReferenced(page);
>>  		test_and_clear_page_young(page);
>>  		if (pageout) {
>> -			if (!isolate_lru_page(page))
>> -				list_add(&page->lru, &page_list);
>> +			if (!isolate_lru_page(page)) {
>> +				if (PageUnevictable(page))
>> +					putback_lru_page(page);
>> +				else
>> +					list_add(&page->lru, &page_list);
>> +			}
>>  		} else
>>  			deactivate_page(page);
>>  huge_unlock:
>> @@ -441,8 +445,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>>  		ClearPageReferenced(page);
>>  		test_and_clear_page_young(page);
>>  		if (pageout) {
>> -			if (!isolate_lru_page(page))
>> -				list_add(&page->lru, &page_list);
>> +			if (!isolate_lru_page(page)) {
>> +				if (PageUnevictable(page))
>> +					putback_lru_page(page);
>> +				else
>> +					list_add(&page->lru, &page_list);
>> +			}
>>  		} else
>>  			deactivate_page(page);
>>  	}
>> -- 
>> 1.7.12.4




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-02  7:36   ` zhong jiang
@ 2019-11-05  6:33     ` Michal Hocko
  2019-11-05 12:28       ` zhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-11-05  6:33 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, minchan, hannes, linux-mm

On Sat 02-11-19 15:36:55, zhong jiang wrote:
> On 2019/11/2 2:32, Michal Hocko wrote:
[...]
> > But I would really appreciate to add a comment for the BUG_ON and
> > explain why do we care about PageUnevictable so much when there is an
> > explicit page_evictable check in the reclaim path. In other words a
> > short summary of what Johannes explained in
> > http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
> > separate patch. Care to send one or should I send it?
> Hi,  Michal
> 
> Actually,  I am not very clear about the words Johannes had said.   How the race to
> tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.
> 
> Could you elaborate the race in detail further ?

I would go with the following comment

	/*
	 * Page reclaim can see !page_evictable(), but it must not see pages that
	 * have the PageUnevictable lru bit already set. See  __pagevec_lru_add_fn()
	 * for more details.
	 */
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-05  6:33     ` Michal Hocko
@ 2019-11-05 12:28       ` zhong jiang
  2019-11-05 12:45         ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: zhong jiang @ 2019-11-05 12:28 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, minchan, hannes, linux-mm

On 2019/11/5 14:33, Michal Hocko wrote:
> On Sat 02-11-19 15:36:55, zhong jiang wrote:
>> On 2019/11/2 2:32, Michal Hocko wrote:
> [...]
>>> But I would really appreciate to add a comment for the BUG_ON and
>>> explain why do we care about PageUnevictable so much when there is an
>>> explicit page_evictable check in the reclaim path. In other words a
>>> short summary of what Johannes explained in
>>> http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
>>> separate patch. Care to send one or should I send it?
>> Hi,  Michal
>>
>> Actually,  I am not very clear about the words Johannes had said.   How the race to
>> tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.
>>
>> Could you elaborate the race in detail further ?
> I would go with the following comment
>
> 	/*
> 	 * Page reclaim can see !page_evictable(), but it must not see pages that
> 	 * have the PageUnevictable lru bit already set. See  __pagevec_lru_add_fn()
> 	 * for more details.
> 	 */
But  the detail still confuses me in __pagevec_lru_add_fn()  to  see PageMlocked  in vmscan :-\ .
 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-05 12:28       ` zhong jiang
@ 2019-11-05 12:45         ` Michal Hocko
  2019-11-05 14:10           ` zhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-11-05 12:45 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, minchan, hannes, linux-mm

On Tue 05-11-19 20:28:58, zhong jiang wrote:
> On 2019/11/5 14:33, Michal Hocko wrote:
> > On Sat 02-11-19 15:36:55, zhong jiang wrote:
> >> On 2019/11/2 2:32, Michal Hocko wrote:
> > [...]
> >>> But I would really appreciate to add a comment for the BUG_ON and
> >>> explain why do we care about PageUnevictable so much when there is an
> >>> explicit page_evictable check in the reclaim path. In other words a
> >>> short summary of what Johannes explained in
> >>> http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
> >>> separate patch. Care to send one or should I send it?
> >> Hi,  Michal
> >>
> >> Actually,  I am not very clear about the words Johannes had said.   How the race to
> >> tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.
> >>
> >> Could you elaborate the race in detail further ?
> > I would go with the following comment
> >
> > 	/*
> > 	 * Page reclaim can see !page_evictable(), but it must not see pages that
> > 	 * have the PageUnevictable lru bit already set. See  __pagevec_lru_add_fn()
> > 	 * for more details.
> > 	 */
> But  the detail still confuses me in __pagevec_lru_add_fn()  to  see PageMlocked  in vmscan :-\ .

Which part does confuse you exactly?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-05 12:45         ` Michal Hocko
@ 2019-11-05 14:10           ` zhong jiang
  2019-11-05 18:48             ` Johannes Weiner
  0 siblings, 1 reply; 11+ messages in thread
From: zhong jiang @ 2019-11-05 14:10 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, minchan, hannes, linux-mm

On 2019/11/5 20:45, Michal Hocko wrote:
> On Tue 05-11-19 20:28:58, zhong jiang wrote:
>> On 2019/11/5 14:33, Michal Hocko wrote:
>>> On Sat 02-11-19 15:36:55, zhong jiang wrote:
>>>> On 2019/11/2 2:32, Michal Hocko wrote:
>>> [...]
>>>>> But I would really appreciate to add a comment for the BUG_ON and
>>>>> explain why do we care about PageUnevictable so much when there is an
>>>>> explicit page_evictable check in the reclaim path. In other words a
>>>>> short summary of what Johannes explained in
>>>>> http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
>>>>> separate patch. Care to send one or should I send it?
>>>> Hi,  Michal
>>>>
>>>> Actually,  I am not very clear about the words Johannes had said.   How the race to
>>>> tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.
>>>>
>>>> Could you elaborate the race in detail further ?
>>> I would go with the following comment
>>>
>>> 	/*
>>> 	 * Page reclaim can see !page_evictable(), but it must not see pages that
>>> 	 * have the PageUnevictable lru bit already set. See  __pagevec_lru_add_fn()
>>> 	 * for more details.
>>> 	 */
>> But  the detail still confuses me in __pagevec_lru_add_fn()  to  see PageMlocked  in vmscan :-\ .
> Which part does confuse you exactly?
page reclaim can see !page_evictable()  means some race still exist in the kernel. Is there any race window .

PageLocked will prevent the cases in shmem ?



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-05 14:10           ` zhong jiang
@ 2019-11-05 18:48             ` Johannes Weiner
  2019-11-07 13:31               ` zhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2019-11-05 18:48 UTC (permalink / raw)
  To: zhong jiang; +Cc: Michal Hocko, akpm, minchan, linux-mm

On Tue, Nov 05, 2019 at 10:10:37PM +0800, zhong jiang wrote:
> On 2019/11/5 20:45, Michal Hocko wrote:
> > On Tue 05-11-19 20:28:58, zhong jiang wrote:
> >> On 2019/11/5 14:33, Michal Hocko wrote:
> >>> On Sat 02-11-19 15:36:55, zhong jiang wrote:
> >>>> On 2019/11/2 2:32, Michal Hocko wrote:
> >>> [...]
> >>>>> But I would really appreciate to add a comment for the BUG_ON and
> >>>>> explain why do we care about PageUnevictable so much when there is an
> >>>>> explicit page_evictable check in the reclaim path. In other words a
> >>>>> short summary of what Johannes explained in
> >>>>> http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
> >>>>> separate patch. Care to send one or should I send it?
> >>>> Hi,  Michal
> >>>>
> >>>> Actually,  I am not very clear about the words Johannes had said.   How the race to
> >>>> tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.
> >>>>
> >>>> Could you elaborate the race in detail further ?
> >>> I would go with the following comment
> >>>
> >>> 	/*
> >>> 	 * Page reclaim can see !page_evictable(), but it must not see pages that
> >>> 	 * have the PageUnevictable lru bit already set. See  __pagevec_lru_add_fn()
> >>> 	 * for more details.
> >>> 	 */
> >> But  the detail still confuses me in __pagevec_lru_add_fn()  to  see PageMlocked  in vmscan :-\ .
> > Which part does confuse you exactly?
> page reclaim can see !page_evictable()  means some race still exist in the kernel. Is there any race window .

Yes. mlock does this:

lock_page()
SetPageMlocked()
if (isolate_lru_page())
  putback_lru_page() // move to unevictable list
unlock_page()

and vmscan does this:

isolate_lru_pages()
for_each_page()
  if (!try_lock_page())
    continue
  if (!page_evictable())
    continue
putback_lru_pages()

It's possible that mlock locks the page and sets PG_mlocked, but
vmscan has the page already isolated and mlock cannot move it to the
unevictable list itself. In that case, vmscan will either fail to lock
the page or see !page_unevictable() and move the page on the
unevictable list on behalf of mlock.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
  2019-11-05 18:48             ` Johannes Weiner
@ 2019-11-07 13:31               ` zhong jiang
  0 siblings, 0 replies; 11+ messages in thread
From: zhong jiang @ 2019-11-07 13:31 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Michal Hocko, akpm, minchan, linux-mm

On 2019/11/6 2:48, Johannes Weiner wrote:
> On Tue, Nov 05, 2019 at 10:10:37PM +0800, zhong jiang wrote:
>> On 2019/11/5 20:45, Michal Hocko wrote:
>>> On Tue 05-11-19 20:28:58, zhong jiang wrote:
>>>> On 2019/11/5 14:33, Michal Hocko wrote:
>>>>> On Sat 02-11-19 15:36:55, zhong jiang wrote:
>>>>>> On 2019/11/2 2:32, Michal Hocko wrote:
>>>>> [...]
>>>>>>> But I would really appreciate to add a comment for the BUG_ON and
>>>>>>> explain why do we care about PageUnevictable so much when there is an
>>>>>>> explicit page_evictable check in the reclaim path. In other words a
>>>>>>> short summary of what Johannes explained in
>>>>>>> http://lkml.kernel.org/r/20191030193307.GA48128@cmpxchg.org. Maybe in a
>>>>>>> separate patch. Care to send one or should I send it?
>>>>>> Hi,  Michal
>>>>>>
>>>>>> Actually,  I am not very clear about the words Johannes had said.   How the race to
>>>>>> tirgger, it will result in an PgeMlocked page can be visible in shrink_page_list.
>>>>>>
>>>>>> Could you elaborate the race in detail further ?
>>>>> I would go with the following comment
>>>>>
>>>>> 	/*
>>>>> 	 * Page reclaim can see !page_evictable(), but it must not see pages that
>>>>> 	 * have the PageUnevictable lru bit already set. See  __pagevec_lru_add_fn()
>>>>> 	 * for more details.
>>>>> 	 */
>>>> But  the detail still confuses me in __pagevec_lru_add_fn()  to  see PageMlocked  in vmscan :-\ .
>>> Which part does confuse you exactly?
>> page reclaim can see !page_evictable()  means some race still exist in the kernel. Is there any race window .
> Yes. mlock does this:
>
> lock_page()
> SetPageMlocked()
> if (isolate_lru_page())
>   putback_lru_page() // move to unevictable list
> unlock_page()
>
> and vmscan does this:
>
> isolate_lru_pages()
> for_each_page()
>   if (!try_lock_page())
>     continue
>   if (!page_evictable())
>     continue
> putback_lru_pages()
>
> It's possible that mlock locks the page and sets PG_mlocked, but
> vmscan has the page already isolated and mlock cannot move it to the
> unevictable list itself. In that case, vmscan will either fail to lock
> the page or see !page_unevictable() and move the page on the
> unevictable list on behalf of mlock.
Thanks for you kindly clarification.  

Sincerely,
zhong jiang
> .
>




^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-01 13:50 [PATCH v3] mm: fix trying to reclaim unevictable lru page when calling madvise_pageout zhong jiang
2019-11-01 14:46 ` Johannes Weiner
2019-11-01 15:08 ` Minchan Kim
2019-11-01 18:32 ` Michal Hocko
2019-11-02  7:36   ` zhong jiang
2019-11-05  6:33     ` Michal Hocko
2019-11-05 12:28       ` zhong jiang
2019-11-05 12:45         ` Michal Hocko
2019-11-05 14:10           ` zhong jiang
2019-11-05 18:48             ` Johannes Weiner
2019-11-07 13:31               ` zhong jiang

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git