* (resend)WARNING: trying to isolate tail page in isolate_lru_page
@ 2022-08-25 14:40 韩天硕
  2022-08-25 16:50 ` Yu Zhao
  0 siblings, 1 reply; 19+ messages in thread
From: 韩天硕 @ 2022-08-25 14:40 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm


Hello,

My syzkaller instance reported the following issue on:

HEAD commit: 072e51356cd5a4a1c12c1020bc054c99b98333df Merge tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
git tree: upstream
kernel config: defconfig
compiler: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140
Code: c3 89 c6 e8 22 4f f2 ff 85 db 75 0d e8 a9 4d f2 ff 44 89 e0 5b 5d 41 5c c3 e8 9c 4d f2 ff 48 c7 c7 a0 be 6a 93 e8 a9 f5 69 01 <0f> 0b eb de 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 55 48 89 fd
loop3: detected capacity change from 0 to 16383
RSP: 0018:ffff88800844f8b8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffffc90000509000 RSI: ffff8880037997c0 RDI: ffffed1001089f09
RBP: ffffea000010b040 R08: ffffffff8117b3f8 R09: 0000000000000000
R10: 0000000000000005 R11: ffffed100d2c4ead R12: 00000000fffffff0
R13: ffff88800185aff0 R14: ffffea000010b048 R15: 0000000021000000
FS:  00007f8acbd46700(0000) GS:ffff888069600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2c821000 CR3: 0000000005028005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
nfs4: Unknown parameter 'vfat'
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 madvise_cold_or_pageout_pte_range+0x43b/0x8f0
 __walk_page_range+0xa48/0x1310
 walk_page_range+0x14b/0x280
 madvise_pageout+0x184/0x260
 madvise_vma_behavior+0x843/0x13f0
 do_madvise+0x310/0x5b0
 __x64_sys_madvise+0x5f/0x70
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8acc5d38bd
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f8acbd45bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f8acc6f2f60 RCX: 00007f8acc5d38bd
RDX: 0000000000000015 RSI: 0000000000004000 RDI: 0000000020ffc000
RBP: 00007f8acc6400a9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffec656fb0f R14: 00007ffec656fcb0 R15: 00007f8acbd45d80
 </TASK>
---[ end trace 0000000000000000 ]---

The bug was bisected to:

[1a4e58cce84ee88129d5d49c064bd2852b481357] mm: introduce MADV_PAGEOUT

The C reproducer is as follows:

#define _GNU_SOURCE 
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

uint64_t r[1] = {0xffffffffffffffff};

int main(void)
{
    // mmap(0x1ffff000, 0x1000, PROT_NONE, MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0)
    syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

    // mmap(0x20000000, 0x1000000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0)
    syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);

    // mmap(0x21000000, 0x1000, PROT_NONE, MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0)
    syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

    // fd = socket(AF_PACKET, SOCK_RAW, 0x300)
    intptr_t res = 0;
    res = syscall(__NR_socket, 0x11ul, 3ul, 0x300);
    if (res != -1)
        r[0] = res;

    // struct tpacket_req at 0x20000100:
    // tp_block_size, tp_block_nr, tp_frame_size, tp_frame_nr
    *(uint32_t*)0x20000100 = 0x10000;
    *(uint32_t*)0x20000104 = 3;
    *(uint32_t*)0x20000108 = 0x80;
    *(uint32_t*)0x2000010c = 0x600;
    // setsockopt(fd, SOL_PACKET, PACKET_RX_RING, 0x20000100, 0x10)
    syscall(__NR_setsockopt, r[0], 0x107, 5, 0x20000100ul, 0x10ul);

    // mmap(0x20ffd000, 0x30000, PROT_NONE, MAP_PRIVATE|MAP_FIXED, fd, 0)
    syscall(__NR_mmap, 0x20ffd000ul, 0x30000ul, 0ul, 0x12ul, r[0], 0ul);

    // madvise(0x20ffc000, 0x4000, MADV_PAGEOUT)
    syscall(__NR_madvise, 0x20ffc000ul, 0x4000ul, 0x15ul);
    return 0;
}

Compile the reproducer with:

gcc -static -o repro repro.c

My QEMU startup command line is:

qemu-system-x86_64 \
-s \
-m 2G \
-smp 4 \
-kernel arch/x86/boot/bzImage \
-append "console=ttyS0 root=/dev/sda rw earlyprintk=serial" \
-drive file=../fs/stretch.img,format=raw \
-nographic \
-enable-kvm \
-monitor /dev/null

The bug reproduces reliably with my experimental setup.

Regards,
Tianshuo


* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-25 14:40 (resend)WARNING: trying to isolate tail page in isolate_lru_page 韩天硕
@ 2022-08-25 16:50 ` Yu Zhao
  2022-08-25 18:23   ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhao @ 2022-08-25 16:50 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, Linux-MM, 韩天硕, mawupeng

On Thu, Aug 25, 2022 at 8:40 AM 韩天硕 <hantianshuo@iie.ac.cn> wrote:
>
> Hello:
>
>     My Syzkaller reported me the following issue on:
>
>
> HEAD commit: 072e51356cd5a4a1c12c1020bc054c99b98333df Merge tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
>
> git tree: upstream
>
> kernel config: defconfig
>
> compiler: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
>
>
> ------------[ cut here ]------------
> trying to isolate tail page
> WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
> Modules linked in:
> CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:isolate_lru_page+0x130/0x140
> Code: c3 89 c6 e8 22 4f f2 ff 85 db 75 0d e8 a9 4d f2 ff 44 89 e0 5b 5d 41 5c c3 e8 9c 4d f2 ff 48 c7 c7 a0 be 6a 93 e8 a9 f5 69 01 <0f> 0b eb de 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 54 55 48 89 fd
> loop3: detected capacity change from 0 to 16383
> RSP: 0018:ffff88800844f8b8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: ffffc90000509000 RSI: ffff8880037997c0 RDI: ffffed1001089f09
> RBP: ffffea000010b040 R08: ffffffff8117b3f8 R09: 0000000000000000
> R10: 0000000000000005 R11: ffffed100d2c4ead R12: 00000000fffffff0
> R13: ffff88800185aff0 R14: ffffea000010b048 R15: 0000000021000000
> FS:  00007f8acbd46700(0000) GS:ffff888069600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2c821000 CR3: 0000000005028005 CR4: 0000000000770ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> nfs4: Unknown parameter 'vfat'
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  madvise_cold_or_pageout_pte_range+0x43b/0x8f0
>  __walk_page_range+0xa48/0x1310
>  walk_page_range+0x14b/0x280
>  madvise_pageout+0x184/0x260
>  madvise_vma_behavior+0x843/0x13f0
>  do_madvise+0x310/0x5b0
>  __x64_sys_madvise+0x5f/0x70
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f8acc5d38bd
> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f8acbd45bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
> RAX: ffffffffffffffda RBX: 00007f8acc6f2f60 RCX: 00007f8acc5d38bd
> RDX: 0000000000000015 RSI: 0000000000004000 RDI: 0000000020ffc000
> RBP: 00007f8acc6400a9 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffec656fb0f R14: 00007ffec656fcb0 R15: 00007f8acbd45d80
>  </TASK>

The above is from 5.18. Another report from 5.10:
https://lore.kernel.org/r/d927a335-a70b-48d3-9645-1d33cc88bd9c@huawei.com/

We also hit it on 5.4, 5.10 and 5.15:
  trying to isolate tail page
  WARNING: CPU: 1 PID: 4608 at mm/vmscan.c:2096 isolate_lru_page+0xb4/0x527 mm/vmscan.c:2096
  Modules linked in:
  CPU: 1 PID: 4608 Comm: syz-executor.1 Not tainted 5.15.11-syzkaller-01208-g0faa8c9f9dc4 #0 9358db1842dece7c55a7996e65dc0b63d57f833c
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:isolate_lru_page+0xb4/0x527 mm/vmscan.c:2096
  Code: c7 a0 11 39 86 e8 91 75 8d 00 31 ff 89 c3 89 c6 e8 84 22 e2 ff 85 db 74 13 e8 62 21 e2 ff 48 c7 c7 60 74 d1 84 e8 d4 9a 1b 03 <0f> 0b e8 4f 21 e2 ff 48 89 ef e8 09 f7 fe ff 48 89 c2 48 89 c3 b8
  RSP: 0018:ffffc90007cbf960 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
  RDX: ffffc900081a1000 RSI: ffff88811d5dd400 RDI: fffff52000f97f1e
  RBP: ffffea000500c080 R08: ffffffff812600fc R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000005 R12: ffffc90007cbfc70
  R13: dffffc0000000000 R14: ffffea000500c088 R15: ffffea000500c000
  FS:  00007d1656db8700(0000) GS:ffff8881f6d00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2d422000 CR3: 0000000112d93000 CR4: 00000000003506e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   madvise_cold_or_pageout_pte_range+0x4c9/0x6c7 mm/madvise.c:462
   walk_pmd_range mm/pagewalk.c:128 [inline]
   walk_pud_range mm/pagewalk.c:205 [inline]
   walk_p4d_range mm/pagewalk.c:240 [inline]
   walk_pgd_range mm/pagewalk.c:277 [inline]
   __walk_page_range+0xae0/0xe54 mm/pagewalk.c:379
   walk_page_range+0x1eb/0x29e mm/pagewalk.c:475
   madvise_pageout_page_range mm/madvise.c:528 [inline]
   madvise_pageout+0x21a/0x343 mm/madvise.c:565
   madvise_vma mm/madvise.c:993 [inline]
   do_madvise+0xbac/0x16f4 mm/madvise.c:1202

Please take a look. Thanks.



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-25 16:50 ` Yu Zhao
@ 2022-08-25 18:23   ` Matthew Wilcox
  2022-08-25 18:37     ` Yu Zhao
  2022-08-25 18:40     ` Yang Shi
  0 siblings, 2 replies; 19+ messages in thread
From: Matthew Wilcox @ 2022-08-25 18:23 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Minchan Kim, Andrew Morton, Linux-MM, 韩天硕, mawupeng

On Thu, Aug 25, 2022 at 10:50:19AM -0600, Yu Zhao wrote:
> 
> The above is from 5.18. Another report from 5.10:
> https://lore.kernel.org/r/d927a335-a70b-48d3-9645-1d33cc88bd9c@huawei.com/
> 
> We also hit it on 5.4, 5.10 and 5.15:
>   trying to isolate tail page
>   WARNING: CPU: 1 PID: 4608 at mm/vmscan.c:2096
> isolate_lru_page+0xb4/0x527 mm/vmscan.c:2096
>   Modules linked in:

Looks like my analysis from yesterday was dropped:

: This all seems quite plausible.  The reproducer seems to (correct me
: if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
: seems to create compound pages and mmap them.  This isn't folio-related
: at all; I just moved the code that warns about it from mm/vmscan.c to
: folio-compat.c.
: 
: Looks like a long-standing bug in MADV_PAGEOUT to me.
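As a rough cross-check of this reading, the numbers in the reproducer can be worked through by hand. The sketch below is only arithmetic on the reproducer's constants. It assumes 4 KiB pages and that each ring block is backed by a single compound allocation, as the analysis above suggests; the macro and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Constants copied from the reproducer's setsockopt(), mmap() and madvise() calls. */
#define PAGE_SIZE_ASSUMED 0x1000ul      /* assumption: 4 KiB pages (x86-64) */
#define TP_BLOCK_SIZE     0x10000ul     /* tp_block_size */
#define TP_BLOCK_NR       3ul           /* tp_block_nr */
#define RING_START        0x20ffd000ul  /* address the socket is mmap()ed at */
#define MADV_START        0x20ffc000ul  /* start of the madvise() range */
#define MADV_LEN          0x4000ul      /* length of the madvise() range */

/* Number of base pages backing one ring block. */
static unsigned long pages_per_block(void)
{
    return TP_BLOCK_SIZE / PAGE_SIZE_ASSUMED;
}

/* Total size of the ring in bytes; matches the mmap() length 0x30000. */
static unsigned long ring_bytes(void)
{
    return TP_BLOCK_SIZE * TP_BLOCK_NR;
}

/*
 * Index, within its block, of the page mapped at addr.
 * Under the compound-allocation assumption, index 0 is the head page
 * and every other index is a tail page.
 */
static unsigned long page_index_in_block(unsigned long addr)
{
    return ((addr - RING_START) % TP_BLOCK_SIZE) / PAGE_SIZE_ASSUMED;
}

/* Count the ring pages inside the madvise() range that would be tail pages. */
static unsigned long tail_pages_in_madvise_range(void)
{
    unsigned long ring_end = RING_START + ring_bytes();
    unsigned long addr, tails = 0;

    for (addr = MADV_START; addr < MADV_START + MADV_LEN; addr += PAGE_SIZE_ASSUMED) {
        if (addr < RING_START || addr >= ring_end)
            continue;   /* this page is not part of the ring mapping */
        if (page_index_in_block(addr) != 0)
            tails++;
    }
    return tails;
}
```

Under these assumptions each 64 KiB block spans 16 base pages, and the madvise() range overlaps the first three pages of the first block, so MADV_PAGEOUT would walk one head page and two tail pages.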




* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-25 18:23   ` Matthew Wilcox
@ 2022-08-25 18:37     ` Yu Zhao
  2022-08-25 18:40     ` Yang Shi
  1 sibling, 0 replies; 19+ messages in thread
From: Yu Zhao @ 2022-08-25 18:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Minchan Kim, Andrew Morton, Linux-MM, 韩天硕, mawupeng

On Thu, Aug 25, 2022 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Aug 25, 2022 at 10:50:19AM -0600, Yu Zhao wrote:
> >
> > The above is from 5.18. Another report from 5.10:
> > https://lore.kernel.org/r/d927a335-a70b-48d3-9645-1d33cc88bd9c@huawei.com/
> >
> > We also hit it on 5.4, 5.10 and 5.15:
> >   trying to isolate tail page
> >   WARNING: CPU: 1 PID: 4608 at mm/vmscan.c:2096
> > isolate_lru_page+0xb4/0x527 mm/vmscan.c:2096
> >   Modules linked in:
>
> Looks like my analysis from yesterday was dropped:

I thought I missed your analysis but apparently it's not on linux-mm
or linux-kernel. Mail server malfunction?

> : This all seems quite plausible.  The reproducer seems to (correct me
> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
> : seems to create compound pages and mmap them.  This isn't folio-related
> : at all; I just moved the code that warns about it from mm/vmscan.c to
> : folio-compat.c.

Our syzkaller didn't find a reproducer, but the triggers are all
network-related syscalls.

> : Looks like a long-standing bug in MADV_PAGEOUT to me.

Agreed.



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-25 18:23   ` Matthew Wilcox
  2022-08-25 18:37     ` Yu Zhao
@ 2022-08-25 18:40     ` Yang Shi
  2022-08-25 18:46       ` Matthew Wilcox
  1 sibling, 1 reply; 19+ messages in thread
From: Yang Shi @ 2022-08-25 18:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yu Zhao, Minchan Kim, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Thu, Aug 25, 2022 at 11:23 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Aug 25, 2022 at 10:50:19AM -0600, Yu Zhao wrote:
> >
> > The above is from 5.18. Another report from 5.10:
> > https://lore.kernel.org/r/d927a335-a70b-48d3-9645-1d33cc88bd9c@huawei.com/
> >
> > We also hit it on 5.4, 5.10 and 5.15:
> >   trying to isolate tail page
> >   WARNING: CPU: 1 PID: 4608 at mm/vmscan.c:2096
> > isolate_lru_page+0xb4/0x527 mm/vmscan.c:2096
> >   Modules linked in:
>
> Looks like my analysis from yesterday was dropped:
>
> : This all seems quite plausible.  The reproducer seems to (correct me
> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
> : seems to create compound pages and mmap them.  This isn't folio-related
> : at all; I just moved the code that warns about it from mm/vmscan.c to
> : folio-compat.c.
> :
> : Looks like a long-standing bug in MADV_PAGEOUT to me.

Such a page should never be on the LRU, right? Could we test the LRU
flag before calling isolate_lru_page() in this case? I know
isolate_lru_page() does the check, but the tail page warning is raised
before that check.

Could the tail page warning be moved under the LRU flag test? That seems
possible, but it would need extra handling (re-setting the LRU flag),
which seems like overkill.
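The ordering issue described here can be illustrated with a small userspace toy model. This is not kernel code: the struct, the warning counter and the model_* functions are hypothetical stand-ins, and the return values are simplified to 0 and -1. It only shows why testing the LRU flag in the caller keeps a non-LRU tail page from ever reaching the warning.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the two properties discussed here. */
struct model_page {
    bool is_tail;   /* tail page of a compound page */
    bool on_lru;    /* LRU flag is set */
};

/* Counts how often the "trying to isolate tail page" warning would fire. */
static int model_warnings;

/*
 * Mirrors the ordering described above: the tail-page warning is raised
 * before the LRU flag is examined. Returns 0 on success, -1 otherwise.
 */
static int model_isolate_lru_page(struct model_page *page)
{
    if (page->is_tail)
        model_warnings++;
    if (!page->on_lru)
        return -1;
    page->on_lru = false;   /* isolated: taken off the LRU */
    return 0;
}

/* Caller that tests the LRU flag first, as proposed. */
static int model_isolate_if_lru(struct model_page *page)
{
    if (!page->on_lru)
        return -1;  /* bail out before the warning can be reached */
    return model_isolate_lru_page(page);
}
```

In this model a tail page that is not on the LRU bumps the warning counter when passed straight to model_isolate_lru_page(), but not when it goes through model_isolate_if_lru().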

>
>



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-25 18:40     ` Yang Shi
@ 2022-08-25 18:46       ` Matthew Wilcox
  2022-08-26  3:20         ` Yin, Fengwei
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2022-08-25 18:46 UTC (permalink / raw)
  To: Yang Shi
  Cc: Yu Zhao, Minchan Kim, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Thu, Aug 25, 2022 at 11:40:11AM -0700, Yang Shi wrote:
> On Thu, Aug 25, 2022 at 11:23 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Aug 25, 2022 at 10:50:19AM -0600, Yu Zhao wrote:
> > >
> > > The above is from 5.18. Another report from 5.10:
> > > https://lore.kernel.org/r/d927a335-a70b-48d3-9645-1d33cc88bd9c@huawei.com/
> > >
> > > We also hit it on 5.4, 5.10 and 5.15:
> > >   trying to isolate tail page
> > >   WARNING: CPU: 1 PID: 4608 at mm/vmscan.c:2096
> > > isolate_lru_page+0xb4/0x527 mm/vmscan.c:2096
> > >   Modules linked in:
> >
> > Looks like my analysis from yesterday was dropped:
> >
> > : This all seems quite plausible.  The reproducer seems to (correct me
> > : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
> > : seems to create compound pages and mmap them.  This isn't folio-related
> > : at all; I just moved the code that warns about it from mm/vmscan.c to
> > : folio-compat.c.
> > :
> > : Looks like a long-standing bug in MADV_PAGEOUT to me.
> 
> Such page should never be on lru, right? We could test lru before
> calling isolate_lru_page() for this case? I know isolate_lru_page()
> does the check, but the tail page warning is raised before the check.
> 
> Could the tail page warning be moved under the lru flag test? Seems
> possible, but it should need extra handling (re-set lru flag). Seems a
> little bit overkilling.

There's a number of ways of solving this.  I'm interested in seeing
which one Minchan thinks is best.



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-25 18:46       ` Matthew Wilcox
@ 2022-08-26  3:20         ` Yin, Fengwei
  2022-08-26 16:56           ` Minchan Kim
  2022-08-26 17:15           ` Matthew Wilcox
  0 siblings, 2 replies; 19+ messages in thread
From: Yin, Fengwei @ 2022-08-26  3:20 UTC (permalink / raw)
  To: Matthew Wilcox, Yang Shi
  Cc: Yu Zhao, Minchan Kim, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng



On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
>>> Looks like my analysis from yesterday was dropped:
>>>
>>> : This all seems quite plausible.  The reproducer seems to (correct me
>>> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
>>> : seems to create compound pages and mmap them.  This isn't folio-related
>>> : at all; I just moved the code that warns about it from mm/vmscan.c to
>>> : folio-compat.c.
>>> :
>>> : Looks like a long-standing bug in MADV_PAGEOUT to me.
>> Such page should never be on lru, right? We could test lru before
>> calling isolate_lru_page() for this case? I know isolate_lru_page()
>> does the check, but the tail page warning is raised before the check.
>>
>> Could the tail page warning be moved under the lru flag test? Seems
>> possible, but it should need extra handling (re-set lru flag). Seems a
>> little bit overkilling.
> There's a number of ways of solving this.  I'm interested in seeing
> which one Minchan thinks is best.
> 

My understanding is:
PageTransCompound() returns false for a compound page if THP is disabled
in the kernel config. Replacing PageTransCompound() with PageCompound()
could work here. In the long term, though, folios should be the answer. :)
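The difference between the two predicates can be sketched with a userspace toy model. The names below are hypothetical stand-ins rather than the kernel's definitions, and the build-time THP config option is modelled as a runtime flag so that both cases can be exercised in one program.

```c
#include <stdbool.h>

/* Toy stand-in for struct page; only the compound property matters here. */
struct thp_model_page {
    bool is_compound;
};

/* Models PageCompound(): true for any page that belongs to a compound page. */
static bool model_page_compound(const struct thp_model_page *page)
{
    return page->is_compound;
}

/*
 * Models PageTransCompound() as described above: with THP compiled out
 * (thp_enabled == false) it reports false for every page, including
 * pages that really are part of a compound page.
 */
static bool model_page_trans_compound(const struct thp_model_page *page,
                                      bool thp_enabled)
{
    return thp_enabled && page->is_compound;
}
```

With thp_enabled set to false, a compound page passes through a PageTransCompound()-style check unnoticed, while a PageCompound()-style check still catches it.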


Regards
Yin, Fengwei




* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26  3:20         ` Yin, Fengwei
@ 2022-08-26 16:56           ` Minchan Kim
  2022-08-26 18:23             ` Yang Shi
  2022-08-27  0:48             ` Yin, Fengwei
  2022-08-26 17:15           ` Matthew Wilcox
  1 sibling, 2 replies; 19+ messages in thread
From: Minchan Kim @ 2022-08-26 16:56 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: Matthew Wilcox, Yang Shi, Yu Zhao, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> 
> 
> On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> >>> Looks like my analysis from yesterday was dropped:
> >>>
> >>> : This all seems quite plausible.  The reproducer seems to (correct me
> >>> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
> >>> : seems to create compound pages and mmap them.  This isn't folio-related
> >>> : at all; I just moved the code that warns about it from mm/vmscan.c to
> >>> : folio-compat.c.
> >>> :
> >>> : Looks like a long-standing bug in MADV_PAGEOUT to me.
> >> Such page should never be on lru, right? We could test lru before
> >> calling isolate_lru_page() for this case? I know isolate_lru_page()
> >> does the check, but the tail page warning is raised before the check.
> >>
> >> Could the tail page warning be moved under the lru flag test? Seems
> >> possible, but it should need extra handling (re-set lru flag). Seems a
> >> little bit overkilling.
> > There's a number of ways of solving this.  I'm interested in seeing
> > which one Minchan thinks is best.
> > 
> 
> My understanding is:
> PageTransCompound() return false for compound page if THP is disabled
> in kernel config. Replacing PageTransCompound() with PageCompound() 
> could work here. But for the long term, folio should be the answer. :).

Thanks for the report and analysis, folks.

I agree with Yang, since MADV_PAGEOUT should work only with
LRU pages.

From 0a43ac31c903bc23299a868a6d6724ff5b807e3d Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 26 Aug 2022 09:37:34 -0700
Subject: [PATCH] mm: fix madivse_pageout mishandling on non-LRU page
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

MADV_PAGEOUT tries to isolate non-LRU pages and gets the warning
from isolate_lru_page() below.
Fix it by checking PageLRU in advance.

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140

Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Reported-by: 韩天硕 <hantianshuo@iie.ac.cn>
Suggested-by: Yang Shi <shy828301@gmail.com>
Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/madvise.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 682e1d161aef..a3fc4cd32ed3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -452,8 +452,11 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			continue;
 		}
 
-		/* Do not interfere with other mappings of this page */
-		if (page_mapcount(page) != 1)
+		/*
+		 * Do not interfere with other mappings of this page and
+		 * non-LRU page.
+		 */
+		if (!PageLRU(page) || page_mapcount(page) != 1)
 			continue;
 
 		VM_BUG_ON_PAGE(PageTransCompound(page), page);
-- 
2.37.2.672.g94769d06f0-goog
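
The effect of the added check can be sketched in user space. This is a toy
model, not the kernel code: the `mock_` types and functions are hypothetical,
and the isolation helper only models the ordering discussed in this thread
(the tail-page warning fires before any LRU test).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct mock_page {
	bool lru;	/* models PageLRU() */
	bool tail;	/* models PageTail() */
	int mapcount;	/* models page_mapcount() */
};

static int warnings;	/* counts "trying to isolate tail page" events */

/* Warn and bail out on tail pages first, then test the LRU flag. */
static bool mock_isolate_lru_page(struct mock_page *page)
{
	if (page->tail) {
		warnings++;
		return false;
	}
	if (!page->lru)
		return false;
	page->lru = false;	/* page is now off the LRU */
	return true;
}

/*
 * Walk a range and isolate eligible pages. check_lru selects the
 * patched behaviour (skip non-LRU pages before calling isolate).
 */
static size_t mock_pageout_range(struct mock_page *pages, size_t n,
				 bool check_lru)
{
	size_t isolated = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct mock_page *page = &pages[i];

		if ((check_lru && !page->lru) || page->mapcount != 1)
			continue;
		if (mock_isolate_lru_page(page))
			isolated++;
	}
	return isolated;
}
```

With one ordinary LRU page and one non-LRU tail page in the range, both
variants isolate a single page, but only the unpatched walk bumps the
warning counter.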




* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26  3:20         ` Yin, Fengwei
  2022-08-26 16:56           ` Minchan Kim
@ 2022-08-26 17:15           ` Matthew Wilcox
  2022-08-26 17:27             ` Yu Zhao
  2022-08-27  0:24             ` Yin, Fengwei
  1 sibling, 2 replies; 19+ messages in thread
From: Matthew Wilcox @ 2022-08-26 17:15 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: Yang Shi, Yu Zhao, Minchan Kim, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > There's a number of ways of solving this.  I'm interested in seeing
> > which one Minchan thinks is best.
> 
> My understanding is:
> PageTransCompound() return false for compound page if THP is disabled
> in kernel config. Replacing PageTransCompound() with PageCompound() 
> could work here. But for the long term, folio should be the answer. :).

Yes, ultimately, isolate_lru_page() is going away as an interface
and one will have to call folio_isolate_lru().  But should
madvise_cold_or_pageout_pte_range() even be getting called for VMAs
which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
of different types of VMA; should it also be ruling out af_packet VMAs?
If so, how?
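
For readers unfamiliar with the helper, a VMA-level filter of this kind can
be modelled as a flag mask. The sketch below is user-space C with
hypothetical `mock_` names; the exact set of excluded flags (locked, hugetlb
and PFN mappings) and the bit values are assumptions for illustration, and
the idea that an AF_PACKET ring mapping carries VM_MIXEDMAP is likewise an
assumption based on the later discussion of vm_insert_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the authoritative values are in the kernel headers. */
#define MOCK_VM_PFNMAP   0x00000400UL
#define MOCK_VM_LOCKED   0x00002000UL
#define MOCK_VM_HUGETLB  0x00400000UL
#define MOCK_VM_MIXEDMAP 0x10000000UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct mock_vma {
	unsigned long vm_flags;
};

/* Assumed shape of the filter: reject VMAs with any excluded flag set. */
static bool mock_can_madv_lru_vma(const struct mock_vma *vma)
{
	return !(vma->vm_flags &
		 (MOCK_VM_LOCKED | MOCK_VM_HUGETLB | MOCK_VM_PFNMAP));
}

/* A VM_MIXEDMAP-only VMA slips through this particular mask. */
static void mock_vma_filter_demo(void)
{
	struct mock_vma locked = { .vm_flags = MOCK_VM_LOCKED };
	struct mock_vma mixed = { .vm_flags = MOCK_VM_MIXEDMAP };

	assert(!mock_can_madv_lru_vma(&locked));
	assert(mock_can_madv_lru_vma(&mixed));
}
```

Under those assumptions the filter lets the driver-backed VMA through, which
is why the page walk ends up looking at pages that were never on the LRU.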



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 17:15           ` Matthew Wilcox
@ 2022-08-26 17:27             ` Yu Zhao
  2022-08-26 17:53               ` Minchan Kim
  2022-08-27  0:24             ` Yin, Fengwei
  1 sibling, 1 reply; 19+ messages in thread
From: Yu Zhao @ 2022-08-26 17:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin, Fengwei, Yang Shi, Minchan Kim, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:15 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > > There's a number of ways of solving this.  I'm interested in seeing
> > > which one Minchan thinks is best.
> >
> > My understanding is:
> > PageTransCompound() return false for compound page if THP is disabled
> > in kernel config. Replacing PageTransCompound() with PageCompound()
> > could work here. But for the long term, folio should be the answer. :).
>
> Yes, ultimately, isolate_lru_page() is going away as an interface
> and one will have to call folio_isolate_lru().  But should
> madvise_cold_or_pageout_pte_range() even be getting called for VMAs
> which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
> of different types of VMA; should it also be ruling out af_packet VMAs?

Agreed.

> If so, how?

We should add a reliable helper to tell whether a file VMA is
reclaimable or not. I don't think we have one. Currently MGLRU checks
mapping->a_ops->read_folio for file VMAs to determine whether they are
reclaimable.
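
A rough user-space sketch of what such a helper could look like, modelled on
the read_folio test mentioned above. All types and names here are
hypothetical `mock_` stand-ins, and the helper is only one possible shape,
not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_file;
struct mock_folio;

/* Hypothetical stand-in for struct address_space_operations. */
struct mock_aops {
	int (*read_folio)(struct mock_file *file, struct mock_folio *folio);
};

/* Hypothetical stand-in for struct address_space. */
struct mock_mapping {
	const struct mock_aops *a_ops;
};

static int mock_read_folio(struct mock_file *file, struct mock_folio *folio)
{
	(void)file;
	(void)folio;
	return 0;
}

/* A page-cache-backed file provides read_folio; this one does not. */
static const struct mock_aops mock_pagecache_aops = {
	.read_folio = mock_read_folio,
};
static const struct mock_aops mock_no_read_aops = {
	.read_folio = NULL,
};

/* Treat a file mapping as reclaimable only if it can read folios back in. */
static bool mock_file_vma_reclaimable(const struct mock_mapping *mapping)
{
	return mapping != NULL && mapping->a_ops != NULL &&
	       mapping->a_ops->read_folio != NULL;
}

static void mock_reclaimable_demo(void)
{
	struct mock_mapping regular = { .a_ops = &mock_pagecache_aops };
	struct mock_mapping driver = { .a_ops = &mock_no_read_aops };

	assert(mock_file_vma_reclaimable(&regular));
	assert(!mock_file_vma_reclaimable(&driver));
}
```

The heuristic is cheap, but as noted it is a proxy: it says whether the
mapping can repopulate its pages, not whether any given page is on the LRU.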



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 17:27             ` Yu Zhao
@ 2022-08-26 17:53               ` Minchan Kim
  2022-08-26 17:58                 ` Yu Zhao
                                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Minchan Kim @ 2022-08-26 17:53 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Matthew Wilcox, Yin, Fengwei, Yang Shi, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:27:53AM -0600, Yu Zhao wrote:
> On Fri, Aug 26, 2022 at 11:15 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> > > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > > > There's a number of ways of solving this.  I'm interested in seeing
> > > > which one Minchan thinks is best.
> > >
> > > My understanding is:
> > > PageTransCompound() return false for compound page if THP is disabled
> > > in kernel config. Replacing PageTransCompound() with PageCompound()
> > > could work here. But for the long term, folio should be the answer. :).
> >
> > Yes, ultimately, isolate_lru_page() is going away as an interface
> > and one will have to call folio_isolate_lru().  But should
> > madvise_cold_or_pageout_pte_range() even be getting called for VMAs
> > which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
> > of different types of VMA; should it also be ruling out af_packet VMAs?
> 
> Agreed.
> 
> > If so, how?
> 
> We should add a reliable helper to tell whether a file VMA is
> reclaimable or not. I don't think we have one. Currently MGLRU checks
> mapping->a_ops->read_folio for file VMAs to determine whether they are
> reclaimable.
> 

Long term, that's a better idea (for the stable backport, I'd like to go with
the simple PageLRU check).

I wonder whether it's possible to mix LRU pages and non-struct pages together
in a VMA. If not, could we reuse (abuse) VM_MIXEDMAP?



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 17:53               ` Minchan Kim
@ 2022-08-26 17:58                 ` Yu Zhao
  2022-08-26 18:02                 ` Matthew Wilcox
  2022-08-26 18:19                 ` Yang Shi
  2 siblings, 0 replies; 19+ messages in thread
From: Yu Zhao @ 2022-08-26 17:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Matthew Wilcox, Yin, Fengwei, Yang Shi, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:53 AM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Aug 26, 2022 at 11:27:53AM -0600, Yu Zhao wrote:
> > On Fri, Aug 26, 2022 at 11:15 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> > > > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > > > > There's a number of ways of solving this.  I'm interested in seeing
> > > > > which one Minchan thinks is best.
> > > >
> > > > My understanding is:
> > > > PageTransCompound() return false for compound page if THP is disabled
> > > > in kernel config. Replacing PageTransCompound() with PageCompound()
> > > > could work here. But for the long term, folio should be the answer. :).
> > >
> > > Yes, ultimately, isolate_lru_page() is going away as an interface
> > > and one will have to call folio_isolate_lru().  But should
> > > madvise_cold_or_pageout_pte_range() even be getting called for VMAs
> > > which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
> > > of different types of VMA; should it also be ruling out af_packet VMAs?
> >
> > Agreed.
> >
> > > If so, how?
> >
> > We should add a reliable helper to tell whether a file VMA is
> > reclaimable or not. I don't think we have one. Currently MGLRU checks
> > mapping->a_ops->read_folio for file VMAs to determine whether they are
> > reclaimable.
> >
>
> Long term, that's better idea(For stable backport, I'd like to go with
> simple PageLRU check).
>
> I wonder it's possible to mix LRU pages and non-struct pages together
> in a VMA.

The only way to add file pages to LRU is to go through page cache, and
page cache can't handle PFN pages.

> Otherwise, could we reuse(abuse) VM_MIXEDMAP?

VM_SPECIAL



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 17:53               ` Minchan Kim
  2022-08-26 17:58                 ` Yu Zhao
@ 2022-08-26 18:02                 ` Matthew Wilcox
  2022-08-26 18:19                 ` Yang Shi
  2 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox @ 2022-08-26 18:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Yu Zhao, Yin, Fengwei, Yang Shi, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 10:53:46AM -0700, Minchan Kim wrote:
> Long term, that's better idea(For stable backport, I'd like to go with
> simple PageLRU check).
> 
> I wonder it's possible to mix LRU pages and non-struct pages together
> in a VMA. Otherwise, could we reuse(abuse) VM_MIXEDMAP?

On a bit of a tangent, I'm not sure that we should be allowing this for
file pages at all.  I see that we only allow it if mapcount is 1, but page
cache is also used by applications that don't mmap it.  So if some other
application is using /tmp/bigfile with read() and write(), I can force
that memory out of the page cache by mmaping it and calling MADV_PAGEOUT.
That could be used as an exfiltration side-channel, for example.
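
The scenario is easy to express from user space. The following is a minimal
sketch, assuming a Linux system whose kernel supports MADV_PAGEOUT; the
function name `pageout_file` is made up for this example. A zero return only
means the hint was accepted. Whether any page actually leaves the page cache
depends on the kernel's own checks, and observing that would need a separate
probe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux uapi value, for older libc headers */
#endif

/*
 * Map `path` read-only and shared, then hint that the whole mapping
 * should be paged out. Returns 0 if madvise() accepted the hint,
 * -1 on any failure (missing file, empty file, mmap or madvise error).
 */
static int pageout_file(const char *path)
{
	struct stat st;
	size_t len;
	void *map;
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}

	len = (size_t)st.st_size;
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the file referenced */
	if (map == MAP_FAILED)
		return -1;

	ret = madvise(map, len, MADV_PAGEOUT);
	munmap(map, len);
	return ret;
}
```

Calling `pageout_file("/tmp/bigfile")` from an unrelated process is the
pattern described above: the caller never has to have written or even read
the data to request its eviction.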




* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 17:53               ` Minchan Kim
  2022-08-26 17:58                 ` Yu Zhao
  2022-08-26 18:02                 ` Matthew Wilcox
@ 2022-08-26 18:19                 ` Yang Shi
  2022-08-26 23:12                   ` Minchan Kim
  2 siblings, 1 reply; 19+ messages in thread
From: Yang Shi @ 2022-08-26 18:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Yu Zhao, Matthew Wilcox, Yin, Fengwei, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 10:53 AM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Aug 26, 2022 at 11:27:53AM -0600, Yu Zhao wrote:
> > On Fri, Aug 26, 2022 at 11:15 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> > > > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > > > > There's a number of ways of solving this.  I'm interested in seeing
> > > > > which one Minchan thinks is best.
> > > >
> > > > My understanding is:
> > > > PageTransCompound() return false for compound page if THP is disabled
> > > > in kernel config. Replacing PageTransCompound() with PageCompound()
> > > > could work here. But for the long term, folio should be the answer. :).
> > >
> > > Yes, ultimately, isolate_lru_page() is going away as an interface
> > > and one will have to call folio_isolate_lru().  But should
> > > madvise_cold_or_pageout_pte_range() even be getting called for VMAs
> > > which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
> > > of different types of VMA; should it also be ruling out af_packet VMAs?
> >
> > Agreed.
> >
> > > If so, how?
> >
> > We should add a reliable helper to tell whether a file VMA is
> > reclaimable or not. I don't think we have one. Currently MGLRU checks
> > mapping->a_ops->read_folio for file VMAs to determine whether they are
> > reclaimable.
> >
>
> Long term, that's better idea(For stable backport, I'd like to go with
> simple PageLRU check).
>
> I wonder it's possible to mix LRU pages and non-struct pages together
> in a VMA. Otherwise, could we reuse(abuse) VM_MIXEDMAP?

I don't think the mix is going to happen via a regular mmap call.



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 16:56           ` Minchan Kim
@ 2022-08-26 18:23             ` Yang Shi
  2022-08-26 22:58               ` Minchan Kim
  2022-08-27  0:48             ` Yin, Fengwei
  1 sibling, 1 reply; 19+ messages in thread
From: Yang Shi @ 2022-08-26 18:23 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Yin, Fengwei, Matthew Wilcox, Yu Zhao, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 9:56 AM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> >
> >
> > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > >>> Looks like my analysis from yesterday was dropped:
> > >>>
> > >>> : This all seems quite plausible.  The reproducer seems to (correct me
> > >>> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
> > >>> : seems to create compound pages and mmap them.  This isn't folio-related
> > >>> : at all; I just moved the code that warns about it from mm/vmscan.c to
> > >>> : folio-compat.c.
> > >>> :
> > >>> : Looks like a long-standing bug in MADV_PAGEOUT to me.
> > >> Such page should never be on lru, right? We could test lru before
> > >> calling isolate_lru_page() for this case? I know isolate_lru_page()
> > >> does the check, but the tail page warning is raised before the check.
> > >>
> > >> Could the tail page warning be moved under the lru flag test? Seems
> > >> possible, but it should need extra handling (re-set lru flag). Seems a
> > >> little bit overkilling.
> > > There's a number of ways of solving this.  I'm interested in seeing
> > > which one Minchan thinks is best.
> > >
> >
> > My understanding is:
> > PageTransCompound() return false for compound page if THP is disabled
> > in kernel config. Replacing PageTransCompound() with PageCompound()
> > could work here. But for the long term, folio should be the answer. :).
>
> Thanks for reporting and analysis, folks,
>
> I agree with Yang since the MADV_PAGEOUT should work with only
> LRU pages.
>
> From 0a43ac31c903bc23299a868a6d6724ff5b807e3d Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Fri, 26 Aug 2022 09:37:34 -0700
> Subject: [PATCH] mm: fix madivse_pageout mishandling on non-LRU page
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> MADV_PAGEOUT tries to isolate non-LRU pages and get the warning
> from isolate_lru_page below.
> Fix it with checking PageLRU in advance.
>
> ------------[ cut here ]------------
> trying to isolate tail page
> WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
> Modules linked in:
> CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:isolate_lru_page+0x130/0x140
>
> Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
> Reported-by: 韩天硕 <hantianshuo@iie.ac.cn>
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> Cc: stable@vger.kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Thanks for the patch, looks good to me. Will you post it to the
mailing list? Either way, you have my ack.

> ---
>  mm/madvise.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 682e1d161aef..a3fc4cd32ed3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -452,8 +452,11 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>                         continue;
>                 }
>
> -               /* Do not interfere with other mappings of this page */
> -               if (page_mapcount(page) != 1)
> +               /*
> +                * Do not interfere with other mappings of this page and
> +                * non-LRU page.
> +                */
> +               if (!PageLRU(page) || page_mapcount(page) != 1)
>                         continue;
>
>                 VM_BUG_ON_PAGE(PageTransCompound(page), page);
> --
> 2.37.2.672.g94769d06f0-goog
>



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 18:23             ` Yang Shi
@ 2022-08-26 22:58               ` Minchan Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2022-08-26 22:58 UTC (permalink / raw)
  To: Yang Shi
  Cc: Yin, Fengwei, Matthew Wilcox, Yu Zhao, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:23:10AM -0700, Yang Shi wrote:
> On Fri, Aug 26, 2022 at 9:56 AM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> > >
> > >
> > > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > > >>> Looks like my analysis from yesterday was dropped:
> > > >>>
> > > >>> : This all seems quite plausible.  The reproducer seems to (correct me
> > > >>> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
> > > >>> : seems to create compound pages and mmap them.  This isn't folio-related
> > > >>> : at all; I just moved the code that warns about it from mm/vmscan.c to
> > > >>> : folio-compat.c.
> > > >>> :
> > > >>> : Looks like a long-standing bug in MADV_PAGEOUT to me.
> > > >> Such page should never be on lru, right? We could test lru before
> > > >> calling isolate_lru_page() for this case? I know isolate_lru_page()
> > > >> does the check, but the tail page warning is raised before the check.
> > > >>
> > > >> Could the tail page warning be moved under the lru flag test? Seems
> > > >> possible, but it should need extra handling (re-set lru flag). Seems a
> > > >> little bit overkilling.
> > > > There's a number of ways of solving this.  I'm interested in seeing
> > > > which one Minchan thinks is best.
> > > >
> > >
> > > My understanding is:
> > > PageTransCompound() return false for compound page if THP is disabled
> > > in kernel config. Replacing PageTransCompound() with PageCompound()
> > > could work here. But for the long term, folio should be the answer. :).
> >
> > Thanks for reporting and analysis, folks,
> >
> > I agree with Yang since the MADV_PAGEOUT should work with only
> > LRU pages.
> >
> > From 0a43ac31c903bc23299a868a6d6724ff5b807e3d Mon Sep 17 00:00:00 2001
> > From: Minchan Kim <minchan@kernel.org>
> > Date: Fri, 26 Aug 2022 09:37:34 -0700
> > Subject: [PATCH] mm: fix madivse_pageout mishandling on non-LRU page
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > MADV_PAGEOUT tries to isolate non-LRU pages and get the warning
> > from isolate_lru_page below.
> > Fix it with checking PageLRU in advance.
> >
> > ------------[ cut here ]------------
> > trying to isolate tail page
> > WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
> > Modules linked in:
> > CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > RIP: 0010:isolate_lru_page+0x130/0x140
> >
> > Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
> > Reported-by: 韩天硕 <hantianshuo@iie.ac.cn>
> > Suggested-by: Yang Shi <shy828301@gmail.com>
> > Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Thanks for the patch, looks good to me. Will you post it to the
> mailing list? Anyway you could have my ack.

IIRC, Andrew usually picks up patches from the thread.
If he doesn't within a few days, I'll post a new one.

Thanks!



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 18:19                 ` Yang Shi
@ 2022-08-26 23:12                   ` Minchan Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2022-08-26 23:12 UTC (permalink / raw)
  To: Yang Shi
  Cc: Yu Zhao, Matthew Wilcox, Yin, Fengwei, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng

On Fri, Aug 26, 2022 at 11:19:37AM -0700, Yang Shi wrote:
> On Fri, Aug 26, 2022 at 10:53 AM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Fri, Aug 26, 2022 at 11:27:53AM -0600, Yu Zhao wrote:
> > > On Fri, Aug 26, 2022 at 11:15 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
> > > > > On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
> > > > > > There's a number of ways of solving this.  I'm interested in seeing
> > > > > > which one Minchan thinks is best.
> > > > >
> > > > > My understanding is:
> > > > > PageTransCompound() return false for compound page if THP is disabled
> > > > > in kernel config. Replacing PageTransCompound() with PageCompound()
> > > > > could work here. But for the long term, folio should be the answer. :).
> > > >
> > > > Yes, ultimately, isolate_lru_page() is going away as an interface
> > > > and one will have to call folio_isolate_lru().  But should
> > > > madvise_cold_or_pageout_pte_range() even be getting called for VMAs
> > > > which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
> > > > of different types of VMA; should it also be ruling out af_packet VMAs?
> > >
> > > Agreed.
> > >
> > > > If so, how?
> > >
> > > We should add a reliable helper to tell whether a file VMA is
> > > reclaimable or not. I don't think we have one. Currently MGLRU checks
> > > mapping->a_ops->read_folio for file VMAs to determine whether they are
> > > reclaimable.
> > >
> >
> > Long term, that's better idea(For stable backport, I'd like to go with
> > simple PageLRU check).
> >
> > I wonder it's possible to mix LRU pages and non-struct pages together
> > in a VMA. Otherwise, could we reuse(abuse) VM_MIXEDMAP?
> 
> I don't think the mix is going to happen via regular mmap call.

From a quick glance at vm_insert_page(), it can set VM_MIXEDMAP in
vm_flags dynamically. I am worried that a driver may call
vm_insert_page() on a VMA that already has LRU pages, so the VMA
could end up with both LRU pages and non-LRU pages. If that's
possible, we may miss LRU pages if we filter VM_MIXEDMAP out in
can_madv_lru_vma(). Not sure whether that's a common case.
Otherwise, yeah, we could introduce a new vm_flag, something like
VM_NON_LRU. Just a thought.
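
The concern can be illustrated with a small user-space model. Everything
below is hypothetical (`mock_` types, an illustrative flag bit, a fixed-size
page array); it only mirrors the behaviour described above, where inserting
a driver page flips a VMA-wide flag after LRU pages are already mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_VM_MIXEDMAP 0x10000000UL	/* illustrative bit */
#define MOCK_MAX_PAGES   8

struct mock_page {
	bool lru;	/* models PageLRU() */
};

struct mock_vma {
	unsigned long vm_flags;
	struct mock_page *pages[MOCK_MAX_PAGES];
	size_t nr_pages;
};

/* Models a driver inserting its own page: the VMA flag flips as a side effect. */
static void mock_vm_insert_page(struct mock_vma *vma, struct mock_page *page)
{
	assert(vma->nr_pages < MOCK_MAX_PAGES);
	vma->vm_flags |= MOCK_VM_MIXEDMAP;
	vma->pages[vma->nr_pages++] = page;
}

/* Page-level filter: count every page that is on the LRU. */
static size_t mock_count_by_page_filter(const struct mock_vma *vma)
{
	size_t n = 0;

	for (size_t i = 0; i < vma->nr_pages; i++)
		if (vma->pages[i]->lru)
			n++;
	return n;
}

/* VMA-level filter: give up on the whole VMA once VM_MIXEDMAP is set. */
static size_t mock_count_by_vma_filter(const struct mock_vma *vma)
{
	if (vma->vm_flags & MOCK_VM_MIXEDMAP)
		return 0;
	return mock_count_by_page_filter(vma);
}
```

Once a single driver page is inserted, the VMA-level filter reports nothing
to reclaim while the page-level filter still finds the earlier LRU page,
which is the trade-off between the two approaches.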



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 17:15           ` Matthew Wilcox
  2022-08-26 17:27             ` Yu Zhao
@ 2022-08-27  0:24             ` Yin, Fengwei
  1 sibling, 0 replies; 19+ messages in thread
From: Yin, Fengwei @ 2022-08-27  0:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yang Shi, Yu Zhao, Minchan Kim, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng



On 8/27/2022 1:15 AM, Matthew Wilcox wrote:
> On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
>> On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
>>> There's a number of ways of solving this.  I'm interested in seeing
>>> which one Minchan thinks is best.
>>
>> My understanding is:
>> PageTransCompound() return false for compound page if THP is disabled
>> in kernel config. Replacing PageTransCompound() with PageCompound() 
>> could work here. But for the long term, folio should be the answer. :).
> 
> Yes, ultimately, isolate_lru_page() is going away as an interface
> and one will have to call folio_isolate_lru().  But should
> madvise_cold_or_pageout_pte_range() even be getting called for VMAs
> which are mmaps of af_packet?  can_madv_lru_vma() rules out a number
> of different types of VMA; should it also be ruling out af_packet VMAs?
> If so, how?

Thanks a lot for the information; it helps me understand the real
concern here.

Regards
Yin, Fengwei



* Re: (resend)WARNING: trying to isolate tail page in isolate_lru_page
  2022-08-26 16:56           ` Minchan Kim
  2022-08-26 18:23             ` Yang Shi
@ 2022-08-27  0:48             ` Yin, Fengwei
  1 sibling, 0 replies; 19+ messages in thread
From: Yin, Fengwei @ 2022-08-27  0:48 UTC (permalink / raw)
  To: Minchan Kim, Yin, Fengwei
  Cc: Matthew Wilcox, Yang Shi, Yu Zhao, Andrew Morton, Linux-MM,
	韩天硕,
	mawupeng



On 8/27/2022 12:56 AM, Minchan Kim wrote:
> On Fri, Aug 26, 2022 at 11:20:58AM +0800, Yin, Fengwei wrote:
>>
>>
>> On 8/26/2022 2:46 AM, Matthew Wilcox wrote:
>>>>> Looks like my analysis from yesterday was dropped:
>>>>>
>>>>> : This all seems quite plausible.  The reproducer seems to (correct me
>>>>> : if I'm wrong) create an AF_PACKET socket and mmap it.  af_packet.c
>>>>> : seems to create compound pages and mmap them.  This isn't folio-related
>>>>> : at all; I just moved the code that warns about it from mm/vmscan.c to
>>>>> : folio-compat.c.
>>>>> :
>>>>> : Looks like a long-standing bug in MADV_PAGEOUT to me.
>>>> Such page should never be on lru, right? We could test lru before
>>>> calling isolate_lru_page() for this case? I know isolate_lru_page()
>>>> does the check, but the tail page warning is raised before the check.
>>>>
>>>> Could the tail page warning be moved under the lru flag test? Seems
>>>> possible, but it should need extra handling (re-set lru flag). Seems a
>>>> little bit overkilling.
>>> There's a number of ways of solving this.  I'm interested in seeing
>>> which one Minchan thinks is best.
>>>
>>
>> My understanding is:
>> PageTransCompound() return false for compound page if THP is disabled
>> in kernel config. Replacing PageTransCompound() with PageCompound() 
>> could work here. But for the long term, folio should be the answer. :).
> 
> Thanks for reporting and analysis, folks,
> 
> I agree with Yang since the MADV_PAGEOUT should work with only
> LRU pages.
Yes. Yang's suggestion has wider coverage.

I am still wondering whether we need to change PageTransCompound()
to PageCompound(). Large folios depend on THP now:

commit 421f1ab48452af48b64e205de1caca3d1ba415f4
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Jan 15 23:27:08 2022 -0500

    mm: Make large folios depend on THP

    Some parts of the VM still depend on THP to handle large folios
    correctly.  Until those are fixed, prevent creating large folios
    if THP are disabled.


Another thing: maybe move the !PageLRU(page) check before the
PageTransCompound() check, to avoid trying to split the page if it's a
non-LRU page? Just a thought. Thanks.


Regards
Yin, Fengwei

> 
> From 0a43ac31c903bc23299a868a6d6724ff5b807e3d Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Fri, 26 Aug 2022 09:37:34 -0700
> Subject: [PATCH] mm: fix madivse_pageout mishandling on non-LRU page
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> MADV_PAGEOUT tries to isolate non-LRU pages and get the warning
> from isolate_lru_page below.
> Fix it with checking PageLRU in advance.
> 
> ------------[ cut here ]------------
> trying to isolate tail page
> WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
> Modules linked in:
> CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:isolate_lru_page+0x130/0x140
> 
> Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
> Reported-by: 韩天硕 <hantianshuo@iie.ac.cn>
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> Cc: stable@vger.kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/madvise.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 682e1d161aef..a3fc4cd32ed3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -452,8 +452,11 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  			continue;
>  		}
>  
> -		/* Do not interfere with other mappings of this page */
> -		if (page_mapcount(page) != 1)
> +		/*
> +		 * Do not interfere with other mappings of this page and
> +		 * non-LRU page.
> +		 */
> +		if (!PageLRU(page) || page_mapcount(page) != 1)
>  			continue;
>  
>  		VM_BUG_ON_PAGE(PageTransCompound(page), page);


