linux-kernel.vger.kernel.org archive mirror
* kernel BUG at include/linux/swapops.h:LINE!
@ 2020-05-30 17:05 syzbot
  2020-07-19 21:10 ` syzbot
  2021-05-08 11:24 ` [syzbot] " syzbot
  0 siblings, 2 replies; 16+ messages in thread
From: syzbot @ 2020-05-30 17:05 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    9cb1fd0e Linux 5.7-rc7
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1788a54a100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cca7550d53ffa599
dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:197!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 30075 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x5b4/0x660 mm/migrate.c:368
Code: 32 e8 10 9f c0 ff 48 c7 c6 e0 a4 35 88 4c 89 e7 e8 81 a1 ec ff 0f 0b e8 fa 9e c0 ff 4d 8d 66 ff e9 1c fe ff ff e8 ec 9e c0 ff <0f> 0b e8 e5 9e c0 ff 0f 0b e8 de 9e c0 ff 4c 8d 65 ff eb c3 48 89
RSP: 0000:ffffc90015fffc70 EFLAGS: 00010293
RAX: ffff8880544f4180 RBX: 0000000000000000 RCX: ffffffff81b29e18
RDX: 0000000000000000 RSI: ffffffff81b29fc4 RDI: 0000000000000001
RBP: ffffea0000d40080 R08: ffff8880544f4180 R09: fffff940001a8001
R10: ffffea0000d40007 R11: fffff940001a8000 R12: ffffea0000d40000
R13: 1ffff92002bfff90 R14: ffffea0001230f08 R15: ffff8880503e11e0
FS:  00007fdb7a134700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c0 CR3: 00000000a703c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __handle_mm_fault+0x1c0e/0x3c90 mm/memory.c:4327
 handle_mm_fault+0x1a5/0x660 mm/memory.c:4382
 do_user_addr_fault arch/x86/mm/fault.c:1464 [inline]
 do_page_fault+0x55b/0x13da arch/x86/mm/fault.c:1535
 page_fault+0x39/0x40 arch/x86/entry/entry_64.S:1203
RIP: 0033:0x45ca35
Code: 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 48 3d 01 f0 ff ff 0f 83 db b6 fb ff <c3> 66 2e 0f 1f 84 00 00 00 00 00 48 85 ff 41 57 4d 89 cf 41 56 41
RSP: 002b:00000000200001c0 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 00000000004dabc0 RCX: 000000000045ca29
RDX: 0000000020000140 RSI: 00000000200001c0 RDI: 0000000000000000
RBP: 000000000078bfa0 R08: 0000000020000300 R09: 0000000000000000
R10: 00000000200002c0 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000076 R14: 00000000004c331e R15: 00007fdb7a1346d4
Modules linked in:
---[ end trace 5096692b6266afca ]---
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x5b4/0x660 mm/migrate.c:368
Code: 32 e8 10 9f c0 ff 48 c7 c6 e0 a4 35 88 4c 89 e7 e8 81 a1 ec ff 0f 0b e8 fa 9e c0 ff 4d 8d 66 ff e9 1c fe ff ff e8 ec 9e c0 ff <0f> 0b e8 e5 9e c0 ff 0f 0b e8 de 9e c0 ff 4c 8d 65 ff eb c3 48 89
RSP: 0000:ffffc90015fffc70 EFLAGS: 00010293
RAX: ffff8880544f4180 RBX: 0000000000000000 RCX: ffffffff81b29e18
RDX: 0000000000000000 RSI: ffffffff81b29fc4 RDI: 0000000000000001
RBP: ffffea0000d40080 R08: ffff8880544f4180 R09: fffff940001a8001
R10: ffffea0000d40007 R11: fffff940001a8000 R12: ffffea0000d40000
R13: 1ffff92002bfff90 R14: ffffea0001230f08 R15: ffff8880503e11e0
FS:  00007fdb7a134700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c0 CR3: 00000000a703c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
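
A note on reading the dump above: the Code: line lists the instruction bytes around the faulting RIP, with the byte at RIP itself in angle brackets. Here that is "<0f> 0b", the x86 ud2 opcode, which is what BUG() compiles to on x86 and why the oops is reported as "invalid opcode". A minimal Python sketch of that check follows. The helper name faulting_bytes is made up for illustration, and the sample string is a shortened tail of the Code: line from the report.

```python
def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <xx> in an oops Code: line."""
    tokens = code_line.split(":", 1)[1].split()
    # The kernel marks the byte at the faulting RIP with angle brackets.
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    picked = tokens[start:start + count]
    return bytes(int(tok.strip("<>"), 16) for tok in picked)


UD2 = bytes([0x0F, 0x0B])  # x86 ud2 opcode, emitted by BUG() on x86

# Tail of the Code: line from the report (shortened for readability).
sample = "Code: e8 ec 9e c0 ff <0f> 0b e8 e5 9e c0 ff 0f 0b"

is_bug_trap = faulting_bytes(sample) == UD2
print(is_bug_trap)
```

The sketch assumes the line contains a marked byte; a line such as "Code: Bad RIP value." has none and would need separate handling.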


* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-05-30 17:05 kernel BUG at include/linux/swapops.h:LINE! syzbot
@ 2020-07-19 21:10 ` syzbot
  2020-07-20 23:51   ` Andrew Morton
  2021-05-08 11:24 ` [syzbot] " syzbot
  1 sibling, 1 reply; 16+ messages in thread
From: syzbot @ 2020-07-19 21:10 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm, syzkaller-bugs

syzbot has found a reproducer for the following issue on:

HEAD commit:    4c43049f Add linux-next specific files for 20200716
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:197!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __handle_mm_fault mm/memory.c:4349 [inline]
 handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
 do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
 handle_page_fault arch/x86/mm/fault.c:1351 [inline]
 exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
 asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
 copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
 raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
 _copy_to_user+0x11e/0x160 lib/usercopy.c:30
 copy_to_user include/linux/uaccess.h:168 [inline]
 do_pipe2+0x128/0x1b0 fs/pipe.c:1014
 __do_sys_pipe fs/pipe.c:1035 [inline]
 __se_sys_pipe fs/pipe.c:1033 [inline]
 __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45c1d9
Code: Bad RIP value.
RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
RBP: 000000000078c070 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078c04c
R13: 00007ffcfc120ecf R14: 00007f55eb4779c0 R15: 000000000078c04c
Modules linked in:
---[ end trace ea73d933d66ff0d4 ]---
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
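
For readers comparing the two reports: each non-inline frame has the form symbol+0xOFFSET/0xSIZE, optionally followed by file:line, where OFFSET is the position inside the function and SIZE is the function's length. Frames tagged [inline] carry no offset. The following Python sketch splits such a frame into fields. The regular expression and the name parse_frame are illustrative assumptions, not part of any kernel or syzkaller tooling.

```python
import re

# symbol+0xOFF/0xSIZE, optionally followed by "file:line"
FRAME = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+(?P<file>\S+):(?P<line>\d+))?"
)


def parse_frame(text):
    """Parse one trace line; return None when it has no +offset/size part."""
    m = FRAME.search(text)
    if m is None:
        return None  # e.g. "[inline]" frames carry no offset
    return {
        "symbol": m["symbol"],
        "offset": int(m["offset"], 16),
        "size": int(m["size"], 16),
        "file": m["file"],
        "line": int(m["line"]) if m["line"] else None,
    }


frame = parse_frame("RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368")
print(frame)
```

Applied to both reports, this shows the same function and source line (mm/migrate.c:368) at different offsets (0x5b4 of 0x660 versus 0x493 of 0x520), which is consistent with the same check firing in two differently built kernels.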



* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-19 21:10 ` syzbot
@ 2020-07-20 23:51   ` Andrew Morton
  2020-07-21  0:21     ` Matthew Wilcox
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andrew Morton @ 2020-07-20 23:51 UTC (permalink / raw)
  To: syzbot
  Cc: linux-kernel, linux-mm, syzkaller-bugs, Matthew Wilcox,
	Kirill A. Shutemov, Ralph Campbell, David Hildenbrand,
	Mike Kravetz

On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:

> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    4c43049f Add linux-next specific files for 20200716
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> compiler:       gcc (GCC) 10.1.0-syz 20200507
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

Thanks.

__handle_mm_fault
  ->pmd_migration_entry_wait
    ->migration_entry_to_page

stumbled onto an unlocked page.

I don't immediately see a cause.  Perhaps Matthew's "THP prep patches",
perhaps something else.

Is it possible to perform a bisection?

> ------------[ cut here ]------------
> kernel BUG at include/linux/swapops.h:197!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> FS:  00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  __handle_mm_fault mm/memory.c:4349 [inline]
>  handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
>  do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
>  handle_page_fault arch/x86/mm/fault.c:1351 [inline]
>  exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
>  asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
> RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
> Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
> RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
> RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
> R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
> R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
>  copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
>  raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
>  _copy_to_user+0x11e/0x160 lib/usercopy.c:30
>  copy_to_user include/linux/uaccess.h:168 [inline]
>  do_pipe2+0x128/0x1b0 fs/pipe.c:1014
>  __do_sys_pipe fs/pipe.c:1035 [inline]
>  __se_sys_pipe fs/pipe.c:1033 [inline]
>  __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45c1d9
> Code: Bad RIP value.
> RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
> RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
> RBP: 000000000078c070 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078c04c
> R13: 00007ffcfc120ecf R14: 00007f55eb4779c0 R15: 000000000078c04c
> Modules linked in:
> ---[ end trace ea73d933d66ff0d4 ]---
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> FS:  00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
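
To make the call chain in this message concrete: migration_entry_to_page() looks up the page named by a migration entry and asserts that it is locked, because migration entries are only supposed to exist while the page being migrated is held locked. The toy Python model below illustrates that invariant only. It is not kernel code; the Page class, the KernelBug exception, the trips_bug helper and the pfn values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    locked: bool = False


class KernelBug(Exception):
    """Stands in for BUG_ON() firing in this toy model."""


def migration_entry_to_page(pages, pfn):
    """Model of the lookup: return the page, or 'BUG' if it is not locked."""
    page = pages[pfn]
    # A migration entry should only be visible while the page is locked,
    # so finding the page unlocked means the entry is stale or corrupt.
    if not page.locked:
        raise KernelBug(f"page pfn={pfn:#x} is not locked")
    return page


def trips_bug(pages, pfn):
    """Return True when the lookup would hit the assertion."""
    try:
        migration_entry_to_page(pages, pfn)
    except KernelBug:
        return True
    return False


# Hypothetical pages: one mid-migration (locked), one already unlocked.
pages = {0x1000: Page(0x1000, locked=True), 0x2000: Page(0x2000, locked=False)}
print(trips_bug(pages, 0x1000), trips_bug(pages, 0x2000))
```

In this model the fault handler reaching an unlocked page is the failing case, which matches the "stumbled onto an unlocked page" reading of the trace.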


* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-20 23:51   ` Andrew Morton
@ 2020-07-21  0:21     ` Matthew Wilcox
  2020-07-21  2:14       ` Matthew Wilcox
  2020-07-21 11:11     ` Kirill A. Shutemov
       [not found]     ` <20200723073744.5268-1-hdanton@sina.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-21  0:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs,
	Kirill A. Shutemov, Ralph Campbell, David Hildenbrand,
	Mike Kravetz

On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
> 
> > syzbot has found a reproducer for the following issue on:
> > 
> > HEAD commit:    4c43049f Add linux-next specific files for 20200716
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> 
> Thanks.
> 
> __handle_mm_fault
>   ->pmd_migration_entry_wait
>     ->migration_entry_to_page
> 
> stumbled onto an unlocked page.
> 
> I don't immediately see a cause.  Perhaps Matthew's "THP prep patches",
> perhaps something else.

That's interesting.  I'm currently chasing that signature too.  Of course,
almost anything can cause this.

What I do have in my tree is a patch to turn that WARN_ON into a
VM_BUG_ON_PAGE and what I see is not just an unlocked page, but one
that's been freed.

> Is it possible to perform a bisection?

My testing (xfstests with the full THP patch set) takes about 45 minutes
to hit this bug usually.  Sometimes two hours.  I haven't tried running
it against fewer patches because I thought it was related to having THPs
smaller than PMD size in the page cache.

I don't think it is my patches because they're essentially just a rename.
But of course, I've been wrong before ...


* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-21  0:21     ` Matthew Wilcox
@ 2020-07-21  2:14       ` Matthew Wilcox
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-21  2:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs,
	Kirill A. Shutemov, Ralph Campbell, David Hildenbrand,
	Mike Kravetz

On Tue, Jul 21, 2020 at 01:21:47AM +0100, Matthew Wilcox wrote:
> On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
> > 
> > > syzbot has found a reproducer for the following issue on:
> > > 
> > > HEAD commit:    4c43049f Add linux-next specific files for 20200716
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > > 
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> > 
> > Thanks.
> > 
> > __handle_mm_fault
> >   ->pmd_migration_entry_wait
> >     ->migration_entry_to_page
> > 
> > stumbled onto an unlocked page.
> > 
> > I don't immediately see a cause.  Perhaps Matthew's "THP prep patches",
> > perhaps something else.
> 
> That's interesting.  I'm currently chasing that signature too.  Of course,
> almost anything can cause this.
> 
> What I do have in my tree is a patch to turn that WARN_ON into a
> VM_BUG_ON_PAGE and what I see is not just an unlocked page, but one
> that's been freed.

Here's an example crash:

1404 086 (25392): drop_caches: 3
1404 page:00000000c8b7c292 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0xac20
1404 flags: 0x4000000000000000()
1404 raw: 4000000000000000 fffff7b501775808 fffff7b501ab7008 0000000000000000
1404 raw: 0000000000000001 0000000000000005 00000000ffffff7f 0000000000000000
1404 page dumped because: VM_BUG_ON_PAGE(!PageLocked(p))

(that's generic/086 for what it's worth, but you have to run
through a number of other tests in order to hit it; even starting at
generic/08[0123456] isn't enough to hit it, and it doesn't always hit)

A mapcount of -128 indicates PageBuddy, but I've also seen a mapcount of 0
indicating it's still on the per-cpu freelist.
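
As a worked check of the mapcount reading above, the raw dump can be decoded by hand. The sketch below assumes the x86-64 struct page layout of kernels from this period, where the seventh raw word holds _mapcount (shared with page_type) in its low 32 bits and _refcount in its high 32 bits, and where the reported mapcount is _mapcount + 1. The constants PAGE_TYPE_BASE and PG_BUDDY are written from memory of include/linux/page-flags.h and should be treated as assumptions, as should the helper names.

```python
import struct

PAGE_TYPE_BASE = 0xF0000000  # assumed value from page-flags.h
PG_BUDDY = 0x00000080        # assumed value from page-flags.h


def decode_word6(word):
    """Split the seventh raw word into (page_type, mapcount, refcount)."""
    page_type = word & 0xFFFFFFFF
    refcount = word >> 32
    # _mapcount is a signed 32-bit value stored biased by -1.
    (raw_mapcount,) = struct.unpack("<i", struct.pack("<I", page_type))
    return page_type, raw_mapcount + 1, refcount


def is_buddy(page_type):
    """Model of the PageBuddy test: the buddy bit is cleared in page_type."""
    return (page_type & (PAGE_TYPE_BASE | PG_BUDDY)) == PAGE_TYPE_BASE


# Seventh word of the dump: 00000000ffffff7f
page_type, mapcount, refcount = decode_word6(0x00000000FFFFFF7F)
print(mapcount, refcount, is_buddy(page_type))
```

Under these assumptions the word decodes to mapcount -128 and refcount 0 with the buddy test true, matching the "refcount:0 mapcount:-128" line. A page with mapcount 0 would have page_type 0xffffffff, for which the buddy test is false.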


* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-20 23:51   ` Andrew Morton
  2020-07-21  0:21     ` Matthew Wilcox
@ 2020-07-21 11:11     ` Kirill A. Shutemov
  2020-07-21 15:11       ` Jens Axboe
       [not found]     ` <20200723073744.5268-1-hdanton@sina.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-07-21 11:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Matthew Wilcox,
	Ralph Campbell, David Hildenbrand, Mike Kravetz, Johannes Weiner,
	Jens Axboe

On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
> 
> > syzbot has found a reproducer for the following issue on:
> > 
> > HEAD commit:    4c43049f Add linux-next specific files for 20200716
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> 
> Thanks.
> 
> __handle_mm_fault
>   ->pmd_migration_entry_wait
>     ->migration_entry_to_page
> 
> stumbled onto an unlocked page.
> 
> I don't immediately see a cause.  Perhaps Matthew's "THP prep patches",
> perhaps something else.
> 
> Is it possible to perform a bisection?

Maybe it's related to the new lock_page_async()?

> > ------------[ cut here ]------------
> > kernel BUG at include/linux/swapops.h:197!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> > RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> > Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> > RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> > RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> > RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> > R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> > FS:  00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  __handle_mm_fault mm/memory.c:4349 [inline]
> >  handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
> >  do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
> >  handle_page_fault arch/x86/mm/fault.c:1351 [inline]
> >  exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
> >  asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
> > RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
> > Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
> > RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
> > RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
> > RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
> > R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
> > R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
> >  copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
> >  raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
> >  _copy_to_user+0x11e/0x160 lib/usercopy.c:30
> >  copy_to_user include/linux/uaccess.h:168 [inline]
> >  do_pipe2+0x128/0x1b0 fs/pipe.c:1014
> >  __do_sys_pipe fs/pipe.c:1035 [inline]
> >  __se_sys_pipe fs/pipe.c:1033 [inline]
> >  __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
> >  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > RIP: 0033:0x45c1d9
> > Code: Bad RIP value.
> > RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
> > RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
> > RBP: 000000000078c070 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078c04c
> > R13: 00007ffcfc120ecf R14: 00007f55eb4779c0 R15: 000000000078c04c
> > Modules linked in:
> > ---[ end trace ea73d933d66ff0d4 ]---
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> > RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> > Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> > RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> > RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> > RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> > R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> > FS:  00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > 

-- 
 Kirill A. Shutemov


* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-21 11:11     ` Kirill A. Shutemov
@ 2020-07-21 15:11       ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2020-07-21 15:11 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton
  Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Matthew Wilcox,
	Ralph Campbell, David Hildenbrand, Mike Kravetz, Johannes Weiner

On 7/21/20 5:11 AM, Kirill A. Shutemov wrote:
> On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
>> On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
>>
>>> syzbot has found a reproducer for the following issue on:
>>>
>>> HEAD commit:    4c43049f Add linux-next specific files for 20200716
>>> git tree:       linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
>>> compiler:       gcc (GCC) 10.1.0-syz 20200507
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
>>
>> Thanks.
>>
>> __handle_mm_fault
>>   ->pmd_migration_entry_wait
>>     ->migration_entry_to_page
>>
>> stumbled onto an unlocked page.
>>
>> I don't immediately see a cause.  Perhaps Matthew's "THP prep patches",
>> perhaps something else.
>>
>> Is it possible to perform a bisection?
> 
> Maybe it's related to the new lock_page_async()?

Shouldn't be used for any of those paths at all, though of course can't
rule out a bug that triggers it somehow. A bisection would be nice.


-- 
Jens Axboe



* Re: kernel BUG at include/linux/swapops.h:LINE!
       [not found]     ` <20200723073744.5268-1-hdanton@sina.com>
@ 2020-07-24 11:13       ` Kirill A. Shutemov
  2020-07-26 16:49         ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-07-24 11:13 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel,
	linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner,
	Jens Axboe, Markus Elfring

On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote:
> 
> On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote:
> > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote:
> > > 
> > > > syzbot has found a reproducer for the following issue on:
> > > > 
> > > > HEAD commit:    4c43049f Add linux-next specific files for 20200716
> > > > git tree:       linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> > > 
> > > Thanks.
> > > 
> > > __handle_mm_fault
> > >   ->pmd_migration_entry_wait
> > >     ->migration_entry_to_page
> > > 
> > > stumbled onto an unlocked page.
> > > 
> > > I don't immediately see a cause.  Perhaps Matthew's "THP prep patches",
> > > perhaps something else.
> > > 
> > > Is it possible to perform a bisection?
> > 
> > Maybe it's related to the new lock_page_async()?
> 
> Or is there likely the window that after copy_huge_pmd() the src pmd migrate
> entry is removed and the page unlocked but the dst is not?

No.

copy_huge_pmd() runs with mmap_lock held exclusively on the source side, and the
destination side is not running yet.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-24 11:13       ` Kirill A. Shutemov
@ 2020-07-26 16:49         ` Matthew Wilcox
  2020-07-27 10:31           ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-26 16:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Fri, Jul 24, 2020 at 02:13:11PM +0300, Kirill A. Shutemov wrote:
> On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote:
> > Or could there be a window after copy_huge_pmd() in which the src pmd migration
> > entry is removed and the page unlocked, but the dst entry is not?
> 
> No.
> 
> copy_huge_pmd() runs with mmap_lock held exclusively on the source side, and the
> destination side is not running yet.

The one I'm hitting is huge-related though.

I added this debug:

+++ b/include/linux/swapops.h
@@ -165,8 +165,9 @@ static inline struct page *device_private_entry_to_page(swp_entry_t entry)
 #ifdef CONFIG_MIGRATION
 static inline swp_entry_t make_migration_entry(struct page *page, int write)
 {
-       BUG_ON(!PageLocked(compound_head(page)));
+       VM_BUG_ON_PAGE(!PageLocked(page), page);
 
+if (PageCompound(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
        return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
                        page_to_pfn(page));
 }
@@ -194,7 +195,11 @@ static inline struct page *migration_entry_to_page(swp_entry_t entry)
         * Any use of migration entries may only occur while the
         * corresponding page is locked
         */
-       BUG_ON(!PageLocked(compound_head(p)));
+       if (!PageLocked(p)) {
+               dump_page(p, "not locked");
+               printk("swap entry %d.%lx\n", swp_type(entry), swp_offset(entry));
+               BUG();
+       }
        return p;
 }
 

and got useful output (while running generic/086):

1457 086 (20181): drop_caches: 3
1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7
1457 aops:def_blk_aops ino:0
1457 flags: 0x4000000000002030(lru|active|private)
1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578
1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000
1457 page dumped because: not locked
1457 swap entry 30.229e7
1457 ------------[ cut here ]------------
1457 kernel BUG at include/linux/swapops.h:201!
1457 invalid opcode: 0000 [#1] SMP PTI
1457 CPU: 3 PID: 646 Comm: check Kdump: loaded Tainted: G        W         5.8.0-rc6-00067-gd8b18bdf9870-dirty #355
1457 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
1457 RIP: 0010:__migration_entry_wait+0x109/0x110
[...]

Looking back in the trace, I see:

...
1457 pfn 229e5 order 9
1457 pfn 229e6 order 9
1457 pfn 229e7 order 9
1457 pfn 229e8 order 9
1457 pfn 229e9 order 9
...

so I would say we have a refcount problem.  I've probably made it worse by
creating more THPs, but I don't think I'm the originator of the problem.

I know very little about the migration code today.  I suspect I'm going
to have to learn about it next week.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-26 16:49         ` Matthew Wilcox
@ 2020-07-27 10:31           ` Kirill A. Shutemov
  2020-07-27 12:03             ` Matthew Wilcox
       [not found]             ` <20200727125950.12048-1-hdanton@sina.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-07-27 10:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote:
> Looking back in the trace, I see:
> 
> ...
> 1457 pfn 229e5 order 9
> 1457 pfn 229e6 order 9
> 1457 pfn 229e7 order 9
> 1457 pfn 229e8 order 9
> 1457 pfn 229e9 order 9
> ...
> 
> so I would say we have a refcount problem.  I've probably made it worse by
> creating more THPs, but I don't think I'm the originator of the problem.
> 
> I know very little about the migration code today.  I suspect I'm going
> to have to learn about it next week.

It would be interesting to know whether the migration entries ever got removed
for that pfn. I mean, whether remove_migration_pte() got called for it.

It could be an rmap issue too. Maybe it misses the PMD in remove_migration_ptes()
or something.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-27 10:31           ` Kirill A. Shutemov
@ 2020-07-27 12:03             ` Matthew Wilcox
  2020-07-29 19:21               ` Kirill A. Shutemov
       [not found]             ` <20200727125950.12048-1-hdanton@sina.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-27 12:03 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Mon, Jul 27, 2020 at 01:31:40PM +0300, Kirill A. Shutemov wrote:
> It would be interesting to know whether the migration entries ever got removed
> for that pfn. I mean, whether remove_migration_pte() got called for it.
> 
> It could be an rmap issue too. Maybe it misses the PMD in remove_migration_ptes()
> or something.

It's not mapped with a PMD.  I tweaked my debugging slightly:

 static inline swp_entry_t make_migration_entry(struct page *page, int write)
 {
-       BUG_ON(!PageLocked(compound_head(page)));
+       VM_BUG_ON_PAGE(!PageLocked(page), page);
 
+if (PageHead(page)) dump_page(page, "make entry");
+if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));

1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
1523 page dumped because: make entry
1523 pfn 1dc01 order 9
1523 pfn 1dc02 order 9
1523 pfn 1dc03 order 9
...

Notice that it's an anonymous page, so it's not related to my work.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
       [not found]             ` <20200727125950.12048-1-hdanton@sina.com>
@ 2020-07-27 13:44               ` Matthew Wilcox
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-27 13:44 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Mon, Jul 27, 2020 at 08:59:50PM +0800, Hillf Danton wrote:
> Can you elaborate on the difference between the two dumps?

You didn't trim anything, so I have no idea which two dumps you mean.

I'll annotate below ...

> > > On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote:
> > > > 1457 086 (20181): drop_caches: 3
> > > > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7
> > > > 1457 aops:def_blk_aops ino:0
> > > > 1457 flags: 0x4000000000002030(lru|active|private)
> > > > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578
> > > > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000
> > > > 1457 page dumped because: not locked
> > > > 1457 swap entry 30.229e7

This is a dump of the page that was found when looking up the migration entry.

> On Mon, 27 Jul 2020 13:03:10 +0100 Matthew Wilcox wrote:
> > It's not mapped with a PMD.  I tweaked my debugging slightly:
> > 
> >  static inline swp_entry_t make_migration_entry(struct page *page, int write)
> >  {
> > -       BUG_ON(!PageLocked(compound_head(page)));
> > +       VM_BUG_ON_PAGE(!PageLocked(page), page);
> >  
> > +if (PageHead(page)) dump_page(page, "make entry");
> > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
> > 
> > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
> > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
> > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
> > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
> > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
> > 1523 page dumped because: make entry

This is dumping the page when we create the entry.

For completeness, here's the page that we find from the same run.

1523 page:00000000a18100e6 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1ddde
1523 flags: 0x4000000000000000()
1523 raw: 4000000000000000 dead000000000100 dead000000000122 0000000000000000
1523 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
1523 page dumped because: not locked

(an order-9 page will occupy PFNs 0x1dc00-0x1ddff)

It's clearly been freed and is still sitting on the per-CPU free list.
I've also seen them as PageBuddy and, as in the first example above,
reallocated to a different user.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-27 12:03             ` Matthew Wilcox
@ 2020-07-29 19:21               ` Kirill A. Shutemov
  2020-07-29 19:54                 ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2020-07-29 19:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Mon, Jul 27, 2020 at 01:03:10PM +0100, Matthew Wilcox wrote:
> It's not mapped with a PMD.  I tweaked my debugging slightly:
> 
>  static inline swp_entry_t make_migration_entry(struct page *page, int write)
>  {
> -       BUG_ON(!PageLocked(compound_head(page)));
> +       VM_BUG_ON_PAGE(!PageLocked(page), page);
>  
> +if (PageHead(page)) dump_page(page, "make entry");
> +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
> 
> 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
> 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
> 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
> 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
> 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
> 1523 page dumped because: make entry
> 1523 pfn 1dc01 order 9
> 1523 pfn 1dc02 order 9
> 1523 pfn 1dc03 order 9
> ...
> 
> Notice that it's an anonymous page, so it's not related to my work.

I don't have much hope, but could you check whether the patch below blows
up?

Could you share the setup you use to trigger the issue? I want to try it
myself.

diff --git a/mm/migrate.c b/mm/migrate.c
index 40cd7016ae6f..c3148e1261d0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -215,6 +215,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 	pte_t pte;
 	swp_entry_t entry;
 
+	VM_BUG_ON_PAGE(PageTail(pvmw.page), pvmw.page);
 	VM_BUG_ON_PAGE(PageTail(page), page);
 	while (page_vma_mapped_walk(&pvmw)) {
 		if (PageKsm(page))
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-29 19:21               ` Kirill A. Shutemov
@ 2020-07-29 19:54                 ` Matthew Wilcox
  2020-07-29 22:11                   ` Matthew Wilcox
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-29 19:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Wed, Jul 29, 2020 at 10:21:51PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 27, 2020 at 01:03:10PM +0100, Matthew Wilcox wrote:
> > > It would be interesting to know whether the migration entries ever got removed
> > > for that pfn. I mean, whether remove_migration_pte() got called for it.
> > > 
> > > It could be an rmap issue too. Maybe it misses the PMD in remove_migration_ptes()
> > > or something.
> > 
> > It's not mapped with a PMD.  I tweaked my debugging slightly:
> > 
> >  static inline swp_entry_t make_migration_entry(struct page *page, int write)
> >  {
> > -       BUG_ON(!PageLocked(compound_head(page)));
> > +       VM_BUG_ON_PAGE(!PageLocked(page), page);
> >  
> > +if (PageHead(page)) dump_page(page, "make entry");
> > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
> > 
> > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
> > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
> > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
> > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
> > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
> > 1523 page dumped because: make entry
> > 1523 pfn 1dc01 order 9
> > 1523 pfn 1dc02 order 9
> > 1523 pfn 1dc03 order 9
> > ...
> > 
> > Notice that it's an anonymous page, so it's not related to my work.
> 
> I don't have much hope, but could you check whether the patch below blows
> up?

Running it now.  Results probably in twenty minutes.

> Could you share the setup you use to trigger the issue? I want to try it
> myself.

Head commit d8b18bdf9870b131802d641f5e7f32ddc53dcce3 which you can find
in http://git.infradead.org/users/willy/pagecache.git

I'm using Kent Overstreet's ktest as the base:
https://github.com/koverstreet/ktest

From the root of the kernel tree, I type:
$ ../ktest/build-test-kernel run ../ktest/tests/xfs.ktest

xfs.ktest is not in Kent's repo:

#!/bin/bash

require-kernel-config XFS_FS
require-kernel-config XFS_QUOTA XFS_POSIX_ACL XFS_RT XFS_ONLINE_SCRUB
require-kernel-config XFS_ONLINE_REPAIR XFS_DEBUG XFS_ASSERT_FATAL
require-kernel-config QUOTA

require-lib xfstests.sh

run_tests()
{
    run_xfstests xfs "$@"
}

I think that's all you'll need to get going.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-29 19:54                 ` Matthew Wilcox
@ 2020-07-29 22:11                   ` Matthew Wilcox
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2020-07-29 22:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot,
	linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz,
	Johannes Weiner, Jens Axboe, Markus Elfring

On Wed, Jul 29, 2020 at 08:54:32PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 29, 2020 at 10:21:51PM +0300, Kirill A. Shutemov wrote:
> > I don't have much hope, but could you try if the patch below would blow
> > up?
> 
> Running it now.  Results probably in twenty minutes.

It didn't blow up.  I added a dump_stack() after the call to dump_page()
and got this ...

2922 page:0000000085a5c107 refcount:474 mapcount:1 mapping:0000000000000000 index:0x559e98a00 pfn:0x35200
2922 head:0000000085a5c107 order:9 compound_mapcount:0 compound_pincount:0
2922 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
2922 raw: 400000000009003d ffffe8e5c1bbaf48 ffffe8e5c046ee88 ffffa2e7f3787ec9
2922 raw: 0000000559e98a00 0000000000000000 000001da00000000 0000000000000000
2922 page dumped because: make entry
2922 CPU: 5 PID: 23471 Comm: dd Kdump: loaded Tainted: G        W         5.8.0-rc6-00067-gd8b18bdf9870-dirty #358
2922 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
2922 Call Trace:
2922  dump_stack+0x5e/0x7a
2922  try_to_unmap_one+0x846/0x860
2922  rmap_walk_anon+0x13d/0x2a0
2922  rmap_walk_locked+0x23/0x30
2922  try_to_unmap+0x64/0xbc
2922  split_huge_page_to_list+0x188/0xdb0
2922  deferred_split_scan+0x148/0x240
2922  shrink_slab.constprop.0+0x198/0x330
2922  shrink_node+0x1a8/0x440
2922  try_to_free_pages+0x18f/0x480
2922  __alloc_pages_slowpath.constprop.0+0x297/0xca0
2922  __alloc_pages_nodemask+0x1ba/0x1e0
2922  pagecache_get_page+0xd8/0x330
2922  grab_cache_page_write_begin+0x1c/0x40
2922  iomap_write_begin+0x2d6/0x6d0
2922  iomap_write_actor+0x8b/0x1c0
2922  iomap_apply+0xe3/0x310
2922  iomap_file_buffered_write+0x5c/0x80
2922  xfs_file_buffered_aio_write+0xbd/0x310
2922  xfs_file_write_iter+0xa8/0xc0
2922  new_sync_write+0xf5/0x170
2922  vfs_write+0x191/0x1e0

I think that's interesting because it's not trying to allocate a huge
page itself (I didn't touch the write_begin path).  Rather, I presume
the additional memory pressure from allocating huge pages is causing
anonymous pages to be split to free up memory.
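
For readers following along, here is a minimal sketch (not part of the
original message) of how the trace above can be read mechanically. It
strips the offsets from each frame and checks that the huge page split
sits beneath direct reclaim, which is the reading given in the paragraph
above. The helper name frame_symbols and the regular expression are
illustrative assumptions; the frames are copied verbatim from the log.

```python
import re

# Frames copied verbatim from the dump_stack() output above,
# including the "2922" log prefix.
TRACE = """\
2922  dump_stack+0x5e/0x7a
2922  try_to_unmap_one+0x846/0x860
2922  rmap_walk_anon+0x13d/0x2a0
2922  rmap_walk_locked+0x23/0x30
2922  try_to_unmap+0x64/0xbc
2922  split_huge_page_to_list+0x188/0xdb0
2922  deferred_split_scan+0x148/0x240
2922  shrink_slab.constprop.0+0x198/0x330
2922  shrink_node+0x1a8/0x440
2922  try_to_free_pages+0x18f/0x480
2922  __alloc_pages_slowpath.constprop.0+0x297/0xca0
2922  __alloc_pages_nodemask+0x1ba/0x1e0
2922  pagecache_get_page+0xd8/0x330
2922  grab_cache_page_write_begin+0x1c/0x40
2922  iomap_write_begin+0x2d6/0x6d0
2922  iomap_write_actor+0x8b/0x1c0
2922  iomap_apply+0xe3/0x310
2922  iomap_file_buffered_write+0x5c/0x80
2922  xfs_file_buffered_aio_write+0xbd/0x310
2922  xfs_file_write_iter+0xa8/0xc0
2922  new_sync_write+0xf5/0x170
2922  vfs_write+0x191/0x1e0
"""

# Hypothetical pattern: optional numeric log prefix, then symbol+0xoff/0xsize.
FRAME_RE = re.compile(r"^\s*(?:\d+\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_symbols(trace):
    """Return symbol names, innermost frame first, without compiler suffixes."""
    symbols = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            # Drop suffixes such as ".constprop.0" added by the compiler.
            symbols.append(match.group(1).split(".")[0])
    return symbols


symbols = frame_symbols(TRACE)

# Innermost frames come first, so a smaller index means "called later".
split_idx = symbols.index("split_huge_page_to_list")
shrinker_idx = symbols.index("deferred_split_scan")
reclaim_idx = symbols.index("try_to_free_pages")
write_idx = symbols.index("pagecache_get_page")

# The split is reached from the deferred-split shrinker, which runs under
# direct reclaim, which in turn was entered from a page cache allocation.
split_under_reclaim = split_idx < shrinker_idx < reclaim_idx < write_idx
print("split triggered via reclaim:", split_under_reclaim)
```

Reading the frames this way shows the ordering only; it says nothing
about why the migration entry was later found without its page locked.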

It survived all the way to generic/224 this run, but I don't think
that's relevant.
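
As a reading aid (again, not part of the original message), the sketch
below decodes the three dump_page() lines quoted above: the number of
base pages covered by an order-9 compound page, the pfn range of its
tail pages, and the flag names encoded in the low bits of the flags
word. The bit-to-name table is an assumption about the enum pageflags
layout in this kernel build; it is only checked here against the names
that dump_page() itself printed.

```python
import re

# Lines copied from the dump above, without the "2922" log prefix.
PAGE_LINE = ("page:0000000085a5c107 refcount:474 mapcount:1 "
             "mapping:0000000000000000 index:0x559e98a00 pfn:0x35200")
HEAD_LINE = "head:0000000085a5c107 order:9 compound_mapcount:0 compound_pincount:0"
FLAGS_LINE = ("anon flags: 0x400000000009003d"
              "(locked|uptodate|dirty|lru|active|head|swapbacked)")

# ASSUMPTION: bit positions of the low page flags for this build.
# They are not stated in the thread; the table is validated below only
# against the names printed in the dump.
ASSUMED_FLAG_BITS = {
    0: "locked", 1: "referenced", 2: "uptodate", 3: "dirty", 4: "lru",
    5: "active", 6: "workingset", 7: "waiters", 8: "error", 9: "slab",
    10: "owner_priv_1", 11: "arch_1", 12: "reserved", 13: "private",
    14: "private_2", 15: "writeback", 16: "head", 17: "mappedtodisk",
    18: "reclaim", 19: "swapbacked",
}

head_pfn = int(re.search(r"pfn:(0x[0-9a-f]+)", PAGE_LINE).group(1), 16)
order = int(re.search(r"order:(\d+)", HEAD_LINE).group(1))

flags_match = re.search(r"flags: (0x[0-9a-f]+)\(([^)]*)\)", FLAGS_LINE)
raw_flags = int(flags_match.group(1), 16)
printed_names = flags_match.group(2).split("|")

# An order-N compound page spans 2**N base pages: one head plus tails.
nr_pages = 1 << order
first_tail_pfn = head_pfn + 1
last_tail_pfn = head_pfn + nr_pages - 1

# Decode only the low bits covered by the assumed table; the high bits of
# the flags word carry other information and are ignored here.
decoded_names = [name for bit, name in sorted(ASSUMED_FLAG_BITS.items())
                 if (raw_flags >> bit) & 1]

print(f"{nr_pages} base pages, tails {first_tail_pfn:#x}..{last_tail_pfn:#x}")
print("decoded flags:", "|".join(decoded_names))
```

The decoded names agree with the list printed by dump_page(), including
"locked", so the head page was locked at the point the migration entries
were created in this run.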

* Re: [syzbot] kernel BUG at include/linux/swapops.h:LINE!
  2020-05-30 17:05 kernel BUG at include/linux/swapops.h:LINE! syzbot
  2020-07-19 21:10 ` syzbot
@ 2021-05-08 11:24 ` syzbot
  1 sibling, 0 replies; 16+ messages in thread
From: syzbot @ 2021-05-08 11:24 UTC (permalink / raw)
  To: Markus.Elfring, akpm, axboe, david, hannes, hdanton, jennifer,
	kirill.shutemov, kirill, linux-kernel, linux-mm, mike.kravetz,
	rcampbell, syzkaller-bugs, willy

syzbot has found a reproducer for the following issue on:

HEAD commit:    869a85b9 Add linux-next specific files for 20210507
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15b48a63d00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b72885037018d06d
dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10479fd5d00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1565e995d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:197!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8460 Comm: syz-executor246 Not tainted 5.12.0-next-20210507-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:zap_huge_pmd+0xe5b/0x1110 mm/huge_memory.c:1697
Code: 2b 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 a8 f6 ff ff e8 18 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 66 f7 ff ff e8 05 3f b8 ff <0f> 0b e8 fe 3e b8 ff 31 f6 31 ff 49 bc 00 f0 ff ff ff ff 0f 00 e8
RSP: 0018:ffffc90001a2f730 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888024bc5580 RSI: ffffffff81bc972b RDI: 0000000000000003
RBP: ffffc90001a2fa48 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81bc8ec8 R11: 0000000000000000 R12: ffff88802c9a5800
R13: ffffea0000e58080 R14: ffff8880303bfea0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c8168 CR3: 0000000016b36000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 zap_pmd_range mm/memory.c:1361 [inline]
 zap_pud_range mm/memory.c:1403 [inline]
 zap_p4d_range mm/memory.c:1424 [inline]
 unmap_page_range+0x1aa4/0x2650 mm/memory.c:1445
 unmap_single_vma+0x198/0x300 mm/memory.c:1490
 unmap_vmas+0x16d/0x2f0 mm/memory.c:1522
 exit_mmap+0x2a8/0x590 mm/mmap.c:3207
 __mmput+0x122/0x470 kernel/fork.c:1096
 mmput+0x58/0x60 kernel/fork.c:1117
 exit_mm kernel/exit.c:502 [inline]
 do_exit+0xb0a/0x2a60 kernel/exit.c:813
 do_group_exit+0x125/0x310 kernel/exit.c:923
 get_signal+0x47f/0x2150 kernel/signal.c:2856
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:789
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x171/0x280 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301
 do_syscall_64+0x47/0xb0 arch/x86/entry/common.c:57
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4458f9
Code: Unable to access opcode bytes at RIP 0x4458cf.
RSP: 002b:00007f45f24e4318 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000004ca408 RCX: 00000000004458f9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000004ca408
RBP: 00000000004ca400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000001000000020
R13: 00007ffeb676771f R14: 00007f45f24e4400 R15: 0000000000022000
Modules linked in:
---[ end trace 8c9f5c48deec1bb7 ]---
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:zap_huge_pmd+0xe5b/0x1110 mm/huge_memory.c:1697
Code: 2b 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 a8 f6 ff ff e8 18 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 66 f7 ff ff e8 05 3f b8 ff <0f> 0b e8 fe 3e b8 ff 31 f6 31 ff 49 bc 00 f0 ff ff ff ff 0f 00 e8
RSP: 0018:ffffc90001a2f730 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888024bc5580 RSI: ffffffff81bc972b RDI: 0000000000000003
RBP: ffffc90001a2fa48 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81bc8ec8 R11: 0000000000000000 R12: ffff88802c9a5800
R13: ffffea0000e58080 R14: ffff8880303bfea0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004c8168 CR3: 000000000bc8e000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Thread overview: 16+ messages
2020-05-30 17:05 kernel BUG at include/linux/swapops.h:LINE! syzbot
2020-07-19 21:10 ` syzbot
2020-07-20 23:51   ` Andrew Morton
2020-07-21  0:21     ` Matthew Wilcox
2020-07-21  2:14       ` Matthew Wilcox
2020-07-21 11:11     ` Kirill A. Shutemov
2020-07-21 15:11       ` Jens Axboe
     [not found]     ` <20200723073744.5268-1-hdanton@sina.com>
2020-07-24 11:13       ` Kirill A. Shutemov
2020-07-26 16:49         ` Matthew Wilcox
2020-07-27 10:31           ` Kirill A. Shutemov
2020-07-27 12:03             ` Matthew Wilcox
2020-07-29 19:21               ` Kirill A. Shutemov
2020-07-29 19:54                 ` Matthew Wilcox
2020-07-29 22:11                   ` Matthew Wilcox
     [not found]             ` <20200727125950.12048-1-hdanton@sina.com>
2020-07-27 13:44               ` Matthew Wilcox
2021-05-08 11:24 ` [syzbot] " syzbot
