* kernel BUG at include/linux/swapops.h:LINE!
@ 2020-05-30 17:05 syzbot
  2020-07-19 21:10 ` syzbot
  2021-05-08 11:24 ` [syzbot] " syzbot
  0 siblings, 2 replies; 19+ messages in thread
From: syzbot @ 2020-05-30 17:05 UTC
To: akpm, linux-kernel, linux-mm, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit: 9cb1fd0e Linux 5.7-rc7
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1788a54a100000
kernel config: https://syzkaller.appspot.com/x/.config?x=cca7550d53ffa599
dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:197!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 30075 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x5b4/0x660 mm/migrate.c:368
Code: 32 e8 10 9f c0 ff 48 c7 c6 e0 a4 35 88 4c 89 e7 e8 81 a1 ec ff 0f 0b e8 fa 9e c0 ff 4d 8d 66 ff e9 1c fe ff ff e8 ec 9e c0 ff <0f> 0b e8 e5 9e c0 ff 0f 0b e8 de 9e c0 ff 4c 8d 65 ff eb c3 48 89
RSP: 0000:ffffc90015fffc70 EFLAGS: 00010293
RAX: ffff8880544f4180 RBX: 0000000000000000 RCX: ffffffff81b29e18
RDX: 0000000000000000 RSI: ffffffff81b29fc4 RDI: 0000000000000001
RBP: ffffea0000d40080 R08: ffff8880544f4180 R09: fffff940001a8001
R10: ffffea0000d40007 R11: fffff940001a8000 R12: ffffea0000d40000
R13: 1ffff92002bfff90 R14: ffffea0001230f08 R15: ffff8880503e11e0
FS: 00007fdb7a134700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c0 CR3: 00000000a703c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __handle_mm_fault+0x1c0e/0x3c90 mm/memory.c:4327
 handle_mm_fault+0x1a5/0x660 mm/memory.c:4382
 do_user_addr_fault arch/x86/mm/fault.c:1464 [inline]
 do_page_fault+0x55b/0x13da arch/x86/mm/fault.c:1535
 page_fault+0x39/0x40 arch/x86/entry/entry_64.S:1203
RIP: 0033:0x45ca35
Code: 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 48 3d 01 f0 ff ff 0f 83 db b6 fb ff <c3> 66 2e 0f 1f 84 00 00 00 00 00 48 85 ff 41 57 4d 89 cf 41 56 41
RSP: 002b:00000000200001c0 EFLAGS: 00010217
RAX: 0000000000000000 RBX: 00000000004dabc0 RCX: 000000000045ca29
RDX: 0000000020000140 RSI: 00000000200001c0 RDI: 0000000000000000
RBP: 000000000078bfa0 R08: 0000000020000300 R09: 0000000000000000
R10: 00000000200002c0 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000076 R14: 00000000004c331e R15: 00007fdb7a1346d4
Modules linked in:
---[ end trace 5096692b6266afca ]---
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x5b4/0x660 mm/migrate.c:368
Code: 32 e8 10 9f c0 ff 48 c7 c6 e0 a4 35 88 4c 89 e7 e8 81 a1 ec ff 0f 0b e8 fa 9e c0 ff 4d 8d 66 ff e9 1c fe ff ff e8 ec 9e c0 ff <0f> 0b e8 e5 9e c0 ff 0f 0b e8 de 9e c0 ff 4c 8d 65 ff eb c3 48 89
RSP: 0000:ffffc90015fffc70 EFLAGS: 00010293
RAX: ffff8880544f4180 RBX: 0000000000000000 RCX: ffffffff81b29e18
RDX: 0000000000000000 RSI: ffffffff81b29fc4 RDI: 0000000000000001
RBP: ffffea0000d40080 R08: ffff8880544f4180 R09: fffff940001a8001
R10: ffffea0000d40007 R11: fffff940001a8000 R12: ffffea0000d40000
R13: 1ffff92002bfff90 R14: ffffea0001230f08 R15: ffff8880503e11e0
FS: 00007fdb7a134700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c0 CR3: 00000000a703c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
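For readers less used to the oops format: in a "Code:" line, the byte in angle brackets is the one at the faulting RIP. The sketch below pulls that byte and its successor out of the kernel-mode "Code:" line of the report above and checks them against 0f 0b, the x86 ud2 instruction that BUG() compiles to, which is why the report reads "invalid opcode". The helper name trapping_bytes is made up for this illustration and is not part of any kernel tooling.

```python
import re

# Kernel-mode "Code:" line copied from the report above.
CODE_LINE = (
    "32 e8 10 9f c0 ff 48 c7 c6 e0 a4 35 88 4c 89 e7 e8 81 a1 ec ff "
    "0f 0b e8 fa 9e c0 ff 4d 8d 66 ff e9 1c fe ff ff e8 ec 9e c0 ff "
    "<0f> 0b e8 e5 9e c0 ff 0f 0b e8 de 9e c0 ff 4c 8d 65 ff eb c3 48 89"
)

# x86 encoding of ud2, the instruction BUG() emits on this architecture.
UD2 = bytes([0x0F, 0x0B])


def trapping_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <xx> (the byte at RIP).

    Hypothetical helper for illustration only.
    """
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        marked = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if marked:
            following = tokens[i + 1:i + count]
            return bytes.fromhex(marked.group(1) + "".join(following))
    raise ValueError("no <xx> marker found in Code: line")


if __name__ == "__main__":
    insn = trapping_bytes(CODE_LINE)
    print(insn.hex(), "is ud2" if insn == UD2 else "is not ud2")
```

The same check applies unchanged to the "Code:" line of the later linux-next report, since the marked bytes there are also 0f 0b.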
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: syzbot @ 2020-07-19 21:10 UTC
To: akpm, linux-kernel, linux-mm, syzkaller-bugs

syzbot has found a reproducer for the following issue on:

HEAD commit: 4c43049f Add linux-next specific files for 20200716
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:197!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __handle_mm_fault mm/memory.c:4349 [inline]
 handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
 do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
 handle_page_fault arch/x86/mm/fault.c:1351 [inline]
 exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
 asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
 copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
 raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
 _copy_to_user+0x11e/0x160 lib/usercopy.c:30
 copy_to_user include/linux/uaccess.h:168 [inline]
 do_pipe2+0x128/0x1b0 fs/pipe.c:1014
 __do_sys_pipe fs/pipe.c:1035 [inline]
 __se_sys_pipe fs/pipe.c:1033 [inline]
 __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45c1d9
Code: Bad RIP value.
RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
RBP: 000000000078c070 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078c04c
R13: 00007ffcfc120ecf R14: 00007f55eb4779c0 R15: 000000000078c04c
Modules linked in:
---[ end trace ea73d933d66ff0d4 ]---
RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
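Two fields in the report above tie the page fault to the syscall shown in the trace, and they can be cross-checked mechanically. ORIG_RAX holds the syscall number, and 0x16 is 22, which is pipe(2) on x86-64, matching __x64_sys_pipe. CR2 holds the faulting address, and it equals the user-mode RDI, the first syscall argument, i.e. the buffer that copy_to_user() was filling. A minimal sketch of that check follows; parse_regs is a hypothetical helper written for this note, and only the register lines needed are copied in.

```python
import re

# User-mode register lines and the CR2 line, copied from the report above.
USER_REGS = """\
RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
"""
CR2_LINE = "CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0"

NR_PIPE = 22  # x86-64 syscall number of pipe(2)


def parse_regs(text):
    """Map register names to integers for every 'NAME: hex' pair in oops text.

    An optional 'ssss:' segment-selector prefix (as in 'RSP: 002b:...') is
    skipped. Hypothetical helper for illustration only.
    """
    pairs = re.findall(r"\b([A-Z_0-9]+): (?:[0-9a-f]{4}:)?([0-9a-f]+)\b", text)
    return {name: int(value, 16) for name, value in pairs}


if __name__ == "__main__":
    user = parse_regs(USER_REGS)
    cr2 = parse_regs(CR2_LINE)["CR2"]
    print("syscall is pipe:", user["ORIG_RAX"] == NR_PIPE)
    print("fault address is the pipe() buffer:", cr2 == user["RDI"])
```

Both checks hold for this report, which is consistent with the fault being taken while the kernel wrote the two file descriptors back to user space.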
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: Andrew Morton @ 2020-07-20 23:51 UTC
To: syzbot
Cc: linux-kernel, linux-mm, syzkaller-bugs, Matthew Wilcox, Kirill A. Shutemov, Ralph Campbell, David Hildenbrand, Mike Kravetz

On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:

> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 4c43049f Add linux-next specific files for 20200716
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> compiler: gcc (GCC) 10.1.0-syz 20200507
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com

Thanks.

__handle_mm_fault
  ->pmd_migration_entry_wait
    ->migration_entry_to_page

stumbled onto an unlocked page.

I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
perhaps something else.

Is it possible to perform a bisection?

> ------------[ cut here ]------------
> kernel BUG at include/linux/swapops.h:197!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> __handle_mm_fault mm/memory.c:4349 [inline]
> handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
> do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
> handle_page_fault arch/x86/mm/fault.c:1351 [inline]
> exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
> asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
> RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
> Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
> RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
> RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
> R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
> R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
> copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
> raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
> _copy_to_user+0x11e/0x160 lib/usercopy.c:30
> copy_to_user include/linux/uaccess.h:168 [inline]
> do_pipe2+0x128/0x1b0 fs/pipe.c:1014
> __do_sys_pipe fs/pipe.c:1035 [inline]
> __se_sys_pipe fs/pipe.c:1033 [inline]
> __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
> do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45c1d9
> Code: Bad RIP value.
> RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
> RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
> RBP: 000000000078c070 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078c04c
> R13: 00007ffcfc120ecf R14: 00007f55eb4779c0 R15: 000000000078c04c
> Modules linked in:
> ---[ end trace ea73d933d66ff0d4 ]---
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: Matthew Wilcox @ 2020-07-21 0:21 UTC
To: Andrew Morton
Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Kirill A. Shutemov, Ralph Campbell, David Hildenbrand, Mike Kravetz

On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
>
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 4c43049f Add linux-next specific files for 20200716
> > git tree: linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > compiler: gcc (GCC) 10.1.0-syz 20200507
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
>
> Thanks.
>
> __handle_mm_fault
>   ->pmd_migration_entry_wait
>     ->migration_entry_to_page
>
> stumbled onto an unlocked page.
>
> I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
> perhaps something else.

That's interesting. I'm currently chasing that signature too. Of course,
almost anything can cause this.

What I do have in my tree is a patch to turn that WARN_ON into a
VM_BUG_ON_PAGE and what I see is not just an unlocked page, but one
that's been freed.

> Is it possible to perform a bisection?

My testing (xfstests with the full THP patch set) takes about 45 minutes
to hit this bug usually. Sometimes two hours. I haven't tried running it
against fewer patches because I thought it was related to having THPs
smaller than PMD size in the page cache. I don't think it is my patches
because they're essentially just a rename. But of course, I've been
wrong before ...
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: Matthew Wilcox @ 2020-07-21 2:14 UTC
To: Andrew Morton
Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Kirill A. Shutemov, Ralph Campbell, David Hildenbrand, Mike Kravetz

On Tue, Jul 21, 2020 at 01:21:47AM +0100, Matthew Wilcox wrote:
> On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
> >
> > > syzbot has found a reproducer for the following issue on:
> > >
> > > HEAD commit: 4c43049f Add linux-next specific files for 20200716
> > > git tree: linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> >
> > Thanks.
> >
> > __handle_mm_fault
> >   ->pmd_migration_entry_wait
> >     ->migration_entry_to_page
> >
> > stumbled onto an unlocked page.
> >
> > I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
> > perhaps something else.
>
> That's interesting. I'm currently chasing that signature too. Of course,
> almost anything can cause this.
>
> What I do have in my tree is a patch to turn that WARN_ON into a
> VM_BUG_ON_PAGE and what I see is not just an unlocked page, but one
> that's been freed.

Here's an example crash:

1404 086 (25392): drop_caches: 3
1404 page:00000000c8b7c292 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0xac20
1404 flags: 0x4000000000000000()
1404 raw: 4000000000000000 fffff7b501775808 fffff7b501ab7008 0000000000000000
1404 raw: 0000000000000001 0000000000000005 00000000ffffff7f 0000000000000000
1404 page dumped because: VM_BUG_ON_PAGE(!PageLocked(p))

(that's generic/086 for what it's worth, but you have to run through a
number of other tests in order to hit it; even starting at
generic/08[0123456] isn't enough to hit it, and it doesn't always hit)

A mapcount of -128 indicates PageBuddy, but I've also seen a mapcount of 0
indicating it's still on the per-cpu freelist.
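The -128 in the page dump above can be reproduced from the raw words. The sketch below assumes the layout and constants of include/linux/page-flags.h in kernels of this period, quoted from memory rather than from this thread: page_type shares storage with _mapcount, starts out as all ones, and has the PG_buddy bit (0x80) cleared while the page sits in the buddy allocator. The dump prints mapcount as _mapcount + 1. The function names are invented for this illustration.

```python
# Constants as recalled from include/linux/page-flags.h around v5.8.
# They are an assumption of this sketch, not something stated in the thread.
PAGE_TYPE_BASE = 0xF0000000
PG_BUDDY = 0x00000080

# Third word of the second "raw:" line in the dump above. The low 32 bits are
# _mapcount/page_type; the high 32 bits are _refcount (0, as the dump says).
RAW_WORD = 0x00000000FFFFFF7F


def printed_mapcount(word):
    """Return the mapcount the page dump would print: signed _mapcount + 1."""
    raw = word & 0xFFFFFFFF
    signed = raw - (1 << 32) if raw & 0x80000000 else raw
    return signed + 1


def is_buddy(word):
    """Mirror the kernel's PageType() test for the PG_buddy bit."""
    page_type = word & 0xFFFFFFFF
    return (page_type & (PAGE_TYPE_BASE | PG_BUDDY)) == PAGE_TYPE_BASE


if __name__ == "__main__":
    print("mapcount:", printed_mapcount(RAW_WORD))
    print("PageBuddy:", is_buddy(RAW_WORD))
```

Under the same assumptions, an untouched field of 0xffffffff prints as mapcount 0 with no page type set, which matches the second case mentioned in the message: a page with no mappings that is not marked as buddy.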
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: Kirill A. Shutemov @ 2020-07-21 11:11 UTC
To: Andrew Morton
Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Matthew Wilcox, Ralph Campbell, David Hildenbrand, Mike Kravetz, Johannes Weiner, Jens Axboe

On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
>
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 4c43049f Add linux-next specific files for 20200716
> > git tree: linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > compiler: gcc (GCC) 10.1.0-syz 20200507
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
>
> Thanks.
>
> __handle_mm_fault
>   ->pmd_migration_entry_wait
>     ->migration_entry_to_page
>
> stumbled onto an unlocked page.
>
> I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
> perhaps something else.
>
> Is it possible to perform a bisection?

Maybe it's related to the new lock_page_async()?

> > ------------[ cut here ]------------
> > kernel BUG at include/linux/swapops.h:197!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> > RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> > Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> > RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> > RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> > RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> > R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> > FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > __handle_mm_fault mm/memory.c:4349 [inline]
> > handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
> > do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
> > handle_page_fault arch/x86/mm/fault.c:1351 [inline]
> > exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
> > asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
> > RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
> > Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
> > RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
> > RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
> > RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
> > R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
> > R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
> > copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
> > raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
> > _copy_to_user+0x11e/0x160 lib/usercopy.c:30
> > copy_to_user include/linux/uaccess.h:168 [inline]
> > do_pipe2+0x128/0x1b0 fs/pipe.c:1014
> > __do_sys_pipe fs/pipe.c:1035 [inline]
> > __se_sys_pipe fs/pipe.c:1033 [inline]
> > __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
> > do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > RIP: 0033:0x45c1d9
> > Code: Bad RIP value.
> > RSP: 002b:00007f55eb476c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000016
> > RAX: ffffffffffffffda RBX: 0000000000022ac0 RCX: 000000000045c1d9
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
> > RBP: 000000000078c070 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078c04c
> > R13: 00007ffcfc120ecf R14: 00007f55eb4779c0 R15: 000000000078c04c
> > Modules linked in:
> > ---[ end trace ea73d933d66ff0d4 ]---
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> > RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> > RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> > Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> > RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> > RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> > RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> > R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> > FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >

--
 Kirill A. Shutemov
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: Jens Axboe @ 2020-07-21 15:11 UTC
To: Kirill A. Shutemov, Andrew Morton
Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Matthew Wilcox, Ralph Campbell, David Hildenbrand, Mike Kravetz, Johannes Weiner

On 7/21/20 5:11 AM, Kirill A. Shutemov wrote:
> On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
>> On Sun, 19 Jul 2020 14:10:19 -0700 syzbot <syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com> wrote:
>>
>>> syzbot has found a reproducer for the following issue on:
>>>
>>> HEAD commit: 4c43049f Add linux-next specific files for 20200716
>>> git tree: linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
>>> compiler: gcc (GCC) 10.1.0-syz 20200507
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
>>
>> Thanks.
>>
>> __handle_mm_fault
>>   ->pmd_migration_entry_wait
>>     ->migration_entry_to_page
>>
>> stumbled onto an unlocked page.
>>
>> I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
>> perhaps something else.
>>
>> Is it possible to perform a bisection?
>
> Maybe it's related to the new lock_page_async()?

Shouldn't be used for any of those paths at all, though of course can't
rule out a bug that triggers it somehow. A bisection would be nice.

--
Jens Axboe
* Re: kernel BUG at include/linux/swapops.h:LINE!
From: Hillf Danton @ 2020-07-23 7:37 UTC
To: Kirill A. Shutemov
Cc: Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring, Hillf Danton

On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote:
> On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote:
> >
> > > syzbot has found a reproducer for the following issue on:
> > >
> > > HEAD commit: 4c43049f Add linux-next specific files for 20200716
> > > git tree: linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> >
> > Thanks.
> >
> > __handle_mm_fault
> >   ->pmd_migration_entry_wait
> >     ->migration_entry_to_page
> >
> > stumbled onto an unlocked page.
> >
> > I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
> > perhaps something else.
> >
> > Is it possible to perform a bisection?
>
> Maybe it's related to the new lock_page_async()?

Or is there likely the window that after copy_huge_pmd() the src pmd migrate
entry is removed and the page unlocked but the dst is not?

> > > ------------[ cut here ]------------
> > > kernel BUG at include/linux/swapops.h:197!
> > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > > CPU: 1 PID: 19938 Comm: syz-executor.2 Not tainted 5.8.0-rc5-next-20200716-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline]
> > > RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline]
> > > RIP: 0010:pmd_migration_entry_wait+0x493/0x520 mm/migrate.c:368
> > > Code: 4d 8d 66 ff e9 1f fe ff ff e8 b9 c4 be ff 49 8d 5f ff e9 58 fe ff ff e8 ab c4 be ff 4d 8d 66 ff e9 a9 fe ff ff e8 9d c4 be ff <0f> 0b e8 96 c4 be ff 0f 0b e8 8f c4 be ff 4c 8d 65 ff eb a7 48 89
> > > RSP: 0018:ffffc9001095fb70 EFLAGS: 00010293
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81b56a24
> > > RDX: ffff888092022240 RSI: ffffffff81b56b43 RDI: 0000000000000001
> > > RBP: ffffea0008468080 R08: 0000000000000000 R09: ffffea0008468087
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0008468080
> > > R13: ffff888015d2e0c0 R14: 0000000000000000 R15: 0000000000000000
> > > FS: 00007f55eb477700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000020000080 CR3: 000000021ad87000 CR4: 00000000001526e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > __handle_mm_fault mm/memory.c:4349 [inline]
> > > handle_mm_fault+0x23cf/0x45e0 mm/memory.c:4465
> > > do_user_addr_fault+0x598/0xbf0 arch/x86/mm/fault.c:1294
> > > handle_page_fault arch/x86/mm/fault.c:1351 [inline]
> > > exc_page_fault+0xab/0x170 arch/x86/mm/fault.c:1404
> > > asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:544
> > > RIP: 0010:copy_user_generic_unrolled+0x89/0xc0 arch/x86/lib/copy_user_64.S:91
> > > Code: 38 4c 89 47 20 4c 89 4f 28 4c 89 57 30 4c 89 5f 38 48 8d 76 40 48 8d 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 <4c> 89 07 48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a
> > > RSP: 0018:ffffc9001095fe48 EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: 0000000020000080 RCX: 0000000000000001
> > > RDX: 0000000000000000 RSI: ffffc9001095fea8 RDI: 0000000020000080
> > > RBP: ffffc9001095fea8 R08: 0000000400000003 R09: ffffc9001095feaf
> > > R10: fffff5200212bfd5 R11: 0000000000000000 R12: 0000000000000008
> > > R13: 0000000020000088 R14: 00007ffffffff000 R15: 0000000000000000
> > > copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
> > > raw_copy_to_user arch/x86/include/asm/uaccess_64.h:74 [inline]
> > > _copy_to_user+0x11e/0x160 lib/usercopy.c:30
> > > copy_to_user include/linux/uaccess.h:168 [inline]
> > > do_pipe2+0x128/0x1b0 fs/pipe.c:1014
> > > __do_sys_pipe fs/pipe.c:1035 [inline]
> > > __se_sys_pipe fs/pipe.c:1033 [inline]
> > > __x64_sys_pipe+0x2f/0x40 fs/pipe.c:1033
> > > do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
> > > entry_SYSCALL_64_after_hwframe+0x44/0xa9
* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-23 7:37 ` Hillf Danton
@ 2020-07-24 11:13 ` Kirill A. Shutemov
  2020-07-26 16:49 ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2020-07-24 11:13 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring

On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote:
>
> On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote:
> > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote:
> > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote:
> > >
> > > > syzbot has found a reproducer for the following issue on:
> > > >
> > > > HEAD commit: 4c43049f Add linux-next specific files for 20200716
> > > > git tree: linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd
> > > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com
> > >
> > > Thanks.
> > >
> > > __handle_mm_fault
> > >   ->pmd_migration_entry_wait
> > >     ->migration_entry_to_page
> > >
> > > stumbled onto an unlocked page.
> > >
> > > I don't immediately see a cause. Perhaps Matthew's "THP prep patches",
> > > perhaps something else.
> > >
> > > Is it possible to perform a bisection?
> >
> > Maybe it's related to the new lock_page_async()?
>
> Or is there likely the window that after copy_huge_pmd() the src pmd migrate
> entry is removed and the page unlocked but the dst is not?

No.

copy_huge_pmd() runs with exclusive mmap_lock on the source side and
destination side is not running yet.

--
Kirill A. Shutemov
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-24 11:13 ` Kirill A. Shutemov @ 2020-07-26 16:49 ` Matthew Wilcox 2020-07-27 10:31 ` Kirill A. Shutemov 0 siblings, 1 reply; 19+ messages in thread From: Matthew Wilcox @ 2020-07-26 16:49 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Fri, Jul 24, 2020 at 02:13:11PM +0300, Kirill A. Shutemov wrote: > On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote: > > > > On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote: > > > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote: > > > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote: > > > > > > > > > syzbot has found a reproducer for the following issue on: > > > > > > > > > > HEAD commit: 4c43049f Add linux-next specific files for 20200716 > > > > > git tree: linux-next > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000 > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com > > > > > > > > Thanks. > > > > > > > > __handle_mm_fault > > > > ->pmd_migration_entry_wait > > > > ->migration_entry_to_page > > > > > > > > stumbled onto an unlocked page. > > > > > > > > I don't immediately see a cause. Perhaps Matthew's "THP prep patches", > > > > perhaps something else. > > > > > > > > Is it possible to perform a bisection? > > > > > > Maybe it's related to the new lock_page_async()? 
> >
> > Or is there likely the window that after copy_huge_pmd() the src pmd migrate
> > entry is removed and the page unlocked but the dst is not?
>
> No.
>
> copy_huge_pmd() runs with exclusive mmap_lock on the source side and
> destination side is not running yet.

The one I'm hitting is huge related though.

I added this debug:

+++ b/include/linux/swapops.h
@@ -165,8 +165,9 @@ static inline struct page *device_private_entry_to_page(swp_entry_t entry)
 #ifdef CONFIG_MIGRATION
 static inline swp_entry_t make_migration_entry(struct page *page, int write)
 {
-	BUG_ON(!PageLocked(compound_head(page)));
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
 
+if (PageCompound(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
 	return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
 			page_to_pfn(page));
 }
@@ -194,7 +195,11 @@ static inline struct page *migration_entry_to_page(swp_entry_t entry)
 	 * Any use of migration entries may only occur while the
 	 * corresponding page is locked
 	 */
-	BUG_ON(!PageLocked(compound_head(p)));
+	if (!PageLocked(p)) {
+		dump_page(p, "not locked");
+		printk("swap entry %d.%lx\n", swp_type(entry), swp_offset(entry));
+		BUG();
+	}
 	return p;
 }

and got useful output (while running generic/086):

1457 086 (20181): drop_caches: 3
1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7
1457 aops:def_blk_aops ino:0
1457 flags: 0x4000000000002030(lru|active|private)
1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578
1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000
1457 page dumped because: not locked
1457 swap entry 30.229e7
1457 ------------[ cut here ]------------
1457 kernel BUG at include/linux/swapops.h:201!
1457 invalid opcode: 0000 [#1] SMP PTI
1457 CPU: 3 PID: 646 Comm: check Kdump: loaded Tainted: G W 5.8.0-rc6-00067-gd8b18bdf9870-dirty #355
1457 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
1457 RIP: 0010:__migration_entry_wait+0x109/0x110
[...]

Looking back in the trace, I see:

...
1457 pfn 229e5 order 9
1457 pfn 229e6 order 9
1457 pfn 229e7 order 9
1457 pfn 229e8 order 9
1457 pfn 229e9 order 9
...

so I would say we have a refcount problem. I've probably made it worse by
creating more THPs, but I don't think I'm the originator of the problem.

I know very little about the migration code today. I suspect I'm going
to have to learn about it next week.
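For readers decoding the "swap entry 30.229e7" line: the debug patch prints swp_type(entry) and swp_offset(entry), and make_migration_entry() stores page_to_pfn(page) as the offset, so the second number is the pfn of the page the entry points at. The following is only a userspace sketch of that packing. It assumes a 64-bit layout in which the type sits above a 58-bit offset field (an assumption about this kernel generation, not something stated in the thread), and every model_* name is hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: the swap type occupies the bits above a 58-bit offset.
 * Check include/linux/swapops.h in your own tree before relying on it. */
#define MODEL_SWP_TYPE_SHIFT	58
#define MODEL_SWP_OFFSET_MASK	((UINT64_C(1) << MODEL_SWP_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for the kernel's swp_entry_t. */
typedef struct { uint64_t val; } model_swp_entry_t;

/* Pack a type and an offset (for migration entries, the pfn) into one word. */
static model_swp_entry_t model_swp_entry(uint64_t type, uint64_t offset)
{
	model_swp_entry_t e;

	assert((offset & ~MODEL_SWP_OFFSET_MASK) == 0);
	e.val = (type << MODEL_SWP_TYPE_SHIFT) | offset;
	return e;
}

static uint64_t model_swp_type(model_swp_entry_t e)
{
	return e.val >> MODEL_SWP_TYPE_SHIFT;
}

static uint64_t model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_SWP_OFFSET_MASK;
}

/* Format the entry the same way the debug printk above does: type.offset */
static int model_format_entry(model_swp_entry_t e, char *buf, size_t len)
{
	return snprintf(buf, len, "swap entry %llu.%llx",
			(unsigned long long)model_swp_type(e),
			(unsigned long long)model_swp_offset(e));
}
```

Packing type 30 with offset 0x229e7 and formatting it reproduces the text of the dump line, which is why the pfn in the "not locked" page dump (pfn:0x229e7) and the offset in the swap entry agree.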
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-26 16:49 ` Matthew Wilcox @ 2020-07-27 10:31 ` Kirill A. Shutemov 2020-07-27 12:03 ` Matthew Wilcox 0 siblings, 1 reply; 19+ messages in thread From: Kirill A. Shutemov @ 2020-07-27 10:31 UTC (permalink / raw) To: Matthew Wilcox Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote: > On Fri, Jul 24, 2020 at 02:13:11PM +0300, Kirill A. Shutemov wrote: > > On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote: > > > > > > On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote: > > > > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote: > > > > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote: > > > > > > > > > > > syzbot has found a reproducer for the following issue on: > > > > > > > > > > > > HEAD commit: 4c43049f Add linux-next specific files for 20200716 > > > > > > git tree: linux-next > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000 > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242 > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd > > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000 > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com > > > > > > > > > > Thanks. > > > > > > > > > > __handle_mm_fault > > > > > ->pmd_migration_entry_wait > > > > > ->migration_entry_to_page > > > > > > > > > > stumbled onto an unlocked page. > > > > > > > > > > I don't immediately see a cause. Perhaps Matthew's "THP prep patches", > > > > > perhaps something else. > > > > > > > > > > Is it possible to perform a bisection? 
> > > > > > > > Maybe it's related to the new lock_page_async()? > > > > > > Or is there likely the window that after copy_huge_pmd() the src pmd migrate > > > entry is removed and the page unlocked but the dst is not? > > > > No. > > > > copy_huge_pmd() runs with exclusive mmap_lock on the source side and > > destination side is not running yet. > > The one I'm hitting is huge related though. > > I added this debug: > > +++ b/include/linux/swapops.h > @@ -165,8 +165,9 @@ static inline struct page *device_private_entry_to_page(swp_entry_t entry) > #ifdef CONFIG_MIGRATION > static inline swp_entry_t make_migration_entry(struct page *page, int write) > { > - BUG_ON(!PageLocked(compound_head(page))); > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > +if (PageCompound(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, > page_to_pfn(page)); > } > @@ -194,7 +195,11 @@ static inline struct page *migration_entry_to_page(swp_entry_t entry) > * Any use of migration entries may only occur while the > * corresponding page is locked > */ > - BUG_ON(!PageLocked(compound_head(p))); > + if (!PageLocked(p)) { > + dump_page(p, "not locked"); > + printk("swap entry %d.%lx\n", swp_type(entry), swp_offset(entry)); > + BUG(); > + } > return p; > } > > > and got useful output (while running generic/086): > > 1457 086 (20181): drop_caches: 3 > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7 > 1457 aops:def_blk_aops ino:0 > 1457 flags: 0x4000000000002030(lru|active|private) > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578 > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000 > 1457 page dumped because: not locked > 1457 swap entry 30.229e7 > 1457 ------------[ cut here ]------------ > 1457 kernel BUG at include/linux/swapops.h:201! 
> 1457 invalid opcode: 0000 [#1] SMP PTI
> 1457 CPU: 3 PID: 646 Comm: check Kdump: loaded Tainted: G W 5.8.0-rc6-00067-gd8b18bdf9870-dirty #355
> 1457 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> 1457 RIP: 0010:__migration_entry_wait+0x109/0x110
> [...]
>
> Looking back in the trace, I see:
>
> ...
> 1457 pfn 229e5 order 9
> 1457 pfn 229e6 order 9
> 1457 pfn 229e7 order 9
> 1457 pfn 229e8 order 9
> 1457 pfn 229e9 order 9
> ...
>
> so I would say we have a refcount problem. I've probably made it worse by
> creating more THPs, but I don't think I'm the originator of the problem.
>
> I know very little about the migration code today. I suspect I'm going
> to have to learn about it next week.

It would be interesting to know if the migration entries ever got removed
for pfn. I mean if remove_migration_pte() got called for it.

It can be rmap issue too. Maybe it misses PMD on remove_migration_ptes()
or something.

--
Kirill A. Shutemov
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-27 10:31 ` Kirill A. Shutemov @ 2020-07-27 12:03 ` Matthew Wilcox 2020-07-27 12:59 ` Hillf Danton 2020-07-29 19:21 ` Kirill A. Shutemov 0 siblings, 2 replies; 19+ messages in thread From: Matthew Wilcox @ 2020-07-27 12:03 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Mon, Jul 27, 2020 at 01:31:40PM +0300, Kirill A. Shutemov wrote: > On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote: > > On Fri, Jul 24, 2020 at 02:13:11PM +0300, Kirill A. Shutemov wrote: > > > On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote: > > > > > > > > On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote: > > > > > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote: > > > > > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote: > > > > > > > > > > > > > syzbot has found a reproducer for the following issue on: > > > > > > > > > > > > > > HEAD commit: 4c43049f Add linux-next specific files for 20200716 > > > > > > > git tree: linux-next > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000 > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242 > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd > > > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000 > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com > > > > > > > > > > > > Thanks. > > > > > > > > > > > > __handle_mm_fault > > > > > > ->pmd_migration_entry_wait > > > > > > ->migration_entry_to_page > > > > > > > > > > > > stumbled onto an unlocked page. 
> > > > > > > > > > > > I don't immediately see a cause. Perhaps Matthew's "THP prep patches", > > > > > > perhaps something else. > > > > > > > > > > > > Is it possible to perform a bisection? > > > > > > > > > > Maybe it's related to the new lock_page_async()? > > > > > > > > Or is there likely the window that after copy_huge_pmd() the src pmd migrate > > > > entry is removed and the page unlocked but the dst is not? > > > > > > No. > > > > > > copy_huge_pmd() runs with exclusive mmap_lock on the source side and > > > destination side is not running yet. > > > > The one I'm hitting is huge related though. > > > > I added this debug: > > > > +++ b/include/linux/swapops.h > > @@ -165,8 +165,9 @@ static inline struct page *device_private_entry_to_page(swp_entry_t entry) > > #ifdef CONFIG_MIGRATION > > static inline swp_entry_t make_migration_entry(struct page *page, int write) > > { > > - BUG_ON(!PageLocked(compound_head(page))); > > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > > > +if (PageCompound(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > > return swp_entry(write ? 
SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, > > page_to_pfn(page)); > > } > > @@ -194,7 +195,11 @@ static inline struct page *migration_entry_to_page(swp_entry_t entry) > > * Any use of migration entries may only occur while the > > * corresponding page is locked > > */ > > - BUG_ON(!PageLocked(compound_head(p))); > > + if (!PageLocked(p)) { > > + dump_page(p, "not locked"); > > + printk("swap entry %d.%lx\n", swp_type(entry), swp_offset(entry)); > > + BUG(); > > + } > > return p; > > } > > > > > > and got useful output (while running generic/086): > > > > 1457 086 (20181): drop_caches: 3 > > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7 > > 1457 aops:def_blk_aops ino:0 > > 1457 flags: 0x4000000000002030(lru|active|private) > > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578 > > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000 > > 1457 page dumped because: not locked > > 1457 swap entry 30.229e7 > > 1457 ------------[ cut here ]------------ > > 1457 kernel BUG at include/linux/swapops.h:201! > > 1457 invalid opcode: 0000 [#1] SMP PTI > > 1457 CPU: 3 PID: 646 Comm: check Kdump: loaded Tainted: G W 5.8.0-rc6-00067-gd8b18bdf9870-dirty #355 > > 1457 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > > 1457 RIP: 0010:__migration_entry_wait+0x109/0x110 > > [...] > > > > Looking back in the trace, I see: > > > > ... > > 1457 pfn 229e5 order 9 > > 1457 pfn 229e6 order 9 > > 1457 pfn 229e7 order 9 > > 1457 pfn 229e8 order 9 > > 1457 pfn 229e9 order 9 > > ... > > > > so I would say we have a refcount problem. I've probably made it worse by > > creating more THPs, but I don't think I'm the originator of the problem. > > > > I know very little about the migration code today. I suspect I'm going > > to have to learn about it next week. > > It would be interesting to know if the migration entires ever got removed > for pfn. 
I mean if remove_migration_pte() got called for it.
>
> It can be rmap issue too. Maybe it misses PMD on remove_migration_ptes()
> or something.

It's not mapped with a PMD. I tweaked my debugging slightly:

 static inline swp_entry_t make_migration_entry(struct page *page, int write)
 {
-	BUG_ON(!PageLocked(compound_head(page)));
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
 
+if (PageHead(page)) dump_page(page, "make entry");
+if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));

1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
1523 page dumped because: make entry
1523 pfn 1dc01 order 9
1523 pfn 1dc02 order 9
1523 pfn 1dc03 order 9
...

Notice that it's an anonymous page, so it's not related to my work.
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-27 12:03 ` Matthew Wilcox @ 2020-07-27 12:59 ` Hillf Danton 2020-07-27 13:44 ` Matthew Wilcox 2020-07-29 19:21 ` Kirill A. Shutemov 1 sibling, 1 reply; 19+ messages in thread From: Hillf Danton @ 2020-07-27 12:59 UTC (permalink / raw) To: Matthew Wilcox Cc: Kirill A. Shutemov, Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Mon, 27 Jul 2020 13:03:10 +0100 Matthew Wilcox wrote: > On Mon, Jul 27, 2020 at 01:31:40PM +0300, Kirill A. Shutemov wrote: > > On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote: > > > On Fri, Jul 24, 2020 at 02:13:11PM +0300, Kirill A. Shutemov wrote: > > > > On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote: > > > > > > > > > > On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote: > > > > > > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote: > > > > > > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote: > > > > > > > > > > > > > > > syzbot has found a reproducer for the following issue on: > > > > > > > > > > > > > > > > HEAD commit: 4c43049f Add linux-next specific files for 20200716 > > > > > > > > git tree: linux-next > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000 > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242 > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd > > > > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000 > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com > > > > > > > > > > > > > > Thanks. 
> > > > > > > > > > > > > > __handle_mm_fault > > > > > > > ->pmd_migration_entry_wait > > > > > > > ->migration_entry_to_page > > > > > > > > > > > > > > stumbled onto an unlocked page. > > > > > > > > > > > > > > I don't immediately see a cause. Perhaps Matthew's "THP prep patches", > > > > > > > perhaps something else. > > > > > > > > > > > > > > Is it possible to perform a bisection? > > > > > > > > > > > > Maybe it's related to the new lock_page_async()? > > > > > > > > > > Or is there likely the window that after copy_huge_pmd() the src pmd migrate > > > > > entry is removed and the page unlocked but the dst is not? > > > > > > > > No. > > > > > > > > copy_huge_pmd() runs with exclusive mmap_lock on the source side and > > > > destination side is not running yet. > > > > > > The one I'm hitting is huge related though. > > > > > > I added this debug: > > > > > > +++ b/include/linux/swapops.h > > > @@ -165,8 +165,9 @@ static inline struct page *device_private_entry_to_page(swp_entry_t entry) > > > #ifdef CONFIG_MIGRATION > > > static inline swp_entry_t make_migration_entry(struct page *page, int write) > > > { > > > - BUG_ON(!PageLocked(compound_head(page))); > > > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > > > > > +if (PageCompound(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > > > return swp_entry(write ? 
SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, > > > page_to_pfn(page)); > > > } > > > @@ -194,7 +195,11 @@ static inline struct page *migration_entry_to_page(swp_entry_t entry) > > > * Any use of migration entries may only occur while the > > > * corresponding page is locked > > > */ > > > - BUG_ON(!PageLocked(compound_head(p))); > > > + if (!PageLocked(p)) { > > > + dump_page(p, "not locked"); > > > + printk("swap entry %d.%lx\n", swp_type(entry), swp_offset(entry)); > > > + BUG(); > > > + } > > > return p; > > > } > > > > > > > > > and got useful output (while running generic/086): > > > > > > 1457 086 (20181): drop_caches: 3 > > > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7 > > > 1457 aops:def_blk_aops ino:0 > > > 1457 flags: 0x4000000000002030(lru|active|private) > > > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578 > > > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000 > > > 1457 page dumped because: not locked > > > 1457 swap entry 30.229e7 > > > 1457 ------------[ cut here ]------------ > > > 1457 kernel BUG at include/linux/swapops.h:201! > > > 1457 invalid opcode: 0000 [#1] SMP PTI > > > 1457 CPU: 3 PID: 646 Comm: check Kdump: loaded Tainted: G W 5.8.0-rc6-00067-gd8b18bdf9870-dirty #355 > > > 1457 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > > > 1457 RIP: 0010:__migration_entry_wait+0x109/0x110 > > > [...] > > > > > > Looking back in the trace, I see: > > > > > > ... > > > 1457 pfn 229e5 order 9 > > > 1457 pfn 229e6 order 9 > > > 1457 pfn 229e7 order 9 > > > 1457 pfn 229e8 order 9 > > > 1457 pfn 229e9 order 9 > > > ... > > > > > > so I would say we have a refcount problem. I've probably made it worse by > > > creating more THPs, but I don't think I'm the originator of the problem. > > > > > > I know very little about the migration code today. I suspect I'm going > > > to have to learn about it next week. 
> >
> > It would be interesting to know if the migration entries ever got removed
> > for pfn. I mean if remove_migration_pte() got called for it.
> >
> > It can be rmap issue too. Maybe it misses PMD on remove_migration_ptes()
> > or something.
>
> It's not mapped with a PMD. I tweaked my debugging slightly:
>
>  static inline swp_entry_t make_migration_entry(struct page *page, int write)
>  {
> -	BUG_ON(!PageLocked(compound_head(page)));
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
>
> +if (PageHead(page)) dump_page(page, "make entry");
> +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
>
> 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
> 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
> 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)

> > > 1457 flags: 0x4000000000002030(lru|active|private)

Can you elaborate on the difference between the two dumps?

> 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
> 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
> 1523 page dumped because: make entry
> 1523 pfn 1dc01 order 9
> 1523 pfn 1dc02 order 9
> 1523 pfn 1dc03 order 9
> ...
>
> Notice that it's an anonymous page, so it's not related to my work.
* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-27 12:59 ` Hillf Danton
@ 2020-07-27 13:44 ` Matthew Wilcox
  2020-07-27 14:46 ` Hillf Danton
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2020-07-27 13:44 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring

On Mon, Jul 27, 2020 at 08:59:50PM +0800, Hillf Danton wrote:
> Can you elaborate on the difference between the two dumps?

You didn't trim anything, so I have no idea which two dumps you mean.

I'll annotate below ...

> > > On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote:
> > > > 1457 086 (20181): drop_caches: 3
> > > > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7
> > > > 1457 aops:def_blk_aops ino:0
> > > > 1457 flags: 0x4000000000002030(lru|active|private)
> > > > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578
> > > > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000
> > > > 1457 page dumped because: not locked
> > > > 1457 swap entry 30.229e7

This is a dump of the page that was found when looking up the migration entry.

> On Mon, 27 Jul 2020 13:03:10 +0100 Matthew Wilcox wrote:
> > It's not mapped with a PMD. I tweaked my debugging slightly:
> >
> >  static inline swp_entry_t make_migration_entry(struct page *page, int write)
> >  {
> > -	BUG_ON(!PageLocked(compound_head(page)));
> > +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >
> > +if (PageHead(page)) dump_page(page, "make entry");
> > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
> >
> > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
> > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
> > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
> > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
> > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
> > 1523 page dumped because: make entry

This is dumping the page when we create the entry.

For completeness, here's the page that we find from the same run.

1523 page:00000000a18100e6 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1ddde
1523 flags: 0x4000000000000000()
1523 raw: 4000000000000000 dead000000000100 dead000000000122 0000000000000000
1523 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
1523 page dumped because: not locked

(an order-9 page will occupy PFNs 0x1dc00-0x1ddff)

It's clearly been freed and is still sitting on the per-CPU free list.
I've also seen them as PageBuddy and, as in the first example above,
reallocated to a different user.
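The remark that an order-9 page occupies PFNs 0x1dc00-0x1ddff, and that the freed page at pfn 0x1ddde therefore belonged to the THP dumped at "make entry", is plain arithmetic. A minimal sketch follows. The helper names are hypothetical, and compound_head_pfn() additionally assumes the compound page is naturally aligned to its size, which the thread does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive range of page frame numbers covered by one compound page. */
struct pfn_range {
	unsigned long first;
	unsigned long last;
};

/* An order-N compound page spans 2^N consecutive PFNs from its head. */
static struct pfn_range compound_pfn_range(unsigned long head_pfn,
					   unsigned int order)
{
	struct pfn_range r;

	assert(order < 8 * sizeof(unsigned long));
	r.first = head_pfn;
	r.last = head_pfn + (1UL << order) - 1;
	return r;
}

static bool pfn_in_range(struct pfn_range r, unsigned long pfn)
{
	return pfn >= r.first && pfn <= r.last;
}

/* Assumption: the compound page is aligned to its own size, so masking
 * off the low 'order' bits of any tail pfn yields the head pfn. */
static unsigned long compound_head_pfn(unsigned long pfn, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return pfn & ~((1UL << order) - 1);
}
```

With head pfn 0x1dc00 and order 9 the range is 0x1dc00-0x1ddff, so 0x1ddde falls inside it, while pfn 0x229e7 from the earlier run does not; under the alignment assumption, 0x229e7 would instead be a tail of a THP headed at 0x22800.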
* Re: kernel BUG at include/linux/swapops.h:LINE!
  2020-07-27 13:44 ` Matthew Wilcox
@ 2020-07-27 14:46 ` Hillf Danton
  0 siblings, 0 replies; 19+ messages in thread
From: Hillf Danton @ 2020-07-27 14:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hillf Danton, Kirill A. Shutemov, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring

On Mon, 27 Jul 2020 14:44:46 +0100 Matthew Wilcox wrote:
> On Mon, Jul 27, 2020 at 08:59:50PM +0800, Hillf Danton wrote:
> > Can you elaborate on the difference between the two dumps?
>
> You didn't trim anything, so I have no idea which two dumps you mean.
>
> I'll annotate below ...

Double thanks.

>
> > > > On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote:
> > > > > 1457 086 (20181): drop_caches: 3
> > > > > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7
> > > > > 1457 aops:def_blk_aops ino:0
> > > > > 1457 flags: 0x4000000000002030(lru|active|private)
> > > > > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578
> > > > > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000
> > > > > 1457 page dumped because: not locked
> > > > > 1457 swap entry 30.229e7
>
> This is a dump of the page that was found when looking up the migration entry.

It can be understood without difficulty as page (with mapping) is not locked.

>
> > On Mon, 27 Jul 2020 13:03:10 +0100 Matthew Wilcox wrote:
> > > It's not mapped with a PMD. I tweaked my debugging slightly:
> > >
> > >  static inline swp_entry_t make_migration_entry(struct page *page, int write)
> > >  {
> > > -	BUG_ON(!PageLocked(compound_head(page)));
> > > +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> > >
> > > +if (PageHead(page)) dump_page(page, "make entry");
> > > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page)));
> > >
> > > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00
> > > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0
> > > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
> > > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059
> > > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000
> > > 1523 page dumped because: make entry
>
> This is dumping the page when we create the entry.

Hard to understand that a locked page is dumped.

>
> For completeness, here's the page that we find from the same run.
>
> 1523 page:00000000a18100e6 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1ddde
> 1523 flags: 0x4000000000000000()
> 1523 raw: 4000000000000000 dead000000000100 dead000000000122 0000000000000000
> 1523 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> 1523 page dumped because: not locked
>
> (an order-9 page will occupy PFNs 0x1dc00-0x1ddff)
>
> It's clearly been freed and is still sitting on the per-CPU free list.

As it survived free, it is simple to see refcount or lock; what's unclear
is why there is a migrate entry left two miles behind, anon or not.

> I've also seen them as PageBuddy and, as in the first example above,
> reallocated to a different user.
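The difference between the dumps discussed above is visible in the flags words alone: the page dumped at "make entry" has the locked bit set, while the two pages found later through the stale migration entry do not. The sketch below decodes those words in userspace. The bit positions are an assumption chosen to be consistent with the names printed in the dumps (they are not taken from the thread and can differ between kernel versions and configs), only flags that appear in the dumps are listed, and the helper names are hypothetical. Bits outside the table, such as the high 0x4... field, are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Assumed bit positions; verify against enum pageflags in your tree. */
static const struct flag_name known_flags[] = {
	{ 0, "locked" },   { 2, "uptodate" }, { 3, "dirty" },
	{ 4, "lru" },      { 5, "active" },   { 13, "private" },
	{ 16, "head" },    { 19, "swapbacked" },
};

/* Write the names of the set, known flags into buf, separated by '|'. */
static void decode_page_flags(unsigned long long flags, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(known_flags) / sizeof(known_flags[0]); i++) {
		if (!(flags & (1ULL << known_flags[i].bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, known_flags[i].name, len - strlen(buf) - 1);
	}
}

/* The property migration_entry_to_page() insists on. */
static bool page_flags_locked(unsigned long long flags)
{
	return (flags & 1ULL) != 0;
}
```

Under these assumptions, 0x400000000009003d decodes with "locked" first, whereas 0x4000000000002030 and 0x4000000000000000 have the locked bit clear, which is the condition that triggers the BUG in migration_entry_to_page().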
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-27 12:03 ` Matthew Wilcox 2020-07-27 12:59 ` Hillf Danton @ 2020-07-29 19:21 ` Kirill A. Shutemov 2020-07-29 19:54 ` Matthew Wilcox 1 sibling, 1 reply; 19+ messages in thread From: Kirill A. Shutemov @ 2020-07-29 19:21 UTC (permalink / raw) To: Matthew Wilcox Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Mon, Jul 27, 2020 at 01:03:10PM +0100, Matthew Wilcox wrote: > On Mon, Jul 27, 2020 at 01:31:40PM +0300, Kirill A. Shutemov wrote: > > On Sun, Jul 26, 2020 at 05:49:04PM +0100, Matthew Wilcox wrote: > > > On Fri, Jul 24, 2020 at 02:13:11PM +0300, Kirill A. Shutemov wrote: > > > > On Thu, Jul 23, 2020 at 03:37:44PM +0800, Hillf Danton wrote: > > > > > > > > > > On Tue, 21 Jul 2020 14:11:31 +0300 Kirill A. Shutemov wrote: > > > > > > On Mon, Jul 20, 2020 at 04:51:44PM -0700, Andrew Morton wrote: > > > > > > > On Sun, 19 Jul 2020 14:10:19 -0700 syzbot wrote: > > > > > > > > > > > > > > > syzbot has found a reproducer for the following issue on: > > > > > > > > > > > > > > > > HEAD commit: 4c43049f Add linux-next specific files for 20200716 > > > > > > > > git tree: linux-next > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12c56087100000 > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=2c76d72659687242 > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd > > > > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1344abeb100000 > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com > > > > > > > > > > > > > > Thanks. 
> > > > > > > > > > > > > > __handle_mm_fault > > > > > > > ->pmd_migration_entry_wait > > > > > > > ->migration_entry_to_page > > > > > > > > > > > > > > stumbled onto an unlocked page. > > > > > > > > > > > > > > I don't immediately see a cause. Perhaps Matthew's "THP prep patches", > > > > > > > perhaps something else. > > > > > > > > > > > > > > Is it possible to perform a bisection? > > > > > > > > > > > > Maybe it's related to the new lock_page_async()? > > > > > > > > > > Or is there likely the window that after copy_huge_pmd() the src pmd migrate > > > > > entry is removed and the page unlocked but the dst is not? > > > > > > > > No. > > > > > > > > copy_huge_pmd() runs with exclusive mmap_lock on the source side and > > > > destination side is not running yet. > > > > > > The one I'm hitting is huge related though. > > > > > > I added this debug: > > > > > > +++ b/include/linux/swapops.h > > > @@ -165,8 +165,9 @@ static inline struct page *device_private_entry_to_page(swp_entry_t entry) > > > #ifdef CONFIG_MIGRATION > > > static inline swp_entry_t make_migration_entry(struct page *page, int write) > > > { > > > - BUG_ON(!PageLocked(compound_head(page))); > > > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > > > > > +if (PageCompound(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > > > return swp_entry(write ? 
SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, > > > page_to_pfn(page)); > > > } > > > @@ -194,7 +195,11 @@ static inline struct page *migration_entry_to_page(swp_entry_t entry) > > > * Any use of migration entries may only occur while the > > > * corresponding page is locked > > > */ > > > - BUG_ON(!PageLocked(compound_head(p))); > > > + if (!PageLocked(p)) { > > > + dump_page(p, "not locked"); > > > + printk("swap entry %d.%lx\n", swp_type(entry), swp_offset(entry)); > > > + BUG(); > > > + } > > > return p; > > > } > > > > > > > > > and got useful output (while running generic/086): > > > > > > 1457 086 (20181): drop_caches: 3 > > > 1457 page:00000000a216ae9a refcount:2 mapcount:0 mapping:000000009ba7bfed index:0x2227 pfn:0x229e7 > > > 1457 aops:def_blk_aops ino:0 > > > 1457 flags: 0x4000000000002030(lru|active|private) > > > 1457 raw: 4000000000002030 fffff5b4416b5a48 fffff5b4408a7988 ffff9e9c34848578 > > > 1457 raw: 0000000000002227 ffff9e9bd18f0d00 00000002ffffffff 0000000000000000 > > > 1457 page dumped because: not locked > > > 1457 swap entry 30.229e7 > > > 1457 ------------[ cut here ]------------ > > > 1457 kernel BUG at include/linux/swapops.h:201! > > > 1457 invalid opcode: 0000 [#1] SMP PTI > > > 1457 CPU: 3 PID: 646 Comm: check Kdump: loaded Tainted: G W 5.8.0-rc6-00067-gd8b18bdf9870-dirty #355 > > > 1457 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > > > 1457 RIP: 0010:__migration_entry_wait+0x109/0x110 > > > [...] > > > > > > Looking back in the trace, I see: > > > > > > ... > > > 1457 pfn 229e5 order 9 > > > 1457 pfn 229e6 order 9 > > > 1457 pfn 229e7 order 9 > > > 1457 pfn 229e8 order 9 > > > 1457 pfn 229e9 order 9 > > > ... > > > > > > so I would say we have a refcount problem. I've probably made it worse by > > > creating more THPs, but I don't think I'm the originator of the problem. > > > > > > I know very little about the migration code today. I suspect I'm going > > > to have to learn about it next week. 
> > > > It would be interesting to know if the migration entires ever got removed > > for pfn. I mean if remove_migration_pte() got called for it. > > > > It can be rmap issue too. Maybe it misses PMD on remove_migration_ptes() > > or something. > > It's not mapped with a PMD. I tweaked my debugging slightly: > > static inline swp_entry_t make_migration_entry(struct page *page, int write) > { > - BUG_ON(!PageLocked(compound_head(page))); > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > +if (PageHead(page)) dump_page(page, "make entry"); > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00 > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0 > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked) > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059 > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000 > 1523 page dumped because: make entry > 1523 pfn 1dc01 order 9 > 1523 pfn 1dc02 order 9 > 1523 pfn 1dc03 order 9 > ... > > Notice that it's an anonymous page, so it's not related to my work. I don't have much hope, but could you try if the patch below would blow up? Could you share the setup you use to trigger the issue? I want try it myself. diff --git a/mm/migrate.c b/mm/migrate.c index 40cd7016ae6f..c3148e1261d0 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -215,6 +215,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, pte_t pte; swp_entry_t entry; + VM_BUG_ON_PAGE(PageTail(pvmw.page), pvmw.page); VM_BUG_ON_PAGE(PageTail(page), page); while (page_vma_mapped_walk(&pvmw)) { if (PageKsm(page)) -- Kirill A. Shutemov ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-29 19:21 ` Kirill A. Shutemov @ 2020-07-29 19:54 ` Matthew Wilcox 2020-07-29 22:11 ` Matthew Wilcox 0 siblings, 1 reply; 19+ messages in thread From: Matthew Wilcox @ 2020-07-29 19:54 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Wed, Jul 29, 2020 at 10:21:51PM +0300, Kirill A. Shutemov wrote: > On Mon, Jul 27, 2020 at 01:03:10PM +0100, Matthew Wilcox wrote: > > > It would be interesting to know if the migration entires ever got removed > > > for pfn. I mean if remove_migration_pte() got called for it. > > > > > > It can be rmap issue too. Maybe it misses PMD on remove_migration_ptes() > > > or something. > > > > It's not mapped with a PMD. I tweaked my debugging slightly: > > > > static inline swp_entry_t make_migration_entry(struct page *page, int write) > > { > > - BUG_ON(!PageLocked(compound_head(page))); > > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > > > +if (PageHead(page)) dump_page(page, "make entry"); > > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > > > > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00 > > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0 > > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked) > > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059 > > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000 > > 1523 page dumped because: make entry > > 1523 pfn 1dc01 order 9 > > 1523 pfn 1dc02 order 9 > > 1523 pfn 1dc03 order 9 > > ... > > > > Notice that it's an anonymous page, so it's not related to my work. > > I don't have much hope, but could you try if the patch below would blow > up? Running it now. 
Results probably in twenty minutes.

> Could you share the setup you use to trigger the issue? I want try it
> myself.

Head commit d8b18bdf9870b131802d641f5e7f32ddc53dcce3 which you can find in
http://git.infradead.org/users/willy/pagecache.git

I'm using Kent Overstreet's ktest as the base:
https://github.com/koverstreet/ktest

from the root of the kernel tree, I type:

$ ../ktest/build-test-kernel run ../ktest/tests/xfs.ktest

xfs.ktest is not in Kent's repo:

#!/bin/bash

require-kernel-config XFS_FS
require-kernel-config XFS_QUOTA XFS_POSIX_ACL XFS_RT XFS_ONLINE_SCRUB
require-kernel-config XFS_ONLINE_REPAIR XFS_DEBUG XFS_ASSERT_FATAL
require-kernel-config QUOTA

require-lib xfstests.sh

run_tests()
{
    run_xfstests xfs "$@"
}

I think that's all you'll need to get going.
* Re: kernel BUG at include/linux/swapops.h:LINE! 2020-07-29 19:54 ` Matthew Wilcox @ 2020-07-29 22:11 ` Matthew Wilcox 0 siblings, 0 replies; 19+ messages in thread From: Matthew Wilcox @ 2020-07-29 22:11 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Hillf Danton, Kirill A. Shutemov, Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs, Mike Kravetz, Johannes Weiner, Jens Axboe, Markus Elfring On Wed, Jul 29, 2020 at 08:54:32PM +0100, Matthew Wilcox wrote: > On Wed, Jul 29, 2020 at 10:21:51PM +0300, Kirill A. Shutemov wrote: > > On Mon, Jul 27, 2020 at 01:03:10PM +0100, Matthew Wilcox wrote: > > > > It would be interesting to know if the migration entires ever got removed > > > > for pfn. I mean if remove_migration_pte() got called for it. > > > > > > > > It can be rmap issue too. Maybe it misses PMD on remove_migration_ptes() > > > > or something. > > > > > > It's not mapped with a PMD. I tweaked my debugging slightly: > > > > > > static inline swp_entry_t make_migration_entry(struct page *page, int write) > > > { > > > - BUG_ON(!PageLocked(compound_head(page))); > > > + VM_BUG_ON_PAGE(!PageLocked(page), page); > > > > > > +if (PageHead(page)) dump_page(page, "make entry"); > > > +if (PageTail(page)) printk("pfn %lx order %d\n", page_to_pfn(page), thp_order(thp_head(page))); > > > > > > 1523 page:0000000006f62206 refcount:490 mapcount:1 mapping:0000000000000000 index:0x562b12a00 pfn:0x1dc00 > > > 1523 head:0000000006f62206 order:9 compound_mapcount:0 compound_pincount:0 > > > 1523 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked) > > > 1523 raw: 400000000009003d ffffecfd41301308 ffffecfd41b08008 ffff9e9971c00059 > > > 1523 raw: 0000000562b12a00 0000000000000000 000001ea00000000 0000000000000000 > > > 1523 page dumped because: make entry > > > 1523 pfn 1dc01 order 9 > > > 1523 pfn 1dc02 order 9 > > > 1523 pfn 1dc03 order 9 > > > ... > > > > > > Notice that it's an anonymous page, so it's not related to my work. 
> >
> > I don't have much hope, but could you try if the patch below would blow
> > up?
>
> Running it now. Results probably in twenty minutes.

It didn't blow up. I added a dump_stack() after the call to dump_page()
and got this ...

2922 page:0000000085a5c107 refcount:474 mapcount:1 mapping:0000000000000000 index:0x559e98a00 pfn:0x35200
2922 head:0000000085a5c107 order:9 compound_mapcount:0 compound_pincount:0
2922 anon flags: 0x400000000009003d(locked|uptodate|dirty|lru|active|head|swapbacked)
2922 raw: 400000000009003d ffffe8e5c1bbaf48 ffffe8e5c046ee88 ffffa2e7f3787ec9
2922 raw: 0000000559e98a00 0000000000000000 000001da00000000 0000000000000000
2922 page dumped because: make entry
2922 CPU: 5 PID: 23471 Comm: dd Kdump: loaded Tainted: G W 5.8.0-rc6-00067-gd8b18bdf9870-dirty #358
2922 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
2922 Call Trace:
2922 dump_stack+0x5e/0x7a
2922 try_to_unmap_one+0x846/0x860
2922 rmap_walk_anon+0x13d/0x2a0
2922 rmap_walk_locked+0x23/0x30
2922 try_to_unmap+0x64/0xbc
2922 split_huge_page_to_list+0x188/0xdb0
2922 deferred_split_scan+0x148/0x240
2922 shrink_slab.constprop.0+0x198/0x330
2922 shrink_node+0x1a8/0x440
2922 try_to_free_pages+0x18f/0x480
2922 __alloc_pages_slowpath.constprop.0+0x297/0xca0
2922 __alloc_pages_nodemask+0x1ba/0x1e0
2922 pagecache_get_page+0xd8/0x330
2922 grab_cache_page_write_begin+0x1c/0x40
2922 iomap_write_begin+0x2d6/0x6d0
2922 iomap_write_actor+0x8b/0x1c0
2922 iomap_apply+0xe3/0x310
2922 iomap_file_buffered_write+0x5c/0x80
2922 xfs_file_buffered_aio_write+0xbd/0x310
2922 xfs_file_write_iter+0xa8/0xc0
2922 new_sync_write+0xf5/0x170
2922 vfs_write+0x191/0x1e0

I think that's interesting because it's not trying to allocate a huge page
itself (I didn't touch the write_begin path). Rather, I presume the
additional memory pressure from allocating huge pages is causing anonymous
pages to be split to free up memory.
It survived all the way to generic/224 this run, but I don't think that's relevant.
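The raw flags words in the page dumps quoted in this thread can be decoded by hand to see whether the locked bit is set. The sketch below is standalone C, not kernel code. The bit positions are an assumption: they are inferred by matching the set bits of 0x9003d and 0x2030 against the flag names printed beside them, taking the names to be listed in ascending bit order. They are not taken from kernel headers and may differ on other versions or configurations. The names `dump_flag_bit` and `dump_flag_set` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit positions, inferred from the dumps in this thread
 * (flag names matched to set bits in ascending order). Hypothetical
 * names; not copied from include/linux/page-flags.h.
 */
enum dump_flag_bit {
	DUMP_LOCKED     = 0,
	DUMP_UPTODATE   = 2,
	DUMP_DIRTY      = 3,
	DUMP_LRU        = 4,
	DUMP_ACTIVE     = 5,
	DUMP_PRIVATE    = 13,
	DUMP_HEAD       = 16,
	DUMP_SWAPBACKED = 19,
};

/* True if the given bit is set in a raw flags word from a page dump. */
static bool dump_flag_set(uint64_t flags, enum dump_flag_bit bit)
{
	return (flags >> bit) & 1;
}
```

Under those assumptions, the "make entry" dumps (0x400000000009003d) have the locked bit set, while the "not locked" dumps (0x4000000000002030 and 0x4000000000000000) do not, matching the flag names printed in the logs.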
* Re: [syzbot] kernel BUG at include/linux/swapops.h:LINE! 2020-05-30 17:05 kernel BUG at include/linux/swapops.h:LINE! syzbot 2020-07-19 21:10 ` syzbot @ 2021-05-08 11:24 ` syzbot 1 sibling, 0 replies; 19+ messages in thread From: syzbot @ 2021-05-08 11:24 UTC (permalink / raw) To: Markus.Elfring, akpm, axboe, david, hannes, hdanton, jennifer, kirill.shutemov, kirill, linux-kernel, linux-mm, mike.kravetz, rcampbell, syzkaller-bugs, willy syzbot has found a reproducer for the following issue on: HEAD commit: 869a85b9 Add linux-next specific files for 20210507 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=15b48a63d00000 kernel config: https://syzkaller.appspot.com/x/.config?x=b72885037018d06d dashboard link: https://syzkaller.appspot.com/bug?extid=c48f34012b06c4ac67dd syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10479fd5d00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1565e995d00000 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+c48f34012b06c4ac67dd@syzkaller.appspotmail.com ------------[ cut here ]------------ kernel BUG at include/linux/swapops.h:197! 
invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 8460 Comm: syz-executor246 Not tainted 5.12.0-next-20210507-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline] RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline] RIP: 0010:zap_huge_pmd+0xe5b/0x1110 mm/huge_memory.c:1697 Code: 2b 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 a8 f6 ff ff e8 18 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 66 f7 ff ff e8 05 3f b8 ff <0f> 0b e8 fe 3e b8 ff 31 f6 31 ff 49 bc 00 f0 ff ff ff ff 0f 00 e8 RSP: 0018:ffffc90001a2f730 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888024bc5580 RSI: ffffffff81bc972b RDI: 0000000000000003 RBP: ffffc90001a2fa48 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff81bc8ec8 R11: 0000000000000000 R12: ffff88802c9a5800 R13: ffffea0000e58080 R14: ffff8880303bfea0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c8168 CR3: 0000000016b36000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: zap_pmd_range mm/memory.c:1361 [inline] zap_pud_range mm/memory.c:1403 [inline] zap_p4d_range mm/memory.c:1424 [inline] unmap_page_range+0x1aa4/0x2650 mm/memory.c:1445 unmap_single_vma+0x198/0x300 mm/memory.c:1490 unmap_vmas+0x16d/0x2f0 mm/memory.c:1522 exit_mmap+0x2a8/0x590 mm/mmap.c:3207 __mmput+0x122/0x470 kernel/fork.c:1096 mmput+0x58/0x60 kernel/fork.c:1117 exit_mm kernel/exit.c:502 [inline] do_exit+0xb0a/0x2a60 kernel/exit.c:813 do_group_exit+0x125/0x310 kernel/exit.c:923 get_signal+0x47f/0x2150 kernel/signal.c:2856 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:789 handle_signal_work kernel/entry/common.c:147 [inline] 
exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x171/0x280 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301 do_syscall_64+0x47/0xb0 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4458f9 Code: Unable to access opcode bytes at RIP 0x4458cf. RSP: 002b:00007f45f24e4318 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00000000004ca408 RCX: 00000000004458f9 RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000004ca408 RBP: 00000000004ca400 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000001000000020 R13: 00007ffeb676771f R14: 00007f45f24e4400 R15: 0000000000022000 Modules linked in: ---[ end trace 8c9f5c48deec1bb7 ]--- RIP: 0010:migration_entry_to_page include/linux/swapops.h:197 [inline] RIP: 0010:migration_entry_to_page include/linux/swapops.h:190 [inline] RIP: 0010:zap_huge_pmd+0xe5b/0x1110 mm/huge_memory.c:1697 Code: 2b 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 a8 f6 ff ff e8 18 3f b8 ff 48 8b 5c 24 10 48 83 eb 01 e9 66 f7 ff ff e8 05 3f b8 ff <0f> 0b e8 fe 3e b8 ff 31 f6 31 ff 49 bc 00 f0 ff ff ff ff 0f 00 e8 RSP: 0018:ffffc90001a2f730 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888024bc5580 RSI: ffffffff81bc972b RDI: 0000000000000003 RBP: ffffc90001a2fa48 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff81bc8ec8 R11: 0000000000000000 R12: ffff88802c9a5800 R13: ffffea0000e58080 R14: ffff8880303bfea0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c8168 CR3: 000000000bc8e000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ^ permalink raw reply 
Thread overview: 19+ messages

2020-05-30 17:05 kernel BUG at include/linux/swapops.h:LINE! syzbot
2020-07-19 21:10 ` syzbot
2020-07-20 23:51 ` Andrew Morton
2020-07-21 0:21 ` Matthew Wilcox
2020-07-21 2:14 ` Matthew Wilcox
2020-07-21 11:11 ` Kirill A. Shutemov
2020-07-21 15:11 ` Jens Axboe
2020-07-23 7:37 ` Hillf Danton
2020-07-24 11:13 ` Kirill A. Shutemov
2020-07-26 16:49 ` Matthew Wilcox
2020-07-27 10:31 ` Kirill A. Shutemov
2020-07-27 12:03 ` Matthew Wilcox
2020-07-27 12:59 ` Hillf Danton
2020-07-27 13:44 ` Matthew Wilcox
2020-07-27 14:46 ` Hillf Danton
2020-07-29 19:21 ` Kirill A. Shutemov
2020-07-29 19:54 ` Matthew Wilcox
2020-07-29 22:11 ` Matthew Wilcox
2021-05-08 11:24 ` [syzbot] " syzbot