* INFO: task can't die in shrink_inactive_list (2)
@ 2020-11-21  1:55 syzbot
  2020-11-24  3:54 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: syzbot @ 2020-11-21  1:55 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    03430750 Add linux-next specific files for 20201116
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com

INFO: task syz-executor880:8534 can't die for more than 143 seconds.
task:syz-executor880 state:R  running task     stack:25304 pid: 8534 ppid:  8504 flags:0x00004006
Call Trace:
 context_switch kernel/sched/core.c:4269 [inline]
 __schedule+0x890/0x2030 kernel/sched/core.c:5019
 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:5179
 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:40
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x3c/0x40 kernel/locking/spinlock.c:199
 spin_unlock_irq include/linux/spinlock.h:404 [inline]
 shrink_inactive_list+0x4b1/0xce0 mm/vmscan.c:1974
 shrink_list mm/vmscan.c:2167 [inline]
 shrink_lruvec+0x61b/0x11b0 mm/vmscan.c:2462
 shrink_node_memcgs mm/vmscan.c:2650 [inline]
 shrink_node+0x839/0x1d60 mm/vmscan.c:2767
 shrink_zones mm/vmscan.c:2970 [inline]
 do_try_to_free_pages+0x38b/0x1440 mm/vmscan.c:3025
 try_to_free_pages+0x29f/0x720 mm/vmscan.c:3264
 __perform_reclaim mm/page_alloc.c:4360 [inline]
 __alloc_pages_direct_reclaim mm/page_alloc.c:4381 [inline]
 __alloc_pages_slowpath.constprop.0+0x917/0x2510 mm/page_alloc.c:4785
 __alloc_pages_nodemask+0x5f0/0x730 mm/page_alloc.c:4995
 alloc_pages_current+0x191/0x2a0 mm/mempolicy.c:2271
 alloc_pages include/linux/gfp.h:547 [inline]
 __page_cache_alloc mm/filemap.c:977 [inline]
 __page_cache_alloc+0x2ce/0x360 mm/filemap.c:962
 page_cache_ra_unbounded+0x3a1/0x920 mm/readahead.c:216
 do_page_cache_ra+0xf9/0x140 mm/readahead.c:267
 do_sync_mmap_readahead mm/filemap.c:2721 [inline]
 filemap_fault+0x19d0/0x2940 mm/filemap.c:2809
 __do_fault+0x10d/0x4d0 mm/memory.c:3623
 do_shared_fault mm/memory.c:4071 [inline]
 do_fault mm/memory.c:4149 [inline]
 handle_pte_fault mm/memory.c:4385 [inline]
 __handle_mm_fault mm/memory.c:4520 [inline]
 handle_mm_fault+0x3033/0x55d0 mm/memory.c:4618
 do_user_addr_fault+0x55b/0xba0 arch/x86/mm/fault.c:1377
 handle_page_fault arch/x86/mm/fault.c:1434 [inline]
 exc_page_fault+0x9e/0x180 arch/x86/mm/fault.c:1490
 asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:580
RIP: 0033:0x400e71
Code: Unable to access opcode bytes at RIP 0x400e47.
RSP: 002b:00007f8a5353fdc0 EFLAGS: 00010246
RAX: 6c756e2f7665642f RBX: 00000000006dbc38 RCX: 0000000000402824
RDX: 928195da81441750 RSI: 0000000000000000 RDI: 00000000006dbc30
RBP: 00000000006dbc30 R08: 0000000000000000 R09: 00007f8a53540700
R10: 00007f8a535409d0 R11: 0000000000000202 R12: 00000000006dbc3c
R13: 00007ffe80747a5f R14: 00007f8a535409c0 R15: 0000000000000001

Showing all locks held in the system:
1 lock held by khungtaskd/1659:
 #0: ffffffff8b339ce0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6252
1 lock held by kswapd0/2195:
1 lock held by kswapd1/2196:
1 lock held by in:imklog/8191:
 #0: ffff8880125b1270 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:932
1 lock held by cron/8189:
2 locks held by syz-executor880/8502:
2 locks held by syz-executor880/8505:
2 locks held by syz-executor880/8507:
2 locks held by syz-executor880/11706:
2 locks held by syz-executor880/11709:
3 locks held by syz-executor880/12008:
2 locks held by syz-executor880/12015:

=============================================



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-11-21  1:55 INFO: task can't die in shrink_inactive_list (2) syzbot
@ 2020-11-24  3:54 ` Andrew Morton
  2020-11-24  5:20   ` Alex Shi
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andrew Morton @ 2020-11-24  3:54 UTC (permalink / raw)
  To: syzbot; +Cc: linux-kernel, linux-mm, syzkaller-bugs, Alex Shi

On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:

> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    03430750 Add linux-next specific files for 20201116
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
> dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
> compiler:       gcc (GCC) 10.1.0-syz 20200507
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000

Alex, your series "per memcg lru lock" changed the vmscan code rather a
lot.  Could you please take a look at that reproducer?

> [...]


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-11-24  3:54 ` Andrew Morton
@ 2020-11-24  5:20   ` Alex Shi
  2020-11-24 12:00   ` Alex Shi
  2020-12-21 19:56   ` Hugh Dickins
  2 siblings, 0 replies; 11+ messages in thread
From: Alex Shi @ 2020-11-24  5:20 UTC (permalink / raw)
  To: Andrew Morton, syzbot
  Cc: linux-kernel, linux-mm, syzkaller-bugs, Hugh Dickins, Johannes Weiner

CC: Hugh Dickins & Johannes,



On 2020/11/24 at 11:54 AM, Andrew Morton wrote:
> On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> 
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    03430750 Add linux-next specific files for 20201116
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
>> compiler:       gcc (GCC) 10.1.0-syz 20200507
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000
> 
> Alex, your series "per memcg lru lock" changed the vmscan code rather a
> lot.  Could you please take a look at that reproducer?
> 

Sure, I will try to reproduce and look into it.

Thanks!
Alex

>> [...]


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-11-24  3:54 ` Andrew Morton
  2020-11-24  5:20   ` Alex Shi
@ 2020-11-24 12:00   ` Alex Shi
  2020-11-24 13:53     ` Alex Shi
  2020-11-24 14:35     ` Alex Shi
  2020-12-21 19:56   ` Hugh Dickins
  2 siblings, 2 replies; 11+ messages in thread
From: Alex Shi @ 2020-11-24 12:00 UTC (permalink / raw)
  To: Andrew Morton, syzbot
  Cc: linux-kernel, linux-mm, syzkaller-bugs, Hugh Dickins,
	Johannes Weiner, peterz



On 2020/11/24 at 11:54 AM, Andrew Morton wrote:
> On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> 
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    03430750 Add linux-next specific files for 20201116
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
>> compiler:       gcc (GCC) 10.1.0-syz 20200507
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000

CC Peter Zijlstra.

I found next-20200821 had a very similar oops to this one.
https://groups.google.com/g/syzkaller-upstream-moderation/c/S0pyqK1dZv8/m/dxMoEhGdAQAJ
So does this mean the bug has existed for a long time, since 5.9-rc1?

The reproducer triggers only randomly on a cpu=2, mem=1600M x86 VM. It can cause a hang
again on both kernels, but each with a different kernel stack.

Maybe the system is just too busy? I will try older kernels with the reproducer.

Thanks
Alex

BTW, I removed the drm and wireless configs in my testing.

[ 1861.939128][ T1586] INFO: task systemd-udevd:8999 blocked for more than 143 seconds.
[ 1861.939969][ T1586]       Not tainted 5.9.0-rc1-next-20200821 #5
[ 1861.940553][ T1586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1861.941369][ T1586] task:systemd-udevd   state:D stack:21192 pid: 8999 ppid:  4717 flags:0x00004080
[ 1861.942245][ T1586] Call Trace:
[ 1861.942581][ T1586]  __schedule+0xaab/0x1f20
[ 1861.943014][ T1586]  ? __sched_text_start+0x8/0x8
[ 1861.943482][ T1586]  schedule+0xc4/0x2b0
[ 1861.943872][ T1586]  schedule_preempt_disabled+0xf/0x20
[ 1861.944390][ T1586]  __mutex_lock+0x8a0/0x13e0
[ 1861.944831][ T1586]  ? __blkdev_get+0x4bc/0x1a00
[ 1861.945286][ T1586]  ? mutex_lock_io_nested+0x12c0/0x12c0
[ 1861.945818][ T1586]  ? up_read+0x1a5/0x740
[ 1861.946224][ T1586]  ? down_read+0x10a/0x420
[ 1861.946653][ T1586]  ? kobj_lookup+0x37a/0x480
[ 1861.947095][ T1586]  ? __blkdev_get+0x4bc/0x1a00
[ 1861.947545][ T1586]  __blkdev_get+0x4bc/0x1a00
[ 1861.947997][ T1586]  ? lock_release+0x730/0x730
[ 1861.948464][ T1586]  ? __blkdev_put+0x720/0x720
[ 1861.962189][T15367] systemd-journald[15367]: Sent WATCHDOG=1 notification.
[ 1861.991663][ T1586]  blkdev_get+0x20/0x80
[ 1861.992088][ T1586]  blkdev_open+0x20a/0x290
[ 1861.992514][ T1586]  do_dentry_open+0x69a/0x1240
[ 1861.992975][ T1586]  ? bd_acquire+0x2c0/0x2c0
[ 1861.993414][ T1586]  path_openat+0xdd2/0x26f0
[ 1861.993846][ T1586]  ? path_lookupat.isra.41+0x520/0x520
[ 1861.994368][ T1586]  ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[ 1861.994937][ T1586]  ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[ 1861.995502][ T1586]  ? ___sys_sendmsg+0x11c/0x180
[ 1861.995954][ T1586]  ? find_held_lock+0x33/0x1c0
[ 1861.996405][ T1586]  ? __might_fault+0x11f/0x1d0
[ 1861.996850][ T1586]  do_filp_open+0x192/0x260
[ 1861.997268][ T1586]  ? may_open_dev+0xf0/0xf0
[ 1861.997699][ T1586]  ? rwlock_bug.part.1+0x90/0x90
[ 1861.998161][ T1586]  ? do_raw_spin_unlock+0x4f/0x260
[ 1861.998650][ T1586]  ? __alloc_fd+0x282/0x600
[ 1862.002012][ T1586]  ? lock_downgrade+0x6f0/0x6f0
[ 1862.007607][ T1586]  do_sys_openat2+0x573/0x850
[ 1862.008112][ T1586]  ? file_open_root+0x3f0/0x3f0
[ 1862.008570][ T1586]  ? trace_hardirqs_on+0x5f/0x220
[ 1862.028918][ T1586]  do_sys_open+0xca/0x140
[ 1862.028932][ T1586]  ? filp_open+0x70/0x70
[ 1862.028945][ T1586]  do_syscall_64+0x2d/0x70
[ 1862.028954][ T1586]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1862.028966][ T1586] RIP: 0033:0x7fc04686eea0
[ 1862.028969][ T1586] Code: Bad RIP value.
[ 1862.028974][ T1586] RSP: 002b:00007ffd2c78ae68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[ 1862.028983][ T1586] RAX: ffffffffffffffda RBX: 000055785498f3c0 RCX: 00007fc04686eea0
[ 1862.028988][ T1586] RDX: 000055785498fcd0 RSI: 00000000000a0800 RDI: 000055785498fcd0
[ 1862.028992][ T1586] RBP: 0000000000000000 R08: 00007ffd2c7ad090 R09: 0000000000051dc0
[ 1862.028997][ T1586] R10: 0000000000051dc0 R11: 0000000000000246 R12: 0000557854990340
[ 1862.029002][ T1586] R13: 0000557854984010 R14: 0000557854990200 R15: 000000000000000c
[ 1862.029024][ T1586] INFO: task repro:17514 can't die for more than 143 seconds.
[ 1862.036603][ T1586] task:repro           state:R  running task     stack:25520 pid:17514 ppid:  8947 flags:0x00004086
[ 1862.037596][ T1586] Call Trace:
[ 1862.037909][ T1586]  __schedule+0xaab/0x1f20
[ 1862.038322][ T1586]  ? __sched_text_start+0x8/0x8
[ 1862.038776][ T1586]  ? preempt_schedule_irq+0x30/0x90
[ 1862.070004][ T1586]  ? bdev_evict_inode+0x420/0x420
[ 1862.070497][ T1586]  ? _raw_spin_unlock_irqrestore+0x47/0x60
[ 1862.071036][ T1586]  ? blkdev_write_begin+0x40/0x40
[ 1862.071504][ T1586]  ? read_pages+0x1ee/0x1170
[ 1862.071933][ T1586]  ? _raw_spin_unlock_irqrestore+0x34/0x60
[ 1862.072484][ T1586]  ? debug_check_no_obj_freed+0x205/0x45a
[ 1862.073013][ T1586]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 1862.073532][ T1586]  ? read_cache_pages+0x6e0/0x6e0
[ 1862.074002][ T1586]  ? page_cache_readahead_unbounded+0x57b/0x800
[ 1862.074591][ T1586]  ? read_pages+0x1170/0x1170
[ 1862.075028][ T1586]  ? down_read_non_owner+0x470/0x470
[ 1862.075522][ T1586]  ? __do_page_cache_readahead+0xc2/0xf0
[ 1862.076044][ T1586]  ? filemap_fault+0x16df/0x24d0
[ 1862.076507][ T1586]  ? lockdep_init_map_waits+0x267/0x7c0
[ 1862.077021][ T1586]  ? __do_fault+0x10d/0x530
[ 1862.077442][ T1586]  ? handle_mm_fault+0x36af/0x4800
[ 1862.077920][ T1586]  ? copy_page_range+0x2ea0/0x2ea0
[ 1862.078400][ T1586]  ? vmacache_update+0xce/0x140
[ 1862.078851][ T1586]  ? do_user_addr_fault+0x564/0xb40
[ 1862.088905][ T1586]  ? exc_page_fault+0xa1/0x170
[ 1862.089363][ T1586]  ? asm_exc_page_fault+0x8/0x30
[ 1862.089828][ T1586]  ? asm_exc_page_fault+0x1e/0x30
[ 1862.090314][ T1586] INFO: task repro:17545 can't die for more than 143 seconds.
[ 1862.090990][ T1586] task:repro           state:R  running task     stack:25312 pid:17545 ppid:  8946 flags:0x00004086
[ 1862.091978][ T1586] Call Trace:
[ 1862.092282][ T1586]  __schedule+0xaab/0x1f20
[ 1862.092696][ T1586]  ? __sched_text_start+0x8/0x8
[ 1862.093141][ T1586]  ? trace_hardirqs_on+0x5f/0x220
[ 1862.093613][ T1586]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 1862.094182][ T1586]  ? preempt_schedule_thunk+0x16/0x18
[ 1862.094681][ T1586]  preempt_schedule_common+0x1a/0xc0
[ 1862.095167][ T1586]  preempt_schedule_thunk+0x16/0x18
[ 1862.095653][ T1586]  kernel_init_free_pages+0xf0/0x110
[ 1862.096145][ T1586]  prep_new_page+0x12e/0x1f0
[ 1862.096578][ T1586]  get_page_from_freelist+0x1202/0x56c0
[ 1862.097097][ T1586]  ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[ 1862.097659][ T1586]  ? __isolate_free_page+0x600/0x600
[ 1862.098147][ T1586]  __alloc_pages_nodemask+0x2d7/0x7d0
[ 1862.098645][ T1586]  ? __alloc_pages_slowpath.constprop.108+0x2380/0x2380
[ 1862.121238][ T1586]  ? add_to_page_cache_lru+0x1a3/0x700
[ 1862.121773][ T1586]  ? add_to_page_cache_locked+0x40/0x40
[ 1862.122285][ T1586]  alloc_pages_current+0x108/0x200
[ 1862.122766][ T1586]  __page_cache_alloc+0xfc/0x300
[ 1862.123229][ T1586]  page_cache_readahead_unbounded+0x47f/0x800
[ 1862.123795][ T1586]  ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1862.124307][ T1586]  ? read_pages+0x1170/0x1170
[ 1862.124740][ T1586]  ? find_held_lock+0x33/0x1c0
[ 1862.125182][ T1586]  ? inode_congested+0x256/0x4e0
[ 1862.125647][ T1586]  ? page_cache_async_readahead+0x3e5/0x7c0
[ 1862.126199][ T1586]  __do_page_cache_readahead+0xc2/0xf0
[ 1862.126708][ T1586]  ondemand_readahead+0x579/0xd20
[ 1862.127175][ T1586]  page_cache_async_readahead+0x43b/0x7c0
[ 1862.127703][ T1586]  filemap_fault+0xde9/0x24d0
[ 1862.128143][ T1586]  __do_fault+0x10d/0x530
[ 1862.128550][ T1586]  handle_mm_fault+0x36af/0x4800
[ 1862.148903][ T1586]  ? copy_page_range+0x2ea0/0x2ea0
[ 1862.149413][ T1586]  ? vmacache_update+0xce/0x140
[ 1862.149866][ T1586]  do_user_addr_fault+0x564/0xb40
[ 1862.150336][ T1586]  exc_page_fault+0xa1/0x170
[ 1862.150767][ T1586]  ? asm_exc_page_fault+0x8/0x30
[ 1862.151222][ T1586]  asm_exc_page_fault+0x1e/0x30
[ 1862.151671][ T1586] RIP: 0033:0x428dd7
[ 1862.152033][ T1586] Code: Bad RIP value.
[ 1862.152416][ T1586] RSP: 002b:00007f8995966d78 EFLAGS: 00010202
[ 1862.152980][ T1586] RAX: 0000000020000080 RBX: 0000000000000000 RCX: 000000007665642f
[ 1862.153711][ T1586] RDX: 000000000000000c RSI: 00000000004b2370 RDI: 0000000020000080
[ 1862.154444][ T1586] RBP: 00007f8995966da0 R08: 00007f8995967700 R09: 00007f8995967700
[ 1862.155173][ T1586] R10: 00007f89959679d0 R11: 0000000000000202 R12: 0000000000000000
[ 1862.155901][ T1586] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f8995967700
[ 1862.156630][ T1586] INFO: task repro:17769 can't die for more than 143 seconds.
[ 1862.157309][ T1586] task:repro           state:D stack:28536 pid:17769 ppid:  8950 flags:0x00000084
[ 1862.158153][ T1586] Call Trace:
[ 1862.158467][ T1586]  __schedule+0xaab/0x1f20
[ 1862.178902][ T1586]  ? __sched_text_start+0x8/0x8
[ 1862.179370][ T1586]  schedule+0xc4/0x2b0
[ 1862.179750][ T1586]  schedule_preempt_disabled+0xf/0x20
[ 1862.180247][ T1586]  __mutex_lock+0x8a0/0x13e0
[ 1862.180681][ T1586]  ? __blkdev_get+0x4bc/0x1a00
[ 1862.181121][ T1586]  ? mutex_lock_io_nested+0x12c0/0x12c0
[ 1862.181643][ T1586]  ? up_read+0x1a5/0x740
[ 1862.182035][ T1586]  ? down_read+0x10a/0x420
[ 1862.182449][ T1586]  ? kobj_lookup+0x37a/0x480
[ 1862.182879][ T1586]  ? __blkdev_get+0x4bc/0x1a00
[ 1862.183317][ T1586]  __blkdev_get+0x4bc/0x1a00
[ 1862.183747][ T1586]  ? lock_release+0x730/0x730
[ 1862.184182][ T1586]  ? __blkdev_put+0x720/0x720
[ 1862.184623][ T1586]  blkdev_get+0x20/0x80
[ 1862.185006][ T1586]  blkdev_open+0x20a/0x290
[ 1862.185420][ T1586]  do_dentry_open+0x69a/0x1240
[ 1862.185856][ T1586]  ? bd_acquire+0x2c0/0x2c0
[ 1862.186278][ T1586]  path_openat+0xdd2/0x26f0
[ 1862.186710][ T1586]  ? path_lookupat.isra.41+0x520/0x520
[ 1862.187209][ T1586]  ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[ 1862.187769][ T1586]  ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[ 1862.188317][ T1586]  ? find_held_lock+0x33/0x1c0
[ 1862.188755][ T1586]  ? __might_fault+0x11f/0x1d0
[ 1862.208898][ T1586]  do_filp_open+0x192/0x260
[ 1862.209341][ T1586]  ? may_open_dev+0xf0/0xf0
[ 1862.209774][ T1586]  ? rwlock_bug.part.1+0x90/0x90
[ 1862.210230][ T1586]  ? do_raw_spin_unlock+0x4f/0x260
[ 1862.210712][ T1586]  ? __alloc_fd+0x282/0x600
[ 1862.211134][ T1586]  ? lock_downgrade+0x6f0/0x6f0
[ 1862.211585][ T1586]  do_sys_openat2+0x573/0x850
[ 1862.212022][ T1586]  ? file_open_root+0x3f0/0x3f0
[ 1862.212478][ T1586]  ? trace_hardirqs_on+0x5f/0x220
[ 1862.212942][ T1586]  do_sys_open+0xca/0x140
[ 1862.213346][ T1586]  ? filp_open+0x70/0x70
[ 1862.213752][ T1586]  do_syscall_64+0x2d/0x70
[ 1862.214162][ T1586]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1862.214713][ T1586] RIP: 0033:0x437419
[ 1862.215071][ T1586] Code: Bad RIP value.
[ 1862.215450][ T1586] RSP: 002b:00007f8995966d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 1862.216222][ T1586] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000437419
[ 1862.216961][ T1586] RDX: 0004000000004002 RSI: 0000000020000080 RDI: ffffffffffffff9c
[ 1862.217687][ T1586] RBP: 00007f8995966da0 R08: 00007f8995967700 R09: 0000000000000000
[ 1862.218419][ T1586] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 1862.248893][ T1586] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f8995967700

or

[ 2005.496484][ T1626] INFO: task repro:10028 blocked for more than 143 seconds.
[ 2005.497214][ T1626]       Not tainted 5.10.0-rc3-next-20201116 #2
[ 2005.497841][ T1626] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2005.498710][ T1626] task:repro           state:D stack:28080 pid:10028 ppid:  9078 flags:0x00004082
[ 2005.499641][ T1626] Call Trace:
[ 2005.499977][ T1626]  __schedule+0xaaa/0x1f70
[ 2005.500436][ T1626]  ? __sched_text_start+0x8/0x8
[ 2005.500930][ T1626]  schedule+0xc3/0x270
[ 2005.501347][ T1626]  schedule_preempt_disabled+0xf/0x20
[ 2005.501884][ T1626]  __mutex_lock+0x856/0x1420
[ 2005.502358][ T1626]  ? blkdev_put+0x31/0x530
[ 2005.513908][ T1626]  ? mutex_lock_io_nested+0x12c0/0x12c0
[ 2005.514486][ T1626]  ? lock_release+0x690/0x690
[ 2005.514962][ T1626]  ? do_raw_spin_lock+0x121/0x2d0
[ 2005.515468][ T1626]  ? rwlock_bug.part.1+0x90/0x90
[ 2005.515965][ T1626]  ? locks_check_ctx_file_list+0x1d/0x110
[ 2005.516551][ T1626]  ? __fsnotify_parent+0x4fc/0xab0
[ 2005.517076][ T1626]  ? _raw_spin_unlock+0x24/0x40
[ 2005.517572][ T1626]  ? locks_remove_file+0x319/0x570
[ 2005.518086][ T1626]  ? blkdev_put+0x530/0x530
[ 2005.518538][ T1626]  ? blkdev_put+0x31/0x530
[ 2005.518981][ T1626]  blkdev_put+0x31/0x530
[ 2005.519418][ T1626]  ? blkdev_put+0x530/0x530
[ 2005.519872][ T1626]  blkdev_close+0x8c/0xb0
[ 2005.520315][ T1626]  __fput+0x270/0x8e0
[ 2005.520719][ T1626]  task_work_run+0xe0/0x1a0
[ 2005.521178][ T1626]  do_exit+0xb80/0x2eb0
[ 2005.521612][ T1626]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2005.522173][ T1626]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 2005.547613][ T1626]  ? mm_update_next_owner+0x7d0/0x7d0
[ 2005.548173][ T1626]  ? get_signal+0x325/0x2350
[ 2005.548643][ T1626]  ? lock_downgrade+0x6a0/0x6a0
[ 2005.549135][ T1626]  do_group_exit+0x125/0x340
[ 2005.549606][ T1626]  get_signal+0x3f8/0x2350
[ 2005.550050][ T1626]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2005.550602][ T1626]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 2005.551127][ T1626]  ? find_held_lock+0x33/0x1c0
[ 2005.551610][ T1626]  arch_do_signal_or_restart+0x1ea/0x1d40
[ 2005.552181][ T1626]  ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2005.562694][ T1626]  ? copy_siginfo_to_user32+0xa0/0xa0
[ 2005.563262][ T1626]  ? __x64_sys_futex+0x3f3/0x5b0
[ 2005.563758][ T1626]  ? __x64_sys_futex+0x3fc/0x5b0
[ 2005.564262][ T1626]  ? kfree+0x528/0x5b0
[ 2005.564683][ T1626]  ? dput.part.30+0x16/0xbc0
[ 2005.565152][ T1626]  ? unroll_tree_refs+0x2ae/0x410
[ 2005.565672][ T1626]  exit_to_user_mode_prepare+0x108/0x200
[ 2005.566247][ T1626]  syscall_exit_to_user_mode+0x38/0x260
[ 2005.566801][ T1626]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2005.567402][ T1626] RIP: 0033:0x437419
[ 2005.567796][ T1626] Code: Unable to access opcode bytes at RIP 0x4373ef.
[ 2005.568487][ T1626] RSP: 002b:00007f0f27ecfd88 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[ 2005.569325][ T1626] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000437419
[ 2005.570109][ T1626] RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000006e385c
[ 2005.570900][ T1626] RBP: 00007f0f27ecfda0 R08: 0000000000000000 R09: 0000000000000000
[ 2005.571697][ T1626] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 2005.572503][ T1626] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0f27ed0700

> 
> Alex, your series "per memcg lru lock" changed the vmscan code rather a
> lot.  Could you please take a look at that reproducer?
> 
>> [...]


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-11-24 12:00   ` Alex Shi
@ 2020-11-24 13:53     ` Alex Shi
  2020-11-24 14:35     ` Alex Shi
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Shi @ 2020-11-24 13:53 UTC (permalink / raw)
  To: Andrew Morton, syzbot
  Cc: linux-kernel, linux-mm, syzkaller-bugs, Hugh Dickins,
	Johannes Weiner, peterz



On 2020/11/24 at 8:00 PM, Alex Shi wrote:
>>> [...]
> CC Peter Zijlstra.
> 
> I found next-20200821 had a very similar oops to this one.
> https://groups.google.com/g/syzkaller-upstream-moderation/c/S0pyqK1dZv8/m/dxMoEhGdAQAJ
> So does this mean the bug has existed for a long time, since 5.9-rc1?

The 5.8 kernel sometimes also fails this test on my 2-CPU VM guest with 2GB of memory:


Thanks
Alex

[ 5875.750929][  T946] INFO: task repro:31866 blocked for more than 143 seconds.
[ 5875.751618][  T946]       Not tainted 5.8.0 #6
[ 5875.752046][  T946] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables th.
[ 5875.752845][  T946] repro           D12088 31866      1 0x80004086
[ 5875.753436][  T946] Call Trace:
[ 5875.753747][  T946]  __schedule+0x394/0x950
[ 5875.774033][  T946]  ? __mutex_lock+0x46f/0x9c0
[ 5875.774481][  T946]  ? blkdev_put+0x18/0x120
[ 5875.774894][  T946]  schedule+0x37/0xe0
[ 5875.775260][  T946]  schedule_preempt_disabled+0xf/0x20
[ 5875.775753][  T946]  __mutex_lock+0x474/0x9c0
[ 5875.776174][  T946]  ? lock_acquire+0xa7/0x390
[ 5875.776602][  T946]  ? locks_remove_file+0x1e7/0x2d0
[ 5875.777079][  T946]  ? blkdev_put+0x18/0x120
[ 5875.777485][  T946]  blkdev_put+0x18/0x120
[ 5875.777880][  T946]  blkdev_close+0x1f/0x30
[ 5875.778281][  T946]  __fput+0xf0/0x260
[ 5875.778639][  T946]  task_work_run+0x68/0xb0
[ 5875.779054][  T946]  do_exit+0x3df/0xce0
[ 5875.779430][  T946]  ? get_signal+0x11d/0xca0
[ 5875.779846][  T946]  do_group_exit+0x42/0xb0
[ 5875.780261][  T946]  get_signal+0x16a/0xca0
[ 5875.780662][  T946]  ? handle_mm_fault+0xc8f/0x19c0
[ 5875.781134][  T946]  do_signal+0x2b/0x8e0
[ 5875.781521][  T946]  ? trace_hardirqs_off+0xe/0xf0
[ 5875.781989][  T946]  __prepare_exit_to_usermode+0xef/0x1f0
[ 5875.782512][  T946]  ? asm_exc_page_fault+0x8/0x30
[ 5875.782979][  T946]  prepare_exit_to_usermode+0x5/0x30
[ 5875.783461][  T946]  asm_exc_page_fault+0x1e/0x30
[ 5875.783909][  T946] RIP: 0033:0x428dd7
[ 5875.794899][  T946] Code: Bad RIP value.
[ 5875.795290][  T946] RSP: 002b:00007f37c99e0d78 EFLAGS: 00010202
[ 5875.795858][  T946] RAX: 0000000020000080 RBX: 0000000000000000 RCX: 0000000076656f
[ 5875.796588][  T946] RDX: 000000000000000c RSI: 00000000004b2370 RDI: 00000000200000
[ 5875.797326][  T946] RBP: 00007f37c99e0da0 R08: 00007f37c99e1700 R09: 00007f37c99e10
[ 5875.798063][  T946] R10: 00007f37c99e19d0 R11: 0000000000000202 R12: 00000000000000
[ 5875.798802][  T946] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f37c99e10


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-11-24 12:00   ` Alex Shi
  2020-11-24 13:53     ` Alex Shi
@ 2020-11-24 14:35     ` Alex Shi
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Shi @ 2020-11-24 14:35 UTC (permalink / raw)
  To: Andrew Morton, syzbot
  Cc: linux-kernel, linux-mm, syzkaller-bugs, Hugh Dickins,
	Johannes Weiner, peterz



On 2020/11/24 at 8:00 PM, Alex Shi wrote:
>>> [...]
> CC Peter Zijlstra.
> 
> I found next-20200821 had a very similar oops to this one.
> https://groups.google.com/g/syzkaller-upstream-moderation/c/S0pyqK1dZv8/m/dxMoEhGdAQAJ
> So does this mean the bug has existed for a long time, since 5.9-rc1?
> 
> The reproducer triggers only randomly on a cpu=2, mem=1600M x86 VM. It can cause a hang
> again on both kernels, but each with a different kernel stack.
> 
> Maybe the system is just too busy? I will try older kernels with the reproducer.

The 5.8 kernel sometimes also fails this test on my 2-CPU VM guest with 2GB of memory; log below.
Any comments on this issue?

Thanks
Alex

[ 5875.750929][  T946] INFO: task repro:31866 blocked for more than 143 seconds.
[ 5875.751618][  T946]       Not tainted 5.8.0 #6
[ 5875.752046][  T946] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables th.
[ 5875.752845][  T946] repro           D12088 31866      1 0x80004086
[ 5875.753436][  T946] Call Trace:
[ 5875.753747][  T946]  __schedule+0x394/0x950
[ 5875.774033][  T946]  ? __mutex_lock+0x46f/0x9c0
[ 5875.774481][  T946]  ? blkdev_put+0x18/0x120
[ 5875.774894][  T946]  schedule+0x37/0xe0
[ 5875.775260][  T946]  schedule_preempt_disabled+0xf/0x20
[ 5875.775753][  T946]  __mutex_lock+0x474/0x9c0
[ 5875.776174][  T946]  ? lock_acquire+0xa7/0x390
[ 5875.776602][  T946]  ? locks_remove_file+0x1e7/0x2d0
[ 5875.777079][  T946]  ? blkdev_put+0x18/0x120
[ 5875.777485][  T946]  blkdev_put+0x18/0x120
[ 5875.777880][  T946]  blkdev_close+0x1f/0x30
[ 5875.778281][  T946]  __fput+0xf0/0x260
[ 5875.778639][  T946]  task_work_run+0x68/0xb0
[ 5875.779054][  T946]  do_exit+0x3df/0xce0
[ 5875.779430][  T946]  ? get_signal+0x11d/0xca0
[ 5875.779846][  T946]  do_group_exit+0x42/0xb0
[ 5875.780261][  T946]  get_signal+0x16a/0xca0
[ 5875.780662][  T946]  ? handle_mm_fault+0xc8f/0x19c0
[ 5875.781134][  T946]  do_signal+0x2b/0x8e0
[ 5875.781521][  T946]  ? trace_hardirqs_off+0xe/0xf0
[ 5875.781989][  T946]  __prepare_exit_to_usermode+0xef/0x1f0
[ 5875.782512][  T946]  ? asm_exc_page_fault+0x8/0x30
[ 5875.782979][  T946]  prepare_exit_to_usermode+0x5/0x30
[ 5875.783461][  T946]  asm_exc_page_fault+0x1e/0x30
[ 5875.783909][  T946] RIP: 0033:0x428dd7
[ 5875.794899][  T946] Code: Bad RIP value.
[ 5875.795290][  T946] RSP: 002b:00007f37c99e0d78 EFLAGS: 00010202
[ 5875.795858][  T946] RAX: 0000000020000080 RBX: 0000000000000000 RCX: 0000000076656f
[ 5875.796588][  T946] RDX: 000000000000000c RSI: 00000000004b2370 RDI: 00000000200000
[ 5875.797326][  T946] RBP: 00007f37c99e0da0 R08: 00007f37c99e1700 R09: 00007f37c99e10
[ 5875.798063][  T946] R10: 00007f37c99e19d0 R11: 0000000000000202 R12: 00000000000000
[ 5875.798802][  T946] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f37c99e10


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-11-24  3:54 ` Andrew Morton
  2020-11-24  5:20   ` Alex Shi
  2020-11-24 12:00   ` Alex Shi
@ 2020-12-21 19:56   ` Hugh Dickins
  2020-12-21 20:33     ` Matthew Wilcox
  2 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2020-12-21 19:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, linux-kernel, linux-mm, syzkaller-bugs, Alex Shi, Matthew Wilcox

On Mon, 23 Nov 2020, Andrew Morton wrote:
> On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> 
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    03430750 Add linux-next specific files for 20201116
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
> > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000
> 
> Alex, your series "per memcg lru lock" changed the vmscan code rather a
> lot.  Could you please take a look at that reproducer?

Andrew, I promised I'd take a look at this syzreport too (though I think
we're agreed by now that it has nothing to do with per-memcg lru_lock).

I did try, but (unlike Alex) did not manage to get the reproducer to
reproduce it.  No doubt I did not try hard enough: I did rather lose
interest after seeing that it appears to involve someone with
CAP_SYS_ADMIN doing an absurdly large ioctl(BLKFRASET) on /dev/nullb0
("Null test block driver" enabled via CONFIG_BLK_DEV_NULL_BLK=y: that I
did enable) and faulting from it: presumably triggering an absurd amount
of readahead.
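
In outline, the trigger appears to boil down to something like this
(a sketch, not the actual syzkaller reproducer: the BLKFRASET value
is illustrative, and the 0x21000 mmap length is only a guess from R13
in the register dumps above):

/* Needs CAP_SYS_ADMIN and CONFIG_BLK_DEV_NULL_BLK=y. */
#include <fcntl.h>
#include <linux/fs.h>		/* BLKFRASET */
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
	int fd = open("/dev/nullb0", O_RDWR);

	if (fd < 0)
		return 1;
	/* BLKFRASET takes the read-ahead size in 512-byte sectors;
	 * an absurd value pushes the bdi's ra_pages far beyond the
	 * memory available in the machine. */
	if (ioctl(fd, BLKFRASET, 0x100000000UL))
		return 1;
	char *p = mmap(NULL, 0x21000, PROT_READ, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED)
		return 1;
	return p[0];	/* fault -> filemap_fault() -> huge readahead */
}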

Cc'ing Matthew since he has a particular interest in readahead, and
might be inspired to make some small safe change that would fix this,
and benefit realistic cases too; but on the whole it didn't look worth
worrying about - or at least not by me.

Hugh

> > [...]


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-12-21 19:56   ` Hugh Dickins
@ 2020-12-21 20:33     ` Matthew Wilcox
  2020-12-21 21:25       ` Roman Gushchin
  2021-02-05 17:44       ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2020-12-21 20:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs,
	Alex Shi, Roman Gushchin, Wu Fengguang

On Mon, Dec 21, 2020 at 11:56:36AM -0800, Hugh Dickins wrote:
> On Mon, 23 Nov 2020, Andrew Morton wrote:
> > On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> > 
> > > [...]
> > 
> > Alex, your series "per memcg lru lock" changed the vmscan code rather a
> > lot.  Could you please take a look at that reproducer?
> 
> Andrew, I promised I'd take a look at this syzreport too (though I think
> we're agreed by now that it has nothing to do with per-memcg lru_lock).
> 
> I did try, but (unlike Alex) did not manage to get the reproducer to
> reproduce it.  No doubt I did not try hard enough: I did rather lose
> interest after seeing that it appears to involve someone with
> CAP_SYS_ADMIN doing an absurdly large ioctl(BLKFRASET) on /dev/nullb0
> ("Null test block driver" enabled via CONFIG_BLK_DEV_NULL_BLK=y: that I
> did enable) and faulting from it: presumably triggering an absurd amount
> of readahead.
> 
> Cc'ing Matthew since he has a particular interest in readahead, and
> might be inspired to make some small safe change that would fix this,
> and benefit realistic cases too; but on the whole it didn't look worth
> worrying about - or at least not by me.

Oh, interesting.  Thanks for looping me in, I hadn't looked at this one
at all.  Building on the debugging you did, this is the interesting
part of the backtrace to me:

> > >  try_to_free_pages+0x29f/0x720 mm/vmscan.c:3264
> > >  __perform_reclaim mm/page_alloc.c:4360 [inline]
> > >  __alloc_pages_direct_reclaim mm/page_alloc.c:4381 [inline]
> > >  __alloc_pages_slowpath.constprop.0+0x917/0x2510 mm/page_alloc.c:4785
> > >  __alloc_pages_nodemask+0x5f0/0x730 mm/page_alloc.c:4995
> > >  alloc_pages_current+0x191/0x2a0 mm/mempolicy.c:2271
> > >  alloc_pages include/linux/gfp.h:547 [inline]
> > >  __page_cache_alloc mm/filemap.c:977 [inline]
> > >  __page_cache_alloc+0x2ce/0x360 mm/filemap.c:962
> > >  page_cache_ra_unbounded+0x3a1/0x920 mm/readahead.c:216
> > >  do_page_cache_ra+0xf9/0x140 mm/readahead.c:267
> > >  do_sync_mmap_readahead mm/filemap.c:2721 [inline]
> > >  filemap_fault+0x19d0/0x2940 mm/filemap.c:2809

So ra_pages has been set to something ridiculously large, and as
a result, we call do_page_cache_ra() asking to read more memory than
is available in the machine.  Funny thing, we actually have a function
to prevent this kind of situation, and it's force_page_cache_ra().
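
For reference, the part of force_page_cache_ra() that matters here,
paraphrased from mm/readahead.c (a sketch, not the verbatim function):

	/* Clamp the caller's request to the larger of the device's
	 * optimal IO size and the readahead window, then issue it in
	 * 2MB chunks rather than as one giant read. */
	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
	while (nr_to_read) {
		unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;

		if (this_chunk > nr_to_read)
			this_chunk = nr_to_read;
		ractl->_index = index;
		do_page_cache_ra(ractl, this_chunk, 0);
		index += this_chunk;
		nr_to_read -= this_chunk;
	}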

So this might fix the problem.  I only tested that it compiles.  I'll
be happy to write up a proper changelog and sign-off for it if it works ...
it'd be good to get it some soak testing on a variety of different
workloads; changing this stuff is enormously subtle.

As a testament to that, I think Fengguang got it wrong in commit
2cbea1d3ab11 -- async_size should have been 3 * ra_pages / 4, not ra_pages
/ 4 (because we read-behind by half the range, so we're looking for a
page fault to happen a quarter of the way behind this fault ...)
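
To spell that arithmetic out with an assumed ra_pages of 64 (an
illustration, not code from the tree):

/* Fault at pgoff, ra_pages = 64:
 *
 *   ra->start      = pgoff - 64/2 = pgoff - 32   read-behind by half
 *   ra->size       = 64                          window [pgoff-32, pgoff+32)
 *   ra->async_size = 64/4 = 16                   PG_readahead marker at
 *                                                start + size - async_size
 *                                                = pgoff + 16
 *
 * With async_size = 3 * 64 / 4 = 48, the marker would land at pgoff - 16
 * instead, a quarter of the window behind the faulting page.
 */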

This is partially Roman's fault, see commit 600e19afc5f8.

diff --git a/mm/filemap.c b/mm/filemap.c
index d5e7c2029d16..43fe0f0ae3bb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2632,7 +2632,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	ra->size = ra->ra_pages;
 	ra->async_size = ra->ra_pages / 4;
 	ractl._index = ra->start;
-	do_page_cache_ra(&ractl, ra->size, ra->async_size);
+	force_page_cache_ra(&ractl, ra, ra->size);
 	return fpin;
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index c43ccdddb0f6..5664b4b91340 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,8 +49,6 @@ void unmap_page_range(struct mmu_gather *tlb,
 			     unsigned long addr, unsigned long end,
 			     struct zap_details *details);
 
-void do_page_cache_ra(struct readahead_control *, unsigned long nr_to_read,
-		unsigned long lookahead_size);
 void force_page_cache_ra(struct readahead_control *, struct file_ra_state *,
 		unsigned long nr);
 static inline void force_page_cache_readahead(struct address_space *mapping,
diff --git a/mm/readahead.c b/mm/readahead.c
index c5b0457415be..f344c894c26a 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -246,7 +246,7 @@ EXPORT_SYMBOL_GPL(page_cache_ra_unbounded);
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  */
-void do_page_cache_ra(struct readahead_control *ractl,
+static void do_page_cache_ra(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct inode *inode = ractl->mapping->host;


* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-12-21 20:33     ` Matthew Wilcox
@ 2020-12-21 21:25       ` Roman Gushchin
  2021-02-05 17:44       ` Matthew Wilcox
  1 sibling, 0 replies; 11+ messages in thread
From: Roman Gushchin @ 2020-12-21 21:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hugh Dickins, Andrew Morton, syzbot, linux-kernel, linux-mm,
	syzkaller-bugs, Alex Shi, Wu Fengguang

On Mon, Dec 21, 2020 at 08:33:44PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 21, 2020 at 11:56:36AM -0800, Hugh Dickins wrote:
> > On Mon, 23 Nov 2020, Andrew Morton wrote:
> > > On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> > > 
> > > > [...]
> > > 
> > > Alex, your series "per memcg lru lock" changed the vmscan code rather a
> > > lot.  Could you please take a look at that reproducer?
> > 
> > Andrew, I promised I'd take a look at this syzreport too (though I think
> > we're agreed by now that it has nothing to do with per-memcg lru_lock).
> > 
> > I did try, but (unlike Alex) did not manage to get the reproducer to
> > reproduce it.  No doubt I did not try hard enough: I did rather lose
> > interest after seeing that it appears to involve someone with
> > CAP_SYS_ADMIN doing an absurdly large ioctl(BLKFRASET) on /dev/nullb0
> > ("Null test block driver" enabled via CONFIG_BLK_DEV_NULL_BLK=y: that I
> > did enable) and faulting from it: presumably triggering an absurd amount
> > of readahead.
> > 
> > Cc'ing Matthew since he has a particular interest in readahead, and
> > might be inspired to make some small safe change that would fix this,
> > and benefit realistic cases too; but on the whole it didn't look worth
> > worrying about - or at least not by me.
> 
> Oh, interesting.  Thanks for looping me in, I hadn't looked at this one
> at all.  Building on the debugging you did, this is the interesting
> part of the backtrace to me:
> 
> > > >  try_to_free_pages+0x29f/0x720 mm/vmscan.c:3264
> > > >  __perform_reclaim mm/page_alloc.c:4360 [inline]
> > > >  __alloc_pages_direct_reclaim mm/page_alloc.c:4381 [inline]
> > > >  __alloc_pages_slowpath.constprop.0+0x917/0x2510 mm/page_alloc.c:4785
> > > >  __alloc_pages_nodemask+0x5f0/0x730 mm/page_alloc.c:4995
> > > >  alloc_pages_current+0x191/0x2a0 mm/mempolicy.c:2271
> > > >  alloc_pages include/linux/gfp.h:547 [inline]
> > > >  __page_cache_alloc mm/filemap.c:977 [inline]
> > > >  __page_cache_alloc+0x2ce/0x360 mm/filemap.c:962
> > > >  page_cache_ra_unbounded+0x3a1/0x920 mm/readahead.c:216
> > > >  do_page_cache_ra+0xf9/0x140 mm/readahead.c:267
> > > >  do_sync_mmap_readahead mm/filemap.c:2721 [inline]
> > > >  filemap_fault+0x19d0/0x2940 mm/filemap.c:2809
> 
> So ra_pages has been set to something ridiculously large, and as
> a result, we call do_page_cache_ra() asking to read more memory than
> is available in the machine.  Funny thing, we actually have a function
> to prevent this kind of situation, and it's force_page_cache_ra().
> 
> So this might fix the problem.  I only tested that it compiles.  I'll
> be happy to write up a proper changelog and sign-off for it if it works ...
> it'd be good to get it some soak testing on a variety of different
> workloads; changing this stuff is enormously subtle.
> 
> As a testament to that, I think Fengguang got it wrong in commit
> 2cbea1d3ab11 -- async_size should have been 3 * ra_pages / 4, not ra_pages
> / 4 (because we read-behind by half the range, so we're looking for a
> page fault to happen a quarter of the way behind this fault ...)
> 
> This is partially Roman's fault, see commit 600e19afc5f8.
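
To make the quoted window geometry concrete, a worked example with
ra_pages = 128 (a value assumed purely for illustration) and a fault
at page F: do_sync_mmap_readahead() sets ra->start = F - 64
(read-behind by half the range) and ra->size = 128, so the window is
[F-64, F+64). The async-readahead marker lands at page
start + size - async_size, i.e. at F + 32 with the current
async_size = ra_pages / 4, whereas the proposed
async_size = 3 * ra_pages / 4 would put it at F - 32, a quarter of
the window behind the fault, which is what the read-behind placement
implies.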

Hi Matthew,

Lol, I had a hard time imagining how I managed to break readahead with
my recent changes, before looking at the commit: it's from 2015 :)

I wonder how a __GFP_NORETRY allocation is causing a 143-second stall.
The loop in page_cache_ra_unbounded() should in theory be easily broken
on the first allocation failure. So it could be that, partly because of
the unrealistically high readahead limit, memory is becoming completely
depleted and the memory pressure is enormous.
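
For reference, the loop in question looks roughly like this in the
v5.10-era page_cache_ra_unbounded() (an abridged sketch, not the
verbatim source):

	gfp_t gfp_mask = readahead_gfp_mask(mapping);
	unsigned long i;

	/* readahead_gfp_mask() includes __GFP_NORETRY | __GFP_NOWARN,
	 * so the allocator gives up quickly rather than retrying. */
	for (i = 0; i < nr_to_read; i++) {
		struct page *page = xa_load(&mapping->i_pages, index + i);

		if (page && !xa_is_value(page))
			continue;	/* already cached, skip it */

		page = __page_cache_alloc(gfp_mask);
		if (!page)
			break;		/* first failure ends the loop */

		/* ... add the page and batch up the I/O ... */
	}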

Anyway, your change looks good to me. I'll ack the full version.

Thanks!
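
For anyone else trying to reproduce this, a minimal trigger along the
lines Hugh described might look like the sketch below. This is
hypothetical and untested: the device path, the readahead value, and
the use of a shared write fault are assumptions pieced together from
the backtrace, not taken from the syzkaller reproducer.

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/fs.h>	/* BLKFRASET */

int main(void)
{
	/* Assumes CONFIG_BLK_DEV_NULL_BLK=y and CAP_SYS_ADMIN. */
	int fd = open("/dev/nullb0", O_RDWR);
	if (fd < 0)
		return 1;

	/* BLKFRASET takes the new readahead size, in 512-byte
	 * sectors, directly as the ioctl argument; an absurdly
	 * large value inflates ra_pages far beyond RAM. */
	if (ioctl(fd, BLKFRASET, 1UL << 40) < 0)
		return 1;

	/* A write fault on a shared mapping takes the
	 * do_shared_fault() -> filemap_fault() path from the
	 * report, kicking off the oversized readahead. */
	char *p = mmap(NULL, 1UL << 20, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	p[0] = 1;
	return 0;
}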

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task can't die in shrink_inactive_list (2)
  2020-12-21 20:33     ` Matthew Wilcox
  2020-12-21 21:25       ` Roman Gushchin
@ 2021-02-05 17:44       ` Matthew Wilcox
  2021-02-05 17:57         ` Hugh Dickins
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2021-02-05 17:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, syzbot, linux-kernel, linux-mm, syzkaller-bugs,
	Alex Shi, Roman Gushchin, Wu Fengguang


Hugh, did you get a chance to test this?

On Mon, Dec 21, 2020 at 08:33:44PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 21, 2020 at 11:56:36AM -0800, Hugh Dickins wrote:
> > On Mon, 23 Nov 2020, Andrew Morton wrote:
> > > On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> > > 
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    03430750 Add linux-next specific files for 20201116
> > > > git tree:       linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
> > > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000
> > > 
> > > Alex, your series "per memcg lru lock" changed the vmscan code rather a
> > > lot.  Could you please take a look at that reproducer?
> > 
> > Andrew, I promised I'd take a look at this syzreport too (though I think
> > we're agreed by now that it has nothing to do with per-memcg lru_lock).
> > 
> > I did try, but (unlike Alex) did not manage to get the reproducer to
> > reproduce it.  No doubt I did not try hard enough: I did rather lose
> > interest after seeing that it appears to involve someone with
> > CAP_SYS_ADMIN doing an absurdly large ioctl(BLKFRASET) on /dev/nullb0
> > ("Null test block driver" enabled via CONFIG_BLK_DEV_NULL_BLK=y: that I
> > did enable) and faulting from it: presumably triggering an absurd amount
> > of readahead.
> > 
> > Cc'ing Matthew since he has a particular interest in readahead, and
> > might be inspired to make some small safe change that would fix this,
> > and benefit realistic cases too; but on the whole it didn't look worth
> > worrying about - or at least not by me.
> 
> Oh, interesting.  Thanks for looping me in, I hadn't looked at this one
> at all.  Building on the debugging you did, this is the interesting
> part of the backtrace to me:
> 
> > > >  try_to_free_pages+0x29f/0x720 mm/vmscan.c:3264
> > > >  __perform_reclaim mm/page_alloc.c:4360 [inline]
> > > >  __alloc_pages_direct_reclaim mm/page_alloc.c:4381 [inline]
> > > >  __alloc_pages_slowpath.constprop.0+0x917/0x2510 mm/page_alloc.c:4785
> > > >  __alloc_pages_nodemask+0x5f0/0x730 mm/page_alloc.c:4995
> > > >  alloc_pages_current+0x191/0x2a0 mm/mempolicy.c:2271
> > > >  alloc_pages include/linux/gfp.h:547 [inline]
> > > >  __page_cache_alloc mm/filemap.c:977 [inline]
> > > >  __page_cache_alloc+0x2ce/0x360 mm/filemap.c:962
> > > >  page_cache_ra_unbounded+0x3a1/0x920 mm/readahead.c:216
> > > >  do_page_cache_ra+0xf9/0x140 mm/readahead.c:267
> > > >  do_sync_mmap_readahead mm/filemap.c:2721 [inline]
> > > >  filemap_fault+0x19d0/0x2940 mm/filemap.c:2809
> 
> So ra_pages has been set to something ridiculously large, and as
> a result, we call do_page_cache_ra() asking to read more memory than
> is available in the machine.  Funny thing, we actually have a function
> to prevent this kind of situation, and it's force_page_cache_ra().
> 
> So this might fix the problem.  I only tested that it compiles.  I'll
> be happy to write up a proper changelog and sign-off for it if it works ...
> it'd be good to get it some soak testing on a variety of different
> workloads; changing this stuff is enormously subtle.
> 
> As a testament to that, I think Fengguang got it wrong in commit
> 2cbea1d3ab11 -- async_size should have been 3 * ra_pages / 4, not ra_pages
> / 4 (because we read-behind by half the range, so we're looking for a
> page fault to happen a quarter of the way behind this fault ...)
> 
> This is partially Roman's fault, see commit 600e19afc5f8.
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d5e7c2029d16..43fe0f0ae3bb 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2632,7 +2632,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	ra->size = ra->ra_pages;
>  	ra->async_size = ra->ra_pages / 4;
>  	ractl._index = ra->start;
> -	do_page_cache_ra(&ractl, ra->size, ra->async_size);
> +	force_page_cache_ra(&ractl, ra, ra->size);
>  	return fpin;
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index c43ccdddb0f6..5664b4b91340 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -49,8 +49,6 @@ void unmap_page_range(struct mmu_gather *tlb,
>  			     unsigned long addr, unsigned long end,
>  			     struct zap_details *details);
>  
> -void do_page_cache_ra(struct readahead_control *, unsigned long nr_to_read,
> -		unsigned long lookahead_size);
>  void force_page_cache_ra(struct readahead_control *, struct file_ra_state *,
>  		unsigned long nr);
>  static inline void force_page_cache_readahead(struct address_space *mapping,
> diff --git a/mm/readahead.c b/mm/readahead.c
> index c5b0457415be..f344c894c26a 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -246,7 +246,7 @@ EXPORT_SYMBOL_GPL(page_cache_ra_unbounded);
>   * behaviour which would occur if page allocations are causing VM writeback.
>   * We really don't want to intermingle reads and writes like that.
>   */
> -void do_page_cache_ra(struct readahead_control *ractl,
> +static void do_page_cache_ra(struct readahead_control *ractl,
>  		unsigned long nr_to_read, unsigned long lookahead_size)
>  {
>  	struct inode *inode = ractl->mapping->host;
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task can't die in shrink_inactive_list (2)
  2021-02-05 17:44       ` Matthew Wilcox
@ 2021-02-05 17:57         ` Hugh Dickins
  0 siblings, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2021-02-05 17:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hugh Dickins, Andrew Morton, syzbot, linux-kernel, linux-mm,
	syzkaller-bugs, Alex Shi, Roman Gushchin, Wu Fengguang

On Fri, 5 Feb 2021, Matthew Wilcox wrote:
> 
> Hugh, did you get a chance to test this?

'fraid not: since I was unable to reproduce the problem,
I did not try running with your suggested fix at all;
I hoped someone who could reproduce it might.

Hugh

> 
> On Mon, Dec 21, 2020 at 08:33:44PM +0000, Matthew Wilcox wrote:
> > On Mon, Dec 21, 2020 at 11:56:36AM -0800, Hugh Dickins wrote:
> > > On Mon, 23 Nov 2020, Andrew Morton wrote:
> > > > On Fri, 20 Nov 2020 17:55:22 -0800 syzbot <syzbot+e5a33e700b1dd0da20a2@syzkaller.appspotmail.com> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > syzbot found the following issue on:
> > > > > 
> > > > > HEAD commit:    03430750 Add linux-next specific files for 20201116
> > > > > git tree:       linux-next
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
> > > > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
> > > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000
> > > > 
> > > > Alex, your series "per memcg lru lock" changed the vmscan code rather a
> > > > lot.  Could you please take a look at that reproducer?
> > > 
> > > Andrew, I promised I'd take a look at this syzreport too (though I think
> > > we're agreed by now that it has nothing to do with per-memcg lru_lock).
> > > 
> > > I did try, but (unlike Alex) did not manage to get the reproducer to
> > > reproduce it.  No doubt I did not try hard enough: I did rather lose
> > > interest after seeing that it appears to involve someone with
> > > CAP_SYS_ADMIN doing an absurdly large ioctl(BLKFRASET) on /dev/nullb0
> > > ("Null test block driver" enabled via CONFIG_BLK_DEV_NULL_BLK=y: that I
> > > did enable) and faulting from it: presumably triggering an absurd amount
> > > of readahead.
> > > 
> > > Cc'ing Matthew since he has a particular interest in readahead, and
> > > might be inspired to make some small safe change that would fix this,
> > > and benefit realistic cases too; but on the whole it didn't look worth
> > > worrying about - or at least not by me.
> > 
> > Oh, interesting.  Thanks for looping me in, I hadn't looked at this one
> > at all.  Building on the debugging you did, this is the interesting
> > part of the backtrace to me:
> > 
> > > > >  try_to_free_pages+0x29f/0x720 mm/vmscan.c:3264
> > > > >  __perform_reclaim mm/page_alloc.c:4360 [inline]
> > > > >  __alloc_pages_direct_reclaim mm/page_alloc.c:4381 [inline]
> > > > >  __alloc_pages_slowpath.constprop.0+0x917/0x2510 mm/page_alloc.c:4785
> > > > >  __alloc_pages_nodemask+0x5f0/0x730 mm/page_alloc.c:4995
> > > > >  alloc_pages_current+0x191/0x2a0 mm/mempolicy.c:2271
> > > > >  alloc_pages include/linux/gfp.h:547 [inline]
> > > > >  __page_cache_alloc mm/filemap.c:977 [inline]
> > > > >  __page_cache_alloc+0x2ce/0x360 mm/filemap.c:962
> > > > >  page_cache_ra_unbounded+0x3a1/0x920 mm/readahead.c:216
> > > > >  do_page_cache_ra+0xf9/0x140 mm/readahead.c:267
> > > > >  do_sync_mmap_readahead mm/filemap.c:2721 [inline]
> > > > >  filemap_fault+0x19d0/0x2940 mm/filemap.c:2809
> > 
> > So ra_pages has been set to something ridiculously large, and as
> > a result, we call do_page_cache_ra() asking to read more memory than
> > is available in the machine.  Funny thing, we actually have a function
> > to prevent this kind of situation, and it's force_page_cache_ra().
> > 
> > So this might fix the problem.  I only tested that it compiles.  I'll
> > be happy to write up a proper changelog and sign-off for it if it works ...
> > it'd be good to get it some soak testing on a variety of different
> > workloads; changing this stuff is enormously subtle.
> > 
> > As a testament to that, I think Fengguang got it wrong in commit
> > 2cbea1d3ab11 -- async_size should have been 3 * ra_pages / 4, not ra_pages
> > / 4 (because we read-behind by half the range, so we're looking for a
> > page fault to happen a quarter of the way behind this fault ...)
> > 
> > This is partially Roman's fault, see commit 600e19afc5f8.
> > 
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index d5e7c2029d16..43fe0f0ae3bb 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2632,7 +2632,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> >  	ra->size = ra->ra_pages;
> >  	ra->async_size = ra->ra_pages / 4;
> >  	ractl._index = ra->start;
> > -	do_page_cache_ra(&ractl, ra->size, ra->async_size);
> > +	force_page_cache_ra(&ractl, ra, ra->size);
> >  	return fpin;
> >  }
> >  
> > diff --git a/mm/internal.h b/mm/internal.h
> > index c43ccdddb0f6..5664b4b91340 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -49,8 +49,6 @@ void unmap_page_range(struct mmu_gather *tlb,
> >  			     unsigned long addr, unsigned long end,
> >  			     struct zap_details *details);
> >  
> > -void do_page_cache_ra(struct readahead_control *, unsigned long nr_to_read,
> > -		unsigned long lookahead_size);
> >  void force_page_cache_ra(struct readahead_control *, struct file_ra_state *,
> >  		unsigned long nr);
> >  static inline void force_page_cache_readahead(struct address_space *mapping,
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index c5b0457415be..f344c894c26a 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -246,7 +246,7 @@ EXPORT_SYMBOL_GPL(page_cache_ra_unbounded);
> >   * behaviour which would occur if page allocations are causing VM writeback.
> >   * We really don't want to intermingle reads and writes like that.
> >   */
> > -void do_page_cache_ra(struct readahead_control *ractl,
> > +static void do_page_cache_ra(struct readahead_control *ractl,
> >  		unsigned long nr_to_read, unsigned long lookahead_size)
> >  {
> >  	struct inode *inode = ractl->mapping->host;

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-02-05 18:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-21  1:55 INFO: task can't die in shrink_inactive_list (2) syzbot
2020-11-24  3:54 ` Andrew Morton
2020-11-24  5:20   ` Alex Shi
2020-11-24 12:00   ` Alex Shi
2020-11-24 13:53     ` Alex Shi
2020-11-24 14:35     ` Alex Shi
2020-12-21 19:56   ` Hugh Dickins
2020-12-21 20:33     ` Matthew Wilcox
2020-12-21 21:25       ` Roman Gushchin
2021-02-05 17:44       ` Matthew Wilcox
2021-02-05 17:57         ` Hugh Dickins
