* [syzbot] possible deadlock in start_this_handle (3)
@ 2022-07-14 12:08 syzbot
  2022-07-14 14:18 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2022-07-14 12:08 UTC (permalink / raw)
To: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso

Hello,

syzbot found the following issue on:

HEAD commit:    5a29232d870d Merge tag 'for-5.19-rc6-tag' of git://git.ker..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16619ce8080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=525bc0635a2b942a
dashboard link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc6-syzkaller-00026-g5a29232d870d #0 Not tainted
------------------------------------------------------
khugepaged/48 is trying to acquire lock:
ffff888044598990 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0xfb4/0x14a0 fs/jbd2/transaction.c:461

but task is already holding lock:
ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4638 [inline]
ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x9e1/0x2160 mm/page_alloc.c:5066

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #2 (fs_reclaim){+.+.}-{0:0}:
       __fs_reclaim_acquire mm/page_alloc.c:4589 [inline]
       fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4603
       might_alloc include/linux/sched/mm.h:271 [inline]
       slab_pre_alloc_hook mm/slab.h:723 [inline]
       slab_alloc_node mm/slub.c:3157 [inline]
       slab_alloc mm/slub.c:3251 [inline]
       kmem_cache_alloc_trace+0x40/0x3f0 mm/slub.c:3282
       kmalloc include/linux/slab.h:600 [inline]
       memory_stat_format+0x95/0xae0 mm/memcontrol.c:1468
       mem_cgroup_print_oom_meminfo.cold+0x50/0x7e mm/memcontrol.c:1594
       dump_header+0x13f/0x7f9 mm/oom_kill.c:462
       oom_kill_process.cold+0x10/0x15 mm/oom_kill.c:1037
       out_of_memory+0x358/0x14b0 mm/oom_kill.c:1175
       mem_cgroup_out_of_memory+0x206/0x270 mm/memcontrol.c:1650
       memory_max_write+0x25c/0x3b0 mm/memcontrol.c:6299
       cgroup_file_write+0x1de/0x770 kernel/cgroup/cgroup.c:3882
       kernfs_fop_write_iter+0x3f8/0x610 fs/kernfs/file.c:290
       call_write_iter include/linux/fs.h:2058 [inline]
       new_sync_write+0x38a/0x560 fs/read_write.c:504
       vfs_write+0x7c0/0xac0 fs/read_write.c:591
       ksys_write+0x127/0x250 fs/read_write.c:644
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
       entry_SYSENTER_compat_after_hwframe+0x53/0x62

-> #1 (oom_lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:603 [inline]
       __mutex_lock+0x12f/0x1350 kernel/locking/mutex.c:747
       mem_cgroup_out_of_memory+0x8d/0x270 mm/memcontrol.c:1640
       mem_cgroup_oom mm/memcontrol.c:1880 [inline]
       try_charge_memcg+0xef9/0x1380 mm/memcontrol.c:2670
       obj_cgroup_charge_pages mm/memcontrol.c:2999 [inline]
       obj_cgroup_charge+0x2ab/0x5e0 mm/memcontrol.c:3289
       memcg_slab_pre_alloc_hook mm/slab.h:505 [inline]
       slab_pre_alloc_hook mm/slab.h:728 [inline]
       slab_alloc_node mm/slub.c:3157 [inline]
       slab_alloc mm/slub.c:3251 [inline]
       __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
       kmem_cache_alloc+0x92/0x3b0 mm/slub.c:3268
       kmem_cache_zalloc include/linux/slab.h:723 [inline]
       alloc_buffer_head+0x20/0x140 fs/buffer.c:3294
       alloc_page_buffers+0x285/0x7a0 fs/buffer.c:829
       grow_dev_page fs/buffer.c:965 [inline]
       grow_buffers fs/buffer.c:1011 [inline]
       __getblk_slow+0x525/0x1080 fs/buffer.c:1038
       __getblk_gfp+0x6e/0x80 fs/buffer.c:1333
       sb_getblk include/linux/buffer_head.h:326 [inline]
       ext4_getblk+0x20d/0x7c0 fs/ext4/inode.c:866
       ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:912
       ext4_append+0x177/0x3a0 fs/ext4/namei.c:67
       ext4_init_new_dir+0x25e/0x4d0 fs/ext4/namei.c:2920
       ext4_mkdir+0x3cf/0xb20 fs/ext4/namei.c:2966
       vfs_mkdir+0x1c3/0x3b0 fs/namei.c:3975
       do_mkdirat+0x285/0x300 fs/namei.c:4001
       __do_sys_mkdirat fs/namei.c:4016 [inline]
       __se_sys_mkdirat fs/namei.c:4014 [inline]
       __ia32_sys_mkdirat+0x81/0xa0 fs/namei.c:4014
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
       entry_SYSENTER_compat_after_hwframe+0x53/0x62

-> #0 (jbd2_handle){++++}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3095 [inline]
       check_prevs_add kernel/locking/lockdep.c:3214 [inline]
       validate_chain kernel/locking/lockdep.c:3829 [inline]
       __lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
       lock_acquire kernel/locking/lockdep.c:5665 [inline]
       lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
       start_this_handle+0xfe7/0x14a0 fs/jbd2/transaction.c:463
       jbd2__journal_start+0x399/0x930 fs/jbd2/transaction.c:520
       __ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105
       __ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
       ext4_dirty_inode+0x9d/0x110 fs/ext4/inode.c:5949
       __mark_inode_dirty+0x495/0x1050 fs/fs-writeback.c:2381
       mark_inode_dirty_sync include/linux/fs.h:2337 [inline]
       iput.part.0+0x57/0x820 fs/inode.c:1767
       iput+0x58/0x70 fs/inode.c:1760
       dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
       __dentry_kill+0x3c0/0x640 fs/dcache.c:607
       shrink_dentry_list+0x23c/0x800 fs/dcache.c:1201
       prune_dcache_sb+0xe7/0x140 fs/dcache.c:1282
       super_cache_scan+0x336/0x590 fs/super.c:104
       do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:770
       shrink_slab+0x17c/0x6f0 mm/vmscan.c:930
       shrink_node_memcgs mm/vmscan.c:3124 [inline]
       shrink_node+0x8b3/0x1db0 mm/vmscan.c:3245
       shrink_zones mm/vmscan.c:3482 [inline]
       do_try_to_free_pages+0x3b5/0x1700 mm/vmscan.c:3540
       try_to_free_pages+0x2ac/0x840 mm/vmscan.c:3775
       __perform_reclaim mm/page_alloc.c:4641 [inline]
       __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
       __alloc_pages_slowpath.constprop.0+0xa8a/0x2160 mm/page_alloc.c:5066
       __alloc_pages+0x436/0x510 mm/page_alloc.c:5439
       __alloc_pages_node include/linux/gfp.h:587 [inline]
       khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:859
       collapse_huge_page mm/khugepaged.c:1062 [inline]
       khugepaged_scan_pmd mm/khugepaged.c:1348 [inline]
       khugepaged_scan_mm_slot mm/khugepaged.c:2170 [inline]
       khugepaged_do_scan mm/khugepaged.c:2251 [inline]
       khugepaged+0x3473/0x66a0 mm/khugepaged.c:2296
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302

other info that might help us debug this:

Chain exists of:
  jbd2_handle --> oom_lock --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(oom_lock);
                               lock(fs_reclaim);
  lock(jbd2_handle);

 *** DEADLOCK ***

3 locks held by khugepaged/48:
 #0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4638 [inline]
 #0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
 #0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x9e1/0x2160 mm/page_alloc.c:5066
 #1: ffffffff8be7d850 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0xc9/0x6f0 mm/vmscan.c:920
 #2: ffff8880445800e0 (&type->s_umount_key#33){++++}-{3:3}, at: trylock_super fs/super.c:415 [inline]
 #2: ffff8880445800e0 (&type->s_umount_key#33){++++}-{3:3}, at: super_cache_scan+0x6c/0x590 fs/super.c:79

stack backtrace:
CPU: 2 PID: 48 Comm: khugepaged Not tainted 5.19.0-rc6-syzkaller-00026-g5a29232d870d #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3095 [inline]
 check_prevs_add kernel/locking/lockdep.c:3214 [inline]
 validate_chain kernel/locking/lockdep.c:3829 [inline]
 __lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
 lock_acquire kernel/locking/lockdep.c:5665 [inline]
 lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
 start_this_handle+0xfe7/0x14a0 fs/jbd2/transaction.c:463
 jbd2__journal_start+0x399/0x930 fs/jbd2/transaction.c:520
 __ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105
 __ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
 ext4_dirty_inode+0x9d/0x110 fs/ext4/inode.c:5949
 __mark_inode_dirty+0x495/0x1050 fs/fs-writeback.c:2381
 mark_inode_dirty_sync include/linux/fs.h:2337 [inline]
 iput.part.0+0x57/0x820 fs/inode.c:1767
 iput+0x58/0x70 fs/inode.c:1760
 dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
 __dentry_kill+0x3c0/0x640 fs/dcache.c:607
 shrink_dentry_list+0x23c/0x800 fs/dcache.c:1201
 prune_dcache_sb+0xe7/0x140 fs/dcache.c:1282
 super_cache_scan+0x336/0x590 fs/super.c:104
 do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:770
 shrink_slab+0x17c/0x6f0 mm/vmscan.c:930
 shrink_node_memcgs mm/vmscan.c:3124 [inline]
 shrink_node+0x8b3/0x1db0 mm/vmscan.c:3245
 shrink_zones mm/vmscan.c:3482 [inline]
 do_try_to_free_pages+0x3b5/0x1700 mm/vmscan.c:3540
 try_to_free_pages+0x2ac/0x840 mm/vmscan.c:3775
 __perform_reclaim mm/page_alloc.c:4641 [inline]
 __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
 __alloc_pages_slowpath.constprop.0+0xa8a/0x2160 mm/page_alloc.c:5066
 __alloc_pages+0x436/0x510 mm/page_alloc.c:5439
 __alloc_pages_node include/linux/gfp.h:587 [inline]
 khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:859
 collapse_huge_page mm/khugepaged.c:1062 [inline]
 khugepaged_scan_pmd mm/khugepaged.c:1348 [inline]
 khugepaged_scan_mm_slot mm/khugepaged.c:2170 [inline]
 khugepaged_do_scan mm/khugepaged.c:2251 [inline]
 khugepaged+0x3473/0x66a0 mm/khugepaged.c:2296
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
 </TASK>

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
  2022-07-14 12:08 [syzbot] possible deadlock in start_this_handle (3) syzbot
@ 2022-07-14 14:18 ` Jan Kara
  2022-07-14 22:24   ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2022-07-14 14:18 UTC (permalink / raw)
To: linux-mm
Cc: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso, linux-fsdevel

Hello,

so this lockdep report looks real but is more related to OOM handling than
to ext4 as such. The immediate problem I can see is that
mem_cgroup_print_oom_meminfo() which is called under oom_lock calls
memory_stat_format() which does GFP_KERNEL allocations to allocate buffers
for dumping of MM statistics. This creates oom_lock -> fs reclaim
dependency and because OOM can be hit (and thus oom_lock acquired) in
practically any allocation (regardless of GFP_NOFS) this has a potential of
creating real deadlock cycles.

So should mem_cgroup_print_oom_meminfo() be using
memalloc_nofs_save/restore() to avoid such deadlocks? Or perhaps someone
sees another solution? Generally allocating memory to report OOM looks a
bit dangerous to me ;).

								Honza

On Thu 14-07-22 05:08:26, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    5a29232d870d Merge tag 'for-5.19-rc6-tag' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16619ce8080000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=525bc0635a2b942a
> dashboard link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: i386
> 
> Unfortunately, I don't have any reproducer for this issue yet.
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread
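[Editor's note: Jan's memalloc_nofs_save()/memalloc_nofs_restore() suggestion would look roughly like the sketch below. This is an illustrative, untested fragment, not a patch from the thread; the real mem_cgroup_print_oom_meminfo() prints more than shown here. It only demonstrates how the scoped-NOFS API brackets the allocating call so reclaim entered from under oom_lock cannot recurse into filesystem code such as jbd2 transaction start.]

```c
/* Sketch only: scoped GFP_NOFS around the allocating helper.
 * memalloc_nofs_save() masks __GFP_FS for all allocations made by
 * this task until the matching memalloc_nofs_restore(). */
void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
	unsigned int nofs_flags = memalloc_nofs_save();
	char *buf = memory_stat_format(memcg);	/* kmalloc() inside is now NOFS */

	memalloc_nofs_restore(nofs_flags);
	if (buf) {
		pr_info("%s", buf);
		kfree(buf);
	}
}
```

As the rest of the thread notes, NOFS alone may not close the cycle, since oom_lock can be taken from practically any allocation regardless of GFP_NOFS.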
* Re: [syzbot] possible deadlock in start_this_handle (3)
  2022-07-14 14:18 ` Jan Kara
@ 2022-07-14 22:24   ` Tetsuo Handa
  2022-07-15  1:39     ` Shakeel Butt
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-14 22:24 UTC (permalink / raw)
To: Jan Kara, linux-mm
Cc: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso, linux-fsdevel

On 2022/07/14 23:18, Jan Kara wrote:
> Hello,
> 
> so this lockdep report looks real but is more related to OOM handling than
> to ext4 as such. The immediate problem I can see is that
> mem_cgroup_print_oom_meminfo() which is called under oom_lock calls
> memory_stat_format() which does GFP_KERNEL allocations to allocate buffers
> for dumping of MM statistics. This creates oom_lock -> fs reclaim
> dependency and because OOM can be hit (and thus oom_lock acquired) in
> practically any allocation (regardless of GFP_NOFS) this has a potential of
> creating real deadlock cycles.
> 
> So should mem_cgroup_print_oom_meminfo() be using
> memalloc_nofs_save/restore() to avoid such deadlocks? Or perhaps someone
> sees another solution? Generally allocating memory to report OOM looks a
> bit dangerous to me ;).
> 
> 								Honza

I think mem_cgroup_print_oom_meminfo() should use GFP_ATOMIC, for it will fall into
an infinite loop if kmalloc(GFP_NOFS) under oom_lock reaches the __alloc_pages_may_oom() path.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
  2022-07-14 22:24   ` Tetsuo Handa
@ 2022-07-15  1:39     ` Shakeel Butt
  2022-07-15  1:53       ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Shakeel Butt @ 2022-07-15  1:39 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Jan Kara, linux-mm, jack, linux-ext4, linux-kernel, syzkaller-bugs,
	tytso, linux-fsdevel

On Fri, Jul 15, 2022 at 07:24:55AM +0900, Tetsuo Handa wrote:
> On 2022/07/14 23:18, Jan Kara wrote:
> > Hello,
> > 
> > so this lockdep report looks real but is more related to OOM handling than
> > to ext4 as such. The immediate problem I can see is that
> > mem_cgroup_print_oom_meminfo() which is called under oom_lock calls
> > memory_stat_format() which does GFP_KERNEL allocations to allocate buffers
> > for dumping of MM statistics. This creates oom_lock -> fs reclaim
> > dependency and because OOM can be hit (and thus oom_lock acquired) in
> > practically any allocation (regardless of GFP_NOFS) this has a potential of
> > creating real deadlock cycles.
> > 
> > So should mem_cgroup_print_oom_meminfo() be using
> > memalloc_nofs_save/restore() to avoid such deadlocks? Or perhaps someone
> > sees another solution? Generally allocating memory to report OOM looks a
> > bit dangerous to me ;).

mem_cgroup_print_oom_meminfo() is called only for memcg OOMs. So the
situation would be dangerous only if the system is also OOM at that time.

> > 
> > 								Honza
> 
> I think mem_cgroup_print_oom_meminfo() should use GFP_ATOMIC, for it will fall into
> an infinite loop if kmalloc(GFP_NOFS) under oom_lock reaches the __alloc_pages_may_oom() path.

I would prefer GFP_NOWAIT. This is printing info for memcg OOMs, and if
the system is low on memory then memcg OOMs have lower importance than
the system state.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
  2022-07-15  1:39     ` Shakeel Butt
@ 2022-07-15  1:53       ` Tetsuo Handa
  2022-07-20 23:49         ` [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-15  1:53 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jan Kara, linux-mm, jack, linux-ext4, linux-kernel, syzkaller-bugs,
	tytso, linux-fsdevel

On 2022/07/15 10:39, Shakeel Butt wrote:
>> I think mem_cgroup_print_oom_meminfo() should use GFP_ATOMIC, for it will fall into
>> an infinite loop if kmalloc(GFP_NOFS) under oom_lock reaches the __alloc_pages_may_oom() path.
> 
> I would prefer GFP_NOWAIT. This is printing info for memcg OOMs, and if
> the system is low on memory then memcg OOMs have lower importance than
> the system state.

Since killing a process in some memcg likely helps resolve a global OOM
state, the system OOM condition might not be reported when a memory
allocation by mem_cgroup_print_oom_meminfo() caused that condition.
Therefore, we don't need to discard the output for the memcg OOM condition.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock
  2022-07-15  1:53       ` Tetsuo Handa
@ 2022-07-20 23:49         ` Tetsuo Handa
  2022-07-21  8:01           ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-20 23:49 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Andrew Morton; +Cc: linux-mm

syzbot is reporting a fs_reclaim allocation with oom_lock held [1]. We
must make sure that such an allocation won't hit the
__alloc_pages_may_oom() path, which will retry forever if oom_lock is
already held.

I chose GFP_ATOMIC over GFP_NOWAIT: since the global OOM situation will
likely be avoided by killing some process in the memcg, and memory will
be released after printk(), trying a little harder is acceptable.

Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/memcontrol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..11cd900729b9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1465,7 +1465,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	struct seq_buf s;
 	int i;
 
-	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
+	/* Caller might be holding oom_lock. */
+	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_ATOMIC), PAGE_SIZE);
 	if (!s.buffer)
 		return NULL;
-- 
2.18.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock
  2022-07-20 23:49         ` [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock Tetsuo Handa
@ 2022-07-21  8:01           ` Michal Hocko
  2022-07-22  0:46             ` [PATCH v2] " Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-07-21  8:01 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Johannes Weiner, Andrew Morton, linux-mm

On Thu 21-07-22 08:49:57, Tetsuo Handa wrote:
> syzbot is reporting a fs_reclaim allocation with oom_lock held [1]. We
> must make sure that such an allocation won't hit the
> __alloc_pages_may_oom() path, which will retry forever if oom_lock is
> already held.
> 
> I chose GFP_ATOMIC over GFP_NOWAIT: since the global OOM situation will
> likely be avoided by killing some process in the memcg, and memory will
> be released after printk(), trying a little harder is acceptable.

Nope, this is not a proper fix. You are making memory.stat more likely to
fail. An unconditional GFP_KERNEL allocation is certainly not good, but is
there any reason not to use GFP_NOIO instead?

Or even better: in an ideal world we wouldn't allocate from here at all.
Can we pre-allocate that single page and re-use it for the oom report?

---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..13483cb278bb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
 }
 
-static char *memory_stat_format(struct mem_cgroup *memcg)
+void memory_stat_format(struct mem_cgroup *memcg, char *buf)
 {
 	struct seq_buf s;
 	int i;
 
-	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-	if (!s.buffer)
-		return NULL;
+	seq_buf_init(&s, buf, PAGE_SIZE);
 
 	/*
 	 * Provide statistics on the state of the memory subsystem as
@@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
-
-	return s.buffer;
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
@@ -1563,6 +1559,12 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
 	rcu_read_unlock();
 }
 
+/*
+ * preallocated buffer to collect memory stats for the oom situation.
+ * Usage protected by oom_lock
+ */
+char oombuf[PAGE_SIZE];
+
 /**
  * mem_cgroup_print_oom_meminfo: Print OOM memory information relevant to
  * memory controller.
@@ -1570,7 +1572,7 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
  */
 void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
-	char *buf;
+	lockdep_assert_held(&oom_lock);
 
 	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
 		K((u64)page_counter_read(&memcg->memory)),
@@ -1591,11 +1593,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 	pr_info("Memory cgroup stats for ");
 	pr_cont_cgroup_path(memcg->css.cgroup);
 	pr_cont(":");
-	buf = memory_stat_format(memcg);
-	if (!buf)
-		return;
-	pr_info("%s", buf);
-	kfree(buf);
+	memory_stat_format(memcg, oombuf);
+	pr_info("%s", oombuf);
 }
 
 /*
@@ -6335,11 +6334,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
 static int memory_stat_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
-	char *buf;
+	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 
-	buf = memory_stat_format(memcg);
 	if (!buf)
 		return -ENOMEM;
+	memory_stat_format(memcg, buf);
 	seq_puts(m, buf);
 	kfree(buf);
 	return 0;
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* [PATCH v2] mm: memcontrol: fix potential oom_lock recursion deadlock 2022-07-21 8:01 ` Michal Hocko @ 2022-07-22 0:46 ` Tetsuo Handa 2022-07-22 7:19 ` Michal Hocko 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2022-07-22 0:46 UTC (permalink / raw) To: Michal Hocko; +Cc: Johannes Weiner, Andrew Morton, linux-mm syzbot is reporting GFP_KERNEL allocation with oom_lock held [1]. We must make sure that such allocation won't hit __alloc_pages_may_oom() path which will retry forever if oom_lock is already held. Use static buffer when oom_lock is already held. Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1] Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com> Suggested-by: Michal Hocko <mhocko@suse.com> Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- Changes in v2: Use static buffer for OOM reporting, suggested by Michal Hocko <mhocko@suse.com>. 
 mm/memcontrol.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..8092be2fbb7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
 }
 
-static char *memory_stat_format(struct mem_cgroup *memcg)
+static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
 {
 	struct seq_buf s;
 	int i;
 
-	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-	if (!s.buffer)
-		return NULL;
+	seq_buf_init(&s, buf, bufsize);
 
 	/*
 	 * Provide statistics on the state of the memory subsystem as
@@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
-
-	return s.buffer;
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
@@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
  */
 void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
-	char *buf;
+	/* Use static buffer, for the caller is holding oom_lock. */
+	static char buf[PAGE_SIZE];
+
+	lockdep_assert_held(&oom_lock);
 
 	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
 		K((u64)page_counter_read(&memcg->memory)),
@@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 	pr_info("Memory cgroup stats for ");
 	pr_cont_cgroup_path(memcg->css.cgroup);
 	pr_cont(":");
-	buf = memory_stat_format(memcg);
-	if (!buf)
-		return;
+	memory_stat_format(memcg, buf, sizeof(buf));
 	pr_info("%s", buf);
-	kfree(buf);
 }
 
 /*
@@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
 static int memory_stat_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
-	char *buf;
+	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 
-	buf = memory_stat_format(memcg);
 	if (!buf)
 		return -ENOMEM;
+	memory_stat_format(memcg, buf, PAGE_SIZE);
 	seq_puts(m, buf);
 	kfree(buf);
 	return 0;
-- 
2.18.4
* Re: [PATCH v2] mm: memcontrol: fix potential oom_lock recursion deadlock
  2022-07-22  0:46 ` [PATCH v2] " Tetsuo Handa
@ 2022-07-22  7:19 ` Michal Hocko
  2022-07-22 10:45   ` [PATCH v3] " Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-07-22  7:19 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Johannes Weiner, Andrew Morton, linux-mm

On Fri 22-07-22 09:46:27, Tetsuo Handa wrote:
> syzbot is reporting GFP_KERNEL allocation with oom_lock held [1]. We
> must make sure that such allocation won't hit __alloc_pages_may_oom()
> path which will retry forever if oom_lock is already held. Use static
> buffer when oom_lock is already held.

The changelog is rather cryptic. Your previous one was more readable.
I would go with:
"
syzbot is reporting GFP_KERNEL allocation with oom_lock held [1] when
reporting memcg oom. This is problematic because this creates a
dependency between GFP_NOFS and GFP_KERNEL over oom_lock which could
deadlock the system. Fix the problem by removing the allocation from
memory_stat_format completely. Use a statically preallocated buffer
instead for this path. OOM dumping is synchronized by the oom_lock so
there is no exclusion required here. memory_stat_show can use
GFP_KERNEL allocation.
"

> Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
> Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
> Changes in v2:
>   Use static buffer for OOM reporting, suggested by
>   Michal Hocko <mhocko@suse.com>.
>
>  mm/memcontrol.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366a2f07..8092be2fbb7c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
>  	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
>  }
>  
> -static char *memory_stat_format(struct mem_cgroup *memcg)
> +static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
>  {
>  	struct seq_buf s;
>  	int i;
>  
> -	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> -	if (!s.buffer)
> -		return NULL;
> +	seq_buf_init(&s, buf, bufsize);
>  
>  	/*
>  	 * Provide statistics on the state of the memory subsystem as
> @@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  
>  	/* The above should easily fit into one page */
>  	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
> -
> -	return s.buffer;
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
>   */
>  void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
> -	char *buf;
> +	/* Use static buffer, for the caller is holding oom_lock. */
> +	static char buf[PAGE_SIZE];
> +
> +	lockdep_assert_held(&oom_lock);
>  
>  	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
>  		K((u64)page_counter_read(&memcg->memory)),
> @@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  	pr_info("Memory cgroup stats for ");
>  	pr_cont_cgroup_path(memcg->css.cgroup);
>  	pr_cont(":");
> -	buf = memory_stat_format(memcg);
> -	if (!buf)
> -		return;
> +	memory_stat_format(memcg, buf, sizeof(buf));
>  	pr_info("%s", buf);
> -	kfree(buf);
>  }
>  
>  /*
> @@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
>  static int memory_stat_show(struct seq_file *m, void *v)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> -	char *buf;
> +	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>  
> -	buf = memory_stat_format(memcg);
>  	if (!buf)
>  		return -ENOMEM;
> +	memory_stat_format(memcg, buf, PAGE_SIZE);
>  	seq_puts(m, buf);
>  	kfree(buf);
>  	return 0;
> -- 
> 2.18.4

-- 
Michal Hocko
SUSE Labs
* [PATCH v3] mm: memcontrol: fix potential oom_lock recursion deadlock
  2022-07-22  7:19 ` Michal Hocko
@ 2022-07-22 10:45 ` Tetsuo Handa
  2022-07-22 11:04   ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-22 10:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm

syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting
memcg OOM [1]. Such allocation request might deadlock the system, for
__alloc_pages_may_oom() cannot invoke global OOM killer due to oom_lock
being already held by the caller.

Fix this problem by removing the allocation from memory_stat_format()
completely, and pass static buffer when calling from memcg OOM path.

Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
---
Changes in v3:
  Update patch description.

Changes in v2:
  Use static buffer for OOM reporting, suggested by
  Michal Hocko <mhocko@suse.com>.
 mm/memcontrol.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..8092be2fbb7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
 }
 
-static char *memory_stat_format(struct mem_cgroup *memcg)
+static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
 {
 	struct seq_buf s;
 	int i;
 
-	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-	if (!s.buffer)
-		return NULL;
+	seq_buf_init(&s, buf, bufsize);
 
 	/*
 	 * Provide statistics on the state of the memory subsystem as
@@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
-
-	return s.buffer;
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
@@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
  */
 void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 {
-	char *buf;
+	/* Use static buffer, for the caller is holding oom_lock. */
+	static char buf[PAGE_SIZE];
+
+	lockdep_assert_held(&oom_lock);
 
 	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
 		K((u64)page_counter_read(&memcg->memory)),
@@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
 	pr_info("Memory cgroup stats for ");
 	pr_cont_cgroup_path(memcg->css.cgroup);
 	pr_cont(":");
-	buf = memory_stat_format(memcg);
-	if (!buf)
-		return;
+	memory_stat_format(memcg, buf, sizeof(buf));
 	pr_info("%s", buf);
-	kfree(buf);
 }
 
 /*
@@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
 static int memory_stat_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
-	char *buf;
+	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 
-	buf = memory_stat_format(memcg);
 	if (!buf)
 		return -ENOMEM;
+	memory_stat_format(memcg, buf, PAGE_SIZE);
 	seq_puts(m, buf);
 	kfree(buf);
 	return 0;
-- 
2.18.4
* Re: [PATCH v3] mm: memcontrol: fix potential oom_lock recursion deadlock
  2022-07-22 10:45 ` [PATCH v3] " Tetsuo Handa
@ 2022-07-22 11:04 ` Michal Hocko
  2022-07-22 11:12   ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-07-22 11:04 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Johannes Weiner, linux-mm

On Fri 22-07-22 19:45:39, Tetsuo Handa wrote:
> syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting
> memcg OOM [1]. Such allocation request might deadlock the system, for
> __alloc_pages_may_oom() cannot invoke global OOM killer due to oom_lock
> being already held by the caller.

OK, I have misunderstood your previous wording and now I have realized
that there are two issues here. One of them is a (less likely) deadlock on
the oom_lock not making a fwd progress (that would require a global OOM
racing with a memcg OOM), and the other is the GFP_NOFS->GFP_KERNEL
dependency over oom_lock, which can deadlock even without the above race.
Correct?

Sorry, I could have realized that sooner.

> Fix this problem by removing the allocation from memory_stat_format()
> completely, and pass static buffer when calling from memcg OOM path.
>
> Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
> Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> Changes in v3:
>   Update patch description.
>
> Changes in v2:
>   Use static buffer for OOM reporting, suggested by
>   Michal Hocko <mhocko@suse.com>.
>
>  mm/memcontrol.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366a2f07..8092be2fbb7c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
>  	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
>  }
>  
> -static char *memory_stat_format(struct mem_cgroup *memcg)
> +static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
>  {
>  	struct seq_buf s;
>  	int i;
>  
> -	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> -	if (!s.buffer)
> -		return NULL;
> +	seq_buf_init(&s, buf, bufsize);
>  
>  	/*
>  	 * Provide statistics on the state of the memory subsystem as
> @@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  
>  	/* The above should easily fit into one page */
>  	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
> -
> -	return s.buffer;
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
>   */
>  void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  {
> -	char *buf;
> +	/* Use static buffer, for the caller is holding oom_lock. */
> +	static char buf[PAGE_SIZE];
> +
> +	lockdep_assert_held(&oom_lock);
>  
>  	pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
>  		K((u64)page_counter_read(&memcg->memory)),
> @@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
>  	pr_info("Memory cgroup stats for ");
>  	pr_cont_cgroup_path(memcg->css.cgroup);
>  	pr_cont(":");
> -	buf = memory_stat_format(memcg);
> -	if (!buf)
> -		return;
> +	memory_stat_format(memcg, buf, sizeof(buf));
>  	pr_info("%s", buf);
> -	kfree(buf);
>  }
>  
>  /*
> @@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
>  static int memory_stat_show(struct seq_file *m, void *v)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> -	char *buf;
> +	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>  
> -	buf = memory_stat_format(memcg);
>  	if (!buf)
>  		return -ENOMEM;
> +	memory_stat_format(memcg, buf, PAGE_SIZE);
>  	seq_puts(m, buf);
>  	kfree(buf);
>  	return 0;
> -- 
> 2.18.4

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v3] mm: memcontrol: fix potential oom_lock recursion deadlock
  2022-07-22 11:04 ` Michal Hocko
@ 2022-07-22 11:12 ` Tetsuo Handa
  0 siblings, 0 replies; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-22 11:12 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Johannes Weiner, linux-mm

On 2022/07/22 20:04, Michal Hocko wrote:
> On Fri 22-07-22 19:45:39, Tetsuo Handa wrote:
>> syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting
>> memcg OOM [1]. Such allocation request might deadlock the system, for
>> __alloc_pages_may_oom() cannot invoke global OOM killer due to oom_lock
>> being already held by the caller.
>
> OK, I have misunderstood your previous wording and now I have realized
> that there are two issues here. One of them is a (less likely) deadlock on
> the oom_lock not making a fwd progress (that would require a global OOM
> racing with a memcg OOM), and the other is the GFP_NOFS->GFP_KERNEL
> dependency over oom_lock, which can deadlock even without the above race.
> Correct?

Correct.

> Sorry, I could have realized that sooner.

No problem. Thanks for the suggestion.
end of thread, other threads:[~2022-07-22 11:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14 12:08 [syzbot] possible deadlock in start_this_handle (3) syzbot
2022-07-14 14:18 ` Jan Kara
2022-07-14 22:24   ` Tetsuo Handa
2022-07-15  1:39     ` Shakeel Butt
2022-07-15  1:53       ` Tetsuo Handa
2022-07-20 23:49         ` [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock Tetsuo Handa
2022-07-21  8:01           ` Michal Hocko
2022-07-22  0:46             ` [PATCH v2] " Tetsuo Handa
2022-07-22  7:19               ` Michal Hocko
2022-07-22 10:45                 ` [PATCH v3] " Tetsuo Handa
2022-07-22 11:04                   ` Michal Hocko
2022-07-22 11:12                     ` Tetsuo Handa