* [syzbot] possible deadlock in start_this_handle (3)
@ 2022-07-14 12:08 syzbot
2022-07-14 14:18 ` Jan Kara
0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2022-07-14 12:08 UTC (permalink / raw)
To: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso
Hello,
syzbot found the following issue on:
HEAD commit: 5a29232d870d Merge tag 'for-5.19-rc6-tag' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16619ce8080000
kernel config: https://syzkaller.appspot.com/x/.config?x=525bc0635a2b942a
dashboard link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: i386
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com
======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc6-syzkaller-00026-g5a29232d870d #0 Not tainted
------------------------------------------------------
khugepaged/48 is trying to acquire lock:
ffff888044598990 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0xfb4/0x14a0 fs/jbd2/transaction.c:461
but task is already holding lock:
ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4638 [inline]
ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x9e1/0x2160 mm/page_alloc.c:5066
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire mm/page_alloc.c:4589 [inline]
fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4603
might_alloc include/linux/sched/mm.h:271 [inline]
slab_pre_alloc_hook mm/slab.h:723 [inline]
slab_alloc_node mm/slub.c:3157 [inline]
slab_alloc mm/slub.c:3251 [inline]
kmem_cache_alloc_trace+0x40/0x3f0 mm/slub.c:3282
kmalloc include/linux/slab.h:600 [inline]
memory_stat_format+0x95/0xae0 mm/memcontrol.c:1468
mem_cgroup_print_oom_meminfo.cold+0x50/0x7e mm/memcontrol.c:1594
dump_header+0x13f/0x7f9 mm/oom_kill.c:462
oom_kill_process.cold+0x10/0x15 mm/oom_kill.c:1037
out_of_memory+0x358/0x14b0 mm/oom_kill.c:1175
mem_cgroup_out_of_memory+0x206/0x270 mm/memcontrol.c:1650
memory_max_write+0x25c/0x3b0 mm/memcontrol.c:6299
cgroup_file_write+0x1de/0x770 kernel/cgroup/cgroup.c:3882
kernfs_fop_write_iter+0x3f8/0x610 fs/kernfs/file.c:290
call_write_iter include/linux/fs.h:2058 [inline]
new_sync_write+0x38a/0x560 fs/read_write.c:504
vfs_write+0x7c0/0xac0 fs/read_write.c:591
ksys_write+0x127/0x250 fs/read_write.c:644
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
entry_SYSENTER_compat_after_hwframe+0x53/0x62
-> #1 (oom_lock){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:603 [inline]
__mutex_lock+0x12f/0x1350 kernel/locking/mutex.c:747
mem_cgroup_out_of_memory+0x8d/0x270 mm/memcontrol.c:1640
mem_cgroup_oom mm/memcontrol.c:1880 [inline]
try_charge_memcg+0xef9/0x1380 mm/memcontrol.c:2670
obj_cgroup_charge_pages mm/memcontrol.c:2999 [inline]
obj_cgroup_charge+0x2ab/0x5e0 mm/memcontrol.c:3289
memcg_slab_pre_alloc_hook mm/slab.h:505 [inline]
slab_pre_alloc_hook mm/slab.h:728 [inline]
slab_alloc_node mm/slub.c:3157 [inline]
slab_alloc mm/slub.c:3251 [inline]
__kmem_cache_alloc_lru mm/slub.c:3258 [inline]
kmem_cache_alloc+0x92/0x3b0 mm/slub.c:3268
kmem_cache_zalloc include/linux/slab.h:723 [inline]
alloc_buffer_head+0x20/0x140 fs/buffer.c:3294
alloc_page_buffers+0x285/0x7a0 fs/buffer.c:829
grow_dev_page fs/buffer.c:965 [inline]
grow_buffers fs/buffer.c:1011 [inline]
__getblk_slow+0x525/0x1080 fs/buffer.c:1038
__getblk_gfp+0x6e/0x80 fs/buffer.c:1333
sb_getblk include/linux/buffer_head.h:326 [inline]
ext4_getblk+0x20d/0x7c0 fs/ext4/inode.c:866
ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:912
ext4_append+0x177/0x3a0 fs/ext4/namei.c:67
ext4_init_new_dir+0x25e/0x4d0 fs/ext4/namei.c:2920
ext4_mkdir+0x3cf/0xb20 fs/ext4/namei.c:2966
vfs_mkdir+0x1c3/0x3b0 fs/namei.c:3975
do_mkdirat+0x285/0x300 fs/namei.c:4001
__do_sys_mkdirat fs/namei.c:4016 [inline]
__se_sys_mkdirat fs/namei.c:4014 [inline]
__ia32_sys_mkdirat+0x81/0xa0 fs/namei.c:4014
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
entry_SYSENTER_compat_after_hwframe+0x53/0x62
-> #0 (jbd2_handle){++++}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
start_this_handle+0xfe7/0x14a0 fs/jbd2/transaction.c:463
jbd2__journal_start+0x399/0x930 fs/jbd2/transaction.c:520
__ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105
__ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
ext4_dirty_inode+0x9d/0x110 fs/ext4/inode.c:5949
__mark_inode_dirty+0x495/0x1050 fs/fs-writeback.c:2381
mark_inode_dirty_sync include/linux/fs.h:2337 [inline]
iput.part.0+0x57/0x820 fs/inode.c:1767
iput+0x58/0x70 fs/inode.c:1760
dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
__dentry_kill+0x3c0/0x640 fs/dcache.c:607
shrink_dentry_list+0x23c/0x800 fs/dcache.c:1201
prune_dcache_sb+0xe7/0x140 fs/dcache.c:1282
super_cache_scan+0x336/0x590 fs/super.c:104
do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:770
shrink_slab+0x17c/0x6f0 mm/vmscan.c:930
shrink_node_memcgs mm/vmscan.c:3124 [inline]
shrink_node+0x8b3/0x1db0 mm/vmscan.c:3245
shrink_zones mm/vmscan.c:3482 [inline]
do_try_to_free_pages+0x3b5/0x1700 mm/vmscan.c:3540
try_to_free_pages+0x2ac/0x840 mm/vmscan.c:3775
__perform_reclaim mm/page_alloc.c:4641 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
__alloc_pages_slowpath.constprop.0+0xa8a/0x2160 mm/page_alloc.c:5066
__alloc_pages+0x436/0x510 mm/page_alloc.c:5439
__alloc_pages_node include/linux/gfp.h:587 [inline]
khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:859
collapse_huge_page mm/khugepaged.c:1062 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1348 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2170 [inline]
khugepaged_do_scan mm/khugepaged.c:2251 [inline]
khugepaged+0x3473/0x66a0 mm/khugepaged.c:2296
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
other info that might help us debug this:
Chain exists of:
jbd2_handle --> oom_lock --> fs_reclaim
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(oom_lock);
lock(fs_reclaim);
lock(jbd2_handle);
*** DEADLOCK ***
3 locks held by khugepaged/48:
#0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4638 [inline]
#0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
#0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x9e1/0x2160 mm/page_alloc.c:5066
#1: ffffffff8be7d850 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0xc9/0x6f0 mm/vmscan.c:920
#2: ffff8880445800e0 (&type->s_umount_key#33){++++}-{3:3}, at: trylock_super fs/super.c:415 [inline]
#2: ffff8880445800e0 (&type->s_umount_key#33){++++}-{3:3}, at: super_cache_scan+0x6c/0x590 fs/super.c:79
stack backtrace:
CPU: 2 PID: 48 Comm: khugepaged Not tainted 5.19.0-rc6-syzkaller-00026-g5a29232d870d #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
start_this_handle+0xfe7/0x14a0 fs/jbd2/transaction.c:463
jbd2__journal_start+0x399/0x930 fs/jbd2/transaction.c:520
__ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105
__ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
ext4_dirty_inode+0x9d/0x110 fs/ext4/inode.c:5949
__mark_inode_dirty+0x495/0x1050 fs/fs-writeback.c:2381
mark_inode_dirty_sync include/linux/fs.h:2337 [inline]
iput.part.0+0x57/0x820 fs/inode.c:1767
iput+0x58/0x70 fs/inode.c:1760
dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
__dentry_kill+0x3c0/0x640 fs/dcache.c:607
shrink_dentry_list+0x23c/0x800 fs/dcache.c:1201
prune_dcache_sb+0xe7/0x140 fs/dcache.c:1282
super_cache_scan+0x336/0x590 fs/super.c:104
do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:770
shrink_slab+0x17c/0x6f0 mm/vmscan.c:930
shrink_node_memcgs mm/vmscan.c:3124 [inline]
shrink_node+0x8b3/0x1db0 mm/vmscan.c:3245
shrink_zones mm/vmscan.c:3482 [inline]
do_try_to_free_pages+0x3b5/0x1700 mm/vmscan.c:3540
try_to_free_pages+0x2ac/0x840 mm/vmscan.c:3775
__perform_reclaim mm/page_alloc.c:4641 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
__alloc_pages_slowpath.constprop.0+0xa8a/0x2160 mm/page_alloc.c:5066
__alloc_pages+0x436/0x510 mm/page_alloc.c:5439
__alloc_pages_node include/linux/gfp.h:587 [inline]
khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:859
collapse_huge_page mm/khugepaged.c:1062 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1348 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2170 [inline]
khugepaged_do_scan mm/khugepaged.c:2251 [inline]
khugepaged+0x3473/0x66a0 mm/khugepaged.c:2296
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
2022-07-14 12:08 [syzbot] possible deadlock in start_this_handle (3) syzbot
@ 2022-07-14 14:18 ` Jan Kara
2022-07-14 22:24 ` Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2022-07-14 14:18 UTC (permalink / raw)
To: linux-mm
Cc: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso, linux-fsdevel
Hello,
so this lockdep report looks real but is more related to OOM handling than
to ext4 as such. The immediate problem I can see is that
mem_cgroup_print_oom_meminfo() which is called under oom_lock calls
memory_stat_format() which does GFP_KERNEL allocations to allocate buffers
for dumping of MM statistics. This creates oom_lock -> fs reclaim
dependency and because OOM can be hit (and thus oom_lock acquired) in
practically any allocation (regardless of GFP_NOFS) this has a potential of
creating real deadlock cycles.
So should mem_cgroup_print_oom_meminfo() be using
memalloc_nofs_save/restore() to avoid such deadlocks? Or perhaps someone
sees another solution? Generally allocating memory to report OOM looks a
bit dangerous to me ;).
Honza
On Thu 14-07-22 05:08:26, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 5a29232d870d Merge tag 'for-5.19-rc6-tag' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16619ce8080000
> kernel config: https://syzkaller.appspot.com/x/.config?x=525bc0635a2b942a
> dashboard link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.19.0-rc6-syzkaller-00026-g5a29232d870d #0 Not tainted
> ------------------------------------------------------
> khugepaged/48 is trying to acquire lock:
> ffff888044598990 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0xfb4/0x14a0 fs/jbd2/transaction.c:461
>
> but task is already holding lock:
> ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4638 [inline]
> ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
> ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x9e1/0x2160 mm/page_alloc.c:5066
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (fs_reclaim){+.+.}-{0:0}:
> __fs_reclaim_acquire mm/page_alloc.c:4589 [inline]
> fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4603
> might_alloc include/linux/sched/mm.h:271 [inline]
> slab_pre_alloc_hook mm/slab.h:723 [inline]
> slab_alloc_node mm/slub.c:3157 [inline]
> slab_alloc mm/slub.c:3251 [inline]
> kmem_cache_alloc_trace+0x40/0x3f0 mm/slub.c:3282
> kmalloc include/linux/slab.h:600 [inline]
> memory_stat_format+0x95/0xae0 mm/memcontrol.c:1468
> mem_cgroup_print_oom_meminfo.cold+0x50/0x7e mm/memcontrol.c:1594
> dump_header+0x13f/0x7f9 mm/oom_kill.c:462
> oom_kill_process.cold+0x10/0x15 mm/oom_kill.c:1037
> out_of_memory+0x358/0x14b0 mm/oom_kill.c:1175
> mem_cgroup_out_of_memory+0x206/0x270 mm/memcontrol.c:1650
> memory_max_write+0x25c/0x3b0 mm/memcontrol.c:6299
> cgroup_file_write+0x1de/0x770 kernel/cgroup/cgroup.c:3882
> kernfs_fop_write_iter+0x3f8/0x610 fs/kernfs/file.c:290
> call_write_iter include/linux/fs.h:2058 [inline]
> new_sync_write+0x38a/0x560 fs/read_write.c:504
> vfs_write+0x7c0/0xac0 fs/read_write.c:591
> ksys_write+0x127/0x250 fs/read_write.c:644
> do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
> __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
> do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
> entry_SYSENTER_compat_after_hwframe+0x53/0x62
>
> -> #1 (oom_lock){+.+.}-{3:3}:
> __mutex_lock_common kernel/locking/mutex.c:603 [inline]
> __mutex_lock+0x12f/0x1350 kernel/locking/mutex.c:747
> mem_cgroup_out_of_memory+0x8d/0x270 mm/memcontrol.c:1640
> mem_cgroup_oom mm/memcontrol.c:1880 [inline]
> try_charge_memcg+0xef9/0x1380 mm/memcontrol.c:2670
> obj_cgroup_charge_pages mm/memcontrol.c:2999 [inline]
> obj_cgroup_charge+0x2ab/0x5e0 mm/memcontrol.c:3289
> memcg_slab_pre_alloc_hook mm/slab.h:505 [inline]
> slab_pre_alloc_hook mm/slab.h:728 [inline]
> slab_alloc_node mm/slub.c:3157 [inline]
> slab_alloc mm/slub.c:3251 [inline]
> __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
> kmem_cache_alloc+0x92/0x3b0 mm/slub.c:3268
> kmem_cache_zalloc include/linux/slab.h:723 [inline]
> alloc_buffer_head+0x20/0x140 fs/buffer.c:3294
> alloc_page_buffers+0x285/0x7a0 fs/buffer.c:829
> grow_dev_page fs/buffer.c:965 [inline]
> grow_buffers fs/buffer.c:1011 [inline]
> __getblk_slow+0x525/0x1080 fs/buffer.c:1038
> __getblk_gfp+0x6e/0x80 fs/buffer.c:1333
> sb_getblk include/linux/buffer_head.h:326 [inline]
> ext4_getblk+0x20d/0x7c0 fs/ext4/inode.c:866
> ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:912
> ext4_append+0x177/0x3a0 fs/ext4/namei.c:67
> ext4_init_new_dir+0x25e/0x4d0 fs/ext4/namei.c:2920
> ext4_mkdir+0x3cf/0xb20 fs/ext4/namei.c:2966
> vfs_mkdir+0x1c3/0x3b0 fs/namei.c:3975
> do_mkdirat+0x285/0x300 fs/namei.c:4001
> __do_sys_mkdirat fs/namei.c:4016 [inline]
> __se_sys_mkdirat fs/namei.c:4014 [inline]
> __ia32_sys_mkdirat+0x81/0xa0 fs/namei.c:4014
> do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
> __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
> do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
> entry_SYSENTER_compat_after_hwframe+0x53/0x62
>
> -> #0 (jbd2_handle){++++}-{0:0}:
> check_prev_add kernel/locking/lockdep.c:3095 [inline]
> check_prevs_add kernel/locking/lockdep.c:3214 [inline]
> validate_chain kernel/locking/lockdep.c:3829 [inline]
> __lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
> lock_acquire kernel/locking/lockdep.c:5665 [inline]
> lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
> start_this_handle+0xfe7/0x14a0 fs/jbd2/transaction.c:463
> jbd2__journal_start+0x399/0x930 fs/jbd2/transaction.c:520
> __ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105
> __ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
> ext4_dirty_inode+0x9d/0x110 fs/ext4/inode.c:5949
> __mark_inode_dirty+0x495/0x1050 fs/fs-writeback.c:2381
> mark_inode_dirty_sync include/linux/fs.h:2337 [inline]
> iput.part.0+0x57/0x820 fs/inode.c:1767
> iput+0x58/0x70 fs/inode.c:1760
> dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
> __dentry_kill+0x3c0/0x640 fs/dcache.c:607
> shrink_dentry_list+0x23c/0x800 fs/dcache.c:1201
> prune_dcache_sb+0xe7/0x140 fs/dcache.c:1282
> super_cache_scan+0x336/0x590 fs/super.c:104
> do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:770
> shrink_slab+0x17c/0x6f0 mm/vmscan.c:930
> shrink_node_memcgs mm/vmscan.c:3124 [inline]
> shrink_node+0x8b3/0x1db0 mm/vmscan.c:3245
> shrink_zones mm/vmscan.c:3482 [inline]
> do_try_to_free_pages+0x3b5/0x1700 mm/vmscan.c:3540
> try_to_free_pages+0x2ac/0x840 mm/vmscan.c:3775
> __perform_reclaim mm/page_alloc.c:4641 [inline]
> __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
> __alloc_pages_slowpath.constprop.0+0xa8a/0x2160 mm/page_alloc.c:5066
> __alloc_pages+0x436/0x510 mm/page_alloc.c:5439
> __alloc_pages_node include/linux/gfp.h:587 [inline]
> khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:859
> collapse_huge_page mm/khugepaged.c:1062 [inline]
> khugepaged_scan_pmd mm/khugepaged.c:1348 [inline]
> khugepaged_scan_mm_slot mm/khugepaged.c:2170 [inline]
> khugepaged_do_scan mm/khugepaged.c:2251 [inline]
> khugepaged+0x3473/0x66a0 mm/khugepaged.c:2296
> kthread+0x2e9/0x3a0 kernel/kthread.c:376
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
>
> other info that might help us debug this:
>
> Chain exists of:
> jbd2_handle --> oom_lock --> fs_reclaim
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(fs_reclaim);
> lock(oom_lock);
> lock(fs_reclaim);
> lock(jbd2_handle);
>
> *** DEADLOCK ***
>
> 3 locks held by khugepaged/48:
> #0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4638 [inline]
> #0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
> #0: ffffffff8bebdb20 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x9e1/0x2160 mm/page_alloc.c:5066
> #1: ffffffff8be7d850 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0xc9/0x6f0 mm/vmscan.c:920
> #2: ffff8880445800e0 (&type->s_umount_key#33){++++}-{3:3}, at: trylock_super fs/super.c:415 [inline]
> #2: ffff8880445800e0 (&type->s_umount_key#33){++++}-{3:3}, at: super_cache_scan+0x6c/0x590 fs/super.c:79
>
> stack backtrace:
> CPU: 2 PID: 48 Comm: khugepaged Not tainted 5.19.0-rc6-syzkaller-00026-g5a29232d870d #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
> check_prev_add kernel/locking/lockdep.c:3095 [inline]
> check_prevs_add kernel/locking/lockdep.c:3214 [inline]
> validate_chain kernel/locking/lockdep.c:3829 [inline]
> __lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
> lock_acquire kernel/locking/lockdep.c:5665 [inline]
> lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
> start_this_handle+0xfe7/0x14a0 fs/jbd2/transaction.c:463
> jbd2__journal_start+0x399/0x930 fs/jbd2/transaction.c:520
> __ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105
> __ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
> ext4_dirty_inode+0x9d/0x110 fs/ext4/inode.c:5949
> __mark_inode_dirty+0x495/0x1050 fs/fs-writeback.c:2381
> mark_inode_dirty_sync include/linux/fs.h:2337 [inline]
> iput.part.0+0x57/0x820 fs/inode.c:1767
> iput+0x58/0x70 fs/inode.c:1760
> dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
> __dentry_kill+0x3c0/0x640 fs/dcache.c:607
> shrink_dentry_list+0x23c/0x800 fs/dcache.c:1201
> prune_dcache_sb+0xe7/0x140 fs/dcache.c:1282
> super_cache_scan+0x336/0x590 fs/super.c:104
> do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:770
> shrink_slab+0x17c/0x6f0 mm/vmscan.c:930
> shrink_node_memcgs mm/vmscan.c:3124 [inline]
> shrink_node+0x8b3/0x1db0 mm/vmscan.c:3245
> shrink_zones mm/vmscan.c:3482 [inline]
> do_try_to_free_pages+0x3b5/0x1700 mm/vmscan.c:3540
> try_to_free_pages+0x2ac/0x840 mm/vmscan.c:3775
> __perform_reclaim mm/page_alloc.c:4641 [inline]
> __alloc_pages_direct_reclaim mm/page_alloc.c:4663 [inline]
> __alloc_pages_slowpath.constprop.0+0xa8a/0x2160 mm/page_alloc.c:5066
> __alloc_pages+0x436/0x510 mm/page_alloc.c:5439
> __alloc_pages_node include/linux/gfp.h:587 [inline]
> khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:859
> collapse_huge_page mm/khugepaged.c:1062 [inline]
> khugepaged_scan_pmd mm/khugepaged.c:1348 [inline]
> khugepaged_scan_mm_slot mm/khugepaged.c:2170 [inline]
> khugepaged_do_scan mm/khugepaged.c:2251 [inline]
> khugepaged+0x3473/0x66a0 mm/khugepaged.c:2296
> kthread+0x2e9/0x3a0 kernel/kthread.c:376
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
2022-07-14 14:18 ` Jan Kara
@ 2022-07-14 22:24 ` Tetsuo Handa
2022-07-15 1:39 ` Shakeel Butt
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-14 22:24 UTC (permalink / raw)
To: Jan Kara, linux-mm
Cc: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso, linux-fsdevel
On 2022/07/14 23:18, Jan Kara wrote:
> Hello,
>
> so this lockdep report looks real but is more related to OOM handling than
> to ext4 as such. The immediate problem I can see is that
> mem_cgroup_print_oom_meminfo() which is called under oom_lock calls
> memory_stat_format() which does GFP_KERNEL allocations to allocate buffers
> for dumping of MM statistics. This creates oom_lock -> fs reclaim
> dependency and because OOM can be hit (and thus oom_lock acquired) in
> practically any allocation (regardless of GFP_NOFS) this has a potential of
> creating real deadlock cycles.
>
> So should mem_cgroup_print_oom_meminfo() be using
> memalloc_nofs_save/restore() to avoid such deadlocks? Or perhaps someone
> sees another solution? Generally allocating memory to report OOM looks a
> bit dangerous to me ;).
>
> Honza
I think mem_cgroup_print_oom_meminfo() should use GFP_ATOMIC, for it will fall into
infinite loop if kmalloc(GFP_NOFS) under oom_lock reached __alloc_pages_may_oom() path.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
2022-07-14 22:24 ` Tetsuo Handa
@ 2022-07-15 1:39 ` Shakeel Butt
2022-07-15 1:53 ` Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Shakeel Butt @ 2022-07-15 1:39 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Jan Kara, linux-mm, jack, linux-ext4, linux-kernel,
syzkaller-bugs, tytso, linux-fsdevel
On Fri, Jul 15, 2022 at 07:24:55AM +0900, Tetsuo Handa wrote:
> On 2022/07/14 23:18, Jan Kara wrote:
> > Hello,
> >
> > so this lockdep report looks real but is more related to OOM handling than
> > to ext4 as such. The immediate problem I can see is that
> > mem_cgroup_print_oom_meminfo() which is called under oom_lock calls
> > memory_stat_format() which does GFP_KERNEL allocations to allocate buffers
> > for dumping of MM statistics. This creates oom_lock -> fs reclaim
> > dependency and because OOM can be hit (and thus oom_lock acquired) in
> > practically any allocation (regardless of GFP_NOFS) this has a potential of
> > creating real deadlock cycles.
> >
> > So should mem_cgroup_print_oom_meminfo() be using
> > memalloc_nofs_save/restore() to avoid such deadlocks? Or perhaps someone
> > sees another solution? Generally allocating memory to report OOM looks a
> > bit dangerous to me ;).
mem_cgroup_print_oom_meminfo() is called only for memcg OOMs. So, the
situaion would be dangerous only if the system is also OOM at that time.
> >
> > Honza
>
> I think mem_cgroup_print_oom_meminfo() should use GFP_ATOMIC, for it will fall into
> infinite loop if kmalloc(GFP_NOFS) under oom_lock reached __alloc_pages_may_oom() path.
I would prefer GFP_NOWAIT. This is printing info for memcg OOMs and if
the system is low on memory then memcg OOMs has lower importance than
the system state.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot] possible deadlock in start_this_handle (3)
2022-07-15 1:39 ` Shakeel Butt
@ 2022-07-15 1:53 ` Tetsuo Handa
2022-07-20 23:49 ` [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-15 1:53 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jan Kara, linux-mm, jack, linux-ext4, linux-kernel,
syzkaller-bugs, tytso, linux-fsdevel
On 2022/07/15 10:39, Shakeel Butt wrote:
>> I think mem_cgroup_print_oom_meminfo() should use GFP_ATOMIC, for it will fall into
>> infinite loop if kmalloc(GFP_NOFS) under oom_lock reached __alloc_pages_may_oom() path.
>
> I would prefer GFP_NOWAIT. This is printing info for memcg OOMs and if
> the system is low on memory then memcg OOMs has lower importance than
> the system state.
Since killing a process in some memcg likely helps solving global OOM state,
system OOM condition might not be reported when memory allocation by
mem_cgroup_print_oom_meminfo() caused system OOM condition.
Therefore, we don't need to discard output from memcg OOM condition.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-15 1:53 ` Tetsuo Handa
@ 2022-07-20 23:49 ` Tetsuo Handa
2022-07-21 8:01 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-20 23:49 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Andrew Morton; +Cc: linux-mm
syzbot is reporting fs_reclaim allocation with oom_lock held [1]. We
must make sure that such allocation won't hit __alloc_pages_may_oom()
path which will retry forever if oom_lock is already held.
I choose GFP_ATOMIC than GFP_NOWAIT, for since global OOM situation will
likely be avoided by killing some process in memcg, and memory will be
released after printk(), trying a little hard will be acceptable.
Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/memcontrol.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..11cd900729b9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1465,7 +1465,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
struct seq_buf s;
int i;
- seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
+ /* Caller might be holding oom_lock. */
+ seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_ATOMIC), PAGE_SIZE);
if (!s.buffer)
return NULL;
--
2.18.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-20 23:49 ` [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock Tetsuo Handa
@ 2022-07-21 8:01 ` Michal Hocko
2022-07-22 0:46 ` [PATCH v2] " Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-07-21 8:01 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Johannes Weiner, Andrew Morton, linux-mm
On Thu 21-07-22 08:49:57, Tetsuo Handa wrote:
> syzbot is reporting fs_reclaim allocation with oom_lock held [1]. We
> must make sure that such allocation won't hit __alloc_pages_may_oom()
> path which will retry forever if oom_lock is already held.
>
> I choose GFP_ATOMIC than GFP_NOWAIT, for since global OOM situation will
> likely be avoided by killing some process in memcg, and memory will be
> released after printk(), trying a little hard will be acceptable.
Nope, this is not a proper fix. You are making memory.stat more likely
to fail.
An uncoditional GFP_KERNEL allocation is certainly not good but is there
any reason to not use GFP_NOIO instead?
Or even better. In an ideal world we won't allocate from here at
all. Can we pre-allocate that single page and re-use it for the oom
report?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..13483cb278bb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
}
-static char *memory_stat_format(struct mem_cgroup *memcg)
+void memory_stat_format(struct mem_cgroup *memcg, char *buf)
{
struct seq_buf s;
int i;
- seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
- if (!s.buffer)
- return NULL;
+ seq_buf_init(&s, buf, PAGE_SIZE);
/*
* Provide statistics on the state of the memory subsystem as
@@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
/* The above should easily fit into one page */
WARN_ON_ONCE(seq_buf_has_overflowed(&s));
-
- return s.buffer;
}
#define K(x) ((x) << (PAGE_SHIFT-10))
@@ -1563,6 +1559,12 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
rcu_read_unlock();
}
+/*
+ * preallocated buffer to collect memory stats for the oom situation.
+ * Usage protected by oom_lock
+ */
+char oombuf[PAGE_SIZE];
+
/**
* mem_cgroup_print_oom_meminfo: Print OOM memory information relevant to
* memory controller.
@@ -1570,7 +1572,7 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
*/
void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
- char *buf;
+ lockdep_assert_held(&oom_lock);
pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
@@ -1591,11 +1593,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
pr_info("Memory cgroup stats for ");
pr_cont_cgroup_path(memcg->css.cgroup);
pr_cont(":");
- buf = memory_stat_format(memcg);
- if (!buf)
- return;
- pr_info("%s", buf);
- kfree(buf);
+ memory_stat_format(memcg, oombuf);
+ pr_info("%s", oombuf);
}
/*
@@ -6335,11 +6334,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
static int memory_stat_show(struct seq_file *m, void *v)
{
struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
- char *buf;
+ char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
- buf = memory_stat_format(memcg);
if (!buf)
return -ENOMEM;
+ memory_stat_format(memcg, buf);
seq_puts(m, buf);
kfree(buf);
return 0;
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-21 8:01 ` Michal Hocko
@ 2022-07-22 0:46 ` Tetsuo Handa
2022-07-22 7:19 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-22 0:46 UTC (permalink / raw)
To: Michal Hocko; +Cc: Johannes Weiner, Andrew Morton, linux-mm
syzbot is reporting GFP_KERNEL allocation with oom_lock held [1]. We
must make sure that such allocation won't hit __alloc_pages_may_oom()
path which will retry forever if oom_lock is already held. Use static
buffer when oom_lock is already held.
Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Changes in v2:
Use static buffer for OOM reporting, suggested by Michal Hocko <mhocko@suse.com>.
mm/memcontrol.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..8092be2fbb7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
}
-static char *memory_stat_format(struct mem_cgroup *memcg)
+static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
{
struct seq_buf s;
int i;
- seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
- if (!s.buffer)
- return NULL;
+ seq_buf_init(&s, buf, bufsize);
/*
* Provide statistics on the state of the memory subsystem as
@@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
/* The above should easily fit into one page */
WARN_ON_ONCE(seq_buf_has_overflowed(&s));
-
- return s.buffer;
}
#define K(x) ((x) << (PAGE_SHIFT-10))
@@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
*/
void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
- char *buf;
+ /* Use static buffer, for the caller is holding oom_lock. */
+ static char buf[PAGE_SIZE];
+
+ lockdep_assert_held(&oom_lock);
pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
@@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
pr_info("Memory cgroup stats for ");
pr_cont_cgroup_path(memcg->css.cgroup);
pr_cont(":");
- buf = memory_stat_format(memcg);
- if (!buf)
- return;
+ memory_stat_format(memcg, buf, sizeof(buf));
pr_info("%s", buf);
- kfree(buf);
}
/*
@@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
static int memory_stat_show(struct seq_file *m, void *v)
{
struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
- char *buf;
+ char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
- buf = memory_stat_format(memcg);
if (!buf)
return -ENOMEM;
+ memory_stat_format(memcg, buf, PAGE_SIZE);
seq_puts(m, buf);
kfree(buf);
return 0;
--
2.18.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-22 0:46 ` [PATCH v2] " Tetsuo Handa
@ 2022-07-22 7:19 ` Michal Hocko
2022-07-22 10:45 ` [PATCH v3] " Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-07-22 7:19 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Johannes Weiner, Andrew Morton, linux-mm
On Fri 22-07-22 09:46:27, Tetsuo Handa wrote:
> syzbot is reporting GFP_KERNEL allocation with oom_lock held [1]. We
> must make sure that such allocation won't hit __alloc_pages_may_oom()
> path which will retry forever if oom_lock is already held. Use static
> buffer when oom_lock is already held.
The changelog is rather cryptic. Your previous one was more readable.
I would go with:
"
syzbot is reporting GFP_KERNEL allocation with oom_lock held [1]
when reporting memcg oom. This is problematic because this creates a
dependency between GFP_NOFS and GFP_KERNEL over oom_lock which could
dead lock the system.
Fix the problem by removing the allocation from memory_stat_format
completely. Use a statically preallocated buffer instead for this path.
OOM dumping is synchronized by the oom_lock so there is no exclusion
required here. memory_stat_show can use GFP_KERNEL allocation.
"
> Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
> Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> Changes in v2:
> Use static buffer for OOM reporting, suggested by Michal Hocko <mhocko@suse.com>.
>
> mm/memcontrol.c | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366a2f07..8092be2fbb7c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
> return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
> }
>
> -static char *memory_stat_format(struct mem_cgroup *memcg)
> +static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
> {
> struct seq_buf s;
> int i;
>
> - seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> - if (!s.buffer)
> - return NULL;
> + seq_buf_init(&s, buf, bufsize);
>
> /*
> * Provide statistics on the state of the memory subsystem as
> @@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>
> /* The above should easily fit into one page */
> WARN_ON_ONCE(seq_buf_has_overflowed(&s));
> -
> - return s.buffer;
> }
>
> #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
> */
> void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
> {
> - char *buf;
> + /* Use static buffer, for the caller is holding oom_lock. */
> + static char buf[PAGE_SIZE];
> +
> + lockdep_assert_held(&oom_lock);
>
> pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
> K((u64)page_counter_read(&memcg->memory)),
> @@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
> pr_info("Memory cgroup stats for ");
> pr_cont_cgroup_path(memcg->css.cgroup);
> pr_cont(":");
> - buf = memory_stat_format(memcg);
> - if (!buf)
> - return;
> + memory_stat_format(memcg, buf, sizeof(buf));
> pr_info("%s", buf);
> - kfree(buf);
> }
>
> /*
> @@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
> static int memory_stat_show(struct seq_file *m, void *v)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> - char *buf;
> + char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>
> - buf = memory_stat_format(memcg);
> if (!buf)
> return -ENOMEM;
> + memory_stat_format(memcg, buf, PAGE_SIZE);
> seq_puts(m, buf);
> kfree(buf);
> return 0;
> --
> 2.18.4
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-22 7:19 ` Michal Hocko
@ 2022-07-22 10:45 ` Tetsuo Handa
2022-07-22 11:04 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-22 10:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm
syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting
memcg OOM [1]. Such allocation request might deadlock the system, for
__alloc_pages_may_oom() cannot invoke global OOM killer due to oom_lock
being already held by the caller.
Fix this problem by removing the allocation from memory_stat_format()
completely, and pass static buffer when calling from memcg OOM path.
Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
---
Changes in v3:
Update patch description.
Changes in v2:
Use static buffer for OOM reporting, suggested by Michal Hocko <mhocko@suse.com>.
mm/memcontrol.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 618c366a2f07..8092be2fbb7c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
}
-static char *memory_stat_format(struct mem_cgroup *memcg)
+static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
{
struct seq_buf s;
int i;
- seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
- if (!s.buffer)
- return NULL;
+ seq_buf_init(&s, buf, bufsize);
/*
* Provide statistics on the state of the memory subsystem as
@@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
/* The above should easily fit into one page */
WARN_ON_ONCE(seq_buf_has_overflowed(&s));
-
- return s.buffer;
}
#define K(x) ((x) << (PAGE_SHIFT-10))
@@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
*/
void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
- char *buf;
+ /* Use static buffer, for the caller is holding oom_lock. */
+ static char buf[PAGE_SIZE];
+
+ lockdep_assert_held(&oom_lock);
pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
@@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
pr_info("Memory cgroup stats for ");
pr_cont_cgroup_path(memcg->css.cgroup);
pr_cont(":");
- buf = memory_stat_format(memcg);
- if (!buf)
- return;
+ memory_stat_format(memcg, buf, sizeof(buf));
pr_info("%s", buf);
- kfree(buf);
}
/*
@@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
static int memory_stat_show(struct seq_file *m, void *v)
{
struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
- char *buf;
+ char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
- buf = memory_stat_format(memcg);
if (!buf)
return -ENOMEM;
+ memory_stat_format(memcg, buf, PAGE_SIZE);
seq_puts(m, buf);
kfree(buf);
return 0;
--
2.18.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-22 10:45 ` [PATCH v3] " Tetsuo Handa
@ 2022-07-22 11:04 ` Michal Hocko
2022-07-22 11:12 ` Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-07-22 11:04 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Andrew Morton, Johannes Weiner, linux-mm
On Fri 22-07-22 19:45:39, Tetsuo Handa wrote:
> syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting
> memcg OOM [1]. Such allocation request might deadlock the system, for
> __alloc_pages_may_oom() cannot invoke global OOM killer due to oom_lock
> being already held by the caller.
OK, I have misunderstood your previous wording and now I have realized
that there are 2 issues here. One of them is a (less likely) dead lock on
the oom_lock not making a fwd progress (that would require global OOM
racing with memcg OOM) and the other is the GFP_NOFS->GFP_KERNEL dependency
which can deadlock even without global the above race.
Correct?
Sorry I could have realized that sooner.
> Fix this problem by removing the allocation from memory_stat_format()
> completely, and pass static buffer when calling from memcg OOM path.
>
> Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
> Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> Changes in v3:
> Update patch description.
>
> Changes in v2:
> Use static buffer for OOM reporting, suggested by Michal Hocko <mhocko@suse.com>.
>
> mm/memcontrol.c | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366a2f07..8092be2fbb7c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1460,14 +1460,12 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
> return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
> }
>
> -static char *memory_stat_format(struct mem_cgroup *memcg)
> +static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
> {
> struct seq_buf s;
> int i;
>
> - seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> - if (!s.buffer)
> - return NULL;
> + seq_buf_init(&s, buf, bufsize);
>
> /*
> * Provide statistics on the state of the memory subsystem as
> @@ -1533,8 +1531,6 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>
> /* The above should easily fit into one page */
> WARN_ON_ONCE(seq_buf_has_overflowed(&s));
> -
> - return s.buffer;
> }
>
> #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -1570,7 +1566,10 @@ void mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *
> */
> void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
> {
> - char *buf;
> + /* Use static buffer, for the caller is holding oom_lock. */
> + static char buf[PAGE_SIZE];
> +
> + lockdep_assert_held(&oom_lock);
>
> pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
> K((u64)page_counter_read(&memcg->memory)),
> @@ -1591,11 +1590,8 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
> pr_info("Memory cgroup stats for ");
> pr_cont_cgroup_path(memcg->css.cgroup);
> pr_cont(":");
> - buf = memory_stat_format(memcg);
> - if (!buf)
> - return;
> + memory_stat_format(memcg, buf, sizeof(buf));
> pr_info("%s", buf);
> - kfree(buf);
> }
>
> /*
> @@ -6335,11 +6331,11 @@ static int memory_events_local_show(struct seq_file *m, void *v)
> static int memory_stat_show(struct seq_file *m, void *v)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> - char *buf;
> + char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>
> - buf = memory_stat_format(memcg);
> if (!buf)
> return -ENOMEM;
> + memory_stat_format(memcg, buf, PAGE_SIZE);
> seq_puts(m, buf);
> kfree(buf);
> return 0;
> --
> 2.18.4
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] mm: memcontrol: fix potential oom_lock recursion deadlock
2022-07-22 11:04 ` Michal Hocko
@ 2022-07-22 11:12 ` Tetsuo Handa
0 siblings, 0 replies; 12+ messages in thread
From: Tetsuo Handa @ 2022-07-22 11:12 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, Johannes Weiner, linux-mm
On 2022/07/22 20:04, Michal Hocko wrote:
> On Fri 22-07-22 19:45:39, Tetsuo Handa wrote:
>> syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting
>> memcg OOM [1]. Such allocation request might deadlock the system, for
>> __alloc_pages_may_oom() cannot invoke global OOM killer due to oom_lock
>> being already held by the caller.
>
> OK, I have misunderstood your previous wording and now I have realized
> that there are 2 issues here. One of them is a (less likely) dead lock on
> the oom_lock not making a fwd progress (that would require global OOM
> racing with memcg OOM) and the other is the GFP_NOFS->GFP_KERNEL dependency
> which can deadlock even without global the above race.
> Correct?
Correct.
>
> Sorry I could have realized that sooner.
>
No problem. Thanks for suggestion.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-07-22 11:12 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14 12:08 [syzbot] possible deadlock in start_this_handle (3) syzbot
2022-07-14 14:18 ` Jan Kara
2022-07-14 22:24 ` Tetsuo Handa
2022-07-15 1:39 ` Shakeel Butt
2022-07-15 1:53 ` Tetsuo Handa
2022-07-20 23:49 ` [PATCH] mm: memcontrol: fix potential oom_lock recursion deadlock Tetsuo Handa
2022-07-21 8:01 ` Michal Hocko
2022-07-22 0:46 ` [PATCH v2] " Tetsuo Handa
2022-07-22 7:19 ` Michal Hocko
2022-07-22 10:45 ` [PATCH v3] " Tetsuo Handa
2022-07-22 11:04 ` Michal Hocko
2022-07-22 11:12 ` Tetsuo Handa
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.