All of lore.kernel.org
 help / color / mirror / Atom feed
* [syzbot] [ext4?] possible deadlock in evict (3)
@ 2023-02-28 17:25 syzbot
  2023-03-01  0:01 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: syzbot @ 2023-02-28 17:25 UTC (permalink / raw)
  To: adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

Hello,

syzbot found the following issue on:

HEAD commit:    ae3419fbac84 vc_screen: don't clobber return value in vcs_..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1136fe18c80000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ff98a3b3c1aed3ab
dashboard link: https://syzkaller.appspot.com/bug?extid=dd426ae4af71f1e74729
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dd426ae4af71f1e74729@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.2.0-syzkaller-12913-gae3419fbac84 #0 Not tainted
------------------------------------------------------
kswapd0/100 is trying to acquire lock:
ffff888047aea650 (sb_internal){.+.+}-{0:0}, at: evict+0x2ed/0x6b0 fs/inode.c:665

but task is already holding lock:
ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: set_task_reclaim_state mm/vmscan.c:200 [inline]
ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x170/0x1ac0 mm/vmscan.c:7338

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (fs_reclaim){+.+.}-{0:0}:
       __fs_reclaim_acquire mm/page_alloc.c:4716 [inline]
       fs_reclaim_acquire+0x11d/0x160 mm/page_alloc.c:4730
       might_alloc include/linux/sched/mm.h:271 [inline]
       prepare_alloc_pages+0x159/0x570 mm/page_alloc.c:5362
       __alloc_pages+0x149/0x5c0 mm/page_alloc.c:5580
       alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
       __get_free_pages+0xc/0x40 mm/page_alloc.c:5641
       kasan_populate_vmalloc_pte mm/kasan/shadow.c:309 [inline]
       kasan_populate_vmalloc_pte+0x27/0x150 mm/kasan/shadow.c:300
       apply_to_pte_range mm/memory.c:2578 [inline]
       apply_to_pmd_range mm/memory.c:2622 [inline]
       apply_to_pud_range mm/memory.c:2658 [inline]
       apply_to_p4d_range mm/memory.c:2694 [inline]
       __apply_to_page_range+0x68c/0x1030 mm/memory.c:2728
       alloc_vmap_area+0x536/0x1f20 mm/vmalloc.c:1638
       __get_vm_area_node+0x145/0x3f0 mm/vmalloc.c:2495
       __vmalloc_node_range+0x250/0x1300 mm/vmalloc.c:3141
       kvmalloc_node+0x156/0x1a0 mm/util.c:628
       kvmalloc include/linux/slab.h:737 [inline]
       ext4_xattr_move_to_block fs/ext4/xattr.c:2570 [inline]
       ext4_xattr_make_inode_space fs/ext4/xattr.c:2685 [inline]
       ext4_expand_extra_isize_ea+0x7d5/0x1680 fs/ext4/xattr.c:2777
       __ext4_expand_extra_isize+0x33e/0x470 fs/ext4/inode.c:5957
       ext4_try_to_expand_extra_isize fs/ext4/inode.c:6000 [inline]
       __ext4_mark_inode_dirty+0x534/0x950 fs/ext4/inode.c:6078
       ext4_inline_data_truncate+0x610/0xd70 fs/ext4/inline.c:2020
       ext4_truncate+0xa5b/0x1620 fs/ext4/inode.c:4298
       ext4_process_orphan+0x158/0x410 fs/ext4/orphan.c:339
       ext4_orphan_cleanup+0x6e5/0x1110 fs/ext4/orphan.c:474
       __ext4_fill_super fs/ext4/super.c:5500 [inline]
       ext4_fill_super+0xa22f/0xb1f0 fs/ext4/super.c:5628
       get_tree_bdev+0x444/0x760 fs/super.c:1294
       vfs_get_tree+0x8d/0x350 fs/super.c:1501
       do_new_mount fs/namespace.c:3042 [inline]
       path_mount+0x1342/0x1e40 fs/namespace.c:3372
       do_mount fs/namespace.c:3385 [inline]
       __do_sys_mount fs/namespace.c:3594 [inline]
       __se_sys_mount fs/namespace.c:3571 [inline]
       __x64_sys_mount+0x283/0x300 fs/namespace.c:3571
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #2 (&ei->xattr_sem){++++}-{3:3}:
       down_write+0x92/0x200 kernel/locking/rwsem.c:1573
       ext4_write_lock_xattr fs/ext4/xattr.h:155 [inline]
       ext4_xattr_set_handle+0x160/0x1510 fs/ext4/xattr.c:2321
       ext4_initxattrs+0xb9/0x120 fs/ext4/xattr_security.c:44
       security_inode_init_security+0x1c8/0x370 security/security.c:1147
       __ext4_new_inode+0x3b03/0x5890 fs/ext4/ialloc.c:1324
       ext4_create+0x2da/0x4e0 fs/ext4/namei.c:2809
       lookup_open.isra.0+0x105a/0x1400 fs/namei.c:3416
       open_last_lookups fs/namei.c:3484 [inline]
       path_openat+0x975/0x2750 fs/namei.c:3712
       do_filp_open+0x1ba/0x410 fs/namei.c:3742
       do_sys_openat2+0x16d/0x4c0 fs/open.c:1348
       do_sys_open fs/open.c:1364 [inline]
       __do_sys_openat fs/open.c:1380 [inline]
       __se_sys_openat fs/open.c:1375 [inline]
       __x64_sys_openat+0x143/0x1f0 fs/open.c:1375
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (jbd2_handle){++++}-{0:0}:
       start_this_handle+0xfe7/0x14e0 fs/jbd2/transaction.c:463
       jbd2__journal_start+0x39d/0x9d0 fs/jbd2/transaction.c:520
       __ext4_journal_start_sb+0x706/0x890 fs/ext4/ext4_jbd2.c:111
       ext4_sample_last_mounted fs/ext4/file.c:841 [inline]
       ext4_file_open+0x618/0xbf0 fs/ext4/file.c:870
       do_dentry_open+0x6cc/0x13f0 fs/open.c:920
       do_open fs/namei.c:3560 [inline]
       path_openat+0x1baa/0x2750 fs/namei.c:3715
       do_filp_open+0x1ba/0x410 fs/namei.c:3742
       do_sys_openat2+0x16d/0x4c0 fs/open.c:1348
       do_sys_open fs/open.c:1364 [inline]
       __do_sys_openat fs/open.c:1380 [inline]
       __se_sys_openat fs/open.c:1375 [inline]
       __x64_sys_openat+0x143/0x1f0 fs/open.c:1375
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (sb_internal){.+.+}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3098 [inline]
       check_prevs_add kernel/locking/lockdep.c:3217 [inline]
       validate_chain kernel/locking/lockdep.c:3832 [inline]
       __lock_acquire+0x2ec7/0x5d40 kernel/locking/lockdep.c:5056
       lock_acquire kernel/locking/lockdep.c:5669 [inline]
       lock_acquire+0x1e3/0x670 kernel/locking/lockdep.c:5634
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1477 [inline]
       sb_start_intwrite include/linux/fs.h:1599 [inline]
       ext4_evict_inode+0xfc7/0x1b50 fs/ext4/inode.c:240
       evict+0x2ed/0x6b0 fs/inode.c:665
       iput_final fs/inode.c:1748 [inline]
       iput.part.0+0x59b/0x8a0 fs/inode.c:1774
       iput+0x5c/0x80 fs/inode.c:1764
       dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
       __dentry_kill+0x3c0/0x640 fs/dcache.c:607
       shrink_dentry_list+0x12c/0x4f0 fs/dcache.c:1201
       prune_dcache_sb+0xeb/0x150 fs/dcache.c:1282
       super_cache_scan+0x33a/0x590 fs/super.c:104
       do_shrink_slab+0x464/0xd20 mm/vmscan.c:853
       shrink_slab_memcg mm/vmscan.c:922 [inline]
       shrink_slab+0x388/0x660 mm/vmscan.c:1001
       shrink_one+0x502/0x810 mm/vmscan.c:5343
       shrink_many mm/vmscan.c:5394 [inline]
       lru_gen_shrink_node mm/vmscan.c:5511 [inline]
       shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
       kswapd_shrink_node mm/vmscan.c:7262 [inline]
       balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
       kswapd+0x70b/0x1000 mm/vmscan.c:7712
       kthread+0x2e8/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

other info that might help us debug this:

Chain exists of:
  sb_internal --> &ei->xattr_sem --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&ei->xattr_sem);
                               lock(fs_reclaim);
  lock(sb_internal);

 *** DEADLOCK ***

3 locks held by kswapd0/100:
 #0: ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: set_task_reclaim_state mm/vmscan.c:200 [inline]
 #0: ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x170/0x1ac0 mm/vmscan.c:7338
 #1: ffffffff8c8995d0 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab_memcg mm/vmscan.c:895 [inline]
 #1: ffffffff8c8995d0 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x2a0/0x660 mm/vmscan.c:1001
 #2: ffff888047aea0e0 (&type->s_umount_key#50){++++}-{3:3}, at: trylock_super fs/super.c:414 [inline]
 #2: ffff888047aea0e0 (&type->s_umount_key#50){++++}-{3:3}, at: super_cache_scan+0x70/0x590 fs/super.c:79

stack backtrace:
CPU: 2 PID: 100 Comm: kswapd0 Not tainted 6.2.0-syzkaller-12913-gae3419fbac84 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2178
 check_prev_add kernel/locking/lockdep.c:3098 [inline]
 check_prevs_add kernel/locking/lockdep.c:3217 [inline]
 validate_chain kernel/locking/lockdep.c:3832 [inline]
 __lock_acquire+0x2ec7/0x5d40 kernel/locking/lockdep.c:5056
 lock_acquire kernel/locking/lockdep.c:5669 [inline]
 lock_acquire+0x1e3/0x670 kernel/locking/lockdep.c:5634
 percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
 __sb_start_write include/linux/fs.h:1477 [inline]
 sb_start_intwrite include/linux/fs.h:1599 [inline]
 ext4_evict_inode+0xfc7/0x1b50 fs/ext4/inode.c:240
 evict+0x2ed/0x6b0 fs/inode.c:665
 iput_final fs/inode.c:1748 [inline]
 iput.part.0+0x59b/0x8a0 fs/inode.c:1774
 iput+0x5c/0x80 fs/inode.c:1764
 dentry_unlink_inode+0x2b1/0x460 fs/dcache.c:401
 __dentry_kill+0x3c0/0x640 fs/dcache.c:607
 shrink_dentry_list+0x12c/0x4f0 fs/dcache.c:1201
 prune_dcache_sb+0xeb/0x150 fs/dcache.c:1282
 super_cache_scan+0x33a/0x590 fs/super.c:104
 do_shrink_slab+0x464/0xd20 mm/vmscan.c:853
 shrink_slab_memcg mm/vmscan.c:922 [inline]
 shrink_slab+0x388/0x660 mm/vmscan.c:1001
 shrink_one+0x502/0x810 mm/vmscan.c:5343
 shrink_many mm/vmscan.c:5394 [inline]
 lru_gen_shrink_node mm/vmscan.c:5511 [inline]
 shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
 kswapd_shrink_node mm/vmscan.c:7262 [inline]
 balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
 kswapd+0x70b/0x1000 mm/vmscan.c:7712
 kthread+0x2e8/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [syzbot] [ext4?] possible deadlock in evict (3)
  2023-02-28 17:25 [syzbot] [ext4?] possible deadlock in evict (3) syzbot
@ 2023-03-01  0:01 ` Dave Chinner
  2023-03-01 16:27   ` Ritesh Harjani
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2023-03-01  0:01 UTC (permalink / raw)
  To: syzbot
  Cc: adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

[obvious one for the ext4 people]

On Tue, Feb 28, 2023 at 09:25:55AM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    ae3419fbac84 vc_screen: don't clobber return value in vcs_..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1136fe18c80000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ff98a3b3c1aed3ab
> dashboard link: https://syzkaller.appspot.com/bug?extid=dd426ae4af71f1e74729
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+dd426ae4af71f1e74729@syzkaller.appspotmail.com
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.2.0-syzkaller-12913-gae3419fbac84 #0 Not tainted
> ------------------------------------------------------
> kswapd0/100 is trying to acquire lock:
> ffff888047aea650 (sb_internal){.+.+}-{0:0}, at: evict+0x2ed/0x6b0 fs/inode.c:665
> 
> but task is already holding lock:
> ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: set_task_reclaim_state mm/vmscan.c:200 [inline]
> ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x170/0x1ac0 mm/vmscan.c:7338
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (fs_reclaim){+.+.}-{0:0}:
>        __fs_reclaim_acquire mm/page_alloc.c:4716 [inline]
>        fs_reclaim_acquire+0x11d/0x160 mm/page_alloc.c:4730
>        might_alloc include/linux/sched/mm.h:271 [inline]
>        prepare_alloc_pages+0x159/0x570 mm/page_alloc.c:5362
>        __alloc_pages+0x149/0x5c0 mm/page_alloc.c:5580
>        alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
>        __get_free_pages+0xc/0x40 mm/page_alloc.c:5641
>        kasan_populate_vmalloc_pte mm/kasan/shadow.c:309 [inline]
>        kasan_populate_vmalloc_pte+0x27/0x150 mm/kasan/shadow.c:300
>        apply_to_pte_range mm/memory.c:2578 [inline]
>        apply_to_pmd_range mm/memory.c:2622 [inline]
>        apply_to_pud_range mm/memory.c:2658 [inline]
>        apply_to_p4d_range mm/memory.c:2694 [inline]
>        __apply_to_page_range+0x68c/0x1030 mm/memory.c:2728
>        alloc_vmap_area+0x536/0x1f20 mm/vmalloc.c:1638
>        __get_vm_area_node+0x145/0x3f0 mm/vmalloc.c:2495
>        __vmalloc_node_range+0x250/0x1300 mm/vmalloc.c:3141
>        kvmalloc_node+0x156/0x1a0 mm/util.c:628
>        kvmalloc include/linux/slab.h:737 [inline]
>        ext4_xattr_move_to_block fs/ext4/xattr.c:2570 [inline]

	buffer = kvmalloc(value_size, GFP_NOFS);

Yeah, this doesn't work like the code says it should. The gfp mask
is not passed down to the page table population code and it hard
codes GFP_KERNEL allocations so you have to do:

	memalloc_nofs_save();
	buffer = kvmalloc(value_size, GFP_KERNEL);
	memalloc_nofs_restore();

to apply GFP_NOFS to allocations in the pte population code to avoid
memory reclaim recursion in kvmalloc.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [syzbot] [ext4?] possible deadlock in evict (3)
  2023-03-01  0:01 ` Dave Chinner
@ 2023-03-01 16:27   ` Ritesh Harjani
  2023-03-01 21:55     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Ritesh Harjani @ 2023-03-01 16:27 UTC (permalink / raw)
  To: Dave Chinner, syzbot
  Cc: adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

Dave Chinner <david@fromorbit.com> writes:

> [obvious one for the ext4 people]
>
> On Tue, Feb 28, 2023 at 09:25:55AM -0800, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    ae3419fbac84 vc_screen: don't clobber return value in vcs_..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1136fe18c80000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=ff98a3b3c1aed3ab
>> dashboard link: https://syzkaller.appspot.com/bug?extid=dd426ae4af71f1e74729
>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+dd426ae4af71f1e74729@syzkaller.appspotmail.com
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.2.0-syzkaller-12913-gae3419fbac84 #0 Not tainted
>> ------------------------------------------------------
>> kswapd0/100 is trying to acquire lock:
>> ffff888047aea650 (sb_internal){.+.+}-{0:0}, at: evict+0x2ed/0x6b0 fs/inode.c:665
>>
>> but task is already holding lock:
>> ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: set_task_reclaim_state mm/vmscan.c:200 [inline]
>> ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x170/0x1ac0 mm/vmscan.c:7338
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #3 (fs_reclaim){+.+.}-{0:0}:
>>        __fs_reclaim_acquire mm/page_alloc.c:4716 [inline]
>>        fs_reclaim_acquire+0x11d/0x160 mm/page_alloc.c:4730
>>        might_alloc include/linux/sched/mm.h:271 [inline]
>>        prepare_alloc_pages+0x159/0x570 mm/page_alloc.c:5362
>>        __alloc_pages+0x149/0x5c0 mm/page_alloc.c:5580
>>        alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
>>        __get_free_pages+0xc/0x40 mm/page_alloc.c:5641
>>        kasan_populate_vmalloc_pte mm/kasan/shadow.c:309 [inline]
>>        kasan_populate_vmalloc_pte+0x27/0x150 mm/kasan/shadow.c:300
>>        apply_to_pte_range mm/memory.c:2578 [inline]
>>        apply_to_pmd_range mm/memory.c:2622 [inline]
>>        apply_to_pud_range mm/memory.c:2658 [inline]
>>        apply_to_p4d_range mm/memory.c:2694 [inline]
>>        __apply_to_page_range+0x68c/0x1030 mm/memory.c:2728
>>        alloc_vmap_area+0x536/0x1f20 mm/vmalloc.c:1638
>>        __get_vm_area_node+0x145/0x3f0 mm/vmalloc.c:2495
>>        __vmalloc_node_range+0x250/0x1300 mm/vmalloc.c:3141
>>        kvmalloc_node+0x156/0x1a0 mm/util.c:628
>>        kvmalloc include/linux/slab.h:737 [inline]
>>        ext4_xattr_move_to_block fs/ext4/xattr.c:2570 [inline]
>
> 	buffer = kvmalloc(value_size, GFP_NOFS);
>
> Yeah, this doesn't work like the code says it should. The gfp mask
> is not passed down to the page table population code and it hard
> codes GFP_KERNEL allocations so you have to do:
>
> 	memalloc_nofs_save();
> 	buffer = kvmalloc(value_size, GFP_KERNEL);
> 	memalloc_nofs_restore();
>
> to apply GFP_NOFS to allocations in the pte population code to avoid
> memory reclaim recursion in kvmalloc.

What about this patch mentioned below? Is it the kasan allocations
(kasan_populate_vmalloc()), which hasn't been taken care of in this
patch. Does this means we need kvmalloc fixed instead for kasan allocations?

Though I agree we can have the fix like you mentioned above
(as many of the API users are already doing above). Just wanted to have the
full context of what is going on here.

451769ebb7e792c3404db53b3c2a422990de654e
Author:     Michal Hocko <mhocko@suse.com>

mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc

Patch series "extend vmalloc support for constrained allocations", v2.

Based on a recent discussion with Dave and Neil [1] I have tried to
implement NOFS, NOIO, NOFAIL support for the vmalloc to make life of
kvmalloc users easier.

[1] http://lkml.kernel.org/r/163184741778.29351.16920832234899124642.stgit@noble.brown


Thanks
-ritesh

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [syzbot] [ext4?] possible deadlock in evict (3)
  2023-03-01 16:27   ` Ritesh Harjani
@ 2023-03-01 21:55     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2023-03-01 21:55 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: syzbot, adilger.kernel, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso

On Wed, Mar 01, 2023 at 09:57:58PM +0530, Ritesh Harjani wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > [obvious one for the ext4 people]
> >
> > On Tue, Feb 28, 2023 at 09:25:55AM -0800, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit:    ae3419fbac84 vc_screen: don't clobber return value in vcs_..
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1136fe18c80000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=ff98a3b3c1aed3ab
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=dd426ae4af71f1e74729
> >> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >>
> >> Unfortunately, I don't have any reproducer for this issue yet.
> >>
> >> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >> Reported-by: syzbot+dd426ae4af71f1e74729@syzkaller.appspotmail.com
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 6.2.0-syzkaller-12913-gae3419fbac84 #0 Not tainted
> >> ------------------------------------------------------
> >> kswapd0/100 is trying to acquire lock:
> >> ffff888047aea650 (sb_internal){.+.+}-{0:0}, at: evict+0x2ed/0x6b0 fs/inode.c:665
> >>
> >> but task is already holding lock:
> >> ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: set_task_reclaim_state mm/vmscan.c:200 [inline]
> >> ffffffff8c8e29e0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x170/0x1ac0 mm/vmscan.c:7338
> >>
> >> which lock already depends on the new lock.
> >>
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #3 (fs_reclaim){+.+.}-{0:0}:
> >>        __fs_reclaim_acquire mm/page_alloc.c:4716 [inline]
> >>        fs_reclaim_acquire+0x11d/0x160 mm/page_alloc.c:4730
> >>        might_alloc include/linux/sched/mm.h:271 [inline]
> >>        prepare_alloc_pages+0x159/0x570 mm/page_alloc.c:5362
> >>        __alloc_pages+0x149/0x5c0 mm/page_alloc.c:5580
> >>        alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
> >>        __get_free_pages+0xc/0x40 mm/page_alloc.c:5641
> >>        kasan_populate_vmalloc_pte mm/kasan/shadow.c:309 [inline]
> >>        kasan_populate_vmalloc_pte+0x27/0x150 mm/kasan/shadow.c:300
> >>        apply_to_pte_range mm/memory.c:2578 [inline]
> >>        apply_to_pmd_range mm/memory.c:2622 [inline]
> >>        apply_to_pud_range mm/memory.c:2658 [inline]
> >>        apply_to_p4d_range mm/memory.c:2694 [inline]
> >>        __apply_to_page_range+0x68c/0x1030 mm/memory.c:2728
> >>        alloc_vmap_area+0x536/0x1f20 mm/vmalloc.c:1638
> >>        __get_vm_area_node+0x145/0x3f0 mm/vmalloc.c:2495
> >>        __vmalloc_node_range+0x250/0x1300 mm/vmalloc.c:3141
> >>        kvmalloc_node+0x156/0x1a0 mm/util.c:628
> >>        kvmalloc include/linux/slab.h:737 [inline]
> >>        ext4_xattr_move_to_block fs/ext4/xattr.c:2570 [inline]
> >
> > 	buffer = kvmalloc(value_size, GFP_NOFS);
> >
> > Yeah, this doesn't work like the code says it should. The gfp mask
> > is not passed down to the page table population code and it hard
> > codes GFP_KERNEL allocations so you have to do:
> >
> > 	memalloc_nofs_save();
> > 	buffer = kvmalloc(value_size, GFP_KERNEL);
> > 	memalloc_nofs_restore();
> >
> > to apply GFP_NOFS to allocations in the pte population code to avoid
> > memory reclaim recursion in kvmalloc.
> 
> What about this patch mentioned below? Is it the kasan allocations
> (kasan_populate_vmalloc()), which hasn't been taken care of in this
> patch. Does this means we need kvmalloc fixed instead for kasan allocations?
> 
> Though I agree we can have the fix like you mentioned above
> (as many of the API users are already doing above). Just wanted to have the
> full context of what is going on here.
> 
> 451769ebb7e792c3404db53b3c2a422990de654e
> Author:     Michal Hocko <mhocko@suse.com>
> 
> mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc

Oh, you're right, I didn't realise that had been smacked into the
guts of the vmalloc implementation instead of just being wrapped
around the whole lot of it. That's an exact example of *how not to
use scoped allocation contexts* because it leads to stupid
whack-a-mole bugs like this one.  I made this exact point in the
discussion you pointed to when I noticed an anti-pattern of scoped
contexts were being used to wrap single kmalloc(GFP_KERNEL) calls.

KASAN essentialy requires explicit use of allocation scopes to avoid
spurious lockdep GFP_NOFS/GFP_NOIO allocation context warnings
because most of it's low level tracking allocations are GFP_KERNEL.
If we wrap the whole of kvmalloc() with the correct context,
everything is fine. But using contexts for fined-grained internal
"fix only the specific call-chain" patches like the above commit,
then it just doesn't work.

So in this case, I agree with you that this is a kvmalloc() bug in
that kvmalloc is failing to apply the GFP_NOFS scoped context across
the entire vmalloc operation. As it currently stands, any low level
allocation in the vmalloc path that isn't passed the correct gfp
mask will trigger this sort of warning.

This is one of the reasons we use the scopes extensively in XFS -
whenever we enter a NOFS context, we call memalloc_nofs_save() and
so we always get disconnected low level allocations (like KASAN
does) doing the right thing. Hence we simply haven't noticed how
badly vmalloc() screwed up setting allocation contexts....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-03-01 21:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28 17:25 [syzbot] [ext4?] possible deadlock in evict (3) syzbot
2023-03-01  0:01 ` Dave Chinner
2023-03-01 16:27   ` Ritesh Harjani
2023-03-01 21:55     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.