* [syzbot] KCSAN: data-race in start_this_handle / start_this_handle @ 2021-03-11 10:59 syzbot 2021-03-11 14:25 ` Jan Kara 0 siblings, 1 reply; 8+ messages in thread From: syzbot @ 2021-03-11 10:59 UTC (permalink / raw) To: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso Hello, syzbot found the following issue on: HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000 kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7 dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535 compiler: Debian clang version 11.0.1-2 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com ================================================================== BUG: KCSAN: data-race in start_this_handle / start_this_handle write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1: jbd2_get_transaction fs/jbd2/transaction.c:125 [inline] start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400 jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 call_write_iter include/linux/fs.h:1977 [inline] do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 do_iter_write+0x112/0x4c0 fs/read_write.c:866 vfs_iter_write+0x4c/0x70 fs/read_write.c:907 iter_file_splice_write+0x40a/0x750 fs/splice.c:689 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:936 splice_direct_to_actor+0x345/0x650 fs/splice.c:891 do_splice_direct+0xf5/0x170 fs/splice.c:979 do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 __do_sys_sendfile64 fs/read_write.c:1319 [inline] __se_sys_sendfile64 fs/read_write.c:1311 [inline] __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0: start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352 jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 call_write_iter include/linux/fs.h:1977 [inline] do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 do_iter_write+0x112/0x4c0 fs/read_write.c:866 vfs_iter_write+0x4c/0x70 fs/read_write.c:907 iter_file_splice_write+0x40a/0x750 fs/splice.c:689 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:936 splice_direct_to_actor+0x345/0x650 fs/splice.c:891 do_splice_direct+0xf5/0x170 fs/splice.c:979 do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 __do_sys_sendfile64 fs/read_write.c:1319 [inline] __se_sys_sendfile64 fs/read_write.c:1311 [inline] __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ================================================================== --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-11 10:59 [syzbot] KCSAN: data-race in start_this_handle / start_this_handle syzbot @ 2021-03-11 14:25 ` Jan Kara 2021-03-11 14:53 ` Dmitry Vyukov 0 siblings, 1 reply; 8+ messages in thread From: Jan Kara @ 2021-03-11 14:25 UTC (permalink / raw) To: syzbot; +Cc: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso On Thu 11-03-21 02:59:14, syzbot wrote: > HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7 > dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535 > compiler: Debian clang version 11.0.1-2 > > Unfortunately, I don't have any reproducer for this issue yet. > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com > > ================================================================== > BUG: KCSAN: data-race in start_this_handle / start_this_handle > > write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1: > jbd2_get_transaction fs/jbd2/transaction.c:125 [inline] > start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400 > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 > call_write_iter include/linux/fs.h:1977 [inline] > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 > do_iter_write+0x112/0x4c0 fs/read_write.c:866 > vfs_iter_write+0x4c/0x70 fs/read_write.c:907 > iter_file_splice_write+0x40a/0x750 fs/splice.c:689 > do_splice_from fs/splice.c:767 [inline] > direct_splice_actor+0x80/0xa0 fs/splice.c:936 > splice_direct_to_actor+0x345/0x650 fs/splice.c:891 > do_splice_direct+0xf5/0x170 fs/splice.c:979 > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 > __do_sys_sendfile64 fs/read_write.c:1319 [inline] > __se_sys_sendfile64 fs/read_write.c:1311 [inline] > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0: > start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352 > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 > call_write_iter include/linux/fs.h:1977 [inline] > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 > do_iter_write+0x112/0x4c0 fs/read_write.c:866 > vfs_iter_write+0x4c/0x70 fs/read_write.c:907 > iter_file_splice_write+0x40a/0x750 fs/splice.c:689 > do_splice_from fs/splice.c:767 [inline] > direct_splice_actor+0x80/0xa0 fs/splice.c:936 > splice_direct_to_actor+0x345/0x650 fs/splice.c:891 > do_splice_direct+0xf5/0x170 fs/splice.c:979 > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 > __do_sys_sendfile64 fs/read_write.c:1319 [inline] > __se_sys_sendfile64 fs/read_write.c:1311 [inline] > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > Reported by Kernel Concurrency Sanitizer on: > CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > ================================================================== So this case is harmless. start_this_handle() does indeed check journal->j_running_transaction without any protection and this is only a racy check to opportunistically preallocate a transaction if we are likely to need it. There was some macro to instruct KCSAN that the read is actually fine, wasn't there? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-11 14:25 ` Jan Kara @ 2021-03-11 14:53 ` Dmitry Vyukov 2021-03-11 15:08 ` Marco Elver 0 siblings, 1 reply; 8+ messages in thread From: Dmitry Vyukov @ 2021-03-11 14:53 UTC (permalink / raw) To: Jan Kara, Marco Elver, Tetsuo Handa Cc: syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs, Theodore Ts'o On Thu, Mar 11, 2021 at 3:25 PM Jan Kara <jack@suse.cz> wrote: > > On Thu 11-03-21 02:59:14, syzbot wrote: > > HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7 > > dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535 > > compiler: Debian clang version 11.0.1-2 > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com > > > > ================================================================== > > BUG: KCSAN: data-race in start_this_handle / start_this_handle > > > > write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1: > > jbd2_get_transaction fs/jbd2/transaction.c:125 [inline] > > start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400 > > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 > > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 > > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] > > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 > > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 > > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 > > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 > > call_write_iter include/linux/fs.h:1977 [inline] > > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 > > do_iter_write+0x112/0x4c0 fs/read_write.c:866 > > vfs_iter_write+0x4c/0x70 fs/read_write.c:907 > > iter_file_splice_write+0x40a/0x750 fs/splice.c:689 > > do_splice_from fs/splice.c:767 [inline] > > direct_splice_actor+0x80/0xa0 fs/splice.c:936 > > splice_direct_to_actor+0x345/0x650 fs/splice.c:891 > > do_splice_direct+0xf5/0x170 fs/splice.c:979 > > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 > > __do_sys_sendfile64 fs/read_write.c:1319 [inline] > > __se_sys_sendfile64 fs/read_write.c:1311 [inline] > > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 > > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 > > entry_SYSCALL_64_after_hwframe+0x44/0xae > > > > read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0: > > start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352 > > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 > > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 > > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] > > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 > > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 > > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 > > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 > > call_write_iter include/linux/fs.h:1977 [inline] > > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 > > do_iter_write+0x112/0x4c0 fs/read_write.c:866 > > vfs_iter_write+0x4c/0x70 fs/read_write.c:907 > > iter_file_splice_write+0x40a/0x750 fs/splice.c:689 > > do_splice_from fs/splice.c:767 [inline] > > direct_splice_actor+0x80/0xa0 fs/splice.c:936 > > splice_direct_to_actor+0x345/0x650 fs/splice.c:891 > > do_splice_direct+0xf5/0x170 fs/splice.c:979 > > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 > > __do_sys_sendfile64 fs/read_write.c:1319 [inline] > > __se_sys_sendfile64 fs/read_write.c:1311 [inline] > > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 > > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 > > entry_SYSCALL_64_after_hwframe+0x44/0xae > > > > Reported by Kernel Concurrency Sanitizer on: > > CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > ================================================================== > > So this case is harmless. start_this_handle() does indeed check > journal->j_running_transaction without any protection and this is only a > racy check to opportunistically preallocate a transaction if we are likely > to need it. There was some macro to instruct KCSAN that the read is > actually fine, wasn't there? +Marco +Tetsuo, did you want to fix it? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-11 14:53 ` Dmitry Vyukov @ 2021-03-11 15:08 ` Marco Elver 2021-03-11 15:30 ` Theodore Ts'o 0 siblings, 1 reply; 8+ messages in thread From: Marco Elver @ 2021-03-11 15:08 UTC (permalink / raw) To: Dmitry Vyukov Cc: Jan Kara, Tetsuo Handa, syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs, Theodore Ts'o On Thu, Mar 11, 2021 at 03:53PM +0100, Dmitry Vyukov wrote: > On Thu, Mar 11, 2021 at 3:25 PM Jan Kara <jack@suse.cz> wrote: > > > > On Thu 11-03-21 02:59:14, syzbot wrote: > > > HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535 > > > compiler: Debian clang version 11.0.1-2 > > > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com > > > > > > ================================================================== > > > BUG: KCSAN: data-race in start_this_handle / start_this_handle > > > > > > write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1: > > > jbd2_get_transaction fs/jbd2/transaction.c:125 [inline] > > > start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400 > > > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 > > > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 > > > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] > > > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 > > > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 > > > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 > > > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 > > > call_write_iter include/linux/fs.h:1977 [inline] > > > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 > > > do_iter_write+0x112/0x4c0 fs/read_write.c:866 > > > vfs_iter_write+0x4c/0x70 fs/read_write.c:907 > > > iter_file_splice_write+0x40a/0x750 fs/splice.c:689 > > > do_splice_from fs/splice.c:767 [inline] > > > direct_splice_actor+0x80/0xa0 fs/splice.c:936 > > > splice_direct_to_actor+0x345/0x650 fs/splice.c:891 > > > do_splice_direct+0xf5/0x170 fs/splice.c:979 > > > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 > > > __do_sys_sendfile64 fs/read_write.c:1319 [inline] > > > __se_sys_sendfile64 fs/read_write.c:1311 [inline] > > > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 > > > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 > > > entry_SYSCALL_64_after_hwframe+0x44/0xae > > > > > > read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0: > > > start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352 > > > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503 > > > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105 > > > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline] > > > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998 > > > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575 > > > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269 > > > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502 > > > call_write_iter include/linux/fs.h:1977 [inline] > > > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740 > > > do_iter_write+0x112/0x4c0 fs/read_write.c:866 > > > vfs_iter_write+0x4c/0x70 fs/read_write.c:907 > > > iter_file_splice_write+0x40a/0x750 fs/splice.c:689 > > > do_splice_from fs/splice.c:767 [inline] > > > direct_splice_actor+0x80/0xa0 fs/splice.c:936 > > > splice_direct_to_actor+0x345/0x650 fs/splice.c:891 > > > do_splice_direct+0xf5/0x170 fs/splice.c:979 > > > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260 > > > __do_sys_sendfile64 fs/read_write.c:1319 [inline] > > > __se_sys_sendfile64 fs/read_write.c:1311 [inline] > > > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311 > > > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46 > > > entry_SYSCALL_64_after_hwframe+0x44/0xae > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > ================================================================== > > > > So this case is harmless. start_this_handle() does indeed check > > journal->j_running_transaction without any protection and this is only a > > racy check to opportunistically preallocate a transaction if we are likely > > to need it. There was some macro to instruct KCSAN that the read is > > actually fine, wasn't there? If the outcome of the check does not affect correctness and the code is entirely fault tolerant to the precise value being read, then a data_race(!journal->j_running_transaction) marking here would be fine. If in doubt, this is the latest summary for concurrent access markings: https://lkml.kernel.org/r/20210220051000.GA457@paulmck-ThinkPad-P72 > +Marco > +Tetsuo, did you want to fix it? Thanks, -- Marco ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-11 15:08 ` Marco Elver @ 2021-03-11 15:30 ` Theodore Ts'o 2021-03-11 15:54 ` Marco Elver 0 siblings, 1 reply; 8+ messages in thread From: Theodore Ts'o @ 2021-03-11 15:30 UTC (permalink / raw) To: Marco Elver Cc: Dmitry Vyukov, Jan Kara, Tetsuo Handa, syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote: > If the outcome of the check does not affect correctness and the code is > entirely fault tolerant to the precise value being read, then a > data_race(!journal->j_running_transaction) marking here would be fine. So a very common coding pattern is to check a value w/o the lock, and if it looks like we might need to check *with* a lock, we'll grab the lock and recheck. Does KCSAN understand that this sort of thing is safe automatically? In thie particular case, it's a bit more complicated than that; we're checking a value, and then allocating memory, grabbing the spin lock, and then re-checking the value, so we don't have to drop the spinlock, allocate the memory, grab the lock again, and then rechecking the value. So even if KCSAN catches the simpler case as described above, we still might need to explicitly mark the data_race explicitly. But the more we could have the compiler automatically figure out things without needing an explicit tag, it would seem to me that this would be better, since manual tagging is going to be more error-prone. Cheers, - Ted ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-11 15:30 ` Theodore Ts'o @ 2021-03-11 15:54 ` Marco Elver 2021-03-19 14:15 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Marco Elver @ 2021-03-11 15:54 UTC (permalink / raw) To: Theodore Ts'o Cc: Dmitry Vyukov, Jan Kara, Tetsuo Handa, syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs, Paul E. McKenney +Cc Paul On Thu, 11 Mar 2021 at 16:30, Theodore Ts'o <tytso@mit.edu> wrote: > > On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote: > > If the outcome of the check does not affect correctness and the code is > > entirely fault tolerant to the precise value being read, then a > > data_race(!journal->j_running_transaction) marking here would be fine. > > So a very common coding pattern is to check a value w/o the lock, and > if it looks like we might need to check *with* a lock, we'll grab the > lock and recheck. Does KCSAN understand that this sort of thing is > safe automatically? It doesn't -- these are concurrency patterns that go way beyond the scope of a data race detector. The main problem, as always with such patterns, is that in one case it seems fine, yet in the next it isn't. > In thie particular case, it's a bit more complicated than that; we're > checking a value, and then allocating memory, grabbing the spin lock, > and then re-checking the value, so we don't have to drop the spinlock, > allocate the memory, grab the lock again, and then rechecking the > value. So even if KCSAN catches the simpler case as described above, > we still might need to explicitly mark the data_race explicitly. This seems like a variation of double-checked locking. > But the more we could have the compiler automatically figure out > things without needing an explicit tag, it would seem to me that this > would be better, since manual tagging is going to be more error-prone. What you're alluding to here would go much further than a data race detector ("data race" is still just defined by the memory model). The wish that there was a static analysis tool that would automatically understand the "concurrency semantics as intended by the developer" is something that'd be nice to have, but just doesn't seem realistic. Because how can a tool tell what the developer intended, without input from that developer? If there's worry marking accesses is error-prone, then that might be a signal that the concurrency design is too complex (or the developer hasn't considered all cases). For that reason, we need to mark accesses to tell the compiler and tooling where to expect concurrency, so that 1) the compiler generates correct code, and 2) tooling such as KCSAN can double-check what the developer intended is actually what's happening. Thanks, -- Marco ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-11 15:54 ` Marco Elver @ 2021-03-19 14:15 ` Tetsuo Handa 2021-03-19 17:23 ` Paul E. McKenney 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2021-03-19 14:15 UTC (permalink / raw) To: Marco Elver, Theodore Ts'o Cc: Dmitry Vyukov, syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs, Paul E. McKenney, Jan Kara On 2021/03/12 0:54, Marco Elver wrote: >> But the more we could have the compiler automatically figure out >> things without needing an explicit tag, it would seem to me that this >> would be better, since manual tagging is going to be more error-prone. > > What you're alluding to here would go much further than a data race > detector ("data race" is still just defined by the memory model). The > wish that there was a static analysis tool that would automatically > understand the "concurrency semantics as intended by the developer" is > something that'd be nice to have, but just doesn't seem realistic. > Because how can a tool tell what the developer intended, without input > from that developer? Input from developers is very important for not only compilers and tools but also allowing bug-explorers to understand what is happening. ext4 currently has possible deadlock in start_this_handle (2) https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba which even maintainers cannot understand what is happening. How can bug-explorers know implicit logic which maintainers believe safe and correct? It is possible that some oversight in implicit logic is the cause of "possible deadlock in start_this_handle (2)". Making implicit assumptions clear helps understanding. Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking? syzbot is already waiting for "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 . > > If there's worry marking accesses is error-prone, then that might be a > signal that the concurrency design is too complex (or the developer > hasn't considered all cases). > > For that reason, we need to mark accesses to tell the compiler and > tooling where to expect concurrency, so that 1) the compiler generates > correct code, and 2) tooling such as KCSAN can double-check what the > developer intended is actually what's happening. and 3) bug-explorers can understand what the developers are assuming/missing. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle 2021-03-19 14:15 ` Tetsuo Handa @ 2021-03-19 17:23 ` Paul E. McKenney 0 siblings, 0 replies; 8+ messages in thread From: Paul E. McKenney @ 2021-03-19 17:23 UTC (permalink / raw) To: Tetsuo Handa Cc: Marco Elver, Theodore Ts'o, Dmitry Vyukov, syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs, Jan Kara On Fri, Mar 19, 2021 at 11:15:42PM +0900, Tetsuo Handa wrote: > On 2021/03/12 0:54, Marco Elver wrote: > >> But the more we could have the compiler automatically figure out > >> things without needing an explicit tag, it would seem to me that this > >> would be better, since manual tagging is going to be more error-prone. > > > > What you're alluding to here would go much further than a data race > > detector ("data race" is still just defined by the memory model). The > > wish that there was a static analysis tool that would automatically > > understand the "concurrency semantics as intended by the developer" is > > something that'd be nice to have, but just doesn't seem realistic. > > Because how can a tool tell what the developer intended, without input > > from that developer? > > Input from developers is very important for not only compilers and tools > but also allowing bug-explorers to understand what is happening. > ext4 currently has > > possible deadlock in start_this_handle (2) > https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba > > which even maintainers cannot understand what is happening. > How can bug-explorers know implicit logic which maintainers believe safe and correct? > It is possible that some oversight in implicit logic is the cause of > "possible deadlock in start_this_handle (2)". > Making implicit assumptions clear helps understanding. Just to be clear, the above diagnostic is from lockdep rather than KCSAN. According to the sample crash result, different code paths acquire jdb2_handle and the __fs_reclaim_map in different orders. It looks to me that __fs_reclaim_map isn't really a lock, but rather a mode indicator. If so, lockdep should set it up accordingly, perhaps in a manner similar to rcu_lock_map. > Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking? > syzbot is already waiting for > "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at > https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 . The first thing is to work out what the code should be doing. What KCSAN is saying is that a variable is being locklessly updated. Is it really OK for that variable to be locklessly updated? If not, a larger fix is required. For more information, please see Marco's LWN series: https://lwn.net/Articles/816850/ and https://lwn.net/Articles/816854/ Alternatively, you can refer to the documentation being proposed for the Linux kernel tree: https://lore.kernel.org/lkml/20210304004543.25364-3-paulmck@kernel.org/ > > If there's worry marking accesses is error-prone, then that might be a > > signal that the concurrency design is too complex (or the developer > > hasn't considered all cases). > > > > For that reason, we need to mark accesses to tell the compiler and > > tooling where to expect concurrency, so that 1) the compiler generates > > correct code, and 2) tooling such as KCSAN can double-check what the > > developer intended is actually what's happening. > > and 3) bug-explorers can understand what the developers are assuming/missing. If the above information doesn't help the bug explorers, please let me know. Thanx, Paul ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-03-19 17:24 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-11 10:59 [syzbot] KCSAN: data-race in start_this_handle / start_this_handle syzbot 2021-03-11 14:25 ` Jan Kara 2021-03-11 14:53 ` Dmitry Vyukov 2021-03-11 15:08 ` Marco Elver 2021-03-11 15:30 ` Theodore Ts'o 2021-03-11 15:54 ` Marco Elver 2021-03-19 14:15 ` Tetsuo Handa 2021-03-19 17:23 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).