* [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: syzbot @ 2021-03-11 10:59 UTC
  To: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso

Hello,

syzbot found the following issue on:

HEAD commit:    a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7
dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535
compiler:       Debian clang version 11.0.1-2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com

==================================================================
BUG: KCSAN: data-race in start_this_handle / start_this_handle

write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1:
 jbd2_get_transaction fs/jbd2/transaction.c:125 [inline]
 start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400
 jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
 __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
 __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
 ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
 generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
 ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
 ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
 call_write_iter include/linux/fs.h:1977 [inline]
 do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
 do_iter_write+0x112/0x4c0 fs/read_write.c:866
 vfs_iter_write+0x4c/0x70 fs/read_write.c:907
 iter_file_splice_write+0x40a/0x750 fs/splice.c:689
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:936
 splice_direct_to_actor+0x345/0x650 fs/splice.c:891
 do_splice_direct+0xf5/0x170 fs/splice.c:979
 do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
 __do_sys_sendfile64 fs/read_write.c:1319 [inline]
 __se_sys_sendfile64 fs/read_write.c:1311 [inline]
 __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
 do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0:
 start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352
 jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
 __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
 __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
 ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
 generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
 ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
 ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
 call_write_iter include/linux/fs.h:1977 [inline]
 do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
 do_iter_write+0x112/0x4c0 fs/read_write.c:866
 vfs_iter_write+0x4c/0x70 fs/read_write.c:907
 iter_file_splice_write+0x40a/0x750 fs/splice.c:689
 do_splice_from fs/splice.c:767 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:936
 splice_direct_to_actor+0x345/0x650 fs/splice.c:891
 do_splice_direct+0xf5/0x170 fs/splice.c:979
 do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
 __do_sys_sendfile64 fs/read_write.c:1319 [inline]
 __se_sys_sendfile64 fs/read_write.c:1311 [inline]
 __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
 do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Jan Kara @ 2021-03-11 14:25 UTC
  To: syzbot; +Cc: jack, linux-ext4, linux-kernel, syzkaller-bugs, tytso

On Thu 11-03-21 02:59:14, syzbot wrote:
> HEAD commit:    a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7
> dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535
> compiler:       Debian clang version 11.0.1-2
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com
> 
> [... full KCSAN report snipped; it is quoted verbatim in the message above ...]

So this case is harmless. start_this_handle() does indeed check
journal->j_running_transaction without any protection and this is only a
racy check to opportunistically preallocate a transaction if we are likely
to need it. There was some macro to instruct KCSAN that the read is
actually fine, wasn't there?
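
For illustration, a minimal sketch of what such a marking could look like
around the unlocked pre-check (simplified from the start_this_handle() code
referenced in the report, not an actual patch; the eventual fix might equally
use READ_ONCE() or a different placement):

        /*
         * Unlocked, opportunistic pre-check: preallocate a transaction if
         * none appears to be running, so we do not have to drop
         * j_state_lock later just to allocate.  The value may be stale;
         * the authoritative check happens under the lock below, so the
         * lockless read is wrapped in data_race() to document that for
         * KCSAN.
         */
        if (data_race(!journal->j_running_transaction)) {
                new_transaction = kmem_cache_zalloc(transaction_cache, GFP_NOFS);
                if (!new_transaction)
                        return -ENOMEM;
        }

        read_lock(&journal->j_state_lock);
        /* ... j_running_transaction is re-examined under the lock ... */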

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Dmitry Vyukov @ 2021-03-11 14:53 UTC
  To: Jan Kara, Marco Elver, Tetsuo Handa
  Cc: syzbot, Jan Kara, linux-ext4, LKML, syzkaller-bugs, Theodore Ts'o

On Thu, Mar 11, 2021 at 3:25 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 11-03-21 02:59:14, syzbot wrote:
> > [... quoted syzbot report snipped ...]
>
> So this case is harmless. start_this_handle() does indeed check
> journal->j_running_transaction without any protection and this is only a
> racy check to opportunistically preallocate a transaction if we are likely
> to need it. There was some macro to instruct KCSAN that the read is
> actually fine, wasn't there?

+Marco
+Tetsuo, did you want to fix it?


* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Marco Elver @ 2021-03-11 15:08 UTC
  To: Dmitry Vyukov
  Cc: Jan Kara, Tetsuo Handa, syzbot, Jan Kara, linux-ext4, LKML,
	syzkaller-bugs, Theodore Ts'o

On Thu, Mar 11, 2021 at 03:53PM +0100, Dmitry Vyukov wrote:
> On Thu, Mar 11, 2021 at 3:25 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 11-03-21 02:59:14, syzbot wrote:
> > > [... quoted syzbot report snipped ...]
> >
> > So this case is harmless. start_this_handle() does indeed check
> > journal->j_running_transaction without any protection and this is only a
> > racy check to opportunistically preallocate a transaction if we are likely
> > to need it. There was some macro to instruct KCSAN that the read is
> > actually fine, wasn't there?

If the outcome of the check does not affect correctness and the code is
entirely fault tolerant to the precise value being read, then a
data_race(!journal->j_running_transaction) marking here would be fine.

If in doubt, this is the latest summary for concurrent access markings:
https://lkml.kernel.org/r/20210220051000.GA457@paulmck-ThinkPad-P72

> +Marco
> +Tetsuo, did you want to fix it?

Thanks,
-- Marco


* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Theodore Ts'o @ 2021-03-11 15:30 UTC
  To: Marco Elver
  Cc: Dmitry Vyukov, Jan Kara, Tetsuo Handa, syzbot, Jan Kara,
	linux-ext4, LKML, syzkaller-bugs

On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote:
> If the outcome of the check does not affect correctness and the code is
> entirely fault tolerant to the precise value being read, then a
> data_race(!journal->j_running_transaction) marking here would be fine.

So a very common coding pattern is to check a value w/o the lock, and
if it looks like we might need to check *with* a lock, we'll grab the
lock and recheck.  Does KCSAN understand that this sort of thing is
safe automatically?

In this particular case, it's a bit more complicated than that; we're
checking a value, and then allocating memory, grabbing the spin lock,
and then re-checking the value, so we don't have to drop the spinlock,
allocate the memory, grab the lock again, and then recheck the
value.  So even if KCSAN catches the simpler case as described above,
we still might need to mark the data_race explicitly.
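
For concreteness, a hedged sketch of that check/allocate/lock/re-check shape,
with only the lockless peek marked (generic made-up names such as struct foo
and foo_ensure_thing, not the jbd2 code):

        struct bar {
                int payload;
        };

        struct foo {
                spinlock_t lock;
                struct bar *thing;      /* set at most once, under ->lock */
        };

        static int foo_ensure_thing(struct foo *f)
        {
                struct bar *prealloc = NULL;

        repeat:
                /*
                 * Lockless peek: if ->thing looks absent, allocate before
                 * taking the lock so the common path never has to drop it.
                 * The read may race with a concurrent setter; any answer is
                 * fine because we re-check under the lock.
                 */
                if (!prealloc && data_race(!f->thing)) {
                        prealloc = kzalloc(sizeof(*prealloc), GFP_KERNEL);
                        if (!prealloc)
                                return -ENOMEM;
                }

                spin_lock(&f->lock);
                if (!f->thing) {                /* authoritative re-check */
                        if (!prealloc) {
                                /* Stale peek: drop the lock, allocate, retry. */
                                spin_unlock(&f->lock);
                                goto repeat;
                        }
                        f->thing = prealloc;
                        prealloc = NULL;
                }
                spin_unlock(&f->lock);

                kfree(prealloc);        /* no-op if the allocation was consumed */
                return 0;
        }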

But the more we could have the compiler automatically figure out
things without needing an explicit tag, it would seem to me that this
would be better, since manual tagging is going to be more error-prone.

Cheers,

      	 	       	      	      	       - Ted


* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Marco Elver @ 2021-03-11 15:54 UTC
  To: Theodore Ts'o
  Cc: Dmitry Vyukov, Jan Kara, Tetsuo Handa, syzbot, Jan Kara,
	linux-ext4, LKML, syzkaller-bugs, Paul E. McKenney

+Cc Paul

On Thu, 11 Mar 2021 at 16:30, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote:
> > If the outcome of the check does not affect correctness and the code is
> > entirely fault tolerant to the precise value being read, then a
> > data_race(!journal->j_running_transaction) marking here would be fine.
>
> So a very common coding pattern is to check a value w/o the lock, and
> if it looks like we might need to check *with* a lock, we'll grab the
> lock and recheck.  Does KCSAN understand that this sort of thing is
> safe automatically?

It doesn't -- these are concurrency patterns that go way beyond the
scope of a data race detector. The main problem, as always with such
patterns, is that in one case it seems fine, yet in the next it isn't.

> In this particular case, it's a bit more complicated than that; we're
> checking a value, and then allocating memory, grabbing the spin lock,
> and then re-checking the value, so we don't have to drop the spinlock,
> allocate the memory, grab the lock again, and then recheck the
> value.  So even if KCSAN catches the simpler case as described above,
> we still might need to mark the data_race explicitly.

This seems like a variation of double-checked locking.

> But the more we could have the compiler automatically figure out
> things without needing an explicit tag, it would seem to me that this
> would be better, since manual tagging is going to be more error-prone.

What you're alluding to here would go much further than a data race
detector ("data race" is still just defined by the memory model). The
wish that there was a static analysis tool that would automatically
understand the "concurrency semantics as intended by the developer" is
something that'd be nice to have, but just doesn't seem realistic.
Because how can a tool tell what the developer intended, without input
from that developer?

If there's worry marking accesses is error-prone, then that might be a
signal that the concurrency design is too complex (or the developer
hasn't considered all cases).

For that reason, we need to mark accesses to tell the compiler and
tooling where to expect concurrency, so that 1) the compiler generates
correct code, and 2) tooling such as KCSAN can double-check what the
developer intended is actually what's happening.
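
To make the distinction concrete, a rough sketch of the three possible
treatments of a lockless read of some shared 'flag' (made-up example; the
summary linked earlier in this thread has the precise rules):

        /* Plain read: asserts there is no concurrent writer at all;
         * KCSAN will (rightly) complain if one shows up. */
        if (flag)
                do_something();

        /* Marked read: concurrent writers exist and the value matters;
         * READ_ONCE() stops the compiler from tearing or refetching it. */
        if (READ_ONCE(flag))
                do_something();

        /* Intentionally racy read: concurrent writers exist but any value
         * is acceptable here (e.g. a heuristic pre-check repeated under a
         * lock); data_race() silences KCSAN for this access only. */
        if (data_race(flag))
                do_something();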

Thanks,
-- Marco


* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Tetsuo Handa @ 2021-03-19 14:15 UTC
  To: Marco Elver, Theodore Ts'o
  Cc: Dmitry Vyukov, syzbot, Jan Kara, linux-ext4, LKML,
	syzkaller-bugs, Paul E. McKenney, Jan Kara

On 2021/03/12 0:54, Marco Elver wrote:
>> But the more we could have the compiler automatically figure out
>> things without needing an explicit tag, it would seem to me that this
>> would be better, since manual tagging is going to be more error-prone.
> 
> What you're alluding to here would go much further than a data race
> detector ("data race" is still just defined by the memory model). The
> wish that there was a static analysis tool that would automatically
> understand the "concurrency semantics as intended by the developer" is
> something that'd be nice to have, but just doesn't seem realistic.
> Because how can a tool tell what the developer intended, without input
> from that developer?

Input from developers is very important not only for compilers and tools
but also for allowing bug-explorers to understand what is happening.
ext4 currently has

  possible deadlock in start_this_handle (2)
  https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba

which even the maintainers cannot yet explain.
How can bug-explorers know the implicit logic which maintainers believe is safe and correct?
It is possible that some oversight in that implicit logic is the cause of
"possible deadlock in start_this_handle (2)".
Making implicit assumptions clear helps understanding.

Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking?
syzbot is already waiting for
"KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at
https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 .

> 
> If there's worry marking accesses is error-prone, then that might be a
> signal that the concurrency design is too complex (or the developer
> hasn't considered all cases).
> 
> For that reason, we need to mark accesses to tell the compiler and
> tooling where to expect concurrency, so that 1) the compiler generates
> correct code, and 2) tooling such as KCSAN can double-check what the
> developer intended is actually what's happening.

and 3) bug-explorers can understand what the developers are assuming/missing.



* Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle
From: Paul E. McKenney @ 2021-03-19 17:23 UTC
  To: Tetsuo Handa
  Cc: Marco Elver, Theodore Ts'o, Dmitry Vyukov, syzbot, Jan Kara,
	linux-ext4, LKML, syzkaller-bugs, Jan Kara

On Fri, Mar 19, 2021 at 11:15:42PM +0900, Tetsuo Handa wrote:
> On 2021/03/12 0:54, Marco Elver wrote:
> >> But the more we could have the compiler automatically figure out
> >> things without needing an explicit tag, it would seem to me that this
> >> would be better, since manual tagging is going to be more error-prone.
> > 
> > What you're alluding to here would go much further than a data race
> > detector ("data race" is still just defined by the memory model). The
> > wish that there was a static analysis tool that would automatically
> > understand the "concurrency semantics as intended by the developer" is
> > something that'd be nice to have, but just doesn't seem realistic.
> > Because how can a tool tell what the developer intended, without input
> > from that developer?
> 
> Input from developers is very important not only for compilers and tools
> but also for allowing bug-explorers to understand what is happening.
> ext4 currently has
> 
>   possible deadlock in start_this_handle (2)
>   https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba
> 
> which even the maintainers cannot yet explain.
> How can bug-explorers know the implicit logic which maintainers believe is safe and correct?
> It is possible that some oversight in that implicit logic is the cause of
> "possible deadlock in start_this_handle (2)".
> Making implicit assumptions clear helps understanding.

Just to be clear, the above diagnostic is from lockdep rather than KCSAN.

According to the sample crash result, different code paths acquire
jdb2_handle and the __fs_reclaim_map in different orders.  It looks
to me that __fs_reclaim_map isn't really a lock, but rather a mode
indicator.  If so, lockdep should set it up accordingly, perhaps
in a manner similar to rcu_lock_map.
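
For what it's worth, a rough sketch of how such a "mode indicator" can be
wired up as a lockdep-only map, in the style of rcu_lock_map (illustrative
only; the reclaim_mode_* names are made up, and the real fs_reclaim
annotation differs in its details):

        #include <linux/lockdep.h>

        /* A lockdep-only map: models "we are inside the mode"; it is not
         * a lock that anyone actually blocks on. */
        static struct lockdep_map reclaim_mode_map =
                STATIC_LOCKDEP_MAP_INIT("reclaim_mode", &reclaim_mode_map);

        static inline void reclaim_mode_enter(void)
        {
                /* Recursive-read acquire, as rcu_read_lock() does for
                 * rcu_lock_map, since entering the mode may nest. */
                lock_map_acquire_read(&reclaim_mode_map);
        }

        static inline void reclaim_mode_exit(void)
        {
                lock_map_release(&reclaim_mode_map);
        }

Lockdep then reports when the mode is entered in an order that conflicts with
real locks, which appears to be what the "possible deadlock" report is about.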

> Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking?
> syzbot is already waiting for
> "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at
> https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 .

The first thing is to work out what the code should be doing.  What KCSAN
is saying is that a variable is being locklessly updated.  Is it really
OK for that variable to be locklessly updated?  If not, a larger fix
is required.

For more information, please see Marco's LWN series:
https://lwn.net/Articles/816850/ and https://lwn.net/Articles/816854/

Alternatively, you can refer to the documentation being proposed for
the Linux kernel tree:

https://lore.kernel.org/lkml/20210304004543.25364-3-paulmck@kernel.org/

> > If there's worry marking accesses is error-prone, then that might be a
> > signal that the concurrency design is too complex (or the developer
> > hasn't considered all cases).
> > 
> > For that reason, we need to mark accesses to tell the compiler and
> > tooling where to expect concurrency, so that 1) the compiler generates
> > correct code, and 2) tooling such as KCSAN can double-check what the
> > developer intended is actually what's happening.
> 
> and 3) bug-explorers can understand what the developers are assuming/missing.

If the above information doesn't help the bug explorers, please let me
know.

							Thanx, Paul

