Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* KCSAN: data-race in __alloc_file / __alloc_file
@ 2019-11-08 13:16 syzbot
  2019-11-08 13:28 ` Eric Dumazet
  0 siblings, 1 reply; 45+ messages in thread
From: syzbot @ 2019-11-08 13:16 UTC (permalink / raw)
  To: elver, linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot found the following crash on:

HEAD commit:    05f22368 x86, kcsan: Enable KCSAN for x86
git tree:       https://github.com/google/ktsan.git kcsan
console output: https://syzkaller.appspot.com/x/log.txt?x=10d7fd88e00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
dashboard link: https://syzkaller.appspot.com/bug?extid=3ef049d50587836c0606
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3ef049d50587836c0606@syzkaller.appspotmail.com

==================================================================
BUG: KCSAN: data-race in __alloc_file / __alloc_file

write to 0xffff8880bb157398 of 4 bytes by task 10993 on cpu 0:
  get_cred include/linux/cred.h:253 [inline]
  __alloc_file+0x74/0x210 fs/file_table.c:105
  alloc_empty_file+0x8f/0x180 fs/file_table.c:151
  alloc_file+0x4e/0x2b0 fs/file_table.c:193
  alloc_file_pseudo+0x11c/0x1b0 fs/file_table.c:232
  anon_inode_getfile fs/anon_inodes.c:91 [inline]
  anon_inode_getfile+0x103/0x1d0 fs/anon_inodes.c:74
  __do_sys_perf_event_open+0xd32/0x1ac0 kernel/events/core.c:11100
  __se_sys_perf_event_open kernel/events/core.c:10867 [inline]
  __x64_sys_perf_event_open+0x70/0x90 kernel/events/core.c:10867
  do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

write to 0xffff8880bb157398 of 4 bytes by task 11004 on cpu 1:
  get_cred include/linux/cred.h:253 [inline]
  __alloc_file+0x74/0x210 fs/file_table.c:105
  alloc_empty_file+0x8f/0x180 fs/file_table.c:151
  path_openat+0x74/0x36e0 fs/namei.c:3514
  do_filp_open+0x11e/0x1b0 fs/namei.c:3555
  do_sys_open+0x3b3/0x4f0 fs/open.c:1097
  __do_sys_open fs/open.c:1115 [inline]
  __se_sys_open fs/open.c:1110 [inline]
  __x64_sys_open+0x55/0x70 fs/open.c:1110
  do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 11004 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 13:16 KCSAN: data-race in __alloc_file / __alloc_file syzbot
@ 2019-11-08 13:28 ` Eric Dumazet
  2019-11-08 17:01   ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-08 13:28 UTC (permalink / raw)
  To: syzbot, elver, linux-fsdevel, linux-kernel, syzkaller-bugs, viro,
	Linus Torvalds, Eric Dumazet



On 11/8/19 5:16 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    05f22368 x86, kcsan: Enable KCSAN for x86
> git tree:       https://github.com/google/ktsan.git kcsan
> console output: https://syzkaller.appspot.com/x/log.txt?x=10d7fd88e00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> dashboard link: https://syzkaller.appspot.com/bug?extid=3ef049d50587836c0606
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3ef049d50587836c0606@syzkaller.appspotmail.com
> 
> ==================================================================
> BUG: KCSAN: data-race in __alloc_file / __alloc_file
> 
> write to 0xffff8880bb157398 of 4 bytes by task 10993 on cpu 0:
>  get_cred include/linux/cred.h:253 [inline]
>  __alloc_file+0x74/0x210 fs/file_table.c:105
>  alloc_empty_file+0x8f/0x180 fs/file_table.c:151
>  alloc_file+0x4e/0x2b0 fs/file_table.c:193
>  alloc_file_pseudo+0x11c/0x1b0 fs/file_table.c:232
>  anon_inode_getfile fs/anon_inodes.c:91 [inline]
>  anon_inode_getfile+0x103/0x1d0 fs/anon_inodes.c:74
>  __do_sys_perf_event_open+0xd32/0x1ac0 kernel/events/core.c:11100
>  __se_sys_perf_event_open kernel/events/core.c:10867 [inline]
>  __x64_sys_perf_event_open+0x70/0x90 kernel/events/core.c:10867
>  do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> write to 0xffff8880bb157398 of 4 bytes by task 11004 on cpu 1:
>  get_cred include/linux/cred.h:253 [inline]
>  __alloc_file+0x74/0x210 fs/file_table.c:105
>  alloc_empty_file+0x8f/0x180 fs/file_table.c:151
>  path_openat+0x74/0x36e0 fs/namei.c:3514
>  do_filp_open+0x11e/0x1b0 fs/namei.c:3555
>  do_sys_open+0x3b3/0x4f0 fs/open.c:1097
>  __do_sys_open fs/open.c:1115 [inline]
>  __se_sys_open fs/open.c:1110 [inline]
>  __x64_sys_open+0x55/0x70 fs/open.c:1110
>  do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 PID: 11004 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> ==================================================================
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Linus, what do you think of the following fix ?

I also took the opportunity avoiding dirtying a cache line if this was possible.

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 18639c069263fbe79dfd5a36163c656dca5da220..01b5b7d4e054ddca0df676dc1ceb068e5d71a3f8 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -250,7 +250,14 @@ static inline const struct cred *get_cred(const struct cred *cred)
        if (!cred)
                return cred;
        validate_creds(cred);
-       nonconst_cred->non_rcu = 0;
+
+       /*
+        * Avoid dirtying one cache line. The WRITE_ONCE() also pairs
+        * with itself, since we run without protection of a lock.
+        */
+       if (READ_ONCE(nonconst_cred->non_rcu))
+               WRITE_ONCE(nonconst_cred->non_rcu, 0);
+
        return get_new_cred(nonconst_cred);
 }
 
@@ -262,7 +269,14 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
        if (!atomic_inc_not_zero(&nonconst_cred->usage))
                return NULL;
        validate_creds(cred);
-       nonconst_cred->non_rcu = 0;
+
+       /*
+        * Avoid dirtying one cache line. The WRITE_ONCE() also pairs
+        * with itself, since we run without protection of a lock.
+        */
+       if (READ_ONCE(nonconst_cred->non_rcu))
+               WRITE_ONCE(nonconst_cred->non_rcu, 0);
+
        return cred;
 }
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 13:28 ` Eric Dumazet
@ 2019-11-08 17:01   ` Linus Torvalds
  2019-11-08 17:22     ` Eric Dumazet
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 17:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, elver, linux-fsdevel, Linux Kernel Mailing List,
	syzkaller-bugs, Al Viro, Eric Dumazet

On Fri, Nov 8, 2019 at 5:28 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Linus, what do you think of the following fix ?

I think it's incredibly ugly.

I realize that avoiding the cacheline dirtying might be worth it, but
I'd like to see some indication that it actually matters and helps
from a performance angle. We've already dirtied memory fairly close,
even if it might not share a cacheline (that structure is randomized,
we've touched - or will touch - 'cred->usage') too.

Honestly, I don't think get_cred() is even in a hotpath. Most cred use
just use the current cred that doesn't need the 'get'. So the
optimization looks somewhat questionable - for all we know it just
makes things worse.

I also don't like using a "WRITE_ONCE()" without a reason for it. In
this case, the only "reason" is that KCSAN special-cases that thing.
I'd much rather have some other way to mark it.

So it just looks hacky to me.

I like that people are looking at KCSAN, but I get a very strong
feeling that right now the workarounds for KCSAN false-positives are
incredibly ugly, and not always appropriate.

There is absolutely zero need for a WRITE_ONCE() in this case. The
code would work fine if the compiler did the zero write fifty times,
and re-ordered it wildly. We have a flag that starts out set, and we
clear it.  There's really no "write-once" about it.

               Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:01   ` Linus Torvalds
@ 2019-11-08 17:22     ` Eric Dumazet
  2019-11-08 17:38       ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-08 17:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:01 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Nov 8, 2019 at 5:28 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Linus, what do you think of the following fix ?
>
> I think it's incredibly ugly.
>
> I realize that avoiding the cacheline dirtying might be worth it, but
> I'd like to see some indication that it actually matters and helps
> from a performance angle. We've already dirtied memory fairly close,
> even if it might not share a cacheline (that structure is randomized,
> we've touched - or will touch - 'cred->usage') too.
>
> Honestly, I don't think get_cred() is even in a hotpath. Most cred use
> just use the current cred that doesn't need the 'get'. So the
> optimization looks somewhat questionable - for all we know it just
> makes things worse.
>
> I also don't like using a "WRITE_ONCE()" without a reason for it. In
> this case, the only "reason" is that KCSAN special-cases that thing.
> I'd much rather have some other way to mark it.
>
> So it just looks hacky to me.
>
> I like that people are looking at KCSAN, but I get a very strong
> feeling that right now the workarounds for KCSAN false-positives are
> incredibly ugly, and not always appropriate.
>
> There is absolutely zero need for a WRITE_ONCE() in this case. The
> code would work fine if the compiler did the zero write fifty times,
> and re-ordered it wildly. We have a flag that starts out set, and we
> clear it.  There's really no "write-once" about it.
>

Ok, so what do you suggest next ?

Declare KCSAN useless because too many false positives ?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:22     ` Eric Dumazet
@ 2019-11-08 17:38       ` Linus Torvalds
  2019-11-08 17:53         ` Eric Dumazet
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 17:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:22 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Ok, so what do you suggest next ?
>
> Declare KCSAN useless because too many false positives ?

I'd hope that there is some way to mark the cases we know about where
we just have a flag. I'm not sure what KCSAN uses right now - is it
just the "volatile" that makes KCSAN ignore it, or are there other
ways to do it?

"volatile" has huge problems with code generation for gcc. It would
probably be fine for "not_rcu" in this case, but I'd like to avoid it
in general otherwise, which is why I wonder if there are other
options.

But worst comes to worst, I'd be ok with a WRITE_ONCE() and a comment
about why (and the reason being KCSAN, not the questionable
optimization).

           Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:38       ` Linus Torvalds
@ 2019-11-08 17:53         ` Eric Dumazet
  2019-11-08 17:55           ` Eric Dumazet
  2019-11-08 18:05           ` Linus Torvalds
  0 siblings, 2 replies; 45+ messages in thread
From: Eric Dumazet @ 2019-11-08 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:39 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> I'd hope that there is some way to mark the cases we know about where
> we just have a flag. I'm not sure what KCSAN uses right now - is it
> just the "volatile" that makes KCSAN ignore it, or are there other
> ways to do it?

I dunno, Marco will comment on this.

I personally like WRITE_ONCE() since it adds zero overhead on generated code,
and is the facto accessor we used for many years (before KCSAN was conceived)

>
> "volatile" has huge problems with code generation for gcc. It would
> probably be fine for "not_rcu" in this case, but I'd like to avoid it
> in general otherwise, which is why I wonder if there are other
> options.
>
> But worst comes to worst, I'd be ok with a WRITE_ONCE() and a comment
> about why (and the reason being KCSAN, not the questionable
> optimization).

Ok for a single WRITE_ONCE() with a comment.

Hmm, which questionable optimization are you referring to?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:53         ` Eric Dumazet
@ 2019-11-08 17:55           ` Eric Dumazet
  2019-11-08 18:02             ` Eric Dumazet
  2019-11-08 20:30             ` Linus Torvalds
  2019-11-08 18:05           ` Linus Torvalds
  1 sibling, 2 replies; 45+ messages in thread
From: Eric Dumazet @ 2019-11-08 17:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:53 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Nov 8, 2019 at 9:39 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>
> > I'd hope that there is some way to mark the cases we know about where
> > we just have a flag. I'm not sure what KCSAN uses right now - is it
> > just the "volatile" that makes KCSAN ignore it, or are there other
> > ways to do it?
>
> I dunno, Marco will comment on this.
>
> I personally like WRITE_ONCE() since it adds zero overhead on generated code,
> and is the facto accessor we used for many years (before KCSAN was conceived)
>

BTW, I would love an efficient ADD_ONCE(variable, value)

Using WRITE_ONCE(variable, variable + value) is not good, since it can
not use the
optimized instructions operating directly on memory.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:55           ` Eric Dumazet
@ 2019-11-08 18:02             ` Eric Dumazet
  2019-11-08 18:12               ` Linus Torvalds
  2019-11-08 20:30             ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-08 18:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:55 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Nov 8, 2019 at 9:53 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Nov 8, 2019 at 9:39 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >
> > > I'd hope that there is some way to mark the cases we know about where
> > > we just have a flag. I'm not sure what KCSAN uses right now - is it
> > > just the "volatile" that makes KCSAN ignore it, or are there other
> > > ways to do it?
> >
> > I dunno, Marco will comment on this.
> >
> > I personally like WRITE_ONCE() since it adds zero overhead on generated code,
> > and is the facto accessor we used for many years (before KCSAN was conceived)
> >
>
> BTW, I would love an efficient ADD_ONCE(variable, value)
>
> Using WRITE_ONCE(variable, variable + value) is not good, since it can
> not use the
> optimized instructions operating directly on memory.

Another interesting KCSAN report :

static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
{
        s64 ret = fbc->count;   // data-race ....

        barrier();              /* Prevent reloads of fbc->count */
        if (ret >= 0)
                return ret;
        return 0;
}

How was this code supposed to work at all on 32bit arches ???

Using READ_ONCE(fbc->count) alone will not help.


BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc

read to 0xffff88811dd6298c of 4 bytes by task 10060 on cpu 1:
 dst_entries_get_fast include/net/dst_ops.h:47 [inline]
 ip6_dst_gc+0xf6/0x220 net/ipv6/route.c:3167
 dst_alloc+0x104/0x149 net/core/dst.c:85
 ip6_dst_alloc+0x3d/0x80 net/ipv6/route.c:353
 ip6_rt_cache_alloc+0x8d/0x340 net/ipv6/route.c:1338
 ip6_pol_route+0x4ec/0x5c0 net/ipv6/route.c:2217
 ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
 fib6_rule_lookup+0x95/0x470 net/ipv6/fib6_rules.c:113
 ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
 ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
 ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1052
 ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1153
 rawv6_sendmsg+0x82c/0x21e0 net/ipv6/raw.c:928
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657
 kernel_sendmsg+0x4d/0x70 net/socket.c:677
 sock_no_sendpage+0xda/0x110 net/core/sock.c:2742
 kernel_sendpage+0x7b/0xc0 net/socket.c:3682
 sock_sendpage+0x6c/0x90 net/socket.c:935
 pipe_to_sendpage+0x102/0x180 fs/splice.c:449
 splice_from_pipe_feed fs/splice.c:500 [inline]
 __splice_from_pipe+0x248/0x480 fs/splice.c:624
 splice_from_pipe+0xbb/0x100 fs/splice.c:659
 generic_splice_sendpage+0x45/0x60 fs/splice.c:829
 do_splice_from fs/splice.c:848 [inline]
 direct_splice_actor+0xa0/0xc0 fs/splice.c:1020
 splice_direct_to_actor+0x215/0x510 fs/splice.c:975
 do_splice_direct+0x161/0x1e0 fs/splice.c:1063
 do_sendfile+0x384/0x7f0 fs/read_write.c:1464
 __do_sys_sendfile64 fs/read_write.c:1525 [inline]
 __se_sys_sendfile64 fs/read_write.c:1511 [inline]
 __x64_sys_sendfile64+0x12a/0x140 fs/read_write.c:1511
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:53         ` Eric Dumazet
  2019-11-08 17:55           ` Eric Dumazet
@ 2019-11-08 18:05           ` Linus Torvalds
  2019-11-08 18:15             ` Marco Elver
  1 sibling, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 18:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:53 AM Eric Dumazet <edumazet@google.com> wrote:
>
> I personally like WRITE_ONCE() since it adds zero overhead on generated code,
> and is the facto accessor we used for many years (before KCSAN was conceived)

So I generally prefer WRITE_ONCE() over adding "volatile" to random
data structure members.

Because volatile *does* have potentially absolutely horrendous
overhead on generated code. It just happens to be ok for the simple
case of writing once to a variable.

In fact, you bring that up yourself in your next email when you ask
for "ADD_ONCE()". Exactly because gcc generates absolutely horrendous
garbage for volatiles, for no actual good reason. Gcc *could* generate
a single add-to-memory instruction. But no, that's not at all what gcc
does.

So for the kernel, we've generally had the rule to avoid 'volatile'
data structures as much as humanly possible, because it actually does
something much worse than it could do, and the source code _looks_
simple when the volatile is hidden in the data structures.

Which is why we have READ_ONCE/WRITE_ONCE - it puts the volatile in
the code, and makes it clear not only what is going on, but also the
impact it has on code generation.

But at the same time, I don't love WRITE_ONCE() when it's not actually
about writing once. It might be better to have another way to show
"this variable is a flag that we set to a single value". Even if maybe
the implementation is then the same (ie we use a 'volatile' assignment
to make KCSAN happy).

> Hmm, which questionable optimization are you referring to?

The "avoid dirty cacheline" one by adding a read and a conditional.
Yes, it can be an optimization. And it might not be.

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 18:02             ` Eric Dumazet
@ 2019-11-08 18:12               ` Linus Torvalds
  0 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 18:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 10:02 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Another interesting KCSAN report :
>
> static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
> {
>         s64 ret = fbc->count;   // data-race ....

Yeah, I think that's fundamentally broken on 32-bit. It might need a
sequence lock instead of that raw_spinlock_t to be sanely done
properly on 32-bit.

Or we just admit that 32-bit doesn't really matter any more in the
long run, and that this is not a problem in practice because the
32-bit overflow basically never happens on small machines anyway.

               Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 18:05           ` Linus Torvalds
@ 2019-11-08 18:15             ` Marco Elver
  2019-11-08 18:40               ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Marco Elver @ 2019-11-08 18:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Alan Stern,
	Andrea Parri, Paul E. McKenney,
	LKMM Maintainers -- Akira Yokosawa

On Fri, 8 Nov 2019 at 19:05, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Nov 8, 2019 at 9:53 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > I personally like WRITE_ONCE() since it adds zero overhead on generated code,
> > and is the facto accessor we used for many years (before KCSAN was conceived)
>
> So I generally prefer WRITE_ONCE() over adding "volatile" to random
> data structure members.
>
> Because volatile *does* have potentially absolutely horrendous
> overhead on generated code. It just happens to be ok for the simple
> case of writing once to a variable.
>
> In fact, you bring that up yourself in your next email when you ask
> for "ADD_ONCE()". Exactly because gcc generates absolutely horrendous
> garbage for volatiles, for no actual good reason. Gcc *could* generate
> a single add-to-memory instruction. But no, that's not at all what gcc
> does.
>
> So for the kernel, we've generally had the rule to avoid 'volatile'
> data structures as much as humanly possible, because it actually does
> something much worse than it could do, and the source code _looks_
> simple when the volatile is hidden in the data structures.
>
> Which is why we have READ_ONCE/WRITE_ONCE - it puts the volatile in
> the code, and makes it clear not only what is going on, but also the
> impact it has on code generation.
>
> But at the same time, I don't love WRITE_ONCE() when it's not actually
> about writing once. It might be better to have another way to show
> "this variable is a flag that we set to a single value". Even if maybe
> the implementation is then the same (ie we use a 'volatile' assignment
> to make KCSAN happy).

(+some LKMM folks, in case I missed something on what the LKMM defines
as data race.)

KCSAN does not use volatile to distinguish accesses. Right now
READ_ONCE, WRITE_ONCE, atomic bitops, atomic_t (+ some arch specific
primitives) are treated as marked atomic operations.

The goal is to cover all primitives that the LKMM declares as
marked/atomic. A data race is then detected for concurrent conflicting
accesses where at least one is plain unmarked. In the end the LKMM
should decide what KCSAN determines as a data race. As far as I can
tell, none of the reported data races so far are false positives in
that sense.

Many thanks,
-- Marco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 18:15             ` Marco Elver
@ 2019-11-08 18:40               ` Linus Torvalds
  2019-11-08 19:48                 ` Marco Elver
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 18:40 UTC (permalink / raw)
  To: Marco Elver
  Cc: Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Alan Stern,
	Andrea Parri, Paul E. McKenney,
	LKMM Maintainers -- Akira Yokosawa

On Fri, Nov 8, 2019 at 10:16 AM Marco Elver <elver@google.com> wrote:
>
> KCSAN does not use volatile to distinguish accesses. Right now
> READ_ONCE, WRITE_ONCE, atomic bitops, atomic_t (+ some arch specific
> primitives) are treated as marked atomic operations.

Ok, so we'd have to do this in terms of ATOMIC_WRITE().

One alternative might be KCSAN enhancement, where you notice the
following pattern:

 - a field is initialized before the data structure gets exposed (I
presume KCSAN already must understand about this issue -
initializations are different and not atomic)

 - while the field is live, there are operations that write the same
(let's call it "idempotent") value to the field under certain
circumstances

 - at release time, after all the reference counts are gone, the field
is read for whether that situation happened. I'm assuming KCSAN
already understands about this case too?

So it would only be the "idempotent writes" thing that would be
something KCSAN would have to realize do not involve a race - because
it simply doesn't matter if two writes of the same value race against
each other.

But I guess we could also just do

   #define WRITE_IDEMPOTENT(x,y) WRITE_ONCE(x,y)

and use that in the kernel to annotate these things. And if we have
that kind of annotation, we could then possibly change it to

  #define WRITE_IDEMPOTENT(x,y) \
       if READ_ONCE(x)!=y WRITE_ONCE(x,y)

if we have numbers that that actually helps (that macro written to be
intentionally invalid C - it obviously needs statement protection and
protection against evaluating the arguments multiple times etc).

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 18:40               ` Linus Torvalds
@ 2019-11-08 19:48                 ` Marco Elver
  2019-11-08 20:26                   ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Marco Elver @ 2019-11-08 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Alan Stern,
	Andrea Parri, Paul E. McKenney,
	LKMM Maintainers -- Akira Yokosawa

On Fri, 8 Nov 2019 at 19:40, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Nov 8, 2019 at 10:16 AM Marco Elver <elver@google.com> wrote:
> >
> > KCSAN does not use volatile to distinguish accesses. Right now
> > READ_ONCE, WRITE_ONCE, atomic bitops, atomic_t (+ some arch specific
> > primitives) are treated as marked atomic operations.
>
> Ok, so we'd have to do this in terms of ATOMIC_WRITE().
>
> One alternative might be KCSAN enhancement, where you notice the
> following pattern:
>
>  - a field is initialized before the data structure gets exposed (I
> presume KCSAN already must understand about this issue -
> initializations are different and not atomic)
>
>  - while the field is live, there are operations that write the same
> (let's call it "idempotent") value to the field under certain
> circumstances
>
>  - at release time, after all the reference counts are gone, the field
> is read for whether that situation happened. I'm assuming KCSAN
> already understands about this case too?

It's not explicitly aware of initialization or release. We rely on
compiler instrumentation for all memory accesses; KCSAN then sets up
"watchpoints" for sampled memory accesses, delaying execution, and
checking if a concurrent access is observed.

We already have an option (currently disabled on syzbot) where KCSAN
infers a data race not because another instrumented accesses happened
concurrently, but because the data value changed during a watchpoint's
lifetime (e.g. due to uninstrumented write, device write etc.).

This same approach could be used to ignore "idempotent writes" where
we would otherwise report a data race; i.e. if there was a concurrent
write, but the data value did not change, do not report the race. I'm
happy to add this feature if this should always be ignored.

> So it would only be the "idempotent writes" thing that would be
> something KCSAN would have to realize do not involve a race - because
> it simply doesn't matter if two writes of the same value race against
> each other.
>
> But I guess we could also just do
>
>    #define WRITE_IDEMPOTENT(x,y) WRITE_ONCE(x,y)
>
> and use that in the kernel to annotate these things. And if we have
> that kind of annotation, we could then possibly change it to
>
>   #define WRITE_IDEMPOTENT(x,y) \
>        if READ_ONCE(x)!=y WRITE_ONCE(x,y)
>
> if we have numbers that that actually helps (that macro written to be
> intentionally invalid C - it obviously needs statement protection and
> protection against evaluating the arguments multiple times etc).
>
>                 Linus

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 19:48                 ` Marco Elver
@ 2019-11-08 20:26                   ` Linus Torvalds
  2019-11-08 21:57                     ` Alan Stern
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 20:26 UTC (permalink / raw)
  To: Marco Elver
  Cc: Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Alan Stern,
	Andrea Parri, Paul E. McKenney,
	LKMM Maintainers -- Akira Yokosawa

On Fri, Nov 8, 2019 at 11:48 AM Marco Elver <elver@google.com> wrote:
>
> It's not explicitly aware of initialization or release. We rely on
> compiler instrumentation for all memory accesses; KCSAN then sets up
> "watchpoints" for sampled memory accesses, delaying execution, and
> checking if a concurrent access is observed.

Ok.

> This same approach could be used to ignore "idempotent writes" where
> we would otherwise report a data race; i.e. if there was a concurrent
> write, but the data value did not change, do not report the race. I'm
> happy to add this feature if this should always be ignored.

Hmm. I don't think it's valid in general, but it might be useful
enough in practice, at least as an option to lower the false
positives.

               Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 17:55           ` Eric Dumazet
  2019-11-08 18:02             ` Eric Dumazet
@ 2019-11-08 20:30             ` Linus Torvalds
  2019-11-08 20:53               ` Eric Dumazet
  1 sibling, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 20:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 9:56 AM Eric Dumazet <edumazet@google.com> wrote:
>
> BTW, I would love an efficient ADD_ONCE(variable, value)
>
> Using WRITE_ONCE(variable, variable + value) is not good, since it can
> not use the optimized instructions operating directly on memory.

So I'm having a hard time seeing how this could possibly ever be valid.

Is this a "writer is locked, readers are unlocked" case or something?

Because we don't really have any sane way to do that any more
efficiently, unless we'd have to add new architecture-specific
functions for it (like we do have fo the percpu ops).

Anyway, if you have a really hot case you care about, maybe you could
convince the gcc people to just add it as a peephole optimization?
Right now, gcc ends up doing some strange things with volatiles, and
basically disables a lot of stuff over them. But with a test-case,
maybe you can convince somebody that certain optimizations are still
fine. A "read+add+write" really does the exact same accesses as an
add-to-memory instruction, but gcc has some logic to disable that
instruction fusion.

          Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 20:30             ` Linus Torvalds
@ 2019-11-08 20:53               ` Eric Dumazet
  2019-11-08 21:36                 ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-08 20:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 12:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Nov 8, 2019 at 9:56 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > BTW, I would love an efficient ADD_ONCE(variable, value)
> >
> > Using WRITE_ONCE(variable, variable + value) is not good, since it can
> > not use the optimized instructions operating directly on memory.
>
> So I'm having a hard time seeing how this could possibly ever be valid.
>
> Is this a "writer is locked, readers are unlocked" case or something?

per cpu SNMP counters mostly, with no IRQ safety requirements.

Note that this could be implemented using local{64}_add() on arches like x86_64,
while others might have to fallback to WRITE_ONCE(variable, variable + add)

>
> Because we don't really have any sane way to do that any more
> efficiently, unless we'd have to add new architecture-specific
> functions for it (like we do have fo the percpu ops).
>
> Anyway, if you have a really hot case you care about, maybe you could
> convince the gcc people to just add it as a peephole optimization?
> Right now, gcc ends up doing some strange things with volatiles, and
> basically disables a lot of stuff over them. But with a test-case,
> maybe you can convince somebody that certain optimizations are still
> fine. A "read+add+write" really does the exact same accesses as an
> add-to-memory instruction, but gcc has some logic to disable that
> instruction fusion.
>
>           Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 20:53               ` Eric Dumazet
@ 2019-11-08 21:36                 ` Linus Torvalds
  0 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 21:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, syzbot, Marco Elver, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro

On Fri, Nov 8, 2019 at 12:53 PM Eric Dumazet <edumazet@google.com> wrote:
>
> per cpu SNMP counters mostly, with no IRQ safety requirements.
>
> Note that this could be implemented using local{64}_add() on arches like x86_64,
> while others might have to fallback to WRITE_ONCE(variable, variable + add)

raw_cpu_add()?

We already use those for vm_counters where we intentionally accept races.

              Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 20:26                   ` Linus Torvalds
@ 2019-11-08 21:57                     ` Alan Stern
  2019-11-08 22:06                       ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Stern @ 2019-11-08 21:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Fri, 8 Nov 2019, Linus Torvalds wrote:

> On Fri, Nov 8, 2019 at 11:48 AM Marco Elver <elver@google.com> wrote:
> >
> > It's not explicitly aware of initialization or release. We rely on
> > compiler instrumentation for all memory accesses; KCSAN then sets up
> > "watchpoints" for sampled memory accesses, delaying execution, and
> > checking if a concurrent access is observed.
> 
> Ok.
> 
> > This same approach could be used to ignore "idempotent writes" where
> > we would otherwise report a data race; i.e. if there was a concurrent
> > write, but the data value did not change, do not report the race. I'm
> > happy to add this feature if this should always be ignored.
> 
> Hmm. I don't think it's valid in general, but it might be useful
> enough in practice, at least as an option to lower the false
> positives.

Minor point...

Can we please agree to call these writes something other than 
"idempotent"?  After all, any write to normal memory is idempotent in 
the sense that doing it twice has the same effect as doing it once 
(ignoring possible races, of course).

A better name would be "write-if-different" or "write-if-changed" (and
I bet people can come up with something even better if they try).  
This at least gets across the main idea, and using

	WRITE_IF_CHANGED(x, y);

to mean

	if (READ_ONCE(x) != y) WRITE_ONCE(x, y);

is a lot clearer than using WRITE_IDEMPOTENT(x, y).

Alan Stern


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 21:57                     ` Alan Stern
@ 2019-11-08 22:06                       ` Linus Torvalds
  2019-11-09 23:08                         ` Alan Stern
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-08 22:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Fri, Nov 8, 2019 at 1:57 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Can we please agree to call these writes something other than
> "idempotent"?  After all, any write to normal memory is idempotent in
> the sense that doing it twice has the same effect as doing it once
> (ignoring possible races, of course).

No!

You're completely missing the point.

Two writes to normal memory are *not* idempotent if they write
different values. The ordering very much matters, and it's racy and a
tool should complain.

But the point of WRITE_IDEMPOTENT() is that it *always* writes the
same value, so it doesn't matter if two different writers race on it.

So it really is about being idempotent.

> A better name would be "write-if-different" or "write-if-changed"

No.

Again,  you're totally missing the point.

It's not about "write-if-different".

It's about idempotent writes.

But if you do know that all the possible racing writes are idempotent,
then AS A POSSIBLE CACHE OPTIMIZATION, you can then say "only do this
write if somebody else didn't already do it".

But that's a side effect of being idempotent, not the basic rule! And
it's not necessarily obviously an optimization at all, because the
cacheline may already be dirty, and checking the old value and having
a conditional on it may be much more expensive than just writing the
new value./

The point is that certain writes DO NOT CARE ABOUT ORDERING, because
they may be setting a sticky flag (or stickily clearing a flag, as in
this case).  The ordering doesn't matter, because the operation is
idempotent.

That's what "idempotent" means. You can do it once, or a hundred
times, and the end result is the same (but is different from not doing
it at all).

And no, not all writes are idempotent. That's just crazy talk. Writes
have values.

               Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-08 22:06                       ` Linus Torvalds
@ 2019-11-09 23:08                         ` Alan Stern
  0 siblings, 0 replies; 45+ messages in thread
From: Alan Stern @ 2019-11-09 23:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Fri, 8 Nov 2019, Linus Torvalds wrote:

> On Fri, Nov 8, 2019 at 1:57 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Can we please agree to call these writes something other than
> > "idempotent"?  After all, any write to normal memory is idempotent in
> > the sense that doing it twice has the same effect as doing it once
> > (ignoring possible races, of course).
> 
> No!
> 
> You're completely missing the point.
> 
> Two writes to normal memory are *not* idempotent if they write
> different values. The ordering very much matters, and it's racy and a
> tool should complain.

What you have written strongly indicates that either you think the word
"idempotent" means something different from its actual meaning or else
you are misusing the word in a very confusing way.

Your text seems to say that two operations are idempotent if they have 
the same effect.  Compare that to the definition from Wikipedia (not 
the best in the world, perhaps, but adequate for this discussion):

	Idempotence is the property of certain operations in
	mathematics and computer science whereby they can be applied
	multiple times without changing the result beyond the initial
	application.

For example, writing 5 to x is idempotent because performing that write
multiple times will have the same result as performing it only once.
Likewise for any write to normal memory.

Therefore talking about idempotent writes as somehow being distinct 
from other writes does not make sense.  Nor does it make sense to say 
"Two writes to normal memory are *not* idempotent if they write
different values", because idempotence is a property of individual 
operations, not of pairs of operations.

You should use a different word, because what you mean is different 
from what "idempotent" actually means.

> But the point of WRITE_IDEMPOTENT() is that it *always* writes the
> same value, so it doesn't matter if two different writers race on it.
> 
> So it really is about being idempotent.

Okay, I agree that I did completely miss the point of what you
originally meant.  But what you meant wasn't "being idempotent".

> > A better name would be "write-if-different" or "write-if-changed"
> 
> No.
> 
> Again,  you're totally missing the point.
> 
> It's not about "write-if-different".
> 
> It's about idempotent writes.

Assuming you really are talking about writes (presumably in different
threads) that store the same value to the same location: Yes, two such
writes do not in practice race with each other -- even though they may
satisfy the formal definition of a data race.

On the other hand, they may each race with other accesses.

> But if you do know that all the possible racing writes are idempotent,
> then AS A POSSIBLE CACHE OPTIMIZATION, you can then say "only do this
> write if somebody else didn't already do it".

In fact, you can _always_ perform that possible optimization.  Whether
it will be worthwhile is a separate matter...

> But that's a side effect of being idempotent, not the basic rule! And
> it's not necessarily obviously an optimization at all, because the
> cacheline may already be dirty, and checking the old value and having
> a conditional on it may be much more expensive than just writing the
> new value./

Agreed.

> The point is that certain writes DO NOT CARE ABOUT ORDERING, because
> they may be setting a sticky flag (or stickily clearing a flag, as in
> this case).  The ordering doesn't matter, because the operation is
> idempotent.

Ah, here you use the word correctly.

> That's what "idempotent" means. You can do it once, or a hundred
> times, and the end result is the same (but is different from not doing
> it at all).

Precisely the point I make above.

> And no, not all writes are idempotent. That's just crazy talk. Writes
> have values.

By "writes have values", do you mean that writes store values?  Of
course they do.  But it is clear from what you wrote just above that
all writes _are_ idempotent, because doing a write once or a hundred
times will yield the same end result.

On a more productive note, do you think we should change the LKMM so 
that it won't consider two writes to race with each other if they store 
the same value?

Alan Stern


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 19:00                 ` Linus Torvalds
  2019-11-11 19:13                   ` Eric Dumazet
@ 2019-11-11 23:51                   ` Linus Torvalds
  1 sibling, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 23:51 UTC (permalink / raw)
  To: Eric Dumazet, Al Viro, Kirill Smelkov
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

[-- Attachment #1: Type: text/plain, Size: 1276 bytes --]

On Mon, Nov 11, 2019 at 11:00 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> > if (ppos) {
> >      pos = *ppos; // data-race
>
> That code uses "fdget_pos().
>
> Which does mutual exclusion _if_ the file is something we care about
> pos for, and if it has more than one process using it.

That said, the more I look at that code, the less I like it.

I have this feeling we really should get rid of FMODE_ATOMIC_POS
entirely, now that we have the much nicer FMODE_STREAM to indicate
that 'pos' really doesn't matter.

Also, the test for "file_count(file) > 1" really is wrong, in that it
means that we protect against other processes, but not other threads.

So maybe we really should do the attached thing. Adding Al and Kirill
to the cc for comments. Kirill did some fairly in-depth review of the
whole locking on f_pos, it might be good to get his comments.

Al? Note the change from

-               if (file_count(file) > 1) {
+               if ((v & FDPUT_FPUT) || file_count(file) > 1) {

in __fdget_pos(). It basically says that the threaded case also does
the pos locking.

NOTE! This is entirely untested. It might be totally broken. It passes
my "LooksSuperficiallyFine(tm)" test, but that's all I'm going to say
about the patch.

            Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1928 bytes --]

 fs/file.c          | 4 ++--
 fs/open.c          | 6 +-----
 include/linux/fs.h | 2 --
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 3da91a112bab..708e5c2b7d65 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -795,8 +795,8 @@ unsigned long __fdget_pos(unsigned int fd)
 	unsigned long v = __fdget(fd);
 	struct file *file = (struct file *)(v & ~3);
 
-	if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
-		if (file_count(file) > 1) {
+	if (file && !(file->f_mode & FMODE_STREAM)) {
+		if ((v & FDPUT_FPUT) || file_count(file) > 1) {
 			v |= FDPUT_POS_UNLOCK;
 			mutex_lock(&file->f_pos_lock);
 		}
diff --git a/fs/open.c b/fs/open.c
index b62f5c0923a8..5c68282ea79e 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -771,10 +771,6 @@ static int do_dentry_open(struct file *f,
 		f->f_mode |= FMODE_WRITER;
 	}
 
-	/* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */
-	if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
-		f->f_mode |= FMODE_ATOMIC_POS;
-
 	f->f_op = fops_get(inode->i_fop);
 	if (WARN_ON(!f->f_op)) {
 		error = -ENODEV;
@@ -1256,7 +1252,7 @@ EXPORT_SYMBOL(nonseekable_open);
  */
 int stream_open(struct inode *inode, struct file *filp)
 {
-	filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE | FMODE_ATOMIC_POS);
+	filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
 	filp->f_mode |= FMODE_STREAM;
 	return 0;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e0d909d35763..a7c3f6dd5701 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -148,8 +148,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File is opened with O_PATH; almost nothing can be done with it */
 #define FMODE_PATH		((__force fmode_t)0x4000)
 
-/* File needs atomic accesses to f_pos */
-#define FMODE_ATOMIC_POS	((__force fmode_t)0x8000)
 /* Write access to underlying fs */
 #define FMODE_WRITER		((__force fmode_t)0x10000)
 /* Has read method(s) */

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 20:46                       ` Linus Torvalds
@ 2019-11-11 21:53                         ` Eric Dumazet
  0 siblings, 0 replies; 45+ messages in thread
From: Eric Dumazet @ 2019-11-11 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 12:47 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Nov 11, 2019 at 12:43 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Yeah, maybe we could have some model for marking "this is statistics,
> > doesn't need to be exact".
>
> Side note: that marking MUST NOT be "READ_ONCE + WRITE_ONCE", because
> that makes gcc create horrible code, and only makes the race worse.
>
> At least with a regular add, it might stay as a single r-m-w
> instruction on architectures that have that, and makes the quality of
> the statistics slightly better (no preemption etc).
>
> So that's an excellent example of where changing code to use
> WRITE_ONCE actually makes the code objectively worse in practice -
> even if it might be the same in theory.

Yes, I believe that was the rationale of the ADD_ONCE() thing I
mentioned earlier.

I do not believe we have a solution right now ?

We have similar non atomic increments in some virtual network drivers
doing "dev->stats.tx_errors++;"  in their error path.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 20:43                     ` Linus Torvalds
@ 2019-11-11 20:46                       ` Linus Torvalds
  2019-11-11 21:53                         ` Eric Dumazet
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 20:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 12:43 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yeah, maybe we could have some model for marking "this is statistics,
> doesn't need to be exact".

Side note: that marking MUST NOT be "READ_ONCE + WRITE_ONCE", because
that makes gcc create horrible code, and only makes the race worse.

At least with a regular add, it might stay as a single r-m-w
instruction on architectures that have that, and makes the quality of
the statistics slightly better (no preemption etc).

So that's an excellent example of where changing code to use
WRITE_ONCE actually makes the code objectively worse in practice -
even if it might be the same in theory.

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 19:13                   ` Eric Dumazet
@ 2019-11-11 20:43                     ` Linus Torvalds
  2019-11-11 20:46                       ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 20:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 11:13 AM Eric Dumazet <edumazet@google.com> wrote:
>
> What about this other one, it looks like multiple threads can
> manipulate tsk->min_flt++; at the same time  in faultin_page()

Yeah, maybe we could have some model for marking "this is statistics,
doesn't need to be exact".

> Should we not care, or should we mirror min_flt with a second
> atomic_long_t, or simply convert min_flt to atomic_long_t ?

Definitely not make it atomic. Those are expensive, and there's no point.

                    Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 19:00                 ` Linus Torvalds
@ 2019-11-11 19:13                   ` Eric Dumazet
  2019-11-11 20:43                     ` Linus Torvalds
  2019-11-11 23:51                   ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-11 19:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 11:01 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Nov 11, 2019 at 10:44 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > An interesting case is the race in ksys_write()
>
> Not really.
>
> > if (ppos) {
> >      pos = *ppos; // data-race
>
> That code uses "fdget_pos().
>
> Which does mutual exclusion _if_ the file is something we care about
> pos for, and if it has more than one process using it.
>
> Basically the rule there is that we don't care about the data race in
> certain circumstances. We don't care about non-regular files, for
> example, because those are what POSIX gives guarantees for.
>
> (We have since moved towards FMODE_STREAM handling instead of the
> older FMODE_ATOMIC_POS which does this better, and it's possible we
> should get rid of the FMODE_ATOMIC_POS behavior in favor of
> FMODE_STREAM entirely)
>
> Again, that's pretty hard to tell something like KCSAN.

Well, this is hard to explain to humans... Probably less than 10 on
this planet could tell that.

What about this other one, it looks like multiple threads can
manipulate tsk->min_flt++; at the same time  in faultin_page()

Should we not care, or should we mirror min_flt with a second
atomic_long_t, or simply convert min_flt to atomic_long_t ?

BUG: KCSAN: data-race in __get_user_pages / __get_user_pages

read to 0xffff8880b0b8f650 of 8 bytes by task 11553 on cpu 1:
 faultin_page mm/gup.c:653 [inline]
 __get_user_pages+0x78f/0x1160 mm/gup.c:845
 __get_user_pages_locked mm/gup.c:1023 [inline]
 get_user_pages_remote+0x206/0x3e0 mm/gup.c:1163
 process_vm_rw_single_vec mm/process_vm_access.c:109 [inline]
 process_vm_rw_core.isra.0+0x3a4/0x8c0 mm/process_vm_access.c:216
 process_vm_rw+0x1c4/0x1e0 mm/process_vm_access.c:284
 __do_sys_process_vm_writev mm/process_vm_access.c:306 [inline]
 __se_sys_process_vm_writev mm/process_vm_access.c:301 [inline]
 __x64_sys_process_vm_writev+0x8b/0xb0 mm/process_vm_access.c:301
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

write to 0xffff8880b0b8f650 of 8 bytes by task 11531 on cpu 0:
 faultin_page mm/gup.c:653 [inline]
 __get_user_pages+0x7b1/0x1160 mm/gup.c:845
 __get_user_pages_locked mm/gup.c:1023 [inline]
 get_user_pages_remote+0x206/0x3e0 mm/gup.c:1163
 process_vm_rw_single_vec mm/process_vm_access.c:109 [inline]
 process_vm_rw_core.isra.0+0x3a4/0x8c0 mm/process_vm_access.c:216
 process_vm_rw+0x1c4/0x1e0 mm/process_vm_access.c:284
 __do_sys_process_vm_writev mm/process_vm_access.c:306 [inline]
 __se_sys_process_vm_writev mm/process_vm_access.c:301 [inline]
 __x64_sys_process_vm_writev+0x8b/0xb0 mm/process_vm_access.c:301
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11531 Comm: syz-executor.4 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 18:44               ` Eric Dumazet
@ 2019-11-11 19:00                 ` Linus Torvalds
  2019-11-11 19:13                   ` Eric Dumazet
  2019-11-11 23:51                   ` Linus Torvalds
  0 siblings, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 19:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 10:44 AM Eric Dumazet <edumazet@google.com> wrote:
>
> An interesting case is the race in ksys_write()

Not really.

> if (ppos) {
>      pos = *ppos; // data-race

That code uses "fdget_pos().

Which does mutual exclusion _if_ the file is something we care about
pos for, and if it has more than one process using it.

Basically the rule there is that we don't care about the data race in
certain circumstances. We don't care about non-regular files, for
example, because those are what POSIX gives guarantees for.

(We have since moved towards FMODE_STREAM handling instead of the
older FMODE_ATOMIC_POS which does this better, and it's possible we
should get rid of the FMODE_ATOMIC_POS behavior in favor of
FMODE_STREAM entirely)

Again, that's pretty hard to tell something like KCSAN.

Of course, it's then questionable whether our rules for not caring are
necessarily the _right_ rules for not caring. For example, if you have
threads, the "more than one process opening it" doesn't trigger. It's
literally just atomicity across processes that we guarantee. That's
certainly a bit questionable. But that's a higher-level decision.

               Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 18:50               ` Linus Torvalds
  2019-11-11 18:59                 ` Marco Elver
@ 2019-11-11 18:59                 ` Eric Dumazet
  1 sibling, 0 replies; 45+ messages in thread
From: Eric Dumazet @ 2019-11-11 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 10:50 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> Yeah, if it hasn't shown any real bugs so far, that just strengthens
> the "it needs much fewer false positives to be useful".
>

It definitely has shown real bugs, some of them being still addressed
(not yet fixed)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 18:50               ` Linus Torvalds
@ 2019-11-11 18:59                 ` Marco Elver
  2019-11-11 18:59                 ` Eric Dumazet
  1 sibling, 0 replies; 45+ messages in thread
From: Marco Elver @ 2019-11-11 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, Alan Stern, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, 11 Nov 2019 at 19:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Nov 11, 2019 at 10:31 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > Problem is that KASAN/KCSAN stops as soon as one issue is hit,
> > regardless of it being a false positive or not.
>
> So mayb e that - together with the known huge base of false positives
> - just means that KCSAN needs some more work before it can be used as
> a basis for sending out patches.
>
> Maybe the reporting needs to create a hash of the location, and report
> once per location? Or something like that.
>
> Maybe KCSAN needs a way to filter out known false positives on a KCSAN
> side, without having to change the source for a tool that gives too
> much noise?
>
> > If we do not annotate the false positive, the real issues might be
> > hidden for years.
>
> I don't think "change the kernel source for a tool that isn't good
> enough" is the solution.
>
> > There is no pattern really, only a lot of noise (small ' bugs'  that
> > have no real impact)
>
> Yeah, if it hasn't shown any real bugs so far, that just strengthens
> the "it needs much fewer false positives to be useful".
>
> KASAN and lockdep can afford to stop after the first problem, because
> the problems they report - and the additional annotations you might
> want to add - are quality problems and annotations.

There is a fundamental limitation here. As I understand, in an ideal
world we'd only find true logic bugs, *race conditions*, first, and
*data races* later. Some data races are also race conditions, which a
tool like KCSAN can report. However, not all race conditions are data
races and vice-versa.

The intent is to report data races according to the LKMM. KCSAN
currently does that. On syzbot, we do not even report all data races
because we use a very conservative config, to filter things like data
races between plain and ONCE/atomic accesses, etc. A data race
detector can only work at the memory model/language level.

Deeper analysis, to find only race conditions, requires conveying the
intended logic and patterns to a tool. This requires 1) the developer
either writing a spec or model of their code, and then 2) the tool
verifying that the implementation matches.  People have done this for
small bits of code using model checkers (and other formal methods),
but this just doesn't scale!

KCSAN finds real problems today. Maybe not all of them are race
conditions, but they are data races. We already have several options
to filter data races, and will keep adding more. However, a tool that
magically proves that there are no concurrency related logic bugs is
an entirely different beast.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 18:31             ` Eric Dumazet
  2019-11-11 18:44               ` Eric Dumazet
@ 2019-11-11 18:50               ` Linus Torvalds
  2019-11-11 18:59                 ` Marco Elver
  2019-11-11 18:59                 ` Eric Dumazet
  1 sibling, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 18:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 10:31 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Problem is that KASAN/KCSAN stops as soon as one issue is hit,
> regardless of it being a false positive or not.

So mayb e that - together with the known huge base of false positives
- just means that KCSAN needs some more work before it can be used as
a basis for sending out patches.

Maybe the reporting needs to create a hash of the location, and report
once per location? Or something like that.

Maybe KCSAN needs a way to filter out known false positives on a KCSAN
side, without having to change the source for a tool that gives too
much noise?

> If we do not annotate the false positive, the real issues might be
> hidden for years.

I don't think "change the kernel source for a tool that isn't good
enough" is the solution.

> There is no pattern really, only a lot of noise (small ' bugs'  that
> have no real impact)

Yeah, if it hasn't shown any real bugs so far, that just strengthens
the "it needs much fewer false positives to be useful".

KASAN and lockdep can afford to stop after the first problem, because
the problems they report - and the additional annotations you might
want to add - are quality problems and annotations.

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 18:31             ` Eric Dumazet
@ 2019-11-11 18:44               ` Eric Dumazet
  2019-11-11 19:00                 ` Linus Torvalds
  2019-11-11 18:50               ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-11 18:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 10:31 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Nov 11, 2019 at 10:05 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:52 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > Now I wonder what to do with the ~400 KCSAN reports sitting in
> > > pre-moderation queue.
> >
> > So regular KASAN reports are fairly easy to deal with: they report
> > actual bugs. They may be hard to hit, but generally there's no
> > question about something like a use-after-free or whatever.
> >
> > The problem with KCSAN is that it's not clear how many of the reports
> > have been actual real honest-to-goodness bugs that could cause
> > problems, and how many of them are "this isn't actually a bug, but an
> > annotation will shut up KCSAN".
> >
> > My gut feeling would be that it would be best to ignore the ones that
> > are "an annotation will shut up KCSAN", and look at the ones that are
> > real bugs.
> >
> > Is there a pattern to those real bugs? Is there perhaps a way to make
> > KCSAN notice _that_ pattern in particular, and suppress the ones that
> > are "we can shut these up with annotations that don't really change
> > the code"?
> >
> > I think it would be much better for the kernel - and much better for
> > KCSAN - if the problem reports KCSAN reports are real problems that
> > can actually be triggered as problems, and that it behaves much more
> > like KASAN in that respect.
> >
> > Yes, yes, then once the *real* problems have been handled, maybe we
> > can expand the search to be "stylistic issues" and "in theory, this
> > could cause problems with a compiler that did X" issues.
> >
> > But I think the "just annotate" thing makes people more likely to
> > dismiss KCSAN issues, and I don't think it's healthy.
> >
>
> Problem is that KASAN/KCSAN stops as soon as one issue is hit,
> regardless of it being a false positive or not.
>
> (Same happens with LOCKDEP seeing only one issue, then disabling itself)
>
> If we do not annotate the false positive, the real issues might be
> hidden for years.
>
> There is no pattern really, only a lot of noise (small ' bugs'  that
> have no real impact)

An interesting case is the race in ksys_write()

if (ppos) {
     pos = *ppos; // data-race
     ppos = &pos;
}
ret = vfs_write(f.file, buf, count, ppos);
if (ret >= 0 && ppos)
    f.file->f_pos = pos;  // data-race

BUG: KCSAN: data-race in ksys_write / ksys_write

write to 0xffff8880a9c29568 of 8 bytes by task 27477 on cpu 1:
 ksys_write+0x101/0x1b0 fs/read_write.c:613
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x4c/0x60 fs/read_write.c:620
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 18:04           ` Linus Torvalds
@ 2019-11-11 18:31             ` Eric Dumazet
  2019-11-11 18:44               ` Eric Dumazet
  2019-11-11 18:50               ` Linus Torvalds
  0 siblings, 2 replies; 45+ messages in thread
From: Eric Dumazet @ 2019-11-11 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 10:05 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Nov 11, 2019 at 9:52 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > Now I wonder what to do with the ~400 KCSAN reports sitting in
> > pre-moderation queue.
>
> So regular KASAN reports are fairly easy to deal with: they report
> actual bugs. They may be hard to hit, but generally there's no
> question about something like a use-after-free or whatever.
>
> The problem with KCSAN is that it's not clear how many of the reports
> have been actual real honest-to-goodness bugs that could cause
> problems, and how many of them are "this isn't actually a bug, but an
> annotation will shut up KCSAN".
>
> My gut feeling would be that it would be best to ignore the ones that
> are "an annotation will shut up KCSAN", and look at the ones that are
> real bugs.
>
> Is there a pattern to those real bugs? Is there perhaps a way to make
> KCSAN notice _that_ pattern in particular, and suppress the ones that
> are "we can shut these up with annotations that don't really change
> the code"?
>
> I think it would be much better for the kernel - and much better for
> KCSAN - if the problem reports KCSAN reports are real problems that
> can actually be triggered as problems, and that it behaves much more
> like KASAN in that respect.
>
> Yes, yes, then once the *real* problems have been handled, maybe we
> can expand the search to be "stylistic issues" and "in theory, this
> could cause problems with a compiler that did X" issues.
>
> But I think the "just annotate" thing makes people more likely to
> dismiss KCSAN issues, and I don't think it's healthy.
>

Problem is that KASAN/KCSAN stops as soon as one issue is hit,
regardless of it being a false positive or not.

(Same happens with LOCKDEP seeing only one issue, then disabling itself)

If we do not annotate the false positive, the real issues might be
hidden for years.

There is no pattern really, only a lot of noise (small ' bugs'  that
have no real impact)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 17:52         ` Eric Dumazet
@ 2019-11-11 18:04           ` Linus Torvalds
  2019-11-11 18:31             ` Eric Dumazet
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 18:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 9:52 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Now I wonder what to do with the ~400 KCSAN reports sitting in
> pre-moderation queue.

So regular KASAN reports are fairly easy to deal with: they report
actual bugs. They may be hard to hit, but generally there's no
question about something like a use-after-free or whatever.

The problem with KCSAN is that it's not clear how many of the reports
have been actual real honest-to-goodness bugs that could cause
problems, and how many of them are "this isn't actually a bug, but an
annotation will shut up KCSAN".

My gut feeling would be that it would be best to ignore the ones that
are "an annotation will shut up KCSAN", and look at the ones that are
real bugs.

Is there a pattern to those real bugs? Is there perhaps a way to make
KCSAN notice _that_ pattern in particular, and suppress the ones that
are "we can shut these up with annotations that don't really change
the code"?

I think it would be much better for the kernel - and much better for
KCSAN - if the problem reports KCSAN reports are real problems that
can actually be triggered as problems, and that it behaves much more
like KASAN in that respect.

Yes, yes, then once the *real* problems have been handled, maybe we
can expand the search to be "stylistic issues" and "in theory, this
could cause problems with a compiler that did X" issues.

But I think the "just annotate" thing makes people more likely to
dismiss KCSAN issues, and I don't think it's healthy.

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 16:51       ` Linus Torvalds
@ 2019-11-11 17:52         ` Eric Dumazet
  2019-11-11 18:04           ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Eric Dumazet @ 2019-11-11 17:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 8:51 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Nov 11, 2019 at 7:51 AM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > I dislike the explicit annotation approach, because it shifts the
> > burden of proving correctness from the automatic verifier to the
> > programmer.
>
> Yes.
>
> However, sometimes explicit annotations are very useful as
> documentation and as showing of intent even if they might not change
> behavior or code generation.
>
> But they generally should never _replace_ checking - in fact, the
> annotations themselves should hopefully be checked for correctness
> too.
>
> So a good annotation would implicitly document intent, but it should
> also be something that we can check being true, so that we also have
> the check that reality actually _matches_ the intent too. Because
> misleading and wrong documentation is worse than no documentation at
> all.
>
> Side note: an example of a dangerous annotation is the one that Eric
> pointed out, where a 64-bit read in percpu_counter_read_positive()
> could be changed to READ_ONCE(), and we would compile it cleanly, but
> on 32-bit it wouldn't actually be atomic.
>
> We at one time tried to actually verify that READ/WRITE_ONCE() was
> done only on types that could actually be accessed atomically (always
> ignoring alpha because the pain is not worth it), but it showed too
> many problems.
>
> So now we silently accept things that aren't actually atomic. We do
> access them "once" in the sense that we don't allow the compiler to
> reload it, but it's not "once" in the LKMM sense of one single value.
>
> That's ok for some cases. But it's actually a horrid horrid thing from
> a documentation standpoint, and I hate it, and it's dangerous.
>
>                 Linus

I was hoping to cleanup the 'easy cases'  before looking at more serious issues.

But it looks like even the ' easy cases'  are not that easy.

Now I wonder what to do with the ~400 KCSAN reports sitting in
pre-moderation queue.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 15:51     ` Alan Stern
@ 2019-11-11 16:51       ` Linus Torvalds
  2019-11-11 17:52         ` Eric Dumazet
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-11 16:51 UTC (permalink / raw)
  To: Alan Stern
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 7:51 AM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> I dislike the explicit annotation approach, because it shifts the
> burden of proving correctness from the automatic verifier to the
> programmer.

Yes.

However, sometimes explicit annotations are very useful as
documentation and as showing of intent even if they might not change
behavior or code generation.

But they generally should never _replace_ checking - in fact, the
annotations themselves should hopefully be checked for correctness
too.

So a good annotation would implicitly document intent, but it should
also be something that we can check being true, so that we also have
the check that reality actually _matches_ the intent too. Because
misleading and wrong documentation is worse than no documentation at
all.

Side note: an example of a dangerous annotation is the one that Eric
pointed out, where a 64-bit read in percpu_counter_read_positive()
could be changed to READ_ONCE(), and we would compile it cleanly, but
on 32-bit it wouldn't actually be atomic.

We at one time tried to actually verify that READ/WRITE_ONCE() was
done only on types that could actually be accessed atomically (always
ignoring alpha because the pain is not worth it), but it showed too
many problems.

So now we silently accept things that aren't actually atomic. We do
access them "once" in the sense that we don't allow the compiler to
reload it, but it's not "once" in the LKMM sense of one single value.

That's ok for some cases. But it's actually a horrid horrid thing from
a documentation standpoint, and I hate it, and it's dangerous.

                Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 19:10   ` Marco Elver
@ 2019-11-11 15:51     ` Alan Stern
  2019-11-11 16:51       ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Stern @ 2019-11-11 15:51 UTC (permalink / raw)
  To: Marco Elver
  Cc: Linus Torvalds, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, Paul E. McKenney,
	LKMM Maintainers -- Akira Yokosawa

On Sun, 10 Nov 2019, Marco Elver wrote:

> On Sun, 10 Nov 2019 at 17:09, Alan Stern <stern@rowland.harvard.edu> wrote:
...

> > For those used to thinking in terms of litmus tests, consider this one:
> >
> > C equivalent-writes
> >
> > {}
> >
> > P0(int *x)
> > {
> >         *x = 1;
> > }
> >
> > P1(int *x)
> > {
> >         *x = 1;
> > }
> >
> > exists (~x=1)
> >
> > Should the LKMM say that this litmus test contains a race?
> >
> > This suggests that we might also want to relax the notion of a write
> > racing with a read, although in that case I'm not at all sure what the
> > appropriate change to the memory model would be.  Something along the
> > lines of: If a write W races with a read R, but W stores the same value
> > that R would have read if W were not present, then it's not really a
> > race.  But of course this is far too vague to be useful.
> 
> What if you introduce to the above litmus test:
> 
> P2(int *x) { *x = 2; }

Then clearly the test _would_ contain a data race.

> How can a developer, using the LKMM as a reference, hope to prove
> their code is free from data races without having to enumerate all
> possible values a variable could contain (in addition to all possible
> interleavings)?

Well, for one thing the new rule doesn't say anything about all
possible values a variable could contain; it only talks about the
values of a pair of concurrent writes.  Of course, this would still
require you to be aware of (if not to fully enumerate) all possible
values a write could store, so perhaps it's not much of an improvement.

On the other hand, one way to prove your code is data-race-free under
the revised LKMM would be to show that it is data-race-free under the
original (i.e., current) LKMM.  Then you wouldn't have to enumerate any
lists of possible values.

> I view introducing data value dependencies, for the sake of allowing
> more programs, to a language memory model as a slippery slope, and am
> not aware of any precedent where this worked out. The additional
> complexity in the memory model would put a burden on developers and
> the compiler that is unlikely to be a real benefit (as you pointed
> out, the compiler may even need to disable some transformations).

This may be so.  But if the resulting model is a better match to the
way kernel developers think about their code, wouldn't it be
appropriate?

>  From
> a practical point of view, if the LKMM departs further and further
> from C11's memory model, how do we ensure all compilers do the right
> thing?

Linus has already committed to doing this.  To quote the latest example
(from a message he posted shortly after yours):

	I don't care one whit about C11. Made-up stores to shared data
	are not acceptable. Ever. We will turn that off with a compiler
	switch if the compiler thinks it can do them, the same way we
	turn off other incorrect optimizations like the type-based
	aliasing or the insane "signed integer arithmetic can have
	undefined behavior" stupidity that the standards people
	allowed.

Given that the kernel _requires_ compilers to behave this way, 
shouldn't the LKMM reflect this requirement?

> My vote would go to explicit annotation, not only because it reduces
> hidden complexity, but also because it makes the code more
> understandable, for developers and tooling. As an additional point, I
> find the original suggestion to add WRITE_ONCE to be the least bad (or
> some other better named WRITE_). Consider somebody changing the code,
> changing the semantics and the values written to "non_rcu". With a
> WRITE_ONCE, the developer would be clear about the fact that the write
> can happen concurrently, and ensure new code is written with the
> assumption that concurrent writes can happen.

I dislike the explicit annotation approach, because it shifts the
burden of proving correctness from the automatic verifier to the
programmer.  Let's take the litmus test above as example.  I could
annotate it to read:

P0(int *x) { WRITE_IDEMPOTENT(*x, 1); }
P1(int *x) { WRITE_IDEMPOTENT(*x, 1); }

and then KCSAN would take my word for it that the two writes don't race
with each other (or if they do race, it doesn't matter).  But now if
the code was changed by adding:

P2(int *x) { WRITE_IDEMPOTENT(*x, 2); }

then KCSAN would still believe there was no race, meaning it would be
up to me to audit all possible writes to x to make sure they store the
same value.  That is not how automated tooling should work.

Alan Stern


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 14:31           ` Paul E. McKenney
@ 2019-11-11 15:10             ` Marco Elver
  0 siblings, 0 replies; 45+ messages in thread
From: Marco Elver @ 2019-11-11 15:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Alan Stern, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, LKMM Maintainers -- Akira Yokosawa

On Mon, 11 Nov 2019 at 15:31, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Nov 11, 2019 at 03:17:51PM +0100, Marco Elver wrote:
> > On Sun, 10 Nov 2019 at 21:44, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Sun, Nov 10, 2019 at 11:20:53AM -0800, Linus Torvalds wrote:
> > > > On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
> > > > <torvalds@linux-foundation.org> wrote:
> > > > >
> > > > > And this is where WRITE_IDEMPOTENT would make a possible difference.
> > > > > In particular, if we make the optimization to do the "read and only
> > > > > write if changed"
> > > >
> > > > It might be useful for checking too. IOW, something like KCSAN could
> > > > actually check that if a field has an idempotent write to it, all
> > > > writes always have the same value.
> > > >
> > > > Again, there's the issue with lifetime.
> > > >
> > > > Part of that is "initialization is different". Those writes would not
> > > > be marked idempotent, of course, and they'd write another value.
> > > >
> > > > There's also the issue of lifetime at the _end_ of the use, of course.
> > > > There _are_ interesting data races at the end of the lifetime, both
> > > > reads and writes.
> > > >
> > > > In particular, if it's a sticky flag, in order for there to not be any
> > > > races, all the writes have to happen with a refcount held, and the
> > > > final read has to happen after the final refcount is dropped (and the
> > > > refcounts have to have atomicity and ordering, of course). I'm not
> > > > sure how easy something like that is model in KSAN. Maybe it already
> > > > does things like that for all the other refcount stuff we do.
> > > >
> > > > But the lifetime can be problematic for other reasons too - in this
> > > > particular case we have a union for that sticky flag (which is used
> > > > under the refcount), and then when the final refcount is released we
> > > > read that value (thus no data race) but because of the union we will
> > > > now start using that field with *different* data. It becomes that RCU
> > > > list head instead.
> > > >
> > > > That kind of "it used to be a sticky flag, but now the lifetime of the
> > > > flag is over, and it's something entirely different" might be a
> > > > nightmare for something like KCSAN. It sounds complicated to check
> > > > for, but I have no idea what KCSAN really considers complicated or
> > > > not.
> > >
> > > But will "one size fits all" be practical and useful?
> > >
> > > For my code, I would be happy to accept a significant "false positive"
> > > rate to get even a probabilistic warning of other-task accesses to some
> > > of RCU's fields.  Even if the accesses were perfect from a functional
> > > viewpoint, they could be problematic from a performance and scalability
> > > viewpoint.  And for something like RCU, real bugs, even those that are
> > > very improbable, need to be fixed.
> > >
> > > But other code (and thus other developers and maintainers) are going to
> > > have different needs.  For all I know, some might have good reasons to
> > > exclude their code from KCSAN analysis entirely.
> > >
> > > Would it make sense for KCSAN to have per-file/subsystem/whatever flags
> > > specifying the depth of the analysis?
> >
> > Just to answer this: we already have this, and disable certain files
> > already. So it's an option if required. Just need maintainers to add
> > KCSAN_SANITIZE := n, or KCSAN_SANITIZE_file.o := n to Makefiles, and
> > KCSAN will simply ignore those.
> >
> > FWIW we now also have a config option to "ignore repeated writes with
> > the same value". It may be a little overaggressive/imprecise in
> > filtering data races, but anything else like the super precise
> > analysis involving tracking lifetimes and values (and whatever else
> > the rules would require) is simply too complex. So, the current
> > solution will avoid reporting cases like the original report here
> > (__alloc_file), but at the cost of maybe being a little imprecise.
> > It's probably a reasonable trade-off, given that we have too many data
> > races to deal with on syzbot anyway.
>
> Nice!
>
> Is this added repeated-writes analysis something that can be disabled?
> I would prefer that the analysis of RCU complain in this case as a
> probabilistic cache-locality warning.  If it can be disabled, please
> let me know if there is anything that I need to do to make this happen.

It's hidden behind a Kconfig config option, and actually disabled by
default. We can't enable/disable this on a per-file basis.

Right now, we'll just enable it on the public syzbot instance, which
will use the most conservative config.  Of course you can still run
your own fuzzer/stress test of choice with KCSAN and the option
disabled. Is that enough?

Otherwise I could also just say if the symbolized top stack frame
contains "rcu_", don't ignore -- which would be a little hacky and
imprecise though. What do you prefer?

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-11 14:17         ` Marco Elver
@ 2019-11-11 14:31           ` Paul E. McKenney
  2019-11-11 15:10             ` Marco Elver
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2019-11-11 14:31 UTC (permalink / raw)
  To: Marco Elver
  Cc: Linus Torvalds, Alan Stern, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, LKMM Maintainers -- Akira Yokosawa

On Mon, Nov 11, 2019 at 03:17:51PM +0100, Marco Elver wrote:
> On Sun, 10 Nov 2019 at 21:44, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Nov 10, 2019 at 11:20:53AM -0800, Linus Torvalds wrote:
> > > On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > And this is where WRITE_IDEMPOTENT would make a possible difference.
> > > > In particular, if we make the optimization to do the "read and only
> > > > write if changed"
> > >
> > > It might be useful for checking too. IOW, something like KCSAN could
> > > actually check that if a field has an idempotent write to it, all
> > > writes always have the same value.
> > >
> > > Again, there's the issue with lifetime.
> > >
> > > Part of that is "initialization is different". Those writes would not
> > > be marked idempotent, of course, and they'd write another value.
> > >
> > > There's also the issue of lifetime at the _end_ of the use, of course.
> > > There _are_ interesting data races at the end of the lifetime, both
> > > reads and writes.
> > >
> > > In particular, if it's a sticky flag, in order for there to not be any
> > > races, all the writes have to happen with a refcount held, and the
> > > final read has to happen after the final refcount is dropped (and the
> > > refcounts have to have atomicity and ordering, of course). I'm not
> > > sure how easy something like that is model in KSAN. Maybe it already
> > > does things like that for all the other refcount stuff we do.
> > >
> > > But the lifetime can be problematic for other reasons too - in this
> > > particular case we have a union for that sticky flag (which is used
> > > under the refcount), and then when the final refcount is released we
> > > read that value (thus no data race) but because of the union we will
> > > now start using that field with *different* data. It becomes that RCU
> > > list head instead.
> > >
> > > That kind of "it used to be a sticky flag, but now the lifetime of the
> > > flag is over, and it's something entirely different" might be a
> > > nightmare for something like KCSAN. It sounds complicated to check
> > > for, but I have no idea what KCSAN really considers complicated or
> > > not.
> >
> > But will "one size fits all" be practical and useful?
> >
> > For my code, I would be happy to accept a significant "false positive"
> > rate to get even a probabilistic warning of other-task accesses to some
> > of RCU's fields.  Even if the accesses were perfect from a functional
> > viewpoint, they could be problematic from a performance and scalability
> > viewpoint.  And for something like RCU, real bugs, even those that are
> > very improbable, need to be fixed.
> >
> > But other code (and thus other developers and maintainers) are going to
> > have different needs.  For all I know, some might have good reasons to
> > exclude their code from KCSAN analysis entirely.
> >
> > Would it make sense for KCSAN to have per-file/subsystem/whatever flags
> > specifying the depth of the analysis?
> 
> Just to answer this: we already have this, and disable certain files
> already. So it's an option if required. Just need maintainers to add
> KCSAN_SANITIZE := n, or KCSAN_SANITIZE_file.o := n to Makefiles, and
> KCSAN will simply ignore those.
> 
> FWIW we now also have a config option to "ignore repeated writes with
> the same value". It may be a little overaggressive/imprecise in
> filtering data races, but anything else like the super precise
> analysis involving tracking lifetimes and values (and whatever else
> the rules would require) is simply too complex. So, the current
> solution will avoid reporting cases like the original report here
> (__alloc_file), but at the cost of maybe being a little imprecise.
> It's probably a reasonable trade-off, given that we have too many data
> races to deal with on syzbot anyway.

Nice!

Is this added repeated-writes analysis something that can be disabled?
I would prefer that the analysis of RCU complain in this case as a
probabilistic cache-locality warning.  If it can be disabled, please
let me know if there is anything that I need to do to make this happen.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 20:44       ` Paul E. McKenney
  2019-11-10 21:10         ` Linus Torvalds
@ 2019-11-11 14:17         ` Marco Elver
  2019-11-11 14:31           ` Paul E. McKenney
  1 sibling, 1 reply; 45+ messages in thread
From: Marco Elver @ 2019-11-11 14:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Alan Stern, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, LKMM Maintainers -- Akira Yokosawa

On Sun, 10 Nov 2019 at 21:44, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sun, Nov 10, 2019 at 11:20:53AM -0800, Linus Torvalds wrote:
> > On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > And this is where WRITE_IDEMPOTENT would make a possible difference.
> > > In particular, if we make the optimization to do the "read and only
> > > write if changed"
> >
> > It might be useful for checking too. IOW, something like KCSAN could
> > actually check that if a field has an idempotent write to it, all
> > writes always have the same value.
> >
> > Again, there's the issue with lifetime.
> >
> > Part of that is "initialization is different". Those writes would not
> > be marked idempotent, of course, and they'd write another value.
> >
> > There's also the issue of lifetime at the _end_ of the use, of course.
> > There _are_ interesting data races at the end of the lifetime, both
> > reads and writes.
> >
> > In particular, if it's a sticky flag, in order for there to not be any
> > races, all the writes have to happen with a refcount held, and the
> > final read has to happen after the final refcount is dropped (and the
> > refcounts have to have atomicity and ordering, of course). I'm not
> > sure how easy something like that is model in KSAN. Maybe it already
> > does things like that for all the other refcount stuff we do.
> >
> > But the lifetime can be problematic for other reasons too - in this
> > particular case we have a union for that sticky flag (which is used
> > under the refcount), and then when the final refcount is released we
> > read that value (thus no data race) but because of the union we will
> > now start using that field with *different* data. It becomes that RCU
> > list head instead.
> >
> > That kind of "it used to be a sticky flag, but now the lifetime of the
> > flag is over, and it's something entirely different" might be a
> > nightmare for something like KCSAN. It sounds complicated to check
> > for, but I have no idea what KCSAN really considers complicated or
> > not.
>
> But will "one size fits all" be practical and useful?
>
> For my code, I would be happy to accept a significant "false positive"
> rate to get even a probabilistic warning of other-task accesses to some
> of RCU's fields.  Even if the accesses were perfect from a functional
> viewpoint, they could be problematic from a performance and scalability
> viewpoint.  And for something like RCU, real bugs, even those that are
> very improbable, need to be fixed.
>
> But other code (and thus other developers and maintainers) are going to
> have different needs.  For all I know, some might have good reasons to
> exclude their code from KCSAN analysis entirely.
>
> Would it make sense for KCSAN to have per-file/subsystem/whatever flags
> specifying the depth of the analysis?

Just to answer this: we already have this, and disable certain files
already. So it's an option if required. Just need maintainers to add
KCSAN_SANITIZE := n, or KCSAN_SANITIZE_file.o := n to Makefiles, and
KCSAN will simply ignore those.

FWIW we now also have a config option to "ignore repeated writes with
the same value". It may be a little overaggressive/imprecise in
filtering data races, but anything else like the super precise
analysis involving tracking lifetimes and values (and whatever else
the rules would require) is simply too complex. So, the current
solution will avoid reporting cases like the original report here
(__alloc_file), but at the cost of maybe being a little imprecise.
It's probably a reasonable trade-off, given that we have too many data
races to deal with on syzbot anyway.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 21:10         ` Linus Torvalds
@ 2019-11-10 21:31           ` Paul E. McKenney
  0 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2019-11-10 21:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, LKMM Maintainers -- Akira Yokosawa

On Sun, Nov 10, 2019 at 01:10:39PM -0800, Linus Torvalds wrote:
> On Sun, Nov 10, 2019 at 12:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > But will "one size fits all" be practical and useful?
> 
> Oh, I do agree that if KCSAN has some mode where it says "I'll ignore
> repeated writes with the same value" (or whatever), it could/should
> likely be behind some flag.
> 
> I don't think it should be a subsystem flag, though. More of a "I'm
> willing to actually analyze and ignore false positives" flag. Because
> I don't think it's so much about the code, as it is about the person
> who looks at the results.
> 
> For example, we're already getting push-back from people on some of
> the KCSAN-inspired patches. If we have people sending patches to add
> READ_ONCE/WRITE_ONCE to random places to shut up KCSAN reports, I
> don't think that's good.
> 
> But if we have people who _work_ on memory ordering issues etc, and
> want to see a strict mode, knowing there are false positives and able
> to handle them, that's a completely different thing..
> 
> No?

Understood on the pushback!  And I especially agree that it is bad to
automatically add *_ONCE() just to shut up KCSAN.  For one thing, doing
that inconveniences people later on who might want to take a closer look.

As long as I can get the full-up reports for RCU.  And as long as the
others who want the full-up reports can also get them.  ;-)

And agreed, if the results are adjusted based on who is processing them,
that should be good.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 20:44       ` Paul E. McKenney
@ 2019-11-10 21:10         ` Linus Torvalds
  2019-11-10 21:31           ` Paul E. McKenney
  2019-11-11 14:17         ` Marco Elver
  1 sibling, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-10 21:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Alan Stern, Marco Elver, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, LKMM Maintainers -- Akira Yokosawa

On Sun, Nov 10, 2019 at 12:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> But will "one size fits all" be practical and useful?

Oh, I do agree that if KCSAN has some mode where it says "I'll ignore
repeated writes with the same value" (or whatever), it could/should
likely be behind some flag.

I don't think it should be a subsystem flag, though. More of a "I'm
willing to actually analyze and ignore false positives" flag. Because
I don't think it's so much about the code, as it is about the person
who looks at the results.

For example, we're already getting push-back from people on some of
the KCSAN-inspired patches. If we have people sending patches to add
READ_ONCE/WRITE_ONCE to random places to shut up KCSAN reports, I
don't think that's good.

But if we have people who _work_ on memory ordering issues etc, and
want to see a strict mode, knowing there are false positives and able
to handle them, that's a completely different thing..

No?

              Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 19:20     ` Linus Torvalds
@ 2019-11-10 20:44       ` Paul E. McKenney
  2019-11-10 21:10         ` Linus Torvalds
  2019-11-11 14:17         ` Marco Elver
  0 siblings, 2 replies; 45+ messages in thread
From: Paul E. McKenney @ 2019-11-10 20:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Stern, Marco Elver, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, LKMM Maintainers -- Akira Yokosawa

On Sun, Nov 10, 2019 at 11:20:53AM -0800, Linus Torvalds wrote:
> On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And this is where WRITE_IDEMPOTENT would make a possible difference.
> > In particular, if we make the optimization to do the "read and only
> > write if changed"
> 
> It might be useful for checking too. IOW, something like KCSAN could
> actually check that if a field has an idempotent write to it, all
> writes always have the same value.
> 
> Again, there's the issue with lifetime.
> 
> Part of that is "initialization is different". Those writes would not
> be marked idempotent, of course, and they'd write another value.
> 
> There's also the issue of lifetime at the _end_ of the use, of course.
> There _are_ interesting data races at the end of the lifetime, both
> reads and writes.
> 
> In particular, if it's a sticky flag, in order for there to not be any
> races, all the writes have to happen with a refcount held, and the
> final read has to happen after the final refcount is dropped (and the
> refcounts have to have atomicity and ordering, of course). I'm not
> sure how easy something like that is model in KSAN. Maybe it already
> does things like that for all the other refcount stuff we do.
> 
> But the lifetime can be problematic for other reasons too - in this
> particular case we have a union for that sticky flag (which is used
> under the refcount), and then when the final refcount is released we
> read that value (thus no data race) but because of the union we will
> now start using that field with *different* data. It becomes that RCU
> list head instead.
> 
> That kind of "it used to be a sticky flag, but now the lifetime of the
> flag is over, and it's something entirely different" might be a
> nightmare for something like KCSAN. It sounds complicated to check
> for, but I have no idea what KCSAN really considers complicated or
> not.

But will "one size fits all" be practical and useful?

For my code, I would be happy to accept a significant "false positive"
rate to get even a probabilistic warning of other-task accesses to some
of RCU's fields.  Even if the accesses were perfect from a functional
viewpoint, they could be problematic from a performance and scalability
viewpoint.  And for something like RCU, real bugs, even those that are
very improbable, need to be fixed.

But other code (and thus other developers and maintainers) are going to
have different needs.  For all I know, some might have good reasons to
exclude their code from KCSAN analysis entirely.

Would it make sense for KCSAN to have per-file/subsystem/whatever flags
specifying the depth of the analysis?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 19:12   ` Linus Torvalds
@ 2019-11-10 19:20     ` Linus Torvalds
  2019-11-10 20:44       ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-10 19:20 UTC (permalink / raw)
  To: Alan Stern
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And this is where WRITE_IDEMPOTENT would make a possible difference.
> In particular, if we make the optimization to do the "read and only
> write if changed"

It might be useful for checking too. IOW, something like KCSAN could
actually check that if a field has an idempotent write to it, all
writes always have the same value.

Again, there's the issue with lifetime.

Part of that is "initialization is different". Those writes would not
be marked idempotent, of course, and they'd write another value.

There's also the issue of lifetime at the _end_ of the use, of course.
There _are_ interesting data races at the end of the lifetime, both
reads and writes.

In particular, if it's a sticky flag, in order for there to not be any
races, all the writes have to happen with a refcount held, and the
final read has to happen after the final refcount is dropped (and the
refcounts have to have atomicity and ordering, of course). I'm not
sure how easy something like that is model in KSAN. Maybe it already
does things like that for all the other refcount stuff we do.

But the lifetime can be problematic for other reasons too - in this
particular case we have a union for that sticky flag (which is used
under the refcount), and then when the final refcount is released we
read that value (thus no data race) but because of the union we will
now start using that field with *different* data. It becomes that RCU
list head instead.

That kind of "it used to be a sticky flag, but now the lifetime of the
flag is over, and it's something entirely different" might be a
nightmare for something like KCSAN. It sounds complicated to check
for, but I have no idea what KCSAN really considers complicated or
not.

                  Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 16:09 ` Alan Stern
  2019-11-10 19:10   ` Marco Elver
@ 2019-11-10 19:12   ` Linus Torvalds
  2019-11-10 19:20     ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2019-11-10 19:12 UTC (permalink / raw)
  To: Alan Stern
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Sun, Nov 10, 2019 at 8:09 AM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Agreed.  My point was that you were using the word in a way which did
> not match this definition.

Whatever. I claim that my use was *exactly* that "certain writes are idempotent"

> Never mind that.  You did not respond to the question at the end of my
> previous email: Should the LKMM be changed so that two writes are not
> considered to race with each other if they store the same value?

No.

The whole point is that only *certain* writes are idempotent - the
ones where we stickily set a flag or clear a flag. So the field has
exactly two possible values: the initial state, and the "something did
a write to it" state.

This is why I suggested that WRITE_IDEMPOTENT() - which is us telling
the system that "I'm now doing that sticky write of a flag, and
ordering with other threads (or within this thread) on this field
doesn't matter".

One side effect of that "ordering doesn't matter" is that we could -
if it were to be shown to be worthwhile - turn it into a "did somebody
else already do this, then I won't bother".

But that's not necessarily true in _general_. We might write the same
value back without it being a true idempotent write. Some other write
_could_ race with it and be a data race.

For example, two threads doing

   variable++;

could race, and end up writing the same value _because_ of the race.
That would obviously be a data race, and neither of the two writes are
in any way idempotent.

Similarly, a "I added new data to a linked list, you should wake up
and handle it" write would always write the same value in that
particular location, but another location would obviously clear the
flag, so now that write that sets the "new data available" flag is
_not_ idempotent, and you could _not_ replace it with a "did somebody
else already set this flag" sequence. It might look on a local scope
like a "always write the same value", and yes, it might race with
others that also write the same value, but there are also threads that
write a different value, so now it's not ok to say "did it already
have that value, in which case I can skip the write".

See why I think idempotent writes are something somewhat special -
they aren't just about writing the same value. They are about only
_ever_ writing the same value (with the caveat obviously being "over
the lifetime of that data structure, and with the initial value being
different", of course).

> That change would take care of the original issue of this email thread,
> wouldn't it?  And it would render WRITE_IDEMPOTENT unnecessary.

So I do think LKMM should say "writes of the same value must obviously
result in the same value in memory afterwards", if it doesn't already.
That's a somewhat trivial case, it's just a special case of the
single-value atomicity issue. I thought the LKMM had that already: if
you have writes of 'x' and 'y' to a variable from two CPU's, all CPU's
are supposed to see _either_ 'x' or 'y', they can't ever see a mix of
the two.

And yes, we've depended on that single-value atomicity historically.

The 'x' and 'y' have the same value is just a special case of that
general issue - if two threads write the same value, no CPU can ever
see anything but that value (or the original one). So in that sense,
fundamentally the same value write cannot race with itself.

But that LKMM rule is separate from a rule about a statistical tool like KCSAN.

Should KCSAN then ignore writes of the same value?

Maybe.

Because while that "variable++" data race with the same value is real,
the likelihood of hitting it is small, so a statistical tool like
KCSAN might as well ignore it - the tool would show the data race when
the race _doesn't_ happen, which would be the normal case anyway, and
would be the reason why the race hadn't been noticed by a normal human
being.

So practically speaking, we might say "concurrent writes of the same
value aren't data races" for KCSAN, even though they _could_ be data
races.

And this is where WRITE_IDEMPOTENT would make a possible difference.
In particular, if we make the optimization to do the "read and only
write if changed", two CPU's doing this concurrently would do

   READ 0
   WRITE 1

(for a "flag goes from 0->1" transition) and from a tool perspective,
it would be very hard to know whether this is a race (two threads
doing "variable++") or not (two threads setting a sticky flag).

So WRITE_IDEMPOTENT would then disambiguate that choice. See what I'm saying?

At the same time, I suspect that it's just simpler to say "if all the
writes we see to this field have the same value, then we will assume
it has idempotent behavior".

Even then the "all writes" would have to know the difference between
initial values and subsequent updates, which apparently isn't obvious
in KCSAN, but I don't know how hacky that kind of logic would be.

> Making that change would amount to formalizing your requirement that
> the compiler should not invent stores to shared variables.  In C11 such
> invented stores are allowed.

I don't care one whit about C11. Made-up stores to shared data are not
acceptable. Ever. We will turn that off with a compiler switch if the
compiler thinks it can do them, the same way we turn off other
incorrect optimizations like the type-based aliasing or the insane
"signed integer arithmetic can have undefined behavior" stupidity that
the standards people allowed.

I thought that has always been clear. I have not exactly been
ambiguous about my dislike of silly pointless "the standard allows me
to do stupid things".

                 Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
  2019-11-10 16:09 ` Alan Stern
@ 2019-11-10 19:10   ` Marco Elver
  2019-11-11 15:51     ` Alan Stern
  2019-11-10 19:12   ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Marco Elver @ 2019-11-10 19:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Eric Dumazet, Eric Dumazet, syzbot,
	linux-fsdevel, Linux Kernel Mailing List, syzkaller-bugs,
	Al Viro, Andrea Parri, Paul E. McKenney,
	LKMM Maintainers -- Akira Yokosawa

On Sun, 10 Nov 2019 at 17:09, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Sat, 9 Nov 2019, Linus Torvalds wrote:
>
> > On Sat, Nov 9, 2019, 15:08 Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > On Fri, 8 Nov 2019, Linus Torvalds wrote:
> > > >
> > > > Two writes to normal memory are *not* idempotent if they write
> > > > different values. The ordering very much matters, and it's racy and a
> > > > tool should complain.
> > >
> > > What you have written strongly indicates that either you think the word
> > > "idempotent" means something different from its actual meaning or else
> > > you are misusing the word in a very confusing way.
> > >
> >
> > "Idempotence is the property of certain operations in mathematics and
> > computer science whereby they can be applied multiple times without
> > changing the result beyond the initial application. "
> >
> > This is (for example) commonly used when talking about nfs operations,
> > where you can re-send the same nfs operation, and it's ok (even if it has
> > side effects) because the server remembers that it already did the
> > operation. If it's already been done, nothing changes.
> >
> > It may not match your definition in some other area, but this is very much
> > the accepted meaning of the word in computer science and operating systems.
>
> Agreed.  My point was that you were using the word in a way which did
> not match this definition.
>
> Never mind that.  You did not respond to the question at the end of my
> previous email: Should the LKMM be changed so that two writes are not
> considered to race with each other if they store the same value?
>
> That change would take care of the original issue of this email thread,
> wouldn't it?  And it would render WRITE_IDEMPOTENT unnecessary.
>
> Making that change would amount to formalizing your requirement that
> the compiler should not invent stores to shared variables.  In C11 such
> invented stores are allowed.  Given something like this:
>
>         <A complex computation which does not involve x but does
>          require a register spill>
>         x = 1234;
>
> C11 allows the compiler to store an intermediate value in x rather than
> allocating a slot on the stack for the register spill.  After all, x is
> going to be overwritten anyway, and if any other thread accessed x
> during the complex computation then it would race with the final store
> and so the behavior would be undefined in any case.
>
> If you want to specifically forbid the compiler from doing this, it
> makes sense to change the memory model accordingly.
>
> For those used to thinking in terms of litmus tests, consider this one:
>
> C equivalent-writes
>
> {}
>
> P0(int *x)
> {
>         *x = 1;
> }
>
> P1(int *x)
> {
>         *x = 1;
> }
>
> exists (~x=1)
>
> Should the LKMM say that this litmus test contains a race?
>
> This suggests that we might also want to relax the notion of a write
> racing with a read, although in that case I'm not at all sure what the
> appropriate change to the memory model would be.  Something along the
> lines of: If a write W races with a read R, but W stores the same value
> that R would have read if W were not present, then it's not really a
> race.  But of course this is far too vague to be useful.

What if you introduce to the above litmus test:

P2(int *x) { *x = 2; }

How can a developer, using the LKMM as a reference, hope to prove
their code is free from data races without having to enumerate all
possible values a variable could contain (in addition to all possible
interleavings)?

I view introducing data value dependencies, for the sake of allowing
more programs, to a language memory model as a slippery slope, and am
not aware of any precedent where this worked out. The additional
complexity in the memory model would put a burden on developers and
the compiler that is unlikely to be a real benefit (as you pointed
out, the compiler may even need to disable some transformations). From
a practical point of view, if the LKMM departs further and further
from C11's memory model, how do we ensure all compilers do the right
thing?

My vote would go to explicit annotation, not only because it reduces
hidden complexity, but also because it makes the code more
understandable, for developers and tooling. As an additional point, I
find the original suggestion to add WRITE_ONCE to be the least bad (or
some other better named WRITE_). Consider somebody changing the code,
changing the semantics and the values written to "non_rcu". With a
WRITE_ONCE, the developer would be clear about the fact that the write
can happen concurrently, and ensure new code is written with the
assumption that concurrent writes can happen.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: KCSAN: data-race in __alloc_file / __alloc_file
       [not found] <CAHk-=wjB61GNmqpX0BLA5tpL4tsjWV7akaTc2Roth7uGgax+mw@mail.gmail.com>
@ 2019-11-10 16:09 ` Alan Stern
  2019-11-10 19:10   ` Marco Elver
  2019-11-10 19:12   ` Linus Torvalds
  0 siblings, 2 replies; 45+ messages in thread
From: Alan Stern @ 2019-11-10 16:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marco Elver, Eric Dumazet, Eric Dumazet, syzbot, linux-fsdevel,
	Linux Kernel Mailing List, syzkaller-bugs, Al Viro, Andrea Parri,
	Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa

On Sat, 9 Nov 2019, Linus Torvalds wrote:

> On Sat, Nov 9, 2019, 15:08 Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > On Fri, 8 Nov 2019, Linus Torvalds wrote:
> > >
> > > Two writes to normal memory are *not* idempotent if they write
> > > different values. The ordering very much matters, and it's racy and a
> > > tool should complain.
> >
> > What you have written strongly indicates that either you think the word
> > "idempotent" means something different from its actual meaning or else
> > you are misusing the word in a very confusing way.
> >
> 
> "Idempotence is the property of certain operations in mathematics and
> computer science whereby they can be applied multiple times without
> changing the result beyond the initial application. "
> 
> This is (for example) commonly used when talking about nfs operations,
> where you can re-send the same nfs operation, and it's ok (even if it has
> side effects) because the server remembers that it already did the
> operation. If it's already been done, nothing changes.
> 
> It may not match your definition in some other area, but this is very much
> the accepted meaning of the word in computer science and operating systems.

Agreed.  My point was that you were using the word in a way which did
not match this definition.

Never mind that.  You did not respond to the question at the end of my 
previous email: Should the LKMM be changed so that two writes are not 
considered to race with each other if they store the same value?

That change would take care of the original issue of this email thread,
wouldn't it?  And it would render WRITE_IDEMPOTENT unnecessary.

Making that change would amount to formalizing your requirement that 
the compiler should not invent stores to shared variables.  In C11 such 
invented stores are allowed.  Given something like this:

	<A complex computation which does not involve x but does
	 require a register spill>
	x = 1234;

C11 allows the compiler to store an intermediate value in x rather than
allocating a slot on the stack for the register spill.  After all, x is
going to be overwritten anyway, and if any other thread accessed x
during the complex computation then it would race with the final store
and so the behavior would be undefined in any case.

If you want to specifically forbid the compiler from doing this, it
makes sense to change the memory model accordingly.

For those used to thinking in terms of litmus tests, consider this one:

C equivalent-writes

{}

P0(int *x)
{
	*x = 1;
}

P1(int *x)
{
	*x = 1;
}

exists (~x=1)

Should the LKMM say that this litmus test contains a race?

This suggests that we might also want to relax the notion of a write
racing with a read, although in that case I'm not at all sure what the
appropriate change to the memory model would be.  Something along the
lines of: If a write W races with a read R, but W stores the same value
that R would have read if W were not present, then it's not really a
race.  But of course this is far too vague to be useful.

Alan Stern


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, back to index

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-08 13:16 KCSAN: data-race in __alloc_file / __alloc_file syzbot
2019-11-08 13:28 ` Eric Dumazet
2019-11-08 17:01   ` Linus Torvalds
2019-11-08 17:22     ` Eric Dumazet
2019-11-08 17:38       ` Linus Torvalds
2019-11-08 17:53         ` Eric Dumazet
2019-11-08 17:55           ` Eric Dumazet
2019-11-08 18:02             ` Eric Dumazet
2019-11-08 18:12               ` Linus Torvalds
2019-11-08 20:30             ` Linus Torvalds
2019-11-08 20:53               ` Eric Dumazet
2019-11-08 21:36                 ` Linus Torvalds
2019-11-08 18:05           ` Linus Torvalds
2019-11-08 18:15             ` Marco Elver
2019-11-08 18:40               ` Linus Torvalds
2019-11-08 19:48                 ` Marco Elver
2019-11-08 20:26                   ` Linus Torvalds
2019-11-08 21:57                     ` Alan Stern
2019-11-08 22:06                       ` Linus Torvalds
2019-11-09 23:08                         ` Alan Stern
     [not found] <CAHk-=wjB61GNmqpX0BLA5tpL4tsjWV7akaTc2Roth7uGgax+mw@mail.gmail.com>
2019-11-10 16:09 ` Alan Stern
2019-11-10 19:10   ` Marco Elver
2019-11-11 15:51     ` Alan Stern
2019-11-11 16:51       ` Linus Torvalds
2019-11-11 17:52         ` Eric Dumazet
2019-11-11 18:04           ` Linus Torvalds
2019-11-11 18:31             ` Eric Dumazet
2019-11-11 18:44               ` Eric Dumazet
2019-11-11 19:00                 ` Linus Torvalds
2019-11-11 19:13                   ` Eric Dumazet
2019-11-11 20:43                     ` Linus Torvalds
2019-11-11 20:46                       ` Linus Torvalds
2019-11-11 21:53                         ` Eric Dumazet
2019-11-11 23:51                   ` Linus Torvalds
2019-11-11 18:50               ` Linus Torvalds
2019-11-11 18:59                 ` Marco Elver
2019-11-11 18:59                 ` Eric Dumazet
2019-11-10 19:12   ` Linus Torvalds
2019-11-10 19:20     ` Linus Torvalds
2019-11-10 20:44       ` Paul E. McKenney
2019-11-10 21:10         ` Linus Torvalds
2019-11-10 21:31           ` Paul E. McKenney
2019-11-11 14:17         ` Marco Elver
2019-11-11 14:31           ` Paul E. McKenney
2019-11-11 15:10             ` Marco Elver

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git