Linux-Fsdevel Archive on lore.kernel.org
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Marco Elver <elver@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	syzbot <syzbot+3ef049d50587836c0606@syzkaller.appspotmail.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrea Parri <parri.andrea@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	LKMM Maintainers -- Akira Yokosawa <akiyks@gmail.com>
Subject: Re: KCSAN: data-race in __alloc_file / __alloc_file
Date: Sun, 10 Nov 2019 11:12:14 -0800
Message-ID: <CAHk-=wjErHCwkcgO-=NReU0KR4TFozrFktbhh2rzJ=mPgRO0-g@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.44L0.1911101034180.29192-100000@netrider.rowland.org>

On Sun, Nov 10, 2019 at 8:09 AM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Agreed.  My point was that you were using the word in a way which did
> not match this definition.

Whatever. I claim that my use was *exactly* that: "certain writes are
idempotent".

> Never mind that.  You did not respond to the question at the end of my
> previous email: Should the LKMM be changed so that two writes are not
> considered to race with each other if they store the same value?

No.

The whole point is that only *certain* writes are idempotent - the
ones where we stickily set a flag or clear a flag. So the field has
exactly two possible values: the initial state, and the "something did
a write to it" state.

This is why I suggested WRITE_IDEMPOTENT(): it is us telling the
system that "I'm now doing that sticky write of a flag, and ordering
with other threads (or within this thread) on this field doesn't
matter".

One side effect of that "ordering doesn't matter" is that we could -
if it were to be shown to be worthwhile - turn it into a "did somebody
else already do this, then I won't bother".
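
A minimal userspace sketch of the two forms described above, assuming C11
atomics as a stand-in for kernel primitives. WRITE_IDEMPOTENT() is only a
proposal in this thread, so the macro body, struct obj, mark_seen() and
mark_seen_lazy() are all hypothetical names made up for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the proposed WRITE_IDEMPOTENT(); this is
 * not an existing kernel macro. A relaxed store asks for no ordering.
 */
#define WRITE_IDEMPOTENT(var, val) \
	atomic_store_explicit(&(var), (val), memory_order_relaxed)

struct obj {
	atomic_int seen;	/* starts at 0, only ever set to 1 */
};

/* Plain sticky write: every caller stores the same value. */
static void mark_seen(struct obj *o)
{
	WRITE_IDEMPOTENT(o->seen, 1);
}

/*
 * Optional variant: "did somebody else already do this, then I won't
 * bother". Returns true if a store was actually issued.
 */
static bool mark_seen_lazy(struct obj *o)
{
	if (atomic_load_explicit(&o->seen, memory_order_relaxed) == 1)
		return false;
	WRITE_IDEMPOTENT(o->seen, 1);
	return true;
}
```

Both functions leave the field in the same state no matter how many
threads call them or in what order, which is what makes the lazy variant
a legal substitute here.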

But that's not necessarily true in _general_. We might write the same
value back without it being a true idempotent write. Some other write
_could_ race with it and be a data race.

For example, two threads doing

   variable++;

could race, and end up writing the same value _because_ of the race.
That would obviously be a data race, and neither of the two writes is
in any way idempotent.
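
The lost update can be replayed by hand without real threads. The sketch
below splits "variable++" into its load and store halves and runs them in
a fixed order; struct cpu, load_step(), store_step(), run_racy() and
run_serial() are hypothetical helpers, not anything from the kernel:

```c
/* One simulated thread: 'tmp' plays the role of its private register. */
struct cpu {
	int tmp;
};

static int variable;

/* "variable++" split into the two memory accesses it performs. */
static void load_step(struct cpu *c)
{
	c->tmp = variable;
}

static void store_step(struct cpu *c)
{
	variable = c->tmp + 1;
}

/* Racy interleaving: both threads load before either one stores. */
static int run_racy(void)
{
	struct cpu a, b;

	variable = 0;
	load_step(&a);
	load_step(&b);
	store_step(&a);
	store_step(&b);
	return variable;
}

/* Serialized execution: the second increment sees the first. */
static int run_serial(void)
{
	struct cpu a, b;

	variable = 0;
	load_step(&a);
	store_step(&a);
	load_step(&b);
	store_step(&b);
	return variable;
}
```

In run_racy() both stores carry the same value only because an update was
lost; run_serial() ends with a different result, so the final state
depends on the interleaving.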

Similarly, an "I added new data to a linked list, you should wake up
and handle it" write would always write the same value in that
particular location, but some other code would obviously clear the
flag, so now that write that sets the "new data available" flag is
_not_ idempotent, and you could _not_ replace it with a "did somebody
else already set this flag" sequence. It might look on a local scope
like an "always write the same value", and yes, it might race with
others that also write the same value, but there are also threads that
write a different value, so now it's not ok to say "did it already
have that value, in which case I can skip the write".
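
One interleaving where the "skip if already set" shortcut goes wrong can
be replayed sequentially. This is a sketch under stated assumptions:
struct queue and the functions below are hypothetical, and the flag check
is modelled as happening before the consumer runs, since nothing orders
it against the other accesses:

```c
#include <stdbool.h>

struct queue {
	int pending;		/* items waiting to be handled */
	bool data_available;	/* set by producers, cleared by the consumer */
};

/* Consumer: clear the flag, then handle everything queued so far. */
static void consume(struct queue *q)
{
	q->data_available = false;
	q->pending = 0;
}

/*
 * Producer replayed step by step with the consumer running in the
 * middle. 'skip_if_set' selects the "did somebody else already set
 * this flag" variant.
 */
static void produce_with_consumer_in_between(struct queue *q, bool skip_if_set)
{
	bool already_set = q->data_available;	/* step 1: check the flag */

	consume(q);				/* consumer runs here */

	q->pending++;				/* step 2: add new data */
	if (!(skip_if_set && already_set))
		q->data_available = true;	/* step 3: announce it */
}

/* True if data is queued but nobody has been told about it. */
static bool lost_wakeup(const struct queue *q)
{
	return q->pending > 0 && !q->data_available;
}
```

Starting from a queue whose flag is already set, the unconditional store
leaves the flag set after the new item is added, while the skipping
variant leaves an item queued with the flag clear.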

See why I think idempotent writes are something somewhat special -
they aren't just about writing the same value. They are about only
_ever_ writing the same value (with the caveat obviously being "over
the lifetime of that data structure, and with the initial value being
different", of course).

> That change would take care of the original issue of this email thread,
> wouldn't it?  And it would render WRITE_IDEMPOTENT unnecessary.

So I do think LKMM should say "writes of the same value must obviously
result in the same value in memory afterwards", if it doesn't already.
That's a somewhat trivial case, it's just a special case of the
single-value atomicity issue. I thought the LKMM had that already: if
you have writes of 'x' and 'y' to a variable from two CPUs, all CPUs
are supposed to see _either_ 'x' or 'y', they can't ever see a mix of
the two.

And yes, we've depended on that single-value atomicity historically.

The case where 'x' and 'y' have the same value is just a special case of that
general issue - if two threads write the same value, no CPU can ever
see anything but that value (or the original one). So in that sense,
fundamentally the same value write cannot race with itself.

But that LKMM rule is separate from a rule about a statistical tool like KCSAN.

Should KCSAN then ignore writes of the same value?

Maybe.

Because while that "variable++" data race with the same value is real,
the likelihood of hitting it is small, so a statistical tool like
KCSAN might as well ignore it - the tool would show the data race when
the race _doesn't_ happen, which would be the normal case anyway, and
would be the reason why the race hadn't been noticed by a normal human
being.

So practically speaking, we might say "concurrent writes of the same
value aren't data races" for KCSAN, even though they _could_ be data
races.

And this is where WRITE_IDEMPOTENT would make a possible difference.
In particular, if we make the optimization to do the "read and only
write if changed", two CPUs doing this concurrently would do

   READ 0
   WRITE 1

(for a "flag goes from 0->1" transition) and from a tool perspective,
it would be very hard to know whether this is a race (two threads
doing "variable++") or not (two threads setting a sticky flag).
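
To make the ambiguity concrete, here is a sketch that records the
accesses one CPU performs for each pattern and compares them. The trace
types and function names are hypothetical and have nothing to do with how
KCSAN actually records accesses:

```c
#include <stdbool.h>

enum op { OP_READ, OP_WRITE };

struct access {
	enum op op;
	int value;
};

/* The accesses a tool would observe from one CPU. */
struct trace {
	int n;
	struct access acc[2];
};

/* "variable++" starting from the value this CPU happened to read. */
static struct trace trace_increment(int seen)
{
	struct trace t = {
		.n = 2,
		.acc = { { OP_READ, seen }, { OP_WRITE, seen + 1 } },
	};

	return t;
}

/* Sticky flag with the "read and only write if changed" optimization. */
static struct trace trace_set_sticky(int seen)
{
	struct trace t = { .n = 1, .acc = { { OP_READ, seen } } };

	if (seen != 1) {
		t.acc[1] = (struct access){ OP_WRITE, 1 };
		t.n = 2;
	}
	return t;
}

static bool same_trace(struct trace a, struct trace b)
{
	if (a.n != b.n)
		return false;
	for (int i = 0; i < a.n; i++)
		if (a.acc[i].op != b.acc[i].op ||
		    a.acc[i].value != b.acc[i].value)
			return false;
	return true;
}
```

When both patterns start from 0 the traces are identical (READ 0,
WRITE 1), so the accesses alone cannot tell the tool which one it is
looking at; only an annotation on the write carries that intent.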

So WRITE_IDEMPOTENT would then disambiguate that choice. See what I'm saying?

At the same time, I suspect that it's just simpler to say "if all the
writes we see to this field have the same value, then we will assume
it has idempotent behavior".

Even then the "all writes" would have to know the difference between
initial values and subsequent updates, which apparently isn't obvious
in KCSAN, but I don't know how hacky that kind of logic would be.
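
A rough sketch of what such a per-field heuristic might look like. It is
not how KCSAN works; struct field_watch and its helpers are hypothetical,
and the sketch simply assumes the caller can say which stores are
initialization, which is exactly the part that may be hard in practice:

```c
#include <stdbool.h>

struct field_watch {
	bool have_write;	/* any post-initialization write seen yet? */
	long written;		/* value carried by the first such write */
	bool mixed;		/* a later write carried a different value */
};

static void watch_init(struct field_watch *w)
{
	w->have_write = false;
	w->written = 0;
	w->mixed = false;
}

/* Record one store; initialization stores are deliberately ignored. */
static void watch_write(struct field_watch *w, long val, bool is_init)
{
	if (is_init)
		return;
	if (!w->have_write) {
		w->have_write = true;
		w->written = val;
	} else if (val != w->written) {
		w->mixed = true;
	}
}

/* All updates seen so far stored one and the same value. */
static bool looks_idempotent(const struct field_watch *w)
{
	return w->have_write && !w->mixed;
}
```

A field that is initialized to 0 and then only ever set to 1 passes this
check; the first store of any other value permanently disqualifies it.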

> Making that change would amount to formalizing your requirement that
> the compiler should not invent stores to shared variables.  In C11 such
> invented stores are allowed.

I don't care one whit about C11. Made-up stores to shared data are not
acceptable. Ever. We will turn that off with a compiler switch if the
compiler thinks it can do them, the same way we turn off other
incorrect optimizations like the type-based aliasing or the insane
"signed integer arithmetic can have undefined behavior" stupidity that
the standards people allowed.

I thought that has always been clear. I have not exactly been
ambiguous about my dislike of silly pointless "the standard allows me
to do stupid things".

                 Linus


