Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	Marco Elver <elver@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	syzbot <syzbot+3ef049d50587836c0606@syzkaller.appspotmail.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrea Parri <parri.andrea@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	LKMM Maintainers -- Akira Yokosawa <akiyks@gmail.com>
Subject: Re: KCSAN: data-race in __alloc_file / __alloc_file
Date: Tue, 12 Nov 2019 15:40:51 -0800
Message-ID: <CAHk-=wj_rFOPF9Sw_-h-6J93tP9qO_UOEvd-e02z9+DEDs2kLQ@mail.gmail.com> (raw)
In-Reply-To: <CANn89iKLy-5rnGmVt-nzf6as4MvXgZzSH+BSReXZKpSTjhoWAw@mail.gmail.com>

On Tue, Nov 12, 2019 at 3:18 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Hmm we have the ' volatile'  attribute on jiffies, and it causes
> confusion already :p

The jiffies case is partly historical, but partly also because it's
one of the very very few data structures where 99% of all uses are
unlocked and for convenience reasons we really don't want to force
those legacy cases to do anything special about it.

"jiffies" really is very special for those kinds of legacy reasons.
Look at the kinds of games we play with it on 32-bit architectures:
the "real" storage is "jiffies_64", but nobody actually wants to use
that, and because of load tearing issues it's actually hard to use
too. So what everybody _uses_ is just the low 32 bits, and 'jiffies'
isn't a real variable, it's done with linker tricks in our
vmlinux.lds.S files. So, for example, on sparc32, you find this:

    jiffies = jiffies_64 + 4;

in the vmlinux.lds.S file, because it's big-endian, and the lower 32
bits are at offset 4 from the real 64-bit variable.

Note that to actually read the "true" full 64-bit value, you have to
then call a function that does the proper sequence counter stuff etc.
But nobody really wants it, since what everybody actually _uses_ is
the "time_after(x,jiffies+10)" kind of thing, which is only done in
the wrapping "unsigned long" time base. So the odd "linker tricks with
the atomic low bits marked as a volatile data structure" is actually
exactly what we want, but people should realize that this is not
normal.

So 'jiffies' is really really special.

And absolutely nothing else should use 'volatile' on data structures
and be that special.  In the case of jiffies, the rule ends up being
that nobody should ever write to it (you write to the real
jiffies_64), and the real jiffies_64 thing gets the real locking, and
'jiffies' is a read-only volatile thing.

So "READ_ONCE()" is indeed unnecessary with jiffies, but it won't
hurt. It's not really "confusion" - there's nothing _wrong_ with using
READ_ONCE() on volatile data, but we just normally don't do volatile
data in the kernel (because our normal model is that data is never
really volatile in general - there may be locked and unlocked accesses
to it, so it's stable or volatile depending on context, and thus the
'volatile' goes on the _code_, not on the data structure)

But while jiffies READ_ONCE() accesses isn't _wrong_, it is also not
really paired with any WRITE_ONCE(), because the real update is to
technically not even to the same full data structure.

So don't look to jiffies for how to do things. It's an odd one-off.

That said, for "this might be used racily", if there are annotations
for clang to just make it shut up about one particular field in a
structure, than I think that would be ok. As long as it doesn't then
imply special code generation (outside of the checking, of course).

                 Linus

  reply index

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAHk-=wjB61GNmqpX0BLA5tpL4tsjWV7akaTc2Roth7uGgax+mw@mail.gmail.com>
2019-11-10 16:09 ` Alan Stern
2019-11-10 19:10   ` Marco Elver
2019-11-11 15:51     ` Alan Stern
2019-11-11 16:51       ` Linus Torvalds
2019-11-11 17:52         ` Eric Dumazet
2019-11-11 18:04           ` Linus Torvalds
2019-11-11 18:31             ` Eric Dumazet
2019-11-11 18:44               ` Eric Dumazet
2019-11-11 19:00                 ` Linus Torvalds
2019-11-11 19:13                   ` Eric Dumazet
2019-11-11 20:43                     ` Linus Torvalds
2019-11-11 20:46                       ` Linus Torvalds
2019-11-11 21:53                         ` Eric Dumazet
2019-11-11 23:51                   ` Linus Torvalds
2019-11-12 16:50                     ` Kirill Smelkov
2019-11-12 17:23                       ` Linus Torvalds
2019-11-12 17:36                         ` Linus Torvalds
2019-11-17 18:56                           ` Kirill Smelkov
2019-11-17 19:20                             ` Linus Torvalds
2019-11-11 18:50               ` Linus Torvalds
2019-11-11 18:59                 ` Marco Elver
2019-11-11 18:59                 ` Eric Dumazet
2019-11-10 19:12   ` Linus Torvalds
2019-11-10 19:20     ` Linus Torvalds
2019-11-10 20:44       ` Paul E. McKenney
2019-11-10 21:10         ` Linus Torvalds
2019-11-10 21:31           ` Paul E. McKenney
2019-11-11 14:17         ` Marco Elver
2019-11-11 14:31           ` Paul E. McKenney
2019-11-11 15:10             ` Marco Elver
2019-11-13  0:25               ` Paul E. McKenney
2019-11-12 19:14     ` Alan Stern
2019-11-12 19:47       ` Linus Torvalds
2019-11-12 20:29         ` Alan Stern
2019-11-12 20:58           ` Linus Torvalds
2019-11-12 21:13             ` Linus Torvalds
2019-11-12 22:05               ` Marco Elver
2019-11-12 21:48             ` Alan Stern
2019-11-12 22:07               ` Eric Dumazet
2019-11-12 22:44                 ` Alexei Starovoitov
2019-11-12 23:17                   ` Eric Dumazet
2019-11-12 23:40                     ` Linus Torvalds [this message]
2019-11-13 15:00                       ` Marco Elver
2019-11-13 16:57                         ` Linus Torvalds
2019-11-13 21:33                           ` Marco Elver
2019-11-13 21:50                             ` Alan Stern
2019-11-13 22:48                               ` Marco Elver
2019-11-08 13:16 syzbot
2019-11-08 13:28 ` Eric Dumazet
2019-11-08 17:01   ` Linus Torvalds
2019-11-08 17:22     ` Eric Dumazet
2019-11-08 17:38       ` Linus Torvalds
2019-11-08 17:53         ` Eric Dumazet
2019-11-08 17:55           ` Eric Dumazet
2019-11-08 18:02             ` Eric Dumazet
2019-11-08 18:12               ` Linus Torvalds
2019-11-08 20:30             ` Linus Torvalds
2019-11-08 20:53               ` Eric Dumazet
2019-11-08 21:36                 ` Linus Torvalds
2019-11-08 18:05           ` Linus Torvalds
2019-11-08 18:15             ` Marco Elver
2019-11-08 18:40               ` Linus Torvalds
2019-11-08 19:48                 ` Marco Elver
2019-11-08 20:26                   ` Linus Torvalds
2019-11-08 21:57                     ` Alan Stern
2019-11-08 22:06                       ` Linus Torvalds
2019-11-09 23:08                         ` Alan Stern

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wj_rFOPF9Sw_-h-6J93tP9qO_UOEvd-e02z9+DEDs2kLQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akiyks@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=edumazet@google.com \
    --cc=elver@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=stern@rowland.harvard.edu \
    --cc=syzbot+3ef049d50587836c0606@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git