linux-kernel.vger.kernel.org archive mirror
From: Marco Elver <elver@google.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>, Will Deacon <will@kernel.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrey Konovalov <andreyknvl@google.com>,
	Alexander Potapenko <glider@google.com>,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	Paul Turner <pjt@google.com>, Daniel Axtens <dja@axtens.net>,
	Anatol Pomazau <anatol@google.com>,
	Andrea Parri <parri.andrea@gmail.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	LKMM Maintainers -- Akira Yokosawa <akiyks@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Daniel Lustig <dlustig@nvidia.com>,
	Jade Alglave <j.alglave@ucl.ac.uk>,
	Luc Maranget <luc.maranget@inria.fr>
Subject: Re: Kernel Concurrency Sanitizer (KCSAN)
Date: Mon, 23 Sep 2019 13:01:27 +0200
Message-ID: <CANpmjNNX5gJ-CRZ-zg3vNzTcBh6+_zEFQzEGTVFKX-z_KwweVw@mail.gmail.com>
In-Reply-To: <CACT4Y+YfkV80QF2qxjfHnBghM8Am8m_YHzCtPRfSmOrF-y3bbg@mail.gmail.com>

On Mon, 23 Sep 2019 at 10:59, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Mon, Sep 23, 2019 at 10:54 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Mon, Sep 23, 2019 at 10:21:38AM +0200, Dmitry Vyukov wrote:
> > > On Mon, Sep 23, 2019 at 6:31 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > > >
> > > > On Fri, Sep 20, 2019 at 04:54:21PM +0100, Will Deacon wrote:
> > > > > Hi Marco,
> > > > >
> > > > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > > > > We would like to share a new data-race detector for the Linux kernel:
> > > > > > Kernel Concurrency Sanitizer (KCSAN) --
> > > > > > https://github.com/google/ktsan/wiki/KCSAN  (Details:
> > > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > > > > >
> > > > > > To those of you who we mentioned at LPC that we're working on a
> > > > > > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
> > > > > > renamed it to KCSAN to avoid confusion with KTSAN).
> > > > > > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf
> > > > >
> > > > > Oh, spiffy!
> > > > >
> > > > > > In the coming weeks we're planning to:
> > > > > > * Set up a syzkaller instance.
> > > > > > * Share the dashboard so that you can see the races that are found.
> > > > > > * Attempt to send fixes for some races upstream (if you find that the
> > > > > > kcsan-with-fixes branch contains an important fix, please feel free to
> > > > > > point it out and we'll prioritize that).
> > > > >
> > > > > Curious: do you take into account things like alignment and/or access size
> > > > > when looking at READ_ONCE/WRITE_ONCE? Perhaps you could initially prune
> > > > > naturally aligned accesses for which __native_word() is true?
> > > > >
> > > > > > There are a few open questions:
> > > > > > * The big one: most of the reported races are due to unmarked
> > > > > > accesses; prioritization or pruning of races to focus initial efforts
> > > > > > to fix races might be required. Comments on how best to proceed are
> > > > > > welcome. We're aware that these are issues that have recently received
> > > > > > attention in the context of the LKMM
> > > > > > (https://lwn.net/Articles/793253/).
> > > > >
> > > > > This one is tricky. What I think we need to avoid is an onslaught of
> > > > > patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
> > > > > code being modified. My worry is that Joe Developer is eager to get their
> > > > > first patch into the kernel, so runs this tool and starts spamming
> > > > > maintainers with these things to the point that they start ignoring KCSAN
> > > > > reports altogether because of the time they take up.
> > > > >
> > > > > I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
> > > > > to have a comment describing the racy access, a bit like we do for memory
> > > > > barriers. Another possibility would be to use atomic_t more widely if
> > > > > there is genuine concurrency involved.
> > > > >
> > > >
> > > > Instead of commenting READ_ONCE/WRITE_ONCE()s, how about adding
> > > > annotations for data fields/variables that might be accessed without
> > > > holding a lock? Because if all accesses to a variable are protected by
> > > > proper locks, we mostly don't need to worry about data races caused by
> > > > not using READ_ONCE/WRITE_ONCE(). Bad things happen when we write to a
> > > > variable using locks but read it outside a lock critical section for
> > > > better performance, for example, rcu_node::qsmask. I'm thinking so maybe
> > > > we can introduce a new annotation similar to __rcu, maybe call it
> > > > __lockfree ;-) as follows:
> > > >
> > > >         struct rcu_node {
> > > >                 ...
> > > >                 unsigned long __lockfree qsmask;
> > > >                 ...
> > > >         };
> > > >
> > > > , and __lockfree indicates that by design the maintainer of this data
> > > > structure or variable believes there will be accesses outside lock
> > > > critical sections. Note that not all accesses to a __lockfree field need
> > > > to be READ_ONCE/WRITE_ONCE(): if the developer manages to build a
> > > > complex but working wake/wait state machine so that the field cannot be
> > > > accessed concurrently, READ_ONCE()/WRITE_ONCE() is not needed.
> > > >
> > > > If we have such an annotation, I think it won't be hard for configuring
> > > > KCSAN to only examine accesses to variables with this annotation. Also
> > > > this annotation could help other checkers in the future.
> > > >
> > > > If KCSAN (at least the upstream version) only checks accesses with
> > > > such an annotation, "spamming with KCSAN warnings/fixes" will be the
> > > > choice of each maintainer ;-)
> > > >
> > > > Thoughts?
> > >
> > > But doesn't this defeat the main goal of any race detector -- finding
> > > concurrent accesses to complex data structures, e.g. forgotten
> > > spinlock around rbtree manipulation? Since rbtree is not meant for
> > > concurrent accesses, it won't have a __lockfree annotation, and thus we
> > > will ignore races on it...
> >
> > Maybe, but for forgotten locks detection, we already have lockdep and
> > also sparse can help a little.
>
> They don't do this at all, or to the necessary degree.
>
> > Having a __lockfree annotation could be
> > beneficial for KCSAN to focus on checking the accesses whose race
> > conditions could only be detected by KCSAN at this time. I think this
> > could help KCSAN find problems more easily (and faster).

Just to confirm, the annotation is supposed to mean "this variable
should not be accessed concurrently". '__lockfree' may be confusing,
as "lock-free" has a very specific meaning ("lock-free algorithm"),
and I initially thought the annotation means the opposite. Maybe more
intuitive would be '__nonatomic'.

My view, however, is that this will not scale. 1) Our goal is to
*avoid* more annotations if possible. 2) Any such annotation assumes
the developer already understands all concurrently accessed
variables; this may not be the case for the next person touching the
code, resulting in an error. By "whitelisting" variables, we would
likely miss almost every serious bug.

To enable/disable KCSAN for entire subsystems, it's already possible
to use 'KCSAN_SANITIZE := n' in the Makefile, or
'KCSAN_SANITIZE_file.o := n' for individual files.
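
As a rough sketch only, this is how those two settings might look in a
subsystem Makefile. The object names foo.o and bar.o are placeholders
and do not refer to any real file in the tree:

```make
# Hypothetical subsystem Makefile; foo.o and bar.o are placeholder names.
obj-y += foo.o bar.o

# Option 1: do not instrument any object built from this Makefile.
KCSAN_SANITIZE := n

# Option 2 (instead of option 1): leave only foo.o uninstrumented,
# while bar.o is still checked by KCSAN.
# KCSAN_SANITIZE_foo.o := n
```

The per-file form is probably the better default, since it keeps the
rest of the subsystem covered.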

> > Out of curiosity, does KCSAN ever find a problem with forgotten locks
> > involved? I didn't see any in the -with-fixes branch (that's
> > understandable, given the seriousness, the fixes of this kind of
> > problems could already be submitted to upstream once KCSAN found it.)

The sheer volume of 'benign' data-races makes it difficult to filter
through and get to these, but it certainly detects such issues.

Thanks,
-- Marco

> This one comes to mind:
> https://www.spinics.net/lists/linux-mm/msg92677.html
>
> Maybe some others here, but I don't remember which ones now:
> https://github.com/google/ktsan/wiki/KTSAN-Found-Bugs

