linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marco Elver <elver@google.com>
To: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: open list <linux-kernel@vger.kernel.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rcu@vger.kernel.org, lkft-triage@lists.linaro.org,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	fweisbec@gmail.com, Arnd Bergmann <arnd@arndb.de>
Subject: Re: BUG: KCSAN: data-race in tick_nohz_next_event / tick_nohz_stop_tick
Date: Fri, 4 Dec 2020 20:53:36 +0100	[thread overview]
Message-ID: <CANpmjNPpOym1eHYQBK4TyGgsDA=WujRJeR3aMpZPa6Y7ahtgKA@mail.gmail.com> (raw)
In-Reply-To: <CA+G9fYsHo-9tmxCKGticDowF8e3d1RkcLamapOgMQqeP6OdEEg@mail.gmail.com>

On Fri, 4 Dec 2020 at 20:04, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> LKFT started testing KCSAN enabled kernel from the linux next tree.
> Here we have found BUG: KCSAN: data-race in tick_nohz_next_event /
> tick_nohz_stop_tick

Thank you for looking into KCSAN. Would it be possible to collect
these reports in a moderation queue for now?

I'm currently trying to work out a strategy on how to best proceed
with all the data races in the kernel. We do know there are plenty. On
syzbot's internal moderation queue, we're currently looking at >300 of
them (some here:
https://syzkaller.appspot.com/upstream?manager=ci2-upstream-kcsan-gce).
Part of this strategy involves prioritizing certain concurrency bug
classes. Let's define the following buckets for concurrency bugs:

A. Data race, where failure due to current compilers is unlikely
(supposedly "benign"); merely marking the accesses
appropriately is sufficient. Finding a crash for these will
require a miscompilation, but otherwise look "benign" at the
C-language level.

B. Race-condition bugs where the bug manifests as a data race,
too -- simply marking things doesn't fix the problem. These
are the types of bugs where a data race would point out a
more severe issue.

C. Race-condition bugs where the bug never manifests as a data
race. An example of these might be 2 threads that acquire the
necessary locks, yet some interleaving of them still results
in a bug (e.g. because the logic inside the critical sections
is buggy). These are harder to detect with KCSAN as-is, and
require using ASSERT_EXCLUSIVE_ACCESS() or
ASSERT_EXCLUSIVE_WRITER() in the right place. See
https://lwn.net/Articles/816854/.

One problem currently is that the kernel has quite a lot type-(A)
reports if we run KCSAN, which makes it harder to identify bugs of type
(B) and (C). My wish for the future is that we can get to a place, where
the kernel has almost no unintentional (A) issues, so that we primarily
find (B) and (C) bugs.

The report below looks to be of type (A). Generally, the best strategy
for resolving these is to send a patch, and not a report. However, be
aware that sometimes it is really quite difficult to say if we're
looking at a type (A) or (B) issue, in which case it may still be fair
to send a report and briefly describe what you think is happening
(because that'll increase the likelihood of getting a response). I
recommend also reading "Developer/Maintainer data-race strategies" in
https://lwn.net/Articles/816854/ -- specifically note "[...] you
should not respond to KCSAN reports by mindlessly adding READ_ONCE(),
data_race(), and WRITE_ONCE(). Instead, a patch addressing a KCSAN
report must clearly identify the fix's approach and why that approach
is appropriate."

I recommend reading https://lwn.net/Articles/816850/ for more details.

> This report is from an x86_64 machine clang-11 linux next 20201201.
> Since we are running for the first time we do not call this regression.
>
> [   47.811425] BUG: KCSAN: data-race in tick_nohz_next_event /
> tick_nohz_stop_tick
> [   47.818738]
> [   47.820239] write to 0xffffffffa4cbe920 of 4 bytes by task 0 on cpu 2:
> [   47.826766]  tick_nohz_stop_tick+0x8b/0x310
> [   47.830951]  tick_nohz_idle_stop_tick+0xcb/0x170
> [   47.835571]  do_idle+0x193/0x250
> [   47.838804]  cpu_startup_entry+0x25/0x30
> [   47.842728]  start_secondary+0xa0/0xb0
> [   47.846482]  secondary_startup_64_no_verify+0xc2/0xcb
> [   47.851531]
> [   47.853034] read to 0xffffffffa4cbe920 of 4 bytes by task 0 on cpu 3:
> [   47.859473]  tick_nohz_next_event+0x165/0x1e0
> [   47.863831]  tick_nohz_get_sleep_length+0x94/0xd0
> [   47.868539]  menu_select+0x250/0xac0
> [   47.872116]  cpuidle_select+0x47/0x50
> [   47.875781]  do_idle+0x17c/0x250
> [   47.879015]  cpu_startup_entry+0x25/0x30
> [   47.882942]  start_secondary+0xa0/0xb0
> [   47.886694]  secondary_startup_64_no_verify+0xc2/0xcb
> [   47.891743]
> [   47.893234] Reported by Kernel Concurrency Sanitizer on:
> [   47.898541] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
> 5.10.0-rc6-next-20201201 #2
> [   47.906017] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.2 05/23/2018

This report should have line numbers, otherwise it's impossible to say
which accesses are racing.

[ For those curious, this is the same report on syzbot's moderation
queue, with line numbers:
https://syzkaller.appspot.com/bug?id=d835c53d1a5e27922fcd1fbefc926a74790156cb
]

Thanks,
-- Marco

  reply	other threads:[~2020-12-04 19:54 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-04 19:03 Naresh Kamboju
2020-12-04 19:53 ` Marco Elver [this message]
2020-12-05 18:18   ` Thomas Gleixner
2020-12-05 23:47     ` Thomas Gleixner
2020-12-07 12:23       ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANpmjNPpOym1eHYQBK4TyGgsDA=WujRJeR3aMpZPa6Y7ahtgKA@mail.gmail.com' \
    --to=elver@google.com \
    --cc=arnd@arndb.de \
    --cc=fweisbec@gmail.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkft-triage@lists.linaro.org \
    --cc=mingo@kernel.org \
    --cc=naresh.kamboju@linaro.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --subject='Re: BUG: KCSAN: data-race in tick_nohz_next_event / tick_nohz_stop_tick' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).