All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Marco Elver <elver@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Anders Roxell <anders.roxell@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>, Jann Horn <jannh@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rcu@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	joel@joelfernandes.org
Subject: Re: [PATCH] kfence: Avoid stalling work queue task without allocations
Date: Fri, 13 Nov 2020 09:20:50 -0800	[thread overview]
Message-ID: <20201113172050.GJ3249@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20201113110604.GA1907210@elver.google.com>

On Fri, Nov 13, 2020 at 12:06:04PM +0100, Marco Elver wrote:
> On Thu, Nov 12, 2020 at 12:00PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 12, 2020 at 07:12:54PM +0100, Marco Elver wrote:
> > > On Thu, Nov 12, 2020 at 09:54AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 12, 2020 at 05:14:39PM +0100, Marco Elver wrote:
> > > > > On Thu, Nov 12, 2020 at 01:49PM +0100, Marco Elver wrote:
> > > > > > On Thu, 12 Nov 2020 at 01:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > [...]
> > > > > > > > This assert didn't fire yet, I just get more of the below. I'll keep
> > > > > > > > rerunning, but am not too hopeful...
> > > > > > >
> > > > > > > Is bisection a possibility?
> > > > > > 
> > > > > > I've been running a bisection for past ~12h, and am making slow
> > > > > > progress. It might be another 12h, but I think it'll get there.
> > > > > 
> > > > > Bisection gave me this:
> > > > > 
> > > > > | git bisect start
> > > > > | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> > > > > | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> > > > > | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> > > > > | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> > > > > | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> > > > > | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> > > > > | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> > > > > | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> > > > > | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> > > > > | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> > > > > | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> > > > > | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> > > > > | # bad: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> > > > > | git bisect bad b790e3afead9357195b6d1e1b6cd9b3521503ad2
> > > > > | # bad: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > > > | git bisect bad 765b512bb3d639bfad7dd43c288ee085236c7267
> > > > > | # good: [01f9e708d9eae6335ae9ff25ab09893c20727a55] tools/rcutorture: Fix BUG parsing of console.log
> > > > 
> > > > So torture.2020.11.06a is OK.
> > > > 
> > > > > | git bisect good 01f9e708d9eae6335ae9ff25ab09893c20727a55
> > > > > | # good: [1be6ab91e2db157faedb7f16ab0636a80745a073] srcu: Take early exit on memory-allocation failure
> > > > 
> > > > As is fixes.2020.11.02a.
> > > > 
> > > > > | git bisect good 1be6ab91e2db157faedb7f16ab0636a80745a073
> > > > > | # good: [65e9eb1ccfe56b41a0d8bfec651ea014968413cb] rcu: Prevent RCU_LOCKDEP_WARN() from swallowing the condition
> > > > 
> > > > And lockdep.2020.11.02a.
> > > > 
> > > > > | git bisect good 65e9eb1ccfe56b41a0d8bfec651ea014968413cb
> > > > > | # good: [c386e29d43728778ddd642fa73cc582bee684171] docs/rcu: Update the call_rcu() API
> > > > 
> > > > And doc.2020.11.06a.
> > > > 
> > > > > | git bisect good c386e29d43728778ddd642fa73cc582bee684171
> > > > > | # good: [27c0f1448389baf7f309b69e62d4b531c9395e88] rcutorture: Make grace-period kthread report match RCU flavor being tested
> > > > 
> > > > And the first three commits of tasks.2020.11.06a.
> > > > 
> > > > > | git bisect good 27c0f1448389baf7f309b69e62d4b531c9395e88
> > > > > | # good: [3fcd6a230fa7d03bffcb831a81b40435c146c12b] x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
> > > > 
> > > > And cpuinfo.2020.11.06a.
> > > > 
> > > > > | git bisect good 3fcd6a230fa7d03bffcb831a81b40435c146c12b
> > > > > | # good: [75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374] rcu-tasks: Make the units of ->init_fract be jiffies
> > > > 
> > > > And the remaining commit of tasks.2020.11.06a.
> > > > 
> > > > > | git bisect good 75dc2da5ecd65bdcbfc4d59b9d9b7342c61fe374
> > > > > | # first bad commit: [765b512bb3d639bfad7dd43c288ee085236c7267] Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.02a', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD
> > > > > 
> > > > > This doesn't look very satisfying, given it's the merge commit. :-/
> > > > 
> > > > So each individual branch is just fine, but the merge of them is not.  Fun.
> > > > 
> > > > These have been passing quite a bit of rcutorture over here, including
> > > > preemptible kernels running !SMP, but admittedly on x86 rather than ARMv8.
> > > 
> > > Note that this is ARMv8 on QEMU on an x86 host i.e. emulated. And it's
> > > really slow as a result. Together with a bunch of debug tools including
> > > lockdep.
> > 
> > Then I don't envy you the bisection process!  ;-)
> > 
> > > > One approach would be to binary-search the combinations of merges.
> > > > Except that there are six of them, so there are 64 combinations, of
> > > > which you have tested only 8 thus far (none, one each, and all).
> > > > 
> > > > But are you sure that the bisection points labeled "good" really are good?
> > > > For example, what is the distribution of first failure times in the
> > > > points labeled "bad" vs. the runtime used to make a "good" determination?
> > > > Alternatively, just try a longer run on each of the commits feeding into
> > > > the merge point.
> > > 
> > > Yeah, I'm having doubts, and this might be even more non-deterministic
> > > that I thought and some 'good' could maybe be 'bad' if I had re-run
> > > them? I don't know. One thing I can try is to make sure I run it more
> > > than once, but I'm definitely not doing that manually, so let me try and
> > > script something so I don't have to hand-hold the bisection overnight.
> > > :-)
> > 
> > I know that feeling.  A similar experience motivated me to upgrade my
> > tooling, with more upgrades in the queue.
> 
> [.....]
> 
> > > > > | [  841.143527]  arch_local_irq_restore+0x4/0x8
> > > > 
> > > > So we are just now restoring interrupts, hence our getting the
> > > > interrupt at this point..
> > > > 
> > > > > | [  841.147612]  trace_preempt_on+0xf4/0x190
> > > > 
> > > > From within the trace code, which is apparently recording the fact
> > > > that preemption is being enabled.
> > > > 
> > > > > | [  841.151656]  preempt_schedule_common+0x12c/0x1b0
> > > > > | [  841.155869]  preempt_schedule.part.88+0x20/0x28
> > > > > | [  841.160036]  preempt_schedule+0x20/0x28
> > > > 
> > > > I was not aware that releasing a raw spinlock could result in a direct
> > > > call to preempt_schedule().
> > > > 
> > > > > | [  841.164051]  _raw_spin_unlock_irq+0x80/0x90
> > > > > | [  841.168139]  rcu_gp_kthread+0xe5c/0x19a8
> > > > 
> > > > So the RCU grace-period kthread has spent many seconds attempting to
> > > > release a lock?  Am I reading this correctly?  Mark Rutland, am I missing
> > > > something here?
> > 
> > And yes, this is the RCU grace-period kthread releasing a lock.
> > 
> > I have no idea why that would take so long.  It is acting like a
> > self-deadlock or similar hang, except that in that case, lockdep should
> > have complained before the RCU CPU stall warning rather than after.
> > 
> > The only thing I can suggest is sprinkling lockdep_assert_irqs_disabled()
> > calls hither and yon.  All of the code that lockdep is complaining about
> > runs in the context of the scheduling-clock interrupt, so interrupts
> > had jolly well be disabled!  ;-)
> > 
> > Rerunning some of the allegedly good bisects might be more productive.
> 
> Oof, so I reran bisection, and this time confirming 3x each good run.
> This is what I get:
> 
> | git bisect start
> | # bad: [c07b306d7fa5680777e2132662d2e6c19fb53579] kfence: Avoid stalling work queue task without allocations
> | git bisect bad c07b306d7fa5680777e2132662d2e6c19fb53579
> | # good: [3cea11cd5e3b00d91caf0b4730194039b45c5891] Linux 5.10-rc2
> | git bisect good 27598e7e73260ed0b2917eb02d4a515ebb578313
> | # good: [3e5acbea719e66ef3be64fe74c99cc905ca697dc] Merge remote-tracking branch 'wireless-drivers-next/master' into master
> | git bisect good 3e5acbea719e66ef3be64fe74c99cc905ca697dc
> | # good: [491a5a9a2fea28353d99621b8abb83b6928b4e36] Merge remote-tracking branch 'sound-asoc/for-next' into master
> | git bisect good 491a5a9a2fea28353d99621b8abb83b6928b4e36
> | # bad: [502f8643d6e21c7e370a0b75131130cc51609055] Merge remote-tracking branch 'phy-next/next' into master
> | git bisect bad 502f8643d6e21c7e370a0b75131130cc51609055
> | # good: [6693cb1fa5ea7b91ec00f9404776a095713face5] Merge remote-tracking branch 'tip/auto-latest' into master
> | git bisect good 6693cb1fa5ea7b91ec00f9404776a095713face5
> | # good: [b790e3afead9357195b6d1e1b6cd9b3521503ad2] Merge branch 'tglx-pc.2020.10.30a' into HEAD
> | git bisect good b790e3afead9357195b6d1e1b6cd9b3521503ad2
> | # bad: [7bd5bb161657717d576798f62b0e8d5b44653139] Merge remote-tracking branch 'drivers-x86/for-next' into master
> | git bisect bad 7bd5bb161657717d576798f62b0e8d5b44653139
> | # bad: [e71eb4c4d42bcf36a3a7ede30fd320d47b3c8cb8] Merge remote-tracking branch 'xen-tip/linux-next' into master
> | git bisect bad e71eb4c4d42bcf36a3a7ede30fd320d47b3c8cb8
> | # bad: [c0a41bf9dbc751692c8cb1a44bfd48e70e8bef7f] docs: Remove redundant "``" from Requirements.rst
> | git bisect bad c0a41bf9dbc751692c8cb1a44bfd48e70e8bef7f
> | # bad: [c293fb8f7de6c2fce11cb01a0218d668df326bcd] torture: Make --kcsan specify lockdep
> | git bisect bad c293fb8f7de6c2fce11cb01a0218d668df326bcd
> | # good: [5068ab7dcb6a526a401054ebe0d416f979efb3e1] rcutorture: Add testing for RCU's global memory ordering
> | git bisect good 5068ab7dcb6a526a401054ebe0d416f979efb3e1
> | # good: [bea68a13bbbdc575a2c868dabd7b454c2eddc618] rcu/segcblist: Add additional comments to explain smp_mb()
> | git bisect good bea68a13bbbdc575a2c868dabd7b454c2eddc618
> | # first bad commit: [c293fb8f7de6c2fce11cb01a0218d668df326bcd] torture: Make --kcsan specify lockdep
> 
> Which clearly is ridiculous! So my guess is this probably had existed
> before, but something in -next is making it more visible.
> 
> Short of giving up, I can try your suggestion of sprinkling
> lockdep_assert_irqs_disabled() everywhere, or if you have a patch to
> apply that would give some other debug output you wanted I can run that
> too.

I don't have a patch, but if you are still seeing lots of workqueue
lockups before the RCU CPU stall warning, I again suggest adding the
backtrace as called out in my earlier email.  The idea here is to see
what is causing these lockups.

I can send a formal patch for this if you wish, but today is crazy,
so I cannot promise it before this evening, Pacific Time.

							Thanx, Paul

  reply	other threads:[~2020-11-13 17:21 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-10 13:53 [PATCH] kfence: Avoid stalling work queue task without allocations Marco Elver
2020-11-10 13:53 ` Marco Elver
2020-11-10 14:25 ` Dmitry Vyukov
2020-11-10 14:25   ` Dmitry Vyukov
2020-11-10 14:53   ` Marco Elver
2020-11-10 14:53     ` Marco Elver
2020-11-10 23:23 ` Anders Roxell
2020-11-10 23:23   ` Anders Roxell
2020-11-11  8:29   ` Marco Elver
2020-11-11  8:29     ` Marco Elver
2020-11-11 13:38     ` Marco Elver
2020-11-11 18:05       ` Steven Rostedt
2020-11-11 18:23         ` Paul E. McKenney
2020-11-11 18:34           ` Marco Elver
2020-11-11 19:21             ` Paul E. McKenney
2020-11-11 20:21               ` Marco Elver
2020-11-12  0:11                 ` Paul E. McKenney
2020-11-12 12:49                   ` Marco Elver
2020-11-12 12:49                     ` Marco Elver
2020-11-12 16:14                     ` Marco Elver
2020-11-12 17:54                       ` Paul E. McKenney
2020-11-12 18:12                         ` Marco Elver
2020-11-12 20:00                           ` Paul E. McKenney
2020-11-13 11:06                             ` Marco Elver
2020-11-13 17:20                               ` Paul E. McKenney [this message]
2020-11-13 17:57                         ` Paul E. McKenney
2020-11-17 10:52                           ` Marco Elver
2020-11-17 18:29                             ` Paul E. McKenney
2020-11-18 22:56                               ` Marco Elver
2020-11-18 23:38                                 ` Paul E. McKenney
2020-11-19 12:53                                   ` Marco Elver
2020-11-19 15:14                                     ` Paul E. McKenney
2020-11-19 17:02                                       ` Marco Elver
2020-11-19 18:48                                         ` Paul E. McKenney
2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
2020-11-19 19:38                                             ` Marco Elver
2020-11-19 21:35                                             ` Paul E. McKenney
2020-11-19 21:35                                               ` Paul E. McKenney
2020-11-19 22:53                                               ` Will Deacon
2020-11-19 22:53                                                 ` Will Deacon
2020-11-20 10:30                                                 ` Mark Rutland
2020-11-20 10:30                                                   ` Mark Rutland
2020-11-20 14:03                                                   ` Marco Elver
2020-11-20 14:03                                                     ` Marco Elver
2020-11-23 19:32                                                     ` Mark Rutland
2020-11-23 19:32                                                       ` Mark Rutland
2020-11-24 14:03                                                       ` Marco Elver
2020-11-24 14:03                                                         ` Marco Elver
2020-11-24 15:01                                                         ` Paul E. McKenney
2020-11-24 15:01                                                           ` Paul E. McKenney
2020-11-24 19:43                                                           ` Mark Rutland
2020-11-24 19:43                                                             ` Mark Rutland
2020-11-24 20:32                                                             ` Steven Rostedt
2020-11-24 20:32                                                               ` Steven Rostedt
2020-11-24 19:30                                                         ` Mark Rutland
2020-11-24 19:30                                                           ` Mark Rutland
2020-11-25  9:45                                                           ` Marco Elver
2020-11-25  9:45                                                             ` Marco Elver
2020-11-25 10:28                                                             ` Mark Rutland
2020-11-25 10:28                                                               ` Mark Rutland
2020-11-20 14:19                                               ` Marco Elver
2020-11-20 14:19                                                 ` Marco Elver
2020-11-20 14:39                                                 ` Paul E. McKenney
2020-11-20 14:39                                                   ` Paul E. McKenney
2020-11-20 15:22                                                   ` Mark Rutland
2020-11-20 15:22                                                     ` Mark Rutland
2020-11-20 17:38                                                     ` Paul E. McKenney
2020-11-20 17:38                                                       ` Paul E. McKenney
2020-11-20 18:02                                                       ` Mark Rutland
2020-11-20 18:02                                                         ` Mark Rutland
2020-11-20 18:57                                                         ` Paul E. McKenney
2020-11-20 18:57                                                           ` Paul E. McKenney
2020-11-20 15:26                                                 ` Steven Rostedt
2020-11-20 15:26                                                   ` Steven Rostedt
2020-11-20 18:17                                                   ` Marco Elver
2020-11-20 18:17                                                     ` Marco Elver
2020-11-20 18:57                                                     ` Steven Rostedt
2020-11-20 18:57                                                       ` Steven Rostedt
2020-11-20 19:16                                                     ` Steven Rostedt
2020-11-20 19:16                                                       ` Steven Rostedt
2020-11-20 19:22                                                       ` Marco Elver
2020-11-20 19:22                                                         ` Marco Elver
2020-11-20 19:22                                                         ` Marco Elver
2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
2020-11-23 15:27                                       ` Marco Elver
2020-11-23 16:28                                         ` Steven Rostedt
2020-11-23 16:36                                           ` Steven Rostedt
2020-11-23 18:53                                             ` Marco Elver
2020-11-23 18:42                                           ` Steven Rostedt
2020-11-24  2:59                                             ` Boqun Feng
2020-11-24  3:44                                               ` Paul E. McKenney
2020-11-11 18:21       ` Paul E. McKenney
2020-11-11 15:01     ` Anders Roxell
2020-11-11 15:01       ` Anders Roxell
2020-11-11 15:22       ` Marco Elver
2020-11-11 15:22         ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201113172050.GJ3249@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=anders.roxell@linaro.org \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=glider@google.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.