linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Anders Roxell <anders.roxell@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>, Jann Horn <jannh@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rcu@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-arm-kernel@lists.infradead.org, boqun.feng@gmail.com,
	tglx@linutronix.de
Subject: Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
Date: Tue, 24 Nov 2020 19:30:34 +0000	[thread overview]
Message-ID: <20201124193034.GB8957@C02TD0UTHF1T.local> (raw)
In-Reply-To: <20201124140310.GA811510@elver.google.com>

On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > > looking at that and I think he is close to having something workable.
> > > > > 
> > > > > Mark -- is there anything Marco and Paul can try out?
> > > > 
> > > > I initially traced some issues back to commit:
> > > > 
> > > >   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > > > 
> > > > ... and that change of semantic could cause us to miss edges in some
> > > > cases, but IIUC mostly where we haven't done the right thing in
> > > > exception entry/return.
> > > > 
> > > > I don't think my patches address this case yet, but my WIP (currently
> > > > just fixing user<->kernel transitions) is at:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > > > 
> > > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > > not immediately clear to me if either of those cases could cause this
> > > > report.
> > > 
> > > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > > problem (still get warnings and then a panic). :-/
> > 
> > I've just updated that branch with a new version which I hope covers
> > kernel<->kernel transitions too. If you get a chance, would you mind
> > giving that a spin?
> > 
> > The HEAD commit should be:
> > 
> >   a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
> 
> Thank you! Your series appears to work and fixes the stalls and
> deadlocks (3 trials)! 

Thanks for testing! I'm glad that appears to work, as it suggests
there's not another massive problem lurking in this area.

While cleaning/splitting that up today, I spotted a couple of new
problems I introduced, and I'm part-way through sorting that out, but
it's not quite ready today after all. :/

Fingers crossed for tomorrow...

> I noticed there are a bunch of warnings in the log
> that might be relevant (see attached).

> [   91.184432] =============================
> [   91.188301] WARNING: suspicious RCU usage
> [   91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G        W        
> [   91.197536] -----------------------------
> [   91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> [   91.206546] 
> [   91.206546] other info that might help us debug this:
> [   91.206546] 
> [   91.211790] 
> [   91.211790] rcu_scheduler_active = 2, debug_locks = 0
> [   91.216454] RCU used illegally from extended quiescent state!
> [   91.220890] no locks held by swapper/0/0.
> [   91.224712] 
> [   91.224712] stack backtrace:
> [   91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [   91.234877] Hardware name: linux,dummy-virt (DT)
> [   91.239032] Call trace:
> [   91.242587]  dump_backtrace+0x0/0x240
> [   91.246500]  show_stack+0x34/0x88
> [   91.250295]  dump_stack+0x140/0x1bc
> [   91.254159]  lockdep_rcu_suspicious+0xe4/0xf8
> [   91.258332]  trace_hardirqs_off+0x214/0x330
> [   91.262462]  trace_graph_return+0x1ac/0x1d8
> [   91.266564]  ftrace_return_to_handler+0xa4/0x170
> [   91.270809]  return_to_handler+0x1c/0x38
> [   91.274826]  default_idle_call+0x94/0x38c
> [   91.278869]  do_idle+0x240/0x290
> [   91.282633]  rest_init+0x1e8/0x2dc
> [   91.286529]  arch_call_rest_init+0x1c/0x28
> [   91.290585]  start_kernel+0x638/0x670

Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
we have to mark that and its callees as noinstr, since it doesn't seem
sane to have ftrace check whether RCU is watching for every function
call. Maybe Paul or Steve can correct me. ;)

If you still have the binary lying around, can you check whether
default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
If you could dump the asm around that, along with whatever faddr2line
tells you, that'd be a great help. 

This looks like it should be reproducible, so I'll enable the boot-time
self tests in my kernel and check whether I see the above too.

Thanks,
Mark.

  parent reply	other threads:[~2020-11-24 19:31 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-10 13:53 [PATCH] kfence: Avoid stalling work queue task without allocations Marco Elver
2020-11-10 14:25 ` Dmitry Vyukov
2020-11-10 14:53   ` Marco Elver
2020-11-10 23:23 ` Anders Roxell
2020-11-11  8:29   ` Marco Elver
2020-11-11 13:38     ` Marco Elver
2020-11-11 18:05       ` Steven Rostedt
2020-11-11 18:23         ` Paul E. McKenney
2020-11-11 18:34           ` Marco Elver
2020-11-11 19:21             ` Paul E. McKenney
2020-11-11 20:21               ` Marco Elver
2020-11-12  0:11                 ` Paul E. McKenney
2020-11-12 12:49                   ` Marco Elver
2020-11-12 16:14                     ` Marco Elver
2020-11-12 17:54                       ` Paul E. McKenney
2020-11-12 18:12                         ` Marco Elver
2020-11-12 20:00                           ` Paul E. McKenney
2020-11-13 11:06                             ` Marco Elver
2020-11-13 17:20                               ` Paul E. McKenney
2020-11-13 17:57                         ` Paul E. McKenney
2020-11-17 10:52                           ` Marco Elver
2020-11-17 18:29                             ` Paul E. McKenney
2020-11-18 22:56                               ` Marco Elver
2020-11-18 23:38                                 ` Paul E. McKenney
2020-11-19 12:53                                   ` Marco Elver
2020-11-19 15:14                                     ` Paul E. McKenney
2020-11-19 17:02                                       ` Marco Elver
2020-11-19 18:48                                         ` Paul E. McKenney
2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
2020-11-19 21:35                                             ` Paul E. McKenney
2020-11-19 22:53                                               ` Will Deacon
2020-11-20 10:30                                                 ` Mark Rutland
2020-11-20 14:03                                                   ` Marco Elver
2020-11-23 19:32                                                     ` Mark Rutland
2020-11-24 14:03                                                       ` Marco Elver
2020-11-24 15:01                                                         ` Paul E. McKenney
2020-11-24 19:43                                                           ` Mark Rutland
2020-11-24 20:32                                                             ` Steven Rostedt
2020-11-24 19:30                                                         ` Mark Rutland [this message]
2020-11-25  9:45                                                           ` Marco Elver
2020-11-25 10:28                                                             ` Mark Rutland
2020-11-20 14:19                                               ` Marco Elver
2020-11-20 14:39                                                 ` Paul E. McKenney
2020-11-20 15:22                                                   ` Mark Rutland
2020-11-20 17:38                                                     ` Paul E. McKenney
2020-11-20 18:02                                                       ` Mark Rutland
2020-11-20 18:57                                                         ` Paul E. McKenney
2020-11-20 15:26                                                 ` Steven Rostedt
2020-11-20 18:17                                                   ` Marco Elver
2020-11-20 18:57                                                     ` Steven Rostedt
2020-11-20 19:16                                                     ` Steven Rostedt
2020-11-20 19:22                                                       ` Marco Elver
2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
2020-11-23 15:27                                       ` Marco Elver
2020-11-23 16:28                                         ` Steven Rostedt
2020-11-23 16:36                                           ` Steven Rostedt
2020-11-23 18:53                                             ` Marco Elver
2020-11-23 18:42                                           ` Steven Rostedt
2020-11-24  2:59                                             ` Boqun Feng
2020-11-24  3:44                                               ` Paul E. McKenney
2020-11-11 18:21       ` Paul E. McKenney
2020-11-11 15:01     ` Anders Roxell
2020-11-11 15:22       ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201124193034.GB8957@C02TD0UTHF1T.local \
    --to=mark.rutland@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anders.roxell@linaro.org \
    --cc=boqun.feng@gmail.com \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=glider@google.com \
    --cc=jannh@google.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).