From: Mark Rutland <mark.rutland@arm.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Marco Elver <elver@google.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Anders Roxell <anders.roxell@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Jann Horn <jannh@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rcu@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
Date: Fri, 20 Nov 2020 18:02:06 +0000
Message-ID: <20201120180206.GF2328@C02TD0UTHF1T.local>
In-Reply-To: <20201120173824.GJ1437@paulmck-ThinkPad-P72>

On Fri, Nov 20, 2020 at 09:38:24AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > > resulting in successful boot.
> > > >
> > > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > > files, but that doesn't seem to work.
> > >
> > > There were similar issues during the x86/entry work. Are the ARM guys
> > > doing arm64/entry work now?
> >
> > I'm currently looking at it. I had been trying to shift things to C for
> > a while, and right now I'm trying to fix the lockdep state tracking,
> > which is requiring untangling lockdep/rcu/tracing.
> >
> > The main issue I see remaining atm is that we don't save/restore the
> > lockdep state over exceptions taken from kernel to kernel. That could
> > result in lockdep thinking IRQs are disabled when they're actually
> > enabled (because code in the nested context might do a save/restore
> > while IRQs are disabled, then return to a context where IRQs are
> > enabled), but AFAICT shouldn't result in the inverse in most cases since
> > the non-NMI handlers all call lockdep_hardirqs_disabled().
> >
> > I'm at a loss to explain the rcu vs ftrace bits, so if you have any
> > pointers to the issues seen with the x86 rework that'd be quite handy.
>
> There were several over a number of months. I especially recall issues
> with the direct-from-idle execution of smp_call_function*() handlers,
> and also with some of the special cases in the entry code, for example,
> reentering the kernel from the kernel. This latter could cause RCU to
> not be watching when it should have been or vice versa.

Ah; those are precisely the cases I'm currently fixing, so if we're
lucky this is an indirect result of one of those rather than a novel
source of pain...

> I would of course be most aware of the issues that impinged on RCU
> and that were located by rcutorture. This is actually not hard to run,
> especially if the ARM bits in the scripting have managed to avoid bitrot.
> The "modprobe rcutorture" approach has fewer dependencies. Either way:
> https://paulmck.livejournal.com/57769.html and later posts.

That is a very good idea. I'd been relying on Syzkaller to tickle the
issue, but the torture infrastructure is a much better fit for this
problem. I hadn't realised how comprehensive the scripting was, thanks
for this!

I'll see about giving that a go once I have the irq-from-idle cases
sorted, as those are very obviously broken if you hack
trace_hardirqs_{on,off}() to check that RCU is watching.

Thanks,
Mark.