LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Boqun Feng <boqun.feng@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Marco Elver <elver@google.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Anders Roxell <anders.roxell@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>, Jann Horn <jannh@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rcu@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] kfence: Avoid stalling work queue task without allocations
Date: Tue, 24 Nov 2020 10:59:45 +0800
Message-ID: <20201124025945.GG286534@boqun-archlinux> (raw)
In-Reply-To: <20201123134227.6df443db@gandalf.local.home>

Hi Steven,

On Mon, Nov 23, 2020 at 01:42:27PM -0500, Steven Rostedt wrote:
> On Mon, 23 Nov 2020 11:28:12 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > I noticed:
> > 
> > 
> > [  237.650900] enabling event benchmark_event
> > 
> > In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
> > the issue goes away. That event kicks off a thread that spins in a tight
> > loop for some time and could possibly cause some issues.
> > 
> > It still shouldn't break things, we can narrow it down if it is the culprit.
> 
> [ Added Thomas  ]
> 
> And that's just one issue. I don't think that has anything to do with the
> other one:
> 
> [ 1614.162007] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1614.168625]  (detected by 0, t=3752 jiffies, g=3529, q=1)
> [ 1614.170825] rcu: All QSes seen, last rcu_preempt kthread activity 242 (4295293115-4295292873), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [ 1614.194272] 
> [ 1614.196673] ================================
> [ 1614.199738] WARNING: inconsistent lock state
> [ 1614.203056] 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21 Not tainted
> [ 1614.207012] --------------------------------
> [ 1614.210125] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 1614.213832] swapper/0/1 [HC0[0]:SC0[0]:HE0:SE1] takes:
> [ 1614.217288] ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
> [ 1614.225496] {IN-HARDIRQ-W} state was registered at:
> [ 1614.229031]   __lock_acquire+0xae8/0x1ac8
> [ 1614.232203]   lock_acquire+0x268/0x508
> [ 1614.235254]   _raw_spin_lock_irqsave+0x78/0x14c
> [ 1614.238547]   rcu_sched_clock_irq+0x7c0/0x17a0
> [ 1614.241757]   update_process_times+0x6c/0xb8
> [ 1614.244950]   tick_sched_handle.isra.0+0x58/0x88
> [ 1614.248225]   tick_sched_timer+0x68/0xe0
> [ 1614.251304]   __hrtimer_run_queues+0x288/0x730
> [ 1614.254516]   hrtimer_interrupt+0x114/0x288
> [ 1614.257650]   arch_timer_handler_virt+0x50/0x70
> [ 1614.260922]   handle_percpu_devid_irq+0x104/0x4c0
> [ 1614.264236]   generic_handle_irq+0x54/0x78
> [ 1614.267385]   __handle_domain_irq+0xac/0x130
> [ 1614.270585]   gic_handle_irq+0x70/0x108
> [ 1614.273633]   el1_irq+0xc0/0x180
> [ 1614.276526]   rcu_irq_exit_irqson+0x40/0x78
> [ 1614.279704]   trace_preempt_on+0x144/0x1a0
> [ 1614.282834]   preempt_schedule_common+0xf8/0x1a8
> [ 1614.286126]   preempt_schedule+0x38/0x40
> [ 1614.289240]   __mutex_lock+0x608/0x8e8
> [ 1614.292302]   mutex_lock_nested+0x3c/0x58
> [ 1614.295450]   static_key_enable_cpuslocked+0x7c/0xf8
> [ 1614.298828]   static_key_enable+0x2c/0x40
> [ 1614.301961]   tracepoint_probe_register_prio+0x284/0x3a0
> [ 1614.305464]   tracepoint_probe_register+0x40/0x58
> [ 1614.308776]   trace_event_reg+0xe8/0x150
> [ 1614.311852]   __ftrace_event_enable_disable+0x2e8/0x608
> [ 1614.315351]   __ftrace_set_clr_event_nolock+0x160/0x1d8
> [ 1614.318809]   __ftrace_set_clr_event+0x60/0x90
> [ 1614.322061]   event_trace_self_tests+0x64/0x12c
> [ 1614.325335]   event_trace_self_tests_init+0x88/0xa8
> [ 1614.328758]   do_one_initcall+0xa4/0x500
> [ 1614.331860]   kernel_init_freeable+0x344/0x3c4
> [ 1614.335110]   kernel_init+0x20/0x16c
> [ 1614.338102]   ret_from_fork+0x10/0x34
> [ 1614.341057] irq event stamp: 3206302
> [ 1614.344123] hardirqs last  enabled at (3206301): [<ffffd9425238da04>] rcu_irq_exit_irqson+0x64/0x78
> [ 1614.348697] hardirqs last disabled at (3206302): [<ffffd942522123c0>] el1_irq+0x80/0x180
> [ 1614.353013] softirqs last  enabled at (3204216): [<ffffd94252210b80>] __do_softirq+0x630/0x6b4
> [ 1614.357509] softirqs last disabled at (3204191): [<ffffd942522c623c>] irq_exit+0x1cc/0x1e0
> [ 1614.361737] 
> [ 1614.361737] other info that might help us debug this:
> [ 1614.365566]  Possible unsafe locking scenario:
> [ 1614.365566] 
> [ 1614.369128]        CPU0
> [ 1614.371747]        ----
> [ 1614.374282]   lock(rcu_node_0);
> [ 1614.378818]   <Interrupt>
> [ 1614.381394]     lock(rcu_node_0);
> [ 1614.385997] 
> [ 1614.385997]  *** DEADLOCK ***
> [ 1614.385997] 
> [ 1614.389613] 5 locks held by swapper/0/1:
> [ 1614.392655]  #0: ffffd9425480e940 (event_mutex){+.+.}-{3:3}, at: __ftrace_set_clr_event+0x48/0x90
> [ 1614.401701]  #1: ffffd9425480a530 (tracepoints_mutex){+.+.}-{3:3}, at: tracepoint_probe_register_prio+0x48/0x3a0
> [ 1614.410973]  #2: ffffd9425476abf0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x24/0x40
> [ 1614.419858]  #3: ffffd94254816348 (jump_label_mutex){+.+.}-{3:3}, at: static_key_enable_cpuslocked+0x7c/0xf8
> [ 1614.429049]  #4: ffffd942547f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
> [ 1614.438029] 
> [ 1614.438029] stack backtrace:
> [ 1614.441436] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #21
> [ 1614.446149] Hardware name: linux,dummy-virt (DT)
> [ 1614.449621] Call trace:
> [ 1614.452337]  dump_backtrace+0x0/0x240
> [ 1614.455372]  show_stack+0x34/0x88
> [ 1614.458306]  dump_stack+0x140/0x1bc
> [ 1614.461258]  print_usage_bug+0x2a0/0x2f0
> [ 1614.464399]  mark_lock.part.0+0x438/0x4e8
> [ 1614.467528]  mark_held_locks+0x54/0x90
> [ 1614.470576]  lockdep_hardirqs_on_prepare+0xe0/0x290
> [ 1614.473935]  trace_hardirqs_on+0x90/0x370
> [ 1614.477045]  el1_irq+0xdc/0x180
> [ 1614.479934]  rcu_irq_exit_irqson+0x40/0x78
> [ 1614.483093]  trace_preempt_on+0x144/0x1a0
> [ 1614.486211]  preempt_schedule_common+0xf8/0x1a8
> [ 1614.489479]  preempt_schedule+0x38/0x40
> [ 1614.492544]  __mutex_lock+0x608/0x8e8
> 
> 
> The above has:
> 
>  preempt_schedule_common() {
>    trace_preempt_on() {
>      <interrupt>
> 	el1_irq:
> 	   handle_arch_irq {
> 	      irq_enter();
> 	      [..]
> 	      irq_exit();
> 	   }
> 	   bl trace_hardirqs_on
> 
> 
> I wonder if the lockdep logic got confused on ARM64 by the rework done to
> lockdep and tracing with respect to irq entry / exit.
> 

I'm also staring at this problem and another thing caused my attention
is that there is a line like the following after the lockdep splat:

[...] BUG: scheduling while atomic ...

, which means preemption count has some inconsistency too.

Given this, a possible case cause this is that we got preempted inside a
rcu_node lock critical section (I know, this is quite impossible, but
preemption count and lockdep data are maintained quite separately, so
it's unlikely they are broken at the same time...)

Will continue to look into this.

Regards,
Boqun

> Or maybe there's an rcu_node leak lock that happened somewhere?
> 
> -- Steve

  reply index

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-10 13:53 Marco Elver
2020-11-10 14:25 ` Dmitry Vyukov
2020-11-10 14:53   ` Marco Elver
2020-11-10 23:23 ` Anders Roxell
2020-11-11  8:29   ` Marco Elver
2020-11-11 13:38     ` Marco Elver
2020-11-11 18:05       ` Steven Rostedt
2020-11-11 18:23         ` Paul E. McKenney
2020-11-11 18:34           ` Marco Elver
2020-11-11 19:21             ` Paul E. McKenney
2020-11-11 20:21               ` Marco Elver
2020-11-12  0:11                 ` Paul E. McKenney
2020-11-12 12:49                   ` Marco Elver
2020-11-12 16:14                     ` Marco Elver
2020-11-12 17:54                       ` Paul E. McKenney
2020-11-12 18:12                         ` Marco Elver
2020-11-12 20:00                           ` Paul E. McKenney
2020-11-13 11:06                             ` Marco Elver
2020-11-13 17:20                               ` Paul E. McKenney
2020-11-13 17:57                         ` Paul E. McKenney
2020-11-17 10:52                           ` Marco Elver
2020-11-17 18:29                             ` Paul E. McKenney
2020-11-18 22:56                               ` Marco Elver
2020-11-18 23:38                                 ` Paul E. McKenney
2020-11-19 12:53                                   ` Marco Elver
2020-11-19 15:14                                     ` Paul E. McKenney
2020-11-19 17:02                                       ` Marco Elver
2020-11-19 18:48                                         ` Paul E. McKenney
2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
2020-11-19 21:35                                             ` Paul E. McKenney
2020-11-19 22:53                                               ` Will Deacon
2020-11-20 10:30                                                 ` Mark Rutland
2020-11-20 14:03                                                   ` Marco Elver
2020-11-23 19:32                                                     ` Mark Rutland
2020-11-24 14:03                                                       ` Marco Elver
2020-11-24 15:01                                                         ` Paul E. McKenney
2020-11-24 19:43                                                           ` Mark Rutland
2020-11-24 20:32                                                             ` Steven Rostedt
2020-11-24 19:30                                                         ` Mark Rutland
2020-11-25  9:45                                                           ` Marco Elver
2020-11-25 10:28                                                             ` Mark Rutland
2020-11-20 14:19                                               ` Marco Elver
2020-11-20 14:39                                                 ` Paul E. McKenney
2020-11-20 15:22                                                   ` Mark Rutland
2020-11-20 17:38                                                     ` Paul E. McKenney
2020-11-20 18:02                                                       ` Mark Rutland
2020-11-20 18:57                                                         ` Paul E. McKenney
2020-11-20 15:26                                                 ` Steven Rostedt
2020-11-20 18:17                                                   ` Marco Elver
2020-11-20 18:57                                                     ` Steven Rostedt
2020-11-20 19:16                                                     ` Steven Rostedt
2020-11-20 19:22                                                       ` Marco Elver
2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
2020-11-23 15:27                                       ` Marco Elver
2020-11-23 16:28                                         ` Steven Rostedt
2020-11-23 16:36                                           ` Steven Rostedt
2020-11-23 18:53                                             ` Marco Elver
2020-11-23 18:42                                           ` Steven Rostedt
2020-11-24  2:59                                             ` Boqun Feng [this message]
2020-11-24  3:44                                               ` Paul E. McKenney
2020-11-11 18:21       ` Paul E. McKenney
2020-11-11 15:01     ` Anders Roxell
2020-11-11 15:22       ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201124025945.GG286534@boqun-archlinux \
    --to=boqun.feng@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anders.roxell@linaro.org \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=glider@google.com \
    --cc=jannh@google.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git