From: "Paul E. McKenney" <paulmck@kernel.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: syzbot <syzbot+de9526ade17c659d8336@syzkaller.appspotmail.com>,
	fweisbec@gmail.com, hch@lst.de, hdanton@sina.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	mingo@kernel.org, syzkaller-bugs@googlegroups.com,
	tglx@linutronix.de, x86@kernel.org
Subject: Re: [syzbot] INFO: rcu detected stall in __hrtimer_run_queues
Date: Wed, 17 Nov 2021 03:59:27 -0800
Message-ID: <20211117115927.GZ641268@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <097631ea-d781-c4e1-aa0e-d921a7a2e69e@kernel.dk>

On Tue, Nov 16, 2021 at 08:42:39AM -0700, Jens Axboe wrote:
> On 11/16/21 8:41 AM, syzbot wrote:
> > syzbot suspects this issue was fixed by commit:
> > 
> > commit b60876296847e6cd7f1da4b8b7f0f31399d59aa1
> > Author: Jens Axboe <axboe@kernel.dk>
> > Date:   Fri Oct 15 21:03:52 2021 +0000
> > 
> >     block: improve layout of struct request
> 
> No functional changes in that patch, so looks like a fluky bisection.

I am seeing intermittent (and rather strange) stall warnings on
v5.16-rc1.  Each is a self-detected stall from the idle loop.  The
reason that this is strange is that the usual reason that a CPU stalls
in the idle loop is a long-running interrupt, in which case
you would expect other CPUs to detect the stall.

Reproduce using RCU's TREE07 scenario, except that the MTBF looks to be
several hundred hours.  But I ran this scenario long enough on v5.15-rc*
to be confident that this stall warning is a regression introduced
recently.

And the reason is that the CPU, despite being in the idle loop, is not
marked as idle from an RCU perspective (see the "idle=d59/0/0x1"):

 rcu:     0-...!: (13 ticks this GP) idle=d59/0/0x1 softirq=281261/281261 fqs=1
  (t=2199037 jiffies g=249449 q=5)
 NMI backtrace for cpu 0
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc1 #4571
 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x33/0x42
  nmi_cpu_backtrace.cold.6+0x30/0x70
  ? lapic_can_unplug_cpu+0x70/0x70
  nmi_trigger_cpumask_backtrace+0xbf/0xd0
  rcu_dump_cpu_stacks+0xc0/0x120
  rcu_sched_clock_irq.cold.110+0x15a/0x312
  ? get_nohz_timer_target+0x60/0x190
  ? lock_timer_base+0x62/0x80
  ? account_process_tick+0xd4/0x160
  ? tick_sched_handle.isra.24+0x40/0x40
  update_process_times+0x8e/0xc0
  tick_sched_handle.isra.24+0x30/0x40
  tick_sched_timer+0x6a/0x80
  __hrtimer_run_queues+0xfc/0x2a0
  hrtimer_interrupt+0x105/0x220
  ? resched_curr+0x1e/0xc0
  __sysvec_apic_timer_interrupt+0x7a/0x160
  sysvec_apic_timer_interrupt+0x85/0xb0
  </IRQ>
  <TASK>
  asm_sysvec_apic_timer_interrupt+0x12/0x20
 RIP: 0010:default_idle+0xb/0x10
 Code: ff 48 89 df e8 16 5c 90 ff eb d7 e8 bf 82 ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d df 92 41 00 fb f4 <c3> 0f 1f 40 00 65 48 8b 04 25 00 ad 01 00 f0 80 48 02 20 48 8b 10
 RSP: 0018:ffffffff9dc03e98 EFLAGS: 00000202
 RAX: ffffffff9d3ed200 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffffff9da3e0a1 RDI: ffffffff9da683ae
 RBP: ffffffff9de86050 R08: 00000000e141ad57 R09: ffffa03c5f229d40
 R10: 0000000000002400 R11: 0000000000002400 R12: 0000000000000000
 R13: 0000000000000000 R14: ffffffffffffffff R15: ffffffff9dc14940
  ? __cpuidle_text_start+0x8/0x8
  ? __cpuidle_text_start+0x8/0x8
  default_idle_call+0x28/0xd0
  do_idle+0x1fb/0x290
  cpu_startup_entry+0x14/0x20
  start_kernel+0x659/0x680
  secondary_startup_64_no_verify+0xc2/0xcb
  </TASK>
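
For anyone reading along, here is a minimal sketch of how the "idle="
field in the stall line above can be picked apart.  It assumes the layout
described in Documentation/RCU/stallwarn.rst: the low-order 12 bits of the
dynticks counter in hex (odd when the CPU is not in dyntick-idle mode),
then the process-level nesting count, then the NMI/irq nesting count.
The helper name decode_idle_field() and the dictionary keys are made up
for illustration and are not part of any kernel or rcutorture tooling.

```python
import re

# Hypothetical helper for illustration only; not a kernel or rcutorture API.
# Assumed field layout: idle=<dynticks low 12 bits, hex>/<nesting>/<nmi nesting, hex>
IDLE_RE = re.compile(r"idle=((?:0x)?[0-9a-f]+)/(-?\d+)/((?:0x)?[0-9a-f]+)")


def decode_idle_field(line):
    """Extract the three idle= components from one stall-warning line."""
    m = IDLE_RE.search(line)
    if m is None:
        raise ValueError("no idle= field found")
    dynticks = int(m.group(1), 16)      # int() accepts an optional 0x prefix
    nesting = int(m.group(2))
    nmi_nesting = int(m.group(3), 16)
    return {
        "dynticks_low12": dynticks,
        "nesting": nesting,
        "nmi_nesting": nmi_nesting,
        # Assumption: an odd counter value means RCU is watching this CPU.
        "rcu_watching": dynticks % 2 == 1,
    }


stall_line = (" rcu:     0-...!: (13 ticks this GP) idle=d59/0/0x1 "
              "softirq=281261/281261 fqs=1")
info = decode_idle_field(stall_line)
print(info)
```

For the line in this report the counter value 0xd59 is odd, which is the
"not marked as idle" condition, even though the backtrace shows the CPU
sitting in default_idle().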

The usual reason for this odd situation is that someone forgot an
irq_enter() or added an extra irq_exit().  Or likewise for a number of
similar functions that tell RCU to start/stop ignoring the current CPU:
nmi_enter(), nmi_exit(), rcu_*_enter(), rcu_*_exit(), and so on.
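
To make the pairing requirement concrete, here is a toy user-space sketch
that scans a recorded sequence of enter/exit events and reports the first
mismatch.  It only borrows the function names from the paragraph above.
The event list, the find_imbalance() helper, and the stack-based
bookkeeping are all assumptions for illustration and are much simpler
than what the kernel actually does.

```python
# Toy model only: the names mirror functions mentioned in the text, but the
# bookkeeping is hypothetical and far simpler than the kernel's.
PAIRS = {"irq_exit": "irq_enter", "nmi_exit": "nmi_enter"}
ENTERS = set(PAIRS.values())


def find_imbalance(events):
    """Return a description of the first pairing error, or None if balanced."""
    stack = []
    for i, ev in enumerate(events):
        if ev in ENTERS:
            stack.append(ev)
        elif ev in PAIRS:
            # An exit must match the most recent unmatched enter.
            if not stack or stack[-1] != PAIRS[ev]:
                return f"event {i}: {ev} without matching {PAIRS[ev]}"
            stack.pop()
        else:
            raise ValueError(f"unknown event {ev!r}")
    if stack:
        return f"end of trace: {stack[-1]} never exited"
    return None


balanced = find_imbalance(["irq_enter", "irq_exit"])
extra_exit = find_imbalance(["irq_enter", "irq_exit", "irq_exit"])
missing_exit = find_imbalance(["nmi_enter"])
print(balanced)
print(extra_exit)
print(missing_exit)
```

A forgotten irq_enter() and an extra irq_exit() look the same to this
checker: both show up as an exit with no matching enter.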

Adding the x86 list on CC.

							Thanx, Paul
