rcu.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Waiman Long <longman@redhat.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	rcu@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>
Subject: Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks"
Date: Fri, 6 Mar 2020 22:33:22 -0500	[thread overview]
Message-ID: <F95F95DE-77D9-4A1D-AA5C-CAC165F6B4C8@lca.pw> (raw)
In-Reply-To: <CA9BD318-A8C8-4F22-828A-65C355931A5C@lca.pw>



> On Mar 5, 2020, at 11:06 PM, Qian Cai <cai@lca.pw> wrote:
> 

Using this config,

> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

Reverted the linux-next commit d441dceb5dce (“tick/common: Make tick_periodic() check for missing ticks”)
fixed the lockup that could easily happen during boot.

> 
> [    0.013514][    C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 
> [    0.013514][    C0] Modules linked in: 
> [    0.013514][    C0] irq event stamp: 64186318 
> [    0.013514][    C0] hardirqs last  enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 
> [    0.013514][    C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 
> [    0.013514][    C0] softirqs last  enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 
> [    0.013514][    C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 
> [    0.013514][    C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 
> [    0.013514][    C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 
> [    0.013514][    C0] RIP: 0010:lock_is_held_type+0x12a/0x150 
> [    0.013514][    C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 
> [    0.013514][    C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 
> [    0.013514][    C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 
> [    0.013514][    C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 
> [    0.013514][    C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 
> [    0.013514][    C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 
> [    0.013514][    C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 
> [    0.013514][    C0] FS:  0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 
> [    0.013514][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> [    0.013514][    C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 
> [    0.013514][    C0] Call Trace: 
> [    0.013514][    C0]  rcu_read_lock_sched_held+0xac/0xe0 
> lock_is_held at include/linux/lockdep.h:361
> (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121
> [    0.013514][    C0]  ? rcu_read_lock_bh_held+0xc0/0xc0 
> [    0.013514][    C0]  rcu_note_context_switcx186/0x3b0 
> [    0.013514][    C0]  __schedule+0x21f/0x1070 
> [    0.013514][    C0]  ? __sched_text_start+0x8/0x8 
> [    0.013514][    C0]  schedule+0x95/0x160 
> [    0.013514][    C0]  do_boot_cpu+0x58c/0xaf0 
> [    0.013514][    C0]  native_cpu_up+0x298/0x430 
> [    0.013514][    C0]  ? common_cpu_up+0x150/0x150 
> [    0.013514][    C0]  bringup_cpu+0x44/0x310 
> [    0.013514][    C0]  ? timers_prepare_cpu+0x114/0x190 
> [    0.013514][    C0]  ? takedown_cpu+0x2e0/0x2e0 
> [    0.013514][    C0]  cpuhp_invoke_callback+0x197/0x1120 
> [    0.013514][    C0]  ? ring_buffer_record_is_set_on+0x40/0x40 
> [    0.013514][    C0]  _cpu_up+0x171/0x280 
> [    0.013514][    C0]  do_cpu_up+0xb1/0x120 
> [    0.013514][    C0]  cpu_up+0x13/0x20 
> [    0.013514][    C0]  smp_init+0x91/0x118 
> [    0.013514][    C0]  kernel_init_freeable+0x221/0x4f8 
> [    0.013514][    C0]  ? mark_held_locks+0x34/0xb0 
> [    0.013514][    C0]  ? _raw_spin_unlock_irq+0x27/0x40 
> [    0.013514][    C0]  ? start_kernel+0x876/0x876 
> [    0.013514][    C0]  ? lockdep_hardirqs_on+0x1b0/0x2a0 
> [    0.013514][    C0]  ? _raw_spin_unlock_irq+0x27/0x40 
> [    0.013514][    C0]  ? rest_init+0x307/0x307 
> [    0.013514][    C0]  kernel_init+0x  0.013514][    C0]  ? rest_init+0x307/0x307 
> [    0.013514][    C0]  ret_from_fork+0x3a/0x50 
> 

We could have many slightly different traces,

[    0.000000][    T0] smpboot: CPU 8 Converting physical 0 to logical die 1
[    0.021496][    C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
[    0.021496][    C0] Modules linked in:
[    0.021496][    C0] irq event stamp: 53241496
[    0.021496][    C0] hardirqs last  enabled at (53241495): [<ffffffffa9c8c037>] _raw_spin_unlock_irq+0x27/0x40
[    0.021496][    C0] hardirqs last disabled at (53241496): [<ffffffffa9c80244>] __schedule+0x214/0x1070
[    0.021496][    C0] softirqs last  enabled at (88160): [<ffffffffaa000447>] __do_softirq+0x447/0x766
[    0.021496][    C0] softirqs last disabled at (88153): [<ffffffffa92d0c66>] irq_exit+0xd6/0xf0
[    0.021496][    C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4+ #25
[    0.021496][    C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[    0.021496][    C0] RIP: 0010:__asan_load8+0x0/0xa0
[    0.021496][    C0] Code: e8 03 0f b6 04 30 84 c0 74 c2 38 d0 0f 9e c0 84 c0 74 b9 ba 01 00 00 00 be 04 00 00 00 e8 c8 e3 ff ff 5d c3 66 0f 1f 44 00 00 <55> 48 89 e5 48 8b 4d 08 eb 3a 0f 1f 00 48 b8 00 00 00 00 00 00 00
[    0.021496][    C0] RSP: 0018:ffffc900031779d8 EFLAGS: 00000082
[    0.021496][    C0] RAX: 000000000000000f RBX: ffff88820f118040 RCX: ffffffffa9386e1c
[    0.021496][    C0] RDX: ffff88820f1188b0 RSI: ffff8884534442d8 RDI: ffff88820f118938
[    0.021496][    C0] RBP: ffffc90003177a10 R08: ffffed1041e23009 R09: ffffed1041e23009
[    0.021496][    C0] R10: ffffed1041e23008 R11: 0000000000000000 R12: ffff88820f118928
[    0.021496][    C0] R13: ffff8884534442d8 R14: 0000000000000003 R15: ffff88820f118928
[    0.021496][    C0] FS:  0000000000000000(0000) GS:ffff888453400000(0000) knlGS:0000000000000000
[    0.021496][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.021496][    C0] CR2: ffff88887b9ff000 CR3: 0000000257c12000 CR4: 00000000003406f0
[    0.021496][    C0] Call Trace:
[    0.021496][    C0]  ? match_held_lock+0x20/0x250
[    0.021496][    C0]  lock_unpin_lock+0x16a/0x260
[    0.021496][    C0]  ? lock_repin_lock+0x210/0x210
[    0.021496][    C0]  ? __kasan_check_read+0x11/0x20
[    0.021496][    C0]  ? pick_next_task_fair+0x3a6/0x6b0
[    0.021496][    C0]  __schedule+0xd4f/0x1070
[    0.021496][    C0]  ? firmware_map_remove+0xee/0xee
[    0.021496][    C0]  ? schedule+0xc9/0x160
[    0.021496][    C0]  schedule+0x95/0x160
[    0.021496][    C0]  do_boot_cpu+0x58c/0xaf0
[    0.021496][    C0]  native_cpu_up+0x298/0x430
[    0.021496][    C0]  ? common_cpu_up+0x150/0x150
[    0.021496][    C0]  bringup_cpu+0x44/0x310
[    0.021496][    C0]  ? timers_prepare_cpu+0x114/0x190
[    0.021496][    C0]  ? takedown_cpu+0x2e0/0x2e0
[    0.021496][    C0]  cpuhp_invoke_callback+0x197/0x1120
[    0.021496][    C0]  ? ring_buffer_record_is_set_on+0x40/0x40
[    0.021496][    C0]  _cpu_up+0x171/0x280
[    0.021496][    C0]  do_cpu_up+0xb1/0x120
[    0.021496][    C0]  cpu_up+0x13/0x20
[    0.021496][    C0]  smp_init+0x91/0x118
[    0.021496][    C0]  kernel_init_freeable+0x221/0x4f8
[    0.021496][    C0]  ? mark_held_locks+0x34/0xb0
[    0.021496][    C0]  ? _raw_spin_unlock_irq+0x27/0x40
[    0.021496][    C0]  ? start_kernel+0x857/0x857
[    0.021496][    C0]  ? lockdep_hardirqs_on+0x1b0/0x2a0
[    0.021496][    C0]  ? _raw_spin_unlock_irq+0x27/0x40
[    0.021496][    C0]  ? rest_init+0x307/0x307
[    0.021496][    C0]  kernel_init+0x11/0x139
[    0.021496][    C0]  ? rest_init+0x307/0x307
[    0.021496][    C0]  ret_from_fork+0x27/0x50

[    0.021458][    C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
[    0.021458][    C0] Modules linked in:
[    0.021458][    C0] irq event stamp: 55574034
[    0.021458][    C0] hardirqs last  enabled at (55574033): [<ffffffffa549d4d7>] _raw_spin_unlock_irq+0x27/0x40
[    0.021458][    C0] hardirqs last disabled at (55574034): [<ffffffffa5491754>] __schedule+0x214/0x1070
[    0.021458][    C0] softirqs last  enabled at (83640): [<ffffffffa5800447>] __do_softirq+0x447/0x766
[    0.021458][    C0] softirqs last disabled at (83623): [<ffffffffa4ad2196>] irq_exit+0xd6/0xf0
[    0.021458][    C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4+ #13
[    0.021458][    C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[    0.021458][    C0] RIP: 0010:check_memory_region+0x136/0x200
[    0.021458][    C0] Code: 00 eb 0c 49 83 c0 01 48 89 d8 49 39 d8 74 10 41 80 38 00 74 ee 4b 8d 44 0d 00 4d 85 c0 75 4d 4c 89 e3 48 29 c3 e9 3e ff ff ff <48> 85 db 74 2e 41 80 39 00 75 34 48 b8 01 00 00 00 00 fc ff df 49
[    0.021458][    C0] RSP: 0018:ffffc900031779e8 EFLAGS: 00000083
[    0.021458][    C0] RAX: fffff5200062ef48 RBX: 0000000000000001 RCX: ffffffffa4b94e16
[    0.021458][    C0] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90003177a40
[    0.021458][    C0] RBP: ffffc90003177a00 R08: 1ffff9200062ef48 R09: fffff5200062ef48
[    0.021458][    C0] R10: fffff5200062ef48 R11: ffffc90003177a43 R12: fffff5200062ef49
[    0.021458][    C0] R13: ffffc90003177a80 R14: ffff888453444310 R15: ffff888453444308
[    0.021458][    C0] FS:  0000000000000000(0000) GS:ffff888453400000(0000) knlGS:0000000000000000
[    0.021458][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.021458][    C0] CR2: ffff88887b9ff000 CR3: 00000006e6c12000 CR4: 00000000003406f0
[    0.021458][    C0] Call Trace:
[    0.021458][    C0]  __kasan_check_write+0x14/0x20
[    0.021458][    C0]  do_raw_spin_lock+0xe6/0x1e0
[    0.021458][    C0]  ? rwlock_bug.part.1+0x60/0x60
[    0.021458][    C0]  ? __schedule+0x227/0x1070
[    0.021458][    C0]  _raw_spin_lock+0x37/0x40
[    0.021458][    C0]  ? __schedule+0x227/0x1070
[    0.021458][    C0]  __schedule+0x227/0x1070
[    0.021458][    C0]  ? __sched_text_start+0x8/0x8
[    0.021458][    C0]  schedule+0x95/0x160
[    0.021458][    C0]  do_boot_cpu+0x58c/0xaf0
[    0.021458][    C0]  native_cpu_up+0x298/0x430
[    0.021458][    C0]  ? common_cpu_up+0x150/0x150
[    0.021458][    C0]  bringup_cpu+0x44/0x310
[    0.021458][    C0]  ? timers_prepare_cpu+0x114/0x190
[    0.021458][    C0]  ? takedown_cpu+0x2e0/0x2e0
[    0.021458][    C0]  cpuhp_invoke_callback+0x197/0x1120
[    0.021458][    C0]  ? ring_buffer_record_is_set_on+0x40/0x40
[    0.021458][    C0]  _cpu_up+0x171/0x280
[    0.021458][    C0]  do_cpu_up+0xb1/0x120
[    0.021458][    C0]  cpu_up+0x13/0x20
[    0.021458][    C0]  smp_init+0x91/0x118
[    0.021458][    C0]  kernel_init_freeable+0x221/0x4f8
[    0.021458][    C0]  ? mark_held_locks+0x34/0xb0
[    0.021458][    C0]  ? _raw_spin_unlock_irq+0x27/0x40
[    0.021458][    C0]  ? start_kernel+0x876/0x876
[    0.021458][    C0]  ? lockdep_hardirqs_on+0x1b0/0x2a0
[    0.021458][    C0]  ? _raw_spin_unlock_irq+0x27/0x40
[    0.021458][    C0]  ? rest_init+0x307/0x307
[    0.021458][    C0]  kernel_init+0x11/0x139
[    0.021458][    C0]  ? rest_init+0x307/0x307
[    0.021458][    C0]  ret_from_fork+0x27/0x50




  reply	other threads:[~2020-03-07  3:33 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-06  4:06 PROVE_RCU_LIST + /proc/lockdep warning Qian Cai
2020-03-07  3:33 ` Qian Cai [this message]
2020-03-08 21:18   ` Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" Waiman Long
2020-03-19 13:58   ` Qian Cai
2020-03-19 19:00     ` Waiman Long
2020-03-20  2:00       ` Stephen Rothwell
2020-03-20 11:50         ` Thomas Gleixner
2020-03-07 17:16 ` PROVE_RCU_LIST + /proc/lockdep warning Joel Fernandes
2020-03-09  1:40   ` Boqun Feng
2020-03-10 18:42     ` Qian Cai
2020-03-11  3:30       ` Boqun Feng
2020-03-11 13:24         ` Qian Cai
2020-03-12  0:48           ` Boqun Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F95F95DE-77D9-4A1D-AA5C-CAC165F6B4C8@lca.pw \
    --to=cai@lca.pw \
    --cc=fweisbec@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).