kernel panics with 4.14.X versions

From: Pavlos Parissis @ 2018-04-16 11:54 UTC
  To: stable, linux-kernel

Hi all,

We have observed kernel panics on several Kubernetes master clusters, where we run only the
Kubernetes API services and no application workloads.

Those clusters ran kernel versions 4.14.14 and 4.14.32; we have since switched everything
to kernel version 4.14.32 in an attempt to address the issue.

We have HP and Dell hardware in those clusters, and the network cards also differ:
both bnx2x and mlx5_core are in use.

We also run kernel version 4.14.32 on a different type of workload, software load
balancing with HAProxy, and we don't see any crashes there.

Since the crash happens on different hardware, we think it could be a kernel issue,
but we aren't sure about it. Thus, I am contacting the kernel people to get some
hints that can help us figure out what causes this.

In our Kubernetes clusters, we have instructed the kernel to panic upon soft lockup by setting
'kernel.softlockup_panic=1', 'kernel.hung_task_panic=1' and 'kernel.watchdog_thresh=10',
which is why we see the stack traces. As of today, we have disabled this; I will explain why later.
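
To confirm what a given node is actually running with, the three settings above can be read
back from /proc/sys. The following is only a minimal Python sketch: the helper names
read_sysctl and panic_settings, and the 'root' parameter, are illustrative assumptions and
not part of any existing tool. The dotted-name mapping is naive and only meant for the three
kernel.* keys mentioned here.

```python
from pathlib import Path

# The three sysctls mentioned in this report.
PANIC_SYSCTLS = (
    "kernel.softlockup_panic",
    "kernel.hung_task_panic",
    "kernel.watchdog_thresh",
)


def read_sysctl(name, root="/proc/sys"):
    """Return the value of a sysctl such as 'kernel.watchdog_thresh' as a string.

    The dotted name is mapped onto the /proc/sys hierarchy. 'root' is a
    hypothetical parameter so the helper can be pointed at a test directory.
    """
    return (Path(root) / name.replace(".", "/")).read_text().strip()


def panic_settings(root="/proc/sys"):
    """Return the three settings as a dict of integers."""
    return {name: int(read_sysctl(name, root)) for name in PANIC_SYSCTLS}
```

On a node configured as described, panic_settings() would be expected to report 1, 1 and 10
for the three keys.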

I believe we have two distinct types of panics: one is triggered upon soft lockup, and another
one where the call trace is about the scheduler ("sched: Unexpected reschedule of offline CPU#8!").

Let me walk you through the kernel panics and some observations.

The following series of stack traces occurs when one CPU (CPU 24) is stuck for ~22 seconds.
watchdog_thresh is set to 10 and, as far as I remember, the softlockup threshold is
(2 * watchdog_thresh), so it makes sense to see the kernel crashing after ~20 seconds.
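
As a small worked example of that arithmetic, the Python sketch below pulls the CPU number
and stuck duration out of a soft lockup line and compares the duration with the threshold.
It assumes the (2 * watchdog_thresh) relation recalled above; the function names and the
regular expression are illustrative, not taken from any kernel tooling.

```python
import re

# Matches e.g. "soft lockup - CPU#24 stuck for 22s!"
STUCK_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")


def softlockup_threshold(watchdog_thresh):
    """Soft lockup threshold in seconds, assuming 2 * watchdog_thresh."""
    return 2 * watchdog_thresh


def parse_soft_lockup(line):
    """Return (cpu, stuck_seconds) from a soft lockup line, or None if no match."""
    match = STUCK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("[373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! "
        "[kube-apiserver:24261]")
cpu, stuck = parse_soft_lockup(line)
threshold = softlockup_threshold(10)
print(cpu, stuck, threshold, stuck >= threshold)  # prints: 24 22 20 True
```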

After the stack trace, we have the output of sar for CPU#24, which shows that just before the
crash the system-level CPU utilization went to 100%.

[373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
[373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
pps_core scsi_transport_sas
[373782.516807]  dm_mirror dm_region_hash dm_log dm_mod dax
[373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
[373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
[373782.583441] RIP: 0010:fsnotify+0x197/0x510
[373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
[373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
[373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[373782.703302] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
[373782.721887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
[373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[373782.790043] Call Trace:
[373782.802041]  vfs_write+0x151/0x1b0
[373782.815081]  ? syscall_trace_enter+0x1cd/0x2b0
[373782.829175]  SyS_write+0x55/0xc0
[373782.841870]  do_syscall_64+0x79/0x1b0
[373782.855073]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[373782.869807] RIP: 0033:0x483084
[373782.882293] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[373782.899997] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[373782.917177] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
[373782.934268] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
[373782.951297] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[373782.968208] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[373782.985003] Code: 0f 84 f6 02 00 00 48 8b 45 a0 4d 85 d2 48 8b 00 48 89 45 a8 48 89 45 a0 0f 85
ef 02 00 00 48 8b 45 b0 48 89 45 98 48 83 7d a0 00 <0f> 95 c0 48 83 7d 98 00 0f 95 c2 89 d1 08 c1 0f
84 fc 02 00 00
[373783.024208] Kernel panic - not syncing: softlockup: hung tasks
[373783.039881] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G             L
4.14.32-1.el7.x86_64 #1
[373783.059497] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[373783.077206] Call Trace:
[373783.089115]  <IRQ>
[373783.100422]  dump_stack+0x63/0x88
[373783.113081]  panic+0xe8/0x258
[373783.125109]  watchdog_timer_fn+0x21a/0x230
[373783.138546]  ? watchdog+0x30/0x30
[373783.150870]  __hrtimer_run_queues+0xe7/0x230
[373783.164081]  hrtimer_interrupt+0xa8/0x1a0
[373783.176703]  smp_apic_timer_interrupt+0x6b/0x140
[373783.189788]  apic_timer_interrupt+0x8e/0xa0
[373783.202198]  </IRQ>
[373783.211900] RIP: 0010:fsnotify+0x197/0x510
[373783.223746] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[373783.239434] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
[373783.254599] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[373783.269673] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
[373783.284629] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[373783.299460] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[373783.314200]  ? fsnotify+0x4bb/0x510
[373783.324757]  vfs_write+0x151/0x1b0
[373783.335115]  ? syscall_trace_enter+0x1cd/0x2b0
[373783.346617]  SyS_write+0x55/0xc0
[373783.356735]  do_syscall_64+0x79/0x1b0
[373783.367330]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[373783.379606] RIP: 0033:0x483084
[373783.389540] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[373783.404657] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[373783.419294] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
[373783.433922] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
[373783.448565] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[373783.463128] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[373783.477744] Kernel Offset: disabled
[373783.492343] ---[ end Kernel panic - not syncing: softlockup: hung tasks
[373783.506452] ------------[ cut here ]------------
[373783.518376] WARNING: CPU: 24 PID: 24261 at kernel/sched/core.c:1179 set_task_cpu+0x197/0x1a0
[373783.534730] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
pps_core scsi_transport_sas
[373783.667938]  dm_mirror dm_region_hash dm_log dm_mod dax
[373783.682082] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G             L
4.14.32-1.el7.x86_64 #1
[373783.700753] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[373783.717501] task: ffff882f66d28000 task.stack: ffffc9002120c000
[373783.732386] RIP: 0010:set_task_cpu+0x197/0x1a0
[373783.745458] RSP: 0018:ffff882fbf903b88 EFLAGS: 00010046
[373783.759432] RAX: 0000000000000200 RBX: ffff885fb3cb45c0 RCX: 0000000000000001
[373783.775692] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff885fb3cb45c0
[373783.791999] RBP: ffff882fbf903ba8 R08: 0000000000000000 R09: 0000000000000000
[373783.808362] R10: 0000000000000000 R11: 0000000000000000 R12: ffff885fb3cb516c
[373783.824785] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000022ac0
[373783.841196] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
[373783.858761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[373783.873710] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
[373783.890304] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[373783.906951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[373783.923503] Call Trace:
[373783.934742]  <IRQ>
[373783.945346]  try_to_wake_up+0x16c/0x480
[373783.957961]  default_wake_function+0x12/0x20
[373783.971086]  autoremove_wake_function+0x16/0x60
[373783.984483]  __wake_up_common+0x8f/0x160
[373783.997154]  __wake_up_common_lock+0x7e/0xc0
[373784.010293]  __wake_up+0x13/0x20
[373784.022125]  wake_up_klogd_work_func+0x40/0x60
[373784.035365]  irq_work_run_list+0x53/0x80
[373784.048042]  irq_work_run+0x2c/0x30
[373784.060132]  flush_smp_call_function_queue+0x88/0x110
[373784.074076]  generic_smp_call_function_single_interrupt+0x13/0x30
[373784.089312]  smp_call_function_single_interrupt+0x3a/0xe0
[373784.103788]  call_function_single_interrupt+0x8e/0xa0
[373784.117820] RIP: 0010:panic+0x206/0x258
[373784.130402] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[373784.147325] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[373784.163842] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373784.180394] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
[373784.197041] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
[373784.213609] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[373784.230077]  watchdog_timer_fn+0x21a/0x230
[373784.243095]  ? watchdog+0x30/0x30
[373784.255113]  __hrtimer_run_queues+0xe7/0x230
[373784.267974]  hrtimer_interrupt+0xa8/0x1a0
[373784.280195]  smp_apic_timer_interrupt+0x6b/0x140
[373784.292919]  apic_timer_interrupt+0x8e/0xa0
[373784.304979]  </IRQ>
[373784.314365] RIP: 0010:fsnotify+0x197/0x510
[373784.325739] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[373784.340979] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
[373784.355767] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[373784.370474] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
[373784.385000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[373784.399438] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[373784.413725]  ? fsnotify+0x4bb/0x510
[373784.423875]  vfs_write+0x151/0x1b0
[373784.433861]  ? syscall_trace_enter+0x1cd/0x2b0
[373784.444973]  SyS_write+0x55/0xc0
[373784.454738]  do_syscall_64+0x79/0x1b0
[373784.464901]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[373784.476731] RIP: 0033:0x483084
[373784.486201] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[373784.500878] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[373784.515015] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
[373784.529155] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
[373784.543400] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[373784.557490] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[373784.571578] Code: ff 80 8b ac 08 00 00 04 e9 20 ff ff ff 0f 0b e9 b9 fe ff ff f7 83 84 00 00 00
fd ff ff ff 0f 84 c3 fe ff ff 0f 0b e9 bc fe ff ff <0f> 0b e9 cb fe ff ff 66 90 0f 1f 44 00 00 55 48
89 e5 41 56 49
[373784.605527] ---[ end trace d3faf76bdc3ca403 ]---
[373784.617188] sched: Unexpected reschedule of offline CPU#0!
[373784.629856] ------------[ cut here ]------------
[373784.641694] WARNING: CPU: 24 PID: 24261 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[373784.659370] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
pps_core scsi_transport_sas
[373784.793557]  dm_mirror dm_region_hash dm_log dm_mod dax
[373784.807848] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[373784.826743] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[373784.843685] task: ffff882f66d28000 task.stack: ffffc9002120c000
[373784.858935] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[373784.873706] RSP: 0018:ffff882fbf903b10 EFLAGS: 00010046
[373784.888200] RAX: 000000000000002e RBX: 0000000000000000 RCX: 0000000000000006
[373784.904979] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373784.921626] RBP: ffff882fbf903b10 R08: 0000000000000001 R09: 00000000000006f8
[373784.938313] R10: 0000000000000001 R11: 0000000000000000 R12: ffff882fbf622ac0
[373784.955106] R13: ffff885fb3cb45c0 R14: ffff882fbf903bc8 R15: ffff882fbf622ac0
[373784.971891] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
[373784.989852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[373785.005204] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
[373785.022197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[373785.039227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[373785.056132] Call Trace:
[373785.067623]  <IRQ>
[373785.078506]  resched_curr+0xae/0xd0
[373785.091051]  check_preempt_curr+0x79/0xa0
[373785.104217]  ttwu_do_wakeup+0x1e/0x160
[373785.117171]  ttwu_do_activate+0x7a/0x90
[373785.130058]  try_to_wake_up+0x1e7/0x480
[373785.142959]  default_wake_function+0x12/0x20
[373785.156411]  autoremove_wake_function+0x16/0x60
[373785.170119]  __wake_up_common+0x8f/0x160
[373785.183152]  __wake_up_common_lock+0x7e/0xc0
[373785.196508]  __wake_up+0x13/0x20
[373785.208612]  wake_up_klogd_work_func+0x40/0x60
[373785.222065]  irq_work_run_list+0x53/0x80
[373785.234885]  irq_work_run+0x2c/0x30
[373785.247071]  flush_smp_call_function_queue+0x88/0x110
[373785.261146]  generic_smp_call_function_single_interrupt+0x13/0x30
[373785.276556]  smp_call_function_single_interrupt+0x3a/0xe0
[373785.291300]  call_function_single_interrupt+0x8e/0xa0
[373785.305485] RIP: 0010:panic+0x206/0x258
[373785.318154] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[373785.335001] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[373785.351418] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373785.367776] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
[373785.383990] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
[373785.400019] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[373785.415792]  watchdog_timer_fn+0x21a/0x230
[373785.427910]  ? watchdog+0x30/0x30
[373785.438891]  __hrtimer_run_queues+0xe7/0x230
[373785.450736]  hrtimer_interrupt+0xa8/0x1a0
[373785.462037]  smp_apic_timer_interrupt+0x6b/0x140
[373785.473814]  apic_timer_interrupt+0x8e/0xa0
[373785.485054]  </IRQ>
[373785.493740] RIP: 0010:fsnotify+0x197/0x510
[373785.504592] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[373785.519343] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
[373785.533627] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[373785.547934] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
[373785.562192] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[373785.576431] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[373785.590592]  ? fsnotify+0x4bb/0x510
[373785.600647]  vfs_write+0x151/0x1b0
[373785.610507]  ? syscall_trace_enter+0x1cd/0x2b0
[373785.621459]  SyS_write+0x55/0xc0
[373785.630952]  do_syscall_64+0x79/0x1b0
[373785.640818]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[373785.652319] RIP: 0033:0x483084
[373785.661599] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[373785.676059] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[373785.690181] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
[373785.704317] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
[373785.718448] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[373785.732562] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[373785.746624] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[373785.780531] ---[ end trace d3faf76bdc3ca404 ]---
[373785.792207] sched: Unexpected reschedule of offline CPU#42!
[373785.804993] ------------[ cut here ]------------
[373785.816775] WARNING: CPU: 24 PID: 24261 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[373785.834478] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
pps_core scsi_transport_sas
[373785.968794]  dm_mirror dm_region_hash dm_log dm_mod dax
[373785.983020] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[373786.001870] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[373786.018790] task: ffff882f66d28000 task.stack: ffffc9002120c000
[373786.034031] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[373786.048836] RSP: 0018:ffff882fbf9039e0 EFLAGS: 00010046
[373786.063302] RAX: 000000000000002f RBX: 000000000000002a RCX: 0000000000000006
[373786.080012] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373786.096647] RBP: ffff882fbf9039e0 R08: 0000000000000001 R09: 0000000000000743
[373786.113328] R10: 0000000000000001 R11: 0000000000000000 R12: ffff882fbfb62ac0
[373786.130019] R13: ffff882fb3f61740 R14: ffff882fbf903a98 R15: ffff882fbfb62ac0
[373786.146724] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
[373786.164613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[373786.179892] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
[373786.196879] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[373786.213858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[373786.230669] Call Trace:
[373786.242081]  <IRQ>
[373786.252989]  resched_curr+0xae/0xd0
[373786.265510]  check_preempt_curr+0x79/0xa0
[373786.278628]  ttwu_do_wakeup+0x1e/0x160
[373786.291544]  ttwu_do_activate+0x7a/0x90
[373786.304508]  try_to_wake_up+0x1e7/0x480
[373786.317475]  ? check_preempt_curr+0x79/0xa0
[373786.330755]  default_wake_function+0x12/0x20
[373786.344077]  __wake_up_common+0x8f/0x160
[373786.357105]  __wake_up_locked+0x16/0x20
[373786.369982]  complete+0x42/0x60
[373786.381975]  mlx5_cmd_comp_handler+0x28f/0x4b0 [mlx5_core]
[373786.396534]  mlx5_eq_int+0x1ae/0x550 [mlx5_core]
[373786.410080]  ? __wake_up_common+0x8f/0x160
[373786.423054]  __handle_irq_event_percpu+0x42/0x1a0
[373786.436719]  handle_irq_event_percpu+0x32/0x80
[373786.450184]  handle_irq_event+0x3b/0x60
[373786.462935]  handle_edge_irq+0x95/0x1a0
[373786.475441]  handle_irq+0xb5/0x140
[373786.487323]  ? irq_work_run+0x2c/0x30
[373786.499336]  ? flush_smp_call_function_queue+0x88/0x110
[373786.513191]  do_IRQ+0x48/0xe0
[373786.524434]  common_interrupt+0x8e/0x8e
[373786.536517] RIP: 0010:panic+0x206/0x258
[373786.548351] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff7e
[373786.564290] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[373786.579556] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373786.594559] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
[373786.609374] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
[373786.623990] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[373786.638331]  watchdog_timer_fn+0x21a/0x230
[373786.649202]  ? watchdog+0x30/0x30
[373786.659024]  __hrtimer_run_queues+0xe7/0x230
[373786.669762]  hrtimer_interrupt+0xa8/0x1a0
[373786.680120]  smp_apic_timer_interrupt+0x6b/0x140
[373786.691100]  apic_timer_interrupt+0x8e/0xa0
[373786.701618]  </IRQ>
[373786.709633] RIP: 0010:fsnotify+0x197/0x510
[373786.719960] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[373786.734322] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
[373786.748258] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[373786.762175] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
[373786.776003] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[373786.789766] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[373786.803354]  ? fsnotify+0x4bb/0x510
[373786.812823]  vfs_write+0x151/0x1b0
[373786.822215]  ? syscall_trace_enter+0x1cd/0x2b0
[373786.832724]  SyS_write+0x55/0xc0
[373786.841898]  do_syscall_64+0x79/0x1b0
[373786.851586]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[373786.862893] RIP: 0033:0x483084
[373786.871921] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[373786.886319] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[373786.900279] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
[373786.914247] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
[373786.928229] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[373786.942195] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[373786.956171] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[373786.989819] ---[ end trace d3faf76bdc3ca405 ]---
[373787.001313] sched: Unexpected reschedule of offline CPU#36!
[373787.013940] ------------[ cut here ]------------
[373787.025482] WARNING: CPU: 24 PID: 24261 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[373787.042884] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
pps_core scsi_transport_sas
[373787.175654]  dm_mirror dm_region_hash dm_log dm_mod dax
[373787.189862] CPU: 24 PID: 24261 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[373787.208727] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[373787.225686] task: ffff882f66d28000 task.stack: ffffc9002120c000
[373787.240916] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[373787.255668] RSP: 0018:ffff882fbf9039e0 EFLAGS: 00010046
[373787.270138] RAX: 000000000000002f RBX: 0000000000000024 RCX: 0000000000000006
[373787.286911] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373787.303602] RBP: ffff882fbf9039e0 R08: 0000000000000001 R09: 0000000000000793
[373787.320314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff882fbfaa2ac0
[373787.337037] R13: ffff882fb78bdd00 R14: ffff882fbf903a98 R15: ffff882fbfaa2ac0
[373787.353793] FS:  000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
[373787.371708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[373787.387114] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
[373787.404143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[373787.421146] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[373787.438016] Call Trace:
[373787.449503]  <IRQ>
[373787.460353]  resched_curr+0xae/0xd0
[373787.472913]  check_preempt_curr+0x79/0xa0
[373787.486064]  ttwu_do_wakeup+0x1e/0x160
[373787.499014]  ttwu_do_activate+0x7a/0x90
[373787.511930]  try_to_wake_up+0x1e7/0x480
[373787.524803]  ? check_preempt_curr+0x79/0xa0
[373787.538097]  default_wake_function+0x12/0x20
[373787.551463]  __wake_up_common+0x8f/0x160
[373787.564411]  __wake_up_locked+0x16/0x20
[373787.577191]  complete+0x42/0x60
[373787.589104]  mlx5_cmd_comp_handler+0x28f/0x4b0 [mlx5_core]
[373787.603704]  mlx5_eq_int+0x1ae/0x550 [mlx5_core]
[373787.617258]  ? __wake_up_common+0x8f/0x160
[373787.630170]  __handle_irq_event_percpu+0x42/0x1a0
[373787.643819]  handle_irq_event_percpu+0x32/0x80
[373787.657224]  handle_irq_event+0x3b/0x60
[373787.670045]  handle_edge_irq+0x95/0x1a0
[373787.682656]  handle_irq+0xb5/0x140
[373787.694520]  ? irq_work_run+0x2c/0x30
[373787.706546]  ? flush_smp_call_function_queue+0x88/0x110
[373787.720372]  do_IRQ+0x48/0xe0
[373787.731599]  common_interrupt+0x8e/0x8e
[373787.743630] RIP: 0010:panic+0x206/0x258
[373787.755405] RSP: 0018:ffff882fbf903e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff7e
[373787.771355] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[373787.786634] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff882fbf9169d0
[373787.801646] RBP: ffff882fbf903ef0 R08: 0000000000000001 R09: 00000000000006b1
[373787.816462] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff81e6be9f
[373787.831010] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[373787.845323]  watchdog_timer_fn+0x21a/0x230
[373787.856160]  ? watchdog+0x30/0x30
[373787.866021]  __hrtimer_run_queues+0xe7/0x230
[373787.876785]  hrtimer_interrupt+0xa8/0x1a0
[373787.887167]  smp_apic_timer_interrupt+0x6b/0x140
[373787.898177]  apic_timer_interrupt+0x8e/0xa0
[373787.908668]  </IRQ>
[373787.916761] RIP: 0010:fsnotify+0x197/0x510
[373787.927091] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[373787.941434] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
[373787.955328] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[373787.969286] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
[373787.983117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[373787.996820] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[373788.010389]  ? fsnotify+0x4bb/0x510
[373788.019908]  vfs_write+0x151/0x1b0
[373788.029296]  ? syscall_trace_enter+0x1cd/0x2b0
[373788.039801]  SyS_write+0x55/0xc0
[373788.048985]  do_syscall_64+0x79/0x1b0
[373788.058645]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[373788.069978] RIP: 0033:0x483084
[373788.079028] RSP: 002b:000000c4387e57f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[373788.093401] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[373788.107361] RDX: 00000000000002b3 RSI: 000000c42e27d800 RDI: 000000000000014b
[373788.121337] RBP: 000000c4387e5840 R08: 0000000000000000 R09: 0000000000000000
[373788.135346] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[373788.149304] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[373788.163236] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[373788.196867] ---[ end trace d3faf76bdc3ca406 ]---

------[ sar -f ./sa15 -s 20:16:00 -P 24 ]-----------
Linux 4.14.32-1.el7.x86_64 (foobar)        04/15/2018      _x86_64_        (56 CPU)

08:16:00 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:16:01 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:02 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:03 PM      24      0.99      0.00      0.00      0.00      0.00     99.01
08:16:04 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:05 PM      24      1.00      0.00      0.00      0.00      0.00     99.00
08:16:06 PM      24      3.00      0.00      0.00      0.00      0.00     97.00
08:16:07 PM      24      2.00      0.00      0.00      0.00      0.00     98.00
08:16:08 PM      24      1.00      0.00      1.00      0.00      0.00     98.00
08:16:09 PM      24      0.99      0.00      0.00      0.00      0.00     99.01
08:16:10 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:11 PM      24      1.00      0.00      0.00      0.00      0.00     99.00
08:16:12 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:13 PM      24      1.01      0.00      0.00      0.00      0.00     98.99
08:16:14 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:15 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:16 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:17 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:18 PM      24      0.00      0.00      0.99      0.00      0.00     99.01
08:16:19 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:20 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:21 PM      24      1.00      0.00      0.00      0.00      0.00     99.00
08:16:22 PM      24      0.00      0.00      0.00      0.00      0.00    100.00
08:16:23 PM      24      1.00      0.00     17.00      0.00      0.00     82.00
08:16:24 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:25 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:26 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:27 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:28 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:29 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:30 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:31 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:32 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:33 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:34 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:35 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:36 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:37 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:38 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:39 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:40 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:41 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
08:16:42 PM      24      0.00      0.00    100.00      0.00      0.00      0.00
------[ sar -f ./sa15 -s 20:16:00 -P 24 ]-----------

The following panic is from a different server and shows the same symptom: the kernel panics
due to a soft lockup while CPU#21 is at 100% system-level utilization. In this panic we also see
a transmit queue timeout from the network driver. I believe this is a symptom and not
the cause, as a server with the Mellanox driver (mlx5_core) had a similar soft lockup.

[391838.033960] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 2 timed out
[391838.065545] ------------[ cut here ]------------
[391838.088431] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x22b/0x230
[391838.128800] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[391838.456941] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.32-1.el7.x86_64 #1
[391838.491589] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[391838.524202] task: ffffffff82012480 task.stack: ffffffff82000000
[391838.553322] RIP: 0010:dev_watchdog+0x22b/0x230
[391838.575252] RSP: 0018:ffff88103fa03e60 EFLAGS: 00010246
[391838.601054] RAX: 0000000000000039 RBX: 0000000000000002 RCX: 0000000000000000
[391838.636022] RDX: 0000000000000000 RSI: ffff88103fa169d8 RDI: ffff88103fa169d8
[391838.671651] RBP: ffff88103fa03e90 R08: 0000000000000000 R09: 00000000000004df
[391838.707021] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff881036674000
[391838.758515] R13: 000000000000005b R14: ffff88103667f100 R15: 0000000000000000
[391838.810815] FS:  0000000000000000(0000) GS:ffff88103fa00000(0000) knlGS:0000000000000000
[391838.867323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[391838.912602] CR2: 00007f912eb7fff0 CR3: 000000000200a006 CR4: 00000000003606f0
[391838.964401] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[391839.016170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[391839.067361] Call Trace:
[391839.096085]  <IRQ>
[391839.122674]  ? dev_deactivate_queue.constprop.30+0x60/0x60
[391839.166424]  call_timer_fn+0x37/0x140
[391839.201029]  run_timer_softirq+0x1eb/0x450
[391839.238196]  ? timerqueue_add+0x59/0x90
[391839.273260]  ? ktime_get+0x3e/0xa0
[391839.306253]  __do_softirq+0xd2/0x27c
[391839.340016]  irq_exit+0xd9/0xf0
[391839.371464]  smp_apic_timer_interrupt+0x75/0x140
[391839.410012]  apic_timer_interrupt+0x8e/0xa0
[391839.446764]  </IRQ>
[391839.472682] RIP: 0010:cpuidle_enter_state+0xdd/0x2b0
[391839.512914] RSP: 0018:ffffffff82003e00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[391839.565090] RAX: ffff88103fa22ac0 RBX: ffffe8f000200000 RCX: 000000000000001f
[391839.615998] RDX: 0000000000000000 RSI: fff936788221f82b RDI: 0000000000000000
[391839.666639] RBP: ffffffff82003e38 R08: 000000000000034d R09: 00000000ffffffff
[391839.717691] R10: 000000000000037a R11: 0000000000000008 R12: 0000000000000004
[391839.768401] R13: 0000000000000000 R14: ffffffff8216d980 R15: 0001645fe6c35649
[391839.819280]  cpuidle_enter+0x17/0x20
[391839.852911]  call_cpuidle+0x23/0x40
[391839.885828]  do_idle+0x172/0x1e0
[391839.916662]  cpu_startup_entry+0x73/0x80
[391839.950559]  rest_init+0xaa/0xb0
[391839.981142]  start_kernel+0x4b7/0x4d8
[391840.013407]  ? set_init_arg+0x5a/0x5a
[391840.045237]  x86_64_start_reservations+0x2a/0x2c
[391840.081722]  x86_64_start_kernel+0x72/0x75
[391840.114722]  secondary_startup_64+0xa5/0xb0
[391840.149320] Code: 60 04 00 00 eb 89 4c 89 e7 c6 05 77 bb b2 00 01 e8 6b 38 fd ff 89 d9 48 89 c2
4c 89 e6 48 c7 c7 98 6a ef 81 31 c0 e8 b8 52 a2 ff <0f> 0b eb b9 90 0f 1f 44 00 00 55 48 89 e5 41 57
49 89 d7 41 56
[391840.265586] ---[ end trace c661065d595325a9 ]---
[391842.302965] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[2]:
txdata->tx_pkt_prod(11525) != txdata->tx_pkt_cons(11500)
[391844.388943] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[2]:
txdata->tx_pkt_prod(11525) != txdata->tx_pkt_cons(11500)
[391850.094964] watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [kube-apiserver:60495]
[391850.146079] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[391850.573524] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W
4.14.32-1.el7.x86_64 #1
[391850.634311] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[391850.682799] task: ffff881022172e80 task.stack: ffffc9000b874000
[391850.727891] RIP: 0010:fsnotify+0x218/0x510
[391850.763842] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[391850.820076] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
[391850.873470] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[391850.925414] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
[391850.976777] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[391851.028138] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[391851.079135] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
[391851.135142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[391851.180107] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
[391851.231704] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[391851.283258] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[391851.335898] Call Trace:
[391851.367161]  vfs_write+0x151/0x1b0
[391851.401673]  ? syscall_trace_enter+0x1cd/0x2b0
[391851.440900]  SyS_write+0x55/0xc0
[391851.474214]  do_syscall_64+0x79/0x1b0
[391851.510034]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[391851.551320] RIP: 0033:0x483084
[391851.583001] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[391851.636289] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[391851.688719] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
[391851.740825] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
[391851.792257] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[391851.843292] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[391851.896703] Code: 0f 85 08 02 00 00 48 85 db 41 0f 94 c4 4d 85 ed 0f 94 c1 84 c9 0f 85 ef 02 00
00 8b 4d 90 85 c9 74 26 48 85 db 74 0d f6 43 44 01 <75> 07 c7 43 40 00 00 00 00 4d 85 ed 74 0f 41 f6
45 44 01 75 08
[391852.022198] Kernel panic - not syncing: softlockup: hung tasks
[391852.068204] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[391852.130544] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[391852.180598] Call Trace:
[391852.210411]  <IRQ>
[391852.237477]  dump_stack+0x63/0x88
[391852.270360]  panic+0xe8/0x258
[391852.301307]  watchdog_timer_fn+0x21a/0x230
[391852.337395]  ? watchdog+0x30/0x30
[391852.368943]  __hrtimer_run_queues+0xe7/0x230
[391852.405003]  hrtimer_interrupt+0xa8/0x1a0
[391852.439190]  smp_apic_timer_interrupt+0x6b/0x140
[391852.476151]  apic_timer_interrupt+0x8e/0xa0
[391852.511089]  </IRQ>
[391852.535014] RIP: 0010:fsnotify+0x218/0x510
[391852.568048] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[391852.617533] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
[391852.664520] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[391852.711835] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
[391852.758813] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[391852.805527] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[391852.851877]  ? fsnotify+0x4bb/0x510
[391852.880665]  vfs_write+0x151/0x1b0
[391852.909135]  ? syscall_trace_enter+0x1cd/0x2b0
[391852.942798]  SyS_write+0x55/0xc0
[391852.969978]  do_syscall_64+0x79/0x1b0
[391852.999194]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[391853.035095] RIP: 0033:0x483084
[391853.061289] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[391853.109641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[391853.155956] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
[391853.202552] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
[391853.248842] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[391853.295051] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[391853.341016] Kernel Offset: disabled
[391853.375061] ---[ end Kernel panic - not syncing: softlockup: hung tasks
[391853.419102] sched: Unexpected reschedule of offline CPU#0!
[391853.457084] ------------[ cut here ]------------
[391853.491472] WARNING: CPU: 21 PID: 60495 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[391853.549474] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[391853.967080] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[391854.026457] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[391854.073417] task: ffff881022172e80 task.stack: ffffc9000b874000
[391854.116927] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[391854.158063] RSP: 0018:ffff88103fd43b10 EFLAGS: 00010046
[391854.197408] RAX: 000000000000002e RBX: 0000000000000000 RCX: 0000000000000000
[391854.246409] RDX: 0000000000000000 RSI: ffff88103fd569d8 RDI: ffff88103fd569d8
[391854.295777] RBP: ffff88103fd43b10 R08: 0000000000000000 R09: 0000000000000556
[391854.345373] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88103fa22ac0
[391854.395334] R13: ffff880f8be48000 R14: ffff88103fd43bc8 R15: ffff88103fa22ac0
[391854.444983] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
[391854.498575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[391854.541675] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
[391854.591999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[391854.642263] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[391854.692678] Call Trace:
[391854.719793]  <IRQ>
[391854.744771]  resched_curr+0xae/0xd0
[391854.776585]  check_preempt_curr+0x79/0xa0
[391854.811170]  ttwu_do_wakeup+0x1e/0x160
[391854.844514]  ttwu_do_activate+0x7a/0x90
[391854.877774]  try_to_wake_up+0x1e7/0x480
[391854.910892]  default_wake_function+0x12/0x20
[391854.946665]  autoremove_wake_function+0x16/0x60
[391854.984069]  __wake_up_common+0x8f/0x160
[391855.018321]  __wake_up_common_lock+0x7e/0xc0
[391855.053398]  __wake_up+0x13/0x20
[391855.083708]  wake_up_klogd_work_func+0x40/0x60
[391855.119905]  irq_work_run_list+0x53/0x80
[391855.153377]  irq_work_run+0x2c/0x30
[391855.184508]  flush_smp_call_function_queue+0x88/0x110
[391855.223509]  generic_smp_call_function_single_interrupt+0x13/0x30
[391855.267592]  smp_call_function_single_interrupt+0x3a/0xe0
[391855.308323]  call_function_single_interrupt+0x8e/0xa0
[391855.347202] RIP: 0010:panic+0x206/0x258
[391855.380345] RSP: 0018:ffff88103fd43e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[391855.431894] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[391855.481301] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88103fd569d0
[391855.530810] RBP: ffff88103fd43ef0 R08: 0000000000000000 R09: 0000000000000555
[391855.579985] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffffffff81e6be9f
[391855.629525] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[391855.677925]  watchdog_timer_fn+0x21a/0x230
[391855.711211]  ? watchdog+0x30/0x30
[391855.740236]  __hrtimer_run_queues+0xe7/0x230
[391855.773231]  hrtimer_interrupt+0xa8/0x1a0
[391855.804713]  smp_apic_timer_interrupt+0x6b/0x140
[391855.838740]  apic_timer_interrupt+0x8e/0xa0
[391855.870671]  </IRQ>
[391855.892208] RIP: 0010:fsnotify+0x218/0x510
[391855.922974] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[391855.970885] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
[391856.016803] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[391856.062423] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
[391856.108153] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[391856.153683] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[391856.200197]  ? fsnotify+0x4bb/0x510
[391856.228102]  vfs_write+0x151/0x1b0
[391856.256421]  ? syscall_trace_enter+0x1cd/0x2b0
[391856.288496]  SyS_write+0x55/0xc0
[391856.314643]  do_syscall_64+0x79/0x1b0
[391856.342704]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[391856.377545] RIP: 0033:0x483084
[391856.402822] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[391856.449735] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[391856.494804] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
[391856.540308] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
[391856.585743] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[391856.630940] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[391856.676366] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[391856.792915] ---[ end trace c661065d595325aa ]---
[391856.826793] ------------[ cut here ]------------
[391856.860523] WARNING: CPU: 21 PID: 60495 at kernel/sched/core.c:1179 set_task_cpu+0x197/0x1a0
[391856.913620] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[391857.333766] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[391857.393681] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[391857.440546] task: ffff881022172e80 task.stack: ffffc9000b874000
[391857.484076] RIP: 0010:set_task_cpu+0x197/0x1a0
[391857.520542] RSP: 0018:ffff88103fd43ae8 EFLAGS: 00010046
[391857.560948] RAX: 0000000000000200 RBX: ffff881038cb45c0 RCX: 0000000000000001
[391857.610782] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff881038cb45c0
[391857.660456] RBP: ffff88103fd43b08 R08: 0000000000000008 R09: 0000000000000000
[391857.710401] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff881038cb516c
[391857.760003] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000022ac0
[391857.809282] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
[391857.863581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[391857.906806] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
[391857.956620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[391858.007011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[391858.057596] Call Trace:
[391858.085525]  <IRQ>
[391858.110876]  try_to_wake_up+0x16c/0x480
[391858.145085]  ? resched_curr+0xae/0xd0
[391858.178173]  default_wake_function+0x12/0x20
[391858.214468]  __wake_up_common+0x8f/0x160
[391858.248941]  __wake_up_locked+0x16/0x20
[391858.283175]  ep_poll_callback+0xd0/0x300
[391858.316965]  __wake_up_common+0x8f/0x160
[391858.351271]  __wake_up_common_lock+0x7e/0xc0
[391858.387289]  __wake_up+0x13/0x20
[391858.417695]  wake_up_klogd_work_func+0x40/0x60
[391858.454575]  irq_work_run_list+0x53/0x80
[391858.488737]  irq_work_run+0x2c/0x30
[391858.520329]  flush_smp_call_function_queue+0x88/0x110
[391858.559946]  generic_smp_call_function_single_interrupt+0x13/0x30
[391858.603988]  smp_call_function_single_interrupt+0x3a/0xe0
[391858.645713]  call_function_single_interrupt+0x8e/0xa0
[391858.685706] RIP: 0010:panic+0x206/0x258
[391858.720431] RSP: 0018:ffff88103fd43e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[391858.772695] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[391858.822759] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88103fd569d0
[391858.872167] RBP: ffff88103fd43ef0 R08: 0000000000000000 R09: 0000000000000555
[391858.921420] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffffffff81e6be9f
[391858.971071] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[391859.020677]  watchdog_timer_fn+0x21a/0x230
[391859.054291]  ? watchdog+0x30/0x30
[391859.083991]  __hrtimer_run_queues+0xe7/0x230
[391859.118087]  hrtimer_interrupt+0xa8/0x1a0
[391859.150361]  smp_apic_timer_interrupt+0x6b/0x140
[391859.185167]  apic_timer_interrupt+0x8e/0xa0
[391859.217429]  </IRQ>
[391859.239165] RIP: 0010:fsnotify+0x218/0x510
[391859.269961] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[391859.317370] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
[391859.363263] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[391859.409279] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
[391859.455080] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[391859.500518] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[391859.546063]  ? fsnotify+0x4bb/0x510
[391859.574081]  vfs_write+0x151/0x1b0
[391859.601468]  ? syscall_trace_enter+0x1cd/0x2b0
[391859.634055]  SyS_write+0x55/0xc0
[391859.660517]  do_syscall_64+0x79/0x1b0
[391859.688919]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[391859.723536] RIP: 0033:0x483084
[391859.748891] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[391859.796455] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[391859.841781] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
[391859.887303] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
[391859.932494] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[391859.977838] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[391860.023361] Code: ff 80 8b ac 08 00 00 04 e9 20 ff ff ff 0f 0b e9 b9 fe ff ff f7 83 84 00 00 00
fd ff ff ff 0f 84 c3 fe ff ff 0f 0b e9 bc fe ff ff <0f> 0b e9 cb fe ff ff 66 90 0f 1f 44 00 00 55 48
89 e5 41 56 49
[391860.138078] ---[ end trace c661065d595325ab ]---
[391860.172166] sched: Unexpected reschedule of offline CPU#8!
[391860.210690] ------------[ cut here ]------------
[391860.244671] WARNING: CPU: 21 PID: 60495 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[391860.303820] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate iTCO_wdt iTCO_vendor_support
intel_rapl_perf sg hpilo hpwdt ipmi_si pcspkr lpc_ich ioatdma ipmi_devintf dca mfd_core i2c_i801
shpchp wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod bnx2x mdio drm
libcrc32c crc32c_intel hpsa ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[391860.726277] CPU: 21 PID: 60495 Comm: kube-apiserver Tainted: G        W    L
4.14.32-1.el7.x86_64 #1
[391860.786402] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[391860.834206] task: ffff881022172e80 task.stack: ffffc9000b874000
[391860.878669] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[391860.920832] RSP: 0018:ffff88103fd43b08 EFLAGS: 00010046
[391860.961851] RAX: 000000000000002e RBX: ffff881038cb45c0 RCX: 0000000000000006
[391861.012094] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88103fd569d0
[391861.062447] RBP: ffff88103fd43b08 R08: 0000000000000000 R09: 00000000000005e8
[391861.112691] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff881038cb516c
[391861.163322] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000022ac0
[391861.213440] FS:  000000c42be02090(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
[391861.268665] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[391861.311928] CR2: 00007f5c3c0690c0 CR3: 0000000fc47c4004 CR4: 00000000003606e0
[391861.362717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[391861.414065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[391861.464505] Call Trace:
[391861.492319]  <IRQ>
[391861.517992]  try_to_wake_up+0x405/0x480
[391861.551956]  default_wake_function+0x12/0x20
[391861.588252]  __wake_up_common+0x8f/0x160
[391861.622982]  __wake_up_locked+0x16/0x20
[391861.657272]  ep_poll_callback+0xd0/0x300
[391861.691535]  __wake_up_common+0x8f/0x160
[391861.726097]  __wake_up_common_lock+0x7e/0xc0
[391861.762240]  __wake_up+0x13/0x20
[391861.793096]  wake_up_klogd_work_func+0x40/0x60
[391861.830133]  irq_work_run_list+0x53/0x80
[391861.864538]  irq_work_run+0x2c/0x30
[391861.896744]  flush_smp_call_function_queue+0x88/0x110
[391861.936872]  generic_smp_call_function_single_interrupt+0x13/0x30
[391861.981074]  smp_call_function_single_interrupt+0x3a/0xe0
[391862.022733]  call_function_single_interrupt+0x8e/0xa0
[391862.062300] RIP: 0010:panic+0x206/0x258
[391862.096123] RSP: 0018:ffff88103fd43e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[391862.148335] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[391862.197879] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88103fd569d0
[391862.247474] RBP: ffff88103fd43ef0 R08: 0000000000000000 R09: 0000000000000555
[391862.296985] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffffffff81e6be9f
[391862.346312] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ee6b2800
[391862.395985]  watchdog_timer_fn+0x21a/0x230
[391862.430116]  ? watchdog+0x30/0x30
[391862.460248]  __hrtimer_run_queues+0xe7/0x230
[391862.494845]  hrtimer_interrupt+0xa8/0x1a0
[391862.527650]  smp_apic_timer_interrupt+0x6b/0x140
[391862.563130]  apic_timer_interrupt+0x8e/0xa0
[391862.596032]  </IRQ>
[391862.618884] RIP: 0010:fsnotify+0x218/0x510
[391862.650285] RSP: 0018:ffffc9000b877db8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[391862.698849] RAX: ffff882001c77a98 RBX: ffff882001c77a70 RCX: 0000000000000002
[391862.744636] RDX: 0000000000028400 RSI: 0000000000000002 RDI: ffffffff8269a4e0
[391862.791246] RBP: ffffc9000b877e98 R08: 0000000000000000 R09: 0000000000000000
[391862.837248] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[391862.883324] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[391862.928937]  ? fsnotify+0x4bb/0x510
[391862.957183]  vfs_write+0x151/0x1b0
[391862.984840]  ? syscall_trace_enter+0x1cd/0x2b0
[391863.017128]  SyS_write+0x55/0xc0
[391863.043812]  do_syscall_64+0x79/0x1b0
[391863.072403]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[391863.107687] RIP: 0033:0x483084
[391863.133412] RSP: 002b:000000c43197d7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[391863.180683] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000483084
[391863.226639] RDX: 00000000000002a9 RSI: 000000c424283c00 RDI: 0000000000000040
[391863.272308] RBP: 000000c43197d840 R08: 0000000000000000 R09: 0000000000000000
[391863.317590] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[391863.363056] R13: 00000000000000f2 R14: 0000000000000032 R15: 0000000000000002
[391863.409871] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[391863.522945] ---[ end trace c661065d595325ac ]---

----[ sar -f ./sa16 -s 04:25:50 -e 05:00:00 -P 21 ]----
Linux 4.14.32-1.el7.x86_64 (foobar)        04/16/2018      _x86_64_        (32 CPU)

04:25:50 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:25:51 AM      21      0.00      0.00      0.00      0.00      0.00    100.00
04:25:52 AM      21      1.00      0.00      1.00      0.00      0.00     98.00
04:25:53 AM      21      0.00      0.00      0.00      0.00      0.00    100.00
04:25:54 AM      21      1.00      0.00      0.00      0.00      0.00     99.00
04:25:55 AM      21      0.00      0.00     70.71      0.00      0.00     29.29
04:25:56 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:25:57 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:25:58 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:25:59 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:26:00 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:26:01 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:26:02 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:26:03 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
----[ sar -f ./sa16 -s 04:25:50 -e 05:00:00 -P 21 ]----
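
A minimal sketch of how the saturation window could be pulled out of per-CPU sar output
like the dumps above. It assumes the 12-hour "HH:MM:SS AM/PM" column layout shown here; the
function name saturated_run and the 99% threshold are illustrative choices, not part of sar.

```python
from typing import List, Optional, Tuple


def saturated_run(sar_lines: List[str], threshold: float = 99.0) -> Optional[Tuple[str, int]]:
    """Return (start_time, sample_count) for the trailing run of samples whose
    %system is at or above `threshold`, or None if the last sample is below it."""
    samples = []
    for line in sar_lines:
        fields = line.split()
        # Data rows: time, AM/PM, CPU, %user, %nice, %system, %iowait, %steal, %idle
        if len(fields) != 9 or fields[1] not in ("AM", "PM") or not fields[2].isdigit():
            continue  # skip the banner, the column header and the marker lines
        samples.append((f"{fields[0]} {fields[1]}", float(fields[5])))

    run_start = None
    count = 0
    # Walk backwards from the last sample until %system drops below the threshold.
    for stamp, system in reversed(samples):
        if system < threshold:
            break
        run_start = stamp
        count += 1
    return (run_start, count) if count else None


# Excerpt of the CPU#21 samples shown above.
SAMPLE = """\
04:25:50 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:25:54 AM      21      1.00      0.00      0.00      0.00      0.00     99.00
04:25:55 AM      21      0.00      0.00     70.71      0.00      0.00     29.29
04:25:56 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
04:25:57 AM      21      0.00      0.00    100.00      0.00      0.00      0.00
"""

print(saturated_run(SAMPLE.splitlines()))
```

Run over the complete dump for CPU#21, the same function would report when %system reached
100% and for how many one-second samples it stayed there before the data ends.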

The fact that we see one CPU spinning at 100% system-level utilization in all of the above
crashes is a good thing, as we can use it as a starting point for our investigation. We just need
to find out which process (kernel, hardware, network driver or userland application) makes a
single CPU get stuck. Thus, we disabled the trigger that panics the kernel when a soft lockup
occurs, and we hope we can find out which process it is.
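
With the panic disabled, the soft lockup reports should still land in the kernel log. A rough
sketch of collecting them from a saved log follows. It assumes the message format seen in the
trace above ("soft lockup - CPU#N stuck for Ns! [comm:pid]"); the regular expression and the
function name find_soft_lockups are my own and hypothetical.

```python
import re
from typing import Dict, List

# Matches e.g. "watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [kube-apiserver:60495]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_lines: List[str]) -> List[Dict[str, object]]:
    """Return one record per soft lockup report found in the given log lines."""
    hits = []
    for line in log_lines:
        match = LOCKUP_RE.search(line)
        if match:
            hits.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return hits


LOG = [
    "[391838.033960] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 2 timed out",
    "[391850.094964] watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [kube-apiserver:60495]",
]

for hit in find_soft_lockups(LOG):
    print(hit)
```

Grouping the records by comm and cpu across several servers would show whether the same
process is on the CPU every time a lockup is reported.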

The following panic is of the second type I mentioned, where we don't
observe soft lockups and CPU utilization is close to zero before the crash.

[123379.816452] perf: interrupt took too long (6243 > 6231), lowering
kernel.perf_event_max_sample_rate to 32000
[295349.255065] general protection fault: 0000 [#1] SMP PTI
[295349.281440] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[295349.615070] CPU: 26 PID: 1384 Comm: thread.rb:70 Not tainted 4.14.32-1.el7.x86_64 #1
[295349.654011] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[295349.686931] task: ffff882035430000 task.stack: ffffc90007bb4000
[295349.716421] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
[295349.744812] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
[295349.771654] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
[295349.807690] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
[295349.843664] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
[295349.879868] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
[295349.916097] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
[295349.951868] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
[295349.993039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[295350.021664] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
[295350.057534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[295350.093663] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[295350.129254] Call Trace:
[295350.141644]  kmem_cache_alloc+0x9c/0x1b0
[295350.161581]  ? fsnotify_add_mark_locked+0x153/0x320
[295350.186330]  fsnotify_add_mark_locked+0x153/0x320
[295350.210023]  SyS_inotify_add_watch+0x2d5/0x350
[295350.232414]  do_syscall_64+0x79/0x1b0
[295350.250528]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[295350.275482] RIP: 0033:0x7f3f53f409b7
[295350.293330] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
[295350.330889] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
[295350.365971] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
[295350.400949] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
[295350.436090] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
[295350.471552] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
[295350.507348] Code: 31 d2 e8 b3 ea ff ff 5b 41 5c 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00
0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 0a 48 63 07 <48> 8b 04 06 0f 18 08 5d c3 66 0f 1f 44 00 00 0f
1f 44 00 00 48
[295350.601490] RIP: prefetch_freepointer.isra.63+0x11/0x20 RSP: ffffc90007bb7e08
[295350.637891] ---[ end trace 97f09d2dbcdbfe07 ]---
[295350.666426] Kernel panic - not syncing: Fatal exception
[295350.692470] Kernel Offset: disabled
[295350.715267] ---[ end Kernel panic - not syncing: Fatal exception
[295350.745027] ------------[ cut here ]------------
[295350.767882] WARNING: CPU: 26 PID: 1384 at kernel/sched/core.c:1179 set_task_cpu+0x197/0x1a0
[295350.809229] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[295351.141701] CPU: 26 PID: 1384 Comm: thread.rb:70 Tainted: G      D         4.14.32-1.el7.x86_64 #1
[295351.186528] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[295351.219763] task: ffff882035430000 task.stack: ffffc90007bb4000
[295351.249425] RIP: 0010:set_task_cpu+0x197/0x1a0
[295351.272046] RSP: 0018:ffff88203f483cd8 EFLAGS: 00010046
[295351.298021] RAX: 0000000000000200 RBX: ffff880fc6730000 RCX: 0000000000000001
[295351.333003] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff880fc6730000
[295351.368440] RBP: ffff88203f483cf8 R08: 0000000000000008 R09: 0000000000000000
[295351.404295] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880fc6730bac
[295351.440065] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000022ac0
[295351.475936] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
[295351.516850] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[295351.545941] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
[295351.581551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[295351.616790] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[295351.652332] Call Trace:
[295351.664980]  <IRQ>
[295351.675389]  try_to_wake_up+0x16c/0x480
[295351.694771]  default_wake_function+0x12/0x20
[295351.716287]  autoremove_wake_function+0x16/0x60
[295351.738731]  __wake_up_common+0x8f/0x160
[295351.758434]  __wake_up_common_lock+0x7e/0xc0
[295351.780379]  __wake_up+0x13/0x20
[295351.796700]  wake_up_klogd_work_func+0x40/0x60
[295351.818797]  irq_work_run_list+0x53/0x80
[295351.838265]  ? tick_sched_do_timer+0x70/0x70
[295351.859777]  irq_work_tick+0x40/0x50
[295351.877976]  update_process_times+0x42/0x60
[295351.899104]  tick_sched_handle+0x2d/0x60
[295351.919406]  tick_sched_timer+0x39/0x70
[295351.938722]  __hrtimer_run_queues+0xe7/0x230
[295351.960148]  hrtimer_interrupt+0xa8/0x1a0
[295351.979989]  smp_apic_timer_interrupt+0x6b/0x140
[295352.003308]  apic_timer_interrupt+0x8e/0xa0
[295352.024371]  </IRQ>
[295352.035497] RIP: 0010:panic+0x206/0x258
[295352.055056] RSP: 0018:ffffc90007bb7c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[295352.092974] RAX: 0000000000000034 RBX: 0000000000000200 RCX: 0000000000000006
[295352.129345] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
[295352.164888] RBP: ffffc90007bb7cc8 R08: 0000000000000000 R09: 00000000000004bf
[295352.200268] R10: ffffffff8140e7c0 R11: 00000000000004be R12: ffffffff81e4b096
[295352.236368] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[295352.272653]  ? vgacon_invert_region+0x80/0x80
[295352.294690]  ? panic+0x1ff/0x258
[295352.311125]  oops_end+0xba/0xd0
[295352.327275]  die+0x42/0x50
[295352.341034]  do_general_protection+0xd2/0x160
[295352.362771]  general_protection+0x25/0x50
[295352.382624] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
[295352.410365] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
[295352.435958] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
[295352.471228] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
[295352.506333] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
[295352.541869] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
[295352.577452] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
[295352.613390]  ? idr_alloc_cmn+0x98/0xe0
[295352.633360]  kmem_cache_alloc+0x9c/0x1b0
[295352.653132]  ? fsnotify_add_mark_locked+0x153/0x320
[295352.677495]  fsnotify_add_mark_locked+0x153/0x320
[295352.700960]  SyS_inotify_add_watch+0x2d5/0x350
[295352.723337]  do_syscall_64+0x79/0x1b0
[295352.741929]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[295352.767022] RIP: 0033:0x7f3f53f409b7
[295352.785431] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
[295352.823469] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
[295352.859222] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
[295352.901958] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
[295352.937907] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
[295352.974108] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
[295353.010354] Code: ff 80 8b ac 08 00 00 04 e9 20 ff ff ff 0f 0b e9 b9 fe ff ff f7 83 84 00 00 00
fd ff ff ff 0f 84 c3 fe ff ff 0f 0b e9 bc fe ff ff <0f> 0b e9 cb fe ff ff 66 90 0f 1f 44 00 00 55 48
89 e5 41 56 49
[295353.103228] ---[ end trace 97f09d2dbcdbfe08 ]---
[295353.126793] sched: Unexpected reschedule of offline CPU#8!
[295353.154571] ------------[ cut here ]------------
[295353.178193] WARNING: CPU: 26 PID: 1384 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[295353.225115] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[295353.554858] CPU: 26 PID: 1384 Comm: thread.rb:70 Tainted: G      D W       4.14.32-1.el7.x86_64 #1
[295353.600673] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[295353.634304] task: ffff882035430000 task.stack: ffffc90007bb4000
[295353.664086] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[295353.691429] RSP: 0018:ffff88203f483c60 EFLAGS: 00010046
[295353.717211] RAX: 000000000000002e RBX: 0000000000000008 RCX: 0000000000000006
[295353.753162] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88203f4969d0
[295353.789028] RBP: ffff88203f483c60 R08: 0000000000000000 R09: 000000000000050a
[295353.824901] R10: ffffffff8140e7c0 R11: 0000000000000509 R12: ffff88203f222ac0
[295353.860780] R13: ffff880fc6730000 R14: ffff88203f483d18 R15: ffff88203f222ac0
[295353.897041] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
[295353.937015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[295353.965230] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
[295354.001263] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[295354.037348] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[295354.073079] Call Trace:
[295354.085676]  <IRQ>
[295354.096271]  resched_curr+0xae/0xd0
[295354.114398]  check_preempt_curr+0x79/0xa0
[295354.134774]  ttwu_do_wakeup+0x1e/0x160
[295354.153738]  ttwu_do_activate+0x7a/0x90
[295354.173017]  try_to_wake_up+0x1e7/0x480
[295354.192199]  default_wake_function+0x12/0x20
[295354.213726]  autoremove_wake_function+0x16/0x60
[295354.236555]  __wake_up_common+0x8f/0x160
[295354.256636]  __wake_up_common_lock+0x7e/0xc0
[295354.278570]  __wake_up+0x13/0x20
[295354.295265]  wake_up_klogd_work_func+0x40/0x60
[295354.317984]  irq_work_run_list+0x53/0x80
[295354.337965]  ? tick_sched_do_timer+0x70/0x70
[295354.359264]  irq_work_tick+0x40/0x50
[295354.377736]  update_process_times+0x42/0x60
[295354.399024]  tick_sched_handle+0x2d/0x60
[295354.418996]  tick_sched_timer+0x39/0x70
[295354.438406]  __hrtimer_run_queues+0xe7/0x230
[295354.459586]  hrtimer_interrupt+0xa8/0x1a0
[295354.479258]  smp_apic_timer_interrupt+0x6b/0x140
[295354.502194]  apic_timer_interrupt+0x8e/0xa0
[295354.523081]  </IRQ>
[295354.533789] RIP: 0010:panic+0x206/0x258
[295354.553565] RSP: 0018:ffffc90007bb7c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[295354.590890] RAX: 0000000000000034 RBX: 0000000000000200 RCX: 0000000000000006
[295354.626876] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
[295354.662703] RBP: ffffc90007bb7cc8 R08: 0000000000000000 R09: 00000000000004bf
[295354.698251] R10: ffffffff8140e7c0 R11: 00000000000004be R12: ffffffff81e4b096
[295354.733758] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[295354.769850]  ? vgacon_invert_region+0x80/0x80
[295354.791724]  ? panic+0x1ff/0x258
[295354.808021]  oops_end+0xba/0xd0
[295354.823809]  die+0x42/0x50
[295354.837948]  do_general_protection+0xd2/0x160
[295354.859636]  general_protection+0x25/0x50
[295354.880150] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
[295354.908869] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
[295354.935002] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
[295354.970812] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
[295355.006560] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
[295355.042849] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
[295355.077849] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
[295355.113175]  ? idr_alloc_cmn+0x98/0xe0
[295355.132128]  kmem_cache_alloc+0x9c/0x1b0
[295355.151819]  ? fsnotify_add_mark_locked+0x153/0x320
[295355.176264]  fsnotify_add_mark_locked+0x153/0x320
[295355.199925]  SyS_inotify_add_watch+0x2d5/0x350
[295355.222164]  do_syscall_64+0x79/0x1b0
[295355.240555]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[295355.266353] RIP: 0033:0x7f3f53f409b7
[295355.284573] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
[295355.322272] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
[295355.357920] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
[295355.393626] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
[295355.429391] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
[295355.464726] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
[295355.500091] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[295355.592809] ---[ end trace 97f09d2dbcdbfe09 ]---
[295355.616249] sched: Unexpected reschedule of offline CPU#0!
[295355.642901] ------------[ cut here ]------------
[295355.666243] WARNING: CPU: 26 PID: 1384 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x42/0x50
[295355.713782] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs x86_pkg_temp_thermal intel_powerclamp loop
coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
crypto_simd glue_helper cryptd iTCO_wdt ipmi_si iTCO_vendor_support intel_cstate intel_rapl_perf
lpc_ich sg hpilo hpwdt ioatdma pcspkr ipmi_devintf i2c_i801 dca shpchp mfd_core wmi ipmi_msghandler
nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt sd_mod fb_sys_fops ttm bnx2x mdio libcrc32c crc32c_intel serio_raw
hpsa ptp drm scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax
[295356.048067] CPU: 26 PID: 1384 Comm: thread.rb:70 Tainted: G      D W       4.14.32-1.el7.x86_64 #1
[295356.094292] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017
[295356.127304] task: ffff882035430000 task.stack: ffffc90007bb4000
[295356.157937] RIP: 0010:native_smp_send_reschedule+0x42/0x50
[295356.186118] RSP: 0018:ffff88203f483c58 EFLAGS: 00010046
[295356.212721] RAX: 000000000000002e RBX: ffff8810391945c0 RCX: 0000000000000006
[295356.247928] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
[295356.284320] RBP: ffff88203f483c58 R08: 0000000000000000 R09: 0000000000000559
[295356.320685] R10: ffffffff8140e7c0 R11: 0000000000000558 R12: ffff88103919516c
[295356.356635] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000022ac0
[295356.392135] FS:  00007f3f439f9700(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
[295356.432737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[295356.461522] CR2: 000000c43069c000 CR3: 000000203943e001 CR4: 00000000003606e0
[295356.497800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[295356.533485] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[295356.569205] Call Trace:
[295356.581694]  <IRQ>
[295356.591921]  try_to_wake_up+0x405/0x480
[295356.611188]  default_wake_function+0x12/0x20
[295356.632564]  __wake_up_common+0x8f/0x160
[295356.652486]  __wake_up_locked+0x16/0x20
[295356.671808]  ep_poll_callback+0xd0/0x300
[295356.691565]  __wake_up_common+0x8f/0x160
[295356.711684]  __wake_up_common_lock+0x7e/0xc0
[295356.733447]  __wake_up+0x13/0x20
[295356.749916]  wake_up_klogd_work_func+0x40/0x60
[295356.772512]  irq_work_run_list+0x53/0x80
[295356.792701]  ? tick_sched_do_timer+0x70/0x70
[295356.821294]  irq_work_tick+0x40/0x50
[295356.839929]  update_process_times+0x42/0x60
[295356.860941]  tick_sched_handle+0x2d/0x60
[295356.881072]  tick_sched_timer+0x39/0x70
[295356.900787]  __hrtimer_run_queues+0xe7/0x230
[295356.922396]  hrtimer_interrupt+0xa8/0x1a0
[295356.942760]  smp_apic_timer_interrupt+0x6b/0x140
[295356.966377]  apic_timer_interrupt+0x8e/0xa0
[295356.987700]  </IRQ>
[295356.998764] RIP: 0010:panic+0x206/0x258
[295357.018139] RSP: 0018:ffffc90007bb7c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[295357.055880] RAX: 0000000000000034 RBX: 0000000000000200 RCX: 0000000000000006
[295357.092139] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88203f4969d0
[295357.127348] RBP: ffffc90007bb7cc8 R08: 0000000000000000 R09: 00000000000004bf
[295357.163530] R10: ffffffff8140e7c0 R11: 00000000000004be R12: ffffffff81e4b096
[295357.200334] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[295357.236063]  ? vgacon_invert_region+0x80/0x80
[295357.257667]  ? panic+0x1ff/0x258
[295357.274076]  oops_end+0xba/0xd0
[295357.290155]  die+0x42/0x50
[295357.303914]  do_general_protection+0xd2/0x160
[295357.326145]  general_protection+0x25/0x50
[295357.346126] RIP: 0010:prefetch_freepointer.isra.63+0x11/0x20
[295357.374233] RSP: 0018:ffffc90007bb7e08 EFLAGS: 00010202
[295357.400584] RAX: 0000000000000000 RBX: 6236612d38373234 RCX: 00000000000199bb
[295357.436122] RDX: 00000000000199ba RSI: 6236612d38373234 RDI: ffff88203ec259a0
[295357.471905] RBP: ffffc90007bb7e08 R08: 0000000000028060 R09: ffffffff82051cc0
[295357.508220] R10: 0000000000002000 R11: 0000000000000040 R12: 00000000014000c0
[295357.544201] R13: ffff88203ec25980 R14: ffff88203ec25980 R15: ffff882000000000
[295357.580063]  ? idr_alloc_cmn+0x98/0xe0
[295357.598651]  kmem_cache_alloc+0x9c/0x1b0
[295357.617905]  ? fsnotify_add_mark_locked+0x153/0x320
[295357.641988]  fsnotify_add_mark_locked+0x153/0x320
[295357.665286]  SyS_inotify_add_watch+0x2d5/0x350
[295357.687722]  do_syscall_64+0x79/0x1b0
[295357.706171]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[295357.731499] RIP: 0033:0x7f3f53f409b7
[295357.749414] RSP: 002b:00007f3f439f70c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000fe
[295357.787490] RAX: ffffffffffffffda RBX: 00007f3f2c232fc0 RCX: 00007f3f53f409b7
[295357.823420] RDX: 0000000022000fc6 RSI: 0000000002eaba50 RDI: 0000000000000018
[295357.859615] RBP: 0000000002677d20 R08: 000000005ad2a563 R09: 0000000009caa9a8
[295357.895120] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000002677d20
[295357.931829] R13: 000000000000fd02 R14: 000000000005dc08 R15: 00000000000081a4
[295357.967565] Code: c0 74 1a 48 8b 05 7f 44 ec 00 be fd 00 00 00 48 8b 80 a0 00 00 00 e8 ae 1a 9b
00 5d c3 89 fe 48 c7 c7 b8 26 e5 81 e8 21 45 09 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 55 48
[295358.060705] ---[ end trace 97f09d2dbcdbfe0a ]---

---[ sar -f ./sa15 -s 01:05:00 -e 02:00:00 -P 26 ]---
Linux 4.14.32-1.el7.x86_64 (foomar) 	04/15/2018 	_x86_64_	(32 CPU)

01:05:00 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
01:05:01 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:02 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:03 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:04 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:05 AM      26      0.99      0.00      0.99      0.00      0.00     98.02
01:05:06 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:07 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:08 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:09 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:10 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:11 AM      26      0.99      0.00      0.00      0.00      0.00     99.01
01:05:12 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:13 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:14 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:15 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:16 AM      26      2.00      0.00      1.00      0.00      0.00     97.00
01:05:17 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
01:05:18 AM      26      0.00      0.00      0.00      0.00      0.00    100.00
---[ sar -f ./sa15 -s 01:05:00 -e 02:00:00 -P 26 ]---
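
One detail in the trace above may be worth a closer look. At the faulting instruction in prefetch_freepointer, RSI and RBX both hold 0x6236612d38373234, which does not look like a kernel pointer. Below is a minimal Python sketch that reads that value as eight little-endian bytes and checks whether it is a canonical address. The helper names are hypothetical, and 48-bit virtual addressing (4-level paging) is assumed.

```python
import struct


def register_as_ascii(value: int) -> str:
    """Render a 64-bit register value as the 8 bytes it occupies in memory.

    x86_64 is little-endian, so the lowest byte of the register comes first.
    Non-printable bytes are shown as '.'.
    """
    raw = struct.pack("<Q", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def is_canonical(value: int) -> bool:
    """Check canonical form, assuming 48-bit virtual addresses.

    Bits 63..47 must be all zeros or all ones.
    """
    top = value >> 47
    return top == 0 or top == 0x1FFFF


rsi = 0x6236612D38373234  # RSI/RBX from the prefetch_freepointer oops
rdi = 0xFFFF88203EC259A0  # RDI from the same register dump, for comparison

print(register_as_ascii(rsi))  # 4278-a6b
print(is_canonical(rsi))       # False
print(is_canonical(rdi))       # True
```

If those bytes really are text, that would be consistent with the slab freelist pointer having been overwritten with string data, and with the general protection fault seen in the nested traces. This is a guess from the register dump, not a diagnosis.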

Any ideas would be very much appreciated.

Cheers,
Pavlos Parissis
