All of lore.kernel.org
 help / color / mirror / Atom feed
* Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
@ 2018-06-12 15:38 Chris Boot
  2018-06-12 16:09 ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Boot @ 2018-06-12 15:38 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe, linux-block

Hi folks,

I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4)
which has a MegaRAID SAS 9361-24i controller. This system is currently
running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq
enabled.

I've run into a lockup which appears to involve blq_mq and writeback
throttling. It's hard to tell if I've run into this same thing with
older kernels; I'm trying to track down a deadlock but so far I've been
fairly certain that involved the OOM killer, but this doesn't seem to.

I've disabled blk_mq and I'm trying to reproduce the other lockup but in
the mean time I'm mailing about this. Is this something new that I've
stumbled across? I've had a look in the Git history from
4.16..8efcf34a2639 but nothing obvious pops out at me.

Console output below:

[ 6528.523172] NMI watchdog: Watchdog detected hard LOCKUP on cpu 39
[ 6528.523174] Modules linked in: netconsole configfs nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support xfs intel_cstate ast ttm intel_uncore libcrc32c drm_kms_helper intel_rapl_perf drm sg i2c_algo_bit pcspkr tpm_tis mei_me tpm_tis_core tpm evdev joydev mei ioatdma shpchp lpc_ich wmi rng_core acp
i_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb dm_mod sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper nvme nvme_core ahci libahci ehci_pci ehci_hcd
[ 6528.523230]  libata megaraid_sas usbcore scsi_mod i2c_i801 ixgbe usb_common dca mdio
[ 6528.523239] CPU: 39 PID: 0 Comm: swapper/39 Not tainted 4.16.0-0.bpo.2-amd64 #1 Debian 4.16.12-1~bpo9+1
[ 6528.523240] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS 3.0a 05/30/2017
[ 6528.523250] RIP: 0010:native_queued_spin_lock_slowpath+0x115/0x190
[ 6528.523251] RSP: 0000:ffff9001ff8c3d18 EFLAGS: 00000046
[ 6528.523253] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9001ff8e2480
[ 6528.523254] RDX: 000000000000005f RSI: 0000000001800000 RDI: ffff8fe1fb7af9e8
[ 6528.523254] RBP: 00000000ffc2b312 R08: 0000000000a00000 R09: 0000000000000000
[ 6528.523255] R10: 00000000000002fb R11: 0000000000000027 R12: 0000000000000002
[ 6528.523256] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[ 6528.523258] FS:  0000000000000000(0000) GS:ffff9001ff8c0000(0000) knlGS:0000000000000000
[ 6528.523259] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6528.523259] CR2: 0000555fc96a7708 CR3: 00000031ad40a003 CR4: 00000000003606e0
[ 6528.523261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6528.523261] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6528.523262] Call Trace:
[ 6528.523266]  <IRQ>
[ 6528.523270]  _raw_spin_lock_irqsave+0x32/0x40
[ 6528.523275]  __wake_up_common_lock+0x63/0xc0
[ 6528.523283]  wbt_done+0x6a/0x90
[ 6528.523286]  blk_mq_free_request+0xbc/0x190
[ 6528.523302]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[ 6528.523308]  scsi_io_completion+0x409/0x680 [scsi_mod]
[ 6528.523310]  __blk_mq_complete_request+0xba/0x170
[ 6528.523312]  blk_mq_complete_request+0x50/0xa0
[ 6528.523317]  complete_cmd_fusion+0x23a/0x4a0 [megaraid_sas]
[ 6528.523321]  megasas_isr_fusion+0x36/0x180 [megaraid_sas]
[ 6528.523323]  __handle_irq_event_percpu+0x81/0x190
[ 6528.523325]  handle_irq_event_percpu+0x30/0x80
[ 6528.523327]  handle_irq_event+0x3c/0x60
[ 6528.523329]  handle_edge_irq+0x94/0x1f0
[ 6528.523335]  handle_irq+0x1f/0x30
[ 6528.523339]  do_IRQ+0x41/0xc0
[ 6528.523341]  common_interrupt+0xf/0xf
[ 6528.523342]  </IRQ>
[ 6528.523346] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[ 6528.523346] RSP: 0000:ffff9b6c0c663e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 6528.523348] RAX: ffff9001ff8e18c0 RBX: 0000000000000002 RCX: 000000000000001f
[ 6528.523349] RDX: 000005edfeda7b89 RSI: 0000000040000219 RDI: 0000000000000000
[ 6528.523350] RBP: ffff9001ff8eab20 R08: 000000000000004a R09: 0000000000000018
[ 6528.523350] R10: 00000000000002da R11: 000000000000004a R12: ffffffff910b2c78
[ 6528.523351] R13: 0000000000000002 R14: 0000000000000002 R15: 000005edfeda3109
[ 6528.523355]  do_idle+0x193/0x200
[ 6528.523357]  cpu_startup_entry+0x6f/0x80
[ 6528.523362]  start_secondary+0x1a4/0x1f0
[ 6528.523366]  secondary_startup_64+0xa5/0xb0
[ 6528.523367] Code: 12 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 80 24 02 00 48 03 04 d5 80 e4 ea 90 48 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 0d 09 eb 02 f3 90 8b 
[ 6540.752247] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 6540.753441]  39-...0: (1 GPs behind) idle=2be/0/1 softirq=118190/119299 fqs=2283 
[ 6540.754342]  (detected by 36, t=5252 jiffies, g=160509, c=160508, q=190578)
[ 6540.755220] Sending NMI from CPU 36 to CPUs 39:
[ 6540.756066] NMI backtrace for cpu 39
[ 6540.756070] CPU: 39 PID: 0 Comm: swapper/39 Not tainted 4.16.0-0.bpo.2-amd64 #1 Debian 4.16.12-1~bpo9+1
[ 6540.756071] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS 3.0a 05/30/2017
[ 6540.756080] RIP: 0010:enqueue_task_fair+0x3e/0x7d0
[ 6540.756081] RSP: 0000:ffff9001ff8c3bc8 EFLAGS: 00000082
[ 6540.756083] RAX: 0000000000000027 RBX: ffff9001f0673b00 RCX: ffff9001ff8c3c70
[ 6540.756083] RDX: 0000000000000004 RSI: 000005f2e371165e RDI: ffff9011f8e76808
[ 6540.756084] RBP: ffff9001f0673b80 R08: 0000000000000022 R09: 0000000000000247
[ 6540.756085] R10: 0000000000000000 R11: 0000000000000027 R12: 0000000000000009
[ 6540.756086] R13: ffffffff9054f690 R14: 0000000000000046 R15: 00000000000218c0
[ 6540.756088] FS:  0000000000000000(0000) GS:ffff9001ff8c0000(0000) knlGS:0000000000000000
[ 6540.756089] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6540.756090] CR2: 0000555fc96a7708 CR3: 00000031ad40a003 CR4: 00000000003606e0
[ 6540.756091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6540.756092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6540.756093] Call Trace:
[ 6540.756097]  <IRQ>
[ 6540.756100]  ? __update_load_avg_blocked_se.isra.34+0xc7/0x130
[ 6540.756106]  ttwu_do_activate+0x44/0x80
[ 6540.756108]  try_to_wake_up+0x1bf/0x480
[ 6540.756112]  autoremove_wake_function+0x11/0x50
[ 6540.756114]  __wake_up_common+0x96/0x180
[ 6540.756115]  __wake_up_common_lock+0x7c/0xc0
[ 6540.756125]  wbt_done+0x6a/0x90
[ 6540.756129]  blk_mq_free_request+0xbc/0x190
[ 6540.756147]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[ 6540.756153]  scsi_io_completion+0x409/0x680 [scsi_mod]
[ 6540.756156]  __blk_mq_complete_request+0xba/0x170
[ 6540.756158]  blk_mq_complete_request+0x50/0xa0
[ 6540.756163]  complete_cmd_fusion+0x23a/0x4a0 [megaraid_sas]
[ 6540.756167]  megasas_isr_fusion+0x36/0x180 [megaraid_sas]
[ 6540.756170]  __handle_irq_event_percpu+0x81/0x190
[ 6540.756172]  handle_irq_event_percpu+0x30/0x80
[ 6540.756174]  handle_irq_event+0x3c/0x60
[ 6540.756176]  handle_edge_irq+0x94/0x1f0
[ 6540.756180]  handle_irq+0x1f/0x30
[ 6540.756186]  do_IRQ+0x41/0xc0
[ 6540.756189]  common_interrupt+0xf/0xf
[ 6540.756191]  </IRQ>
[ 6540.756195] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[ 6540.756196] RSP: 0000:ffff9b6c0c663e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 6540.756197] RAX: ffff9001ff8e18c0 RBX: 0000000000000002 RCX: 000000000000001f
[ 6540.756198] RDX: 000005edfeda7b89 RSI: 0000000040000219 RDI: 0000000000000000
[ 6540.756199] RBP: ffff9001ff8eab20 R08: 000000000000004a R09: 0000000000000018
[ 6540.756200] R10: 00000000000002da R11: 000000000000004a R12: ffffffff910b2c78
[ 6540.756200] R13: 0000000000000002 R14: 0000000000000002 R15: 000005edfeda3109
[ 6540.756205]  do_idle+0x193/0x200
[ 6540.756208]  cpu_startup_entry+0x6f/0x80
[ 6540.756214]  start_secondary+0x1a4/0x1f0
[ 6540.756218]  secondary_startup_64+0xa5/0xb0
[ 6540.756220] Code: 8d ae 80 00 00 00 48 89 f3 48 83 ec 38 f6 86 88 04 00 00 02 48 89 7c 24 08 0f 85 14 07 00 00 48 85 ed 74 4a 44 8b 93 c0 00 00 00 <45> 85 d2 74 14 e9 8a 00 00 00 44 8b 4d 40 41 bc 01 00 00 00 45 

(Lots of soft-lockup messages follow; I can post them all somewhere if
they are useful).

Thanks,
Chris

-- 
Chris Boot
bootc@boo.tc

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-06-12 15:38 Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all() Chris Boot
@ 2018-06-12 16:09 ` Jens Axboe
  2018-06-12 16:19   ` Chris Boot
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-06-12 16:09 UTC (permalink / raw)
  To: Chris Boot, linux-kernel, linux-block

On 6/12/18 9:38 AM, Chris Boot wrote:
> Hi folks,
> 
> I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4)
> which has a MegaRAID SAS 9361-24i controller. This system is currently
> running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq
> enabled.
> 
> I've run into a lockup which appears to involve blq_mq and writeback
> throttling. It's hard to tell if I've run into this same thing with
> older kernels; I'm trying to track down a deadlock but so far I've been
> fairly certain that involved the OOM killer, but this doesn't seem to.
> 
> I've disabled blk_mq and I'm trying to reproduce the other lockup but in
> the mean time I'm mailing about this. Is this something new that I've
> stumbled across? I've had a look in the Git history from
> 4.16..8efcf34a2639 but nothing obvious pops out at me.
> 
> Console output below:
> 
> [ 6528.523172] NMI watchdog: Watchdog detected hard LOCKUP on cpu 39
> [ 6528.523174] Modules linked in: netconsole configfs nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support xfs intel_cstate ast ttm intel_uncore libcrc32c drm_kms_helper intel_rapl_perf drm sg i2c_algo_bit pcspkr tpm_tis mei_me tpm_tis_core tpm evdev joydev mei ioatdma shpchp lpc_ich wmi rng_core acp
> i_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb dm_mod sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper nvme nvme_core ahci libahci ehci_pci ehci_hcd
> [ 6528.523230]  libata megaraid_sas usbcore scsi_mod i2c_i801 ixgbe usb_common dca mdio
> [ 6528.523239] CPU: 39 PID: 0 Comm: swapper/39 Not tainted 4.16.0-0.bpo.2-amd64 #1 Debian 4.16.12-1~bpo9+1
> [ 6528.523240] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS 3.0a 05/30/2017
> [ 6528.523250] RIP: 0010:native_queued_spin_lock_slowpath+0x115/0x190
> [ 6528.523251] RSP: 0000:ffff9001ff8c3d18 EFLAGS: 00000046
> [ 6528.523253] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9001ff8e2480
> [ 6528.523254] RDX: 000000000000005f RSI: 0000000001800000 RDI: ffff8fe1fb7af9e8
> [ 6528.523254] RBP: 00000000ffc2b312 R08: 0000000000a00000 R09: 0000000000000000
> [ 6528.523255] R10: 00000000000002fb R11: 0000000000000027 R12: 0000000000000002
> [ 6528.523256] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
> [ 6528.523258] FS:  0000000000000000(0000) GS:ffff9001ff8c0000(0000) knlGS:0000000000000000
> [ 6528.523259] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6528.523259] CR2: 0000555fc96a7708 CR3: 00000031ad40a003 CR4: 00000000003606e0
> [ 6528.523261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6528.523261] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 6528.523262] Call Trace:
> [ 6528.523266]  <IRQ>
> [ 6528.523270]  _raw_spin_lock_irqsave+0x32/0x40
> [ 6528.523275]  __wake_up_common_lock+0x63/0xc0
> [ 6528.523283]  wbt_done+0x6a/0x90
> [ 6528.523286]  blk_mq_free_request+0xbc/0x190
> [ 6528.523302]  scsi_end_request+0x95/0x1e0 [scsi_mod]
> [ 6528.523308]  scsi_io_completion+0x409/0x680 [scsi_mod]
> [ 6528.523310]  __blk_mq_complete_request+0xba/0x170
> [ 6528.523312]  blk_mq_complete_request+0x50/0xa0
> [ 6528.523317]  complete_cmd_fusion+0x23a/0x4a0 [megaraid_sas]
> [ 6528.523321]  megasas_isr_fusion+0x36/0x180 [megaraid_sas]
> [ 6528.523323]  __handle_irq_event_percpu+0x81/0x190
> [ 6528.523325]  handle_irq_event_percpu+0x30/0x80
> [ 6528.523327]  handle_irq_event+0x3c/0x60
> [ 6528.523329]  handle_edge_irq+0x94/0x1f0
> [ 6528.523335]  handle_irq+0x1f/0x30
> [ 6528.523339]  do_IRQ+0x41/0xc0
> [ 6528.523341]  common_interrupt+0xf/0xf
> [ 6528.523342]  </IRQ>
> [ 6528.523346] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
> [ 6528.523346] RSP: 0000:ffff9b6c0c663e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
> [ 6528.523348] RAX: ffff9001ff8e18c0 RBX: 0000000000000002 RCX: 000000000000001f
> [ 6528.523349] RDX: 000005edfeda7b89 RSI: 0000000040000219 RDI: 0000000000000000
> [ 6528.523350] RBP: ffff9001ff8eab20 R08: 000000000000004a R09: 0000000000000018
> [ 6528.523350] R10: 00000000000002da R11: 000000000000004a R12: ffffffff910b2c78
> [ 6528.523351] R13: 0000000000000002 R14: 0000000000000002 R15: 000005edfeda3109
> [ 6528.523355]  do_idle+0x193/0x200
> [ 6528.523357]  cpu_startup_entry+0x6f/0x80
> [ 6528.523362]  start_secondary+0x1a4/0x1f0
> [ 6528.523366]  secondary_startup_64+0xa5/0xb0
> [ 6528.523367] Code: 12 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 80 24 02 00 48 03 04 d5 80 e4 ea 90 48 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 0d 09 eb 02 f3 90 8b

Hmm that's really weird, I don't see how we could be spinning on the
waitqueue lock like that. I haven't seen any wbt bug reports like this
before.

Are things generally stable if you just turn off wbt? You can do that
for sda, for instance, by doing:

# echo 0 > /sys/block/sda/queue/wbt_lat_usec

It'd be interesting to get this data point. Eg leave blk-mq enabled, and
then just disable wbt.

Is anything disabling wbt in the system otherwise?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-06-12 16:09 ` Jens Axboe
@ 2018-06-12 16:19   ` Chris Boot
  2018-06-12 16:22     ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Boot @ 2018-06-12 16:19 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-block

On 12/06/18 17:09, Jens Axboe wrote:
> On 6/12/18 9:38 AM, Chris Boot wrote:
>> Hi folks,
>>
>> I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4)
>> which has a MegaRAID SAS 9361-24i controller. This system is currently
>> running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq
>> enabled.
>>
>> I've run into a lockup which appears to involve blq_mq and writeback
>> throttling. It's hard to tell if I've run into this same thing with
>> older kernels; I'm trying to track down a deadlock but so far I've been
>> fairly certain that involved the OOM killer, but this doesn't seem to.
[snip]
> 
> Hmm that's really weird, I don't see how we could be spinning on the
> waitqueue lock like that. I haven't seen any wbt bug reports like this
> before.
> 
> Are things generally stable if you just turn off wbt? You can do that
> for sda, for instance, by doing:
> 
> # echo 0 > /sys/block/sda/queue/wbt_lat_usec
> 
> It'd be interesting to get this data point. Eg leave blk-mq enabled, and
> then just disable wbt.

Hi Jens,

Thanks for the speedy response. I'll see if I can get that tested soon;
if the system is stable without blk_mq I can see the users wanting to
keep it that way for a while. I'll let you know.

> Is anything disabling wbt in the system otherwise?

Not that I'm aware of, no.

Thanks,
Chris

-- 
Chris Boot
bootc@boo.tc

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-06-12 16:19   ` Chris Boot
@ 2018-06-12 16:22     ` Jens Axboe
  2018-06-20 12:45       ` Chris Boot
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-06-12 16:22 UTC (permalink / raw)
  To: Chris Boot, linux-kernel, linux-block

On 6/12/18 10:19 AM, Chris Boot wrote:
> On 12/06/18 17:09, Jens Axboe wrote:
>> On 6/12/18 9:38 AM, Chris Boot wrote:
>>> Hi folks,
>>>
>>> I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4)
>>> which has a MegaRAID SAS 9361-24i controller. This system is currently
>>> running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq
>>> enabled.
>>>
>>> I've run into a lockup which appears to involve blq_mq and writeback
>>> throttling. It's hard to tell if I've run into this same thing with
>>> older kernels; I'm trying to track down a deadlock but so far I've been
>>> fairly certain that involved the OOM killer, but this doesn't seem to.
> [snip]
>>
>> Hmm that's really weird, I don't see how we could be spinning on the
>> waitqueue lock like that. I haven't seen any wbt bug reports like this
>> before.
>>
>> Are things generally stable if you just turn off wbt? You can do that
>> for sda, for instance, by doing:
>>
>> # echo 0 > /sys/block/sda/queue/wbt_lat_usec
>>
>> It'd be interesting to get this data point. Eg leave blk-mq enabled, and
>> then just disable wbt.
> 
> Hi Jens,
> 
> Thanks for the speedy response. I'll see if I can get that tested soon;
> if the system is stable without blk_mq I can see the users wanting to
> keep it that way for a while. I'll let you know.

Understandable. I just get suspicious of the general state of the system,
if it's locking up there. Could be a hardware issue, or a bug in some
other area that's messing things up. I have wbt running on literally
hundreds of thousands of boxes and haven't seen a lockup like this.

>> Is anything disabling wbt in the system otherwise?
> 
> Not that I'm aware of, no.

OK, just wanted to rule out something related to the shutdown path
racing with IO.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-06-12 16:22     ` Jens Axboe
@ 2018-06-20 12:45       ` Chris Boot
  2018-06-30 13:12         ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Boot @ 2018-06-20 12:45 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, linux-block

On 12/06/18 17:22, Jens Axboe wrote:
> On 6/12/18 10:19 AM, Chris Boot wrote:
>> On 12/06/18 17:09, Jens Axboe wrote:
>>> On 6/12/18 9:38 AM, Chris Boot wrote:
>>>> Hi folks,
>>>>
>>>> I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4)
>>>> which has a MegaRAID SAS 9361-24i controller. This system is currently
>>>> running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq
>>>> enabled.
>>>>
>>>> I've run into a lockup which appears to involve blq_mq and writeback
>>>> throttling. It's hard to tell if I've run into this same thing with
>>>> older kernels; I'm trying to track down a deadlock but so far I've been
>>>> fairly certain that involved the OOM killer, but this doesn't seem to.
>> [snip]
>>>
>>> Hmm that's really weird, I don't see how we could be spinning on the
>>> waitqueue lock like that. I haven't seen any wbt bug reports like this
>>> before.
>>>
>>> Are things generally stable if you just turn off wbt? You can do that
>>> for sda, for instance, by doing:
>>>
>>> # echo 0 > /sys/block/sda/queue/wbt_lat_usec
>>>
>>> It'd be interesting to get this data point. Eg leave blk-mq enabled, and
>>> then just disable wbt.
>>
>> Hi Jens,
>>
>> Thanks for the speedy response. I'll see if I can get that tested soon;
>> if the system is stable without blk_mq I can see the users wanting to
>> keep it that way for a while. I'll let you know.
> 
> Understandable. I just get suspicious of the general state of the system,
> if it's locking up there. Could be a hardware issue, or a bug in some
> other area that's messing things up. I have wbt running on literally
> hundreds of thousands of boxes and haven't seen a lockup like this.

Hi Jens,

I got an opportunity yesterday to do some testing. I can't get this
system to crash with blk-mq disabled, or with blk-mq enabled but wbt
disabled. I have a reproducer workload I can launch against the system
and it seems to crash reliably with this, but I doubt I can share it
with you.

I do, however, have a task state dump (SysRq+T) that I managed to get
out of the server once it started locking up. It's pretty large, so I
uploaded it to my Dropbox for now:

https://www.dropbox.com/s/fyo1ab6mmcqk8fq/crash-1.log.gz?dl=0

Hope this helps!

Cheers,
Chris

-- 
Chris Boot
bootc@boo.tc

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-06-20 12:45       ` Chris Boot
@ 2018-06-30 13:12         ` Tetsuo Handa
  2018-10-05 17:05           ` Chris Boot
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2018-06-30 13:12 UTC (permalink / raw)
  To: Chris Boot; +Cc: Jens Axboe, linux-kernel, linux-block

On 2018/06/20 21:45, Chris Boot wrote:
> Hi Jens,
> 
> I got an opportunity yesterday to do some testing. I can't get this
> system to crash with blk-mq disabled, or with blk-mq enabled but wbt
> disabled. I have a reproducer workload I can launch against the system
> and it seems to crash reliably with this, but I doubt I can share it
> with you.
> 
> I do, however, have a task state dump (SysRq+T) that I managed to get
> out of the server once it started locking up. It's pretty large, so I
> uploaded it to my Dropbox for now:
> 
> https://www.dropbox.com/s/fyo1ab6mmcqk8fq/crash-1.log.gz?dl=0
> 
> Hope this helps!

I'm not familiar with block layer, but checking Workqueue entries in SysRq-t.
blk_mq_timeout_work was stuck at RCU synchronization and wb_workfn was
stuck at wbt_wait() ?

----------------------------------------
[ 5376.011390] kworker/73:1H   D    0  1803      2 0x80000000
[ 5376.017418] Workqueue: kblockd blk_mq_timeout_work
[ 5376.022770] Call Trace:
[ 5376.025805]  ? __schedule+0x3dc/0x870
[ 5376.030009]  schedule+0x32/0x80
[ 5376.033737]  schedule_timeout+0x1e5/0x350
[ 5376.038298]  ? scsi_dispatch_cmd+0xe5/0x220 [scsi_mod]
[ 5376.043976]  ? scsi_queue_rq+0x4d2/0x600 [scsi_mod]
[ 5376.049425]  wait_for_completion+0x123/0x190
[ 5376.054270]  ? wake_up_q+0x70/0x70
[ 5376.058262]  __wait_rcu_gp+0x10f/0x140
[ 5376.062736]  synchronize_sched+0x59/0x80
[ 5376.067208]  ? call_rcu_bh+0x20/0x20
[ 5376.071350]  ? trace_raw_output_rcu_utilization+0x40/0x40
[ 5376.077312]  blk_mq_timeout_work+0x138/0x200
[ 5376.082152]  process_one_work+0x177/0x350
[ 5376.086692]  worker_thread+0x4d/0x3c0
[ 5376.090903]  kthread+0xf8/0x130
[ 5376.094588]  ? rescuer_thread+0x350/0x350
[ 5376.099128]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5376.104745]  ? SyS_exit_group+0x10/0x10
[ 5376.109189]  ret_from_fork+0x35/0x40
----------------------------------------

----------------------------------------
[ 5391.381393] kworker/u225:0  D    0  9248      2 0x80000000
[ 5391.387404] Workqueue: writeback wb_workfn (flush-254:6)
[ 5391.393337] Call Trace:
[ 5391.396347]  ? __schedule+0x3dc/0x870
[ 5391.400533]  schedule+0x32/0x80
[ 5391.404253]  io_schedule+0x12/0x40
[ 5391.408188]  wbt_wait+0x277/0x350
[ 5391.412045]  ? remove_wait_queue+0x60/0x60
[ 5391.416706]  blk_mq_make_request+0x107/0x5f0
[ 5391.421510]  ? __split_and_process_bio+0x131/0x190 [dm_mod]
[ 5391.427687]  generic_make_request+0x123/0x2f0
[ 5391.432627]  ? submit_bio+0x6c/0x140
[ 5391.436734]  ? _cond_resched+0x16/0x40
[ 5391.441051]  submit_bio+0x6c/0x140
[ 5391.444987]  ? kmem_cache_alloc+0xda/0x1b0
[ 5391.449645]  ext4_io_submit+0x48/0x60 [ext4]
[ 5391.454489]  ext4_writepages+0x49c/0xe60 [ext4]
[ 5391.459594]  ? scsi_queue_rq+0x4d2/0x600 [scsi_mod]
[ 5391.465015]  ? update_load_avg+0x64f/0x6e0
[ 5391.469651]  ? do_writepages+0x1a/0x60
[ 5391.473932]  do_writepages+0x1a/0x60
[ 5391.478059]  __writeback_single_inode+0x3d/0x320
[ 5391.483226]  writeback_sb_inodes+0x221/0x4b0
[ 5391.488060]  __writeback_inodes_wb+0x87/0xb0
[ 5391.492897]  wb_writeback+0x288/0x320
[ 5391.497110]  ? cpumask_next+0x16/0x20
[ 5391.501346]  ? wb_workfn+0x37c/0x450
[ 5391.505461]  wb_workfn+0x37c/0x450
[ 5391.509461]  process_one_work+0x177/0x350
[ 5391.514002]  worker_thread+0x4d/0x3c0
[ 5391.518197]  kthread+0xf8/0x130
[ 5391.521870]  ? rescuer_thread+0x350/0x350
[ 5391.526403]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5391.532029]  ? SyS_exit_group+0x10/0x10
[ 5391.536405]  ret_from_fork+0x35/0x40
----------------------------------------

Impossible to tell whether these threads made progress over time.
Please try https://akari.osdn.jp/capturing-kernel-messages.html#Tips5 .

But synchronize_rcu() in blk_mq_timeout_work() seems to be removed by
commit 12f5b93145450c75 ("blk-mq: Remove generation seqeunce") which went to
4.18-rc1. Thus, trying to reproduce with latest linux.git would be helpful.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-06-30 13:12         ` Tetsuo Handa
@ 2018-10-05 17:05           ` Chris Boot
  2018-10-07  0:03             ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Boot @ 2018-10-05 17:05 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Jens Axboe, linux-kernel, linux-block

On 30/06/2018 14:12, Tetsuo Handa wrote:
> On 2018/06/20 21:45, Chris Boot wrote:
>> Hi Jens,
>>
>> I got an opportunity yesterday to do some testing. I can't get this
>> system to crash with blk-mq disabled, or with blk-mq enabled but wbt
>> disabled. I have a reproducer workload I can launch against the system
>> and it seems to crash reliably with this, but I doubt I can share it
>> with you.
>>
>> I do, however, have a task state dump (SysRq+T) that I managed to get
>> out of the server once it started locking up. It's pretty large, so I
>> uploaded it to my Dropbox for now:
>>
>> https://www.dropbox.com/s/fyo1ab6mmcqk8fq/crash-1.log.gz?dl=0
>>
>> Hope this helps!
> 
> I'm not familiar with block layer, but checking Workqueue entries in SysRq-t.
> blk_mq_timeout_work was stuck at RCU synchronization and wb_workfn was
> stuck at wbt_wait() ?
[snip]
> 
> Impossible to tell whether these threads made progress over time.
> Please try https://akari.osdn.jp/capturing-kernel-messages.html#Tips5 .
> 
> But synchronize_rcu() in blk_mq_timeout_work() seems to be removed by
> commit 12f5b93145450c75 ("blk-mq: Remove generation seqeunce") which went to
> 4.18-rc1. Thus, trying to reproduce with latest linux.git would be helpful.

Hi Tetuso, Jens,

I upgraded the kernel on my affected system to a 4.18.6 kernel (Debian's
4.18.6-1~bpo9+1 in stretch-backports) and ran my test suite again. I'm
sorry to report that the issue occurred once more.

Logs below, it's all I managed to get out of it before my session locked up.

[Oct 5 17:56] INFO: rcu_sched self-detected stall on CPU
[  +0.003914] INFO: rcu_sched detected stalls on CPUs/tasks:
[  +0.001271] 	82-....: (1 GPs behind) idle=47a/0/3 softirq=60148/60149
fqs=2234
[  +0.012840] 	
[  +0.000007] 	82-....: (1 GPs behind) idle=47a/0/3 softirq=60148/60149
fqs=2235
[  +0.000002]  (t=5255 jiffies g=82048 c=82047 q=35803)
[  +0.008936] 	
[  +0.000003] NMI backtrace for cpu 82
[  +0.000005] (detected by 87, t=5257 jiffies, g=82048, c=82047, q=35803)
[  +0.001598] CPU: 82 PID: 0 Comm: swapper/82 Not tainted
4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000001] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000001] Call Trace:
[  +0.000004]  <IRQ>
[  +0.000011]  dump_stack+0x5c/0x7b
[  +0.000005]  nmi_cpu_backtrace+0x89/0x90
[  +0.000007]  ? lapic_can_unplug_cpu+0xa0/0xa0
[  +0.000002]  nmi_trigger_cpumask_backtrace+0xf5/0x130
[  +0.000007]  rcu_dump_cpu_stacks+0x9b/0xcb
[  +0.000003]  rcu_check_callbacks+0x79a/0x8e0
[  +0.000007]  ? sched_clock_cpu+0xc/0xa0
[  +0.000005]  ? tick_sched_do_timer+0x60/0x60
[  +0.000005]  update_process_times+0x28/0x50
[  +0.000003]  tick_sched_handle+0x22/0x60
[  +0.000002]  tick_sched_timer+0x37/0x70
[  +0.000002]  __hrtimer_run_queues+0xfc/0x270
[  +0.000003]  hrtimer_interrupt+0x101/0x240
[  +0.000004]  smp_apic_timer_interrupt+0x6a/0x130
[  +0.000002]  apic_timer_interrupt+0xf/0x20
[  +0.000006] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
[  +0.000001] Code: 8b 00 a8 08 74 0b 65 81 25 d8 6b 11 48 ff ff ff 7f
44 89 e0 5b 5d 41 5c c3 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57
9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
[  +0.000030] RSP: 0000:ffff8bfdffc83de8 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[  +0.000002] RAX: 00000000ff72b790 RBX: ffff8bedf5807768 RCX:
dead000000000200
[  +0.000001] RDX: ffffa8cc4fa87410 RSI: 0000000000000202 RDI:
0000000000000202
[  +0.000001] RBP: 00000000ff72b790 R08: ffff8bedf5807770 R09:
000003fffff00000
[  +0.000001] R10: 0000000000000052 R11: 0000000000000001 R12:
0000000000000202
[  +0.000001] R13: 0000000000000003 R14: 0000000000000000 R15:
0000000000000000
[  +0.000001]  ? apic_timer_interrupt+0xa/0x20
[  +0.000006]  __wake_up_common_lock+0x89/0xc0
[  +0.000007]  rwb_wake_all+0x30/0x40
[  +0.000003]  scale_up.part.25+0x24/0x40
[  +0.000002]  wb_timer_fn+0x295/0x430
[  +0.000007]  ? blk_mq_tag_update_depth+0x110/0x110
[  +0.000001]  call_timer_fn+0x2b/0x120
[  +0.000003]  run_timer_softirq+0x1d3/0x410
[  +0.000002]  ? enqueue_hrtimer+0x3a/0x90
[  +0.000002]  ? __hrtimer_run_queues+0x12c/0x270
[  +0.000002]  __do_softirq+0x10d/0x2a6
[  +0.000006]  irq_exit+0xb6/0xc0
[  +0.000003]  smp_apic_timer_interrupt+0x74/0x130
[  +0.000001]  apic_timer_interrupt+0xf/0x20
[  +0.000001]  </IRQ>
[  +0.000008] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[  +0.000001] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
[  +0.000028] RSP: 0000:ffffa8cc4c7cbe78 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[  +0.000002] RAX: ffff8bfdffca1b80 RBX: 0000000000000001 RCX:
000000000000001f
[  +0.000001] RDX: 00000237c552d9f0 RSI: 0000000040000219 RDI:
0000000000000000
[  +0.000000] RBP: ffff8bfdffcaaf78 R08: 00000000ffffffff R09:
0000000000000008
[  +0.000001] R10: 00000000000000a9 R11: 00000000000000c2 R12:
ffffffffb88b3a78
[  +0.000001] R13: 0000000000000001 R14: 0000000000000001 R15:
00000237c55130a6
[  +0.000004]  ? cpuidle_enter_state+0x95/0x2b0
[  +0.000004]  do_idle+0x204/0x270
[  +0.000003]  cpu_startup_entry+0x6f/0x80
[  +0.000002]  start_secondary+0x1a4/0x1f0
[  +0.000005]  secondary_startup_64+0xa5/0xb0
[  +0.000058] Sending NMI from CPU 87 to CPUs 82:
[  +0.000163] NMI backtrace for cpu 82
[  +0.000001] CPU: 82 PID: 0 Comm: swapper/82 Not tainted
4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000000] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000001] RIP: 0010:try_to_wake_up+0x49/0x4a0
[  +0.000000] Code: 41 89 d5 31 ed 48 83 ec 28 65 48 8b 04 25 28 00 00
00 48 89 44 24 20 31 c0 e8 b3 48 65 00 49 89 c6 48 8b 43 10 41 85 c7 75
30 <4c> 89 f6 4c 89 e7 e8 4c 47 65 00 48 8b 4c 24 20 65 48 33 0c 25 28
[  +0.000014] RSP: 0000:ffff8bfdffc83ab0 EFLAGS: 00000046
[  +0.000001] RAX: 0000000080200000 RBX: ffff8c1b0a3a3b00 RCX:
ffff8c1dfbb3bb00
[  +0.000001] RDX: 0000000000000000 RSI: 0000000000000069 RDI:
ffff8c1b0a3a3b00
[  +0.000000] RBP: 0000000000000001 R08: 0000000000000069 R09:
0000000000000052
[  +0.000001] R10: 0000000000000052 R11: 0000000000000000 R12:
ffff8c1b0a3a424c
[  +0.000001] R13: 0000000000000000 R14: 0000000000000046 R15:
0000000000021b80
[  +0.000001] FS:  0000000000000000(0000) GS:ffff8bfdffc80000(0000)
knlGS:0000000000000000
[  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 0000559c8105f094 CR3: 000000260e20a002 CR4:
00000000003606e0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000000] Call Trace:
[  +0.000000]  <IRQ>
[  +0.000001]  autoremove_wake_function+0x11/0x50
[  +0.000000]  __wake_up_common+0x96/0x180
[  +0.000000]  __wake_up_common_lock+0x7c/0xc0
[  +0.000001]  irq_work_run_list+0x50/0x80
[  +0.000000]  ? tick_sched_do_timer+0x60/0x60
[  +0.000001]  update_process_times+0x3b/0x50
[  +0.000000]  tick_sched_handle+0x22/0x60
[  +0.000000]  tick_sched_timer+0x37/0x70
[  +0.000001]  __hrtimer_run_queues+0xfc/0x270
[  +0.000000]  hrtimer_interrupt+0x101/0x240
[  +0.000000]  smp_apic_timer_interrupt+0x6a/0x130
[  +0.000001]  apic_timer_interrupt+0xf/0x20
[  +0.000000] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
[  +0.000001] Code: 8b 00 a8 08 74 0b 65 81 25 d8 6b 11 48 ff ff ff 7f
44 89 e0 5b 5d 41 5c c3 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57
9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
[  +0.000018] RSP: 0000:ffff8bfdffc83de8 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[  +0.000001] RAX: 00000000ff72b790 RBX: ffff8bedf5807768 RCX:
dead000000000200
[  +0.000001] RDX: ffffa8cc4fa87410 RSI: 0000000000000202 RDI:
0000000000000202
[  +0.000000] RBP: 00000000ff72b790 R08: ffff8bedf5807770 R09:
000003fffff00000
[  +0.000001] R10: 0000000000000052 R11: 0000000000000001 R12:
0000000000000202
[  +0.000000] R13: 0000000000000003 R14: 0000000000000000 R15:
0000000000000000
[  +0.000001]  ? apic_timer_interrupt+0xa/0x20
[  +0.000000]  __wake_up_common_lock+0x89/0xc0
[  +0.000000]  rwb_wake_all+0x30/0x40
[  +0.000001]  scale_up.part.25+0x24/0x40
[  +0.000000]  wb_timer_fn+0x295/0x430
[  +0.000000]  ? blk_mq_tag_update_depth+0x110/0x110
[  +0.000001]  call_timer_fn+0x2b/0x120
[  +0.000000]  run_timer_softirq+0x1d3/0x410
[  +0.000001]  ? enqueue_hrtimer+0x3a/0x90
[  +0.000000]  ? __hrtimer_run_queues+0x12c/0x270
[  +0.000000]  __do_softirq+0x10d/0x2a6
[  +0.000001]  irq_exit+0xb6/0xc0
[  +0.000000]  smp_apic_timer_interrupt+0x74/0x130
[  +0.000000]  apic_timer_interrupt+0xf/0x20
[  +0.000001]  </IRQ>
[  +0.000000] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[  +0.000000] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
[  +0.000014] RSP: 0000:ffffa8cc4c7cbe78 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[  +0.000001] RAX: ffff8bfdffca1b80 RBX: 0000000000000001 RCX:
000000000000001f
[  +0.000000] RDX: 00000237c552d9f0 RSI: 0000000040000219 RDI:
0000000000000000
[  +0.000001] RBP: ffff8bfdffcaaf78 R08: 00000000ffffffff R09:
0000000000000008
[  +0.000000] R10: 00000000000000a9 R11: 00000000000000c2 R12:
ffffffffb88b3a78
[  +0.000001] R13: 0000000000000001 R14: 0000000000000001 R15:
00000237c55130a6
[  +0.000000]  ? cpuidle_enter_state+0x95/0x2b0
[  +0.000000]  do_idle+0x204/0x270
[  +0.000001]  cpu_startup_entry+0x6f/0x80
[  +0.000000]  start_secondary+0x1a4/0x1f0
[  +0.000001]  secondary_startup_64+0xa5/0xb0
[Oct 5 17:57] watchdog: BUG: soft lockup - CPU#82 stuck for 23s!
[swapper/82:0]
[  +0.007179] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink
nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack
nf_nat nf_conntrack br_netfilter bridge stp llc overlay nfsv3
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
iTCO_wdt kvm iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_cstate intel_uncore xfs intel_rapl_perf
libcrc32c pcspkr ast ttm drm_kms_helper tpm_tis tpm_tis_core drm tpm
mei_me sg joydev mei i2c_algo_bit ioatdma lpc_ich evdev wmi rng_core
acpi_pad pcc_cpufreq button nfsd auth_rpcgss nfs_acl ipmi_si lockd
ipmi_poweroff grace ipmi_devintf ipmi_msghandler sunrpc ip_tables
x_tables autofs4
[  +0.000066]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb
dm_mod sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64
crypto_simd cryptd glue_helper nvme nvme_core ahci libahci ehci_pci
ehci_hcd libata megaraid_sas scsi_mod usbcore i2c_i801 ixgbe usb_common
dca mdio
[  +0.000030] CPU: 82 PID: 0 Comm: swapper/82 Not tainted
4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000001] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000013] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
[  +0.000001] Code: 8b 00 a8 08 74 0b 65 81 25 d8 6b 11 48 ff ff ff 7f
44 89 e0 5b 5d 41 5c c3 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57
9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
[  +0.000029] RSP: 0000:ffff8bfdffc83de8 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[  +0.000002] RAX: 00000000fec2e7a2 RBX: ffff8bedf5807768 RCX:
dead000000000200
[  +0.000001] RDX: ffffa8cc6028b410 RSI: 0000000000000202 RDI:
0000000000000202
[  +0.000001] RBP: 00000000fec2e7a2 R08: ffff8bedf5807770 R09:
0000000000000052
[  +0.000001] R10: 0000000000000052 R11: 0000000000000001 R12:
0000000000000202
[  +0.000001] R13: 0000000000000003 R14: 0000000000000000 R15:
0000000000000000
[  +0.000002] FS:  0000000000000000(0000) GS:ffff8bfdffc80000(0000)
knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000002] CR2: 0000559c8105f094 CR3: 000000260e20a002 CR4:
00000000003606e0
[  +0.000001] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000000] Call Trace:
[  +0.000004]  <IRQ>
[  +0.000008]  __wake_up_common_lock+0x89/0xc0
[  +0.000009]  rwb_wake_all+0x30/0x40
[  +0.000003]  scale_up.part.25+0x24/0x40
[  +0.000003]  wb_timer_fn+0x295/0x430
[  +0.000008]  ? blk_mq_tag_update_depth+0x110/0x110
[  +0.000007]  call_timer_fn+0x2b/0x120
[  +0.000003]  run_timer_softirq+0x1d3/0x410
[  +0.000002]  ? enqueue_hrtimer+0x3a/0x90
[  +0.000002]  ? __hrtimer_run_queues+0x12c/0x270
[  +0.000004]  __do_softirq+0x10d/0x2a6
[  +0.000006]  irq_exit+0xb6/0xc0
[  +0.000002]  smp_apic_timer_interrupt+0x74/0x130
[  +0.000003]  apic_timer_interrupt+0xf/0x20
[  +0.000002]  </IRQ>
[  +0.000009] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[  +0.000001] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
[  +0.000028] RSP: 0000:ffffa8cc4c7cbe78 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[  +0.000002] RAX: ffff8bfdffca1b80 RBX: 0000000000000001 RCX:
000000000000001f
[  +0.000001] RDX: 00000237c552d9f0 RSI: 0000000040000219 RDI:
0000000000000000
[  +0.000001] RBP: ffff8bfdffcaaf78 R08: 00000000ffffffff R09:
0000000000000008
[  +0.000001] R10: 00000000000000a9 R11: 00000000000000c2 R12:
ffffffffb88b3a78
[  +0.000001] R13: 0000000000000001 R14: 0000000000000001 R15:
00000237c55130a6
[  +0.000004]  ? cpuidle_enter_state+0x95/0x2b0
[  +0.000008]  do_idle+0x204/0x270
[  +0.000003]  cpu_startup_entry+0x6f/0x80
[  +0.000005]  start_secondary+0x1a4/0x1f0
[  +0.000005]  secondary_startup_64+0xa5/0xb0

Message from syslogd@talisker at Oct  5 17:57:15 ...
 kernel:[ 2484.753810] watchdog: BUG: soft lockup - CPU#82 stuck for
23s! [swapper/82:0]
[ +26.628991] INFO: rcu_sched detected stalls on CPUs/tasks:
[  +0.005528] 	44-...0: (1 GPs behind) idle=42e/0/1
softirq=100999/101000 fqs=2209
[  +0.007504] 	(detected by 60, t=5255 jiffies, g=82049, c=82048, q=42045)
[  +0.006727] Sending NMI from CPU 60 to CPUs 44:
[  +0.001017] NMI watchdog: Watchdog detected hard LOCKUP on cpu 44
[  +0.000000] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink
nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack
nf_nat nf_conntrack br_netfilter bridge stp llc overlay nfsv3
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
iTCO_wdt kvm iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_cstate intel_uncore xfs intel_rapl_perf
libcrc32c pcspkr ast ttm drm_kms_helper tpm_tis tpm_tis_core drm tpm
mei_me sg joydev mei i2c_algo_bit ioatdma lpc_ich evdev wmi rng_core
acpi_pad pcc_cpufreq button nfsd auth_rpcgss nfs_acl ipmi_si lockd
ipmi_poweroff grace ipmi_devintf ipmi_msghandler sunrpc ip_tables
x_tables autofs4
[  +0.000036]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb
dm_mod sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64
crypto_simd cryptd glue_helper nvme nvme_core ahci libahci ehci_pci
ehci_hcd libata megaraid_sas scsi_mod usbcore i2c_i801 ixgbe usb_common
dca mdio
[  +0.000015] CPU: 44 PID: 0 Comm: swapper/44 Tainted: G             L
  4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000001] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000001] RIP: 0010:try_to_wake_up+0x3b4/0x4a0
[  +0.000000] Code: 44 24 0c 0f 84 5a fe ff ff 49 8b 8c 2f 78 09 00 00
48 8b 11 eb 1c f6 c2 08 75 57 48 89 d6 48 89 d0 48 83 ce 08 f0 48 0f b1
31 <48> 39 c2 74 43 48 89 c2 f7 c2 00 00 20 00 75 dc 44 89 c7 44 89 04
[  +0.000022] RSP: 0018:ffff8c1dfee83c88 EFLAGS: 00000046
[  +0.000001] RAX: 0000000080200000 RBX: ffff8bedf328c9c0 RCX:
ffff8c0dfc2d8000
[  +0.000000] RDX: 0000000080200000 RSI: 0000000080200008 RDI:
ffff8bedf328c9f0
[  +0.000001] RBP: ffff8c0dff700000 R08: 0000000000000020 R09:
000000000000002c
[  +0.000000] R10: 000000000000002c R11: 0000000000000001 R12:
ffff8bedf328d10c
[  +0.000000] R13: 0000000000000000 R14: 0000000000000046 R15:
0000000000021b80
[  +0.000001] FS:  0000000000000000(0000) GS:ffff8c1dfee80000(0000)
knlGS:0000000000000000
[  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007fe833e00810 CR3: 000000260e20a004 CR4:
00000000003606e0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000

Message from syslogd@talisker at Oct  5 17:57:42 ...
 kernel:[ 2511.411044] NMI watchdog: Watchdog detected hard LOCKUP on cpu 44
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000000] Call Trace:
[  +0.000000]  <IRQ>
[  +0.000001]  ? __wake_up_common+0x96/0x180
[  +0.000000]  autoremove_wake_function+0x11/0x50
[  +0.000001]  __wake_up_common+0x96/0x180
[  +0.000000]  __wake_up_common_lock+0x7c/0xc0
[  +0.000000]  wbt_done+0x4b/0x80
[  +0.000001]  blk_mq_free_request+0xae/0x150
[  +0.000000]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[  +0.000000]  scsi_io_completion+0x404/0x6a0 [scsi_mod]
[  +0.000001]  blk_mq_complete_request+0x9c/0x100
[  +0.000000]  complete_cmd_fusion+0x23a/0x4a0 [megaraid_sas]
[  +0.000001]  megasas_isr_fusion+0x36/0x180 [megaraid_sas]
[  +0.000000]  __handle_irq_event_percpu+0x81/0x190
[  +0.000001]  handle_irq_event_percpu+0x30/0x80
[  +0.000000]  handle_irq_event+0x3c/0x60
[  +0.000000]  handle_edge_irq+0x94/0x1f0
[  +0.000001]  handle_irq+0x1f/0x30
[  +0.000000]  do_IRQ+0x49/0xd0
[  +0.000000]  common_interrupt+0xf/0xf
[  +0.000001]  </IRQ>
[  +0.000000] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[  +0.000000] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
[  +0.000014] RSP: 0018:ffffa8cc4c69be78 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffffdd
[  +0.000001] RAX: ffff8c1dfeea1b80 RBX: 0000000000000002 RCX:
000000000000001f
[  +0.000001] RDX: 00000243d61ca5da RSI: 0000000040000219 RDI:
0000000000000000
[  +0.000001] RBP: ffff8c1dfeeaaf78 R08: 00000000ffffffff R09:
0000000000000008
[  +0.000000] R10: 00000000000000c0 R11: 00000000000000de R12:
ffffffffb88b3ad8
[  +0.000001] R13: 0000000000000002 R14: 0000000000000002 R15:
00000243d61bae85
[  +0.000000]  ? cpuidle_enter_state+0x95/0x2b0
[  +0.000001]  do_idle+0x204/0x270
[  +0.000000]  cpu_startup_entry+0x6f/0x80
[  +0.000000]  start_secondary+0x1a4/0x1f0
[  +0.000001]  secondary_startup_64+0xa5/0xb0
[  +0.000000] NMI backtrace for cpu 44
[  +0.000001] CPU: 44 PID: 0 Comm: swapper/44 Tainted: G             L
  4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000000] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000001] RIP: 0010:__list_del_entry_valid+0x28/0x90
[  +0.000000] Code: 00 00 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b
57 08 48 39 c8 74 26 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2b 48 8b
12 <48> 39 d7 75 34 48 8b 50 08 48 39 d7 75 3c b8 01 00 00 00 c3 48 89
[  +0.000013] RSP: 0018:ffff8c1dfee83ce0 EFLAGS: 00000002
[  +0.000001] RAX: ffffa8cc604ef410 RBX: ffffa8cc604db3f8 RCX:
dead000000000200
[  +0.000001] RDX: ffffa8cc604db410 RSI: 0000000000000046 RDI:
ffffa8cc604db410
[  +0.000000] RBP: 0000000000000001 R08: 0000000000000057 R09:
000000000000002c
[  +0.000001] R10: 000000000000002c R11: 0000000000000001 R12:
ffffa8cc604db410
[  +0.000000] R13: ffff8bedf5807770 R14: 0000000000000000 R15:
ffffa8cc604ef3f8
[  +0.000000] FS:  0000000000000000(0000) GS:ffff8c1dfee80000(0000)
knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000000] CR2: 00007fe833e00810 CR3: 000000260e20a004 CR4:
00000000003606e0
[  +0.000001] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000001] Call Trace:
[  +0.000000]  <IRQ>
[  +0.000000]  autoremove_wake_function+0x2a/0x50
[  +0.000001]  __wake_up_common+0x96/0x180
[  +0.000000]  __wake_up_common_lock+0x7c/0xc0
[  +0.000000]  wbt_done+0x4b/0x80
[  +0.000001]  blk_mq_free_request+0xae/0x150
[  +0.000000]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[  +0.000000]  scsi_io_completion+0x404/0x6a0 [scsi_mod]
[  +0.000001]  blk_mq_complete_request+0x9c/0x100
[  +0.000000]  complete_cmd_fusion+0x23a/0x4a0 [megaraid_sas]
[  +0.000001]  megasas_isr_fusion+0x36/0x180 [megaraid_sas]
[  +0.000000]  __handle_irq_event_percpu+0x81/0x190
[  +0.000000]  handle_irq_event_percpu+0x30/0x80
[  +0.000001]  handle_irq_event+0x3c/0x60
[  +0.000000]  handle_edge_irq+0x94/0x1f0
[  +0.000001]  handle_irq+0x1f/0x30
[  +0.000000]  do_IRQ+0x49/0xd0
[  +0.000000]  common_interrupt+0xf/0xf
[  +0.000001]  </IRQ>
[  +0.000000] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[  +0.000000] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
[  +0.000014] RSP: 0018:ffffa8cc4c69be78 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffffdd
[  +0.000001] RAX: ffff8c1dfeea1b80 RBX: 0000000000000002 RCX:
000000000000001f
[  +0.000000] RDX: 00000243d61ca5da RSI: 0000000040000219 RDI:
0000000000000000
[  +0.000001] RBP: ffff8c1dfeeaaf78 R08: 00000000ffffffff R09:
0000000000000008
[  +0.000000] R10: 00000000000000c0 R11: 00000000000000de R12:
ffffffffb88b3ad8
[  +0.000001] R13: 0000000000000002 R14: 0000000000000002 R15:
00000243d61bae85
[  +0.000000]  ? cpuidle_enter_state+0x95/0x2b0
[  +0.000001]  do_idle+0x204/0x270
[  +0.000000]  cpu_startup_entry+0x6f/0x80
[  +0.000001]  start_secondary+0x1a4/0x1f0
[  +0.000000]  secondary_startup_64+0xa5/0xb0
[Oct 5 17:58] watchdog: BUG: soft lockup - CPU#99 stuck for 23s!
[sshd:41460]
[  +0.007028] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink
nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack
nf_nat nf_conntrack br_netfilter bridge stp llc overlay nfsv3
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
iTCO_wdt kvm iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_cstate intel_uncore xfs intel_rapl_perf
libcrc32c pcspkr ast ttm drm_kms_helper tpm_tis tpm_tis_core drm tpm
mei_me sg joydev mei i2c_algo_bit ioatdma lpc_ich evdev wmi rng_core
acpi_pad pcc_cpufreq button nfsd auth_rpcgss nfs_acl ipmi_si lockd
ipmi_poweroff grace ipmi_devintf ipmi_msghandler sunrpc ip_tables
x_tables autofs4
[  +0.000080]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb
dm_mod sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64
crypto_simd cryptd glue_helper nvme nvme_core ahci libahci ehci_pci
ehci_hcd libata megaraid_sas scsi_mod usbcore i2c_i801 ixgbe usb_common
dca mdio
[  +0.000035] CPU: 99 PID: 41460 Comm: sshd Tainted: G             L
4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000001] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000011] RIP: 0010:smp_call_function_many+0x209/0x260
[  +0.000001] Code: a4 5c 00 3b 05 8c 55 01 01 0f 83 7e fe ff ff 48 63
d0 48 8b 0b 48 03 0c d5 00 b7 6c b8 8b 51 18 83 e2 01 74 0a f3 90 8b 51
18 <83> e2 01 75 f6 eb c8 0f b6 4c 24 14 48 83 c4 18 4c 89 ea 5b 4c 89
[  +0.000040] RSP: 0018:ffffa8cc61843c10 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[  +0.000003] RAX: 000000000000002c RBX: ffff8c1dff1e2ac0 RCX:
ffff8c1dfeea7d80
[  +0.000001] RDX: 0000000000000003 RSI: 0000000000000000 RDI:
ffff8c1dff1e2ac8
[  +0.000002] RBP: ffff8c1dff1e2ac8 R08: 0000000000000000 R09:
ffff8c1dff1e2b08
[  +0.000001] R10: ffff8c1dff1e2ac8 R11: ffffa8cc61843d40 R12:
ffffffffb786b200

Message from syslogd@talisker at Oct  5 17:58:15 ...
 kernel:[ 2544.790801] watchdog: BUG: soft lockup - CPU#99 stuck for
23s! [sshd:41460]
[  +0.000001] R13: 0000000000000000 R14: 0000000000000001 R15:
0000000000000200
[  +0.000003] FS:  00007f3e734bed40(0000) GS:ffff8c1dff1c0000(0000)
knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000002] CR2: 00007f3e71647300 CR3: 0000003b6223e003 CR4:
00000000003606e0
[  +0.000001] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000001] Call Trace:
[  +0.000010]  ? load_new_mm_cr3+0xe0/0xe0
[  +0.000002]  on_each_cpu+0x28/0x60
[  +0.000004]  flush_tlb_kernel_range+0x48/0x90
[  +0.000007]  __purge_vmap_area_lazy+0x4d/0xc0
[  +0.000003]  vm_unmap_aliases+0xf5/0x130
[  +0.000003]  change_page_attr_set_clr+0xcb/0x440
[  +0.000004]  set_memory_ro+0x26/0x30
[  +0.000008]  bpf_prog_select_runtime+0x2d/0x110
[  +0.000006]  bpf_prepare_filter+0x3af/0x3f0
[  +0.000004]  bpf_prog_create_from_user+0xb9/0x110
[  +0.000004]  ? hardlockup_detector_perf_cleanup+0x80/0x80
[  +0.000002]  do_seccomp+0x289/0x6c0
[  +0.000004]  __x64_sys_prctl+0x162/0x4b0
[  +0.000007]  do_syscall_64+0x55/0x110
[  +0.000008]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000003] RIP: 0033:0x7f3e7164730a
[  +0.000000] Code: 48 8b 0d 91 fb 2a 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5e fb 2a 00 f7 d8 64 89 01 48
[  +0.000041] RSP: 002b:00007ffdc84f6a58 EFLAGS: 00000246 ORIG_RAX:
000000000000009d
[  +0.000002] RAX: ffffffffffffffda RBX: 00005601c5718da0 RCX:
00007f3e7164730a
[  +0.000001] RDX: 00005601c4d1af50 RSI: 0000000000000002 RDI:
0000000000000016
[  +0.000001] RBP: 00005601c571b590 R08: 0000000000000000 R09:
0000000000000005
[  +0.000002] R10: 00007f3e7164730a R11: 0000000000000246 R12:
0000000000000000
[  +0.000001] R13: 0000000000000016 R14: 0000000000000000 R15:
00007ffdc84f6ea0
[ +27.993133] watchdog: BUG: soft lockup - CPU#99 stuck for 23s!
[sshd:41460]
[  +0.007011] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink
nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack
nf_nat nf_conntrack br_netfilter bridge stp llc overlay nfsv3
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
iTCO_wdt kvm iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_cstate intel_uncore xfs intel_rapl_perf
libcrc32c pcspkr ast ttm drm_kms_helper tpm_tis tpm_tis_core drm tpm
mei_me sg joydev mei i2c_algo_bit ioatdma lpc_ich evdev wmi rng_core
acpi_pad pcc_cpufreq button nfsd auth_rpcgss nfs_acl ipmi_si lockd
ipmi_poweroff grace ipmi_devintf ipmi_msghandler sunrpc ip_tables
x_tables autofs4
[  +0.000051]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb
dm_mod sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64
crypto_simd cryptd glue_helper nvme nvme_core ahci libahci ehci_pci
ehci_hcd libata megaraid_sas scsi_mod usbcore i2c_i801 ixgbe usb_common
dca mdio
[  +0.000024] CPU: 99 PID: 41460 Comm: sshd Tainted: G             L
4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000002] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000004] RIP: 0010:smp_call_function_many+0x209/0x260

Message from syslogd@talisker at Oct  5 17:58:43 ...
 kernel:[ 2572.791275] watchdog: BUG: soft lockup - CPU#99 stuck for
23s! [sshd:41460]
[  +0.000000] Code: a4 5c 00 3b 05 8c 55 01 01 0f 83 7e fe ff ff 48 63
d0 48 8b 0b 48 03 0c d5 00 b7 6c b8 8b 51 18 83 e2 01 74 0a f3 90 8b 51
18 <83> e2 01 75 f6 eb c8 0f b6 4c 24 14 48 83 c4 18 4c 89 ea 5b 4c 89
[  +0.000041] RSP: 0018:ffffa8cc61843c10 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff13
[  +0.000003] RAX: 000000000000002c RBX: ffff8c1dff1e2ac0 RCX:
ffff8c1dfeea7d80
[  +0.000001] RDX: 0000000000000003 RSI: 0000000000000000 RDI:
ffff8c1dff1e2ac8
[  +0.000001] RBP: ffff8c1dff1e2ac8 R08: 0000000000000000 R09:
ffff8c1dff1e2b08
[  +0.000001] R10: ffff8c1dff1e2ac8 R11: ffffa8cc61843d40 R12:
ffffffffb786b200
[  +0.000000] R13: 0000000000000000 R14: 0000000000000001 R15:
0000000000000200
[  +0.000002] FS:  00007f3e734bed40(0000) GS:ffff8c1dff1c0000(0000)
knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007f3e71647300 CR3: 0000003b6223e003 CR4:
00000000003606e0
[  +0.000001] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000001] Call Trace:
[  +0.000004]  ? load_new_mm_cr3+0xe0/0xe0
[  +0.000002]  on_each_cpu+0x28/0x60
[  +0.000002]  flush_tlb_kernel_range+0x48/0x90
[  +0.000004]  __purge_vmap_area_lazy+0x4d/0xc0
[  +0.000002]  vm_unmap_aliases+0xf5/0x130
[  +0.000002]  change_page_attr_set_clr+0xcb/0x440
[  +0.000003]  set_memory_ro+0x26/0x30
[  +0.000003]  bpf_prog_select_runtime+0x2d/0x110
[  +0.000003]  bpf_prepare_filter+0x3af/0x3f0
[  +0.000003]  bpf_prog_create_from_user+0xb9/0x110
[  +0.000003]  ? hardlockup_detector_perf_cleanup+0x80/0x80
[  +0.000002]  do_seccomp+0x289/0x6c0
[  +0.000003]  __x64_sys_prctl+0x162/0x4b0
[  +0.000003]  do_syscall_64+0x55/0x110
[  +0.000003]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000002] RIP: 0033:0x7f3e7164730a
[  +0.000001] Code: 48 8b 0d 91 fb 2a 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5e fb 2a 00 f7 d8 64 89 01 48
[  +0.000040] RSP: 002b:00007ffdc84f6a58 EFLAGS: 00000246 ORIG_RAX:
000000000000009d
[  +0.000002] RAX: ffffffffffffffda RBX: 00005601c5718da0 RCX:
00007f3e7164730a
[  +0.000002] RDX: 00005601c4d1af50 RSI: 0000000000000002 RDI:
0000000000000016
[  +0.000001] RBP: 00005601c571b590 R08: 0000000000000000 R09:
0000000000000005
[  +0.000001] R10: 00007f3e7164730a R11: 0000000000000246 R12:
0000000000000000
[  +0.000002] R13: 0000000000000016 R14: 0000000000000000 R15:
00007ffdc84f6ea0
[  +1.612805] INFO: rcu_sched detected stalls on CPUs/tasks:
[  +0.005546] 	44-...0: (1 GPs behind) idle=42e/0/1
softirq=100999/101000 fqs=8947
[  +0.007499] 	(detected by 62, t=21010 jiffies, g=82049, c=82048, q=45933)
[  +0.006817] Sending NMI from CPU 62 to CPUs 44:
[  +0.001015] NMI backtrace for cpu 44
[  +0.000001] CPU: 44 PID: 0 Comm: swapper/44 Tainted: G             L
  4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
[  +0.000000] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
3.0a 05/30/2017
[  +0.000001] RIP: 0010:enqueue_task_fair+0x73/0x850
[  +0.000000] Code: 01 00 00 f6 83 88 04 00 00 02 0f 85 4d 07 00 00 48
85 ed 74 4a 44 8b 93 c0 00 00 00 45 85 d2 74 14 e9 8a 00 00 00 44 8b 4d
40 <41> bc 01 00 00 00 45 85 c9 75 7b 48 8b 9d 58 01 00 00 44 89 e2 48
[  +0.000015] RSP: 0018:ffff8c1dfee83bf0 EFLAGS: 00000086
[  +0.000001] RAX: 0000000000000000 RBX: ffff8c1df8c3d200 RCX:
0000000000000000
[  +0.000001] RDX: 0000000000000001 RSI: ffff8c1df8c3d230 RDI:
ffff8c1353b45918
[  +0.000000] RBP: ffff8c1df8c3ce00 R08: ffff8c1353b45900 R09:
0000000000000001
[  +0.000001] R10: 000000000000031f R11: 0000000000000001 R12:
0000000000000009
[  +0.000000] R13: ffffffffb7d81220 R14: 0000000000000046 R15:
0000000000021b80
[  +0.000001] FS:  0000000000000000(0000) GS:ffff8c1dfee80000(0000)
knlGS:0000000000000000
[  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007fe833e00810 CR3: 000000260e20a004 CR4:
00000000003606e0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000000] Call Trace:
[  +0.000000]  <IRQ>
[  +0.000001]  ? check_preempt_wakeup+0x103/0x250
[  +0.000000]  ttwu_do_activate+0x44/0x80
[  +0.000000]  try_to_wake_up+0x1ce/0x4a0
[  +0.000001]  ? __wake_up_common+0x96/0x180
[  +0.000000]  autoremove_wake_function+0x11/0x50
[  +0.000001]  __wake_up_common+0x96/0x180
[  +0.000000]  __wake_up_common_lock+0x7c/0xc0
[  +0.000000]  wbt_done+0x4b/0x80
[  +0.000001]  blk_mq_free_request+0xae/0x150
[  +0.000000]  scsi_end_request+0x95/0x1e0 [scsi_mod]
[  +0.000001]  scsi_io_completion+0x404/0x6a0 [scsi_mod]
[  +0.000000]  blk_mq_complete_request+0x9c/0x100
[  +0.000000]  complete_cmd_fusion+0x23a/0x4a0 [megaraid_sas]
[  +0.000001]  megasas_isr_fusion+0x36/0x180 [megaraid_sas]
[  +0.000000]  __handle_irq_event_percpu+0x81/0x190
[  +0.000000]  handle_irq_event_percpu+0x30/0x80
[  +0.000001]  handle_irq_event+0x3c/0x60
[  +0.000000]  handle_edge_irq+0x94/0x1f0
[  +0.000001]  handle_irq+0x1f/0x30
[  +0.000000]  do_IRQ+0x49/0xd0
[  +0.000000]  common_interrupt+0xf/0xf
[  +0.000001]  </IRQ>
[  +0.000000] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
[  +0.000000] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
[  +0.000014] RSP: 0018:ffffa8cc4c69be78 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffffdd
[  +0.000001] RAX: ffff8c1dfeea1b80 RBX: 0000000000000002 RCX:
000000000000001f
[  +0.000001] RDX: 00000243d61ca5da RSI: 0000000040000219 RDI:
0000000000000000
[  +0.000000] RBP: ffff8c1dfeeaaf78 R08: 00000000ffffffff R09:
0000000000000008
[  +0.000001] R10: 00000000000000c0 R11: 00000000000000de R12:
ffffffffb88b3ad8
[  +0.000000] R13: 0000000000000002 R14: 0000000000000002 R15:
00000243d61bae85
[  +0.000001]  ? cpuidle_enter_state+0x95/0x2b0
[  +0.000000]  do_idle+0x204/0x270
[  +0.000000]  cpu_startup_entry+0x6f/0x80
[  +0.000001]  start_secondary+0x1a4/0x1f0
[  +0.000000]  secondary_startup_64+0xa5/0xb0

-- 
Chris Boot
bootc@boo.tc

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-10-05 17:05           ` Chris Boot
@ 2018-10-07  0:03             ` Tetsuo Handa
  2018-10-11 19:24               ` Josef Bacik
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2018-10-07  0:03 UTC (permalink / raw)
  To: Chris Boot; +Cc: Jens Axboe, linux-kernel, linux-block, Josef Bacik

Adding Josef.

On 2018/10/06 2:05, Chris Boot wrote:
> I upgraded the kernel on my affected system to a 4.18.6 kernel (Debian's
> 4.18.6-1~bpo9+1 in stretch-backports) and ran my test suite again. I'm
> sorry to report that the issue occurred once more.
> 
> Logs below, it's all I managed to get out of it before my session locked up.
> 
> [Oct 5 17:56] INFO: rcu_sched self-detected stall on CPU
> [  +0.003914] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  +0.001271] 	82-....: (1 GPs behind) idle=47a/0/3 softirq=60148/60149
> fqs=2234
> [  +0.012840] 	
> [  +0.000007] 	82-....: (1 GPs behind) idle=47a/0/3 softirq=60148/60149
> fqs=2235
> [  +0.000002]  (t=5255 jiffies g=82048 c=82047 q=35803)
> [  +0.008936] 	
> [  +0.000003] NMI backtrace for cpu 82
> [  +0.000005] (detected by 87, t=5257 jiffies, g=82048, c=82047, q=35803)
> [  +0.001598] CPU: 82 PID: 0 Comm: swapper/82 Not tainted
> 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
> [  +0.000001] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
> 3.0a 05/30/2017
> [  +0.000001] Call Trace:
> [  +0.000004]  <IRQ>
> [  +0.000011]  dump_stack+0x5c/0x7b
> [  +0.000005]  nmi_cpu_backtrace+0x89/0x90
> [  +0.000007]  ? lapic_can_unplug_cpu+0xa0/0xa0
> [  +0.000002]  nmi_trigger_cpumask_backtrace+0xf5/0x130
> [  +0.000007]  rcu_dump_cpu_stacks+0x9b/0xcb
> [  +0.000003]  rcu_check_callbacks+0x79a/0x8e0
> [  +0.000007]  ? sched_clock_cpu+0xc/0xa0
> [  +0.000005]  ? tick_sched_do_timer+0x60/0x60
> [  +0.000005]  update_process_times+0x28/0x50
> [  +0.000003]  tick_sched_handle+0x22/0x60
> [  +0.000002]  tick_sched_timer+0x37/0x70
> [  +0.000002]  __hrtimer_run_queues+0xfc/0x270
> [  +0.000003]  hrtimer_interrupt+0x101/0x240
> [  +0.000004]  smp_apic_timer_interrupt+0x6a/0x130
> [  +0.000002]  apic_timer_interrupt+0xf/0x20
> [  +0.000006] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
> [  +0.000001] Code: 8b 00 a8 08 74 0b 65 81 25 d8 6b 11 48 ff ff ff 7f
> 44 89 e0 5b 5d 41 5c c3 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57
> 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
> [  +0.000030] RSP: 0000:ffff8bfdffc83de8 EFLAGS: 00000202 ORIG_RAX:
> ffffffffffffff13
> [  +0.000002] RAX: 00000000ff72b790 RBX: ffff8bedf5807768 RCX:
> dead000000000200
> [  +0.000001] RDX: ffffa8cc4fa87410 RSI: 0000000000000202 RDI:
> 0000000000000202
> [  +0.000001] RBP: 00000000ff72b790 R08: ffff8bedf5807770 R09:
> 000003fffff00000
> [  +0.000001] R10: 0000000000000052 R11: 0000000000000001 R12:
> 0000000000000202
> [  +0.000001] R13: 0000000000000003 R14: 0000000000000000 R15:
> 0000000000000000
> [  +0.000001]  ? apic_timer_interrupt+0xa/0x20
> [  +0.000006]  __wake_up_common_lock+0x89/0xc0
> [  +0.000007]  rwb_wake_all+0x30/0x40
> [  +0.000003]  scale_up.part.25+0x24/0x40
> [  +0.000002]  wb_timer_fn+0x295/0x430
> [  +0.000007]  ? blk_mq_tag_update_depth+0x110/0x110
> [  +0.000001]  call_timer_fn+0x2b/0x120
> [  +0.000003]  run_timer_softirq+0x1d3/0x410
> [  +0.000002]  ? enqueue_hrtimer+0x3a/0x90
> [  +0.000002]  ? __hrtimer_run_queues+0x12c/0x270
> [  +0.000002]  __do_softirq+0x10d/0x2a6
> [  +0.000006]  irq_exit+0xb6/0xc0
> [  +0.000003]  smp_apic_timer_interrupt+0x74/0x130
> [  +0.000001]  apic_timer_interrupt+0xf/0x20
> [  +0.000001]  </IRQ>
> [  +0.000008] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
> [  +0.000001] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
> 31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
> 00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
> [  +0.000028] RSP: 0000:ffffa8cc4c7cbe78 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [  +0.000002] RAX: ffff8bfdffca1b80 RBX: 0000000000000001 RCX:
> 000000000000001f
> [  +0.000001] RDX: 00000237c552d9f0 RSI: 0000000040000219 RDI:
> 0000000000000000
> [  +0.000000] RBP: ffff8bfdffcaaf78 R08: 00000000ffffffff R09:
> 0000000000000008
> [  +0.000001] R10: 00000000000000a9 R11: 00000000000000c2 R12:
> ffffffffb88b3a78
> [  +0.000001] R13: 0000000000000001 R14: 0000000000000001 R15:
> 00000237c55130a6
> [  +0.000004]  ? cpuidle_enter_state+0x95/0x2b0
> [  +0.000004]  do_idle+0x204/0x270
> [  +0.000003]  cpu_startup_entry+0x6f/0x80
> [  +0.000002]  start_secondary+0x1a4/0x1f0
> [  +0.000005]  secondary_startup_64+0xa5/0xb0

This trace contains rwb_wake_all() from scale_up(), which was removed by
commit a79050434b45959f ("blk-rq-qos: refactor out common elements of blk-wbt").

Josef, what is the reason you removed rwb_wake_all() from scale_up() (and
you added rwb_wake_all() to scale_down()) ?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
  2018-10-07  0:03             ` Tetsuo Handa
@ 2018-10-11 19:24               ` Josef Bacik
  0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2018-10-11 19:24 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Chris Boot, Jens Axboe, linux-kernel, linux-block, Josef Bacik

On Sun, Oct 07, 2018 at 09:03:17AM +0900, Tetsuo Handa wrote:
> Adding Josef.
> 
> On 2018/10/06 2:05, Chris Boot wrote:
> > I upgraded the kernel on my affected system to a 4.18.6 kernel (Debian's
> > 4.18.6-1~bpo9+1 in stretch-backports) and ran my test suite again. I'm
> > sorry to report that the issue occurred once more.
> > 
> > Logs below, it's all I managed to get out of it before my session locked up.
> > 
> > [Oct 5 17:56] INFO: rcu_sched self-detected stall on CPU
> > [  +0.003914] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [  +0.001271] 	82-....: (1 GPs behind) idle=47a/0/3 softirq=60148/60149
> > fqs=2234
> > [  +0.012840] 	
> > [  +0.000007] 	82-....: (1 GPs behind) idle=47a/0/3 softirq=60148/60149
> > fqs=2235
> > [  +0.000002]  (t=5255 jiffies g=82048 c=82047 q=35803)
> > [  +0.008936] 	
> > [  +0.000003] NMI backtrace for cpu 82
> > [  +0.000005] (detected by 87, t=5257 jiffies, g=82048, c=82047, q=35803)
> > [  +0.001598] CPU: 82 PID: 0 Comm: swapper/82 Not tainted
> > 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1
> > [  +0.000001] Hardware name: Supermicro SYS-8048B-TR4FT/X10QBi, BIOS
> > 3.0a 05/30/2017
> > [  +0.000001] Call Trace:
> > [  +0.000004]  <IRQ>
> > [  +0.000011]  dump_stack+0x5c/0x7b
> > [  +0.000005]  nmi_cpu_backtrace+0x89/0x90
> > [  +0.000007]  ? lapic_can_unplug_cpu+0xa0/0xa0
> > [  +0.000002]  nmi_trigger_cpumask_backtrace+0xf5/0x130
> > [  +0.000007]  rcu_dump_cpu_stacks+0x9b/0xcb
> > [  +0.000003]  rcu_check_callbacks+0x79a/0x8e0
> > [  +0.000007]  ? sched_clock_cpu+0xc/0xa0
> > [  +0.000005]  ? tick_sched_do_timer+0x60/0x60
> > [  +0.000005]  update_process_times+0x28/0x50
> > [  +0.000003]  tick_sched_handle+0x22/0x60
> > [  +0.000002]  tick_sched_timer+0x37/0x70
> > [  +0.000002]  __hrtimer_run_queues+0xfc/0x270
> > [  +0.000003]  hrtimer_interrupt+0x101/0x240
> > [  +0.000004]  smp_apic_timer_interrupt+0x6a/0x130
> > [  +0.000002]  apic_timer_interrupt+0xf/0x20
> > [  +0.000006] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
> > [  +0.000001] Code: 8b 00 a8 08 74 0b 65 81 25 d8 6b 11 48 ff ff ff 7f
> > 44 89 e0 5b 5d 41 5c c3 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57
> > 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
> > [  +0.000030] RSP: 0000:ffff8bfdffc83de8 EFLAGS: 00000202 ORIG_RAX:
> > ffffffffffffff13
> > [  +0.000002] RAX: 00000000ff72b790 RBX: ffff8bedf5807768 RCX:
> > dead000000000200
> > [  +0.000001] RDX: ffffa8cc4fa87410 RSI: 0000000000000202 RDI:
> > 0000000000000202
> > [  +0.000001] RBP: 00000000ff72b790 R08: ffff8bedf5807770 R09:
> > 000003fffff00000
> > [  +0.000001] R10: 0000000000000052 R11: 0000000000000001 R12:
> > 0000000000000202
> > [  +0.000001] R13: 0000000000000003 R14: 0000000000000000 R15:
> > 0000000000000000
> > [  +0.000001]  ? apic_timer_interrupt+0xa/0x20
> > [  +0.000006]  __wake_up_common_lock+0x89/0xc0
> > [  +0.000007]  rwb_wake_all+0x30/0x40
> > [  +0.000003]  scale_up.part.25+0x24/0x40
> > [  +0.000002]  wb_timer_fn+0x295/0x430
> > [  +0.000007]  ? blk_mq_tag_update_depth+0x110/0x110
> > [  +0.000001]  call_timer_fn+0x2b/0x120
> > [  +0.000003]  run_timer_softirq+0x1d3/0x410
> > [  +0.000002]  ? enqueue_hrtimer+0x3a/0x90
> > [  +0.000002]  ? __hrtimer_run_queues+0x12c/0x270
> > [  +0.000002]  __do_softirq+0x10d/0x2a6
> > [  +0.000006]  irq_exit+0xb6/0xc0
> > [  +0.000003]  smp_apic_timer_interrupt+0x74/0x130
> > [  +0.000001]  apic_timer_interrupt+0xf/0x20
> > [  +0.000001]  </IRQ>
> > [  +0.000008] RIP: 0010:cpuidle_enter_state+0xa7/0x2b0
> > [  +0.000001] Code: c8 28 48 e8 bb b9 b2 ff 48 89 04 24 0f 1f 44 00 00
> > 31 ff e8 4b c4 b2 ff 80 7c 24 0f 00 0f 85 b6 01 00 00 fb 66 0f 1f 44 00
> > 00 <48> 8b 0c 24 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f9 48 89 c8 48 c1
> > [  +0.000028] RSP: 0000:ffffa8cc4c7cbe78 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffff13
> > [  +0.000002] RAX: ffff8bfdffca1b80 RBX: 0000000000000001 RCX:
> > 000000000000001f
> > [  +0.000001] RDX: 00000237c552d9f0 RSI: 0000000040000219 RDI:
> > 0000000000000000
> > [  +0.000000] RBP: ffff8bfdffcaaf78 R08: 00000000ffffffff R09:
> > 0000000000000008
> > [  +0.000001] R10: 00000000000000a9 R11: 00000000000000c2 R12:
> > ffffffffb88b3a78
> > [  +0.000001] R13: 0000000000000001 R14: 0000000000000001 R15:
> > 00000237c55130a6
> > [  +0.000004]  ? cpuidle_enter_state+0x95/0x2b0
> > [  +0.000004]  do_idle+0x204/0x270
> > [  +0.000003]  cpu_startup_entry+0x6f/0x80
> > [  +0.000002]  start_secondary+0x1a4/0x1f0
> > [  +0.000005]  secondary_startup_64+0xa5/0xb0
> 
> This trace contains rwb_wake_all() from scale_up(), which was removed by
> commit a79050434b45959f ("blk-rq-qos: refactor out common elements of blk-wbt").
> 
> Josef, what is the reason you removed rwb_wake_all() from scale_up() (and
> you added rwb_wake_all() to scale_down()) ?
> 

Oops, I screwed that up, sorry about that, it should be the other way around.
I'll send a fix for that shortly.

Josef

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-10-11 19:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-12 15:38 Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all() Chris Boot
2018-06-12 16:09 ` Jens Axboe
2018-06-12 16:19   ` Chris Boot
2018-06-12 16:22     ` Jens Axboe
2018-06-20 12:45       ` Chris Boot
2018-06-30 13:12         ` Tetsuo Handa
2018-10-05 17:05           ` Chris Boot
2018-10-07  0:03             ` Tetsuo Handa
2018-10-11 19:24               ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.