linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
@ 2019-05-15 21:36 Zygo Blaxell
  2019-05-16  6:57 ` Nikolay Borisov
  2020-02-04  4:57 ` storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+, fixed in 5.4) Zygo Blaxell
  0 siblings, 2 replies; 8+ messages in thread
From: Zygo Blaxell @ 2019-05-15 21:36 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 18424 bytes --]

"Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
CPU cores in kernel functions that are unable to make forward progress,
but also unwilling to release their respective CPU cores.  This is
usually accompanied by a lot of CPU usage (detectable as either kvm CPU
usage or just a lot of CPU fan noise) though I don't know if all cores
are spinning or only some of them.

The kernel console presents a continual stream of "BUG: soft lockup"
warnings for some days.  None of the call traces change during this time.
The only way out is to reboot.

You can reproduce this by writing a bunch of data to a filesystem while
bees is running on all cores.  It takes a few days to occur naturally.
It can probably be sped up by just doing a bunch of random LOGICAL_INO
ioctls in a tight loop on each core.

Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
(btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
ioctl (btrfs_tree_read_lock_atomic):

	[509506.128259] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-transacti:4716]
	[509506.130000] Modules linked in: mq_deadline bfq dm_cache_smq ppdev joydev dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sr_mod cdrom sg ide_pci_generic piix input_leds i2c_piix4 bochs_drm ide_core rtc_cmos floppy parport_pc parport psmouse serio_raw evbug evdev snd_pcm snd_timer snd soundcore pcspkr qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[509506.137503] irq event stamp: 217070078
	[509506.138165] hardirqs last  enabled at (217070077): [<ffffffffa2dcafb6>] _raw_spin_unlock_irqrestore+0x36/0x60
	[509506.140129] hardirqs last disabled at (217070078): [<ffffffffa2dc32c9>] __schedule+0xd9/0xb70
	[509506.141653] softirqs last  enabled at (217069454): [<ffffffffa30003a0>] __do_softirq+0x3a0/0x45a
	[509506.143251] softirqs last disabled at (217069443): [<ffffffffa20a9949>] irq_exit+0xe9/0xf0
	[509506.144782] CPU: 0 PID: 4716 Comm: btrfs-transacti Tainted: G      D W    L    5.0.11-zb64-ae88fcd98bb4+ #1
	[509506.146453] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[509506.147881] RIP: 0010:queued_write_lock_slowpath+0x4e/0x90
	[509506.148912] Code: c0 75 0d ba ff 00 00 00 f0 0f b1 13 85 c0 74 33 f0 81 03 00 01 00 00 b9 ff 00 00 00 be 00 01 00 00 8b 03 3d 00 01 00 00 74 0c <f3> 90 8b 13 81 fa 00 01 00 00 75 f4 89 f0 f0 0f b1 0b 3d 00 01 00
	[509506.152585] RSP: 0018:ffffab0b412579c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
	[509506.154027] RAX: 00000000000001ff RBX: ffff89ea1ac6fde8 RCX: 00000000000000ff
	[509506.155449] RDX: 00000000000001ff RSI: 0000000000000100 RDI: ffff89ea1ac6fde8
	[509506.156729] RBP: ffffab0b412579d0 R08: ffffffffa2502785 R09: 0000000000000000
	[509506.157946] R10: ffffab0b41257980 R11: ffff89ea1ac6fe00 R12: ffff89ea1ac6fdec
	[509506.159175] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
	[509506.160345] FS:  0000000000000000(0000) GS:ffff89ebb5c00000(0000) knlGS:0000000000000000
	[509506.161714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[509506.162703] CR2: 00007f381507ca68 CR3: 000000021b0b0003 CR4: 00000000001606f0
	[509506.164006] Call Trace:
	[509506.164473]  do_raw_write_lock+0xae/0xb0
	[509506.165315]  _raw_write_lock+0x55/0x70
	[509506.166026]  btrfs_try_tree_write_lock+0x35/0x90
	[509506.167033]  btrfs_search_slot+0x41c/0x930
	[509506.167981]  lookup_inline_extent_backref+0x118/0x570
	[509506.168929]  ? kvm_sched_clock_read+0x18/0x30
	[509506.169763]  insert_inline_extent_backref+0x51/0xe0
	[509506.170686]  __btrfs_inc_extent_ref+0x87/0x220
	[509506.171583]  ? lock_acquire+0xbd/0x1c0
	[509506.172394]  run_delayed_tree_ref+0x182/0x1f0
	[509506.173264]  run_one_delayed_ref+0x94/0xa0
	[509506.174043]  btrfs_run_delayed_refs_for_head+0x17b/0x3b0
	[509506.175064]  __btrfs_run_delayed_refs+0x84/0x180
	[509506.175948]  btrfs_run_delayed_refs+0x86/0x1e0
	[509506.176803]  btrfs_commit_transaction+0x52/0xa00
	[509506.177695]  ? start_transaction+0x93/0x4d0
	[509506.178506]  transaction_kthread+0x155/0x190
	[509506.179450]  kthread+0x113/0x150
	[509506.180222]  ? btrfs_cleanup_transaction+0x630/0x630
	[509506.181178]  ? kthread_park+0x90/0x90
	[509506.181896]  ret_from_fork+0x3a/0x50

	[509514.134266] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [crawl_6501:4815]
	[509514.135639] Modules linked in: mq_deadline bfq dm_cache_smq ppdev joydev dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sr_mod cdrom sg ide_pci_generic piix input_leds i2c_piix4 bochs_drm ide_core rtc_cmos floppy parport_pc parport psmouse serio_raw evbug evdev snd_pcm snd_timer snd soundcore pcspkr qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[509514.144037] irq event stamp: 2124844710
	[509514.144834] hardirqs last  enabled at (2124844709): [<ffffffffa20037da>] trace_hardirqs_on_thunk+0x1a/0x1c
	[509514.146349] hardirqs last disabled at (2124844710): [<ffffffffa2dc32c9>] __schedule+0xd9/0xb70
	[509514.148170] softirqs last  enabled at (2124844566): [<ffffffffa30003a0>] __do_softirq+0x3a0/0x45a
	[509514.149577] softirqs last disabled at (2124844467): [<ffffffffa20a9949>] irq_exit+0xe9/0xf0
	[509514.150981] CPU: 1 PID: 4815 Comm: crawl_6501 Tainted: G      D W    L    5.0.11-zb64-ae88fcd98bb4+ #1
	[509514.152659] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[509514.154373] RIP: 0010:queued_read_lock_slowpath+0x43/0x80
	[509514.155226] Code: 1f 00 75 3b f0 81 2f 00 02 00 00 ba 01 00 00 00 4c 8d 67 04 f0 0f b1 57 04 75 37 f0 81 03 00 02 00 00 8b 03 84 c0 74 08 f3 90 <8b> 13 84 d2 75 f8 4c 89 e7 e8 23 d3 ff ff 66 90 5b 41 5c 5d c3 8b
	[509514.158458] RSP: 0018:ffffab0b4145b8d0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
	[509514.160094] RAX: 00000000000002ff RBX: ffff89eaa518b0e8 RCX: 0000000000000002
	[509514.161452] RDX: 00000000000002ff RSI: ffffffffa25026b7 RDI: ffff89eaa518b0e8
	[509514.162946] RBP: ffffab0b4145b8e0 R08: ffffffffa25026b7 R09: 0000000000000000
	[509514.164116] R10: ffffab0b4145b890 R11: ffff89eaa518b100 R12: ffff89eaa518b0ec
	[509514.165865] R13: ffff89eaa518b0e8 R14: 0000000000000001 R15: 0000000000000002
	[509514.167028] FS:  00007fc7a2c08700(0000) GS:ffff89ebb6000000(0000) knlGS:0000000000000000
	[509514.168405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[509514.169342] CR2: 000055fa1280ad60 CR3: 000000021b0b0001 CR4: 00000000001606e0
	[509514.170504] Call Trace:
	[509514.170927]  do_raw_read_lock+0x4b/0x50
	[509514.171643]  _raw_read_lock+0x58/0x70
	[509514.172343]  btrfs_tree_read_lock_atomic+0x37/0x70
	[509514.173276]  btrfs_search_slot+0x631/0x930
	[509514.173971]  ? trace_hardirqs_on+0x4c/0x100
	[509514.174810]  find_parent_nodes+0x19e/0x1340
	[509514.175596]  btrfs_find_all_roots_safe+0xc5/0x140
	[509514.176534]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[509514.177512]  iterate_extent_inodes+0x198/0x3e0
	[509514.178332]  iterate_inodes_from_logical+0xa1/0xd0
	[509514.179234]  ? iterate_inodes_from_logical+0xa1/0xd0
	[509514.180186]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[509514.181155]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[509514.182097]  btrfs_ioctl+0xcf7/0x2cb0
	[509514.182837]  ? lock_acquire+0xbd/0x1c0
	[509514.183519]  ? lock_acquire+0xbd/0x1c0
	[509514.184215]  do_vfs_ioctl+0xa6/0x6e0
	[509514.184957]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[509514.185966]  ? do_vfs_ioctl+0xa6/0x6e0
	[509514.186687]  ? __fget+0x119/0x200
	[509514.187277]  ksys_ioctl+0x75/0x80
	[509514.187958]  __x64_sys_ioctl+0x1a/0x20
	[509514.188677]  do_syscall_64+0x65/0x190
	[509514.189340]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[509514.190278] RIP: 0033:0x7fc7a4d02777
	[509514.191003] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[509514.194420] RSP: 002b:00007fc7a2c05458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[509514.195742] RAX: ffffffffffffffda RBX: 00007fc7a2c05780 RCX: 00007fc7a4d02777
	[509514.197070] RDX: 00007fc7a2c05788 RSI: 00000000c038943b RDI: 0000000000000004
	[509514.198340] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fc7a2c05960
	[509514.199699] R10: 000055b03d58cc40 R11: 0000000000000246 R12: 0000000000000004
	[509514.201021] R13: 00007fc7a2c05788 R14: 00007fc7a2c05668 R15: 00007fc7a2c05890

	[507290.135151] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [crawl_6501:4813]
	[507290.135157] Modules linked in: mq_deadline bfq dm_cache_smq ppdev joydev dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sr_mod cdrom sg ide_pci_generic piix input_leds i2c_piix4 bochs_drm ide_core rtc_cmos floppy parport_pc parport psmouse serio_raw evbug evdev snd_pcm snd_timer snd soundcore pcspkr qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[507290.135177] irq event stamp: 2472772428
	[507290.135186] hardirqs last  enabled at (2472772427): [<ffffffffa21448d6>] __call_rcu.constprop.43+0x126/0x430
	[507290.135187] hardirqs last disabled at (2472772428): [<ffffffffa2dc32c9>] __schedule+0xd9/0xb70
	[507290.135189] softirqs last  enabled at (2472772410): [<ffffffffa30003a0>] __do_softirq+0x3a0/0x45a
	[507290.135191] softirqs last disabled at (2472772383): [<ffffffffa20a9949>] irq_exit+0xe9/0xf0
	[507290.135199] CPU: 2 PID: 4813 Comm: crawl_6501 Tainted: G      D W    L    5.0.11-zb64-ae88fcd98bb4+ #1
	[507290.135200] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[507290.135202] RIP: 0010:native_safe_halt+0x6/0x10
	[507290.135204] Code: 5d ff ff ff 7f 5d c3 65 48 8b 04 25 00 6e 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90
	[507290.135211] RSP: 0018:ffffab0b4144b850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
	[507290.135212] RAX: 0000000000000000 RBX: ffff89eaa518b0ec RCX: 0000000000000008
	[507290.135212] RDX: ffff89eba5f08000 RSI: ffffffffa2119a63 RDI: ffffffffa207a316
	[507290.135213] RBP: ffffab0b4144b850 R08: 0000000000000000 R09: 000000000000004c
	[507290.135214] R10: ffffab0b4144b890 R11: ffff89eaa518b100 R12: 0000000000000246
	[507290.135214] R13: 0000000000000003 R14: 0000000000000100 R15: 0000000000000000
	[507290.135215] FS:  00007fc7a3c0a700(0000) GS:ffff89ebb6400000(0000) knlGS:0000000000000000
	[507290.135216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[507290.135217] CR2: 000055def62f0f90 CR3: 000000021b0b0002 CR4: 00000000001606e0
	[507290.135226] Call Trace:
	[507290.135229]  kvm_wait+0x8b/0x90
	[507290.135231]  __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[507290.135240]  queued_read_lock_slowpath+0x75/0x80
	[507290.135241]  do_raw_read_lock+0x4b/0x50
	[507290.135243]  _raw_read_lock+0x58/0x70
	[507290.135245]  btrfs_tree_read_lock_atomic+0x37/0x70
	[507290.135254]  btrfs_search_slot+0x631/0x930
	[507290.135256]  ? trace_hardirqs_on+0x4c/0x100
	[507290.135266]  find_parent_nodes+0x19e/0x1340
	[507290.135280]  btrfs_find_all_roots_safe+0xc5/0x140
	[507290.135282]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[507290.135284]  iterate_extent_inodes+0x198/0x3e0
	[507290.135295]  iterate_inodes_from_logical+0xa1/0xd0
	[507290.135312]  ? iterate_inodes_from_logical+0xa1/0xd0
	[507290.135313]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[507290.135316]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[507290.135324]  btrfs_ioctl+0xcf7/0x2cb0
	[507290.135327]  ? lock_acquire+0xbd/0x1c0
	[507290.135330]  ? lock_acquire+0xbd/0x1c0
	[507290.135339]  do_vfs_ioctl+0xa6/0x6e0
	[507290.135340]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[507290.135341]  ? do_vfs_ioctl+0xa6/0x6e0
	[507290.135343]  ? __fget+0x119/0x200
	[507290.135351]  ksys_ioctl+0x75/0x80
	[507290.135353]  __x64_sys_ioctl+0x1a/0x20
	[507290.135355]  do_syscall_64+0x65/0x190
	[507290.135357]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[507290.135365] RIP: 0033:0x7fc7a4d02777
	[507290.135367] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 0
	1 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[507290.135367] RSP: 002b:00007fc7a3c07458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[507290.135368] RAX: ffffffffffffffda RBX: 00007fc7a3c07780 RCX: 00007fc7a4d02777
	[507290.135369] RDX: 00007fc7a3c07788 RSI: 00000000c038943b RDI: 0000000000000004
	[507290.135370] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fc7a3c07960
	[507290.135376] R10: 000055b03d58cc40 R11: 0000000000000246 R12: 0000000000000004
	[507290.135376] R13: 00007fc7a3c07788 R14: 00007fc7a3c07668 R15: 00007fc7a3c07890
	[507290.135680] hardirqs last  enabled at (217070077): [<ffffffffa2dcafb6>] _raw_spin_unlock_irqrestore+0x36/0x60
	[507290.135682] hardirqs last disabled at (217070078): [<ffffffffa2dc32c9>] __schedule+0xd9/0xb70

	[506450.138317] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [crawl_6501:4816]
	[506450.138318] Modules linked in: mq_deadline bfq dm_cache_smq ppdev joydev dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sr_mod cdrom sg ide_pci_generic piix input_leds i2c_piix4 bochs_drm ide_core rtc_cmos floppy parport_pc parport psmouse serio_raw evbug evdev snd_pcm snd_timer snd soundcore pcspkr qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[506450.138334] irq event stamp: 2322381922
	[506450.138338] hardirqs last  enabled at (2322381921): [<ffffffffa22d525b>] kmem_cache_free+0x6b/0x1e0
	[506450.138340] hardirqs last disabled at (2322381922): [<ffffffffa2dc32c9>] __schedule+0xd9/0xb70
	[506450.138342] softirqs last  enabled at (2322378644): [<ffffffffa30003a0>] __do_softirq+0x3a0/0x45a
	[506450.138344] softirqs last disabled at (2322378637): [<ffffffffa20a9949>] irq_exit+0xe9/0xf0
	[506450.138346] CPU: 3 PID: 4816 Comm: crawl_6501 Tainted: G      D W    L    5.0.11-zb64-ae88fcd98bb4+ #1
	[506450.138347] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[506450.138349] RIP: 0010:native_safe_halt+0x6/0x10
	[506450.138352] Code: 5d ff ff ff 7f 5d c3 65 48 8b 04 25 00 6e 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90
	[506450.138352] RSP: 0018:ffffab0b41463850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
	[506450.138353] RAX: 0000000000000000 RBX: ffff89ebb6a159d4 RCX: 0000000000000001
	[506450.138354] RDX: ffff89eb9b10c000 RSI: ffffffffa21199dd RDI: ffffffffa207a316
	[506450.138355] RBP: ffffab0b41463850 R08: 0000000000000000 R09: 0000000000000000
	[506450.138356] R10: ffffab0b41463890 R11: ffff89eaa518b100 R12: 0000000000000246
	[506450.138356] R13: 0000000000000001 R14: ffff89ebb6a159d4 R15: 0000000000000001
	[506450.138357] FS:  00007fc7a2407700(0000) GS:ffff89ebb6800000(0000) knlGS:0000000000000000
	[506450.138358] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[506450.138359] CR2: 00007f326a211c50 CR3: 000000021b0b0006 CR4: 00000000001606e0
	[506450.138362] Call Trace:
	[506450.138365]  kvm_wait+0x8b/0x90
	[506450.138368]  __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[506450.138371]  queued_read_lock_slowpath+0x75/0x80
	[506450.138372]  do_raw_read_lock+0x4b/0x50
	[506450.138374]  _raw_read_lock+0x58/0x70
	[506450.138376]  btrfs_tree_read_lock_atomic+0x37/0x70
	[506450.138379]  btrfs_search_slot+0x631/0x930
	[506450.138382]  ? trace_hardirqs_on+0x4c/0x100
	[506450.138386]  find_parent_nodes+0x19e/0x1340
	[506450.138394]  btrfs_find_all_roots_safe+0xc5/0x140
	[506450.138396]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[506450.138399]  iterate_extent_inodes+0x198/0x3e0
	[506450.138404]  iterate_inodes_from_logical+0xa1/0xd0
	[506450.138406]  ? iterate_inodes_from_logical+0xa1/0xd0
	[506450.138407]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[506450.138410]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[506450.138412]  btrfs_ioctl+0xcf7/0x2cb0
	[506450.138415]  ? lock_acquire+0xbd/0x1c0
	[506450.138418]  ? lock_acquire+0xbd/0x1c0
	[506450.138421]  do_vfs_ioctl+0xa6/0x6e0
	[506450.138422]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[506450.138423]  ? do_vfs_ioctl+0xa6/0x6e0
	[506450.138425]  ? __fget+0x119/0x200
	[506450.138428]  ksys_ioctl+0x75/0x80
	[506450.138430]  __x64_sys_ioctl+0x1a/0x20
	[506450.138432]  do_syscall_64+0x65/0x190
	[506450.138434]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[506450.138435] RIP: 0033:0x7fc7a4d02777
	[506450.138437] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[506450.138438] RSP: 002b:00007fc7a2404458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[506450.138439] RAX: ffffffffffffffda RBX: 00007fc7a2404780 RCX: 00007fc7a4d02777
	[506450.138440] RDX: 00007fc7a2404788 RSI: 00000000c038943b RDI: 0000000000000004
	[506450.138440] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fc7a2404960
	[506450.138441] R10: 000055b03d58cc40 R11: 0000000000000246 R12: 0000000000000004
	[506450.138442] R13: 00007fc7a2404788 R14: 00007fc7a2404668 R15: 00007fc7a2404890
	[506450.142132] irq event stamp: 2124844710
	[506450.142136] hardirqs last  enabled at (2124844709): [<ffffffffa20037da>] trace_hardirqs_on_thunk+0x1a/0x1c

[ed note: The logs are from different timestamps because I wanted to get a
clean copy of each stack trace.  Usually two of the cores are running call
traces at the same time, making jumbled output, so I picked call traces
from different points in the log to make the traces as cleanly separated
as possible.  Any cut+paste errors are mine.  The stack traces for each
core are all identical and there are hundreds of them from this instance.]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
  2019-05-15 21:36 storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+) Zygo Blaxell
@ 2019-05-16  6:57 ` Nikolay Borisov
  2019-05-16 16:20   ` Zygo Blaxell
  2019-05-18  4:44   ` Zygo Blaxell
  2020-02-04  4:57 ` storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+, fixed in 5.4) Zygo Blaxell
  1 sibling, 2 replies; 8+ messages in thread
From: Nikolay Borisov @ 2019-05-16  6:57 UTC (permalink / raw)
  To: Zygo Blaxell, linux-btrfs



On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> CPU cores in kernel functions that are unable to make forward progress,
> but also unwilling to release their respective CPU cores.  This is
> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> usage or just a lot of CPU fan noise) though I don't know if all cores
> are spinning or only some of them.
> 
> The kernel console presents a continual stream of "BUG: soft lockup"
> warnings for some days.  None of the call traces change during this time.
> The only way out is to reboot.
> 
> You can reproduce this by writing a bunch of data to a filesystem while
> bees is running on all cores.  It takes a few days to occur naturally.
> It can probably be sped up by just doing a bunch of random LOGICAL_INO
> ioctls in a tight loop on each core.
> 
> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> ioctl (btrfs_tree_read_lock_atomic):


Provide output of all sleeping threads when this occur via
 echo w > /proc/sysrq-trigger.

Also do you have this patch on the affected machine:

38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you
try and test with it applied ?


<SNIP>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
  2019-05-16  6:57 ` Nikolay Borisov
@ 2019-05-16 16:20   ` Zygo Blaxell
  2019-05-18  4:44   ` Zygo Blaxell
  1 sibling, 0 replies; 8+ messages in thread
From: Zygo Blaxell @ 2019-05-16 16:20 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2174 bytes --]

On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> > "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> > CPU cores in kernel functions that are unable to make forward progress,
> > but also unwilling to release their respective CPU cores.  This is
> > usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> > usage or just a lot of CPU fan noise) though I don't know if all cores
> > are spinning or only some of them.
> > 
> > The kernel console presents a continual stream of "BUG: soft lockup"
> > warnings for some days.  None of the call traces change during this time.
> > The only way out is to reboot.
> > 
> > You can reproduce this by writing a bunch of data to a filesystem while
> > bees is running on all cores.  It takes a few days to occur naturally.
> > It can probably be sped up by just doing a bunch of random LOGICAL_INO
> > ioctls in a tight loop on each core.
> > 
> > Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> > (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> > ioctl (btrfs_tree_read_lock_atomic):
> 
> 
> Provide output of all sleeping threads when this occur via
>  echo w > /proc/sysrq-trigger.

The machine is dead in this state--it doesn't respond to pings or serial
port input, and can't run a shell.  The serial console doesn't respond
to BREAK-w either.  Those per-CPU stack traces every 22 seconds are all
I get, and also the only indication the system is not completely stopped.
The per-CPU stack traces do continue for days, and never report any
processes running other than those four.

> Also do you have this patch on the affected machine:
> 
> 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you
> try and test with it applied ?

I have that patch applied already from when I was collecting deadlock
fixes earlier this year.

I have observations of storm-of-soft-lockups going back to at least
4.14 (it is #5 out of the 6 most common ways 4.14.y kernels fail).
So it is not a new bug.

> <SNIP>
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
  2019-05-16  6:57 ` Nikolay Borisov
  2019-05-16 16:20   ` Zygo Blaxell
@ 2019-05-18  4:44   ` Zygo Blaxell
  2019-05-18  8:27     ` Nikolay Borisov
  1 sibling, 1 reply; 8+ messages in thread
From: Zygo Blaxell @ 2019-05-18  4:44 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 46626 bytes --]

On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> > "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> > CPU cores in kernel functions that are unable to make forward progress,
> > but also unwilling to release their respective CPU cores.  This is
> > usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> > usage or just a lot of CPU fan noise) though I don't know if all cores
> > are spinning or only some of them.
> > 
> > The kernel console presents a continual stream of "BUG: soft lockup"
> > warnings for some days.  None of the call traces change during this time.
> > The only way out is to reboot.
> > 
> > You can reproduce this by writing a bunch of data to a filesystem while
> > bees is running on all cores.  It takes a few days to occur naturally.
> > It can probably be sped up by just doing a bunch of random LOGICAL_INO
> > ioctls in a tight loop on each core.
> > 
> > Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> > (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> > ioctl (btrfs_tree_read_lock_atomic):
> 
> 
> Provide output of all sleeping threads when this occur via
>  echo w > /proc/sysrq-trigger.

I fixed my SysRq configuration.  The results are...kind of interesting.
I guess the four threads that are running loops on all cores don't count
as "blocked"...

	[39610.791203] sysrq: SysRq : Show Blocked State
	[39610.791934]   task                        PC stack   pid father
	[39610.792832] btrfs-transacti D    0  4643      2 0x80000000
	[39610.793613] Call Trace:
	[39610.793970]  __schedule+0x3d4/0xb70
	[39610.794472]  schedule+0x3d/0x80
	[39610.794925]  wait_for_commit+0x59/0xa0
	[39610.795464]  ? wait_woken+0xa0/0xa0
	[39610.795962]  btrfs_commit_transaction+0x122/0xab0
	[39610.796635]  ? start_transaction+0x93/0x4d0
	[39610.797228]  transaction_kthread+0x155/0x190
	[39610.797836]  kthread+0x113/0x150
	[39610.798296]  ? btrfs_cleanup_transaction+0x630/0x630
	[39610.799004]  ? kthread_park+0x90/0x90
	[39610.799542]  ret_from_fork+0x3a/0x50
	[39610.800080] rsync           D    0 12832   8251 0x00000000
	[39610.800868] Call Trace:
	[39610.801223]  __schedule+0x3d4/0xb70
	[39610.801728]  schedule+0x3d/0x80
	[39610.802179]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.802919]  ? wake_up_var+0x40/0x40
	[39610.803436]  btrfs_setattr+0x316/0x5a0
	[39610.803970]  notify_change+0x2f0/0x450
	[39610.804513]  do_truncate+0x73/0xc0
	[39610.805029]  do_sys_ftruncate+0x12b/0x1c0
	[39610.805606]  __x64_sys_ftruncate+0x1b/0x20
	[39610.806193]  do_syscall_64+0x65/0x1a0
	[39610.806831]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.807556] RIP: 0033:0x7f10d5005c97
	[39610.808069] Code: Bad RIP value.
	[39610.808539] RSP: 002b:00007ffea589cf28 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.809649] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f10d5005c97
	[39610.810790] RDX: 0000000000000000 RSI: 00000000000001cc RDI: 0000000000000003
	[39610.811798] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000c846d527
	[39610.812873] R10: 00000000894b52a9 R11: 0000000000000246 R12: 00000000000001cc
	[39610.813936] R13: 0000000000000000 R14: 00000000000001cc R15: 0000000000000000
	[39610.815048] rsync           D    0 12834   8444 0x00000000
	[39610.815826] Call Trace:
	[39610.816181]  __schedule+0x3d4/0xb70
	[39610.816909]  schedule+0x3d/0x80
	[39610.817505]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.818264]  ? wake_up_var+0x40/0x40
	[39610.818808]  btrfs_setattr+0x316/0x5a0
	[39610.819373]  notify_change+0x2f0/0x450
	[39610.820129]  do_truncate+0x73/0xc0
	[39610.820739]  do_sys_ftruncate+0x12b/0x1c0
	[39610.821409]  __x64_sys_ftruncate+0x1b/0x20
	[39610.822040]  do_syscall_64+0x65/0x1a0
	[39610.822614]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.823385] RIP: 0033:0x7fade44efc97
	[39610.824203] Code: Bad RIP value.
	[39610.824877] RSP: 002b:00007ffc9f263818 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.826038] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fade44efc97
	[39610.828361] RDX: 0000000000000000 RSI: 00000000001b0b3c RDI: 0000000000000003
	[39610.829580] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000e9809e58
	[39610.830932] R10: 00000000f8cf8f2b R11: 0000000000000246 R12: 00000000001b0b3c
	[39610.832178] R13: 00000000001b0820 R14: 000000000000031c R15: 0000000000000000
	[39610.833491] rsync           D    0 12835   8183 0x00000000
	[39610.834353] Call Trace:
	[39610.834747]  __schedule+0x3d4/0xb70
	[39610.835286]  schedule+0x3d/0x80
	[39610.835782]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.836646]  ? wake_up_var+0x40/0x40
	[39610.837201]  btrfs_setattr+0x316/0x5a0
	[39610.837790]  notify_change+0x2f0/0x450
	[39610.838364]  do_truncate+0x73/0xc0
	[39610.838913]  do_sys_ftruncate+0x12b/0x1c0
	[39610.839542]  __x64_sys_ftruncate+0x1b/0x20
	[39610.840161]  do_syscall_64+0x65/0x1a0
	[39610.840720]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.841496] RIP: 0033:0x7f696d6efc97
	[39610.842138] Code: Bad RIP value.
	[39610.842679] RSP: 002b:00007fff121f34d8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.844008] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f696d6efc97
	[39610.845271] RDX: 0000000000000000 RSI: 00000000000001cc RDI: 0000000000000003
	[39610.846532] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000c846d527
	[39610.847696] R10: 00000000894b52a9 R11: 0000000000000246 R12: 00000000000001cc
	[39610.848839] R13: 0000000000000000 R14: 00000000000001cc R15: 0000000000000000
	[39610.850001] rsync           D    0 12838   7602 0x00000000
	[39610.850909] Call Trace:
	[39610.851332]  __schedule+0x3d4/0xb70
	[39610.851981]  schedule+0x3d/0x80
	[39610.852535]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.853455]  ? wake_up_var+0x40/0x40
	[39610.854085]  btrfs_setattr+0x316/0x5a0
	[39610.854763]  notify_change+0x2f0/0x450
	[39610.855420]  do_truncate+0x73/0xc0
	[39610.856001]  do_sys_ftruncate+0x12b/0x1c0
	[39610.856691]  __x64_sys_ftruncate+0x1b/0x20
	[39610.857420]  do_syscall_64+0x65/0x1a0
	[39610.858104]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.859016] RIP: 0033:0x7f2bed4a1c97
	[39610.859673] Code: Bad RIP value.
	[39610.860260] RSP: 002b:00007fffa14b3ba8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.861586] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2bed4a1c97
	[39610.862862] RDX: 0000000000000000 RSI: 00000000001b0b3c RDI: 0000000000000003
	[39610.864102] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000e9809e58
	[39610.865324] R10: 00000000f8cf8f2b R11: 0000000000000246 R12: 00000000001b0b3c
	[39610.866559] R13: 00000000001b0820 R14: 000000000000031c R15: 0000000000000000
	[39610.867760] rsync           D    0 12842   8631 0x00000000
	[39610.868684] Call Trace:
	[39610.869102]  __schedule+0x3d4/0xb70
	[39610.869694]  schedule+0x3d/0x80
	[39610.870224]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.871162]  ? wake_up_var+0x40/0x40
	[39610.871826]  btrfs_setattr+0x316/0x5a0
	[39610.872466]  notify_change+0x2f0/0x450
	[39610.873087]  do_truncate+0x73/0xc0
	[39610.873702]  do_sys_ftruncate+0x12b/0x1c0
	[39610.874419]  __x64_sys_ftruncate+0x1b/0x20
	[39610.875168]  do_syscall_64+0x65/0x1a0
	[39610.875841]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.876736] RIP: 0033:0x7fd20067bc97
	[39610.877382] Code: Bad RIP value.
	[39610.877993] RSP: 002b:00007ffd11215d58 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.879337] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd20067bc97
	[39610.880591] RDX: 0000000000000000 RSI: 0000000001b9face RDI: 0000000000000003
	[39610.881881] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000007afe8ff0
	[39610.883163] R10: 00000000092d5688 R11: 0000000000000246 R12: 0000000001b9face
	[39610.884312] R13: 0000000001b9e700 R14: 00000000000013ce R15: 0000000000000000
	[39610.885445] rsync           D    0 12845   8614 0x00000000
	[39610.886352] Call Trace:
	[39610.886754]  __schedule+0x3d4/0xb70
	[39610.887301]  schedule+0x3d/0x80
	[39610.887820]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.888731]  ? wake_up_var+0x40/0x40
	[39610.889323]  btrfs_setattr+0x316/0x5a0
	[39610.889979]  notify_change+0x2f0/0x450
	[39610.890569]  do_truncate+0x73/0xc0
	[39610.891102]  do_sys_ftruncate+0x12b/0x1c0
	[39610.891803]  __x64_sys_ftruncate+0x1b/0x20
	[39610.892460]  do_syscall_64+0x65/0x1a0
	[39610.893029]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.893853] RIP: 0033:0x7f697f553c97
	[39610.894481] Code: Bad RIP value.
	[39610.895058] RSP: 002b:00007ffed6c8bbb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.896391] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f697f553c97
	[39610.897665] RDX: 0000000000000000 RSI: 0000000000014ab8 RDI: 0000000000000003
	[39610.898926] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000fb17ef2f
	[39610.900177] R10: 000000009ebff28a R11: 0000000000000246 R12: 0000000000014ab8
	[39610.901436] R13: 0000000000014820 R14: 0000000000000298 R15: 0000000000000000
	[39610.902697] rsync           D    0 12846   9349 0x00000000
	[39610.903619] Call Trace:
	[39610.904017]  __schedule+0x3d4/0xb70
	[39610.904584]  schedule+0x3d/0x80
	[39610.905125]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.906044]  ? wake_up_var+0x40/0x40
	[39610.906707]  btrfs_setattr+0x316/0x5a0
	[39610.907398]  notify_change+0x2f0/0x450
	[39610.908091]  do_truncate+0x73/0xc0
	[39610.908694]  do_sys_ftruncate+0x12b/0x1c0
	[39610.909390]  __x64_sys_ftruncate+0x1b/0x20
	[39610.910120]  do_syscall_64+0x65/0x1a0
	[39610.910715]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.911535] RIP: 0033:0x7f77c7350c97
	[39610.912171] Code: Bad RIP value.
	[39610.912750] RSP: 002b:00007fff3402b998 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.914036] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f77c7350c97
	[39610.915156] RDX: 0000000000000000 RSI: 000000000001b800 RDI: 0000000000000003
	[39610.916607] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000012f2a017
	[39610.917739] R10: 000000005ca8e044 R11: 0000000000000246 R12: 000000000001b800
	[39610.918957] R13: 000000000001b580 R14: 0000000000000280 R15: 0000000000000000
	[39610.920107] rsync           D    0 12847  10903 0x00000000
	[39610.921024] Call Trace:
	[39610.921422]  __schedule+0x3d4/0xb70
	[39610.921964]  schedule+0x3d/0x80
	[39610.922457]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.923278]  ? wake_up_var+0x40/0x40
	[39610.924019]  btrfs_setattr+0x316/0x5a0
	[39610.924631]  notify_change+0x2f0/0x450
	[39610.925226]  do_truncate+0x73/0xc0
	[39610.925786]  do_sys_ftruncate+0x12b/0x1c0
	[39610.926723]  __x64_sys_ftruncate+0x1b/0x20
	[39610.927375]  do_syscall_64+0x65/0x1a0
	[39610.927963]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.928772] RIP: 0033:0x7f3462c85c97
	[39610.929341] Code: Bad RIP value.
	[39610.929883] RSP: 002b:00007ffd4b538368 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.931068] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3462c85c97
	[39610.932254] RDX: 0000000000000000 RSI: 0000000001b9face RDI: 0000000000000003
	[39610.933373] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000007afe8ff0
	[39610.934622] R10: 00000000092d5688 R11: 0000000000000246 R12: 0000000001b9face
	[39610.935811] R13: 0000000001b9e700 R14: 00000000000013ce R15: 0000000000000000
	[39610.936938] rsync           D    0 12849   6970 0x00000000
	[39610.938109] Call Trace:
	[39610.938515]  __schedule+0x3d4/0xb70
	[39610.939071]  schedule+0x3d/0x80
	[39610.939585]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.940407]  ? wake_up_var+0x40/0x40
	[39610.940959]  btrfs_setattr+0x316/0x5a0
	[39610.941544]  notify_change+0x2f0/0x450
	[39610.942146]  do_truncate+0x73/0xc0
	[39610.942762]  do_sys_ftruncate+0x12b/0x1c0
	[39610.943383]  __x64_sys_ftruncate+0x1b/0x20
	[39610.944209]  do_syscall_64+0x65/0x1a0
	[39610.944786]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.945656] RIP: 0033:0x7f3044497c97
	[39610.946409] Code: Bad RIP value.
	[39610.947155] RSP: 002b:00007fffb7404608 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.948596] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3044497c97
	[39610.949873] RDX: 0000000000000000 RSI: 000000000049a1d8 RDI: 0000000000000003
	[39610.951494] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000bb1d27a0
	[39610.952718] R10: 00000000f74314fb R11: 0000000000000246 R12: 000000000049a1d8
	[39610.953893] R13: 0000000000499e10 R14: 00000000000003c8 R15: 0000000000000000
	[39610.954982] rsync           D    0 12850   8121 0x00000000
	[39610.956640] Call Trace:
	[39610.957089]  __schedule+0x3d4/0xb70
	[39610.957805]  schedule+0x3d/0x80
	[39610.958456]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.959472]  ? wake_up_var+0x40/0x40
	[39610.960176]  btrfs_setattr+0x316/0x5a0
	[39610.960779]  notify_change+0x2f0/0x450
	[39610.961377]  do_truncate+0x73/0xc0
	[39610.961924]  do_sys_ftruncate+0x12b/0x1c0
	[39610.962760]  __x64_sys_ftruncate+0x1b/0x20
	[39610.963419]  do_syscall_64+0x65/0x1a0
	[39610.964085]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.964943] RIP: 0033:0x7f8e0c156c97
	[39610.965924] Code: Bad RIP value.
	[39610.966517] RSP: 002b:00007ffcb0f66bb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.967977] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8e0c156c97
	[39610.969245] RDX: 0000000000000000 RSI: 00000000000093d8 RDI: 0000000000000003
	[39610.970619] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000021682eb2
	[39610.971881] R10: 00000000e1f12d36 R11: 0000000000000246 R12: 00000000000093d8
	[39610.973300] R13: 00000000000093a8 R14: 0000000000000030 R15: 0000000000000000
	[39610.974593] rsync           D    0 12853  11202 0x00000000
	[39610.975480] Call Trace:
	[39610.975876]  __schedule+0x3d4/0xb70
	[39610.976457]  schedule+0x3d/0x80
	[39610.977133]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.977973]  ? wake_up_var+0x40/0x40
	[39610.978549]  btrfs_setattr+0x316/0x5a0
	[39610.979145]  notify_change+0x2f0/0x450
	[39610.980154]  do_truncate+0x73/0xc0
	[39610.980712]  do_sys_ftruncate+0x12b/0x1c0
	[39610.981348]  __x64_sys_ftruncate+0x1b/0x20
	[39610.982111]  do_syscall_64+0x65/0x1a0
	[39610.982770]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39610.983692] RIP: 0033:0x7ff863906c97
	[39610.984536] Code: Bad RIP value.
	[39610.985132] RSP: 002b:00007ffed3457e28 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39610.986388] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff863906c97
	[39610.987823] RDX: 0000000000000000 RSI: 000000000009ba00 RDI: 0000000000000003
	[39610.988936] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000b58a5813
	[39610.990107] R10: 000000007951ae56 R11: 0000000000000246 R12: 000000000009ba00
	[39610.991289] R13: 000000000009b760 R14: 00000000000002a0 R15: 0000000000000000
	[39610.992603] rsync           D    0 12855   9095 0x00000000
	[39610.993581] Call Trace:
	[39610.993964]  __schedule+0x3d4/0xb70
	[39610.994512]  schedule+0x3d/0x80
	[39610.995002]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39610.995882]  ? wake_up_var+0x40/0x40
	[39610.996538]  btrfs_setattr+0x316/0x5a0
	[39610.997211]  notify_change+0x2f0/0x450
	[39610.997841]  do_truncate+0x73/0xc0
	[39610.998368]  do_sys_ftruncate+0x12b/0x1c0
	[39610.999054]  __x64_sys_ftruncate+0x1b/0x20
	[39610.999787]  do_syscall_64+0x65/0x1a0
	[39611.000438]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.001343] RIP: 0033:0x7f93b080fc97
	[39611.001963] Code: Bad RIP value.
	[39611.002523] RSP: 002b:00007fffc289a6f8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.003717] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f93b080fc97
	[39611.004965] RDX: 0000000000000000 RSI: 0000000000006fc8 RDI: 0000000000000003
	[39611.006187] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000610c490e
	[39611.007445] R10: 00000000ddc7f267 R11: 0000000000000246 R12: 0000000000006fc8
	[39611.008747] R13: 0000000000006d60 R14: 0000000000000268 R15: 0000000000000000
	[39611.010056] rsync           D    0 12856   8999 0x00000000
	[39611.011086] Call Trace:
	[39611.011599]  __schedule+0x3d4/0xb70
	[39611.012279]  schedule+0x3d/0x80
	[39611.012806]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.013748]  ? wake_up_var+0x40/0x40
	[39611.014410]  btrfs_setattr+0x316/0x5a0
	[39611.015129]  notify_change+0x2f0/0x450
	[39611.015840]  do_truncate+0x73/0xc0
	[39611.016471]  do_sys_ftruncate+0x12b/0x1c0
	[39611.017229]  __x64_sys_ftruncate+0x1b/0x20
	[39611.018002]  do_syscall_64+0x65/0x1a0
	[39611.018692]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.019624] RIP: 0033:0x7fb1cd9e1c97
	[39611.020273] Code: Bad RIP value.
	[39611.020900] RSP: 002b:00007ffc6ba2fbc8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.022206] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb1cd9e1c97
	[39611.023308] RDX: 0000000000000000 RSI: 00000000000093d8 RDI: 0000000000000003
	[39611.024431] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000021682eb2
	[39611.025668] R10: 00000000e1f12d36 R11: 0000000000000246 R12: 00000000000093d8
	[39611.027072] R13: 00000000000093a8 R14: 0000000000000030 R15: 0000000000000000
	[39611.028390] rsync           D    0 12857  10451 0x00000000
	[39611.029382] Call Trace:
	[39611.029843]  __schedule+0x3d4/0xb70
	[39611.030483]  schedule+0x3d/0x80
	[39611.031074]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.032050]  ? wake_up_var+0x40/0x40
	[39611.032722]  btrfs_setattr+0x316/0x5a0
	[39611.033409]  notify_change+0x2f0/0x450
	[39611.034118]  do_truncate+0x73/0xc0
	[39611.034756]  do_sys_ftruncate+0x12b/0x1c0
	[39611.035474]  __x64_sys_ftruncate+0x1b/0x20
	[39611.036161]  do_syscall_64+0x65/0x1a0
	[39611.036877]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.037787] RIP: 0033:0x7fab83a26c97
	[39611.039054] Code: Bad RIP value.
	[39611.039776] RSP: 002b:00007ffe7adce218 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.040967] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fab83a26c97
	[39611.042422] RDX: 0000000000000000 RSI: 00000000015983b4 RDI: 0000000000000003
	[39611.043747] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000004809bce
	[39611.044974] R10: 000000007f4aa483 R11: 0000000000000246 R12: 00000000015983b4
	[39611.047307] R13: 0000000001598250 R14: 0000000000000164 R15: 0000000000000000
	[39611.048509] rsync           D    0 12858  10345 0x00000000
	[39611.049365] Call Trace:
	[39611.050235]  __schedule+0x3d4/0xb70
	[39611.050798]  schedule+0x3d/0x80
	[39611.051289]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.052121]  ? wake_up_var+0x40/0x40
	[39611.053007]  btrfs_setattr+0x316/0x5a0
	[39611.053599]  notify_change+0x2f0/0x450
	[39611.054182]  do_truncate+0x73/0xc0
	[39611.054711]  do_sys_ftruncate+0x12b/0x1c0
	[39611.055338]  __x64_sys_ftruncate+0x1b/0x20
	[39611.056072]  do_syscall_64+0x65/0x1a0
	[39611.057115]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.058458] RIP: 0033:0x7f5fdff91c97
	[39611.059133] Code: Bad RIP value.
	[39611.059660] RSP: 002b:00007ffc67d00a58 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.060833] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5fdff91c97
	[39611.062271] RDX: 0000000000000000 RSI: 0000000000006fc8 RDI: 0000000000000003
	[39611.063389] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000610c490e
	[39611.064526] R10: 00000000ddc7f267 R11: 0000000000000246 R12: 0000000000006fc8
	[39611.065649] R13: 0000000000006d60 R14: 0000000000000268 R15: 0000000000000000
	[39611.066902] rsync           D    0 12861  10901 0x00000000
	[39611.067957] Call Trace:
	[39611.068355]  __schedule+0x3d4/0xb70
	[39611.068912]  schedule+0x3d/0x80
	[39611.069413]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.070605]  ? wake_up_var+0x40/0x40
	[39611.071174]  btrfs_setattr+0x316/0x5a0
	[39611.071789]  notify_change+0x2f0/0x450
	[39611.072383]  do_truncate+0x73/0xc0
	[39611.073001]  do_sys_ftruncate+0x12b/0x1c0
	[39611.073644]  __x64_sys_ftruncate+0x1b/0x20
	[39611.074299]  do_syscall_64+0x65/0x1a0
	[39611.074883]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.075681] RIP: 0033:0x7f3de4043c97
	[39611.076251] Code: Bad RIP value.
	[39611.076772] RSP: 002b:00007ffe58c8fac8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.077957] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3de4043c97
	[39611.079073] RDX: 0000000000000000 RSI: 0000000000003138 RDI: 0000000000000003
	[39611.080188] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000a81c530d
	[39611.081339] R10: 00000000f7dbc024 R11: 0000000000000246 R12: 0000000000003138
	[39611.082533] R13: 0000000000002e7c R14: 00000000000002bc R15: 0000000000000000
	[39611.083724] rsync           D    0 12863   7311 0x00000000
	[39611.084641] Call Trace:
	[39611.085060]  __schedule+0x3d4/0xb70
	[39611.085653]  schedule+0x3d/0x80
	[39611.086183]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.087052]  ? wake_up_var+0x40/0x40
	[39611.087660]  btrfs_setattr+0x316/0x5a0
	[39611.088286]  notify_change+0x2f0/0x450
	[39611.088923]  do_truncate+0x73/0xc0
	[39611.089506]  do_sys_ftruncate+0x12b/0x1c0
	[39611.090176]  __x64_sys_ftruncate+0x1b/0x20
	[39611.090870]  do_syscall_64+0x65/0x1a0
	[39611.091492]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.092333] RIP: 0033:0x7fdf2e203c97
	[39611.092938] Code: Bad RIP value.
	[39611.093491] RSP: 002b:00007ffda441e7e8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.094737] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdf2e203c97
	[39611.095916] RDX: 0000000000000000 RSI: 000000000000001d RDI: 0000000000000003
	[39611.097097] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000000000001d
	[39611.098277] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001d
	[39611.099439] R13: 0000000000000000 R14: 000000000000001d R15: 0000000000000000
	[39611.100839] rsync           D    0 12864   7029 0x00000000
	[39611.101792] Call Trace:
	[39611.102223]  __schedule+0x3d4/0xb70
	[39611.102939]  schedule+0x3d/0x80
	[39611.103548]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.104391]  ? wake_up_var+0x40/0x40
	[39611.104979]  btrfs_setattr+0x316/0x5a0
	[39611.105598]  notify_change+0x2f0/0x450
	[39611.106211]  do_truncate+0x73/0xc0
	[39611.106775]  do_sys_ftruncate+0x12b/0x1c0
	[39611.107413]  __x64_sys_ftruncate+0x1b/0x20
	[39611.108061]  do_syscall_64+0x65/0x1a0
	[39611.108651]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.109450] RIP: 0033:0x7fbd0ecc1c97
	[39611.110016] Code: Bad RIP value.
	[39611.110535] RSP: 002b:00007ffe7ccb9f98 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.111718] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbd0ecc1c97
	[39611.112835] RDX: 0000000000000000 RSI: 0000000000004f10 RDI: 0000000000000003
	[39611.114054] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000a72ac329
	[39611.115210] R10: 000000006c1ecb5e R11: 0000000000000246 R12: 0000000000004f10
	[39611.116358] R13: 0000000000004c90 R14: 0000000000000280 R15: 0000000000000000
	[39611.117842] rsync           D    0 12868   8122 0x00000000
	[39611.118737] Call Trace:
	[39611.119144]  __schedule+0x3d4/0xb70
	[39611.119723]  schedule+0x3d/0x80
	[39611.120238]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.121183]  ? wake_up_var+0x40/0x40
	[39611.121778]  btrfs_setattr+0x316/0x5a0
	[39611.122374]  notify_change+0x2f0/0x450
	[39611.122978]  do_truncate+0x73/0xc0
	[39611.123528]  do_sys_ftruncate+0x12b/0x1c0
	[39611.124162]  __x64_sys_ftruncate+0x1b/0x20
	[39611.124931]  do_syscall_64+0x65/0x1a0
	[39611.125528]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.126316] RIP: 0033:0x7f57cb45cc97
	[39611.126896] Code: Bad RIP value.
	[39611.127702] RSP: 002b:00007ffe555eee38 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.128893] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f57cb45cc97
	[39611.130050] RDX: 0000000000000000 RSI: 000000000000007e RDI: 0000000000000003
	[39611.131206] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000a64251e9
	[39611.132309] R10: 00000000b69a9260 R11: 0000000000000246 R12: 000000000000007e
	[39611.133449] R13: 0000000000000000 R14: 000000000000007e R15: 0000000000000000
	[39611.134660] rsync           D    0 12869   7189 0x00000000
	[39611.135511] Call Trace:
	[39611.135895]  __schedule+0x3d4/0xb70
	[39611.136441]  schedule+0x3d/0x80
	[39611.136949]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.138069]  ? wake_up_var+0x40/0x40
	[39611.138663]  btrfs_setattr+0x316/0x5a0
	[39611.139257]  notify_change+0x2f0/0x450
	[39611.140031]  do_truncate+0x73/0xc0
	[39611.140798]  do_sys_ftruncate+0x12b/0x1c0
	[39611.141726]  __x64_sys_ftruncate+0x1b/0x20
	[39611.142449]  do_syscall_64+0x65/0x1a0
	[39611.143013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.143792] RIP: 0033:0x7f2d34f4cc97
	[39611.144344] Code: Bad RIP value.
	[39611.144929] RSP: 002b:00007ffdb5c693d8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.146088] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d34f4cc97
	[39611.147180] RDX: 0000000000000000 RSI: 000000000063c214 RDI: 0000000000000003
	[39611.148819] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001a4a74f
	[39611.149950] R10: 00000000763c97e2 R11: 0000000000000246 R12: 000000000063c214
	[39611.151040] R13: 000000000063b9f8 R14: 000000000000081c R15: 0000000000000000
	[39611.152445] rsync           D    0 29478  29477 0x00000000
	[39611.153311] Call Trace:
	[39611.153735]  __schedule+0x3d4/0xb70
	[39611.154292]  schedule+0x3d/0x80
	[39611.154981]  btrfs_wait_for_snapshot_creation+0xb2/0xe0
	[39611.155912]  ? wake_up_var+0x40/0x40
	[39611.156481]  btrfs_setattr+0x316/0x5a0
	[39611.157169]  notify_change+0x2f0/0x450
	[39611.157766]  do_truncate+0x73/0xc0
	[39611.158309]  do_sys_ftruncate+0x12b/0x1c0
	[39611.158950]  __x64_sys_ftruncate+0x1b/0x20
	[39611.159626]  do_syscall_64+0x65/0x1a0
	[39611.160191]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[39611.160991] RIP: 0033:0x7f256a298c97
	[39611.161564] Code: Bad RIP value.
	[39611.162077] RSP: 002b:00007ffeadbc85c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
	[39611.163255] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f256a298c97
	[39611.164355] RDX: 0000000000000000 RSI: 0000000000008d10 RDI: 0000000000000003
	[39611.165474] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000008c0d9c29
	[39611.166588] R10: 000000003afa98d7 R11: 0000000000000246 R12: 0000000000008d10
	[39611.167700] R13: 0000000000008b74 R14: 000000000000019c R15: 0000000000000000

All the blocked threads are waiting for a snapshot, which is here:

	[42829.412010] btrfs           R  running task        0 11306   6447 0x00000000
	[42829.413392] Call Trace:
	[42829.413895]  __schedule+0x3d4/0xb70
	[42829.414598]  preempt_schedule_common+0x1f/0x30
	[42829.415480]  _cond_resched+0x22/0x30
	[42829.416198]  wait_for_common_io.constprop.2+0x47/0x1b0
	[42829.417212]  ? submit_bio+0x73/0x140
	[42829.417932]  wait_for_completion_io+0x18/0x20
	[42829.418791]  submit_bio_wait+0x68/0x90
	[42829.419546]  blkdev_issue_discard+0x80/0xd0
	[42829.420381]  btrfs_issue_discard+0xc7/0x160
	[42829.421215]  ? btrfs_issue_discard+0xc7/0x160
	[42829.422088]  btrfs_discard_extent+0xcc/0x160
	[42829.423191]  btrfs_finish_extent_commit+0x118/0x280
	[42829.424310]  btrfs_commit_transaction+0x7f9/0xab0
	[42829.425231]  ? wait_woken+0xa0/0xa0
	[42829.425907]  btrfs_mksubvol+0x5b4/0x630
	[42829.426766]  ? mnt_want_write_file+0x28/0x60
	[42829.427597]  btrfs_ioctl_snap_create_transid+0x16b/0x1a0
	[42829.428614]  btrfs_ioctl_snap_create_v2+0x125/0x180
	[42829.429548]  btrfs_ioctl+0x1351/0x2cb0
	[42829.430272]  ? __handle_mm_fault+0x110c/0x1950
	[42829.431124]  ? do_raw_spin_unlock+0x4d/0xc0
	[42829.431934]  ? _raw_spin_unlock+0x27/0x40
	[42829.432704]  ? __handle_mm_fault+0x110c/0x1950
	[42829.433556]  ? kvm_sched_clock_read+0x18/0x30
	[42829.434437]  do_vfs_ioctl+0xa6/0x6e0
	[42829.435345]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[42829.436803]  ? do_vfs_ioctl+0xa6/0x6e0
	[42829.437528]  ? up_read+0x1f/0xa0
	[42829.438159]  ksys_ioctl+0x75/0x80
	[42829.438798]  __x64_sys_ioctl+0x1a/0x20
	[42829.439565]  do_syscall_64+0x65/0x1a0
	[42829.440510]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[42829.441463] RIP: 0033:0x7f1faa8f5777
	[42829.442133] Code: Bad RIP value.
	[42829.442736] RSP: 002b:00007ffec1520458 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
	[42829.444352] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1faa8f5777
	[42829.445710] RDX: 00007ffec1520498 RSI: 0000000050009417 RDI: 0000000000000003
	[42829.447030] RBP: 00005621c22ef260 R08: 0000000000000008 R09: 00007f1faa97fe80
	[42829.448372] R10: fffffffffffffa4a R11: 0000000000000202 R12: 00005621c22ef290
	[42829.449723] R13: 00005621c22ef290 R14: 0000000000000003 R15: 0000000000000004

Also I just noticed there's sometimes (but not always!) a BUG in
fs/btrfs/ctree.c just before all the soft lockups start:

	[26101.008546] ------------[ cut here ]------------
	[26101.016090] kernel BUG at fs/btrfs/ctree.c:1227!
	[26101.018285] irq event stamp: 36913
	[26101.018287] invalid opcode: 0000 [#1] SMP PTI
	[26101.018293] CPU: 1 PID: 4823 Comm: crawl_5268 Not tainted 5.0.16-zb64-9b948ea3083a+ #1
	[26101.019115] hardirqs last  enabled at (36913): [<ffffffffbb25b02c>] get_page_from_freelist+0x40c/0x19e0
	[26101.019118] hardirqs last disabled at (36912): [<ffffffffbb25af70>] get_page_from_freelist+0x350/0x19e0
	[26101.020820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[26101.022456] softirqs last  enabled at (36478): [<ffffffffbc0003a0>] __do_softirq+0x3a0/0x45a
	[26101.022461] softirqs last disabled at (36459): [<ffffffffbb0a9949>] irq_exit+0xe9/0xf0
	[26101.024212] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
	[26101.039031] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
	[26101.042382] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206
	[26101.043512] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80
	[26101.044719] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000
	[26101.046157] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0
	[26101.047474] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008
	[26101.048779] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180
	[26101.049939] FS:  00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000
	[26101.051415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[26101.052330] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0
	[26101.053615] Call Trace:
	[26101.054103]  btrfs_search_old_slot+0xfe/0x800
	[26101.054900]  resolve_indirect_refs+0x1c5/0x910
	[26101.055734]  ? prelim_ref_insert+0x10a/0x320
	[26101.056474]  find_parent_nodes+0x443/0x1340
	[26101.057153]  btrfs_find_all_roots_safe+0xc5/0x140
	[26101.057890]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[26101.058813]  iterate_extent_inodes+0x198/0x3e0
	[26101.059608]  iterate_inodes_from_logical+0xa1/0xd0
	[26101.060377]  ? iterate_inodes_from_logical+0xa1/0xd0
	[26101.061142]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[26101.061975]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[26101.062710]  btrfs_ioctl+0xcf7/0x2cb0
	[26101.063318]  ? __lock_acquire+0x477/0x1b50
	[26101.063988]  ? kvm_sched_clock_read+0x18/0x30
	[26101.065010]  ? kvm_sched_clock_read+0x18/0x30
	[26101.065781]  ? sched_clock+0x9/0x10
	[26101.066402]  do_vfs_ioctl+0xa6/0x6e0
	[26101.067072]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[26101.068102]  ? do_vfs_ioctl+0xa6/0x6e0
	[26101.068765]  ? __fget+0x119/0x200
	[26101.069381]  ksys_ioctl+0x75/0x80
	[26101.069937]  __x64_sys_ioctl+0x1a/0x20
	[26101.070627]  do_syscall_64+0x65/0x1a0
	[26101.071299]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[26101.072125] RIP: 0033:0x7f9067675777
	[26101.072732] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[26101.075884] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[26101.077193] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777
	[26101.078362] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004
	[26101.079543] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960
	[26101.080837] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
	[26101.082724] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890
	[26101.084083] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data snd_pcm crct10dif_pclmul crc32_pclmul dm_bio_prison crc32c_intel ghash_clmulni_intel dm_bufio sr_mod snd_timer cdrom snd sg aesni_intel ppdev joydev aes_x86_64 dm_mod soundcore crypto_simd ide_pci_generic cryptd piix glue_helper psmouse input_leds parport_pc ide_core rtc_cmos pcspkr serio_raw bochs_drm evbug parport evdev floppy i2c_piix4 qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[26101.091346] ---[ end trace d327561dc44a663d ]---
	[26101.092065] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
	[26101.092893] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
	[26101.095707] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206
	[26101.096505] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80
	[26101.097595] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000
	[26101.098678] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0
	[26101.099771] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008
	[26101.100994] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180
	[26101.102230] FS:  00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000
	[26101.103649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[26101.104627] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0
	[26101.105812] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
	[26101.107351] in_atomic(): 1, irqs_disabled(): 0, pid: 4823, name: crawl_5268
	[26101.108628] INFO: lockdep is turned off.
	[26101.109362] CPU: 1 PID: 4823 Comm: crawl_5268 Tainted: G      D           5.0.16-zb64-9b948ea3083a+ #1
	[26101.110931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[26101.112303] Call Trace:
	[26101.112757]  dump_stack+0x86/0xc5
	[26101.113387]  ___might_sleep+0x217/0x240
	[26101.114077]  __might_sleep+0x4a/0x80
	[26101.114730]  exit_signals+0x33/0x250
	[26101.115508]  do_exit+0xb9/0xd70
	[26101.116095]  rewind_stack_do_exit+0x17/0x20
	[26101.116858] RIP: 0033:0x7f9067675777
	[26101.117497] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[26101.120764] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[26101.122031] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777
	[26101.123266] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004
	[26101.124474] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960
	[26101.125746] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
	[26101.126982] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890
	[26101.128246] note: crawl_5268[4823] exited with preempt_count 2
	[26128.042702] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [crawl_5268:4821]
	[26128.048713] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [crawl_5268:4820]
	[26128.051844] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [crawl_5268:4822]
	[26128.055703] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs-balance:4691]
	[...etc...]

The stack traces for the 4 running threads are:

	[42826.377803] crawl_5268      R  running task        0  4821   4806 0x80000008
	[42826.379193] Call Trace:
	[42826.379683]  <IRQ>
	[42826.380076]  sched_show_task+0x198/0x260
	[42826.380760]  show_state_filter+0xa0/0x1a0
	[42826.381438]  sysrq_handle_showstate+0x10/0x20
	[42826.382251]  __handle_sysrq+0x139/0x210
	[42826.382911]  handle_sysrq+0x26/0x30
	[42826.383507]  serial8250_handle_irq.part.16+0xbc/0x100
	[42826.384476]  serial8250_default_handle_irq+0x53/0x60
	[42826.385369]  serial8250_interrupt+0x68/0x100
	[42826.386102]  __handle_irq_event_percpu+0x4e/0x2b0
	[42826.386890]  handle_irq_event_percpu+0x32/0x80
	[42826.387627]  handle_irq_event+0x39/0x60
	[42826.388274]  handle_edge_irq+0xef/0x1c0
	[42826.388921]  handle_irq+0x75/0x120
	[42826.389475]  do_IRQ+0x64/0x130
	[42826.390007]  common_interrupt+0xf/0xf
	[42826.390617]  </IRQ>
	[42826.390984] RIP: 0010:native_safe_halt+0x12/0x20
	[42826.391750] Code: f0 80 48 02 20 48 8b 00 a8 08 0f 84 7b ff ff ff eb bd 90 90 90 90 90 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d b2 ec 43 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
	[42826.395273] RSP: 0018:ffffb3f401603750 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
	[42826.396849] RAX: 0000000000000000 RBX: ffff9f80a36716a4 RCX: 0000000000000008
	[42826.398308] RDX: ffff9f80ab13c000 RSI: ffffffffbb119a73 RDI: ffffffffbb07a326
	[42826.399766] RBP: ffffb3f401603750 R08: 0000000000000000 R09: 000000000000005c
	[42826.401269] R10: ffffb3f401603790 R11: ffff9f80a36716b8 R12: 0000000000000246
	[42826.402715] R13: 0000000000000003 R14: 0000000000000100 R15: 0000000000000000
	[42826.403891]  ? __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[42826.404798]  ? kvm_wait+0x86/0x90
	[42826.405346]  kvm_wait+0x8b/0x90
	[42826.405998]  __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[42826.407075]  queued_read_lock_slowpath+0x75/0x80
	[42826.408088]  do_raw_read_lock+0x4b/0x50
	[42826.408717]  _raw_read_lock+0x58/0x70
	[42826.409319]  __tree_mod_log_search+0x2d/0xb0
	[42826.410019]  btrfs_search_old_slot+0x312/0x800
	[42826.410738]  ? __tree_mod_log_search+0x73/0xb0
	[42826.411466]  resolve_indirect_refs+0x1c5/0x910
	[42826.412192]  ? prelim_ref_insert+0x10a/0x320
	[42826.412890]  find_parent_nodes+0x443/0x1340
	[42826.413597]  btrfs_find_all_roots_safe+0xc5/0x140
	[42826.414591]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[42826.415573]  iterate_extent_inodes+0x198/0x3e0
	[42826.416530]  iterate_inodes_from_logical+0xa1/0xd0
	[42826.417568]  ? iterate_inodes_from_logical+0xa1/0xd0
	[42826.418511]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[42826.419334]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[42826.420390]  btrfs_ioctl+0xcf7/0x2cb0
	[42826.421300]  ? __lock_acquire+0x477/0x1b50
	[42826.422246]  ? kvm_sched_clock_read+0x18/0x30
	[42826.422960]  ? kvm_sched_clock_read+0x18/0x30
	[42826.423731]  ? sched_clock+0x9/0x10
	[42826.424317]  do_vfs_ioctl+0xa6/0x6e0
	[42826.425014]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[42826.426162]  ? do_vfs_ioctl+0xa6/0x6e0
	[42826.426813]  ? __fget+0x119/0x200
	[42826.427370]  ksys_ioctl+0x75/0x80
	[42826.427926]  __x64_sys_ioctl+0x1a/0x20
	[42826.428609]  do_syscall_64+0x65/0x1a0
	[42826.429444]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[42826.430473] RIP: 0033:0x7f9067675777
	[42826.431201] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[42826.434629] RSP: 002b:00007f9065d79458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[42826.435975] RAX: ffffffffffffffda RBX: 00007f9065d79780 RCX: 00007f9067675777
	[42826.437165] RDX: 00007f9065d79788 RSI: 00000000c038943b RDI: 0000000000000004
	[42826.438305] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9065d79960
	[42826.439540] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
	[42826.440947] R13: 00007f9065d79788 R14: 00007f9065d79668 R15: 00007f9065d79890

	[42826.343964] crawl_5268      R  running task        0  4820   4806 0x80000008
	[42826.345377] Call Trace:
	[42826.345865]  __schedule+0x3dc/0xb70
	[42826.346500]  schedule+0x3d/0x80
	[42826.347055]  ? wait_on_page_bit+0x193/0x270
	[42826.347965]  ? retint_kernel+0x10/0x10
	[42826.348861]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[42826.349961]  ? retint_kernel+0x10/0x10
	[42826.350727]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[42826.351648]  ? trace_hardirqs_on_caller+0x4a/0x100
	[42826.352683]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[42826.353593]  ? retint_kernel+0x10/0x10
	[42826.354322]  ? kvm_wait+0x86/0x90
	[42826.354922]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[42826.355802]  ? trace_hardirqs_on+0x4c/0x100
	[42826.356489]  ? kvm_wait+0x8b/0x90
	[42826.357215]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[42826.358172]  ? queued_write_lock_slowpath+0x80/0x90
	[42826.359170]  ? do_raw_write_lock+0xae/0xb0
	[42826.359845]  ? _raw_write_lock+0x55/0x70
	[42826.360490]  ? btrfs_get_tree_mod_seq+0x32/0xc0
	[42826.361513]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[42826.362373]  ? iterate_extent_inodes+0x357/0x3e0
	[42826.363149]  ? release_extent_buffer+0xaa/0xe0
	[42826.364283]  ? _raw_spin_unlock+0x27/0x40
	[42826.365094]  ? iterate_inodes_from_logical+0xa1/0xd0
	[42826.366083]  ? iterate_inodes_from_logical+0xa1/0xd0
	[42826.367077]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[42826.368108]  ? btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[42826.369098]  ? btrfs_ioctl+0xcf7/0x2cb0
	[42826.369769]  ? lock_acquire+0xbd/0x1c0
	[42826.370430]  ? lock_acquire+0xbd/0x1c0
	[42826.371226]  ? do_vfs_ioctl+0xa6/0x6e0
	[42826.371884]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[42826.372899]  ? do_vfs_ioctl+0xa6/0x6e0
	[42826.373653]  ? __fget+0x119/0x200
	[42826.374490]  ? ksys_ioctl+0x75/0x80
	[42826.375323]  ? __x64_sys_ioctl+0x1a/0x20
	[42826.376203]  ? do_syscall_64+0x65/0x1a0
	[42826.376896]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

	[42826.442068] crawl_5268      R  running task        0  4822   4806 0x80000008
	[42826.443327] Call Trace:
	[42826.443765]  __schedule+0x3dc/0xb70
	[42826.444410]  schedule+0x3d/0x80
	[42826.445077]  ? wait_on_page_bit+0x193/0x270
	[42826.445949]  ? retint_kernel+0x10/0x10
	[42826.446568]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[42826.447332]  ? trace_hardirqs_on_caller+0x4a/0x100
	[42826.448104]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[42826.448845]  ? retint_kernel+0x10/0x10
	[42826.449452]  ? kvm_wait+0x86/0x90
	[42826.450000]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[42826.450870]  ? kvm_wait+0x86/0x90
	[42826.451422]  ? trace_hardirqs_on+0x4c/0x100
	[42826.452394]  ? kvm_wait+0x8b/0x90
	[42826.453108]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[42826.454029]  ? queued_write_lock_slowpath+0x80/0x90
	[42826.455032]  ? do_raw_write_lock+0xae/0xb0
	[42826.455721]  ? _raw_write_lock+0x55/0x70
	[42826.456384]  ? btrfs_get_tree_mod_seq+0x32/0xc0
	[42826.457362]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[42826.458465]  ? iterate_extent_inodes+0x357/0x3e0
	[42826.459275]  ? release_extent_buffer+0xaa/0xe0
	[42826.460200]  ? _raw_spin_unlock+0x27/0x40
	[42826.461139]  ? iterate_inodes_from_logical+0xa1/0xd0
	[42826.462004]  ? iterate_inodes_from_logical+0xa1/0xd0
	[42826.463032]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[42826.463906]  ? btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[42826.464752]  ? btrfs_ioctl+0xcf7/0x2cb0
	[42826.466053]  ? lock_acquire+0xbd/0x1c0
	[42826.466921]  ? lock_acquire+0xbd/0x1c0
	[42826.467765]  ? do_vfs_ioctl+0xa6/0x6e0
	[42826.468482]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[42826.469726]  ? do_vfs_ioctl+0xa6/0x6e0
	[42826.470366]  ? __fget+0x119/0x200
	[42826.470950]  ? ksys_ioctl+0x75/0x80
	[42826.471558]  ? __x64_sys_ioctl+0x1a/0x20
	[42826.472367]  ? do_syscall_64+0x65/0x1a0
	[42826.473038]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

	[42826.119748] btrfs-balance   R  running task        0  4691      2 0x80000008
	[42826.121155] Call Trace:
	[42826.121620]  tree_mod_log_insert_root.isra.2+0x117/0x350
	[42826.122670]  ? queued_write_lock_slowpath+0x50/0x90
	[42826.123496]  ? do_raw_write_lock+0xae/0xb0
	[42826.124167]  ? _raw_write_lock+0x55/0x70
	[42826.124804]  ? tree_mod_log_insert_root.isra.2+0x117/0x350
	[42826.125691]  ? __btrfs_cow_block+0x41e/0x570
	[42826.126390]  ? btrfs_cow_block+0xf8/0x230
	[42826.127051]  ? replace_path.isra.18+0x403/0x770
	[42826.127785]  ? merge_reloc_root+0x2ab/0x5a0
	[42826.128469]  ? btrfs_get_fs_root.part.12+0xc0/0x160
	[42826.129262]  ? merge_reloc_roots+0xe0/0x270
	[42826.129948]  ? relocate_block_group+0x184/0x600
	[42826.130681]  ? btrfs_relocate_block_group+0x15a/0x270
	[42826.131499]  ? btrfs_relocate_chunk+0x50/0x100
	[42826.132222]  ? btrfs_balance+0xa72/0x1330
	[42826.132878]  ? balance_kthread+0x25/0x50
	[42826.133520]  ? balance_kthread+0x3b/0x50
	[42826.134166]  ? kthread+0x113/0x150
	[42826.134720]  ? btrfs_balance+0x1330/0x1330
	[42826.135389]  ? kthread_park+0x90/0x90
	[42826.135993]  ? ret_from_fork+0x3a/0x50

This doesn't quite match the traces I previously posted.  Maybe the
storm-of-soft-lockups is a symptom of multiple bugs (at least one which
happens after a bug in ctree.c and one that happens at other times)?
I'll run this a few more times and see if different cases come up.

The one thing that is common to all the storm-of-soft-lockups I've seen
so far is the involvement of multiple crawl_* threads, and those spend
~60% of their time running logical_to_ino.

> Also do you have this patch on the affected machine:
> 
> 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you
> try and test with it applied ?
> 
> 
> <SNIP>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
  2019-05-18  4:44   ` Zygo Blaxell
@ 2019-05-18  8:27     ` Nikolay Borisov
  2019-05-18 21:00       ` Zygo Blaxell
  2019-05-18 21:41       ` Zygo Blaxell
  0 siblings, 2 replies; 8+ messages in thread
From: Nikolay Borisov @ 2019-05-18  8:27 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs



On 18.05.19 г. 7:44 ч., Zygo Blaxell wrote:
> On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
>>
>>
>> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
>>> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
>>> CPU cores in kernel functions that are unable to make forward progress,
>>> but also unwilling to release their respective CPU cores.  This is
>>> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
>>> usage or just a lot of CPU fan noise) though I don't know if all cores
>>> are spinning or only some of them.
>>>
>>> The kernel console presents a continual stream of "BUG: soft lockup"
>>> warnings for some days.  None of the call traces change during this time.
>>> The only way out is to reboot.
>>>
>>> You can reproduce this by writing a bunch of data to a filesystem while
>>> bees is running on all cores.  It takes a few days to occur naturally.
>>> It can probably be sped up by just doing a bunch of random LOGICAL_INO
>>> ioctls in a tight loop on each core.
>>>
>>> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
>>> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
>>> ioctl (btrfs_tree_read_lock_atomic):
>>
>>
>> Provide output of all sleeping threads when this occur via
>>  echo w > /proc/sysrq-trigger.
> 
> I fixed my SysRq configuration.  The results are...kind of interesting.
> I guess the four threads that are running loops on all cores don't count
> as "blocked"...
> 
<snip>

> 
> All the blocked threads are waiting for a snapshot, which is here:
> 
> 	[42829.412010] btrfs           R  running task        0 11306   6447 0x00000000
> 	[42829.413392] Call Trace:
> 	[42829.413895]  __schedule+0x3d4/0xb70
> 	[42829.414598]  preempt_schedule_common+0x1f/0x30
> 	[42829.415480]  _cond_resched+0x22/0x30
> 	[42829.416198]  wait_for_common_io.constprop.2+0x47/0x1b0
> 	[42829.417212]  ? submit_bio+0x73/0x140
> 	[42829.417932]  wait_for_completion_io+0x18/0x20
> 	[42829.418791]  submit_bio_wait+0x68/0x90
> 	[42829.419546]  blkdev_issue_discard+0x80/0xd0
> 	[42829.420381]  btrfs_issue_discard+0xc7/0x160
> 	[42829.421215]  ? btrfs_issue_discard+0xc7/0x160
> 	[42829.422088]  btrfs_discard_extent+0xcc/0x160
> 	[42829.423191]  btrfs_finish_extent_commit+0x118/0x280
> 	[42829.424310]  btrfs_commit_transaction+0x7f9/0xab0
> 	[42829.425231]  ? wait_woken+0xa0/0xa0
> 	[42829.425907]  btrfs_mksubvol+0x5b4/0x630
> 	[42829.426766]  ? mnt_want_write_file+0x28/0x60
> 	[42829.427597]  btrfs_ioctl_snap_create_transid+0x16b/0x1a0
> 	[42829.428614]  btrfs_ioctl_snap_create_v2+0x125/0x180
> 	[42829.429548]  btrfs_ioctl+0x1351/0x2cb0
> 	[42829.430272]  ? __handle_mm_fault+0x110c/0x1950
> 	[42829.431124]  ? do_raw_spin_unlock+0x4d/0xc0
> 	[42829.431934]  ? _raw_spin_unlock+0x27/0x40
> 	[42829.432704]  ? __handle_mm_fault+0x110c/0x1950
> 	[42829.433556]  ? kvm_sched_clock_read+0x18/0x30
> 	[42829.434437]  do_vfs_ioctl+0xa6/0x6e0
> 	[42829.435345]  ? btrfs_ioctl_get_supported_features+0x30/0x30
> 	[42829.436803]  ? do_vfs_ioctl+0xa6/0x6e0
> 	[42829.437528]  ? up_read+0x1f/0xa0
> 	[42829.438159]  ksys_ioctl+0x75/0x80
> 	[42829.438798]  __x64_sys_ioctl+0x1a/0x20
> 	[42829.439565]  do_syscall_64+0x65/0x1a0
> 	[42829.440510]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 	[42829.441463] RIP: 0033:0x7f1faa8f5777
> 	[42829.442133] Code: Bad RIP value.
> 	[42829.442736] RSP: 002b:00007ffec1520458 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> 	[42829.444352] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1faa8f5777
> 	[42829.445710] RDX: 00007ffec1520498 RSI: 0000000050009417 RDI: 0000000000000003
> 	[42829.447030] RBP: 00005621c22ef260 R08: 0000000000000008 R09: 00007f1faa97fe80
> 	[42829.448372] R10: fffffffffffffa4a R11: 0000000000000202 R12: 00005621c22ef290
> 	[42829.449723] R13: 00005621c22ef290 R14: 0000000000000003 R15: 0000000000000004

So you are discards are hung or there is a lost wakeup (which is rather
unlikely, since this is the block layer). What is the underlying block
device i.e model/make

> 
> Also I just noticed there's sometimes (but not always!) a BUG in
> fs/btrfs/ctree.c just before all the soft lockups start:
> 
> 	[26101.008546] ------------[ cut here ]------------
> 	[26101.016090] kernel BUG at fs/btrfs/ctree.c:1227!
> 	[26101.018285] irq event stamp: 36913
> 	[26101.018287] invalid opcode: 0000 [#1] SMP PTI
> 	[26101.018293] CPU: 1 PID: 4823 Comm: crawl_5268 Not tainted 5.0.16-zb64-9b948ea3083a+ #1
> 	[26101.019115] hardirqs last  enabled at (36913): [<ffffffffbb25b02c>] get_page_from_freelist+0x40c/0x19e0
> 	[26101.019118] hardirqs last disabled at (36912): [<ffffffffbb25af70>] get_page_from_freelist+0x350/0x19e0
> 	[26101.020820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> 	[26101.022456] softirqs last  enabled at (36478): [<ffffffffbc0003a0>] __do_softirq+0x3a0/0x45a
> 	[26101.022461] softirqs last disabled at (36459): [<ffffffffbb0a9949>] irq_exit+0xe9/0xf0
> 	[26101.024212] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
> 	[26101.039031] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
> 	[26101.042382] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206
> 	[26101.043512] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80
> 	[26101.044719] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000
> 	[26101.046157] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0
> 	[26101.047474] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008
> 	[26101.048779] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180
> 	[26101.049939] FS:  00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000
> 	[26101.051415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[26101.052330] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0
> 	[26101.053615] Call Trace:
> 	[26101.054103]  btrfs_search_old_slot+0xfe/0x800
> 	[26101.054900]  resolve_indirect_refs+0x1c5/0x910
> 	[26101.055734]  ? prelim_ref_insert+0x10a/0x320
> 	[26101.056474]  find_parent_nodes+0x443/0x1340
> 	[26101.057153]  btrfs_find_all_roots_safe+0xc5/0x140
> 	[26101.057890]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> 	[26101.058813]  iterate_extent_inodes+0x198/0x3e0
> 	[26101.059608]  iterate_inodes_from_logical+0xa1/0xd0
> 	[26101.060377]  ? iterate_inodes_from_logical+0xa1/0xd0
> 	[26101.061142]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> 	[26101.061975]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
> 	[26101.062710]  btrfs_ioctl+0xcf7/0x2cb0
> 	[26101.063318]  ? __lock_acquire+0x477/0x1b50
> 	[26101.063988]  ? kvm_sched_clock_read+0x18/0x30
> 	[26101.065010]  ? kvm_sched_clock_read+0x18/0x30
> 	[26101.065781]  ? sched_clock+0x9/0x10
> 	[26101.066402]  do_vfs_ioctl+0xa6/0x6e0
> 	[26101.067072]  ? btrfs_ioctl_get_supported_features+0x30/0x30
> 	[26101.068102]  ? do_vfs_ioctl+0xa6/0x6e0
> 	[26101.068765]  ? __fget+0x119/0x200
> 	[26101.069381]  ksys_ioctl+0x75/0x80
> 	[26101.069937]  __x64_sys_ioctl+0x1a/0x20
> 	[26101.070627]  do_syscall_64+0x65/0x1a0
> 	[26101.071299]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 	[26101.072125] RIP: 0033:0x7f9067675777
> 	[26101.072732] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
> 	[26101.075884] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> 	[26101.077193] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777
> 	[26101.078362] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004
> 	[26101.079543] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960
> 	[26101.080837] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
> 	[26101.082724] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890
> 	[26101.084083] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data snd_pcm crct10dif_pclmul crc32_pclmul dm_bio_prison crc32c_intel ghash_clmulni_intel dm_bufio sr_mod snd_timer cdrom snd sg aesni_intel ppdev joydev aes_x86_64 dm_mod soundcore crypto_simd ide_pci_generic cryptd piix glue_helper psmouse input_leds parport_pc ide_core rtc_cmos pcspkr serio_raw bochs_drm evbug parport evdev floppy i2c_piix4 qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
> 	[26101.091346] ---[ end trace d327561dc44a663d ]---

That looks like genuine bug, first I've ever seen the rewind code
hitting it. The rest that follows doesn't seem relevant.

<snip>

> 
> This doesn't quite match the traces I previously posted.  Maybe the
> storm-of-soft-lockups is a symptom of multiple bugs (at least one which
> happens after a bug in ctree.c and one that happens at other times)?
> I'll run this a few more times and see if different cases come up.
> 
> The one thing that is common to all the storm-of-soft-lockups I've seen
> so far is the involvement of multiple crawl_* threads, and those spend
> ~60% of their time running logical_to_ino.
> 
>> Also do you have this patch on the affected machine:
>>
>> 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you
>> try and test with it applied ?
>>
>>
>> <SNIP>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
  2019-05-18  8:27     ` Nikolay Borisov
@ 2019-05-18 21:00       ` Zygo Blaxell
  2019-05-18 21:41       ` Zygo Blaxell
  1 sibling, 0 replies; 8+ messages in thread
From: Zygo Blaxell @ 2019-05-18 21:00 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 12046 bytes --]

On Sat, May 18, 2019 at 11:27:46AM +0300, Nikolay Borisov wrote:
> 
> 
> On 18.05.19 г. 7:44 ч., Zygo Blaxell wrote:
> > On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
> >>
> >>
> >> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> >>> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> >>> CPU cores in kernel functions that are unable to make forward progress,
> >>> but also unwilling to release their respective CPU cores.  This is
> >>> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> >>> usage or just a lot of CPU fan noise) though I don't know if all cores
> >>> are spinning or only some of them.
> >>>
> >>> The kernel console presents a continual stream of "BUG: soft lockup"
> >>> warnings for some days.  None of the call traces change during this time.
> >>> The only way out is to reboot.
> >>>
> >>> You can reproduce this by writing a bunch of data to a filesystem while
> >>> bees is running on all cores.  It takes a few days to occur naturally.
> >>> It can probably be sped up by just doing a bunch of random LOGICAL_INO
> >>> ioctls in a tight loop on each core.
> >>>
> >>> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> >>> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> >>> ioctl (btrfs_tree_read_lock_atomic):
> >>
> >>
> >> Provide output of all sleeping threads when this occur via
> >>  echo w > /proc/sysrq-trigger.
> > 
> > I fixed my SysRq configuration.  The results are...kind of interesting.
> > I guess the four threads that are running loops on all cores don't count
> > as "blocked"...
> > 
> <snip>
> 
> > 
> > All the blocked threads are waiting for a snapshot, which is here:
> > 
> > 	[42829.412010] btrfs           R  running task        0 11306   6447 0x00000000
> > 	[42829.413392] Call Trace:
> > 	[42829.413895]  __schedule+0x3d4/0xb70
> > 	[42829.414598]  preempt_schedule_common+0x1f/0x30
> > 	[42829.415480]  _cond_resched+0x22/0x30
> > 	[42829.416198]  wait_for_common_io.constprop.2+0x47/0x1b0
> > 	[42829.417212]  ? submit_bio+0x73/0x140
> > 	[42829.417932]  wait_for_completion_io+0x18/0x20
> > 	[42829.418791]  submit_bio_wait+0x68/0x90
> > 	[42829.419546]  blkdev_issue_discard+0x80/0xd0
> > 	[42829.420381]  btrfs_issue_discard+0xc7/0x160
> > 	[42829.421215]  ? btrfs_issue_discard+0xc7/0x160
> > 	[42829.422088]  btrfs_discard_extent+0xcc/0x160
> > 	[42829.423191]  btrfs_finish_extent_commit+0x118/0x280
> > 	[42829.424310]  btrfs_commit_transaction+0x7f9/0xab0
> > 	[42829.425231]  ? wait_woken+0xa0/0xa0
> > 	[42829.425907]  btrfs_mksubvol+0x5b4/0x630
> > 	[42829.426766]  ? mnt_want_write_file+0x28/0x60
> > 	[42829.427597]  btrfs_ioctl_snap_create_transid+0x16b/0x1a0
> > 	[42829.428614]  btrfs_ioctl_snap_create_v2+0x125/0x180
> > 	[42829.429548]  btrfs_ioctl+0x1351/0x2cb0
> > 	[42829.430272]  ? __handle_mm_fault+0x110c/0x1950
> > 	[42829.431124]  ? do_raw_spin_unlock+0x4d/0xc0
> > 	[42829.431934]  ? _raw_spin_unlock+0x27/0x40
> > 	[42829.432704]  ? __handle_mm_fault+0x110c/0x1950
> > 	[42829.433556]  ? kvm_sched_clock_read+0x18/0x30
> > 	[42829.434437]  do_vfs_ioctl+0xa6/0x6e0
> > 	[42829.435345]  ? btrfs_ioctl_get_supported_features+0x30/0x30
> > 	[42829.436803]  ? do_vfs_ioctl+0xa6/0x6e0
> > 	[42829.437528]  ? up_read+0x1f/0xa0
> > 	[42829.438159]  ksys_ioctl+0x75/0x80
> > 	[42829.438798]  __x64_sys_ioctl+0x1a/0x20
> > 	[42829.439565]  do_syscall_64+0x65/0x1a0
> > 	[42829.440510]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 	[42829.441463] RIP: 0033:0x7f1faa8f5777
> > 	[42829.442133] Code: Bad RIP value.
> > 	[42829.442736] RSP: 002b:00007ffec1520458 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> > 	[42829.444352] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1faa8f5777
> > 	[42829.445710] RDX: 00007ffec1520498 RSI: 0000000050009417 RDI: 0000000000000003
> > 	[42829.447030] RBP: 00005621c22ef260 R08: 0000000000000008 R09: 00007f1faa97fe80
> > 	[42829.448372] R10: fffffffffffffa4a R11: 0000000000000202 R12: 00005621c22ef290
> > 	[42829.449723] R13: 00005621c22ef290 R14: 0000000000000003 R15: 0000000000000004
> 
> So you are discards are hung or there is a lost wakeup (which is rather
> unlikely, since this is the block layer). What is the underlying block
> device i.e model/make

I doubt that discard is the cause here.  It is more likely to be a
victim of what is happening elsewhere.  Note that the snapshot task
above is not blocked--the wait for IO is finished, and the task is
ready to execute--but the task can't get scheduled to execute because
all of the CPUs are stuck on other tasks and they will not allow _any_
other task to run.

The storm-of-soft-lockups symptom happens on a diverse set of hardware
and block layer configurations, from single NVME devices to arrays of
spinning HDDs to mixed HDD/SDD/cache arrays, and has been occurring
for years.  There's no correlation with discard (it happens on systems
with and without discard enabled).  There are correlations with CPU core
count (at least 3 CPUs required, 2-core systems won't reproduce the bug)
and workload (limit LOGICAL_INO thread count to number_of_cpus - 1,
and the bug never happens).

Underlying block layers for the stack traces above are:

	- lvmcache (the lowest layer that accepts discard)
	- virtio (guest)
	- dm-crypt (host)
	- lvm
	- SSD KINGSTON SV300S37A480G (shared between HDDs in lvmcache)
	- HDD WDC WD20EFRX
	- HDD WDC WD40EFRX

The SSD 1) doesn't get discard requests, and 2) if it did, failures
would be noticed in one of the higher layers.

> > Also I just noticed there's sometimes (but not always!) a BUG in
> > fs/btrfs/ctree.c just before all the soft lockups start:
> > 
> > 	[26101.008546] ------------[ cut here ]------------
> > 	[26101.016090] kernel BUG at fs/btrfs/ctree.c:1227!
> > 	[26101.018285] irq event stamp: 36913
> > 	[26101.018287] invalid opcode: 0000 [#1] SMP PTI
> > 	[26101.018293] CPU: 1 PID: 4823 Comm: crawl_5268 Not tainted 5.0.16-zb64-9b948ea3083a+ #1
> > 	[26101.019115] hardirqs last  enabled at (36913): [<ffffffffbb25b02c>] get_page_from_freelist+0x40c/0x19e0
> > 	[26101.019118] hardirqs last disabled at (36912): [<ffffffffbb25af70>] get_page_from_freelist+0x350/0x19e0
> > 	[26101.020820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > 	[26101.022456] softirqs last  enabled at (36478): [<ffffffffbc0003a0>] __do_softirq+0x3a0/0x45a
> > 	[26101.022461] softirqs last disabled at (36459): [<ffffffffbb0a9949>] irq_exit+0xe9/0xf0
> > 	[26101.024212] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
> > 	[26101.039031] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
> > 	[26101.042382] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206
> > 	[26101.043512] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80
> > 	[26101.044719] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000
> > 	[26101.046157] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0
> > 	[26101.047474] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008
> > 	[26101.048779] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180
> > 	[26101.049939] FS:  00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000
> > 	[26101.051415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > 	[26101.052330] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0
> > 	[26101.053615] Call Trace:
> > 	[26101.054103]  btrfs_search_old_slot+0xfe/0x800
> > 	[26101.054900]  resolve_indirect_refs+0x1c5/0x910
> > 	[26101.055734]  ? prelim_ref_insert+0x10a/0x320
> > 	[26101.056474]  find_parent_nodes+0x443/0x1340
> > 	[26101.057153]  btrfs_find_all_roots_safe+0xc5/0x140
> > 	[26101.057890]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> > 	[26101.058813]  iterate_extent_inodes+0x198/0x3e0
> > 	[26101.059608]  iterate_inodes_from_logical+0xa1/0xd0
> > 	[26101.060377]  ? iterate_inodes_from_logical+0xa1/0xd0
> > 	[26101.061142]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
> > 	[26101.061975]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
> > 	[26101.062710]  btrfs_ioctl+0xcf7/0x2cb0
> > 	[26101.063318]  ? __lock_acquire+0x477/0x1b50
> > 	[26101.063988]  ? kvm_sched_clock_read+0x18/0x30
> > 	[26101.065010]  ? kvm_sched_clock_read+0x18/0x30
> > 	[26101.065781]  ? sched_clock+0x9/0x10
> > 	[26101.066402]  do_vfs_ioctl+0xa6/0x6e0
> > 	[26101.067072]  ? btrfs_ioctl_get_supported_features+0x30/0x30
> > 	[26101.068102]  ? do_vfs_ioctl+0xa6/0x6e0
> > 	[26101.068765]  ? __fget+0x119/0x200
> > 	[26101.069381]  ksys_ioctl+0x75/0x80
> > 	[26101.069937]  __x64_sys_ioctl+0x1a/0x20
> > 	[26101.070627]  do_syscall_64+0x65/0x1a0
> > 	[26101.071299]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 	[26101.072125] RIP: 0033:0x7f9067675777
> > 	[26101.072732] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
> > 	[26101.075884] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > 	[26101.077193] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777
> > 	[26101.078362] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004
> > 	[26101.079543] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960
> > 	[26101.080837] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
> > 	[26101.082724] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890
> > 	[26101.084083] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data snd_pcm crct10dif_pclmul crc32_pclmul dm_bio_prison crc32c_intel ghash_clmulni_intel dm_bufio sr_mod snd_timer cdrom snd sg aesni_intel ppdev joydev aes_x86_64 dm_mod soundcore crypto_simd ide_pci_generic cryptd piix glue_helper psmouse input_leds parport_pc ide_core rtc_cmos pcspkr serio_raw bochs_drm evbug parport evdev floppy i2c_piix4 qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
> > 	[26101.091346] ---[ end trace d327561dc44a663d ]---
> 
> That looks like genuine bug, first I've ever seen the rewind code
> hitting it. The rest that follows doesn't seem relevant.

I'm not surprised to find more than one bug at a time.  Historically I
have had to separate out half a dozen concurrently occurring bugs
before every bug report.

That said, the kernel BUG at fs/btrfs/ctree.c:1227 and
storm-of-soft-lockups events correlate very well.  On 5.0.x and 4.19.x
test runs so far, they have never happened separately, and BUG precedes
storm consistently by 22 seconds.  So maybe storm-of-soft-lockups is
itself a symptom, a result of the ctree BUG.

"Kernel BUG at fs/btrfs/ctree.c" isn't in my data from 4.20.x and 4.14.x,
but I don't have confidence in that data due to too many failures caused
by unrelated bugs--the 4.14 flushoncommit deadlock bug happens 20x more
frequently than storm-of-soft-lockups, and 4.20 has MM issues that make
it untestable.

> <snip>
> 
> > 
> > This doesn't quite match the traces I previously posted.  Maybe the
> > storm-of-soft-lockups is a symptom of multiple bugs (at least one which
> > happens after a bug in ctree.c and one that happens at other times)?
> > I'll run this a few more times and see if different cases come up.
> > 
> > The one thing that is common to all the storm-of-soft-lockups I've seen
> > so far is the involvement of multiple crawl_* threads, and those spend
> > ~60% of their time running logical_to_ino.
> > 
> >> Also do you have this patch on the affected machine:
> >>
> >> 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you
> >> try and test with it applied ?
> >>
> >>
> >> <SNIP>
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
  2019-05-18  8:27     ` Nikolay Borisov
  2019-05-18 21:00       ` Zygo Blaxell
@ 2019-05-18 21:41       ` Zygo Blaxell
  1 sibling, 0 replies; 8+ messages in thread
From: Zygo Blaxell @ 2019-05-18 21:41 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 29227 bytes --]

On Sat, May 18, 2019 at 11:27:46AM +0300, Nikolay Borisov wrote:
> On 18.05.19 г. 7:44 ч., Zygo Blaxell wrote:
> > On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
> >> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> >>> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> >>> CPU cores in kernel functions that are unable to make forward progress,
> >>> but also unwilling to release their respective CPU cores.  This is
> >>> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> >>> usage or just a lot of CPU fan noise) though I don't know if all cores
> >>> are spinning or only some of them.
> >>>
> >>> The kernel console presents a continual stream of "BUG: soft lockup"
> >>> warnings for some days.  None of the call traces change during this time.
> >>> The only way out is to reboot.
> >>>
> >>> You can reproduce this by writing a bunch of data to a filesystem while
> >>> bees is running on all cores.  It takes a few days to occur naturally.
> >>> It can probably be sped up by just doing a bunch of random LOGICAL_INO
> >>> ioctls in a tight loop on each core.
> >>>
> >>> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> >>> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> >>> ioctl (btrfs_tree_read_lock_atomic):
> >>
> >>
> >> Provide output of all sleeping threads when this occur via
> >>  echo w > /proc/sysrq-trigger.
> > 
> > I fixed my SysRq configuration.  The results are...kind of interesting.
> > I guess the four threads that are running loops on all cores don't count
> > as "blocked"...
> > 
> <snip>
> So you are discards are hung or there is a lost wakeup (which is rather
> unlikely, since this is the block layer). What is the underlying block
> device i.e model/make

Here's another set of soft-lockups, this time with no discard or snapshot.
This one's more interesting--the stuck CPUs are running something that
isn't commit_transaction or LOGICAL_INO.  It's the first time I've
seen that.

First the kernel BUG at fs/btrfs/ctree.c which looks pretty similar to
the last one:

	[20512.078483] kernel BUG at fs/btrfs/ctree.c:1227!
	[20512.079356] invalid opcode: 0000 [#1] SMP PTI
	[20512.080144] CPU: 1 PID: 5662 Comm: crawl_5268 Tainted: G        W         5.0.16-zb64-9b948ea3083a+ #1
	[20512.081397] irq event stamp: 2743290041
	[20512.082303] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[20512.082307] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
	[20512.082308] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
	[20512.082309] RSP: 0018:ffffa95b80f1b820 EFLAGS: 00010206
	[20512.082311] RAX: ffff9e059aa5a100 RBX: ffff9e04e01496e0 RCX: ffff9e06555dcb81
	[20512.082312] RDX: 00000000000000ce RSI: 000000000000009f RDI: 0000006336410000
	[20512.082312] RBP: ffffa95b80f1b850 R08: ffffa95b80f1b7c8 R09: ffffa95b80f1b7d0
	[20512.082313] R10: 0000000000007b2a R11: 000000000000009f R12: 0000000000000008
	[20512.082314] R13: ffff9e05c9e21c80 R14: 000000000003a09e R15: ffff9e059aa5a100
	[20512.082315] FS:  00007f938be31700(0000) GS:ffff9e06b6000000(0000) knlGS:0000000000000000
	[20512.082316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[20512.082317] CR2: 00007f6dc1f8c519 CR3: 00000001c3d14005 CR4: 00000000001606e0
	[20512.082319] Call Trace:
	[20512.082324]  btrfs_search_old_slot+0xfe/0x800
	[20512.082339]  resolve_indirect_refs+0x1c5/0x910
	[20512.082345]  ? prelim_ref_insert+0x10a/0x320
	[20512.082347]  find_parent_nodes+0x443/0x1340
	[20512.082356]  btrfs_find_all_roots_safe+0xc5/0x140
	[20512.082359]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[20512.082361]  iterate_extent_inodes+0x198/0x3e0
	[20512.082367]  iterate_inodes_from_logical+0xa1/0xd0
	[20512.082369]  ? iterate_inodes_from_logical+0xa1/0xd0
	[20512.082370]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[20512.082373]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[20512.082376]  btrfs_ioctl+0xcf7/0x2cb0
	[20512.082380]  ? __lock_acquire+0x477/0x1b50
	[20512.082382]  ? kvm_sched_clock_read+0x18/0x30
	[20512.082385]  ? kvm_sched_clock_read+0x18/0x30
	[20512.082387]  ? sched_clock+0x9/0x10
	[20512.082390]  do_vfs_ioctl+0xa6/0x6e0
	[20512.082391]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[20512.082392]  ? do_vfs_ioctl+0xa6/0x6e0
	[20512.082394]  ? __fget+0x119/0x200
	[20512.082397]  ksys_ioctl+0x75/0x80
	[20512.082399]  __x64_sys_ioctl+0x1a/0x20
	[20512.082402]  do_syscall_64+0x65/0x1a0
	[20512.082405]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[20512.082407] RIP: 0033:0x7f938cf29777
	[20512.082408] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[20512.082409] RSP: 002b:00007f938be2e458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[20512.082410] RAX: ffffffffffffffda RBX: 00007f938be2e780 RCX: 00007f938cf29777
	[20512.082411] RDX: 00007f938be2e788 RSI: 00000000c038943b RDI: 0000000000000003
	[20512.082412] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f938be2e960
	[20512.082412] R10: 000055ad4e628c40 R11: 0000000000000246 R12: 0000000000000003
	[20512.082413] R13: 00007f938be2e788 R14: 00007f938be2e668 R15: 00007f938be2e890
	[20512.082418] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev joydev dm_mod snd_pcm aesni_intel aes_x86_64 sr_mod cdrom crypto_simd snd_timer cryptd sg glue_helper snd parport_pc ide_pci_generic parport soundcore piix bochs_drm input_leds ide_core rtc_cmos floppy i2c_piix4 pcspkr psmouse evbug evdev serio_raw qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[20512.082464] ---[ end trace 3bcb7e6493f60fe5 ]---
	[20512.082466] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
	[20512.082467] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
	[20512.082468] RSP: 0018:ffffa95b80f1b820 EFLAGS: 00010206
	[20512.082469] RAX: ffff9e059aa5a100 RBX: ffff9e04e01496e0 RCX: ffff9e06555dcb81
	[20512.082469] RDX: 00000000000000ce RSI: 000000000000009f RDI: 0000006336410000
	[20512.082470] RBP: ffffa95b80f1b850 R08: ffffa95b80f1b7c8 R09: ffffa95b80f1b7d0
	[20512.082471] R10: 0000000000007b2a R11: 000000000000009f R12: 0000000000000008
	[20512.082472] R13: ffff9e05c9e21c80 R14: 000000000003a09e R15: ffff9e059aa5a100
	[20512.082473] FS:  00007f938be31700(0000) GS:ffff9e06b6000000(0000) knlGS:0000000000000000
	[20512.082473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[20512.082474] CR2: 00007f6dc1f8c519 CR3: 00000001c3d14005 CR4: 00000000001606e0
	[20512.082478] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
	[20512.082479] in_atomic(): 1, irqs_disabled(): 0, pid: 5662, name: crawl_5268
	[20512.082479] INFO: lockdep is turned off.
	[20512.082481] CPU: 1 PID: 5662 Comm: crawl_5268 Tainted: G      D W         5.0.16-zb64-9b948ea3083a+ #1
	[20512.082482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[20512.082482] Call Trace:
	[20512.082485]  dump_stack+0x86/0xc5
	[20512.082488]  ___might_sleep+0x217/0x240
	[20512.082490]  __might_sleep+0x4a/0x80
	[20512.082494]  exit_signals+0x33/0x250
	[20512.082496]  do_exit+0xb9/0xd70
	[20512.082502]  rewind_stack_do_exit+0x17/0x20
	[20512.082503] RIP: 0033:0x7f938cf29777
	[20512.082504] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[20512.082505] RSP: 002b:00007f938be2e458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[20512.082506] RAX: ffffffffffffffda RBX: 00007f938be2e780 RCX: 00007f938cf29777
	[20512.082507] RDX: 00007f938be2e788 RSI: 00000000c038943b RDI: 0000000000000003
	[20512.082508] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f938be2e960
	[20512.082508] R10: 000055ad4e628c40 R11: 0000000000000246 R12: 0000000000000003
	[20512.082509] R13: 00007f938be2e788 R14: 00007f938be2e668 R15: 00007f938be2e890
	[20512.082514] note: crawl_5268[5662] exited with preempt_count 2
	[20512.214564] hardirqs last  enabled at (2743290041): [<ffffffff922d4758>] kmem_cache_alloc+0x208/0x330
	[20512.216158] hardirqs last disabled at (2743290040): [<ffffffff922d4614>] kmem_cache_alloc+0xc4/0x330
	[20512.217793] softirqs last  enabled at (2743288790): [<ffffffff930003a0>] __do_softirq+0x3a0/0x45a
	[20512.219555] softirqs last disabled at (2743288767): [<ffffffff920a9949>] irq_exit+0xe9/0xf0
	[20536.037778] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [rsync:5715]
	[20536.042129] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs-transacti:4737]
	[20540.028759] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [crawl_5268:5663]
	[20540.033740] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [crawl_5268:5664]

Then SysRq-W:

	[53268.006056] sysrq: SysRq : Show Blocked State
	[53268.006780]   task                        PC stack   pid father
	[53268.007927] rsync           D    0  5683   5673 0x00000000
	[53268.008786] Call Trace:
	[53268.009215]  __schedule+0x3d4/0xb70
	[53268.009770]  schedule+0x3d/0x80
	[53268.010313]  btrfs_tree_lock+0x1a5/0x270
	[53268.010984]  ? wait_woken+0xa0/0xa0
	[53268.011560]  btrfs_search_slot+0x75f/0x930
	[53268.012335]  btrfs_lookup_inode+0x3e/0xc0
	[53268.013066]  ? kmem_cache_alloc+0x208/0x330
	[53268.013817]  ? btrfs_alloc_path+0x1a/0x20
	[53268.014552]  __btrfs_update_delayed_inode+0x70/0x210
	[53268.015542]  btrfs_commit_inode_delayed_inode+0x126/0x130
	[53268.016741]  btrfs_evict_inode+0x3c7/0x530
	[53268.017586]  ? dput+0x2a/0x2e0
	[53268.018222]  evict+0xc3/0x1c0
	[53268.018835]  iput+0x1de/0x280
	[53268.019481]  ? dput+0x2a/0x2e0
	[53268.020130]  dentry_unlink_inode+0xc0/0xf0
	[53268.020996]  __dentry_kill+0xd1/0x1b0
	[53268.021767]  dentry_kill+0x55/0x1e0
	[53268.022512]  ? dput+0x2a/0x2e0
	[53268.023168]  dput+0x291/0x2e0
	[53268.023792]  do_renameat2+0x420/0x5b0
	[53268.024667]  __x64_sys_rename+0x20/0x30
	[53268.025583]  do_syscall_64+0x65/0x1a0
	[53268.026501]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.027664] RIP: 0033:0x7f0167fb1947
	[53268.028415] Code: Bad RIP value.
	[53268.029104] RSP: 002b:00007ffd919c7d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
	[53268.030731] RAX: ffffffffffffffda RBX: 00007ffd919c9ed0 RCX: 00007f0167fb1947
	[53268.031962] RDX: 0000000000000000 RSI: 00007ffd919c7ed0 RDI: 00007ffd919c9ed0
	[53268.033209] RBP: 00007ffd919c7ed0 R08: 0000000000000000 R09: 0000000000000000
	[53268.034540] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
	[53268.036319] R13: 0000000000000004 R14: 00000000000081a4 R15: 0000000000000001
	[53268.038063] rsync           D    0  6461   6460 0x00000000
	[53268.039405] Call Trace:
	[53268.040102]  __schedule+0x3d4/0xb70
	[53268.040877]  ? sched_clock+0x9/0x10
	[53268.041640]  schedule+0x3d/0x80
	[53268.042301]  io_schedule+0x16/0x40
	[53268.043065]  wait_on_page_bit_killable+0x183/0x290
	[53268.044111]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.044909]  generic_file_read_iter+0x267/0xb70
	[53268.045688]  __vfs_read+0x124/0x1b0
	[53268.046290]  vfs_read+0xab/0x160
	[53268.046899]  ksys_read+0x5f/0xf0
	[53268.047637]  __x64_sys_read+0x1a/0x20
	[53268.048582]  do_syscall_64+0x65/0x1a0
	[53268.049368]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.050207] RIP: 0033:0x7fa9d031c761
	[53268.050787] Code: Bad RIP value.
	[53268.051318] RSP: 002b:00007ffc5f634ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.052997] RAX: ffffffffffffffda RBX: 000055a2c9a3cba0 RCX: 00007fa9d031c761
	[53268.054359] RDX: 0000000000003800 RSI: 000055a2c998e9d0 RDI: 0000000000000001
	[53268.056088] RBP: 0000000000003800 R08: fffffffffffffff0 R09: 0000000000b75811
	[53268.057557] R10: 000055a2c998e9d0 R11: 0000000000000246 R12: 0000000000001000
	[53268.058876] R13: 0000000000003800 R14: 0000000000000320 R15: 0000000000001000
	[53268.060583] rsync           D    0  6596   6595 0x00000000
	[53268.061503] Call Trace:
	[53268.061988]  __schedule+0x3d4/0xb70
	[53268.062717]  schedule+0x3d/0x80
	[53268.063693]  io_schedule+0x16/0x40
	[53268.064330]  wait_on_page_bit+0x178/0x270
	[53268.065084]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.065959]  read_extent_buffer_pages+0x27e/0x350
	[53268.066914]  btree_read_extent_buffer_pages+0xb8/0x1c0
	[53268.067778]  read_tree_block+0x42/0x70
	[53268.068594]  read_block_for_search.isra.11+0x181/0x350
	[53268.069556]  btrfs_search_slot+0x3cf/0x930
	[53268.070318]  btrfs_lookup_extent_info+0xa3/0x450
	[53268.071986]  update_ref_for_cow+0x1a6/0x340
	[53268.073145]  __btrfs_cow_block+0x1ff/0x570
	[53268.074389]  btrfs_cow_block+0xf8/0x230
	[53268.075428]  push_leaf_left+0x11a/0x190
	[53268.077233]  btrfs_del_items+0x27b/0x470
	[53268.078094]  btrfs_truncate_inode_items+0x2ee/0xe70
	[53268.079041]  ? _raw_spin_unlock+0x27/0x40
	[53268.079898]  btrfs_evict_inode+0x455/0x530
	[53268.080898]  ? dput+0x2a/0x2e0
	[53268.081540]  evict+0xc3/0x1c0
	[53268.082183]  iput+0x1de/0x280
	[53268.082853]  ? dput+0x2a/0x2e0
	[53268.084211]  dentry_unlink_inode+0xc0/0xf0
	[53268.084965]  __dentry_kill+0xd1/0x1b0
	[53268.085805]  dentry_kill+0x55/0x1e0
	[53268.086685]  ? dput+0x2a/0x2e0
	[53268.087371]  dput+0x291/0x2e0
	[53268.087973]  do_renameat2+0x420/0x5b0
	[53268.088974]  __x64_sys_rename+0x20/0x30
	[53268.090099]  do_syscall_64+0x65/0x1a0
	[53268.091287]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.092354] RIP: 0033:0x7f17fb432947
	[53268.093487] Code: Bad RIP value.
	[53268.094165] RSP: 002b:00007ffe387435d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
	[53268.095777] RAX: ffffffffffffffda RBX: 00007ffe387457a0 RCX: 00007f17fb432947
	[53268.097312] RDX: 0000000000000000 RSI: 00007ffe387437a0 RDI: 00007ffe387457a0
	[53268.098906] RBP: 00007ffe387437a0 R08: 0000000000000000 R09: 0000000000000000
	[53268.100613] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
	[53268.102466] R13: 0000000000000004 R14: 00000000000081a4 R15: 0000000000000001
	[53268.104454] rsync           D    0  6679   6674 0x00000000
	[53268.105875] Call Trace:
	[53268.106453]  __schedule+0x3d4/0xb70
	[53268.107603]  schedule+0x3d/0x80
	[53268.108404]  io_schedule+0x16/0x40
	[53268.109461]  wait_on_page_bit_killable+0x183/0x290
	[53268.110459]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.111238]  generic_file_read_iter+0x267/0xb70
	[53268.112123]  __vfs_read+0x124/0x1b0
	[53268.112842]  vfs_read+0xab/0x160
	[53268.113494]  ksys_read+0x5f/0xf0
	[53268.114127]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.116272]  __x64_sys_read+0x1a/0x20
	[53268.116887]  do_syscall_64+0x65/0x1a0
	[53268.117540]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.118364] RIP: 0033:0x7f935866e761
	[53268.119003] Code: 00 f7 27 00 00 00 00 00 00 62 81 01 00 00 00 00 00 00 20 00 00 00 00 00 00 08 28 00 00 00 00 00 00 62 81 01 00 00 00 00 00 00 <20> 00 00 00 00 00 00 2d 28 00 00 00 00 00 00 62 81 01 00 00 00 00
	[53268.124309] RSP: 002b:00007ffd292115f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.126193] RAX: ffffffffffffffda RBX: 00005587c16c2ff0 RCX: 00007f935866e761
	[53268.127920] RDX: 0000000000007c00 RSI: 00005587c16c3430 RDI: 0000000000000001
	[53268.129621] RBP: 0000000000007c00 R08: fffffffffffffff0 R09: 0000000005334800
	[53268.132283] R10: 00005587c16c3430 R11: 0000000000000246 R12: 0000000000000400
	[53268.134937] R13: 0000000000007c00 R14: 0000000000000390 R15: 0000000000000400
	[53268.137415] rsync           D    0  7081   7053 0x00000000
	[53268.138587] Call Trace:
	[53268.139507]  __schedule+0x3d4/0xb70
	[53268.141054]  schedule+0x3d/0x80
	[53268.141638]  io_schedule+0x16/0x40
	[53268.142392]  wait_on_page_bit_killable+0x183/0x290
	[53268.143263]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.144926]  generic_file_read_iter+0x267/0xb70
	[53268.145758]  __vfs_read+0x124/0x1b0
	[53268.146472]  vfs_read+0xab/0x160
	[53268.147252]  ksys_read+0x5f/0xf0
	[53268.148273]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.149682]  __x64_sys_read+0x1a/0x20
	[53268.150565]  do_syscall_64+0x65/0x1a0
	[53268.151428]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.152233] RIP: 0033:0x7ff3a6803761
	[53268.152800] Code: Bad RIP value.
	[53268.153319] RSP: 002b:00007ffefada95e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.154919] RAX: ffffffffffffffda RBX: 0000562ccb936cc0 RCX: 00007ff3a6803761
	[53268.156871] RDX: 0000000000002109 RSI: 0000562cc7e2b1c0 RDI: 0000000000000000
	[53268.158322] RBP: 0000000000002109 R08: 0000562cc7e2b1b0 R09: 00007ff3a68d53f0
	[53268.160016] R10: 0000562cc7269010 R11: 0000000000000246 R12: 0000000000000000
	[53268.161736] R13: 0000000000002109 R14: 0000000000000000 R15: 0000000000000000
	[53268.163275] rsync           D    0  8636   8583 0x00000000
	[53268.164188] Call Trace:
	[53268.164688]  __schedule+0x3d4/0xb70
	[53268.165647]  schedule+0x3d/0x80
	[53268.166522]  io_schedule+0x16/0x40
	[53268.167170]  wait_on_page_bit_killable+0x183/0x290
	[53268.168046]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.168887]  generic_file_read_iter+0x267/0xb70
	[53268.170488]  __vfs_read+0x124/0x1b0
	[53268.171445]  vfs_read+0xab/0x160
	[53268.172233]  ksys_read+0x5f/0xf0
	[53268.172922]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.174115]  __x64_sys_read+0x1a/0x20
	[53268.174922]  do_syscall_64+0x65/0x1a0
	[53268.175769]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.177195] RIP: 0033:0x7ff57017f761
	[53268.178011] Code: Bad RIP value.
	[53268.178730] RSP: 002b:00007ffcbb34b548 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.180469] RAX: ffffffffffffffda RBX: 000055e78c7bfc90 RCX: 00007ff57017f761
	[53268.182613] RDX: 00000000000022dd RSI: 000055e78c7b63d0 RDI: 0000000000000000
	[53268.184854] RBP: 00000000000022dd R08: 000055e78c7b63c0 R09: 00007ff570250cd0
	[53268.186508] R10: 000055e78bd77010 R11: 0000000000000246 R12: 0000000000000000
	[53268.189446] R13: 00000000000022dd R14: 0000000000000000 R15: 0000000000000000
	[53268.190635] rsync           D    0 30991   8458 0x00000000
	[53268.192905] Call Trace:
	[53268.193332]  __schedule+0x3d4/0xb70
	[53268.193920]  ? sched_clock+0x9/0x10
	[53268.195564]  schedule+0x3d/0x80
	[53268.196290]  io_schedule+0x16/0x40
	[53268.197067]  wait_on_page_bit_killable+0x183/0x290
	[53268.197895]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.198875]  generic_file_read_iter+0x267/0xb70
	[53268.199770]  __vfs_read+0x124/0x1b0
	[53268.200515]  vfs_read+0xab/0x160
	[53268.201198]  ksys_read+0x5f/0xf0
	[53268.201755]  __x64_sys_read+0x1a/0x20
	[53268.202561]  do_syscall_64+0x65/0x1a0
	[53268.203364]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.204306] RIP: 0033:0x7fcab8c0c761
	[53268.205129] Code: Bad RIP value.
	[53268.205698] RSP: 002b:00007ffe4e709938 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.207263] RAX: ffffffffffffffda RBX: 000055cf5c835d70 RCX: 00007fcab8c0c761
	[53268.208588] RDX: 0000000000004400 RSI: 000055cf5a51d7c0 RDI: 0000000000000001
	[53268.209908] RBP: 0000000000004400 R08: 000055cf5a51d7b0 R09: 00007fcab8cde450
	[53268.211376] R10: 000055cf59ded010 R11: 0000000000000246 R12: 0000000000000000
	[53268.214192] R13: 0000000000004400 R14: 0000000000000000 R15: 0000000000000000

Then SysRq-T excerpts for each of the 4 tasks stuck on CPUs:

CPU#0:

	[53274.315728] rsync           R  running task        0  5715   5708 0x80000008
	[53274.316927] Call Trace:
	[53274.317380]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.318181]  ? retint_kernel+0x10/0x10
	[53274.318820]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.319646]  ? retint_kernel+0x10/0x10
	[53274.320331]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.321125]  ? trace_hardirqs_on_caller+0x4a/0x100
	[53274.321924]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.322716]  ? retint_kernel+0x10/0x10
	[53274.323367]  ? retint_kernel+0x10/0x10
	[53274.324009]  ? __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[53274.324926]  ? __pv_queued_spin_lock_slowpath+0xf7/0x2b0
	[53274.325814]  ? __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[53274.326729]  ? queued_write_lock_slowpath+0x80/0x90
	[53274.327560]  ? do_raw_write_lock+0xae/0xb0
	[53274.328262]  ? _raw_write_lock+0x55/0x70
	[53274.328932]  ? tree_mod_log_insert_key+0xd3/0x150
	[53274.329840]  ? __btrfs_cow_block+0x25d/0x570
	[53274.330810]  ? btrfs_cow_block+0xf8/0x230
	[53274.331688]  ? btrfs_search_slot+0x213/0x930
	[53274.332621]  ? btrfs_lookup_inode+0x3e/0xc0
	[53274.333595]  ? kmem_cache_alloc+0x208/0x330
	[53274.334505]  ? btrfs_alloc_path+0x1a/0x20
	[53274.335385]  ? __btrfs_update_delayed_inode+0x70/0x210
	[53274.336515]  ? btrfs_commit_inode_delayed_inode+0x126/0x130
	[53274.337740]  ? btrfs_evict_inode+0x3c7/0x530
	[53274.338683]  ? dput+0x2a/0x2e0
	[53274.339350]  ? evict+0xc3/0x1c0
	[53274.340056]  ? iput+0x1de/0x280
	[53274.340746]  ? dput+0x2a/0x2e0
	[53274.341456]  ? dentry_unlink_inode+0xc0/0xf0
	[53274.342378]  ? __dentry_kill+0xd1/0x1b0
	[53274.343208]  ? dentry_kill+0x55/0x1e0
	[53274.343932]  ? dput+0x2a/0x2e0
	[53274.344597]  ? dput+0x291/0x2e0
	[53274.345263]  ? __fput+0x12f/0x260
	[53274.345984]  ? ____fput+0xe/0x10
	[53274.346660]  ? task_work_run+0x8f/0xc0
	[53274.347499]  ? exit_to_usermode_loop+0x112/0x120
	[53274.348543]  ? do_syscall_64+0x199/0x1a0
	[53274.349368]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

CPU#1:

	[53273.449195] btrfs-transacti R  running task        0  4737      2 0x80000008
	[53273.451375] Call Trace:
	[53273.451980]  ? retint_kernel+0x10/0x10
	[53273.452758]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53273.453549]  ? trace_hardirqs_on_caller+0x4a/0x100
	[53273.454355]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53273.455143]  ? retint_kernel+0x10/0x10
	[53273.455777]  ? kvm_wait+0x86/0x90
	[53273.456368]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.457300]  ? trace_hardirqs_on+0x4c/0x100
	[53273.458008]  ? kvm_wait+0x8b/0x90
	[53273.458577]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.459747]  ? queued_write_lock_slowpath+0x80/0x90
	[53273.460762]  ? do_raw_write_lock+0xae/0xb0
	[53273.461682]  ? _raw_write_lock+0x55/0x70
	[53273.462493]  ? tree_mod_log_insert_key+0xd3/0x150
	[53273.463314]  ? __btrfs_cow_block+0x25d/0x570
	[53273.464040]  ? btrfs_cow_block+0xf8/0x230
	[53273.464718]  ? btrfs_search_slot+0x213/0x930
	[53273.465447]  ? lookup_inline_extent_backref+0x118/0x570
	[53273.466358]  ? mark_held_locks+0x76/0xa0
	[53273.467175]  ? insert_inline_extent_backref+0x51/0xe0
	[53273.468334]  ? rcu_read_lock_sched_held+0x72/0x80
	[53273.469560]  ? __btrfs_inc_extent_ref+0x87/0x220
	[53273.470730]  ? run_delayed_data_ref+0x1c2/0x210
	[53273.471718]  ? run_one_delayed_ref+0x55/0xa0
	[53273.472787]  ? btrfs_run_delayed_refs_for_head+0x17b/0x3b0
	[53273.474200]  ? __btrfs_run_delayed_refs+0x84/0x180
	[53273.475455]  ? btrfs_run_delayed_refs+0x86/0x1e0
	[53273.476684]  ? btrfs_commit_transaction+0x52/0xab0
	[53273.478684]  ? start_transaction+0x93/0x4d0
	[53273.479500]  ? transaction_kthread+0x155/0x190
	[53273.480308]  ? kthread+0x113/0x150
	[53273.480920]  ? btrfs_cleanup_transaction+0x630/0x630
	[53273.481937]  ? kthread_park+0x90/0x90
	[53273.483571]  ? ret_from_fork+0x3a/0x50

CPU#2:

	[53273.814635] crawl_5268      R  running task        0  5663   5650 0x80000008
	[53273.816033] Call Trace:
	[53273.816470]  <IRQ>
	[53273.816840]  sched_show_task+0x198/0x260
	[53273.817546]  show_state_filter+0xa0/0x1a0
	[53273.818276]  sysrq_handle_showstate+0x10/0x20
	[53273.819059]  __handle_sysrq+0x139/0x210
	[53273.819744]  handle_sysrq+0x26/0x30
	[53273.820376]  serial8250_handle_irq.part.16+0xbc/0x100
	[53273.821298]  serial8250_default_handle_irq+0x53/0x60
	[53273.822209]  serial8250_interrupt+0x68/0x100
	[53273.823001]  __handle_irq_event_percpu+0x4e/0x2b0
	[53273.823958]  handle_irq_event_percpu+0x32/0x80
	[53273.824947]  handle_irq_event+0x39/0x60
	[53273.825800]  handle_edge_irq+0xef/0x1c0
	[53273.826554]  handle_irq+0x75/0x120
	[53273.827191]  do_IRQ+0x64/0x130
	[53273.827760]  common_interrupt+0xf/0xf
	[53273.828443]  </IRQ>
	[53273.828841] RIP: 0010:native_safe_halt+0x12/0x20
	[53273.829783] Code: f0 80 48 02 20 48 8b 00 a8 08 0f 84 7b ff ff ff eb bd 90 90 90 90 90 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d b2 ec 43 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
	[53273.834211] RSP: 0018:ffffa95b81107790 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
	[53273.835701] RAX: 0000000000000000 RBX: ffff9e06b5e159d4 RCX: 0000000000000001
	[53273.837608] RDX: ffff9e06aeb2c000 RSI: ffffffff921199ed RDI: ffffffff9207a326
	[53273.839525] RBP: ffffa95b81107790 R08: 0000000000000000 R09: 0000000000000000
	[53273.841170] R10: ffffa95b811077d0 R11: ffff9e069ddfd6b8 R12: 0000000000000246
	[53273.842439] R13: 0000000000000001 R14: ffff9e06b5e159d4 R15: 0000000000000001
	[53273.843995]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.845002]  ? kvm_wait+0x86/0x90
	[53273.845610]  kvm_wait+0x8b/0x90
	[53273.846178]  __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.847472]  queued_read_lock_slowpath+0x75/0x80
	[53273.848317]  do_raw_read_lock+0x4b/0x50
	[53273.849013]  _raw_read_lock+0x58/0x70
	[53273.849693]  __tree_mod_log_search+0x2d/0xb0
	[53273.850466]  __tree_mod_log_oldest_root.isra.1+0x4c/0x80
	[53273.851593]  btrfs_old_root_level+0x26/0x80
	[53273.852374]  resolve_indirect_refs+0x2ad/0x910
	[53273.853211]  ? __schedule+0x3dc/0xb70
	[53273.853870]  find_parent_nodes+0x443/0x1340
	[53273.854631]  btrfs_find_all_roots_safe+0xc5/0x140
	[53273.855774]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[53273.856908]  iterate_extent_inodes+0x198/0x3e0
	[53273.857766]  iterate_inodes_from_logical+0xa1/0xd0
	[53273.858929]  ? iterate_inodes_from_logical+0xa1/0xd0
	[53273.859939]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[53273.860847]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[53273.861741]  btrfs_ioctl+0xcf7/0x2cb0
	[53273.862617]  ? __lock_acquire+0x477/0x1b50
	[53273.863530]  ? kvm_sched_clock_read+0x18/0x30
	[53273.864496]  ? kvm_sched_clock_read+0x18/0x30
	[53273.865335]  ? sched_clock+0x9/0x10
	[53273.866015]  do_vfs_ioctl+0xa6/0x6e0
	[53273.866891]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[53273.867954]  ? do_vfs_ioctl+0xa6/0x6e0
	[53273.869148]  ? __fget+0x119/0x200
	[53273.869922]  ksys_ioctl+0x75/0x80
	[53273.870649]  __x64_sys_ioctl+0x1a/0x20
	[53273.871450]  do_syscall_64+0x65/0x1a0
	[53273.872351]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53273.873487] RIP: 0033:0x7f938cf29777
	[53273.874154] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[53273.878117] RSP: 002b:00007f938b62d458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[53273.879542] RAX: ffffffffffffffda RBX: 00007f938b62d780 RCX: 00007f938cf29777
	[53273.881224] RDX: 00007f938b62d788 RSI: 00000000c038943b RDI: 0000000000000003
	[53273.882805] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f938b62d960
	[53273.884519] R10: 000055ad4e628c40 R11: 0000000000000246 R12: 0000000000000003
	[53273.885971] R13: 00007f938b62d788 R14: 00007f938b62d668 R15: 00007f938b62d890

CPU#3:

	[53273.887482] crawl_5268      R  running task        0  5664   5650 0x80000008
	[53273.888778] Call Trace:
	[53273.889807]  ? tree_mod_log_insert_root.isra.2+0x117/0x350
	[53273.890820]  ? __btrfs_cow_block+0x41e/0x570
	[53273.891625]  ? btrfs_cow_block+0xf8/0x230
	[53273.892471]  ? btrfs_search_slot+0x213/0x930
	[53273.893296]  ? kmem_cache_free+0xa4/0x1e0
	[53273.894189]  ? btrfs_lookup_file_extent+0x49/0x60
	[53273.895058]  ? __btrfs_drop_extents+0x183/0xdf0
	[53273.895869]  ? kmem_cache_alloc+0x208/0x330
	[53273.896641]  ? btrfs_alloc_path+0x1a/0x20
	[53273.897371]  ? trace_hardirqs_on+0x4c/0x100
	[53273.898310]  ? btrfs_drop_extents+0x5d/0x90
	[53273.899096]  ? btrfs_clone+0x845/0xcf0
	[53273.899938]  ? btrfs_extent_same_range+0x5b/0xd0
	[53273.901203]  ? btrfs_remap_file_range+0x218/0x380
	[53273.902585]  ? vfs_dedupe_file_range_one+0x111/0x170
	[53273.903996]  ? vfs_dedupe_file_range+0x157/0x1f0
	[53273.904850]  ? do_vfs_ioctl+0x27f/0x6e0
	[53273.905655]  ? __fget+0x119/0x200
	[53273.906918]  ? ksys_ioctl+0x75/0x80
	[53273.907699]  ? __x64_sys_ioctl+0x1a/0x20
	[53273.908428]  ? do_syscall_64+0x65/0x1a0
	[53273.909175]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

> >>
> >> <SNIP>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+, fixed in 5.4)
  2019-05-15 21:36 storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+) Zygo Blaxell
  2019-05-16  6:57 ` Nikolay Borisov
@ 2020-02-04  4:57 ` Zygo Blaxell
  1 sibling, 0 replies; 8+ messages in thread
From: Zygo Blaxell @ 2020-02-04  4:57 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5787 bytes --]

On Wed, May 15, 2019 at 05:36:50PM -0400, Zygo Blaxell wrote:
> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> CPU cores in kernel functions that are unable to make forward progress,
> but also unwilling to release their respective CPU cores.  This is
> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> usage or just a lot of CPU fan noise) though I don't know if all cores
> are spinning or only some of them.
> 
> The kernel console presents a continual stream of "BUG: soft lockup"
> warnings for some days.  None of the call traces change during this time.
> The only way out is to reboot.
> 
> You can reproduce this by writing a bunch of data to a filesystem while
> bees is running on all cores.  It takes a few days to occur naturally.
> It can probably be sped up by just doing a bunch of random LOGICAL_INO
> ioctls in a tight loop on each core.
> 
> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> ioctl (btrfs_tree_read_lock_atomic):
> 
> 	[509506.128259] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-transacti:4716]
> 	[509506.130000] Modules linked in: mq_deadline bfq dm_cache_smq ppdev joydev dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sr_mod cdrom sg ide_pci_generic piix input_leds i2c_piix4 bochs_drm ide_core rtc_cmos floppy parport_pc parport psmouse serio_raw evbug evdev snd_pcm snd_timer snd soundcore pcspkr qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
> 	[509506.137503] irq event stamp: 217070078
> 	[509506.138165] hardirqs last  enabled at (217070077): [<ffffffffa2dcafb6>] _raw_spin_unlock_irqrestore+0x36/0x60
> 	[509506.140129] hardirqs last disabled at (217070078): [<ffffffffa2dc32c9>] __schedule+0xd9/0xb70
> 	[509506.141653] softirqs last  enabled at (217069454): [<ffffffffa30003a0>] __do_softirq+0x3a0/0x45a
> 	[509506.143251] softirqs last disabled at (217069443): [<ffffffffa20a9949>] irq_exit+0xe9/0xf0
> 	[509506.144782] CPU: 0 PID: 4716 Comm: btrfs-transacti Tainted: G      D W    L    5.0.11-zb64-ae88fcd98bb4+ #1
> 	[509506.146453] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> 	[509506.147881] RIP: 0010:queued_write_lock_slowpath+0x4e/0x90
> 	[509506.148912] Code: c0 75 0d ba ff 00 00 00 f0 0f b1 13 85 c0 74 33 f0 81 03 00 01 00 00 b9 ff 00 00 00 be 00 01 00 00 8b 03 3d 00 01 00 00 74 0c <f3> 90 8b 13 81 fa 00 01 00 00 75 f4 89 f0 f0 0f b1 0b 3d 00 01 00
> 	[509506.152585] RSP: 0018:ffffab0b412579c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
> 	[509506.154027] RAX: 00000000000001ff RBX: ffff89ea1ac6fde8 RCX: 00000000000000ff
> 	[509506.155449] RDX: 00000000000001ff RSI: 0000000000000100 RDI: ffff89ea1ac6fde8
> 	[509506.156729] RBP: ffffab0b412579d0 R08: ffffffffa2502785 R09: 0000000000000000
> 	[509506.157946] R10: ffffab0b41257980 R11: ffff89ea1ac6fe00 R12: ffff89ea1ac6fdec
> 	[509506.159175] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
> 	[509506.160345] FS:  0000000000000000(0000) GS:ffff89ebb5c00000(0000) knlGS:0000000000000000
> 	[509506.161714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[509506.162703] CR2: 00007f381507ca68 CR3: 000000021b0b0003 CR4: 00000000001606f0
> 	[509506.164006] Call Trace:
> 	[509506.164473]  do_raw_write_lock+0xae/0xb0
> 	[509506.165315]  _raw_write_lock+0x55/0x70
> 	[509506.166026]  btrfs_try_tree_write_lock+0x35/0x90
> 	[509506.167033]  btrfs_search_slot+0x41c/0x930
> 	[509506.167981]  lookup_inline_extent_backref+0x118/0x570
> 	[509506.168929]  ? kvm_sched_clock_read+0x18/0x30
> 	[509506.169763]  insert_inline_extent_backref+0x51/0xe0
> 	[509506.170686]  __btrfs_inc_extent_ref+0x87/0x220
> 	[509506.171583]  ? lock_acquire+0xbd/0x1c0
> 	[509506.172394]  run_delayed_tree_ref+0x182/0x1f0
> 	[509506.173264]  run_one_delayed_ref+0x94/0xa0
> 	[509506.174043]  btrfs_run_delayed_refs_for_head+0x17b/0x3b0
> 	[509506.175064]  __btrfs_run_delayed_refs+0x84/0x180
> 	[509506.175948]  btrfs_run_delayed_refs+0x86/0x1e0
> 	[509506.176803]  btrfs_commit_transaction+0x52/0xa00
> 	[509506.177695]  ? start_transaction+0x93/0x4d0
> 	[509506.178506]  transaction_kthread+0x155/0x190
> 	[509506.179450]  kthread+0x113/0x150
> 	[509506.180222]  ? btrfs_cleanup_transaction+0x630/0x630
> 	[509506.181178]  ? kthread_park+0x90/0x90
> 	[509506.181896]  ret_from_fork+0x3a/0x50

This was a terrible bug report.  ;)

The storm-of-softlockups happens whenever the kernel hits a BUG_ON while
holding a spinlock.  We have to find the _first_ BUG log message and
stack trace to get any useful information.  The later traces are all
consequences of the first, and they will spam the console until the
hardware watchdog times out and forces a reboot.  All cores are in an
infinite loop and will make no further progress.

I made the connection when I hit the storm of soft lockups twice due
to a BUG_ON in CIFS.  The storm isn't a btrfs-specific feature, but the
causative BUG_ON is.

This turned out to be a BUG_ON in fs/btrfs/ctree.c in
__tree_mod_log_rewind, and the fix is the three recent tree mod log
patches:

        6609fee8897a Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues                               
        efad8a853ad2 Btrfs: fix use-after-free when using the tree modification log                                                 
        47c8495d358b Btrfs: fix race between adding and putting tree mod seq elements and nodes                                     

Thanks again Filipe for these!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-02-04  4:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-15 21:36 storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+) Zygo Blaxell
2019-05-16  6:57 ` Nikolay Borisov
2019-05-16 16:20   ` Zygo Blaxell
2019-05-18  4:44   ` Zygo Blaxell
2019-05-18  8:27     ` Nikolay Borisov
2019-05-18 21:00       ` Zygo Blaxell
2019-05-18 21:41       ` Zygo Blaxell
2020-02-04  4:57 ` storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+, fixed in 5.4) Zygo Blaxell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).