linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Nikolay Borisov <nborisov@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
Date: Sat, 18 May 2019 17:41:38 -0400	[thread overview]
Message-ID: <20190518214138.GJ20359@hungrycats.org> (raw)
In-Reply-To: <349ec27f-7eb4-45be-7cd4-29c2cd56d05b@suse.com>

[-- Attachment #1: Type: text/plain, Size: 29227 bytes --]

On Sat, May 18, 2019 at 11:27:46AM +0300, Nikolay Borisov wrote:
> On 18.05.19 г. 7:44 ч., Zygo Blaxell wrote:
> > On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
> >> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> >>> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> >>> CPU cores in kernel functions that are unable to make forward progress,
> >>> but also unwilling to release their respective CPU cores.  This is
> >>> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> >>> usage or just a lot of CPU fan noise) though I don't know if all cores
> >>> are spinning or only some of them.
> >>>
> >>> The kernel console presents a continual stream of "BUG: soft lockup"
> >>> warnings for some days.  None of the call traces change during this time.
> >>> The only way out is to reboot.
> >>>
> >>> You can reproduce this by writing a bunch of data to a filesystem while
> >>> bees is running on all cores.  It takes a few days to occur naturally.
> >>> It can probably be sped up by just doing a bunch of random LOGICAL_INO
> >>> ioctls in a tight loop on each core.
> >>>
> >>> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> >>> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> >>> ioctl (btrfs_tree_read_lock_atomic):
> >>
> >>
> >> Provide output of all sleeping threads when this occur via
> >>  echo w > /proc/sysrq-trigger.
> > 
> > I fixed my SysRq configuration.  The results are...kind of interesting.
> > I guess the four threads that are running loops on all cores don't count
> > as "blocked"...
> > 
> <snip>
> So you are discards are hung or there is a lost wakeup (which is rather
> unlikely, since this is the block layer). What is the underlying block
> device i.e model/make

Here's another set of soft-lockups, this time with no discard or snapshot.
This one's more interesting--the stuck CPUs are running something that
isn't commit_transaction or LOGICAL_INO.  It's the first time I've
seen that.

First the kernel BUG at fs/btrfs/ctree.c which looks pretty similar to
the last one:

	[20512.078483] kernel BUG at fs/btrfs/ctree.c:1227!
	[20512.079356] invalid opcode: 0000 [#1] SMP PTI
	[20512.080144] CPU: 1 PID: 5662 Comm: crawl_5268 Tainted: G        W         5.0.16-zb64-9b948ea3083a+ #1
	[20512.081397] irq event stamp: 2743290041
	[20512.082303] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[20512.082307] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
	[20512.082308] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
	[20512.082309] RSP: 0018:ffffa95b80f1b820 EFLAGS: 00010206
	[20512.082311] RAX: ffff9e059aa5a100 RBX: ffff9e04e01496e0 RCX: ffff9e06555dcb81
	[20512.082312] RDX: 00000000000000ce RSI: 000000000000009f RDI: 0000006336410000
	[20512.082312] RBP: ffffa95b80f1b850 R08: ffffa95b80f1b7c8 R09: ffffa95b80f1b7d0
	[20512.082313] R10: 0000000000007b2a R11: 000000000000009f R12: 0000000000000008
	[20512.082314] R13: ffff9e05c9e21c80 R14: 000000000003a09e R15: ffff9e059aa5a100
	[20512.082315] FS:  00007f938be31700(0000) GS:ffff9e06b6000000(0000) knlGS:0000000000000000
	[20512.082316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[20512.082317] CR2: 00007f6dc1f8c519 CR3: 00000001c3d14005 CR4: 00000000001606e0
	[20512.082319] Call Trace:
	[20512.082324]  btrfs_search_old_slot+0xfe/0x800
	[20512.082339]  resolve_indirect_refs+0x1c5/0x910
	[20512.082345]  ? prelim_ref_insert+0x10a/0x320
	[20512.082347]  find_parent_nodes+0x443/0x1340
	[20512.082356]  btrfs_find_all_roots_safe+0xc5/0x140
	[20512.082359]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[20512.082361]  iterate_extent_inodes+0x198/0x3e0
	[20512.082367]  iterate_inodes_from_logical+0xa1/0xd0
	[20512.082369]  ? iterate_inodes_from_logical+0xa1/0xd0
	[20512.082370]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[20512.082373]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[20512.082376]  btrfs_ioctl+0xcf7/0x2cb0
	[20512.082380]  ? __lock_acquire+0x477/0x1b50
	[20512.082382]  ? kvm_sched_clock_read+0x18/0x30
	[20512.082385]  ? kvm_sched_clock_read+0x18/0x30
	[20512.082387]  ? sched_clock+0x9/0x10
	[20512.082390]  do_vfs_ioctl+0xa6/0x6e0
	[20512.082391]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[20512.082392]  ? do_vfs_ioctl+0xa6/0x6e0
	[20512.082394]  ? __fget+0x119/0x200
	[20512.082397]  ksys_ioctl+0x75/0x80
	[20512.082399]  __x64_sys_ioctl+0x1a/0x20
	[20512.082402]  do_syscall_64+0x65/0x1a0
	[20512.082405]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[20512.082407] RIP: 0033:0x7f938cf29777
	[20512.082408] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[20512.082409] RSP: 002b:00007f938be2e458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[20512.082410] RAX: ffffffffffffffda RBX: 00007f938be2e780 RCX: 00007f938cf29777
	[20512.082411] RDX: 00007f938be2e788 RSI: 00000000c038943b RDI: 0000000000000003
	[20512.082412] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f938be2e960
	[20512.082412] R10: 000055ad4e628c40 R11: 0000000000000246 R12: 0000000000000003
	[20512.082413] R13: 00007f938be2e788 R14: 00007f938be2e668 R15: 00007f938be2e890
	[20512.082418] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev joydev dm_mod snd_pcm aesni_intel aes_x86_64 sr_mod cdrom crypto_simd snd_timer cryptd sg glue_helper snd parport_pc ide_pci_generic parport soundcore piix bochs_drm input_leds ide_core rtc_cmos floppy i2c_piix4 pcspkr psmouse evbug evdev serio_raw qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
	[20512.082464] ---[ end trace 3bcb7e6493f60fe5 ]---
	[20512.082466] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
	[20512.082467] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
	[20512.082468] RSP: 0018:ffffa95b80f1b820 EFLAGS: 00010206
	[20512.082469] RAX: ffff9e059aa5a100 RBX: ffff9e04e01496e0 RCX: ffff9e06555dcb81
	[20512.082469] RDX: 00000000000000ce RSI: 000000000000009f RDI: 0000006336410000
	[20512.082470] RBP: ffffa95b80f1b850 R08: ffffa95b80f1b7c8 R09: ffffa95b80f1b7d0
	[20512.082471] R10: 0000000000007b2a R11: 000000000000009f R12: 0000000000000008
	[20512.082472] R13: ffff9e05c9e21c80 R14: 000000000003a09e R15: ffff9e059aa5a100
	[20512.082473] FS:  00007f938be31700(0000) GS:ffff9e06b6000000(0000) knlGS:0000000000000000
	[20512.082473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[20512.082474] CR2: 00007f6dc1f8c519 CR3: 00000001c3d14005 CR4: 00000000001606e0
	[20512.082478] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
	[20512.082479] in_atomic(): 1, irqs_disabled(): 0, pid: 5662, name: crawl_5268
	[20512.082479] INFO: lockdep is turned off.
	[20512.082481] CPU: 1 PID: 5662 Comm: crawl_5268 Tainted: G      D W         5.0.16-zb64-9b948ea3083a+ #1
	[20512.082482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[20512.082482] Call Trace:
	[20512.082485]  dump_stack+0x86/0xc5
	[20512.082488]  ___might_sleep+0x217/0x240
	[20512.082490]  __might_sleep+0x4a/0x80
	[20512.082494]  exit_signals+0x33/0x250
	[20512.082496]  do_exit+0xb9/0xd70
	[20512.082502]  rewind_stack_do_exit+0x17/0x20
	[20512.082503] RIP: 0033:0x7f938cf29777
	[20512.082504] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[20512.082505] RSP: 002b:00007f938be2e458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[20512.082506] RAX: ffffffffffffffda RBX: 00007f938be2e780 RCX: 00007f938cf29777
	[20512.082507] RDX: 00007f938be2e788 RSI: 00000000c038943b RDI: 0000000000000003
	[20512.082508] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f938be2e960
	[20512.082508] R10: 000055ad4e628c40 R11: 0000000000000246 R12: 0000000000000003
	[20512.082509] R13: 00007f938be2e788 R14: 00007f938be2e668 R15: 00007f938be2e890
	[20512.082514] note: crawl_5268[5662] exited with preempt_count 2
	[20512.214564] hardirqs last  enabled at (2743290041): [<ffffffff922d4758>] kmem_cache_alloc+0x208/0x330
	[20512.216158] hardirqs last disabled at (2743290040): [<ffffffff922d4614>] kmem_cache_alloc+0xc4/0x330
	[20512.217793] softirqs last  enabled at (2743288790): [<ffffffff930003a0>] __do_softirq+0x3a0/0x45a
	[20512.219555] softirqs last disabled at (2743288767): [<ffffffff920a9949>] irq_exit+0xe9/0xf0
	[20536.037778] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [rsync:5715]
	[20536.042129] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs-transacti:4737]
	[20540.028759] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [crawl_5268:5663]
	[20540.033740] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [crawl_5268:5664]

Then SysRq-W:

	[53268.006056] sysrq: SysRq : Show Blocked State
	[53268.006780]   task                        PC stack   pid father
	[53268.007927] rsync           D    0  5683   5673 0x00000000
	[53268.008786] Call Trace:
	[53268.009215]  __schedule+0x3d4/0xb70
	[53268.009770]  schedule+0x3d/0x80
	[53268.010313]  btrfs_tree_lock+0x1a5/0x270
	[53268.010984]  ? wait_woken+0xa0/0xa0
	[53268.011560]  btrfs_search_slot+0x75f/0x930
	[53268.012335]  btrfs_lookup_inode+0x3e/0xc0
	[53268.013066]  ? kmem_cache_alloc+0x208/0x330
	[53268.013817]  ? btrfs_alloc_path+0x1a/0x20
	[53268.014552]  __btrfs_update_delayed_inode+0x70/0x210
	[53268.015542]  btrfs_commit_inode_delayed_inode+0x126/0x130
	[53268.016741]  btrfs_evict_inode+0x3c7/0x530
	[53268.017586]  ? dput+0x2a/0x2e0
	[53268.018222]  evict+0xc3/0x1c0
	[53268.018835]  iput+0x1de/0x280
	[53268.019481]  ? dput+0x2a/0x2e0
	[53268.020130]  dentry_unlink_inode+0xc0/0xf0
	[53268.020996]  __dentry_kill+0xd1/0x1b0
	[53268.021767]  dentry_kill+0x55/0x1e0
	[53268.022512]  ? dput+0x2a/0x2e0
	[53268.023168]  dput+0x291/0x2e0
	[53268.023792]  do_renameat2+0x420/0x5b0
	[53268.024667]  __x64_sys_rename+0x20/0x30
	[53268.025583]  do_syscall_64+0x65/0x1a0
	[53268.026501]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.027664] RIP: 0033:0x7f0167fb1947
	[53268.028415] Code: Bad RIP value.
	[53268.029104] RSP: 002b:00007ffd919c7d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
	[53268.030731] RAX: ffffffffffffffda RBX: 00007ffd919c9ed0 RCX: 00007f0167fb1947
	[53268.031962] RDX: 0000000000000000 RSI: 00007ffd919c7ed0 RDI: 00007ffd919c9ed0
	[53268.033209] RBP: 00007ffd919c7ed0 R08: 0000000000000000 R09: 0000000000000000
	[53268.034540] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
	[53268.036319] R13: 0000000000000004 R14: 00000000000081a4 R15: 0000000000000001
	[53268.038063] rsync           D    0  6461   6460 0x00000000
	[53268.039405] Call Trace:
	[53268.040102]  __schedule+0x3d4/0xb70
	[53268.040877]  ? sched_clock+0x9/0x10
	[53268.041640]  schedule+0x3d/0x80
	[53268.042301]  io_schedule+0x16/0x40
	[53268.043065]  wait_on_page_bit_killable+0x183/0x290
	[53268.044111]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.044909]  generic_file_read_iter+0x267/0xb70
	[53268.045688]  __vfs_read+0x124/0x1b0
	[53268.046290]  vfs_read+0xab/0x160
	[53268.046899]  ksys_read+0x5f/0xf0
	[53268.047637]  __x64_sys_read+0x1a/0x20
	[53268.048582]  do_syscall_64+0x65/0x1a0
	[53268.049368]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.050207] RIP: 0033:0x7fa9d031c761
	[53268.050787] Code: Bad RIP value.
	[53268.051318] RSP: 002b:00007ffc5f634ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.052997] RAX: ffffffffffffffda RBX: 000055a2c9a3cba0 RCX: 00007fa9d031c761
	[53268.054359] RDX: 0000000000003800 RSI: 000055a2c998e9d0 RDI: 0000000000000001
	[53268.056088] RBP: 0000000000003800 R08: fffffffffffffff0 R09: 0000000000b75811
	[53268.057557] R10: 000055a2c998e9d0 R11: 0000000000000246 R12: 0000000000001000
	[53268.058876] R13: 0000000000003800 R14: 0000000000000320 R15: 0000000000001000
	[53268.060583] rsync           D    0  6596   6595 0x00000000
	[53268.061503] Call Trace:
	[53268.061988]  __schedule+0x3d4/0xb70
	[53268.062717]  schedule+0x3d/0x80
	[53268.063693]  io_schedule+0x16/0x40
	[53268.064330]  wait_on_page_bit+0x178/0x270
	[53268.065084]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.065959]  read_extent_buffer_pages+0x27e/0x350
	[53268.066914]  btree_read_extent_buffer_pages+0xb8/0x1c0
	[53268.067778]  read_tree_block+0x42/0x70
	[53268.068594]  read_block_for_search.isra.11+0x181/0x350
	[53268.069556]  btrfs_search_slot+0x3cf/0x930
	[53268.070318]  btrfs_lookup_extent_info+0xa3/0x450
	[53268.071986]  update_ref_for_cow+0x1a6/0x340
	[53268.073145]  __btrfs_cow_block+0x1ff/0x570
	[53268.074389]  btrfs_cow_block+0xf8/0x230
	[53268.075428]  push_leaf_left+0x11a/0x190
	[53268.077233]  btrfs_del_items+0x27b/0x470
	[53268.078094]  btrfs_truncate_inode_items+0x2ee/0xe70
	[53268.079041]  ? _raw_spin_unlock+0x27/0x40
	[53268.079898]  btrfs_evict_inode+0x455/0x530
	[53268.080898]  ? dput+0x2a/0x2e0
	[53268.081540]  evict+0xc3/0x1c0
	[53268.082183]  iput+0x1de/0x280
	[53268.082853]  ? dput+0x2a/0x2e0
	[53268.084211]  dentry_unlink_inode+0xc0/0xf0
	[53268.084965]  __dentry_kill+0xd1/0x1b0
	[53268.085805]  dentry_kill+0x55/0x1e0
	[53268.086685]  ? dput+0x2a/0x2e0
	[53268.087371]  dput+0x291/0x2e0
	[53268.087973]  do_renameat2+0x420/0x5b0
	[53268.088974]  __x64_sys_rename+0x20/0x30
	[53268.090099]  do_syscall_64+0x65/0x1a0
	[53268.091287]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.092354] RIP: 0033:0x7f17fb432947
	[53268.093487] Code: Bad RIP value.
	[53268.094165] RSP: 002b:00007ffe387435d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
	[53268.095777] RAX: ffffffffffffffda RBX: 00007ffe387457a0 RCX: 00007f17fb432947
	[53268.097312] RDX: 0000000000000000 RSI: 00007ffe387437a0 RDI: 00007ffe387457a0
	[53268.098906] RBP: 00007ffe387437a0 R08: 0000000000000000 R09: 0000000000000000
	[53268.100613] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
	[53268.102466] R13: 0000000000000004 R14: 00000000000081a4 R15: 0000000000000001
	[53268.104454] rsync           D    0  6679   6674 0x00000000
	[53268.105875] Call Trace:
	[53268.106453]  __schedule+0x3d4/0xb70
	[53268.107603]  schedule+0x3d/0x80
	[53268.108404]  io_schedule+0x16/0x40
	[53268.109461]  wait_on_page_bit_killable+0x183/0x290
	[53268.110459]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.111238]  generic_file_read_iter+0x267/0xb70
	[53268.112123]  __vfs_read+0x124/0x1b0
	[53268.112842]  vfs_read+0xab/0x160
	[53268.113494]  ksys_read+0x5f/0xf0
	[53268.114127]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.116272]  __x64_sys_read+0x1a/0x20
	[53268.116887]  do_syscall_64+0x65/0x1a0
	[53268.117540]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.118364] RIP: 0033:0x7f935866e761
	[53268.119003] Code: 00 f7 27 00 00 00 00 00 00 62 81 01 00 00 00 00 00 00 20 00 00 00 00 00 00 08 28 00 00 00 00 00 00 62 81 01 00 00 00 00 00 00 <20> 00 00 00 00 00 00 2d 28 00 00 00 00 00 00 62 81 01 00 00 00 00
	[53268.124309] RSP: 002b:00007ffd292115f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.126193] RAX: ffffffffffffffda RBX: 00005587c16c2ff0 RCX: 00007f935866e761
	[53268.127920] RDX: 0000000000007c00 RSI: 00005587c16c3430 RDI: 0000000000000001
	[53268.129621] RBP: 0000000000007c00 R08: fffffffffffffff0 R09: 0000000005334800
	[53268.132283] R10: 00005587c16c3430 R11: 0000000000000246 R12: 0000000000000400
	[53268.134937] R13: 0000000000007c00 R14: 0000000000000390 R15: 0000000000000400
	[53268.137415] rsync           D    0  7081   7053 0x00000000
	[53268.138587] Call Trace:
	[53268.139507]  __schedule+0x3d4/0xb70
	[53268.141054]  schedule+0x3d/0x80
	[53268.141638]  io_schedule+0x16/0x40
	[53268.142392]  wait_on_page_bit_killable+0x183/0x290
	[53268.143263]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.144926]  generic_file_read_iter+0x267/0xb70
	[53268.145758]  __vfs_read+0x124/0x1b0
	[53268.146472]  vfs_read+0xab/0x160
	[53268.147252]  ksys_read+0x5f/0xf0
	[53268.148273]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.149682]  __x64_sys_read+0x1a/0x20
	[53268.150565]  do_syscall_64+0x65/0x1a0
	[53268.151428]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.152233] RIP: 0033:0x7ff3a6803761
	[53268.152800] Code: Bad RIP value.
	[53268.153319] RSP: 002b:00007ffefada95e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.154919] RAX: ffffffffffffffda RBX: 0000562ccb936cc0 RCX: 00007ff3a6803761
	[53268.156871] RDX: 0000000000002109 RSI: 0000562cc7e2b1c0 RDI: 0000000000000000
	[53268.158322] RBP: 0000000000002109 R08: 0000562cc7e2b1b0 R09: 00007ff3a68d53f0
	[53268.160016] R10: 0000562cc7269010 R11: 0000000000000246 R12: 0000000000000000
	[53268.161736] R13: 0000000000002109 R14: 0000000000000000 R15: 0000000000000000
	[53268.163275] rsync           D    0  8636   8583 0x00000000
	[53268.164188] Call Trace:
	[53268.164688]  __schedule+0x3d4/0xb70
	[53268.165647]  schedule+0x3d/0x80
	[53268.166522]  io_schedule+0x16/0x40
	[53268.167170]  wait_on_page_bit_killable+0x183/0x290
	[53268.168046]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.168887]  generic_file_read_iter+0x267/0xb70
	[53268.170488]  __vfs_read+0x124/0x1b0
	[53268.171445]  vfs_read+0xab/0x160
	[53268.172233]  ksys_read+0x5f/0xf0
	[53268.172922]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.174115]  __x64_sys_read+0x1a/0x20
	[53268.174922]  do_syscall_64+0x65/0x1a0
	[53268.175769]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.177195] RIP: 0033:0x7ff57017f761
	[53268.178011] Code: Bad RIP value.
	[53268.178730] RSP: 002b:00007ffcbb34b548 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.180469] RAX: ffffffffffffffda RBX: 000055e78c7bfc90 RCX: 00007ff57017f761
	[53268.182613] RDX: 00000000000022dd RSI: 000055e78c7b63d0 RDI: 0000000000000000
	[53268.184854] RBP: 00000000000022dd R08: 000055e78c7b63c0 R09: 00007ff570250cd0
	[53268.186508] R10: 000055e78bd77010 R11: 0000000000000246 R12: 0000000000000000
	[53268.189446] R13: 00000000000022dd R14: 0000000000000000 R15: 0000000000000000
	[53268.190635] rsync           D    0 30991   8458 0x00000000
	[53268.192905] Call Trace:
	[53268.193332]  __schedule+0x3d4/0xb70
	[53268.193920]  ? sched_clock+0x9/0x10
	[53268.195564]  schedule+0x3d/0x80
	[53268.196290]  io_schedule+0x16/0x40
	[53268.197067]  wait_on_page_bit_killable+0x183/0x290
	[53268.197895]  ? add_to_page_cache_lru+0xd0/0xd0
	[53268.198875]  generic_file_read_iter+0x267/0xb70
	[53268.199770]  __vfs_read+0x124/0x1b0
	[53268.200515]  vfs_read+0xab/0x160
	[53268.201198]  ksys_read+0x5f/0xf0
	[53268.201755]  __x64_sys_read+0x1a/0x20
	[53268.202561]  do_syscall_64+0x65/0x1a0
	[53268.203364]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53268.204306] RIP: 0033:0x7fcab8c0c761
	[53268.205129] Code: Bad RIP value.
	[53268.205698] RSP: 002b:00007ffe4e709938 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
	[53268.207263] RAX: ffffffffffffffda RBX: 000055cf5c835d70 RCX: 00007fcab8c0c761
	[53268.208588] RDX: 0000000000004400 RSI: 000055cf5a51d7c0 RDI: 0000000000000001
	[53268.209908] RBP: 0000000000004400 R08: 000055cf5a51d7b0 R09: 00007fcab8cde450
	[53268.211376] R10: 000055cf59ded010 R11: 0000000000000246 R12: 0000000000000000
	[53268.214192] R13: 0000000000004400 R14: 0000000000000000 R15: 0000000000000000

Then SysRq-T excerpts for each of the 4 tasks stuck on CPUs:

CPU#0:

	[53274.315728] rsync           R  running task        0  5715   5708 0x80000008
	[53274.316927] Call Trace:
	[53274.317380]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.318181]  ? retint_kernel+0x10/0x10
	[53274.318820]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.319646]  ? retint_kernel+0x10/0x10
	[53274.320331]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.321125]  ? trace_hardirqs_on_caller+0x4a/0x100
	[53274.321924]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53274.322716]  ? retint_kernel+0x10/0x10
	[53274.323367]  ? retint_kernel+0x10/0x10
	[53274.324009]  ? __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[53274.324926]  ? __pv_queued_spin_lock_slowpath+0xf7/0x2b0
	[53274.325814]  ? __pv_queued_spin_lock_slowpath+0x273/0x2b0
	[53274.326729]  ? queued_write_lock_slowpath+0x80/0x90
	[53274.327560]  ? do_raw_write_lock+0xae/0xb0
	[53274.328262]  ? _raw_write_lock+0x55/0x70
	[53274.328932]  ? tree_mod_log_insert_key+0xd3/0x150
	[53274.329840]  ? __btrfs_cow_block+0x25d/0x570
	[53274.330810]  ? btrfs_cow_block+0xf8/0x230
	[53274.331688]  ? btrfs_search_slot+0x213/0x930
	[53274.332621]  ? btrfs_lookup_inode+0x3e/0xc0
	[53274.333595]  ? kmem_cache_alloc+0x208/0x330
	[53274.334505]  ? btrfs_alloc_path+0x1a/0x20
	[53274.335385]  ? __btrfs_update_delayed_inode+0x70/0x210
	[53274.336515]  ? btrfs_commit_inode_delayed_inode+0x126/0x130
	[53274.337740]  ? btrfs_evict_inode+0x3c7/0x530
	[53274.338683]  ? dput+0x2a/0x2e0
	[53274.339350]  ? evict+0xc3/0x1c0
	[53274.340056]  ? iput+0x1de/0x280
	[53274.340746]  ? dput+0x2a/0x2e0
	[53274.341456]  ? dentry_unlink_inode+0xc0/0xf0
	[53274.342378]  ? __dentry_kill+0xd1/0x1b0
	[53274.343208]  ? dentry_kill+0x55/0x1e0
	[53274.343932]  ? dput+0x2a/0x2e0
	[53274.344597]  ? dput+0x291/0x2e0
	[53274.345263]  ? __fput+0x12f/0x260
	[53274.345984]  ? ____fput+0xe/0x10
	[53274.346660]  ? task_work_run+0x8f/0xc0
	[53274.347499]  ? exit_to_usermode_loop+0x112/0x120
	[53274.348543]  ? do_syscall_64+0x199/0x1a0
	[53274.349368]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

CPU#1:

	[53273.449195] btrfs-transacti R  running task        0  4737      2 0x80000008
	[53273.451375] Call Trace:
	[53273.451980]  ? retint_kernel+0x10/0x10
	[53273.452758]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53273.453549]  ? trace_hardirqs_on_caller+0x4a/0x100
	[53273.454355]  ? trace_hardirqs_on_thunk+0x1a/0x1c
	[53273.455143]  ? retint_kernel+0x10/0x10
	[53273.455777]  ? kvm_wait+0x86/0x90
	[53273.456368]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.457300]  ? trace_hardirqs_on+0x4c/0x100
	[53273.458008]  ? kvm_wait+0x8b/0x90
	[53273.458577]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.459747]  ? queued_write_lock_slowpath+0x80/0x90
	[53273.460762]  ? do_raw_write_lock+0xae/0xb0
	[53273.461682]  ? _raw_write_lock+0x55/0x70
	[53273.462493]  ? tree_mod_log_insert_key+0xd3/0x150
	[53273.463314]  ? __btrfs_cow_block+0x25d/0x570
	[53273.464040]  ? btrfs_cow_block+0xf8/0x230
	[53273.464718]  ? btrfs_search_slot+0x213/0x930
	[53273.465447]  ? lookup_inline_extent_backref+0x118/0x570
	[53273.466358]  ? mark_held_locks+0x76/0xa0
	[53273.467175]  ? insert_inline_extent_backref+0x51/0xe0
	[53273.468334]  ? rcu_read_lock_sched_held+0x72/0x80
	[53273.469560]  ? __btrfs_inc_extent_ref+0x87/0x220
	[53273.470730]  ? run_delayed_data_ref+0x1c2/0x210
	[53273.471718]  ? run_one_delayed_ref+0x55/0xa0
	[53273.472787]  ? btrfs_run_delayed_refs_for_head+0x17b/0x3b0
	[53273.474200]  ? __btrfs_run_delayed_refs+0x84/0x180
	[53273.475455]  ? btrfs_run_delayed_refs+0x86/0x1e0
	[53273.476684]  ? btrfs_commit_transaction+0x52/0xab0
	[53273.478684]  ? start_transaction+0x93/0x4d0
	[53273.479500]  ? transaction_kthread+0x155/0x190
	[53273.480308]  ? kthread+0x113/0x150
	[53273.480920]  ? btrfs_cleanup_transaction+0x630/0x630
	[53273.481937]  ? kthread_park+0x90/0x90
	[53273.483571]  ? ret_from_fork+0x3a/0x50

CPU#2:

	[53273.814635] crawl_5268      R  running task        0  5663   5650 0x80000008
	[53273.816033] Call Trace:
	[53273.816470]  <IRQ>
	[53273.816840]  sched_show_task+0x198/0x260
	[53273.817546]  show_state_filter+0xa0/0x1a0
	[53273.818276]  sysrq_handle_showstate+0x10/0x20
	[53273.819059]  __handle_sysrq+0x139/0x210
	[53273.819744]  handle_sysrq+0x26/0x30
	[53273.820376]  serial8250_handle_irq.part.16+0xbc/0x100
	[53273.821298]  serial8250_default_handle_irq+0x53/0x60
	[53273.822209]  serial8250_interrupt+0x68/0x100
	[53273.823001]  __handle_irq_event_percpu+0x4e/0x2b0
	[53273.823958]  handle_irq_event_percpu+0x32/0x80
	[53273.824947]  handle_irq_event+0x39/0x60
	[53273.825800]  handle_edge_irq+0xef/0x1c0
	[53273.826554]  handle_irq+0x75/0x120
	[53273.827191]  do_IRQ+0x64/0x130
	[53273.827760]  common_interrupt+0xf/0xf
	[53273.828443]  </IRQ>
	[53273.828841] RIP: 0010:native_safe_halt+0x12/0x20
	[53273.829783] Code: f0 80 48 02 20 48 8b 00 a8 08 0f 84 7b ff ff ff eb bd 90 90 90 90 90 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d b2 ec 43 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
	[53273.834211] RSP: 0018:ffffa95b81107790 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
	[53273.835701] RAX: 0000000000000000 RBX: ffff9e06b5e159d4 RCX: 0000000000000001
	[53273.837608] RDX: ffff9e06aeb2c000 RSI: ffffffff921199ed RDI: ffffffff9207a326
	[53273.839525] RBP: ffffa95b81107790 R08: 0000000000000000 R09: 0000000000000000
	[53273.841170] R10: ffffa95b811077d0 R11: ffff9e069ddfd6b8 R12: 0000000000000246
	[53273.842439] R13: 0000000000000001 R14: ffff9e06b5e159d4 R15: 0000000000000001
	[53273.843995]  ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.845002]  ? kvm_wait+0x86/0x90
	[53273.845610]  kvm_wait+0x8b/0x90
	[53273.846178]  __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
	[53273.847472]  queued_read_lock_slowpath+0x75/0x80
	[53273.848317]  do_raw_read_lock+0x4b/0x50
	[53273.849013]  _raw_read_lock+0x58/0x70
	[53273.849693]  __tree_mod_log_search+0x2d/0xb0
	[53273.850466]  __tree_mod_log_oldest_root.isra.1+0x4c/0x80
	[53273.851593]  btrfs_old_root_level+0x26/0x80
	[53273.852374]  resolve_indirect_refs+0x2ad/0x910
	[53273.853211]  ? __schedule+0x3dc/0xb70
	[53273.853870]  find_parent_nodes+0x443/0x1340
	[53273.854631]  btrfs_find_all_roots_safe+0xc5/0x140
	[53273.855774]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[53273.856908]  iterate_extent_inodes+0x198/0x3e0
	[53273.857766]  iterate_inodes_from_logical+0xa1/0xd0
	[53273.858929]  ? iterate_inodes_from_logical+0xa1/0xd0
	[53273.859939]  ? btrfs_inode_flags_to_fsflags+0x90/0x90
	[53273.860847]  btrfs_ioctl_logical_to_ino+0xf3/0x1a0
	[53273.861741]  btrfs_ioctl+0xcf7/0x2cb0
	[53273.862617]  ? __lock_acquire+0x477/0x1b50
	[53273.863530]  ? kvm_sched_clock_read+0x18/0x30
	[53273.864496]  ? kvm_sched_clock_read+0x18/0x30
	[53273.865335]  ? sched_clock+0x9/0x10
	[53273.866015]  do_vfs_ioctl+0xa6/0x6e0
	[53273.866891]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[53273.867954]  ? do_vfs_ioctl+0xa6/0x6e0
	[53273.869148]  ? __fget+0x119/0x200
	[53273.869922]  ksys_ioctl+0x75/0x80
	[53273.870649]  __x64_sys_ioctl+0x1a/0x20
	[53273.871450]  do_syscall_64+0x65/0x1a0
	[53273.872351]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
	[53273.873487] RIP: 0033:0x7f938cf29777
	[53273.874154] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
	[53273.878117] RSP: 002b:00007f938b62d458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[53273.879542] RAX: ffffffffffffffda RBX: 00007f938b62d780 RCX: 00007f938cf29777
	[53273.881224] RDX: 00007f938b62d788 RSI: 00000000c038943b RDI: 0000000000000003
	[53273.882805] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f938b62d960
	[53273.884519] R10: 000055ad4e628c40 R11: 0000000000000246 R12: 0000000000000003
	[53273.885971] R13: 00007f938b62d788 R14: 00007f938b62d668 R15: 00007f938b62d890

CPU#3:

	[53273.887482] crawl_5268      R  running task        0  5664   5650 0x80000008
	[53273.888778] Call Trace:
	[53273.889807]  ? tree_mod_log_insert_root.isra.2+0x117/0x350
	[53273.890820]  ? __btrfs_cow_block+0x41e/0x570
	[53273.891625]  ? btrfs_cow_block+0xf8/0x230
	[53273.892471]  ? btrfs_search_slot+0x213/0x930
	[53273.893296]  ? kmem_cache_free+0xa4/0x1e0
	[53273.894189]  ? btrfs_lookup_file_extent+0x49/0x60
	[53273.895058]  ? __btrfs_drop_extents+0x183/0xdf0
	[53273.895869]  ? kmem_cache_alloc+0x208/0x330
	[53273.896641]  ? btrfs_alloc_path+0x1a/0x20
	[53273.897371]  ? trace_hardirqs_on+0x4c/0x100
	[53273.898310]  ? btrfs_drop_extents+0x5d/0x90
	[53273.899096]  ? btrfs_clone+0x845/0xcf0
	[53273.899938]  ? btrfs_extent_same_range+0x5b/0xd0
	[53273.901203]  ? btrfs_remap_file_range+0x218/0x380
	[53273.902585]  ? vfs_dedupe_file_range_one+0x111/0x170
	[53273.903996]  ? vfs_dedupe_file_range+0x157/0x1f0
	[53273.904850]  ? do_vfs_ioctl+0x27f/0x6e0
	[53273.905655]  ? __fget+0x119/0x200
	[53273.906918]  ? ksys_ioctl+0x75/0x80
	[53273.907699]  ? __x64_sys_ioctl+0x1a/0x20
	[53273.908428]  ? do_syscall_64+0x65/0x1a0
	[53273.909175]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

> >>
> >> <SNIP>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

  parent reply	other threads:[~2019-05-18 21:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-15 21:36 storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+) Zygo Blaxell
2019-05-16  6:57 ` Nikolay Borisov
2019-05-16 16:20   ` Zygo Blaxell
2019-05-18  4:44   ` Zygo Blaxell
2019-05-18  8:27     ` Nikolay Borisov
2019-05-18 21:00       ` Zygo Blaxell
2019-05-18 21:41       ` Zygo Blaxell [this message]
2020-02-04  4:57 ` storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+, fixed in 5.4) Zygo Blaxell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190518214138.GJ20359@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).