From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Nikolay Borisov <nborisov@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+)
Date: Sat, 18 May 2019 00:44:11 -0400 [thread overview]
Message-ID: <20190518044411.GH20359@hungrycats.org> (raw)
In-Reply-To: <0480104e-db25-4e2f-08e5-0236ffd5c1c2@suse.com>
[-- Attachment #1: Type: text/plain, Size: 46626 bytes --]
On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote:
>
>
> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote:
> > "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> > CPU cores in kernel functions that are unable to make forward progress,
> > but also unwilling to release their respective CPU cores. This is
> > usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> > usage or just a lot of CPU fan noise) though I don't know if all cores
> > are spinning or only some of them.
> >
> > The kernel console presents a continual stream of "BUG: soft lockup"
> > warnings for some days. None of the call traces change during this time.
> > The only way out is to reboot.
> >
> > You can reproduce this by writing a bunch of data to a filesystem while
> > bees is running on all cores. It takes a few days to occur naturally.
> > It can probably be sped up by just doing a bunch of random LOGICAL_INO
> > ioctls in a tight loop on each core.
> >
> > Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> > (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> > ioctl (btrfs_tree_read_lock_atomic):
>
>
> Provide output of all sleeping threads when this occur via
> echo w > /proc/sysrq-trigger.
I fixed my SysRq configuration. The results are...kind of interesting.
I guess the four threads that are running loops on all cores don't count
as "blocked"...
[39610.791203] sysrq: SysRq : Show Blocked State
[39610.791934] task PC stack pid father
[39610.792832] btrfs-transacti D 0 4643 2 0x80000000
[39610.793613] Call Trace:
[39610.793970] __schedule+0x3d4/0xb70
[39610.794472] schedule+0x3d/0x80
[39610.794925] wait_for_commit+0x59/0xa0
[39610.795464] ? wait_woken+0xa0/0xa0
[39610.795962] btrfs_commit_transaction+0x122/0xab0
[39610.796635] ? start_transaction+0x93/0x4d0
[39610.797228] transaction_kthread+0x155/0x190
[39610.797836] kthread+0x113/0x150
[39610.798296] ? btrfs_cleanup_transaction+0x630/0x630
[39610.799004] ? kthread_park+0x90/0x90
[39610.799542] ret_from_fork+0x3a/0x50
[39610.800080] rsync D 0 12832 8251 0x00000000
[39610.800868] Call Trace:
[39610.801223] __schedule+0x3d4/0xb70
[39610.801728] schedule+0x3d/0x80
[39610.802179] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.802919] ? wake_up_var+0x40/0x40
[39610.803436] btrfs_setattr+0x316/0x5a0
[39610.803970] notify_change+0x2f0/0x450
[39610.804513] do_truncate+0x73/0xc0
[39610.805029] do_sys_ftruncate+0x12b/0x1c0
[39610.805606] __x64_sys_ftruncate+0x1b/0x20
[39610.806193] do_syscall_64+0x65/0x1a0
[39610.806831] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.807556] RIP: 0033:0x7f10d5005c97
[39610.808069] Code: Bad RIP value.
[39610.808539] RSP: 002b:00007ffea589cf28 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.809649] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f10d5005c97
[39610.810790] RDX: 0000000000000000 RSI: 00000000000001cc RDI: 0000000000000003
[39610.811798] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000c846d527
[39610.812873] R10: 00000000894b52a9 R11: 0000000000000246 R12: 00000000000001cc
[39610.813936] R13: 0000000000000000 R14: 00000000000001cc R15: 0000000000000000
[39610.815048] rsync D 0 12834 8444 0x00000000
[39610.815826] Call Trace:
[39610.816181] __schedule+0x3d4/0xb70
[39610.816909] schedule+0x3d/0x80
[39610.817505] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.818264] ? wake_up_var+0x40/0x40
[39610.818808] btrfs_setattr+0x316/0x5a0
[39610.819373] notify_change+0x2f0/0x450
[39610.820129] do_truncate+0x73/0xc0
[39610.820739] do_sys_ftruncate+0x12b/0x1c0
[39610.821409] __x64_sys_ftruncate+0x1b/0x20
[39610.822040] do_syscall_64+0x65/0x1a0
[39610.822614] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.823385] RIP: 0033:0x7fade44efc97
[39610.824203] Code: Bad RIP value.
[39610.824877] RSP: 002b:00007ffc9f263818 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.826038] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fade44efc97
[39610.828361] RDX: 0000000000000000 RSI: 00000000001b0b3c RDI: 0000000000000003
[39610.829580] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000e9809e58
[39610.830932] R10: 00000000f8cf8f2b R11: 0000000000000246 R12: 00000000001b0b3c
[39610.832178] R13: 00000000001b0820 R14: 000000000000031c R15: 0000000000000000
[39610.833491] rsync D 0 12835 8183 0x00000000
[39610.834353] Call Trace:
[39610.834747] __schedule+0x3d4/0xb70
[39610.835286] schedule+0x3d/0x80
[39610.835782] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.836646] ? wake_up_var+0x40/0x40
[39610.837201] btrfs_setattr+0x316/0x5a0
[39610.837790] notify_change+0x2f0/0x450
[39610.838364] do_truncate+0x73/0xc0
[39610.838913] do_sys_ftruncate+0x12b/0x1c0
[39610.839542] __x64_sys_ftruncate+0x1b/0x20
[39610.840161] do_syscall_64+0x65/0x1a0
[39610.840720] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.841496] RIP: 0033:0x7f696d6efc97
[39610.842138] Code: Bad RIP value.
[39610.842679] RSP: 002b:00007fff121f34d8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.844008] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f696d6efc97
[39610.845271] RDX: 0000000000000000 RSI: 00000000000001cc RDI: 0000000000000003
[39610.846532] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000c846d527
[39610.847696] R10: 00000000894b52a9 R11: 0000000000000246 R12: 00000000000001cc
[39610.848839] R13: 0000000000000000 R14: 00000000000001cc R15: 0000000000000000
[39610.850001] rsync D 0 12838 7602 0x00000000
[39610.850909] Call Trace:
[39610.851332] __schedule+0x3d4/0xb70
[39610.851981] schedule+0x3d/0x80
[39610.852535] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.853455] ? wake_up_var+0x40/0x40
[39610.854085] btrfs_setattr+0x316/0x5a0
[39610.854763] notify_change+0x2f0/0x450
[39610.855420] do_truncate+0x73/0xc0
[39610.856001] do_sys_ftruncate+0x12b/0x1c0
[39610.856691] __x64_sys_ftruncate+0x1b/0x20
[39610.857420] do_syscall_64+0x65/0x1a0
[39610.858104] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.859016] RIP: 0033:0x7f2bed4a1c97
[39610.859673] Code: Bad RIP value.
[39610.860260] RSP: 002b:00007fffa14b3ba8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.861586] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2bed4a1c97
[39610.862862] RDX: 0000000000000000 RSI: 00000000001b0b3c RDI: 0000000000000003
[39610.864102] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000e9809e58
[39610.865324] R10: 00000000f8cf8f2b R11: 0000000000000246 R12: 00000000001b0b3c
[39610.866559] R13: 00000000001b0820 R14: 000000000000031c R15: 0000000000000000
[39610.867760] rsync D 0 12842 8631 0x00000000
[39610.868684] Call Trace:
[39610.869102] __schedule+0x3d4/0xb70
[39610.869694] schedule+0x3d/0x80
[39610.870224] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.871162] ? wake_up_var+0x40/0x40
[39610.871826] btrfs_setattr+0x316/0x5a0
[39610.872466] notify_change+0x2f0/0x450
[39610.873087] do_truncate+0x73/0xc0
[39610.873702] do_sys_ftruncate+0x12b/0x1c0
[39610.874419] __x64_sys_ftruncate+0x1b/0x20
[39610.875168] do_syscall_64+0x65/0x1a0
[39610.875841] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.876736] RIP: 0033:0x7fd20067bc97
[39610.877382] Code: Bad RIP value.
[39610.877993] RSP: 002b:00007ffd11215d58 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.879337] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd20067bc97
[39610.880591] RDX: 0000000000000000 RSI: 0000000001b9face RDI: 0000000000000003
[39610.881881] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000007afe8ff0
[39610.883163] R10: 00000000092d5688 R11: 0000000000000246 R12: 0000000001b9face
[39610.884312] R13: 0000000001b9e700 R14: 00000000000013ce R15: 0000000000000000
[39610.885445] rsync D 0 12845 8614 0x00000000
[39610.886352] Call Trace:
[39610.886754] __schedule+0x3d4/0xb70
[39610.887301] schedule+0x3d/0x80
[39610.887820] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.888731] ? wake_up_var+0x40/0x40
[39610.889323] btrfs_setattr+0x316/0x5a0
[39610.889979] notify_change+0x2f0/0x450
[39610.890569] do_truncate+0x73/0xc0
[39610.891102] do_sys_ftruncate+0x12b/0x1c0
[39610.891803] __x64_sys_ftruncate+0x1b/0x20
[39610.892460] do_syscall_64+0x65/0x1a0
[39610.893029] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.893853] RIP: 0033:0x7f697f553c97
[39610.894481] Code: Bad RIP value.
[39610.895058] RSP: 002b:00007ffed6c8bbb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.896391] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f697f553c97
[39610.897665] RDX: 0000000000000000 RSI: 0000000000014ab8 RDI: 0000000000000003
[39610.898926] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000fb17ef2f
[39610.900177] R10: 000000009ebff28a R11: 0000000000000246 R12: 0000000000014ab8
[39610.901436] R13: 0000000000014820 R14: 0000000000000298 R15: 0000000000000000
[39610.902697] rsync D 0 12846 9349 0x00000000
[39610.903619] Call Trace:
[39610.904017] __schedule+0x3d4/0xb70
[39610.904584] schedule+0x3d/0x80
[39610.905125] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.906044] ? wake_up_var+0x40/0x40
[39610.906707] btrfs_setattr+0x316/0x5a0
[39610.907398] notify_change+0x2f0/0x450
[39610.908091] do_truncate+0x73/0xc0
[39610.908694] do_sys_ftruncate+0x12b/0x1c0
[39610.909390] __x64_sys_ftruncate+0x1b/0x20
[39610.910120] do_syscall_64+0x65/0x1a0
[39610.910715] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.911535] RIP: 0033:0x7f77c7350c97
[39610.912171] Code: Bad RIP value.
[39610.912750] RSP: 002b:00007fff3402b998 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.914036] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f77c7350c97
[39610.915156] RDX: 0000000000000000 RSI: 000000000001b800 RDI: 0000000000000003
[39610.916607] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000012f2a017
[39610.917739] R10: 000000005ca8e044 R11: 0000000000000246 R12: 000000000001b800
[39610.918957] R13: 000000000001b580 R14: 0000000000000280 R15: 0000000000000000
[39610.920107] rsync D 0 12847 10903 0x00000000
[39610.921024] Call Trace:
[39610.921422] __schedule+0x3d4/0xb70
[39610.921964] schedule+0x3d/0x80
[39610.922457] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.923278] ? wake_up_var+0x40/0x40
[39610.924019] btrfs_setattr+0x316/0x5a0
[39610.924631] notify_change+0x2f0/0x450
[39610.925226] do_truncate+0x73/0xc0
[39610.925786] do_sys_ftruncate+0x12b/0x1c0
[39610.926723] __x64_sys_ftruncate+0x1b/0x20
[39610.927375] do_syscall_64+0x65/0x1a0
[39610.927963] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.928772] RIP: 0033:0x7f3462c85c97
[39610.929341] Code: Bad RIP value.
[39610.929883] RSP: 002b:00007ffd4b538368 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.931068] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3462c85c97
[39610.932254] RDX: 0000000000000000 RSI: 0000000001b9face RDI: 0000000000000003
[39610.933373] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000007afe8ff0
[39610.934622] R10: 00000000092d5688 R11: 0000000000000246 R12: 0000000001b9face
[39610.935811] R13: 0000000001b9e700 R14: 00000000000013ce R15: 0000000000000000
[39610.936938] rsync D 0 12849 6970 0x00000000
[39610.938109] Call Trace:
[39610.938515] __schedule+0x3d4/0xb70
[39610.939071] schedule+0x3d/0x80
[39610.939585] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.940407] ? wake_up_var+0x40/0x40
[39610.940959] btrfs_setattr+0x316/0x5a0
[39610.941544] notify_change+0x2f0/0x450
[39610.942146] do_truncate+0x73/0xc0
[39610.942762] do_sys_ftruncate+0x12b/0x1c0
[39610.943383] __x64_sys_ftruncate+0x1b/0x20
[39610.944209] do_syscall_64+0x65/0x1a0
[39610.944786] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.945656] RIP: 0033:0x7f3044497c97
[39610.946409] Code: Bad RIP value.
[39610.947155] RSP: 002b:00007fffb7404608 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.948596] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3044497c97
[39610.949873] RDX: 0000000000000000 RSI: 000000000049a1d8 RDI: 0000000000000003
[39610.951494] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000bb1d27a0
[39610.952718] R10: 00000000f74314fb R11: 0000000000000246 R12: 000000000049a1d8
[39610.953893] R13: 0000000000499e10 R14: 00000000000003c8 R15: 0000000000000000
[39610.954982] rsync D 0 12850 8121 0x00000000
[39610.956640] Call Trace:
[39610.957089] __schedule+0x3d4/0xb70
[39610.957805] schedule+0x3d/0x80
[39610.958456] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.959472] ? wake_up_var+0x40/0x40
[39610.960176] btrfs_setattr+0x316/0x5a0
[39610.960779] notify_change+0x2f0/0x450
[39610.961377] do_truncate+0x73/0xc0
[39610.961924] do_sys_ftruncate+0x12b/0x1c0
[39610.962760] __x64_sys_ftruncate+0x1b/0x20
[39610.963419] do_syscall_64+0x65/0x1a0
[39610.964085] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.964943] RIP: 0033:0x7f8e0c156c97
[39610.965924] Code: Bad RIP value.
[39610.966517] RSP: 002b:00007ffcb0f66bb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.967977] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8e0c156c97
[39610.969245] RDX: 0000000000000000 RSI: 00000000000093d8 RDI: 0000000000000003
[39610.970619] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000021682eb2
[39610.971881] R10: 00000000e1f12d36 R11: 0000000000000246 R12: 00000000000093d8
[39610.973300] R13: 00000000000093a8 R14: 0000000000000030 R15: 0000000000000000
[39610.974593] rsync D 0 12853 11202 0x00000000
[39610.975480] Call Trace:
[39610.975876] __schedule+0x3d4/0xb70
[39610.976457] schedule+0x3d/0x80
[39610.977133] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.977973] ? wake_up_var+0x40/0x40
[39610.978549] btrfs_setattr+0x316/0x5a0
[39610.979145] notify_change+0x2f0/0x450
[39610.980154] do_truncate+0x73/0xc0
[39610.980712] do_sys_ftruncate+0x12b/0x1c0
[39610.981348] __x64_sys_ftruncate+0x1b/0x20
[39610.982111] do_syscall_64+0x65/0x1a0
[39610.982770] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39610.983692] RIP: 0033:0x7ff863906c97
[39610.984536] Code: Bad RIP value.
[39610.985132] RSP: 002b:00007ffed3457e28 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39610.986388] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff863906c97
[39610.987823] RDX: 0000000000000000 RSI: 000000000009ba00 RDI: 0000000000000003
[39610.988936] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000b58a5813
[39610.990107] R10: 000000007951ae56 R11: 0000000000000246 R12: 000000000009ba00
[39610.991289] R13: 000000000009b760 R14: 00000000000002a0 R15: 0000000000000000
[39610.992603] rsync D 0 12855 9095 0x00000000
[39610.993581] Call Trace:
[39610.993964] __schedule+0x3d4/0xb70
[39610.994512] schedule+0x3d/0x80
[39610.995002] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39610.995882] ? wake_up_var+0x40/0x40
[39610.996538] btrfs_setattr+0x316/0x5a0
[39610.997211] notify_change+0x2f0/0x450
[39610.997841] do_truncate+0x73/0xc0
[39610.998368] do_sys_ftruncate+0x12b/0x1c0
[39610.999054] __x64_sys_ftruncate+0x1b/0x20
[39610.999787] do_syscall_64+0x65/0x1a0
[39611.000438] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.001343] RIP: 0033:0x7f93b080fc97
[39611.001963] Code: Bad RIP value.
[39611.002523] RSP: 002b:00007fffc289a6f8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.003717] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f93b080fc97
[39611.004965] RDX: 0000000000000000 RSI: 0000000000006fc8 RDI: 0000000000000003
[39611.006187] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000610c490e
[39611.007445] R10: 00000000ddc7f267 R11: 0000000000000246 R12: 0000000000006fc8
[39611.008747] R13: 0000000000006d60 R14: 0000000000000268 R15: 0000000000000000
[39611.010056] rsync D 0 12856 8999 0x00000000
[39611.011086] Call Trace:
[39611.011599] __schedule+0x3d4/0xb70
[39611.012279] schedule+0x3d/0x80
[39611.012806] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.013748] ? wake_up_var+0x40/0x40
[39611.014410] btrfs_setattr+0x316/0x5a0
[39611.015129] notify_change+0x2f0/0x450
[39611.015840] do_truncate+0x73/0xc0
[39611.016471] do_sys_ftruncate+0x12b/0x1c0
[39611.017229] __x64_sys_ftruncate+0x1b/0x20
[39611.018002] do_syscall_64+0x65/0x1a0
[39611.018692] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.019624] RIP: 0033:0x7fb1cd9e1c97
[39611.020273] Code: Bad RIP value.
[39611.020900] RSP: 002b:00007ffc6ba2fbc8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.022206] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb1cd9e1c97
[39611.023308] RDX: 0000000000000000 RSI: 00000000000093d8 RDI: 0000000000000003
[39611.024431] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000021682eb2
[39611.025668] R10: 00000000e1f12d36 R11: 0000000000000246 R12: 00000000000093d8
[39611.027072] R13: 00000000000093a8 R14: 0000000000000030 R15: 0000000000000000
[39611.028390] rsync D 0 12857 10451 0x00000000
[39611.029382] Call Trace:
[39611.029843] __schedule+0x3d4/0xb70
[39611.030483] schedule+0x3d/0x80
[39611.031074] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.032050] ? wake_up_var+0x40/0x40
[39611.032722] btrfs_setattr+0x316/0x5a0
[39611.033409] notify_change+0x2f0/0x450
[39611.034118] do_truncate+0x73/0xc0
[39611.034756] do_sys_ftruncate+0x12b/0x1c0
[39611.035474] __x64_sys_ftruncate+0x1b/0x20
[39611.036161] do_syscall_64+0x65/0x1a0
[39611.036877] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.037787] RIP: 0033:0x7fab83a26c97
[39611.039054] Code: Bad RIP value.
[39611.039776] RSP: 002b:00007ffe7adce218 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.040967] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fab83a26c97
[39611.042422] RDX: 0000000000000000 RSI: 00000000015983b4 RDI: 0000000000000003
[39611.043747] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000004809bce
[39611.044974] R10: 000000007f4aa483 R11: 0000000000000246 R12: 00000000015983b4
[39611.047307] R13: 0000000001598250 R14: 0000000000000164 R15: 0000000000000000
[39611.048509] rsync D 0 12858 10345 0x00000000
[39611.049365] Call Trace:
[39611.050235] __schedule+0x3d4/0xb70
[39611.050798] schedule+0x3d/0x80
[39611.051289] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.052121] ? wake_up_var+0x40/0x40
[39611.053007] btrfs_setattr+0x316/0x5a0
[39611.053599] notify_change+0x2f0/0x450
[39611.054182] do_truncate+0x73/0xc0
[39611.054711] do_sys_ftruncate+0x12b/0x1c0
[39611.055338] __x64_sys_ftruncate+0x1b/0x20
[39611.056072] do_syscall_64+0x65/0x1a0
[39611.057115] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.058458] RIP: 0033:0x7f5fdff91c97
[39611.059133] Code: Bad RIP value.
[39611.059660] RSP: 002b:00007ffc67d00a58 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.060833] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5fdff91c97
[39611.062271] RDX: 0000000000000000 RSI: 0000000000006fc8 RDI: 0000000000000003
[39611.063389] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000610c490e
[39611.064526] R10: 00000000ddc7f267 R11: 0000000000000246 R12: 0000000000006fc8
[39611.065649] R13: 0000000000006d60 R14: 0000000000000268 R15: 0000000000000000
[39611.066902] rsync D 0 12861 10901 0x00000000
[39611.067957] Call Trace:
[39611.068355] __schedule+0x3d4/0xb70
[39611.068912] schedule+0x3d/0x80
[39611.069413] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.070605] ? wake_up_var+0x40/0x40
[39611.071174] btrfs_setattr+0x316/0x5a0
[39611.071789] notify_change+0x2f0/0x450
[39611.072383] do_truncate+0x73/0xc0
[39611.073001] do_sys_ftruncate+0x12b/0x1c0
[39611.073644] __x64_sys_ftruncate+0x1b/0x20
[39611.074299] do_syscall_64+0x65/0x1a0
[39611.074883] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.075681] RIP: 0033:0x7f3de4043c97
[39611.076251] Code: Bad RIP value.
[39611.076772] RSP: 002b:00007ffe58c8fac8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.077957] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3de4043c97
[39611.079073] RDX: 0000000000000000 RSI: 0000000000003138 RDI: 0000000000000003
[39611.080188] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000a81c530d
[39611.081339] R10: 00000000f7dbc024 R11: 0000000000000246 R12: 0000000000003138
[39611.082533] R13: 0000000000002e7c R14: 00000000000002bc R15: 0000000000000000
[39611.083724] rsync D 0 12863 7311 0x00000000
[39611.084641] Call Trace:
[39611.085060] __schedule+0x3d4/0xb70
[39611.085653] schedule+0x3d/0x80
[39611.086183] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.087052] ? wake_up_var+0x40/0x40
[39611.087660] btrfs_setattr+0x316/0x5a0
[39611.088286] notify_change+0x2f0/0x450
[39611.088923] do_truncate+0x73/0xc0
[39611.089506] do_sys_ftruncate+0x12b/0x1c0
[39611.090176] __x64_sys_ftruncate+0x1b/0x20
[39611.090870] do_syscall_64+0x65/0x1a0
[39611.091492] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.092333] RIP: 0033:0x7fdf2e203c97
[39611.092938] Code: Bad RIP value.
[39611.093491] RSP: 002b:00007ffda441e7e8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.094737] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdf2e203c97
[39611.095916] RDX: 0000000000000000 RSI: 000000000000001d RDI: 0000000000000003
[39611.097097] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000000000001d
[39611.098277] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001d
[39611.099439] R13: 0000000000000000 R14: 000000000000001d R15: 0000000000000000
[39611.100839] rsync D 0 12864 7029 0x00000000
[39611.101792] Call Trace:
[39611.102223] __schedule+0x3d4/0xb70
[39611.102939] schedule+0x3d/0x80
[39611.103548] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.104391] ? wake_up_var+0x40/0x40
[39611.104979] btrfs_setattr+0x316/0x5a0
[39611.105598] notify_change+0x2f0/0x450
[39611.106211] do_truncate+0x73/0xc0
[39611.106775] do_sys_ftruncate+0x12b/0x1c0
[39611.107413] __x64_sys_ftruncate+0x1b/0x20
[39611.108061] do_syscall_64+0x65/0x1a0
[39611.108651] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.109450] RIP: 0033:0x7fbd0ecc1c97
[39611.110016] Code: Bad RIP value.
[39611.110535] RSP: 002b:00007ffe7ccb9f98 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.111718] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbd0ecc1c97
[39611.112835] RDX: 0000000000000000 RSI: 0000000000004f10 RDI: 0000000000000003
[39611.114054] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000a72ac329
[39611.115210] R10: 000000006c1ecb5e R11: 0000000000000246 R12: 0000000000004f10
[39611.116358] R13: 0000000000004c90 R14: 0000000000000280 R15: 0000000000000000
[39611.117842] rsync D 0 12868 8122 0x00000000
[39611.118737] Call Trace:
[39611.119144] __schedule+0x3d4/0xb70
[39611.119723] schedule+0x3d/0x80
[39611.120238] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.121183] ? wake_up_var+0x40/0x40
[39611.121778] btrfs_setattr+0x316/0x5a0
[39611.122374] notify_change+0x2f0/0x450
[39611.122978] do_truncate+0x73/0xc0
[39611.123528] do_sys_ftruncate+0x12b/0x1c0
[39611.124162] __x64_sys_ftruncate+0x1b/0x20
[39611.124931] do_syscall_64+0x65/0x1a0
[39611.125528] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.126316] RIP: 0033:0x7f57cb45cc97
[39611.126896] Code: Bad RIP value.
[39611.127702] RSP: 002b:00007ffe555eee38 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.128893] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f57cb45cc97
[39611.130050] RDX: 0000000000000000 RSI: 000000000000007e RDI: 0000000000000003
[39611.131206] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000a64251e9
[39611.132309] R10: 00000000b69a9260 R11: 0000000000000246 R12: 000000000000007e
[39611.133449] R13: 0000000000000000 R14: 000000000000007e R15: 0000000000000000
[39611.134660] rsync D 0 12869 7189 0x00000000
[39611.135511] Call Trace:
[39611.135895] __schedule+0x3d4/0xb70
[39611.136441] schedule+0x3d/0x80
[39611.136949] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.138069] ? wake_up_var+0x40/0x40
[39611.138663] btrfs_setattr+0x316/0x5a0
[39611.139257] notify_change+0x2f0/0x450
[39611.140031] do_truncate+0x73/0xc0
[39611.140798] do_sys_ftruncate+0x12b/0x1c0
[39611.141726] __x64_sys_ftruncate+0x1b/0x20
[39611.142449] do_syscall_64+0x65/0x1a0
[39611.143013] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.143792] RIP: 0033:0x7f2d34f4cc97
[39611.144344] Code: Bad RIP value.
[39611.144929] RSP: 002b:00007ffdb5c693d8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.146088] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d34f4cc97
[39611.147180] RDX: 0000000000000000 RSI: 000000000063c214 RDI: 0000000000000003
[39611.148819] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001a4a74f
[39611.149950] R10: 00000000763c97e2 R11: 0000000000000246 R12: 000000000063c214
[39611.151040] R13: 000000000063b9f8 R14: 000000000000081c R15: 0000000000000000
[39611.152445] rsync D 0 29478 29477 0x00000000
[39611.153311] Call Trace:
[39611.153735] __schedule+0x3d4/0xb70
[39611.154292] schedule+0x3d/0x80
[39611.154981] btrfs_wait_for_snapshot_creation+0xb2/0xe0
[39611.155912] ? wake_up_var+0x40/0x40
[39611.156481] btrfs_setattr+0x316/0x5a0
[39611.157169] notify_change+0x2f0/0x450
[39611.157766] do_truncate+0x73/0xc0
[39611.158309] do_sys_ftruncate+0x12b/0x1c0
[39611.158950] __x64_sys_ftruncate+0x1b/0x20
[39611.159626] do_syscall_64+0x65/0x1a0
[39611.160191] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[39611.160991] RIP: 0033:0x7f256a298c97
[39611.161564] Code: Bad RIP value.
[39611.162077] RSP: 002b:00007ffeadbc85c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
[39611.163255] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f256a298c97
[39611.164355] RDX: 0000000000000000 RSI: 0000000000008d10 RDI: 0000000000000003
[39611.165474] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000008c0d9c29
[39611.166588] R10: 000000003afa98d7 R11: 0000000000000246 R12: 0000000000008d10
[39611.167700] R13: 0000000000008b74 R14: 000000000000019c R15: 0000000000000000
All the blocked threads are waiting for a snapshot, which is here:
[42829.412010] btrfs R running task 0 11306 6447 0x00000000
[42829.413392] Call Trace:
[42829.413895] __schedule+0x3d4/0xb70
[42829.414598] preempt_schedule_common+0x1f/0x30
[42829.415480] _cond_resched+0x22/0x30
[42829.416198] wait_for_common_io.constprop.2+0x47/0x1b0
[42829.417212] ? submit_bio+0x73/0x140
[42829.417932] wait_for_completion_io+0x18/0x20
[42829.418791] submit_bio_wait+0x68/0x90
[42829.419546] blkdev_issue_discard+0x80/0xd0
[42829.420381] btrfs_issue_discard+0xc7/0x160
[42829.421215] ? btrfs_issue_discard+0xc7/0x160
[42829.422088] btrfs_discard_extent+0xcc/0x160
[42829.423191] btrfs_finish_extent_commit+0x118/0x280
[42829.424310] btrfs_commit_transaction+0x7f9/0xab0
[42829.425231] ? wait_woken+0xa0/0xa0
[42829.425907] btrfs_mksubvol+0x5b4/0x630
[42829.426766] ? mnt_want_write_file+0x28/0x60
[42829.427597] btrfs_ioctl_snap_create_transid+0x16b/0x1a0
[42829.428614] btrfs_ioctl_snap_create_v2+0x125/0x180
[42829.429548] btrfs_ioctl+0x1351/0x2cb0
[42829.430272] ? __handle_mm_fault+0x110c/0x1950
[42829.431124] ? do_raw_spin_unlock+0x4d/0xc0
[42829.431934] ? _raw_spin_unlock+0x27/0x40
[42829.432704] ? __handle_mm_fault+0x110c/0x1950
[42829.433556] ? kvm_sched_clock_read+0x18/0x30
[42829.434437] do_vfs_ioctl+0xa6/0x6e0
[42829.435345] ? btrfs_ioctl_get_supported_features+0x30/0x30
[42829.436803] ? do_vfs_ioctl+0xa6/0x6e0
[42829.437528] ? up_read+0x1f/0xa0
[42829.438159] ksys_ioctl+0x75/0x80
[42829.438798] __x64_sys_ioctl+0x1a/0x20
[42829.439565] do_syscall_64+0x65/0x1a0
[42829.440510] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[42829.441463] RIP: 0033:0x7f1faa8f5777
[42829.442133] Code: Bad RIP value.
[42829.442736] RSP: 002b:00007ffec1520458 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[42829.444352] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1faa8f5777
[42829.445710] RDX: 00007ffec1520498 RSI: 0000000050009417 RDI: 0000000000000003
[42829.447030] RBP: 00005621c22ef260 R08: 0000000000000008 R09: 00007f1faa97fe80
[42829.448372] R10: fffffffffffffa4a R11: 0000000000000202 R12: 00005621c22ef290
[42829.449723] R13: 00005621c22ef290 R14: 0000000000000003 R15: 0000000000000004
Also I just noticed there's sometimes (but not always!) a BUG in
fs/btrfs/ctree.c just before all the soft lockups start:
[26101.008546] ------------[ cut here ]------------
[26101.016090] kernel BUG at fs/btrfs/ctree.c:1227!
[26101.018285] irq event stamp: 36913
[26101.018287] invalid opcode: 0000 [#1] SMP PTI
[26101.018293] CPU: 1 PID: 4823 Comm: crawl_5268 Not tainted 5.0.16-zb64-9b948ea3083a+ #1
[26101.019115] hardirqs last enabled at (36913): [<ffffffffbb25b02c>] get_page_from_freelist+0x40c/0x19e0
[26101.019118] hardirqs last disabled at (36912): [<ffffffffbb25af70>] get_page_from_freelist+0x350/0x19e0
[26101.020820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[26101.022456] softirqs last enabled at (36478): [<ffffffffbc0003a0>] __do_softirq+0x3a0/0x45a
[26101.022461] softirqs last disabled at (36459): [<ffffffffbb0a9949>] irq_exit+0xe9/0xf0
[26101.024212] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
[26101.039031] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
[26101.042382] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206
[26101.043512] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80
[26101.044719] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000
[26101.046157] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0
[26101.047474] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008
[26101.048779] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180
[26101.049939] FS: 00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000
[26101.051415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26101.052330] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0
[26101.053615] Call Trace:
[26101.054103] btrfs_search_old_slot+0xfe/0x800
[26101.054900] resolve_indirect_refs+0x1c5/0x910
[26101.055734] ? prelim_ref_insert+0x10a/0x320
[26101.056474] find_parent_nodes+0x443/0x1340
[26101.057153] btrfs_find_all_roots_safe+0xc5/0x140
[26101.057890] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[26101.058813] iterate_extent_inodes+0x198/0x3e0
[26101.059608] iterate_inodes_from_logical+0xa1/0xd0
[26101.060377] ? iterate_inodes_from_logical+0xa1/0xd0
[26101.061142] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[26101.061975] btrfs_ioctl_logical_to_ino+0xf3/0x1a0
[26101.062710] btrfs_ioctl+0xcf7/0x2cb0
[26101.063318] ? __lock_acquire+0x477/0x1b50
[26101.063988] ? kvm_sched_clock_read+0x18/0x30
[26101.065010] ? kvm_sched_clock_read+0x18/0x30
[26101.065781] ? sched_clock+0x9/0x10
[26101.066402] do_vfs_ioctl+0xa6/0x6e0
[26101.067072] ? btrfs_ioctl_get_supported_features+0x30/0x30
[26101.068102] ? do_vfs_ioctl+0xa6/0x6e0
[26101.068765] ? __fget+0x119/0x200
[26101.069381] ksys_ioctl+0x75/0x80
[26101.069937] __x64_sys_ioctl+0x1a/0x20
[26101.070627] do_syscall_64+0x65/0x1a0
[26101.071299] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[26101.072125] RIP: 0033:0x7f9067675777
[26101.072732] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
[26101.075884] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[26101.077193] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777
[26101.078362] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004
[26101.079543] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960
[26101.080837] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
[26101.082724] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890
[26101.084083] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data snd_pcm crct10dif_pclmul crc32_pclmul dm_bio_prison crc32c_intel ghash_clmulni_intel dm_bufio sr_mod snd_timer cdrom snd sg aesni_intel ppdev joydev aes_x86_64 dm_mod soundcore crypto_simd ide_pci_generic cryptd piix glue_helper psmouse input_leds parport_pc ide_core rtc_cmos pcspkr serio_raw bochs_drm evbug parport evdev floppy i2c_piix4 qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4
[26101.091346] ---[ end trace d327561dc44a663d ]---
[26101.092065] RIP: 0010:__tree_mod_log_rewind+0x239/0x240
[26101.092893] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5
[26101.095707] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206
[26101.096505] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80
[26101.097595] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000
[26101.098678] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0
[26101.099771] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008
[26101.100994] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180
[26101.102230] FS: 00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000
[26101.103649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26101.104627] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0
[26101.105812] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[26101.107351] in_atomic(): 1, irqs_disabled(): 0, pid: 4823, name: crawl_5268
[26101.108628] INFO: lockdep is turned off.
[26101.109362] CPU: 1 PID: 4823 Comm: crawl_5268 Tainted: G D 5.0.16-zb64-9b948ea3083a+ #1
[26101.110931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[26101.112303] Call Trace:
[26101.112757] dump_stack+0x86/0xc5
[26101.113387] ___might_sleep+0x217/0x240
[26101.114077] __might_sleep+0x4a/0x80
[26101.114730] exit_signals+0x33/0x250
[26101.115508] do_exit+0xb9/0xd70
[26101.116095] rewind_stack_do_exit+0x17/0x20
[26101.116858] RIP: 0033:0x7f9067675777
[26101.117497] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
[26101.120764] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[26101.122031] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777
[26101.123266] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004
[26101.124474] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960
[26101.125746] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
[26101.126982] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890
[26101.128246] note: crawl_5268[4823] exited with preempt_count 2
[26128.042702] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [crawl_5268:4821]
[26128.048713] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [crawl_5268:4820]
[26128.051844] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [crawl_5268:4822]
[26128.055703] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs-balance:4691]
[...etc...]
The stack traces for the 4 running threads are:
[42826.377803] crawl_5268 R running task 0 4821 4806 0x80000008
[42826.379193] Call Trace:
[42826.379683] <IRQ>
[42826.380076] sched_show_task+0x198/0x260
[42826.380760] show_state_filter+0xa0/0x1a0
[42826.381438] sysrq_handle_showstate+0x10/0x20
[42826.382251] __handle_sysrq+0x139/0x210
[42826.382911] handle_sysrq+0x26/0x30
[42826.383507] serial8250_handle_irq.part.16+0xbc/0x100
[42826.384476] serial8250_default_handle_irq+0x53/0x60
[42826.385369] serial8250_interrupt+0x68/0x100
[42826.386102] __handle_irq_event_percpu+0x4e/0x2b0
[42826.386890] handle_irq_event_percpu+0x32/0x80
[42826.387627] handle_irq_event+0x39/0x60
[42826.388274] handle_edge_irq+0xef/0x1c0
[42826.388921] handle_irq+0x75/0x120
[42826.389475] do_IRQ+0x64/0x130
[42826.390007] common_interrupt+0xf/0xf
[42826.390617] </IRQ>
[42826.390984] RIP: 0010:native_safe_halt+0x12/0x20
[42826.391750] Code: f0 80 48 02 20 48 8b 00 a8 08 0f 84 7b ff ff ff eb bd 90 90 90 90 90 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d b2 ec 43 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
[42826.395273] RSP: 0018:ffffb3f401603750 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
[42826.396849] RAX: 0000000000000000 RBX: ffff9f80a36716a4 RCX: 0000000000000008
[42826.398308] RDX: ffff9f80ab13c000 RSI: ffffffffbb119a73 RDI: ffffffffbb07a326
[42826.399766] RBP: ffffb3f401603750 R08: 0000000000000000 R09: 000000000000005c
[42826.401269] R10: ffffb3f401603790 R11: ffff9f80a36716b8 R12: 0000000000000246
[42826.402715] R13: 0000000000000003 R14: 0000000000000100 R15: 0000000000000000
[42826.403891] ? __pv_queued_spin_lock_slowpath+0x273/0x2b0
[42826.404798] ? kvm_wait+0x86/0x90
[42826.405346] kvm_wait+0x8b/0x90
[42826.405998] __pv_queued_spin_lock_slowpath+0x273/0x2b0
[42826.407075] queued_read_lock_slowpath+0x75/0x80
[42826.408088] do_raw_read_lock+0x4b/0x50
[42826.408717] _raw_read_lock+0x58/0x70
[42826.409319] __tree_mod_log_search+0x2d/0xb0
[42826.410019] btrfs_search_old_slot+0x312/0x800
[42826.410738] ? __tree_mod_log_search+0x73/0xb0
[42826.411466] resolve_indirect_refs+0x1c5/0x910
[42826.412192] ? prelim_ref_insert+0x10a/0x320
[42826.412890] find_parent_nodes+0x443/0x1340
[42826.413597] btrfs_find_all_roots_safe+0xc5/0x140
[42826.414591] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[42826.415573] iterate_extent_inodes+0x198/0x3e0
[42826.416530] iterate_inodes_from_logical+0xa1/0xd0
[42826.417568] ? iterate_inodes_from_logical+0xa1/0xd0
[42826.418511] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[42826.419334] btrfs_ioctl_logical_to_ino+0xf3/0x1a0
[42826.420390] btrfs_ioctl+0xcf7/0x2cb0
[42826.421300] ? __lock_acquire+0x477/0x1b50
[42826.422246] ? kvm_sched_clock_read+0x18/0x30
[42826.422960] ? kvm_sched_clock_read+0x18/0x30
[42826.423731] ? sched_clock+0x9/0x10
[42826.424317] do_vfs_ioctl+0xa6/0x6e0
[42826.425014] ? btrfs_ioctl_get_supported_features+0x30/0x30
[42826.426162] ? do_vfs_ioctl+0xa6/0x6e0
[42826.426813] ? __fget+0x119/0x200
[42826.427370] ksys_ioctl+0x75/0x80
[42826.427926] __x64_sys_ioctl+0x1a/0x20
[42826.428609] do_syscall_64+0x65/0x1a0
[42826.429444] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[42826.430473] RIP: 0033:0x7f9067675777
[42826.431201] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48
[42826.434629] RSP: 002b:00007f9065d79458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[42826.435975] RAX: ffffffffffffffda RBX: 00007f9065d79780 RCX: 00007f9067675777
[42826.437165] RDX: 00007f9065d79788 RSI: 00000000c038943b RDI: 0000000000000004
[42826.438305] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9065d79960
[42826.439540] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004
[42826.440947] R13: 00007f9065d79788 R14: 00007f9065d79668 R15: 00007f9065d79890
[42826.343964] crawl_5268 R running task 0 4820 4806 0x80000008
[42826.345377] Call Trace:
[42826.345865] __schedule+0x3dc/0xb70
[42826.346500] schedule+0x3d/0x80
[42826.347055] ? wait_on_page_bit+0x193/0x270
[42826.347965] ? retint_kernel+0x10/0x10
[42826.348861] ? trace_hardirqs_on_thunk+0x1a/0x1c
[42826.349961] ? retint_kernel+0x10/0x10
[42826.350727] ? trace_hardirqs_on_thunk+0x1a/0x1c
[42826.351648] ? trace_hardirqs_on_caller+0x4a/0x100
[42826.352683] ? trace_hardirqs_on_thunk+0x1a/0x1c
[42826.353593] ? retint_kernel+0x10/0x10
[42826.354322] ? kvm_wait+0x86/0x90
[42826.354922] ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
[42826.355802] ? trace_hardirqs_on+0x4c/0x100
[42826.356489] ? kvm_wait+0x8b/0x90
[42826.357215] ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
[42826.358172] ? queued_write_lock_slowpath+0x80/0x90
[42826.359170] ? do_raw_write_lock+0xae/0xb0
[42826.359845] ? _raw_write_lock+0x55/0x70
[42826.360490] ? btrfs_get_tree_mod_seq+0x32/0xc0
[42826.361513] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[42826.362373] ? iterate_extent_inodes+0x357/0x3e0
[42826.363149] ? release_extent_buffer+0xaa/0xe0
[42826.364283] ? _raw_spin_unlock+0x27/0x40
[42826.365094] ? iterate_inodes_from_logical+0xa1/0xd0
[42826.366083] ? iterate_inodes_from_logical+0xa1/0xd0
[42826.367077] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[42826.368108] ? btrfs_ioctl_logical_to_ino+0xf3/0x1a0
[42826.369098] ? btrfs_ioctl+0xcf7/0x2cb0
[42826.369769] ? lock_acquire+0xbd/0x1c0
[42826.370430] ? lock_acquire+0xbd/0x1c0
[42826.371226] ? do_vfs_ioctl+0xa6/0x6e0
[42826.371884] ? btrfs_ioctl_get_supported_features+0x30/0x30
[42826.372899] ? do_vfs_ioctl+0xa6/0x6e0
[42826.373653] ? __fget+0x119/0x200
[42826.374490] ? ksys_ioctl+0x75/0x80
[42826.375323] ? __x64_sys_ioctl+0x1a/0x20
[42826.376203] ? do_syscall_64+0x65/0x1a0
[42826.376896] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[42826.442068] crawl_5268 R running task 0 4822 4806 0x80000008
[42826.443327] Call Trace:
[42826.443765] __schedule+0x3dc/0xb70
[42826.444410] schedule+0x3d/0x80
[42826.445077] ? wait_on_page_bit+0x193/0x270
[42826.445949] ? retint_kernel+0x10/0x10
[42826.446568] ? trace_hardirqs_on_thunk+0x1a/0x1c
[42826.447332] ? trace_hardirqs_on_caller+0x4a/0x100
[42826.448104] ? trace_hardirqs_on_thunk+0x1a/0x1c
[42826.448845] ? retint_kernel+0x10/0x10
[42826.449452] ? kvm_wait+0x86/0x90
[42826.450000] ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
[42826.450870] ? kvm_wait+0x86/0x90
[42826.451422] ? trace_hardirqs_on+0x4c/0x100
[42826.452394] ? kvm_wait+0x8b/0x90
[42826.453108] ? __pv_queued_spin_lock_slowpath+0x1ed/0x2b0
[42826.454029] ? queued_write_lock_slowpath+0x80/0x90
[42826.455032] ? do_raw_write_lock+0xae/0xb0
[42826.455721] ? _raw_write_lock+0x55/0x70
[42826.456384] ? btrfs_get_tree_mod_seq+0x32/0xc0
[42826.457362] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[42826.458465] ? iterate_extent_inodes+0x357/0x3e0
[42826.459275] ? release_extent_buffer+0xaa/0xe0
[42826.460200] ? _raw_spin_unlock+0x27/0x40
[42826.461139] ? iterate_inodes_from_logical+0xa1/0xd0
[42826.462004] ? iterate_inodes_from_logical+0xa1/0xd0
[42826.463032] ? btrfs_inode_flags_to_fsflags+0x90/0x90
[42826.463906] ? btrfs_ioctl_logical_to_ino+0xf3/0x1a0
[42826.464752] ? btrfs_ioctl+0xcf7/0x2cb0
[42826.466053] ? lock_acquire+0xbd/0x1c0
[42826.466921] ? lock_acquire+0xbd/0x1c0
[42826.467765] ? do_vfs_ioctl+0xa6/0x6e0
[42826.468482] ? btrfs_ioctl_get_supported_features+0x30/0x30
[42826.469726] ? do_vfs_ioctl+0xa6/0x6e0
[42826.470366] ? __fget+0x119/0x200
[42826.470950] ? ksys_ioctl+0x75/0x80
[42826.471558] ? __x64_sys_ioctl+0x1a/0x20
[42826.472367] ? do_syscall_64+0x65/0x1a0
[42826.473038] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[42826.119748] btrfs-balance R running task 0 4691 2 0x80000008
[42826.121155] Call Trace:
[42826.121620] tree_mod_log_insert_root.isra.2+0x117/0x350
[42826.122670] ? queued_write_lock_slowpath+0x50/0x90
[42826.123496] ? do_raw_write_lock+0xae/0xb0
[42826.124167] ? _raw_write_lock+0x55/0x70
[42826.124804] ? tree_mod_log_insert_root.isra.2+0x117/0x350
[42826.125691] ? __btrfs_cow_block+0x41e/0x570
[42826.126390] ? btrfs_cow_block+0xf8/0x230
[42826.127051] ? replace_path.isra.18+0x403/0x770
[42826.127785] ? merge_reloc_root+0x2ab/0x5a0
[42826.128469] ? btrfs_get_fs_root.part.12+0xc0/0x160
[42826.129262] ? merge_reloc_roots+0xe0/0x270
[42826.129948] ? relocate_block_group+0x184/0x600
[42826.130681] ? btrfs_relocate_block_group+0x15a/0x270
[42826.131499] ? btrfs_relocate_chunk+0x50/0x100
[42826.132222] ? btrfs_balance+0xa72/0x1330
[42826.132878] ? balance_kthread+0x25/0x50
[42826.133520] ? balance_kthread+0x3b/0x50
[42826.134166] ? kthread+0x113/0x150
[42826.134720] ? btrfs_balance+0x1330/0x1330
[42826.135389] ? kthread_park+0x90/0x90
[42826.135993] ? ret_from_fork+0x3a/0x50
This doesn't quite match the traces I previously posted. Maybe the
storm-of-soft-lockups is a symptom of multiple bugs (at least one which
happens after a bug in ctree.c and one that happens at other times)?
I'll run this a few more times and see if different cases come up.
The one thing that is common to all the storm-of-soft-lockups I've seen
so far is the involvement of multiple crawl_* threads, and those spend
~60% of their time running logical_to_ino.
> Also do you have this patch on the affected machine:
>
> 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you
> try and test with it applied ?
>
>
> <SNIP>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
next prev parent reply other threads:[~2019-05-18 4:50 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-15 21:36 storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+) Zygo Blaxell
2019-05-16 6:57 ` Nikolay Borisov
2019-05-16 16:20 ` Zygo Blaxell
2019-05-18 4:44 ` Zygo Blaxell [this message]
2019-05-18 8:27 ` Nikolay Borisov
2019-05-18 21:00 ` Zygo Blaxell
2019-05-18 21:41 ` Zygo Blaxell
2020-02-04 4:57 ` storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+, fixed in 5.4) Zygo Blaxell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190518044411.GH20359@hungrycats.org \
--to=ce3g8jdj@umail.furryterror.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=nborisov@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).