* [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
@ 2021-10-20 7:36 bugzilla-daemon
2021-10-20 7:39 ` [Bug 214767] " bugzilla-daemon
` (17 more replies)
0 siblings, 18 replies; 20+ messages in thread
From: bugzilla-daemon @ 2021-10-20 7:36 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
Bug ID: 214767
Summary: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
Product: File System
Version: 2.5
Kernel Version: 5.10.70
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: XFS
Assignee: filesystem_xfs@kernel-bugs.kernel.org
Reporter: ct@flyingcircus.io
Regression: No
We have been seeing blocked tasks fairly frequently (roughly one per day across about 500 VMs), with tracebacks always similar to the following.
First we get:
[656898.010322] INFO: task kworker/u2:1:458736 blocked for more than 122 seconds.
[656898.011988] Not tainted 5.10.70 #1-NixOS
[656898.012981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[656898.014679] task:kworker/u2:1 state:D stack: 0 pid:458736 ppid: 2 flags:0x00004000
[656898.016530] Workqueue: writeback wb_workfn (flush-253:0)
[656898.017698] Call Trace:
[656898.018310] __schedule+0x271/0x860
[656898.019208] schedule+0x46/0xb0
[656898.019944] rwsem_down_read_slowpath+0x169/0x490
[656898.021008] ? mempool_alloc+0x62/0x170
[656898.022032] xfs_map_blocks+0xb9/0x400 [xfs]
[656898.022998] iomap_do_writepage+0x163/0x850
[656898.023587] ? __mod_memcg_lruvec_state+0x21/0xe0
[656898.024044] write_cache_pages+0x186/0x3d0
[656898.024463] ? iomap_migrate_page+0xc0/0xc0
[656898.024882] ? submit_bio_noacct+0x3a9/0x420
[656898.025303] iomap_writepages+0x1c/0x40
[656898.025712] xfs_vm_writepages+0x64/0x90 [xfs]
[656898.026146] do_writepages+0x34/0xc0
[656898.026513] __writeback_single_inode+0x39/0x2a0
[656898.026969] writeback_sb_inodes+0x200/0x470
[656898.027388] __writeback_inodes_wb+0x4c/0xe0
[656898.027805] wb_writeback+0x1d8/0x290
[656898.028161] wb_workfn+0x29b/0x4d0
[656898.028526] ? __switch_to_asm+0x42/0x70
[656898.028915] ? __switch_to+0x7b/0x3e0
[656898.029285] process_one_work+0x1df/0x370
[656898.029696] worker_thread+0x50/0x400
[656898.030054] ? process_one_work+0x370/0x370
[656898.030475] kthread+0x11b/0x140
[656898.030798] ? __kthread_bind_mask+0x60/0x60
[656898.031214] ret_from_fork+0x22/0x30
[656898.031592] INFO: task nix-daemon:459204 blocked for more than 122 seconds.
[656898.032259] Not tainted 5.10.70 #1-NixOS
[656898.032695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[656898.033451] task:nix-daemon state:D stack: 0 pid:459204 ppid: 1067 flags:0x00000000
[656898.034261] Call Trace:
[656898.034538] __schedule+0x271/0x860
[656898.034881] schedule+0x46/0xb0
[656898.035226] xfs_log_commit_cil+0x6a4/0x800 [xfs]
[656898.035709] ? wake_up_q+0xa0/0xa0
[656898.036073] __xfs_trans_commit+0x9d/0x310 [xfs]
[656898.036561] xfs_setattr_nonsize+0x342/0x520 [xfs]
[656898.037027] notify_change+0x348/0x4c0
[656898.037407] ? chmod_common+0xa1/0x150
[656898.037779] chmod_common+0xa1/0x150
[656898.038139] do_fchmodat+0x5a/0xb0
[656898.038491] __x64_sys_chmod+0x17/0x20
[656898.038860] do_syscall_64+0x33/0x40
[656898.039221] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[656898.039722] RIP: 0033:0x7f3d9b3a3b77
[656898.040072] RSP: 002b:00007ffc388e2b18 EFLAGS: 00000206 ORIG_RAX: 000000000000005a
[656898.040797] RAX: ffffffffffffffda RBX: 00007ffc388e2e10 RCX: 00007f3d9b3a3b77
[656898.041542] RDX: 0000000000008049 RSI: 0000000000008124 RDI: 00000000016a2400
[656898.042222] RBP: 00007ffc388e32a0 R08: 00007ffc388e4628 R09: 0000000000000000
[656898.042913] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc388e4620
[656898.043604] R13: 0000000000000000 R14: 00007ffc388e2df0 R15: 00007ffc388e2e10
and then a while later we see this:
[657020.886633] INFO: task qemu-ga:750 blocked for more than 122 seconds.
[657020.888346] Not tainted 5.10.70 #1-NixOS
[657020.889482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.891072] task:qemu-ga state:D stack: 0 pid: 750 ppid: 1 flags:0x00000080
[657020.891906] Call Trace:
[657020.892188] __schedule+0x271/0x860
[657020.892547] ? kvm_sched_clock_read+0xd/0x20
[657020.892983] schedule+0x46/0xb0
[657020.893319] rwsem_down_write_slowpath+0x218/0x480
[657020.893802] thaw_super+0x12/0x20
[657020.894130] __x64_sys_ioctl+0x62/0xb0
[657020.894491] do_syscall_64+0x33/0x40
[657020.894860] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[657020.895396] RIP: 0033:0x7f69a55f2b17
[657020.895762] RSP: 002b:00007ffe5687b548 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[657020.896470] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f69a55f2b17
[657020.897144] RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000006
[657020.897815] RBP: 0000000000000000 R08: 00007f69a5477c08 R09: 0000000000000000
[657020.898479] R10: 0000000000000000 R11: 0000000000000246 R12: 000055ef74a15190
[657020.899155] R13: 00007ffe5687b5c0 R14: 00007ffe5687b560 R15: 0000000000000006
[657020.899850] INFO: task kworker/u2:1:458736 blocked for more than 245 seconds.
[657020.900513] Not tainted 5.10.70 #1-NixOS
[657020.900942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.901672] task:kworker/u2:1 state:D stack: 0 pid:458736 ppid: 2 flags:0x00004000
[657020.902469] Workqueue: writeback wb_workfn (flush-253:0)
[657020.902976] Call Trace:
[657020.903220] __schedule+0x271/0x860
[657020.903556] schedule+0x46/0xb0
[657020.903870] rwsem_down_read_slowpath+0x169/0x490
[657020.904321] ? mempool_alloc+0x62/0x170
[657020.904996] xfs_map_blocks+0xb9/0x400 [xfs]
[657020.905412] iomap_do_writepage+0x163/0x850
[657020.905823] ? __mod_memcg_lruvec_state+0x21/0xe0
[657020.906269] write_cache_pages+0x186/0x3d0
[657020.906664] ? iomap_migrate_page+0xc0/0xc0
[657020.907079] ? submit_bio_noacct+0x3a9/0x420
[657020.907487] iomap_writepages+0x1c/0x40
[657020.907942] xfs_vm_writepages+0x64/0x90 [xfs]
[657020.908376] do_writepages+0x34/0xc0
[657020.908760] __writeback_single_inode+0x39/0x2a0
[657020.909222] writeback_sb_inodes+0x200/0x470
[657020.909635] __writeback_inodes_wb+0x4c/0xe0
[657020.910058] wb_writeback+0x1d8/0x290
[657020.910410] wb_workfn+0x29b/0x4d0
[657020.910756] ? __switch_to_asm+0x42/0x70
[657020.911135] ? __switch_to+0x7b/0x3e0
[657020.911489] process_one_work+0x1df/0x370
[657020.911912] worker_thread+0x50/0x400
[657020.912264] ? process_one_work+0x370/0x370
[657020.912667] kthread+0x11b/0x140
[657020.913008] ? __kthread_bind_mask+0x60/0x60
[657020.913413] ret_from_fork+0x22/0x30
[657020.913791] INFO: task nix-daemon:459204 blocked for more than 245 seconds.
[657020.914440] Not tainted 5.10.70 #1-NixOS
[657020.914879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.915612] task:nix-daemon state:D stack: 0 pid:459204 ppid: 1067 flags:0x00000000
[657020.916399] Call Trace:
[657020.916654] __schedule+0x271/0x860
[657020.917006] schedule+0x46/0xb0
[657020.917343] xfs_log_commit_cil+0x6a4/0x800 [xfs]
[657020.917806] ? wake_up_q+0xa0/0xa0
[657020.918162] __xfs_trans_commit+0x9d/0x310 [xfs]
[657020.918633] xfs_setattr_nonsize+0x342/0x520 [xfs]
[657020.919101] notify_change+0x348/0x4c0
[657020.919461] ? chmod_common+0xa1/0x150
[657020.919831] chmod_common+0xa1/0x150
[657020.920180] do_fchmodat+0x5a/0xb0
[657020.920509] __x64_sys_chmod+0x17/0x20
[657020.920878] do_syscall_64+0x33/0x40
[657020.921222] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[657020.921703] RIP: 0033:0x7f3d9b3a3b77
[657020.922055] RSP: 002b:00007ffc388e2b18 EFLAGS: 00000206 ORIG_RAX: 000000000000005a
[657020.922772] RAX: ffffffffffffffda RBX: 00007ffc388e2e10 RCX: 00007f3d9b3a3b77
[657020.923435] RDX: 0000000000008049 RSI: 0000000000008124 RDI: 00000000016a2400
[657020.924115] RBP: 00007ffc388e32a0 R08: 00007ffc388e4628 R09: 0000000000000000
[657020.924787] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc388e4620
[657020.925450] R13: 0000000000000000 R14: 00007ffc388e2df0 R15: 00007ffc388e2e10
IO is subsequently stalled on ALL disks (we have vda/vdb/vdc, two of them running XFS and one running swap).
I am pretty sure this is not primarily an issue in the hypervisor (Qemu 4.1) or the storage (Ceph Jewel), because I can warm-reset the virtual machine using "system-reset" and the machine then boots properly.
This could be related to https://bugzilla.kernel.org/show_bug.cgi?id=207053, but it doesn't look quite right.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 7:39 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #1 from Christian Theune (ct@flyingcircus.io) ---
Also, note that I'm not sure this is related to the thaw at all: the stuck-thaw report comes exactly 122 seconds after the first blocked-task report, which IMHO indicates it is a subsequent error, not a cause.
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 16:16 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #2 from Christian Theune (ct@flyingcircus.io) ---
Here's another excerpt that I dug out from a different VM with the same issue.
AFAICT no thawing was happening around this time:
[848865.353541] INFO: task nix-daemon:1855245 blocked for more than 122 seconds.
[848865.355625] Not tainted 5.10.70 #1-NixOS
[848865.356912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[848865.359117] task:nix-daemon state:D stack: 0 pid:1855245 ppid: 2165 flags:0x00000000
[848865.360999] Call Trace:
[848865.361480] __schedule+0x271/0x860
[848865.362069] schedule+0x46/0xb0
[848865.362657] xfs_log_commit_cil+0x6a4/0x800 [xfs]
[848865.363436] ? wake_up_q+0xa0/0xa0
[848865.364026] __xfs_trans_commit+0x9d/0x310 [xfs]
[848865.364818] xfs_create+0x472/0x560 [xfs]
[848865.365517] xfs_generic_create+0x247/0x320 [xfs]
[848865.366310] ? xfs_lookup+0x55/0x100 [xfs]
[848865.366984] path_openat+0xdd7/0x1070
[848865.367617] do_filp_open+0x88/0x130
[848865.368199] ? getname_flags.part.0+0x29/0x1a0
[848865.368925] do_sys_openat2+0x97/0x150
[848865.369557] __x64_sys_openat+0x54/0x90
[848865.370176] do_syscall_64+0x33/0x40
[848865.370787] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[848865.371619] RIP: 0033:0x7f48b24c0ea8
[848865.372196] RSP: 002b:00007ffed91cee30 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[848865.373398] RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f48b24c0ea8
[848865.374530] RDX: 00000000000800c1 RSI: 00000000015c4750 RDI: 00000000ffffff9c
[848865.375661] RBP: 00000000015c4750 R08: 0000000000000000 R09: 0000000000000003
[848865.376790] R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffed91cf220
[848865.377920] R13: 00007ffed91ceee0 R14: 00007ffed91ceed0 R15: 00007ffed91d0c50
and I can see it isn't making any progress:
[848988.235884] INFO: task nix-daemon:1855245 blocked for more than 245 seconds.
[848988.238567] Not tainted 5.10.70 #1-NixOS
[848988.240254] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[848988.242358] task:nix-daemon state:D stack: 0 pid:1855245 ppid: 2165 flags:0x00000000
[848988.243756] Call Trace:
[848988.244183] __schedule+0x271/0x860
[848988.244780] schedule+0x46/0xb0
[848988.245366] xfs_log_commit_cil+0x6a4/0x800 [xfs]
[848988.246153] ? wake_up_q+0xa0/0xa0
[848988.246775] __xfs_trans_commit+0x9d/0x310 [xfs]
[848988.247572] xfs_create+0x472/0x560 [xfs]
[848988.248269] xfs_generic_create+0x247/0x320 [xfs]
[848988.249084] ? xfs_lookup+0x55/0x100 [xfs]
[848988.249781] path_openat+0xdd7/0x1070
[848988.250388] do_filp_open+0x88/0x130
[848988.251000] ? getname_flags.part.0+0x29/0x1a0
[848988.251746] do_sys_openat2+0x97/0x150
[848988.252366] __x64_sys_openat+0x54/0x90
[848988.253015] do_syscall_64+0x33/0x40
[848988.253610] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[848988.254445] RIP: 0033:0x7f48b24c0ea8
[848988.255071] RSP: 002b:00007ffed91cee30 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[848988.256292] RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f48b24c0ea8
[848988.257460] RDX: 00000000000800c1 RSI: 00000000015c4750 RDI: 00000000ffffff9c
[848988.258622] RBP: 00000000015c4750 R08: 0000000000000000 R09: 0000000000000003
[848988.259777] R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffed91cf220
[848988.260933] R13: 00007ffed91ceee0 R14: 00007ffed91ceed0 R15: 00007ffed91d0c50
Both stack traces are identical in every regard, so that might be useful information.
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 16:31 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #3 from Christian Theune (ct@flyingcircus.io) ---
I have another machine that managed to break free after about 20 minutes; it performed around 3 thaws in that period:
Oct 20 18:05:22 pixometerstag04 kernel: INFO: task nix-daemon:1387736 blocked for more than 1228 seconds.
Oct 20 18:05:22 pixometerstag04 kernel: Not tainted 5.10.70 #1-NixOS
Oct 20 18:05:22 pixometerstag04 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 18:05:22 pixometerstag04 kernel: task:nix-daemon state:D stack: 0 pid:1387736 ppid: 1356 flags:0x00000000
Oct 20 18:05:22 pixometerstag04 kernel: Call Trace:
Oct 20 18:05:22 pixometerstag04 kernel: __schedule+0x271/0x860
Oct 20 18:05:22 pixometerstag04 kernel: schedule+0x46/0xb0
Oct 20 18:05:22 pixometerstag04 kernel: xfs_log_commit_cil+0x6a4/0x800 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: ? wake_up_q+0xa0/0xa0
Oct 20 18:05:22 pixometerstag04 kernel: __xfs_trans_commit+0x9d/0x310 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: xfs_create+0x472/0x560 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: xfs_generic_create+0x247/0x320 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: ? xfs_lookup+0x55/0x100 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: path_openat+0xdd7/0x1070
Oct 20 18:05:22 pixometerstag04 kernel: do_filp_open+0x88/0x130
Oct 20 18:05:22 pixometerstag04 kernel: ? getname_flags.part.0+0x29/0x1a0
Oct 20 18:05:22 pixometerstag04 kernel: do_sys_openat2+0x97/0x150
Oct 20 18:05:22 pixometerstag04 kernel: __x64_sys_openat+0x54/0x90
Oct 20 18:05:22 pixometerstag04 kernel: do_syscall_64+0x33/0x40
Oct 20 18:05:22 pixometerstag04 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 20 18:05:22 pixometerstag04 kernel: RIP: 0033:0x7f3e3114dea8
Oct 20 18:05:22 pixometerstag04 kernel: RSP: 002b:00007ffd92f7df10 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
Oct 20 18:05:22 pixometerstag04 kernel: RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f3e3114dea8
Oct 20 18:05:22 pixometerstag04 kernel: RDX: 00000000000800c1 RSI: 000000000195cb50 RDI: 00000000ffffff9c
Oct 20 18:05:22 pixometerstag04 kernel: RBP: 000000000195cb50 R08: 0000000000000000 R09: 0000000000000003
Oct 20 18:05:22 pixometerstag04 kernel: R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffd92f7e300
Oct 20 18:05:22 pixometerstag04 kernel: R13: 00007ffd92f7dfc0 R14: 00007ffd92f7dfb0 R15: 00007ffd92f7fd30
However, the first thaw came between the machine's 122 s and 245 s hang reports, and the last thaw came after it broke free.
Interestingly, it also logged a variety of things that worked while those reports were being produced. Maybe the hang only affected the /tmp filesystem and not /: apparently PostgreSQL was still writing to disk in that period, so I'd guess this only happened on /tmp.
This is square in the middle of the reported blocks:
Oct 20 17:52:00 pixometerstag04 postgres[1008]: user=,db= LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.004 s, sync=0.001 s, total=0.018 s; sync fi>
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 18:29 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #4 from Christian Theune (ct@flyingcircus.io) ---
Another thing I noticed: it always hangs the nix-daemon (nixos.org), which does some dance around managing an overlay read-only mount that gets re-mounted read-write for short periods while it changes things in there. Here is what it looks like normally:
/dev/vda1 on / type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/vda1 on /nix/store type xfs (ro,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
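[Editorial note: the ro/rw dance presumably boils down to remount operations like the following sketch; the paths are taken from the mount output above, and the actual nix-daemon logic is more involved.]

```shell
# Toggle the second mount of /dev/vda1 at /nix/store between
# read-only and read-write (illustrative wrapper, not the real daemon code).
remount_store() { mount -o "remount,$1" /nix/store; }
# Typical cycle:
#   remount_store rw   # open the store for modification
#   ...change store contents...
#   remount_store ro   # seal it again
```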
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 18:37 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #5 from Christian Theune (ct@flyingcircus.io) ---
I've also raised this with the NixOS folks, in case they can shed some light on the dance the nix-daemon performs here:
https://github.com/NixOS/nixpkgs/issues/142394
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 19:22 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
Eric Sandeen (sandeen@sandeen.net) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sandeen@sandeen.net
--- Comment #6 from Eric Sandeen (sandeen@sandeen.net) ---
You might want to do an "echo w > /proc/sysrq-trigger" when this happens, to get all blocked tasks.
The thaw seems a little suspicious; what all is going on there with the freezing and thawing?
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 19:26 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #7 from Christian Theune (ct@flyingcircus.io) ---
Thanks, I'll try to do something like that, likely by using a qemu monitor command to send in the trigger via the keyboard the next time I catch this happening.
The freezing/thawing is part of our image-backup infrastructure. Over the years we established the practice of regularly calling 'thaw' gratuitously, because thawing can once in a blue moon leave VMs stuck indefinitely if an initial thaw doesn't go through. This has been reliable for at least 2-3 years now, and I'm not sure whether it will turn out to be a red herring.
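[Editorial note: a sketch of such a backup cycle with a retried and gratuitous extra thaw, as described; the command names, retry count, and snapshot step are illustrative assumptions, not the reporter's actual tooling.]

```shell
# Freeze, snapshot, then thaw with retries plus one gratuitous extra thaw.
thaw_with_retry() {
    for _ in 1 2 3; do
        fsfreeze -u "$1" && return 0   # a failed first thaw would leave the fs frozen
        sleep 1
    done
    return 1
}

backup_cycle() {
    fsfreeze -f "$1" || return 1
    snapshot "$1"                # hypothetical hypervisor-side snapshot step
    thaw_with_retry "$1"
    thaw_with_retry "$1" || :    # gratuitous thaw; an error here is expected
}
```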
* Re: [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: Dave Chinner @ 2021-10-20 22:16 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Wed, Oct 20, 2021 at 07:36:07AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> [656898.034261] Call Trace:
> [656898.034538] __schedule+0x271/0x860
> [656898.034881] schedule+0x46/0xb0
> [656898.035226] xfs_log_commit_cil+0x6a4/0x800 [xfs]
This looks like it is stuck on the hard throttle. Which means this
is the likely fix:
19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-20 22:38 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #8 from Dave Chinner (david@fromorbit.com) ---
On Wed, Oct 20, 2021 at 07:36:07AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> [656898.034261] Call Trace:
> [656898.034538] __schedule+0x271/0x860
> [656898.034881] schedule+0x46/0xb0
> [656898.035226] xfs_log_commit_cil+0x6a4/0x800 [xfs]
This looks like it is stuck on the hard throttle. Which means this
is the likely fix:
19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards
Cheers,
Dave.
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-21 4:21 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #9 from Christian Theune (ct@flyingcircus.io) ---
Thanks Dave!
From what I can tell this fix was merged into 5.14 with a largish batch of other changes (via 9f7b640f001f). I checked that the fix you mentioned applies cleanly to the current 5.10 codebase (it does), but I'm not sure whether we missed further backports for 5.10.
I wonder whether "we" (I don't think I'm expert enough to do this reliably) need to check for other fixes that didn't make it back to 5.10. Do you have a recommendation how, or whom, to approach?
As I don't do this often, I would contact GKH and ask for a backport of the patch you mentioned (assuming you consider it semantically appropriate for 5.10), but maybe you have better advice.
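[Editorial note: the "did this land in a given stable release" question can be answered with a quick check against a linux-stable clone; a sketch, where the commit id is the one Dave mentioned and the tag names are examples:]

```shell
# Exit 0 if commit $1 is reachable from (i.e. contained in) ref $2.
is_in_release() { git merge-base --is-ancestor "$1" "$2"; }
# e.g., inside a linux-stable checkout:
#   is_in_release 19f4e7cc8197 v5.14    && echo "in 5.14"
#   is_in_release 19f4e7cc8197 v5.10.70 || echo "not in 5.10.70"
```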
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-21 13:17 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #10 from Christian Theune (ct@flyingcircus.io) ---
Alright,
I managed to catch a machine that logged blocked tasks for only the /tmp partition. I logged in, ran a 'sync', and issued the 'w' sysrq. I'm attaching the log here: at first you see a number of blocked tasks, then a while later I log in and run the sync, and you can see the sync stuck as well as a large number of workers.
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-21 13:19 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #11 from Christian Theune (ct@flyingcircus.io) ---
Going through the log I can see that, aside from 'sync' and the workers, there is a tar process that has been stuck since 13:20 and is not making any progress. I also cannot see any IO on the block device.
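[Editorial note: the "no IO on the block device" observation can be double-checked from /proc/diskstats, where field 10 of each line is cumulative sectors written; if it stops increasing between samples, the device is truly idle. A small sketch; the device name is an example:]

```shell
# Print cumulative sectors written for one device, reading
# /proc/diskstats-format input on stdin (field 3 = device name,
# field 10 = sectors written, per Documentation/admin-guide/iostats.rst).
sectors_written() { awk -v dev="$1" '$3 == dev { print $10 }'; }
# Usage: sample twice a few seconds apart and compare:
#   sectors_written vdc < /proc/diskstats
```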
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-21 13:20 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #12 from Christian Theune (ct@flyingcircus.io) ---
Created attachment 299295
--> https://bugzilla.kernel.org/attachment.cgi?id=299295&action=edit
log of a vm with only vdc1 (/tmp) stuck
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
From: bugzilla-daemon @ 2021-10-29 2:21 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
Pedram Fard (pedram.fard@appian.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pedram.fard@appian.com
--- Comment #13 from Pedram Fard (pedram.fard@appian.com) ---
We have run into a similar issue. We use xfs_freeze to take backup snapshots of our filesystems, and occasionally the xfs_freeze/thaw processes turn into zombies in the process.
We used "echo w > /proc/sysrq-trigger" to get the stack traces:
[736826.666940] task:kswapd0 state:D stack: 0 pid: 576 ppid: 2 flags:0x00004000
[736826.674439] Call Trace:
[736826.677686] __schedule+0x1f9/0x660
[736826.681593] schedule+0x46/0xb0
[736826.685141] percpu_rwsem_wait+0xa9/0x180
[736826.688867] ? percpu_free_rwsem+0x30/0x30
[736826.692764] __percpu_down_read+0x49/0x60
[736826.696549] xfs_trans_alloc+0x15d/0x170
[736826.700367] xfs_free_eofblocks+0x130/0x1e0
[736826.807546] xfs_fs_destroy_inode+0xa8/0x1b0
[736826.811610] destroy_inode+0x3b/0x70
[736826.815150] dispose_list+0x48/0x60
[736826.818796] prune_icache_sb+0x54/0x80
[736826.822476] super_cache_scan+0x161/0x1e0
[736826.826341] do_shrink_slab+0x145/0x240
[736826.830294] shrink_slab_memcg+0xcd/0x1e0
[736826.834371] shrink_node_memcgs+0x186/0x1c0
[736826.838491] shrink_node+0x14f/0x570
[736826.842088] balance_pgdat+0x232/0x510
[736826.845820] kswapd+0xe2/0x170
[736826.849154] ? balance_pgdat+0x510/0x510
[736826.852944] kthread+0x11b/0x140
[736826.856326] ? __kthread_bind_mask+0x60/0x60
[736826.860313] ret_from_fork+0x22/0x30
[736826.863839] task:systemd-logind state:D stack: 0 pid: 2073 ppid: 1 flags:0x00000084
[736826.871453] Call Trace:
[736826.874898] __schedule+0x1f9/0x660
[736826.878898] ? kvm_sched_clock_read+0xd/0x20
[736826.883265] schedule+0x46/0xb0
[736826.887003] rwsem_down_write_slowpath+0x234/0x4b0
[736826.891114] ? __kmalloc+0x154/0x2b0
[736826.894794] prealloc_shrinker+0x4c/0xf0
[736826.898526] alloc_super+0x29a/0x2f0
[736826.902210] ? set_anon_super+0x40/0x40
[736826.905867] sget_fc+0x6c/0x240
[736826.909206] ? shmem_create+0x20/0x20
[736826.912912] get_tree_nodev+0x23/0xb0
[736826.916533] ? bpf_lsm_capset+0x10/0x10
[736826.920310] vfs_get_tree+0x25/0xb0
[736826.923848] do_new_mount+0x152/0x1b0
[736826.927548] __x64_sys_mount+0x103/0x140
[736826.931531] do_syscall_64+0x33/0x40
[736826.935597] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736826.940323] RIP: 0033:0x7f49f8efcdba
[736826.944298] RSP: 002b:00007ffdb98f2658 EFLAGS: 00000202 ORIG_RAX:
00000000000000a5
[736826.952238] RAX: ffffffffffffffda RBX: 000055f21a7113a0 RCX:
00007f49f8efcdba
[736826.959940] RDX: 000055f218672af0 RSI: 000055f21a721de0 RDI:
000055f218672af0
[736826.967721] RBP: 00007ffdb98f2690 R08: 000055f21a7a51e0 R09:
0000000000000000
[736826.975503] R10: 0000000000000006 R11: 0000000000000202 R12:
0000000000000000
[736826.983253] R13: 00007ffdb98f2740 R14: 00007ffdb98f2820 R15:
0000000000000000
[736826.990979] task:dockerd state:D stack: 0 pid: 6639 ppid: 1
flags:0x00000080
[736826.999289] Call Trace:
[736827.002695] __schedule+0x1f9/0x660
[736827.006708] schedule+0x46/0xb0
[736827.010512] rwsem_down_write_slowpath+0x234/0x4b0
[736827.015214] ? __kmalloc+0x154/0x2b0
[736827.019291] prealloc_shrinker+0x4c/0xf0
[736827.023495] alloc_super+0x29a/0x2f0
[736827.027505] sget+0x91/0x220
[736827.031089] ? destroy_super_rcu+0x40/0x40
[736827.035268] ? ovl_get_lowerstack+0x190/0x190 [overlay]
[736827.040172] mount_nodev+0x26/0x90
[736827.044070] legacy_get_tree+0x27/0x40
[736827.048169] vfs_get_tree+0x25/0xb0
[736827.052088] do_new_mount+0x152/0x1b0
[736827.056076] __x64_sys_mount+0x103/0x140
[736827.060176] do_syscall_64+0x33/0x40
[736827.064158] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736827.068950] RIP: 0033:0x55c5416cc2aa
[736827.072915] RSP: 002b:000000c0015b2768 EFLAGS: 00000216 ORIG_RAX:
00000000000000a5
[736827.080676] RAX: ffffffffffffffda RBX: 000000c00005e800 RCX:
000055c5416cc2aa
[736827.088375] RDX: 000000c000df5320 RSI: 000000c0012db140 RDI:
000000c000df52e8
[736827.095311] RBP: 000000c0015b2800 R08: 000000c0015a0600 R09:
0000000000000000
[736827.102300] R10: 0000000000000000 R11: 0000000000000216 R12:
0000000000000133
[736827.109100] R13: 0000000000000132 R14: 0000000000000200 R15:
0000000000000055
[737030.525416] task:sh state:D stack: 0 pid:27526 ppid: 27517
flags:0x00000080
[737030.532905] Call Trace:
[737030.536188] __schedule+0x1f9/0x660
[737030.540060] schedule+0x46/0xb0
[737030.543385] percpu_rwsem_wait+0xa9/0x180
[737030.547220] ? percpu_free_rwsem+0x30/0x30
[737030.551093] __percpu_down_read+0x49/0x60
[737030.555220] mnt_want_write+0x66/0x90
[737030.558966] open_last_lookups+0x30d/0x3f0
[737030.562914] ? path_init+0x2bd/0x3e0
[737030.681735] path_openat+0x88/0x1d0
[737030.685600] ? xfs_iunlock+0x94/0xe0
[737030.689575] do_filp_open+0x88/0x130
[737030.693496] ? __check_object_size.part.0+0x11f/0x140
[737030.698201] do_sys_openat2+0x97/0x150
[737030.702232] __x64_sys_openat+0x54/0x90
[737030.706301] do_syscall_64+0x33/0x40
[737030.710248] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[737030.714945] RIP: 0033:0x7f4502c0d14e
[737030.718849] RSP: 002b:00007ffd066be7a0 EFLAGS: 00000246 ORIG_RAX:
0000000000000101
[737030.726628] RAX: ffffffffffffffda RBX: 00000000009ba360 RCX:
00007f4502c0d14e
[737030.734177] RDX: 0000000000000241 RSI: 00000000009bb8c0 RDI:
ffffffffffffff9c
[737030.741759] RBP: 00007ffd066be8a0 R08: 0000000000000020 R09:
00000000009bb8c0
[737030.749452] R10: 00000000000001b6 R11: 0000000000000246 R12:
0000000000000000
[737030.757142] R13: 0000000000000001 R14: 0000000000000001 R15:
00000000009bad60
[737030.764836] task:bash state:D stack: 0 pid:27580 ppid: 1
flags:0x00000084
[737030.773068] Call Trace:
[737030.776437] __schedule+0x1f9/0x660
[737030.780354] schedule+0x46/0xb0
[737030.784080] percpu_rwsem_wait+0xa9/0x180
[737030.788264] ? percpu_free_rwsem+0x30/0x30
[737030.792476] __percpu_down_read+0x49/0x60
[737030.796642] mnt_want_write+0x66/0x90
[737030.800660] open_last_lookups+0x30d/0x3f0
[737030.804923] ? path_init+0x2bd/0x3e0
[737030.808895] path_openat+0x88/0x1d0
[737030.812800] ? xfs_iunlock+0x94/0xe0
[737030.816745] do_filp_open+0x88/0x130
[737030.820694] ? getname_flags.part.0+0x29/0x1a0
[737030.825097] ? __check_object_size.part.0+0x11f/0x140
[737030.829846] ? audit_alloc_name+0x8c/0xe0
[737030.834010] do_sys_openat2+0x97/0x150
[737030.838056] __x64_sys_openat+0x54/0x90
[737030.842164] do_syscall_64+0x33/0x40
[737030.846146] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[737030.850855] RIP: 0033:0x7f2072a9414e
[737030.854822] RSP: 002b:00007ffd855832e0 EFLAGS: 00000246 ORIG_RAX:
0000000000000101
[737030.862655] RAX: ffffffffffffffda RBX: 000000000178aab0 RCX:
00007f2072a9414e
[737030.870259] RDX: 0000000000000241 RSI: 000000000178af20 RDI:
ffffffffffffff9c
[737030.877820] RBP: 00007ffd855833e0 R08: 0000000000000020 R09:
000000000178af20
[737030.885463] R10: 00000000000001b6 R11: 0000000000000246 R12:
0000000000000000
[737030.893083] R13: 0000000000000003 R14: 0000000000000001 R15:
000000000178aa60
[736992.593491] task:k state:D stack: 0 pid:11166 ppid: 29014
flags:0x00000184
[736992.600911] Call Trace:
[736992.604234] __schedule+0x1f9/0x660
[736992.607842] schedule+0x46/0xb0
[736992.611295] percpu_rwsem_wait+0xa9/0x180
[736992.614999] ? percpu_free_rwsem+0x30/0x30
[736992.618901] __percpu_down_read+0x49/0x60
[736992.622614] mnt_want_write+0x66/0x90
[736992.626323] open_last_lookups+0x30d/0x3f0
[736992.630064] ? path_init+0x2bd/0x3e0
[736992.633795] path_openat+0x88/0x1d0
[736992.637360] do_filp_open+0x88/0x130
[736992.640911] ? getname_flags.part.0+0x29/0x1a0
[736992.645006] ? __check_object_size.part.0+0x11f/0x140
[736992.649268] ? audit_alloc_name+0x8c/0xe0
[736992.653173] do_sys_openat2+0x97/0x150
[736992.656832] __x64_sys_openat+0x54/0x90
[736992.660619] do_syscall_64+0x33/0x40
[736992.664181] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736992.668543] RIP: 0033:0x7f81dc2d11ae
[736992.672208] RSP: 002b:00007ffcff615c00 EFLAGS: 00000246 ORIG_RAX:
0000000000000101
[736992.679509] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007f81dc2d11ae
[736992.686481] RDX: 0000000000000042 RSI: 00007f81d757e6b8 RDI:
00000000ffffff9c
[736992.693723] RBP: 00007f81d7ba14c4 R08: 000000000000003a R09:
0000000000000000
[736992.700684] R10: 00000000000001b6 R11: 0000000000000246 R12:
00007f81d757e6b8
[736992.708136] R13: 00007f81d7ba14c4 R14: 00007f81dbfb45b8 R15:
00007f81d7f3f8fc
[736992.715824] task:k state:D stack: 0 pid:11273 ppid: 29014
flags:0x00000184
[736992.724256] Call Trace:
[736992.727698] __schedule+0x1f9/0x660
[736992.731616] schedule+0x46/0xb0
[736992.735367] percpu_rwsem_wait+0xa9/0x180
[736992.739512] ? percpu_free_rwsem+0x30/0x30
[736992.743681] __percpu_down_read+0x49/0x60
[736992.747837] mnt_want_write+0x66/0x90
[736992.751896] open_last_lookups+0x30d/0x3f0
[736992.756191] ? path_init+0x2bd/0x3e0
[736992.760175] path_openat+0x88/0x1d0
[736992.764055] do_filp_open+0x88/0x130
[736992.768115] ? getname_flags.part.0+0x29/0x1a0
[736992.772563] ? __check_object_size.part.0+0x11f/0x140
[736992.777391] ? audit_alloc_name+0x8c/0xe0
[736992.781658] do_sys_openat2+0x97/0x150
[736992.785794] __x64_sys_openat+0x54/0x90
[736992.789856] do_syscall_64+0x33/0x40
[736992.793913] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736992.798746] RIP: 0033:0x7fb7d337f1ae
[736992.802878] RSP: 002b:00007ffd9b2dbda0 EFLAGS: 00000246 ORIG_RAX:
0000000000000101
[736992.810824] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007fb7d337f1ae
[736992.818496] RDX: 0000000000000042 RSI: 00007fb7d2e2aa88 RDI:
00000000ffffff9c
[736992.826359] RBP: 00007fb7cecdc7dc R08: 0000000000000008 R09:
0000000000000000
[736992.833753] R10: 00000000000001b6 R11: 0000000000000246 R12:
00007fb7d2e2aa88
[736992.840727] R13: 00007fb7d2e2aa7c R14: 00007fb7d3150920 R15:
0000000000000004
[736992.847580] task:tail state:D stack: 0 pid:11608 ppid: 9603
flags:0x00004186
[736992.855146] Call Trace:
[736992.858238] __schedule+0x1f9/0x660
[736992.861943] schedule+0x46/0xb0
[736992.865387] percpu_rwsem_wait+0xa9/0x180
[736992.869138] ? percpu_free_rwsem+0x30/0x30
[736992.873134] __percpu_down_read+0x49/0x60
[736992.876948] xfs_trans_alloc+0x15d/0x170
[736992.880854] xfs_free_eofblocks+0x130/0x1e0
[736992.885085] xfs_release+0x13d/0x160
[736992.888807] __fput+0x96/0x240
[736992.892138] task_work_run+0x5f/0x90
[736992.895834] do_exit+0x22c/0x3b0
[736992.899189] ? timerqueue_del+0x1e/0x40
[736992.902967] do_group_exit+0x33/0xa0
[736992.906469] get_signal+0x15d/0x5b0
[736993.013638] arch_do_signal+0x25/0xf0
[736993.017223] exit_to_user_mode_loop+0x8d/0xc0
[736993.021148] exit_to_user_mode_prepare+0x6a/0x70
[736993.025371] syscall_exit_to_user_mode+0x22/0x140
[736993.029417] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736993.033852] RIP: 0033:0x7f907725e6f4
[736993.037391] RSP: 002b:00007ffcf7722f38 EFLAGS: 00000246 ORIG_RAX:
0000000000000023
[736993.044403] RAX: fffffffffffffdfc RBX: 0000000000000001 RCX:
00007f907725e6f4
[736993.051384] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
00007ffcf7722f40
[736993.058190] RBP: 0000000000000000 R08: 00007ffcf7723070 R09:
00007f9077312e80
[736993.065109] R10: 0000000000000000 R11: 0000000000000246 R12:
00007ffcf7722f40
[736993.071934] R13: 00007ffcf77256bf R14: 0000560396137440 R15:
0000000000000000
[736996.659414] task:process-exporte state:D stack: 0 pid:30396 ppid: 2204
flags:0x00004082
[736996.666720] Call Trace:
[736996.669864] __schedule+0x1f9/0x660
[736996.673371] schedule+0x46/0xb0
[736996.676730] rwsem_down_write_slowpath+0x234/0x4b0
[736996.680991] unregister_memcg_shrinker.isra.0+0x18/0x40
[736996.685377] unregister_shrinker+0x7b/0x80
[736996.689295] deactivate_locked_super+0x29/0xa0
[736996.693241] cleanup_mnt+0x12d/0x190
[736996.696912] task_work_run+0x5f/0x90
[736996.700444] do_exit+0x22c/0x3b0
[736996.703996] do_group_exit+0x33/0xa0
[736996.707581] get_signal+0x15d/0x5b0
[736996.711191] arch_do_signal+0x25/0xf0
[736996.714771] exit_to_user_mode_loop+0x8d/0xc0
[736996.718759] exit_to_user_mode_prepare+0x6a/0x70
[736996.722740] syscall_exit_to_user_mode+0x22/0x140
[736996.726869] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736996.835174] RIP: 0033:0x4a2c50
[736996.838687] RSP: 002b:000000c000213798 EFLAGS: 00000216 ORIG_RAX:
0000000000000000
[736996.845651] RAX: 0000000000000200 RBX: 000000c000032500 RCX:
00000000004a2c50
[736996.852636] RDX: 0000000000000200 RSI: 000000c0008ef600 RDI:
0000000000000005
[736996.859446] RBP: 000000c0002137e8 R08: 0000000000000000 R09:
0000000000000000
[736996.866347] R10: 0000000000000000 R11: 0000000000000216 R12:
ffffffffffffffff
[736996.873124] R13: 000000000000000c R14: 000000000000000b R15:
0000000000000010
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
2021-10-20 7:36 [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw bugzilla-daemon
` (13 preceding siblings ...)
2021-10-29 2:21 ` bugzilla-daemon
@ 2021-10-29 2:22 ` bugzilla-daemon
2021-11-09 5:19 ` bugzilla-daemon
` (2 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: bugzilla-daemon @ 2021-10-29 2:22 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #14 from Pedram Fard (pedram.fard@appian.com) ---
Here is the result of echo t > /proc/sysrq-trigger:
1 I root 578 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfsalloc]
1 I root 579 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs_mru_cache]
1 I root 1251 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-buf/nvme0n1]
1 I root 1252 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-conv/nvme0n]
1 I root 1253 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-cil/nvme0n1]
1 I root 1254 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-reclaim/nvm]
1 I root 1255 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-eofblocks/n]
1 I root 1256 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-log/nvme0n1]
1 S root 1257 2 0 80 0 - 0 - Oct20 ? 00:02:06
[xfsaild/nvme0n1]
1 I root 1990 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-buf/dm-0]
1 I root 1991 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-conv/dm-0]
1 I root 1992 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-cil/dm-0]
1 I root 1993 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-reclaim/dm-]
1 I root 1994 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-eofblocks/d]
1 I root 1995 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-log/dm-0]
1 S root 1996 2 0 80 0 - 0 - Oct20 ? 00:00:00
[xfsaild/dm-0]
1 I root 1999 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-buf/dm-1]
1 I root 2000 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-conv/dm-1]
1 I root 2001 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-cil/dm-1]
1 I root 2002 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-reclaim/dm-]
1 I root 2003 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-eofblocks/d]
1 I root 2004 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-log/dm-1]
1 S root 2005 2 0 80 0 - 0 - Oct20 ? 00:01:53
[xfsaild/dm-1]
1 I root 2008 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-buf/dm-2]
1 I root 2009 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-conv/dm-2]
1 I root 2010 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-cil/dm-2]
1 I root 2011 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-reclaim/dm-]
1 I root 2012 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-eofblocks/d]
1 I root 2013 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-log/dm-2]
1 S root 2014 2 0 80 0 - 0 - Oct20 ? 00:00:00
[xfsaild/dm-2]
0 S root 2226 1942 0 80 0 - 30496 - 17:41 ? 00:00:00
/bin/sh -f /usr/sbin/xfs_freeze -u /usr/local/appian
4 D root 2230 2226 0 80 0 - 29825 - 17:41 ? 00:00:00
/usr/sbin/xfs_io -F -r -p xfs_freeze -x -c thaw /usr/local/appian
1 I root 4068 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-buf/dm-3]
1 I root 4069 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-conv/dm-3]
1 I root 4070 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-cil/dm-3]
1 I root 4071 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-reclaim/dm-]
1 I root 4072 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-eofblocks/d]
1 I root 4073 2 0 60 -20 - 0 - Oct20 ? 00:00:00
[xfs-log/dm-3]
1 S root 4074 2 0 80 0 - 0 - Oct20 ? 00:02:01
[xfsaild/dm-3]
0 S root 10903 13809 0 80 0 - 29855 - 17:57 pts/0 00:00:00 grep
--color=auto xfs
1 I root 11221 2 0 80 0 - 0 - 17:44 ? 00:00:00
[kworker/1:4-xfs]
4 D root 22209 1 0 80 0 - 29825 - 16:08 ? 00:00:00
/usr/sbin/xfs_io -F -r -p xfs_freeze -x -c thaw /usr/local/appian
1 I root 23187 2 0 80 0 - 0 - 17:37 ? 00:00:00
[kworker/1:3-xfs]
1 I root 30083 2 0 80 0 - 0 - 17:29 ? 00:00:00
[kworker/1:1-xfs]
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
2021-10-20 7:36 [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw bugzilla-daemon
` (14 preceding siblings ...)
2021-10-29 2:22 ` bugzilla-daemon
@ 2021-11-09 5:19 ` bugzilla-daemon
2021-11-10 15:16 ` bugzilla-daemon
2021-11-20 22:31 ` bugzilla-daemon
17 siblings, 0 replies; 20+ messages in thread
From: bugzilla-daemon @ 2021-11-09 5:19 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #15 from Christian Theune (ct@flyingcircus.io) ---
@Pedram: I'm not sure those are related. I'm 50/50 on whether my issue is
freeze-related, and all my stack traces have 'xfs_log_commit_cil' in them, which
I don't see in yours; it could be a separate issue.
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
2021-10-20 7:36 [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw bugzilla-daemon
` (15 preceding siblings ...)
2021-11-09 5:19 ` bugzilla-daemon
@ 2021-11-10 15:16 ` bugzilla-daemon
2021-11-20 22:31 ` Dave Chinner
2021-11-20 22:31 ` bugzilla-daemon
17 siblings, 1 reply; 20+ messages in thread
From: bugzilla-daemon @ 2021-11-10 15:16 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #16 from Christian Theune (ct@flyingcircus.io) ---
I started trying out the fix from Dave and am using it with 5.10.76 (it applied
cleanly with a bit of fuzz).
@Dave: do you happen to know whether there's a helper that can stress test live
systems in this regard?
* Re: [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
2021-11-10 15:16 ` bugzilla-daemon
@ 2021-11-20 22:31 ` Dave Chinner
0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2021-11-20 22:31 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Wed, Nov 10, 2021 at 03:16:35PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214767
>
> --- Comment #16 from Christian Theune (ct@flyingcircus.io) ---
> I started trying out the fix that Dave and am using it with 5.10.76 (applied
> clean with a bit of fuzzing).
>
> @Dave do you happen do know whether there's a helper that can stress test live
> systems in this regard?
fstests has some tests in the "freeze" group that stress
freeze/thaw. And it has lots of other tests in it that will tell you
if there's a regression in your backport, so might be an idea to run
a full "auto" group pass rather than just the freeze group....
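[Editor's note: a minimal fstests setup along those lines could look like the following; the device paths and mount points are example placeholders, and the real configuration knobs are documented in the fstests README.]

```shell
# local.config for an fstests checkout -- all values are placeholders
export FSTYP=xfs
export TEST_DEV=/dev/vdb1        # a dedicated XFS test device
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdb2     # device fstests may mkfs at will
export SCRATCH_MNT=/mnt/scratch

# then, from the fstests checkout:
#   ./check -g freeze   # just the freeze/thaw stress tests
#   ./check -g auto     # full regression pass over the backport
```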
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
2021-10-20 7:36 [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw bugzilla-daemon
` (16 preceding siblings ...)
2021-11-10 15:16 ` bugzilla-daemon
@ 2021-11-20 22:31 ` bugzilla-daemon
17 siblings, 0 replies; 20+ messages in thread
From: bugzilla-daemon @ 2021-11-20 22:31 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=214767
--- Comment #17 from Dave Chinner (david@fromorbit.com) ---
On Wed, Nov 10, 2021 at 03:16:35PM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214767
>
> --- Comment #16 from Christian Theune (ct@flyingcircus.io) ---
> I started trying out the fix that Dave and am using it with 5.10.76 (applied
> clean with a bit of fuzzing).
>
> @Dave do you happen do know whether there's a helper that can stress test
> live
> systems in this regard?
fstests has some tests in the "freeze" group that stress
freeze/thaw. And it has lots of other tests in it that will tell you
if there's a regression in your backport, so might be an idea to run
a full "auto" group pass rather than just the freeze group....
Cheers,
Dave.
end of thread, other threads:[~2021-11-20 22:31 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-20 7:36 [Bug 214767] New: xfs seems to hang due to race condition? maybe related to (gratuitous) thaw bugzilla-daemon
2021-10-20 7:39 ` [Bug 214767] " bugzilla-daemon
2021-10-20 16:16 ` bugzilla-daemon
2021-10-20 16:31 ` bugzilla-daemon
2021-10-20 18:29 ` bugzilla-daemon
2021-10-20 18:37 ` bugzilla-daemon
2021-10-20 19:22 ` bugzilla-daemon
2021-10-20 19:26 ` bugzilla-daemon
2021-10-20 22:16 ` [Bug 214767] New: " Dave Chinner
2021-10-20 22:38 ` [Bug 214767] " bugzilla-daemon
2021-10-21 4:21 ` bugzilla-daemon
2021-10-21 13:17 ` bugzilla-daemon
2021-10-21 13:19 ` bugzilla-daemon
2021-10-21 13:20 ` bugzilla-daemon
2021-10-29 2:21 ` bugzilla-daemon
2021-10-29 2:22 ` bugzilla-daemon
2021-11-09 5:19 ` bugzilla-daemon
2021-11-10 15:16 ` bugzilla-daemon
2021-11-20 22:31 ` Dave Chinner
2021-11-20 22:31 ` bugzilla-daemon