[Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes

linux-xfs.vger.kernel.org archive mirror

* [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes
@ 2019-12-11 14:03 bugzilla-daemon
  2019-12-12 18:01 ` Brian Foster
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: bugzilla-daemon @ 2019-12-11 14:03 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=205833

            Bug ID: 205833
           Summary: fsfreeze blocks close(fd) on xfs sometimes
           Product: File System
           Version: 2.5
    Kernel Version: 4.15.0-55-generic #60-Ubuntu
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: XFS
          Assignee: filesystem_xfs@kernel-bugs.kernel.org
          Reporter: kernel.org@estada.ch
        Regression: No

Dear all

I noticed the bug while setting up a backup with fsfreeze and restic.

How I reproduce it:

    1. Start writing a large file (e.g. 100 MB) and, after one or two MB have been
written, freeze the filesystem from the sidecar pod
    2. From the sidecar pod, issue multiple `strace tail /generated/data/0.txt`
    3. After a couple of tries strace shows that the `read(...)` works but
`close(...)` hangs
    4. From now on all `read(...)` operations are blocked until the freeze is
lifted
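
A minimal sketch of how steps 2 and 3 could be timed without strace, assuming
Python 3 is available in the sidecar pod. The helper name
`timed_read_and_close` and the 64 KiB default read size are illustrative and
not part of the original report; the freeze itself (`fsfreeze --freeze
<mountpoint>`, run as root) still has to be issued separately.

```python
import os
import time


def timed_read_and_close(path, nbytes=65536):
    """Read the tail of `path` through a read-only fd and time read() and close().

    Returns (bytes_read, read_seconds, close_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Seek to the last `nbytes` of the file, roughly what `tail` does.
        os.lseek(fd, max(0, size - nbytes), os.SEEK_SET)
        t0 = time.monotonic()
        data = os.read(fd, nbytes)
        read_s = time.monotonic() - t0
    except BaseException:
        os.close(fd)
        raise
    t1 = time.monotonic()
    # On a frozen XFS this is the call reported to hang until the thaw.
    os.close(fd)
    close_s = time.monotonic() - t1
    return len(data), read_s, close_s
```

Calling `timed_read_and_close("/generated/data/0.txt")` repeatedly while the
filesystem is frozen should, if the behaviour matches the report, eventually
return only after the freeze is lifted, with a large `close_seconds` value.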

System: Ubuntu 18.04.3 LTS
CPU: Intel(R) Xeon(R) CPU X5650  @ 2.67GHz
Storage: /dev/mapper/mpathXX on /var/lib/kubelet/plugins/hpe.com/... type xfs
(rw,noatime,attr2,inode64,noquota)

I used the tool linked below to generate the file. The number of concurrent
files does not appear to matter much: I was able to trigger the bug with 2, 4
and 32 parallel files:
https://gitlab.com/dns2utf8/multi_file_writer

Cheers,
Stefan

PS: I opened a bug at the tool vendor too:
https://github.com/vmware-tanzu/velero/issues/2113

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* Re: [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes
  2019-12-11 14:03 [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes bugzilla-daemon
@ 2019-12-12 18:01 ` Brian Foster
  2019-12-12 18:01 ` [Bug 205833] " bugzilla-daemon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2019-12-12 18:01 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-xfs

On Wed, Dec 11, 2019 at 02:03:52PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=205833
> 
>             Bug ID: 205833
>            Summary: fsfreeze blocks close(fd) on xfs sometimes
>            Product: File System
>            Version: 2.5
>     Kernel Version: 4.15.0-55-generic #60-Ubuntu
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: kernel.org@estada.ch
>         Regression: No
> 
> Dear all
> 
> I noticed the bug while setting up a backup with fsfreeze and restic.
> 
> How I reproduce it:
> 
>     1. Write multiple MB to a file (eg. 100MB) while after one or two MB freeze
> the filesystem from the sidecar pod
>     2. From the sidecar pod, issue multiple `strace tail /generated/data/0.txt`
>     3. After a couple of tries strace shows that the `read(...)` works but
> `close(...)` hangs
>     4. From now on all `read(...)` operations are blocked until the freeze is
> lifted
> 

I'm not familiar with your user environment, but it sounds like the use
case is essentially to read a file concurrently being written to and
freeze the fs. From there, you're expecting the readers to exit but
instead observe them blocked on close().

The caveat to note here is that close() is not necessarily a read-only
operation from the perspective of XFS internals. A close() (or
->release() from the fs perspective) can do things like truncate
post-eof block allocation, which requires a transaction and thus blocks
on a frozen fs. To confirm, could you post a stack trace of one of your
blocked reader tasks (i.e. 'cat /proc/<pid>/stack')?
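
As a rough illustration of what "post-eof block allocation" means from
userspace, the sketch below compares the space a file occupies with its
logical size. This is only a heuristic and not how XFS decides to trim: the
function names and the 4096-byte block size are assumptions, and `st_blocks`
is taken to count 512-byte units, as it does on Linux.

```python
import os


def bytes_beyond_eof(size, blocks_512, block_size=4096):
    """Estimate bytes allocated past EOF from st_size and st_blocks.

    `block_size` is an assumed filesystem block size, not queried from the fs.
    """
    allocated = blocks_512 * 512  # st_blocks is in 512-byte units on Linux
    rounded_size = -(-size // block_size) * block_size  # round up to a block
    return max(0, allocated - rounded_size)


def file_bytes_beyond_eof(path, block_size=4096):
    """Apply the estimate above to a file on disk."""
    st = os.stat(path)
    return bytes_beyond_eof(st.st_size, st.st_blocks, block_size)
```

A non-zero result for a file that is still being written would be consistent
with speculative preallocation past EOF, which is the space a release may try
to trim.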

I'm not necessarily sure blocking here is a bug if that is the
situation. We most likely wouldn't want to skip post-eof truncation on a
file simply because the fs was frozen. That said, I thought Dave had
proposed patches at one point to mitigate free space fragmentation side
effects of post-eof truncation, and one such patch was to skip the
truncation on read-only fds. I'll have to dig around or perhaps Dave can
chime in, but I'm curious if that would also help with this use case.

Brian

> System: Ubuntu 18.04.3 LTS
> CPU: Intel(R) Xeon(R) CPU X5650  @ 2.67GHz
> Storage: /dev/mapper/mpathXX on /var/lib/kubelet/plugins/hpe.com/... type xfs
> (rw,noatime,attr2,inode64,noquota)
> 
> I used this tool to generate the file. The number of concurrent files does not
> appear to matter that much. I was able to trigger the bug, tested with 2, 4 and
> 32 parallel files:
> https://gitlab.com/dns2utf8/multi_file_writer
> 
> Cheers,
> Stefan
> 
> PS: I opened a bug at the tool vendor too:
> https://github.com/vmware-tanzu/velero/issues/2113
> 
> -- 
> You are receiving this mail because:
> You are watching the assignee of the bug.
> 



* [Bug 205833] fsfreeze blocks close(fd) on xfs sometimes
  2019-12-11 14:03 [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes bugzilla-daemon
  2019-12-12 18:01 ` Brian Foster
@ 2019-12-12 18:01 ` bugzilla-daemon
  2019-12-17  9:34 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2019-12-12 18:01 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=205833

--- Comment #1 from bfoster@redhat.com ---
On Wed, Dec 11, 2019 at 02:03:52PM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=205833
> 
>             Bug ID: 205833
>            Summary: fsfreeze blocks close(fd) on xfs sometimes
>            Product: File System
>            Version: 2.5
>     Kernel Version: 4.15.0-55-generic #60-Ubuntu
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: kernel.org@estada.ch
>         Regression: No
> 
> Dear all
> 
> I noticed the bug while setting up a backup with fsfreeze and restic.
> 
> How I reproduce it:
> 
>     1. Write multiple MB to a file (eg. 100MB) while after one or two MB
>     freeze
> the filesystem from the sidecar pod
>     2. From the sidecar pod, issue multiple `strace tail
>     /generated/data/0.txt`
>     3. After a couple of tries strace shows that the `read(...)` works but
> `close(...)` hangs
>     4. From now on all `read(...)` operations are blocked until the freeze is
> lifted
> 

I'm not familiar with your user environment, but it sounds like the use
case is essentially to read a file concurrently being written to and
freeze the fs. From there, you're expecting the readers to exit but
instead observe them blocked on close().

The caveat to note here is that close() is not necessarily a read-only
operation from the perspective of XFS internals. A close() (or
->release() from the fs perspective) can do things like truncate
post-eof block allocation, which requires a transaction and thus blocks
on a frozen fs. To confirm, could you post a stack trace of one of your
blocked reader tasks (i.e. 'cat /proc/<pid>/stack')?

I'm not necessarily sure blocking here is a bug if that is the
situation. We most likely wouldn't want to skip post-eof truncation on a
file simply because the fs was frozen. That said, I thought Dave had
proposed patches at one point to mitigate free space fragmentation side
effects of post-eof truncation, and one such patch was to skip the
truncation on read-only fds. I'll have to dig around or perhaps Dave can
chime in, but I'm curious if that would also help with this use case.

Brian

> System: Ubuntu 18.04.3 LTS
> CPU: Intel(R) Xeon(R) CPU X5650  @ 2.67GHz
> Storage: /dev/mapper/mpathXX on /var/lib/kubelet/plugins/hpe.com/... type xfs
> (rw,noatime,attr2,inode64,noquota)
> 
> I used this tool to generate the file. The number of concurrent files does
> not
> appear to matter that much. I was able to trigger the bug, tested with 2, 4
> and
> 32 parallel files:
> https://gitlab.com/dns2utf8/multi_file_writer
> 
> Cheers,
> Stefan
> 
> PS: I opened a bug at the tool vendor too:
> https://github.com/vmware-tanzu/velero/issues/2113
> 
> -- 
> You are receiving this mail because:
> You are watching the assignee of the bug.
>



* [Bug 205833] fsfreeze blocks close(fd) on xfs sometimes
  2019-12-11 14:03 [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes bugzilla-daemon
  2019-12-12 18:01 ` Brian Foster
  2019-12-12 18:01 ` [Bug 205833] " bugzilla-daemon
@ 2019-12-17  9:34 ` bugzilla-daemon
  2019-12-17 12:03   ` Brian Foster
  2019-12-17 12:03 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: bugzilla-daemon @ 2019-12-17  9:34 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=205833

--- Comment #2 from Stefan @dns2utf8 Schindler (kernel.org@estada.ch) ---
Hi Brian

Thank you! Here is the stack of a blocked `tail 0.txt` process:

cat /proc/276/stack
[<0>] call_rwsem_down_read_failed+0x18/0x30
[<0>] __percpu_down_read+0x58/0x80
[<0>] __sb_start_write+0x65/0x70
[<0>] xfs_trans_alloc+0xec/0x130 [xfs]
[<0>] xfs_free_eofblocks+0x12a/0x1e0 [xfs]
[<0>] xfs_release+0x144/0x170 [xfs]
[<0>] xfs_file_release+0x15/0x20 [xfs]
[<0>] __fput+0xea/0x220
[<0>] ____fput+0xe/0x10
[<0>] task_work_run+0x9d/0xc0
[<0>] ptrace_notify+0x84/0x90
[<0>] tracehook_report_syscall_exit+0x90/0xd0
[<0>] syscall_slow_exit_work+0x50/0xd0
[<0>] do_syscall_64+0x12b/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Your explanation matches the behaviour I see on the system.

If there was a patch, do you think it would get backported or just stay in
mainline and ship with the regular releases?

Best,
Stefan



* Re: [Bug 205833] fsfreeze blocks close(fd) on xfs sometimes
  2019-12-17  9:34 ` bugzilla-daemon
@ 2019-12-17 12:03   ` Brian Foster
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2019-12-17 12:03 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-xfs

On Tue, Dec 17, 2019 at 09:34:34AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=205833
> 
> --- Comment #2 from Stefan @dns2utf8 Schindler (kernel.org@estada.ch) ---
> Hi Brian
> 
> Thank you! Here is the stack of a blocked `tail 0.txt` process:
> 
> cat /proc/276/stack
> [<0>] call_rwsem_down_read_failed+0x18/0x30
> [<0>] __percpu_down_read+0x58/0x80
> [<0>] __sb_start_write+0x65/0x70
> [<0>] xfs_trans_alloc+0xec/0x130 [xfs]
> [<0>] xfs_free_eofblocks+0x12a/0x1e0 [xfs]
> [<0>] xfs_release+0x144/0x170 [xfs]
> [<0>] xfs_file_release+0x15/0x20 [xfs]
> [<0>] __fput+0xea/0x220
> [<0>] ____fput+0xe/0x10
> [<0>] task_work_run+0x9d/0xc0
> [<0>] ptrace_notify+0x84/0x90
> [<0>] tracehook_report_syscall_exit+0x90/0xd0
> [<0>] syscall_slow_exit_work+0x50/0xd0
> [<0>] do_syscall_64+0x12b/0x130
> [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [<0>] 0xffffffffffffffff
> 
> Your explanation matches the behaviour I see on the system.
> 
> If there was a patch, do you think it would get backported or just stay in
> mainline and ship with the regular releases?
> 

There was a patch, but it was RFC and hadn't been merged because IIRC
more investigation/testing was required to evaluate side effects. For
reference, the last post I see is the one below. In particular, patch 3
bypasses EOF block truncation from read-only file descriptors (I believe
the file writer task would still block).

https://marc.info/?l=linux-xfs&m=154951612101291&w=2

Based on the stack above, note that this is (at least for the time
being) expected behavior on XFS.
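
For anyone checking other blocked tasks, the pattern in the stack above can
be matched mechanically. The following is a sketch only: the helper names are
made up for illustration, and the frame names are taken from this 4.15 trace,
so they may differ on other kernel versions.

```python
import re

# Matches lines such as "[<0>] xfs_release+0x144/0x170 [xfs]".
FRAME_RE = re.compile(r"^\[<[0-9a-fA-Fx]+>\]\s+([A-Za-z0-9_.]+)\+0x")


def frame_names(stack_text):
    """Return function names from /proc/<pid>/stack text, innermost first."""
    names = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            names.append(m.group(1))
    return names


def blocked_in_release_on_freeze(stack_text):
    """True if the task waits for freeze protection while releasing a file.

    Looks for __sb_start_write, xfs_free_eofblocks and xfs_release appearing
    in that order (innermost to outermost), as in the trace from this bug.
    """
    wanted = ["__sb_start_write", "xfs_free_eofblocks", "xfs_release"]
    pos = 0
    for name in frame_names(stack_text):
        if pos < len(wanted) and name == wanted[pos]:
            pos += 1
    return pos == len(wanted)
```

Feeding it the contents of `/proc/<pid>/stack` for a hung reader should
return True when the task is stuck in the same path as the one reported here.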

Brian

> Best,
> Stefan
> 
> -- 
> You are receiving this mail because:
> You are watching the assignee of the bug.
> 



* [Bug 205833] fsfreeze blocks close(fd) on xfs sometimes
  2019-12-11 14:03 [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes bugzilla-daemon
                   ` (2 preceding siblings ...)
  2019-12-17  9:34 ` bugzilla-daemon
@ 2019-12-17 12:03 ` bugzilla-daemon
  2022-09-05 14:29 ` bugzilla-daemon
  2023-10-09 15:37 ` bugzilla-daemon
  5 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2019-12-17 12:03 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=205833

--- Comment #3 from bfoster@redhat.com ---
On Tue, Dec 17, 2019 at 09:34:34AM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=205833
> 
> --- Comment #2 from Stefan @dns2utf8 Schindler (kernel.org@estada.ch) ---
> Hi Brian
> 
> Thank you! Here is the stack of a blocked `tail 0.txt` process:
> 
> cat /proc/276/stack
> [<0>] call_rwsem_down_read_failed+0x18/0x30
> [<0>] __percpu_down_read+0x58/0x80
> [<0>] __sb_start_write+0x65/0x70
> [<0>] xfs_trans_alloc+0xec/0x130 [xfs]
> [<0>] xfs_free_eofblocks+0x12a/0x1e0 [xfs]
> [<0>] xfs_release+0x144/0x170 [xfs]
> [<0>] xfs_file_release+0x15/0x20 [xfs]
> [<0>] __fput+0xea/0x220
> [<0>] ____fput+0xe/0x10
> [<0>] task_work_run+0x9d/0xc0
> [<0>] ptrace_notify+0x84/0x90
> [<0>] tracehook_report_syscall_exit+0x90/0xd0
> [<0>] syscall_slow_exit_work+0x50/0xd0
> [<0>] do_syscall_64+0x12b/0x130
> [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [<0>] 0xffffffffffffffff
> 
> Your explanation matches the behaviour I see on the system.
> 
> If there was a patch, do you think it would get backported or just stay in
> mainline and ship with the regular releases?
> 

There was a patch, but it was RFC and hadn't been merged because IIRC
more investigation/testing was required to evaluate side effects. For
reference, the last post I see is the one below. In particular, patch 3
bypasses EOF block truncation from read-only file descriptors (I believe
the file writer task would still block).

https://marc.info/?l=linux-xfs&m=154951612101291&w=2

Based on the stack above, note that this is (at least for the time
being) expected behavior on XFS.

Brian

> Best,
> Stefan
> 
> -- 
> You are receiving this mail because:
> You are watching the assignee of the bug.
>



* [Bug 205833] fsfreeze blocks close(fd) on xfs sometimes
  2019-12-11 14:03 [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes bugzilla-daemon
                   ` (3 preceding siblings ...)
  2019-12-17 12:03 ` bugzilla-daemon
@ 2022-09-05 14:29 ` bugzilla-daemon
  2023-10-09 15:37 ` bugzilla-daemon
  5 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2022-09-05 14:29 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=205833

--- Comment #4 from Stefan @dns2utf8 Schindler (kernel.org@estada.ch) ---
Has the fix been merged?

On the latest Arch Linux I am no longer able to reproduce the error where the
second process hangs.

My test files & programs are here:

* Quick test: https://gitlab.com/dns2utf8/xfs_fsfreeze_test/
* Heavy load: https://gitlab.com/dns2utf8/multi_file_writer/

Best,
Stefan



* [Bug 205833] fsfreeze blocks close(fd) on xfs sometimes
  2019-12-11 14:03 [Bug 205833] New: fsfreeze blocks close(fd) on xfs sometimes bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-09-05 14:29 ` bugzilla-daemon
@ 2023-10-09 15:37 ` bugzilla-daemon
  5 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2023-10-09 15:37 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=205833

Коренберг Марк (socketpair@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |socketpair@gmail.com

--- Comment #5 from Коренберг Марк (socketpair@gmail.com) ---
The same bug: https://bugzilla.redhat.com/show_bug.cgi?id=1474726
