OCFS2-Devel Archive on lore.kernel.org
* [Ocfs2-devel] Broken O_{D,}SYNC behavior with FICLONE*?
@ 2020-09-03  3:52 Darrick J. Wong
  2020-09-03 14:27 ` Christoph Hellwig
  2020-09-03 21:13 ` Dave Chinner
  0 siblings, 2 replies; 3+ messages in thread
From: Darrick J. Wong @ 2020-09-03  3:52 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-btrfs, linux-ext4, ocfs2 list
  Cc: Christoph Hellwig, Dave Chinner, Eric Sandeen, Theodore Ts'o

Hi,

I have a question for everyone-- do FICLONE and FICLONERANGE count as a
"write operation" for the purposes of reasoning about O_SYNC and
O_DSYNC?  In other words, is it supposed to be the case that
(paraphrasing the open(2) manpage) "By the time ioctl(FICLONE) returns,
the output data and associated file metadata have been transferred to
the underlying hardware (i.e., as though each ioctl(FICLONE) was
followed by a call to fsync(2))."?

If I open a file with O_SYNC, call FICLONE to reflink some data blocks
into that file, and hit the reset button as soon as the ioctl call
returns, should I expect that I will always see the new file contents in
that file after the system comes back up?  Or am I required to fsync()
the file despite O_SYNC being set?
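
A minimal sketch of the scenario in question, assuming a filesystem that supports reflink. The helper name clone_into_osync_file() and its error convention are hypothetical, chosen only for illustration; FICLONE itself comes from <linux/fs.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Reflink all of src into dst, with dst opened O_SYNC.
 * Returns 0 on success or a negative errno value.
 * Hypothetical helper, for illustration only.
 */
int clone_into_osync_file(const char *src, const char *dst)
{
	int src_fd, dst_fd, ret = 0;

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -errno;

	dst_fd = open(dst, O_WRONLY | O_CREAT | O_SYNC, 0644);
	if (dst_fd < 0) {
		ret = -errno;
		close(src_fd);
		return ret;
	}

	/* Make dst share src's data blocks. */
	if (ioctl(dst_fd, FICLONE, src_fd) < 0)
		ret = -errno;

	/*
	 * The open question: if the machine is reset right here, is the
	 * clone guaranteed to be on stable storage because of O_SYNC,
	 * or only after an explicit fsync(dst_fd)?
	 */

	close(dst_fd);
	close(src_fd);
	return ret;
}
```

On a filesystem without reflink support the ioctl is expected to fail (for example with EOPNOTSUPP), so the sketch only exercises the question on filesystems such as XFS, btrfs, or ocfs2.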

The reason I ask is that (a) reflinking can definitely change the file
contents which seems like a write operation; and (b) we wrote a test to
examine the copy_file_range() semantics wrt O_SYNC and discovered that
an unaligned c_f_r through the splice code does indeed honor the
documented O_SYNC semantics, but a block-aligned c_f_r that uses reflink
does *not* honor this.

So, that's inconsistent behavior and I want to know if remap_file_range
is broken or if we all just don't care about O_SYNC for these fancy
IO accelerators?

I tend to think reflink is broken on XFS, but I converted that O_SYNC
test into an fstest and discovered that none of XFS, btrfs, or ocfs2
actually force the fs to persist metadata changes after reflinking into
an O_SYNC file.  The manpages for the clone ioctls and copy_file_range
don't explicitly declare those calls to be "write operations".

FWIW I repeated the analysis with a file that had FS_XFLAG_SYNC or
FS_SYNC_FL set on the inode but O_SYNC was not set on the fd, and
observed the same results.

--D


* [Ocfs2-devel] Broken O_{D,}SYNC behavior with FICLONE*?
  2020-09-03  3:52 [Ocfs2-devel] Broken O_{D,}SYNC behavior with FICLONE*? Darrick J. Wong
@ 2020-09-03 14:27 ` Christoph Hellwig
  2020-09-03 21:13 ` Dave Chinner
  1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2020-09-03 14:27 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, xfs, linux-btrfs, linux-ext4, ocfs2 list,
	Christoph Hellwig, Dave Chinner, Eric Sandeen, Theodore Ts'o

On Wed, Sep 02, 2020 at 08:52:25PM -0700, Darrick J. Wong wrote:
> Hi,
> 
> I have a question for everyone-- do FICLONE and FICLONERANGE count as a
> "write operation" for the purposes of reasoning about O_SYNC and
> O_DSYNC?

They aren't really write operations in the traditional sense as they
only change metadata.  Then again the metadata is all about the file
content, so we'd probably err on the safe side by including them in the
write operations umbrella term.


* [Ocfs2-devel] Broken O_{D,}SYNC behavior with FICLONE*?
  2020-09-03  3:52 [Ocfs2-devel] Broken O_{D,}SYNC behavior with FICLONE*? Darrick J. Wong
  2020-09-03 14:27 ` Christoph Hellwig
@ 2020-09-03 21:13 ` Dave Chinner
  1 sibling, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2020-09-03 21:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, xfs, linux-btrfs, linux-ext4, ocfs2 list,
	Christoph Hellwig, Eric Sandeen, Theodore Ts'o

On Wed, Sep 02, 2020 at 08:52:25PM -0700, Darrick J. Wong wrote:
> Hi,
> 
> I have a question for everyone-- do FICLONE and FICLONERANGE count as a
> "write operation" for the purposes of reasoning about O_SYNC and
> O_DSYNC?

I'd say yes, because we are changing metadata that is used to
directly reference the data in the file. O_DSYNC implies all the
metadata needed to access the data is on stable storage when the
operation returns....

> So, that's inconsistent behavior and I want to know if remap_file_range
> is broken or if we all just don't care about O_SYNC for these fancy
> IO accelerators?

Perhaps we should pay attention to the NFSD implementation of CloneFR -
if the operation is sync then it will run fsync on the destination
and commit_metadata on the source inode. See
nfsd4_clone_file_range().

So, yeah, I think clone operations need to pay attention to
O_DSYNC/O_SYNC/IS_SYNC()....

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com
