All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers3@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: david@fromorbit.com, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH 8/9] vfs: hoist the btrfs deduplication ioctl to the vfs
Date: Tue, 12 Jan 2016 00:07:14 -0600	[thread overview]
Message-ID: <20160112060714.GA4980@zzz> (raw)
In-Reply-To: <20151219085559.12478.33700.stgit@birch.djwong.org>

Some feedback on the VFS portion of the FIDEDUPERANGE ioctl and its man page...
(note: I realize the patch is mostly just moving the code that already existed
in btrfs, but in the VFS it deserves a more thorough review):

At high level, I am confused about what is meant by the "source" and
"destination" files.  I understand that with more than two files, you
effectively have to choose one file to treat specially and dedupe with all the
other files (an NxN comparison isn't realistic).  But with just two files, a
deduplication operation should be completely symmetric, should it not?  The end
result should be that the data is deduplicated, regardless of the order in which
I gave the file descriptors.  So why is there some non-symmetric behavior?
There are several examples but one is that the VFS is checking !S_ISREG() on the
"source" file descriptor but not on the "destination" file descriptor.  Another
is that different permissions are required on the source versus on the
destination.  If there are good reasons for the nonsymmetry then this needs to
be clearly explained in the man page; otherwise it may not be clear what to use
as the "source" and what to use as the "destination".

It seems odd to be adding "copy" as a system call but then have "dedupe" and
"clone" as ioctls rather than system calls... it seems that they should all be
one or the other (at least, if we put aside the fact that the ioctls already
exist in btrfs).

The range checking in clone_verify_area() appears incomplete.  Someone could
provide len=UINT64_MAX and all the checks would still pass even though 'pos+len'
would overflow.

Should the ioctl be interruptible?  Right now it always goes through *all* the
'struct file_dedupe_range_info's you passed in --- potentially up to 65535 of
them.

Why 'info->bytes_deduped += deduped' rather than 'info->bytes_deduped =
deduped'?  'bytes_deduped' is per file descriptor, not for the operation as a
whole.

What permissions do you need on the destination file descriptors?  The man page
implies they must be open for writing and not appending.  The implementation
differs: it requires FMODE_WRITE only for non-admin users, and it doesn't check
for O_APPEND at all.  The man page also says you get EPERM if "dest_fd is
immutable" and ETXTBSY if "one of the files is a swap file", which I don't see
actually happening in the implementation; it seems those error codes perhaps
exist at all for this ioctl but rather be left to open(..., O_WRONLY).

If the filesystem doesn't support deduplication, or I pass in a strange file
descriptor such as one for a named pipe, do I get EINVAL or EOPNOTSUPP?  The man
page isn't clear.

Under what circumstances will 'bytes_deduped' differ from the count that was
passed in?  If short counts are allowed, what will be the 'status' be in that
case: FILE_DEDUP_RANGE_DIFFERS, FILE_DEDUPE_RANGE_SAME, or something else?  Can
data be deduped even if only a prefix of the data region matches?

The man page doesn't mention FILE_DEDUPE_RANGE_SAME at all, instead calling it
0; it only mentions FILE_DEDUPE_RANGE_DIFFERS.

The man page isn't clear about whether the ioctl stops early if an error occurs
or always processes all the 'struct file_dedupe_range_info's you pass in.  And
if it were, hypothetically, to stop early, how is the user meant to know on
which file it stopped?

The man page says "logical_offset" but in the struct it is called "dest_offset".

There are some variables named "same" which don't really make sense now that the
ioctl is called FIDEDUPERANGE instead of EXTENT_SAME.

Eric

WARNING: multiple messages have this Message-ID (diff)
From: Eric Biggers <ebiggers3@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	xfs@oss.sgi.com
Subject: Re: [PATCH 8/9] vfs: hoist the btrfs deduplication ioctl to the vfs
Date: Tue, 12 Jan 2016 00:07:14 -0600	[thread overview]
Message-ID: <20160112060714.GA4980@zzz> (raw)
In-Reply-To: <20151219085559.12478.33700.stgit@birch.djwong.org>

Some feedback on the VFS portion of the FIDEDUPERANGE ioctl and its man page...
(note: I realize the patch is mostly just moving the code that already existed
in btrfs, but in the VFS it deserves a more thorough review):

At high level, I am confused about what is meant by the "source" and
"destination" files.  I understand that with more than two files, you
effectively have to choose one file to treat specially and dedupe with all the
other files (an NxN comparison isn't realistic).  But with just two files, a
deduplication operation should be completely symmetric, should it not?  The end
result should be that the data is deduplicated, regardless of the order in which
I gave the file descriptors.  So why is there some non-symmetric behavior?
There are several examples but one is that the VFS is checking !S_ISREG() on the
"source" file descriptor but not on the "destination" file descriptor.  Another
is that different permissions are required on the source versus on the
destination.  If there are good reasons for the nonsymmetry then this needs to
be clearly explained in the man page; otherwise it may not be clear what to use
as the "source" and what to use as the "destination".

It seems odd to be adding "copy" as a system call but then have "dedupe" and
"clone" as ioctls rather than system calls... it seems that they should all be
one or the other (at least, if we put aside the fact that the ioctls already
exist in btrfs).

The range checking in clone_verify_area() appears incomplete.  Someone could
provide len=UINT64_MAX and all the checks would still pass even though 'pos+len'
would overflow.

Should the ioctl be interruptible?  Right now it always goes through *all* the
'struct file_dedupe_range_info's you passed in --- potentially up to 65535 of
them.

Why 'info->bytes_deduped += deduped' rather than 'info->bytes_deduped =
deduped'?  'bytes_deduped' is per file descriptor, not for the operation as a
whole.

What permissions do you need on the destination file descriptors?  The man page
implies they must be open for writing and not appending.  The implementation
differs: it requires FMODE_WRITE only for non-admin users, and it doesn't check
for O_APPEND at all.  The man page also says you get EPERM if "dest_fd is
immutable" and ETXTBSY if "one of the files is a swap file", which I don't see
actually happening in the implementation; it seems those error codes perhaps
exist at all for this ioctl but rather be left to open(..., O_WRONLY).

If the filesystem doesn't support deduplication, or I pass in a strange file
descriptor such as one for a named pipe, do I get EINVAL or EOPNOTSUPP?  The man
page isn't clear.

Under what circumstances will 'bytes_deduped' differ from the count that was
passed in?  If short counts are allowed, what will be the 'status' be in that
case: FILE_DEDUP_RANGE_DIFFERS, FILE_DEDUPE_RANGE_SAME, or something else?  Can
data be deduped even if only a prefix of the data region matches?

The man page doesn't mention FILE_DEDUPE_RANGE_SAME at all, instead calling it
0; it only mentions FILE_DEDUPE_RANGE_DIFFERS.

The man page isn't clear about whether the ioctl stops early if an error occurs
or always processes all the 'struct file_dedupe_range_info's you pass in.  And
if it were, hypothetically, to stop early, how is the user meant to know on
which file it stopped?

The man page says "logical_offset" but in the struct it is called "dest_offset".

There are some variables named "same" which don't really make sense now that the
ioctl is called FIDEDUPERANGE instead of EXTENT_SAME.

Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2016-01-12  6:07 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-19  8:55 [RFCv4 0/9] vfs: hoist reflink/dedupe ioctls to the VFS Darrick J. Wong
2015-12-19  8:55 ` Darrick J. Wong
2015-12-19  8:55 ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 1/9] vfs: add copy_file_range syscall and vfs helper Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 2/9] x86: add sys_copy_file_range to syscall tables Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 3/9] btrfs: add .copy_file_range file operation Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 4/9] vfs: Add vfs_copy_file_range() support for pagecache copies Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 5/9] locks: new locks_mandatory_area calling convention Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 6/9] vfs: pull btrfs clone API to vfs layer Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 7/9] vfs: wire up compat ioctl for CLONE/CLONE_RANGE Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55 ` [PATCH 8/9] vfs: hoist the btrfs deduplication ioctl to the vfs Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2015-12-19  8:55   ` Darrick J. Wong
2016-01-12  6:07   ` Eric Biggers [this message]
2016-01-12  6:07     ` Eric Biggers
2016-01-12  9:14     ` Darrick J. Wong
2016-01-12  9:14       ` Darrick J. Wong
2016-01-13  2:36       ` Eric Biggers
2016-01-13  2:36         ` Eric Biggers
2016-01-13  2:36         ` Eric Biggers
2016-01-23  0:54         ` Darrick J. Wong
2016-01-23  0:54           ` Darrick J. Wong
2016-01-23  0:54           ` Darrick J. Wong
2016-08-07 17:47       ` Michael Kerrisk (man-pages)
2016-08-07 17:47         ` Michael Kerrisk (man-pages)
2016-07-27 21:51   ` Kirill A. Shutemov
2016-07-27 21:51     ` Kirill A. Shutemov
2016-07-28 18:07     ` Darrick J. Wong
2016-07-28 18:07       ` Darrick J. Wong
2016-07-28 18:07       ` Darrick J. Wong
2016-07-28 19:25     ` Darrick J. Wong
2016-07-28 19:25       ` Darrick J. Wong
2015-12-19  8:56 ` [PATCH 9/9] btrfs: use new dedupe data function pointer Darrick J. Wong
2015-12-19  8:56   ` Darrick J. Wong
2015-12-20 15:30 ` [RFCv4 0/9] vfs: hoist reflink/dedupe ioctls to the VFS Christoph Hellwig
2015-12-20 15:30   ` Christoph Hellwig
2015-12-20 15:30   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160112060714.GA4980@zzz \
    --to=ebiggers3@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.