Linux-NFS Archive on lore.kernel.org
From: Amir Goldstein <amir73il@gmail.com>
To: Olga Kornievskaia <olga.kornievskaia@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-nfs <linux-nfs@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	ceph-devel@vger.kernel.org, CIFS <linux-cifs@vger.kernel.org>
Subject: Re: [PATCH 01/11] vfs: copy_file_range source range over EOF should fail
Date: Mon, 20 May 2019 16:36:12 +0300
Message-ID: <CAOQ4uxgvCz+-snW8h-M-q2KqaPSk-oMYRVn2gWeMNg2jrMP_zg@mail.gmail.com>
In-Reply-To: <CAN-5tyGN8LPAxxjApBifbs6+eAgOVE8G1x3vawSMfT2Ufo7Bpw@mail.gmail.com>

On Mon, May 20, 2019 at 4:12 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Mon, May 20, 2019 at 5:10 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Dec 5, 2018 at 12:31 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Tue, Dec 04, 2018 at 04:47:18PM -0500, Olga Kornievskaia wrote:
> > > > On Tue, Dec 4, 2018 at 4:35 PM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > On Tue, Dec 04, 2018 at 07:13:32AM -0800, Christoph Hellwig wrote:
> > > > > > On Mon, Dec 03, 2018 at 02:46:20PM +0200, Amir Goldstein wrote:
> > > > > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > > > >
> > > > > > > > The man page says:
> > > > > > > >
> > > > > > > > EINVAL Requested range extends beyond the end of the source file
> > > > > > > >
> > > > > > > > But the current behaviour is that copy_file_range does a short
> > > > > > > > copy up to the source file EOF. Fix the kernel behaviour to match
> > > > > > > > the behaviour described in the man page.
> > > > > >
> > > > > > I think the behavior implemented is a lot more useful than the one
> > > > > > documented..
> > > > >
> > > > > The current behaviour is really nasty. Because copy_file_range() can
> > > > > return short copies, the caller has to implement a loop to ensure
> > > > > > the range they want gets copied.  When the source range you are
> > > > > > trying to copy overlaps source EOF, consider this loop:
> > > > >
> > > > >         while (len > 0) {
> > > > >                 ret = copy_file_range(... len ...)
> > > > >                 ...
> > > > >                 off_in += ret;
> > > > >                 off_out += ret;
> > > > >                 len -= ret;
> > > > >         }
> > > > >
> > > > > Currently the fallback code copies up to the end of the source file
> > > > > on the first copy and then fails the second copy with EINVAL because
> > > > > the source range is now completely beyond EOF.
> > > > >
> > > > > So, from an application perspective, did the copy succeed or did it
> > > > > fail?
> > > > >
> > > > > Existing tools that exercise copy_file_range (like xfs_io) consider
> > > > > this a failure, because the second copy_file_range() call returns
> > > > > EINVAL and not some "there is no more to copy" marker like read()
> > > > > returning 0 bytes when attempting to read beyond EOF.
> > > > >
> > > > > IOWs, we cannot tell the difference between a real error and a short
> > > > > copy because the input range spans EOF and it was silently
> > > > > shortened. That's the API problem we need to fix here - the existing
> > > > > behaviour is really crappy for applications. Erroring out
> > > > > > immediately is one solution, and it's what the man page says should
> > > > > happen so that is what I implemented.
> > > > >
> > > > > Realistically, though, I think an attempt to read beyond EOF for the
> > > > > copy should result in behaviour like read() (i.e. return 0 bytes),
> > > > > not EINVAL. The existing behaviour needs to change, though.
> > > >
> > > > There are two checks to consider
> > > > 1. pos_in >= EOF should return EINVAL
> > > > > 2. however, what perhaps should be relaxed is that pos_in+len >= EOF
> > > > > should return a short copy.
> > > > >
> > > > > Having check#1 enforced allows us to differentiate between a real
> > > > error and a short copy.
> > >
> > > That's what the code does right now and *exactly what I'm trying to
> > > > fix* because EINVAL is ambiguous and not an indicator that we've
> > > reached the end of the source file. EINVAL can indicate several
> > > different errors, so it really has to be treated as a "copy failed"
> > > error by applications.
> > >
> > > Have a look at read/pread() - they return 0 in this case to indicate
> > > a short read, and the value of zero is explicitly defined as meaning
> > > "read position is beyond EOF".  Applications know straight away that
> > > there is no more data to be read and there was no error, so can
> > > terminate on a successful short read.
> > >
> > > We need to allow applications to terminate copy loops on a
> > > successful short copy. IOWs, applications need to either:
> > >
> > >         - get an immediate error saying the range is invalid rather
> > >           than doing a short copy (as per the man page); or
> > >         - have an explicit marker to say "no more data to be copied"
> > >
> > > Applications need the "no more data to copy" case to be explicit and
> > > unambiguous so they can make sane decisions about whether a short
> > > copy was successful because the file was shorter than expected or
> > > whether a short copy was a result of a real error being encountered.
> > > The current behaviour is largely unusable for applications because
> > > they have to guess at the reason for EINVAL part way through a
> > > copy....
> > >
> >
> > Dave,
> >
> > > I went ahead and implemented the desired behavior.
> > However, while testing I observed that the desired behavior is already
> > > the existing behavior. For example, trying to copy 10 bytes from a 2-byte file,
> > > the xfs_io copy loop ends as expected:
> > copy_file_range(4, [0], 3, [0], 10, 0)  = 2
> > copy_file_range(4, [2], 3, [2], 8, 0)   = 0
> >
> > > This was tested on ext4 and xfs with reflink on a recent kernel as well as on
> > > v4.20-rc1 (era of the original patch set).
> > >
> > > Where and how did you observe the EINVAL behavior described above?
> > > (besides the man page, that is). There are even xfstests (which you modified)
> > > that verify the return-0-for-past-EOF behavior.
> >
> > For now, I am just dropping this patch from the patch series.
> > Let me know if I am missing something.
>
> The patch was fixing an inconsistency in what the man page specified (i.e., it
> must fail with EINVAL if offsets are out of range), which was never
> enforced by the code. The patch then could be to fix the existing
> semantics (man page) of the system call.
>
> Copy file range is not only read and write but rather
> lseek+read+write and if somebody specifies an incorrect offset to the

Nope. It is like either read+write or pread+pwrite.

> lseek the system call should fail. Thus I still think that copy file
> range should enforce that specifying a source offset beyond the end of
> the file should fail with EINVAL.

You appear to be outnumbered by reviewers who think copy_file_range(2)
should behave like pread(2) and return 0 when off_in >= size_in.
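
To make the pread(2) comparison concrete, here is a minimal userspace
sketch, not kernel code. The names emulated_copy_range() and
demo_emulated_copy() are hypothetical and exist only for this
illustration, and the temp files are assumed to live under /tmp. It
emulates one copy step with pread()+pwrite() and shows that a source
offset at EOF simply yields 0 instead of an error:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace helper, not a kernel or libc API: copy up to
 * `len` bytes from fd_in at off_in to fd_out at off_out in one step.
 * Returns the number of bytes copied (possibly short), 0 when off_in is
 * at or beyond EOF of the source, or -1 with errno set.
 */
ssize_t emulated_copy_range(int fd_in, off_t off_in,
                            int fd_out, off_t off_out, size_t len)
{
    char buf[4096];
    size_t want = len < sizeof(buf) ? len : sizeof(buf);
    ssize_t nread, done = 0;

    assert(fd_in >= 0 && fd_out >= 0);

    do {
        nread = pread(fd_in, buf, want, off_in);
    } while (nread < 0 && errno == EINTR);
    if (nread <= 0)
        return nread;   /* 0: nothing to copy at this offset; -1: error */

    while (done < nread) {
        ssize_t nw = pwrite(fd_out, buf + done, (size_t)(nread - done),
                            off_out + done);
        if (nw < 0 && errno == EINTR)
            continue;
        if (nw <= 0) {
            if (nw == 0)
                errno = EIO;    /* no progress: an error, not EOF */
            return done > 0 ? done : -1;
        }
        done += nw;
    }
    return done;
}

/*
 * Demo mirroring the case from this thread: ask for 10 bytes from a
 * 2-byte file, then ask again starting at offset 2. Stores the two
 * return values; returns 0 on success, -1 if the temp files cannot be
 * set up.
 */
int demo_emulated_copy(ssize_t *first, ssize_t *second)
{
    char src[] = "/tmp/cfr_src_XXXXXX";
    char dst[] = "/tmp/cfr_dst_XXXXXX";
    int in = mkstemp(src);
    int out = mkstemp(dst);
    int rc = -1;

    if (in >= 0 && out >= 0 && write(in, "ab", 2) == 2) {
        *first = emulated_copy_range(in, 0, out, 0, 10);
        *second = emulated_copy_range(in, 2, out, 2, 8);
        rc = 0;
    }
    if (in >= 0) { close(in); unlink(src); }
    if (out >= 0) { close(out); unlink(dst); }
    return rc;
}
```

With a 2-byte source, the first step is a short copy of 2 bytes and the
second step returns 0, the same shape as the xfs_io trace quoted above.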

>
> If copy file range returned 0 bytes, does it mean it's a stopping
> condition? Not according to the current semantics.

Yes. Same as read(2)/pread(2).
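
For illustration, an application copy loop that treats a 0 return as the
stopping condition could look roughly like the sketch below. It assumes
Linux with the glibc copy_file_range() wrapper (glibc 2.27 or later);
copy_range_until_eof() and demo_copy_past_eof() are hypothetical names
made up for this example, and the temp files are assumed to live under
/tmp:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to `len` bytes, retrying after short
 * copies. Returns the total copied, which is less than `len` if the
 * source ends first, or -1 with errno set if copy_file_range() fails
 * (any partial progress is not reported in that case).
 */
ssize_t copy_range_until_eof(int fd_in, off_t off_in,
                             int fd_out, off_t off_out, size_t len)
{
    off64_t in = off_in;    /* copy_file_range() advances both offsets */
    off64_t out = off_out;
    size_t total = 0;

    assert(fd_in >= 0 && fd_out >= 0);

    while (total < len) {
        ssize_t ret = copy_file_range(fd_in, &in, fd_out, &out,
                                      len - total, 0);
        if (ret < 0) {
            if (errno == EINTR)
                continue;
            return -1;      /* a real error */
        }
        if (ret == 0)
            break;          /* no more source data: stop, not an error */
        total += (size_t)ret;
    }
    return (ssize_t)total;
}

/*
 * Demo: request 10 bytes from a 2-byte source file. Stores the helper's
 * return value in *copied; returns 0 if the temp files were set up.
 */
int demo_copy_past_eof(ssize_t *copied)
{
    char src[] = "/tmp/cfr_src_XXXXXX";
    char dst[] = "/tmp/cfr_dst_XXXXXX";
    int in = mkstemp(src);
    int out = mkstemp(dst);
    int rc = -1;

    if (in >= 0 && out >= 0 && write(in, "ab", 2) == 2) {
        *copied = copy_range_until_eof(in, 0, out, 0, 10);
        rc = 0;
    }
    if (in >= 0) { close(in); unlink(src); }
    if (out >= 0) { close(out); unlink(dst); }
    return rc;
}
```

On the kernels tested in this thread, the loop would see a return of 2
followed by 0 and finish with 2 bytes copied. The caller never has to
interpret EINVAL as "end of file".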

Thanks,
Amir.
