Linux-NFS Archive on lore.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Olga Kornievskaia <olga.kornievskaia@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-nfs <linux-nfs@vger.kernel.org>,
	linux-unionfs@vger.kernel.org, ceph-devel@vger.kernel.org,
	linux-cifs@vger.kernel.org
Subject: Re: [PATCH 01/11] vfs: copy_file_range source range over EOF should fail
Date: Wed, 5 Dec 2018 09:31:02 +1100
Message-ID: <20181204223102.GR6311@dastard>
In-Reply-To: <CAN-5tyGU=y5JO5UNcmn3rX1gRyK_UxjQvQ+kCsP34_NT2-mQ_A@mail.gmail.com>

On Tue, Dec 04, 2018 at 04:47:18PM -0500, Olga Kornievskaia wrote:
> On Tue, Dec 4, 2018 at 4:35 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Tue, Dec 04, 2018 at 07:13:32AM -0800, Christoph Hellwig wrote:
> > > On Mon, Dec 03, 2018 at 02:46:20PM +0200, Amir Goldstein wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > >
> > > > > The man page says:
> > > > >
> > > > > EINVAL Requested range extends beyond the end of the source file
> > > > >
> > > > > But the current behaviour is that copy_file_range does a short
> > > > > copy up to the source file EOF. Fix the kernel behaviour to match
> > > > > the behaviour described in the man page.
> > >
> > > I think the behavior implemented is a lot more useful than the one
> > > documented..
> >
> > The current behaviour is really nasty. Because copy_file_range() can
> > return short copies, the caller has to implement a loop to ensure
> > the range they want gets copied.  When the source range you are
> > trying to copy overlaps source EOF, this loop:
> >
> >         while (len > 0) {
> >                 ret = copy_file_range(... len ...)
> >                 ...
> >                 off_in += ret;
> >                 off_out += ret;
> >                 len -= ret;
> >         }
> >
> > Currently the fallback code copies up to the end of the source file
> > on the first copy and then fails the second copy with EINVAL because
> > the source range is now completely beyond EOF.
> >
> > So, from an application perspective, did the copy succeed or did it
> > fail?
> >
> > Existing tools that exercise copy_file_range (like xfs_io) consider
> > this a failure, because the second copy_file_range() call returns
> > EINVAL and not some "there is no more to copy" marker like read()
> > returning 0 bytes when attempting to read beyond EOF.
> >
> > IOWs, we cannot tell the difference between a real error and a short
> > copy because the input range spans EOF and it was silently
> > shortened. That's the API problem we need to fix here - the existing
> > behaviour is really crappy for applications. Erroring out
> > immediately is one solution, and it's what the man page says should
> > happen so that is what I implemented.
> >
> > Realistically, though, I think an attempt to read beyond EOF for the
> > copy should result in behaviour like read() (i.e. return 0 bytes),
> > not EINVAL. The existing behaviour needs to change, though.
> 
> There are two checks to consider:
> 1. pos_in >= EOF should return EINVAL;
> 2. however, what perhaps should be relaxed is that pos_in+len >= EOF
> should return a short copy.
> 
> Having check #1 enforced allows us to differentiate between a real
> error and a short copy.

That's what the code does right now and *exactly what I'm trying to
fix*, because EINVAL is ambiguous and is not an indicator that we've
reached the end of the source file. EINVAL can indicate several
different errors, so it really has to be treated as a "copy failed"
error by applications.

Have a look at read()/pread() - they return 0 in this case to indicate
a short read, and the value of zero is explicitly defined as meaning
"read position is beyond EOF".  Applications know straight away that
there is no more data to be read and there was no error, so they can
terminate on a successful short read.
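To make the read()/pread() convention concrete, here is a minimal C sketch of a copy loop built on pread()/pwrite(). The helper name copy_with_pread() is hypothetical, not an existing API. It stops on the first 0 returned by pread() and reports a successful short copy, while any negative return is treated as a real error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* tmpfile(), fileno() in the usage example */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes from in_fd at off_in to out_fd
 * at off_out with pread()/pwrite(). pread() returning 0 means the read
 * position is at or beyond EOF, so the loop stops and reports a successful
 * short copy. Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_with_pread(int in_fd, off_t off_in,
                               int out_fd, off_t off_out, size_t len)
{
    char buf[4096];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    while (len > 0) {
        size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t nr = pread(in_fd, buf, chunk, off_in);

        if (nr < 0) {
            if (errno == EINTR)
                continue;
            return -1;              /* real error: errno says why */
        }
        if (nr == 0)
            break;                  /* EOF: no more data, not an error */

        /* Write out everything that was read, coping with short writes. */
        for (ssize_t done = 0; done < nr; ) {
            ssize_t nw = pwrite(out_fd, buf + done, (size_t)(nr - done),
                                off_out + done);
            if (nw < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += nw;
        }

        off_in += nr;
        off_out += nr;
        len -= (size_t)nr;
        total += nr;
    }
    return total;
}
```

For example, with a 10-byte source file created via tmpfile(), asking copy_with_pread() for 100 bytes from offset 0 returns 10, and a second call starting at offset 10 returns 0 rather than an error.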

We need to allow applications to terminate copy loops on a
successful short copy. IOWs, applications need to either:

	- get an immediate error saying the range is invalid rather
	  than doing a short copy (as per the man page); or
	- have an explicit marker to say "no more data to be copied"

Applications need the "no more data to copy" case to be explicit and
unambiguous so they can make sane decisions about whether a short
copy was successful because the file was shorter than expected or
whether a short copy was a result of a real error being encountered.
The current behaviour is largely unusable for applications because
they have to guess at the reason for EINVAL part way through a
copy....
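For applications stuck with the EINVAL-at-EOF semantics, one possible workaround is sketched below in C. It is an assumption-laden sketch, not a recommended or official pattern: copy_range_until_eof() is a hypothetical helper that treats a 0 return from copy_file_range() as end of source and, on EINVAL, re-checks the source size with fstat() to guess whether off_in has reached EOF. That guess is exactly the kind of inference argued against above, and it races with anything that changes the source file's size. It assumes glibc 2.27 or later, where copy_file_range() is declared in <unistd.h> under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* tmpfile(), fileno() in the usage example */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes with copy_file_range(),
 * treating a return of 0 as "no more data in the source". If the kernel
 * instead fails with EINVAL once off_in reaches EOF, re-check the source
 * size with fstat() and treat that case as EOF too. This is a guess: it
 * races with anything that changes the source size.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_range_until_eof(int fd_in, loff_t off_in,
                                    int fd_out, loff_t off_out, size_t len)
{
    ssize_t total = 0;

    assert(fd_in >= 0 && fd_out >= 0);

    while (len > 0) {
        /* With non-NULL offsets the kernel advances off_in/off_out for us. */
        ssize_t ret = copy_file_range(fd_in, &off_in, fd_out, &off_out,
                                      len, 0);
        if (ret < 0) {
            int err = errno;
            struct stat st;

            if (err == EINTR)
                continue;
            /* Ambiguous EINVAL: only call it EOF if off_in is past the end. */
            if (err == EINVAL && fstat(fd_in, &st) == 0 && off_in >= st.st_size)
                break;
            errno = err;
            return -1;
        }
        if (ret == 0)
            break;              /* explicit "no more data" marker */

        len -= (size_t)ret;
        total += ret;
    }
    return total;
}
```

Because copy_file_range() updates *off_in and *off_out itself when they are non-NULL, the loop only has to shrink len and accumulate the total. With a 10-byte source and a 100-byte request it should return 10 whether the kernel reports EOF with 0 or with EINVAL.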

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 83+ messages
2018-12-03  8:34 [PATCH 0/11] fs: fixes for major copy_file_range() issues Dave Chinner
2018-12-03  8:34 ` [PATCH 01/11] vfs: copy_file_range source range over EOF should fail Dave Chinner
2018-12-03 12:46   ` Amir Goldstein
2018-12-04 15:13     ` Christoph Hellwig
2018-12-04 21:29       ` Dave Chinner
2018-12-04 21:47         ` Olga Kornievskaia
2018-12-04 22:31           ` Dave Chinner [this message]
2018-12-05 16:51             ` bfields
2019-05-20  9:10             ` Amir Goldstein
2019-05-20 13:12               ` Olga Kornievskaia
2019-05-20 13:36                 ` Amir Goldstein
2019-05-20 13:58                   ` Olga Kornievskaia
2019-05-20 14:02                     ` Amir Goldstein
2018-12-05 14:12         ` Christoph Hellwig
2018-12-05 21:08           ` Dave Chinner
2018-12-05 21:30             ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 02/11] vfs: introduce generic_copy_file_range() Dave Chinner
2018-12-03 10:03   ` Amir Goldstein
2018-12-03 23:00     ` Dave Chinner
2018-12-04 15:14   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 03/11] vfs: no fallback for ->copy_file_range Dave Chinner
2018-12-03 10:22   ` Amir Goldstein
2018-12-03 23:02     ` Dave Chinner
2018-12-06  4:16       ` Amir Goldstein
2018-12-06 21:30         ` Dave Chinner
2018-12-07  5:38           ` Amir Goldstein
2018-12-03 18:23   ` Anna Schumaker
2018-12-04 15:16   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 04/11] vfs: add missing checks to copy_file_range Dave Chinner
2018-12-03 12:42   ` Amir Goldstein
2018-12-03 19:04   ` Darrick J. Wong
2018-12-03 21:33   ` Olga Kornievskaia
2018-12-03 23:04     ` Dave Chinner
2018-12-04 15:18   ` Christoph Hellwig
2018-12-12 11:31   ` Luis Henriques
2018-12-12 16:42     ` Darrick J. Wong
2018-12-12 18:55     ` Olga Kornievskaia
2018-12-12 19:42       ` Matthew Wilcox
2018-12-12 20:22         ` Olga Kornievskaia
2018-12-13 10:29           ` Luis Henriques
2018-12-03  8:34 ` [PATCH 05/11] vfs: use inode_permission in copy_file_range() Dave Chinner
2018-12-03 12:47   ` Amir Goldstein
2018-12-03 18:18   ` Darrick J. Wong
2018-12-03 23:55     ` Dave Chinner
2018-12-05 17:28       ` bfields
2018-12-03 18:53   ` Eric Biggers
2018-12-04 15:19   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 06/11] vfs: copy_file_range needs to strip setuid bits Dave Chinner
2018-12-03 12:51   ` Amir Goldstein
2018-12-04 15:21   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 07/11] vfs: copy_file_range should update file timestamps Dave Chinner
2018-12-03 10:47   ` Amir Goldstein
2018-12-03 17:33     ` Olga Kornievskaia
2018-12-03 18:22       ` Darrick J. Wong
2018-12-03 23:19     ` Dave Chinner
2018-12-04 15:24   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 08/11] vfs: push EXDEV check down into ->remap_file_range Dave Chinner
2018-12-03 11:04   ` Amir Goldstein
2018-12-03 19:11     ` Darrick J. Wong
2018-12-03 23:37       ` Dave Chinner
2018-12-03 23:58         ` Darrick J. Wong
2018-12-04  9:17           ` Amir Goldstein
2018-12-03 23:34     ` Dave Chinner
2018-12-03 18:24   ` Darrick J. Wong
2018-12-04  8:18   ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 09/11] vfs: push copy_file_ranges -EXDEV checks down Dave Chinner
2018-12-03 12:36   ` Amir Goldstein
2018-12-03 17:58   ` Olga Kornievskaia
2018-12-03 18:53   ` Anna Schumaker
2018-12-03 19:27     ` Olga Kornievskaia
2018-12-03 23:40     ` Dave Chinner
2018-12-04 15:43   ` Christoph Hellwig
2018-12-04 22:18     ` Dave Chinner
2018-12-04 23:33       ` Olga Kornievskaia
2018-12-05 14:09       ` Christoph Hellwig
2018-12-05 17:01         ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 10/11] vfs: allow generic_copy_file_range to copy across devices Dave Chinner
2018-12-03 12:54   ` Amir Goldstein
2018-12-03  8:34 ` [PATCH 11/11] ovl: allow cross-device copy_file_range calls Dave Chinner
2018-12-03 12:55   ` Amir Goldstein
2018-12-03  8:39 ` [PATCH 12/11] man-pages: copy_file_range updates Dave Chinner
2018-12-03 13:05   ` Amir Goldstein
2019-05-21  5:52   ` Amir Goldstein
