linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] splice: don't read more than available pipe space
@ 2018-12-02 18:08 Darrick J. Wong
  2018-12-02 18:10 ` [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2018-12-02 18:08 UTC (permalink / raw)
  To: Amir Goldstein, Dave Chinner
  Cc: jencce.kernel, linux-xfs, overlayfs, Zorro Lang, fstests,
	linux-fsdevel, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read.  This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.

In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.

The brokenness is compounded by splice_direct_to_actor immediately
bailing on do_splice_to returning <= 0 without ever calling ->actor
(which empties out the pipe), so if userspace calls back we'll EFAULT
again on the full pipe, and nothing ever gets copied.

Therefore, teach splice_direct_to_actor to clamp its requests to the
amount of free space in the pipe and remove the simulated short read
from the iomap directio code.

Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: split into two parts per hch request
---
 fs/splice.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 3553f1956508..4bd9d9590199 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -949,7 +949,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		size_t read_len;
 		loff_t pos = sd->pos, prev_pos = pos;
 
-		ret = do_splice_to(in, &pos, pipe, len, flags);
+		/* Don't try to read more the pipe has space for. */
+		read_len = min_t(size_t, len,
+				 (pipe->buffers - pipe->nrbufs) << PAGE_SHIFT);
+		ret = do_splice_to(in, &pos, pipe, read_len, flags);
 		if (unlikely(ret <= 0))
 			goto out_release;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2018-12-02 18:08 [PATCH v2 1/2] splice: don't read more than available pipe space Darrick J. Wong
@ 2018-12-02 18:10 ` Darrick J. Wong
  2018-12-02 19:37   ` Amir Goldstein
  2019-08-21 20:23   ` Andreas Grünbacher
  0 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2018-12-02 18:10 UTC (permalink / raw)
  To: Amir Goldstein, Dave Chinner
  Cc: jencce.kernel, linux-xfs, overlayfs, Zorro Lang, fstests,
	linux-fsdevel, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read.  This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.

In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.  This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permit total success or total failure of a directio
operation.  The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid avoid reading more data
than there is space in the pipe.

Therefore, it's no longer necessary to simulate the short directio, so
remove the hack from iomap.

Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: split into two patches per hch request
---
 fs/iomap.c |    9 ---------
 1 file changed, 9 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 3ffb776fbebe..d6bc98ae8d35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				dio->wait_for_completion = true;
 				ret = 0;
 			}
-
-			/*
-			 * Splicing to pipes can fail on a full pipe. We have to
-			 * swallow this to make it look like a short IO
-			 * otherwise the higher splice layers will completely
-			 * mishandle the error and stop moving data.
-			 */
-			if (ret == -EFAULT)
-				ret = 0;
 			break;
 		}
 		pos += ret;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2018-12-02 18:10 ` [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) Darrick J. Wong
@ 2018-12-02 19:37   ` Amir Goldstein
  2019-08-21 20:23   ` Andreas Grünbacher
  1 sibling, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2018-12-02 19:37 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, jencce.kernel, linux-xfs, overlayfs, Zorro Lang,
	fstests, linux-fsdevel, Christoph Hellwig

On Sun, Dec 2, 2018 at 8:10 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> In commit 4721a601099, we tried to fix a problem wherein directio reads
> into a splice pipe will bounce EFAULT/EAGAIN all the way out to
> userspace by simulating a zero-byte short read.  This happens because
> some directio read implementations (xfs) will call
> bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
> reads, but as soon as we run out of pipe buffers that _get_pages call
> returns EFAULT, which the splice code translates to EAGAIN and bounces
> out to userspace.
>
> In that commit, the iomap code catches the EFAULT and simulates a
> zero-byte read, but that causes assertion errors on regular splice reads
> because xfs doesn't allow short directio reads.  This causes infinite
> splice() loops and assertion failures on generic/095 on overlayfs
> because xfs only permit total success or total failure of a directio
> operation.  The underlying issue in the pipe splice code has now been
> fixed by changing the pipe splice loop to avoid avoid reading more data
> than there is space in the pipe.
>
> Therefore, it's no longer necessary to simulate the short directio, so
> remove the hack from iomap.
>
> Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
> Reported-by: Amir Goldstein <amir73il@gmail.com>

Wasn't me. I believe it was Murphy Zhou <jencce.kernel@gmail.com>.
If you want you can add Ranted-by Amir ;-)

Anyway, looks fine.

> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: split into two patches per hch request
> ---
>  fs/iomap.c |    9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 3ffb776fbebe..d6bc98ae8d35 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>                                 dio->wait_for_completion = true;
>                                 ret = 0;
>                         }
> -
> -                       /*
> -                        * Splicing to pipes can fail on a full pipe. We have to
> -                        * swallow this to make it look like a short IO
> -                        * otherwise the higher splice layers will completely
> -                        * mishandle the error and stop moving data.
> -                        */
> -                       if (ret == -EFAULT)
> -                               ret = 0;
>                         break;
>                 }
>                 pos += ret;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2018-12-02 18:10 ` [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) Darrick J. Wong
  2018-12-02 19:37   ` Amir Goldstein
@ 2019-08-21 20:23   ` Andreas Grünbacher
  2019-08-28 14:23     ` Darrick J. Wong
  1 sibling, 1 reply; 9+ messages in thread
From: Andreas Grünbacher @ 2019-08-21 20:23 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Amir Goldstein, Dave Chinner, jencce.kernel, linux-xfs,
	overlayfs, Zorro Lang, fstests, linux-fsdevel, Christoph Hellwig,
	cluster-devel

Hi Darrick,

Am So., 2. Dez. 2018 um 19:13 Uhr schrieb Darrick J. Wong
<darrick.wong@oracle.com>:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> In commit 4721a601099, we tried to fix a problem wherein directio reads
> into a splice pipe will bounce EFAULT/EAGAIN all the way out to
> userspace by simulating a zero-byte short read.  This happens because
> some directio read implementations (xfs) will call
> bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
> reads, but as soon as we run out of pipe buffers that _get_pages call
> returns EFAULT, which the splice code translates to EAGAIN and bounces
> out to userspace.
>
> In that commit, the iomap code catches the EFAULT and simulates a
> zero-byte read, but that causes assertion errors on regular splice reads
> because xfs doesn't allow short directio reads.  This causes infinite
> splice() loops and assertion failures on generic/095 on overlayfs
> because xfs only permit total success or total failure of a directio
> operation.  The underlying issue in the pipe splice code has now been
> fixed by changing the pipe splice loop to avoid avoid reading more data
> than there is space in the pipe.
>
> Therefore, it's no longer necessary to simulate the short directio, so
> remove the hack from iomap.
>
> Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
> Reported-by: Amir Goldstein <amir73il@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: split into two patches per hch request
> ---
>  fs/iomap.c |    9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 3ffb776fbebe..d6bc98ae8d35 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>                                 dio->wait_for_completion = true;
>                                 ret = 0;
>                         }
> -
> -                       /*
> -                        * Splicing to pipes can fail on a full pipe. We have to
> -                        * swallow this to make it look like a short IO
> -                        * otherwise the higher splice layers will completely
> -                        * mishandle the error and stop moving data.
> -                        */
> -                       if (ret == -EFAULT)
> -                               ret = 0;
>                         break;
>                 }
>                 pos += ret;

I'm afraid this breaks the following test case on xfs and gfs2, the
two current users of iomap_dio_rw.

Here, the splice system call fails with errno = EAGAIN when trying to
"move data" from a file opened with O_DIRECT into a pipe.

The test case can be run with option -d to not use O_DIRECT, which
makes the test succeed.

The -r option switches from reading from the pipe sequentially to
reading concurrently with the splice, which doesn't change the
behavior.

Any thoughts?

Thanks,
Andreas

=================================== 8< ===================================
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <errno.h>

#define SECTOR_SIZE 512
#define BUFFER_SIZE (150 * SECTOR_SIZE)

void read_from_pipe(int fd, const char *filename, size_t size)
{
    char buffer[SECTOR_SIZE];
    size_t sz;
    ssize_t ret;

    while (size) {
        sz = size;
        if (sz > sizeof buffer)
            sz = sizeof buffer;
        ret = read(fd, buffer, sz);
        if (ret < 0)
            err(1, "read: %s", filename);
        if (ret == 0) {
            fprintf(stderr, "read: %s: unexpected EOF\n", filename);
            exit(1);
        }
        size -= sz;
    }
}

void do_splice1(int fd, const char *filename, size_t size)
{
    bool retried = false;
    int pipefd[2];

    if (pipe(pipefd) == -1)
        err(1, "pipe");
    while (size) {
        ssize_t spliced;

        spliced = splice(fd, NULL, pipefd[1], NULL, size, SPLICE_F_MOVE);
        if (spliced == -1) {
            if (errno == EAGAIN && !retried) {
                retried = true;
                fprintf(stderr, "retrying splice\n");
                sleep(1);
                continue;
            }
            err(1, "splice");
        }
        read_from_pipe(pipefd[0], filename, spliced);
        size -= spliced;
    }
    close(pipefd[0]);
    close(pipefd[1]);
}

void do_splice2(int fd, const char *filename, size_t size)
{
    bool retried = false;
    int pipefd[2];
    int pid;

    if (pipe(pipefd) == -1)
        err(1, "pipe");

    pid = fork();
    if (pid == 0) {
        close(pipefd[1]);
        read_from_pipe(pipefd[0], filename, size);
        exit(0);
    } else {
        close(pipefd[0]);
        while (size) {
            ssize_t spliced;

            spliced = splice(fd, NULL, pipefd[1], NULL, size, SPLICE_F_MOVE);
            if (spliced == -1) {
                if (errno == EAGAIN && !retried) {
                    retried = true;
                    fprintf(stderr, "retrying splice\n");
                    sleep(1);
                    continue;
                }
                err(1, "splice");
            }
            size -= spliced;
        }
        close(pipefd[1]);
        waitpid(pid, NULL, 0);
    }
}

void usage(const char *argv0)
{
    fprintf(stderr, "USAGE: %s [-rd] {filename}\n", basename(argv0));
    exit(2);
}

int main(int argc, char *argv[])
{
    void (*do_splice)(int fd, const char *filename, size_t size);
    const char *filename;
    char *buffer;
    int opt, open_flags, fd;
    ssize_t ret;

    do_splice = do_splice1;
    open_flags = O_CREAT | O_TRUNC | O_RDWR | O_DIRECT;

    while ((opt = getopt(argc, argv, "rd")) != -1) {
        switch(opt) {
        case 'r':
            do_splice = do_splice2;
            break;
        case 'd':
            open_flags &= ~O_DIRECT;
            break;
        default:  /* '?' */
            usage(argv[0]);
        }
    }

    if (optind >= argc)
        usage(argv[0]);
    filename = argv[optind];

    printf("%s reader %s O_DIRECT\n",
           do_splice == do_splice1 ? "sequential" : "concurrent",
           (open_flags & O_DIRECT) ? "with" : "without");

    buffer = aligned_alloc(SECTOR_SIZE, BUFFER_SIZE);
    if (buffer == NULL)
        err(1, "aligned_alloc");

    fd = open(filename, open_flags, 0666);
    if (fd == -1)
        err(1, "open: %s", filename);

    memset(buffer, 'x', BUFFER_SIZE);
    ret = write(fd, buffer, BUFFER_SIZE);
    if (ret < 0)
        err(1, "write: %s", filename);
    if (ret != BUFFER_SIZE) {
        fprintf(stderr, "%s: short write\n", filename);
        exit(1);
    }

    ret = lseek(fd, 0, SEEK_SET);
    if (ret != 0)
        err(1, "lseek: %s", filename);

    do_splice(fd, filename, BUFFER_SIZE);

    if (unlink(filename) == -1)
        err(1, "unlink: %s", filename);

    return 0;
}
=================================== 8< ===================================

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2019-08-21 20:23   ` Andreas Grünbacher
@ 2019-08-28 14:23     ` Darrick J. Wong
  2019-08-28 14:37       ` Andreas Grünbacher
  2019-08-29  1:36       ` Zorro Lang
  0 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2019-08-28 14:23 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Amir Goldstein, Dave Chinner, jencce.kernel, linux-xfs,
	overlayfs, Zorro Lang, fstests, linux-fsdevel, Christoph Hellwig,
	cluster-devel

On Wed, Aug 21, 2019 at 10:23:49PM +0200, Andreas Grünbacher wrote:
> Hi Darrick,
> 
> Am So., 2. Dez. 2018 um 19:13 Uhr schrieb Darrick J. Wong
> <darrick.wong@oracle.com>:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > In commit 4721a601099, we tried to fix a problem wherein directio reads
> > into a splice pipe will bounce EFAULT/EAGAIN all the way out to
> > userspace by simulating a zero-byte short read.  This happens because
> > some directio read implementations (xfs) will call
> > bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
> > reads, but as soon as we run out of pipe buffers that _get_pages call
> > returns EFAULT, which the splice code translates to EAGAIN and bounces
> > out to userspace.
> >
> > In that commit, the iomap code catches the EFAULT and simulates a
> > zero-byte read, but that causes assertion errors on regular splice reads
> > because xfs doesn't allow short directio reads.  This causes infinite
> > splice() loops and assertion failures on generic/095 on overlayfs
> > because xfs only permit total success or total failure of a directio
> > operation.  The underlying issue in the pipe splice code has now been
> > fixed by changing the pipe splice loop to avoid avoid reading more data
> > than there is space in the pipe.
> >
> > Therefore, it's no longer necessary to simulate the short directio, so
> > remove the hack from iomap.
> >
> > Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
> > Reported-by: Amir Goldstein <amir73il@gmail.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: split into two patches per hch request
> > ---
> >  fs/iomap.c |    9 ---------
> >  1 file changed, 9 deletions(-)
> >
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 3ffb776fbebe..d6bc98ae8d35 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >                                 dio->wait_for_completion = true;
> >                                 ret = 0;
> >                         }
> > -
> > -                       /*
> > -                        * Splicing to pipes can fail on a full pipe. We have to
> > -                        * swallow this to make it look like a short IO
> > -                        * otherwise the higher splice layers will completely
> > -                        * mishandle the error and stop moving data.
> > -                        */
> > -                       if (ret == -EFAULT)
> > -                               ret = 0;
> >                         break;
> >                 }
> >                 pos += ret;
> 
> I'm afraid this breaks the following test case on xfs and gfs2, the
> two current users of iomap_dio_rw.

Hmm, I had kinda wondered if regular pipes still needed this help.
Evidently we don't have a lot of splice tests in fstests. :(

> Here, the splice system call fails with errno = EAGAIN when trying to
> "move data" from a file opened with O_DIRECT into a pipe.
> 
> The test case can be run with option -d to not use O_DIRECT, which
> makes the test succeed.
> 
> The -r option switches from reading from the pipe sequentially to
> reading concurrently with the splice, which doesn't change the
> behavior.
> 
> Any thoughts?

This would be great as an xfstest! :)

Do you have one ready to go, or should I just make one from the source
code?

--D

> Thanks,
> Andreas
> 
> =================================== 8< ===================================
> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/wait.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <err.h>
> 
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdbool.h>
> #include <string.h>
> #include <errno.h>
> 
> #define SECTOR_SIZE 512
> #define BUFFER_SIZE (150 * SECTOR_SIZE)
> 
> void read_from_pipe(int fd, const char *filename, size_t size)
> {
>     char buffer[SECTOR_SIZE];
>     size_t sz;
>     ssize_t ret;
> 
>     while (size) {
>         sz = size;
>         if (sz > sizeof buffer)
>             sz = sizeof buffer;
>         ret = read(fd, buffer, sz);
>         if (ret < 0)
>             err(1, "read: %s", filename);
>         if (ret == 0) {
>             fprintf(stderr, "read: %s: unexpected EOF\n", filename);
>             exit(1);
>         }
>         size -= sz;
>     }
> }
> 
> void do_splice1(int fd, const char *filename, size_t size)
> {
>     bool retried = false;
>     int pipefd[2];
> 
>     if (pipe(pipefd) == -1)
>         err(1, "pipe");
>     while (size) {
>         ssize_t spliced;
> 
>         spliced = splice(fd, NULL, pipefd[1], NULL, size, SPLICE_F_MOVE);
>         if (spliced == -1) {
>             if (errno == EAGAIN && !retried) {
>                 retried = true;
>                 fprintf(stderr, "retrying splice\n");
>                 sleep(1);
>                 continue;
>             }
>             err(1, "splice");
>         }
>         read_from_pipe(pipefd[0], filename, spliced);
>         size -= spliced;
>     }
>     close(pipefd[0]);
>     close(pipefd[1]);
> }
> 
> void do_splice2(int fd, const char *filename, size_t size)
> {
>     bool retried = false;
>     int pipefd[2];
>     int pid;
> 
>     if (pipe(pipefd) == -1)
>         err(1, "pipe");
> 
>     pid = fork();
>     if (pid == 0) {
>         close(pipefd[1]);
>         read_from_pipe(pipefd[0], filename, size);
>         exit(0);
>     } else {
>         close(pipefd[0]);
>         while (size) {
>             ssize_t spliced;
> 
>             spliced = splice(fd, NULL, pipefd[1], NULL, size, SPLICE_F_MOVE);
>             if (spliced == -1) {
>                 if (errno == EAGAIN && !retried) {
>                     retried = true;
>                     fprintf(stderr, "retrying splice\n");
>                     sleep(1);
>                     continue;
>                 }
>                 err(1, "splice");
>             }
>             size -= spliced;
>         }
>         close(pipefd[1]);
>         waitpid(pid, NULL, 0);
>     }
> }
> 
> void usage(const char *argv0)
> {
>     fprintf(stderr, "USAGE: %s [-rd] {filename}\n", basename(argv0));
>     exit(2);
> }
> 
> int main(int argc, char *argv[])
> {
>     void (*do_splice)(int fd, const char *filename, size_t size);
>     const char *filename;
>     char *buffer;
>     int opt, open_flags, fd;
>     ssize_t ret;
> 
>     do_splice = do_splice1;
>     open_flags = O_CREAT | O_TRUNC | O_RDWR | O_DIRECT;
> 
>     while ((opt = getopt(argc, argv, "rd")) != -1) {
>         switch(opt) {
>         case 'r':
>             do_splice = do_splice2;
>             break;
>         case 'd':
>             open_flags &= ~O_DIRECT;
>             break;
>         default:  /* '?' */
>             usage(argv[0]);
>         }
>     }
> 
>     if (optind >= argc)
>         usage(argv[0]);
>     filename = argv[optind];
> 
>     printf("%s reader %s O_DIRECT\n",
>            do_splice == do_splice1 ? "sequential" : "concurrent",
>            (open_flags & O_DIRECT) ? "with" : "without");
> 
>     buffer = aligned_alloc(SECTOR_SIZE, BUFFER_SIZE);
>     if (buffer == NULL)
>         err(1, "aligned_alloc");
> 
>     fd = open(filename, open_flags, 0666);
>     if (fd == -1)
>         err(1, "open: %s", filename);
> 
>     memset(buffer, 'x', BUFFER_SIZE);
>     ret = write(fd, buffer, BUFFER_SIZE);
>     if (ret < 0)
>         err(1, "write: %s", filename);
>     if (ret != BUFFER_SIZE) {
>         fprintf(stderr, "%s: short write\n", filename);
>         exit(1);
>     }
> 
>     ret = lseek(fd, 0, SEEK_SET);
>     if (ret != 0)
>         err(1, "lseek: %s", filename);
> 
>     do_splice(fd, filename, BUFFER_SIZE);
> 
>     if (unlink(filename) == -1)
>         err(1, "unlink: %s", filename);
> 
>     return 0;
> }
> =================================== 8< ===================================

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2019-08-28 14:23     ` Darrick J. Wong
@ 2019-08-28 14:37       ` Andreas Grünbacher
  2019-08-29  3:12         ` Darrick J. Wong
  2019-08-29  1:36       ` Zorro Lang
  1 sibling, 1 reply; 9+ messages in thread
From: Andreas Grünbacher @ 2019-08-28 14:37 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Amir Goldstein, Dave Chinner, jencce.kernel, linux-xfs,
	overlayfs, Zorro Lang, fstests, linux-fsdevel, Christoph Hellwig,
	cluster-devel

Am Mi., 28. Aug. 2019 um 16:23 Uhr schrieb Darrick J. Wong
<darrick.wong@oracle.com>:
> On Wed, Aug 21, 2019 at 10:23:49PM +0200, Andreas Grünbacher wrote:
> > Hi Darrick,
> >
> > Am So., 2. Dez. 2018 um 19:13 Uhr schrieb Darrick J. Wong
> > <darrick.wong@oracle.com>:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > In commit 4721a601099, we tried to fix a problem wherein directio reads
> > > into a splice pipe will bounce EFAULT/EAGAIN all the way out to
> > > userspace by simulating a zero-byte short read.  This happens because
> > > some directio read implementations (xfs) will call
> > > bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
> > > reads, but as soon as we run out of pipe buffers that _get_pages call
> > > returns EFAULT, which the splice code translates to EAGAIN and bounces
> > > out to userspace.
> > >
> > > In that commit, the iomap code catches the EFAULT and simulates a
> > > zero-byte read, but that causes assertion errors on regular splice reads
> > > because xfs doesn't allow short directio reads.  This causes infinite
> > > splice() loops and assertion failures on generic/095 on overlayfs
> > > because xfs only permit total success or total failure of a directio
> > > operation.  The underlying issue in the pipe splice code has now been
> > > fixed by changing the pipe splice loop to avoid avoid reading more data
> > > than there is space in the pipe.
> > >
> > > Therefore, it's no longer necessary to simulate the short directio, so
> > > remove the hack from iomap.
> > >
> > > Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
> > > Reported-by: Amir Goldstein <amir73il@gmail.com>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > v2: split into two patches per hch request
> > > ---
> > >  fs/iomap.c |    9 ---------
> > >  1 file changed, 9 deletions(-)
> > >
> > > diff --git a/fs/iomap.c b/fs/iomap.c
> > > index 3ffb776fbebe..d6bc98ae8d35 100644
> > > --- a/fs/iomap.c
> > > +++ b/fs/iomap.c
> > > @@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > >                                 dio->wait_for_completion = true;
> > >                                 ret = 0;
> > >                         }
> > > -
> > > -                       /*
> > > -                        * Splicing to pipes can fail on a full pipe. We have to
> > > -                        * swallow this to make it look like a short IO
> > > -                        * otherwise the higher splice layers will completely
> > > -                        * mishandle the error and stop moving data.
> > > -                        */
> > > -                       if (ret == -EFAULT)
> > > -                               ret = 0;
> > >                         break;
> > >                 }
> > >                 pos += ret;
> >
> > I'm afraid this breaks the following test case on xfs and gfs2, the
> > two current users of iomap_dio_rw.
>
> Hmm, I had kinda wondered if regular pipes still needed this help.
> Evidently we don't have a lot of splice tests in fstests. :(

So what do you suggest as a fix?

> > Here, the splice system call fails with errno = EAGAIN when trying to
> > "move data" from a file opened with O_DIRECT into a pipe.
> >
> > The test case can be run with option -d to not use O_DIRECT, which
> > makes the test succeed.
> >
> > The -r option switches from reading from the pipe sequentially to
> > reading concurrently with the splice, which doesn't change the
> > behavior.
> >
> > Any thoughts?
>
> This would be great as an xfstest! :)

Or perhaps something generalized from it.

> Do you have one ready to go, or should I just make one from the source
> code?

The bug originally triggered in our internal cluster test system and
I've recreated the test case I've included from the strace. That's all
I have for now; feel free to take it, of course.

It could be that the same condition can be triggered with one of the
existing utilities (fio/fsstress/...).

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2019-08-28 14:23     ` Darrick J. Wong
  2019-08-28 14:37       ` Andreas Grünbacher
@ 2019-08-29  1:36       ` Zorro Lang
  1 sibling, 0 replies; 9+ messages in thread
From: Zorro Lang @ 2019-08-29  1:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Grünbacher, Amir Goldstein, Dave Chinner,
	jencce.kernel, linux-xfs, overlayfs, fstests, linux-fsdevel,
	Christoph Hellwig, cluster-devel

On Wed, Aug 28, 2019 at 07:23:32AM -0700, Darrick J. Wong wrote:
> On Wed, Aug 21, 2019 at 10:23:49PM +0200, Andreas Grünbacher wrote:
> > Hi Darrick,
> > 
> > Am So., 2. Dez. 2018 um 19:13 Uhr schrieb Darrick J. Wong
> > <darrick.wong@oracle.com>:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > In commit 4721a601099, we tried to fix a problem wherein directio reads
> > > into a splice pipe will bounce EFAULT/EAGAIN all the way out to
> > > userspace by simulating a zero-byte short read.  This happens because
> > > some directio read implementations (xfs) will call
> > > bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
> > > reads, but as soon as we run out of pipe buffers that _get_pages call
> > > returns EFAULT, which the splice code translates to EAGAIN and bounces
> > > out to userspace.
> > >
> > > In that commit, the iomap code catches the EFAULT and simulates a
> > > zero-byte read, but that causes assertion errors on regular splice reads
> > > because xfs doesn't allow short directio reads.  This causes infinite
> > > splice() loops and assertion failures on generic/095 on overlayfs
> > > because xfs only permit total success or total failure of a directio
> > > operation.  The underlying issue in the pipe splice code has now been
> > > fixed by changing the pipe splice loop to avoid avoid reading more data
> > > than there is space in the pipe.
> > >
> > > Therefore, it's no longer necessary to simulate the short directio, so
> > > remove the hack from iomap.
> > >
> > > Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
> > > Reported-by: Amir Goldstein <amir73il@gmail.com>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > v2: split into two patches per hch request
> > > ---
> > >  fs/iomap.c |    9 ---------
> > >  1 file changed, 9 deletions(-)
> > >
> > > diff --git a/fs/iomap.c b/fs/iomap.c
> > > index 3ffb776fbebe..d6bc98ae8d35 100644
> > > --- a/fs/iomap.c
> > > +++ b/fs/iomap.c
> > > @@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > >                                 dio->wait_for_completion = true;
> > >                                 ret = 0;
> > >                         }
> > > -
> > > -                       /*
> > > -                        * Splicing to pipes can fail on a full pipe. We have to
> > > -                        * swallow this to make it look like a short IO
> > > -                        * otherwise the higher splice layers will completely
> > > -                        * mishandle the error and stop moving data.
> > > -                        */
> > > -                       if (ret == -EFAULT)
> > > -                               ret = 0;
> > >                         break;
> > >                 }
> > >                 pos += ret;
> > 
> > I'm afraid this breaks the following test case on xfs and gfs2, the
> > two current users of iomap_dio_rw.
> 
> Hmm, I had kinda wondered if regular pipes still needed this help.
> Evidently we don't have a lot of splice tests in fstests. :(
> 
> > Here, the splice system call fails with errno = EAGAIN when trying to
> > "move data" from a file opened with O_DIRECT into a pipe.
> > 
> > The test case can be run with option -d to not use O_DIRECT, which
> > makes the test succeed.
> > 
> > The -r option switches from reading from the pipe sequentially to
> > reading concurrently with the splice, which doesn't change the
> > behavior.
> > 
> > Any thoughts?
> 
> This would be great as an xfstest! :)

JFYI, I added splice operation into fsstress, and I tried to add splice operation
into xfs_io long time ago:

https://marc.info/?l=linux-xfs&m=155828702128047&w=2

But it haven't been merged. If you have any suggestion, please feel free to
review it:)

Thanks,
Zorro

> 
> Do you have one ready to go, or should I just make one from the source
> code?
> 
> --D
> 
> > Thanks,
> > Andreas
> > 
> > =================================== 8< ===================================
> > #define _GNU_SOURCE
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <sys/wait.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <err.h>
> > 
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <stdbool.h>
> > #include <string.h>
> > #include <errno.h>
> > 
> > #define SECTOR_SIZE 512
> > #define BUFFER_SIZE (150 * SECTOR_SIZE)
> > 
> > void read_from_pipe(int fd, const char *filename, size_t size)
> > {
> >     char buffer[SECTOR_SIZE];
> >     size_t sz;
> >     ssize_t ret;
> > 
> >     while (size) {
> >         sz = size;
> >         if (sz > sizeof buffer)
> >             sz = sizeof buffer;
> >         ret = read(fd, buffer, sz);
> >         if (ret < 0)
> >             err(1, "read: %s", filename);
> >         if (ret == 0) {
> >             fprintf(stderr, "read: %s: unexpected EOF\n", filename);
> >             exit(1);
> >         }
> >         size -= sz;
> >     }
> > }
> > 
> > void do_splice1(int fd, const char *filename, size_t size)
> > {
> >     bool retried = false;
> >     int pipefd[2];
> > 
> >     if (pipe(pipefd) == -1)
> >         err(1, "pipe");
> >     while (size) {
> >         ssize_t spliced;
> > 
> >         spliced = splice(fd, NULL, pipefd[1], NULL, size, SPLICE_F_MOVE);
> >         if (spliced == -1) {
> >             if (errno == EAGAIN && !retried) {
> >                 retried = true;
> >                 fprintf(stderr, "retrying splice\n");
> >                 sleep(1);
> >                 continue;
> >             }
> >             err(1, "splice");
> >         }
> >         read_from_pipe(pipefd[0], filename, spliced);
> >         size -= spliced;
> >     }
> >     close(pipefd[0]);
> >     close(pipefd[1]);
> > }
> > 
> > void do_splice2(int fd, const char *filename, size_t size)
> > {
> >     bool retried = false;
> >     int pipefd[2];
> >     int pid;
> > 
> >     if (pipe(pipefd) == -1)
> >         err(1, "pipe");
> > 
> >     pid = fork();
> >     if (pid == 0) {
> >         close(pipefd[1]);
> >         read_from_pipe(pipefd[0], filename, size);
> >         exit(0);
> >     } else {
> >         close(pipefd[0]);
> >         while (size) {
> >             ssize_t spliced;
> > 
> >             spliced = splice(fd, NULL, pipefd[1], NULL, size, SPLICE_F_MOVE);
> >             if (spliced == -1) {
> >                 if (errno == EAGAIN && !retried) {
> >                     retried = true;
> >                     fprintf(stderr, "retrying splice\n");
> >                     sleep(1);
> >                     continue;
> >                 }
> >                 err(1, "splice");
> >             }
> >             size -= spliced;
> >         }
> >         close(pipefd[1]);
> >         waitpid(pid, NULL, 0);
> >     }
> > }
> > 
> > void usage(const char *argv0)
> > {
> >     fprintf(stderr, "USAGE: %s [-rd] {filename}\n", basename(argv0));
> >     exit(2);
> > }
> > 
> > int main(int argc, char *argv[])
> > {
> >     void (*do_splice)(int fd, const char *filename, size_t size);
> >     const char *filename;
> >     char *buffer;
> >     int opt, open_flags, fd;
> >     ssize_t ret;
> > 
> >     do_splice = do_splice1;
> >     open_flags = O_CREAT | O_TRUNC | O_RDWR | O_DIRECT;
> > 
> >     while ((opt = getopt(argc, argv, "rd")) != -1) {
> >         switch(opt) {
> >         case 'r':
> >             do_splice = do_splice2;
> >             break;
> >         case 'd':
> >             open_flags &= ~O_DIRECT;
> >             break;
> >         default:  /* '?' */
> >             usage(argv[0]);
> >         }
> >     }
> > 
> >     if (optind >= argc)
> >         usage(argv[0]);
> >     filename = argv[optind];
> > 
> >     printf("%s reader %s O_DIRECT\n",
> >            do_splice == do_splice1 ? "sequential" : "concurrent",
> >            (open_flags & O_DIRECT) ? "with" : "without");
> > 
> >     buffer = aligned_alloc(SECTOR_SIZE, BUFFER_SIZE);
> >     if (buffer == NULL)
> >         err(1, "aligned_alloc");
> > 
> >     fd = open(filename, open_flags, 0666);
> >     if (fd == -1)
> >         err(1, "open: %s", filename);
> > 
> >     memset(buffer, 'x', BUFFER_SIZE);
> >     ret = write(fd, buffer, BUFFER_SIZE);
> >     if (ret < 0)
> >         err(1, "write: %s", filename);
> >     if (ret != BUFFER_SIZE) {
> >         fprintf(stderr, "%s: short write\n", filename);
> >         exit(1);
> >     }
> > 
> >     ret = lseek(fd, 0, SEEK_SET);
> >     if (ret != 0)
> >         err(1, "lseek: %s", filename);
> > 
> >     do_splice(fd, filename, BUFFER_SIZE);
> > 
> >     if (unlink(filename) == -1)
> >         err(1, "unlink: %s", filename);
> > 
> >     return 0;
> > }
> > =================================== 8< ===================================

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2019-08-28 14:37       ` Andreas Grünbacher
@ 2019-08-29  3:12         ` Darrick J. Wong
  2019-08-29 11:49           ` Andreas Grünbacher
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2019-08-29  3:12 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Amir Goldstein, Dave Chinner, jencce.kernel, linux-xfs,
	overlayfs, Zorro Lang, fstests, linux-fsdevel, Christoph Hellwig,
	cluster-devel

On Wed, Aug 28, 2019 at 04:37:59PM +0200, Andreas Grünbacher wrote:
> Am Mi., 28. Aug. 2019 um 16:23 Uhr schrieb Darrick J. Wong
> <darrick.wong@oracle.com>:
> > On Wed, Aug 21, 2019 at 10:23:49PM +0200, Andreas Grünbacher wrote:
> > > Hi Darrick,
> > >
> > > Am So., 2. Dez. 2018 um 19:13 Uhr schrieb Darrick J. Wong
> > > <darrick.wong@oracle.com>:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > In commit 4721a601099, we tried to fix a problem wherein directio reads
> > > > into a splice pipe will bounce EFAULT/EAGAIN all the way out to
> > > > userspace by simulating a zero-byte short read.  This happens because
> > > > some directio read implementations (xfs) will call
> > > > bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
> > > > reads, but as soon as we run out of pipe buffers that _get_pages call
> > > > returns EFAULT, which the splice code translates to EAGAIN and bounces
> > > > out to userspace.
> > > >
> > > > In that commit, the iomap code catches the EFAULT and simulates a
> > > > zero-byte read, but that causes assertion errors on regular splice reads
> > > > because xfs doesn't allow short directio reads.  This causes infinite
> > > > splice() loops and assertion failures on generic/095 on overlayfs
> > > > because xfs only permit total success or total failure of a directio
> > > > operation.  The underlying issue in the pipe splice code has now been
> > > > fixed by changing the pipe splice loop to avoid avoid reading more data
> > > > than there is space in the pipe.
> > > >
> > > > Therefore, it's no longer necessary to simulate the short directio, so
> > > > remove the hack from iomap.
> > > >
> > > > Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
> > > > Reported-by: Amir Goldstein <amir73il@gmail.com>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > v2: split into two patches per hch request
> > > > ---
> > > >  fs/iomap.c |    9 ---------
> > > >  1 file changed, 9 deletions(-)
> > > >
> > > > diff --git a/fs/iomap.c b/fs/iomap.c
> > > > index 3ffb776fbebe..d6bc98ae8d35 100644
> > > > --- a/fs/iomap.c
> > > > +++ b/fs/iomap.c
> > > > @@ -1877,15 +1877,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > > >                                 dio->wait_for_completion = true;
> > > >                                 ret = 0;
> > > >                         }
> > > > -
> > > > -                       /*
> > > > -                        * Splicing to pipes can fail on a full pipe. We have to
> > > > -                        * swallow this to make it look like a short IO
> > > > -                        * otherwise the higher splice layers will completely
> > > > -                        * mishandle the error and stop moving data.
> > > > -                        */
> > > > -                       if (ret == -EFAULT)
> > > > -                               ret = 0;
> > > >                         break;
> > > >                 }
> > > >                 pos += ret;
> > >
> > > I'm afraid this breaks the following test case on xfs and gfs2, the
> > > two current users of iomap_dio_rw.
> >
> > Hmm, I had kinda wondered if regular pipes still needed this help.
> > Evidently we don't have a lot of splice tests in fstests. :(
> 
> So what do you suggest as a fix?

(See below)

> > > Here, the splice system call fails with errno = EAGAIN when trying to
> > > "move data" from a file opened with O_DIRECT into a pipe.
> > >
> > > The test case can be run with option -d to not use O_DIRECT, which
> > > makes the test succeed.
> > >
> > > The -r option switches from reading from the pipe sequentially to
> > > reading concurrently with the splice, which doesn't change the
> > > behavior.
> > >
> > > Any thoughts?
> >
> > This would be great as an xfstest! :)
> 
> Or perhaps something generalized from it.
> 
> > Do you have one ready to go, or should I just make one from the source
> > code?
> 
> The bug originally triggered in our internal cluster test system and
> I've recreated the test case I've included from the strace. That's all
> I have for now; feel free to take it, of course.
> 
> It could be that the same condition can be triggered with one of the
> existing utilities (fio/fsstress/...).

Hm, so I made an xfstest out of the program you sent me, and indeed
reverting that chunk makes the failure go away, but that got me
wondering -- that iomap kludge was a workaround for the splice code
telling iomap to try to stuff XXXX bytes into a pipe that only has X
bytes of free buffer space.  We fixed splice_direct_to_actor to clamp
the length parameter to the available pipe space, but we never did the
same to do_splice:

	/* Don't try to read more the pipe has space for. */
	read_len = min_t(size_t, len,
			 (pipe->buffers - pipe->nrbufs) << PAGE_SHIFT);
	ret = do_splice_to(in, &pos, pipe, read_len, flags);

Applying similar logic to the two (opipe != NULL) cases of do_splice()
seem to make the EAGAIN problem go away too.  So why don't we teach
do_splice to only ask for as many bytes as the pipe has space here too?

Does the following patch fix it for you?

--D

From: Darrick J. Wong <darrick.wong@oracle.com>
Subject: [PATCH] splice: only read in as much information as there is pipe buffer space

Andreas Gruenbacher reports that on the two filesystems that support
iomap directio, it's possible for splice() to return -EAGAIN (instead of
a short splice) if the pipe being written to has less space available in
its pipe buffers than the length supplied by the calling process.

Months ago we fixed splice_direct_to_actor to clamp the length of the
read request to the size of the splice pipe.  Do the same to do_splice.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/splice.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 98412721f056..50335515d7c1 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1101,6 +1101,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 	struct pipe_inode_info *ipipe;
 	struct pipe_inode_info *opipe;
 	loff_t offset;
+	unsigned int pipe_pages;
 	long ret;
 
 	ipipe = get_pipe_info(in);
@@ -1123,6 +1124,10 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 		if ((in->f_flags | out->f_flags) & O_NONBLOCK)
 			flags |= SPLICE_F_NONBLOCK;
 
+		/* Don't try to read more the pipe has space for. */
+		pipe_pages = opipe->buffers - opipe->nrbufs;
+		len = min_t(size_t, len, pipe_pages << PAGE_SHIFT);
+
 		return splice_pipe_to_pipe(ipipe, opipe, len, flags);
 	}
 
@@ -1180,8 +1185,13 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 
 		pipe_lock(opipe);
 		ret = wait_for_space(opipe, flags);
-		if (!ret)
+		if (!ret) {
+			/* Don't try to read more the pipe has space for. */
+			pipe_pages = opipe->buffers - opipe->nrbufs;
+			len = min_t(size_t, len, pipe_pages << PAGE_SHIFT);
+
 			ret = do_splice_to(in, &offset, opipe, len, flags);
+		}
 		pipe_unlock(opipe);
 		if (ret > 0)
 			wakeup_pipe_readers(opipe);

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  2019-08-29  3:12         ` Darrick J. Wong
@ 2019-08-29 11:49           ` Andreas Grünbacher
  0 siblings, 0 replies; 9+ messages in thread
From: Andreas Grünbacher @ 2019-08-29 11:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Amir Goldstein, Dave Chinner, jencce.kernel, linux-xfs,
	overlayfs, Zorro Lang, fstests, linux-fsdevel, Christoph Hellwig,
	cluster-devel

Hi Darrick,

Am Do., 29. Aug. 2019 um 05:12 Uhr schrieb Darrick J. Wong
<darrick.wong@oracle.com>:
> Hm, so I made an xfstest out of the program you sent me, and indeed
> reverting that chunk makes the failure go away, but that got me
> wondering -- that iomap kludge was a workaround for the splice code
> telling iomap to try to stuff XXXX bytes into a pipe that only has X
> bytes of free buffer space.  We fixed splice_direct_to_actor to clamp
> the length parameter to the available pipe space, but we never did the
> same to do_splice:
>
>         /* Don't try to read more the pipe has space for. */
>         read_len = min_t(size_t, len,
>                          (pipe->buffers - pipe->nrbufs) << PAGE_SHIFT);
>         ret = do_splice_to(in, &pos, pipe, read_len, flags);
>
> Applying similar logic to the two (opipe != NULL) cases of do_splice()
> seem to make the EAGAIN problem go away too.  So why don't we teach
> do_splice to only ask for as many bytes as the pipe has space here too?
>
> Does the following patch fix it for you?

Yes, that works, thank you.

> From: Darrick J. Wong <darrick.wong@oracle.com>
> Subject: [PATCH] splice: only read in as much information as there is pipe buffer space
>
> Andreas Gruenbacher reports that on the two filesystems that support
> iomap directio, it's possible for splice() to return -EAGAIN (instead of
> a short splice) if the pipe being written to has less space available in
> its pipe buffers than the length supplied by the calling process.
>
> Months ago we fixed splice_direct_to_actor to clamp the length of the
> read request to the size of the splice pipe.  Do the same to do_splice.

Can you add a reference to that commit here (17614445576b6)?

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/splice.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index 98412721f056..50335515d7c1 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1101,6 +1101,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
>         struct pipe_inode_info *ipipe;
>         struct pipe_inode_info *opipe;
>         loff_t offset;
> +       unsigned int pipe_pages;
>         long ret;
>
>         ipipe = get_pipe_info(in);
> @@ -1123,6 +1124,10 @@ static long do_splice(struct file *in, loff_t __user *off_in,
>                 if ((in->f_flags | out->f_flags) & O_NONBLOCK)
>                         flags |= SPLICE_F_NONBLOCK;
>
> +               /* Don't try to read more the pipe has space for. */
> +               pipe_pages = opipe->buffers - opipe->nrbufs;
> +               len = min_t(size_t, len, pipe_pages << PAGE_SHIFT);

This should probably be min(len, (size_t)pipe_pages << PAGE_SHIFT).
Same for the second min_t here and the one added by commit
17614445576b6.

> +
>                 return splice_pipe_to_pipe(ipipe, opipe, len, flags);
>         }
>
> @@ -1180,8 +1185,13 @@ static long do_splice(struct file *in, loff_t __user *off_in,
>
>                 pipe_lock(opipe);
>                 ret = wait_for_space(opipe, flags);
> -               if (!ret)
> +               if (!ret) {
> +                       /* Don't try to read more the pipe has space for. */
> +                       pipe_pages = opipe->buffers - opipe->nrbufs;
> +                       len = min_t(size_t, len, pipe_pages << PAGE_SHIFT);
> +
>                         ret = do_splice_to(in, &offset, opipe, len, flags);
> +               }
>                 pipe_unlock(opipe);
>                 if (ret > 0)
>                         wakeup_pipe_readers(opipe);

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-08-29 11:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-02 18:08 [PATCH v2 1/2] splice: don't read more than available pipe space Darrick J. Wong
2018-12-02 18:10 ` [PATCH v2 2/2] iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) Darrick J. Wong
2018-12-02 19:37   ` Amir Goldstein
2019-08-21 20:23   ` Andreas Grünbacher
2019-08-28 14:23     ` Darrick J. Wong
2019-08-28 14:37       ` Andreas Grünbacher
2019-08-29  3:12         ` Darrick J. Wong
2019-08-29 11:49           ` Andreas Grünbacher
2019-08-29  1:36       ` Zorro Lang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).