* splice() from /dev/zero to a pipe does not work (5.9+)
@ 2021-05-07 18:05 Colin Ian King
2021-05-07 18:21 ` Kees Cook
0 siblings, 1 reply; 7+ messages in thread
From: Colin Ian King @ 2021-05-07 18:05 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kees Cook, Al Viro, linux-fsdevel
Hi,
While doing some micro benchmarking with stress-ng I discovered that
since linux 5.9 the splicing from /dev/zero to a pipe now fails with
-EINVAL.
I bisected this down to the following commit:
36e2c7421f02a22f71c9283e55fdb672a9eb58e7 is the first bad commit
commit 36e2c7421f02a22f71c9283e55fdb672a9eb58e7
Author: Christoph Hellwig <hch@lst.de>
Date: Thu Sep 3 16:22:34 2020 +0200
fs: don't allow splice read/write without explicit ops
I'm not sure if this has been reported before, or if it's intentional
behavior or not. As it stands, it's a regression in the stress-ng splice
test case.
Prior to that commit, splicing worked from /dev/zero to a pipe. Below is
an example of the reproducer:
--- reproducer below ---
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#define FAIL(x) { perror(x); exit(1); }
int main(void)
{
int fd_in, fd_out, fds[2];
ssize_t ret;
loff_t off_in, off_out;
const size_t len = 4096;
if ((fd_in = open("/dev/zero", O_RDONLY)) < 0)
FAIL("open /dev/zero failed");
/*
* /dev/zero -> pipe splice -> pipe splice -> /dev/null
*/
if (pipe(fds) < 0)
FAIL("pipe FAILed\n");
if ((fd_out = open("/dev/null", O_WRONLY)) < 0)
FAIL("open /dev/null failed");
ret = splice(fd_in, NULL, fds[1], NULL, len, SPLICE_F_MOVE);
if (ret < 0)
FAIL("splice failed");
ret = splice(fds[0], NULL, fd_out, NULL, len, SPLICE_F_MOVE);
if (ret < 0)
FAIL("splice failed");
return 0;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: splice() from /dev/zero to a pipe does not work (5.9+)
2021-05-07 18:05 splice() from /dev/zero to a pipe does not work (5.9+) Colin Ian King
@ 2021-05-07 18:21 ` Kees Cook
2021-05-07 19:06 ` Linus Torvalds
2021-05-12 15:13 ` Colin Ian King
0 siblings, 2 replies; 7+ messages in thread
From: Kees Cook @ 2021-05-07 18:21 UTC (permalink / raw)
To: Colin Ian King
Cc: Christoph Hellwig, Al Viro, Linus Torvalds, Johannes Berg,
linux-fsdevel, LKML
On Fri, May 07, 2021 at 07:05:51PM +0100, Colin Ian King wrote:
> Hi,
>
> While doing some micro benchmarking with stress-ng I discovered that
> since linux 5.9 the splicing from /dev/zero to a pipe now fails with
> -EINVAL.
>
> I bisected this down to the following commit:
>
> 36e2c7421f02a22f71c9283e55fdb672a9eb58e7 is the first bad commit
> commit 36e2c7421f02a22f71c9283e55fdb672a9eb58e7
> Author: Christoph Hellwig <hch@lst.de>
> Date: Thu Sep 3 16:22:34 2020 +0200
>
> fs: don't allow splice read/write without explicit ops
>
> I'm not sure if this has been reported before, or if it's intentional
> behavior or not. As it stands, it's a regression in the stress-ng splice
> test case.
The general loss of generic splice read/write is known. Here's some
early conversations I was involved in:
https://lore.kernel.org/lkml/20200818200725.GA1081@lst.de/
https://lore.kernel.org/lkml/202009181443.C2179FB@keescook/
https://lore.kernel.org/lkml/20201005204517.2652730-1-keescook@chromium.org/
And it's been getting re-implemented in individual places:
$ git log --oneline --no-merges --grep 36e2c742
42984af09afc jffs2: Hook up splice_write callback
a35d8f016e0b nilfs2: make splice write available again
f8ad8187c3b5 fs/pipe: allow sendfile() to pipe again
f2d6c2708bd8 kernfs: wire up ->splice_read and ->splice_write
9bb48c82aced tty: implement write_iter
dd78b0c483e3 tty: implement read_iter
14e3e989f6a5 proc mountinfo: make splice available again
c1048828c3db orangefs: add splice file operations
960f4f8a4e60 fs: 9p: add generic splice_write file operation
cf03f316ad20 fs: 9p: add generic splice_read file operations
06a17bbe1d47 afs: Fix copy_file_range()
So the question is likely, "do we want this for /dev/zero?"
--
Kees Cook
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: splice() from /dev/zero to a pipe does not work (5.9+)
2021-05-07 18:21 ` Kees Cook
@ 2021-05-07 19:06 ` Linus Torvalds
2021-05-07 19:17 ` Al Viro
2021-05-12 15:13 ` Colin Ian King
1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2021-05-07 19:06 UTC (permalink / raw)
To: Kees Cook
Cc: Colin Ian King, Christoph Hellwig, Al Viro, Johannes Berg,
linux-fsdevel, LKML
On Fri, May 7, 2021 at 11:21 AM Kees Cook <keescook@chromium.org> wrote:
>
> So the question is likely, "do we want this for /dev/zero?"
Well, /dev/zero should at least be safe, and I guess it's actually
interesting from a performance testing standpoint (ie useful for some
kind of "what is the overhead of the splice code with no data copy").
So I'll happily take a sane patch for /dev/zero, although I think it
probably only makes sense if it's made to use the zero page explicitly
(ie exactly for that "no data copy testing" case).
So very much *not* using generic_file_splice_read(), even if that
might be the one-liner.
/dev/zero should probably also use the (already existing)
splice_write_null() function for the .splice_write case.
Anybody willing to look into this? My gu feel is that it *should* be easy to do.
That said - looking at the current 'pipe_zero()', it uses
'push_pipe()' to actually allocation regular pages, and then clear
them.
Which is basically what a generic_file_splice_read() would do, and it
feels incredibly pointless and stupid to me.
I *think* we should be able to just do something like
len = size;
while (len > 0) {
struct pipe_buffer *buf;
unsigned int tail = pipe->tail;
unsigned int head = pipe->head;
unsigned int mask = pipe->ring_size - 1;
if (pipe_full(head, tail, pipe->max_usage))
break;
buf = &pipe->bufs[iter_head & p_mask];
buf->ops = &zero_pipe_buf_ops;
buf->page = ZERO_PAGE(0);
buf->offset = 0;
buf->len = min_t(ssize_t, len, PAGE_SIZE);
len -= buf->len;
pipe->head = head+1;
}
return size - len;
but honestly, I haven't thought a lot about it.
Al? This is another of those "right up your alley" things.
Maybe it's not worth it, and just using generic_file_splice_read() is
the way to go, but I do get the feeling that if we are splicing
/dev/null, the whole _point_ of it is about benchmarking, not "make it
work".
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: splice() from /dev/zero to a pipe does not work (5.9+)
2021-05-07 19:06 ` Linus Torvalds
@ 2021-05-07 19:17 ` Al Viro
2021-05-07 19:29 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2021-05-07 19:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: Kees Cook, Colin Ian King, Christoph Hellwig, Johannes Berg,
linux-fsdevel, LKML
On Fri, May 07, 2021 at 12:06:31PM -0700, Linus Torvalds wrote:
> That said - looking at the current 'pipe_zero()', it uses
> 'push_pipe()' to actually allocation regular pages, and then clear
> them.
>
> Which is basically what a generic_file_splice_read() would do, and it
> feels incredibly pointless and stupid to me.
>
> I *think* we should be able to just do something like
>
> len = size;
> while (len > 0) {
> struct pipe_buffer *buf;
> unsigned int tail = pipe->tail;
> unsigned int head = pipe->head;
> unsigned int mask = pipe->ring_size - 1;
>
> if (pipe_full(head, tail, pipe->max_usage))
> break;
> buf = &pipe->bufs[iter_head & p_mask];
> buf->ops = &zero_pipe_buf_ops;
> buf->page = ZERO_PAGE(0);
> buf->offset = 0;
> buf->len = min_t(ssize_t, len, PAGE_SIZE);
> len -= buf->len;
> pipe->head = head+1;
> }
> return size - len;
>
> but honestly, I haven't thought a lot about it.
>
> Al? This is another of those "right up your alley" things.
Umm... That would do wonders to anything that used to do
copy_to_user()/clear_user()/copy_to_user() and got converted
to copy_to_iter()/iov_iter_zero()/copy_to_iter()...
Are you sure we can shove zero page into pipe, anyway?
IIRC, get_page()/put_page() on that is not allowed, and
I'm not at all sure that nothing in e.g. fuse splice-related
logics would go ahead an do just that. Or am I confused
about the page refcounting for those?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: splice() from /dev/zero to a pipe does not work (5.9+)
2021-05-07 19:17 ` Al Viro
@ 2021-05-07 19:29 ` Linus Torvalds
2021-05-07 20:31 ` Al Viro
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2021-05-07 19:29 UTC (permalink / raw)
To: Al Viro
Cc: Kees Cook, Colin Ian King, Christoph Hellwig, Johannes Berg,
linux-fsdevel, LKML
On Fri, May 7, 2021 at 12:17 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm... That would do wonders to anything that used to do
> copy_to_user()/clear_user()/copy_to_user() and got converted
> to copy_to_iter()/iov_iter_zero()/copy_to_iter()...
I didn't mean for iov_iter_zero doing this - only splice_read_zero().
> Are you sure we can shove zero page into pipe, anyway?
> IIRC, get_page()/put_page() on that is not allowed,
That's what the
buf->ops = &zero_pipe_buf_ops;
is for. The zero_pipe_buf_ops would have empty get and release
functions, and a 'steal' function that always returns false.
That's how the pipe pages are supposed to work: there are people who
put non-page data (ie things like skbuff allocations etc) into a
splice pipe buffer. It's why we have those "ops" pointers.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: splice() from /dev/zero to a pipe does not work (5.9+)
2021-05-07 19:29 ` Linus Torvalds
@ 2021-05-07 20:31 ` Al Viro
0 siblings, 0 replies; 7+ messages in thread
From: Al Viro @ 2021-05-07 20:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Kees Cook, Colin Ian King, Christoph Hellwig, Johannes Berg,
linux-fsdevel, LKML
On Fri, May 07, 2021 at 12:29:44PM -0700, Linus Torvalds wrote:
> On Fri, May 7, 2021 at 12:17 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm... That would do wonders to anything that used to do
> > copy_to_user()/clear_user()/copy_to_user() and got converted
> > to copy_to_iter()/iov_iter_zero()/copy_to_iter()...
>
> I didn't mean for iov_iter_zero doing this - only splice_read_zero().
>
> > Are you sure we can shove zero page into pipe, anyway?
> > IIRC, get_page()/put_page() on that is not allowed,
>
> That's what the
>
> buf->ops = &zero_pipe_buf_ops;
>
> is for. The zero_pipe_buf_ops would have empty get and release
> functions, and a 'steal' function that always returns false.
>
> That's how the pipe pages are supposed to work: there are people who
> put non-page data (ie things like skbuff allocations etc) into a
> splice pipe buffer. It's why we have those "ops" pointers.
Supposed to - sure, but I'd like to verify that they actually do work
that way before we go there. Let me RTFS a bit...
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: splice() from /dev/zero to a pipe does not work (5.9+)
2021-05-07 18:21 ` Kees Cook
2021-05-07 19:06 ` Linus Torvalds
@ 2021-05-12 15:13 ` Colin Ian King
1 sibling, 0 replies; 7+ messages in thread
From: Colin Ian King @ 2021-05-12 15:13 UTC (permalink / raw)
To: Kees Cook
Cc: Christoph Hellwig, Al Viro, Linus Torvalds, Johannes Berg,
linux-fsdevel, LKML
On 07/05/2021 19:21, Kees Cook wrote:
> On Fri, May 07, 2021 at 07:05:51PM +0100, Colin Ian King wrote:
>> Hi,
>>
>> While doing some micro benchmarking with stress-ng I discovered that
>> since linux 5.9 the splicing from /dev/zero to a pipe now fails with
>> -EINVAL.
>>
>> I bisected this down to the following commit:
>>
>> 36e2c7421f02a22f71c9283e55fdb672a9eb58e7 is the first bad commit
>> commit 36e2c7421f02a22f71c9283e55fdb672a9eb58e7
>> Author: Christoph Hellwig <hch@lst.de>
>> Date: Thu Sep 3 16:22:34 2020 +0200
>>
>> fs: don't allow splice read/write without explicit ops
>>
>> I'm not sure if this has been reported before, or if it's intentional
>> behavior or not. As it stands, it's a regression in the stress-ng splice
>> test case.
>
> The general loss of generic splice read/write is known. Here's some
> early conversations I was involved in:
>
> https://lore.kernel.org/lkml/20200818200725.GA1081@lst.de/
> https://lore.kernel.org/lkml/202009181443.C2179FB@keescook/
> https://lore.kernel.org/lkml/20201005204517.2652730-1-keescook@chromium.org/
>
> And it's been getting re-implemented in individual places:
>
> $ git log --oneline --no-merges --grep 36e2c742
> 42984af09afc jffs2: Hook up splice_write callback
> a35d8f016e0b nilfs2: make splice write available again
> f8ad8187c3b5 fs/pipe: allow sendfile() to pipe again
> f2d6c2708bd8 kernfs: wire up ->splice_read and ->splice_write
> 9bb48c82aced tty: implement write_iter
> dd78b0c483e3 tty: implement read_iter
> 14e3e989f6a5 proc mountinfo: make splice available again
> c1048828c3db orangefs: add splice file operations
> 960f4f8a4e60 fs: 9p: add generic splice_write file operation
> cf03f316ad20 fs: 9p: add generic splice_read file operations
> 06a17bbe1d47 afs: Fix copy_file_range()
Ah..so this explains why copy_file_range() also returns -EINVAL now on
some file systems, such a minix since that uses splicing too via
do_splice_direct(). :-/
>
> So the question is likely, "do we want this for /dev/zero?"
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-05-12 15:17 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-07 18:05 splice() from /dev/zero to a pipe does not work (5.9+) Colin Ian King
2021-05-07 18:21 ` Kees Cook
2021-05-07 19:06 ` Linus Torvalds
2021-05-07 19:17 ` Al Viro
2021-05-07 19:29 ` Linus Torvalds
2021-05-07 20:31 ` Al Viro
2021-05-12 15:13 ` Colin Ian King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).