linux-kernel.vger.kernel.org archive mirror
* [PATCH] pipe: Make a partially-satisfied blocking read wait for more
@ 2023-06-23 22:34 David Howells
  2023-06-23 22:41 ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: David Howells @ 2023-06-23 22:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Franck Grosjean, Phil Auld, Alexander Viro,
	Christian Brauner, linux-fsdevel, linux-kernel

Hi Linus,

Can you consider merging something like the attached patch?  Unfortunately,
there are applications out there that depend on a read from pipe() waiting
until the buffer is full under some circumstances.  Patch a28c8b9db8a1
removed the conditionality on there being an attached writer.

I'm not sure this is the best solution though as it goes over the other way
and will now block reads for which there isn't an active writer - and I'm
sure that, somewhere, there's an app that will break on that.

Thanks,
David
---
pipe: Make a partially-satisfied blocking read wait for more data

A read on a pipe may return short after reading some data from a pipe, even
though the pipe isn't non-blocking.  This is stated in the read(2) manual
page:

    ... It is not an error if this number is smaller than the number of
    bytes requested; this may happen for example because fewer bytes are
    actually available right now (maybe because we were close to
    end-of-file, or because we are reading from a pipe, or from a
    terminal)...

However, some applications depend on a blocking read on a pipe not
returning until it fills the buffer unless it hits EOF or a signal occurs -
at least as long as there's an active writer on the other end.

Fix the pipe reader to restore this behaviour by only breaking out with a
short read in the non-block (and signal) cases.

Here's a reproducer for it:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/uio.h>

    /* Linux-specific fcntl command; defined by hand to avoid _GNU_SOURCE. */
    #ifndef F_GETPIPE_SZ
    #define F_GETPIPE_SZ 1032
    #endif

    int main(int argc, char *argv[])
    {
        int fildes[2];
        if (pipe(fildes) == -1) {
            perror("in pipe");
            return -1;
        }
        /* Report the pipe capacity as seen from each end. */
        printf("%d %d\n",
               fcntl(fildes[0], F_GETPIPE_SZ),
               fcntl(fildes[1], F_GETPIPE_SZ));
        if (fork() != 0) {
            /* Parent: one blocking read, larger than the pipe capacity. */
            void *tata = malloc(100000);
            ssize_t res = read(fildes[0], tata, 100000);
            printf("could read %zd bytes\n", res);
            return -1;
        }
        /* Child: one write, larger than the pipe capacity. */
        void *toto = malloc(100000);
        struct iovec iov;
        iov.iov_base = toto;
        iov.iov_len = 100000;
        ssize_t d = writev(fildes[1], &iov, 1);
        if (d == -1) {
            perror("in writev");
            return -1;
        }
        printf("could write %zd bytes\n", d);
        sleep(1);
        return 0;
    }

It should show the same amount read as written, but shows a short read because
the pipe capacity isn't sufficient.

Fixes: a28c8b9db8a1 ("pipe: remove 'waiting_writers' merging logic")
Reported-by: Franck Grosjean <fgrosjea@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: linux-fsdevel@vger.kernel.org
---
 fs/pipe.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 2d88f73f585a..c5c992f19d28 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -340,11 +340,10 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)

 		if (!pipe->writers)
 			break;
-		if (ret)
-			break;
 		if ((filp->f_flags & O_NONBLOCK) ||
 		    (iocb->ki_flags & IOCB_NOWAIT)) {
-			ret = -EAGAIN;
+			if (!ret)
+				ret = -EAGAIN;
 			break;
 		}
 		__pipe_unlock(pipe);



* Re: [PATCH] pipe: Make a partially-satisfied blocking read wait for more
  2023-06-23 22:34 [PATCH] pipe: Make a partially-satisfied blocking read wait for more David Howells
@ 2023-06-23 22:41 ` Linus Torvalds
  2023-06-23 23:08   ` Linus Torvalds
  2023-06-26  9:31   ` David Laight
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2023-06-23 22:41 UTC (permalink / raw)
  To: David Howells
  Cc: Franck Grosjean, Phil Auld, Alexander Viro, Christian Brauner,
	linux-fsdevel, linux-kernel

On Fri, 23 Jun 2023 at 15:34, David Howells <dhowells@redhat.com> wrote:
>
> Can you consider merging something like the attached patch?  Unfortunately,
> there are applications out there that depend on a read from pipe() waiting
> until the buffer is full under some circumstances.  Patch a28c8b9db8a1
> removed the conditionality on there being an attached writer.

This patch seems actively wrong, in that now it's possibly waiting for
data that won't come, even if it's nonblocking.

What are these alleged broken apps? That commit a28c8b9db8a1 ("pipe:
remove 'waiting_writers' merging logic") is 3+ years old, and we
haven't heard people complain about it.

POSIX guarantees PIPE_BUF data, but that's 4kB. Your made-up test-case
is not valid, and never has been.

Yes, we used to do that write merging for performance reasons to avoid
extra system calls. And yes, we'll maintain semantics if people
actually end up having broken apps that depend on them, but I want to
know *what* broken app depends on this before I re-instate the write
merging.

And if we do re-instate it, I'm afraid we will have to do so with that
whole "waiting_writers" logic, so that we don't have the "reader waits
for data that might not come".

Because that patch of yours seems really broken. Nobody has *ever*
said "a read() on a pipe will always satisfy the whole buffer". That's
just completely bogus.

So let's name and shame the code that actually depended on it. And
maybe we'll have to revert commit a28c8b9db8a1, but after three+ years
of nobody reporting it I'm not really super-convinced.

               Linus
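
For code that needs a full buffer, the portable approach implied above is
to loop in user space rather than rely on a single read() filling it.  A
minimal sketch follows; read_full() is a hypothetical helper name, not
something from the patch or from any library, and it assumes a blocking
file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_full() is a hypothetical helper, not part of the patch.
 * It keeps calling read() until `count` bytes have arrived, EOF is
 * reached, or a real error occurs, so the caller never depends on one
 * read() call satisfying the whole request.
 * Returns the number of bytes read (short only at EOF), or -1 on error.
 */
static ssize_t read_full(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL);

	while (done < count) {
		ssize_t n = read(fd, p + done, count - done);

		if (n == 0)			/* EOF: every writer has closed */
			break;
		if (n < 0) {
			if (errno == EINTR)	/* interrupted by a signal, retry */
				continue;
			return -1;		/* errno is left as set by read() */
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

With a helper along these lines, the reader in the reproducer would see
all 100000 bytes regardless of the pipe capacity, because each short
read is simply followed by another one.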


* Re: [PATCH] pipe: Make a partially-satisfied blocking read wait for more
  2023-06-23 22:41 ` Linus Torvalds
@ 2023-06-23 23:08   ` Linus Torvalds
  2023-06-23 23:32     ` Linus Torvalds
  2023-06-26  9:31   ` David Laight
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2023-06-23 23:08 UTC (permalink / raw)
  To: David Howells
  Cc: Franck Grosjean, Phil Auld, Alexander Viro, Christian Brauner,
	linux-fsdevel, linux-kernel

On Fri, 23 Jun 2023 at 15:41, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This patch seems actively wrong, in that now it's possibly waiting for
> data that won't come, even if it's nonblocking.

In fact, I'd expect that patch to fail immediately on a perfectly
normal program that passes a token around by doing a small write to a
pipe, and have the "token reader" do a bigger write.

Blocking on read(), waiting for more data, would be blocking forever.
The read already got the token, there isn't going to be anything else.

So I'm pretty sure that patch is completely wrong, and whatever
program is "fixed" by it is very very buggy.

Again - we do have the rule that regressions are regressions even for
buggy user space, but when it's been 3+ years and you don't even
mention the broken app, I am not impressed.

             Linus


* Re: [PATCH] pipe: Make a partially-satisfied blocking read wait for more
  2023-06-23 23:08   ` Linus Torvalds
@ 2023-06-23 23:32     ` Linus Torvalds
  2023-06-26  9:16       ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2023-06-23 23:32 UTC (permalink / raw)
  To: David Howells
  Cc: Franck Grosjean, Phil Auld, Alexander Viro, Christian Brauner,
	linux-fsdevel, linux-kernel

On Fri, 23 Jun 2023 at 16:08, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In fact, I'd expect that patch to fail immediately on a perfectly
> normal program that passes a token around by doing a small write to a
> pipe, and have the "token reader" do a bigger write.

Bigger _read_, of course.

This might be hidden by such programs typically doing a single byte
write and a single byte read, but I could easily imagine situations
where people actually depend on the POSIX atomicity guarantees, ie you
write a "token packet" that might be variable-sized, and the reader
then just does a maximally sized read, knowing that it will get a full
packet or nothing.

So a read() of a pipe absolutely has to return when it has gotten
*any* data. Except if it can know that there is a writer that is still
busy and still in the process of writing more data.

Which was that old 'pipe->waiting_writers' logic - it basically
counted "there are <N> active writers that still have more data to
write, but the buffer filled up".

That logic went back to ancient times, when our pipe buffer was just a
single page - so it helped throughput immensely if we had writers that
did big writes, and readers would continue to read even when the small
buffer was completely used up (rather than return data just one page
at a time for each read() system call).

               Linus
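
The token-passing case described above can be sketched as follows.  This
is only an illustration: token_roundtrip() is a hypothetical name and the
4096-byte read size is an arbitrary choice.  On a kernel without the
proposed patch the read should return as soon as the token is available.
With the patch as posted it would be expected to block instead, since
the write end is still open and no further data is coming.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * token_roundtrip() is a hypothetical demo, not code from the thread.
 * It writes a short "token" into a fresh pipe, then issues a much
 * larger blocking read while the write end is still open.
 * Returns the byte count reported by read(), or -1 on failure.
 */
static ssize_t token_roundtrip(const char *token, size_t len)
{
	int fds[2];
	char buf[4096];
	ssize_t n;

	/* An empty token would leave the read with nothing to return. */
	assert(len > 0 && len <= sizeof(buf));

	if (pipe(fds) == -1)
		return -1;
	if (write(fds[1], token, len) != (ssize_t)len) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* The writer is idle but still attached: read() returns what is there. */
	n = read(fds[0], buf, sizeof(buf));

	close(fds[0]);
	close(fds[1]);
	return n;
}
```

Calling token_roundtrip("token", 5) is expected to return 5, not wait
for the remaining 4091 bytes of the buffer to be filled.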


* RE: [PATCH] pipe: Make a partially-satisfied blocking read wait for more
  2023-06-23 23:32     ` Linus Torvalds
@ 2023-06-26  9:16       ` David Laight
  0 siblings, 0 replies; 6+ messages in thread
From: David Laight @ 2023-06-26  9:16 UTC (permalink / raw)
  To: 'Linus Torvalds', David Howells
  Cc: Franck Grosjean, Phil Auld, Alexander Viro, Christian Brauner,
	linux-fsdevel, linux-kernel

From: Linus Torvalds
> Sent: 24 June 2023 00:32
> 
> On Fri, 23 Jun 2023 at 16:08, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > In fact, I'd expect that patch to fail immediately on a perfectly
> > normal program that passes a token around by doing a small write to a
> > pipe, and have the "token reader" do a bigger write.
> 
> Bigger _read_, of course.
> 
> This might be hidden by such programs typically doing a single byte
> write and a single byte read, but I could easily imagine situations
> where people actually depend on the POSIX atomicity guarantees, ie you
> write a "token packet" that might be variable-sized, and the reader
> then just does a maximally sized read, knowing that it will get a full
> packet or nothing.

There are definitely programs that just do a large read in order
to consume all the single byte 'wakeup' writes.

(The 'must check' on these reads is a right PITA.)

They ought to set the pipe non-blocking, but I suspect many
don't - because it all works anyway.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
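
The non-blocking variant suggested above might look like the sketch
below.  drain_wakeups() is a hypothetical helper name and the 256-byte
scratch buffer is an arbitrary choice; it assumes the caller is happy
for the descriptor to stay in O_NONBLOCK mode afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * drain_wakeups() is a hypothetical helper, not code from the thread.
 * It sets O_NONBLOCK on the read end and consumes every pending wakeup
 * byte, so it never sleeps waiting for data that will not arrive.
 * Returns the number of bytes consumed, or -1 on a real error.
 */
static ssize_t drain_wakeups(int fd)
{
	char buf[256];
	ssize_t total = 0;
	int flags;

	assert(fd >= 0);

	flags = fcntl(fd, F_GETFL);
	if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return -1;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)			/* every writer has closed */
			break;
		if (errno == EINTR)		/* interrupted by a signal, retry */
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			break;			/* pipe is empty, nothing left */
		return -1;
	}
	return total;
}
```

Because the descriptor is non-blocking, this behaves the same whether or
not a blocking read would wait for more data after a partial result.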


* RE: [PATCH] pipe: Make a partially-satisfied blocking read wait for more
  2023-06-23 22:41 ` Linus Torvalds
  2023-06-23 23:08   ` Linus Torvalds
@ 2023-06-26  9:31   ` David Laight
  1 sibling, 0 replies; 6+ messages in thread
From: David Laight @ 2023-06-26  9:31 UTC (permalink / raw)
  To: 'Linus Torvalds', David Howells
  Cc: Franck Grosjean, Phil Auld, Alexander Viro, Christian Brauner,
	linux-fsdevel, linux-kernel

From: Linus Torvalds
> Sent: 23 June 2023 23:42
> 
> On Fri, 23 Jun 2023 at 15:34, David Howells <dhowells@redhat.com> wrote:
> >
> > Can you consider merging something like the attached patch?  Unfortunately,
> > there are applications out there that depend on a read from pipe() waiting
> > until the buffer is full under some circumstances.  Patch a28c8b9db8a1
> > removed the conditionality on there being an attached writer.
> 
> This patch seems actively wrong, in that now it's possibly waiting for
> data that won't come, even if it's nonblocking.

I think it pretty much breaks:
	command | tee file
where 'command' is careful to fflush(stdout).

	David


