All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
@ 2009-11-05  9:59 Max Kellermann
  2009-11-05 10:12 ` Max Kellermann
  2009-11-05 10:30 ` Eric Dumazet
  0 siblings, 2 replies; 9+ messages in thread
From: Max Kellermann @ 2009-11-05  9:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, max

When splicing a large amount of bytes from a TCP socket to a pipe
(more than PIPE_BUFFERS), splice() can block, even though the pipe was
empty.  The correct behavior would be to copy as much as possible, and
return without blocking.  Block only if nothing can be transferred.
When the destination pipe is (initially) writable, splice() should do
the same with or without SPLICE_F_NONBLOCK.

The cause is the loop in tcp_splice_read(): it calls
__tcp_splice_read() (and thus skb_splice_bits() and splice_to_pipe())
again and again, until the requested number of bytes has been
transferred, or an error occurs.  In the first iteration, up to 64 kB
is copied, and the second iteration will block, because
splice_to_pipe() is called again and sees the pipe is already full.

This patch adds SPLICE_F_NONBLOCK to the splice flags after the first
iteration has finished successfully.  This prevents the second
splice_to_pipe() call from blocking.  The resulting EAGAIN error is
handled gracefully, and tcp_splice_read() returns the number of bytes
successfully moved.
---

 net/ipv4/tcp.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9114524..0f8b01f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -628,6 +628,11 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
 		    signal_pending(current))
 			break;
+
+		/* the following splice_to_pipe() calls should not
+		   block, because we have already successfully
+		   transferred at least one buffer */
+		tss.flags |= SPLICE_F_NONBLOCK;
 	}
 
 	release_sock(sk);


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05  9:59 [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced Max Kellermann
@ 2009-11-05 10:12 ` Max Kellermann
  2009-11-05 10:30 ` Eric Dumazet
  1 sibling, 0 replies; 9+ messages in thread
From: Max Kellermann @ 2009-11-05 10:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: jens.axboe, max

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

Hi,

here is a small test program for the bug.  The last splice() blocks.
Interestingly, if you close the client socket before splice(), it does
not block, and the number of bytes transferred is smaller.

In my patch submission, I forgot the Signed-off-by line - please use
the attached patch file instead.

Max

[-- Attachment #2: splice-block.c --]
[-- Type: text/x-csrc, Size: 3777 bytes --]

/*
 * This program demonstrates how a splice() from a TCP socket blocks,
 * even though the destination pipe is empty.
 *
 * Copyright 2009 Content Management AG, Cologne, Germany
 * Author: Max Kellermann <mk@cm4all.com>
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; version 2 of the
 * License.
 */

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static size_t fill_socket(int fd, size_t max_fill)
{
    static char buffer[4096];
    ssize_t nbytes;
    size_t sent = 0;

    do {
        nbytes = send(fd, buffer, sizeof(buffer), MSG_DONTWAIT);
        if (nbytes >= 0)
            sent += (size_t)nbytes;
        else if (errno == EAGAIN)
            /* the socket buffer is full */
            break;
        else {
            perror("send() failed");
            exit(1);
        }
    } while (sent < max_fill);

    printf("sent %zu bytes\n", sent);
    return sent;
}

static void run(size_t max_fill, size_t max_splice, unsigned splice_flags,
                bool early_close)
{
    int i, listen_socket, client_socket, server_socket, pipefds[2];
    struct sockaddr_in sa;
    size_t sent;
    ssize_t nbytes;

    /* set up socket and pipe */

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(1234);

    listen_socket = socket(AF_INET, SOCK_STREAM, 0);
    i = 1;
    if (listen_socket < 0 ||
        setsockopt(listen_socket, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i)) ||
        bind(listen_socket, (const struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(listen_socket, 1) < 0) {
        perror("failed to listen");
        exit(1);
    }

    client_socket = socket(AF_INET, SOCK_STREAM, 0);
    if (client_socket < 0 ||
        connect(client_socket, (const struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("failed to connect");
        exit(1);
    }

    server_socket = accept(listen_socket, NULL, 0);
    if (server_socket < 0) {
        perror("failed to accept");
        exit(1);
    }

    close(listen_socket);

    if (pipe(pipefds) < 0) {
        perror("pipe() failed");
        exit(1);
    }

    /* fill the socket buffer */

    sent = fill_socket(client_socket, max_fill);

    printf("%sclosing client socket\n", early_close ? "" : "not ");
    if (early_close)
        close(client_socket);

    /* now splice from the socket into the pipe, as much as
       possible */

    if (max_splice == 0)
        max_splice = sent;

    printf("invoking splice(%zu, 0x%x)\n",
           max_splice, splice_flags);

    nbytes = splice(server_socket, NULL, pipefds[1], NULL,
                    max_splice, splice_flags);
    if (nbytes >= 0)
        printf("splice(%zu, 0x%x) = %zi\n",
               max_splice, splice_flags, nbytes);
    if (nbytes < 0)
        printf("splice(%zu, 0x%x) failed: %s\n",
               max_splice, splice_flags, strerror(errno));

    if (!early_close)
        close(client_socket);

    close(server_socket);
    close(pipefds[0]);
    close(pipefds[1]);
}

int main(int argc, char **argv)
{
    (void)argc; (void)argv;

    run(4096, 0, 0, true);
    run(262144, 4096, 0, true);
    run(262144, 0, SPLICE_F_NONBLOCK, true);
    /* when closing the client socket, this does not block! */
    run(262144, 0, 0, true);

    run(4096, 0, 0, false);
    run(262144, 4096, 0, false);
    run(262144, 0, SPLICE_F_NONBLOCK, false);

    /* this will block (2.6.31, 2.6.32-rc6) */
    run(262144, 0, 0, false);

    return 0;
}

[-- Attachment #3: 0001-tcp-set-SPLICE_F_NONBLOCK-after-first-buffer-has-be.patch --]
[-- Type: text/x-diff, Size: 1924 bytes --]

>From 084ea2c0a1efdde6a1dc7f4bcabc32733fbac0ba Mon Sep 17 00:00:00 2001
From: Max Kellermann <mk@cm4all.com>
Date: Thu, 5 Nov 2009 11:05:25 +0100
Subject: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced

When splicing a large amount of bytes from a TCP socket to a pipe
(more than PIPE_BUFFERS), splice() can block, even though the pipe was
empty.  The correct behavior would be to copy as much as possible, and
return without blocking.  Block only if nothing can be transferred.
When the destination pipe is (initially) writable, splice() should do
the same with or without SPLICE_F_NONBLOCK.

The cause is the loop in tcp_splice_read(): it calls
__tcp_splice_read() (and thus skb_splice_bits() and splice_to_pipe())
again and again, until the requested number of bytes has been
transferred, or an error occurs.  In the first iteration, up to 64 kB
is copied, and the second iteration will block, because
splice_to_pipe() is called again and sees the pipe is already full.

This patch adds SPLICE_F_NONBLOCK to the splice flags after the first
iteration has finished successfully.  This prevents the second
splice_to_pipe() call from blocking.  The resulting EAGAIN error is
handled gracefully, and tcp_splice_read() returns the number of bytes
successfully moved.

Signed-off-by: Max Kellermann <mk@cm4all.com>
---
 net/ipv4/tcp.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9114524..0f8b01f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -628,6 +628,11 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
 		    signal_pending(current))
 			break;
+
+		/* the following splice_to_pipe() calls should not
+		   block, because we have already successfully
+		   transferred at least one buffer */
+		tss.flags |= SPLICE_F_NONBLOCK;
 	}
 
 	release_sock(sk);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05  9:59 [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced Max Kellermann
  2009-11-05 10:12 ` Max Kellermann
@ 2009-11-05 10:30 ` Eric Dumazet
  2009-11-05 10:57   ` Max Kellermann
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-11-05 10:30 UTC (permalink / raw)
  To: Max Kellermann; +Cc: linux-kernel, jens.axboe, max, Linux Netdev List

Max Kellermann a écrit :
> When splicing a large amount of bytes from a TCP socket to a pipe
> (more than PIPE_BUFFERS), splice() can block, even though the pipe was
> empty.  The correct behavior would be to copy as much as possible, and
> return without blocking.  Block only if nothing can be transferred.
> When the destination pipe is (initially) writable, splice() should do
> the same with or without SPLICE_F_NONBLOCK.
> 
> The cause is the loop in tcp_splice_read(): it calls
> __tcp_splice_read() (and thus skb_splice_bits() and splice_to_pipe())
> again and again, until the requested number of bytes has been
> transferred, or an error occurs.  In the first iteration, up to 64 kB
> is copied, and the second iteration will block, because
> splice_to_pipe() is called again and sees the pipe is already full.
> 
> This patch adds SPLICE_F_NONBLOCK to the splice flags after the first
> iteration has finished successfully.  This prevents the second
> splice_to_pipe() call from blocking.  The resulting EAGAIN error is
> handled gracefully, and tcp_splice_read() returns the number of bytes
> successfully moved.
> ---
> 
>  net/ipv4/tcp.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9114524..0f8b01f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -628,6 +628,11 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>  		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
>  		    signal_pending(current))
>  			break;
> +
> +		/* the following splice_to_pipe() calls should not
> +		   block, because we have already successfully
> +		   transferred at least one buffer */
> +		tss.flags |= SPLICE_F_NONBLOCK;
>  	}
>  
>  	release_sock(sk);
> 

CC netdev

I dont think this patch is correct. Could you describe your use case ?

If you dont want to block on output pipe, you should set this NONBLOCK 
flag before calling splice(SPLICE_F_NONBLOCK) syscall.

ie : Use the socket in blocking mode, but output pipe in non-blocking mode.

Some application could have a thread working in full blocking mode, and have another
thread reading the pipe (and eventually unblocking first thread)

Thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05 10:30 ` Eric Dumazet
@ 2009-11-05 10:57   ` Max Kellermann
  2009-11-05 11:21     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Max Kellermann @ 2009-11-05 10:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, jens.axboe, Linux Netdev List

On 2009/11/05 11:30, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I dont think this patch is correct. Could you describe your use case ?

See my second email, there's a demo source.

> If you dont want to block on output pipe, you should set this NONBLOCK 
> flag before calling splice(SPLICE_F_NONBLOCK) syscall.
> 
> ie : Use the socket in blocking mode, but output pipe in non-blocking mode.

Do you think that a splice() should block if the socket is readable
and the pipe is writable according to select()?

"The correct behavior would be to copy as much as possible, and return
without blocking.  Block only if nothing can be transferred."

Do you disagree with that?

> Some application could have a thread working in full blocking mode,
> and have another thread reading the pipe (and eventually unblocking
> first thread)

I don't get this objection.  Please explain.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05 10:57   ` Max Kellermann
@ 2009-11-05 11:21     ` Eric Dumazet
  2009-11-05 13:23         ` Max Kellermann
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-11-05 11:21 UTC (permalink / raw)
  To: Max Kellermann; +Cc: linux-kernel, jens.axboe, Linux Netdev List

Max Kellermann a écrit :
> On 2009/11/05 11:30, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> I dont think this patch is correct. Could you describe your use case ?
> 
> See my second email, there's a demo source.
> 
>> If you dont want to block on output pipe, you should set this NONBLOCK 
>> flag before calling splice(SPLICE_F_NONBLOCK) syscall.
>>
>> ie : Use the socket in blocking mode, but output pipe in non-blocking mode.
> 
> Do you think that a splice() should block if the socket is readable
> and the pipe is writable according to select()?
> 

Yes, this is perfectly legal

select() can return "OK to write on fd",
and still, write(fd, buffer, 10000000) is supposer/allowed to block if fd is not O_NDELAY

If you want to not block on fd, use O_NDELAY (if using write() syscall),
or SPLICE_F_NONBLOCK splice() flag ?

Please read recent commit on this area and why I think your patch conflicts with
this commit.

commit 42324c62704365d6a3e89138dea55909d2f26afe
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Thu Oct 1 15:26:00 2009 -0700

    net: splice() from tcp to pipe should take into account O_NONBLOCK

    tcp_splice_read() doesnt take into account socket's O_NONBLOCK flag

    Before this patch :

    splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
    causes a random endless block (if pipe is full) and
    splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
    will return 0 immediately if the TCP buffer is empty.

    User application has no way to instruct splice() that socket should be in blocking mode
    but pipe in nonblock more.

    Many projects cannot use splice(tcp -> pipe) because of this flaw.

    http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
    http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html

    Linus introduced  SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
    (splice: add SPLICE_F_NONBLOCK flag )

      It doesn't make the splice itself necessarily nonblocking (because the
      actual file descriptors that are spliced from/to may block unless they
      have the O_NONBLOCK flag set), but it makes the splice pipe operations
      nonblocking.

    Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only

    This patch instruct tcp_splice_read() to use the underlying file O_NONBLOCK
    flag, as other socket operations do.
    Users will then call :

    splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );

    to block on data coming from socket (if file is in blocking mode),
    and not block on pipe output (to avoid deadlock)

    First version of this patch was submitted by Octavian Purdila

    Reported-by: Volker Lendecke <vl@samba.org>
    Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Acked-by: Jens Axboe <jens.axboe@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05 11:21     ` Eric Dumazet
@ 2009-11-05 13:23         ` Max Kellermann
  0 siblings, 0 replies; 9+ messages in thread
From: Max Kellermann @ 2009-11-05 13:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, jens.axboe, Linux Netdev List

On 2009/11/05 12:21, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Max Kellermann a écrit :
> > Do you think that a splice() should block if the socket is readable
> > and the pipe is writable according to select()?
> > 
> 
> Yes, this is perfectly legal
> 
> select() can return "OK to write on fd",
> and still, write(fd, buffer, 10000000) is supposer/allowed to block if fd is not O_NDELAY

>From the select() manpage: "those in writefds will be watched to see
if a write will not block"

>From the poll() manpage: "Writing now will not block."

This looks unambiguous to me, and contradicts with your thesis.  Can
you provide sources?

What is your interpretation of the guarantees provided by select() and
poll()?  Which byte count is "ok" to write after POLLOUT, and how much
is "too much"?  How does the application know?

> Please read recent commit on this area and why I think your patch
> conflicts with this commit.

I understand your patch, but I don't understand the conflict with my
patch.  Can you describe a breakage caused by my patch?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
@ 2009-11-05 13:23         ` Max Kellermann
  0 siblings, 0 replies; 9+ messages in thread
From: Max Kellermann @ 2009-11-05 13:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, jens.axboe, Linux Netdev List

On 2009/11/05 12:21, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Max Kellermann a écrit :
> > Do you think that a splice() should block if the socket is readable
> > and the pipe is writable according to select()?
> > 
> 
> Yes, this is perfectly legal
> 
> select() can return "OK to write on fd",
> and still, write(fd, buffer, 10000000) is supposer/allowed to block if fd is not O_NDELAY

From the select() manpage: "those in writefds will be watched to see
if a write will not block"

From the poll() manpage: "Writing now will not block."

This looks unambiguous to me, and contradicts with your thesis.  Can
you provide sources?

What is your interpretation of the guarantees provided by select() and
poll()?  Which byte count is "ok" to write after POLLOUT, and how much
is "too much"?  How does the application know?

> Please read recent commit on this area and why I think your patch
> conflicts with this commit.

I understand your patch, but I don't understand the conflict with my
patch.  Can you describe a breakage caused by my patch?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05 13:23         ` Max Kellermann
  (?)
@ 2009-11-05 14:11         ` Eric Dumazet
  2009-11-05 14:33           ` Max Kellermann
  -1 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-11-05 14:11 UTC (permalink / raw)
  To: Max Kellermann; +Cc: linux-kernel, jens.axboe, Linux Netdev List

Max Kellermann a écrit :
> On 2009/11/05 12:21, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Max Kellermann a écrit :
>>> Do you think that a splice() should block if the socket is readable
>>> and the pipe is writable according to select()?
>>>
>> Yes, this is perfectly legal
>>
>> select() can return "OK to write on fd",
>> and still, write(fd, buffer, 10000000) is supposer/allowed to block if fd is not O_NDELAY
> 
>>From the select() manpage: "those in writefds will be watched to see
> if a write will not block"
> 
>>From the poll() manpage: "Writing now will not block."
> 
> This looks unambiguous to me, and contradicts with your thesis.  Can
> you provide sources?
> 
> What is your interpretation of the guarantees provided by select() and
> poll()?  Which byte count is "ok" to write after POLLOUT, and how much
> is "too much"?  How does the application know?

It cannot, therefore an application uses O_NDELAY to avoid blocking.

Try following program if you are not convinced

#include <unistd.h>
#include <sys/poll.h>
#include <stdio.h>

char buffer[1000000];

int main(int argc, char *argv[])
{
	int fds[2];
	struct pollfd pfd;
	int res;
	
	pipe(fds);
	pfd.fd = fds[1];
	pfd.events = POLLOUT;
	res = poll(&pfd, 1, -1);
	if (res > 0 && pfd.revents & POLLOUT)
		printf("OK to write on pipe\n");
	write(fds[1], buffer, sizeof(buffer)); // why it blocks, did poll() lied ???
	return 0;
}



> I understand your patch, but I don't understand the conflict with my
> patch.  Can you describe a breakage caused by my patch?

I only pointed out that using splice(tcp -> pipe) and blocking on pipe
_can_ block, even on _first_ frame received from tcp, as you discovered.


Your only choices to avoid a deadlock are :
1) to use SPLICE_F_NONBLOCK.
2) Using a second thread to read the pipe and empty it. First thread will
   happily transfert 1000000 bytes in one syscall...
3) or limit your splice(... len, flags) length to 16 (16 buffers of one byte
   in pathological cases)

Your patch basically makes SPLICE_F_NONBLOCK option always set (choice 1) above)

So users wanting option 3) are stuck. You force them to use a poll()/select()
thing while they dont want to poll : They have a producer thread(s), and a consumer
thread(s).

producer()
{
	while (1)
		splice(tcp, &offset, pfds[1], NULL, 10000000,
		       SPLICE_F_MORE | SPLICE_F_MOVE);
}

Why in the first place have an option if it is always set ?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
  2009-11-05 14:11         ` Eric Dumazet
@ 2009-11-05 14:33           ` Max Kellermann
  0 siblings, 0 replies; 9+ messages in thread
From: Max Kellermann @ 2009-11-05 14:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, jens.axboe, Linux Netdev List

On 2009/11/05 15:11, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> It cannot, therefore an application uses O_NDELAY to avoid blocking.
> 
> Try following program if you are not convinced

Indeed, I'm surprised by the result rendered by the Linux kernel.
That however still contradicts with the poll() documentation.

So this boils down to the question: kernel bug or documentation bug?

http://www.opengroup.org/onlinepubs/9699919799/functions/pselect.html

"A descriptor shall be considered ready for writing when a call to an
output function with O_NONBLOCK clear would not block, whether or not
the function would transfer data successfully."

There is no size limit mentioned here.  Your program reveals that the
kernel violates this definition.

> Your patch basically makes SPLICE_F_NONBLOCK option always set
> (choice 1) above)
[...]
> Why in the first place have an option if it is always set ?

It is not, you misunderstood my patch.  If there's no room in the pipe
buffer, then the first iteration of the "while" loop will block (as
usual).  *After* the first iteration has finished (and at least one
buffer has been moved already), the flag is set, and further
iterations will not block.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-11-05 14:33 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-05  9:59 [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced Max Kellermann
2009-11-05 10:12 ` Max Kellermann
2009-11-05 10:30 ` Eric Dumazet
2009-11-05 10:57   ` Max Kellermann
2009-11-05 11:21     ` Eric Dumazet
2009-11-05 13:23       ` Max Kellermann
2009-11-05 13:23         ` Max Kellermann
2009-11-05 14:11         ` Eric Dumazet
2009-11-05 14:33           ` Max Kellermann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.