linux-fsdevel.vger.kernel.org archive mirror
* splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
@ 2013-01-19  4:49 Eric Wong
  2013-01-19  5:54 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2013-01-19  4:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, linux-fsdevel, Eric Dumazet, Willy Tarreau

With the following flow, I'm sometimes getting an unexpected EOF on the
pipe reader even though I never close the pipe writer:

  tcp_wr -write-> tcp_rd -splice-> pipe_wr -> pipe_rd -splice-> /dev/null

I encounter this in 3.7.3, 3.8-rc3, and the latest from Linus,
3.8-rc4+ (5da1f88b8b727dc3a66c52d4513e871be6d43d19).

It takes longer (about 20s) to reproduce this issue on my KVM (2 cores)
running the latest Linus kernel, so maybe real/faster hardware is needed.
My dual-core laptop (on 3.7.3) which hosts the VM does encounter this
issue within a few seconds (or even <1s).

Using schedtool to pin to a single core (any CPU core) seems to avoid
this issue on real hardware.  Not sure how KVM uses CPUs,
but schedtool doesn't help inside my VM (not even schedtool on the KVM
process).
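
For anyone who would rather pin from inside the test program than use
schedtool, here is a minimal sketch.  It is not part of spliceeof; the
helper name pin_to_first_allowed_cpu() is made up for illustration, and
it assumes Linux with glibc (sched_setaffinity and the CPU_* macros):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to the first CPU it is
 * currently allowed to run on.  Returns the CPU number, or -1 on error.
 */
static int pin_to_first_allowed_cpu(void)
{
	cpu_set_t allowed, one;
	int cpu;

	/* pid 0 means "the calling thread" */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	/* find the lowest-numbered CPU in the current mask */
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &allowed))
			break;
	}
	if (cpu == CPU_SETSIZE)
		return -1;

	/* build a mask containing only that CPU */
	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	assert(CPU_COUNT(&one) == 1);

	if (sched_setaffinity(0, sizeof(one), &one) != 0)
		return -1;
	return cpu;
}
```

Threads created with pthread_create() inherit the creator's affinity
mask, so calling this at the top of main(), before the writer thread is
started, should keep both threads on the same core.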

Example code below (and via: git clone git://bogomips.org/spliceeof )

Expected output from ./spliceeof:
	done writing
	splice(in) EOF (expected)

Output I get from ./spliceeof:
	splice(out) EOF (UNEXPECTED)
	in left: 47716 # the byte value varies

I've successfully run similar code within the past year on some 3.x
kernels, so I think this issue is fairly recent (Cc-ing folks who
have touched splice lately).

Any likely candidates before I start bisection?  Thanks for reading.

-------------------------------- 8< ------------------------------
#define _GNU_SOURCE
#include <poll.h>
#include <sys/ioctl.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>
#include <limits.h>
#include <sys/times.h>

static void tcp_socketpair(int sv[2], int accept_flags)
{
	struct sockaddr_in addr;
	socklen_t addrlen = sizeof(addr);
	int l = socket(PF_INET, SOCK_STREAM, 0);
	int c = socket(PF_INET, SOCK_STREAM, 0);
	int a;

	memset(&addr, 0, sizeof(addr)); /* don't pass uninitialized padding to bind() */
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = INADDR_ANY;
	addr.sin_port = 0;
	assert(0 == bind(l, (struct sockaddr*)&addr, addrlen));
	assert(0 == listen(l, 5));
	assert(0 == getsockname(l, (struct sockaddr *)&addr, &addrlen));
	assert(0 == connect(c, (struct sockaddr *)&addr, addrlen));
	a = accept4(l, NULL, NULL, accept_flags);
	assert(a >= 0);
	close(l);
	sv[0] = a;
	sv[1] = c;
}

static void * write_loop(void * fdp)
{
	int fd = *(int *)fdp;
	char buf[16384];
	ssize_t w;
	size_t want = ULONG_MAX; /* try changing this around */

	while (want > 0) {
		size_t to_write = want > sizeof(buf) ? sizeof(buf) : want;

		w = write(fd, buf, to_write);

		if (w < 0) {
			dprintf(2, "write failed: %m with %zu left\n", want);
			goto fail;
		} else if (w == 0) {
			dprintf(2, "write returned zero with %zu left\n", want);
			goto fail;
		} else {
			want -= (size_t)w;
		}
	}
	dprintf(2, "done writing\n");
fail:
	close(fd);
	return NULL;
}

static void io_wait(int fd, short events)
{
	struct pollfd p;
	int rc;

	p.fd = fd;
	p.events = events;

	rc = poll(&p, 1, -1);
	assert(rc == 1 && "poll failed");
}

int main(void)
{
	int tcp_pair[2];
	int pbuf[2];
	pthread_t wt;
	int dst = open("/dev/null", O_WRONLY);
	size_t len = 1024 * 1024;
	ssize_t in, out;
	size_t in_total = 0;
	size_t out_total = 0;
	int fl = SPLICE_F_NONBLOCK;

	assert(dst >= 0 && "open(/dev/null) failed");
	tcp_socketpair(tcp_pair, SOCK_NONBLOCK);
	assert(0 == pthread_create(&wt, NULL, write_loop, &tcp_pair[1]));
	assert(0 == pipe2(pbuf, O_NONBLOCK));

	for (;;) {
		in = splice(tcp_pair[0], NULL, pbuf[1], NULL, len, fl);

		if (in < 0) {
			if (errno == EAGAIN) {
				io_wait(tcp_pair[0], POLLIN);
				io_wait(pbuf[1], POLLOUT);
				continue;
			}
			dprintf(2, "splice(in) err: %m\n");
			break;
		} else if (in == 0) {
			dprintf(2, "splice(in) EOF (expected)\n");
			break;
		}

		in_total += in;
		while (in > 0) {
			out = splice(pbuf[0], NULL, dst, NULL, (size_t)in, fl);
			if (out < 0) {
				dprintf(2, "splice(out) err: %m\n");
				exit(1);
			} else if (out == 0) {
				dprintf(2, "splice(out) EOF (UNEXPECTED)\n");
				dprintf(2, "in left: %zd\n", in);
				exit(1);
			} else {
				in -= out;
				out_total += out;
			}
		}
	}
	assert(0 == pthread_join(wt, NULL));
	return 0;
}
-------------------------------- 8< ------------------------------

* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-01-19  4:49 splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+ Eric Wong
@ 2013-01-19  5:54 ` Eric Dumazet
  2013-01-19  6:13   ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-01-19  5:54 UTC (permalink / raw)
  To: Eric Wong; +Cc: linux-kernel, netdev, linux-fsdevel, Willy Tarreau

On Sat, 2013-01-19 at 04:49 +0000, Eric Wong wrote:
> With the following flow, I'm sometimes getting an unexpected EOF on the
> pipe reader even though I never close the pipe writer:
> 
>   tcp_wr -write-> tcp_rd -splice-> pipe_wr -> pipe_rd -splice-> /dev/null
> 
> I encounter this in 3.7.3, 3.8-rc3, and the latest from Linus,
> 3.8-rc4+ (5da1f88b8b727dc3a66c52d4513e871be6d43d19).
> 
> It takes longer (about 20s) to reproduce this issue on my KVM (2 cores)
> running the latest Linus kernel, so maybe real/faster hardware is needed.
> My dual-core laptop (on 3.7.3) which hosts the VM does encounter this
> issue within a few seconds (or even <1s).
> 
> Using schedtool to pin to a single core (any CPU core) seems to avoid
> this issue on real hardware.  Not sure how KVM uses CPUs,
> but schedtool doesn't help inside my VM (not even schedtool on the KVM
> process).
> 
> Example code below (and via: git clone git://bogomips.org/spliceeof )
> 
> Expected output from ./spliceeof:
> 	done writing
> 	splice(in) EOF (expected)
> 
> Output I get from ./spliceeof:
> 	splice(out) EOF (UNEXPECTED)
> 	in left: 47716 # the byte value varies
> 
> I've successfully run similar code within the past year on some 3.x
> kernels, so I think this issue is fairly recent (Cc-ing folks who
> have touched splice lately).
> 
> Any likely candidates before I start bisection?  Thanks for reading.
> 
> -------------------------------- 8< ------------------------------

Hmm, this might be already fixed in net-next tree, could you try it ?

Thanks !



* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-01-19  5:54 ` Eric Dumazet
@ 2013-01-19  6:13   ` Eric Dumazet
  2013-01-19  7:04     ` Willy Tarreau
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-01-19  6:13 UTC (permalink / raw)
  To: Eric Wong, David Miller
  Cc: linux-kernel, netdev, linux-fsdevel, Willy Tarreau

On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:

> 
> Hmm, this might be already fixed in net-next tree, could you try it ?
> 

Yes, running your program on net-next seems OK.

David, we need the two following commits.

They actually fix a bug: current code in Linus' tree
can push a 0-length frag to the pipe, because of this line:

flen = min_t(unsigned int, flen, PAGE_SIZE - poff);

It can happen if poff == PAGE_SIZE, when one skb frag has this
particular starting offset: PAGE_SIZE - poff is then 0, so flen
is clamped to 0.
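
The arithmetic can be checked in userspace with a small sketch.  This
is not the kernel code: clamp_to_page() and DEMO_PAGE_SIZE are made-up
names, and a 4096-byte page is assumed.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Userspace stand-in for the clamp quoted above:
 *   flen = min(flen, PAGE_SIZE - poff)
 * Illustration only; this is not the kernel function.
 */
static unsigned int clamp_to_page(unsigned int flen, unsigned int poff)
{
	unsigned int room;

	/* the subtraction below would wrap for poff > page size */
	assert(poff <= DEMO_PAGE_SIZE);

	room = DEMO_PAGE_SIZE - poff;
	return flen < room ? flen : room;
}
```

With poff == 0 a 16384-byte request is clamped to one page (4096), but
with poff == DEMO_PAGE_SIZE there is no room left and the result is 0,
whatever flen was.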


commit 9ca1b22d6d228177e6f929f6818a1cd3d5e30c4a
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Jan 5 21:31:18 2013 +0000

    net: splice: avoid high order page splitting
    
    splice() can handle pages of any order, but network code tries hard to
    split them in PAGE_SIZE units. Not quite successfully anyway, as
    __splice_segment() assumed poff < PAGE_SIZE. This is true for
    the skb->data part, not necessarily for the fragments.
    
    This patch removes this logic to give the pages as they are in the skb.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Willy Tarreau <w@1wt.eu>
    Signed-off-by: David S. Miller <davem@davemloft.net>


commit 18aafc622abf492809723d9c5a3c5dcea287169e
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Jan 11 14:46:37 2013 +0000

    net: splice: fix __splice_segment()
    
    commit 9ca1b22d6d2 (net: splice: avoid high order page splitting)
    forgot that skb->head could need a copy into several page frags.
    
    This could be the case for loopback traffic mostly.
    
    Also remove now useless skb argument from linear_to_page()
    and __splice_segment() prototypes.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Willy Tarreau <w@1wt.eu>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-01-19  6:13   ` Eric Dumazet
@ 2013-01-19  7:04     ` Willy Tarreau
  2013-01-19  7:15     ` Eric Wong
  2013-01-21  4:21     ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Willy Tarreau @ 2013-01-19  7:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Wong, David Miller, linux-kernel, netdev, linux-fsdevel

On Fri, Jan 18, 2013 at 10:13:16PM -0800, Eric Dumazet wrote:
> On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:
> 
> > 
> > Hmm, this might be already fixed in net-next tree, could you try it ?
> > 
> 
> Yes, running your program on net-next seems OK.
> 
> David, we need the two following commits.
> 
> They actually fixed a bug : current code in Linus tree
> can push to the pipe a 0-length frag, because of the :
(...)

And FWIW I confirm that my test machines which have been running 3.7
with these two patches since you proposed them have never experienced
such an issue.

Willy

* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-01-19  6:13   ` Eric Dumazet
  2013-01-19  7:04     ` Willy Tarreau
@ 2013-01-19  7:15     ` Eric Wong
  2013-01-21  4:21     ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2013-01-19  7:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, linux-fsdevel, Willy Tarreau

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:
> > Hmm, this might be already fixed in net-next tree, could you try it ?
> 
> Yes, running your program on net-next seems OK.
> 
> David, we need the two following commits.

> commit 9ca1b22d6d228177e6f929f6818a1cd3d5e30c4a
> commit 18aafc622abf492809723d9c5a3c5dcea287169e

Thanks Eric!  I cherry-picked both of these on top of 3.7.3
and everything is great \o/

* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-01-19  6:13   ` Eric Dumazet
  2013-01-19  7:04     ` Willy Tarreau
  2013-01-19  7:15     ` Eric Wong
@ 2013-01-21  4:21     ` David Miller
  2013-02-08  2:39       ` Eric Wong
  2 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2013-01-21  4:21 UTC (permalink / raw)
  To: eric.dumazet; +Cc: normalperson, linux-kernel, netdev, linux-fsdevel, w

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 Jan 2013 22:13:16 -0800

> On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:
> 
>> 
>> Hmm, this might be already fixed in net-next tree, could you try it ?
>> 
> 
> Yes, running your program on net-next seems OK.
> 
> David, we need the two following commits.

Tossed into 'net' and queued up for -stable, thanks.

* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-01-21  4:21     ` David Miller
@ 2013-02-08  2:39       ` Eric Wong
  2013-02-08  3:26         ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2013-02-08  2:39 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, linux-kernel, netdev, linux-fsdevel, w

David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 18 Jan 2013 22:13:16 -0800
> 
> > On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:
> > 
> >> 
> >> Hmm, this might be already fixed in net-next tree, could you try it ?
> >> 
> > 
> > Yes, running your program on net-next seems OK.
> > 
> > David, we need the two following commits.
> 
> Tossed into 'net' and queued up for -stable, thanks.

Hi David, any update on getting these into -stable?  Thanks.

* Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+
  2013-02-08  2:39       ` Eric Wong
@ 2013-02-08  3:26         ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-02-08  3:26 UTC (permalink / raw)
  To: normalperson; +Cc: eric.dumazet, linux-kernel, netdev, linux-fsdevel, w

From: Eric Wong <normalperson@yhbt.net>
Date: Fri, 8 Feb 2013 02:39:46 +0000

> David Miller <davem@davemloft.net> wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Fri, 18 Jan 2013 22:13:16 -0800
>> 
>> > On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:
>> > 
>> >> 
>> >> Hmm, this might be already fixed in net-next tree, could you try it ?
>> >> 
>> > 
>> > Yes, running your program on net-next seems OK.
>> > 
>> > David, we need the two following commits.
>> 
>> Tossed into 'net' and queued up for -stable, thanks.
> 
> Hi David, any update on getting these into -stable?  Thanks.

I submit stable patches once they have cooked upstream for
a week or two.
