* [PATCH net-next 0/5] tcp: add zero copy receive
@ 2018-04-16 17:33 Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning Eric Dumazet
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-04-16 17:33 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

This patch series adds mmap() support to TCP sockets for RX zero copy.

While the tcp_mmap() patch itself is quite small (~100 LOC), optimal
support for asynchronous mmap() required better SO_RCVLOWAT behavior, and
a test program to demonstrate how mmap() on TCP sockets can be used.

Note that mmap() (and the associated munmap()) calls add more
pressure on the per-process VM semaphore, so they might not show a benefit
for processes with a high number of threads.
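
For reference, a receiver using this series looks roughly like the sketch
below (error handling and the recvmsg() fallback are omitted; the 512KB
chunk size is illustrative, matching the tcp_mmap selftest):

#include <poll.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

static void rx_zero_copy(int fd)	/* fd: connected TCP socket */
{
	int lowat = 512 * 1024;		/* wake up once per 512KB chunk */
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
	for (;;) {
		void *p;

		poll(&pfd, 1, -1);
		p = mmap(NULL, lowat, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			break;	/* EINVAL: fall back to recvmsg(); EIO: done */
		/* ... use 512KB of payload in place ... */
		munmap(p, lowat);
	}
}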

Eric Dumazet (5):
  tcp: fix SO_RCVLOWAT and RCVBUF autotuning
  tcp: fix delayed acks behavior for SO_RCVLOWAT
  tcp: avoid extra wakeups for SO_RCVLOWAT users
  tcp: implement mmap() for zero copy receive
  selftests: net: add tcp_mmap program

 include/linux/net.h                    |   1 +
 include/net/tcp.h                      |   4 +
 net/core/sock.c                        |   5 +-
 net/ipv4/af_inet.c                     |   3 +-
 net/ipv4/tcp.c                         | 138 ++++++++
 net/ipv4/tcp_input.c                   |  22 +-
 net/ipv6/af_inet6.c                    |   3 +-
 tools/testing/selftests/net/Makefile   |   2 +
 tools/testing/selftests/net/tcp_mmap.c | 437 +++++++++++++++++++++++++
 9 files changed, 608 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/net/tcp_mmap.c

-- 
2.17.0.484.g0c8726318c-goog

* [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning
  2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
@ 2018-04-16 17:33 ` Eric Dumazet
  2018-04-20  2:02   ` Marcelo Ricardo Leitner
  2018-04-16 17:33 ` [PATCH net-next 2/5] tcp: fix delayed acks behavior for SO_RCVLOWAT Eric Dumazet
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2018-04-16 17:33 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

Applications might use SO_RCVLOWAT on a TCP socket, hoping to receive
a single [E]POLLIN event only when a given number of bytes are ready in
the socket receive queue.

The problem is that receive autotuning is not aware of this constraint,
meaning sk_rcvbuf might be too small to allow all the bytes to be stored.

Add a new (struct proto_ops)->set_rcvlowat method so that a protocol
can override the default setsockopt(SO_RCVLOWAT) behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/net.h |  1 +
 include/net/tcp.h   |  1 +
 net/core/sock.c     |  5 ++++-
 net/ipv4/af_inet.c  |  1 +
 net/ipv4/tcp.c      | 21 +++++++++++++++++++++
 net/ipv6/af_inet6.c |  1 +
 6 files changed, 29 insertions(+), 1 deletion(-)
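
As an illustration of the intended effect (a sketch, not part of the patch;
the 1MB value is arbitrary):

	int lowat = 1 << 20;	/* want [E]POLLIN only once 1MB is queued */
	int rcvbuf;
	socklen_t len = sizeof(rcvbuf);

	setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
	/* tcp_set_rcvlowat() grew sk_rcvbuf up to min(2 * lowat, tcp_rmem[2]),
	 * unless SO_RCVBUF was set explicitly before (SOCK_RCVBUF_LOCK).
	 */
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);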

diff --git a/include/linux/net.h b/include/linux/net.h
index 2248a052061d8aeb0ae08d233f181f09cba6384b..6554d3ba4396b3df49acac934ad16eeb71a695f4 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -197,6 +197,7 @@ struct proto_ops {
 					   int offset, size_t size, int flags);
 	int		(*sendmsg_locked)(struct sock *sk, struct msghdr *msg,
 					  size_t size);
+	int		(*set_rcvlowat)(struct sock *sk, int val);
 };
 
 #define DECLARE_SOCKADDR(type, dst, src)	\
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9c9b3768b350abfd51776563d220d5e97ca9da69..b2318242cad89176d3c2c027affd4db3c2549ff4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -402,6 +402,7 @@ void tcp_set_keepalive(struct sock *sk, int val);
 void tcp_syn_ack_timeout(const struct request_sock *req);
 int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		int flags, int *addr_len);
+int tcp_set_rcvlowat(struct sock *sk, int val);
 void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx,
 		       int estab, struct tcp_fastopen_cookie *foc);
diff --git a/net/core/sock.c b/net/core/sock.c
index 6444525f610cf8039516744ad26aec58485b9b8a..b2c3db169ca1892c4d624fc5e30af12f4eed0adb 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -905,7 +905,10 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 	case SO_RCVLOWAT:
 		if (val < 0)
 			val = INT_MAX;
-		sk->sk_rcvlowat = val ? : 1;
+		if (sock->ops->set_rcvlowat)
+			ret = sock->ops->set_rcvlowat(sk, val);
+		else
+			sk->sk_rcvlowat = val ? : 1;
 		break;
 
 	case SO_RCVTIMEO:
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index eaed0367e669aec7635b3cc41de4ece63bb018ec..f5c562aaef3522519bcf1ae37782a7e14e278723 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1006,6 +1006,7 @@ const struct proto_ops inet_stream_ops = {
 	.compat_getsockopt = compat_sock_common_getsockopt,
 	.compat_ioctl	   = inet_compat_ioctl,
 #endif
+	.set_rcvlowat	   = tcp_set_rcvlowat,
 };
 EXPORT_SYMBOL(inet_stream_ops);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bccc4c2700870b8c7ff592a6bd27acebd9bc6471..0abd8d1d3d1d4f0bd6e2762c8a2b862ecf31e4ae 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1701,6 +1701,27 @@ int tcp_peek_len(struct socket *sock)
 }
 EXPORT_SYMBOL(tcp_peek_len);
 
+/* Make sure sk_rcvbuf is big enough to satisfy SO_RCVLOWAT hint */
+int tcp_set_rcvlowat(struct sock *sk, int val)
+{
+	sk->sk_rcvlowat = val ? : 1;
+	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
+		return 0;
+
+	/* val comes from user space and might be close to INT_MAX */
+	val <<= 1;
+	if (val < 0)
+		val = INT_MAX;
+
+	val = min(val, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+	if (val > sk->sk_rcvbuf) {
+		sk->sk_rcvbuf = val;
+		tcp_sk(sk)->window_clamp = tcp_win_from_space(sk, val);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(tcp_set_rcvlowat);
+
 static void tcp_update_recv_tstamps(struct sk_buff *skb,
 				    struct scm_timestamping *tss)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8da0b513f1882b39be4fa72a8233d702ae9ec53b..e70d59fb26e16ace1eb484d23964946092a2cd57 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -590,6 +590,7 @@ const struct proto_ops inet6_stream_ops = {
 	.compat_setsockopt = compat_sock_common_setsockopt,
 	.compat_getsockopt = compat_sock_common_getsockopt,
 #endif
+	.set_rcvlowat	   = tcp_set_rcvlowat,
 };
 
 const struct proto_ops inet6_dgram_ops = {
-- 
2.17.0.484.g0c8726318c-goog

* [PATCH net-next 2/5] tcp: fix delayed acks behavior for SO_RCVLOWAT
  2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning Eric Dumazet
@ 2018-04-16 17:33 ` Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 3/5] tcp: avoid extra wakeups for SO_RCVLOWAT users Eric Dumazet
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-04-16 17:33 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

We should not delay ACKs if there are not enough bytes
in the receive queue to satisfy SO_RCVLOWAT.

Since an [E]POLLIN event is not going to be generated, there is little
hope for a delayed ACK to be useful.

In fact, delaying the ACK prevents the sender from completing
the transfer: it sits waiting for an ACK before it can send the
remaining bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
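
A simplified restatement of the resulting test (a sketch, not the actual
kernel code; see the hunk below for the real condition):

	/* Send the ACK right away when... */
	if ((rcv_nxt - rcv_wup > rcv_mss &&          /* >1 full frame queued, and  */
	     (rcv_nxt - copied_seq < sk_rcvlowat ||  /* app still below RCVLOWAT,  */
	      window_edge_advances_enough)) ||       /* or right edge moved enough */
	    in_quickack_mode || have_ooo_data)
		tcp_send_ack(sk);
	else
		tcp_send_delayed_ack(sk);	/* only now may the ACK be delayed */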

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 367def6ddeda950db841c0b9ccec98787e19e728..d854363a43875e98adbeea72c3434afb06f0f2b4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5026,9 +5026,12 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
 	    /* More than one full frame received... */
 	if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
 	     /* ... and right edge of window advances far enough.
-	      * (tcp_recvmsg() will send ACK otherwise). Or...
+	      * (tcp_recvmsg() will send ACK otherwise).
+	      * If the application uses SO_RCVLOWAT, we want to send an ack now
+	      * if we have not received enough bytes to satisfy the condition.
 	      */
-	     __tcp_select_window(sk) >= tp->rcv_wnd) ||
+	    (tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
+	     __tcp_select_window(sk) >= tp->rcv_wnd)) ||
 	    /* We ACK each frame or... */
 	    tcp_in_quickack_mode(sk) ||
 	    /* We have out of order data. */
-- 
2.17.0.484.g0c8726318c-goog

* [PATCH net-next 3/5] tcp: avoid extra wakeups for SO_RCVLOWAT users
  2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 2/5] tcp: fix delayed acks behavior for SO_RCVLOWAT Eric Dumazet
@ 2018-04-16 17:33 ` Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive Eric Dumazet
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-04-16 17:33 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

SO_RCVLOWAT is properly handled in tcp_poll(), so that POLLIN is only
generated when enough bytes are available in the receive queue, after
David's change (commit c7004482e8dc "tcp: Respect SO_RCVLOWAT in tcp_poll().")

But TCP still calls sk->sk_data_ready() for each chunk added to the receive
queue, meaning the thread is woken up, only to go back to sleep shortly after.

Tested:

tcp_mmap test program, receiving 32768 MB of data with SO_RCVLOWAT set to 512KB

-> Should get ~2 wakeups (c-switches) per MB, regardless of how many
(tiny or big) packets were received.

High speed (mostly full size GRO packets)

received 32768 MB (100 % mmap'ed) in 8.03112 s, 34.2266 Gbit,
  cpu usage user:0.037 sys:1.404, 43.9758 usec per MB, 65497 c-switches

received 32768 MB (99.9954 % mmap'ed) in 7.98453 s, 34.4263 Gbit,
  cpu usage user:0.03 sys:1.422, 44.3115 usec per MB, 65485 c-switches

Low speed (sender is ratelimited and sends 1-MSS at a time, so GRO is not helping)

received 22474.5 MB (100 % mmap'ed) in 6015.35 s, 0.0313414 Gbit,
  cpu usage user:0.05 sys:1.586, 72.7952 usec per MB, 44950 c-switches

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h    |  1 +
 net/ipv4/tcp.c       |  4 ++++
 net/ipv4/tcp_input.c | 15 +++++++++++++--
 3 files changed, 18 insertions(+), 2 deletions(-)
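
The wakeup counts above are voluntary context switches; a receiver can
measure them itself along the lines of this sketch (mirroring what the
tcp_mmap selftest in patch 5/5 does; RUSAGE_THREAD needs _GNU_SOURCE):

	struct rusage ru;

	getrusage(RUSAGE_THREAD, &ru);
	/* With SO_RCVLOWAT = 512KB, expect roughly 2 voluntary context
	 * switches per MB received, regardless of packet sizes.
	 */
	printf("%lu c-switches\n", ru.ru_nvcsw);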

diff --git a/include/net/tcp.h b/include/net/tcp.h
index b2318242cad89176d3c2c027affd4db3c2549ff4..0ee85c47c185afcb8e1017d59e02313cb5df78ec 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -403,6 +403,7 @@ void tcp_syn_ack_timeout(const struct request_sock *req);
 int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		int flags, int *addr_len);
 int tcp_set_rcvlowat(struct sock *sk, int val);
+void tcp_data_ready(struct sock *sk);
 void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx,
 		       int estab, struct tcp_fastopen_cookie *foc);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0abd8d1d3d1d4f0bd6e2762c8a2b862ecf31e4ae..c768d306b65714bb8740c60110c43042508af6b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1705,6 +1705,10 @@ EXPORT_SYMBOL(tcp_peek_len);
 int tcp_set_rcvlowat(struct sock *sk, int val)
 {
 	sk->sk_rcvlowat = val ? : 1;
+
+	/* Check if we need to signal EPOLLIN right now */
+	tcp_data_ready(sk);
+
 	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
 		return 0;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d854363a43875e98adbeea72c3434afb06f0f2b4..f93687f97d805732f1093d55a402638c4290700a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4576,6 +4576,17 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 
 }
 
+void tcp_data_ready(struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	int avail = tp->rcv_nxt - tp->copied_seq;
+
+	if (avail < sk->sk_rcvlowat && !sock_flag(sk, SOCK_DONE))
+		return;
+
+	sk->sk_data_ready(sk);
+}
+
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -4633,7 +4644,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		if (eaten > 0)
 			kfree_skb_partial(skb, fragstolen);
 		if (!sock_flag(sk, SOCK_DEAD))
-			sk->sk_data_ready(sk);
+			tcp_data_ready(sk);
 		return;
 	}
 
@@ -5434,7 +5445,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 no_ack:
 			if (eaten)
 				kfree_skb_partial(skb, fragstolen);
-			sk->sk_data_ready(sk);
+			tcp_data_ready(sk);
 			return;
 		}
 	}
-- 
2.17.0.484.g0c8726318c-goog

* [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
  2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
                   ` (2 preceding siblings ...)
  2018-04-16 17:33 ` [PATCH net-next 3/5] tcp: avoid extra wakeups for SO_RCVLOWAT users Eric Dumazet
@ 2018-04-16 17:33 ` Eric Dumazet
  2018-04-19 23:15   ` Eric Dumazet
  2018-04-16 17:33 ` [PATCH net-next 5/5] selftests: net: add tcp_mmap program Eric Dumazet
  2018-04-16 22:48 ` [PATCH net-next 0/5] tcp: add zero copy receive David Miller
  5 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2018-04-16 17:33 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

Some networks can ensure that TCP payloads exactly fit 4KB pages,
with well chosen MSS/MTU and architectures.

Implement the mmap() system call so that applications can avoid
copying data without complex splice() games.

Note that a successful mmap(X bytes) on a TCP socket consumes
bytes, as if recvmsg() had been done. (tp->copied += X)

Only PROT_READ mappings are accepted, as skb page frags
are fundamentally shared and read-only.

If tcp_mmap() finds data that is not a full page, or urgent data
within the requested range, -EINVAL is returned and no bytes are consumed.

The application must fall back to recvmsg() to read the problematic sequence.

mmap() won't block, regardless of the socket being in blocking or
non-blocking mode. If not enough bytes are in the receive queue,
mmap() returns -EAGAIN, or -EIO if the socket is in a state
where no more bytes can be added to the receive queue.

An application might use SO_RCVLOWAT, poll() and/or ioctl(FIONREAD)
to use mmap() efficiently.

On the sender side, MSG_EOR might help to clearly separate unaligned
headers and 4K-aligned chunks if necessary.

Tested:

mlx4 (cx-3) 40Gbit NIC, with the tcp_mmap program provided in the following patch.
MTU set to 4168 (4096 bytes TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)

Without mmap() (tcp_mmap -s)

received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
  cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
  cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
  cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
  cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches

With mmap() on receiver (tcp_mmap -s -z)

received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
  cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
  cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
  cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
  cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h   |   2 +
 net/ipv4/af_inet.c  |   2 +-
 net/ipv4/tcp.c      | 113 ++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/af_inet6.c |   2 +-
 4 files changed, 117 insertions(+), 2 deletions(-)
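
From user space, the error contract above suggests a pattern like the
following sketch (CHUNK is an illustrative multiple of the page size;
FIONREAD is optional when SO_RCVLOWAT already guarantees enough queued
bytes):

	int avail = 0;

	ioctl(fd, FIONREAD, &avail);	/* bytes queued on the socket */
	if (avail >= CHUNK) {
		void *p = mmap(NULL, CHUNK, PROT_READ, MAP_SHARED, fd, 0);

		if (p != MAP_FAILED) {
			/* CHUNK bytes are now consumed, as if recvmsg() was done */
			munmap(p, CHUNK);
		} else if (errno == EINVAL) {
			/* payload not page-aligned here: fall back to recvmsg() */
		}
	}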

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0ee85c47c185afcb8e1017d59e02313cb5df78ec..833154e3df173ea41aa16dd1ec739a175c679c5c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -404,6 +404,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		int flags, int *addr_len);
 int tcp_set_rcvlowat(struct sock *sk, int val);
 void tcp_data_ready(struct sock *sk);
+int tcp_mmap(struct file *file, struct socket *sock,
+	     struct vm_area_struct *vma);
 void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx,
 		       int estab, struct tcp_fastopen_cookie *foc);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f5c562aaef3522519bcf1ae37782a7e14e278723..3ebf599cebaea4926decc1aad7274b12ec7e1566 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -994,7 +994,7 @@ const struct proto_ops inet_stream_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = inet_recvmsg,
-	.mmap		   = sock_no_mmap,
+	.mmap		   = tcp_mmap,
 	.sendpage	   = inet_sendpage,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c768d306b65714bb8740c60110c43042508af6b7..438fbca96cd3100d722e1bd8bcc6f49624495a21 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1726,6 +1726,119 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
 }
 EXPORT_SYMBOL(tcp_set_rcvlowat);
 
+/* When user wants to mmap X pages, we first need to perform the mapping
+ * before freeing any skbs in receive queue, otherwise user would be unable
+ * to fallback to standard recvmsg(). This happens if some data in the
+ * requested block is not exactly fitting in a page.
+ *
+ * We only support order-0 pages for the moment.
+ * mmap() on TCP is very strict, there is no point
+ * trying to accommodate with pathological layouts.
+ */
+int tcp_mmap(struct file *file, struct socket *sock,
+	     struct vm_area_struct *vma)
+{
+	unsigned long size = vma->vm_end - vma->vm_start;
+	unsigned int nr_pages = size >> PAGE_SHIFT;
+	struct page **pages_array = NULL;
+	u32 seq, len, offset, nr = 0;
+	struct sock *sk = sock->sk;
+	const skb_frag_t *frags;
+	struct tcp_sock *tp;
+	struct sk_buff *skb;
+	int ret;
+
+	if (vma->vm_pgoff || !nr_pages)
+		return -EINVAL;
+
+	if (vma->vm_flags & VM_WRITE)
+		return -EPERM;
+	/* TODO: Maybe the following is not needed if pages are COW */
+	vma->vm_flags &= ~VM_MAYWRITE;
+
+	lock_sock(sk);
+
+	ret = -ENOTCONN;
+	if (sk->sk_state == TCP_LISTEN)
+		goto out;
+
+	sock_rps_record_flow(sk);
+
+	if (tcp_inq(sk) < size) {
+		ret = sock_flag(sk, SOCK_DONE) ? -EIO : -EAGAIN;
+		goto out;
+	}
+	tp = tcp_sk(sk);
+	seq = tp->copied_seq;
+	/* Abort if urgent data is in the area */
+	if (unlikely(tp->urg_data)) {
+		u32 urg_offset = tp->urg_seq - seq;
+
+		ret = -EINVAL;
+		if (urg_offset < size)
+			goto out;
+	}
+	ret = -ENOMEM;
+	pages_array = kvmalloc_array(nr_pages, sizeof(struct page *),
+				     GFP_KERNEL);
+	if (!pages_array)
+		goto out;
+	skb = tcp_recv_skb(sk, seq, &offset);
+	ret = -EINVAL;
+skb_start:
+	/* We do not support anything not in page frags */
+	offset -= skb_headlen(skb);
+	if ((int)offset < 0)
+		goto out;
+	if (skb_has_frag_list(skb))
+		goto out;
+	len = skb->data_len - offset;
+	frags = skb_shinfo(skb)->frags;
+	while (offset) {
+		if (frags->size > offset)
+			goto out;
+		offset -= frags->size;
+		frags++;
+	}
+	while (nr < nr_pages) {
+		if (len) {
+			if (len < PAGE_SIZE)
+				goto out;
+			if (frags->size != PAGE_SIZE || frags->page_offset)
+				goto out;
+			pages_array[nr++] = skb_frag_page(frags);
+			frags++;
+			len -= PAGE_SIZE;
+			seq += PAGE_SIZE;
+			continue;
+		}
+		skb = skb->next;
+		offset = seq - TCP_SKB_CB(skb)->seq;
+		goto skb_start;
+	}
+	/* OK, we have a full set of pages ready to be inserted into vma */
+	for (nr = 0; nr < nr_pages; nr++) {
+		ret = vm_insert_page(vma, vma->vm_start + (nr << PAGE_SHIFT),
+				     pages_array[nr]);
+		if (ret)
+			goto out;
+	}
+	/* operation is complete, we can 'consume' all skbs */
+	tp->copied_seq = seq;
+	tcp_rcv_space_adjust(sk);
+
+	/* Clean up data we have read: This will do ACK frames. */
+	tcp_recv_skb(sk, seq, &offset);
+	tcp_cleanup_rbuf(sk, size);
+
+	ret = 0;
+out:
+	release_sock(sk);
+	kvfree(pages_array);
+	return ret;
+}
+EXPORT_SYMBOL(tcp_mmap);
+
 static void tcp_update_recv_tstamps(struct sk_buff *skb,
 				    struct scm_timestamping *tss)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e70d59fb26e16ace1eb484d23964946092a2cd57..2c694912df2e77b414de5cc2aa43e2ec59286836 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -579,7 +579,7 @@ const struct proto_ops inet6_stream_ops = {
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = inet_sendmsg,		/* ok		*/
 	.recvmsg	   = inet_recvmsg,		/* ok		*/
-	.mmap		   = sock_no_mmap,
+	.mmap		   = tcp_mmap,
 	.sendpage	   = inet_sendpage,
 	.sendmsg_locked    = tcp_sendmsg_locked,
 	.sendpage_locked   = tcp_sendpage_locked,
-- 
2.17.0.484.g0c8726318c-goog

* [PATCH net-next 5/5] selftests: net: add tcp_mmap program
  2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
                   ` (3 preceding siblings ...)
  2018-04-16 17:33 ` [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive Eric Dumazet
@ 2018-04-16 17:33 ` Eric Dumazet
  2018-04-16 22:48 ` [PATCH net-next 0/5] tcp: add zero copy receive David Miller
  5 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-04-16 17:33 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

This is a reference program showing how mmap() can be used
on TCP flows to implement receive zero copy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 tools/testing/selftests/net/Makefile   |   2 +
 tools/testing/selftests/net/tcp_mmap.c | 437 +++++++++++++++++++++++++
 2 files changed, 439 insertions(+)
 create mode 100644 tools/testing/selftests/net/tcp_mmap.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 785fc18a16b4701f3ef875b60648726750b0cd26..23e725f4305eae6bc23fe705c8d3fe262110395a 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -8,9 +8,11 @@ TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetl
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
+TEST_GEN_FILES += tcp_mmap
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
 
 include ../lib.mk
 
 $(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
+$(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
new file mode 100644
index 0000000000000000000000000000000000000000..dea342fe6f4e88b5709d2ac37b2fc9a2a320bf44
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -0,0 +1,437 @@
+/*
+ * Copyright 2018 Google Inc.
+ * Author: Eric Dumazet (edumazet@google.com)
+ *
+ * Reference program demonstrating tcp mmap() usage,
+ * and SO_RCVLOWAT hints for receiver.
+ *
+ * Note : NIC with header split is needed to use mmap() on TCP :
+ * Each incoming frame must be a multiple of PAGE_SIZE bytes of TCP payload.
+ *
+ * How to use on loopback interface :
+ *
+ *  ifconfig lo mtu 61512  # 15*4096 + 40 (ipv6 header) + 32 (TCP with TS option header)
+ *  tcp_mmap -s -z &
+ *  tcp_mmap -H ::1 -z
+ *
+ *  Or leave default lo mtu, but use -M option to set TCP_MAXSEG option to (4096 + 12)
+ *      (4096 : page size on x86, 12: TCP TS option length)
+ *  tcp_mmap -s -z -M $((4096+12)) &
+ *  tcp_mmap -H ::1 -z -M $((4096+12))
+ *
+ * Note: -z option on sender uses MSG_ZEROCOPY, which forces a copy when packets go through loopback interface.
+ *       We might use sendfile() instead, but really this test program is about mmap(), for receivers ;)
+ *
+ * $ ./tcp_mmap -s &                                 # Without mmap()
+ * $ for i in {1..4}; do ./tcp_mmap -H ::1 -z ; done
+ * received 32768 MB (0 % mmap'ed) in 14.1157 s, 19.4732 Gbit
+ *   cpu usage user:0.057 sys:7.815, 240.234 usec per MB, 65531 c-switches
+ * received 32768 MB (0 % mmap'ed) in 14.6833 s, 18.7204 Gbit
+ *  cpu usage user:0.043 sys:8.103, 248.596 usec per MB, 65524 c-switches
+ * received 32768 MB (0 % mmap'ed) in 11.143 s, 24.6682 Gbit
+ *   cpu usage user:0.044 sys:6.576, 202.026 usec per MB, 65519 c-switches
+ * received 32768 MB (0 % mmap'ed) in 14.9056 s, 18.4413 Gbit
+ *   cpu usage user:0.036 sys:8.193, 251.129 usec per MB, 65530 c-switches
+ * $ kill %1   # kill tcp_mmap server
+ *
+ * $ ./tcp_mmap -s -z &                              # With mmap()
+ * $ for i in {1..4}; do ./tcp_mmap -H ::1 -z ; done
+ * received 32768 MB (99.9939 % mmap'ed) in 6.73792 s, 40.7956 Gbit
+ *   cpu usage user:0.045 sys:2.827, 87.6465 usec per MB, 65532 c-switches
+ * received 32768 MB (99.9939 % mmap'ed) in 7.26732 s, 37.8238 Gbit
+ *   cpu usage user:0.037 sys:3.087, 95.3369 usec per MB, 65532 c-switches
+ * received 32768 MB (99.9939 % mmap'ed) in 7.61661 s, 36.0893 Gbit
+ *   cpu usage user:0.046 sys:3.559, 110.016 usec per MB, 65529 c-switches
+ * received 32768 MB (99.9939 % mmap'ed) in 7.43764 s, 36.9577 Gbit
+ *   cpu usage user:0.035 sys:3.467, 106.873 usec per MB, 65530 c-switches
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sys/types.h>
+#include <fcntl.h>
+#include <error.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include <time.h>
+#include <sys/time.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <arpa/inet.h>
+#include <poll.h>
+
+#ifndef MSG_ZEROCOPY
+#define MSG_ZEROCOPY    0x4000000
+#endif
+
+#define FILE_SZ (1UL << 35)
+static int cfg_family = AF_INET6;
+static socklen_t cfg_alen = sizeof(struct sockaddr_in6);
+static int cfg_port = 8787;
+
+static int rcvbuf; /* Default: autotuning.  Can be set with -r <integer> option */
+static int sndbuf; /* Default: autotuning.  Can be set with -w <integer> option */
+static int zflg; /* zero copy option. (MSG_ZEROCOPY for sender, mmap() for receiver */
+static int xflg; /* hash received data (simple xor) (-h option) */
+static int keepflag; /* -k option: receiver shall keep all received file in memory (no munmap() calls) */
+
+static int chunk_size  = 512*1024;
+
+unsigned long htotal;
+
+static inline void prefetch(const void *x)
+{
+#if defined(__x86_64__)
+	asm volatile("prefetcht0 %P0" : : "m" (*(const char *)x));
+#endif
+}
+
+void hash_zone(void *zone, unsigned int length)
+{
+	unsigned long temp = htotal;
+
+	while (length >= 8*sizeof(long)) {
+		prefetch(zone + 384);
+		temp ^= *(unsigned long *)zone;
+		temp ^= *(unsigned long *)(zone + sizeof(long));
+		temp ^= *(unsigned long *)(zone + 2*sizeof(long));
+		temp ^= *(unsigned long *)(zone + 3*sizeof(long));
+		temp ^= *(unsigned long *)(zone + 4*sizeof(long));
+		temp ^= *(unsigned long *)(zone + 5*sizeof(long));
+		temp ^= *(unsigned long *)(zone + 6*sizeof(long));
+		temp ^= *(unsigned long *)(zone + 7*sizeof(long));
+		zone += 8*sizeof(long);
+		length -= 8*sizeof(long);
+	}
+	while (length >= 1) {
+		temp ^= *(unsigned char *)zone;
+		zone += 1;
+		length--;
+	}
+	htotal = temp;
+}
+
+void *child_thread(void *arg)
+{
+	unsigned long total_mmap = 0, total = 0;
+	unsigned long delta_usec;
+	int flags = MAP_SHARED;
+	struct timeval t0, t1;
+	char *buffer = NULL;
+	void *oaddr = NULL;
+	double throughput;
+	struct rusage ru;
+	int lu, fd;
+
+	fd = (int)(unsigned long)arg;
+
+	gettimeofday(&t0, NULL);
+
+	fcntl(fd, F_SETFL, O_NDELAY);
+	buffer = malloc(chunk_size);
+	if (!buffer) {
+		perror("malloc");
+		goto error;
+	}
+	while (1) {
+		struct pollfd pfd = { .fd = fd, .events = POLLIN, };
+		int sub;
+
+		poll(&pfd, 1, 10000);
+		if (zflg) {
+			void *naddr;
+
+			naddr = mmap(oaddr, chunk_size, PROT_READ, flags, fd, 0);
+			if (naddr == (void *)-1) {
+				if (errno == EAGAIN) {
+					/* That is if SO_RCVLOWAT is buggy */
+					usleep(1000);
+					continue;
+				}
+				if (errno == EINVAL) {
+					flags = MAP_SHARED;
+					oaddr = NULL;
+					goto fallback;
+				}
+				if (errno != EIO)
+					perror("mmap()");
+				break;
+			}
+			total_mmap += chunk_size;
+			if (xflg)
+				hash_zone(naddr, chunk_size);
+			total += chunk_size;
+			if (!keepflag) {
+				flags |= MAP_FIXED;
+				oaddr = naddr;
+			}
+			continue;
+		}
+fallback:
+		sub = 0;
+		while (sub < chunk_size) {
+			lu = read(fd, buffer + sub, chunk_size - sub);
+			if (lu == 0)
+				goto end;
+			if (lu < 0)
+				break;
+			if (xflg)
+				hash_zone(buffer + sub, lu);
+			total += lu;
+			sub += lu;
+		}
+	}
+end:
+	gettimeofday(&t1, NULL);
+	delta_usec = (t1.tv_sec - t0.tv_sec) * 1000000 + t1.tv_usec - t0.tv_usec;
+
+	throughput = 0;
+	if (delta_usec)
+		throughput = total * 8.0 / (double)delta_usec / 1000.0;
+	getrusage(RUSAGE_THREAD, &ru);
+	if (total > 1024*1024) {
+		unsigned long total_usec;
+		unsigned long mb = total >> 20;
+		total_usec = 1000000*ru.ru_utime.tv_sec + ru.ru_utime.tv_usec +
+			     1000000*ru.ru_stime.tv_sec + ru.ru_stime.tv_usec;
+		printf("received %lg MB (%lg %% mmap'ed) in %lg s, %lg Gbit\n"
+		       "  cpu usage user:%lg sys:%lg, %lg usec per MB, %lu c-switches\n",
+				total / (1024.0 * 1024.0),
+				100.0*total_mmap/total,
+				(double)delta_usec / 1000000.0,
+				throughput,
+				(double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1000000.0,
+				(double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1000000.0,
+				(double)total_usec/mb,
+				ru.ru_nvcsw);
+	}
+error:
+	free(buffer);
+	close(fd);
+	pthread_exit(0);
+}
+
+static void apply_rcvsnd_buf(int fd)
+{
+	if (rcvbuf && setsockopt(fd, SOL_SOCKET,
+				 SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) == -1) {
+		perror("setsockopt SO_RCVBUF");
+	}
+
+	if (sndbuf && setsockopt(fd, SOL_SOCKET,
+				 SO_SNDBUF, &sndbuf, sizeof(sndbuf)) == -1) {
+		perror("setsockopt SO_SNDBUF");
+	}
+}
+
+
+static void setup_sockaddr(int domain, const char *str_addr,
+			   struct sockaddr_storage *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (domain) {
+	case PF_INET:
+		memset(addr4, 0, sizeof(*addr4));
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(cfg_port);
+		if (str_addr &&
+		    inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+			error(1, 0, "ipv4 parse error: %s", str_addr);
+		break;
+	case PF_INET6:
+		memset(addr6, 0, sizeof(*addr6));
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(cfg_port);
+		if (str_addr &&
+		    inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+			error(1, 0, "ipv6 parse error: %s", str_addr);
+		break;
+	default:
+		error(1, 0, "illegal domain");
+	}
+}
+
+static void do_accept(int fdlisten)
+{
+	if (setsockopt(fdlisten, SOL_SOCKET, SO_RCVLOWAT,
+		       &chunk_size, sizeof(chunk_size)) == -1) {
+		perror("setsockopt SO_RCVLOWAT");
+	}
+
+	apply_rcvsnd_buf(fdlisten);
+
+	while (1) {
+		struct sockaddr_in addr;
+		socklen_t addrlen = sizeof(addr);
+		pthread_t th;
+		int fd, res;
+
+		fd = accept(fdlisten, (struct sockaddr *)&addr, &addrlen);
+		if (fd == -1) {
+			perror("accept");
+			continue;
+		}
+		res = pthread_create(&th, NULL, child_thread,
+				     (void *)(unsigned long)fd);
+		if (res) {
+			errno = res;
+			perror("pthread_create");
+			close(fd);
+		}
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_storage listenaddr, addr;
+	unsigned int max_pacing_rate = 0;
+	unsigned long total = 0;
+	char *host = NULL;
+	int fd, c, on = 1;
+	char *buffer;
+	int sflg = 0;
+	int mss = 0;
+
+	while ((c = getopt(argc, argv, "46p:svr:w:H:zxkP:M:")) != -1) {
+		switch (c) {
+		case '4':
+			cfg_family = PF_INET;
+			cfg_alen = sizeof(struct sockaddr_in);
+			break;
+		case '6':
+			cfg_family = PF_INET6;
+			cfg_alen = sizeof(struct sockaddr_in6);
+			break;
+		case 'p':
+			cfg_port = atoi(optarg);
+			break;
+		case 'H':
+			host = optarg;
+			break;
+		case 's': /* server : listen for incoming connections */
+			sflg++;
+			break;
+		case 'r':
+			rcvbuf = atoi(optarg);
+			break;
+		case 'w':
+			sndbuf = atoi(optarg);
+			break;
+		case 'z':
+			zflg = 1;
+			break;
+		case 'M':
+			mss = atoi(optarg);
+			break;
+		case 'x':
+			xflg = 1;
+			break;
+		case 'k':
+			keepflag = 1;
+			break;
+		case 'P':
+			max_pacing_rate = atoi(optarg) ;
+			break;
+		default:
+			exit(1);
+		}
+	}
+	if (sflg) {
+		int fdlisten = socket(cfg_family, SOCK_STREAM, 0);
+
+		if (fdlisten == -1) {
+			perror("socket");
+			exit(1);
+		}
+		apply_rcvsnd_buf(fdlisten);
+		setsockopt(fdlisten, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
+
+		setup_sockaddr(cfg_family, host, &listenaddr);
+
+		if (mss &&
+		    setsockopt(fdlisten, SOL_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
+			perror("setsockopt TCP_MAXSEG");
+			exit(1);
+		}
+		if (bind(fdlisten, (const struct sockaddr *)&listenaddr, cfg_alen) == -1) {
+			perror("bind");
+			exit(1);
+		}
+		if (listen(fdlisten, 128) == -1) {
+			perror("listen");
+			exit(1);
+		}
+		do_accept(fdlisten);
+	}
+	buffer = mmap(NULL, chunk_size, PROT_READ | PROT_WRITE,
+			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buffer == (char *)-1) {
+		perror("mmap");
+		exit(1);
+	}
+
+	fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (fd == -1) {
+		perror("socket");
+		exit(1);
+	}
+	apply_rcvsnd_buf(fd);
+
+	setup_sockaddr(cfg_family, host, &addr);
+
+	if (mss &&
+	    setsockopt(fd, SOL_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
+		perror("setsockopt TCP_MAXSEG");
+		exit(1);
+	}
+	if (connect(fd, (const struct sockaddr *)&addr, cfg_alen) == -1) {
+		perror("connect");
+		exit(1);
+	}
+	if (max_pacing_rate &&
+	    setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
+		       &max_pacing_rate, sizeof(max_pacing_rate)) == -1)
+		perror("setsockopt SO_MAX_PACING_RATE");
+
+	if (zflg && setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY,
+			       &on, sizeof(on)) == -1) {
+		perror("setsockopt SO_ZEROCOPY, (-z option disabled)");
+		zflg = 0;
+	}
+	while (total < FILE_SZ) {
+		long wr = FILE_SZ - total;
+
+		if (wr > chunk_size)
+			wr = chunk_size;
+		/* Note : we just want to fill the pipe with 0 bytes */
+		wr = send(fd, buffer, wr, zflg ? MSG_ZEROCOPY : 0);
+		if (wr <= 0)
+			break;
+		total += wr;
+	}
+	close(fd);
+	munmap(buffer, chunk_size);
+	return 0;
+}
-- 
2.17.0.484.g0c8726318c-goog

* Re: [PATCH net-next 0/5] tcp: add zero copy receive
  2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
                   ` (4 preceding siblings ...)
  2018-04-16 17:33 ` [PATCH net-next 5/5] selftests: net: add tcp_mmap program Eric Dumazet
@ 2018-04-16 22:48 ` David Miller
  5 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2018-04-16 22:48 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, ncardwell, ycheng, soheil, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 16 Apr 2018 10:33:34 -0700

> This patch series adds mmap() support to TCP sockets for RX zero copy.
> 
> While the tcp_mmap() patch itself is quite small (~100 LOC), optimal
> support for asynchronous mmap() required better SO_RCVLOWAT behavior, and
> a test program to demonstrate how mmap() on TCP sockets can be used.
> 
> Note that mmap() (and the associated munmap()) calls add more
> pressure on the per-process VM semaphore, so they might not show a benefit
> for processes with a high number of threads.

Great work.  I can see how it is less effective without the rcvlowat
fixes/tweaks.

Series applied, thanks Eric!

* Re: [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
  2018-04-16 17:33 ` [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive Eric Dumazet
@ 2018-04-19 23:15   ` Eric Dumazet
  2018-04-20  1:01     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2018-04-19 23:15 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh



On 04/16/2018 10:33 AM, Eric Dumazet wrote:
> Some networks can ensure that TCP payloads exactly fit 4KB pages,
> with well chosen MSS/MTU and architectures.
> 
> Implement the mmap() system call so that applications can avoid
> copying data without complex splice() games.
> 
> Note that a successful mmap(X bytes) on a TCP socket consumes
> bytes, as if recvmsg() had been done. (tp->copied += X)
> 

Oh well, I should have run this code with LOCKDEP enabled :/

[  974.320412] ======================================================
[  974.326631] WARNING: possible circular locking dependency detected
[  974.332816] 4.16.0-dbx-DEV #40 Not tainted
[  974.336927] ------------------------------------------------------
[  974.343107] b78299096/15790 is trying to acquire lock:
[  974.348246] 000000006074c9cf (sk_lock-AF_INET6){+.+.}, at: tcp_mmap+0x7c/0x550
[  974.355505] 
               but task is already holding lock:
[  974.361366] 000000008dbe063b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x99/0x100
[  974.368801] 
               which lock already depends on the new lock.

[  974.377010] 
               the existing dependency chain (in reverse order) is:
[  974.384501] 
               -> #1 (&mm->mmap_sem){++++}:
[  974.389911]        __might_fault+0x68/0x90
[  974.394025]        _copy_from_user+0x23/0xa0
[  974.398311]        sock_setsockopt+0x4a2/0xac0
[  974.402761]        __sys_setsockopt+0xd9/0xf0
[  974.407118]        SyS_setsockopt+0xe/0x20
[  974.411242]        do_syscall_64+0x6e/0x1a0
[  974.415431]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  974.421011] 
               -> #0 (sk_lock-AF_INET6){+.+.}:
[  974.426690]        lock_acquire+0x95/0x1e0
[  974.430813]        lock_sock_nested+0x71/0xa0
[  974.435196]        tcp_mmap+0x7c/0x550
[  974.438940]        sock_mmap+0x23/0x30
[  974.442695]        mmap_region+0x3a4/0x5d0
[  974.446808]        do_mmap+0x313/0x530
[  974.450571]        vm_mmap_pgoff+0xc7/0x100
[  974.454769]        ksys_mmap_pgoff+0x1d5/0x260
[  974.459247]        SyS_mmap+0x1b/0x30
[  974.462936]        do_syscall_64+0x6e/0x1a0
[  974.467114]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  974.472678] 
               other info that might help us debug this:

[  974.480677]  Possible unsafe locking scenario:

[  974.486600]        CPU0                    CPU1
[  974.491152]        ----                    ----
[  974.495684]   lock(&mm->mmap_sem);
[  974.499089]                                lock(sk_lock-AF_INET6);
[  974.505285]                                lock(&mm->mmap_sem);
[  974.511211]   lock(sk_lock-AF_INET6);
[  974.514885] 
                *** DEADLOCK ***

[  974.520825] 1 lock held by b78299096/15790:
[  974.525018]  #0: 000000008dbe063b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x99/0x100
[  974.532852] 
               stack backtrace:
[  974.537224] CPU: 25 PID: 15790 Comm: b78299096 Not tainted 4.16.0-dbx-DEV #40
[  974.544371] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[  974.551333] Call Trace:
[  974.553792]  dump_stack+0x70/0xa5
[  974.557111]  print_circular_bug.isra.39+0x1d8/0x1e6
[  974.561982]  __lock_acquire+0x1284/0x1340
[  974.565992]  ? tcp_mmap+0x7c/0x550
[  974.569419]  lock_acquire+0x95/0x1e0
[  974.573011]  ? lock_acquire+0x95/0x1e0
[  974.576767]  ? tcp_mmap+0x7c/0x550
[  974.580167]  lock_sock_nested+0x71/0xa0
[  974.584023]  ? tcp_mmap+0x7c/0x550
[  974.587437]  tcp_mmap+0x7c/0x550
[  974.590677]  sock_mmap+0x23/0x30
[  974.593909]  mmap_region+0x3a4/0x5d0
[  974.597506]  do_mmap+0x313/0x530
[  974.600749]  vm_mmap_pgoff+0xc7/0x100
[  974.604414]  ksys_mmap_pgoff+0x1d5/0x260
[  974.608341]  ? fd_install+0x25/0x30
[  974.611849]  ? trace_hardirqs_on_caller+0xef/0x180
[  974.616641]  SyS_mmap+0x1b/0x30
[  974.619804]  do_syscall_64+0x6e/0x1a0
[  974.623462]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  974.628549] RIP: 0033:0x433749
[  974.631600] RSP: 002b:00007ffd29fdb438 EFLAGS: 00000216 ORIG_RAX: 0000000000000009
[  974.639197] RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000433749
[  974.646323] RDX: 0000000000000008 RSI: 0000000000004000 RDI: 0000000020ab7000
[  974.653463] RBP: 00007ffd29fdb460 R08: 0000000000000003 R09: 0000000000000000
[  974.660603] R10: 0000000000000012 R11: 0000000000000216 R12: 0000000000401670
[  974.667737] R13: 0000000000401700 R14: 0000000000000000 R15: 0000000000000000


I am not sure we can keep the mmap() API, since we probably need to lock
the socket first, then grab the VM semaphore.

* Re: [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
  2018-04-19 23:15   ` Eric Dumazet
@ 2018-04-20  1:01     ` Eric Dumazet
  2018-04-20  1:17       ` David Miller
  2018-04-20 15:19       ` Jonathan Corbet
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-04-20  1:01 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh



On 04/19/2018 04:15 PM, Eric Dumazet wrote:

> I am not sure we can keep the mmap() API, since we probably need to lock
> the socket first, then grab the VM semaphore.
> 

We can keep the nice mmap() interface, provided we can add one hook like in the following patch.

David, do you think such a patch would be acceptable to lkml and the mm/fs maintainers?

An alternative would be implementing an ioctl() or getsockopt() operation,
but it seems less natural...

Thanks !

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92efaf1f89775f7b017477617dd983c10e0dc4d2..016c711ac33e226b4285ee5bd688e14661dc0879 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1714,6 +1714,7 @@ struct file_operations {
        long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
        long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
+       void (*mmap_hook) (struct file *, bool);
        unsigned long mmap_supported_flags;
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *, fl_owner_t id);
diff --git a/mm/util.c b/mm/util.c
index 1fc4fa7576f762bbbf341f056ca6d0be803a423f..b546c59a6169c4dfa9011c61e86da4d03496aa4d 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -350,11 +350,20 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
        ret = security_mmap_file(file, prot, flag);
        if (!ret) {
-               if (down_write_killable(&mm->mmap_sem))
+               void (*mmap_hook)(struct file *, bool) = file ? file->f_op->mmap_hook : NULL;
+
+               if (mmap_hook)
+                       mmap_hook(file, true);
+               if (down_write_killable(&mm->mmap_sem)) {
+                       if (mmap_hook)
+                               mmap_hook(file, false);
                        return -EINTR;
+               }
                ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
                                    &populate, &uf);
                up_write(&mm->mmap_sem);
+               if (mmap_hook)
+                       mmap_hook(file, false);
                userfaultfd_unmap_complete(mm, &uf);
                if (populate)
                        mm_populate(ret, populate);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4022073b0aeea9d07af0fa825b640a00512908a3..79b05d6d41643e8c309dfb8bd9597dc8b00fb0e1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1756,8 +1756,6 @@ int tcp_mmap(struct file *file, struct socket *sock,
        /* TODO: Maybe the following is not needed if pages are COW */
        vma->vm_flags &= ~VM_MAYWRITE;
 
-       lock_sock(sk);
-
        ret = -ENOTCONN;
        if (sk->sk_state == TCP_LISTEN)
                goto out;
@@ -1833,7 +1831,6 @@ int tcp_mmap(struct file *file, struct socket *sock,
 
        ret = 0;
 out:
-       release_sock(sk);
        kvfree(pages_array);
        return ret;
 }
diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78c193b49379b0ec641d81367fb4cf..bcabae3c37d765e5c0548a14fc93c19258972b48 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -131,6 +131,16 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
                                struct pipe_inode_info *pipe, size_t len,
                                unsigned int flags);
 
+static void sock_mmap_hook(struct file *file, bool enter)
+{
+       struct socket *sock = file->private_data;
+       struct sock *sk = sock->sk;
+
+       if (enter)
+               lock_sock(sk);
+       else
+               release_sock(sk);
+}
 /*
  *     Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
  *     in the operation structures but are done directly via the socketcall() multiplexor.
@@ -147,6 +157,7 @@ static const struct file_operations socket_file_ops = {
        .compat_ioctl = compat_sock_ioctl,
 #endif
        .mmap =         sock_mmap,
+       .mmap_hook =    sock_mmap_hook,
        .release =      sock_close,
        .fasync =       sock_fasync,
        .sendpage =     sock_sendpage,

* Re: [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
  2018-04-20  1:01     ` Eric Dumazet
@ 2018-04-20  1:17       ` David Miller
  2018-04-20 15:19       ` Jonathan Corbet
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2018-04-20  1:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: edumazet, netdev, ncardwell, ycheng, soheil

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 Apr 2018 18:01:32 -0700

> David, do you think such a patch would be acceptable to lkml and the
> mm/fs maintainers?

You will have to ask them directly I think :)

* Re: [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning
  2018-04-16 17:33 ` [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning Eric Dumazet
@ 2018-04-20  2:02   ` Marcelo Ricardo Leitner
  2018-04-20  2:36     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-04-20  2:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

On Mon, Apr 16, 2018 at 10:33:35AM -0700, Eric Dumazet wrote:
> Applications might use SO_RCVLOWAT on a TCP socket, hoping to receive
> a single [E]POLLIN event only when a given number of bytes are ready in
> the socket receive queue.
>
> The problem is that receive autotuning is not aware of this constraint,
> meaning sk_rcvbuf might be too small to allow all the bytes to be stored.
>
> Add a new (struct proto_ops)->set_rcvlowat method so that a protocol
> can override the default setsockopt(SO_RCVLOWAT) behavior.
>

...

> +/* Make sure sk_rcvbuf is big enough to satisfy SO_RCVLOWAT hint */
> +int tcp_set_rcvlowat(struct sock *sk, int val)
> +{
> +	sk->sk_rcvlowat = val ? : 1;
> +	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
> +		return 0;
> +
> +	/* val comes from user space and might be close to INT_MAX */
> +	val <<= 1;
> +	if (val < 0)
> +		val = INT_MAX;
> +
> +	val = min(val, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);

Hi Eric,

As val may be changed to a smaller value by the line above, shouldn't
it assign sk->sk_rcvlowat again?  Otherwise it may still be bigger
than sk_rcvbuf.

Say val = 512k, sysctl_tcp_rmem[2] = 256k
val <<= 1 ,  val = 1M
val = min() ,  val = 256k
val > sk_rcvbuf
   sk_rcvbuf = 256k , at most, which is smaller than sk_rcvlowat

Without reassigning it, the application has to check how big tcp_rmem[2]
is and be sure not to go above half of it, to avoid tripping on this
again.

Or, as you have added a return value here, it could return -EINVAL in
such cases. Probably better, as then the application will not get a
smaller buffer than wanted later.

> +	if (val > sk->sk_rcvbuf) {
> +		sk->sk_rcvbuf = val;
> +		tcp_sk(sk)->window_clamp = tcp_win_from_space(sk, val);
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL(tcp_set_rcvlowat);
> +
...

* Re: [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning
  2018-04-20  2:02   ` Marcelo Ricardo Leitner
@ 2018-04-20  2:36     ` Eric Dumazet
  2018-04-20  3:04       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2018-04-20  2:36 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

On Thu, Apr 19, 2018 at 7:02 PM Marcelo Ricardo Leitner <
marcelo.leitner@gmail.com> wrote:

> Hi Eric,

> As val may be changed to a smaller value by the line above, shouldn't
> it assign sk->sk_rcvlowat again?  Otherwise it may still be bigger
> than sk_rcvbuf.

> Say val = 512k, sysctl_tcp_rmem[2] = 256k
> val <<= 1 ,  val = 1M
> val = min() ,  val = 256k
> val > sk_rcvbuf
>     sk_rcvbuf = 256k , at most, which is smaller than sk_rcvlowat

> Without reassigning it, the application has to check how big tcp_rmem[2]
> is and be sure not to go above half of it, to avoid tripping on this
> again.

I am not sure about that:

Reporting an error might break existing applications that were not
expecting setsockopt() to return an error, even if the value was
'probably too big to be okay'.


> Or, as you have added a return value here, it could return -EINVAL in
> such cases. Probably better, as then the application will not get a
> smaller buffer than wanted later.

Note that some applications might first set SO_RCVLOWAT, then SO_RCVBUF;
we do not want to break them.


My patch really covers the case where autotuning should immediately grow
sk_rcvbuf for reasonable SO_RCVLOWAT values.

* Re: [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning
  2018-04-20  2:36     ` Eric Dumazet
@ 2018-04-20  3:04       ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 15+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-04-20  3:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

On Fri, Apr 20, 2018 at 02:36:52AM +0000, Eric Dumazet wrote:
> On Thu, Apr 19, 2018 at 7:02 PM Marcelo Ricardo Leitner <
> marcelo.leitner@gmail.com> wrote:
>
> > Hi Eric,
>
> > As val may be changed to a smaller value by the line above, shouldn't
> > it assign sk->sk_rcvlowat again?  Otherwise it may still be bigger
> > than sk_rcvbuf.
>
> > Say val = 512k, sysctl_tcp_rmem[2] = 256k
> > val <<= 1 ,  val = 1M
> > val = min() ,  val = 256k
> > val > sk_rcvbuf
> >     sk_rcvbuf = 256k , at most, which is smaller than sk_rcvlowat
>
> > Without reassigning it, the application has to check how big tcp_rmem[2]
> > is and be sure not to go above half of it, to avoid tripping on this
> > again.
>
> I am not sure about that:
>
> Reporting an error might break existing applications that were not
> expecting setsockopt() to return an error, even if the value was
> 'probably too big to be okay'.

I would argue that they are already broken but...

>
>
> > Or, as you have added a return value here, it could return -EINVAL in
> > such cases. Probably better, as then the application will not get a
> > smaller buffer than wanted later.
>
> Note that some applications might first set SO_RCVLOWAT, then SO_RCVBUF;
> we do not want to break them.

... yeah.. if they do it this way, they work today. Good point.

>
>
> My patch really covers the case where autotuning should immediately grow
> sk_rcvbuf for reasonable SO_RCVLOWAT values.

That's not exactly what the comment above the function says, hence my
comments.

Thanks.

* Re: [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
  2018-04-20  1:01     ` Eric Dumazet
  2018-04-20  1:17       ` David Miller
@ 2018-04-20 15:19       ` Jonathan Corbet
  2018-04-20 15:39         ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Corbet @ 2018-04-20 15:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On Thu, 19 Apr 2018 18:01:32 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> We can keep the nice mmap() interface, provided we can add one hook like in the following patch.
> 
> David, do you think such a patch would be acceptable to lkml and the mm/fs maintainers?
> 
> An alternative would be implementing an ioctl() or getsockopt() operation,
> but it seems less natural...

So I have little standing here, but what the heck, not letting that bother
me has earned me a living for the last 20 years or so...:)

I think you should consider switching over to an interface where you
mmap() the region once, and use ioctl() to move the data into that region
(see the sketch after this list), for a couple of reasons beyond the
locking issues you've already found:

 - The "mmap() consumes data" semantics are a bit ... strange, IMO.
   That's not what mmap() normally does.  People expect ioctl() to do
   magic things, instead.

 - I would expect it to be a tiny bit faster, since you wouldn't be doing
   the VMA setup and teardown each time.
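
A minimal sketch of that interface shape (all names below are hypothetical,
just to make the idea concrete):

	struct tcp_zc_receive {
		unsigned long address;	/* region mapped once at startup */
		unsigned int  length;	/* in: bytes wanted; out: bytes mapped */
	};

	void *region = mmap(NULL, REGION_SZ, PROT_READ, MAP_SHARED, fd, 0);

	for (;;) {
		struct tcp_zc_receive zc = {
			.address = (unsigned long)region,
			.length  = CHUNK,
		};

		if (ioctl(fd, TCP_ZC_RECEIVE, &zc) < 0)	/* hypothetical request */
			break;
		/* zc.length bytes of payload are now visible at region */
	}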

Thanks,

jon

* Re: [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
  2018-04-20 15:19       ` Jonathan Corbet
@ 2018-04-20 15:39         ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-04-20 15:39 UTC (permalink / raw)
  To: Jonathan Corbet, Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh



On 04/20/2018 08:19 AM, Jonathan Corbet wrote:
> On Thu, 19 Apr 2018 18:01:32 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> We can keep the nice mmap() interface, provided we can add one hook like in the following patch.
>>
>> David, do you think such a patch would be acceptable to lkml and the mm/fs maintainers?
>>
>> An alternative would be implementing an ioctl() or getsockopt() operation,
>> but it seems less natural...
> 

Hi Jonathan

> So I have little standing here, but what the heck, not letting that bother
> me has earned me a living for the last 20 years or so...:)
> 
> I think you should consider switching over to an interface where you
> mmap() the region once, and use ioctl() to move the data into that region,
> for a couple of reasons beyond the locking issues you've already found:
> 
>  - The "mmap() consumes data" semantics are a bit ... strange, IMO.
>    That's not what mmap() normally does.  People expect ioctl() to do
>    magic things, instead.

Well, the thing is that most of our use cases won't reuse the same mmap() area.

The RPC layer will provide all RPCs with their associated pages to RPC consumers.

RPC consumers will decide to keep these pages or consume them.

So having to do mmap() + another syscall to consume XXX bytes from the receive
queue is not going to save CPU cycles :/

Having the ability to call mmap() multiple times for the same TCP payload is not
going to be of any use in real applications. This is why I only support 'offset 0'
for the last mmap() parameter.

> 
>  - I would expect it to be a tiny bit faster, since you wouldn't be doing
>    the VMA setup and teardown each time.

Maybe for the degenerate case we can reuse the same region over and over.

end of thread

Thread overview: 15+ messages
2018-04-16 17:33 [PATCH net-next 0/5] tcp: add zero copy receive Eric Dumazet
2018-04-16 17:33 ` [PATCH net-next 1/5] tcp: fix SO_RCVLOWAT and RCVBUF autotuning Eric Dumazet
2018-04-20  2:02   ` Marcelo Ricardo Leitner
2018-04-20  2:36     ` Eric Dumazet
2018-04-20  3:04       ` Marcelo Ricardo Leitner
2018-04-16 17:33 ` [PATCH net-next 2/5] tcp: fix delayed acks behavior for SO_RCVLOWAT Eric Dumazet
2018-04-16 17:33 ` [PATCH net-next 3/5] tcp: avoid extra wakeups for SO_RCVLOWAT users Eric Dumazet
2018-04-16 17:33 ` [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive Eric Dumazet
2018-04-19 23:15   ` Eric Dumazet
2018-04-20  1:01     ` Eric Dumazet
2018-04-20  1:17       ` David Miller
2018-04-20 15:19       ` Jonathan Corbet
2018-04-20 15:39         ` Eric Dumazet
2018-04-16 17:33 ` [PATCH net-next 5/5] selftests: net: add tcp_mmap program Eric Dumazet
2018-04-16 22:48 ` [PATCH net-next 0/5] tcp: add zero copy receive David Miller
