All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] tcp: no longer keep around headers in input path
@ 2014-09-15  2:56 Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Dumazet @ 2014-09-15  2:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

Looking at tcp_try_coalesce() I was wondering why I did :

if (tcp_hdr(from)->fin)
     return false;

The answer would be to allow the aggregation, if we simply OR the FIN and PSH
flags eventually present in @from to @to packet.

Then, looking at tcp_recvmsg(), I realized we access tcp_hdr(skb)->syn
(and maybe tcp_hdr(skb)->fin) for every packet we process from socket
receive queue.

We have to understand TCP flags are cold in cpu caches most of the time
(assuming RPS/RFS is used, or that application calls recvmsg() a long
time after incoming packet was processed), and bringing a whole
cache line only to access one bit is not very nice.

It would make sense to use in TCP input path TCP_SKB_CB(skb)->tcp_flags
as we do in output path.

This saves one cache line miss, and TCP tcp_collapse() could avoid dealing
with the headers.

Eric Dumazet (3):
  tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
  tcp: allow segment with FIN in tcp_try_coalesce()
  tcp: do not copy headers in tcp_collapse()

 net/ipv4/tcp.c       | 18 ++++++++++--------
 net/ipv4/tcp_input.c | 31 ++++++++-----------------------
 net/ipv4/tcp_ipv4.c  |  1 +
 net/ipv6/tcp_ipv6.c  |  1 +
 4 files changed, 20 insertions(+), 31 deletions(-)

-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
  2014-09-15  2:56 [PATCH net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
@ 2014-09-15  2:56 ` Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2014-09-15  2:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

Input path of TCP do not currently uses TCP_SKB_CB(skb)->tcp_flags,
which is only used in output path.

tcp_recvmsg(), looks at tcp_hdr(skb)->syn for every skb found in receive queue,
and it is unfortunate because this bit is located in a cache line right before
the payload.

We can simplify TCP by copying tcp flags into TCP_SKB_CB(skb)->tcp_flags.

This patch does so, and avoids the cache line miss in tcp_recvmsg()

Following patches will
- allow a segment with FIN being coalesced in tcp_try_coalesce()
- simplify tcp_collapse() by not copying the headers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c       | 18 ++++++++++--------
 net/ipv4/tcp_input.c | 10 +++++-----
 net/ipv4/tcp_ipv4.c  |  1 +
 net/ipv6/tcp_ipv6.c  |  1 +
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 541f26a67ba2..070aeff1b131 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1510,9 +1510,9 @@ static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 
 	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
 		offset = seq - TCP_SKB_CB(skb)->seq;
-		if (tcp_hdr(skb)->syn)
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
 			offset--;
-		if (offset < skb->len || tcp_hdr(skb)->fin) {
+		if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) {
 			*off = offset;
 			return skb;
 		}
@@ -1585,7 +1585,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 			if (offset + 1 != skb->len)
 				continue;
 		}
-		if (tcp_hdr(skb)->fin) {
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
 			sk_eat_skb(sk, skb, false);
 			++seq;
 			break;
@@ -1722,11 +1722,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				break;
 
 			offset = *seq - TCP_SKB_CB(skb)->seq;
-			if (tcp_hdr(skb)->syn)
+			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
 				offset--;
 			if (offset < skb->len)
 				goto found_ok_skb;
-			if (tcp_hdr(skb)->fin)
+			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 				goto found_fin_ok;
 			WARN(!(flags & MSG_PEEK),
 			     "recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n",
@@ -1959,7 +1959,7 @@ skip_copy:
 		if (used + offset < skb->len)
 			continue;
 
-		if (tcp_hdr(skb)->fin)
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
 		if (!(flags & MSG_PEEK)) {
 			sk_eat_skb(sk, skb, copied_early);
@@ -2160,8 +2160,10 @@ void tcp_close(struct sock *sk, long timeout)
 	 *  reader process may not have drained the data yet!
 	 */
 	while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
-		u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq -
-			  tcp_hdr(skb)->fin;
+		u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
+
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+			len--;
 		data_was_unread += len;
 		__kfree_skb(skb);
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f97003ad0af5..8f639a4face9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4093,7 +4093,7 @@ static void tcp_ofo_queue(struct sock *sk)
 		__skb_unlink(skb, &tp->out_of_order_queue);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
-		if (tcp_hdr(skb)->fin)
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			tcp_fin(sk);
 	}
 }
@@ -4513,7 +4513,7 @@ restart:
 		 * - bloated or contains data before "start" or
 		 *   overlaps to the next one.
 		 */
-		if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
+		if (!(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) &&
 		    (tcp_win_from_space(skb->truesize) > skb->len ||
 		     before(TCP_SKB_CB(skb)->seq, start))) {
 			end_of_skbs = false;
@@ -4532,7 +4532,8 @@ restart:
 		/* Decided to skip this, advance start seq. */
 		start = TCP_SKB_CB(skb)->end_seq;
 	}
-	if (end_of_skbs || tcp_hdr(skb)->syn || tcp_hdr(skb)->fin)
+	if (end_of_skbs ||
+	    (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
 		return;
 
 	while (before(start, end)) {
@@ -4579,8 +4580,7 @@ restart:
 				skb = tcp_collapse_one(sk, skb, list);
 				if (!skb ||
 				    skb == tail ||
-				    tcp_hdr(skb)->syn ||
-				    tcp_hdr(skb)->fin)
+				    (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
 					return;
 			}
 		}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7881b96d2b72..006b045716d8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1638,6 +1638,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
 				    skb->len - th->doff * 4);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
+	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
 	TCP_SKB_CB(skb)->sacked	 = 0;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1835480336ac..de51a88bec6f 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1415,6 +1415,7 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
 				    skb->len - th->doff*4);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
+	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr);
 	TCP_SKB_CB(skb)->sacked = 0;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce()
  2014-09-15  2:56 [PATCH net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
@ 2014-09-15  2:56 ` Eric Dumazet
  2014-09-15 10:57   ` Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet
  2 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2014-09-15  2:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

There is nothing special with allowing a segment with FIN to be aggregated,
if we take care to add tcp flags.

Signed-of-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8f639a4face9..228bf0c5ff19 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4143,9 +4143,6 @@ static bool tcp_try_coalesce(struct sock *sk,
 
 	*fragstolen = false;
 
-	if (tcp_hdr(from)->fin)
-		return false;
-
 	/* Its possible this segment overlaps with prior segment in queue */
 	if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
 		return false;
@@ -4158,6 +4155,7 @@ static bool tcp_try_coalesce(struct sock *sk,
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
 	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
 	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
+	TCP_SKB_CB(to)->tcp_flags |= TCP_SKB_CB(from)->tcp_flags;
 	return true;
 }
 
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next 3/3] tcp: do not copy headers in tcp_collapse()
  2014-09-15  2:56 [PATCH net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
  2014-09-15  2:56 ` [PATCH net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
@ 2014-09-15  2:56 ` Eric Dumazet
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2014-09-15  2:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

tcp_collapse() wants to shrink skb so that the overhead is minimal.

Now we store tcp flags into TCP_SKB_CB(skb)->tcp_flags, we no longer
need to keep around full headers.
Whole available space is dedicated to the payload.

Signed-of-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 228bf0c5ff19..ea92f23ffaf1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4535,26 +4535,13 @@ restart:
 		return;
 
 	while (before(start, end)) {
+		int copy = min_t(int, SKB_MAX_ORDER(0, 0), end - start);
 		struct sk_buff *nskb;
-		unsigned int header = skb_headroom(skb);
-		int copy = SKB_MAX_ORDER(header, 0);
 
-		/* Too big header? This can happen with IPv6. */
-		if (copy < 0)
-			return;
-		if (end - start < copy)
-			copy = end - start;
-		nskb = alloc_skb(copy + header, GFP_ATOMIC);
+		nskb = alloc_skb(copy, GFP_ATOMIC);
 		if (!nskb)
 			return;
 
-		skb_set_mac_header(nskb, skb_mac_header(skb) - skb->head);
-		skb_set_network_header(nskb, (skb_network_header(skb) -
-					      skb->head));
-		skb_set_transport_header(nskb, (skb_transport_header(skb) -
-						skb->head));
-		skb_reserve(nskb, header);
-		memcpy(nskb->head, skb->head, header);
 		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
 		TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
 		__skb_queue_before(list, skb, nskb);
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce()
  2014-09-15  2:56 ` [PATCH net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
@ 2014-09-15 10:57   ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2014-09-15 10:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev, Yuchung Cheng, Neal Cardwell

On Sun, 2014-09-14 at 19:56 -0700, Eric Dumazet wrote:
> There is nothing special with allowing a segment with FIN to be aggregated,
> if we take care to add tcp flags.
> 
> Signed-of-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/tcp_input.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 8f639a4face9..228bf0c5ff19 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4143,9 +4143,6 @@ static bool tcp_try_coalesce(struct sock *sk,
>  
>  	*fragstolen = false;
>  
> -	if (tcp_hdr(from)->fin)
> -		return false;
> -
>  	/* Its possible this segment overlaps with prior segment in queue */
>  	if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
>  		return false;
> @@ -4158,6 +4155,7 @@ static bool tcp_try_coalesce(struct sock *sk,
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
>  	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
>  	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
> +	TCP_SKB_CB(to)->tcp_flags |= TCP_SKB_CB(from)->tcp_flags;
>  	return true;
>  }
>  

This patch needs to be refined.

skb_try_coalesce() should not do an unconditional 
BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));

And do it only if len is non 0

I will send a v2

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-09-15 10:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-15  2:56 [PATCH net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
2014-09-15  2:56 ` [PATCH net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
2014-09-15  2:56 ` [PATCH net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
2014-09-15 10:57   ` Eric Dumazet
2014-09-15  2:56 ` [PATCH net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.