* [PATCH net-next] tcp: reduce skb overhead in selected places
@ 2017-01-24 22:57 Eric Dumazet
  2017-01-25 18:17 ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2017-01-24 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

tcp_add_backlog() can use the skb_condense() helper to get better
gains and less SKB_TRUESIZE() magic. This only happens when the socket
backlog has to be used.

Some attacks involve specially crafted tiny out-of-order TCP packets
that clog the ofo queue of (many) sockets.
Later, an expensive collapse pass tries to copy all these skbs into
fewer, larger ones.
This unfortunately does not work if the skbs have no neighbors in TCP
sequence order.

By using skb_condense() when an skb could not be coalesced with a
prior one, we defeat this kind of threat, potentially saving 4K per
skb (or more, since the data is held in a page fragment).

A typical NAPI driver allocates GRO packets with GRO_MAX_HEAD bytes
in skb->head, so the copy done by skb_condense() is limited to about
200 bytes.
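
For reference, the helper roughly does the following (paraphrased
sketch of skb_condense() in net/core/skbuff.c, not a verbatim copy):

void skb_condense(struct sk_buff *skb)
{
	if (skb->data_len) {
		/* Give up if the frag data does not fit into the linear
		 * tailroom, or if the skb is shared with a clone.
		 */
		if (skb->data_len > skb->end - skb->tail ||
		    skb_cloned(skb))
			return;

		/* Copy the page frag content into skb->head and free it */
		__pskb_pull_tail(skb, skb->data_len);
	}
	/* With the page fragment gone (or never present), the truesize
	 * estimated at allocation time is too big: recompute it from the
	 * linear buffer alone.
	 */
	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
}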

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c |    1 +
 net/ipv4/tcp_ipv4.c  |    3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfa165cc455ad0a9aea44964aa663dbe6085..3de6eba378ade2c0d4a8400ecb5582a7d126 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4507,6 +4507,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 end:
 	if (skb) {
 		tcp_grow_window(sk, skb);
+		skb_condense(skb);
 		skb_set_owner_r(skb, sk);
 	}
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f7325b25b06e65581ecc496f95e819aa738c..a90b4540c11eca6ed5b374ec69c8ced2ff18 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1556,8 +1556,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	 * It has been noticed pure SACK packets were sometimes dropped
 	 * (if cooked by drivers without copybreak feature).
 	 */
-	if (!skb->data_len)
-		skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
+	skb_condense(skb);
 
 	if (unlikely(sk_add_backlog(sk, skb, limit))) {
 		bh_unlock_sock(sk);


* Re: [PATCH net-next] tcp: reduce skb overhead in selected places
  2017-01-24 22:57 [PATCH net-next] tcp: reduce skb overhead in selected places Eric Dumazet
@ 2017-01-25 18:17 ` David Miller
  2017-01-25 18:38   ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2017-01-25 18:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 24 Jan 2017 14:57:36 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> tcp_add_backlog() can use the skb_condense() helper to get better
> gains and less SKB_TRUESIZE() magic. This only happens when the socket
> backlog has to be used.
> 
> Some attacks involve specially crafted tiny out-of-order TCP packets
> that clog the ofo queue of (many) sockets.
> Later, an expensive collapse pass tries to copy all these skbs into
> fewer, larger ones.
> This unfortunately does not work if the skbs have no neighbors in TCP
> sequence order.
> 
> By using skb_condense() when an skb could not be coalesced with a
> prior one, we defeat this kind of threat, potentially saving 4K per
> skb (or more, since the data is held in a page fragment).
> 
> A typical NAPI driver allocates GRO packets with GRO_MAX_HEAD bytes
> in skb->head, so the copy done by skb_condense() is limited to about
> 200 bytes.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.


* Re: [PATCH net-next] tcp: reduce skb overhead in selected places
  2017-01-25 18:17 ` David Miller
@ 2017-01-25 18:38   ` Eric Dumazet
  2017-01-25 19:03     ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2017-01-25 18:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Wed, 2017-01-25 at 13:17 -0500, David Miller wrote:

> Applied, thanks Eric.

Thanks David.

It looks like the potentially big IPv6 network headers are also a
threat:

The various pskb_may_pull() calls used to pull headers might reallocate
skb->head, but skb->truesize is not updated in __pskb_pull_tail().

We probably need to update skb->truesize, but it is tricky, as the prior
skb->truesize value might have been used for memory accounting when the
skb was stored in some queue.

Do you think we could change __pskb_pull_tail() right away and fix the
few places that would break, or should we add various helpers with extra
parameters to take a safe route?
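
Purely to illustrate the second option (helper name and signature
invented, this is not existing kernel code), something like:

#include <linux/skbuff.h>

/* Like pskb_may_pull(), but also shrinks skb->truesize once all page
 * frags are gone.  Only safe if the skb is not yet charged to a socket
 * queue, hence the !skb->sk check.
 */
static inline bool pskb_may_pull_and_condense(struct sk_buff *skb,
					      unsigned int len)
{
	if (!pskb_may_pull(skb, len))
		return false;
	if (!skb->sk && !skb->data_len)
		skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
	return true;
}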


* Re: [PATCH net-next] tcp: reduce skb overhead in selected places
  2017-01-25 18:38   ` Eric Dumazet
@ 2017-01-25 19:03     ` David Miller
  2017-01-25 21:40       ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2017-01-25 19:03 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Jan 2017 10:38:52 -0800

> Do you think we could change __pskb_pull_tail() right away and fix the
> few places that would break, or should we add various helpers with extra
> parameters to take a safe route?

It should always be safe as long as we see no socket attached on RX,
right?

That's the only real case where truesize adjustments can cause trouble.


* Re: [PATCH net-next] tcp: reduce skb overhead in selected places
  2017-01-25 19:03     ` David Miller
@ 2017-01-25 21:40       ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2017-01-25 21:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Wed, 2017-01-25 at 14:03 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 25 Jan 2017 10:38:52 -0800
> 
> > Do you think we could change __pskb_pull_tail() right away and fix the
> > few places that would break, or should we add various helpers with extra
> > parameters to take a safe route?
> 
> It should always be safe as long as we see no socket attached on RX,
> right?
> 
> That's the only real case where truesize adjustments can cause trouble.

The queue can also be virtual, as on the xmit path, where skb->truesize
is tracked in sk->sk_wmem_alloc.

If a layer calls pskb_may_pull(), we cannot change skb->truesize
without also changing skb->sk->sk_wmem_alloc, or sock_wfree() will
trigger bugs.
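
As a simplified illustration of that contract (not the exact kernel
code, helper name invented):

#include <net/sock.h>

/* On xmit, the skb is charged to the socket with its current truesize,
 * roughly what skb_set_owner_w() does ...
 */
static void charge_skb_to_sk(struct sk_buff *skb, struct sock *sk)
{
	skb->sk = sk;
	skb->destructor = sock_wfree;
	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
}

/* ... and sock_wfree() later subtracts skb->truesize from
 * sk->sk_wmem_alloc.  If truesize shrank in between (say inside
 * __pskb_pull_tail()), less gets uncharged than was charged and
 * sk_wmem_alloc drifts upward, tripping warnings when the socket
 * is destroyed.
 */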

