All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] tcp: Always cleanup skb before sending
@ 2017-11-01 21:10 Christoph Paasch
  2017-11-01 21:32 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Paasch @ 2017-11-01 21:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet

Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
This means that on the output path, we need to make sure that it has
been correctly initialized to 0, as is done in tcp_transmit_skb.

However, when going through the other code-path in TCP that can send an
skb (e.g., through tcp_v6_send_synack), we end up in a situation where
IP6CB has some of its fields set to unexpected values. Depending on the
layout of tcp_skb_cb across the different kernel-versions this can be
lastopt, flags,...

This patch makes sure that IPCB/IP6CB is always set to 0 before sending.

Cc: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
---
 include/net/tcp.h     |  2 ++
 net/ipv4/tcp_ipv4.c   |  6 ++++++
 net/ipv4/tcp_output.c | 20 ++++++++++++--------
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e6d0002a1b0b..a375ee8fc534 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2032,6 +2032,8 @@ static inline void tcp_listendrop(const struct sock *sk)
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
+void tcp_skb_cleanup(struct sk_buff *skb);
+
 /*
  * Interface for adding Upper Level Protocols over TCP
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5b027c69cbc5..db7dd65b1f19 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -709,6 +709,9 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
 
 	arg.tos = ip_hdr(skb)->tos;
 	arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
+
+	tcp_skb_cleanup(skb);
+
 	local_bh_disable();
 	ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
@@ -795,6 +798,9 @@ static void tcp_v4_send_ack(const struct sock *sk,
 		arg.bound_dev_if = oif;
 	arg.tos = tos;
 	arg.uid = sock_net_uid(net, sk_fullsock(sk) ? sk : NULL);
+
+	tcp_skb_cleanup(skb);
+
 	local_bh_disable();
 	ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
 			      skb, &TCP_SKB_CB(skb)->header.h4.opt,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 823003eef3a2..6935a68d449b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -973,6 +973,16 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 		      HRTIMER_MODE_ABS_PINNED);
 }
 
+void tcp_skb_cleanup(struct sk_buff *skb)
+{
+	/* Our usage of tstamp should remain private */
+	skb->tstamp = 0;
+
+	/* Cleanup our debris for IP stacks */
+	memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
+			       sizeof(struct inet6_skb_parm)));
+}
+
 /* This routine actually transmits TCP packets queued in by
  * tcp_do_sendmsg().  This is used by both the initial
  * transmission and possible later retransmissions.
@@ -1115,12 +1125,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb_shinfo(skb)->gso_segs = tcp_skb_pcount(skb);
 	skb_shinfo(skb)->gso_size = tcp_skb_mss(skb);
 
-	/* Our usage of tstamp should remain private */
-	skb->tstamp = 0;
-
-	/* Cleanup our debris for IP stacks */
-	memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
-			       sizeof(struct inet6_skb_parm)));
+	tcp_skb_cleanup(skb);
 
 	err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
 
@@ -3204,8 +3209,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	rcu_read_unlock();
 #endif
 
-	/* Do not fool tcpdump (if any), clean our debris */
-	skb->tstamp = 0;
+	tcp_skb_cleanup(skb);
 	return skb;
 }
 EXPORT_SYMBOL(tcp_make_synack);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-01 21:10 [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
@ 2017-11-01 21:32 ` Eric Dumazet
  2017-11-01 21:53   ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-11-01 21:32 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: David Miller, netdev

On Wed, Nov 1, 2017 at 2:10 PM, Christoph Paasch <cpaasch@apple.com> wrote:
> Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
> line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
> This means that on the output path, we need to make sure that it has
> been correctly initialized to 0, as is done in tcp_transmit_skb.
>
> However, when going through the other code-path in TCP that can send an
> skb (e.g., through tcp_v6_send_synack), we end up in a situation where
> IP6CB has some of its fields set to unexpected values. Depending on the
> layout of tcp_skb_cb across the different kernel-versions this can be
> lastopt, flags,...

Or not use tcp_init_nondata_skb() on non fast clones, since it adds
unnecessary writes and clears.

tcp_make_synack() really has no business using tcp_init_nondata_skb()
and could simply set th->seq = htonl(tcp_rsk(req)->snt_isn);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-01 21:32 ` Eric Dumazet
@ 2017-11-01 21:53   ` Eric Dumazet
  2017-11-02  0:10     ` Christoph Paasch
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-11-01 21:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Christoph Paasch, David Miller, netdev

On Wed, 2017-11-01 at 14:32 -0700, Eric Dumazet wrote:
> On Wed, Nov 1, 2017 at 2:10 PM, Christoph Paasch <cpaasch@apple.com> wrote:
> > Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
> > line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
> > This means that on the output path, we need to make sure that it has
> > been correctly initialized to 0, as is done in tcp_transmit_skb.
> >
> > However, when going through the other code-path in TCP that can send an
> > skb (e.g., through tcp_v6_send_synack), we end up in a situation where
> > IP6CB has some of its fields set to unexpected values. Depending on the
> > layout of tcp_skb_cb across the different kernel-versions this can be
> > lastopt, flags,...
> 
> Or not use tcp_init_nondata_skb() on non fast clones, since it adds
> unnecessary writes and clears.
> 
> tcp_make_synack() really has no business using tcp_init_nondata_skb()
> and could simply set th->seq = htonl(tcp_rsk(req)->snt_isn);

Something like :

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 69cfdead0cb49e4365158048a0d1a9bbdd55fa83..5502abc5307f0ce1de610d4b70f3a59c4d5383c5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3399,13 +3399,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
        tcp_ecn_make_synack(req, th);
        th->source = htons(ireq->ir_num);
        th->dest = ireq->ir_rmt_port;
-       /* Setting of flags are superfluous here for callers (and ECE is
-        * not even correctly set)
-        */
-       tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
-                            TCPHDR_SYN | TCPHDR_ACK);
-
-       th->seq = htonl(TCP_SKB_CB(skb)->seq);
+       skb->ip_summed = CHECKSUM_PARTIAL;
+       th->seq = htonl(tcp_rsk(req)->snt_isn);
        /* XXX data is queued and acked as is. No buffer/window check */
        th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-01 21:53   ` Eric Dumazet
@ 2017-11-02  0:10     ` Christoph Paasch
  2017-11-02  1:00       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02  0:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev

On 01/11/17 - 14:53:38, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 14:32 -0700, Eric Dumazet wrote:
> > On Wed, Nov 1, 2017 at 2:10 PM, Christoph Paasch <cpaasch@apple.com> wrote:
> > > Since commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache
> > > line misses") IPCB/IP6CB is no more at the beginning of the tcp_skb_cb.
> > > This means that on the output path, we need to make sure that it has
> > > been correctly initialized to 0, as is done in tcp_transmit_skb.
> > >
> > > However, when going through the other code-path in TCP that can send an
> > > skb (e.g., through tcp_v6_send_synack), we end up in a situation where
> > > IP6CB has some of its fields set to unexpected values. Depending on the
> > > layout of tcp_skb_cb across the different kernel-versions this can be
> > > lastopt, flags,...
> > 
> > Or not use tcp_init_nondata_skb() on non fast clones, since it adds
> > unnecessary writes and clears.
> > 
> > tcp_make_synack() really has no business using tcp_init_nondata_skb()
> > and could simply set th->seq = htonl(tcp_rsk(req)->snt_isn);
> 
> Something like :
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 69cfdead0cb49e4365158048a0d1a9bbdd55fa83..5502abc5307f0ce1de610d4b70f3a59c4d5383c5 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3399,13 +3399,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
>         tcp_ecn_make_synack(req, th);
>         th->source = htons(ireq->ir_num);
>         th->dest = ireq->ir_rmt_port;
> -       /* Setting of flags are superfluous here for callers (and ECE is
> -        * not even correctly set)
> -        */
> -       tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
> -                            TCPHDR_SYN | TCPHDR_ACK);
> -
> -       th->seq = htonl(TCP_SKB_CB(skb)->seq);
> +       skb->ip_summed = CHECKSUM_PARTIAL;
> +       th->seq = htonl(tcp_rsk(req)->snt_isn);
>         /* XXX data is queued and acked as is. No buffer/window check */
>         th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);

Yes, that looks good to me. Thanks!

But we still need to clean up the skb in tcp_v4_send_reset and
tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
coming from tcp_v4_rcv.


Christoph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-02  0:10     ` Christoph Paasch
@ 2017-11-02  1:00       ` Eric Dumazet
  2017-11-02  2:21         ` Eric Dumazet
  2017-11-02 18:16         ` [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02  1:00 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev

On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:

> Yes, that looks good to me. Thanks!
> 
> But we still need to clean up the skb in tcp_v4_send_reset and
> tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> coming from tcp_v4_rcv.

You might be confused : ip_send_unicast_reply() does not send back the
incoming skb.

A fresh skb is allocated, then appended/sent.

And commit 24a2d43d8886f5a29c did the changes to provide to
__ip_options_echo() the proper IPCB header location.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-02  1:00       ` Eric Dumazet
@ 2017-11-02  2:21         ` Eric Dumazet
  2017-11-02 18:24           ` Christoph Paasch
  2017-11-02 19:30           ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
  2017-11-02 18:16         ` [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
  1 sibling, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02  2:21 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev

On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> 
> > Yes, that looks good to me. Thanks!
> > 
> > But we still need to clean up the skb in tcp_v4_send_reset and
> > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> > coming from tcp_v4_rcv.
> 
> You might be confused : ip_send_unicast_reply() does not send back the
> incoming skb.
> 
> A fresh skb is allocated, then appended/sent.
> 
> And commit 24a2d43d8886f5a29c did the changes to provide to
> __ip_options_echo() the proper IPCB header location.
> 

More details :

Fields written by tcp_init_nondata_skb() on the synack packet :

->seq          (32bits) at offset 0 of skb->cb[]
->end_seq      (32bits) at offset 4 of skb->cb[]
->tcp_gso_segs (16bits) at offset 8
->tcp_flags    (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
0x12)
->sacked       (8bits) at offset 13

IPCB fields sharing these 14 bytes :

iif  /* 32bits, offset 0 */
opt.faddr    (32bits) offset 4
opt.nexthop  (32bits) offset 8 /* value 1 */
opt.optlen   (8bits) offset 12 /* value 0x12 */
opt.srr      (8bits) offset 13

IP6CB fields sharing these 14 bytes :

iif   /* 32bits, offset 0 */
ra    /* 16 bits, offset 4 */
dst0  /* 16 bits offset 6 */
srcrt /* 16 bits offset 8 */  -> 0x0001
dst1  /* 16 bits offset 10 */ (not mangled -> 0)
lastopt /* 16 bits offset 12 */  -> 0x12


At xmit :

IPV4 uses ip_build_and_send_pkt() to transmit the SYNACK, so skb->cb[]
is not used.

IPv6 uses other fields.

So I really wonder what exact issue you observed, please share your
drugs ;)

Thanks !

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-02  1:00       ` Eric Dumazet
  2017-11-02  2:21         ` Eric Dumazet
@ 2017-11-02 18:16         ` Christoph Paasch
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 18:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev

On 01/11/17 - 18:00:10, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> 
> > Yes, that looks good to me. Thanks!
> > 
> > But we still need to clean up the skb in tcp_v4_send_reset and
> > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> > coming from tcp_v4_rcv.
> 
> You might be confused : ip_send_unicast_reply() does not send back the
> incoming skb.
> 
> A fresh skb is allocated, then appended/sent.
> 
> And commit 24a2d43d8886f5a29c did the changes to provide to
> __ip_options_echo() the proper IPCB header location.

Yes, sorry I misunderstood ip_send_unicast_reply(). Didn't see the
allocation of the new skb.


Christoph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-02  2:21         ` Eric Dumazet
@ 2017-11-02 18:24           ` Christoph Paasch
  2017-11-02 18:26             ` Eric Dumazet
  2017-11-02 19:30           ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 18:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev

Hi Eric,

On 01/11/17 - 19:21:31, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
> > On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> > 
> > > Yes, that looks good to me. Thanks!
> > > 
> > > But we still need to clean up the skb in tcp_v4_send_reset and
> > > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> > > coming from tcp_v4_rcv.
> > 
> > You might be confused : ip_send_unicast_reply() does not send back the
> > incoming skb.
> > 
> > A fresh skb is allocated, then appended/sent.
> > 
> > And commit 24a2d43d8886f5a29c did the changes to provide to
> > __ip_options_echo() the proper IPCB header location.
> > 
> 
> More details :
> 
> Fields written by tcp_init_nondata_skb() on the synack packet :
> 
> ->seq          (32bits) at offset 0 of skb->cb[]
> ->end_seq      (32bits) at offset 4 of skb->cb[]
> ->tcp_gso_segs (16bits) at offset 8
> ->tcp_flags    (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
> 0x12)

In my build I see tcp_flags at offset 16, because ktime_t is 64 bits.

> ->sacked       (8bits) at offset 13
> 
> IPCB fields sharing these 14 bytes :
> 
> iif  /* 32bits, offset 0 */
> opt.faddr    (32bits) offset 4
> opt.nexthop  (32bits) offset 8 /* value 1 */
> opt.optlen   (8bits) offset 12 /* value 0x12 */
> opt.srr      (8bits) offset 13
> 
> IP6CB fields sharing these 14 bytes :
> 
> iif   /* 32bits, offset 0 */
> ra    /* 16 bits, offset 4 */
> dst0  /* 16 bits offset 6 */
> srcrt /* 16 bits offset 8 */  -> 0x0001
> dst1  /* 16 bits offset 10 */ (not mangled -> 0)
> lastopt /* 16 bits offset 12 */  -> 0x12

Thus, because what I mention above, we are writing here into flags which sits
at offset 16.

So, IP6SKB_FORWARDED and IP6SKB_FRAGMENTED will be set on the outgoing skb.
Especially for IP6SKB_FORWARDED, which is checked at ip6_finish_output2() I
am concerned. Even in general, having these fields set looks brittle to me.

What do you think?

> At xmit :
> 
> IPV4 uses ip_build_and_send_pkt() to transmit the SYNACK, so skb->cb[]
> is not used.
> 
> IPv6 uses other fields.
> 
> So I really wonder what exact issue you observed, please share your
> drugs ;)

No drugs, only belgian beer :)


Christoph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-02 18:24           ` Christoph Paasch
@ 2017-11-02 18:26             ` Eric Dumazet
  2017-11-02 18:31               ` Christoph Paasch
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02 18:26 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev

On Thu, Nov 2, 2017 at 11:24 AM, Christoph Paasch <cpaasch@apple.com> wrote:
> Hi Eric,
>
> On 01/11/17 - 19:21:31, Eric Dumazet wrote:
>> On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
>> > On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
>> >
>> > > Yes, that looks good to me. Thanks!
>> > >
>> > > But we still need to clean up the skb in tcp_v4_send_reset and
>> > > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
>> > > coming from tcp_v4_rcv.
>> >
>> > You might be confused : ip_send_unicast_reply() does not send back the
>> > incoming skb.
>> >
>> > A fresh skb is allocated, then appended/sent.
>> >
>> > And commit 24a2d43d8886f5a29c did the changes to provide to
>> > __ip_options_echo() the proper IPCB header location.
>> >
>>
>> More details :
>>
>> Fields written by tcp_init_nondata_skb() on the synack packet :
>>
>> ->seq          (32bits) at offset 0 of skb->cb[]
>> ->end_seq      (32bits) at offset 4 of skb->cb[]
>> ->tcp_gso_segs (16bits) at offset 8
>> ->tcp_flags    (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
>> 0x12)
>
> In my build I see tcp_flags at offset 16, because ktime_t is 64 bits.

Yes, but this ktime was removed in net-next for my rb-tree work.

>
>> ->sacked       (8bits) at offset 13
>>
>> IPCB fields sharing these 14 bytes :
>>
>> iif  /* 32bits, offset 0 */
>> opt.faddr    (32bits) offset 4
>> opt.nexthop  (32bits) offset 8 /* value 1 */
>> opt.optlen   (8bits) offset 12 /* value 0x12 */
>> opt.srr      (8bits) offset 13
>>
>> IP6CB fields sharing these 14 bytes :
>>
>> iif   /* 32bits, offset 0 */
>> ra    /* 16 bits, offset 4 */
>> dst0  /* 16 bits offset 6 */
>> srcrt /* 16 bits offset 8 */  -> 0x0001
>> dst1  /* 16 bits offset 10 */ (not mangled -> 0)
>> lastopt /* 16 bits offset 12 */  -> 0x12
>
> Thus, because what I mention above, we are writing here into flags which sits
> at offset 16.
>
> So, IP6SKB_FORWARDED and IP6SKB_FRAGMENTED will be set on the outgoing skb.
> Especially for IP6SKB_FORWARDED, which is checked at ip6_finish_output2() I
> am concerned. Even in general, having these fields set looks brittle to me.
>
> What do you think?

I think I will submit my patch, which should clear the issue, without
breaking IPv4 options handling as your patch did ;)

Thanks !

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: Always cleanup skb before sending
  2017-11-02 18:26             ` Eric Dumazet
@ 2017-11-02 18:31               ` Christoph Paasch
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 18:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev

On 02/11/17 - 11:26:21, Eric Dumazet wrote:
> On Thu, Nov 2, 2017 at 11:24 AM, Christoph Paasch <cpaasch@apple.com> wrote:
> > Hi Eric,
> >
> > On 01/11/17 - 19:21:31, Eric Dumazet wrote:
> >> On Wed, 2017-11-01 at 18:00 -0700, Eric Dumazet wrote:
> >> > On Wed, 2017-11-01 at 17:10 -0700, Christoph Paasch wrote:
> >> >
> >> > > Yes, that looks good to me. Thanks!
> >> > >
> >> > > But we still need to clean up the skb in tcp_v4_send_reset and
> >> > > tcp_v4_send_ack, as the fields (e.g., tcp_skb_cb->seq) have been set when
> >> > > coming from tcp_v4_rcv.
> >> >
> >> > You might be confused : ip_send_unicast_reply() does not send back the
> >> > incoming skb.
> >> >
> >> > A fresh skb is allocated, then appended/sent.
> >> >
> >> > And commit 24a2d43d8886f5a29c did the changes to provide to
> >> > __ip_options_echo() the proper IPCB header location.
> >> >
> >>
> >> More details :
> >>
> >> Fields written by tcp_init_nondata_skb() on the synack packet :
> >>
> >> ->seq          (32bits) at offset 0 of skb->cb[]
> >> ->end_seq      (32bits) at offset 4 of skb->cb[]
> >> ->tcp_gso_segs (16bits) at offset 8
> >> ->tcp_flags    (8bits) at offset 12 value (TCPHDR_SYN | TCPHDR_ACK ->
> >> 0x12)
> >
> > In my build I see tcp_flags at offset 16, because ktime_t is 64 bits.
> 
> Yes, but this ktime was removed in net-next for my rb-tree work.
> 
> >
> >> ->sacked       (8bits) at offset 13
> >>
> >> IPCB fields sharing these 14 bytes :
> >>
> >> iif  /* 32bits, offset 0 */
> >> opt.faddr    (32bits) offset 4
> >> opt.nexthop  (32bits) offset 8 /* value 1 */
> >> opt.optlen   (8bits) offset 12 /* value 0x12 */
> >> opt.srr      (8bits) offset 13
> >>
> >> IP6CB fields sharing these 14 bytes :
> >>
> >> iif   /* 32bits, offset 0 */
> >> ra    /* 16 bits, offset 4 */
> >> dst0  /* 16 bits offset 6 */
> >> srcrt /* 16 bits offset 8 */  -> 0x0001
> >> dst1  /* 16 bits offset 10 */ (not mangled -> 0)
> >> lastopt /* 16 bits offset 12 */  -> 0x12
> >
> > Thus, because what I mention above, we are writing here into flags which sits
> > at offset 16.
> >
> > So, IP6SKB_FORWARDED and IP6SKB_FRAGMENTED will be set on the outgoing skb.
> > Especially for IP6SKB_FORWARDED, which is checked at ip6_finish_output2() I
> > am concerned. Even in general, having these fields set looks brittle to me.
> >
> > What do you think?
> 
> I think I will submit my patch, which should clear the issue, without
> breaking IPv4 options handling as your patch did ;)

Sounds good! :)

Thanks for your feedback.


Christoph

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack()
  2017-11-02  2:21         ` Eric Dumazet
  2017-11-02 18:24           ` Christoph Paasch
@ 2017-11-02 19:30           ` Eric Dumazet
  2017-11-02 19:49             ` Christoph Paasch
  2017-11-03  5:32             ` David Miller
  1 sibling, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2017-11-02 19:30 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Eric Dumazet, David Miller, netdev

From: Eric Dumazet <edumazet@google.com>

Christoph Paasch sent a patch to address the following issue :

tcp_make_synack() is leaving some TCP private info in skb->cb[],
then send the packet by other means than tcp_transmit_skb()

tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.

tcp_make_synack() should not use tcp_init_nondata_skb() :

tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
queues (the ones that are only sent via tcp_transmit_skb())

This patch fixes the issue and should even save few cpu cycles ;)

Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
---
 net/ipv4/tcp_output.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 823003eef3a21a5cc5c27e0be9f46159afa060df..478909f4694d00076c96b7a3be1eda62b6be8bef 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3180,13 +3180,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	th->source = htons(ireq->ir_num);
 	th->dest = ireq->ir_rmt_port;
 	skb->mark = ireq->ir_mark;
-	/* Setting of flags are superfluous here for callers (and ECE is
-	 * not even correctly set)
-	 */
-	tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
-			     TCPHDR_SYN | TCPHDR_ACK);
-
-	th->seq = htonl(TCP_SKB_CB(skb)->seq);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	th->seq = htonl(tcp_rsk(req)->snt_isn);
 	/* XXX data is queued and acked as is. No buffer/window check */
 	th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack()
  2017-11-02 19:30           ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
@ 2017-11-02 19:49             ` Christoph Paasch
  2017-11-03  5:32             ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2017-11-02 19:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, netdev

On 02/11/17 - 12:30:25, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Christoph Paasch sent a patch to address the following issue :
> 
> tcp_make_synack() is leaving some TCP private info in skb->cb[],
> then send the packet by other means than tcp_transmit_skb()
> 
> tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
> IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
> 
> tcp_make_synack() should not use tcp_init_nondata_skb() :
> 
> tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
> queues (the ones that are only sent via tcp_transmit_skb())
> 
> This patch fixes the issue and should even save few cpu cycles ;)
> 
> Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> ---
>  net/ipv4/tcp_output.c |    9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)

Reviewed-by: Christoph Paasch <cpaasch@apple.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack()
  2017-11-02 19:30           ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
  2017-11-02 19:49             ` Christoph Paasch
@ 2017-11-03  5:32             ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2017-11-03  5:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: cpaasch, edumazet, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 02 Nov 2017 12:30:25 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Christoph Paasch sent a patch to address the following issue :
> 
> tcp_make_synack() is leaving some TCP private info in skb->cb[],
> then send the packet by other means than tcp_transmit_skb()
> 
> tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
> IPv4/IPV6 stacks, but we have no such cleanup for SYNACK.
> 
> tcp_make_synack() should not use tcp_init_nondata_skb() :
> 
> tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
> queues (the ones that are only sent via tcp_transmit_skb())
> 
> This patch fixes the issue and should even save few cpu cycles ;)
> 
> Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-11-03  5:32 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-01 21:10 [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch
2017-11-01 21:32 ` Eric Dumazet
2017-11-01 21:53   ` Eric Dumazet
2017-11-02  0:10     ` Christoph Paasch
2017-11-02  1:00       ` Eric Dumazet
2017-11-02  2:21         ` Eric Dumazet
2017-11-02 18:24           ` Christoph Paasch
2017-11-02 18:26             ` Eric Dumazet
2017-11-02 18:31               ` Christoph Paasch
2017-11-02 19:30           ` [PATCH net] tcp: do not mangle skb->cb[] in tcp_make_synack() Eric Dumazet
2017-11-02 19:49             ` Christoph Paasch
2017-11-03  5:32             ` David Miller
2017-11-02 18:16         ` [PATCH net] tcp: Always cleanup skb before sending Christoph Paasch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.