All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net 0/2] fix ipv6 tcp reset packet flowlabel issues
@ 2017-07-13 17:56 Shaohua Li
  2017-07-13 17:56 ` [RFC net 1/2] net: set skb hash for IP6 TCP reset packet Shaohua Li
  2017-07-13 17:56 ` [RFC net 2/2] net: tcp_v6_send_reset should set flowlabel Shaohua Li
  0 siblings, 2 replies; 8+ messages in thread
From: Shaohua Li @ 2017-07-13 17:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: eric.dumazet, Kernel-team, Shaohua Li

From: Shaohua Li <shli@fb.com>

Please see below tcpdump output:
21:00:48.109122 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [S], cksum 0x0529 (incorrect -> 0xf56c), seq 3282214508, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 0,nop,wscale 7], length 0
21:00:48.109381 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [S.], cksum 0x0529 (incorrect -> 0x49ad), seq 1923801573, ack 3282214509, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 2500903437,nop,wscale 7], length 0
21:00:48.109548 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [.], cksum 0x0521 (incorrect -> 0x1bdf), seq 1, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.109823 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 62) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [P.], cksum 0x053f (incorrect -> 0xb8b1), seq 1:31, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 30
21:00:48.109910 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [.], cksum 0x0521 (incorrect -> 0x1bc1), seq 1, ack 31, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.110043 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [P.], cksum 0x0539 (incorrect -> 0xb726), seq 1:25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 24
21:00:48.110173 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [.], cksum 0x0521 (incorrect -> 0x1ba7), seq 31, ack 25, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:48.110211 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [F.], cksum 0x0521 (incorrect -> 0x1ba7), seq 25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 0
21:00:48.151099 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [.], cksum 0x0521 (incorrect -> 0x1ba6), seq 31, ack 26, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:49.110524 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [P.], cksum 0x0539 (incorrect -> 0xb324), seq 31:55, ack 26, win 342, options [nop,nop,TS val 2500904438 ecr 2500903438], length 24
21:00:49.110637 IP6 (flowlabel 0xb34d5, hlim 64, next-header TCP (6) payload length: 20) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [R], cksum 0x0515 (incorrect -> 0x668c), seq 1923801599, win 0, length 0

The flowlabel of reset packet (0xb34d5) and that of normal packet (0xd827f) are
different. This causes our router doesn't correctly close tcp connection. The
patches try to fix the issue.

Thanks,
Shaohua

Shaohua Li (2):
  net: set skb hash for IP6 TCP reset packet
  net: tcp_v6_send_reset should set flowlabel

 include/net/inet_timewait_sock.h | 10 ++++++++++
 net/ipv4/inet_timewait_sock.c    |  1 +
 net/ipv6/tcp_ipv6.c              | 25 ++++++++++++++++++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC net 1/2] net: set skb hash for IP6 TCP reset packet
  2017-07-13 17:56 [RFC net 0/2] fix ipv6 tcp reset packet flowlabel issues Shaohua Li
@ 2017-07-13 17:56 ` Shaohua Li
  2017-07-17  8:51   ` Eric Dumazet
  2017-07-13 17:56 ` [RFC net 2/2] net: tcp_v6_send_reset should set flowlabel Shaohua Li
  1 sibling, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2017-07-13 17:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: eric.dumazet, Kernel-team, Shaohua Li

From: Shaohua Li <shli@fb.com>

Please see below tcpdump output:
21:00:48.109122 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [S], cksum 0x0529 (incorrect -> 0xf56c), seq 3282214508, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 0,nop,wscale 7], length 0
21:00:48.109381 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 40) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [S.], cksum 0x0529 (incorrect -> 0x49ad), seq 1923801573, ack 3282214509, win 43690, options [mss 65476,sackOK,TS val 2500903437 ecr 2500903437,nop,wscale 7], length 0
21:00:48.109548 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [.], cksum 0x0521 (incorrect -> 0x1bdf), seq 1, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.109823 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 62) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [P.], cksum 0x053f (incorrect -> 0xb8b1), seq 1:31, ack 1, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 30
21:00:48.109910 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [.], cksum 0x0521 (incorrect -> 0x1bc1), seq 1, ack 31, win 342, options [nop,nop,TS val 2500903437 ecr 2500903437], length 0
21:00:48.110043 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [P.], cksum 0x0539 (incorrect -> 0xb726), seq 1:25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 24
21:00:48.110173 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [.], cksum 0x0521 (incorrect -> 0x1ba7), seq 31, ack 25, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:48.110211 IP6 (flowlabel 0xd827f, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [F.], cksum 0x0521 (incorrect -> 0x1ba7), seq 25, ack 31, win 342, options [nop,nop,TS val 2500903438 ecr 2500903437], length 0
21:00:48.151099 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 32) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [.], cksum 0x0521 (incorrect -> 0x1ba6), seq 31, ack 26, win 342, options [nop,nop,TS val 2500903438 ecr 2500903438], length 0
21:00:49.110524 IP6 (flowlabel 0x43304, hlim 64, next-header TCP (6) payload length: 56) fec0::5054:ff:fe12:3456.55804 > fec0::5054:ff:fe12:3456.5555: Flags [P.], cksum 0x0539 (incorrect -> 0xb324), seq 31:55, ack 26, win 342, options [nop,nop,TS val 2500904438 ecr 2500903438], length 24
21:00:49.110637 IP6 (flowlabel 0xb34d5, hlim 64, next-header TCP (6) payload length: 20) fec0::5054:ff:fe12:3456.5555 > fec0::5054:ff:fe12:3456.55804: Flags [R], cksum 0x0515 (incorrect -> 0x668c), seq 1923801599, win 0, length 0

The tcp reset packet has a different flowlabel, which causes our router
doesn't correctly close tcp connection.  The reason is the normal packet
gets the skb->hash from sk->sk_txhash, which is generated randomly.
ip6_make_flowlabel then uses the hash to create a flowlabel. The reset
packet doesn't get assigned a hash, so the flowlabel is calculated with
flowi6.

The solution is to save the hash value for timeout sock and use it for
reset packet.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/net/inet_timewait_sock.h | 10 ++++++++++
 net/ipv4/inet_timewait_sock.c    |  1 +
 net/ipv6/tcp_ipv6.c              |  7 +++++++
 3 files changed, 18 insertions(+)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 6a75d67..b568224 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -77,6 +77,7 @@ struct inet_timewait_sock {
 				tw_pad		: 2,	/* 2 bits hole */
 				tw_tos		: 8;
 	kmemcheck_bitfield_end(flags);
+	__u32			tw_txhash;
 	struct timer_list	tw_timer;
 	struct inet_bind_bucket	*tw_tb;
 };
@@ -87,6 +88,15 @@ static inline struct inet_timewait_sock *inet_twsk(const struct sock *sk)
 	return (struct inet_timewait_sock *)sk;
 }
 
+static inline void skb_set_hash_from_twsk(struct sk_buff *skb,
+	struct inet_timewait_sock *twsk)
+{
+	if (twsk->tw_txhash) {
+		skb->l4_hash = 1;
+		skb->hash = twsk->tw_txhash;
+	}
+}
+
 void inet_twsk_free(struct inet_timewait_sock *tw);
 void inet_twsk_put(struct inet_timewait_sock *tw);
 
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 5b03915..6f460e7 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -190,6 +190,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
 		twsk_net_set(tw, sock_net(sk));
 		setup_pinned_timer(&tw->tw_timer, tw_timer_handler,
 				   (unsigned long)tw);
+		tw->tw_txhash	   = sk->sk_txhash;
 		/*
 		 * Because we use RCU lookups, we should not set tw_refcnt
 		 * to a non null value before everything is setup for this
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2521690..2e656d2 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -805,6 +805,13 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	t1 = skb_push(buff, tot_len);
 	skb_reset_transport_header(buff);
 
+	if (sk) {
+		if (sk_fullsock(sk))
+			skb_set_hash_from_sk(buff, (struct sock *)sk);
+		else
+			skb_set_hash_from_twsk(buff, inet_twsk(sk));
+	}
+
 	/* Swap the send and the receive. */
 	memset(t1, 0, sizeof(*t1));
 	t1->dest = th->source;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC net 2/2] net: tcp_v6_send_reset should set flowlabel
  2017-07-13 17:56 [RFC net 0/2] fix ipv6 tcp reset packet flowlabel issues Shaohua Li
  2017-07-13 17:56 ` [RFC net 1/2] net: set skb hash for IP6 TCP reset packet Shaohua Li
@ 2017-07-13 17:56 ` Shaohua Li
  1 sibling, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2017-07-13 17:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: eric.dumazet, Kernel-team, Shaohua Li

From: Shaohua Li <shli@fb.com>

Currently tcp_v6_send_reset ignores user defined flowlabel, so the reset
packet doesn't include the flowlabel info.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 net/ipv6/tcp_ipv6.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2e656d2..eaf4b8e 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -898,6 +898,8 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
 	struct sock *sk1 = NULL;
 #endif
 	int oif;
+	u8 tclass = 0;
+	__be32 flowlabel = 0;
 
 	if (th->rst)
 		return;
@@ -946,7 +948,21 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
 			  (th->doff << 2);
 
 	oif = sk ? sk->sk_bound_dev_if : 0;
-	tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1, 0, 0);
+	if (sk) {
+		if (sk_fullsock(sk)) {
+			struct ipv6_pinfo *np = inet6_sk(sk);
+
+			tclass = np->tclass;
+			flowlabel = np->flow_label & IPV6_FLOWLABEL_MASK;
+		} else {
+			struct inet_timewait_sock *tw = inet_twsk(sk);
+
+			tclass = tw->tw_tclass;
+			flowlabel = cpu_to_be32(tw->tw_flowlabel);
+		}
+	}
+	tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1,
+		tclass, flowlabel);
 
 #ifdef CONFIG_TCP_MD5SIG
 out:
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC net 1/2] net: set skb hash for IP6 TCP reset packet
  2017-07-13 17:56 ` [RFC net 1/2] net: set skb hash for IP6 TCP reset packet Shaohua Li
@ 2017-07-17  8:51   ` Eric Dumazet
  2017-07-17 20:06     ` Florent Fourcot
  2017-07-17 21:53     ` Shaohua Li
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2017-07-17  8:51 UTC (permalink / raw)
  To: Shaohua Li; +Cc: netdev, davem, Kernel-team, Shaohua Li

On Thu, 2017-07-13 at 10:56 -0700, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
> 
> Please see below tcpdump output:

> The tcp reset packet has a different flowlabel, which causes our router
> doesn't correctly close tcp connection.

This looks a bug in your router, because (IPv6 only) flowlabel is not
part of the tuple identifying a TCP flow.


>   The reason is the normal packet
> gets the skb->hash from sk->sk_txhash, which is generated randomly.
> ip6_make_flowlabel then uses the hash to create a flowlabel. The reset
> packet doesn't get assigned a hash, so the flowlabel is calculated with
> flowi6.
> 
> The solution is to save the hash value for timeout sock and use it for
> reset packet.

I am a bit unsure why we need to add yet another field in TCP timewait
structure, since :

1) flowlabel can vary during a TCP flow lifetime.
2) flowlabel is different unde synflood (each syncookie gets a random
flowlabel), and if 3rd packet comes back from the client to finish 3WHS,
the flowlabel will again be different from the one that SYNACK used.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net 1/2] net: set skb hash for IP6 TCP reset packet
  2017-07-17  8:51   ` Eric Dumazet
@ 2017-07-17 20:06     ` Florent Fourcot
  2017-07-17 21:53     ` Shaohua Li
  1 sibling, 0 replies; 8+ messages in thread
From: Florent Fourcot @ 2017-07-17 20:06 UTC (permalink / raw)
  To: Eric Dumazet, Shaohua Li; +Cc: netdev, davem, Kernel-team, Shaohua Li

Eric Dumazet wrote:
> I am a bit unsure why we need to add yet another field in TCP timewait
> structure, since :
>
> 1) flowlabel can vary during a TCP flow lifetime.
> 2) flowlabel is different unde synflood (each syncookie gets a random
> flowlabel), and if 3rd packet comes back from the client to finish 3WHS,
> the flowlabel will again be different from the one that SYNACK used.

Actually, there is already a field for flowlabel in TCP timewait 
structure (see commit 1d13a96c74). At least for "stateful" flow labels 
(set with setsockopt(), and not randomly generated by kernel), it should 
work (if not, this is a regression).

The solution is perhaps to store the random generated value in the same 
place than standard flowlabels, instead to store the hash.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net 1/2] net: set skb hash for IP6 TCP reset packet
  2017-07-17  8:51   ` Eric Dumazet
  2017-07-17 20:06     ` Florent Fourcot
@ 2017-07-17 21:53     ` Shaohua Li
  2017-07-18  4:02       ` Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2017-07-17 21:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Kernel-team, Florent Fourcot

On Mon, Jul 17, 2017 at 01:51:51AM -0700, Eric Dumazet wrote:
> On Thu, 2017-07-13 at 10:56 -0700, Shaohua Li wrote:
> > From: Shaohua Li <shli@fb.com>
> > 
> > Please see below tcpdump output:
> 
> > The tcp reset packet has a different flowlabel, which causes our router
> > doesn't correctly close tcp connection.
> 
> This looks a bug in your router, because (IPv6 only) flowlabel is not
> part of the tuple identifying a TCP flow.

Actually it's for load balance between several routers.
> 
> >   The reason is the normal packet
> > gets the skb->hash from sk->sk_txhash, which is generated randomly.
> > ip6_make_flowlabel then uses the hash to create a flowlabel. The reset
> > packet doesn't get assigned a hash, so the flowlabel is calculated with
> > flowi6.
> > 
> > The solution is to save the hash value for timeout sock and use it for
> > reset packet.
> 
> I am a bit unsure why we need to add yet another field in TCP timewait
> structure, since :
> 
> 1) flowlabel can vary during a TCP flow lifetime.
> 2) flowlabel is different unde synflood (each syncookie gets a random
> flowlabel), and if 3rd packet comes back from the client to finish 3WHS,
> the flowlabel will again be different from the one that SYNACK used.

Is it acceptable we reuse tw_flowlabel as Florent Fourcot suggested? It makes
no sense to change flowlabel for no reason.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net 1/2] net: set skb hash for IP6 TCP reset packet
  2017-07-17 21:53     ` Shaohua Li
@ 2017-07-18  4:02       ` Eric Dumazet
  2017-07-18 18:59         ` Shaohua Li
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-07-18  4:02 UTC (permalink / raw)
  To: Shaohua Li; +Cc: netdev, davem, Kernel-team, Florent Fourcot

On Mon, 2017-07-17 at 14:53 -0700, Shaohua Li wrote:
> On Mon, Jul 17, 2017 at 01:51:51AM -0700, Eric Dumazet wrote:
> > On Thu, 2017-07-13 at 10:56 -0700, Shaohua Li wrote:
> > > From: Shaohua Li <shli@fb.com>
> > > 
> > > Please see below tcpdump output:
> > 
> > > The tcp reset packet has a different flowlabel, which causes our router
> > > doesn't correctly close tcp connection.
> > 
> > This looks a bug in your router, because (IPv6 only) flowlabel is not
> > part of the tuple identifying a TCP flow.
> 
> Actually it's for load balance between several routers.

What happens then when flowlabel changes as I described ?

See commit 3acf3ec3f4b0 ("tcp: Change txhash on every SYN and RTO
retransmit")

> > 
> > >   The reason is the normal packet
> > > gets the skb->hash from sk->sk_txhash, which is generated randomly.
> > > ip6_make_flowlabel then uses the hash to create a flowlabel. The reset
> > > packet doesn't get assigned a hash, so the flowlabel is calculated with
> > > flowi6.
> > > 
> > > The solution is to save the hash value for timeout sock and use it for
> > > reset packet.
> > 
> > I am a bit unsure why we need to add yet another field in TCP timewait
> > structure, since :
> > 
> > 1) flowlabel can vary during a TCP flow lifetime.
> > 2) flowlabel is different unde synflood (each syncookie gets a random
> > flowlabel), and if 3rd packet comes back from the client to finish 3WHS,
> > the flowlabel will again be different from the one that SYNACK used.
> 
> Is it acceptable we reuse tw_flowlabel as Florent Fourcot suggested? It makes
> no sense to change flowlabel for no reason.

Sure, if you can find a way to keep storage as small as possible.

Current size is dangerously approaching 256 bytes, so we might soon use
one additional cache line (64 bytes)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC net 1/2] net: set skb hash for IP6 TCP reset packet
  2017-07-18  4:02       ` Eric Dumazet
@ 2017-07-18 18:59         ` Shaohua Li
  0 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2017-07-18 18:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Kernel-team, Florent Fourcot

On Mon, Jul 17, 2017 at 09:02:57PM -0700, Eric Dumazet wrote:
> On Mon, 2017-07-17 at 14:53 -0700, Shaohua Li wrote:
> > On Mon, Jul 17, 2017 at 01:51:51AM -0700, Eric Dumazet wrote:
> > > On Thu, 2017-07-13 at 10:56 -0700, Shaohua Li wrote:
> > > > From: Shaohua Li <shli@fb.com>
> > > > 
> > > > Please see below tcpdump output:
> > > 
> > > > The tcp reset packet has a different flowlabel, which causes our router
> > > > doesn't correctly close tcp connection.
> > > 
> > > This looks a bug in your router, because (IPv6 only) flowlabel is not
> > > part of the tuple identifying a TCP flow.
> > 
> > Actually it's for load balance between several routers.
> 
> What happens then when flowlabel changes as I described ?
> 
> See commit 3acf3ec3f4b0 ("tcp: Change txhash on every SYN and RTO
> retransmit")

Frankly I have no idea. People in the team do think this is a problem in some
corner cases. Didn't get any report yet though.
 
> > > 
> > > >   The reason is the normal packet
> > > > gets the skb->hash from sk->sk_txhash, which is generated randomly.
> > > > ip6_make_flowlabel then uses the hash to create a flowlabel. The reset
> > > > packet doesn't get assigned a hash, so the flowlabel is calculated with
> > > > flowi6.
> > > > 
> > > > The solution is to save the hash value for timeout sock and use it for
> > > > reset packet.
> > > 
> > > I am a bit unsure why we need to add yet another field in TCP timewait
> > > structure, since :
> > > 
> > > 1) flowlabel can vary during a TCP flow lifetime.
> > > 2) flowlabel is different unde synflood (each syncookie gets a random
> > > flowlabel), and if 3rd packet comes back from the client to finish 3WHS,
> > > the flowlabel will again be different from the one that SYNACK used.
> > 
> > Is it acceptable we reuse tw_flowlabel as Florent Fourcot suggested? It makes
> > no sense to change flowlabel for no reason.
> 
> Sure, if you can find a way to keep storage as small as possible.
> 
> Current size is dangerously approaching 256 bytes, so we might soon use
> one additional cache line (64 bytes)

Will send a new patch.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-07-18 18:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-13 17:56 [RFC net 0/2] fix ipv6 tcp reset packet flowlabel issues Shaohua Li
2017-07-13 17:56 ` [RFC net 1/2] net: set skb hash for IP6 TCP reset packet Shaohua Li
2017-07-17  8:51   ` Eric Dumazet
2017-07-17 20:06     ` Florent Fourcot
2017-07-17 21:53     ` Shaohua Li
2017-07-18  4:02       ` Eric Dumazet
2017-07-18 18:59         ` Shaohua Li
2017-07-13 17:56 ` [RFC net 2/2] net: tcp_v6_send_reset should set flowlabel Shaohua Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.