netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets
@ 2020-11-19 21:23 Alexander Duyck
  2020-11-19 21:23 ` [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Alexander Duyck
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Alexander Duyck @ 2020-11-19 21:23 UTC (permalink / raw)
  To: netdev
  Cc: bpf, daniel, kafai, kernel-team, edumazet, brakmo,
	alexanderduyck, weiwan

This patch set is meant to address issues seen with SYN/ACK packets not
containing the ECT0 bit when DCTCP is configured as the congestion control
algorithm for a TCP socket.

A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the
issue obvious. Looking at the packets will result in the SYN/ACK packet
with an ECT0 bit that does not match the other packets for the flow when
the congestion control agorithm is switch from the default. So for example
going from non-DCTCP to a DCTCP congestion control algorithm we will see
the SYN/ACK IPV6 header will not have ECT0 set while the other packets in
the flow will. Likewise if we switch from a default of DCTCP to cubic we
will see the ECT0 bit set in the SYN/ACK while the other packets in the
flow will not.

---

Alexander Duyck (2):
      tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
      tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control


 net/ipv4/tcp_cong.c | 5 +++++
 1 file changed, 5 insertions(+)

--


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
  2020-11-19 21:23 [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Alexander Duyck
@ 2020-11-19 21:23 ` Alexander Duyck
  2020-11-19 23:09   ` Wei Wang
  2020-11-19 21:23 ` [net PATCH 2/2] tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control Alexander Duyck
  2020-11-21  2:22 ` [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Jakub Kicinski
  2 siblings, 1 reply; 5+ messages in thread
From: Alexander Duyck @ 2020-11-19 21:23 UTC (permalink / raw)
  To: netdev
  Cc: bpf, daniel, kafai, kernel-team, edumazet, brakmo,
	alexanderduyck, weiwan

From: Alexander Duyck <alexanderduyck@fb.com>

An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone though and added a mask that prevented the
ECN bits from being populated in the L3 header.

This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.

One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 net/ipv4/tcp_ipv4.c |    5 +++--
 net/ipv6/tcp_ipv6.c |    6 +++---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c2d5132c523c..c5f8b686aa82 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -981,7 +981,8 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 	skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb);
 
 	tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
-			tcp_rsk(req)->syn_tos : inet_sk(sk)->tos;
+			tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
+			inet_sk(sk)->tos;
 
 	if (skb) {
 		__tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
@@ -990,7 +991,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
 					    rcu_dereference(ireq->ireq_opt),
-					    tos & ~INET_ECN_MASK);
+					    tos);
 		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8db59f4e5f13..3d49e8d0afee 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -530,12 +530,12 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 		rcu_read_lock();
 		opt = ireq->ipv6_opt;
 		tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
-				tcp_rsk(req)->syn_tos : np->tclass;
+				tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
+				np->tclass;
 		if (!opt)
 			opt = rcu_dereference(np->opt);
 		err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt,
-			       tclass & ~INET_ECN_MASK,
-			       sk->sk_priority);
+			       tclass, sk->sk_priority);
 		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [net PATCH 2/2] tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control
  2020-11-19 21:23 [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Alexander Duyck
  2020-11-19 21:23 ` [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Alexander Duyck
@ 2020-11-19 21:23 ` Alexander Duyck
  2020-11-21  2:22 ` [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Alexander Duyck @ 2020-11-19 21:23 UTC (permalink / raw)
  To: netdev
  Cc: bpf, daniel, kafai, kernel-team, edumazet, brakmo,
	alexanderduyck, weiwan

From: Alexander Duyck <alexanderduyck@fb.com>

When setting congestion control via a BPF program it is seen that the
SYN/ACK for packets within a given flow will not include the ECT0 flag. A
bit of simple printk debugging shows that when this is configured without
BPF we will see the value INET_ECN_xmit value initialized in
tcp_assign_congestion_control however when we configure this via BPF the
socket is in the closed state and as such it isn't configured, and I do not
see it being initialized when we transition the socket into the listen
state. The result of this is that the ECT0 bit is configured based on
whatever the default state is for the socket.

Any easy way to reproduce this is to monitor the following with tcpdump:
tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca

Without this patch the SYN/ACK will follow whatever the default is. If dctcp
all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0
will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK
bit matches the value seen on the other packets in the given stream.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 net/ipv4/tcp_cong.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index db47ac24d057..563d016e7478 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -198,6 +198,11 @@ static void tcp_reinit_congestion_control(struct sock *sk,
 	icsk->icsk_ca_setsockopt = 1;
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
 
+	if (ca->flags & TCP_CONG_NEEDS_ECN)
+		INET_ECN_xmit(sk);
+	else
+		INET_ECN_dontxmit(sk);
+
 	if (!((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)))
 		tcp_init_congestion_control(sk);
 }



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
  2020-11-19 21:23 ` [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Alexander Duyck
@ 2020-11-19 23:09   ` Wei Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Wei Wang @ 2020-11-19 23:09 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Linux Kernel Network Developers, bpf, daniel, Martin KaFai Lau,
	kernel-team, Eric Dumazet, brakmo, alexanderduyck

On Thu, Nov 19, 2020 at 1:23 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> An issue was recently found where DCTCP SYN/ACK packets did not have the
> ECT bit set in the L3 header. A bit of code review found that the recent
> change referenced below had gone though and added a mask that prevented the
> ECN bits from being populated in the L3 header.
>
> This patch addresses that by rolling back the mask so that it is only
> applied to the flags coming from the incoming TCP request instead of
> applying it to the socket tos/tclass field. Doing this the ECT bits were
> restored in the SYN/ACK packets in my testing.
>
> One thing that is not addressed by this patch set is the fact that
> tcp_reflect_tos appears to be incompatible with ECN based congestion
> avoidance algorithms. At a minimum the feature should likely be documented
> which it currently isn't.
>
> Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")

Acked-by: Wei Wang <weiwan@google.com>

Thanks for catching this. I was under the wrong impression that the
ECT bits were marked in tos after the tcp layer. Upon a closer look,
it seems right now, it only gets marked in inet_sock(sk)->tos from
tcp_init_congestion_control() once.
I will submit a follow-up fix to make sure we include the lower 2 bits
in the reflection case as well.

> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  net/ipv4/tcp_ipv4.c |    5 +++--
>  net/ipv6/tcp_ipv6.c |    6 +++---
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index c2d5132c523c..c5f8b686aa82 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -981,7 +981,8 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
>         skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb);
>
>         tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
> -                       tcp_rsk(req)->syn_tos : inet_sk(sk)->tos;
> +                       tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
> +                       inet_sk(sk)->tos;
>
>         if (skb) {
>                 __tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
> @@ -990,7 +991,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
>                 err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
>                                             ireq->ir_rmt_addr,
>                                             rcu_dereference(ireq->ireq_opt),
> -                                           tos & ~INET_ECN_MASK);
> +                                           tos);
>                 rcu_read_unlock();
>                 err = net_xmit_eval(err);
>         }
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 8db59f4e5f13..3d49e8d0afee 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -530,12 +530,12 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
>                 rcu_read_lock();
>                 opt = ireq->ipv6_opt;
>                 tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
> -                               tcp_rsk(req)->syn_tos : np->tclass;
> +                               tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
> +                               np->tclass;
>                 if (!opt)
>                         opt = rcu_dereference(np->opt);
>                 err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt,
> -                              tclass & ~INET_ECN_MASK,
> -                              sk->sk_priority);
> +                              tclass, sk->sk_priority);
>                 rcu_read_unlock();
>                 err = net_xmit_eval(err);
>         }
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets
  2020-11-19 21:23 [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Alexander Duyck
  2020-11-19 21:23 ` [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Alexander Duyck
  2020-11-19 21:23 ` [net PATCH 2/2] tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control Alexander Duyck
@ 2020-11-21  2:22 ` Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2020-11-21  2:22 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, bpf, daniel, kafai, kernel-team, edumazet, brakmo,
	alexanderduyck, weiwan

On Thu, 19 Nov 2020 13:23:43 -0800 Alexander Duyck wrote:
> This patch set is meant to address issues seen with SYN/ACK packets not
> containing the ECT0 bit when DCTCP is configured as the congestion control
> algorithm for a TCP socket.
> 
> A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the
> issue obvious. Looking at the packets will result in the SYN/ACK packet
> with an ECT0 bit that does not match the other packets for the flow when
> the congestion control agorithm is switch from the default. So for example
> going from non-DCTCP to a DCTCP congestion control algorithm we will see
> the SYN/ACK IPV6 header will not have ECT0 set while the other packets in
> the flow will. Likewise if we switch from a default of DCTCP to cubic we
> will see the ECT0 bit set in the SYN/ACK while the other packets in the
> flow will not.

Applied, thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-11-21  2:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19 21:23 [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Alexander Duyck
2020-11-19 21:23 ` [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Alexander Duyck
2020-11-19 23:09   ` Wei Wang
2020-11-19 21:23 ` [net PATCH 2/2] tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control Alexander Duyck
2020-11-21  2:22 ` [net PATCH 0/2] tcp: Address issues with ECT0 not being set in DCTCP packets Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).