All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tcp: don't ignore ECN CWR on pure ACK
@ 2020-06-23 14:53 Denis Kirjanov
  2020-06-23 16:12 ` Neal Cardwell
  0 siblings, 1 reply; 3+ messages in thread
From: Denis Kirjanov @ 2020-06-23 14:53 UTC (permalink / raw)
  To: netdev

there is a problem with the CWR flag set in an incoming ACK segment
and it leads to the situation when the ECE flag is latched forever

the following packetdrill script shows what happens:

// Stack receives incoming segments with CE set
+0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
+0.0 <[ce]    . 12001:13001(1000) ack 1001 win 65535
+0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535

// Stack repsonds with ECN ECHO
+0.0 >[noecn]  . 1001:1001(0) ack 12001
+0.0 >[noecn] E. 1001:1001(0) ack 13001
+0.0 >[noecn] E. 1001:1001(0) ack 14001

// Write a packet
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] PE. 1001:2001(1000) ack 14001

// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535

// Since CWR was sent, this packet should NOT have ECE set

+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0]  P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script

In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window.

Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
---
 net/ipv4/tcp_input.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 12fda8f27b08..e00b88c8598d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3696,8 +3696,13 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 				      &rexmit);
 	}
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
+		/* If we miss the CWR flag then ECE will be
+		 * latched forever.
+		 */
+		tcp_ecn_accept_cwr(sk, skb);
 		sk_dst_confirm(sk);
+	}
 
 	delivered = tcp_newly_delivered(sk, delivered, flag);
 	lost = tp->lost - lost;			/* freshly marked lost */
@@ -4800,8 +4805,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	skb_dst_drop(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
-	tcp_ecn_accept_cwr(sk, skb);
-
 	tp->rx_opt.dsack = 0;
 
 	/*  Queue data for delivery to the user.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] tcp: don't ignore ECN CWR on pure ACK
  2020-06-23 14:53 [PATCH] tcp: don't ignore ECN CWR on pure ACK Denis Kirjanov
@ 2020-06-23 16:12 ` Neal Cardwell
  2020-06-24  8:37   ` Denis Kirjanov
  0 siblings, 1 reply; 3+ messages in thread
From: Neal Cardwell @ 2020-06-23 16:12 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Netdev, Eric Dumazet, Yuchung Cheng, Scheffenegger, Richard, Bob Briscoe

On Tue, Jun 23, 2020 at 10:54 AM Denis Kirjanov <kda@linux-powerpc.org> wrote:
>
> there is a problem with the CWR flag set in an incoming ACK segment
> and it leads to the situation when the ECE flag is latched forever
>
> the following packetdrill script shows what happens:
>
> // Stack receives incoming segments with CE set
> +0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
> +0.0 <[ce]    . 12001:13001(1000) ack 1001 win 65535
> +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
>
> // Stack repsonds with ECN ECHO
> +0.0 >[noecn]  . 1001:1001(0) ack 12001
> +0.0 >[noecn] E. 1001:1001(0) ack 13001
> +0.0 >[noecn] E. 1001:1001(0) ack 14001
>
> // Write a packet
> +0.1 write(3, ..., 1000) = 1000
> +0.0 >[ect0] PE. 1001:2001(1000) ack 14001
>
> // Pure ACK received
> +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
>
> // Since CWR was sent, this packet should NOT have ECE set
>
> +0.1 write(3, ..., 1000) = 1000
> +0.0 >[ect0]  P. 2001:3001(1000) ack 14001
> // but Linux will still keep ECE latched here, with packetdrill
> // flagging a missing ECE flag, expecting
> // >[ect0] PE. 2001:3001(1000) ack 14001
> // in the script
>
> In the situation above we will continue to send ECN ECHO packets
> and trigger the peer to reduce the congestion window.
>
> Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
> ---
>  net/ipv4/tcp_input.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 12fda8f27b08..e00b88c8598d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3696,8 +3696,13 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
>                                       &rexmit);
>         }
>
> -       if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
> +       if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
> +               /* If we miss the CWR flag then ECE will be
> +                * latched forever.
> +                */
> +               tcp_ecn_accept_cwr(sk, skb);
>                 sk_dst_confirm(sk);
> +       }
>
>         delivered = tcp_newly_delivered(sk, delivered, flag);
>         lost = tp->lost - lost;                 /* freshly marked lost */
> @@ -4800,8 +4805,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
>         skb_dst_drop(skb);
>         __skb_pull(skb, tcp_hdr(skb)->doff * 4);
>
> -       tcp_ecn_accept_cwr(sk, skb);
> -
>         tp->rx_opt.dsack = 0;
>
>         /*  Queue data for delivery to the user.
> --
> 2.27.0
>

Hi Denis -

Thanks for this patch!

A few thoughts:

(1) Richard Scheffenegger (cc-ed) raised this issue in January.
  IMHO The Linux TCP code is RFC-compliant here. If you have a
  sender that is sending CWR on pure ACKs, then that sender is
  violating RFC3168, which specifies that CWR should only sent on
  data segments, so that the sender can be sure that there is a
  cwnd reduction even if a packet with CWR set is lost:

  RFC3168 says:

"""
When an ECN-Capable TCP sender reduces its congestion window for
any reason (because of a retransmit timeout, a Fast Retransmit,
or in response to an ECN Notification), the TCP sender sets the
CWR flag in the TCP header of the first new data packet sent
after the window reduction.  If that data packet is dropped in
the network, then the
...
sending TCP will have to reduce the congestion window again and
retransmit the dropped packet.

We ensure that the "Congestion Window Reduced" information is
reliably delivered to the TCP receiver.  This comes about from
the fact that if the new data packet carrying the CWR flag is
dropped, then the TCP sender will have to again reduce its
congestion window, and send another new data packet with the CWR
flag set.  Thus, the CWR bit in the TCP header SHOULD NOT be set
on retransmitted packets.

When the TCP data sender is ready to set the CWR bit after
reducing the congestion window, it SHOULD set the CWR bit only on
the first new data packet that it transmits.
"""

(2) Even given (1), I would agree with Richard's point that TCP
   receivers should accept CWR on pure ACKs, per the rationale of
   Postel's robustness principle ("Be liberal in what you accept,
   and conservative in what you send.")  Here we should be
   liberal and accept the CWR in the pure ACK, particularly
   because we know that at least one class of widely-deployed TCP
   stacks (*BSD stacks) do this.

(3) Given (2), I don't think this is the proper place to accept a
   CWR.  That point in the code is not reached by a connection
   that is only currently receiving data, because of the lines
   above that say:

if (!prior_packets)
goto no_queue;

   Accepting CWR is something a data receiver does (to respond to
   a data sender that is signalling that it has slowed down).  So
   even though I agree we should make the code more
   liberal/robust in acepting CWR on pure ACK, I don't think this
   is the right spot to insert the code.

(4) I would say this is an interoperability bug fix, so this
   should be explicitly targetd to the "net" branch.

(5) If you are able and have time, it would be nice to post the
   full packetdrill script, either in the commit message, or as a
   pull request to the packetdrill github project, so that the
   Linux community may benefit from the full test case.

Putting all that together, I think this patch should add a
comment that this code snippet is accepting non-RFC3168-compliant
behavior for the sake of robustness to known deployed stacks, and
should put that code in a place reached by data receivers. I
would suggest perhaps something like:

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 12fda8f27b08..717475b6f5c3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3665,6 +3665,13 @@ static int tcp_ack(struct sock *sk, const
struct sk_buff *skb, int flag)
                tcp_in_ack_event(sk, ack_ev_flags);
        }

+       /* An RFC3168-compliant sender should only set CWR on data segments.
+        * ("it SHOULD set the CWR bit only on the first new data packet that
+        * it transmits." However, we accept CWR on pure ACKs to be more robust
+        * with widely-deployed TCP implementations that do this.
+        */
+       tcp_ecn_accept_cwr(sk, skb);
+
        /* We passed data and got it acked, remove any soft error
         * log. Something worked...
         */
@@ -4800,8 +4807,6 @@ static void tcp_data_queue(struct sock *sk,
struct sk_buff *skb)
        skb_dst_drop(skb);
        __skb_pull(skb, tcp_hdr(skb)->doff * 4);

-       tcp_ecn_accept_cwr(sk, skb);
-
        tp->rx_opt.dsack = 0;

        /*  Queue data for delivery to the user.
---

best,
neal

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] tcp: don't ignore ECN CWR on pure ACK
  2020-06-23 16:12 ` Neal Cardwell
@ 2020-06-24  8:37   ` Denis Kirjanov
  0 siblings, 0 replies; 3+ messages in thread
From: Denis Kirjanov @ 2020-06-24  8:37 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Netdev, Eric Dumazet, Yuchung Cheng, Scheffenegger, Richard, Bob Briscoe

On 6/23/20, Neal Cardwell <ncardwell@google.com> wrote:
> On Tue, Jun 23, 2020 at 10:54 AM Denis Kirjanov <kda@linux-powerpc.org>
> wrote:
>>
>> there is a problem with the CWR flag set in an incoming ACK segment
>> and it leads to the situation when the ECE flag is latched forever
>>
>> the following packetdrill script shows what happens:
>>
>> // Stack receives incoming segments with CE set
>> +0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
>> +0.0 <[ce]    . 12001:13001(1000) ack 1001 win 65535
>> +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
>>
>> // Stack repsonds with ECN ECHO
>> +0.0 >[noecn]  . 1001:1001(0) ack 12001
>> +0.0 >[noecn] E. 1001:1001(0) ack 13001
>> +0.0 >[noecn] E. 1001:1001(0) ack 14001
>>
>> // Write a packet
>> +0.1 write(3, ..., 1000) = 1000
>> +0.0 >[ect0] PE. 1001:2001(1000) ack 14001
>>
>> // Pure ACK received
>> +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
>>
>> // Since CWR was sent, this packet should NOT have ECE set
>>
>> +0.1 write(3, ..., 1000) = 1000
>> +0.0 >[ect0]  P. 2001:3001(1000) ack 14001
>> // but Linux will still keep ECE latched here, with packetdrill
>> // flagging a missing ECE flag, expecting
>> // >[ect0] PE. 2001:3001(1000) ack 14001
>> // in the script
>>
>> In the situation above we will continue to send ECN ECHO packets
>> and trigger the peer to reduce the congestion window.
>>
>> Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
>> ---
>>  net/ipv4/tcp_input.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 12fda8f27b08..e00b88c8598d 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3696,8 +3696,13 @@ static int tcp_ack(struct sock *sk, const struct
>> sk_buff *skb, int flag)
>>                                       &rexmit);
>>         }
>>
>> -       if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
>> +       if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
>> +               /* If we miss the CWR flag then ECE will be
>> +                * latched forever.
>> +                */
>> +               tcp_ecn_accept_cwr(sk, skb);
>>                 sk_dst_confirm(sk);
>> +       }
>>
>>         delivered = tcp_newly_delivered(sk, delivered, flag);
>>         lost = tp->lost - lost;                 /* freshly marked lost */
>> @@ -4800,8 +4805,6 @@ static void tcp_data_queue(struct sock *sk, struct
>> sk_buff *skb)
>>         skb_dst_drop(skb);
>>         __skb_pull(skb, tcp_hdr(skb)->doff * 4);
>>
>> -       tcp_ecn_accept_cwr(sk, skb);
>> -
>>         tp->rx_opt.dsack = 0;
>>
>>         /*  Queue data for delivery to the user.
>> --
>> 2.27.0
>>
>
> Hi Denis -
>
> Thanks for this patch!

Hi Neal,

> A few thoughts:
>
> (1) Richard Scheffenegger (cc-ed) raised this issue in January.
>   IMHO The Linux TCP code is RFC-compliant here. If you have a
>   sender that is sending CWR on pure ACKs, then that sender is
>   violating RFC3168, which specifies that CWR should only sent on
>   data segments, so that the sender can be sure that there is a
>   cwnd reduction even if a packet with CWR set is lost:
>
>   RFC3168 says:
>
> """
> When an ECN-Capable TCP sender reduces its congestion window for
> any reason (because of a retransmit timeout, a Fast Retransmit,
> or in response to an ECN Notification), the TCP sender sets the
> CWR flag in the TCP header of the first new data packet sent
> after the window reduction.  If that data packet is dropped in
> the network, then the
> ...
> sending TCP will have to reduce the congestion window again and
> retransmit the dropped packet.
>
> We ensure that the "Congestion Window Reduced" information is
> reliably delivered to the TCP receiver.  This comes about from
> the fact that if the new data packet carrying the CWR flag is
> dropped, then the TCP sender will have to again reduce its
> congestion window, and send another new data packet with the CWR
> flag set.  Thus, the CWR bit in the TCP header SHOULD NOT be set
> on retransmitted packets.
>
> When the TCP data sender is ready to set the CWR bit after
> reducing the congestion window, it SHOULD set the CWR bit only on
> the first new data packet that it transmits.
> """
>
> (2) Even given (1), I would agree with Richard's point that TCP
>    receivers should accept CWR on pure ACKs, per the rationale of
>    Postel's robustness principle ("Be liberal in what you accept,
>    and conservative in what you send.")  Here we should be
>    liberal and accept the CWR in the pure ACK, particularly
>    because we know that at least one class of widely-deployed TCP
>    stacks (*BSD stacks) do this.
>
> (3) Given (2), I don't think this is the proper place to accept a
>    CWR.  That point in the code is not reached by a connection
>    that is only currently receiving data, because of the lines
>    above that say:
>
> if (!prior_packets)
> goto no_queue;

Right, here we don't loose a segment

>
>    Accepting CWR is something a data receiver does (to respond to
>    a data sender that is signalling that it has slowed down).  So
>    even though I agree we should make the code more
>    liberal/robust in acepting CWR on pure ACK, I don't think this
>    is the right spot to insert the code.
>
> (4) I would say this is an interoperability bug fix, so this
>    should be explicitly targetd to the "net" branch.
Sure

>
> (5) If you are able and have time, it would be nice to post the
>    full packetdrill script, either in the commit message, or as a
>    pull request to the packetdrill github project, so that the
>    Linux community may benefit from the full test case.

Initially I thought that we do have packetdrill scripts in the kernel
but we don't.
Sure, I'll post a patch.  Actually I've sent a different patch before
(regarding gcc 10 issue) and looks like the mailing list doesn't work.
Do I have to do that through gihub?

>
> Putting all that together, I think this patch should add a
> comment that this code snippet is accepting non-RFC3168-compliant
> behavior for the sake of robustness to known deployed stacks, and
> should put that code in a place reached by data receivers. I
> would suggest perhaps something like:

Thanks for review!
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 12fda8f27b08..717475b6f5c3 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3665,6 +3665,13 @@ static int tcp_ack(struct sock *sk, const
> struct sk_buff *skb, int flag)
>                 tcp_in_ack_event(sk, ack_ev_flags);
>         }
>
> +       /* An RFC3168-compliant sender should only set CWR on data
> segments.
> +        * ("it SHOULD set the CWR bit only on the first new data packet
> that
> +        * it transmits." However, we accept CWR on pure ACKs to be more
> robust
> +        * with widely-deployed TCP implementations that do this.
> +        */
> +       tcp_ecn_accept_cwr(sk, skb);
> +
>         /* We passed data and got it acked, remove any soft error
>          * log. Something worked...
>          */
> @@ -4800,8 +4807,6 @@ static void tcp_data_queue(struct sock *sk,
> struct sk_buff *skb)
>         skb_dst_drop(skb);
>         __skb_pull(skb, tcp_hdr(skb)->doff * 4);
>
> -       tcp_ecn_accept_cwr(sk, skb);
> -
>         tp->rx_opt.dsack = 0;
>
>         /*  Queue data for delivery to the user.
> ---
>
> best,
> neal
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-06-24  8:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-23 14:53 [PATCH] tcp: don't ignore ECN CWR on pure ACK Denis Kirjanov
2020-06-23 16:12 ` Neal Cardwell
2020-06-24  8:37   ` Denis Kirjanov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.