[PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK
@ 2020-07-17 11:43 hujunwei
  2020-07-17 14:44 ` Neal Cardwell
  0 siblings, 1 reply; 5+ messages in thread
From: hujunwei @ 2020-07-17 11:43 UTC (permalink / raw)
  To: edumazet, davem, kuznet, yoshfuji, kuba
  Cc: netdev, wangxiaogang3, jinyiting, xuhanbing, zhengshaoyu

From: Junwei Hu <hujunwei4@huawei.com>

In the document of RFC2582(https://tools.ietf.org/html/rfc2582)
introduced two separate scenarios for tcp congestion control:
There are two separate scenarios in which the TCP sender could
receive three duplicate acknowledgements acknowledging "send_high"
but no more than "send_high".  One scenario would be that the data
sender transmitted four packets with sequence numbers higher than
"send_high", that the first packet was dropped in the network, and
the following three packets triggered three duplicate
acknowledgements acknowledging "send_high".  The second scenario
would be that the sender unnecessarily retransmitted three packets
below "send_high", and that these three packets triggered three
duplicate acknowledgements acknowledging "send_high".  In the absence
of SACK, the TCP sender in unable to distinguish between these two
scenarios.

We encountered the second scenario when the third-party switches
does not support SACK, and I use kprobes to find that tcp kept in
CA_Loss state when high_seq equal to snd_nxt.

All of the packets is acked if high_seq equal to snd_nxt, the TCP
sender is able to distinguish between these two scenarios in
described RFC2582. So the current state can be switched.

This patch enhance the TCP congestion control algorithm for lack
of SACK.

Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reviewed-by: XiaoGang Wang <wangxiaogang3@huawei.com>
Reviewed-by: Yiting Jin <jinyiting@huawei.com>
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9615e72..d5573123 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2385,7 +2385,8 @@ static bool tcp_try_undo_recovery(struct sock *sk)
 	} else if (tp->rack.reo_wnd_persist) {
 		tp->rack.reo_wnd_persist--;
 	}
-	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
+	if (tp->snd_una == tp->high_seq &&
+	    tcp_is_reno(tp) && tp->snd_nxt > tp->high_seq) {
 		/* Hold old state until something *above* high_seq
 		 * is ACKed. For Reno it is MUST to prevent false
 		 * fast retransmits (RFC2582). SACK TCP is safe. */
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK
  2020-07-17 11:43 [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK hujunwei
@ 2020-07-17 14:44 ` Neal Cardwell
  2020-07-18 10:43   ` hujunwei
  0 siblings, 1 reply; 5+ messages in thread
From: Neal Cardwell @ 2020-07-17 14:44 UTC (permalink / raw)
  To: hujunwei
  Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Netdev, wangxiaogang3, jinyiting, xuhanbing,
	zhengshaoyu, Yuchung Cheng, Ilpo Jarvinen

On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>
> From: Junwei Hu <hujunwei4@huawei.com>
>
> In the document of RFC2582(https://tools.ietf.org/html/rfc2582)
> introduced two separate scenarios for tcp congestion control:
> There are two separate scenarios in which the TCP sender could
> receive three duplicate acknowledgements acknowledging "send_high"
> but no more than "send_high".  One scenario would be that the data
> sender transmitted four packets with sequence numbers higher than
> "send_high", that the first packet was dropped in the network, and
> the following three packets triggered three duplicate
> acknowledgements acknowledging "send_high".  The second scenario
> would be that the sender unnecessarily retransmitted three packets
> below "send_high", and that these three packets triggered three
> duplicate acknowledgements acknowledging "send_high".  In the absence
> of SACK, the TCP sender in unable to distinguish between these two
> scenarios.
>
> We encountered the second scenario when the third-party switches
> does not support SACK, and I use kprobes to find that tcp kept in
> CA_Loss state when high_seq equal to snd_nxt.
>
> All of the packets is acked if high_seq equal to snd_nxt, the TCP
> sender is able to distinguish between these two scenarios in
> described RFC2582. So the current state can be switched.

Can you please elaborate on how the sender is able to distinguish
between the two scenarios, after your patch?

It seems to me that with this proposed patch, there is the risk of
spurious fast recoveries due to 3 dupacks in the second second
scenario (the sender unnecessarily retransmitted three packets below
"send_high"). Can you please post a packetdrill test to demonstrate
that with this patch the TCP sender does not spuriously enter fast
recovery in such a scenario?

> This patch enhance the TCP congestion control algorithm for lack
> of SACK.

You describe this as an enhancement. Can you please elaborate on the
drawback/downside of staying in CA_Loss in this case you are
describing (where you used kprobes to find that TCP stayed in CA_Loss
state when high_seq was equal to snd_nxt)?

> Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
> Reviewed-by: XiaoGang Wang <wangxiaogang3@huawei.com>
> Reviewed-by: Yiting Jin <jinyiting@huawei.com>
> ---
>  net/ipv4/tcp_input.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9615e72..d5573123 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2385,7 +2385,8 @@ static bool tcp_try_undo_recovery(struct sock *sk)
>         } else if (tp->rack.reo_wnd_persist) {
>                 tp->rack.reo_wnd_persist--;
>         }
> -       if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
> +       if (tp->snd_una == tp->high_seq &&
> +           tcp_is_reno(tp) && tp->snd_nxt > tp->high_seq) {

To deal with sequence number wrap-around, sequence number comparisons
in TCP need to use the before() and after() helpers, rather than
comparison operators. Here it seems the patch should use after()
rather than >. However,  I think the larger concern is the concern
mentioned above.

best,
neal

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK
  2020-07-17 14:44 ` Neal Cardwell
@ 2020-07-18 10:43   ` hujunwei
  2020-07-19  3:29     ` Neal Cardwell
  0 siblings, 1 reply; 5+ messages in thread
From: hujunwei @ 2020-07-18 10:43 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Netdev, wangxiaogang3, jinyiting, xuhanbing,
	zhengshaoyu, Yuchung Cheng, Ilpo Jarvinen


On 2020/7/17 22:44, Neal Cardwell wrote:
> On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>>
>> From: Junwei Hu <hujunwei4@huawei.com>
>>
>> In the document of RFC2582(https://tools.ietf.org/html/rfc2582)
>> introduced two separate scenarios for tcp congestion control:
> 
> Can you please elaborate on how the sender is able to distinguish
> between the two scenarios, after your patch?
> 
> It seems to me that with this proposed patch, there is the risk of
> spurious fast recoveries due to 3 dupacks in the second second
> scenario (the sender unnecessarily retransmitted three packets below
> "send_high"). Can you please post a packetdrill test to demonstrate
> that with this patch the TCP sender does not spuriously enter fast
> recovery in such a scenario?
> 
Hi neal,
Thanks for you quick reply!
What I want to says is when these three numbers: snd_una, high_seq and
snd_nxt are the same, that means all data outstanding
when the Loss state began have successfully been acknowledged.
So the sender is time to exits to the Open state.
I'm not sure whether my understanding is correct.

>> This patch enhance the TCP congestion control algorithm for lack
>> of SACK.
> 
> You describe this as an enhancement. Can you please elaborate on the
> drawback/downside of staying in CA_Loss in this case you are
> describing (where you used kprobes to find that TCP stayed in CA_Loss
> state when high_seq was equal to snd_nxt)?
> 
I tried, but I can't reproduce it by packetdrill. This problem appeared
in our production environment. Here is part of the trace message:

First ack:
#tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
	packets_out=4 retrans_out=1 sacked_out=0 lost_out=4 snd_nxt=3427491485
	snd_una=3427485917 high_seq=3427491485 reordering=1 mss_cache=1392
	icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1

#tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) prior_snd_una=3427485917
	num_dupack=0 packets_out=0 retrans_out=0 sacked_out=0 lost_out=0
	snd_nxt=3427491485 snd_una=3427491485 high_seq=3427491485 reordering=1
	mss_cache=1392 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1

As we can see by func tcp_fastretrans_alert icsk_ca_state remains CA_Loss (4),
and the numbers: snd_nxt, snd_una and high_seq are the same.

first dup ack:
#tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
	packets_out=2 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269
	snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
	icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2

#tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) num_dupack=1 packets_out=2
	retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269 snd_una=3427491485
	high_seq=3427491485 reordering=1 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2

second dup ack:
#tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
	packets_out=4 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427497053
	snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
	icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=4

So, I really hope someone can answer whether my understanding is correct.

> To deal with sequence number wrap-around, sequence number comparisons
> in TCP need to use the before() and after() helpers, rather than
> comparison operators. Here it seems the patch should use after()
> rather than >. However,  I think the larger concern is the concern
> mentioned above.
> 
If this patch is useful, I will modify this.

Regards Junwei


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK
  2020-07-18 10:43   ` hujunwei
@ 2020-07-19  3:29     ` Neal Cardwell
  2020-07-20  1:54       ` hujunwei
  0 siblings, 1 reply; 5+ messages in thread
From: Neal Cardwell @ 2020-07-19  3:29 UTC (permalink / raw)
  To: hujunwei
  Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Netdev, wangxiaogang3, jinyiting, xuhanbing,
	zhengshaoyu, Yuchung Cheng, Ilpo Jarvinen

On Sat, Jul 18, 2020 at 6:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>
>
> On 2020/7/17 22:44, Neal Cardwell wrote:
> > On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
> >>
> >> From: Junwei Hu <hujunwei4@huawei.com>
> >>
> >> In the document of RFC2582(https://tools.ietf.org/html/rfc2582)
> >> introduced two separate scenarios for tcp congestion control:
> >
> > Can you please elaborate on how the sender is able to distinguish
> > between the two scenarios, after your patch?
> >
> > It seems to me that with this proposed patch, there is the risk of
> > spurious fast recoveries due to 3 dupacks in the second second
> > scenario (the sender unnecessarily retransmitted three packets below
> > "send_high"). Can you please post a packetdrill test to demonstrate
> > that with this patch the TCP sender does not spuriously enter fast
> > recovery in such a scenario?
> >
> Hi neal,
> Thanks for you quick reply!
> What I want to says is when these three numbers: snd_una, high_seq and
> snd_nxt are the same, that means all data outstanding
> when the Loss state began have successfully been acknowledged.

Yes, that seems true.

> So the sender is time to exits to the Open state.
> I'm not sure whether my understanding is correct.

I don't think we want the sender to exit to the CA_Open state in these
circumstances. I think section 5 ("Avoiding Multiple Fast
Retransmits") of RFC 2582 states convincingly that senders should take
steps to avoid having duplicate acknowledgements at high_seq trigger a
new fast recovery. The Linux TCP implements those steps by *not*
exiting to the Open state, and instead staying in CA_Loss or
CA_Recovery.

To make things more concrete, here is the kind of timeline/scenario I
am concerned about with your proposed patch. I have not had cycles to
cook a packetdrill test like this, but this is the basic idea:

[connection does not have SACK or TCP timestamps enabled]
app writes 4*SMSS
Send packets P1, P2, P3, P4
TLP, spurious retransmit of P4
spurious RTO, set cwnd to 1, enter CA_Loss, retransmit P1
receive ACK for P1 (original copy)
slow-start, increase cwnd to 2, retransmit P2, P3
receive ACK for P2 (original copy)
slow-start, increase cwnd to 3, retransmit P4
receive ACK for P3 (original copy)
slow-start, increase cwnd to 4
receive ACK for P4 (original copy)
slow-start, increase cwnd to 5
[with your patch, at this point the sender does not meet the
 conditions for "Hold old state until something *above* high_seq is ACKed.",
 so sender exits CA_Loss and enters Open]
app writes 4*MSS
send P5, P6, P7, P8
receive dupack for P4 (due to spurious TLP retransmit of P4)
receive dupack for P4 (due to spurious CA_Loss retransmit of P1)
receive dupack for P4 (due to spurious CA_Loss retransmit of P2)
[with your patch, at this point we risk spuriously entering
 fast recovery because we have  received 3 duplicate ACKs for P4]

A packetdrill test that shows that this is not the behavior of your
proposed patch would help support your proposed patch (presuming > is
replaced by after()).

best,
neal

> >> This patch enhance the TCP congestion control algorithm for lack
> >> of SACK.
> >
> > You describe this as an enhancement. Can you please elaborate on the
> > drawback/downside of staying in CA_Loss in this case you are
> > describing (where you used kprobes to find that TCP stayed in CA_Loss
> > state when high_seq was equal to snd_nxt)?
> >
> I tried, but I can't reproduce it by packetdrill. This problem appeared
> in our production environment. Here is part of the trace message:
>
> First ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
>         packets_out=4 retrans_out=1 sacked_out=0 lost_out=4 snd_nxt=3427491485
>         snd_una=3427485917 high_seq=3427491485 reordering=1 mss_cache=1392
>         icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1
>
> #tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) prior_snd_una=3427485917
>         num_dupack=0 packets_out=0 retrans_out=0 sacked_out=0 lost_out=0
>         snd_nxt=3427491485 snd_una=3427491485 high_seq=3427491485 reordering=1
>         mss_cache=1392 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1
>
> As we can see by func tcp_fastretrans_alert icsk_ca_state remains CA_Loss (4),
> and the numbers: snd_nxt, snd_una and high_seq are the same.
>
> first dup ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
>         packets_out=2 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269
>         snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
>         icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2
>
> #tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) num_dupack=1 packets_out=2
>         retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269 snd_una=3427491485
>         high_seq=3427491485 reordering=1 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2
>
> second dup ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
>         packets_out=4 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427497053
>         snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
>         icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=4
>
> So, I really hope someone can answer whether my understanding is correct.
>
> > To deal with sequence number wrap-around, sequence number comparisons
> > in TCP need to use the before() and after() helpers, rather than
> > comparison operators. Here it seems the patch should use after()
> > rather than >. However,  I think the larger concern is the concern
> > mentioned above.
> >
> If this patch is useful, I will modify this.
>
> Regards Junwei
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK
  2020-07-19  3:29     ` Neal Cardwell
@ 2020-07-20  1:54       ` hujunwei
  0 siblings, 0 replies; 5+ messages in thread
From: hujunwei @ 2020-07-20  1:54 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Netdev, wangxiaogang3, jinyiting, xuhanbing,
	zhengshaoyu, Yuchung Cheng, Ilpo Jarvinen



On 2020/7/19 11:29, Neal Cardwell wrote:
> On Sat, Jul 18, 2020 at 6:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>>
>>
>> On 2020/7/17 22:44, Neal Cardwell wrote:
>>> On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>>>>
>>>> From: Junwei Hu <hujunwei4@huawei.com>
>>>>
>>>> In the document of RFC2582(https://tools.ietf.org/html/rfc2582)
>>>> introduced two separate scenarios for tcp congestion control:
>>>
>>> Can you please elaborate on how the sender is able to distinguish
>>> between the two scenarios, after your patch?
>>>
>>> It seems to me that with this proposed patch, there is the risk of
>>> spurious fast recoveries due to 3 dupacks in the second second
>>> scenario (the sender unnecessarily retransmitted three packets below
>>> "send_high"). Can you please post a packetdrill test to demonstrate
>>> that with this patch the TCP sender does not spuriously enter fast
>>> recovery in such a scenario?
>>>
>> Hi neal,
>> Thanks for you quick reply!
>> What I want to says is when these three numbers: snd_una, high_seq and
>> snd_nxt are the same, that means all data outstanding
>> when the Loss state began have successfully been acknowledged.
> 
> Yes, that seems true.
> 
>> So the sender is time to exits to the Open state.
>> I'm not sure whether my understanding is correct.
> 
> I don't think we want the sender to exit to the CA_Open state in these
> circumstances. I think section 5 ("Avoiding Multiple Fast
> Retransmits") of RFC 2582 states convincingly that senders should take
> steps to avoid having duplicate acknowledgements at high_seq trigger a
> new fast recovery. The Linux TCP implements those steps by *not*
> exiting to the Open state, and instead staying in CA_Loss or
> CA_RecoverThanks neal,
After reading the section 5 of RFC 2582, I think you are right.

> 
> To make things more concrete, here is the kind of timeline/scenario I
> am concerned about with your proposed patch. I have not had cycles to
> cook a packetdrill test like this, but this is the basic idea:
> 
> [connection does not have SACK or TCP timestamps enabled]
> app writes 4*SMSS
> Send packets P1, P2, P3, P4
> TLP, spurious retransmit of P4
> spurious RTO, set cwnd to 1, enter CA_Loss, retransmit P1
> receive ACK for P1 (original copy)
> slow-start, increase cwnd to 2, retransmit P2, P3
> receive ACK for P2 (original copy)
> slow-start, increase cwnd to 3, retransmit P4
> receive ACK for P3 (original copy)
> slow-start, increase cwnd to 4
> receive ACK for P4 (original copy)
> slow-start, increase cwnd to 5
> [with your patch, at this point the sender does not meet the
>  conditions for "Hold old state until something *above* high_seq is ACKed.",
>  so sender exits CA_Loss and enters Open]
> app writes 4*MSS
> send P5, P6, P7, P8
> receive dupack for P4 (due to spurious TLP retransmit of P4)
> receive dupack for P4 (due to spurious CA_Loss retransmit of P1)
> receive dupack for P4 (due to spurious CA_Loss retransmit of P2)
> [with your patch, at this point we risk spuriously entering
>  fast recovery because we have  received 3 duplicate ACKs for P4]
> 
> A packetdrill test that shows that this is not the behavior of your
> proposed patch would help support your proposed patch (presuming > is
> replaced by after()).
> 
> best,
> neal

Thank you for your detailed answer, I felt that I learned a lot.

Regards Junwei


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-20  1:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17 11:43 [PATCH net-next] tcp: Optimize the recovery of tcp when lack of SACK hujunwei
2020-07-17 14:44 ` Neal Cardwell
2020-07-18 10:43   ` hujunwei
2020-07-19  3:29     ` Neal Cardwell
2020-07-20  1:54       ` hujunwei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.