All of lore.kernel.org
 help / color / mirror / Atom feed
* tcp: fixing TLP's FIN recovery
@ 2014-06-06 18:46 Per Hurtig
  2014-06-06 19:07 ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-06 18:46 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, anna.brunstrom, mohammad.rajiullah

>From ab1b16ef8aba4300b1a6e965c3ab7d0cb269bb2a Mon Sep 17 00:00:00 2001
From: Per Hurtig <per.hurtig@kau.se>
Date: Fri, 6 Jun 2014 18:36:19 +0200
Subject: [PATCH 1/1] tcp: fixing TLP's FIN recovery

Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig <per.hurtig@kau.se>
---
 net/ipv4/tcp_output.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

  diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
  index d463c35..2c29926 100644
  --- a/net/ipv4/tcp_output.c
  +++ b/net/ipv4/tcp_output.c
  @@ -2130,8 +2130,9 @@ void tcp_send_loss_probe(struct sock *sk)
    if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
	    goto rearm_timer;
	     
	     -	/* Probe with zero data doesn't trigger fast recovery.
		*/
		-	if (skb->len > 0)
		+	/* Probe with zero data doesn't trigger fast
		recovery, unless
		+	 * FIN flag is set. */
		+	if ((skb->len > 0) ||
		(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
			err = __tcp_retransmit_skb(sk, skb);
			 
			    /* Record snd_nxt for loss detection. */
			    -- 
			    1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: tcp: fixing TLP's FIN recovery
  2014-06-06 18:46 tcp: fixing TLP's FIN recovery Per Hurtig
@ 2014-06-06 19:07 ` Eric Dumazet
  2014-06-07 11:10   ` [PATCH] " Per Hurtig
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-06 19:07 UTC (permalink / raw)
  To: Per Hurtig
  Cc: netdev, anna.brunstrom, mohammad.rajiullah, Neal Cardwell,
	Nandita Dukkipati

On Fri, 2014-06-06 at 20:46 +0200, Per Hurtig wrote:
> From ab1b16ef8aba4300b1a6e965c3ab7d0cb269bb2a Mon Sep 17 00:00:00 2001
> From: Per Hurtig <per.hurtig@kau.se>
> Date: Fri, 6 Jun 2014 18:36:19 +0200
> Subject: [PATCH 1/1] tcp: fixing TLP's FIN recovery
> 
> Fix to a problem observed when losing a FIN segment that does not
> contain data.  In such situations, TLP is unable to recover from
> *any* tail loss and instead adds at least PTO ms to the
> retransmission process, i.e., RTO = RTO + PTO.



> 
> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
> ---
>  net/ipv4/tcp_output.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
>   diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>   index d463c35..2c29926 100644
>   --- a/net/ipv4/tcp_output.c
>   +++ b/net/ipv4/tcp_output.c
>   @@ -2130,8 +2130,9 @@ void tcp_send_loss_probe(struct sock *sk)
>     if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
> 	    goto rearm_timer;
> 	     
> 	     -	/* Probe with zero data doesn't trigger fast recovery.
> 		*/
> 		-	if (skb->len > 0)
> 		+	/* Probe with zero data doesn't trigger fast
> 		recovery, unless
> 		+	 * FIN flag is set. */
> 		+	if ((skb->len > 0) ||
> 		(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
> 			err = __tcp_retransmit_skb(sk, skb);
> 			 
> 			    /* Record snd_nxt for loss detection. */
> 			    -- 
> 			    1.9.1
> 

Patch was mangled. Please fix and resend.

Please CC Nandita & Neal,

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-06 19:07 ` Eric Dumazet
@ 2014-06-07 11:10   ` Per Hurtig
  2014-06-07 13:56     ` Sergei Shtylyov
  0 siblings, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-07 11:10 UTC (permalink / raw)
  To: netdev
  Cc: Per Hurtig, eric.dumazet, anna.brunstrom, mohammad.rajiullah,
	ncardwell, nanditad

Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.
---
 net/ipv4/tcp_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d463c35..6573765 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
 		goto rearm_timer;
 
-	/* Probe with zero data doesn't trigger fast recovery. */
-	if (skb->len > 0)
+	/* Probe with zero data doesn't trigger fast recovery, if not
+	 * FIN flag is set.
+	 */
+	if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
 		err = __tcp_retransmit_skb(sk, skb);
 
 	/* Record snd_nxt for loss detection. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-07 11:10   ` [PATCH] " Per Hurtig
@ 2014-06-07 13:56     ` Sergei Shtylyov
  2014-06-07 14:34       ` Per Hurtig
  0 siblings, 1 reply; 28+ messages in thread
From: Sergei Shtylyov @ 2014-06-07 13:56 UTC (permalink / raw)
  To: Per Hurtig, netdev
  Cc: eric.dumazet, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad

Hello.

On 07-06-2014 15:10, Per Hurtig wrote:

> Fix to a problem observed when losing a FIN segment that does not
> contain data.  In such situations, TLP is unable to recover from
> *any* tail loss and instead adds at least PTO ms to the
> retransmission process, i.e., RTO = RTO + PTO.

     You should provide your signoff.

[...]

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index d463c35..6573765 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>   	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>   		goto rearm_timer;
>
> -	/* Probe with zero data doesn't trigger fast recovery. */
> -	if (skb->len > 0)
> +	/* Probe with zero data doesn't trigger fast recovery, if not

    s/not/no/?
    Or rather "if FIN flag is not set"?

> +	 * FIN flag is set.
> +	 */
> +	if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>   		err = __tcp_retransmit_skb(sk, skb);

WBR, Sergei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-07 13:56     ` Sergei Shtylyov
@ 2014-06-07 14:34       ` Per Hurtig
  2014-06-08  2:58         ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-07 14:34 UTC (permalink / raw)
  To: netdev
  Cc: Per Hurtig, eric.dumazet, anna.brunstrom, mohammad.rajiullah,
	ncardwell, nanditad, sergei.shtylyov

Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig <per.hurtig@kau.se>
---
 net/ipv4/tcp_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d463c35..6573765 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
 		goto rearm_timer;
 
-	/* Probe with zero data doesn't trigger fast recovery. */
-	if (skb->len > 0)
+	/* Probe with zero data doesn't trigger fast recovery, if FIN
+	 * flag is not set.
+	 */
+	if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
 		err = __tcp_retransmit_skb(sk, skb);
 
 	/* Record snd_nxt for loss detection. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-07 14:34       ` Per Hurtig
@ 2014-06-08  2:58         ` Eric Dumazet
  2014-06-08  7:41           ` Per Hurtig
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-08  2:58 UTC (permalink / raw)
  To: Per Hurtig
  Cc: netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad,
	sergei.shtylyov

On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
> Fix to a problem observed when losing a FIN segment that does not
> contain data.  In such situations, TLP is unable to recover from
> *any* tail loss and instead adds at least PTO ms to the
> retransmission process, i.e., RTO = RTO + PTO.
> 
> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
> ---
>  net/ipv4/tcp_output.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index d463c35..6573765 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>  	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>  		goto rearm_timer;
>  
> -	/* Probe with zero data doesn't trigger fast recovery. */
> -	if (skb->len > 0)
> +	/* Probe with zero data doesn't trigger fast recovery, if FIN
> +	 * flag is not set.
> +	 */
> +	if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>  		err = __tcp_retransmit_skb(sk, skb);
>  
>  	/* Record snd_nxt for loss detection. */


You know, I believe the test was exactly to avoid sending data less FIN
packets.

If you write :

    if (A  || !A)

Better remove the condition, completely ;)


Nandita, why FIN packet wont trigger fast retransnmits ?

It sounds like if the timer is the issue you want to fix, you might
simply rearm a timer with RTO-PTO instead of RTO ?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-08  2:58         ` Eric Dumazet
@ 2014-06-08  7:41           ` Per Hurtig
  2014-06-08 16:35             ` Eric Dumazet
  2014-06-09  7:02             ` Nandita Dukkipati
  0 siblings, 2 replies; 28+ messages in thread
From: Per Hurtig @ 2014-06-08  7:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad,
	sergei.shtylyov



On sön  8 jun 2014 04:58:25, Eric Dumazet wrote:
> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
>> Fix to a problem observed when losing a FIN segment that does not
>> contain data.  In such situations, TLP is unable to recover from
>> *any* tail loss and instead adds at least PTO ms to the
>> retransmission process, i.e., RTO = RTO + PTO.
>>
>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>> ---
>>   net/ipv4/tcp_output.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index d463c35..6573765 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>>   	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>   		goto rearm_timer;
>>
>> -	/* Probe with zero data doesn't trigger fast recovery. */
>> -	if (skb->len > 0)
>> +	/* Probe with zero data doesn't trigger fast recovery, if FIN
>> +	 * flag is not set.
>> +	 */
>> +	if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>>   		err = __tcp_retransmit_skb(sk, skb);
>>
>>   	/* Record snd_nxt for loss detection. */
>
>
> You know, I believe the test was exactly to avoid sending data less FIN
> packets.
>
> If you write :
>
>      if (A  || !A)
>
> Better remove the condition, completely ;)
>
Obviously, but I don't think that FINs are the only segments
who are targeted by this condition (or targeted at all given
the implications of this statement). Furthermore, the comment above
the if statement would probably have mentioned FINs explicity
and not zero sized segments in general if this were the case.


>
> Nandita, why FIN packet wont trigger fast retransnmits ?
>

They do, that's the whole thing with this patch.

> It sounds like if the timer is the issue you want to fix, you might
> simply rearm a timer with RTO-PTO instead of RTO ?
>
>
No I want to enable TLP for tail loss where an empty FIN is involved,
this does not work now.

>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-08  7:41           ` Per Hurtig
@ 2014-06-08 16:35             ` Eric Dumazet
  2014-06-09  7:04               ` Nandita Dukkipati
  2014-06-09  7:02             ` Nandita Dukkipati
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-08 16:35 UTC (permalink / raw)
  To: Per Hurtig
  Cc: netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad,
	sergei.shtylyov

On Sun, 2014-06-08 at 09:41 +0200, Per Hurtig wrote:
> 
> On sön  8 jun 2014 04:58:25, Eric Dumazet wrote:
> > On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
> >> Fix to a problem observed when losing a FIN segment that does not
> >> contain data.  In such situations, TLP is unable to recover from
> >> *any* tail loss and instead adds at least PTO ms to the
> >> retransmission process, i.e., RTO = RTO + PTO.
> >>
> >> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
> >> ---
> >>   net/ipv4/tcp_output.c | 6 ++++--
> >>   1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index d463c35..6573765 100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
> >>   	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
> >>   		goto rearm_timer;
> >>
> >> -	/* Probe with zero data doesn't trigger fast recovery. */
> >> -	if (skb->len > 0)
> >> +	/* Probe with zero data doesn't trigger fast recovery, if FIN
> >> +	 * flag is not set.
> >> +	 */
> >> +	if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
> >>   		err = __tcp_retransmit_skb(sk, skb);
> >>
> >>   	/* Record snd_nxt for loss detection. */
> >
> >
> > You know, I believe the test was exactly to avoid sending data less FIN
> > packets.
> >
> > If you write :
> >
> >      if (A  || !A)
> >
> > Better remove the condition, completely ;)
> >
> Obviously, but I don't think that FINs are the only segments
> who are targeted by this condition (or targeted at all given
> the implications of this statement). Furthermore, the comment above
> the if statement would probably have mentioned FINs explicity
> and not zero sized segments in general if this were the case.
> 


I see no other possibilities than FIN segments here, or the WARN_ON(!
tcp_skb_pcount(skb)) right before would trigger.

If we believe it could trigger, then we need to remove the WARN_ON(),
because its far more disruptive than waiting a bit more for the RTO.
Remember : RTO is conservative. 

The if (skb->len > 0) only is true for FIN with no data.

This was exactly the intent : Not sending FIN at this stage.

If pure FIN is OK here, just remove the comment and test, this is so
confusing and useless.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-08  7:41           ` Per Hurtig
  2014-06-08 16:35             ` Eric Dumazet
@ 2014-06-09  7:02             ` Nandita Dukkipati
  2014-06-09 13:13               ` Per Hurtig
  2014-06-12 14:21               ` Weiping Pan
  1 sibling, 2 replies; 28+ messages in thread
From: Nandita Dukkipati @ 2014-06-09  7:02 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Eric Dumazet, Netdev, Anna Brunström, mohammad.rajiullah,
	Neal Cardwell, sergei.shtylyov

On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote:
>
>
> On sön  8 jun 2014 04:58:25, Eric Dumazet wrote:
>>
>> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
>>>
>>> Fix to a problem observed when losing a FIN segment that does not
>>> contain data.  In such situations, TLP is unable to recover from
>>> *any* tail loss and instead adds at least PTO ms to the
>>> retransmission process, i.e., RTO = RTO + PTO.
>>>
>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>>> ---
>>>   net/ipv4/tcp_output.c | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index d463c35..6573765 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>>>         if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>>                 goto rearm_timer;
>>>
>>> -       /* Probe with zero data doesn't trigger fast recovery. */
>>> -       if (skb->len > 0)
>>> +       /* Probe with zero data doesn't trigger fast recovery, if FIN
>>> +        * flag is not set.
>>> +        */
>>> +       if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>>>                 err = __tcp_retransmit_skb(sk, skb);
>>>
>>>         /* Record snd_nxt for loss detection. */
>>
>>
>>
>> You know, I believe the test was exactly to avoid sending data less FIN
>> packets.
>>
>> If you write :
>>
>>      if (A  || !A)
>>
>> Better remove the condition, completely ;)
>>
> Obviously, but I don't think that FINs are the only segments
> who are targeted by this condition (or targeted at all given
> the implications of this statement). Furthermore, the comment above
> the if statement would probably have mentioned FINs explicity
> and not zero sized segments in general if this were the case.
>
>
>
>>
>> Nandita, why FIN packet wont trigger fast retransnmits ?
>>
>
> They do, that's the whole thing with this patch.
>
>
>> It sounds like if the timer is the issue you want to fix, you might
>> simply rearm a timer with RTO-PTO instead of RTO ?
>>
>>
> No I want to enable TLP for tail loss where an empty FIN is involved,
> this does not work now.

I understand the tail loss case you want to solve - essentially when a
tail loss occurs that involves data segments as well as that of an
empty FIN. However, have you verified that re-sending an empty FIN
triggers fast recovery? I would be surprised if it did, because I
think the sender needs to receive a SACK of at least 1-byte of data
before sender can trigger FACK based fast recovery.

If you have verified that a pure FIN does indeed trigger recovery, can
you tell me what part of the code makes that happen?

Nandita

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-08 16:35             ` Eric Dumazet
@ 2014-06-09  7:04               ` Nandita Dukkipati
  0 siblings, 0 replies; 28+ messages in thread
From: Nandita Dukkipati @ 2014-06-09  7:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Per Hurtig, Netdev, Anna Brunström, mohammad.rajiullah,
	Neal Cardwell, sergei.shtylyov

On Sun, Jun 8, 2014 at 9:35 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2014-06-08 at 09:41 +0200, Per Hurtig wrote:
>>
>> On sön  8 jun 2014 04:58:25, Eric Dumazet wrote:
>> > On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
>> >> Fix to a problem observed when losing a FIN segment that does not
>> >> contain data.  In such situations, TLP is unable to recover from
>> >> *any* tail loss and instead adds at least PTO ms to the
>> >> retransmission process, i.e., RTO = RTO + PTO.
>> >>
>> >> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>> >> ---
>> >>   net/ipv4/tcp_output.c | 6 ++++--
>> >>   1 file changed, 4 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> >> index d463c35..6573765 100644
>> >> --- a/net/ipv4/tcp_output.c
>> >> +++ b/net/ipv4/tcp_output.c
>> >> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>> >>    if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>> >>            goto rearm_timer;
>> >>
>> >> -  /* Probe with zero data doesn't trigger fast recovery. */
>> >> -  if (skb->len > 0)
>> >> +  /* Probe with zero data doesn't trigger fast recovery, if FIN
>> >> +   * flag is not set.
>> >> +   */
>> >> +  if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>> >>            err = __tcp_retransmit_skb(sk, skb);
>> >>
>> >>    /* Record snd_nxt for loss detection. */
>> >
>> >
>> > You know, I believe the test was exactly to avoid sending data less FIN
>> > packets.
>> >
>> > If you write :
>> >
>> >      if (A  || !A)
>> >
>> > Better remove the condition, completely ;)
>> >
>> Obviously, but I don't think that FINs are the only segments
>> who are targeted by this condition (or targeted at all given
>> the implications of this statement). Furthermore, the comment above
>> the if statement would probably have mentioned FINs explicity
>> and not zero sized segments in general if this were the case.
>>
>
>
> I see no other possibilities than FIN segments here, or the WARN_ON(!
> tcp_skb_pcount(skb)) right before would trigger.
>
> If we believe it could trigger, then we need to remove the WARN_ON(),
> because its far more disruptive than waiting a bit more for the RTO.
> Remember : RTO is conservative.
>
> The if (skb->len > 0) only is true for FIN with no data.
>
> This was exactly the intent : Not sending FIN at this stage.
>
> If pure FIN is OK here, just remove the comment and test, this is so
> confusing and useless.

I agree - if pure FIN can indeed trigger recovery, let's just remove
the test and comment.

Nandita

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09  7:02             ` Nandita Dukkipati
@ 2014-06-09 13:13               ` Per Hurtig
  2014-06-09 14:33                 ` Eric Dumazet
  2014-06-12 14:21               ` Weiping Pan
  1 sibling, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-09 13:13 UTC (permalink / raw)
  To: Nandita Dukkipati
  Cc: Eric Dumazet, Netdev, Anna Brunström, mohammad.rajiullah,
	Neal Cardwell, Sergei Shtylyov

See inline,

On 2014-06-09 09:02, Nandita Dukkipati wrote:
> On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote:
>>
>>
>> On sön  8 jun 2014 04:58:25, Eric Dumazet wrote:
>>>
>>> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
>>>>
>>>> Fix to a problem observed when losing a FIN segment that does not
>>>> contain data.  In such situations, TLP is unable to recover from
>>>> *any* tail loss and instead adds at least PTO ms to the
>>>> retransmission process, i.e., RTO = RTO + PTO.
>>>>
>>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>>>> ---
>>>>    net/ipv4/tcp_output.c | 6 ++++--
>>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index d463c35..6573765 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>>>>          if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>>>                  goto rearm_timer;
>>>>
>>>> -       /* Probe with zero data doesn't trigger fast recovery. */
>>>> -       if (skb->len > 0)
>>>> +       /* Probe with zero data doesn't trigger fast recovery, if FIN
>>>> +        * flag is not set.
>>>> +        */
>>>> +       if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>>>>                  err = __tcp_retransmit_skb(sk, skb);
>>>>
>>>>          /* Record snd_nxt for loss detection. */
>>>
>>>
>>>
>>> You know, I believe the test was exactly to avoid sending data less FIN
>>> packets.
>>>
>>> If you write :
>>>
>>>       if (A  || !A)
>>>
>>> Better remove the condition, completely ;)
>>>

After looking more closely, I see that we only enter this part when the 
FIN flag is set on an otherwise empty segment. I guess I was distracted 
by the comment that suggested a more general scenario than the actual 
one, a bit confusing ;)

>> Obviously, but I don't think that FINs are the only segments
>> who are targeted by this condition (or targeted at all given
>> the implications of this statement). Furthermore, the comment above
>> the if statement would probably have mentioned FINs explicity
>> and not zero sized segments in general if this were the case.
>>
>>
>>
>>>
>>> Nandita, why FIN packet wont trigger fast retransnmits ?
>>>
>>
>> They do, that's the whole thing with this patch.
>>
>>
>>> It sounds like if the timer is the issue you want to fix, you might
>>> simply rearm a timer with RTO-PTO instead of RTO ?
>>>
>>>
>> No I want to enable TLP for tail loss where an empty FIN is involved,
>> this does not work now.
>
> I understand the tail loss case you want to solve - essentially when a
> tail loss occurs that involves data segments as well as that of an
> empty FIN. However, have you verified that re-sending an empty FIN
> triggers fast recovery? I would be surprised if it did, because I
> think the sender needs to receive a SACK of at least 1-byte of data
> before sender can trigger FACK based fast recovery.
>
Yes, it needs a SACK that covers one "sequence number", which a FIN
does. I don't see why it shouldn't generate a SACK? See below for some 
packet dumps.

> If you have verified that a pure FIN does indeed trigger recovery, can
> you tell me what part of the code makes that happen?
>

Scenario:
Eleven segments are sent back-to-back (ten data and one empty FIN), the 
last three segments (the FIN + two others) are dropped.

Other relevant info: RTT of 20ms.

Transfer time of the entire flow (from the receiver's point of view):
TCP w TLP: 324ms
TCP w modified TLP: 122ms


Detailed TLP behavior:
The entire transfer including retransmissions takes approx 324ms. The 
retransmissions are conducted in frames 23 and 25.

Sender-side packet trace:
1   0.000000     10.0.1.1 -> 10.0.2.1     TCP 74 36713 > search-agent 
[SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1171292524 
TSecr=0 WS=128
2   0.000020     10.0.2.1 -> 10.0.1.1     TCP 74 search-agent > 36713 
[SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 
TSval=1171296150 TSecr=1171292524 WS=128
3   0.019818     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1171292529 TSecr=1171296150
4   0.019854     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=1 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
5   0.019864     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=1449 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
6   0.019868     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=2897 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
7   0.019871     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=4345 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
8   0.019875     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=5793 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
9   0.019878     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=7241 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
10   0.019881     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=8689 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
11   0.019922     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=10137 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
12   0.019929     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 36713 
[PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
13   0.019930     10.0.2.1 -> 10.0.1.1     TCP 1513 search-agent > 36713 
[PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1447 TSval=1171296155 
TSecr=1171292529[Packet size limited during capture]
14   0.019971     10.0.2.1 -> 10.0.1.1     TCP 66 search-agent > 36713 
[FIN, ACK] Seq=14480 Ack=1 Win=29056 Len=0 TSval=1171296155 TSecr=1171292529
15   0.039635     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=1449 Win=32128 Len=0 TSval=1171292534 TSecr=1171296155
16   0.039643     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=2897 Win=35072 Len=0 TSval=1171292534 TSecr=1171296155
17   0.039646     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=4345 Win=37888 Len=0 TSval=1171292534 TSecr=1171296155
18   0.039650     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=5793 Win=40832 Len=0 TSval=1171292534 TSecr=1171296155
19   0.039653     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=7241 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155
20   0.039655     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=8689 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155
21   0.039657     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=10137 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155
22   0.039660     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=11585 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155
23   0.283780     10.0.2.1 -> 10.0.1.1     TCP 1514 [TCP Retransmission] 
search-agent > 36713 [PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 
TSval=1171296221 TSecr=1171292534[Packet size limited during capture]
24   0.303267     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[ACK] Seq=1 Ack=13033 Win=43520 Len=0 TSval=1171292600 TSecr=1171296221
25   0.303276     10.0.2.1 -> 10.0.1.1     TCP 1513 [TCP Retransmission] 
search-agent > 36713 [FIN, PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1447 
TSval=1171296225 TSecr=1171292600[Packet size limited during capture]
26   0.324085     10.0.1.1 -> 10.0.2.1     TCP 66 36713 > search-agent 
[FIN, ACK] Seq=1 Ack=14481 Win=43520 Len=0 TSval=1171292605 TSecr=1171296225
27   0.324093     10.0.2.1 -> 10.0.1.1     TCP 66 search-agent > 36713 
[ACK] Seq=14481 Ack=2 Win=29056 Len=0 TSval=1171296231 TSecr=1171292605

----

Modified TLP behavior:
The entire transfer including retransmissions takes approx 122ms.

The TLP probe is sent in frame 23 below, and you can see in frame 24 
below that a SACK covering one sequence number is returned from the 
receiver and used to trigger retransmissions of the other lost segments.

Sender-side packet trace:
1   0.000000     10.0.1.1 -> 10.0.2.1     TCP 74 37730 > search-agent 
[SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1194757582 
TSecr=0 WS=128
2   0.000021     10.0.2.1 -> 10.0.1.1     TCP 74 search-agent > 37730 
[SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=222654 
TSecr=1194757582 WS=128
3   0.020765     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1194757587 TSecr=222654
4   0.020800     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=1 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
5   0.020810     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=1449 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
6   0.020814     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=2897 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
7   0.020818     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=4345 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
8   0.020821     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=5793 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
9   0.020824     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=7241 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
10   0.020827     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=8689 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
11   0.020870     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=10137 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
12   0.020877     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
13   0.020879     10.0.2.1 -> 10.0.1.1     TCP 1514 search-agent > 37730 
[PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1448 TSval=222659 
TSecr=1194757587[Packet size limited during capture]
14   0.020918     10.0.2.1 -> 10.0.1.1     TCP 66 search-agent > 37730 
[FIN, ACK] Seq=14481 Ack=1 Win=29056 Len=0 TSval=222659 TSecr=1194757587
15   0.040583     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=1449 Win=32128 Len=0 TSval=1194757593 TSecr=222659
16   0.040591     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=2897 Win=35072 Len=0 TSval=1194757593 TSecr=222659
17   0.040594     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=4345 Win=37888 Len=0 TSval=1194757593 TSecr=222659
18   0.040597     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=5793 Win=40832 Len=0 TSval=1194757593 TSecr=222659
19   0.040599     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=7241 Win=43520 Len=0 TSval=1194757593 TSecr=222659
20   0.040601     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=8689 Win=43520 Len=0 TSval=1194757593 TSecr=222659
21   0.040604     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=10137 Win=43520 Len=0 TSval=1194757593 TSecr=222659
22   0.040606     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=11585 Win=43520 Len=0 TSval=1194757593 TSecr=222659
23   0.078751     10.0.2.1 -> 10.0.1.1     TCP 66 [TCP Retransmission] 
search-agent > 37730 [FIN, ACK] Seq=14481 Ack=1 Win=29056 Len=0 
TSval=222674 TSecr=1194757593
24   0.098093     10.0.1.1 -> 10.0.2.1     TCP 78 [TCP Dup ACK 22#1] 
37730 > search-agent [ACK] Seq=1 Ack=11585 Win=43520 Len=0 
TSval=1194757607 TSecr=222659 SLE=14481 SRE=14482
25   0.102752     10.0.2.1 -> 10.0.1.1     TCP 1514 [TCP Retransmission] 
search-agent > 37730 [PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 
TSval=222680 TSecr=1194757607[Packet size limited during capture]
26   0.102757     10.0.2.1 -> 10.0.1.1     TCP 1514 [TCP Retransmission] 
search-agent > 37730 [PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1448 
TSval=222680 TSecr=1194757607[Packet size limited during capture]
27   0.121854     10.0.1.1 -> 10.0.2.1     TCP 78 37730 > search-agent 
[ACK] Seq=1 Ack=13033 Win=43520 Len=0 TSval=1194757613 TSecr=222680 
SLE=14481 SRE=14482
28   0.121862     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[ACK] Seq=1 Ack=14482 Win=43520 Len=0 TSval=1194757613 TSecr=222680
29   0.121868     10.0.1.1 -> 10.0.2.1     TCP 66 37730 > search-agent 
[FIN, ACK] Seq=1 Ack=14482 Win=43520 Len=0 TSval=1194757613 TSecr=222680
30   0.121875     10.0.2.1 -> 10.0.1.1     TCP 66 search-agent > 37730 
[ACK] Seq=14482 Ack=2 Win=29056 Len=0 TSval=222684 TSecr=1194757613

----

Thanks,
Per

> Nandita
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 13:13               ` Per Hurtig
@ 2014-06-09 14:33                 ` Eric Dumazet
  2014-06-09 14:39                   ` Eric Dumazet
  2014-06-09 14:42                   ` Per Hurtig
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2014-06-09 14:33 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

On Mon, 2014-06-09 at 15:13 +0200, Per Hurtig wrote:
> >
> Yes, it needs a SACK that covers one "sequence number", which a FIN
> does. I don't see why it shouldn't generate a SACK? See below for some 
> packet dumps.

I cooked following packetdrill test :

$ cat tlp-10pkt-fin.pkt 
`../common/defaults.sh`
// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Send 8 MSS.
0.200 write(4, ..., 8000) = 8000
+.000 > P. 1:8001(8000) ack 1
+.000   close(4) = 0
+.000 > F. 8001:8001(0) ack 1

// Receiver ACKs 7 packets
0.300 < . 1:1(0) ack 7001 win 257
// check if TLP re-sends the FIN
0.500 > F. 8001:8001(0) ack 1
0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8001,nop,nop>
// check if fast restransmit is correctly triggered.
0.600 > P. 7001:8001(1000) ack 1

# ../packetdrill tlp-10pkt-fin.pkt
tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected
outbound packet at 0.600000 sec but happened at 1.099761 sec
script packet:  0.600000 P. 7001:8001(1000) ack 1 
actual packet:  1.099761 P. 7001:8001(1000) ack 1 win 457 

So it looks like fast retransmit is not triggered.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 14:33                 ` Eric Dumazet
@ 2014-06-09 14:39                   ` Eric Dumazet
  2014-06-09 14:42                   ` Per Hurtig
  1 sibling, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2014-06-09 14:39 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

On Mon, 2014-06-09 at 07:33 -0700, Eric Dumazet wrote:
> On Mon, 2014-06-09 at 15:13 +0200, Per Hurtig wrote:
> > >
> > Yes, it needs a SACK that covers one "sequence number", which a FIN
> > does. I don't see why it shouldn't generate a SACK? See below for some 
> > packet dumps.
> 
> I cooked following packetdrill test :
> 
> $ cat tlp-10pkt-fin.pkt 
> `../common/defaults.sh`
> // Establish a connection.
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 0.000 bind(3, ..., ...) = 0
> 0.000 listen(3, 1) = 0
> 
> 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> 
> // Send 8 MSS.
> 0.200 write(4, ..., 8000) = 8000
> +.000 > P. 1:8001(8000) ack 1
> +.000   close(4) = 0
> +.000 > F. 8001:8001(0) ack 1
> 
> // Receiver ACKs 7 packets
> 0.300 < . 1:1(0) ack 7001 win 257
> // check if TLP re-sends the FIN
> 0.500 > F. 8001:8001(0) ack 1
> 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8001,nop,nop>
> // check if fast restransmit is correctly triggered.
> 0.600 > P. 7001:8001(1000) ack 1
> 
> # ../packetdrill tlp-10pkt-fin.pkt
> tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected
> outbound packet at 0.600000 sec but happened at 1.099761 sec
> script packet:  0.600000 P. 7001:8001(1000) ack 1 
> actual packet:  1.099761 P. 7001:8001(1000) ack 1 win 457 
> 
> So it looks like fast retransmit is not triggered.
> 

And using instead :

# cat tlp-10pkt-fin.pkt
// Set up production config.
`../common/defaults.sh`

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Send 8 MSS.
0.200 write(4, ..., 8000) = 8000
+.000 > P. 1:8001(8000) ack 1
+.000   close(4) = 0
+.000 > F. 8001:8001(0) ack 1

// Receiver ACKs 7 packets
0.300 < . 1:1(0) ack 7001 win 257

0.500 > F. 8001:8001(0) ack 1
0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop>
0.600 > P. 7001:8001(1000) ack 1

# ../packetdrill tlp-10pkt-fin.pkt
tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected
outbound packet at 0.600000 sec but happened at 0.624721 sec
script packet:  0.600000 P. 7001:8001(1000) ack 1 
actual packet:  0.624721 P. 7001:8001(1000) ack 1 win 457 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 14:33                 ` Eric Dumazet
  2014-06-09 14:39                   ` Eric Dumazet
@ 2014-06-09 14:42                   ` Per Hurtig
  2014-06-09 15:04                     ` Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-09 14:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

Tried to run the script, but I don't have the "common/defaults" and the 
test scripts from the git repository fails on all TCP tests for Linux. 
The results I listed in the enclosed packet traces are from two real 
machines communicating with each other (with fresh net-next kernels and 
TLP without the zero probe check), so I tend to rely more on those 
results.


Cheers,
Per

On mån  9 jun 2014 16:33:08, Eric Dumazet wrote:
> On Mon, 2014-06-09 at 15:13 +0200, Per Hurtig wrote:
>>>
>> Yes, it needs a SACK that covers one "sequence number", which a FIN
>> does. I don't see why it shouldn't generate a SACK? See below for some
>> packet dumps.
>
> I cooked following packetdrill test :
>
> $ cat tlp-10pkt-fin.pkt
> `../common/defaults.sh`
> // Establish a connection.
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 0.000 bind(3, ..., ...) = 0
> 0.000 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
>
> // Send 8 MSS.
> 0.200 write(4, ..., 8000) = 8000
> +.000 > P. 1:8001(8000) ack 1
> +.000   close(4) = 0
> +.000 > F. 8001:8001(0) ack 1
>
> // Receiver ACKs 7 packets
> 0.300 < . 1:1(0) ack 7001 win 257
> // check if TLP re-sends the FIN
> 0.500 > F. 8001:8001(0) ack 1
> 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8001,nop,nop>
> // check if fast restransmit is correctly triggered.
> 0.600 > P. 7001:8001(1000) ack 1
>
> # ../packetdrill tlp-10pkt-fin.pkt
> tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected
> outbound packet at 0.600000 sec but happened at 1.099761 sec
> script packet:  0.600000 P. 7001:8001(1000) ack 1
> actual packet:  1.099761 P. 7001:8001(1000) ack 1 win 457
>
> So it looks like fast retransmit is not triggered.
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 14:42                   ` Per Hurtig
@ 2014-06-09 15:04                     ` Eric Dumazet
  2014-06-09 15:56                       ` Per Hurtig
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-09 15:04 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

On Mon, 2014-06-09 at 16:42 +0200, Per Hurtig wrote:
> Tried to run the script, but I don't have the "common/defaults" and the 
> test scripts from the git repository fails on all TCP tests for Linux. 
> The results I listed in the enclosed packet traces are from two real 
> machines communicating with each other (with fresh net-next kernels and 
> TLP without the zero probe check), so I tend to rely more on those 
> results.

Do not top post on netdev.

We at Google run about 1000 packet drill tests for any functional change
in TCP stack. This is the only way we can scale.

We are not 'studying by hand' various tcpdumps when a tool can do it
properly.

Nandita asked you give a pointer to the source code explaining how fast
retransmit was done for this specific case, but you provided a tcpdump,
which hardly can be reproduced and be the answer to the question.

So now, we are trying to have a test to reproduce the issue and check
the fix is complete.

So far, I am not really convinced. It seems the FIN _is_ retransmitted,
but I do not see the SACK for this RTX is properly handled in time.

Its one thing checking the FIN is retransmitted, its another to check
that the SACK will trigger sensible behavior.

If you carefully check your tcpdump, you'll see there is the same
problem, and you missed it, while packetdrill exactly pointed it.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 15:04                     ` Eric Dumazet
@ 2014-06-09 15:56                       ` Per Hurtig
  2014-06-09 16:15                         ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-09 15:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov



On mån  9 jun 2014 17:04:19, Eric Dumazet wrote:
> On Mon, 2014-06-09 at 16:42 +0200, Per Hurtig wrote:
>> Tried to run the script, but I don't have the "common/defaults" and the
>> test scripts from the git repository fails on all TCP tests for Linux.
>> The results I listed in the enclosed packet traces are from two real
>> machines communicating with each other (with fresh net-next kernels and
>> TLP without the zero probe check), so I tend to rely more on those
>> results.
>
> Do not top post on netdev.
>
> We at Google run about 1000 packet drill tests for any functional change
> in TCP stack. This is the only way we can scale.
>
> We are not 'studying by hand' various tcpdumps when a tool can do it
> properly.
>
> Nandita asked you give a pointer to the source code explaining how fast
> retransmit was done for this specific case, but you provided a tcpdump,
> which hardly can be reproduced and be the answer to the question.
>
> So now, we are trying to have a test to reproduce the issue and check
> the fix is complete.
>
> So far, I am not really convinced. It seems the FIN _is_ retransmitted,
> but I do not see the SACK for this RTX is properly handled in time.
>
> Its one thing checking the FIN is retransmitted, its another to check
> that the SACK will trigger sensible behavior.
>
> If you carefully check your tcpdump, you'll see there is the same
> problem, and you missed it, while packetdrill exactly pointed it.
>
> Thanks
>
>

Ok, I guess you mean that the retransmission was not fast enough? But 
will the same not happen if the original FIN is not lost and triggers a 
SACK (i.e., if the two last data segments are still lost)?

Cheers,
Per

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 15:56                       ` Per Hurtig
@ 2014-06-09 16:15                         ` Eric Dumazet
  2014-06-09 16:24                           ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-09 16:15 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

On Mon, 2014-06-09 at 17:56 +0200, Per Hurtig wrote:

> 
> Ok, I guess you mean that the retransmission was not fast enough? But 
> will the same not happen if the original FIN is not lost and triggers a 
> SACK (i.e., if the two last data segments are still lost)?

Yes, I'd like to understand why Nandita specifically added the original
test. In her tests, fast retransmit was not really effective.

Running packetdrill in a separate container gives me these interesting
counters :

# nstat
#kernel
IpInReceives                    4                  0.0
IpInDelivers                    4                  0.0
IpOutRequests                   5                  0.0
TcpPassiveOpens                 1                  0.0
TcpInSegs                       4                  0.0
TcpOutSegs                      10                 0.0
TcpRetransSegs                  2                  0.0
TcpExtTCPPureAcks               3                  0.0
TcpExtTCPSackRecovery           1                  0.0
TcpExtTCPFastRetrans            1                  0.0
TcpExtTCPLossProbes             1                  0.0
TcpExtTCPSackRecoveryFail       1                  0.0
TcpExtTCPSackShiftFallback      1                  0.0
TcpExtTCPRetransFail            4                  0.0   <<<< HERE >>>
TcpExtTCPOrigDataSent           9                  0.0
IpExtInOctets                   184                0.0
IpExtOutOctets                  9212               0.0
IpExtInNoECTPkts                4                  0.0

I guess we need to understand why the retransmit is in error.

I am investigating.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 16:15                         ` Eric Dumazet
@ 2014-06-09 16:24                           ` Eric Dumazet
  2014-06-09 18:33                             ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-09 16:24 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

On Mon, 2014-06-09 at 09:15 -0700, Eric Dumazet wrote:
> On Mon, 2014-06-09 at 17:56 +0200, Per Hurtig wrote:
> 
> > 
> > Ok, I guess you mean that the retransmission was not fast enough? But 
> > will the same not happen if the original FIN is not lost and triggers a 
> > SACK (i.e., if the two last data segments are still lost)?
> 
> Yes, I'd like to understand why Nandita specifically added the original
> test. In her tests, fast retransmit was not really effective.
> 
> Running packetdrill in a separate container gives me these interesting
> counters :
> 
> # nstat
> #kernel
> IpInReceives                    4                  0.0
> IpInDelivers                    4                  0.0
> IpOutRequests                   5                  0.0
> TcpPassiveOpens                 1                  0.0
> TcpInSegs                       4                  0.0
> TcpOutSegs                      10                 0.0
> TcpRetransSegs                  2                  0.0
> TcpExtTCPPureAcks               3                  0.0
> TcpExtTCPSackRecovery           1                  0.0
> TcpExtTCPFastRetrans            1                  0.0
> TcpExtTCPLossProbes             1                  0.0
> TcpExtTCPSackRecoveryFail       1                  0.0
> TcpExtTCPSackShiftFallback      1                  0.0
> TcpExtTCPRetransFail            4                  0.0   <<<< HERE >>>
> TcpExtTCPOrigDataSent           9                  0.0
> IpExtInOctets                   184                0.0
> IpExtOutOctets                  9212               0.0
> IpExtInNoECTPkts                4                  0.0
> 
> I guess we need to understand why the retransmit is in error.
> 
> I am investigating.
> 


Hmm... We hit this point... This is embarrassing I guess.

       if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
                return -EHOSTUNREACH; /* Routing failure or similar. */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09 16:24                           ` Eric Dumazet
@ 2014-06-09 18:33                             ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2014-06-09 18:33 UTC (permalink / raw)
  To: Per Hurtig
  Cc: Nandita Dukkipati, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov

On Mon, 2014-06-09 at 09:24 -0700, Eric Dumazet wrote:

> Hmm... We hit this point... This is embarrassing I guess.
> 
>        if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
>                 return -EHOSTUNREACH; /* Routing failure or similar. */

False alarm. This was caused of route being dismantled at the end of the
test, and a dangling socket tried to retransmit.

The 25ms timer I had was caused by early retransmit (25 ms = RTT/4 in my
test), because only one packet was missing.

Following test runs fine, as 3 packets are missing.

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Send 8 MSS.
0.200 write(4, ..., 8000) = 8000
+.000 > P. 1:8001(8000) ack 1
+.000   close(4) = 0
+.000 > F. 8001:8001(0) ack 1

// Receiver ACKs 5 packets
0.300 < . 1:1(0) ack 5001 win 257
// Check if TLP is triggering, even if last packet is a pure FIN
0.500 > F. 8001:8001(0) ack 1

0.610 < . 1:1(0) ack 5001 win 257 <sack 8001:8002,nop,nop>
0.610 > . 5001:6001(1000) ack 1
0.610 > . 6001:7001(1000) ack 1

0.710 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop>
0.710 > P. 7001:8001(1000) ack 1

0.810 < . 1:1(0) ack 8002 win 257

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-09  7:02             ` Nandita Dukkipati
  2014-06-09 13:13               ` Per Hurtig
@ 2014-06-12 14:21               ` Weiping Pan
  2014-06-12 14:32                 ` Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: Weiping Pan @ 2014-06-12 14:21 UTC (permalink / raw)
  To: Nandita Dukkipati, Per Hurtig
  Cc: Eric Dumazet, Netdev, Anna Brunström, mohammad.rajiullah,
	Neal Cardwell, sergei.shtylyov


On 06/09/2014 03:02 PM, Nandita Dukkipati wrote:
> On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote:
>>
>> On sön  8 jun 2014 04:58:25, Eric Dumazet wrote:
>>> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
>>>> Fix to a problem observed when losing a FIN segment that does not
>>>> contain data.  In such situations, TLP is unable to recover from
>>>> *any* tail loss and instead adds at least PTO ms to the
>>>> retransmission process, i.e., RTO = RTO + PTO.
>>>>
>>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>>>> ---
>>>>    net/ipv4/tcp_output.c | 6 ++++--
>>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index d463c35..6573765 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>>>>          if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>>>                  goto rearm_timer;
>>>>
>>>> -       /* Probe with zero data doesn't trigger fast recovery. */
>>>> -       if (skb->len > 0)
>>>> +       /* Probe with zero data doesn't trigger fast recovery, if FIN
>>>> +        * flag is not set.
>>>> +        */
>>>> +       if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>>>>                  err = __tcp_retransmit_skb(sk, skb);
>>>>
>>>>          /* Record snd_nxt for loss detection. */
>>>
>>>
>>> You know, I believe the test was exactly to avoid sending data less FIN
>>> packets.
>>>
>>> If you write :
>>>
>>>       if (A  || !A)
>>>
>>> Better remove the condition, completely ;)
>>>
>> Obviously, but I don't think that FINs are the only segments
>> who are targeted by this condition (or targeted at all given
>> the implications of this statement). Furthermore, the comment above
>> the if statement would probably have mentioned FINs explicity
>> and not zero sized segments in general if this were the case.
>>
>>
>>
>>> Nandita, why FIN packet wont trigger fast retransnmits ?
>>>
>> They do, that's the whole thing with this patch.
>>
>>
>>> It sounds like if the timer is the issue you want to fix, you might
>>> simply rearm a timer with RTO-PTO instead of RTO ?
>>>
>>>
>> No I want to enable TLP for tail loss where an empty FIN is involved,
>> this does not work now.
> I understand the tail loss case you want to solve - essentially when a
> tail loss occurs that involves data segments as well as that of an
> empty FIN. However, have you verified that re-sending an empty FIN
> triggers fast recovery? I would be surprised if it did, because I
> think the sender needs to receive a SACK of at least 1-byte of data
> before sender can trigger FACK based fast recovery.
When we queue an out of order pure FIN packet, we do not check whether 
it has data or not.
tcp_rcv_established
-->tcp_data_queue
---->tcp_data_queue_ofo

Then the pure FIN packet can generate SACK, which will trigger fast 
recovery or early retransmit on the sender.
>
> If you have verified that a pure FIN does indeed trigger recovery, can
> you tell me what part of the code makes that happen?
Here is the patch I use, I think the original if statement is useless, 
so I delete it.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 12d6016..4b301e9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2077,9 +2077,7 @@ void tcp_send_loss_probe(struct sock *sk)
         if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
                 goto rearm_timer;

-       /* Probe with zero data doesn't trigger fast recovery. */
-       if (skb->len > 0)
-               err = __tcp_retransmit_skb(sk, skb);
+       err = __tcp_retransmit_skb(sk, skb);

         /* Record snd_nxt for loss detection. */
         if (likely(!err))


I find that pure FIN can trigger fast recovery or early retransmit, 
depending on the value of fackets_out.
I write two packetdrill scripts,
fin_fack.pkt can show that pure FIN can trigger fast recovery,
fin_er.pkt can show that pure FIN can trigger early retransmit.

# cat fin_fack.pkt
// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

// RTT = 100ms
0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Send 8 MSS.
// tcp_min_tso_segs is 2
0.200 write(4, ..., 8000) = 8000
+.000 > . 1:2001(2000) ack 1
+.000 > . 2001:4001(2000) ack 1
+.000 > . 4001:6001(2000) ack 1
+.000 > P. 6001:8001(2000) ack 1
+.000   close(4) = 0
+.000 > F. 8001:8001(0) ack 1

// Receiver ACKs 4 packets, the fifth to eighth packets are lost
0.300 < . 1:1(0) ack 4001 win 257

// PTO = 2RTT, TLP is triggered
0.500 > F. 8001:8001(0) ack 1 win 229
0.600 < . 1:1(0) ack 4001 win 257 <sack 8001:8002,nop,nop>

// got SACK, FACK triggers fast recovery and fast retransmit
0.600 > . 4001:5001(1000) ack 1 win 229
0.600 > . 5001:6001(1000) ack 1 win 229
0.700 < . 1:1(0) ack 6001 win 257 <sack 8001:8002,nop,nop>

// fast retransmit
0.700 > . 6001:7001(1000) ack 1 win 229
0.700 > P. 7001:8001(1000) ack 1 win 229
0.700 < . 1:1(0) ack 8002 win 257

// peer close
0.800 < F. 1:1(0) ack 8002 win 229
0.800 > . 8002:8002(0) ack 2


# cat fin_er.pkt
// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

// RTT = 100ms
0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Send 8 MSS.
// tcp_min_tso_segs is 2
0.200 write(4, ..., 8000) = 8000
+.000 > . 1:2001(2000) ack 1
+.000 > . 2001:4001(2000) ack 1
+.000 > . 4001:6001(2000) ack 1
+.000 > P. 6001:8001(2000) ack 1
+.000   close(4) = 0
+.000 > F. 8001:8001(0) ack 1

// Receiver ACKs 7 packets, the eighth packet is lost
0.300 < . 1:1(0) ack 7001 win 257

// PTO = 2RTT, TLP is triggered
0.500 > F. 8001:8001(0) ack 1 win 229
0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop>

// got SACK, trigger early retransmit in RTT/4
0.625 > P. 7001:8001(1000) ack 1 win 229
0.725 < . 1:1(0) ack 8002 win 257

// peer close
0.800 < F. 1:1(0) ack 8002 win 229
0.800 > . 8002:8002(0) ack 2

Test results:
before the patch:

# ./packetdrill -v fin_fack.pkt
inbound injected packet:  0.100009 S 0:0(0) win 32792 <mss 
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet:  0.100182 S. 3040039935:3040039935(0) ack 1 win 
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet:  0.200005 . 1:1(0) ack 3040039936 win 257
outbound sniffed packet:  0.200108 . 3040039936:3040041936(2000) ack 1 
win 229
outbound sniffed packet:  0.200117 . 3040041936:3040043936(2000) ack 1 
win 229
outbound sniffed packet:  0.200123 . 3040043936:3040045936(2000) ack 1 
win 229
outbound sniffed packet:  0.200130 P. 3040045936:3040047936(2000) ack 1 
win 229
outbound sniffed packet:  0.200328 F. 3040047936:3040047936(0) ack 1 win 
229
inbound injected packet:  0.300004 . 1:1(0) ack 3040043936 win 257
outbound sniffed packet:  0.800768 . 3040043936:3040044936(1000) ack 1 
win 229
fin_fack.pkt:27: error handling packet: live packet field 
ipv4_total_length: expected: 40 (0x28) vs actual: 1040 (0x410)
script packet:  0.500000 F. 8001:8001(0) ack 1 win 229
actual packet:  0.800768 . 4001:5001(1000) ack 1 win 229

# ./packetdrill -v fin_er.pkt
inbound injected packet:  0.100009 S 0:0(0) win 32792 <mss 
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet:  0.100172 S. 1475097861:1475097861(0) ack 1 win 
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet:  0.200005 . 1:1(0) ack 1475097862 win 257
outbound sniffed packet:  0.200113 . 1475097862:1475099862(2000) ack 1 
win 229
outbound sniffed packet:  0.200121 . 1475099862:1475101862(2000) ack 1 
win 229
outbound sniffed packet:  0.200128 . 1475101862:1475103862(2000) ack 1 
win 229
outbound sniffed packet:  0.200134 P. 1475103862:1475105862(2000) ack 1 
win 229
outbound sniffed packet:  0.200305 F. 1475105862:1475105862(0) ack 1 win 
229
inbound injected packet:  0.300004 . 1:1(0) ack 1475104862 win 257
outbound sniffed packet:  0.800764 P. 1475104862:1475105862(1000) ack 1 
win 229
fin_er.pkt:27: error handling packet: live packet field 
ipv4_total_length: expected: 40 (0x28) vs actual: 1040 (0x410)
script packet:  0.500000 F. 8001:8001(0) ack 1 win 229
actual packet:  0.800764 P. 7001:8001(1000) ack 1 win 229

after the patch:
# ./packetdrill -v fin_fack.pkt
inbound injected packet:  0.100026 S 0:0(0) win 32792 <mss 
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet:  0.100198 S. 3395593992:3395593992(0) ack 1 win 
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet:  0.200013 . 1:1(0) ack 3395593993 win 257
outbound sniffed packet:  0.200115 . 3395593993:3395595993(2000) ack 1 
win 229
outbound sniffed packet:  0.200131 . 3395595993:3395597993(2000) ack 1 
win 229
outbound sniffed packet:  0.200138 . 3395597993:3395599993(2000) ack 1 
win 229
outbound sniffed packet:  0.200145 P. 3395599993:3395601993(2000) ack 1 
win 229
outbound sniffed packet:  0.200345 F. 3395601993:3395601993(0) ack 1 win 
229
inbound injected packet:  0.300016 . 1:1(0) ack 3395597993 win 257
outbound sniffed packet:  0.499792 F. 3395601993:3395601993(0) ack 1 win 
229
inbound injected packet:  0.600024 . 1:1(0) ack 3395597993 win 257 <sack 
3395601993:3395601994,nop,nop>
outbound sniffed packet:  0.600074 . 3395597993:3395598993(1000) ack 1 
win 229
outbound sniffed packet:  0.600080 . 3395598993:3395599993(1000) ack 1 
win 229
inbound injected packet:  0.700016 . 1:1(0) ack 3395599993 win 257 <sack 
3395601993:3395601994,nop,nop>
outbound sniffed packet:  0.700062 . 3395599993:3395600993(1000) ack 1 
win 229
outbound sniffed packet:  0.700066 P. 3395600993:3395601993(1000) ack 1 
win 229
inbound injected packet:  0.700164 . 1:1(0) ack 3395601994 win 257
inbound injected packet:  0.800016 F. 1:1(0) ack 3395601994 win 229
outbound sniffed packet:  0.800062 . 3395601994:3395601994(0) ack 2 win 229

# ./packetdrill -v fin_er.pkt
inbound injected packet:  0.100009 S 0:0(0) win 32792 <mss 
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet:  0.100180 S. 3074568182:3074568182(0) ack 1 win 
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet:  0.200005 . 1:1(0) ack 3074568183 win 257
outbound sniffed packet:  0.200106 . 3074568183:3074570183(2000) ack 1 
win 229
outbound sniffed packet:  0.200115 . 3074570183:3074572183(2000) ack 1 
win 229
outbound sniffed packet:  0.200122 . 3074572183:3074574183(2000) ack 1 
win 229
outbound sniffed packet:  0.200128 P. 3074574183:3074576183(2000) ack 1 
win 229
outbound sniffed packet:  0.200326 F. 3074576183:3074576183(0) ack 1 win 
229
inbound injected packet:  0.300003 . 1:1(0) ack 3074575183 win 257
outbound sniffed packet:  0.499765 F. 3074576183:3074576183(0) ack 1 win 
229
inbound injected packet:  0.600006 . 1:1(0) ack 3074575183 win 257 <sack 
3074576183:3074576184,nop,nop>
outbound sniffed packet:  0.624764 P. 3074575183:3074576183(1000) ack 1 
win 229
inbound injected packet:  0.725004 . 1:1(0) ack 3074576184 win 257
inbound injected packet:  0.800003 F. 1:1(0) ack 3074576184 win 229
outbound sniffed packet:  0.800047 . 3074576184:3074576184(0) ack 2 win 229


thanks
Weiping Pan
>
> Nandita
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tcp: fixing TLP's FIN recovery
  2014-06-12 14:21               ` Weiping Pan
@ 2014-06-12 14:32                 ` Eric Dumazet
  2014-06-12 15:08                   ` [PATCH v2 1/1] " Per Hurtig
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-12 14:32 UTC (permalink / raw)
  To: Weiping Pan
  Cc: Nandita Dukkipati, Per Hurtig, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, sergei.shtylyov

On Thu, 2014-06-12 at 22:21 +0800, Weiping Pan wrote:
> When we queue an out of order pure FIN packet, we do not check whether 
> it has data or not.
> tcp_rcv_established
> -->tcp_data_queue
> ---->tcp_data_queue_ofo
> 
> Then the pure FIN packet can generate SACK, which will trigger fast 
> recovery or early retransmit on the sender.
> >
> > If you have verified that a pure FIN does indeed trigger recovery, can
> > you tell me what part of the code makes that happen?
> Here is the patch I use, I think the original if statement is useless, 
> so I delete it.

Yes, this is exactly what we agreed, and what we tested as well here at
Google.

Per, can you submit an updated official version of the patch ?

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-06-12 14:32                 ` Eric Dumazet
@ 2014-06-12 15:08                   ` Per Hurtig
  2014-06-12 15:28                     ` Eric Dumazet
  2014-06-12 18:06                     ` David Miller
  0 siblings, 2 replies; 28+ messages in thread
From: Per Hurtig @ 2014-06-12 15:08 UTC (permalink / raw)
  To: eric.dumazet
  Cc: Per Hurtig, panweiping3, nanditad, netdev, anna.brunstrom,
	mohammad.rajiullah, ncardwell, sergei.shtylyov

Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig <per.hurtig@kau.se>
---
 net/ipv4/tcp_output.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ad7549f..819bf0c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
 		goto rearm_timer;
 
-	/* Probe with zero data doesn't trigger fast recovery. */
-	if (skb->len > 0)
-		err = __tcp_retransmit_skb(sk, skb);
+	err = __tcp_retransmit_skb(sk, skb);
 
 	/* Record snd_nxt for loss detection. */
 	if (likely(!err))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-06-12 15:08                   ` [PATCH v2 1/1] " Per Hurtig
@ 2014-06-12 15:28                     ` Eric Dumazet
  2014-06-12 17:36                       ` Nandita Dukkipati
  2014-06-12 18:06                     ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-06-12 15:28 UTC (permalink / raw)
  To: Per Hurtig
  Cc: panweiping3, nanditad, netdev, anna.brunstrom,
	mohammad.rajiullah, ncardwell, sergei.shtylyov

On Thu, 2014-06-12 at 17:08 +0200, Per Hurtig wrote:
> Fix to a problem observed when losing a FIN segment that does not
> contain data.  In such situations, TLP is unable to recover from
> *any* tail loss and instead adds at least PTO ms to the
> retransmission process, i.e., RTO = RTO + PTO.
> 
> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
> ---
>  net/ipv4/tcp_output.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ad7549f..819bf0c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk)
>  	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>  		goto rearm_timer;
>  
> -	/* Probe with zero data doesn't trigger fast recovery. */
> -	if (skb->len > 0)
> -		err = __tcp_retransmit_skb(sk, skb);
> +	err = __tcp_retransmit_skb(sk, skb);
>  
>  	/* Record snd_nxt for loss detection. */
>  	if (likely(!err))

Thanks a lot Per

Signed-off-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-06-12 15:28                     ` Eric Dumazet
@ 2014-06-12 17:36                       ` Nandita Dukkipati
  2014-06-12 17:46                         ` Neal Cardwell
  0 siblings, 1 reply; 28+ messages in thread
From: Nandita Dukkipati @ 2014-06-12 17:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Per Hurtig, panweiping3, Netdev, Anna Brunström,
	mohammad.rajiullah, Neal Cardwell, sergei.shtylyov

On Thu, Jun 12, 2014 at 8:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-06-12 at 17:08 +0200, Per Hurtig wrote:
>> Fix to a problem observed when losing a FIN segment that does not
>> contain data.  In such situations, TLP is unable to recover from
>> *any* tail loss and instead adds at least PTO ms to the
>> retransmission process, i.e., RTO = RTO + PTO.
>>
>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>> ---
>>  net/ipv4/tcp_output.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index ad7549f..819bf0c 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk)
>>       if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>               goto rearm_timer;
>>
>> -     /* Probe with zero data doesn't trigger fast recovery. */
>> -     if (skb->len > 0)
>> -             err = __tcp_retransmit_skb(sk, skb);
>> +     err = __tcp_retransmit_skb(sk, skb);
>>
>>       /* Record snd_nxt for loss detection. */
>>       if (likely(!err))
>
> Thanks a lot Per
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Nandita Dukkipati <nanditad@google.com>

Thanks Per and Eric. This is nice addition that makes TLP more useful
for tail losses.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-06-12 17:36                       ` Nandita Dukkipati
@ 2014-06-12 17:46                         ` Neal Cardwell
  0 siblings, 0 replies; 28+ messages in thread
From: Neal Cardwell @ 2014-06-12 17:46 UTC (permalink / raw)
  To: Nandita Dukkipati
  Cc: Eric Dumazet, Per Hurtig, panweiping3, Netdev,
	Anna Brunström, mohammad.rajiullah, sergei.shtylyov

On Thu, Jun 12, 2014 at 1:36 PM, Nandita Dukkipati <nanditad@google.com> wrote:
> On Thu, Jun 12, 2014 at 8:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2014-06-12 at 17:08 +0200, Per Hurtig wrote:
>>> Fix to a problem observed when losing a FIN segment that does not
>>> contain data.  In such situations, TLP is unable to recover from
>>> *any* tail loss and instead adds at least PTO ms to the
>>> retransmission process, i.e., RTO = RTO + PTO.
>>>
>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>>> ---
>>>  net/ipv4/tcp_output.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index ad7549f..819bf0c 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk)
>>>       if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>>               goto rearm_timer;
>>>
>>> -     /* Probe with zero data doesn't trigger fast recovery. */
>>> -     if (skb->len > 0)
>>> -             err = __tcp_retransmit_skb(sk, skb);
>>> +     err = __tcp_retransmit_skb(sk, skb);
>>>
>>>       /* Record snd_nxt for loss detection. */
>>>       if (likely(!err))
>>
>> Thanks a lot Per
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Acked-by: Nandita Dukkipati <nanditad@google.com>

Thanks, Per!

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-06-12 15:08                   ` [PATCH v2 1/1] " Per Hurtig
  2014-06-12 15:28                     ` Eric Dumazet
@ 2014-06-12 18:06                     ` David Miller
  2014-10-07 15:03                       ` Josh Hunt
  1 sibling, 1 reply; 28+ messages in thread
From: David Miller @ 2014-06-12 18:06 UTC (permalink / raw)
  To: per.hurtig
  Cc: eric.dumazet, panweiping3, nanditad, netdev, anna.brunstrom,
	mohammad.rajiullah, ncardwell, sergei.shtylyov

From: Per Hurtig <per.hurtig@kau.se>
Date: Thu, 12 Jun 2014 17:08:32 +0200

> Fix to a problem observed when losing a FIN segment that does not
> contain data.  In such situations, TLP is unable to recover from
> *any* tail loss and instead adds at least PTO ms to the
> retransmission process, i.e., RTO = RTO + PTO.
> 
> Signed-off-by: Per Hurtig <per.hurtig@kau.se>

Applied, thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-06-12 18:06                     ` David Miller
@ 2014-10-07 15:03                       ` Josh Hunt
  2014-10-07 20:17                         ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Josh Hunt @ 2014-10-07 15:03 UTC (permalink / raw)
  To: David Miller
  Cc: per.hurtig, Eric Dumazet, panweiping3, nanditad, netdev,
	anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov

On Thu, Jun 12, 2014 at 1:06 PM, David Miller <davem@davemloft.net> wrote:
> From: Per Hurtig <per.hurtig@kau.se>
> Date: Thu, 12 Jun 2014 17:08:32 +0200
>
>> Fix to a problem observed when losing a FIN segment that does not
>> contain data.  In such situations, TLP is unable to recover from
>> *any* tail loss and instead adds at least PTO ms to the
>> retransmission process, i.e., RTO = RTO + PTO.
>>
>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>
> Applied, thanks.

Can we queue this up for stable? 2cd0d743b05e87 (tcp: fix
tcp_match_skb_to_sack() for unaligned SACK at end of an skb) is
already in stable and based on the changelog was put in place to fix a
case that this patch introduced:

"This was visible now because the recently simplified TLP logic in
 bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
 skb at the end of the write queue, and now that we do not check that
 skb's length we could send it as a TLP probe."

However, the patch to fix TLP's FIN recovery is not in -stable.

Thanks
-- 
Josh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery
  2014-10-07 15:03                       ` Josh Hunt
@ 2014-10-07 20:17                         ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2014-10-07 20:17 UTC (permalink / raw)
  To: joshhunt00
  Cc: per.hurtig, eric.dumazet, panweiping3, nanditad, netdev,
	anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov

From: Josh Hunt <joshhunt00@gmail.com>
Date: Tue, 7 Oct 2014 10:03:29 -0500

> On Thu, Jun 12, 2014 at 1:06 PM, David Miller <davem@davemloft.net> wrote:
>> From: Per Hurtig <per.hurtig@kau.se>
>> Date: Thu, 12 Jun 2014 17:08:32 +0200
>>
>>> Fix to a problem observed when losing a FIN segment that does not
>>> contain data.  In such situations, TLP is unable to recover from
>>> *any* tail loss and instead adds at least PTO ms to the
>>> retransmission process, i.e., RTO = RTO + PTO.
>>>
>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>>
>> Applied, thanks.
> 
> Can we queue this up for stable? 2cd0d743b05e87 (tcp: fix
> tcp_match_skb_to_sack() for unaligned SACK at end of an skb) is
> already in stable and based on the changelog was put in place to fix a
> case that this patch introduced:
> 
> "This was visible now because the recently simplified TLP logic in
>  bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
>  skb at the end of the write queue, and now that we do not check that
>  skb's length we could send it as a TLP probe."
> 
> However, the patch to fix TLP's FIN recovery is not in -stable.

Queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-10-07 20:17 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-06 18:46 tcp: fixing TLP's FIN recovery Per Hurtig
2014-06-06 19:07 ` Eric Dumazet
2014-06-07 11:10   ` [PATCH] " Per Hurtig
2014-06-07 13:56     ` Sergei Shtylyov
2014-06-07 14:34       ` Per Hurtig
2014-06-08  2:58         ` Eric Dumazet
2014-06-08  7:41           ` Per Hurtig
2014-06-08 16:35             ` Eric Dumazet
2014-06-09  7:04               ` Nandita Dukkipati
2014-06-09  7:02             ` Nandita Dukkipati
2014-06-09 13:13               ` Per Hurtig
2014-06-09 14:33                 ` Eric Dumazet
2014-06-09 14:39                   ` Eric Dumazet
2014-06-09 14:42                   ` Per Hurtig
2014-06-09 15:04                     ` Eric Dumazet
2014-06-09 15:56                       ` Per Hurtig
2014-06-09 16:15                         ` Eric Dumazet
2014-06-09 16:24                           ` Eric Dumazet
2014-06-09 18:33                             ` Eric Dumazet
2014-06-12 14:21               ` Weiping Pan
2014-06-12 14:32                 ` Eric Dumazet
2014-06-12 15:08                   ` [PATCH v2 1/1] " Per Hurtig
2014-06-12 15:28                     ` Eric Dumazet
2014-06-12 17:36                       ` Nandita Dukkipati
2014-06-12 17:46                         ` Neal Cardwell
2014-06-12 18:06                     ` David Miller
2014-10-07 15:03                       ` Josh Hunt
2014-10-07 20:17                         ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.