* tcp: fixing TLP's FIN recovery
@ 2014-06-06 18:46 Per Hurtig
2014-06-06 19:07 ` Eric Dumazet
0 siblings, 1 reply; 28+ messages in thread
From: Per Hurtig @ 2014-06-06 18:46 UTC (permalink / raw)
To: netdev; +Cc: eric.dumazet, anna.brunstrom, mohammad.rajiullah
>From ab1b16ef8aba4300b1a6e965c3ab7d0cb269bb2a Mon Sep 17 00:00:00 2001
From: Per Hurtig <per.hurtig@kau.se>
Date: Fri, 6 Jun 2014 18:36:19 +0200
Subject: [PATCH 1/1] tcp: fixing TLP's FIN recovery
Fix to a problem observed when losing a FIN segment that does not
contain data. In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.
Signed-off-by: Per Hurtig <per.hurtig@kau.se>
---
net/ipv4/tcp_output.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d463c35..2c29926 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2130,8 +2130,9 @@ void tcp_send_loss_probe(struct sock *sk)
if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
goto rearm_timer;
- /* Probe with zero data doesn't trigger fast recovery.
*/
- if (skb->len > 0)
+ /* Probe with zero data doesn't trigger fast
recovery, unless
+ * FIN flag is set. */
+ if ((skb->len > 0) ||
(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
err = __tcp_retransmit_skb(sk, skb);
/* Record snd_nxt for loss detection. */
--
1.9.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: tcp: fixing TLP's FIN recovery 2014-06-06 18:46 tcp: fixing TLP's FIN recovery Per Hurtig @ 2014-06-06 19:07 ` Eric Dumazet 2014-06-07 11:10 ` [PATCH] " Per Hurtig 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-06 19:07 UTC (permalink / raw) To: Per Hurtig Cc: netdev, anna.brunstrom, mohammad.rajiullah, Neal Cardwell, Nandita Dukkipati On Fri, 2014-06-06 at 20:46 +0200, Per Hurtig wrote: > From ab1b16ef8aba4300b1a6e965c3ab7d0cb269bb2a Mon Sep 17 00:00:00 2001 > From: Per Hurtig <per.hurtig@kau.se> > Date: Fri, 6 Jun 2014 18:36:19 +0200 > Subject: [PATCH 1/1] tcp: fixing TLP's FIN recovery > > Fix to a problem observed when losing a FIN segment that does not > contain data. In such situations, TLP is unable to recover from > *any* tail loss and instead adds at least PTO ms to the > retransmission process, i.e., RTO = RTO + PTO. > > Signed-off-by: Per Hurtig <per.hurtig@kau.se> > --- > net/ipv4/tcp_output.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index d463c35..2c29926 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2130,8 +2130,9 @@ void tcp_send_loss_probe(struct sock *sk) > if (WARN_ON(!skb || !tcp_skb_pcount(skb))) > goto rearm_timer; > > - /* Probe with zero data doesn't trigger fast recovery. > */ > - if (skb->len > 0) > + /* Probe with zero data doesn't trigger fast > recovery, unless > + * FIN flag is set. */ > + if ((skb->len > 0) || > (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) > err = __tcp_retransmit_skb(sk, skb); > > /* Record snd_nxt for loss detection. */ > -- > 1.9.1 > Patch was mangled. Please fix and resend. Please CC Nandita & Neal, Thanks ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] tcp: fixing TLP's FIN recovery 2014-06-06 19:07 ` Eric Dumazet @ 2014-06-07 11:10 ` Per Hurtig 2014-06-07 13:56 ` Sergei Shtylyov 0 siblings, 1 reply; 28+ messages in thread From: Per Hurtig @ 2014-06-07 11:10 UTC (permalink / raw) To: netdev Cc: Per Hurtig, eric.dumazet, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad Fix to a problem observed when losing a FIN segment that does not contain data. In such situations, TLP is unable to recover from *any* tail loss and instead adds at least PTO ms to the retransmission process, i.e., RTO = RTO + PTO. --- net/ipv4/tcp_output.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d463c35..6573765 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) if (WARN_ON(!skb || !tcp_skb_pcount(skb))) goto rearm_timer; - /* Probe with zero data doesn't trigger fast recovery. */ - if (skb->len > 0) + /* Probe with zero data doesn't trigger fast recovery, if not + * FIN flag is set. + */ + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) err = __tcp_retransmit_skb(sk, skb); /* Record snd_nxt for loss detection. */ -- 1.9.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-07 11:10 ` [PATCH] " Per Hurtig @ 2014-06-07 13:56 ` Sergei Shtylyov 2014-06-07 14:34 ` Per Hurtig 0 siblings, 1 reply; 28+ messages in thread From: Sergei Shtylyov @ 2014-06-07 13:56 UTC (permalink / raw) To: Per Hurtig, netdev Cc: eric.dumazet, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad Hello. On 07-06-2014 15:10, Per Hurtig wrote: > Fix to a problem observed when losing a FIN segment that does not > contain data. In such situations, TLP is unable to recover from > *any* tail loss and instead adds at least PTO ms to the > retransmission process, i.e., RTO = RTO + PTO. You should provide your signoff. [...] > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index d463c35..6573765 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) > if (WARN_ON(!skb || !tcp_skb_pcount(skb))) > goto rearm_timer; > > - /* Probe with zero data doesn't trigger fast recovery. */ > - if (skb->len > 0) > + /* Probe with zero data doesn't trigger fast recovery, if not s/not/no/? Or rather "if FIN flag is not set"? > + * FIN flag is set. > + */ > + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) > err = __tcp_retransmit_skb(sk, skb); WBR, Sergei ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] tcp: fixing TLP's FIN recovery 2014-06-07 13:56 ` Sergei Shtylyov @ 2014-06-07 14:34 ` Per Hurtig 2014-06-08 2:58 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Per Hurtig @ 2014-06-07 14:34 UTC (permalink / raw) To: netdev Cc: Per Hurtig, eric.dumazet, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad, sergei.shtylyov Fix to a problem observed when losing a FIN segment that does not contain data. In such situations, TLP is unable to recover from *any* tail loss and instead adds at least PTO ms to the retransmission process, i.e., RTO = RTO + PTO. Signed-off-by: Per Hurtig <per.hurtig@kau.se> --- net/ipv4/tcp_output.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d463c35..6573765 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) if (WARN_ON(!skb || !tcp_skb_pcount(skb))) goto rearm_timer; - /* Probe with zero data doesn't trigger fast recovery. */ - if (skb->len > 0) + /* Probe with zero data doesn't trigger fast recovery, if FIN + * flag is not set. + */ + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) err = __tcp_retransmit_skb(sk, skb); /* Record snd_nxt for loss detection. */ -- 1.9.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-07 14:34 ` Per Hurtig @ 2014-06-08 2:58 ` Eric Dumazet 2014-06-08 7:41 ` Per Hurtig 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-08 2:58 UTC (permalink / raw) To: Per Hurtig Cc: netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad, sergei.shtylyov On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: > Fix to a problem observed when losing a FIN segment that does not > contain data. In such situations, TLP is unable to recover from > *any* tail loss and instead adds at least PTO ms to the > retransmission process, i.e., RTO = RTO + PTO. > > Signed-off-by: Per Hurtig <per.hurtig@kau.se> > --- > net/ipv4/tcp_output.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index d463c35..6573765 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) > if (WARN_ON(!skb || !tcp_skb_pcount(skb))) > goto rearm_timer; > > - /* Probe with zero data doesn't trigger fast recovery. */ > - if (skb->len > 0) > + /* Probe with zero data doesn't trigger fast recovery, if FIN > + * flag is not set. > + */ > + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) > err = __tcp_retransmit_skb(sk, skb); > > /* Record snd_nxt for loss detection. */ You know, I believe the test was exactly to avoid sending data less FIN packets. If you write : if (A || !A) Better remove the condition, completely ;) Nandita, why FIN packet wont trigger fast retransnmits ? It sounds like if the timer is the issue you want to fix, you might simply rearm a timer with RTO-PTO instead of RTO ? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-08 2:58 ` Eric Dumazet @ 2014-06-08 7:41 ` Per Hurtig 2014-06-08 16:35 ` Eric Dumazet 2014-06-09 7:02 ` Nandita Dukkipati 0 siblings, 2 replies; 28+ messages in thread From: Per Hurtig @ 2014-06-08 7:41 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad, sergei.shtylyov On sön 8 jun 2014 04:58:25, Eric Dumazet wrote: > On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: >> Fix to a problem observed when losing a FIN segment that does not >> contain data. In such situations, TLP is unable to recover from >> *any* tail loss and instead adds at least PTO ms to the >> retransmission process, i.e., RTO = RTO + PTO. >> >> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >> --- >> net/ipv4/tcp_output.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >> index d463c35..6573765 100644 >> --- a/net/ipv4/tcp_output.c >> +++ b/net/ipv4/tcp_output.c >> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) >> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >> goto rearm_timer; >> >> - /* Probe with zero data doesn't trigger fast recovery. */ >> - if (skb->len > 0) >> + /* Probe with zero data doesn't trigger fast recovery, if FIN >> + * flag is not set. >> + */ >> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) >> err = __tcp_retransmit_skb(sk, skb); >> >> /* Record snd_nxt for loss detection. */ > > > You know, I believe the test was exactly to avoid sending data less FIN > packets. > > If you write : > > if (A || !A) > > Better remove the condition, completely ;) > Obviously, but I don't think that FINs are the only segments who are targeted by this condition (or targeted at all given the implications of this statement). Furthermore, the comment above the if statement would probably have mentioned FINs explicity and not zero sized segments in general if this were the case. > > Nandita, why FIN packet wont trigger fast retransnmits ? > They do, that's the whole thing with this patch. > It sounds like if the timer is the issue you want to fix, you might > simply rearm a timer with RTO-PTO instead of RTO ? > > No I want to enable TLP for tail loss where an empty FIN is involved, this does not work now. > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-08 7:41 ` Per Hurtig @ 2014-06-08 16:35 ` Eric Dumazet 2014-06-09 7:04 ` Nandita Dukkipati 2014-06-09 7:02 ` Nandita Dukkipati 1 sibling, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-08 16:35 UTC (permalink / raw) To: Per Hurtig Cc: netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, nanditad, sergei.shtylyov On Sun, 2014-06-08 at 09:41 +0200, Per Hurtig wrote: > > On sön 8 jun 2014 04:58:25, Eric Dumazet wrote: > > On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: > >> Fix to a problem observed when losing a FIN segment that does not > >> contain data. In such situations, TLP is unable to recover from > >> *any* tail loss and instead adds at least PTO ms to the > >> retransmission process, i.e., RTO = RTO + PTO. > >> > >> Signed-off-by: Per Hurtig <per.hurtig@kau.se> > >> --- > >> net/ipv4/tcp_output.c | 6 ++++-- > >> 1 file changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > >> index d463c35..6573765 100644 > >> --- a/net/ipv4/tcp_output.c > >> +++ b/net/ipv4/tcp_output.c > >> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) > >> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) > >> goto rearm_timer; > >> > >> - /* Probe with zero data doesn't trigger fast recovery. */ > >> - if (skb->len > 0) > >> + /* Probe with zero data doesn't trigger fast recovery, if FIN > >> + * flag is not set. > >> + */ > >> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) > >> err = __tcp_retransmit_skb(sk, skb); > >> > >> /* Record snd_nxt for loss detection. */ > > > > > > You know, I believe the test was exactly to avoid sending data less FIN > > packets. > > > > If you write : > > > > if (A || !A) > > > > Better remove the condition, completely ;) > > > Obviously, but I don't think that FINs are the only segments > who are targeted by this condition (or targeted at all given > the implications of this statement). Furthermore, the comment above > the if statement would probably have mentioned FINs explicity > and not zero sized segments in general if this were the case. > I see no other possibilities than FIN segments here, or the WARN_ON(! tcp_skb_pcount(skb)) right before would trigger. If we believe it could trigger, then we need to remove the WARN_ON(), because its far more disruptive than waiting a bit more for the RTO. Remember : RTO is conservative. The if (skb->len > 0) only is true for FIN with no data. This was exactly the intent : Not sending FIN at this stage. If pure FIN is OK here, just remove the comment and test, this is so confusing and useless. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-08 16:35 ` Eric Dumazet @ 2014-06-09 7:04 ` Nandita Dukkipati 0 siblings, 0 replies; 28+ messages in thread From: Nandita Dukkipati @ 2014-06-09 7:04 UTC (permalink / raw) To: Eric Dumazet Cc: Per Hurtig, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, sergei.shtylyov On Sun, Jun 8, 2014 at 9:35 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sun, 2014-06-08 at 09:41 +0200, Per Hurtig wrote: >> >> On sön 8 jun 2014 04:58:25, Eric Dumazet wrote: >> > On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: >> >> Fix to a problem observed when losing a FIN segment that does not >> >> contain data. In such situations, TLP is unable to recover from >> >> *any* tail loss and instead adds at least PTO ms to the >> >> retransmission process, i.e., RTO = RTO + PTO. >> >> >> >> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >> >> --- >> >> net/ipv4/tcp_output.c | 6 ++++-- >> >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >> >> index d463c35..6573765 100644 >> >> --- a/net/ipv4/tcp_output.c >> >> +++ b/net/ipv4/tcp_output.c >> >> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) >> >> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >> >> goto rearm_timer; >> >> >> >> - /* Probe with zero data doesn't trigger fast recovery. */ >> >> - if (skb->len > 0) >> >> + /* Probe with zero data doesn't trigger fast recovery, if FIN >> >> + * flag is not set. >> >> + */ >> >> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) >> >> err = __tcp_retransmit_skb(sk, skb); >> >> >> >> /* Record snd_nxt for loss detection. */ >> > >> > >> > You know, I believe the test was exactly to avoid sending data less FIN >> > packets. >> > >> > If you write : >> > >> > if (A || !A) >> > >> > Better remove the condition, completely ;) >> > >> Obviously, but I don't think that FINs are the only segments >> who are targeted by this condition (or targeted at all given >> the implications of this statement). Furthermore, the comment above >> the if statement would probably have mentioned FINs explicity >> and not zero sized segments in general if this were the case. >> > > > I see no other possibilities than FIN segments here, or the WARN_ON(! > tcp_skb_pcount(skb)) right before would trigger. > > If we believe it could trigger, then we need to remove the WARN_ON(), > because its far more disruptive than waiting a bit more for the RTO. > Remember : RTO is conservative. > > The if (skb->len > 0) only is true for FIN with no data. > > This was exactly the intent : Not sending FIN at this stage. > > If pure FIN is OK here, just remove the comment and test, this is so > confusing and useless. I agree - if pure FIN can indeed trigger recovery, let's just remove the test and comment. Nandita ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-08 7:41 ` Per Hurtig 2014-06-08 16:35 ` Eric Dumazet @ 2014-06-09 7:02 ` Nandita Dukkipati 2014-06-09 13:13 ` Per Hurtig 2014-06-12 14:21 ` Weiping Pan 1 sibling, 2 replies; 28+ messages in thread From: Nandita Dukkipati @ 2014-06-09 7:02 UTC (permalink / raw) To: Per Hurtig Cc: Eric Dumazet, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, sergei.shtylyov On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote: > > > On sön 8 jun 2014 04:58:25, Eric Dumazet wrote: >> >> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: >>> >>> Fix to a problem observed when losing a FIN segment that does not >>> contain data. In such situations, TLP is unable to recover from >>> *any* tail loss and instead adds at least PTO ms to the >>> retransmission process, i.e., RTO = RTO + PTO. >>> >>> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >>> --- >>> net/ipv4/tcp_output.c | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >>> index d463c35..6573765 100644 >>> --- a/net/ipv4/tcp_output.c >>> +++ b/net/ipv4/tcp_output.c >>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) >>> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >>> goto rearm_timer; >>> >>> - /* Probe with zero data doesn't trigger fast recovery. */ >>> - if (skb->len > 0) >>> + /* Probe with zero data doesn't trigger fast recovery, if FIN >>> + * flag is not set. >>> + */ >>> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) >>> err = __tcp_retransmit_skb(sk, skb); >>> >>> /* Record snd_nxt for loss detection. */ >> >> >> >> You know, I believe the test was exactly to avoid sending data less FIN >> packets. >> >> If you write : >> >> if (A || !A) >> >> Better remove the condition, completely ;) >> > Obviously, but I don't think that FINs are the only segments > who are targeted by this condition (or targeted at all given > the implications of this statement). Furthermore, the comment above > the if statement would probably have mentioned FINs explicity > and not zero sized segments in general if this were the case. > > > >> >> Nandita, why FIN packet wont trigger fast retransnmits ? >> > > They do, that's the whole thing with this patch. > > >> It sounds like if the timer is the issue you want to fix, you might >> simply rearm a timer with RTO-PTO instead of RTO ? >> >> > No I want to enable TLP for tail loss where an empty FIN is involved, > this does not work now. I understand the tail loss case you want to solve - essentially when a tail loss occurs that involves data segments as well as that of an empty FIN. However, have you verified that re-sending an empty FIN triggers fast recovery? I would be surprised if it did, because I think the sender needs to receive a SACK of at least 1-byte of data before sender can trigger FACK based fast recovery. If you have verified that a pure FIN does indeed trigger recovery, can you tell me what part of the code makes that happen? Nandita ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 7:02 ` Nandita Dukkipati @ 2014-06-09 13:13 ` Per Hurtig 2014-06-09 14:33 ` Eric Dumazet 2014-06-12 14:21 ` Weiping Pan 1 sibling, 1 reply; 28+ messages in thread From: Per Hurtig @ 2014-06-09 13:13 UTC (permalink / raw) To: Nandita Dukkipati Cc: Eric Dumazet, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov See inline, On 2014-06-09 09:02, Nandita Dukkipati wrote: > On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote: >> >> >> On sön 8 jun 2014 04:58:25, Eric Dumazet wrote: >>> >>> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: >>>> >>>> Fix to a problem observed when losing a FIN segment that does not >>>> contain data. In such situations, TLP is unable to recover from >>>> *any* tail loss and instead adds at least PTO ms to the >>>> retransmission process, i.e., RTO = RTO + PTO. >>>> >>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >>>> --- >>>> net/ipv4/tcp_output.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >>>> index d463c35..6573765 100644 >>>> --- a/net/ipv4/tcp_output.c >>>> +++ b/net/ipv4/tcp_output.c >>>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) >>>> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >>>> goto rearm_timer; >>>> >>>> - /* Probe with zero data doesn't trigger fast recovery. */ >>>> - if (skb->len > 0) >>>> + /* Probe with zero data doesn't trigger fast recovery, if FIN >>>> + * flag is not set. >>>> + */ >>>> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) >>>> err = __tcp_retransmit_skb(sk, skb); >>>> >>>> /* Record snd_nxt for loss detection. */ >>> >>> >>> >>> You know, I believe the test was exactly to avoid sending data less FIN >>> packets. >>> >>> If you write : >>> >>> if (A || !A) >>> >>> Better remove the condition, completely ;) >>> After looking more closely, I see that we only enter this part when the FIN flag is set on an otherwise empty segment. I guess I was distracted by the comment that suggested a more general scenario than the actual one, a bit confusing ;) >> Obviously, but I don't think that FINs are the only segments >> who are targeted by this condition (or targeted at all given >> the implications of this statement). Furthermore, the comment above >> the if statement would probably have mentioned FINs explicity >> and not zero sized segments in general if this were the case. >> >> >> >>> >>> Nandita, why FIN packet wont trigger fast retransnmits ? >>> >> >> They do, that's the whole thing with this patch. >> >> >>> It sounds like if the timer is the issue you want to fix, you might >>> simply rearm a timer with RTO-PTO instead of RTO ? >>> >>> >> No I want to enable TLP for tail loss where an empty FIN is involved, >> this does not work now. > > I understand the tail loss case you want to solve - essentially when a > tail loss occurs that involves data segments as well as that of an > empty FIN. However, have you verified that re-sending an empty FIN > triggers fast recovery? I would be surprised if it did, because I > think the sender needs to receive a SACK of at least 1-byte of data > before sender can trigger FACK based fast recovery. > Yes, it needs a SACK that covers one "sequence number", which a FIN does. I don't see why it shouldn't generate a SACK? See below for some packet dumps. > If you have verified that a pure FIN does indeed trigger recovery, can > you tell me what part of the code makes that happen? > Scenario: Eleven segments are sent back-to-back (ten data and one empty FIN), the last three segments (the FIN + two others) are dropped. Other relevant info: RTT of 20ms. Transfer time of the entire flow (from the receiver's point of view): TCP w TLP: 324ms TCP w modified TLP: 122ms Detailed TLP behavior: The entire transfer including retransmissions takes approx 324ms. The retransmissions are conducted in frames 23 and 25. Sender-side packet trace: 1 0.000000 10.0.1.1 -> 10.0.2.1 TCP 74 36713 > search-agent [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1171292524 TSecr=0 WS=128 2 0.000020 10.0.2.1 -> 10.0.1.1 TCP 74 search-agent > 36713 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1171296150 TSecr=1171292524 WS=128 3 0.019818 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1171292529 TSecr=1171296150 4 0.019854 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=1 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 5 0.019864 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=1449 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 6 0.019868 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=2897 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 7 0.019871 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=4345 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 8 0.019875 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=5793 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 9 0.019878 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=7241 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 10 0.019881 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=8689 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 11 0.019922 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=10137 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 12 0.019929 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 36713 [PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 13 0.019930 10.0.2.1 -> 10.0.1.1 TCP 1513 search-agent > 36713 [PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1447 TSval=1171296155 TSecr=1171292529[Packet size limited during capture] 14 0.019971 10.0.2.1 -> 10.0.1.1 TCP 66 search-agent > 36713 [FIN, ACK] Seq=14480 Ack=1 Win=29056 Len=0 TSval=1171296155 TSecr=1171292529 15 0.039635 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=1449 Win=32128 Len=0 TSval=1171292534 TSecr=1171296155 16 0.039643 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=2897 Win=35072 Len=0 TSval=1171292534 TSecr=1171296155 17 0.039646 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=4345 Win=37888 Len=0 TSval=1171292534 TSecr=1171296155 18 0.039650 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=5793 Win=40832 Len=0 TSval=1171292534 TSecr=1171296155 19 0.039653 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=7241 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155 20 0.039655 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=8689 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155 21 0.039657 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=10137 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155 22 0.039660 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=11585 Win=43520 Len=0 TSval=1171292534 TSecr=1171296155 23 0.283780 10.0.2.1 -> 10.0.1.1 TCP 1514 [TCP Retransmission] search-agent > 36713 [PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 TSval=1171296221 TSecr=1171292534[Packet size limited during capture] 24 0.303267 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [ACK] Seq=1 Ack=13033 Win=43520 Len=0 TSval=1171292600 TSecr=1171296221 25 0.303276 10.0.2.1 -> 10.0.1.1 TCP 1513 [TCP Retransmission] search-agent > 36713 [FIN, PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1447 TSval=1171296225 TSecr=1171292600[Packet size limited during capture] 26 0.324085 10.0.1.1 -> 10.0.2.1 TCP 66 36713 > search-agent [FIN, ACK] Seq=1 Ack=14481 Win=43520 Len=0 TSval=1171292605 TSecr=1171296225 27 0.324093 10.0.2.1 -> 10.0.1.1 TCP 66 search-agent > 36713 [ACK] Seq=14481 Ack=2 Win=29056 Len=0 TSval=1171296231 TSecr=1171292605 ---- Modified TLP behavior: The entire transfer including retransmissions takes approx 122ms. The TLP probe is sent in frame 23 below, and you can see in frame 24 below that a SACK covering one sequence number is returned from the receiver and used to trigger retransmissions of the other lost segments. Sender-side packet trace: 1 0.000000 10.0.1.1 -> 10.0.2.1 TCP 74 37730 > search-agent [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1194757582 TSecr=0 WS=128 2 0.000021 10.0.2.1 -> 10.0.1.1 TCP 74 search-agent > 37730 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=222654 TSecr=1194757582 WS=128 3 0.020765 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=1194757587 TSecr=222654 4 0.020800 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=1 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 5 0.020810 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=1449 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 6 0.020814 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=2897 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 7 0.020818 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=4345 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 8 0.020821 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=5793 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 9 0.020824 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=7241 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 10 0.020827 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=8689 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 11 0.020870 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=10137 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 12 0.020877 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 13 0.020879 10.0.2.1 -> 10.0.1.1 TCP 1514 search-agent > 37730 [PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1448 TSval=222659 TSecr=1194757587[Packet size limited during capture] 14 0.020918 10.0.2.1 -> 10.0.1.1 TCP 66 search-agent > 37730 [FIN, ACK] Seq=14481 Ack=1 Win=29056 Len=0 TSval=222659 TSecr=1194757587 15 0.040583 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=1449 Win=32128 Len=0 TSval=1194757593 TSecr=222659 16 0.040591 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=2897 Win=35072 Len=0 TSval=1194757593 TSecr=222659 17 0.040594 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=4345 Win=37888 Len=0 TSval=1194757593 TSecr=222659 18 0.040597 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=5793 Win=40832 Len=0 TSval=1194757593 TSecr=222659 19 0.040599 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=7241 Win=43520 Len=0 TSval=1194757593 TSecr=222659 20 0.040601 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=8689 Win=43520 Len=0 TSval=1194757593 TSecr=222659 21 0.040604 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=10137 Win=43520 Len=0 TSval=1194757593 TSecr=222659 22 0.040606 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=11585 Win=43520 Len=0 TSval=1194757593 TSecr=222659 23 0.078751 10.0.2.1 -> 10.0.1.1 TCP 66 [TCP Retransmission] search-agent > 37730 [FIN, ACK] Seq=14481 Ack=1 Win=29056 Len=0 TSval=222674 TSecr=1194757593 24 0.098093 10.0.1.1 -> 10.0.2.1 TCP 78 [TCP Dup ACK 22#1] 37730 > search-agent [ACK] Seq=1 Ack=11585 Win=43520 Len=0 TSval=1194757607 TSecr=222659 SLE=14481 SRE=14482 25 0.102752 10.0.2.1 -> 10.0.1.1 TCP 1514 [TCP Retransmission] search-agent > 37730 [PSH, ACK] Seq=11585 Ack=1 Win=29056 Len=1448 TSval=222680 TSecr=1194757607[Packet size limited during capture] 26 0.102757 10.0.2.1 -> 10.0.1.1 TCP 1514 [TCP Retransmission] search-agent > 37730 [PSH, ACK] Seq=13033 Ack=1 Win=29056 Len=1448 TSval=222680 TSecr=1194757607[Packet size limited during capture] 27 0.121854 10.0.1.1 -> 10.0.2.1 TCP 78 37730 > search-agent [ACK] Seq=1 Ack=13033 Win=43520 Len=0 TSval=1194757613 TSecr=222680 SLE=14481 SRE=14482 28 0.121862 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [ACK] Seq=1 Ack=14482 Win=43520 Len=0 TSval=1194757613 TSecr=222680 29 0.121868 10.0.1.1 -> 10.0.2.1 TCP 66 37730 > search-agent [FIN, ACK] Seq=1 Ack=14482 Win=43520 Len=0 TSval=1194757613 TSecr=222680 30 0.121875 10.0.2.1 -> 10.0.1.1 TCP 66 search-agent > 37730 [ACK] Seq=14482 Ack=2 Win=29056 Len=0 TSval=222684 TSecr=1194757613 ---- Thanks, Per > Nandita > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 13:13 ` Per Hurtig @ 2014-06-09 14:33 ` Eric Dumazet 2014-06-09 14:39 ` Eric Dumazet 2014-06-09 14:42 ` Per Hurtig 0 siblings, 2 replies; 28+ messages in thread From: Eric Dumazet @ 2014-06-09 14:33 UTC (permalink / raw) To: Per Hurtig Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On Mon, 2014-06-09 at 15:13 +0200, Per Hurtig wrote: > > > Yes, it needs a SACK that covers one "sequence number", which a FIN > does. I don't see why it shouldn't generate a SACK? See below for some > packet dumps. I cooked following packetdrill test : $ cat tlp-10pkt-fin.pkt `../common/defaults.sh` // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Send 8 MSS. 0.200 write(4, ..., 8000) = 8000 +.000 > P. 1:8001(8000) ack 1 +.000 close(4) = 0 +.000 > F. 8001:8001(0) ack 1 // Receiver ACKs 7 packets 0.300 < . 1:1(0) ack 7001 win 257 // check if TLP re-sends the FIN 0.500 > F. 8001:8001(0) ack 1 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8001,nop,nop> // check if fast restransmit is correctly triggered. 0.600 > P. 7001:8001(1000) ack 1 # ../packetdrill tlp-10pkt-fin.pkt tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected outbound packet at 0.600000 sec but happened at 1.099761 sec script packet: 0.600000 P. 7001:8001(1000) ack 1 actual packet: 1.099761 P. 7001:8001(1000) ack 1 win 457 So it looks like fast retransmit is not triggered. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 14:33 ` Eric Dumazet @ 2014-06-09 14:39 ` Eric Dumazet 2014-06-09 14:42 ` Per Hurtig 1 sibling, 0 replies; 28+ messages in thread From: Eric Dumazet @ 2014-06-09 14:39 UTC (permalink / raw) To: Per Hurtig Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On Mon, 2014-06-09 at 07:33 -0700, Eric Dumazet wrote: > On Mon, 2014-06-09 at 15:13 +0200, Per Hurtig wrote: > > > > > Yes, it needs a SACK that covers one "sequence number", which a FIN > > does. I don't see why it shouldn't generate a SACK? See below for some > > packet dumps. > > I cooked following packetdrill test : > > $ cat tlp-10pkt-fin.pkt > `../common/defaults.sh` > // Establish a connection. > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > 0.000 bind(3, ..., ...) = 0 > 0.000 listen(3, 1) = 0 > > 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> > 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> > 0.200 < . 1:1(0) ack 1 win 257 > 0.200 accept(3, ..., ...) = 4 > > // Send 8 MSS. > 0.200 write(4, ..., 8000) = 8000 > +.000 > P. 1:8001(8000) ack 1 > +.000 close(4) = 0 > +.000 > F. 8001:8001(0) ack 1 > > // Receiver ACKs 7 packets > 0.300 < . 1:1(0) ack 7001 win 257 > // check if TLP re-sends the FIN > 0.500 > F. 8001:8001(0) ack 1 > 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8001,nop,nop> > // check if fast restransmit is correctly triggered. > 0.600 > P. 7001:8001(1000) ack 1 > > # ../packetdrill tlp-10pkt-fin.pkt > tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected > outbound packet at 0.600000 sec but happened at 1.099761 sec > script packet: 0.600000 P. 7001:8001(1000) ack 1 > actual packet: 1.099761 P. 7001:8001(1000) ack 1 win 457 > > So it looks like fast retransmit is not triggered. > And using instead : # cat tlp-10pkt-fin.pkt // Set up production config. `../common/defaults.sh` // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Send 8 MSS. 0.200 write(4, ..., 8000) = 8000 +.000 > P. 1:8001(8000) ack 1 +.000 close(4) = 0 +.000 > F. 8001:8001(0) ack 1 // Receiver ACKs 7 packets 0.300 < . 1:1(0) ack 7001 win 257 0.500 > F. 8001:8001(0) ack 1 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop> 0.600 > P. 7001:8001(1000) ack 1 # ../packetdrill tlp-10pkt-fin.pkt tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected outbound packet at 0.600000 sec but happened at 0.624721 sec script packet: 0.600000 P. 7001:8001(1000) ack 1 actual packet: 0.624721 P. 7001:8001(1000) ack 1 win 457 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 14:33 ` Eric Dumazet 2014-06-09 14:39 ` Eric Dumazet @ 2014-06-09 14:42 ` Per Hurtig 2014-06-09 15:04 ` Eric Dumazet 1 sibling, 1 reply; 28+ messages in thread From: Per Hurtig @ 2014-06-09 14:42 UTC (permalink / raw) To: Eric Dumazet Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov Tried to run the script, but I don't have the "common/defaults" and the test scripts from the git repository fails on all TCP tests for Linux. The results I listed in the enclosed packet traces are from two real machines communicating with each other (with fresh net-next kernels and TLP without the zero probe check), so I tend to rely more on those results. Cheers, Per On mån 9 jun 2014 16:33:08, Eric Dumazet wrote: > On Mon, 2014-06-09 at 15:13 +0200, Per Hurtig wrote: >>> >> Yes, it needs a SACK that covers one "sequence number", which a FIN >> does. I don't see why it shouldn't generate a SACK? See below for some >> packet dumps. > > I cooked following packetdrill test : > > $ cat tlp-10pkt-fin.pkt > `../common/defaults.sh` > // Establish a connection. > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > 0.000 bind(3, ..., ...) = 0 > 0.000 listen(3, 1) = 0 > > 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> > 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> > 0.200 < . 1:1(0) ack 1 win 257 > 0.200 accept(3, ..., ...) = 4 > > // Send 8 MSS. > 0.200 write(4, ..., 8000) = 8000 > +.000 > P. 1:8001(8000) ack 1 > +.000 close(4) = 0 > +.000 > F. 8001:8001(0) ack 1 > > // Receiver ACKs 7 packets > 0.300 < . 1:1(0) ack 7001 win 257 > // check if TLP re-sends the FIN > 0.500 > F. 8001:8001(0) ack 1 > 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8001,nop,nop> > // check if fast restransmit is correctly triggered. > 0.600 > P. 7001:8001(1000) ack 1 > > # ../packetdrill tlp-10pkt-fin.pkt > tlp-10pkt-fin.pkt:26: error handling packet: timing error: expected > outbound packet at 0.600000 sec but happened at 1.099761 sec > script packet: 0.600000 P. 7001:8001(1000) ack 1 > actual packet: 1.099761 P. 7001:8001(1000) ack 1 win 457 > > So it looks like fast retransmit is not triggered. > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 14:42 ` Per Hurtig @ 2014-06-09 15:04 ` Eric Dumazet 2014-06-09 15:56 ` Per Hurtig 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-09 15:04 UTC (permalink / raw) To: Per Hurtig Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On Mon, 2014-06-09 at 16:42 +0200, Per Hurtig wrote: > Tried to run the script, but I don't have the "common/defaults" and the > test scripts from the git repository fails on all TCP tests for Linux. > The results I listed in the enclosed packet traces are from two real > machines communicating with each other (with fresh net-next kernels and > TLP without the zero probe check), so I tend to rely more on those > results. Do not top post on netdev. We at Google run about 1000 packet drill tests for any functional change in TCP stack. This is the only way we can scale. We are not 'studying by hand' various tcpdumps when a tool can do it properly. Nandita asked you give a pointer to the source code explaining how fast retransmit was done for this specific case, but you provided a tcpdump, which hardly can be reproduced and be the answer to the question. So now, we are trying to have a test to reproduce the issue and check the fix is complete. So far, I am not really convinced. It seems the FIN _is_ retransmitted, but I do not see the SACK for this RTX is properly handled in time. Its one thing checking the FIN is retransmitted, its another to check that the SACK will trigger sensible behavior. If you carefully check your tcpdump, you'll see there is the same problem, and you missed it, while packetdrill exactly pointed it. Thanks ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 15:04 ` Eric Dumazet @ 2014-06-09 15:56 ` Per Hurtig 2014-06-09 16:15 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Per Hurtig @ 2014-06-09 15:56 UTC (permalink / raw) To: Eric Dumazet Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On mån 9 jun 2014 17:04:19, Eric Dumazet wrote: > On Mon, 2014-06-09 at 16:42 +0200, Per Hurtig wrote: >> Tried to run the script, but I don't have the "common/defaults" and the >> test scripts from the git repository fails on all TCP tests for Linux. >> The results I listed in the enclosed packet traces are from two real >> machines communicating with each other (with fresh net-next kernels and >> TLP without the zero probe check), so I tend to rely more on those >> results. > > Do not top post on netdev. > > We at Google run about 1000 packet drill tests for any functional change > in TCP stack. This is the only way we can scale. > > We are not 'studying by hand' various tcpdumps when a tool can do it > properly. > > Nandita asked you give a pointer to the source code explaining how fast > retransmit was done for this specific case, but you provided a tcpdump, > which hardly can be reproduced and be the answer to the question. > > So now, we are trying to have a test to reproduce the issue and check > the fix is complete. > > So far, I am not really convinced. It seems the FIN _is_ retransmitted, > but I do not see the SACK for this RTX is properly handled in time. > > Its one thing checking the FIN is retransmitted, its another to check > that the SACK will trigger sensible behavior. > > If you carefully check your tcpdump, you'll see there is the same > problem, and you missed it, while packetdrill exactly pointed it. > > Thanks > > Ok, I guess you mean that the retransmission was not fast enough? But will the same not happen if the original FIN is not lost and triggers a SACK (i.e., if the two last data segments are still lost)? Cheers, Per ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 15:56 ` Per Hurtig @ 2014-06-09 16:15 ` Eric Dumazet 2014-06-09 16:24 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-09 16:15 UTC (permalink / raw) To: Per Hurtig Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On Mon, 2014-06-09 at 17:56 +0200, Per Hurtig wrote: > > Ok, I guess you mean that the retransmission was not fast enough? But > will the same not happen if the original FIN is not lost and triggers a > SACK (i.e., if the two last data segments are still lost)? Yes, I'd like to understand why Nandita specifically added the original test. In her tests, fast retransmit was not really effective. Running packetdrill in a separate container gives me these interesting counters : # nstat #kernel IpInReceives 4 0.0 IpInDelivers 4 0.0 IpOutRequests 5 0.0 TcpPassiveOpens 1 0.0 TcpInSegs 4 0.0 TcpOutSegs 10 0.0 TcpRetransSegs 2 0.0 TcpExtTCPPureAcks 3 0.0 TcpExtTCPSackRecovery 1 0.0 TcpExtTCPFastRetrans 1 0.0 TcpExtTCPLossProbes 1 0.0 TcpExtTCPSackRecoveryFail 1 0.0 TcpExtTCPSackShiftFallback 1 0.0 TcpExtTCPRetransFail 4 0.0 <<<< HERE >>> TcpExtTCPOrigDataSent 9 0.0 IpExtInOctets 184 0.0 IpExtOutOctets 9212 0.0 IpExtInNoECTPkts 4 0.0 I guess we need to understand why the retransmit is in error. I am investigating. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 16:15 ` Eric Dumazet @ 2014-06-09 16:24 ` Eric Dumazet 2014-06-09 18:33 ` Eric Dumazet 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-09 16:24 UTC (permalink / raw) To: Per Hurtig Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On Mon, 2014-06-09 at 09:15 -0700, Eric Dumazet wrote: > On Mon, 2014-06-09 at 17:56 +0200, Per Hurtig wrote: > > > > > Ok, I guess you mean that the retransmission was not fast enough? But > > will the same not happen if the original FIN is not lost and triggers a > > SACK (i.e., if the two last data segments are still lost)? > > Yes, I'd like to understand why Nandita specifically added the original > test. In her tests, fast retransmit was not really effective. > > Running packetdrill in a separate container gives me these interesting > counters : > > # nstat > #kernel > IpInReceives 4 0.0 > IpInDelivers 4 0.0 > IpOutRequests 5 0.0 > TcpPassiveOpens 1 0.0 > TcpInSegs 4 0.0 > TcpOutSegs 10 0.0 > TcpRetransSegs 2 0.0 > TcpExtTCPPureAcks 3 0.0 > TcpExtTCPSackRecovery 1 0.0 > TcpExtTCPFastRetrans 1 0.0 > TcpExtTCPLossProbes 1 0.0 > TcpExtTCPSackRecoveryFail 1 0.0 > TcpExtTCPSackShiftFallback 1 0.0 > TcpExtTCPRetransFail 4 0.0 <<<< HERE >>> > TcpExtTCPOrigDataSent 9 0.0 > IpExtInOctets 184 0.0 > IpExtOutOctets 9212 0.0 > IpExtInNoECTPkts 4 0.0 > > I guess we need to understand why the retransmit is in error. > > I am investigating. > Hmm... We hit this point... This is embarrassing I guess. if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) return -EHOSTUNREACH; /* Routing failure or similar. */ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 16:24 ` Eric Dumazet @ 2014-06-09 18:33 ` Eric Dumazet 0 siblings, 0 replies; 28+ messages in thread From: Eric Dumazet @ 2014-06-09 18:33 UTC (permalink / raw) To: Per Hurtig Cc: Nandita Dukkipati, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, Sergei Shtylyov On Mon, 2014-06-09 at 09:24 -0700, Eric Dumazet wrote: > Hmm... We hit this point... This is embarrassing I guess. > > if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) > return -EHOSTUNREACH; /* Routing failure or similar. */ False alarm. This was caused of route being dismantled at the end of the test, and a dangling socket tried to retransmit. The 25ms timer I had was caused by early retransmit (25 ms = RTT/4 in my test), because only one packet was missing. Following test runs fine, as 3 packets are missing. // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Send 8 MSS. 0.200 write(4, ..., 8000) = 8000 +.000 > P. 1:8001(8000) ack 1 +.000 close(4) = 0 +.000 > F. 8001:8001(0) ack 1 // Receiver ACKs 5 packets 0.300 < . 1:1(0) ack 5001 win 257 // Check if TLP is triggering, even if last packet is a pure FIN 0.500 > F. 8001:8001(0) ack 1 0.610 < . 1:1(0) ack 5001 win 257 <sack 8001:8002,nop,nop> 0.610 > . 5001:6001(1000) ack 1 0.610 > . 6001:7001(1000) ack 1 0.710 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop> 0.710 > P. 7001:8001(1000) ack 1 0.810 < . 1:1(0) ack 8002 win 257 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-09 7:02 ` Nandita Dukkipati 2014-06-09 13:13 ` Per Hurtig @ 2014-06-12 14:21 ` Weiping Pan 2014-06-12 14:32 ` Eric Dumazet 1 sibling, 1 reply; 28+ messages in thread From: Weiping Pan @ 2014-06-12 14:21 UTC (permalink / raw) To: Nandita Dukkipati, Per Hurtig Cc: Eric Dumazet, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, sergei.shtylyov On 06/09/2014 03:02 PM, Nandita Dukkipati wrote: > On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote: >> >> On sön 8 jun 2014 04:58:25, Eric Dumazet wrote: >>> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote: >>>> Fix to a problem observed when losing a FIN segment that does not >>>> contain data. In such situations, TLP is unable to recover from >>>> *any* tail loss and instead adds at least PTO ms to the >>>> retransmission process, i.e., RTO = RTO + PTO. >>>> >>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >>>> --- >>>> net/ipv4/tcp_output.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >>>> index d463c35..6573765 100644 >>>> --- a/net/ipv4/tcp_output.c >>>> +++ b/net/ipv4/tcp_output.c >>>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk) >>>> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >>>> goto rearm_timer; >>>> >>>> - /* Probe with zero data doesn't trigger fast recovery. */ >>>> - if (skb->len > 0) >>>> + /* Probe with zero data doesn't trigger fast recovery, if FIN >>>> + * flag is not set. >>>> + */ >>>> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) >>>> err = __tcp_retransmit_skb(sk, skb); >>>> >>>> /* Record snd_nxt for loss detection. */ >>> >>> >>> You know, I believe the test was exactly to avoid sending data less FIN >>> packets. >>> >>> If you write : >>> >>> if (A || !A) >>> >>> Better remove the condition, completely ;) >>> >> Obviously, but I don't think that FINs are the only segments >> who are targeted by this condition (or targeted at all given >> the implications of this statement). Furthermore, the comment above >> the if statement would probably have mentioned FINs explicity >> and not zero sized segments in general if this were the case. >> >> >> >>> Nandita, why FIN packet wont trigger fast retransnmits ? >>> >> They do, that's the whole thing with this patch. >> >> >>> It sounds like if the timer is the issue you want to fix, you might >>> simply rearm a timer with RTO-PTO instead of RTO ? >>> >>> >> No I want to enable TLP for tail loss where an empty FIN is involved, >> this does not work now. > I understand the tail loss case you want to solve - essentially when a > tail loss occurs that involves data segments as well as that of an > empty FIN. However, have you verified that re-sending an empty FIN > triggers fast recovery? I would be surprised if it did, because I > think the sender needs to receive a SACK of at least 1-byte of data > before sender can trigger FACK based fast recovery. When we queue an out of order pure FIN packet, we do not check whether it has data or not. tcp_rcv_established -->tcp_data_queue ---->tcp_data_queue_ofo Then the pure FIN packet can generate SACK, which will trigger fast recovery or early retransmit on the sender. > > If you have verified that a pure FIN does indeed trigger recovery, can > you tell me what part of the code makes that happen? Here is the patch I use, I think the original if statement is useless, so I delete it. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 12d6016..4b301e9 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2077,9 +2077,7 @@ void tcp_send_loss_probe(struct sock *sk) if (WARN_ON(!skb || !tcp_skb_pcount(skb))) goto rearm_timer; - /* Probe with zero data doesn't trigger fast recovery. */ - if (skb->len > 0) - err = __tcp_retransmit_skb(sk, skb); + err = __tcp_retransmit_skb(sk, skb); /* Record snd_nxt for loss detection. */ if (likely(!err)) I find that pure FIN can trigger fast recovery or early retransmit, depending on the value of fackets_out. I write two packetdrill scripts, fin_fack.pkt can show that pure FIN can trigger fast recovery, fin_er.pkt can show that pure FIN can trigger early retransmit. # cat fin_fack.pkt // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 // RTT = 100ms 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Send 8 MSS. // tcp_min_tso_segs is 2 0.200 write(4, ..., 8000) = 8000 +.000 > . 1:2001(2000) ack 1 +.000 > . 2001:4001(2000) ack 1 +.000 > . 4001:6001(2000) ack 1 +.000 > P. 6001:8001(2000) ack 1 +.000 close(4) = 0 +.000 > F. 8001:8001(0) ack 1 // Receiver ACKs 4 packets, the fifth to eighth packets are lost 0.300 < . 1:1(0) ack 4001 win 257 // PTO = 2RTT, TLP is triggered 0.500 > F. 8001:8001(0) ack 1 win 229 0.600 < . 1:1(0) ack 4001 win 257 <sack 8001:8002,nop,nop> // got SACK, FACK triggers fast recovery and fast retransmit 0.600 > . 4001:5001(1000) ack 1 win 229 0.600 > . 5001:6001(1000) ack 1 win 229 0.700 < . 1:1(0) ack 6001 win 257 <sack 8001:8002,nop,nop> // fast retransmit 0.700 > . 6001:7001(1000) ack 1 win 229 0.700 > P. 7001:8001(1000) ack 1 win 229 0.700 < . 1:1(0) ack 8002 win 257 // peer close 0.800 < F. 1:1(0) ack 8002 win 229 0.800 > . 8002:8002(0) ack 2 # cat fin_er.pkt // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 // RTT = 100ms 0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Send 8 MSS. // tcp_min_tso_segs is 2 0.200 write(4, ..., 8000) = 8000 +.000 > . 1:2001(2000) ack 1 +.000 > . 2001:4001(2000) ack 1 +.000 > . 4001:6001(2000) ack 1 +.000 > P. 6001:8001(2000) ack 1 +.000 close(4) = 0 +.000 > F. 8001:8001(0) ack 1 // Receiver ACKs 7 packets, the eighth packet is lost 0.300 < . 1:1(0) ack 7001 win 257 // PTO = 2RTT, TLP is triggered 0.500 > F. 8001:8001(0) ack 1 win 229 0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop> // got SACK, trigger early retransmit in RTT/4 0.625 > P. 7001:8001(1000) ack 1 win 229 0.725 < . 1:1(0) ack 8002 win 257 // peer close 0.800 < F. 1:1(0) ack 8002 win 229 0.800 > . 8002:8002(0) ack 2 Test results: before the patch: # ./packetdrill -v fin_fack.pkt inbound injected packet: 0.100009 S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> outbound sniffed packet: 0.100182 S. 3040039935:3040039935(0) ack 1 win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7> inbound injected packet: 0.200005 . 1:1(0) ack 3040039936 win 257 outbound sniffed packet: 0.200108 . 3040039936:3040041936(2000) ack 1 win 229 outbound sniffed packet: 0.200117 . 3040041936:3040043936(2000) ack 1 win 229 outbound sniffed packet: 0.200123 . 3040043936:3040045936(2000) ack 1 win 229 outbound sniffed packet: 0.200130 P. 3040045936:3040047936(2000) ack 1 win 229 outbound sniffed packet: 0.200328 F. 3040047936:3040047936(0) ack 1 win 229 inbound injected packet: 0.300004 . 1:1(0) ack 3040043936 win 257 outbound sniffed packet: 0.800768 . 3040043936:3040044936(1000) ack 1 win 229 fin_fack.pkt:27: error handling packet: live packet field ipv4_total_length: expected: 40 (0x28) vs actual: 1040 (0x410) script packet: 0.500000 F. 8001:8001(0) ack 1 win 229 actual packet: 0.800768 . 4001:5001(1000) ack 1 win 229 # ./packetdrill -v fin_er.pkt inbound injected packet: 0.100009 S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> outbound sniffed packet: 0.100172 S. 1475097861:1475097861(0) ack 1 win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7> inbound injected packet: 0.200005 . 1:1(0) ack 1475097862 win 257 outbound sniffed packet: 0.200113 . 1475097862:1475099862(2000) ack 1 win 229 outbound sniffed packet: 0.200121 . 1475099862:1475101862(2000) ack 1 win 229 outbound sniffed packet: 0.200128 . 1475101862:1475103862(2000) ack 1 win 229 outbound sniffed packet: 0.200134 P. 1475103862:1475105862(2000) ack 1 win 229 outbound sniffed packet: 0.200305 F. 1475105862:1475105862(0) ack 1 win 229 inbound injected packet: 0.300004 . 1:1(0) ack 1475104862 win 257 outbound sniffed packet: 0.800764 P. 1475104862:1475105862(1000) ack 1 win 229 fin_er.pkt:27: error handling packet: live packet field ipv4_total_length: expected: 40 (0x28) vs actual: 1040 (0x410) script packet: 0.500000 F. 8001:8001(0) ack 1 win 229 actual packet: 0.800764 P. 7001:8001(1000) ack 1 win 229 after the patch: # ./packetdrill -v fin_fack.pkt inbound injected packet: 0.100026 S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> outbound sniffed packet: 0.100198 S. 3395593992:3395593992(0) ack 1 win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7> inbound injected packet: 0.200013 . 1:1(0) ack 3395593993 win 257 outbound sniffed packet: 0.200115 . 3395593993:3395595993(2000) ack 1 win 229 outbound sniffed packet: 0.200131 . 3395595993:3395597993(2000) ack 1 win 229 outbound sniffed packet: 0.200138 . 3395597993:3395599993(2000) ack 1 win 229 outbound sniffed packet: 0.200145 P. 3395599993:3395601993(2000) ack 1 win 229 outbound sniffed packet: 0.200345 F. 3395601993:3395601993(0) ack 1 win 229 inbound injected packet: 0.300016 . 1:1(0) ack 3395597993 win 257 outbound sniffed packet: 0.499792 F. 3395601993:3395601993(0) ack 1 win 229 inbound injected packet: 0.600024 . 1:1(0) ack 3395597993 win 257 <sack 3395601993:3395601994,nop,nop> outbound sniffed packet: 0.600074 . 3395597993:3395598993(1000) ack 1 win 229 outbound sniffed packet: 0.600080 . 3395598993:3395599993(1000) ack 1 win 229 inbound injected packet: 0.700016 . 1:1(0) ack 3395599993 win 257 <sack 3395601993:3395601994,nop,nop> outbound sniffed packet: 0.700062 . 3395599993:3395600993(1000) ack 1 win 229 outbound sniffed packet: 0.700066 P. 3395600993:3395601993(1000) ack 1 win 229 inbound injected packet: 0.700164 . 1:1(0) ack 3395601994 win 257 inbound injected packet: 0.800016 F. 1:1(0) ack 3395601994 win 229 outbound sniffed packet: 0.800062 . 3395601994:3395601994(0) ack 2 win 229 # ./packetdrill -v fin_er.pkt inbound injected packet: 0.100009 S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> outbound sniffed packet: 0.100180 S. 3074568182:3074568182(0) ack 1 win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7> inbound injected packet: 0.200005 . 1:1(0) ack 3074568183 win 257 outbound sniffed packet: 0.200106 . 3074568183:3074570183(2000) ack 1 win 229 outbound sniffed packet: 0.200115 . 3074570183:3074572183(2000) ack 1 win 229 outbound sniffed packet: 0.200122 . 3074572183:3074574183(2000) ack 1 win 229 outbound sniffed packet: 0.200128 P. 3074574183:3074576183(2000) ack 1 win 229 outbound sniffed packet: 0.200326 F. 3074576183:3074576183(0) ack 1 win 229 inbound injected packet: 0.300003 . 1:1(0) ack 3074575183 win 257 outbound sniffed packet: 0.499765 F. 3074576183:3074576183(0) ack 1 win 229 inbound injected packet: 0.600006 . 1:1(0) ack 3074575183 win 257 <sack 3074576183:3074576184,nop,nop> outbound sniffed packet: 0.624764 P. 3074575183:3074576183(1000) ack 1 win 229 inbound injected packet: 0.725004 . 1:1(0) ack 3074576184 win 257 inbound injected packet: 0.800003 F. 1:1(0) ack 3074576184 win 229 outbound sniffed packet: 0.800047 . 3074576184:3074576184(0) ack 2 win 229 thanks Weiping Pan > > Nandita > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] tcp: fixing TLP's FIN recovery 2014-06-12 14:21 ` Weiping Pan @ 2014-06-12 14:32 ` Eric Dumazet 2014-06-12 15:08 ` [PATCH v2 1/1] " Per Hurtig 0 siblings, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-12 14:32 UTC (permalink / raw) To: Weiping Pan Cc: Nandita Dukkipati, Per Hurtig, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, sergei.shtylyov On Thu, 2014-06-12 at 22:21 +0800, Weiping Pan wrote: > When we queue an out of order pure FIN packet, we do not check whether > it has data or not. > tcp_rcv_established > -->tcp_data_queue > ---->tcp_data_queue_ofo > > Then the pure FIN packet can generate SACK, which will trigger fast > recovery or early retransmit on the sender. > > > > If you have verified that a pure FIN does indeed trigger recovery, can > > you tell me what part of the code makes that happen? > Here is the patch I use, I think the original if statement is useless, > so I delete it. Yes, this is exactly what we agreed, and what we tested as well here at Google. Per, can you submit an updated official version of the patch ? Thanks ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-06-12 14:32 ` Eric Dumazet @ 2014-06-12 15:08 ` Per Hurtig 2014-06-12 15:28 ` Eric Dumazet 2014-06-12 18:06 ` David Miller 0 siblings, 2 replies; 28+ messages in thread From: Per Hurtig @ 2014-06-12 15:08 UTC (permalink / raw) To: eric.dumazet Cc: Per Hurtig, panweiping3, nanditad, netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov Fix to a problem observed when losing a FIN segment that does not contain data. In such situations, TLP is unable to recover from *any* tail loss and instead adds at least PTO ms to the retransmission process, i.e., RTO = RTO + PTO. Signed-off-by: Per Hurtig <per.hurtig@kau.se> --- net/ipv4/tcp_output.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ad7549f..819bf0c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk) if (WARN_ON(!skb || !tcp_skb_pcount(skb))) goto rearm_timer; - /* Probe with zero data doesn't trigger fast recovery. */ - if (skb->len > 0) - err = __tcp_retransmit_skb(sk, skb); + err = __tcp_retransmit_skb(sk, skb); /* Record snd_nxt for loss detection. */ if (likely(!err)) -- 1.9.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-06-12 15:08 ` [PATCH v2 1/1] " Per Hurtig @ 2014-06-12 15:28 ` Eric Dumazet 2014-06-12 17:36 ` Nandita Dukkipati 2014-06-12 18:06 ` David Miller 1 sibling, 1 reply; 28+ messages in thread From: Eric Dumazet @ 2014-06-12 15:28 UTC (permalink / raw) To: Per Hurtig Cc: panweiping3, nanditad, netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov On Thu, 2014-06-12 at 17:08 +0200, Per Hurtig wrote: > Fix to a problem observed when losing a FIN segment that does not > contain data. In such situations, TLP is unable to recover from > *any* tail loss and instead adds at least PTO ms to the > retransmission process, i.e., RTO = RTO + PTO. > > Signed-off-by: Per Hurtig <per.hurtig@kau.se> > --- > net/ipv4/tcp_output.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index ad7549f..819bf0c 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk) > if (WARN_ON(!skb || !tcp_skb_pcount(skb))) > goto rearm_timer; > > - /* Probe with zero data doesn't trigger fast recovery. */ > - if (skb->len > 0) > - err = __tcp_retransmit_skb(sk, skb); > + err = __tcp_retransmit_skb(sk, skb); > > /* Record snd_nxt for loss detection. */ > if (likely(!err)) Thanks a lot Per Signed-off-by: Eric Dumazet <edumazet@google.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-06-12 15:28 ` Eric Dumazet @ 2014-06-12 17:36 ` Nandita Dukkipati 2014-06-12 17:46 ` Neal Cardwell 0 siblings, 1 reply; 28+ messages in thread From: Nandita Dukkipati @ 2014-06-12 17:36 UTC (permalink / raw) To: Eric Dumazet Cc: Per Hurtig, panweiping3, Netdev, Anna Brunström, mohammad.rajiullah, Neal Cardwell, sergei.shtylyov On Thu, Jun 12, 2014 at 8:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2014-06-12 at 17:08 +0200, Per Hurtig wrote: >> Fix to a problem observed when losing a FIN segment that does not >> contain data. In such situations, TLP is unable to recover from >> *any* tail loss and instead adds at least PTO ms to the >> retransmission process, i.e., RTO = RTO + PTO. >> >> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >> --- >> net/ipv4/tcp_output.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >> index ad7549f..819bf0c 100644 >> --- a/net/ipv4/tcp_output.c >> +++ b/net/ipv4/tcp_output.c >> @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk) >> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >> goto rearm_timer; >> >> - /* Probe with zero data doesn't trigger fast recovery. */ >> - if (skb->len > 0) >> - err = __tcp_retransmit_skb(sk, skb); >> + err = __tcp_retransmit_skb(sk, skb); >> >> /* Record snd_nxt for loss detection. */ >> if (likely(!err)) > > Thanks a lot Per > > Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Thanks Per and Eric. This is nice addition that makes TLP more useful for tail losses. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-06-12 17:36 ` Nandita Dukkipati @ 2014-06-12 17:46 ` Neal Cardwell 0 siblings, 0 replies; 28+ messages in thread From: Neal Cardwell @ 2014-06-12 17:46 UTC (permalink / raw) To: Nandita Dukkipati Cc: Eric Dumazet, Per Hurtig, panweiping3, Netdev, Anna Brunström, mohammad.rajiullah, sergei.shtylyov On Thu, Jun 12, 2014 at 1:36 PM, Nandita Dukkipati <nanditad@google.com> wrote: > On Thu, Jun 12, 2014 at 8:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> On Thu, 2014-06-12 at 17:08 +0200, Per Hurtig wrote: >>> Fix to a problem observed when losing a FIN segment that does not >>> contain data. In such situations, TLP is unable to recover from >>> *any* tail loss and instead adds at least PTO ms to the >>> retransmission process, i.e., RTO = RTO + PTO. >>> >>> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >>> --- >>> net/ipv4/tcp_output.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >>> index ad7549f..819bf0c 100644 >>> --- a/net/ipv4/tcp_output.c >>> +++ b/net/ipv4/tcp_output.c >>> @@ -2131,9 +2131,7 @@ void tcp_send_loss_probe(struct sock *sk) >>> if (WARN_ON(!skb || !tcp_skb_pcount(skb))) >>> goto rearm_timer; >>> >>> - /* Probe with zero data doesn't trigger fast recovery. */ >>> - if (skb->len > 0) >>> - err = __tcp_retransmit_skb(sk, skb); >>> + err = __tcp_retransmit_skb(sk, skb); >>> >>> /* Record snd_nxt for loss detection. */ >>> if (likely(!err)) >> >> Thanks a lot Per >> >> Signed-off-by: Eric Dumazet <edumazet@google.com> > > Acked-by: Nandita Dukkipati <nanditad@google.com> Thanks, Per! Acked-by: Neal Cardwell <ncardwell@google.com> neal ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-06-12 15:08 ` [PATCH v2 1/1] " Per Hurtig 2014-06-12 15:28 ` Eric Dumazet @ 2014-06-12 18:06 ` David Miller 2014-10-07 15:03 ` Josh Hunt 1 sibling, 1 reply; 28+ messages in thread From: David Miller @ 2014-06-12 18:06 UTC (permalink / raw) To: per.hurtig Cc: eric.dumazet, panweiping3, nanditad, netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov From: Per Hurtig <per.hurtig@kau.se> Date: Thu, 12 Jun 2014 17:08:32 +0200 > Fix to a problem observed when losing a FIN segment that does not > contain data. In such situations, TLP is unable to recover from > *any* tail loss and instead adds at least PTO ms to the > retransmission process, i.e., RTO = RTO + PTO. > > Signed-off-by: Per Hurtig <per.hurtig@kau.se> Applied, thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-06-12 18:06 ` David Miller @ 2014-10-07 15:03 ` Josh Hunt 2014-10-07 20:17 ` David Miller 0 siblings, 1 reply; 28+ messages in thread From: Josh Hunt @ 2014-10-07 15:03 UTC (permalink / raw) To: David Miller Cc: per.hurtig, Eric Dumazet, panweiping3, nanditad, netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov On Thu, Jun 12, 2014 at 1:06 PM, David Miller <davem@davemloft.net> wrote: > From: Per Hurtig <per.hurtig@kau.se> > Date: Thu, 12 Jun 2014 17:08:32 +0200 > >> Fix to a problem observed when losing a FIN segment that does not >> contain data. In such situations, TLP is unable to recover from >> *any* tail loss and instead adds at least PTO ms to the >> retransmission process, i.e., RTO = RTO + PTO. >> >> Signed-off-by: Per Hurtig <per.hurtig@kau.se> > > Applied, thanks. Can we queue this up for stable? 2cd0d743b05e87 (tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb) is already in stable and based on the changelog was put in place to fix a case that this patch introduced: "This was visible now because the recently simplified TLP logic in bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe." However, the patch to fix TLP's FIN recovery is not in -stable. Thanks -- Josh ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/1] tcp: fixing TLP's FIN recovery 2014-10-07 15:03 ` Josh Hunt @ 2014-10-07 20:17 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2014-10-07 20:17 UTC (permalink / raw) To: joshhunt00 Cc: per.hurtig, eric.dumazet, panweiping3, nanditad, netdev, anna.brunstrom, mohammad.rajiullah, ncardwell, sergei.shtylyov From: Josh Hunt <joshhunt00@gmail.com> Date: Tue, 7 Oct 2014 10:03:29 -0500 > On Thu, Jun 12, 2014 at 1:06 PM, David Miller <davem@davemloft.net> wrote: >> From: Per Hurtig <per.hurtig@kau.se> >> Date: Thu, 12 Jun 2014 17:08:32 +0200 >> >>> Fix to a problem observed when losing a FIN segment that does not >>> contain data. In such situations, TLP is unable to recover from >>> *any* tail loss and instead adds at least PTO ms to the >>> retransmission process, i.e., RTO = RTO + PTO. >>> >>> Signed-off-by: Per Hurtig <per.hurtig@kau.se> >> >> Applied, thanks. > > Can we queue this up for stable? 2cd0d743b05e87 (tcp: fix > tcp_match_skb_to_sack() for unaligned SACK at end of an skb) is > already in stable and based on the changelog was put in place to fix a > case that this patch introduced: > > "This was visible now because the recently simplified TLP logic in > bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte > skb at the end of the write queue, and now that we do not check that > skb's length we could send it as a TLP probe." > > However, the patch to fix TLP's FIN recovery is not in -stable. Queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2014-10-07 20:17 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-06-06 18:46 tcp: fixing TLP's FIN recovery Per Hurtig 2014-06-06 19:07 ` Eric Dumazet 2014-06-07 11:10 ` [PATCH] " Per Hurtig 2014-06-07 13:56 ` Sergei Shtylyov 2014-06-07 14:34 ` Per Hurtig 2014-06-08 2:58 ` Eric Dumazet 2014-06-08 7:41 ` Per Hurtig 2014-06-08 16:35 ` Eric Dumazet 2014-06-09 7:04 ` Nandita Dukkipati 2014-06-09 7:02 ` Nandita Dukkipati 2014-06-09 13:13 ` Per Hurtig 2014-06-09 14:33 ` Eric Dumazet 2014-06-09 14:39 ` Eric Dumazet 2014-06-09 14:42 ` Per Hurtig 2014-06-09 15:04 ` Eric Dumazet 2014-06-09 15:56 ` Per Hurtig 2014-06-09 16:15 ` Eric Dumazet 2014-06-09 16:24 ` Eric Dumazet 2014-06-09 18:33 ` Eric Dumazet 2014-06-12 14:21 ` Weiping Pan 2014-06-12 14:32 ` Eric Dumazet 2014-06-12 15:08 ` [PATCH v2 1/1] " Per Hurtig 2014-06-12 15:28 ` Eric Dumazet 2014-06-12 17:36 ` Nandita Dukkipati 2014-06-12 17:46 ` Neal Cardwell 2014-06-12 18:06 ` David Miller 2014-10-07 15:03 ` Josh Hunt 2014-10-07 20:17 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.