All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: David Miller <davem@davemloft.net>,
	Vijay Subramanian <subramanian.vijay@gmail.com>,
	netdev@vger.kernel.org, ncardwell@google.com,
	Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>,
	Elliott Hughes <enh@google.com>,
	Yuchung Cheng <ycheng@google.com>
Subject: Re: [PATCH net-next] tcp: better retrans tracking for defer-accept
Date: Sun, 28 Oct 2012 10:15:13 +0100	[thread overview]
Message-ID: <1351415713.30380.398.camel@edumazet-glaptop> (raw)
In-Reply-To: <alpine.LFD.2.00.1210272324020.1628@ja.ssi.bg>

On Sun, 2012-10-28 at 01:29 +0300, Julian Anastasov wrote:
> 	Hello,
> 
> On Sat, 27 Oct 2012, Eric Dumazet wrote:
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > For passive TCP connections using TCP_DEFER_ACCEPT facility,
> > we incorrectly increment req->retrans each time timeout triggers
> > while no SYNACK is sent.
> > 
> > SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for wich
> > we received the ACK from client). Only the last SYNACK is
> > sent so that we can receive again an ACK from client, to move the
> > req into accept queue. We plan to change this later to avoid
> > the useless retransmit (and potential problem as this SYNACK could be
> > lost)
> > 
> > TCP_INFO later gives wrong information to user, claiming imaginary
> > retransmits.
> > 
> > Decouple req->retrans field into two independent fields :
> > 
> > num_retrans : number of retransmit
> > num_timeout : number of timeouts
> > 
> > num_timeout is the counter that is incremented at each timeout,
> > regardless of actual SYNACK being sent or not, and used to
> > compute the exponential timeout.
> > 
> > Introduce inet_rtx_syn_ack() helper to increment num_retrans
> > only if ->rtx_syn_ack() succeeded.
> > 
> > Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
> > when we re-send a SYNACK in answer to a (retransmitted) SYN.
> > Prior to this patch, we were not counting these retransmits.
> > 
> > Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
> > only if a synack packet was successfully queued.
> > 
> > Reported-by: Yuchung Cheng <ycheng@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Julian Anastasov <ja@ssi.bg>
> > Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
> > Cc: Elliott Hughes <enh@google.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > ---
> 
> > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> > index ba48e79..b236ef0 100644
> > --- a/net/ipv4/syncookies.c
> > +++ b/net/ipv4/syncookies.c
> > @@ -340,7 +340,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
> >  	}
> >  
> >  	req->expires	= 0UL;
> > -	req->retrans	= 0;
> > +	req->num_retrans = 0;
> 
> 	also req->num_timeout = 0; ?

This is not needed here.

We construct a temporary req only to be able to use get_cookie_sock()
and build a full socket with :
 icsk->icsk_af_ops->syn_recv_sock(sk, skb, req, dst);

By definition this req wont ever be used in SYN_RECV state, and never be
part of the timer wheel.

We init req->num_retrans to 0 because this field will be the source of

newtp->total_retrans = req->num_retrans;

in tcp_v{4|6}_syn_recv_sock()

> 
> >  
> >  	/*
> >  	 * We need to lookup the route here to get at the correct
> 
> > diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
> > index 182ab9a..4016197 100644
> > --- a/net/ipv6/syncookies.c
> > +++ b/net/ipv6/syncookies.c
> > @@ -214,7 +214,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
> >  		ireq6->iif = inet6_iif(skb);
> >  
> >  	req->expires = 0UL;
> > -	req->retrans = 0;
> > +	req->num_retrans = 0;
> 
> 	also req->num_timeout = 0; ?
> 


Same thing : this is not needed

> >  	ireq->ecn_ok		= ecn_ok;
> >  	ireq->snd_wscale	= tcp_opt.snd_wscale;
> >  	ireq->sack_ok		= tcp_opt.sack_ok;
> 
> > @@ -1866,7 +1869,7 @@ static void get_openreq6(struct seq_file *seq,
> >  		   0,0, /* could print option size, but that is af dependent. */
> >  		   1,   /* timers active (only the expire timer) */
> >  		   jiffies_to_clock_t(ttd),
> > -		   req->retrans,
> > +		   req->num_timeout,
> 
> 	num_timeout already noted by Neal
> 

Yes, thanks. source patch was fine, I messed the rebase.

> >  		   from_kuid_munged(seq_user_ns(seq), uid),
> >  		   0,  /* non standard timer */
> >  		   0, /* open_requests have no inode */
> 
> 	I don't see any problems with such patch, just
> check if it is for correct tree, I see some atomic operations
> with qlen_young in patch.
> 

Arg, my net-next tree got some stuff I forgot to revert, sorry, will
cleanup.

> 	One thing to finally decide: should we use limit for
> retransmissions or for timeout, is the following better?:
> 
> 	if (!rskq_defer_accept) {
> 		*expire = req->num_retrans >= thresh;
> 			       ^^^^^^^^^^^
> 		*resend = 1;
> 		return;
> 	}

Not sure it matters and if this decision is part of this patch.

If a retransmit fails, it seems we zap the request anyway ?

inet_rtx_syn_ack() returns an error and inet_rsk(req)->acked is false ->
we remove the req from queue.

We dont remove the req only if we got a listen queue overflow in
tcp_check_req() : we set acked to 1 in this case.

listen_overflow:
	if (!sysctl_tcp_abort_on_overflow) {
		inet_rsk(req)->acked = 1;
		return NULL;
	}

Using number of timeouts seems better to me. There is no point holding a
req forever if we fail to retransmit SYNACKS.
Client probably gave up.

Thanks a lot for the review !

  reply	other threads:[~2012-10-28  9:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-26  8:05 [PATCH net-next V2 1/1] tcp: Prevent needless syn-ack rexmt during TWHS Vijay Subramanian
2012-10-26  8:03 ` Eric Dumazet
2012-10-26 21:30 ` Julian Anastasov
2012-10-26 21:42   ` Eric Dumazet
2012-10-26 22:52     ` Julian Anastasov
2012-10-27  0:07       ` Vijay Subramanian
2012-10-27  8:43         ` Julian Anastasov
2012-10-27  8:50         ` Eric Dumazet
2012-10-27 11:57 ` Eric Dumazet
2012-10-27 13:23   ` Julian Anastasov
2012-10-27 13:32     ` Eric Dumazet
2012-10-27 14:18       ` [PATCH net-next] tcp: better retrans tracking for defer-accept Eric Dumazet
2012-10-27 18:27         ` Neal Cardwell
2012-10-27 22:29         ` Julian Anastasov
2012-10-28  9:15           ` Eric Dumazet [this message]
2012-10-28 16:51             ` Julian Anastasov
2012-10-28 20:02               ` Eric Dumazet
2012-10-29  9:21                 ` Julian Anastasov
2012-11-03 18:46         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1351415713.30380.398.camel@edumazet-glaptop \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=enh@google.com \
    --cc=ja@ssi.bg \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=subramanian.vijay@gmail.com \
    --cc=venkat.x.venkatsubra@oracle.com \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.