From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [PATCH net-next] tcp: better retrans tracking for defer-accept Date: Sun, 28 Oct 2012 10:15:13 +0100 Message-ID: <1351415713.30380.398.camel@edumazet-glaptop> References: <1351238750-13611-1-git-send-email-subramanian.vijay@gmail.com> <1351339032.30380.222.camel@edumazet-glaptop> <1351344725.30380.286.camel@edumazet-glaptop> <1351347537.30380.315.camel@edumazet-glaptop> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Cc: David Miller , Vijay Subramanian , netdev@vger.kernel.org, ncardwell@google.com, Venkat Venkatsubra , Elliott Hughes , Yuchung Cheng To: Julian Anastasov Return-path: Received: from mail-ee0-f46.google.com ([74.125.83.46]:57937 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750733Ab2J1JPS (ORCPT ); Sun, 28 Oct 2012 05:15:18 -0400 Received: by mail-ee0-f46.google.com with SMTP id b15so1630096eek.19 for ; Sun, 28 Oct 2012 02:15:17 -0700 (PDT) In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Sun, 2012-10-28 at 01:29 +0300, Julian Anastasov wrote: > Hello, > > On Sat, 27 Oct 2012, Eric Dumazet wrote: > > > From: Eric Dumazet > > > > For passive TCP connections using TCP_DEFER_ACCEPT facility, > > we incorrectly increment req->retrans each time timeout triggers > > while no SYNACK is sent. > > > > SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for wich > > we received the ACK from client). Only the last SYNACK is > > sent so that we can receive again an ACK from client, to move the > > req into accept queue. We plan to change this later to avoid > > the useless retransmit (and potential problem as this SYNACK could be > > lost) > > > > TCP_INFO later gives wrong information to user, claiming imaginary > > retransmits. > > > > Decouple req->retrans field into two independent fields : > > > > num_retrans : number of retransmit > > num_timeout : number of timeouts > > > > num_timeout is the counter that is incremented at each timeout, > > regardless of actual SYNACK being sent or not, and used to > > compute the exponential timeout. > > > > Introduce inet_rtx_syn_ack() helper to increment num_retrans > > only if ->rtx_syn_ack() succeeded. > > > > Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans > > when we re-send a SYNACK in answer to a (retransmitted) SYN. > > Prior to this patch, we were not counting these retransmits. > > > > Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS > > only if a synack packet was successfully queued. > > > > Reported-by: Yuchung Cheng > > Signed-off-by: Eric Dumazet > > Cc: Julian Anastasov > > Cc: Vijay Subramanian > > Cc: Elliott Hughes > > Cc: Neal Cardwell > > --- > > > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c > > index ba48e79..b236ef0 100644 > > --- a/net/ipv4/syncookies.c > > +++ b/net/ipv4/syncookies.c > > @@ -340,7 +340,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, > > } > > > > req->expires = 0UL; > > - req->retrans = 0; > > + req->num_retrans = 0; > > also req->num_timeout = 0; ? This is not needed here. We construct a temporary req only to be able to use get_cookie_sock() and build a full socket with : icsk->icsk_af_ops->syn_recv_sock(sk, skb, req, dst); By definition this req wont ever be used in SYN_RECV state, and never be part of the timer wheel. We init req->num_retrans to 0 because this field will be the source of newtp->total_retrans = req->num_retrans; in tcp_v{4|6}_syn_recv_sock() > > > > > /* > > * We need to lookup the route here to get at the correct > > > diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c > > index 182ab9a..4016197 100644 > > --- a/net/ipv6/syncookies.c > > +++ b/net/ipv6/syncookies.c > > @@ -214,7 +214,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) > > ireq6->iif = inet6_iif(skb); > > > > req->expires = 0UL; > > - req->retrans = 0; > > + req->num_retrans = 0; > > also req->num_timeout = 0; ? > Same thing : this is not needed > > ireq->ecn_ok = ecn_ok; > > ireq->snd_wscale = tcp_opt.snd_wscale; > > ireq->sack_ok = tcp_opt.sack_ok; > > > @@ -1866,7 +1869,7 @@ static void get_openreq6(struct seq_file *seq, > > 0,0, /* could print option size, but that is af dependent. */ > > 1, /* timers active (only the expire timer) */ > > jiffies_to_clock_t(ttd), > > - req->retrans, > > + req->num_timeout, > > num_timeout already noted by Neal > Yes, thanks. source patch was fine, I messed the rebase. > > from_kuid_munged(seq_user_ns(seq), uid), > > 0, /* non standard timer */ > > 0, /* open_requests have no inode */ > > I don't see any problems with such patch, just > check if it is for correct tree, I see some atomic operations > with qlen_young in patch. > Arg, my net-next tree got some stuff I forgot to revert, sorry, will cleanup. > One thing to finally decide: should we use limit for > retransmissions or for timeout, is the following better?: > > if (!rskq_defer_accept) { > *expire = req->num_retrans >= thresh; > ^^^^^^^^^^^ > *resend = 1; > return; > } Not sure it matters and if this decision is part of this patch. If a retransmit fails, it seems we zap the request anyway ? inet_rtx_syn_ack() returns an error and inet_rsk(req)->acked is false -> we remove the req from queue. We dont remove the req only if we got a listen queue overflow in tcp_check_req() : we set acked to 1 in this case. listen_overflow: if (!sysctl_tcp_abort_on_overflow) { inet_rsk(req)->acked = 1; return NULL; } Using number of timeouts seems better to me. There is no point holding a req forever if we fail to retransmit SYNACKS. Client probably gave up. Thanks a lot for the review !