* [PATCH] tcp: Expose the initial RTO via a new sysctl.
@ 2011-05-17  7:40 Benoit Sigoure
  2011-05-17  7:40 ` Benoit Sigoure
  0 siblings, 1 reply; 53+ messages in thread
From: Benoit Sigoure @ 2011-05-17  7:40 UTC (permalink / raw)
  To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber; +Cc: netdev, linux-kernel

Hi,
It's not easy to change the initial RTO of TCP, as right now you need to
recompile your kernel. In order to make it easier to tune this setting, I
was wondering whether you would consider turning it into a sysctl. I
attached a first attempt at a patch that does this -- this is my first
patch to the Linux kernel, so although I've read SubmitChecklist and
SubmittingPatches, and I've run checkpatch.pl, please let me know if I'm
doing something wrong.

I am doing this because I work in a high-throughput low-latency
environment (line-rate GbE with submillisecond RTT) and some of our
clients are negatively affected by the high initial RTO when the servers
are unable to accept() new connections fast enough. While we're working
on fixing these servers and/or giving them larger backlog queues when
they listen(), being able to tune the initial RTO at runtime would be
very useful as a quick workaround for the server-side issues.

Some large Internet websites are also running with a more aggressive
initial RTO; for instance, Google documented some of what they're doing
here: http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf

While I'm not arguing to change the default value at this time, I believe
that this patch would also come in handy for those who wish to experiment
with various values in their environment.

If you're willing to consider this patch, bear in mind I only compiled
it; I didn't test it yet (not knowing whether you'd want something like
that or not). I would also appreciate it if anyone had any insight on
what I did with `COUNTER_TRIES' in `syncookies.c', as this magic constant
is rather mysterious and the comment didn't help me figure out how it had
been derived. I couldn't find anything online and git blame didn't help
me either (it pre-dates Git).

^ permalink raw reply	[flat|nested] 53+ messages in thread
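For readers wondering why a 3s initial RTO hurts in the scenario described above: when a SYN is dropped, the client waits one initial RTO before retransmitting, and the wait roughly doubles on each further attempt. A minimal userspace sketch of that arithmetic follows. It is not kernel code; the helper name, the doubling-with-cap model, and the 120000 ms cap used in the usage note are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): cumulative time, in
 * milliseconds, from the first SYN until the n-th retransmission,
 * assuming the timeout starts at initial_rto_ms and doubles after every
 * retransmission, capped at rto_max_ms.
 */
static unsigned long syn_retrans_time_ms(unsigned int n,
                                         unsigned long initial_rto_ms,
                                         unsigned long rto_max_ms)
{
    unsigned long rto = initial_rto_ms;
    unsigned long elapsed = 0;
    unsigned int i;

    assert(initial_rto_ms > 0 && rto_max_ms >= initial_rto_ms);

    for (i = 0; i < n; i++) {
        elapsed += rto;          /* wait out the current timeout */
        rto *= 2;                /* exponential backoff */
        if (rto > rto_max_ms)
            rto = rto_max_ms;    /* never exceed the cap */
    }
    return elapsed;
}
```

Under these assumptions, syn_retrans_time_ms(1, 3000, 120000) is 3000, so a client whose first SYN is dropped stalls for a full three seconds on a network whose RTT is below a millisecond; with a 1000 ms initial value the same call gives 1000.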
* [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-17 7:40 [PATCH] tcp: Expose the initial RTO via a new sysctl Benoit Sigoure @ 2011-05-17 7:40 ` Benoit Sigoure 2011-05-17 8:01 ` Alexander Zimmermann 2011-05-17 8:07 ` Eric Dumazet 0 siblings, 2 replies; 53+ messages in thread From: Benoit Sigoure @ 2011-05-17 7:40 UTC (permalink / raw) To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber Cc: netdev, linux-kernel, Benoit Sigoure Instead of hardcoding the initial RTO to 3s and requiring the kernel to be recompiled to change it, expose it as a sysctl that can be tuned at runtime. Leave the default value unchanged. Signed-off-by: Benoit Sigoure <tsunanet@gmail.com> --- Documentation/networking/ip-sysctl.txt | 6 ++++++ include/linux/sysctl.h | 1 + include/net/tcp.h | 3 ++- kernel/sysctl_binary.c | 1 + net/ipv4/syncookies.c | 2 +- net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++ net/ipv4/tcp.c | 4 ++-- net/ipv4/tcp_input.c | 8 ++++---- net/ipv4/tcp_ipv4.c | 6 +++--- net/ipv4/tcp_minisocks.c | 6 +++--- net/ipv4/tcp_output.c | 2 +- net/ipv4/tcp_timer.c | 9 +++++---- net/ipv6/syncookies.c | 2 +- net/ipv6/tcp_ipv6.c | 6 +++--- 14 files changed, 44 insertions(+), 23 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d3d653a..c381c68 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -384,6 +384,12 @@ tcp_retries2 - INTEGER RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8. +tcp_initial_rto - INTEGER + This value sets the initial retransmit timeout, that is how long + the kernel will wait before retransmitting the initial SYN packet. + + RFC 1122 says that this SHOULD be 3 seconds, which is the default. + tcp_rfc1337 - BOOLEAN If set, the TCP stack behaves conforming to RFC1337. 
If unset, we are not conforming to RFC, but prevent TCP TIME_WAIT diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 11684d9..96a9b41 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -425,6 +425,7 @@ enum NET_TCP_ALLOWED_CONG_CONTROL=123, NET_TCP_MAX_SSTHRESH=124, NET_TCP_FRTO_RESPONSE=125, + NET_IPV4_TCP_INITIAL_RTO=126, }; enum { diff --git a/include/net/tcp.h b/include/net/tcp.h index cda30ea..a2bb0f1 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -213,6 +213,7 @@ extern int sysctl_tcp_syn_retries; extern int sysctl_tcp_synack_retries; extern int sysctl_tcp_retries1; extern int sysctl_tcp_retries2; +extern int sysctl_tcp_initial_rto; extern int sysctl_tcp_orphan_retries; extern int sysctl_tcp_syncookies; extern int sysctl_tcp_retrans_collapse; @@ -295,7 +296,7 @@ static inline void tcp_synq_overflow(struct sock *sk) static inline int tcp_synq_no_recent_overflow(const struct sock *sk) { unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; - return time_after(jiffies, last_overflow + TCP_TIMEOUT_INIT); + return time_after(jiffies, last_overflow + sysctl_tcp_initial_rto); } extern struct proto tcp_prot; diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c index 3b8e028..d608d84 100644 --- a/kernel/sysctl_binary.c +++ b/kernel/sysctl_binary.c @@ -354,6 +354,7 @@ static const struct bin_table bin_net_ipv4_table[] = { { CTL_INT, NET_IPV4_TCP_KEEPALIVE_INTVL, "tcp_keepalive_intvl" }, { CTL_INT, NET_IPV4_TCP_RETRIES1, "tcp_retries1" }, { CTL_INT, NET_IPV4_TCP_RETRIES2, "tcp_retries2" }, + { CTL_INT, NET_IPV4_TCP_INITIAL_RTO, "tcp_initial_rto" }, { CTL_INT, NET_IPV4_TCP_FIN_TIMEOUT, "tcp_fin_timeout" }, { CTL_INT, NET_TCP_SYNCOOKIES, "tcp_syncookies" }, { CTL_INT, NET_TCP_TW_RECYCLE, "tcp_tw_recycle" }, diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 8b44c6d..089bc92 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -186,7 +186,7 @@ __u32 cookie_v4_init_sequence(struct 
sock *sk, struct sk_buff *skb, __u16 *mssp) * sysctl_tcp_retries1. It's a rather complicated formula (exponential * backoff) to compute at runtime so it's currently hardcoded here. */ -#define COUNTER_TRIES 4 +#define COUNTER_TRIES (sysctl_tcp_initial_rto + 1) /* * Check if a ack sequence number is a valid syncookie. * Return the decoded mss if it is, or 0 if not. diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 321e6e8..24dc21d 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -30,6 +30,8 @@ static int tcp_adv_win_scale_min = -31; static int tcp_adv_win_scale_max = 31; static int ip_ttl_min = 1; static int ip_ttl_max = 255; +static int tcp_initial_rto_min = TCP_RTO_MIN; +static int tcp_initial_rto_max = TCP_RTO_MAX; /* Update system visible IP port range */ static void set_local_port_range(int range[2]) @@ -246,6 +248,15 @@ static struct ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = proc_dointvec }, + { + .procname = "tcp_initial_rto", + .data = &sysctl_tcp_initial_rto, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &tcp_initial_rto_min, + .extra2 = &tcp_initial_rto_max, + }, { .procname = "tcp_fin_timeout", .data = &sysctl_tcp_fin_timeout, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index b22d450..e9e7c3f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2352,7 +2352,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, case TCP_DEFER_ACCEPT: /* Translate value in seconds to number of retransmits */ icsk->icsk_accept_queue.rskq_defer_accept = - secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ, + secs_to_retrans(val, sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ); break; @@ -2539,7 +2539,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level, break; case TCP_DEFER_ACCEPT: val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept, - TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ); + sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ); break; case 
TCP_WINDOW_CLAMP: val = tp->window_clamp; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index bef9f04..39f6c27 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -890,7 +890,7 @@ static void tcp_init_metrics(struct sock *sk) if (dst_metric(dst, RTAX_RTT) == 0) goto reset; - if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3)) + if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (sysctl_tcp_initial_rto << 3)) goto reset; /* Initial rtt is determined from SYN,SYN-ACK. @@ -916,7 +916,7 @@ static void tcp_init_metrics(struct sock *sk) tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk)); } tcp_set_rto(sk); - if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) { + if (inet_csk(sk)->icsk_rto < sysctl_tcp_initial_rto && !tp->rx_opt.saw_tstamp) { reset: /* Play conservative. If timestamps are not * supported, TCP will fail to recalculate correct @@ -924,8 +924,8 @@ reset: */ if (!tp->rx_opt.saw_tstamp && tp->srtt) { tp->srtt = 0; - tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT; - inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT; + tp->mdev = tp->mdev_max = tp->rttvar = sysctl_tcp_initial_rto; + inet_csk(sk)->icsk_rto = sysctl_tcp_initial_rto; } } tp->snd_cwnd = tcp_init_cwnd(tp, dst); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f7e6c2c..21920e6 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1383,7 +1383,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) want_cookie) goto drop_and_free; - inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT); + inet_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto); return 0; drop_and_release: @@ -1834,8 +1834,8 @@ static int tcp_v4_init_sock(struct sock *sk) tcp_init_xmit_timers(sk); tcp_prequeue_init(tp); - icsk->icsk_rto = TCP_TIMEOUT_INIT; - tp->mdev = TCP_TIMEOUT_INIT; + icsk->icsk_rto = sysctl_tcp_initial_rto; + tp->mdev = sysctl_tcp_initial_rto; /* So many TCP implementations out there (incorrectly) 
count the * initial SYN frame in their delayed-ACK and congestion control diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 80b1f80..c63ffa0 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -472,8 +472,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, tcp_init_wl(newtp, treq->rcv_isn); newtp->srtt = 0; - newtp->mdev = TCP_TIMEOUT_INIT; - newicsk->icsk_rto = TCP_TIMEOUT_INIT; + newtp->mdev = sysctl_tcp_initial_rto; + newicsk->icsk_rto = sysctl_tcp_initial_rto; newtp->packets_out = 0; newtp->retrans_out = 0; @@ -582,7 +582,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, * it can be estimated (approximately) * from another data. */ - tmp_opt.ts_recent_stamp = get_seconds() - ((TCP_TIMEOUT_INIT/HZ)<<req->retrans); + tmp_opt.ts_recent_stamp = get_seconds() - ((sysctl_tcp_initial_rto/HZ)<<req->retrans); paws_reject = tcp_paws_reject(&tmp_opt, th->rst); } } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 17388c7..e34b0f6 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2599,7 +2599,7 @@ static void tcp_connect_init(struct sock *sk) tp->rcv_wup = 0; tp->copied_seq = 0; - inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT; + inet_csk(sk)->icsk_rto = sysctl_tcp_initial_rto; inet_csk(sk)->icsk_retransmits = 0; tcp_clear_retrans(tp); } diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index ecd44b0..b9da62b 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -29,6 +29,7 @@ int sysctl_tcp_keepalive_probes __read_mostly = TCP_KEEPALIVE_PROBES; int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL; int sysctl_tcp_retries1 __read_mostly = TCP_RETR1; int sysctl_tcp_retries2 __read_mostly = TCP_RETR2; +int sysctl_tcp_initial_rto __read_mostly = TCP_TIMEOUT_INIT; int sysctl_tcp_orphan_retries __read_mostly; int sysctl_tcp_thin_linear_timeouts __read_mostly; @@ -135,8 +136,8 @@ static void tcp_mtu_probing(struct 
inet_connection_sock *icsk, struct sock *sk) /* This function calculates a "timeout" which is equivalent to the timeout of a * TCP connection after "boundary" unsuccessful, exponentially backed-off - * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if - * syn_set flag is set. + * retransmissions with an initial RTO of TCP_RTO_MIN or + * sysctl_tcp_initial_rto if syn_set flag is set. */ static bool retransmits_timed_out(struct sock *sk, unsigned int boundary, @@ -144,7 +145,7 @@ static bool retransmits_timed_out(struct sock *sk, bool syn_set) { unsigned int linear_backoff_thresh, start_ts; - unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN; + unsigned int rto_base = syn_set ? sysctl_tcp_initial_rto : TCP_RTO_MIN; if (!inet_csk(sk)->icsk_retransmits) return false; @@ -495,7 +496,7 @@ out_unlock: static void tcp_synack_timer(struct sock *sk) { inet_csk_reqsk_queue_prune(sk, TCP_SYNQ_INTERVAL, - TCP_TIMEOUT_INIT, TCP_RTO_MAX); + sysctl_tcp_initial_rto, TCP_RTO_MAX); } void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req) diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index 352c260..50baaec 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -45,7 +45,7 @@ static __u16 const msstab[] = { * sysctl_tcp_retries1. It's a rather complicated formula (exponential * backoff) to compute at runtime so it's currently hardcoded here. 
*/ -#define COUNTER_TRIES 4 +#define COUNTER_TRIES (sysctl_tcp_initial_rto + 1) static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 4f49e5d..7e791e6 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1349,7 +1349,7 @@ have_isn: want_cookie) goto drop_and_free; - inet6_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT); + inet6_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto); return 0; drop_and_release: @@ -1957,8 +1957,8 @@ static int tcp_v6_init_sock(struct sock *sk) tcp_init_xmit_timers(sk); tcp_prequeue_init(tp); - icsk->icsk_rto = TCP_TIMEOUT_INIT; - tp->mdev = TCP_TIMEOUT_INIT; + icsk->icsk_rto = sysctl_tcp_initial_rto; + tp->mdev = sysctl_tcp_initial_rto; /* So many TCP implementations out there (incorrectly) count the * initial SYN frame in their delayed-ACK and congestion control -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-17  7:40 ` Benoit Sigoure
@ 2011-05-17  8:01   ` Alexander Zimmermann
  2011-05-17  8:34     ` Eric Dumazet
  2011-05-17  8:07   ` Eric Dumazet
  1 sibling, 1 reply; 53+ messages in thread
From: Alexander Zimmermann @ 2011-05-17  8:01 UTC (permalink / raw)
  To: Benoit Sigoure
  Cc: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

Hi Benoit,

On 17.05.2011 at 09:40, Benoit Sigoure wrote:

> Instead of hardcoding the initial RTO to 3s and requiring
> the kernel to be recompiled to change it, expose it as a
> sysctl that can be tuned at runtime. Leave the default
> value unchanged.
>

Regardless of whether netdev accepts this patch or not, the
upcoming initRTO is 1s. See
http://tools.ietf.org/id/draft-paxson-tcpm-rfc2988bis-02.txt

The draft is IESG approved and will become an RFC soon.

Alex

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-17  8:01   ` Alexander Zimmermann
@ 2011-05-17  8:34     ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2011-05-17  8:34 UTC (permalink / raw)
  To: Alexander Zimmermann
  Cc: Benoit Sigoure, davem, kuznet, pekkas, jmorris, yoshfuji, kaber,
	netdev, linux-kernel

On Tuesday, 17 May 2011 at 10:01 +0200, Alexander Zimmermann wrote:
>
> Regardless of whether netdev accepts this patch or not, the
> upcoming initRTO is 1s. See
> http://tools.ietf.org/id/draft-paxson-tcpm-rfc2988bis-02.txt
>
> The draft is IESG approved and will become an RFC soon.

Thanks Alex for this link / information.

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-17 7:40 ` Benoit Sigoure 2011-05-17 8:01 ` Alexander Zimmermann @ 2011-05-17 8:07 ` Eric Dumazet 2011-05-17 11:02 ` Hagen Paul Pfeifer 2011-05-18 10:43 ` Benoit Sigoure 1 sibling, 2 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-17 8:07 UTC (permalink / raw) To: Benoit Sigoure Cc: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel Le mardi 17 mai 2011 à 00:40 -0700, Benoit Sigoure a écrit : > Instead of hardcoding the initial RTO to 3s and requiring > the kernel to be recompiled to change it, expose it as a > sysctl that can be tuned at runtime. Leave the default > value unchanged. > I wont discuss if introducing a new sysctl is welcomed, only on patch issues. I believe some work in IETF is done to reduce the 3sec value to 1sec anyway. > Signed-off-by: Benoit Sigoure <tsunanet@gmail.com> > --- > Documentation/networking/ip-sysctl.txt | 6 ++++++ > include/linux/sysctl.h | 1 + > include/net/tcp.h | 3 ++- > kernel/sysctl_binary.c | 1 + > net/ipv4/syncookies.c | 2 +- > net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++ > net/ipv4/tcp.c | 4 ++-- > net/ipv4/tcp_input.c | 8 ++++---- > net/ipv4/tcp_ipv4.c | 6 +++--- > net/ipv4/tcp_minisocks.c | 6 +++--- > net/ipv4/tcp_output.c | 2 +- > net/ipv4/tcp_timer.c | 9 +++++---- > net/ipv6/syncookies.c | 2 +- > net/ipv6/tcp_ipv6.c | 6 +++--- > 14 files changed, 44 insertions(+), 23 deletions(-) > > diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt > index d3d653a..c381c68 100644 > --- a/Documentation/networking/ip-sysctl.txt > +++ b/Documentation/networking/ip-sysctl.txt > @@ -384,6 +384,12 @@ tcp_retries2 - INTEGER > RFC 1122 recommends at least 100 seconds for the timeout, > which corresponds to a value of at least 8. > > +tcp_initial_rto - INTEGER > + This value sets the initial retransmit timeout, that is how long > + the kernel will wait before retransmitting the initial SYN packet. 
> + > + RFC 1122 says that this SHOULD be 3 seconds, which is the default. > + units ? seconds ? ms ? jiffies ? I suggest using ms as external interface. > tcp_rfc1337 - BOOLEAN > If set, the TCP stack behaves conforming to RFC1337. If unset, > we are not conforming to RFC, but prevent TCP TIME_WAIT > diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h > index 11684d9..96a9b41 100644 > --- a/include/linux/sysctl.h > +++ b/include/linux/sysctl.h > @@ -425,6 +425,7 @@ enum > NET_TCP_ALLOWED_CONG_CONTROL=123, > NET_TCP_MAX_SSTHRESH=124, > NET_TCP_FRTO_RESPONSE=125, > + NET_IPV4_TCP_INITIAL_RTO=126, We dont add new values here anymore, only anonymous ones. > }; > > enum { > diff --git a/include/net/tcp.h b/include/net/tcp.h > index cda30ea..a2bb0f1 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -213,6 +213,7 @@ extern int sysctl_tcp_syn_retries; > extern int sysctl_tcp_synack_retries; > extern int sysctl_tcp_retries1; > extern int sysctl_tcp_retries2; > +extern int sysctl_tcp_initial_rto; > extern int sysctl_tcp_orphan_retries; > extern int sysctl_tcp_syncookies; > extern int sysctl_tcp_retrans_collapse; > @@ -295,7 +296,7 @@ static inline void tcp_synq_overflow(struct sock *sk) > static inline int tcp_synq_no_recent_overflow(const struct sock *sk) > { > unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; > - return time_after(jiffies, last_overflow + TCP_TIMEOUT_INIT); > + return time_after(jiffies, last_overflow + sysctl_tcp_initial_rto); > } > > extern struct proto tcp_prot; > diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c > index 3b8e028..d608d84 100644 > --- a/kernel/sysctl_binary.c > +++ b/kernel/sysctl_binary.c > @@ -354,6 +354,7 @@ static const struct bin_table bin_net_ipv4_table[] = { > { CTL_INT, NET_IPV4_TCP_KEEPALIVE_INTVL, "tcp_keepalive_intvl" }, > { CTL_INT, NET_IPV4_TCP_RETRIES1, "tcp_retries1" }, > { CTL_INT, NET_IPV4_TCP_RETRIES2, "tcp_retries2" }, > + { CTL_INT, NET_IPV4_TCP_INITIAL_RTO, 
"tcp_initial_rto" }, no need here. sysctl() is deprecated. > { CTL_INT, NET_IPV4_TCP_FIN_TIMEOUT, "tcp_fin_timeout" }, > { CTL_INT, NET_TCP_SYNCOOKIES, "tcp_syncookies" }, > { CTL_INT, NET_TCP_TW_RECYCLE, "tcp_tw_recycle" }, > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c > index 8b44c6d..089bc92 100644 > --- a/net/ipv4/syncookies.c > +++ b/net/ipv4/syncookies.c > @@ -186,7 +186,7 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp) > * sysctl_tcp_retries1. It's a rather complicated formula (exponential > * backoff) to compute at runtime so it's currently hardcoded here. > */ > -#define COUNTER_TRIES 4 > +#define COUNTER_TRIES (sysctl_tcp_initial_rto + 1) Are you sure of this ? If HZ=1000, sysctl_tcp_initial_rto is 3000 COUNTER_TRIES goes from 4 to 3004 > /* > * Check if a ack sequence number is a valid syncookie. > * Return the decoded mss if it is, or 0 if not. > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c > index 321e6e8..24dc21d 100644 > --- a/net/ipv4/sysctl_net_ipv4.c > +++ b/net/ipv4/sysctl_net_ipv4.c > @@ -30,6 +30,8 @@ static int tcp_adv_win_scale_min = -31; > static int tcp_adv_win_scale_max = 31; > static int ip_ttl_min = 1; > static int ip_ttl_max = 255; > +static int tcp_initial_rto_min = TCP_RTO_MIN; warning its jiffies units here. > +static int tcp_initial_rto_max = TCP_RTO_MAX; > > /* Update system visible IP port range */ > static void set_local_port_range(int range[2]) > @@ -246,6 +248,15 @@ static struct ctl_table ipv4_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec > }, > + { > + .procname = "tcp_initial_rto", > + .data = &sysctl_tcp_initial_rto, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, so unit is jiffies ? Really its not a good thing. Use ms instead. Consider proc_dointvec_ms_jiffies(), here. 
> + .extra1 = &tcp_initial_rto_min, > + .extra2 = &tcp_initial_rto_max, > + }, > { > .procname = "tcp_fin_timeout", > .data = &sysctl_tcp_fin_timeout, > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index b22d450..e9e7c3f 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -2352,7 +2352,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, > case TCP_DEFER_ACCEPT: > /* Translate value in seconds to number of retransmits */ > icsk->icsk_accept_queue.rskq_defer_accept = > - secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ, > + secs_to_retrans(val, sysctl_tcp_initial_rto / HZ, Here you assume sysctl_tcp_initial_rto is expressed in jiffies ? Oh well... > TCP_RTO_MAX / HZ); > break; > > @@ -2539,7 +2539,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level, > break; > case TCP_DEFER_ACCEPT: > val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept, > - TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ); > + sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ); > break; > case TCP_WINDOW_CLAMP: > val = tp->window_clamp; > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index bef9f04..39f6c27 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -890,7 +890,7 @@ static void tcp_init_metrics(struct sock *sk) > if (dst_metric(dst, RTAX_RTT) == 0) > goto reset; > > - if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3)) > + if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (sysctl_tcp_initial_rto << 3)) Here you assume jiffies unit again. I wonder how this was tested :( Please fix this and chose a definitive unit. ^ permalink raw reply [flat|nested] 53+ messages in thread
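The unit problem raised in this review can be made concrete with a small userspace sketch. It assumes, as the review implies, that v1 stores the sysctl value in jiffies, so the stored default depends on the tick rate. The helper names below are hypothetical stand-ins rather than kernel functions, and the tick rate is passed as a parameter instead of being the kernel's compile-time HZ.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a milliseconds-to-jiffies conversion.
 * hz is the tick rate (a compile-time constant in a real kernel).
 * Rounds up so that a non-zero timeout never becomes zero ticks.
 */
static long ms_to_jiffies(long ms, long hz)
{
    assert(hz > 0 && ms >= 0);
    return (ms * hz + 999) / 1000;
}

/*
 * v1 of the patch defines COUNTER_TRIES as (sysctl_tcp_initial_rto + 1),
 * where the sysctl variable holds a jiffies value.
 */
static long counter_tries_v1(long initial_rto_jiffies)
{
    return initial_rto_jiffies + 1;
}
```

With these stand-ins, counter_tries_v1(ms_to_jiffies(3000, hz)) grows with hz: the same 3s default yields a different count at a tick rate of 100 than at 1000, and in both cases it is far from the hardcoded 4 it replaces. That is the sense in which the value is in the wrong unit for this use.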
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-17  8:07   ` Eric Dumazet
@ 2011-05-17 11:02     ` Hagen Paul Pfeifer
  2011-05-18 10:43     ` Benoit Sigoure
  1 sibling, 0 replies; 53+ messages in thread
From: Hagen Paul Pfeifer @ 2011-05-17 11:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Benoit Sigoure, davem, kuznet, pekkas, jmorris, yoshfuji, kaber,
	netdev, linux-kernel

On Tue, 17 May 2011 10:07:57 +0200, Eric Dumazet wrote:

> I won't discuss whether introducing a new sysctl is welcome, only patch
> issues. I believe some work in IETF is done to reduce the 3sec value to
> 1sec anyway.

Why not? I thought all new knobs in this area should be done as a
per-route metric so they can be controlled on a per-path basis. RTO should
be adjustable on a per-path basis, because it depends on the path.

Some months back [1] I posted a patch to enable/disable TCP quick ack
mode, which has nothing to do with network paths, just with a local server
policy. But David rejected the patch with the argument that I should use a
per-path knob (this is a little bit incomprehensible to me, but David has
the last word).

Hagen

[1] http://kerneltrap.org/mailarchive/linux-netdev/2010/8/23/6283640

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-17 11:02     ` Hagen Paul Pfeifer
  (?)
@ 2011-05-17 12:20     ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2011-05-17 12:20 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Benoit Sigoure, davem, kuznet, pekkas, jmorris, yoshfuji, kaber,
	netdev, linux-kernel

On Tuesday, 17 May 2011 at 13:02 +0200, Hagen Paul Pfeifer wrote:
> On Tue, 17 May 2011 10:07:57 +0200, Eric Dumazet wrote:
>
> > I won't discuss whether introducing a new sysctl is welcome, only patch
> > issues. I believe some work in IETF is done to reduce the 3sec value to
> > 1sec anyway.
>
> Why not?

Simply because I leave this point to David and others. I personally don't
care that much.

> I thought all new knobs in this area should be done as a per-route
> metric so they can be controlled on a per-path basis. RTO should be
> adjustable on a per-path basis, because it depends on the path.
>

Adding many knobs to each clone had a huge cost on previous kernels: some
machines have millions of entries in the IP route cache, so this used
quite a lot of memory.

With David's latest work, we'll consume less RAM, because we can now share
settings instead of copying them to each dst entry.

> Some months back [1] I posted a patch to enable/disable TCP quick ack
> mode, which has nothing to do with network paths, just with a local server
> policy. But David rejected the patch with the argument that I should use a
> per-path knob (this is a little bit incomprehensible to me, but David has
> the last word).

Well, if nobody speaks after David, he has the last word indeed.

BTW, I remember Stephen actually asked for the per-route thing, not David.

http://kerneltrap.org/mailarchive/linux-netdev/2010/8/23/6283641

Then David also stated it:

http://kerneltrap.org/mailarchive/linux-netdev/2010/8/23/6283678

If you really want the tcp_quickack thing, you really should do it as
requested by both Stephen & David ;)

Unfortunately, I don't know if it's really needed or worthwhile.

^ permalink raw reply	[flat|nested] 53+ messages in thread
* [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-17  8:07   ` Eric Dumazet
  2011-05-17 11:02     ` Hagen Paul Pfeifer
@ 2011-05-18 10:43     ` Benoit Sigoure
  2011-05-18 19:26       ` David Miller
  1 sibling, 1 reply; 53+ messages in thread
From: Benoit Sigoure @ 2011-05-18 10:43 UTC (permalink / raw)
  To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber
  Cc: netdev, linux-kernel, Benoit Sigoure

Instead of hardcoding the initial RTO to 3s and requiring
the kernel to be recompiled to change it, expose it as a
sysctl that can be tuned at runtime. Leave the default
value unchanged.

Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
---

v2 of the patch to address Eric's comments. Of course I had to forget to
convert things back and forth between jiffies and ms -- /me n00b.

Code compiles. It seems like no one is opposed to this change, but if one
of you guys could express explicit interest in merging this change, I'd be
happy to spend a bit more time to test it.

The new sysctl is exposed in milliseconds, but internally the value
remains in jiffies to avoid having to convert back and forth between
jiffies and ms in most places.

I'm glad to hear that the default value will be tuned down to 1s. This
change will help people play with this value and easily revert it at
runtime if they find they preferred the current value.

Thank you for your time.

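One consequence of the "milliseconds outside, jiffies inside" scheme described above may be worth checking when testing v2: several call sites divide the stored value by HZ to get whole seconds, and integer division truncates. The userspace sketch below models only that arithmetic. The function names are hypothetical, the tick rate is a parameter rather than the kernel's HZ, and whether a truncated value actually causes trouble in the kernel call sites is not established here.

```c
#include <assert.h>

/*
 * Hypothetical model of the v2 storage scheme: the administrator writes
 * milliseconds, the variable holds jiffies (rounded up).
 */
static long store_initial_rto(long ms, long hz)
{
    assert(hz > 0 && ms >= 0);
    return (ms * hz + 999) / 1000;
}

/*
 * Whole seconds as produced by expressions of the form
 * sysctl_tcp_initial_rto / HZ. Integer division drops the fraction.
 */
static long initial_rto_secs(long rto_jiffies, long hz)
{
    assert(hz > 0);
    return rto_jiffies / hz;
}

/* v2 defines COUNTER_TRIES as (sysctl_tcp_initial_rto/HZ + 1). */
static long counter_tries_v2(long rto_jiffies, long hz)
{
    return initial_rto_secs(rto_jiffies, hz) + 1;
}
```

In this model the 3000 ms default reproduces the old constant, since counter_tries_v2(store_initial_rto(3000, 1000), 1000) is 4, while any value below one second (for example 500 ms) yields zero whole seconds and a count of 1.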
 Documentation/networking/ip-sysctl.txt |    8 ++++++++
 include/net/tcp.h                      |    3 ++-
 net/ipv4/syncookies.c                  |    2 +-
 net/ipv4/sysctl_net_ipv4.c             |   11 +++++++++++
 net/ipv4/tcp.c                         |    4 ++--
 net/ipv4/tcp_input.c                   |    8 ++++----
 net/ipv4/tcp_ipv4.c                    |    6 +++---
 net/ipv4/tcp_minisocks.c               |    6 +++---
 net/ipv4/tcp_output.c                  |    2 +-
 net/ipv4/tcp_timer.c                   |    9 +++++----
 net/ipv6/syncookies.c                  |    2 +-
 net/ipv6/tcp_ipv6.c                    |    6 +++---
 12 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d3d653a..7f3c7d2 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -384,6 +384,14 @@ tcp_retries2 - INTEGER
 	RFC 1122 recommends at least 100 seconds for the timeout,
 	which corresponds to a value of at least 8.
 
+tcp_initial_rto - INTEGER
+	This value sets the initial retransmit timeout (in milliseconds),
+	that is how long the kernel will wait before retransmitting the
+	initial SYN packet.
+
+	RFC 1122 says that this SHOULD be 3000 milliseconds, which is the
+	default.
+
 tcp_rfc1337 - BOOLEAN
 	If set, the TCP stack behaves conforming to RFC1337. If unset,
 	we are not conforming to RFC, but prevent TCP TIME_WAIT
diff --git a/include/net/tcp.h b/include/net/tcp.h
index cda30ea..d6d7dea 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -213,6 +213,7 @@ extern int sysctl_tcp_syn_retries;
 extern int sysctl_tcp_synack_retries;
 extern int sysctl_tcp_retries1;
 extern int sysctl_tcp_retries2;
+extern int sysctl_tcp_initial_rto; /* in jiffies */
 extern int sysctl_tcp_orphan_retries;
 extern int sysctl_tcp_syncookies;
 extern int sysctl_tcp_retrans_collapse;
@@ -295,7 +296,7 @@ static inline void tcp_synq_overflow(struct sock *sk)
 static inline int tcp_synq_no_recent_overflow(const struct sock *sk)
 {
 	unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
-	return time_after(jiffies, last_overflow + TCP_TIMEOUT_INIT);
+	return time_after(jiffies, last_overflow + sysctl_tcp_initial_rto);
 }
 
 extern struct proto tcp_prot;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 8b44c6d..b035968 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -186,7 +186,7 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
  * sysctl_tcp_retries1. It's a rather complicated formula (exponential
  * backoff) to compute at runtime so it's currently hardcoded here.
  */
-#define COUNTER_TRIES 4
+#define COUNTER_TRIES (sysctl_tcp_initial_rto/HZ + 1)
 /*
  * Check if a ack sequence number is a valid syncookie.
  * Return the decoded mss if it is, or 0 if not.
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 321e6e8..51c778d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -30,6 +30,8 @@ static int tcp_adv_win_scale_min = -31;
 static int tcp_adv_win_scale_max = 31;
 static int ip_ttl_min = 1;
 static int ip_ttl_max = 255;
+static int tcp_initial_rto_min = TCP_RTO_MIN;
+static int tcp_initial_rto_max = TCP_RTO_MAX;
 
 /* Update system visible IP port range */
 static void set_local_port_range(int range[2])
@@ -246,6 +248,15 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_initial_rto",
+		.data		= &sysctl_tcp_initial_rto,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_ms_jiffies,
+		.extra1		= &tcp_initial_rto_min,
+		.extra2		= &tcp_initial_rto_max,
+	},
 	{
 		.procname	= "tcp_fin_timeout",
 		.data		= &sysctl_tcp_fin_timeout,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b22d450..e9e7c3f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2352,7 +2352,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_DEFER_ACCEPT:
 		/* Translate value in seconds to number of retransmits */
 		icsk->icsk_accept_queue.rskq_defer_accept =
-			secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ,
+			secs_to_retrans(val, sysctl_tcp_initial_rto / HZ,
 					TCP_RTO_MAX / HZ);
 		break;
 
@@ -2539,7 +2539,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		break;
 	case TCP_DEFER_ACCEPT:
 		val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept,
-				      TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ);
+				      sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ);
 		break;
 	case TCP_WINDOW_CLAMP:
 		val = tp->window_clamp;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bef9f04..39f6c27 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -890,7 +890,7 @@ static void tcp_init_metrics(struct sock *sk)
 	if (dst_metric(dst, RTAX_RTT) == 0)
 		goto reset;
 
-	if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
+	if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (sysctl_tcp_initial_rto << 3))
 		goto reset;
 
 	/* Initial rtt is determined from SYN,SYN-ACK.
@@ -916,7 +916,7 @@ static void tcp_init_metrics(struct sock *sk)
 		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
 	}
 	tcp_set_rto(sk);
-	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) {
+	if (inet_csk(sk)->icsk_rto < sysctl_tcp_initial_rto && !tp->rx_opt.saw_tstamp) {
 reset:
 		/* Play conservative. If timestamps are not
 		 * supported, TCP will fail to recalculate correct
@@ -924,8 +924,8 @@ reset:
 		 */
 		if (!tp->rx_opt.saw_tstamp && tp->srtt) {
 			tp->srtt = 0;
-			tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
-			inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+			tp->mdev = tp->mdev_max = tp->rttvar = sysctl_tcp_initial_rto;
+			inet_csk(sk)->icsk_rto = sysctl_tcp_initial_rto;
 		}
 	}
 	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f7e6c2c..21920e6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1383,7 +1383,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	    want_cookie)
 		goto drop_and_free;
 
-	inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
+	inet_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto);
 	return 0;
 
 drop_and_release:
@@ -1834,8 +1834,8 @@ static int tcp_v4_init_sock(struct sock *sk)
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);
 
-	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	tp->mdev = TCP_TIMEOUT_INIT;
+	icsk->icsk_rto = sysctl_tcp_initial_rto;
+	tp->mdev = sysctl_tcp_initial_rto;
 
 	/* So many TCP implementations out there (incorrectly) count the
 	 * initial SYN frame in their delayed-ACK and congestion control
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 80b1f80..c63ffa0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -472,8 +472,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 		tcp_init_wl(newtp, treq->rcv_isn);
 
 		newtp->srtt = 0;
-		newtp->mdev = TCP_TIMEOUT_INIT;
-		newicsk->icsk_rto = TCP_TIMEOUT_INIT;
+		newtp->mdev = sysctl_tcp_initial_rto;
+		newicsk->icsk_rto = sysctl_tcp_initial_rto;
 
 		newtp->packets_out = 0;
 		newtp->retrans_out = 0;
@@ -582,7 +582,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 			 * it can be estimated (approximately)
 			 * from another data.
 			 */
-			tmp_opt.ts_recent_stamp = get_seconds() - ((TCP_TIMEOUT_INIT/HZ)<<req->retrans);
+			tmp_opt.ts_recent_stamp = get_seconds() - ((sysctl_tcp_initial_rto/HZ)<<req->retrans);
 			paws_reject = tcp_paws_reject(&tmp_opt, th->rst);
 		}
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 17388c7..e34b0f6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2599,7 +2599,7 @@ static void tcp_connect_init(struct sock *sk)
 	tp->rcv_wup = 0;
 	tp->copied_seq = 0;
 
-	inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+	inet_csk(sk)->icsk_rto = sysctl_tcp_initial_rto;
 	inet_csk(sk)->icsk_retransmits = 0;
 	tcp_clear_retrans(tp);
 }
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index ecd44b0..b9da62b 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -29,6 +29,7 @@ int sysctl_tcp_keepalive_probes __read_mostly = TCP_KEEPALIVE_PROBES;
 int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL;
 int sysctl_tcp_retries1 __read_mostly = TCP_RETR1;
 int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
+int sysctl_tcp_initial_rto __read_mostly = TCP_TIMEOUT_INIT;
 int sysctl_tcp_orphan_retries __read_mostly;
 int sysctl_tcp_thin_linear_timeouts __read_mostly;
 
@@ -135,8 +136,8 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
 
 /* This function calculates a "timeout" which is equivalent to the timeout of a
  * TCP connection after "boundary" unsuccessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if
- * syn_set flag is set.
+ * retransmissions with an initial RTO of TCP_RTO_MIN or
+ * sysctl_tcp_initial_rto if syn_set flag is set.
  */
 static bool retransmits_timed_out(struct sock *sk,
 				  unsigned int boundary,
@@ -144,7 +145,7 @@ static bool retransmits_timed_out(struct sock *sk,
 				  bool syn_set)
 {
 	unsigned int linear_backoff_thresh, start_ts;
-	unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
+	unsigned int rto_base = syn_set ? sysctl_tcp_initial_rto : TCP_RTO_MIN;
 
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
@@ -495,7 +496,7 @@ out_unlock:
 static void tcp_synack_timer(struct sock *sk)
 {
 	inet_csk_reqsk_queue_prune(sk, TCP_SYNQ_INTERVAL,
-				   TCP_TIMEOUT_INIT, TCP_RTO_MAX);
+				   sysctl_tcp_initial_rto, TCP_RTO_MAX);
 }
 
 void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req)
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 352c260..f8a07a8 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -45,7 +45,7 @@ static __u16 const msstab[] = {
  * sysctl_tcp_retries1. It's a rather complicated formula (exponential
  * backoff) to compute at runtime so it's currently hardcoded here.
  */
-#define COUNTER_TRIES 4
+#define COUNTER_TRIES (sysctl_tcp_initial_rto/HZ + 1)
 
 static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 					   struct request_sock *req,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4f49e5d..7e791e6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1349,7 +1349,7 @@ have_isn:
 	    want_cookie)
 		goto drop_and_free;
 
-	inet6_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
+	inet6_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto);
 	return 0;
 
 drop_and_release:
@@ -1957,8 +1957,8 @@ static int tcp_v6_init_sock(struct sock *sk)
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);
 
-	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	tp->mdev = TCP_TIMEOUT_INIT;
+	icsk->icsk_rto = sysctl_tcp_initial_rto;
+	tp->mdev = sysctl_tcp_initial_rto;
 
 	/* So many TCP implementations out there (incorrectly) count the
 	 * initial SYN frame in their delayed-ACK and congestion control
-- 
1.7.0.4

^ permalink raw reply related [flat|nested] 53+ messages in thread
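Editorial note on the patch above: several call sites divide the new variable by HZ using integer arithmetic (`sysctl_tcp_initial_rto / HZ`, and the new `COUNTER_TRIES` definition). The following is a minimal userspace sketch, not kernel code, of what that arithmetic yields for a few settings. It assumes HZ is 1000 (the real value is a kernel build option), and the helper names `ms_to_jiffies`, `counter_tries` and `rto_whole_seconds` are made up for this illustration.

```c
#define HZ 1000 /* assumption: the real tick rate depends on the kernel config */

/* Simplified ms -> jiffies conversion (hypothetical helper; exact only
 * when the result is a whole number of ticks). */
static int ms_to_jiffies(int ms)
{
	return ms * HZ / 1000;
}

/* Mirrors the patched macro: (sysctl_tcp_initial_rto/HZ + 1). */
static int counter_tries(int rto_jiffies)
{
	return rto_jiffies / HZ + 1;
}

/* Mirrors the `sysctl_tcp_initial_rto / HZ` expression used for
 * TCP_DEFER_ACCEPT and in tcp_check_req(): whole seconds, truncated. */
static int rto_whole_seconds(int rto_jiffies)
{
	return rto_jiffies / HZ;
}
```

Under these assumptions a 3000 ms setting reproduces the previously hardcoded COUNTER_TRIES of 4, while any setting below one second makes the `/ HZ` term 0, so sub-second values may need separate handling at those call sites.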
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-18 10:43 ` Benoit Sigoure
@ 2011-05-18 19:26 ` David Miller
  2011-05-18 19:40 ` tsuna
  0 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2011-05-18 19:26 UTC (permalink / raw)
  To: tsunanet; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Benoit Sigoure <tsunanet@gmail.com>
Date: Wed, 18 May 2011 03:43:04 -0700

> Instead of hardcoding the initial RTO to 3s and requiring
> the kernel to be recompiled to change it, expose it as a
> sysctl that can be tuned at runtime. Leave the default
> value unchanged.
>
> Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>

If you read the IETF draft that reduces the initial RTO down to 1
second, it states that if we take a timeout during the initial
connection handshake then we have to revert the RTO back up to 3
seconds.

This fallback logic conflicts with being able to only change the
initial RTO via sysctl, I think. Because there are actually two
values at stake and they depend upon each other: the initial RTO and
the value we fall back to on initial handshake retransmissions.

So I'd rather get a patch that implements the 1 second initial
RTO with the 3 second fallback on SYN retransmit, than this patch.

We already have too many knobs.

^ permalink raw reply [flat|nested] 53+ messages in thread
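Editorial sketch of the two-value behavior described in the message above: a lower initial RTO, plus a fallback to 3 seconds if a timeout was taken during the handshake. This is a userspace toy under stated assumptions, not the kernel's implementation; the struct, the function names and the two constants are hypothetical, and details such as whether an RTT sample is available at the end of the handshake are deliberately left out.

```c
#include <stdbool.h>

#define INITIAL_RTO_MS  1000u /* assumed lowered initial RTO (1 second) */
#define FALLBACK_RTO_MS 3000u /* assumed fallback after a handshake timeout */

/* Hypothetical per-connection state for the sketch. */
struct handshake {
	unsigned int rto_ms;   /* current retransmission timeout */
	bool syn_timed_out;    /* did the SYN/SYN-ACK timer ever expire? */
};

static void handshake_start(struct handshake *h)
{
	h->rto_ms = INITIAL_RTO_MS;
	h->syn_timed_out = false;
}

/* The retransmit timer fired before the handshake completed. */
static void handshake_timeout(struct handshake *h)
{
	h->syn_timed_out = true;
	h->rto_ms *= 2; /* ordinary exponential backoff */
}

/* Handshake completed: if a timeout was taken, revert the RTO to the
 * fallback value before data transmission begins. */
static void handshake_done(struct handshake *h)
{
	if (h->syn_timed_out)
		h->rto_ms = FALLBACK_RTO_MS;
}
```

The sketch makes the dependency visible: a single tunable for the initial value says nothing about what `handshake_done()` should restore, which is the conflict raised above.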
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-18 19:26 ` David Miller @ 2011-05-18 19:40 ` tsuna 2011-05-18 19:52 ` David Miller 2011-05-20 2:01 ` [PATCH] tcp: Expose the initial RTO via a new sysctl H.K. Jerry Chu 0 siblings, 2 replies; 53+ messages in thread From: tsuna @ 2011-05-18 19:40 UTC (permalink / raw) To: David Miller Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel On Wed, May 18, 2011 at 12:26 PM, David Miller <davem@davemloft.net> wrote: > If you read the ietf draft that reduces the initial RTO down to 1 > second, it states that if we take a timeout during the initial > connection handshake then we have to revert the RTO back up to 3 > seconds. > > This fallback logic conflicts with being able to only change the > initial RTO via sysctl, I think. Because there are actually two > values at stake and they depend upon eachother, the initial RTO and > the value we fallback to on initial handshake retransmissions. > > So I'd rather get a patch that implements the 1 second initial > RTO with the 3 second fallback on SYN retransmit, than this patch. > > We already have too many knobs. I was hoping this knob would be accepted because this is such an important issue that it even warrants an IETF draft to attempt to change the standard. I'm not sure how long it will take for this draft to be accepted and then implemented, so I thought adding this simple knob today would really help in the future. Plus, should the draft be accepted, this knob will still be just as useful (e.g. to revert back to today's behavior), and people might want to consider adding another knob for the fallback initRTO (this is debatable). I don't believe this knob conflicts with the proposed change to the standard, it actually goes along with it pretty well and helps us prepare better for this upcoming change. 
I agree that there are too many knobs, and I hate feature creep too, but I've found many of these knobs to be really useful, and the degree to which Linux's TCP stack can be tuned is part of what makes it so versatile. -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-18 19:40 ` tsuna
@ 2011-05-18 19:52 ` David Miller
  2011-05-18 20:20 ` Hagen Paul Pfeifer
  2011-05-19 2:22 ` Benoit Sigoure
  2011-05-20 2:01 ` [PATCH] tcp: Expose the initial RTO via a new sysctl H.K. Jerry Chu
  1 sibling, 2 replies; 53+ messages in thread
From: David Miller @ 2011-05-18 19:52 UTC (permalink / raw)
  To: tsunanet; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: tsuna <tsunanet@gmail.com>
Date: Wed, 18 May 2011 12:40:21 -0700

> I was hoping this knob would be accepted because this is such an
> important issue that it even warrants an IETF draft to attempt to
> change the standard. I'm not sure how long it will take for this
> draft to be accepted and then implemented, so I thought adding this
> simple knob today would really help in the future.

I've already changed the initial TCP congestion window in Linux to 10
without some stupid draft being fully accepted.

I'll just as easily accept, right now, a patch which lowers the
initial RTO to 1 second and adds the 3 second RTO fallback.

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-18 19:52 ` David Miller
@ 2011-05-18 20:20 ` Hagen Paul Pfeifer
  2011-05-18 20:23 ` David Miller
  2011-05-20 10:27 ` H.K. Jerry Chu
  2011-05-19 2:22 ` Benoit Sigoure
  1 sibling, 2 replies; 53+ messages in thread
From: Hagen Paul Pfeifer @ 2011-05-18 20:20 UTC (permalink / raw)
  To: David Miller
  Cc: tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

* David Miller | 2011-05-18 15:52:00 [-0400]:

>I've already changed the initial TCP congestion window in Linux to 10
>without some stupid draft being fully accepted.
>
>I'll just as easily accept right now a patch right now which lowers
>the initial RTO to 1 second and adds the 3 second RTO fallback.

I like the idea of making the initial RTO a knob because in our
isolated MANET environment we have an RTT larger than 1 second. In
particular, the link layer setup procedure over several hops takes
considerable time. After that the RTT is <1 second. The current
algorithm works great for us, so this RTO change would be
counterproductive: it would always trigger a needless timeout.

The main problem for us is that Google et al. are pushing their view
of the Internet with a lot of pressure. The same is true for the IETF
IW adjustments, which are unsuitable for networks that operate with
the bandwidth characteristics of some years ago. The _former_
conservative principle "TCP over everything" is forgotten.

Hagen

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-18 20:20 ` Hagen Paul Pfeifer @ 2011-05-18 20:23 ` David Miller 2011-05-18 20:27 ` Hagen Paul Pfeifer 2011-05-20 10:27 ` H.K. Jerry Chu 1 sibling, 1 reply; 53+ messages in thread From: David Miller @ 2011-05-18 20:23 UTC (permalink / raw) To: hagen Cc: tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel From: Hagen Paul Pfeifer <hagen@jauu.net> Date: Wed, 18 May 2011 22:20:25 +0200 > I like the idea to make the initial RTO a knob because we in a > isolated MANET environment have a RTT larger then 1 second. Then this gets back to the fact that this is a network attribute and thus more suitable as a route metric not a global system-wide sysctl. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-18 20:23 ` David Miller
@ 2011-05-18 20:27 ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 53+ messages in thread
From: Hagen Paul Pfeifer @ 2011-05-18 20:27 UTC (permalink / raw)
  To: David Miller
  Cc: tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

* David Miller | 2011-05-18 16:23:06 [-0400]:

>Then this gets back to the fact that this is a network
>attribute and thus more suitable as a route metric not
>a global system-wide sysctl.

Yes, I already mentioned this in an email response to Eric. The
initial RTO is a perfect candidate for a route metric. I am waiting
for a patch to test it! ;-)

Hagen

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-18 20:20 ` Hagen Paul Pfeifer
  2011-05-18 20:23 ` David Miller
@ 2011-05-20 10:27 ` H.K. Jerry Chu
  2011-05-20 11:00 ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 53+ messages in thread
From: H.K. Jerry Chu @ 2011-05-20 10:27 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: David Miller, tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Wed, May 18, 2011 at 1:20 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * David Miller | 2011-05-18 15:52:00 [-0400]:
>
>>I've already changed the initial TCP congestion window in Linux to 10
>>without some stupid draft being fully accepted.
>>
>>I'll just as easily accept right now a patch right now which lowers
>>the initial RTO to 1 second and adds the 3 second RTO fallback.
>
> I like the idea to make the initial RTO a knob because we in a isolated MANET
> environment have a RTT larger then 1 second. Especially the link layer setup
> procedure over several hops demand some time-costly setup time. After that the
> RTT is <1 second. The current algorithm works great for us. So this RTO change
> will be counterproductive: it will always trigger a needless timeout.
>
> The main problem for us is that Google at all pushing their view of Internet
> with a lot of pressure. The same is true for the IETF IW adjustments, which is
> unsuitable for networks which operates at a bandwidth characteristic some
> years ago. The _former_ conservative principle "TCP over everything" is
> forgotten.

Not sure how our various parameter tuning proposals deviate from the
"TCP over everything" principle? Note that the design goal of
rfc2988bis is to try to benefit 98% of traffic while keeping any
negative impact on the remaining 2% at a minimum. This is why we limit
the use of < 3sec initRTO to at most once. This way the negative
impact of the 1sec initRTO on a path with RTT > 1sec is limited mostly
to one additional, small, spuriously retransmitted SYN or SYN-ACK pkt,
and the unnecessary reduction of IW to 1 segment. We actually thought
about removing the IW reduction part, but unfortunately that text
belongs to a different RFC, rfc5681, which is at a higher maturity
level ("draft-standard") than rfc2988, hence it can't be done as part
of rfc2988bis. Anyway, I have since added the recommendation to the
IW10 draft. See draft-ietf-tcpm-initcwnd-01.txt.

The bottom line is that the damage of rfc2988bis to any network with
initRTT > 1sec is limited to one spuriously retransmitted SYN/SYN-ACK.
In the current Linux code, the SYN/SYN-ACK retransmit is forgotten on
the passive open side by the time 3WHS is completed, so nothing needs
to be done there. But on the active open side a SYN retransmit will
cause not only IW to be reduced to 1, but also a reduction of
ssthresh, which is not part of rfc5681, so some more work is needed.
I can provide a patch (or work with tsuna) to ensure a correct fix is
made.

Jerry

>
> Hagen
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply [flat|nested] 53+ messages in thread
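Editorial sketch of the "one spurious retransmission" argument made in the message above. It is a deliberately simplified model and not kernel code: the SYN timer doubles after each expiry, there is no RTO cap or timer granularity, no packet is lost, and the SYN-ACK arrives exactly one RTT after the first SYN. The function name is hypothetical.

```c
/* Count SYN retransmissions that fire before the SYN-ACK for the first
 * SYN arrives, i.e. retransmissions that were not needed.
 * Returns -1 for an invalid (zero) initial RTO. */
static int spurious_syn_retransmits(unsigned int rtt_ms,
				    unsigned int init_rto_ms)
{
	unsigned long long elapsed = 0;       /* time of the last (re)transmit */
	unsigned long long rto = init_rto_ms; /* current timer value */
	int count = 0;

	if (init_rto_ms == 0)
		return -1;

	/* The timer fires at elapsed + rto; it is spurious if the reply
	 * is still in flight at that moment. */
	while (elapsed + rto < rtt_ms) {
		elapsed += rto;
		rto *= 2; /* exponential backoff */
		count++;
	}
	return count;
}
```

In this model, with a 1 second initial RTO, any path whose RTT lies between 1 and 3 seconds sees exactly one unnecessary SYN, and the same path sees none with a 3 second initial RTO.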
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-20 10:27 ` H.K. Jerry Chu
@ 2011-05-20 11:00 ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 53+ messages in thread
From: Hagen Paul Pfeifer @ 2011-05-20 11:00 UTC (permalink / raw)
  To: H.K. Jerry Chu
  Cc: David Miller, tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Fri, 20 May 2011 03:27:37 -0700, "H.K. Jerry Chu" wrote:

Hi Jerry

> Not sure how our various parameter tuning proposals deviate from the "TCP
> over everything" principle?

For our environment it hurts because we _always_ have an initial RTO
>1. I understand and accept that 98% will benefit from this
modification, no doubt Jerry! Try to put yourself in our situation:
imagine a proposal to change the initial RTO to 0.5 seconds, maybe
because 98% of Internet traffic is now localized and the RTO is now
0.2 seconds on average. This would always penalize your network, and
that will be the situation for one of my customers. I can live with
that, I see the benefits for the rest of the world. But I would be
happy to see a knob with which I can restore the old behavior. Maybe
some other environments will benefit from an even lower or higher
initial RTO.

Hagen

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-20 11:00 ` Hagen Paul Pfeifer (?) @ 2011-05-20 12:37 ` Alan Cox -1 siblings, 0 replies; 53+ messages in thread From: Alan Cox @ 2011-05-20 12:37 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: H.K. Jerry Chu, David Miller, tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel > For our environment it hurts because we _always_ have an initial RTO >1. I > understand and accept that 98% will benefit of this modification, no doubt > Jerry! Try to put yourself in our situation: imaging a proposal of an init > RTO modification to 0.5 seconds. Maybe because 98% of Internet traffic is > now localized and the RTO is average now 0.2 seconds. Anyway, this will > penalize your network always and this will be the situation for one of my > customer. I can live with that, I see the benefits for the rest of the > world. But I am happy to see a knob where I can restore the old behavior. > Maybe some other environments will benefit from a even lower or higher > initial RTO. AX.25 is definitely happier with a multi-second round trip but it's a special case. Some X.25 networks are going to have similar behaviour. It shouldn't be penalising each connection (and it's worse than that of course because each node on a shared media network gets in the way of the rest, plus the queueing effect of all the extra blockages) because done right multiple connections to the same host can use the previous connections as estimates (and indeed for the initial RTO there's a good argument for treating estimates as 'host, then x.y.z.* match, then average of previous except the x.y.z.* match, then unknown') The latter would fix an awful lot of the weird cases pretty effectively. Alan ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-20 11:00 ` Hagen Paul Pfeifer (?) (?) @ 2011-05-21 0:06 ` H.K. Jerry Chu 2011-05-31 14:48 ` tsuna -1 siblings, 1 reply; 53+ messages in thread From: H.K. Jerry Chu @ 2011-05-21 0:06 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: David Miller, tsunanet, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel Hey Hagen, On Fri, May 20, 2011 at 4:00 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote: > > On Fri, 20 May 2011 03:27:37 -0700, "H.K. Jerry Chu" wrote: > > Hi Jerry > >> Not sure how our various parameter tuning proposals deviate from the > "TCP >> over everything" principle? > > For our environment it hurts because we _always_ have an initial RTO >1. I > understand and accept that 98% will benefit of this modification, no doubt > Jerry! Try to put yourself in our situation: imaging a proposal of an init Understood but my point was none of the parameter tuning proposals break "TCP over everything", although they may not help solving "TCP optimized for everything", but we never had the latter anyway. We've tried hard to keep the penalty to those initRTT > 1sec paths at a minimum, i.e., just one extra tinygram. This is important also for us because it may take > 1sec for many Android clients to establish connections over a radio channel that has been put into power saving mode. > RTO modification to 0.5 seconds. Maybe because 98% of Internet traffic is > now localized and the RTO is average now 0.2 seconds. Anyway, this will > penalize your network always and this will be the situation for one of my > customer. I can live with that, I see the benefits for the rest of the > world. But I am happy to see a knob where I can restore the old behavior. > Maybe some other environments will benefit from a even lower or higher > initial RTO. Yep, that's why we've had a knob for this for years. Jerry > > Hagen > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-21 0:06 ` H.K. Jerry Chu @ 2011-05-31 14:48 ` tsuna 2011-05-31 15:25 ` Hagen Paul Pfeifer 0 siblings, 1 reply; 53+ messages in thread From: tsuna @ 2011-05-31 14:48 UTC (permalink / raw) To: H.K. Jerry Chu Cc: Hagen Paul Pfeifer, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel On Fri, May 20, 2011 at 5:06 PM, H.K. Jerry Chu <hkjerry.chu@gmail.com> wrote: > Yep, that's why we've had a knob for this for years. I was traveling last week so sorry for not replying earlier to various comments people made. I talked to Jerry and he's agreed to share some patches that Google has been using internally for years. I started this work because after leaving Google and taking these changes for granted, I was surprised to find that they weren't actually part of the mainline Linux kernel. It seems that David is willing to accept a change that will lower the initRTO to 1s (compile-time constant), with a fallback to 3s (compile-time constant), as per the draft rfc2988bis. Others are legitimately worried about the impact this would cause in environments where RTT is typically (or always) in the 1-3s range. Some would like to see this as a per-destination thing. Personally what I think would be ideal would be: 1. A sysctl knob for initRTO, to allow people to adjust this appropriately for their environment. 2. Apply the srtt / rttvar seen on previous connections to new connections. Does that sound reasonable? For 2), I'm not sure how the details would work yet, I believe the kernel already has what's necessary to remember these things on a per peer basis, but it would be nice if I could specify things like "for 10.x.0.0/16 (local datacenter) use this aggressive setting, for 10.0.0.0/8 (my internal backend network) use that, for everything else (Internets etc.) use the default". 
-- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
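Editorial sketch of the per-prefix policy wished for in the message above ("for the local datacenter use this, for the internal network use that, otherwise the default"), modeled as a longest-prefix match. Everything here is hypothetical: the table, the function names and the timeout values are invented, and the original leaves the /16 unspecified (10.x.0.0/16), so 10.1.0.0/16 is picked arbitrarily. It is not how the kernel's route metrics are implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy entry: IPv4 prefix (host byte order), prefix
 * length, and the initial RTO to use for matching destinations. */
struct rto_rule {
	uint32_t prefix;
	unsigned int len;
	unsigned int rto_ms;
};

#define DEFAULT_INITIAL_RTO_MS 3000u /* assumed system-wide default */

static const struct rto_rule rules[] = {
	{ 0x0A010000u, 16, 200u },  /* 10.1.0.0/16: example datacenter */
	{ 0x0A000000u,  8, 1000u }, /* 10.0.0.0/8: example backend network */
};

/* Netmask for a prefix length in 0..32. */
static uint32_t prefix_mask(unsigned int len)
{
	return len == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32u - len));
}

/* Return the initial RTO of the most specific matching rule, or the
 * default when no rule matches. */
static unsigned int lookup_initial_rto_ms(uint32_t addr)
{
	unsigned int best_rto = DEFAULT_INITIAL_RTO_MS;
	int best_len = -1;
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct rto_rule *r = &rules[i];

		if ((addr & prefix_mask(r->len)) == r->prefix &&
		    (int)r->len > best_len) {
			best_len = (int)r->len;
			best_rto = r->rto_ms;
		}
	}
	return best_rto;
}
```

A routing table already performs this kind of most-specific-match lookup, which is one reason a per-route setting fits this use case.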
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-31 14:48 ` tsuna @ 2011-05-31 15:25 ` Hagen Paul Pfeifer 0 siblings, 0 replies; 53+ messages in thread From: Hagen Paul Pfeifer @ 2011-05-31 15:25 UTC (permalink / raw) To: tsuna Cc: H.K. Jerry Chu, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel On Tue, 31 May 2011 07:48:09 -0700, tsuna <tsunanet@gmail.com> wrote: > I talked to Jerry and he's agreed to share some patches that Google > has been using internally for years. Great! > Personally what I think would be ideal would be: > 1. A sysctl knob for initRTO, to allow people to adjust this > appropriately for their environment. > 2. Apply the srtt / rttvar seen on previous connections to new > connections. > > Does that sound reasonable? > > For 2), I'm not sure how the details would work yet, I believe the > kernel already has what's necessary to remember these things on a per > peer basis, but it would be nice if I could specify things like "for > 10.x.0.0/16 (local datacenter) use this aggressive setting, for > 10.0.0.0/8 (my internal backend network) use that, for everything else > (Internets etc.) use the default". Skip sysctl, it is deprecated. The initRTO is the ideal candidate for a per route knob. And happily you will solve 2) with the per route thing too! ;-) Search the web, you will find some patches where you can see how to extend the per route system - including iproute2. Hagen ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl. 2011-05-31 15:25 ` Hagen Paul Pfeifer (?) @ 2011-05-31 15:28 ` tsuna 2011-05-31 15:43 ` Hagen Paul Pfeifer -1 siblings, 1 reply; 53+ messages in thread From: tsuna @ 2011-05-31 15:28 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: H.K. Jerry Chu, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel On Tue, May 31, 2011 at 8:25 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote: > Skip sysctl, it is deprecated. Sorry I meant a knob such as /proc/sys/net/ipv4/tcp_initrto. > The initRTO is the ideal candidate for a > per route knob. And happily you will solve 2) with the per route thing too! You still need a knob for the default system-wide value, don't you? -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-31 15:28 ` tsuna
@ 2011-05-31 15:43 ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 53+ messages in thread
From: Hagen Paul Pfeifer @ 2011-05-31 15:43 UTC (permalink / raw)
  To: tsuna
  Cc: H.K. Jerry Chu, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Tue, 31 May 2011 08:28:18 -0700, tsuna <tsunanet@gmail.com> wrote:

> Sorry I meant a knob such as /proc/sys/net/ipv4/tcp_initrto.

That's the same! ;-)

>> The initRTO is the ideal candidate for a
>> per route knob. And happily you will solve 2) with the per route thing
>> too!
>
> You still need a knob for the default system-wide value, don't you?

Yes, try to re-read the emails. Sysctl is a no-go; with a per route
interface you have the ability to tune the values. Talk with Jerry
once again - he wrote that at Google they already have a patch for
this. And with a per route knob you can select an even smaller value
for your local network (e.g. datacenter) and a larger value for all
other routes. It makes sense to provide a knob for this on a per
route basis, not on a global sysctl basis. But once again: talk with
Jerry - he has the expert knowledge!

Hagen

^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-18 19:52 ` David Miller @ 2011-05-19 2:22 ` Benoit Sigoure 2011-05-19 2:22 ` Benoit Sigoure 1 sibling, 0 replies; 53+ messages in thread From: Benoit Sigoure @ 2011-05-19 2:22 UTC (permalink / raw) To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann Cc: netdev, linux-kernel, Benoit Sigoure Prior to this patch, Linux would always use 3 seconds (compile-time constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune this down to 1 second and, in case of a timeout during the TCP 3WHS, revert the RTO back up to 3 seconds when data transmission begins. This patch implements this behavior but retains default values for the initial RTO of 3 seconds, instead of 1 second as is suggested in the draft RFC. This way, in a default configuration, the behavior of Linux's TCP is unchanged. This patch also adds 2 knobs to tweak the initial RTO: - tcp_initial_rto: initial RTO used during the 3WHS (default remains unchanged: 3 seconds). This was previously a compile-time constant. - tcp_initial_fallback_rto: the RTO to fall back to if a timeout occurs during the 3WHS, with a default value of 3 seconds too, as per the draft RFC. Signed-off-by: Benoit Sigoure <tsunanet@gmail.com> --- On Wed, May 18, 2011 at 12:52 PM, David Miller <davem@davemloft.net> wrote: > I'll just as easily accept right now a patch right now which lowers > the initial RTO to 1 second and adds the 3 second RTO fallback. Here's a first attempt at a patch that implements the behavior described in the draft RFC. I have only compiled it so far; if you would like to move forward with this approach, I'll go ahead and test it on a real server. I'm not sure whether COUNTER_TRIES in syncookies.c should be based off sysctl_tcp_initial_rto or sysctl_tcp_initial_fallback_rto, if we're going to take the first one down to 1s... 
Documentation/networking/ip-sysctl.txt | 19 +++++++++++++++++++ include/net/tcp.h | 4 +++- net/ipv4/syncookies.c | 2 +- net/ipv4/sysctl_net_ipv4.c | 20 ++++++++++++++++++++ net/ipv4/tcp.c | 4 ++-- net/ipv4/tcp_input.c | 13 +++++++++---- net/ipv4/tcp_ipv4.c | 6 +++--- net/ipv4/tcp_minisocks.c | 6 +++--- net/ipv4/tcp_output.c | 2 +- net/ipv4/tcp_timer.c | 10 ++++++---- net/ipv6/syncookies.c | 2 +- net/ipv6/tcp_ipv6.c | 6 +++--- 12 files changed, 71 insertions(+), 23 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d3d653a..590042c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -384,6 +384,25 @@ tcp_retries2 - INTEGER RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8. +tcp_initial_rto - INTEGER + This value sets the initial retransmit timeout (in milliseconds), + that is how long the kernel will wait before retransmitting the + initial SYN packet. + + RFC 1122 says that this SHOULD be 3000 milliseconds, which is the + default. Note that draft RFC 2988bis-02 says that this SHOULD be + 1000 milliseconds, which might become the default value in future + versions. + +tcp_initial_fallback_rto - INTEGER + This value sets the initial retransmit timeout (in milliseconds) + to use after completing a three-way handshake during which the + initial SYN packet had to be retransmitted after waiting for + tcp_initial_rto milliseconds. + + Draft RFC 2988bis-02 says that this MUST be 3000 milliseconds, + which is the default. + tcp_rfc1337 - BOOLEAN If set, the TCP stack behaves conforming to RFC1337. 
If unset, we are not conforming to RFC, but prevent TCP TIME_WAIT diff --git a/include/net/tcp.h b/include/net/tcp.h index cda30ea..c974242 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -213,6 +213,8 @@ extern int sysctl_tcp_syn_retries; extern int sysctl_tcp_synack_retries; extern int sysctl_tcp_retries1; extern int sysctl_tcp_retries2; +extern int sysctl_tcp_initial_rto; /* in jiffies */ +extern int sysctl_tcp_initial_fallback_rto; /* in jiffies */ extern int sysctl_tcp_orphan_retries; extern int sysctl_tcp_syncookies; extern int sysctl_tcp_retrans_collapse; @@ -295,7 +297,7 @@ static inline void tcp_synq_overflow(struct sock *sk) static inline int tcp_synq_no_recent_overflow(const struct sock *sk) { unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; - return time_after(jiffies, last_overflow + TCP_TIMEOUT_INIT); + return time_after(jiffies, last_overflow + sysctl_tcp_initial_rto); } extern struct proto tcp_prot; diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 8b44c6d..b035968 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -186,7 +186,7 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp) * sysctl_tcp_retries1. It's a rather complicated formula (exponential * backoff) to compute at runtime so it's currently hardcoded here. */ -#define COUNTER_TRIES 4 +#define COUNTER_TRIES (sysctl_tcp_initial_rto/HZ + 1) /* * Check if a ack sequence number is a valid syncookie. * Return the decoded mss if it is, or 0 if not. 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 321e6e8..abe8cfc 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -30,6 +30,8 @@ static int tcp_adv_win_scale_min = -31; static int tcp_adv_win_scale_max = 31; static int ip_ttl_min = 1; static int ip_ttl_max = 255; +static int tcp_min_rto = TCP_RTO_MIN; +static int tcp_max_rto = TCP_RTO_MAX; /* Update system visible IP port range */ static void set_local_port_range(int range[2]) @@ -247,6 +249,24 @@ static struct ctl_table ipv4_table[] = { .proc_handler = proc_dointvec }, { + .procname = "tcp_initial_rto", + .data = &sysctl_tcp_initial_rto, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_ms_jiffies, + .extra1 = &tcp_min_rto, + .extra2 = &tcp_max_rto, + }, + { + .procname = "tcp_initial_fallback_rto", + .data = &sysctl_tcp_initial_fallback_rto, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_ms_jiffies, + .extra1 = &tcp_min_rto, + .extra2 = &tcp_max_rto, + }, + { .procname = "tcp_fin_timeout", .data = &sysctl_tcp_fin_timeout, .maxlen = sizeof(int), diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index b22d450..e9e7c3f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2352,7 +2352,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, case TCP_DEFER_ACCEPT: /* Translate value in seconds to number of retransmits */ icsk->icsk_accept_queue.rskq_defer_accept = - secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ, + secs_to_retrans(val, sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ); break; @@ -2539,7 +2539,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level, break; case TCP_DEFER_ACCEPT: val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept, - TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ); + sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index bef9f04..513cf7a 100644 --- a/net/ipv4/tcp_input.c +++ 
b/net/ipv4/tcp_input.c @@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct dst_entry *dst = __sk_dst_get(sk); + /* If we had to retransmit anything during the 3WHS, + * use the initial fallback RTO. + */ + int init_rto = inet_csk(sk)->icsk_retransmits ? + sysctl_tcp_initial_fallback_rto : sysctl_tcp_initial_rto; if (dst == NULL) goto reset; @@ -890,7 +895,7 @@ static void tcp_init_metrics(struct sock *sk) if (dst_metric(dst, RTAX_RTT) == 0) goto reset; - if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3)) + if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (init_rto << 3)) goto reset; /* Initial rtt is determined from SYN,SYN-ACK. @@ -916,7 +921,7 @@ static void tcp_init_metrics(struct sock *sk) tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk)); } tcp_set_rto(sk); - if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) { + if (inet_csk(sk)->icsk_rto < init_rto && !tp->rx_opt.saw_tstamp) { reset: /* Play conservative. 
If timestamps are not * supported, TCP will fail to recalculate correct @@ -924,8 +929,8 @@ reset: */ if (!tp->rx_opt.saw_tstamp && tp->srtt) { tp->srtt = 0; - tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT; - inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT; + tp->mdev = tp->mdev_max = tp->rttvar = init_rto; + inet_csk(sk)->icsk_rto = init_rto; } } tp->snd_cwnd = tcp_init_cwnd(tp, dst); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f7e6c2c..21920e6 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1383,7 +1383,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) want_cookie) goto drop_and_free; - inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT); + inet_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto); return 0; drop_and_release: @@ -1834,8 +1834,8 @@ static int tcp_v4_init_sock(struct sock *sk) tcp_init_xmit_timers(sk); tcp_prequeue_init(tp); - icsk->icsk_rto = TCP_TIMEOUT_INIT; - tp->mdev = TCP_TIMEOUT_INIT; + icsk->icsk_rto = sysctl_tcp_initial_rto; + tp->mdev = sysctl_tcp_initial_rto; /* So many TCP implementations out there (incorrectly) count the * initial SYN frame in their delayed-ACK and congestion control diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 80b1f80..c63ffa0 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -472,8 +472,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, tcp_init_wl(newtp, treq->rcv_isn); newtp->srtt = 0; - newtp->mdev = TCP_TIMEOUT_INIT; - newicsk->icsk_rto = TCP_TIMEOUT_INIT; + newtp->mdev = sysctl_tcp_initial_rto; + newicsk->icsk_rto = sysctl_tcp_initial_rto; newtp->packets_out = 0; newtp->retrans_out = 0; @@ -582,7 +582,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, * it can be estimated (approximately) * from another data. 
*/ - tmp_opt.ts_recent_stamp = get_seconds() - ((TCP_TIMEOUT_INIT/HZ)<<req->retrans); + tmp_opt.ts_recent_stamp = get_seconds() - ((sysctl_tcp_initial_rto/HZ)<<req->retrans); paws_reject = tcp_paws_reject(&tmp_opt, th->rst); } } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 17388c7..e34b0f6 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2599,7 +2599,7 @@ static void tcp_connect_init(struct sock *sk) tp->rcv_wup = 0; tp->copied_seq = 0; - inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT; + inet_csk(sk)->icsk_rto = sysctl_tcp_initial_rto; inet_csk(sk)->icsk_retransmits = 0; tcp_clear_retrans(tp); } diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index ecd44b0..47fa600 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -29,6 +29,8 @@ int sysctl_tcp_keepalive_probes __read_mostly = TCP_KEEPALIVE_PROBES; int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL; int sysctl_tcp_retries1 __read_mostly = TCP_RETR1; int sysctl_tcp_retries2 __read_mostly = TCP_RETR2; +int sysctl_tcp_initial_rto __read_mostly = TCP_TIMEOUT_INIT; +int sysctl_tcp_initial_fallback_rto __read_mostly = TCP_TIMEOUT_INIT; int sysctl_tcp_orphan_retries __read_mostly; int sysctl_tcp_thin_linear_timeouts __read_mostly; @@ -135,8 +137,8 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) /* This function calculates a "timeout" which is equivalent to the timeout of a * TCP connection after "boundary" unsuccessful, exponentially backed-off - * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if - * syn_set flag is set. + * retransmissions with an initial RTO of TCP_RTO_MIN or + * sysctl_tcp_initial_rto if syn_set flag is set. */ static bool retransmits_timed_out(struct sock *sk, unsigned int boundary, @@ -144,7 +146,7 @@ static bool retransmits_timed_out(struct sock *sk, bool syn_set) { unsigned int linear_backoff_thresh, start_ts; - unsigned int rto_base = syn_set ? 
TCP_TIMEOUT_INIT : TCP_RTO_MIN; + unsigned int rto_base = syn_set ? sysctl_tcp_initial_rto : TCP_RTO_MIN; if (!inet_csk(sk)->icsk_retransmits) return false; @@ -495,7 +497,7 @@ out_unlock: static void tcp_synack_timer(struct sock *sk) { inet_csk_reqsk_queue_prune(sk, TCP_SYNQ_INTERVAL, - TCP_TIMEOUT_INIT, TCP_RTO_MAX); + sysctl_tcp_initial_rto, TCP_RTO_MAX); } void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req) diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index 352c260..f8a07a8 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -45,7 +45,7 @@ static __u16 const msstab[] = { * sysctl_tcp_retries1. It's a rather complicated formula (exponential * backoff) to compute at runtime so it's currently hardcoded here. */ -#define COUNTER_TRIES 4 +#define COUNTER_TRIES (sysctl_tcp_initial_rto/HZ + 1) static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 4f49e5d..7e791e6 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1349,7 +1349,7 @@ have_isn: want_cookie) goto drop_and_free; - inet6_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT); + inet6_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto); return 0; drop_and_release: @@ -1957,8 +1957,8 @@ static int tcp_v6_init_sock(struct sock *sk) tcp_init_xmit_timers(sk); tcp_prequeue_init(tp); - icsk->icsk_rto = TCP_TIMEOUT_INIT; - tp->mdev = TCP_TIMEOUT_INIT; + icsk->icsk_rto = sysctl_tcp_initial_rto; + tp->mdev = sysctl_tcp_initial_rto; /* So many TCP implementations out there (incorrectly) count the * initial SYN frame in their delayed-ACK and congestion control -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 53+ messages in thread
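The two pieces of arithmetic the patch introduces can be checked outside the kernel. The sketch below is userspace C, not kernel code: `pick_init_rto` and `counter_tries` are hypothetical helper names that mirror, respectively, the ternary added to tcp_init_metrics() and the new COUNTER_TRIES expression, and `SKETCH_HZ` is an assumed tick rate of 1000 (in the kernel, HZ is a build-time configuration value).

```c
#include <assert.h>

/* Assumed tick rate for this sketch only. */
#define SKETCH_HZ 1000

/* Mirrors the ternary added to tcp_init_metrics(): if anything was
 * retransmitted during the 3WHS, use the fallback RTO, otherwise
 * keep the initial RTO. All values are in jiffies. */
static int pick_init_rto(int retransmits, int initial_rto, int fallback_rto)
{
	return retransmits ? fallback_rto : initial_rto;
}

/* Mirrors the new COUNTER_TRIES expression from the patch:
 * sysctl_tcp_initial_rto/HZ + 1, using integer division. */
static int counter_tries(int initial_rto_jiffies, int hz)
{
	assert(hz > 0);
	return initial_rto_jiffies / hz + 1;
}
```

With the default of 3 seconds, `counter_tries(3 * SKETCH_HZ, SKETCH_HZ)` gives 4, the same as the old hardcoded constant. With a 1 second initial RTO it gives 2, which is the case the cover note asks about when it wonders which of the two sysctls COUNTER_TRIES should follow.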
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 2:22 ` Benoit Sigoure (?) @ 2011-05-19 2:36 ` David Miller 2011-05-19 3:56 ` tsuna -1 siblings, 1 reply; 53+ messages in thread From: David Miller @ 2011-05-19 2:36 UTC (permalink / raw) To: tsunanet Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann, netdev, linux-kernel From: Benoit Sigoure <tsunanet@gmail.com> Date: Wed, 18 May 2011 19:22:24 -0700 > Prior to this patch, Linux would always use 3 seconds (compile-time > constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune > this down to 1 second and, in case of a timeout during the TCP 3WHS, > revert the RTO back up to 3 seconds when data transmission begins. We just had a discussion where it was determined that changes to these settings are "network specific" and therefore that if it is appropriate at all (I'm still not convinced) it is only suitable as a routing metric. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 2:36 ` David Miller @ 2011-05-19 3:56 ` tsuna 2011-05-19 4:14 ` David Miller 0 siblings, 1 reply; 53+ messages in thread From: tsuna @ 2011-05-19 3:56 UTC (permalink / raw) To: David Miller Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann, netdev, linux-kernel On Wed, May 18, 2011 at 7:36 PM, David Miller <davem@davemloft.net> wrote: > From: Benoit Sigoure <tsunanet@gmail.com> > Date: Wed, 18 May 2011 19:22:24 -0700 > >> Prior to this patch, Linux would always use 3 seconds (compile-time >> constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune >> this down to 1 second and, in case of a timeout during the TCP 3WHS, >> revert the RTO back up to 3 seconds when data transmission begins. > > We just had a discussion where it was determined that changes to > these settings are "network specific" and therefore that if it > is appropriate at all (I'm still not convinced) it is only suitable > as a routing metric. Fair enough. I'll take another stab at it and see if I can change this to be on a per network basis. Do I need any patch that's not yet in Linus' tree? I'm referring to this: On Tue, May 17, 2011 at 5:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Adding many knobs to each clone had a huge cost on previous kernels. > (Think some machines have millions entries in IP route cache), this used > quite a lot of memory. > > With latest David work, we'll consume less ram, because we can now share > settings, instead of copying them on each dst entry. If this has already been merged then it sounds like I should have everything I need..? -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 3:56 ` tsuna @ 2011-05-19 4:14 ` David Miller 2011-05-19 4:33 ` tsuna 0 siblings, 1 reply; 53+ messages in thread From: David Miller @ 2011-05-19 4:14 UTC (permalink / raw) To: tsunanet Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann, netdev, linux-kernel From: tsuna <tsunanet@gmail.com> Date: Wed, 18 May 2011 20:56:33 -0700 > On Wed, May 18, 2011 at 7:36 PM, David Miller <davem@davemloft.net> wrote: >> From: Benoit Sigoure <tsunanet@gmail.com> >> Date: Wed, 18 May 2011 19:22:24 -0700 >> >>> Prior to this patch, Linux would always use 3 seconds (compile-time >>> constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune >>> this down to 1 second and, in case of a timeout during the TCP 3WHS, >>> revert the RTO back up to 3 seconds when data transmission begins. >> >> We just had a discussion where it was determined that changes to >> these settings are "network specific" and therefore that if it >> is appropriate at all (I'm still not convinced) it is only suitable >> as a routing metric. > > Fair enough. I'll take another stab at it and see if I can change > this to be on a per network basis. Do I need any patch that's not yet > in Linus' tree? I'm referring to this: Keep in mind another thing I do not like about this knob. The IETF draft has a requirement that we fallback to 3 seconds if the initial RTO is 1 second. Nothing in your facilities ensure this, or provide a way for the kernel to make sure this is the case. And for other values of initial RTO, what fallback is appropriate? As a result of all of this, I do not really think this is something the user should control at all. I really would rather see the initial RTO be static and be set to 1 with fallback RTO of 3. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 4:14 ` David Miller @ 2011-05-19 4:33 ` tsuna 2011-05-19 5:46 ` David Miller 2011-05-19 6:10 ` [PATCH] tcp: Implement a two-level initial RTO " Alexander Zimmermann 0 siblings, 2 replies; 53+ messages in thread From: tsuna @ 2011-05-19 4:33 UTC (permalink / raw) To: David Miller Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann, netdev, linux-kernel On Wed, May 18, 2011 at 9:14 PM, David Miller <davem@davemloft.net> wrote: > The IETF draft has a requirement that we fallback to 3 seconds if the > initial RTO is 1 second. > > Nothing in your facilities ensure this, or provide a way for the > kernel to make sure this is the case. I'm not sure I understand what you're saying. If tcp_initial_rto = 1000 and tcp_initial_fallback_rto = 3000, then you get exactly the behavior the draft describes. The knobs simply allow you to either revert to today's behavior or use other settings that would make more sense in your environment (e.g. very high RTT). Are you concerned about cases where, say, tcp_initial_fallback_rto < tcp_initial_rto? > And for other values of initial RTO, what fallback is appropriate? Presumably if the user decides to tweak these knobs, they'll know what's appropriate for their environment. Or are you suggesting that one value be derived from the other? (e.g. tcp_initial_fallback_rto = 3 * tcp_initial_rto) > As a result of all of this, I do not really think this is something > the user should control at all. > > I really would rather see the initial RTO be static and be set to 1 > with fallback RTO of 3. I can also provide a simple patch for this if you want to start from there. And then maybe we can discuss having a runtime knob some more :-) -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
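The two options raised in the message above (rejecting a fallback smaller than the initial RTO, or deriving the fallback from the initial RTO) can be sketched in user-space C. This is only an illustration, not code from any posted patch: the function names `rto_knobs_consistent` and `derived_fallback_ms` are hypothetical, and the values are assumed to be in milliseconds, as in the `tcp_initial_rto = 1000` example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical check: the fallback RTO must not be shorter than the
 * initial RTO, otherwise "falling back" after a handshake timeout
 * would make the timer more aggressive instead of more conservative. */
bool rto_knobs_consistent(unsigned int initial_ms, unsigned int fallback_ms)
{
	return fallback_ms >= initial_ms;
}

/* Hypothetical alternative: derive the fallback from the initial RTO
 * using the 3x ratio floated in the message above. */
unsigned int derived_fallback_ms(unsigned int initial_ms)
{
	assert(initial_ms <= UINT_MAX / 3); /* guard against overflow */
	return 3 * initial_ms;
}
```

With the draft's values, `rto_knobs_consistent(1000, 3000)` holds and `derived_fallback_ms(1000)` yields 3000; whether a fixed 3x ratio is appropriate for other initial values is exactly the open question in the thread.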
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 4:33 ` tsuna @ 2011-05-19 5:46 ` David Miller 2011-05-19 6:36 ` Benoit Sigoure 2011-05-19 6:47 ` Benoit Sigoure 2011-05-19 6:10 ` [PATCH] tcp: Implement a two-level initial RTO " Alexander Zimmermann 1 sibling, 2 replies; 53+ messages in thread From: David Miller @ 2011-05-19 5:46 UTC (permalink / raw) To: tsunanet Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann, netdev, linux-kernel From: tsuna <tsunanet@gmail.com> Date: Wed, 18 May 2011 21:33:21 -0700 > On Wed, May 18, 2011 at 9:14 PM, David Miller <davem@davemloft.net> wrote: >> I really would rather see the initial RTO be static and be set to 1 >> with fallback RTO of 3. > > I can also provide a simple patch for this if you want to start from > there. And then maybe we can discuss having a runtime knob some more > :-) Yeah why don't we do that :-) ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH] tcp: Lower the initial RTO to 1s as per draft RFC 2988bis-02. 2011-05-19 5:46 ` David Miller @ 2011-05-19 6:36 ` Benoit Sigoure 2011-05-19 6:47 ` Benoit Sigoure 1 sibling, 0 replies; 53+ messages in thread
From: Benoit Sigoure @ 2011-05-19 6:36 UTC (permalink / raw)
To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann
Cc: netdev, linux-kernel, Benoit Sigoure

From: Benoit Sigoure <tsuna@stumbleupon.com>

Draft RFC 2988bis-02 recommends that the initial RTO be lowered
from 3 seconds down to 1 second, and that in case of a timeout
during the TCP 3WHS, the RTO should fallback to 3 seconds when
data transmission begins.
---

On Wed, May 18, 2011 at 10:46 PM, David Miller <davem@davemloft.net> wrote:
> From: tsuna <tsunanet@gmail.com>
> Date: Wed, 18 May 2011 21:33:21 -0700
>
>> On Wed, May 18, 2011 at 9:14 PM, David Miller <davem@davemloft.net> wrote:
>>> I really would rather see the initial RTO be static and be set to 1
>>> with fallback RTO of 3.
>>
>> I can also provide a simple patch for this if you want to start from
>> there. And then maybe we can discuss having a runtime knob some more
>> :-)
>
> Yeah why don't we do that :-)

Alright, here we go.

 include/net/tcp.h    |    5 ++++-
 net/ipv4/tcp_input.c |   13 +++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cda30ea..274d761 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -122,7 +122,10 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 #endif
 #define TCP_RTO_MAX ((unsigned)(120*HZ))
 #define TCP_RTO_MIN ((unsigned)(HZ/5))
-#define TCP_TIMEOUT_INIT ((unsigned)(3*HZ)) /* RFC 1122 initial RTO value */
+/* The next 2 values come from Draft RFC 2988bis-02. */
+#define TCP_TIMEOUT_INIT ((unsigned)(1*HZ)) /* initial RTO value */
+#define TCP_TIMEOUT_INIT_FALLBACK ((unsigned)(3*HZ)) /* initial RTO to fallback to when
+ * a timeout happens during the 3WHS. */
 
 #define TCP_RESOURCE_PROBE_INTERVAL ((unsigned)(HZ/2U)) /* Maximal interval between probes
 * for local resources.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bef9f04..a36bc35 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct dst_entry *dst = __sk_dst_get(sk);
+	/* If we had to retransmit anything during the 3WHS, use
+	 * the initial fallback RTO as per draft RFC 2988bis-02.
+	 */
+	int init_rto = inet_csk(sk)->icsk_retransmits ?
+		TCP_TIMEOUT_INIT_FALLBACK : TCP_TIMEOUT_INIT;
 
 	if (dst == NULL)
 		goto reset;
@@ -890,7 +895,7 @@ static void tcp_init_metrics(struct sock *sk)
 	if (dst_metric(dst, RTAX_RTT) == 0)
 		goto reset;
 
-	if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
+	if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (init_rto << 3))
 		goto reset;
 
 	/* Initial rtt is determined from SYN,SYN-ACK.
@@ -916,7 +921,7 @@ static void tcp_init_metrics(struct sock *sk)
 		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
 	}
 	tcp_set_rto(sk);
-	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) {
+	if (inet_csk(sk)->icsk_rto < init_rto && !tp->rx_opt.saw_tstamp) {
 reset:
 		/* Play conservative. If timestamps are not
 		 * supported, TCP will fail to recalculate correct
@@ -924,8 +929,8 @@ reset:
 		 */
 		if (!tp->rx_opt.saw_tstamp && tp->srtt) {
 			tp->srtt = 0;
-			tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
-			inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+			tp->mdev = tp->mdev_max = tp->rttvar = init_rto;
+			inet_csk(sk)->icsk_rto = init_rto;
 		}
 	}
 	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
-- 
1.7.0.4

^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Lower the initial RTO to 1s as per draft RFC 2988bis-02. 2011-05-19 6:36 ` Benoit Sigoure (?) @ 2011-05-19 17:42 ` Yuchung Cheng -1 siblings, 0 replies; 53+ messages in thread From: Yuchung Cheng @ 2011-05-19 17:42 UTC (permalink / raw) To: Benoit Sigoure; +Cc: netdev, linux-kernel, Benoit Sigoure, Hsiao-keng Jerry Chu Hi Benoit, AFAICT, the passive open side would not fall back the RTO to 3sec in this change because SYNACK timeouts are not recorded in icsk_retransmits but reqsk->retrans? Yuchung On Wed, May 18, 2011 at 11:36 PM, Benoit Sigoure <tsunanet@gmail.com> wrote: > > From: Benoit Sigoure <tsuna@stumbleupon.com> > > Draft RFC 2988bis-02 recommends that the initial RTO be lowered > from 3 seconds down to 1 second, and that in case of a timeout > during the TCP 3WHS, the RTO should fallback to 3 seconds when > data transmission begins. > --- > > On Wed, May 18, 2011 at 10:46 PM, David Miller <davem@davemloft.net> wrote: > > From: tsuna <tsunanet@gmail.com> > > Date: Wed, 18 May 2011 21:33:21 -0700 > > > >> On Wed, May 18, 2011 at 9:14 PM, David Miller <davem@davemloft.net> wrote: > >>> I really would rather see the initial RTO be static and be set to 1 > >>> with fallback RTO of 3. > >> > >> I can also provide a simple patch for this if you want to start from > >> there. And then maybe we can discuss having a runtime knob some more > >> :-) > > > > Yeah why don't we do that :-) > > Alright, here we go. 
> > > include/net/tcp.h | 5 ++++- > net/ipv4/tcp_input.c | 13 +++++++++---- > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/include/net/tcp.h b/include/net/tcp.h > index cda30ea..274d761 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -122,7 +122,10 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); > #endif > #define TCP_RTO_MAX ((unsigned)(120*HZ)) > #define TCP_RTO_MIN ((unsigned)(HZ/5)) > -#define TCP_TIMEOUT_INIT ((unsigned)(3*HZ)) /* RFC 1122 initial RTO value */ > +/* The next 2 values come from Draft RFC 2988bis-02. */ > +#define TCP_TIMEOUT_INIT ((unsigned)(1*HZ)) /* initial RTO value */ > +#define TCP_TIMEOUT_INIT_FALLBACK ((unsigned)(3*HZ)) /* initial RTO to fallback to when > + * a timeout happens during the 3WHS. */ > > #define TCP_RESOURCE_PROBE_INTERVAL ((unsigned)(HZ/2U)) /* Maximal interval between probes > * for local resources. > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index bef9f04..a36bc35 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk) > { > struct tcp_sock *tp = tcp_sk(sk); > struct dst_entry *dst = __sk_dst_get(sk); > + /* If we had to retransmit anything during the 3WHS, use > + * the initial fallback RTO as per draft RFC 2988bis-02. > + */ > + int init_rto = inet_csk(sk)->icsk_retransmits ? > + TCP_TIMEOUT_INIT_FALLBACK : TCP_TIMEOUT_INIT; > > if (dst == NULL) > goto reset; > @@ -890,7 +895,7 @@ static void tcp_init_metrics(struct sock *sk) > if (dst_metric(dst, RTAX_RTT) == 0) > goto reset; > > - if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3)) > + if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (init_rto << 3)) > goto reset; > > /* Initial rtt is determined from SYN,SYN-ACK. 
> @@ -916,7 +921,7 @@ static void tcp_init_metrics(struct sock *sk) > tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk)); > } > tcp_set_rto(sk); > - if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) { > + if (inet_csk(sk)->icsk_rto < init_rto && !tp->rx_opt.saw_tstamp) { > reset: > /* Play conservative. If timestamps are not > * supported, TCP will fail to recalculate correct > @@ -924,8 +929,8 @@ reset: > */ > if (!tp->rx_opt.saw_tstamp && tp->srtt) { > tp->srtt = 0; > - tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT; > - inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT; > + tp->mdev = tp->mdev_max = tp->rttvar = init_rto; > + inet_csk(sk)->icsk_rto = init_rto; > } > } > tp->snd_cwnd = tcp_init_cwnd(tp, dst); > -- > 1.7.0.4 > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH] tcp: Lower the initial RTO to 1s as per draft RFC 2988bis-02. 2011-05-19 5:46 ` David Miller @ 2011-05-19 6:47 ` Benoit Sigoure 2011-05-19 6:47 ` Benoit Sigoure 1 sibling, 0 replies; 53+ messages in thread
From: Benoit Sigoure @ 2011-05-19 6:47 UTC (permalink / raw)
To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann
Cc: netdev, linux-kernel, Benoit Sigoure

Draft RFC 2988bis-02 recommends that the initial RTO be lowered
from 3 seconds down to 1 second, and that in case of a timeout
during the TCP 3WHS, the RTO should fallback to 3 seconds when
data transmission begins.

Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
---

Apologies for the spam, I sent this patch from the wrong address and
without sob'ing it. I build the Linux kernel in a 15G tmpfs (it's
faster this way :D) and I lost my .git/config after a reboot.

 include/net/tcp.h    |    5 ++++-
 net/ipv4/tcp_input.c |   13 +++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cda30ea..274d761 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -122,7 +122,10 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 #endif
 #define TCP_RTO_MAX ((unsigned)(120*HZ))
 #define TCP_RTO_MIN ((unsigned)(HZ/5))
-#define TCP_TIMEOUT_INIT ((unsigned)(3*HZ)) /* RFC 1122 initial RTO value */
+/* The next 2 values come from Draft RFC 2988bis-02. */
+#define TCP_TIMEOUT_INIT ((unsigned)(1*HZ)) /* initial RTO value */
+#define TCP_TIMEOUT_INIT_FALLBACK ((unsigned)(3*HZ)) /* initial RTO to fallback to when
+ * a timeout happens during the 3WHS. */
 
 #define TCP_RESOURCE_PROBE_INTERVAL ((unsigned)(HZ/2U)) /* Maximal interval between probes
 * for local resources.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bef9f04..a36bc35 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct dst_entry *dst = __sk_dst_get(sk);
+	/* If we had to retransmit anything during the 3WHS, use
+	 * the initial fallback RTO as per draft RFC 2988bis-02.
+	 */
+	int init_rto = inet_csk(sk)->icsk_retransmits ?
+		TCP_TIMEOUT_INIT_FALLBACK : TCP_TIMEOUT_INIT;
 
 	if (dst == NULL)
 		goto reset;
@@ -890,7 +895,7 @@ static void tcp_init_metrics(struct sock *sk)
 	if (dst_metric(dst, RTAX_RTT) == 0)
 		goto reset;
 
-	if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
+	if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (init_rto << 3))
 		goto reset;
 
 	/* Initial rtt is determined from SYN,SYN-ACK.
@@ -916,7 +921,7 @@ static void tcp_init_metrics(struct sock *sk)
 		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
 	}
 	tcp_set_rto(sk);
-	if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) {
+	if (inet_csk(sk)->icsk_rto < init_rto && !tp->rx_opt.saw_tstamp) {
 reset:
 		/* Play conservative. If timestamps are not
 		 * supported, TCP will fail to recalculate correct
@@ -924,8 +929,8 @@ reset:
 		 */
 		if (!tp->rx_opt.saw_tstamp && tp->srtt) {
 			tp->srtt = 0;
-			tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
-			inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+			tp->mdev = tp->mdev_max = tp->rttvar = init_rto;
+			inet_csk(sk)->icsk_rto = init_rto;
 		}
 	}
 	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
-- 
1.7.0.4

^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Lower the initial RTO to 1s as per draft RFC 2988bis-02. 2011-05-19 6:47 ` Benoit Sigoure (?) @ 2011-05-19 20:16 ` David Miller -1 siblings, 0 replies; 53+ messages in thread From: David Miller @ 2011-05-19 20:16 UTC (permalink / raw) To: tsunanet Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, alexander.zimmermann, netdev, linux-kernel From: Benoit Sigoure <tsunanet@gmail.com> Date: Wed, 18 May 2011 23:47:49 -0700 > @@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk) > { > struct tcp_sock *tp = tcp_sk(sk); > struct dst_entry *dst = __sk_dst_get(sk); > + /* If we had to retransmit anything during the 3WHS, use > + * the initial fallback RTO as per draft RFC 2988bis-02. > + */ > + int init_rto = inet_csk(sk)->icsk_retransmits ? > + TCP_TIMEOUT_INIT_FALLBACK : TCP_TIMEOUT_INIT; Please do not put comments in the middle of a set of function local variable declarations. Also, as mentioned already, icsk_retransmits is not where SYN retransmissions are counted. It is stored in the TCP minisocket ->retrans field. ^ permalink raw reply [flat|nested] 53+ messages in thread
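The two-level selection under review can be modelled in a few lines of user-space C. This is a minimal sketch, not kernel code: `HZ` is assumed to be 1000, and `pick_init_rto` and its `handshake_retrans` parameter are hypothetical names. The point raised in the replies is about where that count comes from; per those replies, on the passive-open side it would have to be taken from the request (mini) socket's retrans field rather than from `icsk_retransmits`.

```c
#include <assert.h>

#define HZ 1000u /* assumption: timer ticks per second */

/* Values from the patch under discussion. */
#define TCP_TIMEOUT_INIT          ((unsigned)(1*HZ))
#define TCP_TIMEOUT_INIT_FALLBACK ((unsigned)(3*HZ))

/* Hypothetical helper: choose the RTO to start data transmission with,
 * given how many SYN or SYN-ACK retransmissions the handshake needed.
 * The caller is responsible for reading that count from the right place. */
unsigned int pick_init_rto(unsigned int handshake_retrans)
{
	return handshake_retrans ? TCP_TIMEOUT_INIT_FALLBACK : TCP_TIMEOUT_INIT;
}

/* Small self-check showing the expected mapping. */
void pick_init_rto_selfcheck(void)
{
	assert(pick_init_rto(0) == 1 * HZ); /* clean handshake: 1 second */
	assert(pick_init_rto(1) == 3 * HZ); /* any retransmit: fall back to 3 */
}
```

Passing the count in as a parameter keeps the policy (1s versus 3s) separate from the bookkeeping question of which socket structure recorded the retransmissions.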
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 4:33 ` tsuna 2011-05-19 5:46 ` David Miller @ 2011-05-19 6:10 ` Alexander Zimmermann 2011-05-19 6:25 ` tsuna 1 sibling, 1 reply; 53+ messages in thread From: Alexander Zimmermann @ 2011-05-19 6:10 UTC (permalink / raw) To: tsuna Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1115 bytes --] Hi, On 19.05.2011 at 06:33, tsuna wrote: > Presumably if the user decides to tweak these knobs, they'll know > what's appropriate for their environment. Are you sure? I'm not. I fully agree with David that minRTO is something that a user should not control at all. > Or are you suggesting that > one value be derived from the other? (e.g. tcp_initial_fallback_rto = > 3 * tcp_initial_rto) > >> As a result of all of this, I do not really think this is something >> the user should control at all. >> >> I really would rather see the initial RTO be static and be set to 1 >> with fallback RTO of 3. > > I can also provide a simple patch for this if you want to start from > there. And then maybe we can discuss having a runtime knob some more > :-) > > -- > Benoit "tsuna" Sigoure > Software Engineer @ www.StumbleUpon.com // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 6:10 ` [PATCH] tcp: Implement a two-level initial RTO " Alexander Zimmermann @ 2011-05-19 6:25 ` tsuna 2011-05-19 6:36 ` Alexander Zimmermann 0 siblings, 1 reply; 53+ messages in thread From: tsuna @ 2011-05-19 6:25 UTC (permalink / raw) To: Alexander Zimmermann Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, netdev, linux-kernel On Wed, May 18, 2011 at 11:10 PM, Alexander Zimmermann <alexander.zimmermann@comsys.rwth-aachen.de> wrote: > Am 19.05.2011 um 06:33 schrieb tsuna: >> Presumably if the user decides to tweak these knobs, they'll know >> what's appropriate for their environment. > > Are you sure? I'm not. I fully agree with David that minRTO is s/minRTO/initRTO/, right? > something that a user shout not control at all I personally don't like to hold the hand and spoon feed users too much, I want to trust them to be responsible and know what they're doing. Yes, there will always be people who will act stupid and do stupid things with whatever knobs you expose. The web is full of people who advise to tune up all the TCP rmem/wmem parameters to crazy high level based on the voodoo belief that they're going to improve their TCP performance, but then as long as you have knobs in your system, these people will misuse them anyway and shoot themselves in the foot, what can we do about that. There's also a good chunk of people who know what they're doing, and for them compile-time constants are annoying because it's inconvenient to experiment and iterate quickly when you need to recompile your kernel to change a value. If turning the compile time constant into a knob leaves the code reasonably straightforward and doesn't incur too much overhead, then why not do it? 
Regarding this knob in particular, I can imagine that people who are in environment where RTT easily gets around 1s will be upset by the change in the default value, and doubly upset that they have to recompile their kernel to change the value back to 3s. I'm in favor of the reduction of initRTO, for the same reason Google is, but I can also understand that the direction we're taking might not be appropriate for everyone. -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 6:25 ` tsuna @ 2011-05-19 6:36 ` Alexander Zimmermann 2011-05-19 6:42 ` tsuna 0 siblings, 1 reply; 53+ messages in thread From: Alexander Zimmermann @ 2011-05-19 6:36 UTC (permalink / raw) To: tsuna Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2625 bytes --] On 19.05.2011 at 08:25, tsuna wrote: > On Wed, May 18, 2011 at 11:10 PM, Alexander Zimmermann > <alexander.zimmermann@comsys.rwth-aachen.de> wrote: >> Am 19.05.2011 um 06:33 schrieb tsuna: >>> Presumably if the user decides to tweak these knobs, they'll know >>> what's appropriate for their environment. >> >> Are you sure? I'm not. I fully agree with David that minRTO is > > s/minRTO/initRTO/, right? Yes of course :-) > >> something that a user shout not control at all > > I personally don't like to hold the hand and spoon feed users too > much, I want to trust them to be responsible and know what they're > doing. Yes, there will always be people who will act stupid and do > stupid things with whatever knobs you expose. The web is full of > people who advise to tune up all the TCP rmem/wmem parameters to crazy > high level based on the voodoo belief that they're going to improve > their TCP performance, but then as long as you have knobs in your > system, these people will misuse them anyway and shoot themselves in > the foot, what can we do about that. But if you tune rmem/wmem to crazy levels, it's only your own TCP performance that suffers (and maybe the receiver's). If you set initRTO=0.1s, it's good for me but bad for the rest of the world. That's the difference. Or do you want to implement a lower bound of 1s so that you can ensure that nobody sets the initRTO lower than 1s?
> > There's also a good chunk of people who know what they're doing, and > for them compile-time constants are annoying because it's inconvenient > to experiment and iterate quickly when you need to recompile your > kernel to change a value. If turning the compile time constant into a > knob leaves the code reasonably straightforward and doesn't incur too > much overhead, then why not do it? > > Regarding this knob in particular, I can imagine that people who are > in environment where RTT easily gets around 1s will be upset by the > change in the default value, and doubly upset that they have to > recompile their kernel to change the value back to 3s. I'm in favor > of the reduction of initRTO, for the same reason Google is, but I can > also understand that the direction we're taking might not be > appropriate for everyone. > > -- > Benoit "tsuna" Sigoure > Software Engineer @ www.StumbleUpon.com // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr. 55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 6:36 ` Alexander Zimmermann @ 2011-05-19 6:42 ` tsuna 2011-05-19 6:52 ` Alexander Zimmermann 0 siblings, 1 reply; 53+ messages in thread From: tsuna @ 2011-05-19 6:42 UTC (permalink / raw) To: Alexander Zimmermann Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, netdev, linux-kernel On Wed, May 18, 2011 at 11:36 PM, Alexander Zimmermann <alexander.zimmermann@comsys.rwth-aachen.de> wrote: > If you set the initRTO=0.1s, it's good for me but bad for the rest of the > world. That's the difference. > > Or do you want to implement a lower barrier of 1sec so that you can ensure > that nobody set the initRTO lower than 1s? Oh, I see. Yes, there is a lower bound (and an upper bound) on what values the kernel will accept as initRTO. In the patch "Implement a two-level initial RTO as per draft RFC 2988bis-02" above, I re-used TCP_RTO_MIN and TCP_RTO_MAX in net/ipv4/sysctl_net_ipv4.c in order to prevent users from setting an initRTO that's outside this range. They are defined as follows in tcp.h: #define TCP_RTO_MAX ((unsigned)(120*HZ)) #define TCP_RTO_MIN ((unsigned)(HZ/5)) So we're talking about a [200ms ; 120s] range no matter what. -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com ^ permalink raw reply [flat|nested] 53+ messages in thread
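The [200ms ; 120s] range quoted above follows directly from the two macros once `HZ` is fixed. The following user-space sketch reproduces that arithmetic and the clamping; it assumes `HZ` is 1000 ticks per second, and `clamp_rto` and `ticks_to_ms` are hypothetical helper names, not functions from the kernel or from the posted sysctl patch.

```c
#include <assert.h>

#define HZ 1000u /* assumption: timer ticks per second */

/* Bounds as quoted from include/net/tcp.h. */
#define TCP_RTO_MAX ((unsigned)(120*HZ))
#define TCP_RTO_MIN ((unsigned)(HZ/5))

/* Hypothetical helper: force a requested RTO (in ticks) into the
 * [TCP_RTO_MIN, TCP_RTO_MAX] range. */
unsigned int clamp_rto(unsigned int rto_ticks)
{
	assert(TCP_RTO_MIN <= TCP_RTO_MAX); /* sanity check on the bounds */
	if (rto_ticks < TCP_RTO_MIN)
		return TCP_RTO_MIN;
	if (rto_ticks > TCP_RTO_MAX)
		return TCP_RTO_MAX;
	return rto_ticks;
}

/* Hypothetical helper: convert ticks to milliseconds. Only intended
 * for the small values used here; large inputs would overflow. */
unsigned int ticks_to_ms(unsigned int ticks)
{
	return ticks * 1000u / HZ;
}
```

Under this assumption `ticks_to_ms(TCP_RTO_MIN)` is 200 and `ticks_to_ms(TCP_RTO_MAX)` is 120000, so a request for a 100ms initial RTO would be raised to 200ms rather than accepted as-is.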
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02. 2011-05-19 6:42 ` tsuna @ 2011-05-19 6:52 ` Alexander Zimmermann 2011-05-19 7:07 ` tsuna 2011-05-19 8:02 ` Hagen Paul Pfeifer 0 siblings, 2 replies; 53+ messages in thread From: Alexander Zimmermann @ 2011-05-19 6:52 UTC (permalink / raw) To: tsuna Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1462 bytes --] On 19.05.2011 at 08:42, tsuna wrote: > On Wed, May 18, 2011 at 11:36 PM, Alexander Zimmermann > <alexander.zimmermann@comsys.rwth-aachen.de> wrote: >> If you set the initRTO=0.1s, it's good for me but bad for the rest of the >> world. That's the difference. >> >> Or do you want to implement a lower barrier of 1sec so that you can ensure >> that nobody set the initRTO lower than 1s? > > Oh, I see. Yes, there is a lower bound (and an upper bound) on what > values the kernel will accept as initRTO. In the patch "Implement a > two-level initial RTO as per draft RFC 2988bis-02" above, I re-used > TCP_RTO_MIN and TCP_RTO_MAX in net/ipv4/sysctl_net_ipv4.c in order to > prevent users from setting a minRTO that's outside this range. They > are defined as follows in tcp.h: > > #define TCP_RTO_MAX ((unsigned)(120*HZ)) > #define TCP_RTO_MIN ((unsigned)(HZ/5)) > > So we're talking about a [200ms ; 120s] range no matter what. Why is 200ms a valid lower bound for initRTO? I'm aware of measurements showing that 1s is safe for the Internet, but I don't know of any studies showing that 200ms is safe... > > -- > Benoit "tsuna" Sigoure > Software Engineer @ www.StumbleUpon.com // // Dipl.-Inform. Alexander Zimmermann // Department of Computer Science, Informatik 4 // RWTH Aachen University // Ahornstr.
55, 52056 Aachen, Germany // phone: (49-241) 80-21422, fax: (49-241) 80-22222 // email: zimmermann@cs.rwth-aachen.de // web: http://www.umic-mesh.net // [-- Attachment #2: Signierter Teil der Nachricht --] [-- Type: application/pgp-signature, Size: 243 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 6:52 ` Alexander Zimmermann
@ 2011-05-19 7:07 ` tsuna
  2011-05-19 8:02 ` Hagen Paul Pfeifer
  1 sibling, 0 replies; 53+ messages in thread
From: tsuna @ 2011-05-19 7:07 UTC
To: Alexander Zimmermann
Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet, netdev, linux-kernel

On Wed, May 18, 2011 at 11:52 PM, Alexander Zimmermann
<alexander.zimmermann@comsys.rwth-aachen.de> wrote:
>> So we're talking about a [200ms ; 120s] range no matter what.
>
> Why is 200ms a valid lower bound for initRTO? I'm aware of
> measurements showing that 1s is safe for the Internet, but I don't
> know of any studies showing that 200ms is safe...

The constants quoted above aren't specific to the initRTO. They're
used to bound the RTO as it gets adjusted during the TCP session. See
`tcp_set_rto' in tcp_input.c for reference.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 6:52 ` Alexander Zimmermann
@ 2011-05-19 8:02 ` Hagen Paul Pfeifer
  2011-05-19 8:02 ` Hagen Paul Pfeifer
  1 sibling, 0 replies; 53+ messages in thread
From: Hagen Paul Pfeifer @ 2011-05-19 8:02 UTC
To: Alexander Zimmermann
Cc: tsuna, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, netdev, linux-kernel

On Thu, 19 May 2011 08:52:10 +0200, Alexander Zimmermann wrote:

>> #define TCP_RTO_MAX ((unsigned)(120*HZ))
>> #define TCP_RTO_MIN ((unsigned)(HZ/5))
>>
>> So we're talking about a [200ms ; 120s] range no matter what.
>
> Why is 200ms a valid lower bound for initRTO? I'm aware of
> measurements showing that 1s is safe for the Internet, but I don't
> know of any studies showing that 200ms is safe...

TCP_RTO_MIN and TCP_RTO_MAX are the lower and upper bounds for the RTO
in general, not for the initial RTO. RFC 2988 specifies a lower bound
of 1 second, but all operating systems choose a lower one because at
the time RFC 2988 was written the clock granularity was not that
accurate. The minimum RTO for FreeBSD is even 30ms! Furthermore,
analysis has demonstrated that a minimum RTO of 1 second badly breaks
throughput in environments faster than 33kB with a minor packet loss
rate (e.g. 1%).

So yes, it CAN be wise to choose other lower/upper bounds. But keep in
mind that we should NOT artificially limit ourselves. I can imagine
data center scenarios where an initial RTO of <1 matches perfectly.

Hagen
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 8:02 ` Hagen Paul Pfeifer
  (?)
@ 2011-05-19 16:40 ` tsuna
  2011-05-19 16:55 ` Alexander Zimmermann
  -1 siblings, 1 reply; 53+ messages in thread
From: tsuna @ 2011-05-19 16:40 UTC
To: Hagen Paul Pfeifer
Cc: Alexander Zimmermann, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, netdev, linux-kernel

On Thu, May 19, 2011 at 1:02 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> So yes, it CAN be wise to choose other lower/upper bounds. But keep in
> mind that we should NOT artificially limit ourselves. I can imagine
> data center scenarios where an initial RTO of <1 matches perfectly.

Yes, that's exactly the point I was trying to make when talking to
Alexander offline. On today's Internet, RTTs are easily in the
hundreds of ms, and initRTO is 3s, so there's 2 orders of magnitude of
difference. In my environment, if my RTT is ~2µs, an initRTO of 200ms
means that there's a gap of 6 orders of magnitude (!). And yes,
although I don't work for High Frequency Trading companies on Wall
Street, I'm already buying switches full of line-rate 10Gb ports with
a port-to-port latency of 500ns for L2/L3 forwarding/switching. I
expect this kind of network gear will quickly become prevalent in
datacenter/backend environments.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 16:40 ` tsuna
@ 2011-05-19 16:55 ` Alexander Zimmermann
  2011-05-19 17:11 ` tsuna
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Zimmermann @ 2011-05-19 16:55 UTC
To: tsuna
Cc: Hagen Paul Pfeifer, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, netdev, linux-kernel

On 19.05.2011 at 18:40, tsuna wrote:

> On Thu, May 19, 2011 at 1:02 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>> So yes, it CAN be wise to choose other lower/upper bounds. But keep in
>> mind that we should NOT artificially limit ourselves. I can imagine
>> data center scenarios where an initial RTO of <1 matches perfectly.
>
> Yes, that's exactly the point I was trying to make when talking to
> Alexander offline. On today's Internet, RTTs are easily in the
> hundreds of ms, and initRTO is 3s, so there's 2 orders of magnitude of
> difference. In my environment,

Exactly. This is the point. It's *your* environment. However, TCP is
general purpose. And for the wider Internet 1s is known to be safe. See
the measurements in the draft that Mark Allman ran.

> if my RTT is ~2µs, an initRTO of 200ms
> means that there's a gap of 6 orders of magnitude (!).

Currently, initRTO is 3s. So the gap is even larger.

> And yes,
> although I don't work for High Frequency Trading companies on Wall
> Street, I'm already buying switches full of line-rate 10Gb ports with
> a port-to-port latency of 500ns for L2/L3 forwarding/switching. I
> expect this kind of network gear will quickly become prevalent in
> datacenter/backend environments.
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
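The "orders of magnitude" figures traded in the two messages above can be checked with a few lines of integer arithmetic. The sketch below counts the whole powers of ten separating a timeout from an RTT. The helper name `orders_of_magnitude` is hypothetical, and 100 ms is an assumed stand-in for "hundreds of ms"; the other figures (3 s, 200 ms, ~2 µs) are the round numbers from the thread, taken at face value.

```c
/* Hypothetical helper: number of whole powers of ten by which a timeout
 * exceeds an RTT (both in nanoseconds), i.e. floor(log10(timeout / rtt))
 * for timeout >= rtt. Integer-only so it needs no math library. */
static int orders_of_magnitude(unsigned long long timeout_ns,
                               unsigned long long rtt_ns)
{
    int orders = 0;

    /* Scale the RTT up by ten until the timeout is less than 10x larger. */
    while (timeout_ns / rtt_ns >= 10) {
        rtt_ns *= 10;
        orders++;
    }
    return orders;
}

/* Round figures from the thread, in nanoseconds. */
#define NS_PER_US 1000ULL
#define NS_PER_MS 1000000ULL
#define NS_PER_S  1000000000ULL
```

Under these assumptions, `orders_of_magnitude(3 * NS_PER_S, 100 * NS_PER_MS)` gives 1, `orders_of_magnitude(200 * NS_PER_MS, 2 * NS_PER_US)` gives 5, and `orders_of_magnitude(3 * NS_PER_S, 2 * NS_PER_US)` gives 6. So a gap of six orders of magnitude corresponds to the current 3 s initRTO against a ~2 µs RTT, while a 200 ms floor would leave a gap of five.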
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 16:55 ` Alexander Zimmermann
@ 2011-05-19 17:11 ` tsuna
  2011-05-19 19:27 ` David Miller
  0 siblings, 1 reply; 53+ messages in thread
From: tsuna @ 2011-05-19 17:11 UTC
To: Alexander Zimmermann
Cc: Hagen Paul Pfeifer, David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, netdev, linux-kernel

On Thu, May 19, 2011 at 9:55 AM, Alexander Zimmermann
<alexander.zimmermann@comsys.rwth-aachen.de> wrote:
> Exactly. This is the point. It's *your* environment. However, TCP is
> general purpose. And for the wider Internet 1s is known to be safe. See
> the measurements in the draft that Mark Allman ran.

That's right, there's no one-size-fits-all solution. That's why I'm
in favor of keeping a reasonably conservative default (say 1s to 3s,
so we don't break the Internets) and giving people a knob to adjust it
to whatever makes sense for them.

Looking through the kernel, I see that SCTP already has knobs for
this: sctp_rto_initial, sctp_rto_min, sctp_rto_max. You can even
control the constants used to update rttvar and srtt: sctp_rto_alpha,
sctp_rto_beta.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
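For readers who want to see what knobs of this kind control, here is a minimal sketch of the RTO estimator from RFC 2988 with the initial value, the bounds, and the alpha/beta gains as parameters. It is not the kernel's TCP or SCTP code, and how the kernel encodes these values is not shown in this thread. The struct and function names are hypothetical, times are in seconds as floating point for readability, and the clock-granularity term of the RFC formula is left out.

```c
/* Hypothetical tunables, mirroring the kind of knobs listed above. */
struct rto_params {
    double alpha;        /* gain for srtt; RFC 2988 suggests 1/8 */
    double beta;         /* gain for rttvar; RFC 2988 suggests 1/4 */
    double rto_initial;  /* RTO used before any RTT sample exists */
    double rto_min;      /* lower bound applied to the computed RTO */
    double rto_max;      /* upper bound applied to the computed RTO */
};

/* Hypothetical per-connection estimator state, all in seconds. */
struct rto_state {
    double srtt;
    double rttvar;
    double rto;
    int has_sample;
};

static double clamp_rto(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static void rto_init(struct rto_state *s, const struct rto_params *p)
{
    s->srtt = 0.0;
    s->rttvar = 0.0;
    s->rto = p->rto_initial;
    s->has_sample = 0;
}

/* Feed one RTT measurement r (seconds) into the estimator. */
static void rto_on_sample(struct rto_state *s, const struct rto_params *p,
                          double r)
{
    if (!s->has_sample) {
        /* First measurement: SRTT = R, RTTVAR = R/2. */
        s->srtt = r;
        s->rttvar = r / 2.0;
        s->has_sample = 1;
    } else {
        /* RTTVAR is updated first, using the old SRTT. */
        double err = s->srtt - r;
        if (err < 0.0)
            err = -err;
        s->rttvar = (1.0 - p->beta) * s->rttvar + p->beta * err;
        s->srtt = (1.0 - p->alpha) * s->srtt + p->alpha * r;
    }
    /* RTO = SRTT + 4 * RTTVAR, then bounded to [rto_min, rto_max]. */
    s->rto = clamp_rto(s->srtt + 4.0 * s->rttvar, p->rto_min, p->rto_max);
}
```

With a 200 ms minimum, a first sample of 100 ms yields an RTO of about 300 ms, while a first sample of 2 µs yields an unclamped value of 6 µs that is raised to the 200 ms floor. This is the effect the thread is debating for low-latency networks.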
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 17:11 ` tsuna
@ 2011-05-19 19:27 ` David Miller
  2011-05-19 20:30 ` tsuna
  0 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2011-05-19 19:27 UTC
To: tsunanet
Cc: alexander.zimmermann, hagen, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, netdev, linux-kernel

From: tsuna <tsunanet@gmail.com>
Date: Thu, 19 May 2011 10:11:50 -0700

> Looking through the kernel, I see that SCTP already has knobs for
> this: sctp_rto_initial, sctp_rto_min, sctp_rto_max. You can even
> control the constants used to update rttvar and srtt: sctp_rto_alpha,
> sctp_rto_beta

SCTP is 1) not even a sliver of deployment compared to TCP and 2)
doesn't get nearly the same scrutiny on patch review that TCP changes
do.

I basically let the SCTP folks play in their own sandbox, because
frankly SCTP doesn't matter. The only time I care about an SCTP change
is when it has an impact on the rest of the networking code.

So using SCTP as an example of "see we do this already over here" is a
non-starter. Don't do it.
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
  2011-05-19 19:27 ` David Miller
@ 2011-05-19 20:30 ` tsuna
  0 siblings, 0 replies; 53+ messages in thread
From: tsuna @ 2011-05-19 20:30 UTC
To: David Miller
Cc: alexander.zimmermann, hagen, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, netdev, linux-kernel

On Thu, May 19, 2011 at 12:27 PM, David Miller <davem@davemloft.net> wrote:
> So using SCTP as an example of "see we do this already over here" is a
> non-starter. Don't do it.

Fair enough. I hope that the "there's no one-size-fits-all solution"
argument has more weight than "hey SCTP does it". :)

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
  2011-05-18 19:40 ` tsuna
  2011-05-18 19:52 ` David Miller
@ 2011-05-20 2:01 ` H.K. Jerry Chu
  1 sibling, 0 replies; 53+ messages in thread
From: H.K. Jerry Chu @ 2011-05-20 2:01 UTC
To: tsuna
Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hkchu, netdev, linux-kernel

On Wed, May 18, 2011 at 12:40 PM, tsuna <tsunanet@gmail.com> wrote:
> On Wed, May 18, 2011 at 12:26 PM, David Miller <davem@davemloft.net> wrote:
>> If you read the ietf draft that reduces the initial RTO down to 1
>> second, it states that if we take a timeout during the initial
>> connection handshake then we have to revert the RTO back up to 3
>> seconds.
>>
>> This fallback logic conflicts with being able to only change the
>> initial RTO via sysctl, I think. Because there are actually two
>> values at stake and they depend upon each other, the initial RTO and
>> the value we fallback to on initial handshake retransmissions.
>>
>> So I'd rather get a patch that implements the 1 second initial
>> RTO with the 3 second fallback on SYN retransmit, than this patch.
>>
>> We already have too many knobs.
>
> I was hoping this knob would be accepted because this is such an
> important issue that it even warrants an IETF draft to attempt to
> change the standard. I'm not sure how long it will take for this
> draft to be accepted and then implemented, so I thought adding this
> simple knob today would really help in the future.

As one of the co-authors of rfc2988bis I was planning to provide a
patch as soon as the draft gets approved, but it looks like you have
beaten me to it :)

Personally I'm in favor of a knob too. We at Google have had such a
knob for years.

Jerry

>
> Plus, should the draft be accepted, this knob will still be just as
> useful (e.g. to revert back to today's behavior), and people might
> want to consider adding another knob for the fallback initRTO (this is
> debatable). I don't believe this knob conflicts with the proposed
> change to the standard, it actually goes along with it pretty well and
> helps us prepare better for this upcoming change.
>
> I agree that there are too many knobs, and I hate feature creep too,
> but I've found many of these knobs to be really useful, and the degree
> to which Linux's TCP stack can be tuned is part of what makes it so
> versatile.
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
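The two-level behaviour David Miller describes in the quoted text (1 second initially, reverting to 3 seconds if the handshake saw a timeout) can be sketched as a small state machine. This is only an illustration of that description, not the code from any patch in this thread: the struct, the function names, and the millisecond constants are hypothetical, and the exponential backoff applied between SYN retransmissions is deliberately left out.

```c
#include <stdbool.h>

/* Values taken from the description quoted above; the names are hypothetical. */
#define SKETCH_INIT_RTO_MS     1000u  /* initial RTO proposed by the draft */
#define SKETCH_FALLBACK_RTO_MS 3000u  /* RTO to revert to after a handshake timeout */

/* Hypothetical per-connection handshake state. */
struct handshake_sketch {
    bool syn_timed_out;  /* did any SYN retransmission timer expire? */
};

static void handshake_start(struct handshake_sketch *h)
{
    h->syn_timed_out = false;
}

/* Called when the retransmission timer fires while waiting for the SYN's ACK. */
static void handshake_on_syn_timeout(struct handshake_sketch *h)
{
    h->syn_timed_out = true;
}

/* RTO to start from once the handshake has completed: the lowered initial
 * value if the handshake went through cleanly, the conservative fallback
 * if a timeout occurred along the way. */
static unsigned rto_after_handshake_ms(const struct handshake_sketch *h)
{
    return h->syn_timed_out ? SKETCH_FALLBACK_RTO_MS : SKETCH_INIT_RTO_MS;
}
```

Written this way, the dependency between the two values is visible: a single sysctl that only changes the initial value leaves open what the fallback should be, which is the conflict raised in the quoted message.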