* [net-next,v1] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy
@ 2018-07-03 7:21 Jon Maxwell
2018-07-03 14:13 ` Neal Cardwell
0 siblings, 1 reply; 3+ messages in thread
From: Jon Maxwell @ 2018-07-03 7:21 UTC (permalink / raw)
To: davem
Cc: ncardwell, edumazet, kuznet, yoshfuji, netdev, linux-kernel, jmaxwell
v1 contains the following suggestions by Neal Cardwell:
1) Fix up units mismatch regarding msec/jiffies.
2) Address possiblility of time_remaining being negative.
3) Add a helper routine to do the rto calculation.
Every time the TCP retransmission timer fires. It checks to see if there is a
timeout before scheduling the next retransmit timer. The retransmit interval
between each retransmission increases exponentially. The issue is that in order
for the timeout to occur the retransmit timer needs to fire again. If the user
timeout check happens after the 9th retransmit for example. It needs to wait for
the 10th retransmit timer to fire in order to evaluate whether a timeout has
occurred or not. If the interval is large enough then the timeout will be
inaccurate.
For example with a TCP_USER_TIMEOUT of 10 seconds without patch:
1st retransmit:
22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.]
Last retransmit:
22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.]
Timeout:
send: Connection timed out
Sun Jul 1 22:25:34 EDT 2018
We can see that last retransmit took ~7 seconds. Which pushed the total
timeout to ~15 seconds instead of the expected 10 seconds. This gets more
inaccurate the larger the TCP_USER_TIMEOUT value. As the interval increases.
Add tcp_clamp_rto_to_user_timeout() to determine if the user rto has expired.
Or whether the rto interval needs to be recalculated. Use the original interval
if user rto is not set.
Test results with the patch is the expected 10 second timeout:
1st retransmit:
01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.]
Last retransmit:
01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.]
Timeout:
send: Connection timed out
Mon Jul 2 01:38:09 EDT 2018
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
net/ipv4/tcp_timer.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 3b3611729928..82c2a3b3713c 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -22,6 +22,23 @@
#include <linux/gfp.h>
#include <net/tcp.h>
+static __u32 tcp_clamp_rto_to_user_timeout(struct sock *sk)
+{
+ struct inet_connection_sock *icsk = inet_csk(sk);
+ __u32 rto = icsk->icsk_rto;
+ __u32 elapsed, user_timeout;
+
+ if (!icsk->icsk_user_timeout)
+ return rto;
+ elapsed = tcp_time_stamp(tcp_sk(sk)) - tcp_sk(sk)->retrans_stamp;
+ user_timeout = jiffies_to_msecs(icsk->icsk_user_timeout);
+ if (elapsed >= user_timeout)
+ rto = 1; /* user timeout has passed; fire ASAP */
+ else
+ rto = min(rto, (__u32)msecs_to_jiffies(user_timeout - elapsed));
+ return rto;
+}
+
/**
* tcp_write_err() - close socket and save error info
* @sk: The socket the error has appeared on.
@@ -407,6 +424,7 @@ void tcp_retransmit_timer(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
+ __u32 rto;
if (tp->fastopen_rsk) {
WARN_ON_ONCE(sk->sk_state != TCP_SYN_RECV &&
@@ -535,7 +553,8 @@ void tcp_retransmit_timer(struct sock *sk)
/* Use normal (exponential) backoff */
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
}
- inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
+ rto = tcp_clamp_rto_to_user_timeout(sk);
+ inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto, TCP_RTO_MAX);
if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1 + 1, 0))
__sk_dst_reset(sk);
--
2.13.6
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [net-next,v1] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy
2018-07-03 7:21 [net-next,v1] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy Jon Maxwell
@ 2018-07-03 14:13 ` Neal Cardwell
2018-07-03 23:03 ` Jonathan Maxwell
0 siblings, 1 reply; 3+ messages in thread
From: Neal Cardwell @ 2018-07-03 14:13 UTC (permalink / raw)
To: Jonathan Maxwell
Cc: David Miller, Eric Dumazet, Alexey Kuznetsov, Hideaki YOSHIFUJI,
Netdev, LKML, jmaxwell
On Tue, Jul 3, 2018 at 3:21 AM Jon Maxwell <jmaxwell37@gmail.com> wrote:
>
> v1 contains the following suggestions by Neal Cardwell:
>
> 1) Fix up units mismatch regarding msec/jiffies.
> 2) Address possiblility of time_remaining being negative.
> 3) Add a helper routine to do the rto calculation.
>
> Every time the TCP retransmission timer fires. It checks to see if there is a
> timeout before scheduling the next retransmit timer. The retransmit interval
> between each retransmission increases exponentially. The issue is that in order
> for the timeout to occur the retransmit timer needs to fire again. If the user
> timeout check happens after the 9th retransmit for example. It needs to wait for
> the 10th retransmit timer to fire in order to evaluate whether a timeout has
> occurred or not. If the interval is large enough then the timeout will be
> inaccurate.
>
> For example with a TCP_USER_TIMEOUT of 10 seconds without patch:
>
> 1st retransmit:
>
> 22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.]
>
> Last retransmit:
>
> 22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.]
>
> Timeout:
>
> send: Connection timed out
> Sun Jul 1 22:25:34 EDT 2018
>
> We can see that last retransmit took ~7 seconds. Which pushed the total
> timeout to ~15 seconds instead of the expected 10 seconds. This gets more
> inaccurate the larger the TCP_USER_TIMEOUT value. As the interval increases.
>
> Add tcp_clamp_rto_to_user_timeout() to determine if the user rto has expired.
> Or whether the rto interval needs to be recalculated. Use the original interval
> if user rto is not set.
>
> Test results with the patch is the expected 10 second timeout:
>
> 1st retransmit:
>
> 01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.]
>
> Last retransmit:
>
> 01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.]
>
> Timeout:
>
> send: Connection timed out
> Mon Jul 2 01:38:09 EDT 2018
>
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> ---
> net/ipv4/tcp_timer.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 3b3611729928..82c2a3b3713c 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -22,6 +22,23 @@
> #include <linux/gfp.h>
> #include <net/tcp.h>
>
> +static __u32 tcp_clamp_rto_to_user_timeout(struct sock *sk)
> +{
> + struct inet_connection_sock *icsk = inet_csk(sk);
> + __u32 rto = icsk->icsk_rto;
> + __u32 elapsed, user_timeout;
> +
> + if (!icsk->icsk_user_timeout)
> + return rto;
> + elapsed = tcp_time_stamp(tcp_sk(sk)) - tcp_sk(sk)->retrans_stamp;
Thanks. The local logic seems OK to me now, but from reading
retransmits_timed_out() it looks like at this point in the code we are
not guaranteed that tcp_sk(sk)->retrans_stamp is initialized to
something non-zero. So we probably need a preceding preparatory patch
that factors out the first few lines of retransmits_timed_out() into
a helper frunction to get the start_ts for use in this calculation.
Perhaps:
u32 tcp_retrans_stamp():
start_ts = tcp_sk(sk)->retrans_stamp;
if (unlikely(!start_ts)) {
head = tcp_rtx_queue_head(sk);
if (!head)
return 0;
start_ts = tcp_skb_timestamp(head);
}
return start_ts;
And then the new tcp_clamp_rto_to_user_timeout() can use the helper:
...
retrans_stamp = tcp_retransmit_stamp(sk);
if (!retrans_stamp)
return rto;
elapsed = tcp_time_stamp(tcp_sk(sk)) - retrans_stamp;
...
Eric wrote those lines to recalculate start_ts, so we may want to wait
until Eric returns to review this before merging the resulting patch
series.
neal
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [net-next,v1] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy
2018-07-03 14:13 ` Neal Cardwell
@ 2018-07-03 23:03 ` Jonathan Maxwell
0 siblings, 0 replies; 3+ messages in thread
From: Jonathan Maxwell @ 2018-07-03 23:03 UTC (permalink / raw)
To: Neal Cardwell
Cc: David Miller, Eric Dumazet, Alexey Kuznetsov, Hideaki YOSHIFUJI,
Netdev, LKML, Jon Maxwell
On Wed, Jul 4, 2018 at 12:13 AM, Neal Cardwell <ncardwell@google.com> wrote:
> On Tue, Jul 3, 2018 at 3:21 AM Jon Maxwell <jmaxwell37@gmail.com> wrote:
>>
>> v1 contains the following suggestions by Neal Cardwell:
>>
>> 1) Fix up units mismatch regarding msec/jiffies.
>> 2) Address possiblility of time_remaining being negative.
>> 3) Add a helper routine to do the rto calculation.
>>
>> Every time the TCP retransmission timer fires. It checks to see if there is a
>> timeout before scheduling the next retransmit timer. The retransmit interval
>> between each retransmission increases exponentially. The issue is that in order
>> for the timeout to occur the retransmit timer needs to fire again. If the user
>> timeout check happens after the 9th retransmit for example. It needs to wait for
>> the 10th retransmit timer to fire in order to evaluate whether a timeout has
>> occurred or not. If the interval is large enough then the timeout will be
>> inaccurate.
>>
>> For example with a TCP_USER_TIMEOUT of 10 seconds without patch:
>>
>> 1st retransmit:
>>
>> 22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.]
>>
>> Last retransmit:
>>
>> 22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.]
>>
>> Timeout:
>>
>> send: Connection timed out
>> Sun Jul 1 22:25:34 EDT 2018
>>
>> We can see that last retransmit took ~7 seconds. Which pushed the total
>> timeout to ~15 seconds instead of the expected 10 seconds. This gets more
>> inaccurate the larger the TCP_USER_TIMEOUT value. As the interval increases.
>>
>> Add tcp_clamp_rto_to_user_timeout() to determine if the user rto has expired.
>> Or whether the rto interval needs to be recalculated. Use the original interval
>> if user rto is not set.
>>
>> Test results with the patch is the expected 10 second timeout:
>>
>> 1st retransmit:
>>
>> 01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.]
>>
>> Last retransmit:
>>
>> 01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.]
>>
>> Timeout:
>>
>> send: Connection timed out
>> Mon Jul 2 01:38:09 EDT 2018
>>
>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>> ---
>> net/ipv4/tcp_timer.c | 21 ++++++++++++++++++++-
>> 1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
>> index 3b3611729928..82c2a3b3713c 100644
>> --- a/net/ipv4/tcp_timer.c
>> +++ b/net/ipv4/tcp_timer.c
>> @@ -22,6 +22,23 @@
>> #include <linux/gfp.h>
>> #include <net/tcp.h>
>>
>> +static __u32 tcp_clamp_rto_to_user_timeout(struct sock *sk)
>> +{
>> + struct inet_connection_sock *icsk = inet_csk(sk);
>> + __u32 rto = icsk->icsk_rto;
>> + __u32 elapsed, user_timeout;
>> +
>> + if (!icsk->icsk_user_timeout)
>> + return rto;
>> + elapsed = tcp_time_stamp(tcp_sk(sk)) - tcp_sk(sk)->retrans_stamp;
>
> Thanks. The local logic seems OK to me now, but from reading
> retransmits_timed_out() it looks like at this point in the code we are
> not guaranteed that tcp_sk(sk)->retrans_stamp is initialized to
> something non-zero. So we probably need a preceding preparatory patch
> that factors out the first few lines of retransmits_timed_out() into
> a helper frunction to get the start_ts for use in this calculation.
> Perhaps:
>
> u32 tcp_retrans_stamp():
> start_ts = tcp_sk(sk)->retrans_stamp;
> if (unlikely(!start_ts)) {
> head = tcp_rtx_queue_head(sk);
> if (!head)
> return 0;
> start_ts = tcp_skb_timestamp(head);
> }
> return start_ts;
>
> And then the new tcp_clamp_rto_to_user_timeout() can use the helper:
>
> ...
> retrans_stamp = tcp_retransmit_stamp(sk);
> if (!retrans_stamp)
> return rto;
> elapsed = tcp_time_stamp(tcp_sk(sk)) - retrans_stamp;
> ...
>
> Eric wrote those lines to recalculate start_ts, so we may want to wait
> until Eric returns to review this before merging the resulting patch
> series.
>
You are right. tcp_clamp_rto_to_user_timeout() should do the
same check as retransmits_timed_out() in regards to
tcp_sk(sk)->retrans_stamp. I'll add that to v2/test and then Eric can
comment on that if he has any input when he returns.
Thanks for all your help.
Regards
Jon
> neal
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-07-03 23:04 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-03 7:21 [net-next,v1] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy Jon Maxwell
2018-07-03 14:13 ` Neal Cardwell
2018-07-03 23:03 ` Jonathan Maxwell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).