netdev.vger.kernel.org archive mirror
* TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries
From: Marek Majkowski @ 2019-09-25  8:46 UTC
  To: netdev

Hello my favorite mailing list!

Recently I've been looking into TCP_USER_TIMEOUT and noticed some
strange behaviour on fresh sockets in SYN-SENT state. Full writeup:
https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/

Here's a reproducer. It does a simple thing: sets TCP_USER_TIMEOUT and
does connect() to a blackholed IP:

$ wget https://gist.githubusercontent.com/majek/b4ad53c5795b226d62fad1fa4a87151a/raw/cbb928cb99cd6c5aa9f73ba2d3bc0aef22fbc2bf/user-timeout-and-syn.py
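
For readers who do not want to fetch the gist, the core of such a reproducer can be sketched roughly as follows. This is not the gist's actual code: the helper names make_socket and try_connect are made up for illustration, and the numeric fallback 18 is the Linux value of TCP_USER_TIMEOUT from <linux/tcp.h>, used only if the Python build does not export the constant.

```python
import errno
import socket
import time

# Linux value of TCP_USER_TIMEOUT from <linux/tcp.h>; fallback for older Pythons.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def make_socket(user_timeout_ms):
    """Create a TCP socket with TCP_USER_TIMEOUT set (value in milliseconds)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_ms)
    return s


def try_connect(addr, user_timeout_ms=5000):
    """Blocking connect(); returns (errno name or None, elapsed seconds)."""
    s = make_socket(user_timeout_ms)
    start = time.monotonic()
    try:
        s.connect(addr)
        err = None
    except OSError as e:
        err = errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()
    return err, time.monotonic() - start
```

Calling try_connect(("244.0.0.1", 1234)) against an address whose packets are silently dropped should, going by the report in this message, return ETIMEDOUT after roughly five seconds. The packet trace has to be captured separately, for example with tcpdump.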

$ sudo python3 user-timeout-and-syn.py
00:00.000000 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
00:01.007053 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
00:03.023051 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
00:05.007096 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
00:05.015037 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
00:05.023020 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]
00:05.034983 IP 192.1.1.1.52974 > 244.0.0.1.1234: Flags [S]

The connect() times out with ETIMEDOUT after 5 seconds - as intended.
But Linux (5.3.0-rc3) does something weird on the network - it sends
remaining tcp_syn_retries packets aligned to the 5s mark.

In other words: with TCP_USER_TIMEOUT we are sending spurious SYN
packets on a timeout.
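
To make the pattern easier to see, here is a small sketch that computes the gaps between the SYNs in the trace above and sets them next to a nominal exponential backoff. The timestamps are copied from the capture; the 1 s initial RTO with doubling is an assumption about the stack defaults, not something measured here.

```python
CAPTURE = """\
00:00.000000
00:01.007053
00:03.023051
00:05.007096
00:05.015037
00:05.023020
00:05.034983
"""


def to_us(stamp):
    """Convert an MM:SS.uuuuuu timestamp to integer microseconds."""
    minutes, rest = stamp.split(":")
    seconds, micros = rest.split(".")
    return (int(minutes) * 60 + int(seconds)) * 1_000_000 + int(micros)


times = [to_us(line) for line in CAPTURE.split()]
gaps = [b - a for a, b in zip(times, times[1:])]
# Nominal exponential backoff: 1 s, 2 s, 4 s, ... (assumed 1 s initial RTO).
nominal = [1_000_000 * 2 ** i for i in range(len(gaps))]

for i, (gap, want) in enumerate(zip(gaps, nominal), start=1):
    print(f"retransmit {i}: gap {gap / 1e6:9.6f} s, nominal backoff {want / 1e6:4.0f} s")

# SYNs sent at or after the 5 s user timeout.
late = [t for t in times if t >= 5_000_000]
print(len(late), "SYNs at or after the 5 s mark")
```

The first two gaps follow the backoff, the third is cut short so that a SYN lands on the 5 s mark, and the last three are only about 8 to 12 ms apart.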

For the record, the man page doesn't define what TCP_USER_TIMEOUT does
on SYN-SENT state.

Cheers,
Marek


* Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries
From: Eric Dumazet @ 2019-09-26 15:05 UTC
  To: Marek Majkowski, netdev



On 9/25/19 1:46 AM, Marek Majkowski wrote:
> For the record, the man page doesn't define what TCP_USER_TIMEOUT does
> on SYN-SENT state.
> 

Exactly, so far this option has only been used on established flows.

Feel free to send patches if you need to override the stack behavior
for connection establishment (Same remark for passive side...)

Thanks.



* Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries
From: Eric Dumazet @ 2019-09-26 16:46 UTC
  To: Marek Majkowski, netdev



On 9/26/19 8:05 AM, Eric Dumazet wrote:
> 
> Exactly, so far this option has only been used on established flows.
> 
> Feel free to send patches if you need to override the stack behavior
> for connection establishment (Same remark for passive side...)

Also, please take a look at TCP_SYNCNT, which predates TCP_USER_TIMEOUT.
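
As a rough illustration of that suggestion, the sketch below sets TCP_SYNCNT on a socket and estimates how long connect() would keep trying. The helper names are hypothetical, and the estimate assumes a 1 s initial RTO that doubles on every retransmission and ignores any upper cap on the RTO, so it only holds for small retry counts.

```python
import socket


def syn_give_up_after(syn_retries, initial_rto=1.0):
    """Seconds until connect() gives up, assuming the RTO doubles each time.

    The initial SYN plus `syn_retries` retransmissions are sent; the attempt
    is abandoned when the timer fires after the last retransmission.
    """
    return initial_rto * (2 ** (syn_retries + 1) - 1)


def socket_with_syncnt(syn_retries):
    """Create a TCP socket that retransmits its SYN at most `syn_retries` times."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, syn_retries)
    return s


s = socket_with_syncnt(2)
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT))
s.close()
print(syn_give_up_after(2), syn_give_up_after(6))
```

Under these assumptions two retries give up after about 7 s and six retries after about 127 s, so TCP_SYNCNT only offers coarse steps compared with a millisecond timeout.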




* Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries
From: Eric Dumazet @ 2019-09-26 16:57 UTC
  To: Eric Dumazet, Marek Majkowski, netdev



On 9/26/19 9:46 AM, Eric Dumazet wrote:
> Also, please take a look at TCP_SYNCNT, which predates TCP_USER_TIMEOUT.
> 
> 

I will test the following:

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index dbd9d2d0ee63aa46ad2dda417da6ec9409442b77..1182e51a6b794d75beb8c130354d7804fc83a307 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -220,7 +220,6 @@ static int tcp_write_timeout(struct sock *sk)
                        sk_rethink_txhash(sk);
                }
                retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
-               expired = icsk->icsk_retransmits >= retry_until;
        } else {
                if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
                        /* Black hole detection */
@@ -242,9 +241,9 @@ static int tcp_write_timeout(struct sock *sk)
                        if (tcp_out_of_resources(sk, do_reset))
                                return 1;
                }
-               expired = retransmits_timed_out(sk, retry_until,
-                                               icsk->icsk_user_timeout);
        }
+       expired = retransmits_timed_out(sk, retry_until,
+                                       icsk->icsk_user_timeout);
        tcp_fastopen_active_detect_blackhole(sk, expired);
 
        if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))


* Re: TCP_USER_TIMEOUT, SYN-SENT and tcp_syn_retries
From: Eric Dumazet @ 2019-09-26 18:03 UTC
  To: Marek Majkowski, netdev



On 9/26/19 9:57 AM, Eric Dumazet wrote:
> I will test the following:
> 
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index dbd9d2d0ee63aa46ad2dda417da6ec9409442b77..1182e51a6b794d75beb8c130354d7804fc83a307 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -220,7 +220,6 @@ static int tcp_write_timeout(struct sock *sk)
>                         sk_rethink_txhash(sk);
>                 }
>                 retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
> -               expired = icsk->icsk_retransmits >= retry_until;
>         } else {
>                 if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
>                         /* Black hole detection */
> @@ -242,9 +241,9 @@ static int tcp_write_timeout(struct sock *sk)
>                         if (tcp_out_of_resources(sk, do_reset))
>                                 return 1;
>                 }
> -               expired = retransmits_timed_out(sk, retry_until,
> -                                               icsk->icsk_user_timeout);
>         }
> +       expired = retransmits_timed_out(sk, retry_until,
> +                                       icsk->icsk_user_timeout);
>         tcp_fastopen_active_detect_blackhole(sk, expired);
>  
>         if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
> 

The patch works well, but rereading the man page, I see that the existing behavior has
been clearly documented.

If we change the behavior, we might break applications that set TCP_USER_TIMEOUT
on the listener, expecting the value to be inherited by children at accept() time
but not expecting it to change SYNACK retransmission behavior.
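
The inheritance such applications rely on can be checked over loopback with a sketch like this one. The function name is made up for illustration, and the fallback value 18 is the Linux constant from <linux/tcp.h>, used only if Python does not export TCP_USER_TIMEOUT.

```python
import socket

TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)  # Linux value


def inherited_user_timeout(timeout_ms):
    """Set TCP_USER_TIMEOUT on a listener and read it back from an accepted child."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    child, _ = listener.accept()
    try:
        return child.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)
    finally:
        child.close()
        client.close()
        listener.close()


print(inherited_user_timeout(30000))
```

If the value read from the child matches the one set on the listener, any change that makes the timeout effective in SYN_RECV would also apply to the handshake of every connection that listener accepts.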

On the other hand, Jon Maxwell's patch (tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy)
has added the odd side effect of sending the remaining SYNs every jiffy:


     remaining = icsk->icsk_user_timeout - elapsed;
     if (remaining <= 0)
         return 1; /* user timeout has passed; fire ASAP */ 

So we should probably just extend TCP_USER_TIMEOUT to the SYN_SENT/SYN_RECV states
and change the man page accordingly.
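
A simplified model of the SYN retransmit timer may help to compare the two behaviors. It is a sketch, not the kernel logic: the function and its parameters are invented for illustration, time is kept in milliseconds, a jiffy is assumed to be 4 ms (HZ=250), and the initial RTO is assumed to be 1 s.

```python
def syn_schedule(user_timeout_ms, syn_retries=6, initial_rto_ms=1000,
                 jiffy_ms=4, apply_to_syn=False):
    """Return the times (ms) at which SYNs are sent by a simplified timer model.

    apply_to_syn=False models the behaviour reported in this thread: only the
    retry count ends the attempt, but the RTO is still clamped to the user
    timeout.  apply_to_syn=True models the proposed extension, where the user
    timeout also ends the attempt.
    """
    def clamp(rto, elapsed):
        # Mirrors the quoted helper: once the user timeout has passed,
        # fire again after one jiffy.
        if not user_timeout_ms:
            return rto
        remaining = user_timeout_ms - elapsed
        if remaining <= 0:
            return jiffy_ms
        return min(rto, remaining)

    sent = [0]                      # the initial SYN
    now, rto, retransmits = 0, initial_rto_ms, 0
    while True:
        now += clamp(rto, now)      # retransmit timer fires
        expired = retransmits >= syn_retries
        if apply_to_syn and user_timeout_ms and now >= user_timeout_ms:
            expired = True
        if expired:
            return sent
        sent.append(now)
        retransmits += 1
        rto *= 2                    # exponential backoff


print(syn_schedule(5000))
print(syn_schedule(5000, apply_to_syn=True))
```

With a 5000 ms timeout the first call yields seven SYNs, four of them at or just after the 5 s mark, which has the same shape as the reported trace. The second call stops after the SYN sent at 3 s.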



       TCP_USER_TIMEOUT (since Linux 2.6.37)
              This option takes an unsigned int as an argument.  When the
              value is greater than 0, it specifies the maximum amount of
              time in milliseconds that transmitted data may remain
              unacknowledged before TCP will forcibly close the
              corresponding connection and return ETIMEDOUT to the
              application.  If the option value is specified as 0, TCP will
              use the system default.

              Increasing user timeouts allows a TCP connection to survive
              extended periods without end-to-end connectivity.  Decreasing
              user timeouts allows applications to "fail fast", if so
              desired.  Otherwise, failure may take up to 20 minutes with
              the current system defaults in a normal WAN environment.

              This option can be set during any state of a TCP connection,
              but is effective only during the synchronized states of a
              connection (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT,
              CLOSING, and LAST-ACK).  Moreover, when used with the TCP
              keepalive (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will
              override keepalive to determine when to close a connection
              due to keepalive failure.

              The option has no effect on when TCP retransmits a packet, nor
              when a keepalive probe is sent.

              This option, like many others, will be inherited by the socket
              returned by accept(2), if it was set on the listening socket.

              Further details on the user timeout feature can be found in
              RFC 793 and RFC 5482 ("TCP User Timeout Option").


