All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
@ 2020-04-21 12:17 Cambda Zhu
  2020-04-23  2:19 ` David Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Cambda Zhu @ 2020-04-21 12:17 UTC (permalink / raw)
  To: netdev; +Cc: Dust Li, Tony Lu, Cambda Zhu

This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
option has same behavior as TCP_LINGER2, except the tp->linger2 value
can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
with CAP_NET_ADMIN.

As a server, different sockets may need different FIN-WAIT timeout and
in most cases the system default value will be used. The timeout can
be adjusted by setting TCP_LINGER2 but cannot be greater than the
system default value. If one socket needs a timeout greater than the
default, we have to adjust the sysctl which affects all sockets using
the system default value. And if we want to adjust it for just one
socket and keep the original value for others, all the other sockets
have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
set greater tp->linger2 than the default for one socket and keep
the sysctl_tcp_fin_timeout unchanged.

Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
---
 include/uapi/linux/capability.h | 1 +
 include/uapi/linux/tcp.h        | 1 +
 net/ipv4/tcp.c                  | 9 +++++++++
 3 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 272dc69fa080..0e30c9756a04 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
 /* Allow multicasting */
 /* Allow read/write of device-specific registers */
 /* Allow activation of ATM control sockets */
+/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
 
 #define CAP_NET_ADMIN        12
 
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index f2acb2566333..e21e0ce98ca1 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -128,6 +128,7 @@ enum {
 #define TCP_CM_INQ		TCP_INQ
 
 #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
+#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
 
 
 #define TCP_REPAIR_ON		1
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6d87de434377..898a675d863e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3149,6 +3149,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			tcp_enable_tx_delay();
 		tp->tcp_tx_delay = val;
 		break;
+	case TCP_FORCE_LINGER2:
+		if (val < 0)
+			tp->linger2 = -1;
+		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
+			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+			tp->linger2 = 0;
+		else
+			tp->linger2 = val * HZ;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-21 12:17 [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt Cambda Zhu
@ 2020-04-23  2:19 ` David Miller
  2020-04-23  3:01 ` Eric Dumazet
  2020-04-23  8:52 ` [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt David Laight
  2 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2020-04-23  2:19 UTC (permalink / raw)
  To: cambda; +Cc: netdev, dust.li, tonylu, edumazet

From: Cambda Zhu <cambda@linux.alibaba.com>
Date: Tue, 21 Apr 2020 20:17:37 +0800

> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
> 
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
> 
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>

Eric, please review.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-21 12:17 [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt Cambda Zhu
  2020-04-23  2:19 ` David Miller
@ 2020-04-23  3:01 ` Eric Dumazet
  2020-04-23  7:35   ` [PATCH net-next v2] " Cambda Zhu
  2020-04-23  8:52 ` [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt David Laight
  2 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2020-04-23  3:01 UTC (permalink / raw)
  To: Cambda Zhu, netdev; +Cc: Dust Li, Tony Lu



On 4/21/20 5:17 AM, Cambda Zhu wrote:
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
> 
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
> 
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
> ---
>  include/uapi/linux/capability.h | 1 +
>  include/uapi/linux/tcp.h        | 1 +
>  net/ipv4/tcp.c                  | 9 +++++++++
>  3 files changed, 11 insertions(+)
> 
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 272dc69fa080..0e30c9756a04 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>  /* Allow multicasting */
>  /* Allow read/write of device-specific registers */
>  /* Allow activation of ATM control sockets */
> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>  
>  #define CAP_NET_ADMIN        12
>  
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index f2acb2566333..e21e0ce98ca1 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -128,6 +128,7 @@ enum {
>  #define TCP_CM_INQ		TCP_INQ
>  
>  #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
> +#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>  
>  
>  #define TCP_REPAIR_ON		1
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377..898a675d863e 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3149,6 +3149,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  			tcp_enable_tx_delay();
>  		tp->tcp_tx_delay = val;
>  		break;
> +	case TCP_FORCE_LINGER2:
> +		if (val < 0)
> +			tp->linger2 = -1;
> +		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
> +			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +			tp->linger2 = 0;
> +		else
> +			tp->linger2 = val * HZ;

This multiply could overflow.

Since tp->linger2 is an int, and a negative value has a specific meaning,
you probably should have some sanity checks.

Even if the old TCP_LINGER2 silently put a 0,
maybe a new option should return an error if val*HZ would overflow.



> +		break;
>  	default:
>  		err = -ENOPROTOOPT;
>  		break;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23  3:01 ` Eric Dumazet
@ 2020-04-23  7:35   ` Cambda Zhu
  2020-04-23 13:40     ` Eric Dumazet
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Cambda Zhu @ 2020-04-23  7:35 UTC (permalink / raw)
  To: netdev, Eric Dumazet; +Cc: Dust Li, Tony Lu, Cambda Zhu

This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
option has same behavior as TCP_LINGER2, except the tp->linger2 value
can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
with CAP_NET_ADMIN.

As a server, different sockets may need different FIN-WAIT timeout and
in most cases the system default value will be used. The timeout can
be adjusted by setting TCP_LINGER2 but cannot be greater than the
system default value. If one socket needs a timeout greater than the
default, we have to adjust the sysctl which affects all sockets using
the system default value. And if we want to adjust it for just one
socket and keep the original value for others, all the other sockets
have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
set greater tp->linger2 than the default for one socket and keep
the sysctl_tcp_fin_timeout unchanged.

Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
---
 Changes in v2:
   - Add int overflow check.

 include/uapi/linux/capability.h |  1 +
 include/uapi/linux/tcp.h        |  1 +
 net/ipv4/tcp.c                  | 11 +++++++++++
 3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 272dc69fa080..0e30c9756a04 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
 /* Allow multicasting */
 /* Allow read/write of device-specific registers */
 /* Allow activation of ATM control sockets */
+/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
 
 #define CAP_NET_ADMIN        12
 
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index f2acb2566333..e21e0ce98ca1 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -128,6 +128,7 @@ enum {
 #define TCP_CM_INQ		TCP_INQ
 
 #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
+#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
 
 
 #define TCP_REPAIR_ON		1
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6d87de434377..d8cd1fd66bc1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			tcp_enable_tx_delay();
 		tp->tcp_tx_delay = val;
 		break;
+	case TCP_FORCE_LINGER2:
+		if (val < 0)
+			tp->linger2 = -1;
+		else if (val > INT_MAX / HZ)
+			err = -EINVAL;
+		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
+			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+			tp->linger2 = 0;
+		else
+			tp->linger2 = val * HZ;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-21 12:17 [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt Cambda Zhu
  2020-04-23  2:19 ` David Miller
  2020-04-23  3:01 ` Eric Dumazet
@ 2020-04-23  8:52 ` David Laight
  2020-04-23 10:33   ` Cambda Zhu
  2 siblings, 1 reply; 15+ messages in thread
From: David Laight @ 2020-04-23  8:52 UTC (permalink / raw)
  To: 'Cambda Zhu', netdev; +Cc: Dust Li, Tony Lu

From: Cambda Zhu
> Sent: 21 April 2020 13:18
> 
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.

Did you consider adding an extra sysctl so that the limit
for TCP_LINGER2 could be greater than the default?

It might even be sensible to set the limit to a few times
the default.

All users can set the socket buffer sizes to twice the default.
Being able to do the same for the linger timeout wouldn't be a problem.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23  8:52 ` [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt David Laight
@ 2020-04-23 10:33   ` Cambda Zhu
  0 siblings, 0 replies; 15+ messages in thread
From: Cambda Zhu @ 2020-04-23 10:33 UTC (permalink / raw)
  To: David Laight; +Cc: netdev, Dust Li, Tony Lu



> On Apr 23, 2020, at 16:52, David Laight <David.Laight@ACULAB.COM> wrote:
> 
> From: Cambda Zhu
>> Sent: 21 April 2020 13:18
>> 
>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>> with CAP_NET_ADMIN.
> 
> Did you consider adding an extra sysctl so that the limit
> for TCP_LINGER2 could be greater than the default?
> 
> It might even be sensible to set the limit to a few times
> the default.
> 
> All users can set the socket buffer sizes to twice the default.
> Being able to do the same for the linger timeout wouldn't be a problem.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

The default tcp_fin_timeout is 60s which may be too long for a server with
tens of thousands connections or qps. To reduce the connections as many as
possible, the system administer may set the sysctl_tcp_fin_timeout and most
of the connections would use the default timeout. The timeout may be very
sensitive. For example if sysctl_tcp_fin_timeout is 10s, 20s may cause double
the FIN-WAIT connections and server load needs to be reevaluated.

Besides from my experience a longer FIN-WAIT timeout will have a better
experience for clients, because there’re some clients, such as video player,
needing to finish the play before closing the connection. If we add a sysctl
with default/max timeout for all users, all connections should be set to the
max timeout except some special requirements which can also be set with
TCP_LINGER2 to use a smaller one.

Maybe my experience is not correct for your scenes, so could you describe
a situation that the timeout with default/max value is better?

Regards,
Cambda

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23  7:35   ` [PATCH net-next v2] " Cambda Zhu
@ 2020-04-23 13:40     ` Eric Dumazet
  2020-04-23 14:46       ` Cambda Zhu
  2020-04-23 13:43     ` Willem de Bruijn
  2020-04-24  8:06     ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
  2 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2020-04-23 13:40 UTC (permalink / raw)
  To: Cambda Zhu, netdev, Eric Dumazet; +Cc: Dust Li, Tony Lu



On 4/23/20 12:35 AM, Cambda Zhu wrote:
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
> 
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
> 
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
> ---
>  Changes in v2:
>    - Add int overflow check.
> 
>  include/uapi/linux/capability.h |  1 +
>  include/uapi/linux/tcp.h        |  1 +
>  net/ipv4/tcp.c                  | 11 +++++++++++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 272dc69fa080..0e30c9756a04 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>  /* Allow multicasting */
>  /* Allow read/write of device-specific registers */
>  /* Allow activation of ATM control sockets */
> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>  
>  #define CAP_NET_ADMIN        12
>  
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index f2acb2566333..e21e0ce98ca1 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -128,6 +128,7 @@ enum {
>  #define TCP_CM_INQ		TCP_INQ
>  
>  #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
> +#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>  
>  
>  #define TCP_REPAIR_ON		1
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377..d8cd1fd66bc1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  			tcp_enable_tx_delay();
>  		tp->tcp_tx_delay = val;
>  		break;
> +	case TCP_FORCE_LINGER2:
> +		if (val < 0)
> +			tp->linger2 = -1;
> +		else if (val > INT_MAX / HZ)
> +			err = -EINVAL;
> +		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
> +			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +			tp->linger2 = 0;
> +		else
> +			tp->linger2 = val * HZ;
> +		break;
>  	default:
>  		err = -ENOPROTOOPT;
>  		break;
> 

INT_MAX looks quite 

Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.

net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.

Fact that it has been used to cap TCP_LINGER2 was probably a mistake.

What about adding a new define and simply let TCP_LINGER2 use it ?

Really there is no point trying to allow hours or even days for FIN timeout,
and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
at least from security perspective, these values seem legal as far as TCP specs are concerned.



diff --git a/include/net/tcp.h b/include/net/tcp.h
index dcf9a72eeaa6912202e8a1ca6cf800f7401bf517..8dceea9ae87712613243a48edddd83d9ac629295 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -126,6 +126,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
                                  * to combine FIN-WAIT-2 timeout with
                                  * TIME-WAIT timer.
                                  */
+#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
 
 #define TCP_DELACK_MAX ((unsigned)(HZ/5))      /* maximal time to delay before sending an ACK */
 #if HZ >= 100
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6d87de434377e3741314772e5fd866de1c599108..a723fec8704ba4dff235818622e52d67ec9ef489 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3035,7 +3035,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
        case TCP_LINGER2:
                if (val < 0)
                        tp->linger2 = -1;
-               else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ)
+               else if (val > TCP_FIN_TIMEOUT_MAX / HZ)
                        tp->linger2 = 0;
                else
                        tp->linger2 = val * HZ;



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23  7:35   ` [PATCH net-next v2] " Cambda Zhu
  2020-04-23 13:40     ` Eric Dumazet
@ 2020-04-23 13:43     ` Willem de Bruijn
  2020-04-23 15:02       ` Cambda Zhu
  2020-04-24  8:06     ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
  2 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2020-04-23 13:43 UTC (permalink / raw)
  To: Cambda Zhu; +Cc: netdev, Eric Dumazet, Dust Li, Tony Lu

On Thu, Apr 23, 2020 at 3:36 AM Cambda Zhu <cambda@linux.alibaba.com> wrote:
>
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
>
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
>
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
> ---
>  Changes in v2:
>    - Add int overflow check.
>
>  include/uapi/linux/capability.h |  1 +
>  include/uapi/linux/tcp.h        |  1 +
>  net/ipv4/tcp.c                  | 11 +++++++++++
>  3 files changed, 13 insertions(+)
>
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 272dc69fa080..0e30c9756a04 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>  /* Allow multicasting */
>  /* Allow read/write of device-specific registers */
>  /* Allow activation of ATM control sockets */
> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
>  #define CAP_NET_ADMIN        12
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index f2acb2566333..e21e0ce98ca1 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -128,6 +128,7 @@ enum {
>  #define TCP_CM_INQ             TCP_INQ
>
>  #define TCP_TX_DELAY           37      /* delay outgoing packets by XX usec */
> +#define TCP_FORCE_LINGER2      38      /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
>
>  #define TCP_REPAIR_ON          1
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377..d8cd1fd66bc1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>                         tcp_enable_tx_delay();
>                 tp->tcp_tx_delay = val;
>                 break;
> +       case TCP_FORCE_LINGER2:
> +               if (val < 0)
> +                       tp->linger2 = -1;
> +               else if (val > INT_MAX / HZ)
> +                       err = -EINVAL;
> +               else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
> +                        !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +                       tp->linger2 = 0;

Instead of silently falling back to LINGER2 behavior for unprivileged
users, I would fail without privileges, similar to
SO_(SND|RCV)BUFFORCE.

Also, those have capable instead of ns_capable. If there is risk to
system integrity, that is the right choice.

Slight aside, if the original setsockopt had checked optval ==
sizeof(int), we could have added a variant of different size (say,
with an additional flags field), instead of having to create a new
socket option.

> +               else
> +                       tp->linger2 = val * HZ;
> +               break;
>         default:
>                 err = -ENOPROTOOPT;
>                 break;
> --
> 2.16.6
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23 13:40     ` Eric Dumazet
@ 2020-04-23 14:46       ` Cambda Zhu
  2020-04-23 16:59         ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Cambda Zhu @ 2020-04-23 14:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu



> On Apr 23, 2020, at 21:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 4/23/20 12:35 AM, Cambda Zhu wrote:
>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>> with CAP_NET_ADMIN.
>> 
>> As a server, different sockets may need different FIN-WAIT timeout and
>> in most cases the system default value will be used. The timeout can
>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>> system default value. If one socket needs a timeout greater than the
>> default, we have to adjust the sysctl which affects all sockets using
>> the system default value. And if we want to adjust it for just one
>> socket and keep the original value for others, all the other sockets
>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>> set greater tp->linger2 than the default for one socket and keep
>> the sysctl_tcp_fin_timeout unchanged.
>> 
>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>> ---
>> Changes in v2:
>>   - Add int overflow check.
>> 
>> include/uapi/linux/capability.h |  1 +
>> include/uapi/linux/tcp.h        |  1 +
>> net/ipv4/tcp.c                  | 11 +++++++++++
>> 3 files changed, 13 insertions(+)
>> 
>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>> index 272dc69fa080..0e30c9756a04 100644
>> --- a/include/uapi/linux/capability.h
>> +++ b/include/uapi/linux/capability.h
>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>> /* Allow multicasting */
>> /* Allow read/write of device-specific registers */
>> /* Allow activation of ATM control sockets */
>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>> 
>> #define CAP_NET_ADMIN        12
>> 
>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>> index f2acb2566333..e21e0ce98ca1 100644
>> --- a/include/uapi/linux/tcp.h
>> +++ b/include/uapi/linux/tcp.h
>> @@ -128,6 +128,7 @@ enum {
>> #define TCP_CM_INQ		TCP_INQ
>> 
>> #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
>> +#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>> 
>> 
>> #define TCP_REPAIR_ON		1
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 6d87de434377..d8cd1fd66bc1 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>> 			tcp_enable_tx_delay();
>> 		tp->tcp_tx_delay = val;
>> 		break;
>> +	case TCP_FORCE_LINGER2:
>> +		if (val < 0)
>> +			tp->linger2 = -1;
>> +		else if (val > INT_MAX / HZ)
>> +			err = -EINVAL;
>> +		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>> +			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> +			tp->linger2 = 0;
>> +		else
>> +			tp->linger2 = val * HZ;
>> +		break;
>> 	default:
>> 		err = -ENOPROTOOPT;
>> 		break;
>> 
> 
> INT_MAX looks quite 
> 
> Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
> 
> net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
> 
> Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
> 
> What about adding a new define and simply let TCP_LINGER2 use it ?
> 
> Really there is no point trying to allow hours or even days for FIN timeout,
> and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
> at least from security perspective, these values seem legal as far as TCP specs are concerned.
> 
> 

I also think using sysctl_tcp_fin_timeout to cap TCP_LINGER2 is probably a mistake,
and adding a new define for TCP_LINGER2 is a good idea. I have considered the solution
and found it may have some compatibility issues. Whatever it’s a mistake or not, a
system administrator may have used it to limit the max timeout for all sockets. And
when I think about the behavior of TCP_LINGER2, I'm afraid the server may also rely on
the behavior so the server can set any value greater than sysctl_tcp_fin_timeout and
result in the same timeout.

Maybe my worry is unnecessary. If so, I think your suggestion is better.

Regards,
Cambda

> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index dcf9a72eeaa6912202e8a1ca6cf800f7401bf517..8dceea9ae87712613243a48edddd83d9ac629295 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -126,6 +126,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
>                                  * to combine FIN-WAIT-2 timeout with
>                                  * TIME-WAIT timer.
>                                  */
> +#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
> 
> #define TCP_DELACK_MAX ((unsigned)(HZ/5))      /* maximal time to delay before sending an ACK */
> #if HZ >= 100
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377e3741314772e5fd866de1c599108..a723fec8704ba4dff235818622e52d67ec9ef489 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3035,7 +3035,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>        case TCP_LINGER2:
>                if (val < 0)
>                        tp->linger2 = -1;
> -               else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ)
> +               else if (val > TCP_FIN_TIMEOUT_MAX / HZ)
>                        tp->linger2 = 0;
>                else
>                        tp->linger2 = val * HZ;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23 13:43     ` Willem de Bruijn
@ 2020-04-23 15:02       ` Cambda Zhu
  0 siblings, 0 replies; 15+ messages in thread
From: Cambda Zhu @ 2020-04-23 15:02 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, Eric Dumazet, Dust Li, Tony Lu



> On Apr 23, 2020, at 21:43, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> 
> On Thu, Apr 23, 2020 at 3:36 AM Cambda Zhu <cambda@linux.alibaba.com> wrote:
>> 
>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>> with CAP_NET_ADMIN.
>> 
>> As a server, different sockets may need different FIN-WAIT timeout and
>> in most cases the system default value will be used. The timeout can
>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>> system default value. If one socket needs a timeout greater than the
>> default, we have to adjust the sysctl which affects all sockets using
>> the system default value. And if we want to adjust it for just one
>> socket and keep the original value for others, all the other sockets
>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>> set greater tp->linger2 than the default for one socket and keep
>> the sysctl_tcp_fin_timeout unchanged.
>> 
>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>> ---
>> Changes in v2:
>>   - Add int overflow check.
>> 
>> include/uapi/linux/capability.h |  1 +
>> include/uapi/linux/tcp.h        |  1 +
>> net/ipv4/tcp.c                  | 11 +++++++++++
>> 3 files changed, 13 insertions(+)
>> 
>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>> index 272dc69fa080..0e30c9756a04 100644
>> --- a/include/uapi/linux/capability.h
>> +++ b/include/uapi/linux/capability.h
>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>> /* Allow multicasting */
>> /* Allow read/write of device-specific registers */
>> /* Allow activation of ATM control sockets */
>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>> 
>> #define CAP_NET_ADMIN        12
>> 
>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>> index f2acb2566333..e21e0ce98ca1 100644
>> --- a/include/uapi/linux/tcp.h
>> +++ b/include/uapi/linux/tcp.h
>> @@ -128,6 +128,7 @@ enum {
>> #define TCP_CM_INQ             TCP_INQ
>> 
>> #define TCP_TX_DELAY           37      /* delay outgoing packets by XX usec */
>> +#define TCP_FORCE_LINGER2      38      /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>> 
>> 
>> #define TCP_REPAIR_ON          1
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 6d87de434377..d8cd1fd66bc1 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>>                        tcp_enable_tx_delay();
>>                tp->tcp_tx_delay = val;
>>                break;
>> +       case TCP_FORCE_LINGER2:
>> +               if (val < 0)
>> +                       tp->linger2 = -1;
>> +               else if (val > INT_MAX / HZ)
>> +                       err = -EINVAL;
>> +               else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>> +                        !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> +                       tp->linger2 = 0;
> 
> Instead of silently falling back to LINGER2 behavior for unprivileged
> users, I would fail without privileges, similar to
> SO_(SND|RCV)BUFFORCE.
> 

Yes, I have considered failing without privileges too. It’s clearer
but the user may have to set both of TCP_LINGER2 and TCP_FORCE_LINGER2
to set the timeout as TCP_FORCE_LINGER2 not introduced.

> Also, those have capable instead of ns_capable. If there is risk to
> system integrity, that is the right choice.
> 

I think both are ok, but the sysctl_tcp_fin_timeout is in net ns.

> Slight aside, if the original setsockopt had checked optval ==
> sizeof(int), we could have added a variant of different size (say,
> with an additional flags field), instead of having to create a new
> socket option.
> 

Maybe it’s a little weird… I don’t know. :)

Regards,
Cambda

>> +               else
>> +                       tp->linger2 = val * HZ;
>> +               break;
>>        default:
>>                err = -ENOPROTOOPT;
>>                break;
>> --
>> 2.16.6


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23 14:46       ` Cambda Zhu
@ 2020-04-23 16:59         ` Eric Dumazet
  2020-04-24  2:56           ` Cambda Zhu
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2020-04-23 16:59 UTC (permalink / raw)
  To: Cambda Zhu, Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu



On 4/23/20 7:46 AM, Cambda Zhu wrote:
> 
> 
>> On Apr 23, 2020, at 21:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 4/23/20 12:35 AM, Cambda Zhu wrote:
>>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>>> with CAP_NET_ADMIN.
>>>
>>> As a server, different sockets may need different FIN-WAIT timeout and
>>> in most cases the system default value will be used. The timeout can
>>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>>> system default value. If one socket needs a timeout greater than the
>>> default, we have to adjust the sysctl which affects all sockets using
>>> the system default value. And if we want to adjust it for just one
>>> socket and keep the original value for others, all the other sockets
>>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>>> set greater tp->linger2 than the default for one socket and keep
>>> the sysctl_tcp_fin_timeout unchanged.
>>>
>>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>>> ---
>>> Changes in v2:
>>>   - Add int overflow check.
>>>
>>> include/uapi/linux/capability.h |  1 +
>>> include/uapi/linux/tcp.h        |  1 +
>>> net/ipv4/tcp.c                  | 11 +++++++++++
>>> 3 files changed, 13 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>>> index 272dc69fa080..0e30c9756a04 100644
>>> --- a/include/uapi/linux/capability.h
>>> +++ b/include/uapi/linux/capability.h
>>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>>> /* Allow multicasting */
>>> /* Allow read/write of device-specific registers */
>>> /* Allow activation of ATM control sockets */
>>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>
>>> #define CAP_NET_ADMIN        12
>>>
>>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>>> index f2acb2566333..e21e0ce98ca1 100644
>>> --- a/include/uapi/linux/tcp.h
>>> +++ b/include/uapi/linux/tcp.h
>>> @@ -128,6 +128,7 @@ enum {
>>> #define TCP_CM_INQ		TCP_INQ
>>>
>>> #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
>>> +#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>
>>>
>>> #define TCP_REPAIR_ON		1
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 6d87de434377..d8cd1fd66bc1 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>>> 			tcp_enable_tx_delay();
>>> 		tp->tcp_tx_delay = val;
>>> 		break;
>>> +	case TCP_FORCE_LINGER2:
>>> +		if (val < 0)
>>> +			tp->linger2 = -1;
>>> +		else if (val > INT_MAX / HZ)
>>> +			err = -EINVAL;
>>> +		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>>> +			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>>> +			tp->linger2 = 0;
>>> +		else
>>> +			tp->linger2 = val * HZ;
>>> +		break;
>>> 	default:
>>> 		err = -ENOPROTOOPT;
>>> 		break;
>>>
>>
>> INT_MAX looks quite 
>>
>> Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
>>
>> net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
>>
>> Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
>>
>> What about adding a new define and simply let TCP_LINGER2 use it ?
>>
>> Really there is no point trying to allow hours or even days for FIN timeout,
>> and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
>> at least from security perspective, these values seem legal as far as TCP specs are concerned.
>>
>>
> 
> I also think using sysctl_tcp_fin_timeout to cap TCP_LINGER2 is probably a mistake,
> and adding a new define for TCP_LINGER2 is a good idea. I have considered the solution
> and found it may have some compatibility issues. Whatever it’s a mistake or not, a
> system administrator may have used it to limit the max timeout for all sockets. And
> when I think about the behavior of TCP_LINGER2, I'm afraid the server may also rely on
> the behavior so the server can set any value greater than sysctl_tcp_fin_timeout and
> result in the same timeout.
> 
> Maybe my worry is unnecessary. If so, I think your suggestion is better.
>

Ṕlease send a v3 with a proper changelog then, thanks !


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-23 16:59         ` Eric Dumazet
@ 2020-04-24  2:56           ` Cambda Zhu
  2020-04-24  4:23             ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Cambda Zhu @ 2020-04-24  2:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu



> On Apr 24, 2020, at 00:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 4/23/20 7:46 AM, Cambda Zhu wrote:
>> 
>> 
>>> On Apr 23, 2020, at 21:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> 
>>> 
>>> 
>>> On 4/23/20 12:35 AM, Cambda Zhu wrote:
>>>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>>>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>>>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>>>> with CAP_NET_ADMIN.
>>>> 
>>>> As a server, different sockets may need different FIN-WAIT timeout and
>>>> in most cases the system default value will be used. The timeout can
>>>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>>>> system default value. If one socket needs a timeout greater than the
>>>> default, we have to adjust the sysctl which affects all sockets using
>>>> the system default value. And if we want to adjust it for just one
>>>> socket and keep the original value for others, all the other sockets
>>>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>>>> set greater tp->linger2 than the default for one socket and keep
>>>> the sysctl_tcp_fin_timeout unchanged.
>>>> 
>>>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>>>> ---
>>>> Changes in v2:
>>>>  - Add int overflow check.
>>>> 
>>>> include/uapi/linux/capability.h |  1 +
>>>> include/uapi/linux/tcp.h        |  1 +
>>>> net/ipv4/tcp.c                  | 11 +++++++++++
>>>> 3 files changed, 13 insertions(+)
>>>> 
>>>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>>>> index 272dc69fa080..0e30c9756a04 100644
>>>> --- a/include/uapi/linux/capability.h
>>>> +++ b/include/uapi/linux/capability.h
>>>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>>>> /* Allow multicasting */
>>>> /* Allow read/write of device-specific registers */
>>>> /* Allow activation of ATM control sockets */
>>>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>> 
>>>> #define CAP_NET_ADMIN        12
>>>> 
>>>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>>>> index f2acb2566333..e21e0ce98ca1 100644
>>>> --- a/include/uapi/linux/tcp.h
>>>> +++ b/include/uapi/linux/tcp.h
>>>> @@ -128,6 +128,7 @@ enum {
>>>> #define TCP_CM_INQ		TCP_INQ
>>>> 
>>>> #define TCP_TX_DELAY		37	/* delay outgoing packets by XX usec */
>>>> +#define TCP_FORCE_LINGER2	38	/* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>> 
>>>> 
>>>> #define TCP_REPAIR_ON		1
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index 6d87de434377..d8cd1fd66bc1 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>>>> 			tcp_enable_tx_delay();
>>>> 		tp->tcp_tx_delay = val;
>>>> 		break;
>>>> +	case TCP_FORCE_LINGER2:
>>>> +		if (val < 0)
>>>> +			tp->linger2 = -1;
>>>> +		else if (val > INT_MAX / HZ)
>>>> +			err = -EINVAL;
>>>> +		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>>>> +			 !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>>>> +			tp->linger2 = 0;
>>>> +		else
>>>> +			tp->linger2 = val * HZ;
>>>> +		break;
>>>> 	default:
>>>> 		err = -ENOPROTOOPT;
>>>> 		break;
>>>> 
>>> 
>>> INT_MAX looks quite 
>>> 
>>> Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
>>> 
>>> net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
>>> 
>>> Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
>>> 
>>> What about adding a new define and simply let TCP_LINGER2 use it ?
>>> 
>>> Really there is no point trying to allow hours or even days for FIN timeout,
>>> and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
>>> at least from security perspective, these values seem legal as far as TCP specs are concerned.
>>> 
>>> 
>> 
>> I also think using sysctl_tcp_fin_timeout to cap TCP_LINGER2 is probably a mistake,
>> and adding a new define for TCP_LINGER2 is a good idea. I have considered the solution
>> and found it may have some compatibility issues. Whatever it’s a mistake or not, a
>> system administrator may have used it to limit the max timeout for all sockets. And
>> when I think about the behavior of TCP_LINGER2, I'm afraid the server may also rely on
>> the behavior so the server can set any value greater than sysctl_tcp_fin_timeout and
>> result in the same timeout.
>> 
>> Maybe my worry is unnecessary. If so, I think your suggestion is better.
>> 
> 
> Ṕlease send a v3 with a proper changelog then, thanks !

If the value between sysctl_tcp_fin_timeout and 2 minutes is legal, should we set tp->linger2
to TCP_FIN_TIMEOUT_MAX or return an error instead of set it to zero which means the default value?

Regards,
Cambda


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
  2020-04-24  2:56           ` Cambda Zhu
@ 2020-04-24  4:23             ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2020-04-24  4:23 UTC (permalink / raw)
  To: Cambda Zhu, Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu



On 4/23/20 7:56 PM, Cambda Zhu wrote:
> 

> If the value between sysctl_tcp_fin_timeout and 2 minutes is legal, should we set tp->linger2
> to TCP_FIN_TIMEOUT_MAX or return an error instead of set it to zero which means the default value?

I would not send an error.

This might break some applications that could consider an error as a serious one.

(You can reuse my patch basically)




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
  2020-04-23  7:35   ` [PATCH net-next v2] " Cambda Zhu
  2020-04-23 13:40     ` Eric Dumazet
  2020-04-23 13:43     ` Willem de Bruijn
@ 2020-04-24  8:06     ` Cambda Zhu
  2020-05-01 22:12       ` David Miller
  2 siblings, 1 reply; 15+ messages in thread
From: Cambda Zhu @ 2020-04-24  8:06 UTC (permalink / raw)
  To: netdev, Eric Dumazet; +Cc: Dust Li, Tony Lu, Cambda Zhu

This patch changes the behavior of TCP_LINGER2 about its limit. The
sysctl_tcp_fin_timeout used to be the limit of TCP_LINGER2 but now it's
only the default value. A new macro named TCP_FIN_TIMEOUT_MAX is added
as the limit of TCP_LINGER2, which is 2 minutes.

Since TCP_LINGER2 used sysctl_tcp_fin_timeout as the default value
and the limit in the past, the system administrator cannot set the
default value for most of sockets and let some sockets have a greater
timeout. It might be a mistake that let the sysctl to be the limit of
the TCP_LINGER2. Maybe we can add a new sysctl to set the max of
TCP_LINGER2, but FIN-WAIT-2 timeout is usually no need to be too long
and 2 minutes are legal considering TCP specs.

Changes in v3:
- Remove the new socket option and change the TCP_LINGER2 behavior so
  that the timeout can be set to value between sysctl_tcp_fin_timeout
  and 2 minutes.

Changes in v2:
- Add int overflow check for the new socket option.

Changes in v1:
- Add a new socket option to set timeout greater than
  sysctl_tcp_fin_timeout.

Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
---
 include/net/tcp.h | 1 +
 net/ipv4/tcp.c    | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5fa9eacd965a..cde8a1f75caa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -126,6 +126,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 				  * to combine FIN-WAIT-2 timeout with
 				  * TIME-WAIT timer.
 				  */
+#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
 
 #define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */
 #if HZ >= 100
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6d87de434377..8c1250103959 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3035,8 +3035,8 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_LINGER2:
 		if (val < 0)
 			tp->linger2 = -1;
-		else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ)
-			tp->linger2 = 0;
+		else if (val > TCP_FIN_TIMEOUT_MAX / HZ)
+			tp->linger2 = TCP_FIN_TIMEOUT_MAX;
 		else
 			tp->linger2 = val * HZ;
 		break;
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
  2020-04-24  8:06     ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
@ 2020-05-01 22:12       ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2020-05-01 22:12 UTC (permalink / raw)
  To: cambda; +Cc: netdev, eric.dumazet, dust.li, tonylu

From: Cambda Zhu <cambda@linux.alibaba.com>
Date: Fri, 24 Apr 2020 16:06:16 +0800

> This patch changes the behavior of TCP_LINGER2 about its limit. The
> sysctl_tcp_fin_timeout used to be the limit of TCP_LINGER2 but now it's
> only the default value. A new macro named TCP_FIN_TIMEOUT_MAX is added
> as the limit of TCP_LINGER2, which is 2 minutes.
> 
> Since TCP_LINGER2 used sysctl_tcp_fin_timeout as the default value
> and the limit in the past, the system administrator cannot set the
> default value for most of sockets and let some sockets have a greater
> timeout. It might be a mistake that let the sysctl to be the limit of
> the TCP_LINGER2. Maybe we can add a new sysctl to set the max of
> TCP_LINGER2, but FIN-WAIT-2 timeout is usually no need to be too long
> and 2 minutes are legal considering TCP specs.
> 
> Changes in v3:
> - Remove the new socket option and change the TCP_LINGER2 behavior so
>   that the timeout can be set to value between sysctl_tcp_fin_timeout
>   and 2 minutes.
> 
> Changes in v2:
> - Add int overflow check for the new socket option.
> 
> Changes in v1:
> - Add a new socket option to set timeout greater than
>   sysctl_tcp_fin_timeout.
> 
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>

Applied, thank you.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-05-01 22:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-21 12:17 [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt Cambda Zhu
2020-04-23  2:19 ` David Miller
2020-04-23  3:01 ` Eric Dumazet
2020-04-23  7:35   ` [PATCH net-next v2] " Cambda Zhu
2020-04-23 13:40     ` Eric Dumazet
2020-04-23 14:46       ` Cambda Zhu
2020-04-23 16:59         ` Eric Dumazet
2020-04-24  2:56           ` Cambda Zhu
2020-04-24  4:23             ` Eric Dumazet
2020-04-23 13:43     ` Willem de Bruijn
2020-04-23 15:02       ` Cambda Zhu
2020-04-24  8:06     ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
2020-05-01 22:12       ` David Miller
2020-04-23  8:52 ` [PATCH net-next] net: Add TCP_FORCE_LINGER2 to TCP setsockopt David Laight
2020-04-23 10:33   ` Cambda Zhu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.