All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization
@ 2020-09-09 18:15 Neal Cardwell
  2020-09-09 18:15 ` [PATCH net-next 1/4] tcp: only init congestion control if not initialized already Neal Cardwell
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-09 18:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell

This patch series reorganizes TCP congestion control initialization so that if
EBPF code called by tcp_init_transfer() sets the congestion control algorithm
by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
congestion control module immediately, instead of having tcp_init_transfer()
later initialize the congestion control module.

This increases flexibility for the EBPF code that runs at connection
establishment time, and simplifies the code.

This has the following benefits:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.

Neal Cardwell (4):
  tcp: only init congestion control if not initialized already
  tcp: simplify EBPF TCP_CONGESTION to always init CC
  tcp: simplify tcp_set_congestion_control(): always reinitialize
  tcp: simplify _bpf_setsockopt(): remove flags argument

 include/net/inet_connection_sock.h |  3 ++-
 include/net/tcp.h                  |  2 +-
 net/core/filter.c                  | 18 ++++--------------
 net/ipv4/tcp.c                     |  3 ++-
 net/ipv4/tcp_cong.c                | 14 ++++----------
 net/ipv4/tcp_input.c               |  4 +++-
 6 files changed, 16 insertions(+), 28 deletions(-)

-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/4] tcp: only init congestion control if not initialized already
  2020-09-09 18:15 [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Neal Cardwell
@ 2020-09-09 18:15 ` Neal Cardwell
  2020-09-09 18:15 ` [PATCH net-next 2/4] tcp: simplify EBPF TCP_CONGESTION to always init CC Neal Cardwell
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-09 18:15 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Kevin Yang, Eric Dumazet,
	Lawrence Brakmo

Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.

With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.

This is an approach that has the following beneficial properties:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
---
 include/net/inet_connection_sock.h | 3 ++-
 net/ipv4/tcp.c                     | 1 +
 net/ipv4/tcp_cong.c                | 3 ++-
 net/ipv4/tcp_input.c               | 4 +++-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c738abeb3265..dc763ca9413c 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -96,7 +96,8 @@ struct inet_connection_sock {
 	void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
 	struct hlist_node         icsk_listen_portaddr_node;
 	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
-	__u8			  icsk_ca_state:6,
+	__u8			  icsk_ca_state:5,
+				  icsk_ca_initialized:1,
 				  icsk_ca_setsockopt:1,
 				  icsk_ca_dst_locked:1;
 	__u8			  icsk_retransmits;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 57a568875539..7360d3db2b61 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2698,6 +2698,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	if (icsk->icsk_ca_ops->release)
 		icsk->icsk_ca_ops->release(sk);
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
+	icsk->icsk_ca_initialized = 0;
 	tcp_set_ca_state(sk, TCP_CA_Open);
 	tp->is_sack_reneg = 0;
 	tcp_clear_retrans(tp);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 62878cf26d9c..d18d7a1ce4ce 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -176,7 +176,7 @@ void tcp_assign_congestion_control(struct sock *sk)
 
 void tcp_init_congestion_control(struct sock *sk)
 {
-	const struct inet_connection_sock *icsk = inet_csk(sk);
+	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	tcp_sk(sk)->prior_ssthresh = 0;
 	if (icsk->icsk_ca_ops->init)
@@ -185,6 +185,7 @@ void tcp_init_congestion_control(struct sock *sk)
 		INET_ECN_xmit(sk);
 	else
 		INET_ECN_dontxmit(sk);
+	icsk->icsk_ca_initialized = 1;
 }
 
 static void tcp_reinit_congestion_control(struct sock *sk,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4337841faeff..0e5ac0d33fd3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5894,8 +5894,10 @@ void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb)
 		tp->snd_cwnd = tcp_init_cwnd(tp, __sk_dst_get(sk));
 	tp->snd_cwnd_stamp = tcp_jiffies32;
 
+	icsk->icsk_ca_initialized = 0;
 	bpf_skops_established(sk, bpf_op, skb);
-	tcp_init_congestion_control(sk);
+	if (!icsk->icsk_ca_initialized)
+		tcp_init_congestion_control(sk);
 	tcp_init_buffer_space(sk);
 }
 
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 2/4] tcp: simplify EBPF TCP_CONGESTION to always init CC
  2020-09-09 18:15 [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Neal Cardwell
  2020-09-09 18:15 ` [PATCH net-next 1/4] tcp: only init congestion control if not initialized already Neal Cardwell
@ 2020-09-09 18:15 ` Neal Cardwell
  2020-09-09 18:15 ` [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize Neal Cardwell
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-09 18:15 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Kevin Yang, Eric Dumazet,
	Lawrence Brakmo

Now that the previous patch ensures we don't initialize the congestion
control twice, when EBPF sets the congestion control algorithm at
connection establishment we can simplify the code by simply
initializing the congestion control module at that time.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
---
 net/core/filter.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 2ad9c0ef1946..b26c04924fa3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4313,8 +4313,6 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
-#define SOCKOPT_CC_REINIT (1 << 0)
-
 static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 			   char *optval, int optlen, u32 flags)
 {
@@ -4449,13 +4447,12 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 		   sk->sk_prot->setsockopt == tcp_setsockopt) {
 		if (optname == TCP_CONGESTION) {
 			char name[TCP_CA_NAME_MAX];
-			bool reinit = flags & SOCKOPT_CC_REINIT;
 
 			strncpy(name, optval, min_t(long, optlen,
 						    TCP_CA_NAME_MAX-1));
 			name[TCP_CA_NAME_MAX-1] = 0;
 			ret = tcp_set_congestion_control(sk, name, false,
-							 reinit, true);
+							 true, true);
 		} else {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 			struct tcp_sock *tp = tcp_sk(sk);
@@ -4652,8 +4649,6 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
 	u32 flags = 0;
-	if (bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN)
-		flags |= SOCKOPT_CC_REINIT;
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen,
 			       flags);
 }
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize
  2020-09-09 18:15 [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Neal Cardwell
  2020-09-09 18:15 ` [PATCH net-next 1/4] tcp: only init congestion control if not initialized already Neal Cardwell
  2020-09-09 18:15 ` [PATCH net-next 2/4] tcp: simplify EBPF TCP_CONGESTION to always init CC Neal Cardwell
@ 2020-09-09 18:15 ` Neal Cardwell
  2020-09-10  0:29   ` Martin KaFai Lau
  2020-09-09 18:15 ` [PATCH net-next 4/4] tcp: simplify _bpf_setsockopt(): remove flags argument Neal Cardwell
  2020-09-10  0:36 ` [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Martin KaFai Lau
  4 siblings, 1 reply; 12+ messages in thread
From: Neal Cardwell @ 2020-09-09 18:15 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Kevin Yang, Eric Dumazet,
	Lawrence Brakmo

Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
---
 include/net/tcp.h   |  2 +-
 net/core/filter.c   |  3 +--
 net/ipv4/tcp.c      |  2 +-
 net/ipv4/tcp_cong.c | 11 ++---------
 4 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e85d564446c6..f857146c17a5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1104,7 +1104,7 @@ void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
 int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
-			       bool reinit, bool cap_net_admin);
+			       bool cap_net_admin);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index b26c04924fa3..0bd0a97ee951 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4451,8 +4451,7 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 			strncpy(name, optval, min_t(long, optlen,
 						    TCP_CA_NAME_MAX-1));
 			name[TCP_CA_NAME_MAX-1] = 0;
-			ret = tcp_set_congestion_control(sk, name, false,
-							 true, true);
+			ret = tcp_set_congestion_control(sk, name, false, true);
 		} else {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 			struct tcp_sock *tp = tcp_sk(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7360d3db2b61..e58ab9db73ff 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3050,7 +3050,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 		name[val] = 0;
 
 		lock_sock(sk);
-		err = tcp_set_congestion_control(sk, name, true, true,
+		err = tcp_set_congestion_control(sk, name, true,
 						 ns_capable(sock_net(sk)->user_ns,
 							    CAP_NET_ADMIN));
 		release_sock(sk);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index d18d7a1ce4ce..a9b0fb52a1ec 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -341,7 +341,7 @@ int tcp_set_allowed_congestion_control(char *val)
  * already initialized.
  */
 int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
-			       bool reinit, bool cap_net_admin)
+			       bool cap_net_admin)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const struct tcp_congestion_ops *ca;
@@ -365,15 +365,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
 	if (!ca) {
 		err = -ENOENT;
 	} else if (!load) {
-		const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
-
 		if (bpf_try_module_get(ca, ca->owner)) {
-			if (reinit) {
-				tcp_reinit_congestion_control(sk, ca);
-			} else {
-				icsk->icsk_ca_ops = ca;
-				bpf_module_put(old_ca, old_ca->owner);
-			}
+			tcp_reinit_congestion_control(sk, ca);
 		} else {
 			err = -EBUSY;
 		}
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 4/4] tcp: simplify _bpf_setsockopt(): remove flags argument
  2020-09-09 18:15 [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Neal Cardwell
                   ` (2 preceding siblings ...)
  2020-09-09 18:15 ` [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize Neal Cardwell
@ 2020-09-09 18:15 ` Neal Cardwell
  2020-09-10  0:36 ` [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Martin KaFai Lau
  4 siblings, 0 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-09 18:15 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Kevin Yang, Eric Dumazet,
	Lawrence Brakmo

Now that the previous patches have removed the code that uses the
flags argument to _bpf_setsockopt(), we can remove that argument.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
---
 net/core/filter.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0bd0a97ee951..b393528913f2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4314,7 +4314,7 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 };
 
 static int _bpf_setsockopt(struct sock *sk, int level, int optname,
-			   char *optval, int optlen, u32 flags)
+			   char *optval, int optlen)
 {
 	char devname[IFNAMSIZ];
 	int val, valbool;
@@ -4611,9 +4611,7 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname,
 BPF_CALL_5(bpf_sock_addr_setsockopt, struct bpf_sock_addr_kern *, ctx,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
-	u32 flags = 0;
-	return _bpf_setsockopt(ctx->sk, level, optname, optval, optlen,
-			       flags);
+	return _bpf_setsockopt(ctx->sk, level, optname, optval, optlen);
 }
 
 static const struct bpf_func_proto bpf_sock_addr_setsockopt_proto = {
@@ -4647,9 +4645,7 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
 BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
-	u32 flags = 0;
-	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen,
-			       flags);
+	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }
 
 static const struct bpf_func_proto bpf_sock_ops_setsockopt_proto = {
-- 
2.28.0.526.ge36021eeef-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize
  2020-09-09 18:15 ` [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize Neal Cardwell
@ 2020-09-10  0:29   ` Martin KaFai Lau
  2020-09-10  3:17     ` Neal Cardwell
  0 siblings, 1 reply; 12+ messages in thread
From: Martin KaFai Lau @ 2020-09-10  0:29 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, netdev, Yuchung Cheng, Kevin Yang, Eric Dumazet,
	Lawrence Brakmo

On Wed, Sep 09, 2020 at 02:15:55PM -0400, Neal Cardwell wrote:
> Now that the previous patches ensure that all call sites for
> tcp_set_congestion_control() want to initialize congestion control, we
> can simplify tcp_set_congestion_control() by removing the reinit
> argument and the code to support it.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Kevin Yang <yyd@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Lawrence Brakmo <brakmo@fb.com>
> ---
>  include/net/tcp.h   |  2 +-
>  net/core/filter.c   |  3 +--
>  net/ipv4/tcp.c      |  2 +-
>  net/ipv4/tcp_cong.c | 11 ++---------
>  4 files changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index e85d564446c6..f857146c17a5 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1104,7 +1104,7 @@ void tcp_get_available_congestion_control(char *buf, size_t len);
>  void tcp_get_allowed_congestion_control(char *buf, size_t len);
>  int tcp_set_allowed_congestion_control(char *allowed);
>  int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
> -			       bool reinit, bool cap_net_admin);
> +			       bool cap_net_admin);
>  u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
>  void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
>  
> diff --git a/net/core/filter.c b/net/core/filter.c
> index b26c04924fa3..0bd0a97ee951 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4451,8 +4451,7 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
>  			strncpy(name, optval, min_t(long, optlen,
>  						    TCP_CA_NAME_MAX-1));
>  			name[TCP_CA_NAME_MAX-1] = 0;
> -			ret = tcp_set_congestion_control(sk, name, false,
> -							 true, true);
> +			ret = tcp_set_congestion_control(sk, name, false, true);
>  		} else {
>  			struct inet_connection_sock *icsk = inet_csk(sk);
>  			struct tcp_sock *tp = tcp_sk(sk);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 7360d3db2b61..e58ab9db73ff 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3050,7 +3050,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>  		name[val] = 0;
>  
>  		lock_sock(sk);
> -		err = tcp_set_congestion_control(sk, name, true, true,
> +		err = tcp_set_congestion_control(sk, name, true,
>  						 ns_capable(sock_net(sk)->user_ns,
>  							    CAP_NET_ADMIN));
>  		release_sock(sk);
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index d18d7a1ce4ce..a9b0fb52a1ec 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -341,7 +341,7 @@ int tcp_set_allowed_congestion_control(char *val)
>   * already initialized.
>   */
>  int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
> -			       bool reinit, bool cap_net_admin)
> +			       bool cap_net_admin)
>  {
>  	struct inet_connection_sock *icsk = inet_csk(sk);
>  	const struct tcp_congestion_ops *ca;
> @@ -365,15 +365,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
>  	if (!ca) {
>  		err = -ENOENT;
>  	} else if (!load) {
nit.

I think this "else if (!load)" case can be completely removed and simply
allow it to fall through to the last
"else { tcp_reinit_congestion_control(sk, ca); }" .

> -		const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
> -
>  		if (bpf_try_module_get(ca, ca->owner)) {
> -			if (reinit) {
> -				tcp_reinit_congestion_control(sk, ca);
> -			} else {
> -				icsk->icsk_ca_ops = ca;
> -				bpf_module_put(old_ca, old_ca->owner);
> -			}
> +			tcp_reinit_congestion_control(sk, ca);
>  		} else {
>  			err = -EBUSY;
>  		}
> -- 
> 2.28.0.526.ge36021eeef-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization
  2020-09-09 18:15 [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Neal Cardwell
                   ` (3 preceding siblings ...)
  2020-09-09 18:15 ` [PATCH net-next 4/4] tcp: simplify _bpf_setsockopt(): remove flags argument Neal Cardwell
@ 2020-09-10  0:36 ` Martin KaFai Lau
  2020-09-10  3:21   ` Neal Cardwell
  4 siblings, 1 reply; 12+ messages in thread
From: Martin KaFai Lau @ 2020-09-10  0:36 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, netdev

On Wed, Sep 09, 2020 at 02:15:52PM -0400, Neal Cardwell wrote:
> This patch series reorganizes TCP congestion control initialization so that if
> EBPF code called by tcp_init_transfer() sets the congestion control algorithm
> by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
> congestion control module immediately, instead of having tcp_init_transfer()
> later initialize the congestion control module.
> 
> This increases flexibility for the EBPF code that runs at connection
> establishment time, and simplifies the code.
> 
> This has the following benefits:
> 
> (1) This allows CC module customizations made by the EBPF called in
>     tcp_init_transfer() to persist, and not be wiped out by a later
>     call to tcp_init_congestion_control() in tcp_init_transfer().
> 
> (2) Does not flip the order of EBPF and CC init, to avoid causing bugs
>     for existing code upstream that depends on the current order.
> 
> (3) Does not cause 2 initializations for for CC in the case where the
>     EBPF called in tcp_init_transfer() wants to set the CC to a new CC
>     algorithm.
> 
> (4) Allows follow-on simplifications to the code in net/core/filter.c
>     and net/ipv4/tcp_cong.c, which currently both have some complexity
>     to special-case CC initialization to avoid double CC
>     initialization if EBPF sets the CC.
Thanks for this work.  Only have one nit in patch 3 for consideration.

Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize
  2020-09-10  0:29   ` Martin KaFai Lau
@ 2020-09-10  3:17     ` Neal Cardwell
  0 siblings, 0 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-10  3:17 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: David Miller, Netdev, Yuchung Cheng, Kevin Yang, Eric Dumazet,
	Lawrence Brakmo

On Wed, Sep 9, 2020 at 8:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Sep 09, 2020 at 02:15:55PM -0400, Neal Cardwell wrote:
> > Now that the previous patches ensure that all call sites for
> > tcp_set_congestion_control() want to initialize congestion control, we
> > can simplify tcp_set_congestion_control() by removing the reinit
> > argument and the code to support it.
> >
> > Signed-off-by: Neal Cardwell <ncardwell@google.com>
> > Acked-by: Yuchung Cheng <ycheng@google.com>
> > Acked-by: Kevin Yang <yyd@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Lawrence Brakmo <brakmo@fb.com>
> > ---
> >  include/net/tcp.h   |  2 +-
> >  net/core/filter.c   |  3 +--
> >  net/ipv4/tcp.c      |  2 +-
> >  net/ipv4/tcp_cong.c | 11 ++---------
> >  4 files changed, 5 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index e85d564446c6..f857146c17a5 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -1104,7 +1104,7 @@ void tcp_get_available_congestion_control(char *buf, size_t len);
> >  void tcp_get_allowed_congestion_control(char *buf, size_t len);
> >  int tcp_set_allowed_congestion_control(char *allowed);
> >  int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
> > -                            bool reinit, bool cap_net_admin);
> > +                            bool cap_net_admin);
> >  u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
> >  void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index b26c04924fa3..0bd0a97ee951 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -4451,8 +4451,7 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
> >                       strncpy(name, optval, min_t(long, optlen,
> >                                                   TCP_CA_NAME_MAX-1));
> >                       name[TCP_CA_NAME_MAX-1] = 0;
> > -                     ret = tcp_set_congestion_control(sk, name, false,
> > -                                                      true, true);
> > +                     ret = tcp_set_congestion_control(sk, name, false, true);
> >               } else {
> >                       struct inet_connection_sock *icsk = inet_csk(sk);
> >                       struct tcp_sock *tp = tcp_sk(sk);
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 7360d3db2b61..e58ab9db73ff 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3050,7 +3050,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
> >               name[val] = 0;
> >
> >               lock_sock(sk);
> > -             err = tcp_set_congestion_control(sk, name, true, true,
> > +             err = tcp_set_congestion_control(sk, name, true,
> >                                                ns_capable(sock_net(sk)->user_ns,
> >                                                           CAP_NET_ADMIN));
> >               release_sock(sk);
> > diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> > index d18d7a1ce4ce..a9b0fb52a1ec 100644
> > --- a/net/ipv4/tcp_cong.c
> > +++ b/net/ipv4/tcp_cong.c
> > @@ -341,7 +341,7 @@ int tcp_set_allowed_congestion_control(char *val)
> >   * already initialized.
> >   */
> >  int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
> > -                            bool reinit, bool cap_net_admin)
> > +                            bool cap_net_admin)
> >  {
> >       struct inet_connection_sock *icsk = inet_csk(sk);
> >       const struct tcp_congestion_ops *ca;
> > @@ -365,15 +365,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
> >       if (!ca) {
> >               err = -ENOENT;
> >       } else if (!load) {
> nit.
>
> I think this "else if (!load)" case can be completely removed and simply
> allow it to fall through to the last
> "else { tcp_reinit_congestion_control(sk, ca); }" .

Thanks, Martin. That is a very nice observation, and I think you are
right that we can make the additional refactor/simplification/clean-up
that you mention, without changing behavior. For clarity I think that
would be nice to have as a separate follow-on commit. Are you
interested in posting a follow-on patch for that, since it's your nice
idea?

thanks,
neal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization
  2020-09-10  0:36 ` [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Martin KaFai Lau
@ 2020-09-10  3:21   ` Neal Cardwell
  2020-09-10  5:12     ` Martin KaFai Lau
  2020-09-10  5:31     ` Alexei Starovoitov
  0 siblings, 2 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-10  3:21 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: David Miller, Netdev

On Wed, Sep 9, 2020 at 8:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Sep 09, 2020 at 02:15:52PM -0400, Neal Cardwell wrote:
> > This patch series reorganizes TCP congestion control initialization so that if
> > EBPF code called by tcp_init_transfer() sets the congestion control algorithm
> > by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
> > congestion control module immediately, instead of having tcp_init_transfer()
> > later initialize the congestion control module.
> >
> > This increases flexibility for the EBPF code that runs at connection
> > establishment time, and simplifies the code.
> >
> > This has the following benefits:
> >
> > (1) This allows CC module customizations made by the EBPF called in
> >     tcp_init_transfer() to persist, and not be wiped out by a later
> >     call to tcp_init_congestion_control() in tcp_init_transfer().
> >
> > (2) Does not flip the order of EBPF and CC init, to avoid causing bugs
> >     for existing code upstream that depends on the current order.
> >
> > (3) Does not cause 2 initializations for for CC in the case where the
> >     EBPF called in tcp_init_transfer() wants to set the CC to a new CC
> >     algorithm.
> >
> > (4) Allows follow-on simplifications to the code in net/core/filter.c
> >     and net/ipv4/tcp_cong.c, which currently both have some complexity
> >     to special-case CC initialization to avoid double CC
> >     initialization if EBPF sets the CC.
> Thanks for this work.  Only have one nit in patch 3 for consideration.
>
> Acked-by: Martin KaFai Lau <kafai@fb.com>

Thanks for the review! I like your suggestion in patch 3 to further
simplify the code. Do you mind submitting your idea for a follow-on
clean-up/refactor as a separate follow-on commit?

thanks,
neal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization
  2020-09-10  3:21   ` Neal Cardwell
@ 2020-09-10  5:12     ` Martin KaFai Lau
  2020-09-10  5:31     ` Alexei Starovoitov
  1 sibling, 0 replies; 12+ messages in thread
From: Martin KaFai Lau @ 2020-09-10  5:12 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, Netdev

On Wed, Sep 09, 2020 at 11:21:50PM -0400, Neal Cardwell wrote:
> On Wed, Sep 9, 2020 at 8:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Sep 09, 2020 at 02:15:52PM -0400, Neal Cardwell wrote:
> > > This patch series reorganizes TCP congestion control initialization so that if
> > > EBPF code called by tcp_init_transfer() sets the congestion control algorithm
> > > by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
> > > congestion control module immediately, instead of having tcp_init_transfer()
> > > later initialize the congestion control module.
> > >
> > > This increases flexibility for the EBPF code that runs at connection
> > > establishment time, and simplifies the code.
> > >
> > > This has the following benefits:
> > >
> > > (1) This allows CC module customizations made by the EBPF called in
> > >     tcp_init_transfer() to persist, and not be wiped out by a later
> > >     call to tcp_init_congestion_control() in tcp_init_transfer().
> > >
> > > (2) Does not flip the order of EBPF and CC init, to avoid causing bugs
> > >     for existing code upstream that depends on the current order.
> > >
> > > (3) Does not cause 2 initializations for for CC in the case where the
> > >     EBPF called in tcp_init_transfer() wants to set the CC to a new CC
> > >     algorithm.
> > >
> > > (4) Allows follow-on simplifications to the code in net/core/filter.c
> > >     and net/ipv4/tcp_cong.c, which currently both have some complexity
> > >     to special-case CC initialization to avoid double CC
> > >     initialization if EBPF sets the CC.
> > Thanks for this work.  Only have one nit in patch 3 for consideration.
> >
> > Acked-by: Martin KaFai Lau <kafai@fb.com>
> 
> Thanks for the review! I like your suggestion in patch 3 to further
> simplify the code. Do you mind submitting your idea for a follow-on
> clean-up/refactor as a separate follow-on commit?
Sure. will do.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization
  2020-09-10  3:21   ` Neal Cardwell
  2020-09-10  5:12     ` Martin KaFai Lau
@ 2020-09-10  5:31     ` Alexei Starovoitov
  2020-09-10 13:17       ` Neal Cardwell
  1 sibling, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2020-09-10  5:31 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Martin KaFai Lau, David Miller, Netdev

On Wed, Sep 9, 2020 at 8:24 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Sep 9, 2020 at 8:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Sep 09, 2020 at 02:15:52PM -0400, Neal Cardwell wrote:
> > > This patch series reorganizes TCP congestion control initialization so that if
> > > EBPF code called by tcp_init_transfer() sets the congestion control algorithm
> > > by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
> > > congestion control module immediately, instead of having tcp_init_transfer()
> > > later initialize the congestion control module.
> > >
> > > This increases flexibility for the EBPF code that runs at connection
> > > establishment time, and simplifies the code.
> > >
> > > This has the following benefits:
> > >
> > > (1) This allows CC module customizations made by the EBPF called in
> > >     tcp_init_transfer() to persist, and not be wiped out by a later
> > >     call to tcp_init_congestion_control() in tcp_init_transfer().
> > >
> > > (2) Does not flip the order of EBPF and CC init, to avoid causing bugs
> > >     for existing code upstream that depends on the current order.
> > >
> > > (3) Does not cause 2 initializations for for CC in the case where the
> > >     EBPF called in tcp_init_transfer() wants to set the CC to a new CC
> > >     algorithm.
> > >
> > > (4) Allows follow-on simplifications to the code in net/core/filter.c
> > >     and net/ipv4/tcp_cong.c, which currently both have some complexity
> > >     to special-case CC initialization to avoid double CC
> > >     initialization if EBPF sets the CC.
> > Thanks for this work.  Only have one nit in patch 3 for consideration.
> >
> > Acked-by: Martin KaFai Lau <kafai@fb.com>
>
> Thanks for the review! I like your suggestion in patch 3 to further
> simplify the code. Do you mind submitting your idea for a follow-on
> clean-up/refactor as a separate follow-on commit?

I think it's better to be folded into this set.
It needs rebase to bpf-next anyway.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization
  2020-09-10  5:31     ` Alexei Starovoitov
@ 2020-09-10 13:17       ` Neal Cardwell
  0 siblings, 0 replies; 12+ messages in thread
From: Neal Cardwell @ 2020-09-10 13:17 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Martin KaFai Lau, David Miller, Netdev

OK, thanks everyone. I will rebase the series onto bpf-next and
include a patch with Martin's suggested follow-on cleanup. Will
resubmit ASAP.

thanks,
neal


On Thu, Sep 10, 2020 at 1:31 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Sep 9, 2020 at 8:24 PM Neal Cardwell <ncardwell@google.com> wrote:
> >
> > On Wed, Sep 9, 2020 at 8:36 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Wed, Sep 09, 2020 at 02:15:52PM -0400, Neal Cardwell wrote:
> > > > This patch series reorganizes TCP congestion control initialization so that if
> > > > EBPF code called by tcp_init_transfer() sets the congestion control algorithm
> > > > by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
> > > > congestion control module immediately, instead of having tcp_init_transfer()
> > > > later initialize the congestion control module.
> > > >
> > > > This increases flexibility for the EBPF code that runs at connection
> > > > establishment time, and simplifies the code.
> > > >
> > > > This has the following benefits:
> > > >
> > > > (1) This allows CC module customizations made by the EBPF called in
> > > >     tcp_init_transfer() to persist, and not be wiped out by a later
> > > >     call to tcp_init_congestion_control() in tcp_init_transfer().
> > > >
> > > > (2) Does not flip the order of EBPF and CC init, to avoid causing bugs
> > > >     for existing code upstream that depends on the current order.
> > > >
> > > > (3) Does not cause 2 initializations for for CC in the case where the
> > > >     EBPF called in tcp_init_transfer() wants to set the CC to a new CC
> > > >     algorithm.
> > > >
> > > > (4) Allows follow-on simplifications to the code in net/core/filter.c
> > > >     and net/ipv4/tcp_cong.c, which currently both have some complexity
> > > >     to special-case CC initialization to avoid double CC
> > > >     initialization if EBPF sets the CC.
> > > Thanks for this work.  Only have one nit in patch 3 for consideration.
> > >
> > > Acked-by: Martin KaFai Lau <kafai@fb.com>
> >
> > Thanks for the review! I like your suggestion in patch 3 to further
> > simplify the code. Do you mind submitting your idea for a follow-on
> > clean-up/refactor as a separate follow-on commit?
>
> I think it's better to be folded into this set.
> It needs rebase to bpf-next anyway.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-09-10 13:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-09 18:15 [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Neal Cardwell
2020-09-09 18:15 ` [PATCH net-next 1/4] tcp: only init congestion control if not initialized already Neal Cardwell
2020-09-09 18:15 ` [PATCH net-next 2/4] tcp: simplify EBPF TCP_CONGESTION to always init CC Neal Cardwell
2020-09-09 18:15 ` [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize Neal Cardwell
2020-09-10  0:29   ` Martin KaFai Lau
2020-09-10  3:17     ` Neal Cardwell
2020-09-09 18:15 ` [PATCH net-next 4/4] tcp: simplify _bpf_setsockopt(): remove flags argument Neal Cardwell
2020-09-10  0:36 ` [PATCH net-next 0/4] tcp: increase flexibility of EBPF congestion control initialization Martin KaFai Lau
2020-09-10  3:21   ` Neal Cardwell
2020-09-10  5:12     ` Martin KaFai Lau
2020-09-10  5:31     ` Alexei Starovoitov
2020-09-10 13:17       ` Neal Cardwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.