netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] net: tls: add socket diag
@ 2019-08-15 16:00 Davide Caratti
  2019-08-15 16:00 ` [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data Davide Caratti
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

The current kernel does not provide any diagnostic tool, except
getsockopt(TCP_ULP), to know more about TCP sockets that have an upper
layer protocol (ULP) on top of them. This series extends the set of
information exported by INET_DIAG_INFO, to include data that are specific
to the ULP (and that might be meaningful for debug/testing purposes).

patch 1/3 ensures that the control plane reads/updates ULP specific data
using RCU.

patch 2/3 extends INET_DIAG_INFO and allows knowing the ULP name for
each TCP socket that has done setsockopt(TCP_ULP) successfully.

patch 3/3 extends kTLS to let programs like 'ss' know the protocol
version and the cipher in use.

Changes since RFC:
- some coding style fixes, thanks to Jakub Kicinski
- add X_UNSPEC as lowest value of uAPI enums, thanks to Jakub Kicinski
- fix assignment of struct nlattr *start, thanks to Jakub Kicinski
- let tls dump RXCONF and TXCONF, suggested by Jakub Kicinski
- don't dump anything if TLS version or cipher are 0 (but still return a
  constant size in get_aux_size()), thanks to Boris Pismenny
- constify first argument of get_info() and get_size()
- use RCU to access access ulp_ops, like it's done for ca_ops
- add patch 1/3, from Jakub Kicinski

Davide Caratti (2):
  tcp: ulp: add functions to dump ulp-specific information
  net: tls: export protocol version, cipher, tx_conf/rx_conf to socket
    diag

Jakub Kicinski (1):
  net/tls: use RCU protection on icsk->icsk_ulp_data

 include/net/inet_connection_sock.h |  2 +-
 include/net/tcp.h                  |  3 ++
 include/net/tls.h                  | 28 +++++++++-
 include/uapi/linux/inet_diag.h     |  9 ++++
 include/uapi/linux/tls.h           | 15 ++++++
 net/core/sock_map.c                |  2 +-
 net/ipv4/tcp_diag.c                | 56 ++++++++++++++++++-
 net/tls/tls_device.c               |  2 +-
 net/tls/tls_main.c                 | 87 +++++++++++++++++++++++++++---
 9 files changed, 191 insertions(+), 13 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data
  2019-08-15 16:00 [PATCH net-next 0/3] net: tls: add socket diag Davide Caratti
@ 2019-08-15 16:00 ` Davide Caratti
  2019-08-15 21:32   ` Jakub Kicinski
  2019-08-15 16:00 ` [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information Davide Caratti
  2019-08-15 16:00 ` [PATCH net-next 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag Davide Caratti
  2 siblings, 1 reply; 10+ messages in thread
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

From: Jakub Kicinski <jakub.kicinski@netronome.com>

We need to make sure context does not get freed while diag
code is interrogating it. Free struct tls_context with
kfree_rcu().

We add the __rcu annotation directly in icsk, and cast it
away in the datapath accessor. Presumably all ULPs will
do a similar thing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/inet_connection_sock.h |  2 +-
 include/net/tls.h                  |  9 +++++++--
 net/core/sock_map.c                |  2 +-
 net/tls/tls_device.c               |  2 +-
 net/tls/tls_main.c                 | 31 +++++++++++++++++++++++-------
 5 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c57d53e7e02c..895546058a20 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -97,7 +97,7 @@ struct inet_connection_sock {
 	const struct tcp_congestion_ops *icsk_ca_ops;
 	const struct inet_connection_sock_af_ops *icsk_af_ops;
 	const struct tcp_ulp_ops  *icsk_ulp_ops;
-	void			  *icsk_ulp_data;
+	void __rcu		  *icsk_ulp_data;
 	void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
 	struct hlist_node         icsk_listen_portaddr_node;
 	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
diff --git a/include/net/tls.h b/include/net/tls.h
index 41b2d41bb1b8..4997742475cd 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -41,6 +41,7 @@
 #include <linux/tcp.h>
 #include <linux/skmsg.h>
 #include <linux/netdevice.h>
+#include <linux/rcupdate.h>
 
 #include <net/tcp.h>
 #include <net/strparser.h>
@@ -290,6 +291,7 @@ struct tls_context {
 
 	struct list_head list;
 	refcount_t refcount;
+	struct rcu_head rcu;
 };
 
 enum tls_offload_ctx_dir {
@@ -348,7 +350,7 @@ struct tls_offload_context_rx {
 #define TLS_OFFLOAD_CONTEXT_SIZE_RX					\
 	(sizeof(struct tls_offload_context_rx) + TLS_DRIVER_STATE_SIZE_RX)
 
-void tls_ctx_free(struct tls_context *ctx);
+void tls_ctx_free(struct sock *sk, struct tls_context *ctx);
 int wait_on_pending_writer(struct sock *sk, long *timeo);
 int tls_sk_query(struct sock *sk, int optname, char __user *optval,
 		int __user *optlen);
@@ -467,7 +469,10 @@ static inline struct tls_context *tls_get_ctx(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
-	return icsk->icsk_ulp_data;
+	/* Use RCU on icsk_ulp_data only for sock diag code,
+	 * TLS data path doesn't need rcu_dereference().
+	 */
+	return (__force void *)icsk->icsk_ulp_data;
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 1330a7442e5b..01998860afaa 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -345,7 +345,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 		return -EINVAL;
 	if (unlikely(idx >= map->max_entries))
 		return -E2BIG;
-	if (unlikely(icsk->icsk_ulp_data))
+	if (unlikely(rcu_access_pointer(icsk->icsk_ulp_data)))
 		return -EINVAL;
 
 	link = sk_psock_init_link();
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index d184230665eb..436df5b4bb60 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -61,7 +61,7 @@ static void tls_device_free_ctx(struct tls_context *ctx)
 	if (ctx->rx_conf == TLS_HW)
 		kfree(tls_offload_ctx_rx(ctx));
 
-	tls_ctx_free(ctx);
+	tls_ctx_free(NULL, ctx);
 }
 
 static void tls_device_gc_task(struct work_struct *work)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 9cbbae606ced..04829bef514c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -251,14 +251,31 @@ static void tls_write_space(struct sock *sk)
 	ctx->sk_write_space(sk);
 }
 
-void tls_ctx_free(struct tls_context *ctx)
+/**
+ * tls_ctx_free() - free TLS ULP context
+ * @sk:  socket to with @ctx is attached
+ * @ctx: TLS context structure
+ *
+ * Free TLS context. If @sk is %NULL caller guarantees that the socket
+ * to which @ctx was attached has no outstanding references.
+ */
+void tls_ctx_free(struct sock *sk, struct tls_context *ctx)
 {
+	struct inet_connection_sock *icsk;
+
 	if (!ctx)
 		return;
 
 	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
 	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
-	kfree(ctx);
+
+	if (sk) {
+		icsk = inet_csk(sk);
+		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
+		kfree_rcu(ctx, rcu);
+	} else {
+		kfree(ctx);
+	}
 }
 
 static void tls_sk_proto_cleanup(struct sock *sk,
@@ -306,7 +323,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 
 	write_lock_bh(&sk->sk_callback_lock);
 	if (free_ctx)
-		icsk->icsk_ulp_data = NULL;
+		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
 	sk->sk_prot = ctx->sk_proto;
 	write_unlock_bh(&sk->sk_callback_lock);
 	release_sock(sk);
@@ -319,7 +336,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	ctx->sk_proto_close(sk, timeout);
 
 	if (free_ctx)
-		tls_ctx_free(ctx);
+		tls_ctx_free(sk, ctx);
 }
 
 static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
@@ -608,7 +625,7 @@ static struct tls_context *create_ctx(struct sock *sk)
 	if (!ctx)
 		return NULL;
 
-	icsk->icsk_ulp_data = ctx;
+	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
 	ctx->setsockopt = sk->sk_prot->setsockopt;
 	ctx->getsockopt = sk->sk_prot->getsockopt;
 	ctx->sk_proto_close = sk->sk_prot->close;
@@ -649,8 +666,8 @@ static void tls_hw_sk_destruct(struct sock *sk)
 
 	ctx->sk_destruct(sk);
 	/* Free ctx */
-	tls_ctx_free(ctx);
-	icsk->icsk_ulp_data = NULL;
+	tls_ctx_free(sk, ctx);
+	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
 }
 
 static int tls_hw_prot(struct sock *sk)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information
  2019-08-15 16:00 [PATCH net-next 0/3] net: tls: add socket diag Davide Caratti
  2019-08-15 16:00 ` [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data Davide Caratti
@ 2019-08-15 16:00 ` Davide Caratti
  2019-08-15 18:46   ` Eric Dumazet
  2019-08-15 16:00 ` [PATCH net-next 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag Davide Caratti
  2 siblings, 1 reply; 10+ messages in thread
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on
top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(),
introduced by commit b37e88407c1d ("inet_diag: allow protocols to provide
additional data"), to report the ULP name and other information that can
be made available by the ULP through optional functions.

Users having CAP_NET_ADMIN privileges will then be able to retrieve this
information through inet_diag_handler, if they specify INET_DIAG_INFO in
the request.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 include/net/tcp.h              |  3 ++
 include/uapi/linux/inet_diag.h |  8 +++++
 net/ipv4/tcp_diag.c            | 56 +++++++++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 77fe87f7a992..c9a3f9688223 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2122,6 +2122,9 @@ struct tcp_ulp_ops {
 	void (*update)(struct sock *sk, struct proto *p);
 	/* cleanup ulp */
 	void (*release)(struct sock *sk);
+	/* diagnostic */
+	int (*get_info)(const struct sock *sk, struct sk_buff *skb);
+	size_t (*get_info_size)(const struct sock *sk);
 
 	char		name[TCP_ULP_NAME_MAX];
 	struct module	*owner;
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index e8baca85bac6..e2c6273274f3 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -153,11 +153,19 @@ enum {
 	INET_DIAG_BBRINFO,	/* request as INET_DIAG_VEGASINFO */
 	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
+	INET_DIAG_ULP_INFO,
 	__INET_DIAG_MAX,
 };
 
 #define INET_DIAG_MAX (__INET_DIAG_MAX - 1)
 
+enum {
+	INET_ULP_INFO_UNSPEC,
+	INET_ULP_INFO_NAME,
+	__INET_ULP_INFO_MAX,
+};
+#define INET_ULP_INFO_MAX (__INET_ULP_INFO_MAX - 1)
+
 /* INET_DIAG_MEM */
 
 struct inet_diag_meminfo {
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index a3a386236d93..1cec262ac8eb 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -81,13 +81,42 @@ static int tcp_diag_put_md5sig(struct sk_buff *skb,
 }
 #endif
 
+static int tcp_diag_put_ulp(struct sk_buff *skb, struct sock *sk,
+			    const struct tcp_ulp_ops *ulp_ops)
+{
+	struct nlattr *nest;
+	int err;
+
+	nest = nla_nest_start_noflag(skb, INET_DIAG_ULP_INFO);
+	if (!nest)
+		return -EMSGSIZE;
+
+	err = nla_put_string(skb, INET_ULP_INFO_NAME, ulp_ops->name);
+	if (err)
+		goto nla_failure;
+
+	if (ulp_ops->get_info)
+		err = ulp_ops->get_info(sk, skb);
+	if (err)
+		goto nla_failure;
+
+	nla_nest_end(skb, nest);
+	return 0;
+
+nla_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
 static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 			    struct sk_buff *skb)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	int err = 0;
+
 #ifdef CONFIG_TCP_MD5SIG
 	if (net_admin) {
 		struct tcp_md5sig_info *md5sig;
-		int err = 0;
 
 		rcu_read_lock();
 		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
@@ -99,11 +128,23 @@ static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 	}
 #endif
 
+	if (net_admin) {
+		const struct tcp_ulp_ops *ulp_ops;
+
+		rcu_read_lock();
+		ulp_ops = icsk->icsk_ulp_ops;
+		if (ulp_ops)
+			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
+		rcu_read_unlock();
+		if (err)
+			return err;
+	}
 	return 0;
 }
 
 static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	size_t size = 0;
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -124,6 +165,19 @@ static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 	}
 #endif
 
+	if (net_admin) {
+		const struct tcp_ulp_ops *ulp_ops;
+
+		rcu_read_lock();
+		ulp_ops = icsk->icsk_ulp_ops;
+		if (ulp_ops) {
+			size += nla_total_size(0) +
+				nla_total_size(TCP_ULP_NAME_MAX);
+			if (ulp_ops->get_info_size)
+				size += ulp_ops->get_info_size(sk);
+		}
+		rcu_read_unlock();
+	}
 	return size;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
  2019-08-15 16:00 [PATCH net-next 0/3] net: tls: add socket diag Davide Caratti
  2019-08-15 16:00 ` [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data Davide Caratti
  2019-08-15 16:00 ` [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information Davide Caratti
@ 2019-08-15 16:00 ` Davide Caratti
  2 siblings, 0 replies; 10+ messages in thread
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

When an application configures kernel TLS on top of a TCP socket, it's
now possible for inet_diag_handler() to collect information regarding the
protocol version, the cipher type and TX / RX configuration, in case
INET_DIAG_INFO is requested.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 include/net/tls.h              | 19 ++++++++++++
 include/uapi/linux/inet_diag.h |  1 +
 include/uapi/linux/tls.h       | 15 +++++++++
 net/tls/tls_main.c             | 56 ++++++++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4997742475cd..990f1d9182a3 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -431,6 +431,25 @@ static inline bool is_tx_ready(struct tls_sw_context_tx *ctx)
 	return READ_ONCE(rec->tx_ready);
 }
 
+static inline u16 tls_user_config(struct tls_context *ctx, bool tx)
+{
+	u16 config = tx ? ctx->tx_conf : ctx->rx_conf;
+
+	switch (config) {
+	case TLS_BASE:
+		return TLS_CONF_BASE;
+	case TLS_SW:
+		return TLS_CONF_SW;
+#ifdef CONFIG_TLS_DEVICE
+	case TLS_HW:
+		return TLS_CONF_HW;
+#endif
+	case TLS_HW_RECORD:
+		return TLS_CONF_HW_RECORD;
+	}
+	return 0;
+}
+
 struct sk_buff *
 tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
 		      struct sk_buff *skb);
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index e2c6273274f3..a1ff345b3f33 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -162,6 +162,7 @@ enum {
 enum {
 	INET_ULP_INFO_UNSPEC,
 	INET_ULP_INFO_NAME,
+	INET_ULP_INFO_TLS,
 	__INET_ULP_INFO_MAX,
 };
 #define INET_ULP_INFO_MAX (__INET_ULP_INFO_MAX - 1)
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 5b9c26753e46..bcd2869ed472 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -109,4 +109,19 @@ struct tls12_crypto_info_aes_ccm_128 {
 	unsigned char rec_seq[TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE];
 };
 
+enum {
+	TLS_INFO_UNSPEC,
+	TLS_INFO_VERSION,
+	TLS_INFO_CIPHER,
+	TLS_INFO_TXCONF,
+	TLS_INFO_RXCONF,
+	__TLS_INFO_MAX,
+};
+#define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
+
+#define TLS_CONF_BASE 1
+#define TLS_CONF_SW 2
+#define TLS_CONF_HW 3
+#define TLS_CONF_HW_RECORD 4
+
 #endif /* _UAPI_LINUX_TLS_H */
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 04829bef514c..957d937c72d2 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -39,6 +39,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/inetdevice.h>
+#include <linux/inet_diag.h>
 
 #include <net/tls.h>
 
@@ -838,6 +839,59 @@ static void tls_update(struct sock *sk, struct proto *p)
 	}
 }
 
+static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	u16 version, cipher_type;
+	struct nlattr *start;
+	int err;
+
+	start = nla_nest_start_noflag(skb, INET_ULP_INFO_TLS);
+	if (!start)
+		return -EMSGSIZE;
+
+	version = ctx->prot_info.version;
+	if (version) {
+		err = nla_put_u16(skb, TLS_INFO_VERSION, version);
+		if (err)
+			goto nla_failure;
+	}
+	cipher_type = ctx->prot_info.cipher_type;
+	if (cipher_type) {
+		err = nla_put_u16(skb, TLS_INFO_CIPHER, cipher_type);
+		if (err)
+			goto nla_failure;
+	}
+	err = nla_put_u16(skb, TLS_INFO_TXCONF, tls_user_config(ctx, true));
+	if (err)
+		goto nla_failure;
+
+	err = nla_put_u16(skb, TLS_INFO_RXCONF, tls_user_config(ctx, false));
+	if (err)
+		goto nla_failure;
+
+	nla_nest_end(skb, start);
+	return 0;
+
+nla_failure:
+	nla_nest_cancel(skb, start);
+	return err;
+}
+
+static size_t tls_get_info_size(const struct sock *sk)
+{
+	size_t size = 0;
+
+	size += nla_total_size(0) +		/* INET_ULP_INFO_TLS */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_VERSION */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_CIPHER */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_RXCONF */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
+		0;
+
+	return size;
+}
+
 void tls_register_device(struct tls_device *device)
 {
 	spin_lock_bh(&device_spinlock);
@@ -859,6 +913,8 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
 	.owner			= THIS_MODULE,
 	.init			= tls_init,
 	.update			= tls_update,
+	.get_info		= tls_get_info,
+	.get_info_size		= tls_get_info_size,
 };
 
 static int __init tls_register(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information
  2019-08-15 16:00 ` [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information Davide Caratti
@ 2019-08-15 18:46   ` Eric Dumazet
  2019-08-15 21:38     ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2019-08-15 18:46 UTC (permalink / raw)
  To: Davide Caratti, Boris Pismenny, Jakub Kicinski, John Fastabend,
	Dave Watson, Aviad Yehezkel, David S. Miller, netdev



On 8/15/19 6:00 PM, Davide Caratti wrote:

>  
> +	if (net_admin) {
> +		const struct tcp_ulp_ops *ulp_ops;
> +
> +		rcu_read_lock();
> +		ulp_ops = icsk->icsk_ulp_ops;
> +		if (ulp_ops)
> +			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
> +		rcu_read_unlock();
> +		if (err)
> +			return err;
> +	}
>  	return 0;


Why is rcu_read_lock() and rcu_read_unlock() used at all ?

icsk->icsk_ulp_ops does not seem to be rcu protected ?

If this was, then an rcu_dereference() would be appropriate.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data
  2019-08-15 16:00 ` [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data Davide Caratti
@ 2019-08-15 21:32   ` Jakub Kicinski
  2019-08-19 13:23     ` Davide Caratti
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2019-08-15 21:32 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Boris Pismenny, John Fastabend, Dave Watson, Aviad Yehezkel,
	David S. Miller, netdev

On Thu, 15 Aug 2019 18:00:42 +0200, Davide Caratti wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> 
> We need to make sure context does not get freed while diag
> code is interrogating it. Free struct tls_context with
> kfree_rcu().
> 
> We add the __rcu annotation directly in icsk, and cast it
> away in the datapath accessor. Presumably all ULPs will
> do a similar thing.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  include/net/inet_connection_sock.h |  2 +-
>  include/net/tls.h                  |  9 +++++++--
>  net/core/sock_map.c                |  2 +-
>  net/tls/tls_device.c               |  2 +-
>  net/tls/tls_main.c                 | 31 +++++++++++++++++++++++-------
>  5 files changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index c57d53e7e02c..895546058a20 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -97,7 +97,7 @@ struct inet_connection_sock {
>  	const struct tcp_congestion_ops *icsk_ca_ops;
>  	const struct inet_connection_sock_af_ops *icsk_af_ops;
>  	const struct tcp_ulp_ops  *icsk_ulp_ops;
> -	void			  *icsk_ulp_data;
> +	void __rcu		  *icsk_ulp_data;
>  	void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
>  	struct hlist_node         icsk_listen_portaddr_node;
>  	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 41b2d41bb1b8..4997742475cd 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -41,6 +41,7 @@
>  #include <linux/tcp.h>
>  #include <linux/skmsg.h>
>  #include <linux/netdevice.h>
> +#include <linux/rcupdate.h>
>  
>  #include <net/tcp.h>
>  #include <net/strparser.h>
> @@ -290,6 +291,7 @@ struct tls_context {
>  
>  	struct list_head list;
>  	refcount_t refcount;
> +	struct rcu_head rcu;
>  };
>  
>  enum tls_offload_ctx_dir {
> @@ -348,7 +350,7 @@ struct tls_offload_context_rx {
>  #define TLS_OFFLOAD_CONTEXT_SIZE_RX					\
>  	(sizeof(struct tls_offload_context_rx) + TLS_DRIVER_STATE_SIZE_RX)
>  
> -void tls_ctx_free(struct tls_context *ctx);
> +void tls_ctx_free(struct sock *sk, struct tls_context *ctx);
>  int wait_on_pending_writer(struct sock *sk, long *timeo);
>  int tls_sk_query(struct sock *sk, int optname, char __user *optval,
>  		int __user *optlen);
> @@ -467,7 +469,10 @@ static inline struct tls_context *tls_get_ctx(const struct sock *sk)
>  {
>  	struct inet_connection_sock *icsk = inet_csk(sk);
>  
> -	return icsk->icsk_ulp_data;
> +	/* Use RCU on icsk_ulp_data only for sock diag code,
> +	 * TLS data path doesn't need rcu_dereference().
> +	 */
> +	return (__force void *)icsk->icsk_ulp_data;
>  }
>  
>  static inline void tls_advance_record_sn(struct sock *sk,
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 1330a7442e5b..01998860afaa 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -345,7 +345,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
>  		return -EINVAL;
>  	if (unlikely(idx >= map->max_entries))
>  		return -E2BIG;
> -	if (unlikely(icsk->icsk_ulp_data))
> +	if (unlikely(rcu_access_pointer(icsk->icsk_ulp_data)))
>  		return -EINVAL;
>  
>  	link = sk_psock_init_link();
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index d184230665eb..436df5b4bb60 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -61,7 +61,7 @@ static void tls_device_free_ctx(struct tls_context *ctx)
>  	if (ctx->rx_conf == TLS_HW)
>  		kfree(tls_offload_ctx_rx(ctx));
>  
> -	tls_ctx_free(ctx);
> +	tls_ctx_free(NULL, ctx);
>  }
>  
>  static void tls_device_gc_task(struct work_struct *work)
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 9cbbae606ced..04829bef514c 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -251,14 +251,31 @@ static void tls_write_space(struct sock *sk)
>  	ctx->sk_write_space(sk);
>  }
>  
> -void tls_ctx_free(struct tls_context *ctx)
> +/**
> + * tls_ctx_free() - free TLS ULP context
> + * @sk:  socket to with @ctx is attached
> + * @ctx: TLS context structure
> + *
> + * Free TLS context. If @sk is %NULL caller guarantees that the socket
> + * to which @ctx was attached has no outstanding references.
> + */
> +void tls_ctx_free(struct sock *sk, struct tls_context *ctx)
>  {
> +	struct inet_connection_sock *icsk;
> +
>  	if (!ctx)
>  		return;
>  
>  	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
>  	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
> -	kfree(ctx);
> +
> +	if (sk) {
> +		icsk = inet_csk(sk);
> +		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);

Now that we kind of want to set the icsk_ulp_data to NULL under the
callback_lock I think we should let the callers do it.

> +		kfree_rcu(ctx, rcu);
> +	} else {
> +		kfree(ctx);
> +	}
>  }
>  
>  static void tls_sk_proto_cleanup(struct sock *sk,
> @@ -306,7 +323,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
>  
>  	write_lock_bh(&sk->sk_callback_lock);
>  	if (free_ctx)
> -		icsk->icsk_ulp_data = NULL;
> +		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
>  	sk->sk_prot = ctx->sk_proto;
>  	write_unlock_bh(&sk->sk_callback_lock);
>  	release_sock(sk);
> @@ -319,7 +336,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
>  	ctx->sk_proto_close(sk, timeout);
>  
>  	if (free_ctx)
> -		tls_ctx_free(ctx);
> +		tls_ctx_free(sk, ctx);
>  }
>  
>  static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
> @@ -608,7 +625,7 @@ static struct tls_context *create_ctx(struct sock *sk)
>  	if (!ctx)
>  		return NULL;
>  
> -	icsk->icsk_ulp_data = ctx;
> +	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
>  	ctx->setsockopt = sk->sk_prot->setsockopt;
>  	ctx->getsockopt = sk->sk_prot->getsockopt;
>  	ctx->sk_proto_close = sk->sk_prot->close;
> @@ -649,8 +666,8 @@ static void tls_hw_sk_destruct(struct sock *sk)
>  
>  	ctx->sk_destruct(sk);
>  	/* Free ctx */
> -	tls_ctx_free(ctx);
> -	icsk->icsk_ulp_data = NULL;
> +	tls_ctx_free(sk, ctx);
> +	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);

Let's reorder the assignment before the free.

>  }
>  
>  static int tls_hw_prot(struct sock *sk)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information
  2019-08-15 18:46   ` Eric Dumazet
@ 2019-08-15 21:38     ` Jakub Kicinski
  2019-08-19 13:32       ` Davide Caratti
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2019-08-15 21:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Davide Caratti, Boris Pismenny, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

On Thu, 15 Aug 2019 20:46:01 +0200, Eric Dumazet wrote:
> On 8/15/19 6:00 PM, Davide Caratti wrote:
> 
> >  
> > +	if (net_admin) {
> > +		const struct tcp_ulp_ops *ulp_ops;
> > +
> > +		rcu_read_lock();
> > +		ulp_ops = icsk->icsk_ulp_ops;
> > +		if (ulp_ops)
> > +			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
> > +		rcu_read_unlock();
> > +		if (err)
> > +			return err;
> > +	}
> >  	return 0;  
> 
> 
> Why is rcu_read_lock() and rcu_read_unlock() used at all ?
> 
> icsk->icsk_ulp_ops does not seem to be rcu protected ?
> 
> If this was, then an rcu_dereference() would be appropriate.

Indeed it's ulp_data not ulp_ops that are protected. Davide, 
perhaps we could push the RCU lock into tls_get_info(), after all?

And tls_context has to use rcu_deference there, as Eric points out, 
plus we should probably NULL-check it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data
  2019-08-15 21:32   ` Jakub Kicinski
@ 2019-08-19 13:23     ` Davide Caratti
  0 siblings, 0 replies; 10+ messages in thread
From: Davide Caratti @ 2019-08-19 13:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Boris Pismenny, John Fastabend, Dave Watson, Aviad Yehezkel,
	David S. Miller, netdev

On Thu, 2019-08-15 at 14:32 -0700, Jakub Kicinski wrote:
> On Thu, 15 Aug 2019 18:00:42 +0200, Davide Caratti wrote:
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > 
> > We need to make sure context does not get freed while diag
> > code is interrogating it. Free struct tls_context with
> > kfree_rcu().
> > 
> > We add the __rcu annotation directly in icsk, and cast it
> > away in the datapath accessor. Presumably all ULPs will
> > do a similar thing.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

hello Jakub,

> > @@ -251,14 +251,31 @@ static void tls_write_space(struct sock *sk)
> >  	ctx->sk_write_space(sk);
> >  }
> >  
> > -void tls_ctx_free(struct tls_context *ctx)
> > +/**
> > + * tls_ctx_free() - free TLS ULP context
> > + * @sk:  socket to with @ctx is attached
> > + * @ctx: TLS context structure
> > + *
> > + * Free TLS context. If @sk is %NULL caller guarantees that the socket
> > + * to which @ctx was attached has no outstanding references.
> > + */
> > +void tls_ctx_free(struct sock *sk, struct tls_context *ctx)
> >  {
> > +	struct inet_connection_sock *icsk;
> > +
> >  	if (!ctx)
> >  		return;
> >  
> >  	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
> >  	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
> > -	kfree(ctx);
> > +
> > +	if (sk) {
> > +		icsk = inet_csk(sk);
> > +		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
> 
> Now that we kind of want to set the icsk_ulp_data to NULL under the
> callback_lock I think we should let the callers do it.

Ok, I will fix this in series v2.

> > 
> > @@ -649,8 +666,8 @@ static void tls_hw_sk_destruct(struct sock *sk)
> >  
> >  	ctx->sk_destruct(sk);
> >  	/* Free ctx */
> > -	tls_ctx_free(ctx);
> > -	icsk->icsk_ulp_data = NULL;
> > +	tls_ctx_free(sk, ctx);
> > +	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
> 
> Let's reorder the assignment before the free.

Ok, I will fix this in series v2.

thanks!
-- 
davide



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information
  2019-08-15 21:38     ` Jakub Kicinski
@ 2019-08-19 13:32       ` Davide Caratti
  2019-08-19 20:40         ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Davide Caratti @ 2019-08-19 13:32 UTC (permalink / raw)
  To: Jakub Kicinski, Eric Dumazet
  Cc: Boris Pismenny, John Fastabend, Dave Watson, Aviad Yehezkel,
	David S. Miller, netdev

On Thu, 2019-08-15 at 14:38 -0700, Jakub Kicinski wrote:
> On Thu, 15 Aug 2019 20:46:01 +0200, Eric Dumazet wrote:

hello Eric and Jakub, thanks a lot for looking at this.

> > On 8/15/19 6:00 PM, Davide Caratti wrote:
> > 
> > >  
> > > +	if (net_admin) {
> > > +		const struct tcp_ulp_ops *ulp_ops;
> > > +
> > > +		rcu_read_lock();
> > > +		ulp_ops = icsk->icsk_ulp_ops;
> > > +		if (ulp_ops)
> > > +			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
> > > +		rcu_read_unlock();
> > > +		if (err)
> > > +			return err;
> > > +	}
> > >  	return 0;  
> > 
> > Why is rcu_read_lock() and rcu_read_unlock() used at all ?
> > 
> > icsk->icsk_ulp_ops does not seem to be rcu protected ?
> > 
> > If this was, then an rcu_dereference() would be appropriate.
> 
> Indeed it's ulp_data not ulp_ops that are protected. 

the goal is to protect execution of 'ss -tni' against concurrent removal
of tls.ko module, similarly to what was done in inet_sk_diag_fill() when
INET_DIAG_CONG is requested [1]. But after reading more carefully, the
assignment of ulp_ops needs to be:

	ulp_ops = READ_ONCE(icsk->icsk_ulp_ops);

which I lost in internal reviews, with some additional explanatory
comment. Ok if I correct the above hunk with READ_ONCE() and add a
comment?

> Davide, perhaps we could push the RCU lock into tls_get_info(), after all?

It depends on whether concurrent dump / module removal is an issue for TCP
ULPs, like it was for congestion control schemes [1]. Any advice?

> And tls_context has to use rcu_deference there, as Eric points out, 
> plus we should probably NULL-check it.

yes, it makes sense, for patch 3/3, in the assignment of 'ctx'. Instead of
calling tls_get_ctx() in tls_get_info() I will do

	ctx = rcu_dereference(inet_csk(sk)->icsk_ulp_data);

and let it return 0 in case of NULL ctx (as it doesn't look like a faulty
situation). Ok? 

-- 
davide


[1] see:
commit 521f1cf1dbb9d5ad858dca5dc75d1b45f64b6589
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Apr 16 18:10:35 2015 -0700

    inet_diag: fix access to tcp cc information


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information
  2019-08-19 13:32       ` Davide Caratti
@ 2019-08-19 20:40         ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2019-08-19 20:40 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Eric Dumazet, Boris Pismenny, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

On Mon, 19 Aug 2019 15:32:09 +0200, Davide Caratti wrote:
> On Thu, 2019-08-15 at 14:38 -0700, Jakub Kicinski wrote:
> > On Thu, 15 Aug 2019 20:46:01 +0200, Eric Dumazet wrote:  
> > > On 8/15/19 6:00 PM, Davide Caratti wrote:
> > > > +	if (net_admin) {
> > > > +		const struct tcp_ulp_ops *ulp_ops;
> > > > +
> > > > +		rcu_read_lock();
> > > > +		ulp_ops = icsk->icsk_ulp_ops;
> > > > +		if (ulp_ops)
> > > > +			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
> > > > +		rcu_read_unlock();
> > > > +		if (err)
> > > > +			return err;
> > > > +	}
> > > >  	return 0;    
> > > 
> > > Why is rcu_read_lock() and rcu_read_unlock() used at all ?
> > > 
> > > icsk->icsk_ulp_ops does not seem to be rcu protected ?
> > > 
> > > If this was, then an rcu_dereference() would be appropriate.  
> > 
> > Indeed it's ulp_data not ulp_ops that are protected.   
> 
> the goal is to protect execution of 'ss -tni' against concurrent removal
> of tls.ko module, similarly to what was done in inet_sk_diag_fill() when
> INET_DIAG_CONG is requested [1]. But after reading more carefully, the
> assignment of ulp_ops needs to be:
> 
> 	ulp_ops = READ_ONCE(icsk->icsk_ulp_ops);
> 
> which I lost in internal reviews, with some additional explanatory
> comment. Ok if I correct the above hunk with READ_ONCE() and add a
> comment?

Seems like a forth while future-proofing. Currently the ULP can't
change, and is only released when socket is destroyed, so we should 
be safe (unlike CC which can be changed at any moment). 

We should mark the pointer as RCU tho, I find it hard to wrap my head
around these half-way RCU pointers with just READ_ONCE() on them :S

> > Davide, perhaps we could push the RCU lock into tls_get_info(), after all?  
> 
> It depends on whether concurrent dump / module removal is an issue for TCP
> ULPs, like it was for congestion control schemes [1]. Any advice?

If we're willing to mark icsk->icsk_ulp_ops as RCU I think it's fine.
But I'm not 100% sure its worth the churn :S

> > And tls_context has to use rcu_deference there, as Eric points out, 
> > plus we should probably NULL-check it.  
> 
> yes, it makes sense, for patch 3/3, in the assignment of 'ctx'. Instead of
> calling tls_get_ctx() in tls_get_info() I will do
> 
> 	ctx = rcu_dereference(inet_csk(sk)->icsk_ulp_data);
> 
> and let it return 0 in case of NULL ctx (as it doesn't look like a faulty
> situation). Ok? 

SGTM!

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-19 20:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-15 16:00 [PATCH net-next 0/3] net: tls: add socket diag Davide Caratti
2019-08-15 16:00 ` [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data Davide Caratti
2019-08-15 21:32   ` Jakub Kicinski
2019-08-19 13:23     ` Davide Caratti
2019-08-15 16:00 ` [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information Davide Caratti
2019-08-15 18:46   ` Eric Dumazet
2019-08-15 21:38     ` Jakub Kicinski
2019-08-19 13:32       ` Davide Caratti
2019-08-19 20:40         ` Jakub Kicinski
2019-08-15 16:00 ` [PATCH net-next 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag Davide Caratti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).