All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/9]  inet_diag: remove three mutexes in diag dumps
@ 2024-01-22 11:25 Eric Dumazet
  2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

Surprisingly, inet_diag operations are serialized over a stack
of three mutexes, giving legacy /proc based files an unfair
advantage on modern hosts.

This series removes all of them, making inet_diag operations
(eg iproute2/ss) fully parallel.

1-2) Two first patches are adding data-race annotations
     and can be backported to stable kernels.

3-4) inet_diag_table_mutex can be replaced with RCU protection,
     if we add corresponding protection against module unload.

5-7) sock_diag_table_mutex can be replaced with RCU protection,
     if we add corresponding protection against module unload.

 8)  sock_diag_mutex is removed, as the old bug it was
     working around has been fixed more elegantly.

 9)  inet_diag_dump_icsk() can skip over empty buckets to reduce
     spinlock contention.

Eric Dumazet (9):
  sock_diag: annotate data-races around sock_diag_handlers[family]
  inet_diag: annotate data-races around inet_diag_table[]
  inet_diag: add module pointer to "struct inet_diag_handler"
  inet_diag: allow concurrent operations
  sock_diag: add module pointer to "struct sock_diag_handler"
  sock_diag: allow concurrent operations
  sock_diag: allow concurrent operation in sock_diag_rcv_msg()
  sock_diag: remove sock_diag_mutex
  inet_diag: skip over empty buckets

 include/linux/inet_diag.h |   1 +
 include/linux/sock_diag.h |  10 +++-
 net/core/sock_diag.c      | 120 +++++++++++++++++++++-----------------
 net/dccp/diag.c           |   1 +
 net/ipv4/inet_diag.c      | 101 ++++++++++++++++++--------------
 net/ipv4/raw_diag.c       |   1 +
 net/ipv4/tcp_diag.c       |   1 +
 net/ipv4/udp_diag.c       |   2 +
 net/mptcp/mptcp_diag.c    |   1 +
 net/netlink/diag.c        |   1 +
 net/packet/diag.c         |   1 +
 net/sctp/diag.c           |   1 +
 net/smc/smc_diag.c        |   1 +
 net/tipc/diag.c           |   1 +
 net/unix/diag.c           |   1 +
 net/vmw_vsock/diag.c      |   1 +
 net/xdp/xsk_diag.c        |   1 +
 17 files changed, 149 insertions(+), 97 deletions(-)

-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
  2024-01-22 16:29   ` Guillaume Nault
  2024-01-22 22:36   ` Kuniyuki Iwashima
  2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

__sock_diag_cmd() and sock_diag_bind() read sock_diag_handlers[family]
without a lock held.

Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.

Fixes: 8ef874bfc729 ("sock_diag: Move the sock_ code to net/core/")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/sock_diag.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index b1e29e18d1d60cb5c87c884652f547c083ba81cd..c53b731f2d6728d113b90732f4df5b011a438038 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -193,7 +193,7 @@ int sock_diag_register(const struct sock_diag_handler *hndl)
 	if (sock_diag_handlers[hndl->family])
 		err = -EBUSY;
 	else
-		sock_diag_handlers[hndl->family] = hndl;
+		WRITE_ONCE(sock_diag_handlers[hndl->family], hndl);
 	mutex_unlock(&sock_diag_table_mutex);
 
 	return err;
@@ -209,7 +209,7 @@ void sock_diag_unregister(const struct sock_diag_handler *hnld)
 
 	mutex_lock(&sock_diag_table_mutex);
 	BUG_ON(sock_diag_handlers[family] != hnld);
-	sock_diag_handlers[family] = NULL;
+	WRITE_ONCE(sock_diag_handlers[family], NULL);
 	mutex_unlock(&sock_diag_table_mutex);
 }
 EXPORT_SYMBOL_GPL(sock_diag_unregister);
@@ -227,7 +227,7 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EINVAL;
 	req->sdiag_family = array_index_nospec(req->sdiag_family, AF_MAX);
 
-	if (sock_diag_handlers[req->sdiag_family] == NULL)
+	if (READ_ONCE(sock_diag_handlers[req->sdiag_family]) == NULL)
 		sock_load_diag_module(req->sdiag_family, 0);
 
 	mutex_lock(&sock_diag_table_mutex);
@@ -286,12 +286,12 @@ static int sock_diag_bind(struct net *net, int group)
 	switch (group) {
 	case SKNLGRP_INET_TCP_DESTROY:
 	case SKNLGRP_INET_UDP_DESTROY:
-		if (!sock_diag_handlers[AF_INET])
+		if (!READ_ONCE(sock_diag_handlers[AF_INET]))
 			sock_load_diag_module(AF_INET, 0);
 		break;
 	case SKNLGRP_INET6_TCP_DESTROY:
 	case SKNLGRP_INET6_UDP_DESTROY:
-		if (!sock_diag_handlers[AF_INET6])
+		if (!READ_ONCE(sock_diag_handlers[AF_INET6]))
 			sock_load_diag_module(AF_INET6, 0);
 		break;
 	}
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[]
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
  2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
  2024-01-22 16:29   ` Guillaume Nault
  2024-01-22 22:37   ` Kuniyuki Iwashima
  2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

inet_diag_lock_handler() reads inet_diag_table[proto] locklessly.

Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.

Fixes: d523a328fb02 ("[INET]: Fix inet_diag dead-lock regression")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 8e6b6aa0579e1c94def819a1f9eab8b946771ba7..9804e9608a5a0294b3ffabc4b5bb87ac1b96b09e 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -57,7 +57,7 @@ static const struct inet_diag_handler *inet_diag_lock_handler(int proto)
 		return ERR_PTR(-ENOENT);
 	}
 
-	if (!inet_diag_table[proto])
+	if (!READ_ONCE(inet_diag_table[proto]))
 		sock_load_diag_module(AF_INET, proto);
 
 	mutex_lock(&inet_diag_table_mutex);
@@ -1503,7 +1503,7 @@ int inet_diag_register(const struct inet_diag_handler *h)
 	mutex_lock(&inet_diag_table_mutex);
 	err = -EEXIST;
 	if (!inet_diag_table[type]) {
-		inet_diag_table[type] = h;
+		WRITE_ONCE(inet_diag_table[type], h);
 		err = 0;
 	}
 	mutex_unlock(&inet_diag_table_mutex);
@@ -1520,7 +1520,7 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
 		return;
 
 	mutex_lock(&inet_diag_table_mutex);
-	inet_diag_table[type] = NULL;
+	WRITE_ONCE(inet_diag_table[type], NULL);
 	mutex_unlock(&inet_diag_table_mutex);
 }
 EXPORT_SYMBOL_GPL(inet_diag_unregister);
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler"
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
  2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
  2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
  2024-01-22 16:58   ` Guillaume Nault
  2024-01-22 22:38   ` Kuniyuki Iwashima
  2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

Following patch is going to use RCU instead of
inet_diag_table_mutex acquisition.

This patch is a preparation, no change of behavior yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/inet_diag.h | 1 +
 net/dccp/diag.c           | 1 +
 net/ipv4/raw_diag.c       | 1 +
 net/ipv4/tcp_diag.c       | 1 +
 net/ipv4/udp_diag.c       | 2 ++
 net/mptcp/mptcp_diag.c    | 1 +
 net/sctp/diag.c           | 1 +
 7 files changed, 8 insertions(+)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 84abb30a3fbb13a61ed786bf32e0f72c0bbfbdd2..a9033696b0aad36ab9abd47e4b68e272053019d7 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -8,6 +8,7 @@
 struct inet_hashinfo;
 
 struct inet_diag_handler {
+	struct module	*owner;
 	void		(*dump)(struct sk_buff *skb,
 				struct netlink_callback *cb,
 				const struct inet_diag_req_v2 *r);
diff --git a/net/dccp/diag.c b/net/dccp/diag.c
index 8a82c5a2c5a8c9ed8885b53744ff13c2bee657d8..f5019d95c3ae535555cc2b4d4885c718227fec67 100644
--- a/net/dccp/diag.c
+++ b/net/dccp/diag.c
@@ -58,6 +58,7 @@ static int dccp_diag_dump_one(struct netlink_callback *cb,
 }
 
 static const struct inet_diag_handler dccp_diag_handler = {
+	.owner		 = THIS_MODULE,
 	.dump		 = dccp_diag_dump,
 	.dump_one	 = dccp_diag_dump_one,
 	.idiag_get_info	 = dccp_diag_get_info,
diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index fe2140c8375c8ebcc69880142c42655233007900..cc793bd8de258c3a12f11e95cec81c5ae4b9a7f6 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -213,6 +213,7 @@ static int raw_diag_destroy(struct sk_buff *in_skb,
 #endif
 
 static const struct inet_diag_handler raw_diag_handler = {
+	.owner			= THIS_MODULE,
 	.dump			= raw_diag_dump,
 	.dump_one		= raw_diag_dump_one,
 	.idiag_get_info		= raw_diag_get_info,
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 4cbe4b44425a6a5daf55abe348c167932ca07222..f428ecf9120f2f596e1d67db2b2a0d0d0e211905 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -222,6 +222,7 @@ static int tcp_diag_destroy(struct sk_buff *in_skb,
 #endif
 
 static const struct inet_diag_handler tcp_diag_handler = {
+	.owner			= THIS_MODULE,
 	.dump			= tcp_diag_dump,
 	.dump_one		= tcp_diag_dump_one,
 	.idiag_get_info		= tcp_diag_get_info,
diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c
index dc41a22ee80e829582349e8e644f204eff07df0e..38cb3a28e4ed6d54f7078afa2700e71db9ce4b85 100644
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -237,6 +237,7 @@ static int udplite_diag_destroy(struct sk_buff *in_skb,
 #endif
 
 static const struct inet_diag_handler udp_diag_handler = {
+	.owner		 = THIS_MODULE,
 	.dump		 = udp_diag_dump,
 	.dump_one	 = udp_diag_dump_one,
 	.idiag_get_info  = udp_diag_get_info,
@@ -260,6 +261,7 @@ static int udplite_diag_dump_one(struct netlink_callback *cb,
 }
 
 static const struct inet_diag_handler udplite_diag_handler = {
+	.owner		 = THIS_MODULE,
 	.dump		 = udplite_diag_dump,
 	.dump_one	 = udplite_diag_dump_one,
 	.idiag_get_info  = udp_diag_get_info,
diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c
index 5409c2ea3f5728a05999db17b7af1b1fb56f757e..bd8ff5950c8d33766a0da971dc127f106feb8481 100644
--- a/net/mptcp/mptcp_diag.c
+++ b/net/mptcp/mptcp_diag.c
@@ -225,6 +225,7 @@ static void mptcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
 }
 
 static const struct inet_diag_handler mptcp_diag_handler = {
+	.owner		 = THIS_MODULE,
 	.dump		 = mptcp_diag_dump,
 	.dump_one	 = mptcp_diag_dump_one,
 	.idiag_get_info  = mptcp_diag_get_info,
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index eb05131ff1dd671e734457e28b2d7b64eab07f85..23359e522273f0377080007c75eb2c276945f781 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -507,6 +507,7 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 }
 
 static const struct inet_diag_handler sctp_diag_handler = {
+	.owner		 = THIS_MODULE,
 	.dump		 = sctp_diag_dump,
 	.dump_one	 = sctp_diag_dump_one,
 	.idiag_get_info  = sctp_diag_get_info,
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 4/9] inet_diag: allow concurrent operations
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (2 preceding siblings ...)
  2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
  2024-01-22 16:58   ` Guillaume Nault
  2024-01-22 22:40   ` Kuniyuki Iwashima
  2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

inet_diag_lock_handler() current implementation uses a mutex
to protect inet_diag_table[] array against concurrent changes.

This makes inet_diag dump serialized, thus less scalable
than legacy /proc files.

It is time to switch to full RCU protection.

As a bonus, if a target is statically linked instead of being
modular, inet_diag_lock_handler() & inet_diag_unlock_handler()
reduce to reads only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c | 80 ++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 9804e9608a5a0294b3ffabc4b5bb87ac1b96b09e..abf7dc9827969d7e8061420be8629730ccce5449 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -32,7 +32,7 @@
 #include <linux/inet_diag.h>
 #include <linux/sock_diag.h>
 
-static const struct inet_diag_handler **inet_diag_table;
+static const struct inet_diag_handler __rcu **inet_diag_table;
 
 struct inet_diag_entry {
 	const __be32 *saddr;
@@ -48,28 +48,28 @@ struct inet_diag_entry {
 #endif
 };
 
-static DEFINE_MUTEX(inet_diag_table_mutex);
-
 static const struct inet_diag_handler *inet_diag_lock_handler(int proto)
 {
-	if (proto < 0 || proto >= IPPROTO_MAX) {
-		mutex_lock(&inet_diag_table_mutex);
-		return ERR_PTR(-ENOENT);
-	}
+	const struct inet_diag_handler *handler;
+
+	if (proto < 0 || proto >= IPPROTO_MAX)
+		return NULL;
 
 	if (!READ_ONCE(inet_diag_table[proto]))
 		sock_load_diag_module(AF_INET, proto);
 
-	mutex_lock(&inet_diag_table_mutex);
-	if (!inet_diag_table[proto])
-		return ERR_PTR(-ENOENT);
+	rcu_read_lock();
+	handler = rcu_dereference(inet_diag_table[proto]);
+	if (handler && !try_module_get(handler->owner))
+		handler = NULL;
+	rcu_read_unlock();
 
-	return inet_diag_table[proto];
+	return handler;
 }
 
 static void inet_diag_unlock_handler(const struct inet_diag_handler *handler)
 {
-	mutex_unlock(&inet_diag_table_mutex);
+	module_put(handler->owner);
 }
 
 void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
@@ -104,9 +104,12 @@ static size_t inet_sk_attr_size(struct sock *sk,
 	const struct inet_diag_handler *handler;
 	size_t aux = 0;
 
-	handler = inet_diag_table[req->sdiag_protocol];
+	rcu_read_lock();
+	handler = rcu_dereference(inet_diag_table[req->sdiag_protocol]);
+	DEBUG_NET_WARN_ON_ONCE(!handler);
 	if (handler && handler->idiag_get_aux_size)
 		aux = handler->idiag_get_aux_size(sk, net_admin);
+	rcu_read_unlock();
 
 	return	  nla_total_size(sizeof(struct tcp_info))
 		+ nla_total_size(sizeof(struct inet_diag_msg))
@@ -244,10 +247,16 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 	struct nlmsghdr  *nlh;
 	struct nlattr *attr;
 	void *info = NULL;
+	int protocol;
 
 	cb_data = cb->data;
-	handler = inet_diag_table[inet_diag_get_protocol(req, cb_data)];
-	BUG_ON(!handler);
+	protocol = inet_diag_get_protocol(req, cb_data);
+
+	/* inet_diag_lock_handler() made sure inet_diag_table[] is stable. */
+	handler = rcu_dereference_protected(inet_diag_table[protocol], 1);
+	DEBUG_NET_WARN_ON_ONCE(!handler);
+	if (!handler)
+		return -ENXIO;
 
 	nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			cb->nlh->nlmsg_type, sizeof(*r), nlmsg_flags);
@@ -605,9 +614,10 @@ static int inet_diag_cmd_exact(int cmd, struct sk_buff *in_skb,
 	protocol = inet_diag_get_protocol(req, &dump_data);
 
 	handler = inet_diag_lock_handler(protocol);
-	if (IS_ERR(handler)) {
-		err = PTR_ERR(handler);
-	} else if (cmd == SOCK_DIAG_BY_FAMILY) {
+	if (!handler)
+		return -ENOENT;
+
+	if (cmd == SOCK_DIAG_BY_FAMILY) {
 		struct netlink_callback cb = {
 			.nlh = nlh,
 			.skb = in_skb,
@@ -1259,12 +1269,12 @@ static int __inet_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 again:
 	prev_min_dump_alloc = cb->min_dump_alloc;
 	handler = inet_diag_lock_handler(protocol);
-	if (!IS_ERR(handler))
+	if (handler) {
 		handler->dump(skb, cb, r);
-	else
-		err = PTR_ERR(handler);
-	inet_diag_unlock_handler(handler);
-
+		inet_diag_unlock_handler(handler);
+	} else {
+		err = -ENOENT;
+	}
 	/* The skb is not large enough to fit one sk info and
 	 * inet_sk_diag_fill() has requested for a larger skb.
 	 */
@@ -1457,10 +1467,9 @@ int inet_diag_handler_get_info(struct sk_buff *skb, struct sock *sk)
 	}
 
 	handler = inet_diag_lock_handler(sk->sk_protocol);
-	if (IS_ERR(handler)) {
-		inet_diag_unlock_handler(handler);
+	if (!handler) {
 		nlmsg_cancel(skb, nlh);
-		return PTR_ERR(handler);
+		return -ENOENT;
 	}
 
 	attr = handler->idiag_info_size
@@ -1495,20 +1504,12 @@ static const struct sock_diag_handler inet6_diag_handler = {
 int inet_diag_register(const struct inet_diag_handler *h)
 {
 	const __u16 type = h->idiag_type;
-	int err = -EINVAL;
 
 	if (type >= IPPROTO_MAX)
-		goto out;
+		return -EINVAL;
 
-	mutex_lock(&inet_diag_table_mutex);
-	err = -EEXIST;
-	if (!inet_diag_table[type]) {
-		WRITE_ONCE(inet_diag_table[type], h);
-		err = 0;
-	}
-	mutex_unlock(&inet_diag_table_mutex);
-out:
-	return err;
+	return !cmpxchg((const struct inet_diag_handler **)&inet_diag_table[type],
+			NULL, h) ? 0 : -EEXIST;
 }
 EXPORT_SYMBOL_GPL(inet_diag_register);
 
@@ -1519,9 +1520,8 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
 	if (type >= IPPROTO_MAX)
 		return;
 
-	mutex_lock(&inet_diag_table_mutex);
-	WRITE_ONCE(inet_diag_table[type], NULL);
-	mutex_unlock(&inet_diag_table_mutex);
+	xchg((const struct inet_diag_handler **)&inet_diag_table[type],
+	     NULL);
 }
 EXPORT_SYMBOL_GPL(inet_diag_unregister);
 
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler"
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (3 preceding siblings ...)
  2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
  2024-01-22 17:24   ` Guillaume Nault
  2024-01-22 22:44   ` Kuniyuki Iwashima
  2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

Following patch is going to use RCU instead of
sock_diag_table_mutex acquisition.

This patch is a preparation, no change of behavior yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/sock_diag.h | 1 +
 net/ipv4/inet_diag.c      | 2 ++
 net/netlink/diag.c        | 1 +
 net/packet/diag.c         | 1 +
 net/smc/smc_diag.c        | 1 +
 net/tipc/diag.c           | 1 +
 net/unix/diag.c           | 1 +
 net/vmw_vsock/diag.c      | 1 +
 net/xdp/xsk_diag.c        | 1 +
 9 files changed, 10 insertions(+)

diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 0b9ecd8cf9793bc26138a0a36474e78773fb4f31..7c07754d711b9bd04bc57f8ed08981849fcadb11 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -13,6 +13,7 @@ struct nlmsghdr;
 struct sock;
 
 struct sock_diag_handler {
+	struct module *owner;
 	__u8 family;
 	int (*dump)(struct sk_buff *skb, struct nlmsghdr *nlh);
 	int (*get_info)(struct sk_buff *skb, struct sock *sk);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index abf7dc9827969d7e8061420be8629730ccce5449..52ce20691e4ef1382da94473128e3c14c55bd542 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1488,6 +1488,7 @@ int inet_diag_handler_get_info(struct sk_buff *skb, struct sock *sk)
 }
 
 static const struct sock_diag_handler inet_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_INET,
 	.dump = inet_diag_handler_cmd,
 	.get_info = inet_diag_handler_get_info,
@@ -1495,6 +1496,7 @@ static const struct sock_diag_handler inet_diag_handler = {
 };
 
 static const struct sock_diag_handler inet6_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_INET6,
 	.dump = inet_diag_handler_cmd,
 	.get_info = inet_diag_handler_get_info,
diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index 1eeff9422856eb9006e25b21ef188280b4cff7f6..e12c90d5f6ad29446ea1990c88c19bcb0ee856c3 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -241,6 +241,7 @@ static int netlink_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 }
 
 static const struct sock_diag_handler netlink_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_NETLINK,
 	.dump = netlink_diag_handler_dump,
 };
diff --git a/net/packet/diag.c b/net/packet/diag.c
index 9a7980e3309d6a2950688f8b69b08c15c288f601..b3bd2f6c2bf7be7b1436aa1a7fad6ef3f77217ad 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -245,6 +245,7 @@ static int packet_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 }
 
 static const struct sock_diag_handler packet_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_PACKET,
 	.dump = packet_diag_handler_dump,
 };
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 52f7c4f1e7670d723a6858614f071f73dbd88dc5..32bad267fa3e2729b833cb711a6bb946bc7229d9 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -255,6 +255,7 @@ static int smc_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 }
 
 static const struct sock_diag_handler smc_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_SMC,
 	.dump = smc_diag_handler_dump,
 };
diff --git a/net/tipc/diag.c b/net/tipc/diag.c
index 18733451c9e0c23a63d9400d408979aab46ecf19..54dde8c4e4d46d8556b9cc5396c863d24306d547 100644
--- a/net/tipc/diag.c
+++ b/net/tipc/diag.c
@@ -95,6 +95,7 @@ static int tipc_sock_diag_handler_dump(struct sk_buff *skb,
 }
 
 static const struct sock_diag_handler tipc_sock_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_TIPC,
 	.dump = tipc_sock_diag_handler_dump,
 };
diff --git a/net/unix/diag.c b/net/unix/diag.c
index bec09a3a1d44ce56d43e16583fdf3b417cce4033..c3648b706509653480b71ea26ec4f8462f1a3c42 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -322,6 +322,7 @@ static int unix_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 }
 
 static const struct sock_diag_handler unix_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_UNIX,
 	.dump = unix_diag_handler_dump,
 };
diff --git a/net/vmw_vsock/diag.c b/net/vmw_vsock/diag.c
index 2e29994f92ffa2facee45cd53ec791034182508c..ab87ef66c1e88765911a3b6ff89b7fc720b6d692 100644
--- a/net/vmw_vsock/diag.c
+++ b/net/vmw_vsock/diag.c
@@ -157,6 +157,7 @@ static int vsock_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 }
 
 static const struct sock_diag_handler vsock_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_VSOCK,
 	.dump = vsock_diag_handler_dump,
 };
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index 9f8955367275e2439d910f978fc3b2b7a1669978..09dcea0cbbed97d9a41e88224994279cfbf8c536 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -194,6 +194,7 @@ static int xsk_diag_handler_dump(struct sk_buff *nlskb, struct nlmsghdr *hdr)
 }
 
 static const struct sock_diag_handler xsk_diag_handler = {
+	.owner = THIS_MODULE,
 	.family = AF_XDP,
 	.dump = xsk_diag_handler_dump,
 };
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 6/9] sock_diag: allow concurrent operations
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (4 preceding siblings ...)
  2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
  2024-01-22 17:25   ` Guillaume Nault
  2024-01-22 22:47   ` Kuniyuki Iwashima
  2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

sock_diag_broadcast_destroy_work() and __sock_diag_cmd()
are currently using sock_diag_table_mutex to protect
against concurrent sock_diag_handlers[] changes.

This makes inet_diag dump serialized, thus less scalable
than legacy /proc files.

It is time to switch to full RCU protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/sock_diag.c | 73 +++++++++++++++++++++++++-------------------
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index c53b731f2d6728d113b90732f4df5b011a438038..72009e1f4380dfdcbf43ed08791e5039e74f5c54 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -16,7 +16,7 @@
 #include <linux/inet_diag.h>
 #include <linux/sock_diag.h>
 
-static const struct sock_diag_handler *sock_diag_handlers[AF_MAX];
+static const struct sock_diag_handler __rcu *sock_diag_handlers[AF_MAX];
 static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
 static DEFINE_MUTEX(sock_diag_table_mutex);
 static struct workqueue_struct *broadcast_wq;
@@ -122,6 +122,24 @@ static size_t sock_diag_nlmsg_size(void)
 	       + nla_total_size_64bit(sizeof(struct tcp_info))); /* INET_DIAG_INFO */
 }
 
+static const struct sock_diag_handler *sock_diag_lock_handler(int family)
+{
+	const struct sock_diag_handler *handler;
+
+	rcu_read_lock();
+	handler = rcu_dereference(sock_diag_handlers[family]);
+	if (handler && !try_module_get(handler->owner))
+		handler = NULL;
+	rcu_read_unlock();
+
+	return handler;
+}
+
+static void sock_diag_unlock_handler(const struct sock_diag_handler *handler)
+{
+	module_put(handler->owner);
+}
+
 static void sock_diag_broadcast_destroy_work(struct work_struct *work)
 {
 	struct broadcast_sk *bsk =
@@ -138,12 +156,12 @@ static void sock_diag_broadcast_destroy_work(struct work_struct *work)
 	if (!skb)
 		goto out;
 
-	mutex_lock(&sock_diag_table_mutex);
-	hndl = sock_diag_handlers[sk->sk_family];
-	if (hndl && hndl->get_info)
-		err = hndl->get_info(skb, sk);
-	mutex_unlock(&sock_diag_table_mutex);
-
+	hndl = sock_diag_lock_handler(sk->sk_family);
+	if (hndl) {
+		if (hndl->get_info)
+			err = hndl->get_info(skb, sk);
+		sock_diag_unlock_handler(hndl);
+	}
 	if (!err)
 		nlmsg_multicast(sock_net(sk)->diag_nlsk, skb, 0, group,
 				GFP_KERNEL);
@@ -184,33 +202,26 @@ EXPORT_SYMBOL_GPL(sock_diag_unregister_inet_compat);
 
 int sock_diag_register(const struct sock_diag_handler *hndl)
 {
-	int err = 0;
+	int family = hndl->family;
 
-	if (hndl->family >= AF_MAX)
+	if (family >= AF_MAX)
 		return -EINVAL;
 
-	mutex_lock(&sock_diag_table_mutex);
-	if (sock_diag_handlers[hndl->family])
-		err = -EBUSY;
-	else
-		WRITE_ONCE(sock_diag_handlers[hndl->family], hndl);
-	mutex_unlock(&sock_diag_table_mutex);
-
-	return err;
+	return !cmpxchg((const struct sock_diag_handler **)
+				&sock_diag_handlers[family],
+			NULL, hndl) ? 0 : -EBUSY;
 }
 EXPORT_SYMBOL_GPL(sock_diag_register);
 
-void sock_diag_unregister(const struct sock_diag_handler *hnld)
+void sock_diag_unregister(const struct sock_diag_handler *hndl)
 {
-	int family = hnld->family;
+	int family = hndl->family;
 
 	if (family >= AF_MAX)
 		return;
 
-	mutex_lock(&sock_diag_table_mutex);
-	BUG_ON(sock_diag_handlers[family] != hnld);
-	WRITE_ONCE(sock_diag_handlers[family], NULL);
-	mutex_unlock(&sock_diag_table_mutex);
+	xchg((const struct sock_diag_handler **)&sock_diag_handlers[family],
+	     NULL);
 }
 EXPORT_SYMBOL_GPL(sock_diag_unregister);
 
@@ -227,20 +238,20 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EINVAL;
 	req->sdiag_family = array_index_nospec(req->sdiag_family, AF_MAX);
 
-	if (READ_ONCE(sock_diag_handlers[req->sdiag_family]) == NULL)
+	if (!rcu_access_pointer(sock_diag_handlers[req->sdiag_family]))
 		sock_load_diag_module(req->sdiag_family, 0);
 
-	mutex_lock(&sock_diag_table_mutex);
-	hndl = sock_diag_handlers[req->sdiag_family];
+	hndl = sock_diag_lock_handler(req->sdiag_family);
 	if (hndl == NULL)
-		err = -ENOENT;
-	else if (nlh->nlmsg_type == SOCK_DIAG_BY_FAMILY)
+		return -ENOENT;
+
+	if (nlh->nlmsg_type == SOCK_DIAG_BY_FAMILY)
 		err = hndl->dump(skb, nlh);
 	else if (nlh->nlmsg_type == SOCK_DESTROY && hndl->destroy)
 		err = hndl->destroy(skb, nlh);
 	else
 		err = -EOPNOTSUPP;
-	mutex_unlock(&sock_diag_table_mutex);
+	sock_diag_unlock_handler(hndl);
 
 	return err;
 }
@@ -286,12 +297,12 @@ static int sock_diag_bind(struct net *net, int group)
 	switch (group) {
 	case SKNLGRP_INET_TCP_DESTROY:
 	case SKNLGRP_INET_UDP_DESTROY:
-		if (!READ_ONCE(sock_diag_handlers[AF_INET]))
+		if (!rcu_access_pointer(sock_diag_handlers[AF_INET]))
 			sock_load_diag_module(AF_INET, 0);
 		break;
 	case SKNLGRP_INET6_TCP_DESTROY:
 	case SKNLGRP_INET6_UDP_DESTROY:
-		if (!READ_ONCE(sock_diag_handlers[AF_INET6]))
+		if (!rcu_access_pointer(sock_diag_handlers[AF_INET6]))
 			sock_load_diag_module(AF_INET6, 0);
 		break;
 	}
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (5 preceding siblings ...)
  2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
  2024-01-22 17:25   ` Guillaume Nault
  2024-01-22 22:51   ` Kuniyuki Iwashima
  2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized
on sock_diag_table_mutex.

This is to make sure inet_diag module is not unloaded
while diag was ongoing.

It is time to get rid of this mutex and use RCU protection,
allowing full parallelism.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/sock_diag.h |  9 ++++++--
 net/core/sock_diag.c      | 43 +++++++++++++++++++++++----------------
 net/ipv4/inet_diag.c      |  9 ++++++--
 3 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 7c07754d711b9bd04bc57f8ed08981849fcadb11..110978dc9af1b19194644151af5456b8c6644cf9 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -23,8 +23,13 @@ struct sock_diag_handler {
 int sock_diag_register(const struct sock_diag_handler *h);
 void sock_diag_unregister(const struct sock_diag_handler *h);
 
-void sock_diag_register_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh));
-void sock_diag_unregister_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh));
+struct sock_diag_inet_compat {
+	struct module *owner;
+	int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh);
+};
+
+void sock_diag_register_inet_compat(const struct sock_diag_inet_compat *ptr);
+void sock_diag_unregister_inet_compat(const struct sock_diag_inet_compat *ptr);
 
 u64 __sock_gen_cookie(struct sock *sk);
 
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 72009e1f4380dfdcbf43ed08791e5039e74f5c54..5c3666431df49b3c278ef795f11ba542247796a6 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -17,8 +17,9 @@
 #include <linux/sock_diag.h>
 
 static const struct sock_diag_handler __rcu *sock_diag_handlers[AF_MAX];
-static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
-static DEFINE_MUTEX(sock_diag_table_mutex);
+
+static struct sock_diag_inet_compat __rcu *inet_rcv_compat;
+
 static struct workqueue_struct *broadcast_wq;
 
 DEFINE_COOKIE(sock_cookie);
@@ -184,19 +185,20 @@ void sock_diag_broadcast_destroy(struct sock *sk)
 	queue_work(broadcast_wq, &bsk->work);
 }
 
-void sock_diag_register_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh))
+void sock_diag_register_inet_compat(const struct sock_diag_inet_compat *ptr)
 {
-	mutex_lock(&sock_diag_table_mutex);
-	inet_rcv_compat = fn;
-	mutex_unlock(&sock_diag_table_mutex);
+	xchg((__force const struct sock_diag_inet_compat **)&inet_rcv_compat,
+	     ptr);
 }
 EXPORT_SYMBOL_GPL(sock_diag_register_inet_compat);
 
-void sock_diag_unregister_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh))
+void sock_diag_unregister_inet_compat(const struct sock_diag_inet_compat *ptr)
 {
-	mutex_lock(&sock_diag_table_mutex);
-	inet_rcv_compat = NULL;
-	mutex_unlock(&sock_diag_table_mutex);
+	const struct sock_diag_inet_compat *old;
+
+	old = xchg((__force const struct sock_diag_inet_compat **)&inet_rcv_compat,
+		   NULL);
+	WARN_ON_ONCE(old != ptr);
 }
 EXPORT_SYMBOL_GPL(sock_diag_unregister_inet_compat);
 
@@ -259,20 +261,27 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh)
 static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 			     struct netlink_ext_ack *extack)
 {
+	const struct sock_diag_inet_compat *ptr;
 	int ret;
 
 	switch (nlh->nlmsg_type) {
 	case TCPDIAG_GETSOCK:
 	case DCCPDIAG_GETSOCK:
-		if (inet_rcv_compat == NULL)
+
+		if (!rcu_access_pointer(inet_rcv_compat))
 			sock_load_diag_module(AF_INET, 0);
 
-		mutex_lock(&sock_diag_table_mutex);
-		if (inet_rcv_compat != NULL)
-			ret = inet_rcv_compat(skb, nlh);
-		else
-			ret = -EOPNOTSUPP;
-		mutex_unlock(&sock_diag_table_mutex);
+		rcu_read_lock();
+		ptr = rcu_dereference(inet_rcv_compat);
+		if (ptr && !try_module_get(ptr->owner))
+			ptr = NULL;
+		rcu_read_unlock();
+
+		ret = -EOPNOTSUPP;
+		if (ptr) {
+			ret = ptr->fn(skb, nlh);
+			module_put(ptr->owner);
+		}
 
 		return ret;
 	case SOCK_DIAG_BY_FAMILY:
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 52ce20691e4ef1382da94473128e3c14c55bd542..2c2d8b9dd8e9bb502e52e30dffc70da36d9b1c74 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1527,6 +1527,11 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
 }
 EXPORT_SYMBOL_GPL(inet_diag_unregister);
 
+static const struct sock_diag_inet_compat inet_diag_compat = {
+	.owner	= THIS_MODULE,
+	.fn	= inet_diag_rcv_msg_compat,
+};
+
 static int __init inet_diag_init(void)
 {
 	const int inet_diag_table_size = (IPPROTO_MAX *
@@ -1545,7 +1550,7 @@ static int __init inet_diag_init(void)
 	if (err)
 		goto out_free_inet;
 
-	sock_diag_register_inet_compat(inet_diag_rcv_msg_compat);
+	sock_diag_register_inet_compat(&inet_diag_compat);
 out:
 	return err;
 
@@ -1560,7 +1565,7 @@ static void __exit inet_diag_exit(void)
 {
 	sock_diag_unregister(&inet6_diag_handler);
 	sock_diag_unregister(&inet_diag_handler);
-	sock_diag_unregister_inet_compat(inet_diag_rcv_msg_compat);
+	sock_diag_unregister_inet_compat(&inet_diag_compat);
 	kfree(inet_diag_table);
 }
 
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (6 preceding siblings ...)
  2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
  2024-01-22 21:38   ` Guillaume Nault
  2024-01-22 22:55   ` Kuniyuki Iwashima
  2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

sock_diag_rcv() is still serializing its operations using
a mutex, for no good reason.

This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops
in netlink_rcv_skb"), but the root cause has been fixed
with commit cd40b7d3983c ("[NET]: make netlink user -> kernel
interface synchronious")

Remove this mutex to let multiple threads run concurrently.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/sock_diag.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 5c3666431df49b3c278ef795f11ba542247796a6..6541228380252d597821b084df34176bff4ada83 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -292,13 +292,9 @@ static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 }
 
-static DEFINE_MUTEX(sock_diag_mutex);
-
 static void sock_diag_rcv(struct sk_buff *skb)
 {
-	mutex_lock(&sock_diag_mutex);
 	netlink_rcv_skb(skb, &sock_diag_rcv_msg);
-	mutex_unlock(&sock_diag_mutex);
 }
 
 static int sock_diag_bind(struct net *net, int group)
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 9/9] inet_diag: skip over empty buckets
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (7 preceding siblings ...)
  2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
  2024-01-22 21:44   ` Guillaume Nault
  2024-01-22 22:58   ` Kuniyuki Iwashima
  2024-01-22 15:05 ` [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Willem de Bruijn
  2024-01-23 14:40 ` patchwork-bot+netdevbpf
  10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

After the removal of inet_diag_table_mutex, sock_diag_table_mutex
and sock_diag_mutex, I was able so see spinlock contention from
inet_diag_dump_icsk() when running 100 parallel invocations.

It is time to skip over empty buckets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 2c2d8b9dd8e9bb502e52e30dffc70da36d9b1c74..7adace541fe292851a66ccc4de1da2a60ac4714e 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1045,6 +1045,10 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 			num = 0;
 			ilb = &hashinfo->lhash2[i];
 
+			if (hlist_nulls_empty(&ilb->nulls_head)) {
+				s_num = 0;
+				continue;
+			}
 			spin_lock(&ilb->lock);
 			sk_nulls_for_each(sk, node, &ilb->nulls_head) {
 				struct inet_sock *inet = inet_sk(sk);
@@ -1109,6 +1113,10 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 			accum = 0;
 			ibb = &hashinfo->bhash2[i];
 
+			if (hlist_empty(&ibb->chain)) {
+				s_num = 0;
+				continue;
+			}
 			spin_lock_bh(&ibb->lock);
 			inet_bind_bucket_for_each(tb2, &ibb->chain) {
 				if (!net_eq(ib2_net(tb2), net))
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/9]  inet_diag: remove three mutexes in diag dumps
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (8 preceding siblings ...)
  2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
@ 2024-01-22 15:05 ` Willem de Bruijn
  2024-01-23 14:40 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 30+ messages in thread
From: Willem de Bruijn @ 2024-01-22 15:05 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
	Guillaume Nault, netdev, eric.dumazet, Eric Dumazet

Eric Dumazet wrote:
> Surprisingly, inet_diag operations are serialized over a stack
> of three mutexes, giving legacy /proc based files an unfair
> advantage on modern hosts.
> 
> This series removes all of them, making inet_diag operations
> (eg iproute2/ss) fully parallel.
> 
> 1-2) Two first patches are adding data-race annotations
>      and can be backported to stable kernels.
> 
> 3-4) inet_diag_table_mutex can be replaced with RCU protection,
>      if we add corresponding protection against module unload.
> 
> 5-7) sock_diag_table_mutex can be replaced with RCU protection,
>      if we add corresponding protection against module unload.
> 
>  8)  sock_diag_mutex is removed, as the old bug it was
>      working around has been fixed more elegantly.
> 
>  9)  inet_diag_dump_icsk() can skip over empty buckets to reduce
>      spinlock contention.
> 
> Eric Dumazet (9):
>   sock_diag: annotate data-races around sock_diag_handlers[family]
>   inet_diag: annotate data-races around inet_diag_table[]
>   inet_diag: add module pointer to "struct inet_diag_handler"
>   inet_diag: allow concurrent operations
>   sock_diag: add module pointer to "struct sock_diag_handler"
>   sock_diag: allow concurrent operations
>   sock_diag: allow concurrent operation in sock_diag_rcv_msg()
>   sock_diag: remove sock_diag_mutex
>   inet_diag: skip over empty buckets

For the series:

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
  2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
@ 2024-01-22 16:29   ` Guillaume Nault
  2024-01-22 22:36   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:25:55AM +0000, Eric Dumazet wrote:
> __sock_diag_cmd() and sock_diag_bind() read sock_diag_handlers[family]
> without a lock held.
> 
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
> 
> Fixes: 8ef874bfc729 ("sock_diag: Move the sock_ code to net/core/")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[]
  2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
@ 2024-01-22 16:29   ` Guillaume Nault
  2024-01-22 22:37   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:25:56AM +0000, Eric Dumazet wrote:
> inet_diag_lock_handler() reads inet_diag_table[proto] locklessly.
> 
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
> 
> Fixes: d523a328fb02 ("[INET]: Fix inet_diag dead-lock regression")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler"
  2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
@ 2024-01-22 16:58   ` Guillaume Nault
  2024-01-22 22:38   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:25:57AM +0000, Eric Dumazet wrote:
> Following patch is going to use RCU instead of
> inet_diag_table_mutex acquisition.
> 
> This patch is a preparation, no change of behavior yet.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 4/9] inet_diag: allow concurrent operations
  2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 16:58   ` Guillaume Nault
  2024-01-22 22:40   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:25:58AM +0000, Eric Dumazet wrote:
> inet_diag_lock_handler() current implementation uses a mutex
> to protect inet_diag_table[] array against concurrent changes.
> 
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
> 
> It is time to switch to full RCU protection.
> 
> As a bonus, if a target is statically linked instead of being
> modular, inet_diag_lock_handler() & inet_diag_unlock_handler()
> reduce to reads only.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler"
  2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
@ 2024-01-22 17:24   ` Guillaume Nault
  2024-01-22 22:44   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 17:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:25:59AM +0000, Eric Dumazet wrote:
> Following patch is going to use RCU instead of
> sock_diag_table_mutex acquisition.
> 
> This patch is a preparation, no change of behavior yet.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 6/9] sock_diag: allow concurrent operations
  2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 17:25   ` Guillaume Nault
  2024-01-22 22:47   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 17:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:26:00AM +0000, Eric Dumazet wrote:
> sock_diag_broadcast_destroy_work() and __sock_diag_cmd()
> are currently using sock_diag_table_mutex to protect
> against concurrent sock_diag_handlers[] changes.
> 
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
> 
> It is time to switch to full RCU protection.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
  2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
@ 2024-01-22 17:25   ` Guillaume Nault
  2024-01-22 22:51   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 17:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:26:01AM +0000, Eric Dumazet wrote:
> TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized
> on sock_diag_table_mutex.
> 
> This is to make sure inet_diag module is not unloaded
> while diag was ongoing.
> 
> It is time to get rid of this mutex and use RCU protection,
> allowing full parallelism.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex
  2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
@ 2024-01-22 21:38   ` Guillaume Nault
  2024-01-22 22:55   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 21:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:26:02AM +0000, Eric Dumazet wrote:
> sock_diag_rcv() is still serializing its operations using
> a mutex, for no good reason.
> 
> This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops
> in netlink_rcv_skb"), but the root cause has been fixed
> with commit cd40b7d3983c ("[NET]: make netlink user -> kernel
> interface synchronious")
> 
> Remove this mutex to let multiple threads run concurrently.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 9/9] inet_diag: skip over empty buckets
  2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
@ 2024-01-22 21:44   ` Guillaume Nault
  2024-01-22 22:58   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 21:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet

On Mon, Jan 22, 2024 at 11:26:03AM +0000, Eric Dumazet wrote:
> After the removal of inet_diag_table_mutex, sock_diag_table_mutex
> and sock_diag_mutex, I was able so see spinlock contention from
> inet_diag_dump_icsk() when running 100 parallel invocations.
> 
> It is time to skip over empty buckets.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
  2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
  2024-01-22 16:29   ` Guillaume Nault
@ 2024-01-22 22:36   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:36 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:55 +0000
> __sock_diag_cmd() and sock_diag_bind() read sock_diag_handlers[family]
> without a lock held.
> 
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
> 
> Fixes: 8ef874bfc729 ("sock_diag: Move the sock_ code to net/core/")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[]
  2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
  2024-01-22 16:29   ` Guillaume Nault
@ 2024-01-22 22:37   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:37 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:56 +0000
> inet_diag_lock_handler() reads inet_diag_table[proto] locklessly.
> 
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
> 
> Fixes: d523a328fb02 ("[INET]: Fix inet_diag dead-lock regression")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler"
  2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
  2024-01-22 16:58   ` Guillaume Nault
@ 2024-01-22 22:38   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:38 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:57 +0000
> Following patch is going to use RCU instead of
> inet_diag_table_mutex acquisition.
> 
> This patch is a preparation, no change of behavior yet.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 4/9] inet_diag: allow concurrent operations
  2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
  2024-01-22 16:58   ` Guillaume Nault
@ 2024-01-22 22:40   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:40 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:58 +0000
> inet_diag_lock_handler() current implementation uses a mutex
> to protect inet_diag_table[] array against concurrent changes.
> 
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
> 
> It is time to switch to full RCU protection.
> 
> As a bonus, if a target is statically linked instead of being
> modular, inet_diag_lock_handler() & inet_diag_unlock_handler()
> reduce to reads only.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler"
  2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
  2024-01-22 17:24   ` Guillaume Nault
@ 2024-01-22 22:44   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:44 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:59 +0000
> Following patch is going to use RCU instead of
> sock_diag_table_mutex acquisition.
> 
> This patch is a preparation, no change of behavior yet.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 6/9] sock_diag: allow concurrent operations
  2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
  2024-01-22 17:25   ` Guillaume Nault
@ 2024-01-22 22:47   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:47 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:00 +0000
> sock_diag_broadcast_destroy_work() and __sock_diag_cmd()
> are currently using sock_diag_table_mutex to protect
> against concurrent sock_diag_handlers[] changes.
> 
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
> 
> It is time to switch to full RCU protection.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
  2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
  2024-01-22 17:25   ` Guillaume Nault
@ 2024-01-22 22:51   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:51 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:01 +0000
> TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized
> on sock_diag_table_mutex.
> 
> This is to make sure inet_diag module is not unloaded
> while diag was ongoing.
> 
> It is time to get rid of this mutex and use RCU protection,
> allowing full parallelism.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex
  2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
  2024-01-22 21:38   ` Guillaume Nault
@ 2024-01-22 22:55   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:55 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:02 +0000
> sock_diag_rcv() is still serializing its operations using
> a mutex, for no good reason.
> 
> This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops
> in netlink_rcv_skb"), but the root cause has been fixed
> with commit cd40b7d3983c ("[NET]: make netlink user -> kernel
> interface synchronious")
> 
> Remove this mutex to let multiple threads run concurrently.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 9/9] inet_diag: skip over empty buckets
  2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
  2024-01-22 21:44   ` Guillaume Nault
@ 2024-01-22 22:58   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:58 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:03 +0000
> After the removal of inet_diag_table_mutex, sock_diag_table_mutex
> and sock_diag_mutex, I was able so see spinlock contention from
> inet_diag_dump_icsk() when running 100 parallel invocations.
> 
> It is time to skip over empty buckets.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/9]  inet_diag: remove three mutexes in diag dumps
  2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
                   ` (9 preceding siblings ...)
  2024-01-22 15:05 ` [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Willem de Bruijn
@ 2024-01-23 14:40 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 30+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-23 14:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, dsahern, kuniyu, kafai, gnault, netdev,
	eric.dumazet

Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 22 Jan 2024 11:25:54 +0000 you wrote:
> Surprisingly, inet_diag operations are serialized over a stack
> of three mutexes, giving legacy /proc based files an unfair
> advantage on modern hosts.
> 
> This series removes all of them, making inet_diag operations
> (eg iproute2/ss) fully parallel.
> 
> [...]

Here is the summary with links:
  - [net-next,1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
    https://git.kernel.org/netdev/net-next/c/efd402537673
  - [net-next,2/9] inet_diag: annotate data-races around inet_diag_table[]
    https://git.kernel.org/netdev/net-next/c/e50e10ae5d81
  - [net-next,3/9] inet_diag: add module pointer to "struct inet_diag_handler"
    https://git.kernel.org/netdev/net-next/c/db5914695a84
  - [net-next,4/9] inet_diag: allow concurrent operations
    https://git.kernel.org/netdev/net-next/c/223f55196bbd
  - [net-next,5/9] sock_diag: add module pointer to "struct sock_diag_handler"
    https://git.kernel.org/netdev/net-next/c/114b4bb1cc19
  - [net-next,6/9] sock_diag: allow concurrent operations
    https://git.kernel.org/netdev/net-next/c/1d55a6974756
  - [net-next,7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
    https://git.kernel.org/netdev/net-next/c/86e8921df05c
  - [net-next,8/9] sock_diag: remove sock_diag_mutex
    https://git.kernel.org/netdev/net-next/c/f44e64990beb
  - [net-next,9/9] inet_diag: skip over empty buckets
    https://git.kernel.org/netdev/net-next/c/622a08e8de9f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-01-23 14:40 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
2024-01-22 16:29   ` Guillaume Nault
2024-01-22 22:36   ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
2024-01-22 16:29   ` Guillaume Nault
2024-01-22 22:37   ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
2024-01-22 16:58   ` Guillaume Nault
2024-01-22 22:38   ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
2024-01-22 16:58   ` Guillaume Nault
2024-01-22 22:40   ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
2024-01-22 17:24   ` Guillaume Nault
2024-01-22 22:44   ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
2024-01-22 17:25   ` Guillaume Nault
2024-01-22 22:47   ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
2024-01-22 17:25   ` Guillaume Nault
2024-01-22 22:51   ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
2024-01-22 21:38   ` Guillaume Nault
2024-01-22 22:55   ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
2024-01-22 21:44   ` Guillaume Nault
2024-01-22 22:58   ` Kuniyuki Iwashima
2024-01-22 15:05 ` [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Willem de Bruijn
2024-01-23 14:40 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.