* [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps
@ 2024-01-22 11:25 Eric Dumazet
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
` (10 more replies)
0 siblings, 11 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
Surprisingly, inet_diag operations are serialized over a stack
of three mutexes, giving legacy /proc based files an unfair
advantage on modern hosts.
This series removes all of them, making inet_diag operations
(eg iproute2/ss) fully parallel.
1-2) Two first patches are adding data-race annotations
and can be backported to stable kernels.
3-4) inet_diag_table_mutex can be replaced with RCU protection,
if we add corresponding protection against module unload.
5-7) sock_diag_table_mutex can be replaced with RCU protection,
if we add corresponding protection against module unload.
8) sock_diag_mutex is removed, as the old bug it was
working around has been fixed more elegantly.
9) inet_diag_dump_icsk() can skip over empty buckets to reduce
spinlock contention.
Eric Dumazet (9):
sock_diag: annotate data-races around sock_diag_handlers[family]
inet_diag: annotate data-races around inet_diag_table[]
inet_diag: add module pointer to "struct inet_diag_handler"
inet_diag: allow concurrent operations
sock_diag: add module pointer to "struct sock_diag_handler"
sock_diag: allow concurrent operations
sock_diag: allow concurrent operation in sock_diag_rcv_msg()
sock_diag: remove sock_diag_mutex
inet_diag: skip over empty buckets
include/linux/inet_diag.h | 1 +
include/linux/sock_diag.h | 10 +++-
net/core/sock_diag.c | 120 +++++++++++++++++++++-----------------
net/dccp/diag.c | 1 +
net/ipv4/inet_diag.c | 101 ++++++++++++++++++--------------
net/ipv4/raw_diag.c | 1 +
net/ipv4/tcp_diag.c | 1 +
net/ipv4/udp_diag.c | 2 +
net/mptcp/mptcp_diag.c | 1 +
net/netlink/diag.c | 1 +
net/packet/diag.c | 1 +
net/sctp/diag.c | 1 +
net/smc/smc_diag.c | 1 +
net/tipc/diag.c | 1 +
net/unix/diag.c | 1 +
net/vmw_vsock/diag.c | 1 +
net/xdp/xsk_diag.c | 1 +
17 files changed, 149 insertions(+), 97 deletions(-)
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
2024-01-22 16:29 ` Guillaume Nault
2024-01-22 22:36 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
` (9 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
__sock_diag_cmd() and sock_diag_bind() read sock_diag_handlers[family]
without a lock held.
Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
Fixes: 8ef874bfc729 ("sock_diag: Move the sock_ code to net/core/")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/sock_diag.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index b1e29e18d1d60cb5c87c884652f547c083ba81cd..c53b731f2d6728d113b90732f4df5b011a438038 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -193,7 +193,7 @@ int sock_diag_register(const struct sock_diag_handler *hndl)
if (sock_diag_handlers[hndl->family])
err = -EBUSY;
else
- sock_diag_handlers[hndl->family] = hndl;
+ WRITE_ONCE(sock_diag_handlers[hndl->family], hndl);
mutex_unlock(&sock_diag_table_mutex);
return err;
@@ -209,7 +209,7 @@ void sock_diag_unregister(const struct sock_diag_handler *hnld)
mutex_lock(&sock_diag_table_mutex);
BUG_ON(sock_diag_handlers[family] != hnld);
- sock_diag_handlers[family] = NULL;
+ WRITE_ONCE(sock_diag_handlers[family], NULL);
mutex_unlock(&sock_diag_table_mutex);
}
EXPORT_SYMBOL_GPL(sock_diag_unregister);
@@ -227,7 +227,7 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh)
return -EINVAL;
req->sdiag_family = array_index_nospec(req->sdiag_family, AF_MAX);
- if (sock_diag_handlers[req->sdiag_family] == NULL)
+ if (READ_ONCE(sock_diag_handlers[req->sdiag_family]) == NULL)
sock_load_diag_module(req->sdiag_family, 0);
mutex_lock(&sock_diag_table_mutex);
@@ -286,12 +286,12 @@ static int sock_diag_bind(struct net *net, int group)
switch (group) {
case SKNLGRP_INET_TCP_DESTROY:
case SKNLGRP_INET_UDP_DESTROY:
- if (!sock_diag_handlers[AF_INET])
+ if (!READ_ONCE(sock_diag_handlers[AF_INET]))
sock_load_diag_module(AF_INET, 0);
break;
case SKNLGRP_INET6_TCP_DESTROY:
case SKNLGRP_INET6_UDP_DESTROY:
- if (!sock_diag_handlers[AF_INET6])
+ if (!READ_ONCE(sock_diag_handlers[AF_INET6]))
sock_load_diag_module(AF_INET6, 0);
break;
}
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[]
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
2024-01-22 16:29 ` Guillaume Nault
2024-01-22 22:37 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
` (8 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
inet_diag_lock_handler() reads inet_diag_table[proto] locklessly.
Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
Fixes: d523a328fb02 ("[INET]: Fix inet_diag dead-lock regression")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/inet_diag.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 8e6b6aa0579e1c94def819a1f9eab8b946771ba7..9804e9608a5a0294b3ffabc4b5bb87ac1b96b09e 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -57,7 +57,7 @@ static const struct inet_diag_handler *inet_diag_lock_handler(int proto)
return ERR_PTR(-ENOENT);
}
- if (!inet_diag_table[proto])
+ if (!READ_ONCE(inet_diag_table[proto]))
sock_load_diag_module(AF_INET, proto);
mutex_lock(&inet_diag_table_mutex);
@@ -1503,7 +1503,7 @@ int inet_diag_register(const struct inet_diag_handler *h)
mutex_lock(&inet_diag_table_mutex);
err = -EEXIST;
if (!inet_diag_table[type]) {
- inet_diag_table[type] = h;
+ WRITE_ONCE(inet_diag_table[type], h);
err = 0;
}
mutex_unlock(&inet_diag_table_mutex);
@@ -1520,7 +1520,7 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
return;
mutex_lock(&inet_diag_table_mutex);
- inet_diag_table[type] = NULL;
+ WRITE_ONCE(inet_diag_table[type], NULL);
mutex_unlock(&inet_diag_table_mutex);
}
EXPORT_SYMBOL_GPL(inet_diag_unregister);
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler"
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
2024-01-22 16:58 ` Guillaume Nault
2024-01-22 22:38 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
` (7 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
Following patch is going to use RCU instead of
inet_diag_table_mutex acquisition.
This patch is a preparation, no change of behavior yet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/inet_diag.h | 1 +
net/dccp/diag.c | 1 +
net/ipv4/raw_diag.c | 1 +
net/ipv4/tcp_diag.c | 1 +
net/ipv4/udp_diag.c | 2 ++
net/mptcp/mptcp_diag.c | 1 +
net/sctp/diag.c | 1 +
7 files changed, 8 insertions(+)
diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 84abb30a3fbb13a61ed786bf32e0f72c0bbfbdd2..a9033696b0aad36ab9abd47e4b68e272053019d7 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -8,6 +8,7 @@
struct inet_hashinfo;
struct inet_diag_handler {
+ struct module *owner;
void (*dump)(struct sk_buff *skb,
struct netlink_callback *cb,
const struct inet_diag_req_v2 *r);
diff --git a/net/dccp/diag.c b/net/dccp/diag.c
index 8a82c5a2c5a8c9ed8885b53744ff13c2bee657d8..f5019d95c3ae535555cc2b4d4885c718227fec67 100644
--- a/net/dccp/diag.c
+++ b/net/dccp/diag.c
@@ -58,6 +58,7 @@ static int dccp_diag_dump_one(struct netlink_callback *cb,
}
static const struct inet_diag_handler dccp_diag_handler = {
+ .owner = THIS_MODULE,
.dump = dccp_diag_dump,
.dump_one = dccp_diag_dump_one,
.idiag_get_info = dccp_diag_get_info,
diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index fe2140c8375c8ebcc69880142c42655233007900..cc793bd8de258c3a12f11e95cec81c5ae4b9a7f6 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -213,6 +213,7 @@ static int raw_diag_destroy(struct sk_buff *in_skb,
#endif
static const struct inet_diag_handler raw_diag_handler = {
+ .owner = THIS_MODULE,
.dump = raw_diag_dump,
.dump_one = raw_diag_dump_one,
.idiag_get_info = raw_diag_get_info,
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 4cbe4b44425a6a5daf55abe348c167932ca07222..f428ecf9120f2f596e1d67db2b2a0d0d0e211905 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -222,6 +222,7 @@ static int tcp_diag_destroy(struct sk_buff *in_skb,
#endif
static const struct inet_diag_handler tcp_diag_handler = {
+ .owner = THIS_MODULE,
.dump = tcp_diag_dump,
.dump_one = tcp_diag_dump_one,
.idiag_get_info = tcp_diag_get_info,
diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c
index dc41a22ee80e829582349e8e644f204eff07df0e..38cb3a28e4ed6d54f7078afa2700e71db9ce4b85 100644
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -237,6 +237,7 @@ static int udplite_diag_destroy(struct sk_buff *in_skb,
#endif
static const struct inet_diag_handler udp_diag_handler = {
+ .owner = THIS_MODULE,
.dump = udp_diag_dump,
.dump_one = udp_diag_dump_one,
.idiag_get_info = udp_diag_get_info,
@@ -260,6 +261,7 @@ static int udplite_diag_dump_one(struct netlink_callback *cb,
}
static const struct inet_diag_handler udplite_diag_handler = {
+ .owner = THIS_MODULE,
.dump = udplite_diag_dump,
.dump_one = udplite_diag_dump_one,
.idiag_get_info = udp_diag_get_info,
diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c
index 5409c2ea3f5728a05999db17b7af1b1fb56f757e..bd8ff5950c8d33766a0da971dc127f106feb8481 100644
--- a/net/mptcp/mptcp_diag.c
+++ b/net/mptcp/mptcp_diag.c
@@ -225,6 +225,7 @@ static void mptcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
}
static const struct inet_diag_handler mptcp_diag_handler = {
+ .owner = THIS_MODULE,
.dump = mptcp_diag_dump,
.dump_one = mptcp_diag_dump_one,
.idiag_get_info = mptcp_diag_get_info,
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index eb05131ff1dd671e734457e28b2d7b64eab07f85..23359e522273f0377080007c75eb2c276945f781 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -507,6 +507,7 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
}
static const struct inet_diag_handler sctp_diag_handler = {
+ .owner = THIS_MODULE,
.dump = sctp_diag_dump,
.dump_one = sctp_diag_dump_one,
.idiag_get_info = sctp_diag_get_info,
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 4/9] inet_diag: allow concurrent operations
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (2 preceding siblings ...)
2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
2024-01-22 16:58 ` Guillaume Nault
2024-01-22 22:40 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
` (6 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
inet_diag_lock_handler() current implementation uses a mutex
to protect inet_diag_table[] array against concurrent changes.
This makes inet_diag dump serialized, thus less scalable
than legacy /proc files.
It is time to switch to full RCU protection.
As a bonus, if a target is statically linked instead of being
modular, inet_diag_lock_handler() & inet_diag_unlock_handler()
reduce to reads only.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/inet_diag.c | 80 ++++++++++++++++++++++----------------------
1 file changed, 40 insertions(+), 40 deletions(-)
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 9804e9608a5a0294b3ffabc4b5bb87ac1b96b09e..abf7dc9827969d7e8061420be8629730ccce5449 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -32,7 +32,7 @@
#include <linux/inet_diag.h>
#include <linux/sock_diag.h>
-static const struct inet_diag_handler **inet_diag_table;
+static const struct inet_diag_handler __rcu **inet_diag_table;
struct inet_diag_entry {
const __be32 *saddr;
@@ -48,28 +48,28 @@ struct inet_diag_entry {
#endif
};
-static DEFINE_MUTEX(inet_diag_table_mutex);
-
static const struct inet_diag_handler *inet_diag_lock_handler(int proto)
{
- if (proto < 0 || proto >= IPPROTO_MAX) {
- mutex_lock(&inet_diag_table_mutex);
- return ERR_PTR(-ENOENT);
- }
+ const struct inet_diag_handler *handler;
+
+ if (proto < 0 || proto >= IPPROTO_MAX)
+ return NULL;
if (!READ_ONCE(inet_diag_table[proto]))
sock_load_diag_module(AF_INET, proto);
- mutex_lock(&inet_diag_table_mutex);
- if (!inet_diag_table[proto])
- return ERR_PTR(-ENOENT);
+ rcu_read_lock();
+ handler = rcu_dereference(inet_diag_table[proto]);
+ if (handler && !try_module_get(handler->owner))
+ handler = NULL;
+ rcu_read_unlock();
- return inet_diag_table[proto];
+ return handler;
}
static void inet_diag_unlock_handler(const struct inet_diag_handler *handler)
{
- mutex_unlock(&inet_diag_table_mutex);
+ module_put(handler->owner);
}
void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
@@ -104,9 +104,12 @@ static size_t inet_sk_attr_size(struct sock *sk,
const struct inet_diag_handler *handler;
size_t aux = 0;
- handler = inet_diag_table[req->sdiag_protocol];
+ rcu_read_lock();
+ handler = rcu_dereference(inet_diag_table[req->sdiag_protocol]);
+ DEBUG_NET_WARN_ON_ONCE(!handler);
if (handler && handler->idiag_get_aux_size)
aux = handler->idiag_get_aux_size(sk, net_admin);
+ rcu_read_unlock();
return nla_total_size(sizeof(struct tcp_info))
+ nla_total_size(sizeof(struct inet_diag_msg))
@@ -244,10 +247,16 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
struct nlmsghdr *nlh;
struct nlattr *attr;
void *info = NULL;
+ int protocol;
cb_data = cb->data;
- handler = inet_diag_table[inet_diag_get_protocol(req, cb_data)];
- BUG_ON(!handler);
+ protocol = inet_diag_get_protocol(req, cb_data);
+
+ /* inet_diag_lock_handler() made sure inet_diag_table[] is stable. */
+ handler = rcu_dereference_protected(inet_diag_table[protocol], 1);
+ DEBUG_NET_WARN_ON_ONCE(!handler);
+ if (!handler)
+ return -ENXIO;
nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
cb->nlh->nlmsg_type, sizeof(*r), nlmsg_flags);
@@ -605,9 +614,10 @@ static int inet_diag_cmd_exact(int cmd, struct sk_buff *in_skb,
protocol = inet_diag_get_protocol(req, &dump_data);
handler = inet_diag_lock_handler(protocol);
- if (IS_ERR(handler)) {
- err = PTR_ERR(handler);
- } else if (cmd == SOCK_DIAG_BY_FAMILY) {
+ if (!handler)
+ return -ENOENT;
+
+ if (cmd == SOCK_DIAG_BY_FAMILY) {
struct netlink_callback cb = {
.nlh = nlh,
.skb = in_skb,
@@ -1259,12 +1269,12 @@ static int __inet_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
again:
prev_min_dump_alloc = cb->min_dump_alloc;
handler = inet_diag_lock_handler(protocol);
- if (!IS_ERR(handler))
+ if (handler) {
handler->dump(skb, cb, r);
- else
- err = PTR_ERR(handler);
- inet_diag_unlock_handler(handler);
-
+ inet_diag_unlock_handler(handler);
+ } else {
+ err = -ENOENT;
+ }
/* The skb is not large enough to fit one sk info and
* inet_sk_diag_fill() has requested for a larger skb.
*/
@@ -1457,10 +1467,9 @@ int inet_diag_handler_get_info(struct sk_buff *skb, struct sock *sk)
}
handler = inet_diag_lock_handler(sk->sk_protocol);
- if (IS_ERR(handler)) {
- inet_diag_unlock_handler(handler);
+ if (!handler) {
nlmsg_cancel(skb, nlh);
- return PTR_ERR(handler);
+ return -ENOENT;
}
attr = handler->idiag_info_size
@@ -1495,20 +1504,12 @@ static const struct sock_diag_handler inet6_diag_handler = {
int inet_diag_register(const struct inet_diag_handler *h)
{
const __u16 type = h->idiag_type;
- int err = -EINVAL;
if (type >= IPPROTO_MAX)
- goto out;
+ return -EINVAL;
- mutex_lock(&inet_diag_table_mutex);
- err = -EEXIST;
- if (!inet_diag_table[type]) {
- WRITE_ONCE(inet_diag_table[type], h);
- err = 0;
- }
- mutex_unlock(&inet_diag_table_mutex);
-out:
- return err;
+ return !cmpxchg((const struct inet_diag_handler **)&inet_diag_table[type],
+ NULL, h) ? 0 : -EEXIST;
}
EXPORT_SYMBOL_GPL(inet_diag_register);
@@ -1519,9 +1520,8 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
if (type >= IPPROTO_MAX)
return;
- mutex_lock(&inet_diag_table_mutex);
- WRITE_ONCE(inet_diag_table[type], NULL);
- mutex_unlock(&inet_diag_table_mutex);
+ xchg((const struct inet_diag_handler **)&inet_diag_table[type],
+ NULL);
}
EXPORT_SYMBOL_GPL(inet_diag_unregister);
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler"
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (3 preceding siblings ...)
2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 11:25 ` Eric Dumazet
2024-01-22 17:24 ` Guillaume Nault
2024-01-22 22:44 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
` (5 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:25 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
Following patch is going to use RCU instead of
sock_diag_table_mutex acquisition.
This patch is a preparation, no change of behavior yet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/sock_diag.h | 1 +
net/ipv4/inet_diag.c | 2 ++
net/netlink/diag.c | 1 +
net/packet/diag.c | 1 +
net/smc/smc_diag.c | 1 +
net/tipc/diag.c | 1 +
net/unix/diag.c | 1 +
net/vmw_vsock/diag.c | 1 +
net/xdp/xsk_diag.c | 1 +
9 files changed, 10 insertions(+)
diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 0b9ecd8cf9793bc26138a0a36474e78773fb4f31..7c07754d711b9bd04bc57f8ed08981849fcadb11 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -13,6 +13,7 @@ struct nlmsghdr;
struct sock;
struct sock_diag_handler {
+ struct module *owner;
__u8 family;
int (*dump)(struct sk_buff *skb, struct nlmsghdr *nlh);
int (*get_info)(struct sk_buff *skb, struct sock *sk);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index abf7dc9827969d7e8061420be8629730ccce5449..52ce20691e4ef1382da94473128e3c14c55bd542 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1488,6 +1488,7 @@ int inet_diag_handler_get_info(struct sk_buff *skb, struct sock *sk)
}
static const struct sock_diag_handler inet_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_INET,
.dump = inet_diag_handler_cmd,
.get_info = inet_diag_handler_get_info,
@@ -1495,6 +1496,7 @@ static const struct sock_diag_handler inet_diag_handler = {
};
static const struct sock_diag_handler inet6_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_INET6,
.dump = inet_diag_handler_cmd,
.get_info = inet_diag_handler_get_info,
diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index 1eeff9422856eb9006e25b21ef188280b4cff7f6..e12c90d5f6ad29446ea1990c88c19bcb0ee856c3 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -241,6 +241,7 @@ static int netlink_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
}
static const struct sock_diag_handler netlink_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_NETLINK,
.dump = netlink_diag_handler_dump,
};
diff --git a/net/packet/diag.c b/net/packet/diag.c
index 9a7980e3309d6a2950688f8b69b08c15c288f601..b3bd2f6c2bf7be7b1436aa1a7fad6ef3f77217ad 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -245,6 +245,7 @@ static int packet_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
}
static const struct sock_diag_handler packet_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_PACKET,
.dump = packet_diag_handler_dump,
};
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 52f7c4f1e7670d723a6858614f071f73dbd88dc5..32bad267fa3e2729b833cb711a6bb946bc7229d9 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -255,6 +255,7 @@ static int smc_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
}
static const struct sock_diag_handler smc_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_SMC,
.dump = smc_diag_handler_dump,
};
diff --git a/net/tipc/diag.c b/net/tipc/diag.c
index 18733451c9e0c23a63d9400d408979aab46ecf19..54dde8c4e4d46d8556b9cc5396c863d24306d547 100644
--- a/net/tipc/diag.c
+++ b/net/tipc/diag.c
@@ -95,6 +95,7 @@ static int tipc_sock_diag_handler_dump(struct sk_buff *skb,
}
static const struct sock_diag_handler tipc_sock_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_TIPC,
.dump = tipc_sock_diag_handler_dump,
};
diff --git a/net/unix/diag.c b/net/unix/diag.c
index bec09a3a1d44ce56d43e16583fdf3b417cce4033..c3648b706509653480b71ea26ec4f8462f1a3c42 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -322,6 +322,7 @@ static int unix_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
}
static const struct sock_diag_handler unix_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_UNIX,
.dump = unix_diag_handler_dump,
};
diff --git a/net/vmw_vsock/diag.c b/net/vmw_vsock/diag.c
index 2e29994f92ffa2facee45cd53ec791034182508c..ab87ef66c1e88765911a3b6ff89b7fc720b6d692 100644
--- a/net/vmw_vsock/diag.c
+++ b/net/vmw_vsock/diag.c
@@ -157,6 +157,7 @@ static int vsock_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
}
static const struct sock_diag_handler vsock_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_VSOCK,
.dump = vsock_diag_handler_dump,
};
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index 9f8955367275e2439d910f978fc3b2b7a1669978..09dcea0cbbed97d9a41e88224994279cfbf8c536 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -194,6 +194,7 @@ static int xsk_diag_handler_dump(struct sk_buff *nlskb, struct nlmsghdr *hdr)
}
static const struct sock_diag_handler xsk_diag_handler = {
+ .owner = THIS_MODULE,
.family = AF_XDP,
.dump = xsk_diag_handler_dump,
};
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 6/9] sock_diag: allow concurrent operations
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (4 preceding siblings ...)
2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
2024-01-22 17:25 ` Guillaume Nault
2024-01-22 22:47 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
` (4 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
sock_diag_broadcast_destroy_work() and __sock_diag_cmd()
are currently using sock_diag_table_mutex to protect
against concurrent sock_diag_handlers[] changes.
This makes inet_diag dump serialized, thus less scalable
than legacy /proc files.
It is time to switch to full RCU protection.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/sock_diag.c | 73 +++++++++++++++++++++++++-------------------
1 file changed, 42 insertions(+), 31 deletions(-)
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index c53b731f2d6728d113b90732f4df5b011a438038..72009e1f4380dfdcbf43ed08791e5039e74f5c54 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -16,7 +16,7 @@
#include <linux/inet_diag.h>
#include <linux/sock_diag.h>
-static const struct sock_diag_handler *sock_diag_handlers[AF_MAX];
+static const struct sock_diag_handler __rcu *sock_diag_handlers[AF_MAX];
static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
static DEFINE_MUTEX(sock_diag_table_mutex);
static struct workqueue_struct *broadcast_wq;
@@ -122,6 +122,24 @@ static size_t sock_diag_nlmsg_size(void)
+ nla_total_size_64bit(sizeof(struct tcp_info))); /* INET_DIAG_INFO */
}
+static const struct sock_diag_handler *sock_diag_lock_handler(int family)
+{
+ const struct sock_diag_handler *handler;
+
+ rcu_read_lock();
+ handler = rcu_dereference(sock_diag_handlers[family]);
+ if (handler && !try_module_get(handler->owner))
+ handler = NULL;
+ rcu_read_unlock();
+
+ return handler;
+}
+
+static void sock_diag_unlock_handler(const struct sock_diag_handler *handler)
+{
+ module_put(handler->owner);
+}
+
static void sock_diag_broadcast_destroy_work(struct work_struct *work)
{
struct broadcast_sk *bsk =
@@ -138,12 +156,12 @@ static void sock_diag_broadcast_destroy_work(struct work_struct *work)
if (!skb)
goto out;
- mutex_lock(&sock_diag_table_mutex);
- hndl = sock_diag_handlers[sk->sk_family];
- if (hndl && hndl->get_info)
- err = hndl->get_info(skb, sk);
- mutex_unlock(&sock_diag_table_mutex);
-
+ hndl = sock_diag_lock_handler(sk->sk_family);
+ if (hndl) {
+ if (hndl->get_info)
+ err = hndl->get_info(skb, sk);
+ sock_diag_unlock_handler(hndl);
+ }
if (!err)
nlmsg_multicast(sock_net(sk)->diag_nlsk, skb, 0, group,
GFP_KERNEL);
@@ -184,33 +202,26 @@ EXPORT_SYMBOL_GPL(sock_diag_unregister_inet_compat);
int sock_diag_register(const struct sock_diag_handler *hndl)
{
- int err = 0;
+ int family = hndl->family;
- if (hndl->family >= AF_MAX)
+ if (family >= AF_MAX)
return -EINVAL;
- mutex_lock(&sock_diag_table_mutex);
- if (sock_diag_handlers[hndl->family])
- err = -EBUSY;
- else
- WRITE_ONCE(sock_diag_handlers[hndl->family], hndl);
- mutex_unlock(&sock_diag_table_mutex);
-
- return err;
+ return !cmpxchg((const struct sock_diag_handler **)
+ &sock_diag_handlers[family],
+ NULL, hndl) ? 0 : -EBUSY;
}
EXPORT_SYMBOL_GPL(sock_diag_register);
-void sock_diag_unregister(const struct sock_diag_handler *hnld)
+void sock_diag_unregister(const struct sock_diag_handler *hndl)
{
- int family = hnld->family;
+ int family = hndl->family;
if (family >= AF_MAX)
return;
- mutex_lock(&sock_diag_table_mutex);
- BUG_ON(sock_diag_handlers[family] != hnld);
- WRITE_ONCE(sock_diag_handlers[family], NULL);
- mutex_unlock(&sock_diag_table_mutex);
+ xchg((const struct sock_diag_handler **)&sock_diag_handlers[family],
+ NULL);
}
EXPORT_SYMBOL_GPL(sock_diag_unregister);
@@ -227,20 +238,20 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh)
return -EINVAL;
req->sdiag_family = array_index_nospec(req->sdiag_family, AF_MAX);
- if (READ_ONCE(sock_diag_handlers[req->sdiag_family]) == NULL)
+ if (!rcu_access_pointer(sock_diag_handlers[req->sdiag_family]))
sock_load_diag_module(req->sdiag_family, 0);
- mutex_lock(&sock_diag_table_mutex);
- hndl = sock_diag_handlers[req->sdiag_family];
+ hndl = sock_diag_lock_handler(req->sdiag_family);
if (hndl == NULL)
- err = -ENOENT;
- else if (nlh->nlmsg_type == SOCK_DIAG_BY_FAMILY)
+ return -ENOENT;
+
+ if (nlh->nlmsg_type == SOCK_DIAG_BY_FAMILY)
err = hndl->dump(skb, nlh);
else if (nlh->nlmsg_type == SOCK_DESTROY && hndl->destroy)
err = hndl->destroy(skb, nlh);
else
err = -EOPNOTSUPP;
- mutex_unlock(&sock_diag_table_mutex);
+ sock_diag_unlock_handler(hndl);
return err;
}
@@ -286,12 +297,12 @@ static int sock_diag_bind(struct net *net, int group)
switch (group) {
case SKNLGRP_INET_TCP_DESTROY:
case SKNLGRP_INET_UDP_DESTROY:
- if (!READ_ONCE(sock_diag_handlers[AF_INET]))
+ if (!rcu_access_pointer(sock_diag_handlers[AF_INET]))
sock_load_diag_module(AF_INET, 0);
break;
case SKNLGRP_INET6_TCP_DESTROY:
case SKNLGRP_INET6_UDP_DESTROY:
- if (!READ_ONCE(sock_diag_handlers[AF_INET6]))
+ if (!rcu_access_pointer(sock_diag_handlers[AF_INET6]))
sock_load_diag_module(AF_INET6, 0);
break;
}
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (5 preceding siblings ...)
2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
2024-01-22 17:25 ` Guillaume Nault
2024-01-22 22:51 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
` (3 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized
on sock_diag_table_mutex.
This is to make sure inet_diag module is not unloaded
while diag was ongoing.
It is time to get rid of this mutex and use RCU protection,
allowing full parallelism.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/sock_diag.h | 9 ++++++--
net/core/sock_diag.c | 43 +++++++++++++++++++++++----------------
net/ipv4/inet_diag.c | 9 ++++++--
3 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 7c07754d711b9bd04bc57f8ed08981849fcadb11..110978dc9af1b19194644151af5456b8c6644cf9 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -23,8 +23,13 @@ struct sock_diag_handler {
int sock_diag_register(const struct sock_diag_handler *h);
void sock_diag_unregister(const struct sock_diag_handler *h);
-void sock_diag_register_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh));
-void sock_diag_unregister_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh));
+struct sock_diag_inet_compat {
+ struct module *owner;
+ int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh);
+};
+
+void sock_diag_register_inet_compat(const struct sock_diag_inet_compat *ptr);
+void sock_diag_unregister_inet_compat(const struct sock_diag_inet_compat *ptr);
u64 __sock_gen_cookie(struct sock *sk);
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 72009e1f4380dfdcbf43ed08791e5039e74f5c54..5c3666431df49b3c278ef795f11ba542247796a6 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -17,8 +17,9 @@
#include <linux/sock_diag.h>
static const struct sock_diag_handler __rcu *sock_diag_handlers[AF_MAX];
-static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
-static DEFINE_MUTEX(sock_diag_table_mutex);
+
+static struct sock_diag_inet_compat __rcu *inet_rcv_compat;
+
static struct workqueue_struct *broadcast_wq;
DEFINE_COOKIE(sock_cookie);
@@ -184,19 +185,20 @@ void sock_diag_broadcast_destroy(struct sock *sk)
queue_work(broadcast_wq, &bsk->work);
}
-void sock_diag_register_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh))
+void sock_diag_register_inet_compat(const struct sock_diag_inet_compat *ptr)
{
- mutex_lock(&sock_diag_table_mutex);
- inet_rcv_compat = fn;
- mutex_unlock(&sock_diag_table_mutex);
+ xchg((__force const struct sock_diag_inet_compat **)&inet_rcv_compat,
+ ptr);
}
EXPORT_SYMBOL_GPL(sock_diag_register_inet_compat);
-void sock_diag_unregister_inet_compat(int (*fn)(struct sk_buff *skb, struct nlmsghdr *nlh))
+void sock_diag_unregister_inet_compat(const struct sock_diag_inet_compat *ptr)
{
- mutex_lock(&sock_diag_table_mutex);
- inet_rcv_compat = NULL;
- mutex_unlock(&sock_diag_table_mutex);
+ const struct sock_diag_inet_compat *old;
+
+ old = xchg((__force const struct sock_diag_inet_compat **)&inet_rcv_compat,
+ NULL);
+ WARN_ON_ONCE(old != ptr);
}
EXPORT_SYMBOL_GPL(sock_diag_unregister_inet_compat);
@@ -259,20 +261,27 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh)
static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
+ const struct sock_diag_inet_compat *ptr;
int ret;
switch (nlh->nlmsg_type) {
case TCPDIAG_GETSOCK:
case DCCPDIAG_GETSOCK:
- if (inet_rcv_compat == NULL)
+
+ if (!rcu_access_pointer(inet_rcv_compat))
sock_load_diag_module(AF_INET, 0);
- mutex_lock(&sock_diag_table_mutex);
- if (inet_rcv_compat != NULL)
- ret = inet_rcv_compat(skb, nlh);
- else
- ret = -EOPNOTSUPP;
- mutex_unlock(&sock_diag_table_mutex);
+ rcu_read_lock();
+ ptr = rcu_dereference(inet_rcv_compat);
+ if (ptr && !try_module_get(ptr->owner))
+ ptr = NULL;
+ rcu_read_unlock();
+
+ ret = -EOPNOTSUPP;
+ if (ptr) {
+ ret = ptr->fn(skb, nlh);
+ module_put(ptr->owner);
+ }
return ret;
case SOCK_DIAG_BY_FAMILY:
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 52ce20691e4ef1382da94473128e3c14c55bd542..2c2d8b9dd8e9bb502e52e30dffc70da36d9b1c74 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1527,6 +1527,11 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
}
EXPORT_SYMBOL_GPL(inet_diag_unregister);
+static const struct sock_diag_inet_compat inet_diag_compat = {
+ .owner = THIS_MODULE,
+ .fn = inet_diag_rcv_msg_compat,
+};
+
static int __init inet_diag_init(void)
{
const int inet_diag_table_size = (IPPROTO_MAX *
@@ -1545,7 +1550,7 @@ static int __init inet_diag_init(void)
if (err)
goto out_free_inet;
- sock_diag_register_inet_compat(inet_diag_rcv_msg_compat);
+ sock_diag_register_inet_compat(&inet_diag_compat);
out:
return err;
@@ -1560,7 +1565,7 @@ static void __exit inet_diag_exit(void)
{
sock_diag_unregister(&inet6_diag_handler);
sock_diag_unregister(&inet_diag_handler);
- sock_diag_unregister_inet_compat(inet_diag_rcv_msg_compat);
+ sock_diag_unregister_inet_compat(&inet_diag_compat);
kfree(inet_diag_table);
}
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (6 preceding siblings ...)
2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
2024-01-22 21:38 ` Guillaume Nault
2024-01-22 22:55 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
` (2 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
sock_diag_rcv() is still serializing its operations using
a mutex, for no good reason.
This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops
in netlink_rcv_skb"), but the root cause has been fixed
with commit cd40b7d3983c ("[NET]: make netlink user -> kernel
interface synchronious")
Remove this mutex to let multiple threads run concurrently.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/sock_diag.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 5c3666431df49b3c278ef795f11ba542247796a6..6541228380252d597821b084df34176bff4ada83 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -292,13 +292,9 @@ static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}
-static DEFINE_MUTEX(sock_diag_mutex);
-
static void sock_diag_rcv(struct sk_buff *skb)
{
- mutex_lock(&sock_diag_mutex);
netlink_rcv_skb(skb, &sock_diag_rcv_msg);
- mutex_unlock(&sock_diag_mutex);
}
static int sock_diag_bind(struct net *net, int group)
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH net-next 9/9] inet_diag: skip over empty buckets
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (7 preceding siblings ...)
2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
@ 2024-01-22 11:26 ` Eric Dumazet
2024-01-22 21:44 ` Guillaume Nault
2024-01-22 22:58 ` Kuniyuki Iwashima
2024-01-22 15:05 ` [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Willem de Bruijn
2024-01-23 14:40 ` patchwork-bot+netdevbpf
10 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2024-01-22 11:26 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
After the removal of inet_diag_table_mutex, sock_diag_table_mutex
and sock_diag_mutex, I was able so see spinlock contention from
inet_diag_dump_icsk() when running 100 parallel invocations.
It is time to skip over empty buckets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/inet_diag.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 2c2d8b9dd8e9bb502e52e30dffc70da36d9b1c74..7adace541fe292851a66ccc4de1da2a60ac4714e 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -1045,6 +1045,10 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
num = 0;
ilb = &hashinfo->lhash2[i];
+ if (hlist_nulls_empty(&ilb->nulls_head)) {
+ s_num = 0;
+ continue;
+ }
spin_lock(&ilb->lock);
sk_nulls_for_each(sk, node, &ilb->nulls_head) {
struct inet_sock *inet = inet_sk(sk);
@@ -1109,6 +1113,10 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
accum = 0;
ibb = &hashinfo->bhash2[i];
+ if (hlist_empty(&ibb->chain)) {
+ s_num = 0;
+ continue;
+ }
spin_lock_bh(&ibb->lock);
inet_bind_bucket_for_each(tb2, &ibb->chain) {
if (!net_eq(ib2_net(tb2), net))
--
2.43.0.429.g432eaa2c6b-goog
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (8 preceding siblings ...)
2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
@ 2024-01-22 15:05 ` Willem de Bruijn
2024-01-23 14:40 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 30+ messages in thread
From: Willem de Bruijn @ 2024-01-22 15:05 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Kuniyuki Iwashima, Martin KaFai Lau,
Guillaume Nault, netdev, eric.dumazet, Eric Dumazet
Eric Dumazet wrote:
> Surprisingly, inet_diag operations are serialized over a stack
> of three mutexes, giving legacy /proc based files an unfair
> advantage on modern hosts.
>
> This series removes all of them, making inet_diag operations
> (eg iproute2/ss) fully parallel.
>
> 1-2) Two first patches are adding data-race annotations
> and can be backported to stable kernels.
>
> 3-4) inet_diag_table_mutex can be replaced with RCU protection,
> if we add corresponding protection against module unload.
>
> 5-7) sock_diag_table_mutex can be replaced with RCU protection,
> if we add corresponding protection against module unload.
>
> 8) sock_diag_mutex is removed, as the old bug it was
> working around has been fixed more elegantly.
>
> 9) inet_diag_dump_icsk() can skip over empty buckets to reduce
> spinlock contention.
>
> Eric Dumazet (9):
> sock_diag: annotate data-races around sock_diag_handlers[family]
> inet_diag: annotate data-races around inet_diag_table[]
> inet_diag: add module pointer to "struct inet_diag_handler"
> inet_diag: allow concurrent operations
> sock_diag: add module pointer to "struct sock_diag_handler"
> sock_diag: allow concurrent operations
> sock_diag: allow concurrent operation in sock_diag_rcv_msg()
> sock_diag: remove sock_diag_mutex
> inet_diag: skip over empty buckets
For the series:
Reviewed-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
@ 2024-01-22 16:29 ` Guillaume Nault
2024-01-22 22:36 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:29 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:25:55AM +0000, Eric Dumazet wrote:
> __sock_diag_cmd() and sock_diag_bind() read sock_diag_handlers[family]
> without a lock held.
>
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
>
> Fixes: 8ef874bfc729 ("sock_diag: Move the sock_ code to net/core/")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[]
2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
@ 2024-01-22 16:29 ` Guillaume Nault
2024-01-22 22:37 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:29 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:25:56AM +0000, Eric Dumazet wrote:
> inet_diag_lock_handler() reads inet_diag_table[proto] locklessly.
>
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
>
> Fixes: d523a328fb02 ("[INET]: Fix inet_diag dead-lock regression")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler"
2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
@ 2024-01-22 16:58 ` Guillaume Nault
2024-01-22 22:38 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:25:57AM +0000, Eric Dumazet wrote:
> Following patch is going to use RCU instead of
> inet_diag_table_mutex acquisition.
>
> This patch is a preparation, no change of behavior yet.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 4/9] inet_diag: allow concurrent operations
2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 16:58 ` Guillaume Nault
2024-01-22 22:40 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 16:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:25:58AM +0000, Eric Dumazet wrote:
> inet_diag_lock_handler() current implementation uses a mutex
> to protect inet_diag_table[] array against concurrent changes.
>
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
>
> It is time to switch to full RCU protection.
>
> As a bonus, if a target is statically linked instead of being
> modular, inet_diag_lock_handler() & inet_diag_unlock_handler()
> reduce to reads only.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler"
2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
@ 2024-01-22 17:24 ` Guillaume Nault
2024-01-22 22:44 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 17:24 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:25:59AM +0000, Eric Dumazet wrote:
> Following patch is going to use RCU instead of
> sock_diag_table_mutex acquisition.
>
> This patch is a preparation, no change of behavior yet.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 6/9] sock_diag: allow concurrent operations
2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
@ 2024-01-22 17:25 ` Guillaume Nault
2024-01-22 22:47 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 17:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:26:00AM +0000, Eric Dumazet wrote:
> sock_diag_broadcast_destroy_work() and __sock_diag_cmd()
> are currently using sock_diag_table_mutex to protect
> against concurrent sock_diag_handlers[] changes.
>
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
>
> It is time to switch to full RCU protection.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
@ 2024-01-22 17:25 ` Guillaume Nault
2024-01-22 22:51 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 17:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:26:01AM +0000, Eric Dumazet wrote:
> TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized
> on sock_diag_table_mutex.
>
> This is to make sure inet_diag module is not unloaded
> while diag was ongoing.
>
> It is time to get rid of this mutex and use RCU protection,
> allowing full parallelism.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex
2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
@ 2024-01-22 21:38 ` Guillaume Nault
2024-01-22 22:55 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 21:38 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:26:02AM +0000, Eric Dumazet wrote:
> sock_diag_rcv() is still serializing its operations using
> a mutex, for no good reason.
>
> This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops
> in netlink_rcv_skb"), but the root cause has been fixed
> with commit cd40b7d3983c ("[NET]: make netlink user -> kernel
> interface synchronious")
>
> Remove this mutex to let multiple threads run concurrently.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 9/9] inet_diag: skip over empty buckets
2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
@ 2024-01-22 21:44 ` Guillaume Nault
2024-01-22 22:58 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Guillaume Nault @ 2024-01-22 21:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Martin KaFai Lau, netdev, eric.dumazet
On Mon, Jan 22, 2024 at 11:26:03AM +0000, Eric Dumazet wrote:
> After the removal of inet_diag_table_mutex, sock_diag_table_mutex
> and sock_diag_mutex, I was able so see spinlock contention from
> inet_diag_dump_icsk() when running 100 parallel invocations.
>
> It is time to skip over empty buckets.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
2024-01-22 16:29 ` Guillaume Nault
@ 2024-01-22 22:36 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:36 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:55 +0000
> __sock_diag_cmd() and sock_diag_bind() read sock_diag_handlers[family]
> without a lock held.
>
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
>
> Fixes: 8ef874bfc729 ("sock_diag: Move the sock_ code to net/core/")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[]
2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
2024-01-22 16:29 ` Guillaume Nault
@ 2024-01-22 22:37 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:37 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:56 +0000
> inet_diag_lock_handler() reads inet_diag_table[proto] locklessly.
>
> Use READ_ONCE()/WRITE_ONCE() annotations to avoid potential issues.
>
> Fixes: d523a328fb02 ("[INET]: Fix inet_diag dead-lock regression")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler"
2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
2024-01-22 16:58 ` Guillaume Nault
@ 2024-01-22 22:38 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:38 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:57 +0000
> Following patch is going to use RCU instead of
> inet_diag_table_mutex acquisition.
>
> This patch is a preparation, no change of behavior yet.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 4/9] inet_diag: allow concurrent operations
2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
2024-01-22 16:58 ` Guillaume Nault
@ 2024-01-22 22:40 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:40 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:58 +0000
> inet_diag_lock_handler() current implementation uses a mutex
> to protect inet_diag_table[] array against concurrent changes.
>
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
>
> It is time to switch to full RCU protection.
>
> As a bonus, if a target is statically linked instead of being
> modular, inet_diag_lock_handler() & inet_diag_unlock_handler()
> reduce to reads only.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler"
2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
2024-01-22 17:24 ` Guillaume Nault
@ 2024-01-22 22:44 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:44 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:25:59 +0000
> Following patch is going to use RCU instead of
> sock_diag_table_mutex acquisition.
>
> This patch is a preparation, no change of behavior yet.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 6/9] sock_diag: allow concurrent operations
2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
2024-01-22 17:25 ` Guillaume Nault
@ 2024-01-22 22:47 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:47 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:00 +0000
> sock_diag_broadcast_destroy_work() and __sock_diag_cmd()
> are currently using sock_diag_table_mutex to protect
> against concurrent sock_diag_handlers[] changes.
>
> This makes inet_diag dump serialized, thus less scalable
> than legacy /proc files.
>
> It is time to switch to full RCU protection.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
2024-01-22 17:25 ` Guillaume Nault
@ 2024-01-22 22:51 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:51 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:01 +0000
> TCPDIAG_GETSOCK and DCCPDIAG_GETSOCK diag are serialized
> on sock_diag_table_mutex.
>
> This is to make sure inet_diag module is not unloaded
> while diag was ongoing.
>
> It is time to get rid of this mutex and use RCU protection,
> allowing full parallelism.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex
2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
2024-01-22 21:38 ` Guillaume Nault
@ 2024-01-22 22:55 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:55 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:02 +0000
> sock_diag_rcv() is still serializing its operations using
> a mutex, for no good reason.
>
> This came with commit 0a9c73014415 ("[INET_DIAG]: Fix oops
> in netlink_rcv_skb"), but the root cause has been fixed
> with commit cd40b7d3983c ("[NET]: make netlink user -> kernel
> interface synchronious")
>
> Remove this mutex to let multiple threads run concurrently.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 9/9] inet_diag: skip over empty buckets
2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
2024-01-22 21:44 ` Guillaume Nault
@ 2024-01-22 22:58 ` Kuniyuki Iwashima
1 sibling, 0 replies; 30+ messages in thread
From: Kuniyuki Iwashima @ 2024-01-22 22:58 UTC (permalink / raw)
To: edumazet
Cc: davem, dsahern, eric.dumazet, gnault, kafai, kuba, kuniyu,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 22 Jan 2024 11:26:03 +0000
> After the removal of inet_diag_table_mutex, sock_diag_table_mutex
> and sock_diag_mutex, I was able so see spinlock contention from
> inet_diag_dump_icsk() when running 100 parallel invocations.
>
> It is time to skip over empty buckets.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
` (9 preceding siblings ...)
2024-01-22 15:05 ` [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Willem de Bruijn
@ 2024-01-23 14:40 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 30+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-23 14:40 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, dsahern, kuniyu, kafai, gnault, netdev,
eric.dumazet
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Mon, 22 Jan 2024 11:25:54 +0000 you wrote:
> Surprisingly, inet_diag operations are serialized over a stack
> of three mutexes, giving legacy /proc based files an unfair
> advantage on modern hosts.
>
> This series removes all of them, making inet_diag operations
> (eg iproute2/ss) fully parallel.
>
> [...]
Here is the summary with links:
- [net-next,1/9] sock_diag: annotate data-races around sock_diag_handlers[family]
https://git.kernel.org/netdev/net-next/c/efd402537673
- [net-next,2/9] inet_diag: annotate data-races around inet_diag_table[]
https://git.kernel.org/netdev/net-next/c/e50e10ae5d81
- [net-next,3/9] inet_diag: add module pointer to "struct inet_diag_handler"
https://git.kernel.org/netdev/net-next/c/db5914695a84
- [net-next,4/9] inet_diag: allow concurrent operations
https://git.kernel.org/netdev/net-next/c/223f55196bbd
- [net-next,5/9] sock_diag: add module pointer to "struct sock_diag_handler"
https://git.kernel.org/netdev/net-next/c/114b4bb1cc19
- [net-next,6/9] sock_diag: allow concurrent operations
https://git.kernel.org/netdev/net-next/c/1d55a6974756
- [net-next,7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg()
https://git.kernel.org/netdev/net-next/c/86e8921df05c
- [net-next,8/9] sock_diag: remove sock_diag_mutex
https://git.kernel.org/netdev/net-next/c/f44e64990beb
- [net-next,9/9] inet_diag: skip over empty buckets
https://git.kernel.org/netdev/net-next/c/622a08e8de9f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2024-01-23 14:40 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22 11:25 [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Eric Dumazet
2024-01-22 11:25 ` [PATCH net-next 1/9] sock_diag: annotate data-races around sock_diag_handlers[family] Eric Dumazet
2024-01-22 16:29 ` Guillaume Nault
2024-01-22 22:36 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 2/9] inet_diag: annotate data-races around inet_diag_table[] Eric Dumazet
2024-01-22 16:29 ` Guillaume Nault
2024-01-22 22:37 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 3/9] inet_diag: add module pointer to "struct inet_diag_handler" Eric Dumazet
2024-01-22 16:58 ` Guillaume Nault
2024-01-22 22:38 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 4/9] inet_diag: allow concurrent operations Eric Dumazet
2024-01-22 16:58 ` Guillaume Nault
2024-01-22 22:40 ` Kuniyuki Iwashima
2024-01-22 11:25 ` [PATCH net-next 5/9] sock_diag: add module pointer to "struct sock_diag_handler" Eric Dumazet
2024-01-22 17:24 ` Guillaume Nault
2024-01-22 22:44 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 6/9] sock_diag: allow concurrent operations Eric Dumazet
2024-01-22 17:25 ` Guillaume Nault
2024-01-22 22:47 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 7/9] sock_diag: allow concurrent operation in sock_diag_rcv_msg() Eric Dumazet
2024-01-22 17:25 ` Guillaume Nault
2024-01-22 22:51 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 8/9] sock_diag: remove sock_diag_mutex Eric Dumazet
2024-01-22 21:38 ` Guillaume Nault
2024-01-22 22:55 ` Kuniyuki Iwashima
2024-01-22 11:26 ` [PATCH net-next 9/9] inet_diag: skip over empty buckets Eric Dumazet
2024-01-22 21:44 ` Guillaume Nault
2024-01-22 22:58 ` Kuniyuki Iwashima
2024-01-22 15:05 ` [PATCH net-next 0/9] inet_diag: remove three mutexes in diag dumps Willem de Bruijn
2024-01-23 14:40 ` patchwork-bot+netdevbpf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.