* [PATCH net-next v3 0/5] Rework ip_ra_chain protection
@ 2018-03-22 9:44 Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:44 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai
Commit 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control"
made rtnl_lock() be used in raw_close(). This function is called
on every RAW socket destruction, so that rtnl_mutex is taken
every time. This scales very sadly. I observe cleanup_net()
spending a lot of time in rtnl_lock() and raw_close() is one
of the biggest rtnl user (since we have percpu net->ipv4.icmp_sk).
This patchset reworks the locking: reverts the problem commit
and its descendant, and introduces rtnl-independent locking.
This may have a continuation, and someone may work on killing
rtnl_lock() in mrtsock_destruct() in the future.
Thanks,
Kirill
---
v3: Change patches order: [2/5] and [3/5].
v2: Fix sparse warning [4/5], as reported by kbuild test robot.
---
Kirill Tkhai (5):
net: Revert "ipv4: get rid of ip_ra_lock"
net: Move IP_ROUTER_ALERT out of lock_sock(sk)
net: Revert "ipv4: fix a deadlock in ip_ra_control"
net: Make ip_ra_chain per struct net
net: Replace ip_ra_lock with per-net mutex
include/net/ip.h | 13 +++++++++++--
include/net/netns/ipv4.h | 2 ++
net/core/net_namespace.c | 1 +
net/ipv4/ip_input.c | 5 ++---
net/ipv4/ip_sockglue.c | 34 +++++++++++++---------------------
net/ipv4/ipmr.c | 11 +++++++++--
net/ipv4/raw.c | 2 --
7 files changed, 38 insertions(+), 30 deletions(-)
--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next v3 1/5] net: Revert "ipv4: get rid of ip_ra_lock"
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
@ 2018-03-22 9:45 ` Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 2/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk) Kirill Tkhai
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:45 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai
This reverts commit ba3f571d5dde. The commit was made
after 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control",
and killed ip_ra_lock, which became useless after rtnl_lock()
made used to destroy every raw ipv4 socket. This scales
very bad, and next patch in series reverts 1215e51edad1.
ip_ra_lock will be used again.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
net/ipv4/ip_sockglue.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 74c962b9b09c..be7c3b71914d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -334,6 +334,7 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
sent to multicast group to reach destination designated router.
*/
struct ip_ra_chain __rcu *ip_ra_chain;
+static DEFINE_SPINLOCK(ip_ra_lock);
static void ip_ra_destroy_rcu(struct rcu_head *head)
@@ -355,17 +356,21 @@ int ip_ra_control(struct sock *sk, unsigned char on,
new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
+ spin_lock_bh(&ip_ra_lock);
for (rap = &ip_ra_chain;
- (ra = rtnl_dereference(*rap)) != NULL;
+ (ra = rcu_dereference_protected(*rap,
+ lockdep_is_held(&ip_ra_lock))) != NULL;
rap = &ra->next) {
if (ra->sk == sk) {
if (on) {
+ spin_unlock_bh(&ip_ra_lock);
kfree(new_ra);
return -EADDRINUSE;
}
/* dont let ip_call_ra_chain() use sk again */
ra->sk = NULL;
RCU_INIT_POINTER(*rap, ra->next);
+ spin_unlock_bh(&ip_ra_lock);
if (ra->destructor)
ra->destructor(sk);
@@ -379,14 +384,17 @@ int ip_ra_control(struct sock *sk, unsigned char on,
return 0;
}
}
- if (!new_ra)
+ if (!new_ra) {
+ spin_unlock_bh(&ip_ra_lock);
return -ENOBUFS;
+ }
new_ra->sk = sk;
new_ra->destructor = destructor;
RCU_INIT_POINTER(new_ra->next, ra);
rcu_assign_pointer(*rap, new_ra);
sock_hold(sk);
+ spin_unlock_bh(&ip_ra_lock);
return 0;
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v3 2/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk)
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
@ 2018-03-22 9:45 ` Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 3/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:45 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai
ip_ra_control() does not need sk_lock. Who are the another
users of ip_ra_chain? ip_mroute_setsockopt() doesn't take
sk_lock, while parallel IP_ROUTER_ALERT syscalls are
synchronized by ip_ra_lock. So, we may move this command
out of sk_lock.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
net/ipv4/ip_sockglue.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index be7c3b71914d..dcbf6afe27e7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -647,6 +647,8 @@ static int do_ip_setsockopt(struct sock *sk, int level,
/* If optlen==0, it is equivalent to val == 0 */
+ if (optname == IP_ROUTER_ALERT)
+ return ip_ra_control(sk, val ? 1 : 0, NULL);
if (ip_mroute_opt(optname))
return ip_mroute_setsockopt(sk, optname, optval, optlen);
@@ -1157,9 +1159,6 @@ static int do_ip_setsockopt(struct sock *sk, int level,
goto e_inval;
inet->mc_all = val;
break;
- case IP_ROUTER_ALERT:
- err = ip_ra_control(sk, val ? 1 : 0, NULL);
- break;
case IP_FREEBIND:
if (optlen < 1)
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v3 3/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 2/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk) Kirill Tkhai
@ 2018-03-22 9:45 ` Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 4/5] net: Make ip_ra_chain per struct net Kirill Tkhai
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:45 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai
This reverts commit 1215e51edad1.
Since raw_close() is used on every RAW socket destruction,
the changes made by 1215e51edad1 scale sadly. This clearly
seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
kwork spends a lot of time waiting for rtnl_lock() introduced
by this commit.
Previous patch moved IP_ROUTER_ALERT out of rtnl_lock(),
so we revert this patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
net/ipv4/ip_sockglue.c | 1 -
net/ipv4/ipmr.c | 11 +++++++++--
net/ipv4/raw.c | 2 --
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index dcbf6afe27e7..bf5f44b27b7e 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -594,7 +594,6 @@ static bool setsockopt_needs_rtnl(int optname)
case MCAST_LEAVE_GROUP:
case MCAST_LEAVE_SOURCE_GROUP:
case MCAST_UNBLOCK_SOURCE:
- case IP_ROUTER_ALERT:
return true;
}
return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index d752a70855d8..f6be5db16da2 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1399,7 +1399,7 @@ static void mrtsock_destruct(struct sock *sk)
struct net *net = sock_net(sk);
struct mr_table *mrt;
- ASSERT_RTNL();
+ rtnl_lock();
ipmr_for_each_table(mrt, net) {
if (sk == rtnl_dereference(mrt->mroute_sk)) {
IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
@@ -1411,6 +1411,7 @@ static void mrtsock_destruct(struct sock *sk)
mroute_clean_tables(mrt, false);
}
}
+ rtnl_unlock();
}
/* Socket options and virtual interface manipulation. The whole
@@ -1475,8 +1476,13 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
if (sk != rcu_access_pointer(mrt->mroute_sk)) {
ret = -EACCES;
} else {
+ /* We need to unlock here because mrtsock_destruct takes
+ * care of rtnl itself and we can't change that due to
+ * the IP_ROUTER_ALERT setsockopt which runs without it.
+ */
+ rtnl_unlock();
ret = ip_ra_control(sk, 0, NULL);
- goto out_unlock;
+ goto out;
}
break;
case MRT_ADD_VIF:
@@ -1588,6 +1594,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
}
out_unlock:
rtnl_unlock();
+out:
return ret;
}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 54648d20bf0f..720bef7da2f6 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -711,9 +711,7 @@ static void raw_close(struct sock *sk, long timeout)
/*
* Raw sockets may have direct kernel references. Kill them.
*/
- rtnl_lock();
ip_ra_control(sk, 0, NULL);
- rtnl_unlock();
sk_common_release(sk);
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v3 4/5] net: Make ip_ra_chain per struct net
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
` (2 preceding siblings ...)
2018-03-22 9:45 ` [PATCH net-next v3 3/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
@ 2018-03-22 9:45 ` Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 5/5] net: Replace ip_ra_lock with per-net mutex Kirill Tkhai
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:45 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai
This is optimization, which makes ip_call_ra_chain()
iterate less sockets to find the sockets it's looking for.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
include/net/ip.h | 13 +++++++++++--
include/net/netns/ipv4.h | 1 +
net/ipv4/ip_input.c | 5 ++---
net/ipv4/ip_sockglue.c | 15 ++-------------
4 files changed, 16 insertions(+), 18 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index fe63ba95d12b..d53b5a9eae34 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -91,6 +91,17 @@ static inline int inet_sdif(struct sk_buff *skb)
return 0;
}
+/* Special input handler for packets caught by router alert option.
+ They are selected only by protocol field, and then processed likely
+ local ones; but only if someone wants them! Otherwise, router
+ not running rsvpd will kill RSVP.
+
+ It is user level problem, what it will make with them.
+ I have no idea, how it will masquearde or NAT them (it is joke, joke :-)),
+ but receiver should be enough clever f.e. to forward mtrace requests,
+ sent to multicast group to reach destination designated router.
+ */
+
struct ip_ra_chain {
struct ip_ra_chain __rcu *next;
struct sock *sk;
@@ -101,8 +112,6 @@ struct ip_ra_chain {
struct rcu_head rcu;
};
-extern struct ip_ra_chain __rcu *ip_ra_chain;
-
/* IP flags. */
#define IP_CE 0x8000 /* Flag: "Congestion" */
#define IP_DF 0x4000 /* Flag: "Don't Fragment" */
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 382bfd7583cf..97d7ee6667c7 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -49,6 +49,7 @@ struct netns_ipv4 {
#endif
struct ipv4_devconf *devconf_all;
struct ipv4_devconf *devconf_dflt;
+ struct ip_ra_chain __rcu *ra_chain;
#ifdef CONFIG_IP_MULTIPLE_TABLES
struct fib_rules_ops *rules_ops;
bool fib_has_custom_rules;
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 57fc13c6ab2b..7582713dd18f 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -159,7 +159,7 @@ bool ip_call_ra_chain(struct sk_buff *skb)
struct net_device *dev = skb->dev;
struct net *net = dev_net(dev);
- for (ra = rcu_dereference(ip_ra_chain); ra; ra = rcu_dereference(ra->next)) {
+ for (ra = rcu_dereference(net->ipv4.ra_chain); ra; ra = rcu_dereference(ra->next)) {
struct sock *sk = ra->sk;
/* If socket is bound to an interface, only report
@@ -167,8 +167,7 @@ bool ip_call_ra_chain(struct sk_buff *skb)
*/
if (sk && inet_sk(sk)->inet_num == protocol &&
(!sk->sk_bound_dev_if ||
- sk->sk_bound_dev_if == dev->ifindex) &&
- net_eq(sock_net(sk), net)) {
+ sk->sk_bound_dev_if == dev->ifindex)) {
if (ip_is_fragment(ip_hdr(skb))) {
if (ip_defrag(net, skb, IP_DEFRAG_CALL_RA_CHAIN))
return true;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index bf5f44b27b7e..f36d35fe924b 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -322,18 +322,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
return 0;
}
-
-/* Special input handler for packets caught by router alert option.
- They are selected only by protocol field, and then processed likely
- local ones; but only if someone wants them! Otherwise, router
- not running rsvpd will kill RSVP.
-
- It is user level problem, what it will make with them.
- I have no idea, how it will masquearde or NAT them (it is joke, joke :-)),
- but receiver should be enough clever f.e. to forward mtrace requests,
- sent to multicast group to reach destination designated router.
- */
-struct ip_ra_chain __rcu *ip_ra_chain;
static DEFINE_SPINLOCK(ip_ra_lock);
@@ -350,6 +338,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
{
struct ip_ra_chain *ra, *new_ra;
struct ip_ra_chain __rcu **rap;
+ struct net *net = sock_net(sk);
if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num == IPPROTO_RAW)
return -EINVAL;
@@ -357,7 +346,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
spin_lock_bh(&ip_ra_lock);
- for (rap = &ip_ra_chain;
+ for (rap = &net->ipv4.ra_chain;
(ra = rcu_dereference_protected(*rap,
lockdep_is_held(&ip_ra_lock))) != NULL;
rap = &ra->next) {
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v3 5/5] net: Replace ip_ra_lock with per-net mutex
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
` (3 preceding siblings ...)
2018-03-22 9:45 ` [PATCH net-next v3 4/5] net: Make ip_ra_chain per struct net Kirill Tkhai
@ 2018-03-22 9:45 ` Kirill Tkhai
2018-03-22 9:49 ` [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
2018-03-22 19:14 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:45 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai
Since ra_chain is per-net, we may use per-net mutexes
to protect them in ip_ra_control(). This improves
scalability.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
include/net/netns/ipv4.h | 1 +
net/core/net_namespace.c | 1 +
net/ipv4/ip_sockglue.c | 15 ++++++---------
3 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 97d7ee6667c7..8491bc9c86b1 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -50,6 +50,7 @@ struct netns_ipv4 {
struct ipv4_devconf *devconf_all;
struct ipv4_devconf *devconf_dflt;
struct ip_ra_chain __rcu *ra_chain;
+ struct mutex ra_mutex;
#ifdef CONFIG_IP_MULTIPLE_TABLES
struct fib_rules_ops *rules_ops;
bool fib_has_custom_rules;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index c340d5cfbdec..95ba2c53bd9a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -301,6 +301,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
net->user_ns = user_ns;
idr_init(&net->netns_ids);
spin_lock_init(&net->nsid_lock);
+ mutex_init(&net->ipv4.ra_mutex);
list_for_each_entry(ops, &pernet_list, list) {
error = ops_init(ops, net);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index f36d35fe924b..5ad2d8ed3a3f 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -322,9 +322,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
return 0;
}
-static DEFINE_SPINLOCK(ip_ra_lock);
-
-
static void ip_ra_destroy_rcu(struct rcu_head *head)
{
struct ip_ra_chain *ra = container_of(head, struct ip_ra_chain, rcu);
@@ -345,21 +342,21 @@ int ip_ra_control(struct sock *sk, unsigned char on,
new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
- spin_lock_bh(&ip_ra_lock);
+ mutex_lock(&net->ipv4.ra_mutex);
for (rap = &net->ipv4.ra_chain;
(ra = rcu_dereference_protected(*rap,
- lockdep_is_held(&ip_ra_lock))) != NULL;
+ lockdep_is_held(&net->ipv4.ra_mutex))) != NULL;
rap = &ra->next) {
if (ra->sk == sk) {
if (on) {
- spin_unlock_bh(&ip_ra_lock);
+ mutex_unlock(&net->ipv4.ra_mutex);
kfree(new_ra);
return -EADDRINUSE;
}
/* dont let ip_call_ra_chain() use sk again */
ra->sk = NULL;
RCU_INIT_POINTER(*rap, ra->next);
- spin_unlock_bh(&ip_ra_lock);
+ mutex_unlock(&net->ipv4.ra_mutex);
if (ra->destructor)
ra->destructor(sk);
@@ -374,7 +371,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
}
}
if (!new_ra) {
- spin_unlock_bh(&ip_ra_lock);
+ mutex_unlock(&net->ipv4.ra_mutex);
return -ENOBUFS;
}
new_ra->sk = sk;
@@ -383,7 +380,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
RCU_INIT_POINTER(new_ra->next, ra);
rcu_assign_pointer(*rap, new_ra);
sock_hold(sk);
- spin_unlock_bh(&ip_ra_lock);
+ mutex_unlock(&net->ipv4.ra_mutex);
return 0;
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v3 0/5] Rework ip_ra_chain protection
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
` (4 preceding siblings ...)
2018-03-22 9:45 ` [PATCH net-next v3 5/5] net: Replace ip_ra_lock with per-net mutex Kirill Tkhai
@ 2018-03-22 9:49 ` Kirill Tkhai
2018-03-22 19:14 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Kirill Tkhai @ 2018-03-22 9:49 UTC (permalink / raw)
To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp
On 22.03.2018 12:44, Kirill Tkhai wrote:
> Commit 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control"
> made rtnl_lock() be used in raw_close(). This function is called
> on every RAW socket destruction, so that rtnl_mutex is taken
> every time. This scales very sadly. I observe cleanup_net()
> spending a lot of time in rtnl_lock() and raw_close() is one
> of the biggest rtnl user (since we have percpu net->ipv4.icmp_sk).
>
> This patchset reworks the locking: reverts the problem commit
> and its descendant, and introduces rtnl-independent locking.
> This may have a continuation, and someone may work on killing
> rtnl_lock() in mrtsock_destruct() in the future.
>
> Thanks,
> Kirill
>
> ---
> v3: Change patches order: [2/5] and [3/5].
> v2: Fix sparse warning [4/5], as reported by kbuild test robot.
>
> ---
>
> Kirill Tkhai (5):
> net: Revert "ipv4: get rid of ip_ra_lock"
> net: Move IP_ROUTER_ALERT out of lock_sock(sk)
> net: Revert "ipv4: fix a deadlock in ip_ra_control"
> net: Make ip_ra_chain per struct net
> net: Replace ip_ra_lock with per-net mutex
>
>
> include/net/ip.h | 13 +++++++++++--
> include/net/netns/ipv4.h | 2 ++
> net/core/net_namespace.c | 1 +
> net/ipv4/ip_input.c | 5 ++---
> net/ipv4/ip_sockglue.c | 34 +++++++++++++---------------------
> net/ipv4/ipmr.c | 11 +++++++++--
> net/ipv4/raw.c | 2 --
> 7 files changed, 38 insertions(+), 30 deletions(-)
>
> --
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
JFI: I used the below program to test:
#define _GNU_SOURCE
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <linux/mroute.h>
#include <sched.h>
int main()
{
int sk, v, i = 0;
if (unshare(CLONE_NEWNET)) {
perror("unshare");
return 1;
}
sk = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
if (sk < 0) {
perror("socket");
return 1;
}
for (i = 0; i < 3; i++)
fork();
while (1) {
setsockopt(sk, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
setsockopt(sk, IPPROTO_IP, MRT_DONE, (void *)&v, sizeof(v));
v = (i++)%2;
setsockopt(sk, IPPROTO_IP, IP_ROUTER_ALERT, (void *)&v, sizeof(v));
}
return 0;
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v3 0/5] Rework ip_ra_chain protection
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
` (5 preceding siblings ...)
2018-03-22 9:49 ` [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
@ 2018-03-22 19:14 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2018-03-22 19:14 UTC (permalink / raw)
To: ktkhai
Cc: yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
xiyou.wangcong, dvyukov, andreyknvl, lkp
From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Thu, 22 Mar 2018 12:44:46 +0300
> Commit 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control"
> made rtnl_lock() be used in raw_close(). This function is called
> on every RAW socket destruction, so that rtnl_mutex is taken
> every time. This scales very sadly. I observe cleanup_net()
> spending a lot of time in rtnl_lock() and raw_close() is one
> of the biggest rtnl user (since we have percpu net->ipv4.icmp_sk).
>
> This patchset reworks the locking: reverts the problem commit
> and its descendant, and introduces rtnl-independent locking.
> This may have a continuation, and someone may work on killing
> rtnl_lock() in mrtsock_destruct() in the future.
>
> Thanks,
> Kirill
>
> ---
> v3: Change patches order: [2/5] and [3/5].
> v2: Fix sparse warning [4/5], as reported by kbuild test robot.
This looks a lot better, series applied, thank you.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-03-22 19:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-22 9:44 [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 2/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk) Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 3/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 4/5] net: Make ip_ra_chain per struct net Kirill Tkhai
2018-03-22 9:45 ` [PATCH net-next v3 5/5] net: Replace ip_ra_lock with per-net mutex Kirill Tkhai
2018-03-22 9:49 ` [PATCH net-next v3 0/5] Rework ip_ra_chain protection Kirill Tkhai
2018-03-22 19:14 ` David Miller
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.