All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/5] Rework ip_ra_chain protection
@ 2018-03-19  9:14 Kirill Tkhai
  2018-03-19  9:14 ` [PATCH net-next v2 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-19  9:14 UTC (permalink / raw)
  To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai

Commit 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control"
made rtnl_lock() be used in raw_close(). This function is called
on every RAW socket destruction, so that rtnl_mutex is taken
every time. This scales very sadly. I observe cleanup_net()
spending a lot of time in rtnl_lock() and raw_close() is one
of the biggest rtnl user (since we have percpu net->ipv4.icmp_sk).

Another problem is that commit does not explain actual call path
mrtsock_destruct() takes sk lock and the way to deadlock.
But there is no sk lock taking is visible in mrtsock_destruct().
So, it is a question does we need it at all.

This patchset reworks the locking: reverts the problem commit
and its descendant, and introduces rtnl-independent locking.
This may have a continuation, and someone may work on killing
rtnl_lock() in mrtsock_destruct() in the future.

Thanks,
Kirill

---
v2:  Fix sparse warning (4/5), as reported by kbuild test robot.

Kirill Tkhai (5):
      net: Revert "ipv4: get rid of ip_ra_lock"
      net: Revert "ipv4: fix a deadlock in ip_ra_control"
      net: Move IP_ROUTER_ALERT out of lock_sock(sk)
      net: Make ip_ra_chain per struct net
      net: Replace ip_ra_lock with per-net mutex


 include/net/ip.h         |   13 +++++++++++--
 include/net/netns/ipv4.h |    2 ++
 net/core/net_namespace.c |    1 +
 net/ipv4/ip_input.c      |    5 ++---
 net/ipv4/ip_sockglue.c   |   34 +++++++++++++---------------------
 net/ipv4/ipmr.c          |   11 +++++++++--
 net/ipv4/raw.c           |    2 --
 7 files changed, 38 insertions(+), 30 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next v2 1/5] net: Revert "ipv4: get rid of ip_ra_lock"
  2018-03-19  9:14 [PATCH net-next v2 0/5] Rework ip_ra_chain protection Kirill Tkhai
@ 2018-03-19  9:14 ` Kirill Tkhai
  2018-03-19  9:14 ` [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-19  9:14 UTC (permalink / raw)
  To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai

This reverts commit ba3f571d5dde. The commit was made
after 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control",
and killed ip_ra_lock, which became useless after rtnl_lock()
made used to destroy every raw ipv4 socket. This scales
very bad, and next patch in series reverts 1215e51edad1.
ip_ra_lock will be used again.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 net/ipv4/ip_sockglue.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 74c962b9b09c..be7c3b71914d 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -334,6 +334,7 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
    sent to multicast group to reach destination designated router.
  */
 struct ip_ra_chain __rcu *ip_ra_chain;
+static DEFINE_SPINLOCK(ip_ra_lock);
 
 
 static void ip_ra_destroy_rcu(struct rcu_head *head)
@@ -355,17 +356,21 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 
 	new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
+	spin_lock_bh(&ip_ra_lock);
 	for (rap = &ip_ra_chain;
-	     (ra = rtnl_dereference(*rap)) != NULL;
+	     (ra = rcu_dereference_protected(*rap,
+			lockdep_is_held(&ip_ra_lock))) != NULL;
 	     rap = &ra->next) {
 		if (ra->sk == sk) {
 			if (on) {
+				spin_unlock_bh(&ip_ra_lock);
 				kfree(new_ra);
 				return -EADDRINUSE;
 			}
 			/* dont let ip_call_ra_chain() use sk again */
 			ra->sk = NULL;
 			RCU_INIT_POINTER(*rap, ra->next);
+			spin_unlock_bh(&ip_ra_lock);
 
 			if (ra->destructor)
 				ra->destructor(sk);
@@ -379,14 +384,17 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 			return 0;
 		}
 	}
-	if (!new_ra)
+	if (!new_ra) {
+		spin_unlock_bh(&ip_ra_lock);
 		return -ENOBUFS;
+	}
 	new_ra->sk = sk;
 	new_ra->destructor = destructor;
 
 	RCU_INIT_POINTER(new_ra->next, ra);
 	rcu_assign_pointer(*rap, new_ra);
 	sock_hold(sk);
+	spin_unlock_bh(&ip_ra_lock);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
  2018-03-19  9:14 [PATCH net-next v2 0/5] Rework ip_ra_chain protection Kirill Tkhai
  2018-03-19  9:14 ` [PATCH net-next v2 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
@ 2018-03-19  9:14 ` Kirill Tkhai
  2018-03-20 16:23   ` David Miller
  2018-03-19  9:15 ` [PATCH net-next v2 3/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk) Kirill Tkhai
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-19  9:14 UTC (permalink / raw)
  To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai

This reverts commit 1215e51edad1.
Since raw_close() is used on every RAW socket destruction,
the changes made by 1215e51edad1 scale sadly. This clearly
seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
kwork spends a lot of time waiting for rtnl_lock() introduced
by this commit.

Next patches in series will rework this in another way,
so now we revert 1215e51edad1. Also, it doesn't seen
mrtsock_destruct() takes sk_lock, and the comment to the commit
does not show the actual stack dump. So, there is a question
did we really need in it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 net/ipv4/ip_sockglue.c |    1 -
 net/ipv4/ipmr.c        |   11 +++++++++--
 net/ipv4/raw.c         |    2 --
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index be7c3b71914d..b7bac7351409 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -594,7 +594,6 @@ static bool setsockopt_needs_rtnl(int optname)
 	case MCAST_LEAVE_GROUP:
 	case MCAST_LEAVE_SOURCE_GROUP:
 	case MCAST_UNBLOCK_SOURCE:
-	case IP_ROUTER_ALERT:
 		return true;
 	}
 	return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index d752a70855d8..f6be5db16da2 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1399,7 +1399,7 @@ static void mrtsock_destruct(struct sock *sk)
 	struct net *net = sock_net(sk);
 	struct mr_table *mrt;
 
-	ASSERT_RTNL();
+	rtnl_lock();
 	ipmr_for_each_table(mrt, net) {
 		if (sk == rtnl_dereference(mrt->mroute_sk)) {
 			IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
@@ -1411,6 +1411,7 @@ static void mrtsock_destruct(struct sock *sk)
 			mroute_clean_tables(mrt, false);
 		}
 	}
+	rtnl_unlock();
 }
 
 /* Socket options and virtual interface manipulation. The whole
@@ -1475,8 +1476,13 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 		if (sk != rcu_access_pointer(mrt->mroute_sk)) {
 			ret = -EACCES;
 		} else {
+			/* We need to unlock here because mrtsock_destruct takes
+			 * care of rtnl itself and we can't change that due to
+			 * the IP_ROUTER_ALERT setsockopt which runs without it.
+			 */
+			rtnl_unlock();
 			ret = ip_ra_control(sk, 0, NULL);
-			goto out_unlock;
+			goto out;
 		}
 		break;
 	case MRT_ADD_VIF:
@@ -1588,6 +1594,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 	}
 out_unlock:
 	rtnl_unlock();
+out:
 	return ret;
 }
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 54648d20bf0f..720bef7da2f6 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -711,9 +711,7 @@ static void raw_close(struct sock *sk, long timeout)
 	/*
 	 * Raw sockets may have direct kernel references. Kill them.
 	 */
-	rtnl_lock();
 	ip_ra_control(sk, 0, NULL);
-	rtnl_unlock();
 
 	sk_common_release(sk);
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v2 3/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk)
  2018-03-19  9:14 [PATCH net-next v2 0/5] Rework ip_ra_chain protection Kirill Tkhai
  2018-03-19  9:14 ` [PATCH net-next v2 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
  2018-03-19  9:14 ` [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
@ 2018-03-19  9:15 ` Kirill Tkhai
  2018-03-19  9:15 ` [PATCH net-next v2 4/5] net: Make ip_ra_chain per struct net Kirill Tkhai
  2018-03-19  9:15 ` [PATCH net-next v2 5/5] net: Replace ip_ra_lock with per-net mutex Kirill Tkhai
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-19  9:15 UTC (permalink / raw)
  To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai

ip_ra_control() does not need sk_lock. Who are the another
users of ip_ra_chain? ip_mroute_setsockopt() doesn't take
sk_lock, while parallel IP_ROUTER_ALERT syscalls are
synchronized by ip_ra_lock. So, we may move this command
out of sk_lock.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 net/ipv4/ip_sockglue.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b7bac7351409..bf5f44b27b7e 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -646,6 +646,8 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 
 	/* If optlen==0, it is equivalent to val == 0 */
 
+	if (optname == IP_ROUTER_ALERT)
+		return ip_ra_control(sk, val ? 1 : 0, NULL);
 	if (ip_mroute_opt(optname))
 		return ip_mroute_setsockopt(sk, optname, optval, optlen);
 
@@ -1156,9 +1158,6 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 			goto e_inval;
 		inet->mc_all = val;
 		break;
-	case IP_ROUTER_ALERT:
-		err = ip_ra_control(sk, val ? 1 : 0, NULL);
-		break;
 
 	case IP_FREEBIND:
 		if (optlen < 1)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v2 4/5] net: Make ip_ra_chain per struct net
  2018-03-19  9:14 [PATCH net-next v2 0/5] Rework ip_ra_chain protection Kirill Tkhai
                   ` (2 preceding siblings ...)
  2018-03-19  9:15 ` [PATCH net-next v2 3/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk) Kirill Tkhai
@ 2018-03-19  9:15 ` Kirill Tkhai
  2018-03-19  9:15 ` [PATCH net-next v2 5/5] net: Replace ip_ra_lock with per-net mutex Kirill Tkhai
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-19  9:15 UTC (permalink / raw)
  To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai

This is optimization, which makes ip_call_ra_chain()
iterate less sockets to find the sockets it's looking for.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 include/net/ip.h         |   13 +++++++++++--
 include/net/netns/ipv4.h |    1 +
 net/ipv4/ip_input.c      |    5 ++---
 net/ipv4/ip_sockglue.c   |   15 ++-------------
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index fe63ba95d12b..d53b5a9eae34 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -91,6 +91,17 @@ static inline int inet_sdif(struct sk_buff *skb)
 	return 0;
 }
 
+/* Special input handler for packets caught by router alert option.
+   They are selected only by protocol field, and then processed likely
+   local ones; but only if someone wants them! Otherwise, router
+   not running rsvpd will kill RSVP.
+
+   It is user level problem, what it will make with them.
+   I have no idea, how it will masquearde or NAT them (it is joke, joke :-)),
+   but receiver should be enough clever f.e. to forward mtrace requests,
+   sent to multicast group to reach destination designated router.
+ */
+
 struct ip_ra_chain {
 	struct ip_ra_chain __rcu *next;
 	struct sock		*sk;
@@ -101,8 +112,6 @@ struct ip_ra_chain {
 	struct rcu_head		rcu;
 };
 
-extern struct ip_ra_chain __rcu *ip_ra_chain;
-
 /* IP flags. */
 #define IP_CE		0x8000		/* Flag: "Congestion"		*/
 #define IP_DF		0x4000		/* Flag: "Don't Fragment"	*/
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 382bfd7583cf..97d7ee6667c7 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -49,6 +49,7 @@ struct netns_ipv4 {
 #endif
 	struct ipv4_devconf	*devconf_all;
 	struct ipv4_devconf	*devconf_dflt;
+	struct ip_ra_chain __rcu *ra_chain;
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_rules_ops	*rules_ops;
 	bool			fib_has_custom_rules;
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 57fc13c6ab2b..7582713dd18f 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -159,7 +159,7 @@ bool ip_call_ra_chain(struct sk_buff *skb)
 	struct net_device *dev = skb->dev;
 	struct net *net = dev_net(dev);
 
-	for (ra = rcu_dereference(ip_ra_chain); ra; ra = rcu_dereference(ra->next)) {
+	for (ra = rcu_dereference(net->ipv4.ra_chain); ra; ra = rcu_dereference(ra->next)) {
 		struct sock *sk = ra->sk;
 
 		/* If socket is bound to an interface, only report
@@ -167,8 +167,7 @@ bool ip_call_ra_chain(struct sk_buff *skb)
 		 */
 		if (sk && inet_sk(sk)->inet_num == protocol &&
 		    (!sk->sk_bound_dev_if ||
-		     sk->sk_bound_dev_if == dev->ifindex) &&
-		    net_eq(sock_net(sk), net)) {
+		     sk->sk_bound_dev_if == dev->ifindex)) {
 			if (ip_is_fragment(ip_hdr(skb))) {
 				if (ip_defrag(net, skb, IP_DEFRAG_CALL_RA_CHAIN))
 					return true;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index bf5f44b27b7e..f36d35fe924b 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -322,18 +322,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
 	return 0;
 }
 
-
-/* Special input handler for packets caught by router alert option.
-   They are selected only by protocol field, and then processed likely
-   local ones; but only if someone wants them! Otherwise, router
-   not running rsvpd will kill RSVP.
-
-   It is user level problem, what it will make with them.
-   I have no idea, how it will masquearde or NAT them (it is joke, joke :-)),
-   but receiver should be enough clever f.e. to forward mtrace requests,
-   sent to multicast group to reach destination designated router.
- */
-struct ip_ra_chain __rcu *ip_ra_chain;
 static DEFINE_SPINLOCK(ip_ra_lock);
 
 
@@ -350,6 +338,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 {
 	struct ip_ra_chain *ra, *new_ra;
 	struct ip_ra_chain __rcu **rap;
+	struct net *net = sock_net(sk);
 
 	if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num == IPPROTO_RAW)
 		return -EINVAL;
@@ -357,7 +346,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 	new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
 	spin_lock_bh(&ip_ra_lock);
-	for (rap = &ip_ra_chain;
+	for (rap = &net->ipv4.ra_chain;
 	     (ra = rcu_dereference_protected(*rap,
 			lockdep_is_held(&ip_ra_lock))) != NULL;
 	     rap = &ra->next) {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v2 5/5] net: Replace ip_ra_lock with per-net mutex
  2018-03-19  9:14 [PATCH net-next v2 0/5] Rework ip_ra_chain protection Kirill Tkhai
                   ` (3 preceding siblings ...)
  2018-03-19  9:15 ` [PATCH net-next v2 4/5] net: Make ip_ra_chain per struct net Kirill Tkhai
@ 2018-03-19  9:15 ` Kirill Tkhai
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-19  9:15 UTC (permalink / raw)
  To: davem, yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp, ktkhai

Since ra_chain is per-net, we may use per-net mutexes
to protect them in ip_ra_control(). This improves
scalability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 include/net/netns/ipv4.h |    1 +
 net/core/net_namespace.c |    1 +
 net/ipv4/ip_sockglue.c   |   15 ++++++---------
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 97d7ee6667c7..8491bc9c86b1 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -50,6 +50,7 @@ struct netns_ipv4 {
 	struct ipv4_devconf	*devconf_all;
 	struct ipv4_devconf	*devconf_dflt;
 	struct ip_ra_chain __rcu *ra_chain;
+	struct mutex		ra_mutex;
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_rules_ops	*rules_ops;
 	bool			fib_has_custom_rules;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index c340d5cfbdec..95ba2c53bd9a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -301,6 +301,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	net->user_ns = user_ns;
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
+	mutex_init(&net->ipv4.ra_mutex);
 
 	list_for_each_entry(ops, &pernet_list, list) {
 		error = ops_init(ops, net);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index f36d35fe924b..5ad2d8ed3a3f 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -322,9 +322,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
 	return 0;
 }
 
-static DEFINE_SPINLOCK(ip_ra_lock);
-
-
 static void ip_ra_destroy_rcu(struct rcu_head *head)
 {
 	struct ip_ra_chain *ra = container_of(head, struct ip_ra_chain, rcu);
@@ -345,21 +342,21 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 
 	new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
-	spin_lock_bh(&ip_ra_lock);
+	mutex_lock(&net->ipv4.ra_mutex);
 	for (rap = &net->ipv4.ra_chain;
 	     (ra = rcu_dereference_protected(*rap,
-			lockdep_is_held(&ip_ra_lock))) != NULL;
+			lockdep_is_held(&net->ipv4.ra_mutex))) != NULL;
 	     rap = &ra->next) {
 		if (ra->sk == sk) {
 			if (on) {
-				spin_unlock_bh(&ip_ra_lock);
+				mutex_unlock(&net->ipv4.ra_mutex);
 				kfree(new_ra);
 				return -EADDRINUSE;
 			}
 			/* dont let ip_call_ra_chain() use sk again */
 			ra->sk = NULL;
 			RCU_INIT_POINTER(*rap, ra->next);
-			spin_unlock_bh(&ip_ra_lock);
+			mutex_unlock(&net->ipv4.ra_mutex);
 
 			if (ra->destructor)
 				ra->destructor(sk);
@@ -374,7 +371,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 		}
 	}
 	if (!new_ra) {
-		spin_unlock_bh(&ip_ra_lock);
+		mutex_unlock(&net->ipv4.ra_mutex);
 		return -ENOBUFS;
 	}
 	new_ra->sk = sk;
@@ -383,7 +380,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 	RCU_INIT_POINTER(new_ra->next, ra);
 	rcu_assign_pointer(*rap, new_ra);
 	sock_hold(sk);
-	spin_unlock_bh(&ip_ra_lock);
+	mutex_unlock(&net->ipv4.ra_mutex);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
  2018-03-19  9:14 ` [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
@ 2018-03-20 16:23   ` David Miller
  2018-03-20 19:25     ` Kirill Tkhai
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2018-03-20 16:23 UTC (permalink / raw)
  To: ktkhai
  Cc: yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp

From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Mon, 19 Mar 2018 12:14:54 +0300

> This reverts commit 1215e51edad1.
> Since raw_close() is used on every RAW socket destruction,
> the changes made by 1215e51edad1 scale sadly. This clearly
> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
> kwork spends a lot of time waiting for rtnl_lock() introduced
> by this commit.
> 
> Next patches in series will rework this in another way,
> so now we revert 1215e51edad1. Also, it doesn't seen
> mrtsock_destruct() takes sk_lock, and the comment to the commit
> does not show the actual stack dump. So, there is a question
> did we really need in it.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Kirill, I think the commit you are reverting is legitimate.

The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
you revert this, so you are reintroducing a bug.

All code paths that must take both RTNL and the socket lock must
do them in the same order.  And that order is RTNL then socket
lock.

But you are breaking that here by getting us back into a state
where IP_RAW_CONTROL setsockopt will take the socket lock and
then RTNL.

Again, we can't take, or retake, RTNL if we have the socket lock
currently.

The only valid locking order is socket lock then RTNL.

Thank you.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
  2018-03-20 16:23   ` David Miller
@ 2018-03-20 19:25     ` Kirill Tkhai
  2018-03-20 21:50       ` Kirill Tkhai
  2018-03-22 18:41       ` David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-20 19:25 UTC (permalink / raw)
  To: David Miller
  Cc: yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp

Hi, David,

thanks for the review!

On 20.03.2018 19:23, David Miller wrote:
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> Date: Mon, 19 Mar 2018 12:14:54 +0300
> 
>> This reverts commit 1215e51edad1.
>> Since raw_close() is used on every RAW socket destruction,
>> the changes made by 1215e51edad1 scale sadly. This clearly
>> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
>> kwork spends a lot of time waiting for rtnl_lock() introduced
>> by this commit.
>>
>> Next patches in series will rework this in another way,
>> so now we revert 1215e51edad1. Also, it doesn't seen
>> mrtsock_destruct() takes sk_lock, and the comment to the commit
>> does not show the actual stack dump. So, there is a question
>> did we really need in it.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> Kirill, I think the commit you are reverting is legitimate.
> 
> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
> you revert this, so you are reintroducing a bug.

The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.
 
> All code paths that must take both RTNL and the socket lock must
> do them in the same order.  And that order is RTNL then socket
> lock.

The place I change in this patch is IP_ROUTER_ALERT. There is only
a call of ip_ra_control(), while this function does not need socket
lock. Please, see next patch. It moves this ip_ra_control() out
of socket lock. And it fixes the problem pointed in reverted patch
in another way. So, if there is ABBA, after next patch it becomes
solved. Does this mean I have to merge [2/5] and [3/5] together?

> But you are breaking that here by getting us back into a state
> where IP_RAW_CONTROL setsockopt will take the socket lock and
> then RTNL.
> 
> Again, we can't take, or retake, RTNL if we have the socket lock
> currently.
> 
> The only valid locking order is socket lock then RTNL.

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
  2018-03-20 19:25     ` Kirill Tkhai
@ 2018-03-20 21:50       ` Kirill Tkhai
  2018-03-22 18:41       ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2018-03-20 21:50 UTC (permalink / raw)
  To: David Miller
  Cc: yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp

On 20.03.2018 22:25, Kirill Tkhai wrote:
> Hi, David,
> 
> thanks for the review!
> 
> On 20.03.2018 19:23, David Miller wrote:
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Date: Mon, 19 Mar 2018 12:14:54 +0300
>>
>>> This reverts commit 1215e51edad1.
>>> Since raw_close() is used on every RAW socket destruction,
>>> the changes made by 1215e51edad1 scale sadly. This clearly
>>> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
>>> kwork spends a lot of time waiting for rtnl_lock() introduced
>>> by this commit.
>>>
>>> Next patches in series will rework this in another way,
>>> so now we revert 1215e51edad1. Also, it doesn't seen
>>> mrtsock_destruct() takes sk_lock, and the comment to the commit
>>> does not show the actual stack dump. So, there is a question
>>> did we really need in it.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> Kirill, I think the commit you are reverting is legitimate.
>>
>> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
>> you revert this, so you are reintroducing a bug.
> 
> The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.
>  
>> All code paths that must take both RTNL and the socket lock must
>> do them in the same order.  And that order is RTNL then socket
>> lock.
> 
> The place I change in this patch is IP_ROUTER_ALERT. There is only
> a call of ip_ra_control(), while this function does not need socket
> lock. Please, see next patch. It moves this ip_ra_control() out
> of socket lock. And it fixes the problem pointed in reverted patch
> in another way. So, if there is ABBA, after next patch it becomes
> solved. Does this mean I have to merge [2/5] and [3/5] together?

We also can just change the order of patches, and make [3/5] go before [2/5].
Then, the kernel still remains bisectable. How do you think about this?

Thanks,
Kirill

>> But you are breaking that here by getting us back into a state
>> where IP_RAW_CONTROL setsockopt will take the socket lock and
>> then RTNL.
>>
>> Again, we can't take, or retake, RTNL if we have the socket lock
>> currently.
>>
>> The only valid locking order is socket lock then RTNL.
> 
> Thanks,
> Kirill
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"
  2018-03-20 19:25     ` Kirill Tkhai
  2018-03-20 21:50       ` Kirill Tkhai
@ 2018-03-22 18:41       ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2018-03-22 18:41 UTC (permalink / raw)
  To: ktkhai
  Cc: yoshfuji, edumazet, yanhaishuang, nikolay, yotamg, soheil,
	avagin, nicolas.dichtel, ebiederm, fw, roman.kapl, netdev,
	xiyou.wangcong, dvyukov, andreyknvl, lkp

From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Tue, 20 Mar 2018 22:25:35 +0300

> On 20.03.2018 19:23, David Miller wrote:
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Date: Mon, 19 Mar 2018 12:14:54 +0300
>> 
>>> This reverts commit 1215e51edad1.
>>> Since raw_close() is used on every RAW socket destruction,
>>> the changes made by 1215e51edad1 scale sadly. This clearly
>>> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
>>> kwork spends a lot of time waiting for rtnl_lock() introduced
>>> by this commit.
>>>
>>> Next patches in series will rework this in another way,
>>> so now we revert 1215e51edad1. Also, it doesn't seen
>>> mrtsock_destruct() takes sk_lock, and the comment to the commit
>>> does not show the actual stack dump. So, there is a question
>>> did we really need in it.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> 
>> Kirill, I think the commit you are reverting is legitimate.
>> 
>> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
>> you revert this, so you are reintroducing a bug.
> 
> The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.

My bad, I did indeed mean IP_ROUTER_ALERT.

>> All code paths that must take both RTNL and the socket lock must
>> do them in the same order.  And that order is RTNL then socket
>> lock.
> 
> The place I change in this patch is IP_ROUTER_ALERT. There is only
> a call of ip_ra_control(), while this function does not need socket
> lock. Please, see next patch. It moves this ip_ra_control() out
> of socket lock. And it fixes the problem pointed in reverted patch
> in another way. So, if there is ABBA, after next patch it becomes
> solved. Does this mean I have to merge [2/5] and [3/5] together?

Yes, that is what should happen, because the revert by itself
reintroduces the potential ABBA deadlock between the socket lock
and the RTNL mutex.

I'll take a look at the new version of your series.

Thank you.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-03-22 18:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-19  9:14 [PATCH net-next v2 0/5] Rework ip_ra_chain protection Kirill Tkhai
2018-03-19  9:14 ` [PATCH net-next v2 1/5] net: Revert "ipv4: get rid of ip_ra_lock" Kirill Tkhai
2018-03-19  9:14 ` [PATCH net-next v2 2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control" Kirill Tkhai
2018-03-20 16:23   ` David Miller
2018-03-20 19:25     ` Kirill Tkhai
2018-03-20 21:50       ` Kirill Tkhai
2018-03-22 18:41       ` David Miller
2018-03-19  9:15 ` [PATCH net-next v2 3/5] net: Move IP_ROUTER_ALERT out of lock_sock(sk) Kirill Tkhai
2018-03-19  9:15 ` [PATCH net-next v2 4/5] net: Make ip_ra_chain per struct net Kirill Tkhai
2018-03-19  9:15 ` [PATCH net-next v2 5/5] net: Replace ip_ra_lock with per-net mutex Kirill Tkhai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.