All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks
@ 2022-06-23  4:34 Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 01/19] ip6mr: do not get a device reference in pim6_rcv() Eric Dumazet
                   ` (19 more replies)
  0 siblings, 20 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

We need to get rid of rwlocks in networking stacks,
if read_lock() is (ab)used from softirq context.

As discussed recently [1], rwlock are unfair by design in this case,
and writers can starve and trigger soft lockups.

This series convert ipmr code (both IPv4 and IPv6 families)
to RCU and spinlocks.

[1] https://lkml.org/lkml/2022/6/17/272

v2: fixed two typos, and resent because patch 19/19
    did not make it to patchwork.

Eric Dumazet (19):
  ip6mr: do not get a device reference in pim6_rcv()
  ipmr: add rcu protection over (struct vif_device)->dev
  ipmr: change igmpmsg_netlink_event() prototype
  ipmr: ipmr_cache_report() changes
  ipmr: do not acquire mrt_lock in __pim_rcv()
  ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT)
  ipmr: do not acquire mrt_lock before calling ipmr_cache_unresolved()
  ipmr: do not acquire mrt_lock while calling ip_mr_forward()
  ipmr: do not acquire mrt_lock in ipmr_get_route()
  ip6mr: ip6mr_cache_report() changes
  ip6mr: do not acquire mrt_lock in pim6_rcv()
  ip6mr: do not acquire mrt_lock in ioctl(SIOCGETMIFCNT_IN6)
  ip6mr: do not acquire mrt_lock before calling ip6mr_cache_unresolved
  ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()
  ip6mr: switch ip6mr_get_route() to rcu_read_lock()
  ipmr: adopt rcu_read_lock() in mr_dump()
  ipmr: convert /proc handlers to rcu_read_lock()
  ipmr: convert mrt_lock to a spinlock
  ip6mr: convert mrt_lock to a spinlock

 include/linux/mroute_base.h |  15 ++-
 net/ipv4/ipmr.c             | 215 +++++++++++++++++++-----------------
 net/ipv4/ipmr_base.c        |  53 +++++----
 net/ipv6/ip6mr.c            | 202 ++++++++++++++++-----------------
 4 files changed, 255 insertions(+), 230 deletions(-)

-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 01/19] ip6mr: do not get a device reference in pim6_rcv()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 02/19] ipmr: add rcu protection over (struct vif_device)->dev Eric Dumazet
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

pim6_rcv() is called under rcu_read_lock(), there is
no need to use dev_hold()/dev_put() pair.

IPv4 side was handled in commit 55747a0a73ea
("ipmr: __pim_rcv() is called under rcu_read_lock")

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index d4aad41c9849f977e395e9e07a4442dbfec07b1b..aa66c032ba979e7a14e5e296b68c55bc73d98398 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -554,7 +554,6 @@ static int pim6_rcv(struct sk_buff *skb)
 	read_lock(&mrt_lock);
 	if (reg_vif_num >= 0)
 		reg_dev = mrt->vif_table[reg_vif_num].dev;
-	dev_hold(reg_dev);
 	read_unlock(&mrt_lock);
 
 	if (!reg_dev)
@@ -570,7 +569,6 @@ static int pim6_rcv(struct sk_buff *skb)
 
 	netif_rx(skb);
 
-	dev_put(reg_dev);
 	return 0;
  drop:
 	kfree_skb(skb);
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 02/19] ipmr: add rcu protection over (struct vif_device)->dev
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 01/19] ip6mr: do not get a device reference in pim6_rcv() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 03/19] ipmr: change igmpmsg_netlink_event() prototype Eric Dumazet
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

We will soon use RCU instead of rwlock in ipmr & ip6mr

This preliminary patch adds proper rcu verbs to read/write
(struct vif_device)->dev

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/mroute_base.h | 11 ++++---
 net/ipv4/ipmr.c             | 65 ++++++++++++++++++++++---------------
 net/ipv4/ipmr_base.c        | 49 ++++++++++++++++++----------
 net/ipv6/ip6mr.c            | 63 +++++++++++++++++++----------------
 4 files changed, 111 insertions(+), 77 deletions(-)

diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index e05ee9f001ffbf30f7b26ab5555e2db5cc058560..10d1e4fb4e9fe387d914c83d135ed6a8f284c374 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -26,7 +26,7 @@
  * @remote: Remote address for tunnels
  */
 struct vif_device {
-	struct net_device *dev;
+	struct net_device __rcu *dev;
 	netdevice_tracker dev_tracker;
 	unsigned long bytes_in, bytes_out;
 	unsigned long pkt_in, pkt_out;
@@ -52,6 +52,7 @@ static inline int mr_call_vif_notifier(struct notifier_block *nb,
 				       unsigned short family,
 				       enum fib_event_type event_type,
 				       struct vif_device *vif,
+				       struct net_device *vif_dev,
 				       unsigned short vif_index, u32 tb_id,
 				       struct netlink_ext_ack *extack)
 {
@@ -60,7 +61,7 @@ static inline int mr_call_vif_notifier(struct notifier_block *nb,
 			.family = family,
 			.extack = extack,
 		},
-		.dev = vif->dev,
+		.dev = vif_dev,
 		.vif_index = vif_index,
 		.vif_flags = vif->flags,
 		.tb_id = tb_id,
@@ -73,6 +74,7 @@ static inline int mr_call_vif_notifiers(struct net *net,
 					unsigned short family,
 					enum fib_event_type event_type,
 					struct vif_device *vif,
+					struct net_device *vif_dev,
 					unsigned short vif_index, u32 tb_id,
 					unsigned int *ipmr_seq)
 {
@@ -80,7 +82,7 @@ static inline int mr_call_vif_notifiers(struct net *net,
 		.info = {
 			.family = family,
 		},
-		.dev = vif->dev,
+		.dev = vif_dev,
 		.vif_index = vif_index,
 		.vif_flags = vif->flags,
 		.tb_id = tb_id,
@@ -98,7 +100,8 @@ static inline int mr_call_vif_notifiers(struct net *net,
 #define MAXVIFS	32
 #endif
 
-#define VIF_EXISTS(_mrt, _idx) (!!((_mrt)->vif_table[_idx].dev))
+/* Note: This helper is deprecated. */
+#define VIF_EXISTS(_mrt, _idx) (!!rcu_access_pointer((_mrt)->vif_table[_idx].dev))
 
 /* mfc_flags:
  * MFC_STATIC - the entry was added statically (not by a routing daemon)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8324e541d193b0024a82d83149a0ee3b5967f2a9..10371a9e78fc41e8562fa8d81222a8ae24e2fbb6 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -79,6 +79,12 @@ struct ipmr_result {
 
 static DEFINE_RWLOCK(mrt_lock);
 
+static struct net_device *vif_dev_read(const struct vif_device *vif)
+{
+	return rcu_dereference_check(vif->dev,
+				     lockdep_is_held(&mrt_lock));
+}
+
 /* Multicast router control variables */
 
 /* Special spinlock for queue of unresolved entries */
@@ -586,7 +592,7 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
 
 	read_lock(&mrt_lock);
 	if (mrt->mroute_reg_vif_num >= 0)
-		reg_dev = mrt->vif_table[mrt->mroute_reg_vif_num].dev;
+		reg_dev = vif_dev_read(&mrt->vif_table[mrt->mroute_reg_vif_num]);
 	read_unlock(&mrt_lock);
 
 	if (!reg_dev)
@@ -614,10 +620,11 @@ static struct net_device *ipmr_reg_vif(struct net *net, struct mr_table *mrt)
 static int call_ipmr_vif_entry_notifiers(struct net *net,
 					 enum fib_event_type event_type,
 					 struct vif_device *vif,
+					 struct net_device *vif_dev,
 					 vifi_t vif_index, u32 tb_id)
 {
 	return mr_call_vif_notifiers(net, RTNL_FAMILY_IPMR, event_type,
-				     vif, vif_index, tb_id,
+				     vif, vif_dev, vif_index, tb_id,
 				     &net->ipv4.ipmr_seq);
 }
 
@@ -649,18 +656,14 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
 
 	v = &mrt->vif_table[vifi];
 
-	if (VIF_EXISTS(mrt, vifi))
-		call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_DEL, v, vifi,
-					      mrt->id);
+	dev = rtnl_dereference(v->dev);
+	if (!dev)
+		return -EADDRNOTAVAIL;
 
 	write_lock_bh(&mrt_lock);
-	dev = v->dev;
-	v->dev = NULL;
-
-	if (!dev) {
-		write_unlock_bh(&mrt_lock);
-		return -EADDRNOTAVAIL;
-	}
+	call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_DEL, v, dev,
+				      vifi, mrt->id);
+	RCU_INIT_POINTER(v->dev, NULL);
 
 	if (vifi == mrt->mroute_reg_vif_num)
 		mrt->mroute_reg_vif_num = -1;
@@ -890,14 +893,15 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 
 	/* And finish update writing critical data */
 	write_lock_bh(&mrt_lock);
-	v->dev = dev;
+	rcu_assign_pointer(v->dev, dev);
 	netdev_tracker_alloc(dev, &v->dev_tracker, GFP_ATOMIC);
 	if (v->flags & VIFF_REGISTER)
 		mrt->mroute_reg_vif_num = vifi;
 	if (vifi+1 > mrt->maxvif)
 		mrt->maxvif = vifi+1;
 	write_unlock_bh(&mrt_lock);
-	call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD, v, vifi, mrt->id);
+	call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD, v, dev,
+				      vifi, mrt->id);
 	return 0;
 }
 
@@ -1726,7 +1730,7 @@ static int ipmr_device_event(struct notifier_block *this, unsigned long event, v
 	ipmr_for_each_table(mrt, net) {
 		v = &mrt->vif_table[0];
 		for (ct = 0; ct < mrt->maxvif; ct++, v++) {
-			if (v->dev == dev)
+			if (rcu_access_pointer(v->dev) == dev)
 				vif_delete(mrt, ct, 1, NULL);
 		}
 	}
@@ -1811,19 +1815,21 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	struct vif_device *vif = &mrt->vif_table[vifi];
+	struct net_device *vif_dev;
 	struct net_device *dev;
 	struct rtable *rt;
 	struct flowi4 fl4;
 	int    encap = 0;
 
-	if (!vif->dev)
+	vif_dev = vif_dev_read(vif);
+	if (!vif_dev)
 		goto out_free;
 
 	if (vif->flags & VIFF_REGISTER) {
 		vif->pkt_out++;
 		vif->bytes_out += skb->len;
-		vif->dev->stats.tx_bytes += skb->len;
-		vif->dev->stats.tx_packets++;
+		vif_dev->stats.tx_bytes += skb->len;
+		vif_dev->stats.tx_packets++;
 		ipmr_cache_report(mrt, skb, vifi, IGMPMSG_WHOLEPKT);
 		goto out_free;
 	}
@@ -1881,8 +1887,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 	if (vif->flags & VIFF_TUNNEL) {
 		ip_encap(net, skb, vif->local, vif->remote);
 		/* FIXME: extra output firewall step used to be here. --RR */
-		vif->dev->stats.tx_packets++;
-		vif->dev->stats.tx_bytes += skb->len;
+		vif_dev->stats.tx_packets++;
+		vif_dev->stats.tx_bytes += skb->len;
 	}
 
 	IPCB(skb)->flags |= IPSKB_FORWARDED;
@@ -1911,7 +1917,7 @@ static int ipmr_find_vif(struct mr_table *mrt, struct net_device *dev)
 	int ct;
 
 	for (ct = mrt->maxvif-1; ct >= 0; ct--) {
-		if (mrt->vif_table[ct].dev == dev)
+		if (rcu_access_pointer(mrt->vif_table[ct].dev) == dev)
 			break;
 	}
 	return ct;
@@ -1944,7 +1950,7 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 	}
 
 	/* Wrong interface: drop packet and (maybe) send PIM assert. */
-	if (mrt->vif_table[vif].dev != dev) {
+	if (rcu_access_pointer(mrt->vif_table[vif].dev) != dev) {
 		if (rt_is_output_route(skb_rtable(skb))) {
 			/* It is our own packet, looped back.
 			 * Very complicated situation...
@@ -2744,18 +2750,21 @@ static bool ipmr_fill_table(struct mr_table *mrt, struct sk_buff *skb)
 
 static bool ipmr_fill_vif(struct mr_table *mrt, u32 vifid, struct sk_buff *skb)
 {
+	struct net_device *vif_dev;
 	struct nlattr *vif_nest;
 	struct vif_device *vif;
 
+	vif = &mrt->vif_table[vifid];
+	vif_dev = vif_dev_read(vif);
 	/* if the VIF doesn't exist just continue */
-	if (!VIF_EXISTS(mrt, vifid))
+	if (!vif_dev)
 		return true;
 
-	vif = &mrt->vif_table[vifid];
 	vif_nest = nla_nest_start_noflag(skb, IPMRA_VIF);
 	if (!vif_nest)
 		return false;
-	if (nla_put_u32(skb, IPMRA_VIFA_IFINDEX, vif->dev->ifindex) ||
+
+	if (nla_put_u32(skb, IPMRA_VIFA_IFINDEX, vif_dev->ifindex) ||
 	    nla_put_u32(skb, IPMRA_VIFA_VIF_ID, vifid) ||
 	    nla_put_u16(skb, IPMRA_VIFA_FLAGS, vif->flags) ||
 	    nla_put_u64_64bit(skb, IPMRA_VIFA_BYTES_IN, vif->bytes_in,
@@ -2919,9 +2928,11 @@ static int ipmr_vif_seq_show(struct seq_file *seq, void *v)
 			 "Interface      BytesIn  PktsIn  BytesOut PktsOut Flags Local    Remote\n");
 	} else {
 		const struct vif_device *vif = v;
-		const char *name =  vif->dev ?
-				    vif->dev->name : "none";
+		const struct net_device *vif_dev;
+		const char *name;
 
+		vif_dev = vif_dev_read(vif);
+		name = vif_dev ? vif_dev->name : "none";
 		seq_printf(seq,
 			   "%2td %-10s %8ld %7ld  %8ld %7ld %05X %08X %08X\n",
 			   vif - mrt->vif_table,
diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index aa8738a91210a563a2bd7ee1fe17f3bcde1936de..59f62b938472aef79b4eb3ade706bf4d111e1e3a 100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -13,7 +13,7 @@ void vif_device_init(struct vif_device *v,
 		     unsigned short flags,
 		     unsigned short get_iflink_mask)
 {
-	v->dev = NULL;
+	RCU_INIT_POINTER(v->dev, NULL);
 	v->bytes_in = 0;
 	v->bytes_out = 0;
 	v->pkt_in = 0;
@@ -208,6 +208,7 @@ EXPORT_SYMBOL(mr_mfc_seq_next);
 int mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 		   struct mr_mfc *c, struct rtmsg *rtm)
 {
+	struct net_device *vif_dev;
 	struct rta_mfc_stats mfcs;
 	struct nlattr *mp_attr;
 	struct rtnexthop *nhp;
@@ -220,10 +221,13 @@ int mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 		return -ENOENT;
 	}
 
-	if (VIF_EXISTS(mrt, c->mfc_parent) &&
-	    nla_put_u32(skb, RTA_IIF,
-			mrt->vif_table[c->mfc_parent].dev->ifindex) < 0)
+	rcu_read_lock();
+	vif_dev = rcu_dereference(mrt->vif_table[c->mfc_parent].dev);
+	if (vif_dev && nla_put_u32(skb, RTA_IIF, vif_dev->ifindex) < 0) {
+		rcu_read_unlock();
 		return -EMSGSIZE;
+	}
+	rcu_read_unlock();
 
 	if (c->mfc_flags & MFC_OFFLOAD)
 		rtm->rtm_flags |= RTNH_F_OFFLOAD;
@@ -232,23 +236,27 @@ int mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 	if (!mp_attr)
 		return -EMSGSIZE;
 
+	rcu_read_lock();
 	for (ct = c->mfc_un.res.minvif; ct < c->mfc_un.res.maxvif; ct++) {
-		if (VIF_EXISTS(mrt, ct) && c->mfc_un.res.ttls[ct] < 255) {
-			struct vif_device *vif;
+		struct vif_device *vif = &mrt->vif_table[ct];
+
+		vif_dev = rcu_dereference(vif->dev);
+		if (vif_dev && c->mfc_un.res.ttls[ct] < 255) {
 
 			nhp = nla_reserve_nohdr(skb, sizeof(*nhp));
 			if (!nhp) {
+				rcu_read_unlock();
 				nla_nest_cancel(skb, mp_attr);
 				return -EMSGSIZE;
 			}
 
 			nhp->rtnh_flags = 0;
 			nhp->rtnh_hops = c->mfc_un.res.ttls[ct];
-			vif = &mrt->vif_table[ct];
-			nhp->rtnh_ifindex = vif->dev->ifindex;
+			nhp->rtnh_ifindex = vif_dev->ifindex;
 			nhp->rtnh_len = sizeof(*nhp);
 		}
 	}
+	rcu_read_unlock();
 
 	nla_nest_end(skb, mp_attr);
 
@@ -275,13 +283,14 @@ static bool mr_mfc_uses_dev(const struct mr_table *mrt,
 	int ct;
 
 	for (ct = c->mfc_un.res.minvif; ct < c->mfc_un.res.maxvif; ct++) {
-		if (VIF_EXISTS(mrt, ct) && c->mfc_un.res.ttls[ct] < 255) {
-			const struct vif_device *vif;
-
-			vif = &mrt->vif_table[ct];
-			if (vif->dev == dev)
-				return true;
-		}
+		const struct net_device *vif_dev;
+		const struct vif_device *vif;
+
+		vif = &mrt->vif_table[ct];
+		vif_dev = rcu_access_pointer(vif->dev);
+		if (vif_dev && c->mfc_un.res.ttls[ct] < 255 &&
+		    vif_dev == dev)
+			return true;
 	}
 	return false;
 }
@@ -402,18 +411,22 @@ int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family,
 
 	for (mrt = mr_iter(net, NULL); mrt; mrt = mr_iter(net, mrt)) {
 		struct vif_device *v = &mrt->vif_table[0];
+		struct net_device *vif_dev;
 		struct mr_mfc *mfc;
 		int vifi;
 
 		/* Notifiy on table VIF entries */
 		read_lock(mrt_lock);
 		for (vifi = 0; vifi < mrt->maxvif; vifi++, v++) {
-			if (!v->dev)
+			vif_dev = rcu_dereference_check(v->dev,
+							lockdep_is_held(mrt_lock));
+			if (!vif_dev)
 				continue;
 
 			err = mr_call_vif_notifier(nb, family,
-						   FIB_EVENT_VIF_ADD,
-						   v, vifi, mrt->id, extack);
+						   FIB_EVENT_VIF_ADD, v,
+						   vif_dev, vifi,
+						   mrt->id, extack);
 			if (err)
 				break;
 		}
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index aa66c032ba979e7a14e5e296b68c55bc73d98398..44cb3d88bbd6f85ce0c8e9054dd3d578b7b3733b 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -64,6 +64,12 @@ struct ip6mr_result {
 
 static DEFINE_RWLOCK(mrt_lock);
 
+static struct net_device *vif_dev_read(const struct vif_device *vif)
+{
+	return rcu_dereference_check(vif->dev,
+				     lockdep_is_held(&mrt_lock));
+}
+
 /* Multicast router control variables */
 
 /* Special spinlock for queue of unresolved entries */
@@ -430,7 +436,11 @@ static int ip6mr_vif_seq_show(struct seq_file *seq, void *v)
 			 "Interface      BytesIn  PktsIn  BytesOut PktsOut Flags\n");
 	} else {
 		const struct vif_device *vif = v;
-		const char *name = vif->dev ? vif->dev->name : "none";
+		const struct net_device *vif_dev;
+		const char *name;
+
+		vif_dev = vif_dev_read(vif);
+		name = vif_dev ? vif_dev->name : "none";
 
 		seq_printf(seq,
 			   "%2td %-10s %8ld %7ld  %8ld %7ld %05X\n",
@@ -553,7 +563,7 @@ static int pim6_rcv(struct sk_buff *skb)
 
 	read_lock(&mrt_lock);
 	if (reg_vif_num >= 0)
-		reg_dev = mrt->vif_table[reg_vif_num].dev;
+		reg_dev = vif_dev_read(&mrt->vif_table[reg_vif_num]);
 	read_unlock(&mrt_lock);
 
 	if (!reg_dev)
@@ -668,10 +678,11 @@ static struct net_device *ip6mr_reg_vif(struct net *net, struct mr_table *mrt)
 static int call_ip6mr_vif_entry_notifiers(struct net *net,
 					  enum fib_event_type event_type,
 					  struct vif_device *vif,
+					  struct net_device *vif_dev,
 					  mifi_t vif_index, u32 tb_id)
 {
 	return mr_call_vif_notifiers(net, RTNL_FAMILY_IP6MR, event_type,
-				     vif, vif_index, tb_id,
+				     vif, vif_dev, vif_index, tb_id,
 				     &net->ipv6.ipmr_seq);
 }
 
@@ -696,19 +707,15 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
 
 	v = &mrt->vif_table[vifi];
 
-	if (VIF_EXISTS(mrt, vifi))
-		call_ip6mr_vif_entry_notifiers(read_pnet(&mrt->net),
-					       FIB_EVENT_VIF_DEL, v, vifi,
-					       mrt->id);
+	dev = rtnl_dereference(v->dev);
+	if (!dev)
+		return -EADDRNOTAVAIL;
 
+	call_ip6mr_vif_entry_notifiers(read_pnet(&mrt->net),
+				       FIB_EVENT_VIF_DEL, v, dev,
+				       vifi, mrt->id);
 	write_lock_bh(&mrt_lock);
-	dev = v->dev;
-	v->dev = NULL;
-
-	if (!dev) {
-		write_unlock_bh(&mrt_lock);
-		return -EADDRNOTAVAIL;
-	}
+	RCU_INIT_POINTER(v->dev, NULL);
 
 #ifdef CONFIG_IPV6_PIMSM_V2
 	if (vifi == mrt->mroute_reg_vif_num)
@@ -911,7 +918,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
 
 	/* And finish update writing critical data */
 	write_lock_bh(&mrt_lock);
-	v->dev = dev;
+	rcu_assign_pointer(v->dev, dev);
 	netdev_tracker_alloc(dev, &v->dev_tracker, GFP_ATOMIC);
 #ifdef CONFIG_IPV6_PIMSM_V2
 	if (v->flags & MIFF_REGISTER)
@@ -921,7 +928,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
 		mrt->maxvif = vifi + 1;
 	write_unlock_bh(&mrt_lock);
 	call_ip6mr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD,
-				       v, vifi, mrt->id);
+				       v, dev, vifi, mrt->id);
 	return 0;
 }
 
@@ -1241,7 +1248,7 @@ static int ip6mr_device_event(struct notifier_block *this,
 	ip6mr_for_each_table(mrt, net) {
 		v = &mrt->vif_table[0];
 		for (ct = 0; ct < mrt->maxvif; ct++, v++) {
-			if (v->dev == dev)
+			if (rcu_access_pointer(v->dev) == dev)
 				mif6_delete(mrt, ct, 1, NULL);
 		}
 	}
@@ -2019,21 +2026,22 @@ static inline int ip6mr_forward2_finish(struct net *net, struct sock *sk, struct
 static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 			  struct sk_buff *skb, int vifi)
 {
-	struct ipv6hdr *ipv6h;
 	struct vif_device *vif = &mrt->vif_table[vifi];
-	struct net_device *dev;
+	struct net_device *vif_dev;
+	struct ipv6hdr *ipv6h;
 	struct dst_entry *dst;
 	struct flowi6 fl6;
 
-	if (!vif->dev)
+	vif_dev = vif_dev_read(vif);
+	if (!vif_dev)
 		goto out_free;
 
 #ifdef CONFIG_IPV6_PIMSM_V2
 	if (vif->flags & MIFF_REGISTER) {
 		vif->pkt_out++;
 		vif->bytes_out += skb->len;
-		vif->dev->stats.tx_bytes += skb->len;
-		vif->dev->stats.tx_packets++;
+		vif_dev->stats.tx_bytes += skb->len;
+		vif_dev->stats.tx_packets++;
 		ip6mr_cache_report(mrt, skb, vifi, MRT6MSG_WHOLEPKT);
 		goto out_free;
 	}
@@ -2066,14 +2074,13 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 	 * not mrouter) cannot join to more than one interface - it will
 	 * result in receiving multiple packets.
 	 */
-	dev = vif->dev;
-	skb->dev = dev;
+	skb->dev = vif_dev;
 	vif->pkt_out++;
 	vif->bytes_out += skb->len;
 
 	/* We are about to write */
 	/* XXX: extension headers? */
-	if (skb_cow(skb, sizeof(*ipv6h) + LL_RESERVED_SPACE(dev)))
+	if (skb_cow(skb, sizeof(*ipv6h) + LL_RESERVED_SPACE(vif_dev)))
 		goto out_free;
 
 	ipv6h = ipv6_hdr(skb);
@@ -2082,7 +2089,7 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 	IP6CB(skb)->flags |= IP6SKB_FORWARDED;
 
 	return NF_HOOK(NFPROTO_IPV6, NF_INET_FORWARD,
-		       net, NULL, skb, skb->dev, dev,
+		       net, NULL, skb, skb->dev, vif_dev,
 		       ip6mr_forward2_finish);
 
 out_free:
@@ -2095,7 +2102,7 @@ static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev)
 	int ct;
 
 	for (ct = mrt->maxvif - 1; ct >= 0; ct--) {
-		if (mrt->vif_table[ct].dev == dev)
+		if (rcu_access_pointer(mrt->vif_table[ct].dev) == dev)
 			break;
 	}
 	return ct;
@@ -2133,7 +2140,7 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 	/*
 	 * Wrong interface: drop packet and (maybe) send PIM assert.
 	 */
-	if (mrt->vif_table[vif].dev != dev) {
+	if (rcu_access_pointer(mrt->vif_table[vif].dev) != dev) {
 		c->_c.mfc_un.res.wrong_if++;
 
 		if (true_vifi >= 0 && mrt->mroute_do_assert &&
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 03/19] ipmr: change igmpmsg_netlink_event() prototype
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 01/19] ip6mr: do not get a device reference in pim6_rcv() Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 02/19] ipmr: add rcu protection over (struct vif_device)->dev Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 04/19] ipmr: ipmr_cache_report() changes Eric Dumazet
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

igmpmsg_netlink_event() first argument can be marked const.

igmpmsg_netlink_event() reads mrt->net and mrt->id,
both being set once in mr_table_alloc().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 10371a9e78fc41e8562fa8d81222a8ae24e2fbb6..6710324497cae3bbc2fcdd12d6e1d44eed5215b3 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -110,7 +110,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
 			     struct sk_buff *pkt, vifi_t vifi, int assert);
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 				 int cmd);
-static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt);
+static void igmpmsg_netlink_event(const struct mr_table *mrt, struct sk_buff *pkt);
 static void mroute_clean_tables(struct mr_table *mrt, int flags);
 static void ipmr_expire_process(struct timer_list *t);
 
@@ -2410,7 +2410,7 @@ static size_t igmpmsg_netlink_msgsize(size_t payloadlen)
 	return len;
 }
 
-static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt)
+static void igmpmsg_netlink_event(const struct mr_table *mrt, struct sk_buff *pkt)
 {
 	struct net *net = read_pnet(&mrt->net);
 	struct nlmsghdr *nlh;
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 04/19] ipmr: ipmr_cache_report() changes
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 03/19] ipmr: change igmpmsg_netlink_event() prototype Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 05/19] ipmr: do not acquire mrt_lock in __pim_rcv() Eric Dumazet
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

ipmr_cache_report() first argument can be marked const, and we change
the caller convention about which lock needs to be held.

Instead of read_lock(&mrt_lock), we can use rcu_read_lock().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 6710324497cae3bbc2fcdd12d6e1d44eed5215b3..8fe7a688cf41deeb99c7ca554f1788a956d2fdb9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -106,7 +106,7 @@ static void ipmr_free_table(struct mr_table *mrt);
 static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 			  struct net_device *dev, struct sk_buff *skb,
 			  struct mfc_cache *cache, int local);
-static int ipmr_cache_report(struct mr_table *mrt,
+static int ipmr_cache_report(const struct mr_table *mrt,
 			     struct sk_buff *pkt, vifi_t vifi, int assert);
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 				 int cmd);
@@ -507,11 +507,15 @@ static netdev_tx_t reg_vif_xmit(struct sk_buff *skb, struct net_device *dev)
 		return err;
 	}
 
-	read_lock(&mrt_lock);
 	dev->stats.tx_bytes += skb->len;
 	dev->stats.tx_packets++;
-	ipmr_cache_report(mrt, skb, mrt->mroute_reg_vif_num, IGMPMSG_WHOLEPKT);
-	read_unlock(&mrt_lock);
+	rcu_read_lock();
+
+	/* Pairs with WRITE_ONCE() in vif_add() and vif_delete() */
+	ipmr_cache_report(mrt, skb, READ_ONCE(mrt->mroute_reg_vif_num),
+			  IGMPMSG_WHOLEPKT);
+
+	rcu_read_unlock();
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
@@ -665,9 +669,10 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
 				      vifi, mrt->id);
 	RCU_INIT_POINTER(v->dev, NULL);
 
-	if (vifi == mrt->mroute_reg_vif_num)
-		mrt->mroute_reg_vif_num = -1;
-
+	if (vifi == mrt->mroute_reg_vif_num) {
+		/* Pairs with READ_ONCE() in ipmr_cache_report() and reg_vif_xmit() */
+		WRITE_ONCE(mrt->mroute_reg_vif_num, -1);
+	}
 	if (vifi + 1 == mrt->maxvif) {
 		int tmp;
 
@@ -895,8 +900,10 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 	write_lock_bh(&mrt_lock);
 	rcu_assign_pointer(v->dev, dev);
 	netdev_tracker_alloc(dev, &v->dev_tracker, GFP_ATOMIC);
-	if (v->flags & VIFF_REGISTER)
-		mrt->mroute_reg_vif_num = vifi;
+	if (v->flags & VIFF_REGISTER) {
+		/* Pairs with READ_ONCE() in ipmr_cache_report() and reg_vif_xmit() */
+		WRITE_ONCE(mrt->mroute_reg_vif_num, vifi);
+	}
 	if (vifi+1 > mrt->maxvif)
 		mrt->maxvif = vifi+1;
 	write_unlock_bh(&mrt_lock);
@@ -1005,9 +1012,9 @@ static void ipmr_cache_resolve(struct net *net, struct mr_table *mrt,
 
 /* Bounce a cache query up to mrouted and netlink.
  *
- * Called under mrt_lock.
+ * Called under rcu_read_lock().
  */
-static int ipmr_cache_report(struct mr_table *mrt,
+static int ipmr_cache_report(const struct mr_table *mrt,
 			     struct sk_buff *pkt, vifi_t vifi, int assert)
 {
 	const int ihl = ip_hdrlen(pkt);
@@ -1042,8 +1049,11 @@ static int ipmr_cache_report(struct mr_table *mrt,
 			msg->im_vif = vifi;
 			msg->im_vif_hi = vifi >> 8;
 		} else {
-			msg->im_vif = mrt->mroute_reg_vif_num;
-			msg->im_vif_hi = mrt->mroute_reg_vif_num >> 8;
+			/* Pairs with WRITE_ONCE() in vif_add() and vif_delete() */
+			int vif_num = READ_ONCE(mrt->mroute_reg_vif_num);
+
+			msg->im_vif = vif_num;
+			msg->im_vif_hi = vif_num >> 8;
 		}
 		ip_hdr(skb)->ihl = sizeof(struct iphdr) >> 2;
 		ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(pkt)->tot_len) +
@@ -1068,10 +1078,8 @@ static int ipmr_cache_report(struct mr_table *mrt,
 		skb->transport_header = skb->network_header;
 	}
 
-	rcu_read_lock();
 	mroute_sk = rcu_dereference(mrt->mroute_sk);
 	if (!mroute_sk) {
-		rcu_read_unlock();
 		kfree_skb(skb);
 		return -EINVAL;
 	}
@@ -1080,7 +1088,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
 
 	/* Deliver to mrouted */
 	ret = sock_queue_rcv_skb(mroute_sk, skb);
-	rcu_read_unlock();
+
 	if (ret < 0) {
 		net_warn_ratelimited("mroute: pending queue full, dropping entries\n");
 		kfree_skb(skb);
@@ -1090,6 +1098,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
 }
 
 /* Queue a packet for resolution. It gets locked cache entry! */
+/* Called under rcu_read_lock() */
 static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 				 struct sk_buff *skb, struct net_device *dev)
 {
@@ -1830,7 +1839,9 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		vif->bytes_out += skb->len;
 		vif_dev->stats.tx_bytes += skb->len;
 		vif_dev->stats.tx_packets++;
+		rcu_read_lock();
 		ipmr_cache_report(mrt, skb, vifi, IGMPMSG_WHOLEPKT);
+		rcu_read_unlock();
 		goto out_free;
 	}
 
@@ -1980,10 +1991,12 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 			       c->_c.mfc_un.res.last_assert +
 			       MFC_ASSERT_THRESH)) {
 			c->_c.mfc_un.res.last_assert = jiffies;
+			rcu_read_lock();
 			ipmr_cache_report(mrt, skb, true_vifi, IGMPMSG_WRONGVIF);
 			if (mrt->mroute_do_wrvifwhole)
 				ipmr_cache_report(mrt, skb, true_vifi,
 						  IGMPMSG_WRVIFWHOLE);
+			rcu_read_unlock();
 		}
 		goto dont_forward;
 	}
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 05/19] ipmr: do not acquire mrt_lock in __pim_rcv()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (3 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 04/19] ipmr: ipmr_cache_report() changes Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 06/19] ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT) Eric Dumazet
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

rcu_read_lock() protection is more than enough.

vif_dev_read() supports either mrt_lock or rcu_read_lock().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8fe7a688cf41deeb99c7ca554f1788a956d2fdb9..8a94f9a459cd077d74b5e38c3d2f248620d4ecfc 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -582,6 +582,7 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
 {
 	struct net_device *reg_dev = NULL;
 	struct iphdr *encap;
+	int vif_num;
 
 	encap = (struct iphdr *)(skb_transport_header(skb) + pimlen);
 	/* Check that:
@@ -594,11 +595,10 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
 	    ntohs(encap->tot_len) + pimlen > skb->len)
 		return 1;
 
-	read_lock(&mrt_lock);
-	if (mrt->mroute_reg_vif_num >= 0)
-		reg_dev = vif_dev_read(&mrt->vif_table[mrt->mroute_reg_vif_num]);
-	read_unlock(&mrt_lock);
-
+	/* Pairs with WRITE_ONCE() in vif_add()/vid_delete() */
+	vif_num = READ_ONCE(mrt->mroute_reg_vif_num);
+	if (vif_num >= 0)
+		reg_dev = vif_dev_read(&mrt->vif_table[vif_num]);
 	if (!reg_dev)
 		return 1;
 
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 06/19] ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT)
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (4 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 05/19] ipmr: do not acquire mrt_lock in __pim_rcv() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 07/19] ipmr: do not acquire mrt_lock before calling ipmr_cache_unresolved() Eric Dumazet
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

rcu_read_lock() protection is good enough.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8a94f9a459cd077d74b5e38c3d2f248620d4ecfc..bc8b7504fde6ec3aadd6c0962f23e59c0aac702a 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1611,20 +1611,20 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
 		if (vr.vifi >= mrt->maxvif)
 			return -EINVAL;
 		vr.vifi = array_index_nospec(vr.vifi, mrt->maxvif);
-		read_lock(&mrt_lock);
+		rcu_read_lock();
 		vif = &mrt->vif_table[vr.vifi];
 		if (VIF_EXISTS(mrt, vr.vifi)) {
-			vr.icount = vif->pkt_in;
-			vr.ocount = vif->pkt_out;
-			vr.ibytes = vif->bytes_in;
-			vr.obytes = vif->bytes_out;
-			read_unlock(&mrt_lock);
+			vr.icount = READ_ONCE(vif->pkt_in);
+			vr.ocount = READ_ONCE(vif->pkt_out);
+			vr.ibytes = READ_ONCE(vif->bytes_in);
+			vr.obytes = READ_ONCE(vif->bytes_out);
+			rcu_read_unlock();
 
 			if (copy_to_user(arg, &vr, sizeof(vr)))
 				return -EFAULT;
 			return 0;
 		}
-		read_unlock(&mrt_lock);
+		rcu_read_unlock();
 		return -EADDRNOTAVAIL;
 	case SIOCGETSGCNT:
 		if (copy_from_user(&sr, arg, sizeof(sr)))
@@ -1686,20 +1686,20 @@ int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg)
 		if (vr.vifi >= mrt->maxvif)
 			return -EINVAL;
 		vr.vifi = array_index_nospec(vr.vifi, mrt->maxvif);
-		read_lock(&mrt_lock);
+		rcu_read_lock();
 		vif = &mrt->vif_table[vr.vifi];
 		if (VIF_EXISTS(mrt, vr.vifi)) {
-			vr.icount = vif->pkt_in;
-			vr.ocount = vif->pkt_out;
-			vr.ibytes = vif->bytes_in;
-			vr.obytes = vif->bytes_out;
-			read_unlock(&mrt_lock);
+			vr.icount = READ_ONCE(vif->pkt_in);
+			vr.ocount = READ_ONCE(vif->pkt_out);
+			vr.ibytes = READ_ONCE(vif->bytes_in);
+			vr.obytes = READ_ONCE(vif->bytes_out);
+			rcu_read_unlock();
 
 			if (copy_to_user(arg, &vr, sizeof(vr)))
 				return -EFAULT;
 			return 0;
 		}
-		read_unlock(&mrt_lock);
+		rcu_read_unlock();
 		return -EADDRNOTAVAIL;
 	case SIOCGETSGCNT:
 		if (copy_from_user(&sr, arg, sizeof(sr)))
@@ -1835,8 +1835,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		goto out_free;
 
 	if (vif->flags & VIFF_REGISTER) {
-		vif->pkt_out++;
-		vif->bytes_out += skb->len;
+		WRITE_ONCE(vif->pkt_out, vif->pkt_out + 1);
+		WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
 		vif_dev->stats.tx_bytes += skb->len;
 		vif_dev->stats.tx_packets++;
 		rcu_read_lock();
@@ -1885,8 +1885,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		goto out_free;
 	}
 
-	vif->pkt_out++;
-	vif->bytes_out += skb->len;
+	WRITE_ONCE(vif->pkt_out, vif->pkt_out + 1);
+	WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
 
 	skb_dst_drop(skb);
 	skb_dst_set(skb, &rt->dst);
@@ -2002,8 +2002,10 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 	}
 
 forward:
-	mrt->vif_table[vif].pkt_in++;
-	mrt->vif_table[vif].bytes_in += skb->len;
+	WRITE_ONCE(mrt->vif_table[vif].pkt_in,
+		   mrt->vif_table[vif].pkt_in + 1);
+	WRITE_ONCE(mrt->vif_table[vif].bytes_in,
+		   mrt->vif_table[vif].bytes_in + skb->len);
 
 	/* Forward the frame */
 	if (c->mfc_origin == htonl(INADDR_ANY) &&
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 07/19] ipmr: do not acquire mrt_lock before calling ipmr_cache_unresolved()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (5 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 06/19] ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT) Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward() Eric Dumazet
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

rcu_read_lock() protection is good enough.

ipmr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index bc8b7504fde6ec3aadd6c0962f23e59c0aac702a..6ea54bc3d9b6555aaa9974d81ba4acd47481724f 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -680,7 +680,7 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
 			if (VIF_EXISTS(mrt, tmp))
 				break;
 		}
-		mrt->maxvif = tmp+1;
+		WRITE_ONCE(mrt->maxvif, tmp + 1);
 	}
 
 	write_unlock_bh(&mrt_lock);
@@ -905,7 +905,7 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 		WRITE_ONCE(mrt->mroute_reg_vif_num, vifi);
 	}
 	if (vifi+1 > mrt->maxvif)
-		mrt->maxvif = vifi+1;
+		WRITE_ONCE(mrt->maxvif, vifi + 1);
 	write_unlock_bh(&mrt_lock);
 	call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD, v, dev,
 				      vifi, mrt->id);
@@ -1923,11 +1923,12 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 	kfree_skb(skb);
 }
 
-static int ipmr_find_vif(struct mr_table *mrt, struct net_device *dev)
+/* Called with mrt_lock or rcu_read_lock() */
+static int ipmr_find_vif(const struct mr_table *mrt, struct net_device *dev)
 {
 	int ct;
-
-	for (ct = mrt->maxvif-1; ct >= 0; ct--) {
+	/* Pairs with WRITE_ONCE() in vif_delete()/vif_add() */
+	for (ct = READ_ONCE(mrt->maxvif) - 1; ct >= 0; ct--) {
 		if (rcu_access_pointer(mrt->vif_table[ct].dev) == dev)
 			break;
 	}
@@ -2161,15 +2162,9 @@ int ip_mr_input(struct sk_buff *skb)
 			skb = skb2;
 		}
 
-		read_lock(&mrt_lock);
 		vif = ipmr_find_vif(mrt, dev);
-		if (vif >= 0) {
-			int err2 = ipmr_cache_unresolved(mrt, vif, skb, dev);
-			read_unlock(&mrt_lock);
-
-			return err2;
-		}
-		read_unlock(&mrt_lock);
+		if (vif >= 0)
+			return ipmr_cache_unresolved(mrt, vif, skb, dev);
 		kfree_skb(skb);
 		return -ENODEV;
 	}
@@ -2273,18 +2268,15 @@ int ipmr_get_route(struct net *net, struct sk_buff *skb,
 		int vif = -1;
 
 		dev = skb->dev;
-		read_lock(&mrt_lock);
 		if (dev)
 			vif = ipmr_find_vif(mrt, dev);
 		if (vif < 0) {
-			read_unlock(&mrt_lock);
 			rcu_read_unlock();
 			return -ENODEV;
 		}
 
 		skb2 = skb_realloc_headroom(skb, sizeof(struct iphdr));
 		if (!skb2) {
-			read_unlock(&mrt_lock);
 			rcu_read_unlock();
 			return -ENOMEM;
 		}
@@ -2298,7 +2290,6 @@ int ipmr_get_route(struct net *net, struct sk_buff *skb,
 		iph->daddr = daddr;
 		iph->version = 0;
 		err = ipmr_cache_unresolved(mrt, vif, skb2, dev);
-		read_unlock(&mrt_lock);
 		rcu_read_unlock();
 		return err;
 	}
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (6 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 07/19] ipmr: do not acquire mrt_lock before calling ipmr_cache_unresolved() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-07-22 19:34   ` Vladimir Oltean
  2022-06-23  4:34 ` [PATCH v2 net-next 09/19] ipmr: do not acquire mrt_lock in ipmr_get_route() Eric Dumazet
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

ip_mr_forward() uses standard RCU protection already.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 6ea54bc3d9b6555aaa9974d81ba4acd47481724f..b0f2e6d79d62273c8c2682f28cb45fe5ec3df6f3 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1817,7 +1817,7 @@ static bool ipmr_forward_offloaded(struct sk_buff *skb, struct mr_table *mrt,
 }
 #endif
 
-/* Processing handlers for ipmr_forward */
+/* Processing handlers for ipmr_forward, under rcu_read_lock() */
 
 static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 			    int in_vifi, struct sk_buff *skb, int vifi)
@@ -1839,9 +1839,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
 		vif_dev->stats.tx_bytes += skb->len;
 		vif_dev->stats.tx_packets++;
-		rcu_read_lock();
 		ipmr_cache_report(mrt, skb, vifi, IGMPMSG_WHOLEPKT);
-		rcu_read_unlock();
 		goto out_free;
 	}
 
@@ -1936,6 +1934,7 @@ static int ipmr_find_vif(const struct mr_table *mrt, struct net_device *dev)
 }
 
 /* "local" means that we should preserve one skb (for local delivery) */
+/* Called uner rcu_read_lock() */
 static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 			  struct net_device *dev, struct sk_buff *skb,
 			  struct mfc_cache *c, int local)
@@ -1992,12 +1991,10 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
 			       c->_c.mfc_un.res.last_assert +
 			       MFC_ASSERT_THRESH)) {
 			c->_c.mfc_un.res.last_assert = jiffies;
-			rcu_read_lock();
 			ipmr_cache_report(mrt, skb, true_vifi, IGMPMSG_WRONGVIF);
 			if (mrt->mroute_do_wrvifwhole)
 				ipmr_cache_report(mrt, skb, true_vifi,
 						  IGMPMSG_WRVIFWHOLE);
-			rcu_read_unlock();
 		}
 		goto dont_forward;
 	}
@@ -2169,9 +2166,7 @@ int ip_mr_input(struct sk_buff *skb)
 		return -ENODEV;
 	}
 
-	read_lock(&mrt_lock);
 	ip_mr_forward(net, mrt, dev, skb, cache, local);
-	read_unlock(&mrt_lock);
 
 	if (local)
 		return ip_local_deliver(skb);
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 09/19] ipmr: do not acquire mrt_lock in ipmr_get_route()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (7 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 10/19] ip6mr: ip6mr_cache_report() changes Eric Dumazet
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

mr_fill_mroute() uses standard rcu_read_unlock(),
no need to grab mrt_lock anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index b0f2e6d79d62273c8c2682f28cb45fe5ec3df6f3..69ccd3d7c655a53d3cf1ac9104b9da94213416f6 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2289,9 +2289,7 @@ int ipmr_get_route(struct net *net, struct sk_buff *skb,
 		return err;
 	}
 
-	read_lock(&mrt_lock);
 	err = mr_fill_mroute(mrt, skb, &cache->_c, rtm);
-	read_unlock(&mrt_lock);
 	rcu_read_unlock();
 	return err;
 }
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 10/19] ip6mr: ip6mr_cache_report() changes
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (8 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 09/19] ipmr: do not acquire mrt_lock in ipmr_get_route() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 11/19] ip6mr: do not acquire mrt_lock in pim6_rcv() Eric Dumazet
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

ip6mr_cache_report() first argument can be marked const, and we change
the caller convention about which lock needs to be held.

Instead of read_lock(&mrt_lock), we can use rcu_read_lock().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 44cb3d88bbd6f85ce0c8e9054dd3d578b7b3733b..a6d97952bf5306c245996c612107d0c851bbc822 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -91,11 +91,11 @@ static void ip6mr_free_table(struct mr_table *mrt);
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 			   struct net_device *dev, struct sk_buff *skb,
 			   struct mfc6_cache *cache);
-static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
+static int ip6mr_cache_report(const struct mr_table *mrt, struct sk_buff *pkt,
 			      mifi_t mifi, int assert);
 static void mr6_netlink_event(struct mr_table *mrt, struct mfc6_cache *mfc,
 			      int cmd);
-static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt);
+static void mrt6msg_netlink_event(const struct mr_table *mrt, struct sk_buff *pkt);
 static int ip6mr_rtm_dumproute(struct sk_buff *skb,
 			       struct netlink_callback *cb);
 static void mroute_clean_tables(struct mr_table *mrt, int flags);
@@ -608,11 +608,12 @@ static netdev_tx_t reg_vif_xmit(struct sk_buff *skb,
 	if (ip6mr_fib_lookup(net, &fl6, &mrt) < 0)
 		goto tx_err;
 
-	read_lock(&mrt_lock);
 	dev->stats.tx_bytes += skb->len;
 	dev->stats.tx_packets++;
-	ip6mr_cache_report(mrt, skb, mrt->mroute_reg_vif_num, MRT6MSG_WHOLEPKT);
-	read_unlock(&mrt_lock);
+	rcu_read_lock();
+	ip6mr_cache_report(mrt, skb, READ_ONCE(mrt->mroute_reg_vif_num),
+			   MRT6MSG_WHOLEPKT);
+	rcu_read_unlock();
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 
@@ -718,8 +719,10 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
 	RCU_INIT_POINTER(v->dev, NULL);
 
 #ifdef CONFIG_IPV6_PIMSM_V2
-	if (vifi == mrt->mroute_reg_vif_num)
-		mrt->mroute_reg_vif_num = -1;
+	if (vifi == mrt->mroute_reg_vif_num) {
+		/* Pairs with READ_ONCE() in ip6mr_cache_report() and reg_vif_xmit() */
+		WRITE_ONCE(mrt->mroute_reg_vif_num, -1);
+	}
 #endif
 
 	if (vifi + 1 == mrt->maxvif) {
@@ -922,7 +925,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
 	netdev_tracker_alloc(dev, &v->dev_tracker, GFP_ATOMIC);
 #ifdef CONFIG_IPV6_PIMSM_V2
 	if (v->flags & MIFF_REGISTER)
-		mrt->mroute_reg_vif_num = vifi;
+		WRITE_ONCE(mrt->mroute_reg_vif_num, vifi);
 #endif
 	if (vifi + 1 > mrt->maxvif)
 		mrt->maxvif = vifi + 1;
@@ -1033,10 +1036,10 @@ static void ip6mr_cache_resolve(struct net *net, struct mr_table *mrt,
 /*
  *	Bounce a cache query up to pim6sd and netlink.
  *
- *	Called under mrt_lock.
+ *	Called under rcu_read_lock()
  */
 
-static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
+static int ip6mr_cache_report(const struct mr_table *mrt, struct sk_buff *pkt,
 			      mifi_t mifi, int assert)
 {
 	struct sock *mroute6_sk;
@@ -1077,7 +1080,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 		if (assert == MRT6MSG_WRMIFWHOLE)
 			msg->im6_mif = mifi;
 		else
-			msg->im6_mif = mrt->mroute_reg_vif_num;
+			msg->im6_mif = READ_ONCE(mrt->mroute_reg_vif_num);
 		msg->im6_pad = 0;
 		msg->im6_src = ipv6_hdr(pkt)->saddr;
 		msg->im6_dst = ipv6_hdr(pkt)->daddr;
@@ -1112,10 +1115,8 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	}
 
-	rcu_read_lock();
 	mroute6_sk = rcu_dereference(mrt->mroute_sk);
 	if (!mroute6_sk) {
-		rcu_read_unlock();
 		kfree_skb(skb);
 		return -EINVAL;
 	}
@@ -1124,7 +1125,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 
 	/* Deliver to user space multicast routing algorithms */
 	ret = sock_queue_rcv_skb(mroute6_sk, skb);
-	rcu_read_unlock();
+
 	if (ret < 0) {
 		net_warn_ratelimited("mroute6: pending queue full, dropping entries\n");
 		kfree_skb(skb);
@@ -2042,7 +2043,9 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 		vif->bytes_out += skb->len;
 		vif_dev->stats.tx_bytes += skb->len;
 		vif_dev->stats.tx_packets++;
+		rcu_read_lock();
 		ip6mr_cache_report(mrt, skb, vifi, MRT6MSG_WHOLEPKT);
+		rcu_read_unlock();
 		goto out_free;
 	}
 #endif
@@ -2155,10 +2158,12 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 			       c->_c.mfc_un.res.last_assert +
 			       MFC_ASSERT_THRESH)) {
 			c->_c.mfc_un.res.last_assert = jiffies;
+			rcu_read_lock();
 			ip6mr_cache_report(mrt, skb, true_vifi, MRT6MSG_WRONGMIF);
 			if (mrt->mroute_do_wrvifwhole)
 				ip6mr_cache_report(mrt, skb, true_vifi,
 						   MRT6MSG_WRMIFWHOLE);
+			rcu_read_unlock();
 		}
 		goto dont_forward;
 	}
@@ -2465,7 +2470,7 @@ static size_t mrt6msg_netlink_msgsize(size_t payloadlen)
 	return len;
 }
 
-static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt)
+static void mrt6msg_netlink_event(const struct mr_table *mrt, struct sk_buff *pkt)
 {
 	struct net *net = read_pnet(&mrt->net);
 	struct nlmsghdr *nlh;
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 11/19] ip6mr: do not acquire mrt_lock in pim6_rcv()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (9 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 10/19] ip6mr: ip6mr_cache_report() changes Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 12/19] ip6mr: do not acquire mrt_lock in ioctl(SIOCGETMIFCNT_IN6) Eric Dumazet
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

rcu_read_lock() protection is more than enough.

vif_dev_read() supports either mrt_lock or rcu_read_lock().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index a6d97952bf5306c245996c612107d0c851bbc822..fa6720377e82d732ccafa02b37cc28e0ab1cea07 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -559,12 +559,11 @@ static int pim6_rcv(struct sk_buff *skb)
 
 	if (ip6mr_fib_lookup(net, &fl6, &mrt) < 0)
 		goto drop;
-	reg_vif_num = mrt->mroute_reg_vif_num;
 
-	read_lock(&mrt_lock);
+	/* Pairs with WRITE_ONCE() in mif6_add()/mif6_delete() */
+	reg_vif_num = READ_ONCE(mrt->mroute_reg_vif_num);
 	if (reg_vif_num >= 0)
 		reg_dev = vif_dev_read(&mrt->vif_table[reg_vif_num]);
-	read_unlock(&mrt_lock);
 
 	if (!reg_dev)
 		goto drop;
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 12/19] ip6mr: do not acquire mrt_lock in ioctl(SIOCGETMIFCNT_IN6)
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (10 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 11/19] ip6mr: do not acquire mrt_lock in pim6_rcv() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 13/19] ip6mr: do not acquire mrt_lock before calling ip6mr_cache_unresolved Eric Dumazet
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

rcu_read_lock() protection is good enough.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index fa6720377e82d732ccafa02b37cc28e0ab1cea07..b4ad606e24bdaf12c233f642cee303f06ccfa4eb 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1896,20 +1896,20 @@ int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg)
 		if (vr.mifi >= mrt->maxvif)
 			return -EINVAL;
 		vr.mifi = array_index_nospec(vr.mifi, mrt->maxvif);
-		read_lock(&mrt_lock);
+		rcu_read_lock();
 		vif = &mrt->vif_table[vr.mifi];
 		if (VIF_EXISTS(mrt, vr.mifi)) {
-			vr.icount = vif->pkt_in;
-			vr.ocount = vif->pkt_out;
-			vr.ibytes = vif->bytes_in;
-			vr.obytes = vif->bytes_out;
-			read_unlock(&mrt_lock);
+			vr.icount = READ_ONCE(vif->pkt_in);
+			vr.ocount = READ_ONCE(vif->pkt_out);
+			vr.ibytes = READ_ONCE(vif->bytes_in);
+			vr.obytes = READ_ONCE(vif->bytes_out);
+			rcu_read_unlock();
 
 			if (copy_to_user(arg, &vr, sizeof(vr)))
 				return -EFAULT;
 			return 0;
 		}
-		read_unlock(&mrt_lock);
+		rcu_read_unlock();
 		return -EADDRNOTAVAIL;
 	case SIOCGETSGCNT_IN6:
 		if (copy_from_user(&sr, arg, sizeof(sr)))
@@ -1971,20 +1971,20 @@ int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg)
 		if (vr.mifi >= mrt->maxvif)
 			return -EINVAL;
 		vr.mifi = array_index_nospec(vr.mifi, mrt->maxvif);
-		read_lock(&mrt_lock);
+		rcu_read_lock();
 		vif = &mrt->vif_table[vr.mifi];
 		if (VIF_EXISTS(mrt, vr.mifi)) {
-			vr.icount = vif->pkt_in;
-			vr.ocount = vif->pkt_out;
-			vr.ibytes = vif->bytes_in;
-			vr.obytes = vif->bytes_out;
-			read_unlock(&mrt_lock);
+			vr.icount = READ_ONCE(vif->pkt_in);
+			vr.ocount = READ_ONCE(vif->pkt_out);
+			vr.ibytes = READ_ONCE(vif->bytes_in);
+			vr.obytes = READ_ONCE(vif->bytes_out);
+			rcu_read_unlock();
 
 			if (copy_to_user(arg, &vr, sizeof(vr)))
 				return -EFAULT;
 			return 0;
 		}
-		read_unlock(&mrt_lock);
+		rcu_read_unlock();
 		return -EADDRNOTAVAIL;
 	case SIOCGETSGCNT_IN6:
 		if (copy_from_user(&sr, arg, sizeof(sr)))
@@ -2038,8 +2038,8 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 
 #ifdef CONFIG_IPV6_PIMSM_V2
 	if (vif->flags & MIFF_REGISTER) {
-		vif->pkt_out++;
-		vif->bytes_out += skb->len;
+		WRITE_ONCE(vif->pkt_out, vif->pkt_out + 1);
+		WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
 		vif_dev->stats.tx_bytes += skb->len;
 		vif_dev->stats.tx_packets++;
 		rcu_read_lock();
@@ -2077,8 +2077,8 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 	 * result in receiving multiple packets.
 	 */
 	skb->dev = vif_dev;
-	vif->pkt_out++;
-	vif->bytes_out += skb->len;
+	WRITE_ONCE(vif->pkt_out, vif->pkt_out + 1);
+	WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
 
 	/* We are about to write */
 	/* XXX: extension headers? */
@@ -2168,8 +2168,10 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 	}
 
 forward:
-	mrt->vif_table[vif].pkt_in++;
-	mrt->vif_table[vif].bytes_in += skb->len;
+	WRITE_ONCE(mrt->vif_table[vif].pkt_in,
+		   mrt->vif_table[vif].pkt_in + 1);
+	WRITE_ONCE(mrt->vif_table[vif].bytes_in,
+		   mrt->vif_table[vif].bytes_in + skb->len);
 
 	/*
 	 *	Forward the frame
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 13/19] ip6mr: do not acquire mrt_lock before calling ip6mr_cache_unresolved
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (11 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 12/19] ip6mr: do not acquire mrt_lock in ioctl(SIOCGETMIFCNT_IN6) Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 14/19] ip6mr: do not acquire mrt_lock while calling ip6_mr_forward() Eric Dumazet
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

rcu_read_lock() protection is good enough.

ip6mr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index b4ad606e24bdaf12c233f642cee303f06ccfa4eb..089c26af3120d198ccfdda631294ed9712568ac0 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -730,7 +730,7 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
 			if (VIF_EXISTS(mrt, tmp))
 				break;
 		}
-		mrt->maxvif = tmp + 1;
+		WRITE_ONCE(mrt->maxvif, tmp + 1);
 	}
 
 	write_unlock_bh(&mrt_lock);
@@ -927,7 +927,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
 		WRITE_ONCE(mrt->mroute_reg_vif_num, vifi);
 #endif
 	if (vifi + 1 > mrt->maxvif)
-		mrt->maxvif = vifi + 1;
+		WRITE_ONCE(mrt->maxvif, vifi + 1);
 	write_unlock_bh(&mrt_lock);
 	call_ip6mr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD,
 				       v, dev, vifi, mrt->id);
@@ -2099,11 +2099,13 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 	return 0;
 }
 
+/* Called with mrt_lock or rcu_read_lock() */
 static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev)
 {
 	int ct;
 
-	for (ct = mrt->maxvif - 1; ct >= 0; ct--) {
+	/* Pairs with WRITE_ONCE() in mif6_delete()/mif6_add() */
+	for (ct = READ_ONCE(mrt->maxvif) - 1; ct >= 0; ct--) {
 		if (rcu_access_pointer(mrt->vif_table[ct].dev) == dev)
 			break;
 	}
@@ -2249,7 +2251,6 @@ int ip6_mr_input(struct sk_buff *skb)
 		return err;
 	}
 
-	read_lock(&mrt_lock);
 	cache = ip6mr_cache_find(mrt,
 				 &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr);
 	if (!cache) {
@@ -2270,15 +2271,14 @@ int ip6_mr_input(struct sk_buff *skb)
 		vif = ip6mr_find_vif(mrt, dev);
 		if (vif >= 0) {
 			int err = ip6mr_cache_unresolved(mrt, vif, skb, dev);
-			read_unlock(&mrt_lock);
 
 			return err;
 		}
-		read_unlock(&mrt_lock);
 		kfree_skb(skb);
 		return -ENODEV;
 	}
 
+	read_lock(&mrt_lock);
 	ip6_mr_forward(net, mrt, dev, skb, cache);
 
 	read_unlock(&mrt_lock);
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 14/19] ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (12 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 13/19] ip6mr: do not acquire mrt_lock before calling ip6mr_cache_unresolved Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 15/19] ip6mr: switch ip6mr_get_route() to rcu_read_lock() Eric Dumazet
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

ip6_mr_forward() uses standard RCU protection already.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 089c26af3120d198ccfdda631294ed9712568ac0..8a751a36020334168fe87c98ea38d561e2fa1d94 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2042,9 +2042,7 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 		WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
 		vif_dev->stats.tx_bytes += skb->len;
 		vif_dev->stats.tx_packets++;
-		rcu_read_lock();
 		ip6mr_cache_report(mrt, skb, vifi, MRT6MSG_WHOLEPKT);
-		rcu_read_unlock();
 		goto out_free;
 	}
 #endif
@@ -2112,6 +2110,7 @@ static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev)
 	return ct;
 }
 
+/* Called under rcu_read_lock() */
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 			   struct net_device *dev, struct sk_buff *skb,
 			   struct mfc6_cache *c)
@@ -2131,14 +2130,12 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 		/* For an (*,G) entry, we only check that the incoming
 		 * interface is part of the static tree.
 		 */
-		rcu_read_lock();
 		cache_proxy = mr_mfc_find_any_parent(mrt, vif);
 		if (cache_proxy &&
 		    cache_proxy->_c.mfc_un.res.ttls[true_vifi] < 255) {
 			rcu_read_unlock();
 			goto forward;
 		}
-		rcu_read_unlock();
 	}
 
 	/*
@@ -2159,12 +2156,10 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 			       c->_c.mfc_un.res.last_assert +
 			       MFC_ASSERT_THRESH)) {
 			c->_c.mfc_un.res.last_assert = jiffies;
-			rcu_read_lock();
 			ip6mr_cache_report(mrt, skb, true_vifi, MRT6MSG_WRONGMIF);
 			if (mrt->mroute_do_wrvifwhole)
 				ip6mr_cache_report(mrt, skb, true_vifi,
 						   MRT6MSG_WRMIFWHOLE);
-			rcu_read_unlock();
 		}
 		goto dont_forward;
 	}
@@ -2278,11 +2273,8 @@ int ip6_mr_input(struct sk_buff *skb)
 		return -ENODEV;
 	}
 
-	read_lock(&mrt_lock);
 	ip6_mr_forward(net, mrt, dev, skb, cache);
 
-	read_unlock(&mrt_lock);
-
 	return 0;
 }
 
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 15/19] ip6mr: switch ip6mr_get_route() to rcu_read_lock()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (13 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 14/19] ip6mr: do not acquire mrt_lock while calling ip6_mr_forward() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 16/19] ipmr: adopt rcu_read_lock() in mr_dump() Eric Dumazet
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

Like ipmr_get_route(), we can use standard RCU here.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 8a751a36020334168fe87c98ea38d561e2fa1d94..08ac177fe30ca3bfbc50cd73b41cdc3da56d23e0 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2290,7 +2290,7 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
 	if (!mrt)
 		return -ENOENT;
 
-	read_lock(&mrt_lock);
+	rcu_read_lock();
 	cache = ip6mr_cache_find(mrt, &rt->rt6i_src.addr, &rt->rt6i_dst.addr);
 	if (!cache && skb->dev) {
 		int vif = ip6mr_find_vif(mrt, skb->dev);
@@ -2308,14 +2308,14 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
 
 		dev = skb->dev;
 		if (!dev || (vif = ip6mr_find_vif(mrt, dev)) < 0) {
-			read_unlock(&mrt_lock);
+			rcu_read_unlock();
 			return -ENODEV;
 		}
 
 		/* really correct? */
 		skb2 = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC);
 		if (!skb2) {
-			read_unlock(&mrt_lock);
+			rcu_read_unlock();
 			return -ENOMEM;
 		}
 
@@ -2338,13 +2338,13 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
 		iph->daddr = rt->rt6i_dst.addr;
 
 		err = ip6mr_cache_unresolved(mrt, vif, skb2, dev);
-		read_unlock(&mrt_lock);
+		rcu_read_unlock();
 
 		return err;
 	}
 
 	err = mr_fill_mroute(mrt, skb, &cache->_c, rtm);
-	read_unlock(&mrt_lock);
+	rcu_read_unlock();
 	return err;
 }
 
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 16/19] ipmr: adopt rcu_read_lock() in mr_dump()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (14 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 15/19] ip6mr: switch ip6mr_get_route() to rcu_read_lock() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 17/19] ipmr: convert /proc handlers to rcu_read_lock() Eric Dumazet
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

We no longer need to acquire mrt_lock() in mr_dump,
using rcu_read_lock() is enough.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/mroute_base.h | 4 ++--
 net/ipv4/ipmr.c             | 2 +-
 net/ipv4/ipmr_base.c        | 8 +++-----
 net/ipv6/ip6mr.c            | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index 10d1e4fb4e9fe387d914c83d135ed6a8f284c374..9dd4bf1572553ffbf41bade97393fac091797a8d 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -308,7 +308,7 @@ int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family,
 			      struct netlink_ext_ack *extack),
 	    struct mr_table *(*mr_iter)(struct net *net,
 					struct mr_table *mrt),
-	    rwlock_t *mrt_lock, struct netlink_ext_ack *extack);
+	    struct netlink_ext_ack *extack);
 #else
 static inline void vif_device_init(struct vif_device *v,
 				   struct net_device *dev,
@@ -363,7 +363,7 @@ static inline int mr_dump(struct net *net, struct notifier_block *nb,
 					    struct netlink_ext_ack *extack),
 			  struct mr_table *(*mr_iter)(struct net *net,
 						      struct mr_table *mrt),
-			  rwlock_t *mrt_lock, struct netlink_ext_ack *extack)
+			  struct netlink_ext_ack *extack)
 {
 	return -EINVAL;
 }
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 69ccd3d7c655a53d3cf1ac9104b9da94213416f6..38963b8de7af65cabd09894a816d342cd3cee5df 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -3027,7 +3027,7 @@ static int ipmr_dump(struct net *net, struct notifier_block *nb,
 		     struct netlink_ext_ack *extack)
 {
 	return mr_dump(net, nb, RTNL_FAMILY_IPMR, ipmr_rules_dump,
-		       ipmr_mr_table_iter, &mrt_lock, extack);
+		       ipmr_mr_table_iter, extack);
 }
 
 static const struct fib_notifier_ops ipmr_notifier_ops_template = {
diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index 59f62b938472aef79b4eb3ade706bf4d111e1e3a..271dc03fc6dbd9b35db4d5782716679134f225e4 100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -399,7 +399,6 @@ int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family,
 			      struct netlink_ext_ack *extack),
 	    struct mr_table *(*mr_iter)(struct net *net,
 					struct mr_table *mrt),
-	    rwlock_t *mrt_lock,
 	    struct netlink_ext_ack *extack)
 {
 	struct mr_table *mrt;
@@ -416,10 +415,9 @@ int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family,
 		int vifi;
 
 		/* Notifiy on table VIF entries */
-		read_lock(mrt_lock);
+		rcu_read_lock();
 		for (vifi = 0; vifi < mrt->maxvif; vifi++, v++) {
-			vif_dev = rcu_dereference_check(v->dev,
-							lockdep_is_held(mrt_lock));
+			vif_dev = rcu_dereference(v->dev);
 			if (!vif_dev)
 				continue;
 
@@ -430,7 +428,7 @@ int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family,
 			if (err)
 				break;
 		}
-		read_unlock(mrt_lock);
+		rcu_read_unlock();
 
 		if (err)
 			return err;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 08ac177fe30ca3bfbc50cd73b41cdc3da56d23e0..f0a9bceb8e3c05ab45e95e8983e505edc005917e 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1267,7 +1267,7 @@ static int ip6mr_dump(struct net *net, struct notifier_block *nb,
 		      struct netlink_ext_ack *extack)
 {
 	return mr_dump(net, nb, RTNL_FAMILY_IP6MR, ip6mr_rules_dump,
-		       ip6mr_mr_table_iter, &mrt_lock, extack);
+		       ip6mr_mr_table_iter, extack);
 }
 
 static struct notifier_block ip6_mr_notifier = {
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 17/19] ipmr: convert /proc handlers to rcu_read_lock()
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (15 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 16/19] ipmr: adopt rcu_read_lock() in mr_dump() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 18/19] ipmr: convert mrt_lock to a spinlock Eric Dumazet
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

We can use standard rcu_read_lock(), to get rid
of last read_lock(&mrt_lock) call points.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c  | 8 ++++----
 net/ipv6/ip6mr.c | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 38963b8de7af65cabd09894a816d342cd3cee5df..2e39f73fe81a2392e07af83fd933033964e3a730 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2895,7 +2895,7 @@ static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb)
  */
 
 static void *ipmr_vif_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(mrt_lock)
+	__acquires(RCU)
 {
 	struct mr_vif_iter *iter = seq->private;
 	struct net *net = seq_file_net(seq);
@@ -2907,14 +2907,14 @@ static void *ipmr_vif_seq_start(struct seq_file *seq, loff_t *pos)
 
 	iter->mrt = mrt;
 
-	read_lock(&mrt_lock);
+	rcu_read_lock();
 	return mr_vif_seq_start(seq, pos);
 }
 
 static void ipmr_vif_seq_stop(struct seq_file *seq, void *v)
-	__releases(mrt_lock)
+	__releases(RCU)
 {
-	read_unlock(&mrt_lock);
+	rcu_read_unlock();
 }
 
 static int ipmr_vif_seq_show(struct seq_file *seq, void *v)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index f0a9bceb8e3c05ab45e95e8983e505edc005917e..7381cfdac3e376c97917465918a464bd61643f2a 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -404,7 +404,7 @@ static void ip6mr_free_table(struct mr_table *mrt)
  */
 
 static void *ip6mr_vif_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(mrt_lock)
+	__acquires(RCU)
 {
 	struct mr_vif_iter *iter = seq->private;
 	struct net *net = seq_file_net(seq);
@@ -416,14 +416,14 @@ static void *ip6mr_vif_seq_start(struct seq_file *seq, loff_t *pos)
 
 	iter->mrt = mrt;
 
-	read_lock(&mrt_lock);
+	rcu_read_lock();
 	return mr_vif_seq_start(seq, pos);
 }
 
 static void ip6mr_vif_seq_stop(struct seq_file *seq, void *v)
-	__releases(mrt_lock)
+	__releases(RCU)
 {
-	read_unlock(&mrt_lock);
+	rcu_read_unlock();
 }
 
 static int ip6mr_vif_seq_show(struct seq_file *seq, void *v)
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 18/19] ipmr: convert mrt_lock to a spinlock
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (16 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 17/19] ipmr: convert /proc handlers to rcu_read_lock() Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-23  4:34 ` [PATCH v2 net-next 19/19] ip6mr: " Eric Dumazet
  2022-06-24 10:50 ` [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks patchwork-bot+netdevbpf
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

mrt_lock is only held in write mode, from process context only.

We can switch to a mere spinlock, and avoid blocking BH.

Also, vif_dev_read() is always called under standard rcu_read_lock().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ipmr.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 2e39f73fe81a2392e07af83fd933033964e3a730..f095b6c8100bd24262c949390b68865b8b3987c3 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -77,12 +77,11 @@ struct ipmr_result {
  * Note that the changes are semaphored via rtnl_lock.
  */
 
-static DEFINE_RWLOCK(mrt_lock);
+static DEFINE_SPINLOCK(mrt_lock);
 
 static struct net_device *vif_dev_read(const struct vif_device *vif)
 {
-	return rcu_dereference_check(vif->dev,
-				     lockdep_is_held(&mrt_lock));
+	return rcu_dereference(vif->dev);
 }
 
 /* Multicast router control variables */
@@ -664,7 +663,7 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
 	if (!dev)
 		return -EADDRNOTAVAIL;
 
-	write_lock_bh(&mrt_lock);
+	spin_lock(&mrt_lock);
 	call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_DEL, v, dev,
 				      vifi, mrt->id);
 	RCU_INIT_POINTER(v->dev, NULL);
@@ -683,7 +682,7 @@ static int vif_delete(struct mr_table *mrt, int vifi, int notify,
 		WRITE_ONCE(mrt->maxvif, tmp + 1);
 	}
 
-	write_unlock_bh(&mrt_lock);
+	spin_unlock(&mrt_lock);
 
 	dev_set_allmulti(dev, -1);
 
@@ -785,7 +784,7 @@ static void ipmr_expire_process(struct timer_list *t)
 	spin_unlock(&mfc_unres_lock);
 }
 
-/* Fill oifs list. It is called under write locked mrt_lock. */
+/* Fill oifs list. It is called under locked mrt_lock. */
 static void ipmr_update_thresholds(struct mr_table *mrt, struct mr_mfc *cache,
 				   unsigned char *ttls)
 {
@@ -897,7 +896,7 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 	v->remote = vifc->vifc_rmt_addr.s_addr;
 
 	/* And finish update writing critical data */
-	write_lock_bh(&mrt_lock);
+	spin_lock(&mrt_lock);
 	rcu_assign_pointer(v->dev, dev);
 	netdev_tracker_alloc(dev, &v->dev_tracker, GFP_ATOMIC);
 	if (v->flags & VIFF_REGISTER) {
@@ -906,7 +905,7 @@ static int vif_add(struct net *net, struct mr_table *mrt,
 	}
 	if (vifi+1 > mrt->maxvif)
 		WRITE_ONCE(mrt->maxvif, vifi + 1);
-	write_unlock_bh(&mrt_lock);
+	spin_unlock(&mrt_lock);
 	call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD, v, dev,
 				      vifi, mrt->id);
 	return 0;
@@ -1211,12 +1210,12 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
 				   mfc->mfcc_mcastgrp.s_addr, parent);
 	rcu_read_unlock();
 	if (c) {
-		write_lock_bh(&mrt_lock);
+		spin_lock(&mrt_lock);
 		c->_c.mfc_parent = mfc->mfcc_parent;
 		ipmr_update_thresholds(mrt, &c->_c, mfc->mfcc_ttls);
 		if (!mrtsock)
 			c->_c.mfc_flags |= MFC_STATIC;
-		write_unlock_bh(&mrt_lock);
+		spin_unlock(&mrt_lock);
 		call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_REPLACE, c,
 					      mrt->id);
 		mroute_netlink_event(mrt, c, RTM_NEWROUTE);
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 net-next 19/19] ip6mr: convert mrt_lock to a spinlock
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (17 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 18/19] ipmr: convert mrt_lock to a spinlock Eric Dumazet
@ 2022-06-23  4:34 ` Eric Dumazet
  2022-06-24 10:50 ` [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks patchwork-bot+netdevbpf
  19 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-06-23  4:34 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet

mrt_lock is only held in write mode, from process context only.

We can switch to a mere spinlock, and avoid blocking BH.

Also, vif_dev_read() is always called under standard rcu_read_lock().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6mr.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 7381cfdac3e376c97917465918a464bd61643f2a..ec6e1509fc7cdfaf55bc79034f43b6e3ce0434ed 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -62,12 +62,11 @@ struct ip6mr_result {
    Note that the changes are semaphored via rtnl_lock.
  */
 
-static DEFINE_RWLOCK(mrt_lock);
+static DEFINE_SPINLOCK(mrt_lock);
 
 static struct net_device *vif_dev_read(const struct vif_device *vif)
 {
-	return rcu_dereference_check(vif->dev,
-				     lockdep_is_held(&mrt_lock));
+	return rcu_dereference(vif->dev);
 }
 
 /* Multicast router control variables */
@@ -714,7 +713,7 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
 	call_ip6mr_vif_entry_notifiers(read_pnet(&mrt->net),
 				       FIB_EVENT_VIF_DEL, v, dev,
 				       vifi, mrt->id);
-	write_lock_bh(&mrt_lock);
+	spin_lock(&mrt_lock);
 	RCU_INIT_POINTER(v->dev, NULL);
 
 #ifdef CONFIG_IPV6_PIMSM_V2
@@ -733,7 +732,7 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
 		WRITE_ONCE(mrt->maxvif, tmp + 1);
 	}
 
-	write_unlock_bh(&mrt_lock);
+	spin_unlock(&mrt_lock);
 
 	dev_set_allmulti(dev, -1);
 
@@ -833,7 +832,7 @@ static void ipmr_expire_process(struct timer_list *t)
 	spin_unlock(&mfc_unres_lock);
 }
 
-/* Fill oifs list. It is called under write locked mrt_lock. */
+/* Fill oifs list. It is called under locked mrt_lock. */
 
 static void ip6mr_update_thresholds(struct mr_table *mrt,
 				    struct mr_mfc *cache,
@@ -919,7 +918,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
 			MIFF_REGISTER);
 
 	/* And finish update writing critical data */
-	write_lock_bh(&mrt_lock);
+	spin_lock(&mrt_lock);
 	rcu_assign_pointer(v->dev, dev);
 	netdev_tracker_alloc(dev, &v->dev_tracker, GFP_ATOMIC);
 #ifdef CONFIG_IPV6_PIMSM_V2
@@ -928,7 +927,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
 #endif
 	if (vifi + 1 > mrt->maxvif)
 		WRITE_ONCE(mrt->maxvif, vifi + 1);
-	write_unlock_bh(&mrt_lock);
+	spin_unlock(&mrt_lock);
 	call_ip6mr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD,
 				       v, dev, vifi, mrt->id);
 	return 0;
@@ -1442,12 +1441,12 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
 				    &mfc->mf6cc_mcastgrp.sin6_addr, parent);
 	rcu_read_unlock();
 	if (c) {
-		write_lock_bh(&mrt_lock);
+		spin_lock(&mrt_lock);
 		c->_c.mfc_parent = mfc->mf6cc_parent;
 		ip6mr_update_thresholds(mrt, &c->_c, ttls);
 		if (!mrtsock)
 			c->_c.mfc_flags |= MFC_STATIC;
-		write_unlock_bh(&mrt_lock);
+		spin_unlock(&mrt_lock);
 		call_ip6mr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_REPLACE,
 					       c, mrt->id);
 		mr6_netlink_event(mrt, c, RTM_NEWROUTE);
@@ -1565,7 +1564,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk)
 	struct net *net = sock_net(sk);
 
 	rtnl_lock();
-	write_lock_bh(&mrt_lock);
+	spin_lock(&mrt_lock);
 	if (rtnl_dereference(mrt->mroute_sk)) {
 		err = -EADDRINUSE;
 	} else {
@@ -1573,7 +1572,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk)
 		sock_set_flag(sk, SOCK_RCU_FREE);
 		atomic_inc(&net->ipv6.devconf_all->mc_forwarding);
 	}
-	write_unlock_bh(&mrt_lock);
+	spin_unlock(&mrt_lock);
 
 	if (!err)
 		inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,
@@ -1603,14 +1602,14 @@ int ip6mr_sk_done(struct sock *sk)
 	rtnl_lock();
 	ip6mr_for_each_table(mrt, net) {
 		if (sk == rtnl_dereference(mrt->mroute_sk)) {
-			write_lock_bh(&mrt_lock);
+			spin_lock(&mrt_lock);
 			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
 			/* Note that mroute_sk had SOCK_RCU_FREE set,
 			 * so the RCU grace period before sk freeing
 			 * is guaranteed by sk_destruct()
 			 */
 			atomic_dec(&devconf->mc_forwarding);
-			write_unlock_bh(&mrt_lock);
+			spin_unlock(&mrt_lock);
 			inet6_netconf_notify_devconf(net, RTM_NEWNETCONF,
 						     NETCONFA_MC_FORWARDING,
 						     NETCONFA_IFINDEX_ALL,
@@ -2097,7 +2096,7 @@ static int ip6mr_forward2(struct net *net, struct mr_table *mrt,
 	return 0;
 }
 
-/* Called with mrt_lock or rcu_read_lock() */
+/* Called with rcu_read_lock() */
 static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev)
 {
 	int ct;
-- 
2.37.0.rc0.104.g0611611a94-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks
  2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
                   ` (18 preceding siblings ...)
  2022-06-23  4:34 ` [PATCH v2 net-next 19/19] ip6mr: " Eric Dumazet
@ 2022-06-24 10:50 ` patchwork-bot+netdevbpf
  19 siblings, 0 replies; 25+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-06-24 10:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, eric.dumazet

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 23 Jun 2022 04:34:30 +0000 you wrote:
> We need to get rid of rwlocks in networking stacks,
> if read_lock() is (ab)used from softirq context.
> 
> As discussed recently [1], rwlock are unfair by design in this case,
> and writers can starve and trigger soft lockups.
> 
> This series convert ipmr code (both IPv4 and IPv6 families)
> to RCU and spinlocks.
> 
> [...]

Here is the summary with links:
  - [v2,net-next,01/19] ip6mr: do not get a device reference in pim6_rcv()
    https://git.kernel.org/netdev/net-next/c/0a24c43f54b2
  - [v2,net-next,02/19] ipmr: add rcu protection over (struct vif_device)->dev
    https://git.kernel.org/netdev/net-next/c/ebc3197963fc
  - [v2,net-next,03/19] ipmr: change igmpmsg_netlink_event() prototype
    https://git.kernel.org/netdev/net-next/c/0b490b51d226
  - [v2,net-next,04/19] ipmr: ipmr_cache_report() changes
    https://git.kernel.org/netdev/net-next/c/646679881a02
  - [v2,net-next,05/19] ipmr: do not acquire mrt_lock in __pim_rcv()
    https://git.kernel.org/netdev/net-next/c/121fefc669bf
  - [v2,net-next,06/19] ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT)
    https://git.kernel.org/netdev/net-next/c/559260fd9d9a
  - [v2,net-next,07/19] ipmr: do not acquire mrt_lock before calling ipmr_cache_unresolved()
    https://git.kernel.org/netdev/net-next/c/9094db4b8004
  - [v2,net-next,08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward()
    https://git.kernel.org/netdev/net-next/c/4eadb88244d1
  - [v2,net-next,09/19] ipmr: do not acquire mrt_lock in ipmr_get_route()
    https://git.kernel.org/netdev/net-next/c/e4cd9868e8ec
  - [v2,net-next,10/19] ip6mr: ip6mr_cache_report() changes
    https://git.kernel.org/netdev/net-next/c/3493a5b730e5
  - [v2,net-next,11/19] ip6mr: do not acquire mrt_lock in pim6_rcv()
    https://git.kernel.org/netdev/net-next/c/6d08658736fc
  - [v2,net-next,12/19] ip6mr: do not acquire mrt_lock in ioctl(SIOCGETMIFCNT_IN6)
    https://git.kernel.org/netdev/net-next/c/638cf4a24a09
  - [v2,net-next,13/19] ip6mr: do not acquire mrt_lock before calling ip6mr_cache_unresolved
    https://git.kernel.org/netdev/net-next/c/db9eb7c8ae34
  - [v2,net-next,14/19] ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()
    https://git.kernel.org/netdev/net-next/c/9b1c21d898fd
  - [v2,net-next,15/19] ip6mr: switch ip6mr_get_route() to rcu_read_lock()
    https://git.kernel.org/netdev/net-next/c/6fa40a290219
  - [v2,net-next,16/19] ipmr: adopt rcu_read_lock() in mr_dump()
    https://git.kernel.org/netdev/net-next/c/194366b28b83
  - [v2,net-next,17/19] ipmr: convert /proc handlers to rcu_read_lock()
    https://git.kernel.org/netdev/net-next/c/b96ef16d2f83
  - [v2,net-next,18/19] ipmr: convert mrt_lock to a spinlock
    https://git.kernel.org/netdev/net-next/c/3f55211ecf6a
  - [v2,net-next,19/19] ip6mr: convert mrt_lock to a spinlock
    https://git.kernel.org/netdev/net-next/c/a96f7a6a60b3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward()
  2022-06-23  4:34 ` [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward() Eric Dumazet
@ 2022-07-22 19:34   ` Vladimir Oltean
  2022-07-22 20:37     ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Vladimir Oltean @ 2022-07-22 19:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, eric.dumazet

Hi Eric,

On Thu, Jun 23, 2022 at 04:34:38AM +0000, Eric Dumazet wrote:
> ip_mr_forward() uses standard RCU protection already.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/ipmr.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 6ea54bc3d9b6555aaa9974d81ba4acd47481724f..b0f2e6d79d62273c8c2682f28cb45fe5ec3df6f3 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -1817,7 +1817,7 @@ static bool ipmr_forward_offloaded(struct sk_buff *skb, struct mr_table *mrt,
>  }
>  #endif
>  
> -/* Processing handlers for ipmr_forward */
> +/* Processing handlers for ipmr_forward, under rcu_read_lock() */
>  
>  static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
>  			    int in_vifi, struct sk_buff *skb, int vifi)
> @@ -1839,9 +1839,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
>  		WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
>  		vif_dev->stats.tx_bytes += skb->len;
>  		vif_dev->stats.tx_packets++;
> -		rcu_read_lock();
>  		ipmr_cache_report(mrt, skb, vifi, IGMPMSG_WHOLEPKT);
> -		rcu_read_unlock();
>  		goto out_free;
>  	}
>  
> @@ -1936,6 +1934,7 @@ static int ipmr_find_vif(const struct mr_table *mrt, struct net_device *dev)
>  }
>  
>  /* "local" means that we should preserve one skb (for local delivery) */
> +/* Called uner rcu_read_lock() */
>  static void ip_mr_forward(struct net *net, struct mr_table *mrt,
>  			  struct net_device *dev, struct sk_buff *skb,
>  			  struct mfc_cache *c, int local)
> @@ -1992,12 +1991,10 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
>  			       c->_c.mfc_un.res.last_assert +
>  			       MFC_ASSERT_THRESH)) {
>  			c->_c.mfc_un.res.last_assert = jiffies;
> -			rcu_read_lock();
>  			ipmr_cache_report(mrt, skb, true_vifi, IGMPMSG_WRONGVIF);
>  			if (mrt->mroute_do_wrvifwhole)
>  				ipmr_cache_report(mrt, skb, true_vifi,
>  						  IGMPMSG_WRVIFWHOLE);
> -			rcu_read_unlock();
>  		}
>  		goto dont_forward;
>  	}
> @@ -2169,9 +2166,7 @@ int ip_mr_input(struct sk_buff *skb)
>  		return -ENODEV;
>  	}
>  
> -	read_lock(&mrt_lock);
>  	ip_mr_forward(net, mrt, dev, skb, cache, local);
> -	read_unlock(&mrt_lock);
>  
>  	if (local)
>  		return ip_local_deliver(skb);
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 

Sorry for reporting this late, but there seems to be 1 call path from
which RCU is not watching in ip_mr_forward(). It's via ipmr_mfc_add() ->
ipmr_cache_resolve() -> ip_mr_forward().

The warning looks like this:

[  632.062382] =============================
[  632.068568] WARNING: suspicious RCU usage
[  632.073702] 5.19.0-rc7-07010-ga9b9500ffaac-dirty #3374 Not tainted
[  632.081098] -----------------------------
[  632.086216] net/ipv4/ipmr.c:1080 suspicious rcu_dereference_check() usage!
[  632.094152]
[  632.094152] other info that might help us debug this:
[  632.103349]
[  632.103349] rcu_scheduler_active = 2, debug_locks = 1
[  632.111011] 1 lock held by smcrouted/359:
[  632.116079]  #0: ffffd27b44d23770 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30
[  632.124703]
[  632.124703] stack backtrace:
[  632.129681] CPU: 0 PID: 359 Comm: smcrouted Not tainted 5.19.0-rc7-07010-ga9b9500ffaac-dirty #3374
[  632.143426] Call trace:
[  632.160542]  lockdep_rcu_suspicious+0xf8/0x10c
[  632.165014]  ipmr_cache_report+0x2f0/0x530
[  632.169137]  ip_mr_forward+0x364/0x38c
[  632.172909]  ipmr_mfc_add+0x894/0xc90
[  632.176592]  ip_mroute_setsockopt+0x6ac/0x950
[  632.180973]  ip_setsockopt+0x16a0/0x16ac
[  632.184921]  raw_setsockopt+0x110/0x184
[  632.188780]  sock_common_setsockopt+0x1c/0x2c
[  632.193163]  __sys_setsockopt+0x94/0x170
[  632.197111]  __arm64_sys_setsockopt+0x2c/0x40
[  632.201492]  invoke_syscall+0x48/0x114

I don't exactly understand the data structures that are used inside ip_mr_forward(),
so I'm unable to say what needs RCU protection and what is fine with the rtnl_mutex
that we are holding, just annotated poorly. Could you please take a look?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward()
  2022-07-22 19:34   ` Vladimir Oltean
@ 2022-07-22 20:37     ` Eric Dumazet
  2022-07-22 21:10       ` Vladimir Oltean
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2022-07-22 20:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, Eric Dumazet

On Fri, Jul 22, 2022 at 9:34 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> Hi Eric,
>
> On Thu, Jun 23, 2022 at 04:34:38AM +0000, Eric Dumazet wrote:
> > ip_mr_forward() uses standard RCU protection already.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/ipmr.c | 9 ++-------
> >  1 file changed, 2 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> > index 6ea54bc3d9b6555aaa9974d81ba4acd47481724f..b0f2e6d79d62273c8c2682f28cb45fe5ec3df6f3 100644
> > --- a/net/ipv4/ipmr.c
> > +++ b/net/ipv4/ipmr.c
> > @@ -1817,7 +1817,7 @@ static bool ipmr_forward_offloaded(struct sk_buff *skb, struct mr_table *mrt,
> >  }
> >  #endif
> >
> > -/* Processing handlers for ipmr_forward */
> > +/* Processing handlers for ipmr_forward, under rcu_read_lock() */
> >
> >  static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
> >                           int in_vifi, struct sk_buff *skb, int vifi)
> > @@ -1839,9 +1839,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
> >               WRITE_ONCE(vif->bytes_out, vif->bytes_out + skb->len);
> >               vif_dev->stats.tx_bytes += skb->len;
> >               vif_dev->stats.tx_packets++;
> > -             rcu_read_lock();
> >               ipmr_cache_report(mrt, skb, vifi, IGMPMSG_WHOLEPKT);
> > -             rcu_read_unlock();
> >               goto out_free;
> >       }
> >
> > @@ -1936,6 +1934,7 @@ static int ipmr_find_vif(const struct mr_table *mrt, struct net_device *dev)
> >  }
> >
> >  /* "local" means that we should preserve one skb (for local delivery) */
> > +/* Called uner rcu_read_lock() */
> >  static void ip_mr_forward(struct net *net, struct mr_table *mrt,
> >                         struct net_device *dev, struct sk_buff *skb,
> >                         struct mfc_cache *c, int local)
> > @@ -1992,12 +1991,10 @@ static void ip_mr_forward(struct net *net, struct mr_table *mrt,
> >                              c->_c.mfc_un.res.last_assert +
> >                              MFC_ASSERT_THRESH)) {
> >                       c->_c.mfc_un.res.last_assert = jiffies;
> > -                     rcu_read_lock();
> >                       ipmr_cache_report(mrt, skb, true_vifi, IGMPMSG_WRONGVIF);
> >                       if (mrt->mroute_do_wrvifwhole)
> >                               ipmr_cache_report(mrt, skb, true_vifi,
> >                                                 IGMPMSG_WRVIFWHOLE);
> > -                     rcu_read_unlock();
> >               }
> >               goto dont_forward;
> >       }
> > @@ -2169,9 +2166,7 @@ int ip_mr_input(struct sk_buff *skb)
> >               return -ENODEV;
> >       }
> >
> > -     read_lock(&mrt_lock);
> >       ip_mr_forward(net, mrt, dev, skb, cache, local);
> > -     read_unlock(&mrt_lock);
> >
> >       if (local)
> >               return ip_local_deliver(skb);
> > --
> > 2.37.0.rc0.104.g0611611a94-goog
> >
>
> Sorry for reporting this late, but there seems to be 1 call path from
> which RCU is not watching in ip_mr_forward(). It's via ipmr_mfc_add() ->
> ipmr_cache_resolve() -> ip_mr_forward().
>
> The warning looks like this:
>
> [  632.062382] =============================
> [  632.068568] WARNING: suspicious RCU usage
> [  632.073702] 5.19.0-rc7-07010-ga9b9500ffaac-dirty #3374 Not tainted
> [  632.081098] -----------------------------
> [  632.086216] net/ipv4/ipmr.c:1080 suspicious rcu_dereference_check() usage!
> [  632.094152]
> [  632.094152] other info that might help us debug this:
> [  632.103349]
> [  632.103349] rcu_scheduler_active = 2, debug_locks = 1
> [  632.111011] 1 lock held by smcrouted/359:
> [  632.116079]  #0: ffffd27b44d23770 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30
> [  632.124703]
> [  632.124703] stack backtrace:
> [  632.129681] CPU: 0 PID: 359 Comm: smcrouted Not tainted 5.19.0-rc7-07010-ga9b9500ffaac-dirty #3374
> [  632.143426] Call trace:
> [  632.160542]  lockdep_rcu_suspicious+0xf8/0x10c
> [  632.165014]  ipmr_cache_report+0x2f0/0x530
> [  632.169137]  ip_mr_forward+0x364/0x38c
> [  632.172909]  ipmr_mfc_add+0x894/0xc90
> [  632.176592]  ip_mroute_setsockopt+0x6ac/0x950
> [  632.180973]  ip_setsockopt+0x16a0/0x16ac
> [  632.184921]  raw_setsockopt+0x110/0x184
> [  632.188780]  sock_common_setsockopt+0x1c/0x2c
> [  632.193163]  __sys_setsockopt+0x94/0x170
> [  632.197111]  __arm64_sys_setsockopt+0x2c/0x40
> [  632.201492]  invoke_syscall+0x48/0x114
>
> I don't exactly understand the data structures that are used inside ip_mr_forward(),
> so I'm unable to say what needs RCU protection and what is fine with the rtnl_mutex
> that we are holding, just annotated poorly. Could you please take a look?

Thanks for the report.

I guess there are multiple ways to solve this issue, one being:

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 73651d17e51f31c8755da6ac3c1c2763a99b1117..1c288a7b60132365c072874d1f811b70679a2bcb
100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1004,7 +1004,9 @@ static void ipmr_cache_resolve(struct net *net,
struct mr_table *mrt,

                        rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
                } else {
+                       rcu_read_lock();
                        ip_mr_forward(net, mrt, skb->dev, skb, c, 0);
+                       rcu_read_unlock();
                }
        }
 }
@@ -1933,7 +1935,7 @@ static int ipmr_find_vif(const struct mr_table
*mrt, struct net_device *dev)
 }

 /* "local" means that we should preserve one skb (for local delivery) */
-/* Called uner rcu_read_lock() */
+/* Called under rcu_read_lock() */
 static void ip_mr_forward(struct net *net, struct mr_table *mrt,
                          struct net_device *dev, struct sk_buff *skb,
                          struct mfc_cache *c, int local)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward()
  2022-07-22 20:37     ` Eric Dumazet
@ 2022-07-22 21:10       ` Vladimir Oltean
  2022-07-22 21:15         ` Vladimir Oltean
  0 siblings, 1 reply; 25+ messages in thread
From: Vladimir Oltean @ 2022-07-22 21:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, Eric Dumazet

On Fri, Jul 22, 2022 at 10:37:24PM +0200, Eric Dumazet wrote:
> Thanks for the report.
> 
> I guess there are multiple ways to solve this issue, one being:
> 
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 73651d17e51f31c8755da6ac3c1c2763a99b1117..1c288a7b60132365c072874d1f811b70679a2bcb
> 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -1004,7 +1004,9 @@ static void ipmr_cache_resolve(struct net *net,
> struct mr_table *mrt,
> 
>                         rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
>                 } else {
> +                       rcu_read_lock();
>                         ip_mr_forward(net, mrt, skb->dev, skb, c, 0);
> +                       rcu_read_unlock();
>                 }
>         }
>  }
> @@ -1933,7 +1935,7 @@ static int ipmr_find_vif(const struct mr_table
> *mrt, struct net_device *dev)
>  }
> 
>  /* "local" means that we should preserve one skb (for local delivery) */
> -/* Called uner rcu_read_lock() */
> +/* Called under rcu_read_lock() */
>  static void ip_mr_forward(struct net *net, struct mr_table *mrt,
>                           struct net_device *dev, struct sk_buff *skb,
>                           struct mfc_cache *c, int local)

It sure makes lockdep stop complaining...

I just noticed that we appear to have the same problem with the
equivalent call path for ipv6: ip6mr_mfc_add -> ip6mr_cache_resolve ->
ip6_mr_forward, although I don't have smcroute or the kernel configured
for any IPv6 multicast routes right now, so I can't say for sure.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward()
  2022-07-22 21:10       ` Vladimir Oltean
@ 2022-07-22 21:15         ` Vladimir Oltean
  0 siblings, 0 replies; 25+ messages in thread
From: Vladimir Oltean @ 2022-07-22 21:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, Eric Dumazet

On Sat, Jul 23, 2022 at 12:10:05AM +0300, Vladimir Oltean wrote:
> I just noticed that we appear to have the same problem with the
> equivalent call path for ipv6: ip6mr_mfc_add -> ip6mr_cache_resolve ->
> ip6_mr_forward, although I don't have smcroute or the kernel configured
> for any IPv6 multicast routes right now, so I can't say for sure.

Not to mention ip6_mr_forward() has a random rcu_read_unlock() thrown in
there, with no paired lock(), left from who knows what refactoring...
I don't think I'll be able to report all the locking problems with the
IP multicast routing code, maybe someone with more familiarity should
take a look there :-/

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-07-22 21:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-23  4:34 [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 01/19] ip6mr: do not get a device reference in pim6_rcv() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 02/19] ipmr: add rcu protection over (struct vif_device)->dev Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 03/19] ipmr: change igmpmsg_netlink_event() prototype Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 04/19] ipmr: ipmr_cache_report() changes Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 05/19] ipmr: do not acquire mrt_lock in __pim_rcv() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 06/19] ipmr: do not acquire mrt_lock in ioctl(SIOCGETVIFCNT) Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 07/19] ipmr: do not acquire mrt_lock before calling ipmr_cache_unresolved() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 08/19] ipmr: do not acquire mrt_lock while calling ip_mr_forward() Eric Dumazet
2022-07-22 19:34   ` Vladimir Oltean
2022-07-22 20:37     ` Eric Dumazet
2022-07-22 21:10       ` Vladimir Oltean
2022-07-22 21:15         ` Vladimir Oltean
2022-06-23  4:34 ` [PATCH v2 net-next 09/19] ipmr: do not acquire mrt_lock in ipmr_get_route() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 10/19] ip6mr: ip6mr_cache_report() changes Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 11/19] ip6mr: do not acquire mrt_lock in pim6_rcv() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 12/19] ip6mr: do not acquire mrt_lock in ioctl(SIOCGETMIFCNT_IN6) Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 13/19] ip6mr: do not acquire mrt_lock before calling ip6mr_cache_unresolved Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 14/19] ip6mr: do not acquire mrt_lock while calling ip6_mr_forward() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 15/19] ip6mr: switch ip6mr_get_route() to rcu_read_lock() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 16/19] ipmr: adopt rcu_read_lock() in mr_dump() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 17/19] ipmr: convert /proc handlers to rcu_read_lock() Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 18/19] ipmr: convert mrt_lock to a spinlock Eric Dumazet
2022-06-23  4:34 ` [PATCH v2 net-next 19/19] ip6mr: " Eric Dumazet
2022-06-24 10:50 ` [PATCH v2 net-next 00/19] ipmr: get rid of rwlocks patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.