All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal
@ 2020-09-09 17:37 Jakub Kicinski
  2020-09-09 17:37 ` [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API Jakub Kicinski
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jakub Kicinski @ 2020-09-09 17:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, kernel-team, Jakub Kicinski

Hi!

This series is a follow-up to the fix in commit 96e97bc07e90 ("net: 
disable netpoll on fresh napis"). To avoid any latent race conditions
convert dev->napi_list to a proper RCU list. We need minor restructuring
because it looks like netif_napi_del() used to be idempotent, and
it may be quite hard to track down everyone who depends on that.

Jakub Kicinski (3):
  net: remove napi_hash_del() from driver-facing API
  net: manage napi add/del idempotence explicitly
  net: make sure napi_list is safe for RCU traversal

 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |  8 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  5 ++-
 drivers/net/ethernet/cisco/enic/enic_main.c   | 12 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  4 +--
 .../net/ethernet/myricom/myri10ge/myri10ge.c  |  5 ++-
 drivers/net/veth.c                            |  3 +-
 drivers/net/virtio_net.c                      |  7 ++--
 include/linux/netdevice.h                     | 36 +++++++++----------
 net/core/dev.c                                | 32 ++++++++---------
 net/core/netpoll.c                            |  2 +-
 10 files changed, 55 insertions(+), 59 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API
  2020-09-09 17:37 [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal Jakub Kicinski
@ 2020-09-09 17:37 ` Jakub Kicinski
  2020-11-24 18:00   ` Eric Dumazet
  2020-09-09 17:37 ` [PATCH net-next 2/3] net: manage napi add/del idempotence explicitly Jakub Kicinski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2020-09-09 17:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, kernel-team, Jakub Kicinski

We allow drivers to call napi_hash_del() before calling
netif_napi_del() to batch RCU grace periods. This makes
the API asymmetric and leaks internal implementation details.
Soon we will want the grace period to protect more than just
the NAPI hash table.

Restructure the API and have drivers call a new function -
__netif_napi_del() if they want to take care of RCU waits.

Note that only core was checking the return status from
napi_hash_del() so the new helper does not report if the
NAPI was actually deleted.

Some notes on driver oddness:
 - veth observed the grace period before calling netif_napi_del()
   but that should not matter
 - myri10ge observed normal RCU flavor
 - bnx2x and enic did not actually observe the grace period
   (unless they did so implicitly)
 - virtio_net and enic only unhashed Rx NAPIs

The last two points seem to indicate that the calls to
napi_hash_del() were a left over rather than an optimization.
Regardless, it's easy enough to correct them.

This patch may introduce extra synchronize_net() calls for
interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
free_netdev() to call netif_napi_del(). This seems inevitable
since we want to use RCU for netpoll dev->napi_list traversal,
and almost no drivers set IFF_DISABLE_NETPOLL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |  8 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  5 ++-
 drivers/net/ethernet/cisco/enic/enic_main.c   | 12 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  4 +--
 .../net/ethernet/myricom/myri10ge/myri10ge.c  |  5 ++-
 drivers/net/veth.c                            |  3 +-
 drivers/net/virtio_net.c                      |  7 ++--
 include/linux/netdevice.h                     | 32 +++++++++----------
 net/core/dev.c                                | 19 ++++-------
 9 files changed, 43 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 7e4c93be4451..d8b1824c334d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -825,9 +825,9 @@ static inline void bnx2x_del_all_napi_cnic(struct bnx2x *bp)
 	int i;
 
 	for_each_rx_queue_cnic(bp, i) {
-		napi_hash_del(&bnx2x_fp(bp, i, napi));
-		netif_napi_del(&bnx2x_fp(bp, i, napi));
+		__netif_napi_del(&bnx2x_fp(bp, i, napi));
 	}
+	synchronize_net();
 }
 
 static inline void bnx2x_del_all_napi(struct bnx2x *bp)
@@ -835,9 +835,9 @@ static inline void bnx2x_del_all_napi(struct bnx2x *bp)
 	int i;
 
 	for_each_eth_queue(bp, i) {
-		napi_hash_del(&bnx2x_fp(bp, i, napi));
-		netif_napi_del(&bnx2x_fp(bp, i, napi));
+		__netif_napi_del(&bnx2x_fp(bp, i, napi));
 	}
+	synchronize_net();
 }
 
 int bnx2x_set_int_mode(struct bnx2x *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b167066af450..53f64ca673c3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8634,10 +8634,9 @@ static void bnxt_del_napi(struct bnxt *bp)
 	for (i = 0; i < bp->cp_nr_rings; i++) {
 		struct bnxt_napi *bnapi = bp->bnapi[i];
 
-		napi_hash_del(&bnapi->napi);
-		netif_napi_del(&bnapi->napi);
+		__netif_napi_del(&bnapi->napi);
 	}
-	/* We called napi_hash_del() before netif_napi_del(), we need
+	/* We called __netif_napi_del(), we need
 	 * to respect an RCU grace period before freeing napi structures.
 	 */
 	synchronize_net();
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index a4e307636f5a..4f0329d8778f 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2529,13 +2529,15 @@ static void enic_dev_deinit(struct enic *enic)
 {
 	unsigned int i;
 
-	for (i = 0; i < enic->rq_count; i++) {
-		napi_hash_del(&enic->napi[i]);
-		netif_napi_del(&enic->napi[i]);
-	}
+	for (i = 0; i < enic->rq_count; i++)
+		__netif_napi_del(&enic->napi[i]);
+
 	if (vnic_dev_get_intr_mode(enic->vdev) == VNIC_DEV_INTR_MODE_MSIX)
 		for (i = 0; i < enic->wq_count; i++)
-			netif_napi_del(&enic->napi[enic_cq_wq(enic, i)]);
+			__netif_napi_del(&enic->napi[enic_cq_wq(enic, i)]);
+
+	/* observe RCU grace period after __netif_napi_del() calls */
+	synchronize_net();
 
 	enic_free_vnic_resources(enic);
 	enic_clear_intr_mode(enic);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 2e35c5706cf1..df389a11d3af 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -1029,10 +1029,10 @@ static void ixgbe_free_q_vector(struct ixgbe_adapter *adapter, int v_idx)
 		WRITE_ONCE(adapter->rx_ring[ring->queue_index], NULL);
 
 	adapter->q_vector[v_idx] = NULL;
-	napi_hash_del(&q_vector->napi);
-	netif_napi_del(&q_vector->napi);
+	__netif_napi_del(&q_vector->napi);
 
 	/*
+	 * after a call to __netif_napi_del() napi may still be used and
 	 * ixgbe_get_stats64() might access the rings on this vector,
 	 * we must wait a grace period before freeing it.
 	 */
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 4a5beafa0493..1634ca6d4a8f 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -3543,11 +3543,10 @@ static void myri10ge_free_slices(struct myri10ge_priv *mgp)
 					  ss->fw_stats, ss->fw_stats_bus);
 			ss->fw_stats = NULL;
 		}
-		napi_hash_del(&ss->napi);
-		netif_napi_del(&ss->napi);
+		__netif_napi_del(&ss->napi);
 	}
 	/* Wait till napi structs are no longer used, and then free ss. */
-	synchronize_rcu();
+	synchronize_net();
 	kfree(mgp->ss);
 	mgp->ss = NULL;
 }
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 7de8f0ea3f6b..091e5b4ba042 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -897,14 +897,13 @@ static void veth_napi_del(struct net_device *dev)
 		struct veth_rq *rq = &priv->rq[i];
 
 		napi_disable(&rq->xdp_napi);
-		napi_hash_del(&rq->xdp_napi);
+		__netif_napi_del(&rq->xdp_napi);
 	}
 	synchronize_net();
 
 	for (i = 0; i < dev->real_num_rx_queues; i++) {
 		struct veth_rq *rq = &priv->rq[i];
 
-		netif_napi_del(&rq->xdp_napi);
 		rq->rx_notify_masked = false;
 		ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
 	}
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 263b005981bd..7145c83c6c8c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2604,12 +2604,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	int i;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		napi_hash_del(&vi->rq[i].napi);
-		netif_napi_del(&vi->rq[i].napi);
-		netif_napi_del(&vi->sq[i].napi);
+		__netif_napi_del(&vi->rq[i].napi);
+		__netif_napi_del(&vi->sq[i].napi);
 	}
 
-	/* We called napi_hash_del() before netif_napi_del(),
+	/* We called __netif_napi_del(),
 	 * we need to respect an RCU grace period before freeing vi->rq
 	 */
 	synchronize_net();
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7f9fcfd15942..74ed95215091 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -70,6 +70,7 @@ struct udp_tunnel_nic;
 struct bpf_prog;
 struct xdp_buff;
 
+void synchronize_net(void);
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
 
@@ -488,20 +489,6 @@ static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
-/**
- *	napi_hash_del - remove a NAPI from global table
- *	@napi: NAPI context
- *
- * Warning: caller must observe RCU grace period
- * before freeing memory containing @napi, if
- * this function returns true.
- * Note: core networking stack automatically calls it
- * from netif_napi_del().
- * Drivers might want to call this helper to combine all
- * the needed RCU grace periods into a single one.
- */
-bool napi_hash_del(struct napi_struct *napi);
-
 /**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: NAPI context
@@ -2367,13 +2354,27 @@ static inline void netif_tx_napi_add(struct net_device *dev,
 	netif_napi_add(dev, napi, poll, weight);
 }
 
+/**
+ *  __netif_napi_del - remove a NAPI context
+ *  @napi: NAPI context
+ *
+ * Warning: caller must observe RCU grace period before freeing memory
+ * containing @napi. Drivers might want to call this helper to combine
+ * all the needed RCU grace periods into a single one.
+ */
+void __netif_napi_del(struct napi_struct *napi);
+
 /**
  *  netif_napi_del - remove a NAPI context
  *  @napi: NAPI context
  *
  *  netif_napi_del() removes a NAPI context from the network device NAPI list
  */
-void netif_napi_del(struct napi_struct *napi);
+static inline void netif_napi_del(struct napi_struct *napi)
+{
+	__netif_napi_del(napi);
+	synchronize_net();
+}
 
 struct napi_gro_cb {
 	/* Virtual address of skb_shinfo(skb)->frags[0].page + offset. */
@@ -2797,7 +2798,6 @@ static inline void unregister_netdevice(struct net_device *dev)
 int netdev_refcnt_read(const struct net_device *dev);
 void free_netdev(struct net_device *dev);
 void netdev_freemem(struct net_device *dev);
-void synchronize_net(void);
 int init_dummy_netdev(struct net_device *dev);
 
 struct net_device *netdev_get_xmit_slave(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 152ad3b578de..205b7339d4ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6555,20 +6555,15 @@ static void napi_hash_add(struct napi_struct *napi)
 /* Warning : caller is responsible to make sure rcu grace period
  * is respected before freeing memory containing @napi
  */
-bool napi_hash_del(struct napi_struct *napi)
+static void napi_hash_del(struct napi_struct *napi)
 {
-	bool rcu_sync_needed = false;
-
 	spin_lock(&napi_hash_lock);
 
-	if (test_and_clear_bit(NAPI_STATE_HASHED, &napi->state)) {
-		rcu_sync_needed = true;
+	if (test_and_clear_bit(NAPI_STATE_HASHED, &napi->state))
 		hlist_del_rcu(&napi->napi_hash_node);
-	}
+
 	spin_unlock(&napi_hash_lock);
-	return rcu_sync_needed;
 }
-EXPORT_SYMBOL_GPL(napi_hash_del);
 
 static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 {
@@ -6653,18 +6648,16 @@ static void flush_gro_hash(struct napi_struct *napi)
 }
 
 /* Must be called in process context */
-void netif_napi_del(struct napi_struct *napi)
+void __netif_napi_del(struct napi_struct *napi)
 {
-	might_sleep();
-	if (napi_hash_del(napi))
-		synchronize_net();
+	napi_hash_del(napi);
 	list_del_init(&napi->dev_list);
 	napi_free_frags(napi);
 
 	flush_gro_hash(napi);
 	napi->gro_bitmask = 0;
 }
-EXPORT_SYMBOL(netif_napi_del);
+EXPORT_SYMBOL(__netif_napi_del);
 
 static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 2/3] net: manage napi add/del idempotence explicitly
  2020-09-09 17:37 [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal Jakub Kicinski
  2020-09-09 17:37 ` [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API Jakub Kicinski
@ 2020-09-09 17:37 ` Jakub Kicinski
  2020-09-09 17:37 ` [PATCH net-next 3/3] net: make sure napi_list is safe for RCU traversal Jakub Kicinski
  2020-09-10 20:09 ` [PATCH net-next 0/3] netpoll: " David Miller
  3 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2020-09-09 17:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, kernel-team, Jakub Kicinski

To RCUify napi->dev_list we need to replace list_del_init()
with list_del_rcu(). There is no _init() version for RCU for
obvious reasons. Up until now netif_napi_del() was idempotent
so to make sure it remains such add a bit which is set when
NAPI is listed, and cleared when it removed. Since we don't
expect multiple calls to netif_napi_add() to be correct,
add a warning on that side.

Now that napi_hash_add / napi_hash_del are only called by
napi_add / del we can actually steal its bit. We just need
to make sure hash node is initialized correctly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/netdevice.h |  4 ++--
 net/core/dev.c            | 13 +++++++++----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74ed95215091..157e0242e9ee 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -355,7 +355,7 @@ enum {
 	NAPI_STATE_MISSED,	/* reschedule a napi */
 	NAPI_STATE_DISABLE,	/* Disable pending */
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
-	NAPI_STATE_HASHED,	/* In NAPI hash (busy polling possible) */
+	NAPI_STATE_LISTED,	/* NAPI added to system lists */
 	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
 };
@@ -365,7 +365,7 @@ enum {
 	NAPIF_STATE_MISSED	 = BIT(NAPI_STATE_MISSED),
 	NAPIF_STATE_DISABLE	 = BIT(NAPI_STATE_DISABLE),
 	NAPIF_STATE_NPSVC	 = BIT(NAPI_STATE_NPSVC),
-	NAPIF_STATE_HASHED	 = BIT(NAPI_STATE_HASHED),
+	NAPIF_STATE_LISTED	 = BIT(NAPI_STATE_LISTED),
 	NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 205b7339d4ae..e0a1be986824 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6533,8 +6533,7 @@ EXPORT_SYMBOL(napi_busy_loop);
 
 static void napi_hash_add(struct napi_struct *napi)
 {
-	if (test_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state) ||
-	    test_and_set_bit(NAPI_STATE_HASHED, &napi->state))
+	if (test_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state))
 		return;
 
 	spin_lock(&napi_hash_lock);
@@ -6559,8 +6558,7 @@ static void napi_hash_del(struct napi_struct *napi)
 {
 	spin_lock(&napi_hash_lock);
 
-	if (test_and_clear_bit(NAPI_STATE_HASHED, &napi->state))
-		hlist_del_rcu(&napi->napi_hash_node);
+	hlist_del_init_rcu(&napi->napi_hash_node);
 
 	spin_unlock(&napi_hash_lock);
 }
@@ -6595,7 +6593,11 @@ static void init_gro_hash(struct napi_struct *napi)
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
+	if (WARN_ON(test_and_set_bit(NAPI_STATE_LISTED, &napi->state)))
+		return;
+
 	INIT_LIST_HEAD(&napi->poll_list);
+	INIT_HLIST_NODE(&napi->napi_hash_node);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
 	init_gro_hash(napi);
@@ -6650,6 +6652,9 @@ static void flush_gro_hash(struct napi_struct *napi)
 /* Must be called in process context */
 void __netif_napi_del(struct napi_struct *napi)
 {
+	if (!test_and_clear_bit(NAPI_STATE_LISTED, &napi->state))
+		return;
+
 	napi_hash_del(napi);
 	list_del_init(&napi->dev_list);
 	napi_free_frags(napi);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 3/3] net: make sure napi_list is safe for RCU traversal
  2020-09-09 17:37 [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal Jakub Kicinski
  2020-09-09 17:37 ` [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API Jakub Kicinski
  2020-09-09 17:37 ` [PATCH net-next 2/3] net: manage napi add/del idempotence explicitly Jakub Kicinski
@ 2020-09-09 17:37 ` Jakub Kicinski
  2020-09-10 20:09 ` [PATCH net-next 0/3] netpoll: " David Miller
  3 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2020-09-09 17:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, kernel-team, Jakub Kicinski

netpoll needs to traverse dev->napi_list under RCU, make
sure it uses the right iterator and that removal from this
list is handled safely.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/dev.c     | 2 +-
 net/core/netpoll.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e0a1be986824..03624192862a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6656,7 +6656,7 @@ void __netif_napi_del(struct napi_struct *napi)
 		return;
 
 	napi_hash_del(napi);
-	list_del_init(&napi->dev_list);
+	list_del_rcu(&napi->dev_list);
 	napi_free_frags(napi);
 
 	flush_gro_hash(napi);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2338753e936b..c310c7c1cef7 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -297,7 +297,7 @@ static int netpoll_owner_active(struct net_device *dev)
 {
 	struct napi_struct *napi;
 
-	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
 		if (napi->poll_owner == smp_processor_id())
 			return 1;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal
  2020-09-09 17:37 [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal Jakub Kicinski
                   ` (2 preceding siblings ...)
  2020-09-09 17:37 ` [PATCH net-next 3/3] net: make sure napi_list is safe for RCU traversal Jakub Kicinski
@ 2020-09-10 20:09 ` David Miller
  3 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2020-09-10 20:09 UTC (permalink / raw)
  To: kuba; +Cc: netdev, eric.dumazet, kernel-team

From: Jakub Kicinski <kuba@kernel.org>
Date: Wed,  9 Sep 2020 10:37:50 -0700

> This series is a follow-up to the fix in commit 96e97bc07e90 ("net: 
> disable netpoll on fresh napis"). To avoid any latent race conditions
> convert dev->napi_list to a proper RCU list. We need minor restructuring
> because it looks like netif_napi_del() used to be idempotent, and
> it may be quite hard to track down everyone who depends on that.

Series applied, thanks Jakub.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API
  2020-09-09 17:37 ` [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API Jakub Kicinski
@ 2020-11-24 18:00   ` Eric Dumazet
  2020-11-24 18:54     ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2020-11-24 18:00 UTC (permalink / raw)
  To: Jakub Kicinski, davem; +Cc: netdev, eric.dumazet, kernel-team



On 9/9/20 7:37 PM, Jakub Kicinski wrote:
> We allow drivers to call napi_hash_del() before calling
> netif_napi_del() to batch RCU grace periods. This makes
> the API asymmetric and leaks internal implementation details.
> Soon we will want the grace period to protect more than just
> the NAPI hash table.
> 
> Restructure the API and have drivers call a new function -
> __netif_napi_del() if they want to take care of RCU waits.
> 
> Note that only core was checking the return status from
> napi_hash_del() so the new helper does not report if the
> NAPI was actually deleted.
> 
> Some notes on driver oddness:
>  - veth observed the grace period before calling netif_napi_del()
>    but that should not matter
>  - myri10ge observed normal RCU flavor
>  - bnx2x and enic did not actually observe the grace period
>    (unless they did so implicitly)
>  - virtio_net and enic only unhashed Rx NAPIs
> 
> The last two points seem to indicate that the calls to
> napi_hash_del() were a left over rather than an optimization.
> Regardless, it's easy enough to correct them.
> 
> This patch may introduce extra synchronize_net() calls for
> interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
> free_netdev() to call netif_napi_del(). This seems inevitable
> since we want to use RCU for netpoll dev->napi_list traversal,
> and almost no drivers set IFF_DISABLE_NETPOLL.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

After this patch, gro_cells_destroy() became damn slow
on hosts with a lot of cores.

After your change, we have one additional synchronize_net() per cpu as
you stated in your changelog.

gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough
to not have one synchronize_net() call per netif_napi_del()

I will test something like :
I am not yet convinced the synchronize_net() is needed, since these
NAPI structs are not involved in busy polling.


diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index e095fb871d9120787bfdf62149f4d82e0e3b0a51..8cfa6ce0738977290cc9f76a3f5daa617308e107 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -99,9 +99,10 @@ void gro_cells_destroy(struct gro_cells *gcells)
                struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
 
                napi_disable(&cell->napi);
-               netif_napi_del(&cell->napi);
+               __netif_napi_del(&cell->napi);
                __skb_queue_purge(&cell->napi_skbs);
        }
+       synchronize_net();
        free_percpu(gcells->cells);
        gcells->cells = NULL;
 }



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API
  2020-11-24 18:00   ` Eric Dumazet
@ 2020-11-24 18:54     ` Jakub Kicinski
  2020-11-24 19:11       ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2020-11-24 18:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, kernel-team

On Tue, 24 Nov 2020 19:00:50 +0100 Eric Dumazet wrote:
> On 9/9/20 7:37 PM, Jakub Kicinski wrote:
> > We allow drivers to call napi_hash_del() before calling
> > netif_napi_del() to batch RCU grace periods. This makes
> > the API asymmetric and leaks internal implementation details.
> > Soon we will want the grace period to protect more than just
> > the NAPI hash table.
> > 
> > Restructure the API and have drivers call a new function -
> > __netif_napi_del() if they want to take care of RCU waits.
> > 
> > Note that only core was checking the return status from
> > napi_hash_del() so the new helper does not report if the
> > NAPI was actually deleted.
> > 
> > Some notes on driver oddness:
> >  - veth observed the grace period before calling netif_napi_del()
> >    but that should not matter
> >  - myri10ge observed normal RCU flavor
> >  - bnx2x and enic did not actually observe the grace period
> >    (unless they did so implicitly)
> >  - virtio_net and enic only unhashed Rx NAPIs
> > 
> > The last two points seem to indicate that the calls to
> > napi_hash_del() were a left over rather than an optimization.
> > Regardless, it's easy enough to correct them.
> > 
> > This patch may introduce extra synchronize_net() calls for
> > interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
> > free_netdev() to call netif_napi_del(). This seems inevitable
> > since we want to use RCU for netpoll dev->napi_list traversal,
> > and almost no drivers set IFF_DISABLE_NETPOLL.
> > 
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
> 
> After this patch, gro_cells_destroy() became damn slow
> on hosts with a lot of cores.
> 
> After your change, we have one additional synchronize_net() per cpu as
> you stated in your changelog.

Sorry :S  I hope it didn't waste too much of your time..

> gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough
> to not have one synchronize_net() call per netif_napi_del()
> 
> I will test something like :
> I am not yet convinced the synchronize_net() is needed, since these
> NAPI structs are not involved in busy polling.

IDK how this squares against netpoll, though?

> diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
> index e095fb871d9120787bfdf62149f4d82e0e3b0a51..8cfa6ce0738977290cc9f76a3f5daa617308e107 100644
> --- a/net/core/gro_cells.c
> +++ b/net/core/gro_cells.c
> @@ -99,9 +99,10 @@ void gro_cells_destroy(struct gro_cells *gcells)
>                 struct gro_cell *cell = per_cpu_ptr(gcells->cells, i);
>  
>                 napi_disable(&cell->napi);
> -               netif_napi_del(&cell->napi);
> +               __netif_napi_del(&cell->napi);
>                 __skb_queue_purge(&cell->napi_skbs);
>         }
> +       synchronize_net();
>         free_percpu(gcells->cells);
>         gcells->cells = NULL;
>  }
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API
  2020-11-24 18:54     ` Jakub Kicinski
@ 2020-11-24 19:11       ` Eric Dumazet
  2020-11-24 19:36         ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2020-11-24 19:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, kernel-team



On 11/24/20 7:54 PM, Jakub Kicinski wrote:
> On Tue, 24 Nov 2020 19:00:50 +0100 Eric Dumazet wrote:
>> On 9/9/20 7:37 PM, Jakub Kicinski wrote:
>>> We allow drivers to call napi_hash_del() before calling
>>> netif_napi_del() to batch RCU grace periods. This makes
>>> the API asymmetric and leaks internal implementation details.
>>> Soon we will want the grace period to protect more than just
>>> the NAPI hash table.
>>>
>>> Restructure the API and have drivers call a new function -
>>> __netif_napi_del() if they want to take care of RCU waits.
>>>
>>> Note that only core was checking the return status from
>>> napi_hash_del() so the new helper does not report if the
>>> NAPI was actually deleted.
>>>
>>> Some notes on driver oddness:
>>>  - veth observed the grace period before calling netif_napi_del()
>>>    but that should not matter
>>>  - myri10ge observed normal RCU flavor
>>>  - bnx2x and enic did not actually observe the grace period
>>>    (unless they did so implicitly)
>>>  - virtio_net and enic only unhashed Rx NAPIs
>>>
>>> The last two points seem to indicate that the calls to
>>> napi_hash_del() were a left over rather than an optimization.
>>> Regardless, it's easy enough to correct them.
>>>
>>> This patch may introduce extra synchronize_net() calls for
>>> interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
>>> free_netdev() to call netif_napi_del(). This seems inevitable
>>> since we want to use RCU for netpoll dev->napi_list traversal,
>>> and almost no drivers set IFF_DISABLE_NETPOLL.
>>>
>>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
>>
>> After this patch, gro_cells_destroy() became damn slow
>> on hosts with a lot of cores.
>>
>> After your change, we have one additional synchronize_net() per cpu as
>> you stated in your changelog.
> 
> Sorry :S  I hope it didn't waste too much of your time..

Do not worry ;)

> 
>> gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough
>> to not have one synchronize_net() call per netif_napi_del()
>>
>> I will test something like :
>> I am not yet convinced the synchronize_net() is needed, since these
>> NAPI structs are not involved in busy polling.
> 
> IDK how this squares against netpoll, though?
> 

Can we actually attach netpoll to a virtual device using gro_cells ?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API
  2020-11-24 19:11       ` Eric Dumazet
@ 2020-11-24 19:36         ` Jakub Kicinski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2020-11-24 19:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, kernel-team

On Tue, 24 Nov 2020 20:11:42 +0100 Eric Dumazet wrote:
> >> gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough
> >> to not have one synchronize_net() call per netif_napi_del()
> >>
> >> I will test something like :
> >> I am not yet convinced the synchronize_net() is needed, since these
> >> NAPI structs are not involved in busy polling.  
> > 
> > IDK how this squares against netpoll, though?
> >   
> 
> Can we actually attach netpoll to a virtual device using gro_cells ?

Not 100% sure but I grabbed a machine running 5.2 and it appears to
work:

# ip tunnel add gre1 mode gre local 192.168.123.1 remote 192.168.123.2 ttl 255
# ip link set gre1 up
# ip addr add 10.12.34.56/30 dev gre1

# cd /sys/kernel/config/netconsole/
# cat enabled 
0
# echo gre1 > dev_name 
# echo 10.12.34.57 > remote_ip 
# echo cb:a9:87:65:43:21 > remote_mac 
# echo 1 > enabled
# dmesg | tail
[16859.016632] gre: GRE over IPv4 demultiplexor driver
[16859.019831] ip_gre: GRE over IPv4 tunneling driver
[16945.776625] netpoll: netconsole: local port 6665
[16945.776627] netpoll: netconsole: local IPv4 address 0.0.0.0
[16945.776628] netpoll: netconsole: interface 'gre1'
[16945.776629] netpoll: netconsole: remote port 6666
[16945.776630] netpoll: netconsole: remote IPv4 address 10.12.34.57
[16945.776631] netpoll: netconsole: remote ethernet address cb:a9:87:65:43:21
[16945.776633] netpoll: netconsole: local IP 10.12.34.56
[16945.776635] netconsole: network logging started
# ip li show dev gre1
6: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/gre 192.168.123.1 peer 192.168.123.2


Not sure anything actually comes out on the other end of the pipe :)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-24 19:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-09 17:37 [PATCH net-next 0/3] netpoll: make sure napi_list is safe for RCU traversal Jakub Kicinski
2020-09-09 17:37 ` [PATCH net-next 1/3] net: remove napi_hash_del() from driver-facing API Jakub Kicinski
2020-11-24 18:00   ` Eric Dumazet
2020-11-24 18:54     ` Jakub Kicinski
2020-11-24 19:11       ` Eric Dumazet
2020-11-24 19:36         ` Jakub Kicinski
2020-09-09 17:37 ` [PATCH net-next 2/3] net: manage napi add/del idempotence explicitly Jakub Kicinski
2020-09-09 17:37 ` [PATCH net-next 3/3] net: make sure napi_list is safe for RCU traversal Jakub Kicinski
2020-09-10 20:09 ` [PATCH net-next 0/3] netpoll: " David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.