bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf 0/4] Fix concurrency issues between XSK wakeup and control path using RCU
@ 2019-12-05 15:51 Maxim Mikityanskiy
  2019-12-05 15:51 ` [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup Maxim Mikityanskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-05 15:51 UTC (permalink / raw)
  To: Björn Töpel, Magnus Karlsson, Jeff Kirsher
  Cc: bpf, netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon,
	Maxim Mikityanskiy

This series addresses the issue described in the commit message of the
first patch: lack of synchronization between XSK wakeup and destroying
the resources used by XSK wakeup. The idea is similar to
napi_synchronize. The series contains fixes for the drivers that
implement XSK. I haven't tested the changes to Intel's drivers, so,
Intel guys, please review them.

Maxim Mikityanskiy (4):
  xsk: Add rcu_read_lock around the XSK wakeup
  net/mlx5e: Fix concurrency issues between config flow and XSK
  net/i40e: Fix concurrency issues between config flow and XSK
  net/ixgbe: Fix concurrency issues between config flow and XSK

 drivers/net/ethernet/intel/i40e/i40e_main.c   |  7 ++++--
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  4 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  6 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  8 +++++--
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  | 22 ++++++++-----------
 .../mellanox/mlx5/core/en/xsk/setup.c         |  1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 19 +---------------
 net/xdp/xsk.c                                 |  8 +++++--
 10 files changed, 39 insertions(+), 40 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup
  2019-12-05 15:51 [PATCH bpf 0/4] Fix concurrency issues between XSK wakeup and control path using RCU Maxim Mikityanskiy
@ 2019-12-05 15:51 ` Maxim Mikityanskiy
  2019-12-06  9:02   ` Björn Töpel
  2019-12-05 15:51 ` [PATCH bpf 2/4] net/mlx5e: Fix concurrency issues between config flow and XSK Maxim Mikityanskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-05 15:51 UTC (permalink / raw)
  To: Björn Töpel, Magnus Karlsson, Jeff Kirsher
  Cc: bpf, netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon,
	Maxim Mikityanskiy

The XSK wakeup callback in drivers makes some sanity checks before
triggering NAPI. However, some configuration changes may occur during
this function that affect the result of those checks. For example, the
interface can go down, and all the resources will be destroyed after the
checks in the wakeup function, but before it attempts to use these
resources. Wrap this callback in rcu_read_lock to allow driver to
synchronize_rcu before actually destroying the resources.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
 net/xdp/xsk.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 956793893c9d..d2261c90f03a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -337,9 +337,13 @@ EXPORT_SYMBOL(xsk_umem_consume_tx);
 static int xsk_zc_xmit(struct xdp_sock *xs)
 {
 	struct net_device *dev = xs->dev;
+	int err;
 
-	return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
-					       XDP_WAKEUP_TX);
+	rcu_read_lock();
+	err = dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, XDP_WAKEUP_TX);
+	rcu_read_unlock();
+
+	return err;
 }
 
 static void xsk_destruct_skb(struct sk_buff *skb)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf 2/4] net/mlx5e: Fix concurrency issues between config flow and XSK
  2019-12-05 15:51 [PATCH bpf 0/4] Fix concurrency issues between XSK wakeup and control path using RCU Maxim Mikityanskiy
  2019-12-05 15:51 ` [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup Maxim Mikityanskiy
@ 2019-12-05 15:51 ` Maxim Mikityanskiy
  2019-12-06  9:03   ` Björn Töpel
  2019-12-05 15:51 ` [PATCH bpf 3/4] net/i40e: " Maxim Mikityanskiy
  2019-12-05 15:51 ` [PATCH bpf 4/4] net/ixgbe: " Maxim Mikityanskiy
  3 siblings, 1 reply; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-05 15:51 UTC (permalink / raw)
  To: Björn Töpel, Magnus Karlsson, Jeff Kirsher
  Cc: bpf, netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon,
	Maxim Mikityanskiy

After disabling resources necessary for XSK (the XDP program, channels,
XSK queues), use synchronize_rcu to wait until the XSK wakeup function
finishes, before freeing the resources.

Suspend XSK wakeups during switching channels. If the XDP program is
being removed, synchronize_rcu before closing the old channels to allow
XSK wakeup to complete.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  | 22 ++++++++-----------
 .../mellanox/mlx5/core/en/xsk/setup.c         |  1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 19 +---------------
 5 files changed, 13 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f1a7bc46f1c0..61084c3744ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -760,7 +760,7 @@ enum {
 	MLX5E_STATE_OPENED,
 	MLX5E_STATE_DESTROYING,
 	MLX5E_STATE_XDP_TX_ENABLED,
-	MLX5E_STATE_XDP_OPEN,
+	MLX5E_STATE_XDP_ACTIVE,
 };
 
 struct mlx5e_rqt {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index 36ac1e3816b9..d7587f40ecae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -75,12 +75,18 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 static inline void mlx5e_xdp_tx_enable(struct mlx5e_priv *priv)
 {
 	set_bit(MLX5E_STATE_XDP_TX_ENABLED, &priv->state);
+
+	if (priv->channels.params.xdp_prog)
+		set_bit(MLX5E_STATE_XDP_ACTIVE, &priv->state);
 }
 
 static inline void mlx5e_xdp_tx_disable(struct mlx5e_priv *priv)
 {
+	if (priv->channels.params.xdp_prog)
+		clear_bit(MLX5E_STATE_XDP_ACTIVE, &priv->state);
+
 	clear_bit(MLX5E_STATE_XDP_TX_ENABLED, &priv->state);
-	/* let other device's napi(s) see our new state */
+	/* Let other device's napi(s) and XSK wakeups see our new state. */
 	synchronize_rcu();
 }
 
@@ -89,19 +95,9 @@ static inline bool mlx5e_xdp_tx_is_enabled(struct mlx5e_priv *priv)
 	return test_bit(MLX5E_STATE_XDP_TX_ENABLED, &priv->state);
 }
 
-static inline void mlx5e_xdp_set_open(struct mlx5e_priv *priv)
-{
-	set_bit(MLX5E_STATE_XDP_OPEN, &priv->state);
-}
-
-static inline void mlx5e_xdp_set_closed(struct mlx5e_priv *priv)
-{
-	clear_bit(MLX5E_STATE_XDP_OPEN, &priv->state);
-}
-
-static inline bool mlx5e_xdp_is_open(struct mlx5e_priv *priv)
+static inline bool mlx5e_xdp_is_active(struct mlx5e_priv *priv)
 {
-	return test_bit(MLX5E_STATE_XDP_OPEN, &priv->state);
+	return test_bit(MLX5E_STATE_XDP_ACTIVE, &priv->state);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_xdpsq *sq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index 631af8dee517..c28cbae42331 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -144,6 +144,7 @@ void mlx5e_close_xsk(struct mlx5e_channel *c)
 {
 	clear_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
 	napi_synchronize(&c->napi);
+	synchronize_rcu(); /* Sync with the XSK wakeup. */
 
 	mlx5e_close_rq(&c->xskrq);
 	mlx5e_close_cq(&c->xskrq.cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
index 87827477d38c..fe2d596cb361 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
@@ -14,7 +14,7 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 	struct mlx5e_channel *c;
 	u16 ix;
 
-	if (unlikely(!mlx5e_xdp_is_open(priv)))
+	if (unlikely(!mlx5e_xdp_is_active(priv)))
 		return -ENETDOWN;
 
 	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 09ed7f5f688b..fe1a42fa214b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3006,12 +3006,9 @@ void mlx5e_timestamp_init(struct mlx5e_priv *priv)
 int mlx5e_open_locked(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
-	bool is_xdp = priv->channels.params.xdp_prog;
 	int err;
 
 	set_bit(MLX5E_STATE_OPENED, &priv->state);
-	if (is_xdp)
-		mlx5e_xdp_set_open(priv);
 
 	err = mlx5e_open_channels(priv, &priv->channels);
 	if (err)
@@ -3026,8 +3023,6 @@ int mlx5e_open_locked(struct net_device *netdev)
 	return 0;
 
 err_clear_state_opened_flag:
-	if (is_xdp)
-		mlx5e_xdp_set_closed(priv);
 	clear_bit(MLX5E_STATE_OPENED, &priv->state);
 	return err;
 }
@@ -3059,8 +3054,6 @@ int mlx5e_close_locked(struct net_device *netdev)
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		return 0;
 
-	if (priv->channels.params.xdp_prog)
-		mlx5e_xdp_set_closed(priv);
 	clear_bit(MLX5E_STATE_OPENED, &priv->state);
 
 	netif_carrier_off(priv->netdev);
@@ -4377,16 +4370,6 @@ static int mlx5e_xdp_allowed(struct mlx5e_priv *priv, struct bpf_prog *prog)
 	return 0;
 }
 
-static int mlx5e_xdp_update_state(struct mlx5e_priv *priv)
-{
-	if (priv->channels.params.xdp_prog)
-		mlx5e_xdp_set_open(priv);
-	else
-		mlx5e_xdp_set_closed(priv);
-
-	return 0;
-}
-
 static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -4421,7 +4404,7 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 		mlx5e_set_rq_type(priv->mdev, &new_channels.params);
 		old_prog = priv->channels.params.xdp_prog;
 
-		err = mlx5e_safe_switch_channels(priv, &new_channels, mlx5e_xdp_update_state);
+		err = mlx5e_safe_switch_channels(priv, &new_channels, NULL);
 		if (err)
 			goto unlock;
 	} else {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf 3/4] net/i40e: Fix concurrency issues between config flow and XSK
  2019-12-05 15:51 [PATCH bpf 0/4] Fix concurrency issues between XSK wakeup and control path using RCU Maxim Mikityanskiy
  2019-12-05 15:51 ` [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup Maxim Mikityanskiy
  2019-12-05 15:51 ` [PATCH bpf 2/4] net/mlx5e: Fix concurrency issues between config flow and XSK Maxim Mikityanskiy
@ 2019-12-05 15:51 ` Maxim Mikityanskiy
  2019-12-06  9:03   ` Björn Töpel
  2019-12-05 15:51 ` [PATCH bpf 4/4] net/ixgbe: " Maxim Mikityanskiy
  3 siblings, 1 reply; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-05 15:51 UTC (permalink / raw)
  To: Björn Töpel, Magnus Karlsson, Jeff Kirsher
  Cc: bpf, netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon,
	Maxim Mikityanskiy

Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:

1. i40e_down already calls synchronize_rcu. On i40e_down either
__I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
i40e_xsk_async_xmit (the former is already checked there).

2. After switching the XDP program, call synchronize_rcu to let
i40e_xsk_async_xmit exit before the XDP program is freed.

3. Changing the number of channels brings the interface down (see
i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).

4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++--
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 4 ++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1ccabeafa44c..afa3a99e68e1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6823,8 +6823,8 @@ void i40e_down(struct i40e_vsi *vsi)
 	for (i = 0; i < vsi->num_queue_pairs; i++) {
 		i40e_clean_tx_ring(vsi->tx_rings[i]);
 		if (i40e_enabled_xdp_vsi(vsi)) {
-			/* Make sure that in-progress ndo_xdp_xmit
-			 * calls are completed.
+			/* Make sure that in-progress ndo_xdp_xmit and
+			 * ndo_xsk_async_xmit calls are completed.
 			 */
 			synchronize_rcu();
 			i40e_clean_tx_ring(vsi->xdp_rings[i]);
@@ -12545,6 +12545,9 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
 		i40e_prep_for_reset(pf, true);
 
 	old_prog = xchg(&vsi->xdp_prog, prog);
+	if (old_prog)
+		/* Wait until ndo_xsk_async_xmit completes. */
+		synchronize_rcu();
 
 	if (need_reset)
 		i40e_reset_and_rebuild(pf, true, true);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index d07e1a890428..f73cd917c44f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -787,8 +787,12 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
 	struct i40e_ring *ring;
 
+	if (test_bit(__I40E_CONFIG_BUSY, pf->state))
+		return -ENETDOWN;
+
 	if (test_bit(__I40E_VSI_DOWN, vsi->state))
 		return -ENETDOWN;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf 4/4] net/ixgbe: Fix concurrency issues between config flow and XSK
  2019-12-05 15:51 [PATCH bpf 0/4] Fix concurrency issues between XSK wakeup and control path using RCU Maxim Mikityanskiy
                   ` (2 preceding siblings ...)
  2019-12-05 15:51 ` [PATCH bpf 3/4] net/i40e: " Maxim Mikityanskiy
@ 2019-12-05 15:51 ` Maxim Mikityanskiy
  2019-12-06  9:03   ` Björn Töpel
  3 siblings, 1 reply; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-05 15:51 UTC (permalink / raw)
  To: Björn Töpel, Magnus Karlsson, Jeff Kirsher
  Cc: bpf, netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon,
	Maxim Mikityanskiy

Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:

1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN.

2. After switching the XDP program, call synchronize_rcu to let
ixgbe_xsk_async_xmit exit before the XDP program is freed.

3. Changing the number of channels brings the interface down.

4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware
resources and resetting xsk_umem. Check that bit in ixgbe_xsk_async_xmit
to avoid using the XDP ring when it's already destroyed. synchronize_rcu
is called from ixgbe_txrx_ring_disable.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 8 ++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 25c097cd8100..60503318c7e5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10273,8 +10273,12 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
 			    adapter->xdp_prog);
 	}
 
-	if (old_prog)
+	if (old_prog) {
+		/* Wait until ndo_xsk_async_xmit completes. */
+		synchronize_rcu();
+
 		bpf_prog_put(old_prog);
+	}
 
 	/* Kick start the NAPI context if there is an AF_XDP socket open
 	 * on that queue id. This so that receiving will start.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index d6feaacfbf89..b43be9f14105 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -709,10 +709,14 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 	if (qid >= adapter->num_xdp_queues)
 		return -ENXIO;
 
-	if (!adapter->xdp_ring[qid]->xsk_umem)
+	ring = adapter->xdp_ring[qid];
+
+	if (test_bit(__IXGBE_TX_DISABLED, &ring->state))
+		return -ENETDOWN;
+
+	if (!ring->xsk_umem)
 		return -ENXIO;
 
-	ring = adapter->xdp_ring[qid];
 	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
 		u64 eics = BIT_ULL(ring->q_vector->v_idx);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup
  2019-12-05 15:51 ` [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup Maxim Mikityanskiy
@ 2019-12-06  9:02   ` Björn Töpel
  0 siblings, 0 replies; 13+ messages in thread
From: Björn Töpel @ 2019-12-06  9:02 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Björn Töpel, Magnus Karlsson, Jeff Kirsher, bpf,
	netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon

On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> The XSK wakeup callback in drivers makes some sanity checks before
> triggering NAPI. However, some configuration changes may occur during
> this function that affect the result of those checks. For example, the
> interface can go down, and all the resources will be destroyed after the
> checks in the wakeup function, but before it attempts to use these
> resources. Wrap this callback in rcu_read_lock to allow driver to
> synchronize_rcu before actually destroying the resources.
>

Thanks for taking a deeper look!

> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
>  net/xdp/xsk.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 956793893c9d..d2261c90f03a 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -337,9 +337,13 @@ EXPORT_SYMBOL(xsk_umem_consume_tx);
>  static int xsk_zc_xmit(struct xdp_sock *xs)
>  {
>         struct net_device *dev = xs->dev;
> +       int err;
>
> -       return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> -                                              XDP_WAKEUP_TX);
> +       rcu_read_lock();
> +       err = dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, XDP_WAKEUP_TX);
> +       rcu_read_unlock();
> +

The rationale for the the not having any synchronization on the
ndo_xsk_wakeup was to not constrain the drivers. The idea was to let
drivers take care of the required synchronization themselves, since
this is most likely driver specific. I'd prefer leaving that to the
driver implementors, not having the read-lock in the generic AF_XDP
code.

(And note that the ndo_xsk_wakeup is also called in the poll() implementation.)

I don't think this is needed for the Intel drivers, but let me
elaborate on that in those patches. Note "think" here -- I might be
way off here! :-)

> +       return err;
>  }
>
>  static void xsk_destruct_skb(struct sk_buff *skb)
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 2/4] net/mlx5e: Fix concurrency issues between config flow and XSK
  2019-12-05 15:51 ` [PATCH bpf 2/4] net/mlx5e: Fix concurrency issues between config flow and XSK Maxim Mikityanskiy
@ 2019-12-06  9:03   ` Björn Töpel
  0 siblings, 0 replies; 13+ messages in thread
From: Björn Töpel @ 2019-12-06  9:03 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Björn Töpel, Magnus Karlsson, Jeff Kirsher, bpf,
	netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon

On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> After disabling resources necessary for XSK (the XDP program, channels,
> XSK queues), use synchronize_rcu to wait until the XSK wakeup function
> finishes, before freeing the resources.
>
> Suspend XSK wakeups during switching channels. If the XDP program is
> being removed, synchronize_rcu before closing the old channels to allow
> XSK wakeup to complete.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.h  | 22 ++++++++-----------
>  .../mellanox/mlx5/core/en/xsk/setup.c         |  1 +
>  .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_main.c | 19 +---------------
>  5 files changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index f1a7bc46f1c0..61084c3744ba 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -760,7 +760,7 @@ enum {
>         MLX5E_STATE_OPENED,
>         MLX5E_STATE_DESTROYING,
>         MLX5E_STATE_XDP_TX_ENABLED,
> -       MLX5E_STATE_XDP_OPEN,
> +       MLX5E_STATE_XDP_ACTIVE,
>  };
>
>  struct mlx5e_rqt {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> index 36ac1e3816b9..d7587f40ecae 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> @@ -75,12 +75,18 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
>  static inline void mlx5e_xdp_tx_enable(struct mlx5e_priv *priv)
>  {
>         set_bit(MLX5E_STATE_XDP_TX_ENABLED, &priv->state);
> +
> +       if (priv->channels.params.xdp_prog)
> +               set_bit(MLX5E_STATE_XDP_ACTIVE, &priv->state);
>  }
>
>  static inline void mlx5e_xdp_tx_disable(struct mlx5e_priv *priv)
>  {
> +       if (priv->channels.params.xdp_prog)
> +               clear_bit(MLX5E_STATE_XDP_ACTIVE, &priv->state);
> +
>         clear_bit(MLX5E_STATE_XDP_TX_ENABLED, &priv->state);
> -       /* let other device's napi(s) see our new state */
> +       /* Let other device's napi(s) and XSK wakeups see our new state. */
>         synchronize_rcu();
>  }
>
> @@ -89,19 +95,9 @@ static inline bool mlx5e_xdp_tx_is_enabled(struct mlx5e_priv *priv)
>         return test_bit(MLX5E_STATE_XDP_TX_ENABLED, &priv->state);
>  }
>
> -static inline void mlx5e_xdp_set_open(struct mlx5e_priv *priv)
> -{
> -       set_bit(MLX5E_STATE_XDP_OPEN, &priv->state);
> -}
> -
> -static inline void mlx5e_xdp_set_closed(struct mlx5e_priv *priv)
> -{
> -       clear_bit(MLX5E_STATE_XDP_OPEN, &priv->state);
> -}
> -
> -static inline bool mlx5e_xdp_is_open(struct mlx5e_priv *priv)
> +static inline bool mlx5e_xdp_is_active(struct mlx5e_priv *priv)
>  {
> -       return test_bit(MLX5E_STATE_XDP_OPEN, &priv->state);
> +       return test_bit(MLX5E_STATE_XDP_ACTIVE, &priv->state);
>  }
>
>  static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_xdpsq *sq)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
> index 631af8dee517..c28cbae42331 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
> @@ -144,6 +144,7 @@ void mlx5e_close_xsk(struct mlx5e_channel *c)
>  {
>         clear_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
>         napi_synchronize(&c->napi);
> +       synchronize_rcu(); /* Sync with the XSK wakeup. */

Again, so my idea was that the read-lock can be done here, instead of
the generic AF_XDP code, since it's driver specific. Agree?

>
>         mlx5e_close_rq(&c->xskrq);
>         mlx5e_close_cq(&c->xskrq.cq);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> index 87827477d38c..fe2d596cb361 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
> @@ -14,7 +14,7 @@ int mlx5e_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
>         struct mlx5e_channel *c;
>         u16 ix;
>
> -       if (unlikely(!mlx5e_xdp_is_open(priv)))
> +       if (unlikely(!mlx5e_xdp_is_active(priv)))
>                 return -ENETDOWN;
>
>         if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 09ed7f5f688b..fe1a42fa214b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3006,12 +3006,9 @@ void mlx5e_timestamp_init(struct mlx5e_priv *priv)
>  int mlx5e_open_locked(struct net_device *netdev)
>  {
>         struct mlx5e_priv *priv = netdev_priv(netdev);
> -       bool is_xdp = priv->channels.params.xdp_prog;
>         int err;
>
>         set_bit(MLX5E_STATE_OPENED, &priv->state);
> -       if (is_xdp)
> -               mlx5e_xdp_set_open(priv);
>
>         err = mlx5e_open_channels(priv, &priv->channels);
>         if (err)
> @@ -3026,8 +3023,6 @@ int mlx5e_open_locked(struct net_device *netdev)
>         return 0;
>
>  err_clear_state_opened_flag:
> -       if (is_xdp)
> -               mlx5e_xdp_set_closed(priv);
>         clear_bit(MLX5E_STATE_OPENED, &priv->state);
>         return err;
>  }
> @@ -3059,8 +3054,6 @@ int mlx5e_close_locked(struct net_device *netdev)
>         if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
>                 return 0;
>
> -       if (priv->channels.params.xdp_prog)
> -               mlx5e_xdp_set_closed(priv);
>         clear_bit(MLX5E_STATE_OPENED, &priv->state);
>
>         netif_carrier_off(priv->netdev);
> @@ -4377,16 +4370,6 @@ static int mlx5e_xdp_allowed(struct mlx5e_priv *priv, struct bpf_prog *prog)
>         return 0;
>  }
>
> -static int mlx5e_xdp_update_state(struct mlx5e_priv *priv)
> -{
> -       if (priv->channels.params.xdp_prog)
> -               mlx5e_xdp_set_open(priv);
> -       else
> -               mlx5e_xdp_set_closed(priv);
> -
> -       return 0;
> -}
> -
>  static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
>  {
>         struct mlx5e_priv *priv = netdev_priv(netdev);
> @@ -4421,7 +4404,7 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
>                 mlx5e_set_rq_type(priv->mdev, &new_channels.params);
>                 old_prog = priv->channels.params.xdp_prog;
>
> -               err = mlx5e_safe_switch_channels(priv, &new_channels, mlx5e_xdp_update_state);
> +               err = mlx5e_safe_switch_channels(priv, &new_channels, NULL);
>                 if (err)
>                         goto unlock;
>         } else {
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 3/4] net/i40e: Fix concurrency issues between config flow and XSK
  2019-12-05 15:51 ` [PATCH bpf 3/4] net/i40e: " Maxim Mikityanskiy
@ 2019-12-06  9:03   ` Björn Töpel
  2019-12-10 14:26     ` Maxim Mikityanskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2019-12-06  9:03 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Björn Töpel, Magnus Karlsson, Jeff Kirsher, bpf,
	netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon

On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> Use synchronize_rcu to wait until the XSK wakeup function finishes
> before destroying the resources it uses:
>
> 1. i40e_down already calls synchronize_rcu. On i40e_down either
> __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
> i40e_xsk_async_xmit (the former is already checked there).
>
> 2. After switching the XDP program, call synchronize_rcu to let
> i40e_xsk_async_xmit exit before the XDP program is freed.
>
> 3. Changing the number of channels brings the interface down (see
> i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
>
> 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++--
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 4 ++++
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 1ccabeafa44c..afa3a99e68e1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -6823,8 +6823,8 @@ void i40e_down(struct i40e_vsi *vsi)
>         for (i = 0; i < vsi->num_queue_pairs; i++) {
>                 i40e_clean_tx_ring(vsi->tx_rings[i]);
>                 if (i40e_enabled_xdp_vsi(vsi)) {
> -                       /* Make sure that in-progress ndo_xdp_xmit
> -                        * calls are completed.
> +                       /* Make sure that in-progress ndo_xdp_xmit and
> +                        * ndo_xsk_async_xmit calls are completed.
>                          */
>                         synchronize_rcu();
>                         i40e_clean_tx_ring(vsi->xdp_rings[i]);
> @@ -12545,6 +12545,9 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>                 i40e_prep_for_reset(pf, true);
>
>         old_prog = xchg(&vsi->xdp_prog, prog);
> +       if (old_prog)
> +               /* Wait until ndo_xsk_async_xmit completes. */
> +               synchronize_rcu();

This is not needed -- please correct me if I'm missing something! If
we're disabling XDP, the need_reset-clause will take care or the
proper synchronization. And we don't need to worry about old_prog for
the swap-XDP case, because bpf_prog_put() does cleanup with
call_rcu().


>
>         if (need_reset)
>                 i40e_reset_and_rebuild(pf, true, true);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index d07e1a890428..f73cd917c44f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -787,8 +787,12 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
>  {
>         struct i40e_netdev_priv *np = netdev_priv(dev);
>         struct i40e_vsi *vsi = np->vsi;
> +       struct i40e_pf *pf = vsi->back;
>         struct i40e_ring *ring;
>
> +       if (test_bit(__I40E_CONFIG_BUSY, pf->state))
> +               return -ENETDOWN;
> +

You right that we need to check for BUSY, since the XDP ring might be
stale! Thanks for catching this! Can you respin this patch, with just
this hunk? (Unless I'm wrong! :-))



>         if (test_bit(__I40E_VSI_DOWN, vsi->state))
>                 return -ENETDOWN;
>
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 4/4] net/ixgbe: Fix concurrency issues between config flow and XSK
  2019-12-05 15:51 ` [PATCH bpf 4/4] net/ixgbe: " Maxim Mikityanskiy
@ 2019-12-06  9:03   ` Björn Töpel
  0 siblings, 0 replies; 13+ messages in thread
From: Björn Töpel @ 2019-12-06  9:03 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Björn Töpel, Magnus Karlsson, Jeff Kirsher, bpf,
	netdev, intel-wired-lan, David S. Miller, Saeed Mahameed,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jonathan Lemon

On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> Use synchronize_rcu to wait until the XSK wakeup function finishes
> before destroying the resources it uses:
>
> 1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN.
>
> 2. After switching the XDP program, call synchronize_rcu to let
> ixgbe_xsk_async_xmit exit before the XDP program is freed.
>
> 3. Changing the number of channels brings the interface down.
>
> 4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware
> resources and resetting xsk_umem. Check that bit in ixgbe_xsk_async_xmit
> to avoid using the XDP ring when it's already destroyed. synchronize_rcu
> is called from ixgbe_txrx_ring_disable.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 8 ++++++--
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 25c097cd8100..60503318c7e5 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10273,8 +10273,12 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>                             adapter->xdp_prog);
>         }
>
> -       if (old_prog)
> +       if (old_prog) {
> +               /* Wait until ndo_xsk_async_xmit completes. */
> +               synchronize_rcu();
> +
>                 bpf_prog_put(old_prog);
> +       }
>
>         /* Kick start the NAPI context if there is an AF_XDP socket open
>          * on that queue id. This so that receiving will start.
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index d6feaacfbf89..b43be9f14105 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -709,10 +709,14 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
>         if (qid >= adapter->num_xdp_queues)
>                 return -ENXIO;
>
> -       if (!adapter->xdp_ring[qid]->xsk_umem)
> +       ring = adapter->xdp_ring[qid];
> +
> +       if (test_bit(__IXGBE_TX_DISABLED, &ring->state))
> +               return -ENETDOWN;
> +
> +       if (!ring->xsk_umem)

Pretty much same comment as in i40e. The synchronize_rcu() is not
needed, but the additional test is!

>                 return -ENXIO;
>
> -       ring = adapter->xdp_ring[qid];
>         if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
>                 u64 eics = BIT_ULL(ring->q_vector->v_idx);
>
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 3/4] net/i40e: Fix concurrency issues between config flow and XSK
  2019-12-06  9:03   ` Björn Töpel
@ 2019-12-10 14:26     ` Maxim Mikityanskiy
  2019-12-11 10:08       ` Björn Töpel
  0 siblings, 1 reply; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-10 14:26 UTC (permalink / raw)
  To: Björn Töpel, Björn Töpel
  Cc: Magnus Karlsson, Jeff Kirsher, bpf, netdev, intel-wired-lan,
	David S. Miller, Saeed Mahameed, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Jonathan Lemon

On 2019-12-06 11:03, Björn Töpel wrote:
> On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>>
>> Use synchronize_rcu to wait until the XSK wakeup function finishes
>> before destroying the resources it uses:
>>
>> 1. i40e_down already calls synchronize_rcu. On i40e_down either
>> __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
>> i40e_xsk_async_xmit (the former is already checked there).
>>
>> 2. After switching the XDP program, call synchronize_rcu to let
>> i40e_xsk_async_xmit exit before the XDP program is freed.
>>
>> 3. Changing the number of channels brings the interface down (see
>> i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
>>
>> 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
>>
>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>> ---
>>   drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++--
>>   drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 4 ++++
>>   2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> index 1ccabeafa44c..afa3a99e68e1 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -6823,8 +6823,8 @@ void i40e_down(struct i40e_vsi *vsi)
>>          for (i = 0; i < vsi->num_queue_pairs; i++) {
>>                  i40e_clean_tx_ring(vsi->tx_rings[i]);
>>                  if (i40e_enabled_xdp_vsi(vsi)) {
>> -                       /* Make sure that in-progress ndo_xdp_xmit
>> -                        * calls are completed.
>> +                       /* Make sure that in-progress ndo_xdp_xmit and
>> +                        * ndo_xsk_async_xmit calls are completed.

Ooops, I see now some changes were lost during rebasing. I had a version 
of this series with ndo_xsk_async_xmit -> ndo_xsk_wakeup fixed.

>>                           */
>>                          synchronize_rcu();
>>                          i40e_clean_tx_ring(vsi->xdp_rings[i]);
>> @@ -12545,6 +12545,9 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>>                  i40e_prep_for_reset(pf, true);
>>
>>          old_prog = xchg(&vsi->xdp_prog, prog);
>> +       if (old_prog)
>> +               /* Wait until ndo_xsk_async_xmit completes. */
>> +               synchronize_rcu();
> 
> This is not needed -- please correct me if I'm missing something! If
> we're disabling XDP, the need_reset-clause will take care or the
> proper synchronization.

Yes, it was added to cover the case of disabling XDP. I'm not sure how 
i40e_reset_and_rebuild will help here. What I wanted to achieve is to 
wait until all i40e_xsk_wakeups finish after setting vsi->xdp_prog = 
NULL and before doing anything else. I don't see how 
i40e_reset_and_rebuild helps here, but I'm not very familiar with i40e 
code. Could you guide me through the code of i40e_reset_and_rebuild and 
show how it takes care of the synchronization?

> And we don't need to worry about old_prog for
> the swap-XDP case,

Right.

> because bpf_prog_put() does cleanup with
> call_rcu().

But if i40e_xsk_wakeup is not wrapped with rcu_read_lock, it won't sync 
with it.

> 
>>
>>          if (need_reset)
>>                  i40e_reset_and_rebuild(pf, true, true);
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> index d07e1a890428..f73cd917c44f 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> @@ -787,8 +787,12 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
>>   {
>>          struct i40e_netdev_priv *np = netdev_priv(dev);
>>          struct i40e_vsi *vsi = np->vsi;
>> +       struct i40e_pf *pf = vsi->back;
>>          struct i40e_ring *ring;
>>
>> +       if (test_bit(__I40E_CONFIG_BUSY, pf->state))
>> +               return -ENETDOWN;
>> +
> 
> You right that we need to check for BUSY, since the XDP ring might be
> stale! Thanks for catching this! Can you respin this patch, with just
> this hunk? (Unless I'm wrong! :-))

This check itself will not be enough, unless you wrap i40e_xsk_wakeup 
with rcu_read_lock.

i40e_xsk_wakeup does some checks in the beginning, then the main part of 
the code goes. My assumption is that if the checks don't pass, the main 
part will fail in some way (e.g., NULL or dangling pointer when 
accessing the ring), otherwise you wouldn't need those checks at all. 
Without RCU, even if you do these checks, it may still fail in the 
following scenario:

1. i40e_xsk_wakeup starts running, checks the flag, the flag is set, the 
function goes on.

2. The flag gets cleared.

3. The resources are destroyed.

4. i40e_xsk_wakeup tries to access the resources that were destroyed.

I don't see anything in i40e and ixgbe that currently protects from such 
a race condition. If I'm missing anything, please point me to it, 
otherwise i40e_xsk_wakeup really needs to be wrapped into rcu_read_lock. 
I would prefer to have rcu_read_lock in the kernel, so that all drivers 
could benefit from it, because I think it's the same issue in all 
drivers, but I'm fine with moving it to the drivers if there is a reason 
why some drivers may not need it.

Thanks for taking a look. Let's settle the case with Intel's drivers, so 
that I will know what fixes to include into the respin.

> 
> 
>>          if (test_bit(__I40E_VSI_DOWN, vsi->state))
>>                  return -ENETDOWN;
>>
>> --
>> 2.20.1
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 3/4] net/i40e: Fix concurrency issues between config flow and XSK
  2019-12-10 14:26     ` Maxim Mikityanskiy
@ 2019-12-11 10:08       ` Björn Töpel
  2019-12-13 17:10         ` Maxim Mikityanskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2019-12-11 10:08 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Björn Töpel
  Cc: Magnus Karlsson, Jeff Kirsher, bpf, netdev, intel-wired-lan,
	David S. Miller, Saeed Mahameed, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Jonathan Lemon

On 2019-12-10 15:26, Maxim Mikityanskiy wrote:
> On 2019-12-06 11:03, Björn Töpel wrote:
>> On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>>>
>>> Use synchronize_rcu to wait until the XSK wakeup function finishes
>>> before destroying the resources it uses:
>>>
>>> 1. i40e_down already calls synchronize_rcu. On i40e_down either
>>> __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
>>> i40e_xsk_async_xmit (the former is already checked there).
>>>
>>> 2. After switching the XDP program, call synchronize_rcu to let
>>> i40e_xsk_async_xmit exit before the XDP program is freed.
>>>
>>> 3. Changing the number of channels brings the interface down (see
>>> i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
>>>
>>> 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
>>>
>>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>>> ---
>>>    drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++--
>>>    drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 4 ++++
>>>    2 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>> index 1ccabeafa44c..afa3a99e68e1 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>> @@ -6823,8 +6823,8 @@ void i40e_down(struct i40e_vsi *vsi)
>>>           for (i = 0; i < vsi->num_queue_pairs; i++) {
>>>                   i40e_clean_tx_ring(vsi->tx_rings[i]);
>>>                   if (i40e_enabled_xdp_vsi(vsi)) {
>>> -                       /* Make sure that in-progress ndo_xdp_xmit
>>> -                        * calls are completed.
>>> +                       /* Make sure that in-progress ndo_xdp_xmit and
>>> +                        * ndo_xsk_async_xmit calls are completed.
> 
> Ooops, I see now some changes were lost during rebasing. I had a version
> of this series with ndo_xsk_async_xmit -> ndo_xsk_wakeup fixed.
> 
>>>                            */
>>>                           synchronize_rcu();
>>>                           i40e_clean_tx_ring(vsi->xdp_rings[i]);
>>> @@ -12545,6 +12545,9 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>>>                   i40e_prep_for_reset(pf, true);
>>>
>>>           old_prog = xchg(&vsi->xdp_prog, prog);
>>> +       if (old_prog)
>>> +               /* Wait until ndo_xsk_async_xmit completes. */
>>> +               synchronize_rcu();
>>
>> This is not needed -- please correct me if I'm missing something! If
>> we're disabling XDP, the need_reset-clause will take care or the
>> proper synchronization.
> 
> Yes, it was added to cover the case of disabling XDP. I'm not sure how
> i40e_reset_and_rebuild will help here. What I wanted to achieve is to
> wait until all i40e_xsk_wakeups finish after setting vsi->xdp_prog =
> NULL and before doing anything else. I don't see how
> i40e_reset_and_rebuild helps here, but I'm not very familiar with i40e
> code. Could you guide me through the code of i40e_reset_and_rebuild and
> show how it takes care of the synchronization?
> 
>> And we don't need to worry about old_prog for
>> the swap-XDP case,
> 
> Right.
> 
>> because bpf_prog_put() does cleanup with
>> call_rcu().
> 
> But if i40e_xsk_wakeup is not wrapped with rcu_read_lock, it won't sync
> with it.
> 
>>
>>>
>>>           if (need_reset)
>>>                   i40e_reset_and_rebuild(pf, true, true);
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>> index d07e1a890428..f73cd917c44f 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>> @@ -787,8 +787,12 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
>>>    {
>>>           struct i40e_netdev_priv *np = netdev_priv(dev);
>>>           struct i40e_vsi *vsi = np->vsi;
>>> +       struct i40e_pf *pf = vsi->back;
>>>           struct i40e_ring *ring;
>>>
>>> +       if (test_bit(__I40E_CONFIG_BUSY, pf->state))
>>> +               return -ENETDOWN;
>>> +
>>
>> You right that we need to check for BUSY, since the XDP ring might be
>> stale! Thanks for catching this! Can you respin this patch, with just
>> this hunk? (Unless I'm wrong! :-))
> 
> This check itself will not be enough, unless you wrap i40e_xsk_wakeup
> with rcu_read_lock.
> 
> i40e_xsk_wakeup does some checks in the beginning, then the main part of
> the code goes. My assumption is that if the checks don't pass, the main
> part will fail in some way (e.g., NULL or dangling pointer when
> accessing the ring), otherwise you wouldn't need those checks at all.
> Without RCU, even if you do these checks, it may still fail in the
> following scenario:
> 
> 1. i40e_xsk_wakeup starts running, checks the flag, the flag is set, the
> function goes on.
> 
> 2. The flag gets cleared.
> 
> 3. The resources are destroyed.
> 
> 4. i40e_xsk_wakeup tries to access the resources that were destroyed.
> 
> I don't see anything in i40e and ixgbe that currently protects from such
> a race condition. If I'm missing anything, please point me to it,
> otherwise i40e_xsk_wakeup really needs to be wrapped into rcu_read_lock.
> I would prefer to have rcu_read_lock in the kernel, so that all drivers
> could benefit from it, because I think it's the same issue in all
> drivers, but I'm fine with moving it to the drivers if there is a reason
> why some drivers may not need it.
> 
> Thanks for taking a look. Let's settle the case with Intel's drivers, so
> that I will know what fixes to include into the respin.
>

Max, you're right. It's not correct without RCU locking. Thanks for the
patience.

For the Intel ndo_xsk_wakeup implementation we need to make sure that 
the 1. umem(socket) exists, and that the 2. XDP rings exists for the 
duration of the call.

1. In xsk_unbind_dev() the state is changed to UNBOUND and then
    followed by synchronize_net(). It would be, as you wrote, incorrect
    without RCU locking the ndo_xsk_wakeup() region.

2. To ensure that the XDP rings are intact for the duration of the
    ndo_xsk_wakeup(), a synchronize_rcu() is required when the XDP
    program is swapped out (prog->NULL).

Again, both 1 and 2 requires RCU locking.

What do you think about the this patch for xsk.c (and the Intel 
drivers)? It moves all ndo_xsk_wakeup calls do one place.


diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index cb6367334ca7..4833187bd259 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -1152,7 +1152,7 @@ void i40e_set_fec_in_flags(u8 fec_cfg, u32 *flags);

  static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
  {
-	return !!vsi->xdp_prog;
+	return !!READ_ONCE(vsi->xdp_prog);
  }

  int i40e_create_queue_channel(struct i40e_vsi *vsi, struct 
i40e_channel *ch);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1ccabeafa44c..fd084f03f00d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12546,8 +12546,11 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,

  	old_prog = xchg(&vsi->xdp_prog, prog);

-	if (need_reset)
+	if (need_reset) {
+		if (!prog)
+			synchronize_rcu();
  		i40e_reset_and_rebuild(pf, true, true);
+	}

  	for (i = 0; i < vsi->num_queue_pairs; i++)
  		WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 25c097cd8100..adff2cbcde1a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10261,7 +10261,11 @@ static int ixgbe_xdp_setup(struct net_device 
*dev, struct bpf_prog *prog)

  	/* If transitioning XDP modes reconfigure rings */
  	if (need_reset) {
-		int err = ixgbe_setup_tc(dev, adapter->hw_tcs);
+		int err;
+
+		if (!prog)
+			synchronize_rcu();
+		err = ixgbe_setup_tc(dev, adapter->hw_tcs);

  		if (err) {
  			rcu_assign_pointer(adapter->xdp_prog, old_prog);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 956793893c9d..dbf06c3b7061 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -334,12 +334,20 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, 
struct xdp_desc *desc)
  }
  EXPORT_SYMBOL(xsk_umem_consume_tx);

-static int xsk_zc_xmit(struct xdp_sock *xs)
+static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
  {
  	struct net_device *dev = xs->dev;
+	int err;

-	return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
-					       XDP_WAKEUP_TX);
+	rcu_read_lock();
+	err = dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags);
+	rcu_read_unlock();
+	return err;
+}
+
+static int xsk_zc_xmit(struct xdp_sock *xs)
+{
+	return xsk_wakeup(xs, XDP_WAKEUP_TX);
  }

  static void xsk_destruct_skb(struct sk_buff *skb)
@@ -453,19 +461,16 @@ static __poll_t xsk_poll(struct file *file, struct 
socket *sock,
  	__poll_t mask = datagram_poll(file, sock, wait);
  	struct sock *sk = sock->sk;
  	struct xdp_sock *xs = xdp_sk(sk);
-	struct net_device *dev;
  	struct xdp_umem *umem;

  	if (unlikely(!xsk_is_bound(xs)))
  		return mask;

-	dev = xs->dev;
  	umem = xs->umem;

  	if (umem->need_wakeup) {
-		if (dev->netdev_ops->ndo_xsk_wakeup)
-			dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
-							umem->need_wakeup);
+		if (xs->zc)
+			xsk_wakeup(umem->need_wakeup);
  		else
  			/* Poll needs to drive Tx also in copy mode */
  			__xsk_sendmsg(sk);


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 3/4] net/i40e: Fix concurrency issues between config flow and XSK
  2019-12-11 10:08       ` Björn Töpel
@ 2019-12-13 17:10         ` Maxim Mikityanskiy
  2019-12-16  7:50           ` Björn Töpel
  0 siblings, 1 reply; 13+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-13 17:10 UTC (permalink / raw)
  To: Björn Töpel, Björn Töpel
  Cc: Magnus Karlsson, Jeff Kirsher, bpf, netdev, intel-wired-lan,
	David S. Miller, Saeed Mahameed, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Jonathan Lemon, Maxim Mikityanskiy

On 2019-12-11 12:08, Björn Töpel wrote:
> On 2019-12-10 15:26, Maxim Mikityanskiy wrote:
>> On 2019-12-06 11:03, Björn Töpel wrote:
>>> On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy 
>>> <maximmi@mellanox.com> wrote:
>>>>
>>>> Use synchronize_rcu to wait until the XSK wakeup function finishes
>>>> before destroying the resources it uses:
>>>>
>>>> 1. i40e_down already calls synchronize_rcu. On i40e_down either
>>>> __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
>>>> i40e_xsk_async_xmit (the former is already checked there).
>>>>
>>>> 2. After switching the XDP program, call synchronize_rcu to let
>>>> i40e_xsk_async_xmit exit before the XDP program is freed.
>>>>
>>>> 3. Changing the number of channels brings the interface down (see
>>>> i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
>>>>
>>>> 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
>>>>
>>>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>>>> ---
>>>>    drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++--
>>>>    drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 4 ++++
>>>>    2 files changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
>>>> b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>> index 1ccabeafa44c..afa3a99e68e1 100644
>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>> @@ -6823,8 +6823,8 @@ void i40e_down(struct i40e_vsi *vsi)
>>>>           for (i = 0; i < vsi->num_queue_pairs; i++) {
>>>>                   i40e_clean_tx_ring(vsi->tx_rings[i]);
>>>>                   if (i40e_enabled_xdp_vsi(vsi)) {
>>>> -                       /* Make sure that in-progress ndo_xdp_xmit
>>>> -                        * calls are completed.
>>>> +                       /* Make sure that in-progress ndo_xdp_xmit and
>>>> +                        * ndo_xsk_async_xmit calls are completed.
>>
>> Ooops, I see now some changes were lost during rebasing. I had a version
>> of this series with ndo_xsk_async_xmit -> ndo_xsk_wakeup fixed.
>>
>>>>                            */
>>>>                           synchronize_rcu();
>>>>                           i40e_clean_tx_ring(vsi->xdp_rings[i]);
>>>> @@ -12545,6 +12545,9 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>>>>                   i40e_prep_for_reset(pf, true);
>>>>
>>>>           old_prog = xchg(&vsi->xdp_prog, prog);
>>>> +       if (old_prog)
>>>> +               /* Wait until ndo_xsk_async_xmit completes. */
>>>> +               synchronize_rcu();
>>>
>>> This is not needed -- please correct me if I'm missing something! If
>>> we're disabling XDP, the need_reset-clause will take care or the
>>> proper synchronization.
>>
>> Yes, it was added to cover the case of disabling XDP. I'm not sure how
>> i40e_reset_and_rebuild will help here. What I wanted to achieve is to
>> wait until all i40e_xsk_wakeups finish after setting vsi->xdp_prog =
>> NULL and before doing anything else. I don't see how
>> i40e_reset_and_rebuild helps here, but I'm not very familiar with i40e
>> code. Could you guide me through the code of i40e_reset_and_rebuild and
>> show how it takes care of the synchronization?
>>
>>> And we don't need to worry about old_prog for
>>> the swap-XDP case,
>>
>> Right.
>>
>>> because bpf_prog_put() does cleanup with
>>> call_rcu().
>>
>> But if i40e_xsk_wakeup is not wrapped with rcu_read_lock, it won't sync
>> with it.
>>
>>>
>>>>
>>>>           if (need_reset)
>>>>                   i40e_reset_and_rebuild(pf, true, true);
>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
>>>> b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>> index d07e1a890428..f73cd917c44f 100644
>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>> @@ -787,8 +787,12 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 
>>>> queue_id, u32 flags)
>>>>    {
>>>>           struct i40e_netdev_priv *np = netdev_priv(dev);
>>>>           struct i40e_vsi *vsi = np->vsi;
>>>> +       struct i40e_pf *pf = vsi->back;
>>>>           struct i40e_ring *ring;
>>>>
>>>> +       if (test_bit(__I40E_CONFIG_BUSY, pf->state))
>>>> +               return -ENETDOWN;
>>>> +
>>>
>>> You right that we need to check for BUSY, since the XDP ring might be
>>> stale! Thanks for catching this! Can you respin this patch, with just
>>> this hunk? (Unless I'm wrong! :-))
>>
>> This check itself will not be enough, unless you wrap i40e_xsk_wakeup
>> with rcu_read_lock.
>>
>> i40e_xsk_wakeup does some checks in the beginning, then the main part of
>> the code goes. My assumption is that if the checks don't pass, the main
>> part will fail in some way (e.g., NULL or dangling pointer when
>> accessing the ring), otherwise you wouldn't need those checks at all.
>> Without RCU, even if you do these checks, it may still fail in the
>> following scenario:
>>
>> 1. i40e_xsk_wakeup starts running, checks the flag, the flag is set, the
>> function goes on.
>>
>> 2. The flag gets cleared.
>>
>> 3. The resources are destroyed.
>>
>> 4. i40e_xsk_wakeup tries to access the resources that were destroyed.
>>
>> I don't see anything in i40e and ixgbe that currently protects from such
>> a race condition. If I'm missing anything, please point me to it,
>> otherwise i40e_xsk_wakeup really needs to be wrapped into rcu_read_lock.
>> I would prefer to have rcu_read_lock in the kernel, so that all drivers
>> could benefit from it, because I think it's the same issue in all
>> drivers, but I'm fine with moving it to the drivers if there is a reason
>> why some drivers may not need it.
>>
>> Thanks for taking a look. Let's settle the case with Intel's drivers, so
>> that I will know what fixes to include into the respin.
>>
> 
> Max, you're right. It's not correct without RCU locking. Thanks for the
> patience.
> 
> For the Intel ndo_xsk_wakeup implementation we need to make sure that 
> the 1. umem(socket) exists, and that the 2. XDP rings exists for the 
> duration of the call.
> 
> 1. In xsk_unbind_dev() the state is changed to UNBOUND and then
>     followed by synchronize_net(). It would be, as you wrote, incorrect
>     without RCU locking the ndo_xsk_wakeup() region.
> 
> 2. To ensure that the XDP rings are intact for the duration of the
>     ndo_xsk_wakeup(), a synchronize_rcu() is required when the XDP
>     program is swapped out (prog->NULL).
> 
> Again, both 1 and 2 requires RCU locking.
> 
> What do you think about the this patch for xsk.c (and the Intel 
> drivers)? It moves all ndo_xsk_wakeup calls do one place.

Looks good, please also see a few comments below.

Checking the __I40E_CONFIG_BUSY and __IXGBE_TX_DISABLED flags in the 
wakeup functions is still required, I believe. You didn't include these 
parts into your patch, I just want to make sure you don't suggest to 
drop these parts of my patches.

How would you like to proceed with it? Do you want me to integrate your 
changes into my patches and respin with both my and your sign-offs?

> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index cb6367334ca7..4833187bd259 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -1152,7 +1152,7 @@ void i40e_set_fec_in_flags(u8 fec_cfg, u32 *flags);
> 
>   static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
>   {
> -    return !!vsi->xdp_prog;
> +    return !!READ_ONCE(vsi->xdp_prog);
>   }
> 
>   int i40e_create_queue_channel(struct i40e_vsi *vsi, struct 
> i40e_channel *ch);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 1ccabeafa44c..fd084f03f00d 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -12546,8 +12546,11 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
> 
>       old_prog = xchg(&vsi->xdp_prog, prog);
> 
> -    if (need_reset)
> +    if (need_reset) {
> +        if (!prog)
> +            synchronize_rcu();
>           i40e_reset_and_rebuild(pf, true, true);
> +    }
> 
>       for (i = 0; i < vsi->num_queue_pairs; i++)
>           WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 25c097cd8100..adff2cbcde1a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10261,7 +10261,11 @@ static int ixgbe_xdp_setup(struct net_device 
> *dev, struct bpf_prog *prog)
> 
>       /* If transitioning XDP modes reconfigure rings */
>       if (need_reset) {
> -        int err = ixgbe_setup_tc(dev, adapter->hw_tcs);
> +        int err;
> +
> +        if (!prog)
> +            synchronize_rcu();
> +        err = ixgbe_setup_tc(dev, adapter->hw_tcs);
> 
>           if (err) {
>               rcu_assign_pointer(adapter->xdp_prog, old_prog);
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 956793893c9d..dbf06c3b7061 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -334,12 +334,20 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, 
> struct xdp_desc *desc)
>   }
>   EXPORT_SYMBOL(xsk_umem_consume_tx);
> 
> -static int xsk_zc_xmit(struct xdp_sock *xs)
> +static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
>   {
>       struct net_device *dev = xs->dev;
> +    int err;
> 
> -    return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> -                           XDP_WAKEUP_TX);
> +    rcu_read_lock();
> +    err = dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags);
> +    rcu_read_unlock();
> +    return err;
> +}
> +
> +static int xsk_zc_xmit(struct xdp_sock *xs)
> +{
> +    return xsk_wakeup(xs, XDP_WAKEUP_TX);
>   }
> 
>   static void xsk_destruct_skb(struct sk_buff *skb)
> @@ -453,19 +461,16 @@ static __poll_t xsk_poll(struct file *file, struct 
> socket *sock,
>       __poll_t mask = datagram_poll(file, sock, wait);
>       struct sock *sk = sock->sk;
>       struct xdp_sock *xs = xdp_sk(sk);
> -    struct net_device *dev;
>       struct xdp_umem *umem;
> 
>       if (unlikely(!xsk_is_bound(xs)))
>           return mask;
> 
> -    dev = xs->dev;
>       umem = xs->umem;
> 
>       if (umem->need_wakeup) {
> -        if (dev->netdev_ops->ndo_xsk_wakeup)
> -            dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> -                            umem->need_wakeup);
> +        if (xs->zc)
> +            xsk_wakeup(umem->need_wakeup);

Thanks for spotting this. I missed the second place where wakeup is called.

BTW, does the current code contain a bug? The socket can be opened in 
copy mode even if ndo_xsk_wakeup is not NULL, and we'll call 
ndo_xsk_wakeup in this case, although it's not zero-copy. After your 
patch it will be fixed.

P.S. I'll be on vacation, so I added my personal email to Cc to be able 
to receive responses.

>           else
>               /* Poll needs to drive Tx also in copy mode */
>               __xsk_sendmsg(sk);
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf 3/4] net/i40e: Fix concurrency issues between config flow and XSK
  2019-12-13 17:10         ` Maxim Mikityanskiy
@ 2019-12-16  7:50           ` Björn Töpel
  0 siblings, 0 replies; 13+ messages in thread
From: Björn Töpel @ 2019-12-16  7:50 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Björn Töpel
  Cc: Magnus Karlsson, Jeff Kirsher, bpf, netdev, intel-wired-lan,
	David S. Miller, Saeed Mahameed, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Jonathan Lemon, Maxim Mikityanskiy

On 2019-12-13 18:10, Maxim Mikityanskiy wrote:
> On 2019-12-11 12:08, Björn Töpel wrote:
>> On 2019-12-10 15:26, Maxim Mikityanskiy wrote:
>>> On 2019-12-06 11:03, Björn Töpel wrote:
>>>> On Thu, 5 Dec 2019 at 16:52, Maxim Mikityanskiy
>>>> <maximmi@mellanox.com> wrote:
>>>>>
>>>>> Use synchronize_rcu to wait until the XSK wakeup function finishes
>>>>> before destroying the resources it uses:
>>>>>
>>>>> 1. i40e_down already calls synchronize_rcu. On i40e_down either
>>>>> __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
>>>>> i40e_xsk_async_xmit (the former is already checked there).
>>>>>
>>>>> 2. After switching the XDP program, call synchronize_rcu to let
>>>>> i40e_xsk_async_xmit exit before the XDP program is freed.
>>>>>
>>>>> 3. Changing the number of channels brings the interface down (see
>>>>> i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
>>>>>
>>>>> 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
>>>>>
>>>>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>>>>> ---
>>>>>     drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++--
>>>>>     drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 4 ++++
>>>>>     2 files changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> index 1ccabeafa44c..afa3a99e68e1 100644
>>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>>>>> @@ -6823,8 +6823,8 @@ void i40e_down(struct i40e_vsi *vsi)
>>>>>            for (i = 0; i < vsi->num_queue_pairs; i++) {
>>>>>                    i40e_clean_tx_ring(vsi->tx_rings[i]);
>>>>>                    if (i40e_enabled_xdp_vsi(vsi)) {
>>>>> -                       /* Make sure that in-progress ndo_xdp_xmit
>>>>> -                        * calls are completed.
>>>>> +                       /* Make sure that in-progress ndo_xdp_xmit and
>>>>> +                        * ndo_xsk_async_xmit calls are completed.
>>>
>>> Ooops, I see now some changes were lost during rebasing. I had a version
>>> of this series with ndo_xsk_async_xmit -> ndo_xsk_wakeup fixed.
>>>
>>>>>                             */
>>>>>                            synchronize_rcu();
>>>>>                            i40e_clean_tx_ring(vsi->xdp_rings[i]);
>>>>> @@ -12545,6 +12545,9 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>>>>>                    i40e_prep_for_reset(pf, true);
>>>>>
>>>>>            old_prog = xchg(&vsi->xdp_prog, prog);
>>>>> +       if (old_prog)
>>>>> +               /* Wait until ndo_xsk_async_xmit completes. */
>>>>> +               synchronize_rcu();
>>>>
>>>> This is not needed -- please correct me if I'm missing something! If
>>>> we're disabling XDP, the need_reset-clause will take care or the
>>>> proper synchronization.
>>>
>>> Yes, it was added to cover the case of disabling XDP. I'm not sure how
>>> i40e_reset_and_rebuild will help here. What I wanted to achieve is to
>>> wait until all i40e_xsk_wakeups finish after setting vsi->xdp_prog =
>>> NULL and before doing anything else. I don't see how
>>> i40e_reset_and_rebuild helps here, but I'm not very familiar with i40e
>>> code. Could you guide me through the code of i40e_reset_and_rebuild and
>>> show how it takes care of the synchronization?
>>>
>>>> And we don't need to worry about old_prog for
>>>> the swap-XDP case,
>>>
>>> Right.
>>>
>>>> because bpf_prog_put() does cleanup with
>>>> call_rcu().
>>>
>>> But if i40e_xsk_wakeup is not wrapped with rcu_read_lock, it won't sync
>>> with it.
>>>
>>>>
>>>>>
>>>>>            if (need_reset)
>>>>>                    i40e_reset_and_rebuild(pf, true, true);
>>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>>> b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>>> index d07e1a890428..f73cd917c44f 100644
>>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>>>>> @@ -787,8 +787,12 @@ int i40e_xsk_wakeup(struct net_device *dev, u32
>>>>> queue_id, u32 flags)
>>>>>     {
>>>>>            struct i40e_netdev_priv *np = netdev_priv(dev);
>>>>>            struct i40e_vsi *vsi = np->vsi;
>>>>> +       struct i40e_pf *pf = vsi->back;
>>>>>            struct i40e_ring *ring;
>>>>>
>>>>> +       if (test_bit(__I40E_CONFIG_BUSY, pf->state))
>>>>> +               return -ENETDOWN;
>>>>> +
>>>>
>>>> You right that we need to check for BUSY, since the XDP ring might be
>>>> stale! Thanks for catching this! Can you respin this patch, with just
>>>> this hunk? (Unless I'm wrong! :-))
>>>
>>> This check itself will not be enough, unless you wrap i40e_xsk_wakeup
>>> with rcu_read_lock.
>>>
>>> i40e_xsk_wakeup does some checks in the beginning, then the main part of
>>> the code goes. My assumption is that if the checks don't pass, the main
>>> part will fail in some way (e.g., NULL or dangling pointer when
>>> accessing the ring), otherwise you wouldn't need those checks at all.
>>> Without RCU, even if you do these checks, it may still fail in the
>>> following scenario:
>>>
>>> 1. i40e_xsk_wakeup starts running, checks the flag, the flag is set, the
>>> function goes on.
>>>
>>> 2. The flag gets cleared.
>>>
>>> 3. The resources are destroyed.
>>>
>>> 4. i40e_xsk_wakeup tries to access the resources that were destroyed.
>>>
>>> I don't see anything in i40e and ixgbe that currently protects from such
>>> a race condition. If I'm missing anything, please point me to it,
>>> otherwise i40e_xsk_wakeup really needs to be wrapped into rcu_read_lock.
>>> I would prefer to have rcu_read_lock in the kernel, so that all drivers
>>> could benefit from it, because I think it's the same issue in all
>>> drivers, but I'm fine with moving it to the drivers if there is a reason
>>> why some drivers may not need it.
>>>
>>> Thanks for taking a look. Let's settle the case with Intel's drivers, so
>>> that I will know what fixes to include into the respin.
>>>
>>
>> Max, you're right. It's not correct without RCU locking. Thanks for the
>> patience.
>>
>> For the Intel ndo_xsk_wakeup implementation we need to make sure that
>> the 1. umem(socket) exists, and that the 2. XDP rings exists for the
>> duration of the call.
>>
>> 1. In xsk_unbind_dev() the state is changed to UNBOUND and then
>>      followed by synchronize_net(). It would be, as you wrote, incorrect
>>      without RCU locking the ndo_xsk_wakeup() region.
>>
>> 2. To ensure that the XDP rings are intact for the duration of the
>>      ndo_xsk_wakeup(), a synchronize_rcu() is required when the XDP
>>      program is swapped out (prog->NULL).
>>
>> Again, both 1 and 2 requires RCU locking.
>>
>> What do you think about the this patch for xsk.c (and the Intel
>> drivers)? It moves all ndo_xsk_wakeup calls do one place.
> 
> Looks good, please also see a few comments below.
> 
> Checking the __I40E_CONFIG_BUSY and __IXGBE_TX_DISABLED flags in the
> wakeup functions is still required, I believe. You didn't include these
> parts into your patch, I just want to make sure you don't suggest to
> drop these parts of my patches.
>

No, you're right. I missed that!


> How would you like to proceed with it? Do you want me to integrate your
> changes into my patches and respin with both my and your sign-offs?
>

That sounds like a good plan! Let me know if you'd like me to do it, 
given that you're on vacation!

>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h
>> b/drivers/net/ethernet/intel/i40e/i40e.h
>> index cb6367334ca7..4833187bd259 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>> @@ -1152,7 +1152,7 @@ void i40e_set_fec_in_flags(u8 fec_cfg, u32 *flags);
>>
>>    static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
>>    {
>> -    return !!vsi->xdp_prog;
>> +    return !!READ_ONCE(vsi->xdp_prog);
>>    }
>>
>>    int i40e_create_queue_channel(struct i40e_vsi *vsi, struct
>> i40e_channel *ch);
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> index 1ccabeafa44c..fd084f03f00d 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -12546,8 +12546,11 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
>>
>>        old_prog = xchg(&vsi->xdp_prog, prog);
>>
>> -    if (need_reset)
>> +    if (need_reset) {
>> +        if (!prog)
>> +            synchronize_rcu();
>>            i40e_reset_and_rebuild(pf, true, true);
>> +    }
>>
>>        for (i = 0; i < vsi->num_queue_pairs; i++)
>>            WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> index 25c097cd8100..adff2cbcde1a 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -10261,7 +10261,11 @@ static int ixgbe_xdp_setup(struct net_device
>> *dev, struct bpf_prog *prog)
>>
>>        /* If transitioning XDP modes reconfigure rings */
>>        if (need_reset) {
>> -        int err = ixgbe_setup_tc(dev, adapter->hw_tcs);
>> +        int err;
>> +
>> +        if (!prog)
>> +            synchronize_rcu();
>> +        err = ixgbe_setup_tc(dev, adapter->hw_tcs);
>>
>>            if (err) {
>>                rcu_assign_pointer(adapter->xdp_prog, old_prog);
>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>> index 956793893c9d..dbf06c3b7061 100644
>> --- a/net/xdp/xsk.c
>> +++ b/net/xdp/xsk.c
>> @@ -334,12 +334,20 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem,
>> struct xdp_desc *desc)
>>    }
>>    EXPORT_SYMBOL(xsk_umem_consume_tx);
>>
>> -static int xsk_zc_xmit(struct xdp_sock *xs)
>> +static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
>>    {
>>        struct net_device *dev = xs->dev;
>> +    int err;
>>
>> -    return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
>> -                           XDP_WAKEUP_TX);
>> +    rcu_read_lock();
>> +    err = dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags);
>> +    rcu_read_unlock();
>> +    return err;
>> +}
>> +
>> +static int xsk_zc_xmit(struct xdp_sock *xs)
>> +{
>> +    return xsk_wakeup(xs, XDP_WAKEUP_TX);
>>    }
>>
>>    static void xsk_destruct_skb(struct sk_buff *skb)
>> @@ -453,19 +461,16 @@ static __poll_t xsk_poll(struct file *file, struct
>> socket *sock,
>>        __poll_t mask = datagram_poll(file, sock, wait);
>>        struct sock *sk = sock->sk;
>>        struct xdp_sock *xs = xdp_sk(sk);
>> -    struct net_device *dev;
>>        struct xdp_umem *umem;
>>
>>        if (unlikely(!xsk_is_bound(xs)))
>>            return mask;
>>
>> -    dev = xs->dev;
>>        umem = xs->umem;
>>
>>        if (umem->need_wakeup) {
>> -        if (dev->netdev_ops->ndo_xsk_wakeup)
>> -            dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
>> -                            umem->need_wakeup);
>> +        if (xs->zc)
>> +            xsk_wakeup(umem->need_wakeup);
> 
> Thanks for spotting this. I missed the second place where wakeup is called.
> 
> BTW, does the current code contain a bug? The socket can be opened in
> copy mode even if ndo_xsk_wakeup is not NULL, and we'll call
> ndo_xsk_wakeup in this case, although it's not zero-copy. After your
> patch it will be fixed.
>

Ah, yes indeed. Good catch!

> P.S. I'll be on vacation, so I added my personal email to Cc to be able
> to receive responses.
>

ACK! ...and again let me know if you'd like me to spin the patches.


Cheers,
Björn


>>            else
>>                /* Poll needs to drive Tx also in copy mode */
>>                __xsk_sendmsg(sk);
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-12-16  7:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-05 15:51 [PATCH bpf 0/4] Fix concurrency issues between XSK wakeup and control path using RCU Maxim Mikityanskiy
2019-12-05 15:51 ` [PATCH bpf 1/4] xsk: Add rcu_read_lock around the XSK wakeup Maxim Mikityanskiy
2019-12-06  9:02   ` Björn Töpel
2019-12-05 15:51 ` [PATCH bpf 2/4] net/mlx5e: Fix concurrency issues between config flow and XSK Maxim Mikityanskiy
2019-12-06  9:03   ` Björn Töpel
2019-12-05 15:51 ` [PATCH bpf 3/4] net/i40e: " Maxim Mikityanskiy
2019-12-06  9:03   ` Björn Töpel
2019-12-10 14:26     ` Maxim Mikityanskiy
2019-12-11 10:08       ` Björn Töpel
2019-12-13 17:10         ` Maxim Mikityanskiy
2019-12-16  7:50           ` Björn Töpel
2019-12-05 15:51 ` [PATCH bpf 4/4] net/ixgbe: " Maxim Mikityanskiy
2019-12-06  9:03   ` Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).