On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote: > Saeed Mahameed writes: > > > I think that the mlx5 driver doesn't know how to tell the other > > device > > to stop transmitting to it while it is resetting.. Maybe tariq or > > Jesper know more about this ? > > I will look at this tomorrow after noon and will try to repro... > > Hi Saeed > > Did you have a chance to poke at this? :) HI Toke, yes i have been planing to respond but also i wanted to dig more, so the root cause is very clear. 1. core 1 is doing tx_dev->ndo_xdp_xmit() 2. core 2 is doing tx_dev->xdp_set() //remove xdp program. in mlx5 you must have xdp porgram on tx_dev in order to be able to use dev_map and ndo_xdp_xmit, due to the simple reason that we create unique TX resources (Send Queues/SQs) for xdp redirect/tx use case. so if you are removing xdp program on core2, driver will start destroying xdp redirect SQs, this safe for xdp rx and fwd since we use napi_synchronize. But for xdp redirect, we don't have the means to synchronize with a different napi device and yet on a different cpu ! so if core 1 got past the below check in mlx5/core/en/xdp.c @ mlx5e_xdp_xmit if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))) return -ENETDOWN; and at the same moment core 2 destroyed that SQ: @mlx5e_xdp_set -> mlx5e_close_locked This SQ is no longer available and core 1 will be writing descriptors on a freed dma buffer. and the problem is beyond mlx5, since we don't have a way to tell a different core/different netdev to stop xmitting, or at least synchronize with it. Assuming napi is polled under rcu read lock, then synchronize_net might help ass suggested in the attached temporary fix. the idea is to set a flag for mlx5e_xdp_tx ndo to check if tx resources are valid, and to set it on and of on xdp_set ndo with synchronize_net to synchronize with ongoing xdp redirection. I am still not sure if napi is polled under rcu read lock, if it's not and synchronize_net() didn't help then replace it with msleep(200), should be enough for now. I've managed to reproduce and verify the fix with even one direction of xdp_redirect and by just removing the xdp program on the tx side while xdp redirection was ongoing. #run xdp redirect RX_IF=p6p1 TX_IF=p5p1 ./samples/bpf/xdp_redirect_map $( Date: Fri, 19 Oct 2018 14:59:00 -0700 Subject: [PATCH] net/mlx5e: XDP redirect bug fix Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 +++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++++++++++++-- 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index aea74856c702..4228aafc0165 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -620,6 +620,7 @@ struct mlx5e_channels { struct mlx5e_channel **c; unsigned int num; struct mlx5e_params params; + bool xdp_disabled; }; struct mlx5e_channel_stats { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index ad6d471d00dd..c338b7b8f838 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -268,6 +268,9 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) return -EINVAL; + if (unlikely(READ_ONCE(priv->channels.xdp_disabled))) + return -ENETDOWN; + sq_num = smp_processor_id(); if (unlikely(sq_num >= priv->channels.num)) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 0d495a6b3949..a2d8a52ae469 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4237,8 +4237,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) /* no need for full reset when exchanging programs */ reset = (!priv->channels.params.xdp_prog || !prog); - if (was_opened && reset) + if (was_opened && reset) { + for (i = 0; i < priv->channels.num; i++) + clear_bit(MLX5E_SQ_STATE_ENABLED, &priv- >channels.c[i]->xdpsq.state); + priv->channels.xdp_disabled = true; + + synchronize_net(); + //msleep(200); + mlx5e_close_locked(netdev); + } + if (was_opened && !reset) { /* num_channels is invariant here, so we can take the * batched reference right upfront. @@ -4260,8 +4269,10 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) if (reset) /* change RQ type according to priv->xdp_prog */ mlx5e_set_rq_type(priv->mdev, &priv->channels.params); - if (was_opened && reset) + if (was_opened && reset) { mlx5e_open_locked(netdev); + priv->channels.xdp_disabled = false; + } if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset) goto unlock; -- 2.17.2 > > -Toke