All of lore.kernel.org
 help / color / mirror / Atom feed
* [pull request][net 00/10] mlx5 fixes 2023-02-07
@ 2023-02-08  3:02 Saeed Mahameed
  2023-02-08  3:02 ` [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change Saeed Mahameed
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit 565b4824c39fa335cba2028a09d7beb7112f3c9a:

  devlink: change port event netdev notifier from per-net to global (2023-02-07 14:13:55 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2023-02-07

for you to fetch changes up to 8f0d1451ecf7b3bd5a06ffc866c753d0f3ab4683:

  net/mlx5: Serialize module cleanup with reload and remove (2023-02-07 19:01:07 -0800)

----------------------------------------------------------------
mlx5-fixes-2023-02-07

----------------------------------------------------------------
Adham Faris (1):
      net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change

Amir Tzin (1):
      net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode

Dragos Tatulea (1):
      net/mlx5e: IPoIB, Show unknown speed instead of error

Maher Sanalla (2):
      net/mlx5: Store page counters in a single array
      net/mlx5: Expose SF firmware pages counter

Shay Drory (3):
      net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
      net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
      net/mlx5: Serialize module cleanup with reload and remove

Vlad Buslov (1):
      net/mlx5: Bridge, fix ageing of peer FDB entries

Yevgeny Kliteynik (1):
      net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic

 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |  5 +-
 .../ethernet/mellanox/mlx5/core/diag/fw_tracer.c   |  3 +-
 drivers/net/ethernet/mellanox/mlx5/core/ecpf.c     |  2 +-
 .../ethernet/mellanox/mlx5/core/en/rep/bridge.c    |  4 -
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 90 +++++-----------------
 .../net/ethernet/mellanox/mlx5/core/esw/bridge.c   |  2 +-
 .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c    | 13 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 14 ++--
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c    | 37 +++++----
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c    |  2 +-
 .../ethernet/mellanox/mlx5/core/steering/dr_rule.c | 25 +++---
 include/linux/mlx5/driver.h                        | 13 +++-
 13 files changed, 86 insertions(+), 126 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-09  5:00   ` patchwork-bot+netdevbpf
  2023-02-08  3:02 ` [net 02/10] net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic Saeed Mahameed
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Adham Faris

From: Adham Faris <afaris@nvidia.com>

rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear()
to catch oversized packets. If FCS is concatenated to the end of the
packet then the check should be updated accordingly.

Rx rings initialization (mlx5e_init_rxq_rq()) invoked for every new set
of channels, as part of mlx5e_safe_switch_params(), unknowingly if it
runs with default configuration or not. Current rq->hw_mtu
initialization assumes default configuration and ignores
params->scatter_fcs_en flag state.
Fix this, by accounting for params->scatter_fcs_en flag state during
rq->hw_mtu initialization.

In addition, updating rq->hw_mtu value during ingress traffic might
lead to packets drop and oversize_pkts_sw_drop counter increase with no
good reason. Hence we remove this optimization and switch the set of
channels with a new one, to make sure we don't get false positives on
the oversize_pkts_sw_drop counter.

Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 86 ++++---------------
 1 file changed, 15 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index abcc614b6191..f8b5e5993f35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -591,7 +591,8 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param
 	rq->ix           = c->ix;
 	rq->channel      = c;
 	rq->mdev         = mdev;
-	rq->hw_mtu       = MLX5E_SW2HW_MTU(params, params->sw_mtu);
+	rq->hw_mtu =
+		MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN * !params->scatter_fcs_en;
 	rq->xdpsq        = &c->rq_xdpsq;
 	rq->stats        = &c->priv->channel_stats[c->ix]->rq;
 	rq->ptp_cyc2time = mlx5_rq_ts_translator(mdev);
@@ -1014,35 +1015,6 @@ int mlx5e_flush_rq(struct mlx5e_rq *rq, int curr_state)
 	return mlx5e_rq_to_ready(rq, curr_state);
 }
 
-static int mlx5e_modify_rq_scatter_fcs(struct mlx5e_rq *rq, bool enable)
-{
-	struct mlx5_core_dev *mdev = rq->mdev;
-
-	void *in;
-	void *rqc;
-	int inlen;
-	int err;
-
-	inlen = MLX5_ST_SZ_BYTES(modify_rq_in);
-	in = kvzalloc(inlen, GFP_KERNEL);
-	if (!in)
-		return -ENOMEM;
-
-	rqc = MLX5_ADDR_OF(modify_rq_in, in, ctx);
-
-	MLX5_SET(modify_rq_in, in, rq_state, MLX5_RQC_STATE_RDY);
-	MLX5_SET64(modify_rq_in, in, modify_bitmask,
-		   MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS);
-	MLX5_SET(rqc, rqc, scatter_fcs, enable);
-	MLX5_SET(rqc, rqc, state, MLX5_RQC_STATE_RDY);
-
-	err = mlx5_core_modify_rq(mdev, rq->rqn, in);
-
-	kvfree(in);
-
-	return err;
-}
-
 static int mlx5e_modify_rq_vsd(struct mlx5e_rq *rq, bool vsd)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
@@ -3314,20 +3286,6 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 	mlx5e_destroy_tises(priv);
 }
 
-static int mlx5e_modify_channels_scatter_fcs(struct mlx5e_channels *chs, bool enable)
-{
-	int err = 0;
-	int i;
-
-	for (i = 0; i < chs->num; i++) {
-		err = mlx5e_modify_rq_scatter_fcs(&chs->c[i]->rq, enable);
-		if (err)
-			return err;
-	}
-
-	return 0;
-}
-
 static int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd)
 {
 	int err;
@@ -3903,41 +3861,27 @@ static int mlx5e_set_rx_port_ts(struct mlx5_core_dev *mdev, bool enable)
 	return mlx5_set_ports_check(mdev, in, sizeof(in));
 }
 
+static int mlx5e_set_rx_port_ts_wrap(struct mlx5e_priv *priv, void *ctx)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	bool enable = *(bool *)ctx;
+
+	return mlx5e_set_rx_port_ts(mdev, enable);
+}
+
 static int set_feature_rx_fcs(struct net_device *netdev, bool enable)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5e_channels *chs = &priv->channels;
-	struct mlx5_core_dev *mdev = priv->mdev;
+	struct mlx5e_params new_params;
 	int err;
 
 	mutex_lock(&priv->state_lock);
 
-	if (enable) {
-		err = mlx5e_set_rx_port_ts(mdev, false);
-		if (err)
-			goto out;
-
-		chs->params.scatter_fcs_en = true;
-		err = mlx5e_modify_channels_scatter_fcs(chs, true);
-		if (err) {
-			chs->params.scatter_fcs_en = false;
-			mlx5e_set_rx_port_ts(mdev, true);
-		}
-	} else {
-		chs->params.scatter_fcs_en = false;
-		err = mlx5e_modify_channels_scatter_fcs(chs, false);
-		if (err) {
-			chs->params.scatter_fcs_en = true;
-			goto out;
-		}
-		err = mlx5e_set_rx_port_ts(mdev, true);
-		if (err) {
-			mlx5_core_warn(mdev, "Failed to set RX port timestamp %d\n", err);
-			err = 0;
-		}
-	}
-
-out:
+	new_params = chs->params;
+	new_params.scatter_fcs_en = enable;
+	err = mlx5e_safe_switch_params(priv, &new_params, mlx5e_set_rx_port_ts_wrap,
+				       &new_params.scatter_fcs_en, true);
 	mutex_unlock(&priv->state_lock);
 	return err;
 }
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 02/10] net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
  2023-02-08  3:02 ` [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-08  3:02 ` [net 03/10] net/mlx5: Bridge, fix ageing of peer FDB entries Saeed Mahameed
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Yevgeny Kliteynik, Alex Vesker

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Selecting builder should be protected by the lock to prevent the case
where a new rule sets a builder in the nic_matcher while the previous
rule is still using the nic_matcher.

Fixing this issue and cleaning the error flow.

Fixes: b9b81e1e9382 ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_rule.c     | 25 +++++++++++--------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
index b851141e03de..042ca0349124 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
@@ -1138,12 +1138,14 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 			 rule->flow_source))
 		return 0;
 
+	mlx5dr_domain_nic_lock(nic_dmn);
+
 	ret = mlx5dr_matcher_select_builders(matcher,
 					     nic_matcher,
 					     dr_rule_get_ipv(&param->outer),
 					     dr_rule_get_ipv(&param->inner));
 	if (ret)
-		return ret;
+		goto err_unlock;
 
 	hw_ste_arr_is_opt = nic_matcher->num_of_builders <= DR_RULE_MAX_STES_OPTIMIZED;
 	if (likely(hw_ste_arr_is_opt)) {
@@ -1152,12 +1154,12 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 		hw_ste_arr = kzalloc((nic_matcher->num_of_builders + DR_ACTION_MAX_STES) *
 				     DR_STE_SIZE, GFP_KERNEL);
 
-		if (!hw_ste_arr)
-			return -ENOMEM;
+		if (!hw_ste_arr) {
+			ret = -ENOMEM;
+			goto err_unlock;
+		}
 	}
 
-	mlx5dr_domain_nic_lock(nic_dmn);
-
 	ret = mlx5dr_matcher_add_to_tbl_nic(dmn, nic_matcher);
 	if (ret)
 		goto free_hw_ste;
@@ -1223,7 +1225,10 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 
 	mlx5dr_domain_nic_unlock(nic_dmn);
 
-	goto out;
+	if (unlikely(!hw_ste_arr_is_opt))
+		kfree(hw_ste_arr);
+
+	return 0;
 
 free_rule:
 	dr_rule_clean_rule_members(rule, nic_rule);
@@ -1238,12 +1243,12 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 		mlx5dr_matcher_remove_from_tbl_nic(dmn, nic_matcher);
 
 free_hw_ste:
-	mlx5dr_domain_nic_unlock(nic_dmn);
-
-out:
-	if (unlikely(!hw_ste_arr_is_opt))
+	if (!hw_ste_arr_is_opt)
 		kfree(hw_ste_arr);
 
+err_unlock:
+	mlx5dr_domain_nic_unlock(nic_dmn);
+
 	return ret;
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 03/10] net/mlx5: Bridge, fix ageing of peer FDB entries
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
  2023-02-08  3:02 ` [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change Saeed Mahameed
  2023-02-08  3:02 ` [net 02/10] net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-08  3:02 ` [net 04/10] net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode Saeed Mahameed
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Vlad Buslov, Maor Dickman

From: Vlad Buslov <vladbu@nvidia.com>

SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates FDB entry 'lastuse'
field is only executed for eswitch that owns the entry. However, if peer
entry processed packets at least once it will have hardware counter 'used'
value greater than entry 'lastuse' from that point on, which will cause FDB
entry not being aged out.

Process the event on all eswitch instances.

Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c | 4 ----
 drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c    | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
index 8099a21e674c..ce85b48d327d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
@@ -438,10 +438,6 @@ static int mlx5_esw_bridge_switchdev_event(struct notifier_block *nb,
 
 	switch (event) {
 	case SWITCHDEV_FDB_ADD_TO_BRIDGE:
-		/* only handle the event on native eswtich of representor */
-		if (!mlx5_esw_bridge_is_local(dev, rep, esw))
-			break;
-
 		fdb_info = container_of(info,
 					struct switchdev_notifier_fdb_info,
 					info);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
index b176648d1343..3cdcb0e0b20f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
@@ -1715,7 +1715,7 @@ void mlx5_esw_bridge_fdb_update_used(struct net_device *dev, u16 vport_num, u16
 	struct mlx5_esw_bridge *bridge;
 
 	port = mlx5_esw_bridge_port_lookup(vport_num, esw_owner_vhca_id, br_offloads);
-	if (!port || port->flags & MLX5_ESW_BRIDGE_PORT_FLAG_PEER)
+	if (!port)
 		return;
 
 	bridge = port->bridge;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 04/10] net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2023-02-08  3:02 ` [net 03/10] net/mlx5: Bridge, fix ageing of peer FDB entries Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-08  3:02 ` [net 05/10] net/mlx5e: IPoIB, Show unknown speed instead of error Saeed Mahameed
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Amir Tzin, Maor Dickman

From: Amir Tzin <amirtz@nvidia.com>

Moving to switchdev mode with rx-vlan-filter on and then setting it off
causes the kernel to crash since fs->vlan is freed during nic profile
cleanup flow.

RX VLAN filtering is not supported in switchdev mode so unset it when
changing to switchdev and restore its value when switching back to
legacy.

trace:
[] RIP: 0010:mlx5e_disable_cvlan_filter+0x43/0x70
[] set_feature_cvlan_filter+0x37/0x40 [mlx5_core]
[] mlx5e_handle_feature+0x3a/0x60 [mlx5_core]
[] mlx5e_set_features+0x6d/0x160 [mlx5_core]
[] __netdev_update_features+0x288/0xa70
[] ethnl_set_features+0x309/0x380
[] ? __nla_parse+0x21/0x30
[] genl_family_rcv_msg_doit.isra.17+0x110/0x150
[] genl_rcv_msg+0x112/0x260
[] ? features_reply_size+0xe0/0xe0
[] ? genl_family_rcv_msg_doit.isra.17+0x150/0x150
[] netlink_rcv_skb+0x4e/0x100
[] genl_rcv+0x24/0x40
[] netlink_unicast+0x1ab/0x290
[] netlink_sendmsg+0x257/0x4f0
[] sock_sendmsg+0x5c/0x70

Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 1892ccb889b3..7cd36f4ac3ef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -443,7 +443,7 @@ void mlx5e_enable_cvlan_filter(struct mlx5e_flow_steering *fs, bool promisc)
 
 void mlx5e_disable_cvlan_filter(struct mlx5e_flow_steering *fs, bool promisc)
 {
-	if (fs->vlan->cvlan_filter_disabled)
+	if (!fs->vlan || fs->vlan->cvlan_filter_disabled)
 		return;
 
 	fs->vlan->cvlan_filter_disabled = true;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f8b5e5993f35..6c24f33a5ea5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4018,6 +4018,10 @@ static netdev_features_t mlx5e_fix_uplink_rep_features(struct net_device *netdev
 	if (netdev->features & NETIF_F_GRO_HW)
 		netdev_warn(netdev, "Disabling HW_GRO, not supported in switchdev mode\n");
 
+	features &= ~NETIF_F_HW_VLAN_CTAG_FILTER;
+	if (netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
+		netdev_warn(netdev, "Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode\n");
+
 	return features;
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 05/10] net/mlx5e: IPoIB, Show unknown speed instead of error
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2023-02-08  3:02 ` [net 04/10] net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-08  3:02 ` [net 06/10] net/mlx5: Store page counters in a single array Saeed Mahameed
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Dragos Tatulea, Gal Pressman

From: Dragos Tatulea <dtatulea@nvidia.com>

ethtool is returning an error for unknown speeds for the IPoIB interface:

$ ethtool ib0
netlink error: failed to retrieve link settings
netlink error: Invalid argument
netlink error: failed to retrieve link settings
netlink error: Invalid argument
Settings for ib0:
Link detected: no

After this change, ethtool will return success and show "unknown speed":

$ ethtool ib0
Settings for ib0:
Supported ports: [  ]
Supported link modes:   Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: no

Fixes: eb234ee9d541 ("net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index eff92dc0927c..e09518f887a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -189,16 +189,16 @@ static inline int mlx5_ptys_rate_enum_to_int(enum mlx5_ptys_rate rate)
 	}
 }
 
-static int mlx5i_get_speed_settings(u16 ib_link_width_oper, u16 ib_proto_oper)
+static u32 mlx5i_get_speed_settings(u16 ib_link_width_oper, u16 ib_proto_oper)
 {
 	int rate, width;
 
 	rate = mlx5_ptys_rate_enum_to_int(ib_proto_oper);
 	if (rate < 0)
-		return -EINVAL;
+		return SPEED_UNKNOWN;
 	width = mlx5_ptys_width_enum_to_int(ib_link_width_oper);
 	if (width < 0)
-		return -EINVAL;
+		return SPEED_UNKNOWN;
 
 	return rate * width;
 }
@@ -221,16 +221,13 @@ static int mlx5i_get_link_ksettings(struct net_device *netdev,
 	ethtool_link_ksettings_zero_link_mode(link_ksettings, advertising);
 
 	speed = mlx5i_get_speed_settings(ib_link_width_oper, ib_proto_oper);
-	if (speed < 0)
-		return -EINVAL;
+	link_ksettings->base.speed = speed;
+	link_ksettings->base.duplex = speed == SPEED_UNKNOWN ? DUPLEX_UNKNOWN : DUPLEX_FULL;
 
-	link_ksettings->base.duplex = DUPLEX_FULL;
 	link_ksettings->base.port = PORT_OTHER;
 
 	link_ksettings->base.autoneg = AUTONEG_DISABLE;
 
-	link_ksettings->base.speed = speed;
-
 	return 0;
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 06/10] net/mlx5: Store page counters in a single array
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2023-02-08  3:02 ` [net 05/10] net/mlx5e: IPoIB, Show unknown speed instead of error Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-08  3:02 ` [net 07/10] net/mlx5: Expose SF firmware pages counter Saeed Mahameed
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maher Sanalla, Shay Drory

From: Maher Sanalla <msanalla@nvidia.com>

Currently, an independent page counter is used for tracking memory usage
for each function type such as VF, PF and host PF (DPU).

For better code-readibilty, use a single array that stores
the number of allocated memory pages for each function type.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/debugfs.c |  4 +-
 .../net/ethernet/mellanox/mlx5/core/ecpf.c    |  2 +-
 .../ethernet/mellanox/mlx5/core/pagealloc.c   | 37 +++++++++++--------
 .../net/ethernet/mellanox/mlx5/core/sriov.c   |  2 +-
 include/linux/mlx5/driver.h                   | 12 ++++--
 5 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index 3e232a65a0c3..c3e7c24a0971 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -245,8 +245,8 @@ void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev)
 	pages = dev->priv.dbg.pages_debugfs;
 
 	debugfs_create_u32("fw_pages_total", 0400, pages, &dev->priv.fw_pages);
-	debugfs_create_u32("fw_pages_vfs", 0400, pages, &dev->priv.vfs_pages);
-	debugfs_create_u32("fw_pages_host_pf", 0400, pages, &dev->priv.host_pf_pages);
+	debugfs_create_u32("fw_pages_vfs", 0400, pages, &dev->priv.page_counters[MLX5_VF]);
+	debugfs_create_u32("fw_pages_host_pf", 0400, pages, &dev->priv.page_counters[MLX5_HOST_PF]);
 	debugfs_create_u32("fw_pages_alloc_failed", 0400, pages, &dev->priv.fw_pages_alloc_failed);
 	debugfs_create_u32("fw_pages_give_dropped", 0400, pages, &dev->priv.give_pages_dropped);
 	debugfs_create_u32("fw_pages_reclaim_discard", 0400, pages,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index 464eb3a18450..cdc87ecae5d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -87,7 +87,7 @@ void mlx5_ec_cleanup(struct mlx5_core_dev *dev)
 
 	mlx5_host_pf_cleanup(dev);
 
-	err = mlx5_wait_for_pages(dev, &dev->priv.host_pf_pages);
+	err = mlx5_wait_for_pages(dev, &dev->priv.page_counters[MLX5_HOST_PF]);
 	if (err)
 		mlx5_core_warn(dev, "Timeout reclaiming external host PF pages err(%d)\n", err);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 60596357bfc7..9f99292ab5ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -74,6 +74,14 @@ static u32 get_function(u16 func_id, bool ec_function)
 	return (u32)func_id | (ec_function << 16);
 }
 
+static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_function)
+{
+	if (!func_id)
+		return mlx5_core_is_ecpf(dev) && !ec_function ? MLX5_HOST_PF : MLX5_PF;
+
+	return MLX5_VF;
+}
+
 static struct rb_root *page_root_per_function(struct mlx5_core_dev *dev, u32 function)
 {
 	struct rb_root *root;
@@ -332,6 +340,7 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
 	int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);
 	int notify_fail = event;
+	u16 func_type;
 	u64 addr;
 	int err;
 	u32 *in;
@@ -383,11 +392,9 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 		goto out_dropped;
 	}
 
+	func_type = func_id_to_type(dev, func_id, ec_function);
+	dev->priv.page_counters[func_type] += npages;
 	dev->priv.fw_pages += npages;
-	if (func_id)
-		dev->priv.vfs_pages += npages;
-	else if (mlx5_core_is_ecpf(dev) && !ec_function)
-		dev->priv.host_pf_pages += npages;
 
 	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x, err %d\n",
 		      npages, ec_function, func_id, err);
@@ -414,6 +421,7 @@ static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
 	struct rb_root *root;
 	struct rb_node *p;
 	int npages = 0;
+	u16 func_type;
 
 	root = xa_load(&dev->priv.page_root_xa, function);
 	if (WARN_ON_ONCE(!root))
@@ -428,11 +436,9 @@ static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
 		free_fwp(dev, fwp, fwp->free_count);
 	}
 
+	func_type = func_id_to_type(dev, func_id, ec_function);
+	dev->priv.page_counters[func_type] -= npages;
 	dev->priv.fw_pages -= npages;
-	if (func_id)
-		dev->priv.vfs_pages -= npages;
-	else if (mlx5_core_is_ecpf(dev) && !ec_function)
-		dev->priv.host_pf_pages -= npages;
 
 	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x\n",
 		      npages, ec_function, func_id);
@@ -498,6 +504,7 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	int outlen = MLX5_ST_SZ_BYTES(manage_pages_out);
 	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
 	int num_claimed;
+	u16 func_type;
 	u32 *out;
 	int err;
 	int i;
@@ -549,11 +556,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	if (nclaimed)
 		*nclaimed = num_claimed;
 
+	func_type = func_id_to_type(dev, func_id, ec_function);
+	dev->priv.page_counters[func_type] -= num_claimed;
 	dev->priv.fw_pages -= num_claimed;
-	if (func_id)
-		dev->priv.vfs_pages -= num_claimed;
-	else if (mlx5_core_is_ecpf(dev) && !ec_function)
-		dev->priv.host_pf_pages -= num_claimed;
 
 out_free:
 	kvfree(out);
@@ -706,12 +711,12 @@ int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev)
 	WARN(dev->priv.fw_pages,
 	     "FW pages counter is %d after reclaiming all pages\n",
 	     dev->priv.fw_pages);
-	WARN(dev->priv.vfs_pages,
+	WARN(dev->priv.page_counters[MLX5_VF],
 	     "VFs FW pages counter is %d after reclaiming all pages\n",
-	     dev->priv.vfs_pages);
-	WARN(dev->priv.host_pf_pages,
+	     dev->priv.page_counters[MLX5_VF]);
+	WARN(dev->priv.page_counters[MLX5_HOST_PF],
 	     "External host PF FW pages counter is %d after reclaiming all pages\n",
-	     dev->priv.host_pf_pages);
+	     dev->priv.page_counters[MLX5_HOST_PF]);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index c0e6c487c63c..3008e9ce2bbf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -147,7 +147,7 @@ mlx5_device_disable_sriov(struct mlx5_core_dev *dev, int num_vfs, bool clear_vf)
 
 	mlx5_eswitch_disable_sriov(dev->priv.eswitch, clear_vf);
 
-	if (mlx5_wait_for_pages(dev, &dev->priv.vfs_pages))
+	if (mlx5_wait_for_pages(dev, &dev->priv.page_counters[MLX5_VF]))
 		mlx5_core_warn(dev, "timeout reclaiming VFs pages\n");
 }
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 76ef2e4fde38..82a9bd4274b8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -573,6 +573,13 @@ struct mlx5_debugfs_entries {
 	struct dentry *lag_debugfs;
 };
 
+enum mlx5_func_type {
+	MLX5_PF,
+	MLX5_VF,
+	MLX5_HOST_PF,
+	MLX5_FUNC_TYPE_NUM,
+};
+
 struct mlx5_ft_pool;
 struct mlx5_priv {
 	/* IRQ table valid only for real pci devices PF or VF */
@@ -583,11 +590,10 @@ struct mlx5_priv {
 	struct mlx5_nb          pg_nb;
 	struct workqueue_struct *pg_wq;
 	struct xarray           page_root_xa;
-	u32			fw_pages;
 	atomic_t		reg_pages;
 	struct list_head	free_list;
-	u32			vfs_pages;
-	u32			host_pf_pages;
+	u32			fw_pages;
+	u32			page_counters[MLX5_FUNC_TYPE_NUM];
 	u32			fw_pages_alloc_failed;
 	u32			give_pages_dropped;
 	u32			reclaim_pages_discard;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 07/10] net/mlx5: Expose SF firmware pages counter
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2023-02-08  3:02 ` [net 06/10] net/mlx5: Store page counters in a single array Saeed Mahameed
@ 2023-02-08  3:02 ` Saeed Mahameed
  2023-02-08  3:03 ` [net 08/10] net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers Saeed Mahameed
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:02 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maher Sanalla, Shay Drory

From: Maher Sanalla <msanalla@nvidia.com>

Currently, each core device has VF pages counter which stores number of
fw pages used by its VFs and SFs.

The current design led to a hang when performing firmware reset on DPU,
where the DPU PFs stalled in sriov unload flow due to waiting on release
of SFs pages instead of waiting on only VFs pages.

Thus, Add a separate counter for SF firmware pages, which will prevent
the stall scenario described above.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c   | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
 include/linux/mlx5/driver.h                         | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index c3e7c24a0971..bb95b40d25eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -246,6 +246,7 @@ void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev)
 
 	debugfs_create_u32("fw_pages_total", 0400, pages, &dev->priv.fw_pages);
 	debugfs_create_u32("fw_pages_vfs", 0400, pages, &dev->priv.page_counters[MLX5_VF]);
+	debugfs_create_u32("fw_pages_sfs", 0400, pages, &dev->priv.page_counters[MLX5_SF]);
 	debugfs_create_u32("fw_pages_host_pf", 0400, pages, &dev->priv.page_counters[MLX5_HOST_PF]);
 	debugfs_create_u32("fw_pages_alloc_failed", 0400, pages, &dev->priv.fw_pages_alloc_failed);
 	debugfs_create_u32("fw_pages_give_dropped", 0400, pages, &dev->priv.give_pages_dropped);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 9f99292ab5ce..0eb50be175cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -79,7 +79,7 @@ static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_funct
 	if (!func_id)
 		return mlx5_core_is_ecpf(dev) && !ec_function ? MLX5_HOST_PF : MLX5_PF;
 
-	return MLX5_VF;
+	return func_id <= mlx5_core_max_vfs(dev) ?  MLX5_VF : MLX5_SF;
 }
 
 static struct rb_root *page_root_per_function(struct mlx5_core_dev *dev, u32 function)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 82a9bd4274b8..333c1fec72f8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -576,6 +576,7 @@ struct mlx5_debugfs_entries {
 enum mlx5_func_type {
 	MLX5_PF,
 	MLX5_VF,
+	MLX5_SF,
 	MLX5_HOST_PF,
 	MLX5_FUNC_TYPE_NUM,
 };
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 08/10] net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2023-02-08  3:02 ` [net 07/10] net/mlx5: Expose SF firmware pages counter Saeed Mahameed
@ 2023-02-08  3:03 ` Saeed Mahameed
  2023-02-08  3:03 ` [net 09/10] net/mlx5: fw_tracer, Zero consumer index when reloading the tracer Saeed Mahameed
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

Whenever the driver is reading the string DBs into buffers, the driver
is setting the load bit, but the driver never clears this bit.
As a result, in case load bit is on and the driver query the device for
new string DBs, the driver won't read again the string DBs.
Fix it by clearing the load bit when query the device for new string
DBs.

Fixes: 2d69356752ff ("net/mlx5: Add support for fw live patch event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index 21831386b26e..d82e98a0cdfa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -64,6 +64,7 @@ static int mlx5_query_mtrc_caps(struct mlx5_fw_tracer *tracer)
 			MLX5_GET(mtrc_cap, out, num_string_trace);
 	tracer->str_db.num_string_db = MLX5_GET(mtrc_cap, out, num_string_db);
 	tracer->owner = !!MLX5_GET(mtrc_cap, out, trace_owner);
+	tracer->str_db.loaded = false;
 
 	for (i = 0; i < tracer->str_db.num_string_db; i++) {
 		mtrc_cap_sp = MLX5_ADDR_OF(mtrc_cap, out, string_db_param[i]);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 09/10] net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2023-02-08  3:03 ` [net 08/10] net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers Saeed Mahameed
@ 2023-02-08  3:03 ` Saeed Mahameed
  2023-02-08  3:03 ` [net 10/10] net/mlx5: Serialize module cleanup with reload and remove Saeed Mahameed
  2023-02-08 12:40 ` [pull request][net 00/10] mlx5 fixes 2023-02-07 Vadim Fedorenko
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory

From: Shay Drory <shayd@nvidia.com>

When tracer is reloaded, the device will log the traces at the
beginning of the log buffer. Also, driver is reading the log buffer in
chunks in accordance to the consumer index.
Hence, zero consumer index when reloading the tracer.

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index d82e98a0cdfa..5b05b884b5fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -757,6 +757,7 @@ static int mlx5_fw_tracer_set_mtrc_conf(struct mlx5_fw_tracer *tracer)
 	if (err)
 		mlx5_core_warn(dev, "FWTracer: Failed to set tracer configurations %d\n", err);
 
+	tracer->buff.consumer_index = 0;
 	return err;
 }
 
@@ -821,7 +822,6 @@ static void mlx5_fw_tracer_ownership_change(struct work_struct *work)
 	mlx5_core_dbg(tracer->dev, "FWTracer: ownership changed, current=(%d)\n", tracer->owner);
 	if (tracer->owner) {
 		tracer->owner = false;
-		tracer->buff.consumer_index = 0;
 		return;
 	}
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [net 10/10] net/mlx5: Serialize module cleanup with reload and remove
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2023-02-08  3:03 ` [net 09/10] net/mlx5: fw_tracer, Zero consumer index when reloading the tracer Saeed Mahameed
@ 2023-02-08  3:03 ` Saeed Mahameed
  2023-02-08 12:40 ` [pull request][net 00/10] mlx5 fixes 2023-02-07 Vadim Fedorenko
  10 siblings, 0 replies; 18+ messages in thread
From: Saeed Mahameed @ 2023-02-08  3:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

Currently, remove and reload flows can run in parallel to module cleanup.
This design is error prone. For example: aux_drivers callbacks are called
from both cleanup and remove flows with different lockings, which can
cause a deadlock[1].
Hence, serialize module cleanup with reload and remove.

[1]
       cleanup                        remove
       -------                        ------
   auxiliary_driver_unregister();
                                     devl_lock()
                                      auxiliary_device_delete(mlx5e_aux)
    device_lock(mlx5e_aux)
     devl_lock()
                                       device_lock(mlx5e_aux)

Fixes: 912cebf420c2 ("net/mlx5e: Connect ethernet part to auxiliary bus")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3d5f2a4b1fed..4e1b5757528a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -2110,7 +2110,7 @@ static int __init mlx5_init(void)
 	mlx5_core_verify_params();
 	mlx5_register_debugfs();
 
-	err = pci_register_driver(&mlx5_core_driver);
+	err = mlx5e_init();
 	if (err)
 		goto err_debug;
 
@@ -2118,16 +2118,16 @@ static int __init mlx5_init(void)
 	if (err)
 		goto err_sf;
 
-	err = mlx5e_init();
+	err = pci_register_driver(&mlx5_core_driver);
 	if (err)
-		goto err_en;
+		goto err_pci;
 
 	return 0;
 
-err_en:
+err_pci:
 	mlx5_sf_driver_unregister();
 err_sf:
-	pci_unregister_driver(&mlx5_core_driver);
+	mlx5e_cleanup();
 err_debug:
 	mlx5_unregister_debugfs();
 	return err;
@@ -2135,9 +2135,9 @@ static int __init mlx5_init(void)
 
 static void __exit mlx5_cleanup(void)
 {
-	mlx5e_cleanup();
-	mlx5_sf_driver_unregister();
 	pci_unregister_driver(&mlx5_core_driver);
+	mlx5_sf_driver_unregister();
+	mlx5e_cleanup();
 	mlx5_unregister_debugfs();
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
  2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2023-02-08  3:03 ` [net 10/10] net/mlx5: Serialize module cleanup with reload and remove Saeed Mahameed
@ 2023-02-08 12:40 ` Vadim Fedorenko
       [not found]   ` <DM5PR12MB134054EC92BC13E36B6C5711B3D89@DM5PR12MB1340.namprd12.prod.outlook.com>
  10 siblings, 1 reply; 18+ messages in thread
From: Vadim Fedorenko @ 2023-02-08 12:40 UTC (permalink / raw)
  To: Saeed Mahameed, Jakub Kicinski; +Cc: Saeed Mahameed, netdev, Tariq Toukan

On 08/02/2023 03:02, Saeed Mahameed wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> This series provides bug fixes to mlx5 driver.
> Please pull and let me know if there is any problem.
> 
Still no patches for PTP queue? That's a bit wierd.
Do you think that they are not ready to be in -net?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
       [not found]   ` <DM5PR12MB134054EC92BC13E36B6C5711B3D89@DM5PR12MB1340.namprd12.prod.outlook.com>
@ 2023-02-08 21:16     ` Rahul Rameshbabu
  2023-02-08 21:36       ` Vadim Fedorenko
  0 siblings, 1 reply; 18+ messages in thread
From: Rahul Rameshbabu @ 2023-02-08 21:16 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Vadim Fedorenko, Saeed Mahameed, Jakub Kicinski, netdev, Tariq Toukan

On Wed, 08 Feb, 2023 12:52:55 -0800 Saeed Mahameed <saeedm@nvidia.com> wrote:
> Hi Vadim,
>
> We have some new findings internally and Rahul is testing your patches,
> he found some issues where the patches don't handle the case where only drops are happening, meanings no OOO.
>
> Rahul can share more details, he's still working on this and I believe we will have a fully detailed follow-up by the end of the week.

One thing I noticed was the conditional in mlx5e_ptp_ts_cqe_ooo in v5
does handle OOO but considers the monotomically increasing case of 1,3,4
for example to be OOO as well (a resync does not occur when I tested
this case).

A simple patch I made to verify this.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index ae75e230170b..dfa5c53bd0d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -125,6 +125,8 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
 	struct sk_buff *skb;
 	ktime_t hwtstamp;
 
+	pr_info("wqe_counter value: %u\n", skb_id);
+
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
 		ptpsq->cq_stats->err_cqe++;
@@ -133,6 +135,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
 
 	if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id)) {
 		if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) {
+			pr_info("Marked ooo wqe_counter: %u\n", skb_id);
 			/* already handled by a previous resync */
 			ptpsq->cq_stats->ooo_cqe_drop++;
 			return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index f7897ddb29c5..8582f0535e21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -646,7 +646,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
 				 struct mlx5_wqe_eth_seg *eseg)
 {
 	if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
-		eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc &
+		eseg->flow_table_metadata = cpu_to_be32((ptpsq->skb_fifo_pc * 2) &
 							ptpsq->ts_cqe_ctr_mask);
 }
 
Basically, I multiply the wqe_counter written in the WQE by two. The
thing here is we have a situation where we have "lost" a CQE with
wqe_counter index of one, but the patch treats that as OOO, which
basically disables our normal resiliency path for resyncs on drops. At
that point, the patch could just remove the resync logic altogether when
a drop is detected.

What I noticed then was that the case of 0,2 was marked as OOO even
though out of order would be something like 0,2,1.

  [Feb 8 02:40] wqe_counter value: 0
  [ +24.199404] wqe_counter value: 2
  [  +0.001041] Marked ooo wqe_counter: 2

I acknowledge the OOO issue but not sure the patch as is, correctly
solves the issue.

>
> Sorry for the late update but these new findings are only from yesterday.
>
> Thanks,
> Saeed.
>
>  
> -------------------------------------------------------------------------------------------------------------------------
> From: Vadim Fedorenko <vadfed@meta.com>
> Sent: Wednesday, February 8, 2023 4:40 AM
> To: Saeed Mahameed <saeed@kernel.org>; Jakub Kicinski <kuba@kernel.org>
> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Tariq Toukan <tariqt@nvidia.com>
> Subject: Re: [pull request][net 00/10] mlx5 fixes 2023-02-07 
>  
> On 08/02/2023 03:02, Saeed Mahameed wrote:
>> From: Saeed Mahameed <saeedm@nvidia.com>
>> 
>> This series provides bug fixes to mlx5 driver.
>> Please pull and let me know if there is any problem.
>> 
> Still no patches for PTP queue? That's a bit wierd.
> Do you think that they are not ready to be in -net?

-- Rahul Rameshbabu

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
  2023-02-08 21:16     ` Rahul Rameshbabu
@ 2023-02-08 21:36       ` Vadim Fedorenko
  2023-02-08 23:52         ` Rahul Rameshbabu
  0 siblings, 1 reply; 18+ messages in thread
From: Vadim Fedorenko @ 2023-02-08 21:36 UTC (permalink / raw)
  To: Rahul Rameshbabu, Saeed Mahameed
  Cc: Saeed Mahameed, Jakub Kicinski, netdev, Tariq Toukan

On 08/02/2023 21:16, Rahul Rameshbabu wrote:
> On Wed, 08 Feb, 2023 12:52:55 -0800 Saeed Mahameed <saeedm@nvidia.com> wrote:
>> Hi Vadim,
>>
>> We have some new findings internally and Rahul is testing your patches,
>> he found some issues where the patches don't handle the case where only drops are happening, meanings no OOO.
>>
>> Rahul can share more details, he's still working on this and I believe we will have a fully detailed follow-up by the end of the week.
> 
> One thing I noticed was the conditional in mlx5e_ptp_ts_cqe_ooo in v5
> does handle OOO but considers the monotomically increasing case of 1,3,4
> for example to be OOO as well (a resync does not occur when I tested
> this case).
> 
> A simple patch I made to verify this.
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> index ae75e230170b..dfa5c53bd0d5 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> @@ -125,6 +125,8 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>   	struct sk_buff *skb;
>   	ktime_t hwtstamp;
>   
> +	pr_info("wqe_counter value: %u\n", skb_id);
> +
>   	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
>   		skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>   		ptpsq->cq_stats->err_cqe++;
> @@ -133,6 +135,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>   
>   	if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id)) {
>   		if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) {
> +			pr_info("Marked ooo wqe_counter: %u\n", skb_id);
>   			/* already handled by a previous resync */
>   			ptpsq->cq_stats->ooo_cqe_drop++;
>   			return;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index f7897ddb29c5..8582f0535e21 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -646,7 +646,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
>   				 struct mlx5_wqe_eth_seg *eseg)
>   {
>   	if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
> -		eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc &
> +		eseg->flow_table_metadata = cpu_to_be32((ptpsq->skb_fifo_pc * 2) &
>   							ptpsq->ts_cqe_ctr_mask);
>   }
>   
> Basically, I multiply the wqe_counter written in the WQE by two. The
> thing here is we have a situation where we have "lost" a CQE with
> wqe_counter index of one, but the patch treats that as OOO, which
> basically disables our normal resiliency path for resyncs on drops. At
> that point, the patch could just remove the resync logic altogether when
> a drop is detected.
> 
> What I noticed then was that the case of 0,2 was marked as OOO even
> though out of order would be something like 0,2,1.
> 
>    [Feb 8 02:40] wqe_counter value: 0
>    [ +24.199404] wqe_counter value: 2
>    [  +0.001041] Marked ooo wqe_counter: 2
> 
> I acknowledge the OOO issue but not sure the patch as is, correctly
> solves the issue.
> 

With this patch it's not clear how many skbs were in the queue. AFAIU if 
there was only skb id = 1 in the queue, then the id = 2 is definitely 
OOO because it couldn't be found in the queue. Otherwise resync should 
be triggered and that is what I have seen in our setup with v5 patches.


>>
>> Sorry for the late update but these new findings are only from yesterday.
>>
>> Thanks,
>> Saeed.
>>
>>   
>> -------------------------------------------------------------------------------------------------------------------------
>> From: Vadim Fedorenko <vadfed@meta.com>
>> Sent: Wednesday, February 8, 2023 4:40 AM
>> To: Saeed Mahameed <saeed@kernel.org>; Jakub Kicinski <kuba@kernel.org>
>> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Tariq Toukan <tariqt@nvidia.com>
>> Subject: Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
>>   
>> On 08/02/2023 03:02, Saeed Mahameed wrote:
>>> From: Saeed Mahameed <saeedm@nvidia.com>
>>>
>>> This series provides bug fixes to mlx5 driver.
>>> Please pull and let me know if there is any problem.
>>>
>> Still no patches for PTP queue? That's a bit wierd.
>> Do you think that they are not ready to be in -net?
> 
> -- Rahul Rameshbabu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
  2023-02-08 21:36       ` Vadim Fedorenko
@ 2023-02-08 23:52         ` Rahul Rameshbabu
  2023-02-09  1:10           ` Vadim Fedorenko
  0 siblings, 1 reply; 18+ messages in thread
From: Rahul Rameshbabu @ 2023-02-08 23:52 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Saeed Mahameed, Saeed Mahameed, Jakub Kicinski, netdev, Tariq Toukan

On Wed, 08 Feb, 2023 21:36:52 +0000 Vadim Fedorenko <vadfed@meta.com> wrote:
> On 08/02/2023 21:16, Rahul Rameshbabu wrote:
>> On Wed, 08 Feb, 2023 12:52:55 -0800 Saeed Mahameed <saeedm@nvidia.com> wrote:
>>> Hi Vadim,
>>>
>>> We have some new findings internally and Rahul is testing your patches,
>>> he found some issues where the patches don't handle the case where only drops are happening, meanings no OOO.
>>>
>>> Rahul can share more details, he's still working on this and I believe we will have a fully detailed follow-up by the end of the week.
>> One thing I noticed was the conditional in mlx5e_ptp_ts_cqe_ooo in v5
>> does handle OOO but considers the monotomically increasing case of 1,3,4
>> for example to be OOO as well (a resync does not occur when I tested
>> this case).
>> A simple patch I made to verify this.
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> index ae75e230170b..dfa5c53bd0d5 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> @@ -125,6 +125,8 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>   	struct sk_buff *skb;
>>   	ktime_t hwtstamp;
>>   +	pr_info("wqe_counter value: %u\n", skb_id);
>> +
>>   	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
>>   		skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>>   		ptpsq->cq_stats->err_cqe++;
>> @@ -133,6 +135,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>     	if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id)) {
>>   		if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) {
>> +			pr_info("Marked ooo wqe_counter: %u\n", skb_id);
>>   			/* already handled by a previous resync */
>>   			ptpsq->cq_stats->ooo_cqe_drop++;
>>   			return;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> index f7897ddb29c5..8582f0535e21 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> @@ -646,7 +646,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
>>   				 struct mlx5_wqe_eth_seg *eseg)
>>   {
>>   	if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
>> -		eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc &
>> +		eseg->flow_table_metadata = cpu_to_be32((ptpsq->skb_fifo_pc * 2) &
>>   							ptpsq->ts_cqe_ctr_mask);
>>   }
>>   Basically, I multiply the wqe_counter written in the WQE by two. The
>> thing here is we have a situation where we have "lost" a CQE with
>> wqe_counter index of one, but the patch treats that as OOO, which
>> basically disables our normal resiliency path for resyncs on drops. At
>> that point, the patch could just remove the resync logic altogether when
>> a drop is detected.
>> What I noticed then was that the case of 0,2 was marked as OOO even
>> though out of order would be something like 0,2,1.
>>    [Feb 8 02:40] wqe_counter value: 0
>>    [ +24.199404] wqe_counter value: 2
>>    [  +0.001041] Marked ooo wqe_counter: 2
>> I acknowledge the OOO issue but not sure the patch as is, correctly
>> solves the issue.
>> 
>
> With this patch it's not clear how many skbs were in the queue. AFAIU if there
> was only skb id = 1 in the queue, then the id = 2 is definitely OOO because it
> couldn't be found in the queue. Otherwise resync should be triggered and that is
> what I have seen in our setup with v5 patches.

With this patch at the time of testing, the pc is only 2 because we
skipped generating a WQE with a wqe_counter of 1. This matches your
expectation that it's OOO since we don't have a pc of 3 (wqe_counter
<skb id> 1 was never actually put on the WQ).

One thing I am still concerned about then.

  wqe_counter   0   3   1   2
  skb_cc        0   1   2   3
  skb_pc        4   4   4   4

Lets say we encounter wqe_counter 3 and the pc is currently 4. OOO is
not triggered and we go into the resync logic. The resync logic then
consumers 3, 1, and 2 out of order which is still an issue?

>
>
>>>
>>> Sorry for the late update but these new findings are only from yesterday.
>>>
>>> Thanks,
>>> Saeed.
>>>
>>>   -------------------------------------------------------------------------------------------------------------------------
>>> From: Vadim Fedorenko <vadfed@meta.com>
>>> Sent: Wednesday, February 8, 2023 4:40 AM
>>> To: Saeed Mahameed <saeed@kernel.org>; Jakub Kicinski <kuba@kernel.org>
>>> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Tariq Toukan <tariqt@nvidia.com>
>>> Subject: Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
>>>   On 08/02/2023 03:02, Saeed Mahameed wrote:
>>>> From: Saeed Mahameed <saeedm@nvidia.com>
>>>>
>>>> This series provides bug fixes to mlx5 driver.
>>>> Please pull and let me know if there is any problem.
>>>>
>>> Still no patches for PTP queue? That's a bit wierd.
>>> Do you think that they are not ready to be in -net?
>> -- Rahul Rameshbabu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
  2023-02-08 23:52         ` Rahul Rameshbabu
@ 2023-02-09  1:10           ` Vadim Fedorenko
  2023-02-15  3:19             ` Rahul Rameshbabu
  0 siblings, 1 reply; 18+ messages in thread
From: Vadim Fedorenko @ 2023-02-09  1:10 UTC (permalink / raw)
  To: Rahul Rameshbabu
  Cc: Saeed Mahameed, Saeed Mahameed, Jakub Kicinski, netdev, Tariq Toukan

On 08/02/2023 23:52, Rahul Rameshbabu wrote:
> On Wed, 08 Feb, 2023 21:36:52 +0000 Vadim Fedorenko <vadfed@meta.com> wrote:
>> On 08/02/2023 21:16, Rahul Rameshbabu wrote:
>>> On Wed, 08 Feb, 2023 12:52:55 -0800 Saeed Mahameed <saeedm@nvidia.com> wrote:
>>>> Hi Vadim,
>>>>
>>>> We have some new findings internally and Rahul is testing your patches,
>>>> he found some issues where the patches don't handle the case where only drops are happening, meanings no OOO.
>>>>
>>>> Rahul can share more details, he's still working on this and I believe we will have a fully detailed follow-up by the end of the week.
>>> One thing I noticed was the conditional in mlx5e_ptp_ts_cqe_ooo in v5
>>> does handle OOO but considers the monotomically increasing case of 1,3,4
>>> for example to be OOO as well (a resync does not occur when I tested
>>> this case).
>>> A simple patch I made to verify this.
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> index ae75e230170b..dfa5c53bd0d5 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> @@ -125,6 +125,8 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>>    	struct sk_buff *skb;
>>>    	ktime_t hwtstamp;
>>>    +	pr_info("wqe_counter value: %u\n", skb_id);
>>> +
>>>    	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
>>>    		skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>>>    		ptpsq->cq_stats->err_cqe++;
>>> @@ -133,6 +135,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>>      	if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id)) {
>>>    		if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) {
>>> +			pr_info("Marked ooo wqe_counter: %u\n", skb_id);
>>>    			/* already handled by a previous resync */
>>>    			ptpsq->cq_stats->ooo_cqe_drop++;
>>>    			return;
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>>> index f7897ddb29c5..8582f0535e21 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>>> @@ -646,7 +646,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
>>>    				 struct mlx5_wqe_eth_seg *eseg)
>>>    {
>>>    	if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
>>> -		eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc &
>>> +		eseg->flow_table_metadata = cpu_to_be32((ptpsq->skb_fifo_pc * 2) &
>>>    							ptpsq->ts_cqe_ctr_mask);
>>>    }
>>>    Basically, I multiply the wqe_counter written in the WQE by two. The
>>> thing here is we have a situation where we have "lost" a CQE with
>>> wqe_counter index of one, but the patch treats that as OOO, which
>>> basically disables our normal resiliency path for resyncs on drops. At
>>> that point, the patch could just remove the resync logic altogether when
>>> a drop is detected.
>>> What I noticed then was that the case of 0,2 was marked as OOO even
>>> though out of order would be something like 0,2,1.
>>>     [Feb 8 02:40] wqe_counter value: 0
>>>     [ +24.199404] wqe_counter value: 2
>>>     [  +0.001041] Marked ooo wqe_counter: 2
>>> I acknowledge the OOO issue but not sure the patch as is, correctly
>>> solves the issue.
>>>
>>
>> With this patch it's not clear how many skbs were in the queue. AFAIU if there
>> was only skb id = 1 in the queue, then the id = 2 is definitely OOO because it
>> couldn't be found in the queue. Otherwise resync should be triggered and that is
>> what I have seen in our setup with v5 patches.
> 
> With this patch at the time of testing, the pc is only 2 because we
> skipped generating a WQE with a wqe_counter of 1. This matches your
> expectation that it's OOO since we don't have a pc of 3 (wqe_counter
> <skb id> 1 was never actually put on the WQ).
> 
> One thing I am still concerned about then.
> 
>    wqe_counter   0   3   1   2
>    skb_cc        0   1   2   3
>    skb_pc        4   4   4   4
> 
> Lets say we encounter wqe_counter 3 and the pc is currently 4. OOO is
> not triggered and we go into the resync logic. The resync logic then
> consumers 3, 1, and 2 out of order which is still an issue?

Resync logic will drop 1 and 2. The 3 will be consumed and the logic 
will wait for 4 as the next one. And in this case it's OK to count 1 and 
2 as OOO because both of them have arrived after 3. I have to mention 
that I didn't implement "resync logic". It was implemented before as 
there should never be OOO cqes according to what was stated in the 
previous versions of patches by reviewers. My patches do not change 
logic, they just fix the implementation which is currently crashes the 
kernel. Once the root cause in FW (which is completely closed source and 
I can only guess what logic is implemented in it) is found we can 
re-think the logic. But for now I just want to fix the easy reproducible 
crash, even if the patch is "bandage".

> 
>>
>>
>>>>
>>>> Sorry for the late update but these new findings are only from yesterday.
>>>>
>>>> Thanks,
>>>> Saeed.
>>>>
>>>>    -------------------------------------------------------------------------------------------------------------------------
>>>> From: Vadim Fedorenko <vadfed@meta.com>
>>>> Sent: Wednesday, February 8, 2023 4:40 AM
>>>> To: Saeed Mahameed <saeed@kernel.org>; Jakub Kicinski <kuba@kernel.org>
>>>> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Tariq Toukan <tariqt@nvidia.com>
>>>> Subject: Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
>>>>    On 08/02/2023 03:02, Saeed Mahameed wrote:
>>>>> From: Saeed Mahameed <saeedm@nvidia.com>
>>>>>
>>>>> This series provides bug fixes to mlx5 driver.
>>>>> Please pull and let me know if there is any problem.
>>>>>
>>>> Still no patches for PTP queue? That's a bit wierd.
>>>> Do you think that they are not ready to be in -net?
>>> -- Rahul Rameshbabu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
  2023-02-08  3:02 ` [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change Saeed Mahameed
@ 2023-02-09  5:00   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-02-09  5:00 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt, afaris

Hello:

This series was applied to netdev/net.git (master)
by Saeed Mahameed <saeedm@nvidia.com>:

On Tue,  7 Feb 2023 19:02:53 -0800 you wrote:
> From: Adham Faris <afaris@nvidia.com>
> 
> rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear()
> to catch oversized packets. If FCS is concatenated to the end of the
> packet then the check should be updated accordingly.
> 
> Rx rings initialization (mlx5e_init_rxq_rq()) invoked for every new set
> of channels, as part of mlx5e_safe_switch_params(), unknowingly if it
> runs with default configuration or not. Current rq->hw_mtu
> initialization assumes default configuration and ignores
> params->scatter_fcs_en flag state.
> Fix this, by accounting for params->scatter_fcs_en flag state during
> rq->hw_mtu initialization.
> 
> [...]

Here is the summary with links:
  - [net,01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
    https://git.kernel.org/netdev/net/c/1e66220948df
  - [net,02/10] net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
    https://git.kernel.org/netdev/net/c/288d85e07fbc
  - [net,03/10] net/mlx5: Bridge, fix ageing of peer FDB entries
    https://git.kernel.org/netdev/net/c/da0c52426cd2
  - [net,04/10] net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
    https://git.kernel.org/netdev/net/c/8974aa9638df
  - [net,05/10] net/mlx5e: IPoIB, Show unknown speed instead of error
    https://git.kernel.org/netdev/net/c/8aa5f171d51c
  - [net,06/10] net/mlx5: Store page counters in a single array
    https://git.kernel.org/netdev/net/c/c3bdbaea654d
  - [net,07/10] net/mlx5: Expose SF firmware pages counter
    https://git.kernel.org/netdev/net/c/9965bbebae59
  - [net,08/10] net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
    https://git.kernel.org/netdev/net/c/db561fed6b8f
  - [net,09/10] net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
    https://git.kernel.org/netdev/net/c/184e1e4474db
  - [net,10/10] net/mlx5: Serialize module cleanup with reload and remove
    https://git.kernel.org/netdev/net/c/8f0d1451ecf7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
  2023-02-09  1:10           ` Vadim Fedorenko
@ 2023-02-15  3:19             ` Rahul Rameshbabu
  0 siblings, 0 replies; 18+ messages in thread
From: Rahul Rameshbabu @ 2023-02-15  3:19 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Saeed Mahameed, Saeed Mahameed, Jakub Kicinski, netdev, Tariq Toukan

Sorry for the late response. Needed a bit of time to wrap my head around
the patch.

On Thu, 09 Feb, 2023 01:10:50 +0000 Vadim Fedorenko <vadfed@meta.com> wrote:
> On 08/02/2023 23:52, Rahul Rameshbabu wrote:
>> On Wed, 08 Feb, 2023 21:36:52 +0000 Vadim Fedorenko <vadfed@meta.com> wrote:
>>> On 08/02/2023 21:16, Rahul Rameshbabu wrote:
>>>> On Wed, 08 Feb, 2023 12:52:55 -0800 Saeed Mahameed <saeedm@nvidia.com> wrote:
>>>>> Hi Vadim,
>>>>>
>>>>> We have some new findings internally and Rahul is testing your patches,
>>>>> he found some issues where the patches don't handle the case where only drops are happening, meanings no OOO.
>>>>>
>>>>> Rahul can share more details, he's still working on this and I believe we will have a fully detailed follow-up by the end of the week.
>>>> One thing I noticed was the conditional in mlx5e_ptp_ts_cqe_ooo in v5
>>>> does handle OOO but considers the monotomically increasing case of 1,3,4
>>>> for example to be OOO as well (a resync does not occur when I tested
>>>> this case).
>>>> A simple patch I made to verify this.
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> index ae75e230170b..dfa5c53bd0d5 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> @@ -125,6 +125,8 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>>>    	struct sk_buff *skb;
>>>>    	ktime_t hwtstamp;
>>>>    +	pr_info("wqe_counter value: %u\n", skb_id);
>>>> +
>>>>    	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
>>>>    		skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>>>>    		ptpsq->cq_stats->err_cqe++;
>>>> @@ -133,6 +135,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>>>      	if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id)) {
>>>>    		if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) {
>>>> +			pr_info("Marked ooo wqe_counter: %u\n", skb_id);
>>>>    			/* already handled by a previous resync */
>>>>    			ptpsq->cq_stats->ooo_cqe_drop++;
>>>>    			return;
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>>>> index f7897ddb29c5..8582f0535e21 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>>>> @@ -646,7 +646,7 @@ static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
>>>>    				 struct mlx5_wqe_eth_seg *eseg)
>>>>    {
>>>>    	if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
>>>> -		eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc &
>>>> +		eseg->flow_table_metadata = cpu_to_be32((ptpsq->skb_fifo_pc * 2) &
>>>>    							ptpsq->ts_cqe_ctr_mask);
>>>>    }
>>>>    Basically, I multiply the wqe_counter written in the WQE by two. The
>>>> thing here is we have a situation where we have "lost" a CQE with
>>>> wqe_counter index of one, but the patch treats that as OOO, which
>>>> basically disables our normal resiliency path for resyncs on drops. At
>>>> that point, the patch could just remove the resync logic altogether when
>>>> a drop is detected.
>>>> What I noticed then was that the case of 0,2 was marked as OOO even
>>>> though out of order would be something like 0,2,1.
>>>>     [Feb 8 02:40] wqe_counter value: 0
>>>>     [ +24.199404] wqe_counter value: 2
>>>>     [  +0.001041] Marked ooo wqe_counter: 2
>>>> I acknowledge the OOO issue but not sure the patch as is, correctly
>>>> solves the issue.
>>>>
>>>
>>> With this patch it's not clear how many skbs were in the queue. AFAIU if there
>>> was only skb id = 1 in the queue, then the id = 2 is definitely OOO because it
>>> couldn't be found in the queue. Otherwise resync should be triggered and that is
>>> what I have seen in our setup with v5 patches.
>> 
>> With this patch at the time of testing, the pc is only 2 because we
>> skipped generating a WQE with a wqe_counter of 1. This matches your
>> expectation that it's OOO since we don't have a pc of 3 (wqe_counter
>> <skb id> 1 was never actually put on the WQ).
>> 
>> One thing I am still concerned about then.
>> 
>>    wqe_counter   0   3   1   2
>>    skb_cc        0   1   2   3
>>    skb_pc        4   4   4   4
>> 
>> Lets say we encounter wqe_counter 3 and the pc is currently 4. OOO is
>> not triggered and we go into the resync logic. The resync logic then
>> consumers 3, 1, and 2 out of order which is still an issue?
>
> Resync logic will drop 1 and 2. The 3 will be consumed and the logic 
> will wait for 4 as the next one. And in this case it's OK to count 1 and 
> 2 as OOO because both of them have arrived after 3. I have to mention 
> that I didn't implement "resync logic". It was implemented before as 
> there should never be OOO cqes according to what was stated in the 
> previous versions of patches by reviewers. My patches do not change 
> logic, they just fix the implementation which is currently crashes the 
> kernel. Once the root cause in FW (which is completely closed source and 
> I can only guess what logic is implemented in it) is found we can 
> re-think the logic. But for now I just want to fix the easy reproducible 
> crash, even if the patch is "bandage".
>

Agree with this explanation and tested the behavior just to confirm.
Also, tested a couple other cases and logically understand how this
patch can make use of the resync logic to drop out-of-order skbs using
the resync logic and then ignore the corresponding CQEs with the OOO
check.

>> 
>>>
>>>
>>>>>
>>>>> Sorry for the late update but these new findings are only from yesterday.
>>>>>
>>>>> Thanks,
>>>>> Saeed.
>>>>>
>>>>>    -------------------------------------------------------------------------------------------------------------------------
>>>>> From: Vadim Fedorenko <vadfed@meta.com>
>>>>> Sent: Wednesday, February 8, 2023 4:40 AM
>>>>> To: Saeed Mahameed <saeed@kernel.org>; Jakub Kicinski <kuba@kernel.org>
>>>>> Cc: Saeed Mahameed <saeedm@nvidia.com>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Tariq Toukan <tariqt@nvidia.com>
>>>>> Subject: Re: [pull request][net 00/10] mlx5 fixes 2023-02-07
>>>>>    On 08/02/2023 03:02, Saeed Mahameed wrote:
>>>>>> From: Saeed Mahameed <saeedm@nvidia.com>
>>>>>>
>>>>>> This series provides bug fixes to mlx5 driver.
>>>>>> Please pull and let me know if there is any problem.
>>>>>>
>>>>> Still no patches for PTP queue? That's a bit wierd.
>>>>> Do you think that they are not ready to be in -net?
>>>> -- Rahul Rameshbabu

Acked-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-02-15  3:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-08  3:02 [pull request][net 00/10] mlx5 fixes 2023-02-07 Saeed Mahameed
2023-02-08  3:02 ` [net 01/10] net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change Saeed Mahameed
2023-02-09  5:00   ` patchwork-bot+netdevbpf
2023-02-08  3:02 ` [net 02/10] net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic Saeed Mahameed
2023-02-08  3:02 ` [net 03/10] net/mlx5: Bridge, fix ageing of peer FDB entries Saeed Mahameed
2023-02-08  3:02 ` [net 04/10] net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode Saeed Mahameed
2023-02-08  3:02 ` [net 05/10] net/mlx5e: IPoIB, Show unknown speed instead of error Saeed Mahameed
2023-02-08  3:02 ` [net 06/10] net/mlx5: Store page counters in a single array Saeed Mahameed
2023-02-08  3:02 ` [net 07/10] net/mlx5: Expose SF firmware pages counter Saeed Mahameed
2023-02-08  3:03 ` [net 08/10] net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers Saeed Mahameed
2023-02-08  3:03 ` [net 09/10] net/mlx5: fw_tracer, Zero consumer index when reloading the tracer Saeed Mahameed
2023-02-08  3:03 ` [net 10/10] net/mlx5: Serialize module cleanup with reload and remove Saeed Mahameed
2023-02-08 12:40 ` [pull request][net 00/10] mlx5 fixes 2023-02-07 Vadim Fedorenko
     [not found]   ` <DM5PR12MB134054EC92BC13E36B6C5711B3D89@DM5PR12MB1340.namprd12.prod.outlook.com>
2023-02-08 21:16     ` Rahul Rameshbabu
2023-02-08 21:36       ` Vadim Fedorenko
2023-02-08 23:52         ` Rahul Rameshbabu
2023-02-09  1:10           ` Vadim Fedorenko
2023-02-15  3:19             ` Rahul Rameshbabu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.