* [pull request][net 00/11] mlx5 fixes 2021-12-22
@ 2021-12-22 21:11 Saeed Mahameed
  2021-12-22 21:11 ` [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources Saeed Mahameed
                   ` (11 more replies)
  0 siblings, 12 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Roi Dayan, Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

Hi Dave, Hi Jakub,

This series provides bug fixes to the mlx5 driver.
Please pull and let me know if there is any problem.

Additionally, and unrelated to this pull, I would like to kindly request
that you cherry-pick the following fix commit from the net-next branch
into net:
31108d142f36 ("net/mlx5: Fix some error handling paths in ...")

Thanks,
Saeed.


The following changes since commit 9b8bdd1eb5890aeeab7391dddcf8bd51f7b07216:

  sfc: falcon: Check null pointer of rx_queue->page_ring (2021-12-22 12:25:18 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2021-12-22

for you to fetch changes up to 3a67257323fc284bc50d0cd7e849f3c141087169:

  net/mlx5e: Delete forward rule for ct or sample action (2021-12-22 13:02:24 -0800)

----------------------------------------------------------------
mlx5-fixes-2021-12-22

----------------------------------------------------------------
Amir Tzin (1):
      net/mlx5e: Wrap the tx reporter dump callback to extract the sq

Chris Mi (2):
      net/mlx5: Fix tc max supported prio for nic mode
      net/mlx5e: Delete forward rule for ct or sample action

Gal Pressman (1):
      net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled

Maxim Mikityanskiy (2):
      net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow
      net/mlx5e: Fix ICOSQ recovery flow for XSK

Miaoqian Lin (1):
      net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources

Moshe Shemesh (1):
      net/mlx5: Fix SF health recovery flow

Shay Drory (2):
      net/mlx5: Use first online CPU instead of hard coded CPU
      net/mlx5: Fix error print in case of IRQ request failed

Yevgeny Kliteynik (1):
      net/mlx5: DR, Fix querying eswitch manager vport for ECPF

 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  5 ++-
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  2 ++
 .../net/ethernet/mellanox/mlx5/core/en/rep/tc.h    |  2 +-
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 35 +++++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 10 +++++-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c | 16 +++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 37 ++++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 17 ++++------
 .../ethernet/mellanox/mlx5/core/lib/fs_chains.c    |  3 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 11 ++++---
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c  |  6 ++--
 .../mellanox/mlx5/core/steering/dr_domain.c        |  9 +++---
 12 files changed, 112 insertions(+), 41 deletions(-)

* [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-23  1:04   ` Jakub Kicinski
  2021-12-22 21:11 ` [net 02/11] net/mlx5: DR, Fix querying eswitch manager vport for ECPF Saeed Mahameed
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Miaoqian Lin, Saeed Mahameed

From: Miaoqian Lin <linmq006@gmail.com>

The mlx5_get_uars_page() function returns error pointers.
Use IS_ERR() to check the return value to fix this.
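
For context, a minimal sketch of the calling pattern being fixed
(illustrative only, not part of the patch; the variable names are
placeholders):

	/* Wrong: mlx5_get_uars_page() returns ERR_PTR() values, which are
	 * non-NULL, so this check never fires and the error is ignored.
	 */
	uar = mlx5_get_uars_page(mdev);
	if (!uar)
		return -ENOMEM;

	/* Correct: test with IS_ERR() and propagate the encoded errno. */
	uar = mlx5_get_uars_page(mdev);
	if (IS_ERR(uar))
		return PTR_ERR(uar);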

Fixes: 4ec9e7b02697("net/mlx5: DR, Expose steering domain functionality")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
index 8cbd36c82b3b..f6e6d9209766 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include <linux/mlx5/eswitch.h>
+#include <linux/err.h>
 #include "dr_types.h"
 
 #define DR_DOMAIN_SW_STEERING_SUPPORTED(dmn, dmn_type)	\
@@ -72,9 +73,9 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
 	}
 
 	dmn->uar = mlx5_get_uars_page(dmn->mdev);
-	if (!dmn->uar) {
+	if (IS_ERR(dmn->uar)) {
 		mlx5dr_err(dmn, "Couldn't allocate UAR\n");
-		ret = -ENOMEM;
+		ret = PTR_ERR(dmn->uar);
 		goto clean_pd;
 	}
 
-- 
2.33.1


* [net 02/11] net/mlx5: DR, Fix querying eswitch manager vport for ECPF
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
  2021-12-22 21:11 ` [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 03/11] net/mlx5: Use first online CPU instead of hard coded CPU Saeed Mahameed
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

On BlueField the E-Switch manager is the ECPF (vport 0xFFFE), but when
querying the capabilities of the ECPF eswitch manager, vport 0 must be
queried with other_vport = 0.

Fixes: 9091b821aaa4 ("net/mlx5: DR, Handle eswitch manager and uplink vports separately")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
index f6e6d9209766..c54cc45f63dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
@@ -164,9 +164,7 @@ static int dr_domain_query_vport(struct mlx5dr_domain *dmn,
 
 static int dr_domain_query_esw_mngr(struct mlx5dr_domain *dmn)
 {
-	return dr_domain_query_vport(dmn,
-				     dmn->info.caps.is_ecpf ? MLX5_VPORT_ECPF : 0,
-				     false,
+	return dr_domain_query_vport(dmn, 0, false,
 				     &dmn->info.caps.vports.esw_manager_caps);
 }
 
-- 
2.33.1


* [net 03/11] net/mlx5: Use first online CPU instead of hard coded CPU
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
  2021-12-22 21:11 ` [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources Saeed Mahameed
  2021-12-22 21:11 ` [net 02/11] net/mlx5: DR, Fix querying eswitch manager vport for ECPF Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 04/11] net/mlx5: Fix error print in case of IRQ request failed Saeed Mahameed
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Shay Drory, Moshe Shemesh, Saeed Mahameed

From: Shay Drory <shayd@nvidia.com>

A hard-coded CPU (0 in our case) might be offline. Hence, use the first
online CPU instead.

Fixes: f891b7cdbdcd ("net/mlx5: Enable single IRQ for PCI Function")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 830444f927d4..0e84c005d160 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -398,7 +398,7 @@ irq_pool_request_vector(struct mlx5_irq_pool *pool, int vecidx,
 	cpumask_copy(irq->mask, affinity);
 	if (!irq_pool_is_sf_pool(pool) && !pool->xa_num_irqs.max &&
 	    cpumask_empty(irq->mask))
-		cpumask_set_cpu(0, irq->mask);
+		cpumask_set_cpu(cpumask_first(cpu_online_mask), irq->mask);
 	irq_set_affinity_hint(irq->irqn, irq->mask);
 unlock:
 	mutex_unlock(&pool->lock);
-- 
2.33.1


* [net 04/11] net/mlx5: Fix error print in case of IRQ request failed
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2021-12-22 21:11 ` [net 03/11] net/mlx5: Use first online CPU instead of hard coded CPU Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 05/11] net/mlx5: Fix SF health recovery flow Saeed Mahameed
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Shay Drory, Moshe Shemesh, Saeed Mahameed

From: Shay Drory <shayd@nvidia.com>

In case the IRQ layer fails to find or to request an IRQ, the driver
prints the first CPU of the provided affinity as part of the error
message. An empty affinity is a valid input for the IRQ layer, and it
is an error to call cpumask_first() on an empty affinity.

Remove the first CPU from the error message.
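
For illustration only (not part of the patch), if the CPU were still to
be reported, the mask would first need an emptiness check, roughly:

	if (!cpumask_empty(affinity))
		mlx5_core_err(pool->dev, "Didn't find IRQ for cpu = %u\n",
			      cpumask_first(affinity));
	else
		mlx5_core_err(pool->dev, "Didn't find IRQ (empty affinity)\n");

Dropping the CPU from the message avoids that special case entirely.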

Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 0e84c005d160..bcee30f5de0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -356,8 +356,8 @@ static struct mlx5_irq *irq_pool_request_affinity(struct mlx5_irq_pool *pool,
 	new_irq = irq_pool_create_irq(pool, affinity);
 	if (IS_ERR(new_irq)) {
 		if (!least_loaded_irq) {
-			mlx5_core_err(pool->dev, "Didn't find IRQ for cpu = %u\n",
-				      cpumask_first(affinity));
+			mlx5_core_err(pool->dev, "Didn't find a matching IRQ. err = %ld\n",
+				      PTR_ERR(new_irq));
 			mutex_unlock(&pool->lock);
 			return new_irq;
 		}
-- 
2.33.1


* [net 05/11] net/mlx5: Fix SF health recovery flow
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2021-12-22 21:11 ` [net 04/11] net/mlx5: Fix error print in case of IRQ request failed Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 06/11] net/mlx5: Fix tc max supported prio for nic mode Saeed Mahameed
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Moshe Shemesh, Saeed Mahameed

From: Moshe Shemesh <moshe@nvidia.com>

SFs do not directly control the PCI device. During the recovery flow an
SF should not be allowed to do a PCI disable or PCI reset; its PF will
do that.

This fixes the following kernel trace:
mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow
mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations
mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532
mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow
mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations
mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532
mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed,
status bad resource state(0x9), syndrome (0x658908)
mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed
mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 7df9c7f8d9c8..65083496f913 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1809,12 +1809,13 @@ void mlx5_disable_device(struct mlx5_core_dev *dev)
 
 int mlx5_recover_device(struct mlx5_core_dev *dev)
 {
-	int ret = -EIO;
+	if (!mlx5_core_is_sf(dev)) {
+		mlx5_pci_disable_device(dev);
+		if (mlx5_pci_slot_reset(dev->pdev) != PCI_ERS_RESULT_RECOVERED)
+			return -EIO;
+	}
 
-	mlx5_pci_disable_device(dev);
-	if (mlx5_pci_slot_reset(dev->pdev) == PCI_ERS_RESULT_RECOVERED)
-		ret = mlx5_load_one(dev);
-	return ret;
+	return mlx5_load_one(dev);
 }
 
 static struct pci_driver mlx5_core_driver = {
-- 
2.33.1


* [net 06/11] net/mlx5: Fix tc max supported prio for nic mode
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2021-12-22 21:11 ` [net 05/11] net/mlx5: Fix SF health recovery flow Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 07/11] net/mlx5e: Wrap the tx reporter dump callback to extract the sq Saeed Mahameed
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Chris Mi, Saeed Mahameed

From: Chris Mi <cmi@nvidia.com>

Only prio 1 is supported if the firmware doesn't support ignore flow
level for nic mode. The offending commit wrongly removed this check.
Add it back.

Fixes: 9a99c8f1253a ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
index 97e5845b4cfd..d5e47630e284 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
@@ -121,6 +121,9 @@ u32 mlx5_chains_get_nf_ft_chain(struct mlx5_fs_chains *chains)
 
 u32 mlx5_chains_get_prio_range(struct mlx5_fs_chains *chains)
 {
+	if (!mlx5_chains_prios_supported(chains))
+		return 1;
+
 	if (mlx5_chains_ignore_flow_level_supported(chains))
 		return UINT_MAX;
 
-- 
2.33.1


* [net 07/11] net/mlx5e: Wrap the tx reporter dump callback to extract the sq
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2021-12-22 21:11 ` [net 06/11] net/mlx5: Fix tc max supported prio for nic mode Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 08/11] net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled Saeed Mahameed
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Amir Tzin, Aya Levin, Saeed Mahameed

From: Amir Tzin <amirtz@nvidia.com>

Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
mlx5e_txqsq *, but in the TX-timeout-recovery flow the argument is
actually of type struct mlx5e_tx_timeout_ctx *.

 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
 BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
 kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
 CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 [mlx5_core]
 Call Trace:
 mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
 devlink_health_do_dump.part.91+0x71/0xd0
 devlink_health_report+0x157/0x1b0
 mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
 ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
 [mlx5_core]
 ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
 ? update_load_avg+0x19b/0x550
 ? set_next_entity+0x72/0x80
 ? pick_next_task_fair+0x227/0x340
 ? finish_task_switch+0xa2/0x280
   mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
   process_one_work+0x1de/0x3a0
   worker_thread+0x2d/0x3c0
 ? process_one_work+0x3a0/0x3a0
   kthread+0x115/0x130
 ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30
 --[ end trace 51ccabea504edaff ]---
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: disabled
 end Kernel panic - not syncing: Fatal exception

To fix this bug, add a wrapper for mlx5e_tx_reporter_dump_sq() which
extracts the sq from struct mlx5e_tx_timeout_ctx, and set that wrapper
as the TX-timeout-recovery flow dump callback.

Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 4f4bc8726ec4..614cd9477600 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -466,6 +466,14 @@ static int mlx5e_tx_reporter_dump_sq(struct mlx5e_priv *priv, struct devlink_fms
 	return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
 }
 
+static int mlx5e_tx_reporter_timeout_dump(struct mlx5e_priv *priv, struct devlink_fmsg *fmsg,
+					  void *ctx)
+{
+	struct mlx5e_tx_timeout_ctx *to_ctx = ctx;
+
+	return mlx5e_tx_reporter_dump_sq(priv, fmsg, to_ctx->sq);
+}
+
 static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv,
 					  struct devlink_fmsg *fmsg)
 {
@@ -561,7 +569,7 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
 	to_ctx.sq = sq;
 	err_ctx.ctx = &to_ctx;
 	err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
-	err_ctx.dump = mlx5e_tx_reporter_dump_sq;
+	err_ctx.dump = mlx5e_tx_reporter_timeout_dump;
 	snprintf(err_str, sizeof(err_str),
 		 "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u",
 		 sq->ch_ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
-- 
2.33.1


* [net 08/11] net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2021-12-22 21:11 ` [net 07/11] net/mlx5e: Wrap the tx reporter dump callback to extract the sq Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:11 ` [net 09/11] net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow Saeed Mahameed
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Gal Pressman, Ariel Levkovich, Saeed Mahameed

From: Gal Pressman <gal@nvidia.com>

When TC classifier action offloads are disabled (CONFIG_MLX5_CLS_ACT in
Kconfig), the mlx5e_rep_tc_receive() function, which is responsible for
passing the skb to the stack (or freeing it), is defined as a nop, which
results in leaking the skb memory. Replace the nop with a call to
napi_gro_receive() to resolve the leak.

Fixes: 28e7606fa8f1 ("net/mlx5e: Refactor rx handler of represetor device")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.h
index d6c7c81690eb..7c9dd3a75f8a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.h
@@ -66,7 +66,7 @@ mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
 
 static inline void
 mlx5e_rep_tc_receive(struct mlx5_cqe64 *cqe, struct mlx5e_rq *rq,
-		     struct sk_buff *skb) {}
+		     struct sk_buff *skb) { napi_gro_receive(rq->cq.napi, skb); }
 
 #endif /* CONFIG_MLX5_CLS_ACT */
 
-- 
2.33.1


* [net 09/11] net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2021-12-22 21:11 ` [net 08/11] net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled Saeed Mahameed
@ 2021-12-22 21:11 ` Saeed Mahameed
  2021-12-22 21:12 ` [net 10/11] net/mlx5e: Fix ICOSQ recovery flow for XSK Saeed Mahameed
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:11 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Maxim Mikityanskiy, Saeed Mahameed

From: Maxim Mikityanskiy <maximmi@nvidia.com>

Both the regular RQ and the XSKRQ use the same ICOSQ for UMRs. When doing
recovery for the ICOSQ, don't forget to deactivate the XSKRQ as well.

XSK can be opened and closed while channels are active, so a new mutex
prevents the ICOSQ recovery from running at the same time. The ICOSQ
recovery deactivates and reactivates XSKRQ, so any parallel change in
XSK state would break consistency. As the regular RQ is running, it's
not enough to just flush the recovery work, because it can be
rescheduled.

Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 ++
 .../ethernet/mellanox/mlx5/core/en/health.h   |  2 ++
 .../mellanox/mlx5/core/en/reporter_rx.c       | 35 ++++++++++++++++++-
 .../mellanox/mlx5/core/en/xsk/setup.c         | 16 ++++++++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  7 ++--
 5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f0ac6b0d9653..f42067adc79d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -783,6 +783,8 @@ struct mlx5e_channel {
 	DECLARE_BITMAP(state, MLX5E_CHANNEL_NUM_STATES);
 	int                        ix;
 	int                        cpu;
+	/* Sync between icosq recovery and XSK enable/disable. */
+	struct mutex               icosq_recovery_lock;
 };
 
 struct mlx5e_ptp;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index d5b7110a4265..0107e4e73bb0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -30,6 +30,8 @@ void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
 void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq);
 void mlx5e_reporter_rq_cqe_err(struct mlx5e_rq *rq);
 void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq);
+void mlx5e_reporter_icosq_suspend_recovery(struct mlx5e_channel *c);
+void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c);
 
 #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 74086eb556ae..2684e9da9f41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -62,6 +62,7 @@ static void mlx5e_reset_icosq_cc_pc(struct mlx5e_icosq *icosq)
 
 static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 {
+	struct mlx5e_rq *xskrq = NULL;
 	struct mlx5_core_dev *mdev;
 	struct mlx5e_icosq *icosq;
 	struct net_device *dev;
@@ -70,7 +71,13 @@ static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 	int err;
 
 	icosq = ctx;
+
+	mutex_lock(&icosq->channel->icosq_recovery_lock);
+
+	/* mlx5e_close_rq cancels this work before RQ and ICOSQ are killed. */
 	rq = &icosq->channel->rq;
+	if (test_bit(MLX5E_RQ_STATE_ENABLED, &icosq->channel->xskrq.state))
+		xskrq = &icosq->channel->xskrq;
 	mdev = icosq->channel->mdev;
 	dev = icosq->channel->netdev;
 	err = mlx5_core_query_sq_state(mdev, icosq->sqn, &state);
@@ -84,6 +91,9 @@ static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 		goto out;
 
 	mlx5e_deactivate_rq(rq);
+	if (xskrq)
+		mlx5e_deactivate_rq(xskrq);
+
 	err = mlx5e_wait_for_icosq_flush(icosq);
 	if (err)
 		goto out;
@@ -97,15 +107,28 @@ static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
 		goto out;
 
 	mlx5e_reset_icosq_cc_pc(icosq);
+
 	mlx5e_free_rx_in_progress_descs(rq);
+	if (xskrq)
+		mlx5e_free_rx_in_progress_descs(xskrq);
+
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &icosq->state);
 	mlx5e_activate_icosq(icosq);
-	mlx5e_activate_rq(rq);
 
+	mlx5e_activate_rq(rq);
 	rq->stats->recover++;
+
+	if (xskrq) {
+		mlx5e_activate_rq(xskrq);
+		xskrq->stats->recover++;
+	}
+
+	mutex_unlock(&icosq->channel->icosq_recovery_lock);
+
 	return 0;
 out:
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &icosq->state);
+	mutex_unlock(&icosq->channel->icosq_recovery_lock);
 	return err;
 }
 
@@ -706,6 +729,16 @@ void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq)
 	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
 }
 
+void mlx5e_reporter_icosq_suspend_recovery(struct mlx5e_channel *c)
+{
+	mutex_lock(&c->icosq_recovery_lock);
+}
+
+void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c)
+{
+	mutex_unlock(&c->icosq_recovery_lock);
+}
+
 static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 	.name = "rx",
 	.recover = mlx5e_rx_reporter_recover,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index 538bc2419bd8..8526a5fbbf0b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -4,6 +4,7 @@
 #include "setup.h"
 #include "en/params.h"
 #include "en/txrx.h"
+#include "en/health.h"
 
 /* It matches XDP_UMEM_MIN_CHUNK_SIZE, but as this constant is private and may
  * change unexpectedly, and mlx5e has a minimum valid stride size for striding
@@ -170,7 +171,13 @@ void mlx5e_close_xsk(struct mlx5e_channel *c)
 
 void mlx5e_activate_xsk(struct mlx5e_channel *c)
 {
+	/* ICOSQ recovery deactivates RQs. Suspend the recovery to avoid
+	 * activating XSKRQ in the middle of recovery.
+	 */
+	mlx5e_reporter_icosq_suspend_recovery(c);
 	set_bit(MLX5E_RQ_STATE_ENABLED, &c->xskrq.state);
+	mlx5e_reporter_icosq_resume_recovery(c);
+
 	/* TX queue is created active. */
 
 	spin_lock_bh(&c->async_icosq_lock);
@@ -180,6 +187,13 @@ void mlx5e_activate_xsk(struct mlx5e_channel *c)
 
 void mlx5e_deactivate_xsk(struct mlx5e_channel *c)
 {
-	mlx5e_deactivate_rq(&c->xskrq);
+	/* ICOSQ recovery may reactivate XSKRQ if clear_bit is called in the
+	 * middle of recovery. Suspend the recovery to avoid it.
+	 */
+	mlx5e_reporter_icosq_suspend_recovery(c);
+	clear_bit(MLX5E_RQ_STATE_ENABLED, &c->xskrq.state);
+	mlx5e_reporter_icosq_resume_recovery(c);
+	synchronize_net(); /* Sync with NAPI to prevent mlx5e_post_rx_wqes. */
+
 	/* TX queue is disabled on close. */
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 65571593ec5c..a572fc9690ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1087,8 +1087,6 @@ void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
 void mlx5e_close_rq(struct mlx5e_rq *rq)
 {
 	cancel_work_sync(&rq->dim.work);
-	if (rq->icosq)
-		cancel_work_sync(&rq->icosq->recover_work);
 	cancel_work_sync(&rq->recover_work);
 	mlx5e_destroy_rq(rq);
 	mlx5e_free_rx_descs(rq);
@@ -2088,6 +2086,8 @@ static int mlx5e_open_queues(struct mlx5e_channel *c,
 	if (err)
 		goto err_close_xdpsq_cq;
 
+	mutex_init(&c->icosq_recovery_lock);
+
 	err = mlx5e_open_icosq(c, params, &cparam->icosq, &c->icosq);
 	if (err)
 		goto err_close_async_icosq;
@@ -2156,9 +2156,12 @@ static void mlx5e_close_queues(struct mlx5e_channel *c)
 	mlx5e_close_xdpsq(&c->xdpsq);
 	if (c->xdp)
 		mlx5e_close_xdpsq(&c->rq_xdpsq);
+	/* The same ICOSQ is used for UMRs for both RQ and XSKRQ. */
+	cancel_work_sync(&c->icosq.recover_work);
 	mlx5e_close_rq(&c->rq);
 	mlx5e_close_sqs(c);
 	mlx5e_close_icosq(&c->icosq);
+	mutex_destroy(&c->icosq_recovery_lock);
 	mlx5e_close_icosq(&c->async_icosq);
 	if (c->xdp)
 		mlx5e_close_cq(&c->rq_xdpsq.cq);
-- 
2.33.1


* [net 10/11] net/mlx5e: Fix ICOSQ recovery flow for XSK
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2021-12-22 21:11 ` [net 09/11] net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow Saeed Mahameed
@ 2021-12-22 21:12 ` Saeed Mahameed
  2021-12-22 21:12 ` [net 11/11] net/mlx5e: Delete forward rule for ct or sample action Saeed Mahameed
  2021-12-23 10:31 ` [pull request][net 00/11] mlx5 fixes 2021-12-22 David Miller
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:12 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Maxim Mikityanskiy, Aya Levin, Saeed Mahameed

From: Maxim Mikityanskiy <maximmi@mellanox.com>

There are two ICOSQs per channel: one is needed for RX, and the other
for async operations (XSK TX, kTLS offload). Currently, the recovery
flow for both is the same, and the async ICOSQ is mistakenly treated
like the regular ICOSQ.

This patch prevents running the regular ICOSQ recovery on the async
ICOSQ. The purpose of the async ICOSQ is to handle XSK wakeup requests
and post kTLS offload RX parameters; it has nothing to do with RQ and
XSKRQ UMRs, so the regular recovery sequence is not applicable here.

Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 --
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 30 ++++++++++++++-----
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f42067adc79d..b47a0d3ef22f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1016,9 +1016,6 @@ int mlx5e_create_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param);
 void mlx5e_destroy_rq(struct mlx5e_rq *rq);
 
 struct mlx5e_sq_param;
-int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct mlx5e_icosq *sq);
-void mlx5e_close_icosq(struct mlx5e_icosq *sq);
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a572fc9690ed..3b0f3a831216 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1214,9 +1214,20 @@ static void mlx5e_icosq_err_cqe_work(struct work_struct *recover_work)
 	mlx5e_reporter_icosq_cqe_err(sq);
 }
 
+static void mlx5e_async_icosq_err_cqe_work(struct work_struct *recover_work)
+{
+	struct mlx5e_icosq *sq = container_of(recover_work, struct mlx5e_icosq,
+					      recover_work);
+
+	/* Not implemented yet. */
+
+	netdev_warn(sq->channel->netdev, "async_icosq recovery is not implemented\n");
+}
+
 static int mlx5e_alloc_icosq(struct mlx5e_channel *c,
 			     struct mlx5e_sq_param *param,
-			     struct mlx5e_icosq *sq)
+			     struct mlx5e_icosq *sq,
+			     work_func_t recover_work_func)
 {
 	void *sqc_wq               = MLX5_ADDR_OF(sqc, param->sqc, wq);
 	struct mlx5_core_dev *mdev = c->mdev;
@@ -1237,7 +1248,7 @@ static int mlx5e_alloc_icosq(struct mlx5e_channel *c,
 	if (err)
 		goto err_sq_wq_destroy;
 
-	INIT_WORK(&sq->recover_work, mlx5e_icosq_err_cqe_work);
+	INIT_WORK(&sq->recover_work, recover_work_func);
 
 	return 0;
 
@@ -1573,13 +1584,14 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 	mlx5e_reporter_tx_err_cqe(sq);
 }
 
-int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct mlx5e_icosq *sq)
+static int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
+			    struct mlx5e_sq_param *param, struct mlx5e_icosq *sq,
+			    work_func_t recover_work_func)
 {
 	struct mlx5e_create_sq_param csp = {};
 	int err;
 
-	err = mlx5e_alloc_icosq(c, param, sq);
+	err = mlx5e_alloc_icosq(c, param, sq, recover_work_func);
 	if (err)
 		return err;
 
@@ -1618,7 +1630,7 @@ void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq)
 	synchronize_net(); /* Sync with NAPI. */
 }
 
-void mlx5e_close_icosq(struct mlx5e_icosq *sq)
+static void mlx5e_close_icosq(struct mlx5e_icosq *sq)
 {
 	struct mlx5e_channel *c = sq->channel;
 
@@ -2082,13 +2094,15 @@ static int mlx5e_open_queues(struct mlx5e_channel *c,
 
 	spin_lock_init(&c->async_icosq_lock);
 
-	err = mlx5e_open_icosq(c, params, &cparam->async_icosq, &c->async_icosq);
+	err = mlx5e_open_icosq(c, params, &cparam->async_icosq, &c->async_icosq,
+			       mlx5e_async_icosq_err_cqe_work);
 	if (err)
 		goto err_close_xdpsq_cq;
 
 	mutex_init(&c->icosq_recovery_lock);
 
-	err = mlx5e_open_icosq(c, params, &cparam->icosq, &c->icosq);
+	err = mlx5e_open_icosq(c, params, &cparam->icosq, &c->icosq,
+			       mlx5e_icosq_err_cqe_work);
 	if (err)
 		goto err_close_async_icosq;
 
-- 
2.33.1


* [net 11/11] net/mlx5e: Delete forward rule for ct or sample action
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2021-12-22 21:12 ` [net 10/11] net/mlx5e: Fix ICOSQ recovery flow for XSK Saeed Mahameed
@ 2021-12-22 21:12 ` Saeed Mahameed
  2021-12-23 10:31 ` [pull request][net 00/11] mlx5 fixes 2021-12-22 David Miller
  11 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-22 21:12 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Roi Dayan, Chris Mi, Saeed Mahameed

From: Chris Mi <cmi@nvidia.com>

When there is a ct or sample action, the ct or sample rule is deleted
and the function returns early. But if there is an extra mirror action,
the forward rule can't be deleted because of that early return.

Fix it by removing the early return.

Fixes: 69e2916ebce4 ("net/mlx5: CT: Add support for mirroring")
Fixes: f94d6389f6a8 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3d45f4ae80c0..f633448c3cc7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1196,21 +1196,16 @@ void mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 	if (attr->flags & MLX5_ESW_ATTR_FLAG_SLOW_PATH)
 		goto offload_rule_0;
 
-	if (flow_flag_test(flow, CT)) {
-		mlx5_tc_ct_delete_flow(get_ct_priv(flow->priv), flow, attr);
-		return;
-	}
-
-	if (flow_flag_test(flow, SAMPLE)) {
-		mlx5e_tc_sample_unoffload(get_sample_priv(flow->priv), flow->rule[0], attr);
-		return;
-	}
-
 	if (attr->esw_attr->split_count)
 		mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
 
+	if (flow_flag_test(flow, CT))
+		mlx5_tc_ct_delete_flow(get_ct_priv(flow->priv), flow, attr);
+	else if (flow_flag_test(flow, SAMPLE))
+		mlx5e_tc_sample_unoffload(get_sample_priv(flow->priv), flow->rule[0], attr);
+	else
 offload_rule_0:
-	mlx5_eswitch_del_offloaded_rule(esw, flow->rule[0], attr);
+		mlx5_eswitch_del_offloaded_rule(esw, flow->rule[0], attr);
 }
 
 struct mlx5_flow_handle *
-- 
2.33.1


* Re: [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
  2021-12-22 21:11 ` [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources Saeed Mahameed
@ 2021-12-23  1:04   ` Jakub Kicinski
  2021-12-23  1:09     ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2021-12-23  1:04 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Roi Dayan, Miaoqian Lin, Saeed Mahameed

On Wed, 22 Dec 2021 13:11:51 -0800 Saeed Mahameed wrote:
> From: Miaoqian Lin <linmq006@gmail.com>
> 
> The mlx5_get_uars_page() function returns error pointers.
> Use IS_ERR() to check the return value to fix this.
> 
> Fixes: 4ec9e7b02697("net/mlx5: DR, Expose steering domain functionality")

Do you mind fixing this missing space? I'll cherry pick the change from
net-next in the meantime.

* Re: [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
  2021-12-23  1:04   ` Jakub Kicinski
@ 2021-12-23  1:09     ` Jakub Kicinski
  2021-12-23  4:28       ` Saeed Mahameed
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2021-12-23  1:09 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Roi Dayan, Miaoqian Lin, Saeed Mahameed

On Wed, 22 Dec 2021 17:04:03 -0800 Jakub Kicinski wrote:
> On Wed, 22 Dec 2021 13:11:51 -0800 Saeed Mahameed wrote:
> > From: Miaoqian Lin <linmq006@gmail.com>
> > 
> > The mlx5_get_uars_page() function returns error pointers.
> > Use IS_ERR() to check the return value to fix this.
> > 
> > Fixes: 4ec9e7b02697("net/mlx5: DR, Expose steering domain functionality")  
> 
> Do you mind fixing this missing space? I'll cherry pick the change from
> net-next in the meantime.

I take that back; I thought the error was on our side, but it looks
like the patch was put in the wrong PR. Why not put it in the net PR
yourself? We'll handle the unavoidable conflict, but I don't see any
advantage to me cherry-picking here (which I can't do directly anyway,
TBH; my local trees have only one remote to avoid false-negative Fixes
tag checks).

* Re: [net 01/11] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
  2021-12-23  1:09     ` Jakub Kicinski
@ 2021-12-23  4:28       ` Saeed Mahameed
  0 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2021-12-23  4:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, netdev, Roi Dayan, Miaoqian Lin, Saeed Mahameed

On Wed, 2021-12-22 at 17:09 -0800, Jakub Kicinski wrote:
> On Wed, 22 Dec 2021 17:04:03 -0800 Jakub Kicinski wrote:
> > On Wed, 22 Dec 2021 13:11:51 -0800 Saeed Mahameed wrote:
> > > From: Miaoqian Lin <linmq006@gmail.com>
> > > 
> > > The mlx5_get_uars_page() function returns error pointers.
> > > Use IS_ERR() to check the return value to fix this.
> > > 
> > > Fixes: 4ec9e7b02697("net/mlx5: DR, Expose steering domain
> > > functionality")  
> > 
> > Do you mind fixing this missing space? I'll cherry pick the change
> > from

Sure, I will submit v2.

> > net-next in the meantime.
> 
> I take that back, I thought the error was on our side but looks like
> the patch was put in the wrong PR. Why not put it in the net PR

Yes, my mistake. The bug only existed in net-next when I accepted the
patch; I mistakenly left it in my net-next tree after rc1 was out.

> yourself? We'll handle the unavoidable conflict, but I don't see any
> advantage to me cherry picking here (which I can't do directly
> anyway,

This is how I remember we used to do it.
Sure, I will attach it in v2. I already checked; there will be no
conflicts.


* Re: [pull request][net 00/11] mlx5 fixes 2021-12-22
  2021-12-22 21:11 [pull request][net 00/11] mlx5 fixes 2021-12-22 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2021-12-22 21:12 ` [net 11/11] net/mlx5e: Delete forward rule for ct or sample action Saeed Mahameed
@ 2021-12-23 10:31 ` David Miller
  11 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2021-12-23 10:31 UTC (permalink / raw)
  To: saeed; +Cc: kuba, netdev, roid, saeedm

From: Saeed Mahameed <saeed@kernel.org>
Date: Wed, 22 Dec 2021 13:11:50 -0800

> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> Hi Dave, Hi Jakub,
> 
> This series provides bug fixes to the mlx5 driver.
> Please pull and let me know if there is any problem.
> 
> Additionally, and unrelated to this pull, I would like to kindly request
> that you cherry-pick the following fix commit from the net-next branch
> into net:
> 31108d142f36 ("net/mlx5: Fix some error handling paths in ...")
> 
> The following changes since commit 9b8bdd1eb5890aeeab7391dddcf8bd51f7b07216:
> 
>   sfc: falcon: Check null pointer of rx_queue->page_ring (2021-12-22 12:25:18 -0800)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2021-12-22

Pulled.

That cherry pick doesn't come close to applying to the current tree.

Could you submit it yourself after fixing that?

Thank you.
