All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 00/16] mlx5e devlink health reporters
@ 2019-07-07 11:52 Tariq Toukan
  2019-07-07 11:52 ` [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value" Tariq Toukan
                   ` (15 more replies)
  0 siblings, 16 replies; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

Hi Dave,

I'm submitting this series myself as Saeed is on vacation.

This series from Aya to the mlx5e driver introduces changes in
devlink health reporters.
Most noticeable is adding a new reporter, RX reporter, which
reports and recovers from timeout and completion errors in the
receive path.

In patches 1-6, we  perform TX reporter cleanup. In order to maintain
the code flow as similar as possible between RX and TX reporters.
In patch 7, we prepare for code sharing, generalize and move shared
functionality.
Patches 8-10 refactor and extend TX reporter diagnostics information
to align the TX reporter diagnostics output with the RX reporter's
diagnostics output.
Patch 11 adds RX reporter, initially supports only the diagnostics
call back.
In patch 12 we split ICOSQ open/close functions into two stages, to call
specific parts from the recover flow.
Patches 13-16 introduce recovery flows for: RX timeout on ICOSQ, completion
error on receive path.

Series generated against net-next commit:
23f30c41c732 Merge branch 'mlx5-TLS-TX-HW-offload-support'

Regards,
Tariq


Aya Levin (15):
  Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value"
  net/mlx5e: Fix error flow in tx reporter diagnose
  net/mlx5e: Set tx reporter only on successful creation
  net/mlx5e: TX reporter cleanup
  net/mlx5e: Rename reporter header file
  net/mlx5e: Change naming convention for reporter's functions
  net/mlx5e: Generalize tx reporter's functionality
  net/mlx5e: Extend tx diagnose function
  net/mlx5e: Extend tx reporter diagnostics output
  net/mlx5e: Add cq info to tx reporter diagnose
  net/mlx5e: Add support to rx reporter diagnose
  net/mlx5e: Split open/close ICOSQ into stages
  net/mlx5e: Recover from CQE error on ICOSQ
  net/mlx5e: Recover from rx timeout
  net/mlx5e: Recover from CQE with error on RQ

Saeed Mahameed (1):
  net/mlx5e: RX, Handle CQE with error at the earliest stage

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  32 ++
 .../net/ethernet/mellanox/mlx5/core/en/health.c    | 205 +++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  53 +++
 .../net/ethernet/mellanox/mlx5/core/en/reporter.h  |  15 -
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 390 +++++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 226 ++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  84 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  62 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/wq.c       |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/wq.h       |   1 +
 13 files changed, 883 insertions(+), 200 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/health.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/health.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value"
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-08 11:03   ` Jiri Pirko
  2019-07-07 11:52 ` [PATCH net-next 02/16] net/mlx5e: Fix error flow in tx reporter diagnose Tariq Toukan
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

This reverts commit 2e5b0534622fa87fd570d54af2d01ce304b88077.

This commit was needed prior to commit f6b19b354d50 ("net: devlink:
select NET_DEVLINK from drivers") Then, reporter's pointer could have
been a NULL. But with NET_DEVLINK mandatory to MLX5_CORE in Kconfig,
pointer can only hold an error in bad path.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 476dd97f7f2f..24626bb55598 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -302,7 +302,7 @@ int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
 		netdev_warn(priv->netdev,
 			    "Failed to create tx reporter, err = %ld\n",
 			    PTR_ERR(priv->tx_reporter));
-	return IS_ERR_OR_NULL(priv->tx_reporter);
+	return PTR_ERR_OR_ZERO(priv->tx_reporter);
 }
 
 void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 02/16] net/mlx5e: Fix error flow in tx reporter diagnose
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
  2019-07-07 11:52 ` [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value" Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-08 11:09   ` Jiri Pirko
  2019-07-07 11:52 ` [PATCH net-next 03/16] net/mlx5e: Set tx reporter only on successful creation Tariq Toukan
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Fix tx reporter's diagnose call back. Propagate error when failing to
gather diagnostics information or failing to print diagnostic data per
queue.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 24626bb55598..b2bc8d8ca80e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -264,13 +264,13 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 
 		err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
 		if (err)
-			break;
+			goto unlock;
 
 		err = mlx5e_tx_reporter_build_diagnose_output(fmsg, sq->sqn,
 							      state,
 							      netif_xmit_stopped(sq->txq));
 		if (err)
-			break;
+			goto unlock;
 	}
 	err = devlink_fmsg_arr_pair_nest_end(fmsg);
 	if (err)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 03/16] net/mlx5e: Set tx reporter only on successful creation
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
  2019-07-07 11:52 ` [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value" Tariq Toukan
  2019-07-07 11:52 ` [PATCH net-next 02/16] net/mlx5e: Fix error flow in tx reporter diagnose Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-07 11:52 ` [PATCH net-next 04/16] net/mlx5e: TX reporter cleanup Tariq Toukan
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid
garbage/error pointer.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c  | 19 +++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c     |  2 +-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index b2bc8d8ca80e..259811811e06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -117,7 +117,7 @@ static int mlx5_tx_health_report(struct devlink_health_reporter *tx_reporter,
 				 char *err_str,
 				 struct mlx5e_tx_err_ctx *err_ctx)
 {
-	if (IS_ERR_OR_NULL(tx_reporter)) {
+	if (!tx_reporter) {
 		netdev_err(err_ctx->sq->channel->netdev, err_str);
 		return err_ctx->recover(err_ctx->sq);
 	}
@@ -291,23 +291,26 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 
 int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
 {
+	struct devlink_health_reporter *reporter;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct devlink *devlink = priv_to_devlink(mdev);
 
-	priv->tx_reporter =
+	reporter =
 		devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
 					       MLX5_REPORTER_TX_GRACEFUL_PERIOD,
 					       true, priv);
-	if (IS_ERR(priv->tx_reporter))
-		netdev_warn(priv->netdev,
-			    "Failed to create tx reporter, err = %ld\n",
-			    PTR_ERR(priv->tx_reporter));
-	return PTR_ERR_OR_ZERO(priv->tx_reporter);
+	if (IS_ERR(reporter))
+		netdev_warn(priv->netdev, "Failed to create tx reporter, err = %ld\n",
+			    PTR_ERR(reporter));
+	else
+		priv->tx_reporter = reporter;
+
+	return PTR_ERR_OR_ZERO(reporter);
 }
 
 void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
 {
-	if (IS_ERR_OR_NULL(priv->tx_reporter))
+	if (!priv->tx_reporter)
 		return;
 
 	devlink_health_reporter_destroy(priv->tx_reporter);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 83194d56434d..c5f11a05c34b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2320,7 +2320,7 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 			goto err_close_channels;
 	}
 
-	if (!IS_ERR_OR_NULL(priv->tx_reporter))
+	if (priv->tx_reporter)
 		devlink_health_reporter_state_update(priv->tx_reporter,
 						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 04/16] net/mlx5e: TX reporter cleanup
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (2 preceding siblings ...)
  2019-07-07 11:52 ` [PATCH net-next 03/16] net/mlx5e: Set tx reporter only on successful creation Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-07 11:52 ` [PATCH net-next 05/16] net/mlx5e: Rename reporter header file Tariq Toukan
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Remove redundant include file.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 259811811e06..83555894bacb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -1,7 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright (c) 2019 Mellanox Technologies. */
 
-#include <net/devlink.h>
 #include "reporter.h"
 #include "lib/eq.h"
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 05/16] net/mlx5e: Rename reporter header file
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (3 preceding siblings ...)
  2019-07-07 11:52 ` [PATCH net-next 04/16] net/mlx5e: TX reporter cleanup Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-08 11:11   ` Jiri Pirko
  2019-07-07 11:52 ` [PATCH net-next 06/16] net/mlx5e: Change naming convention for reporter's functions Tariq Toukan
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Rename reporter.h -> health.h so patches in the set can use it for
health related functionality.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/health.h      | 15 +++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h    | 15 ---------------
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c        |  2 +-
 4 files changed, 17 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/health.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
new file mode 100644
index 000000000000..e78e92753d73
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __MLX5E_EN_REPORTER_H
+#define __MLX5E_EN_REPORTER_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
deleted file mode 100644
index e78e92753d73..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2019 Mellanox Technologies. */
-
-#ifndef __MLX5E_EN_REPORTER_H
-#define __MLX5E_EN_REPORTER_H
-
-#include <linux/mlx5/driver.h>
-#include "en.h"
-
-int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
-void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
-void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
-int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
-
-#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 83555894bacb..0035ae9306ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright (c) 2019 Mellanox Technologies. */
 
-#include "reporter.h"
+#include "health.h"
 #include "lib/eq.h"
 
 #define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c5f11a05c34b..ce4533f8169c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -56,7 +56,7 @@
 #include "en/xdp.h"
 #include "lib/eq.h"
 #include "en/monitor_stats.h"
-#include "en/reporter.h"
+#include "en/health.h"
 #include "en/params.h"
 #include "en/xsk/umem.h"
 #include "en/xsk/setup.h"
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 06/16] net/mlx5e: Change naming convention for reporter's functions
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (4 preceding siblings ...)
  2019-07-07 11:52 ` [PATCH net-next 05/16] net/mlx5e: Rename reporter header file Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-08 11:29   ` Jiri Pirko
  2019-07-07 11:52 ` [PATCH net-next 07/16] net/mlx5e: Generalize tx reporter's functionality Tariq Toukan
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following
patches in the set rx reporter is added, the new naming convention is
more uniformed.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/health.h      | 8 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 8 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c        | 8 ++++----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index e78e92753d73..e3a3bcee89e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -7,9 +7,9 @@
 #include <linux/mlx5/driver.h>
 #include "en.h"
 
-int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
-void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
-void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
-int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
+int mlx5e_reporter_tx_create(struct mlx5e_priv *priv);
+void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv);
+void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
+int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 0035ae9306ec..d5ecfcfe5d52 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -124,7 +124,7 @@ static int mlx5_tx_health_report(struct devlink_health_reporter *tx_reporter,
 	return devlink_health_report(tx_reporter, err_str, err_ctx);
 }
 
-void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
+void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq)
 {
 	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
 	struct mlx5e_tx_err_ctx err_ctx = {0};
@@ -159,7 +159,7 @@ static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
 	return ret;
 }
 
-int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq)
+int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
 {
 	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
 	struct mlx5e_tx_err_ctx err_ctx;
@@ -288,7 +288,7 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 
 #define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
 
-int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
+int mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
 {
 	struct devlink_health_reporter *reporter;
 	struct mlx5_core_dev *mdev = priv->mdev;
@@ -307,7 +307,7 @@ int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
 	return PTR_ERR_OR_ZERO(reporter);
 }
 
-void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
+void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv)
 {
 	if (!priv->tx_reporter)
 		return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ce4533f8169c..98c925e72706 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1376,7 +1376,7 @@ static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
 					      recover_work);
 
-	mlx5e_tx_reporter_err_cqe(sq);
+	mlx5e_reporter_tx_err_cqe(sq);
 }
 
 int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
@@ -3201,7 +3201,7 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 {
 	int tc;
 
-	mlx5e_tx_reporter_destroy(priv);
+	mlx5e_reporter_tx_destroy(priv);
 	for (tc = 0; tc < priv->profile->max_tc; tc++)
 		mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
@@ -4286,7 +4286,7 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 		if (!netif_xmit_stopped(dev_queue))
 			continue;
 
-		if (mlx5e_tx_reporter_timeout(sq))
+		if (mlx5e_reporter_tx_timeout(sq))
 			report_failed = true;
 	}
 
@@ -5093,7 +5093,7 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	mlx5e_dcbnl_initialize(priv);
 #endif
-	mlx5e_tx_reporter_create(priv);
+	mlx5e_reporter_tx_create(priv);
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 07/16] net/mlx5e: Generalize tx reporter's functionality
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (5 preceding siblings ...)
  2019-07-07 11:52 ` [PATCH net-next 06/16] net/mlx5e: Change naming convention for reporter's functions Tariq Toukan
@ 2019-07-07 11:52 ` Tariq Toukan
  2019-07-08 12:42   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 08/16] net/mlx5e: Extend tx diagnose function Tariq Toukan
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Prepare for code sharing with rx reporter, which is added in the
following patches in the set. Introduce a generic error_ctx for
agnostic recovery despatch.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/en/health.c    |  82 ++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  15 ++-
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 126 ++++-----------------
 4 files changed, 124 insertions(+), 104 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/health.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 57d2cc666fe3..23d566a45a30 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -23,8 +23,9 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 #
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
-		en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o \
-		en/params.o en/xsk/umem.o en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o
+		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
+		en/reporter_tx.o en/params.o en/xsk/umem.o en/xsk/setup.o \
+		en/xsk/rx.o en/xsk/tx.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
new file mode 100644
index 000000000000..60166e5432ae
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Mellanox Technologies.
+
+#include "health.h"
+#include "lib/eq.h"
+
+int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
+{
+	struct mlx5_core_dev *mdev = channel->mdev;
+	struct net_device *dev = channel->netdev;
+	struct mlx5e_modify_sq_param msp = {0};
+	int err;
+
+	msp.curr_state = MLX5_SQC_STATE_ERR;
+	msp.next_state = MLX5_SQC_STATE_RST;
+
+	err = mlx5e_modify_sq(mdev, sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sqn);
+		return err;
+	}
+
+	memset(&msp, 0, sizeof(msp));
+	msp.curr_state = MLX5_SQC_STATE_RST;
+	msp.next_state = MLX5_SQC_STATE_RDY;
+
+	err = mlx5e_modify_sq(mdev, sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sqn);
+		return err;
+	}
+
+	return 0;
+}
+
+int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
+{
+	int err = 0;
+
+	rtnl_lock();
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+		goto out;
+
+	err = mlx5e_safe_reopen_channels(priv);
+
+out:
+	mutex_unlock(&priv->state_lock);
+	rtnl_unlock();
+
+	return err;
+}
+
+int mlx5e_health_channel_eq_recover(struct mlx5_eq_comp *eq, struct mlx5e_channel *channel)
+{
+	u32 eqe_count;
+
+	netdev_err(channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+
+	eqe_count = mlx5_eq_poll_irq_disabled(eq);
+	if (!eqe_count)
+		return -EIO;
+
+	netdev_err(channel->netdev, "Recovered %d eqes on EQ 0x%x\n",
+		   eqe_count, eq->core.eqn);
+
+	channel->stats->eq_rearm++;
+	return 0;
+}
+
+int mlx5e_health_report(struct mlx5e_priv *priv,
+			struct devlink_health_reporter *reporter, char *err_str,
+			struct mlx5e_err_ctx *err_ctx)
+{
+	if (!reporter) {
+		netdev_err(priv->netdev, err_str);
+		return err_ctx->recover(&err_ctx->ctx);
+	}
+	return devlink_health_report(reporter, err_str, err_ctx);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index e3a3bcee89e7..960aa18c425d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -4,7 +4,6 @@
 #ifndef __MLX5E_EN_REPORTER_H
 #define __MLX5E_EN_REPORTER_H
 
-#include <linux/mlx5/driver.h>
 #include "en.h"
 
 int mlx5e_reporter_tx_create(struct mlx5e_priv *priv);
@@ -12,4 +11,18 @@
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
 int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
 
+#define MLX5E_REPORTER_PER_Q_MAX_LEN 256
+
+struct mlx5e_err_ctx {
+	int (*recover)(void *ctx);
+	void *ctx;
+};
+
+int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn);
+int mlx5e_health_channel_eq_recover(struct mlx5_eq_comp *eq, struct mlx5e_channel *channel);
+int mlx5e_health_recover_channels(struct mlx5e_priv *priv);
+int mlx5e_health_report(struct mlx5e_priv *priv,
+			struct devlink_health_reporter *reporter, char *err_str,
+			struct mlx5e_err_ctx *err_ctx);
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index d5ecfcfe5d52..3e03a1ac8e5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -2,14 +2,6 @@
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include "health.h"
-#include "lib/eq.h"
-
-#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
-
-struct mlx5e_tx_err_ctx {
-	int (*recover)(struct mlx5e_txqsq *sq);
-	struct mlx5e_txqsq *sq;
-};
 
 static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
 {
@@ -39,37 +31,9 @@ static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
 	sq->pc = 0;
 }
 
-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
-{
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	struct mlx5e_modify_sq_param msp = {0};
-	int err;
-
-	msp.curr_state = curr_state;
-	msp.next_state = MLX5_SQC_STATE_RST;
-
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
-		return err;
-	}
-
-	memset(&msp, 0, sizeof(msp));
-	msp.curr_state = MLX5_SQC_STATE_RST;
-	msp.next_state = MLX5_SQC_STATE_RDY;
-
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
-		return err;
-	}
-
-	return 0;
-}
-
-static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
+static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
 {
+	struct mlx5e_txqsq *sq = (struct mlx5e_txqsq *)ctx;
 	struct mlx5_core_dev *mdev = sq->channel->mdev;
 	struct net_device *dev = sq->channel->netdev;
 	u8 state;
@@ -101,7 +65,7 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
 	 * pending WQEs. SQ can safely reset the SQ.
 	 */
 
-	err = mlx5e_sq_to_ready(sq, state);
+	err = mlx5e_health_sq_to_ready(sq->channel, sq->sqn);
 	if (err)
 		return err;
 
@@ -112,104 +76,64 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
 	return 0;
 }
 
-static int mlx5_tx_health_report(struct devlink_health_reporter *tx_reporter,
-				 char *err_str,
-				 struct mlx5e_tx_err_ctx *err_ctx)
-{
-	if (!tx_reporter) {
-		netdev_err(err_ctx->sq->channel->netdev, err_str);
-		return err_ctx->recover(err_ctx->sq);
-	}
-
-	return devlink_health_report(tx_reporter, err_str, err_ctx);
-}
-
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq)
 {
-	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
-	struct mlx5e_tx_err_ctx err_ctx = {0};
+	struct mlx5e_priv *priv = sq->channel->priv;
+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_err_ctx err_ctx = {0};
 
-	err_ctx.sq       = sq;
-	err_ctx.recover  = mlx5e_tx_reporter_err_cqe_recover;
+	err_ctx.ctx = sq;
+	err_ctx.recover = mlx5e_tx_reporter_err_cqe_recover;
 	sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
 
-	mlx5_tx_health_report(sq->channel->priv->tx_reporter, err_str,
-			      &err_ctx);
+	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 }
 
-static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
+static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 {
+	struct mlx5e_txqsq *sq = (struct mlx5e_txqsq *)ctx;
 	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
-	u32 eqe_count;
-	int ret;
-
-	netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
-		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+	int err;
 
-	eqe_count = mlx5_eq_poll_irq_disabled(eq);
-	ret = eqe_count ? false : true;
-	if (!eqe_count) {
+	err = mlx5e_health_channel_eq_recover(eq, sq->channel);
+	if (err)
 		clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
-		return ret;
-	}
 
-	netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
-		   eqe_count, eq->core.eqn);
-	sq->channel->stats->eq_rearm++;
-	return ret;
+	return err;
 }
 
 int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
 {
-	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
-	struct mlx5e_tx_err_ctx err_ctx;
+	struct mlx5e_priv *priv = sq->channel->priv;
+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_err_ctx err_ctx;
 
-	err_ctx.sq       = sq;
-	err_ctx.recover  = mlx5e_tx_reporter_timeout_recover;
+	err_ctx.ctx = sq;
+	err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
 	sprintf(err_str,
 		"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
 		sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
 		jiffies_to_usecs(jiffies - sq->txq->trans_start));
 
-	return mlx5_tx_health_report(sq->channel->priv->tx_reporter, err_str,
-				     &err_ctx);
+	return mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 }
 
 /* state lock cannot be grabbed within this function.
  * It can cause a dead lock or a read-after-free.
  */
-static int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
+static int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
 {
-	return err_ctx->recover(err_ctx->sq);
-}
-
-static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
-{
-	int err = 0;
-
-	rtnl_lock();
-	mutex_lock(&priv->state_lock);
-
-	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto out;
-
-	err = mlx5e_safe_reopen_channels(priv);
-
-out:
-	mutex_unlock(&priv->state_lock);
-	rtnl_unlock();
-
-	return err;
+	return err_ctx->recover(err_ctx->ctx);
 }
 
 static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
 				     void *context)
 {
 	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
-	struct mlx5e_tx_err_ctx *err_ctx = context;
+	struct mlx5e_err_ctx *err_ctx = context;
 
 	return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
-			 mlx5e_tx_reporter_recover_all(priv);
+			 mlx5e_health_recover_channels(priv);
 }
 
 static int
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 08/16] net/mlx5e: Extend tx diagnose function
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (6 preceding siblings ...)
  2019-07-07 11:52 ` [PATCH net-next 07/16] net/mlx5e: Generalize tx reporter's functionality Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-08 12:43   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output Tariq Toukan
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

The following patches in the set enhance the diagnostics info of tx
reporter. Therefore, it is better to pass a pointer to the SQ for
further data extraction.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 3e03a1ac8e5a..dd6417930461 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -138,15 +138,22 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
 
 static int
 mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
-					u32 sqn, u8 state, bool stopped)
+					struct mlx5e_txqsq *sq)
 {
+	struct mlx5e_priv *priv = sq->channel->priv;
+	bool stopped = netif_xmit_stopped(sq->txq);
+	u8 state;
 	int err;
 
+	err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+	if (err)
+		return err;
+
 	err = devlink_fmsg_obj_nest_start(fmsg);
 	if (err)
 		return err;
 
-	err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sqn);
+	err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sq->sqn);
 	if (err)
 		return err;
 
@@ -183,15 +190,8 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc;
 	     i++) {
 		struct mlx5e_txqsq *sq = priv->txq2sq[i];
-		u8 state;
-
-		err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
-		if (err)
-			goto unlock;
 
-		err = mlx5e_tx_reporter_build_diagnose_output(fmsg, sq->sqn,
-							      state,
-							      netif_xmit_stopped(sq->txq));
+		err = mlx5e_tx_reporter_build_diagnose_output(fmsg, sq);
 		if (err)
 			goto unlock;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (7 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 08/16] net/mlx5e: Extend tx diagnose function Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-08 12:55   ` Jiri Pirko
  2019-07-08 13:39   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 10/16] net/mlx5e: Add cq info to tx reporter diagnose Tariq Toukan
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Enhance tx reporter's diagnostics output to include: information common
to all SQs: SQ size, SQ stride size.
In addition add channel ix, cc and pc.

$ devlink health diagnose pci/0000:00:0b.0 reporter tx
Common config:
   SQ: stride size: 64 size: 1024
 SQs:
   channel ix: 0 sqn: 4283 HW state: 1 stopped: false cc: 0 pc: 0
   channel ix: 1 sqn: 4288 HW state: 1 stopped: false cc: 0 pc: 0
   channel ix: 2 sqn: 4293 HW state: 1 stopped: false cc: 0 pc: 0
   channel ix: 3 sqn: 4298 HW state: 1 stopped: false cc: 0 pc: 0

$ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp
{
    "Common config": [
        "SQ": {
            "stride size": 64,
            "size": 1024
        } ],
    "SQs": [ {
            "channel ix": 0,
            "sqn": 4283,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0
        },{
            "channel ix": 1,
            "sqn": 4288,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0
        },{
            "channel ix": 2,
            "sqn": 4293,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0
        },{
            "channel ix": 3,
            "sqn": 4298,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0
         } ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/health.c    | 30 ++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  3 ++
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 42 ++++++++++++++++++++++
 3 files changed, 75 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index 60166e5432ae..0d44b081259f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -4,6 +4,36 @@
 #include "health.h"
 #include "lib/eq.h"
 
+int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
 {
 	struct mlx5_core_dev *mdev = channel->mdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index 960aa18c425d..0fecad63135c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -11,6 +11,9 @@
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
 int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
 
+int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name);
+int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg);
+
 #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
 
 struct mlx5e_err_ctx {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index dd6417930461..c481f7142a12 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -153,6 +153,10 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
 	if (err)
 		return err;
 
+	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", sq->channel->ix);
+	if (err)
+		return err;
+
 	err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sq->sqn);
 	if (err)
 		return err;
@@ -165,6 +169,14 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
 	if (err)
 		return err;
 
+	err = devlink_fmsg_u32_pair_put(fmsg, "cc", sq->cc);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "pc", sq->pc);
+	if (err)
+		return err;
+
 	err = devlink_fmsg_obj_nest_end(fmsg);
 	if (err)
 		return err;
@@ -176,6 +188,9 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 				      struct devlink_fmsg *fmsg)
 {
 	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_txqsq *generic_sq = priv->txq2sq[0];
+	u32 sq_stride, sq_sz;
+
 	int i, err = 0;
 
 	mutex_lock(&priv->state_lock);
@@ -183,6 +198,33 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		goto unlock;
 
+	sq_sz = mlx5_wq_cyc_get_size(&generic_sq->wq);
+	sq_stride = MLX5_SEND_WQE_BB;
+
+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "Common config");
+	if (err)
+		goto unlock;
+
+	err = mlx5e_reporter_named_obj_nest_start(fmsg, "SQ");
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_u64_pair_put(fmsg, "stride size", sq_stride);
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "size", sq_sz);
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_arr_pair_nest_end(fmsg);
+	if (err)
+		goto unlock;
+
+	err = mlx5e_reporter_named_obj_nest_end(fmsg);
+	if (err)
+		goto unlock;
+
 	err = devlink_fmsg_arr_pair_nest_start(fmsg, "SQs");
 	if (err)
 		goto unlock;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 10/16] net/mlx5e: Add cq info to tx reporter diagnose
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (8 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-08 13:00   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 11/16] net/mlx5e: Add support to rx " Tariq Toukan
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Add cq information to general diagnose output: CQ size and stride size.
Per SQ add information about the related CQ: cqn and CQ's HW status.

$ devlink health diagnose pci/0000:00:0b.0 reporter tx
Common config:
   SQ: stride size: 64 size: 1024
   CQ: stride size: 64 size: 1024
 SQs:
   channel ix: 0 sqn: 4283 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1030 HW status: 0
   channel ix: 1 sqn: 4288 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0
   channel ix: 2 sqn: 4293 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1038 HW status: 0
   channel ix: 3 sqn: 4298 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1042 HW status: 0

$ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp
{
    "Common config": [
        "SQ": {
            "stride size": 64,
            "size": 1024
        },
        "CQ": {
            "stride size": 64,
            "size": 1024
        } ],
    "SQs": [ {
            "channel ix": 0,
            "sqn": 4283,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 1030,
                "HW status": 0
            }
        },{
            "channel ix": 1,
            "sqn": 4288,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 1034,
                "HW status": 0
            }
        },{
            "channel ix": 2,
            "sqn": 4293,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 1038,
                "HW status": 0
            }
        },{
            "channel ix": 3,
            "sqn": 4298,
            "HW state": 1,
            "stopped": false,
            "cc": 0,
            "pc": 0,
            "CQ": {
                "cqn": 1042,
                "HW status": 0
            }
        } ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/health.c    | 62 ++++++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  2 +
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   |  8 +++
 drivers/net/ethernet/mellanox/mlx5/core/wq.c       |  5 ++
 drivers/net/ethernet/mellanox/mlx5/core/wq.h       |  1 +
 5 files changed, 78 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index 0d44b081259f..a266717d41e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -34,6 +34,68 @@ int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg)
 	return 0;
 }
 
+int mlx5e_reporter_cq_diagnose(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg)
+{
+	struct mlx5e_priv *priv = cq->channel->priv;
+	u32 out[MLX5_ST_SZ_DW(query_cq_out)] = {};
+	u8 hw_status;
+	void *cqc;
+	int err;
+
+	err = mlx5_core_query_cq(priv->mdev, &cq->mcq, out, sizeof(out));
+	if (err)
+		return err;
+
+	cqc = MLX5_ADDR_OF(query_cq_out, out, cq_context);
+	hw_status = MLX5_GET(cqc, cqc, status);
+
+	err = mlx5e_reporter_named_obj_nest_start(fmsg, "CQ");
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "cqn", cq->mcq.cqn);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u8_pair_put(fmsg, "HW status", hw_status);
+	if (err)
+		return err;
+
+	err = mlx5e_reporter_named_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int mlx5e_reporter_cq_common_diagnose(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg)
+{
+	u8 cq_log_stride;
+	u32 cq_sz;
+	int err;
+
+	cq_sz = mlx5_cqwq_get_size(&cq->wq);
+	cq_log_stride = mlx5_cqwq_get_log_stride_size(&cq->wq);
+
+	err = mlx5e_reporter_named_obj_nest_start(fmsg, "CQ");
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u64_pair_put(fmsg, "stride size", BIT(cq_log_stride));
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "size", cq_sz);
+	if (err)
+		return err;
+
+	err = mlx5e_reporter_named_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
 {
 	struct mlx5_core_dev *mdev = channel->mdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index 0fecad63135c..b0c8bda3d25f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -11,6 +11,8 @@
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
 int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
 
+int mlx5e_reporter_cq_diagnose(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg);
+int mlx5e_reporter_cq_common_diagnose(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg);
 int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name);
 int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index c481f7142a12..47e788cf05bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -177,6 +177,10 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
 	if (err)
 		return err;
 
+	err = mlx5e_reporter_cq_diagnose(&sq->cq, fmsg);
+	if (err)
+		return err;
+
 	err = devlink_fmsg_obj_nest_end(fmsg);
 	if (err)
 		return err;
@@ -221,6 +225,10 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
 	if (err)
 		goto unlock;
 
+	err = mlx5e_reporter_cq_common_diagnose(&generic_sq->cq, fmsg);
+	if (err)
+		goto unlock;
+
 	err = mlx5e_reporter_named_obj_nest_end(fmsg);
 	if (err)
 		goto unlock;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.c b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
index 953cc8efba69..dd2315ce4441 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
@@ -44,6 +44,11 @@ u32 mlx5_cqwq_get_size(struct mlx5_cqwq *wq)
 	return wq->fbc.sz_m1 + 1;
 }
 
+u8 mlx5_cqwq_get_log_stride_size(struct mlx5_cqwq *wq)
+{
+	return wq->fbc.log_stride;
+}
+
 u32 mlx5_wq_ll_get_size(struct mlx5_wq_ll *wq)
 {
 	return (u32)wq->fbc.sz_m1 + 1;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.h b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
index f1ec58c9e9e3..55791f71a778 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
@@ -89,6 +89,7 @@ int mlx5_cqwq_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 		     void *cqc, struct mlx5_cqwq *wq,
 		     struct mlx5_wq_ctrl *wq_ctrl);
 u32 mlx5_cqwq_get_size(struct mlx5_cqwq *wq);
+u8 mlx5_cqwq_get_log_stride_size(struct mlx5_cqwq *wq);
 
 int mlx5_wq_ll_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
 		      void *wqc, struct mlx5_wq_ll *wq,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 11/16] net/mlx5e: Add support to rx reporter diagnose
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (9 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 10/16] net/mlx5e: Add cq info to tx reporter diagnose Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-08 14:22   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages Tariq Toukan
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Add rx reporter, which supports diagnose call-back. Diagnostics output
include: information common to all RQs: RQ type, RQ size, RQ stride
size, CQ size and CQ stride size. In addition advertise information per
RQ and its related icosq and attached CQ.

$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
   RQ: type: 2 stride size: 2048 size: 8
   CQ: stride size: 64 size: 1024
 RQs:
   channel ix: 0 rqn: 4284 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0
   channel ix: 1 rqn: 4289 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0
   channel ix: 2 rqn: 4294 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0
   channel ix: 3 rqn: 4299 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0

$ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp
{
    "Common config": [
        "RQ": {
            "type": 2,
            "stride size": 2048,
            "size": 8
        },
        "CQ": {
            "stride size": 64,
            "size": 1024
        } ],
    "RQs": [ {
            "channel ix": 0,
            "rqn": 4284,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1032,
                "HW status": 0
            }
        },{
            "channel ix": 1,
            "rqn": 4289,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1036,
                "HW status": 0
            }
        },{
            "channel ix": 2,
            "rqn": 4294,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1040,
                "HW status": 0
            }
        },{
            "channel ix": 3,
            "rqn": 4299,
            "HW state": 1,
            "SW state": 3,
            "posted WQEs": 7,
            "cc": 7,
            "ICOSQ HW state": 1,
            "CQ": {
                "cqn": 1044,
                "HW status": 0
            }
        } ]
}

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  21 +++
 .../net/ethernet/mellanox/mlx5/core/en/health.c    |  31 ++++
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |   7 +
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 194 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  29 +--
 6 files changed, 258 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 23d566a45a30..a3b9659649a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -24,8 +24,8 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
 		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
-		en/reporter_tx.o en/params.o en/xsk/umem.o en/xsk/setup.o \
-		en/xsk/rx.o en/xsk/tx.o
+		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/umem.o \
+		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 263558875f20..f7c5cf7a7064 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -848,6 +848,7 @@ struct mlx5e_priv {
 	struct mlx5e_tls          *tls;
 #endif
 	struct devlink_health_reporter *tx_reporter;
+	struct devlink_health_reporter *rx_reporter;
 	struct mlx5e_xsk           xsk;
 };
 
@@ -888,6 +889,26 @@ netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
 void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq);
 
+static inline u32 mlx5e_rqwq_get_size(struct mlx5e_rq *rq)
+{
+	switch (rq->wq_type) {
+	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
+		return mlx5_wq_ll_get_size(&rq->mpwqe.wq);
+	default:
+		return mlx5_wq_cyc_get_size(&rq->wqe.wq);
+	}
+}
+
+static inline u32 mlx5e_rqwq_get_cur_sz(struct mlx5e_rq *rq)
+{
+	switch (rq->wq_type) {
+	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
+		return rq->mpwqe.wq.cur_sz;
+	default:
+		return rq->wqe.wq.cur_sz;
+	}
+}
+
 bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev);
 bool mlx5e_striding_rq_possible(struct mlx5_core_dev *mdev,
 				struct mlx5e_params *params);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
index a266717d41e5..a0579de8e2e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
@@ -96,6 +96,37 @@ int mlx5e_reporter_cq_common_diagnose(struct mlx5e_cq *cq, struct devlink_fmsg *
 	return 0;
 }
 
+int mlx5e_health_create_reporters(struct mlx5e_priv *priv)
+{
+	int err;
+
+	err = mlx5e_reporter_tx_create(priv);
+	if (err)
+		return err;
+
+	err = mlx5e_reporter_rx_create(priv);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+void mlx5e_health_destroy_reporters(struct mlx5e_priv *priv)
+{
+	mlx5e_reporter_rx_destroy(priv);
+	mlx5e_reporter_tx_destroy(priv);
+}
+
+void mlx5e_health_channels_update(struct mlx5e_priv *priv)
+{
+	if (priv->tx_reporter)
+		devlink_health_reporter_state_update(priv->tx_reporter,
+						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
+	if (priv->rx_reporter)
+		devlink_health_reporter_state_update(priv->rx_reporter,
+						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
+}
+
 int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
 {
 	struct mlx5_core_dev *mdev = channel->mdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index b0c8bda3d25f..7829ea229914 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -16,6 +16,9 @@
 int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name);
 int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg);
 
+int mlx5e_reporter_rx_create(struct mlx5e_priv *priv);
+void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
+
 #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
 
 struct mlx5e_err_ctx {
@@ -29,5 +32,9 @@ struct mlx5e_err_ctx {
 int mlx5e_health_report(struct mlx5e_priv *priv,
 			struct devlink_health_reporter *reporter, char *err_str,
 			struct mlx5e_err_ctx *err_ctx);
+int mlx5e_health_create_reporters(struct mlx5e_priv *priv);
+void mlx5e_health_destroy_reporters(struct mlx5e_priv *priv);
+void mlx5e_health_channels_update(struct mlx5e_priv *priv);
+
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
new file mode 100644
index 000000000000..b24bdff473b0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Mellanox Technologies.
+
+#include "health.h"
+#include "params.h"
+
+static int mlx5e_query_rq_state(struct mlx5_core_dev *dev, u32 rqn, u8 *state)
+{
+	int outlen = MLX5_ST_SZ_BYTES(query_rq_out);
+	void *out;
+	void *rqc;
+	int err;
+
+	out = kvzalloc(outlen, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	err = mlx5_core_query_rq(dev, rqn, out);
+	if (err)
+		goto out;
+
+	rqc = MLX5_ADDR_OF(query_rq_out, out, rq_context);
+	*state = MLX5_GET(rqc, rqc, state);
+
+out:
+	kvfree(out);
+	return err;
+}
+
+static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
+						   struct devlink_fmsg *fmsg)
+{
+	struct mlx5e_priv *priv = rq->channel->priv;
+	struct mlx5e_params *params = &priv->channels.params;
+	struct mlx5e_icosq *icosq = &rq->channel->icosq;
+	u8 icosq_hw_state;
+	int wqes_sz;
+	u8 hw_state;
+	u16 wq_head;
+	int err;
+
+	err = mlx5e_query_rq_state(priv->mdev, rq->rqn, &hw_state);
+	if (err)
+		return err;
+
+	err = mlx5_core_query_sq_state(priv->mdev, icosq->sqn, &icosq_hw_state);
+	if (err)
+		return err;
+
+	wqes_sz = mlx5e_rqwq_get_cur_sz(rq);
+	wq_head = params->rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ ?
+		  rq->mpwqe.wq.head : mlx5_wq_cyc_get_head(&rq->wqe.wq);
+
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", rq->channel->ix);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "rqn", rq->rqn);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u8_pair_put(fmsg, "HW state", hw_state);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u8_pair_put(fmsg, "SW state", rq->state);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "posted WQEs", wqes_sz);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "cc", wq_head);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u8_pair_put(fmsg, "ICOSQ HW state", icosq_hw_state);
+	if (err)
+		return err;
+
+	err = mlx5e_reporter_cq_diagnose(&rq->cq, fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int mlx5e_rx_reporter_diagnose(struct devlink_health_reporter *reporter,
+				      struct devlink_fmsg *fmsg)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_params *params = &priv->channels.params;
+	struct mlx5e_rq *generic_rq;
+	u32 rq_stride, rq_sz;
+	int i, err = 0;
+
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+		goto unlock;
+
+	generic_rq = &priv->channels.c[0]->rq;
+	rq_sz = mlx5e_rqwq_get_size(generic_rq);
+	rq_stride = BIT(mlx5e_mpwqe_get_log_stride_size(priv->mdev, params, NULL));
+
+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "Common config");
+	if (err)
+		goto unlock;
+
+	err = mlx5e_reporter_named_obj_nest_start(fmsg, "RQ");
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_u8_pair_put(fmsg, "type", params->rq_wq_type);
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_u64_pair_put(fmsg, "stride size", rq_stride);
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "size", rq_sz);
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_arr_pair_nest_end(fmsg);
+	if (err)
+		goto unlock;
+
+	err = mlx5e_reporter_cq_common_diagnose(&generic_rq->cq, fmsg);
+	if (err)
+		goto unlock;
+
+	err = mlx5e_reporter_named_obj_nest_end(fmsg);
+	if (err)
+		goto unlock;
+
+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "RQs");
+	if (err)
+		goto unlock;
+
+	for (i = 0; i < priv->channels.num; i++) {
+		struct mlx5e_rq *rq = &priv->channels.c[i]->rq;
+
+		err = mlx5e_rx_reporter_build_diagnose_output(rq, fmsg);
+		if (err)
+			goto unlock;
+	}
+	err = devlink_fmsg_arr_pair_nest_end(fmsg);
+	if (err)
+		goto unlock;
+unlock:
+	mutex_unlock(&priv->state_lock);
+	return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
+		.name = "rx",
+		.diagnose = mlx5e_rx_reporter_diagnose,
+};
+
+int mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
+{
+	struct devlink_health_reporter *reporter;
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct devlink *devlink = priv_to_devlink(mdev);
+
+	reporter =
+		devlink_health_reporter_create(devlink, &mlx5_rx_reporter_ops,
+					       0, false, priv);
+	if (IS_ERR(reporter))
+		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
+			    PTR_ERR(reporter));
+	else
+		priv->tx_reporter = reporter;
+	return PTR_ERR_OR_ZERO(reporter);
+}
+
+void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv)
+{
+	if (!priv->rx_reporter)
+		return;
+
+	devlink_health_reporter_destroy(priv->rx_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 98c925e72706..3922905e909f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -247,26 +247,6 @@ static inline void mlx5e_build_umr_wqe(struct mlx5e_rq *rq,
 	ucseg->mkey_mask     = cpu_to_be64(MLX5_MKEY_MASK_FREE);
 }
 
-static u32 mlx5e_rqwq_get_size(struct mlx5e_rq *rq)
-{
-	switch (rq->wq_type) {
-	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
-		return mlx5_wq_ll_get_size(&rq->mpwqe.wq);
-	default:
-		return mlx5_wq_cyc_get_size(&rq->wqe.wq);
-	}
-}
-
-static u32 mlx5e_rqwq_get_cur_sz(struct mlx5e_rq *rq)
-{
-	switch (rq->wq_type) {
-	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
-		return rq->mpwqe.wq.cur_sz;
-	default:
-		return rq->wqe.wq.cur_sz;
-	}
-}
-
 static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq,
 				     struct mlx5e_channel *c)
 {
@@ -2320,10 +2300,7 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 			goto err_close_channels;
 	}
 
-	if (priv->tx_reporter)
-		devlink_health_reporter_state_update(priv->tx_reporter,
-						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
-
+	mlx5e_health_channels_update(priv);
 	kvfree(cparam);
 	return 0;
 
@@ -3201,7 +3178,6 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 {
 	int tc;
 
-	mlx5e_reporter_tx_destroy(priv);
 	for (tc = 0; tc < priv->profile->max_tc; tc++)
 		mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
@@ -4985,12 +4961,14 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 		mlx5_core_err(mdev, "TLS initialization failed, %d\n", err);
 	mlx5e_build_nic_netdev(netdev);
 	mlx5e_build_tc2txq_maps(priv);
+	mlx5e_health_create_reporters(priv);
 
 	return 0;
 }
 
 static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 {
+	mlx5e_health_destroy_reporters(priv);
 	mlx5e_tls_cleanup(priv);
 	mlx5e_ipsec_cleanup(priv);
 	mlx5e_netdev_cleanup(priv->netdev, priv);
@@ -5093,7 +5071,6 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	mlx5e_dcbnl_initialize(priv);
 #endif
-	mlx5e_reporter_tx_create(priv);
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (10 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 11/16] net/mlx5e: Add support to rx " Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-09  7:23   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 13/16] net/mlx5e: Recover from CQE error on ICOSQ Tariq Toukan
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
open and activate where open handles creation and activate enables the
queue. Do a symmetric thing in close flow: split into close and
deactivate.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3922905e909f..7e6ac1e7bdd1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1372,7 +1372,6 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
 	csp.cqn             = sq->cq.mcq.cqn;
 	csp.wq_ctrl         = &sq->wq_ctrl;
 	csp.min_inline_mode = params->tx_min_inline_mode;
-	set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
 	err = mlx5e_create_sq_rdy(c->mdev, param, &csp, &sq->sqn);
 	if (err)
 		goto err_free_icosq;
@@ -1386,12 +1385,22 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
 	return err;
 }
 
-void mlx5e_close_icosq(struct mlx5e_icosq *sq)
+static void mlx5e_activate_icosq(struct mlx5e_icosq *icosq)
 {
-	struct mlx5e_channel *c = sq->channel;
+	set_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
+}
 
-	clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
+static void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq)
+{
+	struct mlx5e_channel *c = icosq->channel;
+
+	clear_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
 	napi_synchronize(&c->napi);
+}
+
+void mlx5e_close_icosq(struct mlx5e_icosq *sq)
+{
+	struct mlx5e_channel *c = sq->channel;
 
 	mlx5e_destroy_sq(c->mdev, sq->sqn);
 	mlx5e_free_icosq(sq);
@@ -1968,6 +1977,7 @@ static void mlx5e_activate_channel(struct mlx5e_channel *c)
 
 	for (tc = 0; tc < c->num_tc; tc++)
 		mlx5e_activate_txqsq(&c->sq[tc]);
+	mlx5e_activate_icosq(&c->icosq);
 	mlx5e_activate_rq(&c->rq);
 	netif_set_xps_queue(c->netdev, c->xps_cpumask, c->ix);
 
@@ -1983,6 +1993,7 @@ static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
 		mlx5e_deactivate_xsk(c);
 
 	mlx5e_deactivate_rq(&c->rq);
+	mlx5e_deactivate_icosq(&c->icosq);
 	for (tc = 0; tc < c->num_tc; tc++)
 		mlx5e_deactivate_txqsq(&c->sq[tc]);
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 13/16] net/mlx5e: Recover from CQE error on ICOSQ
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (11 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-09 15:30   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 14/16] net/mlx5e: Recover from rx timeout Tariq Toukan
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Add support for recovery from error on completion on ICOSQ. Deactivate
RQ and flush, then deactivate ICOSQ. Set the queue back to ready state
(firmware) and reset the ICOSQ and the RQ (software resources). Finally,
activate the ICOSQ and the RQ.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   8 ++
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |   1 +
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 103 ++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  23 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   2 +
 7 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f7c5cf7a7064..648641e68de2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -554,6 +554,8 @@ struct mlx5e_icosq {
 	/* control path */
 	struct mlx5_wq_ctrl        wq_ctrl;
 	struct mlx5e_channel      *channel;
+
+	struct work_struct         recover_work;
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_wqe_frag_info {
@@ -1027,6 +1029,12 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
 void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
 			       struct mlx5e_params *params);
+int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state);
+void mlx5e_activate_rq(struct mlx5e_rq *rq);
+void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
+void mlx5e_free_rx_descs(struct mlx5e_rq *rq);
+void mlx5e_activate_icosq(struct mlx5e_icosq *icosq);
+void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq);
 
 int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
 		    struct mlx5e_modify_sq_param *p);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index 7829ea229914..e8c5d3bd86f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -18,6 +18,7 @@
 
 int mlx5e_reporter_rx_create(struct mlx5e_priv *priv);
 void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
+void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq);
 
 #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index b24bdff473b0..c47e9a53bd53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -27,6 +27,103 @@ static int mlx5e_query_rq_state(struct mlx5_core_dev *dev, u32 rqn, u8 *state)
 	return err;
 }
 
+static int mlx5e_wait_for_icosq_flush(struct mlx5e_icosq *icosq)
+{
+	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
+
+	while (time_before(jiffies, exp_time)) {
+		if (icosq->cc == icosq->pc)
+			return 0;
+
+		msleep(20);
+	}
+
+	netdev_err(icosq->channel->netdev,
+		   "Wait for ICOSQ 0x%x flush timeout (cc = 0x%x, pc = 0x%x)\n",
+		   icosq->sqn, icosq->cc, icosq->pc);
+
+	return -ETIMEDOUT;
+}
+
+static void mlx5e_reset_icosq_cc_pc(struct mlx5e_icosq *icosq)
+{
+	WARN_ONCE(icosq->cc != icosq->pc, "ICOSQ 0x%x: cc (0x%x) != pc (0x%x)\n",
+		  icosq->sqn, icosq->cc, icosq->pc);
+	icosq->cc = 0;
+	icosq->pc = 0;
+}
+
+static int mlx5e_rx_reporter_err_icosq_cqe_recover(void *ctx)
+{
+	struct mlx5e_icosq *icosq = (struct mlx5e_icosq *)ctx;
+	struct mlx5_core_dev *mdev = icosq->channel->mdev;
+	struct net_device *dev = icosq->channel->netdev;
+	struct mlx5e_rq *rq = &icosq->channel->rq;
+	u8 state;
+	int err;
+
+	err = mlx5_core_query_sq_state(mdev, icosq->sqn, &state);
+	if (err) {
+		netdev_err(dev, "Failed to query ICOSQ 0x%x state. err = %d\n",
+			   icosq->sqn, err);
+		return err;
+	}
+
+	if (state != MLX5_SQC_STATE_ERR)
+		return 0;
+
+	mlx5e_deactivate_rq(rq);
+	err = mlx5e_wait_for_icosq_flush(icosq);
+	if (err)
+		goto out;
+
+	mlx5e_deactivate_icosq(icosq);
+
+	/* At this point, both the rq and the icosq are disabled */
+
+	err = mlx5e_health_sq_to_ready(icosq->channel, icosq->sqn);
+	if (err)
+		goto out;
+
+	mlx5e_reset_icosq_cc_pc(icosq);
+	mlx5e_free_rx_descs(rq);
+	mlx5e_activate_icosq(icosq);
+	mlx5e_activate_rq(rq);
+
+	rq->stats->recover++;
+out:
+	clear_bit(MLX5E_SQ_STATE_RECOVERING, &icosq->state);
+	return err;
+}
+
+void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq)
+{
+	struct mlx5e_priv *priv = icosq->channel->priv;
+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_err_ctx err_ctx = {};
+
+	err_ctx.ctx = icosq;
+	err_ctx.recover = mlx5e_rx_reporter_err_icosq_cqe_recover;
+	sprintf(err_str, "ERR CQE on ICOSQ: 0x%x", icosq->sqn);
+
+	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
+}
+
+static int mlx5e_rx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
+{
+	return err_ctx->recover(err_ctx->ctx);
+}
+
+static int mlx5e_rx_reporter_recover(struct devlink_health_reporter *reporter,
+				     void *context)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_err_ctx *err_ctx = context;
+
+	return err_ctx ? mlx5e_rx_reporter_recover_from_ctx(err_ctx) :
+			 mlx5e_health_recover_channels(priv);
+}
+
 static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
 						   struct devlink_fmsg *fmsg)
 {
@@ -165,9 +262,12 @@ static int mlx5e_rx_reporter_diagnose(struct devlink_health_reporter *reporter,
 
 static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
 		.name = "rx",
+		.recover = mlx5e_rx_reporter_recover,
 		.diagnose = mlx5e_rx_reporter_diagnose,
 };
 
+#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+
 int mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
 {
 	struct devlink_health_reporter *reporter;
@@ -176,7 +276,8 @@ int mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
 
 	reporter =
 		devlink_health_reporter_create(devlink, &mlx5_rx_reporter_ops,
-					       0, false, priv);
+					       MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
+					       true, priv);
 	if (IS_ERR(reporter))
 		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
 			    PTR_ERR(reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7e6ac1e7bdd1..2d57611ac579 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -701,8 +701,7 @@ static int mlx5e_create_rq(struct mlx5e_rq *rq,
 	return err;
 }
 
-static int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state,
-				 int next_state)
+int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_state)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
 
@@ -813,7 +812,7 @@ int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
 	return -ETIMEDOUT;
 }
 
-static void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
+void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 {
 	__be16 wqe_ix_be;
 	u16 wqe_ix;
@@ -889,7 +888,7 @@ int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 	return err;
 }
 
-static void mlx5e_activate_rq(struct mlx5e_rq *rq)
+void mlx5e_activate_rq(struct mlx5e_rq *rq)
 {
 	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
 	mlx5e_trigger_irq(&rq->channel->icosq);
@@ -904,6 +903,7 @@ void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
 void mlx5e_close_rq(struct mlx5e_rq *rq)
 {
 	cancel_work_sync(&rq->dim.work);
+	cancel_work_sync(&rq->channel->icosq.recover_work);
 	mlx5e_destroy_rq(rq);
 	mlx5e_free_rx_descs(rq);
 	mlx5e_free_rq(rq);
@@ -1020,6 +1020,14 @@ static int mlx5e_alloc_icosq_db(struct mlx5e_icosq *sq, int numa)
 	return 0;
 }
 
+static void mlx5e_icosq_err_cqe_work(struct work_struct *recover_work)
+{
+	struct mlx5e_icosq *sq = container_of(recover_work, struct mlx5e_icosq,
+					      recover_work);
+
+	mlx5e_reporter_icosq_cqe_err(sq);
+}
+
 static int mlx5e_alloc_icosq(struct mlx5e_channel *c,
 			     struct mlx5e_sq_param *param,
 			     struct mlx5e_icosq *sq)
@@ -1042,6 +1050,8 @@ static int mlx5e_alloc_icosq(struct mlx5e_channel *c,
 	if (err)
 		goto err_sq_wq_destroy;
 
+	INIT_WORK(&sq->recover_work, mlx5e_icosq_err_cqe_work);
+
 	return 0;
 
 err_sq_wq_destroy:
@@ -1385,12 +1395,13 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
 	return err;
 }
 
-static void mlx5e_activate_icosq(struct mlx5e_icosq *icosq)
+void mlx5e_activate_icosq(struct mlx5e_icosq *icosq)
 {
+	clear_bit(MLX5E_SQ_STATE_RECOVERING, &icosq->state);
 	set_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
 }
 
-static void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq)
+void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq)
 {
 	struct mlx5e_channel *c = icosq->channel;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 56a2f4666c47..4173679254bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -615,6 +615,8 @@ void mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
 		if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) {
 			netdev_WARN_ONCE(cq->channel->netdev,
 					 "Bad OP in ICOSQ CQE: 0x%x\n", get_cqe_opcode(cqe));
+			if (!test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+				queue_work(cq->channel->priv->wq, &sq->recover_work);
 			break;
 		}
 		do {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 539b4d3656da..7de3d87dc434 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -107,6 +107,7 @@
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_cache_waive) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_congst_umr) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, ch_events) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, ch_poll) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, ch_arm) },
@@ -217,6 +218,7 @@ static void mlx5e_grp_sw_update_stats(struct mlx5e_priv *priv)
 		s->rx_cache_waive += rq_stats->cache_waive;
 		s->rx_congst_umr  += rq_stats->congst_umr;
 		s->rx_arfs_err    += rq_stats->arfs_err;
+		s->rx_recover     += rq_stats->recover;
 		s->ch_events      += ch_stats->events;
 		s->ch_poll        += ch_stats->poll;
 		s->ch_arm         += ch_stats->arm;
@@ -1294,6 +1296,7 @@ static int mlx5e_grp_tls_fill_stats(struct mlx5e_priv *priv, u64 *data, int idx)
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, cache_waive) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, congst_umr) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
 };
 
 static const struct counter_desc sq_stats_desc[] = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 76ac111e14d0..a50caa70453c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -114,6 +114,7 @@ struct mlx5e_sw_stats {
 	u64 rx_cache_waive;
 	u64 rx_congst_umr;
 	u64 rx_arfs_err;
+	u64 rx_recover;
 	u64 ch_events;
 	u64 ch_poll;
 	u64 ch_arm;
@@ -247,6 +248,7 @@ struct mlx5e_rq_stats {
 	u64 cache_waive;
 	u64 congst_umr;
 	u64 arfs_err;
+	u64 recover;
 };
 
 struct mlx5e_sq_stats {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 14/16] net/mlx5e: Recover from rx timeout
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (12 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 13/16] net/mlx5e: Recover from CQE error on ICOSQ Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-09 15:32   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 15/16] net/mlx5e: RX, Handle CQE with error at the earliest stage Tariq Toukan
  2019-07-07 11:53 ` [PATCH net-next 16/16] net/mlx5e: Recover from CQE with error on RQ Tariq Toukan
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Add support for recovery from rx timeout. On driver open we post NOP
work request on the rx channels to trigger napi in order to fillup the
rx rings. In case napi wasn't scheduled due to a lost interrupt, perform
EQ recovery.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  1 +
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 30 ++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  1 +
 3 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index e8c5d3bd86f1..aa46f7ecae53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -19,6 +19,7 @@
 int mlx5e_reporter_rx_create(struct mlx5e_priv *priv);
 void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
 void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq);
+void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq);
 
 #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index c47e9a53bd53..7e7dba129330 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -109,6 +109,36 @@ void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq)
 	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
 }
 
+static int mlx5e_rx_reporter_timeout_recover(void *ctx)
+{
+	struct mlx5e_rq *rq = (struct mlx5e_rq *)ctx;
+	struct mlx5e_icosq *icosq = &rq->channel->icosq;
+	struct mlx5_eq_comp *eq = rq->cq.mcq.eq;
+	int err;
+
+	err = mlx5e_health_channel_eq_recover(eq, rq->channel);
+	if (err)
+		clear_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
+
+	return err;
+}
+
+void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq)
+{
+	struct mlx5e_icosq *icosq = &rq->channel->icosq;
+	struct mlx5e_priv *priv = rq->channel->priv;
+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_err_ctx err_ctx = {};
+
+	err_ctx.ctx = rq;
+	err_ctx.recover = mlx5e_rx_reporter_timeout_recover;
+	sprintf(err_str,
+		"RX timeout on channel: %d, ICOSQ: 0x%x RQ: 0x%x, CQ: 0x%x\n",
+		icosq->channel->ix, icosq->sqn, rq->rqn, rq->cq.mcq.cqn);
+
+	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
+}
+
 static int mlx5e_rx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
 {
 	return err_ctx->recover(err_ctx->ctx);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2d57611ac579..1ebdeccf395d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -809,6 +809,7 @@ int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
 	netdev_warn(c->netdev, "Failed to get min RX wqes on Channel[%d] RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n",
 		    c->ix, rq->rqn, mlx5e_rqwq_get_cur_sz(rq), min_wqes);
 
+	mlx5e_reporter_rx_timeout(rq);
 	return -ETIMEDOUT;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 15/16] net/mlx5e: RX, Handle CQE with error at the earliest stage
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (13 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 14/16] net/mlx5e: Recover from rx timeout Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  2019-07-09 15:34   ` Jiri Pirko
  2019-07-07 11:53 ` [PATCH net-next 16/16] net/mlx5e: Recover from CQE with error on RQ Tariq Toukan
  15 siblings, 1 reply; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Saeed Mahameed <saeedm@mellanox.com>

Just to be aligned with the MPWQE handlers, handle RX WQE with error
for legacy RQs in the top RX handlers, just before calling skb_from_cqe().

CQE error handling will now be called at the same stage regardless of
the RQ type or netdev mode NIC, Representor, IPoIB, etc ..

This will be useful for down stream patches to improve error CQE
handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 49 ++++++++++++----------
 2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index aa46f7ecae53..435e31ad1cfb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -6,6 +6,8 @@
 
 #include "en.h"
 
+#define MLX5E_RX_ERR_CQE(cqe) (get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)
+
 int mlx5e_reporter_tx_create(struct mlx5e_priv *priv);
 void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv);
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 4173679254bf..3b730d3436e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -48,6 +48,7 @@
 #include "lib/clock.h"
 #include "en/xdp.h"
 #include "en/xsk/rx.h"
+#include "en/health.h"
 
 static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
 {
@@ -1062,11 +1063,6 @@ struct sk_buff *
 	prefetchw(va); /* xdp_frame data area */
 	prefetch(data);
 
-	if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {
-		rq->stats->wqe_err++;
-		return NULL;
-	}
-
 	rcu_read_lock();
 	consumed = mlx5e_xdp_handle(rq, di, va, &rx_headroom, &cqe_bcnt, false);
 	rcu_read_unlock();
@@ -1094,11 +1090,6 @@ struct sk_buff *
 	u16 byte_cnt     = cqe_bcnt - headlen;
 	struct sk_buff *skb;
 
-	if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {
-		rq->stats->wqe_err++;
-		return NULL;
-	}
-
 	/* XDP is not supported in this configuration, as incoming packets
 	 * might spread among multiple pages.
 	 */
@@ -1144,6 +1135,11 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		rq->stats->wqe_err++;
+		goto free_wqe;
+	}
+
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
@@ -1185,6 +1181,11 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		rq->stats->wqe_err++;
+		goto free_wqe;
+	}
+
 	skb = rq->wqe.skb_from_cqe(rq, cqe, wi, cqe_bcnt);
 	if (!skb) {
 		/* probably for XDP */
@@ -1319,7 +1320,7 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	wi->consumed_strides += cstrides;
 
-	if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
 		rq->stats->wqe_err++;
 		goto mpwrq_cqe_out;
 	}
@@ -1495,6 +1496,11 @@ void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		rq->stats->wqe_err++;
+		goto wq_free_wqe;
+	}
+
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
@@ -1530,26 +1536,27 @@ void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi       = get_frag(rq, ci);
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		rq->stats->wqe_err++;
+		goto wq_free_wqe;
+	}
+
 	skb = INDIRECT_CALL_2(rq->wqe.skb_from_cqe,
 			      mlx5e_skb_from_cqe_linear,
 			      mlx5e_skb_from_cqe_nonlinear,
 			      rq, cqe, wi, cqe_bcnt);
-	if (unlikely(!skb)) {
-		/* a DROP, save the page-reuse checks */
-		mlx5e_free_rx_wqe(rq, wi, true);
-		goto wq_cyc_pop;
-	}
+	if (unlikely(!skb)) /* a DROP, save the page-reuse checks */
+		goto wq_free_wqe;
+
 	skb = mlx5e_ipsec_handle_rx_skb(rq->netdev, skb, &cqe_bcnt);
-	if (unlikely(!skb)) {
-		mlx5e_free_rx_wqe(rq, wi, true);
-		goto wq_cyc_pop;
-	}
+	if (unlikely(!skb))
+		goto wq_free_wqe;
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 	napi_gro_receive(rq->cq.napi, skb);
 
+wq_free_wqe:
 	mlx5e_free_rx_wqe(rq, wi, true);
-wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 16/16] net/mlx5e: Recover from CQE with error on RQ
  2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
                   ` (14 preceding siblings ...)
  2019-07-07 11:53 ` [PATCH net-next 15/16] net/mlx5e: RX, Handle CQE with error at the earliest stage Tariq Toukan
@ 2019-07-07 11:53 ` Tariq Toukan
  15 siblings, 0 replies; 31+ messages in thread
From: Tariq Toukan @ 2019-07-07 11:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, ayal, jiri, Saeed Mahameed, moshe, Tariq Toukan

From: Aya Levin <ayal@mellanox.com>

Add support for recovery from error on completion on RQ by setting
the queue back to ready state. Handle only errors with a syndrome
indicating the RQ might enter error state and could be recovered.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  3 +
 .../net/ethernet/mellanox/mlx5/core/en/health.h    |  9 +++
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 65 ++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 10 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 11 ++++
 5 files changed, 98 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 648641e68de2..bf39ce19303d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -303,6 +303,7 @@ struct mlx5e_dcbx_dp {
 
 enum {
 	MLX5E_RQ_STATE_ENABLED,
+	MLX5E_RQ_STATE_RECOVERING,
 	MLX5E_RQ_STATE_AM,
 	MLX5E_RQ_STATE_NO_CSUM_COMPLETE,
 };
@@ -675,6 +676,8 @@ struct mlx5e_rq {
 	struct zero_copy_allocator zca;
 	struct xdp_umem       *umem;
 
+	struct work_struct     recover_work;
+
 	/* control */
 	struct mlx5_wq_ctrl    wq_ctrl;
 	__be32                 mkey_be;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index 435e31ad1cfb..198ef625005c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -8,6 +8,14 @@
 
 #define MLX5E_RX_ERR_CQE(cqe) (get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)
 
+static inline bool cqe_syndrome_needs_recover(u8 syndrome)
+{
+	return syndrome == MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR ||
+	       syndrome == MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR ||
+	       syndrome == MLX5_CQE_SYNDROME_LOCAL_PROT_ERR ||
+	       syndrome == MLX5_CQE_SYNDROME_WR_FLUSH_ERR;
+}
+
 int mlx5e_reporter_tx_create(struct mlx5e_priv *priv);
 void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv);
 void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
@@ -21,6 +29,7 @@
 int mlx5e_reporter_rx_create(struct mlx5e_priv *priv);
 void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
 void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq);
+void mlx5e_reporter_rq_cqe_err(struct mlx5e_rq *rq);
 void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq);
 
 #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 7e7dba129330..2f48321d14d9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -109,6 +109,71 @@ void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq)
 	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
 }
 
+static int mlx5e_rq_to_ready(struct mlx5e_rq *rq, int curr_state)
+{
+	struct net_device *dev = rq->netdev;
+	int err;
+
+	err = mlx5e_modify_rq_state(rq, curr_state, MLX5_RQC_STATE_RST);
+	if (err) {
+		netdev_err(dev, "Failed to move rq 0x%x to reset\n", rq->rqn);
+		return err;
+	}
+	err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
+	if (err) {
+		netdev_err(dev, "Failed to move rq 0x%x to ready\n", rq->rqn);
+		return err;
+	}
+
+	return 0;
+}
+
+static int mlx5e_rx_reporter_err_rq_cqe_recover(void *ctx)
+{
+	struct mlx5e_rq *rq = (struct mlx5e_rq *)ctx;
+	struct mlx5_core_dev *mdev = rq->mdev;
+	struct net_device *dev = rq->netdev;
+	u8 state;
+	int err;
+
+	err = mlx5e_query_rq_state(mdev, rq->rqn, &state);
+	if (err) {
+		netdev_err(dev, "Failed to query RQ 0x%x state. err = %d\n",
+			   rq->rqn, err);
+		return err;
+	}
+
+	if (state != MLX5_RQC_STATE_ERR)
+		return 0;
+
+	mlx5e_deactivate_rq(rq);
+	mlx5e_free_rx_descs(rq);
+
+	err = mlx5e_rq_to_ready(rq, MLX5_RQC_STATE_ERR);
+	if (err)
+		goto out;
+
+	mlx5e_activate_rq(rq);
+	rq->stats->recover++;
+
+out:
+	clear_bit(MLX5E_RQ_STATE_RECOVERING, &rq->state);
+	return err;
+}
+
+void mlx5e_reporter_rq_cqe_err(struct mlx5e_rq *rq)
+{
+	struct mlx5e_priv *priv = rq->channel->priv;
+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+	struct mlx5e_err_ctx err_ctx = {};
+
+	err_ctx.ctx = rq;
+	err_ctx.recover = mlx5e_rx_reporter_err_rq_cqe_recover;
+	sprintf(err_str, "ERR CQE on RQ: 0x%x", rq->rqn);
+
+	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
+}
+
 static int mlx5e_rx_reporter_timeout_recover(void *ctx)
 {
 	struct mlx5e_rq *rq = (struct mlx5e_rq *)ctx;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1ebdeccf395d..c1f12f2e3a1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -363,6 +363,13 @@ static void mlx5e_free_di_list(struct mlx5e_rq *rq)
 	kvfree(rq->wqe.di);
 }
 
+static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
+{
+	struct mlx5e_rq *rq = container_of(recover_work, struct mlx5e_rq, recover_work);
+
+	mlx5e_reporter_rq_cqe_err(rq);
+}
+
 static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			  struct mlx5e_params *params,
 			  struct mlx5e_xsk_param *xsk,
@@ -399,6 +406,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 		rq->stats = &c->priv->channel_stats[c->ix].xskrq;
 	else
 		rq->stats = &c->priv->channel_stats[c->ix].rq;
+	INIT_WORK(&rq->recover_work, mlx5e_rq_err_cqe_work);
 
 	rq->xdp_prog = params->xdp_prog ? bpf_prog_inc(params->xdp_prog) : NULL;
 	if (IS_ERR(rq->xdp_prog)) {
@@ -891,6 +899,7 @@ int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 
 void mlx5e_activate_rq(struct mlx5e_rq *rq)
 {
+	clear_bit(MLX5E_SQ_STATE_RECOVERING, &rq->state);
 	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
 	mlx5e_trigger_irq(&rq->channel->icosq);
 }
@@ -905,6 +914,7 @@ void mlx5e_close_rq(struct mlx5e_rq *rq)
 {
 	cancel_work_sync(&rq->dim.work);
 	cancel_work_sync(&rq->channel->icosq.recover_work);
+	cancel_work_sync(&rq->recover_work);
 	mlx5e_destroy_rq(rq);
 	mlx5e_free_rx_descs(rq);
 	mlx5e_free_rq(rq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3b730d3436e6..2b486863038e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1123,6 +1123,15 @@ struct sk_buff *
 	return skb;
 }
 
+static void trigger_report(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
+{
+	struct mlx5_err_cqe *err_cqe = (struct mlx5_err_cqe *)cqe;
+
+	if (cqe_syndrome_needs_recover(err_cqe->syndrome) &&
+	    !test_and_set_bit(MLX5E_RQ_STATE_RECOVERING, &rq->state))
+		queue_work(rq->channel->priv->wq, &rq->recover_work);
+}
+
 void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
@@ -1136,6 +1145,7 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		trigger_report(rq, cqe);
 		rq->stats->wqe_err++;
 		goto free_wqe;
 	}
@@ -1321,6 +1331,7 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wi->consumed_strides += cstrides;
 
 	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		trigger_report(rq, cqe);
 		rq->stats->wqe_err++;
 		goto mpwrq_cqe_out;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value"
  2019-07-07 11:52 ` [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value" Tariq Toukan
@ 2019-07-08 11:03   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 11:03 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:52:53PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>This reverts commit 2e5b0534622fa87fd570d54af2d01ce304b88077.
>
>This commit was needed prior to commit f6b19b354d50 ("net: devlink:
>select NET_DEVLINK from drivers") Then, reporter's pointer could have
>been a NULL. But with NET_DEVLINK mandatory to MLX5_CORE in Kconfig,
>pointer can only hold an error in bad path.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

I'm not sure if the patch name "Revert: ..." is correct. I would rather
just describe the change and don't mention the "revert" even in the
patch description.

The patch looks good.
Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 02/16] net/mlx5e: Fix error flow in tx reporter diagnose
  2019-07-07 11:52 ` [PATCH net-next 02/16] net/mlx5e: Fix error flow in tx reporter diagnose Tariq Toukan
@ 2019-07-08 11:09   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 11:09 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:52:54PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Fix tx reporter's diagnose call back. Propagate error when failing to

s/call back/callback/ - it's a noun.


>gather diagnostics information or failing to print diagnostic data per
>queue.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

It's a fix, please provide "Fixes" line:
Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Also, it should be most likely sent to -net tree.

The patch itself looks fine.
Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 05/16] net/mlx5e: Rename reporter header file
  2019-07-07 11:52 ` [PATCH net-next 05/16] net/mlx5e: Rename reporter header file Tariq Toukan
@ 2019-07-08 11:11   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 11:11 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:52:57PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Rename reporter.h -> health.h so patches in the set can use it for
>health related functionality.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlx5/core/en/health.h      | 15 +++++++++++++++
> drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h    | 15 ---------------
> drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c |  2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c        |  2 +-
> 4 files changed, 17 insertions(+), 17 deletions(-)
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/health.h
> delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>new file mode 100644
>index 000000000000..e78e92753d73
>--- /dev/null
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>@@ -0,0 +1,15 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/* Copyright (c) 2019 Mellanox Technologies. */
>+
>+#ifndef __MLX5E_EN_REPORTER_H
>+#define __MLX5E_EN_REPORTER_H

This should be "__MLX5E_EN_HEALTH_H" now.

[...]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 06/16] net/mlx5e: Change naming convention for reporter's functions
  2019-07-07 11:52 ` [PATCH net-next 06/16] net/mlx5e: Change naming convention for reporter's functions Tariq Toukan
@ 2019-07-08 11:29   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 11:29 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:52:58PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following
>patches in the set rx reporter is added, the new naming convention is
>more uniformed.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 07/16] net/mlx5e: Generalize tx reporter's functionality
  2019-07-07 11:52 ` [PATCH net-next 07/16] net/mlx5e: Generalize tx reporter's functionality Tariq Toukan
@ 2019-07-08 12:42   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 12:42 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:52:59PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Prepare for code sharing with rx reporter, which is added in the
>following patches in the set. Introduce a generic error_ctx for
>agnostic recovery despatch.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +-
> .../net/ethernet/mellanox/mlx5/core/en/health.c    |  82 ++++++++++++++
> .../net/ethernet/mellanox/mlx5/core/en/health.h    |  15 ++-
> .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 126 ++++-----------------
> 4 files changed, 124 insertions(+), 104 deletions(-)
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>index 57d2cc666fe3..23d566a45a30 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>@@ -23,8 +23,9 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
> #
> mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
> 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
>-		en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o \
>-		en/params.o en/xsk/umem.o en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o
>+		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
>+		en/reporter_tx.o en/params.o en/xsk/umem.o en/xsk/setup.o \
>+		en/xsk/rx.o en/xsk/tx.o
> 
> #
> # Netdev extra
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>new file mode 100644
>index 000000000000..60166e5432ae
>--- /dev/null
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>@@ -0,0 +1,82 @@
>+// SPDX-License-Identifier: GPL-2.0
>+// Copyright (c) 2019 Mellanox Technologies.
>+
>+#include "health.h"
>+#include "lib/eq.h"
>+
>+int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
>+{
>+	struct mlx5_core_dev *mdev = channel->mdev;
>+	struct net_device *dev = channel->netdev;
>+	struct mlx5e_modify_sq_param msp = {0};
>+	int err;
>+
>+	msp.curr_state = MLX5_SQC_STATE_ERR;
>+	msp.next_state = MLX5_SQC_STATE_RST;
>+
>+	err = mlx5e_modify_sq(mdev, sqn, &msp);
>+	if (err) {
>+		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sqn);
>+		return err;
>+	}
>+
>+	memset(&msp, 0, sizeof(msp));
>+	msp.curr_state = MLX5_SQC_STATE_RST;
>+	msp.next_state = MLX5_SQC_STATE_RDY;
>+
>+	err = mlx5e_modify_sq(mdev, sqn, &msp);
>+	if (err) {
>+		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sqn);
>+		return err;
>+	}
>+
>+	return 0;
>+}
>+
>+int mlx5e_health_recover_channels(struct mlx5e_priv *priv)
>+{
>+	int err = 0;
>+
>+	rtnl_lock();
>+	mutex_lock(&priv->state_lock);
>+
>+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
>+		goto out;
>+
>+	err = mlx5e_safe_reopen_channels(priv);
>+
>+out:
>+	mutex_unlock(&priv->state_lock);
>+	rtnl_unlock();
>+
>+	return err;
>+}
>+
>+int mlx5e_health_channel_eq_recover(struct mlx5_eq_comp *eq, struct mlx5e_channel *channel)
>+{
>+	u32 eqe_count;
>+
>+	netdev_err(channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
>+		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
>+
>+	eqe_count = mlx5_eq_poll_irq_disabled(eq);
>+	if (!eqe_count)
>+		return -EIO;
>+
>+	netdev_err(channel->netdev, "Recovered %d eqes on EQ 0x%x\n",
>+		   eqe_count, eq->core.eqn);
>+
>+	channel->stats->eq_rearm++;
>+	return 0;
>+}
>+
>+int mlx5e_health_report(struct mlx5e_priv *priv,
>+			struct devlink_health_reporter *reporter, char *err_str,
>+			struct mlx5e_err_ctx *err_ctx)
>+{
>+	if (!reporter) {
>+		netdev_err(priv->netdev, err_str);
>+		return err_ctx->recover(&err_ctx->ctx);
>+	}
>+	return devlink_health_report(reporter, err_str, err_ctx);
>+}
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>index e3a3bcee89e7..960aa18c425d 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>@@ -4,7 +4,6 @@
> #ifndef __MLX5E_EN_REPORTER_H
> #define __MLX5E_EN_REPORTER_H
> 
>-#include <linux/mlx5/driver.h>

How is this related to the rest of the patch?


> #include "en.h"
> 
> int mlx5e_reporter_tx_create(struct mlx5e_priv *priv);
>@@ -12,4 +11,18 @@
> void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
> int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
> 
>+#define MLX5E_REPORTER_PER_Q_MAX_LEN 256
>+
>+struct mlx5e_err_ctx {
>+	int (*recover)(void *ctx);
>+	void *ctx;
>+};
>+
>+int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn);
>+int mlx5e_health_channel_eq_recover(struct mlx5_eq_comp *eq, struct mlx5e_channel *channel);
>+int mlx5e_health_recover_channels(struct mlx5e_priv *priv);
>+int mlx5e_health_report(struct mlx5e_priv *priv,
>+			struct devlink_health_reporter *reporter, char *err_str,
>+			struct mlx5e_err_ctx *err_ctx);
>+
> #endif
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>index d5ecfcfe5d52..3e03a1ac8e5a 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>@@ -2,14 +2,6 @@
> /* Copyright (c) 2019 Mellanox Technologies. */
> 
> #include "health.h"
>-#include "lib/eq.h"
>-
>-#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
>-
>-struct mlx5e_tx_err_ctx {
>-	int (*recover)(struct mlx5e_txqsq *sq);
>-	struct mlx5e_txqsq *sq;
>-};
> 
> static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
> {
>@@ -39,37 +31,9 @@ static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
> 	sq->pc = 0;
> }
> 
>-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
>-{
>-	struct mlx5_core_dev *mdev = sq->channel->mdev;
>-	struct net_device *dev = sq->channel->netdev;
>-	struct mlx5e_modify_sq_param msp = {0};
>-	int err;
>-
>-	msp.curr_state = curr_state;
>-	msp.next_state = MLX5_SQC_STATE_RST;
>-
>-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
>-	if (err) {
>-		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
>-		return err;
>-	}
>-
>-	memset(&msp, 0, sizeof(msp));
>-	msp.curr_state = MLX5_SQC_STATE_RST;
>-	msp.next_state = MLX5_SQC_STATE_RDY;
>-
>-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
>-	if (err) {
>-		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
>-		return err;
>-	}
>-
>-	return 0;
>-}
>-
>-static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
>+static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
> {
>+	struct mlx5e_txqsq *sq = (struct mlx5e_txqsq *)ctx;

No need to cast from void *


> 	struct mlx5_core_dev *mdev = sq->channel->mdev;
> 	struct net_device *dev = sq->channel->netdev;
> 	u8 state;
>@@ -101,7 +65,7 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
> 	 * pending WQEs. SQ can safely reset the SQ.
> 	 */
> 
>-	err = mlx5e_sq_to_ready(sq, state);
>+	err = mlx5e_health_sq_to_ready(sq->channel, sq->sqn);
> 	if (err)
> 		return err;
> 
>@@ -112,104 +76,64 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
> 	return 0;
> }
> 
>-static int mlx5_tx_health_report(struct devlink_health_reporter *tx_reporter,
>-				 char *err_str,
>-				 struct mlx5e_tx_err_ctx *err_ctx)
>-{
>-	if (!tx_reporter) {
>-		netdev_err(err_ctx->sq->channel->netdev, err_str);
>-		return err_ctx->recover(err_ctx->sq);
>-	}
>-
>-	return devlink_health_report(tx_reporter, err_str, err_ctx);
>-}
>-
> void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq)
> {
>-	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
>-	struct mlx5e_tx_err_ctx err_ctx = {0};
>+	struct mlx5e_priv *priv = sq->channel->priv;
>+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
>+	struct mlx5e_err_ctx err_ctx = {0};
> 
>-	err_ctx.sq       = sq;
>-	err_ctx.recover  = mlx5e_tx_reporter_err_cqe_recover;
>+	err_ctx.ctx = sq;
>+	err_ctx.recover = mlx5e_tx_reporter_err_cqe_recover;
> 	sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
> 
>-	mlx5_tx_health_report(sq->channel->priv->tx_reporter, err_str,
>-			      &err_ctx);
>+	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
> }
> 
>-static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
>+static int mlx5e_tx_reporter_timeout_recover(void *ctx)
> {
>+	struct mlx5e_txqsq *sq = (struct mlx5e_txqsq *)ctx;

No need to cast from void *

	
> 	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
>-	u32 eqe_count;
>-	int ret;
>-
>-	netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
>-		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
>+	int err;
> 
>-	eqe_count = mlx5_eq_poll_irq_disabled(eq);
>-	ret = eqe_count ? false : true;
>-	if (!eqe_count) {
>+	err = mlx5e_health_channel_eq_recover(eq, sq->channel);
>+	if (err)
> 		clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
>-		return ret;
>-	}
> 
>-	netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
>-		   eqe_count, eq->core.eqn);
>-	sq->channel->stats->eq_rearm++;
>-	return ret;
>+	return err;
> }
> 
> int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
> {
>-	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
>-	struct mlx5e_tx_err_ctx err_ctx;
>+	struct mlx5e_priv *priv = sq->channel->priv;
>+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
>+	struct mlx5e_err_ctx err_ctx;
> 
>-	err_ctx.sq       = sq;
>-	err_ctx.recover  = mlx5e_tx_reporter_timeout_recover;
>+	err_ctx.ctx = sq;
>+	err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
> 	sprintf(err_str,
> 		"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
> 		sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
> 		jiffies_to_usecs(jiffies - sq->txq->trans_start));
> 
>-	return mlx5_tx_health_report(sq->channel->priv->tx_reporter, err_str,
>-				     &err_ctx);
>+	return mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
> }
> 
> /* state lock cannot be grabbed within this function.
>  * It can cause a dead lock or a read-after-free.
>  */
>-static int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
>+static int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
> {
>-	return err_ctx->recover(err_ctx->sq);
>-}
>-
>-static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
>-{
>-	int err = 0;
>-
>-	rtnl_lock();
>-	mutex_lock(&priv->state_lock);
>-
>-	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
>-		goto out;
>-
>-	err = mlx5e_safe_reopen_channels(priv);
>-
>-out:
>-	mutex_unlock(&priv->state_lock);
>-	rtnl_unlock();
>-
>-	return err;
>+	return err_ctx->recover(err_ctx->ctx);
> }
> 
> static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
> 				     void *context)
> {
> 	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
>-	struct mlx5e_tx_err_ctx *err_ctx = context;
>+	struct mlx5e_err_ctx *err_ctx = context;
> 
> 	return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
>-			 mlx5e_tx_reporter_recover_all(priv);
>+			 mlx5e_health_recover_channels(priv);
> }
> 
> static int
>-- 
>1.8.3.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 08/16] net/mlx5e: Extend tx diagnose function
  2019-07-07 11:53 ` [PATCH net-next 08/16] net/mlx5e: Extend tx diagnose function Tariq Toukan
@ 2019-07-08 12:43   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 12:43 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:00PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>The following patches in the set enhance the diagnostics info of tx
>reporter. Therefore, it is better to pass a pointer to the SQ for
>further data extraction.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output
  2019-07-07 11:53 ` [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output Tariq Toukan
@ 2019-07-08 12:55   ` Jiri Pirko
  2019-07-08 13:39   ` Jiri Pirko
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 12:55 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:01PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Enhance tx reporter's diagnostics output to include: information common
>to all SQs: SQ size, SQ stride size.
>In addition add channel ix, cc and pc.
>
>$ devlink health diagnose pci/0000:00:0b.0 reporter tx
>Common config:
>   SQ: stride size: 64 size: 1024
> SQs:
>   channel ix: 0 sqn: 4283 HW state: 1 stopped: false cc: 0 pc: 0
>   channel ix: 1 sqn: 4288 HW state: 1 stopped: false cc: 0 pc: 0
>   channel ix: 2 sqn: 4293 HW state: 1 stopped: false cc: 0 pc: 0
>   channel ix: 3 sqn: 4298 HW state: 1 stopped: false cc: 0 pc: 0
>
>$ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp
>{
>    "Common config": [
>        "SQ": {
>            "stride size": 64,
>            "size": 1024
>        } ],
>    "SQs": [ {
>            "channel ix": 0,
>            "sqn": 4283,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0
>        },{
>            "channel ix": 1,
>            "sqn": 4288,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0
>        },{
>            "channel ix": 2,
>            "sqn": 4293,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0
>        },{
>            "channel ix": 3,
>            "sqn": 4298,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0
>         } ]
>}
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>---
> .../net/ethernet/mellanox/mlx5/core/en/health.c    | 30 ++++++++++++++++
> .../net/ethernet/mellanox/mlx5/core/en/health.h    |  3 ++
> .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 42 ++++++++++++++++++++++
> 3 files changed, 75 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>index 60166e5432ae..0d44b081259f 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>@@ -4,6 +4,36 @@
> #include "health.h"
> #include "lib/eq.h"
> 
>+int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name)
>+{
>+	int err;
>+
>+	err = devlink_fmsg_pair_nest_start(fmsg, name);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_obj_nest_start(fmsg);
>+	if (err)
>+		return err;
>+
>+	return 0;
>+}
>+
>+int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg)
>+{
>+	int err;
>+
>+	err = devlink_fmsg_pair_nest_end(fmsg);

You should end the obj nest first.


>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_obj_nest_end(fmsg);
>+	if (err)
>+		return err;
>+
>+	return 0;
>+}
>+
> int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
> {
> 	struct mlx5_core_dev *mdev = channel->mdev;
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>index 960aa18c425d..0fecad63135c 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>@@ -11,6 +11,9 @@
> void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
> int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
> 
>+int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name);
>+int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg);
>+
> #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
> 
> struct mlx5e_err_ctx {
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>index dd6417930461..c481f7142a12 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>@@ -153,6 +153,10 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
> 	if (err)
> 		return err;
> 
>+	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", sq->channel->ix);

"ix" is an attribute of the channel. I think it should be nested under
"channel" here.


>+	if (err)
>+		return err;
>+
> 	err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sq->sqn);
> 	if (err)
> 		return err;
>@@ -165,6 +169,14 @@ static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
> 	if (err)
> 		return err;
> 
>+	err = devlink_fmsg_u32_pair_put(fmsg, "cc", sq->cc);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "pc", sq->pc);
>+	if (err)
>+		return err;
>+
> 	err = devlink_fmsg_obj_nest_end(fmsg);
> 	if (err)
> 		return err;
>@@ -176,6 +188,9 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
> 				      struct devlink_fmsg *fmsg)
> {
> 	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
>+	struct mlx5e_txqsq *generic_sq = priv->txq2sq[0];
>+	u32 sq_stride, sq_sz;
>+
> 	int i, err = 0;
> 
> 	mutex_lock(&priv->state_lock);
>@@ -183,6 +198,33 @@ static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
> 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
> 		goto unlock;
> 
>+	sq_sz = mlx5_wq_cyc_get_size(&generic_sq->wq);
>+	sq_stride = MLX5_SEND_WQE_BB;
>+
>+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "Common config");
>+	if (err)
>+		goto unlock;
>+
>+	err = mlx5e_reporter_named_obj_nest_start(fmsg, "SQ");
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_u64_pair_put(fmsg, "stride size", sq_stride);
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "size", sq_sz);
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_arr_pair_nest_end(fmsg);

Which nest is this supposed to end? Looks like it is extra and
should not be here...


>+	if (err)
>+		goto unlock;
>+
>+	err = mlx5e_reporter_named_obj_nest_end(fmsg);
>+	if (err)
>+		goto unlock;
>+
> 	err = devlink_fmsg_arr_pair_nest_start(fmsg, "SQs");
> 	if (err)
> 		goto unlock;
>-- 
>1.8.3.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 10/16] net/mlx5e: Add cq info to tx reporter diagnose
  2019-07-07 11:53 ` [PATCH net-next 10/16] net/mlx5e: Add cq info to tx reporter diagnose Tariq Toukan
@ 2019-07-08 13:00   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 13:00 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:02PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Add cq information to general diagnose output: CQ size and stride size.
>Per SQ add information about the related CQ: cqn and CQ's HW status.
>
>$ devlink health diagnose pci/0000:00:0b.0 reporter tx
>Common config:
>   SQ: stride size: 64 size: 1024
>   CQ: stride size: 64 size: 1024
> SQs:
>   channel ix: 0 sqn: 4283 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1030 HW status: 0

The nesting of "cqn" and "HW status" is not visible there. I know it is
comment to iproute2 patch, but still. Should be corrected in this patch
description too.

Other than this, the patch looks good.


>   channel ix: 1 sqn: 4288 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0
>   channel ix: 2 sqn: 4293 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1038 HW status: 0
>   channel ix: 3 sqn: 4298 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1042 HW status: 0
>
>$ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp
>{
>    "Common config": [
>        "SQ": {
>            "stride size": 64,
>            "size": 1024
>        },
>        "CQ": {
>            "stride size": 64,
>            "size": 1024
>        } ],
>    "SQs": [ {
>            "channel ix": 0,
>            "sqn": 4283,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0,
>            "CQ": {
>                "cqn": 1030,
>                "HW status": 0
>            }
>        },{
>            "channel ix": 1,
>            "sqn": 4288,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0,
>            "CQ": {
>                "cqn": 1034,
>                "HW status": 0
>            }
>        },{
>            "channel ix": 2,
>            "sqn": 4293,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0,
>            "CQ": {
>                "cqn": 1038,
>                "HW status": 0
>            }
>        },{
>            "channel ix": 3,
>            "sqn": 4298,
>            "HW state": 1,
>            "stopped": false,
>            "cc": 0,
>            "pc": 0,
>            "CQ": {
>                "cqn": 1042,
>                "HW status": 0
>            }
>        } ]
>}
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

[...]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output
  2019-07-07 11:53 ` [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output Tariq Toukan
  2019-07-08 12:55   ` Jiri Pirko
@ 2019-07-08 13:39   ` Jiri Pirko
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 13:39 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:01PM CEST, tariqt@mellanox.com wrote:

[...]

>+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "Common config");

Are the values in this section "configurable"? If not, I wouldn't call
it "config"

[...]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 11/16] net/mlx5e: Add support to rx reporter diagnose
  2019-07-07 11:53 ` [PATCH net-next 11/16] net/mlx5e: Add support to rx " Tariq Toukan
@ 2019-07-08 14:22   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-08 14:22 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:03PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Add rx reporter, which supports diagnose call-back. Diagnostics output
>include: information common to all RQs: RQ type, RQ size, RQ stride
>size, CQ size and CQ stride size. In addition advertise information per
>RQ and its related icosq and attached CQ.
>
>$ devlink health diagnose pci/0000:00:0b.0 reporter rx
>Common config:
>   RQ: type: 2 stride size: 2048 size: 8
>   CQ: stride size: 64 size: 1024
> RQs:
>   channel ix: 0 rqn: 4284 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0
>   channel ix: 1 rqn: 4289 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0
>   channel ix: 2 rqn: 4294 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0
>   channel ix: 3 rqn: 4299 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0
>
>$ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp
>{
>    "Common config": [
>        "RQ": {
>            "type": 2,
>            "stride size": 2048,
>            "size": 8
>        },
>        "CQ": {
>            "stride size": 64,
>            "size": 1024
>        } ],
>    "RQs": [ {
>            "channel ix": 0,
>            "rqn": 4284,
>            "HW state": 1,
>            "SW state": 3,
>            "posted WQEs": 7,
>            "cc": 7,
>            "ICOSQ HW state": 1,
>            "CQ": {
>                "cqn": 1032,
>                "HW status": 0
>            }
>        },{
>            "channel ix": 1,
>            "rqn": 4289,
>            "HW state": 1,
>            "SW state": 3,
>            "posted WQEs": 7,
>            "cc": 7,
>            "ICOSQ HW state": 1,
>            "CQ": {
>                "cqn": 1036,
>                "HW status": 0
>            }
>        },{
>            "channel ix": 2,
>            "rqn": 4294,
>            "HW state": 1,
>            "SW state": 3,
>            "posted WQEs": 7,
>            "cc": 7,
>            "ICOSQ HW state": 1,
>            "CQ": {
>                "cqn": 1040,
>                "HW status": 0
>            }
>        },{
>            "channel ix": 3,
>            "rqn": 4299,
>            "HW state": 1,
>            "SW state": 3,
>            "posted WQEs": 7,
>            "cc": 7,
>            "ICOSQ HW state": 1,
>            "CQ": {
>                "cqn": 1044,
>                "HW status": 0
>            }
>        } ]
>}
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
> drivers/net/ethernet/mellanox/mlx5/core/en.h       |  21 +++
> .../net/ethernet/mellanox/mlx5/core/en/health.c    |  31 ++++
> .../net/ethernet/mellanox/mlx5/core/en/health.h    |   7 +
> .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 194 +++++++++++++++++++++
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  29 +--
> 6 files changed, 258 insertions(+), 28 deletions(-)
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>index 23d566a45a30..a3b9659649a8 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>@@ -24,8 +24,8 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
> mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
> 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
> 		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
>-		en/reporter_tx.o en/params.o en/xsk/umem.o en/xsk/setup.o \
>-		en/xsk/rx.o en/xsk/tx.o
>+		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/umem.o \
>+		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o
> 
> #
> # Netdev extra
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>index 263558875f20..f7c5cf7a7064 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>@@ -848,6 +848,7 @@ struct mlx5e_priv {
> 	struct mlx5e_tls          *tls;
> #endif
> 	struct devlink_health_reporter *tx_reporter;
>+	struct devlink_health_reporter *rx_reporter;
> 	struct mlx5e_xsk           xsk;
> };
> 
>@@ -888,6 +889,26 @@ netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
> void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq);
> 
>+static inline u32 mlx5e_rqwq_get_size(struct mlx5e_rq *rq)
>+{
>+	switch (rq->wq_type) {
>+	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>+		return mlx5_wq_ll_get_size(&rq->mpwqe.wq);
>+	default:
>+		return mlx5_wq_cyc_get_size(&rq->wqe.wq);
>+	}
>+}
>+
>+static inline u32 mlx5e_rqwq_get_cur_sz(struct mlx5e_rq *rq)
>+{
>+	switch (rq->wq_type) {
>+	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>+		return rq->mpwqe.wq.cur_sz;
>+	default:
>+		return rq->wqe.wq.cur_sz;
>+	}
>+}
>+
> bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev);
> bool mlx5e_striding_rq_possible(struct mlx5_core_dev *mdev,
> 				struct mlx5e_params *params);
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>index a266717d41e5..a0579de8e2e0 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.c
>@@ -96,6 +96,37 @@ int mlx5e_reporter_cq_common_diagnose(struct mlx5e_cq *cq, struct devlink_fmsg *
> 	return 0;
> }
> 
>+int mlx5e_health_create_reporters(struct mlx5e_priv *priv)
>+{
>+	int err;
>+
>+	err = mlx5e_reporter_tx_create(priv);
>+	if (err)
>+		return err;
>+
>+	err = mlx5e_reporter_rx_create(priv);
>+	if (err)
>+		return err;
>+
>+	return 0;
>+}
>+
>+void mlx5e_health_destroy_reporters(struct mlx5e_priv *priv)
>+{
>+	mlx5e_reporter_rx_destroy(priv);
>+	mlx5e_reporter_tx_destroy(priv);
>+}
>+
>+void mlx5e_health_channels_update(struct mlx5e_priv *priv)
>+{
>+	if (priv->tx_reporter)
>+		devlink_health_reporter_state_update(priv->tx_reporter,
>+						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
>+	if (priv->rx_reporter)
>+		devlink_health_reporter_state_update(priv->rx_reporter,
>+						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
>+}

Could you do this introduction of mlx5e_health_create_reporters(),
mlx5e_health_destroy_reporters(), and mlx5e_health_channels_update()
in a separate patch before this one?


>+
> int mlx5e_health_sq_to_ready(struct mlx5e_channel *channel, u32 sqn)
> {
> 	struct mlx5_core_dev *mdev = channel->mdev;
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>index b0c8bda3d25f..7829ea229914 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>@@ -16,6 +16,9 @@
> int mlx5e_reporter_named_obj_nest_start(struct devlink_fmsg *fmsg, char *name);
> int mlx5e_reporter_named_obj_nest_end(struct devlink_fmsg *fmsg);
> 
>+int mlx5e_reporter_rx_create(struct mlx5e_priv *priv);
>+void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
>+
> #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
> 
> struct mlx5e_err_ctx {
>@@ -29,5 +32,9 @@ struct mlx5e_err_ctx {
> int mlx5e_health_report(struct mlx5e_priv *priv,
> 			struct devlink_health_reporter *reporter, char *err_str,
> 			struct mlx5e_err_ctx *err_ctx);
>+int mlx5e_health_create_reporters(struct mlx5e_priv *priv);
>+void mlx5e_health_destroy_reporters(struct mlx5e_priv *priv);
>+void mlx5e_health_channels_update(struct mlx5e_priv *priv);
>+
> 
> #endif
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>new file mode 100644
>index 000000000000..b24bdff473b0
>--- /dev/null
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>@@ -0,0 +1,194 @@
>+// SPDX-License-Identifier: GPL-2.0
>+// Copyright (c) 2019 Mellanox Technologies.
>+
>+#include "health.h"
>+#include "params.h"
>+
>+static int mlx5e_query_rq_state(struct mlx5_core_dev *dev, u32 rqn, u8 *state)
>+{
>+	int outlen = MLX5_ST_SZ_BYTES(query_rq_out);
>+	void *out;
>+	void *rqc;
>+	int err;
>+
>+	out = kvzalloc(outlen, GFP_KERNEL);
>+	if (!out)
>+		return -ENOMEM;
>+
>+	err = mlx5_core_query_rq(dev, rqn, out);
>+	if (err)
>+		goto out;
>+
>+	rqc = MLX5_ADDR_OF(query_rq_out, out, rq_context);
>+	*state = MLX5_GET(rqc, rqc, state);
>+
>+out:
>+	kvfree(out);
>+	return err;
>+}
>+
>+static int mlx5e_rx_reporter_build_diagnose_output(struct mlx5e_rq *rq,
>+						   struct devlink_fmsg *fmsg)
>+{
>+	struct mlx5e_priv *priv = rq->channel->priv;
>+	struct mlx5e_params *params = &priv->channels.params;
>+	struct mlx5e_icosq *icosq = &rq->channel->icosq;
>+	u8 icosq_hw_state;
>+	int wqes_sz;
>+	u8 hw_state;
>+	u16 wq_head;
>+	int err;
>+
>+	err = mlx5e_query_rq_state(priv->mdev, rq->rqn, &hw_state);
>+	if (err)
>+		return err;
>+
>+	err = mlx5_core_query_sq_state(priv->mdev, icosq->sqn, &icosq_hw_state);
>+	if (err)
>+		return err;
>+
>+	wqes_sz = mlx5e_rqwq_get_cur_sz(rq);
>+	wq_head = params->rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ ?
>+		  rq->mpwqe.wq.head : mlx5_wq_cyc_get_head(&rq->wqe.wq);
>+
>+	err = devlink_fmsg_obj_nest_start(fmsg);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "channel ix", rq->channel->ix);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "rqn", rq->rqn);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u8_pair_put(fmsg, "HW state", hw_state);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u8_pair_put(fmsg, "SW state", rq->state);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "posted WQEs", wqes_sz);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "cc", wq_head);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_u8_pair_put(fmsg, "ICOSQ HW state", icosq_hw_state);
>+	if (err)
>+		return err;
>+
>+	err = mlx5e_reporter_cq_diagnose(&rq->cq, fmsg);
>+	if (err)
>+		return err;
>+
>+	err = devlink_fmsg_obj_nest_end(fmsg);
>+	if (err)
>+		return err;
>+
>+	return 0;
>+}
>+
>+static int mlx5e_rx_reporter_diagnose(struct devlink_health_reporter *reporter,
>+				      struct devlink_fmsg *fmsg)
>+{
>+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
>+	struct mlx5e_params *params = &priv->channels.params;
>+	struct mlx5e_rq *generic_rq;
>+	u32 rq_stride, rq_sz;
>+	int i, err = 0;
>+
>+	mutex_lock(&priv->state_lock);
>+
>+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
>+		goto unlock;
>+
>+	generic_rq = &priv->channels.c[0]->rq;
>+	rq_sz = mlx5e_rqwq_get_size(generic_rq);
>+	rq_stride = BIT(mlx5e_mpwqe_get_log_stride_size(priv->mdev, params, NULL));
>+
>+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "Common config");
>+	if (err)
>+		goto unlock;
>+
>+	err = mlx5e_reporter_named_obj_nest_start(fmsg, "RQ");
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_u8_pair_put(fmsg, "type", params->rq_wq_type);
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_u64_pair_put(fmsg, "stride size", rq_stride);
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_u32_pair_put(fmsg, "size", rq_sz);
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_arr_pair_nest_end(fmsg);

This is odd. I think that you should have
mlx5e_reporter_named_obj_nest_end() called here and
devlink_fmsg_arr_pair_nest_end() after
mlx5e_reporter_cq_common_diagnose()

Misuse of this seems to be a pattern. We need some checker for this
apparently.


>+	if (err)
>+		goto unlock;
>+
>+	err = mlx5e_reporter_cq_common_diagnose(&generic_rq->cq, fmsg);
>+	if (err)
>+		goto unlock;
>+
>+	err = mlx5e_reporter_named_obj_nest_end(fmsg);
>+	if (err)
>+		goto unlock;
>+
>+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "RQs");
>+	if (err)
>+		goto unlock;
>+
>+	for (i = 0; i < priv->channels.num; i++) {
>+		struct mlx5e_rq *rq = &priv->channels.c[i]->rq;
>+
>+		err = mlx5e_rx_reporter_build_diagnose_output(rq, fmsg);
>+		if (err)
>+			goto unlock;
>+	}
>+	err = devlink_fmsg_arr_pair_nest_end(fmsg);
>+	if (err)
>+		goto unlock;
>+unlock:
>+	mutex_unlock(&priv->state_lock);
>+	return err;
>+}
>+
>+static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
>+		.name = "rx",
>+		.diagnose = mlx5e_rx_reporter_diagnose,
>+};
>+
>+int mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
>+{
>+	struct devlink_health_reporter *reporter;
>+	struct mlx5_core_dev *mdev = priv->mdev;
>+	struct devlink *devlink = priv_to_devlink(mdev);
>+
>+	reporter =
>+		devlink_health_reporter_create(devlink, &mlx5_rx_reporter_ops,
>+					       0, false, priv);

Rather align like this:

	reporter = devlink_health_reporter_create(devlink,
						  &mlx5_rx_reporter_ops,
						  0, false, priv);


>+	if (IS_ERR(reporter))
>+		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
>+			    PTR_ERR(reporter));
>+	else
>+		priv->tx_reporter = reporter;
>+	return PTR_ERR_OR_ZERO(reporter);

Change the flow to:

	if (IS_ERR(reporter)) {
		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
			    PTR_ERR(reporter));
		return PTR_ERR(reporter);
	}
	priv->tx_reporter = reporter;
	return 0;


>+}
>+
>+void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv)
>+{
>+	if (!priv->rx_reporter)
>+		return;
>+
>+	devlink_health_reporter_destroy(priv->rx_reporter);
>+}
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>index 98c925e72706..3922905e909f 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>@@ -247,26 +247,6 @@ static inline void mlx5e_build_umr_wqe(struct mlx5e_rq *rq,
> 	ucseg->mkey_mask     = cpu_to_be64(MLX5_MKEY_MASK_FREE);
> }
> 
>-static u32 mlx5e_rqwq_get_size(struct mlx5e_rq *rq)
>-{
>-	switch (rq->wq_type) {
>-	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>-		return mlx5_wq_ll_get_size(&rq->mpwqe.wq);
>-	default:
>-		return mlx5_wq_cyc_get_size(&rq->wqe.wq);
>-	}
>-}
>-
>-static u32 mlx5e_rqwq_get_cur_sz(struct mlx5e_rq *rq)
>-{
>-	switch (rq->wq_type) {
>-	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>-		return rq->mpwqe.wq.cur_sz;
>-	default:
>-		return rq->wqe.wq.cur_sz;
>-	}
>-}
>-
> static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq,
> 				     struct mlx5e_channel *c)
> {
>@@ -2320,10 +2300,7 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
> 			goto err_close_channels;
> 	}
> 
>-	if (priv->tx_reporter)
>-		devlink_health_reporter_state_update(priv->tx_reporter,
>-						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
>-
>+	mlx5e_health_channels_update(priv);
> 	kvfree(cparam);
> 	return 0;
> 
>@@ -3201,7 +3178,6 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
> {
> 	int tc;
> 
>-	mlx5e_reporter_tx_destroy(priv);
> 	for (tc = 0; tc < priv->profile->max_tc; tc++)
> 		mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
> }
>@@ -4985,12 +4961,14 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
> 		mlx5_core_err(mdev, "TLS initialization failed, %d\n", err);
> 	mlx5e_build_nic_netdev(netdev);
> 	mlx5e_build_tc2txq_maps(priv);
>+	mlx5e_health_create_reporters(priv);
> 
> 	return 0;
> }
> 
> static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
> {
>+	mlx5e_health_destroy_reporters(priv);
> 	mlx5e_tls_cleanup(priv);
> 	mlx5e_ipsec_cleanup(priv);
> 	mlx5e_netdev_cleanup(priv->netdev, priv);
>@@ -5093,7 +5071,6 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
> #ifdef CONFIG_MLX5_CORE_EN_DCB
> 	mlx5e_dcbnl_initialize(priv);
> #endif
>-	mlx5e_reporter_tx_create(priv);
> 	return 0;
> }
> 
>-- 
>1.8.3.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages
  2019-07-07 11:53 ` [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages Tariq Toukan
@ 2019-07-09  7:23   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-09  7:23 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:04PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
>open and activate where open handles creation and activate enables the
>queue. Do a symmetric thing in close flow: split into close and
>deactivate.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 13/16] net/mlx5e: Recover from CQE error on ICOSQ
  2019-07-07 11:53 ` [PATCH net-next 13/16] net/mlx5e: Recover from CQE error on ICOSQ Tariq Toukan
@ 2019-07-09 15:30   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-09 15:30 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:05PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Add support for recovery from error on completion on ICOSQ. Deactivate
>RQ and flush, then deactivate ICOSQ. Set the queue back to ready state
>(firmware) and reset the ICOSQ and the RQ (software resources). Finally,
>activate the ICOSQ and the RQ.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 14/16] net/mlx5e: Recover from rx timeout
  2019-07-07 11:53 ` [PATCH net-next 14/16] net/mlx5e: Recover from rx timeout Tariq Toukan
@ 2019-07-09 15:32   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-09 15:32 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:06PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Add support for recovery from rx timeout. On driver open we post NOP
>work request on the rx channels to trigger napi in order to fillup the
>rx rings. In case napi wasn't scheduled due to a lost interrupt, perform
>EQ recovery.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>---
> .../net/ethernet/mellanox/mlx5/core/en/health.h    |  1 +
> .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   | 30 ++++++++++++++++++++++
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  1 +
> 3 files changed, 32 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>index e8c5d3bd86f1..aa46f7ecae53 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
>@@ -19,6 +19,7 @@
> int mlx5e_reporter_rx_create(struct mlx5e_priv *priv);
> void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv);
> void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq);
>+void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq);
> 
> #define MLX5E_REPORTER_PER_Q_MAX_LEN 256
> 
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>index c47e9a53bd53..7e7dba129330 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>@@ -109,6 +109,36 @@ void mlx5e_reporter_icosq_cqe_err(struct mlx5e_icosq *icosq)
> 	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
> }
> 
>+static int mlx5e_rx_reporter_timeout_recover(void *ctx)
>+{
>+	struct mlx5e_rq *rq = (struct mlx5e_rq *)ctx;

No need to cast. Please fix this in the rest of the patchset too.


>+	struct mlx5e_icosq *icosq = &rq->channel->icosq;
>+	struct mlx5_eq_comp *eq = rq->cq.mcq.eq;
>+	int err;
>+
>+	err = mlx5e_health_channel_eq_recover(eq, rq->channel);
>+	if (err)
>+		clear_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
>+
>+	return err;
>+}
>+
>+void mlx5e_reporter_rx_timeout(struct mlx5e_rq *rq)
>+{
>+	struct mlx5e_icosq *icosq = &rq->channel->icosq;
>+	struct mlx5e_priv *priv = rq->channel->priv;
>+	char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
>+	struct mlx5e_err_ctx err_ctx = {};
>+
>+	err_ctx.ctx = rq;
>+	err_ctx.recover = mlx5e_rx_reporter_timeout_recover;
>+	sprintf(err_str,
>+		"RX timeout on channel: %d, ICOSQ: 0x%x RQ: 0x%x, CQ: 0x%x\n",
>+		icosq->channel->ix, icosq->sqn, rq->rqn, rq->cq.mcq.cqn);
>+
>+	mlx5e_health_report(priv, priv->rx_reporter, err_str, &err_ctx);
>+}
>+
> static int mlx5e_rx_reporter_recover_from_ctx(struct mlx5e_err_ctx *err_ctx)
> {
> 	return err_ctx->recover(err_ctx->ctx);
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>index 2d57611ac579..1ebdeccf395d 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>@@ -809,6 +809,7 @@ int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
> 	netdev_warn(c->netdev, "Failed to get min RX wqes on Channel[%d] RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n",
> 		    c->ix, rq->rqn, mlx5e_rqwq_get_cur_sz(rq), min_wqes);
> 
>+	mlx5e_reporter_rx_timeout(rq);
> 	return -ETIMEDOUT;
> }
> 
>-- 
>1.8.3.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 15/16] net/mlx5e: RX, Handle CQE with error at the earliest stage
  2019-07-07 11:53 ` [PATCH net-next 15/16] net/mlx5e: RX, Handle CQE with error at the earliest stage Tariq Toukan
@ 2019-07-09 15:34   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-07-09 15:34 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe

Sun, Jul 07, 2019 at 01:53:07PM CEST, tariqt@mellanox.com wrote:
>From: Saeed Mahameed <saeedm@mellanox.com>
>
>Just to be aligned with the MPWQE handlers, handle RX WQE with error
>for legacy RQs in the top RX handlers, just before calling skb_from_cqe().
>
>CQE error handling will now be called at the same stage regardless of
>the RQ type or netdev mode NIC, Representor, IPoIB, etc ..
>
>This will be useful for down stream patches to improve error CQE

I see only one patch left in this set.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-07-09 15:35 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-07 11:52 [PATCH net-next 00/16] mlx5e devlink health reporters Tariq Toukan
2019-07-07 11:52 ` [PATCH net-next 01/16] Revert "net/mlx5e: Fix mlx5e_tx_reporter_create return value" Tariq Toukan
2019-07-08 11:03   ` Jiri Pirko
2019-07-07 11:52 ` [PATCH net-next 02/16] net/mlx5e: Fix error flow in tx reporter diagnose Tariq Toukan
2019-07-08 11:09   ` Jiri Pirko
2019-07-07 11:52 ` [PATCH net-next 03/16] net/mlx5e: Set tx reporter only on successful creation Tariq Toukan
2019-07-07 11:52 ` [PATCH net-next 04/16] net/mlx5e: TX reporter cleanup Tariq Toukan
2019-07-07 11:52 ` [PATCH net-next 05/16] net/mlx5e: Rename reporter header file Tariq Toukan
2019-07-08 11:11   ` Jiri Pirko
2019-07-07 11:52 ` [PATCH net-next 06/16] net/mlx5e: Change naming convention for reporter's functions Tariq Toukan
2019-07-08 11:29   ` Jiri Pirko
2019-07-07 11:52 ` [PATCH net-next 07/16] net/mlx5e: Generalize tx reporter's functionality Tariq Toukan
2019-07-08 12:42   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 08/16] net/mlx5e: Extend tx diagnose function Tariq Toukan
2019-07-08 12:43   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 09/16] net/mlx5e: Extend tx reporter diagnostics output Tariq Toukan
2019-07-08 12:55   ` Jiri Pirko
2019-07-08 13:39   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 10/16] net/mlx5e: Add cq info to tx reporter diagnose Tariq Toukan
2019-07-08 13:00   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 11/16] net/mlx5e: Add support to rx " Tariq Toukan
2019-07-08 14:22   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages Tariq Toukan
2019-07-09  7:23   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 13/16] net/mlx5e: Recover from CQE error on ICOSQ Tariq Toukan
2019-07-09 15:30   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 14/16] net/mlx5e: Recover from rx timeout Tariq Toukan
2019-07-09 15:32   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 15/16] net/mlx5e: RX, Handle CQE with error at the earliest stage Tariq Toukan
2019-07-09 15:34   ` Jiri Pirko
2019-07-07 11:53 ` [PATCH net-next 16/16] net/mlx5e: Recover from CQE with error on RQ Tariq Toukan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.