All of lore.kernel.org
 help / color / mirror / Atom feed
* [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
@ 2018-08-01 21:52 Saeed Mahameed
  2018-08-01 21:52 ` [net-next 01/10] devlink: Fix param set handling for string type Saeed Mahameed
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Saeed Mahameed

Hi Dave,

This series provides devlink parameters updates to both devlink API and
mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
Devlink" to address review comments [1].

Changes from the original series:
- According to the discussion outcome, we are keeping the congestion control
  setting as mlx5 device specific for the current HW generation.
- Changed the congestion_mode and congestion action param type to string
- Added patches to fix devlink handling of param type string
- Added a patch which adds extack messages support for param set.
- At the end of this series, I've added yet another mlx5 devlink related
 feature, firmware snapshot support.

For more information please see tag log below.

Please pull and let me know if there's any problem.

[1] https://patchwork.ozlabs.org/patch/945996/

Thanks,
Saeed.

---

The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:

  net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01

for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:

  net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)

----------------------------------------------------------------
mlx5-updates-2018-08-01

This series provides devlink parameters updates to both devlink API and
mlx5 driver,

1) Devlink changes: (Moshe Shemesh)
The first two patches fix devlink param infrastructure for string type
params.
The third patch adds a devlink helper function to safely copy string from
driver to devlink.
The forth patch adds extack support for param set.

2) mlx5 specific congestion parameters: (Eran Ben Elisha)
Next three patches add new devlink driver specific params for controlling
congestion action and mode, using string type params and extack messages support.

This congestion mode enables hw workaround in specific devices which is
controlled by devlink driver-specific params. The workaround is device
specific for this NIC generation, the next NIC will not need it.

Congestion parameters:
 - Congestion action
            HW W/A mechanism in the PCIe buffer which monitors the amount of
            consumed PCIe buffer per host.  This mechanism supports the
            following actions in case of threshold overflow:
            - Disabled - NOP (Default)
            - Drop
            - Mark - Mark CE bit in the CQE of received packet
    - Congestion mode
            - Aggressive - Aggressive static trigger threshold (Default)
            - Dynamic - Dynamically change the trigger threshold

3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
Last three patches, add the support for capturing region snapshot of the
firmware crspace during critical errors, using devlink region_snapshot
parameter.

-Saeed.

----------------------------------------------------------------
Alex Vesker (3):
      net/mlx5: Add Vendor Specific Capability access gateway
      net/mlx5: Add Crdump FW snapshot support
      net/mlx5: Use devlink region_snapshot parameter

Eran Ben Elisha (3):
      net/mlx5: Move all devlink related functions calls to devlink.c
      net/mlx5: Add MPEGC register configuration functionality
      net/mlx5: Enable PCIe buffer congestion handling workaround via devlink

Moshe Shemesh (4):
      devlink: Fix param set handling for string type
      devlink: Fix param cmode driverinit for string type
      devlink: Add helper function for safely copy string param
      devlink: Add extack messages support to param set

 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
 .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
 include/linux/mlx5/driver.h                        |   5 +
 include/net/devlink.h                              |  15 +-
 net/core/devlink.c                                 |  44 ++-
 14 files changed, 1076 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [net-next 01/10] devlink: Fix param set handling for string type
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 22:33   ` Jakub Kicinski
  2018-08-01 21:52 ` [net-next 02/10] devlink: Fix param cmode driverinit " Saeed Mahameed
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Moshe Shemesh, Saeed Mahameed

From: Moshe Shemesh <moshe@mellanox.com>

In case devlink param type is string, it needs to copy the string value
it got from the input to devlink_param_value.

Fixes: e3b7ca18ad7b ("devlink: Add param set command")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/net/devlink.h | 2 +-
 net/core/devlink.c    | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9b89d6604d4..b0e17c025fdc 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -311,7 +311,7 @@ union devlink_param_value {
 	u8 vu8;
 	u16 vu16;
 	u32 vu32;
-	const char *vstr;
+	char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
 	bool vbool;
 };
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 65fc366a78a4..61e126c28526 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3014,7 +3014,8 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
 		if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
 		    DEVLINK_PARAM_MAX_STRING_VALUE)
 			return -EINVAL;
-		value->vstr = nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
+		strcpy(value->vstr,
+		       nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
 		break;
 	case DEVLINK_PARAM_TYPE_BOOL:
 		value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 02/10] devlink: Fix param cmode driverinit for string type
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
  2018-08-01 21:52 ` [net-next 01/10] devlink: Fix param set handling for string type Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 03/10] devlink: Add helper function for safely copy string param Saeed Mahameed
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Moshe Shemesh, Saeed Mahameed

From: Moshe Shemesh <moshe@mellanox.com>

Driverinit configuration mode value is held by devlink to enable the
driver fetch the value after reload command. In case the param type is
string devlink should copy the value from driver string buffer to
devlink string buffer on devlink_param_driverinit_value_set() and
vice-versa on devlink_param_driverinit_value_get().

Fixes: ec01aeb1803e ("devlink: Add support for get/set driverinit value")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 net/core/devlink.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 61e126c28526..99ad53e8297f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3102,7 +3102,10 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb,
 		return -EOPNOTSUPP;
 
 	if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) {
-		param_item->driverinit_value = value;
+		if (param->type == DEVLINK_PARAM_TYPE_STRING)
+			strcpy(param_item->driverinit_value.vstr, value.vstr);
+		else
+			param_item->driverinit_value = value;
 		param_item->driverinit_value_valid = true;
 	} else {
 		if (!param->set)
@@ -4542,7 +4545,10 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
 					      DEVLINK_PARAM_CMODE_DRIVERINIT))
 		return -EOPNOTSUPP;
 
-	*init_val = param_item->driverinit_value;
+	if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING)
+		strcpy(init_val->vstr, param_item->driverinit_value.vstr);
+	else
+		*init_val = param_item->driverinit_value;
 
 	return 0;
 }
@@ -4573,7 +4579,10 @@ int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
 					      DEVLINK_PARAM_CMODE_DRIVERINIT))
 		return -EOPNOTSUPP;
 
-	param_item->driverinit_value = init_val;
+	if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING)
+		strcpy(param_item->driverinit_value.vstr, init_val.vstr);
+	else
+		param_item->driverinit_value = init_val;
 	param_item->driverinit_value_valid = true;
 
 	devlink_param_notify(devlink, param_item, DEVLINK_CMD_PARAM_NEW);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 03/10] devlink: Add helper function for safely copy string param
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
  2018-08-01 21:52 ` [net-next 01/10] devlink: Fix param set handling for string type Saeed Mahameed
  2018-08-01 21:52 ` [net-next 02/10] devlink: Fix param cmode driverinit " Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 04/10] devlink: Add extack messages support to param set Saeed Mahameed
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Moshe Shemesh, Saeed Mahameed

From: Moshe Shemesh <moshe@mellanox.com>

Devlink string param buffer is allocated at the size of
DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure
this size is not exceeded.
Renamed DEVLINK_PARAM_MAX_STRING_VALUE to
__DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by
devlink only. The driver should use the helper function instead to
verify it doesn't exceed the allowed length.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/net/devlink.h | 12 ++++++++++--
 net/core/devlink.c    | 19 ++++++++++++++++++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b0e17c025fdc..99efc156a309 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -298,7 +298,7 @@ struct devlink_resource {
 
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
 
-#define DEVLINK_PARAM_MAX_STRING_VALUE 32
+#define __DEVLINK_PARAM_MAX_STRING_VALUE 32
 enum devlink_param_type {
 	DEVLINK_PARAM_TYPE_U8,
 	DEVLINK_PARAM_TYPE_U16,
@@ -311,7 +311,7 @@ union devlink_param_value {
 	u8 vu8;
 	u16 vu16;
 	u32 vu32;
-	char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
+	char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
 	bool vbool;
 };
 
@@ -553,6 +553,8 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
 int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
 				       union devlink_param_value init_val);
 void devlink_param_value_changed(struct devlink *devlink, u32 param_id);
+void devlink_param_value_str_fill(union devlink_param_value *dst_val,
+				  const char *src);
 struct devlink_region *devlink_region_create(struct devlink *devlink,
 					     const char *region_name,
 					     u32 region_max_snapshots,
@@ -789,6 +791,12 @@ devlink_param_value_changed(struct devlink *devlink, u32 param_id)
 {
 }
 
+static inline void
+devlink_param_value_str_fill(union devlink_param_value *dst_val,
+			     const char *src)
+{
+}
+
 static inline struct devlink_region *
 devlink_region_create(struct devlink *devlink,
 		      const char *region_name,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 99ad53e8297f..367564099357 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3012,7 +3012,7 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
 		break;
 	case DEVLINK_PARAM_TYPE_STRING:
 		if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
-		    DEVLINK_PARAM_MAX_STRING_VALUE)
+		    __DEVLINK_PARAM_MAX_STRING_VALUE)
 			return -EINVAL;
 		strcpy(value->vstr,
 		       nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
@@ -4614,6 +4614,23 @@ void devlink_param_value_changed(struct devlink *devlink, u32 param_id)
 }
 EXPORT_SYMBOL_GPL(devlink_param_value_changed);
 
+/**
+ *	devlink_param_value_str_fill - Safely fill-up the string preventing
+ *				       from overflow of the preallocated buffer
+ *
+ *	@dst_val: destination devlink_param_value
+ *	@src: source buffer
+ */
+void devlink_param_value_str_fill(union devlink_param_value *dst_val,
+				  const char *src)
+{
+	size_t len;
+
+	len = strlcpy(dst_val->vstr, src, __DEVLINK_PARAM_MAX_STRING_VALUE);
+	WARN_ON(len >= __DEVLINK_PARAM_MAX_STRING_VALUE);
+}
+EXPORT_SYMBOL_GPL(devlink_param_value_str_fill);
+
 /**
  *	devlink_region_create - create a new address region
  *
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 04/10] devlink: Add extack messages support to param set
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 03/10] devlink: Add helper function for safely copy string param Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c Saeed Mahameed
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Moshe Shemesh, Saeed Mahameed

From: Moshe Shemesh <moshe@mellanox.com>

Add param extack messages support to param->set() function.
This will enable providing clear reasoning to user on setting new value
failure. Note that param->validate() already supports extack messages,
but it is not enough as input validation can pass or even not needed and
still setting new value may fail for some reason, such as device state,
FW support, etc.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 ++-
 drivers/net/ethernet/mellanox/mlx4/main.c         | 6 ++++--
 include/net/devlink.h                             | 3 ++-
 net/core/devlink.c                                | 7 ++++---
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 7bd96ab4f7c5..97b987f31881 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -91,7 +91,8 @@ static int bnxt_dl_nvm_param_get(struct devlink *dl, u32 id,
 }
 
 static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
-				 struct devlink_param_gset_ctx *ctx)
+				 struct devlink_param_gset_ctx *ctx,
+				 struct netlink_ext_ack *extack)
 {
 	struct hwrm_nvm_set_variable_input req = {0};
 	struct bnxt *bp = bnxt_get_bp_from_dl(dl);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index d2d59444f562..629a5c2a4971 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -186,7 +186,8 @@ static int mlx4_devlink_ierr_reset_get(struct devlink *devlink, u32 id,
 }
 
 static int mlx4_devlink_ierr_reset_set(struct devlink *devlink, u32 id,
-				       struct devlink_param_gset_ctx *ctx)
+				       struct devlink_param_gset_ctx *ctx,
+				       struct netlink_ext_ack *extack)
 {
 	mlx4_internal_err_reset = ctx->val.vbool;
 	return 0;
@@ -203,7 +204,8 @@ static int mlx4_devlink_crdump_snapshot_get(struct devlink *devlink, u32 id,
 }
 
 static int mlx4_devlink_crdump_snapshot_set(struct devlink *devlink, u32 id,
-					    struct devlink_param_gset_ctx *ctx)
+					    struct devlink_param_gset_ctx *ctx,
+					    struct netlink_ext_ack *extack)
 {
 	struct mlx4_priv *priv = devlink_priv(devlink);
 	struct mlx4_dev *dev = &priv->dev;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 99efc156a309..84294f4c2580 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -344,7 +344,8 @@ struct devlink_param {
 	int (*get)(struct devlink *devlink, u32 id,
 		   struct devlink_param_gset_ctx *ctx);
 	int (*set)(struct devlink *devlink, u32 id,
-		   struct devlink_param_gset_ctx *ctx);
+		   struct devlink_param_gset_ctx *ctx,
+		   struct netlink_ext_ack *extack);
 	int (*validate)(struct devlink *devlink, u32 id,
 			union devlink_param_value val,
 			struct netlink_ext_ack *extack);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 367564099357..54ba7e55b1de 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2746,11 +2746,12 @@ static int devlink_param_get(struct devlink *devlink,
 
 static int devlink_param_set(struct devlink *devlink,
 			     const struct devlink_param *param,
-			     struct devlink_param_gset_ctx *ctx)
+			     struct devlink_param_gset_ctx *ctx,
+			     struct netlink_ext_ack *extack)
 {
 	if (!param->set)
 		return -EOPNOTSUPP;
-	return param->set(devlink, param->id, ctx);
+	return param->set(devlink, param->id, ctx, extack);
 }
 
 static int
@@ -3112,7 +3113,7 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb,
 			return -EOPNOTSUPP;
 		ctx.val = value;
 		ctx.cmode = cmode;
-		err = devlink_param_set(devlink, param, &ctx);
+		err = devlink_param_set(devlink, param, &ctx, info->extack);
 		if (err)
 			return err;
 	}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 04/10] devlink: Add extack messages support to param set Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 06/10] net/mlx5: Add MPEGC register configuration functionality Saeed Mahameed
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Eran Ben Elisha, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

Centralize all devlink related callbacks in one file.
In the downstream patch, some more functionality will be added, this
patch is preparing the driver infrastructure for it.

Currently, move devlink un/register functions calls into this file.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 14 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  5 +++--
 4 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index f20fda1ced4f..0be5bf93f33b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,7 +6,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
-		diag/fs_tracepoint.o diag/fw_tracer.o
+		diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
new file mode 100644
index 000000000000..2daf686bcc98
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#include <devlink.h>
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
+{
+	return devlink_register(devlink, dev);
+}
+
+void mlx5_devlink_unregister(struct devlink *devlink)
+{
+	devlink_unregister(devlink);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
new file mode 100644
index 000000000000..455bfa4e89c0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#ifndef __MLX5_DEVLINK_H__
+#define __MLX5_DEVLINK_H__
+
+#include <net/devlink.h>
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
+void mlx5_devlink_unregister(struct devlink *devlink);
+
+#endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 03b9c6733eed..7a1ddf96f065 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -56,6 +56,7 @@
 #include "fs_core.h"
 #include "lib/mpfs.h"
 #include "eswitch.h"
+#include "devlink.h"
 #include "lib/mlx5.h"
 #include "fpga/core.h"
 #include "fpga/ipsec.h"
@@ -1445,7 +1446,7 @@ static int init_one(struct pci_dev *pdev,
 
 	request_module_nowait(MLX5_IB_MOD);
 
-	err = devlink_register(devlink, &pdev->dev);
+	err = mlx5_devlink_register(devlink, &pdev->dev);
 	if (err)
 		goto clean_load;
 
@@ -1475,7 +1476,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 
-	devlink_unregister(devlink);
+	mlx5_devlink_unregister(devlink);
 	mlx5_unregister_device(dev);
 
 	if (mlx5_unload_one(dev, priv, true)) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 06/10] net/mlx5: Add MPEGC register configuration functionality
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink Saeed Mahameed
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Eran Ben Elisha, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

MPEGC register is used to configure and access the PCIe general
configuration.

Expose set/get for TX lossy overflow and TX overflow sense which use the
MPEGC register. These will be used in a downstream patch via devlink
params.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 123 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |   1 +
 2 files changed, 124 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 2daf686bcc98..e3a5de6b4ee7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -3,6 +3,129 @@
 
 #include <devlink.h>
 
+enum {
+	MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN = BIT(0),
+	MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_SENSE = BIT(3),
+	MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CQE = BIT(4),
+	MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CNP = BIT(5),
+};
+
+enum {
+	MLX5_DEVLINK_CONGESTION_ACTION_DISABLED,
+	MLX5_DEVLINK_CONGESTION_ACTION_DROP,
+	MLX5_DEVLINK_CONGESTION_ACTION_MARK,
+	__MLX5_DEVLINK_CONGESTION_ACTION_MAX,
+	MLX5_DEVLINK_CONGESTION_ACTION_MAX = __MLX5_DEVLINK_CONGESTION_ACTION_MAX - 1,
+};
+
+enum {
+	MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE,
+	MLX5_DEVLINK_CONGESTION_MODE_DYNAMIC_ADJUSTMENT,
+	__MLX5_DEVLINK_CONGESTION_MODE_MAX,
+	MLX5_DEVLINK_CONGESTION_MODE_MAX = __MLX5_DEVLINK_CONGESTION_MODE_MAX - 1,
+};
+
+static int mlx5_devlink_set_mpegc(struct mlx5_core_dev *mdev, u32 *in,
+				  int size_in)
+{
+	u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+
+	if (!MLX5_CAP_MCAM_REG(mdev, mpegc))
+		return -EOPNOTSUPP;
+
+	return mlx5_core_access_reg(mdev, in, size_in, out,
+				    sizeof(out), MLX5_REG_MPEGC, 0, 1);
+}
+
+static int mlx5_devlink_set_tx_lossy_overflow(struct mlx5_core_dev *mdev,
+					      u8 tx_lossy_overflow)
+{
+	u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+	u8 field_select = 0;
+
+	if (tx_lossy_overflow == MLX5_DEVLINK_CONGESTION_ACTION_MARK) {
+		if (MLX5_CAP_MCAM_FEATURE(mdev, mark_tx_action_cqe))
+			field_select |=
+				MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CQE;
+
+		if (MLX5_CAP_MCAM_FEATURE(mdev, mark_tx_action_cnp))
+			field_select |=
+				MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CNP;
+
+		if (!field_select)
+			return -EOPNOTSUPP;
+	}
+
+	MLX5_SET(mpegc_reg, in, field_select,
+		 field_select |
+		 MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN);
+	MLX5_SET(mpegc_reg, in, tx_lossy_overflow_oper, tx_lossy_overflow);
+	MLX5_SET(mpegc_reg, in, mark_cqe, 0x1);
+	MLX5_SET(mpegc_reg, in, mark_cnp, 0x1);
+
+	return mlx5_devlink_set_mpegc(mdev, in, sizeof(in));
+}
+
+static int mlx5_devlink_set_tx_overflow_sense(struct mlx5_core_dev *mdev,
+					      u8 tx_overflow_sense)
+{
+	u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+
+	if (!MLX5_CAP_MCAM_FEATURE(mdev, dynamic_tx_overflow))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(mpegc_reg, in, field_select,
+		 MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_SENSE);
+	MLX5_SET(mpegc_reg, in, tx_overflow_sense, tx_overflow_sense);
+
+	return mlx5_devlink_set_mpegc(mdev, in, sizeof(in));
+}
+
+static int mlx5_devlink_query_mpegc(struct mlx5_core_dev *mdev, u32 *out,
+				    int size_out)
+{
+	u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+
+	if (!MLX5_CAP_MCAM_REG(mdev, mpegc))
+		return -EOPNOTSUPP;
+
+	return mlx5_core_access_reg(mdev, in, sizeof(in), out,
+				    size_out, MLX5_REG_MPEGC, 0, 0);
+}
+
+static int mlx5_devlink_query_tx_lossy_overflow(struct mlx5_core_dev *mdev,
+						u8 *tx_lossy_overflow)
+{
+	u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+	int err;
+
+	err = mlx5_devlink_query_mpegc(mdev, out, sizeof(out));
+	if (err)
+		return err;
+
+	*tx_lossy_overflow = MLX5_GET(mpegc_reg, out, tx_lossy_overflow_oper);
+
+	return 0;
+}
+
+static int mlx5_devlink_query_tx_overflow_sense(struct mlx5_core_dev *mdev,
+						u8 *tx_overflow_sense)
+{
+	u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+	int err;
+
+	if (!MLX5_CAP_MCAM_FEATURE(mdev, dynamic_tx_overflow))
+		return -EOPNOTSUPP;
+
+	err = mlx5_devlink_query_mpegc(mdev, out, sizeof(out));
+	if (err)
+		return err;
+
+	*tx_overflow_sense = MLX5_GET(mpegc_reg, out, tx_overflow_sense);
+
+	return 0;
+}
+
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 {
 	return devlink_register(devlink, dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 455bfa4e89c0..cdf09df8293f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -5,6 +5,7 @@
 #define __MLX5_DEVLINK_H__
 
 #include <net/devlink.h>
+#include <linux/mlx5/driver.h>
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 06/10] net/mlx5: Add MPEGC register configuration functionality Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 22:18   ` Alexander Duyck
  2018-08-01 21:52 ` [net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway Saeed Mahameed
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Eran Ben Elisha, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

Add support for two driver parameters via devlink params interface:
- Congestion action
	HW mechanism in the PCIe buffer which monitors the amount of
	consumed PCIe buffer per host. This mechanism supports the
	following actions in case of threshold overflow:
	- disabled - NOP (Default)
	- drop
	- mark - Mark CE bit in the CQE of received packet
- Congestion mode
	- aggressive - Aggressive static trigger threshold (Default)
	- dynamic - Dynamically change the trigger threshold

These driver-specific params enable the NIC HW workaround to handle
buffer congestion on the current NIC generation.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 204 +++++++++++++++++-
 1 file changed, 203 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index e3a5de6b4ee7..ec0ca690839e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -126,12 +126,214 @@ static int mlx5_devlink_query_tx_overflow_sense(struct mlx5_core_dev *mdev,
 	return 0;
 }
 
+static const char *const action_to_str[] = {
+	[MLX5_DEVLINK_CONGESTION_ACTION_DISABLED] = "disabled",
+	[MLX5_DEVLINK_CONGESTION_ACTION_DROP] = "drop",
+	[MLX5_DEVLINK_CONGESTION_ACTION_MARK] = "mark"
+};
+
+static const char *mlx5_devlink_congestion_action_to_str(int action)
+{
+	if (action > MLX5_DEVLINK_CONGESTION_ACTION_MAX) {
+		WARN_ON(1);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return action_to_str[action];
+}
+
+static int mlx5_devlink_str_to_congestion_action(const char *str, u8 *action)
+{
+	int i;
+
+	for (i = 0; i <= MLX5_DEVLINK_CONGESTION_ACTION_MAX; i++) {
+		if (!strcmp(str, action_to_str[i])) {
+			*action = i;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static int mlx5_devlink_set_congestion_action(struct devlink *devlink, u32 id,
+					      struct devlink_param_gset_ctx *ctx,
+					      struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	u8 max = MLX5_DEVLINK_CONGESTION_ACTION_MAX;
+	u8 congestion_action;
+	u8 sense;
+	int err;
+
+	if (!MLX5_CAP_MCAM_FEATURE(dev, mark_tx_action_cqe) &&
+	    !MLX5_CAP_MCAM_FEATURE(dev, mark_tx_action_cnp))
+		max = MLX5_DEVLINK_CONGESTION_ACTION_MARK - 1;
+
+	err = mlx5_devlink_str_to_congestion_action(ctx->val.vstr,
+						    &congestion_action);
+	if (err)
+		return err;
+
+	if (congestion_action > max) {
+		NL_SET_ERR_MSG(extack, "Requested congestion action is not supported on current device/FW");
+		return -EINVAL;
+	}
+
+	err = mlx5_devlink_query_tx_overflow_sense(dev, &sense);
+	if (err)
+		return err;
+
+	if (congestion_action == MLX5_DEVLINK_CONGESTION_ACTION_DISABLED &&
+	    sense != MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE) {
+		NL_SET_ERR_MSG(extack, "Congestion action \"disabled\" is allowed only while mode is configured to aggressive");
+		return -EINVAL;
+	}
+
+	return mlx5_devlink_set_tx_lossy_overflow(dev, congestion_action);
+}
+
+static int mlx5_devlink_get_congestion_action(struct devlink *devlink, u32 id,
+					      struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	u8 congestion_action;
+	const char *val;
+	int err;
+
+	err = mlx5_devlink_query_tx_lossy_overflow(dev, &congestion_action);
+	if (err)
+		return err;
+
+	val = mlx5_devlink_congestion_action_to_str(congestion_action);
+	if (IS_ERR(val))
+		return PTR_ERR(val);
+
+	devlink_param_value_str_fill(&ctx->val, val);
+	return 0;
+}
+
+static const char *const mode_to_str[] = {
+	[MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE] = "aggressive",
+	[MLX5_DEVLINK_CONGESTION_MODE_DYNAMIC_ADJUSTMENT] = "dynamic"
+};
+
+static const char *mlx5_devlink_congestion_mode_to_str(int mode)
+{
+	if (mode > MLX5_DEVLINK_CONGESTION_MODE_MAX) {
+		WARN_ON(1);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return mode_to_str[mode];
+}
+
+static int mlx5_devlink_str_to_congestion_mode(const char *str, u8 *mode)
+{
+	int i;
+
+	for (i = 0; i <= MLX5_DEVLINK_CONGESTION_MODE_MAX; i++) {
+		if (!strcmp(str, mode_to_str[i])) {
+			*mode = i;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static int mlx5_devlink_set_congestion_mode(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx,
+					    struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	u8 tx_lossy_overflow, congestion_mode;
+	int err;
+
+	err = mlx5_devlink_str_to_congestion_mode(ctx->val.vstr,
+						  &congestion_mode);
+	if (err)
+		return err;
+
+	err = mlx5_devlink_query_tx_lossy_overflow(dev, &tx_lossy_overflow);
+	if (err)
+		return err;
+
+	if (congestion_mode != MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE &&
+	    tx_lossy_overflow == MLX5_DEVLINK_CONGESTION_ACTION_DISABLED) {
+		NL_SET_ERR_MSG(extack, "Congestion mode must be aggressive while congestion action is configured to \"disabled\"");
+		return -EINVAL;
+	}
+
+	return mlx5_devlink_set_tx_overflow_sense(dev, congestion_mode);
+}
+
+static int mlx5_devlink_get_congestion_mode(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	u8 congestion_mode;
+	const char *val;
+	int err;
+
+	err = mlx5_devlink_query_tx_overflow_sense(dev, &congestion_mode);
+	if (err)
+		return err;
+
+	val = mlx5_devlink_congestion_mode_to_str(congestion_mode);
+	if (IS_ERR(val))
+		return PTR_ERR(val);
+
+	devlink_param_value_str_fill(&ctx->val, val);
+	return 0;
+}
+
+enum mlx5_devlink_param_id {
+	MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION,
+	MLX5_DEVLINK_PARAM_ID_CONGESTION_MODE,
+};
+
+static const struct devlink_param mlx5_devlink_params[] = {
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION,
+			     "congestion_action",
+			     DEVLINK_PARAM_TYPE_STRING,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     mlx5_devlink_get_congestion_action,
+			     mlx5_devlink_set_congestion_action, NULL),
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CONGESTION_MODE,
+			     "congestion_mode",
+			     DEVLINK_PARAM_TYPE_STRING,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     mlx5_devlink_get_congestion_mode,
+			     mlx5_devlink_set_congestion_mode, NULL),
+};
+
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 {
-	return devlink_register(devlink, dev);
+	int err;
+
+	err = devlink_register(devlink, dev);
+	if (err)
+		return err;
+
+	err = devlink_params_register(devlink, mlx5_devlink_params,
+				      ARRAY_SIZE(mlx5_devlink_params));
+	if (err) {
+		dev_err(dev, "devlink_params_register failed\n");
+		goto unregister;
+	}
+
+	return 0;
+
+unregister:
+	devlink_unregister(devlink);
+	return err;
 }
 
 void mlx5_devlink_unregister(struct devlink *devlink)
 {
+	devlink_params_unregister(devlink, mlx5_devlink_params,
+				  ARRAY_SIZE(mlx5_devlink_params));
 	devlink_unregister(devlink);
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 09/10] net/mlx5: Add Crdump FW snapshot support Saeed Mahameed
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Alex Vesker, Saeed Mahameed

From: Alex Vesker <valex@mellanox.com>

The Vendor Specific Capability (VSC) is used to activate a gateway
interfacing with the device. The gateway is used to read or write
device configurations, which are organized in different domains (spaces).
A configuration access may result in multiple actions, reads, writes.

Example usages are accessing the Crspace domain to read the crspace or
locking a device semaphore using the Semaphore domain.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |   2 +
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |  56 +++
 include/linux/mlx5/driver.h                   |   1 +
 5 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 0be5bf93f33b..15f6916efe1b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,7 +6,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
-		diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
+		lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index d39b0b7011b2..db9e39fdc33e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -38,6 +38,7 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
+#include "lib/pci_vsc.h"
 
 enum {
 	MLX5_HEALTH_POLL_INTERVAL	= 2 * HZ,
@@ -388,6 +389,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	spin_lock_init(&health->wq_lock);
 	INIT_WORK(&health->work, health_care);
 	INIT_DELAYED_WORK(&health->recover_work, health_recover);
+	mlx5_vsc_init(dev);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
new file mode 100644
index 000000000000..cfc2eebbf7c5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/pci.h>
+#include "mlx5_core.h"
+#include "pci_vsc.h"
+
+#define MLX5_EXTRACT_C(source, offset, size)	\
+	((((unsigned)(source)) >> (offset)) & MLX5_ONES32(size))
+#define MLX5_EXTRACT(src, start, len)		\
+	(((len) == 32) ? (src) : MLX5_EXTRACT_C(src, start, len))
+#define MLX5_ONES32(size)			\
+	((size) ? (0xffffffff >> (32 - (size))) : 0)
+#define MLX5_MASK32(offset, size)		\
+	(MLX5_ONES32(size) << (offset))
+#define MLX5_MERGE_C(rsrc1, rsrc2, start, len)  \
+	((((rsrc2) << (start)) & (MLX5_MASK32((start), (len)))) | \
+	((rsrc1) & (~MLX5_MASK32((start), (len)))))
+#define MLX5_MERGE(rsrc1, rsrc2, start, len)	\
+	(((len) == 32) ? (rsrc2) : MLX5_MERGE_C(rsrc1, rsrc2, start, len))
+
+#define VSC_MAX_RETRIES 2048
+
+enum {
+	MLX5_VSC_UNLOCK,
+	MLX5_VSC_LOCK,
+};
+
+enum {
+	VSC_CTRL_OFFSET = 0x4,
+	VSC_COUNTER_OFFSET = 0x8,
+	VSC_SEMAPHORE_OFFSET = 0xc,
+	VSC_ADDR_OFFSET = 0x10,
+	VSC_DATA_OFFSET = 0x14,
+
+	VSC_FLAG_BIT_OFFS = 31,
+	VSC_FLAG_BIT_LEN = 1,
+
+	VSC_SYND_BIT_OFFS = 30,
+	VSC_SYND_BIT_LEN = 1,
+
+	VSC_ADDR_BIT_OFFS = 0,
+	VSC_ADDR_BIT_LEN = 30,
+
+	VSC_SPACE_BIT_OFFS = 0,
+	VSC_SPACE_BIT_LEN = 16,
+
+	VSC_SIZE_VLD_BIT_OFFS = 28,
+	VSC_SIZE_VLD_BIT_LEN = 1,
+
+	VSC_STATUS_BIT_OFFS = 29,
+	VSC_STATUS_BIT_LEN = 3,
+};
+
+int mlx5_vsc_init(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	priv->health.vsc_addr = pci_find_capability(dev->pdev,
+						    PCI_CAP_ID_VNDR);
+	if (!priv->health.vsc_addr)
+		mlx5_core_warn(dev, "Failed to get valid vendor specific ID\n");
+
+	return 0;
+}
+
+int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	u32 counter = 0;
+	int retries = 0;
+	u32 lock_val;
+	int ret;
+
+	do {
+		if (retries > VSC_MAX_RETRIES)
+			return -EBUSY;
+
+		/* Check if semaphore is already locked */
+		ret = pci_read_config_dword(dev->pdev,
+					    priv->health.vsc_addr +
+					    VSC_SEMAPHORE_OFFSET,
+					    &lock_val);
+		if (ret)
+			return ret;
+
+		if (lock_val) {
+			retries++;
+			usleep_range(1000, 2000);
+			continue;
+		}
+
+		/*
+		 * Read and write counter value, if written value is
+		 * the same, semaphore was acquired successfully.
+		 */
+		ret = pci_read_config_dword(dev->pdev,
+					    priv->health.vsc_addr +
+					    VSC_COUNTER_OFFSET,
+					    &counter);
+		if (ret)
+			return ret;
+
+		ret = pci_write_config_dword(dev->pdev,
+					     priv->health.vsc_addr +
+					     VSC_SEMAPHORE_OFFSET,
+					     counter);
+		if (ret)
+			return ret;
+
+		ret = pci_read_config_dword(dev->pdev,
+					    priv->health.vsc_addr +
+					    VSC_SEMAPHORE_OFFSET,
+					    &lock_val);
+		if (ret)
+			return ret;
+
+		retries++;
+	} while (counter != lock_val);
+
+	return 0;
+}
+
+int mlx5_vsc_gw_unlock(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	int ret;
+
+	ret = pci_write_config_dword(dev->pdev,
+				     priv->health.vsc_addr +
+				     VSC_SEMAPHORE_OFFSET,
+				     MLX5_VSC_UNLOCK);
+	return ret;
+}
+
+int mlx5_vsc_gw_set_space(struct mlx5_core_dev *dev, u16 space,
+			  u32 *ret_space_size)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	int ret;
+	u32 val;
+
+	if (!mlx5_vsc_accessible(dev))
+		return -EINVAL;
+
+	if (ret_space_size)
+		*ret_space_size = 0;
+
+	ret = pci_read_config_dword(dev->pdev,
+				    priv->health.vsc_addr +
+				    VSC_CTRL_OFFSET,
+				    &val);
+	if (ret)
+		goto out;
+
+	val = MLX5_MERGE(val, space, VSC_SPACE_BIT_OFFS, VSC_SPACE_BIT_LEN);
+	ret = pci_write_config_dword(dev->pdev,
+				     priv->health.vsc_addr +
+				     VSC_CTRL_OFFSET,
+				     val);
+	if (ret)
+		goto out;
+
+	ret = pci_read_config_dword(dev->pdev,
+				    priv->health.vsc_addr +
+				    VSC_CTRL_OFFSET,
+				    &val);
+	if (ret)
+		goto out;
+
+	if (MLX5_EXTRACT(val, VSC_STATUS_BIT_OFFS, VSC_STATUS_BIT_LEN) == 0)
+		return -EINVAL;
+
+	/* Get space max address if indicated by size valid bit */
+	if (ret_space_size &&
+	    MLX5_EXTRACT(val, VSC_SIZE_VLD_BIT_OFFS, VSC_SIZE_VLD_BIT_LEN)) {
+		ret = pci_read_config_dword(dev->pdev,
+					    priv->health.vsc_addr +
+					    VSC_ADDR_OFFSET,
+					    &val);
+		if (ret) {
+			mlx5_core_warn(dev, "Failed to get max space size\n");
+			goto out;
+		}
+		*ret_space_size = MLX5_EXTRACT(val, VSC_ADDR_BIT_OFFS,
+					       VSC_ADDR_BIT_LEN);
+	}
+	return 0;
+
+out:
+	return ret;
+}
+
+static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	int retries = 0;
+	u32 flag;
+	int ret;
+
+	do {
+		if (retries > VSC_MAX_RETRIES)
+			return -EBUSY;
+
+		ret = pci_read_config_dword(dev->pdev,
+					    priv->health.vsc_addr +
+					    VSC_ADDR_OFFSET,
+					    &flag);
+		flag = MLX5_EXTRACT(flag, VSC_FLAG_BIT_OFFS, VSC_FLAG_BIT_LEN);
+		retries++;
+
+		if ((retries & 0xf) == 0)
+			usleep_range(1000, 2000);
+
+	} while (flag != expected_val);
+
+	return 0;
+}
+
+static int mlx5_vsc_gw_read(struct mlx5_core_dev *dev, unsigned int address,
+			    u32 *data)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	int ret;
+
+	if (MLX5_EXTRACT(address, VSC_SYND_BIT_OFFS,
+			 VSC_FLAG_BIT_LEN + VSC_SYND_BIT_LEN))
+		return -EINVAL;
+
+	ret = pci_write_config_dword(dev->pdev,
+				     priv->health.vsc_addr +
+				     VSC_ADDR_OFFSET,
+				     address);
+	if (ret)
+		goto out;
+
+	ret = mlx5_vsc_wait_on_flag(dev, 1);
+	if (ret)
+		goto out;
+
+	ret = pci_read_config_dword(dev->pdev,
+				    priv->health.vsc_addr +
+				    VSC_DATA_OFFSET,
+				    data);
+out:
+	return ret;
+}
+
+static int mlx5_vsc_gw_read_fast(struct mlx5_core_dev *dev,
+				 unsigned int read_addr,
+				 unsigned int *next_read_addr,
+				 u32 *data)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	int ret;
+
+	ret = mlx5_vsc_gw_read(dev, read_addr, data);
+	if (ret)
+		goto out;
+
+	ret = pci_read_config_dword(dev->pdev,
+				    priv->health.vsc_addr +
+				    VSC_ADDR_OFFSET,
+				    next_read_addr);
+	if (ret)
+		goto out;
+
+	*next_read_addr = MLX5_EXTRACT(*next_read_addr, VSC_ADDR_BIT_OFFS,
+				       VSC_ADDR_BIT_LEN);
+
+	if (*next_read_addr <= read_addr)
+		ret = EINVAL;
+out:
+	return ret;
+}
+
+int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
+				int length)
+{
+	unsigned int next_read_addr = 0;
+	unsigned int read_addr = 0;
+
+	while (read_addr < length) {
+		if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr,
+					  &data[(read_addr >> 2)]))
+			return read_addr;
+
+		read_addr = next_read_addr;
+	}
+	return length;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
new file mode 100644
index 000000000000..af10bce44692
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __MLX5_PCI_VSC_H__
+#define __MLX5_PCI_VSC_H__
+
+enum {
+	MLX5_VSC_SPACE_SCAN_CRSPACE = 0x7,
+};
+
+int mlx5_vsc_init(struct mlx5_core_dev *dev);
+void mlx5_vsc_cleanup(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_unlock(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_set_space(struct mlx5_core_dev *dev, u16 space,
+			  u32 *ret_space_size);
+int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
+				int length);
+
+static inline bool mlx5_vsc_accessible(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	return (!!priv->health.vsc_addr);
+}
+
+#endif /* __MLX5_PCI_VSC_H__ */
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 54f385cc8811..6690875b368b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -541,6 +541,7 @@ struct mlx5_core_health {
 	unsigned long			flags;
 	struct work_struct		work;
 	struct delayed_work		recover_work;
+	u32				vsc_addr;
 };
 
 struct mlx5_qp_table {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 09/10] net/mlx5: Add Crdump FW snapshot support
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 21:52 ` [net-next 10/10] net/mlx5: Use devlink region_snapshot parameter Saeed Mahameed
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Alex Vesker, Saeed Mahameed

From: Alex Vesker <valex@mellanox.com>

Crdump allows the driver to create a snapshot of the FW PCI
crspace. This is useful in case of catastrophic issues which
require FW reset. The snapshot can be used for later debug.

The snapshot is exposed using devlink, cr-space
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   3 +-
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 201 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/health.c  |   1 +
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    |   2 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 include/linux/mlx5/driver.h                   |   4 +
 6 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 15f6916efe1b..6ea9e9462c77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,7 +6,8 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
-		lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
+		lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o diag/crdump.o \
+		devlink.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
new file mode 100644
index 000000000000..fe779e62fc70
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/proc_fs.h>
+#include <linux/mlx5/driver.h>
+#include <net/devlink.h>
+#include "mlx5_core.h"
+#include "lib/pci_vsc.h"
+
+#define BAD_ACCESS			0xBADACCE5
+#define MLX5_PROTECTED_CR_SCAN_CRSPACE	0x7
+#define MAX_NUM_OF_DUMPS_TO_STORE	(8)
+
+static const char *region_cr_space_str = "cr-space";
+
+struct mlx5_fw_crdump {
+	u32			size;
+	struct devlink_region	*region_crspace;
+};
+
+bool mlx5_crdump_enbaled(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	return (!!priv->health.crdump);
+}
+
+static int mlx5_crdump_fill(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump = priv->health.crdump;
+	int i, ret = 0;
+	u32 *cr_data;
+	u32 id;
+
+	cr_data = kvmalloc(crdump->size, GFP_KERNEL);
+	if (!cr_data)
+		return -ENOMEM;
+
+	for (i = 0; i < (crdump->size / 4); i++)
+		cr_data[i] = BAD_ACCESS;
+
+	ret = mlx5_vsc_gw_read_block_fast(dev, cr_data, crdump->size);
+	if (ret <= 0)
+		goto free_data;
+
+	if (crdump->size != ret) {
+		mlx5_core_warn(dev, "failed to read full dump, read %d out of %u\n",
+			       ret, crdump->size);
+		ret = -EINVAL;
+		goto free_data;
+	}
+
+	/* Get the available snapshot ID for the dumps */
+	id = devlink_region_shapshot_id_get(devlink);
+	ret = devlink_region_snapshot_create(crdump->region_crspace,
+					     crdump->size, (u8 *)cr_data,
+					     id, &kvfree);
+	if (ret) {
+		mlx5_core_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
+			       region_cr_space_str, id, ret);
+		goto free_data;
+	} else {
+		mlx5_core_info(dev, "crdump: added snapshot %d to devlink region %s\n",
+			       id, region_cr_space_str);
+	}
+	return 0;
+
+free_data:
+	kvfree(cr_data);
+	return ret;
+}
+
+int mlx5_crdump_collect(struct mlx5_core_dev *dev)
+{
+	int ret = 0;
+
+	if (!mlx5_crdump_enbaled(dev))
+		return -ENODEV;
+
+	ret = mlx5_vsc_gw_lock(dev);
+	if (ret)
+		return ret;
+
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE, NULL);
+	if (ret)
+		goto unlock;
+
+	ret = mlx5_crdump_fill(dev);
+	if (ret)
+		goto unlock;
+
+unlock:
+	mlx5_vsc_gw_unlock(dev);
+	return ret;
+}
+
+int mlx5_crdump_init(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump;
+	u32 space_size;
+	int ret;
+
+	if (!mlx5_core_is_pf(dev) || !mlx5_vsc_accessible(dev) ||
+	    mlx5_crdump_enbaled(dev))
+		return 0;
+
+	ret = mlx5_vsc_gw_lock(dev);
+	if (ret)
+		return ret;
+
+	/* Check if space is supported and get space size */
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE,
+				    &space_size);
+	if (ret) {
+		/* Unlock and mask error since space is not supported */
+		mlx5_vsc_gw_unlock(dev);
+		return 0;
+	}
+
+	if (space_size == 0) {
+		mlx5_core_warn(dev, "Invalid Crspace size, zero\n");
+		mlx5_vsc_gw_unlock(dev);
+		return -EINVAL;
+	}
+
+	ret = mlx5_vsc_gw_unlock(dev);
+	if (ret)
+		return ret;
+
+	crdump = kzalloc(sizeof(*crdump), GFP_KERNEL);
+	if (!crdump)
+		return -ENOMEM;
+
+	/* Create cr-space region */
+	crdump->size = space_size;
+	crdump->region_crspace =
+		devlink_region_create(devlink,
+				      region_cr_space_str,
+				      MAX_NUM_OF_DUMPS_TO_STORE,
+				      space_size);
+	if (IS_ERR(crdump->region_crspace)) {
+		mlx5_core_warn(dev,
+			       "crdump: create devlink region %s err %ld\n",
+			       region_cr_space_str,
+			       PTR_ERR(crdump->region_crspace));
+		ret = PTR_ERR(crdump->region_crspace);
+		goto free_crdump;
+	}
+	priv->health.crdump = crdump;
+	return 0;
+
+free_crdump:
+	kfree(crdump);
+	return ret;
+}
+
+void mlx5_crdump_cleanup(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump = priv->health.crdump;
+
+	if (!crdump)
+		return;
+
+	devlink_region_destroy(crdump->region_crspace);
+	kfree(crdump);
+	priv->health.crdump = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index db9e39fdc33e..10ac6a98ea96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -390,6 +390,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	INIT_WORK(&health->work, health_care);
 	INIT_DELAYED_WORK(&health->recover_work, health_recover);
 	mlx5_vsc_init(dev);
+	health->crdump = NULL;
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index 7550b1cc8c6a..fbdf332a9174 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -39,5 +39,7 @@ int  mlx5_core_reserve_gids(struct mlx5_core_dev *dev, unsigned int count);
 void mlx5_core_unreserve_gids(struct mlx5_core_dev *dev, unsigned int count);
 int  mlx5_core_reserved_gid_alloc(struct mlx5_core_dev *dev, int *gid_index);
 void mlx5_core_reserved_gid_free(struct mlx5_core_dev *dev, int gid_index);
+int mlx5_crdump_init(struct mlx5_core_dev *dev);
+void mlx5_crdump_cleanup(struct mlx5_core_dev *dev);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 7a1ddf96f065..f2dd54accd0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1450,6 +1450,10 @@ static int init_one(struct pci_dev *pdev,
 	if (err)
 		goto clean_load;
 
+	err = mlx5_crdump_init(dev);
+	if (err)
+		dev_err(&pdev->dev, "mlx5_crdump_init failed with error code %d\n", err);
+
 	pci_save_state(pdev);
 	return 0;
 
@@ -1476,6 +1480,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 
+	mlx5_crdump_cleanup(dev);
 	mlx5_devlink_unregister(devlink);
 	mlx5_unregister_device(dev);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 6690875b368b..52744065924a 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -52,6 +52,7 @@
 #include <linux/mlx5/srq.h>
 #include <linux/timecounter.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/devlink.h>
 
 enum {
 	MLX5_BOARD_ID_LEN = 64,
@@ -528,6 +529,8 @@ struct mlx5_sq_bfreg {
 	unsigned int		offset;
 };
 
+struct mlx5_fw_crdump;
+
 struct mlx5_core_health {
 	struct health_buffer __iomem   *health;
 	__be32 __iomem		       *health_counter;
@@ -542,6 +545,7 @@ struct mlx5_core_health {
 	struct work_struct		work;
 	struct delayed_work		recover_work;
 	u32				vsc_addr;
+	struct mlx5_fw_crdump	       *crdump;
 };
 
 struct mlx5_qp_table {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [net-next 10/10] net/mlx5: Use devlink region_snapshot parameter
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 09/10] net/mlx5: Add Crdump FW snapshot support Saeed Mahameed
@ 2018-08-01 21:52 ` Saeed Mahameed
  2018-08-01 22:34 ` [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Alexander Duyck
  2018-08-02  0:00 ` Jakub Kicinski
  11 siblings, 0 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 21:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jakub Kicinski, Alexander Duyck,
	Bjorn Helgaas, Alex Vesker, Saeed Mahameed

From: Alex Vesker <valex@mellanox.com>

This parameter enables capturing region snapshot of the crspace
during critical errors. The default value of this parameter is
disabled, it can be enabled using devlink param commands.
It is possible to configure during runtime and also driver init.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 49 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 22 +++++++++
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    |  2 +
 3 files changed, 73 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index ec0ca690839e..4e33049096d0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
 
 #include <devlink.h>
+#include "lib/mlx5.h"
 
 enum {
 	MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN = BIT(0),
@@ -288,6 +289,24 @@ static int mlx5_devlink_get_congestion_mode(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	ctx->val.vbool = mlx5_crdump_is_snapshot_enabled(dev);
+	return 0;
+}
+
+static int mlx5_devlink_set_crdump_snapshot(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx,
+					    struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	return mlx5_crdump_set_snapshot_enabled(dev, ctx->val.vbool);
+}
+
 enum mlx5_devlink_param_id {
 	MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION,
@@ -295,6 +314,11 @@ enum mlx5_devlink_param_id {
 };
 
 static const struct devlink_param mlx5_devlink_params[] = {
+	DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT,
+			      BIT(DEVLINK_PARAM_CMODE_RUNTIME) |
+			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+			      mlx5_devlink_get_crdump_snapshot,
+			      mlx5_devlink_set_crdump_snapshot, NULL),
 	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION,
 			     "congestion_action",
 			     DEVLINK_PARAM_TYPE_STRING,
@@ -309,6 +333,29 @@ static const struct devlink_param mlx5_devlink_params[] = {
 			     mlx5_devlink_set_congestion_mode, NULL),
 };
 
+static void mlx5_devlink_set_init_value(struct devlink *devlink, u32 param_id,
+					union devlink_param_value init_val)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	int err;
+
+	err = devlink_param_driverinit_value_set(devlink, param_id, init_val);
+	if (err)
+		dev_warn(&dev->pdev->dev,
+			 "devlink set parameter %u value failed (err = %d)",
+			 param_id, err);
+}
+
+static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
+{
+	union devlink_param_value value;
+
+	value.vbool = false;
+	mlx5_devlink_set_init_value(devlink,
+				    DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+				    value);
+}
+
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 {
 	int err;
@@ -324,6 +371,8 @@ int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 		goto unregister;
 	}
 
+	mlx5_devlink_set_params_init_values(devlink);
+
 	return 0;
 
 unregister:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
index fe779e62fc70..94b74b256eaa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -44,6 +44,7 @@ static const char *region_cr_space_str = "cr-space";
 
 struct mlx5_fw_crdump {
 	u32			size;
+	bool			snapshot_enable;
 	struct devlink_region	*region_crspace;
 };
 
@@ -125,6 +126,27 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev)
 	return ret;
 }
 
+bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	if (mlx5_crdump_enbaled(dev))
+		return priv->health.crdump->snapshot_enable;
+
+	return false;
+}
+
+int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	if (!mlx5_crdump_enbaled(dev))
+		return -ENODEV;
+
+	priv->health.crdump->snapshot_enable = value;
+	return 0;
+}
+
 int mlx5_crdump_init(struct mlx5_core_dev *dev)
 {
 	struct devlink *devlink = priv_to_devlink(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index fbdf332a9174..1533a959b82d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -41,5 +41,7 @@ int  mlx5_core_reserved_gid_alloc(struct mlx5_core_dev *dev, int *gid_index);
 void mlx5_core_reserved_gid_free(struct mlx5_core_dev *dev, int gid_index);
 int mlx5_crdump_init(struct mlx5_core_dev *dev);
 void mlx5_crdump_cleanup(struct mlx5_core_dev *dev);
+bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev);
+int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value);
 
 #endif
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
  2018-08-01 21:52 ` [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink Saeed Mahameed
@ 2018-08-01 22:18   ` Alexander Duyck
  0 siblings, 0 replies; 32+ messages in thread
From: Alexander Duyck @ 2018-08-01 22:18 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Netdev, Jiri Pirko, Jakub Kicinski,
	Bjorn Helgaas, Eran Ben Elisha

On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:
> From: Eran Ben Elisha <eranbe@mellanox.com>
>
> Add support for two driver parameters via devlink params interface:
> - Congestion action
>         HW mechanism in the PCIe buffer which monitors the amount of
>         consumed PCIe buffer per host. This mechanism supports the
>         following actions in case of threshold overflow:
>         - disabled - NOP (Default)
>         - drop
>         - mark - Mark CE bit in the CQE of received packet

Any chance you could clarify the differences between "disabled" and
"drop"? I am assuming the "drop" is a head-of-line drop versus the
"disabled" being a incoming packet drop?

Also I still don't see this as necessarily being all that unique of a
feature/issue. Basically being PCIe bus limited is not all that
uncommon of a thing and has existed since the early days of PCI. In
the case of the Intel NICs we just throw a warning and end up dropping
the incoming packets instead of providing the two other options you
have listed.

> - Congestion mode
>         - aggressive - Aggressive static trigger threshold (Default)
>         - dynamic - Dynamically change the trigger threshold
>
> These driver-specific params enable the NIC HW workaround to handle
> buffer congestion on the current NIC generation.

Is there any documentation anywhere for any of these features? In the
patch set I see you adding interfaces, but I don't see them documented
anywhere.

> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/devlink.c | 204 +++++++++++++++++-
>  1 file changed, 203 insertions(+), 1 deletion(-)

Ideally there should be some documentation going into the kernel when
you extend the devlink interface at least so that I know how to use
your new interfaces when you define them. Just updating devlink.c
seems like a messy way to do things.

- Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [net-next 01/10] devlink: Fix param set handling for string type
  2018-08-01 21:52 ` [net-next 01/10] devlink: Fix param set handling for string type Saeed Mahameed
@ 2018-08-01 22:33   ` Jakub Kicinski
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-08-01 22:33 UTC (permalink / raw)
  To: Saeed Mahameed, Jiri Pirko
  Cc: David S. Miller, netdev, Alexander Duyck, Bjorn Helgaas, Moshe Shemesh

On Wed,  1 Aug 2018 14:52:46 -0700, Saeed Mahameed wrote:
> From: Moshe Shemesh <moshe@mellanox.com>
> 
> In case devlink param type is string, it needs to copy the string value
> it got from the input to devlink_param_value.
> 
> Fixes: e3b7ca18ad7b ("devlink: Add param set command")
> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  include/net/devlink.h | 2 +-
>  net/core/devlink.c    | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index b9b89d6604d4..b0e17c025fdc 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -311,7 +311,7 @@ union devlink_param_value {
>  	u8 vu8;
>  	u16 vu16;
>  	u32 vu32;
> -	const char *vstr;
> +	char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
>  	bool vbool;
>  };
>  
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 65fc366a78a4..61e126c28526 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -3014,7 +3014,8 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
>  		if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
>  		    DEVLINK_PARAM_MAX_STRING_VALUE)
>  			return -EINVAL;
> -		value->vstr = nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
> +		strcpy(value->vstr,
> +		       nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));

DEVLINK_ATTR_PARAM_VALUE_DATA is a type mux, I'm struggling to find in
the code where it's checked for being null-terminated for strings.

>  		break;
>  	case DEVLINK_PARAM_TYPE_BOOL:
>  		value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2018-08-01 21:52 ` [net-next 10/10] net/mlx5: Use devlink region_snapshot parameter Saeed Mahameed
@ 2018-08-01 22:34 ` Alexander Duyck
  2018-08-01 23:13   ` Saeed Mahameed
  2018-08-02  0:00 ` Jakub Kicinski
  11 siblings, 1 reply; 32+ messages in thread
From: Alexander Duyck @ 2018-08-01 22:34 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Netdev, Jiri Pirko, Jakub Kicinski, Bjorn Helgaas

On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:
> Hi Dave,
>
> This series provides devlink parameters updates to both devlink API and
> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
> Devlink" to address review comments [1].
>
> Changes from the original series:
> - According to the discussion outcome, we are keeping the congestion control
>   setting as mlx5 device specific for the current HW generation.
> - Changed the congestion_mode and congestion action param type to string
> - Added patches to fix devlink handling of param type string
> - Added a patch which adds extack messages support for param set.
> - At the end of this series, I've added yet another mlx5 devlink related
>  feature, firmware snapshot support.
>
> For more information please see tag log below.
>
> Please pull and let me know if there's any problem.
>
> [1] https://patchwork.ozlabs.org/patch/945996/
>
> Thanks,
> Saeed.
>
> ---
>
> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>
>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01
>
> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>
>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)
>
> ----------------------------------------------------------------
> mlx5-updates-2018-08-01
>
> This series provides devlink parameters updates to both devlink API and
> mlx5 driver,
>
> 1) Devlink changes: (Moshe Shemesh)
> The first two patches fix devlink param infrastructure for string type
> params.
> The third patch adds a devlink helper function to safely copy string from
> driver to devlink.
> The forth patch adds extack support for param set.
>
> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
> Next three patches add new devlink driver specific params for controlling
> congestion action and mode, using string type params and extack messages support.
>
> This congestion mode enables hw workaround in specific devices which is
> controlled by devlink driver-specific params. The workaround is device
> specific for this NIC generation, the next NIC will not need it.
>
> Congestion parameters:
>  - Congestion action
>             HW W/A mechanism in the PCIe buffer which monitors the amount of
>             consumed PCIe buffer per host.  This mechanism supports the
>             following actions in case of threshold overflow:
>             - Disabled - NOP (Default)
>             - Drop
>             - Mark - Mark CE bit in the CQE of received packet
>     - Congestion mode
>             - Aggressive - Aggressive static trigger threshold (Default)
>             - Dynamic - Dynamically change the trigger threshold
>
> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
> Last three patches, add the support for capturing region snapshot of the
> firmware crspace during critical errors, using devlink region_snapshot
> parameter.
>
> -Saeed.
>
> ----------------------------------------------------------------
> Alex Vesker (3):
>       net/mlx5: Add Vendor Specific Capability access gateway
>       net/mlx5: Add Crdump FW snapshot support
>       net/mlx5: Use devlink region_snapshot parameter
>
> Eran Ben Elisha (3):
>       net/mlx5: Move all devlink related functions calls to devlink.c
>       net/mlx5: Add MPEGC register configuration functionality
>       net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
>
> Moshe Shemesh (4):
>       devlink: Fix param set handling for string type
>       devlink: Fix param cmode driverinit for string type
>       devlink: Add helper function for safely copy string param
>       devlink: Add extack messages support to param set
>
>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>  drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +++++++++++++++++
>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
>  include/linux/mlx5/driver.h                        |   5 +
>  include/net/devlink.h                              |  15 +-
>  net/core/devlink.c                                 |  44 ++-
>  14 files changed, 1076 insertions(+), 17 deletions(-)
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

So after looking over the patch set the one thing I would ask for in
this is some sort of documentation at a minimum. As a user I don't see
how you can expect someone to be able to use this when the naming of
things are pretty cryptic and there is no real explanation anywhere if
you don't go through and read the patch description itself. When you
start adding driver specific interfaces, you should at least start
adding vendor specific documentation.

Also I don't see how using a vendor specific configuration space
section can be done without adding some tie-ins to the PCI core files
because it should be possible to race with someone poking at the
register space via something like setpci/lspci. Also one of the things
that came up was that drivers are not supposed to be banging on the
PCI configuration space at will, and it seems like this patch set is
doing exactly that through the VSC block.

- Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-01 22:34 ` [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Alexander Duyck
@ 2018-08-01 23:13   ` Saeed Mahameed
  2018-08-02  0:36     ` Alexander Duyck
  2018-08-02  6:15     ` Jiri Pirko
  0 siblings, 2 replies; 32+ messages in thread
From: Saeed Mahameed @ 2018-08-01 23:13 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Saeed Mahameed, David S. Miller, Netdev, Jiri Pirko,
	Jakub Kicinski, Bjorn Helgaas

On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>> Hi Dave,
>>
>> This series provides devlink parameters updates to both devlink API and
>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
>> Devlink" to address review comments [1].
>>
>> Changes from the original series:
>> - According to the discussion outcome, we are keeping the congestion control
>>   setting as mlx5 device specific for the current HW generation.
>> - Changed the congestion_mode and congestion action param type to string
>> - Added patches to fix devlink handling of param type string
>> - Added a patch which adds extack messages support for param set.
>> - At the end of this series, I've added yet another mlx5 devlink related
>>  feature, firmware snapshot support.
>>
>> For more information please see tag log below.
>>
>> Please pull and let me know if there's any problem.
>>
>> [1] https://patchwork.ozlabs.org/patch/945996/
>>
>> Thanks,
>> Saeed.
>>
>> ---
>>
>> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>>
>>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>>
>> are available in the Git repository at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01
>>
>> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>>
>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)
>>
>> ----------------------------------------------------------------
>> mlx5-updates-2018-08-01
>>
>> This series provides devlink parameters updates to both devlink API and
>> mlx5 driver,
>>
>> 1) Devlink changes: (Moshe Shemesh)
>> The first two patches fix devlink param infrastructure for string type
>> params.
>> The third patch adds a devlink helper function to safely copy string from
>> driver to devlink.
>> The forth patch adds extack support for param set.
>>
>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
>> Next three patches add new devlink driver specific params for controlling
>> congestion action and mode, using string type params and extack messages support.
>>
>> This congestion mode enables hw workaround in specific devices which is
>> controlled by devlink driver-specific params. The workaround is device
>> specific for this NIC generation, the next NIC will not need it.
>>
>> Congestion parameters:
>>  - Congestion action
>>             HW W/A mechanism in the PCIe buffer which monitors the amount of
>>             consumed PCIe buffer per host.  This mechanism supports the
>>             following actions in case of threshold overflow:
>>             - Disabled - NOP (Default)
>>             - Drop
>>             - Mark - Mark CE bit in the CQE of received packet
>>     - Congestion mode
>>             - Aggressive - Aggressive static trigger threshold (Default)
>>             - Dynamic - Dynamically change the trigger threshold
>>
>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
>> Last three patches, add the support for capturing region snapshot of the
>> firmware crspace during critical errors, using devlink region_snapshot
>> parameter.
>>
>> -Saeed.
>>
>> ----------------------------------------------------------------
>> Alex Vesker (3):
>>       net/mlx5: Add Vendor Specific Capability access gateway
>>       net/mlx5: Add Crdump FW snapshot support
>>       net/mlx5: Use devlink region_snapshot parameter
>>
>> Eran Ben Elisha (3):
>>       net/mlx5: Move all devlink related functions calls to devlink.c
>>       net/mlx5: Add MPEGC register configuration functionality
>>       net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
>>
>> Moshe Shemesh (4):
>>       devlink: Fix param set handling for string type
>>       devlink: Fix param cmode driverinit for string type
>>       devlink: Add helper function for safely copy string param
>>       devlink: Add extack messages support to param set
>>
>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>>  drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +++++++++++++++++++++
>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +++++++++++++++++
>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
>>  include/linux/mlx5/driver.h                        |   5 +
>>  include/net/devlink.h                              |  15 +-
>>  net/core/devlink.c                                 |  44 ++-
>>  14 files changed, 1076 insertions(+), 17 deletions(-)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
>
> So after looking over the patch set the one thing I would ask for in
> this is some sort of documentation at a minimum. As a user I don't see
> how you can expect someone to be able to use this when the naming of
> things are pretty cryptic and there is no real explanation anywhere if
> you don't go through and read the patch description itself. When you
> start adding driver specific interfaces, you should at least start
> adding vendor specific documentation.
>

Sure, sounds like a great idea, something like:
Documentation/networking/mlx5.txt and have a devlink section ?
or have a generic devlink doc and a mlx5 section in it ?

> Also I don't see how using a vendor specific configuration space
> section can be done without adding some tie-ins to the PCI core files
> because it should be possible to race with someone poking at the
> register space via something like setpci/lspci. Also one of the things
> that came up was that drivers are not supposed to be banging on the
> PCI configuration space at will, and it seems like this patch set is
> doing exactly that through the VSC block.
>

this is a whole different feature than the device specific parameters.
The whole vendor specific configuration space access is needed only
for diagnostic/dump
purposes when something really bad happens and the command interface
with FW is down,
and when the FW is un-responsive, we want to dump the crspace into the
already existing devlink
crdump buffer, how do you expect us to read it if we are not allowed
to access it ?

What do you mean by tie-ins to the PCI core files ? can you please elaborate ?

> - Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2018-08-01 22:34 ` [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Alexander Duyck
@ 2018-08-02  0:00 ` Jakub Kicinski
  2018-08-02  1:40   ` David Miller
  11 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2018-08-02  0:00 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Jiri Pirko, Alexander Duyck, Bjorn Helgaas

On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:
> - According to the discussion outcome, we are keeping the congestion control
>   setting as mlx5 device specific for the current HW generation.

I still see queuing and marking based on queue level.  You want to add 
a Qdisc that will mirror your HW's behaviour to offload, if you really
believe this is not a subset of RED, why not...  But devlink params?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-01 23:13   ` Saeed Mahameed
@ 2018-08-02  0:36     ` Alexander Duyck
       [not found]       ` <2d84340e-0703-0bc7-4917-3b18979b2aa5@mellanox.com>
  2018-08-02  6:15     ` Jiri Pirko
  1 sibling, 1 reply; 32+ messages in thread
From: Alexander Duyck @ 2018-08-02  0:36 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Netdev, Jiri Pirko,
	Jakub Kicinski, Bjorn Helgaas

On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:
>>> Hi Dave,
>>>
>>> This series provides devlink parameters updates to both devlink API and
>>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
>>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
>>> Devlink" to address review comments [1].
>>>
>>> Changes from the original series:
>>> - According to the discussion outcome, we are keeping the congestion control
>>>   setting as mlx5 device specific for the current HW generation.
>>> - Changed the congestion_mode and congestion action param type to string
>>> - Added patches to fix devlink handling of param type string
>>> - Added a patch which adds extack messages support for param set.
>>> - At the end of this series, I've added yet another mlx5 devlink related
>>>  feature, firmware snapshot support.
>>>
>>> For more information please see tag log below.
>>>
>>> Please pull and let me know if there's any problem.
>>>
>>> [1] https://patchwork.ozlabs.org/patch/945996/
>>>
>>> Thanks,
>>> Saeed.
>>>
>>> ---
>>>
>>> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>>>
>>>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>>>
>>> are available in the Git repository at:
>>>
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01
>>>
>>> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>>>
>>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)
>>>
>>> ----------------------------------------------------------------
>>> mlx5-updates-2018-08-01
>>>
>>> This series provides devlink parameters updates to both devlink API and
>>> mlx5 driver,
>>>
>>> 1) Devlink changes: (Moshe Shemesh)
>>> The first two patches fix devlink param infrastructure for string type
>>> params.
>>> The third patch adds a devlink helper function to safely copy string from
>>> driver to devlink.
>>> The forth patch adds extack support for param set.
>>>
>>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
>>> Next three patches add new devlink driver specific params for controlling
>>> congestion action and mode, using string type params and extack messages support.
>>>
>>> This congestion mode enables hw workaround in specific devices which is
>>> controlled by devlink driver-specific params. The workaround is device
>>> specific for this NIC generation, the next NIC will not need it.
>>>
>>> Congestion parameters:
>>>  - Congestion action
>>>             HW W/A mechanism in the PCIe buffer which monitors the amount of
>>>             consumed PCIe buffer per host.  This mechanism supports the
>>>             following actions in case of threshold overflow:
>>>             - Disabled - NOP (Default)
>>>             - Drop
>>>             - Mark - Mark CE bit in the CQE of received packet
>>>     - Congestion mode
>>>             - Aggressive - Aggressive static trigger threshold (Default)
>>>             - Dynamic - Dynamically change the trigger threshold
>>>
>>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
>>> Last three patches, add the support for capturing region snapshot of the
>>> firmware crspace during critical errors, using devlink region_snapshot
>>> parameter.
>>>
>>> -Saeed.
>>>
>>> ----------------------------------------------------------------
>>> Alex Vesker (3):
>>>       net/mlx5: Add Vendor Specific Capability access gateway
>>>       net/mlx5: Add Crdump FW snapshot support
>>>       net/mlx5: Use devlink region_snapshot parameter
>>>
>>> Eran Ben Elisha (3):
>>>       net/mlx5: Move all devlink related functions calls to devlink.c
>>>       net/mlx5: Add MPEGC register configuration functionality
>>>       net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
>>>
>>> Moshe Shemesh (4):
>>>       devlink: Fix param set handling for string type
>>>       devlink: Fix param cmode driverinit for string type
>>>       devlink: Add helper function for safely copy string param
>>>       devlink: Add extack messages support to param set
>>>
>>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>>>  drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +++++++++++++++++++++
>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
>>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +++++++++++++++++
>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
>>>  include/linux/mlx5/driver.h                        |   5 +
>>>  include/net/devlink.h                              |  15 +-
>>>  net/core/devlink.c                                 |  44 ++-
>>>  14 files changed, 1076 insertions(+), 17 deletions(-)
>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
>>
>> So after looking over the patch set the one thing I would ask for in
>> this is some sort of documentation at a minimum. As a user I don't see
>> how you can expect someone to be able to use this when the naming of
>> things are pretty cryptic and there is no real explanation anywhere if
>> you don't go through and read the patch description itself. When you
>> start adding driver specific interfaces, you should at least start
>> adding vendor specific documentation.
>>
>
> Sure, sounds like a great idea, something like:
> Documentation/networking/mlx5.txt and have a devlink section ?
> or have a generic devlink doc and a mlx5 section in it ?

Either would work for me.

>> Also I don't see how using a vendor specific configuration space
>> section can be done without adding some tie-ins to the PCI core files
>> because it should be possible to race with someone poking at the
>> register space via something like setpci/lspci. Also one of the things
>> that came up was that drivers are not supposed to be banging on the
>> PCI configuration space at will, and it seems like this patch set is
>> doing exactly that through the VSC block.
>>
>
> this is a whole different feature than the device specific parameters.
> The whole vendor specific configuration space access is needed only
> for diagnostic/dump
> purposes when something really bad happens and the command interface
> with FW is down,
> and when the FW is un-responsive, we want to dump the crspace into the
> already existing devlink
> crdump buffer, how do you expect us to read it if we are not allowed
> to access it ?
>
> What do you mean by tie-ins to the PCI core files ? can you please elaborate ?

You have added a vendor specific config section and you are using it
to access several of the pieces of metadata. The setup isn't too
different than the VPD setup and approach. However I don't see many of
the protections that exist for VPD in place for this vendor specific
configuration. As such I have concerns. For example what is to keep
requests to the various devlink interfaces from racing with each other
when they both end up operating through the VCS?

- Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02  0:00 ` Jakub Kicinski
@ 2018-08-02  1:40   ` David Miller
  2018-08-02  8:29     ` Petr Machata
  2018-08-02 15:07     ` Eran Ben Elisha
  0 siblings, 2 replies; 32+ messages in thread
From: David Miller @ 2018-08-02  1:40 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: saeedm, netdev, jiri, alexander.duyck, helgaas

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 1 Aug 2018 17:00:47 -0700

> On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:
>> - According to the discussion outcome, we are keeping the congestion control
>>   setting as mlx5 device specific for the current HW generation.
> 
> I still see queuing and marking based on queue level.  You want to add 
> a Qdisc that will mirror your HW's behaviour to offload, if you really
> believe this is not a subset of RED, why not...  But devlink params?

I totally agree, devlink seems like absolutely to wrong level and set
of interfaces to be doing this stuff.

I will not pull these changes in and I probably should have not
accepted the DCB changes from the other day and they were sneakily
leading up to this crap.

Sorry, please follow Jakub's lead as I think his approach makes much
more technical sense than using devlink for this.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-01 23:13   ` Saeed Mahameed
  2018-08-02  0:36     ` Alexander Duyck
@ 2018-08-02  6:15     ` Jiri Pirko
  1 sibling, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2018-08-02  6:15 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Alexander Duyck, Saeed Mahameed, David S. Miller, Netdev,
	Jiri Pirko, Jakub Kicinski, Bjorn Helgaas

Thu, Aug 02, 2018 at 01:13:20AM CEST, saeedm@dev.mellanox.co.il wrote:

[...]

>>
>> So after looking over the patch set the one thing I would ask for in
>> this is some sort of documentation at a minimum. As a user I don't see
>> how you can expect someone to be able to use this when the naming of
>> things are pretty cryptic and there is no real explanation anywhere if
>> you don't go through and read the patch description itself. When you
>> start adding driver specific interfaces, you should at least start
>> adding vendor specific documentation.
>>
>
>Sure, sounds like a great idea, something like:
>Documentation/networking/mlx5.txt and have a devlink section ?
>or have a generic devlink doc and a mlx5 section in it ?

I think that Documentation/networking/devlink/mlx5.txt would be good.
We can have generic param description in:
Documentation/networking/devlink/generic.txt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02  1:40   ` David Miller
@ 2018-08-02  8:29     ` Petr Machata
  2018-08-02 17:11       ` Jakub Kicinski
  2018-08-02 15:07     ` Eran Ben Elisha
  1 sibling, 1 reply; 32+ messages in thread
From: Petr Machata @ 2018-08-02  8:29 UTC (permalink / raw)
  To: David Miller
  Cc: jakub.kicinski, saeedm, netdev, jiri, alexander.duyck, helgaas

David Miller <davem@davemloft.net> writes:

> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Wed, 1 Aug 2018 17:00:47 -0700
>
>> On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:
>>> - According to the discussion outcome, we are keeping the congestion control
>>>   setting as mlx5 device specific for the current HW generation.
>> 
>> I still see queuing and marking based on queue level.  You want to add 
>> a Qdisc that will mirror your HW's behaviour to offload, if you really
>> believe this is not a subset of RED, why not...  But devlink params?
>
> I totally agree, devlink seems like absolutely to wrong level and set
> of interfaces to be doing this stuff.
>
> I will not pull these changes in and I probably should have not
> accepted the DCB changes from the other day and they were sneakily
> leading up to this crap.

Are you talking about the recent additions of DCB helpers
dcb_ieee_getapp_prio_dscp_mask_map() etc.?

If yes, I can assure there were no sneaky intentions at all. I'm at a
loss to understand the relation to mlx5 team's decision to use devlink
for congestion control configuration.

Could you please clarify your remark?

Thanks,
Petr

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02  1:40   ` David Miller
  2018-08-02  8:29     ` Petr Machata
@ 2018-08-02 15:07     ` Eran Ben Elisha
  2018-08-02 22:53       ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Eran Ben Elisha @ 2018-08-02 15:07 UTC (permalink / raw)
  To: David Miller, jakub.kicinski
  Cc: saeedm, netdev, jiri, alexander.duyck, helgaas



On 8/2/2018 4:40 AM, David Miller wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Wed, 1 Aug 2018 17:00:47 -0700
> 
>> On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:
>>> - According to the discussion outcome, we are keeping the congestion control
>>>    setting as mlx5 device specific for the current HW generation.
>>
>> I still see queuing and marking based on queue level.  You want to add
>> a Qdisc that will mirror your HW's behaviour to offload, if you really
>> believe this is not a subset of RED, why not...  But devlink params?
> 
> I totally agree, devlink seems like absolutely to wrong level and set
> of interfaces to be doing this stuff.
> 
> I will not pull these changes in and I probably should have not
> accepted the DCB changes from the other day and they were sneakily
> leading up to this crap.
> 
> Sorry, please follow Jakub's lead as I think his approach makes much
> more technical sense than using devlink for this.
> 
> Thanks.
> 

Hi Dave,
I would like to re-state that this feature was not meant to be a generic 
one. This feature was added in order to resolve a HW bug which exist in
a small portion of our devices. Those params will be used only on those 
current HWs and won't be in use for our future devices.

During the discussions, several alternatives where offered to be used by 
various members of the community. These alternatives includes TC and 
enhancements to PCI configuration tools.

Regarding the TC, from my perspective, this is not an option as:
1) The HW mechanism handles multiple functions and therefore cannot be 
configured on as a regular TC
2) No PF + representors modeling can be applied here, this is a 
MultiHost environment where one host is not aware to the other hosts, 
and each is running on its own pci/driver. It is a device working mode 
configuration.
3) The current HW W/A is very limited, maybe it has a similar algorithm 
as WRED, but is being used for much simpler different use case (pci bus 
congestion). It cannot be compared to a standard TC capability 
(RED/WRED), and defining it as a offload fully controlled by the user 
will be a big misuse. (for example, drop rate cannot be configured)

regarding the PCI config tools, there was a consensus that such tool is 
not acceptable as it is not a part of the PCI spec.

Since module param/sysfs/debugfs/etc are no longer acceptable, and 
current drivers still desired with a way to do some configurations to 
the device/driver which cannot used standard Linux tool or by other 
vendors, devlink params was developed (under the assumption that this 
tool will be helpful for those needs, and those only).

 From my perspective, Devlink is the tool to configure the device for 
handling such unexpected bugs, i.e "PCIe buffer congestion handling 
workaround".

Thanks,
Eran

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02  8:29     ` Petr Machata
@ 2018-08-02 17:11       ` Jakub Kicinski
  2018-08-02 18:04         ` David Miller
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2018-08-02 17:11 UTC (permalink / raw)
  To: Petr Machata; +Cc: David Miller, saeedm, netdev, jiri, alexander.duyck, helgaas

On Thu, 02 Aug 2018 11:29:12 +0300, Petr Machata wrote:
> David Miller <davem@davemloft.net> writes:
> 
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Date: Wed, 1 Aug 2018 17:00:47 -0700
> >  
> >> On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:  
> >>> - According to the discussion outcome, we are keeping the congestion control
> >>>   setting as mlx5 device specific for the current HW generation.  
> >> 
> >> I still see queuing and marking based on queue level.  You want to add 
> >> a Qdisc that will mirror your HW's behaviour to offload, if you really
> >> believe this is not a subset of RED, why not...  But devlink params?  
> >
> > I totally agree, devlink seems like absolutely to wrong level and set
> > of interfaces to be doing this stuff.
> >
> > I will not pull these changes in and I probably should have not
> > accepted the DCB changes from the other day and they were sneakily
> > leading up to this crap.  
> 
> Are you talking about the recent additions of DCB helpers
> dcb_ieee_getapp_prio_dscp_mask_map() etc.?
> 
> If yes, I can assure there were no sneaky intentions at all. I'm at a
> loss to understand the relation to mlx5 team's decision to use devlink
> for congestion control configuration.
> 
> Could you please clarify your remark?

Oh, I think David meant the patches I was objecting to a while ago,
which were doing buffer configuration via the DCB API.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02 17:11       ` Jakub Kicinski
@ 2018-08-02 18:04         ` David Miller
  2018-08-02 20:10           ` Petr Machata
  0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2018-08-02 18:04 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: petrm, saeedm, netdev, jiri, alexander.duyck, helgaas

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Thu, 2 Aug 2018 10:11:12 -0700

> On Thu, 02 Aug 2018 11:29:12 +0300, Petr Machata wrote:
>> Could you please clarify your remark?
> 
> Oh, I think David meant the patches I was objecting to a while ago,
> which were doing buffer configuration via the DCB API.

Exactly.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02 18:04         ` David Miller
@ 2018-08-02 20:10           ` Petr Machata
  0 siblings, 0 replies; 32+ messages in thread
From: Petr Machata @ 2018-08-02 20:10 UTC (permalink / raw)
  To: David Miller
  Cc: jakub.kicinski, saeedm, netdev, jiri, alexander.duyck, helgaas

David Miller <davem@davemloft.net> writes:

> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Thu, 2 Aug 2018 10:11:12 -0700
>
>> On Thu, 02 Aug 2018 11:29:12 +0300, Petr Machata wrote:
>>> Could you please clarify your remark?
>> 
>> Oh, I think David meant the patches I was objecting to a while ago,
>> which were doing buffer configuration via the DCB API.
>
> Exactly.

Ah, OK then.

Thanks,
Petr

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02 15:07     ` Eran Ben Elisha
@ 2018-08-02 22:53       ` Jakub Kicinski
  2018-08-03 16:41         ` Ido Schimmel
  2018-08-06 13:01         ` Eran Ben Elisha
  0 siblings, 2 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-08-02 22:53 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: David Miller, saeedm, netdev, jiri, alexander.duyck, helgaas

On Thu, 2 Aug 2018 18:07:18 +0300, Eran Ben Elisha wrote:
> On 8/2/2018 4:40 AM, David Miller wrote:
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Date: Wed, 1 Aug 2018 17:00:47 -0700
> >   
> >> On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:  
> >>> - According to the discussion outcome, we are keeping the congestion control
> >>>    setting as mlx5 device specific for the current HW generation.  
> >>
> >> I still see queuing and marking based on queue level.  You want to add
> >> a Qdisc that will mirror your HW's behaviour to offload, if you really
> >> believe this is not a subset of RED, why not...  But devlink params?  
> > 
> > I totally agree, devlink seems like absolutely to wrong level and set
> > of interfaces to be doing this stuff.
> > 
> > I will not pull these changes in and I probably should have not
> > accepted the DCB changes from the other day and they were sneakily
> > leading up to this crap.
> > 
> > Sorry, please follow Jakub's lead as I think his approach makes much
> > more technical sense than using devlink for this.
> > 
> > Thanks.
> >   
> 
> Hi Dave,
> I would like to re-state that this feature was not meant to be a generic 
> one. This feature was added in order to resolve a HW bug which exist in
> a small portion of our devices. 

Would you mind describing the HW bug in more detail?  To a outside
reviewer it really looks like you're adding a feature.  What are you
working around?  Is the lack of full AQM on the PCIe side of the chip 
considered a bug?

> Those params will be used only on those current HWs and won't be in
> use for our future devices.

I'm glad that is your plan today, however, customers may get used to
the simple interface you're adding now.  This means the API you are
adding is effectively becoming an API other drivers may need to
implement to keep compatibility with someone's proprietary
orchestration.

> During the discussions, several alternatives where offered to be used by 
> various members of the community. These alternatives includes TC and 
> enhancements to PCI configuration tools.
> 
> Regarding the TC, from my perspective, this is not an option as:
> 1) The HW mechanism handles multiple functions and therefore cannot be 
> configured on as a regular TC

Could you elaborate?  What are the multiple functions?  You seem to be
adding a knob to enable ECN marking and a knob for choosing between
some predefined slopes.

In what way would your solution not behave like a RED offload?

With TC offload you'd also get a well-defined set of statistics, I
presume right now you're planning on adding a set of ethtool -S
counters?

> 2) No PF + representors modeling can be applied here, this is a 
> MultiHost environment where one host is not aware to the other hosts, 
> and each is running on its own pci/driver. It is a device working mode 
> configuration.

Yes, the multihost part makes it less pleasant.  But this is a problem
we have to tackle separately, at some point.  It's not a center of
attention here.

> 3) The current HW W/A is very limited, maybe it has a similar algorithm 
> as WRED, but is being used for much simpler different use case (pci bus 
> congestion).

No one is requesting full RED offload here..  if someone sets the
parameters you can't support you simply won't offload them.  And ignore
the parameters which only make sense in software terms.  Look at the
docs for mlxsw:

https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red

It says "not offloaded" in a number of places.

> It cannot be compared to a standard TC capability (RED/WRED), and
> defining it as a offload fully controlled by the user will be a big
> misuse. 

It's generally preferable to implement a subset of exiting well defined
API than create vendor knobs, hence hardly a misuse.

> (for example, drop rate cannot be configured)

I don't know what "configuring drop rate" means in case of RED..

> regarding the PCI config tools, there was a consensus that such tool is 
> not acceptable as it is not a part of the PCI spec.

As I said, this has nothing to do with PCI being the transport.  The
port you're running over could be serial, SPI or anything else.  You
have congestion on a port of a device, that's a networking problem.

> Since module param/sysfs/debugfs/etc are no longer acceptable, and 
> current drivers still desired with a way to do some configurations to 
> the device/driver which cannot used standard Linux tool or by other 
> vendors, devlink params was developed (under the assumption that this 
> tool will be helpful for those needs, and those only).
>
> From my perspective, Devlink is the tool to configure the device for 
> handling such unexpected bugs, i.e "PCIe buffer congestion handling 
> workaround".

Hm.  Are you calling it a bug because you had to work around silicon
limitation in firmware?  Hm.  I'm very intrigued by the framing :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02 22:53       ` Jakub Kicinski
@ 2018-08-03 16:41         ` Ido Schimmel
  2018-08-04  4:59           ` Jakub Kicinski
  2018-08-06 13:01         ` Eran Ben Elisha
  1 sibling, 1 reply; 32+ messages in thread
From: Ido Schimmel @ 2018-08-03 16:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eran Ben Elisha, David Miller, saeedm, netdev, jiri,
	alexander.duyck, helgaas

On Thu, Aug 02, 2018 at 03:53:15PM -0700, Jakub Kicinski wrote:
> No one is requesting full RED offload here..  if someone sets the
> parameters you can't support you simply won't offload them.  And ignore
> the parameters which only make sense in software terms.  Look at the
> docs for mlxsw:
> 
> https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> 
> It says "not offloaded" in a number of places.
> 
...
> It's generally preferable to implement a subset of exiting well defined
> API than create vendor knobs, hence hardly a misuse.

Sorry for derailing the discussion, but you mentioned some points that
have been bothering me for a while.

I think we didn't do a very good job with buffer management and this is
exactly why you see some parameters marked as "not offloaded". Take the
"limit" (queue size) for example. It's configured via devlink-sb, by
setting a quota on the number of bytes that can be queued for the port
and TC (queue) that RED manages. See:

https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#pool-binding

It would have been much better and user friendly to not ignore this
parameter and have users configure the limit using existing interfaces
(tc), instead of creating a discrepancy between the software and
hardware data paths by configuring the hardware directly via devlink-sb.

I believe devlink-sb is mainly the result of Linux's short comings in
this area and our lack of perspective back then. While the qdisc layer
(Linux's shared buffers) works for end hosts, it requires enhancements
(mainly on ingress) for switches (physical/virtual) that forward
packets.

For example, switches (I'm familiar with Mellanox ASICs, but I assume
the concept is similar in other ASICs) have ingress buffers where
packets are stored while going through the pipeline. Once out of the
pipeline you know from which port and queue the packet should egress. In
case you have both lossless and lossy traffic in your network you
probably want to classify it into different ingress buffers and mark the
buffers where the lossless traffic is stored as such, so that PFC frames
would be emitted above a certain threshold.

This is currently configured using dcbnl, but it lacks a software model
which means that packets that are forwarded by the kernel don't get the
same treatment (e.g., skb priority isn't set). It also means that when
you want to limit the number of packets that are queued *from* a certain
port and ingress buffer you resort to tools such as devlink-sb that end
up colliding with existing tools (tc).

I was thinking (not too much...) about modelling the above using ingress
qdiscs. They don't do any queueing, but more of accounting. Once the
egress qdisc dequeues the packet, you give credit back to the ingress
qdisc from which the packet came from. I believe that modelling these
buffers using the qdisc layer is the right abstraction.

Would appreciate hearing your thoughts on the above.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-03 16:41         ` Ido Schimmel
@ 2018-08-04  4:59           ` Jakub Kicinski
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-08-04  4:59 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Eran Ben Elisha, David Miller, saeedm, netdev, jiri,
	alexander.duyck, helgaas

On Fri, 3 Aug 2018 19:41:50 +0300, Ido Schimmel wrote:
> On Thu, Aug 02, 2018 at 03:53:15PM -0700, Jakub Kicinski wrote:
> > No one is requesting full RED offload here..  if someone sets the
> > parameters you can't support you simply won't offload them.  And ignore
> > the parameters which only make sense in software terms.  Look at the
> > docs for mlxsw:
> > 
> > https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> > 
> > It says "not offloaded" in a number of places.
> >   
> ...
> > It's generally preferable to implement a subset of exiting well defined
> > API than create vendor knobs, hence hardly a misuse.  
> 
> Sorry for derailing the discussion, but you mentioned some points that
> have been bothering me for a while.
> 
> I think we didn't do a very good job with buffer management and this is
> exactly why you see some parameters marked as "not offloaded". Take the
> "limit" (queue size) for example. It's configured via devlink-sb, by
> setting a quota on the number of bytes that can be queued for the port
> and TC (queue) that RED manages. See:
> 
> https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#pool-binding

FWIW I was implementing a very similar thing for the NFP a while back.
devlink-sb to configure per-port limits and RED offload.  I believe we
have some more qdisc offloads but out-of-tree/for appliances.
"Switchdev mode" + qdisc offloads work quite well.  For RED I think
we also don't offload the limit.

> It would have been much better and user friendly to not ignore this
> parameter and have users configure the limit using existing interfaces
> (tc), instead of creating a discrepancy between the software and
> hardware data paths by configuring the hardware directly via devlink-sb.
>
> I believe devlink-sb is mainly the result of Linux's short comings in
> this area and our lack of perspective back then. While the qdisc layer
> (Linux's shared buffers) works for end hosts, it requires enhancements
> (mainly on ingress) for switches (physical/virtual) that forward
> packets.

I could definitely agree with you.  But there is another way to look at
this.  Memory in ASICs is fundamentally more precious.  If the problem
was never solved for Linux (placing constraints on the number of
packets in the system by ingress port) maybe it's just not important
for software stacks?  Qdiscs are focused on egress.  Perhaps a better
software equivalent to Shared Buffers would be Jesper's Buffer Pools?

With Buffer Pools the concern that a pre-configured and pinned pool of
DMA-mapped pages will start growing and eat all host's memory is more
real.  That to me that's closer.  If we develop XDP-based fastpaths
with DMA pools shared between devices - that's much more like an ASIC's
SB.

In my view we don't offload the limit not because we configure it via
an different API, but because the limit assumes there is abundance of
memory and queue has to be capped.  Limit expresses how much queue
build up is okay, while SB config is strictly a resource quota.  In
practice the quota is always a lot lower than user's desired limit so
we don't even bother with the limit.

> For example, switches (I'm familiar with Mellanox ASICs, but I assume
> the concept is similar in other ASICs) have ingress buffers where
> packets are stored while going through the pipeline. Once out of the
> pipeline you know from which port and queue the packet should egress. In
> case you have both lossless and lossy traffic in your network you
> probably want to classify it into different ingress buffers and mark the
> buffers where the lossless traffic is stored as such, so that PFC frames
> would be emitted above a certain threshold.
> 
> This is currently configured using dcbnl, but it lacks a software model
> which means that packets that are forwarded by the kernel don't get the
> same treatment (e.g., skb priority isn't set). It also means that when
> you want to limit the number of packets that are queued *from* a certain
> port and ingress buffer you resort to tools such as devlink-sb that end
> up colliding with existing tools (tc).

Extending DCB further into the kernel on ingress does not seem
impossible.  Maybe the AVB/industrial folks will tackle that at some
point?

> I was thinking (not too much...) about modelling the above using ingress
> qdiscs. They don't do any queueing, but more of accounting. Once the
> egress qdisc dequeues the packet, you give credit back to the ingress
> qdisc from which the packet came from. I believe that modelling these
> buffers using the qdisc layer is the right abstraction.

Interesting.  My concern would be mapping the packet back to ingress
port to free the right qdisc credit.  MM direction, like Buffer Pools,
seem more viable to a layman like me.  But ingress qdiscs sound worth
exploring.

> Would appreciate hearing your thoughts on the above.

Thanks a lot for your response, you've certainly given me things to
think about over the weekend :)  A lot of cool things we can build if 
we keep moving forward :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-02 22:53       ` Jakub Kicinski
  2018-08-03 16:41         ` Ido Schimmel
@ 2018-08-06 13:01         ` Eran Ben Elisha
  2018-08-07  0:49           ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Eran Ben Elisha @ 2018-08-06 13:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, saeedm, netdev, jiri, alexander.duyck, helgaas

>>
>> Hi Dave,
>> I would like to re-state that this feature was not meant to be a generic
>> one. This feature was added in order to resolve a HW bug which exist in
>> a small portion of our devices.
> 
> Would you mind describing the HW bug in more detail?  To a outside
> reviewer it really looks like you're adding a feature.  What are you
> working around?  Is the lack of full AQM on the PCIe side of the chip
> considered a bug?

In multiple function environment, there is an issue with buffer 
allocation per function which may lead to starvation. There is an HW WA 
for mitigate this starvation by identifying this state and apply early 
drop/mark.

> 
>> Those params will be used only on those current HWs and won't be in
>> use for our future devices.
> 
> I'm glad that is your plan today, however, customers may get used to
> the simple interface you're adding now.  This means the API you are
> adding is effectively becoming an API other drivers may need to
> implement to keep compatibility with someone's proprietary
> orchestration.

This issue was refactored, thus no need to have this WA at all in future 
NICs. So I don't believe we will end up in the situation you are describing.
It is less likely that other vendors will be facing the same issue and 
will have to support such param. it was burn out of a bug and not as a 
feature which other may follow.

> 
>> During the discussions, several alternatives where offered to be used by
>> various members of the community. These alternatives includes TC and
>> enhancements to PCI configuration tools.
>>
>> Regarding the TC, from my perspective, this is not an option as:
>> 1) The HW mechanism handles multiple functions and therefore cannot be
>> configured on as a regular TC
> 
> Could you elaborate?  What are the multiple functions?  You seem to be
> adding a knob to enable ECN marking and a knob for choosing between
> some predefined slopes.

PSB, The sloped are dynamic and enabled in a dynamic way.
Indeed, we are adding a very specific knob for very non standard 
specific issue which can be used in addition to standard ECN marking.

> 
> In what way would your solution not behave like a RED offload?

Existing Algo (RED, PIE, etc) are static, configurable. Our HW WA is 
dynamic (dynamic slope), adjusted and auto enabled.

> 
> With TC offload you'd also get a well-defined set of statistics, I
> presume right now you're planning on adding a set of ethtool -S
> counters?
> 
>> 2) No PF + representors modeling can be applied here, this is a
>> MultiHost environment where one host is not aware to the other hosts,
>> and each is running on its own pci/driver. It is a device working mode
>> configuration.
> 
> Yes, the multihost part makes it less pleasant.  But this is a problem
> we have to tackle separately, at some point.  It's not a center of
> attention here.

Agree, however the multihost part makes it non-transparent if we chose a 
solution which is not based on direct vendor configuration. This will 
lead to a bad user experience.

> 
>> 3) The current HW W/A is very limited, maybe it has a similar algorithm
>> as WRED, but is being used for much simpler different use case (pci bus
>> congestion).
> 
> No one is requesting full RED offload here..  if someone sets the
> parameters you can't support you simply won't offload them.  And ignore
> the parameters which only make sense in software terms.  Look at the
> docs for mlxsw:
> 
> https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> 
> It says "not offloaded" in a number of places.
> 
>> It cannot be compared to a standard TC capability (RED/WRED), and
>> defining it as a offload fully controlled by the user will be a big
>> misuse.
> 
> It's generally preferable to implement a subset of exiting well defined
> API than create vendor knobs, hence hardly a misuse.

As written above, this is not the case here.

> 
>> (for example, drop rate cannot be configured)
> 
> I don't know what "configuring drop rate" means in case of RED..
> 
>> regarding the PCI config tools, there was a consensus that such tool is
>> not acceptable as it is not a part of the PCI spec.
> 
> As I said, this has nothing to do with PCI being the transport.  The
> port you're running over could be serial, SPI or anything else.  You
> have congestion on a port of a device, that's a networking problem.
> 
>> Since module param/sysfs/debugfs/etc are no longer acceptable, and
>> current drivers still desired with a way to do some configurations to
>> the device/driver which cannot used standard Linux tool or by other
>> vendors, devlink params was developed (under the assumption that this
>> tool will be helpful for those needs, and those only).
>>
>>  From my perspective, Devlink is the tool to configure the device for
>> handling such unexpected bugs, i.e "PCIe buffer congestion handling
>> workaround".
> 
> Hm.  Are you calling it a bug because you had to work around silicon
> limitation in firmware?  Hm.  I'm very intrigued by the framing :)
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-06 13:01         ` Eran Ben Elisha
@ 2018-08-07  0:49           ` Jakub Kicinski
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Kicinski @ 2018-08-07  0:49 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: David Miller, saeedm, netdev, jiri, alexander.duyck, helgaas

On Mon, 6 Aug 2018 16:01:25 +0300, Eran Ben Elisha wrote:
> >> Hi Dave,
> >> I would like to re-state that this feature was not meant to be a generic
> >> one. This feature was added in order to resolve a HW bug which exist in
> >> a small portion of our devices.  
> > 
> > Would you mind describing the HW bug in more detail?  To a outside
> > reviewer it really looks like you're adding a feature.  What are you
> > working around?  Is the lack of full AQM on the PCIe side of the chip
> > considered a bug?  
> 
> In multiple function environment, there is an issue with buffer 
> allocation per function which may lead to starvation. 

Multi-function?  I thought you have a PF per uplink on all mlx5
silicon.  Does the problem occur in single host scenarios as well?
What if with single function the host is too slow with taking packets
off the RX ring?  Can the problem occur?  Would this feature help in
such scenario as well?

> There is an HW WA for mitigate this starvation by identifying this
> state and apply early drop/mark.

If I understand you correctly you presently have one shared buffer with
no way to place limits (quotas) on how much of it can be consumed by a
traffic for a single PF.  It remains unclear why this have not been a
problem for you until now.

To avoid the starvation you are adding AQM which is a *feature* that
may help you avoid queue build up in the NIC.  But even if you could
place quotas, why would you not expose the AQM scheme?  It looks very
useful.

> >> Those params will be used only on those current HWs and won't be in
> >> use for our future devices.  
> > 
> > I'm glad that is your plan today, however, customers may get used to
> > the simple interface you're adding now.  This means the API you are
> > adding is effectively becoming an API other drivers may need to
> > implement to keep compatibility with someone's proprietary
> > orchestration.  
> 
> This issue was refactored, thus no need to have this WA at all in
> future NICs. So I don't believe we will end up in the situation you are
> describing. It is less likely that other vendors will be facing the
> same issue and will have to support such param. it was burn out of a
> bug and not as a feature which other may follow.

Sure other vendors may have buffers quotas configurable by e.g.
devlink-sb.  But the AQM you are adding is a feature which is
potentially already supported by others.

> >> During the discussions, several alternatives where offered to be
> >> used by various members of the community. These alternatives
> >> includes TC and enhancements to PCI configuration tools.
> >>
> >> Regarding the TC, from my perspective, this is not an option as:
> >> 1) The HW mechanism handles multiple functions and therefore
> >> cannot be configured on as a regular TC  
> > 
> > Could you elaborate?  What are the multiple functions?  You seem to
> > be adding a knob to enable ECN marking and a knob for choosing
> > between some predefined slopes.  
> 
> PSB, The sloped are dynamic and enabled in a dynamic way.
> Indeed, we are adding a very specific knob for very non standard 
> specific issue which can be used in addition to standard ECN marking.
> 
> > In what way would your solution not behave like a RED offload?  
> 
> Existing Algo (RED, PIE, etc) are static, configurable. Our HW WA is 
> dynamic (dynamic slope), adjusted and auto enabled.

You mean like Adaptive RED?  The lack of documentation is making this
conversation harder to have.  What's dynamic and aggressive?  These 
are not antonyms.  Are the parameters to the algorithm configurable?

Will you not want to expose the actual threshold and adjustment values
so the customers can tweak them on their own depending on the workload?

> > With TC offload you'd also get a well-defined set of statistics, I
> > presume right now you're planning on adding a set of ethtool -S
> > counters?
> >   
> >> 2) No PF + representors modeling can be applied here, this is a
> >> MultiHost environment where one host is not aware to the other
> >> hosts, and each is running on its own pci/driver. It is a device
> >> working mode configuration.  
> > 
> > Yes, the multihost part makes it less pleasant.  But this is a
> > problem we have to tackle separately, at some point.  It's not a
> > center of attention here.  
> 
> Agree, however the multihost part makes it non-transparent if we
> chose a solution which is not based on direct vendor configuration.
> This will lead to a bad user experience.

In my experience multi-host is not a major issue in practice.  And
switchdev mode gives some visibility into statistics of others hosts 
etc., which people appreciate.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
       [not found]       ` <2d84340e-0703-0bc7-4917-3b18979b2aa5@mellanox.com>
@ 2018-08-29 15:42         ` Alex Vesker
  2018-08-29 17:04           ` Alexander Duyck
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Vesker @ 2018-08-29 15:42 UTC (permalink / raw)
  To: Alexander Duyck, Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Netdev, Jiri Pirko,
	Jakub Kicinski, Bjorn Helgaas


> On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com> 
>>> wrote:
>>>> Hi Dave,
>>>>
>>>> This series provides devlink parameters updates to both devlink API 
>>>> and
>>>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a 
>>>> previous
>>>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
>>>> Devlink" to address review comments [1].
>>>>
>>>> Changes from the original series:
>>>> - According to the discussion outcome, we are keeping the 
>>>> congestion control
>>>>   setting as mlx5 device specific for the current HW generation.
>>>> - Changed the congestion_mode and congestion action param type to 
>>>> string
>>>> - Added patches to fix devlink handling of param type string
>>>> - Added a patch which adds extack messages support for param set.
>>>> - At the end of this series, I've added yet another mlx5 devlink 
>>>> related
>>>>  feature, firmware snapshot support.
>>>>
>>>> For more information please see tag log below.
>>>>
>>>> Please pull and let me know if there's any problem.
>>>>
>>>> [1] https://patchwork.ozlabs.org/patch/945996/
>>>>
>>>> Thanks,
>>>> Saeed.
>>>>
>>>> ---
>>>>
>>>> The following changes since commit 
>>>> e6476c21447c4b17c47e476aade6facf050f31e8:
>>>>
>>>>   net: remove bogus RCU annotations on socket.wq (2018-07-31 
>>>> 12:40:22 -0700)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
>>>> tags/mlx5-updates-2018-08-01
>>>>
>>>> for you to fetch changes up to 
>>>> 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>>>>
>>>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 
>>>> 14:49:09 -0700)
>>>>
>>>> ----------------------------------------------------------------
>>>> mlx5-updates-2018-08-01
>>>>
>>>> This series provides devlink parameters updates to both devlink API 
>>>> and
>>>> mlx5 driver,
>>>>
>>>> 1) Devlink changes: (Moshe Shemesh)
>>>> The first two patches fix devlink param infrastructure for string type
>>>> params.
>>>> The third patch adds a devlink helper function to safely copy 
>>>> string from
>>>> driver to devlink.
>>>> The forth patch adds extack support for param set.
>>>>
>>>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
>>>> Next three patches add new devlink driver specific params for 
>>>> controlling
>>>> congestion action and mode, using string type params and extack 
>>>> messages support.
>>>>
>>>> This congestion mode enables hw workaround in specific devices 
>>>> which is
>>>> controlled by devlink driver-specific params. The workaround is device
>>>> specific for this NIC generation, the next NIC will not need it.
>>>>
>>>> Congestion parameters:
>>>>  - Congestion action
>>>>             HW W/A mechanism in the PCIe buffer which monitors the 
>>>> amount of
>>>>             consumed PCIe buffer per host.  This mechanism supports 
>>>> the
>>>>             following actions in case of threshold overflow:
>>>>             - Disabled - NOP (Default)
>>>>             - Drop
>>>>             - Mark - Mark CE bit in the CQE of received packet
>>>>     - Congestion mode
>>>>             - Aggressive - Aggressive static trigger threshold 
>>>> (Default)
>>>>             - Dynamic - Dynamically change the trigger threshold
>>>>
>>>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
>>>> Last three patches, add the support for capturing region snapshot 
>>>> of the
>>>> firmware crspace during critical errors, using devlink region_snapshot
>>>> parameter.
>>>>
>>>> -Saeed.
>>>>
>>>> ----------------------------------------------------------------
>>>> Alex Vesker (3):
>>>>       net/mlx5: Add Vendor Specific Capability access gateway
>>>>       net/mlx5: Add Crdump FW snapshot support
>>>>       net/mlx5: Use devlink region_snapshot parameter
>>>>
>>>> Eran Ben Elisha (3):
>>>>       net/mlx5: Move all devlink related functions calls to devlink.c
>>>>       net/mlx5: Add MPEGC register configuration functionality
>>>>       net/mlx5: Enable PCIe buffer congestion handling workaround 
>>>> via devlink
>>>>
>>>> Moshe Shemesh (4):
>>>>       devlink: Fix param set handling for string type
>>>>       devlink: Fix param cmode driverinit for string type
>>>>       devlink: Add helper function for safely copy string param
>>>>       devlink: Add extack messages support to param set
>>>>
>>>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>>>>  drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
>>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 
>>>> +++++++++++++++++++++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>>>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>>>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 
>>>> +++++++++++++++++
>>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
>>>>  include/linux/mlx5/driver.h                        |   5 +
>>>>  include/net/devlink.h                              |  15 +-
>>>>  net/core/devlink.c                                 |  44 ++-
>>>>  14 files changed, 1076 insertions(+), 17 deletions(-)
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
>>>>  create mode 100644 
>>>> drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
>>>>  create mode 100644 
>>>> drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>>>  create mode 100644 
>>>> drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
>>>
>>> So after looking over the patch set the one thing I would ask for in
>>> this is some sort of documentation at a minimum. As a user I don't see
>>> how you can expect someone to be able to use this when the naming of
>>> things are pretty cryptic and there is no real explanation anywhere if
>>> you don't go through and read the patch description itself. When you
>>> start adding driver specific interfaces, you should at least start
>>> adding vendor specific documentation.
>>>
>>
>> Sure, sounds like a great idea, something like:
>> Documentation/networking/mlx5.txt and have a devlink section ?
>> or have a generic devlink doc and a mlx5 section in it ?
>
> Either would work for me.
>
For which patches are you missing documentation?

>>> Also I don't see how using a vendor specific configuration space
>>> section can be done without adding some tie-ins to the PCI core files
>>> because it should be possible to race with someone poking at the
>>> register space via something like setpci/lspci. Also one of the things
>>> that came up was that drivers are not supposed to be banging on the
>>> PCI configuration space at will, and it seems like this patch set is
>>> doing exactly that through the VSC block.
>>>
>>
>> this is a whole different feature than the device specific parameters.
>> The whole vendor specific configuration space access is needed only
>> for diagnostic/dump
>> purposes when something really bad happens and the command interface
>> with FW is down,
>> and when the FW is un-responsive, we want to dump the crspace into the
>> already existing devlink
>> crdump buffer, how do you expect us to read it if we are not allowed
>> to access it ?
>>
>> What do you mean by tie-ins to the PCI core files ? can you please 
>> elaborate ?
>
> You have added a vendor specific config section and you are using it
> to access several of the pieces of metadata. The setup isn't too
> different than the VPD setup and approach. However I don't see many of
> the protections that exist for VPD in place for this vendor specific
> configuration. As such I have concerns. For example what is to keep
> requests to the various devlink interfaces from racing with each other
> when they both end up operating through the VCS?
>
> - Alex

Hi, I would like to resubmit the devlink region crdump support for mlx5,
which is part of this patch-set.

AlexD, regarding the protection, various devlink interfaces cannot race 
since
devlink_mutex is used. The VSC access is also protected, using 
mlx5_vsc_gw_lock/unlock
only one can acquire the lock.
After explaining this I want to clarify something, the access to VSC is 
not user driven
it happens automatically by the driver when an error is detected to 
collect a crdump.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
  2018-08-29 15:42         ` Alex Vesker
@ 2018-08-29 17:04           ` Alexander Duyck
       [not found]             ` <5206dd74-432d-3342-2a48-3cdd1be8b5cb@mellanox.com>
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Duyck @ 2018-08-29 17:04 UTC (permalink / raw)
  To: valex
  Cc: Saeed Mahameed, Saeed Mahameed, David Miller, Netdev, Jiri Pirko,
	Jakub Kicinski, Bjorn Helgaas

On Wed, Aug 29, 2018 at 8:43 AM Alex Vesker <valex@mellanox.com> wrote:
>
>
> > On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed
> > <saeedm@dev.mellanox.co.il> wrote:
> >> On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
> >> <alexander.duyck@gmail.com> wrote:
> >>> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com>
> >>> wrote:
> >>>> Hi Dave,
> >>>>
> >>>> This series provides devlink parameters updates to both devlink API
> >>>> and
> >>>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a
> >>>> previous
> >>>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
> >>>> Devlink" to address review comments [1].
> >>>>
> >>>> Changes from the original series:
> >>>> - According to the discussion outcome, we are keeping the
> >>>> congestion control
> >>>>   setting as mlx5 device specific for the current HW generation.
> >>>> - Changed the congestion_mode and congestion action param type to
> >>>> string
> >>>> - Added patches to fix devlink handling of param type string
> >>>> - Added a patch which adds extack messages support for param set.
> >>>> - At the end of this series, I've added yet another mlx5 devlink
> >>>> related
> >>>>  feature, firmware snapshot support.
> >>>>
> >>>> For more information please see tag log below.
> >>>>
> >>>> Please pull and let me know if there's any problem.
> >>>>
> >>>> [1] https://patchwork.ozlabs.org/patch/945996/
> >>>>
> >>>> Thanks,
> >>>> Saeed.
> >>>>
> >>>> ---
> >>>>
> >>>> The following changes since commit
> >>>> e6476c21447c4b17c47e476aade6facf050f31e8:
> >>>>
> >>>>   net: remove bogus RCU annotations on socket.wq (2018-07-31
> >>>> 12:40:22 -0700)
> >>>>
> >>>> are available in the Git repository at:
> >>>>
> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
> >>>> tags/mlx5-updates-2018-08-01
> >>>>
> >>>> for you to fetch changes up to
> >>>> 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
> >>>>
> >>>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01
> >>>> 14:49:09 -0700)
> >>>>
> >>>> ----------------------------------------------------------------
> >>>> mlx5-updates-2018-08-01
> >>>>
> >>>> This series provides devlink parameters updates to both devlink API
> >>>> and
> >>>> mlx5 driver,
> >>>>
> >>>> 1) Devlink changes: (Moshe Shemesh)
> >>>> The first two patches fix devlink param infrastructure for string type
> >>>> params.
> >>>> The third patch adds a devlink helper function to safely copy
> >>>> string from
> >>>> driver to devlink.
> >>>> The forth patch adds extack support for param set.
> >>>>
> >>>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
> >>>> Next three patches add new devlink driver specific params for
> >>>> controlling
> >>>> congestion action and mode, using string type params and extack
> >>>> messages support.
> >>>>
> >>>> This congestion mode enables hw workaround in specific devices
> >>>> which is
> >>>> controlled by devlink driver-specific params. The workaround is device
> >>>> specific for this NIC generation, the next NIC will not need it.
> >>>>
> >>>> Congestion parameters:
> >>>>  - Congestion action
> >>>>             HW W/A mechanism in the PCIe buffer which monitors the
> >>>> amount of
> >>>>             consumed PCIe buffer per host.  This mechanism supports
> >>>> the
> >>>>             following actions in case of threshold overflow:
> >>>>             - Disabled - NOP (Default)
> >>>>             - Drop
> >>>>             - Mark - Mark CE bit in the CQE of received packet
> >>>>     - Congestion mode
> >>>>             - Aggressive - Aggressive static trigger threshold
> >>>> (Default)
> >>>>             - Dynamic - Dynamically change the trigger threshold
> >>>>
> >>>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
> >>>> Last three patches, add the support for capturing region snapshot
> >>>> of the
> >>>> firmware crspace during critical errors, using devlink region_snapshot
> >>>> parameter.
> >>>>
> >>>> -Saeed.
> >>>>
> >>>> ----------------------------------------------------------------
> >>>> Alex Vesker (3):
> >>>>       net/mlx5: Add Vendor Specific Capability access gateway
> >>>>       net/mlx5: Add Crdump FW snapshot support
> >>>>       net/mlx5: Use devlink region_snapshot parameter
> >>>>
> >>>> Eran Ben Elisha (3):
> >>>>       net/mlx5: Move all devlink related functions calls to devlink.c
> >>>>       net/mlx5: Add MPEGC register configuration functionality
> >>>>       net/mlx5: Enable PCIe buffer congestion handling workaround
> >>>> via devlink
> >>>>
> >>>> Moshe Shemesh (4):
> >>>>       devlink: Fix param set handling for string type
> >>>>       devlink: Fix param cmode driverinit for string type
> >>>>       devlink: Add helper function for safely copy string param
> >>>>       devlink: Add extack messages support to param set
> >>>>
> >>>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
> >>>>  drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
> >>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
> >>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388
> >>>> +++++++++++++++++++++
> >>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
> >>>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
> >>>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
> >>>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
> >>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320
> >>>> +++++++++++++++++
> >>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
> >>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
> >>>>  include/linux/mlx5/driver.h                        |   5 +
> >>>>  include/net/devlink.h                              |  15 +-
> >>>>  net/core/devlink.c                                 |  44 ++-
> >>>>  14 files changed, 1076 insertions(+), 17 deletions(-)
> >>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> >>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
> >>>>  create mode 100644
> >>>> drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
> >>>>  create mode 100644
> >>>> drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>>>  create mode 100644
> >>>> drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
> >>>
> >>> So after looking over the patch set the one thing I would ask for in
> >>> this is some sort of documentation at a minimum. As a user I don't see
> >>> how you can expect someone to be able to use this when the naming of
> >>> things are pretty cryptic and there is no real explanation anywhere if
> >>> you don't go through and read the patch description itself. When you
> >>> start adding driver specific interfaces, you should at least start
> >>> adding vendor specific documentation.
> >>>
> >>
> >> Sure, sounds like a great idea, something like:
> >> Documentation/networking/mlx5.txt and have a devlink section ?
> >> or have a generic devlink doc and a mlx5 section in it ?
> >
> > Either would work for me.
> >
> For which patches are you missing documentation?

Any interfaces that you are exposing via the devlink interface should
have a document somewhere that explains if, when, and how to use it.
All of these new interfaces you were adding in the patch set has no
explanation of what they are other than what is in the driver code
itself and most users don't want to try and decode the driver itself
to figure out what it is they need to do to use an interface.

> >>> Also I don't see how using a vendor specific configuration space
> >>> section can be done without adding some tie-ins to the PCI core files
> >>> because it should be possible to race with someone poking at the
> >>> register space via something like setpci/lspci. Also one of the things
> >>> that came up was that drivers are not supposed to be banging on the
> >>> PCI configuration space at will, and it seems like this patch set is
> >>> doing exactly that through the VSC block.
> >>>
> >>
> >> this is a whole different feature than the device specific parameters.
> >> The whole vendor specific configuration space access is needed only
> >> for diagnostic/dump
> >> purposes when something really bad happens and the command interface
> >> with FW is down,
> >> and when the FW is un-responsive, we want to dump the crspace into the
> >> already existing devlink
> >> crdump buffer, how do you expect us to read it if we are not allowed
> >> to access it ?
> >>
> >> What do you mean by tie-ins to the PCI core files ? can you please
> >> elaborate ?
> >
> > You have added a vendor specific config section and you are using it
> > to access several of the pieces of metadata. The setup isn't too
> > different than the VPD setup and approach. However I don't see many of
> > the protections that exist for VPD in place for this vendor specific
> > configuration. As such I have concerns. For example what is to keep
> > requests to the various devlink interfaces from racing with each other
> > when they both end up operating through the VCS?
> >
> > - Alex
>
> Hi, I would like to resubmit the devlink region crdump support for mlx5,
> which is part of this patch-set.
>
> AlexD, regarding the protection, various devlink interfaces cannot race
> since
> devlink_mutex is used. The VSC access is also protected, using
> mlx5_vsc_gw_lock/unlock
> only one can acquire the lock.

You are talking about from within the driver right? What about raw PCI
configuration space access? Is there anything to keep setpci from
causing issues when you are using the interface? Really creating a
vendor specific config block in and of itself creates a whole new
mess.

> After explaining this I want to clarify something, the access to VSC is
> not user driven
> it happens automatically by the driver when an error is detected to
> collect a crdump.

You don't prevent a user from being able to poke at the configuration
space via setpci.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
       [not found]             ` <5206dd74-432d-3342-2a48-3cdd1be8b5cb@mellanox.com>
@ 2018-08-30 15:39               ` Alexander Duyck
  0 siblings, 0 replies; 32+ messages in thread
From: Alexander Duyck @ 2018-08-30 15:39 UTC (permalink / raw)
  To: valex
  Cc: Erez Shitrit, Saeed Mahameed, Saeed Mahameed, David Miller,
	Netdev, Jiri Pirko, Jakub Kicinski, Bjorn Helgaas, linux-pci

I'm dropping all the old comments since the conversation was flattened
and only has one level of marks for everything.

On Thu, Aug 30, 2018 at 7:43 AM Alex Vesker <valex@mellanox.com> wrote:

<snip>

> To which devlink interfaces are you referring?

All of them. Not just the ones in this patch. If you are exposing an
interface to the user you should have documentation for it somewhere.
You should probably look at adding a patch to make certain you have
all the existing devlink interfaces in the driver documented.

I would like to see something added to the documentation folder that
explains what all the DEVLINK_PARAM_GENERIC interfaces are expected to
do, and maybe why I would use them. Then in addition I would like to
see per-driver documentation added for the DEVLINK_PARAM_DRIVER calls.
So for example I can't find any documentation in the kernel on what
enable_64b_cqe_eqe or enable_4k_uar do in mlx4 or why I would need
them, but you have them exposed as interfaces to userspace.

> There are 3 patches here that provide the crdump capability,
> these are the patches I would like to resubmit.
>
> net/mlx5: Add Vendor Specific Capability access gateway:
>     This is needed to read from the VSC by only the driver to collect a dump

You should probably work with the linux-pci mailing list on this bit
since you are exposing a new capability and they can probably point
you in the direction of how they want to deal with any potential races
in terms of access to the device versus your capability which you are
adding support for dumping via devlink.

> net/mlx5: Add Crdump FW snapshot support
>     This is code that collects the dump and registers a region called crdump
> net/mlx5: Use devlink region_snapshot parameter
>     Here I use an already implemented global param that specifies whether
>     snapshots are supported.
>
> The devlink region feature is well documented.

Where?

> can it be that you referring to devlink region called "crdump" which mlx5 exposes?

I don't care about the internals. I care about user available
documentation for the interface that is exposed. How do you expect the
user to use this functionality? That is what I want documented.

<snip>

> Will it be sufficient to prevent setcpi access using "pci_cfg_access_lock -
> any userspace reads or writes to config space and concurrent lock requests will sleep"
> otherwise do you have a different solution?

That sounds like a step in the right direction, but that is something
you should work with the linux-pci list on. My main concern is that I
don't want us being able to come at this interface from multiple
directions and screw things up.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-08-30 19:42 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
2018-08-01 21:52 ` [net-next 01/10] devlink: Fix param set handling for string type Saeed Mahameed
2018-08-01 22:33   ` Jakub Kicinski
2018-08-01 21:52 ` [net-next 02/10] devlink: Fix param cmode driverinit " Saeed Mahameed
2018-08-01 21:52 ` [net-next 03/10] devlink: Add helper function for safely copy string param Saeed Mahameed
2018-08-01 21:52 ` [net-next 04/10] devlink: Add extack messages support to param set Saeed Mahameed
2018-08-01 21:52 ` [net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c Saeed Mahameed
2018-08-01 21:52 ` [net-next 06/10] net/mlx5: Add MPEGC register configuration functionality Saeed Mahameed
2018-08-01 21:52 ` [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink Saeed Mahameed
2018-08-01 22:18   ` Alexander Duyck
2018-08-01 21:52 ` [net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway Saeed Mahameed
2018-08-01 21:52 ` [net-next 09/10] net/mlx5: Add Crdump FW snapshot support Saeed Mahameed
2018-08-01 21:52 ` [net-next 10/10] net/mlx5: Use devlink region_snapshot parameter Saeed Mahameed
2018-08-01 22:34 ` [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Alexander Duyck
2018-08-01 23:13   ` Saeed Mahameed
2018-08-02  0:36     ` Alexander Duyck
     [not found]       ` <2d84340e-0703-0bc7-4917-3b18979b2aa5@mellanox.com>
2018-08-29 15:42         ` Alex Vesker
2018-08-29 17:04           ` Alexander Duyck
     [not found]             ` <5206dd74-432d-3342-2a48-3cdd1be8b5cb@mellanox.com>
2018-08-30 15:39               ` Alexander Duyck
2018-08-02  6:15     ` Jiri Pirko
2018-08-02  0:00 ` Jakub Kicinski
2018-08-02  1:40   ` David Miller
2018-08-02  8:29     ` Petr Machata
2018-08-02 17:11       ` Jakub Kicinski
2018-08-02 18:04         ` David Miller
2018-08-02 20:10           ` Petr Machata
2018-08-02 15:07     ` Eran Ben Elisha
2018-08-02 22:53       ` Jakub Kicinski
2018-08-03 16:41         ` Ido Schimmel
2018-08-04  4:59           ` Jakub Kicinski
2018-08-06 13:01         ` Eran Ben Elisha
2018-08-07  0:49           ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.