linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next RFC v2 00/13] Add devlink reload action option
@ 2020-08-17  9:37 Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Moshe Shemesh
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Introduce a new option on the devlink reload API that lets the user select
the required reload action. Add full support for all actions in mlx5.
The following reload actions are supported:
  fw_live_patch: firmware live patching.
  driver_reinit: driver entities re-initialization, applying devlink-params
                 and devlink-resources values.
  fw_activate: firmware activate.

Each driver that supports this command should expose the reload actions
it supports.
The uAPI is backward compatible: if the reload action option is omitted
from the reload command, the driver_reinit action is used.
Note that some drivers may need to reload the driver in order to activate
firmware, while other drivers may need to reset the firmware in order to
reinitialize the driver entities.
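
The default and the supported-actions check described above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: the enum mirrors the order proposed in
this RFC, and resolve_reload_action() and ACTION_BIT() are hypothetical
names that exist only in this example.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace mirror of the proposed enum; values follow the order in this RFC. */
enum reload_action {
	RELOAD_ACTION_UNSPEC,
	RELOAD_ACTION_FW_LIVE_PATCH,
	RELOAD_ACTION_DRIVER_REINIT,
	RELOAD_ACTION_FW_ACTIVATE,

	__RELOAD_ACTION_MAX,
	RELOAD_ACTION_MAX = __RELOAD_ACTION_MAX - 1
};

/* Hypothetical helper macro: one bit per action in the driver's mask. */
#define ACTION_BIT(a)	(1UL << (a))

/*
 * Hypothetical helper: resolve the action for one reload request.
 * has_attr is false when the user omitted the action option, in which
 * case driver_reinit is used. Returns 0 and stores the action in *out,
 * or a negative errno value.
 */
static int resolve_reload_action(bool has_attr, unsigned int requested,
				 unsigned long supported,
				 enum reload_action *out)
{
	unsigned int action = has_attr ? requested : RELOAD_ACTION_DRIVER_REINIT;

	/* Reject values outside the known range. */
	if (action == RELOAD_ACTION_UNSPEC || action > RELOAD_ACTION_MAX)
		return -EINVAL;

	/* Reject actions the driver did not declare. */
	if (!(supported & ACTION_BIT(action)))
		return -EOPNOTSUPP;

	*out = (enum reload_action)action;
	return 0;
}
```

With a driver mask that contains only driver_reinit, a request without
the option resolves to driver_reinit, while a request for fw_activate is
refused with -EOPNOTSUPP.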

Patch 1 adds the new API reload action option to devlink.
Patch 2 exposes the supported reload actions on devlink dev get.
Patches 3-8 add support on mlx5 for devlink reload action fw_activate
            and handle the firmware reset events.
Patches 9-10 add the devlink enable_remote_dev_reset parameter and use
             it in mlx5.
Patches 11-12 add mlx5 support for the devlink reload live patch action
              and its event handling.
Patch 13 adds the documentation file devlink-reload.rst.

Command examples:

# Run reload command with fw activate reload action:
$ devlink dev reload pci/0000:82:00.0 action fw_activate

# Run reload command with driver reinit reload action:
$ devlink dev reload pci/0000:82:00.0 action driver_reinit

# Run reload command with fw live patch reload action:
$ devlink dev reload pci/0000:82:00.0 action fw_live_patch

v1 -> v2:
- Replace the reload levels driver, fw_reset, fw_live_patch with the
  reload actions driver_reinit, fw_activate, fw_live_patch
- Remove the driver default level; the driver_reinit action is the
  default action for all drivers


Moshe Shemesh (13):
  devlink: Add reload action option to devlink reload command
  devlink: Add supported reload actions to dev get
  net/mlx5: Add functions to set/query MFRL register
  net/mlx5: Set cap for pci sync for fw update event
  net/mlx5: Handle sync reset request event
  net/mlx5: Handle sync reset now event
  net/mlx5: Handle sync reset abort event
  net/mlx5: Add support for devlink reload action fw activate
  devlink: Add enable_remote_dev_reset generic parameter
  net/mlx5: Add devlink param enable_remote_dev_reset support
  net/mlx5: Add support for fw live patch event
  net/mlx5: Add support for devlink reload action live patch
  devlink: Add Documentation/networking/devlink/devlink-reload.rst

 .../networking/devlink/devlink-params.rst     |   6 +
 .../networking/devlink/devlink-reload.rst     |  54 +++
 Documentation/networking/devlink/index.rst    |   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c     |   4 +-
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 104 +++-
 .../mellanox/mlx5/core/diag/fw_tracer.c       |  31 ++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |   1 +
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 448 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/fw_reset.h    |  19 +
 .../net/ethernet/mellanox/mlx5/core/health.c  |  35 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |  13 +
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 drivers/net/ethernet/mellanox/mlxsw/core.c    |   6 +-
 drivers/net/netdevsim/dev.c                   |   5 +-
 include/linux/mlx5/device.h                   |   1 +
 include/linux/mlx5/driver.h                   |   4 +
 include/net/devlink.h                         |   9 +-
 include/uapi/linux/devlink.h                  |  20 +
 net/core/devlink.c                            |  84 +++-
 20 files changed, 812 insertions(+), 37 deletions(-)
 create mode 100644 Documentation/networking/devlink/devlink-reload.rst
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h

-- 
2.17.1



* [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17 16:16   ` Jakub Kicinski
  2020-08-17 16:36   ` Jiri Pirko
  2020-08-17  9:37 ` [PATCH net-next RFC v2 02/13] devlink: Add supported reload actions to dev get Moshe Shemesh
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Add a devlink reload action option to allow the user to request a specific
reload action. The action parameter is optional; if it is not specified,
the driver re-init action is used (backward compatible).
Note that some drivers may need to reload the driver in order to activate
firmware, while other drivers may need to reset the firmware in order to
reinitialize the driver entities.
The supported reload actions are:
driver_reinit: driver entities re-initialization, applying devlink-param
               and devlink-resource values.
fw_activate: firmware activate.
fw_live_patch: firmware live patching.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Replace the reload levels driver, fw_reset, fw_live_patch with the
  reload actions driver_reinit, fw_activate, fw_live_patch
- Remove the driver default level; the driver_reinit action is the
  default action for all drivers
---
 drivers/net/ethernet/mellanox/mlx4/main.c     |  4 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  4 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  6 ++-
 drivers/net/netdevsim/dev.c                   |  5 +-
 include/net/devlink.h                         |  5 +-
 include/uapi/linux/devlink.h                  | 19 +++++++
 net/core/devlink.c                            | 51 +++++++++++++++++--
 7 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 258c7a96f269..e7df4975bea3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3935,6 +3935,7 @@ static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload,
 			       struct devlink *devlink);
 
 static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change,
+				    enum devlink_reload_action action,
 				    struct netlink_ext_ack *extack)
 {
 	struct mlx4_priv *priv = devlink_priv(devlink);
@@ -3951,7 +3952,7 @@ static int mlx4_devlink_reload_down(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
-static int mlx4_devlink_reload_up(struct devlink *devlink,
+static int mlx4_devlink_reload_up(struct devlink *devlink, enum devlink_reload_action action,
 				  struct netlink_ext_ack *extack)
 {
 	struct mlx4_priv *priv = devlink_priv(devlink);
@@ -3969,6 +3970,7 @@ static int mlx4_devlink_reload_up(struct devlink *devlink,
 
 static const struct devlink_ops mlx4_devlink_ops = {
 	.port_type_set	= mlx4_devlink_port_type_set,
+	.supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT),
 	.reload_down	= mlx4_devlink_reload_down,
 	.reload_up	= mlx4_devlink_reload_up,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index c709e9a385f6..dfdf48869f70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -89,6 +89,7 @@ mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 }
 
 static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
+				    enum devlink_reload_action action,
 				    struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
@@ -97,7 +98,7 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
-static int mlx5_devlink_reload_up(struct devlink *devlink,
+static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_action action,
 				  struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
@@ -118,6 +119,7 @@ static const struct devlink_ops mlx5_devlink_ops = {
 #endif
 	.flash_update = mlx5_devlink_flash_update,
 	.info_get = mlx5_devlink_info_get,
+	.supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT),
 	.reload_down = mlx5_devlink_reload_down,
 	.reload_up = mlx5_devlink_reload_up,
 };
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 08d101138fbe..f67c5aa2a86f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1113,7 +1113,7 @@ mlxsw_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 
 static int
 mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
-					  bool netns_change,
+					  bool netns_change, enum devlink_reload_action action,
 					  struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
@@ -1126,7 +1126,7 @@ mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
 }
 
 static int
-mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink,
+mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_action action,
 					struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
@@ -1268,6 +1268,8 @@ mlxsw_devlink_trap_policer_counter_get(struct devlink *devlink,
 }
 
 static const struct devlink_ops mlxsw_devlink_ops = {
+	.supported_reload_actions	= BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
+					  BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
 	.reload_down		= mlxsw_devlink_core_bus_device_reload_down,
 	.reload_up		= mlxsw_devlink_core_bus_device_reload_up,
 	.port_type_set			= mlxsw_devlink_port_type_set,
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 32f339fedb21..c212f502052c 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -697,7 +697,7 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev);
 
 static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change,
-				struct netlink_ext_ack *extack)
+				enum devlink_reload_action action, struct netlink_ext_ack *extack)
 {
 	struct nsim_dev *nsim_dev = devlink_priv(devlink);
 
@@ -713,7 +713,7 @@ static int nsim_dev_reload_down(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
-static int nsim_dev_reload_up(struct devlink *devlink,
+static int nsim_dev_reload_up(struct devlink *devlink, enum devlink_reload_action action,
 			      struct netlink_ext_ack *extack)
 {
 	struct nsim_dev *nsim_dev = devlink_priv(devlink);
@@ -875,6 +875,7 @@ nsim_dev_devlink_trap_policer_counter_get(struct devlink *devlink,
 }
 
 static const struct devlink_ops nsim_dev_devlink_ops = {
+	.supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT),
 	.reload_down = nsim_dev_reload_down,
 	.reload_up = nsim_dev_reload_up,
 	.info_get = nsim_dev_info_get,
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8f3c8a443238..cad3e11d0b9b 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -991,9 +991,10 @@ enum devlink_trap_group_generic_id {
 	}
 
 struct devlink_ops {
+	unsigned long supported_reload_actions;
 	int (*reload_down)(struct devlink *devlink, bool netns_change,
-			   struct netlink_ext_ack *extack);
-	int (*reload_up)(struct devlink *devlink,
+			   enum devlink_reload_action action, struct netlink_ext_ack *extack);
+	int (*reload_up)(struct devlink *devlink, enum devlink_reload_action action,
 			 struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
 			     enum devlink_port_type port_type);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..6728029d2e1e 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -272,6 +272,23 @@ enum {
 	DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE,
 };
 
+/**
+ * enum devlink_reload_action - Reload action.
+ * @DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH: FW live patching.
+ * @DEVLINK_RELOAD_ACTION_DRIVER_REINIT: Driver entities re-instantiation.
+ * @DEVLINK_RELOAD_ACTION_FW_ACTIVATE: FW activate.
+ */
+enum devlink_reload_action {
+	DEVLINK_RELOAD_ACTION_UNSPEC,
+	DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH,
+	DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
+	DEVLINK_RELOAD_ACTION_FW_ACTIVATE,
+
+	/* Add new reload actions above */
+	__DEVLINK_RELOAD_ACTION_MAX,
+	DEVLINK_RELOAD_ACTION_MAX = __DEVLINK_RELOAD_ACTION_MAX - 1
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -458,6 +475,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_RELOAD_ACTION,		/* u8 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e674f0f46dc2..88438ffd6015 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -462,6 +462,12 @@ static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
 	return 0;
 }
 
+static bool
+devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action)
+{
+	return test_bit(action, &devlink->ops->supported_reload_actions);
+}
+
 static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
 			   enum devlink_command cmd, u32 portid,
 			   u32 seq, int flags)
@@ -2964,21 +2970,21 @@ bool devlink_is_reload_failed(const struct devlink *devlink)
 EXPORT_SYMBOL_GPL(devlink_is_reload_failed);
 
 static int devlink_reload(struct devlink *devlink, struct net *dest_net,
-			  struct netlink_ext_ack *extack)
+			  enum devlink_reload_action action, struct netlink_ext_ack *extack)
 {
 	int err;
 
 	if (!devlink->reload_enabled)
 		return -EOPNOTSUPP;
 
-	err = devlink->ops->reload_down(devlink, !!dest_net, extack);
+	err = devlink->ops->reload_down(devlink, !!dest_net, action, extack);
 	if (err)
 		return err;
 
 	if (dest_net && !net_eq(dest_net, devlink_net(devlink)))
 		devlink_reload_netns_change(devlink, dest_net);
 
-	err = devlink->ops->reload_up(devlink, extack);
+	err = devlink->ops->reload_up(devlink, action, extack);
 	devlink_reload_failed_set(devlink, !!err);
 	return err;
 }
@@ -2986,6 +2992,7 @@ static int devlink_reload(struct devlink *devlink, struct net *dest_net,
 static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 {
 	struct devlink *devlink = info->user_ptr[0];
+	enum devlink_reload_action action;
 	struct net *dest_net = NULL;
 	int err;
 
@@ -3006,7 +3013,20 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 			return PTR_ERR(dest_net);
 	}
 
-	err = devlink_reload(devlink, dest_net, info->extack);
+	if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION])
+		action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]);
+	else
+		action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT;
+
+	if (action == DEVLINK_RELOAD_ACTION_UNSPEC || action > DEVLINK_RELOAD_ACTION_MAX) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Invalid reload action");
+		return -EINVAL;
+	} else if (!devlink_reload_action_is_supported(devlink, action)) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Requested reload action is not supported");
+		return -EOPNOTSUPP;
+	}
+
+	err = devlink_reload(devlink, dest_net, action, info->extack);
 
 	if (dest_net)
 		put_net(dest_net);
@@ -7039,6 +7059,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED },
+	[DEVLINK_ATTR_RELOAD_ACTION] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -7364,6 +7385,20 @@ static struct genl_family devlink_nl_family __ro_after_init = {
 	.n_mcgrps	= ARRAY_SIZE(devlink_nl_mcgrps),
 };
 
+static int devlink_reload_actions_verify(struct devlink *devlink)
+{
+	const struct devlink_ops *ops;
+
+	if (!devlink_reload_supported(devlink))
+		return 0;
+
+	ops = devlink->ops;
+	if (WARN_ON(ops->supported_reload_actions >= BIT(__DEVLINK_RELOAD_ACTION_MAX) ||
+		    ops->supported_reload_actions <= BIT(DEVLINK_RELOAD_ACTION_UNSPEC)))
+		return -EINVAL;
+	return 0;
+}
+
 /**
  *	devlink_alloc - Allocate new devlink instance resources
  *
@@ -7384,6 +7419,11 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	if (!devlink)
 		return NULL;
 	devlink->ops = ops;
+	if (devlink_reload_actions_verify(devlink)) {
+		kfree(devlink);
+		return NULL;
+	}
+
 	xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC);
 	__devlink_net_set(devlink, &init_net);
 	INIT_LIST_HEAD(&devlink->port_list);
@@ -9615,7 +9655,8 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
 		if (net_eq(devlink_net(devlink), net)) {
 			if (WARN_ON(!devlink_reload_supported(devlink)))
 				continue;
-			err = devlink_reload(devlink, &init_net, NULL);
+			err = devlink_reload(devlink, &init_net,
+					     DEVLINK_RELOAD_ACTION_DRIVER_REINIT, NULL);
 			if (err && err != -EOPNOTSUPP)
 				pr_warn("Failed to reload devlink instance into init_net\n");
 		}
-- 
2.17.1



* [PATCH net-next RFC v2 02/13] devlink: Add supported reload actions to dev get
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 03/13] net/mlx5: Add functions to set/query MFRL register Moshe Shemesh
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Expose the supported devlink reload actions to the user through the
devlink dev get command.

Examples:
$ devlink dev show
pci/0000:82:00.0:
  supported_reload_actions:
    fw_live_patch driver_reinit fw_activate
pci/0000:82:00.1:
  supported_reload_actions:
    fw_live_patch driver_reinit fw_activate

$ devlink dev show -jp
{
    "dev": {
        "pci/0000:82:00.0": {
            "supported_reload_actions": [ "fw_live_patch","driver_reinit","fw_activate" ]
        },
        "pci/0000:82:00.1": {
            "supported_reload_actions": [ "fw_live_patch","driver_reinit","fw_activate" ]
        }
    }
}
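
The listing shown in the examples can be sketched as a walk over a
supported-actions bitmask in enum order. This is a userspace illustration
only, not the devlink or iproute2 code: the enum mirrors the order
proposed in this series, and reload_action_names[] and
format_supported_actions() are hypothetical names used just for this
example.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace mirror of the proposed enum; values follow the order in this RFC. */
enum reload_action {
	RELOAD_ACTION_UNSPEC,
	RELOAD_ACTION_FW_LIVE_PATCH,
	RELOAD_ACTION_DRIVER_REINIT,
	RELOAD_ACTION_FW_ACTIVATE,

	__RELOAD_ACTION_MAX,
	RELOAD_ACTION_MAX = __RELOAD_ACTION_MAX - 1
};

/* Names as they appear in the command examples. */
static const char *const reload_action_names[] = {
	[RELOAD_ACTION_FW_LIVE_PATCH] = "fw_live_patch",
	[RELOAD_ACTION_DRIVER_REINIT] = "driver_reinit",
	[RELOAD_ACTION_FW_ACTIVATE] = "fw_activate",
};

/*
 * Hypothetical helper: write the names of all actions set in mask into
 * buf, separated by single spaces. Returns the number of actions
 * written, or -1 if buf is too small.
 */
static int format_supported_actions(unsigned long mask, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;
	int i;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	/* Skip UNSPEC; it is not a real action. */
	for (i = RELOAD_ACTION_UNSPEC + 1; i <= RELOAD_ACTION_MAX; i++) {
		int n;

		if (!(mask & (1UL << i)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", reload_action_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;

		used += (size_t)n;
		count++;
	}

	return count;
}
```

Because the walk follows enum order, a mask with all three bits set is
listed as fw_live_patch, driver_reinit, fw_activate, matching the order
in the output above.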

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Removed DEVLINK_ATTR_RELOAD_DEFAULT_LEVEL
- Removed DEVLINK_ATTR_RELOAD_LEVELS_INFO
- Have actions instead of levels
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 28 +++++++++++++++++++++++-----
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6728029d2e1e..803a9717110c 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -476,6 +476,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
 	DEVLINK_ATTR_RELOAD_ACTION,		/* u8 */
+	DEVLINK_ATTR_RELOAD_SUPPORTED_ACTIONS,	/* nested */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 88438ffd6015..6bab1b02ca99 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -462,6 +462,11 @@ static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
 	return 0;
 }
 
+static bool devlink_reload_supported(struct devlink *devlink)
+{
+	return devlink->ops->reload_down && devlink->ops->reload_up;
+}
+
 static bool
 devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action)
 {
@@ -472,7 +477,9 @@ static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
 			   enum devlink_command cmd, u32 portid,
 			   u32 seq, int flags)
 {
+	struct nlattr *supported_actions;
 	void *hdr;
+	int i;
 
 	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
 	if (!hdr)
@@ -483,9 +490,25 @@ static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
 	if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed))
 		goto nla_put_failure;
 
+	if (devlink_reload_supported(devlink)) {
+		supported_actions = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_SUPPORTED_ACTIONS);
+		if (!supported_actions)
+			goto nla_put_failure;
+
+		for (i = 0; i <= DEVLINK_RELOAD_ACTION_MAX; i++) {
+			if (!devlink_reload_action_is_supported(devlink, i))
+				continue;
+			if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_ACTION, i))
+				goto supported_actions_nest_cancel;
+		}
+		nla_nest_end(msg, supported_actions);
+	}
+
 	genlmsg_end(msg, hdr);
 	return 0;
 
+supported_actions_nest_cancel:
+	nla_nest_cancel(msg, supported_actions);
 nla_put_failure:
 	genlmsg_cancel(msg, hdr);
 	return -EMSGSIZE;
@@ -2949,11 +2972,6 @@ static void devlink_reload_netns_change(struct devlink *devlink,
 				     DEVLINK_CMD_PARAM_NEW);
 }
 
-static bool devlink_reload_supported(const struct devlink *devlink)
-{
-	return devlink->ops->reload_down && devlink->ops->reload_up;
-}
-
 static void devlink_reload_failed_set(struct devlink *devlink,
 				      bool reload_failed)
 {
-- 
2.17.1



* [PATCH net-next RFC v2 03/13] net/mlx5: Add functions to set/query MFRL register
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 02/13] devlink: Add supported reload actions to dev get Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 04/13] net/mlx5: Set cap for pci sync for fw update event Moshe Shemesh
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Add functions to query and set the MFRL reset options supported by
firmware.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 46 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/fw_reset.h    | 13 ++++++
 3 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 10e6886c96ba..4d45a2f6fed6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,7 +16,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
 		lib/devcom.o lib/pci_vsc.o lib/dm.o diag/fs_tracepoint.o \
-		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o
+		diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o fw_reset.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
new file mode 100644
index 000000000000..76d2cece29ac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020, Mellanox Technologies inc.  All rights reserved. */
+
+#include "fw_reset.h"
+
+static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level,
+			     u8 reset_type_sel, u8 sync_resp, bool sync_start)
+{
+	u32 out[MLX5_ST_SZ_DW(mfrl_reg)] = {};
+	u32 in[MLX5_ST_SZ_DW(mfrl_reg)] = {};
+
+	MLX5_SET(mfrl_reg, in, reset_level, reset_level);
+	MLX5_SET(mfrl_reg, in, rst_type_sel, reset_type_sel);
+	MLX5_SET(mfrl_reg, in, pci_sync_for_fw_update_resp, sync_resp);
+	MLX5_SET(mfrl_reg, in, pci_sync_for_fw_update_start, sync_start);
+
+	return mlx5_core_access_reg(dev, in, sizeof(in), out, sizeof(out), MLX5_REG_MFRL, 0, 1);
+}
+
+int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_type)
+{
+	u32 out[MLX5_ST_SZ_DW(mfrl_reg)] = {};
+	u32 in[MLX5_ST_SZ_DW(mfrl_reg)] = {};
+	int err;
+
+	err = mlx5_core_access_reg(dev, in, sizeof(in), out, sizeof(out), MLX5_REG_MFRL, 0, 0);
+	if (err)
+		return err;
+
+	if (reset_level)
+		*reset_level = MLX5_GET(mfrl_reg, out, reset_level);
+	if (reset_type)
+		*reset_type = MLX5_GET(mfrl_reg, out, reset_type);
+
+	return 0;
+}
+
+int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel)
+{
+	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, reset_type_sel, 0, true);
+}
+
+int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev)
+{
+	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
new file mode 100644
index 000000000000..1bbd95182ca6
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020, Mellanox Technologies inc.  All rights reserved. */
+
+#ifndef __MLX5_FW_RESET_H
+#define __MLX5_FW_RESET_H
+
+#include "mlx5_core.h"
+
+int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_type);
+int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel);
+int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev);
+
+#endif
-- 
2.17.1



* [PATCH net-next RFC v2 04/13] net/mlx5: Set cap for pci sync for fw update event
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (2 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 03/13] net/mlx5: Add functions to set/query MFRL register Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 05/13] net/mlx5: Handle sync reset request event Moshe Shemesh
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Set the capability that notifies the firmware that this host driver is
capable of handling PCI sync for firmware update events.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ce43e3feccd9..871d28b09f8a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -548,6 +548,9 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 	if (MLX5_CAP_GEN_MAX(dev, dct))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, dct, 1);
 
+	if (MLX5_CAP_GEN_MAX(dev, pci_sync_for_fw_update_event))
+		MLX5_SET(cmd_hca_cap, set_hca_cap, pci_sync_for_fw_update_event, 1);
+
 	if (MLX5_CAP_GEN_MAX(dev, num_vhca_ports))
 		MLX5_SET(cmd_hca_cap,
 			 set_hca_cap,
-- 
2.17.1



* [PATCH net-next RFC v2 05/13] net/mlx5: Handle sync reset request event
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (3 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 04/13] net/mlx5: Set cap for pci sync for fw update event Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 06/13] net/mlx5: Handle sync reset now event Moshe Shemesh
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Once the driver gets a sync_reset_request event from firmware, it prepares
for the coming reset and sends an acknowledgment.
After this event the driver expects a device reset: either it will trigger
the PCI reset itself on a sync_reset_now event, or the PCI reset will be
triggered by another PF of the same device. So the driver moves to reset
requested mode, and if the PCI reset is triggered by the other PF, it
detects the reset and reloads.
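
The reset requested mode described above can be modelled as a small state
machine. The sketch below is a simplified userspace model, not the driver
code: struct sync_reset_state, on_sync_reset_request() and
poll_sync_reset_once() are hypothetical names, and the device_reset_seen
argument stands in for whatever check the driver uses to detect the
reset.

```c
#include <stdbool.h>

/* What one poll step asks the caller to do next. */
enum poll_result {
	POLL_IDLE,	/* no reset was requested, nothing to do */
	POLL_REARM,	/* reset still pending, poll again later */
	POLL_RELOAD,	/* device reset detected, reload the driver */
};

struct sync_reset_state {
	bool reset_requested;		/* set from the request until the reset is seen */
	bool health_poll_active;	/* regular health polling is paused meanwhile */
};

/* Model of handling the request event: pause health polling, enter the mode. */
static void on_sync_reset_request(struct sync_reset_state *s)
{
	s->health_poll_active = false;
	s->reset_requested = true;
}

/* Model of one timer tick while waiting for the reset. */
static enum poll_result poll_sync_reset_once(struct sync_reset_state *s,
					     bool device_reset_seen)
{
	if (!s->reset_requested)
		return POLL_IDLE;

	if (device_reset_seen) {
		/* Leave the mode first so a later tick does not reload twice. */
		s->reset_requested = false;
		return POLL_RELOAD;
	}

	return POLL_REARM;
}
```

In this model the flag is cleared before the reload is reported, so a
second tick that still sees the reset returns POLL_IDLE rather than
requesting another reload.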

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Moved handling of sync reset recovery from health to fw_reset
---
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 167 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/fw_reset.h    |   3 +
 .../net/ethernet/mellanox/mlx5/core/health.c  |  35 ++--
 .../net/ethernet/mellanox/mlx5/core/main.c    |  10 ++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 include/linux/mlx5/driver.h                   |   4 +
 6 files changed, 206 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 76d2cece29ac..0f224454b4a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -3,6 +3,20 @@
 
 #include "fw_reset.h"
 
+enum {
+	MLX5_FW_RESET_FLAGS_RESET_REQUESTED,
+};
+
+struct mlx5_fw_reset {
+	struct mlx5_core_dev *dev;
+	struct mlx5_nb nb;
+	struct workqueue_struct *wq;
+	struct work_struct reset_request_work;
+	struct work_struct reset_reload_work;
+	unsigned long reset_flags;
+	struct timer_list timer;
+};
+
 static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level,
 			     u8 reset_type_sel, u8 sync_resp, bool sync_start)
 {
@@ -44,3 +58,156 @@ int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev)
 {
 	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false);
 }
+
+static int mlx5_fw_set_reset_sync_ack(struct mlx5_core_dev *dev)
+{
+	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 1, false);
+}
+
+static void mlx5_sync_reset_reload_work(struct work_struct *work)
+{
+	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+						      reset_reload_work);
+	struct mlx5_core_dev *dev = fw_reset->dev;
+
+	mlx5_enter_error_state(dev, true);
+	mlx5_unload_one(dev, false);
+	if (mlx5_health_wait_pci_up(dev)) {
+		mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
+		return;
+	}
+	mlx5_load_one(dev, false);
+}
+
+static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	del_timer(&fw_reset->timer);
+}
+
+static void mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool poll_health)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	mlx5_stop_sync_reset_poll(dev);
+	clear_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags);
+	if (poll_health)
+		mlx5_start_health_poll(dev);
+}
+
+#define MLX5_RESET_POLL_INTERVAL	(HZ / 10)
+static void poll_sync_reset(struct timer_list *t)
+{
+	struct mlx5_fw_reset *fw_reset = from_timer(fw_reset, t, timer);
+	struct mlx5_core_dev *dev = fw_reset->dev;
+	u32 fatal_error;
+
+	if (!test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags))
+		return;
+
+	fatal_error = mlx5_health_check_fatal_sensors(dev);
+
+	if (fatal_error) {
+		mlx5_core_warn(dev, "Got Device Reset\n");
+		mlx5_sync_reset_clear_reset_requested(dev, false);
+		queue_work(fw_reset->wq, &fw_reset->reset_reload_work);
+		return;
+	}
+
+	mod_timer(&fw_reset->timer, round_jiffies(jiffies + MLX5_RESET_POLL_INTERVAL));
+}
+
+static void mlx5_start_sync_reset_poll(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	timer_setup(&fw_reset->timer, poll_sync_reset, 0);
+	fw_reset->timer.expires = round_jiffies(jiffies + MLX5_RESET_POLL_INTERVAL);
+	add_timer(&fw_reset->timer);
+}
+
+static void mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	mlx5_stop_health_poll(dev, true);
+	set_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags);
+	mlx5_start_sync_reset_poll(dev);
+}
+
+static void mlx5_sync_reset_request_event(struct work_struct *work)
+{
+	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+						      reset_request_work);
+	struct mlx5_core_dev *dev = fw_reset->dev;
+
+	mlx5_sync_reset_set_reset_requested(dev);
+	if (mlx5_fw_set_reset_sync_ack(dev))
+		mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed.\n");
+	else
+		mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack. Device reset is expected.\n");
+}
+
+static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct mlx5_eqe *eqe)
+{
+	struct mlx5_eqe_sync_fw_update *sync_fw_update_eqe;
+	u8 sync_event_rst_type;
+
+	sync_fw_update_eqe = &eqe->data.sync_fw_update;
+	sync_event_rst_type = sync_fw_update_eqe->sync_rst_state & SYNC_RST_STATE_MASK;
+	switch (sync_event_rst_type) {
+	case MLX5_SYNC_RST_STATE_RESET_REQUEST:
+		queue_work(fw_reset->wq, &fw_reset->reset_request_work);
+		break;
+	}
+}
+
+static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long action, void *data)
+{
+	struct mlx5_fw_reset *fw_reset = mlx5_nb_cof(nb, struct mlx5_fw_reset, nb);
+	struct mlx5_eqe *eqe = data;
+
+	switch (eqe->sub_type) {
+	case MLX5_GENERAL_SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT:
+		mlx5_sync_reset_events_handle(fw_reset, eqe);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
+
+	if (!fw_reset)
+		return -ENOMEM;
+	fw_reset->wq = create_singlethread_workqueue("mlx5_fw_reset_events");
+	if (!fw_reset->wq) {
+		kfree(fw_reset);
+		return -ENOMEM;
+	}
+
+	fw_reset->dev = dev;
+	dev->priv.fw_reset = fw_reset;
+
+	INIT_WORK(&fw_reset->reset_request_work, mlx5_sync_reset_request_event);
+	INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work);
+
+	MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
+	mlx5_eq_notifier_register(dev, &fw_reset->nb);
+
+	return 0;
+}
+
+void mlx5_fw_reset_events_cleanup(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	mlx5_eq_notifier_unregister(dev, &fw_reset->nb);
+	destroy_workqueue(fw_reset->wq);
+	kfree(dev->priv.fw_reset);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
index 1bbd95182ca6..278f538ea92a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
@@ -10,4 +10,7 @@ int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_ty
 int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel);
 int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev);
 
+int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev);
+void mlx5_fw_reset_events_cleanup(struct mlx5_core_dev *dev);
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index b31f769d2df9..54523bed16cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -110,7 +110,7 @@ static bool sensor_fw_synd_rfr(struct mlx5_core_dev *dev)
 	return rfr && synd;
 }
 
-static u32 check_fatal_sensors(struct mlx5_core_dev *dev)
+u32 mlx5_health_check_fatal_sensors(struct mlx5_core_dev *dev)
 {
 	if (sensor_pci_not_working(dev))
 		return MLX5_SENSOR_PCI_COMM_ERR;
@@ -173,7 +173,7 @@ static bool reset_fw_if_needed(struct mlx5_core_dev *dev)
 	 * Check again to avoid a redundant 2nd reset. If the fatal erros was
 	 * PCI related a reset won't help.
 	 */
-	fatal_error = check_fatal_sensors(dev);
+	fatal_error = mlx5_health_check_fatal_sensors(dev);
 	if (fatal_error == MLX5_SENSOR_PCI_COMM_ERR ||
 	    fatal_error == MLX5_SENSOR_NIC_DISABLED ||
 	    fatal_error == MLX5_SENSOR_NIC_SW_RESET) {
@@ -195,7 +195,7 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
 	bool err_detected = false;
 
 	/* Mark the device as fatal in order to abort FW commands */
-	if ((check_fatal_sensors(dev) || force) &&
+	if ((mlx5_health_check_fatal_sensors(dev) || force) &&
 	    dev->state == MLX5_DEVICE_STATE_UP) {
 		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 		err_detected = true;
@@ -208,7 +208,7 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
 		goto unlock;
 	}
 
-	if (check_fatal_sensors(dev) || force) { /* protected state setting */
+	if (mlx5_health_check_fatal_sensors(dev) || force) { /* protected state setting */
 		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 		mlx5_cmd_flush(dev);
 	}
@@ -231,7 +231,7 @@ void mlx5_error_sw_reset(struct mlx5_core_dev *dev)
 
 	mlx5_core_err(dev, "start\n");
 
-	if (check_fatal_sensors(dev) == MLX5_SENSOR_FW_SYND_RFR) {
+	if (mlx5_health_check_fatal_sensors(dev) == MLX5_SENSOR_FW_SYND_RFR) {
 		/* Get cr-dump and reset FW semaphore */
 		lock = lock_sem_sw_reset(dev, true);
 
@@ -308,26 +308,31 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev)
 
 /* How much time to wait until health resetting the driver (in msecs) */
 #define MLX5_RECOVERY_WAIT_MSECS 60000
-static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
+int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
 {
 	unsigned long end;
 
-	mlx5_core_warn(dev, "handling bad device here\n");
-	mlx5_handle_bad_state(dev);
 	end = jiffies + msecs_to_jiffies(MLX5_RECOVERY_WAIT_MSECS);
 	while (sensor_pci_not_working(dev)) {
-		if (time_after(jiffies, end)) {
-			mlx5_core_err(dev,
-				      "health recovery flow aborted, PCI reads still not working\n");
-			return -EIO;
-		}
+		if (time_after(jiffies, end))
+			return -ETIMEDOUT;
 		msleep(100);
 	}
+	return 0;
+}
 
+static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
+{
+	mlx5_core_warn(dev, "handling bad device here\n");
+	mlx5_handle_bad_state(dev);
+	if (mlx5_health_wait_pci_up(dev)) {
+		mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
+		return -EIO;
+	}
 	mlx5_core_err(dev, "starting health recovery flow\n");
 	mlx5_recover_device(dev);
 	if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state) ||
-	    check_fatal_sensors(dev)) {
+	    mlx5_health_check_fatal_sensors(dev)) {
 		mlx5_core_err(dev, "health recovery failed\n");
 		return -EIO;
 	}
@@ -696,7 +701,7 @@ static void poll_health(struct timer_list *t)
 	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
 		goto out;
 
-	fatal_error = check_fatal_sensors(dev);
+	fatal_error = mlx5_health_check_fatal_sensors(dev);
 
 	if (fatal_error && !health->fatal_error) {
 		mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 871d28b09f8a..e833db424f11 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -57,6 +57,7 @@
 #include "lib/mpfs.h"
 #include "eswitch.h"
 #include "devlink.h"
+#include "fw_reset.h"
 #include "lib/mlx5.h"
 #include "fpga/core.h"
 #include "fpga/ipsec.h"
@@ -835,6 +836,12 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 		goto err_eq_cleanup;
 	}
 
+	err = mlx5_fw_reset_events_init(dev);
+	if (err) {
+		mlx5_core_err(dev, "failed to initialize fw reset events\n");
+		goto err_events_cleanup;
+	}
+
 	mlx5_cq_debugfs_init(dev);
 
 	mlx5_init_reserved_gids(dev);
@@ -896,6 +903,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 	mlx5_geneve_destroy(dev->geneve);
 	mlx5_vxlan_destroy(dev->vxlan);
 	mlx5_cq_debugfs_cleanup(dev);
+	mlx5_fw_reset_events_cleanup(dev);
+err_events_cleanup:
 	mlx5_events_cleanup(dev);
 err_eq_cleanup:
 	mlx5_eq_table_cleanup(dev);
@@ -923,6 +932,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_cleanup_clock(dev);
 	mlx5_cleanup_reserved_gids(dev);
 	mlx5_cq_debugfs_cleanup(dev);
+	mlx5_fw_reset_events_cleanup(dev);
 	mlx5_events_cleanup(dev);
 	mlx5_eq_table_cleanup(dev);
 	mlx5_irq_table_cleanup(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index fc1649dac11b..d07a32165792 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -123,6 +123,8 @@ int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_fast_teardown_hca(struct mlx5_core_dev *dev);
 void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force);
 void mlx5_error_sw_reset(struct mlx5_core_dev *dev);
+u32 mlx5_health_check_fatal_sensors(struct mlx5_core_dev *dev);
+int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev);
 void mlx5_disable_device(struct mlx5_core_dev *dev);
 void mlx5_recover_device(struct mlx5_core_dev *dev);
 int mlx5_sriov_init(struct mlx5_core_dev *dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 8dc3da6e6480..80e31a7684e0 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -501,6 +501,7 @@ struct mlx5_mpfs;
 struct mlx5_eswitch;
 struct mlx5_lag;
 struct mlx5_devcom;
+struct mlx5_fw_reset;
 struct mlx5_eq_table;
 struct mlx5_irq_table;
 
@@ -578,6 +579,7 @@ struct mlx5_priv {
 	struct mlx5_core_sriov	sriov;
 	struct mlx5_lag		*lag;
 	struct mlx5_devcom	*devcom;
+	struct mlx5_fw_reset	*fw_reset;
 	struct mlx5_core_roce	roce;
 	struct mlx5_fc_stats		fc_stats;
 	struct mlx5_rl_table            rl_table;
@@ -943,6 +945,8 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev);
 void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health);
 void mlx5_drain_health_wq(struct mlx5_core_dev *dev);
 void mlx5_trigger_health_work(struct mlx5_core_dev *dev);
+void mlx5_health_set_reset_requested_mode(struct mlx5_core_dev *dev);
+void mlx5_health_clear_reset_requested_mode(struct mlx5_core_dev *dev);
 int mlx5_buf_alloc(struct mlx5_core_dev *dev,
 		   int size, struct mlx5_frag_buf *buf);
 void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf);
-- 
2.17.1



* [PATCH net-next RFC v2 06/13] net/mlx5: Handle sync reset now event
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (4 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 05/13] net/mlx5: Handle sync reset request event Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 07/13] net/mlx5: Handle sync reset abort event Moshe Shemesh
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

On a sync_reset_now event the driver reloads and toggles the PCI link to
activate the firmware upgrade reset. When the firmware sends this event it
syncs the event on all PFs, so all PFs toggle the PCI link at once.
Before toggling the link, the driver checks that all functions under the
same PCI bridge have the same device ID, which ensures that no other
device sits under that bridge. If the check passes, the driver uses the
bridge's PCI link control to turn the link down and up.
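
For readers who want the two steps in isolation, here is a minimal
userspace sketch of the device ID check and of the Link Disable bit
manipulation. It is an illustration only, not the driver code: the helper
names are hypothetical, and the bit value is assumed to match
PCI_EXP_LNKCTL_LD from the kernel's pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Link Disable bit of the PCIe Link Control register
 * (assumed equal to PCI_EXP_LNKCTL_LD in pci_regs.h).
 */
#define LNKCTL_LINK_DISABLE 0x0010u

/* Hypothetical helper: true only if every function found on the bus
 * reports the same device ID as the requesting function.
 */
static bool all_functions_match(uint16_t dev_id, const uint16_t *bus_ids,
				size_t n)
{
	assert(bus_ids != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (bus_ids[i] != dev_id)
			return false;
	return true;
}

/* Hypothetical helpers: the two Link Control values written in turn. */
static uint16_t lnkctl_link_down(uint16_t lnkctl)
{
	return lnkctl | LNKCTL_LINK_DISABLE;
}

static uint16_t lnkctl_link_up(uint16_t lnkctl)
{
	return lnkctl & (uint16_t)~LNKCTL_LINK_DISABLE;
}
```

In the patch itself the equivalent check walks bridge_bus->devices, and
the toggle is refused with -EPERM as soon as one device ID differs.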

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 121 ++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 0f224454b4a2..f9e293de9de3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -13,6 +13,7 @@ struct mlx5_fw_reset {
 	struct workqueue_struct *wq;
 	struct work_struct reset_request_work;
 	struct work_struct reset_reload_work;
+	struct work_struct reset_now_work;
 	unsigned long reset_flags;
 	struct timer_list timer;
 };
@@ -149,6 +150,122 @@ static void mlx5_sync_reset_request_event(struct work_struct *work)
 		mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack. Device reset is expected.\n");
 }
 
+#define MLX5_PCI_LINK_UP_TIMEOUT 2000
+
+static int mlx5_pci_link_toggle(struct mlx5_core_dev *dev)
+{
+	struct pci_bus *bridge_bus = dev->pdev->bus;
+	struct pci_dev *bridge = bridge_bus->self;
+	u16 reg16, dev_id, sdev_id;
+	unsigned long timeout;
+	struct pci_dev *sdev;
+	int cap, err;
+	u32 reg32;
+
+	/* Check that all functions under the pci bridge are PFs of
+	 * this device otherwise fail this function.
+	 */
+	err = pci_read_config_word(dev->pdev, PCI_DEVICE_ID, &dev_id);
+	if (err)
+		return err;
+	list_for_each_entry(sdev, &bridge_bus->devices, bus_list) {
+		err = pci_read_config_word(sdev, PCI_DEVICE_ID, &sdev_id);
+		if (err)
+			return err;
+		if (sdev_id != dev_id)
+			return -EPERM;
+	}
+
+	cap = pci_find_capability(bridge, PCI_CAP_ID_EXP);
+	if (!cap)
+		return -EOPNOTSUPP;
+
+	list_for_each_entry(sdev, &bridge_bus->devices, bus_list) {
+		pci_save_state(sdev);
+		pci_cfg_access_lock(sdev);
+	}
+	/* PCI link toggle */
+	err = pci_read_config_word(bridge, cap + PCI_EXP_LNKCTL, &reg16);
+	if (err)
+		goto restore;
+	reg16 |= PCI_EXP_LNKCTL_LD;
+	err = pci_write_config_word(bridge, cap + PCI_EXP_LNKCTL, reg16);
+	if (err)
+		goto restore;
+	msleep(500);
+	reg16 &= ~PCI_EXP_LNKCTL_LD;
+	err = pci_write_config_word(bridge, cap + PCI_EXP_LNKCTL, reg16);
+	if (err)
+		goto restore;
+
+	/* Check link */
+	err = pci_read_config_dword(bridge, cap + PCI_EXP_LNKCAP, &reg32);
+	if (err)
+		goto restore;
+	if (!(reg32 & PCI_EXP_LNKCAP_DLLLARC)) {
+		mlx5_core_warn(dev, "No PCI link reporting capability (0x%08x)\n", reg32);
+		msleep(1000);
+		goto restore;
+	}
+
+	timeout = jiffies + msecs_to_jiffies(MLX5_PCI_LINK_UP_TIMEOUT);
+	do {
+		err = pci_read_config_word(bridge, cap + PCI_EXP_LNKSTA, &reg16);
+		if (err)
+			goto restore;
+		if (reg16 & PCI_EXP_LNKSTA_DLLLA)
+			break;
+		msleep(20);
+	} while (!time_after(jiffies, timeout));
+
+	if (reg16 & PCI_EXP_LNKSTA_DLLLA) {
+		mlx5_core_info(dev, "PCI Link up\n");
+	} else {
+		mlx5_core_err(dev, "PCI link not ready (0x%04x) after %d ms\n",
+			      reg16, MLX5_PCI_LINK_UP_TIMEOUT);
+		err = -ETIMEDOUT;
+	}
+
+restore:
+	list_for_each_entry(sdev, &bridge_bus->devices, bus_list) {
+		pci_cfg_access_unlock(sdev);
+		pci_restore_state(sdev);
+	}
+
+	return err;
+}
+
+static void mlx5_sync_reset_now_event(struct work_struct *work)
+{
+	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+						      reset_now_work);
+	struct mlx5_core_dev *dev = fw_reset->dev;
+	int err;
+
+	mlx5_sync_reset_clear_reset_requested(dev, false);
+
+	mlx5_core_warn(dev, "Sync Reset now. Device is going to reset.\n");
+
+	err = mlx5_cmd_fast_teardown_hca(dev);
+	if (err) {
+		mlx5_core_warn(dev, "Fast teardown failed, no reset done, err %d\n", err);
+		goto done;
+	}
+
+	err = mlx5_pci_link_toggle(dev);
+	if (err) {
+		mlx5_core_warn(dev, "mlx5_pci_link_toggle failed, no reset done, err %d\n", err);
+		goto done;
+	}
+
+	mlx5_enter_error_state(dev, true);
+	mlx5_unload_one(dev, false);
+done:
+	if (err)
+		mlx5_start_health_poll(dev);
+	mlx5_load_one(dev, false);
+}
+
 static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct mlx5_eqe *eqe)
 {
 	struct mlx5_eqe_sync_fw_update *sync_fw_update_eqe;
@@ -160,6 +277,9 @@ static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct
 	case MLX5_SYNC_RST_STATE_RESET_REQUEST:
 		queue_work(fw_reset->wq, &fw_reset->reset_request_work);
 		break;
+	case MLX5_SYNC_RST_STATE_RESET_NOW:
+		queue_work(fw_reset->wq, &fw_reset->reset_now_work);
+		break;
 	}
 }
 
@@ -196,6 +316,7 @@ int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
 
 	INIT_WORK(&fw_reset->reset_request_work, mlx5_sync_reset_request_event);
 	INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work);
+	INIT_WORK(&fw_reset->reset_now_work, mlx5_sync_reset_now_event);
 
 	MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
 	mlx5_eq_notifier_register(dev, &fw_reset->nb);
-- 
2.17.1



* [PATCH net-next RFC v2 07/13] net/mlx5: Handle sync reset abort event
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (5 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 06/13] net/mlx5: Handle sync reset now event Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 08/13] net/mlx5: Add support for devlink reload action fw activate Moshe Shemesh
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

If the firmware sends sync_reset_abort to the driver, the driver should
clear the reset requested mode, as a reset is no longer expected.
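
As a rough model of how the three sync reset events handled so far move
the driver between modes, consider the userspace sketch below. The enum
and struct names are hypothetical stand-ins: reset_requested plays the
role of MLX5_FW_RESET_FLAGS_RESET_REQUESTED and health_poll_on the role
of the health poll timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names mirroring the three sync reset states. */
enum sync_rst_event { RST_REQUEST, RST_NOW, RST_ABORT };

struct reset_state {
	bool reset_requested;	/* stands in for the RESET_REQUESTED flag */
	bool health_poll_on;	/* stands in for the health poll timer */
};

static void handle_sync_rst_event(struct reset_state *s,
				  enum sync_rst_event ev)
{
	assert(s);

	switch (ev) {
	case RST_REQUEST:
		/* Stop health polling, then enter reset requested mode. */
		s->health_poll_on = false;
		s->reset_requested = true;
		break;
	case RST_NOW:
		/* Reset proceeds; health polling is not restarted here. */
		s->reset_requested = false;
		break;
	case RST_ABORT:
		/* No reset is coming: leave the mode and resume polling. */
		s->reset_requested = false;
		s->health_poll_on = true;
		break;
	}
}
```

The abort case is the only one that passes poll_health = true to
mlx5_sync_reset_clear_reset_requested(), which is why it alone turns
health polling back on in this model.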

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/fw_reset.c    | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index f9e293de9de3..61237f4836cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -14,6 +14,7 @@ struct mlx5_fw_reset {
 	struct work_struct reset_request_work;
 	struct work_struct reset_reload_work;
 	struct work_struct reset_now_work;
+	struct work_struct reset_abort_work;
 	unsigned long reset_flags;
 	struct timer_list timer;
 };
@@ -266,6 +267,16 @@ static void mlx5_sync_reset_now_event(struct work_struct *work)
 	mlx5_load_one(dev, false);
 }
 
+static void mlx5_sync_reset_abort_event(struct work_struct *work)
+{
+	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+						      reset_abort_work);
+	struct mlx5_core_dev *dev = fw_reset->dev;
+
+	mlx5_sync_reset_clear_reset_requested(dev, true);
+	mlx5_core_warn(dev, "PCI Sync FW Update Reset Aborted.\n");
+}
+
 static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct mlx5_eqe *eqe)
 {
 	struct mlx5_eqe_sync_fw_update *sync_fw_update_eqe;
@@ -280,6 +291,9 @@ static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct
 	case MLX5_SYNC_RST_STATE_RESET_NOW:
 		queue_work(fw_reset->wq, &fw_reset->reset_now_work);
 		break;
+	case MLX5_SYNC_RST_STATE_RESET_ABORT:
+		queue_work(fw_reset->wq, &fw_reset->reset_abort_work);
+		break;
 	}
 }
 
@@ -317,6 +331,7 @@ int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
 	INIT_WORK(&fw_reset->reset_request_work, mlx5_sync_reset_request_event);
 	INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work);
 	INIT_WORK(&fw_reset->reset_now_work, mlx5_sync_reset_now_event);
+	INIT_WORK(&fw_reset->reset_abort_work, mlx5_sync_reset_abort_event);
 
 	MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
 	mlx5_eq_notifier_register(dev, &fw_reset->nb);
-- 
2.17.1



* [PATCH net-next RFC v2 08/13] net/mlx5: Add support for devlink reload action fw activate
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (6 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 07/13] net/mlx5: Handle sync reset abort event Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 09/13] devlink: Add enable_remote_dev_reset generic parameter Moshe Shemesh
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Add support for the devlink reload action fw_activate. To activate a
firmware image, the mlx5 driver resets the firmware and reloads it from
flash. If a new image was stored on flash, it will be loaded. Once this
reload command is executed, the driver initiates the fw sync reset flow,
in which the firmware synchronizes all PFs on the coming reset and driver
reload.
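
The split between the PF that issued the reload command and the other
PFs can be pictured with the userspace sketch below. It only models the
decision made at the end of the reset flow; the struct and function
names are hypothetical, with pending_comp standing in for
MLX5_FW_RESET_FLAGS_PENDING_COMP.

```c
#include <assert.h>
#include <stdbool.h>

struct fw_reset_sim {
	bool pending_comp;	/* a reload command is waiting on this PF */
	bool completed;		/* the waiter was signalled */
	bool reloaded;		/* the event handler reloaded the driver */
	int ret;		/* result handed to the waiter */
};

/* Hypothetical model of the end of the reset flow on one PF. */
static void reset_finished(struct fw_reset_sim *r, int err)
{
	assert(r);

	r->ret = err;
	if (r->pending_comp)
		r->completed = true;	/* initiator: reload_up loads later */
	else
		r->reloaded = true;	/* other PFs: reload from the handler */
}
```

On the initiating PF the actual load is left to the devlink reload_up
callback, so the event handler must not call mlx5_load_one() a second
time there.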

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Have fw_activate action instead of fw_reset level
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 53 ++++++++++++++++--
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 54 ++++++++++++++++---
 .../ethernet/mellanox/mlx5/core/fw_reset.h    |  1 +
 3 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index dfdf48869f70..0a62c98f8c98 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -4,6 +4,7 @@
 #include <devlink.h>
 
 #include "mlx5_core.h"
+#include "fw_reset.h"
 #include "fs_core.h"
 #include "eswitch.h"
 
@@ -88,14 +89,49 @@ mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 	return 0;
 }
 
+static int mlx5_devlink_reload_fw_activate(struct devlink *devlink, struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	u8 reset_level, reset_type, net_port_alive;
+	int err;
+
+	err = mlx5_reg_mfrl_query(dev, &reset_level, &reset_type);
+	if (err)
+		return err;
+	if (!(reset_level & MLX5_MFRL_REG_RESET_LEVEL3)) {
+		NL_SET_ERR_MSG_MOD(extack, "FW activate requires reboot");
+		return -EINVAL;
+	}
+
+	net_port_alive = !!(reset_type & MLX5_MFRL_REG_RESET_TYPE_NET_PORT_ALIVE);
+	err = mlx5_fw_set_reset_sync(dev, net_port_alive);
+	if (err)
+		goto out;
+
+	err = mlx5_fw_wait_fw_reset_done(dev);
+out:
+	if (err)
+		NL_SET_ERR_MSG_MOD(extack, "FW activate command failed");
+	return err;
+}
+
 static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 				    enum devlink_reload_action action,
 				    struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 
-	mlx5_unload_one(dev, false);
-	return 0;
+	switch (action) {
+	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+		mlx5_unload_one(dev, false);
+		return 0;
+	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
+		return mlx5_devlink_reload_fw_activate(devlink, extack);
+	default:
+		/* Unsupported action should not get to this function */
+		WARN_ON(1);
+		return -EOPNOTSUPP;
+	}
 }
 
 static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_action action,
@@ -103,7 +139,15 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
 
-	return mlx5_load_one(dev, false);
+	switch (action) {
+	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
+		return mlx5_load_one(dev, false);
+	default:
+		/* Unsupported action should not get to this function */
+		WARN_ON(1);
+		return -EOPNOTSUPP;
+	}
 }
 
 static const struct devlink_ops mlx5_devlink_ops = {
@@ -119,7 +163,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 #endif
 	.flash_update = mlx5_devlink_flash_update,
 	.info_get = mlx5_devlink_info_get,
-	.supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT),
+	.supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
+				    BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
 	.reload_down = mlx5_devlink_reload_down,
 	.reload_up = mlx5_devlink_reload_up,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 61237f4836cc..44fed2f1911c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -5,6 +5,7 @@
 
 enum {
 	MLX5_FW_RESET_FLAGS_RESET_REQUESTED,
+	MLX5_FW_RESET_FLAGS_PENDING_COMP
 };
 
 struct mlx5_fw_reset {
@@ -17,6 +18,8 @@ struct mlx5_fw_reset {
 	struct work_struct reset_abort_work;
 	unsigned long reset_flags;
 	struct timer_list timer;
+	struct completion done;
+	int ret;
 };
 
 static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level,
@@ -53,7 +56,14 @@ int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_ty
 
 int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel)
 {
-	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, reset_type_sel, 0, true);
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+	int err;
+
+	set_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags);
+	err = mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, reset_type_sel, 0, true);
+	if (err)
+		clear_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags);
+	return err;
 }
 
 int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev)
@@ -66,19 +76,30 @@ static int mlx5_fw_set_reset_sync_ack(struct mlx5_core_dev *dev)
 	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 1, false);
 }
 
+static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags))
+		complete(&fw_reset->done);
+	else
+		mlx5_load_one(dev, false);
+}
+
 static void mlx5_sync_reset_reload_work(struct work_struct *work)
 {
 	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
 						      reset_reload_work);
 	struct mlx5_core_dev *dev = fw_reset->dev;
+	int err;
 
 	mlx5_enter_error_state(dev, true);
 	mlx5_unload_one(dev, false);
-	if (mlx5_health_wait_pci_up(dev)) {
+	err = mlx5_health_wait_pci_up(dev);
+	if (err)
 		mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
-		return;
-	}
-	mlx5_load_one(dev, false);
+	fw_reset->ret = err;
+	mlx5_fw_reset_complete_reload(dev);
 }
 
 static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev)
@@ -264,7 +285,8 @@ static void mlx5_sync_reset_now_event(struct work_struct *work)
 done:
 	if (err)
 		mlx5_start_health_poll(dev);
-	mlx5_load_one(dev, false);
+	fw_reset->ret = err;
+	mlx5_fw_reset_complete_reload(dev);
 }
 
 static void mlx5_sync_reset_abort_event(struct work_struct *work)
@@ -313,6 +335,25 @@ static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long acti
 	return NOTIFY_OK;
 }
 
+#define MLX5_FW_RESET_TIMEOUT_MSEC 5000
+int mlx5_fw_wait_fw_reset_done(struct mlx5_core_dev *dev)
+{
+	unsigned long timeout = msecs_to_jiffies(MLX5_FW_RESET_TIMEOUT_MSEC);
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+	int err;
+
+	if (!wait_for_completion_timeout(&fw_reset->done, timeout)) {
+		mlx5_core_warn(dev, "FW sync reset timeout after %d seconds\n",
+			       MLX5_FW_RESET_TIMEOUT_MSEC / 1000);
+		err = -ETIMEDOUT;
+		goto out;
+	}
+	err = fw_reset->ret;
+out:
+	clear_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags);
+	return err;
+}
+
 int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
 {
 	struct mlx5_fw_reset *fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
@@ -336,6 +377,7 @@ int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
 	MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
 	mlx5_eq_notifier_register(dev, &fw_reset->nb);
 
+	init_completion(&fw_reset->done);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
index 278f538ea92a..d7ee951a2258 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
@@ -10,6 +10,7 @@ int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_ty
 int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel);
 int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev);
 
+int mlx5_fw_wait_fw_reset_done(struct mlx5_core_dev *dev);
 int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev);
 void mlx5_fw_reset_events_cleanup(struct mlx5_core_dev *dev);
 
-- 
2.17.1



* [PATCH net-next RFC v2 09/13] devlink: Add enable_remote_dev_reset generic parameter
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (7 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 08/13] net/mlx5: Add support for devlink reload action fw activate Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support Moshe Shemesh
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

The enable_remote_dev_reset devlink param indicates whether the host
admin allows device resets initiated by other hosts. This parameter is
useful for setups where a device is shared by different hosts, such as a
multi-host setup. Once the user sets this parameter to false, the driver
should NACK any attempt to reset the device while the driver is loaded.
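
The intended semantics reduce to a single decision per reset request.
The sketch below is a userspace illustration only; the enum and function
names are hypothetical and are not part of devlink or of any driver.

```c
#include <stdbool.h>

/* Hypothetical reply a driver gives to a reset requested by a remote host. */
enum reset_reply { RESET_ACK, RESET_NACK };

/* The parameter value alone decides the reply while the driver is loaded. */
static enum reset_reply reply_to_remote_reset(bool enable_remote_dev_reset)
{
	return enable_remote_dev_reset ? RESET_ACK : RESET_NACK;
}
```

How the NACK reaches the firmware is driver specific and is not covered
by this generic parameter.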

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 Documentation/networking/devlink/devlink-params.rst | 6 ++++++
 include/net/devlink.h                               | 4 ++++
 net/core/devlink.c                                  | 5 +++++
 3 files changed, 15 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index d075fd090b3d..54c9f107c4b0 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -108,3 +108,9 @@ own name.
    * - ``region_snapshot_enable``
      - Boolean
      - Enable capture of ``devlink-region`` snapshots.
+   * - ``enable_remote_dev_reset``
+     - Boolean
+     - Enable device reset by remote host. When cleared, the device driver
+       will NACK any attempt by another host to reset the device. This
+       parameter is useful for setups where a device is shared by different
+       hosts, such as a multi-host setup.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index cad3e11d0b9b..0818e9c864eb 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -420,6 +420,7 @@ enum devlink_param_generic_id {
 	DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY,
 	DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE,
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
+	DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
 
 	/* add new param generic ids above here*/
 	__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -457,6 +458,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME "enable_roce"
 #define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset"
+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
 {									\
 	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6bab1b02ca99..43b1839b8305 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3226,6 +3226,11 @@ static const struct devlink_param devlink_param_generic[] = {
 		.name = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME,
 		.type = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE,
 	},
+	{
+		.id = DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
+		.name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME,
+		.type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE,
+	},
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next RFC v2 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (8 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 09/13] devlink: Add enable_remote_dev_reset generic parameter Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 11/13] net/mlx5: Add support for fw live patch event Moshe Shemesh
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

The enable_remote_dev_reset devlink param indicates whether the host
admin allows resets by other hosts. When it is cleared, the mlx5 host PF
driver sends a NACK in response to a PCI sync for firmware update reset
request, and the command fails.
By default the enable_remote_dev_reset parameter is true, so PCI sync
for firmware update reset is enabled.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Have MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST instead of
  MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST
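
Note for readers: the ACK/NACK decision described in the commit message
above can be modelled in plain userspace C. This is only a sketch, not
driver code. struct fw_reset_model and the enum names are invented for
illustration; the response values 1 (ACK) and 2 (NACK) are the sync_resp
arguments that the diff below passes to mlx5_reg_mfrl_set().

```c
#include <assert.h>
#include <stdbool.h>

/* Sync responses as passed to mlx5_reg_mfrl_set() in the diff:
 * 1 acknowledges the reset request, 2 rejects it.
 * The enum names are made up for this sketch. */
enum sync_resp { SYNC_RESP_ACK = 1, SYNC_RESP_NACK = 2 };

/* Hypothetical stand-in for the NACK bit in fw_reset->reset_flags. */
struct fw_reset_model {
	bool nack_reset_request;
};

/* Mirrors mlx5_fw_enable_remote_dev_reset_set(): enabling clears the bit. */
static void remote_reset_set(struct fw_reset_model *m, bool enable)
{
	assert(m);
	m->nack_reset_request = !enable;
}

/* Mirrors mlx5_fw_enable_remote_dev_reset_get(): true means resets allowed. */
static bool remote_reset_get(const struct fw_reset_model *m)
{
	assert(m);
	return !m->nack_reset_request;
}

/* Response the host would send when a sync reset request event arrives. */
static enum sync_resp on_reset_request(const struct fw_reset_model *m)
{
	return remote_reset_get(m) ? SYNC_RESP_ACK : SYNC_RESP_NACK;
}
```

With the parameter set to false, on_reset_request() returns
SYNC_RESP_NACK; setting it back to true restores the ACK response.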
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 21 +++++++++++++
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 30 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/fw_reset.h    |  2 ++
 3 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 0a62c98f8c98..d975f5bd7394 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -275,6 +275,24 @@ static int mlx5_devlink_large_group_num_validate(struct devlink *devlink, u32 id
 }
 #endif
 
+static int mlx5_devlink_enable_remote_dev_reset_set(struct devlink *devlink, u32 id,
+						    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	mlx5_fw_enable_remote_dev_reset_set(dev, ctx->val.vbool);
+	return 0;
+}
+
+static int mlx5_devlink_enable_remote_dev_reset_get(struct devlink *devlink, u32 id,
+						    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	ctx->val.vbool = mlx5_fw_enable_remote_dev_reset_get(dev);
+	return 0;
+}
+
 static const struct devlink_param mlx5_devlink_params[] = {
 	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE,
 			     "flow_steering_mode", DEVLINK_PARAM_TYPE_STRING,
@@ -290,6 +308,9 @@ static const struct devlink_param mlx5_devlink_params[] = {
 			     NULL, NULL,
 			     mlx5_devlink_large_group_num_validate),
 #endif
+	DEVLINK_PARAM_GENERIC(ENABLE_REMOTE_DEV_RESET, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			      mlx5_devlink_enable_remote_dev_reset_get,
+			      mlx5_devlink_enable_remote_dev_reset_set, NULL),
 };
 
 static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 44fed2f1911c..f9d6310d99d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -5,6 +5,7 @@
 
 enum {
 	MLX5_FW_RESET_FLAGS_RESET_REQUESTED,
+	MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST,
 	MLX5_FW_RESET_FLAGS_PENDING_COMP
 };
 
@@ -22,6 +23,23 @@ struct mlx5_fw_reset {
 	int ret;
 };
 
+void mlx5_fw_enable_remote_dev_reset_set(struct mlx5_core_dev *dev, bool enable)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	if (enable)
+		clear_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags);
+	else
+		set_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags);
+}
+
+bool mlx5_fw_enable_remote_dev_reset_get(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+	return !test_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags);
+}
+
 static int mlx5_reg_mfrl_set(struct mlx5_core_dev *dev, u8 reset_level,
 			     u8 reset_type_sel, u8 sync_resp, bool sync_start)
 {
@@ -76,6 +94,11 @@ static int mlx5_fw_set_reset_sync_ack(struct mlx5_core_dev *dev)
 	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 1, false);
 }
 
+static int mlx5_fw_set_reset_sync_nack(struct mlx5_core_dev *dev)
+{
+	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 2, false);
+}
+
 static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev)
 {
 	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
@@ -164,7 +187,14 @@ static void mlx5_sync_reset_request_event(struct work_struct *work)
 	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
 						      reset_request_work);
 	struct mlx5_core_dev *dev = fw_reset->dev;
+	int err;
 
+	if (test_bit(MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST, &fw_reset->reset_flags)) {
+		err = mlx5_fw_set_reset_sync_nack(dev);
+		mlx5_core_warn(dev, "PCI Sync FW Update Reset Nack %s\n",
+			       err ? "Failed" : "Sent");
+		return;
+	}
 	mlx5_sync_reset_set_reset_requested(dev);
 	if (mlx5_fw_set_reset_sync_ack(dev))
 		mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed.\n");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
index d7ee951a2258..fd558dfe93fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
@@ -6,6 +6,8 @@
 
 #include "mlx5_core.h"
 
+void mlx5_fw_enable_remote_dev_reset_set(struct mlx5_core_dev *dev, bool enable);
+bool mlx5_fw_enable_remote_dev_reset_get(struct mlx5_core_dev *dev);
 int mlx5_reg_mfrl_query(struct mlx5_core_dev *dev, u8 *reset_level, u8 *reset_type);
 int mlx5_fw_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel);
 int mlx5_fw_set_live_patch(struct mlx5_core_dev *dev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next RFC v2 11/13] net/mlx5: Add support for fw live patch event
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (9 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 12/13] net/mlx5: Add support for devlink reload action live patch Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst Moshe Shemesh
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

The firmware live patch event notifies the driver that the firmware was
just updated using live patch. In that case the driver should not reload
or re-initialize its entities, apart from updating the firmware version
and re-initializing the firmware tracer, whose strings database can be
updated by the live patch to help debug an issue.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
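
Note for readers: the behaviour described in the commit message above
can be modelled in userspace C as a dispatch on the event subtype. This
is a rough sketch, not driver code. struct dev_model and its counters
are invented for illustration, and treating the PCI sync event as a
single "reload" is a coarse simplification of the multi-step sync reset
flow. The subtype values 0x7 and 0x8 come from the device.h hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subtype values taken from the include/linux/mlx5/device.h hunk. */
enum {
	SUBTYPE_FW_LIVE_PATCH_EVENT = 0x7,
	SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT = 0x8,
};

/* Hypothetical device model: counts what each event caused. */
struct dev_model {
	bool has_tracer;             /* stands in for a valid dev->tracer */
	unsigned int reloads;        /* driver reloads (simplified) */
	unsigned int tracer_reinits; /* tracer strings DB rebuilds */
};

static void handle_general_event(struct dev_model *dev, uint8_t sub_type)
{
	assert(dev);

	switch (sub_type) {
	case SUBTYPE_FW_LIVE_PATCH_EVENT:
		/* No reload: only the tracer is torn down and rebuilt. */
		if (dev->has_tracer)
			dev->tracer_reinits++;
		break;
	case SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT:
		/* Simplification: the sync reset flow ends in a reload. */
		dev->reloads++;
		break;
	default:
		/* Other general events are not handled here. */
		break;
	}
}
```

A live patch event leaves the reload counter untouched and, when no
tracer is present, changes nothing at all.
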
 .../mellanox/mlx5/core/diag/fw_tracer.c       | 31 +++++++++++++++++++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |  1 +
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 27 ++++++++++++++++
 include/linux/mlx5/device.h                   |  1 +
 4 files changed, 60 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index ad3594c4afcb..08dae045d185 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -1064,6 +1064,37 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer)
 	kvfree(tracer);
 }
 
+int mlx5_fw_tracer_recreate_strings_db(struct mlx5_fw_tracer *tracer)
+{
+	struct mlx5_core_dev *dev;
+	int err;
+
+	if (IS_ERR_OR_NULL(tracer))
+		return -EINVAL;
+
+	cancel_work_sync(&tracer->read_fw_strings_work);
+	mlx5_fw_tracer_clean_ready_list(tracer);
+	mlx5_fw_tracer_clean_print_hash(tracer);
+	mlx5_fw_tracer_clean_saved_traces_array(tracer);
+	mlx5_fw_tracer_free_strings_db(tracer);
+
+	dev = tracer->dev;
+	err = mlx5_query_mtrc_caps(tracer);
+	if (err) {
+		mlx5_core_dbg(dev, "FWTracer: Failed to query capabilities %d\n", err);
+		return err;
+	}
+
+	err = mlx5_fw_tracer_allocate_strings_db(tracer);
+	if (err) {
+		mlx5_core_warn(dev, "FWTracer: Allocate strings DB failed %d\n", err);
+		return err;
+	}
+	mlx5_fw_tracer_init_saved_traces_array(tracer);
+
+	return 0;
+}
+
 static int fw_tracer_event(struct notifier_block *nb, unsigned long action, void *data)
 {
 	struct mlx5_fw_tracer *tracer = mlx5_nb_cof(nb, struct mlx5_fw_tracer, nb);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
index 40601fba80ba..1a755098aeeb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
@@ -191,5 +191,6 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer);
 int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev);
 int mlx5_fw_tracer_get_saved_traces_objects(struct mlx5_fw_tracer *tracer,
 					    struct devlink_fmsg *fmsg);
+int mlx5_fw_tracer_recreate_strings_db(struct mlx5_fw_tracer *tracer);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index f9d6310d99d6..aa0044150388 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2020, Mellanox Technologies inc.  All rights reserved. */
 
 #include "fw_reset.h"
+#include "diag/fw_tracer.h"
 
 enum {
 	MLX5_FW_RESET_FLAGS_RESET_REQUESTED,
@@ -13,6 +14,7 @@ struct mlx5_fw_reset {
 	struct mlx5_core_dev *dev;
 	struct mlx5_nb nb;
 	struct workqueue_struct *wq;
+	struct work_struct fw_live_patch_work;
 	struct work_struct reset_request_work;
 	struct work_struct reset_reload_work;
 	struct work_struct reset_now_work;
@@ -182,6 +184,27 @@ static void mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev)
 	mlx5_start_sync_reset_poll(dev);
 }
 
+static void mlx5_fw_live_patch_event(struct work_struct *work)
+{
+	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+						      fw_live_patch_work);
+	struct mlx5_core_dev *dev = fw_reset->dev;
+	struct mlx5_fw_tracer *tracer;
+
+	mlx5_core_info(dev, "Live patch updated firmware version: %d.%d.%d\n", fw_rev_maj(dev),
+		       fw_rev_min(dev), fw_rev_sub(dev));
+
+	tracer = dev->tracer;
+	if (IS_ERR_OR_NULL(tracer))
+		return;
+
+	mlx5_fw_tracer_cleanup(tracer);
+	if (mlx5_fw_tracer_recreate_strings_db(tracer))
+		mlx5_core_err(dev, "Failed to recreate FW tracer strings DB\n");
+	if (mlx5_fw_tracer_init(tracer))
+		mlx5_core_err(dev, "Failed to re-initialize FW tracer\n");
+}
+
 static void mlx5_sync_reset_request_event(struct work_struct *work)
 {
 	struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
@@ -355,6 +378,9 @@ static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long acti
 	struct mlx5_eqe *eqe = data;
 
 	switch (eqe->sub_type) {
+	case MLX5_GENERAL_SUBTYPE_FW_LIVE_PATCH_EVENT:
+		queue_work(fw_reset->wq, &fw_reset->fw_live_patch_work);
+		break;
 	case MLX5_GENERAL_SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT:
 		mlx5_sync_reset_events_handle(fw_reset, eqe);
 		break;
@@ -399,6 +425,7 @@ int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
 	fw_reset->dev = dev;
 	dev->priv.fw_reset = fw_reset;
 
+	INIT_WORK(&fw_reset->fw_live_patch_work, mlx5_fw_live_patch_event);
 	INIT_WORK(&fw_reset->reset_request_work, mlx5_sync_reset_request_event);
 	INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work);
 	INIT_WORK(&fw_reset->reset_now_work, mlx5_sync_reset_now_event);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 4d3376e20f5e..ab5bedd9d3d3 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -366,6 +366,7 @@ enum {
 enum {
 	MLX5_GENERAL_SUBTYPE_DELAY_DROP_TIMEOUT = 0x1,
 	MLX5_GENERAL_SUBTYPE_PCI_POWER_CHANGE_EVENT = 0x5,
+	MLX5_GENERAL_SUBTYPE_FW_LIVE_PATCH_EVENT = 0x7,
 	MLX5_GENERAL_SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT = 0x8,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next RFC v2 12/13] net/mlx5: Add support for devlink reload action live patch
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (10 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 11/13] net/mlx5: Add support for fw live patch event Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17  9:37 ` [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst Moshe Shemesh
  12 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Add support for devlink reload action fw_live_patch, which applies a
live patch to the firmware.
The driver checks whether the firmware is capable of handling the
pending firmware changes as a live patch. If it is, the driver triggers
the fw_live_patch flow.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Have fw_live_patch action instead of level
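
Note for readers: the capability check described in the commit message
above can be sketched in userspace C. This is an illustration only. The
bit positions of the reset level mask are assumptions, since the defines
are not part of this patch, and the patched output flag is a
hypothetical stand-in for the call to mlx5_fw_set_live_patch().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed layout of the MFRL reset_level mask (not shown in this patch). */
#define RESET_LEVEL0 (1u << 0) /* pending changes can be live patched */
#define RESET_LEVEL3 (1u << 3) /* PCI sync reset */

/*
 * Model of mlx5_devlink_trigger_fw_live_patch(): refuse with -EINVAL
 * unless the firmware reports that level 0 (live patch) is possible.
 */
static int trigger_fw_live_patch(uint8_t reset_level, int *patched)
{
	assert(patched);

	if (!(reset_level & RESET_LEVEL0))
		return -EINVAL;

	*patched = 1; /* stand-in for issuing the live patch command */
	return 0;
}
```

A firmware image that only reports RESET_LEVEL3 is rejected, which
corresponds to the extack message in the diff below.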
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 30 ++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index d975f5bd7394..a62281cfc084 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -115,6 +115,29 @@ static int mlx5_devlink_reload_fw_activate(struct devlink *devlink, struct netli
 	return err;
 }
 
+static int mlx5_devlink_trigger_fw_live_patch(struct devlink *devlink,
+					      struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	u8 reset_level;
+	int err;
+
+	err = mlx5_reg_mfrl_query(dev, &reset_level, NULL);
+	if (err)
+		return err;
+	if (!(reset_level & MLX5_MFRL_REG_RESET_LEVEL0)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "FW upgrade to the stored FW can't be done by FW live patching");
+		return -EINVAL;
+	}
+
+	err = mlx5_fw_set_live_patch(dev);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 				    enum devlink_reload_action action,
 				    struct netlink_ext_ack *extack)
@@ -127,6 +150,8 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 		return 0;
 	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
 		return mlx5_devlink_reload_fw_activate(devlink, extack);
+	case DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH:
+		return mlx5_devlink_trigger_fw_live_patch(devlink, extack);
 	default:
 		/* Unsupported action should not get to this function */
 		WARN_ON(1);
@@ -143,6 +168,8 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
 	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
 	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
 		return mlx5_load_one(dev, false);
+	case DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH:
+		return 0;
 	default:
 		/* Unsupported action should not get to this function */
 		WARN_ON(1);
@@ -164,7 +191,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 	.flash_update = mlx5_devlink_flash_update,
 	.info_get = mlx5_devlink_info_get,
 	.supported_reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
-				    BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
+				    BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE) |
+				    BIT(DEVLINK_RELOAD_ACTION_FW_LIVE_PATCH),
 	.reload_down = mlx5_devlink_reload_down,
 	.reload_up = mlx5_devlink_reload_up,
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst
  2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
                   ` (11 preceding siblings ...)
  2020-08-17  9:37 ` [PATCH net-next RFC v2 12/13] net/mlx5: Add support for devlink reload action live patch Moshe Shemesh
@ 2020-08-17  9:37 ` Moshe Shemesh
  2020-08-17 16:39   ` Jiri Pirko
  12 siblings, 1 reply; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-17  9:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev, linux-kernel, Moshe Shemesh

Add devlink reload rst documentation file.
Update index file to include it.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
v1 -> v2:
- Instead of reload levels driver,fw_reset,fw_live_patch have reload
  actions driver_reinit,fw_activate,fw_live_patch
---
 .../networking/devlink/devlink-reload.rst     | 54 +++++++++++++++++++
 Documentation/networking/devlink/index.rst    |  1 +
 2 files changed, 55 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-reload.rst

diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst
new file mode 100644
index 000000000000..9846ea727f3b
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-reload.rst
@@ -0,0 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Devlink Reload
+==============
+
+``devlink-reload`` provides a mechanism to either reload driver entities,
+applying new ``devlink-params`` and ``devlink-resources`` values, or activate
+firmware, depending on the reload action selected.
+
+Reload actions
+==============
+
+The user may select a reload action.
+By default the ``driver_reinit`` action is performed.
+
+.. list-table:: Possible reload actions
+   :widths: 5 90
+
+   * - Name
+     - Description
+   * - ``driver_reinit``
+     - Driver entities re-initialization, including applying
+       new values to devlink entities which are used during driver
+       load such as ``devlink-params`` in configuration mode
+       ``driverinit`` or ``devlink-resources``
+   * - ``fw_activate``
+     - Firmware activate. Can be used for firmware reload or firmware
+       upgrade if new firmware is stored and driver supports such
+       firmware upgrade.
+   * - ``fw_live_patch``
+     - Firmware live patch, applies firmware changes without reset.
+
+Change namespace
+================
+
+All devlink instances are created in init_net and normally stay there
+for their lifetime. The user can move devlink instances into other
+network namespaces during the devlink reload operation. That ensures
+proper re-instantiation of driver objects, including netdevices.
+
+Example usage
+-------------
+
+.. code:: shell
+
+    $ devlink dev reload help
+    $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { fw_live_patch | driver_reinit | fw_activate } ]
+
+    # Run reload command for devlink driver entities re-initialization:
+    $ devlink dev reload pci/0000:82:00.0 action driver_reinit
+
+    # Run reload command to activate firmware:
+    $ devlink dev reload pci/0000:82:00.0 action fw_activate
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 7684ae5c4a4a..d82874760ae2 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -20,6 +20,7 @@ general.
    devlink-params
    devlink-region
    devlink-resource
+   devlink-reload
    devlink-trap
 
 Driver-specific documentation
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-17  9:37 ` [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Moshe Shemesh
@ 2020-08-17 16:16   ` Jakub Kicinski
  2020-08-18  9:06     ` Moshe Shemesh
  2020-08-17 16:36   ` Jiri Pirko
  1 sibling, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2020-08-17 16:16 UTC (permalink / raw)
  To: Moshe Shemesh; +Cc: David S. Miller, Jiri Pirko, netdev, linux-kernel

On Mon, 17 Aug 2020 12:37:40 +0300 Moshe Shemesh wrote:
> Add devlink reload action to allow the user to request a specific reload
> action. The action parameter is optional, if not specified then devlink
> driver re-init action is used (backward compatible).
> Note that when required to do firmware activation some drivers may need
> to reload the driver. On the other hand some drivers may need to reset
> the firmware to reinitialize the driver entities.

See, this is why I wanted to keep --live as a separate option. 
Normally the driver is okay to satisfy more actions than requested, 
e.g. activate FW even if only driver_reinit was requested.

fw_live_patch does not have these semantics, it explicitly requires
the driver to not impact connectivity much. No "can do more resets than
requested" here. Hence the --live part would be better off as a
separate argument (at least in uAPI, the in-kernel interface we can
change later if needed).

> Reload actions supported are:
> driver_reinit: driver entities re-initialization, applying devlink-param
>                and devlink-resource values.
> fw_activate: firmware activate.
> fw_live_patch: firmware live patching.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-17  9:37 ` [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Moshe Shemesh
  2020-08-17 16:16   ` Jakub Kicinski
@ 2020-08-17 16:36   ` Jiri Pirko
  2020-08-18  9:10     ` Moshe Shemesh
  1 sibling, 1 reply; 30+ messages in thread
From: Jiri Pirko @ 2020-08-17 16:36 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: David S. Miller, Jakub Kicinski, Jiri Pirko, netdev, linux-kernel

Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:
>Add devlink reload action to allow the user to request a specific reload
>action. The action parameter is optional, if not specified then devlink
>driver re-init action is used (backward compatible).
>Note that when required to do firmware activation some drivers may need
>to reload the driver. On the other hand some drivers may need to reset

Sounds reasonable. I think it would be good to indicate that though. Not
sure how...


>the firmware to reinitialize the driver entities.
>Reload actions supported are:
>driver_reinit: driver entities re-initialization, applying devlink-param
>               and devlink-resource values.
>fw_activate: firmware activate.
>fw_live_patch: firmware live patching.
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst
  2020-08-17  9:37 ` [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst Moshe Shemesh
@ 2020-08-17 16:39   ` Jiri Pirko
  2020-08-18  9:14     ` Moshe Shemesh
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Pirko @ 2020-08-17 16:39 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: David S. Miller, Jakub Kicinski, Jiri Pirko, netdev, linux-kernel

Mon, Aug 17, 2020 at 11:37:52AM CEST, moshe@mellanox.com wrote:
>Add devlink reload rst documentation file.
>Update index file to include it.
>
>Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
>---
>- Instead of reload levels driver,fw_reset,fw_live_patch have reload
>  actions driver_reinit,fw_activate,fw_live_patch
>---
> .../networking/devlink/devlink-reload.rst     | 54 +++++++++++++++++++
> Documentation/networking/devlink/index.rst    |  1 +
> 2 files changed, 55 insertions(+)
> create mode 100644 Documentation/networking/devlink/devlink-reload.rst
>
>diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst
>new file mode 100644
>index 000000000000..9846ea727f3b
>--- /dev/null
>+++ b/Documentation/networking/devlink/devlink-reload.rst
>@@ -0,0 +1,54 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+==============
>+Devlink Reload
>+==============
>+
>+``devlink-reload`` provides mechanism to either reload driver entities,
>+applying ``devlink-params`` and ``devlink-resources`` new values or firmware
>+activation depends on reload action selected.
>+
>+Reload actions
>+=============
>+
>+User may select a reload action.
>+By default ``driver_reinit`` action is done.
>+
>+.. list-table:: Possible reload actions
>+   :widths: 5 90
>+
>+   * - Name
>+     - Description
>+   * - ``driver-reinit``
>+     - Driver entities re-initialization, including applying
>+       new values to devlink entities which are used during driver
>+       load such as ``devlink-params`` in configuration mode
>+       ``driverinit`` or ``devlink-resources``
>+   * - ``fw_activate``
>+     - Firmware activate. Can be used for firmware reload or firmware
>+       upgrade if new firmware is stored and driver supports such
>+       firmware upgrade.

Does this do the same as "driver-reinit" + fw activation? If yes, it
should be written here. If no, it should be written here as well.


>+   * - ``fw_live_patch``
>+     - Firmware live patch, applies firmware changes without reset.
>+
>+Change namespace
>+================
>+
>+All devlink instances are created in init_net and stay there for a
>+lifetime. Allow user to be able to move devlink instances into
>+namespaces during devlink reload operation. That ensures proper
>+re-instantiation of driver objects, including netdevices.
>+
>+example usage
>+-------------
>+
>+.. code:: shell
>+
>+    $ devlink dev reload help
>+    $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { fw_live_patch | driver_reinit | fw_activate } ]
>+
>+    # Run reload command for devlink driver entities re-initialization:
>+    $ devlink dev reload pci/0000:82:00.0 action driver_reinit
>+
>+    # Run reload command to activate firmware:
>+    $ devlink dev reload pci/0000:82:00.0 action fw_activate
>diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>index 7684ae5c4a4a..d82874760ae2 100644
>--- a/Documentation/networking/devlink/index.rst
>+++ b/Documentation/networking/devlink/index.rst
>@@ -20,6 +20,7 @@ general.
>    devlink-params
>    devlink-region
>    devlink-resource
>+   devlink-reload
>    devlink-trap
> 
> Driver-specific documentation
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-17 16:16   ` Jakub Kicinski
@ 2020-08-18  9:06     ` Moshe Shemesh
  2020-08-18 15:37       ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-18  9:06 UTC (permalink / raw)
  To: Jakub Kicinski, Moshe Shemesh
  Cc: David S. Miller, Jiri Pirko, netdev, linux-kernel


On 8/17/2020 7:16 PM, Jakub Kicinski wrote:
>
> On Mon, 17 Aug 2020 12:37:40 +0300 Moshe Shemesh wrote:
>> Add devlink reload action to allow the user to request a specific reload
>> action. The action parameter is optional, if not specified then devlink
>> driver re-init action is used (backward compatible).
>> Note that when required to do firmware activation some drivers may need
>> to reload the driver. On the other hand some drivers may need to reset
>> the firmware to reinitialize the driver entities.
> See, this is why I wanted to keep --live as a separate option.
> Normally the driver is okay to satisfy more actions than requested,
> e.g. activate FW even if only driver_reinit was requested.
>
> fw_live_patch does not have this semantics, it explicitly requires
> driver to not impact connectivity much. No "can do more resets than
> requested" here. Hence the --live part would be better off as a
> separate argument (at least in uAPI, the in-kernel interface we can
> change later if needed).


Yes, it does have different semantics, a kind of "no reset allowed".

On the other hand, it is not related to driver_reinit, only to fw_activate.

So the uAPI should be:

     devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { driver_reinit | fw_activate [--live] } ]

Or maybe better than "live", say explicitly "no reset":

     devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { driver_reinit | fw_activate [--no_reset] } ]


>> Reload actions supported are:
>> driver_reinit: driver entities re-initialization, applying devlink-param
>>                 and devlink-resource values.
>> fw_activate: firmware activate.
>> fw_live_patch: firmware live patching.
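
Note for readers: the semantics discussed in this reply can be sketched
in userspace C. Everything here is hypothetical, including the enum, the
no_reset flag and both helpers. It only illustrates the rule under
debate: a driver may perform more actions than requested, except when a
"no reset" modifier accompanies fw_activate.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1u << (n))

/* Hypothetical action ids for this sketch only. */
enum action { ACTION_DRIVER_REINIT, ACTION_FW_ACTIVATE };

/* A request is well formed if no_reset accompanies fw_activate only. */
static bool request_valid(enum action requested, bool no_reset)
{
	return !no_reset || requested == ACTION_FW_ACTIVATE;
}

/*
 * performed is a bitmask of the actions the driver actually carried out.
 * Without no_reset, extra actions are tolerated; with it, the driver
 * must have done exactly what was asked and nothing more.
 */
static bool result_acceptable(enum action requested, bool no_reset,
			      unsigned int performed)
{
	assert(request_valid(requested, no_reset));

	if (!(performed & BIT(requested)))
		return false; /* the requested action is missing */
	if (no_reset)
		return performed == BIT(requested);
	return true;
}
```

Under this model, activating firmware while also re-initializing the
driver satisfies a plain driver_reinit request but violates a
fw_activate request that carries the no-reset modifier.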

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-17 16:36   ` Jiri Pirko
@ 2020-08-18  9:10     ` Moshe Shemesh
  2020-08-19  0:10       ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-18  9:10 UTC (permalink / raw)
  To: Jiri Pirko, Moshe Shemesh
  Cc: David S. Miller, Jakub Kicinski, Jiri Pirko, netdev, linux-kernel


On 8/17/2020 7:36 PM, Jiri Pirko wrote:
> Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:
>> Add devlink reload action to allow the user to request a specific reload
>> action. The action parameter is optional, if not specified then devlink
>> driver re-init action is used (backward compatible).
>> Note that when required to do firmware activation some drivers may need
>> to reload the driver. On the other hand some drivers may need to reset
> Sounds reasonable. I think it would be good to indicate that though. Not
> sure how...


Maybe counters of the actions done? Such counters could also be useful
for debugging, showing which reloads have happened since the driver came up.

>
>> the firmware to reinitialize the driver entities.
>> Reload actions supported are:
>> driver_reinit: driver entities re-initialization, applying devlink-param
>>                and devlink-resource values.
>> fw_activate: firmware activate.
>> fw_live_patch: firmware live patching.
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst
  2020-08-17 16:39   ` Jiri Pirko
@ 2020-08-18  9:14     ` Moshe Shemesh
  2020-08-18 11:07       ` Jiri Pirko
  0 siblings, 1 reply; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-18  9:14 UTC (permalink / raw)
  To: Jiri Pirko, Moshe Shemesh
  Cc: David S. Miller, Jakub Kicinski, Jiri Pirko, netdev, linux-kernel


On 8/17/2020 7:39 PM, Jiri Pirko wrote:
> Mon, Aug 17, 2020 at 11:37:52AM CEST, moshe@mellanox.com wrote:
>> Add devlink reload rst documentation file.
>> Update index file to include it.
>>
>> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
>> ---
>> - Instead of reload levels driver,fw_reset,fw_live_patch have reload
>>   actions driver_reinit,fw_activate,fw_live_patch
>> ---
>> .../networking/devlink/devlink-reload.rst     | 54 +++++++++++++++++++
>> Documentation/networking/devlink/index.rst    |  1 +
>> 2 files changed, 55 insertions(+)
>> create mode 100644 Documentation/networking/devlink/devlink-reload.rst
>>
>> diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst
>> new file mode 100644
>> index 000000000000..9846ea727f3b
>> --- /dev/null
>> +++ b/Documentation/networking/devlink/devlink-reload.rst
>> @@ -0,0 +1,54 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============
>> +Devlink Reload
>> +==============
>> +
>> +``devlink-reload`` provides mechanism to either reload driver entities,
>> +applying ``devlink-params`` and ``devlink-resources`` new values or firmware
>> +activation depends on reload action selected.
>> +
>> +Reload actions
>> +==============
>> +
>> +User may select a reload action.
>> +By default ``driver_reinit`` action is done.
>> +
>> +.. list-table:: Possible reload actions
>> +   :widths: 5 90
>> +
>> +   * - Name
>> +     - Description
>> +   * - ``driver_reinit``
>> +     - Driver entities re-initialization, including applying
>> +       new values to devlink entities which are used during driver
>> +       load such as ``devlink-params`` in configuration mode
>> +       ``driverinit`` or ``devlink-resources``
>> +   * - ``fw_activate``
>> +     - Firmware activate. Can be used for firmware reload or firmware
>> +       upgrade if new firmware is stored and driver supports such
>> +       firmware upgrade.
> Does this do the same as "driver-reinit" + fw activation? If yes, it
> should be written here. If no, it should be written here as well.
>

No. The only thing required here is firmware activation. If a driver
needs to reload and re-initialize to make that happen, that is fine,
but it is not required.

>> +   * - ``fw_live_patch``
>> +     - Firmware live patch, applies firmware changes without reset.
>> +
>> +Change namespace
>> +================
>> +
>> +All devlink instances are created in init_net and stay there for a
>> +lifetime. Allow user to be able to move devlink instances into
>> +namespaces during devlink reload operation. That ensures proper
>> +re-instantiation of driver objects, including netdevices.
>> +
>> +example usage
>> +-------------
>> +
>> +.. code:: shell
>> +
>> +    $ devlink dev reload help
>> +    $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { fw_live_patch | driver_reinit | fw_activate } ]
>> +
>> +    # Run reload command for devlink driver entities re-initialization:
>> +    $ devlink dev reload pci/0000:82:00.0 action driver_reinit
>> +
>> +    # Run reload command to activate firmware:
>> +    $ devlink dev reload pci/0000:82:00.0 action fw_activate
>> diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>> index 7684ae5c4a4a..d82874760ae2 100644
>> --- a/Documentation/networking/devlink/index.rst
>> +++ b/Documentation/networking/devlink/index.rst
>> @@ -20,6 +20,7 @@ general.
>>     devlink-params
>>     devlink-region
>>     devlink-resource
>> +   devlink-reload
>>     devlink-trap
>>
>> Driver-specific documentation
>> -- 
>> 2.17.1
>>


* Re: [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst
  2020-08-18  9:14     ` Moshe Shemesh
@ 2020-08-18 11:07       ` Jiri Pirko
  2020-08-18 20:04         ` Moshe Shemesh
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Pirko @ 2020-08-18 11:07 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: Moshe Shemesh, David S. Miller, Jakub Kicinski, Jiri Pirko,
	netdev, linux-kernel

Tue, Aug 18, 2020 at 11:14:16AM CEST, moshe@nvidia.com wrote:
>
>On 8/17/2020 7:39 PM, Jiri Pirko wrote:
>> Mon, Aug 17, 2020 at 11:37:52AM CEST, moshe@mellanox.com wrote:
>> > Add devlink reload rst documentation file.
>> > Update index file to include it.
>> > 
>> > Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
>> > ---
>> > - Instead of reload levels driver,fw_reset,fw_live_patch have reload
>> >   actions driver_reinit,fw_activate,fw_live_patch
>> > ---
>> > .../networking/devlink/devlink-reload.rst     | 54 +++++++++++++++++++
>> > Documentation/networking/devlink/index.rst    |  1 +
>> > 2 files changed, 55 insertions(+)
>> > create mode 100644 Documentation/networking/devlink/devlink-reload.rst
>> > 
>> > diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst
>> > new file mode 100644
>> > index 000000000000..9846ea727f3b
>> > --- /dev/null
>> > +++ b/Documentation/networking/devlink/devlink-reload.rst
>> > @@ -0,0 +1,54 @@
>> > +.. SPDX-License-Identifier: GPL-2.0
>> > +
>> > +==============
>> > +Devlink Reload
>> > +==============
>> > +
>> > +``devlink-reload`` provides mechanism to either reload driver entities,
>> > +applying ``devlink-params`` and ``devlink-resources`` new values or firmware
>> > +activation depends on reload action selected.
>> > +
>> > +Reload actions
>> > +==============
>> > +
>> > +User may select a reload action.
>> > +By default ``driver_reinit`` action is done.
>> > +
>> > +.. list-table:: Possible reload actions
>> > +   :widths: 5 90
>> > +
>> > +   * - Name
>> > +     - Description
>> > +   * - ``driver_reinit``
>> > +     - Driver entities re-initialization, including applying
>> > +       new values to devlink entities which are used during driver
>> > +       load such as ``devlink-params`` in configuration mode
>> > +       ``driverinit`` or ``devlink-resources``
>> > +   * - ``fw_activate``
>> > +     - Firmware activate. Can be used for firmware reload or firmware
>> > +       upgrade if new firmware is stored and driver supports such
>> > +       firmware upgrade.
>> Does this do the same as "driver-reinit" + fw activation? If yes, it
>> should be written here. If no, it should be written here as well.
>> 
>
>No, The only thing required here is the action of firmware activation. If a
>driver needs to do reload to make that happen and do reinit that's ok, but
>not required.

What does the "FW activation" mean? I believe that this needs explicit
documentation here.


>
>> > +   * - ``fw_live_patch``
>> > +     - Firmware live patch, applies firmware changes without reset.
>> > +
>> > +Change namespace
>> > +================
>> > +
>> > +All devlink instances are created in init_net and stay there for a
>> > +lifetime. Allow user to be able to move devlink instances into
>> > +namespaces during devlink reload operation. That ensures proper
>> > +re-instantiation of driver objects, including netdevices.
>> > +
>> > +example usage
>> > +-------------
>> > +
>> > +.. code:: shell
>> > +
>> > +    $ devlink dev reload help
>> > +    $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { fw_live_patch | driver_reinit | fw_activate } ]
>> > +
>> > +    # Run reload command for devlink driver entities re-initialization:
>> > +    $ devlink dev reload pci/0000:82:00.0 action driver_reinit
>> > +
>> > +    # Run reload command to activate firmware:
>> > +    $ devlink dev reload pci/0000:82:00.0 action fw_activate
>> > diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>> > index 7684ae5c4a4a..d82874760ae2 100644
>> > --- a/Documentation/networking/devlink/index.rst
>> > +++ b/Documentation/networking/devlink/index.rst
>> > @@ -20,6 +20,7 @@ general.
>> >     devlink-params
>> >     devlink-region
>> >     devlink-resource
>> > +   devlink-reload
>> >     devlink-trap
>> > 
>> > Driver-specific documentation
>> > -- 
>> > 2.17.1
>> > 


* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-18  9:06     ` Moshe Shemesh
@ 2020-08-18 15:37       ` Jakub Kicinski
  0 siblings, 0 replies; 30+ messages in thread
From: Jakub Kicinski @ 2020-08-18 15:37 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: Moshe Shemesh, David S. Miller, Jiri Pirko, netdev, linux-kernel

On Tue, 18 Aug 2020 12:06:13 +0300 Moshe Shemesh wrote:
> Or maybe better than "live" say explicitly "no reset":
> 
>      devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { driver_reinit | fw_activate [--no_reset] } ]

SGTM


* Re: [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst
  2020-08-18 11:07       ` Jiri Pirko
@ 2020-08-18 20:04         ` Moshe Shemesh
  0 siblings, 0 replies; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-18 20:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Moshe Shemesh, David S. Miller, Jakub Kicinski, Jiri Pirko,
	netdev, linux-kernel


On 8/18/2020 2:07 PM, Jiri Pirko wrote:
> Tue, Aug 18, 2020 at 11:14:16AM CEST, moshe@nvidia.com wrote:
>> On 8/17/2020 7:39 PM, Jiri Pirko wrote:
>>> Mon, Aug 17, 2020 at 11:37:52AM CEST, moshe@mellanox.com wrote:
>>>> Add devlink reload rst documentation file.
>>>> Update index file to include it.
>>>>
>>>> Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
>>>> ---
>>>> - Instead of reload levels driver,fw_reset,fw_live_patch have reload
>>>>    actions driver_reinit,fw_activate,fw_live_patch
>>>> ---
>>>> .../networking/devlink/devlink-reload.rst     | 54 +++++++++++++++++++
>>>> Documentation/networking/devlink/index.rst    |  1 +
>>>> 2 files changed, 55 insertions(+)
>>>> create mode 100644 Documentation/networking/devlink/devlink-reload.rst
>>>>
>>>> diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst
>>>> new file mode 100644
>>>> index 000000000000..9846ea727f3b
>>>> --- /dev/null
>>>> +++ b/Documentation/networking/devlink/devlink-reload.rst
>>>> @@ -0,0 +1,54 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0
>>>> +
>>>> +==============
>>>> +Devlink Reload
>>>> +==============
>>>> +
>>>> +``devlink-reload`` provides mechanism to either reload driver entities,
>>>> +applying ``devlink-params`` and ``devlink-resources`` new values or firmware
>>>> +activation depends on reload action selected.
>>>> +
>>>> +Reload actions
>>>> +=============
>>>> +
>>>> +User may select a reload action.
>>>> +By default ``driver_reinit`` action is done.
>>>> +
>>>> +.. list-table:: Possible reload actions
>>>> +   :widths: 5 90
>>>> +
>>>> +   * - Name
>>>> +     - Description
>>>> +   * - ``driver_reinit``
>>>> +     - Driver entities re-initialization, including applying
>>>> +       new values to devlink entities which are used during driver
>>>> +       load such as ``devlink-params`` in configuration mode
>>>> +       ``driverinit`` or ``devlink-resources``
>>>> +   * - ``fw_activate``
>>>> +     - Firmware activate. Can be used for firmware reload or firmware
>>>> +       upgrade if new firmware is stored and driver supports such
>>>> +       firmware upgrade.
>>> Does this do the same as "driver-reinit" + fw activation? If yes, it
>>> should be written here. If no, it should be written here as well.
>>>
>> No, The only thing required here is the action of firmware activation. If a
>> driver needs to do reload to make that happen and do reinit that's ok, but
>> not required.
> What does the "FW activation" mean? I believe that this needs explicit
> documentation here.
>
I will add it explicitly.

FW activation means a FW upgrade if a new image is pending activation.
If no FW image is pending, it reloads the current FW.
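
As a rough illustration of that definition, the sketch below models a device with a running firmware version and an optional pending image. `FirmwareState` and `fw_activate` are hypothetical names used only for this example and do not correspond to any driver or devlink code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirmwareState:
    running: str                   # version currently executing
    pending: Optional[str] = None  # stored image awaiting activation, if any


def fw_activate(state: FirmwareState) -> str:
    """Apply the semantics described above and report what happened."""
    if state.pending is not None:
        # A new image is pending: activation is a firmware upgrade.
        state.running, state.pending = state.pending, None
        return "upgraded"
    # Nothing pending: activation reloads the current firmware.
    return "reloaded"
```

Calling `fw_activate` twice in a row on a device with a pending image would first upgrade and then simply reload, since nothing is pending the second time.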

>>>> +   * - ``fw_live_patch``
>>>> +     - Firmware live patch, applies firmware changes without reset.
>>>> +
>>>> +Change namespace
>>>> +================
>>>> +
>>>> +All devlink instances are created in init_net and stay there for a
>>>> +lifetime. Allow user to be able to move devlink instances into
>>>> +namespaces during devlink reload operation. That ensures proper
>>>> +re-instantiation of driver objects, including netdevices.
>>>> +
>>>> +example usage
>>>> +-------------
>>>> +
>>>> +.. code:: shell
>>>> +
>>>> +    $ devlink dev reload help
>>>> +    $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { fw_live_patch | driver_reinit | fw_activate } ]
>>>> +
>>>> +    # Run reload command for devlink driver entities re-initialization:
>>>> +    $ devlink dev reload pci/0000:82:00.0 action driver_reinit
>>>> +
>>>> +    # Run reload command to activate firmware:
>>>> +    $ devlink dev reload pci/0000:82:00.0 action fw_activate
>>>> diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>>>> index 7684ae5c4a4a..d82874760ae2 100644
>>>> --- a/Documentation/networking/devlink/index.rst
>>>> +++ b/Documentation/networking/devlink/index.rst
>>>> @@ -20,6 +20,7 @@ general.
>>>>      devlink-params
>>>>      devlink-region
>>>>      devlink-resource
>>>> +   devlink-reload
>>>>      devlink-trap
>>>>
>>>> Driver-specific documentation
>>>> -- 
>>>> 2.17.1
>>>>


* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-18  9:10     ` Moshe Shemesh
@ 2020-08-19  0:10       ` Jakub Kicinski
  2020-08-19 12:18         ` Moshe Shemesh
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2020-08-19  0:10 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: Jiri Pirko, Moshe Shemesh, David S. Miller, Jiri Pirko, netdev,
	linux-kernel

On Tue, 18 Aug 2020 12:10:36 +0300 Moshe Shemesh wrote:
> On 8/17/2020 7:36 PM, Jiri Pirko wrote:
> > Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:  
> >> Add devlink reload action to allow the user to request a specific reload
> >> action. The action parameter is optional, if not specified then devlink
> >> driver re-init action is used (backward compatible).
> >> Note that when required to do firmware activation some drivers may need
> >> to reload the driver. On the other hand some drivers may need to reset  
> > Sounds reasonable. I think it would be good to indicate that though. Not
> > sure how...  
> 
> Maybe counters on the actions done ? Actually such counters can be 
> useful on debug, knowing what reloads we had since driver was up.

Wouldn't we need to know all the types of reset that drivers may do?

I think documenting this clearly should be sufficient.

A reset counter for the _requested_ reset type (fully maintained by
core), however - that may be useful. The question "why did this NIC
reset itself / why did the link just flap" comes up repeatedly.
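
A minimal sketch of what a core-maintained counter for the requested action might look like, assuming the core records the request before calling into the driver. `ReloadStats` and `record_request` are hypothetical names for illustration, not existing devlink structures.

```python
from collections import Counter


class ReloadStats:
    """Hypothetical per-device bookkeeping kept by the devlink core."""

    def __init__(self):
        self.requested = Counter()

    def record_request(self, action="driver_reinit"):
        # Count what the user asked for, before the driver is called,
        # so the driver cannot forget or misreport it.
        self.requested[action] += 1
        return self.requested[action]
```

Because the driver is not involved, the count reflects only what was requested, not any additional resets a driver chose to perform.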


* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-19  0:10       ` Jakub Kicinski
@ 2020-08-19 12:18         ` Moshe Shemesh
  2020-08-19 12:46           ` Jiri Pirko
  0 siblings, 1 reply; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-19 12:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Moshe Shemesh, David S. Miller, Jiri Pirko, netdev,
	linux-kernel


On 8/19/2020 3:10 AM, Jakub Kicinski wrote:
>
> On Tue, 18 Aug 2020 12:10:36 +0300 Moshe Shemesh wrote:
>> On 8/17/2020 7:36 PM, Jiri Pirko wrote:
>>> Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:
>>>> Add devlink reload action to allow the user to request a specific reload
>>>> action. The action parameter is optional, if not specified then devlink
>>>> driver re-init action is used (backward compatible).
>>>> Note that when required to do firmware activation some drivers may need
>>>> to reload the driver. On the other hand some drivers may need to reset
>>> Sounds reasonable. I think it would be good to indicate that though. Not
>>> sure how...
>> Maybe counters on the actions done ? Actually such counters can be
>> useful on debug, knowing what reloads we had since driver was up.
> Wouldn't we need to know all types of reset of drivers may do?


Right, we can't know every reset type a driver may have, but we can tell
which reload actions were done.

> I think documenting this clearly should be sufficient.
>
> A reset counter for the _requested_ reset type (fully maintained by
> core), however - that may be useful. The question "why did this NIC
> reset itself / why did the link just flap" comes up repeatedly.


I will add counters for which reload actions were done. reload_down()/up()
can return which actions were actually performed, and devlink will show
the counters.
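
The idea could be sketched as follows, assuming the driver callback reports the set of actions it actually performed and the core adds them to per-device counters. `do_reload` and `example_driver_reload` are hypothetical stand-ins, not the real reload_down()/reload_up() signatures.

```python
from collections import Counter


def do_reload(driver_reload, requested, performed_counts):
    """Run one reload and count every action the driver reports as done."""
    performed = set(driver_reload(requested))
    if requested not in performed:
        raise RuntimeError(f"driver did not perform {requested}")
    # Counter.update() adds one for each action in the set.
    performed_counts.update(performed)
    return performed


def example_driver_reload(requested):
    # Assumed behaviour: this driver must re-initialize itself
    # whenever it activates firmware.
    if requested == "fw_activate":
        return {"fw_activate", "driver_reinit"}
    return {requested}
```

With a driver like the one assumed here, a single fw_activate request bumps both the fw_activate and driver_reinit counters, which is the kind of side effect the counters are meant to expose.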



* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-19 12:18         ` Moshe Shemesh
@ 2020-08-19 12:46           ` Jiri Pirko
  2020-08-19 14:23             ` Moshe Shemesh
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Pirko @ 2020-08-19 12:46 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: Jakub Kicinski, Moshe Shemesh, David S. Miller, Jiri Pirko,
	netdev, linux-kernel

Wed, Aug 19, 2020 at 02:18:22PM CEST, moshe@nvidia.com wrote:
>
>On 8/19/2020 3:10 AM, Jakub Kicinski wrote:
>> 
>> On Tue, 18 Aug 2020 12:10:36 +0300 Moshe Shemesh wrote:
>> > On 8/17/2020 7:36 PM, Jiri Pirko wrote:
>> > > Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:
>> > > > Add devlink reload action to allow the user to request a specific reload
>> > > > action. The action parameter is optional, if not specified then devlink
>> > > > driver re-init action is used (backward compatible).
>> > > > Note that when required to do firmware activation some drivers may need
>> > > > to reload the driver. On the other hand some drivers may need to reset
>> > > Sounds reasonable. I think it would be good to indicate that though. Not
>> > > sure how...
>> > Maybe counters on the actions done ? Actually such counters can be
>> > useful on debug, knowing what reloads we had since driver was up.
>> Wouldn't we need to know all types of reset of drivers may do?
>
>
>Right, we can't tell all reset types driver may have, but we can tell which
>reload actions were done.
>
>> I think documenting this clearly should be sufficient.
>> 
>> A reset counter for the _requested_ reset type (fully maintained by
>> core), however - that may be useful. The question "why did this NIC
>> reset itself / why did the link just flap" comes up repeatedly.
>
>
>I will add counters on which reload were done. reload_down()/up() can return
>which actions were actually done and devlink will show counters.

Why a counter? Just return what was done over netlink reply.

>


* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-19 12:46           ` Jiri Pirko
@ 2020-08-19 14:23             ` Moshe Shemesh
  2020-08-19 15:18               ` Jiri Pirko
  0 siblings, 1 reply; 30+ messages in thread
From: Moshe Shemesh @ 2020-08-19 14:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Moshe Shemesh, David S. Miller, Jiri Pirko,
	netdev, linux-kernel


On 8/19/2020 3:46 PM, Jiri Pirko wrote:
> Wed, Aug 19, 2020 at 02:18:22PM CEST, moshe@nvidia.com wrote:
>> On 8/19/2020 3:10 AM, Jakub Kicinski wrote:
>>> On Tue, 18 Aug 2020 12:10:36 +0300 Moshe Shemesh wrote:
>>>> On 8/17/2020 7:36 PM, Jiri Pirko wrote:
>>>>> Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:
>>>>>> Add devlink reload action to allow the user to request a specific reload
>>>>>> action. The action parameter is optional, if not specified then devlink
>>>>>> driver re-init action is used (backward compatible).
>>>>>> Note that when required to do firmware activation some drivers may need
>>>>>> to reload the driver. On the other hand some drivers may need to reset
>>>>> Sounds reasonable. I think it would be good to indicate that though. Not
>>>>> sure how...
>>>> Maybe counters on the actions done ? Actually such counters can be
>>>> useful on debug, knowing what reloads we had since driver was up.
>>> Wouldn't we need to know all types of reset of drivers may do?
>>
>> Right, we can't tell all reset types driver may have, but we can tell which
>> reload actions were done.
>>
>>> I think documenting this clearly should be sufficient.
>>>
>>> A reset counter for the _requested_ reset type (fully maintained by
>>> core), however - that may be useful. The question "why did this NIC
>>> reset itself / why did the link just flap" comes up repeatedly.
>>
>> I will add counters on which reload were done. reload_down()/up() can return
>> which actions were actually done and devlink will show counters.
> Why a counter? Just return what was done over netlink reply.


Such counters can be useful for debugging, showing which reload actions
were done on this device since it came up.



* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-19 14:23             ` Moshe Shemesh
@ 2020-08-19 15:18               ` Jiri Pirko
  2020-08-19 16:25                 ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Pirko @ 2020-08-19 15:18 UTC (permalink / raw)
  To: Moshe Shemesh
  Cc: Jakub Kicinski, Moshe Shemesh, David S. Miller, Jiri Pirko,
	netdev, linux-kernel

Wed, Aug 19, 2020 at 04:23:25PM CEST, moshe@nvidia.com wrote:
>
>On 8/19/2020 3:46 PM, Jiri Pirko wrote:
>> Wed, Aug 19, 2020 at 02:18:22PM CEST, moshe@nvidia.com wrote:
>> > On 8/19/2020 3:10 AM, Jakub Kicinski wrote:
>> > > On Tue, 18 Aug 2020 12:10:36 +0300 Moshe Shemesh wrote:
>> > > > On 8/17/2020 7:36 PM, Jiri Pirko wrote:
>> > > > > Mon, Aug 17, 2020 at 11:37:40AM CEST, moshe@mellanox.com wrote:
>> > > > > > Add devlink reload action to allow the user to request a specific reload
>> > > > > > action. The action parameter is optional, if not specified then devlink
>> > > > > > driver re-init action is used (backward compatible).
>> > > > > > Note that when required to do firmware activation some drivers may need
>> > > > > > to reload the driver. On the other hand some drivers may need to reset
>> > > > > Sounds reasonable. I think it would be good to indicate that though. Not
>> > > > > sure how...
>> > > > Maybe counters on the actions done ? Actually such counters can be
>> > > > useful on debug, knowing what reloads we had since driver was up.
>> > > Wouldn't we need to know all types of reset of drivers may do?
>> > 
>> > Right, we can't tell all reset types driver may have, but we can tell which
>> > reload actions were done.
>> > 
>> > > I think documenting this clearly should be sufficient.
>> > > 
>> > > A reset counter for the _requested_ reset type (fully maintained by
>> > > core), however - that may be useful. The question "why did this NIC
>> > > reset itself / why did the link just flap" comes up repeatedly.
>> > 
>> > I will add counters on which reload were done. reload_down()/up() can return
>> > which actions were actually done and devlink will show counters.
>> Why a counter? Just return what was done over netlink reply.
>
>
>Such counters can be useful for debugging, telling which reload actions were
>done on this dev from the point it was up.

Not sure why this is any different from other commands...

>


* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-19 15:18               ` Jiri Pirko
@ 2020-08-19 16:25                 ` Jakub Kicinski
  2020-08-19 18:55                   ` Jiri Pirko
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2020-08-19 16:25 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Moshe Shemesh, Moshe Shemesh, David S. Miller, Jiri Pirko,
	netdev, linux-kernel

On Wed, 19 Aug 2020 17:18:15 +0200 Jiri Pirko wrote:
>>>> I will add counters on which reload were done. reload_down()/up() can return
>>>> which actions were actually done and devlink will show counters.  
>>> Why a counter? Just return what was done over netlink reply.  
>>
>> Such counters can be useful for debugging, telling which reload actions were
>> done on this dev from the point it was up.  
> 
> Not sure why this is any different from other commands...

Good question, perhaps because reset is more "dangerous"? The question
of "what reset this NIC" does come up in practice. With live activation
in the mix, knowing if the NIC FW was live activated will be very
useful for dissecting failures, I'd imagine.


* Re: [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command
  2020-08-19 16:25                 ` Jakub Kicinski
@ 2020-08-19 18:55                   ` Jiri Pirko
  0 siblings, 0 replies; 30+ messages in thread
From: Jiri Pirko @ 2020-08-19 18:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Moshe Shemesh, Moshe Shemesh, David S. Miller, Jiri Pirko,
	netdev, linux-kernel

Wed, Aug 19, 2020 at 06:25:51PM CEST, kuba@kernel.org wrote:
>On Wed, 19 Aug 2020 17:18:15 +0200 Jiri Pirko wrote:
>>>>> I will add counters on which reload were done. reload_down()/up() can return
>>>>> which actions were actually done and devlink will show counters.  
>>>> Why a counter? Just return what was done over netlink reply.  
>>>
>>> Such counters can be useful for debugging, telling which reload actions were
>>> done on this dev from the point it was up.  
>> 
>> Not sure why this is any different from other commands...
>
>Good question, perhaps because reset is more "dangerous"? The question
>of "what reset this NIC" does come up in practice. With live activation
>in the mix, knowing if the NIC FW was live activated will be very
>useful for dissecting failures, I'd imagine.

Okay, fair enough. Still, I think the info in the reply, as I
suggested, would also be nice to have while we are at it.


end of thread, other threads:[~2020-08-19 18:55 UTC | newest]

Thread overview: 30+ messages
2020-08-17  9:37 [PATCH net-next RFC v2 00/13] Add devlink reload action option Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 01/13] devlink: Add reload action option to devlink reload command Moshe Shemesh
2020-08-17 16:16   ` Jakub Kicinski
2020-08-18  9:06     ` Moshe Shemesh
2020-08-18 15:37       ` Jakub Kicinski
2020-08-17 16:36   ` Jiri Pirko
2020-08-18  9:10     ` Moshe Shemesh
2020-08-19  0:10       ` Jakub Kicinski
2020-08-19 12:18         ` Moshe Shemesh
2020-08-19 12:46           ` Jiri Pirko
2020-08-19 14:23             ` Moshe Shemesh
2020-08-19 15:18               ` Jiri Pirko
2020-08-19 16:25                 ` Jakub Kicinski
2020-08-19 18:55                   ` Jiri Pirko
2020-08-17  9:37 ` [PATCH net-next RFC v2 02/13] devlink: Add supported reload actions to dev get Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 03/13] net/mlx5: Add functions to set/query MFRL register Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 04/13] net/mlx5: Set cap for pci sync for fw update event Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 05/13] net/mlx5: Handle sync reset request event Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 06/13] net/mlx5: Handle sync reset now event Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 07/13] net/mlx5: Handle sync reset abort event Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 08/13] net/mlx5: Add support for devlink reload action fw activate Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 09/13] devlink: Add enable_remote_dev_reset generic parameter Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 11/13] net/mlx5: Add support for fw live patch event Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 12/13] net/mlx5: Add support for devlink reload action live patch Moshe Shemesh
2020-08-17  9:37 ` [PATCH net-next RFC v2 13/13] devlink: Add Documentation/networking/devlink/devlink-reload.rst Moshe Shemesh
2020-08-17 16:39   ` Jiri Pirko
2020-08-18  9:14     ` Moshe Shemesh
2020-08-18 11:07       ` Jiri Pirko
2020-08-18 20:04         ` Moshe Shemesh
