netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] Devlink health auto attributes refactor
@ 2020-03-25 13:26 Eran Ben Elisha
  2020-03-25 13:26 ` [PATCH net-next 1/2] devlink: Implicitly set auto recover flag when registering health reporter Eran Ben Elisha
  2020-03-25 13:26 ` [PATCH net-next 2/2] devlink: Add auto dump flag to " Eran Ben Elisha
  0 siblings, 2 replies; 12+ messages in thread
From: Eran Ben Elisha @ 2020-03-25 13:26 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed
  Cc: Eran Ben Elisha

This patchset refactors the auto-recover health reporter flag to be
explicitly set by the devlink core.
In addition, add another flag to control auto-dump attribute, also
to be explicitly set by the devlink core.

After reporter registration, both flags can be altered be administrator
only.

Eran Ben Elisha (2):
  devlink: Implicitly set auto recover flag when registering health
    reporter
  devlink: Add auto dump flag to health reporter

 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |  6 ++--
 .../mellanox/mlx5/core/en/reporter_rx.c       |  2 +-
 .../mellanox/mlx5/core/en/reporter_tx.c       |  2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |  4 +--
 drivers/net/netdevsim/health.c                |  4 +--
 include/net/devlink.h                         |  3 +-
 include/uapi/linux/devlink.h                  |  2 ++
 net/core/devlink.c                            | 35 +++++++++++++------
 8 files changed, 37 insertions(+), 21 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/2] devlink: Implicitly set auto recover flag when registering health reporter
  2020-03-25 13:26 [PATCH net-next 0/2] Devlink health auto attributes refactor Eran Ben Elisha
@ 2020-03-25 13:26 ` Eran Ben Elisha
  2020-03-25 18:33   ` Jakub Kicinski
  2020-03-25 13:26 ` [PATCH net-next 2/2] devlink: Add auto dump flag to " Eran Ben Elisha
  1 sibling, 1 reply; 12+ messages in thread
From: Eran Ben Elisha @ 2020-03-25 13:26 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed
  Cc: Eran Ben Elisha

When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.

Remove this flag from all drivers that called
devlink_health_reporter_create.

Yet, administrator can unset auto recover via netlink command as prior to
this patch.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c        | 6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c         | 4 ++--
 drivers/net/netdevsim/health.c                           | 4 ++--
 include/net/devlink.h                                    | 3 +--
 net/core/devlink.c                                       | 9 +++------
 7 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index d3c93ccee86a..c4c0d2259304 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -150,7 +150,7 @@ void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 	health->fw_reset_reporter =
 		devlink_health_reporter_create(bp->dl,
 					       &bnxt_dl_fw_reset_reporter_ops,
-					       0, true, bp);
+					       0, bp);
 	if (IS_ERR(health->fw_reset_reporter)) {
 		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
 			    PTR_ERR(health->fw_reset_reporter));
@@ -166,7 +166,7 @@ void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 		health->fw_reporter =
 			devlink_health_reporter_create(bp->dl,
 						       &bnxt_dl_fw_reporter_ops,
-						       0, false, bp);
+						       0, bp);
 		if (IS_ERR(health->fw_reporter)) {
 			netdev_warn(bp->dev, "Failed to create FW health reporter, rc = %ld\n",
 				    PTR_ERR(health->fw_reporter));
@@ -182,7 +182,7 @@ void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 	health->fw_fatal_reporter =
 		devlink_health_reporter_create(bp->dl,
 					       &bnxt_dl_fw_fatal_reporter_ops,
-					       0, true, bp);
+					       0, bp);
 	if (IS_ERR(health->fw_fatal_reporter)) {
 		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
 			    PTR_ERR(health->fw_fatal_reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index ce2b41c61090..09fcd843f808 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -571,7 +571,7 @@ int mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
 	reporter = devlink_health_reporter_create(devlink,
 						  &mlx5_rx_reporter_ops,
 						  MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
-						  true, priv);
+						  priv);
 	if (IS_ERR(reporter)) {
 		netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
 			    PTR_ERR(reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 2028ce9b151f..9805fc085512 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -416,7 +416,7 @@ int mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
 	reporter =
 		devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
 					       MLX5_REPORTER_TX_GRACEFUL_PERIOD,
-					       true, priv);
+					       priv);
 	if (IS_ERR(reporter)) {
 		netdev_warn(priv->netdev,
 			    "Failed to create tx reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index d9f4e8c59c1f..fa1665caac46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -627,7 +627,7 @@ static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 
 	health->fw_reporter =
 		devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
-					       0, false, dev);
+					       0, dev);
 	if (IS_ERR(health->fw_reporter))
 		mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
 			       PTR_ERR(health->fw_reporter));
@@ -636,7 +636,7 @@ static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 		devlink_health_reporter_create(devlink,
 					       &mlx5_fw_fatal_reporter_ops,
 					       MLX5_REPORTER_FW_GRACEFUL_PERIOD,
-					       true, dev);
+					       dev);
 	if (IS_ERR(health->fw_fatal_reporter))
 		mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n",
 			       PTR_ERR(health->fw_fatal_reporter));
diff --git a/drivers/net/netdevsim/health.c b/drivers/net/netdevsim/health.c
index ba8d9ad60feb..62958b238d50 100644
--- a/drivers/net/netdevsim/health.c
+++ b/drivers/net/netdevsim/health.c
@@ -271,14 +271,14 @@ int nsim_dev_health_init(struct nsim_dev *nsim_dev, struct devlink *devlink)
 	health->empty_reporter =
 		devlink_health_reporter_create(devlink,
 					       &nsim_dev_empty_reporter_ops,
-					       0, false, health);
+					       0, health);
 	if (IS_ERR(health->empty_reporter))
 		return PTR_ERR(health->empty_reporter);
 
 	health->dummy_reporter =
 		devlink_health_reporter_create(devlink,
 					       &nsim_dev_dummy_reporter_ops,
-					       0, false, health);
+					       0, health);
 	if (IS_ERR(health->dummy_reporter)) {
 		err = PTR_ERR(health->dummy_reporter);
 		goto err_empty_reporter_destroy;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 37230e23b5b0..28381763bd94 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1023,8 +1023,7 @@ int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
 struct devlink_health_reporter *
 devlink_health_reporter_create(struct devlink *devlink,
 			       const struct devlink_health_reporter_ops *ops,
-			       u64 graceful_period, bool auto_recover,
-			       void *priv);
+			       u64 graceful_period, void *priv);
 void
 devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 73bb8fbe3393..ad69379747ef 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4872,14 +4872,12 @@ devlink_health_reporter_find_by_name(struct devlink *devlink,
  *	@devlink: devlink
  *	@ops: ops
  *	@graceful_period: to avoid recovery loops, in msecs
- *	@auto_recover: auto recover when error occurs
  *	@priv: priv
  */
 struct devlink_health_reporter *
 devlink_health_reporter_create(struct devlink *devlink,
 			       const struct devlink_health_reporter_ops *ops,
-			       u64 graceful_period, bool auto_recover,
-			       void *priv)
+			       u64 graceful_period, void *priv)
 {
 	struct devlink_health_reporter *reporter;
 
@@ -4889,8 +4887,7 @@ devlink_health_reporter_create(struct devlink *devlink,
 		goto unlock;
 	}
 
-	if (WARN_ON(auto_recover && !ops->recover) ||
-	    WARN_ON(graceful_period && !ops->recover)) {
+	if (WARN_ON(graceful_period && !ops->recover)) {
 		reporter = ERR_PTR(-EINVAL);
 		goto unlock;
 	}
@@ -4905,7 +4902,7 @@ devlink_health_reporter_create(struct devlink *devlink,
 	reporter->ops = ops;
 	reporter->devlink = devlink;
 	reporter->graceful_period = graceful_period;
-	reporter->auto_recover = auto_recover;
+	reporter->auto_recover = !!ops->recover;
 	mutex_init(&reporter->dump_lock);
 	refcount_set(&reporter->refcount, 1);
 	list_add_tail(&reporter->list, &devlink->reporter_list);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-25 13:26 [PATCH net-next 0/2] Devlink health auto attributes refactor Eran Ben Elisha
  2020-03-25 13:26 ` [PATCH net-next 1/2] devlink: Implicitly set auto recover flag when registering health reporter Eran Ben Elisha
@ 2020-03-25 13:26 ` Eran Ben Elisha
  2020-03-25 18:45   ` Jakub Kicinski
  1 sibling, 1 reply; 12+ messages in thread
From: Eran Ben Elisha @ 2020-03-25 13:26 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed
  Cc: Eran Ben Elisha

On low memory system, run time dumps can consume too much memory. Add
administrator ability to disable auto dumps per reporter as part of the
error flow handle routine.

This attribute is not relevant while executing
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.

By default, auto dump is activated for any reporter that has a dump method,
as part of the reporter registration to devlink.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/devlink.h |  2 ++
 net/core/devlink.c           | 26 ++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index dfdffc42e87d..e7891d1d2ebd 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -429,6 +429,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_NETNS_FD,			/* u32 */
 	DEVLINK_ATTR_NETNS_PID,			/* u32 */
 	DEVLINK_ATTR_NETNS_ID,			/* u32 */
+
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index ad69379747ef..e14bf3052289 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
 	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
 	u64 graceful_period;
 	bool auto_recover;
+	bool auto_dump;
 	u8 health_state;
 	u64 dump_ts;
 	u64 dump_real_ts;
@@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
 	reporter->devlink = devlink;
 	reporter->graceful_period = graceful_period;
 	reporter->auto_recover = !!ops->recover;
+	reporter->auto_dump = !!ops->dump;
 	mutex_init(&reporter->dump_lock);
 	refcount_set(&reporter->refcount, 1);
 	list_add_tail(&reporter->list, &devlink->reporter_list);
@@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
 	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
 			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
 		goto reporter_nest_cancel;
+	if (reporter->ops->dump &&
+	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
+		       reporter->auto_dump))
+		goto reporter_nest_cancel;
 
 	nla_nest_end(msg, reporter_attr);
 	genlmsg_end(msg, hdr);
@@ -5129,10 +5135,12 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 
 	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
 
-	mutex_lock(&reporter->dump_lock);
-	/* store current dump of current error, for later analysis */
-	devlink_health_do_dump(reporter, priv_ctx, NULL);
-	mutex_unlock(&reporter->dump_lock);
+	if (reporter->auto_dump) {
+		mutex_lock(&reporter->dump_lock);
+		/* store current dump of current error, for later analysis */
+		devlink_health_do_dump(reporter, priv_ctx, NULL);
+		mutex_unlock(&reporter->dump_lock);
+	}
 
 	if (reporter->auto_recover)
 		return devlink_health_reporter_recover(reporter,
@@ -5306,6 +5314,11 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
 		err = -EOPNOTSUPP;
 		goto out;
 	}
+	if (!reporter->ops->dump &&
+	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
 
 	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
 		reporter->graceful_period =
@@ -5315,6 +5328,10 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
 		reporter->auto_recover =
 			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
 
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
+		reporter->auto_dump =
+		nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]);
+
 	devlink_health_reporter_put(reporter);
 	return 0;
 out:
@@ -6053,6 +6070,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 },
 	[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/2] devlink: Implicitly set auto recover flag when registering health reporter
  2020-03-25 13:26 ` [PATCH net-next 1/2] devlink: Implicitly set auto recover flag when registering health reporter Eran Ben Elisha
@ 2020-03-25 18:33   ` Jakub Kicinski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2020-03-25 18:33 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, Jiri Pirko, Michael Chan, David S. Miller, Saeed Mahameed

On Wed, 25 Mar 2020 15:26:23 +0200 Eran Ben Elisha wrote:
> When health reporter is registered to devlink, devlink will implicitly set
> auto recover if and only if the reporter has a recover method. No reason
> to explicitly get the auto recover flag from the driver.
> 
> Remove this flag from all drivers that called
> devlink_health_reporter_create.
> 
> Yet, administrator can unset auto recover via netlink command as prior to
> this patch.
> 
> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>

It would have been useful to say that all the cases where auto-recovery
was disabled don't have the recover method.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-25 13:26 ` [PATCH net-next 2/2] devlink: Add auto dump flag to " Eran Ben Elisha
@ 2020-03-25 18:45   ` Jakub Kicinski
  2020-03-25 19:08     ` Jiri Pirko
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2020-03-25 18:45 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, Jiri Pirko, Michael Chan, David S. Miller, Saeed Mahameed

On Wed, 25 Mar 2020 15:26:24 +0200 Eran Ben Elisha wrote:
> On low memory system, run time dumps can consume too much memory. Add
> administrator ability to disable auto dumps per reporter as part of the
> error flow handle routine.
> 
> This attribute is not relevant while executing
> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
> 
> By default, auto dump is activated for any reporter that has a dump method,
> as part of the reporter registration to devlink.
> 
> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/uapi/linux/devlink.h |  2 ++
>  net/core/devlink.c           | 26 ++++++++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index dfdffc42e87d..e7891d1d2ebd 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -429,6 +429,8 @@ enum devlink_attr {
>  	DEVLINK_ATTR_NETNS_FD,			/* u32 */
>  	DEVLINK_ATTR_NETNS_PID,			/* u32 */
>  	DEVLINK_ATTR_NETNS_ID,			/* u32 */
> +
> +	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
>  	/* add new attributes above here, update the policy in devlink.c */
>  
>  	__DEVLINK_ATTR_MAX,
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index ad69379747ef..e14bf3052289 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
>  	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
>  	u64 graceful_period;
>  	bool auto_recover;
> +	bool auto_dump;
>  	u8 health_state;
>  	u64 dump_ts;
>  	u64 dump_real_ts;
> @@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
>  	reporter->devlink = devlink;
>  	reporter->graceful_period = graceful_period;
>  	reporter->auto_recover = !!ops->recover;
> +	reporter->auto_dump = !!ops->dump;
>  	mutex_init(&reporter->dump_lock);
>  	refcount_set(&reporter->refcount, 1);
>  	list_add_tail(&reporter->list, &devlink->reporter_list);
> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
>  	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
>  			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
>  		goto reporter_nest_cancel;
> +	if (reporter->ops->dump &&
> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
> +		       reporter->auto_dump))
> +		goto reporter_nest_cancel;

Since you're making it a u8 - does it make sense to indicate to user
space whether the dump is disabled or not supported?

Right now no attribute means either old kernel or dump not possible..

>  	nla_nest_end(msg, reporter_attr);
>  	genlmsg_end(msg, hdr);
> @@ -5129,10 +5135,12 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
>  
>  	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
>  
> -	mutex_lock(&reporter->dump_lock);
> -	/* store current dump of current error, for later analysis */
> -	devlink_health_do_dump(reporter, priv_ctx, NULL);
> -	mutex_unlock(&reporter->dump_lock);
> +	if (reporter->auto_dump) {
> +		mutex_lock(&reporter->dump_lock);
> +		/* store current dump of current error, for later analysis */
> +		devlink_health_do_dump(reporter, priv_ctx, NULL);
> +		mutex_unlock(&reporter->dump_lock);
> +	}
>  
>  	if (reporter->auto_recover)
>  		return devlink_health_reporter_recover(reporter,
> @@ -5306,6 +5314,11 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
>  		err = -EOPNOTSUPP;
>  		goto out;
>  	}
> +	if (!reporter->ops->dump &&
> +	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) {

... and then this behavior may have to change, I think?

> +		err = -EOPNOTSUPP;
> +		goto out;
> +	}
>  
>  	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
>  		reporter->graceful_period =
> @@ -5315,6 +5328,10 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
>  		reporter->auto_recover =
>  			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
>  
> +	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
> +		reporter->auto_dump =
> +		nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]);
> +
>  	devlink_health_reporter_put(reporter);
>  	return 0;
>  out:
> @@ -6053,6 +6070,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>  	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
>  	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
>  	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
> +	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 },

I'd suggest we keep the attrs in order of definition, because we should
set .strict_start_type, and then it matters which are before and which
are after.

Also please set max value of 1.

>  	[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
>  	[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING },
>  	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-25 18:45   ` Jakub Kicinski
@ 2020-03-25 19:08     ` Jiri Pirko
  2020-03-25 19:38       ` Eran Ben Elisha
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Pirko @ 2020-03-25 19:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eran Ben Elisha, netdev, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed

Wed, Mar 25, 2020 at 07:45:29PM CET, kuba@kernel.org wrote:
>On Wed, 25 Mar 2020 15:26:24 +0200 Eran Ben Elisha wrote:
>> On low memory system, run time dumps can consume too much memory. Add
>> administrator ability to disable auto dumps per reporter as part of the
>> error flow handle routine.
>> 
>> This attribute is not relevant while executing
>> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
>> 
>> By default, auto dump is activated for any reporter that has a dump method,
>> as part of the reporter registration to devlink.
>> 
>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/uapi/linux/devlink.h |  2 ++
>>  net/core/devlink.c           | 26 ++++++++++++++++++++++----
>>  2 files changed, 24 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index dfdffc42e87d..e7891d1d2ebd 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -429,6 +429,8 @@ enum devlink_attr {
>>  	DEVLINK_ATTR_NETNS_FD,			/* u32 */
>>  	DEVLINK_ATTR_NETNS_PID,			/* u32 */
>>  	DEVLINK_ATTR_NETNS_ID,			/* u32 */
>> +
>> +	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
>>  	/* add new attributes above here, update the policy in devlink.c */
>>  
>>  	__DEVLINK_ATTR_MAX,
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index ad69379747ef..e14bf3052289 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
>>  	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
>>  	u64 graceful_period;
>>  	bool auto_recover;
>> +	bool auto_dump;
>>  	u8 health_state;
>>  	u64 dump_ts;
>>  	u64 dump_real_ts;
>> @@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
>>  	reporter->devlink = devlink;
>>  	reporter->graceful_period = graceful_period;
>>  	reporter->auto_recover = !!ops->recover;
>> +	reporter->auto_dump = !!ops->dump;
>>  	mutex_init(&reporter->dump_lock);
>>  	refcount_set(&reporter->refcount, 1);
>>  	list_add_tail(&reporter->list, &devlink->reporter_list);
>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
>>  	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
>>  			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
>>  		goto reporter_nest_cancel;
>> +	if (reporter->ops->dump &&
>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
>> +		       reporter->auto_dump))
>> +		goto reporter_nest_cancel;
>
>Since you're making it a u8 - does it make sense to indicate to user

Please don't be mistaken. u8 carries a bool here.


>space whether the dump is disabled or not supported?

If you want to expose "not supported", I suggest to do it in another
attr. Because this attr is here to do the config from userspace. Would
be off if the same enum would carry "not supported".

But anyway, since you opened this can, the supported/capabilities
should be probably passed by a separate bitfield for all features.


>
>Right now no attribute means either old kernel or dump not possible..
>
>>  	nla_nest_end(msg, reporter_attr);
>>  	genlmsg_end(msg, hdr);
>> @@ -5129,10 +5135,12 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
>>  
>>  	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
>>  
>> -	mutex_lock(&reporter->dump_lock);
>> -	/* store current dump of current error, for later analysis */
>> -	devlink_health_do_dump(reporter, priv_ctx, NULL);
>> -	mutex_unlock(&reporter->dump_lock);
>> +	if (reporter->auto_dump) {
>> +		mutex_lock(&reporter->dump_lock);
>> +		/* store current dump of current error, for later analysis */
>> +		devlink_health_do_dump(reporter, priv_ctx, NULL);
>> +		mutex_unlock(&reporter->dump_lock);
>> +	}
>>  
>>  	if (reporter->auto_recover)
>>  		return devlink_health_reporter_recover(reporter,
>> @@ -5306,6 +5314,11 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
>>  		err = -EOPNOTSUPP;
>>  		goto out;
>>  	}
>> +	if (!reporter->ops->dump &&
>> +	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) {
>
>... and then this behavior may have to change, I think?
>
>> +		err = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>>  
>>  	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
>>  		reporter->graceful_period =
>> @@ -5315,6 +5328,10 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
>>  		reporter->auto_recover =
>>  			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
>>  
>> +	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
>> +		reporter->auto_dump =
>> +		nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]);
>> +
>>  	devlink_health_reporter_put(reporter);
>>  	return 0;
>>  out:
>> @@ -6053,6 +6070,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>>  	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
>>  	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
>>  	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
>> +	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 },
>
>I'd suggest we keep the attrs in order of definition, because we should
>set .strict_start_type, and then it matters which are before and which
>are after.
>
>Also please set max value of 1.
>
>>  	[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
>>  	[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING },
>>  	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-25 19:08     ` Jiri Pirko
@ 2020-03-25 19:38       ` Eran Ben Elisha
  2020-03-26  0:01         ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: Eran Ben Elisha @ 2020-03-25 19:38 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: netdev, Jiri Pirko, Michael Chan, David S. Miller, Saeed Mahameed



On 3/25/2020 9:08 PM, Jiri Pirko wrote:
> Wed, Mar 25, 2020 at 07:45:29PM CET, kuba@kernel.org wrote:
>> On Wed, 25 Mar 2020 15:26:24 +0200 Eran Ben Elisha wrote:
>>> On low memory system, run time dumps can consume too much memory. Add
>>> administrator ability to disable auto dumps per reporter as part of the
>>> error flow handle routine.
>>>
>>> This attribute is not relevant while executing
>>> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
>>>
>>> By default, auto dump is activated for any reporter that has a dump method,
>>> as part of the reporter registration to devlink.
>>>
>>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>   include/uapi/linux/devlink.h |  2 ++
>>>   net/core/devlink.c           | 26 ++++++++++++++++++++++----
>>>   2 files changed, 24 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>>> index dfdffc42e87d..e7891d1d2ebd 100644
>>> --- a/include/uapi/linux/devlink.h
>>> +++ b/include/uapi/linux/devlink.h
>>> @@ -429,6 +429,8 @@ enum devlink_attr {
>>>   	DEVLINK_ATTR_NETNS_FD,			/* u32 */
>>>   	DEVLINK_ATTR_NETNS_PID,			/* u32 */
>>>   	DEVLINK_ATTR_NETNS_ID,			/* u32 */
>>> +
>>> +	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
>>>   	/* add new attributes above here, update the policy in devlink.c */
>>>   
>>>   	__DEVLINK_ATTR_MAX,
>>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>>> index ad69379747ef..e14bf3052289 100644
>>> --- a/net/core/devlink.c
>>> +++ b/net/core/devlink.c
>>> @@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
>>>   	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
>>>   	u64 graceful_period;
>>>   	bool auto_recover;
>>> +	bool auto_dump;
>>>   	u8 health_state;
>>>   	u64 dump_ts;
>>>   	u64 dump_real_ts;
>>> @@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
>>>   	reporter->devlink = devlink;
>>>   	reporter->graceful_period = graceful_period;
>>>   	reporter->auto_recover = !!ops->recover;
>>> +	reporter->auto_dump = !!ops->dump;
>>>   	mutex_init(&reporter->dump_lock);
>>>   	refcount_set(&reporter->refcount, 1);
>>>   	list_add_tail(&reporter->list, &devlink->reporter_list);
>>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
>>>   	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
>>>   			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
>>>   		goto reporter_nest_cancel;
>>> +	if (reporter->ops->dump &&
>>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
>>> +		       reporter->auto_dump))
>>> +		goto reporter_nest_cancel;
>>
>> Since you're making it a u8 - does it make sense to indicate to user
> 
> Please don't be mistaken. u8 carries a bool here.
> 
> 
>> space whether the dump is disabled or not supported?
> 
> If you want to expose "not supported", I suggest to do it in another
> attr. Because this attr is here to do the config from userspace. Would
> be off if the same enum would carry "not supported".
> 
> But anyway, since you opened this can, the supported/capabilities
> should be probably passed by a separate bitfield for all features.
> 

Actually we supports trinary state per x attribute.
(x can be auto-dump or auto-recover which is supported since day 1)

devlink_nl_health_reporter_fill can set
DEVLINK_ATTR_HEALTH_REPORTER_AUTO_X to {0 , 1 , no set}
And user space devlink propagates it accordingly.

If auto_x is supported, user will see "auto_x true"
If auto_x is not supported, user will see "auto_x false"
If x is not supported at all, user will not the auto_x at all for this 
reporter.

> 
>>
>> Right now no attribute means either old kernel or dump not possible.
when you run on old kernel, you will not see auto-dump attribute at all. 
In any case you won't be able to distinguish in user space between 
{auto-dump, no-auto-dump, no dump support}.

>>
>>>   	nla_nest_end(msg, reporter_attr);
>>>   	genlmsg_end(msg, hdr);
>>> @@ -5129,10 +5135,12 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
>>>   
>>>   	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
>>>   
>>> -	mutex_lock(&reporter->dump_lock);
>>> -	/* store current dump of current error, for later analysis */
>>> -	devlink_health_do_dump(reporter, priv_ctx, NULL);
>>> -	mutex_unlock(&reporter->dump_lock);
>>> +	if (reporter->auto_dump) {
>>> +		mutex_lock(&reporter->dump_lock);
>>> +		/* store current dump of current error, for later analysis */
>>> +		devlink_health_do_dump(reporter, priv_ctx, NULL);
>>> +		mutex_unlock(&reporter->dump_lock);
>>> +	}
>>>   
>>>   	if (reporter->auto_recover)
>>>   		return devlink_health_reporter_recover(reporter,
>>> @@ -5306,6 +5314,11 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
>>>   		err = -EOPNOTSUPP;
>>>   		goto out;
>>>   	}
>>> +	if (!reporter->ops->dump &&
>>> +	    info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) {
>>
>> ... and then this behavior may have to change, I think?
>>
>>> +		err = -EOPNOTSUPP;
>>> +		goto out;
>>> +	}
>>>   
>>>   	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
>>>   		reporter->graceful_period =
>>> @@ -5315,6 +5328,10 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
>>>   		reporter->auto_recover =
>>>   			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
>>>   
>>> +	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
>>> +		reporter->auto_dump =
>>> +		nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]);
>>> +
>>>   	devlink_health_reporter_put(reporter);
>>>   	return 0;
>>>   out:
>>> @@ -6053,6 +6070,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>>>   	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
>>>   	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
>>>   	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
>>> +	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 },
>>
>> I'd suggest we keep the attrs in order of definition, because we should
>> set .strict_start_type, and then it matters which are before and which
>> are after.
>>
>> Also please set max value of 1.
>>
>>>   	[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
>>>   	[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING },
>>>   	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-25 19:38       ` Eran Ben Elisha
@ 2020-03-26  0:01         ` Jakub Kicinski
  2020-03-26  9:06           ` Eran Ben Elisha
  2020-03-26 10:22           ` Jiri Pirko
  0 siblings, 2 replies; 12+ messages in thread
From: Jakub Kicinski @ 2020-03-26  0:01 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Jiri Pirko, netdev, Jiri Pirko, Michael Chan, David S. Miller,
	Saeed Mahameed

On Wed, 25 Mar 2020 21:38:35 +0200 Eran Ben Elisha wrote:
> On 3/25/2020 9:08 PM, Jiri Pirko wrote:
> > Wed, Mar 25, 2020 at 07:45:29PM CET, kuba@kernel.org wrote:  
> >> On Wed, 25 Mar 2020 15:26:24 +0200 Eran Ben Elisha wrote:  
> >>> On low memory system, run time dumps can consume too much memory. Add
> >>> administrator ability to disable auto dumps per reporter as part of the
> >>> error flow handle routine.
> >>>
> >>> This attribute is not relevant while executing
> >>> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
> >>>
> >>> By default, auto dump is activated for any reporter that has a dump method,
> >>> as part of the reporter registration to devlink.
> >>>
> >>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> >>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> >>> ---
> >>>   include/uapi/linux/devlink.h |  2 ++
> >>>   net/core/devlink.c           | 26 ++++++++++++++++++++++----
> >>>   2 files changed, 24 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> >>> index dfdffc42e87d..e7891d1d2ebd 100644
> >>> --- a/include/uapi/linux/devlink.h
> >>> +++ b/include/uapi/linux/devlink.h
> >>> @@ -429,6 +429,8 @@ enum devlink_attr {
> >>>   	DEVLINK_ATTR_NETNS_FD,			/* u32 */
> >>>   	DEVLINK_ATTR_NETNS_PID,			/* u32 */
> >>>   	DEVLINK_ATTR_NETNS_ID,			/* u32 */
> >>> +
> >>> +	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
> >>>   	/* add new attributes above here, update the policy in devlink.c */
> >>>   
> >>>   	__DEVLINK_ATTR_MAX,
> >>> diff --git a/net/core/devlink.c b/net/core/devlink.c
> >>> index ad69379747ef..e14bf3052289 100644
> >>> --- a/net/core/devlink.c
> >>> +++ b/net/core/devlink.c
> >>> @@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
> >>>   	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
> >>>   	u64 graceful_period;
> >>>   	bool auto_recover;
> >>> +	bool auto_dump;
> >>>   	u8 health_state;
> >>>   	u64 dump_ts;
> >>>   	u64 dump_real_ts;
> >>> @@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
> >>>   	reporter->devlink = devlink;
> >>>   	reporter->graceful_period = graceful_period;
> >>>   	reporter->auto_recover = !!ops->recover;
> >>> +	reporter->auto_dump = !!ops->dump;
> >>>   	mutex_init(&reporter->dump_lock);
> >>>   	refcount_set(&reporter->refcount, 1);
> >>>   	list_add_tail(&reporter->list, &devlink->reporter_list);
> >>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
> >>>   	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
> >>>   			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
> >>>   		goto reporter_nest_cancel;
> >>> +	if (reporter->ops->dump &&
> >>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
> >>> +		       reporter->auto_dump))
> >>> +		goto reporter_nest_cancel;  
> >>
> >> Since you're making it a u8 - does it make sense to indicate to user  
> > 
> > Please don't be mistaken. u8 carries a bool here.

Are you okay with limiting the value in the policy?

I guess the auto-recover doesn't have it so we'd create a little
inconsistency.

> >> space whether the dump is disabled or not supported?  
> > 
> > If you want to expose "not supported", I suggest to do it in another
> > attr. Because this attr is here to do the config from userspace. Would
> > be off if the same enum would carry "not supported".
> > 
> > But anyway, since you opened this can, the supported/capabilities
> > should be probably passed by a separate bitfield for all features.
> >   
> 
> Actually we supports trinary state per x attribute.
> (x can be auto-dump or auto-recover which is supported since day 1)
> 
> devlink_nl_health_reporter_fill can set
> DEVLINK_ATTR_HEALTH_REPORTER_AUTO_X to {0 , 1 , no set}
> And user space devlink propagates it accordingly.
> 
> If auto_x is supported, user will see "auto_x true"
> If auto_x is not supported, user will see "auto_x false"
> If x is not supported at all, user will not the auto_x at all for this 
> reporter.

Yeah, makes perfect the only glitch is that for auto-dump we have the
old kernel case. auto-recover was there from day 1.

> >> Right now no attribute means either old kernel or dump not possible.  
> when you run on old kernel, you will not see auto-dump attribute at all. 
> In any case you won't be able to distinguish in user space between 
> {auto-dump, no-auto-dump, no dump support}.

Right.

Anyway, I don't think this will matter in this particular case in
practice, so if you're okay with the code as is:

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-26  0:01         ` Jakub Kicinski
@ 2020-03-26  9:06           ` Eran Ben Elisha
  2020-03-26 10:22           ` Jiri Pirko
  1 sibling, 0 replies; 12+ messages in thread
From: Eran Ben Elisha @ 2020-03-26  9:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, netdev, Jiri Pirko, Michael Chan, David S. Miller,
	Saeed Mahameed



On 3/26/2020 2:01 AM, Jakub Kicinski wrote:
> On Wed, 25 Mar 2020 21:38:35 +0200 Eran Ben Elisha wrote:
>> On 3/25/2020 9:08 PM, Jiri Pirko wrote:
>>> Wed, Mar 25, 2020 at 07:45:29PM CET, kuba@kernel.org wrote:
>>>> On Wed, 25 Mar 2020 15:26:24 +0200 Eran Ben Elisha wrote:
>>>>> On low memory system, run time dumps can consume too much memory. Add
>>>>> administrator ability to disable auto dumps per reporter as part of the
>>>>> error flow handle routine.
>>>>>
>>>>> This attribute is not relevant while executing
>>>>> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
>>>>>
>>>>> By default, auto dump is activated for any reporter that has a dump method,
>>>>> as part of the reporter registration to devlink.
>>>>>
>>>>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>>>>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>>>>> ---
>>>>>    include/uapi/linux/devlink.h |  2 ++
>>>>>    net/core/devlink.c           | 26 ++++++++++++++++++++++----
>>>>>    2 files changed, 24 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>>>>> index dfdffc42e87d..e7891d1d2ebd 100644
>>>>> --- a/include/uapi/linux/devlink.h
>>>>> +++ b/include/uapi/linux/devlink.h
>>>>> @@ -429,6 +429,8 @@ enum devlink_attr {
>>>>>    	DEVLINK_ATTR_NETNS_FD,			/* u32 */
>>>>>    	DEVLINK_ATTR_NETNS_PID,			/* u32 */
>>>>>    	DEVLINK_ATTR_NETNS_ID,			/* u32 */
>>>>> +
>>>>> +	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
>>>>>    	/* add new attributes above here, update the policy in devlink.c */
>>>>>    
>>>>>    	__DEVLINK_ATTR_MAX,
>>>>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>>>>> index ad69379747ef..e14bf3052289 100644
>>>>> --- a/net/core/devlink.c
>>>>> +++ b/net/core/devlink.c
>>>>> @@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
>>>>>    	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
>>>>>    	u64 graceful_period;
>>>>>    	bool auto_recover;
>>>>> +	bool auto_dump;
>>>>>    	u8 health_state;
>>>>>    	u64 dump_ts;
>>>>>    	u64 dump_real_ts;
>>>>> @@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
>>>>>    	reporter->devlink = devlink;
>>>>>    	reporter->graceful_period = graceful_period;
>>>>>    	reporter->auto_recover = !!ops->recover;
>>>>> +	reporter->auto_dump = !!ops->dump;
>>>>>    	mutex_init(&reporter->dump_lock);
>>>>>    	refcount_set(&reporter->refcount, 1);
>>>>>    	list_add_tail(&reporter->list, &devlink->reporter_list);
>>>>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
>>>>>    	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
>>>>>    			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
>>>>>    		goto reporter_nest_cancel;
>>>>> +	if (reporter->ops->dump &&
>>>>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
>>>>> +		       reporter->auto_dump))
>>>>> +		goto reporter_nest_cancel;
>>>>
>>>> Since you're making it a u8 - does it make sense to indicate to user
>>>
>>> Please don't be mistaken. u8 carries a bool here.
> 
> Are you okay with limiting the value in the policy?
> 
> I guess the auto-recover doesn't have it so we'd create a little
> inconsistency.
> 
>>>> space whether the dump is disabled or not supported?
>>>
>>> If you want to expose "not supported", I suggest to do it in another
>>> attr. Because this attr is here to do the config from userspace. Would
>>> be off if the same enum would carry "not supported".
>>>
>>> But anyway, since you opened this can, the supported/capabilities
>>> should be probably passed by a separate bitfield for all features.
>>>    
>>
>> Actually we supports trinary state per x attribute.
>> (x can be auto-dump or auto-recover which is supported since day 1)
>>
>> devlink_nl_health_reporter_fill can set
>> DEVLINK_ATTR_HEALTH_REPORTER_AUTO_X to {0 , 1 , no set}
>> And user space devlink propagates it accordingly.
>>
>> If auto_x is supported, user will see "auto_x true"
>> If auto_x is not supported, user will see "auto_x false"
>> If x is not supported at all, user will not the auto_x at all for this
>> reporter.
> 
> Yeah, makes perfect the only glitch is that for auto-dump we have the
> old kernel case. auto-recover was there from day 1.
> 
>>>> Right now no attribute means either old kernel or dump not possible.
>> when you run on old kernel, you will not see auto-dump attribute at all.
>> In any case you won't be able to distinguish in user space between
>> {auto-dump, no-auto-dump, no dump support}.
> 
> Right.
> 
> Anyway, I don't think this will matter in this particular case in
> practice, so if you're okay with the code as is:
> 
> Reviewed-by: Jakub Kicinski <kuba@kernel.org>
> 
Thanks Jakub.

Dave,
I see the patchset is currently marked as Changes Requested. Can you 
please modify?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-26  0:01         ` Jakub Kicinski
  2020-03-26  9:06           ` Eran Ben Elisha
@ 2020-03-26 10:22           ` Jiri Pirko
  2020-03-26 17:39             ` Jakub Kicinski
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Pirko @ 2020-03-26 10:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eran Ben Elisha, netdev, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed

Thu, Mar 26, 2020 at 01:01:35AM CET, kuba@kernel.org wrote:
>On Wed, 25 Mar 2020 21:38:35 +0200 Eran Ben Elisha wrote:
>> On 3/25/2020 9:08 PM, Jiri Pirko wrote:
>> > Wed, Mar 25, 2020 at 07:45:29PM CET, kuba@kernel.org wrote:  
>> >> On Wed, 25 Mar 2020 15:26:24 +0200 Eran Ben Elisha wrote:  
>> >>> On low memory system, run time dumps can consume too much memory. Add
>> >>> administrator ability to disable auto dumps per reporter as part of the
>> >>> error flow handle routine.
>> >>>
>> >>> This attribute is not relevant while executing
>> >>> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
>> >>>
>> >>> By default, auto dump is activated for any reporter that has a dump method,
>> >>> as part of the reporter registration to devlink.
>> >>>
>> >>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> >>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> >>> ---
>> >>>   include/uapi/linux/devlink.h |  2 ++
>> >>>   net/core/devlink.c           | 26 ++++++++++++++++++++++----
>> >>>   2 files changed, 24 insertions(+), 4 deletions(-)
>> >>>
>> >>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> >>> index dfdffc42e87d..e7891d1d2ebd 100644
>> >>> --- a/include/uapi/linux/devlink.h
>> >>> +++ b/include/uapi/linux/devlink.h
>> >>> @@ -429,6 +429,8 @@ enum devlink_attr {
>> >>>   	DEVLINK_ATTR_NETNS_FD,			/* u32 */
>> >>>   	DEVLINK_ATTR_NETNS_PID,			/* u32 */
>> >>>   	DEVLINK_ATTR_NETNS_ID,			/* u32 */
>> >>> +
>> >>> +	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,	/* u8 */
>> >>>   	/* add new attributes above here, update the policy in devlink.c */
>> >>>   
>> >>>   	__DEVLINK_ATTR_MAX,
>> >>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> >>> index ad69379747ef..e14bf3052289 100644
>> >>> --- a/net/core/devlink.c
>> >>> +++ b/net/core/devlink.c
>> >>> @@ -4837,6 +4837,7 @@ struct devlink_health_reporter {
>> >>>   	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
>> >>>   	u64 graceful_period;
>> >>>   	bool auto_recover;
>> >>> +	bool auto_dump;
>> >>>   	u8 health_state;
>> >>>   	u64 dump_ts;
>> >>>   	u64 dump_real_ts;
>> >>> @@ -4903,6 +4904,7 @@ devlink_health_reporter_create(struct devlink *devlink,
>> >>>   	reporter->devlink = devlink;
>> >>>   	reporter->graceful_period = graceful_period;
>> >>>   	reporter->auto_recover = !!ops->recover;
>> >>> +	reporter->auto_dump = !!ops->dump;
>> >>>   	mutex_init(&reporter->dump_lock);
>> >>>   	refcount_set(&reporter->refcount, 1);
>> >>>   	list_add_tail(&reporter->list, &devlink->reporter_list);
>> >>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
>> >>>   	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
>> >>>   			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
>> >>>   		goto reporter_nest_cancel;
>> >>> +	if (reporter->ops->dump &&
>> >>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
>> >>> +		       reporter->auto_dump))
>> >>> +		goto reporter_nest_cancel;  
>> >>
>> >> Since you're making it a u8 - does it make sense to indicate to user  
>> > 
>> > Please don't be mistaken. u8 carries a bool here.
>
>Are you okay with limiting the value in the policy?

Well, not-0 means true. Do you think it is wise to limit to 0/1?

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-26 10:22           ` Jiri Pirko
@ 2020-03-26 17:39             ` Jakub Kicinski
  2020-03-26 21:23               ` Jiri Pirko
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2020-03-26 17:39 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eran Ben Elisha, netdev, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed

On Thu, 26 Mar 2020 11:22:44 +0100 Jiri Pirko wrote:
> >> >>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
> >> >>>   	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
> >> >>>   			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
> >> >>>   		goto reporter_nest_cancel;
> >> >>> +	if (reporter->ops->dump &&
> >> >>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
> >> >>> +		       reporter->auto_dump))
> >> >>> +		goto reporter_nest_cancel;    
> >> >>
> >> >> Since you're making it a u8 - does it make sense to indicate to user    
> >> > 
> >> > Please don't be mistaken. u8 carries a bool here.  
> >
> >Are you okay with limiting the value in the policy?  
> 
> Well, not-0 means true. Do you think it is wise to limit to 0/1?

Just future proofing, in general seems wise to always constrain the
input as much as possible. But in this case we already have similar
attrs in the dump which don't have the constraint, and we will probably
want consistency, so maybe we're unlikely to use other values.

In particular I was wondering if auto-dump value can be extended to
mean the number of dumps we want to collect, the current behavior I
think matches collecting just one. But obviously this can be solved
with a new attr when needed..

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/2] devlink: Add auto dump flag to health reporter
  2020-03-26 17:39             ` Jakub Kicinski
@ 2020-03-26 21:23               ` Jiri Pirko
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Pirko @ 2020-03-26 21:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eran Ben Elisha, netdev, Jiri Pirko, Michael Chan,
	David S. Miller, Saeed Mahameed

Thu, Mar 26, 2020 at 06:39:13PM CET, kuba@kernel.org wrote:
>On Thu, 26 Mar 2020 11:22:44 +0100 Jiri Pirko wrote:
>> >> >>> @@ -4983,6 +4985,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
>> >> >>>   	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS,
>> >> >>>   			      reporter->dump_real_ts, DEVLINK_ATTR_PAD))
>> >> >>>   		goto reporter_nest_cancel;
>> >> >>> +	if (reporter->ops->dump &&
>> >> >>> +	    nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
>> >> >>> +		       reporter->auto_dump))
>> >> >>> +		goto reporter_nest_cancel;    
>> >> >>
>> >> >> Since you're making it a u8 - does it make sense to indicate to user    
>> >> > 
>> >> > Please don't be mistaken. u8 carries a bool here.  
>> >
>> >Are you okay with limiting the value in the policy?  
>> 
>> Well, not-0 means true. Do you think it is wise to limit to 0/1?
>
>Just future proofing, in general seems wise to always constrain the
>input as much as possible. But in this case we already have similar
>attrs in the dump which don't have the constraint, and we will probably
>want consistency, so maybe we're unlikely to use other values.

Agreed.

>
>In particular I was wondering if auto-dump value can be extended to
>mean the number of dumps we want to collect, the current behavior I
>think matches collecting just one. But obviously this can be solved
>with a new attr when needed..

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-03-26 21:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-25 13:26 [PATCH net-next 0/2] Devlink health auto attributes refactor Eran Ben Elisha
2020-03-25 13:26 ` [PATCH net-next 1/2] devlink: Implicitly set auto recover flag when registering health reporter Eran Ben Elisha
2020-03-25 18:33   ` Jakub Kicinski
2020-03-25 13:26 ` [PATCH net-next 2/2] devlink: Add auto dump flag to " Eran Ben Elisha
2020-03-25 18:45   ` Jakub Kicinski
2020-03-25 19:08     ` Jiri Pirko
2020-03-25 19:38       ` Eran Ben Elisha
2020-03-26  0:01         ` Jakub Kicinski
2020-03-26  9:06           ` Eran Ben Elisha
2020-03-26 10:22           ` Jiri Pirko
2020-03-26 17:39             ` Jakub Kicinski
2020-03-26 21:23               ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).