Netdev Archive on lore.kernel.org
 help / Atom feed
* [PATCH v4 net-next 00/11] Devlink health reporting and recovery system
@ 2019-02-07  9:36 Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 01/11] devlink: Add devlink formatted message (fmsg) API Eran Ben Elisha
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

The health mechanism is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
  information.

The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.

The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
 - Recovery procedures
 - Diagnostics and object dump procedures
 - OOB initial attributes
Different parts of the driver can register different types of health reporters
with different handlers.

Once an error is reported, devlink health will do the following actions:
  * A log is being send to the kernel trace events buffer
  * Health status and statistics are being updated for the reporter instance
  * Object dump is being taken and saved at the reporter instance (as long as
    there is no other dump which is already stored)
  * Auto recovery attempt is being done. Depends on:
    - Auto-recovery configuration
    - Grace period vs. time passed since last recover

The user interface:
User can access/change each reporter attributes and driver specific callbacks
via devlink, e.g per error type (per health reporter)
 - Configure reporter's generic attributes (like: Disable/enable auto recovery)
 - Invoke recovery procedure
 - Run diagnostics
 - Object dump

The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
  Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
  Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
  Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
  Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
  Retrieves the last stored dump. Devlink health
  saves a single dump. If an dump is not already stored by the devlink
  for this reporter, devlink generates a new dump.
  dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
  Clears the last saved dump file for the specified reporter.

                                               netlink
                                      +--------------------------+
                                      |                          |
                                      |            +             |
                                      |            |             |
                                      +--------------------------+
                                                   |request for ops
                                                   |(diagnose,
 mlx5_core                             devlink     |recover,
                                                   |dump)
+--------+                            +--------------------------+
|        |                            |    reporter|             |
|        |                            |  +---------v----------+  |
|        |   ops execution            |  |                    |  |
|     <----------------------------------+                    |  |
|        |                            |  |                    |  |
|        |                            |  + ^------------------+  |
|        |                            |    | request for ops     |
|        |                            |    | (recover, dump)     |
|        |                            |    |                     |
|        |                            |  +-+------------------+  |
|        |     health report          |  | health handler     |  |
|        +------------------------------->                    |  |
|        |                            |  +--------------------+  |
|        |     health reporter create |                          |
|        +---------------------------->                          |
+--------+                            +--------------------------+

In this patchset, mlx5e TX reporter is implemented.

Cmdline format:
    devlink health show [DEV reporter REPORTE_NAME]
    devlink health recover DEV reporter REPORTER_NAME
    devlink health diagnose DEV reporter REPORTER_NAME
    devlink health dump show DEV reporter REPORTER_NAME
    devlink health dump clear DEV reporter REPORTER_NAME
    devlink health set DEV reporter REPORTER_NAME NAME VALUE

Cmdline examples:
$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 0 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": false
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs: 
  sqn: 138 HW state: 1 stopped: false 
  sqn: 142 HW state: 1 stopped: false 

$devlink health recover pci/0000:00:09 reporter tx

$devlink health set pci/0000:00:09.0 reporter tx grace_period 3500

$devlink health set pci/0000:00:09.0 reporter tx auto_recover false

Changelog:
v4:
- Rebase on latest net-next
- Remove trace_devlink_health signature exposure in case CONFIG_NET_DEVLINK is
  not defined as it shall only be used from devlink.

v3:
- Redesign of devlink <-> driver fmsg API
- Various bug fixes

v2:
- Remove FW* reporters to decrease the amount of patches in the patchset

Aya Levin (1):
  devlink: Add Documentation/networking/devlink-health.txt

Eran Ben Elisha (10):
  devlink: Add devlink formatted message (fmsg) API
  devlink: Add health reporter create/destroy functionality
  devlink: Add health report functionality
  devlink: Add health get command
  devlink: Add health set command
  devlink: Add health recover command
  devlink: Add health diagnose command
  devlink: Add health dump {get,clear} commands
  net/mlx5e: Add tx reporter support
  net/mlx5e: Add tx timeout support for mlx5e tx reporter

 Documentation/networking/devlink-health.txt   |   86 ++
 .../net/ethernet/mellanox/mlx5/core/Makefile  |    2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   18 +-
 .../ethernet/mellanox/mlx5/core/en/reporter.h |   15 +
 .../mellanox/mlx5/core/en/reporter_tx.c       |  297 +++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  189 +---
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |    5 +-
 include/net/devlink.h                         |  211 ++++
 include/trace/events/devlink.h                |   65 ++
 include/uapi/linux/devlink.h                  |   24 +
 net/core/devlink.c                            | 1008 +++++++++++++++++
 11 files changed, 1755 insertions(+), 165 deletions(-)
 create mode 100644 Documentation/networking/devlink-health.txt
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 01/11] devlink: Add devlink formatted message (fmsg) API
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 02/11] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Devlink fmsg is a mechanism to pass descriptors between drivers and
devlink, in json-like format. The API allows the driver to add nested
attributes such as object, object pair and value array, in addition to
attributes such as name and value.

Driver can use this API to fill the fmsg context in a format which will be
translated by the devlink to the netlink message later.
There is no memory allocation in advance (other than the initial list
head), and it dynamically allocates messages descriptors and add them to
the list on the fly.

When it needs to send the data using SKBs to the netlink layer, it
fragments the data between different SKBs. In order to do this
fragmentation, it uses virtual nests attributes, to avoid actual
nesting use which cannot be divided between different SKBs.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h        | 149 +++++++++++
 include/uapi/linux/devlink.h |   8 +
 net/core/devlink.c           | 483 +++++++++++++++++++++++++++++++++++
 3 files changed, 640 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 74d992a68a06..7c5722e816aa 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -448,6 +448,8 @@ struct devlink_info_req;
 
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
+struct devlink_fmsg;
+
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
@@ -639,6 +641,37 @@ int devlink_info_version_running_put(struct devlink_info_req *req,
 				     const char *version_name,
 				     const char *version_value);
 
+int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg);
+int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name);
+int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+				     const char *name);
+int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value);
+int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value);
+int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value);
+int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value);
+int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value);
+int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+			    u16 value_len);
+
+int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			       bool value);
+int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			     u8 value);
+int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			      u32 value);
+int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			      u64 value);
+int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+				 const char *value);
+int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+				 const void *value, u16 value_len);
+
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -971,6 +1004,122 @@ devlink_info_version_running_put(struct devlink_info_req *req,
 {
 	return 0;
 }
+
+static inline int
+devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+				 const char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+			u16 value_len)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			   bool value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			 u8 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			  u32 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			  u64 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			     const char *value)
+{
+	return 0;
+}
+
+static inline int
+devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			     const void *value, u16 value_len)
+{
+	return 0;
+}
 #endif
 
 #if IS_REACHABLE(CONFIG_NET_DEVLINK)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 054b2d1a4537..076692209a9b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -302,6 +302,14 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_SB_POOL_CELL_SIZE,		/* u32 */
 
+	DEVLINK_ATTR_FMSG,			/* nested */
+	DEVLINK_ATTR_FMSG_OBJ_NEST_START,	/* flag */
+	DEVLINK_ATTR_FMSG_PAIR_NEST_START,	/* flag */
+	DEVLINK_ATTR_FMSG_ARR_NEST_START,	/* flag */
+	DEVLINK_ATTR_FMSG_NEST_END,		/* flag */
+	DEVLINK_ATTR_FMSG_OBJ_NAME,		/* string */
+	DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE,	/* u8 */
+	DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA,	/* dynamic */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cd0d393bc62d..03883697fcf0 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3879,6 +3879,489 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
 	return msg->len;
 }
 
+struct devlink_fmsg_item {
+	struct list_head list;
+	int attrtype;
+	u8 nla_type;
+	u16 len;
+	int value[0];
+};
+
+struct devlink_fmsg {
+	struct list_head item_list;
+};
+
+static struct devlink_fmsg *devlink_fmsg_alloc(void)
+{
+	struct devlink_fmsg *fmsg;
+
+	fmsg = kzalloc(sizeof(*fmsg), GFP_KERNEL);
+	if (!fmsg)
+		return NULL;
+
+	INIT_LIST_HEAD(&fmsg->item_list);
+
+	return fmsg;
+}
+
+static void devlink_fmsg_free(struct devlink_fmsg *fmsg)
+{
+	struct devlink_fmsg_item *item, *tmp;
+
+	list_for_each_entry_safe(item, tmp, &fmsg->item_list, list) {
+		list_del(&item->list);
+		kfree(item);
+	}
+	kfree(fmsg);
+}
+
+static int devlink_fmsg_nest_common(struct devlink_fmsg *fmsg,
+				    int attrtype)
+{
+	struct devlink_fmsg_item *item;
+
+	item = kzalloc(sizeof(*item), GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+
+	item->attrtype = attrtype;
+	list_add_tail(&item->list, &fmsg->item_list);
+
+	return 0;
+}
+
+int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg)
+{
+	return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_OBJ_NEST_START);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_start);
+
+static int devlink_fmsg_nest_end(struct devlink_fmsg *fmsg)
+{
+	return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_NEST_END);
+}
+
+int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg)
+{
+	return devlink_fmsg_nest_end(fmsg);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_end);
+
+#define DEVLINK_FMSG_MAX_SIZE (GENLMSG_DEFAULT_SIZE - GENL_HDRLEN - NLA_HDRLEN)
+
+static int devlink_fmsg_put_name(struct devlink_fmsg *fmsg, const char *name)
+{
+	struct devlink_fmsg_item *item;
+
+	if (strlen(name) + 1 > DEVLINK_FMSG_MAX_SIZE)
+		return -EMSGSIZE;
+
+	item = kzalloc(sizeof(*item) + strlen(name) + 1, GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+
+	item->nla_type = NLA_NUL_STRING;
+	item->len = strlen(name) + 1;
+	item->attrtype = DEVLINK_ATTR_FMSG_OBJ_NAME;
+	memcpy(&item->value, name, item->len);
+	list_add_tail(&item->list, &fmsg->item_list);
+
+	return 0;
+}
+
+int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_PAIR_NEST_START);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_put_name(fmsg, name);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_start);
+
+int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+	return devlink_fmsg_nest_end(fmsg);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_end);
+
+int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+				     const char *name)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_ARR_NEST_START);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_start);
+
+int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+	int err;
+
+	err = devlink_fmsg_nest_end(fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_end);
+
+static int devlink_fmsg_put_value(struct devlink_fmsg *fmsg,
+				  const void *value, u16 value_len,
+				  u8 value_nla_type)
+{
+	struct devlink_fmsg_item *item;
+
+	if (value_len > DEVLINK_FMSG_MAX_SIZE)
+		return -EMSGSIZE;
+
+	item = kzalloc(sizeof(*item) + value_len, GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+
+	item->nla_type = value_nla_type;
+	item->len = value_len;
+	item->attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA;
+	memcpy(&item->value, value, item->len);
+	list_add_tail(&item->list, &fmsg->item_list);
+
+	return 0;
+}
+
+int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value)
+{
+	return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_FLAG);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_bool_put);
+
+int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value)
+{
+	return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U8);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u8_put);
+
+int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value)
+{
+	return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U32);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u32_put);
+
+int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value)
+{
+	return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U64);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u64_put);
+
+int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value)
+{
+	return devlink_fmsg_put_value(fmsg, value, strlen(value) + 1,
+				      NLA_NUL_STRING);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_string_put);
+
+int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+			    u16 value_len)
+{
+	return devlink_fmsg_put_value(fmsg, value, value_len, NLA_BINARY);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_binary_put);
+
+int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			       bool value)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_bool_put(fmsg, value);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_bool_pair_put);
+
+int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			     u8 value)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u8_put(fmsg, value);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u8_pair_put);
+
+int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			      u32 value)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_put(fmsg, value);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u32_pair_put);
+
+int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+			      u64 value)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u64_put(fmsg, value);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u64_pair_put);
+
+int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+				 const char *value)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_string_put(fmsg, value);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_string_pair_put);
+
+int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+				 const void *value, u16 value_len)
+{
+	int err;
+
+	err = devlink_fmsg_pair_nest_start(fmsg, name);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_binary_put(fmsg, value, value_len);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_pair_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_put);
+
+static int
+devlink_fmsg_item_fill_type(struct devlink_fmsg_item *msg, struct sk_buff *skb)
+{
+	switch (msg->nla_type) {
+	case NLA_FLAG:
+	case NLA_U8:
+	case NLA_U32:
+	case NLA_U64:
+	case NLA_NUL_STRING:
+	case NLA_BINARY:
+		return nla_put_u8(skb, DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE,
+				  msg->nla_type);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int
+devlink_fmsg_item_fill_data(struct devlink_fmsg_item *msg, struct sk_buff *skb)
+{
+	int attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA;
+	u8 tmp;
+
+	switch (msg->nla_type) {
+	case NLA_FLAG:
+		/* Always provide flag data, regardless of its value */
+		tmp = *(bool *) msg->value;
+
+		return nla_put_u8(skb, attrtype, tmp);
+	case NLA_U8:
+		return nla_put_u8(skb, attrtype, *(u8 *) msg->value);
+	case NLA_U32:
+		return nla_put_u32(skb, attrtype, *(u32 *) msg->value);
+	case NLA_U64:
+		return nla_put_u64_64bit(skb, attrtype, *(u64 *) msg->value,
+					 DEVLINK_ATTR_PAD);
+	case NLA_NUL_STRING:
+		return nla_put_string(skb, attrtype, (char *) &msg->value);
+	case NLA_BINARY:
+		return nla_put(skb, attrtype, msg->len, (void *) &msg->value);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int
+devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb,
+			 int *start)
+{
+	struct devlink_fmsg_item *item;
+	struct nlattr *fmsg_nlattr;
+	int i = 0;
+	int err;
+
+	fmsg_nlattr = nla_nest_start(skb, DEVLINK_ATTR_FMSG);
+	if (!fmsg_nlattr)
+		return -EMSGSIZE;
+
+	list_for_each_entry(item, &fmsg->item_list, list) {
+		if (i < *start) {
+			i++;
+			continue;
+		}
+
+		switch (item->attrtype) {
+		case DEVLINK_ATTR_FMSG_OBJ_NEST_START:
+		case DEVLINK_ATTR_FMSG_PAIR_NEST_START:
+		case DEVLINK_ATTR_FMSG_ARR_NEST_START:
+		case DEVLINK_ATTR_FMSG_NEST_END:
+			err = nla_put_flag(skb, item->attrtype);
+			break;
+		case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA:
+			err = devlink_fmsg_item_fill_type(item, skb);
+			if (err)
+				break;
+			err = devlink_fmsg_item_fill_data(item, skb);
+			break;
+		case DEVLINK_ATTR_FMSG_OBJ_NAME:
+			err = nla_put_string(skb, item->attrtype,
+					     (char *) &item->value);
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+		if (!err)
+			*start = ++i;
+		else
+			break;
+	}
+
+	nla_nest_end(skb, fmsg_nlattr);
+	return err;
+}
+
+static int devlink_fmsg_snd(struct devlink_fmsg *fmsg,
+			    struct genl_info *info,
+			    enum devlink_command cmd, int flags)
+{
+	struct nlmsghdr *nlh;
+	struct sk_buff *skb;
+	bool last = false;
+	int index = 0;
+	void *hdr;
+	int err;
+
+	while (!last) {
+		int tmp_index = index;
+
+		skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+		if (!skb)
+			return -ENOMEM;
+
+		hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+				  &devlink_nl_family, flags | NLM_F_MULTI, cmd);
+		if (!hdr) {
+			err = -EMSGSIZE;
+			goto nla_put_failure;
+		}
+
+		err = devlink_fmsg_prepare_skb(fmsg, skb, &index);
+		if (!err)
+			last = true;
+		else if (err != -EMSGSIZE || tmp_index == index)
+			goto nla_put_failure;
+
+		genlmsg_end(skb, hdr);
+		err = genlmsg_reply(skb, info);
+		if (err)
+			return err;
+	}
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+			NLMSG_DONE, 0, flags | NLM_F_MULTI);
+	if (!nlh) {
+		err = -EMSGSIZE;
+		goto nla_put_failure;
+	}
+	err = genlmsg_reply(skb, info);
+	if (err)
+		return err;
+
+	return 0;
+
+nla_put_failure:
+	nlmsg_free(skb);
+	return err;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 02/11] devlink: Add health reporter create/destroy functionality
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 01/11] devlink: Add devlink formatted message (fmsg) API Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 03/11] devlink: Add health report functionality Eran Ben Elisha
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.

As part of the allocation, driver shall provide a set of callbacks which
will be used by devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.

For each reporter, devlink will hold a metadata of statistics,
dump msg and status.

For passing dumps and diagnose data to the user-space, it will use devlink
fmsg API.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h | 53 +++++++++++++++++++++++++
 net/core/devlink.c    | 92 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7c5722e816aa..3dfe30235878 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -30,6 +30,7 @@ struct devlink {
 	struct list_head param_list;
 	struct list_head region_list;
 	u32 snapshot_id;
+	struct list_head reporter_list;
 	struct devlink_dpipe_headers *dpipe_headers;
 	const struct devlink_ops *ops;
 	struct device *dev;
@@ -449,6 +450,27 @@ struct devlink_info_req;
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
 struct devlink_fmsg;
+struct devlink_health_reporter;
+
+/**
+ * struct devlink_health_reporter_ops - Reporter operations
+ * @name: reporter name
+ * @recover: callback to recover from reported error
+ *           if priv_ctx is NULL, run a full recover
+ * @dump: callback to dump an object
+ *        if priv_ctx is NULL, run a full dump
+ * @diagnose: callback to diagnose the current status
+ */
+
+struct devlink_health_reporter_ops {
+	char *name;
+	int (*recover)(struct devlink_health_reporter *reporter,
+		       void *priv_ctx);
+	int (*dump)(struct devlink_health_reporter *reporter,
+		    struct devlink_fmsg *fmsg, void *priv_ctx);
+	int (*diagnose)(struct devlink_health_reporter *reporter,
+			struct devlink_fmsg *fmsg);
+};
 
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
@@ -672,6 +694,17 @@ int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
 int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
 				 const void *value, u16 value_len);
 
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv);
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -1120,6 +1153,26 @@ devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
 {
 	return 0;
 }
+
+static inline struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv)
+{
+	return NULL;
+}
+
+static inline void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+}
+
+static inline void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+	return NULL;
+}
 #endif
 
 #if IS_REACHABLE(CONFIG_NET_DEVLINK)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 03883697fcf0..341548d7f1f1 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4362,6 +4362,97 @@ static int devlink_fmsg_snd(struct devlink_fmsg *fmsg,
 	return err;
 }
 
+struct devlink_health_reporter {
+	struct list_head list;
+	void *priv;
+	const struct devlink_health_reporter_ops *ops;
+	struct devlink *devlink;
+	u64 graceful_period;
+	bool auto_recover;
+	u8 health_state;
+};
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+	return reporter->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_priv);
+
+static struct devlink_health_reporter *
+devlink_health_reporter_find_by_name(struct devlink *devlink,
+				     const char *reporter_name)
+{
+	struct devlink_health_reporter *reporter;
+
+	list_for_each_entry(reporter, &devlink->reporter_list, list)
+		if (!strcmp(reporter->ops->name, reporter_name))
+			return reporter;
+	return NULL;
+}
+
+/**
+ *	devlink_health_reporter_create - create devlink health reporter
+ *
+ *	@devlink: devlink
+ *	@ops: ops
+ *	@graceful_period: to avoid recovery loops, in msecs
+ *	@auto_recover: auto recover when error occurs
+ *	@priv: priv
+ */
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv)
+{
+	struct devlink_health_reporter *reporter;
+
+	mutex_lock(&devlink->lock);
+	if (devlink_health_reporter_find_by_name(devlink, ops->name)) {
+		reporter = ERR_PTR(-EEXIST);
+		goto unlock;
+	}
+
+	if (WARN_ON(auto_recover && !ops->recover) ||
+	    WARN_ON(graceful_period && !ops->recover)) {
+		reporter = ERR_PTR(-EINVAL);
+		goto unlock;
+	}
+
+	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
+	if (!reporter) {
+		reporter = ERR_PTR(-ENOMEM);
+		goto unlock;
+	}
+
+	reporter->priv = priv;
+	reporter->ops = ops;
+	reporter->devlink = devlink;
+	reporter->graceful_period = graceful_period;
+	reporter->auto_recover = auto_recover;
+	list_add_tail(&reporter->list, &devlink->reporter_list);
+unlock:
+	mutex_unlock(&devlink->lock);
+	return reporter;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_create);
+
+/**
+ *	devlink_health_reporter_destroy - destroy devlink health reporter
+ *
+ *	@reporter: devlink health reporter to destroy
+ */
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+	mutex_lock(&reporter->devlink->lock);
+	list_del(&reporter->list);
+	mutex_unlock(&reporter->devlink->lock);
+	kfree(reporter);
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4670,6 +4761,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	INIT_LIST_HEAD(&devlink->resource_list);
 	INIT_LIST_HEAD(&devlink->param_list);
 	INIT_LIST_HEAD(&devlink->region_list);
+	INIT_LIST_HEAD(&devlink->reporter_list);
 	mutex_init(&devlink->lock);
 	return devlink;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 03/11] devlink: Add health report functionality
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 01/11] devlink: Add devlink formatted message (fmsg) API Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 02/11] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:38   ` Jiri Pirko
  2019-02-07  9:36 ` [PATCH v4 net-next 04/11] devlink: Add health get command Eran Ben Elisha
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.

Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
  as there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
  - Auto Recovery configuration
  - Grace period vs. Time since last recover

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/net/devlink.h          |   9 +++
 include/trace/events/devlink.h |  65 ++++++++++++++++++
 net/core/devlink.c             | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3dfe30235878..c12ad6e9095d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -704,6 +704,8 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
 
 void *
 devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+int devlink_health_report(struct devlink_health_reporter *reporter,
+			  const char *msg, void *priv_ctx);
 
 #else
 
@@ -1173,6 +1175,13 @@ devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
 {
 	return NULL;
 }
+
+static inline int
+devlink_health_report(struct devlink_health_reporter *reporter,
+		      const char *msg, void *priv_ctx)
+{
+	return 0;
+}
 #endif
 
 #if IS_REACHABLE(CONFIG_NET_DEVLINK)
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 40705364a50f..191ddf67d769 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -75,6 +75,71 @@ TRACE_EVENT(devlink_hwerr,
 			__get_str(driver_name), __entry->err, __get_str(msg))
 );
 
+/*
+ * Tracepoint for devlink health message:
+ */
+TRACE_EVENT(devlink_health_report,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 const char *msg),
+
+	TP_ARGS(devlink, reporter_name, msg),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, msg)
+		__string(msg, msg)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__assign_str(msg, msg);
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: %s",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __get_str(msg))
+);
+
+/*
+ * Tracepoint for devlink health recover aborted message:
+ */
+TRACE_EVENT(devlink_health_recover_aborted,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 bool health_state, u64 time_since_last_recover),
+
+	TP_ARGS(devlink, reporter_name, health_state, time_since_last_recover),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, reporter_name)
+		__field(bool, health_state)
+		__field(u64, time_since_last_recover)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__entry->health_state = health_state;
+		__entry->time_since_last_recover = time_since_last_recover;
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: health_state=%d time_since_last_recover=%llu recover aborted",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __entry->health_state,
+		  __entry->time_since_last_recover)
+);
+
 #endif /* _TRACE_DEVLINK_H */
 
 /* This part must be outside protection */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 341548d7f1f1..3eaa290831aa 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4367,9 +4367,20 @@ struct devlink_health_reporter {
 	void *priv;
 	const struct devlink_health_reporter_ops *ops;
 	struct devlink *devlink;
+	struct devlink_fmsg *dump_fmsg;
+	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
 	u64 graceful_period;
 	bool auto_recover;
 	u8 health_state;
+	u64 dump_ts;
+	u64 error_count;
+	u64 recovery_count;
+	u64 last_recovery_ts;
+};
+
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
 };
 
 void *
@@ -4431,6 +4442,7 @@ devlink_health_reporter_create(struct devlink *devlink,
 	reporter->devlink = devlink;
 	reporter->graceful_period = graceful_period;
 	reporter->auto_recover = auto_recover;
+	mutex_init(&reporter->dump_lock);
 	list_add_tail(&reporter->list, &devlink->reporter_list);
 unlock:
 	mutex_unlock(&devlink->lock);
@@ -4449,10 +4461,117 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
 	mutex_lock(&reporter->devlink->lock);
 	list_del(&reporter->list);
 	mutex_unlock(&reporter->devlink->lock);
+	if (reporter->dump_fmsg)
+		devlink_fmsg_free(reporter->dump_fmsg);
 	kfree(reporter);
 }
 EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
 
+static int
+devlink_health_reporter_recover(struct devlink_health_reporter *reporter,
+				void *priv_ctx)
+{
+	int err;
+
+	if (!reporter->ops->recover)
+		return -EOPNOTSUPP;
+
+	err = reporter->ops->recover(reporter, priv_ctx);
+	if (err)
+		return err;
+
+	reporter->recovery_count++;
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+	reporter->last_recovery_ts = jiffies;
+
+	return 0;
+}
+
+static void
+devlink_health_dump_clear(struct devlink_health_reporter *reporter)
+{
+	if (!reporter->dump_fmsg)
+		return;
+	devlink_fmsg_free(reporter->dump_fmsg);
+	reporter->dump_fmsg = NULL;
+}
+
+static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
+				  void *priv_ctx)
+{
+	int err;
+
+	if (!reporter->ops->dump)
+		return 0;
+
+	if (reporter->dump_fmsg)
+		return 0;
+
+	reporter->dump_fmsg = devlink_fmsg_alloc();
+	if (!reporter->dump_fmsg) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	err = devlink_fmsg_obj_nest_start(reporter->dump_fmsg);
+	if (err)
+		goto dump_err;
+
+	err = reporter->ops->dump(reporter, reporter->dump_fmsg,
+				  priv_ctx);
+	if (err)
+		goto dump_err;
+
+	err = devlink_fmsg_obj_nest_end(reporter->dump_fmsg);
+	if (err)
+		goto dump_err;
+
+	reporter->dump_ts = jiffies;
+
+	return 0;
+
+dump_err:
+	devlink_health_dump_clear(reporter);
+	return err;
+}
+
+int devlink_health_report(struct devlink_health_reporter *reporter,
+			  const char *msg, void *priv_ctx)
+{
+	struct devlink *devlink = reporter->devlink;
+
+	/* write a log message of the current error */
+	WARN_ON(!msg);
+	trace_devlink_health_report(devlink, reporter->ops->name, msg);
+	reporter->error_count++;
+
+	/* abort if the previous error wasn't recovered */
+	if (reporter->auto_recover &&
+	    (reporter->health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
+	     jiffies - reporter->last_recovery_ts <
+	     msecs_to_jiffies(reporter->graceful_period))) {
+		trace_devlink_health_recover_aborted(devlink,
+						     reporter->ops->name,
+						     reporter->health_state,
+						     jiffies -
+						     reporter->last_recovery_ts);
+		return -ECANCELED;
+	}
+
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
+
+	mutex_lock(&reporter->dump_lock);
+	/* store current dump of current error, for later analysis */
+	devlink_health_do_dump(reporter, priv_ctx);
+	mutex_unlock(&reporter->dump_lock);
+
+	if (reporter->auto_recover)
+		return devlink_health_reporter_recover(reporter, priv_ctx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_report);
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 04/11] devlink: Add health get command
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (2 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 03/11] devlink: Add health report functionality Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 05/11] devlink: Add health set command Eran Ben Elisha
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/devlink.h |  11 +++
 net/core/devlink.c           | 149 +++++++++++++++++++++++++++++++++++
 2 files changed, 160 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 076692209a9b..d8f20d6ce139 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -96,6 +96,8 @@ enum devlink_command {
 
 	DEVLINK_CMD_INFO_GET,		/* can dump */
 
+	DEVLINK_CMD_HEALTH_REPORTER_GET,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -310,6 +312,15 @@ enum devlink_attr {
 	DEVLINK_ATTR_FMSG_OBJ_NAME,		/* string */
 	DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE,	/* u8 */
 	DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA,	/* dynamic */
+
+	DEVLINK_ATTR_HEALTH_REPORTER,			/* nested */
+	DEVLINK_ATTR_HEALTH_REPORTER_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_REPORTER_STATE,		/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_ERR,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 3eaa290831aa..86f7c0e5d4bc 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4572,6 +4572,146 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 }
 EXPORT_SYMBOL_GPL(devlink_health_report);
 
+static struct devlink_health_reporter *
+devlink_health_reporter_get_from_info(struct devlink *devlink,
+				      struct genl_info *info)
+{
+	char *reporter_name;
+
+	if (!info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME])
+		return NULL;
+
+	reporter_name =
+		nla_data(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]);
+	return devlink_health_reporter_find_by_name(devlink, reporter_name);
+}
+
+static int
+devlink_nl_health_reporter_fill(struct sk_buff *msg,
+				struct devlink *devlink,
+				struct devlink_health_reporter *reporter,
+				enum devlink_command cmd, u32 portid,
+				u32 seq, int flags)
+{
+	struct nlattr *reporter_attr;
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto genlmsg_cancel;
+
+	reporter_attr = nla_nest_start(msg, DEVLINK_ATTR_HEALTH_REPORTER);
+	if (!reporter_attr)
+		goto genlmsg_cancel;
+	if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+			   reporter->ops->name))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE,
+		       reporter->health_state))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR,
+			      reporter->error_count, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,
+			      reporter->recovery_count, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
+			      reporter->graceful_period,
+			      DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
+		       reporter->auto_recover))
+		goto reporter_nest_cancel;
+	if (reporter->dump_fmsg &&
+	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,
+			      jiffies_to_msecs(reporter->dump_ts),
+			      DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+
+	nla_nest_end(msg, reporter_attr);
+	genlmsg_end(msg, hdr);
+	return 0;
+
+reporter_nest_cancel:
+	nla_nest_end(msg, reporter_attr);
+genlmsg_cancel:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb,
+						   struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	struct sk_buff *msg;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_health_reporter_fill(msg, devlink, reporter,
+					      DEVLINK_CMD_HEALTH_REPORTER_GET,
+					      info->snd_portid, info->snd_seq,
+					      0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int
+devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink_health_reporter *reporter;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(reporter, &devlink->reporter_list,
+				    list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_health_reporter_fill(msg, devlink,
+							      reporter,
+							      DEVLINK_CMD_HEALTH_REPORTER_GET,
+							      NETLINK_CB(cb->skb).portid,
+							      cb->nlh->nlmsg_seq,
+							      NLM_F_MULTI);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4597,6 +4737,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
 	[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -4840,6 +4981,14 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 		/* can be retrieved by unprivileged users */
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_GET,
+		.doit = devlink_nl_cmd_health_reporter_get_doit,
+		.dumpit = devlink_nl_cmd_health_reporter_get_dumpit,
+		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+		/* can be retrieved by unprivileged users */
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 05/11] devlink: Add health set command
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (3 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 04/11] devlink: Add health get command Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 06/11] devlink: Add health recover command Eran Ben Elisha
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
		receiving error for the reporter

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d8f20d6ce139..b03065a99884 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -97,6 +97,7 @@ enum devlink_command {
 	DEVLINK_CMD_INFO_GET,		/* can dump */
 
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_SET,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 86f7c0e5d4bc..0b231fb76e59 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4712,6 +4712,33 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 	return msg->len;
 }
 
+static int
+devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->recover &&
+	    (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+		return -EOPNOTSUPP;
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+		reporter->graceful_period =
+			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
+		reporter->auto_recover =
+			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
+
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4738,6 +4765,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -4989,6 +5018,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 		/* can be retrieved by unprivileged users */
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_SET,
+		.doit = devlink_nl_cmd_health_reporter_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 06/11] devlink: Add health recover command
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (4 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 05/11] devlink: Add health set command Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 07/11] devlink: Add health diagnose command Eran Ben Elisha
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b03065a99884..a3a97e6edad8 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -98,6 +98,7 @@ enum devlink_command {
 
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
+	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0b231fb76e59..0e6b0e034863 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4739,6 +4739,19 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
 	return 0;
 }
 
+static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
+						       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	return devlink_health_reporter_recover(reporter, NULL);
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5025,6 +5038,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+		.doit = devlink_nl_cmd_health_reporter_recover_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 07/11] devlink: Add health diagnose command
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (5 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 06/11] devlink: Add health recover command Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 08/11] devlink: Add health dump {get,clear} commands Eran Ben Elisha
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.

It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 46 ++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index a3a97e6edad8..09be37137841 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -99,6 +99,7 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
 	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0e6b0e034863..1e8613db2bf7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4752,6 +4752,45 @@ static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
 	return devlink_health_reporter_recover(reporter, NULL);
 }
 
+static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
+							struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	struct devlink_fmsg *fmsg;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->diagnose)
+		return -EOPNOTSUPP;
+
+	fmsg = devlink_fmsg_alloc();
+	if (!fmsg)
+		return -ENOMEM;
+
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		goto out;
+
+	err = reporter->ops->diagnose(reporter, fmsg);
+	if (err)
+		goto out;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		goto out;
+
+	err = devlink_fmsg_snd(fmsg, info,
+			       DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0);
+
+out:
+	devlink_fmsg_free(fmsg);
+	return err;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5045,6 +5084,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+		.doit = devlink_nl_cmd_health_reporter_diagnose_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 08/11] devlink: Add health dump {get,clear} commands
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (6 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 07/11] devlink: Add health diagnose command Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 09/11] net/mlx5e: Add tx reporter support Eran Ben Elisha
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Add devlink health dump commands, in order to run an dump operation
over a specific reporter.

The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.

It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/devlink.h |  2 ++
 net/core/devlink.c           | 63 ++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 09be37137841..72d9f7c89190 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -100,6 +100,8 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
 	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
 	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 1e8613db2bf7..7fbdba547d4f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4791,6 +4791,53 @@ static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
 	return err;
 }
 
+static int devlink_nl_cmd_health_reporter_dump_get_doit(struct sk_buff *skb,
+							struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->dump)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&reporter->dump_lock);
+	err = devlink_health_do_dump(reporter, NULL);
+	if (err)
+		goto out;
+
+	err = devlink_fmsg_snd(reporter->dump_fmsg, info,
+			       DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET, 0);
+
+out:
+	mutex_unlock(&reporter->dump_lock);
+	return err;
+}
+
+static int
+devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb,
+					       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->dump)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&reporter->dump_lock);
+	devlink_health_dump_clear(reporter);
+	mutex_unlock(&reporter->dump_lock);
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5091,6 +5138,22 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+		.doit = devlink_nl_cmd_health_reporter_dump_get_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+				  DEVLINK_NL_FLAG_NO_LOCK,
+	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
+		.doit = devlink_nl_cmd_health_reporter_dump_clear_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+				  DEVLINK_NL_FLAG_NO_LOCK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 09/11] net/mlx5e: Add tx reporter support
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (7 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 08/11] devlink: Add health dump {get,clear} commands Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 10/11] net/mlx5e: Add tx timeout support for mlx5e tx reporter Eran Ben Elisha
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.
This patch declares the TX reporter operations and creates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full tx recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Diagnose output:
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": false
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: false
  sqn: 142 HW state: 1 stopped: false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  18 +-
 .../ethernet/mellanox/mlx5/core/en/reporter.h |  14 +
 .../mellanox/mlx5/core/en/reporter_tx.c       | 259 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 136 +--------
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |   5 +-
 6 files changed, 306 insertions(+), 128 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9de9abacf7f6..6bb2a860b15b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -22,7 +22,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 #
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
-		en_selftest.o en/port.o en/monitor_stats.o
+		en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6dd74ef69389..66e510b16243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -387,10 +387,7 @@ struct mlx5e_txqsq {
 	struct mlx5e_channel      *channel;
 	int                        txq_ix;
 	u32                        rate_limit;
-	struct mlx5e_txqsq_recover {
-		struct work_struct         recover_work;
-		u64                        last_recover;
-	} recover;
+	struct work_struct         recover_work;
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_dma_info {
@@ -683,6 +680,13 @@ struct mlx5e_rss_params {
 	u8	hfunc;
 };
 
+struct mlx5e_modify_sq_param {
+	int curr_state;
+	int next_state;
+	int rl_update;
+	int rl_index;
+};
+
 struct mlx5e_priv {
 	/* priv data path fields - start */
 	struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -738,6 +742,7 @@ struct mlx5e_priv {
 #ifdef CONFIG_MLX5_EN_TLS
 	struct mlx5e_tls          *tls;
 #endif
+	struct devlink_health_reporter *tx_reporter;
 };
 
 struct mlx5e_profile {
@@ -868,6 +873,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
 void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
 			       struct mlx5e_params *params);
 
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+		    struct mlx5e_modify_sq_param *p);
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_tx_disable_queue(struct netdev_queue *txq);
+
 static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
 {
 	return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
new file mode 100644
index 000000000000..1c90059db5f8
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __MLX5E_EN_REPORTER_H
+#define __MLX5E_EN_REPORTER_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
new file mode 100644
index 000000000000..35568e4820de
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#include <net/devlink.h>
+#include "reporter.h"
+#include "lib/eq.h"
+
+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
+
+struct mlx5e_tx_err_ctx {
+	int (*recover)(struct mlx5e_txqsq *sq);
+	struct mlx5e_txqsq *sq;
+};
+
+static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+{
+	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
+
+	while (time_before(jiffies, exp_time)) {
+		if (sq->cc == sq->pc)
+			return 0;
+
+		msleep(20);
+	}
+
+	netdev_err(sq->channel->netdev,
+		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
+		   sq->sqn, sq->cc, sq->pc);
+
+	return -ETIMEDOUT;
+}
+
+static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
+{
+	WARN_ONCE(sq->cc != sq->pc,
+		  "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
+		  sq->sqn, sq->cc, sq->pc);
+	sq->cc = 0;
+	sq->dma_fifo_cc = 0;
+	sq->pc = 0;
+}
+
+static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+{
+	struct mlx5_core_dev *mdev = sq->channel->mdev;
+	struct net_device *dev = sq->channel->netdev;
+	struct mlx5e_modify_sq_param msp = {0};
+	int err;
+
+	msp.curr_state = curr_state;
+	msp.next_state = MLX5_SQC_STATE_RST;
+
+	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
+		return err;
+	}
+
+	memset(&msp, 0, sizeof(msp));
+	msp.curr_state = MLX5_SQC_STATE_RST;
+	msp.next_state = MLX5_SQC_STATE_RDY;
+
+	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
+		return err;
+	}
+
+	return 0;
+}
+
+static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_core_dev *mdev = sq->channel->mdev;
+	struct net_device *dev = sq->channel->netdev;
+	u8 state;
+	int err;
+
+	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+		return 0;
+
+	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
+	if (err) {
+		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
+			   sq->sqn, err);
+		return err;
+	}
+
+	if (state != MLX5_SQC_STATE_ERR) {
+		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
+		return -EINVAL;
+	}
+
+	mlx5e_tx_disable_queue(sq->txq);
+
+	err = mlx5e_wait_for_sq_flush(sq);
+	if (err)
+		return err;
+
+	/* At this point, no new packets will arrive from the stack as TXQ is
+	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
+	 * pending WQEs. SQ can safely reset the SQ.
+	 */
+
+	err = mlx5e_sq_to_ready(sq, state);
+	if (err)
+		return err;
+
+	mlx5e_reset_txqsq_cc_pc(sq);
+	sq->stats->recover++;
+	mlx5e_activate_txqsq(sq);
+
+	return 0;
+}
+
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
+{
+	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+	struct mlx5e_tx_err_ctx err_ctx = {0};
+
+	err_ctx.sq       = sq;
+	err_ctx.recover  = mlx5e_tx_reporter_err_cqe_recover;
+	sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
+
+	devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+			      &err_ctx);
+}
+
+/* state lock cannot be grabbed within this function.
+ * It can cause a dead lock or a read-after-free.
+ */
+int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
+{
+	return err_ctx->recover(err_ctx->sq);
+}
+
+static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
+{
+	int err;
+
+	rtnl_lock();
+	mutex_lock(&priv->state_lock);
+	mlx5e_close_locked(priv->netdev);
+	err = mlx5e_open_locked(priv->netdev);
+	mutex_unlock(&priv->state_lock);
+	rtnl_unlock();
+
+	return err;
+}
+
+static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
+				     void *context)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_tx_err_ctx *err_ctx = context;
+
+	return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
+			 mlx5e_tx_reporter_recover_all(priv);
+}
+
+static int
+mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
+					u32 sqn, u8 state, bool stopped)
+{
+	int err;
+
+	err = devlink_fmsg_obj_nest_start(fmsg);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sqn);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u8_pair_put(fmsg, "HW state", state);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_bool_pair_put(fmsg, "stopped", stopped);
+	if (err)
+		return err;
+
+	err = devlink_fmsg_obj_nest_end(fmsg);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
+				      struct devlink_fmsg *fmsg)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	int i, err = 0;
+
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+		goto unlock;
+
+	err = devlink_fmsg_arr_pair_nest_start(fmsg, "SQs");
+	if (err)
+		goto unlock;
+
+	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc;
+	     i++) {
+		struct mlx5e_txqsq *sq = priv->txq2sq[i];
+		u8 state;
+
+		err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+		if (err)
+			break;
+
+		err = mlx5e_tx_reporter_build_diagnose_output(fmsg, sq->sqn,
+							      state,
+							      netif_xmit_stopped(sq->txq));
+		if (err)
+			break;
+	}
+	err = devlink_fmsg_arr_pair_nest_end(fmsg);
+	if (err)
+		goto unlock;
+
+unlock:
+	mutex_unlock(&priv->state_lock);
+	return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
+		.name = "tx",
+		.recover = mlx5e_tx_reporter_recover,
+		.diagnose = mlx5e_tx_reporter_diagnose,
+};
+
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct devlink *devlink = priv_to_devlink(mdev);
+
+	priv->tx_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
+					       MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+					       true, priv);
+	if (IS_ERR_OR_NULL(priv->tx_reporter))
+		netdev_warn(priv->netdev,
+			    "Failed to create tx reporter, err = %ld\n",
+			    PTR_ERR(priv->tx_reporter));
+	return PTR_ERR_OR_ZERO(priv->tx_reporter);
+}
+
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
+{
+	if (IS_ERR_OR_NULL(priv->tx_reporter))
+		return;
+
+	devlink_health_reporter_destroy(priv->tx_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 099d307e6f25..93e1f037b019 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -51,6 +51,7 @@
 #include "en/xdp.h"
 #include "lib/eq.h"
 #include "en/monitor_stats.h"
+#include "en/reporter.h"
 
 struct mlx5e_rq_param {
 	u32			rqc[MLX5_ST_SZ_DW(rqc)];
@@ -1159,7 +1160,7 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
 	return 0;
 }
 
-static void mlx5e_sq_recover(struct work_struct *work);
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
 static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 			     int txq_ix,
 			     struct mlx5e_params *params,
@@ -1181,7 +1182,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
 	sq->stats     = &c->priv->channel_stats[c->ix].sq[tc];
-	INIT_WORK(&sq->recover.recover_work, mlx5e_sq_recover);
+	INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work);
 	if (MLX5_IPSEC_DEV(c->priv->mdev))
 		set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
 	if (mlx5_accel_is_tls_device(c->priv->mdev))
@@ -1269,15 +1270,8 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
 	return err;
 }
 
-struct mlx5e_modify_sq_param {
-	int curr_state;
-	int next_state;
-	bool rl_update;
-	int rl_index;
-};
-
-static int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
-			   struct mlx5e_modify_sq_param *p)
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+		    struct mlx5e_modify_sq_param *p)
 {
 	void *in;
 	void *sqc;
@@ -1375,17 +1369,7 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 	return err;
 }
 
-static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
-{
-	WARN_ONCE(sq->cc != sq->pc,
-		  "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
-		  sq->sqn, sq->cc, sq->pc);
-	sq->cc = 0;
-	sq->dma_fifo_cc = 0;
-	sq->pc = 0;
-}
-
-static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 {
 	sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
@@ -1394,7 +1378,7 @@ static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 	netif_tx_start_queue(sq->txq);
 }
 
-static inline void netif_tx_disable_queue(struct netdev_queue *txq)
+void mlx5e_tx_disable_queue(struct netdev_queue *txq)
 {
 	__netif_tx_lock_bh(txq);
 	netif_tx_stop_queue(txq);
@@ -1410,7 +1394,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
 	/* prevent netif_tx_wake_queue */
 	napi_synchronize(&c->napi);
 
-	netif_tx_disable_queue(sq->txq);
+	mlx5e_tx_disable_queue(sq->txq);
 
 	/* last doorbell out, godspeed .. */
 	if (mlx5e_wqc_has_room_for(wq, sq->cc, sq->pc, 1)) {
@@ -1430,6 +1414,7 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	struct mlx5_rate_limit rl = {0};
 
 	cancel_work_sync(&sq->dim.work);
+	cancel_work_sync(&sq->recover_work);
 	mlx5e_destroy_sq(mdev, sq->sqn);
 	if (sq->rate_limit) {
 		rl.rate = sq->rate_limit;
@@ -1439,105 +1424,12 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	mlx5e_free_txqsq(sq);
 }
 
-static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
-{
-	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
-
-	while (time_before(jiffies, exp_time)) {
-		if (sq->cc == sq->pc)
-			return 0;
-
-		msleep(20);
-	}
-
-	netdev_err(sq->channel->netdev,
-		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
-		   sq->sqn, sq->cc, sq->pc);
-
-	return -ETIMEDOUT;
-}
-
-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 {
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	struct mlx5e_modify_sq_param msp = {0};
-	int err;
-
-	msp.curr_state = curr_state;
-	msp.next_state = MLX5_SQC_STATE_RST;
-
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
-		return err;
-	}
-
-	memset(&msp, 0, sizeof(msp));
-	msp.curr_state = MLX5_SQC_STATE_RST;
-	msp.next_state = MLX5_SQC_STATE_RDY;
-
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
-		return err;
-	}
-
-	return 0;
-}
-
-static void mlx5e_sq_recover(struct work_struct *work)
-{
-	struct mlx5e_txqsq_recover *recover =
-		container_of(work, struct mlx5e_txqsq_recover,
-			     recover_work);
-	struct mlx5e_txqsq *sq = container_of(recover, struct mlx5e_txqsq,
-					      recover);
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	u8 state;
-	int err;
-
-	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
-	if (err) {
-		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
-			   sq->sqn, err);
-		return;
-	}
-
-	if (state != MLX5_RQC_STATE_ERR) {
-		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
-		return;
-	}
-
-	netif_tx_disable_queue(sq->txq);
-
-	if (mlx5e_wait_for_sq_flush(sq))
-		return;
-
-	/* If the interval between two consecutive recovers per SQ is too
-	 * short, don't recover to avoid infinite loop of ERR_CQE -> recover.
-	 * If we reached this state, there is probably a bug that needs to be
-	 * fixed. let's keep the queue close and let tx timeout cleanup.
-	 */
-	if (jiffies_to_msecs(jiffies - recover->last_recover) <
-	    MLX5E_SQ_RECOVER_MIN_INTERVAL) {
-		netdev_err(dev, "Recover SQ 0x%x canceled, too many error CQEs\n",
-			   sq->sqn);
-		return;
-	}
-
-	/* At this point, no new packets will arrive from the stack as TXQ is
-	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
-	 * pending WQEs.  SQ can safely reset the SQ.
-	 */
-	if (mlx5e_sq_to_ready(sq, state))
-		return;
+	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
+					      recover_work);
 
-	mlx5e_reset_txqsq_cc_pc(sq);
-	sq->stats->recover++;
-	recover->last_recover = jiffies;
-	mlx5e_activate_txqsq(sq);
+	mlx5e_tx_reporter_err_cqe(sq);
 }
 
 static int mlx5e_open_icosq(struct mlx5e_channel *c,
@@ -3236,6 +3128,7 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 {
 	int tc;
 
+	mlx5e_tx_reporter_destroy(priv);
 	for (tc = 0; tc < priv->profile->max_tc; tc++)
 		mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
@@ -4953,6 +4846,7 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	mlx5e_dcbnl_initialize(priv);
 #endif
+	mlx5e_tx_reporter_create(priv);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 598ad7e4d5c9..189211295e48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -513,8 +513,9 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 					      &sq->state)) {
 				mlx5e_dump_error_cqe(sq,
 						     (struct mlx5_err_cqe *)cqe);
-				queue_work(cq->channel->priv->wq,
-					   &sq->recover.recover_work);
+				if (!IS_ERR_OR_NULL(cq->channel->priv->tx_reporter))
+					queue_work(cq->channel->priv->wq,
+						   &sq->recover_work);
 			}
 			stats->cqe_err++;
 		}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 10/11] net/mlx5e: Add tx timeout support for mlx5e tx reporter
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (8 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 09/11] net/mlx5e: Add tx reporter support Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07  9:36 ` [PATCH v4 net-next 11/11] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it to the
devlink health. (The watchdog detects tx timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from tx timeout in case of lost interrupt was added
to the tx reporter recover method. The tx timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the tx reporter recovery flow.

tx timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a,
CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000

$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 0 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": true
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: true
  sqn: 142 HW state: 1 stopped: false

$devlink health recover pci/0000:00:09 reporter tx
$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 1 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/reporter.h |  1 +
 .../mellanox/mlx5/core/en/reporter_tx.c       | 38 +++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 53 ++++++-------------
 3 files changed, 55 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
index 1c90059db5f8..e78e92753d73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -10,5 +10,6 @@
 int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
 void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
 void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 35568e4820de..0aebfb377cf0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -126,6 +126,44 @@ void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
 			      &err_ctx);
 }
 
+static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
+	u32 eqe_count;
+	int ret;
+
+	netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+
+	eqe_count = mlx5_eq_poll_irq_disabled(eq);
+	ret = eqe_count ? true : false;
+	if (!eqe_count) {
+		clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
+		return ret;
+	}
+
+	netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
+		   eqe_count, eq->core.eqn);
+	sq->channel->stats->eq_rearm++;
+	return ret;
+}
+
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq)
+{
+	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+	struct mlx5e_tx_err_ctx err_ctx;
+
+	err_ctx.sq       = sq;
+	err_ctx.recover  = mlx5e_tx_reporter_timeout_recover;
+	sprintf(err_str,
+		"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
+		sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
+		jiffies_to_usecs(jiffies - sq->txq->trans_start));
+
+	return devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+				     &err_ctx);
+}
+
 /* state lock cannot be grabbed within this function.
  * It can cause a dead lock or a read-after-free.
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 93e1f037b019..d81ebba7c04e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4116,31 +4116,13 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb,
 	return features;
 }
 
-static bool mlx5e_tx_timeout_eq_recover(struct net_device *dev,
-					struct mlx5e_txqsq *sq)
-{
-	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
-	u32 eqe_count;
-
-	netdev_err(dev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
-		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
-
-	eqe_count = mlx5_eq_poll_irq_disabled(eq);
-	if (!eqe_count)
-		return false;
-
-	netdev_err(dev, "Recover %d eqes on EQ 0x%x\n", eqe_count, eq->core.eqn);
-	sq->channel->stats->eq_rearm++;
-	return true;
-}
-
 static void mlx5e_tx_timeout_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 					       tx_timeout_work);
-	struct net_device *dev = priv->netdev;
-	bool reopen_channels = false;
-	int i, err;
+	bool report_failed = false;
+	int err;
+	int i;
 
 	rtnl_lock();
 	mutex_lock(&priv->state_lock);
@@ -4149,31 +4131,22 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 		goto unlock;
 
 	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
-		struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
+		struct netdev_queue *dev_queue =
+			netdev_get_tx_queue(priv->netdev, i);
 		struct mlx5e_txqsq *sq = priv->txq2sq[i];
 
 		if (!netif_xmit_stopped(dev_queue))
 			continue;
 
-		netdev_err(dev,
-			   "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
-			   i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
-			   jiffies_to_usecs(jiffies - dev_queue->trans_start));
-
-		/* If we recover a lost interrupt, most likely TX timeout will
-		 * be resolved, skip reopening channels
-		 */
-		if (!mlx5e_tx_timeout_eq_recover(dev, sq)) {
-			clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
-			reopen_channels = true;
-		}
+		if (mlx5e_tx_reporter_timeout(sq))
+			report_failed = true;
 	}
 
-	if (!reopen_channels)
+	if (!report_failed)
 		goto unlock;
 
-	mlx5e_close_locked(dev);
-	err = mlx5e_open_locked(dev);
+	mlx5e_close_locked(priv->netdev);
+	err = mlx5e_open_locked(priv->netdev);
 	if (err)
 		netdev_err(priv->netdev,
 			   "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
@@ -4189,6 +4162,12 @@ static void mlx5e_tx_timeout(struct net_device *dev)
 	struct mlx5e_priv *priv = netdev_priv(dev);
 
 	netdev_err(dev, "TX timeout detected\n");
+
+	if (IS_ERR_OR_NULL(priv->tx_reporter)) {
+		netdev_err_once(priv->netdev, "tx timeout will not be handled, no valid tx reporter\n");
+		return;
+	}
+
 	queue_work(priv->wq, &priv->tx_timeout_work);
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 net-next 11/11] devlink: Add Documentation/networking/devlink-health.txt
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (9 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 10/11] net/mlx5e: Add tx timeout support for mlx5e tx reporter Eran Ben Elisha
@ 2019-02-07  9:36 ` Eran Ben Elisha
  2019-02-07 18:37 ` [PATCH v4 net-next 00/11] Devlink health reporting and recovery system David Miller
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Eran Ben Elisha @ 2019-02-07  9:36 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin, Eran Ben Elisha

From: Aya Levin <ayal@mellanox.com>

This patch adds a new file to add information about devlink health
mechanism.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 Documentation/networking/devlink-health.txt | 86 +++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 Documentation/networking/devlink-health.txt

diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink-health.txt
new file mode 100644
index 000000000000..1db3fbea0831
--- /dev/null
+++ b/Documentation/networking/devlink-health.txt
@@ -0,0 +1,86 @@
+The health mechanism is targeted for Real Time Alerting, in order to know when
+something bad had happened to a PCI device
+- Provide alert debug information
+- Self healing
+- If problem needs vendor support, provide a way to gather all needed debugging
+  information.
+
+The main idea is to unify and centralize driver health reports in the
+generic devlink instance and allow the user to set different
+attributes of the health reporting and recovery procedures.
+
+The devlink health reporter:
+Device driver creates a "health reporter" per each error/health type.
+Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+or unknown (driver specific).
+For each registered health reporter a driver can issue error/health reports
+asynchronously. All health reports handling is done by devlink.
+Device driver can provide specific callbacks for each "health reporter", e.g.
+ - Recovery procedures
+ - Diagnostics and object dump procedures
+ - OOB initial parameters
+Different parts of the driver can register different types of health reporters
+with different handlers.
+
+Once an error is reported, devlink health will do the following actions:
+  * A log is being send to the kernel trace events buffer
+  * Health status and statistics are being updated for the reporter instance
+  * Object dump is being taken and saved at the reporter instance (as long as
+    there is no other dump which is already stored)
+  * Auto recovery attempt is being done. Depends on:
+    - Auto-recovery configuration
+    - Grace period vs. time passed since last recover
+
+The user interface:
+User can access/change each reporter's parameters and driver specific callbacks
+via devlink, e.g per error type (per health reporter)
+ - Configure reporter's generic parameters (like: disable/enable auto recovery)
+ - Invoke recovery procedure
+ - Run diagnostics
+ - Object dump
+
+The devlink health interface (via netlink):
+DEVLINK_CMD_HEALTH_REPORTER_GET
+  Retrieves status and configuration info per DEV and reporter.
+DEVLINK_CMD_HEALTH_REPORTER_SET
+  Allows reporter-related configuration setting.
+DEVLINK_CMD_HEALTH_REPORTER_RECOVER
+  Triggers a reporter's recovery procedure.
+DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
+  Retrieves diagnostics data from a reporter on a device.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
+  Retrieves the last stored dump. Devlink health
+  saves a single dump. If an dump is not already stored by the devlink
+  for this reporter, devlink generates a new dump.
+  dump output is defined by the reporter.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
+  Clears the last saved dump file for the specified reporter.
+
+
+                                               netlink
+                                      +--------------------------+
+                                      |                          |
+                                      |            +             |
+                                      |            |             |
+                                      +--------------------------+
+                                                   |request for ops
+                                                   |(diagnose,
+ mlx5_core                             devlink     |recover,
+                                                   |dump)
++--------+                            +--------------------------+
+|        |                            |    reporter|             |
+|        |                            |  +---------v----------+  |
+|        |   ops execution            |  |                    |  |
+|     <----------------------------------+                    |  |
+|        |                            |  |                    |  |
+|        |                            |  + ^------------------+  |
+|        |                            |    | request for ops     |
+|        |                            |    | (recover, dump)     |
+|        |                            |    |                     |
+|        |                            |  +-+------------------+  |
+|        |     health report          |  | health handler     |  |
+|        +------------------------------->                    |  |
+|        |                            |  +--------------------+  |
+|        |     health reporter create |                          |
+|        +---------------------------->                          |
++--------+                            +--------------------------+
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 03/11] devlink: Add health report functionality
  2019-02-07  9:36 ` [PATCH v4 net-next 03/11] devlink: Add health report functionality Eran Ben Elisha
@ 2019-02-07  9:38   ` Jiri Pirko
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2019-02-07  9:38 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, Saeed Mahameed, Jiri Pirko,
	Moshe Shemesh, Aya Levin

Thu, Feb 07, 2019 at 10:36:34AM CET, eranbe@mellanox.com wrote:
>Upon error discover, every driver can report it to the devlink health
>mechanism via devlink_health_report function, using the appropriate
>reporter registered to it. Driver can pass error specific context which
>will be delivered to it as part of the dump / recovery callbacks.
>
>Once an error is reported, devlink health will do the following actions:
>* A log is being send to the kernel trace events buffer
>* Health status and statistics are being updated for the reporter instance
>* Object dump is being taken and stored at the reporter instance (as long
>  as there is no other dump which is already stored)
>* Auto recovery attempt is being done. Depends on:
>  - Auto Recovery configuration
>  - Grace period vs. Time since last recover
>
>Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>Reviewed-by: Moshe Shemesh <moshe@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 net-next 00/11] Devlink health reporting and recovery system
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (10 preceding siblings ...)
  2019-02-07  9:36 ` [PATCH v4 net-next 11/11] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
@ 2019-02-07 18:37 ` David Miller
  2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
  2019-02-10 18:35 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
  13 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2019-02-07 18:37 UTC (permalink / raw)
  To: eranbe; +Cc: netdev, saeedm, jiri, moshe, ayal

From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Thu,  7 Feb 2019 11:36:31 +0200

> The health mechanism is targeted for Real Time Alerting, in order to know when
> something bad had happened to a PCI device
> - Provide alert debug information
> - Self healing
> - If problem needs vendor support, provide a way to gather all needed debugging
>   information.
> 
> The main idea is to unify and centralize driver health reports in the
> generic devlink instance and allow the user to set different
> attributes of the health reporting and recovery procedures.
 ...

Series applied, thank you.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [iproute2-next, 0/4] Add support for devlink health
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (11 preceding siblings ...)
  2019-02-07 18:37 ` [PATCH v4 net-next 00/11] Devlink health reporting and recovery system David Miller
@ 2019-02-10 18:28 ` Aya Levin
  2019-02-10 18:28   ` [PATCH for-next 1/4] devlink: refactor validation of finding required arguments Aya Levin
                     ` (3 more replies)
  2019-02-10 18:35 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
  13 siblings, 4 replies; 28+ messages in thread
From: Aya Levin @ 2019-02-10 18:28 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog


This series adds support for devlink health commands:
 devlink health show      [DEV reporter REPORTE_NAME]
 devlink health recover    DEV reporter REPORTER_NAME
 devlink health diagnose   DEV reporter REPORTER_NAME
 devlink health dump show  DEV reporter REPORTER_NAME
 devlink health dump clear DEV reporter REPORTER_NAME
 devlink health set        DEV reporter REPORTER_NAME NAME VALUE

The first patch refactors the validation of input parameters, which
grow way too long. Patch 2 and 3 fix bugs that were discovered during
the devlink health development.  Patch 4 adds the devlink health
functionality.


Aya Levin (4):
  devlink: refactor validation of finding required arguments
  devlink: fix print of uint64_t
  devlink: fix boolean JSON print
  devlink: add health command support

 devlink/devlink.c            | 721 ++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/devlink.h |  23 ++
 man/man8/devlink-health.8    | 176 +++++++++++
 man/man8/devlink.8           |   7 +-
 4 files changed, 813 insertions(+), 114 deletions(-)
 create mode 100644 man/man8/devlink-health.8

-- 
2.14.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH for-next 1/4] devlink: refactor validation of finding required arguments
  2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
@ 2019-02-10 18:28   ` Aya Levin
  2019-02-11  2:46     ` David Ahern
  2019-02-11 10:29     ` Jiri Pirko
  2019-02-10 18:28   ` [PATCH for-next 2/4] devlink: fix print of uint64_t Aya Levin
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Aya Levin @ 2019-02-10 18:28 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Introducing argument's metadata structure matching a bitmap flag per
required argument and an error message if missing. Using this static
array to refactor validation of finding required arguments in devlink
command line and to ease further maintenance.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 devlink/devlink.c | 155 +++++++++++++++++-------------------------------------
 1 file changed, 47 insertions(+), 108 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index d823512a4030..a05755385a49 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -39,6 +39,7 @@
 #define PARAM_CMODE_RUNTIME_STR "runtime"
 #define PARAM_CMODE_DRIVERINIT_STR "driverinit"
 #define PARAM_CMODE_PERMANENT_STR "permanent"
+#define DL_ARGS_REQUIRED_MAX_ERR_LEN 80
 
 static int g_new_line_count;
 
@@ -950,6 +951,51 @@ static int param_cmode_get(const char *cmodestr,
 	return 0;
 }
 
+struct dl_args_metadata {
+	uint32_t o_flag;
+	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
+};
+
+static const struct dl_args_metadata dl_args_required[] = {
+	{DL_OPT_PORT_TYPE,	      "Port type not set.\n"},
+	{DL_OPT_PORT_COUNT,	      "Port split count option expected.\n"},
+	{DL_OPT_SB_POOL,	      "Pool index option expected.\n"},
+	{DL_OPT_SB_SIZE,	      "Pool size option expected.\n"},
+	{DL_OPT_SB_TYPE,	      "Pool type option expected.\n"},
+	{DL_OPT_SB_THTYPE,	      "Pool threshold type option expected.\n"},
+	{DL_OPT_SB_TH,		      "Threshold option expected.\n"},
+	{DL_OPT_SB_TC,		      "TC index option expected.\n"},
+	{DL_OPT_ESWITCH_MODE,	      "E-Switch mode option expected.\n"},
+	{DL_OPT_ESWITCH_INLINE_MODE,  "E-Switch inline-mode option expected.\n"},
+	{DL_OPT_DPIPE_TABLE_NAME,     "Dpipe table name expected\n"},
+	{DL_OPT_DPIPE_TABLE_COUNTERS, "Dpipe table counter state expected\n"},
+	{DL_OPT_ESWITCH_ENCAP_MODE,   "E-Switch encapsulation option expected.\n"},
+	{DL_OPT_PARAM_NAME,	      "Parameter name expected.\n"},
+	{DL_OPT_PARAM_VALUE,	      "Value to set expected.\n"},
+	{DL_OPT_PARAM_CMODE,	      "Configuration mode expected.\n"},
+	{DL_OPT_REGION_SNAPSHOT_ID,   "Region snapshot id expected.\n"},
+	{DL_OPT_REGION_ADDRESS,	      "Region address value expected.\n"},
+	{DL_OPT_REGION_LENGTH,	      "Region length value expected.\n"},
+};
+
+static int validate_finding_required_dl_args(uint32_t o_required,
+					     uint32_t o_found)
+{
+	uint32_t dl_args_required_size;
+	uint32_t o_flag;
+	int i;
+
+	dl_args_required_size = ARRAY_SIZE(dl_args_required);
+	for (i = 0; i < dl_args_required_size; i++) {
+		o_flag = dl_args_required[i].o_flag;
+		if ((o_required & o_flag) && !(o_found & o_flag)) {
+			pr_err("%s", dl_args_required[i].err_msg);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
 static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			 uint32_t o_optional)
 {
@@ -1198,114 +1244,7 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		opts->present |= DL_OPT_SB;
 	}
 
-	if ((o_required & DL_OPT_PORT_TYPE) && !(o_found & DL_OPT_PORT_TYPE)) {
-		pr_err("Port type option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_PORT_COUNT) &&
-	    !(o_found & DL_OPT_PORT_COUNT)) {
-		pr_err("Port split count option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_SB_POOL) && !(o_found & DL_OPT_SB_POOL)) {
-		pr_err("Pool index option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_SB_SIZE) && !(o_found & DL_OPT_SB_SIZE)) {
-		pr_err("Pool size option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_SB_TYPE) && !(o_found & DL_OPT_SB_TYPE)) {
-		pr_err("Pool type option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_SB_THTYPE) && !(o_found & DL_OPT_SB_THTYPE)) {
-		pr_err("Pool threshold type option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_SB_TH) && !(o_found & DL_OPT_SB_TH)) {
-		pr_err("Threshold option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_SB_TC) && !(o_found & DL_OPT_SB_TC)) {
-		pr_err("TC index option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_ESWITCH_MODE) &&
-	    !(o_found & DL_OPT_ESWITCH_MODE)) {
-		pr_err("E-Switch mode option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_ESWITCH_INLINE_MODE) &&
-	    !(o_found & DL_OPT_ESWITCH_INLINE_MODE)) {
-		pr_err("E-Switch inline-mode option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_DPIPE_TABLE_NAME) &&
-	    !(o_found & DL_OPT_DPIPE_TABLE_NAME)) {
-		pr_err("Dpipe table name expected\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_DPIPE_TABLE_COUNTERS) &&
-	    !(o_found & DL_OPT_DPIPE_TABLE_COUNTERS)) {
-		pr_err("Dpipe table counter state expected\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_ESWITCH_ENCAP_MODE) &&
-	    !(o_found & DL_OPT_ESWITCH_ENCAP_MODE)) {
-		pr_err("E-Switch encapsulation option expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_PARAM_NAME) &&
-	    !(o_found & DL_OPT_PARAM_NAME)) {
-		pr_err("Parameter name expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_PARAM_VALUE) &&
-	    !(o_found & DL_OPT_PARAM_VALUE)) {
-		pr_err("Value to set expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_PARAM_CMODE) &&
-	    !(o_found & DL_OPT_PARAM_CMODE)) {
-		pr_err("Configuration mode expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_REGION_SNAPSHOT_ID) &&
-	    !(o_found & DL_OPT_REGION_SNAPSHOT_ID)) {
-		pr_err("Region snapshot id expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_REGION_ADDRESS) &&
-	    !(o_found & DL_OPT_REGION_ADDRESS)) {
-		pr_err("Region address value expected.\n");
-		return -EINVAL;
-	}
-
-	if ((o_required & DL_OPT_REGION_LENGTH) &&
-	    !(o_found & DL_OPT_REGION_LENGTH)) {
-		pr_err("Region length value expected.\n");
-		return -EINVAL;
-	}
-
-	return 0;
+	return validate_finding_required_dl_args(o_required, o_found);
 }
 
 static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
-- 
2.14.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH for-next 2/4]  devlink: fix print of uint64_t
  2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
  2019-02-10 18:28   ` [PATCH for-next 1/4] devlink: refactor validation of finding required arguments Aya Levin
@ 2019-02-10 18:28   ` Aya Levin
  2019-02-10 20:34     ` Stephen Hemminger
  2019-02-11 10:32     ` Jiri Pirko
  2019-02-10 18:28   ` [PATCH for-next 3/4] devlink: fix boolean JSON print Aya Levin
  2019-02-10 18:28   ` [PATCH for-next 4/4] devlink: add health command support Aya Levin
  3 siblings, 2 replies; 28+ messages in thread
From: Aya Levin @ 2019-02-10 18:28 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

 This patch prints uint64_t with its corresponding format and avoid implicit
 cast to uint32_t.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reported-by: Maria Pasechnik <mariap@mellanox.com>
---
 devlink/devlink.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index a05755385a49..46e2e41c5dfd 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -1628,7 +1628,14 @@ static void pr_out_u64(struct dl *dl, const char *name, uint64_t val)
 	if (val == (uint64_t) -1)
 		return pr_out_str(dl, name, "unlimited");
 
-	return pr_out_uint(dl, name, val);
+	if (dl->json_output) {
+		jsonw_u64_field(dl->jw, name, val);
+	} else {
+		if (g_indent_newline)
+			pr_out("%s %lu", name, val);
+		else
+			pr_out(" %s %lu", name, val);
+	}
 }
 
 static void pr_out_region_chunk_start(struct dl *dl, uint64_t addr)
-- 
2.14.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH for-next 3/4] devlink: fix boolean JSON print
  2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
  2019-02-10 18:28   ` [PATCH for-next 1/4] devlink: refactor validation of finding required arguments Aya Levin
  2019-02-10 18:28   ` [PATCH for-next 2/4] devlink: fix print of uint64_t Aya Levin
@ 2019-02-10 18:28   ` Aya Levin
  2019-02-10 20:34     ` Stephen Hemminger
  2019-02-10 18:28   ` [PATCH for-next 4/4] devlink: add health command support Aya Levin
  3 siblings, 1 reply; 28+ messages in thread
From: Aya Levin @ 2019-02-10 18:28 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

This patch removes the inverted commas from boolean values in JSON
format: true/false instead of "true"/"false".

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 devlink/devlink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 46e2e41c5dfd..a433883f1b2b 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -1605,10 +1605,10 @@ static void pr_out_str(struct dl *dl, const char *name, const char *val)
 
 static void pr_out_bool(struct dl *dl, const char *name, bool val)
 {
-	if (val)
-		pr_out_str(dl, name, "true");
+	if (dl->json_output)
+		jsonw_bool_field(dl->jw, name, val);
 	else
-		pr_out_str(dl, name, "false");
+		pr_out_str(dl, name, val ? "true" : "false");
 }
 
 static void pr_out_uint(struct dl *dl, const char *name, unsigned int val)
-- 
2.14.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH for-next 4/4] devlink: add health command support
  2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
                     ` (2 preceding siblings ...)
  2019-02-10 18:28   ` [PATCH for-next 3/4] devlink: fix boolean JSON print Aya Levin
@ 2019-02-10 18:28   ` Aya Levin
  2019-02-10 20:42     ` Stephen Hemminger
  2019-02-11 10:41     ` Jiri Pirko
  3 siblings, 2 replies; 28+ messages in thread
From: Aya Levin @ 2019-02-10 18:28 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

This patch adds support for the following commands:
devlink health show      [DEV reporter REPORTE_NAME]
devlink health recover    DEV reporter REPORTER_NAME
devlink health diagnose   DEV reporter REPORTER_NAME
devlink health dump show  DEV reporter REPORTER_NAME
devlink health dump clear DEV reporter REPORTER_NAME
devlink health set        DEV reporter REPORTER_NAME NAME VALUE

 * show: Devlink health show command displays status and configuration info on
   specific reporter on a device or dump the info on all reporters on all
   devices.
 * recover: Devlink health recover enables the user to initiate a
   recovery on a reporter. This operation will increment the recoveries
   counter displayed in the show command.
 * diagnose: Devlink health diagnose enables the user to retrieve diagnostics data
   on a reporter on a device. The command's output is a free text defined
   by the reporter.
 * dump show: Devlink health dump show displays the last saved dump. Devlink
   health saves a single dump. If a dump is not already stored by
   the Devlink for this reporter, Devlink generates a new dump. The
   dump can be generated automatically when a reporter reports on an
   error or manually by user's request.
   dump output is defined by the reporter.
 * dump clear: Devlink health dump clear, deletes the last saved dump file.
 * set: Devlink health set, enables the user to configure:
	1) grace_period [msec] time interval between auto recoveries.
	2) auto_recover [true/false] whether the devlink should execute
	automatic recover on error.

Examples:
$devlink health show pci/0000:00:09.0 reporter tx
pci/0000:00:09.0:
name tx
  state healthy #err 0 #recover 1 last_dump_ts N/A
    parameters:
      grace period 600 auto_recover true
$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 4283 HW state: 1 stopped: false
  sqn: 4288 HW state: 1 stopped: false
  sqn: 4293 HW state: 1 stopped: false
  sqn: 4298 HW state: 1 stopped: false
  sqn: 4303 HW state: 1 stopped: false
$devlink health dump show pci/0000:00:09.0 reporter tx
TX dump data
$devlink health dump clear pci/0000:00:09.0 reporter tx
$devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$devlink health set pci/0000:00:09.0 reporter tx auto_recover false

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 devlink/devlink.c            | 551 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/devlink.h |  23 ++
 man/man8/devlink-health.8    | 176 ++++++++++++++
 man/man8/devlink.8           |   7 +-
 4 files changed, 755 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/devlink-health.8

diff --git a/devlink/devlink.c b/devlink/devlink.c
index a433883f1b2b..2ddbf389a3ea 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -22,6 +22,8 @@
 #include <linux/devlink.h>
 #include <libmnl/libmnl.h>
 #include <netinet/ether.h>
+#include <sys/sysinfo.h>
+#include <sys/queue.h>
 
 #include "SNAPSHOT.h"
 #include "list.h"
@@ -41,6 +43,10 @@
 #define PARAM_CMODE_PERMANENT_STR "permanent"
 #define DL_ARGS_REQUIRED_MAX_ERR_LEN 80
 
+#define HEALTH_REPORTER_STATE_HEALTHY_STR "healthy"
+#define HEALTH_REPORTER_STATE_ERROR_STR "error"
+#define HEALTH_REPORTER_TIMESTAMP_FORMAT_LEN 80
+
 static int g_new_line_count;
 
 #define pr_err(args...) fprintf(stderr, ##args)
@@ -200,6 +206,9 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_REGION_SNAPSHOT_ID	BIT(22)
 #define DL_OPT_REGION_ADDRESS		BIT(23)
 #define DL_OPT_REGION_LENGTH		BIT(24)
+#define DL_OPT_HEALTH_REPORTER_NAME	BIT(25)
+#define DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD	BIT(26)
+#define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER	BIT(27)
 
 struct dl_opts {
 	uint32_t present; /* flags of present items */
@@ -231,6 +240,9 @@ struct dl_opts {
 	uint32_t region_snapshot_id;
 	uint64_t region_address;
 	uint64_t region_length;
+	const char *reporter_name;
+	uint64_t reporter_graceful_period;
+	bool reporter_auto_recover;
 };
 
 struct dl {
@@ -391,6 +403,17 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_INFO_VERSION_STORED] = MNL_TYPE_NESTED,
 	[DEVLINK_ATTR_INFO_VERSION_NAME] = MNL_TYPE_STRING,
 	[DEVLINK_ATTR_INFO_VERSION_VALUE] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_FMSG] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_FMSG_OBJ_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_HEALTH_REPORTER] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_HEALTH_REPORTER_STATE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_HEALTH_REPORTER_ERR] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = MNL_TYPE_U8,
 };
 
 static int attr_cb(const struct nlattr *attr, void *data)
@@ -822,6 +845,24 @@ static int dl_argv_uint16_t(struct dl *dl, uint16_t *p_val)
 	return 0;
 }
 
+static int dl_argv_bool(struct dl *dl, bool *p_val)
+{
+	char *str = dl_argv_next(dl);
+	int err;
+
+	if (!str) {
+		pr_err("Boolean argument expected\n");
+		return -EINVAL;
+	}
+
+	err = strtobool(str, p_val);
+	if (err) {
+		pr_err("\"%s\" is not a valid boolean value\n", str);
+		return err;
+	}
+	return 0;
+}
+
 static int dl_argv_str(struct dl *dl, const char **p_str)
 {
 	const char *str = dl_argv_next(dl);
@@ -976,6 +1017,7 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_REGION_SNAPSHOT_ID,   "Region snapshot id expected.\n"},
 	{DL_OPT_REGION_ADDRESS,	      "Region address value expected.\n"},
 	{DL_OPT_REGION_LENGTH,	      "Region length value expected.\n"},
+	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected.\n"},
 };
 
 static int validate_finding_required_dl_args(uint32_t o_required,
@@ -1231,6 +1273,28 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_REGION_LENGTH;
+		} else if (dl_argv_match(dl, "reporter") &&
+			   (o_all & DL_OPT_HEALTH_REPORTER_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->reporter_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_HEALTH_REPORTER_NAME;
+		} else if (dl_argv_match(dl, "grace_period") &&
+			   (o_all & DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint64_t(dl,
+					       &opts->reporter_graceful_period);
+			if (err)
+				return err;
+			o_found |= DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD;
+		} else if (dl_argv_match(dl, "auto_recover") &&
+			(o_all & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)) {
+			dl_arg_inc(dl);
+			err = dl_argv_bool(dl, &opts->reporter_auto_recover);
+			if (err)
+				return err;
+			o_found |= DL_OPT_HEALTH_REPORTER_AUTO_RECOVER;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -1328,6 +1392,16 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_REGION_LENGTH)
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_REGION_CHUNK_LEN,
 				 opts->region_length);
+	if (opts->present & DL_OPT_HEALTH_REPORTER_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+				  opts->reporter_name);
+	if (opts->present & DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD)
+		mnl_attr_put_u64(nlh,
+				 DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
+				 opts->reporter_graceful_period);
+	if (opts->present & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
+				opts->reporter_auto_recover);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -5677,11 +5751,482 @@ static int cmd_region(struct dl *dl)
 	return -ENOENT;
 }
 
+static int cmd_health_set_params(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+	err = dl_argv_parse(dl, DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME,
+			    DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD |
+			    DL_OPT_HEALTH_REPORTER_AUTO_RECOVER);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_health_dump_clear(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg,
+			       DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl,
+				DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int health_value_show(struct dl *dl, int type, struct nlattr *nl_data)
+{
+	const char *str;
+	uint8_t *data;
+	uint32_t len;
+	uint64_t val_u64;
+	uint32_t val_u32;
+	uint16_t val_u16;
+	uint8_t val_u8;
+	bool val_bool;
+	int i;
+
+	switch (type) {
+	case MNL_TYPE_FLAG:
+		val_bool = mnl_attr_get_u8(nl_data);
+		if (dl->json_output)
+			jsonw_bool(dl->jw, !!val_bool);
+		else
+			pr_out("%s ", val_bool ? "true" : "false");
+		break;
+	case MNL_TYPE_U8:
+		val_u8 = mnl_attr_get_u8(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, val_u8);
+		else
+			pr_out("%u ", val_u8);
+		break;
+	case MNL_TYPE_U16:
+		val_u16 = mnl_attr_get_u16(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, val_u16);
+		else
+			pr_out("%u ", val_u16);
+		break;
+	case MNL_TYPE_U32:
+		val_u32 = mnl_attr_get_u32(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, val_u32);
+		else
+			pr_out("%u ", val_u32);
+		break;
+	case MNL_TYPE_U64:
+		val_u64 = mnl_attr_get_u64(nl_data);
+		if (dl->json_output)
+			jsonw_u64(dl->jw, val_u64);
+		else
+			pr_out("%lu ", val_u64);
+		break;
+	case MNL_TYPE_NUL_STRING:
+		str = mnl_attr_get_str(nl_data);
+		if (dl->json_output)
+			jsonw_string(dl->jw, str);
+		else
+			pr_out("%s ", str);
+		break;
+	case MNL_TYPE_BINARY:
+		len = mnl_attr_get_payload_len(nl_data);
+		data = mnl_attr_get_payload(nl_data);
+		i = 0;
+		while (i < len) {
+			if (dl->json_output) {
+				jsonw_printf(dl->jw, "%d", data[i]);
+			} else {
+				pr_out("%02x ", data[i]);
+				if (!(i % 15))
+					pr_out("\n");
+			}
+			i++;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+	return MNL_CB_OK;
+}
+
+struct nest_qentry {
+	int attr_type;
+
+	TAILQ_ENTRY(nest_qentry) nest_entries;
+};
+
+struct health_cb_data {
+	struct dl *dl;
+	uint8_t value_type;
+
+	TAILQ_HEAD(, nest_qentry) qhead;
+};
+
+static int cmd_health_nest_queue(struct health_cb_data *health_data,
+				 uint8_t *attr_value, bool insert)
+{
+	struct nest_qentry *entry = NULL;
+
+	if (insert) {
+		entry = malloc(sizeof(struct nest_qentry));
+		if (!entry)
+			return -ENOMEM;
+
+		entry->attr_type = *attr_value;
+		TAILQ_INSERT_HEAD(&health_data->qhead, entry, nest_entries);
+	} else {
+		if (TAILQ_EMPTY(&health_data->qhead))
+			return MNL_CB_ERROR;
+		entry = TAILQ_FIRST(&health_data->qhead);
+		*attr_value = entry->attr_type;
+		TAILQ_REMOVE(&health_data->qhead, entry, nest_entries);
+		free(entry);
+	}
+	return MNL_CB_OK;
+}
+
+static int cmd_health_nest(struct health_cb_data *health_data,
+			   uint8_t nest_value, bool start)
+{
+	struct dl *dl = health_data->dl;
+	uint8_t value = nest_value;
+	int err;
+
+	err = cmd_health_nest_queue(health_data, &value, start);
+	if (err != MNL_CB_OK)
+		return err;
+
+	switch (value) {
+	case DEVLINK_ATTR_FMSG_OBJ_NEST_START:
+		if (start)
+			pr_out_entry_start(dl);
+		else
+			pr_out_entry_end(dl);
+		break;
+	case DEVLINK_ATTR_FMSG_PAIR_NEST_START:
+		break;
+	case DEVLINK_ATTR_FMSG_ARR_NEST_START:
+		if (dl->json_output) {
+			if (start)
+				jsonw_start_array(dl->jw);
+			else
+				jsonw_end_array(dl->jw);
+		} else {
+			if (start) {
+				__pr_out_newline();
+				__pr_out_indent_inc();
+			} else {
+				__pr_out_indent_dec();
+			}
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+	return MNL_CB_OK;
+}
+
+static int cmd_health_object_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct health_cb_data *health_data = data;
+	struct dl *dl = health_data->dl;
+	struct nlattr *nla_object;
+	const char *name;
+	int attr_type;
+	int err;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_FMSG])
+		return MNL_CB_ERROR;
+
+	mnl_attr_for_each_nested(nla_object, tb[DEVLINK_ATTR_FMSG]) {
+		attr_type = mnl_attr_get_type(nla_object);
+		switch (attr_type) {
+		case DEVLINK_ATTR_FMSG_OBJ_NEST_START:
+		case DEVLINK_ATTR_FMSG_PAIR_NEST_START:
+		case DEVLINK_ATTR_FMSG_ARR_NEST_START:
+			err = cmd_health_nest(health_data, attr_type, true);
+			if (err != MNL_CB_OK)
+				return err;
+			break;
+		case DEVLINK_ATTR_FMSG_NEST_END:
+			err = cmd_health_nest(health_data, attr_type, false);
+			if (err != MNL_CB_OK)
+				return err;
+			break;
+		case DEVLINK_ATTR_FMSG_OBJ_NAME:
+			name = mnl_attr_get_str(nla_object);
+			if (dl->json_output)
+				jsonw_name(dl->jw, name);
+			else
+				pr_out("%s: ", name);
+			break;
+		case DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE:
+			health_data->value_type = mnl_attr_get_u8(nla_object);
+			break;
+		case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA:
+			err = health_value_show(dl, health_data->value_type,
+						nla_object);
+			if (err != MNL_CB_OK)
+				return err;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+	return MNL_CB_OK;
+}
+
+static int cmd_health_object_common(struct dl *dl, uint8_t cmd)
+{
+	struct nlmsghdr *nlh;
+	struct health_cb_data data;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, cmd,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl,
+				DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	data.dl = dl;
+	TAILQ_INIT(&data.qhead);
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_object_cb, &data);
+	return err;
+}
+
+static int cmd_health_diagnose(struct dl *dl)
+{
+	return cmd_health_object_common(dl, DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE);
+}
+
+static int cmd_health_dump_show(struct dl *dl)
+{
+	return cmd_health_object_common(dl, DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET);
+}
+
+static int cmd_health_recover(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl,
+				DL_OPT_HANDLE | DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
+};
+
+static const char *health_state_name(uint8_t state)
+{
+	switch (state) {
+	case DEVLINK_HEALTH_REPORTER_STATE_HEALTHY:
+		return HEALTH_REPORTER_STATE_HEALTHY_STR;
+	case DEVLINK_HEALTH_REPORTER_STATE_ERROR:
+		return HEALTH_REPORTER_STATE_ERROR_STR;
+	default:
+		return "<unknown state>";
+	}
+}
+
+static void format_logtime(uint64_t time_ms, char *output)
+{
+	struct sysinfo s_info;
+	struct tm *info;
+	time_t now, sec;
+	int err;
+
+	time(&now);
+	info = localtime(&now);
+	err = sysinfo(&s_info);
+	if (err)
+		goto out;
+	/* substruct uptime in sec from now
+	 * add time_ms and 5 minutes factor
+	 */
+	sec = now - (s_info.uptime - time_ms / 1000) + 300;
+	info = localtime(&sec);
+out:
+	strftime(output, HEALTH_REPORTER_TIMESTAMP_FORMAT_LEN,
+		 "%Y-%b-%d %H:%M:%S", info);
+}
+
+static void pr_out_health(struct dl *dl, struct nlattr **tb)
+{
+	char dump_time_date[HEALTH_REPORTER_TIMESTAMP_FORMAT_LEN] = "N/A";
+	struct nlattr *hlt[DEVLINK_ATTR_MAX + 1] = {};
+	enum devlink_health_reporter_state state;
+	bool auto_recover = false;
+	const struct nlattr *attr;
+	uint64_t time_ms;
+	int err;
+
+	state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+
+	err = mnl_attr_parse_nested(tb[DEVLINK_ATTR_HEALTH_REPORTER], attr_cb,
+				    hlt);
+	if (err != MNL_CB_OK)
+		return;
+
+	if (!hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]	||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+		return;
+
+	if (hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS]) {
+		attr = hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS];
+		time_ms = mnl_attr_get_u64(attr);
+		format_logtime(time_ms, dump_time_date);
+	}
+	pr_out_handle_start_arr(dl, tb);
+
+	pr_out_str(dl, "name",
+		   mnl_attr_get_str(hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME]));
+	if (!dl->json_output) {
+		__pr_out_newline();
+		__pr_out_indent_inc();
+	}
+	state = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]);
+	pr_out_str(dl, "state", health_state_name(state));
+	pr_out_u64(dl, "#err",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR]));
+	pr_out_u64(dl, "#recover",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER]));
+	pr_out_str(dl, "last_dump_ts", dump_time_date);
+	pr_out_array_start(dl, "parameters");
+	pr_out_entry_start(dl);
+	pr_out_u64(dl, "grace_period",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]));
+	auto_recover = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
+	pr_out_bool(dl, "auto_recover", auto_recover);
+	pr_out_entry_end(dl);
+	pr_out_array_end(dl);
+	__pr_out_indent_dec();
+	__pr_out_indent_dec();
+	pr_out_handle_end(dl);
+}
+
+static int cmd_health_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_HEALTH_REPORTER])
+		return MNL_CB_ERROR;
+
+	pr_out_health(dl, tb);
+
+	return MNL_CB_OK;
+}
+
+static int cmd_health_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_GET,
+			       flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl,
+					DL_OPT_HANDLE |
+					DL_OPT_HEALTH_REPORTER_NAME, 0);
+		if (err)
+			return err;
+	}
+	pr_out_section_start(dl, "health");
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_show_cb, dl);
+	if (err)
+		return err;
+	pr_out_section_end(dl);
+	return err;
+}
+
+static void cmd_health_help(void)
+{
+	pr_err("Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n");
+	pr_err("Usage: devlink health recover DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health diagnose DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health dump show DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health dump clear DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health set DEV reporter REPORTER_NAME NAME VALUE\n");
+}
+
+static int cmd_health(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_health_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_health_show(dl);
+	} else if (dl_argv_match(dl, "recover")) {
+		dl_arg_inc(dl);
+		return cmd_health_recover(dl);
+	} else if (dl_argv_match(dl, "diagnose")) {
+		dl_arg_inc(dl);
+		return cmd_health_diagnose(dl);
+	} else if (dl_argv_match(dl, "dump")) {
+		dl_arg_inc(dl);
+		if (dl_argv_match(dl, "show")) {
+			dl_arg_inc(dl);
+			return cmd_health_dump_show(dl);
+		} else if (dl_argv_match(dl, "clear")) {
+			dl_arg_inc(dl);
+			return cmd_health_dump_clear(dl);
+		}
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_health_set_params(dl);
+	}
+
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static void help(void)
 {
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename\n"
-	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region }\n"
+	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
 	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
 }
 
@@ -5714,7 +6259,11 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	} else if (dl_argv_match(dl, "region")) {
 		dl_arg_inc(dl);
 		return cmd_region(dl);
+	} else if (dl_argv_match(dl, "health")) {
+		dl_arg_inc(dl);
+		return cmd_health(dl);
 	}
+
 	pr_err("Object \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
 }
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d51b59a7b8ee..bec76e94bc47 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -95,6 +95,12 @@ enum devlink_command {
 	DEVLINK_CMD_PORT_PARAM_DEL,
 
 	DEVLINK_CMD_INFO_GET,		/* can dump */
+	DEVLINK_CMD_HEALTH_REPORTER_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_SET,
+	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
@@ -302,6 +308,23 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_SB_POOL_CELL_SIZE,		/* u32 */
 
+	DEVLINK_ATTR_FMSG,			/* nested */
+	DEVLINK_ATTR_FMSG_OBJ_NEST_START,	/* flag */
+	DEVLINK_ATTR_FMSG_PAIR_NEST_START,	/* flag */
+	DEVLINK_ATTR_FMSG_ARR_NEST_START,	/* flag */
+	DEVLINK_ATTR_FMSG_NEST_END,		/* flag */
+	DEVLINK_ATTR_FMSG_OBJ_NAME,		/* string */
+	DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE,	/* u8 */
+	DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA,	/* dynamic */
+
+	DEVLINK_ATTR_HEALTH_REPORTER,			/* nested */
+	DEVLINK_ATTR_HEALTH_REPORTER_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_REPORTER_STATE,		/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_ERR,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/man/man8/devlink-health.8 b/man/man8/devlink-health.8
new file mode 100644
index 000000000000..2824d4d1bbf1
--- /dev/null
+++ b/man/man8/devlink-health.8
@@ -0,0 +1,176 @@
+.TH DEVLINK\-HEALTH 8 "27 Dec 2018" "iproute2" "Linux"
+.SH NAME
+devlink-health \- devlink health reporting and recovery
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B health
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] }
+.ti -8
+.BR "devlink health show"
+.RI "[ "
+.RI "" DEV ""
+.BR " reporter "
+.RI ""REPORTER " ] "
+.ti -8
+.BR "devlink health recover"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health diagnose"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health dump show"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health dump clear"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health set"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.RI "" NAME ""
+.RI "" VALUE ""
+.ti -8
+.B devlink health help
+.SH "DESCRIPTION"
+.SS devlink health show - Show status and configuration on all supported reporters on all devlink devices.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health recover - Initiate a recovery operation on a reporter.
+This action performs a recovery and increases the recoveries counter on success.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health diagnose - Retrieve diagnostics data on a reporter.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health dump show - Display the last saved dump.
+.PD 0
+.P
+Devlink health saves a single dump per reporter. If an dump is
+.P
+not already stored by the Devlink, this command will generate a new
+.P
+dump. The dump can be generated either automatically when a
+.P
+reporter reports on an error or manually at the user's request.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health dump clear - Delete the saved dump.
+Deleting the saved dump enables a generation of a new dump on
+.PD 0
+.P
+the next "devlink health dump show" command.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health set - Enable the user to configure:
+.PD 0
+1) grace_period [msec] - Time interval between auto recoveries.
+.P
+2) auto_recover [true/false] - Indicates whether the devlink should execute automatic recover on error.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SH "EXAMPLES"
+.PP
+devlink health show
+.RS 4
+pci/0000:00:09.0:
+  name tx
+  state healthy #err 1 #recover 1 last_dump_ts N/A
+  parameters:
+    grace period 600 auto_recover true
+.RE
+.PP
+devlink health recover pci/0000:00:09.0 reporter tx
+.RS 4
+Initiate recovery on tx reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health diagnose pci/0000:00:09.0 reporter tx
+.RS 4
+.PD 0
+SQs:
+.P
+sqn: 4283 HW state: 1 stopped: 0
+.P
+sqn: 4288 HW state: 1 stopped: 0
+.P
+sqn: 4293 HW state: 1 stopped: 0
+.P
+sqn: 4298 HW state: 1 stopped: 0
+.PD
+.RE
+.PP
+devlink health dump show pci/0000:00:09.0 reporter tx
+.RS 4
+Display the last saved dump on tx reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health dump clear pci/0000:00:09.0 reporter tx
+.RS 4
+Delete saved dump on tx reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
+.RS 4
+Set time interval between auto recoveries to minimum of 3500 mSec on
+tx reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health set pci/0000:00:09.0 reporter tx auto_recover false
+.RS 4
+Turn off auto recovery on tx reporter registered on pci/0000:00:09.0.
+.RE
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-port (8),
+.BR devlink-region (8),
+.br
+
+.SH AUTHOR
+Aya Levin <ayal@mellanox.com>
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 8d527e7e1d60..13d4dcd908b3 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
 .in +8
 .ti -8
 .B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | "
 .BR help " }"
 .sp
 
@@ -78,6 +78,10 @@ Turn on verbose output.
 .B region
 - devlink address region access
 
+.TP
+.B health
+- devlink reporting and recovery
+
 .SS
 .I COMMAND
 
@@ -109,6 +113,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR devlink-sb (8),
 .BR devlink-resource (8),
 .BR devlink-region (8),
+.BR devlink-health (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.14.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [iproute2-next, 0/4] Add support for devlink health
  2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (12 preceding siblings ...)
  2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
@ 2019-02-10 18:35 ` Aya Levin
  13 siblings, 0 replies; 28+ messages in thread
From: Aya Levin @ 2019-02-10 18:35 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog


This series adds support for devlink health commands:
 devlink health show      [DEV reporter REPORTE_NAME]
 devlink health recover    DEV reporter REPORTER_NAME
 devlink health diagnose   DEV reporter REPORTER_NAME
 devlink health dump show  DEV reporter REPORTER_NAME
 devlink health dump clear DEV reporter REPORTER_NAME
 devlink health set        DEV reporter REPORTER_NAME NAME VALUE

The first patch refactors the validation of input parameters, which
grow way too long. Patch 2 and 3 fix bugs that were discovered during
the devlink health development.  Patch 4 adds the devlink health
functionality.


Aya Levin (4):
  devlink: refactor validation of finding required arguments
  devlink: fix print of uint64_t
  devlink: fix boolean JSON print
  devlink: add health command support

 devlink/devlink.c            | 721 ++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/devlink.h |  23 ++
 man/man8/devlink-health.8    | 176 +++++++++++
 man/man8/devlink.8           |   7 +-
 4 files changed, 813 insertions(+), 114 deletions(-)
 create mode 100644 man/man8/devlink-health.8

-- 
2.14.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 2/4]  devlink: fix print of uint64_t
  2019-02-10 18:28   ` [PATCH for-next 2/4] devlink: fix print of uint64_t Aya Levin
@ 2019-02-10 20:34     ` Stephen Hemminger
  2019-02-11  2:44       ` David Ahern
  2019-02-11 10:32     ` Jiri Pirko
  1 sibling, 1 reply; 28+ messages in thread
From: Stephen Hemminger @ 2019-02-10 20:34 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

On Sun, 10 Feb 2019 20:28:47 +0200
Aya Levin <ayal@mellanox.com> wrote:

>  This patch prints uint64_t with its corresponding format and avoid implicit
>  cast to uint32_t.
> 
> Signed-off-by: Aya Levin <ayal@mellanox.com>
> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
> Reported-by: Maria Pasechnik <mariap@mellanox.com>
> ---
>  devlink/devlink.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index a05755385a49..46e2e41c5dfd 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -1628,7 +1628,14 @@ static void pr_out_u64(struct dl *dl, const char *name, uint64_t val)
>  	if (val == (uint64_t) -1)
>  		return pr_out_str(dl, name, "unlimited");
>  
> -	return pr_out_uint(dl, name, val);
> +	if (dl->json_output) {
> +		jsonw_u64_field(dl->jw, name, val);
> +	} else {
> +		if (g_indent_newline)
> +			pr_out("%s %lu", name, val);
> +		else
> +			pr_out(" %s %lu", name, val);
> +	}
>  }
>  
>  static void pr_out_region_chunk_start(struct dl *dl, uint64_t addr)

More conditional code adds bloat and is a nuisance.
Why not use print_u64 that already exists.

I wish devlink used json_print in a more standard manner.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 3/4] devlink: fix boolean JSON print
  2019-02-10 18:28   ` [PATCH for-next 3/4] devlink: fix boolean JSON print Aya Levin
@ 2019-02-10 20:34     ` Stephen Hemminger
  0 siblings, 0 replies; 28+ messages in thread
From: Stephen Hemminger @ 2019-02-10 20:34 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

On Sun, 10 Feb 2019 20:28:48 +0200
Aya Levin <ayal@mellanox.com> wrote:

> This patch removes the inverted commas from boolean values in JSON
> format: true/false instead of "true"/"false".
> 
> Signed-off-by: Aya Levin <ayal@mellanox.com>
> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
> ---
>  devlink/devlink.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index 46e2e41c5dfd..a433883f1b2b 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -1605,10 +1605,10 @@ static void pr_out_str(struct dl *dl, const char *name, const char *val)
>  
>  static void pr_out_bool(struct dl *dl, const char *name, bool val)
>  {
> -	if (val)
> -		pr_out_str(dl, name, "true");
> +	if (dl->json_output)
> +		jsonw_bool_field(dl->jw, name, val);
>  	else
> -		pr_out_str(dl, name, "false");
> +		pr_out_str(dl, name, val ? "true" : "false");
>  }
>  
>  static void pr_out_uint(struct dl *dl, const char *name, unsigned int val)

There is already print_bool in json_print why is that not used?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 4/4] devlink: add health command support
  2019-02-10 18:28   ` [PATCH for-next 4/4] devlink: add health command support Aya Levin
@ 2019-02-10 20:42     ` Stephen Hemminger
  2019-02-11 10:41     ` Jiri Pirko
  1 sibling, 0 replies; 28+ messages in thread
From: Stephen Hemminger @ 2019-02-10 20:42 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

On Sun, 10 Feb 2019 20:28:49 +0200
Aya Levin <ayal@mellanox.com> wrote:

> +
> +static void cmd_health_help(void)
> +{
> +	pr_err("Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n");
> +	pr_err("Usage: devlink health recover DEV reporter REPORTER_NAME\n");
> +	pr_err("Usage: devlink health diagnose DEV reporter REPORTER_NAME\n");
> +	pr_err("Usage: devlink health dump show DEV reporter REPORTER_NAME\n");
> +	pr_err("Usage: devlink health dump clear DEV reporter REPORTER_NAME\n");
> +	pr_err("Usage: devlink health set DEV reporter REPORTER_NAME NAME VALUE\n");
> +}
> +

Minor nit:
I prefer that all code and outputs in iproute2 look the same for ease
of maintenance and constituency of user experience.

Why does devlink not use:

static void cmd_health_help(void)
{
	fprintf(stderr, "Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n");
	fprintf(stderr, "       devlink health recover DEV reporter REPORTER_NAME\n");
	fprintf(stderr, "       devlink health diagnose DEV reporter REPORTER_NAME\n");

...

Or 
	fprintf(stderr, "Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n"
			"       devlink health recover DEV reporter REPORTER_NAME\n"	
			"       devlink health diagnose DEV reporter REPORTER_NAME\n");

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 2/4] devlink: fix print of uint64_t
  2019-02-10 20:34     ` Stephen Hemminger
@ 2019-02-11  2:44       ` David Ahern
  0 siblings, 0 replies; 28+ messages in thread
From: David Ahern @ 2019-02-11  2:44 UTC (permalink / raw)
  To: Stephen Hemminger, Aya Levin
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

On 2/10/19 1:34 PM, Stephen Hemminger wrote:
> On Sun, 10 Feb 2019 20:28:47 +0200
> Aya Levin <ayal@mellanox.com> wrote:
> 
>>  This patch prints uint64_t with its corresponding format and avoid implicit
>>  cast to uint32_t.
>>
>> Signed-off-by: Aya Levin <ayal@mellanox.com>
>> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
>> Reported-by: Maria Pasechnik <mariap@mellanox.com>
>> ---
>>  devlink/devlink.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>> index a05755385a49..46e2e41c5dfd 100644
>> --- a/devlink/devlink.c
>> +++ b/devlink/devlink.c
>> @@ -1628,7 +1628,14 @@ static void pr_out_u64(struct dl *dl, const char *name, uint64_t val)
>>  	if (val == (uint64_t) -1)
>>  		return pr_out_str(dl, name, "unlimited");
>>  
>> -	return pr_out_uint(dl, name, val);
>> +	if (dl->json_output) {
>> +		jsonw_u64_field(dl->jw, name, val);
>> +	} else {
>> +		if (g_indent_newline)
>> +			pr_out("%s %lu", name, val);
>> +		else
>> +			pr_out(" %s %lu", name, val);

I would expect the compiler to throw a warning when %lu is used for u64's.

>> +	}
>>  }
>>  
>>  static void pr_out_region_chunk_start(struct dl *dl, uint64_t addr)
> 
> More conditional code adds bloat and is a nuisance.
> Why not use print_u64 that already exists.
> 
> I wish devlink used json_print in a more standard manner.
> 

That's a general devlink problem that can be addressed outside of this
patch set. It is an example of what I meant by devlink not using more of
the infra from the iproute2 code.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 1/4] devlink: refactor validation of finding required arguments
  2019-02-10 18:28   ` [PATCH for-next 1/4] devlink: refactor validation of finding required arguments Aya Levin
@ 2019-02-11  2:46     ` David Ahern
  2019-02-11 10:29     ` Jiri Pirko
  1 sibling, 0 replies; 28+ messages in thread
From: David Ahern @ 2019-02-11  2:46 UTC (permalink / raw)
  To: Aya Levin, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Eran Ben Elisha, Tal Alon, Ariel Almog

On 2/10/19 11:28 AM, Aya Levin wrote:
> @@ -950,6 +951,51 @@ static int param_cmode_get(const char *cmodestr,
>  	return 0;
>  }
>  
> +struct dl_args_metadata {
> +	uint32_t o_flag;
> +	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
> +};
> +
> +static const struct dl_args_metadata dl_args_required[] = {
> +	{DL_OPT_PORT_TYPE,	      "Port type not set.\n"},
> +	{DL_OPT_PORT_COUNT,	      "Port split count option expected.\n"},
> +	{DL_OPT_SB_POOL,	      "Pool index option expected.\n"},
> +	{DL_OPT_SB_SIZE,	      "Pool size option expected.\n"},
> +	{DL_OPT_SB_TYPE,	      "Pool type option expected.\n"},
> +	{DL_OPT_SB_THTYPE,	      "Pool threshold type option expected.\n"},
> +	{DL_OPT_SB_TH,		      "Threshold option expected.\n"},
> +	{DL_OPT_SB_TC,		      "TC index option expected.\n"},
> +	{DL_OPT_ESWITCH_MODE,	      "E-Switch mode option expected.\n"},
> +	{DL_OPT_ESWITCH_INLINE_MODE,  "E-Switch inline-mode option expected.\n"},
> +	{DL_OPT_DPIPE_TABLE_NAME,     "Dpipe table name expected\n"},
> +	{DL_OPT_DPIPE_TABLE_COUNTERS, "Dpipe table counter state expected\n"},
> +	{DL_OPT_ESWITCH_ENCAP_MODE,   "E-Switch encapsulation option expected.\n"},
> +	{DL_OPT_PARAM_NAME,	      "Parameter name expected.\n"},
> +	{DL_OPT_PARAM_VALUE,	      "Value to set expected.\n"},
> +	{DL_OPT_PARAM_CMODE,	      "Configuration mode expected.\n"},
> +	{DL_OPT_REGION_SNAPSHOT_ID,   "Region snapshot id expected.\n"},
> +	{DL_OPT_REGION_ADDRESS,	      "Region address value expected.\n"},
> +	{DL_OPT_REGION_LENGTH,	      "Region length value expected.\n"},
> +};
> +
> +static int validate_finding_required_dl_args(uint32_t o_required,
> +					     uint32_t o_found)
> +{
> +	uint32_t dl_args_required_size;
> +	uint32_t o_flag;
> +	int i;
> +
> +	dl_args_required_size = ARRAY_SIZE(dl_args_required);
> +	for (i = 0; i < dl_args_required_size; i++) {
> +		o_flag = dl_args_required[i].o_flag;
> +		if ((o_required & o_flag) && !(o_found & o_flag)) {
> +			pr_err("%s", dl_args_required[i].err_msg);
> +			return -EINVAL;
> +		}
> +	}
> +	return 0;
> +}
> +

much better. Thank you for refactoring this.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 1/4] devlink: refactor validation of finding required arguments
  2019-02-10 18:28   ` [PATCH for-next 1/4] devlink: refactor validation of finding required arguments Aya Levin
  2019-02-11  2:46     ` David Ahern
@ 2019-02-11 10:29     ` Jiri Pirko
  1 sibling, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2019-02-11 10:29 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

Sun, Feb 10, 2019 at 07:28:46PM CET, ayal@mellanox.com wrote:
>Introducing argument's metadata structure matching a bitmap flag per
>required argument and an error message if missing. Using this static
>array to refactor validation of finding required arguments in devlink
>command line and to ease further maintenance.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
>---
> devlink/devlink.c | 155 +++++++++++++++++-------------------------------------
> 1 file changed, 47 insertions(+), 108 deletions(-)
>
>diff --git a/devlink/devlink.c b/devlink/devlink.c
>index d823512a4030..a05755385a49 100644
>--- a/devlink/devlink.c
>+++ b/devlink/devlink.c
>@@ -39,6 +39,7 @@
> #define PARAM_CMODE_RUNTIME_STR "runtime"
> #define PARAM_CMODE_DRIVERINIT_STR "driverinit"
> #define PARAM_CMODE_PERMANENT_STR "permanent"
>+#define DL_ARGS_REQUIRED_MAX_ERR_LEN 80
> 
> static int g_new_line_count;
> 
>@@ -950,6 +951,51 @@ static int param_cmode_get(const char *cmodestr,
> 	return 0;
> }
> 
>+struct dl_args_metadata {
>+	uint32_t o_flag;
>+	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
>+};
>+
>+static const struct dl_args_metadata dl_args_required[] = {
>+	{DL_OPT_PORT_TYPE,	      "Port type not set.\n"},
>+	{DL_OPT_PORT_COUNT,	      "Port split count option expected.\n"},
>+	{DL_OPT_SB_POOL,	      "Pool index option expected.\n"},
>+	{DL_OPT_SB_SIZE,	      "Pool size option expected.\n"},
>+	{DL_OPT_SB_TYPE,	      "Pool type option expected.\n"},
>+	{DL_OPT_SB_THTYPE,	      "Pool threshold type option expected.\n"},
>+	{DL_OPT_SB_TH,		      "Threshold option expected.\n"},
>+	{DL_OPT_SB_TC,		      "TC index option expected.\n"},
>+	{DL_OPT_ESWITCH_MODE,	      "E-Switch mode option expected.\n"},
>+	{DL_OPT_ESWITCH_INLINE_MODE,  "E-Switch inline-mode option expected.\n"},
>+	{DL_OPT_DPIPE_TABLE_NAME,     "Dpipe table name expected\n"},
>+	{DL_OPT_DPIPE_TABLE_COUNTERS, "Dpipe table counter state expected\n"},
>+	{DL_OPT_ESWITCH_ENCAP_MODE,   "E-Switch encapsulation option expected.\n"},
>+	{DL_OPT_PARAM_NAME,	      "Parameter name expected.\n"},
>+	{DL_OPT_PARAM_VALUE,	      "Value to set expected.\n"},
>+	{DL_OPT_PARAM_CMODE,	      "Configuration mode expected.\n"},
>+	{DL_OPT_REGION_SNAPSHOT_ID,   "Region snapshot id expected.\n"},
>+	{DL_OPT_REGION_ADDRESS,	      "Region address value expected.\n"},
>+	{DL_OPT_REGION_LENGTH,	      "Region length value expected.\n"},

Remove the "\n" from there and put it to the pr_err() call.


>+};
>+
>+static int validate_finding_required_dl_args(uint32_t o_required,

Please maintain the current naming scheme of functions. This should be
named something like:
dl_args_finding_required_validate()


>+					     uint32_t o_found)
>+{
>+	uint32_t dl_args_required_size;
>+	uint32_t o_flag;
>+	int i;
>+
>+	dl_args_required_size = ARRAY_SIZE(dl_args_required);
>+	for (i = 0; i < dl_args_required_size; i++) {
>+		o_flag = dl_args_required[i].o_flag;
>+		if ((o_required & o_flag) && !(o_found & o_flag)) {
>+			pr_err("%s", dl_args_required[i].err_msg);
>+			return -EINVAL;
>+		}
>+	}
>+	return 0;
>+}
>+

[...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 2/4]  devlink: fix print of uint64_t
  2019-02-10 18:28   ` [PATCH for-next 2/4] devlink: fix print of uint64_t Aya Levin
  2019-02-10 20:34     ` Stephen Hemminger
@ 2019-02-11 10:32     ` Jiri Pirko
  1 sibling, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2019-02-11 10:32 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

Sun, Feb 10, 2019 at 07:28:47PM CET, ayal@mellanox.com wrote:
> This patch prints uint64_t with its corresponding format and avoid implicit
> cast to uint32_t.

Drop the space at the beginning of the lines.

Otherwise this looks fine.
Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH for-next 4/4] devlink: add health command support
  2019-02-10 18:28   ` [PATCH for-next 4/4] devlink: add health command support Aya Levin
  2019-02-10 20:42     ` Stephen Hemminger
@ 2019-02-11 10:41     ` Jiri Pirko
  1 sibling, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2019-02-11 10:41 UTC (permalink / raw)
  To: Aya Levin
  Cc: David Ahern, netdev, David S. Miller, Jiri Pirko, Moshe Shemesh,
	Eran Ben Elisha, Tal Alon, Ariel Almog

Sun, Feb 10, 2019 at 07:28:49PM CET, ayal@mellanox.com wrote:
>This patch adds support for the following commands:
>devlink health show      [DEV reporter REPORTE_NAME]
>devlink health recover    DEV reporter REPORTER_NAME
>devlink health diagnose   DEV reporter REPORTER_NAME
>devlink health dump show  DEV reporter REPORTER_NAME
>devlink health dump clear DEV reporter REPORTER_NAME
>devlink health set        DEV reporter REPORTER_NAME NAME VALUE
>
> * show: Devlink health show command displays status and configuration info on
>   specific reporter on a device or dump the info on all reporters on all
>   devices.
> * recover: Devlink health recover enables the user to initiate a
>   recovery on a reporter. This operation will increment the recoveries
>   counter displayed in the show command.
> * diagnose: Devlink health diagnose enables the user to retrieve diagnostics data
>   on a reporter on a device. The command's output is a free text defined
>   by the reporter.
> * dump show: Devlink health dump show displays the last saved dump. Devlink
>   health saves a single dump. If a dump is not already stored by
>   the Devlink for this reporter, Devlink generates a new dump. The
>   dump can be generated automatically when a reporter reports on an
>   error or manually by user's request.
>   dump output is defined by the reporter.
> * dump clear: Devlink health dump clear, deletes the last saved dump file.
> * set: Devlink health set, enables the user to configure:
>	1) grace_period [msec] time interval between auto recoveries.
>	2) auto_recover [true/false] whether the devlink should execute
>	automatic recover on error.
>
>Examples:
>$devlink health show pci/0000:00:09.0 reporter tx
>pci/0000:00:09.0:
>name tx
>  state healthy #err 0 #recover 1 last_dump_ts N/A
>    parameters:
>      grace period 600 auto_recover true
>$devlink health diagnose pci/0000:00:09.0 reporter tx
>SQs:
>  sqn: 4283 HW state: 1 stopped: false
>  sqn: 4288 HW state: 1 stopped: false
>  sqn: 4293 HW state: 1 stopped: false
>  sqn: 4298 HW state: 1 stopped: false
>  sqn: 4303 HW state: 1 stopped: false
>$devlink health dump show pci/0000:00:09.0 reporter tx
>TX dump data
>$devlink health dump clear pci/0000:00:09.0 reporter tx
>$devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
>$devlink health set pci/0000:00:09.0 reporter tx auto_recover false
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
>---
> devlink/devlink.c            | 551 ++++++++++++++++++++++++++++++++++++++++++-
> include/uapi/linux/devlink.h |  23 ++
> man/man8/devlink-health.8    | 176 ++++++++++++++
> man/man8/devlink.8           |   7 +-
> 4 files changed, 755 insertions(+), 2 deletions(-)

755 lines is too much for one patch.
For easier review, please split this patch into separate patchset,
preferably per-cmd.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, back to index

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-07  9:36 [PATCH v4 net-next 00/11] Devlink health reporting and recovery system Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 01/11] devlink: Add devlink formatted message (fmsg) API Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 02/11] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 03/11] devlink: Add health report functionality Eran Ben Elisha
2019-02-07  9:38   ` Jiri Pirko
2019-02-07  9:36 ` [PATCH v4 net-next 04/11] devlink: Add health get command Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 05/11] devlink: Add health set command Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 06/11] devlink: Add health recover command Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 07/11] devlink: Add health diagnose command Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 08/11] devlink: Add health dump {get,clear} commands Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 09/11] net/mlx5e: Add tx reporter support Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 10/11] net/mlx5e: Add tx timeout support for mlx5e tx reporter Eran Ben Elisha
2019-02-07  9:36 ` [PATCH v4 net-next 11/11] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
2019-02-07 18:37 ` [PATCH v4 net-next 00/11] Devlink health reporting and recovery system David Miller
2019-02-10 18:28 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin
2019-02-10 18:28   ` [PATCH for-next 1/4] devlink: refactor validation of finding required arguments Aya Levin
2019-02-11  2:46     ` David Ahern
2019-02-11 10:29     ` Jiri Pirko
2019-02-10 18:28   ` [PATCH for-next 2/4] devlink: fix print of uint64_t Aya Levin
2019-02-10 20:34     ` Stephen Hemminger
2019-02-11  2:44       ` David Ahern
2019-02-11 10:32     ` Jiri Pirko
2019-02-10 18:28   ` [PATCH for-next 3/4] devlink: fix boolean JSON print Aya Levin
2019-02-10 20:34     ` Stephen Hemminger
2019-02-10 18:28   ` [PATCH for-next 4/4] devlink: add health command support Aya Levin
2019-02-10 20:42     ` Stephen Hemminger
2019-02-11 10:41     ` Jiri Pirko
2019-02-10 18:35 ` [iproute2-next, 0/4] Add support for devlink health Aya Levin

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox