All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
@ 2018-12-31 14:31 Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 01/19] devlink: Add health buffer support Eran Ben Elisha
                   ` (20 more replies)
  0 siblings, 21 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:31 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

The health mechanism is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
  information.

The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.

The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
 - Recovery procedures
 - Diagnostics and object dump procedures
 - OOB initial attributes
Different parts of the driver can register different types of health reporters
with different handlers.

Once an error is reported, devlink health will do the following actions:
  * A log is being send to the kernel trace events buffer
  * Health status and statistics are being updated for the reporter instance
  * Object dump is being taken and saved at the reporter instance (as long as
    there is no other Objdump which is already stored)
  * Auto recovery attempt is being done. Depends on:
    - Auto-recovery configuration
    - Grace period vs. time passed since last recover

The user interface:
User can access/change each reporter attributes and driver specific callbacks
via devlink, e.g per error type (per health reporter)
 - Configure reporter's generic attributes (like: Disable/enable auto recovery)
 - Invoke recovery procedure
 - Run diagnostics
 - Object dump

The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
  Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
  Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
  Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
  Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET
  Retrieves the last stored objdump. Devlink health
  saves a single objdump. If an objdump is not already stored by the devlink
  for this reporter, devlink generates a new objdump.
  Objdump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
  Clears the last saved objdump file for the specified reporter.


                                               netlink
                                      +--------------------------+
                                      |                          |
                                      |            +             |
                                      |            |             |
                                      +--------------------------+
                                                   |request for ops
                                                   |(diagnose,
 mlx5_core                             devlink     |recover,
                                                   |dump)
+--------+                            +--------------------------+
|        |                            |    reporter|             |
|        |                            |  +---------v----------+  |
|        |   ops execution            |  |                    |  |
|     <----------------------------------+                    |  |
|        |                            |  |                    |  |
|        |                            |  + ^------------------+  |
|        |                            |    | request for ops     |
|        |                            |    | (recover, dump)     |
|        |                            |    |                     |
|        |                            |  +-+------------------+  |
|        |     health report          |  | health handler     |  |
|        +------------------------------->                    |  |
|        |                            |  +--------------------+  |
|        |     health reporter create |                          |
|        +---------------------------->                          |
+--------+                            +--------------------------+

Available reporters:
In this patchset, three reporters of mlx5e driver are included. The FW
reporters implement devlink_health_reporter diagnostic, object dump and
recovery procedures. The TX reporter implements devlink_health_reporter
diagnostic and recovery procedures.

This RFC is based on the same concepts as previous RFC I sent earlier this
year: "[RFC PATCH iproute2-next] System specification health API". The API was
changed, also devlink code and mlx5e reporters were not available at the
previous RFC.

Aya Levin (1):
  devlink: Add Documentation/networking/devlink-health.txt

Eran Ben Elisha (11):
  devlink: Add health buffer support
  devlink: Add health reporter create/destroy functionality
  devlink: Add health report functionality
  devlink: Add health get command
  devlink: Add health set command
  devlink: Add health recover command
  devlink: Add health diagnose command
  devlink: Add health objdump {get,clear} commands
  net/mlx5e: Add TX reporter support
  net/mlx5e: Add TX timeout support for mlx5e TX reporter
  net/mlx5: Move all devlink related functions calls to devlink.c

Moshe Shemesh (7):
  net/mlx5: Refactor print health info
  net/mlx5: Create FW devlink_health_reporter
  net/mlx5: Add core dump register access functions
  net/mlx5: Add support for FW reporter objdump
  net/mlx5: Report devlink health on FW issues
  net/mlx5: Add FW fatal devlink_health_reporter
  net/mlx5: Report devlink health on FW fatal issues

 Documentation/networking/devlink-health.txt   |   86 ++
 .../net/ethernet/mellanox/mlx5/core/Makefile  |    4 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  310 +++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |   22 +
 .../mellanox/mlx5/core/diag/fw_tracer.c       |   75 ++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |   13 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   18 +-
 .../ethernet/mellanox/mlx5/core/en/reporter.h |   15 +
 .../mellanox/mlx5/core/en/reporter_tx.c       |  356 ++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  186 +--
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |    2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |   79 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |   10 +-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |    7 +
 include/linux/mlx5/device.h                   |    1 +
 include/linux/mlx5/driver.h                   |    5 +
 include/linux/mlx5/mlx5_ifc.h                 |   21 +-
 include/net/devlink.h                         |  142 +++
 include/trace/events/devlink.h                |   62 +
 include/uapi/linux/devlink.h                  |   25 +
 net/core/devlink.c                            | 1037 +++++++++++++++++
 21 files changed, 2265 insertions(+), 211 deletions(-)
 create mode 100644 Documentation/networking/devlink-health.txt
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 01/19] devlink: Add health buffer support
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
@ 2018-12-31 14:31 ` Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 02/19] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:31 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Devlink health buffer is a mechanism to pass descriptors between drivers
and devlink. The API allows the driver to add objects, object pair,
value array (nested attributes), value and name.

Driver can use this API to fill the buffers in a format which can be
translated by the devlink to the netlink message.

In order to fulfill it, an internal buffer descriptor is defined. This
will hold the data and metadata per each attribute and by used to pass
actual commands to the netlink.

This mechanism will be later used in devlink health for objdump and diagnose
data store by the drivers.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/net/devlink.h        |  76 ++++++
 include/uapi/linux/devlink.h |   8 +
 net/core/devlink.c           | 501 +++++++++++++++++++++++++++++++++++
 3 files changed, 585 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 67f4293bc970..77c77319290a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -423,6 +423,8 @@ struct devlink_region;
 
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
+struct devlink_health_buffer;
+
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
@@ -584,6 +586,22 @@ int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 				   u8 *data, u32 snapshot_id,
 				   devlink_snapshot_data_dest_t *data_destructor);
 
+int devlink_health_buffer_nest_start(struct devlink_health_buffer *buffer,
+				     int attrtype);
+void devlink_health_buffer_nest_end(struct devlink_health_buffer *buffer);
+void devlink_health_buffer_nest_cancel(struct devlink_health_buffer *buffer);
+int devlink_health_buffer_put_object_name(struct devlink_health_buffer *buffer,
+					  char *name);
+int devlink_health_buffer_put_value_u8(struct devlink_health_buffer *buffer,
+				       u8 value);
+int devlink_health_buffer_put_value_u32(struct devlink_health_buffer *buffer,
+					u32 value);
+int devlink_health_buffer_put_value_u64(struct devlink_health_buffer *buffer,
+					u64 value);
+int devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
+					   char *name);
+int devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
+					 void *data, int len);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -844,6 +862,64 @@ devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 	return 0;
 }
 
+static inline int
+devlink_health_buffer_nest_start(struct devlink_health_buffer *buffer,
+				 int attrtype)
+{
+	return 0;
+}
+
+static inline void
+devlink_health_buffer_nest_end(struct devlink_health_buffer *buffer)
+{
+}
+
+static inline void
+devlink_health_buffer_nest_cancel(struct devlink_health_buffer *buffer)
+{
+}
+
+static inline int
+devlink_health_buffer_put_object_name(struct devlink_health_buffer *buffer,
+				      char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_u8(struct devlink_health_buffer *buffer,
+				   u8 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_u32(struct devlink_health_buffer *buffer,
+				    u32 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_u64(struct devlink_health_buffer *buffer,
+				    u64 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
+				       char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
+				     void *data, int len)
+{
+	return 0;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6e52d3660654..cff0e0cb5ac2 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -285,6 +285,14 @@ enum devlink_attr {
 	DEVLINK_ATTR_REGION_CHUNK_ADDR,         /* u64 */
 	DEVLINK_ATTR_REGION_CHUNK_LEN,          /* u64 */
 
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,	/* u8 */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,	/* dynamic */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index abb0da9d7b4b..36018af086a0 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3597,6 +3597,507 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	return 0;
 }
 
+#define DEVLINK_HEALTH_BUFFER_SIZE (4096 - GENL_HDRLEN)
+#define DEVLINK_HEALTH_BUFFER_DATA_SIZE (DEVLINK_HEALTH_BUFFER_SIZE / 2)
+#define DEVLINK_HEALTH_SIZE_TO_BUFFERS(size) DIV_ROUND_UP(size, DEVLINK_HEALTH_BUFFER_DATA_SIZE)
+#define DEVLINK_HEALTH_BUFFER_MAX_CHUNK 1024
+
+struct devlink_health_buffer {
+	void *data;
+	u64 offset;
+	u64 bytes_left;
+	u64 bytes_left_metadata;
+	u64 max_nested_depth;
+	u64 curr_nest;
+};
+
+struct devlink_health_buffer_desc {
+	int attrtype;
+	u16 len;
+	u8 nla_type;
+	u8 nest_end;
+	int value[0];
+};
+
+static void
+devlink_health_buffers_reset(struct devlink_health_buffer **buffers_list,
+			     u64 num_of_buffers)
+{
+	u64 i;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		memset(buffers_list[i]->data, 0, DEVLINK_HEALTH_BUFFER_SIZE);
+		buffers_list[i]->offset = 0;
+		buffers_list[i]->bytes_left = DEVLINK_HEALTH_BUFFER_DATA_SIZE;
+		buffers_list[i]->bytes_left_metadata =
+			DEVLINK_HEALTH_BUFFER_DATA_SIZE;
+		buffers_list[i]->max_nested_depth = 0;
+		buffers_list[i]->curr_nest = 0;
+	}
+}
+
+static void
+devlink_health_buffers_destroy(struct devlink_health_buffer **buffers_list,
+			       u64 size);
+
+static struct devlink_health_buffer **
+devlink_health_buffers_create(u64 size)
+{
+	struct devlink_health_buffer **buffers_list;
+	u64 num_of_buffers = DEVLINK_HEALTH_SIZE_TO_BUFFERS(size);
+	u64 i;
+
+	buffers_list = kcalloc(num_of_buffers,
+			       sizeof(struct devlink_health_buffer *),
+			       GFP_KERNEL);
+	if (!buffers_list)
+		return NULL;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		struct devlink_health_buffer *buffer;
+		void *data;
+
+		buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
+		data = kzalloc(DEVLINK_HEALTH_BUFFER_SIZE, GFP_KERNEL);
+		if (!buffer || !data) {
+			kfree(buffer);
+			kfree(data);
+			goto buffers_cleanup;
+		}
+		buffers_list[i] = buffer;
+		buffer->data = data;
+	}
+	devlink_health_buffers_reset(buffers_list, num_of_buffers);
+
+	return buffers_list;
+
+buffers_cleanup:
+	devlink_health_buffers_destroy(buffers_list, --i);
+	kfree(buffers_list);
+	return NULL;
+}
+
+static void
+devlink_health_buffers_destroy(struct devlink_health_buffer **buffers_list,
+			       u64 num_of_buffers)
+{
+	u64 i;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		kfree(buffers_list[i]->data);
+		kfree(buffers_list[i]);
+	}
+}
+
+void
+devlink_health_buffer_offset_inc(struct devlink_health_buffer *buffer,
+				 int len)
+{
+	buffer->offset += len;
+}
+
+/* In order to store a nest, need two descriptors, for start and end */
+#define DEVLINK_HEALTH_BUFFER_NEST_SIZE (sizeof(struct devlink_health_buffer_desc) * 2)
+
+int devlink_health_buffer_verify_len(struct devlink_health_buffer *buffer,
+				     int len, int metadata_len)
+{
+	if (len > DEVLINK_HEALTH_BUFFER_DATA_SIZE)
+		return -EINVAL;
+
+	if (buffer->bytes_left < len ||
+	    buffer->bytes_left_metadata < metadata_len)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static struct devlink_health_buffer_desc *
+devlink_health_buffer_get_desc_from_offset(struct devlink_health_buffer *buffer)
+{
+	return buffer->data + buffer->offset;
+}
+
+int
+devlink_health_buffer_nest_start(struct devlink_health_buffer *buffer,
+				 int attrtype)
+{
+	struct devlink_health_buffer_desc *desc;
+	int err;
+
+	err = devlink_health_buffer_verify_len(buffer, 0,
+					       DEVLINK_HEALTH_BUFFER_NEST_SIZE);
+	if (err)
+		return err;
+
+	if (attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT &&
+	    attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR &&
+	    attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE &&
+	    attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY)
+		return -EINVAL;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+
+	desc->attrtype = attrtype;
+	buffer->bytes_left_metadata -= DEVLINK_HEALTH_BUFFER_NEST_SIZE;
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc));
+
+	buffer->curr_nest++;
+	buffer->max_nested_depth = max(buffer->max_nested_depth,
+				       buffer->curr_nest);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_nest_start);
+
+enum devlink_health_buffer_nest_end_cancel {
+	DEVLINK_HEALTH_BUFFER_NEST_END = 1,
+	DEVLINK_HEALTH_BUFFER_NEST_CANCEL,
+};
+
+static void
+devlink_health_buffer_nest_end_cancel(struct devlink_health_buffer *buffer,
+				      enum devlink_health_buffer_nest_end_cancel nest)
+{
+	struct devlink_health_buffer_desc *desc;
+
+	WARN_ON(!buffer->curr_nest);
+	buffer->curr_nest--;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc->nest_end = nest;
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc));
+}
+
+void devlink_health_buffer_nest_end(struct devlink_health_buffer *buffer)
+{
+	devlink_health_buffer_nest_end_cancel(buffer,
+					      DEVLINK_HEALTH_BUFFER_NEST_END);
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_nest_end);
+
+void devlink_health_buffer_nest_cancel(struct devlink_health_buffer *buffer)
+{
+	devlink_health_buffer_nest_end_cancel(buffer,
+					      DEVLINK_HEALTH_BUFFER_NEST_CANCEL);
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_nest_cancel);
+
+int
+devlink_health_buffer_put_object_name(struct devlink_health_buffer *buffer,
+				      char *name)
+{
+	struct devlink_health_buffer_desc *desc;
+	int err;
+
+	err = devlink_health_buffer_verify_len(buffer, strlen(name) + 1,
+					       sizeof(*desc));
+	if (err)
+		return err;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc->attrtype = DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME;
+	desc->nla_type = NLA_NUL_STRING;
+	desc->len = strlen(name) + 1;
+	memcpy(&desc->value, name, desc->len);
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc) + desc->len);
+
+	buffer->bytes_left_metadata -= sizeof(*desc);
+	buffer->bytes_left -= (strlen(name) + 1);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_object_name);
+
+static int
+devlink_health_buffer_put_value(struct devlink_health_buffer *buffer,
+				u8 nla_type, void *value, int len)
+{
+	struct devlink_health_buffer_desc *desc;
+	int err;
+
+	err = devlink_health_buffer_verify_len(buffer, len, sizeof(*desc));
+	if (err)
+		return err;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc->attrtype = DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA;
+	desc->nla_type = nla_type;
+	desc->len = len;
+	memcpy(&desc->value, value, len);
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc) + desc->len);
+
+	buffer->bytes_left_metadata -= sizeof(*desc);
+	buffer->bytes_left -= len;
+
+	return 0;
+}
+
+int
+devlink_health_buffer_put_value_u8(struct devlink_health_buffer *buffer,
+				   u8 value)
+{
+	int err;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_U8, &value,
+					      sizeof(value));
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_u8);
+
+int
+devlink_health_buffer_put_value_u32(struct devlink_health_buffer *buffer,
+				    u32 value)
+{
+	int err;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_U32, &value,
+					      sizeof(value));
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_u32);
+
+int
+devlink_health_buffer_put_value_u64(struct devlink_health_buffer *buffer,
+				    u64 value)
+{
+	int err;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_U64, &value,
+					      sizeof(value));
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_u64);
+
+int
+devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
+				       char *name)
+{
+	int err;
+
+	if (strlen(name) + 1 > DEVLINK_HEALTH_BUFFER_MAX_CHUNK)
+		return -EINVAL;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_NUL_STRING, name,
+					      strlen(name) + 1);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_string);
+
+int
+devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
+				     void *data, int len)
+{
+	int err;
+
+	if (len > DEVLINK_HEALTH_BUFFER_MAX_CHUNK)
+		return -EINVAL;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_BINARY, data, len);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_data);
+
+static int
+devlink_health_buffer_fill_data(struct sk_buff *skb,
+				struct devlink_health_buffer_desc *desc)
+{
+	int err = -EINVAL;
+
+	switch (desc->nla_type) {
+	case NLA_U8:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+				 *(u8 *)desc->value);
+		break;
+	case NLA_U32:
+		err = nla_put_u32(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+				  *(u32 *)desc->value);
+		break;
+	case NLA_U64:
+		err = nla_put_u64_64bit(skb,
+					DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+					*(u64 *)desc->value, DEVLINK_ATTR_PAD);
+		break;
+	case NLA_NUL_STRING:
+		err = nla_put_string(skb,
+				     DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+				     (char *)&desc->value);
+		break;
+	case NLA_BINARY:
+		err = nla_put(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+			      desc->len, (void *)&desc->value);
+		break;
+	}
+
+	return err;
+}
+
+static int
+devlink_health_buffer_fill_type(struct sk_buff *skb,
+				struct devlink_health_buffer_desc *desc)
+{
+	int err = -EINVAL;
+
+	switch (desc->nla_type) {
+	case NLA_U8:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_U8);
+		break;
+	case NLA_U32:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_U32);
+		break;
+	case NLA_U64:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_U64);
+		break;
+	case NLA_NUL_STRING:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_NUL_STRING);
+		break;
+	case NLA_BINARY:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_BINARY);
+		break;
+	}
+
+	return err;
+}
+
+static inline struct devlink_health_buffer_desc *
+devlink_health_buffer_get_next_desc(struct devlink_health_buffer_desc *desc)
+{
+	return (void *)&desc->value + desc->len;
+}
+
+static int
+devlink_health_buffer_prepare_skb(struct sk_buff *skb,
+				  struct devlink_health_buffer *buffer)
+{
+	struct devlink_health_buffer_desc *last_desc, *desc;
+	struct nlattr **buffer_nlattr;
+	int err;
+	int i = 0;
+
+	buffer_nlattr = kcalloc(buffer->max_nested_depth,
+				sizeof(*buffer_nlattr), GFP_KERNEL);
+	if (!buffer_nlattr)
+		return -EINVAL;
+
+	last_desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc = buffer->data;
+	while (desc != last_desc) {
+		switch (desc->attrtype) {
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT:
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR:
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE:
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY:
+			buffer_nlattr[i] = nla_nest_start(skb, desc->attrtype);
+			if (!buffer_nlattr[i])
+				goto nla_put_failure;
+			i++;
+			break;
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA:
+			err = devlink_health_buffer_fill_data(skb, desc);
+			if (err)
+				goto nla_put_failure;
+			err = devlink_health_buffer_fill_type(skb, desc);
+			if (err)
+				goto nla_put_failure;
+			break;
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME:
+			err = nla_put_string(skb, desc->attrtype,
+					     (char *)&desc->value);
+			if (err)
+				goto nla_put_failure;
+			break;
+		default:
+			WARN_ON(!desc->nest_end);
+			WARN_ON(i <= 0);
+			if (desc->nest_end == DEVLINK_HEALTH_BUFFER_NEST_END)
+				nla_nest_end(skb, buffer_nlattr[--i]);
+			else
+				nla_nest_cancel(skb, buffer_nlattr[--i]);
+			break;
+		}
+		desc = devlink_health_buffer_get_next_desc(desc);
+	};
+
+	return 0;
+
+nla_put_failure:
+	kfree(buffer_nlattr);
+	return err;
+}
+
+static int
+devlink_health_buffer_snd(struct genl_info *info,
+			  enum devlink_command cmd, int flags,
+			  struct devlink_health_buffer **buffers_array,
+			  u64 num_of_buffers)
+{
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	void *hdr;
+	int err;
+	u64 i;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		/* Skip buffer if driver did not fill it up with any data */
+		if (!buffers_array[i]->offset)
+			continue;
+
+		skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+		if (!skb)
+			return -ENOMEM;
+
+		hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+				  &devlink_nl_family, NLM_F_MULTI, cmd);
+		if (!hdr)
+			goto nla_put_failure;
+
+		err = devlink_health_buffer_prepare_skb(skb, buffers_array[i]);
+		if (err)
+			goto nla_put_failure;
+
+		genlmsg_end(skb, hdr);
+		err = genlmsg_reply(skb, info);
+		if (err)
+			return err;
+	}
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+			NLMSG_DONE, 0, flags | NLM_F_MULTI);
+	err = genlmsg_reply(skb, info);
+	if (err)
+		return err;
+
+	return 0;
+
+nla_put_failure:
+	err = -EIO;
+	nlmsg_free(skb);
+	return err;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 02/19] devlink: Add health reporter create/destroy functionality
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 01/19] devlink: Add health buffer support Eran Ben Elisha
@ 2018-12-31 14:31 ` Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 03/19] devlink: Add health report functionality Eran Ben Elisha
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:31 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.

As part of the allocation, driver shall provide a set of callbacks which
will be used the devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.

For each reporter, devlink will hold a metadata of statistics,
buffers and status.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/net/devlink.h |  58 ++++++++++++++++++++++
 net/core/devlink.c    | 112 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 77c77319290a..6884a2571348 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -30,6 +30,7 @@ struct devlink {
 	struct list_head param_list;
 	struct list_head region_list;
 	u32 snapshot_id;
+	struct list_head reporter_list;
 	struct devlink_dpipe_headers *dpipe_headers;
 	const struct devlink_ops *ops;
 	struct device *dev;
@@ -424,6 +425,34 @@ struct devlink_region;
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
 struct devlink_health_buffer;
+struct devlink_health_reporter;
+
+/**
+ * struct devlink_health_reporter_ops - Reporter operations
+ * @name: reporter name
+ * objdump_size: objdump buffer size allocated by the devlink
+ * diagnose_size: diagnose buffer size allocated by the devlink
+ * recover: callback to recover from reported error
+ *          if priv_ctx is NULL, run a full recover
+ * objdump: callback to dump an object
+ *          if priv_ctx is NULL, run a full objdump
+ * diagnose: callback to diagnose the current status
+ */
+
+struct devlink_health_reporter_ops {
+	char *name;
+	unsigned int objdump_size;
+	unsigned int diagnose_size;
+	int (*recover)(struct devlink_health_reporter *reporter,
+		       void *priv_ctx);
+	int (*objdump)(struct devlink_health_reporter *reporter,
+		       struct devlink_health_buffer **buffers_array,
+		       unsigned int buffer_size, unsigned int num_buffers,
+		       void *priv_ctx);
+	int (*diagnose)(struct devlink_health_reporter *reporter,
+			struct devlink_health_buffer **buffers_array,
+			unsigned int buffer_size, unsigned int num_buffers);
+};
 
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
@@ -602,6 +631,16 @@ int devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
 					   char *name);
 int devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
 					 void *data, int len);
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv);
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -919,6 +958,25 @@ devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
 				     void *data, int len)
 {
 	return 0;
+
+static inline struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv)
+{
+	return NULL;
+}
+
+static inline void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+}
+
+static inline void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+	return NULL;
 }
 #endif
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 36018af086a0..a69d4679211f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4098,6 +4098,117 @@ devlink_health_buffer_snd(struct genl_info *info,
 	return err;
 }
 
+struct devlink_health_reporter {
+	struct list_head list;
+	struct devlink_health_buffer **objdump_buffers_array;
+	struct mutex objdump_lock; /* lock parallel read/write from objdump buffers */
+	struct devlink_health_buffer **diagnose_buffers_array;
+	struct mutex diagnose_lock; /* lock parallel read/write from diagnose buffers */
+	void *priv;
+	const struct devlink_health_reporter_ops *ops;
+	struct devlink *devlink;
+	u64 graceful_period;
+	bool auto_recover;
+	u8 health_state;
+};
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+	return reporter->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_priv);
+
+static struct devlink_health_reporter *
+devlink_health_reporter_find_by_name(struct devlink *devlink,
+				     const char *reporter_name)
+{
+	struct devlink_health_reporter *reporter;
+
+	list_for_each_entry(reporter, &devlink->reporter_list, list)
+		if (!strcmp(reporter->ops->name, reporter_name))
+			return reporter;
+	return NULL;
+}
+
+/**
+ *	devlink_health_reporter_create - create devlink health reporter
+ *
+ *	@devlink: devlink
+ *	@ops: ops
+ *	@graceful_period: to avoid recovery loops, in msecs
+ *	@auto_recover: auto recover when error occurs
+ *	@priv: priv
+ */
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv)
+{
+	struct devlink_health_reporter *reporter;
+
+	if (devlink_health_reporter_find_by_name(devlink, ops->name))
+		return ERR_PTR(-EEXIST);
+
+	if (WARN_ON(ops->objdump && !ops->objdump_size) ||
+	    WARN_ON(ops->diagnose && !ops->diagnose_size) ||
+	    WARN_ON(auto_recover && !ops->recover) ||
+	    WARN_ON(graceful_period && !ops->recover))
+		return ERR_PTR(-EINVAL);
+
+	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
+	if (!reporter)
+		return ERR_PTR(-ENOMEM);
+
+	if (ops->objdump) {
+		reporter->objdump_buffers_array =
+			devlink_health_buffers_create(ops->objdump_size);
+		if (!reporter->objdump_buffers_array)
+			return ERR_PTR(-ENOMEM);
+	}
+
+	if (ops->diagnose) {
+		reporter->diagnose_buffers_array =
+			devlink_health_buffers_create(ops->diagnose_size);
+		if (!reporter->diagnose_buffers_array) {
+			devlink_health_buffers_destroy(reporter->diagnose_buffers_array,
+						       DEVLINK_HEALTH_SIZE_TO_BUFFERS(ops->objdump_size));
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+
+	list_add_tail(&reporter->list, &devlink->reporter_list);
+	mutex_init(&reporter->objdump_lock);
+	mutex_init(&reporter->diagnose_lock);
+
+	reporter->priv = priv;
+	reporter->ops = ops;
+	reporter->devlink = devlink;
+	reporter->graceful_period = graceful_period;
+	reporter->auto_recover = auto_recover;
+
+	return reporter;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_create);
+
+/**
+ *	devlink_health_reporter_destroy - destroy devlink health reporter
+ *
+ *	@reporter: devlink health reporter to destroy
+ */
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+	list_del(&reporter->list);
+	devlink_health_buffers_destroy(reporter->diagnose_buffers_array,
+				       DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->objdump_size));
+	devlink_health_buffers_destroy(reporter->diagnose_buffers_array,
+				       DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->diagnose_size));
+	kfree(reporter);
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4383,6 +4494,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	INIT_LIST_HEAD(&devlink->resource_list);
 	INIT_LIST_HEAD(&devlink->param_list);
 	INIT_LIST_HEAD(&devlink->region_list);
+	INIT_LIST_HEAD(&devlink->reporter_list);
 	mutex_init(&devlink->lock);
 	return devlink;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 03/19] devlink: Add health report functionality
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 01/19] devlink: Add health buffer support Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 02/19] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
@ 2018-12-31 14:31 ` Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 04/19] devlink: Add health get command Eran Ben Elisha
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:31 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the objdump / recovery callbacks.

Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long as
  there is no other Objdump which is already stored)
* Auto recovery attempt is being done. depends on:
  - Auto Recovery configuration
  - Grace period vs. time since last recover

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/net/devlink.h          |  8 +++
 include/trace/events/devlink.h | 62 +++++++++++++++++++++++
 net/core/devlink.c             | 93 ++++++++++++++++++++++++++++++++++
 3 files changed, 163 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6884a2571348..40f988f4c940 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -641,6 +641,8 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
 
 void *
 devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+int devlink_health_report(struct devlink_health_reporter *reporter,
+			  const char *msg, void *priv_ctx);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -977,6 +979,12 @@ static inline void *
 devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
 {
 	return NULL;
+
+static inline int
+devlink_health_report(struct devlink_health_reporter *reporter,
+		      const char *msg, void *priv_ctx)
+{
+	return 0;
 }
 #endif
 
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 44acfbca1266..7e39d2fc7c75 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -46,6 +46,65 @@ TRACE_EVENT(devlink_hwmsg,
 		  (int) __entry->len, __get_dynamic_array(buf), __entry->len)
 );
 
+TRACE_EVENT(devlink_health_report,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 const char *msg),
+
+	TP_ARGS(devlink, reporter_name, msg),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, msg)
+		__string(msg, msg)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__assign_str(msg, msg);
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: %s",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __get_str(msg))
+);
+
+TRACE_EVENT(devlink_health_recover_aborted,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 bool health_state, u64 time_since_last_recover),
+
+	TP_ARGS(devlink, reporter_name, health_state, time_since_last_recover),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, reporter_name)
+		__field(bool, health_state)
+		__field(u64, time_since_last_recover)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__entry->health_state = health_state;
+		__entry->time_since_last_recover = time_since_last_recover;
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: health_state=%d time_since_last_recover = %llu recover aborted",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __entry->health_state,
+		  __entry->time_since_last_recover)
+);
+
 #endif /* _TRACE_DEVLINK_H */
 
 /* This part must be outside protection */
@@ -64,6 +123,9 @@ static inline void trace_devlink_hwmsg(const struct devlink *devlink,
 {
 }
 
+static inline void trace_devlink_health(const char *msg)
+{
+}
 #endif /* _TRACE_DEVLINK_H */
 
 #endif
diff --git a/net/core/devlink.c b/net/core/devlink.c
index a69d4679211f..5150d127da5f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4110,6 +4110,16 @@ struct devlink_health_reporter {
 	u64 graceful_period;
 	bool auto_recover;
 	u8 health_state;
+	u8 objdump_avail;
+	u64 objdump_ts;
+	u64 error_count;
+	u64 recovery_count;
+	u64 last_recovery_ts;
+};
+
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
 };
 
 void *
@@ -4209,6 +4219,89 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
 }
 EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
 
+static int
+devlink_health_reporter_recover(struct devlink_health_reporter *reporter,
+				void *priv_ctx)
+{
+	int err;
+
+	if (!reporter->ops->recover)
+		return -EOPNOTSUPP;
+
+	err = reporter->ops->recover(reporter, priv_ctx);
+	if (err)
+		return err;
+
+	reporter->recovery_count++;
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+	reporter->last_recovery_ts = jiffies;
+
+	return 0;
+}
+
+static int devlink_health_do_objdump(struct devlink_health_reporter *reporter,
+				     void *priv_ctx)
+{
+	int err;
+
+	if (!reporter->ops->objdump)
+		return 0;
+
+	if (reporter->objdump_avail)
+		return 0;
+
+	devlink_health_buffers_reset(reporter->objdump_buffers_array,
+				     DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->objdump_size));
+	err = reporter->ops->objdump(reporter, reporter->objdump_buffers_array,
+				     DEVLINK_HEALTH_BUFFER_SIZE,
+				     DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->objdump_size),
+				     priv_ctx);
+	if (!err) {
+		reporter->objdump_avail = true;
+		reporter->objdump_ts = jiffies;
+	}
+
+	return err;
+}
+
+int devlink_health_report(struct devlink_health_reporter *reporter,
+			  const char *msg, void *priv_ctx)
+{
+	struct devlink *devlink = reporter->devlink;
+	int err = 0;
+
+	/* write a log message of the current error */
+	WARN_ON(!msg);
+	trace_devlink_health_report(devlink, reporter->ops->name, msg);
+	reporter->error_count++;
+
+	/* abort if the previous error wasn't recovered */
+	if (reporter->auto_recover &&
+	    (reporter->health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
+	     jiffies - reporter->last_recovery_ts <
+	     msecs_to_jiffies(reporter->graceful_period))) {
+		trace_devlink_health_recover_aborted(devlink,
+						     reporter->ops->name,
+						     reporter->health_state,
+						     jiffies -
+						     reporter->last_recovery_ts);
+		return -ECANCELED;
+	}
+
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
+
+	mutex_lock(&reporter->objdump_lock);
+	/* store current objdump of current error, for later analysis */
+	devlink_health_do_objdump(reporter, priv_ctx);
+	mutex_unlock(&reporter->objdump_lock);
+
+	if (reporter->auto_recover)
+		err = devlink_health_reporter_recover(reporter, priv_ctx);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_health_report);
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 04/19] devlink: Add health get command
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (2 preceding siblings ...)
  2018-12-31 14:31 ` [PATCH RFC net-next 03/19] devlink: Add health report functionality Eran Ben Elisha
@ 2018-12-31 14:31 ` Eran Ben Elisha
  2018-12-31 14:31 ` [PATCH RFC net-next 05/19] devlink: Add health set command Eran Ben Elisha
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:31 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/uapi/linux/devlink.h |  12 +++
 net/core/devlink.c           | 151 +++++++++++++++++++++++++++++++++++
 2 files changed, 163 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cff0e0cb5ac2..5d35fabf25bf 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -89,6 +89,8 @@ enum devlink_command {
 	DEVLINK_CMD_REGION_DEL,
 	DEVLINK_CMD_REGION_READ,
 
+	DEVLINK_CMD_HEALTH_REPORTER_GET,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -293,6 +295,16 @@ enum devlink_attr {
 	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,	/* u8 */
 	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,	/* dynamic */
 
+	DEVLINK_ATTR_HEALTH_REPORTER,			/* nested */
+	DEVLINK_ATTR_HEALTH_REPORTER_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_REPORTER_STATE,		/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_ERR,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_AVAIL,	/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_TS,	/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 5150d127da5f..4e6624f889aa 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4302,6 +4302,148 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 }
 EXPORT_SYMBOL_GPL(devlink_health_report);
 
+static struct devlink_health_reporter *
+devlink_health_reporter_get_from_info(struct devlink *devlink,
+				      struct genl_info *info)
+{
+	char *reporter_name;
+
+	if (!info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME])
+		return NULL;
+
+	reporter_name =
+		nla_data(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]);
+	return devlink_health_reporter_find_by_name(devlink, reporter_name);
+}
+
+static int
+devlink_nl_health_reporter_fill(struct sk_buff *msg,
+				struct devlink *devlink,
+				struct devlink_health_reporter *reporter,
+				enum devlink_command cmd, u32 portid,
+				u32 seq, int flags)
+{
+	struct nlattr *reporter_attr;
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto genlmsg_cancel;
+
+	reporter_attr = nla_nest_start(msg, DEVLINK_ATTR_HEALTH_REPORTER);
+	if (!reporter_attr)
+		goto genlmsg_cancel;
+	if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+			   reporter->ops->name))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE,
+		       reporter->health_state))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR,
+			      reporter->error_count, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,
+			      reporter->recovery_count, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
+			      reporter->graceful_period,
+			      DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
+		       reporter->auto_recover))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_AVAIL,
+		       reporter->objdump_avail))
+		goto reporter_nest_cancel;
+	if (reporter->objdump_avail &&
+	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_TS,
+			      reporter->objdump_ts, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+
+	nla_nest_end(msg, reporter_attr);
+	genlmsg_end(msg, hdr);
+	return 0;
+
+reporter_nest_cancel:
+	nla_nest_end(msg, reporter_attr);
+genlmsg_cancel:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb,
+						   struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	struct sk_buff *msg;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_health_reporter_fill(msg, devlink, reporter,
+					      DEVLINK_CMD_HEALTH_REPORTER_GET,
+					      info->snd_portid, info->snd_seq,
+					      0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int
+devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink_health_reporter *reporter;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(reporter, &devlink->reporter_list,
+				    list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_health_reporter_fill(msg, devlink,
+							      reporter,
+							      DEVLINK_CMD_HEALTH_REPORTER_GET,
+							      NETLINK_CB(cb->skb).portid,
+							      cb->nlh->nlmsg_seq,
+							      NLM_F_MULTI);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4327,6 +4469,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
 	[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -4547,6 +4690,14 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_GET,
+		.doit = devlink_nl_cmd_health_reporter_get_doit,
+		.dumpit = devlink_nl_cmd_health_reporter_get_dumpit,
+		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+		/* can be retrieved by unprivileged users */
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 05/19] devlink: Add health set command
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (3 preceding siblings ...)
  2018-12-31 14:31 ` [PATCH RFC net-next 04/19] devlink: Add health get command Eran Ben Elisha
@ 2018-12-31 14:31 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 06/19] devlink: Add health recover command Eran Ben Elisha
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:31 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon receiving
		error for the reporter

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5d35fabf25bf..7b6253c3fc89 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -90,6 +90,7 @@ enum devlink_command {
 	DEVLINK_CMD_REGION_READ,
 
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_SET,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 4e6624f889aa..c0959242d891 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4444,6 +4444,33 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 	return msg->len;
 }
 
+static int
+devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->recover &&
+	    (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+		return -EINVAL;
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+		reporter->graceful_period =
+			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
+		reporter->auto_recover =
+			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
+
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4470,6 +4497,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -4698,6 +4727,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 		/* can be retrieved by unprivileged users */
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_SET,
+		.doit = devlink_nl_cmd_health_reporter_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 06/19] devlink: Add health recover command
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (4 preceding siblings ...)
  2018-12-31 14:31 ` [PATCH RFC net-next 05/19] devlink: Add health set command Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 07/19] devlink: Add health diagnose command Eran Ben Elisha
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 7b6253c3fc89..243819560715 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -91,6 +91,7 @@ enum devlink_command {
 
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
+	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index c0959242d891..31fff86390e3 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4471,6 +4471,19 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
 	return 0;
 }
 
+static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
+						       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	return devlink_health_reporter_recover(reporter, NULL);
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4734,6 +4747,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+		.doit = devlink_nl_cmd_health_reporter_recover_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 07/19] devlink: Add health diagnose command
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (5 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 06/19] devlink: Add health recover command Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 08/19] devlink: Add health objdump {get,clear} commands Eran Ben Elisha
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.

It is expected from driver's callback for diagnose command to fill it via the
buffer descriptors API. Devlink will parse it and convert isto netlink nla API
in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 51 ++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 243819560715..73faabe37488 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -92,6 +92,7 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
 	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 31fff86390e3..343ebbd38ba8 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4484,6 +4484,50 @@ static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
 	return devlink_health_reporter_recover(reporter, NULL);
 }
 
+static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
+							struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	u64 num_of_buffers;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->diagnose)
+		return -EOPNOTSUPP;
+
+	num_of_buffers =
+		DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->diagnose_size);
+
+	mutex_lock(&reporter->diagnose_lock);
+	devlink_health_buffers_reset(reporter->diagnose_buffers_array,
+				     num_of_buffers);
+
+	err = reporter->ops->diagnose(reporter,
+				      reporter->diagnose_buffers_array,
+				      DEVLINK_HEALTH_BUFFER_SIZE,
+				      num_of_buffers);
+	if (err)
+		goto out;
+
+	err = devlink_health_buffer_snd(info,
+					DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+					0, reporter->diagnose_buffers_array,
+					num_of_buffers);
+	if (err)
+		goto out;
+
+	mutex_unlock(&reporter->diagnose_lock);
+	return 0;
+
+out:
+	mutex_unlock(&reporter->diagnose_lock);
+	return err;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4754,6 +4798,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+		.doit = devlink_nl_cmd_health_reporter_diagnose_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 08/19] devlink: Add health objdump {get,clear} commands
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (6 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 07/19] devlink: Add health diagnose command Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 09/19] net/mlx5e: Add TX reporter support Eran Ben Elisha
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Add devlink health objdump commands, in order to run an objdump operation
over a specific reporter.

The supported operations are objdump_get in order to get last saved
objdump (if not exist, dump now) and objdump_clear to clear last saved
objdump.

It is expected from driver's callback for diagnose command to fill it via the
buffer descriptors API. Devlink will parse it and convert isto netlink nla API
in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/uapi/linux/devlink.h |  2 +
 net/core/devlink.c           | 73 ++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 73faabe37488..0fecb7618d18 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -93,6 +93,8 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
 	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
 	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+	DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 343ebbd38ba8..d45a284aa608 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4528,6 +4528,65 @@ static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
 	return err;
 }
 
+static void
+devlink_health_objdump_clear(struct devlink_health_reporter *reporter)
+{
+	reporter->objdump_avail = false;
+	reporter->objdump_ts = 0;
+	devlink_health_buffers_reset(reporter->objdump_buffers_array,
+				     DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->objdump_size));
+}
+
+static int devlink_nl_cmd_health_reporter_objdump_get_doit(struct sk_buff *skb,
+							   struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	u64 num_of_buffers;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->objdump)
+		return -EOPNOTSUPP;
+
+	num_of_buffers =
+		DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->objdump_size);
+
+	mutex_lock(&reporter->objdump_lock);
+	err = devlink_health_do_objdump(reporter, NULL);
+	if (err)
+		goto out;
+
+	err = devlink_health_buffer_snd(info,
+					DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET,
+					0, reporter->objdump_buffers_array,
+					num_of_buffers);
+
+out:
+	mutex_unlock(&reporter->objdump_lock);
+	return err;
+}
+
+static int
+devlink_nl_cmd_health_reporter_objdump_clear_doit(struct sk_buff *skb,
+						  struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	mutex_lock(&reporter->objdump_lock);
+	devlink_health_objdump_clear(reporter);
+	mutex_unlock(&reporter->objdump_lock);
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4805,6 +4864,20 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET,
+		.doit = devlink_nl_cmd_health_reporter_objdump_get_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR,
+		.doit = devlink_nl_cmd_health_reporter_objdump_clear_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 09/19] net/mlx5e: Add TX reporter support
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (7 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 08/19] devlink: Add health objdump {get,clear} commands Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 10/19] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of TX errors.
This patch declares the TX reporter operations and allocate it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full TX recover, an attempt to close and open the channels will be done.
If this one passed successfully, it will be considered as a successful recover.

The SQ recover from error CQE flow is not a new feature in the driver, this
patch re-organize the functions and adapt them for the devlink health API. For
this purpose, move code from en_main.c to a new file named reporter_tx.c.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  18 +-
 .../ethernet/mellanox/mlx5/core/en/reporter.h |  14 +
 .../mellanox/mlx5/core/en/reporter_tx.c       | 321 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 135 +-------
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |   2 +-
 6 files changed, 367 insertions(+), 125 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9de9abacf7f6..6bb2a860b15b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -22,7 +22,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 #
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
-		en_selftest.o en/port.o en/monitor_stats.o
+		en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8fa8fdd30b85..27e276c9bf84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -388,10 +388,7 @@ struct mlx5e_txqsq {
 	struct mlx5e_channel      *channel;
 	int                        txq_ix;
 	u32                        rate_limit;
-	struct mlx5e_txqsq_recover {
-		struct work_struct         recover_work;
-		u64                        last_recover;
-	} recover;
+	struct work_struct         recover_work;
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_dma_info {
@@ -682,6 +679,13 @@ struct mlx5e_rss_params {
 	u8	hfunc;
 };
 
+struct mlx5e_modify_sq_param {
+	int curr_state;
+	int next_state;
+	int rl_update;
+	int rl_index;
+};
+
 struct mlx5e_priv {
 	/* priv data path fields - start */
 	struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -737,6 +741,7 @@ struct mlx5e_priv {
 #ifdef CONFIG_MLX5_EN_TLS
 	struct mlx5e_tls          *tls;
 #endif
+	struct devlink_health_reporter *tx_reporter;
 };
 
 struct mlx5e_profile {
@@ -866,6 +871,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
 void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
 			       struct mlx5e_params *params);
 
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+		    struct mlx5e_modify_sq_param *p);
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_tx_disable_queue(struct netdev_queue *txq);
+
 static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
 {
 	return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
new file mode 100644
index 000000000000..74083e120ab3
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies. */
+
+#ifndef __MLX5E_EN_REPORTER_H
+#define __MLX5E_EN_REPORTER_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
new file mode 100644
index 000000000000..b88539528f12
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -0,0 +1,321 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies. */
+
+#include <net/devlink.h>
+#include "reporter.h"
+#include "lib/eq.h"
+
+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
+
+struct mlx5e_tx_err_ctx {
+	int (*recover)(struct mlx5e_txqsq *sq);
+	struct mlx5e_txqsq *sq;
+};
+
+static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+{
+	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
+
+	while (time_before(jiffies, exp_time)) {
+		if (sq->cc == sq->pc)
+			return 0;
+
+		msleep(20);
+	}
+
+	netdev_err(sq->channel->netdev,
+		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
+		   sq->sqn, sq->cc, sq->pc);
+
+	return -ETIMEDOUT;
+}
+
+static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
+{
+	WARN_ONCE(sq->cc != sq->pc,
+		  "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
+		  sq->sqn, sq->cc, sq->pc);
+	sq->cc = 0;
+	sq->dma_fifo_cc = 0;
+	sq->pc = 0;
+}
+
+static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+{
+	struct mlx5_core_dev *mdev = sq->channel->mdev;
+	struct net_device *dev = sq->channel->netdev;
+	struct mlx5e_modify_sq_param msp = {0};
+	int err;
+
+	msp.curr_state = curr_state;
+	msp.next_state = MLX5_SQC_STATE_RST;
+
+	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
+		return err;
+	}
+
+	memset(&msp, 0, sizeof(msp));
+	msp.curr_state = MLX5_SQC_STATE_RST;
+	msp.next_state = MLX5_SQC_STATE_RDY;
+
+	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
+		return err;
+	}
+
+	return 0;
+}
+
+static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_core_dev *mdev = sq->channel->mdev;
+	struct net_device *dev = sq->channel->netdev;
+	u8 state;
+	int err;
+
+	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+		return 0;
+
+	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
+	if (err) {
+		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
+			   sq->sqn, err);
+		return err;
+	}
+
+	if (state != MLX5_RQC_STATE_ERR) {
+		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
+		return -EINVAL;
+	}
+
+	mlx5e_tx_disable_queue(sq->txq);
+
+	err = mlx5e_wait_for_sq_flush(sq);
+	if (err)
+		return err;
+
+	/* At this point, no new packets will arrive from the stack as TXQ is
+	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
+	 * pending WQEs.  SQ can safely reset the SQ.
+	 */
+
+	err = mlx5e_sq_to_ready(sq, state);
+	if (err)
+		return err;
+
+	mlx5e_reset_txqsq_cc_pc(sq);
+	sq->stats->recover++;
+	mlx5e_activate_txqsq(sq);
+
+	return 0;
+}
+
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
+{
+	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+	struct mlx5e_tx_err_ctx err_ctx = {0};
+
+	err_ctx.sq       = sq;
+	err_ctx.recover  = mlx5e_tx_reporter_err_cqe_recover;
+	sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
+
+	devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+			      &err_ctx);
+}
+
+/* state lock cannot be grabbed within this function.
+ * It can cause a dead lock or a read-after-free.
+ */
+int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
+{
+	return err_ctx->recover(err_ctx->sq);
+}
+
+static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
+{
+	int err;
+
+	mutex_lock(&priv->state_lock);
+	mlx5e_close_locked(priv->netdev);
+	err = mlx5e_open_locked(priv->netdev);
+	mutex_unlock(&priv->state_lock);
+
+	return err;
+}
+
+static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
+				     void *context)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_tx_err_ctx *err_ctx = context;
+
+	return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
+			 mlx5e_tx_reporter_recover_all(priv);
+}
+
+static int
+mlx5e_tx_reporter_build_diagnose_output(struct devlink_health_buffer *buffer,
+					u32 sqn, u8 state, u8 stopped)
+{
+	int err, i;
+	int nest = 0;
+	char name[20];
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	sprintf(name, "SQ 0x%x", sqn);
+	err = devlink_health_buffer_put_object_name(buffer, name);
+	if (err)
+		goto buffer_error;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_object_name(buffer, "HW state");
+	if (err)
+		goto buffer_error;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_value_u8(buffer, state);
+	if (err)
+		goto buffer_error;
+
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE */
+	nest--;
+
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR */
+	nest--;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_object_name(buffer, "stopped");
+	if (err)
+		goto buffer_error;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_value_u8(buffer, stopped);
+	if (err)
+		goto buffer_error;
+
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_end(buffer);
+
+	return 0;
+
+buffer_error:
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_cancel(buffer);
+	return err;
+}
+
+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
+				      struct devlink_health_buffer **buffers_array,
+				      unsigned int buffer_size,
+				      unsigned int num_buffers)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	unsigned int buff = 0;
+	int i = 0, err = 0;
+
+	if (buffer_size < MLX5E_TX_REPORTER_PER_SQ_MAX_LEN)
+		return -ENOMEM;
+
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+		mutex_unlock(&priv->state_lock);
+		return 0;
+	}
+
+	while (i < priv->channels.num * priv->channels.params.num_tc) {
+		struct mlx5e_txqsq *sq = priv->txq2sq[i];
+		u8 state;
+
+		err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+		if (err)
+			break;
+
+		err = mlx5e_tx_reporter_build_diagnose_output(buffers_array[buff],
+							      sq->sqn, state,
+							      netif_xmit_stopped(sq->txq));
+		if (err) {
+			if (++buff == num_buffers)
+				break;
+		} else {
+			i++;
+		}
+	}
+
+	mutex_unlock(&priv->state_lock);
+	return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
+		.name = "TX",
+		.recover = mlx5e_tx_reporter_recover,
+		.diagnose_size = MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC *
+				 MLX5E_TX_REPORTER_PER_SQ_MAX_LEN,
+		.diagnose = mlx5e_tx_reporter_diagnose,
+		.objdump_size = 0,
+		.objdump = NULL,
+};
+
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct devlink *devlink = priv_to_devlink(mdev);
+
+	priv->tx_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
+					       MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+					       true, priv);
+	return PTR_ERR_OR_ZERO(priv->tx_reporter);
+}
+
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
+{
+	devlink_health_reporter_destroy(priv->tx_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8cfd2ec7c0a2..4c4507384fed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -51,6 +51,7 @@
 #include "en/xdp.h"
 #include "lib/eq.h"
 #include "en/monitor_stats.h"
+#include "en/reporter.h"
 
 struct mlx5e_rq_param {
 	u32			rqc[MLX5_ST_SZ_DW(rqc)];
@@ -1160,7 +1161,7 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
 	return 0;
 }
 
-static void mlx5e_sq_recover(struct work_struct *work);
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
 static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 			     int txq_ix,
 			     struct mlx5e_params *params,
@@ -1182,7 +1183,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
 	sq->stats     = &c->priv->channel_stats[c->ix].sq[tc];
-	INIT_WORK(&sq->recover.recover_work, mlx5e_sq_recover);
+	INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work);
 	if (MLX5_IPSEC_DEV(c->priv->mdev))
 		set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
 	if (mlx5_accel_is_tls_device(c->priv->mdev))
@@ -1270,15 +1271,8 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
 	return err;
 }
 
-struct mlx5e_modify_sq_param {
-	int curr_state;
-	int next_state;
-	bool rl_update;
-	int rl_index;
-};
-
-static int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
-			   struct mlx5e_modify_sq_param *p)
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+		    struct mlx5e_modify_sq_param *p)
 {
 	void *in;
 	void *sqc;
@@ -1376,17 +1370,7 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 	return err;
 }
 
-static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
-{
-	WARN_ONCE(sq->cc != sq->pc,
-		  "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
-		  sq->sqn, sq->cc, sq->pc);
-	sq->cc = 0;
-	sq->dma_fifo_cc = 0;
-	sq->pc = 0;
-}
-
-static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 {
 	sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
@@ -1395,7 +1379,7 @@ static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 	netif_tx_start_queue(sq->txq);
 }
 
-static inline void netif_tx_disable_queue(struct netdev_queue *txq)
+void mlx5e_tx_disable_queue(struct netdev_queue *txq)
 {
 	__netif_tx_lock_bh(txq);
 	netif_tx_stop_queue(txq);
@@ -1411,7 +1395,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
 	/* prevent netif_tx_wake_queue */
 	napi_synchronize(&c->napi);
 
-	netif_tx_disable_queue(sq->txq);
+	mlx5e_tx_disable_queue(sq->txq);
 
 	/* last doorbell out, godspeed .. */
 	if (mlx5e_wqc_has_room_for(wq, sq->cc, sq->pc, 1)) {
@@ -1431,6 +1415,7 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	struct mlx5_rate_limit rl = {0};
 
 	cancel_work_sync(&sq->dim.work);
+	cancel_work_sync(&sq->recover_work);
 	mlx5e_destroy_sq(mdev, sq->sqn);
 	if (sq->rate_limit) {
 		rl.rate = sq->rate_limit;
@@ -1440,105 +1425,15 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	mlx5e_free_txqsq(sq);
 }
 
-static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 {
-	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
-
-	while (time_before(jiffies, exp_time)) {
-		if (sq->cc == sq->pc)
-			return 0;
-
-		msleep(20);
-	}
-
-	netdev_err(sq->channel->netdev,
-		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
-		   sq->sqn, sq->cc, sq->pc);
-
-	return -ETIMEDOUT;
-}
-
-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
-{
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	struct mlx5e_modify_sq_param msp = {0};
-	int err;
-
-	msp.curr_state = curr_state;
-	msp.next_state = MLX5_SQC_STATE_RST;
-
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
-		return err;
-	}
-
-	memset(&msp, 0, sizeof(msp));
-	msp.curr_state = MLX5_SQC_STATE_RST;
-	msp.next_state = MLX5_SQC_STATE_RDY;
+	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
+					      recover_work);
 
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
-		return err;
-	}
-
-	return 0;
-}
-
-static void mlx5e_sq_recover(struct work_struct *work)
-{
-	struct mlx5e_txqsq_recover *recover =
-		container_of(work, struct mlx5e_txqsq_recover,
-			     recover_work);
-	struct mlx5e_txqsq *sq = container_of(recover, struct mlx5e_txqsq,
-					      recover);
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	u8 state;
-	int err;
-
-	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
-	if (err) {
-		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
-			   sq->sqn, err);
-		return;
-	}
-
-	if (state != MLX5_RQC_STATE_ERR) {
-		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
-		return;
-	}
-
-	netif_tx_disable_queue(sq->txq);
-
-	if (mlx5e_wait_for_sq_flush(sq))
-		return;
-
-	/* If the interval between two consecutive recovers per SQ is too
-	 * short, don't recover to avoid infinite loop of ERR_CQE -> recover.
-	 * If we reached this state, there is probably a bug that needs to be
-	 * fixed. let's keep the queue close and let tx timeout cleanup.
-	 */
-	if (jiffies_to_msecs(jiffies - recover->last_recover) <
-	    MLX5E_SQ_RECOVER_MIN_INTERVAL) {
-		netdev_err(dev, "Recover SQ 0x%x canceled, too many error CQEs\n",
-			   sq->sqn);
-		return;
-	}
-
-	/* At this point, no new packets will arrive from the stack as TXQ is
-	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
-	 * pending WQEs.  SQ can safely reset the SQ.
-	 */
-	if (mlx5e_sq_to_ready(sq, state))
+	if (!sq->channel->priv->tx_reporter)
 		return;
 
-	mlx5e_reset_txqsq_cc_pc(sq);
-	sq->stats->recover++;
-	recover->last_recover = jiffies;
-	mlx5e_activate_txqsq(sq);
+	mlx5e_tx_reporter_err_cqe(sq);
 }
 
 static int mlx5e_open_icosq(struct mlx5e_channel *c,
@@ -3207,6 +3102,7 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 {
 	int tc;
 
+	mlx5e_tx_reporter_destroy(priv);
 	for (tc = 0; tc < priv->profile->max_tc; tc++)
 		mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
@@ -4908,6 +4804,7 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	mlx5e_dcbnl_initialize(priv);
 #endif
+	mlx5e_tx_reporter_create(priv);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 598ad7e4d5c9..a8e052a5ce36 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -514,7 +514,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 				mlx5e_dump_error_cqe(sq,
 						     (struct mlx5_err_cqe *)cqe);
 				queue_work(cq->channel->priv->wq,
-					   &sq->recover.recover_work);
+					   &sq->recover_work);
 			}
 			stats->cqe_err++;
 		}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 10/19] net/mlx5e: Add TX timeout support for mlx5e TX reporter
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (8 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 09/19] net/mlx5e: Add TX reporter support Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 11/19] net/mlx5: Move all devlink related functions calls to devlink.c Eran Ben Elisha
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

With this patch, ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it to the
devlink health. (The watchdog detects TX timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from TX timeout in case of lost interrupt was added to the
TX reporter recover method. The TX timeout recover from lost interrupt is not a
new feature in the driver, this patch re-organize the functionality and move it
to the TX reporter recovery flow.

TX timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0, SQ: 0xd8a, CQ:
0x406, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 13972000

$devlink health diagnose pci/0000:00:09 reporter TX
SQ 0xd8a: HW state: 1, stopped: 1
SQ 0xe44: HW state: 1, stopped: 0
SQ 0xeb4: HW state: 1, stopped: 0
SQ 0xf1f: HW state: 1, stopped: 0
SQ 0xf80: HW state: 1, stopped: 0
SQ 0xfe5: HW state: 1, stopped: 0

$devlink health recover pci/0000:00:09 reporter TX
$devlink health show
pci/0000:00:09.0:
  name TX state healthy #err 1 #recover 1 last_objdump_ts N/A objdump_available false
    attributes:
        grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/reporter.h |  1 +
 .../mellanox/mlx5/core/en/reporter_tx.c       | 35 +++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 51 +++----------------
 3 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
index 74083e120ab3..2335c5b48820 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -10,5 +10,6 @@
 int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
 void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
 void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+void mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index b88539528f12..5cbb0a611c5b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -126,6 +126,41 @@ void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
 			      &err_ctx);
 }
 
+static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
+	u32 eqe_count;
+
+	netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+
+	eqe_count = mlx5_eq_poll_irq_disabled(eq);
+	if (!eqe_count) {
+		clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
+		return 1;
+	}
+
+	netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
+		   eqe_count, eq->core.eqn);
+	sq->channel->stats->eq_rearm++;
+	return 0;
+}
+
+void mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq)
+{
+	struct mlx5e_tx_err_ctx err_ctx;
+	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+
+	err_ctx.sq       = sq;
+	err_ctx.recover  = mlx5e_tx_reporter_timeout_recover;
+	sprintf(err_str,
+		"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
+		sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
+		jiffies_to_usecs(jiffies - sq->txq->trans_start));
+	devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+			      &err_ctx);
+}
+
 /* state lock cannot be grabbed within this function.
  * It can cause a dead lock or a read-after-free.
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4c4507384fed..dee0c8f3d4e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4074,31 +4074,14 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb,
 	return features;
 }
 
-static bool mlx5e_tx_timeout_eq_recover(struct net_device *dev,
-					struct mlx5e_txqsq *sq)
-{
-	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
-	u32 eqe_count;
-
-	netdev_err(dev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
-		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
-
-	eqe_count = mlx5_eq_poll_irq_disabled(eq);
-	if (!eqe_count)
-		return false;
-
-	netdev_err(dev, "Recover %d eqes on EQ 0x%x\n", eqe_count, eq->core.eqn);
-	sq->channel->stats->eq_rearm++;
-	return true;
-}
-
 static void mlx5e_tx_timeout_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 					       tx_timeout_work);
-	struct net_device *dev = priv->netdev;
-	bool reopen_channels = false;
-	int i, err;
+	int i;
+
+	if (!priv->tx_reporter)
+		return;
 
 	rtnl_lock();
 	mutex_lock(&priv->state_lock);
@@ -4107,36 +4090,16 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 		goto unlock;
 
 	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
-		struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
+		struct netdev_queue *dev_queue =
+			netdev_get_tx_queue(priv->netdev, i);
 		struct mlx5e_txqsq *sq = priv->txq2sq[i];
 
 		if (!netif_xmit_stopped(dev_queue))
 			continue;
 
-		netdev_err(dev,
-			   "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
-			   i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
-			   jiffies_to_usecs(jiffies - dev_queue->trans_start));
-
-		/* If we recover a lost interrupt, most likely TX timeout will
-		 * be resolved, skip reopening channels
-		 */
-		if (!mlx5e_tx_timeout_eq_recover(dev, sq)) {
-			clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
-			reopen_channels = true;
-		}
+		mlx5e_tx_reporter_timeout(sq);
 	}
 
-	if (!reopen_channels)
-		goto unlock;
-
-	mlx5e_close_locked(dev);
-	err = mlx5e_open_locked(dev);
-	if (err)
-		netdev_err(priv->netdev,
-			   "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
-			   err);
-
 unlock:
 	mutex_unlock(&priv->state_lock);
 	rtnl_unlock();
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 11/19] net/mlx5: Move all devlink related functions calls to devlink.c
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (9 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 10/19] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 12/19] net/mlx5: Refactor print health info Eran Ben Elisha
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

Centralize all devlink related callbacks in one file.
In the downstream patch, some more functionality will be added, this
patch is preparing the driver infrastructure for it.

Currently, move devlink un/register functions calls into this file.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 14 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  5 +++--
 4 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 6bb2a860b15b..2049f332c7e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
-		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o
+		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
new file mode 100644
index 000000000000..2daf686bcc98
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#include <devlink.h>
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
+{
+	return devlink_register(devlink, dev);
+}
+
+void mlx5_devlink_unregister(struct devlink *devlink)
+{
+	devlink_unregister(devlink);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
new file mode 100644
index 000000000000..455bfa4e89c0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#ifndef __MLX5_DEVLINK_H__
+#define __MLX5_DEVLINK_H__
+
+#include <net/devlink.h>
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
+void mlx5_devlink_unregister(struct devlink *devlink);
+
+#endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 77896c11f6f3..bd50d4adc6bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -56,6 +56,7 @@
 #include "fs_core.h"
 #include "lib/mpfs.h"
 #include "eswitch.h"
+#include "devlink.h"
 #include "lib/mlx5.h"
 #include "fpga/core.h"
 #include "fpga/ipsec.h"
@@ -1204,7 +1205,7 @@ static int init_one(struct pci_dev *pdev,
 
 	request_module_nowait(MLX5_IB_MOD);
 
-	err = devlink_register(devlink, &pdev->dev);
+	err = mlx5_devlink_register(devlink, &pdev->dev);
 	if (err)
 		goto clean_load;
 
@@ -1231,7 +1232,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 
-	devlink_unregister(devlink);
+	mlx5_devlink_unregister(devlink);
 	mlx5_unregister_device(dev);
 
 	if (mlx5_unload_one(dev, priv, true)) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 12/19] net/mlx5: Refactor print health info
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (10 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 11/19] net/mlx5: Move all devlink related functions calls to devlink.c Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 13/19] net/mlx5: Create FW devlink_health_reporter Eran Ben Elisha
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Refactor print health info code, split to two functions:
 1. mlx5_get_health_info() - writes the health info into a buffer.
 2. mlx5_print_health_info() - prints the health info to kernel log.
This refactoring is done to enable using the health info data by devlink
health reporter diagnose() in the downstream patch.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/health.c  | 44 +++++++++++++------
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  7 +++
 include/linux/mlx5/device.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 |  4 +-
 4 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 196c07383082..2fa78edde1fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -215,7 +215,8 @@ static const char *hsynd_str(u8 synd)
 	}
 }
 
-static void print_health_info(struct mlx5_core_dev *dev)
+void mlx5_get_health_info(struct mlx5_core_dev *dev, u8 *synd,
+			  char (*lines_buf)[HEALTH_INFO_MAX_LINE])
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 	struct health_buffer __iomem *h = health->health;
@@ -223,23 +224,40 @@ static void print_health_info(struct mlx5_core_dev *dev)
 	u32 fw;
 	int i;
 
+	*synd = ioread8(&h->synd);
 	/* If the syndrome is 0, the device is OK and no need to print buffer */
-	if (!ioread8(&h->synd))
+	if (!synd)
 		return;
 
 	for (i = 0; i < ARRAY_SIZE(h->assert_var); i++)
-		dev_err(&dev->pdev->dev, "assert_var[%d] 0x%08x\n", i, ioread32be(h->assert_var + i));
+		sprintf(lines_buf[i], "assert_var[%d] 0x%08x\n", i, ioread32be(h->assert_var + i));
 
-	dev_err(&dev->pdev->dev, "assert_exit_ptr 0x%08x\n", ioread32be(&h->assert_exit_ptr));
-	dev_err(&dev->pdev->dev, "assert_callra 0x%08x\n", ioread32be(&h->assert_callra));
+	sprintf(lines_buf[i++], "assert_exit_ptr 0x%08x\n", ioread32be(&h->assert_exit_ptr));
+	sprintf(lines_buf[i++], "assert_callra 0x%08x\n", ioread32be(&h->assert_callra));
 	sprintf(fw_str, "%d.%d.%d", fw_rev_maj(dev), fw_rev_min(dev), fw_rev_sub(dev));
-	dev_err(&dev->pdev->dev, "fw_ver %s\n", fw_str);
-	dev_err(&dev->pdev->dev, "hw_id 0x%08x\n", ioread32be(&h->hw_id));
-	dev_err(&dev->pdev->dev, "irisc_index %d\n", ioread8(&h->irisc_index));
-	dev_err(&dev->pdev->dev, "synd 0x%x: %s\n", ioread8(&h->synd), hsynd_str(ioread8(&h->synd)));
-	dev_err(&dev->pdev->dev, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd));
+	sprintf(lines_buf[i++], "fw_ver %s\n", fw_str);
+	sprintf(lines_buf[i++], "hw_id 0x%08x\n", ioread32be(&h->hw_id));
+	sprintf(lines_buf[i++], "irisc_index %d\n", ioread8(&h->irisc_index));
+	sprintf(lines_buf[i++], "synd 0x%x: %s\n", ioread8(&h->synd), hsynd_str(ioread8(&h->synd)));
+	sprintf(lines_buf[i++], "ext_synd 0x%04x\n", ioread16be(&h->ext_synd));
 	fw = ioread32be(&h->fw_ver);
-	dev_err(&dev->pdev->dev, "raw fw_ver 0x%08x\n", fw);
+	sprintf(lines_buf[i], "raw fw_ver 0x%08x\n", fw);
+}
+
+static void mlx5_print_health_info(struct mlx5_core_dev *dev)
+{
+	char lines_buf[HEALTH_INFO_LINES][HEALTH_INFO_MAX_LINE] = {};
+	u8 synd;
+	int i;
+
+	mlx5_get_health_info(dev, &synd, lines_buf);
+
+	if (!synd)
+		return;
+
+	for (i = 0; i < HEALTH_INFO_LINES; i++)
+		dev_err(&dev->pdev->dev, lines_buf[i]);
+
 }
 
 static unsigned long get_next_poll_jiffies(void)
@@ -285,12 +303,12 @@ static void poll_health(struct timer_list *t)
 	health->prev = count;
 	if (health->miss_counter == MAX_MISSES) {
 		dev_err(&dev->pdev->dev, "device's health compromised - reached miss count\n");
-		print_health_info(dev);
+		mlx5_print_health_info(dev);
 	}
 
 	if (in_fatal(dev) && !health->sick) {
 		health->sick = true;
-		print_health_info(dev);
+		mlx5_print_health_info(dev);
 		mlx5_trigger_health_work(dev);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index c68dcea5985b..ea53195c89a6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -199,4 +199,11 @@ enum {
 
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
+
+#define HEALTH_INFO_MAX_LINE 80
+#define HEALTH_INFO_LINES (MLX5_FLD_SZ_DW(health_buffer, assert_var) + 8)
+#define HEALTH_INFO_MAX_BUFF (HEALTH_INFO_MAX_LINE * HEALTH_INFO_LINES)
+void mlx5_get_health_info(struct mlx5_core_dev *dev, u8 *synd,
+			  char (*lines_buf)[HEALTH_INFO_MAX_LINE]);
+
 #endif /* __MLX5_CORE_H__ */
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 8c4a820bd4c1..bb68ab6dffe0 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -61,6 +61,7 @@
 #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
 
 #define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
+#define MLX5_FLD_SZ_DW(typ, fld) (__mlx5_bit_sz(typ, fld) / 32)
 #define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
 #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
 #define MLX5_ST_SZ_QW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 64)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 821b751485fb..505a5f6cecdf 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -3360,7 +3360,9 @@ union mlx5_ifc_event_auto_bits {
 };
 
 struct mlx5_ifc_health_buffer_bits {
-	u8         reserved_at_0[0x100];
+	u8         assert_var[0xa0];
+
+	u8         reserved_at_a0[0x60];
 
 	u8         assert_existptr[0x20];
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 13/19] net/mlx5: Create FW devlink_health_reporter
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (11 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 12/19] net/mlx5: Refactor print health info Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 14/19] net/mlx5: Add core dump register access functions Eran Ben Elisha
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Create mlx5_devlink_health_reporter for FW reporter. The FW reporter
implements devlink_health_reporter diagnose callback.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 115 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |   8 ++
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 include/linux/mlx5/driver.h                   |   1 +
 4 files changed, 129 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 2daf686bcc98..4ec5d092a332 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -2,6 +2,121 @@
 /* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
 
 #include <devlink.h>
+#include "mlx5_core.h"
+
+static int
+mlx5_devlink_health_buffer_fill_syndrom(struct devlink_health_buffer *dh_buffer,
+					u8 syndrom)
+{
+	int err;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_object_name(dh_buffer, "Syndrom");
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_value_u8(dh_buffer, syndrom);
+	if (err)
+		return err;
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+
+	return 0;
+}
+
+static int
+mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+			  struct devlink_health_buffer **buffers_array,
+			  unsigned int buff_size, unsigned int num_buffers)
+{
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	char lines_buf[HEALTH_INFO_LINES][HEALTH_INFO_MAX_LINE] = {};
+	struct devlink_health_buffer *buffer;
+	u8 synd;
+	int err;
+	int i;
+
+	if (!buffers_array || buff_size < HEALTH_INFO_MAX_BUFF ||
+	    num_buffers < 1 || !buffers_array[0])
+		return -EINVAL;
+
+	buffer = buffers_array[0];
+	mlx5_get_health_info(dev, &synd, lines_buf);
+	mlx5_devlink_health_buffer_fill_syndrom(buffer, synd);
+
+	if (!synd)
+		return 0;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_object_name(buffer, "diagnose data");
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY);
+	if (err)
+		return err;
+
+	for (i = 0; i < HEALTH_INFO_LINES; i++) {
+		err = devlink_health_buffer_nest_start(buffer, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+		if (err)
+			return err;
+		err = devlink_health_buffer_put_value_string(buffer, lines_buf[i]);
+		if (err)
+			return err;
+		devlink_health_buffer_nest_end(buffer);
+	}
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT_VALUE_ARRAY */
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT_VALUE */
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT_PAIR */
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT */
+
+	return 0;
+}
+
+static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
+		.name = "FW",
+		.diagnose_size = HEALTH_INFO_MAX_BUFF,
+		.diagnose = mlx5_fw_reporter_diagnose,
+};
+
+int mlx5_fw_reporter_create(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+
+	dev->fw_reporter = devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
+							  0, false, dev);
+	return PTR_ERR_OR_ZERO(dev->fw_reporter);
+}
+
+void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev)
+{
+	if (!dev->fw_reporter)
+		return;
+
+	devlink_health_reporter_destroy(dev->fw_reporter);
+}
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 455bfa4e89c0..34f6bfed1cfb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -5,8 +5,16 @@
 #define __MLX5_DEVLINK_H__
 
 #include <net/devlink.h>
+#include "mlx5_core.h"
+
+struct mlx5_fw_reporter_ctx {
+	u8 err_synd;
+	int miss_counter;
+};
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
+int mlx5_fw_reporter_create(struct mlx5_core_dev *dev);
+void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index bd50d4adc6bf..ca5f4c661f6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -981,6 +981,10 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_fw_tracer;
 	}
 
+	err = mlx5_fw_reporter_create(dev);
+	if (err)
+		dev_warn(&pdev->dev, "Failed to create FW reporter\n");
+
 	err = mlx5_fpga_device_start(dev);
 	if (err) {
 		dev_err(&pdev->dev, "fpga device start failed %d\n", err);
@@ -1113,6 +1117,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_accel_ipsec_cleanup(dev);
 	mlx5_accel_tls_cleanup(dev);
 	mlx5_fpga_device_stop(dev);
+	mlx5_fw_reporter_destroy(dev);
 	mlx5_fw_tracer_cleanup(dev->tracer);
 	mlx5_eq_table_destroy(dev);
 	mlx5_pagealloc_stop(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 4d16ba04790e..98c17e74fd4d 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -677,6 +677,7 @@ struct mlx5_core_dev {
 	struct mlx5_ib_clock_info  *clock_info;
 	struct page             *clock_info_page;
 	struct mlx5_fw_tracer   *tracer;
+	struct devlink_health_reporter *fw_reporter;
 };
 
 struct mlx5_db {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 14/19] net/mlx5: Add core dump register access functions
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (12 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 13/19] net/mlx5: Create FW devlink_health_reporter Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 15/19] net/mlx5: Add support for FW reporter objdump Eran Ben Elisha
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Add access functions to core dump register to enable trigger FW core
dump.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../mellanox/mlx5/core/diag/fw_tracer.c       | 31 +++++++++++++++++++
 include/linux/mlx5/driver.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 17 +++++++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index 6999f4486e9e..d0f8449019af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -786,6 +786,37 @@ static void mlx5_fw_tracer_ownership_change(struct work_struct *work)
 	mlx5_fw_tracer_start(tracer);
 }
 
+static int mlx5_fw_tracer_set_core_dump_reg(struct mlx5_core_dev *dev, u32 *in, int size_in)
+{
+	u32 out[MLX5_ST_SZ_DW(core_dump_reg)] = {0};
+
+	if (!MLX5_CAP_DEBUG(dev, core_dump_general) &&
+	    !MLX5_CAP_DEBUG(dev, core_dump_qp))
+		return -EOPNOTSUPP;
+
+	return mlx5_core_access_reg(dev, in, size_in, out, sizeof(out),
+				    MLX5_REG_CORE_DUMP, 0, 1);
+}
+
+int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_tracer *tracer = dev->tracer;
+	u32 in[MLX5_ST_SZ_DW(core_dump_reg)] = {0};
+	int err;
+
+	if (!MLX5_CAP_DEBUG(dev, core_dump_general) || !tracer)
+		return -EOPNOTSUPP;
+
+	MLX5_SET(core_dump_reg, in, core_dump_type, 0x0);
+
+	err =  mlx5_fw_tracer_set_core_dump_reg(dev, in, sizeof(in));
+	if (err)
+		return err;
+	queue_work(tracer->work_queue, &tracer->handle_traces_work);
+	flush_workqueue(tracer->work_queue);
+	return 0;
+}
+
 /* Create software resources (Buffers, etc ..) */
 struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 98c17e74fd4d..bedc9bc08963 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -108,6 +108,7 @@ enum {
 	MLX5_REG_FPGA_CAP	 = 0x4022,
 	MLX5_REG_FPGA_CTRL	 = 0x4023,
 	MLX5_REG_FPGA_ACCESS_REG = 0x4024,
+	MLX5_REG_CORE_DUMP	 = 0x402e,
 	MLX5_REG_PCAP		 = 0x5001,
 	MLX5_REG_PMTU		 = 0x5003,
 	MLX5_REG_PTYS		 = 0x5004,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 505a5f6cecdf..d1f7d4b2e3eb 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -671,7 +671,9 @@ struct mlx5_ifc_qos_cap_bits {
 };
 
 struct mlx5_ifc_debug_cap_bits {
-	u8         reserved_at_0[0x20];
+	u8         core_dump_general[0x1];
+	u8         core_dump_qp[0x1];
+	u8         reserved_at_2[0x1e];
 
 	u8         reserved_at_20[0x2];
 	u8         stall_detect[0x1];
@@ -2453,6 +2455,7 @@ union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_e_switch_cap_bits e_switch_cap;
 	struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_debug_cap_bits debug_cap;
 	struct mlx5_ifc_fpga_cap_bits fpga_cap;
 	u8         reserved_at_0[0x8000];
 };
@@ -8410,6 +8413,18 @@ struct mlx5_ifc_qcam_reg_bits {
 	u8         reserved_at_1c0[0x80];
 };
 
+struct mlx5_ifc_core_dump_reg_bits {
+	u8         reserved_at_0[0x18];
+	u8         core_dump_type[0x8];
+
+	u8         reserved_at_20[0x30];
+	u8         vhca_id[0x10];
+
+	u8         reserved_at_60[0x8];
+	u8         qpn[0x18];
+	u8         reserved_at_80[0x180];
+};
+
 struct mlx5_ifc_pcap_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         local_port[0x8];
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 15/19] net/mlx5: Add support for FW reporter objdump
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (13 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 14/19] net/mlx5: Add core dump register access functions Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 16/19] net/mlx5: Report devlink health on FW issues Eran Ben Elisha
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Add support of objdump callback for mlx5 FW reporter.
Once we trigger FW dump, the FW will write the core dump to its raw data
buffer. The tracer translates the raw data to traces and save it to a
buffer. Once dump is done, the saved traces data is filled as objects
into the objdump buffer.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 112 ++++++++++++++++++
 .../mellanox/mlx5/core/diag/fw_tracer.c       |  44 +++++++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |  13 ++
 3 files changed, 169 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 4ec5d092a332..07bc473a8ebb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -3,6 +3,7 @@
 
 #include <devlink.h>
 #include "mlx5_core.h"
+#include "diag/fw_tracer.h"
 
 static int
 mlx5_devlink_health_buffer_fill_syndrom(struct devlink_health_buffer *dh_buffer,
@@ -35,6 +36,115 @@ mlx5_devlink_health_buffer_fill_syndrom(struct devlink_health_buffer *dh_buffer,
 	return 0;
 }
 
+int mlx5_devlink_health_buffer_fill_trace(struct devlink_health_buffer *dh_buffer,
+					  char *trace)
+{
+	int nest = 0;
+	int err = 0;
+	int i;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		goto nest_cancel;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto nest_cancel;
+	nest++;
+
+	err = devlink_health_buffer_put_object_name(dh_buffer, "trace");
+	if (err)
+		goto nest_cancel;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto nest_cancel;
+	nest++;
+
+	err = devlink_health_buffer_put_value_string(dh_buffer, trace);
+	if (err)
+		goto nest_cancel;
+
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_end(dh_buffer);
+	return 0;
+nest_cancel:
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_cancel(dh_buffer);
+
+	return err;
+}
+
+int mlx5_fw_tracer_get_saved_traces_objects(struct mlx5_fw_tracer *tracer,
+					    struct devlink_health_buffer **buffers_array,
+					    unsigned int num_buffers)
+{
+	u32 saved_traces_index = tracer->sbuff.saved_traces_index;
+	char *saved_traces = tracer->sbuff.traces_buff;
+	u32 index, start_index, end_index;
+	u32 dh_buffer_index = 0;
+	int err = 0;
+
+	if (!saved_traces[0])
+		return -ENOMSG;
+
+	if (saved_traces[saved_traces_index * TRACE_STR_LINE])
+		start_index = saved_traces_index;
+	else
+		start_index = 0;
+	end_index = (saved_traces_index - 1) & (SAVED_TRACES_NUM - 1);
+
+	index = start_index;
+	while (index <= end_index) {
+		err = mlx5_devlink_health_buffer_fill_trace(buffers_array[dh_buffer_index],
+							    saved_traces + index * TRACE_STR_LINE);
+		if (err) {
+			dh_buffer_index++;
+			if (dh_buffer_index == num_buffers)
+				break;
+		} else {
+			index++;
+		}
+	}
+
+	return err;
+}
+
+static int
+mlx5_fw_reporter_objdump(struct devlink_health_reporter *reporter,
+			 struct devlink_health_buffer **buffers_array,
+			 unsigned int buff_size, unsigned int num_buffers,
+			 void *priv_ctx)
+{
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	struct devlink_health_buffer *buffer;
+	int err;
+
+	if (!buffers_array || buff_size < TRACE_STR_LINE || num_buffers < 1)
+		return -EINVAL;
+
+	err = mlx5_fw_tracer_trigger_core_dump_general(dev);
+	if (err)
+		return err;
+
+	buffer = buffers_array[0];
+	if (priv_ctx) {
+		struct mlx5_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
+
+		err = mlx5_devlink_health_buffer_fill_syndrom(buffer,
+							      fw_reporter_ctx->err_synd);
+		if (err)
+			return err;
+	}
+
+	return mlx5_fw_tracer_get_saved_traces_objects(dev->tracer, buffers_array,
+						       num_buffers);
+}
+
 static int
 mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
 			  struct devlink_health_buffer **buffers_array,
@@ -97,7 +207,9 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
 
 static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.name = "FW",
+		.objdump_size = SAVED_TRACES_BUFFER_SIZE_BYTE,
 		.diagnose_size = HEALTH_INFO_MAX_BUFF,
+		.objdump = mlx5_fw_reporter_objdump,
 		.diagnose = mlx5_fw_reporter_diagnose,
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index d0f8449019af..b704df545d01 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -243,6 +243,24 @@ static int mlx5_fw_tracer_allocate_strings_db(struct mlx5_fw_tracer *tracer)
 	return -ENOMEM;
 }
 
+static int mlx5_fw_tracer_allocate_saved_traces_buff(struct mlx5_fw_tracer *tracer)
+{
+	int traces_buff_size = SAVED_TRACES_BUFFER_SIZE_BYTE;
+
+	tracer->sbuff.traces_buff = kzalloc(traces_buff_size, GFP_KERNEL);
+	if (!tracer->sbuff.traces_buff)
+		return -ENOMEM;
+	tracer->sbuff.saved_traces_index = 0;
+
+	return 0;
+}
+
+static void mlx5_fw_tracer_free_saved_traces_buff(struct mlx5_fw_tracer *tracer)
+{
+	kfree(tracer->sbuff.traces_buff);
+	tracer->sbuff.traces_buff = NULL;
+}
+
 static void mlx5_tracer_read_strings_db(struct work_struct *work)
 {
 	struct mlx5_fw_tracer *tracer = container_of(work, struct mlx5_fw_tracer,
@@ -522,6 +540,20 @@ static void mlx5_fw_tracer_clean_ready_list(struct mlx5_fw_tracer *tracer)
 		list_del(&str_frmt->list);
 }
 
+static void mlx5_fw_tracer_save_trace(struct mlx5_fw_tracer *tracer, u64 timestamp,
+				      bool lost, u8 event_id, char *msg)
+{
+	char *saved_traces = tracer->sbuff.traces_buff;
+	u32 offset;
+
+	offset = tracer->sbuff.saved_traces_index * TRACE_STR_LINE;
+	snprintf(saved_traces + offset, TRACE_STR_LINE, "%s [0x%llx] %d [0x%x] %s",
+		 dev_name(&tracer->dev->pdev->dev), timestamp, lost, event_id, msg);
+
+	tracer->sbuff.saved_traces_index =
+		(tracer->sbuff.saved_traces_index + 1) & (SAVED_TRACES_NUM - 1);
+}
+
 static void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt,
 				    struct mlx5_core_dev *dev,
 				    u64 trace_timestamp)
@@ -540,6 +572,9 @@ static void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt,
 	trace_mlx5_fw(dev->tracer, trace_timestamp, str_frmt->lost,
 		      str_frmt->event_id, tmp);
 
+	mlx5_fw_tracer_save_trace(dev->tracer, trace_timestamp,
+				  str_frmt->lost, str_frmt->event_id, tmp);
+
 	/* remove it from hash */
 	mlx5_tracer_clean_message(str_frmt);
 }
@@ -864,10 +899,18 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 		goto free_log_buf;
 	}
 
+	err = mlx5_fw_tracer_allocate_saved_traces_buff(tracer);
+	if (err) {
+		mlx5_core_warn(dev, "FWTracer: Create saved traces buffer failed %d\n", err);
+		goto free_strings_db;
+	}
+
 	mlx5_core_dbg(dev, "FWTracer: Tracer created\n");
 
 	return tracer;
 
+free_strings_db:
+	mlx5_fw_tracer_free_strings_db(tracer);
 free_log_buf:
 	mlx5_fw_tracer_destroy_log_buf(tracer);
 destroy_workqueue:
@@ -948,6 +991,7 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer)
 	cancel_work_sync(&tracer->read_fw_strings_work);
 	mlx5_fw_tracer_clean_ready_list(tracer);
 	mlx5_fw_tracer_clean_print_hash(tracer);
+	mlx5_fw_tracer_free_saved_traces_buff(tracer);
 	mlx5_fw_tracer_free_strings_db(tracer);
 	mlx5_fw_tracer_destroy_log_buf(tracer);
 	flush_workqueue(tracer->work_queue);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
index a8b8747f2b61..ad817932cc8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
@@ -46,6 +46,10 @@
 #define TRACER_BLOCK_SIZE_BYTE 256
 #define TRACES_PER_BLOCK 32
 
+#define TRACE_STR_LINE 256
+#define SAVED_TRACES_NUM 1024
+#define SAVED_TRACES_BUFFER_SIZE_BYTE (SAVED_TRACES_NUM * TRACE_STR_LINE)
+
 #define TRACER_MAX_PARAMS 7
 #define MESSAGE_HASH_BITS 6
 #define MESSAGE_HASH_SIZE BIT(MESSAGE_HASH_BITS)
@@ -83,6 +87,12 @@ struct mlx5_fw_tracer {
 		u32 consumer_index;
 	} buff;
 
+	/* Saved Tarces Buffer */
+	struct {
+		void *traces_buff;
+		u32 saved_traces_index;
+	} sbuff;
+
 	u64 last_timestamp;
 	struct work_struct handle_traces_work;
 	struct hlist_head hash[MESSAGE_HASH_SIZE];
@@ -171,5 +181,8 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev);
 int mlx5_fw_tracer_init(struct mlx5_fw_tracer *tracer);
 void mlx5_fw_tracer_cleanup(struct mlx5_fw_tracer *tracer);
 void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer);
+int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev);
+int mlx5_fw_tracer_get_saved_traces(struct mlx5_fw_tracer *tracer,
+				    char *buff, unsigned int buff_size);
 
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 16/19] net/mlx5: Report devlink health on FW issues
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (14 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 15/19] net/mlx5: Add support for FW reporter objdump Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 17/19] net/mlx5: Add FW fatal devlink_health_reporter Eran Ben Elisha
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Use devlink_health_report() to report any symptom of FW issue as FW
counter miss or new health syndrom.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 21 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  1 +
 .../net/ethernet/mellanox/mlx5/core/health.c  | 10 +++++++++
 include/linux/mlx5/driver.h                   |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 07bc473a8ebb..04c904214e4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -205,6 +205,27 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
 	return 0;
 }
 
+void mlx5_fw_reporter_err_work(struct work_struct *work)
+{
+	struct mlx5_fw_reporter_ctx fw_reporter_ctx;
+	struct mlx5_core_health *health;
+	struct mlx5_core_dev *dev;
+	struct mlx5_priv *priv;
+
+	health = container_of(work, struct mlx5_core_health, report_work);
+	priv = container_of(health, struct mlx5_priv, health);
+	dev = container_of(priv, struct mlx5_core_dev, priv);
+
+	fw_reporter_ctx.err_synd = health->synd;
+	fw_reporter_ctx.miss_counter = health->miss_counter;
+	if (fw_reporter_ctx.err_synd)
+		devlink_health_report(dev->fw_reporter, "FW syndrom reported",
+				      &fw_reporter_ctx);
+	else if (fw_reporter_ctx.miss_counter)
+		devlink_health_report(dev->fw_reporter, "FW miss counter reported",
+				      &fw_reporter_ctx);
+}
+
 static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.name = "FW",
 		.objdump_size = SAVED_TRACES_BUFFER_SIZE_BYTE,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 34f6bfed1cfb..082a648a3af3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -16,5 +16,6 @@ int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
 int mlx5_fw_reporter_create(struct mlx5_core_dev *dev);
 void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev);
+void mlx5_fw_reporter_err_work(struct work_struct *work);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 2fa78edde1fe..4d0ad792b226 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -38,6 +38,7 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
+#include "devlink.h"
 #include "lib/eq.h"
 #include "lib/mlx5.h"
 
@@ -289,7 +290,9 @@ static void poll_health(struct timer_list *t)
 {
 	struct mlx5_core_dev *dev = from_timer(dev, t, priv.health.timer);
 	struct mlx5_core_health *health = &dev->priv.health;
+	struct health_buffer __iomem *h = health->health;
 	u32 count;
+	u8 synd;
 
 	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
 		goto out;
@@ -304,8 +307,13 @@ static void poll_health(struct timer_list *t)
 	if (health->miss_counter == MAX_MISSES) {
 		dev_err(&dev->pdev->dev, "device's health compromised - reached miss count\n");
 		mlx5_print_health_info(dev);
+		queue_work(health->wq, &health->report_work);
 	}
 
+	synd = ioread8(&h->synd);
+	if (synd && synd != health->synd)
+		queue_work(health->wq, &health->report_work);
+
 	if (in_fatal(dev) && !health->sick) {
 		health->sick = true;
 		mlx5_print_health_info(dev);
@@ -356,6 +364,7 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev)
 	set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
 	spin_unlock_irqrestore(&health->wq_lock, flags);
 	cancel_delayed_work_sync(&health->recover_work);
+	cancel_work_sync(&health->report_work);
 	cancel_work_sync(&health->work);
 }
 
@@ -395,6 +404,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 		return -ENOMEM;
 	spin_lock_init(&health->wq_lock);
 	INIT_WORK(&health->work, health_care);
+	INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work);
 	INIT_DELAYED_WORK(&health->recover_work, health_recover);
 
 	return 0;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index bedc9bc08963..b2dc32b553b4 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -432,12 +432,14 @@ struct mlx5_core_health {
 	struct timer_list		timer;
 	u32				prev;
 	int				miss_counter;
+	u8				synd;
 	bool				sick;
 	/* wq spinlock to synchronize draining */
 	spinlock_t			wq_lock;
 	struct workqueue_struct	       *wq;
 	unsigned long			flags;
 	struct work_struct		work;
+	struct work_struct		report_work;
 	struct delayed_work		recover_work;
 };
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 17/19] net/mlx5: Add FW fatal devlink_health_reporter
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (15 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 16/19] net/mlx5: Report devlink health on FW issues Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 18/19] net/mlx5: Report devlink health on FW fatal issues Eran Ben Elisha
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Create mlx5_devlink_health_reporter for FW fatal reporter.
The FW fatal reporter is added in addition to the fw reporter and
implements only the recover callback.
The point of having two reporters for FW issues, is that we
don't want to run FW recover on any issue, but only fatal ones.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 51 +++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  4 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |  6 +--
 include/linux/mlx5/driver.h                   |  1 +
 4 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 04c904214e4c..2e6c74ed1f04 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -234,21 +234,54 @@ static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.diagnose = mlx5_fw_reporter_diagnose,
 };
 
-int mlx5_fw_reporter_create(struct mlx5_core_dev *dev)
+static int
+mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
+			       void *priv_ctx)
 {
-	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	u8 nic_state;
 
-	dev->fw_reporter = devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
-							  0, false, dev);
-	return PTR_ERR_OR_ZERO(dev->fw_reporter);
+	nic_state = mlx5_get_nic_state(dev);
+	if (nic_state == MLX5_NIC_IFC_INVALID) {
+		dev_err(&dev->pdev->dev, "health recovery flow aborted since the nic state is invalid\n");
+		return -ECANCELED;
+	}
+	dev_err(&dev->pdev->dev, "starting health recovery flow\n");
+
+	mlx5_recover_device(dev);
+
+	return 0;
 }
 
-void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev)
+static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
+		.name = "FW_fatal",
+		.recover = mlx5_fw_fatal_reporter_recover,
+};
+
+#define MLX5_REPORTER_FW_GRACEFUL_PERIOD 120000
+int mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 {
-	if (!dev->fw_reporter)
-		return;
+	struct devlink *devlink = priv_to_devlink(dev);
 
-	devlink_health_reporter_destroy(dev->fw_reporter);
+	dev->fw_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
+					       0, false, dev);
+	if (IS_ERR(dev->fw_reporter))
+		return PTR_ERR(dev->fw_reporter);
+
+	dev->fw_fatal_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_fw_fatal_reporter_ops,
+					       MLX5_REPORTER_FW_GRACEFUL_PERIOD,
+					       true, dev);
+	return PTR_ERR_OR_ZERO(dev->fw_fatal_reporter);
+}
+
+void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev)
+{
+	if (dev->fw_reporter)
+		devlink_health_reporter_destroy(dev->fw_reporter);
+	if (dev->fw_fatal_reporter)
+		devlink_health_reporter_destroy(dev->fw_fatal_reporter);
 }
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 082a648a3af3..9b544f677aa7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -14,8 +14,8 @@ struct mlx5_fw_reporter_ctx {
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
-int mlx5_fw_reporter_create(struct mlx5_core_dev *dev);
-void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev);
+int mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
+void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev);
 void mlx5_fw_reporter_err_work(struct work_struct *work);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ca5f4c661f6d..8f12c761a485 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -981,9 +981,9 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_fw_tracer;
 	}
 
-	err = mlx5_fw_reporter_create(dev);
+	err = mlx5_fw_reporters_create(dev);
 	if (err)
-		dev_warn(&pdev->dev, "Failed to create FW reporter\n");
+		dev_warn(&pdev->dev, "Failed to create FW reporters\n");
 
 	err = mlx5_fpga_device_start(dev);
 	if (err) {
@@ -1117,7 +1117,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_accel_ipsec_cleanup(dev);
 	mlx5_accel_tls_cleanup(dev);
 	mlx5_fpga_device_stop(dev);
-	mlx5_fw_reporter_destroy(dev);
+	mlx5_fw_reporters_destroy(dev);
 	mlx5_fw_tracer_cleanup(dev->tracer);
 	mlx5_eq_table_destroy(dev);
 	mlx5_pagealloc_stop(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index b2dc32b553b4..8ab66bb40a17 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -681,6 +681,7 @@ struct mlx5_core_dev {
 	struct page             *clock_info_page;
 	struct mlx5_fw_tracer   *tracer;
 	struct devlink_health_reporter *fw_reporter;
+	struct devlink_health_reporter *fw_fatal_reporter;
 };
 
 struct mlx5_db {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 18/19] net/mlx5: Report devlink health on FW fatal issues
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (16 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 17/19] net/mlx5: Add FW fatal devlink_health_reporter Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2018-12-31 14:32 ` [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Moshe Shemesh <moshe@mellanox.com>

Report devlink health on FW fatal issues via FW_fatal_reporter. The
driver recover flow for FW fatal error is now being handled by the
devlink health.

Having the recovery controlled by devlink health, the user has the
ability to cancel the auto-recovery for debug session and run it
manually.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 15 +++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  1 +
 .../net/ethernet/mellanox/mlx5/core/health.c  | 25 +------------------
 3 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 2e6c74ed1f04..4e41b8844d61 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -234,6 +234,21 @@ static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.diagnose = mlx5_fw_reporter_diagnose,
 };
 
+void mlx5_fw_fatal_reporter_work(struct work_struct *work)
+{
+	struct mlx5_core_health *health;
+	struct delayed_work *dwork;
+	struct mlx5_core_dev *dev;
+	struct mlx5_priv *priv;
+
+	dwork = container_of(work, struct delayed_work, work);
+	health = container_of(dwork, struct mlx5_core_health, recover_work);
+	priv = container_of(health, struct mlx5_priv, health);
+	dev = container_of(priv, struct mlx5_core_dev, priv);
+
+	devlink_health_report(dev->fw_fatal_reporter, "FW recovery", NULL);
+}
+
 static int
 mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
 			       void *priv_ctx)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 9b544f677aa7..9c5eba632293 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -17,5 +17,6 @@ void mlx5_devlink_unregister(struct devlink *devlink);
 int mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
 void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev);
 void mlx5_fw_reporter_err_work(struct work_struct *work);
+void mlx5_fw_fatal_reporter_work(struct work_struct *work);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 4d0ad792b226..f3d030de5695 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -138,29 +138,6 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev)
 	mlx5_disable_device(dev);
 }
 
-static void health_recover(struct work_struct *work)
-{
-	struct mlx5_core_health *health;
-	struct delayed_work *dwork;
-	struct mlx5_core_dev *dev;
-	struct mlx5_priv *priv;
-	u8 nic_state;
-
-	dwork = container_of(work, struct delayed_work, work);
-	health = container_of(dwork, struct mlx5_core_health, recover_work);
-	priv = container_of(health, struct mlx5_priv, health);
-	dev = container_of(priv, struct mlx5_core_dev, priv);
-
-	nic_state = mlx5_get_nic_state(dev);
-	if (nic_state == MLX5_NIC_IFC_INVALID) {
-		dev_err(&dev->pdev->dev, "health recovery flow aborted since the nic state is invalid\n");
-		return;
-	}
-
-	dev_err(&dev->pdev->dev, "starting health recovery flow\n");
-	mlx5_recover_device(dev);
-}
-
 /* How much time to wait until health resetting the driver (in msecs) */
 #define MLX5_RECOVERY_DELAY_MSECS 60000
 static void health_care(struct work_struct *work)
@@ -405,7 +382,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	spin_lock_init(&health->wq_lock);
 	INIT_WORK(&health->work, health_care);
 	INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work);
-	INIT_DELAYED_WORK(&health->recover_work, health_recover);
+	INIT_DELAYED_WORK(&health->recover_work, mlx5_fw_fatal_reporter_work);
 
 	return 0;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (17 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 18/19] net/mlx5: Report devlink health on FW fatal issues Eran Ben Elisha
@ 2018-12-31 14:32 ` Eran Ben Elisha
  2019-01-01  1:47   ` Jakub Kicinski
  2018-12-31 14:41 ` [PATCH RFC iproute2-next] devlink: Add health command support Aya Levin
  2019-01-01  1:47 ` [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Jakub Kicinski
  20 siblings, 1 reply; 31+ messages in thread
From: Eran Ben Elisha @ 2018-12-31 14:32 UTC (permalink / raw)
  To: netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

From: Aya Levin <ayal@mellanox.com>

This patch adds a new file to add information about devlink health
mechanism.

Signed-off-by: Aya Levin <ayal@mellanox.com>
---
 Documentation/networking/devlink-health.txt | 86 +++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 Documentation/networking/devlink-health.txt

diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink-health.txt
new file mode 100644
index 000000000000..ea8a9cc773a2
--- /dev/null
+++ b/Documentation/networking/devlink-health.txt
@@ -0,0 +1,86 @@
+The health mechanism is targeted for Real Time Alerting, in order to know when
+something bad had happened to a PCI device
+- Provide alert debug information
+- Self healing
+- If problem needs vendor support, provide a way to gather all needed debugging
+  information.
+
+The main idea is to unify and centralize driver health reports in the
+generic devlink instance and allow the user to set different
+attributes of the health reporting and recovery procedures.
+
+The devlink health reporter:
+Device driver creates a "health reporter" per each error/health type.
+Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+or unknown (driver specific).
+For each registered health reporter a driver can issue error/health reports
+asynchronously. All health reports handling is done by devlink.
+Device driver can provide specific callbacks for each "health reporter", e.g.
+ - Recovery procedures
+ - Diagnostics and object dump procedures
+ - OOB initial parameters
+Different parts of the driver can register different types of health reporters
+with different handlers.
+
+Once an error is reported, devlink health will do the following actions:
+  * A log is being send to the kernel trace events buffer
+  * Health status and statistics are being updated for the reporter instance
+  * Object dump is being taken and saved at the reporter instance (as long as
+    there is no other Objdump which is already stored)
+  * Auto recovery attempt is being done. Depends on:
+    - Auto-recovery configuration
+    - Grace period vs. time passed since last recover
+
+The user interface:
+User can access/change each reporter's parameters and driver specific callbacks
+via devlink, e.g per error type (per health reporter)
+ - Configure reporter's generic parameters (like: disable/enable auto recovery)
+ - Invoke recovery procedure
+ - Run diagnostics
+ - Object dump
+
+The devlink health interface (via netlink):
+DEVLINK_CMD_HEALTH_REPORTER_GET
+  Retrieves status and configuration info per DEV and reporter.
+DEVLINK_CMD_HEALTH_REPORTER_SET
+  Allows reporter-related configuration setting.
+DEVLINK_CMD_HEALTH_REPORTER_RECOVER
+  Triggers a reporter's recovery procedure.
+DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
+  Retrieves diagnostics data from a reporter on a device.
+DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET
+  Retrieves the last stored objdump. Devlink health
+  saves a single objdump. If an objdump is not already stored by the devlink
+  for this reporter, devlink generates a new objdump.
+  Objdump output is defined by the reporter.
+DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
+  Clears the last saved objdump file for the specified reporter.
+
+
+                                               netlink
+                                      +--------------------------+
+                                      |                          |
+                                      |            +             |
+                                      |            |             |
+                                      +--------------------------+
+                                                   |request for ops
+                                                   |(diagnose,
+ mlx5_core                             devlink     |recover,
+                                                   |dump)
++--------+                            +--------------------------+
+|        |                            |    reporter|             |
+|        |                            |  +---------v----------+  |
+|        |   ops execution            |  |                    |  |
+|     <----------------------------------+                    |  |
+|        |                            |  |                    |  |
+|        |                            |  + ^------------------+  |
+|        |                            |    | request for ops     |
+|        |                            |    | (recover, dump)     |
+|        |                            |    |                     |
+|        |                            |  +-+------------------+  |
+|        |     health report          |  | health handler     |  |
+|        +------------------------------->                    |  |
+|        |                            |  +--------------------+  |
+|        |     health reporter create |                          |
+|        +---------------------------->                          |
++--------+                            +--------------------------+
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH RFC iproute2-next] devlink: Add health command support
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (18 preceding siblings ...)
  2018-12-31 14:32 ` [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
@ 2018-12-31 14:41 ` Aya Levin
  2019-01-01  1:47 ` [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Jakub Kicinski
  20 siblings, 0 replies; 31+ messages in thread
From: Aya Levin @ 2018-12-31 14:41 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

This patch add support for the following commands:
Devlink health show [DEV reporter REPORTE_NAME],
Devlink health recover DEV reporter REPORTER_NAME,
Devlink health diagnose DEV reporter REPORTER_NAME,
Devlink health objdump show DEV reporter REPORTER_NAME,
Devlink health objdump clear DEV reporter REPORTER_NAME,
Devlink health set DEV reporter REPORTER_NAME NAME VALUE

 * Devlink health show command displays status and configuration info on
   specific reporter on a device or dump the info on all reporters on all
   devices.
 * Devlink health recover enables the user to initiate a
   recovery on a reporter. This operation will increment the recoveries
   counter displayed in the show command.
 * Devlink health diagnose enables the user to retrieve diagnostics data
   on a reporter on a device. The command's output is a free text defined
   by the reporter.
 * Devlink health objdump show displays the last saved objdump. Devlink
   health saves a single objdump. If an objdump is not already stored by
   the Devlink for this reporter, Devlink generates a new objdump. The
   objdump can be generated automatically when a reporter reports on an
   error or by the user's request.
   Objdump output is defined by the reporter.
 * Devlink health objdump clear, deletes the last saved objdump file.
 * Devlink health set, enables the user to configure:
	1) grace_period [msec] time interval between auto recoveries.
	2) auto_recover [true/false] whether the devlink should execute
	automatic recover on error.

Examples:
$devlink health show pci/0000:00:09.0 reporter TX
pci/0000:00:09.0:
name TX state healthy #err 0 #recover 1 last_objdump_ts N/A objdump_available false
  attributes:
    grace period 600 auto recover true
$devlink health diagnose pci/0000:00:09.0 reporter TX
SQ 0x9b: HW state: 1 stopped: 0
SQ 0x9f: HW state: 1 stopped: 0
SQ 0xa3: HW state: 1 stopped: 0
SQ 0xa7: HW state: 1 stopped: 0
SQ 0xab: HW state: 1 stopped: 0
$devlink health objdump show pci/0000:00:09.0 reporter TX
TX dump data
$devlink health objdump clear pci/0000:00:09.0 reporter TX
$devlink health set pci/0000:00:09.0 reporter TX grace_period 3500
$devlink health set pci/0000:00:09.0 reporter TX auto_recover false

Signed-off-by: Aya Levin <ayal@mellanox.com>
---
 devlink/devlink.c            | 531 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/devlink.h |  29 +++
 man/man8/devlink-health.8    | 175 ++++++++++++++
 man/man8/devlink.8           |   7 +-
 4 files changed, 739 insertions(+), 3 deletions(-)
 create mode 100644 man/man8/devlink-health.8

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 3651e90c1159..24dd852d8478 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -40,6 +40,11 @@
 #define PARAM_CMODE_DRIVERINIT_STR "driverinit"
 #define PARAM_CMODE_PERMANENT_STR "permanent"
 
+#define HEALTH_REPORTER_STATE_HEALTHY_STR "healthy"
+#define HEALTH_REPORTER_STATE_ERROR_STR "error"
+#define HEALTH_REPORTER_GRACE_PERIOD_STR "grace_period"
+#define HEALTH_REPORTER_AUTO_RECOVER_STR "auto_recover"
+
 static int g_new_line_count;
 
 #define pr_err(args...) fprintf(stderr, ##args)
@@ -199,6 +204,10 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_REGION_SNAPSHOT_ID	BIT(22)
 #define DL_OPT_REGION_ADDRESS		BIT(23)
 #define DL_OPT_REGION_LENGTH		BIT(24)
+#define DL_OPT_HANDLE_HEALTH		BIT(25)
+#define DL_OPT_HEALTH_REPORTER_NAME	BIT(26)
+#define DL_OPT_HEALTH_REPORTER_DEV	BIT(27)
+
 
 struct dl_opts {
 	uint32_t present; /* flags of present items */
@@ -230,6 +239,10 @@ struct dl_opts {
 	uint32_t region_snapshot_id;
 	uint64_t region_address;
 	uint64_t region_length;
+	const char *reporter_name;
+	const char *reporter_param_name;
+	const char *reporter_param_value;
+
 };
 
 struct dl {
@@ -959,7 +972,7 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		if (err)
 			return err;
 		o_found |= handle_bit;
-	} else if (o_required & DL_OPT_HANDLE) {
+	} else if (DL_OPT_HANDLE) {
 		err = dl_argv_handle(dl, &opts->bus_name, &opts->dev_name);
 		if (err)
 			return err;
@@ -1178,6 +1191,15 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_REGION_LENGTH;
+		} else if (dl_argv_match(dl, "reporter") &&
+			   (o_all & DL_OPT_HEALTH_REPORTER_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->reporter_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_HEALTH_REPORTER_NAME;
+			o_found |= DL_OPT_HANDLE;
+			break;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -1298,6 +1320,12 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		return -EINVAL;
 	}
 
+	if ((o_required & DL_OPT_HEALTH_REPORTER_NAME) &&
+	    !(o_found & DL_OPT_HEALTH_REPORTER_NAME)) {
+		pr_err("Reporter name expected.\n");
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -1382,6 +1410,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_REGION_LENGTH)
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_REGION_CHUNK_LEN,
 				 opts->region_length);
+	if (opts->present & DL_OPT_HEALTH_REPORTER_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+				  opts->reporter_name);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -1513,6 +1544,8 @@ static void __pr_out_handle_start(struct dl *dl, struct nlattr **tb,
 				__pr_out_newline();
 				__pr_out_indent_inc();
 				arr_last_handle_set(dl, bus_name, dev_name);
+			} else {
+				__pr_out_indent_inc();
 			}
 		} else {
 			pr_out("%s%s", buf, content ? ":" : "");
@@ -5557,11 +5590,501 @@ static int cmd_region(struct dl *dl)
 	return -ENOENT;
 }
 
+static int cmd_health_set_params(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint64_t period;
+	bool auto_recover;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+	err = dl_argv_parse(dl, DL_OPT_HANDLE |
+			    DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	err = dl_argv_str(dl, &dl->opts.reporter_param_name);
+	if (err)
+		return err;
+	err = dl_argv_str(dl, &dl->opts.reporter_param_value);
+	if (err)
+		return err;
+	dl_opts_put(nlh, dl);
+
+	if (!strncmp(dl->opts.reporter_param_name,
+		     HEALTH_REPORTER_GRACE_PERIOD_STR, strlen("garce"))) {
+		err = strtouint64_t(dl->opts.reporter_param_value, &period);
+		if (err)
+			goto err_param_value_parse;
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_HEALTH_REPORTER_PERIOD,
+				 period);
+	} else if (!strncmp(dl->opts.reporter_param_name,
+			    HEALTH_REPORTER_AUTO_RECOVER_STR,
+			    strlen("auto"))) {
+		err = strtobool(dl->opts.reporter_param_value, &auto_recover);
+		if (err)
+			goto err_param_value_parse;
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC,
+				(uint8_t)auto_recover);
+	} else {
+		printf("Parameter name: %s  is not supported\n",
+		       dl->opts.reporter_param_name);
+		return -ENOTSUP;
+	}
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+
+err_param_value_parse:
+	pr_err("Value \"%s\" is not a number or not within range\n",
+	       dl->opts.param_value);
+	return err;
+}
+
+static int cmd_health_objdump_clear(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg,
+			       DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int health_value_show(struct dl *dl, int type, struct nlattr *nl_data)
+{
+	const char *str;
+	uint8_t *data;
+	uint32_t len;
+	uint64_t u64;
+	uint32_t u32;
+	uint16_t u16;
+	uint8_t u8;
+	int i;
+
+	switch (type) {
+	case MNL_TYPE_FLAG:
+		if (dl->json_output)
+			jsonw_string(dl->jw, nl_data ? "true" : "false");
+		else
+			pr_out("%s ", nl_data ? "true" : "false");
+		break;
+	case MNL_TYPE_U8:
+		u8 = mnl_attr_get_u8(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, u8);
+		else
+			pr_out("%u ", u8);
+		break;
+	case MNL_TYPE_U16:
+		u16 = mnl_attr_get_u16(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, u16);
+		else
+			pr_out("%u ", u16);
+		break;
+	case MNL_TYPE_U32:
+		u32 = mnl_attr_get_u32(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, u32);
+		else
+			pr_out("%u ", u32);
+		break;
+	case MNL_TYPE_U64:
+		u64 = mnl_attr_get_u64(nl_data);
+		if (dl->json_output)
+			jsonw_u64(dl->jw, u64);
+		else
+			pr_out("%lu ", u64);
+		break;
+	case MNL_TYPE_STRING:
+	case MNL_TYPE_NUL_STRING:
+		str = mnl_attr_get_str(nl_data);
+		if (dl->json_output)
+			jsonw_string(dl->jw, str);
+		else
+			pr_out("%s ", str);
+		break;
+	case MNL_TYPE_BINARY:
+		len = mnl_attr_get_payload_len(nl_data);
+		data = mnl_attr_get_payload(nl_data);
+		i = 0;
+		while (i < len) {
+			if (dl->json_output)
+				jsonw_printf(dl->jw, "%d", data[i]);
+			else
+				pr_out("%02x ", data[i]);
+			i++;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+	return MNL_CB_OK;
+}
+
+static int health_object_pair_show(struct dl *dl, struct nlattr *nl)
+{
+	struct nlattr *nla_pair[DEVLINK_ATTR_MAX + 1] = {};
+	struct nlattr *nla_value[DEVLINK_ATTR_MAX + 1] = {};
+	struct nlattr *nla_value_data;
+	struct nlattr *nla_obj_pair;
+	struct nlattr *nla_object;
+	struct nlattr *nla_array;
+	struct nlattr *nla_val;
+	int err, type;
+	const char *name;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_pair);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME] ||
+	    !nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE])
+		return -EINVAL;
+
+	name = mnl_attr_get_str(nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME]);
+	if (dl->json_output)
+		jsonw_name(dl->jw, name);
+	else
+		pr_out("%s: ", name);
+
+	nla_val = nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE];
+	err = mnl_attr_parse_nested(nla_val, attr_cb, nla_value);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT] &&
+	    !nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY] &&
+	    !nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE])
+		return -EINVAL;
+
+	if (nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT]) {
+		nla_object = nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT];
+		/*inside must be an object pair*/
+		if (dl->json_output)
+			jsonw_start_object(dl->jw);
+
+		mnl_attr_for_each_nested(nla_obj_pair, nla_object) {
+			err = health_object_pair_show(dl, nla_obj_pair);
+			if (err != MNL_CB_OK)
+				break;
+		}
+		if (dl->json_output)
+			jsonw_end_object(dl->jw);
+	}
+	if (nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE]) {
+		type = mnl_attr_get_u8(nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE]);
+		if (!nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA])
+			return -EINVAL;
+		nla_value_data = nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA];
+		if (type  == MNL_TYPE_FLAG)
+		health_value_show(dl, type, nla_value_data);
+	}
+	if (nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY]) {
+		if (dl->json_output)
+			jsonw_start_array(dl->jw);
+		nla_array = nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY];
+		mnl_attr_for_each_nested(nla_object, nla_array) {
+			mnl_attr_for_each_nested(nla_obj_pair, nla_object) {
+				err = health_object_pair_show(dl, nla_obj_pair);
+				if (err != MNL_CB_OK)
+					break;
+			}
+		}
+		if (dl->json_output)
+			jsonw_end_array(dl->jw);
+	}
+	return err;
+}
+
+static int cmd_health_object_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct nlattr *nla_object;
+	struct nlattr *nla_object_pair;
+	struct dl *dl = data;
+	int err = MNL_CB_OK;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT])
+		return MNL_CB_ERROR;
+
+	mnl_attr_for_each(nla_object, nlh, sizeof(*genl)) {
+		mnl_attr_for_each_nested(nla_object_pair, nla_object) {
+			err = health_object_pair_show(dl, nla_object_pair);
+			if (err != MNL_CB_OK)
+				break;
+		}
+		if (!dl->json_output)
+			pr_out("\n");
+	}
+
+	return err;
+}
+
+static int cmd_health_objdump_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg,
+			       DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	if (dl->json_output)
+		jsonw_start_object(dl->jw);
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_object_cb, dl);
+	if (err) {
+		jsonw_end_object(dl->jw);
+		return err;
+	}
+
+	if (dl->json_output)
+		jsonw_end_object(dl->jw);
+
+	return err;
+}
+
+static int cmd_health_diagnose(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	if (dl->json_output)
+		jsonw_start_object(dl->jw);
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_object_cb, dl);
+	if (err) {
+		jsonw_end_object(dl->jw);
+		return err;
+	}
+
+	if (dl->json_output)
+		jsonw_end_object(dl->jw);
+
+	return err;
+}
+
+static int cmd_health_recover(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static void pr_out_array_parameters_start(struct dl *dl, const char *name)
+{
+	if (dl->json_output) {
+		jsonw_name(dl->jw, name);
+		jsonw_start_array(dl->jw);
+	} else {
+		__pr_out_newline();
+		pr_out("%s:", name);
+		 __pr_out_indent_inc();
+		__pr_out_newline();
+	}
+}
+
+static const char *health_state_name(uint8_t state)
+{
+	switch (state) {
+	case DEVLINK_HEALTH_REPORTER_STATE_HEALTHY:
+		return HEALTH_REPORTER_STATE_HEALTHY_STR;
+	case DEVLINK_HEALTH_REPORTER_STATE_ERROR:
+		return HEALTH_REPORTER_STATE_ERROR_STR;
+	default: return "<unknown state>";
+	}
+}
+
+static void pr_out_health(struct dl *dl, struct nlattr **tb)
+{
+	struct nlattr *hlt[DEVLINK_ATTR_MAX + 1] = {};
+	enum devlink_health_reporter_state state;
+	char dump_time_date[80] = "N/A";
+	bool auto_recover = false;
+	const struct nlattr *attr;
+	bool dmp = false;
+	struct timeval tv;
+	uint64_t jiffies;
+	struct tm *info;
+	int err;
+
+	state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+
+	err = mnl_attr_parse_nested(tb[DEVLINK_ATTR_HEALTH_REPORTER], attr_cb,
+				    hlt);
+	if (err != MNL_CB_OK)
+		return;
+
+	if (!hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_AVAIL]	||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_PERIOD])
+		return;
+
+	dmp = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_AVAIL]);
+	if (dmp) {
+		if (hlt[DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_TS]) {
+			attr = hlt[DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_TS];
+			jiffies = mnl_attr_get_u64(attr);
+			__jiffies_to_tv(&tv, jiffies);
+			info = localtime(&tv.tv_sec);
+			strftime(dump_time_date, 80, "%b %d %l:%M:%S", info);
+		}
+	}
+
+	pr_out_handle_start_arr(dl, tb);
+
+	pr_out_str(dl, "name",
+		   mnl_attr_get_str(hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME]));
+	state = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]);
+	pr_out_str(dl, "state", health_state_name(state));
+	pr_out_u64(dl, "#err",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR]));
+	pr_out_u64(dl, "#recover",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER]));
+	pr_out_str(dl, "last_objdump_ts", dump_time_date);
+	pr_out_bool(dl, "objdump_available", dmp);
+	pr_out_array_parameters_start(dl, "parameters");
+	pr_out_entry_start(dl);
+	pr_out_u64(dl, "grace_period",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_PERIOD]));
+	auto_recover = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC]);
+	pr_out_bool(dl, "auto_recover", auto_recover);
+	pr_out_entry_end(dl);
+	pr_out_array_end(dl);
+
+	pr_out_handle_end(dl);
+}
+
+static int cmd_health_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_HEALTH_REPORTER])
+		return MNL_CB_ERROR;
+
+	pr_out_health(dl, tb);
+
+	return MNL_CB_OK;
+}
+
+static int cmd_health_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_GET,
+			       flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl, 0, DL_OPT_HANDLE |
+					DL_OPT_HEALTH_REPORTER_NAME);
+		if (err)
+			return err;
+	}
+	pr_out_section_start(dl, "health");
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_show_cb, dl);
+	if (err)
+		return err;
+	pr_out_section_end(dl);
+	return err;
+}
+
+static void cmd_health_help(void)
+{
+	pr_err("Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n");
+	pr_err("Usage: devlink health recover DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health diagnose DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health objdump show DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health objdump clear DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health set DEV reporter REPORTER_NAME NAME VALUE\n");
+}
+
+static int cmd_health(struct dl *dl)
+{
+	if (dl_no_arg(dl)) {
+		return cmd_health_show(dl);
+	} else if (dl_argv_match(dl, "help")) {
+		cmd_health_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show")) {
+		dl_arg_inc(dl);
+		return cmd_health_show(dl);
+	} else if (dl_argv_match(dl, "recover")) {
+		dl_arg_inc(dl);
+		return cmd_health_recover(dl);
+	} else if (dl_argv_match(dl, "diagnose")) {
+		dl_arg_inc(dl);
+		return cmd_health_diagnose(dl);
+	} else if (dl_argv_match(dl, "objdump")) {
+		dl_arg_inc(dl);
+		if (dl_argv_match(dl, "show")) {
+			dl_arg_inc(dl);
+			return cmd_health_objdump_show(dl);
+		} else if (dl_argv_match(dl, "clear")) {
+			dl_arg_inc(dl);
+			return cmd_health_objdump_clear(dl);
+		}
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_health_set_params(dl);
+	}
+
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static void help(void)
 {
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename\n"
-	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region }\n"
+	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
 	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
 }
 
@@ -5594,7 +6117,11 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	} else if (dl_argv_match(dl, "region")) {
 		dl_arg_inc(dl);
 		return cmd_region(dl);
+	} else if (dl_argv_match(dl, "health")) {
+		dl_arg_inc(dl);
+		return cmd_health(dl);
 	}
+
 	pr_err("Object \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
 }
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d0a33d79dc22..2f3bbf39e887 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -89,6 +89,12 @@ enum devlink_command {
 	DEVLINK_CMD_REGION_DEL,
 	DEVLINK_CMD_REGION_READ,
 
+	DEVLINK_CMD_HEALTH_REPORTER_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_SET,
+	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+	DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR,
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -168,6 +174,11 @@ enum devlink_param_fw_load_policy_value {
 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH,
 };
 
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -285,6 +296,24 @@ enum devlink_attr {
 	DEVLINK_ATTR_REGION_CHUNK_ADDR,         /* u64 */
 	DEVLINK_ATTR_REGION_CHUNK_LEN,          /* u64 */
 
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,	/* u8 */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,	/* dynamic */
+
+	DEVLINK_ATTR_HEALTH_REPORTER,			/* nested */
+	DEVLINK_ATTR_HEALTH_REPORTER_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_REPORTER_STATE,		/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_ERR,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_AVAIL,	/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_OBJDUMP_TS,	/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_PERIOD,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC,		/* u8 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/man/man8/devlink-health.8 b/man/man8/devlink-health.8
new file mode 100644
index 000000000000..19de4079ff67
--- /dev/null
+++ b/man/man8/devlink-health.8
@@ -0,0 +1,175 @@
+.TH DEVLINK\-HEALTH 8 "27 Dec 2018" "iproute2" "Linux"
+.SH NAME
+devlink-health \- devlink health reporting and recovery
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B health
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] }
+.ti -8
+.BR "devlink health show"
+.RI "[ "
+.RI "" DEV ""
+.BR " reporter "
+.RI ""REPORTER " ] "
+.ti -8
+.BR "devlink health recover"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health diagnose"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health objdump show"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health objdump clear"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health set"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.RI "" NAME ""
+.RI "" VALUE ""
+.ti -8
+.B devlink health help
+.SH "DESCRIPTION"
+.SS devlink health show - Show status and configuration on all supported reporters on all devlink devices.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health recover - Initiate a recovery operation on a reporter.
+This action performs a recovery and increases the recoveries counter on success.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health diagnose - Retrieve diagnostics data on a reporter.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health objdump show - Display the last saved objdump.
+.PD 0
+.P
+Devlink health saves a single objdump per reporter. If an objdump is
+.P
+not already stored by the Devlink, this command will generate a new
+.P
+objdump. The objdump can be generated either automatically when a
+.P
+reporter reports on an error or manually at the user's request.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health objdump clear - Delete the saved objdump.
+Deleting the save objdump enables a generation of a new objdump on
+.PD 0
+.P
+the next "devlink health objdump show" command.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health set - Enable the user to configure:
+.PD 0
+1) grace_period [msec] - Time interval between auto recoveries.
+.P
+2) auto_recover [true/false] - Indicates whether the devlink should execute automatic recover on error.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SH "EXAMPLES"
+.PP
+devlink health show
+.RS 4
+pci/0000:00:09.0:
+  name TX state healthy #err 1 #recover 1 last_objdump_ts N/A objdump_available false
+  attributes:
+    grace period 600 auto recover true
+.RE
+.PP
+devlink health recover pci/0000:00:09.0 reporter TX
+.RS 4
+Initiate recovery on TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health diagnose pci/0000:00:09.0 reporter TX
+.RS 4
+.PD 0
+SQ 0x9b: HW state: 1 stopped: 0
+.P
+SQ 0x9f: HW state: 1 stopped: 0
+.P
+SQ 0xa3: HW state: 1 stopped: 0
+.P
+SQ 0xa7: HW state: 1 stopped: 0
+.P
+SQ 0xab: HW state: 1 stopped: 0
+.PD
+.RE
+.PP
+devlink health objdump show  pci/0000:00:09.0 reporter TX
+.RS 4
+Display the last saved objdump on TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health objdump clear pci/0000:00:09.0 reporter TX
+.RS 4
+Delete saved objdump on TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health set pci/0000:00:09.0 reporter TX grace_period 3500
+.RS 4
+Set time interval between auto recoveries to minimum of 3500 mSec on
+TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health set pci/0000:00:09.0 reporter TX auto_recover false
+.RS 4
+Turn off auto recovery on TX reporter registered on pci/0000:00:09.0.
+.RE
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-port (8),
+.BR devlink-region (8),
+.br
+
+.SH AUTHOR
+Aya Levin <ayal@mellanox.com>
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 8d527e7e1d60..13d4dcd908b3 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
 .in +8
 .ti -8
 .B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | "
 .BR help " }"
 .sp
 
@@ -78,6 +78,10 @@ Turn on verbose output.
 .B region
 - devlink address region access
 
+.TP
+.B health
+- devlink reporting and recovery
+
 .SS
 .I COMMAND
 
@@ -109,6 +113,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR devlink-sb (8),
 .BR devlink-resource (8),
 .BR devlink-region (8),
+.BR devlink-health (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (19 preceding siblings ...)
  2018-12-31 14:41 ` [PATCH RFC iproute2-next] devlink: Add health command support Aya Levin
@ 2019-01-01  1:47 ` Jakub Kicinski
  2019-01-01  9:58   ` Eran Ben Elisha
  2019-01-04  8:57   ` Jiri Pirko
  20 siblings, 2 replies; 31+ messages in thread
From: Jakub Kicinski @ 2019-01-01  1:47 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog

On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote:
> The health mechanism is targeted for Real Time Alerting, in order to know when
> something bad had happened to a PCI device
> - Provide alert debug information
> - Self healing
> - If problem needs vendor support, provide a way to gather all needed debugging
>   information.
> 
> The main idea is to unify and centralize driver health reports in the
> generic devlink instance and allow the user to set different
> attributes of the health reporting and recovery procedures.
> 
> The devlink health reporter:
> Device driver creates a "health reporter" per each error/health type.
> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
> or unknown (driver specific).
> For each registered health reporter a driver can issue error/health reports
> asynchronously. All health reports handling is done by devlink.
> Device driver can provide specific callbacks for each "health reporter", e.g.
>  - Recovery procedures
>  - Diagnostics and object dump procedures
>  - OOB initial attributes
> Different parts of the driver can register different types of health reporters
> with different handlers.
> 
> Once an error is reported, devlink health will do the following actions:
>   * A log is being send to the kernel trace events buffer
>   * Health status and statistics are being updated for the reporter instance
>   * Object dump is being taken and saved at the reporter instance (as long as
>     there is no other Objdump which is already stored)
>   * Auto recovery attempt is being done. Depends on:
>     - Auto-recovery configuration
>     - Grace period vs. time passed since last recover
> 
> The user interface:
> User can access/change each reporter attributes and driver specific callbacks
> via devlink, e.g per error type (per health reporter)
>  - Configure reporter's generic attributes (like: Disable/enable auto recovery)
>  - Invoke recovery procedure
>  - Run diagnostics
>  - Object dump

Thanks for continuing this work!

> The devlink health interface (via netlink):
> DEVLINK_CMD_HEALTH_REPORTER_GET
>   Retrieves status and configuration info per DEV and reporter.
> DEVLINK_CMD_HEALTH_REPORTER_SET
>   Allows reporter-related configuration setting.
> DEVLINK_CMD_HEALTH_REPORTER_RECOVER
>   Triggers a reporter's recovery procedure.
> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
>   Retrieves diagnostics data from a reporter on a device.
> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET

The addition of "objdump" and its marshalling is a bit disappointing.
It seemed to me when region snapshots were added that they would serve
this exact purpose.  Taking a region snapshot or "core dump", if you
will, after something went south.  Now it's confusing to have two
mechanisms serving the same purpose.

It's unclear from quick reading of the code how if the buffers get
timestamped.  Can you report more than one?

About the marshalling, I'm not sure it belongs in the kernel.  There is
precedent for adding interpretation of FW blobs in user space (ethtool).
IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
drivers.  Amount of code you add to print the simple example from last
patch is not inspiring confidence.

And on the bike shedding side :) -> I think you should steer clear of
calling this objdump, as it has very little to do with the
functionality of well-known objdump tool.  Its not even clear what the
object the name is referring to is.

Long story short the overlap with region snapshots makes it unclear
what purpose either serves, and IMHO we should avoid the marshalling by
teaching user space how to interpret snapshots.  Preferably we only
have one dump mechanism, and user space can be taught the interpretation
once.

>   Retrieves the last stored objdump. Devlink health
>   saves a single objdump. If an objdump is not already stored by the devlink
>   for this reporter, devlink generates a new objdump.
>   Objdump output is defined by the reporter.
> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
>   Clears the last saved objdump file for the specified reporter.
> 
> 
>                                                netlink
>                                       +--------------------------+
>                                       |                          |
>                                       |            +             |
>                                       |            |             |
>                                       +--------------------------+
>                                                    |request for ops
>                                                    |(diagnose,
>  mlx5_core                             devlink     |recover,
>                                                    |dump)
> +--------+                            +--------------------------+
> |        |                            |    reporter|             |
> |        |                            |  +---------v----------+  |
> |        |   ops execution            |  |                    |  |
> |     <----------------------------------+                    |  |
> |        |                            |  |                    |  |
> |        |                            |  + ^------------------+  |
> |        |                            |    | request for ops     |
> |        |                            |    | (recover, dump)     |
> |        |                            |    |                     |
> |        |                            |  +-+------------------+  |
> |        |     health report          |  | health handler     |  |
> |        +------------------------------->                    |  |
> |        |                            |  +--------------------+  |
> |        |     health reporter create |                          |
> |        +---------------------------->                          |
> +--------+                            +--------------------------+
> 
> Available reporters:
> In this patchset, three reporters of mlx5e driver are included. The FW
> reporters implement devlink_health_reporter diagnostic, object dump and
> recovery procedures. The TX reporter implements devlink_health_reporter
> diagnostic and recovery procedures.
> 
> This RFC is based on the same concepts as previous RFC I sent earlier this
> year: "[RFC PATCH iproute2-next] System specification health API". The API was
> changed, also devlink code and mlx5e reporters were not available at the
> previous RFC.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt
  2018-12-31 14:32 ` [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
@ 2019-01-01  1:47   ` Jakub Kicinski
  2019-01-01 10:01     ` Eran Ben Elisha
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2019-01-01  1:47 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog

On Mon, 31 Dec 2018 16:32:13 +0200, Eran Ben Elisha wrote:
> +Once an error is reported, devlink health will do the following actions:
> +  * A log is being send to the kernel trace events buffer
> +  * Health status and statistics are being updated for the reporter instance
> +  * Object dump is being taken and saved at the reporter instance (as long as
> +    there is no other Objdump which is already stored)
> +  * Auto recovery attempt is being done. Depends on:
> +    - Auto-recovery configuration
> +    - Grace period vs. time passed since last recover

Would it make sense to store the result of last recovery if it failed?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-01  1:47 ` [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Jakub Kicinski
@ 2019-01-01  9:58   ` Eran Ben Elisha
  2019-01-02 22:46     ` Jakub Kicinski
  2019-01-04  8:57   ` Jiri Pirko
  1 sibling, 1 reply; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-01  9:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog



On 1/1/2019 3:47 AM, Jakub Kicinski wrote:
> On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote:
>> The health mechanism is targeted for Real Time Alerting, in order to know when
>> something bad had happened to a PCI device
>> - Provide alert debug information
>> - Self healing
>> - If problem needs vendor support, provide a way to gather all needed debugging
>>    information.
>>
>> The main idea is to unify and centralize driver health reports in the
>> generic devlink instance and allow the user to set different
>> attributes of the health reporting and recovery procedures.
>>
>> The devlink health reporter:
>> Device driver creates a "health reporter" per each error/health type.
>> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
>> or unknown (driver specific).
>> For each registered health reporter a driver can issue error/health reports
>> asynchronously. All health reports handling is done by devlink.
>> Device driver can provide specific callbacks for each "health reporter", e.g.
>>   - Recovery procedures
>>   - Diagnostics and object dump procedures
>>   - OOB initial attributes
>> Different parts of the driver can register different types of health reporters
>> with different handlers.
>>
>> Once an error is reported, devlink health will do the following actions:
>>    * A log is being send to the kernel trace events buffer
>>    * Health status and statistics are being updated for the reporter instance
>>    * Object dump is being taken and saved at the reporter instance (as long as
>>      there is no other Objdump which is already stored)
>>    * Auto recovery attempt is being done. Depends on:
>>      - Auto-recovery configuration
>>      - Grace period vs. time passed since last recover
>>
>> The user interface:
>> User can access/change each reporter attributes and driver specific callbacks
>> via devlink, e.g per error type (per health reporter)
>>   - Configure reporter's generic attributes (like: Disable/enable auto recovery)
>>   - Invoke recovery procedure
>>   - Run diagnostics
>>   - Object dump
> 
> Thanks for continuing this work!
:)
> 
>> The devlink health interface (via netlink):
>> DEVLINK_CMD_HEALTH_REPORTER_GET
>>    Retrieves status and configuration info per DEV and reporter.
>> DEVLINK_CMD_HEALTH_REPORTER_SET
>>    Allows reporter-related configuration setting.
>> DEVLINK_CMD_HEALTH_REPORTER_RECOVER
>>    Triggers a reporter's recovery procedure.
>> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
>>    Retrieves diagnostics data from a reporter on a device.
>> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET
> 
> The addition of "objdump" and its marshalling is a bit disappointing.
> It seemed to me when region snapshots were added that they would serve
> this exact purpose.  Taking a region snapshot or "core dump", if you
> will, after something went south.  Now it's confusing to have two
> mechanisms serving the same purpose.

The motivation here was that the driver can provide reporters to its 
sub-modules, such that each reporter will be able to provide all needed 
info and recover methods to face run time errors.

The implementation of the objdump function is in the hands of the 
reporter developer, and he can dump whatever he thinks it is needed.
Keep in mind that a driver can have many reporters (TX, RX, FW, command 
interface, etc). For most of the reporters, there is important 
information in the driver which can not be fetched with region snapshot 
(intended for memory fetching only).
For SW info, driver shall build the info and send it interpreted (unlike 
all dumps / region available mechanisms)
I have in plans to extend the TX reporter to have objdump method in 
which I will pass many SQ SW attributes that can be very handy in a 
debug session.

> 
> It's unclear from quick reading of the code how if the buffers get
> timestamped.  Can you report more than one?

The timestamp which devlink health reports on, is the timestamp in which 
it got the buffers filled by the driver. Every dump/diagnose has one ts.
Per reporter, it is possible to store up to one dump. only clear command 
can remove it and makes the reporter ready to fetch a new objdump.

> 
> About the marshalling, I'm not sure it belongs in the kernel.  There is
> precedent for adding interpretation of FW blobs in user space (ethtool).
> IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
> drivers.  Amount of code you add to print the simple example from last
> patch is not inspiring confidence.

The idea was to provide the developer the ability to create "tree-like" 
of information, it is needed when you want to describe complex objects. 
This caused a longer coding, but I don't think we are even close to the 
scale you are talking about.
We can remove the tree flexibility, and move to array format, it will 
make the code of storing data by the driver to be shorter, but we will 
lose the ability to have it in tree-like format.

And again, regarding ethtool, it is a tool for dumping HW/FW, this could 
have been an argument for the region snapshot, not for the devlink health...

> 
> And on the bike shedding side :) -> I think you should steer clear of
> calling this objdump, as it has very little to do with the
> functionality of well-known objdump tool.  Its not even clear what the
> object the name is referring to is.
Let's agree on concept, we can change name to dump. Reporter->dump is 
very clear when you know what the reporter is.
> 
> Long story short the overlap with region snapshots makes it unclear
> what purpose either serves, and IMHO we should avoid the marshalling by
> teaching user space how to interpret snapshots.  Preferably we only
> have one dump mechanism, and user space can be taught the interpretation
> once.
Forcing SW reporters to use region snapshot mechanism sounds like a bad 
idea.

To summarize, In order to have the health system working properly, it 
must have a way to objdump/dump itself and provide it to the developer 
for debug. Removing the objdump part will make it useless for run-time 
debug.

I think this is a powerful tool and we shall not ask the user level 
scripts to fetch info from many other partial tools. It shall all be 
focused in one place (recover, diagnose, objdump, statistics).

> 
>>    Retrieves the last stored objdump. Devlink health
>>    saves a single objdump. If an objdump is not already stored by the devlink
>>    for this reporter, devlink generates a new objdump.
>>    Objdump output is defined by the reporter.
>> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
>>    Clears the last saved objdump file for the specified reporter.
>>
>>
>>                                                 netlink
>>                                        +--------------------------+
>>                                        |                          |
>>                                        |            +             |
>>                                        |            |             |
>>                                        +--------------------------+
>>                                                     |request for ops
>>                                                     |(diagnose,
>>   mlx5_core                             devlink     |recover,
>>                                                     |dump)
>> +--------+                            +--------------------------+
>> |        |                            |    reporter|             |
>> |        |                            |  +---------v----------+  |
>> |        |   ops execution            |  |                    |  |
>> |     <----------------------------------+                    |  |
>> |        |                            |  |                    |  |
>> |        |                            |  + ^------------------+  |
>> |        |                            |    | request for ops     |
>> |        |                            |    | (recover, dump)     |
>> |        |                            |    |                     |
>> |        |                            |  +-+------------------+  |
>> |        |     health report          |  | health handler     |  |
>> |        +------------------------------->                    |  |
>> |        |                            |  +--------------------+  |
>> |        |     health reporter create |                          |
>> |        +---------------------------->                          |
>> +--------+                            +--------------------------+
>>
>> Available reporters:
>> In this patchset, three reporters of mlx5e driver are included. The FW
>> reporters implement devlink_health_reporter diagnostic, object dump and
>> recovery procedures. The TX reporter implements devlink_health_reporter
>> diagnostic and recovery procedures.
>>
>> This RFC is based on the same concepts as previous RFC I sent earlier this
>> year: "[RFC PATCH iproute2-next] System specification health API". The API was
>> changed, also devlink code and mlx5e reporters were not available at the
>> previous RFC.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt
  2019-01-01  1:47   ` Jakub Kicinski
@ 2019-01-01 10:01     ` Eran Ben Elisha
  0 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-01 10:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog



On 1/1/2019 3:47 AM, Jakub Kicinski wrote:
> On Mon, 31 Dec 2018 16:32:13 +0200, Eran Ben Elisha wrote:
>> +Once an error is reported, devlink health will do the following actions:
>> +  * A log is being send to the kernel trace events buffer
>> +  * Health status and statistics are being updated for the reporter instance
>> +  * Object dump is being taken and saved at the reporter instance (as long as
>> +    there is no other Objdump which is already stored)
>> +  * Auto recovery attempt is being done. Depends on:
>> +    - Auto-recovery configuration
>> +    - Grace period vs. time passed since last recover
> 
> Would it make sense to store the result of last recovery if it failed?

We thought about it.
Internally we discussed it and decided that recover failures shall be 
indicated in the kernel logs and not be provided as part of devlink 
health show command.
Keep in mind that if a recover failed, the reporter status will be kept 
as is, since no recover was successfully finished.

> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-01  9:58   ` Eran Ben Elisha
@ 2019-01-02 22:46     ` Jakub Kicinski
  2019-01-03 13:31       ` Eran Ben Elisha
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2019-01-02 22:46 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog

On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote:
> On 1/1/2019 3:47 AM, Jakub Kicinski wrote:
> > The addition of "objdump" and its marshalling is a bit disappointing.
> > It seemed to me when region snapshots were added that they would serve
> > this exact purpose.  Taking a region snapshot or "core dump", if you
> > will, after something went south.  Now it's confusing to have two
> > mechanisms serving the same purpose.  
> 
> The motivation here was that the driver can provide reporters to its 
> sub-modules, such that each reporter will be able to provide all needed 
> info and recover methods to face run time errors.
> 
> The implementation of the objdump function is in the hands of the 
> reporter developer, and he can dump whatever he thinks it is needed.
> Keep in mind that a driver can have many reporters (TX, RX, FW, command 
> interface, etc). For most of the reporters, there is important 
> information in the driver which can not be fetched with region snapshot 
> (intended for memory fetching only).
> For SW info, driver shall build the info and send it interpreted (unlike 
> all dumps / region available mechanisms)
> I have in plans to extend the TX reporter to have objdump method in 
> which I will pass many SQ SW attributes that can be very handy in a 
> debug session.

My feeling is that instead of duplicating this infrastructure we should
try to grow region snapshots beyond the "HW memory dumper".  In a
debugging session you may want to have dumps as well as read the state
live.  Region snapshot API was built with this promise in mind.  The
information in reporter dump is not easily available other than when
the dump happens, which is not great in a debugging session.  Driver
will have to expose it via debugfs/region dump/ethtool dump or some
such for live debug.

> > It's unclear from quick reading of the code how if the buffers get
> > timestamped.  Can you report more than one?  
> 
> The timestamp which devlink health reports on, is the timestamp in which 
> it got the buffers filled by the driver. Every dump/diagnose has one ts.
> Per reporter, it is possible to store up to one dump. only clear command 
> can remove it and makes the reporter ready to fetch a new objdump.

Region snapshots support collecting multiple snapshots IIRC, no?

> > About the marshalling, I'm not sure it belongs in the kernel.  There is
> > precedent for adding interpretation of FW blobs in user space (ethtool).
> > IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
> > drivers.  Amount of code you add to print the simple example from last
> > patch is not inspiring confidence.  
> 
> The idea was to provide the developer the ability to create "tree-like" 
> of information, it is needed when you want to describe complex objects. 
> This caused a longer coding, but I don't think we are even close to the 
> scale you are talking about.
> We can remove the tree flexibility, and move to array format, it will 
> make the code of storing data by the driver to be shorter, but we will 
> lose the ability to have it in tree-like format.

To be clear I slightly oppose the marshalling in the first place.  It
may be better to just dump the data as is, and have user space know
what the interpretation is.

> And again, regarding ethtool, it is a tool for dumping HW/FW, this could 
> have been an argument for the region snapshot, not for the devlink health...

I've seen drivers dumping ring state and other SW info via ethtool.
All debugging APIs end up "mixed source" in my experience.

> > And on the bike shedding side :) -> I think you should steer clear of
> > calling this objdump, as it has very little to do with the
> > functionality of well-known objdump tool.  Its not even clear what the
> > object the name is referring to is.  
> Let's agree on concept, we can change name to dump. Reporter->dump is 
> very clear when you know what the reporter is.

Thanks!

> > Long story short the overlap with region snapshots makes it unclear
> > what purpose either serves, and IMHO we should avoid the marshalling by
> > teaching user space how to interpret snapshots.  Preferably we only
> > have one dump mechanism, and user space can be taught the interpretation
> > once.  
> Forcing SW reporters to use region snapshot mechanism sounds like a bad 
> idea.

I'm not super excited about reusing region API for SW info.  But I like
it more than having multitude of debug APIs for drivers to implement
with largely overlapping functionality and semantics.

> To summarize, In order to have the health system working properly, it 
> must have a way to objdump/dump itself and provide it to the developer 
> for debug. Removing the objdump part will make it useless for run-time 
> debug.
> 
> I think this is a powerful tool and we shall not ask the user level 
> scripts to fetch info from many other partial tools. It shall all be 
> focused in one place (recover, diagnose, objdump, statistics).

I don't think having reporter API refer to snapshot IDs is "many other
partial tools" if that's the suggestion.  Seems like you want to focus
all the reporter stuff in one place, and I want to focus the debug APIs
a little :)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-02 22:46     ` Jakub Kicinski
@ 2019-01-03 13:31       ` Eran Ben Elisha
  2019-01-03 22:28         ` Jakub Kicinski
  2019-01-04  9:01         ` Jiri Pirko
  0 siblings, 2 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-03 13:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog



On 1/3/2019 12:46 AM, Jakub Kicinski wrote:
> On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote:
>> On 1/1/2019 3:47 AM, Jakub Kicinski wrote:
>>> The addition of "objdump" and its marshalling is a bit disappointing.
>>> It seemed to me when region snapshots were added that they would serve
>>> this exact purpose.  Taking a region snapshot or "core dump", if you
>>> will, after something went south.  Now it's confusing to have two
>>> mechanisms serving the same purpose.
>>
>> The motivation here was that the driver can provide reporters to its
>> sub-modules, such that each reporter will be able to provide all needed
>> info and recover methods to face run time errors.
>>
>> The implementation of the objdump function is in the hands of the
>> reporter developer, and he can dump whatever he thinks it is needed.
>> Keep in mind that a driver can have many reporters (TX, RX, FW, command
>> interface, etc). For most of the reporters, there is important
>> information in the driver which can not be fetched with region snapshot
>> (intended for memory fetching only).
>> For SW info, driver shall build the info and send it interpreted (unlike
>> all dumps / region available mechanisms)
>> I have in plans to extend the TX reporter to have objdump method in
>> which I will pass many SQ SW attributes that can be very handy in a
>> debug session.
> 
> My feeling is that instead of duplicating this infrastructure we should
> try to grow region snapshots beyond the "HW memory dumper".  In a
> debugging session you may want to have dumps as well as read the state
> live.  Region snapshot API was built with this promise in mind.  The
> information in reporter dump is not easily available other than when
> the dump happens, which is not great in a debugging session.  Driver
> will have to expose it via debugfs/region dump/ethtool dump or some
> such for live debug.

Arch wise those are two different features which we shouldn't mix.
The region dump is aiming at dumping of information for monitoring of 
"HW memory" at real time, more like a dumb channel to provide memory 
chunks from HW to user.

The health is a user facing tool which provides a centralized vision on 
a device health status with diagnose, recover and dump options divided 
via sub-reporters in a real time and can save critical data 
automatically in case of a reported error.

Regarding the need for other tool for live debug, it is not true. 
Devlink health objdump command provides latest stored dump, as well as 
dumping now for live debug.
It is even better, as driver reporters already contains the needed HW 
and SW buffers according to the device and the error. Unlike region dump 
that requires the administrator to set regions while the administrator 
is not an expert of mapping between required info, device model and 
reported error.

> 
>>> It's unclear from quick reading of the code how if the buffers get
>>> timestamped.  Can you report more than one?
>>
>> The timestamp which devlink health reports on, is the timestamp in which
>> it got the buffers filled by the driver. Every dump/diagnose has one ts.
>> Per reporter, it is possible to store up to one dump. only clear command
>> can remove it and makes the reporter ready to fetch a new objdump.
> 
> Region snapshots support collecting multiple snapshots IIRC, no?

it does allow multiple snapshots. This can be easily added to the 
devlink health if we wish to. I didn't see the current need for it.

> 
>>> About the marshalling, I'm not sure it belongs in the kernel.  There is
>>> precedent for adding interpretation of FW blobs in user space (ethtool).
>>> IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
>>> drivers.  Amount of code you add to print the simple example from last
>>> patch is not inspiring confidence.
>>
>> The idea was to provide the developer the ability to create "tree-like"
>> of information, it is needed when you want to describe complex objects.
>> This caused a longer coding, but I don't think we are even close to the
>> scale you are talking about.
>> We can remove the tree flexibility, and move to array format, it will
>> make the code of storing data by the driver to be shorter, but we will
>> lose the ability to have it in tree-like format.
> 
> To be clear I slightly oppose the marshalling in the first place.  It
> may be better to just dump the data as is, and have user space know
> what the interpretation is.

We provides a way to store the data in nested layers. In internal 
discussions with Jiri, we decided that this is the correct approach.
However, if one insists, it can fill the buffers with raw binary and 
label it as such.

> 
>> And again, regarding ethtool, it is a tool for dumping HW/FW, this could
>> have been an argument for the region snapshot, not for the devlink health...
> 
> I've seen drivers dumping ring state and other SW info via ethtool.
> All debugging APIs end up "mixed source" in my experience.
> 
>>> And on the bike shedding side :) -> I think you should steer clear of
>>> calling this objdump, as it has very little to do with the
>>> functionality of well-known objdump tool.  Its not even clear what the
>>> object the name is referring to is.
>> Let's agree on concept, we can change name to dump. Reporter->dump is
>> very clear when you know what the reporter is.
> 
> Thanks!
> 
>>> Long story short the overlap with region snapshots makes it unclear
>>> what purpose either serves, and IMHO we should avoid the marshalling by
>>> teaching user space how to interpret snapshots.  Preferably we only
>>> have one dump mechanism, and user space can be taught the interpretation
>>> once.
>> Forcing SW reporters to use region snapshot mechanism sounds like a bad
>> idea.
> 
> I'm not super excited about reusing region API for SW info.  But I like
> it more than having multitude of debug APIs for drivers to implement
> with largely overlapping functionality and semantics.

The dumping of HW information is just a very small portion of the 
devlink health. If driver developer thinks he can use existing region 
API to fetch some data into its reporter, he can do so in his dump method.

> 
>> To summarize, In order to have the health system working properly, it
>> must have a way to objdump/dump itself and provide it to the developer
>> for debug. Removing the objdump part will make it useless for run-time
>> debug.
>>
>> I think this is a powerful tool and we shall not ask the user level
>> scripts to fetch info from many other partial tools. It shall all be
>> focused in one place (recover, diagnose, objdump, statistics).
> 
> I don't think having reporter API refer to snapshot IDs is "many other
> partial tools" if that's the suggestion.  Seems like you want to focus
> all the reporter stuff in one place, and I want to focus the debug APIs
> a little :)
> 

As I can see it, this tool is an envelop to all health functionality and 
should provide good and easy to use interface.
This tool should contain all health related dumps (unfortunately as the 
subsystem runs for a long time, it cannot be the sole provider of it, 
but aspire to be the leading one).

With devlink health dump you can get:
1. Run time dumps
2. Automatic error related dumps from the time the error happened stored 
in the memory waiting to be fetched.
3. Driver parsed data as well as raw data (up to the developer to 
decide, API can support both).
4. OOB configured data marked by the driver developers as relevant for 
the failure

Non of the existing tools can provide such features. And the option to 
add them to an existing tool doesn't seem possible / correct even in theory.

Removing the dump option from devlink for the sake of duplicate HW 
memory dump is a killer for this feature. And as A driver developer I 
really think that the networking subsystem needs it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-03 13:31       ` Eran Ben Elisha
@ 2019-01-03 22:28         ` Jakub Kicinski
  2019-01-04  9:12           ` Jiri Pirko
  2019-01-04  9:01         ` Jiri Pirko
  1 sibling, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2019-01-03 22:28 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, Jiri Pirko, Moshe Shemesh, Aya Levin,
	Tal Alon, Ariel Almog

On Thu, 3 Jan 2019 13:31:59 +0000, Eran Ben Elisha wrote:
> Arch wise those are two different features which we shouldn't mix.
> The region dump is aiming at dumping of information for monitoring of 
> "HW memory" at real time, more like a dumb channel to provide memory 
> chunks from HW to user.

The "real time read" part of the region dump was not even implemented.
And it was the part that made most sense to me.

Region snapshots were described as a tool for gathering crash dumps.
See bedc989b0c98 ("net/mlx4_core: Add Crdump FW snapshot support").

The "chunks from HW" is also incorrect as (1) current implementation of
regions seem to mostly revolve around FW state and (2) there is nothing
in the man page etc. that says HW.

I'm not saying region snapshots fit the bill perfectly for you, I'm
saying you guys are adding a second facility to do a very similar thing
in the span of 6 months - how is it unreasonable of me to ask to
consolidate?

But I'm not gonna fight you any more on this, if nobody else cares.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-01  1:47 ` [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Jakub Kicinski
  2019-01-01  9:58   ` Eran Ben Elisha
@ 2019-01-04  8:57   ` Jiri Pirko
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-01-04  8:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eran Ben Elisha, netdev, David S. Miller, Jiri Pirko,
	Moshe Shemesh, Aya Levin, Tal Alon, Ariel Almog

Tue, Jan 01, 2019 at 02:47:27AM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote:
>> The health mechanism is targeted for Real Time Alerting, in order to know when
>> something bad had happened to a PCI device
>> - Provide alert debug information
>> - Self healing
>> - If problem needs vendor support, provide a way to gather all needed debugging
>>   information.
>> 
>> The main idea is to unify and centralize driver health reports in the
>> generic devlink instance and allow the user to set different
>> attributes of the health reporting and recovery procedures.
>> 
>> The devlink health reporter:
>> Device driver creates a "health reporter" per each error/health type.
>> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
>> or unknown (driver specific).
>> For each registered health reporter a driver can issue error/health reports
>> asynchronously. All health reports handling is done by devlink.
>> Device driver can provide specific callbacks for each "health reporter", e.g.
>>  - Recovery procedures
>>  - Diagnostics and object dump procedures
>>  - OOB initial attributes
>> Different parts of the driver can register different types of health reporters
>> with different handlers.
>> 
>> Once an error is reported, devlink health will do the following actions:
>>   * A log is being send to the kernel trace events buffer
>>   * Health status and statistics are being updated for the reporter instance
>>   * Object dump is being taken and saved at the reporter instance (as long as
>>     there is no other Objdump which is already stored)
>>   * Auto recovery attempt is being done. Depends on:
>>     - Auto-recovery configuration
>>     - Grace period vs. time passed since last recover
>> 
>> The user interface:
>> User can access/change each reporter attributes and driver specific callbacks
>> via devlink, e.g per error type (per health reporter)
>>  - Configure reporter's generic attributes (like: Disable/enable auto recovery)
>>  - Invoke recovery procedure
>>  - Run diagnostics
>>  - Object dump
>
>Thanks for continuing this work!
>
>> The devlink health interface (via netlink):
>> DEVLINK_CMD_HEALTH_REPORTER_GET
>>   Retrieves status and configuration info per DEV and reporter.
>> DEVLINK_CMD_HEALTH_REPORTER_SET
>>   Allows reporter-related configuration setting.
>> DEVLINK_CMD_HEALTH_REPORTER_RECOVER
>>   Triggers a reporter's recovery procedure.
>> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
>>   Retrieves diagnostics data from a reporter on a device.
>> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET
>
>The addition of "objdump" and its marshalling is a bit disappointing.
>It seemed to me when region snapshots were added that they would serve
>this exact purpose.  Taking a region snapshot or "core dump", if you
>will, after something went south.  Now it's confusing to have two
>mechanisms serving the same purpose.
>
>It's unclear from quick reading of the code how if the buffers get
>timestamped.  Can you report more than one?
>
>About the marshalling, I'm not sure it belongs in the kernel.  There is
>precedent for adding interpretation of FW blobs in user space (ethtool).
>IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
>drivers.  Amount of code you add to print the simple example from last
>patch is not inspiring confidence.
>
>And on the bike shedding side :) -> I think you should steer clear of
>calling this objdump, as it has very little to do with the
>functionality of well-known objdump tool.  Its not even clear what the
>object the name is referring to is.
>
>Long story short the overlap with region snapshots makes it unclear
>what purpose either serves, and IMHO we should avoid the marshalling by
>teaching user space how to interpret snapshots.  Preferably we only
>have one dump mechanism, and user space can be taught the interpretation
>once.

Unlike regions, which are binary blobs, the "objdump" we have here is a
tree of values (json like). It is not taken from FW/HW. It is generated
in driver and passed down to user to be shown. Originally, Eran just
pushed that info into a string buffer and the user had to parse it
again. I asked to format this in Netlink attributes JSON-style so the
user gets all hierarchy without need of parsing and can easily do
Netlink->human_stdout or Netlink->JSON.



>
>>   Retrieves the last stored objdump. Devlink health
>>   saves a single objdump. If an objdump is not already stored by the devlink
>>   for this reporter, devlink generates a new objdump.
>>   Objdump output is defined by the reporter.
>> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
>>   Clears the last saved objdump file for the specified reporter.
>> 
>> 
>>                                                netlink
>>                                       +--------------------------+
>>                                       |                          |
>>                                       |            +             |
>>                                       |            |             |
>>                                       +--------------------------+
>>                                                    |request for ops
>>                                                    |(diagnose,
>>  mlx5_core                             devlink     |recover,
>>                                                    |dump)
>> +--------+                            +--------------------------+
>> |        |                            |    reporter|             |
>> |        |                            |  +---------v----------+  |
>> |        |   ops execution            |  |                    |  |
>> |     <----------------------------------+                    |  |
>> |        |                            |  |                    |  |
>> |        |                            |  + ^------------------+  |
>> |        |                            |    | request for ops     |
>> |        |                            |    | (recover, dump)     |
>> |        |                            |    |                     |
>> |        |                            |  +-+------------------+  |
>> |        |     health report          |  | health handler     |  |
>> |        +------------------------------->                    |  |
>> |        |                            |  +--------------------+  |
>> |        |     health reporter create |                          |
>> |        +---------------------------->                          |
>> +--------+                            +--------------------------+
>> 
>> Available reporters:
>> In this patchset, three reporters of mlx5e driver are included. The FW
>> reporters implement devlink_health_reporter diagnostic, object dump and
>> recovery procedures. The TX reporter implements devlink_health_reporter
>> diagnostic and recovery procedures.
>> 
>> This RFC is based on the same concepts as previous RFC I sent earlier this
>> year: "[RFC PATCH iproute2-next] System specification health API". The API was
>> changed, also devlink code and mlx5e reporters were not available at the
>> previous RFC.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-03 13:31       ` Eran Ben Elisha
  2019-01-03 22:28         ` Jakub Kicinski
@ 2019-01-04  9:01         ` Jiri Pirko
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-01-04  9:01 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Jakub Kicinski, netdev, David S. Miller, Jiri Pirko,
	Moshe Shemesh, Aya Levin, Tal Alon, Ariel Almog

Thu, Jan 03, 2019 at 02:31:59PM CET, eranbe@mellanox.com wrote:
>
>
>On 1/3/2019 12:46 AM, Jakub Kicinski wrote:
>> On Tue, 1 Jan 2019 09:58:30 +0000, Eran Ben Elisha wrote:
>>> On 1/1/2019 3:47 AM, Jakub Kicinski wrote:

[...]

>
>> 
>>>> About the marshalling, I'm not sure it belongs in the kernel.  There is
>>>> precedent for adding interpretation of FW blobs in user space (ethtool).
>>>> IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
>>>> drivers.  Amount of code you add to print the simple example from last
>>>> patch is not inspiring confidence.
>>>
>>> The idea was to provide the developer the ability to create "tree-like"
>>> of information, it is needed when you want to describe complex objects.
>>> This caused a longer coding, but I don't think we are even close to the
>>> scale you are talking about.
>>> We can remove the tree flexibility, and move to array format, it will
>>> make the code of storing data by the driver to be shorter, but we will
>>> lose the ability to have it in tree-like format.
>> 
>> To be clear I slightly oppose the marshalling in the first place.  It
>> may be better to just dump the data as is, and have user space know
>> what the interpretation is.
>
>We provides a way to store the data in nested layers. In internal 
>discussions with Jiri, we decided that this is the correct approach.
>However, if one insists, it can fill the buffers with raw binary and 
>label it as such.

Again, the data is generated by driver. Driver knows the objects, it can
assemble a tree of values accordingly. To me it seems wrong to squash
the info into binary, push to userspace where it has to be parsed and
unsquashed back.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery system
  2019-01-03 22:28         ` Jakub Kicinski
@ 2019-01-04  9:12           ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2019-01-04  9:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eran Ben Elisha, netdev, David S. Miller, Jiri Pirko,
	Moshe Shemesh, Aya Levin, Tal Alon, Ariel Almog

Thu, Jan 03, 2019 at 11:28:34PM CET, jakub.kicinski@netronome.com wrote:
>On Thu, 3 Jan 2019 13:31:59 +0000, Eran Ben Elisha wrote:
>> Arch wise those are two different features which we shouldn't mix.
>> The region dump is aiming at dumping of information for monitoring of 
>> "HW memory" at real time, more like a dumb channel to provide memory 
>> chunks from HW to user.
>
>The "real time read" part of the region dump was not even implemented.
>And it was the part that made most sense to me.

Agreed. I believe that it was planned to be used for mlx5.


>
>Region snapshots were described as a tool for gathering crash dumps.
>See bedc989b0c98 ("net/mlx4_core: Add Crdump FW snapshot support").
>
>The "chunks from HW" is also incorrect as (1) current implementation of
>regions seem to mostly revolve around FW state and (2) there is nothing
>in the man page etc. that says HW.
>
>I'm not saying region snapshots fit the bill perfectly for you, I'm
>saying you guys are adding a second facility to do a very similar thing
>in the span of 6 months - how is it unreasonable of me to ask to
>consolidate?

If we would need to push binary, yes. But as I described in another
email, that is not the case.

>
>But I'm not gonna fight you any more on this, if nobody else cares.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-01-04  9:20 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-31 14:31 [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Eran Ben Elisha
2018-12-31 14:31 ` [PATCH RFC net-next 01/19] devlink: Add health buffer support Eran Ben Elisha
2018-12-31 14:31 ` [PATCH RFC net-next 02/19] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
2018-12-31 14:31 ` [PATCH RFC net-next 03/19] devlink: Add health report functionality Eran Ben Elisha
2018-12-31 14:31 ` [PATCH RFC net-next 04/19] devlink: Add health get command Eran Ben Elisha
2018-12-31 14:31 ` [PATCH RFC net-next 05/19] devlink: Add health set command Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 06/19] devlink: Add health recover command Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 07/19] devlink: Add health diagnose command Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 08/19] devlink: Add health objdump {get,clear} commands Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 09/19] net/mlx5e: Add TX reporter support Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 10/19] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 11/19] net/mlx5: Move all devlink related functions calls to devlink.c Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 12/19] net/mlx5: Refactor print health info Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 13/19] net/mlx5: Create FW devlink_health_reporter Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 14/19] net/mlx5: Add core dump register access functions Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 15/19] net/mlx5: Add support for FW reporter objdump Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 16/19] net/mlx5: Report devlink health on FW issues Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 17/19] net/mlx5: Add FW fatal devlink_health_reporter Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 18/19] net/mlx5: Report devlink health on FW fatal issues Eran Ben Elisha
2018-12-31 14:32 ` [PATCH RFC net-next 19/19] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
2019-01-01  1:47   ` Jakub Kicinski
2019-01-01 10:01     ` Eran Ben Elisha
2018-12-31 14:41 ` [PATCH RFC iproute2-next] devlink: Add health command support Aya Levin
2019-01-01  1:47 ` [PATCH RFC net-next 00/19] Devlink health reporting and recovery system Jakub Kicinski
2019-01-01  9:58   ` Eran Ben Elisha
2019-01-02 22:46     ` Jakub Kicinski
2019-01-03 13:31       ` Eran Ben Elisha
2019-01-03 22:28         ` Jakub Kicinski
2019-01-04  9:12           ` Jiri Pirko
2019-01-04  9:01         ` Jiri Pirko
2019-01-04  8:57   ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.