All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 00/27] Devlink health reporting and recovery system
@ 2019-01-17 15:04 Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 01/27] devlink: Add health buffer support Eran Ben Elisha
                   ` (28 more replies)
  0 siblings, 29 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:04 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

The health mechanism is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
  information.

The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.

The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
 - Recovery procedures
 - Diagnostics and object dump procedures
 - OOB initial attributes
Different parts of the driver can register different types of health reporters
with different handlers.

Once an error is reported, devlink health will do the following actions:
  * A log is being send to the kernel trace events buffer
  * Health status and statistics are being updated for the reporter instance
  * Object dump is being taken and saved at the reporter instance (as long as
    there is no other dump which is already stored)
  * Auto recovery attempt is being done. Depends on:
    - Auto-recovery configuration
    - Grace period vs. time passed since last recover

The user interface:
User can access/change each reporter attributes and driver specific callbacks
via devlink, e.g per error type (per health reporter)
 - Configure reporter's generic attributes (like: Disable/enable auto recovery)
 - Invoke recovery procedure
 - Run diagnostics
 - Object dump

The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
  Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
  Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
  Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
  Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
  Retrieves the last stored dump. Devlink health
  saves a single dump. If an dump is not already stored by the devlink
  for this reporter, devlink generates a new dump.
  dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
  Clears the last saved dump file for the specified reporter.


                                               netlink
                                      +--------------------------+
                                      |                          |
                                      |            +             |
                                      |            |             |
                                      +--------------------------+
                                                   |request for ops
                                                   |(diagnose,
 mlx5_core                             devlink     |recover,
                                                   |dump)
+--------+                            +--------------------------+
|        |                            |    reporter|             |
|        |                            |  +---------v----------+  |
|        |   ops execution            |  |                    |  |
|     <----------------------------------+                    |  |
|        |                            |  |                    |  |
|        |                            |  + ^------------------+  |
|        |                            |    | request for ops     |
|        |                            |    | (recover, dump)     |
|        |                            |    |                     |
|        |                            |  +-+------------------+  |
|        |     health report          |  | health handler     |  |
|        +------------------------------->                    |  |
|        |                            |  +--------------------+  |
|        |     health reporter create |                          |
|        +---------------------------->                          |
+--------+                            +--------------------------+

Available reporters:
In this patchset, three reporters of mlx5e driver are included. The FW
reporters implement devlink_health_reporter diagnostic, dump and
recovery procedures. The TX reporter implements devlink_health_reporter
diagnostic and recovery procedures.

In order to support CR space dump as part of FW fatal reporter, mlx5 support
for devlink region was added.

Alex Vesker (2):
  net/mlx5: Add Vendor Specific Capability access gateway
  net/mlx5: Add Crdump FW snapshot support

Aya Levin (1):
  devlink: Add Documentation/networking/devlink-health.txt

Eran Ben Elisha (11):
  devlink: Add health buffer support
  devlink: Add health reporter create/destroy functionality
  devlink: Add health report functionality
  devlink: Add health get command
  devlink: Add health set command
  devlink: Add health recover command
  devlink: Add health diagnose command
  devlink: Add health dump {get,clear} commands
  net/mlx5e: Add TX reporter support
  net/mlx5e: Add TX timeout support for mlx5e TX reporter
  net/mlx5: Move all devlink related functions calls to devlink.c

Feras Daoud (4):
  Documentation: mlx5: Update kernel documentation
  net/mlx5: Handle SW reset of FW in error flow
  net/mlx5: Control CR-space access by different PFs
  net/mlx5: Issue SW reset on FW assert

Moshe Shemesh (9):
  net/mlx5: Use devlink region_snapshot parameter
  net/mlx5: Refactor print health info
  net/mlx5: Create FW devlink_health_reporter
  net/mlx5: Add core dump register access functions
  net/mlx5: Add support for FW reporter dump
  net/mlx5: Report devlink health on FW issues
  net/mlx5: Add FW fatal devlink_health_reporter
  net/mlx5: Add support for FW fatal reporter dump
  net/mlx5: Report devlink health on FW fatal issues

 Documentation/networking/devlink-health.txt   |   86 ++
 Documentation/networking/mlx5.rst             |   39 +
 MAINTAINERS                                   |    1 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |    5 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  460 +++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |   22 +
 .../ethernet/mellanox/mlx5/core/diag/crdump.c |  208 ++++
 .../mellanox/mlx5/core/diag/fw_tracer.c       |   76 ++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |   11 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   18 +-
 .../ethernet/mellanox/mlx5/core/en/reporter.h |   15 +
 .../mellanox/mlx5/core/en/reporter_tx.c       |  356 ++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  186 +--
 .../ethernet/mellanox/mlx5/core/en_selftest.c |    2 +-
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |    2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |  291 ++++-
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    |    6 +
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c |  311 +++++
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |   33 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   18 +-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   15 +-
 include/linux/mlx5/device.h                   |   11 +-
 include/linux/mlx5/driver.h                   |   13 +-
 include/linux/mlx5/mlx5_ifc.h                 |   21 +-
 include/net/devlink.h                         |  144 +++
 include/trace/events/devlink.h                |   62 +
 include/uapi/linux/devlink.h                  |   25 +
 net/core/devlink.c                            | 1054 +++++++++++++++++
 28 files changed, 3262 insertions(+), 229 deletions(-)
 create mode 100644 Documentation/networking/devlink-health.txt
 create mode 100644 Documentation/networking/mlx5.rst
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net-next 01/27] devlink: Add health buffer support
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
@ 2019-01-17 15:04 ` Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 02/27] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:04 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Devlink health buffer is a mechanism to pass descriptors between drivers
and devlink. The API allows the driver to add objects, object pair,
value array (nested attributes), value and name.

Driver can use this API to fill the buffers in a format which can be
translated by the devlink to the netlink message.

In order to fulfill it, an internal buffer descriptor is defined. This
will hold the data and metadata per each attribute and by used to pass
actual commands to the netlink.

This mechanism will be later used in devlink health for dump and diagnose
data store by the drivers.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/net/devlink.h        |  76 ++++++
 include/uapi/linux/devlink.h |   8 +
 net/core/devlink.c           | 501 +++++++++++++++++++++++++++++++++++
 3 files changed, 585 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 67f4293bc970..77c77319290a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -423,6 +423,8 @@ struct devlink_region;
 
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
+struct devlink_health_buffer;
+
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
@@ -584,6 +586,22 @@ int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 				   u8 *data, u32 snapshot_id,
 				   devlink_snapshot_data_dest_t *data_destructor);
 
+int devlink_health_buffer_nest_start(struct devlink_health_buffer *buffer,
+				     int attrtype);
+void devlink_health_buffer_nest_end(struct devlink_health_buffer *buffer);
+void devlink_health_buffer_nest_cancel(struct devlink_health_buffer *buffer);
+int devlink_health_buffer_put_object_name(struct devlink_health_buffer *buffer,
+					  char *name);
+int devlink_health_buffer_put_value_u8(struct devlink_health_buffer *buffer,
+				       u8 value);
+int devlink_health_buffer_put_value_u32(struct devlink_health_buffer *buffer,
+					u32 value);
+int devlink_health_buffer_put_value_u64(struct devlink_health_buffer *buffer,
+					u64 value);
+int devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
+					   char *name);
+int devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
+					 void *data, int len);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -844,6 +862,64 @@ devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 	return 0;
 }
 
+static inline int
+devlink_health_buffer_nest_start(struct devlink_health_buffer *buffer,
+				 int attrtype)
+{
+	return 0;
+}
+
+static inline void
+devlink_health_buffer_nest_end(struct devlink_health_buffer *buffer)
+{
+}
+
+static inline void
+devlink_health_buffer_nest_cancel(struct devlink_health_buffer *buffer)
+{
+}
+
+static inline int
+devlink_health_buffer_put_object_name(struct devlink_health_buffer *buffer,
+				      char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_u8(struct devlink_health_buffer *buffer,
+				   u8 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_u32(struct devlink_health_buffer *buffer,
+				    u32 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_u64(struct devlink_health_buffer *buffer,
+				    u64 value)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
+				       char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
+				     void *data, int len)
+{
+	return 0;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6e52d3660654..cff0e0cb5ac2 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -285,6 +285,14 @@ enum devlink_attr {
 	DEVLINK_ATTR_REGION_CHUNK_ADDR,         /* u64 */
 	DEVLINK_ATTR_REGION_CHUNK_LEN,          /* u64 */
 
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,	/* u8 */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,	/* dynamic */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index abb0da9d7b4b..8984501edade 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3597,6 +3597,507 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	return 0;
 }
 
+#define DEVLINK_HEALTH_BUFFER_SIZE (4096 - GENL_HDRLEN)
+#define DEVLINK_HEALTH_BUFFER_DATA_SIZE (DEVLINK_HEALTH_BUFFER_SIZE / 2)
+#define DEVLINK_HEALTH_SIZE_TO_BUFFERS(size) DIV_ROUND_UP(size, DEVLINK_HEALTH_BUFFER_DATA_SIZE)
+#define DEVLINK_HEALTH_BUFFER_MAX_CHUNK 1024
+
+struct devlink_health_buffer {
+	void *data;
+	u64 offset;
+	u64 bytes_left;
+	u64 bytes_left_metadata;
+	u64 max_nested_depth;
+	u64 curr_nest;
+};
+
+struct devlink_health_buffer_desc {
+	int attrtype;
+	u16 len;
+	u8 nla_type;
+	u8 nest_end;
+	int value[0];
+};
+
+static void
+devlink_health_buffers_reset(struct devlink_health_buffer **buffers_list,
+			     u64 num_of_buffers)
+{
+	u64 i;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		memset(buffers_list[i]->data, 0, DEVLINK_HEALTH_BUFFER_SIZE);
+		buffers_list[i]->offset = 0;
+		buffers_list[i]->bytes_left = DEVLINK_HEALTH_BUFFER_DATA_SIZE;
+		buffers_list[i]->bytes_left_metadata =
+			DEVLINK_HEALTH_BUFFER_DATA_SIZE;
+		buffers_list[i]->max_nested_depth = 0;
+		buffers_list[i]->curr_nest = 0;
+	}
+}
+
+static void
+devlink_health_buffers_destroy(struct devlink_health_buffer **buffers_list,
+			       u64 size);
+
+static struct devlink_health_buffer **
+devlink_health_buffers_create(u64 size)
+{
+	struct devlink_health_buffer **buffers_list;
+	u64 num_of_buffers = DEVLINK_HEALTH_SIZE_TO_BUFFERS(size);
+	u64 i;
+
+	buffers_list = kcalloc(num_of_buffers,
+			       sizeof(struct devlink_health_buffer *),
+			       GFP_KERNEL);
+	if (!buffers_list)
+		return NULL;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		struct devlink_health_buffer *buffer;
+		void *data;
+
+		buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
+		data = kzalloc(DEVLINK_HEALTH_BUFFER_SIZE, GFP_KERNEL);
+		if (!buffer || !data) {
+			kfree(buffer);
+			kfree(data);
+			goto buffers_cleanup;
+		}
+		buffers_list[i] = buffer;
+		buffer->data = data;
+	}
+	devlink_health_buffers_reset(buffers_list, num_of_buffers);
+
+	return buffers_list;
+
+buffers_cleanup:
+	devlink_health_buffers_destroy(buffers_list, --i);
+	kfree(buffers_list);
+	return NULL;
+}
+
+static void
+devlink_health_buffers_destroy(struct devlink_health_buffer **buffers_list,
+			       u64 num_of_buffers)
+{
+	u64 i;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		kfree(buffers_list[i]->data);
+		kfree(buffers_list[i]);
+	}
+}
+
+void
+devlink_health_buffer_offset_inc(struct devlink_health_buffer *buffer,
+				 int len)
+{
+	buffer->offset += len;
+}
+
+/* In order to store a nest, need two descriptors, for start and end */
+#define DEVLINK_HEALTH_BUFFER_NEST_SIZE (sizeof(struct devlink_health_buffer_desc) * 2)
+
+int devlink_health_buffer_verify_len(struct devlink_health_buffer *buffer,
+				     int len, int metadata_len)
+{
+	if (len > DEVLINK_HEALTH_BUFFER_DATA_SIZE)
+		return -EINVAL;
+
+	if (buffer->bytes_left < len ||
+	    buffer->bytes_left_metadata < metadata_len)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static struct devlink_health_buffer_desc *
+devlink_health_buffer_get_desc_from_offset(struct devlink_health_buffer *buffer)
+{
+	return buffer->data + buffer->offset;
+}
+
+int
+devlink_health_buffer_nest_start(struct devlink_health_buffer *buffer,
+				 int attrtype)
+{
+	struct devlink_health_buffer_desc *desc;
+	int err;
+
+	err = devlink_health_buffer_verify_len(buffer, 0,
+					       DEVLINK_HEALTH_BUFFER_NEST_SIZE);
+	if (err)
+		return err;
+
+	if (attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT &&
+	    attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR &&
+	    attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE &&
+	    attrtype != DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY)
+		return -EINVAL;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+
+	desc->attrtype = attrtype;
+	buffer->bytes_left_metadata -= DEVLINK_HEALTH_BUFFER_NEST_SIZE;
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc));
+
+	buffer->curr_nest++;
+	buffer->max_nested_depth = max(buffer->max_nested_depth,
+				       buffer->curr_nest);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_nest_start);
+
+enum devlink_health_buffer_nest_end_cancel {
+	DEVLINK_HEALTH_BUFFER_NEST_END = 1,
+	DEVLINK_HEALTH_BUFFER_NEST_CANCEL,
+};
+
+static void
+devlink_health_buffer_nest_end_cancel(struct devlink_health_buffer *buffer,
+				      enum devlink_health_buffer_nest_end_cancel nest)
+{
+	struct devlink_health_buffer_desc *desc;
+
+	WARN_ON(!buffer->curr_nest);
+	buffer->curr_nest--;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc->nest_end = nest;
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc));
+}
+
+void devlink_health_buffer_nest_end(struct devlink_health_buffer *buffer)
+{
+	devlink_health_buffer_nest_end_cancel(buffer,
+					      DEVLINK_HEALTH_BUFFER_NEST_END);
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_nest_end);
+
+void devlink_health_buffer_nest_cancel(struct devlink_health_buffer *buffer)
+{
+	devlink_health_buffer_nest_end_cancel(buffer,
+					      DEVLINK_HEALTH_BUFFER_NEST_CANCEL);
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_nest_cancel);
+
+int
+devlink_health_buffer_put_object_name(struct devlink_health_buffer *buffer,
+				      char *name)
+{
+	struct devlink_health_buffer_desc *desc;
+	int err;
+
+	err = devlink_health_buffer_verify_len(buffer, strlen(name) + 1,
+					       sizeof(*desc));
+	if (err)
+		return err;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc->attrtype = DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME;
+	desc->nla_type = NLA_NUL_STRING;
+	desc->len = strlen(name) + 1;
+	memcpy(&desc->value, name, desc->len);
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc) + desc->len);
+
+	buffer->bytes_left_metadata -= sizeof(*desc);
+	buffer->bytes_left -= (strlen(name) + 1);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_object_name);
+
+static int
+devlink_health_buffer_put_value(struct devlink_health_buffer *buffer,
+				u8 nla_type, void *value, int len)
+{
+	struct devlink_health_buffer_desc *desc;
+	int err;
+
+	err = devlink_health_buffer_verify_len(buffer, len, sizeof(*desc));
+	if (err)
+		return err;
+
+	desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc->attrtype = DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA;
+	desc->nla_type = nla_type;
+	desc->len = len;
+	memcpy(&desc->value, value, len);
+	devlink_health_buffer_offset_inc(buffer, sizeof(*desc) + desc->len);
+
+	buffer->bytes_left_metadata -= sizeof(*desc);
+	buffer->bytes_left -= len;
+
+	return 0;
+}
+
+int
+devlink_health_buffer_put_value_u8(struct devlink_health_buffer *buffer,
+				   u8 value)
+{
+	int err;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_U8, &value,
+					      sizeof(value));
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_u8);
+
+int
+devlink_health_buffer_put_value_u32(struct devlink_health_buffer *buffer,
+				    u32 value)
+{
+	int err;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_U32, &value,
+					      sizeof(value));
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_u32);
+
+int
+devlink_health_buffer_put_value_u64(struct devlink_health_buffer *buffer,
+				    u64 value)
+{
+	int err;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_U64, &value,
+					      sizeof(value));
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_u64);
+
+int
+devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
+				       char *name)
+{
+	int err;
+
+	if (strlen(name) + 1 > DEVLINK_HEALTH_BUFFER_MAX_CHUNK)
+		return -EINVAL;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_NUL_STRING, name,
+					      strlen(name) + 1);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_string);
+
+int
+devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
+				     void *data, int len)
+{
+	int err;
+
+	if (len > DEVLINK_HEALTH_BUFFER_MAX_CHUNK)
+		return -EINVAL;
+
+	err = devlink_health_buffer_put_value(buffer, NLA_BINARY, data, len);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_buffer_put_value_data);
+
+static int
+devlink_health_buffer_fill_data(struct sk_buff *skb,
+				struct devlink_health_buffer_desc *desc)
+{
+	int err = -EINVAL;
+
+	switch (desc->nla_type) {
+	case NLA_U8:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+				 *(u8 *)desc->value);
+		break;
+	case NLA_U32:
+		err = nla_put_u32(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+				  *(u32 *)desc->value);
+		break;
+	case NLA_U64:
+		err = nla_put_u64_64bit(skb,
+					DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+					*(u64 *)desc->value, DEVLINK_ATTR_PAD);
+		break;
+	case NLA_NUL_STRING:
+		err = nla_put_string(skb,
+				     DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+				     (char *)&desc->value);
+		break;
+	case NLA_BINARY:
+		err = nla_put(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,
+			      desc->len, (void *)&desc->value);
+		break;
+	}
+
+	return err;
+}
+
+static int
+devlink_health_buffer_fill_type(struct sk_buff *skb,
+				struct devlink_health_buffer_desc *desc)
+{
+	int err = -EINVAL;
+
+	switch (desc->nla_type) {
+	case NLA_U8:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_U8);
+		break;
+	case NLA_U32:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_U32);
+		break;
+	case NLA_U64:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_U64);
+		break;
+	case NLA_NUL_STRING:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_NUL_STRING);
+		break;
+	case NLA_BINARY:
+		err = nla_put_u8(skb, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,
+				 NLA_BINARY);
+		break;
+	}
+
+	return err;
+}
+
+static inline struct devlink_health_buffer_desc *
+devlink_health_buffer_get_next_desc(struct devlink_health_buffer_desc *desc)
+{
+	return (void *)&desc->value + desc->len;
+}
+
+static int
+devlink_health_buffer_prepare_skb(struct sk_buff *skb,
+				  struct devlink_health_buffer *buffer)
+{
+	struct devlink_health_buffer_desc *last_desc, *desc;
+	struct nlattr **buffer_nlattr;
+	int err;
+	int i = 0;
+
+	buffer_nlattr = kcalloc(buffer->max_nested_depth,
+				sizeof(*buffer_nlattr), GFP_KERNEL);
+	if (!buffer_nlattr)
+		return -EINVAL;
+
+	last_desc = devlink_health_buffer_get_desc_from_offset(buffer);
+	desc = buffer->data;
+	while (desc != last_desc) {
+		switch (desc->attrtype) {
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT:
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR:
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE:
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY:
+			buffer_nlattr[i] = nla_nest_start(skb, desc->attrtype);
+			if (!buffer_nlattr[i])
+				goto nla_put_failure;
+			i++;
+			break;
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA:
+			err = devlink_health_buffer_fill_data(skb, desc);
+			if (err)
+				goto nla_put_failure;
+			err = devlink_health_buffer_fill_type(skb, desc);
+			if (err)
+				goto nla_put_failure;
+			break;
+		case DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME:
+			err = nla_put_string(skb, desc->attrtype,
+					     (char *)&desc->value);
+			if (err)
+				goto nla_put_failure;
+			break;
+		default:
+			WARN_ON(!desc->nest_end);
+			WARN_ON(i <= 0);
+			if (desc->nest_end == DEVLINK_HEALTH_BUFFER_NEST_END)
+				nla_nest_end(skb, buffer_nlattr[--i]);
+			else
+				nla_nest_cancel(skb, buffer_nlattr[--i]);
+			break;
+		}
+		desc = devlink_health_buffer_get_next_desc(desc);
+	}
+
+	return 0;
+
+nla_put_failure:
+	kfree(buffer_nlattr);
+	return err;
+}
+
+static int
+devlink_health_buffer_snd(struct genl_info *info,
+			  enum devlink_command cmd, int flags,
+			  struct devlink_health_buffer **buffers_array,
+			  u64 num_of_buffers)
+{
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	void *hdr;
+	int err;
+	u64 i;
+
+	for (i = 0; i < num_of_buffers; i++) {
+		/* Skip buffer if driver did not fill it up with any data */
+		if (!buffers_array[i]->offset)
+			continue;
+
+		skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+		if (!skb)
+			return -ENOMEM;
+
+		hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+				  &devlink_nl_family, NLM_F_MULTI, cmd);
+		if (!hdr)
+			goto nla_put_failure;
+
+		err = devlink_health_buffer_prepare_skb(skb, buffers_array[i]);
+		if (err)
+			goto nla_put_failure;
+
+		genlmsg_end(skb, hdr);
+		err = genlmsg_reply(skb, info);
+		if (err)
+			return err;
+	}
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+			NLMSG_DONE, 0, flags | NLM_F_MULTI);
+	err = genlmsg_reply(skb, info);
+	if (err)
+		return err;
+
+	return 0;
+
+nla_put_failure:
+	err = -EIO;
+	nlmsg_free(skb);
+	return err;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 02/27] devlink: Add health reporter create/destroy functionality
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 01/27] devlink: Add health buffer support Eran Ben Elisha
@ 2019-01-17 15:04 ` Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 03/27] devlink: Add health report functionality Eran Ben Elisha
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:04 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.

As part of the allocation, driver shall provide a set of callbacks which
will be used the devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.

For each reporter, devlink will hold a metadata of statistics,
buffers and status.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/net/devlink.h |  59 ++++++++++++++++++++
 net/core/devlink.c    | 127 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 77c77319290a..7fe30d67678a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -30,6 +30,7 @@ struct devlink {
 	struct list_head param_list;
 	struct list_head region_list;
 	u32 snapshot_id;
+	struct list_head reporter_list;
 	struct devlink_dpipe_headers *dpipe_headers;
 	const struct devlink_ops *ops;
 	struct device *dev;
@@ -424,6 +425,34 @@ struct devlink_region;
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
 struct devlink_health_buffer;
+struct devlink_health_reporter;
+
+/**
+ * struct devlink_health_reporter_ops - Reporter operations
+ * @name: reporter name
+ * dump_size: dump buffer size allocated by the devlink
+ * diagnose_size: diagnose buffer size allocated by the devlink
+ * recover: callback to recover from reported error
+ *          if priv_ctx is NULL, run a full recover
+ * dump: callback to dump an object
+ *       if priv_ctx is NULL, run a full dump
+ * diagnose: callback to diagnose the current status
+ */
+
+struct devlink_health_reporter_ops {
+	char *name;
+	unsigned int dump_size;
+	unsigned int diagnose_size;
+	int (*recover)(struct devlink_health_reporter *reporter,
+		       void *priv_ctx);
+	int (*dump)(struct devlink_health_reporter *reporter,
+		    struct devlink_health_buffer **buffers_array,
+		    unsigned int buffer_size, unsigned int num_buffers,
+		    void *priv_ctx);
+	int (*diagnose)(struct devlink_health_reporter *reporter,
+			struct devlink_health_buffer **buffers_array,
+			unsigned int buffer_size, unsigned int num_buffers);
+};
 
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
@@ -602,6 +631,16 @@ int devlink_health_buffer_put_value_string(struct devlink_health_buffer *buffer,
 					   char *name);
 int devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
 					 void *data, int len);
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv);
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -920,6 +959,26 @@ devlink_health_buffer_put_value_data(struct devlink_health_buffer *buffer,
 {
 	return 0;
 }
+
+static inline struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv)
+{
+	return NULL;
+}
+
+static inline void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+}
+
+static inline void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+	return NULL;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 8984501edade..fec169a28dba 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4098,6 +4098,132 @@ devlink_health_buffer_snd(struct genl_info *info,
 	return err;
 }
 
+struct devlink_health_reporter {
+	struct list_head list;
+	struct devlink_health_buffer **dump_buffers_array;
+	struct mutex dump_lock; /* lock parallel read/write from dump buffers */
+	struct devlink_health_buffer **diagnose_buffers_array;
+	struct mutex diagnose_lock; /* lock parallel read/write from diagnose buffers */
+	void *priv;
+	const struct devlink_health_reporter_ops *ops;
+	struct devlink *devlink;
+	u64 graceful_period;
+	bool auto_recover;
+	u8 health_state;
+};
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+	return reporter->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_priv);
+
+static struct devlink_health_reporter *
+devlink_health_reporter_find_by_name(struct devlink *devlink,
+				     const char *reporter_name)
+{
+	struct devlink_health_reporter *reporter;
+
+	list_for_each_entry(reporter, &devlink->reporter_list, list)
+		if (!strcmp(reporter->ops->name, reporter_name))
+			return reporter;
+	return NULL;
+}
+
+/**
+ *	devlink_health_reporter_create - create devlink health reporter
+ *
+ *	@devlink: devlink
+ *	@ops: ops
+ *	@graceful_period: to avoid recovery loops, in msecs
+ *	@auto_recover: auto recover when error occurs
+ *	@priv: priv
+ */
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+			       const struct devlink_health_reporter_ops *ops,
+			       u64 graceful_period, bool auto_recover,
+			       void *priv)
+{
+	struct devlink_health_reporter *reporter;
+
+	mutex_lock(&devlink->lock);
+	if (devlink_health_reporter_find_by_name(devlink, ops->name)) {
+		reporter = ERR_PTR(-EEXIST);
+		goto unlock;
+	}
+
+	if (WARN_ON(ops->dump && !ops->dump_size) ||
+	    WARN_ON(ops->diagnose && !ops->diagnose_size) ||
+	    WARN_ON(auto_recover && !ops->recover) ||
+	    WARN_ON(graceful_period && !ops->recover)) {
+		reporter = ERR_PTR(-EINVAL);
+		goto unlock;
+	}
+
+	reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
+	if (!reporter) {
+		reporter = ERR_PTR(-ENOMEM);
+		goto unlock;
+	}
+
+	if (ops->dump) {
+		reporter->dump_buffers_array =
+			devlink_health_buffers_create(ops->dump_size);
+		if (!reporter->dump_buffers_array) {
+			kfree(reporter);
+			reporter = ERR_PTR(-ENOMEM);
+			goto unlock;
+		}
+	}
+
+	if (ops->diagnose) {
+		reporter->diagnose_buffers_array =
+			devlink_health_buffers_create(ops->diagnose_size);
+		if (!reporter->diagnose_buffers_array) {
+			devlink_health_buffers_destroy(reporter->dump_buffers_array,
+						       DEVLINK_HEALTH_SIZE_TO_BUFFERS(ops->dump_size));
+			kfree(reporter);
+			reporter = ERR_PTR(-ENOMEM);
+			goto unlock;
+		}
+	}
+
+	list_add_tail(&reporter->list, &devlink->reporter_list);
+	mutex_init(&reporter->dump_lock);
+	mutex_init(&reporter->diagnose_lock);
+
+	reporter->priv = priv;
+	reporter->ops = ops;
+	reporter->devlink = devlink;
+	reporter->graceful_period = graceful_period;
+	reporter->auto_recover = auto_recover;
+unlock:
+	mutex_unlock(&devlink->lock);
+	return reporter;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_create);
+
+/**
+ *	devlink_health_reporter_destroy - destroy devlink health reporter
+ *
+ *	@reporter: devlink health reporter to destroy
+ */
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+	mutex_lock(&reporter->devlink->lock);
+	list_del(&reporter->list);
+	devlink_health_buffers_destroy(reporter->dump_buffers_array,
+				       DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->dump_size));
+	devlink_health_buffers_destroy(reporter->diagnose_buffers_array,
+				       DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->diagnose_size));
+	kfree(reporter);
+	mutex_unlock(&reporter->devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4383,6 +4509,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	INIT_LIST_HEAD(&devlink->resource_list);
 	INIT_LIST_HEAD(&devlink->param_list);
 	INIT_LIST_HEAD(&devlink->region_list);
+	INIT_LIST_HEAD(&devlink->reporter_list);
 	mutex_init(&devlink->lock);
 	return devlink;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 03/27] devlink: Add health report functionality
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 01/27] devlink: Add health buffer support Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 02/27] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
@ 2019-01-17 15:04 ` Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 04/27] devlink: Add health get command Eran Ben Elisha
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:04 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.

Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
  as there is no other dump which is already stored)
* Auto recovery attempt is being done. depends on:
  - Auto Recovery configuration
  - Grace period vs. time since last recover

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/net/devlink.h          |  9 ++++
 include/trace/events/devlink.h | 62 +++++++++++++++++++++++
 net/core/devlink.c             | 93 ++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7fe30d67678a..a81a1b7a67d7 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -641,6 +641,8 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
 
 void *
 devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+int devlink_health_report(struct devlink_health_reporter *reporter,
+			  const char *msg, void *priv_ctx);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -979,6 +981,13 @@ devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
 {
 	return NULL;
 }
+
+static inline int
+devlink_health_report(struct devlink_health_reporter *reporter,
+		      const char *msg, void *priv_ctx)
+{
+	return 0;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 44acfbca1266..7e39d2fc7c75 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -46,6 +46,65 @@ TRACE_EVENT(devlink_hwmsg,
 		  (int) __entry->len, __get_dynamic_array(buf), __entry->len)
 );
 
+TRACE_EVENT(devlink_health_report,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 const char *msg),
+
+	TP_ARGS(devlink, reporter_name, msg),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, msg)
+		__string(msg, msg)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__assign_str(msg, msg);
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: %s",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __get_str(msg))
+);
+
+TRACE_EVENT(devlink_health_recover_aborted,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 bool health_state, u64 time_since_last_recover),
+
+	TP_ARGS(devlink, reporter_name, health_state, time_since_last_recover),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, reporter_name)
+		__field(bool, health_state)
+		__field(u64, time_since_last_recover)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__entry->health_state = health_state;
+		__entry->time_since_last_recover = time_since_last_recover;
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: health_state=%d time_since_last_recover = %llu recover aborted",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __entry->health_state,
+		  __entry->time_since_last_recover)
+);
+
 #endif /* _TRACE_DEVLINK_H */
 
 /* This part must be outside protection */
@@ -64,6 +123,9 @@ static inline void trace_devlink_hwmsg(const struct devlink *devlink,
 {
 }
 
+static inline void trace_devlink_health(const char *msg)
+{
+}
 #endif /* _TRACE_DEVLINK_H */
 
 #endif
diff --git a/net/core/devlink.c b/net/core/devlink.c
index fec169a28dba..943d3e7dea6a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4110,6 +4110,16 @@ struct devlink_health_reporter {
 	u64 graceful_period;
 	bool auto_recover;
 	u8 health_state;
+	u8 dump_avail;
+	u64 dump_ts;
+	u64 error_count;
+	u64 recovery_count;
+	u64 last_recovery_ts;
+};
+
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
 };
 
 void *
@@ -4224,6 +4234,89 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
 }
 EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
 
+static int
+devlink_health_reporter_recover(struct devlink_health_reporter *reporter,
+				void *priv_ctx)
+{
+	int err;
+
+	if (!reporter->ops->recover)
+		return -EOPNOTSUPP;
+
+	err = reporter->ops->recover(reporter, priv_ctx);
+	if (err)
+		return err;
+
+	reporter->recovery_count++;
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+	reporter->last_recovery_ts = jiffies;
+
+	return 0;
+}
+
+static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
+				  void *priv_ctx)
+{
+	int err;
+
+	if (!reporter->ops->dump)
+		return 0;
+
+	if (reporter->dump_avail)
+		return 0;
+
+	devlink_health_buffers_reset(reporter->dump_buffers_array,
+				     DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->dump_size));
+	err = reporter->ops->dump(reporter, reporter->dump_buffers_array,
+				     DEVLINK_HEALTH_BUFFER_SIZE,
+				     DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->dump_size),
+				     priv_ctx);
+	if (!err) {
+		reporter->dump_avail = true;
+		reporter->dump_ts = jiffies;
+	}
+
+	return err;
+}
+
+int devlink_health_report(struct devlink_health_reporter *reporter,
+			  const char *msg, void *priv_ctx)
+{
+	struct devlink *devlink = reporter->devlink;
+	int err = 0;
+
+	/* write a log message of the current error */
+	WARN_ON(!msg);
+	trace_devlink_health_report(devlink, reporter->ops->name, msg);
+	reporter->error_count++;
+
+	/* abort if the previous error wasn't recovered */
+	if (reporter->auto_recover &&
+	    (reporter->health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
+	     jiffies - reporter->last_recovery_ts <
+	     msecs_to_jiffies(reporter->graceful_period))) {
+		trace_devlink_health_recover_aborted(devlink,
+						     reporter->ops->name,
+						     reporter->health_state,
+						     jiffies -
+						     reporter->last_recovery_ts);
+		return -ECANCELED;
+	}
+
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
+
+	mutex_lock(&reporter->dump_lock);
+	/* store current dump of current error, for later analysis */
+	devlink_health_do_dump(reporter, priv_ctx);
+	mutex_unlock(&reporter->dump_lock);
+
+	if (reporter->auto_recover)
+		err = devlink_health_reporter_recover(reporter, priv_ctx);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_health_report);
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 04/27] devlink: Add health get command
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (2 preceding siblings ...)
  2019-01-17 15:04 ` [PATCH net-next 03/27] devlink: Add health report functionality Eran Ben Elisha
@ 2019-01-17 15:04 ` Eran Ben Elisha
  2019-01-17 15:04 ` [PATCH net-next 05/27] devlink: Add health set command Eran Ben Elisha
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:04 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/uapi/linux/devlink.h |  12 +++
 net/core/devlink.c           | 152 +++++++++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cff0e0cb5ac2..c05470578b99 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -89,6 +89,8 @@ enum devlink_command {
 	DEVLINK_CMD_REGION_DEL,
 	DEVLINK_CMD_REGION_READ,
 
+	DEVLINK_CMD_HEALTH_REPORTER_GET,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -293,6 +295,16 @@ enum devlink_attr {
 	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,	/* u8 */
 	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,	/* dynamic */
 
+	DEVLINK_ATTR_HEALTH_REPORTER,			/* nested */
+	DEVLINK_ATTR_HEALTH_REPORTER_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_REPORTER_STATE,		/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_ERR,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_AVAIL,	/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 943d3e7dea6a..2ba9275449c2 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4317,6 +4317,149 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 }
 EXPORT_SYMBOL_GPL(devlink_health_report);
 
+static struct devlink_health_reporter *
+devlink_health_reporter_get_from_info(struct devlink *devlink,
+				      struct genl_info *info)
+{
+	char *reporter_name;
+
+	if (!info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME])
+		return NULL;
+
+	reporter_name =
+		nla_data(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]);
+	return devlink_health_reporter_find_by_name(devlink, reporter_name);
+}
+
+static int
+devlink_nl_health_reporter_fill(struct sk_buff *msg,
+				struct devlink *devlink,
+				struct devlink_health_reporter *reporter,
+				enum devlink_command cmd, u32 portid,
+				u32 seq, int flags)
+{
+	struct nlattr *reporter_attr;
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto genlmsg_cancel;
+
+	reporter_attr = nla_nest_start(msg, DEVLINK_ATTR_HEALTH_REPORTER);
+	if (!reporter_attr)
+		goto genlmsg_cancel;
+	if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+			   reporter->ops->name))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE,
+		       reporter->health_state))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR,
+			      reporter->error_count, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,
+			      reporter->recovery_count, DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
+			      reporter->graceful_period,
+			      DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
+		       reporter->auto_recover))
+		goto reporter_nest_cancel;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_AVAIL,
+		       reporter->dump_avail))
+		goto reporter_nest_cancel;
+	if (reporter->dump_avail &&
+	    nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,
+			      jiffies_to_msecs(reporter->dump_ts),
+			      DEVLINK_ATTR_PAD))
+		goto reporter_nest_cancel;
+
+	nla_nest_end(msg, reporter_attr);
+	genlmsg_end(msg, hdr);
+	return 0;
+
+reporter_nest_cancel:
+	nla_nest_end(msg, reporter_attr);
+genlmsg_cancel:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb,
+						   struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	struct sk_buff *msg;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_health_reporter_fill(msg, devlink, reporter,
+					      DEVLINK_CMD_HEALTH_REPORTER_GET,
+					      info->snd_portid, info->snd_seq,
+					      0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int
+devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink_health_reporter *reporter;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(reporter, &devlink->reporter_list,
+				    list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_health_reporter_fill(msg, devlink,
+							      reporter,
+							      DEVLINK_CMD_HEALTH_REPORTER_GET,
+							      NETLINK_CB(cb->skb).portid,
+							      cb->nlh->nlmsg_seq,
+							      NLM_F_MULTI);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4342,6 +4485,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
 	[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -4562,6 +4706,14 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_GET,
+		.doit = devlink_nl_cmd_health_reporter_get_doit,
+		.dumpit = devlink_nl_cmd_health_reporter_get_dumpit,
+		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+		/* can be retrieved by unprivileged users */
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 05/27] devlink: Add health set command
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (3 preceding siblings ...)
  2019-01-17 15:04 ` [PATCH net-next 04/27] devlink: Add health get command Eran Ben Elisha
@ 2019-01-17 15:04 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 06/27] devlink: Add health recover command Eran Ben Elisha
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:04 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
		receiving error for the reporter

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index c05470578b99..49ad5a76b121 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -90,6 +90,7 @@ enum devlink_command {
 	DEVLINK_CMD_REGION_READ,
 
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_SET,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 2ba9275449c2..a34414bf1c27 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4460,6 +4460,33 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
 	return msg->len;
 }
 
+static int
+devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->recover &&
+	    (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
+	     info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+		return -EINVAL;
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+		reporter->graceful_period =
+			nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+
+	if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
+		reporter->auto_recover =
+			nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
+
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4486,6 +4513,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -4714,6 +4743,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 		/* can be retrieved by unprivileged users */
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_SET,
+		.doit = devlink_nl_cmd_health_reporter_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 06/27] devlink: Add health recover command
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (4 preceding siblings ...)
  2019-01-17 15:04 ` [PATCH net-next 05/27] devlink: Add health set command Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 07/27] devlink: Add health diagnose command Eran Ben Elisha
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 49ad5a76b121..1c186fd77343 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -91,6 +91,7 @@ enum devlink_command {
 
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
+	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index a34414bf1c27..b224d0d31c0c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4487,6 +4487,19 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
 	return 0;
 }
 
+static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
+						       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	return devlink_health_reporter_recover(reporter, NULL);
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4750,6 +4763,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+		.doit = devlink_nl_cmd_health_reporter_recover_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 07/27] devlink: Add health diagnose command
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (5 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 06/27] devlink: Add health recover command Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 08/27] devlink: Add health dump {get,clear} commands Eran Ben Elisha
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.

It is expected from driver's callback for diagnose command to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 51 ++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 1c186fd77343..51b4d7612cf8 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -92,6 +92,7 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_GET,
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
 	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index b224d0d31c0c..57252ca31e1e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4500,6 +4500,50 @@ static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
 	return devlink_health_reporter_recover(reporter, NULL);
 }
 
+static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
+							struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	u64 num_of_buffers;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->diagnose)
+		return -EOPNOTSUPP;
+
+	num_of_buffers =
+		DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->diagnose_size);
+
+	mutex_lock(&reporter->diagnose_lock);
+	devlink_health_buffers_reset(reporter->diagnose_buffers_array,
+				     num_of_buffers);
+
+	err = reporter->ops->diagnose(reporter,
+				      reporter->diagnose_buffers_array,
+				      DEVLINK_HEALTH_BUFFER_SIZE,
+				      num_of_buffers);
+	if (err)
+		goto out;
+
+	err = devlink_health_buffer_snd(info,
+					DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+					0, reporter->diagnose_buffers_array,
+					num_of_buffers);
+	if (err)
+		goto out;
+
+	mutex_unlock(&reporter->diagnose_lock);
+	return 0;
+
+out:
+	mutex_unlock(&reporter->diagnose_lock);
+	return err;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4770,6 +4814,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+		.doit = devlink_nl_cmd_health_reporter_diagnose_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 08/27] devlink: Add health dump {get,clear} commands
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (6 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 07/27] devlink: Add health diagnose command Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 09/27] net/mlx5e: Add TX reporter support Eran Ben Elisha
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Add devlink health dump commands, in order to run an dump operation
over a specific reporter.

The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.

It is expected from driver's callback for diagnose command to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
---
 include/uapi/linux/devlink.h |  2 +
 net/core/devlink.c           | 74 ++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 51b4d7612cf8..6b26bb2ce4dc 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -93,6 +93,8 @@ enum devlink_command {
 	DEVLINK_CMD_HEALTH_REPORTER_SET,
 	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
 	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
 
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 57252ca31e1e..9ce07aa325a0 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4544,6 +4544,65 @@ static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
 	return err;
 }
 
+static void
+devlink_health_dump_clear(struct devlink_health_reporter *reporter)
+{
+	reporter->dump_avail = false;
+	reporter->dump_ts = 0;
+	devlink_health_buffers_reset(reporter->dump_buffers_array,
+				     DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->dump_size));
+}
+
+static int devlink_nl_cmd_health_reporter_dump_get_doit(struct sk_buff *skb,
+							struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+	u64 num_of_buffers;
+	int err;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	if (!reporter->ops->dump)
+		return -EOPNOTSUPP;
+
+	num_of_buffers =
+		DEVLINK_HEALTH_SIZE_TO_BUFFERS(reporter->ops->dump_size);
+
+	mutex_lock(&reporter->dump_lock);
+	err = devlink_health_do_dump(reporter, NULL);
+	if (err)
+		goto out;
+
+	err = devlink_health_buffer_snd(info,
+					DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+					0, reporter->dump_buffers_array,
+					num_of_buffers);
+
+out:
+	mutex_unlock(&reporter->dump_lock);
+	return err;
+}
+
+static int
+devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb,
+					       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_health_reporter *reporter;
+
+	reporter = devlink_health_reporter_get_from_info(devlink, info);
+	if (!reporter)
+		return -EINVAL;
+
+	mutex_lock(&reporter->dump_lock);
+	devlink_health_dump_clear(reporter);
+	mutex_unlock(&reporter->dump_lock);
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4821,6 +4880,21 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+		.doit = devlink_nl_cmd_health_reporter_dump_get_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+				  DEVLINK_NL_FLAG_NO_LOCK,
+	},
+	{
+		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
+		.doit = devlink_nl_cmd_health_reporter_dump_clear_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 09/27] net/mlx5e: Add TX reporter support
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (7 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 08/27] devlink: Add health dump {get,clear} commands Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 10/27] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of TX errors.
This patch declares the TX reporter operations and allocate it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full TX recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  18 +-
 .../ethernet/mellanox/mlx5/core/en/reporter.h |  14 +
 .../mellanox/mlx5/core/en/reporter_tx.c       | 321 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 135 +-------
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   |   2 +-
 6 files changed, 367 insertions(+), 125 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9de9abacf7f6..6bb2a860b15b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -22,7 +22,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 #
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
-		en_selftest.o en/port.o en/monitor_stats.o
+		en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8fa8fdd30b85..27e276c9bf84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -388,10 +388,7 @@ struct mlx5e_txqsq {
 	struct mlx5e_channel      *channel;
 	int                        txq_ix;
 	u32                        rate_limit;
-	struct mlx5e_txqsq_recover {
-		struct work_struct         recover_work;
-		u64                        last_recover;
-	} recover;
+	struct work_struct         recover_work;
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_dma_info {
@@ -682,6 +679,13 @@ struct mlx5e_rss_params {
 	u8	hfunc;
 };
 
+struct mlx5e_modify_sq_param {
+	int curr_state;
+	int next_state;
+	int rl_update;
+	int rl_index;
+};
+
 struct mlx5e_priv {
 	/* priv data path fields - start */
 	struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -737,6 +741,7 @@ struct mlx5e_priv {
 #ifdef CONFIG_MLX5_EN_TLS
 	struct mlx5e_tls          *tls;
 #endif
+	struct devlink_health_reporter *tx_reporter;
 };
 
 struct mlx5e_profile {
@@ -866,6 +871,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
 void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
 			       struct mlx5e_params *params);
 
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+		    struct mlx5e_modify_sq_param *p);
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_tx_disable_queue(struct netdev_queue *txq);
+
 static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
 {
 	return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
new file mode 100644
index 000000000000..74083e120ab3
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies. */
+
+#ifndef __MLX5E_EN_REPORTER_H
+#define __MLX5E_EN_REPORTER_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
new file mode 100644
index 000000000000..9800df4909c2
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -0,0 +1,321 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies. */
+
+#include <net/devlink.h>
+#include "reporter.h"
+#include "lib/eq.h"
+
+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
+
+struct mlx5e_tx_err_ctx {
+	int (*recover)(struct mlx5e_txqsq *sq);
+	struct mlx5e_txqsq *sq;
+};
+
+static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+{
+	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
+
+	while (time_before(jiffies, exp_time)) {
+		if (sq->cc == sq->pc)
+			return 0;
+
+		msleep(20);
+	}
+
+	netdev_err(sq->channel->netdev,
+		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
+		   sq->sqn, sq->cc, sq->pc);
+
+	return -ETIMEDOUT;
+}
+
+static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
+{
+	WARN_ONCE(sq->cc != sq->pc,
+		  "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
+		  sq->sqn, sq->cc, sq->pc);
+	sq->cc = 0;
+	sq->dma_fifo_cc = 0;
+	sq->pc = 0;
+}
+
+static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+{
+	struct mlx5_core_dev *mdev = sq->channel->mdev;
+	struct net_device *dev = sq->channel->netdev;
+	struct mlx5e_modify_sq_param msp = {0};
+	int err;
+
+	msp.curr_state = curr_state;
+	msp.next_state = MLX5_SQC_STATE_RST;
+
+	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
+		return err;
+	}
+
+	memset(&msp, 0, sizeof(msp));
+	msp.curr_state = MLX5_SQC_STATE_RST;
+	msp.next_state = MLX5_SQC_STATE_RDY;
+
+	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+	if (err) {
+		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
+		return err;
+	}
+
+	return 0;
+}
+
+static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_core_dev *mdev = sq->channel->mdev;
+	struct net_device *dev = sq->channel->netdev;
+	u8 state;
+	int err;
+
+	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+		return 0;
+
+	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
+	if (err) {
+		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
+			   sq->sqn, err);
+		return err;
+	}
+
+	if (state != MLX5_RQC_STATE_ERR) {
+		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
+		return -EINVAL;
+	}
+
+	mlx5e_tx_disable_queue(sq->txq);
+
+	err = mlx5e_wait_for_sq_flush(sq);
+	if (err)
+		return err;
+
+	/* At this point, no new packets will arrive from the stack as TXQ is
+	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
+	 * pending WQEs.  SQ can safely reset the SQ.
+	 */
+
+	err = mlx5e_sq_to_ready(sq, state);
+	if (err)
+		return err;
+
+	mlx5e_reset_txqsq_cc_pc(sq);
+	sq->stats->recover++;
+	mlx5e_activate_txqsq(sq);
+
+	return 0;
+}
+
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
+{
+	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+	struct mlx5e_tx_err_ctx err_ctx = {0};
+
+	err_ctx.sq       = sq;
+	err_ctx.recover  = mlx5e_tx_reporter_err_cqe_recover;
+	sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
+
+	devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+			      &err_ctx);
+}
+
+/* state lock cannot be grabbed within this function.
+ * It can cause a dead lock or a read-after-free.
+ */
+int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
+{
+	return err_ctx->recover(err_ctx->sq);
+}
+
+static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
+{
+	int err;
+
+	mutex_lock(&priv->state_lock);
+	mlx5e_close_locked(priv->netdev);
+	err = mlx5e_open_locked(priv->netdev);
+	mutex_unlock(&priv->state_lock);
+
+	return err;
+}
+
+static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
+				     void *context)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	struct mlx5e_tx_err_ctx *err_ctx = context;
+
+	return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
+			 mlx5e_tx_reporter_recover_all(priv);
+}
+
+static int
+mlx5e_tx_reporter_build_diagnose_output(struct devlink_health_buffer *buffer,
+					u32 sqn, u8 state, u8 stopped)
+{
+	int err, i;
+	int nest = 0;
+	char name[20];
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	sprintf(name, "SQ 0x%x", sqn);
+	err = devlink_health_buffer_put_object_name(buffer, name);
+	if (err)
+		goto buffer_error;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_object_name(buffer, "HW state");
+	if (err)
+		goto buffer_error;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_value_u8(buffer, state);
+	if (err)
+		goto buffer_error;
+
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE */
+	nest--;
+
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR */
+	nest--;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_object_name(buffer, "stopped");
+	if (err)
+		goto buffer_error;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto buffer_error;
+	nest++;
+
+	err = devlink_health_buffer_put_value_u8(buffer, stopped);
+	if (err)
+		goto buffer_error;
+
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_end(buffer);
+
+	return 0;
+
+buffer_error:
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_cancel(buffer);
+	return err;
+}
+
+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
+				      struct devlink_health_buffer **buffers_array,
+				      unsigned int buffer_size,
+				      unsigned int num_buffers)
+{
+	struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+	unsigned int buff = 0;
+	int i = 0, err = 0;
+
+	if (buffer_size < MLX5E_TX_REPORTER_PER_SQ_MAX_LEN)
+		return -ENOMEM;
+
+	mutex_lock(&priv->state_lock);
+
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+		mutex_unlock(&priv->state_lock);
+		return 0;
+	}
+
+	while (i < priv->channels.num * priv->channels.params.num_tc) {
+		struct mlx5e_txqsq *sq = priv->txq2sq[i];
+		u8 state;
+
+		err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+		if (err)
+			break;
+
+		err = mlx5e_tx_reporter_build_diagnose_output(buffers_array[buff],
+							      sq->sqn, state,
+							      netif_xmit_stopped(sq->txq));
+		if (err) {
+			if (++buff == num_buffers)
+				break;
+		} else {
+			i++;
+		}
+	}
+
+	mutex_unlock(&priv->state_lock);
+	return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
+		.name = "TX",
+		.recover = mlx5e_tx_reporter_recover,
+		.diagnose_size = MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC *
+				 MLX5E_TX_REPORTER_PER_SQ_MAX_LEN,
+		.diagnose = mlx5e_tx_reporter_diagnose,
+		.dump_size = 0,
+		.dump = NULL,
+};
+
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct devlink *devlink = priv_to_devlink(mdev);
+
+	priv->tx_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
+					       MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+					       true, priv);
+	return PTR_ERR_OR_ZERO(priv->tx_reporter);
+}
+
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
+{
+	devlink_health_reporter_destroy(priv->tx_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8cfd2ec7c0a2..4c4507384fed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -51,6 +51,7 @@
 #include "en/xdp.h"
 #include "lib/eq.h"
 #include "en/monitor_stats.h"
+#include "en/reporter.h"
 
 struct mlx5e_rq_param {
 	u32			rqc[MLX5_ST_SZ_DW(rqc)];
@@ -1160,7 +1161,7 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
 	return 0;
 }
 
-static void mlx5e_sq_recover(struct work_struct *work);
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
 static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 			     int txq_ix,
 			     struct mlx5e_params *params,
@@ -1182,7 +1183,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
 	sq->uar_map   = mdev->mlx5e_res.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
 	sq->stats     = &c->priv->channel_stats[c->ix].sq[tc];
-	INIT_WORK(&sq->recover.recover_work, mlx5e_sq_recover);
+	INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work);
 	if (MLX5_IPSEC_DEV(c->priv->mdev))
 		set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
 	if (mlx5_accel_is_tls_device(c->priv->mdev))
@@ -1270,15 +1271,8 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
 	return err;
 }
 
-struct mlx5e_modify_sq_param {
-	int curr_state;
-	int next_state;
-	bool rl_update;
-	int rl_index;
-};
-
-static int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
-			   struct mlx5e_modify_sq_param *p)
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+		    struct mlx5e_modify_sq_param *p)
 {
 	void *in;
 	void *sqc;
@@ -1376,17 +1370,7 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 	return err;
 }
 
-static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
-{
-	WARN_ONCE(sq->cc != sq->pc,
-		  "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
-		  sq->sqn, sq->cc, sq->pc);
-	sq->cc = 0;
-	sq->dma_fifo_cc = 0;
-	sq->pc = 0;
-}
-
-static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 {
 	sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
@@ -1395,7 +1379,7 @@ static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 	netif_tx_start_queue(sq->txq);
 }
 
-static inline void netif_tx_disable_queue(struct netdev_queue *txq)
+void mlx5e_tx_disable_queue(struct netdev_queue *txq)
 {
 	__netif_tx_lock_bh(txq);
 	netif_tx_stop_queue(txq);
@@ -1411,7 +1395,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
 	/* prevent netif_tx_wake_queue */
 	napi_synchronize(&c->napi);
 
-	netif_tx_disable_queue(sq->txq);
+	mlx5e_tx_disable_queue(sq->txq);
 
 	/* last doorbell out, godspeed .. */
 	if (mlx5e_wqc_has_room_for(wq, sq->cc, sq->pc, 1)) {
@@ -1431,6 +1415,7 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	struct mlx5_rate_limit rl = {0};
 
 	cancel_work_sync(&sq->dim.work);
+	cancel_work_sync(&sq->recover_work);
 	mlx5e_destroy_sq(mdev, sq->sqn);
 	if (sq->rate_limit) {
 		rl.rate = sq->rate_limit;
@@ -1440,105 +1425,15 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
 	mlx5e_free_txqsq(sq);
 }
 
-static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 {
-	unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
-
-	while (time_before(jiffies, exp_time)) {
-		if (sq->cc == sq->pc)
-			return 0;
-
-		msleep(20);
-	}
-
-	netdev_err(sq->channel->netdev,
-		   "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
-		   sq->sqn, sq->cc, sq->pc);
-
-	return -ETIMEDOUT;
-}
-
-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
-{
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	struct mlx5e_modify_sq_param msp = {0};
-	int err;
-
-	msp.curr_state = curr_state;
-	msp.next_state = MLX5_SQC_STATE_RST;
-
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
-		return err;
-	}
-
-	memset(&msp, 0, sizeof(msp));
-	msp.curr_state = MLX5_SQC_STATE_RST;
-	msp.next_state = MLX5_SQC_STATE_RDY;
+	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
+					      recover_work);
 
-	err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
-	if (err) {
-		netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
-		return err;
-	}
-
-	return 0;
-}
-
-static void mlx5e_sq_recover(struct work_struct *work)
-{
-	struct mlx5e_txqsq_recover *recover =
-		container_of(work, struct mlx5e_txqsq_recover,
-			     recover_work);
-	struct mlx5e_txqsq *sq = container_of(recover, struct mlx5e_txqsq,
-					      recover);
-	struct mlx5_core_dev *mdev = sq->channel->mdev;
-	struct net_device *dev = sq->channel->netdev;
-	u8 state;
-	int err;
-
-	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
-	if (err) {
-		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
-			   sq->sqn, err);
-		return;
-	}
-
-	if (state != MLX5_RQC_STATE_ERR) {
-		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
-		return;
-	}
-
-	netif_tx_disable_queue(sq->txq);
-
-	if (mlx5e_wait_for_sq_flush(sq))
-		return;
-
-	/* If the interval between two consecutive recovers per SQ is too
-	 * short, don't recover to avoid infinite loop of ERR_CQE -> recover.
-	 * If we reached this state, there is probably a bug that needs to be
-	 * fixed. let's keep the queue close and let tx timeout cleanup.
-	 */
-	if (jiffies_to_msecs(jiffies - recover->last_recover) <
-	    MLX5E_SQ_RECOVER_MIN_INTERVAL) {
-		netdev_err(dev, "Recover SQ 0x%x canceled, too many error CQEs\n",
-			   sq->sqn);
-		return;
-	}
-
-	/* At this point, no new packets will arrive from the stack as TXQ is
-	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
-	 * pending WQEs.  SQ can safely reset the SQ.
-	 */
-	if (mlx5e_sq_to_ready(sq, state))
+	if (!sq->channel->priv->tx_reporter)
 		return;
 
-	mlx5e_reset_txqsq_cc_pc(sq);
-	sq->stats->recover++;
-	recover->last_recover = jiffies;
-	mlx5e_activate_txqsq(sq);
+	mlx5e_tx_reporter_err_cqe(sq);
 }
 
 static int mlx5e_open_icosq(struct mlx5e_channel *c,
@@ -3207,6 +3102,7 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
 {
 	int tc;
 
+	mlx5e_tx_reporter_destroy(priv);
 	for (tc = 0; tc < priv->profile->max_tc; tc++)
 		mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
@@ -4908,6 +4804,7 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	mlx5e_dcbnl_initialize(priv);
 #endif
+	mlx5e_tx_reporter_create(priv);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 598ad7e4d5c9..a8e052a5ce36 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -514,7 +514,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 				mlx5e_dump_error_cqe(sq,
 						     (struct mlx5_err_cqe *)cqe);
 				queue_work(cq->channel->priv->wq,
-					   &sq->recover.recover_work);
+					   &sq->recover_work);
 			}
 			stats->cqe_err++;
 		}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 10/27] net/mlx5e: Add TX timeout support for mlx5e TX reporter
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (8 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 09/27] net/mlx5e: Add TX reporter support Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 11/27] net/mlx5: Move all devlink related functions calls to devlink.c Eran Ben Elisha
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

With this patch, ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it to the
devlink health. (The watchdog detects TX timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from TX timeout in case of lost interrupt was added
to the TX reporter recover method. The TX timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the TX reporter recovery flow.

TX timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0, SQ: 0xd8a, CQ:
0x406, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 13972000

$devlink health diagnose pci/0000:00:09 reporter TX
SQ 0xd8a: HW state: 1, stopped: 1
SQ 0xe44: HW state: 1, stopped: 0
SQ 0xeb4: HW state: 1, stopped: 0
SQ 0xf1f: HW state: 1, stopped: 0
SQ 0xf80: HW state: 1, stopped: 0
SQ 0xfe5: HW state: 1, stopped: 0

$devlink health recover pci/0000:00:09 reporter TX
$devlink health show
pci/0000:00:09.0:
  name TX state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false
    attributes:
        grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/reporter.h |  1 +
 .../mellanox/mlx5/core/en/reporter_tx.c       | 35 +++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 51 +++----------------
 3 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
index 74083e120ab3..2335c5b48820 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -10,5 +10,6 @@
 int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
 void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
 void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+void mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 9800df4909c2..d9675afbb924 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -126,6 +126,41 @@ void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
 			      &err_ctx);
 }
 
+static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
+{
+	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
+	u32 eqe_count;
+
+	netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+
+	eqe_count = mlx5_eq_poll_irq_disabled(eq);
+	if (!eqe_count) {
+		clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
+		return 1;
+	}
+
+	netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
+		   eqe_count, eq->core.eqn);
+	sq->channel->stats->eq_rearm++;
+	return 0;
+}
+
+void mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq)
+{
+	struct mlx5e_tx_err_ctx err_ctx;
+	char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+
+	err_ctx.sq       = sq;
+	err_ctx.recover  = mlx5e_tx_reporter_timeout_recover;
+	sprintf(err_str,
+		"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
+		sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
+		jiffies_to_usecs(jiffies - sq->txq->trans_start));
+	devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+			      &err_ctx);
+}
+
 /* state lock cannot be grabbed within this function.
  * It can cause a dead lock or a read-after-free.
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4c4507384fed..dee0c8f3d4e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4074,31 +4074,14 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb,
 	return features;
 }
 
-static bool mlx5e_tx_timeout_eq_recover(struct net_device *dev,
-					struct mlx5e_txqsq *sq)
-{
-	struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
-	u32 eqe_count;
-
-	netdev_err(dev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
-		   eq->core.eqn, eq->core.cons_index, eq->core.irqn);
-
-	eqe_count = mlx5_eq_poll_irq_disabled(eq);
-	if (!eqe_count)
-		return false;
-
-	netdev_err(dev, "Recover %d eqes on EQ 0x%x\n", eqe_count, eq->core.eqn);
-	sq->channel->stats->eq_rearm++;
-	return true;
-}
-
 static void mlx5e_tx_timeout_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 					       tx_timeout_work);
-	struct net_device *dev = priv->netdev;
-	bool reopen_channels = false;
-	int i, err;
+	int i;
+
+	if (!priv->tx_reporter)
+		return;
 
 	rtnl_lock();
 	mutex_lock(&priv->state_lock);
@@ -4107,36 +4090,16 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 		goto unlock;
 
 	for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
-		struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
+		struct netdev_queue *dev_queue =
+			netdev_get_tx_queue(priv->netdev, i);
 		struct mlx5e_txqsq *sq = priv->txq2sq[i];
 
 		if (!netif_xmit_stopped(dev_queue))
 			continue;
 
-		netdev_err(dev,
-			   "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
-			   i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
-			   jiffies_to_usecs(jiffies - dev_queue->trans_start));
-
-		/* If we recover a lost interrupt, most likely TX timeout will
-		 * be resolved, skip reopening channels
-		 */
-		if (!mlx5e_tx_timeout_eq_recover(dev, sq)) {
-			clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
-			reopen_channels = true;
-		}
+		mlx5e_tx_reporter_timeout(sq);
 	}
 
-	if (!reopen_channels)
-		goto unlock;
-
-	mlx5e_close_locked(dev);
-	err = mlx5e_open_locked(dev);
-	if (err)
-		netdev_err(priv->netdev,
-			   "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
-			   err);
-
 unlock:
 	mutex_unlock(&priv->state_lock);
 	rtnl_unlock();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 11/27] net/mlx5: Move all devlink related functions calls to devlink.c
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (9 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 10/27] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 12/27] net/mlx5: Add Vendor Specific Capability access gateway Eran Ben Elisha
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

Centralize all devlink related callbacks in one file.
In the downstream patch, some more functionality will be added, this
patch is preparing the driver infrastructure for it.

Currently, move devlink un/register functions calls into this file.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 14 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  5 +++--
 4 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 6bb2a860b15b..2049f332c7e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
-		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o
+		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
new file mode 100644
index 000000000000..ddfaf06c910c
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018 Mellanox Technologies */
+
+#include <devlink.h>
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
+{
+	return devlink_register(devlink, dev);
+}
+
+void mlx5_devlink_unregister(struct devlink *devlink)
+{
+	devlink_unregister(devlink);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
new file mode 100644
index 000000000000..455bfa4e89c0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#ifndef __MLX5_DEVLINK_H__
+#define __MLX5_DEVLINK_H__
+
+#include <net/devlink.h>
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
+void mlx5_devlink_unregister(struct devlink *devlink);
+
+#endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index be81b319b0dc..cf9dbf86323c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -56,6 +56,7 @@
 #include "fs_core.h"
 #include "lib/mpfs.h"
 #include "eswitch.h"
+#include "devlink.h"
 #include "lib/mlx5.h"
 #include "fpga/core.h"
 #include "fpga/ipsec.h"
@@ -1202,7 +1203,7 @@ static int init_one(struct pci_dev *pdev,
 
 	request_module_nowait(MLX5_IB_MOD);
 
-	err = devlink_register(devlink, &pdev->dev);
+	err = mlx5_devlink_register(devlink, &pdev->dev);
 	if (err)
 		goto clean_load;
 
@@ -1229,7 +1230,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 
-	devlink_unregister(devlink);
+	mlx5_devlink_unregister(devlink);
 	mlx5_unregister_device(dev);
 
 	if (mlx5_unload_one(dev, priv, true)) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 12/27] net/mlx5: Add Vendor Specific Capability access gateway
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (10 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 11/27] net/mlx5: Move all devlink related functions calls to devlink.c Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 13/27] net/mlx5: Add Crdump FW snapshot support Eran Ben Elisha
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Alex Vesker, Feras Daoud, Saeed Mahameed

From: Alex Vesker <valex@mellanox.com>

The Vendor Specific Capability (VSC) is used to activate a gateway
interfacing with the device. The gateway is used to read or write
device configurations, which are organized in different domains (spaces).
A configuration access may result in multiple actions, reads, writes.

Example usages are accessing the Crspace domain to read the crspace or
locking a device semaphore using the Semaphore domain.

The configuration access use pci_cfg_access to prevent parallel access to
the VSC space by the driver and userspace calls.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   3 +-
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 281 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |  25 ++
 .../net/ethernet/mellanox/mlx5/core/main.c    |   3 +
 include/linux/mlx5/driver.h                   |   1 +
 5 files changed, 312 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 2049f332c7e3..9a2144c0901b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,8 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
-		lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
+		lib/devcom.o lib/pci_vsc.o diag/fs_tracepoint.o \
+		diag/fw_tracer.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
new file mode 100644
index 000000000000..96695365d8ce
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018 Mellanox Technologies */
+
+#include <linux/pci.h>
+#include "mlx5_core.h"
+#include "pci_vsc.h"
+
+#define MLX5_EXTRACT_C(source, offset, size)	\
+	((((unsigned)(source)) >> (offset)) & MLX5_ONES32(size))
+#define MLX5_EXTRACT(src, start, len)		\
+	(((len) == 32) ? (src) : MLX5_EXTRACT_C(src, start, len))
+#define MLX5_ONES32(size)			\
+	((size) ? (0xffffffff >> (32 - (size))) : 0)
+#define MLX5_MASK32(offset, size)		\
+	(MLX5_ONES32(size) << (offset))
+#define MLX5_MERGE_C(rsrc1, rsrc2, start, len)  \
+	((((rsrc2) << (start)) & (MLX5_MASK32((start), (len)))) | \
+	((rsrc1) & (~MLX5_MASK32((start), (len)))))
+#define MLX5_MERGE(rsrc1, rsrc2, start, len)	\
+	(((len) == 32) ? (rsrc2) : MLX5_MERGE_C(rsrc1, rsrc2, start, len))
+#define vsc_read(dev, offset, val) \
+	pci_read_config_dword((dev)->pdev, (dev)->vsc_addr + (offset), (val))
+#define vsc_write(dev, offset, val) \
+	pci_write_config_dword((dev)->pdev, (dev)->vsc_addr + (offset), (val))
+#define VSC_MAX_RETRIES 2048
+
+enum mlx5_vsc_state {
+	MLX5_VSC_UNLOCK,
+	MLX5_VSC_LOCK,
+};
+
+enum {
+	VSC_CTRL_OFFSET = 0x4,
+	VSC_COUNTER_OFFSET = 0x8,
+	VSC_SEMAPHORE_OFFSET = 0xc,
+	VSC_ADDR_OFFSET = 0x10,
+	VSC_DATA_OFFSET = 0x14,
+
+	VSC_FLAG_BIT_OFFS = 31,
+	VSC_FLAG_BIT_LEN = 1,
+
+	VSC_SYND_BIT_OFFS = 30,
+	VSC_SYND_BIT_LEN = 1,
+
+	VSC_ADDR_BIT_OFFS = 0,
+	VSC_ADDR_BIT_LEN = 30,
+
+	VSC_SPACE_BIT_OFFS = 0,
+	VSC_SPACE_BIT_LEN = 16,
+
+	VSC_SIZE_VLD_BIT_OFFS = 28,
+	VSC_SIZE_VLD_BIT_LEN = 1,
+
+	VSC_STATUS_BIT_OFFS = 29,
+	VSC_STATUS_BIT_LEN = 3,
+};
+
+void mlx5_vsc_init(struct mlx5_core_dev *dev)
+{
+	dev->vsc_addr = pci_find_capability(dev->pdev,
+					    PCI_CAP_ID_VNDR);
+	if (!dev->vsc_addr)
+		mlx5_core_warn(dev, "Failed to get valid vendor specific ID\n");
+}
+
+int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev)
+{
+	u32 counter = 0;
+	int retries = 0;
+	u32 lock_val;
+	int ret;
+
+	pci_cfg_access_lock(dev->pdev);
+	do {
+		if (retries > VSC_MAX_RETRIES) {
+			ret = -EBUSY;
+			goto pci_unlock;
+		}
+
+		/* Check if semaphore is already locked */
+		ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
+		if (ret)
+			goto pci_unlock;
+
+		if (lock_val) {
+			retries++;
+			usleep_range(1000, 2000);
+			continue;
+		}
+
+		/* Read and write counter value, if written value is
+		 * the same, semaphore was acquired successfully.
+		 */
+		ret = vsc_read(dev, VSC_COUNTER_OFFSET, &counter);
+		if (ret)
+			goto pci_unlock;
+
+		ret = vsc_write(dev, VSC_SEMAPHORE_OFFSET, counter);
+		if (ret)
+			goto pci_unlock;
+
+		ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
+		if (ret)
+			goto pci_unlock;
+
+		retries++;
+	} while (counter != lock_val);
+
+	return 0;
+
+pci_unlock:
+	pci_cfg_access_unlock(dev->pdev);
+	return ret;
+}
+
+int mlx5_vsc_gw_unlock(struct mlx5_core_dev *dev)
+{
+	int ret;
+
+	ret = vsc_write(dev, VSC_SEMAPHORE_OFFSET, MLX5_VSC_UNLOCK);
+	pci_cfg_access_unlock(dev->pdev);
+	return ret;
+}
+
+int mlx5_vsc_gw_set_space(struct mlx5_core_dev *dev, u16 space,
+			  u32 *ret_space_size)
+{
+	int ret;
+	u32 val = 0;
+
+	if (!mlx5_vsc_accessible(dev))
+		return -EINVAL;
+
+	if (ret_space_size)
+		*ret_space_size = 0;
+
+	/* Get a unique val */
+	ret = vsc_read(dev, VSC_CTRL_OFFSET, &val);
+	if (ret)
+		goto out;
+
+	/* Try to modify the lock */
+	val = MLX5_MERGE(val, space, VSC_SPACE_BIT_OFFS, VSC_SPACE_BIT_LEN);
+	ret = vsc_write(dev, VSC_CTRL_OFFSET, val);
+	if (ret)
+		goto out;
+
+	/* Verify lock was modified */
+	ret = vsc_read(dev, VSC_CTRL_OFFSET, &val);
+	if (ret)
+		goto out;
+
+	if (MLX5_EXTRACT(val, VSC_STATUS_BIT_OFFS, VSC_STATUS_BIT_LEN) == 0)
+		return -EINVAL;
+
+	/* Get space max address if indicated by size valid bit */
+	if (ret_space_size &&
+	    MLX5_EXTRACT(val, VSC_SIZE_VLD_BIT_OFFS, VSC_SIZE_VLD_BIT_LEN)) {
+		ret = vsc_read(dev, VSC_ADDR_OFFSET, &val);
+		if (ret) {
+			mlx5_core_warn(dev, "Failed to get max space size\n");
+			goto out;
+		}
+		*ret_space_size = MLX5_EXTRACT(val, VSC_ADDR_BIT_OFFS,
+					       VSC_ADDR_BIT_LEN);
+	}
+	return 0;
+
+out:
+	return ret;
+}
+
+static int mlx5_vsc_wait_on_flag(struct mlx5_core_dev *dev, u8 expected_val)
+{
+	int retries = 0;
+	u32 flag;
+	int ret;
+
+	do {
+		if (retries > VSC_MAX_RETRIES)
+			return -EBUSY;
+
+		ret = vsc_read(dev, VSC_ADDR_OFFSET, &flag);
+		flag = MLX5_EXTRACT(flag, VSC_FLAG_BIT_OFFS, VSC_FLAG_BIT_LEN);
+		retries++;
+
+		if ((retries & 0xf) == 0)
+			usleep_range(1000, 2000);
+
+	} while (flag != expected_val);
+
+	return 0;
+}
+
+static int mlx5_vsc_gw_write(struct mlx5_core_dev *dev, unsigned int address,
+			     u32 data)
+{
+	int ret;
+
+	if (MLX5_EXTRACT(address, VSC_SYND_BIT_OFFS,
+			 VSC_FLAG_BIT_LEN + VSC_SYND_BIT_LEN))
+		return -EINVAL;
+
+	/* Set flag to 0x1 */
+	address = MLX5_MERGE(address, 1, VSC_FLAG_BIT_OFFS, 1);
+	ret = vsc_write(dev, VSC_DATA_OFFSET, data);
+	if (ret)
+		goto out;
+
+	ret = vsc_write(dev, VSC_ADDR_OFFSET, address);
+	if (ret)
+		goto out;
+
+	/* Wait for the flag to be cleared */
+	ret = mlx5_vsc_wait_on_flag(dev, 0);
+
+out:
+	return ret;
+}
+
+static int mlx5_vsc_gw_read(struct mlx5_core_dev *dev, unsigned int address,
+			    u32 *data)
+{
+	int ret;
+
+	if (MLX5_EXTRACT(address, VSC_SYND_BIT_OFFS,
+			 VSC_FLAG_BIT_LEN + VSC_SYND_BIT_LEN))
+		return -EINVAL;
+
+	ret = vsc_write(dev, VSC_ADDR_OFFSET, address);
+	if (ret)
+		goto out;
+
+	ret = mlx5_vsc_wait_on_flag(dev, 1);
+	if (ret)
+		goto out;
+
+	ret = vsc_read(dev, VSC_DATA_OFFSET, data);
+out:
+	return ret;
+}
+
+static int mlx5_vsc_gw_read_fast(struct mlx5_core_dev *dev,
+				 unsigned int read_addr,
+				 unsigned int *next_read_addr,
+				 u32 *data)
+{
+	int ret;
+
+	ret = mlx5_vsc_gw_read(dev, read_addr, data);
+	if (ret)
+		goto out;
+
+	ret = vsc_read(dev, VSC_ADDR_OFFSET, next_read_addr);
+	if (ret)
+		goto out;
+
+	*next_read_addr = MLX5_EXTRACT(*next_read_addr, VSC_ADDR_BIT_OFFS,
+				       VSC_ADDR_BIT_LEN);
+
+	if (*next_read_addr <= read_addr)
+		ret = EINVAL;
+out:
+	return ret;
+}
+
+int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
+				int length)
+{
+	unsigned int next_read_addr = 0;
+	unsigned int read_addr = 0;
+
+	while (read_addr < length) {
+		if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr,
+					  &data[(read_addr >> 2)]))
+			return read_addr;
+
+		read_addr = next_read_addr;
+	}
+	return length;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
new file mode 100644
index 000000000000..c30ff69ffa3e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018 Mellanox Technologies */
+
+#ifndef __MLX5_PCI_VSC_H__
+#define __MLX5_PCI_VSC_H__
+
+enum {
+	MLX5_VSC_SPACE_SCAN_CRSPACE = 0x7,
+};
+
+void mlx5_vsc_init(struct mlx5_core_dev *dev);
+void mlx5_vsc_cleanup(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_unlock(struct mlx5_core_dev *dev);
+int mlx5_vsc_gw_set_space(struct mlx5_core_dev *dev, u16 space,
+			  u32 *ret_space_size);
+int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
+				int length);
+
+static inline bool mlx5_vsc_accessible(struct mlx5_core_dev *dev)
+{
+	return !!dev->vsc_addr;
+}
+
+#endif /* __MLX5_PCI_VSC_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index cf9dbf86323c..124d6ccaf2f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -66,6 +66,7 @@
 #include "lib/vxlan.h"
 #include "lib/devcom.h"
 #include "diag/fw_tracer.h"
+#include "lib/pci_vsc.h"
 
 MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
@@ -1203,6 +1204,8 @@ static int init_one(struct pci_dev *pdev,
 
 	request_module_nowait(MLX5_IB_MOD);
 
+	mlx5_vsc_init(dev);
+
 	err = mlx5_devlink_register(devlink, &pdev->dev);
 	if (err)
 		goto clean_load;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 54299251d40d..e086f1fb5eea 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -677,6 +677,7 @@ struct mlx5_core_dev {
 	struct mlx5_ib_clock_info  *clock_info;
 	struct page             *clock_info_page;
 	struct mlx5_fw_tracer   *tracer;
+	u32                      vsc_addr;
 };
 
 struct mlx5_db {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 13/27] net/mlx5: Add Crdump FW snapshot support
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (11 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 12/27] net/mlx5: Add Vendor Specific Capability access gateway Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 14/27] net/mlx5: Use devlink region_snapshot parameter Eran Ben Elisha
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Alex Vesker, Saeed Mahameed

From: Alex Vesker <valex@mellanox.com>

Crdump allows the driver to create a snapshot of the FW PCI
crspace. This is useful in case of catastrophic issues which
require FW reset. The snapshot can be used for later debug.

The snapshot is exposed using devlink, cr-space
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 178 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/health.c  |   1 +
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    |   4 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 include/linux/mlx5/driver.h                   |   4 +
 6 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9a2144c0901b..9f0b96a6c915 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,7 +16,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
 		lib/devcom.o lib/pci_vsc.o diag/fs_tracepoint.o \
-		diag/fw_tracer.o devlink.o
+		diag/fw_tracer.o diag/crdump.o devlink.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
new file mode 100644
index 000000000000..18a413913b6e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018 Mellanox Technologies */
+
+#include <linux/proc_fs.h>
+#include <linux/mlx5/driver.h>
+#include <net/devlink.h>
+#include "mlx5_core.h"
+#include "lib/pci_vsc.h"
+
+#define BAD_ACCESS			0xBADACCE5
+#define MLX5_PROTECTED_CR_SCAN_CRSPACE	0x7
+#define MAX_NUM_OF_DUMPS_TO_STORE	(8)
+
+static const char *region_cr_space_str = "cr-space";
+
+struct mlx5_fw_crdump {
+	u32			size;
+	struct devlink_region	*region_crspace;
+};
+
+bool mlx5_crdump_enbaled(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	return (!!priv->health.crdump);
+}
+
+static int mlx5_crdump_fill(struct mlx5_core_dev *dev,
+			    char *crdump_region, u32 *snapshot_id)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump = priv->health.crdump;
+	int i, ret = 0;
+	u32 *cr_data;
+	u32 id;
+
+	cr_data = kvmalloc(crdump->size, GFP_KERNEL);
+	if (!cr_data)
+		return -ENOMEM;
+
+	for (i = 0; i < (crdump->size / 4); i++)
+		cr_data[i] = BAD_ACCESS;
+
+	ret = mlx5_vsc_gw_read_block_fast(dev, cr_data, crdump->size);
+	if (ret <= 0) {
+		if (ret == 0)
+			ret = -EIO;
+		goto free_data;
+	}
+
+	if (crdump->size != ret) {
+		mlx5_core_warn(dev, "failed to read full dump, read %d out of %u\n",
+			       ret, crdump->size);
+		ret = -EINVAL;
+		goto free_data;
+	}
+
+	/* Get the available snapshot ID for the dumps */
+	id = devlink_region_shapshot_id_get(devlink);
+	ret = devlink_region_snapshot_create(crdump->region_crspace,
+					     crdump->size, (u8 *)cr_data,
+					     id, &kvfree);
+	if (ret) {
+		mlx5_core_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
+			       region_cr_space_str, id, ret);
+		goto free_data;
+	} else {
+		*snapshot_id = id;
+		strcpy(crdump_region, region_cr_space_str);
+	}
+	return 0;
+
+free_data:
+	kvfree(cr_data);
+	return ret;
+}
+
+int mlx5_crdump_collect(struct mlx5_core_dev *dev,
+			char *crdump_region, u32 *snapshot_id)
+{
+	int ret = 0;
+
+	if (!mlx5_crdump_enbaled(dev))
+		return -ENODEV;
+
+	ret = mlx5_vsc_gw_lock(dev);
+	if (ret) {
+		mlx5_core_warn(dev, "crdump: failed to lock vsc gw err %d\n",
+			       ret);
+		return ret;
+	}
+
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE, NULL);
+	if (ret)
+		goto unlock;
+
+	ret = mlx5_crdump_fill(dev, crdump_region, snapshot_id);
+
+unlock:
+	mlx5_vsc_gw_unlock(dev);
+	return ret;
+}
+
+int mlx5_crdump_init(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump;
+	u32 space_size;
+	int ret;
+
+	if (!mlx5_core_is_pf(dev) || !mlx5_vsc_accessible(dev) ||
+	    mlx5_crdump_enbaled(dev))
+		return 0;
+
+	ret = mlx5_vsc_gw_lock(dev);
+	if (ret)
+		return ret;
+
+	/* Check if space is supported and get space size */
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE,
+				    &space_size);
+	if (ret) {
+		/* Unlock and mask error since space is not supported */
+		mlx5_vsc_gw_unlock(dev);
+		return 0;
+	}
+
+	if (!space_size) {
+		mlx5_core_warn(dev, "Invalid Crspace size, zero\n");
+		mlx5_vsc_gw_unlock(dev);
+		return -EINVAL;
+	}
+
+	ret = mlx5_vsc_gw_unlock(dev);
+	if (ret)
+		return ret;
+
+	crdump = kzalloc(sizeof(*crdump), GFP_KERNEL);
+	if (!crdump)
+		return -ENOMEM;
+
+	/* Create cr-space region */
+	crdump->size = space_size;
+	crdump->region_crspace =
+		devlink_region_create(devlink,
+				      region_cr_space_str,
+				      MAX_NUM_OF_DUMPS_TO_STORE,
+				      space_size);
+	if (IS_ERR(crdump->region_crspace)) {
+		mlx5_core_warn(dev,
+			       "crdump: create devlink region %s err %ld\n",
+			       region_cr_space_str,
+			       PTR_ERR(crdump->region_crspace));
+		ret = PTR_ERR(crdump->region_crspace);
+		goto free_crdump;
+	}
+	priv->health.crdump = crdump;
+	return 0;
+
+free_crdump:
+	kfree(crdump);
+	return ret;
+}
+
+void mlx5_crdump_cleanup(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+	struct mlx5_fw_crdump *crdump = priv->health.crdump;
+
+	if (!crdump)
+		return;
+
+	devlink_region_destroy(crdump->region_crspace);
+	kfree(crdump);
+	priv->health.crdump = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 196c07383082..883c1e8ffdc2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -378,6 +378,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	spin_lock_init(&health->wq_lock);
 	INIT_WORK(&health->work, health_care);
 	INIT_DELAYED_WORK(&health->recover_work, health_recover);
+	health->crdump = NULL;
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index 397a2847867a..3c9a6dedccaa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -41,6 +41,10 @@ int  mlx5_core_reserve_gids(struct mlx5_core_dev *dev, unsigned int count);
 void mlx5_core_unreserve_gids(struct mlx5_core_dev *dev, unsigned int count);
 int  mlx5_core_reserved_gid_alloc(struct mlx5_core_dev *dev, int *gid_index);
 void mlx5_core_reserved_gid_free(struct mlx5_core_dev *dev, int gid_index);
+int mlx5_crdump_init(struct mlx5_core_dev *dev);
+void mlx5_crdump_cleanup(struct mlx5_core_dev *dev);
+int mlx5_crdump_collect(struct mlx5_core_dev *dev,
+			char *crdump_region, u32 *snapshot_id);
 
 /* TODO move to lib/events.h */
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 124d6ccaf2f8..c6c07ca0e423 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1210,6 +1210,10 @@ static int init_one(struct pci_dev *pdev,
 	if (err)
 		goto clean_load;
 
+	err = mlx5_crdump_init(dev);
+	if (err)
+		dev_err(&pdev->dev, "mlx5_crdump_init failed with error code %d\n", err);
+
 	pci_save_state(pdev);
 	return 0;
 
@@ -1233,6 +1237,7 @@ static void remove_one(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_priv *priv = &dev->priv;
 
+	mlx5_crdump_cleanup(dev);
 	mlx5_devlink_unregister(devlink);
 	mlx5_unregister_device(dev);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index e086f1fb5eea..77c04154ecf3 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -53,6 +53,7 @@
 #include <linux/mlx5/eq.h>
 #include <linux/timecounter.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/devlink.h>
 
 enum {
 	MLX5_BOARD_ID_LEN = 64,
@@ -425,6 +426,8 @@ struct mlx5_sq_bfreg {
 	unsigned int		offset;
 };
 
+struct mlx5_fw_crdump;
+
 struct mlx5_core_health {
 	struct health_buffer __iomem   *health;
 	__be32 __iomem		       *health_counter;
@@ -438,6 +441,7 @@ struct mlx5_core_health {
 	unsigned long			flags;
 	struct work_struct		work;
 	struct delayed_work		recover_work;
+	struct mlx5_fw_crdump	       *crdump;
 };
 
 struct mlx5_qp_table {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 14/27] net/mlx5: Use devlink region_snapshot parameter
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (12 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 13/27] net/mlx5: Add Crdump FW snapshot support Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 15/27] Documentation: mlx5: Update kernel documentation Eran Ben Elisha
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Alex Vesker, Saeed Mahameed

From: Moshe Shemesh <moshe@mellanox.com>

This parameter enables capturing region snapshot of the crspace
during critical errors. The default value of this parameter is
disabled, it can be enabled using devlink param commands.
It is possible to configure during runtime and also driver init.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 60 ++++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 22 +++++++
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h    |  2 +
 3 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index ddfaf06c910c..88190acae873 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -2,13 +2,71 @@
 /* Copyright (c) 2018 Mellanox Technologies */
 
 #include <devlink.h>
+#include <linux/mlx5/driver.h>
+#include "lib/mlx5.h"
+
+static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	ctx->val.vbool = mlx5_crdump_is_snapshot_enabled(dev);
+	return 0;
+}
+
+static int mlx5_devlink_set_crdump_snapshot(struct devlink *devlink, u32 id,
+					    struct devlink_param_gset_ctx *ctx)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	return mlx5_crdump_set_snapshot_enabled(dev, ctx->val.vbool);
+}
+
+static const struct devlink_param mlx5_devlink_params[] = {
+	DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT,
+			      BIT(DEVLINK_PARAM_CMODE_RUNTIME) |
+			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+			      mlx5_devlink_get_crdump_snapshot,
+			      mlx5_devlink_set_crdump_snapshot, NULL),
+};
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 {
-	return devlink_register(devlink, dev);
+	union devlink_param_value init_val;
+	int err;
+
+	err = devlink_register(devlink, dev);
+	if (err) {
+		dev_warn(dev,
+			 "devlink register failed (err = %d)", err);
+		return err;
+	}
+
+	err = devlink_params_register(devlink, mlx5_devlink_params,
+				      ARRAY_SIZE(mlx5_devlink_params));
+	if (err) {
+		dev_err(dev, "devlink_params_register failed, err = %d\n", err);
+		goto unregister;
+	}
+
+	init_val.vbool = false;
+	err = devlink_param_driverinit_value_set(devlink,
+						 DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+						 init_val);
+	if (err)
+		dev_warn(dev,
+			 "devlink param init failed (err = %d)", err);
+
+	return 0;
+
+unregister:
+	devlink_unregister(devlink);
+	return err;
 }
 
 void mlx5_devlink_unregister(struct devlink *devlink)
 {
+	devlink_params_unregister(devlink, mlx5_devlink_params,
+				  ARRAY_SIZE(mlx5_devlink_params));
 	devlink_unregister(devlink);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
index 18a413913b6e..cf601300eb2e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -15,6 +15,7 @@ static const char *region_cr_space_str = "cr-space";
 
 struct mlx5_fw_crdump {
 	u32			size;
+	bool			snapshot_enable;
 	struct devlink_region	*region_crspace;
 };
 
@@ -102,6 +103,27 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev,
 	return ret;
 }
 
+bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	if (mlx5_crdump_enbaled(dev))
+		return priv->health.crdump->snapshot_enable;
+
+	return false;
+}
+
+int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value)
+{
+	struct mlx5_priv *priv = &dev->priv;
+
+	if (!mlx5_crdump_enbaled(dev))
+		return -ENODEV;
+
+	priv->health.crdump->snapshot_enable = value;
+	return 0;
+}
+
 int mlx5_crdump_init(struct mlx5_core_dev *dev)
 {
 	struct devlink *devlink = priv_to_devlink(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
index 3c9a6dedccaa..c639f0af29ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h
@@ -45,6 +45,8 @@ int mlx5_crdump_init(struct mlx5_core_dev *dev);
 void mlx5_crdump_cleanup(struct mlx5_core_dev *dev);
 int mlx5_crdump_collect(struct mlx5_core_dev *dev,
 			char *crdump_region, u32 *snapshot_id);
+bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev);
+int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value);
 
 /* TODO move to lib/events.h */
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 15/27] Documentation: mlx5: Update kernel documentation
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (13 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 14/27] net/mlx5: Use devlink region_snapshot parameter Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 16/27] net/mlx5: Handle SW reset of FW in error flow Eran Ben Elisha
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Feras Daoud, Saeed Mahameed

From: Feras Daoud <ferasda@mellanox.com>

Initial kernel documentation for mlx5 core driver.
This change includes also documentation for the new devlink region
param : DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 Documentation/networking/mlx5.rst | 39 +++++++++++++++++++++++++++++++
 MAINTAINERS                       |  1 +
 2 files changed, 40 insertions(+)
 create mode 100644 Documentation/networking/mlx5.rst

diff --git a/Documentation/networking/mlx5.rst b/Documentation/networking/mlx5.rst
new file mode 100644
index 000000000000..4c44996d2132
--- /dev/null
+++ b/Documentation/networking/mlx5.rst
@@ -0,0 +1,39 @@
+.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+
+Linux* Base Driver for Mellanox Core VPI Network Connection
+===========================================================
+
+Copyright (c) 2018, Mellanox Technologies inc.
+
+Contents
+========
+
+- Command Line Parameters
+
+Command Line Parameters
+=======================
+
+Devlink tool
+------------
+The driver utilizes the devlink device configuration tool for setting driver
+configuration, as well as displaying device attributes.
+
+:Devlink supported parameters:
+
+- DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT:
+
+      - This parameter enables capturing region snapshot of the crspace during critical errors.
+      - The default value of this parameter is disabled.
+
+      - Example:
+         devlink region show
+            List available address regions and snapshot.
+
+         devlink region del pci/0000:00:05.0/cr-space snapshot 1
+            Delete snapshot id 1 from cr-space address region from device pci/0000:00:05.0.
+
+         devlink region dump pci/0000:00:05.0/cr-space snapshot 1
+            Dump the snapshot taken from cr-space address region with ID 1
+
+         devlink region read pci/0000:00:05.0/cr-space snapshot 1 address 0x10 legth 16
+            Read from address 0x10, 16 Bytes of snapshot ID 1 taken from cr-space address region
diff --git a/MAINTAINERS b/MAINTAINERS
index 4d04cebb4a71..d886bef6f06a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9743,6 +9743,7 @@ Q:	http://patchwork.ozlabs.org/project/netdev/list/
 S:	Supported
 F:	drivers/net/ethernet/mellanox/mlx5/core/
 F:	include/linux/mlx5/
+F:	Documentation/networking/mlx5.rst
 
 MELLANOX MLX5 IB driver
 M:	Leon Romanovsky <leonro@mellanox.com>
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 16/27] net/mlx5: Handle SW reset of FW in error flow
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (14 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 15/27] Documentation: mlx5: Update kernel documentation Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 17/27] net/mlx5: Control CR-space access by different PFs Eran Ben Elisha
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Feras Daoud, Daniel Jurgens, Saeed Mahameed

From: Feras Daoud <ferasda@mellanox.com>

New mlx5 adapters allow the driver to reset the FW in the event of an
error, this action called "SW Reset". When an SW reset is issued on any
PF all PFs enter reset state which is a recoverable condition. The
existing recovery flow was designed to allow the recovery of a VF after
a PF driver reload. This patch adds the sw reset to the NIC states
as a preparation for sw reset handling.

When a software reset is issued the following occurs:
1. The NIC interface mode is set to 7 while the reset is in progress.
2. Once the reset completes the NIC interface mode is set to 1.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en_selftest.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  | 57 +++++++++++++------
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  2 +-
 include/linux/mlx5/driver.h                   |  2 +-
 4 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 4382ef85488c..840ec945ccba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -64,7 +64,7 @@ static int mlx5e_test_health_info(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_health *health = &priv->mdev->priv.health;
 
-	return health->sick ? 1 : 0;
+	return health->fatal_error ? 1 : 0;
 }
 
 static int mlx5e_test_link_state(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 883c1e8ffdc2..afa3fe6eef8f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -65,9 +65,16 @@ enum {
 	MLX5_DROP_NEW_RECOVERY_WORK,
 };
 
+enum  {
+	MLX5_SENSOR_NO_ERR		= 0,
+	MLX5_SENSOR_PCI_COMM_ERR	= 1,
+	MLX5_SENSOR_NIC_DISABLED	= 2,
+	MLX5_SENSOR_NIC_SW_RESET	= 3,
+};
+
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev)
 {
-	return (ioread32be(&dev->iseg->cmdq_addr_l_sz) >> 8) & 3;
+	return (ioread32be(&dev->iseg->cmdq_addr_l_sz) >> 8) & 7;
 }
 
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state)
@@ -80,18 +87,25 @@ void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state)
 		    &dev->iseg->cmdq_addr_l_sz);
 }
 
-static int in_fatal(struct mlx5_core_dev *dev)
+static bool sensor_pci_not_working(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 	struct health_buffer __iomem *h = health->health;
 
-	if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED)
-		return 1;
+	/* Offline PCI reads return 0xffffffff */
+	return (ioread32be(&h->fw_ver) == 0xffffffff);
+}
 
-	if (ioread32be(&h->fw_ver) == 0xffffffff)
-		return 1;
+static u32 check_fatal_sensors(struct mlx5_core_dev *dev)
+{
+	if (sensor_pci_not_working(dev))
+		return MLX5_SENSOR_PCI_COMM_ERR;
+	if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED)
+		return MLX5_SENSOR_NIC_DISABLED;
+	if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_SW_RESET)
+		return MLX5_SENSOR_NIC_SW_RESET;
 
-	return 0;
+	return MLX5_SENSOR_NO_ERR;
 }
 
 void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
@@ -101,7 +115,8 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
 		goto unlock;
 
 	mlx5_core_err(dev, "start\n");
-	if (pci_channel_offline(dev->pdev) || in_fatal(dev) || force) {
+	if (pci_channel_offline(dev->pdev) ||
+	    dev->priv.health.fatal_error != MLX5_SENSOR_NO_ERR || force) {
 		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 		mlx5_cmd_trigger_completions(dev);
 	}
@@ -143,16 +158,14 @@ static void health_recover(struct work_struct *work)
 	struct delayed_work *dwork;
 	struct mlx5_core_dev *dev;
 	struct mlx5_priv *priv;
-	u8 nic_state;
 
 	dwork = container_of(work, struct delayed_work, work);
 	health = container_of(dwork, struct mlx5_core_health, recover_work);
 	priv = container_of(health, struct mlx5_priv, health);
 	dev = container_of(priv, struct mlx5_core_dev, priv);
 
-	nic_state = mlx5_get_nic_state(dev);
-	if (nic_state == MLX5_NIC_IFC_INVALID) {
-		dev_err(&dev->pdev->dev, "health recovery flow aborted since the nic state is invalid\n");
+	if (sensor_pci_not_working(dev)) {
+		dev_err(&dev->pdev->dev, "health recovery flow aborted, PCI reads still not working\n");
 		return;
 	}
 
@@ -162,10 +175,17 @@ static void health_recover(struct work_struct *work)
 
 /* How much time to wait until health resetting the driver (in msecs) */
 #define MLX5_RECOVERY_DELAY_MSECS 60000
+#define MLX5_RECOVERY_NO_DELAY 0
+static unsigned long get_recovery_delay(struct mlx5_core_dev *dev)
+{
+	return dev->priv.health.fatal_error == MLX5_SENSOR_PCI_COMM_ERR ?
+	       MLX5_RECOVERY_DELAY_MSECS : MLX5_RECOVERY_NO_DELAY;
+}
+
 static void health_care(struct work_struct *work)
 {
-	unsigned long recover_delay = msecs_to_jiffies(MLX5_RECOVERY_DELAY_MSECS);
 	struct mlx5_core_health *health;
+	unsigned long recover_delay;
 	struct mlx5_core_dev *dev;
 	struct mlx5_priv *priv;
 	unsigned long flags;
@@ -175,6 +195,7 @@ static void health_care(struct work_struct *work)
 	dev = container_of(priv, struct mlx5_core_dev, priv);
 	mlx5_core_warn(dev, "handling bad device here\n");
 	mlx5_handle_bad_state(dev);
+	recover_delay = msecs_to_jiffies(get_recovery_delay(dev));
 
 	spin_lock_irqsave(&health->wq_lock, flags);
 	if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags))
@@ -271,6 +292,7 @@ static void poll_health(struct timer_list *t)
 {
 	struct mlx5_core_dev *dev = from_timer(dev, t, priv.health.timer);
 	struct mlx5_core_health *health = &dev->priv.health;
+	u32 fatal_error;
 	u32 count;
 
 	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
@@ -288,8 +310,11 @@ static void poll_health(struct timer_list *t)
 		print_health_info(dev);
 	}
 
-	if (in_fatal(dev) && !health->sick) {
-		health->sick = true;
+	fatal_error = check_fatal_sensors(dev);
+
+	if (fatal_error && !health->fatal_error) {
+		mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error);
+		dev->priv.health.fatal_error = fatal_error;
 		print_health_info(dev);
 		mlx5_trigger_health_work(dev);
 	}
@@ -303,7 +328,7 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev)
 	struct mlx5_core_health *health = &dev->priv.health;
 
 	timer_setup(&health->timer, poll_health, 0);
-	health->sick = 0;
+	health->fatal_error = MLX5_SENSOR_NO_ERR;
 	clear_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
 	clear_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
 	health->health = &dev->iseg->health;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index c68dcea5985b..29a2a4546fb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -194,7 +194,7 @@ enum {
 	MLX5_NIC_IFC_FULL		= 0,
 	MLX5_NIC_IFC_DISABLED		= 1,
 	MLX5_NIC_IFC_NO_DRAM_NIC	= 2,
-	MLX5_NIC_IFC_INVALID		= 3
+	MLX5_NIC_IFC_SW_RESET		= 7
 };
 
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 77c04154ecf3..ac138180f08b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -434,7 +434,7 @@ struct mlx5_core_health {
 	struct timer_list		timer;
 	u32				prev;
 	int				miss_counter;
-	bool				sick;
+	u32				fatal_error;
 	/* wq spinlock to synchronize draining */
 	spinlock_t			wq_lock;
 	struct workqueue_struct	       *wq;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 17/27] net/mlx5: Control CR-space access by different PFs
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (15 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 16/27] net/mlx5: Handle SW reset of FW in error flow Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 18/27] net/mlx5: Issue SW reset on FW assert Eran Ben Elisha
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Feras Daoud, Saeed Mahameed

From: Feras Daoud <ferasda@mellanox.com>

Since the FW can be shared between different PFs/VFs it is common
that more than one health poll will detected a failure, this can
lead to multiple resets which are unneeded.

The solution is to use a FW locking mechanism using semaphore space
to provide a way to allow only one device to collect the cr-dump and
to issue a sw-reset.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 40 ++++++++++++++++---
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |  8 ++++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  4 ++
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
index 96695365d8ce..4ad06465cccb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -24,11 +24,6 @@
 	pci_write_config_dword((dev)->pdev, (dev)->vsc_addr + (offset), (val))
 #define VSC_MAX_RETRIES 2048
 
-enum mlx5_vsc_state {
-	MLX5_VSC_UNLOCK,
-	MLX5_VSC_LOCK,
-};
-
 enum {
 	VSC_CTRL_OFFSET = 0x4,
 	VSC_COUNTER_OFFSET = 0x8,
@@ -279,3 +274,38 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
 	}
 	return length;
 }
+
+int mlx5_vsc_sem_set_space(struct mlx5_core_dev *dev, u16 space,
+			   enum mlx5_vsc_state state)
+{
+	u32 data, id = 0;
+	int ret;
+
+	ret = mlx5_vsc_gw_set_space(dev, MLX5_SEMAPHORE_SPACE_DOMAIN, NULL);
+	if (ret) {
+		mlx5_core_warn(dev, "Failed to set gw space %d\n", ret);
+		return ret;
+	}
+
+	if (state == MLX5_VSC_LOCK) {
+		/* Get a unique ID based on the counter */
+		ret = vsc_read(dev, VSC_COUNTER_OFFSET, &id);
+		if (ret)
+			return ret;
+	}
+
+	/* Try to modify lock */
+	ret = mlx5_vsc_gw_write(dev, space, id);
+	if (ret)
+		return ret;
+
+	/* Verify lock was modified */
+	ret = mlx5_vsc_gw_read(dev, space, &data);
+	if (ret)
+		return -EINVAL;
+
+	if (data != id)
+		return -EBUSY;
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
index c30ff69ffa3e..1f96db9b982b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
@@ -4,6 +4,11 @@
 #ifndef __MLX5_PCI_VSC_H__
 #define __MLX5_PCI_VSC_H__
 
+enum mlx5_vsc_state {
+	MLX5_VSC_UNLOCK,
+	MLX5_VSC_LOCK,
+};
+
 enum {
 	MLX5_VSC_SPACE_SCAN_CRSPACE = 0x7,
 };
@@ -22,4 +27,7 @@ static inline bool mlx5_vsc_accessible(struct mlx5_core_dev *dev)
 	return !!dev->vsc_addr;
 }
 
+int mlx5_vsc_sem_set_space(struct mlx5_core_dev *dev, u16 space,
+			   enum mlx5_vsc_state state);
+
 #endif /* __MLX5_PCI_VSC_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 29a2a4546fb6..e5350c885ab7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -97,6 +97,10 @@ enum {
 	MLX5_DRIVER_SYND = 0xbadd00de,
 };
 
+enum mlx5_semaphore_space_address {
+	MLX5_SEMAPHORE_SPACE_DOMAIN     = 0xA,
+};
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
 int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 18/27] net/mlx5: Issue SW reset on FW assert
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (16 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 17/27] net/mlx5: Control CR-space access by different PFs Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 19/27] net/mlx5: Refactor print health info Eran Ben Elisha
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh
  Cc: Feras Daoud, Alex Vesker, Daniel Jurgens, Saeed Mahameed

From: Feras Daoud <ferasda@mellanox.com>

If a FW assert is considered fatal, indicated by a new bit in the health
buffer, reset the FW. After the reset go through the normal recovery
flow. Only one PF needs to issue the reset, so an attempt is made to
prevent the 2nd function from also issuing the reset.
It's not an error if that happens, it just slows recovery.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/diag/crdump.c |  10 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  | 172 +++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   1 +
 include/linux/mlx5/device.h                   |  10 +-
 include/linux/mlx5/driver.h                   |   1 +
 5 files changed, 183 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
index cf601300eb2e..6a9793f83015 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -45,7 +45,7 @@ static int mlx5_crdump_fill(struct mlx5_core_dev *dev,
 
 	ret = mlx5_vsc_gw_read_block_fast(dev, cr_data, crdump->size);
 	if (ret <= 0) {
-		if (ret == 0)
+		if (!ret)
 			ret = -EIO;
 		goto free_data;
 	}
@@ -91,6 +91,13 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev,
 			       ret);
 		return ret;
 	}
+	/* Verify no other PF is running cr-dump or sw reset */
+	ret = mlx5_vsc_sem_set_space(dev, MLX5_SEMAPHORE_SW_RESET,
+				     MLX5_VSC_LOCK);
+	if (ret) {
+		mlx5_core_warn(dev, "Failed to lock SW reset semaphore\n");
+		goto unlock;
+	}
 
 	ret = mlx5_vsc_gw_set_space(dev, MLX5_VSC_SPACE_SCAN_CRSPACE, NULL);
 	if (ret)
@@ -98,6 +105,7 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev,
 
 	ret = mlx5_crdump_fill(dev, crdump_region, snapshot_id);
 
+	mlx5_vsc_sem_set_space(dev, MLX5_SEMAPHORE_SW_RESET, MLX5_VSC_UNLOCK);
 unlock:
 	mlx5_vsc_gw_unlock(dev);
 	return ret;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index afa3fe6eef8f..87c327751941 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -40,6 +40,7 @@
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/mlx5.h"
+#include "lib/pci_vsc.h"
 
 enum {
 	MLX5_HEALTH_POLL_INTERVAL	= 2 * HZ,
@@ -68,8 +69,10 @@ enum {
 enum  {
 	MLX5_SENSOR_NO_ERR		= 0,
 	MLX5_SENSOR_PCI_COMM_ERR	= 1,
-	MLX5_SENSOR_NIC_DISABLED	= 2,
-	MLX5_SENSOR_NIC_SW_RESET	= 3,
+	MLX5_SENSOR_PCI_ERR		= 2,
+	MLX5_SENSOR_NIC_DISABLED	= 3,
+	MLX5_SENSOR_NIC_SW_RESET	= 4,
+	MLX5_SENSOR_FW_SYND_RFR		= 5,
 };
 
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev)
@@ -96,34 +99,167 @@ static bool sensor_pci_not_working(struct mlx5_core_dev *dev)
 	return (ioread32be(&h->fw_ver) == 0xffffffff);
 }
 
+static bool sensor_fw_synd_rfr(struct mlx5_core_dev *dev)
+{
+	struct mlx5_core_health *health = &dev->priv.health;
+	struct health_buffer __iomem *h = health->health;
+	u32 rfr = ioread32be(&h->rfr) >> MLX5_RFR_OFFSET;
+	u8 synd = ioread8(&h->synd);
+
+	if (rfr && synd)
+		mlx5_core_dbg(dev, "FW requests reset, synd: %d\n", synd);
+	return rfr && synd;
+}
+
 static u32 check_fatal_sensors(struct mlx5_core_dev *dev)
 {
 	if (sensor_pci_not_working(dev))
 		return MLX5_SENSOR_PCI_COMM_ERR;
+	if (pci_channel_offline(dev->pdev))
+		return MLX5_SENSOR_PCI_ERR;
 	if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED)
 		return MLX5_SENSOR_NIC_DISABLED;
 	if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_SW_RESET)
 		return MLX5_SENSOR_NIC_SW_RESET;
+	if (sensor_fw_synd_rfr(dev))
+		return MLX5_SENSOR_FW_SYND_RFR;
 
 	return MLX5_SENSOR_NO_ERR;
 }
 
+static int lock_sem_sw_reset(struct mlx5_core_dev *dev, bool lock)
+{
+	enum mlx5_vsc_state state;
+	int ret;
+
+	if (!mlx5_core_is_pf(dev))
+		return -EBUSY;
+
+	/* Try to lock GW access, this stage doesn't return
+	 * EBUSY because locked GW does not mean that other PF
+	 * already started the reset.
+	 */
+	ret = mlx5_vsc_gw_lock(dev);
+	if (ret == -EBUSY)
+		return -EINVAL;
+	if (ret)
+		return ret;
+
+	state = lock ? MLX5_VSC_LOCK : MLX5_VSC_UNLOCK;
+	/* At this stage, if the return status == EBUSY, then we know
+	 * for sure that another PF started the reset, so don't allow
+	 * another reset.
+	 */
+	ret = mlx5_vsc_sem_set_space(dev, MLX5_SEMAPHORE_SW_RESET, state);
+	if (ret)
+		mlx5_core_warn(dev, "Failed to lock SW reset semaphore\n");
+
+	/* Unlock GW access */
+	mlx5_vsc_gw_unlock(dev);
+
+	return ret;
+}
+
+static bool reset_fw_if_needed(struct mlx5_core_dev *dev)
+{
+	bool supported = (ioread32be(&dev->iseg->initializing) >>
+			  MLX5_FW_RESET_SUPPORTED_OFFSET) & 1;
+	u32 fatal_error;
+
+	if (!supported)
+		return false;
+
+	/* The reset only needs to be issued by one PF. The health buffer is
+	 * shared between all functions, and will be cleared during a reset.
+	 * Check again to avoid a redundant 2nd reset. If the fatal erros was
+	 * PCI related a reset won't help.
+	 */
+	fatal_error = check_fatal_sensors(dev);
+	if (fatal_error == MLX5_SENSOR_PCI_COMM_ERR ||
+	    fatal_error == MLX5_SENSOR_NIC_DISABLED ||
+	    fatal_error == MLX5_SENSOR_NIC_SW_RESET) {
+		mlx5_core_warn(dev, "Not issuing FW reset. Either it's already done or won't help.");
+		return false;
+	}
+
+	mlx5_core_warn(dev, "Issuing FW Reset\n");
+	/* Write the NIC interface field to initiate the reset, the command
+	 * interface address also resides here, don't overwrite it.
+	 */
+	mlx5_set_nic_state(dev, MLX5_NIC_IFC_SW_RESET);
+
+	return true;
+}
+
+#define MLX5_CRDUMP_WAIT_MS	60000
+#define MLX5_FW_RESET_WAIT_MS	1000
 void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
 {
+	unsigned long end, delay_ms = MLX5_FW_RESET_WAIT_MS;
+	u32 fatal_error;
+	int lock = -EBUSY;
+
 	mutex_lock(&dev->intf_state_mutex);
 	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
 		goto unlock;
+	if (dev->state == MLX5_DEVICE_STATE_UNINITIALIZED) {
+		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+		goto unlock;
+	}
+
+	if (force)
+		mlx5_core_dbg(dev, "start\n");
+	else
+		mlx5_core_err(dev, "start\n");
+
+	fatal_error = check_fatal_sensors(dev);
 
-	mlx5_core_err(dev, "start\n");
-	if (pci_channel_offline(dev->pdev) ||
-	    dev->priv.health.fatal_error != MLX5_SENSOR_NO_ERR || force) {
+	if (fatal_error || force) {
 		dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
 		mlx5_cmd_trigger_completions(dev);
 	}
 
 	mlx5_notifier_call_chain(dev->priv.events, MLX5_DEV_EVENT_SYS_ERROR, (void *)1);
-	mlx5_core_err(dev, "end\n");
 
+	if (force)
+		goto err_state_done;
+
+	if (fatal_error == MLX5_SENSOR_FW_SYND_RFR) {
+		/* Get cr-dump and reset FW semaphore */
+		lock = lock_sem_sw_reset(dev, true);
+
+		if (lock == -EBUSY) {
+			delay_ms = MLX5_CRDUMP_WAIT_MS;
+			goto recover_from_sw_reset;
+		}
+		/* Execute SW reset */
+		reset_fw_if_needed(dev);
+	}
+
+recover_from_sw_reset:
+	/* Recover from SW reset */
+	end = jiffies + msecs_to_jiffies(delay_ms);
+	do {
+		if (mlx5_get_nic_state(dev) == MLX5_NIC_IFC_DISABLED)
+			break;
+
+		cond_resched();
+	} while (!time_after(jiffies, end));
+
+	if (mlx5_get_nic_state(dev) != MLX5_NIC_IFC_DISABLED) {
+		dev_err(&dev->pdev->dev, "NIC IFC still %d after %lums.\n",
+			mlx5_get_nic_state(dev), delay_ms);
+	}
+
+	/* Release FW semaphore if you are the lock owner */
+	if (!lock)
+		lock_sem_sw_reset(dev, false);
+
+err_state_done:
+	if (force)
+		mlx5_core_dbg(dev, "end\n");
+	else
+		mlx5_core_err(dev, "end\n");
 unlock:
 	mutex_unlock(&dev->intf_state_mutex);
 }
@@ -144,6 +280,20 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev)
 	case MLX5_NIC_IFC_NO_DRAM_NIC:
 		mlx5_core_warn(dev, "Expected to see disabled NIC but it is no dram nic\n");
 		break;
+
+	case MLX5_NIC_IFC_SW_RESET:
+		/* The IFC mode field is 3 bits, so it will read 0x7 in 2 cases:
+		 * 1. PCI has been disabled (ie. PCI-AER, PF driver unloaded
+		 *    and this is a VF), this is not recoverable by SW reset.
+		 *    Logging of this is handled elsewhere.
+		 * 2. FW reset has been issued by another function, driver can
+		 *    be reloaded to recover after the mode switches to
+		 *    MLX5_NIC_IFC_DISABLED.
+		 */
+		if (dev->priv.health.fatal_error != MLX5_SENSOR_PCI_COMM_ERR)
+			mlx5_core_warn(dev, "NIC SW reset in progress\n");
+		break;
+
 	default:
 		mlx5_core_warn(dev, "Expected to see disabled NIC but it is has invalid value %d\n",
 			       nic_interface);
@@ -178,7 +328,8 @@ static void health_recover(struct work_struct *work)
 #define MLX5_RECOVERY_NO_DELAY 0
 static unsigned long get_recovery_delay(struct mlx5_core_dev *dev)
 {
-	return dev->priv.health.fatal_error == MLX5_SENSOR_PCI_COMM_ERR ?
+	return dev->priv.health.fatal_error == MLX5_SENSOR_PCI_ERR ||
+					       MLX5_SENSOR_PCI_COMM_ERR ?
 	       MLX5_RECOVERY_DELAY_MSECS : MLX5_RECOVERY_NO_DELAY;
 }
 
@@ -198,11 +349,14 @@ static void health_care(struct work_struct *work)
 	recover_delay = msecs_to_jiffies(get_recovery_delay(dev));
 
 	spin_lock_irqsave(&health->wq_lock, flags);
-	if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags))
+	if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags)) {
+		mlx5_core_warn(dev, "Scheduling recovery work with %lums delay\n",
+			       recover_delay);
 		schedule_delayed_work(&health->recover_work, recover_delay);
-	else
+	} else {
 		dev_err(&dev->pdev->dev,
 			"new health works are not permitted at this stage\n");
+	}
 	spin_unlock_irqrestore(&health->wq_lock, flags);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index e5350c885ab7..bf118e85aefe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -99,6 +99,7 @@ enum {
 
 enum mlx5_semaphore_space_address {
 	MLX5_SEMAPHORE_SPACE_DOMAIN     = 0xA,
+	MLX5_SEMAPHORE_SW_RESET         = 0x20,
 };
 
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 8c4a820bd4c1..cac801609d76 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -507,6 +507,10 @@ struct mlx5_cmd_layout {
 	u8		status_own;
 };
 
+enum mlx5_fatal_assert_bit_offsets {
+	MLX5_RFR_OFFSET = 31,
+};
+
 struct health_buffer {
 	__be32		assert_var[5];
 	__be32		rsvd0[3];
@@ -515,12 +519,16 @@ struct health_buffer {
 	__be32		rsvd1[2];
 	__be32		fw_ver;
 	__be32		hw_id;
-	__be32		rsvd2;
+	__be32		rfr;
 	u8		irisc_index;
 	u8		synd;
 	__be16		ext_synd;
 };
 
+enum mlx5_initializing_bit_offsets {
+	MLX5_FW_RESET_SUPPORTED_OFFSET = 30,
+};
+
 enum mlx5_cmd_addr_l_sz_offset {
 	MLX5_NIC_IFC_OFFSET = 8,
 };
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ac138180f08b..778710ff77c9 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -575,6 +575,7 @@ struct mlx5_priv {
 };
 
 enum mlx5_device_state {
+	MLX5_DEVICE_STATE_UNINITIALIZED,
 	MLX5_DEVICE_STATE_UP,
 	MLX5_DEVICE_STATE_INTERNAL_ERROR,
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 19/27] net/mlx5: Refactor print health info
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (17 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 18/27] net/mlx5: Issue SW reset on FW assert Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 20/27] net/mlx5: Create FW devlink_health_reporter Eran Ben Elisha
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Refactor print health info code, split to two functions:
 1. mlx5_get_health_info() - writes the health info into a buffer.
 2. mlx5_print_health_info() - prints the health info to kernel log.
This refactoring is done to enable using the health info data by devlink
health reporter diagnose() in the downstream patch.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/health.c  | 44 +++++++++++++------
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  7 +++
 include/linux/mlx5/device.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 |  4 +-
 4 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 87c327751941..20576a525421 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -390,7 +390,8 @@ static const char *hsynd_str(u8 synd)
 	}
 }
 
-static void print_health_info(struct mlx5_core_dev *dev)
+void mlx5_get_health_info(struct mlx5_core_dev *dev, u8 *synd,
+			  char (*lines_buf)[HEALTH_INFO_MAX_LINE])
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 	struct health_buffer __iomem *h = health->health;
@@ -398,23 +399,40 @@ static void print_health_info(struct mlx5_core_dev *dev)
 	u32 fw;
 	int i;
 
+	*synd = ioread8(&h->synd);
 	/* If the syndrome is 0, the device is OK and no need to print buffer */
-	if (!ioread8(&h->synd))
+	if (!synd)
 		return;
 
 	for (i = 0; i < ARRAY_SIZE(h->assert_var); i++)
-		dev_err(&dev->pdev->dev, "assert_var[%d] 0x%08x\n", i, ioread32be(h->assert_var + i));
+		sprintf(lines_buf[i], "assert_var[%d] 0x%08x\n", i, ioread32be(h->assert_var + i));
 
-	dev_err(&dev->pdev->dev, "assert_exit_ptr 0x%08x\n", ioread32be(&h->assert_exit_ptr));
-	dev_err(&dev->pdev->dev, "assert_callra 0x%08x\n", ioread32be(&h->assert_callra));
+	sprintf(lines_buf[i++], "assert_exit_ptr 0x%08x\n", ioread32be(&h->assert_exit_ptr));
+	sprintf(lines_buf[i++], "assert_callra 0x%08x\n", ioread32be(&h->assert_callra));
 	sprintf(fw_str, "%d.%d.%d", fw_rev_maj(dev), fw_rev_min(dev), fw_rev_sub(dev));
-	dev_err(&dev->pdev->dev, "fw_ver %s\n", fw_str);
-	dev_err(&dev->pdev->dev, "hw_id 0x%08x\n", ioread32be(&h->hw_id));
-	dev_err(&dev->pdev->dev, "irisc_index %d\n", ioread8(&h->irisc_index));
-	dev_err(&dev->pdev->dev, "synd 0x%x: %s\n", ioread8(&h->synd), hsynd_str(ioread8(&h->synd)));
-	dev_err(&dev->pdev->dev, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd));
+	sprintf(lines_buf[i++], "fw_ver %s\n", fw_str);
+	sprintf(lines_buf[i++], "hw_id 0x%08x\n", ioread32be(&h->hw_id));
+	sprintf(lines_buf[i++], "irisc_index %d\n", ioread8(&h->irisc_index));
+	sprintf(lines_buf[i++], "synd 0x%x: %s\n", ioread8(&h->synd), hsynd_str(ioread8(&h->synd)));
+	snprintf(lines_buf[i++], HEALTH_INFO_MAX_LINE, "ext_synd 0x%04x\n", ioread16be(&h->ext_synd));
 	fw = ioread32be(&h->fw_ver);
-	dev_err(&dev->pdev->dev, "raw fw_ver 0x%08x\n", fw);
+	sprintf(lines_buf[i], "raw fw_ver 0x%08x\n", fw);
+}
+
+static void mlx5_print_health_info(struct mlx5_core_dev *dev)
+{
+	char lines_buf[HEALTH_INFO_LINES][HEALTH_INFO_MAX_LINE] = {};
+	u8 synd;
+	int i;
+
+	mlx5_get_health_info(dev, &synd, lines_buf);
+
+	if (!synd)
+		return;
+
+	for (i = 0; i < HEALTH_INFO_LINES; i++)
+		dev_err(&dev->pdev->dev, lines_buf[i]);
+
 }
 
 static unsigned long get_next_poll_jiffies(void)
@@ -461,7 +479,7 @@ static void poll_health(struct timer_list *t)
 	health->prev = count;
 	if (health->miss_counter == MAX_MISSES) {
 		dev_err(&dev->pdev->dev, "device's health compromised - reached miss count\n");
-		print_health_info(dev);
+		mlx5_print_health_info(dev);
 	}
 
 	fatal_error = check_fatal_sensors(dev);
@@ -469,7 +487,7 @@ static void poll_health(struct timer_list *t)
 	if (fatal_error && !health->fatal_error) {
 		mlx5_core_err(dev, "Fatal error %u detected\n", fatal_error);
 		dev->priv.health.fatal_error = fatal_error;
-		print_health_info(dev);
+		mlx5_print_health_info(dev);
 		mlx5_trigger_health_work(dev);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index bf118e85aefe..af5cf19cc019 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -204,4 +204,11 @@ enum {
 
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
+
+#define HEALTH_INFO_MAX_LINE 64
+#define HEALTH_INFO_LINES (MLX5_FLD_SZ_DW(health_buffer, assert_var) + 8)
+#define HEALTH_INFO_MAX_BUFF (HEALTH_INFO_MAX_LINE * HEALTH_INFO_LINES)
+void mlx5_get_health_info(struct mlx5_core_dev *dev, u8 *synd,
+			  char (*lines_buf)[HEALTH_INFO_MAX_LINE]);
+
 #endif /* __MLX5_CORE_H__ */
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index cac801609d76..9125289d1f46 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -61,6 +61,7 @@
 #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
 
 #define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
+#define MLX5_FLD_SZ_DW(typ, fld) (__mlx5_bit_sz(typ, fld) / 32)
 #define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
 #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
 #define MLX5_ST_SZ_QW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 64)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 35fe5217b244..203f937e9efb 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -3371,7 +3371,9 @@ union mlx5_ifc_event_auto_bits {
 };
 
 struct mlx5_ifc_health_buffer_bits {
-	u8         reserved_at_0[0x100];
+	u8         assert_var[0xa0];
+
+	u8         reserved_at_a0[0x60];
 
 	u8         assert_existptr[0x20];
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 20/27] net/mlx5: Create FW devlink_health_reporter
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (18 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 19/27] net/mlx5: Refactor print health info Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 21/27] net/mlx5: Add core dump register access functions Eran Ben Elisha
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Create mlx5_devlink_health_reporter for FW reporter. The FW reporter
implements devlink_health_reporter diagnose callback.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 114 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |   8 ++
 .../net/ethernet/mellanox/mlx5/core/main.c    |   5 +
 include/linux/mlx5/driver.h                   |   1 +
 4 files changed, 128 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 88190acae873..b28a923e3645 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -5,6 +5,120 @@
 #include <linux/mlx5/driver.h>
 #include "lib/mlx5.h"
 
+static int
+mlx5_devlink_health_buffer_fill_syndrom(struct devlink_health_buffer *dh_buffer,
+					u8 syndrom)
+{
+	int err;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_object_name(dh_buffer, "Syndrom");
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_value_u8(dh_buffer, syndrom);
+	if (err)
+		return err;
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+
+	return 0;
+}
+
+static int
+mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+			  struct devlink_health_buffer **buffers_array,
+			  unsigned int buff_size, unsigned int num_buffers)
+{
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	char lines_buf[HEALTH_INFO_LINES][HEALTH_INFO_MAX_LINE] = {};
+	struct devlink_health_buffer *buffer;
+	u8 synd;
+	int err;
+	int i;
+
+	if (!buffers_array || buff_size < HEALTH_INFO_MAX_BUFF ||
+	    num_buffers < 1 || !buffers_array[0])
+		return -EINVAL;
+
+	buffer = buffers_array[0];
+	mlx5_get_health_info(dev, &synd, lines_buf);
+	mlx5_devlink_health_buffer_fill_syndrom(buffer, synd);
+
+	if (!synd)
+		return 0;
+
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_object_name(buffer, "diagnose data");
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY);
+	if (err)
+		return err;
+
+	for (i = 0; i < HEALTH_INFO_LINES; i++) {
+		err = devlink_health_buffer_nest_start(buffer, DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+		if (err)
+			return err;
+		err = devlink_health_buffer_put_value_string(buffer, lines_buf[i]);
+		if (err)
+			return err;
+		devlink_health_buffer_nest_end(buffer);
+	}
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT_VALUE_ARRAY */
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT_VALUE */
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT_PAIR */
+	devlink_health_buffer_nest_end(buffer); /* DEVLINK_HEALTH_BUFFER_OBJECT */
+
+	return 0;
+}
+
+static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
+		.name = "FW",
+		.diagnose_size = HEALTH_INFO_MAX_BUFF,
+		.diagnose = mlx5_fw_reporter_diagnose,
+};
+
+int mlx5_fw_reporter_create(struct mlx5_core_dev *dev)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+
+	dev->fw_reporter = devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
+							  0, false, dev);
+	return PTR_ERR_OR_ZERO(dev->fw_reporter);
+}
+
+void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev)
+{
+	if (!dev->fw_reporter)
+		return;
+
+	devlink_health_reporter_destroy(dev->fw_reporter);
+}
+
 static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id,
 					    struct devlink_param_gset_ctx *ctx)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 455bfa4e89c0..34f6bfed1cfb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -5,8 +5,16 @@
 #define __MLX5_DEVLINK_H__
 
 #include <net/devlink.h>
+#include "mlx5_core.h"
+
+struct mlx5_fw_reporter_ctx {
+	u8 err_synd;
+	int miss_counter;
+};
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
+int mlx5_fw_reporter_create(struct mlx5_core_dev *dev);
+void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index c6c07ca0e423..010a6a28e08d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -980,6 +980,10 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_fw_tracer;
 	}
 
+	err = mlx5_fw_reporter_create(dev);
+	if (err)
+		dev_warn(&pdev->dev, "Failed to create FW reporter\n");
+
 	err = mlx5_fpga_device_start(dev);
 	if (err) {
 		dev_err(&pdev->dev, "fpga device start failed %d\n", err);
@@ -1112,6 +1116,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_accel_ipsec_cleanup(dev);
 	mlx5_accel_tls_cleanup(dev);
 	mlx5_fpga_device_stop(dev);
+	mlx5_fw_reporter_destroy(dev);
 	mlx5_fw_tracer_cleanup(dev->tracer);
 	mlx5_eq_table_destroy(dev);
 	mlx5_pagealloc_stop(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 778710ff77c9..350f11fa0c99 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -682,6 +682,7 @@ struct mlx5_core_dev {
 	struct mlx5_ib_clock_info  *clock_info;
 	struct page             *clock_info_page;
 	struct mlx5_fw_tracer   *tracer;
+	struct devlink_health_reporter *fw_reporter;
 	u32                      vsc_addr;
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 21/27] net/mlx5: Add core dump register access functions
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (19 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 20/27] net/mlx5: Create FW devlink_health_reporter Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 22/27] net/mlx5: Add support for FW reporter dump Eran Ben Elisha
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Add access functions to core dump register to enable trigger FW core
dump.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../mellanox/mlx5/core/diag/fw_tracer.c       | 32 +++++++++++++++++++
 include/linux/mlx5/driver.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 17 +++++++++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index 6999f4486e9e..89f6b459d21d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -786,6 +786,38 @@ static void mlx5_fw_tracer_ownership_change(struct work_struct *work)
 	mlx5_fw_tracer_start(tracer);
 }
 
+static int mlx5_fw_tracer_set_core_dump_reg(struct mlx5_core_dev *dev,
+					    u32 *in, int size_in)
+{
+	u32 out[MLX5_ST_SZ_DW(core_dump_reg)] = {0};
+
+	if (!MLX5_CAP_DEBUG(dev, core_dump_general) &&
+	    !MLX5_CAP_DEBUG(dev, core_dump_qp))
+		return -EOPNOTSUPP;
+
+	return mlx5_core_access_reg(dev, in, size_in, out, sizeof(out),
+				    MLX5_REG_CORE_DUMP, 0, 1);
+}
+
+int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev)
+{
+	struct mlx5_fw_tracer *tracer = dev->tracer;
+	u32 in[MLX5_ST_SZ_DW(core_dump_reg)] = {0};
+	int err;
+
+	if (!MLX5_CAP_DEBUG(dev, core_dump_general) || !tracer)
+		return -EOPNOTSUPP;
+
+	MLX5_SET(core_dump_reg, in, core_dump_type, 0x0);
+
+	err =  mlx5_fw_tracer_set_core_dump_reg(dev, in, sizeof(in));
+	if (err)
+		return err;
+	queue_work(tracer->work_queue, &tracer->handle_traces_work);
+	flush_workqueue(tracer->work_queue);
+	return 0;
+}
+
 /* Create software resources (Buffers, etc ..) */
 struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 350f11fa0c99..cd23926aaf4a 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -109,6 +109,7 @@ enum {
 	MLX5_REG_FPGA_CAP	 = 0x4022,
 	MLX5_REG_FPGA_CTRL	 = 0x4023,
 	MLX5_REG_FPGA_ACCESS_REG = 0x4024,
+	MLX5_REG_CORE_DUMP	 = 0x402e,
 	MLX5_REG_PCAP		 = 0x5001,
 	MLX5_REG_PMTU		 = 0x5003,
 	MLX5_REG_PTYS		 = 0x5004,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 203f937e9efb..0063274711b0 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -679,7 +679,9 @@ struct mlx5_ifc_qos_cap_bits {
 };
 
 struct mlx5_ifc_debug_cap_bits {
-	u8         reserved_at_0[0x20];
+	u8         core_dump_general[0x1];
+	u8         core_dump_qp[0x1];
+	u8         reserved_at_2[0x1e];
 
 	u8         reserved_at_20[0x2];
 	u8         stall_detect[0x1];
@@ -2464,6 +2466,7 @@ union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_e_switch_cap_bits e_switch_cap;
 	struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_debug_cap_bits debug_cap;
 	struct mlx5_ifc_fpga_cap_bits fpga_cap;
 	u8         reserved_at_0[0x8000];
 };
@@ -8421,6 +8424,18 @@ struct mlx5_ifc_qcam_reg_bits {
 	u8         reserved_at_1c0[0x80];
 };
 
+struct mlx5_ifc_core_dump_reg_bits {
+	u8         reserved_at_0[0x18];
+	u8         core_dump_type[0x8];
+
+	u8         reserved_at_20[0x30];
+	u8         vhca_id[0x10];
+
+	u8         reserved_at_60[0x8];
+	u8         qpn[0x18];
+	u8         reserved_at_80[0x180];
+};
+
 struct mlx5_ifc_pcap_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         local_port[0x8];
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 22/27] net/mlx5: Add support for FW reporter dump
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (20 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 21/27] net/mlx5: Add core dump register access functions Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 23/27] net/mlx5: Report devlink health on FW issues Eran Ben Elisha
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Add support of dump callback for mlx5 FW reporter.
Once we trigger FW dump, the FW will write the core dump to its raw data
buffer. The tracer translates the raw data to traces and save it to a
buffer. Once dump is done, the saved traces data is filled as objects
into the dump buffer.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 112 ++++++++++++++++++
 .../mellanox/mlx5/core/diag/fw_tracer.c       |  44 +++++++
 .../mellanox/mlx5/core/diag/fw_tracer.h       |  11 ++
 3 files changed, 167 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index b28a923e3645..3124ced06d51 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -4,6 +4,7 @@
 #include <devlink.h>
 #include <linux/mlx5/driver.h>
 #include "lib/mlx5.h"
+#include "diag/fw_tracer.h"
 
 static int
 mlx5_devlink_health_buffer_fill_syndrom(struct devlink_health_buffer *dh_buffer,
@@ -36,6 +37,115 @@ mlx5_devlink_health_buffer_fill_syndrom(struct devlink_health_buffer *dh_buffer,
 	return 0;
 }
 
+int mlx5_devlink_health_buffer_fill_trace(struct devlink_health_buffer *dh_buffer,
+					  char *trace)
+{
+	int nest = 0;
+	int err = 0;
+	int i;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		goto nest_cancel;
+	nest++;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		goto nest_cancel;
+	nest++;
+
+	err = devlink_health_buffer_put_object_name(dh_buffer, "trace");
+	if (err)
+		goto nest_cancel;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		goto nest_cancel;
+	nest++;
+
+	err = devlink_health_buffer_put_value_string(dh_buffer, trace);
+	if (err)
+		goto nest_cancel;
+
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_end(dh_buffer);
+	return 0;
+nest_cancel:
+	for (i = 0; i < nest; i++)
+		devlink_health_buffer_nest_cancel(dh_buffer);
+
+	return err;
+}
+
+int mlx5_fw_tracer_get_saved_traces_objects(struct mlx5_fw_tracer *tracer,
+					    struct devlink_health_buffer **buffers_array,
+					    unsigned int num_buffers)
+{
+	u32 saved_traces_index = tracer->sbuff.saved_traces_index;
+	char *saved_traces = tracer->sbuff.traces_buff;
+	u32 index, start_index, end_index;
+	u32 dh_buffer_index = 0;
+	int err = 0;
+
+	if (!saved_traces[0])
+		return -ENOMSG;
+
+	if (saved_traces[saved_traces_index * TRACE_STR_LINE])
+		start_index = saved_traces_index;
+	else
+		start_index = 0;
+	end_index = (saved_traces_index - 1) & (SAVED_TRACES_NUM - 1);
+
+	index = start_index;
+	while (index <= end_index) {
+		err = mlx5_devlink_health_buffer_fill_trace(buffers_array[dh_buffer_index],
+							    saved_traces + index * TRACE_STR_LINE);
+		if (err) {
+			dh_buffer_index++;
+			if (dh_buffer_index == num_buffers)
+				break;
+		} else {
+			index++;
+		}
+	}
+
+	return err;
+}
+
+static int
+mlx5_fw_reporter_dump(struct devlink_health_reporter *reporter,
+		      struct devlink_health_buffer **buffers_array,
+		      unsigned int buff_size, unsigned int num_buffers,
+		      void *priv_ctx)
+{
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	struct devlink_health_buffer *buffer;
+	int err;
+
+	if (!buffers_array || buff_size < TRACE_STR_LINE || num_buffers < 1)
+		return -EINVAL;
+
+	err = mlx5_fw_tracer_trigger_core_dump_general(dev);
+	if (err)
+		return err;
+
+	buffer = buffers_array[0];
+	if (priv_ctx) {
+		struct mlx5_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
+
+		err = mlx5_devlink_health_buffer_fill_syndrom(buffer,
+							      fw_reporter_ctx->err_synd);
+		if (err)
+			return err;
+	}
+
+	return mlx5_fw_tracer_get_saved_traces_objects(dev->tracer, buffers_array,
+						       num_buffers);
+}
+
 static int
 mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
 			  struct devlink_health_buffer **buffers_array,
@@ -98,7 +208,9 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
 
 static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.name = "FW",
+		.dump_size = SAVED_TRACES_BUFFER_SIZE_BYTE,
 		.diagnose_size = HEALTH_INFO_MAX_BUFF,
+		.dump = mlx5_fw_reporter_dump,
 		.diagnose = mlx5_fw_reporter_diagnose,
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index 89f6b459d21d..781a7aa09816 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -243,6 +243,24 @@ static int mlx5_fw_tracer_allocate_strings_db(struct mlx5_fw_tracer *tracer)
 	return -ENOMEM;
 }
 
+static int mlx5_fw_tracer_allocate_saved_traces_buff(struct mlx5_fw_tracer *tracer)
+{
+	int traces_buff_size = SAVED_TRACES_BUFFER_SIZE_BYTE;
+
+	tracer->sbuff.traces_buff = kzalloc(traces_buff_size, GFP_KERNEL);
+	if (!tracer->sbuff.traces_buff)
+		return -ENOMEM;
+	tracer->sbuff.saved_traces_index = 0;
+
+	return 0;
+}
+
+static void mlx5_fw_tracer_free_saved_traces_buff(struct mlx5_fw_tracer *tracer)
+{
+	kfree(tracer->sbuff.traces_buff);
+	tracer->sbuff.traces_buff = NULL;
+}
+
 static void mlx5_tracer_read_strings_db(struct work_struct *work)
 {
 	struct mlx5_fw_tracer *tracer = container_of(work, struct mlx5_fw_tracer,
@@ -522,6 +540,20 @@ static void mlx5_fw_tracer_clean_ready_list(struct mlx5_fw_tracer *tracer)
 		list_del(&str_frmt->list);
 }
 
+static void mlx5_fw_tracer_save_trace(struct mlx5_fw_tracer *tracer, u64 timestamp,
+				      bool lost, u8 event_id, char *msg)
+{
+	char *saved_traces = tracer->sbuff.traces_buff;
+	u32 offset;
+
+	offset = tracer->sbuff.saved_traces_index * TRACE_STR_LINE;
+	snprintf(saved_traces + offset, TRACE_STR_LINE, "%s [0x%llx] %d [0x%x] %s",
+		 dev_name(&tracer->dev->pdev->dev), timestamp, lost, event_id, msg);
+
+	tracer->sbuff.saved_traces_index =
+		(tracer->sbuff.saved_traces_index + 1) & (SAVED_TRACES_NUM - 1);
+}
+
 static void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt,
 				    struct mlx5_core_dev *dev,
 				    u64 trace_timestamp)
@@ -540,6 +572,9 @@ static void mlx5_tracer_print_trace(struct tracer_string_format *str_frmt,
 	trace_mlx5_fw(dev->tracer, trace_timestamp, str_frmt->lost,
 		      str_frmt->event_id, tmp);
 
+	mlx5_fw_tracer_save_trace(dev->tracer, trace_timestamp,
+				  str_frmt->lost, str_frmt->event_id, tmp);
+
 	/* remove it from hash */
 	mlx5_tracer_clean_message(str_frmt);
 }
@@ -865,10 +900,18 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev)
 		goto free_log_buf;
 	}
 
+	err = mlx5_fw_tracer_allocate_saved_traces_buff(tracer);
+	if (err) {
+		mlx5_core_warn(dev, "FWTracer: Create saved traces buffer failed %d\n", err);
+		goto free_strings_db;
+	}
+
 	mlx5_core_dbg(dev, "FWTracer: Tracer created\n");
 
 	return tracer;
 
+free_strings_db:
+	mlx5_fw_tracer_free_strings_db(tracer);
 free_log_buf:
 	mlx5_fw_tracer_destroy_log_buf(tracer);
 destroy_workqueue:
@@ -949,6 +992,7 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer)
 	cancel_work_sync(&tracer->read_fw_strings_work);
 	mlx5_fw_tracer_clean_ready_list(tracer);
 	mlx5_fw_tracer_clean_print_hash(tracer);
+	mlx5_fw_tracer_free_saved_traces_buff(tracer);
 	mlx5_fw_tracer_free_strings_db(tracer);
 	mlx5_fw_tracer_destroy_log_buf(tracer);
 	flush_workqueue(tracer->work_queue);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
index a8b8747f2b61..59047c8822fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
@@ -46,6 +46,10 @@
 #define TRACER_BLOCK_SIZE_BYTE 256
 #define TRACES_PER_BLOCK 32
 
+#define TRACE_STR_LINE 256
+#define SAVED_TRACES_NUM 1024
+#define SAVED_TRACES_BUFFER_SIZE_BYTE (SAVED_TRACES_NUM * TRACE_STR_LINE)
+
 #define TRACER_MAX_PARAMS 7
 #define MESSAGE_HASH_BITS 6
 #define MESSAGE_HASH_SIZE BIT(MESSAGE_HASH_BITS)
@@ -83,6 +87,12 @@ struct mlx5_fw_tracer {
 		u32 consumer_index;
 	} buff;
 
+	/* Saved Tarces Buffer */
+	struct {
+		void *traces_buff;
+		u32 saved_traces_index;
+	} sbuff;
+
 	u64 last_timestamp;
 	struct work_struct handle_traces_work;
 	struct hlist_head hash[MESSAGE_HASH_SIZE];
@@ -171,5 +181,6 @@ struct mlx5_fw_tracer *mlx5_fw_tracer_create(struct mlx5_core_dev *dev);
 int mlx5_fw_tracer_init(struct mlx5_fw_tracer *tracer);
 void mlx5_fw_tracer_cleanup(struct mlx5_fw_tracer *tracer);
 void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer);
+int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev);
 
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 23/27] net/mlx5: Report devlink health on FW issues
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (21 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 22/27] net/mlx5: Add support for FW reporter dump Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 24/27] net/mlx5: Add FW fatal devlink_health_reporter Eran Ben Elisha
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Use devlink_health_report() to report any symptom of FW issue as FW
counter miss or new health syndrom.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 21 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  1 +
 .../net/ethernet/mellanox/mlx5/core/health.c  | 12 +++++++++++
 include/linux/mlx5/driver.h                   |  2 ++
 4 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 3124ced06d51..5713f89d9235 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -206,6 +206,27 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
 	return 0;
 }
 
+void mlx5_fw_reporter_err_work(struct work_struct *work)
+{
+	struct mlx5_fw_reporter_ctx fw_reporter_ctx;
+	struct mlx5_core_health *health;
+	struct mlx5_core_dev *dev;
+	struct mlx5_priv *priv;
+
+	health = container_of(work, struct mlx5_core_health, report_work);
+	priv = container_of(health, struct mlx5_priv, health);
+	dev = container_of(priv, struct mlx5_core_dev, priv);
+
+	fw_reporter_ctx.err_synd = health->synd;
+	fw_reporter_ctx.miss_counter = health->miss_counter;
+	if (fw_reporter_ctx.err_synd)
+		devlink_health_report(dev->fw_reporter, "FW syndrom reported",
+				      &fw_reporter_ctx);
+	else if (fw_reporter_ctx.miss_counter)
+		devlink_health_report(dev->fw_reporter, "FW miss counter reported",
+				      &fw_reporter_ctx);
+}
+
 static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.name = "FW",
 		.dump_size = SAVED_TRACES_BUFFER_SIZE_BYTE,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 34f6bfed1cfb..082a648a3af3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -16,5 +16,6 @@ int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
 int mlx5_fw_reporter_create(struct mlx5_core_dev *dev);
 void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev);
+void mlx5_fw_reporter_err_work(struct work_struct *work);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 20576a525421..61ff82380093 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -38,6 +38,7 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
+#include "devlink.h"
 #include "lib/eq.h"
 #include "lib/mlx5.h"
 #include "lib/pci_vsc.h"
@@ -464,8 +465,10 @@ static void poll_health(struct timer_list *t)
 {
 	struct mlx5_core_dev *dev = from_timer(dev, t, priv.health.timer);
 	struct mlx5_core_health *health = &dev->priv.health;
+	struct health_buffer __iomem *h = health->health;
 	u32 fatal_error;
 	u32 count;
+	u8 synd;
 
 	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
 		goto out;
@@ -480,6 +483,13 @@ static void poll_health(struct timer_list *t)
 	if (health->miss_counter == MAX_MISSES) {
 		dev_err(&dev->pdev->dev, "device's health compromised - reached miss count\n");
 		mlx5_print_health_info(dev);
+		queue_work(health->wq, &health->report_work);
+	}
+
+	synd = ioread8(&h->synd);
+	if (synd && synd != health->synd) {
+		health->synd = synd;
+		queue_work(health->wq, &health->report_work);
 	}
 
 	fatal_error = check_fatal_sensors(dev);
@@ -535,6 +545,7 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev)
 	set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
 	spin_unlock_irqrestore(&health->wq_lock, flags);
 	cancel_delayed_work_sync(&health->recover_work);
+	cancel_work_sync(&health->report_work);
 	cancel_work_sync(&health->work);
 }
 
@@ -574,6 +585,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 		return -ENOMEM;
 	spin_lock_init(&health->wq_lock);
 	INIT_WORK(&health->work, health_care);
+	INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work);
 	INIT_DELAYED_WORK(&health->recover_work, health_recover);
 	health->crdump = NULL;
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index cd23926aaf4a..b5393684a6df 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -435,12 +435,14 @@ struct mlx5_core_health {
 	struct timer_list		timer;
 	u32				prev;
 	int				miss_counter;
+	u8				synd;
 	u32				fatal_error;
 	/* wq spinlock to synchronize draining */
 	spinlock_t			wq_lock;
 	struct workqueue_struct	       *wq;
 	unsigned long			flags;
 	struct work_struct		work;
+	struct work_struct		report_work;
 	struct delayed_work		recover_work;
 	struct mlx5_fw_crdump	       *crdump;
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 24/27] net/mlx5: Add FW fatal devlink_health_reporter
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (22 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 23/27] net/mlx5: Report devlink health on FW issues Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 25/27] net/mlx5: Add support for FW fatal reporter dump Eran Ben Elisha
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Create mlx5_devlink_health_reporter for FW fatal reporter.
The FW fatal reporter is added in addition to the fw reporter and
implements only the recover callback.
The point of having two reporters for FW issues, is that we
don't want to run FW recover on any issue, but only fatal ones.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 49 +++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  4 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |  6 +--
 .../net/ethernet/mellanox/mlx5/core/main.c    |  6 +--
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  1 +
 include/linux/mlx5/driver.h                   |  1 +
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 5713f89d9235..ae08af00b101 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -235,21 +235,52 @@ static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.diagnose = mlx5_fw_reporter_diagnose,
 };
 
-int mlx5_fw_reporter_create(struct mlx5_core_dev *dev)
+static int
+mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
+			       void *priv_ctx)
 {
-	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
 
-	dev->fw_reporter = devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
-							  0, false, dev);
-	return PTR_ERR_OR_ZERO(dev->fw_reporter);
+	if (mlx5_sensor_pci_not_working(dev)) {
+		dev_err(&dev->pdev->dev, "health recovery flow aborted, PCI reads still not working\n");
+		return -ECANCELED;
+	}
+	dev_err(&dev->pdev->dev, "starting health recovery flow\n");
+
+	mlx5_recover_device(dev);
+
+	return 0;
 }
 
-void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev)
+static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
+		.name = "FW_fatal",
+		.recover = mlx5_fw_fatal_reporter_recover,
+};
+
+#define MLX5_REPORTER_FW_GRACEFUL_PERIOD 1200000
+int mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
 {
-	if (!dev->fw_reporter)
-		return;
+	struct devlink *devlink = priv_to_devlink(dev);
 
-	devlink_health_reporter_destroy(dev->fw_reporter);
+	dev->fw_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_fw_reporter_ops,
+					       0, false, dev);
+	if (IS_ERR(dev->fw_reporter))
+		return PTR_ERR(dev->fw_reporter);
+
+	dev->fw_fatal_reporter =
+		devlink_health_reporter_create(devlink, &mlx5_fw_fatal_reporter_ops,
+					       MLX5_REPORTER_FW_GRACEFUL_PERIOD,
+					       true, dev);
+	return PTR_ERR_OR_ZERO(dev->fw_fatal_reporter);
+}
+
+void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev)
+{
+	if (dev->fw_reporter)
+		devlink_health_reporter_destroy(dev->fw_reporter);
+	if (dev->fw_fatal_reporter)
+		devlink_health_reporter_destroy(dev->fw_fatal_reporter);
 }
 
 static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 082a648a3af3..9b544f677aa7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -14,8 +14,8 @@ struct mlx5_fw_reporter_ctx {
 
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
 void mlx5_devlink_unregister(struct devlink *devlink);
-int mlx5_fw_reporter_create(struct mlx5_core_dev *dev);
-void mlx5_fw_reporter_destroy(struct mlx5_core_dev *dev);
+int mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
+void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev);
 void mlx5_fw_reporter_err_work(struct work_struct *work);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 61ff82380093..d3ea624bd03c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -91,7 +91,7 @@ void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state)
 		    &dev->iseg->cmdq_addr_l_sz);
 }
 
-static bool sensor_pci_not_working(struct mlx5_core_dev *dev)
+bool mlx5_sensor_pci_not_working(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
 	struct health_buffer __iomem *h = health->health;
@@ -114,7 +114,7 @@ static bool sensor_fw_synd_rfr(struct mlx5_core_dev *dev)
 
 static u32 check_fatal_sensors(struct mlx5_core_dev *dev)
 {
-	if (sensor_pci_not_working(dev))
+	if (mlx5_sensor_pci_not_working(dev))
 		return MLX5_SENSOR_PCI_COMM_ERR;
 	if (pci_channel_offline(dev->pdev))
 		return MLX5_SENSOR_PCI_ERR;
@@ -315,7 +315,7 @@ static void health_recover(struct work_struct *work)
 	priv = container_of(health, struct mlx5_priv, health);
 	dev = container_of(priv, struct mlx5_core_dev, priv);
 
-	if (sensor_pci_not_working(dev)) {
+	if (mlx5_sensor_pci_not_working(dev)) {
 		dev_err(&dev->pdev->dev, "health recovery flow aborted, PCI reads still not working\n");
 		return;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 010a6a28e08d..2e7c54975f47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -980,9 +980,9 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_fw_tracer;
 	}
 
-	err = mlx5_fw_reporter_create(dev);
+	err = mlx5_fw_reporters_create(dev);
 	if (err)
-		dev_warn(&pdev->dev, "Failed to create FW reporter\n");
+		dev_warn(&pdev->dev, "Failed to create FW reporters\n");
 
 	err = mlx5_fpga_device_start(dev);
 	if (err) {
@@ -1116,7 +1116,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_accel_ipsec_cleanup(dev);
 	mlx5_accel_tls_cleanup(dev);
 	mlx5_fpga_device_stop(dev);
-	mlx5_fw_reporter_destroy(dev);
+	mlx5_fw_reporters_destroy(dev);
 	mlx5_fw_tracer_cleanup(dev->tracer);
 	mlx5_eq_table_destroy(dev);
 	mlx5_pagealloc_stop(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index af5cf19cc019..dca9a736c2cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -204,6 +204,7 @@ enum {
 
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
+bool mlx5_sensor_pci_not_working(struct mlx5_core_dev *dev);
 
 #define HEALTH_INFO_MAX_LINE 64
 #define HEALTH_INFO_LINES (MLX5_FLD_SZ_DW(health_buffer, assert_var) + 8)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index b5393684a6df..f2c6850b0ec4 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -686,6 +686,7 @@ struct mlx5_core_dev {
 	struct page             *clock_info_page;
 	struct mlx5_fw_tracer   *tracer;
 	struct devlink_health_reporter *fw_reporter;
+	struct devlink_health_reporter *fw_fatal_reporter;
 	u32                      vsc_addr;
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 25/27] net/mlx5: Add support for FW fatal reporter dump
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (23 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 24/27] net/mlx5: Add FW fatal devlink_health_reporter Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 26/27] net/mlx5: Report devlink health on FW fatal issues Eran Ben Elisha
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Add support of dump callback for mlx5 FW fatal reporter.
The FW fatal dump use cr-dump functionality to gather cr-space data for
debug. The cr-dump uses vsc interface which is valid even if the FW
command interface is not functional, which is the case in FW fatal errors.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index ae08af00b101..406856002dd1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -252,9 +252,100 @@ mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
 	return 0;
 }
 
+static int
+mlx5_devlink_health_buffer_fill_snapshot(struct devlink_health_buffer *dh_buffer,
+					 char *crdump_region, u32 snapshot_id)
+{
+	int err;
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_object_name(dh_buffer,
+						    "devlink region name");
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_value_string(dh_buffer, crdump_region);
+	if (err)
+		return err;
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_object_name(dh_buffer, "snapshot id");
+	if (err)
+		return err;
+	err = devlink_health_buffer_nest_start(dh_buffer,
+					       DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
+	if (err)
+		return err;
+	err = devlink_health_buffer_put_value_u32(dh_buffer, snapshot_id);
+	if (err)
+		return err;
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+	devlink_health_buffer_nest_end(dh_buffer);
+
+	return 0;
+}
+
+#define MLX5_FW_FATAL_REPORTER_DUMP_SIZE_BYTE 256
+static int
+mlx5_fw_fatal_reporter_dump(struct devlink_health_reporter *reporter,
+			    struct devlink_health_buffer **buffers_array,
+			    unsigned int buff_size, unsigned int num_buffers,
+			    void *priv_ctx)
+{
+	struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+	struct devlink_health_buffer *buffer;
+	char crdump_region[20];
+	u32 snapshot_id;
+	int err;
+
+	if (!mlx5_core_is_pf(dev)) {
+		mlx5_core_err(dev, "Only PF is permitted run FW fatal dump");
+		return -EPERM;
+	}
+
+	if (!buffers_array || num_buffers < 1 ||
+	    buff_size < MLX5_FW_FATAL_REPORTER_DUMP_SIZE_BYTE)
+		return -EINVAL;
+
+	err = mlx5_crdump_collect(dev, crdump_region, &snapshot_id);
+	if (err)
+		return err;
+
+	buffer = buffers_array[0];
+	if (priv_ctx) {
+		struct mlx5_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
+
+		err = mlx5_devlink_health_buffer_fill_syndrom(buffer,
+							      fw_reporter_ctx->err_synd);
+		if (err)
+			return err;
+	}
+
+	return mlx5_devlink_health_buffer_fill_snapshot(buffer, crdump_region,
+							snapshot_id);
+}
+
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
 		.name = "FW_fatal",
 		.recover = mlx5_fw_fatal_reporter_recover,
+		.dump_size = MLX5_FW_FATAL_REPORTER_DUMP_SIZE_BYTE,
+		.dump = mlx5_fw_fatal_reporter_dump,
 };
 
 #define MLX5_REPORTER_FW_GRACEFUL_PERIOD 1200000
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 26/27] net/mlx5: Report devlink health on FW fatal issues
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (24 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 25/27] net/mlx5: Add support for FW fatal reporter dump Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:05 ` [PATCH net-next 27/27] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Moshe Shemesh <moshe@mellanox.com>

Report devlink health on FW fatal issues via FW_fatal_reporter. The
driver recover flow for FW fatal error is now being handled by the
devlink health.

Having the recovery controlled by devlink health, the user has the
ability to cancel the auto-recovery for debug session and run it
manually.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 19 +++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |  1 +
 .../net/ethernet/mellanox/mlx5/core/health.c  | 23 +------------------
 3 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 406856002dd1..e03160626171 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -235,6 +235,25 @@ static const struct devlink_health_reporter_ops mlx5_fw_reporter_ops = {
 		.diagnose = mlx5_fw_reporter_diagnose,
 };
 
+void mlx5_fw_fatal_reporter_work(struct work_struct *work)
+{
+	struct mlx5_fw_reporter_ctx fw_reporter_ctx;
+	struct mlx5_core_health *health;
+	struct delayed_work *dwork;
+	struct mlx5_core_dev *dev;
+	struct mlx5_priv *priv;
+
+	dwork = container_of(work, struct delayed_work, work);
+	health = container_of(dwork, struct mlx5_core_health, recover_work);
+	priv = container_of(health, struct mlx5_priv, health);
+	dev = container_of(priv, struct mlx5_core_dev, priv);
+
+	fw_reporter_ctx.err_synd = health->synd;
+	fw_reporter_ctx.miss_counter = health->miss_counter;
+	devlink_health_report(dev->fw_fatal_reporter, "FW recovery",
+			      &fw_reporter_ctx);
+}
+
 static int
 mlx5_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
 			       void *priv_ctx)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 9b544f677aa7..9c5eba632293 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -17,5 +17,6 @@ void mlx5_devlink_unregister(struct devlink *devlink);
 int mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
 void mlx5_fw_reporters_destroy(struct mlx5_core_dev *dev);
 void mlx5_fw_reporter_err_work(struct work_struct *work);
+void mlx5_fw_fatal_reporter_work(struct work_struct *work);
 
 #endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index d3ea624bd03c..189000545162 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -303,27 +303,6 @@ static void mlx5_handle_bad_state(struct mlx5_core_dev *dev)
 	mlx5_disable_device(dev);
 }
 
-static void health_recover(struct work_struct *work)
-{
-	struct mlx5_core_health *health;
-	struct delayed_work *dwork;
-	struct mlx5_core_dev *dev;
-	struct mlx5_priv *priv;
-
-	dwork = container_of(work, struct delayed_work, work);
-	health = container_of(dwork, struct mlx5_core_health, recover_work);
-	priv = container_of(health, struct mlx5_priv, health);
-	dev = container_of(priv, struct mlx5_core_dev, priv);
-
-	if (mlx5_sensor_pci_not_working(dev)) {
-		dev_err(&dev->pdev->dev, "health recovery flow aborted, PCI reads still not working\n");
-		return;
-	}
-
-	dev_err(&dev->pdev->dev, "starting health recovery flow\n");
-	mlx5_recover_device(dev);
-}
-
 /* How much time to wait until health resetting the driver (in msecs) */
 #define MLX5_RECOVERY_DELAY_MSECS 60000
 #define MLX5_RECOVERY_NO_DELAY 0
@@ -586,7 +565,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
 	spin_lock_init(&health->wq_lock);
 	INIT_WORK(&health->work, health_care);
 	INIT_WORK(&health->report_work, mlx5_fw_reporter_err_work);
-	INIT_DELAYED_WORK(&health->recover_work, health_recover);
+	INIT_DELAYED_WORK(&health->recover_work, mlx5_fw_fatal_reporter_work);
 	health->crdump = NULL;
 
 	return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH net-next 27/27] devlink: Add Documentation/networking/devlink-health.txt
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (25 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 26/27] net/mlx5: Report devlink health on FW fatal issues Eran Ben Elisha
@ 2019-01-17 15:05 ` Eran Ben Elisha
  2019-01-17 15:09 ` [PATCH] devlink: Add health command support Aya Levin
  2019-01-17 18:52 ` [PATCH net-next 00/27] Devlink health reporting and recovery system Jiri Pirko
  28 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 15:05 UTC (permalink / raw)
  To: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Eran Ben Elisha, Moshe Shemesh

From: Aya Levin <ayal@mellanox.com>

This patch adds a new file to add information about devlink health
mechanism.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 Documentation/networking/devlink-health.txt | 86 +++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 Documentation/networking/devlink-health.txt

diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink-health.txt
new file mode 100644
index 000000000000..1db3fbea0831
--- /dev/null
+++ b/Documentation/networking/devlink-health.txt
@@ -0,0 +1,86 @@
+The health mechanism is targeted for Real Time Alerting, in order to know when
+something bad had happened to a PCI device
+- Provide alert debug information
+- Self healing
+- If problem needs vendor support, provide a way to gather all needed debugging
+  information.
+
+The main idea is to unify and centralize driver health reports in the
+generic devlink instance and allow the user to set different
+attributes of the health reporting and recovery procedures.
+
+The devlink health reporter:
+Device driver creates a "health reporter" per each error/health type.
+Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+or unknown (driver specific).
+For each registered health reporter a driver can issue error/health reports
+asynchronously. All health reports handling is done by devlink.
+Device driver can provide specific callbacks for each "health reporter", e.g.
+ - Recovery procedures
+ - Diagnostics and object dump procedures
+ - OOB initial parameters
+Different parts of the driver can register different types of health reporters
+with different handlers.
+
+Once an error is reported, devlink health will do the following actions:
+  * A log is being send to the kernel trace events buffer
+  * Health status and statistics are being updated for the reporter instance
+  * Object dump is being taken and saved at the reporter instance (as long as
+    there is no other dump which is already stored)
+  * Auto recovery attempt is being done. Depends on:
+    - Auto-recovery configuration
+    - Grace period vs. time passed since last recover
+
+The user interface:
+User can access/change each reporter's parameters and driver specific callbacks
+via devlink, e.g per error type (per health reporter)
+ - Configure reporter's generic parameters (like: disable/enable auto recovery)
+ - Invoke recovery procedure
+ - Run diagnostics
+ - Object dump
+
+The devlink health interface (via netlink):
+DEVLINK_CMD_HEALTH_REPORTER_GET
+  Retrieves status and configuration info per DEV and reporter.
+DEVLINK_CMD_HEALTH_REPORTER_SET
+  Allows reporter-related configuration setting.
+DEVLINK_CMD_HEALTH_REPORTER_RECOVER
+  Triggers a reporter's recovery procedure.
+DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
+  Retrieves diagnostics data from a reporter on a device.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
+  Retrieves the last stored dump. Devlink health
+  saves a single dump. If an dump is not already stored by the devlink
+  for this reporter, devlink generates a new dump.
+  dump output is defined by the reporter.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
+  Clears the last saved dump file for the specified reporter.
+
+
+                                               netlink
+                                      +--------------------------+
+                                      |                          |
+                                      |            +             |
+                                      |            |             |
+                                      +--------------------------+
+                                                   |request for ops
+                                                   |(diagnose,
+ mlx5_core                             devlink     |recover,
+                                                   |dump)
++--------+                            +--------------------------+
+|        |                            |    reporter|             |
+|        |                            |  +---------v----------+  |
+|        |   ops execution            |  |                    |  |
+|     <----------------------------------+                    |  |
+|        |                            |  |                    |  |
+|        |                            |  + ^------------------+  |
+|        |                            |    | request for ops     |
+|        |                            |    | (recover, dump)     |
+|        |                            |    |                     |
+|        |                            |  +-+------------------+  |
+|        |     health report          |  | health handler     |  |
+|        +------------------------------->                    |  |
+|        |                            |  +--------------------+  |
+|        |     health reporter create |                          |
+|        +---------------------------->                          |
++--------+                            +--------------------------+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH] devlink: Add health command support
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (26 preceding siblings ...)
  2019-01-17 15:05 ` [PATCH net-next 27/27] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
@ 2019-01-17 15:09 ` Aya Levin
  2019-01-17 18:52 ` [PATCH net-next 00/27] Devlink health reporting and recovery system Jiri Pirko
  28 siblings, 0 replies; 31+ messages in thread
From: Aya Levin @ 2019-01-17 15:09 UTC (permalink / raw)
  To: David Ahern, netdev, David S. Miller, Jiri Pirko
  Cc: Moshe Shemesh, Aya Levin, Eran Ben Elisha, Tal Alon, Ariel Almog

This patch add support for the following commands:
Devlink health show [DEV reporter REPORTE_NAME],
Devlink health recover DEV reporter REPORTER_NAME,
Devlink health diagnose DEV reporter REPORTER_NAME,
Devlink health dump show DEV reporter REPORTER_NAME,
Devlink health dump clear DEV reporter REPORTER_NAME,
Devlink health set DEV reporter REPORTER_NAME NAME VALUE

 * Devlink health show command displays status and configuration info on
   specific reporter on a device or dump the info on all reporters on all
   devices.
 * Devlink health recover enables the user to initiate a
   recovery on a reporter. This operation will increment the recoveries
   counter displayed in the show command.
 * Devlink health diagnose enables the user to retrieve diagnostics data
   on a reporter on a device. The command's output is a free text defined
   by the reporter.
 * Devlink health dump show displays the last saved dump. Devlink
   health saves a single dump. If an dump is not already stored by
   the Devlink for this reporter, Devlink generates a new dump. The
   dump can be generated automatically when a reporter reports on an
   error or by the user's request.
   dump output is defined by the reporter.
 * Devlink health dump clear, deletes the last saved dump file.
 * Devlink health set, enables the user to configure:
	1) grace_period [msec] time interval between auto recoveries.
	2) auto_recover [true/false] whether the devlink should execute
	automatic recover on error.

Examples:
$devlink health show pci/0000:00:09.0 reporter TX
pci/0000:00:09.0:
name TX state healthy #err 0 #recover 1 last_dump_ts N/A dump_available false
  attributes:
    grace period 600 auto_recover true
$devlink health diagnose pci/0000:00:09.0 reporter TX
SQ 0x9b: HW state: 1 stopped: 0
SQ 0x9f: HW state: 1 stopped: 0
SQ 0xa3: HW state: 1 stopped: 0
SQ 0xa7: HW state: 1 stopped: 0
SQ 0xab: HW state: 1 stopped: 0
$devlink health dump show pci/0000:00:09.0 reporter TX
TX dump data
$devlink health dump clear pci/0000:00:09.0 reporter TX
$devlink health set pci/0000:00:09.0 reporter TX grace_period 3500
$devlink health set pci/0000:00:09.0 reporter TX auto_recover false

Change-Id: Iaa58e70db360cb134d8384965a795eca4c1b09ef
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 devlink/devlink.c            | 533 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/devlink.h |  29 +++
 man/man8/devlink-health.8    | 175 ++++++++++++++
 man/man8/devlink.8           |   7 +-
 4 files changed, 740 insertions(+), 4 deletions(-)
 create mode 100644 man/man8/devlink-health.8

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 3651e90c1159..9aca7c2c9bd4 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -22,7 +22,7 @@
 #include <linux/devlink.h>
 #include <libmnl/libmnl.h>
 #include <netinet/ether.h>
-
+#include <sys/sysinfo.h>
 #include "SNAPSHOT.h"
 #include "list.h"
 #include "mnlg.h"
@@ -40,6 +40,12 @@
 #define PARAM_CMODE_DRIVERINIT_STR "driverinit"
 #define PARAM_CMODE_PERMANENT_STR "permanent"
 
+#define HEALTH_REPORTER_STATE_HEALTHY_STR "healthy"
+#define HEALTH_REPORTER_STATE_ERROR_STR "error"
+#define HEALTH_REPORTER_GRACE_PERIOD_STR "grace_period"
+#define HEALTH_REPORTER_AUTO_RECOVER_STR "auto_recover"
+#define HEALTH_REPORTER_TS_LEN 80
+
 static int g_new_line_count;
 
 #define pr_err(args...) fprintf(stderr, ##args)
@@ -199,6 +205,10 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_REGION_SNAPSHOT_ID	BIT(22)
 #define DL_OPT_REGION_ADDRESS		BIT(23)
 #define DL_OPT_REGION_LENGTH		BIT(24)
+#define DL_OPT_HANDLE_HEALTH		BIT(25)
+#define DL_OPT_HEALTH_REPORTER_NAME	BIT(26)
+#define DL_OPT_HEALTH_REPORTER_DEV	BIT(27)
+
 
 struct dl_opts {
 	uint32_t present; /* flags of present items */
@@ -230,6 +240,10 @@ struct dl_opts {
 	uint32_t region_snapshot_id;
 	uint64_t region_address;
 	uint64_t region_length;
+	const char *reporter_name;
+	const char *reporter_param_name;
+	const char *reporter_param_value;
+
 };
 
 struct dl {
@@ -959,7 +973,7 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		if (err)
 			return err;
 		o_found |= handle_bit;
-	} else if (o_required & DL_OPT_HANDLE) {
+	} else if (DL_OPT_HANDLE) {
 		err = dl_argv_handle(dl, &opts->bus_name, &opts->dev_name);
 		if (err)
 			return err;
@@ -1178,6 +1192,15 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_REGION_LENGTH;
+		} else if (dl_argv_match(dl, "reporter") &&
+			   (o_all & DL_OPT_HEALTH_REPORTER_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->reporter_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_HEALTH_REPORTER_NAME;
+			o_found |= DL_OPT_HANDLE;
+			break;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -1298,6 +1321,12 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		return -EINVAL;
 	}
 
+	if ((o_required & DL_OPT_HEALTH_REPORTER_NAME) &&
+	    !(o_found & DL_OPT_HEALTH_REPORTER_NAME)) {
+		pr_err("Reporter name expected.\n");
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -1382,6 +1411,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_REGION_LENGTH)
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_REGION_CHUNK_LEN,
 				 opts->region_length);
+	if (opts->present & DL_OPT_HEALTH_REPORTER_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+				  opts->reporter_name);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -1513,6 +1545,8 @@ static void __pr_out_handle_start(struct dl *dl, struct nlattr **tb,
 				__pr_out_newline();
 				__pr_out_indent_inc();
 				arr_last_handle_set(dl, bus_name, dev_name);
+			} else {
+				__pr_out_indent_inc();
 			}
 		} else {
 			pr_out("%s%s", buf, content ? ":" : "");
@@ -5557,11 +5591,500 @@ static int cmd_region(struct dl *dl)
 	return -ENOENT;
 }
 
+static int cmd_health_set_params(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint64_t period;
+	bool auto_recover;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+	err = dl_argv_parse(dl, DL_OPT_HANDLE |
+			    DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	err = dl_argv_str(dl, &dl->opts.reporter_param_name);
+	if (err)
+		return err;
+	err = dl_argv_str(dl, &dl->opts.reporter_param_value);
+	if (err)
+		return err;
+	dl_opts_put(nlh, dl);
+
+	if (!strncmp(dl->opts.reporter_param_name,
+		     HEALTH_REPORTER_GRACE_PERIOD_STR, strlen("garce"))) {
+		err = strtouint64_t(dl->opts.reporter_param_value, &period);
+		if (err)
+			goto err_param_value_parse;
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_HEALTH_REPORTER_PERIOD,
+				 period);
+	} else if (!strncmp(dl->opts.reporter_param_name,
+			    HEALTH_REPORTER_AUTO_RECOVER_STR,
+			    strlen("auto"))) {
+		err = strtobool(dl->opts.reporter_param_value, &auto_recover);
+		if (err)
+			goto err_param_value_parse;
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC,
+				(uint8_t)auto_recover);
+	} else {
+		printf("Parameter name: %s  is not supported\n",
+		       dl->opts.reporter_param_name);
+		return -ENOTSUP;
+	}
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+
+err_param_value_parse:
+	pr_err("Value \"%s\" is not a number or not within range\n",
+	       dl->opts.param_value);
+	return err;
+}
+
+static int cmd_health_dump_clear(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg,
+			       DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int health_value_show(struct dl *dl, int type, struct nlattr *nl_data)
+{
+	const char *str;
+	uint8_t *data;
+	uint32_t len;
+	uint64_t u64;
+	uint32_t u32;
+	uint16_t u16;
+	uint8_t u8;
+	int i;
+
+	switch (type) {
+	case MNL_TYPE_FLAG:
+		if (dl->json_output)
+			jsonw_string(dl->jw, nl_data ? "true" : "false");
+		else
+			pr_out("%s ", nl_data ? "true" : "false");
+		break;
+	case MNL_TYPE_U8:
+		u8 = mnl_attr_get_u8(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, u8);
+		else
+			pr_out("%u ", u8);
+		break;
+	case MNL_TYPE_U16:
+		u16 = mnl_attr_get_u16(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, u16);
+		else
+			pr_out("%u ", u16);
+		break;
+	case MNL_TYPE_U32:
+		u32 = mnl_attr_get_u32(nl_data);
+		if (dl->json_output)
+			jsonw_uint(dl->jw, u32);
+		else
+			pr_out("%u ", u32);
+		break;
+	case MNL_TYPE_U64:
+		u64 = mnl_attr_get_u64(nl_data);
+		if (dl->json_output)
+			jsonw_u64(dl->jw, u64);
+		else
+			pr_out("%lu ", u64);
+		break;
+	case MNL_TYPE_STRING:
+	case MNL_TYPE_NUL_STRING:
+		str = mnl_attr_get_str(nl_data);
+		if (dl->json_output)
+			jsonw_string(dl->jw, str);
+		else
+			pr_out("%s ", str);
+		break;
+	case MNL_TYPE_BINARY:
+		len = mnl_attr_get_payload_len(nl_data);
+		data = mnl_attr_get_payload(nl_data);
+		i = 0;
+		while (i < len) {
+			if (dl->json_output)
+				jsonw_printf(dl->jw, "%d", data[i]);
+			else
+				pr_out("%02x ", data[i]);
+			i++;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+	return MNL_CB_OK;
+}
+
+static int health_object_pair_show(struct dl *dl, struct nlattr *nl)
+{
+	struct nlattr *nla_pair[DEVLINK_ATTR_MAX + 1] = {};
+	struct nlattr *nla_value[DEVLINK_ATTR_MAX + 1] = {};
+	struct nlattr *nla_value_data;
+	struct nlattr *nla_obj_pair;
+	struct nlattr *nla_object;
+	struct nlattr *nla_array;
+	struct nlattr *nla_val;
+	int err, type;
+	const char *name;
+
+	err = mnl_attr_parse_nested(nl, attr_cb, nla_pair);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME] ||
+	    !nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE])
+		return -EINVAL;
+
+	name = mnl_attr_get_str(nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME]);
+	if (dl->json_output)
+		jsonw_name(dl->jw, name);
+	else
+		pr_out("%s: ", name);
+
+	nla_val = nla_pair[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE];
+	err = mnl_attr_parse_nested(nla_val, attr_cb, nla_value);
+	if (err != MNL_CB_OK)
+		return -EINVAL;
+
+	if (!nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT] &&
+	    !nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY] &&
+	    !nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE])
+		return -EINVAL;
+
+	if (nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT]) {
+		nla_object = nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT];
+		/*inside must be an object pair*/
+		if (dl->json_output)
+			jsonw_start_object(dl->jw);
+
+		mnl_attr_for_each_nested(nla_obj_pair, nla_object) {
+			err = health_object_pair_show(dl, nla_obj_pair);
+			if (err != MNL_CB_OK)
+				break;
+		}
+		if (dl->json_output)
+			jsonw_end_object(dl->jw);
+	}
+	if (nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE]) {
+		type = mnl_attr_get_u8(nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE]);
+		if (!nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA])
+			return -EINVAL;
+		nla_value_data = nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA];
+		health_value_show(dl, type, nla_value_data);
+	}
+	if (nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY]) {
+		if (dl->json_output)
+			jsonw_start_array(dl->jw);
+		nla_array = nla_value[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY];
+		mnl_attr_for_each_nested(nla_object, nla_array) {
+			mnl_attr_for_each_nested(nla_obj_pair, nla_object) {
+				err = health_object_pair_show(dl, nla_obj_pair);
+				if (err != MNL_CB_OK)
+					break;
+			}
+		}
+		if (dl->json_output)
+			jsonw_end_array(dl->jw);
+	}
+	return err;
+}
+
+static int cmd_health_object_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct nlattr *nla_object;
+	struct nlattr *nla_object_pair;
+	struct dl *dl = data;
+	int err = MNL_CB_OK;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_HEALTH_BUFFER_OBJECT])
+		return MNL_CB_ERROR;
+
+	mnl_attr_for_each(nla_object, nlh, sizeof(*genl)) {
+		mnl_attr_for_each_nested(nla_object_pair, nla_object) {
+			err = health_object_pair_show(dl, nla_object_pair);
+			if (err != MNL_CB_OK)
+				break;
+		}
+		if (!dl->json_output)
+			pr_out("\n");
+	}
+
+	return err;
+}
+
+static int cmd_health_object_common(struct dl *dl, uint8_t cmd)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, cmd,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	if (dl->json_output)
+		jsonw_start_object(dl->jw);
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_object_cb, dl);
+
+	if (dl->json_output)
+		jsonw_end_object(dl->jw);
+
+	return err;
+}
+
+static int cmd_health_diagnose(struct dl *dl)
+{
+	return cmd_health_object_common(dl,
+					DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE);
+}
+
+static int cmd_health_dump_show(struct dl *dl)
+{
+	return cmd_health_object_common(dl,
+					DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET);
+}
+
+static int cmd_health_recover(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE |
+				DL_OPT_HEALTH_REPORTER_NAME, 0);
+	if (err)
+		return err;
+
+	dl_opts_put(nlh, dl);
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static void pr_out_array_parameters_start(struct dl *dl, const char *name)
+{
+	if (dl->json_output) {
+		jsonw_name(dl->jw, name);
+		jsonw_start_array(dl->jw);
+	} else {
+		__pr_out_newline();
+		pr_out("%s:", name);
+		 __pr_out_indent_inc();
+		__pr_out_newline();
+	}
+}
+
+static const char *health_state_name(uint8_t state)
+{
+	switch (state) {
+	case DEVLINK_HEALTH_REPORTER_STATE_HEALTHY:
+		return HEALTH_REPORTER_STATE_HEALTHY_STR;
+	case DEVLINK_HEALTH_REPORTER_STATE_ERROR:
+		return HEALTH_REPORTER_STATE_ERROR_STR;
+	default: return "<unknown state>";
+	}
+}
+
+static int jiffies_to_logtime(uint64_t jiffies, char *output)
+{
+	struct sysinfo s_info;
+	struct tm *info;
+	time_t now, sec;
+	int err;
+
+	time(&now);
+	info = localtime(&now);
+	strftime(output, HEALTH_REPORTER_TS_LEN, "%Y-%b-%d %H:%M:%S", info);
+	err = sysinfo(&s_info);
+	if (err)
+		return err;
+	/*substruct uptime in sec from now
+	 * add jiffies and 5 minutes factor*/
+	sec = now - (s_info.uptime - jiffies/1000) + 300;
+	info = localtime(&sec);
+	strftime(output, HEALTH_REPORTER_TS_LEN, "%Y-%b-%d %H:%M:%S", info);
+
+	return 0;
+}
+
+static void pr_out_health(struct dl *dl, struct nlattr **tb)
+{
+	struct nlattr *hlt[DEVLINK_ATTR_MAX + 1] = {};
+	enum devlink_health_reporter_state state;
+	char dump_time_date[HEALTH_REPORTER_TS_LEN] = "N/A";
+	bool auto_recover = false;
+	const struct nlattr *attr;
+	bool dmp = false;
+	uint64_t jiffies;
+	int err;
+
+	state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+
+	err = mnl_attr_parse_nested(tb[DEVLINK_ATTR_HEALTH_REPORTER], attr_cb,
+				    hlt);
+	if (err != MNL_CB_OK)
+		return;
+
+	if (!hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_AVAIL]	||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC]		||
+	    !hlt[DEVLINK_ATTR_HEALTH_REPORTER_PERIOD])
+		return;
+
+	dmp = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_AVAIL]);
+	if (dmp) {
+		if (hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS]) {
+			attr = hlt[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS];
+			jiffies = mnl_attr_get_u64(attr);
+			err = jiffies_to_logtime(jiffies, dump_time_date);
+			if (err)
+				sprintf(dump_time_date,
+					"Failed to retrieve dump's timestamp");
+		}
+	}
+
+	pr_out_handle_start_arr(dl, tb);
+
+	pr_out_str(dl, "name",
+		   mnl_attr_get_str(hlt[DEVLINK_ATTR_HEALTH_REPORTER_NAME]));
+	state = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_STATE]);
+	pr_out_str(dl, "state", health_state_name(state));
+	pr_out_u64(dl, "#err",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_ERR]));
+	pr_out_u64(dl, "#recover",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER]));
+	pr_out_str(dl, "last_dump_ts", dump_time_date);
+	pr_out_bool(dl, "dump_available", dmp);
+	pr_out_array_parameters_start(dl, "parameters");
+	pr_out_entry_start(dl);
+	pr_out_u64(dl, "grace_period",
+		   mnl_attr_get_u64(hlt[DEVLINK_ATTR_HEALTH_REPORTER_PERIOD]));
+	auto_recover = mnl_attr_get_u8(hlt[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC]);
+	pr_out_bool(dl, "auto_recover", auto_recover);
+	pr_out_entry_end(dl);
+	pr_out_array_end(dl);
+
+	pr_out_handle_end(dl);
+}
+
+static int cmd_health_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_HEALTH_REPORTER])
+		return MNL_CB_ERROR;
+
+	pr_out_health(dl, tb);
+
+	return MNL_CB_OK;
+}
+
+static int cmd_health_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_HEALTH_REPORTER_GET,
+			       flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl, 0, DL_OPT_HANDLE |
+					DL_OPT_HEALTH_REPORTER_NAME);
+		if (err)
+			return err;
+	}
+	pr_out_section_start(dl, "health");
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_health_show_cb, dl);
+	if (err)
+		return err;
+	pr_out_section_end(dl);
+	return err;
+}
+
+static void cmd_health_help(void)
+{
+	pr_err("Usage: devlink health show [ dev DEV reporter REPORTER_NAME ]\n");
+	pr_err("Usage: devlink health recover DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health diagnose DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health dump show DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health dump clear DEV reporter REPORTER_NAME\n");
+	pr_err("Usage: devlink health set DEV reporter REPORTER_NAME NAME VALUE\n");
+}
+
+static int cmd_health(struct dl *dl)
+{
+	if (dl_no_arg(dl)) {
+		return cmd_health_show(dl);
+	} else if (dl_argv_match(dl, "help")) {
+		cmd_health_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show")) {
+		dl_arg_inc(dl);
+		return cmd_health_show(dl);
+	} else if (dl_argv_match(dl, "recover")) {
+		dl_arg_inc(dl);
+		return cmd_health_recover(dl);
+	} else if (dl_argv_match(dl, "diagnose")) {
+		dl_arg_inc(dl);
+		return cmd_health_diagnose(dl);
+	} else if (dl_argv_match(dl, "dump")) {
+		dl_arg_inc(dl);
+		if (dl_argv_match(dl, "show")) {
+			dl_arg_inc(dl);
+			return cmd_health_dump_show(dl);
+		} else if (dl_argv_match(dl, "clear")) {
+			dl_arg_inc(dl);
+			return cmd_health_dump_clear(dl);
+		}
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_health_set_params(dl);
+	}
+
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static void help(void)
 {
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename\n"
-	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region }\n"
+	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
 	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
 }
 
@@ -5594,7 +6117,11 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	} else if (dl_argv_match(dl, "region")) {
 		dl_arg_inc(dl);
 		return cmd_region(dl);
+	} else if (dl_argv_match(dl, "health")) {
+		dl_arg_inc(dl);
+		return cmd_health(dl);
 	}
+
 	pr_err("Object \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
 }
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d0a33d79dc22..90c6ebb4829d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -89,6 +89,12 @@ enum devlink_command {
 	DEVLINK_CMD_REGION_DEL,
 	DEVLINK_CMD_REGION_READ,
 
+	DEVLINK_CMD_HEALTH_REPORTER_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_SET,
+	DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+	DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -168,6 +174,11 @@ enum devlink_param_fw_load_policy_value {
 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH,
 };
 
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -285,6 +296,24 @@ enum devlink_attr {
 	DEVLINK_ATTR_REGION_CHUNK_ADDR,         /* u64 */
 	DEVLINK_ATTR_REGION_CHUNK_LEN,          /* u64 */
 
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR,		/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_ARRAY,	/* nested */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_TYPE,	/* u8 */
+	DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE_DATA,	/* dynamic */
+
+	DEVLINK_ATTR_HEALTH_REPORTER,			/* nested */
+	DEVLINK_ATTR_HEALTH_REPORTER_NAME,		/* string */
+	DEVLINK_ATTR_HEALTH_REPORTER_STATE,		/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_ERR,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_AVAIL,	/* u8 */
+	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_PERIOD,		/* u64 */
+	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_REC,		/* u8 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/man/man8/devlink-health.8 b/man/man8/devlink-health.8
new file mode 100644
index 000000000000..880672290d36
--- /dev/null
+++ b/man/man8/devlink-health.8
@@ -0,0 +1,175 @@
+.TH DEVLINK\-HEALTH 8 "27 Dec 2018" "iproute2" "Linux"
+.SH NAME
+devlink-health \- devlink health reporting and recovery
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B health
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] }
+.ti -8
+.BR "devlink health show"
+.RI "[ "
+.RI "" DEV ""
+.BR " reporter "
+.RI ""REPORTER " ] "
+.ti -8
+.BR "devlink health recover"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health diagnose"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health dump show"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health dump clear"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.ti -8
+.BR "devlink health set"
+.RI "" DEV ""
+.BR "reporter"
+.RI "" REPORTER ""
+.RI "" NAME ""
+.RI "" VALUE ""
+.ti -8
+.B devlink health help
+.SH "DESCRIPTION"
+.SS devlink health show - Show status and configuration on all supported reporters on all devlink devices.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health recover - Initiate a recovery operation on a reporter.
+This action performs a recovery and increases the recoveries counter on success.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health diagnose - Retrieve diagnostics data on a reporter.
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health dump show - Display the last saved dump.
+.PD 0
+.P
+Devlink health saves a single dump per reporter. If an dump is
+.P
+not already stored by the Devlink, this command will generate a new
+.P
+dump. The dump can be generated either automatically when a
+.P
+reporter reports on an error or manually at the user's request.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health dump clear - Delete the saved dump.
+Deleting the save dump enables a generation of a new dump on
+.PD 0
+.P
+the next "devlink health dump show" command.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SS devlink health set - Enable the user to configure:
+.PD 0
+1) grace_period [msec] - Time interval between auto recoveries.
+.P
+2) auto_recover [true/false] - Indicates whether the devlink should execute automatic recover on error.
+.PD
+.PP
+.I "DEV"
+- specifies the devlink device.
+.PP
+.I "REPORTER"
+- specifies the reporter's name registered on the devlink device.
+.SH "EXAMPLES"
+.PP
+devlink health show
+.RS 4
+pci/0000:00:09.0:
+  name TX state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false
+  attributes:
+    grace period 600 auto_recover true
+.RE
+.PP
+devlink health recover pci/0000:00:09.0 reporter TX
+.RS 4
+Initiate recovery on TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health diagnose pci/0000:00:09.0 reporter TX
+.RS 4
+.PD 0
+SQ 0x9b: HW state: 1 stopped: 0
+.P
+SQ 0x9f: HW state: 1 stopped: 0
+.P
+SQ 0xa3: HW state: 1 stopped: 0
+.P
+SQ 0xa7: HW state: 1 stopped: 0
+.P
+SQ 0xab: HW state: 1 stopped: 0
+.PD
+.RE
+.PP
+devlink health dump show pci/0000:00:09.0 reporter TX
+.RS 4
+Display the last saved dump on TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health dump clear pci/0000:00:09.0 reporter TX
+.RS 4
+Delete saved dump on TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health set pci/0000:00:09.0 reporter TX grace_period 3500
+.RS 4
+Set time interval between auto recoveries to minimum of 3500 mSec on
+TX reporter registered on pci/0000:00:09.0.
+.RE
+.PP
+devlink health set pci/0000:00:09.0 reporter TX auto_recover false
+.RS 4
+Turn off auto recovery on TX reporter registered on pci/0000:00:09.0.
+.RE
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-port (8),
+.BR devlink-region (8),
+.br
+
+.SH AUTHOR
+Aya Levin <ayal@mellanox.com>
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 8d527e7e1d60..13d4dcd908b3 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
 .in +8
 .ti -8
 .B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | "
 .BR help " }"
 .sp
 
@@ -78,6 +78,10 @@ Turn on verbose output.
 .B region
 - devlink address region access
 
+.TP
+.B health
+- devlink reporting and recovery
+
 .SS
 .I COMMAND
 
@@ -109,6 +113,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR devlink-sb (8),
 .BR devlink-resource (8),
 .BR devlink-region (8),
+.BR devlink-health (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 00/27] Devlink health reporting and recovery system
  2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
                   ` (27 preceding siblings ...)
  2019-01-17 15:09 ` [PATCH] devlink: Add health command support Aya Levin
@ 2019-01-17 18:52 ` Jiri Pirko
  2019-01-17 21:30   ` Eran Ben Elisha
  28 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2019-01-17 18:52 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Moshe Shemesh

Thu, Jan 17, 2019 at 04:04:54PM CET, eranbe@mellanox.com wrote:


27 patches is way too many to review in one go.
Please split to at least 2 patchsets. Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next 00/27] Devlink health reporting and recovery system
  2019-01-17 18:52 ` [PATCH net-next 00/27] Devlink health reporting and recovery system Jiri Pirko
@ 2019-01-17 21:30   ` Eran Ben Elisha
  0 siblings, 0 replies; 31+ messages in thread
From: Eran Ben Elisha @ 2019-01-17 21:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Jiri Pirko, David S. Miller, Ariel Almog, Aya Levin,
	Moshe Shemesh



On 1/17/2019 8:52 PM, Jiri Pirko wrote:
> Thu, Jan 17, 2019 at 04:04:54PM CET, eranbe@mellanox.com wrote:
> 
> 
> 27 patches is way too many to review in one go.
> Please split to at least 2 patchsets. Thanks!
>


Ack, will post v2 soon.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-01-17 21:31 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-17 15:04 [PATCH net-next 00/27] Devlink health reporting and recovery system Eran Ben Elisha
2019-01-17 15:04 ` [PATCH net-next 01/27] devlink: Add health buffer support Eran Ben Elisha
2019-01-17 15:04 ` [PATCH net-next 02/27] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
2019-01-17 15:04 ` [PATCH net-next 03/27] devlink: Add health report functionality Eran Ben Elisha
2019-01-17 15:04 ` [PATCH net-next 04/27] devlink: Add health get command Eran Ben Elisha
2019-01-17 15:04 ` [PATCH net-next 05/27] devlink: Add health set command Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 06/27] devlink: Add health recover command Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 07/27] devlink: Add health diagnose command Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 08/27] devlink: Add health dump {get,clear} commands Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 09/27] net/mlx5e: Add TX reporter support Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 10/27] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 11/27] net/mlx5: Move all devlink related functions calls to devlink.c Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 12/27] net/mlx5: Add Vendor Specific Capability access gateway Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 13/27] net/mlx5: Add Crdump FW snapshot support Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 14/27] net/mlx5: Use devlink region_snapshot parameter Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 15/27] Documentation: mlx5: Update kernel documentation Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 16/27] net/mlx5: Handle SW reset of FW in error flow Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 17/27] net/mlx5: Control CR-space access by different PFs Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 18/27] net/mlx5: Issue SW reset on FW assert Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 19/27] net/mlx5: Refactor print health info Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 20/27] net/mlx5: Create FW devlink_health_reporter Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 21/27] net/mlx5: Add core dump register access functions Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 22/27] net/mlx5: Add support for FW reporter dump Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 23/27] net/mlx5: Report devlink health on FW issues Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 24/27] net/mlx5: Add FW fatal devlink_health_reporter Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 25/27] net/mlx5: Add support for FW fatal reporter dump Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 26/27] net/mlx5: Report devlink health on FW fatal issues Eran Ben Elisha
2019-01-17 15:05 ` [PATCH net-next 27/27] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
2019-01-17 15:09 ` [PATCH] devlink: Add health command support Aya Levin
2019-01-17 18:52 ` [PATCH net-next 00/27] Devlink health reporting and recovery system Jiri Pirko
2019-01-17 21:30   ` Eran Ben Elisha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.