All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/13] devlink direct region reading
@ 2020-01-30 22:58 Jacob Keller
  2020-01-30 22:58 ` [PATCH 01/15] devlink: prepare to support region operations Jacob Keller
                   ` (19 more replies)
  0 siblings, 20 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:58 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

As per the previous discussions about supporting devlink region snapshot
triggers and direct region writes, this is an RFC patch series to implement
these changes. I fully expect to need revisions to this series, and the ice
driver changes will need to go through the normal intel-wired-lan queue
process.

The final patches implement support for directly reading from a region
without a snapshot. In order to have context on how and why this is
implemented, the ice driver is modified to support devlink and expose a
region. The series can be broken down into multiple parts or stages, as
described below.

The first 4 patches modify devlink regions to support a new
devlink_region_ops structure, and then implement a new op for requesting
that the driver capture an immediate snapshot. Support for that is then
implemented in the netdevsim driver. This was previously discussed on the
list:

  https://lore.kernel.org/netdev/20200109193311.1352330-1-jacob.e.keller@intel.com/

The only major change is to convert to using an operation in the
devlink_region_ops structure instead of passing the pointer as an argument
to devlink_region_create. This makes it easier for regions to support
additional operations in the future.

Following this section is 3 patches to the ice driver implementing a
function to easily read portions of the NVM contents as flat addressable
memory. The current .get_eeprom function is updated to support reading from
the entire NVM.

There is a patch to add devlinkm_alloc as a devres based managed allocation
for the devlink_alloc call.

Following this are 4 patches to implement basic devlink support for the ice
driver. This includes a simple .get_info handler.

Finally, the last 3 patches implement a new region in the ice driver for
accessing the shadow RAM. This region will support the immediate trigger
operation as well as a new read operation that enables directly reading from
the shadow RAM without a region.

I expect that this series is not yet ready to land and am sending it
primarily as a discussion point on the changes to the devlink core code. I
expect feedback and the need to change and improve the implementation.

It's likely that the actual submissions can be broken up into smaller
series, for example the basic ice devlink support going first followed by a
separate series for the region changes.

I've also submitted iproute2 patches for implementing the ability to request
snapshots and read from the region directly.

Jacob Keller (12):
  devlink: prepare to support region operations
  devlink: add functions to take snapshot while locked
  devlink: add operation to take an immediate snapshot
  netdevsim: support taking immediate snapshot via devlink
  ice: use __le16 types for explicitly Little Endian values
  ice: create function to read a section of the NVM and Shadow RAM
  ice: enable initial devlink support for function zero
  ice: add basic handler for devlink .info_get
  ice: add board identifier info to devlink .info_get
  ice: add a devlink region to dump shadow RAM contents
  devlink: support directly reading from region memory
  ice: support direct read of the shadow ram region

Jesse Brandeburg (1):
  ice: implement full NVM read from ETHTOOL_GEEPROM

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/ice/Makefile       |   1 +
 drivers/net/ethernet/intel/ice/ice.h          |   7 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   3 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  61 +++
 drivers/net/ethernet/intel/ice/ice_common.h   |   5 +-
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 427 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h  |  20 +
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  37 +-
 drivers/net/ethernet/intel/ice/ice_main.c     |  22 +
 drivers/net/ethernet/intel/ice/ice_nvm.c      | 211 +++------
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   7 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/mellanox/mlx4/crdump.c   |  25 +-
 drivers/net/netdevsim/dev.c                   |  44 +-
 include/net/devlink.h                         |  28 +-
 include/uapi/linux/devlink.h                  |   2 +
 net/core/devlink.c                            | 246 +++++++---
 18 files changed, 918 insertions(+), 230 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h

-- 
2.25.0.rc1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/15] devlink: prepare to support region operations
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
@ 2020-01-30 22:58 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03 11:35   ` Jiri Pirko
  2020-01-30 22:58 ` [PATCH 02/15] devlink: add functions to take snapshot while locked Jacob Keller
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:58 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Modify the devlink region code in preparation for adding new operations
on regions.

Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).

This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 25 ++++++++++++---------
 drivers/net/netdevsim/dev.c                 |  6 ++++-
 include/net/devlink.h                       | 17 ++++++++++----
 net/core/devlink.c                          | 23 ++++++++++---------
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index 64ed725aec28..4cea64033919 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -38,8 +38,13 @@
 #define CR_ENABLE_BIT_OFFSET		0xF3F04
 #define MAX_NUM_OF_DUMPS_TO_STORE	(8)
 
-static const char *region_cr_space_str = "cr-space";
-static const char *region_fw_health_str = "fw-health";
+static const struct devlink_region_ops region_cr_space_ops = {
+	.name = "cr-space",
+};
+
+static const struct devlink_region_ops region_fw_health_ops = {
+	.name = "fw-health",
+};
 
 /* Set to true in case cr enable bit was set to true before crdump */
 static bool crdump_enbale_bit_set;
@@ -103,10 +108,10 @@ static void mlx4_crdump_collect_crspace(struct mlx4_dev *dev,
 		if (err) {
 			kvfree(crspace_data);
 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
-				  region_cr_space_str, id, err);
+				  region_cr_space_ops.name, id, err);
 		} else {
 			mlx4_info(dev, "crdump: added snapshot %d to devlink region %s\n",
-				  id, region_cr_space_str);
+				  id, region_cr_space_ops.name);
 		}
 	} else {
 		mlx4_err(dev, "crdump: Failed to allocate crspace buffer\n");
@@ -142,10 +147,10 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev *dev,
 		if (err) {
 			kvfree(health_data);
 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
-				  region_fw_health_str, id, err);
+				  region_fw_health_ops.name, id, err);
 		} else {
 			mlx4_info(dev, "crdump: added snapshot %d to devlink region %s\n",
-				  id, region_fw_health_str);
+				  id, region_fw_health_ops.name);
 		}
 	} else {
 		mlx4_err(dev, "crdump: Failed to allocate health buffer\n");
@@ -205,23 +210,23 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
 	/* Create cr-space region */
 	crdump->region_crspace =
 		devlink_region_create(devlink,
-				      region_cr_space_str,
+				      &region_cr_space_ops,
 				      MAX_NUM_OF_DUMPS_TO_STORE,
 				      pci_resource_len(pdev, 0));
 	if (IS_ERR(crdump->region_crspace))
 		mlx4_warn(dev, "crdump: create devlink region %s err %ld\n",
-			  region_cr_space_str,
+			  region_cr_space_ops.name,
 			  PTR_ERR(crdump->region_crspace));
 
 	/* Create fw-health region */
 	crdump->region_fw_health =
 		devlink_region_create(devlink,
-				      region_fw_health_str,
+				      &region_fw_health_ops,
 				      MAX_NUM_OF_DUMPS_TO_STORE,
 				      HEALTH_BUFFER_SIZE);
 	if (IS_ERR(crdump->region_fw_health))
 		mlx4_warn(dev, "crdump: create devlink region %s err %ld\n",
-			  region_fw_health_str,
+			  region_fw_health_ops.name,
 			  PTR_ERR(crdump->region_fw_health));
 
 	return 0;
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index b53fbc06e104..d521b7bfe007 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -242,11 +242,15 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 
 #define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
 
+static const struct devlink_region_ops dummy_region_ops = {
+	.name = "dummy",
+};
+
 static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
 				      struct devlink *devlink)
 {
 	nsim_dev->dummy_region =
-		devlink_region_create(devlink, "dummy",
+		devlink_region_create(devlink, &dummy_region_ops,
 				      NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
 				      NSIM_DEV_DUMMY_REGION_SIZE);
 	return PTR_ERR_OR_ZERO(nsim_dev->dummy_region);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index ce5cea428fdc..4a0baa6903cb 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -495,6 +495,14 @@ struct devlink_info_req;
 
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
+/**
+ * struct devlink_region_ops - Region operations
+ * @name: region name
+ */
+struct devlink_region_ops {
+	const char *name;
+};
+
 struct devlink_fmsg;
 struct devlink_health_reporter;
 
@@ -949,10 +957,11 @@ void devlink_port_param_value_changed(struct devlink_port *devlink_port,
 				      u32 param_id);
 void devlink_param_value_str_fill(union devlink_param_value *dst_val,
 				  const char *src);
-struct devlink_region *devlink_region_create(struct devlink *devlink,
-					     const char *region_name,
-					     u32 region_max_snapshots,
-					     u64 region_size);
+struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+		      const struct devlink_region_ops *ops,
+		      u32 region_max_snapshots,
+		      u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_snapshot_id_get(struct devlink *devlink);
 int devlink_region_snapshot_create(struct devlink_region *region,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index ca1df0ec3c97..d1f7bfbf81da 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -344,7 +344,7 @@ devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
 struct devlink_region {
 	struct devlink *devlink;
 	struct list_head list;
-	const char *name;
+	const struct devlink_region_ops *ops;
 	struct list_head snapshot_list;
 	u32 max_snapshots;
 	u32 cur_snapshots;
@@ -365,7 +365,7 @@ devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
 	struct devlink_region *region;
 
 	list_for_each_entry(region, &devlink->region_list, list)
-		if (!strcmp(region->name, region_name))
+		if (!strcmp(region->ops->name, region_name))
 			return region;
 
 	return NULL;
@@ -3687,7 +3687,7 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
 	if (err)
 		goto nla_put_failure;
 
-	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
+	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->ops->name);
 	if (err)
 		goto nla_put_failure;
 
@@ -3733,7 +3733,7 @@ static void devlink_nl_region_notify(struct devlink_region *region,
 		goto out_cancel_msg;
 
 	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
-			     region->name);
+			     region->ops->name);
 	if (err)
 		goto out_cancel_msg;
 
@@ -7530,21 +7530,22 @@ EXPORT_SYMBOL_GPL(devlink_param_value_str_fill);
  *	devlink_region_create - create a new address region
  *
  *	@devlink: devlink
- *	@region_name: region name
+ *	@ops: region operations and name
  *	@region_max_snapshots: Maximum supported number of snapshots for region
  *	@region_size: size of region
  */
-struct devlink_region *devlink_region_create(struct devlink *devlink,
-					     const char *region_name,
-					     u32 region_max_snapshots,
-					     u64 region_size)
+struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+		      const struct devlink_region_ops *ops,
+		      u32 region_max_snapshots,
+		      u64 region_size)
 {
 	struct devlink_region *region;
 	int err = 0;
 
 	mutex_lock(&devlink->lock);
 
-	if (devlink_region_get_by_name(devlink, region_name)) {
+	if (devlink_region_get_by_name(devlink, ops->name)) {
 		err = -EEXIST;
 		goto unlock;
 	}
@@ -7557,7 +7558,7 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 
 	region->devlink = devlink;
 	region->max_snapshots = region_max_snapshots;
-	region->name = region_name;
+	region->ops = ops;
 	region->size = region_size;
 	INIT_LIST_HEAD(&region->snapshot_list);
 	list_add_tail(&region->list, &devlink->region_list);
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 02/15] devlink: add functions to take snapshot while locked
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
  2020-01-30 22:58 ` [PATCH 01/15] devlink: prepare to support region operations Jacob Keller
@ 2020-01-30 22:58 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03 11:39   ` Jiri Pirko
  2020-01-30 22:58 ` [PATCH 03/15] devlink: add operation to take an immediate snapshot Jacob Keller
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:58 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

A future change is going to add a new devlink command to request
a snapshot on demand. This function will want to call the
devlink_region_snapshot_id_get and devlink_region_snapshot_create
functions while already holding the devlink instance lock.

Extract the logic of these two functions into static functions with the
_locked postfix. Modify the original functions to be implemented in
terms of the new locked functions.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 95 +++++++++++++++++++++++++++++-----------------
 1 file changed, 61 insertions(+), 34 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index d1f7bfbf81da..faf4f4c5c539 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3761,6 +3761,63 @@ static void devlink_nl_region_notify(struct devlink_region *region,
 	nlmsg_free(msg);
 }
 
+/**
+ *	devlink_region_snapshot_id_get_locked - get snapshot ID
+ *
+ *	Returns a new snapshot id. Must be called while holding the
+ *	devlink instance lock.
+ */
+static u32 devlink_region_snapshot_id_get_locked(struct devlink *devlink)
+{
+	return ++devlink->snapshot_id;
+}
+
+/**
+ *	devlink_region_snapshot_create_locked - create a new snapshot
+ *	This will add a new snapshot of a region. The snapshot
+ *	will be stored on the region struct and can be accessed
+ *	from devlink. This is useful for future	analyses of snapshots.
+ *	Multiple snapshots can be created on a region.
+ *	The @snapshot_id should be obtained using the getter function.
+ *
+ *	Must be called only while holding the devlink instance lock.
+ *
+ *	@region: devlink region of the snapshot
+ *	@data: snapshot data
+ *	@snapshot_id: snapshot id to be created
+ *	@destructor: pointer to destructor function to free data
+ */
+static int
+devlink_region_snapshot_create_locked(struct devlink_region *region,
+				      u8 *data, u32 snapshot_id,
+				      devlink_snapshot_data_dest_t *destructor)
+{
+	struct devlink_snapshot *snapshot;
+
+	/* check if region can hold one more snapshot */
+	if (region->cur_snapshots == region->max_snapshots)
+		return -ENOMEM;
+
+	if (devlink_region_snapshot_get_by_id(region, snapshot_id))
+		return -EEXIST;
+
+	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
+	if (!snapshot)
+		return -ENOMEM;
+
+	snapshot->id = snapshot_id;
+	snapshot->region = region;
+	snapshot->data = data;
+	snapshot->data_destructor = destructor;
+
+	list_add_tail(&snapshot->list, &region->snapshot_list);
+
+	region->cur_snapshots++;
+
+	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
+	return 0;
+}
+
 static void devlink_region_snapshot_del(struct devlink_region *region,
 					struct devlink_snapshot *snapshot)
 {
@@ -7611,7 +7668,7 @@ u32 devlink_region_snapshot_id_get(struct devlink *devlink)
 	u32 id;
 
 	mutex_lock(&devlink->lock);
-	id = ++devlink->snapshot_id;
+	id = devlink_region_snapshot_id_get_locked(devlink);
 	mutex_unlock(&devlink->lock);
 
 	return id;
@@ -7622,7 +7679,7 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get);
  *	devlink_region_snapshot_create - create a new snapshot
  *	This will add a new snapshot of a region. The snapshot
  *	will be stored on the region struct and can be accessed
- *	from devlink. This is useful for future	analyses of snapshots.
+ *	from devlink. This is useful for future analyses of snapshots.
  *	Multiple snapshots can be created on a region.
  *	The @snapshot_id should be obtained using the getter function.
  *
@@ -7636,43 +7693,13 @@ int devlink_region_snapshot_create(struct devlink_region *region,
 				   devlink_snapshot_data_dest_t *data_destructor)
 {
 	struct devlink *devlink = region->devlink;
-	struct devlink_snapshot *snapshot;
 	int err;
 
 	mutex_lock(&devlink->lock);
-
-	/* check if region can hold one more snapshot */
-	if (region->cur_snapshots == region->max_snapshots) {
-		err = -ENOMEM;
-		goto unlock;
-	}
-
-	if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
-		err = -EEXIST;
-		goto unlock;
-	}
-
-	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
-	if (!snapshot) {
-		err = -ENOMEM;
-		goto unlock;
-	}
-
-	snapshot->id = snapshot_id;
-	snapshot->region = region;
-	snapshot->data = data;
-	snapshot->data_destructor = data_destructor;
-
-	list_add_tail(&snapshot->list, &region->snapshot_list);
-
-	region->cur_snapshots++;
-
-	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
+	err = devlink_region_snapshot_create_locked(region, data, snapshot_id,
+						    data_destructor);
 	mutex_unlock(&devlink->lock);
-	return 0;
 
-unlock:
-	mutex_unlock(&devlink->lock);
 	return err;
 }
 EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
  2020-01-30 22:58 ` [PATCH 01/15] devlink: prepare to support region operations Jacob Keller
  2020-01-30 22:58 ` [PATCH 02/15] devlink: add functions to take snapshot while locked Jacob Keller
@ 2020-01-30 22:58 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
                     ` (2 more replies)
  2020-01-30 22:58 ` [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink Jacob Keller
                   ` (16 subsequent siblings)
  19 siblings, 3 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:58 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
command is intended to enable userspace to request an immediate snapshot
of a region.

Regions can enable support for requestable snapshots by implementing the
snapshot callback function in the region's devlink_region_ops structure.

Implementations of this function callback should capture an immediate
copy of the data and return it and its destructor in the function
parameters. The core devlink code will generate a snapshot ID and create
the new snapshot while holding the devlink instance lock.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../networking/devlink/devlink-region.rst     |  9 +++-
 include/net/devlink.h                         |  7 +++
 include/uapi/linux/devlink.h                  |  2 +
 net/core/devlink.c                            | 46 +++++++++++++++++++
 4 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 1a7683e7acb2..262249e6c3fc 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
 Regions may also be used to provide an additional way to debug complex error
 states, but see also :doc:`devlink-health`
 
+Regions may optionally support capturing a snapshot on demand via the
+``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
+allow requested snapshots must implement the ``.snapshot`` callback for the
+region in its ``devlink_region_ops`` structure.
+
 example usage
 -------------
 
@@ -40,8 +45,8 @@ example usage
     # Delete a snapshot using:
     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
 
-    # Trigger (request) a snapshot be taken:
-    $ devlink region trigger pci/0000:00:05.0/cr-space
+    # Request an immediate snapshot, if supported by the region
+    $ devlink region snapshot pci/0000:00:05.0/cr-space
 
     # Dump a snapshot:
     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 4a0baa6903cb..63e954241404 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -498,9 +498,16 @@ typedef void devlink_snapshot_data_dest_t(const void *data);
 /**
  * struct devlink_region_ops - Region operations
  * @name: region name
+ * @snapshot: callback to request an immediate snapshot. On success,
+ *            the data and destructor variables must be updated. The function
+ *            will be called while the devlink instance lock is held.
  */
 struct devlink_region_ops {
 	const char *name;
+	int (*snapshot)(struct devlink *devlink,
+			struct netlink_ext_ack *extack,
+			u8 **data,
+			devlink_snapshot_data_dest_t **destructor);
 };
 
 struct devlink_fmsg;
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index ae37fd4d194a..46643c4320b9 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -117,6 +117,8 @@ enum devlink_command {
 	DEVLINK_CMD_TRAP_GROUP_NEW,
 	DEVLINK_CMD_TRAP_GROUP_DEL,
 
+	DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
diff --git a/net/core/devlink.c b/net/core/devlink.c
index faf4f4c5c539..574008c536fa 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4109,6 +4109,45 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	return err;
 }
 
+static int devlink_nl_cmd_region_take_snapshot(struct sk_buff *skb,
+					       struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	devlink_snapshot_data_dest_t *destructor;
+	struct devlink_region *region;
+	const char *region_name;
+	u32 snapshot_id;
+	u8 *data;
+	int err;
+
+	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
+		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
+		return -EINVAL;
+	}
+
+	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+	region = devlink_region_get_by_name(devlink, region_name);
+	if (!region) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "The requested region does not exist");
+		return -EINVAL;
+	}
+
+	if (!region->ops->snapshot) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "The requested region does not support taking an immediate snapshot");
+		return -EOPNOTSUPP;
+	}
+
+	err = region->ops->snapshot(devlink, info->extack, &data, &destructor);
+	if (err)
+		return err;
+
+	snapshot_id = devlink_region_snapshot_id_get_locked(devlink);
+	return devlink_region_snapshot_create_locked(region, data, snapshot_id,
+						     destructor);
+}
+
 struct devlink_info_req {
 	struct sk_buff *msg;
 };
@@ -6249,6 +6288,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit = devlink_nl_cmd_region_take_snapshot,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 	{
 		.cmd = DEVLINK_CMD_INFO_GET,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (2 preceding siblings ...)
  2020-01-30 22:58 ` [PATCH 03/15] devlink: add operation to take an immediate snapshot Jacob Keller
@ 2020-01-30 22:58 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-01-30 22:59 ` [PATCH 05/15] ice: use __le16 types for explicitly Little Endian values Jacob Keller
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:58 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_SNAPSHOT command.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/netdevsim/dev.c                   | 38 +++++++++++++++----
 .../drivers/net/netdevsim/devlink.sh          |  5 +++
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index d521b7bfe007..924cd328037f 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -38,24 +38,47 @@ static struct dentry *nsim_dev_ddir;
 
 #define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
 
+static int nsim_dev_take_snapshot(struct devlink *devlink,
+				  struct netlink_ext_ack *extack,
+				  u8 **data,
+				  devlink_snapshot_data_dest_t **destructor)
+{
+	void *dummy_data;
+
+	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
+	if (!dummy_data) {
+		NL_SET_ERR_MSG(extack, "Out of memory");
+		return -ENOMEM;
+	}
+
+	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
+
+	*data = dummy_data;
+	*destructor = kfree;
+
+	return 0;
+}
+
 static ssize_t nsim_dev_take_snapshot_write(struct file *file,
 					    const char __user *data,
 					    size_t count, loff_t *ppos)
 {
 	struct nsim_dev *nsim_dev = file->private_data;
-	void *dummy_data;
+	devlink_snapshot_data_dest_t *destructor;
+	u8 *dummy_data;
 	int err;
 	u32 id;
 
-	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
-	if (!dummy_data)
-		return -ENOMEM;
-
-	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
+	err = nsim_dev_take_snapshot(priv_to_devlink(nsim_dev), NULL,
+				     &dummy_data, &destructor);
+	if (err) {
+		pr_err("Failed to capture region snapshot\n");
+		return err;
+	}
 
 	id = devlink_region_snapshot_id_get(priv_to_devlink(nsim_dev));
 	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
-					     dummy_data, id, kfree);
+					     dummy_data, id, destructor);
 	if (err) {
 		pr_err("Failed to create region snapshot\n");
 		kfree(dummy_data);
@@ -244,6 +267,7 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 
 static const struct devlink_region_ops dummy_region_ops = {
 	.name = "dummy",
+	.snapshot = nsim_dev_take_snapshot,
 };
 
 static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 025a84c2ab5a..e7827a092478 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -141,6 +141,11 @@ regions_test()
 
 	check_region_snapshot_count dummy post-first-delete 2
 
+	devlink region snapshot $DL_HANDLE/dummy
+	check_err $? "Failed to request a snapshot"
+
+	check_region_snapshot_count dummy post-request 3
+
 	log_test "regions test"
 }
 
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 05/15] ice: use __le16 types for explicitly Little Endian values
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (3 preceding siblings ...)
  2020-01-30 22:58 ` [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [PATCH 06/15] ice: create function to read a section of the NVM and Shadow RAM Jacob Keller
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

The ice_read_sr_aq function returns words in the Little Endian format.
Remove the need for __force and typecasting by using a local variable in
the ice_read_sr_word_aq function.

Additionally clarify explicitly that the ice_read_sr_aq function takes
storage for __le16 values instead of using u16.

Being explicit about the endianness of this data helps when using tools
like sparse to catch endian-related issues.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_nvm.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 7525ac50742e..46db9bb0977f 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -80,13 +80,14 @@ ice_check_sr_access_params(struct ice_hw *hw, u32 offset, u16 words)
  * @hw: pointer to the HW structure
  * @offset: offset in words from module start
  * @words: number of words to read
- * @data: buffer for words reads from Shadow RAM
+ * @data: storage for the words read from Shadow RAM (Little Endian)
  * @last_command: tells the AdminQ that this is the last command
  *
- * Reads 16-bit word buffers from the Shadow RAM using the admin command.
+ * Reads 16-bit Little Endian word buffers from the Shadow RAM using the admin
+ * command.
  */
 static enum ice_status
-ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, u16 *data,
+ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
 	       bool last_command)
 {
 	enum ice_status status;
@@ -116,10 +117,11 @@ static enum ice_status
 ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 *data)
 {
 	enum ice_status status;
+	__le16 data_local;
 
-	status = ice_read_sr_aq(hw, offset, 1, data, true);
+	status = ice_read_sr_aq(hw, offset, 1, &data_local, true);
 	if (!status)
-		*data = le16_to_cpu(*(__force __le16 *)data);
+		*data = le16_to_cpu(data_local);
 
 	return status;
 }
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 06/15] ice: create function to read a section of the NVM and Shadow RAM
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (4 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 05/15] ice: use __le16 types for explicitly Little Endian values Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [PATCH 07/15] ice: implement full NVM read from ETHTOOL_GEEPROM Jacob Keller
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

The NVM contents are read via firmware by using the ice_aq_read_nvm
function. This function has a couple of limits:

1) The AdminQ commands can only take buffers sized up to 4Kb. Thus, any
   larger read must be split into multiple reads.
2) when reading from the Shadow RAM, reads must not cross sector
   boundaries. The sectors are also 4Kb in size.

Implement the ice_read_flat_nvm function to read portions of the NVM by
flat offset. That is, to read using offsets from the start of the NVM
rather than from a specific module.

This function will be able to read both from the NVM and from the Shadow
RAM. For simplicity NVM reads will always be broken up to not cross 4Kb
page boundaries, even though this is not required unless reading from
the Shadow RAM.

Use this new function as the implementation of ice_read_sr_word_aq.

The ice_read_sr_buf_aq function is not modified here. This is because
a following change will remove the only caller of that function in favor
of directly using ice_read_flat_nvm. Thus, there is little benefit to
changing it now only to remove it momentarily. At the same time, the
ice_read_sr_aq function will also be removed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  2 +
 drivers/net/ethernet/intel/ice/ice_nvm.c      | 87 +++++++++++++++++--
 drivers/net/ethernet/intel/ice/ice_nvm.h      |  3 +
 3 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 4459bc564b11..2240f226568f 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1250,6 +1250,8 @@ struct ice_aqc_nvm {
 	__le32 addr_low;
 };
 
+#define ICE_AQC_NVM_START_POINT			0
+
 /* NVM Checksum Command (direct, 0x0706) */
 struct ice_aqc_nvm_checksum {
 	u8 flags;
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 46db9bb0977f..622431aa1e22 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -11,13 +11,15 @@
  * @length: length of the section to be read (in bytes from the offset)
  * @data: command buffer (size [bytes] = length)
  * @last_command: tells if this is the last command in a series
+ * @read_shadow_ram: tell if this is a shadow RAM read
  * @cd: pointer to command details structure or NULL
  *
  * Read the NVM using the admin queue commands (0x0701)
  */
 static enum ice_status
 ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
-		void *data, bool last_command, struct ice_sq_cd *cd)
+		void *data, bool last_command, bool read_shadow_ram,
+		struct ice_sq_cd *cd)
 {
 	struct ice_aq_desc desc;
 	struct ice_aqc_nvm *cmd;
@@ -30,6 +32,9 @@ ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
 
 	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_nvm_read);
 
+	if (!read_shadow_ram && module_typeid == ICE_AQC_NVM_START_POINT)
+		cmd->cmd_flags |= ICE_AQC_NVM_FLASH_ONLY;
+
 	/* If this is the last command in a series, set the proper flag. */
 	if (last_command)
 		cmd->cmd_flags |= ICE_AQC_NVM_LAST_CMD;
@@ -41,6 +46,68 @@ ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
 	return ice_aq_send_cmd(hw, &desc, data, length, cd);
 }
 
+/**
+ * ice_read_flat_nvm - Read portion of NVM by flat offset
+ * @hw: pointer to the HW struct
+ * @offset: offset from beginning of NVM
+ * @length: (in) number of bytes to read; (out) number of bytes actually read
+ * @data: buffer to return data in (sized to fit the specified length)
+ * @read_shadow_ram: if true, read from shadow RAM instead of NVM
+ *
+ * Reads a portion of the NVM, as a flat memory space. This function will
+ * correctly handle reading of sizes beyond a page by breaking the request
+ * into multiple reads.
+ *
+ * Returns a status code on failure. Note that the data pointer may be
+ * partially updated if some reads succeed before a failure.
+ */
+enum ice_status
+ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
+		  bool read_shadow_ram)
+{
+	enum ice_status status;
+	bool last_cmd = true;
+	u32 inlen = *length;
+	u32 bytes_read = 0;
+
+	*length = 0;
+
+	/* Verify the length of the read if this is for the Shadow RAM */
+	if (read_shadow_ram && ((offset + inlen) > (hw->nvm.sr_words * 2))) {
+		ice_debug(hw, ICE_DBG_NVM,
+			  "NVM error: requested offset is beyond Shadow RAM limit\n");
+		return ICE_ERR_PARAM;
+	}
+
+	do {
+		u32 read_size, page_offset;
+
+		/* ice_aq_read_nvm cannot read more than 4Kb at a time.
+		 * Additionally, break the reads up so that they do not cross
+		 * a page boundary.
+		 */
+		page_offset = offset % ICE_AQ_MAX_BUF_LEN;
+		read_size = min_t(u32, ICE_AQ_MAX_BUF_LEN - page_offset,
+				  inlen - bytes_read);
+
+		if ((bytes_read + read_size) < inlen)
+			last_cmd = false;
+
+		status = ice_aq_read_nvm(hw, ICE_AQC_NVM_START_POINT,
+					 offset, read_size,
+					 data + bytes_read, last_cmd,
+					 read_shadow_ram, NULL);
+		if (status)
+			break;
+
+		bytes_read += read_size;
+		offset += read_size;
+	} while (!last_cmd);
+
+	*length = bytes_read;
+	return status;
+}
+
 /**
  * ice_check_sr_access_params - verify params for Shadow RAM R/W operations.
  * @hw: pointer to the HW structure
@@ -100,7 +167,7 @@ ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
 	 */
 	if (!status)
 		status = ice_aq_read_nvm(hw, 0, 2 * offset, 2 * words, data,
-					 last_command, NULL);
+					 last_command, true, NULL);
 
 	return status;
 }
@@ -111,19 +178,25 @@ ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
  * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
  * @data: word read from the Shadow RAM
  *
- * Reads one 16 bit word from the Shadow RAM using the ice_read_sr_aq method.
+ * Reads one 16 bit word from the Shadow RAM using ice_read_flat_nvm.
  */
 static enum ice_status
 ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 *data)
 {
+	u32 bytes = sizeof(u16);
 	enum ice_status status;
 	__le16 data_local;
 
-	status = ice_read_sr_aq(hw, offset, 1, &data_local, true);
-	if (!status)
-		*data = le16_to_cpu(data_local);
+	/* Note that ice_read_flat_nvm checks if the read is past the Shadow
+	 * RAM size, and ensures we don't read across a page boundary
+	 */
+	status = ice_read_flat_nvm(hw, offset * sizeof(u16), &bytes,
+				   (u8 *)&data_local, true);
+	if (status)
+		return status;
 
-	return status;
+	*data = le16_to_cpu(data_local);
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index a9fa011c22c6..4245ef988edf 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -4,5 +4,8 @@
 #ifndef _ICE_NVM_H_
 #define _ICE_NVM_H_
 
+enum ice_status
+ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
+		  bool read_shadow_ram);
 enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data);
 #endif /* _ICE_NVM_H_ */
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 07/15] ice: implement full NVM read from ETHTOOL_GEEPROM
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (5 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 06/15] ice: create function to read a section of the NVM and Shadow RAM Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free Jacob Keller
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev
  Cc: jiri, valex, linyunsheng, lihong.yang, Jesse Brandeburg, Jacob Keller

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The current implementation of .get_eeprom only enables reading from the
Shadow RAM portion of the NVM contents. Implement support for reading
the entire flash contents instead of only the initial portion contained
in the Shadow RAM.

A complete dump can take several seconds, but the ETHTOOL_GEEPROM ioctl
is capable of reading only a limited portion at a time by specifying the
offset and length to read.

In order to perform the reads directly, several functions are made non
static. Additionally, the unused ice_read_sr_buf_aq and ice_read_sr_buf
functions are removed.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_common.h   |   3 -
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  36 +++--
 drivers/net/ethernet/intel/ice/ice_nvm.c      | 150 +-----------------
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   4 +
 5 files changed, 31 insertions(+), 163 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 2240f226568f..0370e5835b2e 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1757,6 +1757,7 @@ enum ice_aq_err {
 	ICE_AQ_RC_ENOMEM	= 9,  /* Out of memory */
 	ICE_AQ_RC_EBUSY		= 12, /* Device or resource busy */
 	ICE_AQ_RC_EEXIST	= 13, /* Object already exists */
+	ICE_AQ_RC_EINVAL	= 14, /* Invalid argument */
 	ICE_AQ_RC_ENOSPC	= 16, /* No space left or allocation failure */
 	ICE_AQ_RC_ENOSYS	= 17, /* Function not implemented */
 	ICE_AQ_RC_ENOSEC	= 24, /* Missing security manifest */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index b5c013fdaaf9..045913cd7428 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -38,9 +38,6 @@ enum ice_status
 ice_alloc_hw_res(struct ice_hw *hw, u16 type, u16 num, bool btm, u16 *res);
 enum ice_status
 ice_free_hw_res(struct ice_hw *hw, u16 type, u16 num, u16 *res);
-enum ice_status ice_init_nvm(struct ice_hw *hw);
-enum ice_status
-ice_read_sr_buf(struct ice_hw *hw, u16 offset, u16 *words, u16 *data);
 enum ice_status
 ice_aq_alloc_free_res(struct ice_hw *hw, u16 num_entries,
 		      struct ice_aqc_alloc_free_res_elem *buf, u16 buf_size,
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 90c6a3ca20c9..c3ddaab25313 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -240,39 +240,51 @@ ice_get_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom,
 	       u8 *bytes)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
-	u16 first_word, last_word, nwords;
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
 	struct ice_hw *hw = &pf->hw;
 	enum ice_status status;
 	struct device *dev;
 	int ret = 0;
-	u16 *buf;
+	u8 *buf;
 
 	dev = ice_pf_to_dev(pf);
 
 	eeprom->magic = hw->vendor_id | (hw->device_id << 16);
+	netdev_dbg(netdev, "GEEPROM cmd 0x%08x, offset 0x%08x, len 0x%08x\n",
+		   eeprom->cmd, eeprom->offset, eeprom->len);
 
-	first_word = eeprom->offset >> 1;
-	last_word = (eeprom->offset + eeprom->len - 1) >> 1;
-	nwords = last_word - first_word + 1;
-
-	buf = devm_kcalloc(dev, nwords, sizeof(u16), GFP_KERNEL);
+	buf = kzalloc(eeprom->len, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
-	status = ice_read_sr_buf(hw, first_word, &nwords, buf);
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
 	if (status) {
-		dev_err(dev, "ice_read_sr_buf failed, err %d aq_err %d\n",
+		dev_err(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
 			status, hw->adminq.sq_last_status);
-		eeprom->len = sizeof(u16) * nwords;
 		ret = -EIO;
 		goto out;
 	}
 
-	memcpy(bytes, (u8 *)buf + (eeprom->offset & 1), eeprom->len);
+	status = ice_read_flat_nvm(hw, eeprom->offset, &eeprom->len, buf, false);
+	if (status == ICE_ERR_AQ_ERROR &&
+	    hw->adminq.sq_last_status == ICE_AQ_RC_EINVAL) {
+		/* do nothing, we reached the end */
+		ice_release_nvm(hw);
+		goto out;
+	} else if (status) {
+		dev_err(dev, "ice_read_flat_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		ret = -EIO;
+		ice_release_nvm(hw);
+		goto out;
+	}
+
+	ice_release_nvm(hw);
+
+	memcpy(bytes, buf, eeprom->len);
 out:
-	devm_kfree(dev, buf);
+	kfree(buf);
 	return ret;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 622431aa1e22..ff9ccb17be25 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -108,70 +108,6 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 	return status;
 }
 
-/**
- * ice_check_sr_access_params - verify params for Shadow RAM R/W operations.
- * @hw: pointer to the HW structure
- * @offset: offset in words from module start
- * @words: number of words to access
- */
-static enum ice_status
-ice_check_sr_access_params(struct ice_hw *hw, u32 offset, u16 words)
-{
-	if ((offset + words) > hw->nvm.sr_words) {
-		ice_debug(hw, ICE_DBG_NVM,
-			  "NVM error: offset beyond SR lmt.\n");
-		return ICE_ERR_PARAM;
-	}
-
-	if (words > ICE_SR_SECTOR_SIZE_IN_WORDS) {
-		/* We can access only up to 4KB (one sector), in one AQ write */
-		ice_debug(hw, ICE_DBG_NVM,
-			  "NVM error: tried to access %d words, limit is %d.\n",
-			  words, ICE_SR_SECTOR_SIZE_IN_WORDS);
-		return ICE_ERR_PARAM;
-	}
-
-	if (((offset + (words - 1)) / ICE_SR_SECTOR_SIZE_IN_WORDS) !=
-	    (offset / ICE_SR_SECTOR_SIZE_IN_WORDS)) {
-		/* A single access cannot spread over two sectors */
-		ice_debug(hw, ICE_DBG_NVM,
-			  "NVM error: cannot spread over two sectors.\n");
-		return ICE_ERR_PARAM;
-	}
-
-	return 0;
-}
-
-/**
- * ice_read_sr_aq - Read Shadow RAM.
- * @hw: pointer to the HW structure
- * @offset: offset in words from module start
- * @words: number of words to read
- * @data: storage for the words read from Shadow RAM (Little Endian)
- * @last_command: tells the AdminQ that this is the last command
- *
- * Reads 16-bit Little Endian word buffers from the Shadow RAM using the admin
- * command.
- */
-static enum ice_status
-ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
-	       bool last_command)
-{
-	enum ice_status status;
-
-	status = ice_check_sr_access_params(hw, offset, words);
-
-	/* values in "offset" and "words" parameters are sized as words
-	 * (16 bits) but ice_aq_read_nvm expects these values in bytes.
-	 * So do this conversion while calling ice_aq_read_nvm.
-	 */
-	if (!status)
-		status = ice_aq_read_nvm(hw, 0, 2 * offset, 2 * words, data,
-					 last_command, true, NULL);
-
-	return status;
-}
-
 /**
  * ice_read_sr_word_aq - Reads Shadow RAM via AQ
  * @hw: pointer to the HW structure
@@ -199,63 +135,6 @@ ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 *data)
 	return 0;
 }
 
-/**
- * ice_read_sr_buf_aq - Reads Shadow RAM buf via AQ
- * @hw: pointer to the HW structure
- * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
- * @words: (in) number of words to read; (out) number of words actually read
- * @data: words read from the Shadow RAM
- *
- * Reads 16 bit words (data buf) from the SR using the ice_read_sr_aq
- * method. Ownership of the NVM is taken before reading the buffer and later
- * released.
- */
-static enum ice_status
-ice_read_sr_buf_aq(struct ice_hw *hw, u16 offset, u16 *words, u16 *data)
-{
-	enum ice_status status;
-	bool last_cmd = false;
-	u16 words_read = 0;
-	u16 i = 0;
-
-	do {
-		u16 read_size, off_w;
-
-		/* Calculate number of bytes we should read in this step.
-		 * It's not allowed to read more than one page at a time or
-		 * to cross page boundaries.
-		 */
-		off_w = offset % ICE_SR_SECTOR_SIZE_IN_WORDS;
-		read_size = off_w ?
-			min_t(u16, *words,
-			      (ICE_SR_SECTOR_SIZE_IN_WORDS - off_w)) :
-			min_t(u16, (*words - words_read),
-			      ICE_SR_SECTOR_SIZE_IN_WORDS);
-
-		/* Check if this is last command, if so set proper flag */
-		if ((words_read + read_size) >= *words)
-			last_cmd = true;
-
-		status = ice_read_sr_aq(hw, offset, read_size,
-					data + words_read, last_cmd);
-		if (status)
-			goto read_nvm_buf_aq_exit;
-
-		/* Increment counter for words already read and move offset to
-		 * new read location
-		 */
-		words_read += read_size;
-		offset += read_size;
-	} while (words_read < *words);
-
-	for (i = 0; i < *words; i++)
-		data[i] = le16_to_cpu(((__force __le16 *)data)[i]);
-
-read_nvm_buf_aq_exit:
-	*words = words_read;
-	return status;
-}
-
 /**
  * ice_acquire_nvm - Generic request for acquiring the NVM ownership
  * @hw: pointer to the HW structure
@@ -263,7 +142,7 @@ ice_read_sr_buf_aq(struct ice_hw *hw, u16 offset, u16 *words, u16 *data)
  *
  * This function will request NVM ownership.
  */
-static enum ice_status
+enum ice_status
 ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access)
 {
 	if (hw->nvm.blank_nvm_mode)
@@ -278,7 +157,7 @@ ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access)
  *
  * This function will release NVM ownership.
  */
-static void ice_release_nvm(struct ice_hw *hw)
+void ice_release_nvm(struct ice_hw *hw)
 {
 	if (hw->nvm.blank_nvm_mode)
 		return;
@@ -412,31 +291,6 @@ enum ice_status ice_init_nvm(struct ice_hw *hw)
 	return 0;
 }
 
-/**
- * ice_read_sr_buf - Reads Shadow RAM buf and acquire lock if necessary
- * @hw: pointer to the HW structure
- * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
- * @words: (in) number of words to read; (out) number of words actually read
- * @data: words read from the Shadow RAM
- *
- * Reads 16 bit words (data buf) from the SR using the ice_read_nvm_buf_aq
- * method. The buf read is preceded by the NVM ownership take
- * and followed by the release.
- */
-enum ice_status
-ice_read_sr_buf(struct ice_hw *hw, u16 offset, u16 *words, u16 *data)
-{
-	enum ice_status status;
-
-	status = ice_acquire_nvm(hw, ICE_RES_READ);
-	if (!status) {
-		status = ice_read_sr_buf_aq(hw, offset, words, data);
-		ice_release_nvm(hw);
-	}
-
-	return status;
-}
-
 /**
  * ice_nvm_validate_checksum
  * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 4245ef988edf..7375f6b96919 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -4,8 +4,12 @@
 #ifndef _ICE_NVM_H_
 #define _ICE_NVM_H_
 
+enum ice_status
+ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access);
+void ice_release_nvm(struct ice_hw *hw);
 enum ice_status
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
+enum ice_status ice_init_nvm(struct ice_hw *hw);
 enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data);
 #endif /* _ICE_NVM_H_ */
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (6 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 07/15] ice: implement full NVM read from ETHTOOL_GEEPROM Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
                     ` (2 more replies)
  2020-01-30 22:59 ` [PATCH 09/15] ice: enable initial devlink support Jacob Keller
                   ` (11 subsequent siblings)
  19 siblings, 3 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Add devres managed allocation functions for allocating a devlink
instance. These can be used by device drivers based on the devres
framework which want to allocate a devlink instance.

For simplicity and to reduce churn in the devlink core code, the devres
management works by creating a node with a double-pointer. The devlink
instance is allocated using the normal devlink_alloc and released using
the normal devlink_free.

An alternative solution where the raw memory for devlink is allocated
directly via devres_alloc could be done. Such an implementation would
either significantly increase code duplication or code churn in order to
refactor the setup from the allocation.

The new devres managed allocation function will be used by the ice
driver in a following change to implement initial devlink support.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 include/net/devlink.h |  4 ++++
 lib/devres.c          |  1 +
 net/core/devlink.c    | 54 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 63e954241404..1c3540280396 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -858,11 +858,15 @@ struct ib_device;
 struct net *devlink_net(const struct devlink *devlink);
 void devlink_net_set(struct devlink *devlink, struct net *net);
 struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
+struct devlink *devlinkm_alloc(struct device * dev,
+			       const struct devlink_ops *ops,
+			       size_t priv_size);
 int devlink_register(struct devlink *devlink, struct device *dev);
 void devlink_unregister(struct devlink *devlink);
 void devlink_reload_enable(struct devlink *devlink);
 void devlink_reload_disable(struct devlink *devlink);
 void devlink_free(struct devlink *devlink);
+void devlinkm_free(struct device *dev, struct devlink *devlink);
 int devlink_port_register(struct devlink *devlink,
 			  struct devlink_port *devlink_port,
 			  unsigned int port_index);
diff --git a/lib/devres.c b/lib/devres.c
index 6ef51f159c54..239c81d40612 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -5,6 +5,7 @@
 #include <linux/gfp.h>
 #include <linux/export.h>
 #include <linux/of_address.h>
+#include <net/devlink.h>
 
 enum devm_ioremap_type {
 	DEVM_IOREMAP = 0,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 574008c536fa..b2b855d12a11 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -6531,6 +6531,60 @@ void devlink_free(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_free);
 
+static void devres_devlink_release(struct device *dev, void *res)
+{
+	devlink_free(*(struct devlink **)res);
+}
+
+static int devres_devlink_match(struct device *dev, void *res, void *data)
+{
+	return *(struct devlink **)res == data;
+}
+
+/**
+ * devlinkm_alloc - Allocate devlink instance managed by devres
+ * @dev: device to allocate devlink for
+ * @ops: devlink ops structure
+ * @priv_size: size of private data portion
+ *
+ * Allocate a devlink instance and manage its release via devres.
+ */
+struct devlink *devlinkm_alloc(struct device *dev,
+			       const struct devlink_ops *ops,
+			       size_t priv_size)
+{
+	struct devlink **ptr, *devlink = NULL;
+
+	ptr = devres_alloc(devres_devlink_release, sizeof(*ptr), GFP_KERNEL);
+	if (!ptr)
+		return NULL;
+
+	devlink = devlink_alloc(ops, priv_size);
+	if (devlink) {
+		*ptr = devlink;
+		devres_add(dev, ptr);
+	} else {
+		devres_free(ptr);
+	}
+
+	return devlink;
+}
+EXPORT_SYMBOL_GPL(devlinkm_alloc);
+
+/**
+ * devlinkm_free - Free devlink instance managed by devres
+ * @dev: device to remove the allocated devlink from
+ * @devlink: devlink instance to free
+ *
+ * Find and remove the devres node associated with the given devlink.
+ */
+void devlinkm_free(struct device *dev, struct devlink *devlink)
+{
+	WARN_ON(devres_release(dev, devres_devlink_release,
+			       devres_devlink_match, devlink));
+}
+EXPORT_SYMBOL_GPL(devlinkm_free);
+
 static void devlink_port_type_warn(struct work_struct *work)
 {
 	WARN(true, "Type was not set for devlink port.");
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 09/15] ice: enable initial devlink support
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (7 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [PATCH 10/15] ice: add basic handler for devlink .info_get Jacob Keller
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Begin implementing support for the devlink interface with the ice
driver. Use devlinkm_alloc to allocate the devlink memory. The PF
private data structure is now allocated as part of the devlink instead
of as a standalone allocation.

The ice hardware is a multi-function PCIe device. Thus, each physical
function will get its own devlink instance. This means that each
function will be treated independently, with its own parameters and
configuration. This is done because the ice driver loads a separate
instance for each function.

That means that this implementation does not enable devlink to manage
device-wide resources or configuration, as each physical function will
be treated independently. This is done for simplicity, as managing
a devlink instance across multiple driver instances would significantly
increase the complexity for minimal gain.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/Kconfig           |  1 +
 drivers/net/ethernet/intel/ice/Makefile      |  1 +
 drivers/net/ethernet/intel/ice/ice.h         |  4 +
 drivers/net/ethernet/intel/ice/ice_devlink.c | 88 ++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h | 14 ++++
 drivers/net/ethernet/intel/ice/ice_main.c    | 26 +++++-
 6 files changed, 132 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 154e2e818ec6..ad34e4335df2 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -294,6 +294,7 @@ config ICE
 	tristate "Intel(R) Ethernet Connection E800 Series Support"
 	default n
 	depends on PCI_MSI
+	select NET_DEVLINK
 	---help---
 	  This driver supports Intel(R) Ethernet Connection E800 Series of
 	  devices.  For more information on how to identify your adapter, go
diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 59544b0fc086..e2502ff3229d 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -19,6 +19,7 @@ ice-y := ice_main.o	\
 	 ice_txrx.o	\
 	 ice_flex_pipe.o \
 	 ice_flow.o	\
+	 ice_devlink.o \
 	 ice_ethtool.o
 ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o
 ice-$(CONFIG_DCB) += ice_dcb.o ice_dcb_nl.o ice_dcb_lib.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index cb10abb14e11..a195135f840f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -36,6 +36,7 @@
 #include <linux/avf/virtchnl.h>
 #include <net/ipv6.h>
 #include <net/xdp_sock.h>
+#include <net/devlink.h>
 #include "ice_devids.h"
 #include "ice_type.h"
 #include "ice_txrx.h"
@@ -346,6 +347,9 @@ enum ice_pf_flags {
 struct ice_pf {
 	struct pci_dev *pdev;
 
+	/* devlink port data */
+	struct devlink_port devlink_port;
+
 	/* OS reserved IRQ details */
 	struct msix_entry *msix_entries;
 	struct ice_res_tracker *irq_tracker;
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
new file mode 100644
index 000000000000..0b0f936122de
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Intel Corporation. */
+
+#include "ice.h"
+#include "ice_devlink.h"
+
+const struct devlink_ops ice_devlink_ops = {
+};
+
+/**
+ * ice_devlink_register - Register devlink interface for this PF
+ * @pf: the PF to register the devlink for.
+ *
+ * Register the devlink instance associated with this physical function.
+ *
+ * @returns zero on success or an error code on failure.
+ */
+int ice_devlink_register(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = devlink_register(devlink, dev);
+	if (err) {
+		dev_err(dev, "devlink registration failed: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_devlink_unregister - Unregister devlink resources for this pf.
+ * @pf: the PF structure to cleanup
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void ice_devlink_unregister(struct ice_pf *pf)
+{
+	devlink_unregister(priv_to_devlink(pf));
+}
+
+/**
+ * ice_devlink_create_port - Create a devlink port for this PF
+ * @pf: the PF to create a port for
+ *
+ * Create and register a devlink_port for this PF. Note that although each
+ * physical function connected to a separate devlink instance, the port will
+ * still be numbered according to the physical function id.
+ *
+ * @returns zero on success or an error code on failure.
+ */
+int ice_devlink_create_port(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct ice_vsi *vsi = ice_get_main_vsi(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	if (!vsi) {
+		dev_warn(dev, "%s: unable to find main VSI\n", __func__);
+		return -EIO;
+	}
+
+	devlink_port_attrs_set(&pf->devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+			       pf->hw.pf_id, false, 0, NULL, 0);
+	err = devlink_port_register(devlink, &pf->devlink_port, pf->hw.pf_id);
+	if (err) {
+		dev_err(dev, "devlink_port_register failed: %d\n", err);
+		return err;
+	}
+	devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
+
+	return 0;
+}
+
+/**
+ * ice_devlink_destroy_port - Destroy the devlink_port for this PF
+ * @pf: the PF to cleanup
+ *
+ * Unregisters the devlink_port structure associated with this PF.
+ */
+void ice_devlink_destroy_port(struct ice_pf *pf)
+{
+	devlink_port_type_clear(&pf->devlink_port);
+	devlink_port_unregister(&pf->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
new file mode 100644
index 000000000000..5e34f6e1b57e
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019, Intel Corporation. */
+
+#ifndef _ICE_DEVLINK_H_
+#define _ICE_DEVLINK_H_
+
+extern const struct devlink_ops ice_devlink_ops;
+
+int ice_devlink_register(struct ice_pf *pf);
+void ice_devlink_unregister(struct ice_pf *pf);
+int ice_devlink_create_port(struct ice_pf *pf);
+void ice_devlink_destroy_port(struct ice_pf *pf);
+
+#endif /* _ICE_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 5ae671609f98..a1aa4a1b6870 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -10,6 +10,7 @@
 #include "ice_lib.h"
 #include "ice_dcb_lib.h"
 #include "ice_dcb_nl.h"
+#include "ice_devlink.h"
 
 #define DRV_VERSION_MAJOR 0
 #define DRV_VERSION_MINOR 8
@@ -2553,6 +2554,11 @@ static int ice_setup_pf_sw(struct ice_pf *pf)
 		status = -ENODEV;
 		goto unroll_vsi_setup;
 	}
+
+	status = ice_devlink_create_port(pf);
+	if (status)
+		goto unroll_vsi_setup;
+
 	/* netdev has to be configured before setting frame size */
 	ice_vsi_cfg_frame_size(vsi);
 
@@ -2582,6 +2588,8 @@ static int ice_setup_pf_sw(struct ice_pf *pf)
 		}
 	}
 
+	ice_devlink_destroy_port(pf);
+
 unroll_vsi_setup:
 	if (vsi) {
 		ice_vsi_free_q_vectors(vsi);
@@ -3180,6 +3188,7 @@ static int
 ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 {
 	struct device *dev = &pdev->dev;
+	struct devlink *devlink;
 	struct ice_pf *pf;
 	struct ice_hw *hw;
 	int err;
@@ -3195,9 +3204,11 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		return err;
 	}
 
-	pf = devm_kzalloc(dev, sizeof(*pf), GFP_KERNEL);
-	if (!pf)
+	devlink = devlinkm_alloc(dev, &ice_devlink_ops, sizeof(*pf));
+	if (!devlink) {
+		dev_err(dev, "devlink allocation failed\n");
 		return -ENOMEM;
+	}
 
 	/* set up for high or low DMA */
 	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
@@ -3211,6 +3222,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	pci_enable_pcie_error_reporting(pdev);
 	pci_set_master(pdev);
 
+	pf = devlink_priv(devlink);
 	pf->pdev = pdev;
 	pci_set_drvdata(pdev, pf);
 	set_bit(__ICE_DOWN, pf->state);
@@ -3233,6 +3245,12 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 
 	pf->msg_enable = netif_msg_init(debug, ICE_DFLT_NETIF_M);
 
+	err = ice_devlink_register(pf);
+	if (err) {
+		dev_err(dev, "ice_devlink_register failed: %d\n", err);
+		goto err_exit_unroll;
+	}
+
 #ifndef CONFIG_DYNAMIC_DEBUG
 	if (debug < -1)
 		hw->debug_mask = debug;
@@ -3384,6 +3402,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	ice_deinit_pf(pf);
 	ice_deinit_hw(hw);
 err_exit_unroll:
+	ice_devlink_unregister(pf);
 	pci_disable_pcie_error_reporting(pdev);
 	return err;
 }
@@ -3411,6 +3430,7 @@ static void ice_remove(struct pci_dev *pdev)
 
 	if (test_bit(ICE_FLAG_SRIOV_ENA, pf->flags))
 		ice_free_vfs(pf);
+	ice_devlink_destroy_port(pf);
 	ice_vsi_release_all(pf);
 	ice_free_irq_msix_misc(pf);
 	ice_for_each_vsi(pf, i) {
@@ -3420,6 +3440,8 @@ static void ice_remove(struct pci_dev *pdev)
 	}
 	ice_deinit_pf(pf);
 	ice_deinit_hw(&pf->hw);
+	ice_devlink_unregister(pf);
+
 	/* Issue a PFR as part of the prescribed driver unload flow.  Do not
 	 * do it via ice_schedule_reset() since there is no need to rebuild
 	 * and the service task is already stopped.
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 10/15] ice: add basic handler for devlink .info_get
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (8 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 09/15] ice: enable initial devlink support Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-01-30 22:59 ` [PATCH 11/15] ice: add board identifier info to " Jacob Keller
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

The devlink .info_get callback allows the driver to report detailed
version information. The following devlink versions are reported with
this initial implementation:

 "driver.version" -> device driver version, to match ethtool -i version
 "fw" -> firmware version as reported by ethtool -i firmware-version
 "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
 "fw.api" -> API version of interface exposed over the AdminQ
 "fw.build" -> Unique build identifier for the management firmware
 "nvm.version" -> Version of the NVM parameters section
 "nvm.oem" -> OEM-specific version information
 "nvm.eetrack" -> Unique identifier for the combined NVM image

With this, devlink can now report at least the same information as
reported by the older ethtool interface. Each section of the
"firmware-version" is also reported independently so that it is easier
to understand the meaning.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_devlink.c | 103 +++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 0b0f936122de..493c2c2986f2 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -2,9 +2,112 @@
 /* Copyright (c) 2019, Intel Corporation. */
 
 #include "ice.h"
+#include "ice_lib.h"
 #include "ice_devlink.h"
 
+/**
+ * ice_devlink_info_get - .info_get devlink handler
+ * @devlink: devlink instance structure
+ * @req: the devlink info request
+ * @extack: extended netdev ack structure
+ *
+ * Callback for the devlink .info_get operation. Reports information about the
+ * device.
+ *
+ * @returns zero on success or an error code on failure.
+ */
+static int ice_devlink_info_get(struct devlink *devlink,
+				struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo;
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_hw *hw = &pf->hw;
+	u16 oem_build;
+	char buf[32]; /* TODO: size this properly */
+	int err;
+
+	ice_get_nvm_version(hw, &oem_ver, &oem_build, &oem_patch, &nvm_ver_hi,
+			    &nvm_ver_lo);
+
+	err = devlink_info_driver_name_put(req, KBUILD_MODNAME);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver name");
+		return err;
+	}
+
+	/* driver.version */
+	err = devlink_info_version_fixed_put(req, "driver.version",
+					     ice_drv_ver);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver version");
+		return err;
+	}
+
+	/* fw (match exact output of ethtool -i firmware-version) */
+	err = devlink_info_version_running_put(req,
+					       DEVLINK_INFO_VERSION_GENERIC_FW,
+					       ice_nvm_version_str(hw));
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set combined fw version");
+		return err;
+	}
+
+	/* fw.mgmt (DEVLINK_INFO_VERSION_GENERIC_FW_MGMT) */
+	snprintf(buf, sizeof(buf), "%u.%u.%u", hw->fw_maj_ver, hw->fw_min_ver,
+		 hw->fw_patch);
+	err = devlink_info_version_running_put(req, "fw.mgmt", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw version data");
+		return err;
+	}
+
+	/* fw.api */
+	snprintf(buf, sizeof(buf), "%u.%u", hw->api_maj_ver,
+		 hw->api_min_ver);
+	err = devlink_info_version_running_put(req, "fw.api", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw API data");
+		return err;
+	}
+
+	/* fw.build */
+	snprintf(buf, sizeof(buf), "%08x", hw->fw_build);
+	err = devlink_info_version_running_put(req, "fw.build", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw build data");
+		return err;
+	}
+
+	/* nvm.version */
+	snprintf(buf, sizeof(buf), "%x.%02x", nvm_ver_hi, nvm_ver_lo);
+	err = devlink_info_version_running_put(req, "nvm.version", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM version data");
+		return err;
+	}
+
+	/* nvm.oem */
+	snprintf(buf, sizeof(buf), "%u.%u.%u", oem_ver, oem_build, oem_patch);
+	err = devlink_info_version_running_put(req, "nvm.oem", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set oem version data");
+		return err;
+	}
+
+	/* nvm.eetrack */
+	snprintf(buf, sizeof(buf), "0x%0X", hw->nvm.eetrack);
+	err = devlink_info_version_running_put(req, "nvm.eetrack", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM eetrack data");
+		return err;
+	}
+
+	return 0;
+}
+
 const struct devlink_ops ice_devlink_ops = {
+	.info_get = ice_devlink_info_get,
 };
 
 /**
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 11/15] ice: add board identifier info to devlink .info_get
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (9 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 10/15] ice: add basic handler for devlink .info_get Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-01-30 22:59 ` [PATCH 12/15] ice: add a devlink region to dump shadow RAM contents Jacob Keller
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Export a unique board identifier using "board.id" for devlink's
.info_get command.

Obtain this by reading the NVM for the PBA identification string.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c  |  66 ----------
 drivers/net/ethernet/intel/ice/ice_common.h  |   3 -
 drivers/net/ethernet/intel/ice/ice_devlink.c |  14 ++
 drivers/net/ethernet/intel/ice/ice_nvm.c     | 127 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_nvm.h     |   5 +
 drivers/net/ethernet/intel/ice/ice_type.h    |   1 +
 6 files changed, 147 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 0207e28c2682..8928c3907596 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -966,72 +966,6 @@ enum ice_status ice_reset(struct ice_hw *hw, enum ice_reset_req req)
 	return ice_check_reset(hw);
 }
 
-/**
- * ice_get_pfa_module_tlv - Reads sub module TLV from NVM PFA
- * @hw: pointer to hardware structure
- * @module_tlv: pointer to module TLV to return
- * @module_tlv_len: pointer to module TLV length to return
- * @module_type: module type requested
- *
- * Finds the requested sub module TLV type from the Preserved Field
- * Area (PFA) and returns the TLV pointer and length. The caller can
- * use these to read the variable length TLV value.
- */
-enum ice_status
-ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
-		       u16 module_type)
-{
-	enum ice_status status;
-	u16 pfa_len, pfa_ptr;
-	u16 next_tlv;
-
-	status = ice_read_sr_word(hw, ICE_SR_PFA_PTR, &pfa_ptr);
-	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Preserved Field Array pointer.\n");
-		return status;
-	}
-	status = ice_read_sr_word(hw, pfa_ptr, &pfa_len);
-	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Failed to read PFA length.\n");
-		return status;
-	}
-	/* Starting with first TLV after PFA length, iterate through the list
-	 * of TLVs to find the requested one.
-	 */
-	next_tlv = pfa_ptr + 1;
-	while (next_tlv < pfa_ptr + pfa_len) {
-		u16 tlv_sub_module_type;
-		u16 tlv_len;
-
-		/* Read TLV type */
-		status = ice_read_sr_word(hw, next_tlv, &tlv_sub_module_type);
-		if (status) {
-			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV type.\n");
-			break;
-		}
-		/* Read TLV length */
-		status = ice_read_sr_word(hw, next_tlv + 1, &tlv_len);
-		if (status) {
-			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV length.\n");
-			break;
-		}
-		if (tlv_sub_module_type == module_type) {
-			if (tlv_len) {
-				*module_tlv = next_tlv;
-				*module_tlv_len = tlv_len;
-				return 0;
-			}
-			return ICE_ERR_INVAL_SIZE;
-		}
-		/* Check next TLV, i.e. current TLV pointer + length + 2 words
-		 * (for current TLV's type and length)
-		 */
-		next_tlv = next_tlv + tlv_len + 2;
-	}
-	/* Module does not exist */
-	return ICE_ERR_DOES_NOT_EXIST;
-}
-
 /**
  * ice_copy_rxq_ctx_to_hw
  * @hw: pointer to the hardware structure
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 045913cd7428..6d5d18665c7e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -15,9 +15,6 @@ enum ice_status ice_nvm_validate_checksum(struct ice_hw *hw);
 
 enum ice_status ice_init_hw(struct ice_hw *hw);
 void ice_deinit_hw(struct ice_hw *hw);
-enum ice_status
-ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
-		       u16 module_type);
 enum ice_status ice_check_reset(struct ice_hw *hw);
 enum ice_status ice_reset(struct ice_hw *hw, enum ice_reset_req req);
 enum ice_status ice_create_all_ctrlq(struct ice_hw *hw);
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 493c2c2986f2..3a98f241ccee 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -23,6 +23,7 @@ static int ice_devlink_info_get(struct devlink *devlink,
 	u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo;
 	struct ice_pf *pf = devlink_priv(devlink);
 	struct ice_hw *hw = &pf->hw;
+	enum ice_status status;
 	u16 oem_build;
 	char buf[32]; /* TODO: size this properly */
 	int err;
@@ -44,6 +45,19 @@ static int ice_devlink_info_get(struct devlink *devlink,
 		return err;
 	}
 
+	status = ice_read_pba_string(hw, buf, sizeof(buf));
+	if (status) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to obtain PBA string");
+		return -EIO;
+	}
+
+	/* board.id (DEVLINK_INFO_VERSION_GENERIC_BOARD_ID) */
+	err = devlink_info_version_fixed_put(req, "board.id", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set board identifier");
+		return err;
+	}
+
 	/* fw (match exact output of ethtool -i firmware-version) */
 	err = devlink_info_version_running_put(req,
 					       DEVLINK_INFO_VERSION_GENERIC_FW,
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index ff9ccb17be25..ac7b3b9faeae 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -186,6 +186,133 @@ enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data)
 	return status;
 }
 
+/**
+ * ice_get_pfa_module_tlv - Reads sub module TLV from NVM PFA
+ * @hw: pointer to hardware structure
+ * @module_tlv: pointer to module TLV to return
+ * @module_tlv_len: pointer to module TLV length to return
+ * @module_type: module type requested
+ *
+ * Finds the requested sub module TLV type from the Preserved Field
+ * Area (PFA) and returns the TLV pointer and length. The caller can
+ * use these to read the variable length TLV value.
+ */
+enum ice_status
+ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
+		       u16 module_type)
+{
+	enum ice_status status;
+	u16 pfa_len, pfa_ptr;
+	u16 next_tlv;
+
+	status = ice_read_sr_word(hw, ICE_SR_PFA_PTR, &pfa_ptr);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Preserved Field Array pointer.\n");
+		return status;
+	}
+	status = ice_read_sr_word(hw, pfa_ptr, &pfa_len);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read PFA length.\n");
+		return status;
+	}
+	/* Starting with first TLV after PFA length, iterate through the list
+	 * of TLVs to find the requested one.
+	 */
+	next_tlv = pfa_ptr + 1;
+	while (next_tlv < pfa_ptr + pfa_len) {
+		u16 tlv_sub_module_type;
+		u16 tlv_len;
+
+		/* Read TLV type */
+		status = ice_read_sr_word(hw, next_tlv, &tlv_sub_module_type);
+		if (status) {
+			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV type.\n");
+			break;
+		}
+		/* Read TLV length */
+		status = ice_read_sr_word(hw, next_tlv + 1, &tlv_len);
+		if (status) {
+			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV length.\n");
+			break;
+		}
+		if (tlv_sub_module_type == module_type) {
+			if (tlv_len) {
+				*module_tlv = next_tlv;
+				*module_tlv_len = tlv_len;
+				return 0;
+			}
+			return ICE_ERR_INVAL_SIZE;
+		}
+		/* Check next TLV, i.e. current TLV pointer + length + 2 words
+		 * (for current TLV's type and length)
+		 */
+		next_tlv = next_tlv + tlv_len + 2;
+	}
+	/* Module does not exist */
+	return ICE_ERR_DOES_NOT_EXIST;
+}
+
+/**
+ * ice_read_pba_string - Reads part number string from NVM
+ * @hw: pointer to hardware structure
+ * @pba_num: stores the part number string from the NVM
+ * @pba_num_size: part number string buffer length
+ *
+ * Reads the part number string from the NVM.
+ */
+enum ice_status
+ice_read_pba_string(struct ice_hw *hw, u8 *pba_num, u32 pba_num_size)
+{
+	u16 pba_tlv, pba_tlv_len;
+	enum ice_status status;
+	u16 pba_word, pba_size;
+	u16 i;
+
+	status = ice_get_pfa_module_tlv(hw, &pba_tlv, &pba_tlv_len,
+					ICE_SR_PBA_BLOCK_PTR);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read PBA Block TLV.\n");
+		return status;
+	}
+
+	/* pba_size is the next word */
+	status = ice_read_sr_word(hw, (pba_tlv + 2), &pba_size);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read PBA Section size.\n");
+		return status;
+	}
+
+	if (pba_tlv_len < pba_size) {
+		ice_debug(hw, ICE_DBG_INIT, "Invalid PBA Block TLV size.\n");
+		return ICE_ERR_INVAL_SIZE;
+	}
+
+	/* Subtract one to get PBA word count (PBA Size word is included in
+	 * total size)
+	 */
+	pba_size--;
+	if (pba_num_size < (((u32)pba_size * 2) + 1)) {
+		ice_debug(hw, ICE_DBG_INIT,
+			  "Buffer too small for PBA data.\n");
+		return ICE_ERR_PARAM;
+	}
+
+	for (i = 0; i < pba_size; i++) {
+		status = ice_read_sr_word(hw, (pba_tlv + 2 + 1) + i, &pba_word);
+		if (status) {
+			ice_debug(hw, ICE_DBG_INIT,
+				  "Failed to read PBA Block word %d.\n", i);
+			return status;
+		}
+
+		pba_num[(i * 2)] = (pba_word >> 8) & 0xFF;
+		pba_num[(i * 2) + 1] = pba_word & 0xFF;
+	}
+	pba_num[(pba_size * 2)] = '\0';
+
+	return status;
+}
+
 /**
  * ice_init_nvm - initializes NVM setting
  * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 7375f6b96919..999f273ba6ad 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -10,6 +10,11 @@ void ice_release_nvm(struct ice_hw *hw);
 enum ice_status
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
+enum ice_status
+ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
+		       u16 module_type);
+enum ice_status
+ice_read_pba_string(struct ice_hw *hw, u8 *pba_num, u32 pba_num_size);
 enum ice_status ice_init_nvm(struct ice_hw *hw);
 enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data);
 #endif /* _ICE_NVM_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index b361ffabb0ca..f65c53ea335b 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -627,6 +627,7 @@ struct ice_hw_port_stats {
 /* Checksum and Shadow RAM pointers */
 #define ICE_SR_BOOT_CFG_PTR		0x132
 #define ICE_NVM_OEM_VER_OFF		0x02
+#define ICE_SR_PBA_BLOCK_PTR		0x16
 #define ICE_SR_NVM_DEV_STARTER_VER	0x18
 #define ICE_SR_NVM_EETRACK_LO		0x2D
 #define ICE_SR_NVM_EETRACK_HI		0x2E
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 12/15] ice: add a devlink region to dump shadow RAM contents
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (10 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 11/15] ice: add board identifier info to " Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [PATCH 13/15] devlink: support directly reading from region memory Jacob Keller
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Add a devlink region for exposing the device's Shadow RAM contents.
Support immediate snapshots by implementing the .snapshot callback.

Currently, no driver event triggers a snapshot automatically. Users must
request a snapshot via the new DEVLINK_CMD_REGION_TAKE_SNAPSHOT command.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h         |   2 +
 drivers/net/ethernet/intel/ice/ice_devlink.c | 120 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h |   3 +
 drivers/net/ethernet/intel/ice/ice_main.c    |   4 +
 4 files changed, 129 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a195135f840f..43deda152dd3 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -350,6 +350,8 @@ struct ice_pf {
 	/* devlink port data */
 	struct devlink_port devlink_port;
 
+	struct devlink_region *sr_region;
+
 	/* OS reserved IRQ details */
 	struct msix_entry *msix_entries;
 	struct ice_res_tracker *irq_tracker;
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 3a98f241ccee..9f6a0281fd6e 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -5,6 +5,26 @@
 #include "ice_lib.h"
 #include "ice_devlink.h"
 
+/**
+ * ice_is_primary_devlink - Check if this is the primary devlink instance
+ * @pf: the PF device to check
+ *
+ * The ice hardware is represented by a multi-function PCIe device. This
+ * function is used to check whether this PF is function zero. Certain devlink
+ * features will call this function and only be enabled if it returns true.
+ */
+static bool ice_is_primary_function(struct ice_pf *pf)
+{
+	struct ice_hw *hw = &pf->hw;
+	u8 pf_id;
+
+	/* the hw->pf_id may not always be set when this function is called */
+	pf_id = (u8)(rd32(hw, PF_FUNC_RID) & PF_FUNC_RID_FUNC_NUM_M) >>
+		PF_FUNC_RID_FUNC_NUM_S;
+
+	return (pf_id == 0);
+}
+
 /**
  * ice_devlink_info_get - .info_get devlink handler
  * @devlink: devlink instance structure
@@ -203,3 +223,103 @@ void ice_devlink_destroy_port(struct ice_pf *pf)
 	devlink_port_type_clear(&pf->devlink_port);
 	devlink_port_unregister(&pf->devlink_port);
 }
+
+/**
+ * ice_devlink_sr_snapshot - Capture a snapshot of the Shadow RAM contents
+ * @devlink: the devlink instance
+ * @region: pointer to the sr_region
+ * @extack: extended ACK response structure
+ *
+ * This function is called in response to the DEVLINK_CMD_REGION_TRIGGER for
+ * the shadow-ram devlink region. It captures a snapshot of the shadow ram
+ * contents. This snapshot can later be viewed via the devlink-region
+ * interface.
+ *
+ * @returns zero on success, and updates the data and destructor values.
+ * Returns a non-zero error code on failure.
+ */
+static int ice_devlink_sr_snapshot(struct devlink *devlink,
+			      struct netlink_ext_ack *extack,
+			      u8 **data,
+			      devlink_snapshot_data_dest_t **destructor)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_hw *hw = &pf->hw;
+	enum ice_status status;
+	void *sr_data;
+	u32 sr_size;
+
+	if (!ice_is_primary_function(pf))
+		return -EACCES;
+
+	sr_size = hw->nvm.sr_words * sizeof(u16);
+	sr_data = kzalloc(sr_size, GFP_KERNEL);
+	if (!sr_data) {
+		NL_SET_ERR_MSG_MOD(extack, "Out of memory");
+		return -ENOMEM;
+	}
+
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (status) {
+		dev_dbg(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to acquire NVM semaphore");
+		kfree(sr_data);
+		return -EIO;
+	}
+
+	status = ice_read_flat_nvm(hw, 0, &sr_size, sr_data, true);
+	if (status) {
+		dev_dbg(dev, "ice_read_flat_nvm failed after reading %u bytes, err %d aq_err %d\n",
+			sr_size, status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to read Shadow RAM contents");
+		ice_release_nvm(hw);
+		kfree(sr_data);
+		return -EIO;
+	}
+
+	ice_release_nvm(hw);
+
+	*data = sr_data;
+	*destructor = kfree;
+
+	return 0;
+}
+
+static const struct devlink_region_ops ice_sr_region_ops = {
+	.name = "shadow-ram",
+	.snapshot = ice_devlink_sr_snapshot,
+};
+
+/**
+ * ice_devlink_regions_init - Initialize devlink regions
+ * @pf: the PF device structure
+ *
+ * Create devlink regions used to enable access to dump the contents of the
+ * flash memory on the device.
+ */
+void ice_devlink_regions_init(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	u64 shadow_ram_size;
+
+	/* Regions which should be exported by all functions should be added
+	 * above this check.
+	 */
+	if (!ice_is_primary_function(pf))
+		return;
+
+	shadow_ram_size = pf->hw.nvm.sr_words * sizeof(u16);
+	pf->sr_region = devlink_region_create(devlink, &ice_sr_region_ops, 1,
+					      shadow_ram_size);
+	if (IS_ERR(pf->sr_region))
+		dev_warn(dev, "failed to create shadow-ram devlink region, err %ld\n",
+			 PTR_ERR(pf->sr_region));
+}
+
+void ice_devlink_regions_destroy(struct ice_pf *pf)
+{
+	devlink_region_destroy(pf->sr_region);
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
index 5e34f6e1b57e..b1bacc922a5e 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.h
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -11,4 +11,7 @@ void ice_devlink_unregister(struct ice_pf *pf);
 int ice_devlink_create_port(struct ice_pf *pf);
 void ice_devlink_destroy_port(struct ice_pf *pf);
 
+void ice_devlink_regions_init(struct ice_pf *pf);
+void ice_devlink_regions_destroy(struct ice_pf *pf);
+
 #endif /* _ICE_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a1aa4a1b6870..c20a8e761ca8 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3291,6 +3291,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		goto err_init_pf_unroll;
 	}
 
+	ice_devlink_regions_init(pf);
+
 	pf->num_alloc_vsi = hw->func_caps.guar_num_vsi;
 	if (!pf->num_alloc_vsi) {
 		err = -EIO;
@@ -3400,6 +3402,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	devm_kfree(dev, pf->vsi);
 err_init_pf_unroll:
 	ice_deinit_pf(pf);
+	ice_devlink_regions_destroy(pf);
 	ice_deinit_hw(hw);
 err_exit_unroll:
 	ice_devlink_unregister(pf);
@@ -3439,6 +3442,7 @@ static void ice_remove(struct pci_dev *pdev)
 		ice_vsi_free_q_vectors(pf->vsi[i]);
 	}
 	ice_deinit_pf(pf);
+	ice_devlink_regions_destroy(pf);
 	ice_deinit_hw(&pf->hw);
 	ice_devlink_unregister(pf);
 
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 13/15] devlink: support directly reading from region memory
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (11 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 12/15] ice: add a devlink region to dump shadow RAM contents Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03 13:44   ` Jiri Pirko
  2020-01-30 22:59 ` [PATCH 14/15] ice: support direct read of the shadow ram region Jacob Keller
                   ` (6 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Add a new region operation for directly reading from a region, without
taking a full snapshot.

Extend the DEVLINK_CMD_REGION_READ to allow directly reading from
a region, if supported. Instead of reporting a missing snapshot id as
invalid, check to see if direct reading is implemented for the region.
If so, use the direct read operation to grab the current contents of the
region.

This new behavior of DEVLINK_CMD_REGION_READ should be backwards
compatible. Previously, all kernels rejected such
a DEVLINK_CMD_REGION_READ with -EINVAL, and will now either accept the
call or report -EOPNOTSUPP for regions which do not implement direct
access.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../networking/devlink/devlink-region.rst     |  8 ++
 include/net/devlink.h                         |  4 +
 net/core/devlink.c                            | 82 +++++++++++++++++--
 3 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 262249e6c3fc..a543f5ee7a9e 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -25,6 +25,10 @@ Regions may optionally support capturing a snapshot on demand via the
 allow requested snapshots must implement the ``.snapshot`` callback for the
 region in its ``devlink_region_ops`` structure.
 
+Regions may optionally allow directly reading from their contents without a
+snapshot. A driver wishing to enable this for a region should implement the
+``.read`` callback in the ``devlink_region_ops`` structure.
+
 example usage
 -------------
 
@@ -60,6 +64,10 @@ example usage
             length 16
     0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
 
+    # Read from the region without a snapshot
+    $ devlink region read pci/0000:00:05.0/fw-health address 16 length 16
+    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
+
 As regions are likely very device or driver specific, no generic regions are
 defined. See the driver-specific documentation files for information on the
 specific regions a driver supports.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 1c3540280396..47ce1b5481de 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -508,6 +508,10 @@ struct devlink_region_ops {
 			struct netlink_ext_ack *extack,
 			u8 **data,
 			devlink_snapshot_data_dest_t **destructor);
+	int (*read)(struct devlink *devlink,
+		    u64 curr_offset,
+		    u32 data_size,
+		    u8 *data);
 };
 
 struct devlink_fmsg;
diff --git a/net/core/devlink.c b/net/core/devlink.c
index b2b855d12a11..5831b7b78915 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4005,6 +4005,56 @@ static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
 	return err;
 }
 
+static int devlink_nl_region_read_direct_fill(struct sk_buff *skb,
+					      struct devlink *devlink,
+					      struct devlink_region *region,
+					      struct nlattr **attrs,
+					      u64 start_offset,
+					      u64 end_offset,
+					      bool dump,
+					      u64 *new_offset)
+{
+	u64 curr_offset = start_offset;
+	int err = 0;
+	u8 *data;
+
+	/* Allocate and re-use a single buffer */
+	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	*new_offset = start_offset;
+
+	if (end_offset > region->size || dump)
+		end_offset = region->size;
+
+	while (curr_offset < end_offset) {
+		u32 data_size;
+
+		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
+			data_size = end_offset - curr_offset;
+		else
+			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
+
+		err = region->ops->read(devlink, curr_offset, data_size, data);
+		if (err)
+			break;
+
+		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
+							    data, data_size,
+							    curr_offset);
+		if (err)
+			break;
+
+		curr_offset += data_size;
+	}
+	*new_offset = curr_offset;
+
+	kfree(data);
+
+	return err;
+}
+
 static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 					     struct netlink_callback *cb)
 {
@@ -4016,6 +4066,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	const char *region_name;
 	struct devlink *devlink;
 	bool dump = true;
+	bool direct;
 	void *hdr;
 	int err;
 
@@ -4030,8 +4081,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 
 	mutex_lock(&devlink->lock);
 
-	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+	if (!attrs[DEVLINK_ATTR_REGION_NAME]) {
 		err = -EINVAL;
 		goto out_unlock;
 	}
@@ -4043,6 +4093,17 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out_unlock;
 	}
 
+	if (attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+		direct = false;
+	else
+		direct = true;
+
+	/* Region may not support direct read access */
+	if (direct && !region->ops->read) {
+		err = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
 			  DEVLINK_CMD_REGION_READ);
@@ -4076,11 +4137,18 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		dump = false;
 	}
 
-	err = devlink_nl_region_read_snapshot_fill(skb, devlink,
-						   region, attrs,
-						   start_offset,
-						   end_offset, dump,
-						   &ret_offset);
+	if (direct)
+		err = devlink_nl_region_read_direct_fill(skb, devlink,
+							 region, attrs,
+							 start_offset,
+							 end_offset, dump,
+							 &ret_offset);
+	else
+		err = devlink_nl_region_read_snapshot_fill(skb, devlink,
+							   region, attrs,
+							   start_offset,
+							   end_offset, dump,
+							   &ret_offset);
 
 	if (err && err != -EMSGSIZE)
 		goto nla_put_failure;
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 14/15] ice: support direct read of the shadow ram region
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (12 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 13/15] devlink: support directly reading from region memory Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [PATCH 15/15] ice: add ice.rst devlink documentation file Jacob Keller
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Using the new .read region operation, implement support for directly
reading from the Shadow RAM region without a snapshot. This is useful
when the atomic guarantees provided by a snapshot aren't necessary and
userspace only wants to read a small portion of the region.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_devlink.c | 48 ++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 9f6a0281fd6e..7bad237177a0 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -224,6 +224,53 @@ void ice_devlink_destroy_port(struct ice_pf *pf)
 	devlink_port_unregister(&pf->devlink_port);
 }
 
+/**
+ * ice_devlink_sr_read - Read a portion of the shadow RAM
+ * @devlink: the devlink instance
+ * @curr_offset: offset to start at
+ * @data_size: portion of the region to read
+ * @data: buffer to store region contents
+ *
+ * This function is called to directly read from the shadow-ram region in
+ * response to a DEVLINK_CMD_REGION_READ without a snapshot id.
+ *
+ * @returns zero on success and updates the contents of the data region,
+ * otherwise returns a non-zero error code on failure.
+ */
+static int ice_devlink_sr_read(struct devlink *devlink, u64 curr_offset,
+			  u32 data_size, u8 *data)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_hw *hw = &pf->hw;
+	enum ice_status status;
+
+	if (!ice_is_primary_function(pf))
+		return -EACCES;
+
+	if (curr_offset + data_size > hw->nvm.sr_words * sizeof(u16))
+		return -ERANGE;
+
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (status) {
+		dev_err(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		return -EIO;
+	}
+
+	status = ice_read_flat_nvm(hw, curr_offset, &data_size, data, true);
+	if (status) {
+		dev_err(dev, "ice_read_flat_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		ice_release_nvm(hw);
+		return -EIO;
+	}
+
+	ice_release_nvm(hw);
+
+	return 0;
+}
+
 /**
  * ice_devlink_sr_snapshot - Capture a snapshot of the Shadow RAM contents
  * @devlink: the devlink instance
@@ -290,6 +337,7 @@ static int ice_devlink_sr_snapshot(struct devlink *devlink,
 static const struct devlink_region_ops ice_sr_region_ops = {
 	.name = "shadow-ram",
 	.snapshot = ice_devlink_sr_snapshot,
+	.read = ice_devlink_sr_read,
 };
 
 /**
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 15/15] ice: add ice.rst devlink documentation file
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (13 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 14/15] ice: support direct read of the shadow ram region Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-01-30 22:59 ` [RFC PATCH 1/3] Update kernel headers Jacob Keller
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Now that the ice driver has gained some devlink support, add
a driver-specific documentation file for it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 Documentation/networking/devlink/ice.rst   | 76 ++++++++++++++++++++++
 Documentation/networking/devlink/index.rst |  1 +
 2 files changed, 77 insertions(+)
 create mode 100644 Documentation/networking/devlink/ice.rst

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
new file mode 100644
index 000000000000..c27b49311afe
--- /dev/null
+++ b/Documentation/networking/devlink/ice.rst
@@ -0,0 +1,76 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+ice devlink support
+===================
+
+This document describes the devlink features implemented by the ``ice``
+device driver.
+
+Info versions
+=============
+
+The ``ice`` driver reports the following versions
+
+.. list-table:: devlink info versions implemented
+    :widths: 5 5 90
+
+    * - Name
+      - Type
+      - Description
+    * - ``driver.version``
+      - fixed
+      - Device driver version as reported by ``ETHTOOL_GDRVINFO``
+    * - ``board.id``
+      - fixed
+      - The Product Board Assembly (PBA) identifier of the board
+    * - ``fw``
+      - running
+      - Combined flash version string as reported by ``ETHTOOL_GDRVINFO``
+    * - ``fw.mgmt``
+      - running
+      - 3-digit overall firmware version
+    * - ``fw.api``
+      - running
+      - 2-digit firmware API version
+    * - ``fw.build``
+      - running
+      - firmware build identifier
+    * - ``nvm.version``
+      - running
+      - NVM format version
+    * - ``nvm.oem``
+      - running
+      - Original Equipment Manufacturer (OEM) version information
+    * - ``nvm.eetrack``
+      - running
+      - Unique NVM EETRACK identifier
+
+
+Regions
+=======
+
+The ``ice`` driver enables access to the contents of the Shadow RAM portion
+of the flash chip via the ``shadow-ram`` region.
+
+Users can request an immediate capture of a snapshot via the
+``DEVLINK_CMD_REGION_TAKE_SNAPSHOT``
+
+.. code:: shell
+
+    $ devlink region snapshot pci/0000:01:00.0/shadow-ram
+    $ devlink region dump pci/0000:01:00.0/shadow-ram snapshot 1
+
+Directly reading a portion of the Shadow RAM without a snapshot is also
+supported
+
+.. code:: shell
+
+    $ devlink region dump pci/0000:01:00.0/shadow-ram
+    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
+    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
+    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
+    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
+
+    $ devlink region read pci/0000:01:00.0/shadow-ram address 0 length 16
+    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 087ff54d53fc..272509cd9215 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -32,6 +32,7 @@ parameters, info versions, and other features it supports.
 
    bnxt
    ionic
+   ice
    mlx4
    mlx5
    mlxsw
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 1/3] Update kernel headers
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (14 preceding siblings ...)
  2020-01-30 22:59 ` [PATCH 15/15] ice: add ice.rst devlink documentation file Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [RFC PATCH 2/3] devlink: add support for DEVLINK_CMD_REGION_TAKE_SNAPSHOT Jacob Keller
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Update kernel headers to commit:
    80cabcec59f3 ("devlink: add operation to take an immediate snapshot")

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 include/uapi/linux/devlink.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 3f82dedda28f..39df8c0535b9 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -117,6 +117,8 @@ enum devlink_command {
 	DEVLINK_CMD_TRAP_GROUP_NEW,
 	DEVLINK_CMD_TRAP_GROUP_DEL,
 
+	DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 2/3] devlink: add support for DEVLINK_CMD_REGION_TAKE_SNAPSHOT
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (15 preceding siblings ...)
  2020-01-30 22:59 ` [RFC PATCH 1/3] Update kernel headers Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 22:59 ` [RFC PATCH 3/3] devlink: stop requiring snapshot for regions Jacob Keller
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

Add support to request an immediate snapshot of a devlink region. This
enables the devlink program to obtain a snapshot of the region if the
driver for the region supports immediate snapshots.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 devlink/devlink.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 73ce98654fd8..ac1ad8aa0769 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -4163,6 +4163,7 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_REGION_SET:
 	case DEVLINK_CMD_REGION_NEW:
 	case DEVLINK_CMD_REGION_DEL:
+	case DEVLINK_CMD_REGION_TAKE_SNAPSHOT:
 		return "region";
 	case DEVLINK_CMD_FLASH_UPDATE:
 	case DEVLINK_CMD_FLASH_UPDATE_END:
@@ -6345,12 +6346,28 @@ static int cmd_region_read(struct dl *dl)
 	return err;
 }
 
+static int cmd_region_take_snapshot(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
+			NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION, 0);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
 static void cmd_region_help(void)
 {
 	pr_err("Usage: devlink region show [ DEV/REGION ]\n");
 	pr_err("       devlink region del DEV/REGION snapshot SNAPSHOT_ID\n");
 	pr_err("       devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]\n");
 	pr_err("       devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length LENGTH\n");
+	pr_err("       devlink region snapshot DEV/REGION\n");
 }
 
 static int cmd_region(struct dl *dl)
@@ -6372,6 +6389,9 @@ static int cmd_region(struct dl *dl)
 	} else if (dl_argv_match(dl, "read")) {
 		dl_arg_inc(dl);
 		return cmd_region_read(dl);
+	} else if (dl_argv_match(dl, "snapshot")) {
+		dl_arg_inc(dl);
+		return cmd_region_take_snapshot(dl);
 	}
 	pr_err("Command \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [RFC PATCH 3/3] devlink: stop requiring snapshot for regions
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (16 preceding siblings ...)
  2020-01-30 22:59 ` [RFC PATCH 2/3] devlink: add support for DEVLINK_CMD_REGION_TAKE_SNAPSHOT Jacob Keller
@ 2020-01-30 22:59 ` Jacob Keller
  2020-01-30 23:03 ` [RFC PATCH 00/13] devlink direct region reading Jacob Keller
  2020-01-31 18:06 ` Jakub Kicinski
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 22:59 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, Jacob Keller

The region dump and region read commands currently require the snapshot
to work. Recent changes to the kernel have enabled optionally
supporting direct read of a region's contents without a snapshot id.

Enable this by allowing the read and dump commands to execute without
a snapshot id. On older kernels, this will return -EINVAL as the kernel
will reject such a command. On newer kernels, this will directly read
the region contents without taking a snapshot. If a region does not
support direct read, it will return -EOPNOTSUPP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 devlink/devlink.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index ac1ad8aa0769..9f9e8ea09bf5 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -6311,8 +6311,8 @@ static int cmd_region_dump(struct dl *dl)
 	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_REGION_READ,
 			       NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP);
 
-	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION |
-				DL_OPT_REGION_SNAPSHOT_ID, 0);
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION,
+				DL_OPT_REGION_SNAPSHOT_ID);
 	if (err)
 		return err;
 
@@ -6333,8 +6333,8 @@ static int cmd_region_read(struct dl *dl)
 			       NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP);
 
 	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION |
-				DL_OPT_REGION_ADDRESS | DL_OPT_REGION_LENGTH |
-				DL_OPT_REGION_SNAPSHOT_ID, 0);
+				DL_OPT_REGION_ADDRESS | DL_OPT_REGION_LENGTH,
+				DL_OPT_REGION_SNAPSHOT_ID);
 	if (err)
 		return err;
 
-- 
2.25.0.rc1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/13] devlink direct region reading
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (17 preceding siblings ...)
  2020-01-30 22:59 ` [RFC PATCH 3/3] devlink: stop requiring snapshot for regions Jacob Keller
@ 2020-01-30 23:03 ` Jacob Keller
  2020-01-31 18:06 ` Jakub Kicinski
  19 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-30 23:03 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang



On 1/30/2020 2:58 PM, Jacob Keller wrote:
> 
> Jacob Keller (12):
>   devlink: prepare to support region operations
>   devlink: add functions to take snapshot while locked
>   devlink: add operation to take an immediate snapshot
>   netdevsim: support taking immediate snapshot via devlink
>   ice: use __le16 types for explicitly Little Endian values
>   ice: create function to read a section of the NVM and Shadow RAM
>   ice: enable initial devlink support for function zero
>   ice: add basic handler for devlink .info_get
>   ice: add board identifier info to devlink .info_get
>   ice: add a devlink region to dump shadow RAM contents
>   devlink: support directly reading from region memory
>   ice: support direct read of the shadow ram region
> 
> Jesse Brandeburg (1):
>   ice: implement full NVM read from ETHTOOL_GEEPROM
> 
>  drivers/net/ethernet/intel/Kconfig            |   1 +
>  drivers/net/ethernet/intel/ice/Makefile       |   1 +
>  drivers/net/ethernet/intel/ice/ice.h          |   7 +
>  .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   3 +
>  drivers/net/ethernet/intel/ice/ice_common.c   |  61 +++
>  drivers/net/ethernet/intel/ice/ice_common.h   |   5 +-
>  drivers/net/ethernet/intel/ice/ice_devlink.c  | 427 ++++++++++++++++++
>  drivers/net/ethernet/intel/ice/ice_devlink.h  |  20 +
>  drivers/net/ethernet/intel/ice/ice_ethtool.c  |  37 +-
>  drivers/net/ethernet/intel/ice/ice_main.c     |  22 +
>  drivers/net/ethernet/intel/ice/ice_nvm.c      | 211 +++------
>  drivers/net/ethernet/intel/ice/ice_nvm.h      |   7 +
>  drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>  drivers/net/ethernet/mellanox/mlx4/crdump.c   |  25 +-
>  drivers/net/netdevsim/dev.c                   |  44 +-
>  include/net/devlink.h                         |  28 +-
>  include/uapi/linux/devlink.h                  |   2 +
>  net/core/devlink.c                            | 246 +++++++---
>  18 files changed, 918 insertions(+), 230 deletions(-)
>  create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
>  create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h
> 

Woops, I forgot to re-create the cover letter after some reworks. All of
the following patches should have also been tagged with 'RFC'. Here's
the correct cover letter summary for this:

> 
> Jacob Keller (14):
>   devlink: prepare to support region operations
>   devlink: add functions to take snapshot while locked
>   devlink: add operation to take an immediate snapshot
>   netdevsim: support taking immediate snapshot via devlink
>   ice: use __le16 types for explicitly Little Endian values
>   ice: create function to read a section of the NVM and Shadow RAM
>   devlink: add devres managed devlinkm_alloc and devlinkm_free
>   ice: enable initial devlink support
>   ice: add basic handler for devlink .info_get
>   ice: add board identifier info to devlink .info_get
>   ice: add a devlink region to dump shadow RAM contents
>   devlink: support directly reading from region memory
>   ice: support direct read of the shadow ram region
>   ice: add ice.rst devlink documentation file
> 
> Jesse Brandeburg (1):
>   ice: implement full NVM read from ETHTOOL_GEEPROM
> 
>  .../networking/devlink/devlink-region.rst     |  17 +-
>  Documentation/networking/devlink/ice.rst      |  76 ++++
>  Documentation/networking/devlink/index.rst    |   1 +
>  drivers/net/ethernet/intel/Kconfig            |   1 +
>  drivers/net/ethernet/intel/ice/Makefile       |   1 +
>  drivers/net/ethernet/intel/ice/ice.h          |   6 +
>  .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   3 +
>  drivers/net/ethernet/intel/ice/ice_common.c   |  66 ----
>  drivers/net/ethernet/intel/ice/ice_common.h   |   6 -
>  drivers/net/ethernet/intel/ice/ice_devlink.c  | 373 ++++++++++++++++++
>  drivers/net/ethernet/intel/ice/ice_devlink.h  |  17 +
>  drivers/net/ethernet/intel/ice/ice_ethtool.c  |  36 +-
>  drivers/net/ethernet/intel/ice/ice_main.c     |  30 +-
>  drivers/net/ethernet/intel/ice/ice_nvm.c      | 340 +++++++++-------
>  drivers/net/ethernet/intel/ice/ice_nvm.h      |  12 +
>  drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>  drivers/net/ethernet/mellanox/mlx4/crdump.c   |  25 +-
>  drivers/net/netdevsim/dev.c                   |  44 ++-
>  include/net/devlink.h                         |  32 +-
>  include/uapi/linux/devlink.h                  |   2 +
>  lib/devres.c                                  |   1 +
>  net/core/devlink.c                            | 300 +++++++++++---
>  .../drivers/net/netdevsim/devlink.sh          |   5 +
>  23 files changed, 1091 insertions(+), 304 deletions(-)
>  create mode 100644 Documentation/networking/devlink/ice.rst
>  create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
>  create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h
> 

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/13] devlink direct region reading
  2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
                   ` (18 preceding siblings ...)
  2020-01-30 23:03 ` [RFC PATCH 00/13] devlink direct region reading Jacob Keller
@ 2020-01-31 18:06 ` Jakub Kicinski
  2020-01-31 18:09   ` Jacob Keller
  19 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:06 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:58:55 -0800, Jacob Keller wrote:
> Finally, the last 3 patches implement a new region in the ice driver for
> accessing the shadow RAM. This region will support the immediate trigger
> operation as well as a new read operation that enables directly reading from
> the shadow RAM without a region.

... without a snapshot?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-01-30 22:58 ` [PATCH 01/15] devlink: prepare to support region operations Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03 11:35   ` Jiri Pirko
  1 sibling, 0 replies; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:58:56 -0800, Jacob Keller wrote:
> Modify the devlink region code in preparation for adding new operations
> on regions.
> 
> Create a devlink_region_ops structure, and move the name pointer from
> within the devlink_region structure into the ops structure (similar to
> the devlink_health_reporter_ops).
> 
> This prepares the regions to enable support of additional operations in
> the future such as requesting snapshots, or accessing the region
> directly without a snapshot.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/15] devlink: add functions to take snapshot while locked
  2020-01-30 22:58 ` [PATCH 02/15] devlink: add functions to take snapshot while locked Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:09     ` Jacob Keller
  2020-02-03 11:39   ` Jiri Pirko
  1 sibling, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:58:57 -0800, Jacob Keller wrote:
> +static int
> +devlink_region_snapshot_create_locked(struct devlink_region *region,
> +				      u8 *data, u32 snapshot_id,
> +				      devlink_snapshot_data_dest_t *destructor)

-1 on the _locked suffix. Please follow the time-honored tradition of
using double underscore for internal helpers which make assumption
about calling context.

> +{
> +	struct devlink_snapshot *snapshot;

lockdep_assert_held() is much better than just a kdoc comment.

> +	/* check if region can hold one more snapshot */
> +	if (region->cur_snapshots == region->max_snapshots)
> +		return -ENOMEM;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink
  2020-01-30 22:58 ` [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:12     ` Jacob Keller
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:58:59 -0800, Jacob Keller wrote:
> Implement the .snapshot region operation for the dummy data region. This
> enables a region snapshot to be taken upon request via the new
> DEVLINK_CMD_REGION_SNAPSHOT command.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  drivers/net/netdevsim/dev.c                   | 38 +++++++++++++++----
>  .../drivers/net/netdevsim/devlink.sh          |  5 +++
>  2 files changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index d521b7bfe007..924cd328037f 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -38,24 +38,47 @@ static struct dentry *nsim_dev_ddir;
>  
>  #define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
>  
> +static int nsim_dev_take_snapshot(struct devlink *devlink,

nit: break the line after static int, you've done it in other patches
    so I trust you agree it's a superior formatting style :)

> +				  struct netlink_ext_ack *extack,
> +				  u8 **data,
> +				  devlink_snapshot_data_dest_t **destructor)
> +{
> +	void *dummy_data;
> +
> +	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
> +	if (!dummy_data) {
> +		NL_SET_ERR_MSG(extack, "Out of memory");

Unnecessary, there will be an OOM splat, and ENOMEM is basically
exactly the same as the message.

> +		return -ENOMEM;
> +	}
> +
> +	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
> +
> +	*data = dummy_data;
> +	*destructor = kfree;

Is there any driver which uses different destructor for different
snapshots? Looks like something we could put in ops, maybe?

> +	return 0;
> +}
> +
>  static ssize_t nsim_dev_take_snapshot_write(struct file *file,
>  					    const char __user *data,
>  					    size_t count, loff_t *ppos)
>  {
>  	struct nsim_dev *nsim_dev = file->private_data;
> -	void *dummy_data;
> +	devlink_snapshot_data_dest_t *destructor;
> +	u8 *dummy_data;
>  	int err;
>  	u32 id;
>  
> -	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
> -	if (!dummy_data)
> -		return -ENOMEM;
> -
> -	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
> +	err = nsim_dev_take_snapshot(priv_to_devlink(nsim_dev), NULL,
> +				     &dummy_data, &destructor);
> +	if (err) {
> +		pr_err("Failed to capture region snapshot\n");

Also not a very useful message for netdevsim IMHO give the caller
clearly requested a snapshot and will get a more informative error 
from errno.

> +		return err;
> +	}
>  
>  	id = devlink_region_snapshot_id_get(priv_to_devlink(nsim_dev));
>  	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
> -					     dummy_data, id, kfree);
> +					     dummy_data, id, destructor);
>  	if (err) {
>  		pr_err("Failed to create region snapshot\n");
>  		kfree(dummy_data);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-01-30 22:59 ` [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:16     ` Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03 11:29   ` Jiri Pirko
  2 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:59:03 -0800, Jacob Keller wrote:
> Add devres managed allocation functions for allocating a devlink
> instance. These can be used by device drivers based on the devres
> framework which want to allocate a devlink instance.
> 
> For simplicity and to reduce churn in the devlink core code, the devres
> management works by creating a node with a double-pointer. The devlink
> instance is allocated using the normal devlink_alloc and released using
> the normal devlink_free.
> 
> An alternative solution where the raw memory for devlink is allocated
> directly via devres_alloc could be done. Such an implementation would
> either significantly increase code duplication or code churn in order to
> refactor the setup from the allocation.
> 
> The new devres managed allocation function will be used by the ice
> driver in a following change to implement initial devlink support.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

How much are you actually going to gain by doing this given you still
have to deal with registering an unregistering all devlink things...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-01-30 22:59 ` [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-02-01  0:51     ` Jacob Keller
  2020-02-03 11:29   ` Jiri Pirko
  2 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:59:03 -0800, Jacob Keller wrote:
> Add devres managed allocation functions for allocating a devlink
> instance. These can be used by device drivers based on the devres
> framework which want to allocate a devlink instance.
> 
> For simplicity and to reduce churn in the devlink core code, the devres
> management works by creating a node with a double-pointer. The devlink
> instance is allocated using the normal devlink_alloc and released using
> the normal devlink_free.
> 
> An alternative solution where the raw memory for devlink is allocated
> directly via devres_alloc could be done. Such an implementation would
> either significantly increase code duplication or code churn in order to
> refactor the setup from the allocation.
> 
> The new devres managed allocation function will be used by the ice
> driver in a following change to implement initial devlink support.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Ugh. The devlink instance sharing/aliasing is something that needs to
be solved at some point. But the problem likely exists elsewhere
already. Do you have global ASIC resources?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/15] ice: add basic handler for devlink .info_get
  2020-01-30 22:59 ` [PATCH 10/15] ice: add basic handler for devlink .info_get Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:25     ` Jacob Keller
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:59:05 -0800, Jacob Keller wrote:
> The devlink .info_get callback allows the driver to report detailed
> version information. The following devlink versions are reported with
> this initial implementation:
> 
>  "driver.version" -> device driver version, to match ethtool -i version
>  "fw" -> firmware version as reported by ethtool -i firmware-version
>  "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
>  "fw.api" -> API version of interface exposed over the AdminQ
>  "fw.build" -> Unique build identifier for the management firmware
>  "nvm.version" -> Version of the NVM parameters section
>  "nvm.oem" -> OEM-specific version information
>  "nvm.eetrack" -> Unique identifier for the combined NVM image

These all need documentation.

> With this, devlink can now report at least the same information as
> reported by the older ethtool interface. Each section of the
> "firmware-version" is also reported independently so that it is easier
> to understand the meaning.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_devlink.c | 103 +++++++++++++++++++
>  1 file changed, 103 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
> index 0b0f936122de..493c2c2986f2 100644
> --- a/drivers/net/ethernet/intel/ice/ice_devlink.c
> +++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
> @@ -2,9 +2,112 @@
>  /* Copyright (c) 2019, Intel Corporation. */
>  
>  #include "ice.h"
> +#include "ice_lib.h"
>  #include "ice_devlink.h"
>  
> +/**
> + * ice_devlink_info_get - .info_get devlink handler
> + * @devlink: devlink instance structure
> + * @req: the devlink info request
> + * @extack: extended netdev ack structure
> + *
> + * Callback for the devlink .info_get operation. Reports information about the
> + * device.
> + *
> + * @returns zero on success or an error code on failure.
> + */
> +static int ice_devlink_info_get(struct devlink *devlink,
> +				struct devlink_info_req *req,
> +				struct netlink_ext_ack *extack)
> +{
> +	u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo;
> +	struct ice_pf *pf = devlink_priv(devlink);
> +	struct ice_hw *hw = &pf->hw;
> +	u16 oem_build;
> +	char buf[32]; /* TODO: size this properly */
> +	int err;
> +
> +	ice_get_nvm_version(hw, &oem_ver, &oem_build, &oem_patch, &nvm_ver_hi,
> +			    &nvm_ver_lo);
> +
> +	err = devlink_info_driver_name_put(req, KBUILD_MODNAME);
> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver name");
> +		return err;
> +	}
> +
> +	/* driver.version */
> +	err = devlink_info_version_fixed_put(req, "driver.version",
> +					     ice_drv_ver);

Hard no. You should really try to follow the discussions on netdev.

> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver version");
> +		return err;
> +	}
> +
> +	/* fw (match exact output of ethtool -i firmware-version) */

That's generally a bad idea, the whole point of info was that people
were stuffing multiple things into ethtool -i fw. Is this only one item
referring to one single entity?

> +	err = devlink_info_version_running_put(req,
> +					       DEVLINK_INFO_VERSION_GENERIC_FW,
> +					       ice_nvm_version_str(hw));
> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set combined fw version");
> +		return err;
> +	}
> +
> +	/* fw.mgmt (DEVLINK_INFO_VERSION_GENERIC_FW_MGMT) */
> +	snprintf(buf, sizeof(buf), "%u.%u.%u", hw->fw_maj_ver, hw->fw_min_ver,
> +		 hw->fw_patch);
> +	err = devlink_info_version_running_put(req, "fw.mgmt", buf);

why not use the define?

> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw version data");
> +		return err;
> +	}
> +
> +	/* fw.api */
> +	snprintf(buf, sizeof(buf), "%u.%u", hw->api_maj_ver,
> +		 hw->api_min_ver);
> +	err = devlink_info_version_running_put(req, "fw.api", buf);

Is this the API version of the management FW? I'd go for "fw.mgmt.api".
Datapath, RoCE and other bits may have APIs which evolve independently
for complex chips.

> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw API data");
> +		return err;
> +	}
> +
> +	/* fw.build */
> +	snprintf(buf, sizeof(buf), "%08x", hw->fw_build);
> +	err = devlink_info_version_running_put(req, "fw.build", buf);

Huh? Why is this not part of the version?

Maybe you want to use fw.bundle? Naming is hard, at Netronome added
that as a unique identifier for the FW in its entirety / the entire
build as it is passed from Eng to QA and released externally.

> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw build data");
> +		return err;
> +	}
> +
> +	/* nvm.version */
> +	snprintf(buf, sizeof(buf), "%x.%02x", nvm_ver_hi, nvm_ver_lo);
> +	err = devlink_info_version_running_put(req, "nvm.version", buf);

Please us the psid 

> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM version data");
> +		return err;
> +	}
> +
> +	/* nvm.oem */
> +	snprintf(buf, sizeof(buf), "%u.%u.%u", oem_ver, oem_build, oem_patch);
> +	err = devlink_info_version_running_put(req, "nvm.oem", buf);

This looks like free form catch all. Let's not.

> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set oem version data");
> +		return err;
> +	}
> +
> +	/* nvm.eetrack */
> +	snprintf(buf, sizeof(buf), "0x%0X", hw->nvm.eetrack);

Mm. maybe this is bundle? Or psid. Hm. Please explain what this is and
what it's supposed to be used for. I should probably add more docs to
the existing items :S

> +	err = devlink_info_version_running_put(req, "nvm.eetrack", buf);
> +	if (err) {
> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM eetrack data");
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
>  const struct devlink_ops ice_devlink_ops = {
> +	.info_get = ice_devlink_info_get,
>  };
>  
>  /**


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/15] ice: add board identifier info to devlink .info_get
  2020-01-30 22:59 ` [PATCH 11/15] ice: add board identifier info to " Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:26     ` Jacob Keller
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:59:06 -0800, Jacob Keller wrote:
> Export a unique board identifier using "board.id" for devlink's
> .info_get command.
> 
> Obtain this by reading the NVM for the PBA identification string.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

In general for the devlink info it'd be really useful to have example
outputs to see what the values actually are.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-01-30 22:58 ` [PATCH 03/15] devlink: add operation to take an immediate snapshot Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03  8:19   ` Yunsheng Lin
  2020-02-03 11:50   ` Jiri Pirko
  2 siblings, 0 replies; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:58:58 -0800, Jacob Keller wrote:
> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
> command is intended to enable userspace to request an immediate snapshot
> of a region.
> 
> Regions can enable support for requestable snapshots by implementing the
> snapshot callback function in the region's devlink_region_ops structure.
> 
> Implementations of this function callback should capture an immediate
> copy of the data and return it and its destructor in the function
> parameters. The core devlink code will generate a snapshot ID and create
> the new snapshot while holding the devlink instance lock.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/15] ice: add ice.rst devlink documentation file
  2020-01-30 22:59 ` [PATCH 15/15] ice: add ice.rst devlink documentation file Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:28     ` Jacob Keller
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:59:10 -0800, Jacob Keller wrote:
> Now that the ice driver has gained some devlink support, add
> a driver-specific documentation file for it.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

This should have been when info items are added.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/15] devlink: support directly reading from region memory
  2020-01-30 22:59 ` [PATCH 13/15] devlink: support directly reading from region memory Jacob Keller
@ 2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:27     ` Jacob Keller
  2020-01-31 19:15     ` Jacob Keller
  2020-02-03 13:44   ` Jiri Pirko
  1 sibling, 2 replies; 62+ messages in thread
From: Jakub Kicinski @ 2020-01-31 18:07 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Thu, 30 Jan 2020 14:59:08 -0800, Jacob Keller wrote:
> Add a new region operation for directly reading from a region, without
> taking a full snapshot.
> 
> Extend the DEVLINK_CMD_REGION_READ to allow directly reading from
> a region, if supported. Instead of reporting a missing snapshot id as
> invalid, check to see if direct reading is implemented for the region.
> If so, use the direct read operation to grab the current contents of the
> region.
> 
> This new behavior of DEVLINK_CMD_REGION_READ should be backwards
> compatible. Previously, all kernels rejected such
> a DEVLINK_CMD_REGION_READ with -EINVAL, and will now either accept the
> call or report -EOPNOTSUPP for regions which do not implement direct
> access.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

> +static int devlink_nl_region_read_direct_fill(struct sk_buff *skb,
> +					      struct devlink *devlink,
> +					      struct devlink_region *region,
> +					      struct nlattr **attrs,
> +					      u64 start_offset,
> +					      u64 end_offset,
> +					      bool dump,
> +					      u64 *new_offset)
> +{
> +	u64 curr_offset = start_offset;
> +	int err = 0;
> +	u8 *data;
> +
> +	/* Allocate and re-use a single buffer */
> +	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	*new_offset = start_offset;
> +
> +	if (end_offset > region->size || dump)
> +		end_offset = region->size;
> +
> +	while (curr_offset < end_offset) {
> +		u32 data_size;
> +
> +		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
> +			data_size = end_offset - curr_offset;
> +		else
> +			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;

Also known as min() ?

> +		err = region->ops->read(devlink, curr_offset, data_size, data);
> +		if (err)
> +			break;
> +
> +		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
> +							    data, data_size,
> +							    curr_offset);
> +		if (err)
> +			break;
> +
> +		curr_offset += data_size;
> +	}
> +	*new_offset = curr_offset;
> +
> +	kfree(data);
> +
> +	return err;
> +}
> +
>  static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
>  					     struct netlink_callback *cb)
>  {

> +	/* Region may not support direct read access */
> +	if (direct && !region->ops->read) {

for missing trigger you added an extack, perhaps makes sense here, too?

> +		err = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
>  	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
>  			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
>  			  DEVLINK_CMD_REGION_READ);

Generally all the devlink parts look quite reasonable to me 👍

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 00/13] devlink direct region reading
  2020-01-31 18:06 ` Jakub Kicinski
@ 2020-01-31 18:09   ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:09 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:06 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:58:55 -0800, Jacob Keller wrote:
>> Finally, the last 3 patches implement a new region in the ice driver for
>> accessing the shadow RAM. This region will support the immediate trigger
>> operation as well as a new read operation that enables directly reading from
>> the shadow RAM without a region.
> 
> ... without a snapshot?
> 

Woops. Yes that's the correct word I meant here.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/15] devlink: add functions to take snapshot while locked
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:09     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:09 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:58:57 -0800, Jacob Keller wrote:
>> +static int
>> +devlink_region_snapshot_create_locked(struct devlink_region *region,
>> +				      u8 *data, u32 snapshot_id,
>> +				      devlink_snapshot_data_dest_t *destructor)
> 
> -1 on the _locked suffix. Please follow the time-honored tradition of
> using double underscore for internal helpers which make assumption
> about calling context.
> 

Sure.

>> +{
>> +	struct devlink_snapshot *snapshot;
> 
> lockdep_assert_held() is much better than just a kdoc comment.
> 

Ok.

>> +	/* check if region can hold one more snapshot */
>> +	if (region->cur_snapshots == region->max_snapshots)
>> +		return -ENOMEM;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:12     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:12 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:58:59 -0800, Jacob Keller wrote:
>> Implement the .snapshot region operation for the dummy data region. This
>> enables a region snapshot to be taken upon request via the new
>> DEVLINK_CMD_REGION_SNAPSHOT command.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>>  drivers/net/netdevsim/dev.c                   | 38 +++++++++++++++----
>>  .../drivers/net/netdevsim/devlink.sh          |  5 +++
>>  2 files changed, 36 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
>> index d521b7bfe007..924cd328037f 100644
>> --- a/drivers/net/netdevsim/dev.c
>> +++ b/drivers/net/netdevsim/dev.c
>> @@ -38,24 +38,47 @@ static struct dentry *nsim_dev_ddir;
>>  
>>  #define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
>>  
>> +static int nsim_dev_take_snapshot(struct devlink *devlink,
> 
> nit: break the line after static int, you've done it in other patches
>     so I trust you agree it's a superior formatting style :)
> 

Sure.

>> +				  struct netlink_ext_ack *extack,
>> +				  u8 **data,
>> +				  devlink_snapshot_data_dest_t **destructor)
>> +{
>> +	void *dummy_data;
>> +
>> +	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
>> +	if (!dummy_data) {
>> +		NL_SET_ERR_MSG(extack, "Out of memory");
> 
> Unnecessary, there will be an OOM splat, and ENOMEM is basically
> exactly the same as the message.
> 
>> +		return -ENOMEM;
>> +	}
>> +
>> +	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
>> +
>> +	*data = dummy_data;
>> +	*destructor = kfree;
> 
> Is there any driver which uses different destructor for different
> snapshots? Looks like something we could put in ops, maybe?
> 

Hmm. Yea I think you're right making this tied to the ops structure
instead of the callback makes sense to me.

>> +	return 0;
>> +}
>> +
>>  static ssize_t nsim_dev_take_snapshot_write(struct file *file,
>>  					    const char __user *data,
>>  					    size_t count, loff_t *ppos)
>>  {
>>  	struct nsim_dev *nsim_dev = file->private_data;
>> -	void *dummy_data;
>> +	devlink_snapshot_data_dest_t *destructor;
>> +	u8 *dummy_data;
>>  	int err;
>>  	u32 id;
>>  
>> -	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
>> -	if (!dummy_data)
>> -		return -ENOMEM;
>> -
>> -	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
>> +	err = nsim_dev_take_snapshot(priv_to_devlink(nsim_dev), NULL,
>> +				     &dummy_data, &destructor);
>> +	if (err) {
>> +		pr_err("Failed to capture region snapshot\n");
> 
> Also not a very useful message for netdevsim IMHO give the caller
> clearly requested a snapshot and will get a more informative error 
> from errno.
> 

Will remove.

>> +		return err;
>> +	}
>>  
>>  	id = devlink_region_snapshot_id_get(priv_to_devlink(nsim_dev));
>>  	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
>> -					     dummy_data, id, kfree);
>> +					     dummy_data, id, destructor);
>>  	if (err) {
>>  		pr_err("Failed to create region snapshot\n");
>>  		kfree(dummy_data);

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:16     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:16 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:03 -0800, Jacob Keller wrote:
>> Add devres managed allocation functions for allocating a devlink
>> instance. These can be used by device drivers based on the devres
>> framework which want to allocate a devlink instance.
>>
>> For simplicity and to reduce churn in the devlink core code, the devres
>> management works by creating a node with a double-pointer. The devlink
>> instance is allocated using the normal devlink_alloc and released using
>> the normal devlink_free.
>>
>> An alternative solution where the raw memory for devlink is allocated
>> directly via devres_alloc could be done. Such an implementation would
>> either significantly increase code duplication or code churn in order to
>> refactor the setup from the allocation.
>>
>> The new devres managed allocation function will be used by the ice
>> driver in a following change to implement initial devlink support.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> How much are you actually going to gain by doing this given you still
> have to deal with registering an unregistering all devlink things...
> 

So, the main problem is that the ice driver's private PF structure is
allocated using devm_alloc.. and if we switch to using devlink_alloc
then that memory is no longer managed by devres and it becomes more
difficult to manage...

That being said, after looking at exactly how the ice driver uses devres
and managing things.. Probably not much gain in general. The ice driver
could implement its own call for this if we still need it.

In theory the register/unregister of devlink resources could be managed
by using devres_add_action to handle the teardown.. but that's a fairly
major refactor to get everything working.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/15] ice: add basic handler for devlink .info_get
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:25     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:25 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:05 -0800, Jacob Keller wrote:
>> The devlink .info_get callback allows the driver to report detailed
>> version information. The following devlink versions are reported with
>> this initial implementation:
>>
>>  "driver.version" -> device driver version, to match ethtool -i version
>>  "fw" -> firmware version as reported by ethtool -i firmware-version
>>  "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
>>  "fw.api" -> API version of interface exposed over the AdminQ
>>  "fw.build" -> Unique build identifier for the management firmware
>>  "nvm.version" -> Version of the NVM parameters section
>>  "nvm.oem" -> OEM-specific version information
>>  "nvm.eetrack" -> Unique identifier for the combined NVM image
> 
> These all need documentation.
> 

There's a patch at the end of the series, but it should probably just be
squashed in here.

>> With this, devlink can now report at least the same information as
>> reported by the older ethtool interface. Each section of the
>> "firmware-version" is also reported independently so that it is easier
>> to understand the meaning.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>>  drivers/net/ethernet/intel/ice/ice_devlink.c | 103 +++++++++++++++++++
>>  1 file changed, 103 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
>> index 0b0f936122de..493c2c2986f2 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_devlink.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
>> @@ -2,9 +2,112 @@
>>  /* Copyright (c) 2019, Intel Corporation. */
>>  
>>  #include "ice.h"
>> +#include "ice_lib.h"
>>  #include "ice_devlink.h"
>>  
>> +/**
>> + * ice_devlink_info_get - .info_get devlink handler
>> + * @devlink: devlink instance structure
>> + * @req: the devlink info request
>> + * @extack: extended netdev ack structure
>> + *
>> + * Callback for the devlink .info_get operation. Reports information about the
>> + * device.
>> + *
>> + * @returns zero on success or an error code on failure.
>> + */
>> +static int ice_devlink_info_get(struct devlink *devlink,
>> +				struct devlink_info_req *req,
>> +				struct netlink_ext_ack *extack)
>> +{
>> +	u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo;
>> +	struct ice_pf *pf = devlink_priv(devlink);
>> +	struct ice_hw *hw = &pf->hw;
>> +	u16 oem_build;
>> +	char buf[32]; /* TODO: size this properly */
>> +	int err;
>> +
>> +	ice_get_nvm_version(hw, &oem_ver, &oem_build, &oem_patch, &nvm_ver_hi,
>> +			    &nvm_ver_lo);
>> +
>> +	err = devlink_info_driver_name_put(req, KBUILD_MODNAME);
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver name");
>> +		return err;
>> +	}
>> +
>> +	/* driver.version */
>> +	err = devlink_info_version_fixed_put(req, "driver.version",
>> +					     ice_drv_ver);
> 
> Hard no. You should really try to follow the discussions on netdev.
> 
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver version");
>> +		return err;
>> +	}
>> +
>> +	/* fw (match exact output of ethtool -i firmware-version) */
> 
> That's generally a bad idea, the whole point of info was that people
> were stuffing multiple things into ethtool -i fw. Is this only one item
> referring to one single entity?

Right. I can just remove this entirely.

> 
>> +	err = devlink_info_version_running_put(req,
>> +					       DEVLINK_INFO_VERSION_GENERIC_FW,
>> +					       ice_nvm_version_str(hw));
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set combined fw version");
>> +		return err;
>> +	}
>> +
>> +	/* fw.mgmt (DEVLINK_INFO_VERSION_GENERIC_FW_MGMT) */
>> +	snprintf(buf, sizeof(buf), "%u.%u.%u", hw->fw_maj_ver, hw->fw_min_ver,
>> +		 hw->fw_patch);
>> +	err = devlink_info_version_running_put(req, "fw.mgmt", buf);
> 
> why not use the define?
> 
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw version data");
>> +		return err;
>> +	}
>> +
>> +	/* fw.api */
>> +	snprintf(buf, sizeof(buf), "%u.%u", hw->api_maj_ver,
>> +		 hw->api_min_ver);
>> +	err = devlink_info_version_running_put(req, "fw.api", buf);
> 
> Is this the API version of the management FW? I'd go for "fw.mgmt.api".
> Datapath, RoCE and other bits may have APIs which evolve independently
> for complex chips.
> 

I'm not 100% sure, but probably. Will check and update.

>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw API data");
>> +		return err;
>> +	}
>> +
>> +	/* fw.build */
>> +	snprintf(buf, sizeof(buf), "%08x", hw->fw_build);
>> +	err = devlink_info_version_running_put(req, "fw.build", buf);
> 
> Huh? Why is this not part of the version?
> 
> Maybe you want to use fw.bundle? Naming is hard, at Netronome added
> that as a unique identifier for the FW in its entirety / the entire
> build as it is passed from Eng to QA and released externally.
> 

fw.bundle is probably more appropriate, yes.

>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw build data");
>> +		return err;
>> +	}
>> +
>> +	/* nvm.version */
>> +	snprintf(buf, sizeof(buf), "%x.%02x", nvm_ver_hi, nvm_ver_lo);
>> +	err = devlink_info_version_running_put(req, "nvm.version", buf);
> 
> Please us the psid

Ok.

> 
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM version data");
>> +		return err;
>> +	}
>> +
>> +	/* nvm.oem */
>> +	snprintf(buf, sizeof(buf), "%u.%u.%u", oem_ver, oem_build, oem_patch);
>> +	err = devlink_info_version_running_put(req, "nvm.oem", buf);
> 
> This looks like free form catch all. Let's not.

I'm not actually sure what these represent either. Will try to figure
that out and update.

> 
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set oem version data");
>> +		return err;
>> +	}
>> +
>> +	/* nvm.eetrack */
>> +	snprintf(buf, sizeof(buf), "0x%0X", hw->nvm.eetrack);
> 
> Mm. maybe this is bundle? Or psid. Hm. Please explain what this is and
> what it's supposed to be used for. I should probably add more docs to
> the existing items :S

It's probably closer to a bundle or psid. Will fix.

> 
>> +	err = devlink_info_version_running_put(req, "nvm.eetrack", buf);
>> +	if (err) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM eetrack data");
>> +		return err;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  const struct devlink_ops ice_devlink_ops = {
>> +	.info_get = ice_devlink_info_get,
>>  };
>>  
>>  /**
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/15] ice: add board identifier info to devlink .info_get
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:26     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:26 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:06 -0800, Jacob Keller wrote:
>> Export a unique board identifier using "board.id" for devlink's
>> .info_get command.
>>
>> Obtain this by reading the NVM for the PBA identification string.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> In general for the devlink info it'd be really useful to have example
> outputs to see what the values actually are.
> 

Makes sense, will add some.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/15] devlink: support directly reading from region memory
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:27     ` Jacob Keller
  2020-01-31 19:15     ` Jacob Keller
  1 sibling, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:08 -0800, Jacob Keller wrote:
>> Add a new region operation for directly reading from a region, without
>> taking a full snapshot.
>>
>> Extend the DEVLINK_CMD_REGION_READ to allow directly reading from
>> a region, if supported. Instead of reporting a missing snapshot id as
>> invalid, check to see if direct reading is implemented for the region.
>> If so, use the direct read operation to grab the current contents of the
>> region.
>>
>> This new behavior of DEVLINK_CMD_REGION_READ should be backwards
>> compatible. Previously, all kernels rejected such
>> a DEVLINK_CMD_REGION_READ with -EINVAL, and will now either accept the
>> call or report -EOPNOTSUPP for regions which do not implement direct
>> access.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> 
>> +static int devlink_nl_region_read_direct_fill(struct sk_buff *skb,
>> +					      struct devlink *devlink,
>> +					      struct devlink_region *region,
>> +					      struct nlattr **attrs,
>> +					      u64 start_offset,
>> +					      u64 end_offset,
>> +					      bool dump,
>> +					      u64 *new_offset)
>> +{
>> +	u64 curr_offset = start_offset;
>> +	int err = 0;
>> +	u8 *data;
>> +
>> +	/* Allocate and re-use a single buffer */
>> +	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	*new_offset = start_offset;
>> +
>> +	if (end_offset > region->size || dump)
>> +		end_offset = region->size;
>> +
>> +	while (curr_offset < end_offset) {
>> +		u32 data_size;
>> +
>> +		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
>> +			data_size = end_offset - curr_offset;
>> +		else
>> +			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
> 
> Also known as min() ?
> 

Heh. Yep.

>> +		err = region->ops->read(devlink, curr_offset, data_size, data);
>> +		if (err)
>> +			break;
>> +
>> +		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
>> +							    data, data_size,
>> +							    curr_offset);
>> +		if (err)
>> +			break;
>> +
>> +		curr_offset += data_size;
>> +	}
>> +	*new_offset = curr_offset;
>> +
>> +	kfree(data);
>> +
>> +	return err;
>> +}
>> +
>>  static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
>>  					     struct netlink_callback *cb)
>>  {
> 
>> +	/* Region may not support direct read access */
>> +	if (direct && !region->ops->read) {
> 
> for missing trigger you added an extack, perhaps makes sense here, too?
> 

Sure.

>> +		err = -EOPNOTSUPP;
>> +		goto out_unlock;
>> +	}
>> +
>>  	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
>>  			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
>>  			  DEVLINK_CMD_REGION_READ);
> 
> Generally all the devlink parts look quite reasonable to me 👍
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/15] ice: add ice.rst devlink documentation file
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-01-31 18:28     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 18:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:10 -0800, Jacob Keller wrote:
>> Now that the ice driver has gained some devlink support, add
>> a driver-specific documentation file for it.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> This should have been when info items are added.
> 

Yep.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/15] devlink: support directly reading from region memory
  2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:27     ` Jacob Keller
@ 2020-01-31 19:15     ` Jacob Keller
  1 sibling, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-01-31 19:15 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:08 -0800, Jacob Keller wrote:
>> +static int devlink_nl_region_read_direct_fill(struct sk_buff *skb,
>> +					      struct devlink *devlink,
>> +					      struct devlink_region *region,
>> +					      struct nlattr **attrs,
>> +					      u64 start_offset,
>> +					      u64 end_offset,
>> +					      bool dump,
>> +					      u64 *new_offset)
>> +{
>> +	u64 curr_offset = start_offset;
>> +	int err = 0;
>> +	u8 *data;
>> +
>> +	/* Allocate and re-use a single buffer */
>> +	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	*new_offset = start_offset;
>> +
>> +	if (end_offset > region->size || dump)
>> +		end_offset = region->size;
>> +
>> +	while (curr_offset < end_offset) {
>> +		u32 data_size;
>> +
>> +		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
>> +			data_size = end_offset - curr_offset;
>> +		else
>> +			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
> 
> Also known as min() ?

This was more or less adapted from the read_snapshot_fill function. I
could probably make more of an effort to share this logic.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-02-01  0:51     ` Jacob Keller
  2020-02-01 17:43       ` Jakub Kicinski
  0 siblings, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-02-01  0:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 1/31/2020 10:07 AM, Jakub Kicinski wrote:
> On Thu, 30 Jan 2020 14:59:03 -0800, Jacob Keller wrote:
> Ugh. The devlink instance sharing/aliasing is something that needs to
> be solved at some point. But the problem likely exists elsewhere
> already. Do you have global ASIC resources?
> 

We do have some global resources on the device. This is somewhat managed
by firmware, but not everything is managed, and there's often little to
no visibility even at the driver layer let alone a system administrator.

Things like flash_update and flash regions also only make sense device
wide, and it's a little silly to expose them for multiple functions.

Unfortunately, we *do* have separate PCIe functions, and I wasn't able
to come up with a good (safe) method for managing something like this
that crosses the function boundary.

I noticed that the mlx4 and mlx5 drivers appear to create one devlink
per PF as well, but the mlxsw driver appears to have a single function
for the whole device.

I'm not sure about other drivers, but I do think this is a common
problem, and would benefit from some core code.

One possibility that I've tossed around in my head is some sort of
subordinate PCI bus driver that loads once for the device before the
function drivers load... But I really am not sure where to begin with
that. This also represents a fairly significant device driver model change.

Beyond just having the devlink is some capability to add per-port
configuration as well.

TL;DR; Yes, I'd like to have a single devlink for the device, but no, I
don't have a good answer for how to do it sanely.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-02-01  0:51     ` Jacob Keller
@ 2020-02-01 17:43       ` Jakub Kicinski
  2020-02-03 16:35         ` Jacob Keller
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Kicinski @ 2020-02-01 17:43 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Fri, 31 Jan 2020 16:51:10 -0800, Jacob Keller wrote:
> TL;DR; Yes, I'd like to have a single devlink for the device, but no, I
> don't have a good answer for how to do it sanely.

Ack, it not a new problem and I don't have a solution either :(

I don't think mlx5 has this distinction of only single/first PF being
able to perform device-wide updates so perhaps it's better to not
introduce that notion? 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-01-30 22:58 ` [PATCH 03/15] devlink: add operation to take an immediate snapshot Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-02-03  8:19   ` Yunsheng Lin
  2020-02-03 11:50     ` Jiri Pirko
  2020-02-03 11:50   ` Jiri Pirko
  2 siblings, 1 reply; 62+ messages in thread
From: Yunsheng Lin @ 2020-02-03  8:19 UTC (permalink / raw)
  To: Jacob Keller, netdev; +Cc: jiri, valex, lihong.yang

On 2020/1/31 6:58, Jacob Keller wrote:
> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
> command is intended to enable userspace to request an immediate snapshot
> of a region.
> 
> Regions can enable support for requestable snapshots by implementing the
> snapshot callback function in the region's devlink_region_ops structure.
> 
> Implementations of this function callback should capture an immediate
> copy of the data and return it and its destructor in the function
> parameters. The core devlink code will generate a snapshot ID and create
> the new snapshot while holding the devlink instance lock.

Does we need a new devlink command to clear the snapshot created by
DEVLINK_CMD_REGION_TAKE_SNAPSHOT?

It seems the snapshot of a region is only destroyed when unloading
the driver.

> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  .../networking/devlink/devlink-region.rst     |  9 +++-
>  include/net/devlink.h                         |  7 +++
>  include/uapi/linux/devlink.h                  |  2 +
>  net/core/devlink.c                            | 46 +++++++++++++++++++
>  4 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
> index 1a7683e7acb2..262249e6c3fc 100644
> --- a/Documentation/networking/devlink/devlink-region.rst
> +++ b/Documentation/networking/devlink/devlink-region.rst
> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>  Regions may also be used to provide an additional way to debug complex error
>  states, but see also :doc:`devlink-health`
>  
> +Regions may optionally support capturing a snapshot on demand via the
> +``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
> +allow requested snapshots must implement the ``.snapshot`` callback for the
> +region in its ``devlink_region_ops`` structure.
> +
>  example usage
>  -------------
>  
> @@ -40,8 +45,8 @@ example usage
>      # Delete a snapshot using:
>      $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>  
> -    # Trigger (request) a snapshot be taken:
> -    $ devlink region trigger pci/0000:00:05.0/cr-space
> +    # Request an immediate snapshot, if supported by the region
> +    $ devlink region snapshot pci/0000:00:05.0/cr-space
>  
>      # Dump a snapshot:
>      $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 4a0baa6903cb..63e954241404 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -498,9 +498,16 @@ typedef void devlink_snapshot_data_dest_t(const void *data);
>  /**
>   * struct devlink_region_ops - Region operations
>   * @name: region name
> + * @snapshot: callback to request an immediate snapshot. On success,
> + *            the data and destructor variables must be updated. The function
> + *            will be called while the devlink instance lock is held.
>   */
>  struct devlink_region_ops {
>  	const char *name;
> +	int (*snapshot)(struct devlink *devlink,
> +			struct netlink_ext_ack *extack,
> +			u8 **data,
> +			devlink_snapshot_data_dest_t **destructor);
>  };
>  
>  struct devlink_fmsg;
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index ae37fd4d194a..46643c4320b9 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -117,6 +117,8 @@ enum devlink_command {
>  	DEVLINK_CMD_TRAP_GROUP_NEW,
>  	DEVLINK_CMD_TRAP_GROUP_DEL,
>  
> +	DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
> +
>  	/* add new commands above here */
>  	__DEVLINK_CMD_MAX,
>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index faf4f4c5c539..574008c536fa 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -4109,6 +4109,45 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
>  	return err;
>  }
>  
> +static int devlink_nl_cmd_region_take_snapshot(struct sk_buff *skb,
> +					       struct genl_info *info)
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +	devlink_snapshot_data_dest_t *destructor;
> +	struct devlink_region *region;
> +	const char *region_name;
> +	u32 snapshot_id;
> +	u8 *data;
> +	int err;
> +
> +	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
> +		return -EINVAL;
> +	}
> +
> +	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
> +	region = devlink_region_get_by_name(devlink, region_name);
> +	if (!region) {
> +		NL_SET_ERR_MSG_MOD(info->extack,
> +				   "The requested region does not exist");
> +		return -EINVAL;
> +	}
> +
> +	if (!region->ops->snapshot) {
> +		NL_SET_ERR_MSG_MOD(info->extack,
> +				   "The requested region does not support taking an immediate snapshot");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	err = region->ops->snapshot(devlink, info->extack, &data, &destructor);
> +	if (err)
> +		return err;
> +
> +	snapshot_id = devlink_region_snapshot_id_get_locked(devlink);
> +	return devlink_region_snapshot_create_locked(region, data, snapshot_id,
> +						     destructor);
> +}
> +
>  struct devlink_info_req {
>  	struct sk_buff *msg;
>  };
> @@ -6249,6 +6288,13 @@ static const struct genl_ops devlink_nl_ops[] = {
>  		.flags = GENL_ADMIN_PERM,
>  		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>  	},
> +	{
> +		.cmd = DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> +		.doit = devlink_nl_cmd_region_take_snapshot,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
> +	},
>  	{
>  		.cmd = DEVLINK_CMD_INFO_GET,
>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-01-30 22:59 ` [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-02-03 11:29   ` Jiri Pirko
  2020-02-03 16:56     ` Jacob Keller
  2 siblings, 1 reply; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 11:29 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang

Thu, Jan 30, 2020 at 11:59:03PM CET, jacob.e.keller@intel.com wrote:
>Add devres managed allocation functions for allocating a devlink
>instance. These can be used by device drivers based on the devres
>framework which want to allocate a devlink instance.
>
>For simplicity and to reduce churn in the devlink core code, the devres
>management works by creating a node with a double-pointer. The devlink
>instance is allocated using the normal devlink_alloc and released using
>the normal devlink_free.
>
>An alternative solution where the raw memory for devlink is allocated
>directly via devres_alloc could be done. Such an implementation would
>either significantly increase code duplication or code churn in order to
>refactor the setup from the allocation.
>
>The new devres managed allocation function will be used by the ice
>driver in a following change to implement initial devlink support.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>---
> include/net/devlink.h |  4 ++++
> lib/devres.c          |  1 +
> net/core/devlink.c    | 54 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 59 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 63e954241404..1c3540280396 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -858,11 +858,15 @@ struct ib_device;
> struct net *devlink_net(const struct devlink *devlink);
> void devlink_net_set(struct devlink *devlink, struct net *net);
> struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
>+struct devlink *devlinkm_alloc(struct device * dev,
>+			       const struct devlink_ops *ops,
>+			       size_t priv_size);
> int devlink_register(struct devlink *devlink, struct device *dev);
> void devlink_unregister(struct devlink *devlink);
> void devlink_reload_enable(struct devlink *devlink);
> void devlink_reload_disable(struct devlink *devlink);
> void devlink_free(struct devlink *devlink);
>+void devlinkm_free(struct device *dev, struct devlink *devlink);
> int devlink_port_register(struct devlink *devlink,
> 			  struct devlink_port *devlink_port,
> 			  unsigned int port_index);
>diff --git a/lib/devres.c b/lib/devres.c
>index 6ef51f159c54..239c81d40612 100644
>--- a/lib/devres.c
>+++ b/lib/devres.c
>@@ -5,6 +5,7 @@
> #include <linux/gfp.h>
> #include <linux/export.h>
> #include <linux/of_address.h>
>+#include <net/devlink.h>
> 
> enum devm_ioremap_type {
> 	DEVM_IOREMAP = 0,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 574008c536fa..b2b855d12a11 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -6531,6 +6531,60 @@ void devlink_free(struct devlink *devlink)
> }
> EXPORT_SYMBOL_GPL(devlink_free);
> 
>+static void devres_devlink_release(struct device *dev, void *res)
>+{
>+	devlink_free(*(struct devlink **)res);
>+}
>+
>+static int devres_devlink_match(struct device *dev, void *res, void *data)
>+{
>+	return *(struct devlink **)res == data;
>+}
>+
>+/**
>+ * devlinkm_alloc - Allocate devlink instance managed by devres
>+ * @dev: device to allocate devlink for
>+ * @ops: devlink ops structure
>+ * @priv_size: size of private data portion
>+ *
>+ * Allocate a devlink instance and manage its release via devres.
>+ */
>+struct devlink *devlinkm_alloc(struct device *dev,

Why "devlinkm"? Looks like the usual prefix for this is "devm_"
So "devm_devlink_alloc/free"?


>+			       const struct devlink_ops *ops,
>+			       size_t priv_size)
>+{
>+	struct devlink **ptr, *devlink = NULL;
>+
>+	ptr = devres_alloc(devres_devlink_release, sizeof(*ptr), GFP_KERNEL);
>+	if (!ptr)
>+		return NULL;
>+
>+	devlink = devlink_alloc(ops, priv_size);
>+	if (devlink) {
>+		*ptr = devlink;
>+		devres_add(dev, ptr);
>+	} else {
>+		devres_free(ptr);
>+	}
>+
>+	return devlink;
>+}
>+EXPORT_SYMBOL_GPL(devlinkm_alloc);
>+
>+/**
>+ * devlinkm_free - Free devlink instance managed by devres
>+ * @dev: device to remove the allocated devlink from
>+ * @devlink: devlink instance to free
>+ *
>+ * Find and remove the devres node associated with the given devlink.
>+ */
>+void devlinkm_free(struct device *dev, struct devlink *devlink)
>+{
>+	WARN_ON(devres_release(dev, devres_devlink_release,
>+			       devres_devlink_match, devlink));
>+}
>+EXPORT_SYMBOL_GPL(devlinkm_free);
>+
> static void devlink_port_type_warn(struct work_struct *work)
> {
> 	WARN(true, "Type was not set for devlink port.");
>-- 
>2.25.0.rc1
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-01-30 22:58 ` [PATCH 01/15] devlink: prepare to support region operations Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-02-03 11:35   ` Jiri Pirko
  2020-02-03 16:48     ` Jacob Keller
                       ` (3 more replies)
  1 sibling, 4 replies; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 11:35 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang

Thu, Jan 30, 2020 at 11:58:56PM CET, jacob.e.keller@intel.com wrote:
>Modify the devlink region code in preparation for adding new operations
>on regions.
>
>Create a devlink_region_ops structure, and move the name pointer from
>within the devlink_region structure into the ops structure (similar to
>the devlink_health_reporter_ops).
>
>This prepares the regions to enable support of additional operations in
>the future such as requesting snapshots, or accessing the region
>directly without a snapshot.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>---
> drivers/net/ethernet/mellanox/mlx4/crdump.c | 25 ++++++++++++---------
> drivers/net/netdevsim/dev.c                 |  6 ++++-
> include/net/devlink.h                       | 17 ++++++++++----
> net/core/devlink.c                          | 23 ++++++++++---------
> 4 files changed, 45 insertions(+), 26 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
>index 64ed725aec28..4cea64033919 100644
>--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
>+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
>@@ -38,8 +38,13 @@
> #define CR_ENABLE_BIT_OFFSET		0xF3F04
> #define MAX_NUM_OF_DUMPS_TO_STORE	(8)
> 
>-static const char *region_cr_space_str = "cr-space";
>-static const char *region_fw_health_str = "fw-health";

Just leave these as are and use in ops and messages. It is odd to use
ops.name in the message.


>+static const struct devlink_region_ops region_cr_space_ops = {
>+	.name = "cr-space",
>+};
>+
>+static const struct devlink_region_ops region_fw_health_ops = {
>+	.name = "fw-health",
>+};
> 
> /* Set to true in case cr enable bit was set to true before crdump */
> static bool crdump_enbale_bit_set;
>@@ -103,10 +108,10 @@ static void mlx4_crdump_collect_crspace(struct mlx4_dev *dev,
> 		if (err) {
> 			kvfree(crspace_data);
> 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
>-				  region_cr_space_str, id, err);
>+				  region_cr_space_ops.name, id, err);
> 		} else {
> 			mlx4_info(dev, "crdump: added snapshot %d to devlink region %s\n",
>-				  id, region_cr_space_str);
>+				  id, region_cr_space_ops.name);
> 		}
> 	} else {
> 		mlx4_err(dev, "crdump: Failed to allocate crspace buffer\n");
>@@ -142,10 +147,10 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev *dev,
> 		if (err) {
> 			kvfree(health_data);
> 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
>-				  region_fw_health_str, id, err);
>+				  region_fw_health_ops.name, id, err);
> 		} else {
> 			mlx4_info(dev, "crdump: added snapshot %d to devlink region %s\n",
>-				  id, region_fw_health_str);
>+				  id, region_fw_health_ops.name);
> 		}
> 	} else {
> 		mlx4_err(dev, "crdump: Failed to allocate health buffer\n");
>@@ -205,23 +210,23 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
> 	/* Create cr-space region */
> 	crdump->region_crspace =
> 		devlink_region_create(devlink,
>-				      region_cr_space_str,
>+				      &region_cr_space_ops,
> 				      MAX_NUM_OF_DUMPS_TO_STORE,
> 				      pci_resource_len(pdev, 0));
> 	if (IS_ERR(crdump->region_crspace))
> 		mlx4_warn(dev, "crdump: create devlink region %s err %ld\n",
>-			  region_cr_space_str,
>+			  region_cr_space_ops.name,
> 			  PTR_ERR(crdump->region_crspace));
> 
> 	/* Create fw-health region */
> 	crdump->region_fw_health =
> 		devlink_region_create(devlink,
>-				      region_fw_health_str,
>+				      &region_fw_health_ops,
> 				      MAX_NUM_OF_DUMPS_TO_STORE,
> 				      HEALTH_BUFFER_SIZE);
> 	if (IS_ERR(crdump->region_fw_health))
> 		mlx4_warn(dev, "crdump: create devlink region %s err %ld\n",
>-			  region_fw_health_str,
>+			  region_fw_health_ops.name,
> 			  PTR_ERR(crdump->region_fw_health));
> 
> 	return 0;
>diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
>index b53fbc06e104..d521b7bfe007 100644
>--- a/drivers/net/netdevsim/dev.c
>+++ b/drivers/net/netdevsim/dev.c
>@@ -242,11 +242,15 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
> 
> #define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
> 
>+static const struct devlink_region_ops dummy_region_ops = {
>+	.name = "dummy",
>+};
>+
> static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
> 				      struct devlink *devlink)
> {
> 	nsim_dev->dummy_region =
>-		devlink_region_create(devlink, "dummy",
>+		devlink_region_create(devlink, &dummy_region_ops,
> 				      NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
> 				      NSIM_DEV_DUMMY_REGION_SIZE);
> 	return PTR_ERR_OR_ZERO(nsim_dev->dummy_region);
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index ce5cea428fdc..4a0baa6903cb 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -495,6 +495,14 @@ struct devlink_info_req;
> 
> typedef void devlink_snapshot_data_dest_t(const void *data);
> 
>+/**
>+ * struct devlink_region_ops - Region operations
>+ * @name: region name
>+ */
>+struct devlink_region_ops {
>+	const char *name;
>+};
>+
> struct devlink_fmsg;
> struct devlink_health_reporter;
> 
>@@ -949,10 +957,11 @@ void devlink_port_param_value_changed(struct devlink_port *devlink_port,
> 				      u32 param_id);
> void devlink_param_value_str_fill(union devlink_param_value *dst_val,
> 				  const char *src);
>-struct devlink_region *devlink_region_create(struct devlink *devlink,
>-					     const char *region_name,
>-					     u32 region_max_snapshots,
>-					     u64 region_size);
>+struct devlink_region *
>+devlink_region_create(struct devlink *devlink,
>+		      const struct devlink_region_ops *ops,
>+		      u32 region_max_snapshots,

No need to wrap here.


>+		      u64 region_size);
> void devlink_region_destroy(struct devlink_region *region);
> u32 devlink_region_snapshot_id_get(struct devlink *devlink);
> int devlink_region_snapshot_create(struct devlink_region *region,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index ca1df0ec3c97..d1f7bfbf81da 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -344,7 +344,7 @@ devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
> struct devlink_region {
> 	struct devlink *devlink;
> 	struct list_head list;
>-	const char *name;
>+	const struct devlink_region_ops *ops;
> 	struct list_head snapshot_list;
> 	u32 max_snapshots;
> 	u32 cur_snapshots;
>@@ -365,7 +365,7 @@ devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
> 	struct devlink_region *region;
> 
> 	list_for_each_entry(region, &devlink->region_list, list)
>-		if (!strcmp(region->name, region_name))
>+		if (!strcmp(region->ops->name, region_name))
> 			return region;
> 
> 	return NULL;
>@@ -3687,7 +3687,7 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
> 	if (err)
> 		goto nla_put_failure;
> 
>-	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
>+	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->ops->name);
> 	if (err)
> 		goto nla_put_failure;
> 
>@@ -3733,7 +3733,7 @@ static void devlink_nl_region_notify(struct devlink_region *region,
> 		goto out_cancel_msg;
> 
> 	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
>-			     region->name);
>+			     region->ops->name);
> 	if (err)
> 		goto out_cancel_msg;
> 
>@@ -7530,21 +7530,22 @@ EXPORT_SYMBOL_GPL(devlink_param_value_str_fill);
>  *	devlink_region_create - create a new address region
>  *
>  *	@devlink: devlink
>- *	@region_name: region name
>+ *	@ops: region operations and name
>  *	@region_max_snapshots: Maximum supported number of snapshots for region
>  *	@region_size: size of region
>  */
>-struct devlink_region *devlink_region_create(struct devlink *devlink,
>-					     const char *region_name,
>-					     u32 region_max_snapshots,
>-					     u64 region_size)
>+struct devlink_region *
>+devlink_region_create(struct devlink *devlink,
>+		      const struct devlink_region_ops *ops,
>+		      u32 region_max_snapshots,

No need to wrap here.


>+		      u64 region_size)
> {
> 	struct devlink_region *region;
> 	int err = 0;
> 
> 	mutex_lock(&devlink->lock);
> 
>-	if (devlink_region_get_by_name(devlink, region_name)) {
>+	if (devlink_region_get_by_name(devlink, ops->name)) {
> 		err = -EEXIST;
> 		goto unlock;
> 	}
>@@ -7557,7 +7558,7 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
> 
> 	region->devlink = devlink;
> 	region->max_snapshots = region_max_snapshots;
>-	region->name = region_name;
>+	region->ops = ops;
> 	region->size = region_size;
> 	INIT_LIST_HEAD(&region->snapshot_list);
> 	list_add_tail(&region->list, &devlink->region_list);
>-- 
>2.25.0.rc1
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/15] devlink: add functions to take snapshot while locked
  2020-01-30 22:58 ` [PATCH 02/15] devlink: add functions to take snapshot while locked Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-02-03 11:39   ` Jiri Pirko
  2020-02-03 16:45     ` Jacob Keller
  1 sibling, 1 reply; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 11:39 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang

Thu, Jan 30, 2020 at 11:58:57PM CET, jacob.e.keller@intel.com wrote:
>A future change is going to add a new devlink command to request
>a snapshot on demand. This function will want to call the
>devlink_region_snapshot_id_get and devlink_region_snapshot_create
>functions while already holding the devlink instance lock.
>
>Extract the logic of these two functions into static functions with the
>_locked postfix. Modify the original functions to be implemented in
>terms of the new locked functions.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>---
> net/core/devlink.c | 95 +++++++++++++++++++++++++++++-----------------
> 1 file changed, 61 insertions(+), 34 deletions(-)
>
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index d1f7bfbf81da..faf4f4c5c539 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -3761,6 +3761,63 @@ static void devlink_nl_region_notify(struct devlink_region *region,
> 	nlmsg_free(msg);
> }
> 
>+/**
>+ *	devlink_region_snapshot_id_get_locked - get snapshot ID
>+ *
>+ *	Returns a new snapshot id. Must be called while holding the
>+ *	devlink instance lock.
>+ */
>+static u32 devlink_region_snapshot_id_get_locked(struct devlink *devlink)

__devlink_region_snapshot_id_get()


>+{
>+	return ++devlink->snapshot_id;
>+}
>+
>+/**
>+ *	devlink_region_snapshot_create_locked - create a new snapshot
>+ *	This will add a new snapshot of a region. The snapshot
>+ *	will be stored on the region struct and can be accessed
>+ *	from devlink. This is useful for future	analyses of snapshots.
>+ *	Multiple snapshots can be created on a region.
>+ *	The @snapshot_id should be obtained using the getter function.
>+ *
>+ *	Must be called only while holding the devlink instance lock.
>+ *
>+ *	@region: devlink region of the snapshot
>+ *	@data: snapshot data
>+ *	@snapshot_id: snapshot id to be created
>+ *	@destructor: pointer to destructor function to free data
>+ */
>+static int
>+devlink_region_snapshot_create_locked(struct devlink_region *region,

__devlink_region_snapshot_create()


>+				      u8 *data, u32 snapshot_id,
>+				      devlink_snapshot_data_dest_t *destructor)
>+{
>+	struct devlink_snapshot *snapshot;
>+
>+	/* check if region can hold one more snapshot */
>+	if (region->cur_snapshots == region->max_snapshots)
>+		return -ENOMEM;
>+
>+	if (devlink_region_snapshot_get_by_id(region, snapshot_id))
>+		return -EEXIST;
>+
>+	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
>+	if (!snapshot)
>+		return -ENOMEM;
>+
>+	snapshot->id = snapshot_id;
>+	snapshot->region = region;
>+	snapshot->data = data;
>+	snapshot->data_destructor = destructor;
>+
>+	list_add_tail(&snapshot->list, &region->snapshot_list);
>+
>+	region->cur_snapshots++;
>+
>+	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
>+	return 0;
>+}
>+
> static void devlink_region_snapshot_del(struct devlink_region *region,
> 					struct devlink_snapshot *snapshot)
> {
>@@ -7611,7 +7668,7 @@ u32 devlink_region_snapshot_id_get(struct devlink *devlink)
> 	u32 id;
> 
> 	mutex_lock(&devlink->lock);
>-	id = ++devlink->snapshot_id;
>+	id = devlink_region_snapshot_id_get_locked(devlink);
> 	mutex_unlock(&devlink->lock);
> 
> 	return id;
>@@ -7622,7 +7679,7 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get);
>  *	devlink_region_snapshot_create - create a new snapshot
>  *	This will add a new snapshot of a region. The snapshot
>  *	will be stored on the region struct and can be accessed
>- *	from devlink. This is useful for future	analyses of snapshots.
>+ *	from devlink. This is useful for future analyses of snapshots.

What this hunk is about? :O



>  *	Multiple snapshots can be created on a region.
>  *	The @snapshot_id should be obtained using the getter function.
>  *
>@@ -7636,43 +7693,13 @@ int devlink_region_snapshot_create(struct devlink_region *region,
> 				   devlink_snapshot_data_dest_t *data_destructor)
> {
> 	struct devlink *devlink = region->devlink;
>-	struct devlink_snapshot *snapshot;
> 	int err;
> 
> 	mutex_lock(&devlink->lock);
>-
>-	/* check if region can hold one more snapshot */
>-	if (region->cur_snapshots == region->max_snapshots) {
>-		err = -ENOMEM;
>-		goto unlock;
>-	}
>-
>-	if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>-		err = -EEXIST;
>-		goto unlock;
>-	}
>-
>-	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
>-	if (!snapshot) {
>-		err = -ENOMEM;
>-		goto unlock;
>-	}
>-
>-	snapshot->id = snapshot_id;
>-	snapshot->region = region;
>-	snapshot->data = data;
>-	snapshot->data_destructor = data_destructor;
>-
>-	list_add_tail(&snapshot->list, &region->snapshot_list);
>-
>-	region->cur_snapshots++;
>-
>-	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
>+	err = devlink_region_snapshot_create_locked(region, data, snapshot_id,
>+						    data_destructor);
> 	mutex_unlock(&devlink->lock);
>-	return 0;
> 
>-unlock:
>-	mutex_unlock(&devlink->lock);
> 	return err;
> }
> EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
>-- 
>2.25.0.rc1
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-01-30 22:58 ` [PATCH 03/15] devlink: add operation to take an immediate snapshot Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
  2020-02-03  8:19   ` Yunsheng Lin
@ 2020-02-03 11:50   ` Jiri Pirko
  2020-02-03 16:33     ` Jacob Keller
  2020-02-03 19:32     ` Jacob Keller
  2 siblings, 2 replies; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 11:50 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang

Thu, Jan 30, 2020 at 11:58:58PM CET, jacob.e.keller@intel.com wrote:
>Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
>command is intended to enable userspace to request an immediate snapshot
>of a region.
>
>Regions can enable support for requestable snapshots by implementing the
>snapshot callback function in the region's devlink_region_ops structure.
>
>Implementations of this function callback should capture an immediate
>copy of the data and return it and its destructor in the function
>parameters. The core devlink code will generate a snapshot ID and create
>the new snapshot while holding the devlink instance lock.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>---
> .../networking/devlink/devlink-region.rst     |  9 +++-
> include/net/devlink.h                         |  7 +++
> include/uapi/linux/devlink.h                  |  2 +
> net/core/devlink.c                            | 46 +++++++++++++++++++
> 4 files changed, 62 insertions(+), 2 deletions(-)
>
>diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>index 1a7683e7acb2..262249e6c3fc 100644
>--- a/Documentation/networking/devlink/devlink-region.rst
>+++ b/Documentation/networking/devlink/devlink-region.rst
>@@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
> Regions may also be used to provide an additional way to debug complex error
> states, but see also :doc:`devlink-health`
> 
>+Regions may optionally support capturing a snapshot on demand via the
>+``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
>+allow requested snapshots must implement the ``.snapshot`` callback for the
>+region in its ``devlink_region_ops`` structure.
>+
> example usage
> -------------
> 
>@@ -40,8 +45,8 @@ example usage
>     # Delete a snapshot using:
>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
> 
>-    # Trigger (request) a snapshot be taken:
>-    $ devlink region trigger pci/0000:00:05.0/cr-space
>+    # Request an immediate snapshot, if supported by the region
>+    $ devlink region snapshot pci/0000:00:05.0/cr-space


Hmm, the shapshot is now removed by user calling:

$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
That is using DEVLINK_CMD_REGION_DEL netlink command calling
devlink_nl_cmd_region_del()

I think the creation should be symmetric. Something like:
$ devlink region add DEV/REGION snapshot SNAPSHOT_ID
SNAPSHOT_ID is either exact number or "any" if user does not care.
The benefit of using user-passed ID value is that you can use this
easily in scripts.

The existing unused netlink command DEVLINK_CMD_REGION_NEW would be used
for this.


> 
>     # Dump a snapshot:
>     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1

[...]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-02-03  8:19   ` Yunsheng Lin
@ 2020-02-03 11:50     ` Jiri Pirko
  0 siblings, 0 replies; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 11:50 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: Jacob Keller, netdev, valex, lihong.yang

Mon, Feb 03, 2020 at 09:19:23AM CET, linyunsheng@huawei.com wrote:
>On 2020/1/31 6:58, Jacob Keller wrote:
>> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
>> command is intended to enable userspace to request an immediate snapshot
>> of a region.
>> 
>> Regions can enable support for requestable snapshots by implementing the
>> snapshot callback function in the region's devlink_region_ops structure.
>> 
>> Implementations of this function callback should capture an immediate
>> copy of the data and return it and its destructor in the function
>> parameters. The core devlink code will generate a snapshot ID and create
>> the new snapshot while holding the devlink instance lock.
>
>Does we need a new devlink command to clear the snapshot created by
>DEVLINK_CMD_REGION_TAKE_SNAPSHOT?
>
>It seems the snapshot of a region is only destroyed when unloading
>the driver.

There is existing command to handle this:
devlink region del DEV/REGION snapshot SNAPSHOT_ID

>
>> 
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>>  .../networking/devlink/devlink-region.rst     |  9 +++-
>>  include/net/devlink.h                         |  7 +++
>>  include/uapi/linux/devlink.h                  |  2 +
>>  net/core/devlink.c                            | 46 +++++++++++++++++++
>>  4 files changed, 62 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>> index 1a7683e7acb2..262249e6c3fc 100644
>> --- a/Documentation/networking/devlink/devlink-region.rst
>> +++ b/Documentation/networking/devlink/devlink-region.rst
>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>>  Regions may also be used to provide an additional way to debug complex error
>>  states, but see also :doc:`devlink-health`
>>  
>> +Regions may optionally support capturing a snapshot on demand via the
>> +``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
>> +allow requested snapshots must implement the ``.snapshot`` callback for the
>> +region in its ``devlink_region_ops`` structure.
>> +
>>  example usage
>>  -------------
>>  
>> @@ -40,8 +45,8 @@ example usage
>>      # Delete a snapshot using:
>>      $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>  
>> -    # Trigger (request) a snapshot be taken:
>> -    $ devlink region trigger pci/0000:00:05.0/cr-space
>> +    # Request an immediate snapshot, if supported by the region
>> +    $ devlink region snapshot pci/0000:00:05.0/cr-space
>>  
>>      # Dump a snapshot:
>>      $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index 4a0baa6903cb..63e954241404 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -498,9 +498,16 @@ typedef void devlink_snapshot_data_dest_t(const void *data);
>>  /**
>>   * struct devlink_region_ops - Region operations
>>   * @name: region name
>> + * @snapshot: callback to request an immediate snapshot. On success,
>> + *            the data and destructor variables must be updated. The function
>> + *            will be called while the devlink instance lock is held.
>>   */
>>  struct devlink_region_ops {
>>  	const char *name;
>> +	int (*snapshot)(struct devlink *devlink,
>> +			struct netlink_ext_ack *extack,
>> +			u8 **data,
>> +			devlink_snapshot_data_dest_t **destructor);
>>  };
>>  
>>  struct devlink_fmsg;
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index ae37fd4d194a..46643c4320b9 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -117,6 +117,8 @@ enum devlink_command {
>>  	DEVLINK_CMD_TRAP_GROUP_NEW,
>>  	DEVLINK_CMD_TRAP_GROUP_DEL,
>>  
>> +	DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
>> +
>>  	/* add new commands above here */
>>  	__DEVLINK_CMD_MAX,
>>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index faf4f4c5c539..574008c536fa 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -4109,6 +4109,45 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
>>  	return err;
>>  }
>>  
>> +static int devlink_nl_cmd_region_take_snapshot(struct sk_buff *skb,
>> +					       struct genl_info *info)
>> +{
>> +	struct devlink *devlink = info->user_ptr[0];
>> +	devlink_snapshot_data_dest_t *destructor;
>> +	struct devlink_region *region;
>> +	const char *region_name;
>> +	u32 snapshot_id;
>> +	u8 *data;
>> +	int err;
>> +
>> +	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
>> +		return -EINVAL;
>> +	}
>> +
>> +	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
>> +	region = devlink_region_get_by_name(devlink, region_name);
>> +	if (!region) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "The requested region does not exist");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!region->ops->snapshot) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "The requested region does not support taking an immediate snapshot");
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	err = region->ops->snapshot(devlink, info->extack, &data, &destructor);
>> +	if (err)
>> +		return err;
>> +
>> +	snapshot_id = devlink_region_snapshot_id_get_locked(devlink);
>> +	return devlink_region_snapshot_create_locked(region, data, snapshot_id,
>> +						     destructor);
>> +}
>> +
>>  struct devlink_info_req {
>>  	struct sk_buff *msg;
>>  };
>> @@ -6249,6 +6288,13 @@ static const struct genl_ops devlink_nl_ops[] = {
>>  		.flags = GENL_ADMIN_PERM,
>>  		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>>  	},
>> +	{
>> +		.cmd = DEVLINK_CMD_REGION_TAKE_SNAPSHOT,
>> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> +		.doit = devlink_nl_cmd_region_take_snapshot,
>> +		.flags = GENL_ADMIN_PERM,
>> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>> +	},
>>  	{
>>  		.cmd = DEVLINK_CMD_INFO_GET,
>>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> 
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/15] devlink: support directly reading from region memory
  2020-01-30 22:59 ` [PATCH 13/15] devlink: support directly reading from region memory Jacob Keller
  2020-01-31 18:07   ` Jakub Kicinski
@ 2020-02-03 13:44   ` Jiri Pirko
  2020-02-03 16:40     ` Jacob Keller
  1 sibling, 1 reply; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 13:44 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang

Thu, Jan 30, 2020 at 11:59:08PM CET, jacob.e.keller@intel.com wrote:
>Add a new region operation for directly reading from a region, without
>taking a full snapshot.
>
>Extend the DEVLINK_CMD_REGION_READ to allow directly reading from
>a region, if supported. Instead of reporting a missing snapshot id as
>invalid, check to see if direct reading is implemented for the region.
>If so, use the direct read operation to grab the current contents of the
>region.
>
>This new behavior of DEVLINK_CMD_REGION_READ should be backwards
>compatible. Previously, all kernels rejected such
>a DEVLINK_CMD_REGION_READ with -EINVAL, and will now either accept the
>call or report -EOPNOTSUPP for regions which do not implement direct
>access.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>---
> .../networking/devlink/devlink-region.rst     |  8 ++
> include/net/devlink.h                         |  4 +
> net/core/devlink.c                            | 82 +++++++++++++++++--
> 3 files changed, 87 insertions(+), 7 deletions(-)
>
>diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>index 262249e6c3fc..a543f5ee7a9e 100644
>--- a/Documentation/networking/devlink/devlink-region.rst
>+++ b/Documentation/networking/devlink/devlink-region.rst
>@@ -25,6 +25,10 @@ Regions may optionally support capturing a snapshot on demand via the
> allow requested snapshots must implement the ``.snapshot`` callback for the
> region in its ``devlink_region_ops`` structure.
> 
>+Regions may optionally allow directly reading from their contents without a
>+snapshot. A driver wishing to enable this for a region should implement the
>+``.read`` callback in the ``devlink_region_ops`` structure.
>+
> example usage
> -------------
> 
>@@ -60,6 +64,10 @@ example usage
>             length 16
>     0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
> 
>+    # Read from the region without a snapshot
>+    $ devlink region read pci/0000:00:05.0/fw-health address 16 length 16
>+    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
>+
> As regions are likely very device or driver specific, no generic regions are
> defined. See the driver-specific documentation files for information on the
> specific regions a driver supports.
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 1c3540280396..47ce1b5481de 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -508,6 +508,10 @@ struct devlink_region_ops {
> 			struct netlink_ext_ack *extack,
> 			u8 **data,
> 			devlink_snapshot_data_dest_t **destructor);
>+	int (*read)(struct devlink *devlink,
>+		    u64 curr_offset,
>+		    u32 data_size,
>+		    u8 *data);

Too much wrapping.


> };
> 
> struct devlink_fmsg;
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index b2b855d12a11..5831b7b78915 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -4005,6 +4005,56 @@ static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
> 	return err;
> }
> 
>+static int devlink_nl_region_read_direct_fill(struct sk_buff *skb,
>+					      struct devlink *devlink,
>+					      struct devlink_region *region,
>+					      struct nlattr **attrs,
>+					      u64 start_offset,
>+					      u64 end_offset,
>+					      bool dump,
>+					      u64 *new_offset)

Again.


>+{
>+	u64 curr_offset = start_offset;
>+	int err = 0;
>+	u8 *data;
>+
>+	/* Allocate and re-use a single buffer */
>+	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
>+	if (!data)
>+		return -ENOMEM;
>+
>+	*new_offset = start_offset;
>+
>+	if (end_offset > region->size || dump)
>+		end_offset = region->size;
>+
>+	while (curr_offset < end_offset) {
>+		u32 data_size;
>+
>+		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
>+			data_size = end_offset - curr_offset;
>+		else
>+			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
>+
>+		err = region->ops->read(devlink, curr_offset, data_size, data);

There is a lot of code duplication is this function. Perhap there could
be a cb and cb_priv here to distinguish shapshot and direct read?



>+		if (err)
>+			break;
>+
>+		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
>+							    data, data_size,
>+							    curr_offset);
>+		if (err)
>+			break;
>+
>+		curr_offset += data_size;
>+	}
>+	*new_offset = curr_offset;
>+
>+	kfree(data);
>+
>+	return err;
>+}
>+
> static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
> 					     struct netlink_callback *cb)
> {
>@@ -4016,6 +4066,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
> 	const char *region_name;
> 	struct devlink *devlink;
> 	bool dump = true;
>+	bool direct;
> 	void *hdr;
> 	int err;
> 
>@@ -4030,8 +4081,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
> 
> 	mutex_lock(&devlink->lock);
> 
>-	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
>-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
>+	if (!attrs[DEVLINK_ATTR_REGION_NAME]) {
> 		err = -EINVAL;
> 		goto out_unlock;
> 	}
>@@ -4043,6 +4093,17 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
> 		goto out_unlock;
> 	}
> 
>+	if (attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
>+		direct = false;
>+	else
>+		direct = true;

	direct = !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID];


>+
>+	/* Region may not support direct read access */
>+	if (direct && !region->ops->read) {

extack msg please.


>+		err = -EOPNOTSUPP;
>+		goto out_unlock;
>+	}
>+
> 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
> 			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
> 			  DEVLINK_CMD_REGION_READ);
>@@ -4076,11 +4137,18 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
> 		dump = false;
> 	}
> 
>-	err = devlink_nl_region_read_snapshot_fill(skb, devlink,
>-						   region, attrs,
>-						   start_offset,
>-						   end_offset, dump,
>-						   &ret_offset);
>+	if (direct)
>+		err = devlink_nl_region_read_direct_fill(skb, devlink,
>+							 region, attrs,
>+							 start_offset,
>+							 end_offset, dump,
>+							 &ret_offset);
>+	else
>+		err = devlink_nl_region_read_snapshot_fill(skb, devlink,
>+							   region, attrs,
>+							   start_offset,
>+							   end_offset, dump,
>+							   &ret_offset);
> 
> 	if (err && err != -EMSGSIZE)
> 		goto nla_put_failure;
>-- 
>2.25.0.rc1
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-02-03 11:50   ` Jiri Pirko
@ 2020-02-03 16:33     ` Jacob Keller
  2020-02-03 19:32     ` Jacob Keller
  1 sibling, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 16:33 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 3:50 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:58PM CET, jacob.e.keller@intel.com wrote:
>> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
> 
> Hmm, the shapshot is now removed by user calling:
> 
> $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
> That is using DEVLINK_CMD_REGION_DEL netlink command calling
> devlink_nl_cmd_region_del()
> 
> I think the creation should be symmetric. Something like:
> $ devlink region add DEV/REGION snapshot SNAPSHOT_ID
> SNAPSHOT_ID is either exact number or "any" if user does not care.
> The benefit of using user-passed ID value is that you can use this
> easily in scripts.
> 
> The existing unused netlink command DEVLINK_CMD_REGION_NEW would be used
> for this.
> 

Sure, this makes some sense. It will likely require refactoring the
snapshot id creation to handle the fact that users can choose a snapshot
id themselves.

This approach sounds like a more sound design, making the commands
symmetric and enabling easier usage from scripts.

Thanks,
Jake

> 
>>
>>     # Dump a snapshot:
>>     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-02-01 17:43       ` Jakub Kicinski
@ 2020-02-03 16:35         ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 16:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 2/1/2020 9:43 AM, Jakub Kicinski wrote:
> On Fri, 31 Jan 2020 16:51:10 -0800, Jacob Keller wrote:
>> TL;DR; Yes, I'd like to have a single devlink for the device, but no, I
>> don't have a good answer for how to do it sanely.
> 
> Ack, it not a new problem and I don't have a solution either :(
> 

Right.

> I don't think mlx5 has this distinction of only single/first PF being
> able to perform device-wide updates so perhaps it's better to not
> introduce that notion? 
> 

I was just talking about that with someone yesterday. Yea, I think
you're right. I believe both the overall devlink mutex and the device's
NVM acquire locking mechanism should be suitable to prevent access.

I was originally just trying to make it difficult to somehow start
updates or perform conflicting activity over multiple PFs at once.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/15] devlink: support directly reading from region memory
  2020-02-03 13:44   ` Jiri Pirko
@ 2020-02-03 16:40     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 16:40 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 5:44 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:59:08PM CET, jacob.e.keller@intel.com wrote:
>> @@ -508,6 +508,10 @@ struct devlink_region_ops {
>> 			struct netlink_ext_ack *extack,
>> 			u8 **data,
>> 			devlink_snapshot_data_dest_t **destructor);
>> +	int (*read)(struct devlink *devlink,
>> +		    u64 curr_offset,
>> +		    u32 data_size,
>> +		    u8 *data);
> 
> Too much wrapping.
> 

Yep, will clean it up.

> 
>> };
>>
>> struct devlink_fmsg;
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index b2b855d12a11..5831b7b78915 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -4005,6 +4005,56 @@ static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
>> 	return err;
>> }
>>
>> +static int devlink_nl_region_read_direct_fill(struct sk_buff *skb,
>> +					      struct devlink *devlink,
>> +					      struct devlink_region *region,
>> +					      struct nlattr **attrs,
>> +					      u64 start_offset,
>> +					      u64 end_offset,
>> +					      bool dump,
>> +					      u64 *new_offset)
> 
> Again.

Yep.

> 
> 
>> +{
>> +	u64 curr_offset = start_offset;
>> +	int err = 0;
>> +	u8 *data;
>> +
>> +	/* Allocate and re-use a single buffer */
>> +	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	*new_offset = start_offset;
>> +
>> +	if (end_offset > region->size || dump)
>> +		end_offset = region->size;
>> +
>> +	while (curr_offset < end_offset) {
>> +		u32 data_size;
>> +
>> +		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
>> +			data_size = end_offset - curr_offset;
>> +		else
>> +			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
>> +
>> +		err = region->ops->read(devlink, curr_offset, data_size, data);
> 
> There is a lot of code duplication is this function. Perhap there could
> be a cb and cb_priv here to distinguish shapshot and direct read?
> 
> 

So, I was looking into how to do this, and I have a couple of patches to
simplify this function:

first I removed the region parameter and replaced it with "snapshot",
which enabled removing the dump and calculating the end_offset in the
caller function properly...

The main problem I found in sharing code is that snapshots didn't need
to allocate a data buffer while the raw-read does.

I'll see about doing a sort of cb/cb_priv setup... I'm not 100%
convinced yet though.

> 
> 	direct = !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID];
> 

Makes sense.

> 
>> +
>> +	/* Region may not support direct read access */
>> +	if (direct && !region->ops->read) {
> 
> extack msg please.
> 

Yep.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/15] devlink: add functions to take snapshot while locked
  2020-02-03 11:39   ` Jiri Pirko
@ 2020-02-03 16:45     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 16:45 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 3:39 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:57PM CET, jacob.e.keller@intel.com wrote:
>> + */
>> +static int
>> +devlink_region_snapshot_create_locked(struct devlink_region *region,
> 
> __devlink_region_snapshot_create()
> 

Yep, was pointed out earlier, will clean up in v2.

>> 	return id;
>> @@ -7622,7 +7679,7 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get);
>>  *	devlink_region_snapshot_create - create a new snapshot
>>  *	This will add a new snapshot of a region. The snapshot
>>  *	will be stored on the region struct and can be accessed
>> - *	from devlink. This is useful for future	analyses of snapshots.
>> + *	from devlink. This is useful for future analyses of snapshots.
> 
> What this hunk is about? :O
> 
> 

Not sure, but if I had to guess it was a stray space vs. tab thing that
accidentally got changed without reason.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-02-03 11:35   ` Jiri Pirko
@ 2020-02-03 16:48     ` Jacob Keller
  2020-02-03 17:07     ` Jacob Keller
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 16:48 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang



On 2/3/2020 3:35 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:56PM CET, jacob.e.keller@intel.com wrote:
>> -static const char *region_cr_space_str = "cr-space";
>> -static const char *region_fw_health_str = "fw-health";
> 
> Just leave these as are and use in ops and messages. It is odd to use
> ops.name in the message.
> 

If I recall, I tried this and it produced some issues with const. Will
confirm that and fix or explain in the commit message next round.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free
  2020-02-03 11:29   ` Jiri Pirko
@ 2020-02-03 16:56     ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 16:56 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 3:29 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:59:03PM CET, jacob.e.keller@intel.com wrote:
>> Add devres managed allocation functions for allocating a devlink
>> instance. These can be used by device drivers based on the devres
>> framework which want to allocate a devlink instance.
>>
>> For simplicity and to reduce churn in the devlink core code, the devres
>> management works by creating a node with a double-pointer. The devlink
>> instance is allocated using the normal devlink_alloc and released using
>> the normal devlink_free.
>>
>> An alternative solution where the raw memory for devlink is allocated
>> directly via devres_alloc could be done. Such an implementation would
>> either significantly increase code duplication or code churn in order to
>> refactor the setup from the allocation.
>>
>> The new devres managed allocation function will be used by the ice
>> driver in a following change to implement initial devlink support.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>> include/net/devlink.h |  4 ++++
>> lib/devres.c          |  1 +
>> net/core/devlink.c    | 54 +++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 59 insertions(+)
>>
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index 63e954241404..1c3540280396 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -858,11 +858,15 @@ struct ib_device;
>> struct net *devlink_net(const struct devlink *devlink);
>> void devlink_net_set(struct devlink *devlink, struct net *net);
>> struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
>> +struct devlink *devlinkm_alloc(struct device * dev,
>> +			       const struct devlink_ops *ops,
>> +			       size_t priv_size);
>> int devlink_register(struct devlink *devlink, struct device *dev);
>> void devlink_unregister(struct devlink *devlink);
>> void devlink_reload_enable(struct devlink *devlink);
>> void devlink_reload_disable(struct devlink *devlink);
>> void devlink_free(struct devlink *devlink);
>> +void devlinkm_free(struct device *dev, struct devlink *devlink);
>> int devlink_port_register(struct devlink *devlink,
>> 			  struct devlink_port *devlink_port,
>> 			  unsigned int port_index);
>> diff --git a/lib/devres.c b/lib/devres.c
>> index 6ef51f159c54..239c81d40612 100644
>> --- a/lib/devres.c
>> +++ b/lib/devres.c
>> @@ -5,6 +5,7 @@
>> #include <linux/gfp.h>
>> #include <linux/export.h>
>> #include <linux/of_address.h>
>> +#include <net/devlink.h>
>>
>> enum devm_ioremap_type {
>> 	DEVM_IOREMAP = 0,
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 574008c536fa..b2b855d12a11 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -6531,6 +6531,60 @@ void devlink_free(struct devlink *devlink)
>> }
>> EXPORT_SYMBOL_GPL(devlink_free);
>>
>> +static void devres_devlink_release(struct device *dev, void *res)
>> +{
>> +	devlink_free(*(struct devlink **)res);
>> +}
>> +
>> +static int devres_devlink_match(struct device *dev, void *res, void *data)
>> +{
>> +	return *(struct devlink **)res == data;
>> +}
>> +
>> +/**
>> + * devlinkm_alloc - Allocate devlink instance managed by devres
>> + * @dev: device to allocate devlink for
>> + * @ops: devlink ops structure
>> + * @priv_size: size of private data portion
>> + *
>> + * Allocate a devlink instance and manage its release via devres.
>> + */
>> +struct devlink *devlinkm_alloc(struct device *dev,
> 
> Why "devlinkm"? Looks like the usual prefix for this is "devm_"
> So "devm_devlink_alloc/free"?
> 
> 

pcim_enable_device
pcim_iomap
pcim_iomap_devres
pcim_iomap_regions
pcim_iomap_regions_request_all
pcim_iomap_release
pcim_iomap_table
pcim_iounmap
pcim_iounmap_regions
pcim_pin_device
pcim_release
pcim_set_mwi
pcim_state

There are some devm_pci_* though... Heh.

Regardless, I agree wit Jakub, and am going to remove this, because it
seems less valuable since the driver needs to manage the remaining
devlink state regardless. (Unless we add devm_devlink_* for every thing,
but....)

I think it makes more sense to assume a devres driver would need to
manage its own usage via direct devres calls to setup the teardown
actions manually.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-02-03 11:35   ` Jiri Pirko
  2020-02-03 16:48     ` Jacob Keller
@ 2020-02-03 17:07     ` Jacob Keller
  2020-02-03 17:10       ` Jacob Keller
  2020-02-03 17:14     ` Jacob Keller
  2020-02-03 17:17     ` Jacob Keller
  3 siblings, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 17:07 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 3:35 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:56PM CET, jacob.e.keller@intel.com wrote:
>> Modify the devlink region code in preparation for adding new operations
>> on regions.
>>
>> Create a devlink_region_ops structure, and move the name pointer from
>> within the devlink_region structure into the ops structure (similar to
>> the devlink_health_reporter_ops).
>>
>> This prepares the regions to enable support of additional operations in
>> the future such as requesting snapshots, or accessing the region
>> directly without a snapshot.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx4/crdump.c | 25 ++++++++++++---------
>> drivers/net/netdevsim/dev.c                 |  6 ++++-
>> include/net/devlink.h                       | 17 ++++++++++----
>> net/core/devlink.c                          | 23 ++++++++++---------
>> 4 files changed, 45 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
>> index 64ed725aec28..4cea64033919 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
>> @@ -38,8 +38,13 @@
>> #define CR_ENABLE_BIT_OFFSET		0xF3F04
>> #define MAX_NUM_OF_DUMPS_TO_STORE	(8)
>>
>> -static const char *region_cr_space_str = "cr-space";
>> -static const char *region_fw_health_str = "fw-health";
> 
> Just leave these as are and use in ops and messages. It is odd to use
> ops.name in the message.
> 

So this produces the following errors, not 100% sure how to resolve:


> drivers/net/ethernet/mellanox/mlx4/crdump.c:45:10: error: initializer element is not constant
>    45 |  .name = region_cr_space_str,
>       |          ^~~~~~~~~~~~~~~~~~~
> drivers/net/ethernet/mellanox/mlx4/crdump.c:45:10: note: (near initialization for ‘region_cr_space_ops.name’)
> drivers/net/ethernet/mellanox/mlx4/crdump.c:49:10: error: initializer element is not constant
>    49 |  .name = region_fw_health_str,
>       |          ^~~~~~~~~~~~~~~~~~~~


Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-02-03 17:07     ` Jacob Keller
@ 2020-02-03 17:10       ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 17:10 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 9:07 AM, Jacob Keller wrote:
> On 2/3/2020 3:35 AM, Jiri Pirko wrote:
>> Thu, Jan 30, 2020 at 11:58:56PM CET, jacob.e.keller@intel.com wrote:
>>> Modify the devlink region code in preparation for adding new operations
>>> on regions.
>>>
>>> Create a devlink_region_ops structure, and move the name pointer from
>>> within the devlink_region structure into the ops structure (similar to
>>> the devlink_health_reporter_ops).
>>>
>>> This prepares the regions to enable support of additional operations in
>>> the future such as requesting snapshots, or accessing the region
>>> directly without a snapshot.
>>>
>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>>> ---
>>> drivers/net/ethernet/mellanox/mlx4/crdump.c | 25 ++++++++++++---------
>>> drivers/net/netdevsim/dev.c                 |  6 ++++-
>>> include/net/devlink.h                       | 17 ++++++++++----
>>> net/core/devlink.c                          | 23 ++++++++++---------
>>> 4 files changed, 45 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
>>> index 64ed725aec28..4cea64033919 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
>>> @@ -38,8 +38,13 @@
>>> #define CR_ENABLE_BIT_OFFSET		0xF3F04
>>> #define MAX_NUM_OF_DUMPS_TO_STORE	(8)
>>>
>>> -static const char *region_cr_space_str = "cr-space";
>>> -static const char *region_fw_health_str = "fw-health";
>>
>> Just leave these as are and use in ops and messages. It is odd to use
>> ops.name in the message.
>>
> 
> So this produces the following errors, not 100% sure how to resolve:
> 
> 
>> drivers/net/ethernet/mellanox/mlx4/crdump.c:45:10: error: initializer element is not constant
>>    45 |  .name = region_cr_space_str,
>>       |          ^~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/mellanox/mlx4/crdump.c:45:10: note: (near initialization for ‘region_cr_space_ops.name’)
>> drivers/net/ethernet/mellanox/mlx4/crdump.c:49:10: error: initializer element is not constant
>>    49 |  .name = region_fw_health_str,
>>       |          ^~~~~~~~~~~~~~~~~~~~
> 
> 
> Thanks,
> Jake
> 

Heh, this is fixed by using static const char * const <name>, to ensure
that the compiler realizes both the pointer and the contents are constant.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-02-03 11:35   ` Jiri Pirko
  2020-02-03 16:48     ` Jacob Keller
  2020-02-03 17:07     ` Jacob Keller
@ 2020-02-03 17:14     ` Jacob Keller
  2020-02-03 17:17     ` Jacob Keller
  3 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 17:14 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 3:35 AM, Jiri Pirko wrot>> +struct devlink_region *
>> +devlink_region_create(struct devlink *devlink,
>> +		      const struct devlink_region_ops *ops,
>> +		      u32 region_max_snapshots,
> 
> No need to wrap here.
> 

So I can do either:

> struct devlink_region *
> devlink_region_create(struct devlink *devlink, const struct devlink_region_ops *ops,
>                       u32 region_max_snapshots,

or

> struct devlink_region *devlink_region_create(struct devlink *devlink,>                                              const struct
devlink_region_ops *ops,
>                                              u32 region_max_snapshots,

Both of which go over the 80-character limit.

Or I can stick with what I've chosen which does not.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/15] devlink: prepare to support region operations
  2020-02-03 11:35   ` Jiri Pirko
                       ` (2 preceding siblings ...)
  2020-02-03 17:14     ` Jacob Keller
@ 2020-02-03 17:17     ` Jacob Keller
  3 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 17:17 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang



On 2/3/2020 3:35 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:56PM CET, jacob.e.keller@intel.com wrote:
>> +struct devlink_region *
>> +devlink_region_create(struct devlink *devlink,
>> +		      const struct devlink_region_ops *ops,
>> +		      u32 region_max_snapshots,
> 
> No need to wrap here.
> 
> 
>> +		      u64 region_size);

Probably misunderstood your command previously. Will merge the
region_max_snapshots and region_size to one line.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-02-03 11:50   ` Jiri Pirko
  2020-02-03 16:33     ` Jacob Keller
@ 2020-02-03 19:32     ` Jacob Keller
  2020-02-03 21:30       ` Jiri Pirko
  1 sibling, 1 reply; 62+ messages in thread
From: Jacob Keller @ 2020-02-03 19:32 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang

On 2/3/2020 3:50 AM, Jiri Pirko wrote:
> Thu, Jan 30, 2020 at 11:58:58PM CET, jacob.e.keller@intel.com wrote:
>> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
>> command is intended to enable userspace to request an immediate snapshot
>> of a region.
>>
>> Regions can enable support for requestable snapshots by implementing the
>> snapshot callback function in the region's devlink_region_ops structure.
>>
>> Implementations of this function callback should capture an immediate
>> copy of the data and return it and its destructor in the function
>> parameters. The core devlink code will generate a snapshot ID and create
>> the new snapshot while holding the devlink instance lock.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>> .../networking/devlink/devlink-region.rst     |  9 +++-
>> include/net/devlink.h                         |  7 +++
>> include/uapi/linux/devlink.h                  |  2 +
>> net/core/devlink.c                            | 46 +++++++++++++++++++
>> 4 files changed, 62 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>> index 1a7683e7acb2..262249e6c3fc 100644
>> --- a/Documentation/networking/devlink/devlink-region.rst
>> +++ b/Documentation/networking/devlink/devlink-region.rst
>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>> Regions may also be used to provide an additional way to debug complex error
>> states, but see also :doc:`devlink-health`
>>
>> +Regions may optionally support capturing a snapshot on demand via the
>> +``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
>> +allow requested snapshots must implement the ``.snapshot`` callback for the
>> +region in its ``devlink_region_ops`` structure.
>> +
>> example usage
>> -------------
>>
>> @@ -40,8 +45,8 @@ example usage
>>     # Delete a snapshot using:
>>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>
>> -    # Trigger (request) a snapshot be taken:
>> -    $ devlink region trigger pci/0000:00:05.0/cr-space
>> +    # Request an immediate snapshot, if supported by the region
>> +    $ devlink region snapshot pci/0000:00:05.0/cr-space
> 
> 
> Hmm, the shapshot is now removed by user calling:
> 
> $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
> That is using DEVLINK_CMD_REGION_DEL netlink command calling
> devlink_nl_cmd_region_del()
> 
> I think the creation should be symmetric. Something like:
> $ devlink region add DEV/REGION snapshot SNAPSHOT_ID
> SNAPSHOT_ID is either exact number or "any" if user does not care.
> The benefit of using user-passed ID value is that you can use this
> easily in scripts.
> 
> The existing unused netlink command DEVLINK_CMD_REGION_NEW would be used
> for this.
> 

So I have some concern trying to allow picking the snapshot id. I agree
it is useful, but want to make sure we pick the best design for how to
handle things.

Currently regions support taking a snapshot across multiple regions with
the same ID. this means that the region id value is stored per devlink
instead of per region.

If users can pick IDs, they can and probably will become sparse. This
means that we now need to be able to handle this.

If a user picks an ID, we want to ensure that the global region id
number is incremented properly so that we skip the used IDs, otherwise
those could accidentally collide.

The simplest solution is to just force the global ID to be 1 larger at a
minimum every time the userspace calls us with an ID.

But now what happens if a user requests a really large ID (U32_MAX - 1)?
and then we now overflow our region ID.

This was previously a rare occurrence, but has now become possibly common.

We could require/force the user to pick IDs within a limited range, and
have the automatic regions come from another range..

We could enhance ID selection to just pick "lowest number unused by any
region". This would allow re-using ID numbers after they've been
deleted.. I think this approach is the most robust but does require a
bit of extra computation.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-02-03 19:32     ` Jacob Keller
@ 2020-02-03 21:30       ` Jiri Pirko
  2020-02-04 19:20         ` Jacob Keller
  0 siblings, 1 reply; 62+ messages in thread
From: Jiri Pirko @ 2020-02-03 21:30 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang

Mon, Feb 03, 2020 at 08:32:36PM CET, jacob.e.keller@intel.com wrote:
>On 2/3/2020 3:50 AM, Jiri Pirko wrote:
>> Thu, Jan 30, 2020 at 11:58:58PM CET, jacob.e.keller@intel.com wrote:
>>> Add a new devlink command, DEVLINK_CMD_REGION_TAKE_SNAPSHOT. This
>>> command is intended to enable userspace to request an immediate snapshot
>>> of a region.
>>>
>>> Regions can enable support for requestable snapshots by implementing the
>>> snapshot callback function in the region's devlink_region_ops structure.
>>>
>>> Implementations of this function callback should capture an immediate
>>> copy of the data and return it and its destructor in the function
>>> parameters. The core devlink code will generate a snapshot ID and create
>>> the new snapshot while holding the devlink instance lock.
>>>
>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>>> ---
>>> .../networking/devlink/devlink-region.rst     |  9 +++-
>>> include/net/devlink.h                         |  7 +++
>>> include/uapi/linux/devlink.h                  |  2 +
>>> net/core/devlink.c                            | 46 +++++++++++++++++++
>>> 4 files changed, 62 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>>> index 1a7683e7acb2..262249e6c3fc 100644
>>> --- a/Documentation/networking/devlink/devlink-region.rst
>>> +++ b/Documentation/networking/devlink/devlink-region.rst
>>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>>> Regions may also be used to provide an additional way to debug complex error
>>> states, but see also :doc:`devlink-health`
>>>
>>> +Regions may optionally support capturing a snapshot on demand via the
>>> +``DEVLINK_CMD_REGION_TAKE_SNAPSHOT`` netlink message. A driver wishing to
>>> +allow requested snapshots must implement the ``.snapshot`` callback for the
>>> +region in its ``devlink_region_ops`` structure.
>>> +
>>> example usage
>>> -------------
>>>
>>> @@ -40,8 +45,8 @@ example usage
>>>     # Delete a snapshot using:
>>>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>>
>>> -    # Trigger (request) a snapshot be taken:
>>> -    $ devlink region trigger pci/0000:00:05.0/cr-space
>>> +    # Request an immediate snapshot, if supported by the region
>>> +    $ devlink region snapshot pci/0000:00:05.0/cr-space
>> 
>> 
>> Hmm, the shapshot is now removed by user calling:
>> 
>> $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
>> That is using DEVLINK_CMD_REGION_DEL netlink command calling
>> devlink_nl_cmd_region_del()
>> 
>> I think the creation should be symmetric. Something like:
>> $ devlink region add DEV/REGION snapshot SNAPSHOT_ID
>> SNAPSHOT_ID is either exact number or "any" if user does not care.
>> The benefit of using user-passed ID value is that you can use this
>> easily in scripts.
>> 
>> The existing unused netlink command DEVLINK_CMD_REGION_NEW would be used
>> for this.
>> 
>
>So I have some concern trying to allow picking the snapshot id. I agree
>it is useful, but want to make sure we pick the best design for how to
>handle things.
>
>Currently regions support taking a snapshot across multiple regions with
>the same ID. this means that the region id value is stored per devlink
>instead of per region.
>
>If users can pick IDs, they can and probably will become sparse. This
>means that we now need to be able to handle this.
>
>If a user picks an ID, we want to ensure that the global region id
>number is incremented properly so that we skip the used IDs, otherwise
>those could accidentally collide.
>
>The simplest solution is to just force the global ID to be 1 larger at a
>minimum every time the userspace calls us with an ID.
>
>But now what happens if a user requests a really large ID (U32_MAX - 1)?
>and then we now overflow our region ID.
>
>This was previously a rare occurrence, but has now become possibly common.
>
>We could require/force the user to pick IDs within a limited range, and
>have the automatic regions come from another range..
>
>We could enhance ID selection to just pick "lowest number unused by any
>region". This would allow re-using ID numbers after they've been
>deleted.. I think this approach is the most robust but does require a
>bit of extra computation.

Check out "ida"

>
>Thanks,
>Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/15] devlink: add operation to take an immediate snapshot
  2020-02-03 21:30       ` Jiri Pirko
@ 2020-02-04 19:20         ` Jacob Keller
  0 siblings, 0 replies; 62+ messages in thread
From: Jacob Keller @ 2020-02-04 19:20 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang


On 2/3/2020 1:30 PM, Jiri Pirko wrote:
> Mon, Feb 03, 2020 at 08:32:36PM CET, jacob.e.keller@intel.com wrote:
> 
> Check out "ida"
> 

Oh, that looks perfect.

Thanks,
Jake

>>
>> Thanks,
>> Jake

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2020-02-04 19:20 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-30 22:58 [RFC PATCH 00/13] devlink direct region reading Jacob Keller
2020-01-30 22:58 ` [PATCH 01/15] devlink: prepare to support region operations Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-02-03 11:35   ` Jiri Pirko
2020-02-03 16:48     ` Jacob Keller
2020-02-03 17:07     ` Jacob Keller
2020-02-03 17:10       ` Jacob Keller
2020-02-03 17:14     ` Jacob Keller
2020-02-03 17:17     ` Jacob Keller
2020-01-30 22:58 ` [PATCH 02/15] devlink: add functions to take snapshot while locked Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:09     ` Jacob Keller
2020-02-03 11:39   ` Jiri Pirko
2020-02-03 16:45     ` Jacob Keller
2020-01-30 22:58 ` [PATCH 03/15] devlink: add operation to take an immediate snapshot Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-02-03  8:19   ` Yunsheng Lin
2020-02-03 11:50     ` Jiri Pirko
2020-02-03 11:50   ` Jiri Pirko
2020-02-03 16:33     ` Jacob Keller
2020-02-03 19:32     ` Jacob Keller
2020-02-03 21:30       ` Jiri Pirko
2020-02-04 19:20         ` Jacob Keller
2020-01-30 22:58 ` [PATCH 04/15] netdevsim: support taking immediate snapshot via devlink Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:12     ` Jacob Keller
2020-01-30 22:59 ` [PATCH 05/15] ice: use __le16 types for explicitly Little Endian values Jacob Keller
2020-01-30 22:59 ` [PATCH 06/15] ice: create function to read a section of the NVM and Shadow RAM Jacob Keller
2020-01-30 22:59 ` [PATCH 07/15] ice: implement full NVM read from ETHTOOL_GEEPROM Jacob Keller
2020-01-30 22:59 ` [PATCH 08/15] devlink: add devres managed devlinkm_alloc and devlinkm_free Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:16     ` Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-02-01  0:51     ` Jacob Keller
2020-02-01 17:43       ` Jakub Kicinski
2020-02-03 16:35         ` Jacob Keller
2020-02-03 11:29   ` Jiri Pirko
2020-02-03 16:56     ` Jacob Keller
2020-01-30 22:59 ` [PATCH 09/15] ice: enable initial devlink support Jacob Keller
2020-01-30 22:59 ` [PATCH 10/15] ice: add basic handler for devlink .info_get Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:25     ` Jacob Keller
2020-01-30 22:59 ` [PATCH 11/15] ice: add board identifier info to " Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:26     ` Jacob Keller
2020-01-30 22:59 ` [PATCH 12/15] ice: add a devlink region to dump shadow RAM contents Jacob Keller
2020-01-30 22:59 ` [PATCH 13/15] devlink: support directly reading from region memory Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:27     ` Jacob Keller
2020-01-31 19:15     ` Jacob Keller
2020-02-03 13:44   ` Jiri Pirko
2020-02-03 16:40     ` Jacob Keller
2020-01-30 22:59 ` [PATCH 14/15] ice: support direct read of the shadow ram region Jacob Keller
2020-01-30 22:59 ` [PATCH 15/15] ice: add ice.rst devlink documentation file Jacob Keller
2020-01-31 18:07   ` Jakub Kicinski
2020-01-31 18:28     ` Jacob Keller
2020-01-30 22:59 ` [RFC PATCH 1/3] Update kernel headers Jacob Keller
2020-01-30 22:59 ` [RFC PATCH 2/3] devlink: add support for DEVLINK_CMD_REGION_TAKE_SNAPSHOT Jacob Keller
2020-01-30 22:59 ` [RFC PATCH 3/3] devlink: stop requiring snapshot for regions Jacob Keller
2020-01-30 23:03 ` [RFC PATCH 00/13] devlink direct region reading Jacob Keller
2020-01-31 18:06 ` Jakub Kicinski
2020-01-31 18:09   ` Jacob Keller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.