* [RFC PATCH v2 00/22] devlink region updates
@ 2020-02-14 23:21 Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 01/22] ice: use __le16 types for explicitly Little Endian values Jacob Keller
                   ` (24 more replies)
  0 siblings, 25 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:21 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

This is a second revision of the previous RFC series I sent to enable two
new devlink region features.

The original series can be viewed on the list archives at

https://lore.kernel.org/netdev/20200130225913.1671982-1-jacob.e.keller@intel.com/

Overall, this series can be broken into 5 phases:

 1) implement basic devlink support in the ice driver, including .info_get
 2) convert regions to use the new devlink_region_ops structure
 3) implement support for DEVLINK_CMD_REGION_NEW
 4) implement support for directly reading from a region
 5) use these new features in the ice driver for the Shadow RAM region

(1) comprises 6 patches for the ice driver that add the devlink framework
and clean up a few places in the code in preparation for the new region.

(2) comprises 2 patches which convert regions to use the new
devlink_region_ops structure, and additionally move the snapshot destructor
to a region operation.

(3) comprises 6 patches to enable supporting the DEVLINK_CMD_REGION_NEW
operation. This replaces what was previously the
DEVLINK_CMD_REGION_TAKE_SNAPSHOT, as per Jiri's suggestion. The new
operation supports specifying the requested id for the snapshot. To make
that possible, snapshot id management is first refactored to use an IDR.
The extra complexity of the IDR is needed so that snapshot IDs can still be
generated in a way that lets multiple regions share the same ID when their
snapshots are triggered at the same time.
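
To illustrate the id sharing described in (3), here is a minimal userspace
sketch of reference-counted snapshot ids. It is only a toy model: the table
size and the names snapshot_ids, snapshot_id_get, snapshot_id_ref and
snapshot_id_put are hypothetical, and a fixed array stands in for the IDR
that the series uses in the kernel.

```c
#include <stdint.h>

#define MAX_SNAPSHOT_IDS 8u /* hypothetical table size for this sketch */

/* Toy stand-in for the IDR: one reference count per snapshot id. */
struct snapshot_ids {
	unsigned int refs[MAX_SNAPSHOT_IDS]; /* 0 means the id is free */
};

/* Reserve the lowest free id with one reference; -1 if the table is full. */
int snapshot_id_get(struct snapshot_ids *t)
{
	for (unsigned int id = 0; id < MAX_SNAPSHOT_IDS; id++) {
		if (t->refs[id] == 0) {
			t->refs[id] = 1;
			return (int)id;
		}
	}
	return -1;
}

/* Take an extra reference, e.g. when a second region stores a snapshot
 * under an id that is already in use. Fails for free or invalid ids.
 */
int snapshot_id_ref(struct snapshot_ids *t, unsigned int id)
{
	if (id >= MAX_SNAPSHOT_IDS || t->refs[id] == 0)
		return -1;
	t->refs[id]++;
	return 0;
}

/* Drop one reference; the id becomes reusable once the count hits zero. */
void snapshot_id_put(struct snapshot_ids *t, unsigned int id)
{
	if (id < MAX_SNAPSHOT_IDS && t->refs[id] > 0)
		t->refs[id]--;
}
```

In this model an id stays reserved until every holder has dropped its
reference, which is the property that lets several regions share one id.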

(4) comprises 6 patches for modifying DEVLINK_CMD_REGION_READ so that it
accepts a request without a snapshot id. A new region operation is defined
for regions to optionally support the requests. The first few patches
refactor and simplify the functions used so that adding the new read method
reuses logic where possible.
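
The optional region operation in (4) can be pictured with a small userspace
sketch, assuming a callback table where the read hook may be left NULL. The
names region_ops_model, region_direct_read and buffer_read are made up for
illustration and are not the structures or signatures added by the patches.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ops table: the read hook is optional and may be NULL. */
struct region_ops_model {
	const char *name;
	int (*read)(void *priv, uint64_t offset, uint32_t size, uint8_t *out);
};

/* Dispatch a direct read, rejecting regions that did not provide the op. */
int region_direct_read(const struct region_ops_model *ops, void *priv,
		       uint64_t offset, uint32_t size, uint8_t *out)
{
	if (!ops->read)
		return -EOPNOTSUPP;
	return ops->read(priv, offset, size, out);
}

/* Example op: copy from a flat byte buffer passed through priv. The
 * caller is responsible for keeping offset + size within the buffer.
 */
int buffer_read(void *priv, uint64_t offset, uint32_t size, uint8_t *out)
{
	memcpy(out, (const uint8_t *)priv + offset, size);
	return 0;
}
```

A region that leaves the hook unset keeps working for snapshot reads and
simply reports that direct reads are unsupported.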

(5) finally comprises a single patch to implement a region for the ice
device hardware's Shadow RAM contents.

Note that I plan to submit the ice patches through the Intel Wired LAN list,
but am sending the complete set here as an RFC in case there is further
feedback, and so that reviewers can have the correct context.

I expect to get further feedback on this RFC revision, and will hopefully
send the patches as non-RFC following this, if feedback looks good. Thank
you for the diligent review.

Changes since v1:

* reword some comments and variable names in the ice driver that used the
  term "page" to use the term "sector" to avoid confusion with the PAGE_SIZE
  of the system.
* Fix a bug in the ice_read_flat_nvm function caused by misusing the
  last_cmd variable
* Remove the devlinkm* functions and just use devm_add_action in the ice
  driver for managing the devlink memory similar to how the PF memory was
  managed by the devm_kzalloc.
* Fix typos in a couple of function comments in ice_devlink.c
* use dev_err instead of dev_warn for an error case where the main VSI can't
  be found.
* Only call devlink_port_type_eth_set if the VSI has a netdev
* Move where the devlink_port is created in the ice_probe flow
* Update the new ice.rst documentation for info versions, providing
  clearer descriptions of the parameters. Give examples for each field as
  well. Squash the documentation additions into the relevant patches.
* Add a new patch to the ice driver which renames some variables referring
  to the Option ROM version.
* keep the string constants in the mlx4 crdump.c file, converting them to
  "const char * const" so that the compiler understands they can be used in
  constant initializers.
* Add a patch to convert snapshot destructors into a region operation
* Add a patch to fix a trivial typo in a devlink function comment
* Use __ as a prefix for static internal functions instead of a _locked
  suffix.
* Refactor snapshot id management to use an IDR.
* Implement DEVLINK_CMD_REGION_NEW instead of
  DEVLINK_CMD_REGION_TAKE_SNAPSHOT
* Add several patches which refactor devlink_nl_cmd_region_snapshot_fill
* Use the new cb_ and cb_priv parameters to implement what was previously
  a separate function called devlink_nl_cmd_region_direct_fill

Jacob Keller (21):
  ice: use __le16 types for explicitly Little Endian values
  ice: create function to read a section of the NVM and Shadow RAM
  ice: enable initial devlink support
  ice: rename variables used for Option ROM version
  ice: add basic handler for devlink .info_get
  ice: add board identifier info to devlink .info_get
  devlink: prepare to support region operations
  devlink: convert snapshot destructor callback to region op
  devlink: trivial: fix tab in function documentation
  devlink: add functions to take snapshot while locked
  devlink: convert snapshot id getter to return an error
  devlink: track snapshot ids using an IDR and refcounts
  devlink: implement DEVLINK_CMD_REGION_NEW
  netdevsim: support taking immediate snapshot via devlink
  devlink: simplify arguments for read_snapshot_fill
  devlink: use min_t to calculate data_size
  devlink: report extended error message in region_read_dumpit
  devlink: remove unnecessary parameter from chunk_fill function
  devlink: refactor region_read_snapshot_fill to use a callback function
  devlink: support directly reading from region memory
  ice: add a devlink region to dump shadow RAM contents

Jesse Brandeburg (1):
  ice: implement full NVM read from ETHTOOL_GEEPROM

 .../networking/devlink/devlink-region.rst     |  20 +-
 Documentation/networking/devlink/ice.rst      |  87 ++++
 Documentation/networking/devlink/index.rst    |   1 +
 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/ice/Makefile       |   1 +
 drivers/net/ethernet/intel/ice/ice.h          |   6 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   3 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  85 +---
 drivers/net/ethernet/intel/ice/ice_common.h   |  10 +-
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 360 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h  |  17 +
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  44 +-
 drivers/net/ethernet/intel/ice/ice_main.c     |  23 +-
 drivers/net/ethernet/intel/ice/ice_nvm.c      | 354 +++++++------
 drivers/net/ethernet/intel/ice/ice_nvm.h      |  12 +
 drivers/net/ethernet/intel/ice/ice_type.h     |  17 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c   |  32 +-
 drivers/net/netdevsim/dev.c                   |  41 +-
 include/net/devlink.h                         |  38 +-
 net/core/devlink.c                            | 465 ++++++++++++++----
 .../drivers/net/netdevsim/devlink.sh          |  15 +
 21 files changed, 1257 insertions(+), 375 deletions(-)
 create mode 100644 Documentation/networking/devlink/ice.rst
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h

-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 01/22] ice: use __le16 types for explicitly Little Endian values
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 02/22] ice: create function to read a section of the NVM and Shadow RAM Jacob Keller
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The ice_read_sr_aq function returns words in the Little Endian format.
Remove the need for __force and typecasting by using a local variable in
the ice_read_sr_word_aq function.

Additionally clarify explicitly that the ice_read_sr_aq function takes
storage for __le16 values instead of using u16.

Being explicit about the endianness of this data helps when using tools
like sparse to catch endian-related issues.
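
As a rough userspace analogue of what le16_to_cpu does for a value read from
the Shadow RAM, the sketch below decodes a Little Endian word from its raw
bytes. The helper name le16_decode is hypothetical; kernel code should keep
using le16_to_cpu together with the __le16 annotation so that sparse can
check it.

```c
#include <stdint.h>

/* Decode a 16-bit Little Endian word from its two raw bytes. The result is
 * the same on little- and big-endian hosts because the bytes are combined
 * arithmetically instead of through a pointer cast.
 */
uint16_t le16_decode(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
}
```

For example, the byte sequence 0x34, 0x12 decodes to the value 0x1234.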

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_nvm.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 7525ac50742e..46db9bb0977f 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -80,13 +80,14 @@ ice_check_sr_access_params(struct ice_hw *hw, u32 offset, u16 words)
  * @hw: pointer to the HW structure
  * @offset: offset in words from module start
  * @words: number of words to read
- * @data: buffer for words reads from Shadow RAM
+ * @data: storage for the words read from Shadow RAM (Little Endian)
  * @last_command: tells the AdminQ that this is the last command
  *
- * Reads 16-bit word buffers from the Shadow RAM using the admin command.
+ * Reads 16-bit Little Endian word buffers from the Shadow RAM using the admin
+ * command.
  */
 static enum ice_status
-ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, u16 *data,
+ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
 	       bool last_command)
 {
 	enum ice_status status;
@@ -116,10 +117,11 @@ static enum ice_status
 ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 *data)
 {
 	enum ice_status status;
+	__le16 data_local;
 
-	status = ice_read_sr_aq(hw, offset, 1, data, true);
+	status = ice_read_sr_aq(hw, offset, 1, &data_local, true);
 	if (!status)
-		*data = le16_to_cpu(*(__force __le16 *)data);
+		*data = le16_to_cpu(data_local);
 
 	return status;
 }
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 02/22] ice: create function to read a section of the NVM and Shadow RAM
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 01/22] ice: use __le16 types for explicitly Little Endian values Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 03/22] ice: implement full NVM read from ETHTOOL_GEEPROM Jacob Keller
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The NVM contents are read via firmware by using the ice_aq_read_nvm
function. This function has a couple of limits:

1) The AdminQ commands can only take buffers sized up to 4Kb. Thus, any
   larger read must be split into multiple reads.
2) When reading from the Shadow RAM, reads must not cross sector
   boundaries. The sectors are also 4Kb in size.

Implement the ice_read_flat_nvm function to read portions of the NVM by
flat offset. That is, to read using offsets from the start of the NVM
rather than from a specific module.

This function will be able to read both from the NVM and from the Shadow
RAM. For simplicity, NVM reads will always be broken up so that they do not
cross 4Kb sector boundaries, even though this is only required when reading
from the Shadow RAM.
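
The splitting rule can be sketched in userspace as follows, assuming a
4096-byte sector. SECTOR_SIZE, next_chunk_size and count_chunks are
illustrative names only; the driver code in the diff below uses
ICE_AQ_MAX_BUF_LEN and min_t for the same calculation.

```c
#include <stdint.h>

#define SECTOR_SIZE 4096u /* assumed: 4Kb AdminQ buffer and sector size */

/* Size of the next chunk for a read starting at offset with remaining
 * bytes left, chosen so the chunk never crosses a sector boundary.
 */
uint32_t next_chunk_size(uint32_t offset, uint32_t remaining)
{
	uint32_t to_boundary = SECTOR_SIZE - (offset % SECTOR_SIZE);

	return remaining < to_boundary ? remaining : to_boundary;
}

/* Number of separate reads needed to cover length bytes from offset. */
uint32_t count_chunks(uint32_t offset, uint32_t length)
{
	uint32_t done = 0, chunks = 0;

	while (done < length) {
		done += next_chunk_size(offset + done, length - done);
		chunks++;
	}
	return chunks;
}
```

A 200 byte read at offset 4000 is therefore split into a 96 byte read up to
the boundary followed by a 104 byte read. Unlike the driver's do/while loop,
this sketch issues no read at all for a zero length request.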

Use this new function as the implementation of ice_read_sr_word_aq.

The ice_read_sr_buf_aq function is not modified here. This is because
a following change will remove the only caller of that function in favor
of directly using ice_read_flat_nvm. Thus, there is little benefit to
changing it now only to remove it momentarily. At the same time, the
ice_read_sr_aq function will also be removed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  2 +
 drivers/net/ethernet/intel/ice/ice_nvm.c      | 87 +++++++++++++++++--
 drivers/net/ethernet/intel/ice/ice_nvm.h      |  3 +
 3 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 6873998cf145..04bc092e8f45 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1250,6 +1250,8 @@ struct ice_aqc_nvm {
 	__le32 addr_low;
 };
 
+#define ICE_AQC_NVM_START_POINT			0
+
 /* NVM Checksum Command (direct, 0x0706) */
 struct ice_aqc_nvm_checksum {
 	u8 flags;
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 46db9bb0977f..e2214c076ca9 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -11,13 +11,15 @@
  * @length: length of the section to be read (in bytes from the offset)
  * @data: command buffer (size [bytes] = length)
  * @last_command: tells if this is the last command in a series
+ * @read_shadow_ram: tell if this is a shadow RAM read
  * @cd: pointer to command details structure or NULL
  *
  * Read the NVM using the admin queue commands (0x0701)
  */
 static enum ice_status
 ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
-		void *data, bool last_command, struct ice_sq_cd *cd)
+		void *data, bool last_command, bool read_shadow_ram,
+		struct ice_sq_cd *cd)
 {
 	struct ice_aq_desc desc;
 	struct ice_aqc_nvm *cmd;
@@ -30,6 +32,9 @@ ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
 
 	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_nvm_read);
 
+	if (!read_shadow_ram && module_typeid == ICE_AQC_NVM_START_POINT)
+		cmd->cmd_flags |= ICE_AQC_NVM_FLASH_ONLY;
+
 	/* If this is the last command in a series, set the proper flag. */
 	if (last_command)
 		cmd->cmd_flags |= ICE_AQC_NVM_LAST_CMD;
@@ -41,6 +46,68 @@ ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
 	return ice_aq_send_cmd(hw, &desc, data, length, cd);
 }
 
+/**
+ * ice_read_flat_nvm - Read portion of NVM by flat offset
+ * @hw: pointer to the HW struct
+ * @offset: offset from beginning of NVM
+ * @length: (in) number of bytes to read; (out) number of bytes actually read
+ * @data: buffer to return data in (sized to fit the specified length)
+ * @read_shadow_ram: if true, read from shadow RAM instead of NVM
+ *
+ * Reads a portion of the NVM, as a flat memory space. This function correctly
+ * breaks read requests across Shadow RAM sectors and ensures that no single
+ * read request exceeds the maximum 4Kb read for a single AdminQ command.
+ *
+ * Returns a status code on failure. Note that the data pointer may be
+ * partially updated if some reads succeed before a failure.
+ */
+enum ice_status
+ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
+		  bool read_shadow_ram)
+{
+	enum ice_status status;
+	u32 inlen = *length;
+	u32 bytes_read = 0;
+	bool last_cmd;
+
+	*length = 0;
+
+	/* Verify the length of the read if this is for the Shadow RAM */
+	if (read_shadow_ram && ((offset + inlen) > (hw->nvm.sr_words * 2u))) {
+		ice_debug(hw, ICE_DBG_NVM,
+			  "NVM error: requested offset is beyond Shadow RAM limit\n");
+		return ICE_ERR_PARAM;
+	}
+
+	do {
+		u32 read_size, sector_offset;
+
+		/* ice_aq_read_nvm cannot read more than 4Kb at a time.
+		 * Additionally, a read from the Shadow RAM may not cross over
+		 * a sector boundary. Conveniently, the sector size is also
+		 * 4Kb.
+		 */
+		sector_offset = offset % ICE_AQ_MAX_BUF_LEN;
+		read_size = min_t(u32, ICE_AQ_MAX_BUF_LEN - sector_offset,
+				  inlen - bytes_read);
+
+		last_cmd = !(bytes_read + read_size < inlen);
+
+		status = ice_aq_read_nvm(hw, ICE_AQC_NVM_START_POINT,
+					 offset, read_size,
+					 data + bytes_read, last_cmd,
+					 read_shadow_ram, NULL);
+		if (status)
+			break;
+
+		bytes_read += read_size;
+		offset += read_size;
+	} while (!last_cmd);
+
+	*length = bytes_read;
+	return status;
+}
+
 /**
  * ice_check_sr_access_params - verify params for Shadow RAM R/W operations.
  * @hw: pointer to the HW structure
@@ -100,7 +167,7 @@ ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
 	 */
 	if (!status)
 		status = ice_aq_read_nvm(hw, 0, 2 * offset, 2 * words, data,
-					 last_command, NULL);
+					 last_command, true, NULL);
 
 	return status;
 }
@@ -111,19 +178,25 @@ ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
  * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
  * @data: word read from the Shadow RAM
  *
- * Reads one 16 bit word from the Shadow RAM using the ice_read_sr_aq method.
+ * Reads one 16 bit word from the Shadow RAM using ice_read_flat_nvm.
  */
 static enum ice_status
 ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 *data)
 {
+	u32 bytes = sizeof(u16);
 	enum ice_status status;
 	__le16 data_local;
 
-	status = ice_read_sr_aq(hw, offset, 1, &data_local, true);
-	if (!status)
-		*data = le16_to_cpu(data_local);
+	/* Note that ice_read_flat_nvm takes into account the 4Kb AdminQ and
+	 * Shadow RAM sector restrictions necessary when reading from the NVM.
+	 */
+	status = ice_read_flat_nvm(hw, offset * sizeof(u16), &bytes,
+				   (u8 *)&data_local, true);
+	if (status)
+		return status;
 
-	return status;
+	*data = le16_to_cpu(data_local);
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index a9fa011c22c6..4245ef988edf 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -4,5 +4,8 @@
 #ifndef _ICE_NVM_H_
 #define _ICE_NVM_H_
 
+enum ice_status
+ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
+		  bool read_shadow_ram);
 enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data);
 #endif /* _ICE_NVM_H_ */
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 03/22] ice: implement full NVM read from ETHTOOL_GEEPROM
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 01/22] ice: use __le16 types for explicitly Little Endian values Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 02/22] ice: create function to read a section of the NVM and Shadow RAM Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 04/22] ice: enable initial devlink support Jacob Keller
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev
  Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jesse Brandeburg,
	Jacob Keller

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The current implementation of .get_eeprom only enables reading from the
Shadow RAM portion of the NVM contents. Implement support for reading
the entire flash contents instead of only the initial portion contained
in the Shadow RAM.

A complete dump can take several seconds, but the ETHTOOL_GEEPROM ioctl
is capable of reading only a limited portion at a time by specifying the
offset and length to read.

In order to perform the reads directly, several functions are made
non-static. Additionally, the unused ice_read_sr_buf_aq and ice_read_sr_buf
functions are removed.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_common.h   |   3 -
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  36 +++--
 drivers/net/ethernet/intel/ice/ice_nvm.c      | 150 +-----------------
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   4 +
 5 files changed, 31 insertions(+), 163 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 04bc092e8f45..ba4e4f9a89ad 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1758,6 +1758,7 @@ enum ice_aq_err {
 	ICE_AQ_RC_ENOMEM	= 9,  /* Out of memory */
 	ICE_AQ_RC_EBUSY		= 12, /* Device or resource busy */
 	ICE_AQ_RC_EEXIST	= 13, /* Object already exists */
+	ICE_AQ_RC_EINVAL	= 14, /* Invalid argument */
 	ICE_AQ_RC_ENOSPC	= 16, /* No space left or allocation failure */
 	ICE_AQ_RC_ENOSYS	= 17, /* Function not implemented */
 	ICE_AQ_RC_ENOSEC	= 24, /* Missing security manifest */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index f9fc005d35a7..9d5e86c9f886 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -38,9 +38,6 @@ enum ice_status
 ice_alloc_hw_res(struct ice_hw *hw, u16 type, u16 num, bool btm, u16 *res);
 enum ice_status
 ice_free_hw_res(struct ice_hw *hw, u16 type, u16 num, u16 *res);
-enum ice_status ice_init_nvm(struct ice_hw *hw);
-enum ice_status
-ice_read_sr_buf(struct ice_hw *hw, u16 offset, u16 *words, u16 *data);
 enum ice_status
 ice_aq_alloc_free_res(struct ice_hw *hw, u16 num_entries,
 		      struct ice_aqc_alloc_free_res_elem *buf, u16 buf_size,
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index b002ab4e5838..223e8e707dcb 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -251,39 +251,51 @@ ice_get_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom,
 	       u8 *bytes)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
-	u16 first_word, last_word, nwords;
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
 	struct ice_hw *hw = &pf->hw;
 	enum ice_status status;
 	struct device *dev;
 	int ret = 0;
-	u16 *buf;
+	u8 *buf;
 
 	dev = ice_pf_to_dev(pf);
 
 	eeprom->magic = hw->vendor_id | (hw->device_id << 16);
+	netdev_dbg(netdev, "GEEPROM cmd 0x%08x, offset 0x%08x, len 0x%08x\n",
+		   eeprom->cmd, eeprom->offset, eeprom->len);
 
-	first_word = eeprom->offset >> 1;
-	last_word = (eeprom->offset + eeprom->len - 1) >> 1;
-	nwords = last_word - first_word + 1;
-
-	buf = devm_kcalloc(dev, nwords, sizeof(u16), GFP_KERNEL);
+	buf = kzalloc(eeprom->len, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
-	status = ice_read_sr_buf(hw, first_word, &nwords, buf);
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
 	if (status) {
-		dev_err(dev, "ice_read_sr_buf failed, err %d aq_err %d\n",
+		dev_err(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
 			status, hw->adminq.sq_last_status);
-		eeprom->len = sizeof(u16) * nwords;
 		ret = -EIO;
 		goto out;
 	}
 
-	memcpy(bytes, (u8 *)buf + (eeprom->offset & 1), eeprom->len);
+	status = ice_read_flat_nvm(hw, eeprom->offset, &eeprom->len, buf, false);
+	if (status == ICE_ERR_AQ_ERROR &&
+	    hw->adminq.sq_last_status == ICE_AQ_RC_EINVAL) {
+		/* do nothing, we reached the end */
+		ice_release_nvm(hw);
+		goto out;
+	} else if (status) {
+		dev_err(dev, "ice_read_flat_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		ret = -EIO;
+		ice_release_nvm(hw);
+		goto out;
+	}
+
+	ice_release_nvm(hw);
+
+	memcpy(bytes, buf, eeprom->len);
 out:
-	devm_kfree(dev, buf);
+	kfree(buf);
 	return ret;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index e2214c076ca9..aaf5fd064725 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -108,70 +108,6 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 	return status;
 }
 
-/**
- * ice_check_sr_access_params - verify params for Shadow RAM R/W operations.
- * @hw: pointer to the HW structure
- * @offset: offset in words from module start
- * @words: number of words to access
- */
-static enum ice_status
-ice_check_sr_access_params(struct ice_hw *hw, u32 offset, u16 words)
-{
-	if ((offset + words) > hw->nvm.sr_words) {
-		ice_debug(hw, ICE_DBG_NVM,
-			  "NVM error: offset beyond SR lmt.\n");
-		return ICE_ERR_PARAM;
-	}
-
-	if (words > ICE_SR_SECTOR_SIZE_IN_WORDS) {
-		/* We can access only up to 4KB (one sector), in one AQ write */
-		ice_debug(hw, ICE_DBG_NVM,
-			  "NVM error: tried to access %d words, limit is %d.\n",
-			  words, ICE_SR_SECTOR_SIZE_IN_WORDS);
-		return ICE_ERR_PARAM;
-	}
-
-	if (((offset + (words - 1)) / ICE_SR_SECTOR_SIZE_IN_WORDS) !=
-	    (offset / ICE_SR_SECTOR_SIZE_IN_WORDS)) {
-		/* A single access cannot spread over two sectors */
-		ice_debug(hw, ICE_DBG_NVM,
-			  "NVM error: cannot spread over two sectors.\n");
-		return ICE_ERR_PARAM;
-	}
-
-	return 0;
-}
-
-/**
- * ice_read_sr_aq - Read Shadow RAM.
- * @hw: pointer to the HW structure
- * @offset: offset in words from module start
- * @words: number of words to read
- * @data: storage for the words read from Shadow RAM (Little Endian)
- * @last_command: tells the AdminQ that this is the last command
- *
- * Reads 16-bit Little Endian word buffers from the Shadow RAM using the admin
- * command.
- */
-static enum ice_status
-ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, __le16 *data,
-	       bool last_command)
-{
-	enum ice_status status;
-
-	status = ice_check_sr_access_params(hw, offset, words);
-
-	/* values in "offset" and "words" parameters are sized as words
-	 * (16 bits) but ice_aq_read_nvm expects these values in bytes.
-	 * So do this conversion while calling ice_aq_read_nvm.
-	 */
-	if (!status)
-		status = ice_aq_read_nvm(hw, 0, 2 * offset, 2 * words, data,
-					 last_command, true, NULL);
-
-	return status;
-}
-
 /**
  * ice_read_sr_word_aq - Reads Shadow RAM via AQ
  * @hw: pointer to the HW structure
@@ -199,63 +135,6 @@ ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 *data)
 	return 0;
 }
 
-/**
- * ice_read_sr_buf_aq - Reads Shadow RAM buf via AQ
- * @hw: pointer to the HW structure
- * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
- * @words: (in) number of words to read; (out) number of words actually read
- * @data: words read from the Shadow RAM
- *
- * Reads 16 bit words (data buf) from the SR using the ice_read_sr_aq
- * method. Ownership of the NVM is taken before reading the buffer and later
- * released.
- */
-static enum ice_status
-ice_read_sr_buf_aq(struct ice_hw *hw, u16 offset, u16 *words, u16 *data)
-{
-	enum ice_status status;
-	bool last_cmd = false;
-	u16 words_read = 0;
-	u16 i = 0;
-
-	do {
-		u16 read_size, off_w;
-
-		/* Calculate number of bytes we should read in this step.
-		 * It's not allowed to read more than one page at a time or
-		 * to cross page boundaries.
-		 */
-		off_w = offset % ICE_SR_SECTOR_SIZE_IN_WORDS;
-		read_size = off_w ?
-			min_t(u16, *words,
-			      (ICE_SR_SECTOR_SIZE_IN_WORDS - off_w)) :
-			min_t(u16, (*words - words_read),
-			      ICE_SR_SECTOR_SIZE_IN_WORDS);
-
-		/* Check if this is last command, if so set proper flag */
-		if ((words_read + read_size) >= *words)
-			last_cmd = true;
-
-		status = ice_read_sr_aq(hw, offset, read_size,
-					data + words_read, last_cmd);
-		if (status)
-			goto read_nvm_buf_aq_exit;
-
-		/* Increment counter for words already read and move offset to
-		 * new read location
-		 */
-		words_read += read_size;
-		offset += read_size;
-	} while (words_read < *words);
-
-	for (i = 0; i < *words; i++)
-		data[i] = le16_to_cpu(((__force __le16 *)data)[i]);
-
-read_nvm_buf_aq_exit:
-	*words = words_read;
-	return status;
-}
-
 /**
  * ice_acquire_nvm - Generic request for acquiring the NVM ownership
  * @hw: pointer to the HW structure
@@ -263,7 +142,7 @@ ice_read_sr_buf_aq(struct ice_hw *hw, u16 offset, u16 *words, u16 *data)
  *
  * This function will request NVM ownership.
  */
-static enum ice_status
+enum ice_status
 ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access)
 {
 	if (hw->nvm.blank_nvm_mode)
@@ -278,7 +157,7 @@ ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access)
  *
  * This function will release NVM ownership.
  */
-static void ice_release_nvm(struct ice_hw *hw)
+void ice_release_nvm(struct ice_hw *hw)
 {
 	if (hw->nvm.blank_nvm_mode)
 		return;
@@ -412,31 +291,6 @@ enum ice_status ice_init_nvm(struct ice_hw *hw)
 	return 0;
 }
 
-/**
- * ice_read_sr_buf - Reads Shadow RAM buf and acquire lock if necessary
- * @hw: pointer to the HW structure
- * @offset: offset of the Shadow RAM word to read (0x000000 - 0x001FFF)
- * @words: (in) number of words to read; (out) number of words actually read
- * @data: words read from the Shadow RAM
- *
- * Reads 16 bit words (data buf) from the SR using the ice_read_nvm_buf_aq
- * method. The buf read is preceded by the NVM ownership take
- * and followed by the release.
- */
-enum ice_status
-ice_read_sr_buf(struct ice_hw *hw, u16 offset, u16 *words, u16 *data)
-{
-	enum ice_status status;
-
-	status = ice_acquire_nvm(hw, ICE_RES_READ);
-	if (!status) {
-		status = ice_read_sr_buf_aq(hw, offset, words, data);
-		ice_release_nvm(hw);
-	}
-
-	return status;
-}
-
 /**
  * ice_nvm_validate_checksum
  * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 4245ef988edf..7375f6b96919 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -4,8 +4,12 @@
 #ifndef _ICE_NVM_H_
 #define _ICE_NVM_H_
 
+enum ice_status
+ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access);
+void ice_release_nvm(struct ice_hw *hw);
 enum ice_status
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
+enum ice_status ice_init_nvm(struct ice_hw *hw);
 enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data);
 #endif /* _ICE_NVM_H_ */
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 04/22] ice: enable initial devlink support
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (2 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 03/22] ice: implement full NVM read from ETHTOOL_GEEPROM Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 16:30   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 05/22] ice: rename variables used for Option ROM version Jacob Keller
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Begin implementing support for the devlink interface with the ice
driver.

The pf structure is currently memory managed through devres, via
devm_kzalloc. To mimic this behavior, after allocating the devlink
pointer, use devm_add_action to add a teardown action for releasing the
devlink memory on exit.
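
The teardown-action idea can be modelled in userspace as a small stack of
callbacks that run in reverse order of registration. This is only a sketch
of the pattern: dev_model, model_add_action, model_release_all and
mark_freed are hypothetical names, not the devres API.

```c
#include <stddef.h>

#define MAX_ACTIONS 16 /* hypothetical limit for this sketch */

struct action {
	void (*fn)(void *data);
	void *data;
};

/* Toy device: remembers the teardown actions registered against it. */
struct dev_model {
	struct action actions[MAX_ACTIONS];
	size_t count;
};

/* Register a teardown action; returns -1 if the table is full. */
int model_add_action(struct dev_model *dev, void (*fn)(void *), void *data)
{
	if (dev->count >= MAX_ACTIONS)
		return -1;
	dev->actions[dev->count].fn = fn;
	dev->actions[dev->count].data = data;
	dev->count++;
	return 0;
}

/* Run all actions, most recently registered first, then forget them. */
void model_release_all(struct dev_model *dev)
{
	while (dev->count > 0) {
		struct action *a = &dev->actions[--dev->count];

		a->fn(a->data);
	}
}

/* Example teardown callback: flags the resource behind data as freed. */
void mark_freed(void *data)
{
	*(int *)data = 1;
}
```

Registering the release of the devlink memory as such an action ties its
lifetime to the device in the same way as the devres-managed pf structure.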

The ice hardware is a multi-function PCIe device. Thus, each physical
function will get its own devlink instance. This means that each
function will be treated independently, with its own parameters and
configuration. This is done because the ice driver loads a separate
instance for each function.

Due to this, the implementation does not enable devlink to manage
device-wide resources or configuration, as each physical function will
be treated independently. This is done for simplicity, as managing
a devlink instance across multiple driver instances would significantly
increase the complexity for minimal gain.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/Kconfig           |   1 +
 drivers/net/ethernet/intel/ice/Makefile      |   1 +
 drivers/net/ethernet/intel/ice/ice.h         |   4 +
 drivers/net/ethernet/intel/ice/ice_devlink.c | 119 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h |  14 +++
 drivers/net/ethernet/intel/ice/ice_main.c    |  19 ++-
 6 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 154e2e818ec6..ad34e4335df2 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -294,6 +294,7 @@ config ICE
 	tristate "Intel(R) Ethernet Connection E800 Series Support"
 	default n
 	depends on PCI_MSI
+	select NET_DEVLINK
 	---help---
 	  This driver supports Intel(R) Ethernet Connection E800 Series of
 	  devices.  For more information on how to identify your adapter, go
diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 59544b0fc086..e2502ff3229d 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -19,6 +19,7 @@ ice-y := ice_main.o	\
 	 ice_txrx.o	\
 	 ice_flex_pipe.o \
 	 ice_flow.o	\
+	 ice_devlink.o \
 	 ice_ethtool.o
 ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o
 ice-$(CONFIG_DCB) += ice_dcb.o ice_dcb_nl.o ice_dcb_lib.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index cb10abb14e11..a195135f840f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -36,6 +36,7 @@
 #include <linux/avf/virtchnl.h>
 #include <net/ipv6.h>
 #include <net/xdp_sock.h>
+#include <net/devlink.h>
 #include "ice_devids.h"
 #include "ice_type.h"
 #include "ice_txrx.h"
@@ -346,6 +347,9 @@ enum ice_pf_flags {
 struct ice_pf {
 	struct pci_dev *pdev;
 
+	/* devlink port data */
+	struct devlink_port devlink_port;
+
 	/* OS reserved IRQ details */
 	struct msix_entry *msix_entries;
 	struct ice_res_tracker *irq_tracker;
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
new file mode 100644
index 000000000000..2a72857c4b26
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Intel Corporation. */
+
+#include "ice.h"
+#include "ice_devlink.h"
+
+const struct devlink_ops ice_devlink_ops = {
+};
+
+static void ice_devlink_free(void *devlink_ptr)
+{
+	devlink_free((struct devlink *)devlink_ptr);
+}
+
+/**
+ * ice_allocate_pf - Allocate devlink and return PF structure pointer
+ * @dev: the device to allocate for
+ *
+ * Allocate a devlink instance for this device and return the private area as
+ * the PF structure. The devlink memory is kept track of through devres by
+ * adding an action to remove it when unwinding.
+ */
+struct ice_pf *ice_allocate_pf(struct device *dev)
+{
+	struct devlink *devlink;
+
+	devlink = devlink_alloc(&ice_devlink_ops, sizeof(struct ice_pf));
+	if (!devlink)
+		return NULL;
+
+	/* Add an action to teardown the devlink when unwinding the driver */
+	if (devm_add_action(dev, ice_devlink_free, devlink)) {
+		devlink_free(devlink);
+		return NULL;
+	}
+
+	return devlink_priv(devlink);
+}
+
+/**
+ * ice_devlink_register - Register devlink interface for this PF
+ * @pf: the PF to register the devlink for.
+ *
+ * Register the devlink instance associated with this physical function.
+ *
+ * @returns zero on success or an error code on failure.
+ */
+int ice_devlink_register(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = devlink_register(devlink, dev);
+	if (err) {
+		dev_err(dev, "devlink registration failed: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_devlink_unregister - Unregister devlink resources for this PF.
+ * @pf: the PF structure to cleanup
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void ice_devlink_unregister(struct ice_pf *pf)
+{
+	devlink_unregister(priv_to_devlink(pf));
+}
+
+/**
+ * ice_devlink_create_port - Create a devlink port for this PF
+ * @pf: the PF to create a port for
+ *
+ * Create and register a devlink_port for this PF. Note that although each
+ * physical function is connected to a separate devlink instance, the port
+ * will still be numbered according to the physical function id.
+ *
+ * @returns zero on success or an error code on failure.
+ */
+int ice_devlink_create_port(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct ice_vsi *vsi = ice_get_main_vsi(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	if (!vsi) {
+		dev_err(dev, "%s: unable to find main VSI\n", __func__);
+		return -EIO;
+	}
+
+	devlink_port_attrs_set(&pf->devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+			       pf->hw.pf_id, false, 0, NULL, 0);
+	err = devlink_port_register(devlink, &pf->devlink_port, pf->hw.pf_id);
+	if (err) {
+		dev_err(dev, "devlink_port_register failed: %d\n", err);
+		return err;
+	}
+	if (vsi->netdev)
+		devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
+
+	return 0;
+}
+
+/**
+ * ice_devlink_destroy_port - Destroy the devlink_port for this PF
+ * @pf: the PF to cleanup
+ *
+ * Unregisters the devlink_port structure associated with this PF.
+ */
+void ice_devlink_destroy_port(struct ice_pf *pf)
+{
+	devlink_port_type_clear(&pf->devlink_port);
+	devlink_port_unregister(&pf->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
new file mode 100644
index 000000000000..f94dc93c24c5
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019, Intel Corporation. */
+
+#ifndef _ICE_DEVLINK_H_
+#define _ICE_DEVLINK_H_
+
+struct ice_pf *ice_allocate_pf(struct device *dev);
+
+int ice_devlink_register(struct ice_pf *pf);
+void ice_devlink_unregister(struct ice_pf *pf);
+int ice_devlink_create_port(struct ice_pf *pf);
+void ice_devlink_destroy_port(struct ice_pf *pf);
+
+#endif /* _ICE_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 5ef28052c0f8..f2cca810977d 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -10,6 +10,7 @@
 #include "ice_lib.h"
 #include "ice_dcb_lib.h"
 #include "ice_dcb_nl.h"
+#include "ice_devlink.h"
 
 #define DRV_VERSION_MAJOR 0
 #define DRV_VERSION_MINOR 8
@@ -3166,7 +3167,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		return err;
 	}
 
-	pf = devm_kzalloc(dev, sizeof(*pf), GFP_KERNEL);
+	pf = ice_allocate_pf(dev);
 	if (!pf)
 		return -ENOMEM;
 
@@ -3204,6 +3205,12 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 
 	pf->msg_enable = netif_msg_init(debug, ICE_DFLT_NETIF_M);
 
+	err = ice_devlink_register(pf);
+	if (err) {
+		dev_err(dev, "ice_devlink_register failed: %d\n", err);
+		goto err_exit_unroll;
+	}
+
 #ifndef CONFIG_DYNAMIC_DEBUG
 	if (debug < -1)
 		hw->debug_mask = debug;
@@ -3295,6 +3302,11 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		goto err_alloc_sw_unroll;
 	}
 
+	err = ice_devlink_create_port(pf);
+	if (err)
+		goto err_alloc_sw_unroll;
+
+
 	clear_bit(__ICE_SERVICE_DIS, pf->state);
 
 	/* tell the firmware we are up */
@@ -3336,6 +3348,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	return 0;
 
 err_alloc_sw_unroll:
+	ice_devlink_destroy_port(pf);
 	set_bit(__ICE_SERVICE_DIS, pf->state);
 	set_bit(__ICE_DOWN, pf->state);
 	devm_kfree(dev, pf->first_sw);
@@ -3348,6 +3361,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	ice_deinit_pf(pf);
 	ice_deinit_hw(hw);
 err_exit_unroll:
+	ice_devlink_unregister(pf);
 	pci_disable_pcie_error_reporting(pdev);
 	return err;
 }
@@ -3375,6 +3389,7 @@ static void ice_remove(struct pci_dev *pdev)
 
 	if (test_bit(ICE_FLAG_SRIOV_ENA, pf->flags))
 		ice_free_vfs(pf);
+	ice_devlink_destroy_port(pf);
 	ice_vsi_release_all(pf);
 	ice_free_irq_msix_misc(pf);
 	ice_for_each_vsi(pf, i) {
@@ -3384,6 +3399,8 @@ static void ice_remove(struct pci_dev *pdev)
 	}
 	ice_deinit_pf(pf);
 	ice_deinit_hw(&pf->hw);
+	ice_devlink_unregister(pf);
+
 	/* Issue a PFR as part of the prescribed driver unload flow.  Do not
 	 * do it via ice_schedule_reset() since there is no need to rebuild
 	 * and the service task is already stopped.
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 05/22] ice: rename variables used for Option ROM version
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (3 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 04/22] ice: enable initial devlink support Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get Jacob Keller
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The function ice_get_nvm_version reports data for both the NVM map
version and the version of the combined Option ROM. The version data for
the Option ROM uses variables with the prefix "oem".

This is confusing, as it makes it difficult for a reader to understand
what the version actually represents.

Rename the variables to use the prefix "orom", and update the code
comments to mention that this is the combined Option ROM version. This
makes it clear what the version actually represents.
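
For reference, the layout of the 32-bit Option ROM version word can be
sketched in plain C as below. The shifts and field widths follow the
ICE_OROM_VER_* definitions in this patch; the DEMO_* and demo_* names
are hypothetical, and unsigned constants are used here only to keep the
userspace example free of signed-shift pitfalls.

```c
#include <stdint.h>

/* Field layout mirrors the ICE_OROM_VER_* macros from the patch */
#define DEMO_OROM_VER_PATCH_SHIFT	0
#define DEMO_OROM_VER_PATCH_MASK	(0xffu << DEMO_OROM_VER_PATCH_SHIFT)
#define DEMO_OROM_VER_BUILD_SHIFT	8
#define DEMO_OROM_VER_BUILD_MASK	(0xffffu << DEMO_OROM_VER_BUILD_SHIFT)
#define DEMO_OROM_VER_SHIFT		24
#define DEMO_OROM_VER_MASK		(0xffu << DEMO_OROM_VER_SHIFT)

struct demo_orom_version {
	uint8_t major;	/* bits 31:24 */
	uint16_t build;	/* bits 23:8 */
	uint8_t patch;	/* bits 7:0 */
};

/* Split the packed version word into its three fields */
static struct demo_orom_version demo_decode_orom(uint32_t orom_ver)
{
	struct demo_orom_version v;

	v.major = (uint8_t)((orom_ver & DEMO_OROM_VER_MASK) >>
			    DEMO_OROM_VER_SHIFT);
	v.build = (uint16_t)((orom_ver & DEMO_OROM_VER_BUILD_MASK) >>
			     DEMO_OROM_VER_BUILD_SHIFT);
	v.patch = (uint8_t)((orom_ver & DEMO_OROM_VER_PATCH_MASK) >>
			    DEMO_OROM_VER_PATCH_SHIFT);
	return v;
}
```

With this layout, a packed value of 0x01088A00 decodes to major 1,
build 2186 and patch 0.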

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c  | 19 ++++++++++---------
 drivers/net/ethernet/intel/ice/ice_common.h  |  4 ++--
 drivers/net/ethernet/intel/ice/ice_ethtool.c |  8 ++++----
 drivers/net/ethernet/intel/ice/ice_nvm.c     | 16 ++++++++--------
 drivers/net/ethernet/intel/ice/ice_type.h    | 16 ++++++++--------
 5 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 04d5db0a25bf..a74532520112 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -617,22 +617,23 @@ static void ice_get_itr_intrl_gran(struct ice_hw *hw)
 /**
  * ice_get_nvm_version - get cached NVM version data
  * @hw: pointer to the hardware structure
- * @oem_ver: 8 bit NVM version
- * @oem_build: 16 bit NVM build number
- * @oem_patch: 8 NVM patch number
+ * @orom_ver: 8 bit version of combined Option ROM
+ * @orom_build: 16 bit build number of combined Option ROM
+ * @orom_patch: 8 bit patch level of combined Option ROM
  * @ver_hi: high 16 bits of the NVM version
  * @ver_lo: low 16 bits of the NVM version
  */
 void
-ice_get_nvm_version(struct ice_hw *hw, u8 *oem_ver, u16 *oem_build,
-		    u8 *oem_patch, u8 *ver_hi, u8 *ver_lo)
+ice_get_nvm_version(struct ice_hw *hw, u8 *orom_ver, u16 *orom_build,
+		    u8 *orom_patch, u8 *ver_hi, u8 *ver_lo)
 {
 	struct ice_nvm_info *nvm = &hw->nvm;
 
-	*oem_ver = (u8)((nvm->oem_ver & ICE_OEM_VER_MASK) >> ICE_OEM_VER_SHIFT);
-	*oem_patch = (u8)(nvm->oem_ver & ICE_OEM_VER_PATCH_MASK);
-	*oem_build = (u16)((nvm->oem_ver & ICE_OEM_VER_BUILD_MASK) >>
-			   ICE_OEM_VER_BUILD_SHIFT);
+	*orom_ver = (u8)((nvm->orom_ver & ICE_OROM_VER_MASK) >>
+			 ICE_OROM_VER_SHIFT);
+	*orom_patch = (u8)(nvm->orom_ver & ICE_OROM_VER_PATCH_MASK);
+	*orom_build = (u16)((nvm->orom_ver & ICE_OROM_VER_BUILD_MASK) >>
+			   ICE_OROM_VER_BUILD_SHIFT);
 	*ver_hi = (nvm->ver & ICE_NVM_VER_HI_MASK) >> ICE_NVM_VER_HI_SHIFT;
 	*ver_lo = (nvm->ver & ICE_NVM_VER_LO_MASK) >> ICE_NVM_VER_LO_SHIFT;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 9d5e86c9f886..0f9aa1986cab 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -151,8 +151,8 @@ void
 ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
 		  u64 *prev_stat, u64 *cur_stat);
 void
-ice_get_nvm_version(struct ice_hw *hw, u8 *oem_ver, u16 *oem_build,
-		    u8 *oem_patch, u8 *ver_hi, u8 *ver_lo);
+ice_get_nvm_version(struct ice_hw *hw, u8 *orom_ver, u16 *orom_build,
+		    u8 *orom_patch, u8 *ver_hi, u8 *ver_lo);
 enum ice_status
 ice_sched_query_elem(struct ice_hw *hw, u32 node_teid,
 		     struct ice_aqc_get_elem *buf);
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 223e8e707dcb..af5e5d6fc29c 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -166,11 +166,11 @@ static void
 ice_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
-	u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo;
+	u8 orom_ver, orom_patch, nvm_ver_hi, nvm_ver_lo;
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
 	struct ice_hw *hw = &pf->hw;
-	u16 oem_build;
+	u16 orom_build;
 
 	strlcpy(drvinfo->driver, KBUILD_MODNAME, sizeof(drvinfo->driver));
 	strlcpy(drvinfo->version, ice_drv_ver, sizeof(drvinfo->version));
@@ -178,11 +178,11 @@ ice_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
 	/* Display NVM version (from which the firmware version can be
 	 * determined) which contains more pertinent information.
 	 */
-	ice_get_nvm_version(hw, &oem_ver, &oem_build, &oem_patch,
+	ice_get_nvm_version(hw, &orom_ver, &orom_build, &orom_patch,
 			    &nvm_ver_hi, &nvm_ver_lo);
 	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
 		 "%x.%02x 0x%x %d.%d.%d", nvm_ver_hi, nvm_ver_lo,
-		 hw->nvm.eetrack, oem_ver, oem_build, oem_patch);
+		 hw->nvm.eetrack, orom_ver, orom_build, orom_patch);
 
 	strlcpy(drvinfo->bus_info, pci_name(pf->pdev),
 		sizeof(drvinfo->bus_info));
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index aaf5fd064725..7d5f2a6296c9 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -195,7 +195,7 @@ enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data)
  */
 enum ice_status ice_init_nvm(struct ice_hw *hw)
 {
-	u16 oem_hi, oem_lo, boot_cfg_tlv, boot_cfg_tlv_len;
+	u16 orom_hi, orom_lo, boot_cfg_tlv, boot_cfg_tlv_len;
 	struct ice_nvm_info *nvm = &hw->nvm;
 	u16 eetrack_lo, eetrack_hi;
 	enum ice_status status;
@@ -272,21 +272,21 @@ enum ice_status ice_init_nvm(struct ice_hw *hw)
 		return ICE_ERR_INVAL_SIZE;
 	}
 
-	status = ice_read_sr_word(hw, (boot_cfg_tlv + ICE_NVM_OEM_VER_OFF),
-				  &oem_hi);
+	status = ice_read_sr_word(hw, (boot_cfg_tlv + ICE_NVM_OROM_VER_OFF),
+				  &orom_hi);
 	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Failed to read OEM_VER hi.\n");
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read OROM_VER hi.\n");
 		return status;
 	}
 
-	status = ice_read_sr_word(hw, (boot_cfg_tlv + ICE_NVM_OEM_VER_OFF + 1),
-				  &oem_lo);
+	status = ice_read_sr_word(hw, (boot_cfg_tlv + ICE_NVM_OROM_VER_OFF + 1),
+				  &orom_lo);
 	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Failed to read OEM_VER lo.\n");
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read OROM_VER lo.\n");
 		return status;
 	}
 
-	nvm->oem_ver = ((u32)oem_hi << 16) | oem_lo;
+	nvm->orom_ver = ((u32)orom_hi << 16) | orom_lo;
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index db0ef6ba907f..1d9420cd53b1 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -242,7 +242,7 @@ struct ice_fc_info {
 /* NVM Information */
 struct ice_nvm_info {
 	u32 eetrack;              /* NVM data version */
-	u32 oem_ver;              /* OEM version info */
+	u32 orom_ver;             /* Combined Option ROM version info */
 	u16 sr_words;             /* Shadow RAM size in words */
 	u16 ver;                  /* NVM package version */
 	u8 blank_nvm_mode;        /* is NVM empty (no FW present) */
@@ -626,7 +626,7 @@ struct ice_hw_port_stats {
 
 /* Checksum and Shadow RAM pointers */
 #define ICE_SR_BOOT_CFG_PTR		0x132
-#define ICE_NVM_OEM_VER_OFF		0x02
+#define ICE_NVM_OROM_VER_OFF		0x02
 #define ICE_SR_NVM_DEV_STARTER_VER	0x18
 #define ICE_SR_NVM_EETRACK_LO		0x2D
 #define ICE_SR_NVM_EETRACK_HI		0x2E
@@ -634,12 +634,12 @@ struct ice_hw_port_stats {
 #define ICE_NVM_VER_LO_MASK		(0xff << ICE_NVM_VER_LO_SHIFT)
 #define ICE_NVM_VER_HI_SHIFT		12
 #define ICE_NVM_VER_HI_MASK		(0xf << ICE_NVM_VER_HI_SHIFT)
-#define ICE_OEM_VER_PATCH_SHIFT		0
-#define ICE_OEM_VER_PATCH_MASK		(0xff << ICE_OEM_VER_PATCH_SHIFT)
-#define ICE_OEM_VER_BUILD_SHIFT		8
-#define ICE_OEM_VER_BUILD_MASK		(0xffff << ICE_OEM_VER_BUILD_SHIFT)
-#define ICE_OEM_VER_SHIFT		24
-#define ICE_OEM_VER_MASK		(0xff << ICE_OEM_VER_SHIFT)
+#define ICE_OROM_VER_PATCH_SHIFT	0
+#define ICE_OROM_VER_PATCH_MASK		(0xff << ICE_OROM_VER_PATCH_SHIFT)
+#define ICE_OROM_VER_BUILD_SHIFT	8
+#define ICE_OROM_VER_BUILD_MASK		(0xffff << ICE_OROM_VER_BUILD_SHIFT)
+#define ICE_OROM_VER_SHIFT		24
+#define ICE_OROM_VER_MASK		(0xff << ICE_OROM_VER_SHIFT)
 #define ICE_SR_PFA_PTR			0x40
 #define ICE_SR_SECTOR_SIZE_IN_WORDS	0x800
 #define ICE_SR_WORDS_IN_1KB		512
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (4 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 05/22] ice: rename variables used for Option ROM version Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-19  2:45   ` Jakub Kicinski
  2020-02-14 23:22 ` [RFC PATCH v2 07/22] ice: add board identifier info to " Jacob Keller
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The devlink .info_get callback allows the driver to report detailed
version information. The following devlink versions are reported with
this initial implementation:

 "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
 "fw.mgmt.api" -> API version of interface exposed over the AdminQ
 "fw.mgmt.bundle" -> Unique identifier for the firmware bundle
 "fw.undi.orom" -> Version of the Option ROM containing the UEFI driver
 "nvm.psid" -> Version of the format for the NVM parameter set
 "nvm.bundle" -> Unique identifier for the combined NVM image

With this, devlink can now report at least the same information as
reported by the older ethtool interface. Each section of the
"firmware-version" is also reported independently so that it is easier
to understand the meaning.
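
As a rough userspace illustration of how the individual version strings
are built, the sketch below reuses the snprintf formats from this patch
for three of the entries. The demo_* helpers are hypothetical, and the
low shift of 4 for the NVM version is an assumption, since only the
high shift of 12 is visible in this series.

```c
#include <stdint.h>
#include <stdio.h>

/* The LO shift is assumed for this sketch; the HI shift (12) and both
 * mask widths are taken from the ICE_NVM_VER_* definitions.
 */
#define DEMO_NVM_VER_LO_SHIFT	4
#define DEMO_NVM_VER_LO_MASK	(0xffu << DEMO_NVM_VER_LO_SHIFT)
#define DEMO_NVM_VER_HI_SHIFT	12
#define DEMO_NVM_VER_HI_MASK	(0xfu << DEMO_NVM_VER_HI_SHIFT)

/* "fw.mgmt": three decimal fields separated by dots */
static int demo_fmt_fw_mgmt(char *buf, size_t len, uint8_t maj, uint8_t min,
			    uint8_t patch)
{
	return snprintf(buf, len, "%u.%u.%u", (unsigned int)maj,
			(unsigned int)min, (unsigned int)patch);
}

/* "fw.mgmt.bundle": zero-padded 32-bit hex identifier */
static int demo_fmt_bundle(char *buf, size_t len, uint32_t id)
{
	return snprintf(buf, len, "0x%08x", (unsigned int)id);
}

/* "nvm.psid": hex major, then two-digit hex minor */
static int demo_fmt_nvm_psid(char *buf, size_t len, uint16_t ver)
{
	unsigned int hi = (ver & DEMO_NVM_VER_HI_MASK) >> DEMO_NVM_VER_HI_SHIFT;
	unsigned int lo = (ver & DEMO_NVM_VER_LO_MASK) >> DEMO_NVM_VER_LO_SHIFT;

	return snprintf(buf, len, "%x.%02x", hi, lo);
}
```

Under these assumptions, firmware version fields 1, 16 and 10 format as
"1.16.10", and an NVM version word of 0x0500 formats as "0.50".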

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 Documentation/networking/devlink/ice.rst     | 55 +++++++++++++
 Documentation/networking/devlink/index.rst   |  1 +
 drivers/net/ethernet/intel/ice/ice_devlink.c | 81 ++++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 Documentation/networking/devlink/ice.rst

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
new file mode 100644
index 000000000000..5545e708f18f
--- /dev/null
+++ b/Documentation/networking/devlink/ice.rst
@@ -0,0 +1,55 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+ice devlink support
+===================
+
+This document describes the devlink features implemented by the ``ice``
+device driver.
+
+Info versions
+=============
+
+The ``ice`` driver reports the following versions
+
+.. list-table:: devlink info versions implemented
+    :widths: 5 5 5 90
+
+    * - Name
+      - Type
+      - Example
+      - Description
+    * - ``fw.mgmt``
+      - running
+      - 1.16.10
+      - 3-digit version number of the management firmware that controls the
+        PHY, link, etc.
+    * - ``fw.mgmt.api``
+      - running
+      - 1.5
+      - 2-digit version number of the API exported over the AdminQ by the
+        management firmware. Used by the driver to identify what commands
+        are supported.
+    * - ``fw.mgmt.bundle``
+      - running
+      - 0xecabd066
+      - Unique identifier of the management firmware build.
+    * - ``fw.undi.orom``
+      - running
+      - 1.2186.0
+      - Version of the Option ROM containing the UEFI driver. The version is
+        reported in ``major.minor.patch`` format. The major version is
+        incremented whenever a major breaking change occurs, or when the
+        minor version would overflow. The minor version is incremented for
+        non-breaking changes and reset to 1 when the major version is
+        incremented. The patch version is normally 0 but is incremented when
+        a fix is delivered as a patch against an older base Option ROM.
+    * - ``nvm.psid``
+      - running
+      - 0.50
+      - Version describing the format of the NVM parameter set.
+    * - ``nvm.bundle``
+      - running
+      - 0x80001709
+      - Unique identifier of the NVM image contents, also known as the
+        EETRACK id.
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 087ff54d53fc..272509cd9215 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -32,6 +32,7 @@ parameters, info versions, and other features it supports.
 
    bnxt
    ionic
+   ice
    mlx4
    mlx5
    mlxsw
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 2a72857c4b26..f834025d58aa 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -2,9 +2,90 @@
 /* Copyright (c) 2019, Intel Corporation. */
 
 #include "ice.h"
+#include "ice_lib.h"
 #include "ice_devlink.h"
 
+/**
+ * ice_devlink_info_get - .info_get devlink handler
+ * @devlink: devlink instance structure
+ * @req: the devlink info request
+ * @extack: extended netdev ack structure
+ *
+ * Callback for the devlink .info_get operation. Reports information about the
+ * device.
+ *
+ * @returns zero on success or an error code on failure.
+ */
+static int ice_devlink_info_get(struct devlink *devlink,
+				struct devlink_info_req *req,
+				struct netlink_ext_ack *extack)
+{
+	u8 orom_maj, orom_patch, nvm_ver_hi, nvm_ver_lo;
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_hw *hw = &pf->hw;
+	u16 orom_min;
+	char buf[32];
+	int err;
+
+	ice_get_nvm_version(hw, &orom_maj, &orom_min, &orom_patch, &nvm_ver_hi,
+			    &nvm_ver_lo);
+
+	err = devlink_info_driver_name_put(req, KBUILD_MODNAME);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set driver name");
+		return err;
+	}
+
+	snprintf(buf, sizeof(buf), "%u.%u.%u", hw->fw_maj_ver, hw->fw_min_ver,
+		 hw->fw_patch);
+	err = devlink_info_version_running_put(req,
+					       DEVLINK_INFO_VERSION_GENERIC_FW_MGMT,
+					       buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw version data");
+		return err;
+	}
+
+	snprintf(buf, sizeof(buf), "%u.%u", hw->api_maj_ver, hw->api_min_ver);
+	err = devlink_info_version_running_put(req, "fw.mgmt.api", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set mgmt fw API data");
+		return err;
+	}
+
+	snprintf(buf, sizeof(buf), "0x%08x", hw->fw_build);
+	err = devlink_info_version_running_put(req, "fw.mgmt.bundle", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set fw bundle data");
+		return err;
+	}
+
+	snprintf(buf, sizeof(buf), "%u.%u.%u", orom_maj, orom_min, orom_patch);
+	err = devlink_info_version_running_put(req, "fw.undi.orom", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set Option ROM version");
+		return err;
+	}
+
+	snprintf(buf, sizeof(buf), "%x.%02x", nvm_ver_hi, nvm_ver_lo);
+	err = devlink_info_version_running_put(req, "nvm.psid", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM parameter set version data");
+		return err;
+	}
+
+	snprintf(buf, sizeof(buf), "0x%08x", hw->nvm.eetrack);
+	err = devlink_info_version_running_put(req, "nvm.bundle", buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set NVM bundle data");
+		return err;
+	}
+
+	return 0;
+}
+
 const struct devlink_ops ice_devlink_ops = {
+	.info_get = ice_devlink_info_get,
 };
 
 static void ice_devlink_free(void *devlink_ptr)
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 07/22] ice: add board identifier info to devlink .info_get
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (5 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 08/22] devlink: prepare to support region operations Jacob Keller
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Export a unique board identifier using "board.id" for devlink's
.info_get command.

Obtain this by reading the NVM for the PBA identification string.
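
The PBA block is located by walking the TLV list in the Preserved Field
Area, as ice_get_pfa_module_tlv does. The walk can be sketched in
userspace against a mock Shadow RAM array as follows. The demo_find_tlv
helper, its bounds checks, and its -1 error convention are hypothetical
and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the TLV list of a mock Preserved Field Area.
 *
 * sr:       mock Shadow RAM contents, one element per 16-bit word
 * sr_words: number of words in sr
 * pfa_ptr:  word offset of the PFA; the word stored there is its length
 * type:     TLV type to search for
 *
 * Returns 0 and fills *tlv_off and *tlv_len on success, or -1 if the
 * type is missing or the matching TLV has zero length.
 */
static int demo_find_tlv(const uint16_t *sr, size_t sr_words,
			 uint16_t pfa_ptr, uint16_t type,
			 uint16_t *tlv_off, uint16_t *tlv_len)
{
	size_t next, end;

	if (pfa_ptr >= sr_words)
		return -1;

	/* The first TLV starts right after the PFA length word */
	end = (size_t)pfa_ptr + sr[pfa_ptr];
	next = (size_t)pfa_ptr + 1;

	while (next < end && next + 1 < sr_words) {
		uint16_t cur_type = sr[next];
		uint16_t cur_len = sr[next + 1];

		if (cur_type == type) {
			if (!cur_len)
				return -1;
			*tlv_off = (uint16_t)next;
			*tlv_len = cur_len;
			return 0;
		}

		/* Skip the value plus the two header words (type, length) */
		next += (size_t)cur_len + 2;
	}

	return -1;
}
```

The returned offset points at the TLV's type word, so the value itself
starts two words later, which matches the (pba_tlv + 2) read in
ice_read_pba_string.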

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 Documentation/networking/devlink/ice.rst     |   4 +
 drivers/net/ethernet/intel/ice/ice_common.c  |  66 ----------
 drivers/net/ethernet/intel/ice/ice_common.h  |   3 -
 drivers/net/ethernet/intel/ice/ice_devlink.c |  15 +++
 drivers/net/ethernet/intel/ice/ice_nvm.c     | 125 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_nvm.h     |   5 +
 drivers/net/ethernet/intel/ice/ice_type.h    |   1 +
 7 files changed, 150 insertions(+), 69 deletions(-)

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 5545e708f18f..10ec6c1900b0 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -19,6 +19,10 @@ The ``ice`` driver reports the following versions
       - Type
       - Example
       - Description
+    * - ``board.id``
+      - fixed
+      - K65390-000
+      - The Product Board Assembly (PBA) identifier of the board.
     * - ``fw.mgmt``
       - running
       - 1.16.10
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index a74532520112..2ecf8bec795b 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -958,72 +958,6 @@ enum ice_status ice_reset(struct ice_hw *hw, enum ice_reset_req req)
 	return ice_check_reset(hw);
 }
 
-/**
- * ice_get_pfa_module_tlv - Reads sub module TLV from NVM PFA
- * @hw: pointer to hardware structure
- * @module_tlv: pointer to module TLV to return
- * @module_tlv_len: pointer to module TLV length to return
- * @module_type: module type requested
- *
- * Finds the requested sub module TLV type from the Preserved Field
- * Area (PFA) and returns the TLV pointer and length. The caller can
- * use these to read the variable length TLV value.
- */
-enum ice_status
-ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
-		       u16 module_type)
-{
-	enum ice_status status;
-	u16 pfa_len, pfa_ptr;
-	u16 next_tlv;
-
-	status = ice_read_sr_word(hw, ICE_SR_PFA_PTR, &pfa_ptr);
-	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Preserved Field Array pointer.\n");
-		return status;
-	}
-	status = ice_read_sr_word(hw, pfa_ptr, &pfa_len);
-	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Failed to read PFA length.\n");
-		return status;
-	}
-	/* Starting with first TLV after PFA length, iterate through the list
-	 * of TLVs to find the requested one.
-	 */
-	next_tlv = pfa_ptr + 1;
-	while (next_tlv < pfa_ptr + pfa_len) {
-		u16 tlv_sub_module_type;
-		u16 tlv_len;
-
-		/* Read TLV type */
-		status = ice_read_sr_word(hw, next_tlv, &tlv_sub_module_type);
-		if (status) {
-			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV type.\n");
-			break;
-		}
-		/* Read TLV length */
-		status = ice_read_sr_word(hw, next_tlv + 1, &tlv_len);
-		if (status) {
-			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV length.\n");
-			break;
-		}
-		if (tlv_sub_module_type == module_type) {
-			if (tlv_len) {
-				*module_tlv = next_tlv;
-				*module_tlv_len = tlv_len;
-				return 0;
-			}
-			return ICE_ERR_INVAL_SIZE;
-		}
-		/* Check next TLV, i.e. current TLV pointer + length + 2 words
-		 * (for current TLV's type and length)
-		 */
-		next_tlv = next_tlv + tlv_len + 2;
-	}
-	/* Module does not exist */
-	return ICE_ERR_DOES_NOT_EXIST;
-}
-
 /**
  * ice_copy_rxq_ctx_to_hw
  * @hw: pointer to the hardware structure
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 0f9aa1986cab..8903e0aa42c5 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -15,9 +15,6 @@ enum ice_status ice_nvm_validate_checksum(struct ice_hw *hw);
 
 enum ice_status ice_init_hw(struct ice_hw *hw);
 void ice_deinit_hw(struct ice_hw *hw);
-enum ice_status
-ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
-		       u16 module_type);
 enum ice_status ice_check_reset(struct ice_hw *hw);
 enum ice_status ice_reset(struct ice_hw *hw, enum ice_reset_req req);
 enum ice_status ice_create_all_ctrlq(struct ice_hw *hw);
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index f834025d58aa..1f755b98d785 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -23,6 +23,7 @@ static int ice_devlink_info_get(struct devlink *devlink,
 	u8 orom_maj, orom_patch, nvm_ver_hi, nvm_ver_lo;
 	struct ice_pf *pf = devlink_priv(devlink);
 	struct ice_hw *hw = &pf->hw;
+	enum ice_status status;
 	u16 orom_min;
 	char buf[32];
 	int err;
@@ -36,6 +37,20 @@ static int ice_devlink_info_get(struct devlink *devlink,
 		return err;
 	}
 
+	status = ice_read_pba_string(hw, buf, sizeof(buf));
+	if (status) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to obtain PBA string");
+		return -EIO;
+	}
+
+	err = devlink_info_version_fixed_put(req,
+					     DEVLINK_INFO_VERSION_GENERIC_BOARD_ID,
+					     buf);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to set board identifier");
+		return err;
+	}
+
 	snprintf(buf, sizeof(buf), "%u.%u.%u", hw->fw_maj_ver, hw->fw_min_ver,
 		 hw->fw_patch);
 	err = devlink_info_version_running_put(req,
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 7d5f2a6296c9..d964311bdd66 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -186,6 +186,131 @@ enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data)
 	return status;
 }
 
+/**
+ * ice_get_pfa_module_tlv - Reads sub module TLV from NVM PFA
+ * @hw: pointer to hardware structure
+ * @module_tlv: pointer to module TLV to return
+ * @module_tlv_len: pointer to module TLV length to return
+ * @module_type: module type requested
+ *
+ * Finds the requested sub module TLV type from the Preserved Field
+ * Area (PFA) and returns the TLV pointer and length. The caller can
+ * use these to read the variable length TLV value.
+ */
+enum ice_status
+ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
+		       u16 module_type)
+{
+	enum ice_status status;
+	u16 pfa_len, pfa_ptr;
+	u16 next_tlv;
+
+	status = ice_read_sr_word(hw, ICE_SR_PFA_PTR, &pfa_ptr);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Preserved Field Array pointer.\n");
+		return status;
+	}
+	status = ice_read_sr_word(hw, pfa_ptr, &pfa_len);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read PFA length.\n");
+		return status;
+	}
+	/* Starting with first TLV after PFA length, iterate through the list
+	 * of TLVs to find the requested one.
+	 */
+	next_tlv = pfa_ptr + 1;
+	while (next_tlv < pfa_ptr + pfa_len) {
+		u16 tlv_sub_module_type;
+		u16 tlv_len;
+
+		/* Read TLV type */
+		status = ice_read_sr_word(hw, next_tlv, &tlv_sub_module_type);
+		if (status) {
+			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV type.\n");
+			break;
+		}
+		/* Read TLV length */
+		status = ice_read_sr_word(hw, next_tlv + 1, &tlv_len);
+		if (status) {
+			ice_debug(hw, ICE_DBG_INIT, "Failed to read TLV length.\n");
+			break;
+		}
+		if (tlv_sub_module_type == module_type) {
+			if (tlv_len) {
+				*module_tlv = next_tlv;
+				*module_tlv_len = tlv_len;
+				return 0;
+			}
+			return ICE_ERR_INVAL_SIZE;
+		}
+		/* Check next TLV, i.e. current TLV pointer + length + 2 words
+		 * (for current TLV's type and length)
+		 */
+		next_tlv = next_tlv + tlv_len + 2;
+	}
+	/* Module does not exist */
+	return ICE_ERR_DOES_NOT_EXIST;
+}
+
+/**
+ * ice_read_pba_string - Reads part number string from NVM
+ * @hw: pointer to hardware structure
+ * @pba_num: stores the part number string from the NVM
+ * @pba_num_size: part number string buffer length
+ *
+ * Reads the part number string from the NVM.
+ */
+enum ice_status
+ice_read_pba_string(struct ice_hw *hw, u8 *pba_num, u32 pba_num_size)
+{
+	u16 pba_tlv, pba_tlv_len;
+	enum ice_status status;
+	u16 pba_word, pba_size;
+	u16 i;
+
+	status = ice_get_pfa_module_tlv(hw, &pba_tlv, &pba_tlv_len,
+					ICE_SR_PBA_BLOCK_PTR);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read PBA Block TLV.\n");
+		return status;
+	}
+
+	/* pba_size is the next word */
+	status = ice_read_sr_word(hw, (pba_tlv + 2), &pba_size);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to read PBA Section size.\n");
+		return status;
+	}
+
+	if (pba_tlv_len < pba_size) {
+		ice_debug(hw, ICE_DBG_INIT, "Invalid PBA Block TLV size.\n");
+		return ICE_ERR_INVAL_SIZE;
+	}
+
+	/* Subtract one to get PBA word count (PBA Size word is included in
+	 * total size)
+	 */
+	pba_size--;
+	if (pba_num_size < (((u32)pba_size * 2) + 1)) {
+		ice_debug(hw, ICE_DBG_INIT, "Buffer too small for PBA data.\n");
+		return ICE_ERR_PARAM;
+	}
+
+	for (i = 0; i < pba_size; i++) {
+		status = ice_read_sr_word(hw, (pba_tlv + 2 + 1) + i, &pba_word);
+		if (status) {
+			ice_debug(hw, ICE_DBG_INIT, "Failed to read PBA Block word %d.\n", i);
+			return status;
+		}
+
+		pba_num[(i * 2)] = (pba_word >> 8) & 0xFF;
+		pba_num[(i * 2) + 1] = pba_word & 0xFF;
+	}
+	pba_num[(pba_size * 2)] = '\0';
+
+	return status;
+}
+
 /**
  * ice_init_nvm - initializes NVM setting
  * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 7375f6b96919..999f273ba6ad 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -10,6 +10,11 @@ void ice_release_nvm(struct ice_hw *hw);
 enum ice_status
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
+enum ice_status
+ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len,
+		       u16 module_type);
+enum ice_status
+ice_read_pba_string(struct ice_hw *hw, u8 *pba_num, u32 pba_num_size);
 enum ice_status ice_init_nvm(struct ice_hw *hw);
 enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data);
 #endif /* _ICE_NVM_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 1d9420cd53b1..12e0aa061260 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -627,6 +627,7 @@ struct ice_hw_port_stats {
 /* Checksum and Shadow RAM pointers */
 #define ICE_SR_BOOT_CFG_PTR		0x132
 #define ICE_NVM_OROM_VER_OFF		0x02
+#define ICE_SR_PBA_BLOCK_PTR		0x16
 #define ICE_SR_NVM_DEV_STARTER_VER	0x18
 #define ICE_SR_NVM_EETRACK_LO		0x2D
 #define ICE_SR_NVM_EETRACK_HI		0x2E
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 08/22] devlink: prepare to support region operations
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (6 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 07/22] ice: add board identifier info to " Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 17:42   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 09/22] devlink: convert snapshot destructor callback to region op Jacob Keller
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Modify the devlink region code in preparation for adding new operations
on regions.

Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).

This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.

In order to re-use the constant strings in the mlx4 driver, their
declaration must be changed to 'const char * const' so that the compiler
knows that neither the data nor the pointer can change.
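
As a rough userspace illustration of this layout (not the kernel code
itself), the sketch below shows a region that keeps a pointer to a
constant ops structure and is looked up through ops->name. The names
demo_region_ops, demo_region and demo_region_get_by_name are hypothetical
stand-ins for the devlink structures and helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct devlink_region_ops. */
struct demo_region_ops {
	const char *name;
};

/* Hypothetical stand-in for struct devlink_region: the name now lives
 * in the ops structure rather than in the region itself.
 */
struct demo_region {
	const struct demo_region_ops *ops;
};

static const struct demo_region_ops demo_cr_space_ops = {
	.name = "cr-space",
};

static const struct demo_region_ops demo_fw_health_ops = {
	.name = "fw-health",
};

/* Find a region by name, comparing against region->ops->name.
 * Returns NULL when no region with that name exists.
 */
const struct demo_region *
demo_region_get_by_name(const struct demo_region *regions, size_t count,
			const char *name)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (!strcmp(regions[i].ops->name, name))
			return &regions[i];

	return NULL;
}
```

Because every region of a given kind points at the same ops structure,
later callbacks can be added to that structure without changing the
region creation signature again.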

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 16 +++++++++++----
 drivers/net/netdevsim/dev.c                 |  6 +++++-
 include/net/devlink.h                       | 16 +++++++++++----
 net/core/devlink.c                          | 22 ++++++++++-----------
 4 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index 64ed725aec28..cc2bf596c74b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -38,8 +38,16 @@
 #define CR_ENABLE_BIT_OFFSET		0xF3F04
 #define MAX_NUM_OF_DUMPS_TO_STORE	(8)
 
-static const char *region_cr_space_str = "cr-space";
-static const char *region_fw_health_str = "fw-health";
+static const char * const region_cr_space_str = "cr-space";
+static const char * const region_fw_health_str = "fw-health";
+
+static const struct devlink_region_ops region_cr_space_ops = {
+	.name = region_cr_space_str,
+};
+
+static const struct devlink_region_ops region_fw_health_ops = {
+	.name = region_fw_health_str,
+};
 
 /* Set to true in case cr enable bit was set to true before crdump */
 static bool crdump_enbale_bit_set;
@@ -205,7 +213,7 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
 	/* Create cr-space region */
 	crdump->region_crspace =
 		devlink_region_create(devlink,
-				      region_cr_space_str,
+				      &region_cr_space_ops,
 				      MAX_NUM_OF_DUMPS_TO_STORE,
 				      pci_resource_len(pdev, 0));
 	if (IS_ERR(crdump->region_crspace))
@@ -216,7 +224,7 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
 	/* Create fw-health region */
 	crdump->region_fw_health =
 		devlink_region_create(devlink,
-				      region_fw_health_str,
+				      &region_fw_health_ops,
 				      MAX_NUM_OF_DUMPS_TO_STORE,
 				      HEALTH_BUFFER_SIZE);
 	if (IS_ERR(crdump->region_fw_health))
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index d7706a0346f2..3365de48ea9d 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -245,11 +245,15 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 
 #define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
 
+static const struct devlink_region_ops dummy_region_ops = {
+	.name = "dummy",
+};
+
 static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
 				      struct devlink *devlink)
 {
 	nsim_dev->dummy_region =
-		devlink_region_create(devlink, "dummy",
+		devlink_region_create(devlink, &dummy_region_ops,
 				      NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
 				      NSIM_DEV_DUMMY_REGION_SIZE);
 	return PTR_ERR_OR_ZERO(nsim_dev->dummy_region);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index ce5cea428fdc..7012bda22aa8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -495,6 +495,14 @@ struct devlink_info_req;
 
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
+/**
+ * struct devlink_region_ops - Region operations
+ * @name: region name
+ */
+struct devlink_region_ops {
+	const char *name;
+};
+
 struct devlink_fmsg;
 struct devlink_health_reporter;
 
@@ -949,10 +957,10 @@ void devlink_port_param_value_changed(struct devlink_port *devlink_port,
 				      u32 param_id);
 void devlink_param_value_str_fill(union devlink_param_value *dst_val,
 				  const char *src);
-struct devlink_region *devlink_region_create(struct devlink *devlink,
-					     const char *region_name,
-					     u32 region_max_snapshots,
-					     u64 region_size);
+struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+		      const struct devlink_region_ops *ops,
+		      u32 region_max_snapshots, u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_snapshot_id_get(struct devlink *devlink);
 int devlink_region_snapshot_create(struct devlink_region *region,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 549ee56b7a21..4128fd1f604a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -344,7 +344,7 @@ devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
 struct devlink_region {
 	struct devlink *devlink;
 	struct list_head list;
-	const char *name;
+	const struct devlink_region_ops *ops;
 	struct list_head snapshot_list;
 	u32 max_snapshots;
 	u32 cur_snapshots;
@@ -365,7 +365,7 @@ devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
 	struct devlink_region *region;
 
 	list_for_each_entry(region, &devlink->region_list, list)
-		if (!strcmp(region->name, region_name))
+		if (!strcmp(region->ops->name, region_name))
 			return region;
 
 	return NULL;
@@ -3687,7 +3687,7 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
 	if (err)
 		goto nla_put_failure;
 
-	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
+	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->ops->name);
 	if (err)
 		goto nla_put_failure;
 
@@ -3733,7 +3733,7 @@ static void devlink_nl_region_notify(struct devlink_region *region,
 		goto out_cancel_msg;
 
 	err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
-			     region->name);
+			     region->ops->name);
 	if (err)
 		goto out_cancel_msg;
 
@@ -7536,21 +7536,21 @@ EXPORT_SYMBOL_GPL(devlink_param_value_str_fill);
  *	devlink_region_create - create a new address region
  *
  *	@devlink: devlink
- *	@region_name: region name
+ *	@ops: region operations and name
  *	@region_max_snapshots: Maximum supported number of snapshots for region
  *	@region_size: size of region
  */
-struct devlink_region *devlink_region_create(struct devlink *devlink,
-					     const char *region_name,
-					     u32 region_max_snapshots,
-					     u64 region_size)
+struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+		      const struct devlink_region_ops *ops,
+		      u32 region_max_snapshots, u64 region_size)
 {
 	struct devlink_region *region;
 	int err = 0;
 
 	mutex_lock(&devlink->lock);
 
-	if (devlink_region_get_by_name(devlink, region_name)) {
+	if (devlink_region_get_by_name(devlink, ops->name)) {
 		err = -EEXIST;
 		goto unlock;
 	}
@@ -7563,7 +7563,7 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 
 	region->devlink = devlink;
 	region->max_snapshots = region_max_snapshots;
-	region->name = region_name;
+	region->ops = ops;
 	region->size = region_size;
 	INIT_LIST_HEAD(&region->snapshot_list);
 	list_add_tail(&region->list, &devlink->region_list);
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 09/22] devlink: convert snapshot destructor callback to region op
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (7 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 08/22] devlink: prepare to support region operations Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 17:42   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 10/22] devlink: trivial: fix tab in function documentation Jacob Keller
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

It does not make sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.

This operation replaces the data_destructor argument of
devlink_region_snapshot_create, so callers no longer pass a destructor
each time they create a snapshot.
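
The idea can be sketched in userspace C roughly as follows. This is an
illustration under assumptions, not the devlink implementation: the
demo_* names are hypothetical, free() stands in for kfree/kvfree, and the
demo_destructor_calls counter exists only so the behaviour can be
observed.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct devlink_region_ops. */
struct demo_region_ops {
	const char *name;
	void (*destructor)(const void *data);
};

struct demo_region {
	const struct demo_region_ops *ops;
	unsigned int cur_snapshots;
};

struct demo_snapshot {
	struct demo_region *region;
	unsigned char *data;
};

/* Counts destructor invocations, for illustration only. */
unsigned int demo_destructor_calls;

/* Adapts free() to the const void * signature used by the op. */
void demo_free(const void *data)
{
	demo_destructor_calls++;
	free((void *)data);
}

/* Create a snapshot that takes ownership of @data. No destructor is
 * passed here; it comes from the region's ops. Returns NULL on failure,
 * in which case the caller still owns @data.
 */
struct demo_snapshot *demo_snapshot_create(struct demo_region *region,
					   unsigned char *data)
{
	struct demo_snapshot *snapshot;

	if (!region->ops || !region->ops->destructor)
		return NULL;

	snapshot = calloc(1, sizeof(*snapshot));
	if (!snapshot)
		return NULL;

	snapshot->region = region;
	snapshot->data = data;
	region->cur_snapshots++;

	return snapshot;
}

/* Delete a snapshot, freeing its data through the region-wide op. */
void demo_snapshot_del(struct demo_snapshot *snapshot)
{
	struct demo_region *region = snapshot->region;

	region->cur_snapshots--;
	region->ops->destructor(snapshot->data);
	free(snapshot);
}
```

Keeping the destructor in the ops structure means a snapshot no longer
needs its own function pointer, and a region cannot end up with
snapshots that are freed in different ways.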

Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c |  6 ++++--
 drivers/net/netdevsim/dev.c                 |  3 ++-
 include/net/devlink.h                       |  7 +++----
 net/core/devlink.c                          | 11 +++++------
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index cc2bf596c74b..c3f90c0f9554 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -43,10 +43,12 @@ static const char * const region_fw_health_str = "fw-health";
 
 static const struct devlink_region_ops region_cr_space_ops = {
 	.name = region_cr_space_str,
+	.destructor = &kvfree,
 };
 
 static const struct devlink_region_ops region_fw_health_ops = {
 	.name = region_fw_health_str,
+	.destructor = &kvfree,
 };
 
 /* Set to true in case cr enable bit was set to true before crdump */
@@ -107,7 +109,7 @@ static void mlx4_crdump_collect_crspace(struct mlx4_dev *dev,
 					readl(cr_space + offset);
 
 		err = devlink_region_snapshot_create(crdump->region_crspace,
-						     crspace_data, id, &kvfree);
+						     crspace_data, id);
 		if (err) {
 			kvfree(crspace_data);
 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
@@ -146,7 +148,7 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev *dev,
 					readl(health_buf_start + offset);
 
 		err = devlink_region_snapshot_create(crdump->region_fw_health,
-						     health_data, id, &kvfree);
+						     health_data, id);
 		if (err) {
 			kvfree(health_data);
 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 3365de48ea9d..5b1ba67fd4a0 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -55,7 +55,7 @@ static ssize_t nsim_dev_take_snapshot_write(struct file *file,
 
 	id = devlink_region_snapshot_id_get(priv_to_devlink(nsim_dev));
 	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
-					     dummy_data, id, kfree);
+					     dummy_data, id);
 	if (err) {
 		pr_err("Failed to create region snapshot\n");
 		kfree(dummy_data);
@@ -247,6 +247,7 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 
 static const struct devlink_region_ops dummy_region_ops = {
 	.name = "dummy",
+	.destructor = &kfree,
 };
 
 static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7012bda22aa8..437d3f51a5ab 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -493,14 +493,14 @@ enum devlink_param_generic_id {
 struct devlink_region;
 struct devlink_info_req;
 
-typedef void devlink_snapshot_data_dest_t(const void *data);
-
 /**
  * struct devlink_region_ops - Region operations
  * @name: region name
+ * @destructor: callback used to free snapshot memory when deleting
  */
 struct devlink_region_ops {
 	const char *name;
+	void (*destructor)(const void *data);
 };
 
 struct devlink_fmsg;
@@ -964,8 +964,7 @@ devlink_region_create(struct devlink *devlink,
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_snapshot_id_get(struct devlink *devlink);
 int devlink_region_snapshot_create(struct devlink_region *region,
-				   u8 *data, u32 snapshot_id,
-				   devlink_snapshot_data_dest_t *data_destructor);
+				   u8 *data, u32 snapshot_id);
 int devlink_info_serial_number_put(struct devlink_info_req *req,
 				   const char *sn);
 int devlink_info_driver_name_put(struct devlink_info_req *req,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 4128fd1f604a..7f9e98776434 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -354,7 +354,6 @@ struct devlink_region {
 struct devlink_snapshot {
 	struct list_head list;
 	struct devlink_region *region;
-	devlink_snapshot_data_dest_t *data_destructor;
 	u8 *data;
 	u32 id;
 };
@@ -3767,7 +3766,7 @@ static void devlink_region_snapshot_del(struct devlink_region *region,
 	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
 	region->cur_snapshots--;
 	list_del(&snapshot->list);
-	(*snapshot->data_destructor)(snapshot->data);
+	region->ops->destructor(snapshot->data);
 	kfree(snapshot);
 }
 
@@ -7548,6 +7547,9 @@ devlink_region_create(struct devlink *devlink,
 	struct devlink_region *region;
 	int err = 0;
 
+	if (WARN_ON(!ops) || WARN_ON(!ops->destructor))
+		return ERR_PTR(-EINVAL);
+
 	mutex_lock(&devlink->lock);
 
 	if (devlink_region_get_by_name(devlink, ops->name)) {
@@ -7634,11 +7636,9 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get);
  *	@region: devlink region of the snapshot
  *	@data: snapshot data
  *	@snapshot_id: snapshot id to be created
- *	@data_destructor: pointer to destructor function to free data
  */
 int devlink_region_snapshot_create(struct devlink_region *region,
-				   u8 *data, u32 snapshot_id,
-				   devlink_snapshot_data_dest_t *data_destructor)
+				   u8 *data, u32 snapshot_id)
 {
 	struct devlink *devlink = region->devlink;
 	struct devlink_snapshot *snapshot;
@@ -7666,7 +7666,6 @@ int devlink_region_snapshot_create(struct devlink_region *region,
 	snapshot->id = snapshot_id;
 	snapshot->region = region;
 	snapshot->data = data;
-	snapshot->data_destructor = data_destructor;
 
 	list_add_tail(&snapshot->list, &region->snapshot_list);
 
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 10/22] devlink: trivial: fix tab in function documentation
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (8 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 09/22] devlink: convert snapshot destructor callback to region op Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 17:42   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked Jacob Keller
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The function documentation comment for devlink_region_snapshot_create
included a literal tab character between the words 'future' and
'analyses'. It was difficult to spot because it happened to display as
only one space wide.

Fix the comment to use a space here instead of a stray tab appearing in
the middle of a sentence.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7f9e98776434..fef93f48028c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -7629,7 +7629,7 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get);
  *	devlink_region_snapshot_create - create a new snapshot
  *	This will add a new snapshot of a region. The snapshot
  *	will be stored on the region struct and can be accessed
- *	from devlink. This is useful for future	analyses of snapshots.
+ *	from devlink. This is useful for future analyses of snapshots.
  *	Multiple snapshots can be created on a region.
  *	The @snapshot_id should be obtained using the getter function.
  *
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (9 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 10/22] devlink: trivial: fix tab in function documentation Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 17:43   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 12/22] devlink: convert snapshot id getter to return an error Jacob Keller
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

A future change is going to add a new devlink command to request
a snapshot on demand. The handler for this command will need to call the
devlink_region_snapshot_id_get and devlink_region_snapshot_create
functions while already holding the devlink instance lock.

Extract the logic of these two functions into static functions prefixed
by `__` to indicate they are internal helper functions. Modify the
original functions to be implemented in terms of the new locked
functions.
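
The split between a public locking wrapper and an internal helper that
expects the lock to be held can be sketched in userspace C as below. This
is only an illustration: the demo_* names are hypothetical, a boolean
flag stands in for the devlink mutex, assert() plays the role of
lockdep_assert_held(), and a _locked suffix is used because identifiers
starting with `__` are reserved outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock of a lock-protected instance. A real driver would
 * use a mutex; a flag is enough to show the calling convention.
 */
struct demo_devlink {
	bool lock_held;
	uint32_t snapshot_id;
};

void demo_lock(struct demo_devlink *d)
{
	assert(!d->lock_held);
	d->lock_held = true;
}

void demo_unlock(struct demo_devlink *d)
{
	assert(d->lock_held);
	d->lock_held = false;
}

/* Internal helper: the caller must already hold the lock. */
uint32_t demo_snapshot_id_get_locked(struct demo_devlink *d)
{
	assert(d->lock_held);
	return ++d->snapshot_id;
}

/* Public wrapper: takes the lock, calls the helper, drops the lock. */
uint32_t demo_snapshot_id_get(struct demo_devlink *d)
{
	uint32_t id;

	demo_lock(d);
	id = demo_snapshot_id_get_locked(d);
	demo_unlock(d);

	return id;
}
```

A caller that already holds the lock, such as a command handler, can
call the _locked helper several times in a row without releasing the
lock in between, while existing callers keep using the wrapper
unchanged.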

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 93 ++++++++++++++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 32 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index fef93f48028c..0e94887713f4 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3760,6 +3760,65 @@ static void devlink_nl_region_notify(struct devlink_region *region,
 	nlmsg_free(msg);
 }
 
+/**
+ *	__devlink_region_snapshot_id_get - get snapshot ID
+ *	@devlink: devlink instance
+ *
+ *	Returns a new snapshot id. Must be called while holding the
+ *	devlink instance lock.
+ */
+static u32 __devlink_region_snapshot_id_get(struct devlink *devlink)
+{
+	lockdep_assert_held(&devlink->lock);
+	return ++devlink->snapshot_id;
+}
+
+/**
+ *	__devlink_region_snapshot_create - create a new snapshot
+ *	This will add a new snapshot of a region. The snapshot
+ *	will be stored on the region struct and can be accessed
+ *	from devlink. This is useful for future analyses of snapshots.
+ *	Multiple snapshots can be created on a region.
+ *	The @snapshot_id should be obtained using the getter function.
+ *
+ *	Must be called only while holding the devlink instance lock.
+ *
+ *	@region: devlink region of the snapshot
+ *	@data: snapshot data
+ *	@snapshot_id: snapshot id to be created
+ */
+static int
+__devlink_region_snapshot_create(struct devlink_region *region,
+				 u8 *data, u32 snapshot_id)
+{
+	struct devlink *devlink = region->devlink;
+	struct devlink_snapshot *snapshot;
+
+	lockdep_assert_held(&devlink->lock);
+
+	/* check if region can hold one more snapshot */
+	if (region->cur_snapshots == region->max_snapshots)
+		return -ENOMEM;
+
+	if (devlink_region_snapshot_get_by_id(region, snapshot_id))
+		return -EEXIST;
+
+	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
+	if (!snapshot)
+		return -ENOMEM;
+
+	snapshot->id = snapshot_id;
+	snapshot->region = region;
+	snapshot->data = data;
+
+	list_add_tail(&snapshot->list, &region->snapshot_list);
+
+	region->cur_snapshots++;
+
+	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
+	return 0;
+}
+
 static void devlink_region_snapshot_del(struct devlink_region *region,
 					struct devlink_snapshot *snapshot)
 {
@@ -7618,7 +7677,7 @@ u32 devlink_region_snapshot_id_get(struct devlink *devlink)
 	u32 id;
 
 	mutex_lock(&devlink->lock);
-	id = ++devlink->snapshot_id;
+	id = __devlink_region_snapshot_id_get(devlink);
 	mutex_unlock(&devlink->lock);
 
 	return id;
@@ -7641,42 +7700,12 @@ int devlink_region_snapshot_create(struct devlink_region *region,
 				   u8 *data, u32 snapshot_id)
 {
 	struct devlink *devlink = region->devlink;
-	struct devlink_snapshot *snapshot;
 	int err;
 
 	mutex_lock(&devlink->lock);
-
-	/* check if region can hold one more snapshot */
-	if (region->cur_snapshots == region->max_snapshots) {
-		err = -ENOMEM;
-		goto unlock;
-	}
-
-	if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
-		err = -EEXIST;
-		goto unlock;
-	}
-
-	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
-	if (!snapshot) {
-		err = -ENOMEM;
-		goto unlock;
-	}
-
-	snapshot->id = snapshot_id;
-	snapshot->region = region;
-	snapshot->data = data;
-
-	list_add_tail(&snapshot->list, &region->snapshot_list);
-
-	region->cur_snapshots++;
-
-	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
+	err = __devlink_region_snapshot_create(region, data, snapshot_id);
 	mutex_unlock(&devlink->lock);
-	return 0;
 
-unlock:
-	mutex_unlock(&devlink->lock);
 	return err;
 }
 EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 12/22] devlink: convert snapshot id getter to return an error
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (10 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 17:44   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 13/22] devlink: track snapshot ids using an IDR and refcounts Jacob Keller
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Modify the devlink_region_snapshot_id_get function to return a signed
value, enabling reporting an error on failure.

This enables easily refactoring how IDs are generated and kept track of
in the future. For now, just report ENOSPC once INT_MAX snapshot ids
have been returned.
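
A minimal userspace sketch of this convention follows, assuming a plain
counter as in the current code. The demo_* names are hypothetical and the
function is not the devlink helper itself.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the snapshot id counter. */
struct demo_devlink {
	uint32_t snapshot_id;
};

/* Return a new positive id, or -ENOSPC once INT_MAX ids have been
 * handed out. The bound keeps every valid id representable as a
 * non-negative int, so negative values can carry error codes.
 */
int demo_snapshot_id_get(struct demo_devlink *d)
{
	if (d->snapshot_id >= (uint32_t)INT_MAX)
		return -ENOSPC;

	return (int)++d->snapshot_id;
}
```

Callers then store the result in an int and treat any value below zero
as an error before using it as a snapshot id.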

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 10 +++++++---
 drivers/net/netdevsim/dev.c                 |  7 +++++--
 include/net/devlink.h                       |  2 +-
 net/core/devlink.c                          | 16 +++++++++++-----
 4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index c3f90c0f9554..723a66efdf32 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -169,7 +169,7 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
 	struct pci_dev *pdev = dev->persist->pdev;
 	unsigned long cr_res_size;
 	u8 __iomem *cr_space;
-	u32 id;
+	int id;
 
 	if (!dev->caps.health_buffer_addrs) {
 		mlx4_info(dev, "crdump: FW doesn't support health buffer access, skipping\n");
@@ -189,10 +189,14 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
 		return -ENODEV;
 	}
 
-	crdump_enable_crspace_access(dev, cr_space);
-
 	/* Get the available snapshot ID for the dumps */
 	id = devlink_region_snapshot_id_get(devlink);
+	if (id < 0) {
+		mlx4_err(dev, "crdump: devlink get snapshot id err %d\n", id);
+		return id;
+	}
+
+	crdump_enable_crspace_access(dev, cr_space);
 
 	/* Try to capture dumps */
 	mlx4_crdump_collect_crspace(dev, cr_space, id);
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 5b1ba67fd4a0..e30bd94c3d52 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -44,8 +44,7 @@ static ssize_t nsim_dev_take_snapshot_write(struct file *file,
 {
 	struct nsim_dev *nsim_dev = file->private_data;
 	void *dummy_data;
-	int err;
-	u32 id;
+	int err, id;
 
 	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
 	if (!dummy_data)
@@ -54,6 +53,10 @@ static ssize_t nsim_dev_take_snapshot_write(struct file *file,
 	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
 
 	id = devlink_region_snapshot_id_get(priv_to_devlink(nsim_dev));
+	if (id < 0) {
+		pr_err("Failed to get snapshot id\n");
+		return id;
+	}
 	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
 					     dummy_data, id);
 	if (err) {
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 437d3f51a5ab..3a7759355434 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -962,7 +962,7 @@ devlink_region_create(struct devlink *devlink,
 		      const struct devlink_region_ops *ops,
 		      u32 region_max_snapshots, u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
-u32 devlink_region_snapshot_id_get(struct devlink *devlink);
+int devlink_region_snapshot_id_get(struct devlink *devlink);
 int devlink_region_snapshot_create(struct devlink_region *region,
 				   u8 *data, u32 snapshot_id);
 int devlink_info_serial_number_put(struct devlink_info_req *req,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0e94887713f4..da4e669f425b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3764,12 +3764,16 @@ static void devlink_nl_region_notify(struct devlink_region *region,
  *	__devlink_region_snapshot_id_get - get snapshot ID
  *	@devlink: devlink instance
  *
- *	Returns a new snapshot id. Must be called while holding the
- *	devlink instance lock.
+ *	Returns a new snapshot id or a negative error code on failure. Must be
+ *	called while holding the devlink instance lock.
  */
-static u32 __devlink_region_snapshot_id_get(struct devlink *devlink)
+static int __devlink_region_snapshot_id_get(struct devlink *devlink)
 {
 	lockdep_assert_held(&devlink->lock);
+
+	if (devlink->snapshot_id >= INT_MAX)
+		return -ENOSPC;
+
 	return ++devlink->snapshot_id;
 }
 
@@ -7670,11 +7674,13 @@ EXPORT_SYMBOL_GPL(devlink_region_destroy);
  *	Driver should use the same id for multiple snapshots taken
  *	on multiple regions at the same time/by the same trigger.
  *
+ *	Returns a positive id or a negative error code on failure.
+ *
  *	@devlink: devlink
  */
-u32 devlink_region_snapshot_id_get(struct devlink *devlink)
+int devlink_region_snapshot_id_get(struct devlink *devlink)
 {
-	u32 id;
+	int id;
 
 	mutex_lock(&devlink->lock);
 	id = __devlink_region_snapshot_id_get(devlink);
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 13/22] devlink: track snapshot ids using an IDR and refcounts
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (11 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 12/22] devlink: convert snapshot id getter to return an error Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-18 21:44   ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW Jacob Keller
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

New snapshot ids are generated by calling a getter function. The same id
may be used by multiple snapshots created at the same trigger event.

Currently no effort is made to release any previously used snapshot ids.

Replace the basic logic of using a single devlink integer for tracking
ids with the IDR interface.

Snapshot IDs will be reference counted using a refcount stored in the
IDR. First, ids are allocated using idr_alloc without a refcount (using
the NULL pointer).

Once the devlink_region_snapshot_create function is called, it will call
the new __devlink_region_snapshot_id_ref(). This function will insert
a new refcount or increment the pre-existing refcount.

devlink_region_snapshot_destroy will call the new
__devlink_region_snapshot_id_deref(), decrementing the reference count.
Once there are no other references, the refcount will be removed from
the IDR.
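
As a rough illustration of the lifecycle described above (not part of
this patch), the three states of an id can be modelled in plain
userspace C. Everything here is hypothetical: the table stands in for
the IDR, and id_get, id_ref and id_deref only mimic the roles of
__devlink_region_snapshot_id_get, _ref and _deref under the assumption
of a tiny fixed id space.

```c
#include <assert.h>
#include <errno.h>

#define MAX_IDS 16	/* hypothetical small id space; the patch uses 1..INT_MAX-1 */

enum slot_state { SLOT_FREE, SLOT_RESERVED, SLOT_REFERENCED };

struct id_table {
	enum slot_state state[MAX_IDS];
	unsigned int refs[MAX_IDS];
};

/* Reserve the lowest free id without a refcount (models storing NULL). */
static int id_get(struct id_table *t)
{
	for (int id = 1; id < MAX_IDS; id++) {
		if (t->state[id] == SLOT_FREE) {
			t->state[id] = SLOT_RESERVED;
			return id;
		}
	}
	return -ENOSPC;
}

/* Take a reference: bump an existing count, or start one at 1. */
static int id_ref(struct id_table *t, int id)
{
	if (id < 1 || id >= MAX_IDS)
		return -EINVAL;

	if (t->state[id] == SLOT_REFERENCED) {
		t->refs[id]++;
		return 0;
	}

	/* Reserved-but-unreferenced and never-reserved ids both land here. */
	t->state[id] = SLOT_REFERENCED;
	t->refs[id] = 1;
	return 0;
}

/* Drop a reference; the id becomes reusable once the count hits zero. */
static void id_deref(struct id_table *t, int id)
{
	assert(id >= 1 && id < MAX_IDS);
	assert(t->state[id] == SLOT_REFERENCED);

	if (--t->refs[id] == 0)
		t->state[id] = SLOT_FREE;
}
```

In this model two regions sharing one id call id_ref twice, and the id
is only released after both have called id_deref.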

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 include/net/devlink.h |   3 +-
 net/core/devlink.c    | 141 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 138 insertions(+), 6 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3a7759355434..3a5ff6bea143 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -17,6 +17,7 @@
 #include <linux/refcount.h>
 #include <net/net_namespace.h>
 #include <uapi/linux/devlink.h>
+#include <linux/idr.h>
 
 struct devlink_ops;
 
@@ -28,13 +29,13 @@ struct devlink {
 	struct list_head resource_list;
 	struct list_head param_list;
 	struct list_head region_list;
-	u32 snapshot_id;
 	struct list_head reporter_list;
 	struct mutex reporters_lock; /* protects reporter_list */
 	struct devlink_dpipe_headers *dpipe_headers;
 	struct list_head trap_list;
 	struct list_head trap_group_list;
 	const struct devlink_ops *ops;
+	struct idr snapshot_idr;
 	struct device *dev;
 	possible_net_t _net;
 	struct mutex lock;
diff --git a/net/core/devlink.c b/net/core/devlink.c
index da4e669f425b..9571063846cc 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3760,21 +3760,118 @@ static void devlink_nl_region_notify(struct devlink_region *region,
 	nlmsg_free(msg);
 }
 
+/**
+ * __devlink_region_snapshot_id_ref - Increment reference for a snapshot ID
+ *	@devlink: devlink instance
+ *	@id: the snapshot id being referenced
+ *
+ *	Increments the reference count for the given snapshot id. If the id
+ *	was not yet allocated, it is allocated immediately. If the id was
+ *	allocated but no references exist, a new refcount is created and
+ *	inserted.
+ */
+static int __devlink_region_snapshot_id_ref(struct devlink *devlink, u32 id)
+{
+	struct idr *idr = &devlink->snapshot_idr;
+	refcount_t *ref;
+	void *old_ptr;
+	int err;
+
+	lockdep_assert_held(&devlink->lock);
+
+	if (id < 1 || id >= INT_MAX)
+		return -EINVAL;
+
+	/* Check if a refcount already exists. If so, increment it and exit */
+	ref = idr_find(idr, id);
+	if (ref) {
+		refcount_inc(ref);
+		return 0;
+	}
+
+	/* Allocate a new reference count */
+	ref = kzalloc(sizeof(*ref), GFP_KERNEL);
+	refcount_set(ref, 1);
+
+	/* The id was likely allocated ahead of time using
+	 * devlink_region_snapshot_id_get, so attempt to replace the NULL
+	 * pointer with the refcount. Since idr_find returned NULL,
+	 * idr_replace should either return ERR_PTR(-ENOENT) or NULL.
+	 */
+	old_ptr = idr_replace(idr, ref, id);
+	/* if old_ptr is NULL, we've inserted the reference */
+	if (old_ptr == NULL)
+		return 0;
+	if (PTR_ERR(old_ptr) != -ENOENT) {
+		kfree(ref);
+		return PTR_ERR(old_ptr);
+	}
+
+	/* the snapshot id was not reserved, so reserve it now. */
+	err = idr_alloc(idr, ref, id, id+1, GFP_KERNEL);
+	if (err < 0)
+		return err;
+	WARN_ON(err != id);
+
+	return 0;
+}
+
+/**
+ * __devlink_region_snapshot_id_deref - Decrement reference for a snapshot ID
+ *	@devlink: devlink instance
+ *	@id: the snapshot id being referenced
+ *
+ *	Decrements the reference count for a given snapshot id. If the
+ *	refcount has reached zero then remove the reference from the IDR.
+ */
+static void __devlink_region_snapshot_id_deref(struct devlink *devlink, u32 id)
+{
+	struct idr *idr = &devlink->snapshot_idr;
+	refcount_t *ref;
+
+	lockdep_assert_held(&devlink->lock);
+
+	if (WARN_ON(id < 1 || id >= INT_MAX))
+		return;
+
+	/* Find the reference pointer */
+	ref = idr_find(idr, id);
+	if (!ref) {
+		WARN(true, "no previous reference was inserted");
+		/* this shouldn't happen, but at least attempt to cleanup if
+		 * something went wrong.
+		 */
+		idr_remove(idr, id);
+		return;
+	}
+
+	if (refcount_dec_and_test(ref)) {
+		/* There are no more references, so remove it from the IDR and
+		 * free the reference count.
+		 */
+		idr_remove(idr, id);
+		kfree(ref);
+	}
+}
+
 /**
  *	__devlink_region_snapshot_id_get - get snapshot ID
  *	@devlink: devlink instance
  *
  *	Returns a new snapshot id or a negative error code on failure. Must be
  *	called while holding the devlink instance lock.
+ *
+ *	Snapshot ids are stored in an IDR and reference counted by the number
+ *	of snapshots currently using that id. This function pre-allocates
+ *	a snapshot id but does not fill in a reference count. A later call to
+ *	devlink_region_snapshot_create will update the IDR pointer to
+ *	a reference count. On devlink_region_snapshot_destroy, if there are no
+ *	further references, the id will be removed from the IDR.
  */
 static int __devlink_region_snapshot_id_get(struct devlink *devlink)
 {
 	lockdep_assert_held(&devlink->lock);
-
-	if (devlink->snapshot_id >= INT_MAX)
-		return -ENOSPC;
-
-	return ++devlink->snapshot_id;
+	return idr_alloc(&devlink->snapshot_idr, NULL, 1, INT_MAX, GFP_KERNEL);
 }
 
 /**
@@ -3797,6 +3894,7 @@ __devlink_region_snapshot_create(struct devlink_region *region,
 {
 	struct devlink *devlink = region->devlink;
 	struct devlink_snapshot *snapshot;
+	int err;
 
 	lockdep_assert_held(&devlink->lock);
 
@@ -3811,6 +3909,11 @@ __devlink_region_snapshot_create(struct devlink_region *region,
 	if (!snapshot)
 		return -ENOMEM;
 
+	/* Increment snapshot id reference */
+	err = __devlink_region_snapshot_id_ref(devlink, snapshot_id);
+	if (err)
+		goto err_free_snapshot;
+
 	snapshot->id = snapshot_id;
 	snapshot->region = region;
 	snapshot->data = data;
@@ -3821,15 +3924,25 @@ __devlink_region_snapshot_create(struct devlink_region *region,
 
 	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
 	return 0;
+
+err_free_snapshot:
+	kfree(snapshot);
+	return err;
 }
 
 static void devlink_region_snapshot_del(struct devlink_region *region,
 					struct devlink_snapshot *snapshot)
 {
+	struct devlink *devlink = region->devlink;
+
+	lockdep_assert_held(&devlink_mutex);
+
 	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
 	region->cur_snapshots--;
 	list_del(&snapshot->list);
 	region->ops->destructor(snapshot->data);
+	__devlink_region_snapshot_id_deref(devlink, snapshot->id);
+
 	kfree(snapshot);
 }
 
@@ -6388,6 +6501,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	if (!devlink)
 		return NULL;
 	devlink->ops = ops;
+	idr_init(&devlink->snapshot_idr);
 	__devlink_net_set(devlink, &init_net);
 	INIT_LIST_HEAD(&devlink->port_list);
 	INIT_LIST_HEAD(&devlink->sb_list);
@@ -6480,6 +6594,23 @@ EXPORT_SYMBOL_GPL(devlink_reload_disable);
  */
 void devlink_free(struct devlink *devlink)
 {
+	struct idr *idr = &devlink->snapshot_idr;
+
+	mutex_lock(&devlink->lock);
+	if (!idr_is_empty(idr)) {
+		refcount_t *ref;
+		int id;
+
+		WARN(true, "snapshot IDR is not empty");
+
+		idr_for_each_entry(idr, ref, id) {
+			if (ref)
+				kfree(ref);
+		}
+	}
+	idr_destroy(&devlink->snapshot_idr);
+	mutex_unlock(&devlink->lock);
+
 	mutex_destroy(&devlink->reporters_lock);
 	mutex_destroy(&devlink->lock);
 	WARN_ON(!list_empty(&devlink->trap_group_list));
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (12 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 13/22] devlink: track snapshot ids using an IDR and refcounts Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 17:41   ` Jiri Pirko
  2020-02-14 23:22 ` [RFC PATCH v2 15/22] netdevsim: support taking immediate snapshot via devlink Jacob Keller
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Implement support for the DEVLINK_CMD_REGION_NEW command for creating
snapshots. This new command parallels the existing
DEVLINK_CMD_REGION_DEL.

In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
".snapshot" operation must be implemented in the region's ops structure.

The desired snapshot id may be provided. If the requested id is already
in use, an error will be reported. If no id is provided one will be
selected in the same way as a triggered snapshot.

In either case, the reference count for that id will be incremented
in the snapshot IDR.
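
The id selection rules above can be sketched in userspace C as follows.
This is only a model under stated assumptions, not the devlink code:
struct region, MAX_SNAPSHOTS, next_id and region_new are hypothetical
names, and the per-region array stands in for the snapshot list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SNAPSHOTS 4	/* hypothetical per-region limit */

struct region {
	uint32_t ids[MAX_SNAPSHOTS];	/* ids of snapshots currently stored */
	unsigned int cur_snapshots;
	uint32_t next_id;		/* stand-in for the devlink-wide allocator */
};

static bool region_has_snapshot(const struct region *r, uint32_t id)
{
	for (unsigned int i = 0; i < r->cur_snapshots; i++)
		if (r->ids[i] == id)
			return true;
	return false;
}

/* Returns the id of the new snapshot, or a negative errno. */
static int region_new(struct region *r, bool have_id, uint32_t requested_id)
{
	uint32_t id;

	assert(r->cur_snapshots <= MAX_SNAPSHOTS);

	if (r->cur_snapshots == MAX_SNAPSHOTS)
		return -ENOMEM;	/* same code the patch uses for a full region */

	if (have_id) {
		/* A caller-supplied id must not already exist in this region. */
		if (region_has_snapshot(r, requested_id))
			return -EEXIST;
		id = requested_id;
	} else {
		/* No id given: pick the next one that is not in use. */
		do {
			id = ++r->next_id;
		} while (region_has_snapshot(r, id));
	}

	r->ids[r->cur_snapshots++] = id;
	return (int)id;
}
```

Calling region_new(&r, true, 5) twice on the same region succeeds once
and then fails with -EEXIST, matching the behaviour described above.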

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../networking/devlink/devlink-region.rst     | 12 +++-
 include/net/devlink.h                         |  6 ++
 net/core/devlink.c                            | 72 +++++++++++++++++++
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 1a7683e7acb2..a24faf2b6b7a 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
 Regions may also be used to provide an additional way to debug complex error
 states, but see also :doc:`devlink-health`
 
+Regions may optionally support capturing a snapshot on demand via the
+``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
+requested snapshots must implement the ``.snapshot`` callback for the region
+in its ``devlink_region_ops`` structure.
+
 example usage
 -------------
 
@@ -40,8 +45,11 @@ example usage
     # Delete a snapshot using:
     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
 
-    # Trigger (request) a snapshot be taken:
-    $ devlink region trigger pci/0000:00:05.0/cr-space
+    # Request an immediate snapshot, if supported by the region
+    $ devlink region new pci/0000:00:05.0/cr-space
+
+    # Request an immediate snapshot with a specific id
+    $ devlink region new pci/0000:00:05.0/cr-space snapshot 5
 
     # Dump a snapshot:
     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3a5ff6bea143..3cd0ff2040b2 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -498,10 +498,16 @@ struct devlink_info_req;
  * struct devlink_region_ops - Region operations
  * @name: region name
  * @destructor: callback used to free snapshot memory when deleting
+ * @snapshot: callback to request an immediate snapshot. On success,
+ *            the data variable must be updated to point to the snapshot data.
+ *            The function will be called while the devlink instance lock is
+ *            held.
  */
 struct devlink_region_ops {
 	const char *name;
 	void (*destructor)(const void *data);
+	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
+			u8 **data);
 };
 
 struct devlink_fmsg;
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9571063846cc..b5d1b21e5178 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4045,6 +4045,71 @@ static int devlink_nl_cmd_region_del(struct sk_buff *skb,
 	return 0;
 }
 
+static int
+devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_region *region;
+	const char *region_name;
+	u32 snapshot_id;
+	u8 *data;
+	int err;
+
+	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
+		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
+		return -EINVAL;
+	}
+
+	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+	region = devlink_region_get_by_name(devlink, region_name);
+	if (!region) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "The requested region does not exist");
+		return -EINVAL;
+	}
+
+	if (!region->ops->snapshot) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "The requested region does not support taking an immediate snapshot");
+		return -EOPNOTSUPP;
+	}
+
+	if (region->cur_snapshots == region->max_snapshots) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "The region has reached the maximum number of stored snapshots");
+		return -ENOMEM;
+	}
+
+	if (info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+		/* __devlink_region_snapshot_create will take care of
+		 * inserting the snapshot id into the IDR if necessary.
+		 */
+		snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+
+		if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "The requested snapshot id is already in use");
+			return -EEXIST;
+		}
+	} else {
+		snapshot_id = __devlink_region_snapshot_id_get(devlink);
+	}
+
+	err = region->ops->snapshot(devlink, info->extack, &data);
+	if (err)
+		return err;
+
+	err = __devlink_region_snapshot_create(region, data, snapshot_id);
+	if (err)
+		goto err_free_snapshot_data;
+
+	return 0;
+
+err_free_snapshot_data:
+	region->ops->destructor(data);
+	return err;
+}
+
 static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
 						 struct devlink *devlink,
 						 u8 *chunk, u32 chunk_size,
@@ -6358,6 +6423,13 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_REGION_NEW,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit = devlink_nl_cmd_region_new,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 	{
 		.cmd = DEVLINK_CMD_REGION_DEL,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 15/22] netdevsim: support taking immediate snapshot via devlink
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (13 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 16/22] devlink: simplify arguments for read_snapshot_fill Jacob Keller
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_NEW command.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/netdevsim/dev.c                   | 27 +++++++++++++++----
 .../drivers/net/netdevsim/devlink.sh          | 15 +++++++++++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index e30bd94c3d52..a54b03d49c89 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -38,13 +38,11 @@ static struct dentry *nsim_dev_ddir;
 
 #define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
 
-static ssize_t nsim_dev_take_snapshot_write(struct file *file,
-					    const char __user *data,
-					    size_t count, loff_t *ppos)
+static int
+nsim_dev_take_snapshot(struct devlink *devlink, struct netlink_ext_ack *extack,
+		       u8 **data)
 {
-	struct nsim_dev *nsim_dev = file->private_data;
 	void *dummy_data;
-	int err, id;
 
 	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
 	if (!dummy_data)
@@ -52,6 +50,24 @@ static ssize_t nsim_dev_take_snapshot_write(struct file *file,
 
 	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
 
+	*data = dummy_data;
+
+	return 0;
+}
+
+static ssize_t nsim_dev_take_snapshot_write(struct file *file,
+					    const char __user *data,
+					    size_t count, loff_t *ppos)
+{
+	struct nsim_dev *nsim_dev = file->private_data;
+	u8 *dummy_data;
+	int err, id;
+
+	err = nsim_dev_take_snapshot(priv_to_devlink(nsim_dev), NULL,
+				     &dummy_data);
+	if (err)
+		return err;
+
 	id = devlink_region_snapshot_id_get(priv_to_devlink(nsim_dev));
 	if (id < 0) {
 		pr_err("Failed to get snapshot id\n");
@@ -251,6 +267,7 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 static const struct devlink_region_ops dummy_region_ops = {
 	.name = "dummy",
 	.destructor = &kfree,
+	.snapshot = nsim_dev_take_snapshot,
 };
 
 static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 025a84c2ab5a..f23383fd108c 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -141,6 +141,21 @@ regions_test()
 
 	check_region_snapshot_count dummy post-first-delete 2
 
+	devlink region new $DL_HANDLE/dummy
+	check_err $? "Failed to create a new snapshot"
+
+	check_region_snapshot_count dummy post-request 3
+
+	devlink region new $DL_HANDLE/dummy snapshot 25
+	check_err $? "Failed to create a new snapshot with id 25"
+
+	check_region_snapshot_count dummy post-request 4
+
+	devlink region del $DL_HANDLE/dummy snapshot 25
+	check_err $? "Failed to delete snapshot with id 25"
+
+	check_region_snapshot_count dummy post-request 3
+
 	log_test "regions test"
 }
 
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 16/22] devlink: simplify arguments for read_snapshot_fill
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (14 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 15/22] netdevsim: support taking immediate snapshot via devlink Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 17/22] devlink: use min_t to calculate data_size Jacob Keller
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Simplify the devlink_nl_region_read_snapshot_fill function by looking up
the snapshot pointer ahead of time and passing that instead of the
region pointer.

Check for the snapshot existence within the region_read_dumpit function
and exit early if it does not exist.

This also enables removing the dump parameter and the netlink attrs
parameter.

Calculate the proper end_offset before calling the read_snapshot_fill
function.
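
For illustration only, the end_offset calculation that moves into the
caller can be written as a small standalone C helper. The name
read_end_offset and its have_range flag are hypothetical; they model
whether the chunk address and length attributes were supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick the end offset for a region read. */
static uint64_t read_end_offset(uint64_t region_size, bool have_range,
				uint64_t addr, uint64_t len)
{
	uint64_t end_offset;

	if (!have_range)
		return region_size;	/* full dump: read the whole region */

	end_offset = addr + len;
	if (end_offset > region_size)
		end_offset = region_size;	/* clamp a request past the end */

	assert(end_offset <= region_size);
	return end_offset;
}
```

With a 1024 byte region, a request for 256 bytes at offset 900 is
clamped so that the read stops at 1024.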

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 47 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index b5d1b21e5178..e5bc0046f13f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4141,30 +4141,19 @@ static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
 
 #define DEVLINK_REGION_READ_CHUNK_SIZE 256
 
-static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
-						struct devlink *devlink,
-						struct devlink_region *region,
-						struct nlattr **attrs,
-						u64 start_offset,
-						u64 end_offset,
-						bool dump,
-						u64 *new_offset)
+static int
+devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
+				     struct devlink *devlink,
+				     struct devlink_snapshot *snapshot,
+				     u64 start_offset,
+				     u64 end_offset,
+				     u64 *new_offset)
 {
-	struct devlink_snapshot *snapshot;
 	u64 curr_offset = start_offset;
-	u32 snapshot_id;
 	int err = 0;
 
 	*new_offset = start_offset;
 
-	snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
-	snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
-	if (!snapshot)
-		return -EINVAL;
-
-	if (end_offset > region->size || dump)
-		end_offset = region->size;
-
 	while (curr_offset < end_offset) {
 		u32 data_size;
 		u8 *data;
@@ -4194,11 +4183,12 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	const struct genl_dumpit_info *info = genl_dumpit_info(cb);
 	u64 ret_offset, start_offset, end_offset = 0;
 	struct nlattr **attrs = info->attrs;
+	struct devlink_snapshot *snapshot;
 	struct devlink_region *region;
 	struct nlattr *chunks_attr;
 	const char *region_name;
 	struct devlink *devlink;
-	bool dump = true;
+	u32 snapshot_id;
 	void *hdr;
 	int err;
 
@@ -4232,6 +4222,13 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out_unlock;
 	}
 
+	snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+	snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+	if (!snapshot) {
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
 			  DEVLINK_CMD_REGION_READ);
@@ -4262,13 +4259,15 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 
 		end_offset = nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR]);
 		end_offset += nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]);
-		dump = false;
+
+		if (end_offset > region->size)
+			end_offset = region->size;
+	} else {
+		end_offset = region->size;
 	}
 
-	err = devlink_nl_region_read_snapshot_fill(skb, devlink,
-						   region, attrs,
-						   start_offset,
-						   end_offset, dump,
+	err = devlink_nl_region_read_snapshot_fill(skb, devlink, snapshot,
+						   start_offset, end_offset,
 						   &ret_offset);
 
 	if (err && err != -EMSGSIZE)
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 17/22] devlink: use min_t to calculate data_size
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (15 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 16/22] devlink: simplify arguments for read_snapshot_fill Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 18/22] devlink: report extended error message in region_read_dumpit Jacob Keller
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The calculation of data_size in devlink_nl_region_read_snapshot_fill
uses an if statement that is better expressed using the min_t macro.
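
A userspace sketch of the same clamping, for readers without the kernel
headers at hand. min_u32 is a hypothetical stand-in for min_t(u32, a, b)
that casts both operands to 32 bits before comparing, and count_chunks
is an illustrative loop, not code from this patch. It assumes the
remaining length fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define READ_CHUNK_SIZE 256u	/* mirrors DEVLINK_REGION_READ_CHUNK_SIZE */

/* Stand-in for min_t(u32, a, b): cast both sides to u32, then compare. */
static uint32_t min_u32(uint64_t a, uint64_t b)
{
	uint32_t x = (uint32_t)a, y = (uint32_t)b;

	return x < y ? x : y;
}

/* Number of chunks needed to cover [start, end). */
static unsigned int count_chunks(uint64_t start, uint64_t end)
{
	unsigned int chunks = 0;
	uint64_t curr = start;

	while (curr < end) {
		uint32_t data_size = min_u32(end - curr, READ_CHUNK_SIZE);

		assert(data_size > 0 && data_size <= READ_CHUNK_SIZE);
		curr += data_size;
		chunks++;
	}
	return chunks;
}
```

A 600 byte read is split into chunks of 256, 256 and 88 bytes.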

Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index e5bc0046f13f..60f4d231470e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4155,14 +4155,10 @@ devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
 	*new_offset = start_offset;
 
 	while (curr_offset < end_offset) {
-		u32 data_size;
+		u32 data_size = min_t(u32, end_offset - curr_offset,
+				      DEVLINK_REGION_READ_CHUNK_SIZE);
 		u8 *data;
 
-		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
-			data_size = end_offset - curr_offset;
-		else
-			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
-
 		data = &snapshot->data[curr_offset];
 		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
 							    data, data_size,
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 18/22] devlink: report extended error message in region_read_dumpit
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (16 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 17/22] devlink: use min_t to calculate data_size Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 19/22] devlink: remove unnecessary parameter from chunk_fill function Jacob Keller
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Report extended error details in the devlink_nl_cmd_region_read_dumpit
function, by using the extack structure from the netlink_callback.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 60f4d231470e..e81b56f83128 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4199,8 +4199,14 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 
 	mutex_lock(&devlink->lock);
 
-	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+	if (!attrs[DEVLINK_ATTR_REGION_NAME]) {
+		NL_SET_ERR_MSG_MOD(cb->extack, "No region name provided");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (!attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+		NL_SET_ERR_MSG_MOD(cb->extack, "No snapshot id provided");
 		err = -EINVAL;
 		goto out_unlock;
 	}
@@ -4208,6 +4214,8 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
 	region = devlink_region_get_by_name(devlink, region_name);
 	if (!region) {
+		NL_SET_ERR_MSG_MOD(cb->extack,
+				   "The requested region does not exist");
 		err = -EINVAL;
 		goto out_unlock;
 	}
@@ -4221,6 +4229,8 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
 	snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
 	if (!snapshot) {
+		NL_SET_ERR_MSG_MOD(cb->extack,
+				   "The requested snapshot id does not exist");
 		err = -EINVAL;
 		goto out_unlock;
 	}
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 19/22] devlink: remove unnecessary parameter from chunk_fill function
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (17 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 18/22] devlink: report extended error message in region_read_dumpit Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 20/22] devlink: refactor region_read_snapshot_fill to use a callback function Jacob Keller
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The devlink parameter of the devlink_nl_cmd_region_read_chunk_fill
function is not used. Remove it, to simplify the function signature.

Once removed, it is also obvious that the devlink parameter is not
necessary for the devlink_nl_region_read_snapshot_fill either.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index e81b56f83128..a722272f42b4 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4111,7 +4111,6 @@ devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info)
 }
 
 static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
-						 struct devlink *devlink,
 						 u8 *chunk, u32 chunk_size,
 						 u64 addr)
 {
@@ -4143,7 +4142,6 @@ static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
 
 static int
 devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
-				     struct devlink *devlink,
 				     struct devlink_snapshot *snapshot,
 				     u64 start_offset,
 				     u64 end_offset,
@@ -4160,8 +4158,8 @@ devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
 		u8 *data;
 
 		data = &snapshot->data[curr_offset];
-		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
-							    data, data_size,
+		err = devlink_nl_cmd_region_read_chunk_fill(skb, data,
+							    data_size,
 							    curr_offset);
 		if (err)
 			break;
@@ -4272,9 +4270,8 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		end_offset = region->size;
 	}
 
-	err = devlink_nl_region_read_snapshot_fill(skb, devlink, snapshot,
-						   start_offset, end_offset,
-						   &ret_offset);
+	err = devlink_nl_region_read_snapshot_fill(skb, snapshot, start_offset,
+						   end_offset, &ret_offset);
 
 	if (err && err != -EMSGSIZE)
 		goto nla_put_failure;
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 20/22] devlink: refactor region_read_snapshot_fill to use a callback function
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (18 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 19/22] devlink: remove unnecessary parameter from chunk_fill function Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 21/22] devlink: support directly reading from region memory Jacob Keller
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The devlink_nl_region_read_snapshot_fill function is used to copy the
contents of a snapshot into a message for reporting to userspace via the
DEVLINK_CMD_REGION_READ netlink message.

A future change is going to add support for directly reading from
a region. Almost all of the logic for this new capability is identical.

To help reduce code duplication and make this logic more generic,
refactor the function to take a cb and cb_priv pointer for doing the
actual copy.

Add a devlink_region_snapshot_fill implementation that will simply copy
the relevant chunk of the region. This does require allocating some
storage for the chunk as opposed to simply passing the correct address
forward to the devlink_nl_cmd_region_read_chunk_fill function.

A future change to implement support for directly reading from a region
without a snapshot will provide a separate implementation that calls the
newly added devlink region operation.
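
The shape of the refactor can be sketched in userspace C as below. This
is a simplified model, not the code in this patch: the output is a flat
byte buffer rather than a netlink message, the scratch buffer lives on
the stack instead of coming from kzalloc, the extack argument is
omitted, and the names chunk_fill_t, read_fill and snapshot_fill are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_CHUNK_SIZE 256

/* Callback that fills one chunk starting at curr_offset. */
typedef int chunk_fill_t(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t curr_offset);

struct snapshot {
	const uint8_t *data;
};

/* Snapshot-backed implementation: copy straight out of the stored data. */
static int snapshot_fill(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t curr_offset)
{
	const struct snapshot *snap = cb_priv;

	memcpy(chunk, &snap->data[curr_offset], chunk_size);
	return 0;
}

/* Generic loop: one scratch buffer is re-used for every chunk. */
static int read_fill(uint8_t *out, chunk_fill_t *cb, void *cb_priv,
		     uint64_t start, uint64_t end, uint64_t *new_offset)
{
	uint8_t buf[READ_CHUNK_SIZE];
	uint64_t curr = start;
	int err = 0;

	assert(cb != NULL);

	while (curr < end) {
		uint64_t remaining = end - curr;
		uint32_t size = remaining < READ_CHUNK_SIZE ?
				(uint32_t)remaining : READ_CHUNK_SIZE;

		err = cb(cb_priv, buf, size, curr);
		if (err)
			break;

		memcpy(&out[curr - start], buf, size);
		curr += size;
	}

	*new_offset = curr;	/* how far the read got, even on error */
	return err;
}
```

A second callback that reads from the device instead of a snapshot
could be passed to the same loop without changing it.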

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 net/core/devlink.c | 43 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index a722272f42b4..c200701e1839 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4140,24 +4140,34 @@ static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
 
 #define DEVLINK_REGION_READ_CHUNK_SIZE 256
 
+typedef int devlink_chunk_fill_t(void *cb_priv, u8 *chunk, u32 chunk_size,
+				 u64 curr_offset,
+				 struct netlink_ext_ack *extack);
+
 static int
-devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
-				     struct devlink_snapshot *snapshot,
-				     u64 start_offset,
-				     u64 end_offset,
-				     u64 *new_offset)
+devlink_nl_region_read_fill(struct sk_buff *skb, devlink_chunk_fill_t *cb,
+			    void *cb_priv, u64 start_offset, u64 end_offset,
+			    u64 *new_offset, struct netlink_ext_ack *extack)
 {
 	u64 curr_offset = start_offset;
 	int err = 0;
+	u8 *data;
+
+	/* Allocate and re-use a single buffer */
+	data = kzalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
 
 	*new_offset = start_offset;
 
 	while (curr_offset < end_offset) {
 		u32 data_size = min_t(u32, end_offset - curr_offset,
 				      DEVLINK_REGION_READ_CHUNK_SIZE);
-		u8 *data;
 
-		data = &snapshot->data[curr_offset];
+		err = cb(cb_priv, data, data_size, curr_offset, extack);
+		if (err)
+			break;
+
 		err = devlink_nl_cmd_region_read_chunk_fill(skb, data,
 							    data_size,
 							    curr_offset);
@@ -4168,9 +4178,23 @@ devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
 	}
 	*new_offset = curr_offset;
 
+	kfree(data);
+
 	return err;
 }
 
+static int
+devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
+			     u64 curr_offset,
+			     struct __always_unused netlink_ext_ack *extack)
+{
+	struct devlink_snapshot *snapshot = cb_priv;
+
+	memcpy(chunk, &snapshot->data[curr_offset], chunk_size);
+
+	return 0;
+}
+
 static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 					     struct netlink_callback *cb)
 {
@@ -4270,8 +4294,9 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		end_offset = region->size;
 	}
 
-	err = devlink_nl_region_read_snapshot_fill(skb, snapshot, start_offset,
-						   end_offset, &ret_offset);
+	err = devlink_nl_region_read_fill(skb, &devlink_region_snapshot_fill,
+					  snapshot, start_offset, end_offset,
+					  &ret_offset, cb->extack);
 
 	if (err && err != -EMSGSIZE)
 		goto nla_put_failure;
-- 
2.25.0.368.g28a2d05eebfb
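
The callback-based chunk loop from the patch above can be sketched in
plain userspace C. This is only an illustration under assumptions: the
names chunk_fill_t, read_fill, snapshot_fill and struct snapshot are
hypothetical stand-ins, the output buffer replaces the netlink message,
and calloc/free replace kzalloc/kfree.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 256 /* mirrors DEVLINK_REGION_READ_CHUNK_SIZE */

/* Callback that fills `chunk` with `chunk_size` bytes starting at `offset`. */
typedef int chunk_fill_t(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t offset);

/* Hypothetical stand-in for a snapshot: a flat byte array. */
struct snapshot {
	const uint8_t *data;
};

/* Snapshot flavour of the callback: copy straight out of the stored data. */
static int snapshot_fill(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t offset)
{
	const struct snapshot *snap = cb_priv;

	memcpy(chunk, &snap->data[offset], chunk_size);
	return 0;
}

/*
 * Walk [start, end) in CHUNK_SIZE pieces, re-using one bounce buffer.
 * `out` stands in for the netlink message and must hold end - start bytes.
 */
static int read_fill(uint8_t *out, chunk_fill_t *cb, void *cb_priv,
		     uint64_t start, uint64_t end, uint64_t *new_offset)
{
	uint64_t curr = start;
	uint8_t *buf;
	int err = 0;

	assert(cb != NULL);

	/* Allocate and re-use a single buffer, as the patch does. */
	buf = calloc(1, CHUNK_SIZE);
	if (!buf)
		return -ENOMEM;

	*new_offset = start;

	while (curr < end) {
		uint64_t left = end - curr;
		uint32_t size = left < CHUNK_SIZE ? (uint32_t)left : CHUNK_SIZE;

		/* Let the callback produce the bytes for this chunk. */
		err = cb(cb_priv, buf, size, curr);
		if (err)
			break;

		memcpy(&out[curr - start], buf, size);
		curr += size;
	}
	*new_offset = curr;

	free(buf);
	return err;
}
```

The loop itself never needs to know where the bytes come from, which is
what lets a later patch plug in a direct-read callback without touching
the chunking logic.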



* [RFC PATCH v2 21/22] devlink: support directly reading from region memory
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (19 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 20/22] devlink: refactor region_read_snapshot_fill to use a callback function Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 22/22] ice: add a devlink region to dump shadow RAM contents Jacob Keller
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Add a new region operation for directly reading from a region, without
taking a full snapshot.

Extend the DEVLINK_CMD_REGION_READ to allow directly reading from
a region, if supported. Instead of reporting a missing snapshot id as
invalid, check to see if direct reading is implemented for the region.
If so, use the direct read operation to grab the current contents of the
region.

This new behavior of DEVLINK_CMD_REGION_READ should be backwards
compatible. Previously, kernels rejected a DEVLINK_CMD_REGION_READ
without a snapshot id with -EINVAL. They will now either accept the
call or report -EOPNOTSUPP for regions which do not implement direct
access.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../networking/devlink/devlink-region.rst     |  8 +++
 include/net/devlink.h                         |  6 ++
 net/core/devlink.c                            | 59 +++++++++++++------
 3 files changed, 55 insertions(+), 18 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index a24faf2b6b7a..aeb4e6a5051b 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -25,6 +25,10 @@ Regions may optionally support capturing a snapshot on demand via the
 requested snapshots must implement the ``.snapshot`` callback for the region
 in its ``devlink_region_ops`` structure.
 
+Regions may optionally allow directly reading from their contents without a
+snapshot. A driver wishing to enable this for a region should implement the
+``.read`` callback in the ``devlink_region_ops`` structure.
+
 example usage
 -------------
 
@@ -63,6 +67,10 @@ example usage
             length 16
     0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
 
+    # Read from the region without a snapshot
+    $ devlink region read pci/0000:00:05.0/fw-health address 16 length 16
+    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
+
 As regions are likely very device or driver specific, no generic regions are
 defined. See the driver-specific documentation files for information on the
 specific regions a driver supports.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3cd0ff2040b2..3f00e0890d92 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -502,12 +502,18 @@ struct devlink_info_req;
  *            the data variable must be updated to point to the snapshot data.
  *            The function will be called while the devlink instance lock is
  *            held.
+ * @read: callback to directly read a portion of the region. On success,
+ *            the data pointer will be updated with the contents of the
+ *            requested portion of the region. The function will be called
+ *            while the devlink instance lock is held.
  */
 struct devlink_region_ops {
 	const char *name;
 	void (*destructor)(const void *data);
 	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
 			u8 **data);
+	int (*read)(struct devlink *devlink, struct netlink_ext_ack *extack,
+		    u64 curr_offset, u32 data_size, u8 *data);
 };
 
 struct devlink_fmsg;
diff --git a/net/core/devlink.c b/net/core/devlink.c
index c200701e1839..86fa9d53157e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4195,18 +4195,30 @@ devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
 	return 0;
 }
 
+static int
+devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
+			   u64 curr_offset, struct netlink_ext_ack *extack)
+{
+	struct devlink_region *region = cb_priv;
+	struct devlink *devlink = region->devlink;
+
+	return region->ops->read(devlink, extack, curr_offset, chunk_size,
+				 chunk);
+}
+
 static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 					     struct netlink_callback *cb)
 {
 	const struct genl_dumpit_info *info = genl_dumpit_info(cb);
 	u64 ret_offset, start_offset, end_offset = 0;
 	struct nlattr **attrs = info->attrs;
-	struct devlink_snapshot *snapshot;
+	devlink_chunk_fill_t *region_cb;
 	struct devlink_region *region;
 	struct nlattr *chunks_attr;
 	const char *region_name;
 	struct devlink *devlink;
-	u32 snapshot_id;
+	void *region_cb_priv;
+	bool direct;
 	void *hdr;
 	int err;
 
@@ -4227,12 +4239,6 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out_unlock;
 	}
 
-	if (!attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
-		NL_SET_ERR_MSG_MOD(cb->extack, "No snapshot id provided");
-		err = -EINVAL;
-		goto out_unlock;
-	}
-
 	region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
 	region = devlink_region_get_by_name(devlink, region_name);
 	if (!region) {
@@ -4248,13 +4254,30 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out_unlock;
 	}
 
-	snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
-	snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
-	if (!snapshot) {
-		NL_SET_ERR_MSG_MOD(cb->extack,
-				   "The requested snapshot id does not exist");
-		err = -EINVAL;
-		goto out_unlock;
+	direct = !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID];
+
+	if (direct) {
+		if (!region->ops->read) {
+			NL_SET_ERR_MSG_MOD(cb->extack,
+					   "The requested region does not support direct read");
+			err = -EOPNOTSUPP;
+			goto out_unlock;
+		}
+		region_cb = &devlink_region_direct_fill;
+		region_cb_priv = region;
+	} else {
+		u32 snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+		struct devlink_snapshot *snapshot;
+
+		snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+		if (!snapshot) {
+			NL_SET_ERR_MSG_MOD(cb->extack,
+					   "The requested snapshot id does not exist");
+			err = -EINVAL;
+			goto out_unlock;
+		}
+		region_cb = &devlink_region_snapshot_fill;
+		region_cb_priv = snapshot;
 	}
 
 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
@@ -4294,9 +4317,9 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		end_offset = region->size;
 	}
 
-	err = devlink_nl_region_read_fill(skb, &devlink_region_snapshot_fill,
-					  snapshot, start_offset, end_offset,
-					  &ret_offset, cb->extack);
+	err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv,
+					  start_offset, end_offset, &ret_offset,
+					  cb->extack);
 
 	if (err && err != -EMSGSIZE)
 		goto nla_put_failure;
-- 
2.25.0.368.g28a2d05eebfb
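
The selection between the snapshot path and the direct-read path in the
patch above can be sketched in userspace C as follows. Everything here
is an assumption made for illustration: select_fill, zero_read and the
reduced struct region, struct region_ops and struct snapshot are
hypothetical, and the have_snapshot_id flag stands in for the presence
of the DEVLINK_ATTR_REGION_SNAPSHOT_ID attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int chunk_fill_t(void *cb_priv, uint8_t *chunk, uint32_t size,
			 uint64_t offset);

struct region_ops {
	/* Optional: NULL when the region cannot be read directly. */
	int (*read)(uint64_t offset, uint32_t size, uint8_t *data);
};

struct region {
	const struct region_ops *ops;
};

struct snapshot {
	const uint8_t *data;
};

/* Direct flavour: forward the chunk request to the region's read op. */
static int direct_fill(void *cb_priv, uint8_t *chunk, uint32_t size,
		       uint64_t offset)
{
	struct region *region = cb_priv;

	return region->ops->read(offset, size, chunk);
}

/* Snapshot flavour: copy from the stored snapshot data. */
static int snapshot_fill(void *cb_priv, uint8_t *chunk, uint32_t size,
			 uint64_t offset)
{
	struct snapshot *snap = cb_priv;

	memcpy(chunk, &snap->data[offset], size);
	return 0;
}

/* Hypothetical driver read op used for demonstration: returns zeroes. */
static int zero_read(uint64_t offset, uint32_t size, uint8_t *data)
{
	(void)offset;
	memset(data, 0, size);
	return 0;
}

/*
 * Pick the fill callback. Without a snapshot id the region must provide
 * a read op (-EOPNOTSUPP otherwise). With a snapshot id, the lookup
 * result must be non-NULL (-EINVAL otherwise).
 */
static int select_fill(struct region *region, bool have_snapshot_id,
		       struct snapshot *snapshot,
		       chunk_fill_t **cb, void **cb_priv)
{
	assert(region && cb && cb_priv);

	if (!have_snapshot_id) {
		if (!region->ops->read)
			return -EOPNOTSUPP;
		*cb = direct_fill;
		*cb_priv = region;
		return 0;
	}

	if (!snapshot)
		return -EINVAL;
	*cb = snapshot_fill;
	*cb_priv = snapshot;
	return 0;
}
```

Keeping the two error codes distinct lets userspace tell "this region
has no direct read" apart from "that snapshot id does not exist".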



* [RFC PATCH v2 22/22] ice: add a devlink region to dump shadow RAM contents
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (20 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 21/22] devlink: support directly reading from region memory Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 1/2] devlink: add support for DEVLINK_CMD_REGION_NEW Jacob Keller
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Add a devlink region for exposing the device's Shadow RAM contents.
Support immediate snapshots by implementing the .snapshot callback.

Currently, no driver event triggers a snapshot automatically. Users must
request a snapshot via the new DEVLINK_CMD_REGION_NEW command.

The recently added .read region operation is also implemented, enabling
direct access to the Shadow RAM contents without a snapshot. This is
useful when the atomic guarantee of a full snapshot isn't necessary and
when userspace only wants to read a small portion of the region.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 Documentation/networking/devlink/ice.rst     |  28 ++++
 drivers/net/ethernet/intel/ice/ice.h         |   2 +
 drivers/net/ethernet/intel/ice/ice_devlink.c | 145 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h |   3 +
 drivers/net/ethernet/intel/ice/ice_main.c    |   4 +
 5 files changed, 182 insertions(+)

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 10ec6c1900b0..452498fc0858 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -57,3 +57,31 @@ The ``ice`` driver reports the following versions
       - 0x80001709
       - Unique identifier of the NVM image contents, also known as the
         EETRACK id.
+
+Regions
+=======
+
+The ``ice`` driver enables access to the contents of the Shadow RAM portion
+of the flash chip via the ``shadow-ram`` region.
+
+Users can request an immediate capture of a snapshot via the
+``DEVLINK_CMD_REGION_NEW``
+
+.. code:: shell
+
+    $ devlink region new pci/0000:01:00.0/shadow-ram
+    $ devlink region dump pci/0000:01:00.0/shadow-ram snapshot 1
+
+Directly reading a portion of the Shadow RAM without a snapshot is also
+supported
+
+.. code:: shell
+
+    $ devlink region dump pci/0000:01:00.0/shadow-ram
+    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
+    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
+    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
+    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
+
+    $ devlink region read pci/0000:01:00.0/shadow-ram address 0 length 16
+    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a195135f840f..43deda152dd3 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -350,6 +350,8 @@ struct ice_pf {
 	/* devlink port data */
 	struct devlink_port devlink_port;
 
+	struct devlink_region *sr_region;
+
 	/* OS reserved IRQ details */
 	struct msix_entry *msix_entries;
 	struct ice_res_tracker *irq_tracker;
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 1f755b98d785..b78687aed3c8 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -213,3 +213,148 @@ void ice_devlink_destroy_port(struct ice_pf *pf)
 	devlink_port_type_clear(&pf->devlink_port);
 	devlink_port_unregister(&pf->devlink_port);
 }
+
+/**
+ * ice_devlink_sr_read - Read a portion of the shadow RAM
+ * @devlink: the devlink instance
+ * @extack: netlink extended ack structure
+ * @curr_offset: offset to start at
+ * @data_size: portion of the region to read
+ * @data: buffer to store region contents
+ *
+ * This function is called to directly read from the shadow-ram region in
+ * response to a DEVLINK_CMD_REGION_READ without a snapshot id.
+ *
+ * @returns zero on success and updates the contents of the data region,
+ * otherwise returns a non-zero error code on failure.
+ */
+static int
+ice_devlink_sr_read(struct devlink *devlink, struct netlink_ext_ack *extack,
+		    u64 curr_offset, u32 data_size, u8 *data)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_hw *hw = &pf->hw;
+	enum ice_status status;
+
+	if (curr_offset + data_size > hw->nvm.sr_words * sizeof(u16))
+		return -ERANGE;
+
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (status) {
+		dev_dbg(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to acquire NVM semaphore");
+		return -EIO;
+	}
+
+	status = ice_read_flat_nvm(hw, curr_offset, &data_size, data, true);
+	if (status) {
+		dev_dbg(dev, "ice_read_flat_nvm failed after reading %u data_size from offset %llu, err %d aq_err %d\n",
+			data_size, curr_offset, status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to read Shadow RAM contents");
+		ice_release_nvm(hw);
+		return -EIO;
+	}
+
+	ice_release_nvm(hw);
+
+	return 0;
+}
+
+/**
+ * ice_devlink_sr_snapshot - Capture a snapshot of the Shadow RAM contents
+ * @devlink: the devlink instance
+ * @extack: extended ACK response structure
+ * @data: on exit points to snapshot data buffer
+ *
+ * This function is called in response to the DEVLINK_CMD_REGION_NEW for
+ * the shadow-ram devlink region. It captures a snapshot of the shadow ram
+ * contents. This snapshot can later be viewed via the devlink-region
+ * interface.
+ *
+ * @returns zero on success, and updates the data pointer. Returns a non-zero
+ * error code on failure.
+ */
+static int
+ice_devlink_sr_snapshot(struct devlink *devlink, struct netlink_ext_ack *extack,
+			u8 **data)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_hw *hw = &pf->hw;
+	enum ice_status status;
+	void *sr_data;
+	u32 sr_size;
+
+	sr_size = hw->nvm.sr_words * sizeof(u16);
+	sr_data = kzalloc(sr_size, GFP_KERNEL);
+	if (!sr_data) {
+		NL_SET_ERR_MSG_MOD(extack, "Out of memory");
+		return -ENOMEM;
+	}
+
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (status) {
+		dev_dbg(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
+			status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to acquire NVM semaphore");
+		kfree(sr_data);
+		return -EIO;
+	}
+
+	status = ice_read_flat_nvm(hw, 0, &sr_size, sr_data, true);
+	if (status) {
+		dev_dbg(dev, "ice_read_flat_nvm failed after reading %u bytes, err %d aq_err %d\n",
+			sr_size, status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to read Shadow RAM contents");
+		ice_release_nvm(hw);
+		kfree(sr_data);
+		return -EIO;
+	}
+
+	ice_release_nvm(hw);
+
+	*data = sr_data;
+
+	return 0;
+}
+
+static const struct devlink_region_ops ice_sr_region_ops = {
+	.name = "shadow-ram",
+	.destructor = kfree,
+	.snapshot = ice_devlink_sr_snapshot,
+	.read = ice_devlink_sr_read,
+};
+
+/**
+ * ice_devlink_init_regions - Initialize devlink regions
+ * @pf: the PF device structure
+ *
+ * Create devlink regions used to enable access to dump the contents of the
+ * flash memory on the device.
+ */
+void ice_devlink_init_regions(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	u64 shadow_ram_size;
+
+	shadow_ram_size = pf->hw.nvm.sr_words * sizeof(u16);
+	pf->sr_region = devlink_region_create(devlink, &ice_sr_region_ops, 1,
+					      shadow_ram_size);
+	if (IS_ERR(pf->sr_region))
+		dev_warn(dev, "failed to create shadow-ram devlink region, err %ld\n",
+			 PTR_ERR(pf->sr_region));
+}
+
+/**
+ * ice_devlink_destroy_regions - Destroy devlink regions
+ * @pf: the PF device structure
+ *
+ * Remove previously created regions for this PF.
+ */
+void ice_devlink_destroy_regions(struct ice_pf *pf)
+{
+	devlink_region_destroy(pf->sr_region);
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
index f94dc93c24c5..6e806a08dc23 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.h
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -11,4 +11,7 @@ void ice_devlink_unregister(struct ice_pf *pf);
 int ice_devlink_create_port(struct ice_pf *pf);
 void ice_devlink_destroy_port(struct ice_pf *pf);
 
+void ice_devlink_init_regions(struct ice_pf *pf);
+void ice_devlink_destroy_regions(struct ice_pf *pf);
+
 #endif /* _ICE_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f2cca810977d..3d199596e17d 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3245,6 +3245,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		goto err_init_pf_unroll;
 	}
 
+	ice_devlink_init_regions(pf);
+
 	pf->num_alloc_vsi = hw->func_caps.guar_num_vsi;
 	if (!pf->num_alloc_vsi) {
 		err = -EIO;
@@ -3359,6 +3361,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	devm_kfree(dev, pf->vsi);
 err_init_pf_unroll:
 	ice_deinit_pf(pf);
+	ice_devlink_destroy_regions(pf);
 	ice_deinit_hw(hw);
 err_exit_unroll:
 	ice_devlink_unregister(pf);
@@ -3398,6 +3401,7 @@ static void ice_remove(struct pci_dev *pdev)
 		ice_vsi_free_q_vectors(pf->vsi[i]);
 	}
 	ice_deinit_pf(pf);
+	ice_devlink_destroy_regions(pf);
 	ice_deinit_hw(&pf->hw);
 	ice_devlink_unregister(pf);
 
-- 
2.25.0.368.g28a2d05eebfb
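
The range check at the top of ice_devlink_sr_read can be sketched on its
own in userspace C. This is a sketch under assumptions: sr_size_bytes
and sr_check_range are hypothetical helper names, and sr_words is
assumed to be a 16-bit word count. The comparison is also written so
that offset + size cannot wrap around, which is a slightly different
formulation from the one in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Shadow RAM size in bytes; the region is counted in 16-bit words. */
static uint64_t sr_size_bytes(uint16_t sr_words)
{
	return (uint64_t)sr_words * sizeof(uint16_t);
}

/*
 * Return 0 if [offset, offset + size) lies inside the Shadow RAM,
 * -ERANGE otherwise. Written to avoid overflow in offset + size.
 */
static int sr_check_range(uint64_t offset, uint32_t size, uint16_t sr_words)
{
	uint64_t total = sr_size_bytes(sr_words);

	if (size > total || offset > total - size)
		return -ERANGE;

	return 0;
}
```

With sr_words of 8 the region is 16 bytes long, so a 16 byte read at
offset 0 passes while the same read at offset 1 is rejected.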



* [RFC PATCH v2 1/2] devlink: add support for DEVLINK_CMD_REGION_NEW
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (21 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 22/22] ice: add a devlink region to dump shadow RAM contents Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-02-14 23:22 ` [RFC PATCH v2 2/2] devlink: stop requiring snapshot for regions Jacob Keller
  2020-03-02 16:27 ` [RFC PATCH v2 00/22] devlink region updates Jiri Pirko
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

Add support to request that a new snapshot be taken immediately for
a devlink region. Optionally allow specifying the snapshot id to use. If
no snapshot id is provided, the kernel will select a suitable id
automatically.

If the region does not support snapshots on demand, the command will
return an error indicating the operation is not supported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 devlink/devlink.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index f9e58c1d7394..71c300ba16ed 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -6347,10 +6347,27 @@ static int cmd_region_read(struct dl *dl)
 	return err;
 }
 
+static int cmd_region_snapshot_new(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_REGION_NEW,
+			NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION,
+				DL_OPT_REGION_SNAPSHOT_ID);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
 static void cmd_region_help(void)
 {
 	pr_err("Usage: devlink region show [ DEV/REGION ]\n");
 	pr_err("       devlink region del DEV/REGION snapshot SNAPSHOT_ID\n");
+	pr_err("       devlink region new DEV/REGION [snapshot SNAPSHOT_ID]\n");
 	pr_err("       devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]\n");
 	pr_err("       devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length LENGTH\n");
 }
@@ -6374,6 +6391,9 @@ static int cmd_region(struct dl *dl)
 	} else if (dl_argv_match(dl, "read")) {
 		dl_arg_inc(dl);
 		return cmd_region_read(dl);
+	} else if (dl_argv_match(dl, "new")) {
+		dl_arg_inc(dl);
+		return cmd_region_snapshot_new(dl);
 	}
 	pr_err("Command \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
-- 
2.25.0.368.g28a2d05eebfb



* [RFC PATCH v2 2/2] devlink: stop requiring snapshot for regions
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (22 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 1/2] devlink: add support for DEVLINK_CMD_REGION_NEW Jacob Keller
@ 2020-02-14 23:22 ` Jacob Keller
  2020-03-02 16:27 ` [RFC PATCH v2 00/22] devlink region updates Jiri Pirko
  24 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-14 23:22 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba, Jacob Keller

The region dump and region read commands currently require a snapshot
id to work. Recent changes to the kernel have enabled optionally
supporting direct read of a region's contents without a snapshot id.

Enable this by allowing the read and dump commands to execute without
a snapshot id. On older kernels, this will return -EINVAL as the kernel
will reject such a command. On newer kernels, this will directly read
the region contents without taking a snapshot. If a region does not
support direct read, it will return -EOPNOTSUPP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 devlink/devlink.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 71c300ba16ed..f98f2dc034ea 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -6312,8 +6312,8 @@ static int cmd_region_dump(struct dl *dl)
 	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_REGION_READ,
 			       NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP);
 
-	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION |
-				DL_OPT_REGION_SNAPSHOT_ID, 0);
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION,
+				DL_OPT_REGION_SNAPSHOT_ID);
 	if (err)
 		return err;
 
@@ -6334,8 +6334,8 @@ static int cmd_region_read(struct dl *dl)
 			       NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP);
 
 	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE_REGION |
-				DL_OPT_REGION_ADDRESS | DL_OPT_REGION_LENGTH |
-				DL_OPT_REGION_SNAPSHOT_ID, 0);
+				DL_OPT_REGION_ADDRESS | DL_OPT_REGION_LENGTH,
+				DL_OPT_REGION_SNAPSHOT_ID);
 	if (err)
 		return err;
 
-- 
2.25.0.368.g28a2d05eebfb
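
The effect of moving DL_OPT_REGION_SNAPSHOT_ID from the required mask to
the optional mask can be sketched with a small bitmask check. This is
not the iproute2 implementation of dl_argv_parse_put; check_opts and the
OPT_* values below are hypothetical and only model the idea of a
required set and an optional set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical option bits, standing in for the DL_OPT_* flags. */
#define OPT_HANDLE_REGION	(1u << 0)
#define OPT_REGION_SNAPSHOT_ID	(1u << 1)
#define OPT_REGION_ADDRESS	(1u << 2)
#define OPT_REGION_LENGTH	(1u << 3)

/*
 * Validate the options found on the command line against a required
 * set and an optional set. Returns 0 when acceptable, -EINVAL otherwise.
 */
static int check_opts(uint32_t found, uint32_t required, uint32_t optional)
{
	/* An option should not be both required and optional. */
	assert((required & optional) == 0);

	/* Reject anything outside the two sets. */
	if (found & ~(required | optional))
		return -EINVAL;

	/* Every required option must be present. */
	if ((found & required) != required)
		return -EINVAL;

	return 0;
}
```

Under this model, a dump request without a snapshot id fails when the
id is in the required set and succeeds once it is only optional.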



* Re: [RFC PATCH v2 13/22] devlink: track snapshot ids using an IDR and refcounts
  2020-02-14 23:22 ` [RFC PATCH v2 13/22] devlink: track snapshot ids using an IDR and refcounts Jacob Keller
@ 2020-02-18 21:44   ` Jacob Keller
  0 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-18 21:44 UTC (permalink / raw)
  To: netdev; +Cc: jiri, valex, linyunsheng, lihong.yang, kuba

On 2/14/2020 3:22 PM, Jacob Keller wrote:
> New snapshot ids are generated by calling a getter function. The same id
> may be used by multiple snapshots created at the same trigger event.
> 
> Currently no effort is made to release any previously used snapshot ids.
> 
> Replace the basic logic of using a single devlink integer for tracking
> ids with the IDR interface.
> 
> snapshot IDs will be reference counted using a refcount stored in the
> IDR. First, ids are allocated using idr_alloc without a refcount (using
> the NULL pointer).
> 
> Once the devlink_region_snapshot_create function is called, it will call
> the new __devlink_region_snapshot_id_ref(). This function will insert
> a new refcount or increment the pre-existing refcount.
> 
> devlink_region_snapshot_destroy will call the new
> __devlink_region_snapshot_id_deref(), decrementing the reference count.
> Once there are no other references, the refcount will be removed from
> IDR.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Based on recent comments on the list it looks like the preference would
be to use xarray directly instead of IDR for this?

(See https://lore.kernel.org/netdev/20200218135359.GA9608@ziepe.ca/ )

I'd be happy to rework this to use xarray instead. I don't believe we
can use an ida directly because of the implementation that enables
re-using the same snapshot id for multiple snapshots. (hence why I went
with a reference count that can be NULL initially to "pre-allocate" the id)

I'm open to also considering whether we should simplify how ids are
managed in some way so that an ida on its own can be used.

Thanks,
Jake

> ---
>  include/net/devlink.h |   3 +-
>  net/core/devlink.c    | 141 ++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 138 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 3a7759355434..3a5ff6bea143 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -17,6 +17,7 @@
>  #include <linux/refcount.h>
>  #include <net/net_namespace.h>
>  #include <uapi/linux/devlink.h>
> +#include <linux/idr.h>
>  
>  struct devlink_ops;
>  
> @@ -28,13 +29,13 @@ struct devlink {
>  	struct list_head resource_list;
>  	struct list_head param_list;
>  	struct list_head region_list;
> -	u32 snapshot_id;
>  	struct list_head reporter_list;
>  	struct mutex reporters_lock; /* protects reporter_list */
>  	struct devlink_dpipe_headers *dpipe_headers;
>  	struct list_head trap_list;
>  	struct list_head trap_group_list;
>  	const struct devlink_ops *ops;
> +	struct idr snapshot_idr;
>  	struct device *dev;
>  	possible_net_t _net;
>  	struct mutex lock;
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index da4e669f425b..9571063846cc 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -3760,21 +3760,118 @@ static void devlink_nl_region_notify(struct devlink_region *region,
>  	nlmsg_free(msg);
>  }
>  
> +/**
> + * __devlink_region_snapshot_id_ref - Increment reference for a snapshot ID
> + *	@devlink: devlink instance
> + *	@id: the snapshot id being referenced
> + *
> + *	Increments the reference count for the given snapshot id. If the id
> + *	was not yet allocated, it is allocated immediately. If the id was
> + *	allocated but no references exist, a new refcount is created and
> + *	inserted.
> + */
> +static int __devlink_region_snapshot_id_ref(struct devlink *devlink, u32 id)
> +{
> +	struct idr *idr = &devlink->snapshot_idr;
> +	refcount_t *ref;
> +	void *old_ptr;
> +	int err;
> +
> +	lockdep_assert_held(&devlink->lock);
> +
> +	if (id < 1 || id >= INT_MAX)
> +		return -EINVAL;
> +
> +	/* Check if a refcount already exists. If so, increment it and exit */
> +	ref = idr_find(idr, id);
> +	if (ref) {
> +		refcount_inc(ref);
> +		return 0;
> +	}
> +
> +	/* Allocate a new reference count */
> +	ref = kzalloc(sizeof(*ref), GFP_KERNEL);
> +	refcount_set(ref, 1);
> +
> +	/* The id was likely allocated ahead of time using
> +	 * devlink_region_snapshot_id_get, so attempt to replace the NULL
> +	 * pointer with the refcount. Since idr_find returned NULL,
> +	 * idr_replace should either return ERR_PTR(-ENOENT) or NULL.
> +	 */
> +	old_ptr = idr_replace(idr, ref, id);
> +	/* if old_ptr is NULL, we've inserted the reference */
> +	if (old_ptr == NULL)
> +		return 0;
> +	if (PTR_ERR(old_ptr) != -ENOENT) {
> +		kfree(ref);
> +		return PTR_ERR(old_ptr);
> +	}
> +
> +	/* the snapshot id was not reserved, so reserve it now. */
> +	err = idr_alloc(idr, ref, id, id+1, GFP_KERNEL);
> +	if (err < 0)
> +		return err;
> +	WARN_ON(err != id);
> +
> +	return 0;
> +}
> +
> +/**
> + * __devlink_region_snapshot_id_deref - Decrement reference for a snapshot ID
> + *	@devlink: devlink instance
> + *	@id: the snapshot id being referenced
> + *
> + *	Decrements the reference count for a given snapshot id. If the
> + *	refcount has reached zero then remove the reference from the IDR.
> + */
> +static void __devlink_region_snapshot_id_deref(struct devlink *devlink, u32 id)
> +{
> +	struct idr *idr = &devlink->snapshot_idr;
> +	refcount_t *ref;
> +
> +	lockdep_assert_held(&devlink->lock);
> +
> +	if (WARN_ON(id < 1 || id >= INT_MAX))
> +		return;
> +
> +	/* Find the reference pointer */
> +	ref = idr_find(idr, id);
> +	if (!ref) {
> +		WARN(true, "no previous reference was inserted");
> +		/* this shouldn't happen, but at least attempt to cleanup if
> +		 * something went wrong.
> +		 */
> +		idr_remove(idr, id);
> +		return;
> +	}
> +
> +	if (refcount_dec_and_test(ref)) {
> +		/* There are no more references, so remove it from the IDR and
> +		 * free the reference count.
> +		 */
> +		idr_remove(idr, id);
> +		kfree(ref);
> +	}
> +}
> +
>  /**
>   *	__devlink_region_snapshot_id_get - get snapshot ID
>   *	@devlink: devlink instance
>   *
>   *	Returns a new snapshot id or a negative error code on failure. Must be
>   *	called while holding the devlink instance lock.
> + *
> + *	Snapshot ids are stored in an IDR and reference counted by the number
> + *	of snapshots currently using that id. This function pre-allocates
> + *	a snapshot id but does not fill in a reference count. A later call to
> + *	devlink_region_snapshot_create will update the IDR pointer to
> + *	a reference count. On devlink_region_snapshot_destroy, if there are no
> + *	further references, the id will be removed from the IDR.
>   */
>  static int __devlink_region_snapshot_id_get(struct devlink *devlink)
>  {
>  	lockdep_assert_held(&devlink->lock);
> -
> -	if (devlink->snapshot_id >= INT_MAX)
> -		return -ENOSPC;
> -
> -	return ++devlink->snapshot_id;
> +	return idr_alloc(&devlink->snapshot_idr, NULL, 1, INT_MAX, GFP_KERNEL);
>  }
>  
>  /**
> @@ -3797,6 +3894,7 @@ __devlink_region_snapshot_create(struct devlink_region *region,
>  {
>  	struct devlink *devlink = region->devlink;
>  	struct devlink_snapshot *snapshot;
> +	int err;
>  
>  	lockdep_assert_held(&devlink->lock);
>  
> @@ -3811,6 +3909,11 @@ __devlink_region_snapshot_create(struct devlink_region *region,
>  	if (!snapshot)
>  		return -ENOMEM;
>  
> +	/* Increment snapshot id reference */
> +	err = __devlink_region_snapshot_id_ref(devlink, snapshot_id);
> +	if (err)
> +		goto err_free_snapshot;
> +
>  	snapshot->id = snapshot_id;
>  	snapshot->region = region;
>  	snapshot->data = data;
> @@ -3821,15 +3924,25 @@ __devlink_region_snapshot_create(struct devlink_region *region,
>  
>  	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
>  	return 0;
> +
> +err_free_snapshot:
> +	kfree(snapshot);
> +	return err;
>  }
>  
>  static void devlink_region_snapshot_del(struct devlink_region *region,
>  					struct devlink_snapshot *snapshot)
>  {
> +	struct devlink *devlink = region->devlink;
> +
> +	lockdep_assert_held(&devlink_mutex);
> +
>  	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
>  	region->cur_snapshots--;
>  	list_del(&snapshot->list);
>  	region->ops->destructor(snapshot->data);
> +	__devlink_region_snapshot_id_deref(devlink, snapshot->id);
> +
>  	kfree(snapshot);
>  }
>  
> @@ -6388,6 +6501,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
>  	if (!devlink)
>  		return NULL;
>  	devlink->ops = ops;
> +	idr_init(&devlink->snapshot_idr);
>  	__devlink_net_set(devlink, &init_net);
>  	INIT_LIST_HEAD(&devlink->port_list);
>  	INIT_LIST_HEAD(&devlink->sb_list);
> @@ -6480,6 +6594,23 @@ EXPORT_SYMBOL_GPL(devlink_reload_disable);
>   */
>  void devlink_free(struct devlink *devlink)
>  {
> +	struct idr *idr = &devlink->snapshot_idr;
> +
> +	mutex_lock(&devlink->lock);
> +	if (!idr_is_empty(idr)) {
> +		refcount_t *ref;
> +		int id;
> +
> +		WARN(true, "snapshot IDR is not empty");
> +
> +		idr_for_each_entry(idr, ref, id) {
> +			if (ref)
> +				kfree(ref);
> +		}
> +	}
> +	idr_destroy(&devlink->snapshot_idr);
> +	mutex_unlock(&devlink->lock);
> +
>  	mutex_destroy(&devlink->reporters_lock);
>  	mutex_destroy(&devlink->lock);
>  	WARN_ON(!list_empty(&devlink->trap_group_list));
> 
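
The id lifecycle described in the comment above can be modelled outside
the kernel. Below is a minimal Python sketch, not the patch's code: the
class and method names are made up for illustration, a dict stands in
for the IDR, and raising on an id that was never reserved is an
assumption, since the body of __devlink_region_snapshot_id_ref() is not
visible in this hunk.

```python
class SnapshotIds:
    """Toy model of the per-devlink snapshot id table (stand-in for the IDR)."""

    def __init__(self):
        # id -> reference count, or None while the id is reserved but unused
        self._ids = {}

    def id_get(self):
        """Reserve the lowest free id >= 1 without taking a reference."""
        new_id = 1
        while new_id in self._ids:
            new_id += 1
        self._ids[new_id] = None
        return new_id

    def ref(self, snapshot_id):
        """Take a reference when a region creates a snapshot with this id."""
        if snapshot_id not in self._ids:
            # Assumption: the real helper's behaviour here is not shown.
            raise KeyError(snapshot_id)
        self._ids[snapshot_id] = (self._ids[snapshot_id] or 0) + 1

    def deref(self, snapshot_id):
        """Drop a reference; forget the id once nothing refers to it."""
        count = (self._ids[snapshot_id] or 0) - 1
        if count <= 0:
            del self._ids[snapshot_id]
        else:
            self._ids[snapshot_id] = count

    def in_use(self, snapshot_id):
        return snapshot_id in self._ids


if __name__ == "__main__":
    ids = SnapshotIds()
    shared = ids.id_get()
    ids.ref(shared)    # first region takes a snapshot with this id
    ids.ref(shared)    # second region reuses the same id
    ids.deref(shared)  # first snapshot deleted, id still referenced
    print(ids.in_use(shared))
    ids.deref(shared)  # last snapshot deleted, id released
    print(ids.in_use(shared))
```

The point of the reference count is that two regions can share one id
when their snapshots are triggered together, and the id only becomes
reusable after the last of those snapshots is deleted.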

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-14 23:22 ` [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get Jacob Keller
@ 2020-02-19  2:45   ` Jakub Kicinski
  2020-02-19 17:33     ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jakub Kicinski @ 2020-02-19  2:45 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Fri, 14 Feb 2020 15:22:05 -0800 Jacob Keller wrote:
> The devlink .info_get callback allows the driver to report detailed
> version information. The following devlink versions are reported with
> this initial implementation:
> 
>  "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
>  "fw.mgmt.api" -> API version of interface exposed over the AdminQ
>  "fw.mgmt.bundle" -> Unique identifier for the firmware bundle
>  "fw.undi.orom" -> Version of the Option ROM containing the UEFI driver
>  "nvm.psid" -> Version of the format for the NVM parameter set
>  "nvm.bundle" -> Unique identifier for the combined NVM image

I spent some time today trying to write up the design choices behind
the original implementation but I think I can't complete that unless 
I understand what the PSID thing really is.

So the original design is motivated by two things:
 - making FW versions understandable / per component (as opposed 
   to the crowded ethtool string)
 - making it possible to automate FW management in a fleet of machines
   across vendors.

The second one is more important.

The design was expecting the following:
 - HW design is uniquely identified by 'fixed' versions;
 - each HW design requires only one FW build (but FW build can cover
   multiple versions of HW);

This is why serial number is not part of the fixed versions, even
though it is fixed. Serial is different per board, users should be 
able to map HW design to the FW version they want to run.

Effectively FW update agent does this:

  # Get unique HW design identifier
  $hw_id = devlink-dev-info['fixed']

  # Find out which FW we want to use for this NIC
  $want_fw_id = some-db-backed.lookup($hw_id)

  # Update if necessary  
  if $want_fw_id != devlink-dev-info['stored']:
     # maybe download the file
     devlink-dev-flash()

  # Reboot if necessary
  if $want_fw_id != devlink-dev-info['running']:
     reboot()


dev-info sets can obviously contain multiple values, but field by field
comparison for simple == and != should work just fine.
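
For illustration, the agent logic above could look roughly like the
Python sketch below. This is only a sketch under assumptions: the dict
layout for the dev-info data, the fw_db lookup table, and all function
names are hypothetical, and the actual flash and reboot steps are
reduced to a list of action names.

```python
from typing import List, Mapping, Tuple

Versions = Mapping[str, str]


def hw_design_key(fixed: Versions) -> Tuple[Tuple[str, str], ...]:
    """Turn the 'fixed' versions into a hashable HW design identifier."""
    return tuple(sorted(fixed.items()))


def differs(want: Versions, have: Versions) -> bool:
    """Field-by-field != comparison over the fields named in `want`."""
    return any(have.get(name) != version for name, version in want.items())


def plan_update(info: Mapping[str, Versions],
                fw_db: Mapping[tuple, Versions]) -> List[str]:
    """Return the actions the agent would take for one device."""
    # Find out which FW we want to use for this HW design
    want = fw_db[hw_design_key(info["fixed"])]
    actions = []
    # Update if the stored (flashed) versions are not the wanted ones
    if differs(want, info.get("stored", {})):
        actions.append("flash")
    # Reboot if the running versions are not the wanted ones
    if differs(want, info.get("running", {})):
        actions.append("reboot")
    return actions


if __name__ == "__main__":
    # Hypothetical board id and version values, for illustration only.
    fw_db = {hw_design_key({"board.id": "EXAMPLE-BOARD"}): {"fw.mgmt": "1.1.0"}}
    info = {
        "fixed": {"board.id": "EXAMPLE-BOARD"},
        "stored": {"fw.mgmt": "1.1.0"},
        "running": {"fw.mgmt": "1.0.0"},
    }
    print(plan_update(info, fw_db))  # prints ['reboot']
```

Only the fields named in the wanted set are compared, so a device that
reports extra versions does not trigger a flash by itself.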

Two complications have arisen so far:
 - even though all components are versioned, some customers were uneasy
   about identifying only the components and not the entire "build".
   That's why we added the 'fw.bundle'. When multiple components are
   "bundled" together into a flashable firmware image, that bundle
   itself gets an ID.
   I'd expect there to be a bundle for each set of components which are
   distributed as a FW image. IOW bundle ID per type of file that can
   be downloaded from the vendor support site. For max convenience I'd
   think there should be one file that contains all components so
   customers don't have to juggle files. That means overall fw.bundle
   that covers all.
   Note: that fw.bundle is only meaningful if _all_ components are
   unchanged from flash image, so the FW must do a self-check to
   validate any component covered by a bundle id is unchanged.

 - the PSID stuff was added, which IIUC is either (a) an identifier 
   for configuration parameters which are not visible via normal Linux
   tools, or (b) a way for an OEM to label a product.
   This changes where this thing should reside because we don't expect
   OEM to relabel the product/SKU (or do we?) and hence it's a fixed
   version.
   If it's an identifier for random parameters of the board (serdes
   params, temperature info, IDK) which is expected to maybe be updated
   or tuned it should be in running/stored.

   So any further info on what's an EETRACK in your case?

   For MLX there's a bunch of documents which tell us how we can create
   an ini file with parameters, but no info on what those parameters
   actually are. 

   Jiri would you be able to help? Please chime in..


Sorry for the painful review process, it's quite hard to review what
people are doing without knowing the back end. Hopefully above gives
you an idea of the intentions when this code was added :)

I see that the next patch adds a 'fixed' version, so if that's
sufficient to identify your board there isn't any blocker here.

What I'd still like to consider is:
 - if fw.mgmt.bundle needs to be a bundle if it doesn't bundle multiple
   things? If it's hard to inject the build ID into the fw.mgmt version
   that's fine.
 - fw.undi.orom - do we need to say orom? Is there anything other than
   orom for UNDI in the flash?
 - nvm.psid may perhaps be better as nvm.psid.api? Following your
   fw.mgmt.api?
 - nvm.bundle - eetrack sounds more like a stream, so perhaps this is
   the PSID?

> With this, devlink can now report at least the same information as
> reported by the older ethtool interface. Each section of the
> "firmware-version" is also reported independently so that it is easier
> to understand the meaning.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-19  2:45   ` Jakub Kicinski
@ 2020-02-19 17:33     ` Jacob Keller
  2020-02-19 19:57       ` Jakub Kicinski
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-19 17:33 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 2/18/2020 6:45 PM, Jakub Kicinski wrote:
> On Fri, 14 Feb 2020 15:22:05 -0800 Jacob Keller wrote:
>> The devlink .info_get callback allows the driver to report detailed
>> version information. The following devlink versions are reported with
>> this initial implementation:
>>
>>  "fw.mgmt" -> The version of the firmware that controls PHY, link, etc
>>  "fw.mgmt.api" -> API version of interface exposed over the AdminQ
>>  "fw.mgmt.bundle" -> Unique identifier for the firmware bundle
>>  "fw.undi.orom" -> Version of the Option ROM containing the UEFI driver
>>  "nvm.psid" -> Version of the format for the NVM parameter set
>>  "nvm.bundle" -> Unique identifier for the combined NVM image
> 
> I spent some time today trying to write up the design choices behind
> the original implementation but I think I can't complete that unless 
> I understand what the PSID thing really is.
> 

Ok.

> So the original design is motivated by two things:
>  - making FW versions understandable / per component (as opposed 
>    to the crowded ethtool string)
>  - making it possible to automate FW management in a fleet of machines
>    across vendors.
> 
> The second one is more important.
> 
> The design was expecting the following:
>  - HW design is uniquely identified by 'fixed' versions;
>  - each HW design requires only one FW build (but FW build can cover
>    multiple versions of HW);
> 
> This is why serial number is not part of the fixed versions, even
> though it is fixed. Serial is different per board, users should be 
> able to map HW design to the FW version they want to run.
> 

Right. Serial is separate from board, while something like the board.id
is an identifier of the *design* of the board, not of an individual one.

> Effectively FW update agent does this:
> 
>   # Get unique HW design identifier
>   $hw_id = devlink-dev-info['fixed']
> 
>   # Find out which FW we want to use for this NIC
>   $want_fw_id = some-db-backed.lookup($hw_id)
> 
>   # Update if necessary  
>   if $want_fw_id != devlink-dev-info['stored']:
>      # maybe download the file
>      devlink-dev-flash()
> 
>   # Reboot if necessary
>   if $want_fw_id != devlink-dev-info['running']
>      reboot()
> 
> 
> dev-info sets can obviously contain multiple values, but field by field
> comparison for simple == and != should work just fine.
> 
> The complications which had arisen so far are two:
>  - even though all components are versioned some customers expressed
>    uneasiness of only identifying the components but not the entire
>    "build". That's why we added the 'fw.bundle'. When multiple
>    components are "bundled" together into a flashable firmware image
>    that bundle itself gets an ID.
>    I'd expect there to be a bundle for each set of components which are
>    distributed as a FW image. IOW bundle ID per type of file that can
>    be downloaded from the vendor support site. For max convenience I'd
>    think there should be one file that contains all components so
>    customers don't have to juggle files. That means overall fw.bundle
>    that covers all.
>    Note: that fw.bundle is only meaningful if _all_ components are
>    unchanged from flash image, so the FW must do a self-check to
>    validate any component covered by a bundle id is unchanged.
> 

Right that makes sense.

>  - the PSID stuff was added, which IIUC is either (a) an identifier 
>    for configuration parameters which are not visible via normal Linux
>    tools, or (b) a way for an OEM to label a product.
>    This changes where this thing should reside because we don't expect
>    OEM to relabel the product/SKU (or do we?) and hence it's a fixed
>    version.
>    If it's an identifier for random parameters of the board (serdes
>    params, temperature info, IDK) which is expected to maybe be updated
>    or tuned it should be in running/stored.
> 

Hmm. In my case nvm.psid is basically describing the format of the NVM
parameter set, but I don't think it actually covers the contents. This
version can update if you update to a newer image.

I probably need to re-word the versions to be "fw.bundle" and "fw.psid",
rather than using "nvm", given how you're describing the fields above.

>    So any further info on what's an EETRACK in your case?
> 

EETRACK is basically the name we used for "bundle", as it is a unique
identifier generated when new images are prepared.

I think this should probably just become "fw.bundle".

What I have now as "fw.mgmt.bundle" is a little different. It's
basically a unique identifier obtained from the build system of the
management firmware that can be used to identify exactly what got built
for that firmware. (i.e. it would change even if the developers failed
to update their version number).

>    For MLX there's a bunch of documents which tell us how we can create
>    an ini file with parameters, but no info on what those parameters
>    actually are. 
> 
>    Jiri would you be able to help? Please chime in..
> 
> 
> Sorry for the painful review process, it's quite hard to review what
> people are doing without knowing the back end. Hopefully above gives
> you an idea of the intentions when this code was added :)
> 

I understand the difficulty.

> I see that the next patch adds a 'fixed' version, so if that's
> sufficient to identify your board there isn't any blocker here.

Yes, the board.id is the unique identifier of the physical board design.
It's what we've called the Product Board Assembly identifier.

> 
> What I'd still like to consider is:
>  - if fw.mgmt.bundle needs to be a bundle if it doesn't bundle multiple
>    things? If it's hard to inject the build ID into the fw.mgmt version
>    that's fine.

I mostly didn't like having it as part of the same version because it is
somewhat distinct. I don't think it's a "bundle" in the sense of what
you're describing.

It is basically just an identifier from the build system of that
component and will be changed even if the developer did not update the
firmware version. It's useful primarily to identify precisely where that
build of the firmware binary came from. (Hence why I originally used
".build").

>  - fw.undi.orom - do we need to say orom? Is there anything else than
>    orom for UNDI in the flash?

Hmm.. I'll double check this. I wasn't entirely sure if we had other
components which is why I went that route. I think you're right though
and this can just be "fw.undi".

>  - nvm.psid may perhaps be better as nvm.psid.api? Following your
>    fw.mgmt.api?

Hmm. Yea this isn't really a parameter set id, but more of describing
the format. I am not sure I fully understand it myself yet.

>  - nvm.bundle - eetrack sounds more like a stream, so perhaps this is
>    the PSID?
> 

So, I think this should probably become "fw.bundle", and I can drop the
nvm bits altogether. The EETRACK id is a unique identifier we create
when new images are created. If you have the eetrack you can look up
data on the source binary that the NVM image came from.

It wouldn't cover the parameters that can be changed, so I don't think
it's a psid.


Given this discussion, here is what I have so far:

"fw.bundle" -> What was "nvm.bundle", the identifier for the combined fw
image. This would be our EETRACK id.
"fw.mgmt" -> The management firmware 3 digit version
"fw.mgmt.api" -> The version of API exposed by this firmware
"fw.mgmt.build" -> The build identifier. I really do think this should
be ".build" rather than .bundle, as it's definitely not a bundle in the
same sense. I *could* simply make "fw.mgmt" be "maj.min.patch build" but
I think it makes sense as its own field.

"fw.undi" -> Version of the Option ROM containing the UEFI driver

"fw.psid.api" -> what was the "nvm.psid". This I think needs a bit more
work to define. It makes sense to me as some sort of "api" as (if I
understand it correctly) it is the format for the parameters, but does
not itself define the parameter contents.
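
To keep the renames straight, here is a small Python sketch of the
mapping from the names in the posted patch to the names proposed above.
The names come from this thread; the helper function and the version
values in the example are placeholders made up for illustration.

```python
# Old name (as posted in v2) -> proposed name. Names that are not listed
# ("fw.mgmt", "fw.mgmt.api") stay as they are.
RENAMES = {
    "nvm.bundle": "fw.bundle",
    "fw.mgmt.bundle": "fw.mgmt.build",
    "fw.undi.orom": "fw.undi",
    "nvm.psid": "fw.psid.api",
}


def rename_versions(versions):
    """Return a copy of a name -> version dict using the proposed names."""
    return {RENAMES.get(name, name): value for name, value in versions.items()}


if __name__ == "__main__":
    # Placeholder version strings, not real device output.
    old = {
        "fw.mgmt": "x.y.z",
        "fw.mgmt.api": "a.b",
        "fw.mgmt.bundle": "build-id",
        "fw.undi.orom": "orom-version",
        "nvm.psid": "psid-version",
        "nvm.bundle": "eetrack-id",
    }
    for name, value in sorted(rename_versions(old).items()):
        print(name, value)
```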

The original reason for using "fw" and "nvm" was because we (internally)
use fw to mean the management firmware, whereas these APIs really
combine the blocks and use "fw.mgmt" for the management firmware. Thus I
think it makes sense to move from

I also have a couple of other oddities that need to be sorted out. We
want to display the DDP version (a piece of "firmware" that is loaded
during driver load, and is not stored permanently in the NVM). In some
sense this is our "fw.app", but because it's loaded by the driver at
load time and not permanently stored in the NVM, I'm not really sure it
makes sense to report it as "fw.app", since it is not updated or really
touched by the firmware flash update.

Finally we also have a component we call the "netlist", which I'm still
not fully up to speed on exactly what it represents, but it has multiple
pieces of data including a 2-digit Major.Minor version of the base, a
type field indicating the format, and a 2-digit revision field that is
incremented on internal and external changes to the contents. Finally
there is a hash that I think might *actually* be something like a psid
or a bundle to uniquely represent this component. I haven't included
this component yet because I'm still trying to grasp exactly what it
represents and how best to describe each piece.

Thanks for your review,
Jake

>> With this, devlink can now report at least the same information as
>> reported by the older ethtool interface. Each section of the
>> "firmware-version" is also reported independently so that it is easier
>> to understand the meaning.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-19 17:33     ` Jacob Keller
@ 2020-02-19 19:57       ` Jakub Kicinski
  2020-02-19 21:37         ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jakub Kicinski @ 2020-02-19 19:57 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Wed, 19 Feb 2020 09:33:09 -0800 Jacob Keller wrote:
> >  - the PSID stuff was added, which IIUC is either (a) an identifier 
> >    for configuration parameters which are not visible via normal Linux
> >    tools, or (b) a way for an OEM to label a product.
> >    This changes where this thing should reside because we don't expect
> >    OEM to relabel the product/SKU (or do we?) and hence it's a fixed
> >    version.
> >    If it's an identifier for random parameters of the board (serdes
> >    params, temperature info, IDK) which is expected to maybe be updated
> >    or tuned it should be in running/stored.
> >   
> 
> Hmm. In my case nvm.psid is basically describing the format of the NVM
> parameter set, but I don't think it actually covers the contents. This
> version can update if you update to a newer image.
> 
> I probably need to re-word the versions to be "fw.bundle" and "fw.psid",
> rather than using "nvm", given how you're describing the fields above.
> 
> >    So any further info on what's an EETRACK in your case?
> >   
> 
> EETRACK is basically the name we used for "bundle", as it is a unique
> identifier generated when new images are prepared.
> 
> I think this should probably just become "fw.bundle".

Okay, cool!

> What I have now as "fw.mgmt.bundle" is a little different. It's
> basically a unique identifier obtained from the build system of the
> management firmware that can be used to identify exactly what got built
> for that firmware. (i.e. it would change even if the developers failed
> to update their version number).
> 
> >    For MLX there's a bunch of documents which tell us how we can create
> >    an ini file with parameters, but no info on what those parameters
> >    actually are. 
> > 
> >    Jiri would you be able to help? Please chime in..
> > 
> > 
> > Sorry for the painful review process, it's quite hard to review what
> > people are doing without knowing the back end. Hopefully above gives
> > you an idea of the intentions when this code was added :)
> >   
> 
> I understand the difficulty.
> 
> > I see that the next patch adds a 'fixed' version, so if that's
> > sufficient to identify your board there isn't any blocker here.  
> 
> Yes, the board.id is the unique identifier of the physical board design.
> It's what we've called the Product Board Assembly identifier.
> 
> > 
> > What I'd still like to consider is:
> >  - if fw.mgmt.bundle needs to be a bundle if it doesn't bundle multiple
> >    things? If it's hard to inject the build ID into the fw.mgmt version
> >    that's fine.  
> 
> I mostly didn't like having it as part of the same version because it is
> somewhat distinct. I don't think it's a "bundle" in the sense of what
> you're describing.
> 
> It is basically just an identifier from the build system of that
> component and will be changed even if the developer did not update the
> firmware version. It's useful primarily to identify precisely where that
> build of the firmware binary came from. (Hence why I originally used
> ".build").

Okay.

> >  - fw.undi.orom - do we need to say orom? Is there anything else than
> >    orom for UNDI in the flash?  
> 
> Hmm.. I'll double check this. I wasn't entirely sure if we had other
> components which is why I went that route. I think you're right though
> and this can just be "fw.undi".
> 
> >  - nvm.psid may perhaps be better as nvm.psid.api? Following your
> >    fw.mgmt.api?  
> 
> Hmm. Yea this isn't really a parameter set id, but more of describing
> the format. I am not sure I fully understand it myself yet.
> 
> >  - nvm.bundle - eetrack sounds more like a stream, so perhaps this is
> >    the PSID?
> >   
> 
> So, I think this should probably become "fw.bundle", and I can drop the
> nvm bits altogether. The EETRACK id is a unique identifier we create
> when new images are created. If you have the eetrack you can look up
> data on the source binary that the NVM image came from.
> 
> It wouldn't cover the parameters that can be changed, so I don't think
> it's a psid.
> 
> 
> Given this discussion, here is what I have so far:
> 
> "fw.bundle" -> What was "nvm.bundle", the identifier for the combined fw
> image. This would be our EETRACK id.

👍

> "fw.mgmt" -> The management firmware 3 digit version

👍

> "fw.mgmt.api" -> The version of API exposed by this firmware

👍

> "fw.mgmt.build" -> The build identifier. I really do think this should
> be ".build" rather than .bundle, as it's definitely not a bundle in the
> same sense. I *could* simply make "fw.mgmt" be "maj.min.patch build" but
> I think it makes sense as its own field.

okay

> "fw.undi" -> Version of the Option ROM containing the UEFI driver

👍

> "fw.psid.api" -> what was the "nvm.psid". This I think needs a bit more
> work to define. It makes sense to me as some sort of "api" as (if I
> understand it correctly) it is the format for the parameters, but does
> not itself define the parameter contents.

Sounds good. So the contents of parameters would be covered by the
fw.bundle now and not have a separate version?

> The original reason for using "fw" and "nvm" was because we (internally)
> use fw to mean the management firmware.. where as these APIs really
> combine the blocks and use "fw.mgmt" for the management firmware. Thus I
> think it makes sense to move from
> 
> I also have a couple other oddities that need to be sorted out. We want
> to display the DDP version (piece of "firmware" that is loaded during
> driver load, and is not permanent to the NVM). In some sense this is our
> "fw.app", but because it's loaded by driver at load and not as
> permanently stored in the NVM... I'm not really sure that makes sense to
> put this as the "fw.app", since it is not updated or really touched by
> the firmware flash update.

Interesting, can DDP be persisted to the flash, though? Is there some
default DDP, or is it _never_ in the flash? 

Does it not have some fun implications for firmware signing to have
part of the config/ucode loaded from the host?

IIRC you could also load multiple of those DDP packages? Perhaps they
could get names like fw.app0, fw.app1, etc? Also if DDP controls a
particular part of the datapath (parser?) feel free to come up with a
more targeted name, up to you.

> Finally we also have a component we call the "netlist", which I'm still
> not fully up to speed on exactly what it represents, but it has multiple
> pieces of data including a 2-digit Major.Minor version of the base, a
> type field indicating the format, and a 2-digit revision field that is
> incremented on internal and external changes to the contents. Finally
> there is a hash that I think might *actually* be something like a psid
> or a bundle to uniquely represent this component. I haven't included
> this component yet because I'm still trying to grasp exactly what it
> represents and how best to describe each piece.

Hmm. netlist is a Si term, perhaps it's chip init data? nfp had
something called chip.init which I think loaded all the very low 
level Si configs.

My current guess is that psid is more of the serdes and maybe clock
data. 

Thinking about it now, it seems these versions mirror the company
structure. chip.init comes from the Si team. psid comes from the 
board design guys. fw.mgmt comes from the BSP/FW team.

None of them are really fixed but the frequency of changes increases
from chip.init changing very rarely to mgmt fw having a regular release
cadence.

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-19 19:57       ` Jakub Kicinski
@ 2020-02-19 21:37         ` Jacob Keller
  2020-02-19 23:47           ` Jakub Kicinski
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-19 21:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

Jakub,

Thanks for your excellent feedback.

On 2/19/2020 11:57 AM, Jakub Kicinski wrote:
> On Wed, 19 Feb 2020 09:33:09 -0800 Jacob Keller wrote:
>> "fw.psid.api" -> what was the "nvm.psid". This I think needs a bit more
>> work to define. It makes sense to me as some sort of "api" as (if I
>> understand it correctly) it is the format for the parameters, but does
>> not itself define the parameter contents.
> 
> Sounds good. So the contents of parameters would be covered by the
> fw.bundle now and not have a separate version?
> 

I'm actually not sure if we have any way to identify the parameters.
I'll ask around about that. My understanding is that these would include
parameters that can be modified by the driver such as Wake on LAN
settings, so I'm also not sure if they'd be covered in the fw.bundle
either. The 'defaults' that were selected when the image was created
would be covered, but changes to them wouldn't update the value.

Hmmmmm.

>> The original reason for using "fw" and "nvm" was because we (internally)
>> use fw to mean the management firmware.. where as these APIs really
>> combine the blocks and use "fw.mgmt" for the management firmware. Thus I
>> think it makes sense to move from
>>
>> I also have a couple other oddities that need to be sorted out. We want
>> to display the DDP version (piece of "firmware" that is loaded during
>> driver load, and is not permanent to the NVM). In some sense this is our
>> "fw.app", but because it's loaded by driver at load and not as
>> permanently stored in the NVM... I'm not really sure that makes sense to
>> put this as the "fw.app", since it is not updated or really touched by
>> the firmware flash update.
> 
> Interesting, can DDP be persisted to the flash, though? Is there some
> default DDP, or is it _never_ in the flash? 
> 

There's a version of this within the flash, but it is limited, and many
device features get disabled if you don't load the DDP package file.
(You may have seen patches for this for implementing "safe mode").

My understanding is there is no mechanism for persisting a different DDP
to the flash.

> Does it not have some fun implications for firmware signing to have
> part of the config/ucode loaded from the host?
> 

I'm not sure how it works exactly. As far as I know, the DDP file is
itself signed.

> IIRC you could also load multiple of those DDP packages? Perhaps they
> could get names like fw.app0, fw.app1, etc?

You can load different ones, each has their own version and name
embedded. However, only one can be loaded at any given time, so I'm not
sure if multiples like this make sense.

> Also if DDP controls a
> particular part of the datapath (parser?) feel free to come up with a
> more targeted name, up to you.
> 

Right, it's my understanding that this defines the parsing logic, and
not the complete datapath microcode.

In theory, there could be at least 3 DDP versions:

1) the version in the NVM, which would be the very basic "safe mode"
compatible one.

2) the version in the ddp firmware file that we search for when we load

3) the one that actually got activated. It's a sort of
first-come-first-serve and sticks around until a device global reset.
This should in theory always be the same as (2) unless you do something
weird like load different drivers on the multiple functions.

I suppose we could use "running" and "stored" for this, to have "stored"
be what's in the NVM, and "running" for the active one.. but that's ugly
and misusing what stored vs running is supposed to represent.

>> Finally we also have a component we call the "netlist", which I'm still
>> not fully up to speed on exactly what it represents, but it has multiple
>> pieces of data including a 2-digit Major.Minor version of the base, a
>> type field indicating the format, and a 2-digit revision field that is
>> incremented on internal and external changes to the contents. Finally
>> there is a hash that I think might *actually* be something like a psid
>> or a bundle to uniquely represent this component. I haven't included
>> this component yet because I'm still trying to grasp exactly what it
>> represents and how best to describe each piece.
> 
> Hmm. netlist is a Si term, perhaps it's chip init data? nfp had
> something called chip.init which I think loaded all the very low 
> level Si configs.
> 

I'm asking some colleagues to provide further details on this. Right now
the "version" for a netlist is just a display of all these fields munged
together "a.b.c-d.e.f", which I'd rather avoid.

> My current guess is that psid is more of the serdes and maybe clock
> data. 
> 
> Thinking about it now, it seems these versions mirror the company
> structure. chip.init comes from the Si team. psid comes from the 
> board design guys. fw.mgmt comes from the BSP/FW team.
> 
> None of them are really fixed but the frequency of changes increases
> from chip.init changing very rarely to mgmt fw having a regular release
> cadence.
> 

Without further information I don't know for sure, but I don't think
chip.init makes sense. I'll try to find out more.

Thanks,
Jake

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-19 21:37         ` Jacob Keller
@ 2020-02-19 23:47           ` Jakub Kicinski
  2020-02-20  0:06             ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jakub Kicinski @ 2020-02-19 23:47 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On Wed, 19 Feb 2020 13:37:50 -0800 Jacob Keller wrote:
> Jakub,
> 
> Thanks for your excellent feedback.
> 
> On 2/19/2020 11:57 AM, Jakub Kicinski wrote:
> > On Wed, 19 Feb 2020 09:33:09 -0800 Jacob Keller wrote:  
> >> "fw.psid.api" -> what was the "nvm.psid". This I think needs a bit more
> >> work to define. It makes sense to me as some sort of "api" as (if I
> >> understand it correctly) it is the format for the parameters, but does
> >> not itself define the parameter contents.  
> > 
> > Sounds good. So the contents of parameters would be covered by the
> > fw.bundle now and not have a separate version?
> >   
> 
> I'm actually not sure if we have any way to identify the parameters.
> I'll ask around about that. My understanding is that these would include
> parameters that can be modified by the driver such as Wake on LAN
> settings, so I'm also not sure if they'd be covered in the fw.bundle
> either. The 'defaults' that were selected when the image was created
> would be covered, but changes to them wouldn't update the value.
> 
> Hmmmmm.

Ah, so these are just defaults, then if there's no existing version 
I wouldn't worry.

> >> The original reason for using "fw" and "nvm" was because we (internally)
> >> use fw to mean the management firmware.. where as these APIs really
> >> combine the blocks and use "fw.mgmt" for the management firmware. Thus I
> >> think it makes sense to move from
> >>
> >> I also have a couple other oddities that need to be sorted out. We want
> >> to display the DDP version (piece of "firmware" that is loaded during
> >> driver load, and is not permanent to the NVM). In some sense this is our
> >> "fw.app", but because it's loaded by driver at load and not as
> >> permanently stored in the NVM... I'm not really sure that makes sense to
> >> put this as the "fw.app", since it is not updated or really touched by
> >> the firmware flash update.  
> > 
> > Interesting, can DDP be persisted to the flash, though? Is there some
> > default DDP, or is it _never_ in the flash? 
> 
> There's a version of this within the flash, but it is limited, and many
> device features get disabled if you don't load the DDP package file.
> (You may have seen patches for this for implementing "safe mode").
> 
> My understanding is there is no mechanism for persisting a different DDP
> to the flash.

I see, so this really isn't just parser extensions.

I'm a little surprised you guys went this way, loading FW from disk
becomes painful for network boot and provisioning :S  All the first
stage images must have it built in, which is surprisingly painful.

Perhaps the "safe mode" FW is enough to boot, but then I guess once
real FW is available there may be a loss of link as the device resets?

> > Does it not have some fun implications for firmware signing to have
> > part of the config/ucode loaded from the host?
> 
> I'm not sure how it works exactly. As far as I know, the DDP file is
> itself signed.

Right, that'd make sense :)

> > IIRC you could also load multiple of those DDP packages? Perhaps they
> > could get names like fw.app0, fw.app1, etc?  
> 
> You can load different ones, each has their own version and name
> embedded. However, only one can be loaded at any given time, so I'm not
> sure if multiples like this make sense.

I see. Maybe just fw.app works then..

> > Also if DDP controls a
> > particular part of the datapath (parser?) feel free to come up with a
> > more targeted name, up to you.
> 
> Right, it's my understanding that this defines the parsing logic, and
> not the complete datapath microcode.
> 
> In theory, there could be at least 3 DDP versions
> 
> 1) the version in the NVM, which would be the very basic "safe mode"
> compatible one.
> 
> 2) the version in the ddp firmware file that we search for when we load
> 
> 3) the one that actually got activated. It's a sort of
> first-come-first-serve and sticks around until a device global reset.
> This should in theory always be the same as (2) unless you do something
> weird like load different drivers on the multiple functions.
> 
> I suppose we could use "running" and "stored" for this, to have "stored"
> be what's in the NVM, and "running" for the active one.. but that's ugly
> and misusing what stored vs running is supposed to represent.

Ouff. Having something loaded from disk breaks the running vs stored
comparison :( But I think Dave was pretty clear on his opinion about
loading FW from disk and interpreting it in the kernel to extract the version.

Can we leave stored meaning "stored on the device" and running being
loaded on the chip?

It's perfectly fine for a component to only be reported in running and
not stored, nfp already does that:

https://elixir.bootlin.com/linux/v5.6-rc1/source/drivers/net/ethernet/netronome/nfp/nfp_devlink.c#L238

> >> Finally we also have a component we call the "netlist", which I'm still
> >> not fully up to speed on exactly what it represents, but it has multiple
> >> pieces of data including a 2-digit Major.Minor version of the base, a
> >> type field indicating the format, and a 2-digit revision field that is
> >> incremented on internal and external changes to the contents. Finally
> >> there is a hash that I think might *actually* be something like a psid
> >> or a bundle to uniquely represent this component. I haven't included
> >> this component yet because I'm still trying to grasp exactly what it
> >> represents and how best to describe each piece.  
> > 
> > Hmm. netlist is a Si term, perhaps it's chip init data? nfp had
> > something called chip.init which I think loaded all the very low 
> > level Si configs.
> >   
> 
> I'm asking some colleagues to provide further details on this. Right now
> the "version" for a netlist is just a display of all these fields munged
> together "a.b.c-d.e.f", which I'd rather avoid.
> 
> > My current guess is that psid is more of the serdes and maybe clock
> > data. 
> > 
> > Thinking about it now, it seems these versions mirror the company
> > structure. chip.init comes from the Si team. psid comes from the 
> > board design guys. fw.mgmt comes from the BSP/FW team.
> > 
> > None of them are really fixed but the frequency of changes increases
> > from chip.init changing very rarely to mgmt fw having a regular release
> > cadence.
> >   
> 
> Without further information I don't know for sure, but I don't think
> chip.init makes sense. I'll try to find out more.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-19 23:47           ` Jakub Kicinski
@ 2020-02-20  0:06             ` Jacob Keller
  2020-02-21 22:11               ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-02-20  0:06 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 2/19/2020 3:47 PM, Jakub Kicinski wrote:
> On Wed, 19 Feb 2020 13:37:50 -0800 Jacob Keller wrote:
>> Jakub,
>>
>> Thanks for your excellent feedback.
>>
>> On 2/19/2020 11:57 AM, Jakub Kicinski wrote:
>>> On Wed, 19 Feb 2020 09:33:09 -0800 Jacob Keller wrote:  
>>>> "fw.psid.api" -> what was the "nvm.psid". This I think needs a bit more
>>>> work to define. It makes sense to me as some sort of "api" as (if I
>>>> understand it correctly) it is the format for the parameters, but does
>>>> not itself define the parameter contents.  
>>>
>>> Sounds good. So the contents of parameters would be covered by the
>>> fw.bundle now and not have a separate version?
>>>   
>>
>> I'm actually not sure if we have any way to identify the parameters.
>> I'll ask around about that. My understanding is that these would include
>> parameters that can be modified by the driver such as Wake on LAN
>> settings, so I'm also not sure if they'd be covered in the fw.bundle
>> either. The 'defaults' that were selected when the image was created
>> would be covered, but changes to them wouldn't update the value.
>>
>> Hmmmmm.
> 
> Ah, so these are just defaults, then if there's no existing version 
> I wouldn't worry.
> 
Ok.

>>>> The original reason for using "fw" and "nvm" was because we (internally)
>>>> use fw to mean the management firmware.. whereas these APIs really
>>>> combine the blocks and use "fw.mgmt" for the management firmware. Thus I
>>>> think it makes sense to move from
>>>>
>>>> I also have a couple other oddities that need to be sorted out. We want
>>>> to display the DDP version (piece of "firmware" that is loaded during
>>>> driver load, and is not permanent to the NVM). In some sense this is our
>>>> "fw.app", but because it's loaded by driver at load and not as
>>>> permanently stored in the NVM... I'm not really sure that makes sense to
>>>> put this as the "fw.app", since it is not updated or really touched by
>>>> the firmware flash update.  
>>>
>>> Interesting, can DDP be persisted to the flash, though? Is there some
>>> default DDP, or is it _never_ in the flash? 
>>
>> There's a version of this within the flash, but it is limited, and many
>> device features get disabled if you don't load the DDP package file.
>> (You may have seen patches for this for implementing "safe mode").
>>
>> My understanding is there is no mechanism for persisting a different DDP
>> to the flash.
> 
> I see, so this really isn't just parser extensions.
> 

Right. I'm not entirely sure what pieces of logic the contents interact
with.

> I'm a little surprised you guys went this way, loading FW from disk
> becomes painful for network boot and provisioning :S  All the first
> stage images must have it built in, which is surprisingly painful.
> 

Right. I don't have the context for why this was chosen over making it a
portion that can be updated independently. Unfortunately I don't think
it's a decision that can be changed, at least not easily.

> Perhaps the "safe mode" FW is enough to boot, but then I guess once
> real FW is available there may be a loss of link as the device resets?
> 

It's enough to boot up and handle basic functionality. I'm not sure
exactly how it would be handled with regard to a device reset.

>>> Does it not have some fun implications for firmware signing to have
>>> part of the config/ucode loaded from the host?
>>
>> I'm not sure how it works exactly. As far as I know, the DDP file is
>> itself signed.
> 
> Right, that'd make sense :)
> 
>>> IIRC you could also load multiple of those DDP packages? Perhaps they
>>> could get names like fw.app0, fw.app1, etc?  
>>
>> You can load different ones, each has their own version and name
>> embedded. However, only one can be loaded at any given time, so I'm not
>> sure if multiples like this make sense.
> 
> I see. Maybe just fw.app works then..
> 

Ok

>>> Also if DDP controls a
>>> particular part of the datapath (parser?) feel free to come up with a
>>> more targeted name, up to you.
>>
>> Right, it's my understanding that this defines the parsing logic, and
>> not the complete datapath microcode.
>>
>> In theory, there could be at least 3 DDP versions
>>
>> 1) the version in the NVM, which would be the very basic "safe mode"
>> compatible one.
>>
>> 2) the version in the ddp firmware file that we search for when we load
>>
>> 3) the one that actually got activated. It's a sort of
>> first-come-first-serve and sticks around until a device global reset.
>> This should in theory always be the same as (2) unless you do something
>> weird like load different drivers on the multiple functions.
>>
>> I suppose we could use "running" and "stored" for this, to have "stored"
>> be what's in the NVM, and "running" for the active one.. but that's ugly
>> and misusing what stored vs running is supposed to represent.
> 
> Ouff. Having something loaded from disk breaks the running vs stored
> comparison :( But I think Dave was pretty clear on his opinion about
> load FW from disk and interpret it in the kernel to extract the version.
> 
> Can we leave stored meaning "stored on the device" and running being
> loaded on the chip?
> 

Yes.

> It's perfectly fine for a component to only be reported in running and
> not stored, nfp already does that:
> 

Right, that is my plan. I'll probably just display only the running DDP
fields. If it turns out displaying the NVM default values makes sense
then it could be done through another field, something like
"fw.app.default" or similar. Not convinced yet that it will be necessary
or useful.
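
As a rough userspace illustration of reporting a component only as
"running" and never as "stored": the sketch below is a mock, not the
devlink API. The info_req structure, info_put(), info_find(),
mock_info_get() and the version strings are all made up for the example;
only the "fw.mgmt" and "fw.app" names come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an info reply; not the kernel structure. */
enum ver_kind { VER_RUNNING, VER_STORED };

struct ver_entry {
	enum ver_kind kind;
	const char *name;
	const char *value;
};

struct info_req {
	struct ver_entry entries[8];
	size_t count;
};

/* Append one version entry; returns 0 on success, -1 if the table is full. */
static int info_put(struct info_req *req, enum ver_kind kind,
		    const char *name, const char *value)
{
	assert(req && name && value);

	if (req->count >= sizeof(req->entries) / sizeof(req->entries[0]))
		return -1;

	req->entries[req->count++] = (struct ver_entry){ kind, name, value };
	return 0;
}

/* Look up a version by kind and name; NULL means "not reported". */
static const char *info_find(const struct info_req *req, enum ver_kind kind,
			     const char *name)
{
	for (size_t i = 0; i < req->count; i++)
		if (req->entries[i].kind == kind &&
		    !strcmp(req->entries[i].name, name))
			return req->entries[i].value;
	return NULL;
}

/* active_ddp is the version of the package activated on the device,
 * or NULL when none was loaded (the "safe mode" case).
 */
static int mock_info_get(struct info_req *req, const char *active_ddp)
{
	int err;

	/* A flash-resident component gets both a running and a stored entry
	 * (placeholder version strings).
	 */
	err = info_put(req, VER_RUNNING, "fw.mgmt", "1.2.3");
	if (err)
		return err;
	err = info_put(req, VER_STORED, "fw.mgmt", "1.2.3");
	if (err)
		return err;

	/* The package loaded from disk is reported as running only. */
	if (active_ddp)
		return info_put(req, VER_RUNNING, "fw.app", active_ddp);

	return 0;
}
```

With this shape, a consumer that compares running against stored simply
finds no stored entry for "fw.app" and has nothing to compare.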

Even for the other versions, I'm not entirely sure I can read the
"stored" values separately from what is running anyway.

> https://elixir.bootlin.com/linux/v5.6-rc1/source/drivers/net/ethernet/netronome/nfp/nfp_devlink.c#L238
> 

Thanks,
Jake

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get
  2020-02-20  0:06             ` Jacob Keller
@ 2020-02-21 22:11               ` Jacob Keller
  0 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-02-21 22:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, jiri, valex, linyunsheng, lihong.yang

On 2/19/2020 4:06 PM, Jacob Keller wrote:
> On 2/19/2020 3:47 PM, Jakub Kicinski wrote:
>> On Wed, 19 Feb 2020 13:37:50 -0800 Jacob Keller wrote:
>>> Jakub,
>>>
>>> Thanks for your excellent feedback.
>>>
>>> On 2/19/2020 11:57 AM, Jakub Kicinski wrote:
>>>> On Wed, 19 Feb 2020 09:33:09 -0800 Jacob Keller wrote:  
>>>>> "fw.psid.api" -> what was the "nvm.psid". This I think needs a bit more
>>>>> work to define. It makes sense to me as some sort of "api" as (if I
>>>>> understand it correctly) it is the format for the parameters, but does
>>>>> not itself define the parameter contents.  
>>>>
>>>> Sounds good. So the contents of parameters would be covered by the
>>>> fw.bundle now and not have a separate version?
>>>>   
>>>
>>> I'm actually not sure if we have any way to identify the parameters.
>>> I'll ask around about that. My understanding is that these would include
>>> parameters that can be modified by the driver such as Wake on LAN
>>> settings, so I'm also not sure if they'd be covered in the fw.bundle
>>> either. The 'defaults' that were selected when the image was created
>>> would be covered, but changes to them wouldn't update the value.
>>>
>>> Hmmmmm.
>>
>> Ah, so these are just defaults, then if there's no existing version 
>> I wouldn't worry.
>>
> Ok.
> 
>>>>> The original reason for using "fw" and "nvm" was because we (internally)
>>>>> use fw to mean the management firmware.. whereas these APIs really
>>>>> combine the blocks and use "fw.mgmt" for the management firmware. Thus I
>>>>> think it makes sense to move from
>>>>>
>>>>> I also have a couple other oddities that need to be sorted out. We want
>>>>> to display the DDP version (piece of "firmware" that is loaded during
>>>>> driver load, and is not permanent to the NVM). In some sense this is our
>>>>> "fw.app", but because it's loaded by driver at load and not as
>>>>> permanently stored in the NVM... I'm not really sure that makes sense to
>>>>> put this as the "fw.app", since it is not updated or really touched by
>>>>> the firmware flash update.  
>>>>
>>>> Interesting, can DDP be persisted to the flash, though? Is there some
>>>> default DDP, or is it _never_ in the flash? 
>>>
>>> There's a version of this within the flash, but it is limited, and many
>>> device features get disabled if you don't load the DDP package file.
>>> (You may have seen patches for this for implementing "safe mode").
>>>
>>> My understanding is there is no mechanism for persisting a different DDP
>>> to the flash.
>>
>> I see, so this really isn't just parser extensions.
>>
> 
> Right. I'm not entirely sure what pieces of logic the contents interact
> with.
> 
>> I'm a little surprised you guys went this way, loading FW from disk
>> becomes painful for network boot and provisioning :S  All the first
>> stage images must have it built in, which is surprisingly painful.
>>
> 
> Right. I don't have the context for why this was chosen over making it a
> portion that can be updated independently. Unfortunately I don't think
> it's a decision that can be changed, at least not easily.
> 
>> Perhaps the "safe mode" FW is enough to boot, but then I guess once
>> real FW is available there may be a loss of link as the device resets?
>>
> 
> It's enough to boot up and handle basic functionality. I'm not sure
> exactly how it would be handled with regard to a device reset.
> 
>>>> Does it not have some fun implications for firmware signing to have
>>>> part of the config/ucode loaded from the host?
>>>
>>> I'm not sure how it works exactly. As far as I know, the DDP file is
>>> itself signed.
>>
>> Right, that'd make sense :)
>>
>>>> IIRC you could also load multiple of those DDP packages? Perhaps they
>>>> could get names like fw.app0, fw.app1, etc?  
>>>
>>> You can load different ones, each has their own version and name
>>> embedded. However, only one can be loaded at any given time, so I'm not
>>> sure if multiples like this make sense.
>>
>> I see. Maybe just fw.app works then..
>>
> 
> Ok
> 
>>>> Also if DDP controls a
>>>> particular part of the datapath (parser?) feel free to come up with a
>>>> more targeted name, up to you.
>>>
>>> Right, it's my understanding that this defines the parsing logic, and
>>> not the complete datapath microcode.
>>>
>>> In theory, there could be at least 3 DDP versions
>>>
>>> 1) the version in the NVM, which would be the very basic "safe mode"
>>> compatible one.
>>>
>>> 2) the version in the ddp firmware file that we search for when we load
>>>
>>> 3) the one that actually got activated. It's a sort of
>>> first-come-first-serve and sticks around until a device global reset.
>>> This should in theory always be the same as (2) unless you do something
>>> weird like load different drivers on the multiple functions.
>>>
>>> I suppose we could use "running" and "stored" for this, to have "stored"
>>> be what's in the NVM, and "running" for the active one.. but that's ugly
>>> and misusing what stored vs running is supposed to represent.
>>
>> Ouff. Having something loaded from disk breaks the running vs stored
>> comparison :( But I think Dave was pretty clear on his opinion about
>> load FW from disk and interpret it in the kernel to extract the version.
>>
>> Can we leave stored meaning "stored on the device" and running being
>> loaded on the chip?
>>
> 
> Yes.
> 
>> It's perfectly fine for a component to only be reported in running and
>> not stored, nfp already does that:
>>

Based on your feedback, I believe that we have settled on a set of
suitable names for this. I'm going to submit the initial ice series (the
first 7 patches) to Intel Wired LAN.

The remaining devlink patches need further feedback, but that can happen
while the ice changes are being submitted to next-queue through IWL.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH v2 00/22] devlink region updates
  2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
                   ` (23 preceding siblings ...)
  2020-02-14 23:22 ` [RFC PATCH v2 2/2] devlink: stop requiring snapshot for regions Jacob Keller
@ 2020-03-02 16:27 ` Jiri Pirko
  2020-03-02 19:27   ` Jacob Keller
  24 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 16:27 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:21:59AM CET, jacob.e.keller@intel.com wrote:
>This is a second revision of the previous RFC series I sent to enable two
>new devlink region features.
>
>The original series can be viewed on the list archives at
>
>https://lore.kernel.org/netdev/20200130225913.1671982-1-jacob.e.keller@intel.com/
>
>Overall, this series can be broken into 5 phases:
>
> 1) implement basic devlink support in the ice driver, including .info_get
> 2) convert regions to use the new devlink_region_ops structure
> 3) implement support for DEVLINK_CMD_REGION_NEW
> 4) implement support for directly reading from a region
> 5) use these new features in the ice driver for the Shadow RAM region

Hmm. I think it is better to push this in multiple patchsets. For example,
for 1) you don't really need RFC as it is only related to the ice driver
implementing the existing API.


>
>(1) comprises 6 patches for the ice driver that add the devlink framework
>and clean up a few places in the code in preparation for the new region.
>
>(2) comprises 2 patches which convert regions to use the new
>devlink_region_ops structure, and additionally move the snapshot destructor
>to a region operation.
>
>(3) comprises 6 patches to enable supporting the DEVLINK_CMD_REGION_NEW
>operation. This replaces what was previously the
>DEVLINK_CMD_REGION_TAKE_SNAPSHOT, as per Jiri's suggestion. The new
>operation supports specifying the requested id for the snapshot. To make
>that possible, first snapshot id management is refactored to use an IDR.
>Note that the extra complexity of the IDR is necessary in order to maintain
>the ability for the snapshot IDs to be generated so that multiple regions
>can use the same ID if triggered at the same time.
>
>(4) comprises 6 patches for modifying DEVLINK_CMD_REGION_READ so that it
>accepts a request without a snapshot id. A new region operation is defined
>for regions to optionally support the requests. The first few patches
>refactor and simplify the functions used so that adding the new read method
>reuses logic where possible.
>
>(5) finally comprises a single patch to implement a region for the ice
>device hardware's Shadow RAM contents.
>
>Note that I plan to submit the ice patches through the Intel Wired LAN list,
>but am sending the complete set here as an RFC in case there is further
>feedback, and so that reviewers can have the correct context.
>
>I expect to get further feedback on this RFC revision, and will hopefully send
>the patches as non-RFC following this, if feedback looks good. Thank you for
>the diligent review.
>
>Changes since v1:

Per-patch please. This is no good for review :/


>
>* reword some comments and variable names in the ice driver that used the
>  term "page" to use the term "sector" to avoid confusion with the PAGE_SIZE
>  of the system.
>* Fixed a bug in the ice_read_flat_nvm function due to misusing the last_cmd
>  variable
>* Remove the devlinkm* functions and just use devm_add_action in the ice
>  driver for managing the devlink memory similar to how the PF memory was
>  managed by the devm_kzalloc.
>* Fix typos in a couple of function comments in ice_devlink.c
>* use dev_err instead of dev_warn for an error case where the main VSI can't
>  be found.
>* Only call devlink_port_type_eth_set if the VSI has a netdev
>* Move where the devlink_port is created in the ice_probe flow
>* Update the new ice.rst documentation for info versions, providing more
>  clear descriptions of the parameters. Give examples for each field as
>  well. Squash the documentation additions into the relevant patches.
>* Add a new patch to the ice driver which renames some variables referring
>  to the Option ROM version.
>* keep the string constants in the mlx4 crdump.c file, converting them to
>  "const char * const" so that the compiler understands they can be used in
>  constant initializers.
>* Add a patch to convert snapshot destructors into a region operation
>* Add a patch to fix a trivial typo in a devlink function comment
>* Use __ as a prefix for static internal functions instead of a _locked
>  suffix.
>* Refactor snapshot id management to use an IDR.
>* Implement DEVLINK_CMD_REGION_NEW instead of DEVLINK_CMD_REGION_TAKE_SNAPSHOT
>* Add several patches which refactor devlink_nl_cmd_region_snapshot_fill
>* Use the new cb_ and cb_priv parameters to implement what was previously
>  a separate function called devlink_nl_cmd_region_direct_fill
>
>Jacob Keller (21):
>  ice: use __le16 types for explicitly Little Endian values
>  ice: create function to read a section of the NVM and Shadow RAM
>  ice: enable initial devlink support
>  ice: rename variables used for Option ROM version
>  ice: add basic handler for devlink .info_get
>  ice: add board identifier info to devlink .info_get
>  devlink: prepare to support region operations
>  devlink: convert snapshot destructor callback to region op
>  devlink: trivial: fix tab in function documentation
>  devlink: add functions to take snapshot while locked
>  devlink: convert snapshot id getter to return an error
>  devlink: track snapshot ids using an IDR and refcounts
>  devlink: implement DEVLINK_CMD_REGION_NEW
>  netdevsim: support taking immediate snapshot via devlink
>  devlink: simplify arguments for read_snapshot_fill
>  devlink: use min_t to calculate data_size
>  devlink: report extended error message in region_read_dumpit
>  devlink: remove unnecessary parameter from chunk_fill function
>  devlink: refactor region_read_snapshot_fill to use a callback function
>  devlink: support directly reading from region memory
>  ice: add a devlink region to dump shadow RAM contents
>
>Jesse Brandeburg (1):
>  ice: implement full NVM read from ETHTOOL_GEEPROM
>
> .../networking/devlink/devlink-region.rst     |  20 +-
> Documentation/networking/devlink/ice.rst      |  87 ++++
> Documentation/networking/devlink/index.rst    |   1 +
> drivers/net/ethernet/intel/Kconfig            |   1 +
> drivers/net/ethernet/intel/ice/Makefile       |   1 +
> drivers/net/ethernet/intel/ice/ice.h          |   6 +
> .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   3 +
> drivers/net/ethernet/intel/ice/ice_common.c   |  85 +---
> drivers/net/ethernet/intel/ice/ice_common.h   |  10 +-
> drivers/net/ethernet/intel/ice/ice_devlink.c  | 360 ++++++++++++++
> drivers/net/ethernet/intel/ice/ice_devlink.h  |  17 +
> drivers/net/ethernet/intel/ice/ice_ethtool.c  |  44 +-
> drivers/net/ethernet/intel/ice/ice_main.c     |  23 +-
> drivers/net/ethernet/intel/ice/ice_nvm.c      | 354 +++++++------
> drivers/net/ethernet/intel/ice/ice_nvm.h      |  12 +
> drivers/net/ethernet/intel/ice/ice_type.h     |  17 +-
> drivers/net/ethernet/mellanox/mlx4/crdump.c   |  32 +-
> drivers/net/netdevsim/dev.c                   |  41 +-
> include/net/devlink.h                         |  38 +-
> net/core/devlink.c                            | 465 ++++++++++++++----
> .../drivers/net/netdevsim/devlink.sh          |  15 +
> 21 files changed, 1257 insertions(+), 375 deletions(-)
> create mode 100644 Documentation/networking/devlink/ice.rst
> create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.c
> create mode 100644 drivers/net/ethernet/intel/ice/ice_devlink.h
>
>-- 
>2.25.0.368.g28a2d05eebfb
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH v2 04/22] ice: enable initial devlink support
  2020-02-14 23:22 ` [RFC PATCH v2 04/22] ice: enable initial devlink support Jacob Keller
@ 2020-03-02 16:30   ` Jiri Pirko
  2020-03-02 19:29     ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 16:30 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:03AM CET, jacob.e.keller@intel.com wrote:

[...]

>+int ice_devlink_create_port(struct ice_pf *pf)
>+{
>+	struct devlink *devlink = priv_to_devlink(pf);
>+	struct ice_vsi *vsi = ice_get_main_vsi(pf);
>+	struct device *dev = ice_pf_to_dev(pf);
>+	int err;
>+
>+	if (!vsi) {
>+		dev_err(dev, "%s: unable to find main VSI\n", __func__);
>+		return -EIO;
>+	}
>+
>+	devlink_port_attrs_set(&pf->devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
>+			       pf->hw.pf_id, false, 0, NULL, 0);
>+	err = devlink_port_register(devlink, &pf->devlink_port, pf->hw.pf_id);
>+	if (err) {
>+		dev_err(dev, "devlink_port_register failed: %d\n", err);
>+		return err;
>+	}

You need to register_netdev here. Otherwise you'll get inconsistent udev
naming.


>+	if (vsi->netdev)
>+		devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
>+
>+	return 0;
>+}


[...]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-02-14 23:22 ` [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW Jacob Keller
@ 2020-03-02 17:41   ` Jiri Pirko
  2020-03-02 19:38     ` Jacob Keller
                       ` (3 more replies)
  0 siblings, 4 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 17:41 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:13AM CET, jacob.e.keller@intel.com wrote:
>Implement support for the DEVLINK_CMD_REGION_NEW command for creating
>snapshots. This new command parallels the existing
>DEVLINK_CMD_REGION_DEL.
>
>In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
>".snapshot" operation must be implemented in the region's ops structure.
>
>The desired snapshot id may be provided. If the requested id is already
>in use, an error will be reported. If no id is provided one will be
>selected in the same way as a triggered snapshot.
>
>In either case, the reference count for that id will be incremented
>in the snapshot IDR.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>---
> .../networking/devlink/devlink-region.rst     | 12 +++-
> include/net/devlink.h                         |  6 ++
> net/core/devlink.c                            | 72 +++++++++++++++++++
> 3 files changed, 88 insertions(+), 2 deletions(-)
>
>diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>index 1a7683e7acb2..a24faf2b6b7a 100644
>--- a/Documentation/networking/devlink/devlink-region.rst
>+++ b/Documentation/networking/devlink/devlink-region.rst
>@@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
> Regions may also be used to provide an additional way to debug complex error
> states, but see also :doc:`devlink-health`
> 
>+Regions may optionally support capturing a snapshot on demand via the
>+``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
>+requested snapshots must implement the ``.snapshot`` callback for the region
>+in its ``devlink_region_ops`` structure.
>+
> example usage
> -------------
> 
>@@ -40,8 +45,11 @@ example usage
>     # Delete a snapshot using:
>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
> 
>-    # Trigger (request) a snapshot be taken:
>-    $ devlink region trigger pci/0000:00:05.0/cr-space

Odd. It is actually "devlink region dump". There is no trigger.


>+    # Request an immediate snapshot, if supported by the region
>+    $ devlink region new pci/0000:00:05.0/cr-space

Without ID? I would personally require snapshot id always. Without it,
it looks like you are creating region.


>+
>+    # Request an immediate snapshot with a specific id
>+    $ devlink region new pci/0000:00:05.0/cr-space snapshot 5
> 
>     # Dump a snapshot:
>     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 3a5ff6bea143..3cd0ff2040b2 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -498,10 +498,16 @@ struct devlink_info_req;
>  * struct devlink_region_ops - Region operations
>  * @name: region name
>  * @destructor: callback used to free snapshot memory when deleting
>+ * @snapshot: callback to request an immediate snapshot. On success,
>+ *            the data variable must be updated to point to the snapshot data.
>+ *            The function will be called while the devlink instance lock is
>+ *            held.
>  */
> struct devlink_region_ops {
> 	const char *name;
> 	void (*destructor)(const void *data);
>+	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
>+			u8 **data);

Please have the same type here and for destructor. "u8 *" I guess.


> };
> 
> struct devlink_fmsg;
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 9571063846cc..b5d1b21e5178 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -4045,6 +4045,71 @@ static int devlink_nl_cmd_region_del(struct sk_buff *skb,
> 	return 0;
> }
> 
>+static int
>+devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info)
>+{
>+	struct devlink *devlink = info->user_ptr[0];
>+	struct devlink_region *region;
>+	const char *region_name;
>+	u32 snapshot_id;
>+	u8 *data;
>+	int err;
>+
>+	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
>+		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
>+		return -EINVAL;
>+	}
>+
>+	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
>+	region = devlink_region_get_by_name(devlink, region_name);
>+	if (!region) {
>+		NL_SET_ERR_MSG_MOD(info->extack,

In devlink.c, please don't wrap here.


>+				   "The requested region does not exist");
>+		return -EINVAL;
>+	}
>+
>+	if (!region->ops->snapshot) {
>+		NL_SET_ERR_MSG_MOD(info->extack,
>+				   "The requested region does not support taking an immediate snapshot");
>+		return -EOPNOTSUPP;
>+	}
>+
>+	if (region->cur_snapshots == region->max_snapshots) {
>+		NL_SET_ERR_MSG_MOD(info->extack,
>+				   "The region has reached the maximum number of stored snapshots");
>+		return -ENOMEM;
>+	}
>+
>+	if (info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
>+		/* __devlink_region_snapshot_create will take care of
>+		 * inserting the snapshot id into the IDR if necessary.
>+		 */
>+		snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
>+
>+		if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>+			NL_SET_ERR_MSG_MOD(info->extack,
>+					   "The requested snapshot id is already in use");
>+			return -EEXIST;
>+		}
>+	} else {
>+		snapshot_id = __devlink_region_snapshot_id_get(devlink);
>+	}
>+
>+	err = region->ops->snapshot(devlink, info->extack, &data);

Don't you put the "id"? Looks like a leak.


>+	if (err)
>+		return err;
>+
>+	err = __devlink_region_snapshot_create(region, data, snapshot_id);
>+	if (err)
>+		goto err_free_snapshot_data;
>+
>+	return 0;
>+
>+err_free_snapshot_data:
>+	region->ops->destructor(data);
>+	return err;
>+}
>+
> static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
> 						 struct devlink *devlink,
> 						 u8 *chunk, u32 chunk_size,
>@@ -6358,6 +6423,13 @@ static const struct genl_ops devlink_nl_ops[] = {
> 		.flags = GENL_ADMIN_PERM,
> 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
> 	},
>+	{
>+		.cmd = DEVLINK_CMD_REGION_NEW,
>+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>+		.doit = devlink_nl_cmd_region_new,
>+		.flags = GENL_ADMIN_PERM,
>+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>+	},
> 	{
> 		.cmd = DEVLINK_CMD_REGION_DEL,
> 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>-- 
>2.25.0.368.g28a2d05eebfb
>
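
The leak concern above can be illustrated with a small user-space sketch. It
assumes the id getter hands out a reference that must be dropped again when
the operation fails; the names id_pool, id_get, id_put, take_snapshot and
region_new are hypothetical stand-ins, not the devlink functions, and the
real fix in the series may look different. Error labels unwind in the
reverse order of acquisition.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted snapshot id source. */
struct id_pool {
	int next_id;
	int refs;	/* references handed out and not yet put back */
};

static int id_get(struct id_pool *pool)
{
	assert(pool);
	pool->refs++;
	return ++pool->next_id;
}

static void id_put(struct id_pool *pool)
{
	assert(pool && pool->refs > 0);
	pool->refs--;
}

/* Models region->ops->snapshot(): allocates data or fails on request. */
static int take_snapshot(int should_fail, unsigned char **data)
{
	if (should_fail)
		return -EIO;
	*data = malloc(16);
	return *data ? 0 : -ENOMEM;
}

/* Models the command handler: every failure after id_get() drops the id. */
static int region_new(struct id_pool *pool, int snapshot_fails,
		      int create_fails, unsigned char **out)
{
	unsigned char *data;
	int id, err;

	id = id_get(pool);

	err = take_snapshot(snapshot_fails, &data);
	if (err)
		goto err_put_id;

	/* Stands in for the snapshot create step failing. */
	err = create_fails ? -ENOMEM : 0;
	if (err)
		goto err_free_data;

	*out = data;	/* success: the snapshot keeps the data and the id */
	return id;

err_free_data:
	free(data);
err_put_id:
	id_put(pool);
	return err;
}
```

With this shape, a failed snapshot callback and a failed create both leave
the pool's reference count where it started.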

* Re: [RFC PATCH v2 08/22] devlink: prepare to support region operations
  2020-02-14 23:22 ` [RFC PATCH v2 08/22] devlink: prepare to support region operations Jacob Keller
@ 2020-03-02 17:42   ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 17:42 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:07AM CET, jacob.e.keller@intel.com wrote:
>Modify the devlink region code in preparation for adding new operations
>on regions.
>
>Create a devlink_region_ops structure, and move the name pointer from
>within the devlink_region structure into the ops structure (similar to
>the devlink_health_reporter_ops).
>
>This prepares the regions to enable support of additional operations in
>the future such as requesting snapshots, or accessing the region
>directly without a snapshot.
>
>In order to re-use the constant strings in the mlx4 driver their
>declaration must be changed to 'const char * const' to ensure the
>compiler realizes that both the data and the pointer cannot change.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>Reviewed-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>

* Re: [RFC PATCH v2 09/22] devlink: convert snapshot destructor callback to region op
  2020-02-14 23:22 ` [RFC PATCH v2 09/22] devlink: convert snapshot destructor callback to region op Jacob Keller
@ 2020-03-02 17:42   ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 17:42 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:08AM CET, jacob.e.keller@intel.com wrote:
>It does not make sense for two snapshots of a given region to use
>different destructors. Simplify snapshot creation by adding
>a .destructor op for regions.
>
>This operation replaces the data_destructor passed at snapshot
>creation, which makes creating a snapshot simpler.
>
>Noticed-by: Jakub Kicinski <kuba@kernel.org>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
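
As a rough user-space illustration of the idea in this patch, the sketch
below keeps one destructor in a per-region ops structure instead of passing
one with every snapshot. The types and helpers (region_ops, region,
region_snapshot_add, region_destroy_snapshots, counting_free) are
hypothetical and much simpler than the devlink structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_SNAPSHOTS 4

/* One ops structure per region: name and destructor are shared by all
 * snapshots of that region.
 */
struct region_ops {
	const char *name;
	void (*destructor)(void *data);
};

struct region {
	const struct region_ops *ops;
	unsigned char *snapshots[MAX_SNAPSHOTS];
	unsigned int cur_snapshots;
};

static int freed_count;

/* Example destructor that also counts calls so the effect is visible. */
static void counting_free(void *data)
{
	free(data);
	freed_count++;
}

static const struct region_ops demo_ops = {
	.name = "demo",
	.destructor = counting_free,
};

/* Snapshot creation no longer takes a destructor argument. */
static int region_snapshot_add(struct region *r, unsigned char *data)
{
	assert(r && r->ops && r->ops->destructor);
	if (r->cur_snapshots == MAX_SNAPSHOTS)
		return -ENOMEM;
	r->snapshots[r->cur_snapshots++] = data;
	return 0;
}

/* Every snapshot is released through the region's single destructor. */
static void region_destroy_snapshots(struct region *r)
{
	while (r->cur_snapshots)
		r->ops->destructor(r->snapshots[--r->cur_snapshots]);
}
```

A caller would point a region at demo_ops, add snapshots with
region_snapshot_add(), and release them all with
region_destroy_snapshots().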

* Re: [RFC PATCH v2 10/22] devlink: trivial: fix tab in function documentation
  2020-02-14 23:22 ` [RFC PATCH v2 10/22] devlink: trivial: fix tab in function documentation Jacob Keller
@ 2020-03-02 17:42   ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 17:42 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:09AM CET, jacob.e.keller@intel.com wrote:
>The function documentation comment for devlink_region_snapshot_create
>included a literal tab character between the words 'future' and
>'analyses'. It was difficult to spot because it happened to display only
>one space wide.
>
>Fix the comment to use a space here instead of a stray tab appearing in
>the middle of a sentence.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>

* Re: [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked
  2020-02-14 23:22 ` [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked Jacob Keller
@ 2020-03-02 17:43   ` Jiri Pirko
  2020-03-02 22:25     ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 17:43 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:10AM CET, jacob.e.keller@intel.com wrote:
>A future change is going to add a new devlink command to request
>a snapshot on demand. The handler for that command will need to call
>the devlink_region_snapshot_id_get and devlink_region_snapshot_create
>functions while already holding the devlink instance lock.
>
>Extract the logic of these two functions into static functions prefixed
>by `__` to indicate they are internal helper functions. Modify the
>original functions to be implemented in terms of the new locked
>functions.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>


>---
> net/core/devlink.c | 93 ++++++++++++++++++++++++++++++----------------
> 1 file changed, 61 insertions(+), 32 deletions(-)
>
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index fef93f48028c..0e94887713f4 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -3760,6 +3760,65 @@ static void devlink_nl_region_notify(struct devlink_region *region,
> 	nlmsg_free(msg);
> }
> 
>+/**
>+ *	__devlink_region_snapshot_id_get - get snapshot ID
>+ *	@devlink: devlink instance
>+ *
>+ *	Returns a new snapshot id. Must be called while holding the
>+ *	devlink instance lock.
>+ */

You don't need this documentation comment for static functions.


>+static u32 __devlink_region_snapshot_id_get(struct devlink *devlink)
>+{
>+	lockdep_assert_held(&devlink->lock);
>+	return ++devlink->snapshot_id;
>+}
>+
>+/**
>+ *	__devlink_region_snapshot_create - create a new snapshot
>+ *	This will add a new snapshot of a region. The snapshot
>+ *	will be stored on the region struct and can be accessed
>+ *	from devlink. This is useful for future analyses of snapshots.
>+ *	Multiple snapshots can be created on a region.
>+ *	The @snapshot_id should be obtained using the getter function.
>+ *
>+ *	Must be called only while holding the devlink instance lock.
>+ *
>+ *	@region: devlink region of the snapshot
>+ *	@data: snapshot data
>+ *	@snapshot_id: snapshot id to be created
>+ */
>+static int
>+__devlink_region_snapshot_create(struct devlink_region *region,
>+				 u8 *data, u32 snapshot_id)
>+{
>+	struct devlink *devlink = region->devlink;
>+	struct devlink_snapshot *snapshot;
>+
>+	lockdep_assert_held(&devlink->lock);
>+
>+	/* check if region can hold one more snapshot */
>+	if (region->cur_snapshots == region->max_snapshots)
>+		return -ENOMEM;
>+
>+	if (devlink_region_snapshot_get_by_id(region, snapshot_id))
>+		return -EEXIST;
>+
>+	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
>+	if (!snapshot)
>+		return -ENOMEM;
>+
>+	snapshot->id = snapshot_id;
>+	snapshot->region = region;
>+	snapshot->data = data;
>+
>+	list_add_tail(&snapshot->list, &region->snapshot_list);
>+
>+	region->cur_snapshots++;
>+
>+	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
>+	return 0;
>+}
>+
> static void devlink_region_snapshot_del(struct devlink_region *region,
> 					struct devlink_snapshot *snapshot)
> {
>@@ -7618,7 +7677,7 @@ u32 devlink_region_snapshot_id_get(struct devlink *devlink)
> 	u32 id;
> 
> 	mutex_lock(&devlink->lock);
>-	id = ++devlink->snapshot_id;
>+	id = __devlink_region_snapshot_id_get(devlink);
> 	mutex_unlock(&devlink->lock);
> 
> 	return id;
>@@ -7641,42 +7700,12 @@ int devlink_region_snapshot_create(struct devlink_region *region,
> 				   u8 *data, u32 snapshot_id)
> {
> 	struct devlink *devlink = region->devlink;
>-	struct devlink_snapshot *snapshot;
> 	int err;
> 
> 	mutex_lock(&devlink->lock);
>-
>-	/* check if region can hold one more snapshot */
>-	if (region->cur_snapshots == region->max_snapshots) {
>-		err = -ENOMEM;
>-		goto unlock;
>-	}
>-
>-	if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>-		err = -EEXIST;
>-		goto unlock;
>-	}
>-
>-	snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
>-	if (!snapshot) {
>-		err = -ENOMEM;
>-		goto unlock;
>-	}
>-
>-	snapshot->id = snapshot_id;
>-	snapshot->region = region;
>-	snapshot->data = data;
>-
>-	list_add_tail(&snapshot->list, &region->snapshot_list);
>-
>-	region->cur_snapshots++;
>-
>-	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
>+	err = __devlink_region_snapshot_create(region, data, snapshot_id);
> 	mutex_unlock(&devlink->lock);
>-	return 0;
> 
>-unlock:
>-	mutex_unlock(&devlink->lock);
> 	return err;
> }
> EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
>-- 
>2.25.0.368.g28a2d05eebfb
>
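
The split between a locked helper and a public wrapper can be sketched in
user space roughly as follows. A plain boolean stands in for the devlink
instance mutex and assert() stands in for lockdep_assert_held(); the patch
marks its helpers with a `__` prefix, while this sketch uses a `_locked`
suffix because leading double underscores are reserved in ordinary C
programs. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct instance {
	bool locked;			/* stands in for devlink->lock */
	unsigned int snapshot_id;
	unsigned int last_snapshot;
};

static void instance_lock(struct instance *inst)
{
	assert(!inst->locked);
	inst->locked = true;
}

static void instance_unlock(struct instance *inst)
{
	assert(inst->locked);
	inst->locked = false;
}

/* Internal helper: the caller must already hold the lock. */
static unsigned int snapshot_id_get_locked(struct instance *inst)
{
	assert(inst->locked);	/* models lockdep_assert_held() */
	return ++inst->snapshot_id;
}

/* Public entry point: takes the lock, then defers to the helper. */
static unsigned int snapshot_id_get(struct instance *inst)
{
	unsigned int id;

	instance_lock(inst);
	id = snapshot_id_get_locked(inst);
	instance_unlock(inst);

	return id;
}

/* A handler that does several steps under one lock hold and therefore
 * has to use the locked helper rather than the public wrapper.
 */
static unsigned int handle_region_new(struct instance *inst)
{
	unsigned int id;

	instance_lock(inst);
	id = snapshot_id_get_locked(inst);
	inst->last_snapshot = id;
	instance_unlock(inst);

	return id;
}
```

Calling snapshot_id_get() from inside handle_region_new() would trip the
assert in instance_lock(), which is the user-space analogue of the deadlock
the refactoring avoids.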

* Re: [RFC PATCH v2 12/22] devlink: convert snapshot id getter to return an error
  2020-02-14 23:22 ` [RFC PATCH v2 12/22] devlink: convert snapshot id getter to return an error Jacob Keller
@ 2020-03-02 17:44   ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-02 17:44 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Sat, Feb 15, 2020 at 12:22:11AM CET, jacob.e.keller@intel.com wrote:
>Modify the devlink_region_snapshot_id_get function to return a signed
>value, enabling reporting an error on failure.
>
>This enables easily refactoring how IDs are generated and kept track of
>in the future. For now, just report ENOSPC once INT_MAX snapshot ids
>have been returned.
>
>Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
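
A minimal user-space sketch of the behaviour described in this commit
message: the getter returns a positive id on success and -ENOSPC once
INT_MAX ids have been handed out. The id_counter type and the unlocked
snapshot_id_get shown here are hypothetical simplifications; locking is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

struct id_counter {
	int last_id;	/* most recently returned id, 0 if none yet */
};

/* Returns the next id (>= 1), or -ENOSPC when the id space is used up.
 * A signed return type lets one value carry either an id or an error.
 */
static int snapshot_id_get(struct id_counter *c)
{
	assert(c);

	if (c->last_id == INT_MAX)
		return -ENOSPC;

	return ++c->last_id;
}
```

Callers then check for a negative return before using the value as an id.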

* Re: [RFC PATCH v2 00/22] devlink region updates
  2020-03-02 16:27 ` [RFC PATCH v2 00/22] devlink region updates Jiri Pirko
@ 2020-03-02 19:27   ` Jacob Keller
  0 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 19:27 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

[-- Attachment #1: Type: text/plain, Size: 3073 bytes --]



On 3/2/2020 8:27 AM, Jiri Pirko wrote:
> Sat, Feb 15, 2020 at 12:21:59AM CET, jacob.e.keller@intel.com wrote:
>> This is a second revision of the previous RFC series I sent to enable two
>> new devlink region features.
>>
>> The original series can be viewed on the list archives at
>>
>> https://lore.kernel.org/netdev/20200130225913.1671982-1-jacob.e.keller@intel.com/
>>
>> Overall, this series can be broken into 5 phases:
>>
>> 1) implement basic devlink support in the ice driver, including .info_get
>> 2) convert regions to use the new devlink_region_ops structure
>> 3) implement support for DEVLINK_CMD_REGION_NEW
>> 4) implement support for directly reading from a region
>> 5) use these new features in the ice driver for the Shadow RAM region
> 
> Hmm. I think it is better to push this in multiple patchsets. For example,
> for 1) you don't really need RFC as it is only related to the ice driver
> implementing the existing API.
> 

Yes that's my plan for the next revision. I'm working on getting the ice
support ready to submit through IWL now. The other parts I will break
into 2 series.

> 
>>
>> (1) comprises 6 patches for the ice driver that add the devlink framework
>> and clean up a few places in the code in preparation for the new region.
>>
>> (2) comprises 2 patches which convert regions to use the new
>> devlink_region_ops structure, and additionally move the snapshot destructor
>> to a region operation.
>>
>> (3) comprises 6 patches to enable supporting the DEVLINK_CMD_REGION_NEW
>> operation. This replaces what was previously the
>> DEVLINK_CMD_REGION_TAKE_SNAPSHOT, as per Jiri's suggestion. The new
>> operation supports specifying the requested id for the snapshot. To make
>> that possible, first snapshot id management is refactored to use an IDR.
>> Note that the extra complexity of the IDR is necessary in order to maintain
>> the ability for the snapshot IDs to be generated so that multiple regions
>> can use the same ID if triggered at the same time.
>>
>> (4) comprises 6 patches for modifying DEVLINK_CMD_REGION_READ so that it
>> accepts a request without a snapshot id. A new region operation is defined
>> for regions to optionally support the requests. The first few patches
>> refactor and simplify the functions used so that adding the new read method
>> reuses logic where possible.
>>
>> (5) finally comprises a single patch to implement a region for the ice
>> device hardware's Shadow RAM contents.
>>
>> Note that I plan to submit the ice patches through the Intel Wired LAN list,
>> but am sending the complete set here as an RFC in case there is further
>> feedback, and so that reviewers can have the correct context.
>>
>> I expect to get further feedback on this RFC revision, and will hopefully send
>> the patches as non-RFC following this, if feedback looks good. Thank you for
>> the diligent review.
>>
>> Changes since v1:
> 
> Per-patch please. This is no good for review :/
> 

I've attached the git range-diff between the v1 and v2 series. I'll keep
that in mind for future revision logs.

Thanks,
Jake

[-- Attachment #2: range-diff-since-v1.diff --]
[-- Type: text/plain, Size: 29442 bytes --]

 5:  dfe3f13dc7c8 =  1:  3289b0e46c1f ice: use __le16 types for explicitly Little Endian values
 6:  efd2a78e8fb6 !  2:  e702c773bf81 ice: create function to read a section of the NVM and Shadow RAM
    @@ drivers/net/ethernet/intel/ice/ice_nvm.c: ice_aq_read_nvm(struct ice_hw *hw, u16
     + * @data: buffer to return data in (sized to fit the specified length)
     + * @read_shadow_ram: if true, read from shadow RAM instead of NVM
     + *
    -+ * Reads a portion of the NVM, as a flat memory space. This function will
    -+ * correctly handle reading of sizes beyond a page by breaking the request
    -+ * into multiple reads.
    ++ * Reads a portion of the NVM, as a flat memory space. This function correctly
    ++ * breaks read requests across Shadow RAM sectors and ensures that no single
    ++ * read request exceeds the maximum 4Kb read for a single AdminQ command.
     + *
     + * Returns a status code on failure. Note that the data pointer may be
     + * partially updated if some reads succeed before a failure.
    @@ drivers/net/ethernet/intel/ice/ice_nvm.c: ice_aq_read_nvm(struct ice_hw *hw, u16
     +		  bool read_shadow_ram)
     +{
     +	enum ice_status status;
    -+	bool last_cmd = true;
     +	u32 inlen = *length;
     +	u32 bytes_read = 0;
    ++	bool last_cmd;
     +
     +	*length = 0;
     +
     +	/* Verify the length of the read if this is for the Shadow RAM */
    -+	if (read_shadow_ram && ((offset + inlen) > (hw->nvm.sr_words * 2))) {
    ++	if (read_shadow_ram && ((offset + inlen) > (hw->nvm.sr_words * 2u))) {
     +		ice_debug(hw, ICE_DBG_NVM,
     +			  "NVM error: requested offset is beyond Shadow RAM limit\n");
     +		return ICE_ERR_PARAM;
     +	}
     +
     +	do {
    -+		u32 read_size, page_offset;
    ++		u32 read_size, sector_offset;
     +
     +		/* ice_aq_read_nvm cannot read more than 4Kb at a time.
    -+		 * Additionally, break the reads up so that they do not cross
    -+		 * a page boundary.
    ++		 * Additionally, a read from the Shadow RAM may not cross over
    ++		 * a sector boundary. Conveniently, the sector size is also
    ++		 * 4Kb.
     +		 */
    -+		page_offset = offset % ICE_AQ_MAX_BUF_LEN;
    -+		read_size = min_t(u32, ICE_AQ_MAX_BUF_LEN - page_offset,
    ++		sector_offset = offset % ICE_AQ_MAX_BUF_LEN;
    ++		read_size = min_t(u32, ICE_AQ_MAX_BUF_LEN - sector_offset,
     +				  inlen - bytes_read);
     +
    -+		if ((bytes_read + read_size) < inlen)
    -+			last_cmd = false;
    ++		last_cmd = !(bytes_read + read_size < inlen);
     +
     +		status = ice_aq_read_nvm(hw, ICE_AQC_NVM_START_POINT,
     +					 offset, read_size,
    @@ drivers/net/ethernet/intel/ice/ice_nvm.c: ice_read_sr_aq(struct ice_hw *hw, u32
     -	status = ice_read_sr_aq(hw, offset, 1, &data_local, true);
     -	if (!status)
     -		*data = le16_to_cpu(data_local);
    -+	/* Note that ice_read_flat_nvm checks if the read is past the Shadow
    -+	 * RAM size, and ensures we don't read across a page boundary
    ++	/* Note that ice_read_flat_nvm takes into account the 4Kb AdminQ and
    ++	 * Shadow RAM sector restrictions necessary when reading from the NVM.
     +	 */
     +	status = ice_read_flat_nvm(hw, offset * sizeof(u16), &bytes,
     +				   (u8 *)&data_local, true);
 7:  5f4f6ba0e561 =  3:  54ef31b469ee ice: implement full NVM read from ETHTOOL_GEEPROM
 9:  6bc459c7ade7 !  4:  58059efb5936 ice: enable initial devlink support
    @@ Commit message
         ice: enable initial devlink support
     
         Begin implementing support for the devlink interface with the ice
    -    driver. Use devlinkm_alloc to allocate the devlink memory. The PF
    -    private data structure is now allocated as part of the devlink instead
    -    of as a standalone allocation.
    +    driver.
    +
    +    The pf structure is currently memory managed through devres, via
    +    a devm_alloc. To mimic this behavior, after allocating the devlink
    +    pointer, use devm_add_action to add a teardown action for releasing the
    +    devlink memory on exit.
     
         The ice hardware is a multi-function PCIe device. Thus, each physical
         function will get its own devlink instance. This means that each
    @@ Commit message
         configuration. This is done because the ice driver loads a separate
         instance for each function.
     
    -    That means that this implementation does not enable devlink to manage
    +    Due to this, the implementation does not enable devlink to manage
         device-wide resources or configuration, as each physical function will
         be treated independently. This is done for simplicity, as managing
         a devlink instance across multiple driver instances would significantly
    @@ drivers/net/ethernet/intel/ice/ice_devlink.c (new)
     +const struct devlink_ops ice_devlink_ops = {
     +};
     +
    ++static void ice_devlink_free(void *devlink_ptr)
    ++{
    ++	devlink_free((struct devlink *)devlink_ptr);
    ++}
    ++
    ++/**
    ++ * ice_allocate_pf - Allocate devlink and return PF structure pointer
    ++ * @dev: the device to allocate for
    ++ *
    ++ * Allocate a devlink instance for this device and return the private area as
    ++ * the PF structure. The devlink memory is kept track of through devres by
    ++ * adding an action to remove it when unwinding.
    ++ */
    ++struct ice_pf *ice_allocate_pf(struct device *dev)
    ++{
    ++	struct devlink *devlink;
    ++
    ++	devlink = devlink_alloc(&ice_devlink_ops, sizeof(struct ice_pf));
    ++	if (!devlink)
    ++		return NULL;
    ++
    ++	/* Add an action to teardown the devlink when unwinding the driver */
    ++	if (devm_add_action(dev, ice_devlink_free, devlink)) {
    ++		devlink_free(devlink);
    ++		return NULL;
    ++	}
    ++
    ++	return devlink_priv(devlink);
    ++}
    ++
     +/**
     + * ice_devlink_register - Register devlink interface for this PF
     + * @pf: the PF to register the devlink for.
    @@ drivers/net/ethernet/intel/ice/ice_devlink.c (new)
     +}
     +
     +/**
    -+ * ice_devlink_unregister - Unregister devlink resources for this pf.
    ++ * ice_devlink_unregister - Unregister devlink resources for this PF.
     + * @pf: the PF structure to cleanup
     + *
     + * Releases resources used by devlink and cleans up associated memory.
    @@ drivers/net/ethernet/intel/ice/ice_devlink.c (new)
     + * @pf: the PF to create a port for
     + *
     + * Create and register a devlink_port for this PF. Note that although each
    -+ * physical function connected to a separate devlink instance, the port will
    -+ * still be numbered according to the physical function id.
    ++ * physical function is connected to a separate devlink instance, the port
    ++ * will still be numbered according to the physical function id.
     + *
     + * @returns zero on success or an error code on failure.
     + */
    @@ drivers/net/ethernet/intel/ice/ice_devlink.c (new)
     +	int err;
     +
     +	if (!vsi) {
    -+		dev_warn(dev, "%s: unable to find main VSI\n", __func__);
    ++		dev_err(dev, "%s: unable to find main VSI\n", __func__);
     +		return -EIO;
     +	}
     +
    @@ drivers/net/ethernet/intel/ice/ice_devlink.c (new)
     +		dev_err(dev, "devlink_port_register failed: %d\n", err);
     +		return err;
     +	}
    -+	devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
    ++	if (vsi->netdev)
    ++		devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
     +
     +	return 0;
     +}
    @@ drivers/net/ethernet/intel/ice/ice_devlink.h (new)
     +#ifndef _ICE_DEVLINK_H_
     +#define _ICE_DEVLINK_H_
     +
    -+extern const struct devlink_ops ice_devlink_ops;
    ++struct ice_pf *ice_allocate_pf(struct device *dev);
     +
     +int ice_devlink_register(struct ice_pf *pf);
     +void ice_devlink_unregister(struct ice_pf *pf);
    @@ drivers/net/ethernet/intel/ice/ice_main.c
      
      #define DRV_VERSION_MAJOR 0
      #define DRV_VERSION_MINOR 8
    -@@ drivers/net/ethernet/intel/ice/ice_main.c: static int ice_setup_pf_sw(struct ice_pf *pf)
    - 		status = -ENODEV;
    - 		goto unroll_vsi_setup;
    - 	}
    -+
    -+	status = ice_devlink_create_port(pf);
    -+	if (status)
    -+		goto unroll_vsi_setup;
    -+
    - 	/* netdev has to be configured before setting frame size */
    - 	ice_vsi_cfg_frame_size(vsi);
    - 
    -@@ drivers/net/ethernet/intel/ice/ice_main.c: static int ice_setup_pf_sw(struct ice_pf *pf)
    - 		}
    - 	}
    - 
    -+	ice_devlink_destroy_port(pf);
    -+
    - unroll_vsi_setup:
    - 	if (vsi) {
    - 		ice_vsi_free_q_vectors(vsi);
    -@@ drivers/net/ethernet/intel/ice/ice_main.c: static int
    - ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
    - {
    - 	struct device *dev = &pdev->dev;
    -+	struct devlink *devlink;
    - 	struct ice_pf *pf;
    - 	struct ice_hw *hw;
    - 	int err;
     @@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
      		return err;
      	}
      
     -	pf = devm_kzalloc(dev, sizeof(*pf), GFP_KERNEL);
    --	if (!pf)
    -+	devlink = devlinkm_alloc(dev, &ice_devlink_ops, sizeof(*pf));
    -+	if (!devlink) {
    -+		dev_err(dev, "devlink allocation failed\n");
    ++	pf = ice_allocate_pf(dev);
    + 	if (!pf)
      		return -ENOMEM;
    -+	}
      
    - 	/* set up for high or low DMA */
    - 	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
    -@@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
    - 	pci_enable_pcie_error_reporting(pdev);
    - 	pci_set_master(pdev);
    - 
    -+	pf = devlink_priv(devlink);
    - 	pf->pdev = pdev;
    - 	pci_set_drvdata(pdev, pf);
    - 	set_bit(__ICE_DOWN, pf->state);
     @@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
      
      	pf->msg_enable = netif_msg_init(debug, ICE_DFLT_NETIF_M);
    @@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const
      #ifndef CONFIG_DYNAMIC_DEBUG
      	if (debug < -1)
      		hw->debug_mask = debug;
    +@@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
    + 		goto err_alloc_sw_unroll;
    + 	}
    + 
    ++	err = ice_devlink_create_port(pf);
    ++	if (err)
    ++		goto err_alloc_sw_unroll;
    ++
    ++
    + 	clear_bit(__ICE_SERVICE_DIS, pf->state);
    + 
    + 	/* tell the firmware we are up */
    +@@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
    + 	return 0;
    + 
    + err_alloc_sw_unroll:
    ++	ice_devlink_destroy_port(pf);
    + 	set_bit(__ICE_SERVICE_DIS, pf->state);
    + 	set_bit(__ICE_DOWN, pf->state);
    + 	devm_kfree(dev, pf->first_sw);
     @@ drivers/net/ethernet/intel/ice/ice_main.c: ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
      	ice_deinit_pf(pf);
      	ice_deinit_hw(hw);
 -:  ------------ >  5:  0b22901ddc9a ice: rename variables used for Option ROM version
 -:  ------------ >  6:  94e187ff9f4d ice: add basic handler for devlink .info_get
11:  e386119abbfb !  7:  59408e666b26 ice: add board identifier info to devlink .info_get
    @@ Commit message
     
         Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
     
    + ## Documentation/networking/devlink/ice.rst ##
    +@@ Documentation/networking/devlink/ice.rst: The ``ice`` driver reports the following versions
    +       - Type
    +       - Example
    +       - Description
    ++    * - ``board.id``
    ++      - fixed
    ++      - K65390-000
    ++      - The Product Board Assembly (PBA) identifier of the board.
    +     * - ``fw.mgmt``
    +       - running
    +       - 1.16.10
    +@@ Documentation/networking/devlink/ice.rst: The ``ice`` driver reports the following versions
    +         are supported.
    +     * - ``fw.mgmt.bundle``
    +       - running
    +-      - ecabd066
    +-      - Unique identifier of the management firmware.
    ++      - 0xecabd066
    ++      - Unique identifier of the management firmware build.
    +     * - ``fw.undi.orom``
    +       - running
    +       - 1.2186.0
    +       - Version of the Option ROM containing the UEFI driver. The version is
    +         reported in ``major.minor.patch`` format. The major version is
    +-        incremented whenever a major breaking change occurs, or when
    +-        the minor version would overflow. The minor version is incremented
    +-        for non-breaking changes, and is reset to 1 when the major version
    +-        is incremented. The patch version is normally 0 but is incremented
    +-        when a fix is delivered as a patch against an older base Option ROM.
    ++        incremented whenever a major breaking change occurs, or when the
    ++        minor version would overflow. The minor version is incremented for
    ++        non-breaking changes and reset to 1 when the major version is
    ++        incremented. The patch version is normally 0 but is incremented when
    ++        a fix is delivered as a patch against an older base Option ROM.
    +     * - ``nvm.psid``
    +       - running
    +       - 0.50
    +-      - Version of the format for the NVM parameter set
    ++      - Version describing the format of the NVM parameter set.
    +     * - ``nvm.bundle``
    +       - running
    +       - 0x80001709
    +-      - Unique identifier for the entire NVM image contents, also known as
    +-        the EETRACK id.
    ++      - Unique identifier of the NVM image contents, also known as the
    ++        EETRACK id.
    +
      ## drivers/net/ethernet/intel/ice/ice_common.c ##
     @@ drivers/net/ethernet/intel/ice/ice_common.c: enum ice_status ice_reset(struct ice_hw *hw, enum ice_reset_req req)
      	return ice_check_reset(hw);
    @@ drivers/net/ethernet/intel/ice/ice_common.h: enum ice_status ice_nvm_validate_ch
     
      ## drivers/net/ethernet/intel/ice/ice_devlink.c ##
     @@ drivers/net/ethernet/intel/ice/ice_devlink.c: static int ice_devlink_info_get(struct devlink *devlink,
    - 	u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo;
    + 	u8 orom_maj, orom_patch, nvm_ver_hi, nvm_ver_lo;
      	struct ice_pf *pf = devlink_priv(devlink);
      	struct ice_hw *hw = &pf->hw;
     +	enum ice_status status;
    - 	u16 oem_build;
    - 	char buf[32]; /* TODO: size this properly */
    + 	u16 orom_min;
    +-	char buf[32]; /* TODO: size this properly */
    ++	char buf[32];
      	int err;
    + 
    + 	ice_get_nvm_version(hw, &orom_maj, &orom_min, &orom_patch, &nvm_ver_hi,
     @@ drivers/net/ethernet/intel/ice/ice_devlink.c: static int ice_devlink_info_get(struct devlink *devlink,
      		return err;
      	}
    @@ drivers/net/ethernet/intel/ice/ice_devlink.c: static int ice_devlink_info_get(st
     +		return -EIO;
     +	}
     +
    -+	/* board.id (DEVLINK_INFO_VERSION_GENERIC_BOARD_ID) */
    -+	err = devlink_info_version_fixed_put(req, "board.id", buf);
    ++	err = devlink_info_version_fixed_put(req,
    ++					     DEVLINK_INFO_VERSION_GENERIC_BOARD_ID,
    ++					     buf);
     +	if (err) {
     +		NL_SET_ERR_MSG_MOD(extack, "Unable to set board identifier");
     +		return err;
     +	}
     +
    - 	/* fw (match exact output of ethtool -i firmware-version) */
    + 	snprintf(buf, sizeof(buf), "%u.%u.%u", hw->fw_maj_ver, hw->fw_min_ver,
    + 		 hw->fw_patch);
      	err = devlink_info_version_running_put(req,
    - 					       DEVLINK_INFO_VERSION_GENERIC_FW,
    +@@ drivers/net/ethernet/intel/ice/ice_devlink.c: static int ice_devlink_info_get(struct devlink *devlink,
    + 		return err;
    + 	}
    + 
    +-	snprintf(buf, sizeof(buf), "%u.%u", hw->api_maj_ver,
    +-		 hw->api_min_ver);
    ++	snprintf(buf, sizeof(buf), "%u.%u", hw->api_maj_ver, hw->api_min_ver);
    + 	err = devlink_info_version_running_put(req, "fw.mgmt.api", buf);
    + 	if (err) {
    + 		NL_SET_ERR_MSG_MOD(extack, "Unable to set mgmt fw API data");
     
      ## drivers/net/ethernet/intel/ice/ice_nvm.c ##
     @@ drivers/net/ethernet/intel/ice/ice_nvm.c: enum ice_status ice_read_sr_word(struct ice_hw *hw, u16 offset, u16 *data)
    @@ drivers/net/ethernet/intel/ice/ice_nvm.c: enum ice_status ice_read_sr_word(struc
     +	 */
     +	pba_size--;
     +	if (pba_num_size < (((u32)pba_size * 2) + 1)) {
    -+		ice_debug(hw, ICE_DBG_INIT,
    -+			  "Buffer too small for PBA data.\n");
    ++		ice_debug(hw, ICE_DBG_INIT, "Buffer too small for PBA data.\n");
     +		return ICE_ERR_PARAM;
     +	}
     +
     +	for (i = 0; i < pba_size; i++) {
     +		status = ice_read_sr_word(hw, (pba_tlv + 2 + 1) + i, &pba_word);
     +		if (status) {
    -+			ice_debug(hw, ICE_DBG_INIT,
    -+				  "Failed to read PBA Block word %d.\n", i);
    ++			ice_debug(hw, ICE_DBG_INIT, "Failed to read PBA Block word %d.\n", i);
     +			return status;
     +		}
     +
    @@ drivers/net/ethernet/intel/ice/ice_type.h
     @@ drivers/net/ethernet/intel/ice/ice_type.h: struct ice_hw_port_stats {
      /* Checksum and Shadow RAM pointers */
      #define ICE_SR_BOOT_CFG_PTR		0x132
    - #define ICE_NVM_OEM_VER_OFF		0x02
    + #define ICE_NVM_OROM_VER_OFF		0x02
     +#define ICE_SR_PBA_BLOCK_PTR		0x16
      #define ICE_SR_NVM_DEV_STARTER_VER	0x18
      #define ICE_SR_NVM_EETRACK_LO		0x2D
 1:  7d571fe7498b !  8:  1b745d45484b devlink: prepare to support region operations
    @@ Commit message
         the future such as requesting snapshots, or accessing the region
         directly without a snapshot.
     
    +    In order to re-use the constant strings in the mlx4 driver their
    +    declaration must be changed to 'const char * const' to ensure the
    +    compiler realizes that both the data and the pointer cannot change.
    +
         Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    +    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
     
      ## drivers/net/ethernet/mellanox/mlx4/crdump.c ##
     @@
    @@ drivers/net/ethernet/mellanox/mlx4/crdump.c
      
     -static const char *region_cr_space_str = "cr-space";
     -static const char *region_fw_health_str = "fw-health";
    ++static const char * const region_cr_space_str = "cr-space";
    ++static const char * const region_fw_health_str = "fw-health";
    ++
     +static const struct devlink_region_ops region_cr_space_ops = {
    -+	.name = "cr-space",
    ++	.name = region_cr_space_str,
     +};
     +
     +static const struct devlink_region_ops region_fw_health_ops = {
    -+	.name = "fw-health",
    ++	.name = region_fw_health_str,
     +};
      
      /* Set to true in case cr enable bit was set to true before crdump */
      static bool crdump_enbale_bit_set;
    -@@ drivers/net/ethernet/mellanox/mlx4/crdump.c: static void mlx4_crdump_collect_crspace(struct mlx4_dev *dev,
    - 		if (err) {
    - 			kvfree(crspace_data);
    - 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
    --				  region_cr_space_str, id, err);
    -+				  region_cr_space_ops.name, id, err);
    - 		} else {
    - 			mlx4_info(dev, "crdump: added snapshot %d to devlink region %s\n",
    --				  id, region_cr_space_str);
    -+				  id, region_cr_space_ops.name);
    - 		}
    - 	} else {
    - 		mlx4_err(dev, "crdump: Failed to allocate crspace buffer\n");
    -@@ drivers/net/ethernet/mellanox/mlx4/crdump.c: static void mlx4_crdump_collect_fw_health(struct mlx4_dev *dev,
    - 		if (err) {
    - 			kvfree(health_data);
    - 			mlx4_warn(dev, "crdump: devlink create %s snapshot id %d err %d\n",
    --				  region_fw_health_str, id, err);
    -+				  region_fw_health_ops.name, id, err);
    - 		} else {
    - 			mlx4_info(dev, "crdump: added snapshot %d to devlink region %s\n",
    --				  id, region_fw_health_str);
    -+				  id, region_fw_health_ops.name);
    - 		}
    - 	} else {
    - 		mlx4_err(dev, "crdump: Failed to allocate health buffer\n");
     @@ drivers/net/ethernet/mellanox/mlx4/crdump.c: int mlx4_crdump_init(struct mlx4_dev *dev)
      	/* Create cr-space region */
      	crdump->region_crspace =
    @@ drivers/net/ethernet/mellanox/mlx4/crdump.c: int mlx4_crdump_init(struct mlx4_de
      				      MAX_NUM_OF_DUMPS_TO_STORE,
      				      pci_resource_len(pdev, 0));
      	if (IS_ERR(crdump->region_crspace))
    - 		mlx4_warn(dev, "crdump: create devlink region %s err %ld\n",
    --			  region_cr_space_str,
    -+			  region_cr_space_ops.name,
    - 			  PTR_ERR(crdump->region_crspace));
    - 
    +@@ drivers/net/ethernet/mellanox/mlx4/crdump.c: int mlx4_crdump_init(struct mlx4_dev *dev)
      	/* Create fw-health region */
      	crdump->region_fw_health =
      		devlink_region_create(devlink,
    @@ drivers/net/ethernet/mellanox/mlx4/crdump.c: int mlx4_crdump_init(struct mlx4_de
      				      MAX_NUM_OF_DUMPS_TO_STORE,
      				      HEALTH_BUFFER_SIZE);
      	if (IS_ERR(crdump->region_fw_health))
    - 		mlx4_warn(dev, "crdump: create devlink region %s err %ld\n",
    --			  region_fw_health_str,
    -+			  region_fw_health_ops.name,
    - 			  PTR_ERR(crdump->region_fw_health));
    - 
    - 	return 0;
     
      ## drivers/net/netdevsim/dev.c ##
     @@ drivers/net/netdevsim/dev.c: static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
    @@ include/net/devlink.h: void devlink_port_param_value_changed(struct devlink_port
     +struct devlink_region *
     +devlink_region_create(struct devlink *devlink,
     +		      const struct devlink_region_ops *ops,
    -+		      u32 region_max_snapshots,
    -+		      u64 region_size);
    ++		      u32 region_max_snapshots, u64 region_size);
      void devlink_region_destroy(struct devlink_region *region);
      u32 devlink_region_snapshot_id_get(struct devlink *devlink);
      int devlink_region_snapshot_create(struct devlink_region *region,
    @@ net/core/devlink.c: EXPORT_SYMBOL_GPL(devlink_param_value_str_fill);
     +struct devlink_region *
     +devlink_region_create(struct devlink *devlink,
     +		      const struct devlink_region_ops *ops,
    -+		      u32 region_max_snapshots,
    -+		      u64 region_size)
    ++		      u32 region_max_snapshots, u64 region_size)
      {
      	struct devlink_region *region;
      	int err = 0;
 -:  ------------ >  9:  9032cc32d7b0 devlink: convert snapshot destructor callback to region op
 -:  ------------ > 10:  0733d5acd4eb devlink: trivial: fix tab in function documentation
 2:  5a532f335927 ! 11:  db000f11c121 devlink: add functions to take snapshot while locked
    @@ Commit message
         devlink_region_snapshot_id_get and devlink_region_snapshot_create
         functions while already holding the devlink instance lock.
     
    -    Extract the logic of these two functions into static functions with the
    -    _locked postfix. Modify the original functions to be implemented in
    -    terms of the new locked functions.
    +    Extract the logic of these two functions into static functions prefixed
    +    by `__` to indicate they are internal helper functions. Modify the
    +    original functions to be implemented in terms of the new locked
    +    functions.
     
         Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
     
    @@ net/core/devlink.c: static void devlink_nl_region_notify(struct devlink_region *
      }
      
     +/**
    -+ *	devlink_region_snapshot_id_get_locked - get snapshot ID
    ++ *	__devlink_region_snapshot_id_get - get snapshot ID
    ++ *	@devlink: devlink instance
     + *
     + *	Returns a new snapshot id. Must be called while holding the
     + *	devlink instance lock.
     + */
    -+static u32 devlink_region_snapshot_id_get_locked(struct devlink *devlink)
    ++static u32 __devlink_region_snapshot_id_get(struct devlink *devlink)
     +{
    ++	lockdep_assert_held(&devlink->lock);
     +	return ++devlink->snapshot_id;
     +}
     +
     +/**
    -+ *	devlink_region_snapshot_create_locked - create a new snapshot
    ++ *	__devlink_region_snapshot_create - create a new snapshot
     + *	This will add a new snapshot of a region. The snapshot
     + *	will be stored on the region struct and can be accessed
    -+ *	from devlink. This is useful for future	analyses of snapshots.
    ++ *	from devlink. This is useful for future analyses of snapshots.
     + *	Multiple snapshots can be created on a region.
     + *	The @snapshot_id should be obtained using the getter function.
     + *
    @@ net/core/devlink.c: static void devlink_nl_region_notify(struct devlink_region *
     + *	@region: devlink region of the snapshot
     + *	@data: snapshot data
     + *	@snapshot_id: snapshot id to be created
    -+ *	@destructor: pointer to destructor function to free data
     + */
     +static int
    -+devlink_region_snapshot_create_locked(struct devlink_region *region,
    -+				      u8 *data, u32 snapshot_id,
    -+				      devlink_snapshot_data_dest_t *destructor)
    ++__devlink_region_snapshot_create(struct devlink_region *region,
    ++				 u8 *data, u32 snapshot_id)
     +{
    ++	struct devlink *devlink = region->devlink;
     +	struct devlink_snapshot *snapshot;
     +
    ++	lockdep_assert_held(&devlink->lock);
    ++
     +	/* check if region can hold one more snapshot */
     +	if (region->cur_snapshots == region->max_snapshots)
     +		return -ENOMEM;
    @@ net/core/devlink.c: static void devlink_nl_region_notify(struct devlink_region *
     +	snapshot->id = snapshot_id;
     +	snapshot->region = region;
     +	snapshot->data = data;
    -+	snapshot->data_destructor = destructor;
     +
     +	list_add_tail(&snapshot->list, &region->snapshot_list);
     +
    @@ net/core/devlink.c: u32 devlink_region_snapshot_id_get(struct devlink *devlink)
      
      	mutex_lock(&devlink->lock);
     -	id = ++devlink->snapshot_id;
    -+	id = devlink_region_snapshot_id_get_locked(devlink);
    ++	id = __devlink_region_snapshot_id_get(devlink);
      	mutex_unlock(&devlink->lock);
      
      	return id;
    -@@ net/core/devlink.c: EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get);
    -  *	devlink_region_snapshot_create - create a new snapshot
    -  *	This will add a new snapshot of a region. The snapshot
    -  *	will be stored on the region struct and can be accessed
    -- *	from devlink. This is useful for future	analyses of snapshots.
    -+ *	from devlink. This is useful for future analyses of snapshots.
    -  *	Multiple snapshots can be created on a region.
    -  *	The @snapshot_id should be obtained using the getter function.
    -  *
     @@ net/core/devlink.c: int devlink_region_snapshot_create(struct devlink_region *region,
    - 				   devlink_snapshot_data_dest_t *data_destructor)
    + 				   u8 *data, u32 snapshot_id)
      {
      	struct devlink *devlink = region->devlink;
     -	struct devlink_snapshot *snapshot;
    @@ net/core/devlink.c: int devlink_region_snapshot_create(struct devlink_region *re
     -	snapshot->id = snapshot_id;
     -	snapshot->region = region;
     -	snapshot->data = data;
    --	snapshot->data_destructor = data_destructor;
     -
     -	list_add_tail(&snapshot->list, &region->snapshot_list);
     -
     -	region->cur_snapshots++;
     -
     -	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW);
    -+	err = devlink_region_snapshot_create_locked(region, data, snapshot_id,
    -+						    data_destructor);
    ++	err = __devlink_region_snapshot_create(region, data, snapshot_id);
      	mutex_unlock(&devlink->lock);
     -	return 0;
      
 3:  806a97ae3de9 <  -:  ------------ devlink: add operation to take an immediate snapshot
 4:  b4276446fdcf <  -:  ------------ netdevsim: support taking immediate snapshot via devlink
 8:  f3141a755fb5 <  -:  ------------ devlink: add devres managed devlinkm_alloc and devlinkm_free
10:  d1284fd5b0ee <  -:  ------------ ice: add basic handler for devlink .info_get
12:  30a621018ac2 <  -:  ------------ ice: add a devlink region to dump shadow RAM contents
13:  cb1b4d27d9af <  -:  ------------ devlink: support directly reading from region memory
14:  feae26ff3541 <  -:  ------------ ice: support direct read of the shadow ram region
15:  1e7c2cd5fb66 <  -:  ------------ ice: add ice.rst devlink documentation file
 -:  ------------ > 12:  192d7644d59f devlink: convert snapshot id getter to return an error
 -:  ------------ > 13:  37b91ca05e63 devlink: track snapshot ids using an IDR and refcounts
 -:  ------------ > 14:  cf6472e590b0 devlink: implement DEVLINK_CMD_REGION_NEW
 -:  ------------ > 15:  3371776f00a3 netdevsim: support taking immediate snapshot via devlink
 -:  ------------ > 16:  3017e55058e1 devlink: simplify arguments for read_snapshot_fill
 -:  ------------ > 17:  d1ef960f156c devlink: use min_t to calculate data_size
 -:  ------------ > 18:  06c791e0df4d devlink: report extended error message in region_read_dumpit
 -:  ------------ > 19:  5aa4cee09a1f devlink: remove unnecessary parameter from chunk_fill function
 -:  ------------ > 20:  2eb06cab901b devlink: refactor region_read_snapshot_fill to use a callback function
 -:  ------------ > 21:  854875dd7872 devlink: support directly reading from region memory
 -:  ------------ > 22:  0ebf8548ddb2 ice: add a devlink region to dump shadow RAM contents

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH v2 04/22] ice: enable initial devlink support
  2020-03-02 16:30   ` Jiri Pirko
@ 2020-03-02 19:29     ` Jacob Keller
  2020-03-03 13:47       ` Jiri Pirko
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 19:29 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/2/2020 8:30 AM, Jiri Pirko wrote:
> Sat, Feb 15, 2020 at 12:22:03AM CET, jacob.e.keller@intel.com wrote:
> 
> [...]
> 
>> +int ice_devlink_create_port(struct ice_pf *pf)
>> +{
>> +	struct devlink *devlink = priv_to_devlink(pf);
>> +	struct ice_vsi *vsi = ice_get_main_vsi(pf);
>> +	struct device *dev = ice_pf_to_dev(pf);
>> +	int err;
>> +
>> +	if (!vsi) {
>> +		dev_err(dev, "%s: unable to find main VSI\n", __func__);
>> +		return -EIO;
>> +	}
>> +
>> +	devlink_port_attrs_set(&pf->devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
>> +			       pf->hw.pf_id, false, 0, NULL, 0);
>> +	err = devlink_port_register(devlink, &pf->devlink_port, pf->hw.pf_id);
>> +	if (err) {
>> +		dev_err(dev, "devlink_port_register failed: %d\n", err);
>> +		return err;
>> +	}
> 
> You need to register_netdev here. Otherwise you'll get inconsistent udev
> naming.
> 

The netdev is registered in another portion of the code, and should
already be registered by the time we call ice_devlink_create_port. This
check is mostly here to prevent a NULL pointer if the VSI somehow
doesn't have a netdev associated with it.

> 
>> +	if (vsi->netdev)
>> +		devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
>> +
>> +	return 0;
>> +}
> 
> 
> [...]
> 


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-02 17:41   ` Jiri Pirko
@ 2020-03-02 19:38     ` Jacob Keller
  2020-03-03  9:30       ` Jiri Pirko
  2020-03-02 22:11     ` Jacob Keller
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 19:38 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

On 3/2/2020 9:41 AM, Jiri Pirko wrote:
> Sat, Feb 15, 2020 at 12:22:13AM CET, jacob.e.keller@intel.com wrote:
>> Implement support for the DEVLINK_CMD_REGION_NEW command for creating
>> snapshots. This new command parallels the existing
>> DEVLINK_CMD_REGION_DEL.
>>
>> In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
>> ".snapshot" operation must be implemented in the region's ops structure.
>>
>> The desired snapshot id may be provided. If the requested id is already
>> in use, an error will be reported. If no id is provided one will be
>> selected in the same way as a triggered snapshot.
>>
>> In either case, the reference count for that id will be incremented
>> in the snapshot IDR.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>> .../networking/devlink/devlink-region.rst     | 12 +++-
>> include/net/devlink.h                         |  6 ++
>> net/core/devlink.c                            | 72 +++++++++++++++++++
>> 3 files changed, 88 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>> index 1a7683e7acb2..a24faf2b6b7a 100644
>> --- a/Documentation/networking/devlink/devlink-region.rst
>> +++ b/Documentation/networking/devlink/devlink-region.rst
>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>> Regions may also be used to provide an additional way to debug complex error
>> states, but see also :doc:`devlink-health`
>>
>> +Regions may optionally support capturing a snapshot on demand via the
>> +``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
>> +requested snapshots must implement the ``.snapshot`` callback for the region
>> +in its ``devlink_region_ops`` structure.
>> +
>> example usage
>> -------------
>>
>> @@ -40,8 +45,11 @@ example usage
>>     # Delete a snapshot using:
>>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>
>> -    # Trigger (request) a snapshot be taken:
>> -    $ devlink region trigger pci/0000:00:05.0/cr-space
> 
> Odd. It is actually "devlink region dump". There is no trigger.
> 
> 
>> +    # Request an immediate snapshot, if supported by the region
>> +    $ devlink region new pci/0000:00:05.0/cr-space
> 
> Without ID? I would personally require snapshot id always. Without it,
> it looks like you are creating region.
> 

Not specifying an ID causes the ID to be auto-selected. I suppose
support for that doesn't need to be kept.

> 
>> +
>> +    # Request an immediate snapshot with a specific id
>> +    $ devlink region new pci/0000:00:05.0/cr-space snapshot 5
>>
>>     # Dump a snapshot:
>>     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index 3a5ff6bea143..3cd0ff2040b2 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -498,10 +498,16 @@ struct devlink_info_req;
>>  * struct devlink_region_ops - Region operations
>>  * @name: region name
>>  * @destructor: callback used to free snapshot memory when deleting
>> + * @snapshot: callback to request an immediate snapshot. On success,
>> + *            the data variable must be updated to point to the snapshot data.
>> + *            The function will be called while the devlink instance lock is
>> + *            held.
>>  */
>> struct devlink_region_ops {
>> 	const char *name;
>> 	void (*destructor)(const void *data);
>> +	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
>> +			u8 **data);
> 
> Please have the same type here and for destructor. "u8 *" I guess.
> 
Sure. My only concern would be if that causes a compiler warning when
passing kfree/vfree to the destructor pointer. Alternatively we could
use void **data, but it's definitely interpreted as a byte stream by the
devlink core code.

> 
>> };
>>
>> struct devlink_fmsg;
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 9571063846cc..b5d1b21e5178 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -4045,6 +4045,71 @@ static int devlink_nl_cmd_region_del(struct sk_buff *skb,
>> 	return 0;
>> }
>>
>> +static int
>> +devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +	struct devlink *devlink = info->user_ptr[0];
>> +	struct devlink_region *region;
>> +	const char *region_name;
>> +	u32 snapshot_id;
>> +	u8 *data;
>> +	int err;
>> +
>> +	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
>> +		return -EINVAL;
>> +	}
>> +
>> +	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
>> +	region = devlink_region_get_by_name(devlink, region_name);
>> +	if (!region) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
> 
> In devlink.c, please don't wrap here.
> 

For any of these?

> 
>> +				   "The requested region does not exist");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!region->ops->snapshot) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "The requested region does not support taking an immediate snapshot");
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	if (region->cur_snapshots == region->max_snapshots) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "The region has reached the maximum number of stored snapshots");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	if (info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
>> +		/* __devlink_region_snapshot_create will take care of
>> +		 * inserting the snapshot id into the IDR if necessary.
>> +		 */
>> +		snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
>> +
>> +		if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>> +			NL_SET_ERR_MSG_MOD(info->extack,
>> +					   "The requested snapshot id is already in use");
>> +			return -EEXIST;
>> +		}
>> +	} else {
>> +		snapshot_id = __devlink_region_snapshot_id_get(devlink);
>> +	}
>> +
>> +	err = region->ops->snapshot(devlink, info->extack, &data);
> 
> Don't you put the "id"? Looks like a leak.
> 

The id is put into the devlink_region_snapshot_create, the driver code
doesn't need to know about it as far as I can tell.

Currently the ids are managed by an IDR which stores a reference count
of how many snapshots use it.

Use of "NULL" is done so that devlink_region_snapshot_id_get can
"pre-allocate" the ID without assigning snapshots, assuming that a later
call to the devlink_region_snapshot_create will find that id and create
or increment its refcount.

This complexity comes from the fact that the current code requires the
ability to re-use the same snapshot id for different regions in the same
devlink. Thus devlink_region_snapshot_id_get must return IDs which are
unique across all regions. If a user does DEVLINK_CMD_REGION_NEW with an
ID, it would only be used by a single snapshot. We need to make sure
that this doesn't confuse devlink_region_snapshot_id_get. Additionally,
I wanted to make sure that the snapshot IDs could be re-used once the
related snapshots have been deleted.
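
To make the scheme above a bit more concrete, here is a rough userspace
sketch of the id bookkeeping. It is not the devlink code: a fixed array
stands in for the IDR, and every name (sketch_id_table, sketch_id_get,
sketch_id_ref, sketch_id_unref, SKETCH_MAX_IDS) is hypothetical. The
point is only that an id can be reserved with zero users, shared by
several snapshots, and handed out again once the last user is gone.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_IDS 16 /* hypothetical limit, for illustration only */

/* One slot per snapshot id. "reserved" models an IDR entry that exists
 * but holds NULL; "refs" counts the snapshots currently using the id.
 */
struct sketch_id_table {
	bool reserved[SKETCH_MAX_IDS];
	unsigned int refs[SKETCH_MAX_IDS];
};

/* Reserve the lowest free id without attaching a snapshot to it.
 * Returns the id, or -1 if the table is full.
 */
static int sketch_id_get(struct sketch_id_table *t)
{
	for (int id = 0; id < SKETCH_MAX_IDS; id++) {
		if (!t->reserved[id]) {
			t->reserved[id] = true;
			t->refs[id] = 0;
			return id;
		}
	}
	return -1;
}

/* Attach one snapshot to an id, reserving the id first if needed. */
static int sketch_id_ref(struct sketch_id_table *t, int id)
{
	if (id < 0 || id >= SKETCH_MAX_IDS)
		return -1;
	t->reserved[id] = true;
	t->refs[id]++;
	return 0;
}

/* Detach one snapshot; release the id once nothing uses it. */
static void sketch_id_unref(struct sketch_id_table *t, int id)
{
	if (id < 0 || id >= SKETCH_MAX_IDS)
		return;
	if (!t->reserved[id] || t->refs[id] == 0)
		return;
	if (--t->refs[id] == 0)
		t->reserved[id] = false;
}
```

In this model a triggered snapshot that spans two regions would call
sketch_id_get once and sketch_id_ref twice, and the id only becomes
available again after both snapshots have been deleted.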

> 
>> +	if (err)
>> +		return err;
>> +
>> +	err = __devlink_region_snapshot_create(region, data, snapshot_id);
>> +	if (err)
>> +		goto err_free_snapshot_data;
>> +
>> +	return 0;
>> +
>> +err_free_snapshot_data:
>> +	region->ops->destructor(data);
>> +	return err;
>> +}
>> +
>> static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
>> 						 struct devlink *devlink,
>> 						 u8 *chunk, u32 chunk_size,
>> @@ -6358,6 +6423,13 @@ static const struct genl_ops devlink_nl_ops[] = {
>> 		.flags = GENL_ADMIN_PERM,
>> 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>> 	},
>> +	{
>> +		.cmd = DEVLINK_CMD_REGION_NEW,
>> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> +		.doit = devlink_nl_cmd_region_new,
>> +		.flags = GENL_ADMIN_PERM,
>> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>> +	},
>> 	{
>> 		.cmd = DEVLINK_CMD_REGION_DEL,
>> 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> -- 
>> 2.25.0.368.g28a2d05eebfb
>>


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-02 17:41   ` Jiri Pirko
  2020-03-02 19:38     ` Jacob Keller
@ 2020-03-02 22:11     ` Jacob Keller
  2020-03-02 22:14     ` Jacob Keller
  2020-03-02 22:35     ` Jacob Keller
  3 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 22:11 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/2/2020 9:41 AM, Jiri Pirko wrote:
> Sat, Feb 15, 2020 at 12:22:13AM CET, jacob.e.keller@intel.com wrote:
>> +++ b/include/net/devlink.h
>> @@ -498,10 +498,16 @@ struct devlink_info_req;
>>  * struct devlink_region_ops - Region operations
>>  * @name: region name
>>  * @destructor: callback used to free snapshot memory when deleting
>> + * @snapshot: callback to request an immediate snapshot. On success,
>> + *            the data variable must be updated to point to the snapshot data.
>> + *            The function will be called while the devlink instance lock is
>> + *            held.
>>  */
>> struct devlink_region_ops {
>> 	const char *name;
>> 	void (*destructor)(const void *data);
>> +	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
>> +			u8 **data);
> 
> Please have the same type here and for destructor. "u8 *" I guess.
> 
So, changing the destructor to take a const u8 * is problematic, because
it can then no longer directly take kfree, vfree, or kvfree.

I'd be happy to change the snapshot function so that it takes a void **
though.

Thanks,
Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-02 17:41   ` Jiri Pirko
  2020-03-02 19:38     ` Jacob Keller
  2020-03-02 22:11     ` Jacob Keller
@ 2020-03-02 22:14     ` Jacob Keller
  2020-03-02 22:35     ` Jacob Keller
  3 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 22:14 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/2/2020 9:41 AM, Jiri Pirko wrote:
> Sat, Feb 15, 2020 at 12:22:13AM CET, jacob.e.keller@intel.com wrote:
>> Implement support for the DEVLINK_CMD_REGION_NEW command for creating
>> snapshots. This new command parallels the existing
>> DEVLINK_CMD_REGION_DEL.
>>
>> In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
>> ".snapshot" operation must be implemented in the region's ops structure.
>>
>> The desired snapshot id may be provided. If the requested id is already
>> in use, an error will be reported. If no id is provided one will be
>> selected in the same way as a triggered snapshot.
>>
>> In either case, the reference count for that id will be incremented
>> in the snapshot IDR.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>> .../networking/devlink/devlink-region.rst     | 12 +++-
>> include/net/devlink.h                         |  6 ++
>> net/core/devlink.c                            | 72 +++++++++++++++++++
>> 3 files changed, 88 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>> index 1a7683e7acb2..a24faf2b6b7a 100644
>> --- a/Documentation/networking/devlink/devlink-region.rst
>> +++ b/Documentation/networking/devlink/devlink-region.rst
>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>> Regions may also be used to provide an additional way to debug complex error
>> states, but see also :doc:`devlink-health`
>>
>> +Regions may optionally support capturing a snapshot on demand via the
>> +``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
>> +requested snapshots must implement the ``.snapshot`` callback for the region
>> +in its ``devlink_region_ops`` structure.
>> +
>> example usage
>> -------------
>>
>> @@ -40,8 +45,11 @@ example usage
>>     # Delete a snapshot using:
>>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>
>> -    # Trigger (request) a snapshot be taken:
>> -    $ devlink region trigger pci/0000:00:05.0/cr-space
> 
> Odd. It is actually "devlink region dump". There is no trigger.
> 

This appears to have happened as I was working on the original "trigger"
patches at the same time as the documentation refactor, and things must
have gotten squashed in.

I can send a separate patch to remove it with a clearer explanation.

Thanks,
Jake


* Re: [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked
  2020-03-02 17:43   ` Jiri Pirko
@ 2020-03-02 22:25     ` Jacob Keller
  2020-03-03  8:41       ` Jiri Pirko
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 22:25 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/2/2020 9:43 AM, Jiri Pirko wrote:
> Sat, Feb 15, 2020 at 12:22:10AM CET, jacob.e.keller@intel.com wrote:
>> A future change is going to add a new devlink command to request
>> a snapshot on demand. This function will want to call the
>> devlink_region_snapshot_id_get and devlink_region_snapshot_create
>> functions while already holding the devlink instance lock.
>>
>> Extract the logic of these two functions into static functions prefixed
>> by `__` to indicate they are internal helper functions. Modify the
>> original functions to be implemented in terms of the new locked
>> functions.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> 
> 
>> ---
>> net/core/devlink.c | 93 ++++++++++++++++++++++++++++++----------------
>> 1 file changed, 61 insertions(+), 32 deletions(-)
>>
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index fef93f48028c..0e94887713f4 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -3760,6 +3760,65 @@ static void devlink_nl_region_notify(struct devlink_region *region,
>> 	nlmsg_free(msg);
>> }
>>
>> +/**
>> + *	__devlink_region_snapshot_id_get - get snapshot ID
>> + *	@devlink: devlink instance
>> + *
>> + *	Returns a new snapshot id. Must be called while holding the
>> + *	devlink instance lock.
>> + */
> 
> You don't need this docu comment for static functions.
> 
> 

I like having these for all functions. I'll remove it if you feel
strongly, though.

Thanks,
Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-02 17:41   ` Jiri Pirko
                       ` (2 preceding siblings ...)
  2020-03-02 22:14     ` Jacob Keller
@ 2020-03-02 22:35     ` Jacob Keller
  2020-03-03  9:31       ` Jiri Pirko
  3 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-02 22:35 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/2/2020 9:41 AM, Jiri Pirko wrote:
>> struct devlink_region_ops {
>> 	const char *name;
>> 	void (*destructor)(const void *data);
>> +	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
>> +			u8 **data);
> 
> Please have the same type here and for destructor. "u8 *" I guess.
> 

So... if I use void **data, this ends up looking a little weird because
core code has to cast to (void **)...

I agree it looks a bit odd to use u8 ** for snapshot and void * for the
destructor.

I really do not want to change destructor to u8 *, because that makes
callers have to write a wrapper function if their destructor is simply
kvfree.
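
As a rough illustration of the wrapper concern, here is a userspace
sketch. sketch_kvfree is a stand-in that only mirrors the
"void (*)(const void *)" signature of kvfree, and the sketch_ops_*
structures, sketch_kvfree_u8, and the sketch_free_calls counter are
hypothetical names used for this example only.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t u8;

static int sketch_free_calls; /* counts destructor invocations */

/* Userspace stand-in with the same signature shape as kvfree(). */
static void sketch_kvfree(const void *addr)
{
	sketch_free_calls++;
	free((void *)addr);
}

/* Destructor declared as in the patch: takes const void *. */
struct sketch_ops_void {
	void (*destructor)(const void *data);
};

/* Hypothetical alternative: destructor takes const u8 *. */
struct sketch_ops_u8 {
	void (*destructor)(const u8 *data);
};

/* With const void *, the free routine can be assigned directly. */
static const struct sketch_ops_void ops_void = {
	.destructor = sketch_kvfree,
};

/* With const u8 *, the function pointer types are incompatible, so
 * each driver would need a small wrapper like this one.
 */
static void sketch_kvfree_u8(const u8 *data)
{
	sketch_kvfree(data);
}

static const struct sketch_ops_u8 ops_u8 = {
	.destructor = sketch_kvfree_u8,
};
```

Assigning sketch_kvfree directly to the const u8 * variant would draw an
incompatible pointer type diagnostic, which is why the wrapper exists.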

I'm ok with the cast to (void **) but it does seem a bit ugly.

Thoughts on which approach to take, or to leave this as is?

Thanks,
Jake


* Re: [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked
  2020-03-02 22:25     ` Jacob Keller
@ 2020-03-03  8:41       ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-03  8:41 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Mon, Mar 02, 2020 at 11:25:16PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 3/2/2020 9:43 AM, Jiri Pirko wrote:
>> Sat, Feb 15, 2020 at 12:22:10AM CET, jacob.e.keller@intel.com wrote:
>>> A future change is going to add a new devlink command to request
>>> a snapshot on demand. This function will want to call the
>>> devlink_region_snapshot_id_get and devlink_region_snapshot_create
>>> functions while already holding the devlink instance lock.
>>>
>>> Extract the logic of these two functions into static functions prefixed
>>> by `__` to indicate they are internal helper functions. Modify the
>>> original functions to be implemented in terms of the new locked
>>> functions.
>>>
>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> 
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> 
>> 
>>> ---
>>> net/core/devlink.c | 93 ++++++++++++++++++++++++++++++----------------
>>> 1 file changed, 61 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>>> index fef93f48028c..0e94887713f4 100644
>>> --- a/net/core/devlink.c
>>> +++ b/net/core/devlink.c
>>> @@ -3760,6 +3760,65 @@ static void devlink_nl_region_notify(struct devlink_region *region,
>>> 	nlmsg_free(msg);
>>> }
>>>
>>> +/**
>>> + *	__devlink_region_snapshot_id_get - get snapshot ID
>>> + *	@devlink: devlink instance
>>> + *
>>> + *	Returns a new snapshot id. Must be called while holding the
>>> + *	devlink instance lock.
>>> + */
>> 
>> You don't need this docu comment for static functions.
>> 
>> 
>
>I like having these for all functions. I'll remove it if you feel
>strongly, though.

Nope. I just wanted to note you don't have to do it.


>
>Thanks,
>Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-02 19:38     ` Jacob Keller
@ 2020-03-03  9:30       ` Jiri Pirko
  2020-03-03 17:51         ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-03  9:30 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Mon, Mar 02, 2020 at 08:38:12PM CET, jacob.e.keller@intel.com wrote:
>On 3/2/2020 9:41 AM, Jiri Pirko wrote:
>> Sat, Feb 15, 2020 at 12:22:13AM CET, jacob.e.keller@intel.com wrote:
>>> Implement support for the DEVLINK_CMD_REGION_NEW command for creating
>>> snapshots. This new command parallels the existing
>>> DEVLINK_CMD_REGION_DEL.
>>>
>>> In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
>>> ".snapshot" operation must be implemented in the region's ops structure.
>>>
>>> The desired snapshot id may be provided. If the requested id is already
>>> in use, an error will be reported. If no id is provided one will be
>>> selected in the same way as a triggered snapshot.
>>>
>>> In either case, the reference count for that id will be incremented
>>> in the snapshot IDR.
>>>
>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>>> ---
>>> .../networking/devlink/devlink-region.rst     | 12 +++-
>>> include/net/devlink.h                         |  6 ++
>>> net/core/devlink.c                            | 72 +++++++++++++++++++
>>> 3 files changed, 88 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
>>> index 1a7683e7acb2..a24faf2b6b7a 100644
>>> --- a/Documentation/networking/devlink/devlink-region.rst
>>> +++ b/Documentation/networking/devlink/devlink-region.rst
>>> @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
>>> Regions may also be used to provide an additional way to debug complex error
>>> states, but see also :doc:`devlink-health`
>>>
>>> +Regions may optionally support capturing a snapshot on demand via the
>>> +``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
>>> +requested snapshots must implement the ``.snapshot`` callback for the region
>>> +in its ``devlink_region_ops`` structure.
>>> +
>>> example usage
>>> -------------
>>>
>>> @@ -40,8 +45,11 @@ example usage
>>>     # Delete a snapshot using:
>>>     $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
>>>
>>> -    # Trigger (request) a snapshot be taken:
>>> -    $ devlink region trigger pci/0000:00:05.0/cr-space
>> 
>> Odd. It is actually "devlink region dump". There is no trigger.
>> 
>> 
>>> +    # Request an immediate snapshot, if supported by the region
>>> +    $ devlink region new pci/0000:00:05.0/cr-space
>> 
>> Without ID? I would personally require snapshot id always. Without it,
>> it looks like you are creating region.
>> 
>
>Not specifying an ID causes the ID to be auto-selected. I suppose
>support for that doesn't need to be kept.

Yeah, I would avoid it.


>
>> 
>>> +
>>> +    # Request an immediate snapshot with a specific id
>>> +    $ devlink region new pci/0000:00:05.0/cr-space snapshot 5
>>>
>>>     # Dump a snapshot:
>>>     $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
>>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>>> index 3a5ff6bea143..3cd0ff2040b2 100644
>>> --- a/include/net/devlink.h
>>> +++ b/include/net/devlink.h
>>> @@ -498,10 +498,16 @@ struct devlink_info_req;
>>>  * struct devlink_region_ops - Region operations
>>>  * @name: region name
>>>  * @destructor: callback used to free snapshot memory when deleting
>>> + * @snapshot: callback to request an immediate snapshot. On success,
>>> + *            the data variable must be updated to point to the snapshot data.
>>> + *            The function will be called while the devlink instance lock is
>>> + *            held.
>>>  */
>>> struct devlink_region_ops {
>>> 	const char *name;
>>> 	void (*destructor)(const void *data);
>>> +	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
>>> +			u8 **data);
>> 
>> Please have the same type here and for destructor. "u8 *" I guess.
>> 
>Sure. My only concern would be if that causes a compiler warning when
>passing kfree/vfree to the destructor pointer. Alternatively we could
>use void **data, but it's definitely interpreted as a byte stream by the
>devlink core code.

I see. Leave it as is then.


>
>> 
>>> };
>>>
>>> struct devlink_fmsg;
>>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>>> index 9571063846cc..b5d1b21e5178 100644
>>> --- a/net/core/devlink.c
>>> +++ b/net/core/devlink.c
>>> @@ -4045,6 +4045,71 @@ static int devlink_nl_cmd_region_del(struct sk_buff *skb,
>>> 	return 0;
>>> }
>>>
>>> +static int
>>> +devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info)
>>> +{
>>> +	struct devlink *devlink = info->user_ptr[0];
>>> +	struct devlink_region *region;
>>> +	const char *region_name;
>>> +	u32 snapshot_id;
>>> +	u8 *data;
>>> +	int err;
>>> +
>>> +	if (!info->attrs[DEVLINK_ATTR_REGION_NAME]) {
>>> +		NL_SET_ERR_MSG_MOD(info->extack, "No region name provided");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
>>> +	region = devlink_region_get_by_name(devlink, region_name);
>>> +	if (!region) {
>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> 
>> In devlink.c, please don't wrap here.
>> 
>
>For any of these?

Yep.


>
>> 
>>> +				   "The requested region does not exist");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (!region->ops->snapshot) {
>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>> +				   "The requested region does not support taking an immediate snapshot");
>>> +		return -EOPNOTSUPP;
>>> +	}
>>> +
>>> +	if (region->cur_snapshots == region->max_snapshots) {
>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>> +				   "The region has reached the maximum number of stored snapshots");
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	if (info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
>>> +		/* __devlink_region_snapshot_create will take care of
>>> +		 * inserting the snapshot id into the IDR if necessary.
>>> +		 */
>>> +		snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
>>> +
>>> +		if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>>> +			NL_SET_ERR_MSG_MOD(info->extack,
>>> +					   "The requested snapshot id is already in use");
>>> +			return -EEXIST;
>>> +		}
>>> +	} else {
>>> +		snapshot_id = __devlink_region_snapshot_id_get(devlink);
>>> +	}
>>> +
>>> +	err = region->ops->snapshot(devlink, info->extack, &data);
>> 
>> Don't you put the "id"? Looks like a leak.
>> 
>
>The id is put into the devlink_region_snapshot_create, the driver code
>doesn't need to know about it as far as I can tell.
>
>Currently the ids are managed by an IDR which stores a reference count
>of how many snapshots use it.
>
>Use of "NULL" is done so that devlink_region_snapshot_id_get can
>"pre-allocate" the ID without assigning snapshots, assuming that a later
>call to the devlink_region_snapshot_create will find that id and create
>or increment its refcount.
>
>This complexity comes from the fact that the current code requires the
>ability to re-use the same snapshot id for different regions in the same
>devlink. This devlink_region_snapshot_id_get must return IDs which are
>unique across all regions. If a user does DEVLINK_CMD_REGION_NEW with an
>ID, it would only be used by a single snapshot. We need to make sure
>that this doesn't confuse devlink_region_snapshot_id_get. Additionally,
>I wanted to make sure that the snapshot IDs could be re-used once the
>related snapshots have been deleted.

Okay, I see. I'm just worried about a possible scenario where the user
allocates IDs up to the u32 max and then always hits the error path.


>
>> 
>>> +	if (err)
>>> +		return err;
>>> +
>>> +	err = __devlink_region_snapshot_create(region, data, snapshot_id);
>>> +	if (err)
>>> +		goto err_free_snapshot_data;
>>> +
>>> +	return 0;
>>> +
>>> +err_free_snapshot_data:
>>> +	region->ops->destructor(data);
>>> +	return err;
>>> +}
>>> +
>>> static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
>>> 						 struct devlink *devlink,
>>> 						 u8 *chunk, u32 chunk_size,
>>> @@ -6358,6 +6423,13 @@ static const struct genl_ops devlink_nl_ops[] = {
>>> 		.flags = GENL_ADMIN_PERM,
>>> 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>>> 	},
>>> +	{
>>> +		.cmd = DEVLINK_CMD_REGION_NEW,
>>> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>>> +		.doit = devlink_nl_cmd_region_new,
>>> +		.flags = GENL_ADMIN_PERM,
>>> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>>> +	},
>>> 	{
>>> 		.cmd = DEVLINK_CMD_REGION_DEL,
>>> 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>>> -- 
>>> 2.25.0.368.g28a2d05eebfb
>>>


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-02 22:35     ` Jacob Keller
@ 2020-03-03  9:31       ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-03  9:31 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Mon, Mar 02, 2020 at 11:35:24PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 3/2/2020 9:41 AM, Jiri Pirko wrote:
>>> struct devlink_region_ops {
>>> 	const char *name;
>>> 	void (*destructor)(const void *data);
>>> +	int (*snapshot)(struct devlink *devlink, struct netlink_ext_ack *extack,
>>> +			u8 **data);
>> 
>> Please have the same type here and for destructor. "u8 *" I guess.
>> 
>
>So... if I use void **data, this ends up looking a little weird because
>core code has to cast to (void **)...
>
>I agree it looks a bit odd to use u8 ** for snapshot and void * for the
>destructor.
>
>I really do not want to change destructor to u8 *, because that makes
>callers have to write a wrapper function if their destructor is simply
>kvfree.
>
>I'm ok with the cast to (void **) but it does seem a bit ugly.
>
>Thoughts on which approach to take, or to leave this as is?

Yep

>
>Thanks,
>Jake


* Re: [RFC PATCH v2 04/22] ice: enable initial devlink support
  2020-03-02 19:29     ` Jacob Keller
@ 2020-03-03 13:47       ` Jiri Pirko
  2020-03-03 17:53         ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-03 13:47 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Mon, Mar 02, 2020 at 08:29:44PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 3/2/2020 8:30 AM, Jiri Pirko wrote:
>> Sat, Feb 15, 2020 at 12:22:03AM CET, jacob.e.keller@intel.com wrote:
>> 
>> [...]
>> 
>>> +int ice_devlink_create_port(struct ice_pf *pf)
>>> +{
>>> +	struct devlink *devlink = priv_to_devlink(pf);
>>> +	struct ice_vsi *vsi = ice_get_main_vsi(pf);
>>> +	struct device *dev = ice_pf_to_dev(pf);
>>> +	int err;
>>> +
>>> +	if (!vsi) {
>>> +		dev_err(dev, "%s: unable to find main VSI\n", __func__);
>>> +		return -EIO;
>>> +	}
>>> +
>>> +	devlink_port_attrs_set(&pf->devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
>>> +			       pf->hw.pf_id, false, 0, NULL, 0);
>>> +	err = devlink_port_register(devlink, &pf->devlink_port, pf->hw.pf_id);
>>> +	if (err) {
>>> +		dev_err(dev, "devlink_port_register failed: %d\n", err);
>>> +		return err;
>>> +	}
>> 
>> You need to register_netdev here. Otherwise you'll get inconsistent udev
>> naming.
>> 
>
>The netdev is registered in another portion of the code, and should
>already be registered by the time we call ice_devlink_create_port. This
>check is mostly here to prevent a NULL pointer if the VSI somehow
>doesn't have a netdev associated with it.

My point is, the correct order is:
devlink_register()
devlink_port_attrs_set()
devlink_port_register()
register_netdev()
devlink_port_type_eth_set()


>
>> 
>>> +	if (vsi->netdev)
>>> +		devlink_port_type_eth_set(&pf->devlink_port, vsi->netdev);
>>> +
>>> +	return 0;
>>> +}
>> 
>> 
>> [...]
>> 


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-03  9:30       ` Jiri Pirko
@ 2020-03-03 17:51         ` Jacob Keller
  2020-03-04 11:58           ` Jiri Pirko
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-03 17:51 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/3/2020 1:30 AM, Jiri Pirko wrote:
> Mon, Mar 02, 2020 at 08:38:12PM CET, jacob.e.keller@intel.com wrote:
>> On 3/2/2020 9:41 AM, Jiri Pirko wrote:
>>> Without ID? I would personally require snapshot id always. Without it,
>>> it looks like you are creating region.
>>>
>>
>> Not specifying an ID causes the ID to be auto-selected. I suppose
>> support for that doesn't need to be kept.
> 
> Yeah, I would avoid it.
> 
> 

Done.

>>> Please have the same type here and for destructor. "u8 *" I guess.
>>>
>> Sure. My only concern would be if that causes a compiler warning when
>> passing kfree/vfree to the destructor pointer. Alternatively we could
>> use void **data, but it's definitely interpreted as a byte stream by the
>> devlink core code.
> 
> I see. Leave it as is then.
> 

Ok.


>>> In devlink.c, please don't wrap here.
>>>
>>
>> For any of these?
> 
> Yep.
> 

Done.

> 
>>
>>>
>>>> +				   "The requested region does not exist");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (!region->ops->snapshot) {
>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>> +				   "The requested region does not support taking an immediate snapshot");
>>>> +		return -EOPNOTSUPP;
>>>> +	}
>>>> +
>>>> +	if (region->cur_snapshots == region->max_snapshots) {
>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>> +				   "The region has reached the maximum number of stored snapshots");
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +
>>>> +	if (info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
>>>> +		/* __devlink_region_snapshot_create will take care of
>>>> +		 * inserting the snapshot id into the IDR if necessary.
>>>> +		 */
>>>> +		snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
>>>> +
>>>> +		if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>>>> +			NL_SET_ERR_MSG_MOD(info->extack,
>>>> +					   "The requested snapshot id is already in use");
>>>> +			return -EEXIST;
>>>> +		}
>>>> +	} else {
>>>> +		snapshot_id = __devlink_region_snapshot_id_get(devlink);
>>>> +	}
>>>> +
>>>> +	err = region->ops->snapshot(devlink, info->extack, &data);
>>>
>>> Don't you put the "id"? Looks like a leak.
>>>
>>
>> The id is put into the devlink_region_snapshot_create, the driver code
>> doesn't need to know about it as far as I can tell.
>>
>> Currently the ids are managed by an IDR which stores a reference count
>> of how many snapshots use it.
>>
>> Use of "NULL" is done so that devlink_region_snapshot_id_get can
>> "pre-allocate" the ID without assigning snapshots, assuming that a later
>> call to the devlink_region_snapshot_create will find that id and create
>> or increment its refcount.
>>
>> This complexity comes from the fact that the current code requires the
>> ability to re-use the same snapshot id for different regions in the same
>> devlink. This devlink_region_snapshot_id_get must return IDs which are
>> unique across all regions. If a user does DEVLINK_CMD_REGION_NEW with an
>> ID, it would only be used by a single snapshot. We need to make sure
>> that this doesn't confuse devlink_region_snapshot_id_get. Additionally,
>> I wanted to make sure that the snapshot IDs could be re-used once the
>> related snapshots have been deleted.
> 
> Okay, I see. I'm just worried about a possible scenario where the user
> allocates IDs up to the u32 max and then always hits the error path.
> 

Hm. The flow here was about supporting both with and without snapshot
IDs. That will be gone in the next revision and should make the code clear.

The IDs are stored in the IDR with either a NULL, or a pointer to a
refcount of the number of snapshots currently using them.

On devlink_region_snapshot_create, the id must have been allocated by
the devlink_region_snapshot_id_get ahead of time by the driver.

When devlink_region_snapshot_id_get is called, a NULL is inserted into
the IDR at a suitable ID number (i.e. one that does not yet have a
refcount).

On devlink_region_snapshot_new, the callback for the new command, the ID
must be specified by userspace.

In both cases, the ID is confirmed to not be in use for that region by
looping over all snapshots and checking to see if one can be found that
has the ID.

In __devlink_region_snapshot_create, the IDR is checked to see if it is
already used. If so, the refcount is incremented. If there is no
refcount (i.e. the IDR returns NULL), a new refcount is created, set to
1, and inserted.

The basic idea is the refcount is "how many snapshots are actually using
this ID". Use of devlink_region_snapshot_id_get can "pre-allocate" an ID
value so that future calls to devlink_region_snapshot_id_get won't re-use
the same ID number even if no snapshot with that ID has yet been created.

The refcount isn't actually incremented until the snapshot is created
with that ID.

Userspace never uses devlink_region_snapshot_id_get now, since it always
requires an ID to be chosen.

On snapshot delete, the id refcount is reduced, and when it hits zero
the ID is released from the IDR. This way, IDs can be re-used as long as
no remaining snapshots on any region point to them.

This system enables userspace to simply treat snapshot ids as unique to
each region, and to provide their own values on the command line. It
also preserves the behavior that devlink_region_snapshot_id_get will
never select an ID that is used by any region on that devlink, so that
the id can be safely used for multiple snapshots triggered at the same time.

This will hopefully be more clear in the next revision.
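
For readers following the scheme described above, here is a minimal
userspace sketch in C. It is not the kernel code: a small fixed-size array
stands in for the IDR, a zero reference count stands in for the NULL
("reserved but unused") entry, and the names id_table, snapshot_id_get,
snapshot_id_ref and snapshot_id_unref are hypothetical, chosen only for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 16 /* tiny bound for illustration; the real ID space is u32 */

/* One slot per snapshot ID, standing in for an IDR entry (assumption). */
struct id_slot {
	bool present;      /* ID exists in the table (reserved or in use) */
	unsigned int refs; /* snapshots using the ID; 0 means "reserved only" */
};

struct id_table {
	struct id_slot slot[MAX_IDS];
};

/* Reserve the lowest ID not present in the table; -1 if the table is full. */
static int snapshot_id_get(struct id_table *t)
{
	for (int id = 0; id < MAX_IDS; id++) {
		if (!t->slot[id].present) {
			t->slot[id].present = true;
			t->slot[id].refs = 0;
			return id;
		}
	}
	return -1;
}

/* Account for a new snapshot using @id, inserting the ID if needed. */
static int snapshot_id_ref(struct id_table *t, uint32_t id)
{
	if (id >= MAX_IDS)
		return -1;
	t->slot[id].present = true;
	t->slot[id].refs++;
	return 0;
}

/* Drop one snapshot's use of @id; release the ID when none remain. */
static void snapshot_id_unref(struct id_table *t, uint32_t id)
{
	if (id >= MAX_IDS || !t->slot[id].present || t->slot[id].refs == 0)
		return;
	if (--t->slot[id].refs == 0)
		t->slot[id].present = false;
}
```

In this sketch an ID reserved by snapshot_id_get stays out of the pool even
before any snapshot uses it, and it returns to the pool only once the last
snapshot referencing it has been dropped. The per-region "is this ID
already used here" check is a separate step and is not modelled.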


* Re: [RFC PATCH v2 04/22] ice: enable initial devlink support
  2020-03-03 13:47       ` Jiri Pirko
@ 2020-03-03 17:53         ` Jacob Keller
  0 siblings, 0 replies; 59+ messages in thread
From: Jacob Keller @ 2020-03-03 17:53 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

On 3/3/2020 5:47 AM, Jiri Pirko wrote:
> Mon, Mar 02, 2020 at 08:29:44PM CET, jacob.e.keller@intel.com wrote:
>>
>>
>> On 3/2/2020 8:30 AM, Jiri Pirko wrote:
>>> Sat, Feb 15, 2020 at 12:22:03AM CET, jacob.e.keller@intel.com wrote:
>>>
>>> [...]
>>>
>>>> +int ice_devlink_create_port(struct ice_pf *pf)
>>>> +{
>>>> +	struct devlink *devlink = priv_to_devlink(pf);
>>>> +	struct ice_vsi *vsi = ice_get_main_vsi(pf);
>>>> +	struct device *dev = ice_pf_to_dev(pf);
>>>> +	int err;
>>>> +
>>>> +	if (!vsi) {
>>>> +		dev_err(dev, "%s: unable to find main VSI\n", __func__);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	devlink_port_attrs_set(&pf->devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
>>>> +			       pf->hw.pf_id, false, 0, NULL, 0);
>>>> +	err = devlink_port_register(devlink, &pf->devlink_port, pf->hw.pf_id);
>>>> +	if (err) {
>>>> +		dev_err(dev, "devlink_port_register failed: %d\n", err);
>>>> +		return err;
>>>> +	}
>>>
>>> You need to register_netdev here. Otherwise you'll get inconsistent udev
>>> naming.
>>>
>>
>> The netdev is registered in another portion of the code, and should
>> already be registered by the time we call ice_devlink_create_port. This
>> check is mostly here to prevent a NULL pointer if the VSI somehow
>> doesn't have a netdev associated with it.
> 
> My point is, the correct order is:
> devlink_register()
> devlink_port_attrs_set()
> devlink_port_register()
> register_netdev()
> devlink_port_type_eth_set()
> 

Oh. Hmm. Ok, I'll need to move this around. Will fix.

Thanks,
Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-03 17:51         ` Jacob Keller
@ 2020-03-04 11:58           ` Jiri Pirko
  2020-03-04 17:43             ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-04 11:58 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Tue, Mar 03, 2020 at 06:51:37PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 3/3/2020 1:30 AM, Jiri Pirko wrote:
>> Mon, Mar 02, 2020 at 08:38:12PM CET, jacob.e.keller@intel.com wrote:
>>> On 3/2/2020 9:41 AM, Jiri Pirko wrote:
>>>> Without ID? I would personally require snapshot id always. Without it,
>>>> it looks like you are creating region.
>>>>
>>>
>>> Not specifying an ID causes the ID to be auto-selected. I suppose
>>> support for that doesn't need to be kept.
>> 
>> Yeah, I would avoid it.
>> 
>> 
>
>Done.
>
>>>> Please have the same type here and for destructor. "u8 *" I guess.
>>>>
>>> Sure. My only concern would be if that causes a compiler warning when
>>> passing kfree/vfree to the destructor pointer. Alternatively we could
>>> use void **data, but it's definitely interpreted as a byte stream by the
>>> devlink core code.
>> 
>> I see. Leave it as is then.
>> 
>
>Ok.
>
>
>>>> In devlink.c, please don't wrap here.
>>>>
>>>
>>> For any of these?
>> 
>> Yep.
>> 
>
>Done.
>
>> 
>>>
>>>>
>>>>> +				   "The requested region does not exist");
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	if (!region->ops->snapshot) {
>>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>>> +				   "The requested region does not support taking an immediate snapshot");
>>>>> +		return -EOPNOTSUPP;
>>>>> +	}
>>>>> +
>>>>> +	if (region->cur_snapshots == region->max_snapshots) {
>>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>>> +				   "The region has reached the maximum number of stored snapshots");
>>>>> +		return -ENOMEM;
>>>>> +	}
>>>>> +
>>>>> +	if (info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
>>>>> +		/* __devlink_region_snapshot_create will take care of
>>>>> +		 * inserting the snapshot id into the IDR if necessary.
>>>>> +		 */
>>>>> +		snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
>>>>> +
>>>>> +		if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
>>>>> +			NL_SET_ERR_MSG_MOD(info->extack,
>>>>> +					   "The requested snapshot id is already in use");
>>>>> +			return -EEXIST;
>>>>> +		}
>>>>> +	} else {
>>>>> +		snapshot_id = __devlink_region_snapshot_id_get(devlink);
>>>>> +	}
>>>>> +
>>>>> +	err = region->ops->snapshot(devlink, info->extack, &data);
>>>>
>>>> Don't you put the "id"? Looks like a leak.
>>>>
>>>
>>> The id is put into the devlink_region_snapshot_create, the driver code
>>> doesn't need to know about it as far as I can tell.
>>>
>>> Currently the ids are managed by an IDR which stores a reference count
>>> of how many snapshots use it.
>>>
>>> Use of "NULL" is done so that devlink_region_snapshot_id_get can
>>> "pre-allocate" the ID without assigning snapshots, assuming that a later
>>> call to the devlink_region_snapshot_create will find that id and create
>>> or increment its refcount.
>>>
>>> This complexity comes from the fact that the current code requires the
>>> ability to re-use the same snapshot id for different regions in the same
>>> devlink. This devlink_region_snapshot_id_get must return IDs which are
>>> unique across all regions. If a user does DEVLINK_CMD_REGION_NEW with an
>>> ID, it would only be used by a single snapshot. We need to make sure
>>> that this doesn't confuse devlink_region_snapshot_id_get. Additionally,
>>> I wanted to make sure that the snapshot IDs could be re-used once the
>>> related snapshots have been deleted.
>> 
>> Okay, I see. I'm just worried about a possible scenario where the user
>> allocates IDs up to the u32 max and then always hits the error path.
>> 
>
>Hm. The flow here was about supporting both with and without snapshot
>IDs. That will be gone in the next revision and should make the code clear.
>
>The IDs are stored in the IDR with either a NULL, or a pointer to a
>refcount of the number of snapshots currently using them.
>
>On devlink_region_snapshot_create, the id must have been allocated by
>the devlink_region_snapshot_id_get ahead of time by the driver.
>
>When devlink_region_snapshot_id_get is called, a NULL is inserted into
>the IDR at a suitable ID number (i.e. one that does not yet have a
>refcount).
>
>On devlink_region_snapshot_new, the callback for the new command, the ID
>must be specified by userspace.
>
>In both cases, the ID is confirmed to not be in use for that region by
>looping over all snapshots and checking to see if one can be found that
>has the ID.
>
>In __devlink_region_snapshot_create, the IDR is checked to see if it is
>already used. If so, the refcount is incremented. If there is no
>refcount (i.e. the IDR returns NULL), a new refcount is created, set to
>1, and inserted.
>
>The basic idea is the refcount is "how many snapshots are actually using
>this ID". Use of devlink_region_snapshot_id_get can "pre-allocate" an ID
>value so that future calls to devlink_region_snapshot_id_get won't re-use
>the same ID number even if no snapshot with that ID has yet been created.
>
>The refcount isn't actually incremented until the snapshot is created
>with that ID.
>
>Userspace never uses devlink_region_snapshot_id_get now, since it always
>requires an ID to be chosen.
>
>On snapshot delete, the id refcount is reduced, and when it hits zero
>the ID is released from the IDR. This way, IDs can be re-used as long as
>no remaining snapshots on any region point to them.
>
>This system enables userspace to simply treat snapshot ids as unique to
>each region, and to provide their own values on the command line. It
>also preserves the behavior that devlink_region_snapshot_id_get will
>never select an ID that is used by any region on that devlink, so that
>the id can be safely used for multiple snapshots triggered at the same time.
>
>This will hopefully be more clear in the next revision.

Okay, I see. The code is a bit harder to follow.


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-04 11:58           ` Jiri Pirko
@ 2020-03-04 17:43             ` Jacob Keller
  2020-03-05  6:41               ` Jiri Pirko
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-04 17:43 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/4/2020 3:58 AM, Jiri Pirko wrote:
> Tue, Mar 03, 2020 at 06:51:37PM CET, jacob.e.keller@intel.com wrote:
>>
>> Hm. The flow here was about supporting both with and without snapshot
>> IDs. That will be gone in the next revision and should make the code clear.
>>
>> The IDs are stored in the IDR with either a NULL, or a pointer to a
>> refcount of the number of snapshots currently using them.
>>
>> On devlink_region_snapshot_create, the id must have been allocated by
>> the devlink_region_snapshot_id_get ahead of time by the driver.
>>
>> When devlink_region_snapshot_id_get is called, a NULL is inserted into
>> the IDR at a suitable ID number (i.e. one that does not yet have a
>> refcount).
>>
>> On devlink_region_snapshot_new, the callback for the new command, the ID
>> must be specified by userspace.
>>
>> In both cases, the ID is confirmed to not be in use for that region by
>> looping over all snapshots and checking to see if one can be found that
>> has the ID.
>>
>> In __devlink_region_snapshot_create, the IDR is checked to see if it is
>> already used. If so, the refcount is incremented. If there is no
>> refcount (i.e. the IDR returns NULL), a new refcount is created, set to
>> 1, and inserted.
>>
>> The basic idea is the refcount is "how many snapshots are actually using
>> this ID". Use of devlink_region_snapshot_id_get can "pre-allocate" an ID
>> value so that future calls to devlink_region_snapshot_id_get won't re-use
>> the same ID number even if no snapshot with that ID has yet been created.
>>
>> The refcount isn't actually incremented until the snapshot is created
>> with that ID.
>>
>> Userspace never uses devlink_region_snapshot_id_get now, since it always
>> requires an ID to be chosen.
>>
>> On snapshot delete, the id refcount is reduced, and when it hits zero
>> the ID is released from the IDR. This way, IDs can be re-used as long as
>> no remaining snapshots on any region point to them.
>>
>> This system enables userspace to simply treat snapshot ids as unique to
>> each region, and to provide their own values on the command line. It
>> also preserves the behavior that devlink_region_snapshot_id_get will
>> never select an ID that is used by any region on that devlink, so that
>> the id can be safely used for multiple snapshots triggered at the same time.
>>
>> This will hopefully be more clear in the next revision.
> 
> Okay, I see. The code is a bit harder to follow.
> 

I'm open to suggestions for better alternatives, or ways to improve code
legibility.

I want to preserve the following properties:

* devlink_region_snapshot_id_get must choose IDs globally for the whole
devlink, so that the ID can safely be re-used across multiple regions.

* IDs must be reusable once all snapshots associated with the IDs have
been removed

* the new DEVLINK_CMD_REGION_NEW must allow userspace to select IDs

* selecting IDs via DEVLINK_CMD_REGION_NEW should not really require the
user to check more than the current interested snapshot

* userspace should be able to re-use the same ID across multiple regions
just like devlink_region_snapshot_id_get and driver triggered snapshots

So, in a sense, the IDs must be a combination of both global and local
to the region. When using an ID, the region must ensure that no more
than one snapshot on that region uses the id.

However, when selecting a new ID for use via the
devlink_region_snapshot_id_get(), it must select one that is not yet
used by *any* region.

That's where the IDR came into use. I'm not a huge fan of this, so maybe
there's something simpler.

We could just do a brute force search across all regions to find an ID
that isn't in use by any region snapshot....
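
As a rough illustration of that brute-force alternative, here is a
minimal userspace sketch in C. The struct region layout and the helper
names (region_uses_id, any_region_uses_id, find_free_id) are hypothetical
and do not match the devlink core data structures; the sketch only shows
the search itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a region's snapshot list. */
struct region {
	const uint32_t *snapshot_ids; /* IDs of snapshots stored on this region */
	size_t num_snapshots;
};

/* True if any snapshot on @r uses @id. */
static bool region_uses_id(const struct region *r, uint32_t id)
{
	for (size_t i = 0; i < r->num_snapshots; i++)
		if (r->snapshot_ids[i] == id)
			return true;
	return false;
}

/* True if any of the @n regions has a snapshot with @id. */
static bool any_region_uses_id(const struct region *regions, size_t n,
			       uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (region_uses_id(&regions[i], id))
			return true;
	return false;
}

/*
 * Find the lowest ID not used by any region. Returns true and stores the
 * ID in @out, or false if every ID up to UINT32_MAX is taken.
 */
static bool find_free_id(const struct region *regions, size_t n,
			 uint32_t *out)
{
	uint32_t id = 0;

	do {
		if (!any_region_uses_id(regions, n, id)) {
			*out = id;
			return true;
		}
	} while (id++ != UINT32_MAX);

	return false;
}
```

One trade-off visible in the sketch: without a table of reserved IDs, an
ID handed out but not yet attached to a snapshot looks free to the next
search, so the "pre-allocate" property would need to be provided some
other way.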

Thanks,
Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-04 17:43             ` Jacob Keller
@ 2020-03-05  6:41               ` Jiri Pirko
  2020-03-05 22:33                 ` Jacob Keller
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Pirko @ 2020-03-05  6:41 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Wed, Mar 04, 2020 at 06:43:02PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 3/4/2020 3:58 AM, Jiri Pirko wrote:
>> Tue, Mar 03, 2020 at 06:51:37PM CET, jacob.e.keller@intel.com wrote:
>>>
>>> Hm. The flow here was about supporting both with and without snapshot
>>> IDs. That will be gone in the next revision and should make the code clear.
>>>
>>> The IDs are stored in the IDR with either a NULL, or a pointer to a
>>> refcount of the number of snapshots currently using them.
>>>
>>> On devlink_region_snapshot_create, the id must have been allocated by
>>> the devlink_region_snapshot_id_get ahead of time by the driver.
>>>
>>> When devlink_region_snapshot_id_get is called, a NULL is inserted into
>>> the IDR at a suitable ID number (i.e. one that does not yet have a
>>> refcount).
>>>
>>> On devlink_region_snapshot_new, the callback for the new command, the ID
>>> must be specified by userspace.
>>>
>>> In both cases, the ID is confirmed to not be in use for that region by
>>> looping over all snapshots and checking to see if one can be found that
>>> has the ID.
>>>
>>> In __devlink_region_snapshot_create, the IDR is checked to see if it is
>>> already used. If so, the refcount is incremented. If there is no
>>> refcount (i.e. the IDR returns NULL), a new refcount is created, set to
>>> 1, and inserted.
>>>
>>> The basic idea is the refcount is "how many snapshots are actually using
>>> this ID". Use of devlink_region_snapshot_id_get can "pre-allocate" an ID
>>> value so that future calls to devlink_region_snapshot_id_get won't re-use
>>> the same ID number even if no snapshot with that ID has yet been created.
>>>
>>> The refcount isn't actually incremented until the snapshot is created
>>> with that ID.
>>>
>>> Userspace never uses devlink_region_snapshot_id_get now, since it always
>>> requires an ID to be chosen.
>>>
>>> On snapshot delete, the id refcount is reduced, and when it hits zero
>>> the ID is released from the IDR. This way, IDs can be re-used as long as
>>> no remaining snapshots on any region point to them.
>>>
>>> This system enables userspace to simply treat snapshot ids as unique to
>>> each region, and to provide their own values on the command line. It
>>> also preserves the behavior that devlink_region_snapshot_id_get will
>>> never select an ID that is used by any region on that devlink, so that
>>> the id can be safely used for multiple snapshots triggered at the same time.
>>>
>>> This will hopefully be more clear in the next revision.
>> 
>> Okay, I see. The code is a bit harder to follow.
>> 
>
>I'm open to suggestions for better alternatives, or ways to improve code
>legibility.
>
>I want to preserve the following properties:
>
>* devlink_region_snapshot_id_get must choose IDs globally for the whole
>devlink, so that the ID can safely be re-used across multiple regions.
>
>* IDs must be reusable once all snapshots associated with the IDs have
>been removed
>
>* the new DEVLINK_CMD_REGION_NEW must allow userspace to select IDs
>
>* selecting IDs via DEVLINK_CMD_REGION_NEW should not really require the
>user to check more than the current interested snapshot
>
>* userspace should be able to re-use the same ID across multiple regions
>just like devlink_region_snapshot_id_get and driver triggered snapshots

Nope. I believe this is not desired. The point of having the same id for
the multiple regions is that the driver can obtain multiple region
snapshots during a single FW event. For the user, that is not the case.
For user, it would be 2 separate snapshots in 2 separate times. They
should not have the same ID.


>
>So, in a sense, the IDs must be a combination of both global and local
>to the region. When using an ID, the region must ensure that no more
>than one snapshot on that region uses the id.
>
>However, when selecting a new ID for use via the
>devlink_region_snapshot_id_get(), it must select one that is not yet
>used by *any* region.
>
>That's where the IDR came into use. I'm not a huge fan of this, so maybe
>there's something simpler.
>
>We could just do a brute force search across all regions to find an ID
>that isn't in use by any region snapshot....
>
>Thanks,
>Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-05  6:41               ` Jiri Pirko
@ 2020-03-05 22:33                 ` Jacob Keller
  2020-03-06  6:16                   ` Jiri Pirko
  0 siblings, 1 reply; 59+ messages in thread
From: Jacob Keller @ 2020-03-05 22:33 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba



On 3/4/2020 10:41 PM, Jiri Pirko wrote:
> Wed, Mar 04, 2020 at 06:43:02PM CET, jacob.e.keller@intel.com wrote:
>>
>>
>> On 3/4/2020 3:58 AM, Jiri Pirko wrote:
>>> Tue, Mar 03, 2020 at 06:51:37PM CET, jacob.e.keller@intel.com wrote:
>>>>
>>>> Hm. The flow here was about supporting both with and without snapshot
>>>> IDs. That will be gone in the next revision and should make the code clear.
>>>>
>>>> The IDs are stored in the IDR with either a NULL, or a pointer to a
>>>> refcount of the number of snapshots currently using them.
>>>>
>>>> On devlink_region_snapshot_create, the id must have been allocated by
>>>> the devlink_region_snapshot_id_get ahead of time by the driver.
>>>>
>>>> When devlink_region_snapshot_id_get is called, a NULL is inserted into
>>>> the IDR at a suitable ID number (i.e. one that does not yet have a
>>>> refcount).
>>>>
>>>> On devlink_region_snapshot_new, the callback for the new command, the ID
>>>> must be specified by userspace.
>>>>
>>>> In both cases, the ID is confirmed to not be in use for that region by
>>>> looping over all snapshots and checking to see if one can be found that
>>>> has the ID.
>>>>
>>>> In __devlink_region_snapshot_create, the IDR is checked to see if it is
>>>> already used. If so, the refcount is incremented. If there is no
>>>> refcount (i.e. the IDR returns NULL), a new refcount is created, set to
>>>> 1, and inserted.
>>>>
>>>> The basic idea is the refcount is "how many snapshots are actually using
>>>> this ID". Use of devlink_region_snapshot_id_get can "pre-allocate" an ID
>>>> value so that future calls to devlink_region_snapshot_id_get won't re-use the
>>>> same ID number even if no snapshot with that ID has yet been created.
>>>>
>>>> The refcount isn't actually incremented until the snapshot is created
>>>> with that ID.
>>>>
>>>> Userspace never uses devlink_region_snapshot_id_get now, since it always
>>>> requires an ID to be chosen.
>>>>
>>>> On snapshot delete, the id refcount is reduced, and when it hits zero
>>>> the ID is released from the IDR. This way, IDs can be re-used as long as
>>>> no remaining snapshots on any region point to them.
>>>>
>>>> This system enables userspace to simply treat snapshot ids as unique to
>>>> each region, and to provide their own values on the command line. It
>>>> also preserves the behavior that devlink_region_snapshot_id_get will
>>>> never select an ID that is used by any region on that devlink, so that
>>>> the id can be safely used for multiple snapshots triggered at the same time.
>>>>
>>>> This will hopefully be more clear in the next revision.
>>>
>>> Okay, I see. The code is a bit harder to follow.
>>>
>>
>> I'm open to suggestions for better alternatives, or ways to improve code
>> legibility.
>>
>> I want to preserve the following properties:
>>
>> * devlink_region_snapshot_id_get must choose IDs globally for the whole
>> devlink, so that the ID can safely be re-used across multiple regions.
>>
>> * IDs must be reusable once all snapshots associated with the IDs have
>> been removed
>>
>> * the new DEVLINK_CMD_REGION_NEW must allow userspace to select IDs
>>
>> * selecting IDs via DEVLINK_CMD_REGION_NEW should not really require the
>> user to check more than the current interested snapshot
>>
>> * userspace should be able to re-use the same ID across multiple regions
>> just like devlink_region_snapshot_id_get and driver triggered snapshots
> 
> Nope. I believe this is not desired. The point of having the same id for
> the multiple regions is that the driver can obtain multiple region
> snapshots during a single FW event. For the user, that is not the case.
> For user, it would be 2 separate snapshots in 2 separate times. They
> should not have the same ID.
> 

So users would have to pick an ID that's unique across all regions. Ok.

I think we still need a reference count of how many snapshots are using
an ID (so that it can be released once all region snapshots using that
ID are destroyed).

We basically add this complexity even in cases where regions are totally
independent and never taken together.

One alternative would be to instead create some sort of grouping system,
but that has even more complication.

Ok. So I think we still need to track IDs using something like the IDR
with a reference count or similar structure.

Using only an IDA doesn't give us the ability to release previously used
IDs. Because on snapshot delete it has no idea whether another region
used that ID, so it can't remove it.
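
To make the IDA limitation concrete, here is a minimal userspace sketch.
It is not code from this series: struct id_bitmap and the id_bitmap_*
helpers are hypothetical stand-ins for an allocator that tracks only
"allocated or not" per ID, which is the property of an IDA that matters
here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_IDS 32u

/* Hypothetical stand-in for an IDA: one bit per ID and nothing else. */
struct id_bitmap {
	uint32_t used;
};

/* Allocate the lowest clear bit; returns the ID, or -1 when full. */
static int id_bitmap_alloc(struct id_bitmap *b)
{
	for (unsigned int id = 0; id < MAX_IDS; id++) {
		if (!(b->used & (UINT32_C(1) << id))) {
			b->used |= UINT32_C(1) << id;
			return (int)id;
		}
	}
	return -1;
}

/* Clear the bit. The bitmap cannot tell whether anyone else uses the ID. */
static void id_bitmap_free(struct id_bitmap *b, unsigned int id)
{
	if (id < MAX_IDS)
		b->used &= ~(UINT32_C(1) << id);
}

static bool id_bitmap_is_used(const struct id_bitmap *b, unsigned int id)
{
	return id < MAX_IDS && (b->used & (UINT32_C(1) << id)) != 0;
}
```

Suppose a driver allocates one ID and creates snapshots on two regions
with it. If deleting the first region's snapshot calls id_bitmap_free(),
the next id_bitmap_alloc() hands out the same ID while the second region
still holds a snapshot under it. If delete never frees, the ID is never
reusable. Neither outcome is acceptable, hence the per-ID count.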

Maybe something like IDR with a refcount.. but we'd really like
something that can exist for some time with a refcount of zero. That's
what I basically used the NULL trick for in this version.

We can first check if the IDR has the ID when responding to
DEVLINK_CMD_REGION_NEW, and bail if so. That would enforce that users
must specify IDs which are unused by any region on the device.
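
A rough userspace model of that behaviour, for illustration only. The
struct snapshot_ids table and the snapshot_id_* function names are
assumptions made for this sketch; the series itself uses an IDR, and the
error codes are just one plausible choice.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_IDS 64

/* Hypothetical model: a fixed table instead of the kernel IDR. */
struct snapshot_ids {
	bool present[MAX_IDS];      /* ID is reserved or in use */
	unsigned int refs[MAX_IDS]; /* snapshots currently using the ID */
};

/* Driver path: reserve the lowest ID not present; refcount stays zero. */
static int snapshot_id_get(struct snapshot_ids *t)
{
	for (int id = 0; id < MAX_IDS; id++) {
		if (!t->present[id]) {
			t->present[id] = true;
			t->refs[id] = 0;
			return id;
		}
	}
	return -ENOSPC;
}

/* A snapshot was created with a previously reserved ID. */
static int snapshot_id_take(struct snapshot_ids *t, int id)
{
	if (id < 0 || id >= MAX_IDS || !t->present[id])
		return -EINVAL;
	t->refs[id]++;
	return 0;
}

/* A snapshot was deleted: drop one reference, release the ID at zero. */
static int snapshot_id_put(struct snapshot_ids *t, int id)
{
	if (id < 0 || id >= MAX_IDS || !t->present[id] || t->refs[id] == 0)
		return -EINVAL;
	if (--t->refs[id] == 0)
		t->present[id] = false;
	return 0;
}

/* Userspace path: bail if the ID is known to the table in any state. */
static int snapshot_id_new_from_user(struct snapshot_ids *t, int id)
{
	if (id < 0 || id >= MAX_IDS)
		return -EINVAL;
	if (t->present[id])
		return -EEXIST;
	t->present[id] = true;
	t->refs[id] = 1;
	return 0;
}
```

The separate present flag plays the role of the NULL entry: an ID can sit
in the table with zero references between snapshot_id_get() and the first
snapshot_id_take(). One gap in this model is that an ID which is reserved
but never used for a snapshot stays reserved, since only
snapshot_id_put() releases it.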

Thanks,
Jake


* Re: [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW
  2020-03-05 22:33                 ` Jacob Keller
@ 2020-03-06  6:16                   ` Jiri Pirko
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Pirko @ 2020-03-06  6:16 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, valex, linyunsheng, lihong.yang, kuba

Thu, Mar 05, 2020 at 11:33:17PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 3/4/2020 10:41 PM, Jiri Pirko wrote:
>> Wed, Mar 04, 2020 at 06:43:02PM CET, jacob.e.keller@intel.com wrote:
>>>
>>>
>>> On 3/4/2020 3:58 AM, Jiri Pirko wrote:
>>>> Tue, Mar 03, 2020 at 06:51:37PM CET, jacob.e.keller@intel.com wrote:
>>>>>
>>>>> Hm. The flow here was about supporting both with and without snapshot
>>>>> IDs. That will be gone in the next revision and should make the code clear.
>>>>>
>>>>> The IDs are stored in the IDR with either a NULL, or a pointer to a
>>>>> refcount of the number of snapshots currently using them.
>>>>>
>>>>> On devlink_region_snapshot_create, the id must have been allocated by
>>>>> the devlink_region_snapshot_id_get ahead of time by the driver.
>>>>>
>>>>> When devlink_region_snapshot_id_get is called, a NULL is inserted into
>>>>> the IDR at a suitable ID number (i.e. one that does not yet have a
>>>>> refcount).
>>>>>
>>>>> On devlink_region_snapshot_new, the callback for the new command, the ID
>>>>> must be specified by userspace.
>>>>>
>>>>> In both cases, the ID is confirmed to not be in use for that region by
>>>>> looping over all snapshots and checking to see if one can be found that
>>>>> has the ID.
>>>>>
>>>>> In __devlink_region_snapshot_create, the IDR is checked to see if it is
>>>>> already used. If so, the refcount is incremented. If there is no
>>>>> refcount (i.e. the IDR returns NULL), a new refcount is created, set to
>>>>> 1, and inserted.
>>>>>
>>>>> The basic idea is the refcount is "how many snapshots are actually using
>>>>> this ID". Use of devlink_region_snapshot_id_get can "pre-allocate" an ID
>>>>> value so that future calls to devlink_region_snapshot_id_get won't re-use the
>>>>> same ID number even if no snapshot with that ID has yet been created.
>>>>>
>>>>> The refcount isn't actually incremented until the snapshot is created
>>>>> with that ID.
>>>>>
>>>>> Userspace never uses devlink_region_snapshot_id_get now, since it always
>>>>> requires an ID to be chosen.
>>>>>
>>>>> On snapshot delete, the id refcount is reduced, and when it hits zero
>>>>> the ID is released from the IDR. This way, IDs can be re-used as long as
>>>>> no remaining snapshots on any region point to them.
>>>>>
>>>>> This system enables userspace to simply treat snapshot ids as unique to
>>>>> each region, and to provide their own values on the command line. It
>>>>> also preserves the behavior that devlink_region_snapshot_id_get will
>>>>> never select an ID that is used by any region on that devlink, so that
>>>>> the id can be safely used for multiple snapshots triggered at the same time.
>>>>>
>>>>> This will hopefully be more clear in the next revision.
>>>>
>>>> Okay, I see. The code is a bit harder to follow.
>>>>
>>>
>>> I'm open to suggestions for better alternatives, or ways to improve code
>>> legibility.
>>>
>>> I want to preserve the following properties:
>>>
>>> * devlink_region_snapshot_id_get must choose IDs globally for the whole
>>> devlink, so that the ID can safely be re-used across multiple regions.
>>>
>>> * IDs must be reusable once all snapshots associated with the IDs have
>>> been removed
>>>
>>> * the new DEVLINK_CMD_REGION_NEW must allow userspace to select IDs
>>>
>>> * selecting IDs via DEVLINK_CMD_REGION_NEW should not really require the
>>> user to check more than the current interested snapshot
>>>
>>> * userspace should be able to re-use the same ID across multiple regions
>>> just like devlink_region_snapshot_id_get and driver triggered snapshots
>> 
>> Nope. I believe this is not desired. The point of having the same id for
>> the multiple regions is that the driver can obtain multiple region
>> snapshots during a single FW event. For the user, that is not the case.
>> For user, it would be 2 separate snapshots in 2 separate times. They
>> should not have the same ID.
>> 
>
>So users would have to pick an ID that's unique across all regions. Ok.
>
>I think we still need a reference count of how many snapshots are using
>an ID (so that it can be released once all region snapshots using that
>ID are destroyed).
>
>We basically add this complexity even in cases where regions are totally
>independent and never taken together.
>
>One alternative would be to instead create some sort of grouping system,
>but that has even more complication.
>
>Ok. So I think we still need to track IDs using something like the IDR
>with a reference count or similar structure.

I agree.

>
>Using only an IDA doesn't give us the ability to release previously used
>IDs. Because on snapshot delete it has no idea whether another region
>used that ID, so it can't remove it.
>
>Maybe something like IDR with a refcount.. but we'd really like
>something that can exist for some time with a refcount of zero. That's
>what I basically used the NULL trick for in this version.
>
>We can first check if the IDR has the ID when responding to
>DEVLINK_CMD_REGION_NEW, and bail if so. That would enforce that users
>must specify IDs which are unused by any region on the device.

Yes, that I believe is the correct behaviour.


>
>Thanks,
>Jake



Thread overview: 59+ messages
2020-02-14 23:21 [RFC PATCH v2 00/22] devlink region updates Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 01/22] ice: use __le16 types for explicitly Little Endian values Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 02/22] ice: create function to read a section of the NVM and Shadow RAM Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 03/22] ice: implement full NVM read from ETHTOOL_GEEPROM Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 04/22] ice: enable initial devlink support Jacob Keller
2020-03-02 16:30   ` Jiri Pirko
2020-03-02 19:29     ` Jacob Keller
2020-03-03 13:47       ` Jiri Pirko
2020-03-03 17:53         ` Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 05/22] ice: rename variables used for Option ROM version Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 06/22] ice: add basic handler for devlink .info_get Jacob Keller
2020-02-19  2:45   ` Jakub Kicinski
2020-02-19 17:33     ` Jacob Keller
2020-02-19 19:57       ` Jakub Kicinski
2020-02-19 21:37         ` Jacob Keller
2020-02-19 23:47           ` Jakub Kicinski
2020-02-20  0:06             ` Jacob Keller
2020-02-21 22:11               ` Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 07/22] ice: add board identifier info to " Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 08/22] devlink: prepare to support region operations Jacob Keller
2020-03-02 17:42   ` Jiri Pirko
2020-02-14 23:22 ` [RFC PATCH v2 09/22] devlink: convert snapshot destructor callback to region op Jacob Keller
2020-03-02 17:42   ` Jiri Pirko
2020-02-14 23:22 ` [RFC PATCH v2 10/22] devlink: trivial: fix tab in function documentation Jacob Keller
2020-03-02 17:42   ` Jiri Pirko
2020-02-14 23:22 ` [RFC PATCH v2 11/22] devlink: add functions to take snapshot while locked Jacob Keller
2020-03-02 17:43   ` Jiri Pirko
2020-03-02 22:25     ` Jacob Keller
2020-03-03  8:41       ` Jiri Pirko
2020-02-14 23:22 ` [RFC PATCH v2 12/22] devlink: convert snapshot id getter to return an error Jacob Keller
2020-03-02 17:44   ` Jiri Pirko
2020-02-14 23:22 ` [RFC PATCH v2 13/22] devlink: track snapshot ids using an IDR and refcounts Jacob Keller
2020-02-18 21:44   ` Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 14/22] devlink: implement DEVLINK_CMD_REGION_NEW Jacob Keller
2020-03-02 17:41   ` Jiri Pirko
2020-03-02 19:38     ` Jacob Keller
2020-03-03  9:30       ` Jiri Pirko
2020-03-03 17:51         ` Jacob Keller
2020-03-04 11:58           ` Jiri Pirko
2020-03-04 17:43             ` Jacob Keller
2020-03-05  6:41               ` Jiri Pirko
2020-03-05 22:33                 ` Jacob Keller
2020-03-06  6:16                   ` Jiri Pirko
2020-03-02 22:11     ` Jacob Keller
2020-03-02 22:14     ` Jacob Keller
2020-03-02 22:35     ` Jacob Keller
2020-03-03  9:31       ` Jiri Pirko
2020-02-14 23:22 ` [RFC PATCH v2 15/22] netdevsim: support taking immediate snapshot via devlink Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 16/22] devlink: simplify arguments for read_snapshot_fill Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 17/22] devlink: use min_t to calculate data_size Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 18/22] devlink: report extended error message in region_read_dumpit Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 19/22] devlink: remove unnecessary parameter from chunk_fill function Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 20/22] devlink: refactor region_read_snapshot_fill to use a callback function Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 21/22] devlink: support directly reading from region memory Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 22/22] ice: add a devlink region to dump shadow RAM contents Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 1/2] devlink: add support for DEVLINK_CMD_REGION_NEW Jacob Keller
2020-02-14 23:22 ` [RFC PATCH v2 2/2] devlink: stop requiring snapshot for regions Jacob Keller
2020-03-02 16:27 ` [RFC PATCH v2 00/22] devlink region updates Jiri Pirko
2020-03-02 19:27   ` Jacob Keller
