All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
@ 2021-06-23  7:59 Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 1/4] ethtool: Extract module EEPROM attributes before validation Ido Schimmel
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-23  7:59 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, andrew, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

This patchset adds write support to transceiver module EEPROMs by
extending the ethtool netlink API.

Motivation
==========

The kernel can currently dump the contents of module EEPROMs to user
space via the ethtool legacy ioctl API or the new netlink API. These
dumps can then be parsed by ethtool(8) according to the specification
that defines the memory map of the EEPROM. For example, SFF-8636 [1] for
QSFP and CMIS [2] for QSFP-DD.

In addition to read-only elements, these specifications also define
writeable elements that can be used to control the behavior of the
module. For example, controlling whether the module is put in low or
high power mode to limit its power consumption.

The CMIS specification even defines a message exchange mechanism (CDB,
Command Data Block) on top of the module's memory map. This allows the
host to send various commands to the module. For example, to update its
firmware.

Implementation
==============

The legacy ioctl API to dump module EEPROMs required drivers to parse
the contents of the EEPROM in order to understand how many bytes can be
read and dumped to user space. This meant that drivers had to be updated
to support new standards. See [3], for example.

To overcome this limitation, a new netlink-based API to dump module
EEPROMs was merged in kernel 5.13 [4]. With the new API, the kernel is
merely responsible for fetching EEPROM pages. User space then parses the
information, determines if more pages are available and instructs the
kernel to fetch them as well.

Write support for module EEPROMs employs the same approach. User space
instructs the kernel which bytes (page/offset/bank/length) to change and
to which values.

This approach allows the kernel to remain ignorant of the various
standards and avoids the need to constantly update the kernel to support
new registers / commands. More importantly, it allows advanced
functionality such as firmware update to be implemented once in user
space and shared across all the drivers that support read and write
access to module EEPROMs.

The above is achieved by adding a new command to the generic ethtool
netlink family ('ETHTOOL_MSG_MODULE_EEPROM_SET') which shares the same
attributes with the get command ('ETHTOOL_MSG_MODULE_EEPROM_GET'). See
Documentation/networking/ethtool-netlink.rst in patch #3 for detailed
description of the proposed netlink API.

Note that the new command shares the same restrictions with the existing
get command. This means, for example, that no more than 128 bytes can be
written at once and that cross-page write is forbidden. However, some
CMIS compliant modules might support "Auto Paging" which allows hosts to
"write data in large chunks, without the overhead of explicitly
programming Page changes" [2].

At this time, I cannot evaluate the benefits of "Auto Paging" as I do
not have modules that support the feature, nor a host that can write
more than 48 bytes at once. If the current restrictions prove to be a
bottleneck, they can be relaxed in the future.

ethtool(8) support
==================

The corresponding user space patches extend ethtool(8) with the ability
to change the value of a single byte in the module EEPROM. Example:

 # ethtool -M swp11 offset 0x80 page 3 bank 0 i2c 0x50 value 0x44

This is in accordance with the '-E' option which allows changing the
value of a single byte in the EEPROM of the network device.

The current command line interface is not user-friendly and also
impractical for functionality that requires many reads and writes such
as firmware update.

Therefore, the plan is to extend ethtool(8) over time with commonly
requested functionality on top of the netlink API.

Testing
=======

Tested by writing to page 3 (User EEPROM) of a QSFP-DD module:

 # ethtool -m swp11 offset 0x80 length 3 page 3 bank 0 i2c 0x50
 Offset          Values
 ------          ------
 0x0080:         00 00 00
 # ethtool -M swp11 offset 0x80 page 3 bank 0 i2c 0x50 value 0x44
 # ethtool -M swp11 offset 0x81 page 3 bank 0 i2c 0x50 value 0x41
 # ethtool -M swp11 offset 0x82 page 3 bank 0 i2c 0x50 value 0x44
 # ethtool -m swp11 offset 0x80 length 3 page 3 bank 0 i2c 0x50
 Offset          Values
 ------          ------
 0x0080:         44 41 44

Patchset overview
=================

Patches #1-#2 refactor the ethtool module EEPROM code to allow sharing
attribute validation between read and write.

Patch #3 adds the actual module EEPROM write implementation.

Patch #4 adds mlxsw support.

[1] https://members.snia.org/document/dl/26418
[2] http://www.qsfp-dd.com/wp-content/uploads/2021/05/CMIS5p0.pdf
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af496adcbb8d4656b90a85401eeceb88d520c0d
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7dc85b599ae17fb705ffae1b7321ace4b3056aeb

Ido Schimmel (4):
  ethtool: Extract module EEPROM attributes before validation
  ethtool: Split module EEPROM attributes validation to a function
  ethtool: Add ability to write to transceiver module EEPROM
  mlxsw: core: Add support for module EEPROM write by page

 Documentation/networking/ethtool-netlink.rst  |  47 +++++
 .../net/ethernet/mellanox/mlxsw/core_env.c    |  44 ++++
 .../net/ethernet/mellanox/mlxsw/core_env.h    |   5 +
 drivers/net/ethernet/mellanox/mlxsw/minimal.c |  13 ++
 .../mellanox/mlxsw/spectrum_ethtool.c         |  14 ++
 include/linux/ethtool.h                       |  21 +-
 include/uapi/linux/ethtool_netlink.h          |   2 +
 net/ethtool/eeprom.c                          | 192 +++++++++++++++---
 net/ethtool/netlink.c                         |   7 +
 net/ethtool/netlink.h                         |   2 +
 10 files changed, 316 insertions(+), 31 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next 1/4] ethtool: Extract module EEPROM attributes before validation
  2021-06-23  7:59 [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Ido Schimmel
@ 2021-06-23  7:59 ` Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 2/4] ethtool: Split module EEPROM attributes validation to a function Ido Schimmel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-23  7:59 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, andrew, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Move the extraction of the attributes before their validation, so that
the validation could be easily split into a different function in the
subsequent patch.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 net/ethtool/eeprom.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ethtool/eeprom.c b/net/ethtool/eeprom.c
index 7e6b37a54add..ed5f677f27cd 100644
--- a/net/ethtool/eeprom.c
+++ b/net/ethtool/eeprom.c
@@ -158,6 +158,9 @@ static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr *
 	request->i2c_address = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS]);
 	request->offset = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_OFFSET]);
 	request->length = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_LENGTH]);
+	request->page = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_PAGE]);
+	if (tb[ETHTOOL_A_MODULE_EEPROM_BANK])
+		request->bank = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_BANK]);
 
 	/* The following set of conditions limit the API to only dump 1/2
 	 * EEPROM page without crossing low page boundary located at offset 128.
@@ -165,7 +168,6 @@ static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr *
 	 * either low 128 bytes or high 128 bytes.
 	 * For pages higher than 0 only high 128 bytes are accessible.
 	 */
-	request->page = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_PAGE]);
 	if (request->page && request->offset < ETH_MODULE_EEPROM_PAGE_LEN) {
 		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_PAGE],
 				    "reading from lower half page is allowed for page 0 only");
@@ -183,9 +185,6 @@ static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr *
 		return -EINVAL;
 	}
 
-	if (tb[ETHTOOL_A_MODULE_EEPROM_BANK])
-		request->bank = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_BANK]);
-
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next 2/4] ethtool: Split module EEPROM attributes validation to a function
  2021-06-23  7:59 [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 1/4] ethtool: Extract module EEPROM attributes before validation Ido Schimmel
@ 2021-06-23  7:59 ` Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 3/4] ethtool: Add ability to write to transceiver module EEPROM Ido Schimmel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-23  7:59 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, andrew, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

The same validation will be needed for module EEPROM write access in the
subsequent patch, so split it into a function.

Modify the extack messages a bit to not be specific to read access.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 net/ethtool/eeprom.c | 58 ++++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/net/ethtool/eeprom.c b/net/ethtool/eeprom.c
index ed5f677f27cd..945b95e64f0d 100644
--- a/net/ethtool/eeprom.c
+++ b/net/ethtool/eeprom.c
@@ -144,10 +144,41 @@ static int eeprom_prepare_data(const struct ethnl_req_info *req_base,
 	return ret;
 }
 
+static int eeprom_validate(struct nlattr **tb, struct netlink_ext_ack *extack)
+{
+	u32 offset = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_OFFSET]);
+	u32 length = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_LENGTH]);
+	u8 page = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_PAGE]);
+
+	/* The following set of conditions limit the API to only access 1/2
+	 * EEPROM page without crossing low page boundary located at offset
+	 * 128. For pages higher than 0, only high 128 bytes are accessible.
+	 */
+	if (page && offset < ETH_MODULE_EEPROM_PAGE_LEN) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_PAGE],
+				    "access to lower half page is allowed for page 0 only");
+		return -EINVAL;
+	}
+
+	if (offset < ETH_MODULE_EEPROM_PAGE_LEN &&
+	    offset + length > ETH_MODULE_EEPROM_PAGE_LEN) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_LENGTH],
+				    "crossing half page boundary is illegal");
+		return -EINVAL;
+	} else if (offset + length > ETH_MODULE_EEPROM_PAGE_LEN * 2) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_LENGTH],
+				    "crossing page boundary is illegal");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr **tb,
 				struct netlink_ext_ack *extack)
 {
 	struct eeprom_req_info *request = MODULE_EEPROM_REQINFO(req_info);
+	int err;
 
 	if (!tb[ETHTOOL_A_MODULE_EEPROM_OFFSET] ||
 	    !tb[ETHTOOL_A_MODULE_EEPROM_LENGTH] ||
@@ -155,6 +186,10 @@ static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr *
 	    !tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS])
 		return -EINVAL;
 
+	err = eeprom_validate(tb, extack);
+	if (err)
+		return err;
+
 	request->i2c_address = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS]);
 	request->offset = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_OFFSET]);
 	request->length = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_LENGTH]);
@@ -162,29 +197,6 @@ static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr *
 	if (tb[ETHTOOL_A_MODULE_EEPROM_BANK])
 		request->bank = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_BANK]);
 
-	/* The following set of conditions limit the API to only dump 1/2
-	 * EEPROM page without crossing low page boundary located at offset 128.
-	 * This means user may only request dumps of length limited to 128 from
-	 * either low 128 bytes or high 128 bytes.
-	 * For pages higher than 0 only high 128 bytes are accessible.
-	 */
-	if (request->page && request->offset < ETH_MODULE_EEPROM_PAGE_LEN) {
-		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_PAGE],
-				    "reading from lower half page is allowed for page 0 only");
-		return -EINVAL;
-	}
-
-	if (request->offset < ETH_MODULE_EEPROM_PAGE_LEN &&
-	    request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN) {
-		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_LENGTH],
-				    "reading cross half page boundary is illegal");
-		return -EINVAL;
-	} else if (request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN * 2) {
-		NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_LENGTH],
-				    "reading cross page boundary is illegal");
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next 3/4] ethtool: Add ability to write to transceiver module EEPROM
  2021-06-23  7:59 [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 1/4] ethtool: Extract module EEPROM attributes before validation Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 2/4] ethtool: Split module EEPROM attributes validation to a function Ido Schimmel
@ 2021-06-23  7:59 ` Ido Schimmel
  2021-06-23  7:59 ` [RFC PATCH net-next 4/4] mlxsw: core: Add support for module EEPROM write by page Ido Schimmel
  2021-06-23 18:44 ` [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Andrew Lunn
  4 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-23  7:59 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, andrew, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

In a similar fashion to read access to transceiver module EEPROM by
page, add write access. This allows user space to configure various
registers specified in the transceiver module EEPROM memory map (e.g.,
SFF-8636, CMIS).

In the case of CMIS, this also allows user space to implement various
Command Data Block (CDB) messages for interaction with the module. This
is required for advanced functionality such as firmware update.

The new generic netlink command (i.e., 'ETHTOOL_MSG_MODULE_EEPROM_SET')
is fully described in Documentation/networking/ethtool-netlink.rst.

Note that while the 'ETHTOOL_A_MODULE_EEPROM_LENGTH' attribute might
seem redundant, it allows the kernel to validate
'ETHTOOL_MSG_MODULE_EEPROM_SET' messages in a similar fashion to
'ETHTOOL_MSG_MODULE_EEPROM_GET' messages. In addition, it adds another
confirmation regarding the number of bytes to write to the module
EEPROM.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 Documentation/networking/ethtool-netlink.rst |  47 +++++++
 include/linux/ethtool.h                      |  21 ++-
 include/uapi/linux/ethtool_netlink.h         |   2 +
 net/ethtool/eeprom.c                         | 133 +++++++++++++++++++
 net/ethtool/netlink.c                        |   7 +
 net/ethtool/netlink.h                        |   2 +
 6 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 6ea91e41593f..b7da1f5a9a9d 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -212,6 +212,7 @@ Userspace to kernel:
   ``ETHTOOL_MSG_FEC_SET``               set FEC settings
   ``ETHTOOL_MSG_MODULE_EEPROM_GET``     read SFP module EEPROM
   ``ETHTOOL_MSG_STATS_GET``             get standard statistics
+  ``ETHTOOL_MSG_MODULE_EEPROM_SET``     write to SFP module EEPROM
   ===================================== ================================
 
 Kernel to userspace:
@@ -250,6 +251,7 @@ Kernel to userspace:
   ``ETHTOOL_MSG_FEC_NTF``                  FEC settings
   ``ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY``  read SFP module EEPROM
   ``ETHTOOL_MSG_STATS_GET_REPLY``          standard statistics
+  ``ETHTOOL_MSG_MODULE_EEPROM_NTF``        SFP module EEPROM settings
   ======================================== =================================
 
 ``GET`` requests are sent by userspace applications to retrieve device
@@ -1477,6 +1479,50 @@ Low and high bounds are inclusive, for example:
  etherStatsPkts512to1023Octets 512  1023
  ============================= ==== ====
 
+MODULE_EEPROM_SET
+=================
+
+Write to SFP module EEPROM.
+This interface is designed to allow writes of at most 1/2 page at once. This
+means only writes of 128 (or less) bytes are allowed, without crossing half
+page boundary located at offset 128. For pages other than 0 only high 128 bytes
+are accessible.
+
+Request contents:
+
+  =======================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_EEPROM_HEADER``       nested  request header
+  ``ETHTOOL_A_MODULE_EEPROM_OFFSET``       u32     offset within a page
+  ``ETHTOOL_A_MODULE_EEPROM_LENGTH``       u32     amount of bytes to write
+  ``ETHTOOL_A_MODULE_EEPROM_PAGE``         u8      page number
+  ``ETHTOOL_A_MODULE_EEPROM_BANK``         u8      bank number
+  ``ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS``  u8      page I2C address
+  ``ETHTOOL_A_MODULE_EEPROM_DATA``         binary  array of bytes to write
+  =======================================  ======  ==========================
+
+The amount of bytes to write encoded in ``ETHTOOL_A_MODULE_EEPROM_LENGTH`` must
+match the length of the payload of ``ETHTOOL_A_MODULE_EEPROM_DATA``.
+
+If ``ETHTOOL_A_MODULE_EEPROM_BANK`` is not specified, bank 0 is assumed.
+
+Upon a successful write, a ``ETHTOOL_MSG_MODULE_EEPROM_NTF`` notification is
+sent to user space.
+
+Notification contents:
+
+  =======================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_EEPROM_HEADER``       nested  reply header
+  ``ETHTOOL_A_MODULE_EEPROM_OFFSET``       u32     offset within a page
+  ``ETHTOOL_A_MODULE_EEPROM_LENGTH``       u32     amount of bytes written
+  ``ETHTOOL_A_MODULE_EEPROM_PAGE``         u8      page number
+  ``ETHTOOL_A_MODULE_EEPROM_BANK``         u8      bank number
+  ``ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS``  u8      page I2C address
+  =======================================  ======  ==========================
+
+The notification does not include the ``ETHTOOL_A_MODULE_EEPROM_DATA``
+attribute in order to prevent exposing module EEPROM contents to users without
+admin privileges.
+
 Request translation
 ===================
 
@@ -1575,4 +1621,5 @@ are netlink only.
   n/a                                 ``ETHTOOL_MSG_CABLE_TEST_ACT``
   n/a                                 ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``
   n/a                                 ``ETHTOOL_MSG_TUNNEL_INFO_GET``
+  n/a                                 ``ETHTOOL_MSG_MODULE_EEPROM_SET``
   =================================== =====================================
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 29dbb603bc91..a91e31a1aeeb 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -388,17 +388,19 @@ struct ethtool_rmon_stats {
 #define ETH_MODULE_MAX_I2C_ADDRESS	0x7f
 
 /**
- * struct ethtool_module_eeprom - EEPROM dump from specified page
- * @offset: Offset within the specified EEPROM page to begin read, in bytes.
- * @length: Number of bytes to read.
- * @page: Page number to read from.
- * @bank: Page bank number to read from, if applicable by EEPROM spec.
+ * struct ethtool_module_eeprom - EEPROM read / write from / to specified page
+ * @offset: Offset within the specified EEPROM page to begin read / write,
+ *	in bytes.
+ * @length: Number of bytes to read / write.
+ * @page: Page number to read / write from / to.
+ * @bank: Page bank number to read / write from /to , if applicable by EEPROM
+ *	spec.
  * @i2c_address: I2C address of a page. Value less than 0x7f expected. Most
  *	EEPROMs use 0x50 or 0x51.
  * @data: Pointer to buffer with EEPROM data of @length size.
  *
- * This can be used to manage pages during EEPROM dump in ethtool and pass
- * required information to the driver.
+ * This can be used to manage pages during EEPROM read / write in ethtool and
+ * pass required information to the driver.
  */
 struct ethtool_module_eeprom {
 	u32	offset;
@@ -574,6 +576,8 @@ struct ethtool_module_eeprom {
  * @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics.
  * @get_rmon_stats: Query some of the RMON (RFC 2819) statistics.
  *	Set %ranges to a pointer to zero-terminated array of byte ranges.
+ * @set_module_eeprom_by_page: Write to a region of plug-in module EEPROM.
+ *	Returns a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -693,6 +697,9 @@ struct ethtool_ops {
 	void	(*get_rmon_stats)(struct net_device *dev,
 				  struct ethtool_rmon_stats *rmon_stats,
 				  const struct ethtool_rmon_hist_range **ranges);
+	int	(*set_module_eeprom_by_page)(struct net_device *dev,
+					     const struct ethtool_module_eeprom *page,
+					     struct netlink_ext_ack *extack);
 };
 
 int ethtool_check_ops(const struct ethtool_ops *ops);
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index c7135c9c37a5..9cf2f5a860ba 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -46,6 +46,7 @@ enum {
 	ETHTOOL_MSG_FEC_SET,
 	ETHTOOL_MSG_MODULE_EEPROM_GET,
 	ETHTOOL_MSG_STATS_GET,
+	ETHTOOL_MSG_MODULE_EEPROM_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -88,6 +89,7 @@ enum {
 	ETHTOOL_MSG_FEC_NTF,
 	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
 	ETHTOOL_MSG_STATS_GET_REPLY,
+	ETHTOOL_MSG_MODULE_EEPROM_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
diff --git a/net/ethtool/eeprom.c b/net/ethtool/eeprom.c
index 945b95e64f0d..dc8ffa67b249 100644
--- a/net/ethtool/eeprom.c
+++ b/net/ethtool/eeprom.c
@@ -250,3 +250,136 @@ const struct nla_policy ethnl_module_eeprom_get_policy[] = {
 		NLA_POLICY_RANGE(NLA_U8, 0, ETH_MODULE_MAX_I2C_ADDRESS),
 };
 
+const struct nla_policy ethnl_module_eeprom_set_policy[] = {
+	[ETHTOOL_A_MODULE_EEPROM_HEADER]	= NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_MODULE_EEPROM_OFFSET]	=
+		NLA_POLICY_MAX(NLA_U32, ETH_MODULE_EEPROM_PAGE_LEN * 2 - 1),
+	[ETHTOOL_A_MODULE_EEPROM_LENGTH]	=
+		NLA_POLICY_RANGE(NLA_U32, 1, ETH_MODULE_EEPROM_PAGE_LEN),
+	[ETHTOOL_A_MODULE_EEPROM_PAGE]		= { .type = NLA_U8 },
+	[ETHTOOL_A_MODULE_EEPROM_BANK]		= { .type = NLA_U8 },
+	[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS]	=
+		NLA_POLICY_RANGE(NLA_U8, 0, ETH_MODULE_MAX_I2C_ADDRESS),
+	[ETHTOOL_A_MODULE_EEPROM_DATA]		=
+		NLA_POLICY_RANGE(NLA_BINARY, 1, ETH_MODULE_EEPROM_PAGE_LEN),
+};
+
+static int ethnl_module_eeprom_ntf(struct net_device *dev,
+				   const struct ethtool_module_eeprom *page)
+{
+	struct sk_buff *skb;
+	void *ehdr;
+	int err;
+
+	skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	ehdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_EEPROM_NTF);
+	if (!ehdr) {
+		err = -EMSGSIZE;
+		goto out;
+	}
+
+	err = ethnl_fill_reply_header(skb, dev, ETHTOOL_A_MODULE_EEPROM_HEADER);
+	if (err)
+		goto out;
+
+	err = nla_put_u32(skb, ETHTOOL_A_MODULE_EEPROM_OFFSET, page->offset);
+	if (err)
+		goto out;
+
+	err = nla_put_u32(skb, ETHTOOL_A_MODULE_EEPROM_LENGTH, page->length);
+	if (err)
+		goto out;
+
+	err = nla_put_u8(skb, ETHTOOL_A_MODULE_EEPROM_PAGE, page->page);
+	if (err)
+		goto out;
+
+	err = nla_put_u8(skb, ETHTOOL_A_MODULE_EEPROM_BANK, page->bank);
+	if (err)
+		goto out;
+
+	err = nla_put_u8(skb, ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS,
+			 page->i2c_address);
+	if (err)
+		goto out;
+
+	genlmsg_end(skb, ehdr);
+
+	return ethnl_multicast(skb, dev);
+
+out:
+	nlmsg_free(skb);
+	return err;
+}
+
+int ethnl_set_module_eeprom(struct sk_buff *skb, struct genl_info *info)
+{
+	struct ethtool_module_eeprom page = {};
+	struct ethnl_req_info req_info = {};
+	struct nlattr **tb = info->attrs;
+	const struct ethtool_ops *ops;
+	struct net_device *dev;
+	int ret;
+
+	if (!tb[ETHTOOL_A_MODULE_EEPROM_OFFSET] ||
+	    !tb[ETHTOOL_A_MODULE_EEPROM_LENGTH] ||
+	    !tb[ETHTOOL_A_MODULE_EEPROM_PAGE] ||
+	    !tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS] ||
+	    !tb[ETHTOOL_A_MODULE_EEPROM_DATA])
+		return -EINVAL;
+
+	if (nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_LENGTH]) !=
+	    nla_len(tb[ETHTOOL_A_MODULE_EEPROM_DATA]))
+		NL_SET_ERR_MSG_ATTR(info->extack,
+				    tb[ETHTOOL_A_MODULE_EEPROM_LENGTH],
+				    "data length does not match specified length");
+
+	ret = eeprom_validate(tb, info->extack);
+	if (ret < 0)
+		return ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info,
+					 tb[ETHTOOL_A_MODULE_EEPROM_HEADER],
+					 genl_info_net(info), info->extack,
+					 true);
+	if (ret < 0)
+		return ret;
+	dev = req_info.dev;
+	ops = dev->ethtool_ops;
+	ret = -EOPNOTSUPP;
+	if (!ops->set_module_eeprom_by_page)
+		goto out_dev;
+
+	page.offset = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_OFFSET]);
+	page.length = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_LENGTH]);
+	page.page = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_PAGE]);
+	page.i2c_address = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS]);
+	page.data = nla_data(tb[ETHTOOL_A_MODULE_EEPROM_DATA]);
+	if (tb[ETHTOOL_A_MODULE_EEPROM_BANK])
+		page.bank = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_BANK]);
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		goto out_rtnl;
+
+	ret = dev->ethtool_ops->set_module_eeprom_by_page(dev, &page,
+							  info->extack);
+	if (ret < 0)
+		goto out_ops;
+
+	ethnl_module_eeprom_ntf(dev, &page);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+out_dev:
+	dev_put(dev);
+	return ret;
+}
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index a7346346114f..71afb80fda20 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -958,6 +958,13 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_stats_get_policy,
 		.maxattr = ARRAY_SIZE(ethnl_stats_get_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_MODULE_EEPROM_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_module_eeprom,
+		.policy	= ethnl_module_eeprom_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_module_eeprom_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 3e25a47fd482..5540d5713d33 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -382,6 +382,7 @@ extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
+extern const struct nla_policy ethnl_module_eeprom_set_policy[ETHTOOL_A_MODULE_EEPROM_DATA + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info);
@@ -400,6 +401,7 @@ int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);
 int ethnl_tunnel_info_start(struct netlink_callback *cb);
 int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_module_eeprom(struct sk_buff *skb, struct genl_info *info);
 
 extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next 4/4] mlxsw: core: Add support for module EEPROM write by page
  2021-06-23  7:59 [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Ido Schimmel
                   ` (2 preceding siblings ...)
  2021-06-23  7:59 ` [RFC PATCH net-next 3/4] ethtool: Add ability to write to transceiver module EEPROM Ido Schimmel
@ 2021-06-23  7:59 ` Ido Schimmel
  2021-06-23 18:44 ` [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Andrew Lunn
  4 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-23  7:59 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, andrew, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add support for ethtool_ops::set_module_eeprom_by_page() which allows
user space to write to transceiver module EEPROM based on passed
parameters.

The I2C address is not validated in order to avoid module-specific code.
In case of wrong address, error will be returned from device's firmware.

Tested by writing to page 3 (User EEPROM) on a QSFP-DD module:

 # ethtool -m swp11 offset 0x80 length 3 page 3 bank 0 i2c 0x50
 Offset          Values
 ------          ------
 0x0080:         00 00 00
 # ethtool -M swp11 offset 0x80 page 3 bank 0 i2c 0x50 value 0x44
 # ethtool -M swp11 offset 0x81 page 3 bank 0 i2c 0x50 value 0x41
 # ethtool -M swp11 offset 0x82 page 3 bank 0 i2c 0x50 value 0x44
 # ethtool -m swp11 offset 0x80 length 3 page 3 bank 0 i2c 0x50
 Offset          Values
 ------          ------
 0x0080:         44 41 44

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 .../net/ethernet/mellanox/mlxsw/core_env.c    | 44 +++++++++++++++++++
 .../net/ethernet/mellanox/mlxsw/core_env.h    |  5 +++
 drivers/net/ethernet/mellanox/mlxsw/minimal.c | 13 ++++++
 .../mellanox/mlxsw/spectrum_ethtool.c         | 14 ++++++
 4 files changed, 76 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
index 3713c45cfa1e..4cb69eddbd1e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_env.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -389,6 +389,50 @@ mlxsw_env_get_module_eeprom_by_page(struct mlxsw_core *mlxsw_core, u8 module,
 }
 EXPORT_SYMBOL(mlxsw_env_get_module_eeprom_by_page);
 
+int
+mlxsw_env_set_module_eeprom_by_page(struct mlxsw_core *mlxsw_core, u8 module,
+				    const struct ethtool_module_eeprom *page,
+				    struct netlink_ext_ack *extack)
+{
+	u32 bytes_written = 0;
+	u16 device_addr;
+
+	/* Offset cannot be larger than 2 * ETH_MODULE_EEPROM_PAGE_LEN */
+	device_addr = page->offset;
+
+	while (bytes_written < page->length) {
+		char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE] = {};
+		char mcia_pl[MLXSW_REG_MCIA_LEN];
+		u8 size;
+		int err;
+
+		size = min_t(u8, page->length - bytes_written,
+			     MLXSW_REG_MCIA_EEPROM_SIZE);
+
+		mlxsw_reg_mcia_pack(mcia_pl, module, 0, page->page,
+				    device_addr + bytes_written, size,
+				    page->i2c_address);
+		mlxsw_reg_mcia_bank_number_set(mcia_pl, page->bank);
+		memcpy(eeprom_tmp, page->data + bytes_written, size);
+		mlxsw_reg_mcia_eeprom_memcpy_to(mcia_pl, eeprom_tmp);
+
+		err = mlxsw_reg_write(mlxsw_core, MLXSW_REG(mcia), mcia_pl);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack, "Failed to access module's EEPROM");
+			return err;
+		}
+
+		err = mlxsw_env_mcia_status_process(mcia_pl, extack);
+		if (err)
+			return err;
+
+		bytes_written += size;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mlxsw_env_set_module_eeprom_by_page);
+
 static int mlxsw_env_module_has_temp_sensor(struct mlxsw_core *mlxsw_core,
 					    u8 module,
 					    bool *p_has_temp_sensor)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.h b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
index 0bf5bd0f8a7e..e07f48ffbf2b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_env.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
@@ -24,6 +24,11 @@ mlxsw_env_get_module_eeprom_by_page(struct mlxsw_core *mlxsw_core, u8 module,
 				    const struct ethtool_module_eeprom *page,
 				    struct netlink_ext_ack *extack);
 
+int
+mlxsw_env_set_module_eeprom_by_page(struct mlxsw_core *mlxsw_core, u8 module,
+				    const struct ethtool_module_eeprom *page,
+				    struct netlink_ext_ack *extack);
+
 int
 mlxsw_env_module_overheat_counter_get(struct mlxsw_core *mlxsw_core, u8 module,
 				      u64 *p_counter);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/minimal.c b/drivers/net/ethernet/mellanox/mlxsw/minimal.c
index d9d56c44e994..b795520566bb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/minimal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/minimal.c
@@ -124,11 +124,24 @@ mlxsw_m_get_module_eeprom_by_page(struct net_device *netdev,
 						   page, extack);
 }
 
+static int
+mlxsw_m_set_module_eeprom_by_page(struct net_device *netdev,
+				  const struct ethtool_module_eeprom *page,
+				  struct netlink_ext_ack *extack)
+{
+	struct mlxsw_m_port *mlxsw_m_port = netdev_priv(netdev);
+	struct mlxsw_core *core = mlxsw_m_port->mlxsw_m->core;
+
+	return mlxsw_env_set_module_eeprom_by_page(core, mlxsw_m_port->module,
+						   page, extack);
+}
+
 static const struct ethtool_ops mlxsw_m_port_ethtool_ops = {
 	.get_drvinfo		= mlxsw_m_module_get_drvinfo,
 	.get_module_info	= mlxsw_m_get_module_info,
 	.get_module_eeprom	= mlxsw_m_get_module_eeprom,
 	.get_module_eeprom_by_page = mlxsw_m_get_module_eeprom_by_page,
+	.set_module_eeprom_by_page = mlxsw_m_set_module_eeprom_by_page,
 };
 
 static int
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
index 267590a0eee7..2e7f57a6fbb2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
@@ -1063,6 +1063,19 @@ mlxsw_sp_get_module_eeprom_by_page(struct net_device *dev,
 						   extack);
 }
 
+static int
+mlxsw_sp_set_module_eeprom_by_page(struct net_device *dev,
+				   const struct ethtool_module_eeprom *page,
+				   struct netlink_ext_ack *extack)
+{
+	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	u8 module = mlxsw_sp_port->mapping.module;
+
+	return mlxsw_env_set_module_eeprom_by_page(mlxsw_sp->core, module, page,
+						   extack);
+}
+
 static int
 mlxsw_sp_get_ts_info(struct net_device *netdev, struct ethtool_ts_info *info)
 {
@@ -1213,6 +1226,7 @@ const struct ethtool_ops mlxsw_sp_port_ethtool_ops = {
 	.get_module_info		= mlxsw_sp_get_module_info,
 	.get_module_eeprom		= mlxsw_sp_get_module_eeprom,
 	.get_module_eeprom_by_page	= mlxsw_sp_get_module_eeprom_by_page,
+	.set_module_eeprom_by_page	= mlxsw_sp_set_module_eeprom_by_page,
 	.get_ts_info			= mlxsw_sp_get_ts_info,
 	.get_eth_phy_stats		= mlxsw_sp_get_eth_phy_stats,
 	.get_eth_mac_stats		= mlxsw_sp_get_eth_mac_stats,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-23  7:59 [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Ido Schimmel
                   ` (3 preceding siblings ...)
  2021-06-23  7:59 ` [RFC PATCH net-next 4/4] mlxsw: core: Add support for module EEPROM write by page Ido Schimmel
@ 2021-06-23 18:44 ` Andrew Lunn
  2021-06-24 19:38   ` Ido Schimmel
  4 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2021-06-23 18:44 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

On Wed, Jun 23, 2021 at 10:59:21AM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> This patchset adds write support to transceiver module EEPROMs by
> extending the ethtool netlink API.
> 
> Motivation
> ==========
> 
> The kernel can currently dump the contents of module EEPROMs to user
> space via the ethtool legacy ioctl API or the new netlink API. These
> dumps can then be parsed by ethtool(8) according to the specification
> that defines the memory map of the EEPROM. For example, SFF-8636 [1] for
> QSFP and CMIS [2] for QSFP-DD.
> 
> In addition to read-only elements, these specifications also define
> writeable elements that can be used to control the behavior of the
> module. For example, controlling whether the module is put in low or
> high power mode to limit its power consumption.

Hi Ido

So power control is part of the specification? All CMIS devices should
implement it the same.

> The CMIS specification even defines a message exchange mechanism (CDB,
> Command Data Block) on top of the module's memory map. This allows the
> host to send various commands to the module. For example, to update its
> firmware.
 
> This approach allows the kernel to remain ignorant of the various
> standards and avoids the need to constantly update the kernel to support
> new registers / commands. More importantly, it allows advanced
> functionality such as firmware update to be implemented once in user
> space and shared across all the drivers that support read and write
> access to module EEPROMs.

I fear we are opening the door for user space binary blob drivers,
which do much more than just firmware upgrade. You seems to say that
power control is part of the CMIS standard. So we don't need userspace
involved for that. We can implement a library which any MAC driver can
share.

I also wonder if it is safe to perform firmware upgrade from user
space? I've not looked at the code yet, but i assume you only allow
write when the interface is down? Otherwise isn't there a danger you
can brick the SFP if the MAC accesses it at the same time as when an
upgrade is happening?

Do you have other concrete use cases for write from user space?

In general, we don't allow uncontrolled access to hardware from user
space. We add specific, well documented API calls, and expect kernel
drivers to implement them. I don't see why SFPs should be
different. Standardised parts we can implement in common code. None
standard parts we need device/vendor specific code. Which just means
we need drivers following the usual linux conventions, some sort of
bus driver which reads the vendor/device ID from the EEPROM and probes
a driver for the specific SFP.

  Andrew




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-23 18:44 ` [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Andrew Lunn
@ 2021-06-24 19:38   ` Ido Schimmel
  2021-06-24 20:27     ` Andrew Lunn
  2021-06-29 17:27     ` Jakub Kicinski
  0 siblings, 2 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-24 19:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

On Wed, Jun 23, 2021 at 08:44:57PM +0200, Andrew Lunn wrote:
> On Wed, Jun 23, 2021 at 10:59:21AM +0300, Ido Schimmel wrote:
> > From: Ido Schimmel <idosch@nvidia.com>
> > 
> > This patchset adds write support to transceiver module EEPROMs by
> > extending the ethtool netlink API.
> > 
> > Motivation
> > ==========
> > 
> > The kernel can currently dump the contents of module EEPROMs to user
> > space via the ethtool legacy ioctl API or the new netlink API. These
> > dumps can then be parsed by ethtool(8) according to the specification
> > that defines the memory map of the EEPROM. For example, SFF-8636 [1] for
> > QSFP and CMIS [2] for QSFP-DD.
> > 
> > In addition to read-only elements, these specifications also define
> > writeable elements that can be used to control the behavior of the
> > module. For example, controlling whether the module is put in low or
> > high power mode to limit its power consumption.
> 
> Hi Ido

Hi Andrew,

Thanks for the feedback.

> 
> So power control is part of the specification?

Yes. See "6.3.2.4 Module Power Mode" in CMIS.

> All CMIS devices should implement it the same.

The implementation inside the module will probably differ between
different vendors, but the interface towards the host (Linux) will be
the same as it is standardized by CMIS.

> 
> > The CMIS specification even defines a message exchange mechanism (CDB,
> > Command Data Block) on top of the module's memory map. This allows the
> > host to send various commands to the module. For example, to update its
> > firmware.
>  
> > This approach allows the kernel to remain ignorant of the various
> > standards and avoids the need to constantly update the kernel to support
> > new registers / commands. More importantly, it allows advanced
> > functionality such as firmware update to be implemented once in user
> > space and shared across all the drivers that support read and write
> > access to module EEPROMs.
> 
> I fear we are opening the door for user space binary blob drivers,

No need for multiple drivers. The whole point of these standards is that
a single implementation on the host will work across different modules
from different vendors. I expect that ethtool(8) will have an
implementation which other projects can then use as a reference. Similar
to how iproute2 is used as a reference by developers of various
interface managers / routing daemons.

> which do much more than just firmware upgrade.

CMIS indeed allows for much more than firmware update. But again, it is
all standardized to ensure a single implementation on the host will
suffice.

> You seems to say that power control is part of the CMIS standard. So
> we don't need userspace involved for that. We can implement a library
> which any MAC driver can share.

I fail to understand this logic. I would understand pushing
functionality into the kernel in order to create an abstraction for user
space over different hardware interfaces from different vendors. This is
not the case here. Nothing is vendor specific. Not to the host vendor
nor to the module vendor.

Pushing all this functionality into the kernel will basically mean
creating an abstraction over an abstraction. Practically, it means
bloating the kernel with several thousands LoC (maybe more) and a lot of
new user APIs which we will never be able to remove. TBH, I cannot
explain to myself what we stand to gain from that.

> 
> I also wonder if it is safe to perform firmware upgrade from user
> space?

I have yet to write the relevant code in ethtool(8), but it should be
safe. See more below.

> I've not looked at the code yet, but i assume you only allow
> write when the interface is down?

No. Taking it down will mean needless packet loss for the entire time of
the firmware download process. After the download process, you need to
activate the new image from one of the banks (A/B) via CDB "Run Image"
command, but even that can be hit less. See "HitlessRestart" bit which
can be queried using "Firmware Management Features" command:

"
0: CMD Run Image causes a reset. Traffic is affected.
1: CMD Run Image may reset, but module will do its best to maintain
traffic and management states. Data path functions are not reset.
"

Can be implemented in ethtool(8) using commands such as these:

# ethtool --module-fw-query DEVNAME
# ethtool --module-fw-dl DEVNAME FILENAME
# ethtool --module-fw-run DEVNAME

Just a quick example. Yet to design / implement this functionality.
Wanted to discuss the kernel interface before starting to pile user
space functionality on top.

> Otherwise isn't there a danger you can brick the SFP if the MAC
> accesses it at the same time as when an upgrade is happening?

No. You download the firmware image to a bank, verify it was downloaded
correctly and then activate the new image (without affecting traffic,
potentially).

> 
> Do you have other concrete use cases for write from user space?

The first priority on my list is related to SFF-8636 (QSFP).
Specifically, ability to monitor signal to noise ratio and laser
temperature. This is done via High Page 20h that can monitor up to 24
parameters. Since the number of parameters that can be monitored is much
larger than 24, you need to instruct the module which parameters to
monitor. This is done by writing to the "Param Configuration" table on
this page. See "6.7.2.5 Parameter Configuration Registers".

> 
> In general, we don't allow uncontrolled access to hardware from user
> space. We add specific, well documented API calls, and expect kernel
> drivers to implement them. I don't see why SFPs should be
> different. Standardised parts we can implement in common code. None
> standard parts we need device/vendor specific code. Which just means
> we need drivers following the usual linux conventions, some sort of
> bus driver which reads the vendor/device ID from the EEPROM and probes
> a driver for the specific SFP.

I think I touched on all these points above. The symmetric GET interface
was implemented following your good advice [1] from a year ago in order
to avoid two parsers, in user space and kernel. Creating a netlink API
for every little writeable element in these standards will not only mean
that the kernel will need to be intimately familiar with these memory
maps, but also that it will need to be constantly patched with new user
APIs which we will never be able to remove.

Given the above is entirely avoidable with a single portable
implementation in ethtool(8), I do not think we should go with a kernel
implementation.

[1] https://lore.kernel.org/netdev/20200630002159.GA597495@lunn.ch/

> 
>   Andrew
> 
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-24 19:38   ` Ido Schimmel
@ 2021-06-24 20:27     ` Andrew Lunn
  2021-06-27 10:33       ` Ido Schimmel
  2021-06-29 17:27     ` Jakub Kicinski
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2021-06-24 20:27 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

> I fail to understand this logic. I would understand pushing
> functionality into the kernel in order to create an abstraction for user
> space over different hardware interfaces from different vendors. This is
> not the case here. Nothing is vendor specific. Not to the host vendor
> nor to the module vendor.

Hi Ido

My worry is, we are opening up an ideal vector for user space drivers
for SFPs. And worse still, closed source user space drivers. We have
had great success with switchdev, over a hundred supported switches,
partially because we have always pushed back against kAPIs which allow
user space driver, closed binary blobs etc.

We have the choice here. We can add a write method to the kAPI, add
open source code to Ethtool using that API, and just accept people are
going to abuse the API for all sorts of horrible things in user space.
Or we can add more restrictive kAPIs, put more code in the kernel, and
probably limit user space doing horrible things. Maybe as a side
effect, SFP vendors contribute some open source code, rather than
binary blobs?

I tend to disagree about adding kAPIs which allow write. But i would
like to hear other peoples opinions on this.

     Andrew


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-24 20:27     ` Andrew Lunn
@ 2021-06-27 10:33       ` Ido Schimmel
  2021-06-27 15:12         ` Andrew Lunn
  2021-06-29 20:12         ` Pali Rohár
  0 siblings, 2 replies; 15+ messages in thread
From: Ido Schimmel @ 2021-06-27 10:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

On Thu, Jun 24, 2021 at 10:27:13PM +0200, Andrew Lunn wrote:
> > I fail to understand this logic. I would understand pushing
> > functionality into the kernel in order to create an abstraction for user
> > space over different hardware interfaces from different vendors. This is
> > not the case here. Nothing is vendor specific. Not to the host vendor
> > nor to the module vendor.
> 
> Hi Ido

Hi Andrew,

> 
> My worry is, we are opening up an ideal vector for user space drivers
> for SFPs. And worse still, closed source user space drivers. We have
> had great success with switchdev, over a hundred supported switches,
> partially because we have always pushed back against kAPIs which allow
> user space driver, closed binary blobs etc.

I don't think it's a correct comparison. Switch ASICs don't have a
standardized interface towards the host. It is therefore essential that
the kernel will abstract these differences to user space.

> 
> We have the choice here. We can add a write method to the kAPI, add
> open source code to Ethtool using that API, and just accept people are
> going to abuse the API for all sorts of horrible things in user space.
> Or we can add more restrictive kAPIs, put more code in the kernel, and
> probably limit user space doing horrible things. Maybe as a side
> effect, SFP vendors contribute some open source code, rather than
> binary blobs?

I didn't see any code or binary blobs from SFP vendors and I'm not sure
how they can provide these either. Their goal is - I believe - to sell
as much modules as possible to what the standard calls "systems
manufactures" / "system integrators". Therefore, they cannot make any
assumptions about the I2C connectivity (whether to the ASIC or the CPU),
the operating system running on the host and the user interface (ioctl /
netlink etc).

Given all these moving parts, I don't see how they can provide any
tooling. It is in their best interest to simply follow the standard and
make the tooling a problem of the "systems manufactures" / "system
integrators". In fact, the user who requested this functionality claims:
"the cable vendors don't develop the tools to burn the FW since the
vendors claim that the CMIS is supported". The user also confirmed that
another provider "is able to burn the FW for the cables from different
vendors".

> 
> I tend to disagree about adding kAPIs which allow write. But i would
> like to hear other peoples opinions on this.
> 
>      Andrew
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-27 10:33       ` Ido Schimmel
@ 2021-06-27 15:12         ` Andrew Lunn
  2021-06-28  7:33           ` Ido Schimmel
  2021-06-29 20:12         ` Pali Rohár
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2021-06-27 15:12 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

On Sun, Jun 27, 2021 at 01:33:13PM +0300, Ido Schimmel wrote:
> On Thu, Jun 24, 2021 at 10:27:13PM +0200, Andrew Lunn wrote:
> > > I fail to understand this logic. I would understand pushing
> > > functionality into the kernel in order to create an abstraction for user
> > > space over different hardware interfaces from different vendors. This is
> > > not the case here. Nothing is vendor specific. Not to the host vendor
> > > nor to the module vendor.
> > 
> > Hi Ido
> 
> Hi Andrew,
> 
> > 
> > My worry is, we are opening up an ideal vector for user space drivers
> > for SFPs. And worse still, closed source user space drivers. We have
> > had great success with switchdev, over a hundred supported switches,
> > partially because we have always pushed back against kAPIs which allow
> > user space driver, closed binary blobs etc.
> 
> I don't think it's a correct comparison. Switch ASICs don't have a
> standardized interface towards the host. It is therefore essential that
> the kernel will abstract these differences to user space.
> 
> > 
> > We have the choice here. We can add a write method to the kAPI, add
> > open source code to Ethtool using that API, and just accept people are
> > going to abuse the API for all sorts of horrible things in user space.
> > Or we can add more restrictive kAPIs, put more code in the kernel, and
> > probably limit user space doing horrible things. Maybe as a side
> > effect, SFP vendors contribute some open source code, rather than
> > binary blobs?
> 
> I didn't see any code or binary blobs from SFP vendors and I'm not sure
> how they can provide these either. Their goal is - I believe - to sell
> as much modules as possible to what the standard calls "systems
> manufactures" / "system integrators". Therefore, they cannot make any
> assumptions about the I2C connectivity (whether to the ASIC or the CPU),
> the operating system running on the host and the user interface (ioctl /
> netlink etc).
> 
> Given all these moving parts, I don't see how they can provide any
> tooling. It is in their best interest to simply follow the standard and
> make the tooling a problem of the "systems manufactures" / "system
> integrators". In fact, the user who requested this functionality claims:
> "the cable vendors don't develop the tools to burn the FW since the
> vendors claim that the CMIS is supported". The user also confirmed that
> another provider "is able to burn the FW for the cables from different
> vendors".

Hi Ido

This API is not just about CMIS, it covers any I2C connected SFP
device. I'm more involved in the lower end, 1G, 2.5G and 10G. Devices
in this category seem to be very bad a following the standards. GPON
is especially bad, and GPON manufactures don't seem to care their
devices don't follow the standard, they assume the Customer Premises
Equipment is going to run software to work around whatever issues
their specific GPON has, maybe they provide driver code? The API you
are adding would be ideal for putting that driver in user space, as a
binary blob. That is going to make it harder for us to open up the
many millions of CPE used in FTTH. And there are people attempting to
do that.

If devices following CMIS really are closely following the standard
that is great. We should provide tooling to do firmware upgrade. But
at the same time, we don't want to aid those who go against the
standards and do their own thing. And it sounds like in the CMIS
world, we might have the power to encourage vendors to follow CMIS,
"Look, firmware upgrade just works for the competitors devices, why
should i use your device when it does not work?"

I just want to make sure we are considering the full range of devices
this new API will cover. From little ARM systems with 1G copper and
FTTH fibre ports through to big TOR systems with large number of 100G
ports.  If CMIS is well support by vendors, putting the code into the
kernel, as a loadable module, might be the better solution for the
whole range of devices the kernel needs to support.

      Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-27 15:12         ` Andrew Lunn
@ 2021-06-28  7:33           ` Ido Schimmel
  2021-06-29 13:47             ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: Ido Schimmel @ 2021-06-28  7:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

On Sun, Jun 27, 2021 at 05:12:26PM +0200, Andrew Lunn wrote:
> This API is not just about CMIS, it covers any I2C connected SFP
> device. I'm more involved in the lower end, 1G, 2.5G and 10G. Devices
> in this category seem to be very bad a following the standards. GPON
> is especially bad, and GPON manufactures don't seem to care their
> devices don't follow the standard, they assume the Customer Premises
> Equipment is going to run software to work around whatever issues
> their specific GPON has, maybe they provide driver code? The API you
> are adding would be ideal for putting that driver in user space, as a
> binary blob. That is going to make it harder for us to open up the
> many millions of CPE used in FTTH. And there are people attempting to
> do that.
> 
> If devices following CMIS really are closely following the standard
> that is great. We should provide tooling to do firmware upgrade. But
> at the same time, we don't want to aid those who go against the
> standards and do their own thing. And it sounds like in the CMIS
> world, we might have the power to encourage vendors to follow CMIS,
> "Look, firmware upgrade just works for the competitors devices, why
> should i use your device when it does not work?"
> 
> I just want to make sure we are considering the full range of devices
> this new API will cover. From little ARM systems with 1G copper and
> FTTH fibre ports through to big TOR systems with large number of 100G
> ports.  If CMIS is well support by vendors, putting the code into the
> kernel, as a loadable module, might be the better solution for the
> whole range of devices the kernel needs to support.

If the goal is to limit what user space can do, then putting all the
code in the kernel and adding an ever-growing number of restrictive user
APIs is not the only way.

Even with the proposed approach, the kernel sits in the middle between
the module and user space. As such, it can maintain an "allow list" that
only allows access to modules with a specific memory map (CMIS and
SFF-8636 for now) and only to a subset of the pages which are
standardized by the specifications.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-28  7:33           ` Ido Schimmel
@ 2021-06-29 13:47             ` Andrew Lunn
  2021-06-29 19:44               ` Pali Rohár
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2021-06-29 13:47 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, vladyslavt, moshe, vadimp, mkubecek,
	mlxsw, Ido Schimmel

> Even with the proposed approach, the kernel sits in the middle between
> the module and user space. As such, it can maintain an "allow list" that
> only allows access to modules with a specific memory map (CMIS and
> SFF-8636 for now) and only to a subset of the pages which are
> standardized by the specifications.

Hi Ido

This seems like a reasonable compromise. But i would go further. Limit
it to just what is needed for firmware upgrade.

   Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-24 19:38   ` Ido Schimmel
  2021-06-24 20:27     ` Andrew Lunn
@ 2021-06-29 17:27     ` Jakub Kicinski
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2021-06-29 17:27 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, netdev, davem, jiri, vladyslavt, moshe, vadimp,
	mkubecek, mlxsw, Ido Schimmel

On Thu, 24 Jun 2021 22:38:27 +0300 Ido Schimmel wrote:
> > I also wonder if it is safe to perform firmware upgrade from user
> > space?  
> 
> I have yet to write the relevant code in ethtool(8), but it should be
> safe. See more below.

What's unclear to me is what difference does it make whether the code is
in kernel or in ethtool.git. If modules follow the standard the code
churn will be low. And we have to type the code up anyway it seems it
doesn't matter where it lies.

Where it does matter is if one wants to reuse existing user space built
on top of raw read/write interfaces.

IOW it shouldn't matter if we put the code in the kernel or user space
from effort perspective, but the latter is more risky.

You mention some limited sensor resources, isn't this exactly the
resource management the kernel is supposed to perform?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-29 13:47             ` Andrew Lunn
@ 2021-06-29 19:44               ` Pali Rohár
  0 siblings, 0 replies; 15+ messages in thread
From: Pali Rohár @ 2021-06-29 19:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, netdev, davem, kuba, jiri, vladyslavt, moshe,
	vadimp, mkubecek, mlxsw, Ido Schimmel

On Tuesday 29 June 2021 15:47:33 Andrew Lunn wrote:
> > Even with the proposed approach, the kernel sits in the middle between
> > the module and user space. As such, it can maintain an "allow list" that
> > only allows access to modules with a specific memory map (CMIS and
> > SFF-8636 for now) and only to a subset of the pages which are
> > standardized by the specifications.
> 
> Hi Ido
> 
> This seems like a reasonable compromise. But i would go further. Limit
> it to just what is needed for firmware upgrade.
> 
>    Andrew

Hello! If this is just because for CMIS firmware upgrade, what about
rather providing kernel driver for CMIS firmware upgrade?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs
  2021-06-27 10:33       ` Ido Schimmel
  2021-06-27 15:12         ` Andrew Lunn
@ 2021-06-29 20:12         ` Pali Rohár
  1 sibling, 0 replies; 15+ messages in thread
From: Pali Rohár @ 2021-06-29 20:12 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, netdev, davem, kuba, jiri, vladyslavt, moshe,
	vadimp, mkubecek, mlxsw, Ido Schimmel

On Sunday 27 June 2021 13:33:13 Ido Schimmel wrote:
> On Thu, Jun 24, 2021 at 10:27:13PM +0200, Andrew Lunn wrote:
> > We have the choice here. We can add a write method to the kAPI, add
> > open source code to Ethtool using that API, and just accept people are
> > going to abuse the API for all sorts of horrible things in user space.
> > Or we can add more restrictive kAPIs, put more code in the kernel, and
> > probably limit user space doing horrible things. Maybe as a side
> > effect, SFP vendors contribute some open source code, rather than
> > binary blobs?
> 
> I didn't see any code or binary blobs from SFP vendors and I'm not sure
> how they can provide these either. Their goal is - I believe - to sell
> as much modules as possible to what the standard calls "systems
> manufactures" / "system integrators". Therefore, they cannot make any
> assumptions about the I2C connectivity (whether to the ASIC or the CPU),
> the operating system running on the host and the user interface (ioctl /
> netlink etc).

Hello! This is really happening in GPON world. Most GPON SFP modules are
working only in few devices with SFP cages. And it is either because
GPON SFP module vendor provided to vendor of device with SFP cage how to
"hack", initialize and use that GPON module properly or because we have
figured out how particular GPON modules violate SFF standards and added
special quirks per SFP module or per chipset in SFP module.

And the other thing which is happening. If vendor of GPON SFP module is
the same as vendor of device with SFP cage then it is doing everything
to ensure that only its GPON SFP modules would work in its devices. Or
to ensure that its OLT station (opposite end of GPON client SFP module)
would link only with its branded GPON SFP modules.

Classic vendor lockin. If ISP is using OLT station from vendor A then A
wants that you cannot use GPON SFP modules from vendor B on that
network.

You said that vendor goal is to sell as much modules as possible. Seem
that this is truth and vendors are doing it by above vendor lockin
strategy only.


So at the end I really do not like raw RW access to SFP EEPROM. This
just opens a new door for vendor lockin and vendor blob strategy.

And the last thing is that rewriting EEPROM on arbitrary SFP module may
lead to total damage of SFP. Specially on SFPs with "computer inside"
where parts of (critical) memory is shadowed in SFP EEPROM space.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-29 20:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-23  7:59 [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Ido Schimmel
2021-06-23  7:59 ` [RFC PATCH net-next 1/4] ethtool: Extract module EEPROM attributes before validation Ido Schimmel
2021-06-23  7:59 ` [RFC PATCH net-next 2/4] ethtool: Split module EEPROM attributes validation to a function Ido Schimmel
2021-06-23  7:59 ` [RFC PATCH net-next 3/4] ethtool: Add ability to write to transceiver module EEPROM Ido Schimmel
2021-06-23  7:59 ` [RFC PATCH net-next 4/4] mlxsw: core: Add support for module EEPROM write by page Ido Schimmel
2021-06-23 18:44 ` [RFC PATCH net-next 0/4] ethtool: Add ability to write to transceiver module EEPROMs Andrew Lunn
2021-06-24 19:38   ` Ido Schimmel
2021-06-24 20:27     ` Andrew Lunn
2021-06-27 10:33       ` Ido Schimmel
2021-06-27 15:12         ` Andrew Lunn
2021-06-28  7:33           ` Ido Schimmel
2021-06-29 13:47             ` Andrew Lunn
2021-06-29 19:44               ` Pali Rohár
2021-06-29 20:12         ` Pali Rohár
2021-06-29 17:27     ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.