All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode
@ 2021-08-18 15:51 Ido Schimmel
  2021-08-18 15:51 ` [RFC PATCH net-next v2 1/6] " Ido Schimmel
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

This patchset extends the ethtool netlink API to allow user space to
control transceiver modules. Two specific APIs are added, but the plan
is to extend the interface with more APIs in the future (see "Future
plans").

This submission is a complete rework of a previous submission [1] that
tried to achieve the same goal by allowing user space to write to the
EEPROMs of these modules. It was rejected as it could have enabled user
space binary blob drivers.

However, the main issue is that by directly writing to some pages of
these EEPROMs, we are interfering with the entity that is controlling
the modules (kernel / device firmware). In addition, some functionality
cannot be implemented solely by writing to the EEPROM, as it requires
the assertion / de-assertion of hardware signals (e.g., "ResetL" pin in
SFF-8636).

Motivation
==========

The kernel can currently dump the contents of module EEPROMs to user
space via the ethtool legacy ioctl API or the new netlink API. These
dumps can then be parsed by ethtool(8) according to the specification
that defines the memory map of the EEPROM. For example, SFF-8636 [2] for
QSFP and CMIS [3] for QSFP-DD.

In addition to read-only elements, these specifications also define
writeable elements that can be used to control the behavior of the
module. For example, controlling whether the module is put in low or
high power mode to limit its power consumption.

The CMIS specification even defines a message exchange mechanism (CDB,
Command Data Block) on top of the module's memory map. This allows the
host to send various commands to the module. For example, to update its
firmware.

Implementation
==============

The ethtool netlink API is extended with two new messages,
'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that allow user
space to set and get transceiver module parameters. Specifically, the
'ETHTOOL_A_MODULE_POWER_MODE_POLICY' attribute allows user space to
control the power mode policy of the module in order to limit its power
consumption. See detailed description in patch #1.

The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).

The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the kernel.

Testing and introspection
=========================

See detailed description in patches #1 and #5.

Patchset overview
=================

Patch #1 adds the initial infrastructure in ethtool along with the
ability to control transceiver modules' power mode.

Patches #2-#3 add required device registers in mlxsw.

Patch #4 implements in mlxsw the ethtool operations added in patch #1.

Patch #5 adds extended link states in order to allow user space to
troubleshoot link down issues related to transceiver modules.

Patch #6 adds support for these extended states in mlxsw.

Future plans
============

* Extend 'ETHTOOL_MSG_MODULE_SET' to control Tx output among other
attributes.

* Add new ethtool message(s) to update firmware on transceiver modules.

* Extend ethtool(8) to parse more diagnostic information from CMIS
modules. No kernel changes required.

Changes since RFC v1 [4]:

* Split 'ETHTOOL_A_MODULE_LOW_POWER_ENABLED' into two attributes.
* Add 'high-on-up' power mode policy.
* Drop 'ETHTOOL_MSG_MODULE_RESET_ACT' in favor of existing ioctl
  interface.
* Add extended link states to help in troubleshooting.

Note: This series does not apply to net-next (it is RFC, anyway) because
I left out mundane infra work in mlxsw. Will submit it with reset
support as part of a separate set.

[1] https://lore.kernel.org/netdev/20210623075925.2610908-1-idosch@idosch.org/
[2] https://members.snia.org/document/dl/26418
[3] http://www.qsfp-dd.com/wp-content/uploads/2021/05/CMIS5p0.pdf
[4] https://lore.kernel.org/netdev/20210809102152.719961-1-idosch@idosch.org/

Ido Schimmel (6):
  ethtool: Add ability to control transceiver modules' power mode
  mlxsw: reg: Add Port Module Memory Map Properties register
  mlxsw: reg: Add Management Cable IO and Notifications register
  mlxsw: Add ability to control transceiver modules' power mode
  ethtool: Add transceiver module extended states
  mlxsw: Add support for transceiver module extended states

 Documentation/networking/ethtool-netlink.rst  |  78 ++++++-
 .../net/ethernet/mellanox/mlxsw/core_env.c    | 193 +++++++++++++++++-
 .../net/ethernet/mellanox/mlxsw/core_env.h    |  10 +
 drivers/net/ethernet/mellanox/mlxsw/minimal.c |  26 +++
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |  78 +++++++
 .../mellanox/mlxsw/spectrum_ethtool.c         |  73 ++++++-
 include/linux/ethtool.h                       |  26 +++
 include/uapi/linux/ethtool.h                  |  32 +++
 include/uapi/linux/ethtool_netlink.h          |  17 ++
 net/ethtool/Makefile                          |   2 +-
 net/ethtool/module.c                          | 191 +++++++++++++++++
 net/ethtool/netlink.c                         |  19 ++
 net/ethtool/netlink.h                         |   4 +
 13 files changed, 741 insertions(+), 8 deletions(-)
 create mode 100644 net/ethtool/module.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
@ 2021-08-18 15:51 ` Ido Schimmel
  2021-08-18 22:32   ` Jakub Kicinski
  2021-08-18 15:51 ` [RFC PATCH net-next v2 2/6] mlxsw: reg: Add Port Module Memory Map Properties register Ido Schimmel
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
modules parameters and retrieve their status.

The first parameter to control is the power mode of the module. It is
only relevant for paged memory modules, as flat memory modules always
operate in low power mode.

When a paged memory module is in low power mode, its power consumption
is reduced to the minimum, the management interface towards the host is
available and the data path is deactivated.

User space can choose to put modules that are not currently in use in
low power mode and transition them to high power mode before putting the
associated ports administratively up. This is useful for user space that
favors reduced power consumption and lower temperatures over reduced
link up times. In QSFP-DD modules the transition from low power mode to
high power mode can take a few seconds and this transition is only
expected to get longer with future / more complex modules.

User space can control the power mode of the module via the power mode
policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
values:

* low: Module is always in low power mode.

* high: Module is always in high power mode.

* high-on-up: Module is transitioned by the host to high power mode when
  the first port using it is put administratively up and to low power
  mode when the last port using it is put administratively down.

The operational power mode of the module is available to user space via
the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
reported to user space when a module is not plugged-in.

Transitioning into low power mode means loss of carrier, so an error is
returned when the port is administratively up.

The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).

The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the CPU.

CMIS testing
============

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

The module is not in low power mode, as it is not forced by hardware
(LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).

The power mode can be queried from the kernel. In case
LowPwrAllowRequestHW was on, the kernel would need to take into account
the state of the LowPwrRequestHW signal, which is not visible to user
space.

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'high-on-up':

 # ethtool --set-module swp11 power-mode-policy high-on-up

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high-on-up
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp11 up

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high-on-up
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp11 down

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high-on-up
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

SFF-8636 testing
================

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7733 mW / -1.12 dBm
 Transmit avg optical power (Channel 2)    : 0.7649 mW / -1.16 dBm
 Transmit avg optical power (Channel 3)    : 0.7790 mW / -1.08 dBm
 Transmit avg optical power (Channel 4)    : 0.7837 mW / -1.06 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9302 mW / -0.31 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9079 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.8993 mW / -0.46 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8778 mW / -0.57 dBm

The module is not in low power mode, as it is not forced by hardware
(Power override is on) or by software (Power set is off).

The power mode can be queried from the kernel. In case Power override
was off, the kernel would need to take into account the state of the
LPMode signal, which is not visible to user space.

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'high-on-up':

 # ethtool --set-module swp13 power-mode-policy high-on-up

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high-on-up
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp13 up

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high-on-up
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7934 mW / -1.01 dBm
 Transmit avg optical power (Channel 2)    : 0.7859 mW / -1.05 dBm
 Transmit avg optical power (Channel 3)    : 0.7885 mW / -1.03 dBm
 Transmit avg optical power (Channel 4)    : 0.7985 mW / -0.98 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9325 mW / -0.30 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9034 mW / -0.44 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.9086 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8885 mW / -0.51 dBm

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp13 down

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high-on-up
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 Documentation/networking/ethtool-netlink.rst |  66 ++++++-
 include/linux/ethtool.h                      |  25 +++
 include/uapi/linux/ethtool.h                 |  25 +++
 include/uapi/linux/ethtool_netlink.h         |  17 ++
 net/ethtool/Makefile                         |   2 +-
 net/ethtool/module.c                         | 191 +++++++++++++++++++
 net/ethtool/netlink.c                        |  19 ++
 net/ethtool/netlink.h                        |   4 +
 8 files changed, 346 insertions(+), 3 deletions(-)
 create mode 100644 net/ethtool/module.c

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index c86628e6a235..54a704370bfc 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -179,7 +179,7 @@ according to message purpose:
 
 Userspace to kernel:
 
-  ===================================== ================================
+  ===================================== =================================
   ``ETHTOOL_MSG_STRSET_GET``            get string set
   ``ETHTOOL_MSG_LINKINFO_GET``          get link settings
   ``ETHTOOL_MSG_LINKINFO_SET``          set link settings
@@ -213,7 +213,9 @@ Userspace to kernel:
   ``ETHTOOL_MSG_MODULE_EEPROM_GET``     read SFP module EEPROM
   ``ETHTOOL_MSG_STATS_GET``             get standard statistics
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET``       get PHC virtual clocks info
-  ===================================== ================================
+  ``ETHTOOL_MSG_MODULE_SET``            set transceiver module parameters
+  ``ETHTOOL_MSG_MODULE_GET``            get transceiver module parameters
+  ===================================== =================================
 
 Kernel to userspace:
 
@@ -252,6 +254,7 @@ Kernel to userspace:
   ``ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY``  read SFP module EEPROM
   ``ETHTOOL_MSG_STATS_GET_REPLY``          standard statistics
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY``    PHC virtual clocks info
+  ``ETHTOOL_MSG_MODULE_GET_REPLY``         transceiver module parameters
   ======================================== =================================
 
 ``GET`` requests are sent by userspace applications to retrieve device
@@ -1498,6 +1501,63 @@ Kernel response contents:
   ``ETHTOOL_A_PHC_VCLOCKS_INDEX``       s32     PHC index array
   ====================================  ======  ==========================
 
+MODULE_GET
+==========
+
+Gets transceiver module parameters.
+
+Request contents:
+
+  =====================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_HEADER``            nested  request header
+  =====================================  ======  ==========================
+
+Kernel response contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_HEADER``             nested  reply header
+  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
+  ``ETHTOOL_A_MODULE_POWER_MODE``         u8      operational power mode
+  ======================================  ======  ==========================
+
+The optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute encodes the
+transceiver module power mode policy enforced by the host.
+
+The optional ``ETHTHOOL_A_MODULE_POWER_MODE`` attribute encodes the operational
+power mode policy of the transceiver module. It is only reported when a module
+is plugged-in. Possible values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+    :identifiers: ethtool_module_power_mode
+
+MODULE_SET
+==========
+
+Sets transceiver module parameters.
+
+Request contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_HEADER``             nested  request header
+  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
+  ======================================  ======  ==========================
+
+When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
+to set the transceiver module power policy enforced by the host. Possible
+values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+    :identifiers: ethtool_module_power_mode_policy
+
+For SFF-8636 modules, low power mode is forced by the host according to table
+6-10 in revision 2.10a of the specification.
+
+For CMIS modules, low power mode is forced by the host according to table 6-12
+in revision 5.0 of the specification.
+
+To avoid changes to the operational state of the device, power mode policy can
+only be set when the device is administratively down.
+
 Request translation
 ===================
 
@@ -1597,4 +1657,6 @@ are netlink only.
   n/a                                 ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``
   n/a                                 ``ETHTOOL_MSG_TUNNEL_INFO_GET``
   n/a                                 ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
+  n/a                                 ``ETHTOOL_MSG_MODULE_GET``
+  n/a                                 ``ETHTOOL_MSG_MODULE_SET``
   =================================== =====================================
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 4711b96dae0c..07d40dc20ca4 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -405,6 +405,20 @@ struct ethtool_module_eeprom {
 	u8	*data;
 };
 
+/**
+ * struct ethtool_module_power_mode_params - module power mode parameters
+ * @policy: The power mode policy enforced by the host for the plug-in module.
+ * @mode: The operational power mode of the plug-in module. Should be filled by
+ * device drivers on get operations.
+ * @mode_valid: Indicates the validity of the @mode field. Should be set by
+ * device drivers on get operations when a module is plugged-in.
+ */
+struct ethtool_module_power_mode_params {
+	enum ethtool_module_power_mode_policy policy;
+	enum ethtool_module_power_mode mode;
+	u8 mode_valid:1;
+};
+
 /**
  * struct ethtool_ops - optional netdev operations
  * @cap_link_lanes_supported: indicates if the driver supports lanes
@@ -570,6 +584,11 @@ struct ethtool_module_eeprom {
  * @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics.
  * @get_rmon_stats: Query some of the RMON (RFC 2819) statistics.
  *	Set %ranges to a pointer to zero-terminated array of byte ranges.
+ * @get_module_power_mode: Get the power mode policy for the plug-in module
+ *	used by the network device and its operational power mode, if
+ *	plugged-in.
+ * @set_module_power_mode: Set the power mode policy for the plug-in module
+ *	used by the network device.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -689,6 +708,12 @@ struct ethtool_ops {
 	void	(*get_rmon_stats)(struct net_device *dev,
 				  struct ethtool_rmon_stats *rmon_stats,
 				  const struct ethtool_rmon_hist_range **ranges);
+	int	(*get_module_power_mode)(struct net_device *dev,
+					 struct ethtool_module_power_mode_params *params,
+					 struct netlink_ext_ack *extack);
+	int	(*set_module_power_mode)(struct net_device *dev,
+					 const struct ethtool_module_power_mode_params *params,
+					 struct netlink_ext_ack *extack);
 };
 
 int ethtool_check_ops(const struct ethtool_ops *ops);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 67aa7134b301..0a52ee560c3a 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -704,6 +704,31 @@ enum ethtool_stringset {
 	ETH_SS_COUNT
 };
 
+/**
+ * enum ethtool_module_power_mode_policy - plug-in module power mode policy
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_LOW: Module is always in low power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP: Module is transitioned by the
+ *	host to high power mode when the first port using it is put
+ *	administratively up and to low power mode when the last port using it
+ *	is put administratively down.
+ */
+enum ethtool_module_power_mode_policy {
+	ETHTOOL_MODULE_POWER_MODE_POLICY_LOW,
+	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH,
+	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP,
+};
+
+/**
+ * enum ethtool_module_power_mode - plug-in module power mode
+ * @ETHTOOL_MODULE_POWER_MODE_LOW: Module is in low power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_HIGH: Module is in high power mode.
+ */
+enum ethtool_module_power_mode {
+	ETHTOOL_MODULE_POWER_MODE_LOW,
+	ETHTOOL_MODULE_POWER_MODE_HIGH,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index b3b93710eff7..c22f5c902060 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -47,6 +47,8 @@ enum {
 	ETHTOOL_MSG_MODULE_EEPROM_GET,
 	ETHTOOL_MSG_STATS_GET,
 	ETHTOOL_MSG_PHC_VCLOCKS_GET,
+	ETHTOOL_MSG_MODULE_GET,
+	ETHTOOL_MSG_MODULE_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -90,6 +92,8 @@ enum {
 	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
 	ETHTOOL_MSG_STATS_GET_REPLY,
 	ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
+	ETHTOOL_MSG_MODULE_GET_REPLY,
+	ETHTOOL_MSG_MODULE_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -831,6 +835,19 @@ enum {
 	ETHTOOL_A_STATS_RMON_MAX = (__ETHTOOL_A_STATS_RMON_CNT - 1)
 };
 
+/* MODULE */
+
+enum {
+	ETHTOOL_A_MODULE_UNSPEC,
+	ETHTOOL_A_MODULE_HEADER,		/* nest - _A_HEADER_* */
+	ETHTOOL_A_MODULE_POWER_MODE_POLICY,	/* u8 */
+	ETHTOOL_A_MODULE_POWER_MODE,		/* u8 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_MODULE_CNT,
+	ETHTOOL_A_MODULE_MAX = (__ETHTOOL_A_MODULE_CNT - 1)
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 0a19470efbfb..b76432e70e6b 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o
+		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
diff --git a/net/ethtool/module.c b/net/ethtool/module.c
new file mode 100644
index 000000000000..6fb7d84fabf7
--- /dev/null
+++ b/net/ethtool/module.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/ethtool.h>
+
+#include "netlink.h"
+#include "common.h"
+#include "bitset.h"
+
+struct module_req_info {
+	struct ethnl_req_info base;
+};
+
+struct module_reply_data {
+	struct ethnl_reply_data	base;
+	struct ethtool_module_power_mode_params power;
+	u8 power_valid:1;
+};
+
+#define MODULE_REPDATA(__reply_base) \
+	container_of(__reply_base, struct module_reply_data, base)
+
+/* MODULE_GET */
+
+const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1] = {
+	[ETHTOOL_A_MODULE_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+static int module_get_power_mode(struct net_device *dev,
+				 struct module_reply_data *data,
+				 struct netlink_ext_ack *extack)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	int ret;
+
+	if (!ops->get_module_power_mode)
+		return 0;
+
+	ret = ops->get_module_power_mode(dev, &data->power, extack);
+	if (ret < 0)
+		return ret;
+
+	data->power_valid = true;
+
+	return 0;
+}
+
+static int module_prepare_data(const struct ethnl_req_info *req_base,
+			       struct ethnl_reply_data *reply_base,
+			       struct genl_info *info)
+{
+	struct module_reply_data *data = MODULE_REPDATA(reply_base);
+	struct netlink_ext_ack *extack = info ? info->extack : NULL;
+	struct net_device *dev = reply_base->dev;
+	int ret;
+
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		return ret;
+
+	ret = module_get_power_mode(dev, data, extack);
+	if (ret < 0)
+		goto out_complete;
+
+out_complete:
+	ethnl_ops_complete(dev);
+	return ret;
+}
+
+static int module_reply_size(const struct ethnl_req_info *req_base,
+			     const struct ethnl_reply_data *reply_base)
+{
+	struct module_reply_data *data = MODULE_REPDATA(reply_base);
+	int len = 0;
+
+	if (data->power_valid)
+		len += nla_total_size(sizeof(u8));	/* _MODULE_POWER_MODE_POLICY */
+
+	if (data->power_valid && data->power.mode_valid)
+		len += nla_total_size(sizeof(u8));	/* _MODULE_POWER_MODE */
+
+	return len;
+}
+
+static int module_fill_reply(struct sk_buff *skb,
+			     const struct ethnl_req_info *req_base,
+			     const struct ethnl_reply_data *reply_base)
+{
+	const struct module_reply_data *data = MODULE_REPDATA(reply_base);
+
+	if (data->power_valid &&
+	    nla_put_u8(skb, ETHTOOL_A_MODULE_POWER_MODE_POLICY,
+		       data->power.policy))
+		return -EMSGSIZE;
+
+	if (data->power_valid && data->power.mode_valid &&
+	    nla_put_u8(skb, ETHTOOL_A_MODULE_POWER_MODE, data->power.mode))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+const struct ethnl_request_ops ethnl_module_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_MODULE_GET,
+	.reply_cmd		= ETHTOOL_MSG_MODULE_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_MODULE_HEADER,
+	.req_info_size		= sizeof(struct module_req_info),
+	.reply_data_size	= sizeof(struct module_reply_data),
+
+	.prepare_data		= module_prepare_data,
+	.reply_size		= module_reply_size,
+	.fill_reply		= module_fill_reply,
+};
+
+/* MODULE_SET */
+
+const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1] = {
+	[ETHTOOL_A_MODULE_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_MODULE_POWER_MODE_POLICY] =
+		NLA_POLICY_MAX(NLA_U8, ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP),
+};
+
+static int module_set_power_mode(struct net_device *dev, struct nlattr **tb,
+				 bool *p_mod, struct netlink_ext_ack *extack)
+{
+	struct ethtool_module_power_mode_params power = {};
+	struct ethtool_module_power_mode_params power_new;
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	int ret;
+
+	if (!tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY])
+		return 0;
+
+	if (!ops->get_module_power_mode || !ops->set_module_power_mode) {
+		NL_SET_ERR_MSG_ATTR(extack,
+				    tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY],
+				    "Setting power mode policy is not supported by this device");
+		return -EOPNOTSUPP;
+	}
+
+	if (netif_running(dev)) {
+		NL_SET_ERR_MSG_ATTR(extack,
+				    tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY],
+				    "Cannot set power mode policy when port is administratively up");
+		return -EINVAL;
+	}
+
+	power_new.policy = nla_get_u8(tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY]);
+	ret = ops->get_module_power_mode(dev, &power, extack);
+	if (ret < 0)
+		return ret;
+	*p_mod = power_new.policy != power.policy;
+
+	return ops->set_module_power_mode(dev, &power_new, extack);
+}
+
+int ethnl_set_module(struct sk_buff *skb, struct genl_info *info)
+{
+	struct ethnl_req_info req_info = {};
+	struct nlattr **tb = info->attrs;
+	struct net_device *dev;
+	bool mod = false;
+	int ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_MODULE_HEADER],
+					 genl_info_net(info), info->extack,
+					 true);
+	if (ret < 0)
+		return ret;
+	dev = req_info.dev;
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		goto out_rtnl;
+
+	ret = module_set_power_mode(dev, tb, &mod, info->extack);
+	if (ret < 0)
+		goto out_ops;
+
+	if (!mod)
+		goto out_ops;
+
+	ethtool_notify(dev, ETHTOOL_MSG_MODULE_NTF, NULL);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+	dev_put(dev);
+	return ret;
+}
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 1797a0a90019..38b44c0291b1 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -282,6 +282,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
 	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,
 	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
 	[ETHTOOL_MSG_PHC_VCLOCKS_GET]	= &ethnl_phc_vclocks_request_ops,
+	[ETHTOOL_MSG_MODULE_GET]	= &ethnl_module_request_ops,
 };
 
 static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -593,6 +594,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
 	[ETHTOOL_MSG_PAUSE_NTF]		= &ethnl_pause_request_ops,
 	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
+	[ETHTOOL_MSG_MODULE_NTF]	= &ethnl_module_request_ops,
 };
 
 /* default notification handler */
@@ -686,6 +688,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
 	[ETHTOOL_MSG_PAUSE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
+	[ETHTOOL_MSG_MODULE_NTF]	= ethnl_default_notify,
 };
 
 void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
@@ -999,6 +1002,22 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_phc_vclocks_get_policy,
 		.maxattr = ARRAY_SIZE(ethnl_phc_vclocks_get_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_MODULE_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_module_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_module_get_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_MODULE_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_module,
+		.policy = ethnl_module_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_module_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 077aac3929a8..1aafba5b67bb 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -337,6 +337,7 @@ extern const struct ethnl_request_ops ethnl_fec_request_ops;
 extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
 extern const struct ethnl_request_ops ethnl_stats_request_ops;
 extern const struct ethnl_request_ops ethnl_phc_vclocks_request_ops;
+extern const struct ethnl_request_ops ethnl_module_request_ops;
 
 extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1];
 extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1];
@@ -373,6 +374,8 @@ extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
+extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
+extern const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info);
@@ -391,6 +394,7 @@ int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);
 int ethnl_tunnel_info_start(struct netlink_callback *cb);
 int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_module(struct sk_buff *skb, struct genl_info *info);
 
 extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH net-next v2 2/6] mlxsw: reg: Add Port Module Memory Map Properties register
  2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
  2021-08-18 15:51 ` [RFC PATCH net-next v2 1/6] " Ido Schimmel
@ 2021-08-18 15:51 ` Ido Schimmel
  2021-08-18 15:51 ` [RFC PATCH net-next v2 3/6] mlxsw: reg: Add Management Cable IO and Notifications register Ido Schimmel
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add the Port Module Memory Map Properties register. It will be used to
set the power mode of a module in subsequent patches.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 44 +++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 16092a1f68bb..e30e6d3482f6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -5883,6 +5883,49 @@ static inline void mlxsw_reg_pddr_pack(char *payload, u8 local_port,
 	mlxsw_reg_pddr_page_select_set(payload, page_select);
 }
 
+/* PMMP - Port Module Memory Map Properties Register
+ * -------------------------------------------------
+ * The PMMP register allows to override the module memory map advertisement.
+ * The register can only be set when the module is disabled by PMAOS register.
+ */
+#define MLXSW_REG_PMMP_ID 0x5044
+#define MLXSW_REG_PMMP_LEN 0x2C
+
+MLXSW_REG_DEFINE(pmmp, MLXSW_REG_PMMP_ID, MLXSW_REG_PMMP_LEN);
+
+/* reg_pmmp_module
+ * Module number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pmmp, module, 0x00, 16, 8);
+
+/* reg_pmmp_eeprom_override_mask
+ * Write mask bit (negative polarity).
+ * 0 - Allow write
+ * 1 - Ignore write
+ * On write, indicates which of the bits from eeprom_override field are
+ * updated.
+ * Access: WO
+ */
+MLXSW_ITEM32(reg, pmmp, eeprom_override_mask, 0x04, 16, 16);
+
+enum {
+	/* Set module to low power mode */
+	MLXSW_REG_PMMP_EEPROM_OVERRIDE_LOW_POWER_MASK = BIT(8),
+};
+
+/* reg_pmmp_eeprom_override
+ * Override / ignore EEPROM advertisement properties bitmask
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, pmmp, eeprom_override, 0x04, 0, 16);
+
+static inline void mlxsw_reg_pmmp_pack(char *payload, u8 module)
+{
+	MLXSW_REG_ZERO(pmmp, payload);
+	mlxsw_reg_pmmp_module_set(payload, module);
+}
+
 /* PMTM - Port Module Type Mapping Register
  * ----------------------------------------
  * The PMTM allows query or configuration of module types.
@@ -12225,6 +12268,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(pplr),
 	MLXSW_REG(pmpe),
 	MLXSW_REG(pddr),
+	MLXSW_REG(pmmp),
 	MLXSW_REG(pmtm),
 	MLXSW_REG(htgt),
 	MLXSW_REG(hpkt),
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH net-next v2 3/6] mlxsw: reg: Add Management Cable IO and Notifications register
  2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
  2021-08-18 15:51 ` [RFC PATCH net-next v2 1/6] " Ido Schimmel
  2021-08-18 15:51 ` [RFC PATCH net-next v2 2/6] mlxsw: reg: Add Port Module Memory Map Properties register Ido Schimmel
@ 2021-08-18 15:51 ` Ido Schimmel
  2021-08-18 15:52 ` [RFC PATCH net-next v2 4/6] mlxsw: Add ability to control transceiver modules' power mode Ido Schimmel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add the Management Cable IO and Notifications register. It will be used
to retrieve the power mode status of a module in subsequent patches and
whether a module is present in a cage or not.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 34 +++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index e30e6d3482f6..de40b59f7d00 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -10274,6 +10274,39 @@ static inline void mlxsw_reg_mlcr_pack(char *payload, u8 local_port,
 					   MLXSW_REG_MLCR_DURATION_MAX : 0);
 }
 
+/* MCION - Management Cable IO and Notifications Register
+ * ------------------------------------------------------
+ * The MCION register is used to query transceiver modules' IO pins and other
+ * notifications.
+ */
+#define MLXSW_REG_MCION_ID 0x9052
+#define MLXSW_REG_MCION_LEN 0x18
+
+MLXSW_REG_DEFINE(mcion, MLXSW_REG_MCION_ID, MLXSW_REG_MCION_LEN);
+
+/* reg_mcion_module
+ * Module number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mcion, module, 0x00, 16, 8);
+
+enum {
+	MLXSW_REG_MCION_MODULE_STATUS_BITS_PRESENT_MASK = BIT(0),
+	MLXSW_REG_MCION_MODULE_STATUS_BITS_LOW_POWER_MASK = BIT(8),
+};
+
+/* reg_mcion_module_status_bits
+ * Module IO status as defined by SFF.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mcion, module_status_bits, 0x04, 0, 16);
+
+static inline void mlxsw_reg_mcion_pack(char *payload, u8 module)
+{
+	MLXSW_REG_ZERO(mcion, payload);
+	mlxsw_reg_mcion_module_set(payload, module);
+}
+
 /* MTPPS - Management Pulse Per Second Register
  * --------------------------------------------
  * This register provides the device PPS capabilities, configure the PPS in and
@@ -12316,6 +12349,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mgir),
 	MLXSW_REG(mrsr),
 	MLXSW_REG(mlcr),
+	MLXSW_REG(mcion),
 	MLXSW_REG(mtpps),
 	MLXSW_REG(mtutc),
 	MLXSW_REG(mpsc),
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH net-next v2 4/6] mlxsw: Add ability to control transceiver modules' power mode
  2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
                   ` (2 preceding siblings ...)
  2021-08-18 15:51 ` [RFC PATCH net-next v2 3/6] mlxsw: reg: Add Management Cable IO and Notifications register Ido Schimmel
@ 2021-08-18 15:52 ` Ido Schimmel
  2021-08-18 15:52 ` [RFC PATCH net-next v2 5/6] ethtool: Add transceiver module extended states Ido Schimmel
  2021-08-18 15:52 ` [RFC PATCH net-next v2 6/6] mlxsw: Add support for " Ido Schimmel
  5 siblings, 0 replies; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:52 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Implement support for ethtool_ops::.get_module_power_mode and
ethtool_ops::set_module_power_mode.

The get operation is implemented using the Management Cable IO and
Notifications (MCION) register that reports the operational power mode
of the module and its presence. In case a module is not present, its
operational power mode is not reported to ethtool and user space. If not
set before, the power mode policy is reported as "high", which is the
default on Mellanox systems.

The set operation is implemented using the Port Module Memory Map
Properties (PMMP) register. The register instructs the device's firmware
to transition a plugged-in module to / out of low power mode by writing
to its memory map.

Before setting the power mode, the per-module 'num_ports_up' counter is
consulted to ensure no ports mapped to the module are administratively
up, so that their operational state will not change during the
operation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 .../net/ethernet/mellanox/mlxsw/core_env.c    | 193 +++++++++++++++++-
 .../net/ethernet/mellanox/mlxsw/core_env.h    |  10 +
 drivers/net/ethernet/mellanox/mlxsw/minimal.c |  26 +++
 .../mellanox/mlxsw/spectrum_ethtool.c         |  28 +++
 4 files changed, 254 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
index 9e367174743d..f7df37536ca4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_env.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -17,6 +17,7 @@ struct mlxsw_env_module_info {
 	bool is_overheat;
 	int num_ports_mapped;
 	int num_ports_up;
+	enum ethtool_module_power_mode_policy power_mode_policy;
 };
 
 struct mlxsw_env {
@@ -445,6 +446,152 @@ int mlxsw_env_reset_module(struct net_device *netdev,
 }
 EXPORT_SYMBOL(mlxsw_env_reset_module);
 
+int
+mlxsw_env_get_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
+				struct ethtool_module_power_mode_params *params,
+				struct netlink_ext_ack *extack)
+{
+	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
+	char mcion_pl[MLXSW_REG_MCION_LEN];
+	u32 status_bits;
+	int err;
+
+	if (WARN_ON_ONCE(module >= mlxsw_env->module_count))
+		return -EINVAL;
+
+	mutex_lock(&mlxsw_env->module_info_lock);
+
+	params->policy = mlxsw_env->module_info[module].power_mode_policy;
+
+	mlxsw_reg_mcion_pack(mcion_pl, module);
+	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mcion), mcion_pl);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to retrieve module's power mode");
+		goto out;
+	}
+
+	status_bits = mlxsw_reg_mcion_module_status_bits_get(mcion_pl);
+	if (!(status_bits & MLXSW_REG_MCION_MODULE_STATUS_BITS_PRESENT_MASK))
+		goto out;
+
+	if (status_bits & MLXSW_REG_MCION_MODULE_STATUS_BITS_LOW_POWER_MASK)
+		params->mode = ETHTOOL_MODULE_POWER_MODE_LOW;
+	else
+		params->mode = ETHTOOL_MODULE_POWER_MODE_HIGH;
+
+	params->mode_valid = true;
+
+out:
+	mutex_unlock(&mlxsw_env->module_info_lock);
+	return err;
+}
+EXPORT_SYMBOL(mlxsw_env_get_module_power_mode);
+
+static int mlxsw_env_module_enable_set(struct mlxsw_core *mlxsw_core,
+				       u8 module, bool enable)
+{
+	enum mlxsw_reg_pmaos_admin_status admin_status;
+	char pmaos_pl[MLXSW_REG_PMAOS_LEN];
+
+	mlxsw_reg_pmaos_pack(pmaos_pl, module);
+	admin_status = enable ? MLXSW_REG_PMAOS_ADMIN_STATUS_ENABLED :
+				MLXSW_REG_PMAOS_ADMIN_STATUS_DISABLED;
+	mlxsw_reg_pmaos_admin_status_set(pmaos_pl, admin_status);
+	mlxsw_reg_pmaos_ase_set(pmaos_pl, true);
+
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(pmaos), pmaos_pl);
+}
+
+static int mlxsw_env_module_low_power_set(struct mlxsw_core *mlxsw_core,
+					  u8 module, bool low_power)
+{
+	u16 eeprom_override_mask, eeprom_override;
+	char pmmp_pl[MLXSW_REG_PMMP_LEN];
+
+	mlxsw_reg_pmmp_pack(pmmp_pl, module);
+	/* Mask all the bits except low power mode. */
+	eeprom_override_mask = ~MLXSW_REG_PMMP_EEPROM_OVERRIDE_LOW_POWER_MASK;
+	mlxsw_reg_pmmp_eeprom_override_mask_set(pmmp_pl, eeprom_override_mask);
+	eeprom_override = low_power ? MLXSW_REG_PMMP_EEPROM_OVERRIDE_LOW_POWER_MASK :
+				      0;
+	mlxsw_reg_pmmp_eeprom_override_set(pmmp_pl, eeprom_override);
+
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(pmmp), pmmp_pl);
+}
+
+static int __mlxsw_env_set_module_power_mode(struct mlxsw_core *mlxsw_core,
+					     u8 module, bool low_power,
+					     struct netlink_ext_ack *extack)
+{
+	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
+	int err;
+
+	if (mlxsw_env->module_info[module].num_ports_up) {
+		NL_SET_ERR_MSG_MOD(extack, "Cannot set module's power mode when ports using it are administratively up");
+		return -EINVAL;
+	}
+
+	err = mlxsw_env_module_enable_set(mlxsw_core, module, false);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to disable module");
+		return err;
+	}
+
+	err = mlxsw_env_module_low_power_set(mlxsw_core, module, low_power);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to set module's power mode");
+		goto err_module_low_power_set;
+	}
+
+	err = mlxsw_env_module_enable_set(mlxsw_core, module, true);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to enable module");
+		goto err_module_enable_set;
+	}
+
+	return 0;
+
+err_module_enable_set:
+	mlxsw_env_module_low_power_set(mlxsw_core, module, !low_power);
+err_module_low_power_set:
+	mlxsw_env_module_enable_set(mlxsw_core, module, true);
+	return err;
+}
+
+int
+mlxsw_env_set_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
+				enum ethtool_module_power_mode_policy policy,
+				struct netlink_ext_ack *extack)
+{
+	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
+	bool low_power;
+	int err;
+
+	if (WARN_ON_ONCE(module >= mlxsw_env->module_count))
+		return -EINVAL;
+
+	if (policy != ETHTOOL_MODULE_POWER_MODE_POLICY_LOW &&
+	    policy != ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH &&
+	    policy != ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP) {
+		NL_SET_ERR_MSG_MOD(extack, "Unsupported power mode policy");
+		return -EOPNOTSUPP;
+	}
+
+	mutex_lock(&mlxsw_env->module_info_lock);
+
+	low_power = policy != ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH;
+	err = __mlxsw_env_set_module_power_mode(mlxsw_core, module, low_power,
+						extack);
+	if (err)
+		goto out;
+
+	mlxsw_env->module_info[module].power_mode_policy = policy;
+out:
+	mutex_unlock(&mlxsw_env->module_info_lock);
+	return err;
+}
+EXPORT_SYMBOL(mlxsw_env_set_module_power_mode);
+
 static int mlxsw_env_module_has_temp_sensor(struct mlxsw_core *mlxsw_core,
 					    u8 module,
 					    bool *p_has_temp_sensor)
@@ -794,15 +941,33 @@ EXPORT_SYMBOL(mlxsw_env_module_port_unmap);
 int mlxsw_env_module_port_up(struct mlxsw_core *mlxsw_core, u8 module)
 {
 	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
+	int err = 0;
 
 	if (WARN_ON_ONCE(module >= mlxsw_env->module_count))
 		return -EINVAL;
 
 	mutex_lock(&mlxsw_env->module_info_lock);
+
+	if (mlxsw_env->module_info[module].power_mode_policy !=
+	    ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP)
+		goto out_inc;
+
+	if (mlxsw_env->module_info[module].num_ports_up != 0)
+		goto out_inc;
+
+	/* Transition to high power mode following first port using the module
+	 * being put administratively up.
+	 */
+	err = __mlxsw_env_set_module_power_mode(mlxsw_core, module, false,
+						NULL);
+	if (err)
+		goto out_unlock;
+
+out_inc:
 	mlxsw_env->module_info[module].num_ports_up++;
+out_unlock:
 	mutex_unlock(&mlxsw_env->module_info_lock);
-
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(mlxsw_env_module_port_up);
 
@@ -814,7 +979,22 @@ void mlxsw_env_module_port_down(struct mlxsw_core *mlxsw_core, u8 module)
 		return;
 
 	mutex_lock(&mlxsw_env->module_info_lock);
+
 	mlxsw_env->module_info[module].num_ports_up--;
+
+	if (mlxsw_env->module_info[module].power_mode_policy !=
+	    ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP)
+		goto out_unlock;
+
+	if (mlxsw_env->module_info[module].num_ports_up != 0)
+		goto out_unlock;
+
+	/* Transition to low power mode following last port using the module
+	 * being put administratively down.
+	 */
+	__mlxsw_env_set_module_power_mode(mlxsw_core, module, true, NULL);
+
+out_unlock:
 	mutex_unlock(&mlxsw_env->module_info_lock);
 }
 EXPORT_SYMBOL(mlxsw_env_module_port_down);
@@ -824,7 +1004,7 @@ int mlxsw_env_init(struct mlxsw_core *mlxsw_core, struct mlxsw_env **p_env)
 	char mgpir_pl[MLXSW_REG_MGPIR_LEN];
 	struct mlxsw_env *env;
 	u8 module_count;
-	int err;
+	int i, err;
 
 	mlxsw_reg_mgpir_pack(mgpir_pl);
 	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mgpir), mgpir_pl);
@@ -837,6 +1017,13 @@ int mlxsw_env_init(struct mlxsw_core *mlxsw_core, struct mlxsw_env **p_env)
 	if (!env)
 		return -ENOMEM;
 
+	/* Firmware defaults to high power mode policy where modules are
+	 * transitioned to high power mode following plug-in.
+	 */
+	for (i = 0; i < module_count; i++)
+		env->module_info[i].power_mode_policy =
+			ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH;
+
 	mutex_init(&env->module_info_lock);
 	env->core = mlxsw_core;
 	env->module_count = module_count;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.h b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
index c486397f5dfe..da121b1a84b4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_env.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
@@ -28,6 +28,16 @@ int mlxsw_env_reset_module(struct net_device *netdev,
 			   struct mlxsw_core *mlxsw_core, u8 module,
 			   u32 *flags);
 
+int
+mlxsw_env_get_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
+				struct ethtool_module_power_mode_params *params,
+				struct netlink_ext_ack *extack);
+
+int
+mlxsw_env_set_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
+				enum ethtool_module_power_mode_policy policy,
+				struct netlink_ext_ack *extack);
+
 int
 mlxsw_env_module_overheat_counter_get(struct mlxsw_core *mlxsw_core, u8 module,
 				      u64 *p_counter);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/minimal.c b/drivers/net/ethernet/mellanox/mlxsw/minimal.c
index 9644e9c486b8..e0892f259adf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/minimal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/minimal.c
@@ -145,12 +145,38 @@ static int mlxsw_m_reset(struct net_device *netdev, u32 *flags)
 				      flags);
 }
 
+static int
+mlxsw_m_get_module_power_mode(struct net_device *netdev,
+			      struct ethtool_module_power_mode_params *params,
+			      struct netlink_ext_ack *extack)
+{
+	struct mlxsw_m_port *mlxsw_m_port = netdev_priv(netdev);
+	struct mlxsw_core *core = mlxsw_m_port->mlxsw_m->core;
+
+	return mlxsw_env_get_module_power_mode(core, mlxsw_m_port->module,
+					       params, extack);
+}
+
+static int
+mlxsw_m_set_module_power_mode(struct net_device *netdev,
+			      const struct ethtool_module_power_mode_params *params,
+			      struct netlink_ext_ack *extack)
+{
+	struct mlxsw_m_port *mlxsw_m_port = netdev_priv(netdev);
+	struct mlxsw_core *core = mlxsw_m_port->mlxsw_m->core;
+
+	return mlxsw_env_set_module_power_mode(core, mlxsw_m_port->module,
+					       params->policy, extack);
+}
+
 static const struct ethtool_ops mlxsw_m_port_ethtool_ops = {
 	.get_drvinfo		= mlxsw_m_module_get_drvinfo,
 	.get_module_info	= mlxsw_m_get_module_info,
 	.get_module_eeprom	= mlxsw_m_get_module_eeprom,
 	.get_module_eeprom_by_page = mlxsw_m_get_module_eeprom_by_page,
 	.reset			= mlxsw_m_reset,
+	.get_module_power_mode	= mlxsw_m_get_module_power_mode,
+	.set_module_power_mode	= mlxsw_m_set_module_power_mode,
 };
 
 static int
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
index 06f1645561c6..7329bc84a8ee 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
@@ -1206,6 +1206,32 @@ static int mlxsw_sp_reset(struct net_device *dev, u32 *flags)
 	return mlxsw_env_reset_module(dev, mlxsw_sp->core, module, flags);
 }
 
+static int
+mlxsw_sp_get_module_power_mode(struct net_device *dev,
+			       struct ethtool_module_power_mode_params *params,
+			       struct netlink_ext_ack *extack)
+{
+	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	u8 module = mlxsw_sp_port->mapping.module;
+
+	return mlxsw_env_get_module_power_mode(mlxsw_sp->core, module, params,
+					       extack);
+}
+
+static int
+mlxsw_sp_set_module_power_mode(struct net_device *dev,
+			       const struct ethtool_module_power_mode_params *params,
+			       struct netlink_ext_ack *extack)
+{
+	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	u8 module = mlxsw_sp_port->mapping.module;
+
+	return mlxsw_env_set_module_power_mode(mlxsw_sp->core, module,
+					       params->policy, extack);
+}
+
 const struct ethtool_ops mlxsw_sp_port_ethtool_ops = {
 	.cap_link_lanes_supported	= true,
 	.get_drvinfo			= mlxsw_sp_port_get_drvinfo,
@@ -1228,6 +1254,8 @@ const struct ethtool_ops mlxsw_sp_port_ethtool_ops = {
 	.get_eth_ctrl_stats		= mlxsw_sp_get_eth_ctrl_stats,
 	.get_rmon_stats			= mlxsw_sp_get_rmon_stats,
 	.reset				= mlxsw_sp_reset,
+	.get_module_power_mode		= mlxsw_sp_get_module_power_mode,
+	.set_module_power_mode		= mlxsw_sp_set_module_power_mode,
 };
 
 struct mlxsw_sp1_port_link_mode {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH net-next v2 5/6] ethtool: Add transceiver module extended states
  2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
                   ` (3 preceding siblings ...)
  2021-08-18 15:52 ` [RFC PATCH net-next v2 4/6] mlxsw: Add ability to control transceiver modules' power mode Ido Schimmel
@ 2021-08-18 15:52 ` Ido Schimmel
  2021-08-18 15:52 ` [RFC PATCH net-next v2 6/6] mlxsw: Add support for " Ido Schimmel
  5 siblings, 0 replies; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:52 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add an extended state and two extended sub-states to describe link
issues related to transceiver modules.

The first, 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_LOW_POWER_MODE', tells user
space that port is unable to gain a carrier because the associated
transceiver module is in low power mode where the data path is
deactivated. This is applicable to both SFF-8636 and CMIS modules.

Example:

 # ethtool --set-module swp13 power-mode-policy low

 # ip link set dev swp13 up

 $ ethtool swp13
 ...
 Link detected: no (Module, Module is in low power mode)

 # ip link set dev swp13 down

 # ethtool --set-module swp13 power-mode-policy high-on-up

 # ip link set dev swp13 up

 $ ethtool swp13
 ...
 Link detected: yes

The second, 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY', tells
user space that port is unable to gain a carrier because the CMIS Module
State Machine did not reach the ModuleReady (Fully Operational) state.
For example, if the module is stuck at ModuleFault state. In which case,
user space can read the fault reason from the module's EEPROM and
potentially reset it.

For CMIS modules, the first extended sub-state is contained in the
second, but has the added advantage of being applicable to more module
types and being more specific about the nature of the problem.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 Documentation/networking/ethtool-netlink.rst | 12 ++++++++++++
 include/linux/ethtool.h                      |  1 +
 include/uapi/linux/ethtool.h                 |  7 +++++++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 54a704370bfc..2dcf3d4e4dd4 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -523,6 +523,8 @@ Link extended states:
                                                         power required from cable or module
 
   ``ETHTOOL_LINK_EXT_STATE_OVERHEAT``                   The module is overheated
+
+  ``ETHTOOL_LINK_EXT_STATE_MODULE``                     Transceiver module issue
   ================================================      ============================================
 
 Link extended substates:
@@ -608,6 +610,16 @@ Link extended substates:
   ``ETHTOOL_LINK_EXT_SUBSTATE_CI_CABLE_TEST_FAILURE``   Cable test failure
   ===================================================   ============================================
 
+  Transceiver module issue substates:
+
+  ===================================================   ============================================
+  ``ETHTOOL_LINK_EXT_SUBSTATE_MODULE_LOW_POWER_MODE``   The transceiver module is in low power mode
+
+  ``ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY``   The CMIS Module State Machine did not reach
+                                                        the ModuleReady state. For example, if the
+                                                        module is stuck at ModuleFault state
+  ===================================================   ============================================
+
 DEBUG_GET
 =========
 
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 07d40dc20ca4..1f71293011ca 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -93,6 +93,7 @@ struct ethtool_link_ext_state_info {
 		enum ethtool_link_ext_substate_link_logical_mismatch link_logical_mismatch;
 		enum ethtool_link_ext_substate_bad_signal_integrity bad_signal_integrity;
 		enum ethtool_link_ext_substate_cable_issue cable_issue;
+		enum ethtool_link_ext_substate_module module;
 		u8 __link_ext_substate;
 	};
 };
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 0a52ee560c3a..ec2518e9d4e3 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -603,6 +603,7 @@ enum ethtool_link_ext_state {
 	ETHTOOL_LINK_EXT_STATE_CALIBRATION_FAILURE,
 	ETHTOOL_LINK_EXT_STATE_POWER_BUDGET_EXCEEDED,
 	ETHTOOL_LINK_EXT_STATE_OVERHEAT,
+	ETHTOOL_LINK_EXT_STATE_MODULE,
 };
 
 /* More information in addition to ETHTOOL_LINK_EXT_STATE_AUTONEG. */
@@ -647,6 +648,12 @@ enum ethtool_link_ext_substate_cable_issue {
 	ETHTOOL_LINK_EXT_SUBSTATE_CI_CABLE_TEST_FAILURE,
 };
 
+/* More information in addition to ETHTOOL_LINK_EXT_STATE_MODULE. */
+enum ethtool_link_ext_substate_module {
+	ETHTOOL_LINK_EXT_SUBSTATE_MODULE_LOW_POWER_MODE = 1,
+	ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY,
+};
+
 #define ETH_GSTRING_LEN		32
 
 /**
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH net-next v2 6/6] mlxsw: Add support for transceiver module extended states
  2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
                   ` (4 preceding siblings ...)
  2021-08-18 15:52 ` [RFC PATCH net-next v2 5/6] ethtool: Add transceiver module extended states Ido Schimmel
@ 2021-08-18 15:52 ` Ido Schimmel
  5 siblings, 0 replies; 16+ messages in thread
From: Ido Schimmel @ 2021-08-18 15:52 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add support for the two transceiver module extended sub-states added in
previous patch. The two extended sub-states are meant to describe link
issues related to transceiver modules.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 .../mellanox/mlxsw/spectrum_ethtool.c         | 45 ++++++++++++++++++-
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
index 7329bc84a8ee..337fa0053de9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c
@@ -96,6 +96,11 @@ mlxsw_sp_link_ext_state_opcode_map[] = {
 	{1032, ETHTOOL_LINK_EXT_STATE_POWER_BUDGET_EXCEEDED, 0},
 
 	{1030, ETHTOOL_LINK_EXT_STATE_OVERHEAT, 0},
+
+	{1042, ETHTOOL_LINK_EXT_STATE_MODULE,
+	 ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY},
+	{2048, ETHTOOL_LINK_EXT_STATE_MODULE,
+	 ETHTOOL_LINK_EXT_SUBSTATE_MODULE_LOW_POWER_MODE},
 };
 
 static void
@@ -124,6 +129,10 @@ mlxsw_sp_port_set_link_ext_state(struct mlxsw_sp_ethtool_link_ext_state_opcode_m
 		link_ext_state_info->cable_issue =
 			link_ext_state_mapping.link_ext_substate;
 		break;
+	case ETHTOOL_LINK_EXT_STATE_MODULE:
+		link_ext_state_info->module =
+			link_ext_state_mapping.link_ext_substate;
+		break;
 	default:
 		break;
 	}
@@ -131,19 +140,46 @@ mlxsw_sp_port_set_link_ext_state(struct mlxsw_sp_ethtool_link_ext_state_opcode_m
 	link_ext_state_info->link_ext_state = link_ext_state_mapping.link_ext_state;
 }
 
+static int
+mlxsw_sp_port_status_opcode_drv_get(struct mlxsw_sp_port *mlxsw_sp_port,
+				    u32 *p_status_opcode)
+{
+	struct ethtool_module_power_mode_params params = {};
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	u8 module = mlxsw_sp_port->mapping.module;
+	int err;
+
+	*p_status_opcode = 0;
+
+	err = mlxsw_env_get_module_power_mode(mlxsw_sp->core, module, &params,
+					      NULL);
+	if (err)
+		return err;
+	if (params.mode_valid && params.mode == ETHTOOL_MODULE_POWER_MODE_LOW)
+		*p_status_opcode = 2048;
+
+	return 0;
+}
+
 static int
 mlxsw_sp_port_get_link_ext_state(struct net_device *dev,
 				 struct ethtool_link_ext_state_info *link_ext_state_info)
 {
 	struct mlxsw_sp_ethtool_link_ext_state_opcode_mapping link_ext_state_mapping;
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+	u32 status_opcode, status_opcode_drv;
 	char pddr_pl[MLXSW_REG_PDDR_LEN];
 	int opcode, err, i;
-	u32 status_opcode;
 
 	if (netif_carrier_ok(dev))
 		return -ENODATA;
 
+	/* Opcodes 2048-3072 are reserved for driver use. */
+	err = mlxsw_sp_port_status_opcode_drv_get(mlxsw_sp_port,
+						  &status_opcode_drv);
+	if (err)
+		return err;
+
 	mlxsw_reg_pddr_pack(pddr_pl, mlxsw_sp_port->local_port,
 			    MLXSW_REG_PDDR_PAGE_SELECT_TROUBLESHOOTING_INFO);
 
@@ -156,9 +192,14 @@ mlxsw_sp_port_get_link_ext_state(struct net_device *dev,
 		return err;
 
 	status_opcode = mlxsw_reg_pddr_trblsh_status_opcode_get(pddr_pl);
-	if (!status_opcode)
+	if (!status_opcode && !status_opcode_drv)
 		return -ENODATA;
 
+	/* Allow driver-detected issues to take precedence, as it is likely
+	 * that they are more meaningful to user space.
+	 */
+	status_opcode = status_opcode_drv ? status_opcode_drv : status_opcode;
+
 	for (i = 0; i < ARRAY_SIZE(mlxsw_sp_link_ext_state_opcode_map); i++) {
 		link_ext_state_mapping = mlxsw_sp_link_ext_state_opcode_map[i];
 		if (link_ext_state_mapping.status_opcode == status_opcode) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-18 15:51 ` [RFC PATCH net-next v2 1/6] " Ido Schimmel
@ 2021-08-18 22:32   ` Jakub Kicinski
  2021-08-18 22:55     ` Andrew Lunn
  2021-08-19  7:47     ` Ido Schimmel
  0 siblings, 2 replies; 16+ messages in thread
From: Jakub Kicinski @ 2021-08-18 22:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

On Wed, 18 Aug 2021 18:51:57 +0300 Ido Schimmel wrote:
> +MODULE_SET
> +==========
> +
> +Sets transceiver module parameters.
> +
> +Request contents:
> +
> +  ======================================  ======  ==========================
> +  ``ETHTOOL_A_MODULE_HEADER``             nested  request header
> +  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
> +  ======================================  ======  ==========================
> +
> +When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
> +to set the transceiver module power policy enforced by the host. Possible
> +values are:
> +
> +.. kernel-doc:: include/uapi/linux/ethtool.h
> +    :identifiers: ethtool_module_power_mode_policy
> +
> +For SFF-8636 modules, low power mode is forced by the host according to table
> +6-10 in revision 2.10a of the specification.
> +
> +For CMIS modules, low power mode is forced by the host according to table 6-12
> +in revision 5.0 of the specification.
> +
> +To avoid changes to the operational state of the device, power mode policy can
> +only be set when the device is administratively down.

Would you mind explaining why?

> +/**
> + * enum ethtool_module_power_mode_policy - plug-in module power mode policy
> + * @ETHTOOL_MODULE_POWER_MODE_POLICY_LOW: Module is always in low power mode.

Did you have a use case for this one or is it for completeness? Seems
like user can just bring the port down if they want no carrier? My
understanding was you primarily wanted the latter two, and those can
be freely changed when netdev is running, right?

> + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
> + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP: Module is transitioned by the
> + *	host to high power mode when the first port using it is put
> + *	administratively up and to low power mode when the last port using it
> + *	is put administratively down.

s/HIGH_ON_UP/AUTO/ ?
high on up == low on down, right, seems arbitrary to pick one over the
other

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-18 22:32   ` Jakub Kicinski
@ 2021-08-18 22:55     ` Andrew Lunn
  2021-08-19  7:50       ` Ido Schimmel
  2021-08-19  7:47     ` Ido Schimmel
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2021-08-18 22:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, netdev, davem, mkubecek, pali, jacob.e.keller,
	jiri, vadimp, mlxsw, Ido Schimmel

On Wed, Aug 18, 2021 at 03:32:41PM -0700, Jakub Kicinski wrote:
> On Wed, 18 Aug 2021 18:51:57 +0300 Ido Schimmel wrote:
> > +MODULE_SET
> > +==========
> > +
> > +Sets transceiver module parameters.
> > +
> > +Request contents:
> > +
> > +  ======================================  ======  ==========================
> > +  ``ETHTOOL_A_MODULE_HEADER``             nested  request header
> > +  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
> > +  ======================================  ======  ==========================
> > +
> > +When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
> > +to set the transceiver module power policy enforced by the host. Possible
> > +values are:
> > +
> > +.. kernel-doc:: include/uapi/linux/ethtool.h
> > +    :identifiers: ethtool_module_power_mode_policy
> > +
> > +For SFF-8636 modules, low power mode is forced by the host according to table
> > +6-10 in revision 2.10a of the specification.
> > +
> > +For CMIS modules, low power mode is forced by the host according to table 6-12
> > +in revision 5.0 of the specification.
> > +
> > +To avoid changes to the operational state of the device, power mode policy can
> > +only be set when the device is administratively down.
> 
> Would you mind explaining why?

Part of the issue is we have two different sorts of policy mixed
together.

ETHTOOL_MODULE_POWER_MODE_POLICY_LOW and
ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH change the state now. This could
be a surprise to a user when there link disappears for the
ETHTOOL_MODULE_POWER_MODE_POLICY_LOW case, when the interface is admin up.

ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP however follows the state
of the interface. So there should not be any surprises.

I actually think ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP should be
allowed at any time, just to make it easier to use.

> > +/**
> > + * enum ethtool_module_power_mode_policy - plug-in module power mode policy
> > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_LOW: Module is always in low power mode.
> 
> Did you have a use case for this one or is it for completeness? Seems
> like user can just bring the port down if they want no carrier? My
> understanding was you primarily wanted the latter two, and those can
> be freely changed when netdev is running, right?
> 
> > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
> > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP: Module is transitioned by the
> > + *	host to high power mode when the first port using it is put
> > + *	administratively up and to low power mode when the last port using it
> > + *	is put administratively down.
> 
> s/HIGH_ON_UP/AUTO/ ?
> high on up == low on down, right, seems arbitrary to pick one over the
> other

Should we also document what the default is? Seems like
ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP is the generic network
interface default, so maybe it should also be the default for SFPs?

	  Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-18 22:32   ` Jakub Kicinski
  2021-08-18 22:55     ` Andrew Lunn
@ 2021-08-19  7:47     ` Ido Schimmel
  2021-08-19 14:38       ` Jakub Kicinski
  1 sibling, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2021-08-19  7:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

On Wed, Aug 18, 2021 at 03:32:41PM -0700, Jakub Kicinski wrote:
> On Wed, 18 Aug 2021 18:51:57 +0300 Ido Schimmel wrote:
> > +MODULE_SET
> > +==========
> > +
> > +Sets transceiver module parameters.
> > +
> > +Request contents:
> > +
> > +  ======================================  ======  ==========================
> > +  ``ETHTOOL_A_MODULE_HEADER``             nested  request header
> > +  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
> > +  ======================================  ======  ==========================
> > +
> > +When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
> > +to set the transceiver module power policy enforced by the host. Possible
> > +values are:
> > +
> > +.. kernel-doc:: include/uapi/linux/ethtool.h
> > +    :identifiers: ethtool_module_power_mode_policy
> > +
> > +For SFF-8636 modules, low power mode is forced by the host according to table
> > +6-10 in revision 2.10a of the specification.
> > +
> > +For CMIS modules, low power mode is forced by the host according to table 6-12
> > +in revision 5.0 of the specification.
> > +
> > +To avoid changes to the operational state of the device, power mode policy can
> > +only be set when the device is administratively down.
> 
> Would you mind explaining why?

Yes, it is more restrictive than it should be. The check can be relaxed
to only disallow transition to low power mode when the device is
administratively up.

> 
> > +/**
> > + * enum ethtool_module_power_mode_policy - plug-in module power mode policy
> > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_LOW: Module is always in low power mode.
> 
> Did you have a use case for this one or is it for completeness? Seems
> like user can just bring the port down if they want no carrier? My
> understanding was you primarily wanted the latter two, and those can
> be freely changed when netdev is running, right?

In all the implementations I could find (3), the user interface is
high/low (on/off). The proposed interface is more flexible as it
contains both the 'high' / 'low' policies in addition to the more user
friendly 'high-on-up' ('auto') policy.

Given that keeping the 'low' policy does not complicate the
implementation / maintenance and that it provides users with a similar
interface to what they are used to from other implementations, I would
like to keep it in addition to the other two policies.

> 
> > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
> > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP: Module is transitioned by the
> > + *	host to high power mode when the first port using it is put
> > + *	administratively up and to low power mode when the last port using it
> > + *	is put administratively down.
> 
> s/HIGH_ON_UP/AUTO/ ?

OK

> high on up == low on down, right, seems arbitrary to pick one over the
> other

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-18 22:55     ` Andrew Lunn
@ 2021-08-19  7:50       ` Ido Schimmel
  2021-08-19 13:13         ` Andrew Lunn
  0 siblings, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2021-08-19  7:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, netdev, davem, mkubecek, pali, jacob.e.keller,
	jiri, vadimp, mlxsw, Ido Schimmel

On Thu, Aug 19, 2021 at 12:55:43AM +0200, Andrew Lunn wrote:
> On Wed, Aug 18, 2021 at 03:32:41PM -0700, Jakub Kicinski wrote:
> > On Wed, 18 Aug 2021 18:51:57 +0300 Ido Schimmel wrote:
> > > +MODULE_SET
> > > +==========
> > > +
> > > +Sets transceiver module parameters.
> > > +
> > > +Request contents:
> > > +
> > > +  ======================================  ======  ==========================
> > > +  ``ETHTOOL_A_MODULE_HEADER``             nested  request header
> > > +  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
> > > +  ======================================  ======  ==========================
> > > +
> > > +When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
> > > +to set the transceiver module power policy enforced by the host. Possible
> > > +values are:
> > > +
> > > +.. kernel-doc:: include/uapi/linux/ethtool.h
> > > +    :identifiers: ethtool_module_power_mode_policy
> > > +
> > > +For SFF-8636 modules, low power mode is forced by the host according to table
> > > +6-10 in revision 2.10a of the specification.
> > > +
> > > +For CMIS modules, low power mode is forced by the host according to table 6-12
> > > +in revision 5.0 of the specification.
> > > +
> > > +To avoid changes to the operational state of the device, power mode policy can
> > > +only be set when the device is administratively down.
> > 
> > Would you mind explaining why?
> 
> Part of the issue is we have two different sorts of policy mixed
> together.
> 
> ETHTOOL_MODULE_POWER_MODE_POLICY_LOW and
> ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH change the state now. This could
> be a surprise to a user when there link disappears for the
> ETHTOOL_MODULE_POWER_MODE_POLICY_LOW case, when the interface is admin up.
> 
> ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP however follows the state
> of the interface. So there should not be any surprises.
> 
> I actually think ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP should be
> allowed at any time, just to make it easier to use.

Yes

> 
> > > +/**
> > > + * enum ethtool_module_power_mode_policy - plug-in module power mode policy
> > > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_LOW: Module is always in low power mode.
> > 
> > Did you have a use case for this one or is it for completeness? Seems
> > like user can just bring the port down if they want no carrier? My
> > understanding was you primarily wanted the latter two, and those can
> > be freely changed when netdev is running, right?
> > 
> > > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
> > > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP: Module is transitioned by the
> > > + *	host to high power mode when the first port using it is put
> > > + *	administratively up and to low power mode when the last port using it
> > > + *	is put administratively down.
> > 
> > s/HIGH_ON_UP/AUTO/ ?
> > high on up == low on down, right, seems arbitrary to pick one over the
> > other
> 
> Should we also document what the default is? Seems like
> ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP is the generic network
> interface default, so maybe it should also be the default for SFPs?

I will add a note in Documentation/networking/ethtool-netlink.rst that
the default power mode policy is driver-dependent (can be queried) and
that it can either be 'high' or 'auto'.

> 
> 	  Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-19  7:50       ` Ido Schimmel
@ 2021-08-19 13:13         ` Andrew Lunn
  2021-08-19 14:34           ` Ido Schimmel
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lunn @ 2021-08-19 13:13 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jakub Kicinski, netdev, davem, mkubecek, pali, jacob.e.keller,
	jiri, vadimp, mlxsw, Ido Schimmel

> > Should we also document what the default is? Seems like
> > ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP is the generic network
> > interface default, so maybe it should also be the default for SFPs?
> 
> I will add a note in Documentation/networking/ethtool-netlink.rst that
> the default power mode policy is driver-dependent (can be queried) and
> that it can either be 'high' or 'auto'.

Hi Ido

That is kind of my question. Do you want the default driver defined,
and varying between implementations, or do we want a clearly defined
default?

The stack has a mixture of both. An interface is admin down by
default, but it is anybody guess how pause will be configured?

By making it driver undefined, you cannot assume anything, and you
require user space to always configure it.

I don't have too strong an opinion, i'm more interested in what others
say, those who have to live with this.

	Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-19 13:13         ` Andrew Lunn
@ 2021-08-19 14:34           ` Ido Schimmel
  2021-08-19 14:42             ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2021-08-19 14:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, netdev, davem, mkubecek, pali, jacob.e.keller,
	jiri, vadimp, mlxsw, Ido Schimmel

On Thu, Aug 19, 2021 at 03:13:10PM +0200, Andrew Lunn wrote:
> > > Should we also document what the default is? Seems like
> > > ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH_ON_UP is the generic network
> > > interface default, so maybe it should also be the default for SFPs?
> > 
> > I will add a note in Documentation/networking/ethtool-netlink.rst that
> > the default power mode policy is driver-dependent (can be queried) and
> > that it can either be 'high' or 'auto'.
> 
> Hi Ido

Hi Andrew,

> 
> That is kind of my question. Do you want the default driver defined,
> and varying between implementations, or do we want a clearly defined
> default?
> 
> The stack has a mixture of both. An interface is admin down by
> default, but it is anybody guess how pause will be configured?
> 
> By making it driver undefined, you cannot assume anything, and you
> require user space to always configure it.
> 
> I don't have too strong an opinion, i'm more interested in what others
> say, those who have to live with this.

I evaluated the link up times using a QSFP module [1] connected to my
system. There is a 36% increase in link up times when using the 'auto'
policy compared to the 'high' policy (default on all Mellanox systems).
Very noticeable and very measurable.

Couple the above with the fact that despite shipping millions of ports
over the years, we are only now getting requests to control the power
mode of transceivers and from a small number of users.

In addition, any user space that is interested in changing the behavior,
has the ability to query the default policy and override it in a
vendor-agnostic way.

Therefore, I'm strictly against changing the existing behavior.

[1] https://www.mellanox.com/related-docs/prod_cables/PB_MFS1S00-VxxxE_200GbE_QSFP56_AOC.pdf

> 
> 	Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-19  7:47     ` Ido Schimmel
@ 2021-08-19 14:38       ` Jakub Kicinski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2021-08-19 14:38 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, andrew, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

On Thu, 19 Aug 2021 10:47:03 +0300 Ido Schimmel wrote:
> > > + * enum ethtool_module_power_mode_policy - plug-in module power mode policy
> > > + * @ETHTOOL_MODULE_POWER_MODE_POLICY_LOW: Module is always in low power mode.  
> > 
> > Did you have a use case for this one or is it for completeness? Seems
> > like user can just bring the port down if they want no carrier? My
> > understanding was you primarily wanted the latter two, and those can
> > be freely changed when netdev is running, right?  
> 
> In all the implementations I could find (3), the user interface is
> high/low (on/off). The proposed interface is more flexible as it
> contains both the 'high' / 'low' policies in addition to the more user
> friendly 'high-on-up' ('auto') policy.

on/off is probably due to blindly setting the SFP bit. Our intention
here is to set a policy with respect to the netdev state (or integrate
the setting with the rest of the stack, if you will) not just control 
the bit.

IMO we should leave the value out of the enum until the use case for 
it becomes clear. Adding it later is simple enough.

> Given that keeping the 'low' policy does not complicate the
> implementation / maintenance

I'd argue it does. The netif_running() check is exactly to prevent
carrier from going away, so it only makes sense if the low setting 
exists. We can switch between 'auto' and 'high' any time.

> and that it provides users with a similar interface to what they are
> used to from other implementations, I would like to keep it in
> addition to the other two policies.

IIUC the existing interfaces are build around the architecture where
driver/fw do not control the SFP automatically IOW 'auto' does not
exist. 'low' is there for disabling the SFP when interface is down.

The interface where 'low' can't be set while the netdev is up is
*already* not 1:1 with the out of tree APIs, right?

I'd bet that if you convert users of existing APIs to map 'on' to
(always-)high and 'off' to auto everyone will be happy.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-19 14:34           ` Ido Schimmel
@ 2021-08-19 14:42             ` Jakub Kicinski
  2021-08-19 22:38               ` Keller, Jacob E
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2021-08-19 14:42 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, netdev, davem, mkubecek, pali, jacob.e.keller, jiri,
	vadimp, mlxsw, Ido Schimmel

On Thu, 19 Aug 2021 17:34:46 +0300 Ido Schimmel wrote:
> > That is kind of my question. Do you want the default driver defined,
> > and varying between implementations, or do we want a clearly defined
> > default?
> > 
> > The stack has a mixture of both. An interface is admin down by
> > default, but it is anybody guess how pause will be configured?
> > 
> > By making it driver undefined, you cannot assume anything, and you
> > require user space to always configure it.
> > 
> > I don't have too strong an opinion, i'm more interested in what others
> > say, those who have to live with this.  
> 
> I evaluated the link up times using a QSFP module [1] connected to my
> system. There is a 36% increase in link up times when using the 'auto'
> policy compared to the 'high' policy (default on all Mellanox systems).
> Very noticeable and very measurable.
> 
> Couple the above with the fact that despite shipping millions of ports
> over the years, we are only now getting requests to control the power
> mode of transceivers and from a small number of users.
> 
> In addition, any user space that is interested in changing the behavior,
> has the ability to query the default policy and override it in a
> vendor-agnostic way.
> 
> Therefore, I'm strictly against changing the existing behavior.
> 
> [1] https://www.mellanox.com/related-docs/prod_cables/PB_MFS1S00-VxxxE_200GbE_QSFP56_AOC.pdf

Fine by me FWIW. Obviously in an ideal world we'd have uniform presets
as part of 'what it means to be upstream Linux' but from practical
standpoint where most features start out of tree having the requirement
of uniformity will be an impediment preventing vendors from switching to
upstream APIs. That's my personal opinion or should I say 'gut feeling',
I could well be wrong.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control transceiver modules' power mode
  2021-08-19 14:42             ` Jakub Kicinski
@ 2021-08-19 22:38               ` Keller, Jacob E
  0 siblings, 0 replies; 16+ messages in thread
From: Keller, Jacob E @ 2021-08-19 22:38 UTC (permalink / raw)
  To: Jakub Kicinski, Ido Schimmel
  Cc: Andrew Lunn, netdev, davem, mkubecek, pali, jiri, vadimp, mlxsw,
	Ido Schimmel



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 19, 2021 7:43 AM
> To: Ido Schimmel <idosch@idosch.org>
> Cc: Andrew Lunn <andrew@lunn.ch>; netdev@vger.kernel.org;
> davem@davemloft.net; mkubecek@suse.cz; pali@kernel.org; Keller, Jacob E
> <jacob.e.keller@intel.com>; jiri@nvidia.com; vadimp@nvidia.com;
> mlxsw@nvidia.com; Ido Schimmel <idosch@nvidia.com>
> Subject: Re: [RFC PATCH net-next v2 1/6] ethtool: Add ability to control
> transceiver modules' power mode
> 
> On Thu, 19 Aug 2021 17:34:46 +0300 Ido Schimmel wrote:
> > > That is kind of my question. Do you want the default driver defined,
> > > and varying between implementations, or do we want a clearly defined
> > > default?
> > >
> > > The stack has a mixture of both. An interface is admin down by
> > > default, but it is anybody guess how pause will be configured?
> > >
> > > By making it driver undefined, you cannot assume anything, and you
> > > require user space to always configure it.
> > >
> > > I don't have too strong an opinion, i'm more interested in what others
> > > say, those who have to live with this.
> >
> > I evaluated the link up times using a QSFP module [1] connected to my
> > system. There is a 36% increase in link up times when using the 'auto'
> > policy compared to the 'high' policy (default on all Mellanox systems).
> > Very noticeable and very measurable.
> >
> > Couple the above with the fact that despite shipping millions of ports
> > over the years, we are only now getting requests to control the power
> > mode of transceivers and from a small number of users.
> >
> > In addition, any user space that is interested in changing the behavior,
> > has the ability to query the default policy and override it in a
> > vendor-agnostic way.
> >
> > Therefore, I'm strictly against changing the existing behavior.
> >
> > [1] https://www.mellanox.com/related-docs/prod_cables/PB_MFS1S00-
> VxxxE_200GbE_QSFP56_AOC.pdf
> 
> Fine by me FWIW. Obviously in an ideal world we'd have uniform presets
> as part of 'what it means to be upstream Linux' but from practical
> standpoint where most features start out of tree having the requirement
> of uniformity will be an impediment preventing vendors from switching to
> upstream APIs. That's my personal opinion or should I say 'gut feeling',
> I could well be wrong.

I think it makes sense to push for uniform presets where we can. But in some cases where we already have a variety of differing behaviors this becomes difficult.

I would agree in pushing for uniform defaults in cases where we're clearly adding new functionality. However in cases like this where there exists a wide spectrum of behaviors, it makes more sense to allow individual drivers to maintain the same behavior while reporting that up with the option to configure or change it. It's not ideal if we did everything from scratch, but it's the current reality.

Thanks,
Jake 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-08-19 22:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-18 15:51 [RFC PATCH net-next v2 0/6] ethtool: Add ability to control transceiver modules' power mode Ido Schimmel
2021-08-18 15:51 ` [RFC PATCH net-next v2 1/6] " Ido Schimmel
2021-08-18 22:32   ` Jakub Kicinski
2021-08-18 22:55     ` Andrew Lunn
2021-08-19  7:50       ` Ido Schimmel
2021-08-19 13:13         ` Andrew Lunn
2021-08-19 14:34           ` Ido Schimmel
2021-08-19 14:42             ` Jakub Kicinski
2021-08-19 22:38               ` Keller, Jacob E
2021-08-19  7:47     ` Ido Schimmel
2021-08-19 14:38       ` Jakub Kicinski
2021-08-18 15:51 ` [RFC PATCH net-next v2 2/6] mlxsw: reg: Add Port Module Memory Map Properties register Ido Schimmel
2021-08-18 15:51 ` [RFC PATCH net-next v2 3/6] mlxsw: reg: Add Management Cable IO and Notifications register Ido Schimmel
2021-08-18 15:52 ` [RFC PATCH net-next v2 4/6] mlxsw: Add ability to control transceiver modules' power mode Ido Schimmel
2021-08-18 15:52 ` [RFC PATCH net-next v2 5/6] ethtool: Add transceiver module extended states Ido Schimmel
2021-08-18 15:52 ` [RFC PATCH net-next v2 6/6] mlxsw: Add support for " Ido Schimmel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.