* [PATCH v2 net-next 0/6] Add RTNL interface for SyncE
@ 2021-11-05 20:53 ` Maciej Machnikowski
  0 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

Synchronous Ethernet networks use a physical layer clock to syntonize
the frequency across different network elements.

A basic SyncE node, as defined in ITU-T G.8264, consists of an Ethernet
Equipment Clock (EEC) and has the ability to recover synchronization
from its synchronization inputs - either traffic interfaces or external
frequency sources.
The EEC can synchronize its frequency (syntonize) to any of those
sources. It can also select the synchronization source through priority
tables and synchronization status messaging, and it provides the
necessary filtering and holdover capabilities.

This patch series introduces a basic interface for reading the Ethernet
Equipment Clock (EEC) state on a SyncE-capable device. This state gives
information about the source of the syntonization signal (either one of
the device's own ports or an external one) and the state of the EEC.
This interface is required to implement Synchronization Status
Messaging in upper layers.

v2:
- improved documentation
- fixed kdoc warning

RFC history:
v2:
- removed whitespace changes
- fix issues reported by test robot
v3:
- Changed naming from SyncE to EEC
- Clarify cover letter and commit message for patch 1
v4:
- Removed sync_source and pin_idx info
- Changed one structure to attributes
- Added EEC_SRC_PORT flag to indicate that the EEC is synchronized
  to the recovered clock of a port that returns the state
v5:
- add EEC source as an optional attribute
- implement support for recovered clocks
- align states returned by EEC to ITU-T G.781
v6:
- fix EEC clock state reporting
- add documentation
- fix descriptions in code comments

Maciej Machnikowski (6):
  ice: add support detecting features based on netlist
  rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
  ice: add support for reading SyncE DPLL state
  rtnetlink: Add support for SyncE recovered clock configuration
  ice: add support for SyncE recovered clocks
  docs: net: Add description of SyncE interfaces

 Documentation/networking/synce.rst            | 117 ++++++++
 drivers/net/ethernet/intel/ice/ice.h          |   7 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  94 ++++++-
 drivers/net/ethernet/intel/ice/ice_common.c   | 224 ++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  20 +-
 drivers/net/ethernet/intel/ice/ice_devids.h   |   3 +
 drivers/net/ethernet/intel/ice/ice_lib.c      |   6 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 137 ++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp.c      |  34 +++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   |  49 ++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   |  22 ++
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 include/linux/netdevice.h                     |  33 +++
 include/uapi/linux/if_link.h                  |  57 ++++
 include/uapi/linux/rtnetlink.h                |  10 +
 net/core/rtnetlink.c                          | 253 ++++++++++++++++++
 security/selinux/nlmsgtab.c                   |   6 +-
 17 files changed, 1069 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/networking/synce.rst

-- 
2.26.3


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 net-next 1/6] ice: add support detecting features based on netlist
  2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-05 20:53   ` Maciej Machnikowski
  -1 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

Add new functions to check the netlist of a given board for:
- a Recovered Clock device,
- a Clock Generation Unit,
- a Clock Multiplexer.

Initialize feature bits depending on the detected components.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   7 +-
 drivers/net/ethernet/intel/ice/ice_common.c   | 123 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |   9 ++
 drivers/net/ethernet/intel/ice/ice_lib.c      |   6 +-
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   |   1 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 7 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index bf4ecd9a517c..3dc4caa41565 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -186,6 +186,8 @@
 
 enum ice_feature {
 	ICE_F_DSCP,
+	ICE_F_CGU,
+	ICE_F_PHY_RCLK,
 	ICE_F_SMA_CTRL,
 	ICE_F_MAX
 };
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 4eef3488d86f..339c2a86f680 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1297,6 +1297,8 @@ struct ice_aqc_link_topo_params {
 #define ICE_AQC_LINK_TOPO_NODE_TYPE_CAGE	6
 #define ICE_AQC_LINK_TOPO_NODE_TYPE_MEZZ	7
 #define ICE_AQC_LINK_TOPO_NODE_TYPE_ID_EEPROM	8
+#define ICE_AQC_LINK_TOPO_NODE_TYPE_CLK_CTRL	9
+#define ICE_AQC_LINK_TOPO_NODE_TYPE_CLK_MUX	10
 #define ICE_AQC_LINK_TOPO_NODE_CTX_S		4
 #define ICE_AQC_LINK_TOPO_NODE_CTX_M		\
 				(0xF << ICE_AQC_LINK_TOPO_NODE_CTX_S)
@@ -1333,7 +1335,10 @@ struct ice_aqc_link_topo_addr {
 struct ice_aqc_get_link_topo {
 	struct ice_aqc_link_topo_addr addr;
 	u8 node_part_num;
-#define ICE_AQC_GET_LINK_TOPO_NODE_NR_PCA9575	0x21
+#define ICE_AQC_GET_LINK_TOPO_NODE_NR_PCA9575		0x21
+#define ICE_ACQ_GET_LINK_TOPO_NODE_NR_ZL30632_80032	0x24
+#define ICE_ACQ_GET_LINK_TOPO_NODE_NR_PKVL		0x31
+#define ICE_ACQ_GET_LINK_TOPO_NODE_NR_GEN_CLK_MUX	0x47
 	u8 rsvd[9];
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index b3066d0fea8b..35903b282885 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -274,6 +274,79 @@ ice_aq_get_link_topo_handle(struct ice_port_info *pi, u8 node_type,
 	return ice_aq_send_cmd(pi->hw, &desc, NULL, 0, cd);
 }
 
+/**
+ * ice_aq_get_netlist_node
+ * @hw: pointer to the hw struct
+ * @cmd: get_link_topo AQ structure
+ * @node_part_number: output node part number if node found
+ * @node_handle: output node handle parameter if node found
+ */
+enum ice_status
+ice_aq_get_netlist_node(struct ice_hw *hw, struct ice_aqc_get_link_topo *cmd,
+			u8 *node_part_number, u16 *node_handle)
+{
+	struct ice_aq_desc desc;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_link_topo);
+	desc.params.get_link_topo = *cmd;
+
+	if (ice_aq_send_cmd(hw, &desc, NULL, 0, NULL))
+		return ICE_ERR_NOT_SUPPORTED;
+
+	if (node_handle)
+		*node_handle =
+			le16_to_cpu(desc.params.get_link_topo.addr.handle);
+	if (node_part_number)
+		*node_part_number = desc.params.get_link_topo.node_part_num;
+
+	return ICE_SUCCESS;
+}
+
+#define MAX_NETLIST_SIZE 10
+/**
+ * ice_find_netlist_node
+ * @hw: pointer to the hw struct
+ * @node_type_ctx: type of netlist node to look for
+ * @node_part_number: node part number to look for
+ * @node_handle: output parameter if node found - optional
+ *
+ * Find and return the node handle for a given node type and part number in the
+ * netlist. Returns ICE_SUCCESS if the node is found, ICE_ERR_DOES_NOT_EXIST
+ * otherwise. If @node_handle is provided, it is set to the found handle.
+ */
+enum ice_status
+ice_find_netlist_node(struct ice_hw *hw, u8 node_type_ctx, u8 node_part_number,
+		      u16 *node_handle)
+{
+	struct ice_aqc_get_link_topo cmd;
+	u8 rec_node_part_number;
+	enum ice_status status;
+	u16 rec_node_handle;
+	u8 idx;
+
+	for (idx = 0; idx < MAX_NETLIST_SIZE; idx++) {
+		memset(&cmd, 0, sizeof(cmd));
+
+		cmd.addr.topo_params.node_type_ctx =
+			(node_type_ctx << ICE_AQC_LINK_TOPO_NODE_TYPE_S);
+		cmd.addr.topo_params.index = idx;
+
+		status = ice_aq_get_netlist_node(hw, &cmd,
+						 &rec_node_part_number,
+						 &rec_node_handle);
+		if (status)
+			return status;
+
+		if (rec_node_part_number == node_part_number) {
+			if (node_handle)
+				*node_handle = rec_node_handle;
+			return ICE_SUCCESS;
+		}
+	}
+
+	return ICE_ERR_DOES_NOT_EXIST;
+}
+
 /**
  * ice_is_media_cage_present
  * @pi: port information structure
@@ -5083,3 +5156,53 @@ bool ice_fw_supports_report_dflt_cfg(struct ice_hw *hw)
 	}
 	return false;
 }
+
+/**
+ * ice_is_phy_rclk_present_e810t
+ * @hw: pointer to the hw struct
+ *
+ * Check if the PHY Recovered Clock device is present in the netlist
+ */
+bool ice_is_phy_rclk_present_e810t(struct ice_hw *hw)
+{
+	if (ice_find_netlist_node(hw, ICE_AQC_LINK_TOPO_NODE_TYPE_CLK_CTRL,
+				  ICE_ACQ_GET_LINK_TOPO_NODE_NR_PKVL, NULL))
+		return false;
+
+	return true;
+}
+
+/**
+ * ice_is_cgu_present_e810t
+ * @hw: pointer to the hw struct
+ *
+ * Check if the Clock Generation Unit (CGU) device is present in the netlist
+ */
+bool ice_is_cgu_present_e810t(struct ice_hw *hw)
+{
+	if (!ice_find_netlist_node(hw, ICE_AQC_LINK_TOPO_NODE_TYPE_CLK_CTRL,
+				   ICE_ACQ_GET_LINK_TOPO_NODE_NR_ZL30632_80032,
+				   NULL)) {
+		hw->cgu_part_number =
+			ICE_ACQ_GET_LINK_TOPO_NODE_NR_ZL30632_80032;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * ice_is_clock_mux_present_e810t
+ * @hw: pointer to the hw struct
+ *
+ * Check if the Clock Multiplexer device is present in the netlist
+ */
+bool ice_is_clock_mux_present_e810t(struct ice_hw *hw)
+{
+	if (ice_find_netlist_node(hw, ICE_AQC_LINK_TOPO_NODE_TYPE_CLK_MUX,
+				  ICE_ACQ_GET_LINK_TOPO_NODE_NR_GEN_CLK_MUX,
+				  NULL))
+		return false;
+
+	return true;
+}
+
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 65c1b3244264..b20a5c085246 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -89,6 +89,12 @@ ice_aq_get_phy_caps(struct ice_port_info *pi, bool qual_mods, u8 report_mode,
 		    struct ice_aqc_get_phy_caps_data *caps,
 		    struct ice_sq_cd *cd);
 enum ice_status
+ice_aq_get_netlist_node(struct ice_hw *hw, struct ice_aqc_get_link_topo *cmd,
+			u8 *node_part_number, u16 *node_handle);
+enum ice_status
+ice_find_netlist_node(struct ice_hw *hw, u8 node_type_ctx, u8 node_part_number,
+		      u16 *node_handle);
+enum ice_status
 ice_aq_list_caps(struct ice_hw *hw, void *buf, u16 buf_size, u32 *cap_count,
 		 enum ice_adminq_opc opc, struct ice_sq_cd *cd);
 enum ice_status
@@ -206,4 +212,7 @@ bool ice_fw_supports_lldp_fltr_ctrl(struct ice_hw *hw);
 enum ice_status
 ice_lldp_fltr_add_remove(struct ice_hw *hw, u16 vsi_num, bool add);
 bool ice_fw_supports_report_dflt_cfg(struct ice_hw *hw);
+bool ice_is_phy_rclk_present_e810t(struct ice_hw *hw);
+bool ice_is_cgu_present_e810t(struct ice_hw *hw);
+bool ice_is_clock_mux_present_e810t(struct ice_hw *hw);
 #endif /* _ICE_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 40562600a8cf..2422215b7937 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -4183,8 +4183,12 @@ void ice_init_feature_support(struct ice_pf *pf)
 	case ICE_DEV_ID_E810C_QSFP:
 	case ICE_DEV_ID_E810C_SFP:
 		ice_set_feature_support(pf, ICE_F_DSCP);
-		if (ice_is_e810t(&pf->hw))
+		if (ice_is_clock_mux_present_e810t(&pf->hw))
 			ice_set_feature_support(pf, ICE_F_SMA_CTRL);
+		if (ice_is_phy_rclk_present_e810t(&pf->hw))
+			ice_set_feature_support(pf, ICE_F_PHY_RCLK);
+		if (ice_is_cgu_present_e810t(&pf->hw))
+			ice_set_feature_support(pf, ICE_F_CGU);
 		break;
 	default:
 		break;
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 29f947c0cd2e..aa257db36765 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -800,3 +800,4 @@ bool ice_is_pca9575_present(struct ice_hw *hw)
 
 	return !status && handle;
 }
+
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 9e0c2923c62e..a9dc16641bd4 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -920,6 +920,7 @@ struct ice_hw {
 	struct list_head rss_list_head;
 	struct ice_mbx_snapshot mbx_snapshot;
 	u16 io_expander_handle;
+	u8 cgu_part_number;
 };
 
 /* Statistics collected by each port, VSI, VEB, and S-channel */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 net-next 2/6] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
  2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-05 20:53   ` Maciej Machnikowski
  -1 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

This patch introduces a basic interface for reading the Ethernet
Equipment Clock (EEC) state on a SyncE-capable device. This state gives
information about the state of the EEC. This interface is required to
implement Synchronization Status Messaging in upper layers.

The initial implementation returns the SyncE EEC state in the
IFLA_EEC_STATE attribute. The optional index of the input that is
currently used as the source can be returned in the IFLA_EEC_SRC_IDX
attribute.

The SyncE EEC state read needs to be implemented as an ndo_get_eec_state
function. The source index is read by calling ndo_get_eec_src.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 include/linux/netdevice.h      | 13 ++++++
 include/uapi/linux/if_link.h   | 31 +++++++++++++
 include/uapi/linux/rtnetlink.h |  3 ++
 net/core/rtnetlink.c           | 79 ++++++++++++++++++++++++++++++++++
 security/selinux/nlmsgtab.c    |  3 +-
 5 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec42495a43a..ef2b381dae0c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1344,6 +1344,13 @@ struct netdev_net_notifier {
  *	The caller must be under RCU read context.
  * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path);
  *     Get the forwarding path to reach the real device from the HW destination address
+ * int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state,
+ *			    struct netlink_ext_ack *extack);
+ *	Get state of physical layer frequency synchronization (SyncE)
+ * int (*ndo_get_eec_src)(struct net_device *dev, u32 *src,
+ *			  struct netlink_ext_ack *extack);
+ *	Get the index of the source signal that's currently used as EEC's
+ *	reference
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1563,6 +1570,12 @@ struct net_device_ops {
 	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
 	int                     (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx,
                                                          struct net_device_path *path);
+	int			(*ndo_get_eec_state)(struct net_device *dev,
+						     enum if_eec_state *state,
+						     struct netlink_ext_ack *extack);
+	int			(*ndo_get_eec_src)(struct net_device *dev,
+						   u32 *src,
+						   struct netlink_ext_ack *extack);
 };
 
 /**
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..8eae80f287e9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1273,4 +1273,35 @@ enum {
 
 #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
 
+/* SyncE section */
+
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,	/* state is not valid */
+	IF_EEC_STATE_FREERUN,		/* clock is free-running */
+	IF_EEC_STATE_LOCKED,		/* clock is locked to the reference,
+					 * but the holdover memory is not valid
+					 */
+	IF_EEC_STATE_LOCKED_HO_ACQ,	/* clock is locked to the reference
+					 * and holdover memory is valid
+					 */
+	IF_EEC_STATE_HOLDOVER,		/* clock is in holdover mode */
+};
+
+#define EEC_SRC_PORT		(1 << 0) /* recovered clock from the port is
+					  * currently the source for the EEC
+					  */
+
+struct if_eec_state_msg {
+	__u32 ifindex;
+};
+
+enum {
+	IFLA_EEC_UNSPEC,
+	IFLA_EEC_STATE,
+	IFLA_EEC_SRC_IDX,
+	__IFLA_EEC_MAX,
+};
+
+#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 5888492a5257..1d8662afd6bd 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -185,6 +185,9 @@ enum {
 	RTM_GETNEXTHOPBUCKET,
 #define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
 
+	RTM_GETEECSTATE = 124,
+#define RTM_GETEECSTATE	RTM_GETEECSTATE
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2af8aeeadadf..03bc773d0e69 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -5467,6 +5467,83 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+static int rtnl_fill_eec_state(struct sk_buff *skb, struct net_device *dev,
+			       u32 portid, u32 seq, struct netlink_callback *cb,
+			       int flags, struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct if_eec_state_msg *state_msg;
+	enum if_eec_state state;
+	struct nlmsghdr *nlh;
+	u32 src_idx;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!ops->ndo_get_eec_state)
+		return -EOPNOTSUPP;
+
+	err = ops->ndo_get_eec_state(dev, &state, extack);
+	if (err)
+		return err;
+
+	nlh = nlmsg_put(skb, portid, seq, RTM_GETEECSTATE, sizeof(*state_msg),
+			flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	state_msg = nlmsg_data(nlh);
+	state_msg->ifindex = dev->ifindex;
+
+	if (nla_put_u32(skb, IFLA_EEC_STATE, state))
+		return -EMSGSIZE;
+
+	if (!ops->ndo_get_eec_src)
+		goto end_msg;
+
+	err = ops->ndo_get_eec_src(dev, &src_idx, extack);
+	if (err)
+		return err;
+
+	if (nla_put_u32(skb, IFLA_EEC_SRC_IDX, src_idx))
+		return -EMSGSIZE;
+
+end_msg:
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+
+static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh,
+			      struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_eec_state_msg *state;
+	struct net_device *dev;
+	struct sk_buff *nskb;
+	int err;
+
+	state = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, state->ifindex);
+	if (!dev) {
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
+		return -ENODEV;
+	}
+
+	nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!nskb)
+		return -ENOBUFS;
+
+	err = rtnl_fill_eec_state(nskb, dev, NETLINK_CB(skb).portid,
+				  nlh->nlmsg_seq, NULL, nlh->nlmsg_flags,
+				  extack);
+	if (err < 0)
+		kfree_skb(nskb);
+	else
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+
+	return err;
+}
+
 /* Process one rtnetlink message. */
 
 static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -5692,4 +5769,6 @@ void __init rtnetlink_init(void)
 
 	rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump,
 		      0);
+
+	rtnl_register(PF_UNSPEC, RTM_GETEECSTATE, rtnl_eec_state_get, NULL, 0);
 }
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 94ea2a8b2bb7..2c66e722ea9c 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -91,6 +91,7 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_GETEECSTATE,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -176,7 +177,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_GETEECSTATE + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 2/6] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
@ 2021-11-05 20:53   ` Maciej Machnikowski
  0 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: intel-wired-lan

This patch series introduces a basic interface for reading the Ethernet
Equipment Clock (EEC) state on a SyncE-capable device. This state
describes the current operating mode of the EEC. The interface is
required to implement Synchronization Status Messaging in upper layers.

The initial implementation returns the SyncE EEC state in the
IFLA_EEC_STATE attribute. The index of the input currently used as the
source can optionally be returned in the IFLA_EEC_SRC_IDX attribute.

Drivers report the SyncE EEC state by implementing the ndo_get_eec_state
callback. The source index is read by calling ndo_get_eec_src.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 include/linux/netdevice.h      | 13 ++++++
 include/uapi/linux/if_link.h   | 31 +++++++++++++
 include/uapi/linux/rtnetlink.h |  3 ++
 net/core/rtnetlink.c           | 79 ++++++++++++++++++++++++++++++++++
 security/selinux/nlmsgtab.c    |  3 +-
 5 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec42495a43a..ef2b381dae0c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1344,6 +1344,13 @@ struct netdev_net_notifier {
  *	The caller must be under RCU read context.
  * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path);
  *     Get the forwarding path to reach the real device from the HW destination address
+ * int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state,
+ *			    struct netlink_ext_ack *extack);
+ *	Get state of physical layer frequency synchronization (SyncE)
+ * int (*ndo_get_eec_src)(struct net_device *dev, u32 *src,
+ *			  struct netlink_ext_ack *extack);
+ *	Get the index of the source signal that's currently used as EEC's
+ *	reference
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1563,6 +1570,12 @@ struct net_device_ops {
 	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
 	int                     (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx,
                                                          struct net_device_path *path);
+	int			(*ndo_get_eec_state)(struct net_device *dev,
+						     enum if_eec_state *state,
+						     struct netlink_ext_ack *extack);
+	int			(*ndo_get_eec_src)(struct net_device *dev,
+						   u32 *src,
+						   struct netlink_ext_ack *extack);
 };
 
 /**
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..8eae80f287e9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1273,4 +1273,35 @@ enum {
 
 #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
 
+/* SyncE section */
+
+enum if_eec_state {
+	IF_EEC_STATE_INVALID = 0,	/* state is not valid */
+	IF_EEC_STATE_FREERUN,		/* clock is free-running */
+	IF_EEC_STATE_LOCKED,		/* clock is locked to the reference,
+					 * but the holdover memory is not valid
+					 */
+	IF_EEC_STATE_LOCKED_HO_ACQ,	/* clock is locked to the reference
+					 * and holdover memory is valid
+					 */
+	IF_EEC_STATE_HOLDOVER,		/* clock is in holdover mode */
+};
+
+#define EEC_SRC_PORT		(1 << 0) /* recovered clock from the port is
+					  * currently the source for the EEC
+					  */
+
+struct if_eec_state_msg {
+	__u32 ifindex;
+};
+
+enum {
+	IFLA_EEC_UNSPEC,
+	IFLA_EEC_STATE,
+	IFLA_EEC_SRC_IDX,
+	__IFLA_EEC_MAX,
+};
+
+#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 5888492a5257..1d8662afd6bd 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -185,6 +185,9 @@ enum {
 	RTM_GETNEXTHOPBUCKET,
 #define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
 
+	RTM_GETEECSTATE = 124,
+#define RTM_GETEECSTATE	RTM_GETEECSTATE
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2af8aeeadadf..03bc773d0e69 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -5467,6 +5467,83 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+static int rtnl_fill_eec_state(struct sk_buff *skb, struct net_device *dev,
+			       u32 portid, u32 seq, struct netlink_callback *cb,
+			       int flags, struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct if_eec_state_msg *state_msg;
+	enum if_eec_state state;
+	struct nlmsghdr *nlh;
+	u32 src_idx;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!ops->ndo_get_eec_state)
+		return -EOPNOTSUPP;
+
+	err = ops->ndo_get_eec_state(dev, &state, extack);
+	if (err)
+		return err;
+
+	nlh = nlmsg_put(skb, portid, seq, RTM_GETEECSTATE, sizeof(*state_msg),
+			flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	state_msg = nlmsg_data(nlh);
+	state_msg->ifindex = dev->ifindex;
+
+	if (nla_put_u32(skb, IFLA_EEC_STATE, state))
+		return -EMSGSIZE;
+
+	if (!ops->ndo_get_eec_src)
+		goto end_msg;
+
+	err = ops->ndo_get_eec_src(dev, &src_idx, extack);
+	if (err)
+		return err;
+
+	if (nla_put_u32(skb, IFLA_EEC_SRC_IDX, src_idx))
+		return -EMSGSIZE;
+
+end_msg:
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+
+static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh,
+			      struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_eec_state_msg *state;
+	struct net_device *dev;
+	struct sk_buff *nskb;
+	int err;
+
+	state = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, state->ifindex);
+	if (!dev) {
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
+		return -ENODEV;
+	}
+
+	nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!nskb)
+		return -ENOBUFS;
+
+	err = rtnl_fill_eec_state(nskb, dev, NETLINK_CB(skb).portid,
+				  nlh->nlmsg_seq, NULL, nlh->nlmsg_flags,
+				  extack);
+	if (err < 0)
+		kfree_skb(nskb);
+	else
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+
+	return err;
+}
+
 /* Process one rtnetlink message. */
 
 static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -5692,4 +5769,6 @@ void __init rtnetlink_init(void)
 
 	rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump,
 		      0);
+
+	rtnl_register(PF_UNSPEC, RTM_GETEECSTATE, rtnl_eec_state_get, NULL, 0);
 }
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 94ea2a8b2bb7..2c66e722ea9c 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -91,6 +91,7 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_GETEECSTATE,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 
 static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
@@ -176,7 +177,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 		 * structures at the top of this file with the new mappings
 		 * before updating the BUILD_BUG_ON() macro!
 		 */
-		BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3));
+		BUILD_BUG_ON(RTM_MAX != (RTM_GETEECSTATE + 3));
 		err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
 				 sizeof(nlmsg_route_perms));
 		break;
-- 
2.26.3



* [PATCH v2 net-next 3/6] ice: add support for reading SyncE DPLL state
  2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-05 20:53   ` Maciej Machnikowski
  -1 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

Implement SyncE DPLL monitoring for E810-T devices.
The poll loop periodically checks the state of the DPLL and caches it
in the pf structure. State changes are logged in the system log.

The cached state can be read using the RTM_GETEECSTATE rtnetlink
message.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |  5 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 34 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   | 36 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  5 +-
 drivers/net/ethernet/intel/ice/ice_devids.h   |  3 ++
 drivers/net/ethernet/intel/ice/ice_main.c     | 46 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp.c      | 34 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   | 48 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   | 22 +++++++++
 9 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 3dc4caa41565..1dff7ca704d4 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -609,6 +609,11 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	enum if_eec_state synce_dpll_state;
+	u8 synce_dpll_pin;
+	enum if_eec_state ptp_dpll_state;
+	u8 ptp_dpll_pin;
 };
 
 struct ice_netdev_priv {
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 339c2a86f680..11226af7a9a4 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1808,6 +1808,36 @@ struct ice_aqc_add_rdma_qset_data {
 	struct ice_aqc_add_tx_rdma_qset_entry rdma_qsets[];
 };
 
+/* Get CGU DPLL status (direct 0x0C66) */
+struct ice_aqc_get_cgu_dpll_status {
+	u8 dpll_num;
+	u8 ref_state;
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_LOS		BIT(0)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_SCM		BIT(1)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_CFM		BIT(2)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_GST		BIT(3)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_PFM		BIT(4)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_ESYNC	BIT(6)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_FAST_LOCK_EN	BIT(7)
+	__le16 dpll_state;
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_LOCK		BIT(0)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO		BIT(1)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY	BIT(2)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_FLHIT		BIT(5)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_PSLHIT	BIT(7)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT	8
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SEL	\
+	ICE_M(0x1F, ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE_SHIFT	13
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE \
+	ICE_M(0x7, ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE_SHIFT)
+	__le32 phase_offset_h;
+	__le32 phase_offset_l;
+	u8 eec_mode;
+	u8 rsvd[1];
+	__le16 node_handle;
+};
+
 /* Configure Firmware Logging Command (indirect 0xFF09)
  * Logging Information Read Response (indirect 0xFF10)
  * Note: The 0xFF10 command has no input parameters.
@@ -2039,6 +2069,7 @@ struct ice_aq_desc {
 		struct ice_aqc_fw_logging fw_logging;
 		struct ice_aqc_get_clear_fw_log get_clear_fw_log;
 		struct ice_aqc_download_pkg download_pkg;
+		struct ice_aqc_get_cgu_dpll_status get_cgu_dpll_status;
 		struct ice_aqc_driver_shared_params drv_shared_params;
 		struct ice_aqc_set_mac_lb set_mac_lb;
 		struct ice_aqc_alloc_free_res_cmd sw_res_ctrl;
@@ -2205,6 +2236,9 @@ enum ice_adminq_opc {
 	ice_aqc_opc_update_pkg				= 0x0C42,
 	ice_aqc_opc_get_pkg_info_list			= 0x0C43,
 
+	/* 1588/SyncE commands/events */
+	ice_aqc_opc_get_cgu_dpll_status			= 0x0C66,
+
 	ice_aqc_opc_driver_shared_params		= 0x0C90,
 
 	/* Standalone Commands/Events */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 35903b282885..8069141ac105 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -4644,6 +4644,42 @@ ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 	return ice_status_to_errno(status);
 }
 
+/**
+ * ice_aq_get_cgu_dpll_status
+ * @hw: pointer to the HW struct
+ * @dpll_num: DPLL index
+ * @ref_state: Reference clock state
+ * @dpll_state: DPLL state
+ * @phase_offset: Phase offset in ps
+ * @eec_mode: EEC mode
+ *
+ * Get CGU DPLL status (0x0C66)
+ */
+enum ice_status
+ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
+			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode)
+{
+	struct ice_aqc_get_cgu_dpll_status *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_cgu_dpll_status);
+	cmd = &desc.params.get_cgu_dpll_status;
+	cmd->dpll_num = dpll_num;
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status) {
+		*ref_state = cmd->ref_state;
+		*dpll_state = le16_to_cpu(cmd->dpll_state);
+		*phase_offset = le32_to_cpu(cmd->phase_offset_h);
+		*phase_offset <<= 32;
+		*phase_offset += le32_to_cpu(cmd->phase_offset_l);
+		*eec_mode = cmd->eec_mode;
+	}
+
+	return status;
+}
+
 /**
  * ice_replay_pre_init - replay pre initialization
  * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index b20a5c085246..aaed388a40a8 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -106,6 +106,7 @@ enum ice_status
 ice_aq_manage_mac_write(struct ice_hw *hw, const u8 *mac_addr, u8 flags,
 			struct ice_sq_cd *cd);
 bool ice_is_e810(struct ice_hw *hw);
+bool ice_is_e810t(struct ice_hw *hw);
 enum ice_status ice_clear_pf_cfg(struct ice_hw *hw);
 enum ice_status
 ice_aq_set_phy_cfg(struct ice_hw *hw, struct ice_port_info *pi,
@@ -162,6 +163,9 @@ ice_cfg_vsi_rdma(struct ice_port_info *pi, u16 vsi_handle, u16 tc_bitmap,
 int
 ice_ena_vsi_rdma_qset(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 		      u16 *rdma_qset, u16 num_qsets, u32 *qset_teid);
+enum ice_status
+ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
+			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode);
 int
 ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 		      u16 *q_id);
@@ -189,7 +193,6 @@ ice_stat_update40(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
 void
 ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
 		  u64 *prev_stat, u64 *cur_stat);
-bool ice_is_e810t(struct ice_hw *hw);
 enum ice_status
 ice_sched_query_elem(struct ice_hw *hw, u32 node_teid,
 		     struct ice_aqc_txsched_elem_data *buf);
diff --git a/drivers/net/ethernet/intel/ice/ice_devids.h b/drivers/net/ethernet/intel/ice/ice_devids.h
index 61dd2f18dee8..0b654d417d29 100644
--- a/drivers/net/ethernet/intel/ice/ice_devids.h
+++ b/drivers/net/ethernet/intel/ice/ice_devids.h
@@ -58,4 +58,7 @@
 /* Intel(R) Ethernet Connection E822-L 1GbE */
 #define ICE_DEV_ID_E822L_SGMII		0x189A
 
+#define ICE_SUBDEV_ID_E810T		0x000E
+#define ICE_SUBDEV_ID_E810T2		0x000F
+
 #endif /* _ICE_DEVIDS_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f099797f35e3..7fac27903ab4 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6240,6 +6240,50 @@ static void ice_napi_disable_all(struct ice_vsi *vsi)
 	}
 }
 
+/**
+ * ice_get_eec_state - get state of SyncE DPLL
+ * @netdev: network interface device structure
+ * @state: state of SyncE DPLL
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_eec_state(struct net_device *netdev, enum if_eec_state *state,
+		  struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	*state = pf->synce_dpll_state;
+
+	return 0;
+}
+
+/**
+ * ice_get_eec_src - get reference index of SyncE DPLL
+ * @netdev: network interface device structure
+ * @src: index of source reference of the SyncE DPLL
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_eec_src(struct net_device *netdev, u32 *src,
+		struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	*src = pf->synce_dpll_pin;
+
+	return 0;
+}
+
 /**
  * ice_down - Shutdown the connection
  * @vsi: The VSI being stopped
@@ -8601,4 +8645,6 @@ static const struct net_device_ops ice_netdev_ops = {
 	.ndo_bpf = ice_xdp,
 	.ndo_xdp_xmit = ice_xdp_xmit,
 	.ndo_xsk_wakeup = ice_xsk_wakeup,
+	.ndo_get_eec_state = ice_get_eec_state,
+	.ndo_get_eec_src = ice_get_eec_src,
 };
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index bf7247c6f58e..a38d0ab4d6d5 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -1766,6 +1766,36 @@ static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx)
 	}
 }
 
+static void ice_handle_cgu_state(struct ice_pf *pf)
+{
+	enum if_eec_state cgu_state;
+	u8 pin;
+
+	cgu_state = ice_get_zl_dpll_state(&pf->hw, ICE_CGU_DPLL_SYNCE, &pin);
+	if (pf->synce_dpll_state != cgu_state) {
+		pf->synce_dpll_state = cgu_state;
+		pf->synce_dpll_pin = pin;
+
+		dev_warn(ice_pf_to_dev(pf),
+			 "<DPLL%i> state changed to: %d, pin %d",
+			 ICE_CGU_DPLL_SYNCE,
+			 pf->synce_dpll_state,
+			 pin);
+	}
+
+	cgu_state = ice_get_zl_dpll_state(&pf->hw, ICE_CGU_DPLL_PTP, &pin);
+	if (pf->ptp_dpll_state != cgu_state) {
+		pf->ptp_dpll_state = cgu_state;
+		pf->ptp_dpll_pin = pin;
+
+		dev_warn(ice_pf_to_dev(pf),
+			 "<DPLL%i> state changed to: %d, pin %d",
+			 ICE_CGU_DPLL_PTP,
+			 pf->ptp_dpll_state,
+			 pin);
+	}
+}
+
 static void ice_ptp_periodic_work(struct kthread_work *work)
 {
 	struct ice_ptp *ptp = container_of(work, struct ice_ptp, work.work);
@@ -1774,6 +1804,9 @@ static void ice_ptp_periodic_work(struct kthread_work *work)
 	if (!test_bit(ICE_FLAG_PTP, pf->flags))
 		return;
 
+	if (ice_is_feature_supported(pf, ICE_F_CGU))
+		ice_handle_cgu_state(pf);
+
 	ice_ptp_update_cached_phctime(pf);
 
 	ice_ptp_tx_tstamp_cleanup(&pf->ptp.port.tx);
@@ -1958,3 +1991,4 @@ void ice_ptp_release(struct ice_pf *pf)
 
 	dev_info(ice_pf_to_dev(pf), "Removed PTP clock\n");
 }
+
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index aa257db36765..7a9482918a20 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -375,6 +375,54 @@ static int ice_ptp_port_cmd_e810(struct ice_hw *hw, enum ice_ptp_tmr_cmd cmd)
 	return 0;
 }
 
+/**
+ * ice_get_zl_dpll_state - get the state of the DPLL
+ * @hw: pointer to the hw struct
+ * @dpll_idx: Index of internal DPLL unit
+ * @pin: pointer to a buffer for returning currently active pin
+ *
+ * This function will read the state of the DPLL(dpll_idx). If optional
+ * parameter pin is given it'll be used to retrieve currently active pin.
+ *
+ * Return: state of the DPLL
+ */
+enum if_eec_state
+ice_get_zl_dpll_state(struct ice_hw *hw, u8 dpll_idx, u8 *pin)
+{
+	enum ice_status status;
+	u64 phase_offset;
+	u16 dpll_state;
+	u8 ref_state;
+	u8 eec_mode;
+
+	if (dpll_idx >= ICE_CGU_DPLL_MAX)
+		return IF_EEC_STATE_INVALID;
+
+	status = ice_aq_get_cgu_dpll_status(hw, dpll_idx, &ref_state,
+					    &dpll_state, &phase_offset,
+					    &eec_mode);
+	if (status)
+		return IF_EEC_STATE_INVALID;
+
+	if (pin) {
+		/* current ref pin in dpll_state_refsel_status_X register */
+		*pin = (dpll_state &
+			ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SEL) >>
+		       ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT;
+	}
+
+	if (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_LOCK) {
+		if (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY)
+			return IF_EEC_STATE_LOCKED_HO_ACQ;
+		else
+			return IF_EEC_STATE_LOCKED;
+	} else if ((dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO) &&
+		  (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY)) {
+		return IF_EEC_STATE_HOLDOVER;
+	}
+	return IF_EEC_STATE_FREERUN;
+}
+
 /* Device agnostic functions
  *
  * The following functions implement useful behavior to hide the differences
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index b2984b5c22c1..fcd543531b2c 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -33,6 +33,8 @@ int ice_ptp_init_phy_e810(struct ice_hw *hw);
 int ice_read_sma_ctrl_e810t(struct ice_hw *hw, u8 *data);
 int ice_write_sma_ctrl_e810t(struct ice_hw *hw, u8 data);
 bool ice_is_pca9575_present(struct ice_hw *hw);
+enum if_eec_state
+ice_get_zl_dpll_state(struct ice_hw *hw, u8 dpll_idx, u8 *pin);
 
 #define PFTSYN_SEM_BYTES	4
 
@@ -98,4 +100,24 @@ bool ice_is_pca9575_present(struct ice_hw *hw);
 #define ICE_SMA_MAX_BIT_E810T	7
 #define ICE_PCA9575_P1_OFFSET	8
 
+enum ice_e810t_cgu_dpll {
+	ICE_CGU_DPLL_SYNCE,
+	ICE_CGU_DPLL_PTP,
+	ICE_CGU_DPLL_MAX
+};
+
+enum ice_e810t_cgu_pins {
+	REF0P,
+	REF0N,
+	REF1P,
+	REF1N,
+	REF2P,
+	REF2N,
+	REF3P,
+	REF3N,
+	REF4P,
+	REF4N,
+	NUM_E810T_CGU_PINS
+};
+
 #endif /* _ICE_PTP_HW_H_ */
-- 
2.26.3



* [Intel-wired-lan] [PATCH v2 net-next 3/6] ice: add support for reading SyncE DPLL state
@ 2021-11-05 20:53   ` Maciej Machnikowski
  0 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: intel-wired-lan

Implement SyncE DPLL monitoring for E810-T devices.
The poll loop periodically checks the state of the DPLL and caches it
in the pf structure. State changes are logged in the system log.

The cached state can be read using the RTM_GETEECSTATE rtnetlink
message.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |  5 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 34 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   | 36 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  5 +-
 drivers/net/ethernet/intel/ice/ice_devids.h   |  3 ++
 drivers/net/ethernet/intel/ice/ice_main.c     | 46 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp.c      | 34 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   | 48 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   | 22 +++++++++
 9 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 3dc4caa41565..1dff7ca704d4 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -609,6 +609,11 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	enum if_eec_state synce_dpll_state;
+	u8 synce_dpll_pin;
+	enum if_eec_state ptp_dpll_state;
+	u8 ptp_dpll_pin;
 };
 
 struct ice_netdev_priv {
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 339c2a86f680..11226af7a9a4 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1808,6 +1808,36 @@ struct ice_aqc_add_rdma_qset_data {
 	struct ice_aqc_add_tx_rdma_qset_entry rdma_qsets[];
 };
 
+/* Get CGU DPLL status (direct 0x0C66) */
+struct ice_aqc_get_cgu_dpll_status {
+	u8 dpll_num;
+	u8 ref_state;
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_LOS		BIT(0)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_SCM		BIT(1)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_CFM		BIT(2)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_GST		BIT(3)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_PFM		BIT(4)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_REF_SW_ESYNC	BIT(6)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_FAST_LOCK_EN	BIT(7)
+	__le16 dpll_state;
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_LOCK		BIT(0)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO		BIT(1)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY	BIT(2)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_FLHIT		BIT(5)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_PSLHIT	BIT(7)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT	8
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SEL	\
+	ICE_M(0x1F, ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT)
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE_SHIFT	13
+#define ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE \
+	ICE_M(0x7, ICE_AQC_GET_CGU_DPLL_STATUS_STATE_MODE_SHIFT)
+	__le32 phase_offset_h;
+	__le32 phase_offset_l;
+	u8 eec_mode;
+	u8 rsvd[1];
+	__le16 node_handle;
+};
+
 /* Configure Firmware Logging Command (indirect 0xFF09)
  * Logging Information Read Response (indirect 0xFF10)
  * Note: The 0xFF10 command has no input parameters.
@@ -2039,6 +2069,7 @@ struct ice_aq_desc {
 		struct ice_aqc_fw_logging fw_logging;
 		struct ice_aqc_get_clear_fw_log get_clear_fw_log;
 		struct ice_aqc_download_pkg download_pkg;
+		struct ice_aqc_get_cgu_dpll_status get_cgu_dpll_status;
 		struct ice_aqc_driver_shared_params drv_shared_params;
 		struct ice_aqc_set_mac_lb set_mac_lb;
 		struct ice_aqc_alloc_free_res_cmd sw_res_ctrl;
@@ -2205,6 +2236,9 @@ enum ice_adminq_opc {
 	ice_aqc_opc_update_pkg				= 0x0C42,
 	ice_aqc_opc_get_pkg_info_list			= 0x0C43,
 
+	/* 1588/SyncE commands/events */
+	ice_aqc_opc_get_cgu_dpll_status			= 0x0C66,
+
 	ice_aqc_opc_driver_shared_params		= 0x0C90,
 
 	/* Standalone Commands/Events */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 35903b282885..8069141ac105 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -4644,6 +4644,42 @@ ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 	return ice_status_to_errno(status);
 }
 
+/**
+ * ice_aq_get_cgu_dpll_status
+ * @hw: pointer to the HW struct
+ * @dpll_num: DPLL index
+ * @ref_state: Reference clock state
+ * @dpll_state: DPLL state
+ * @phase_offset: Phase offset in ps
+ * @eec_mode: EEC mode
+ *
+ * Get CGU DPLL status (0x0C66)
+ */
+enum ice_status
+ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
+			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode)
+{
+	struct ice_aqc_get_cgu_dpll_status *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_cgu_dpll_status);
+	cmd = &desc.params.get_cgu_dpll_status;
+	cmd->dpll_num = dpll_num;
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status) {
+		*ref_state = cmd->ref_state;
+		*dpll_state = le16_to_cpu(cmd->dpll_state);
+		*phase_offset = le32_to_cpu(cmd->phase_offset_h);
+		*phase_offset <<= 32;
+		*phase_offset += le32_to_cpu(cmd->phase_offset_l);
+		*eec_mode = cmd->eec_mode;
+	}
+
+	return status;
+}
+
 /**
  * ice_replay_pre_init - replay pre initialization
  * @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index b20a5c085246..aaed388a40a8 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -106,6 +106,7 @@ enum ice_status
 ice_aq_manage_mac_write(struct ice_hw *hw, const u8 *mac_addr, u8 flags,
 			struct ice_sq_cd *cd);
 bool ice_is_e810(struct ice_hw *hw);
+bool ice_is_e810t(struct ice_hw *hw);
 enum ice_status ice_clear_pf_cfg(struct ice_hw *hw);
 enum ice_status
 ice_aq_set_phy_cfg(struct ice_hw *hw, struct ice_port_info *pi,
@@ -162,6 +163,9 @@ ice_cfg_vsi_rdma(struct ice_port_info *pi, u16 vsi_handle, u16 tc_bitmap,
 int
 ice_ena_vsi_rdma_qset(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 		      u16 *rdma_qset, u16 num_qsets, u32 *qset_teid);
+enum ice_status
+ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
+			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode);
 int
 ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 		      u16 *q_id);
@@ -189,7 +193,6 @@ ice_stat_update40(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
 void
 ice_stat_update32(struct ice_hw *hw, u32 reg, bool prev_stat_loaded,
 		  u64 *prev_stat, u64 *cur_stat);
-bool ice_is_e810t(struct ice_hw *hw);
 enum ice_status
 ice_sched_query_elem(struct ice_hw *hw, u32 node_teid,
 		     struct ice_aqc_txsched_elem_data *buf);
diff --git a/drivers/net/ethernet/intel/ice/ice_devids.h b/drivers/net/ethernet/intel/ice/ice_devids.h
index 61dd2f18dee8..0b654d417d29 100644
--- a/drivers/net/ethernet/intel/ice/ice_devids.h
+++ b/drivers/net/ethernet/intel/ice/ice_devids.h
@@ -58,4 +58,7 @@
 /* Intel(R) Ethernet Connection E822-L 1GbE */
 #define ICE_DEV_ID_E822L_SGMII		0x189A
 
+#define ICE_SUBDEV_ID_E810T		0x000E
+#define ICE_SUBDEV_ID_E810T2		0x000F
+
 #endif /* _ICE_DEVIDS_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f099797f35e3..7fac27903ab4 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6240,6 +6240,50 @@ static void ice_napi_disable_all(struct ice_vsi *vsi)
 	}
 }
 
+/**
+ * ice_get_eec_state - get state of SyncE DPLL
+ * @netdev: network interface device structure
+ * @state: state of SyncE DPLL
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_eec_state(struct net_device *netdev, enum if_eec_state *state,
+		  struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	*state = pf->synce_dpll_state;
+
+	return 0;
+}
+
+/**
+ * ice_get_eec_src - get reference index of SyncE DPLL
+ * @netdev: network interface device structure
+ * @src: index of source reference of the SyncE DPLL
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_eec_src(struct net_device *netdev, u32 *src,
+		struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	*src = pf->synce_dpll_pin;
+
+	return 0;
+}
+
 /**
  * ice_down - Shutdown the connection
  * @vsi: The VSI being stopped
@@ -8601,4 +8645,6 @@ static const struct net_device_ops ice_netdev_ops = {
 	.ndo_bpf = ice_xdp,
 	.ndo_xdp_xmit = ice_xdp_xmit,
 	.ndo_xsk_wakeup = ice_xsk_wakeup,
+	.ndo_get_eec_state = ice_get_eec_state,
+	.ndo_get_eec_src = ice_get_eec_src,
 };
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index bf7247c6f58e..a38d0ab4d6d5 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -1766,6 +1766,36 @@ static void ice_ptp_tx_tstamp_cleanup(struct ice_ptp_tx *tx)
 	}
 }
 
+static void ice_handle_cgu_state(struct ice_pf *pf)
+{
+	enum if_eec_state cgu_state;
+	u8 pin;
+
+	cgu_state = ice_get_zl_dpll_state(&pf->hw, ICE_CGU_DPLL_SYNCE, &pin);
+	if (pf->synce_dpll_state != cgu_state) {
+		pf->synce_dpll_state = cgu_state;
+		pf->synce_dpll_pin = pin;
+
+		dev_warn(ice_pf_to_dev(pf),
+			 "<DPLL%i> state changed to: %d, pin %d\n",
+			 ICE_CGU_DPLL_SYNCE,
+			 pf->synce_dpll_state,
+			 pin);
+	}
+
+	cgu_state = ice_get_zl_dpll_state(&pf->hw, ICE_CGU_DPLL_PTP, &pin);
+	if (pf->ptp_dpll_state != cgu_state) {
+		pf->ptp_dpll_state = cgu_state;
+		pf->ptp_dpll_pin = pin;
+
+		dev_warn(ice_pf_to_dev(pf),
+			 "<DPLL%i> state changed to: %d, pin %d\n",
+			 ICE_CGU_DPLL_PTP,
+			 pf->ptp_dpll_state,
+			 pin);
+	}
+}
+
 static void ice_ptp_periodic_work(struct kthread_work *work)
 {
 	struct ice_ptp *ptp = container_of(work, struct ice_ptp, work.work);
@@ -1774,6 +1804,9 @@ static void ice_ptp_periodic_work(struct kthread_work *work)
 	if (!test_bit(ICE_FLAG_PTP, pf->flags))
 		return;
 
+	if (ice_is_feature_supported(pf, ICE_F_CGU))
+		ice_handle_cgu_state(pf);
+
 	ice_ptp_update_cached_phctime(pf);
 
 	ice_ptp_tx_tstamp_cleanup(&pf->ptp.port.tx);
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index aa257db36765..7a9482918a20 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -375,6 +375,54 @@ static int ice_ptp_port_cmd_e810(struct ice_hw *hw, enum ice_ptp_tmr_cmd cmd)
 	return 0;
 }
 
+/**
+ * ice_get_zl_dpll_state - get the state of the DPLL
+ * @hw: pointer to the hw struct
+ * @dpll_idx: Index of internal DPLL unit
+ * @pin: pointer to a buffer for returning currently active pin
+ *
+ * This function reads the state of the DPLL selected by dpll_idx. If the
+ * optional pin parameter is given, the currently active pin is returned in it.
+ *
+ * Return: state of the DPLL
+ */
+enum if_eec_state
+ice_get_zl_dpll_state(struct ice_hw *hw, u8 dpll_idx, u8 *pin)
+{
+	enum ice_status status;
+	u64 phase_offset;
+	u16 dpll_state;
+	u8 ref_state;
+	u8 eec_mode;
+
+	if (dpll_idx >= ICE_CGU_DPLL_MAX)
+		return IF_EEC_STATE_INVALID;
+
+	status = ice_aq_get_cgu_dpll_status(hw, dpll_idx, &ref_state,
+					    &dpll_state, &phase_offset,
+					    &eec_mode);
+	if (status)
+		return IF_EEC_STATE_INVALID;
+
+	if (pin) {
+		/* current ref pin in dpll_state_refsel_status_X register */
+		*pin = (dpll_state &
+			ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SEL) >>
+		       ICE_AQC_GET_CGU_DPLL_STATUS_STATE_CLK_REF_SHIFT;
+	}
+
+	if (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_LOCK) {
+		if (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY)
+			return IF_EEC_STATE_LOCKED_HO_ACQ;
+		else
+			return IF_EEC_STATE_LOCKED;
+	} else if ((dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO) &&
+		  (dpll_state & ICE_AQC_GET_CGU_DPLL_STATUS_STATE_HO_READY)) {
+		return IF_EEC_STATE_HOLDOVER;
+	}
+	return IF_EEC_STATE_FREERUN;
+}
+
 /* Device agnostic functions
  *
  * The following functions implement useful behavior to hide the differences
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index b2984b5c22c1..fcd543531b2c 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -33,6 +33,8 @@ int ice_ptp_init_phy_e810(struct ice_hw *hw);
 int ice_read_sma_ctrl_e810t(struct ice_hw *hw, u8 *data);
 int ice_write_sma_ctrl_e810t(struct ice_hw *hw, u8 data);
 bool ice_is_pca9575_present(struct ice_hw *hw);
+enum if_eec_state
+ice_get_zl_dpll_state(struct ice_hw *hw, u8 dpll_idx, u8 *pin);
 
 #define PFTSYN_SEM_BYTES	4
 
@@ -98,4 +100,24 @@ bool ice_is_pca9575_present(struct ice_hw *hw);
 #define ICE_SMA_MAX_BIT_E810T	7
 #define ICE_PCA9575_P1_OFFSET	8
 
+enum ice_e810t_cgu_dpll {
+	ICE_CGU_DPLL_SYNCE,
+	ICE_CGU_DPLL_PTP,
+	ICE_CGU_DPLL_MAX
+};
+
+enum ice_e810t_cgu_pins {
+	REF0P,
+	REF0N,
+	REF1P,
+	REF1N,
+	REF2P,
+	REF2N,
+	REF3P,
+	REF3N,
+	REF4P,
+	REF4N,
+	NUM_E810T_CGU_PINS
+};
+
 #endif /* _ICE_PTP_HW_H_ */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 net-next 4/6] rtnetlink: Add support for SyncE recovered clock configuration
  2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-05 20:53   ` Maciej Machnikowski
  -1 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

Add support for RTNL messages for reading/configuring SyncE recovered
clocks.
The messages are:
RTM_GETRCLKRANGE: Reads the allowed pin index range for the recovered
		  clock outputs. This can be aligned to PHY outputs or
		  to EEC inputs, whichever is better for a given
		  application

RTM_GETRCLKSTATE: Reads the state of the pins that output a recovered
		  clock from a given port. The message contains the
		  number of assigned clocks (IFLA_RCLK_STATE_COUNT) and
		  N pin indexes in IFLA_RCLK_STATE_OUT_IDX

RTM_SETRCLKSTATE: Sets the redirection of the recovered clock for
		  a given pin

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 include/linux/netdevice.h      |   9 ++
 include/uapi/linux/if_link.h   |  26 +++++
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/rtnetlink.c           | 174 +++++++++++++++++++++++++++++++++
 security/selinux/nlmsgtab.c    |   3 +
 5 files changed, 219 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef2b381dae0c..708bd8336155 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1576,6 +1576,15 @@ struct net_device_ops {
 	int			(*ndo_get_eec_src)(struct net_device *dev,
 						   u32 *src,
 						   struct netlink_ext_ack *extack);
+	int			(*ndo_get_rclk_range)(struct net_device *dev,
+						      u32 *min_idx, u32 *max_idx,
+						      struct netlink_ext_ack *extack);
+	int			(*ndo_set_rclk_out)(struct net_device *dev,
+						    u32 out_idx, bool ena,
+						    struct netlink_ext_ack *extack);
+	int			(*ndo_get_rclk_state)(struct net_device *dev,
+						      u32 out_idx, bool *ena,
+						      struct netlink_ext_ack *extack);
 };
 
 /**
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8eae80f287e9..e27c153cfba3 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1304,4 +1304,30 @@ enum {
 
 #define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)
 
+struct if_rclk_range_msg {
+	__u32 ifindex;
+};
+
+enum {
+	IFLA_RCLK_RANGE_UNSPEC,
+	IFLA_RCLK_RANGE_MIN_PIN,
+	IFLA_RCLK_RANGE_MAX_PIN,
+	__IFLA_RCLK_RANGE_MAX,
+};
+
+struct if_set_rclk_msg {
+	__u32 ifindex;
+	__u32 out_idx;
+	__u32 flags;
+};
+
+#define SET_RCLK_FLAGS_ENA	(1U << 0)
+
+enum {
+	IFLA_RCLK_STATE_UNSPEC,
+	IFLA_RCLK_STATE_OUT_IDX,
+	IFLA_RCLK_STATE_COUNT,
+	__IFLA_RCLK_STATE_MAX,
+};
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 1d8662afd6bd..6c0d96d56ec7 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -185,6 +185,13 @@ enum {
 	RTM_GETNEXTHOPBUCKET,
 #define RTM_GETNEXTHOPBUCKET	RTM_GETNEXTHOPBUCKET
 
+	RTM_GETRCLKRANGE = 120,
+#define RTM_GETRCLKRANGE	RTM_GETRCLKRANGE
+	RTM_GETRCLKSTATE = 121,
+#define RTM_GETRCLKSTATE	RTM_GETRCLKSTATE
+	RTM_SETRCLKSTATE = 122,
+#define RTM_SETRCLKSTATE	RTM_SETRCLKSTATE
+
 	RTM_GETEECSTATE = 124,
 #define RTM_GETEECSTATE	RTM_GETEECSTATE
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 03bc773d0e69..bc1e050f6d38 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -5544,6 +5544,176 @@ static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
+static int rtnl_fill_rclk_range(struct sk_buff *skb, struct net_device *dev,
+				u32 portid, u32 seq,
+				struct netlink_callback *cb, int flags,
+				struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct if_rclk_range_msg *state_msg;
+	struct nlmsghdr *nlh;
+	u32 min_idx, max_idx;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!ops->ndo_get_rclk_range)
+		return -EOPNOTSUPP;
+
+	err = ops->ndo_get_rclk_range(dev, &min_idx, &max_idx, extack);
+	if (err)
+		return err;
+
+	nlh = nlmsg_put(skb, portid, seq, RTM_GETRCLKRANGE, sizeof(*state_msg),
+			flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	state_msg = nlmsg_data(nlh);
+	state_msg->ifindex = dev->ifindex;
+
+	if (nla_put_u32(skb, IFLA_RCLK_RANGE_MIN_PIN, min_idx) ||
+	    nla_put_u32(skb, IFLA_RCLK_RANGE_MAX_PIN, max_idx))
+		return -EMSGSIZE;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+
+static int rtnl_rclk_range_get(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_rclk_range_msg *state;
+	struct net_device *dev;
+	struct sk_buff *nskb;
+	int err;
+
+	state = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, state->ifindex);
+	if (!dev) {
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
+		return -ENODEV;
+	}
+
+	nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!nskb)
+		return -ENOBUFS;
+
+	err = rtnl_fill_rclk_range(nskb, dev, NETLINK_CB(skb).portid,
+				   nlh->nlmsg_seq, NULL, nlh->nlmsg_flags,
+				   extack);
+	if (err < 0)
+		kfree_skb(nskb);
+	else
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+
+	return err;
+}
+
+static int rtnl_fill_rclk_state(struct sk_buff *skb, struct net_device *dev,
+				u32 portid, u32 seq,
+				struct netlink_callback *cb, int flags,
+				struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	u32 min_idx, max_idx, src_idx, count = 0;
+	struct if_eec_state_msg *state_msg;
+	struct nlmsghdr *nlh;
+	bool ena;
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!ops->ndo_get_rclk_state || !ops->ndo_get_rclk_range)
+		return -EOPNOTSUPP;
+
+	err = ops->ndo_get_rclk_range(dev, &min_idx, &max_idx, extack);
+	if (err)
+		return err;
+
+	nlh = nlmsg_put(skb, portid, seq, RTM_GETRCLKSTATE, sizeof(*state_msg),
+			flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	state_msg = nlmsg_data(nlh);
+	state_msg->ifindex = dev->ifindex;
+
+	for (src_idx = min_idx; src_idx <= max_idx; src_idx++) {
+		err = ops->ndo_get_rclk_state(dev, src_idx, &ena, extack);
+		if (err || !ena)
+			continue;
+
+		if (nla_put_u32(skb, IFLA_RCLK_STATE_OUT_IDX, src_idx))
+			return -EMSGSIZE;
+		count++;
+	}
+
+	if (nla_put_u32(skb, IFLA_RCLK_STATE_COUNT, count))
+		return -EMSGSIZE;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+
+static int rtnl_rclk_state_get(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_eec_state_msg *state;
+	struct net_device *dev;
+	struct sk_buff *nskb;
+	int err;
+
+	state = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, state->ifindex);
+	if (!dev) {
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
+		return -ENODEV;
+	}
+
+	nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!nskb)
+		return -ENOBUFS;
+
+	err = rtnl_fill_rclk_state(nskb, dev, NETLINK_CB(skb).portid,
+				   nlh->nlmsg_seq, NULL, nlh->nlmsg_flags,
+				   extack);
+	if (err < 0)
+		kfree_skb(nskb);
+	else
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+
+	return err;
+}
+
+static int rtnl_rclk_set(struct sk_buff *skb, struct nlmsghdr *nlh,
+			 struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct if_set_rclk_msg *state;
+	struct net_device *dev;
+	bool ena;
+	int err;
+
+	state = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, state->ifindex);
+	if (!dev) {
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
+		return -ENODEV;
+	}
+
+	if (!dev->netdev_ops->ndo_set_rclk_out)
+		return -EOPNOTSUPP;
+
+	ena = !!(state->flags & SET_RCLK_FLAGS_ENA);
+	err = dev->netdev_ops->ndo_set_rclk_out(dev, state->out_idx, ena,
+						extack);
+
+	return err;
+}
+
 /* Process one rtnetlink message. */
 
 static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -5770,5 +5940,9 @@ void __init rtnetlink_init(void)
 	rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump,
 		      0);
 
+	rtnl_register(PF_UNSPEC, RTM_GETRCLKRANGE, rtnl_rclk_range_get, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_GETRCLKSTATE, rtnl_rclk_state_get, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_SETRCLKSTATE, rtnl_rclk_set, NULL, 0);
+
 	rtnl_register(PF_UNSPEC, RTM_GETEECSTATE, rtnl_eec_state_get, NULL, 0);
 }
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 2c66e722ea9c..57c7c85edd4d 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -91,6 +91,9 @@ static const struct nlmsg_perm nlmsg_route_perms[] =
 	{ RTM_NEWNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETNEXTHOPBUCKET,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_GETRCLKRANGE,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_GETRCLKSTATE,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
+	{ RTM_SETRCLKSTATE,	NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETEECSTATE,	NETLINK_ROUTE_SOCKET__NLMSG_READ  },
 };
 
-- 
2.26.3



* [PATCH v2 net-next 5/6] ice: add support for SyncE recovered clocks
  2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-05 20:53   ` Maciej Machnikowski
  -1 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

Implement NDO functions for handling SyncE recovered clocks.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 53 +++++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   | 65 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  6 ++
 drivers/net/ethernet/intel/ice/ice_main.c     | 91 +++++++++++++++++++
 include/linux/netdevice.h                     | 11 +++
 5 files changed, 226 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 11226af7a9a4..dace00a35c44 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1281,6 +1281,31 @@ struct ice_aqc_set_mac_lb {
 	u8 reserved[15];
 };
 
+/* Set PHY recovered clock output (direct 0x0630) */
+struct ice_aqc_set_phy_rec_clk_out {
+	u8 phy_output;
+	u8 port_num;
+	u8 flags;
+#define ICE_AQC_SET_PHY_REC_CLK_OUT_OUT_EN	BIT(0)
+#define ICE_AQC_SET_PHY_REC_CLK_OUT_CURR_PORT	0xFF
+	u8 rsvd;
+	__le32 freq;
+	u8 rsvd2[6];
+	__le16 node_handle;
+};
+
+/* Get PHY recovered clock output (direct 0x0631) */
+struct ice_aqc_get_phy_rec_clk_out {
+	u8 phy_output;
+	u8 port_num;
+	u8 flags;
+#define ICE_AQC_GET_PHY_REC_CLK_OUT_OUT_EN	BIT(0)
+	u8 rsvd;
+	__le32 freq;
+	u8 rsvd2[6];
+	__le16 node_handle;
+};
+
 struct ice_aqc_link_topo_params {
 	u8 lport_num;
 	u8 lport_num_valid;
@@ -1838,6 +1863,28 @@ struct ice_aqc_get_cgu_dpll_status {
 	__le16 node_handle;
 };
 
+/* Read CGU register (direct 0x0C6E) */
+struct ice_aqc_read_cgu_reg {
+	__le16 offset;
+#define ICE_AQC_READ_CGU_REG_MAX_DATA_LEN	16
+	u8 data_len;
+	u8 rsvd[13];
+};
+
+/* Read CGU register response (direct 0x0C6E) */
+struct ice_aqc_read_cgu_reg_resp {
+	u8 data[ICE_AQC_READ_CGU_REG_MAX_DATA_LEN];
+};
+
+/* Write CGU register (direct 0x0C6F) */
+struct ice_aqc_write_cgu_reg {
+	__le16 offset;
+#define ICE_AQC_WRITE_CGU_REG_MAX_DATA_LEN	7
+	u8 data_len;
+	u8 data[ICE_AQC_WRITE_CGU_REG_MAX_DATA_LEN];
+	u8 rsvd[6];
+};
+
 /* Configure Firmware Logging Command (indirect 0xFF09)
  * Logging Information Read Response (indirect 0xFF10)
  * Note: The 0xFF10 command has no input parameters.
@@ -2033,6 +2080,8 @@ struct ice_aq_desc {
 		struct ice_aqc_get_phy_caps get_phy;
 		struct ice_aqc_set_phy_cfg set_phy;
 		struct ice_aqc_restart_an restart_an;
+		struct ice_aqc_set_phy_rec_clk_out set_phy_rec_clk_out;
+		struct ice_aqc_get_phy_rec_clk_out get_phy_rec_clk_out;
 		struct ice_aqc_gpio read_write_gpio;
 		struct ice_aqc_sff_eeprom read_write_sff_param;
 		struct ice_aqc_set_port_id_led set_port_id_led;
@@ -2188,6 +2237,8 @@ enum ice_adminq_opc {
 	ice_aqc_opc_get_link_status			= 0x0607,
 	ice_aqc_opc_set_event_mask			= 0x0613,
 	ice_aqc_opc_set_mac_lb				= 0x0620,
+	ice_aqc_opc_set_phy_rec_clk_out			= 0x0630,
+	ice_aqc_opc_get_phy_rec_clk_out			= 0x0631,
 	ice_aqc_opc_get_link_topo			= 0x06E0,
 	ice_aqc_opc_set_port_id_led			= 0x06E9,
 	ice_aqc_opc_set_gpio				= 0x06EC,
@@ -2238,6 +2289,8 @@ enum ice_adminq_opc {
 
 	/* 1588/SyncE commands/events */
 	ice_aqc_opc_get_cgu_dpll_status			= 0x0C66,
+	ice_aqc_opc_read_cgu_reg			= 0x0C6E,
+	ice_aqc_opc_write_cgu_reg			= 0x0C6F,
 
 	ice_aqc_opc_driver_shared_params		= 0x0C90,
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 8069141ac105..29d302ea1e56 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -5242,3 +5242,68 @@ bool ice_is_clock_mux_present_e810t(struct ice_hw *hw)
 	return true;
 }
 
+/**
+ * ice_aq_set_phy_rec_clk_out - set RCLK phy out
+ * @hw: pointer to the HW struct
+ * @phy_output: PHY reference clock output pin
+ * @enable: GPIO state to be applied
+ * @freq: PHY output frequency
+ *
+ * Set PHY recovered clock output (0x0630)
+ * Return 0 on success or negative value on failure.
+ */
+enum ice_status
+ice_aq_set_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, bool enable,
+			   u32 *freq)
+{
+	struct ice_aqc_set_phy_rec_clk_out *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_phy_rec_clk_out);
+	cmd = &desc.params.set_phy_rec_clk_out;
+	cmd->phy_output = phy_output;
+	cmd->port_num = ICE_AQC_SET_PHY_REC_CLK_OUT_CURR_PORT;
+	cmd->flags = enable ? ICE_AQC_SET_PHY_REC_CLK_OUT_OUT_EN : 0;
+	cmd->freq = cpu_to_le32(*freq);
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status)
+		*freq = le32_to_cpu(cmd->freq);
+
+	return status;
+}
+
+/**
+ * ice_aq_get_phy_rec_clk_out - get PHY recovered clock output
+ * @hw: pointer to the HW struct
+ * @phy_output: PHY reference clock output pin
+ * @port_num: Port number
+ * @flags: PHY flags
+ * @freq: PHY output frequency
+ *
+ * Get PHY recovered clock output (0x0631)
+ */
+enum ice_status
+ice_aq_get_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, u8 *port_num,
+			   u8 *flags, u32 *freq)
+{
+	struct ice_aqc_get_phy_rec_clk_out *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_phy_rec_clk_out);
+	cmd = &desc.params.get_phy_rec_clk_out;
+	cmd->phy_output = phy_output;
+	cmd->port_num = *port_num;
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status) {
+		*port_num = cmd->port_num;
+		*flags = cmd->flags;
+		*freq = le32_to_cpu(cmd->freq);
+	}
+
+	return status;
+}
+
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index aaed388a40a8..8a99c8364173 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -166,6 +166,12 @@ ice_ena_vsi_rdma_qset(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 enum ice_status
 ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
 			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode);
+enum ice_status
+ice_aq_set_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, bool enable,
+			   u32 *freq);
+enum ice_status
+ice_aq_get_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, u8 *port_num,
+			   u8 *flags, u32 *freq);
 int
 ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 		      u16 *q_id);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 7fac27903ab4..98834aa3f3dc 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6284,6 +6284,94 @@ ice_get_eec_src(struct net_device *netdev, u32 *src,
 	return 0;
 }
 
+/**
+ * ice_get_rclk_range - get range of recovered clock indices
+ * @netdev: network interface device structure
+ * @min_idx: min rclk index
+ * @max_idx: max rclk index
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_rclk_range(struct net_device *netdev, u32 *min_idx, u32 *max_idx,
+		   struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	*min_idx = REF1P;
+	*max_idx = REF1N;
+
+	return 0;
+}
+
+/**
+ * ice_set_rclk_out - set recovered clock redirection to the output pin
+ * @netdev: network interface device structure
+ * @out_idx: output index
+ * @ena: true will enable redirection, false will disable it
+ * @extack: netlink extended ack
+ */
+static int
+ice_set_rclk_out(struct net_device *netdev, u32 out_idx, bool ena,
+		 struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+	enum ice_status ret;
+	u32 freq;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	if (out_idx < REF1P || out_idx > REF1N)
+		return -EINVAL;
+
+	ret = ice_aq_set_phy_rec_clk_out(&pf->hw, out_idx - REF1P, ena, &freq);
+
+	return ice_status_to_errno(ret);
+}
+
+/**
+ * ice_get_rclk_state - Get state of recovered clock pin for a given netdev
+ * @netdev: network interface device structure
+ * @out_idx: output index
+ * @ena: returns true if the pin is enabled
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_rclk_state(struct net_device *netdev, u32 out_idx, bool *ena,
+		   struct netlink_ext_ack *extack)
+{
+	u8 port_num = ICE_AQC_SET_PHY_REC_CLK_OUT_CURR_PORT;
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+	enum ice_status ret;
+	u32 freq;
+	u8 flags;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	if (out_idx < REF1P || out_idx > REF1N)
+		return -EINVAL;
+
+	ret = ice_aq_get_phy_rec_clk_out(&pf->hw, out_idx - REF1P, &port_num,
+					 &flags, &freq);
+
+	if (!ret && (flags & ICE_AQC_GET_PHY_REC_CLK_OUT_OUT_EN))
+		*ena = true;
+	else
+		*ena = false;
+
+	return ice_status_to_errno(ret);
+}
+
 /**
  * ice_down - Shutdown the connection
  * @vsi: The VSI being stopped
@@ -8647,4 +8735,7 @@ static const struct net_device_ops ice_netdev_ops = {
 	.ndo_xsk_wakeup = ice_xsk_wakeup,
 	.ndo_get_eec_state = ice_get_eec_state,
 	.ndo_get_eec_src = ice_get_eec_src,
+	.ndo_get_rclk_range = ice_get_rclk_range,
+	.ndo_set_rclk_out = ice_set_rclk_out,
+	.ndo_get_rclk_state = ice_get_rclk_state,
 };
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 708bd8336155..9faa005506d1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1351,6 +1351,17 @@ struct netdev_net_notifier {
  *			  struct netlink_ext_ack *extack);
  *	Get the index of the source signal that's currently used as EEC's
  *	reference
+ * int (*ndo_get_rclk_range)(struct net_device *dev, u32 *min_idx, u32 *max_idx,
+ *			     struct netlink_ext_ack *extack);
+ *	Get range of valid output indices for the set/get Recovered Clock
+ *	functions
+ * int (*ndo_set_rclk_out)(struct net_device *dev, u32 out_idx, bool ena,
+ *			   struct netlink_ext_ack *extack);
+ *	Set the receive clock recovery redirection to a given Recovered Clock
+ *	output.
+ * int (*ndo_get_rclk_state)(struct net_device *dev, u32 out_idx, bool *ena,
+ *			     struct netlink_ext_ack *extack);
+ *	Get current state of the recovered clock to pin mapping.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 5/6] ice: add support for SyncE recovered clocks
@ 2021-11-05 20:53   ` Maciej Machnikowski
  0 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: intel-wired-lan

Implement NDO functions for handling SyncE recovered clocks.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 53 +++++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   | 65 +++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  6 ++
 drivers/net/ethernet/intel/ice/ice_main.c     | 91 +++++++++++++++++++
 include/linux/netdevice.h                     | 11 +++
 5 files changed, 226 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 11226af7a9a4..dace00a35c44 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1281,6 +1281,31 @@ struct ice_aqc_set_mac_lb {
 	u8 reserved[15];
 };
 
+/* Set PHY recovered clock output (direct 0x0630) */
+struct ice_aqc_set_phy_rec_clk_out {
+	u8 phy_output;
+	u8 port_num;
+	u8 flags;
+#define ICE_AQC_SET_PHY_REC_CLK_OUT_OUT_EN	BIT(0)
+#define ICE_AQC_SET_PHY_REC_CLK_OUT_CURR_PORT	0xFF
+	u8 rsvd;
+	__le32 freq;
+	u8 rsvd2[6];
+	__le16 node_handle;
+};
+
+/* Get PHY recovered clock output (direct 0x0631) */
+struct ice_aqc_get_phy_rec_clk_out {
+	u8 phy_output;
+	u8 port_num;
+	u8 flags;
+#define ICE_AQC_GET_PHY_REC_CLK_OUT_OUT_EN	BIT(0)
+	u8 rsvd;
+	__le32 freq;
+	u8 rsvd2[6];
+	__le16 node_handle;
+};
+
 struct ice_aqc_link_topo_params {
 	u8 lport_num;
 	u8 lport_num_valid;
@@ -1838,6 +1863,28 @@ struct ice_aqc_get_cgu_dpll_status {
 	__le16 node_handle;
 };
 
+/* Read CGU register (direct 0x0C6E) */
+struct ice_aqc_read_cgu_reg {
+	__le16 offset;
+#define ICE_AQC_READ_CGU_REG_MAX_DATA_LEN	16
+	u8 data_len;
+	u8 rsvd[13];
+};
+
+/* Read CGU register response (direct 0x0C6E) */
+struct ice_aqc_read_cgu_reg_resp {
+	u8 data[ICE_AQC_READ_CGU_REG_MAX_DATA_LEN];
+};
+
+/* Write CGU register (direct 0x0C6F) */
+struct ice_aqc_write_cgu_reg {
+	__le16 offset;
+#define ICE_AQC_WRITE_CGU_REG_MAX_DATA_LEN	7
+	u8 data_len;
+	u8 data[ICE_AQC_WRITE_CGU_REG_MAX_DATA_LEN];
+	u8 rsvd[6];
+};
+
 /* Configure Firmware Logging Command (indirect 0xFF09)
  * Logging Information Read Response (indirect 0xFF10)
  * Note: The 0xFF10 command has no input parameters.
@@ -2033,6 +2080,8 @@ struct ice_aq_desc {
 		struct ice_aqc_get_phy_caps get_phy;
 		struct ice_aqc_set_phy_cfg set_phy;
 		struct ice_aqc_restart_an restart_an;
+		struct ice_aqc_set_phy_rec_clk_out set_phy_rec_clk_out;
+		struct ice_aqc_get_phy_rec_clk_out get_phy_rec_clk_out;
 		struct ice_aqc_gpio read_write_gpio;
 		struct ice_aqc_sff_eeprom read_write_sff_param;
 		struct ice_aqc_set_port_id_led set_port_id_led;
@@ -2188,6 +2237,8 @@ enum ice_adminq_opc {
 	ice_aqc_opc_get_link_status			= 0x0607,
 	ice_aqc_opc_set_event_mask			= 0x0613,
 	ice_aqc_opc_set_mac_lb				= 0x0620,
+	ice_aqc_opc_set_phy_rec_clk_out			= 0x0630,
+	ice_aqc_opc_get_phy_rec_clk_out			= 0x0631,
 	ice_aqc_opc_get_link_topo			= 0x06E0,
 	ice_aqc_opc_set_port_id_led			= 0x06E9,
 	ice_aqc_opc_set_gpio				= 0x06EC,
@@ -2238,6 +2289,8 @@ enum ice_adminq_opc {
 
 	/* 1588/SyncE commands/events */
 	ice_aqc_opc_get_cgu_dpll_status			= 0x0C66,
+	ice_aqc_opc_read_cgu_reg			= 0x0C6E,
+	ice_aqc_opc_write_cgu_reg			= 0x0C6F,
 
 	ice_aqc_opc_driver_shared_params		= 0x0C90,
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 8069141ac105..29d302ea1e56 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -5242,3 +5242,68 @@ bool ice_is_clock_mux_present_e810t(struct ice_hw *hw)
 	return true;
 }
 
+/**
+ * ice_aq_set_phy_rec_clk_out - set RCLK phy out
+ * @hw: pointer to the HW struct
+ * @phy_output: PHY reference clock output pin
+ * @enable: GPIO state to be applied
+ * @freq: PHY output frequency
+ *
+ * Set PHY recovered clock output (0x0630)
+ * Return the ice_status of the admin queue command.
+ */
+enum ice_status
+ice_aq_set_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, bool enable,
+			   u32 *freq)
+{
+	struct ice_aqc_set_phy_rec_clk_out *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_phy_rec_clk_out);
+	cmd = &desc.params.set_phy_rec_clk_out;
+	cmd->phy_output = phy_output;
+	cmd->port_num = ICE_AQC_SET_PHY_REC_CLK_OUT_CURR_PORT;
+	cmd->flags = enable & ICE_AQC_SET_PHY_REC_CLK_OUT_OUT_EN;
+	cmd->freq = cpu_to_le32(*freq);
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status)
+		*freq = le32_to_cpu(cmd->freq);
+
+	return status;
+}
+
+/**
+ * ice_aq_get_phy_rec_clk_out - get PHY recovered clock output
+ * @hw: pointer to the HW struct
+ * @phy_output: PHY reference clock output pin
+ * @port_num: Port number
+ * @flags: PHY flags
+ * @freq: PHY output frequency
+ *
+ * Get PHY recovered clock output (0x0631)
+ */
+enum ice_status
+ice_aq_get_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, u8 *port_num,
+			   u8 *flags, u32 *freq)
+{
+	struct ice_aqc_get_phy_rec_clk_out *cmd;
+	struct ice_aq_desc desc;
+	enum ice_status status;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_phy_rec_clk_out);
+	cmd = &desc.params.get_phy_rec_clk_out;
+	cmd->phy_output = phy_output;
+	cmd->port_num = *port_num;
+
+	status = ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+	if (!status) {
+		*port_num = cmd->port_num;
+		*flags = cmd->flags;
+		*freq = le32_to_cpu(cmd->freq);
+	}
+
+	return status;
+}
+
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index aaed388a40a8..8a99c8364173 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -166,6 +166,12 @@ ice_ena_vsi_rdma_qset(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 enum ice_status
 ice_aq_get_cgu_dpll_status(struct ice_hw *hw, u8 dpll_num, u8 *ref_state,
 			   u16 *dpll_state, u64 *phase_offset, u8 *eec_mode);
+enum ice_status
+ice_aq_set_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, bool enable,
+			   u32 *freq);
+enum ice_status
+ice_aq_get_phy_rec_clk_out(struct ice_hw *hw, u8 phy_output, u8 *port_num,
+			   u8 *flags, u32 *freq);
 int
 ice_dis_vsi_rdma_qset(struct ice_port_info *pi, u16 count, u32 *qset_teid,
 		      u16 *q_id);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 7fac27903ab4..98834aa3f3dc 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6284,6 +6284,94 @@ ice_get_eec_src(struct net_device *netdev, u32 *src,
 	return 0;
 }
 
+/**
+ * ice_get_rclk_range - get range of recovered clock indices
+ * @netdev: network interface device structure
+ * @min_idx: min rclk index
+ * @max_idx: max rclk index
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_rclk_range(struct net_device *netdev, u32 *min_idx, u32 *max_idx,
+		   struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	*min_idx = REF1P;
+	*max_idx = REF1N;
+
+	return 0;
+}
+
+/**
+ * ice_set_rclk_out - set recovered clock redirection to the output pin
+ * @netdev: network interface device structure
+ * @out_idx: output index
+ * @ena: true will enable redirection, false will disable it
+ * @extack: netlink extended ack
+ */
+static int
+ice_set_rclk_out(struct net_device *netdev, u32 out_idx, bool ena,
+		 struct netlink_ext_ack *extack)
+{
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+	enum ice_status ret;
+	u32 freq;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	if (out_idx < REF1P || out_idx > REF1N)
+		return -EINVAL;
+
+	ret = ice_aq_set_phy_rec_clk_out(&pf->hw, out_idx - REF1P, ena, &freq);
+
+	return ice_status_to_errno(ret);
+}
+
+/**
+ * ice_get_rclk_state - Get state of recovered clock pin for a given netdev
+ * @netdev: network interface device structure
+ * @out_idx: output index
+ * @ena: returns true if the pin is enabled
+ * @extack: netlink extended ack
+ */
+static int
+ice_get_rclk_state(struct net_device *netdev, u32 out_idx, bool *ena,
+		   struct netlink_ext_ack *extack)
+{
+	u8 port_num = ICE_AQC_SET_PHY_REC_CLK_OUT_CURR_PORT;
+	struct ice_netdev_priv *np = netdev_priv(netdev);
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+	enum ice_status ret;
+	u32 freq;
+	u8 flags;
+
+	if (!ice_is_feature_supported(pf, ICE_F_CGU))
+		return -EOPNOTSUPP;
+
+	if (out_idx < REF1P || out_idx > REF1N)
+		return -EINVAL;
+
+	ret = ice_aq_get_phy_rec_clk_out(&pf->hw, out_idx - REF1P, &port_num,
+					 &flags, &freq);
+
+	if (!ret && (flags & ICE_AQC_GET_PHY_REC_CLK_OUT_OUT_EN))
+		*ena = true;
+	else
+		*ena = false;
+
+	return ice_status_to_errno(ret);
+}
+
 /**
  * ice_down - Shutdown the connection
  * @vsi: The VSI being stopped
@@ -8647,4 +8735,7 @@ static const struct net_device_ops ice_netdev_ops = {
 	.ndo_xsk_wakeup = ice_xsk_wakeup,
 	.ndo_get_eec_state = ice_get_eec_state,
 	.ndo_get_eec_src = ice_get_eec_src,
+	.ndo_get_rclk_range = ice_get_rclk_range,
+	.ndo_set_rclk_out = ice_set_rclk_out,
+	.ndo_get_rclk_state = ice_get_rclk_state,
 };
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 708bd8336155..9faa005506d1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1351,6 +1351,17 @@ struct netdev_net_notifier {
  *			  struct netlink_ext_ack *extack);
  *	Get the index of the source signal that's currently used as EEC's
  *	reference
+ * int (*ndo_get_rclk_range)(struct net_device *dev, u32 *min_idx, u32 *max_idx,
+ *			     struct netlink_ext_ack *extack);
+ *	Get range of valid output indices for the set/get Recovered Clock
+ *	functions
+ * int (*ndo_set_rclk_out)(struct net_device *dev, u32 out_idx, bool ena,
+ *			   struct netlink_ext_ack *extack);
+ *	Set the receive clock recovery redirection to a given Recovered Clock
+ *	output.
+ * int (*ndo_get_rclk_state)(struct net_device *dev, u32 out_idx, bool *ena,
+ *			     struct netlink_ext_ack *extack);
+ *	Get current state of the recovered clock to pin mapping.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-05 20:53   ` Maciej Machnikowski
  -1 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: maciej.machnikowski, netdev, intel-wired-lan
  Cc: richardcochran, abyagowi, anthony.l.nguyen, davem, kuba,
	linux-kselftest, idosch, mkubecek, saeed, michael.chan

Add Documentation/networking/synce.rst describing new RTNL messages
and respective NDO ops supporting SyncE (Synchronous Ethernet).

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 Documentation/networking/synce.rst | 117 +++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 Documentation/networking/synce.rst

diff --git a/Documentation/networking/synce.rst b/Documentation/networking/synce.rst
new file mode 100644
index 000000000000..4ca41fb9a481
--- /dev/null
+++ b/Documentation/networking/synce.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Synchronous Ethernet
+====================
+
+Synchronous Ethernet networks use a physical layer clock to syntonize
+the frequency across different network elements.
+
+A basic SyncE node defined in the ITU-T G.8264 consists of an Ethernet
+Equipment Clock (EEC) and a PHY that has dedicated outputs of recovered clocks
+and a dedicated TX clock input that is used to transmit data to other nodes.
+
+The SyncE capable PHY is able to recover the incoming frequency of the data
+stream on RX lanes and redirect it (sometimes dividing it) to recovered
+clock outputs. In a SyncE PHY the TX frequency is directly dependent on the
+input frequency - either on the PHY CLK input, or on a dedicated
+TX clock input.
+
+      ┌───────────┬──────────┐
+      │ RX        │ TX       │
+  1   │ lanes     │ lanes    │ 1
+  ───►├──────┐    │          ├─────►
+  2   │      │    │          │ 2
+  ───►├──┐   │    │          ├─────►
+  3   │  │   │    │          │ 3
+  ───►├─▼▼   ▼    │          ├─────►
+      │ ──────    │          │
+      │ \____/    │          │
+      └──┼──┼─────┴──────────┘
+        1│ 2│        ▲
+ RCLK out│  │        │ TX CLK in
+         ▼  ▼        │
+       ┌─────────────┴───┐
+       │                 │
+       │       EEC       │
+       │                 │
+       └─────────────────┘
+
+The EEC can synchronize its frequency to one of the synchronization inputs
+either clocks recovered on traffic interfaces or (in advanced deployments)
+external frequency sources.
+
+Some EEC implementations can select synchronization source through
+priority tables and synchronization status messaging and provide necessary
+filtering and holdover capabilities.
+
+The following interface is applicable to different packet network types
+following the ITU-T G.8261/G.8262 recommendations.
+
+Interface
+=========
+
+The following RTNL messages are used to read/configure SyncE recovered
+clocks.
+
+RTM_GETRCLKRANGE
+-----------------
+Reads the allowed pin index range for the recovered clock outputs.
+This can be aligned to PHY outputs or to EEC inputs, whichever is
+better for a given application.
+Calls the ndo_get_rclk_range function to determine the allowed range
+of output pin indexes for the recovered clock outputs and returns the
+minimum and the maximum of that range in the
+IFLA_RCLK_RANGE_MIN_PIN and the IFLA_RCLK_RANGE_MAX_PIN
+attributes.
+
+RTM_GETRCLKSTATE
+-----------------
+Reads the state of pins that output a recovered clock from a given
+port.
+To support multiple recovered clock outputs from the same port, this
+message returns the IFLA_RCLK_STATE_COUNT attribute containing the
+number of active recovered clock outputs (N) and N
+IFLA_RCLK_STATE_OUT_IDX attributes listing the active output
+indexes.
+This message will call the ndo_get_rclk_range to determine the allowed
+recovered clock indexes and then will loop through them, calling
+the ndo_get_rclk_state for each of them.
+
+RTM_SETRCLKSTATE
+-----------------
+Sets the redirection of the recovered clock for a given pin. This message
+expects one attribute:
+struct if_set_rclk_msg {
+	__u32 ifindex; /* interface index */
+	__u32 out_idx; /* output index (from a valid range) */
+	__u32 flags; /* configuration flags */
+};
+
+Supported flags are:
+SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
+		     if clear - the output will be disabled.
+
+RTM_GETEECSTATE
+----------------
+Reads the state of the EEC or equivalent physical clock synchronizer.
+This message returns the following attributes:
+IFLA_EEC_STATE - current state of the EEC or equivalent clock generator.
+		 The states returned in this attribute are aligned to the
+		 ITU-T G.781 and are:
+		  IF_EEC_STATE_INVALID - state is not valid
+		  IF_EEC_STATE_FREERUN - clock is free-running
+		  IF_EEC_STATE_LOCKED - clock is locked to the reference,
+		                        but the holdover memory is not valid
+		  IF_EEC_STATE_LOCKED_HO_ACQ - clock is locked to the reference
+		                               and holdover memory is valid
+		  IF_EEC_STATE_HOLDOVER - clock is in holdover mode
+The state is read from the netdev by calling:
+int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state,
+			 u32 *src_idx, struct netlink_ext_ack *extack);
+
+IFLA_EEC_SRC_IDX - optional attribute returning the index of the reference that
+		   is used for the current IFLA_EEC_STATE, i.e., the index of
+		   the pin that the EEC is locked to.
+
+It will be returned only if the ndo_get_eec_src is implemented.
\ No newline at end of file
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
@ 2021-11-05 20:53   ` Maciej Machnikowski
  0 siblings, 0 replies; 50+ messages in thread
From: Maciej Machnikowski @ 2021-11-05 20:53 UTC (permalink / raw)
  To: intel-wired-lan

Add Documentation/networking/synce.rst describing new RTNL messages
and respective NDO ops supporting SyncE (Synchronous Ethernet).

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
---
 Documentation/networking/synce.rst | 117 +++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 Documentation/networking/synce.rst

diff --git a/Documentation/networking/synce.rst b/Documentation/networking/synce.rst
new file mode 100644
index 000000000000..4ca41fb9a481
--- /dev/null
+++ b/Documentation/networking/synce.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Synchronous Ethernet
+====================
+
+Synchronous Ethernet networks use a physical layer clock to syntonize
+the frequency across different network elements.
+
+A basic SyncE node defined in the ITU-T G.8264 consists of an Ethernet
+Equipment Clock (EEC) and a PHY that has dedicated outputs of recovered clocks
+and a dedicated TX clock input that is used to transmit data to other nodes.
+
+The SyncE capable PHY is able to recover the incoming frequency of the data
+stream on RX lanes and redirect it (sometimes dividing it) to recovered
+clock outputs. In a SyncE PHY the TX frequency is directly dependent on the
+input frequency - either on the PHY CLK input, or on a dedicated
+TX clock input.
+
+      ┌───────────┬──────────┐
+      │ RX        │ TX       │
+  1   │ lanes     │ lanes    │ 1
+  ───►├──────┐    │          ├─────►
+  2   │      │    │          │ 2
+  ───►├──┐   │    │          ├─────►
+  3   │  │   │    │          │ 3
+  ───►├─▼▼   ▼    │          ├─────►
+      │ ──────    │          │
+      │ \____/    │          │
+      └──┼──┼─────┴──────────┘
+        1│ 2│        ▲
+ RCLK out│  │        │ TX CLK in
+         ▼  ▼        │
+       ┌─────────────┴───┐
+       │                 │
+       │       EEC       │
+       │                 │
+       └─────────────────┘
+
+The EEC can synchronize its frequency to one of the synchronization inputs
+either clocks recovered on traffic interfaces or (in advanced deployments)
+external frequency sources.
+
+Some EEC implementations can select synchronization source through
+priority tables and synchronization status messaging and provide necessary
+filtering and holdover capabilities.
+
+The following interface is applicable to different packet network types
+following the ITU-T G.8261/G.8262 recommendations.
+
+Interface
+=========
+
+The following RTNL messages are used to read/configure SyncE recovered
+clocks.
+
+RTM_GETRCLKRANGE
+-----------------
+Reads the allowed pin index range for the recovered clock outputs.
+This can be aligned to PHY outputs or to EEC inputs, whichever is
+better for a given application.
+Calls the ndo_get_rclk_range function to determine the allowed range
+of output pin indexes for the recovered clock outputs and returns the
+minimum and the maximum of that range in the
+IFLA_RCLK_RANGE_MIN_PIN and the IFLA_RCLK_RANGE_MAX_PIN
+attributes.
+
+RTM_GETRCLKSTATE
+-----------------
+Reads the state of pins that output a recovered clock from a given
+port.
+To support multiple recovered clock outputs from the same port, this
+message returns the IFLA_RCLK_STATE_COUNT attribute containing the
+number of active recovered clock outputs (N) and N
+IFLA_RCLK_STATE_OUT_IDX attributes listing the active output
+indexes.
+This message will call the ndo_get_rclk_range to determine the allowed
+recovered clock indexes and then will loop through them, calling
+the ndo_get_rclk_state for each of them.
+
+RTM_SETRCLKSTATE
+-----------------
+Sets the redirection of the recovered clock for a given pin. This message
+expects one attribute:
+struct if_set_rclk_msg {
+	__u32 ifindex; /* interface index */
+	__u32 out_idx; /* output index (from a valid range) */
+	__u32 flags; /* configuration flags */
+};
+
+Supported flags are:
+SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
+		     if clear - the output will be disabled.
+
+RTM_GETEECSTATE
+----------------
+Reads the state of the EEC or equivalent physical clock synchronizer.
+This message returns the following attributes:
+IFLA_EEC_STATE - current state of the EEC or equivalent clock generator.
+		 The states returned in this attribute are aligned to the
+		 ITU-T G.781 and are:
+		  IF_EEC_STATE_INVALID - state is not valid
+		  IF_EEC_STATE_FREERUN - clock is free-running
+		  IF_EEC_STATE_LOCKED - clock is locked to the reference,
+		                        but the holdover memory is not valid
+		  IF_EEC_STATE_LOCKED_HO_ACQ - clock is locked to the reference
+		                               and holdover memory is valid
+		  IF_EEC_STATE_HOLDOVER - clock is in holdover mode
+The state is read from the netdev by calling:
+int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state,
+			 u32 *src_idx, struct netlink_ext_ack *extack);
+
+IFLA_EEC_SRC_IDX - optional attribute returning the index of the reference that
+		   is used for the current IFLA_EEC_STATE, i.e., the index of
+		   the pin that the EEC is locked to.
+
+It will be returned only if the ndo_get_eec_src is implemented.
\ No newline at end of file
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 2/6] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
  2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-07 13:44     ` Ido Schimmel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ido Schimmel @ 2021-11-07 13:44 UTC (permalink / raw)
  To: Maciej Machnikowski
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi,
	anthony.l.nguyen, davem, kuba, linux-kselftest, mkubecek, saeed,
	michael.chan

On Fri, Nov 05, 2021 at 09:53:27PM +0100, Maciej Machnikowski wrote:
> +/* SyncE section */
> +
> +enum if_eec_state {
> +	IF_EEC_STATE_INVALID = 0,	/* state is not valid */
> +	IF_EEC_STATE_FREERUN,		/* clock is free-running */
> +	IF_EEC_STATE_LOCKED,		/* clock is locked to the reference,
> +					 * but the holdover memory is not valid
> +					 */
> +	IF_EEC_STATE_LOCKED_HO_ACQ,	/* clock is locked to the reference
> +					 * and holdover memory is valid
> +					 */
> +	IF_EEC_STATE_HOLDOVER,		/* clock is in holdover mode */
> +};
> +
> +#define EEC_SRC_PORT		(1 << 0) /* recovered clock from the port is
> +					  * currently the source for the EEC
> +					  */

Where is this used?

Note that the merge window is open and that net-next is closed:

http://vger.kernel.org/~davem/net-next.html

> +
> +struct if_eec_state_msg {
> +	__u32 ifindex;
> +};
> +
> +enum {
> +	IFLA_EEC_UNSPEC,
> +	IFLA_EEC_STATE,
> +	IFLA_EEC_SRC_IDX,
> +	__IFLA_EEC_MAX,
> +};
> +
> +#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 2/6] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status
@ 2021-11-07 13:44     ` Ido Schimmel
  0 siblings, 0 replies; 50+ messages in thread
From: Ido Schimmel @ 2021-11-07 13:44 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Nov 05, 2021 at 09:53:27PM +0100, Maciej Machnikowski wrote:
> +/* SyncE section */
> +
> +enum if_eec_state {
> +	IF_EEC_STATE_INVALID = 0,	/* state is not valid */
> +	IF_EEC_STATE_FREERUN,		/* clock is free-running */
> +	IF_EEC_STATE_LOCKED,		/* clock is locked to the reference,
> +					 * but the holdover memory is not valid
> +					 */
> +	IF_EEC_STATE_LOCKED_HO_ACQ,	/* clock is locked to the reference
> +					 * and holdover memory is valid
> +					 */
> +	IF_EEC_STATE_HOLDOVER,		/* clock is in holdover mode */
> +};
> +
> +#define EEC_SRC_PORT		(1 << 0) /* recovered clock from the port is
> +					  * currently the source for the EEC
> +					  */

Where is this used?

Note that the merge window is open and that net-next is closed:

http://vger.kernel.org/~davem/net-next.html

> +
> +struct if_eec_state_msg {
> +	__u32 ifindex;
> +};
> +
> +enum {
> +	IFLA_EEC_UNSPEC,
> +	IFLA_EEC_STATE,
> +	IFLA_EEC_SRC_IDX,
> +	__IFLA_EEC_MAX,
> +};
> +
> +#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-07 14:08     ` Ido Schimmel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ido Schimmel @ 2021-11-07 14:08 UTC (permalink / raw)
  To: Maciej Machnikowski
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi,
	anthony.l.nguyen, davem, kuba, linux-kselftest, mkubecek, saeed,
	michael.chan

On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> Add Documentation/networking/synce.rst describing new RTNL messages
> and respective NDO ops supporting SyncE (Synchronous Ethernet).
> 
> Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
> ---
>  Documentation/networking/synce.rst | 117 +++++++++++++++++++++++++++++
>  1 file changed, 117 insertions(+)
>  create mode 100644 Documentation/networking/synce.rst
> 
> diff --git a/Documentation/networking/synce.rst b/Documentation/networking/synce.rst
> new file mode 100644
> index 000000000000..4ca41fb9a481
> --- /dev/null
> +++ b/Documentation/networking/synce.rst
> @@ -0,0 +1,117 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================
> +Synchronous Ethernet
> +====================
> +
> +Synchronous Ethernet networks use a physical layer clock to syntonize
> +the frequency across different network elements.
> +
> +Basic SyncE node defined in the ITU-T G.8264 consist of an Ethernet
> +Equipment Clock (EEC) and a PHY that has dedicated outputs of recovered clocks
> +and a dedicated TX clock input that is used as to transmit data to other nodes.
> +
> +The SyncE capable PHY is able to recover the incomning frequency of the data
> +stream on RX lanes and redirect it (sometimes dividing it) to recovered
> +clock outputs. In SyncE PHY the TX frequency is directly dependent on the
> +input frequency - either on the PHY CLK input, or on a dedicated
> +TX clock input.
> +
> +      ┌───────────┬──────────┐
> +      │ RX        │ TX       │
> +  1   │ lanes     │ lanes    │ 1
> +  ───►├──────┐    │          ├─────►
> +  2   │      │    │          │ 2
> +  ───►├──┐   │    │          ├─────►
> +  3   │  │   │    │          │ 3
> +  ───►├─▼▼   ▼    │          ├─────►
> +      │ ──────    │          │
> +      │ \____/    │          │
> +      └──┼──┼─────┴──────────┘
> +        1│ 2│        ▲
> + RCLK out│  │        │ TX CLK in
> +         ▼  ▼        │
> +       ┌─────────────┴───┐
> +       │                 │
> +       │       EEC       │
> +       │                 │
> +       └─────────────────┘
> +
> +The EEC can synchronize its frequency to one of the synchronization inputs
> +either clocks recovered on traffic interfaces or (in advanced deployments)
> +external frequency sources.
> +
> +Some EEC implementations can select synchronization source through
> +priority tables and synchronization status messaging and provide necessary
> +filtering and holdover capabilities.
> +
> +The following interface can be applicable to diffferent packet network types
> +following ITU-T G.8261/G.8262 recommendations.
> +
> +Interface
> +=========
> +
> +The following RTNL messages are used to read/configure SyncE recovered
> +clocks.
> +
> +RTM_GETRCLKRANGE
> +-----------------
> +Reads the allowed pin index range for the recovered clock outputs.
> +This can be aligned to PHY outputs or to EEC inputs, whichever is
> +better for a given application.

Can you explain the difference between PHY outputs and EEC inputs? It is
not clear to me from the diagram.

How would the diagram look in a multi-port adapter where you have a
single EEC?

> +Will call the ndo_get_rclk_range function to read the allowed range
> +of output pin indexes.
> +Will call ndo_get_rclk_range to determine the allowed recovered clock
> +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> +IFLA_RCLK_RANGE_MAX_PIN attributes

The first sentence seems to be redundant

> +
> +RTM_GETRCLKSTATE
> +-----------------
> +Read the state of recovered pins that output recovered clock from
> +a given port. The message will contain the number of assigned clocks
> +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in IFLA_RCLK_STATE_OUT_IDX
> +To support multiple recovered clock outputs from the same port, this message
> +will return the IFLA_RCLK_STATE_COUNT attribute containing the number of
> +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX attributes
> +listing the active output indexes.
> +This message will call the ndo_get_rclk_range to determine the allowed
> +recovered clock indexes and then will loop through them, calling
> +the ndo_get_rclk_state for each of them.

Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE? Isn't
RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in the
range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just report the
state (enabled / disabled) for all of them.

> +
> +RTM_SETRCLKSTATE
> +-----------------
> +Sets the redirection of the recovered clock for a given pin. This message
> +expects one attribute:
> +struct if_set_rclk_msg {
> +	__u32 ifindex; /* interface index */
> +	__u32 out_idx; /* output index (from a valid range) */
> +	__u32 flags; /* configuration flags */
> +};
> +
> +Supported flags are:
> +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> +		     if clear - the output will be disabled.

In the diagram you have two recovered clock outputs going into the EEC.
According to which the EEC is synchronized?

How does user space know which pins to enable?

> +
> +RTM_GETEECSTATE
> +----------------
> +Reads the state of the EEC or equivalent physical clock synchronizer.
> +This message returns the following attributes:
> +IFLA_EEC_STATE - current state of the EEC or equivalent clock generator.
> +		 The states returned in this attribute are aligned to the
> +		 ITU-T G.781 and are:
> +		  IF_EEC_STATE_INVALID - state is not valid
> +		  IF_EEC_STATE_FREERUN - clock is free-running
> +		  IF_EEC_STATE_LOCKED - clock is locked to the reference,
> +		                        but the holdover memory is not valid
> +		  IF_EEC_STATE_LOCKED_HO_ACQ - clock is locked to the reference
> +		                               and holdover memory is valid
> +		  IF_EEC_STATE_HOLDOVER - clock is in holdover mode
> +State is read from the netdev by calling:
> +int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state,
> +			 u32 *src_idx, struct netlink_ext_ack *extack);
> +
> +IFLA_EEC_SRC_IDX - optional attribute returning the index of the reference that
> +		   is used for the current IFLA_EEC_STATE, i.e., the index of
> +		   the pin that the EEC is locked to.
> +
> +Will be returned only if the ndo_get_eec_src is implemented.
> \ No newline at end of file
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-07 14:08     ` [Intel-wired-lan] " Ido Schimmel
@ 2021-11-08  8:35       ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-08  8:35 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, mkubecek, saeed,
	michael.chan

> -----Original Message-----
> From: Ido Schimmel <idosch@idosch.org>
> Sent: Sunday, November 7, 2021 3:09 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> > +Interface
> > +=========
> > +
> > +The following RTNL messages are used to read/configure SyncE recovered
> > +clocks.
> > +
> > +RTM_GETRCLKRANGE
> > +-----------------
> > +Reads the allowed pin index range for the recovered clock outputs.
> > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > +better for a given application.
> 
> Can you explain the difference between PHY outputs and EEC inputs? It is
> no clear to me from the diagram.

The PHY is the source of frequency for the EEC: the PHY produces the reference
and the EEC synchronizes to it.

Both PHY outputs and EEC inputs are configurable. PHY outputs usually are
configured using PHY registers, and EEC inputs in the DPLL references
block
 
> How would the diagram look in a multi-port adapter where you have a
> single EEC?

That depends. It can be a multiport PHY - in that case it will look
exactly like the one I drew. If we have multiple PHYs, their recovered
clock outputs will go to different recovered clock inputs, and each PHY's
TX clock input will be driven from a different synchronized output of the
EEC, or from a single one through a clock fan-out.

> > +Will call the ndo_get_rclk_range function to read the allowed range
> > +of output pin indexes.
> > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > +IFLA_RCLK_RANGE_MAX_PIN attributes
> 
> The first sentence seems to be redundant
> 
> > +
> > +RTM_GETRCLKSTATE
> > +-----------------
> > +Read the state of recovered pins that output recovered clock from
> > +a given port. The message will contain the number of assigned clocks
> > +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in
> IFLA_RCLK_STATE_OUT_IDX
> > +To support multiple recovered clock outputs from the same port, this
> message
> > +will return the IFLA_RCLK_STATE_COUNT attribute containing the number
> of
> > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX
> attributes
> > +listing the active output indexes.
> > +This message will call the ndo_get_rclk_range to determine the allowed
> > +recovered clock indexes and then will loop through them, calling
> > +the ndo_get_rclk_state for each of them.
> 
> Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE? Isn't
> RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in the
> range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just
> report the
> state (enabled / disable) for all

Great idea! Will implement it.
 
> > +
> > +RTM_SETRCLKSTATE
> > +-----------------
> > +Sets the redirection of the recovered clock for a given pin. This message
> > +expects one attribute:
> > +struct if_set_rclk_msg {
> > +	__u32 ifindex; /* interface index */
> > +	__u32 out_idx; /* output index (from a valid range)
> > +	__u32 flags; /* configuration flags */
> > +};
> > +
> > +Supported flags are:
> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> > +		     if clear - the output will be disabled.
> 
> In the diagram you have two recovered clock outputs going into the EEC.
> According to which the EEC is synchronized?

That will depend on the future DPLL configuration. For now it'll be based
on the DPLL's auto-select capability and its default configuration.
 
> How does user space know which pins to enable?

That's why RTM_GETRCLKRANGE was invented, but I like the suggestion
you made above, so I will rework the code to remove the range message and
just return the indexes with an enable/disable bit for each of them. In that
case userspace will just send RTM_GETRCLKSTATE to learn what
can be enabled.

> > +
> > +RTM_GETEECSTATE
> > +----------------
> > +Reads the state of the EEC or equivalent physical clock synchronizer.
> > +This message returns the following attributes:
> > +IFLA_EEC_STATE - current state of the EEC or equivalent clock generator.
> > +		 The states returned in this attribute are aligned to the
> > +		 ITU-T G.781 and are:
> > +		  IF_EEC_STATE_INVALID - state is not valid
> > +		  IF_EEC_STATE_FREERUN - clock is free-running
> > +		  IF_EEC_STATE_LOCKED - clock is locked to the reference,
> > +		                        but the holdover memory is not valid
> > +		  IF_EEC_STATE_LOCKED_HO_ACQ - clock is locked to the
> reference
> > +		                               and holdover memory is valid
> > +		  IF_EEC_STATE_HOLDOVER - clock is in holdover mode
> > +State is read from the netdev calling the:
> > +int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state
> *state,
> > +			 u32 *src_idx, struct netlink_ext_ack *extack);
> > +
> > +IFLA_EEC_SRC_IDX - optional attribute returning the index of the
> reference that
> > +		   is used for the current IFLA_EEC_STATE, i.e., the index of
> > +		   the pin that the EEC is locked to.
> > +
> > +Will be returned only if the ndo_get_eec_src is implemented.
> > \ No newline at end of file
> > --
> > 2.26.3
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-08  8:35       ` [Intel-wired-lan] " Machnikowski, Maciej
@ 2021-11-08 16:29         ` Ido Schimmel
  -1 siblings, 0 replies; 50+ messages in thread
From: Ido Schimmel @ 2021-11-08 16:29 UTC (permalink / raw)
  To: Machnikowski, Maciej
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, mkubecek, saeed,
	michael.chan

On Mon, Nov 08, 2021 at 08:35:17AM +0000, Machnikowski, Maciej wrote:
> > -----Original Message-----
> > From: Ido Schimmel <idosch@idosch.org>
> > Sent: Sunday, November 7, 2021 3:09 PM
> > To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> > Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> > interfaces
> > 
> > On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> > > +Interface
> > > +=========
> > > +
> > > +The following RTNL messages are used to read/configure SyncE recovered
> > > +clocks.
> > > +
> > > +RTM_GETRCLKRANGE
> > > +-----------------
> > > +Reads the allowed pin index range for the recovered clock outputs.
> > > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > > +better for a given application.
> > 
> > Can you explain the difference between PHY outputs and EEC inputs? It is
> > no clear to me from the diagram.
> 
> PHY is the source of frequency for the EEC, so PHY produces the reference
> And EEC synchronizes to it.
> 
> Both PHY outputs and EEC inputs are configurable. PHY outputs usually are
> configured using PHY registers, and EEC inputs in the DPLL references
> block
>  
> > How would the diagram look in a multi-port adapter where you have a
> > single EEC?
> 
> That depends. It can be either a multiport PHY - in this case it will look
> exactly like the one I drawn. In case we have multiple PHYs their recovered
> clock outputs will go to different recovered clock inputs and each PHY
> TX clock inputs will be driven from different EEC's synchronized outputs
> or from a single one through  clock fan out.
> 
> > > +Will call the ndo_get_rclk_range function to read the allowed range
> > > +of output pin indexes.
> > > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > > +IFLA_RCLK_RANGE_MAX_PIN attributes
> > 
> > The first sentence seems to be redundant
> > 
> > > +
> > > +RTM_GETRCLKSTATE
> > > +-----------------
> > > +Read the state of recovered pins that output recovered clock from
> > > +a given port. The message will contain the number of assigned clocks
> > > +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in
> > IFLA_RCLK_STATE_OUT_IDX
> > > +To support multiple recovered clock outputs from the same port, this
> > message
> > > +will return the IFLA_RCLK_STATE_COUNT attribute containing the number
> > of
> > > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX
> > attributes
> > > +listing the active output indexes.
> > > +This message will call the ndo_get_rclk_range to determine the allowed
> > > +recovered clock indexes and then will loop through them, calling
> > > +the ndo_get_rclk_state for each of them.
> > 
> > Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE? Isn't
> > RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in the
> > range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just
> > report the
> > state (enabled / disable) for all
> 
> Great idea! Will implement it.
>  
> > > +
> > > +RTM_SETRCLKSTATE
> > > +-----------------
> > > +Sets the redirection of the recovered clock for a given pin. This message
> > > +expects one attribute:
> > > +struct if_set_rclk_msg {
> > > +	__u32 ifindex; /* interface index */
> > > +	__u32 out_idx; /* output index (from a valid range)
> > > +	__u32 flags; /* configuration flags */
> > > +};
> > > +
> > > +Supported flags are:
> > > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> > > +		     if clear - the output will be disabled.
> > 
> > In the diagram you have two recovered clock outputs going into the EEC.
> > According to which the EEC is synchronized?
> 
> That will depend on the future DPLL configuration. For now it'll be based
> on the DPLL's auto select ability and its default configuration.
>  
> > How does user space know which pins to enable?
> 
> That's why the RTM_GETRCLKRANGE was invented but I like the suggestion
> you made above so will rework the code to remove the range one and
> just return the indexes with enable/disable bit for each of them. In this
> case youserspace will just send the RTM_GETRCLKSTATE to learn what
> can be enabled.

In the diagram there are multiple Rx lanes, all of which might be used
by the same port. How does user space differentiate between the
quality levels of the clock signal recovered from each lane / pin when
the information is transmitted on a per-port basis via ESMC messages?

The uAPI seems to be too low-level and is not compatible with Nvidia's
devices and potentially other vendors. We really just need a logical
interface that says "Synchronize the frequency of the EEC to the clock
recovered from port X". The kernel / drivers should abstract the inner
workings of the device from user space. Any reason this can't work for
ice?

I also want to re-iterate my dissatisfaction with the interface being
netdev-centric. By modelling the EEC as a standalone object we will be
able to extend it to set the source of the EEC to something other than a
netdev in the future. If we don't do it now, we will end up with two
ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
else).

Other advantages of modelling the EEC as a separate object include the
ability for user space to determine the mapping between netdevs and EECs
(currently impossible) and reporting additional EEC attributes such as
SyncE clockIdentity and default SSM code. There is really no reason to
report all of this identical information via multiple netdevs.

With regards to rtnetlink vs. something else, in my suggestion the only
thing that should be reported per-netdev is the mapping between the
netdev and the EEC. Similar to the way user space determines the mapping
from netdev to PHC via ETHTOOL_GET_TS_INFO. If we go with rtnetlink,
this can be reported as a new attribute in RTM_NEWLINK, no need to add
new messages.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
@ 2021-11-08 16:29         ` Ido Schimmel
  0 siblings, 0 replies; 50+ messages in thread
From: Ido Schimmel @ 2021-11-08 16:29 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Nov 08, 2021 at 08:35:17AM +0000, Machnikowski, Maciej wrote:
> > -----Original Message-----
> > From: Ido Schimmel <idosch@idosch.org>
> > Sent: Sunday, November 7, 2021 3:09 PM
> > To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> > Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> > interfaces
> > 
> > On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> > > +Interface
> > > +=========
> > > +
> > > +The following RTNL messages are used to read/configure SyncE recovered
> > > +clocks.
> > > +
> > > +RTM_GETRCLKRANGE
> > > +-----------------
> > > +Reads the allowed pin index range for the recovered clock outputs.
> > > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > > +better for a given application.
> > 
> > Can you explain the difference between PHY outputs and EEC inputs? It is
> > no clear to me from the diagram.
> 
> PHY is the source of frequency for the EEC, so PHY produces the reference
> And EEC synchronizes to it.
> 
> Both PHY outputs and EEC inputs are configurable. PHY outputs usually are
> configured using PHY registers, and EEC inputs in the DPLL references
> block
>  
> > How would the diagram look in a multi-port adapter where you have a
> > single EEC?
> 
> That depends. It can be either a multiport PHY - in this case it will look
> exactly like the one I drawn. In case we have multiple PHYs their recovered
> clock outputs will go to different recovered clock inputs and each PHY
> TX clock inputs will be driven from different EEC's synchronized outputs
> or from a single one through  clock fan out.
> 
> > > +Will call the ndo_get_rclk_range function to read the allowed range
> > > +of output pin indexes.
> > > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > > +IFLA_RCLK_RANGE_MAX_PIN attributes
> > 
> > The first sentence seems to be redundant
> > 
> > > +
> > > +RTM_GETRCLKSTATE
> > > +-----------------
> > > +Read the state of recovered pins that output recovered clock from
> > > +a given port. The message will contain the number of assigned clocks
> > > +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in
> > IFLA_RCLK_STATE_OUT_IDX
> > > +To support multiple recovered clock outputs from the same port, this
> > message
> > > +will return the IFLA_RCLK_STATE_COUNT attribute containing the number
> > of
> > > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX
> > attributes
> > > +listing the active output indexes.
> > > +This message will call the ndo_get_rclk_range to determine the allowed
> > > +recovered clock indexes and then will loop through them, calling
> > > +the ndo_get_rclk_state for each of them.
> > 
> > Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE? Isn't
> > RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in the
> > range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just
> > report the
> > state (enabled / disable) for all
> 
> Great idea! Will implement it.
>  
> > > +
> > > +RTM_SETRCLKSTATE
> > > +-----------------
> > > +Sets the redirection of the recovered clock for a given pin. This message
> > > +expects one attribute:
> > > +struct if_set_rclk_msg {
> > > +	__u32 ifindex; /* interface index */
> > > +	__u32 out_idx; /* output index (from a valid range)
> > > +	__u32 flags; /* configuration flags */
> > > +};
> > > +
> > > +Supported flags are:
> > > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> > > +		     if clear - the output will be disabled.
> > 
> > In the diagram you have two recovered clock outputs going into the EEC.
> > According to which the EEC is synchronized?
> 
> That will depend on the future DPLL configuration. For now it'll be based
> on the DPLL's auto select ability and its default configuration.
>  
> > How does user space know which pins to enable?
> 
> That's why the RTM_GETRCLKRANGE was invented but I like the suggestion
> you made above so will rework the code to remove the range one and
> just return the indexes with enable/disable bit for each of them. In this
> case youserspace will just send the RTM_GETRCLKSTATE to learn what
> can be enabled.

In the diagram there are multiple Rx lanes, all of which might be used
by the same port. How does user space know to differentiate between the
quality levels of the clock signal recovered from each lane / pin when
the information is transmitted on a per-port basis via ESMC messages?

The uAPI seems to be too low-level and is not compatible with Nvidia's
devices and potentially other vendors. We really just need a logical
interface that says "Synchronize the frequency of the EEC to the clock
recovered from port X". The kernel / drivers should abstract the inner
workings of the device from user space. Any reason this can't work for
ice?

I also want to re-iterate my dissatisfaction with the interface being
netdev-centric. By modelling the EEC as a standalone object we will be
able to extend it to set the source of the EEC to something other than a
netdev in the future. If we don't do it now, we will end up with two
ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
else).

Other advantages of modelling the EEC as a separate object include the
ability for user space to determine the mapping between netdevs and EECs
(currently impossible) and reporting additional EEC attributes such as
SyncE clockIdentity and default SSM code. There is really no reason to
report all of this identical information via multiple netdevs.

With regards to rtnetlink vs. something else, in my suggestion the only
thing that should be reported per-netdev is the mapping between the
netdev and the EEC. Similar to the way user space determines the mapping
from netdev to PHC via ETHTOOL_GET_TS_INFO. If we go with rtnetlink,
this can be reported as a new attribute in RTM_NEWLINK, no need to add
new messages.
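A minimal sketch of the model proposed above, with the EEC as a first-class object and each netdev only carrying a reference to it, much like the netdev-to-PHC index mapping. All type and field names here are hypothetical, invented purely for illustration; nothing in the patch set defines them:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone EEC object - names are illustrative only */
enum eec_state { EEC_FREERUN, EEC_LOCKED, EEC_LOCKED_HO_ACQ, EEC_HOLDOVER };

struct eec {
	int id;			/* system-wide EEC index */
	enum eec_state state;
	int src_pin;		/* pin currently driving the EEC, -1 if none */
};

struct netdev {
	const char *name;
	int eec_id;		/* per-netdev mapping, like a PHC index */
};

/* Resolve the EEC backing a netdev - the lookup user space cannot do today */
static const struct eec *netdev_to_eec(const struct netdev *dev,
				       const struct eec *eecs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (eecs[i].id == dev->eec_id)
			return &eecs[i];
	return NULL;
}
```

With this shape, two ports backed by the same clock resolve to the same object, so state and attributes like clockIdentity are reported once instead of via every netdev.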

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-08 16:29         ` [Intel-wired-lan] " Ido Schimmel
@ 2021-11-08 17:03           ` Jakub Kicinski
  -1 siblings, 0 replies; 50+ messages in thread
From: Jakub Kicinski @ 2021-11-08 17:03 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Machnikowski, Maciej, netdev, intel-wired-lan, richardcochran,
	abyagowi, Nguyen, Anthony L, davem, linux-kselftest, mkubecek,
	saeed, michael.chan

On Mon, 8 Nov 2021 18:29:50 +0200 Ido Schimmel wrote:
> I also want to re-iterate my dissatisfaction with the interface being
> netdev-centric. By modelling the EEC as a standalone object we will be
> able to extend it to set the source of the EEC to something other than a
> netdev in the future. If we don't do it now, we will end up with two
> ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
> else).
> 
> Other advantages of modelling the EEC as a separate object include the
> ability for user space to determine the mapping between netdevs and EECs
> (currently impossible) and reporting additional EEC attributes such as
> SyncE clockIdentity and default SSM code. There is really no reason to
> report all of this identical information via multiple netdevs.

Indeed, I feel convinced. I believe the OCP timing card will benefit
from such an API as well. I pinged Jonathan; if he doesn't have cycles
I'll do the typing.

What do you have in mind for drivers abstracting away pin selection?
For a standalone clock fed a PPS signal from a backplate this will be
impossible, so we may need some middle way.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
@ 2021-11-08 18:00     ` Petr Machata
  -1 siblings, 0 replies; 50+ messages in thread
From: Petr Machata @ 2021-11-08 18:00 UTC (permalink / raw)
  To: Maciej Machnikowski
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi,
	anthony.l.nguyen, davem, kuba, linux-kselftest, idosch, mkubecek,
	saeed, michael.chan


Maciej Machnikowski <maciej.machnikowski@intel.com> writes:

> Add Documentation/networking/synce.rst describing new RTNL messages
> and respective NDO ops supporting SyncE (Synchronous Ethernet).
>
> Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
> ---
>  Documentation/networking/synce.rst | 117 +++++++++++++++++++++++++++++
>  1 file changed, 117 insertions(+)
>  create mode 100644 Documentation/networking/synce.rst
>
> diff --git a/Documentation/networking/synce.rst b/Documentation/networking/synce.rst
> new file mode 100644
> index 000000000000..4ca41fb9a481
> --- /dev/null
> +++ b/Documentation/networking/synce.rst
> @@ -0,0 +1,117 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================
> +Synchronous Ethernet
> +====================
> +
> +Synchronous Ethernet networks use a physical layer clock to syntonize
> +the frequency across different network elements.
> +
> +A basic SyncE node defined in ITU-T G.8264 consists of an Ethernet
> +Equipment Clock (EEC) and a PHY that has dedicated outputs of recovered clocks
> +and a dedicated TX clock input that is used to transmit data to other nodes.
> +
> +The SyncE capable PHY is able to recover the incoming frequency of the data
> +stream on RX lanes and redirect it (sometimes dividing it) to recovered
> +clock outputs. In a SyncE PHY the TX frequency is directly dependent on the
> +input frequency - either on the PHY CLK input, or on a dedicated
> +TX clock input.
> +
> +      ┌───────────┬──────────┐
> +      │ RX        │ TX       │
> +  1   │ lanes     │ lanes    │ 1
> +  ───►├──────┐    │          ├─────►
> +  2   │      │    │          │ 2
> +  ───►├──┐   │    │          ├─────►
> +  3   │  │   │    │          │ 3
> +  ───►├─▼▼   ▼    │          ├─────►
> +      │ ──────    │          │
> +      │ \____/    │          │
> +      └──┼──┼─────┴──────────┘
> +        1│ 2│        ▲
> + RCLK out│  │        │ TX CLK in
> +         ▼  ▼        │
> +       ┌─────────────┴───┐
> +       │                 │
> +       │       EEC       │
> +       │                 │
> +       └─────────────────┘
> +
> +The EEC can synchronize its frequency to one of the synchronization inputs -
> +either clocks recovered on traffic interfaces or (in advanced deployments)
> +external frequency sources.
> +
> +Some EEC implementations can select synchronization source through
> +priority tables and synchronization status messaging and provide necessary
> +filtering and holdover capabilities.
> +
> +The following interface can be applicable to different packet network types
> +following ITU-T G.8261/G.8262 recommendations.
> +
> +Interface
> +=========
> +
> +The following RTNL messages are used to read/configure SyncE recovered
> +clocks.
> +
> +RTM_GETRCLKRANGE
> +-----------------
> +Reads the allowed pin index range for the recovered clock outputs.
> +This can be aligned to PHY outputs or to EEC inputs, whichever is
> +better for a given application.
> +Will call the ndo_get_rclk_range function to read the allowed range
> +of output pin indexes.
> +Will call ndo_get_rclk_range to determine the allowed recovered clock
> +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> +IFLA_RCLK_RANGE_MAX_PIN attributes
> +
> +RTM_GETRCLKSTATE
> +-----------------
> +Read the state of recovered pins that output recovered clock from
> +a given port. The message will contain the number of assigned clocks
> +(IFLA_RCLK_STATE_COUNT) and N pin indexes in IFLA_RCLK_STATE_OUT_IDX.
> +To support multiple recovered clock outputs from the same port, this message
> +will return the IFLA_RCLK_STATE_COUNT attribute containing the number of
> +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX attributes
> +listing the active output indexes.
> +This message will call the ndo_get_rclk_range to determine the allowed
> +recovered clock indexes and then will loop through them, calling
> +the ndo_get_rclk_state for each of them.

Let me make sure I understand the model that you propose. Specifically
from the point of view of a multi-port device, because that's my
immediate use case.

RTM_GETRCLKRANGE would report number of "pins" that matches the number
of lanes in the system. So e.g. a 32-port switch, where each port has 4
lanes, would give a range of [1; 128], inclusive. (Or maybe [0; 128) or
whatever.)

RTM_GETRCLKSTATE would then return some subset of those pins, depending
on which lanes actually managed to establish a connection and carry a
valid clock signal. So, say, [1, 2, 3, 4] if the first port has e.g. a
100Gbps established.
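The numbering scheme in that reading could be sketched as below. It is purely an illustration of the guess - nothing in the patch set mandates a contiguous per-port layout:

```c
#include <assert.h>

#define LANES_PER_PORT 4	/* illustrative; matches the 32x4 example */

/* First pin index of a 1-based port under the guessed contiguous scheme */
static int port_first_pin(int port)
{
	return (port - 1) * LANES_PER_PORT + 1;
}

/* Highest pin index of the whole switch: 32 ports x 4 lanes -> 128 */
static int switch_max_pin(int ports)
{
	return ports * LANES_PER_PORT;
}
```

Under this scheme the first port owns pins 1-4, which is the [1, 2, 3, 4] subset mentioned above.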

> +
> +RTM_SETRCLKSTATE
> +-----------------
> +Sets the redirection of the recovered clock for a given pin. This message
> +expects one attribute:
> +struct if_set_rclk_msg {
> +	__u32 ifindex; /* interface index */
> +	__u32 out_idx; /* output index (from a valid range) */
> +	__u32 flags; /* configuration flags */
> +};
> +
> +Supported flags are:
> +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> +		     if clear - the output will be disabled.

OK, so here I set up the tracking. ifindex tells me which EEC to
configure, out_idx is the pin to track, flags tell me whether to set up
the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
I somehow know that lane 2 has the best clock.


If the above is broadly correct, I've got some questions.

First, what if more than one out_idx is set? What are drivers / HW meant
to do with this? What is the expected behavior?

Also GETRCLKSTATE and SETRCLKSTATE have a somewhat different scope: one
reports which pins carry a clock signal, the other influences tracking.
That seems wrong. There also does not seem to be a UAPI to retrieve
the tracking settings.

Second, as a user-space client, how do I know that if ports 1 and 2 both
report pin range [A; B], they both actually share the same
underlying EEC? Is there some sort of coordination among the drivers,
such that each pin in the system has a unique ID?

Further, how do I actually know the mapping from ports to pins? E.g. as
a user, I might know my master is behind swp1. How do I know what pins
correspond to that port? As a user-space tool author, how do I help
users to do something like "eec set clock eec0 track swp1"?

Additionally, how would things like external GPSs or 1pps be modeled? I
guess the driver would know about such interface, and would expose it as
a "pin". When the GPS signal locks, the driver starts reporting the pin
in the RCLK set. Then it is possible to set up tracking of that pin.


It seems to me it would be easier to understand, and to write user-space
tools and drivers for, a model that has EEC as an explicit first-class
object. That's where the EEC state naturally belongs, that's where the
pin range naturally belongs. Netdevs should have a reference to EEC and
pins, not present this information as if they own it. A first-class EEC
would also allow to later figure out how to hook up PHC and EEC.

> +
> +RTM_GETEECSTATE
> +----------------
> +Reads the state of the EEC or equivalent physical clock synchronizer.
> +This message returns the following attributes:
> +IFLA_EEC_STATE - current state of the EEC or equivalent clock generator.
> +		 The states returned in this attribute are aligned to the
> +		 ITU-T G.781 and are:
> +		  IF_EEC_STATE_INVALID - state is not valid
> +		  IF_EEC_STATE_FREERUN - clock is free-running
> +		  IF_EEC_STATE_LOCKED - clock is locked to the reference,
> +		                        but the holdover memory is not valid
> +		  IF_EEC_STATE_LOCKED_HO_ACQ - clock is locked to the reference
> +		                               and holdover memory is valid
> +		  IF_EEC_STATE_HOLDOVER - clock is in holdover mode
> +State is read from the netdev by calling:
> +int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state *state,
> +			 u32 *src_idx, struct netlink_ext_ack *extack);
> +
> +IFLA_EEC_SRC_IDX - optional attribute returning the index of the reference that
> +		   is used for the current IFLA_EEC_STATE, i.e., the index of
> +		   the pin that the EEC is locked to.
> +
> +It will be returned only if ndo_get_eec_src is implemented.
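The G.781-aligned states quoted above can be sketched as a plain enum with a decode helper for a user-space tool. The identifiers are taken from the quoted documentation; the numeric values and the helper are illustrative assumptions:

```c
#include <assert.h>
#include <string.h>

/* States from the quoted documentation; numeric values are an assumption */
enum if_eec_state {
	IF_EEC_STATE_INVALID,
	IF_EEC_STATE_FREERUN,
	IF_EEC_STATE_LOCKED,
	IF_EEC_STATE_LOCKED_HO_ACQ,
	IF_EEC_STATE_HOLDOVER,
};

/* Decode a state for display, e.g. in a hypothetical "eec show" tool */
static const char *eec_state_name(enum if_eec_state s)
{
	switch (s) {
	case IF_EEC_STATE_FREERUN:       return "freerun";
	case IF_EEC_STATE_LOCKED:        return "locked";
	case IF_EEC_STATE_LOCKED_HO_ACQ: return "locked-ho-acq";
	case IF_EEC_STATE_HOLDOVER:      return "holdover";
	default:                         return "invalid";
	}
}
```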

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-08 16:29         ` [Intel-wired-lan] " Ido Schimmel
@ 2021-11-09 10:32           ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-09 10:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, mkubecek, saeed,
	michael.chan

> -----Original Message-----
> From: Ido Schimmel <idosch@idosch.org>
> Sent: Monday, November 8, 2021 5:30 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> On Mon, Nov 08, 2021 at 08:35:17AM +0000, Machnikowski, Maciej wrote:
> > > -----Original Message-----
> > > From: Ido Schimmel <idosch@idosch.org>
> > > Sent: Sunday, November 7, 2021 3:09 PM
> > > To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> > > Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> > > interfaces
> > >
> > > On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> > > > +Interface
> > > > +=========
> > > > +
> > > > +The following RTNL messages are used to read/configure SyncE
> recovered
> > > > +clocks.
> > > > +
> > > > +RTM_GETRCLKRANGE
> > > > +-----------------
> > > > +Reads the allowed pin index range for the recovered clock outputs.
> > > > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > > > +better for a given application.
> > >
> > > Can you explain the difference between PHY outputs and EEC inputs? It is
> > > no clear to me from the diagram.
> >
> > The PHY is the source of frequency for the EEC, so the PHY produces the
> > reference and the EEC synchronizes to it.
> >
> > Both PHY outputs and EEC inputs are configurable. PHY outputs usually are
> > configured using PHY registers, and EEC inputs in the DPLL references
> > block
> >
> > > How would the diagram look in a multi-port adapter where you have a
> > > single EEC?
> >
> > That depends. It can be either a multiport PHY - in this case it will look
> > exactly like the one I have drawn. In case we have multiple PHYs, their
> > recovered clock outputs will go to different recovered clock inputs and
> > each PHY's TX clock input will be driven from a different EEC synchronized
> > output or from a single one through a clock fan-out.
> >
> > > > +Will call the ndo_get_rclk_range function to read the allowed range
> > > > +of output pin indexes.
> > > > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > > > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > > > +IFLA_RCLK_RANGE_MAX_PIN attributes
> > >
> > > The first sentence seems to be redundant
> > >
> > > > +
> > > > +RTM_GETRCLKSTATE
> > > > +-----------------
> > > > +Read the state of recovered pins that output recovered clock from
> > > > +a given port. The message will contain the number of assigned clocks
> > > > +(IFLA_RCLK_STATE_COUNT) and N pin indexes in
> > > IFLA_RCLK_STATE_OUT_IDX
> > > > +To support multiple recovered clock outputs from the same port, this
> > > message
> > > > +will return the IFLA_RCLK_STATE_COUNT attribute containing the
> number
> > > of
> > > > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX
> > > attributes
> > > > +listing the active output indexes.
> > > > +This message will call the ndo_get_rclk_range to determine the
> allowed
> > > > +recovered clock indexes and then will loop through them, calling
> > > > +the ndo_get_rclk_state for each of them.
> > >
> > > Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE?
> Isn't
> > > RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in
> the
> > > range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just
> > > report the
> > > state (enabled / disabled) for all
> >
> > Great idea! Will implement it.
> >
> > > > +
> > > > +RTM_SETRCLKSTATE
> > > > +-----------------
> > > > +Sets the redirection of the recovered clock for a given pin. This
> message
> > > > +expects one attribute:
> > > > +struct if_set_rclk_msg {
> > > > +	__u32 ifindex; /* interface index */
> > > > +	__u32 out_idx; /* output index (from a valid range) */
> > > > +	__u32 flags; /* configuration flags */
> > > > +};
> > > > +
> > > > +Supported flags are:
> > > > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be
> enabled,
> > > > +		     if clear - the output will be disabled.
> > >
> > > In the diagram you have two recovered clock outputs going into the EEC.
> > > According to which the EEC is synchronized?
> >
> > That will depend on the future DPLL configuration. For now it'll be based
> > on the DPLL's auto select ability and its default configuration.
> >
> > > How does user space know which pins to enable?
> >
> > That's why the RTM_GETRCLKRANGE was invented but I like the suggestion
> > you made above so will rework the code to remove the range one and
> > just return the indexes with enable/disable bit for each of them. In this
> > case userspace will just send the RTM_GETRCLKSTATE to learn what
> > can be enabled.
> 
> In the diagram there are multiple Rx lanes, all of which might be used
> by the same port. How does user space know to differentiate between the
> quality levels of the clock signal recovered from each lane / pin when
> the information is transmitted on a per-port basis via ESMC messages?

The lines represent different ports - not necessarily lanes. My bad - will fix.

> The uAPI seems to be too low-level and is not compatible with Nvidia's
> devices and potentially other vendors. We really just need a logical
> interface that says "Synchronize the frequency of the EEC to the clock
> recovered from port X". The kernel / drivers should abstract the inner
> workings of the device from user space. Any reason this can't work for
> ice?

You can build a very simple solution with just one recovered clock index and
implement exactly what you described. RTM_SETRCLKSTATE will only set the
redirection and RTM_GETRCLKSTATE will read the current HW setting of
what's enabled.
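The quoted struct makes that simple case concrete. A user-space sketch filling the request could look as follows; SET_RCLK_FLAGS_ENA is assumed here to be bit 0, since the patch does not show its value:

```c
#include <assert.h>
#include <stdint.h>

/* Request layout quoted from the patch (using fixed-width stand-ins) */
struct if_set_rclk_msg {
	uint32_t ifindex;	/* interface index */
	uint32_t out_idx;	/* output index (from a valid range) */
	uint32_t flags;		/* configuration flags */
};

#define SET_RCLK_FLAGS_ENA (1u << 0)	/* assumed value, not in the patch */

/* Build a message enabling recovered clock output out_idx on ifindex */
static struct if_set_rclk_msg rclk_enable(uint32_t ifindex, uint32_t out_idx)
{
	struct if_set_rclk_msg msg = {
		.ifindex = ifindex,
		.out_idx = out_idx,
		.flags = SET_RCLK_FLAGS_ENA,
	};

	return msg;
}
```

In the one-index solution described above, out_idx would always be 0 and the message reduces to "redirect the recovered clock of this port to the EEC, or stop doing so".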
 
> I also want to re-iterate my dissatisfaction with the interface being
> netdev-centric. By modelling the EEC as a standalone object we will be
> able to extend it to set the source of the EEC to something other than a
> netdev in the future. If we don't do it now, we will end up with two
> ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
> else).
> 
> Other advantages of modelling the EEC as a separate object include the
> ability for user space to determine the mapping between netdevs and EECs
> (currently impossible) and reporting additional EEC attributes such as
> SyncE clockIdentity and default SSM code. There is really no reason to
> report all of this identical information via multiple netdevs.
>
> With regards to rtnetlink vs. something else, in my suggestion the only
> thing that should be reported per-netdev is the mapping between the
> netdev and the EEC. Similar to the way user space determines the mapping
> from netdev to PHC via ETHTOOL_GET_TS_INFO. If we go with rtnetlink,
> this can be reported as a new attribute in RTM_NEWLINK, no need to add
> new messages.

Will answer that in the following mail.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
@ 2021-11-09 10:32           ` Machnikowski, Maciej
  0 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-09 10:32 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Ido Schimmel <idosch@idosch.org>
> Sent: Monday, November 8, 2021 5:30 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> On Mon, Nov 08, 2021 at 08:35:17AM +0000, Machnikowski, Maciej wrote:
> > > -----Original Message-----
> > > From: Ido Schimmel <idosch@idosch.org>
> > > Sent: Sunday, November 7, 2021 3:09 PM
> > > To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> > > Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> > > interfaces
> > >
> > > On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> > > > +Interface
> > > > +=========
> > > > +
> > > > +The following RTNL messages are used to read/configure SyncE
> recovered
> > > > +clocks.
> > > > +
> > > > +RTM_GETRCLKRANGE
> > > > +-----------------
> > > > +Reads the allowed pin index range for the recovered clock outputs.
> > > > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > > > +better for a given application.
> > >
> > > Can you explain the difference between PHY outputs and EEC inputs? It is
> > > no clear to me from the diagram.
> >
> > PHY is the source of frequency for the EEC, so PHY produces the reference
> > And EEC synchronizes to it.
> >
> > Both PHY outputs and EEC inputs are configurable. PHY outputs usually are
> > configured using PHY registers, and EEC inputs in the DPLL references
> > block
> >
> > > How would the diagram look in a multi-port adapter where you have a
> > > single EEC?
> >
> > That depends. It can be either a multiport PHY - in this case it will look
> > exactly like the one I drawn. In case we have multiple PHYs their recovered
> > clock outputs will go to different recovered clock inputs and each PHY
> > TX clock inputs will be driven from different EEC's synchronized outputs
> > or from a single one through  clock fan out.
> >
> > > > +Will call the ndo_get_rclk_range function to read the allowed range
> > > > +of output pin indexes.
> > > > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > > > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > > > +IFLA_RCLK_RANGE_MAX_PIN attributes
> > >
> > > The first sentence seems to be redundant
> > >
> > > > +
> > > > +RTM_GETRCLKSTATE
> > > > +-----------------
> > > > +Read the state of recovered pins that output recovered clock from
> > > > +a given port. The message will contain the number of assigned clocks
> > > > +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in
> > > IFLA_RCLK_STATE_OUT_IDX
> > > > +To support multiple recovered clock outputs from the same port, this
> > > message
> > > > +will return the IFLA_RCLK_STATE_COUNT attribute containing the
> number
> > > of
> > > > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX
> > > attributes
> > > > +listing the active output indexes.
> > > > +This message will call the ndo_get_rclk_range to determine the
> allowed
> > > > +recovered clock indexes and then will loop through them, calling
> > > > +the ndo_get_rclk_state for each of them.
> > >
> > > Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE?
> Isn't
> > > RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in
> the
> > > range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just
> > > report the
> > > state (enabled / disable) for all
> >
> > Great idea! Will implement it.
> >
> > > > +
> > > > +RTM_SETRCLKSTATE
> > > > +-----------------
> > > > +Sets the redirection of the recovered clock for a given pin. This
> message
> > > > +expects one attribute:
> > > > +struct if_set_rclk_msg {
> > > > +	__u32 ifindex; /* interface index */
> > > > +	__u32 out_idx; /* output index (from a valid range) */
> > > > +	__u32 flags; /* configuration flags */
> > > > +};
> > > > +
> > > > +Supported flags are:
> > > > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be
> enabled,
> > > > +		     if clear - the output will be disabled.
> > >
> > > In the diagram you have two recovered clock outputs going into the EEC.
> > > According to which the EEC is synchronized?
> >
> > That will depend on the future DPLL configuration. For now it'll be based
> > on the DPLL's auto select ability and its default configuration.
> >
> > > How does user space know which pins to enable?
> >
> > That's why the RTM_GETRCLKRANGE was invented, but I like the suggestion
> > you made above, so I will rework the code to remove the range message and
> > just return the indexes with an enable/disable bit for each of them. In this
> > case userspace will just send the RTM_GETRCLKSTATE to learn what
> > can be enabled.
> 
> In the diagram there are multiple Rx lanes, all of which might be used
> by the same port. How does user space know to differentiate between the
> quality levels of the clock signal recovered from each lane / pin when
> the information is transmitted on a per-port basis via ESMC messages?

The lines represent different ports - not necessarily lanes. My bad - will fix.

> The uAPI seems to be too low-level and is not compatible with Nvidia's
> devices and potentially other vendors. We really just need a logical
> interface that says "Synchronize the frequency of the EEC to the clock
> recovered from port X". The kernel / drivers should abstract the inner
> workings of the device from user space. Any reason this can't work for
> ice?

You can build a very simple solution with just one recovered clock index and
implement exactly what you described. RTM_SETRCLKSTATE will only set the
redirection and RTM_GETRCLKSTATE will read the current HW setting of
what's enabled.
 
> I also want to re-iterate my dissatisfaction with the interface being
> netdev-centric. By modelling the EEC as a standalone object we will be
> able to extend it to set the source of the EEC to something other than a
> netdev in the future. If we don't do it now, we will end up with two
> ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
> else).
> 
> Other advantages of modelling the EEC as a separate object include the
> ability for user space to determine the mapping between netdevs and EECs
> (currently impossible) and reporting additional EEC attributes such as
> SyncE clockIdentity and default SSM code. There is really no reason to
> report all of this identical information via multiple netdevs.
>
> With regards to rtnetlink vs. something else, in my suggestion the only
> thing that should be reported per-netdev is the mapping between the
> netdev and the EEC. Similar to the way user space determines the mapping
> from netdev to PHC via ETHTOOL_GET_TS_INFO. If we go with rtnetlink,
> this can be reported as a new attribute in RTM_NEWLINK, no need to add
> new messages.

Will answer that in the following mail.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-08 18:00     ` [Intel-wired-lan] " Petr Machata
@ 2021-11-09 10:43       ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-09 10:43 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, idosch, mkubecek, saeed,
	michael.chan



> -----Original Message-----
> From: Petr Machata <petrm@nvidia.com>
> Sent: Monday, November 8, 2021 7:00 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
> richardcochran@gmail.com; abyagowi@fb.com; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; davem@davemloft.net; kuba@kernel.org;
> linux-kselftest@vger.kernel.org; idosch@idosch.org; mkubecek@suse.cz;
> saeed@kernel.org; michael.chan@broadcom.com
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> 
> Maciej Machnikowski <maciej.machnikowski@intel.com> writes:
> 
> > Add Documentation/networking/synce.rst describing new RTNL messages
> > and respective NDO ops supporting SyncE (Synchronous Ethernet).
> >
> > Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
> > ---
> >  Documentation/networking/synce.rst | 117
> +++++++++++++++++++++++++++++
> >  1 file changed, 117 insertions(+)
> >  create mode 100644 Documentation/networking/synce.rst
> >
> > diff --git a/Documentation/networking/synce.rst
> b/Documentation/networking/synce.rst
> > new file mode 100644
> > index 000000000000..4ca41fb9a481
> > --- /dev/null
> > +++ b/Documentation/networking/synce.rst
> > @@ -0,0 +1,117 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +====================
> > +Synchronous Ethernet
> > +====================
> > +
> > +Synchronous Ethernet networks use a physical layer clock to syntonize
> > +the frequency across different network elements.
> > +
> > +A basic SyncE node defined in the ITU-T G.8264 consists of an Ethernet
> > +Equipment Clock (EEC) and a PHY that has dedicated outputs of recovered
> clocks
> > +and a dedicated TX clock input that is used to transmit data to other
> nodes.
> > +
> > +The SyncE capable PHY is able to recover the incoming frequency of the
> data
> > +stream on RX lanes and redirect it (sometimes dividing it) to recovered
> > +clock outputs. In a SyncE PHY the TX frequency is directly dependent on the
> > +input frequency - either on the PHY CLK input, or on a dedicated
> > +TX clock input.
> > +
> > +      ┌───────────┬──────────┐
> > +      │ RX        │ TX       │
> > +  1   │ lanes     │ lanes    │ 1
> > +  ───►├──────┐    │          ├─────►
> > +  2   │      │    │          │ 2
> > +  ───►├──┐   │    │          ├─────►
> > +  3   │  │   │    │          │ 3
> > +  ───►├─▼▼   ▼    │          ├─────►
> > +      │ ──────    │          │
> > +      │ \____/    │          │
> > +      └──┼──┼─────┴──────────┘
> > +        1│ 2│        ▲
> > + RCLK out│  │        │ TX CLK in
> > +         ▼  ▼        │
> > +       ┌─────────────┴───┐
> > +       │                 │
> > +       │       EEC       │
> > +       │                 │
> > +       └─────────────────┘
> > +
> > +The EEC can synchronize its frequency to one of the synchronization
> inputs
> > +either clocks recovered on traffic interfaces or (in advanced deployments)
> > +external frequency sources.
> > +
> > +Some EEC implementations can select synchronization source through
> > +priority tables and synchronization status messaging and provide
> necessary
> > +filtering and holdover capabilities.
> > +
> > +The following interface can be applicable to different packet network
> types
> > +following ITU-T G.8261/G.8262 recommendations.
> > +
> > +Interface
> > +=========
> > +
> > +The following RTNL messages are used to read/configure SyncE recovered
> > +clocks.
> > +
> > +RTM_GETRCLKRANGE
> > +-----------------
> > +Reads the allowed pin index range for the recovered clock outputs.
> > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > +better for a given application.
> > +Will call the ndo_get_rclk_range function to read the allowed range
> > +of output pin indexes.
> > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > +IFLA_RCLK_RANGE_MAX_PIN attributes
> > +
> > +RTM_GETRCLKSTATE
> > +-----------------
> > +Read the state of recovered pins that output recovered clock from
> > +a given port. The message will contain the number of assigned clocks
> > +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in
> IFLA_RCLK_STATE_OUT_IDX
> > +To support multiple recovered clock outputs from the same port, this
> message
> > +will return the IFLA_RCLK_STATE_COUNT attribute containing the number
> of
> > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX
> attributes
> > +listing the active output indexes.
> > +This message will call the ndo_get_rclk_range to determine the allowed
> > +recovered clock indexes and then will loop through them, calling
> > +the ndo_get_rclk_state for each of them.
> 
> Let me make sure I understand the model that you propose. Specifically
> from the point of view of a multi-port device, because that's my
> immediate use case.
> 
> RTM_GETRCLKRANGE would report number of "pins" that matches the
> number
> of lanes in the system. So e.g. a 32-port switch, where each port has 4
> lanes, would give a range of [1; 128], inclusive. (Or maybe [0; 128) or
> whatever.)
> 
> RTM_GETRCLKSTATE would then return some subset of those pins,
> depending
> on which lanes actually managed to establish a connection and carry a
> valid clock signal. So, say, [1, 2, 3, 4] if the first port has e.g. a
> 100Gbps established.
> 

Those two will be merged into a single RTM_GETRCLKSTATE that will report
the state of all available pins for a given port.

Also lanes here should really be ports - will fix in next revision.

But the logic will be:
Call RTM_GETRCLKSTATE. It will return the list of pins and their state
for a given port. Once you read the state, send RTM_SETRCLKSTATE
to enable the redirection to a given RCLK output from the PHY. If your DPLL/EEC
is configured to accept it automatically, that's all you need to do; then
wait for the right state of the EEC (locked/locked with HO).

> > +
> > +RTM_SETRCLKSTATE
> > +-----------------
> > +Sets the redirection of the recovered clock for a given pin. This message
> > +expects one attribute:
> > +struct if_set_rclk_msg {
> > +	__u32 ifindex; /* interface index */
> > +	__u32 out_idx; /* output index (from a valid range) */
> > +	__u32 flags; /* configuration flags */
> > +};
> > +
> > +Supported flags are:
> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> > +		     if clear - the output will be disabled.
> 
> OK, so here I set up the tracking. ifindex tells me which EEC to
> configure, out_idx is the pin to track, flags tell me whether to set up
> the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
> I somehow know that lane 2 has the best clock.

It's bound to ifindex to know which PHY port you interact with. It has nothing to
do with the EEC yet.
 
> If the above is broadly correct, I've got some questions.
> 
> First, what if more than one out_idx is set? What are drivers / HW meant
> to do with this? What is the expected behavior?

Expected behavior is deployment specific. You can use different PHY recovered
clock outputs to implement an active/passive clock failover scheme.

> Also GETRCLKSTATE and SETRCLKSTATE have a somewhat different scope:
> one
> reports which pins carry a clock signal, the other influences tracking.
> That seems wrong. There also does not seems to be an UAPI to retrieve
> the tracking settings.

They don't. GET reads the redirection state and SET sets it - nothing more,
nothing less. In ice we use EEC pin indexes so that the model translates more
easily to the one we'll use once we support the DPLL subsystem.

> Second, as a user-space client, how do I know that if ports 1 and 2 both
> report pin range [A; B], that they both actually share the same
> underlying EEC? Is there some sort of coordination among the drivers,
> such that each pin in the system has a unique ID?

For now we don't, as we don't have an EEC subsystem. But that can be solved
temporarily by a config file.

> Further, how do I actually know the mapping from ports to pins? E.g. as
> a user, I might know my master is behind swp1. How do I know what pins
> correspond to that port? As a user-space tool author, how do I help
> users to do something like "eec set clock eec0 track swp1"?

That's why the driver needs to be smart here and return the indexes properly.

> Additionally, how would things like external GPSs or 1pps be modeled? I
> guess the driver would know about such interface, and would expose it as
> a "pin". When the GPS signal locks, the driver starts reporting the pin
> in the RCLK set. Then it is possible to set up tracking of that pin.

That won't be enabled before we get the DPLL subsystem ready.
 
> It seems to me it would be easier to understand, and to write user-space
> tools and drivers for, a model that has EEC as an explicit first-class
> object. That's where the EEC state naturally belongs, that's where the
> pin range naturally belongs. Netdevs should have a reference to EEC and
> pins, not present this information as if they own it. A first-class EEC
> would also allow to later figure out how to hook up PHC and EEC.

We have the userspace tool, but can't upstream it until we define the kernel
interfaces. It's a catch-22 :(

Regards
Maciek

> > +
> > +RTM_GETEECSTATE
> > +----------------
> > +Reads the state of the EEC or equivalent physical clock synchronizer.
> > +This message returns the following attributes:
> > +IFLA_EEC_STATE - current state of the EEC or equivalent clock generator.
> > +		 The states returned in this attribute are aligned to the
> > +		 ITU-T G.781 and are:
> > +		  IF_EEC_STATE_INVALID - state is not valid
> > +		  IF_EEC_STATE_FREERUN - clock is free-running
> > +		  IF_EEC_STATE_LOCKED - clock is locked to the reference,
> > +		                        but the holdover memory is not valid
> > +		  IF_EEC_STATE_LOCKED_HO_ACQ - clock is locked to the
> reference
> > +		                               and holdover memory is valid
> > +		  IF_EEC_STATE_HOLDOVER - clock is in holdover mode
> > +State is read from the netdev calling the:
> > +int (*ndo_get_eec_state)(struct net_device *dev, enum if_eec_state
> *state,
> > +			 u32 *src_idx, struct netlink_ext_ack *extack);
> > +
> > +IFLA_EEC_SRC_IDX - optional attribute returning the index of the
> reference that
> > +		   is used for the current IFLA_EEC_STATE, i.e., the index of
> > +		   the pin that the EEC is locked to.
> > +
> > +Will be returned only if the ndo_get_eec_src is implemented.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-08 17:03           ` [Intel-wired-lan] " Jakub Kicinski
@ 2021-11-09 10:50             ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-09 10:50 UTC (permalink / raw)
  To: Jakub Kicinski, Ido Schimmel
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, linux-kselftest, mkubecek, saeed, michael.chan

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, November 8, 2021 6:03 PM
> To: Ido Schimmel <idosch@idosch.org>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> On Mon, 8 Nov 2021 18:29:50 +0200 Ido Schimmel wrote:
> > I also want to re-iterate my dissatisfaction with the interface being
> > netdev-centric. By modelling the EEC as a standalone object we will be
> > able to extend it to set the source of the EEC to something other than a
> > netdev in the future. If we don't do it now, we will end up with two
> > ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
> > else).
> >
> > Other advantages of modelling the EEC as a separate object include the
> > ability for user space to determine the mapping between netdevs and EECs
> > (currently impossible) and reporting additional EEC attributes such as
> > SyncE clockIdentity and default SSM code. There is really no reason to
> > report all of this identical information via multiple netdevs.
> 
> Indeed, I feel convinced. I believe the OCP timing card will benefit
> from such an API as well. I pinged Jonathan; if he doesn't have cycles
> I'll do the typing.
> 
> What do you have in mind for driver abstracting away pin selection?
> For a standalone clock fed PPS signal from a backplate this will be
> impossible, so we may need some middle way.

Me too! Yet it'll take a lot of time to implement. My thinking was to
implement the simplest usable EEC state possible that is applicable to all
solutions (like 1GBaseT, which doesn't always require an external DPLL to
enable SyncE), have an option to return the state for netdev-specific use
cases, and easily enable the new path when it's available. We can just check
whether the driver is connected to the DPLL in the future DPLL subsystem and
reroute the GET_EECSTATE call there.

We can also fix the mapping by adding the DPLL_IDX attribute.

The DPLL subsystem will require a very flexible pin model, as there is a lot
to configure inside the DPLL to enable many use cases.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-09 10:43       ` [Intel-wired-lan] " Machnikowski, Maciej
@ 2021-11-09 14:52         ` Petr Machata
  -1 siblings, 0 replies; 50+ messages in thread
From: Petr Machata @ 2021-11-09 14:52 UTC (permalink / raw)
  To: Machnikowski, Maciej
  Cc: Petr Machata, netdev, intel-wired-lan, richardcochran, abyagowi,
	Nguyen, Anthony L, davem, kuba, linux-kselftest, idosch,
	mkubecek, saeed, michael.chan


Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:

>> Maciej Machnikowski <maciej.machnikowski@intel.com> writes:
>> 
>> > +====================
>> > +Synchronous Ethernet
>> > +====================
>> > +
>> > +Synchronous Ethernet networks use a physical layer clock to syntonize
>> > +the frequency across different network elements.
>> > +
>> > +A basic SyncE node defined in the ITU-T G.8264 consists of an Ethernet
>> > +Equipment Clock (EEC) and a PHY that has dedicated outputs of recovered
>> clocks
>> > +and a dedicated TX clock input that is used to transmit data to other
>> nodes.
>> > +
>> > +The SyncE capable PHY is able to recover the incoming frequency of the
>> data
>> > +stream on RX lanes and redirect it (sometimes dividing it) to recovered
>> > +clock outputs. In a SyncE PHY the TX frequency is directly dependent on the
>> > +input frequency - either on the PHY CLK input, or on a dedicated
>> > +TX clock input.
>> > +
>> > +      ┌───────────┬──────────┐
>> > +      │ RX        │ TX       │
>> > +  1   │ lanes     │ lanes    │ 1
>> > +  ───►├──────┐    │          ├─────►
>> > +  2   │      │    │          │ 2
>> > +  ───►├──┐   │    │          ├─────►
>> > +  3   │  │   │    │          │ 3
>> > +  ───►├─▼▼   ▼    │          ├─────►
>> > +      │ ──────    │          │
>> > +      │ \____/    │          │
>> > +      └──┼──┼─────┴──────────┘
>> > +        1│ 2│        ▲
>> > + RCLK out│  │        │ TX CLK in
>> > +         ▼  ▼        │
>> > +       ┌─────────────┴───┐
>> > +       │                 │
>> > +       │       EEC       │
>> > +       │                 │
>> > +       └─────────────────┘
>> > +
>> > +The EEC can synchronize its frequency to one of the synchronization
>> > +inputs - either clocks recovered on traffic interfaces or (in advanced
>> > +deployments) external frequency sources.
>> > +
>> > +Some EEC implementations can select the synchronization source through
>> > +priority tables and synchronization status messaging, and provide the
>> > +necessary filtering and holdover capabilities.
>> > +
>> > +The following interface can be applicable to different packet network
>> > +types following ITU-T G.8261/G.8262 recommendations.
>> > +
>> > +Interface
>> > +=========
>> > +
>> > +The following RTNL messages are used to read/configure SyncE recovered
>> > +clocks.
>> > +
>> > +RTM_GETRCLKRANGE
>> > +-----------------
>> > +Reads the allowed pin index range for the recovered clock outputs.
>> > +This can be aligned to PHY outputs or to EEC inputs, whichever is
>> > +better for a given application.
>> > +Will call ndo_get_rclk_range to determine the allowed recovered clock
>> > +range and return it in the IFLA_RCLK_RANGE_MIN_PIN and
>> > +IFLA_RCLK_RANGE_MAX_PIN attributes.
>> > +
>> > +RTM_GETRCLKSTATE
>> > +-----------------
>> > +Reads the state of the pins that output a recovered clock from
>> > +a given port. To support multiple recovered clock outputs from the
>> > +same port, this message returns the IFLA_RCLK_STATE_COUNT attribute
>> > +containing the number of active recovered clock outputs (N) and N
>> > +IFLA_RCLK_STATE_OUT_IDX attributes listing the active output indexes.
>> > +This message will call ndo_get_rclk_range to determine the allowed
>> > +recovered clock indexes and then loop through them, calling
>> > +ndo_get_rclk_state for each of them.
>> 
>> Let me make sure I understand the model that you propose. Specifically
>> from the point of view of a multi-port device, because that's my
>> immediate use case.
>> 
>> RTM_GETRCLKRANGE would report the number of "pins" that matches the
>> number of lanes in the system. So e.g. a 32-port switch, where each port
>> has 4 lanes, would give a range of [1; 128], inclusive. (Or maybe
>> [0; 128) or whatever.)
>> 
>> RTM_GETRCLKSTATE would then return some subset of those pins, depending
>> on which lanes actually managed to establish a connection and carry a
>> valid clock signal. So, say, [1, 2, 3, 4] if the first port has e.g. a
>> 100Gbps link established.
>> 
>
> Those 2 will be merged into a single RTM_GETRCLKSTATE that will report
> the state of all available pins for a given port.
>
> Also lanes here should really be ports - will fix in next revision.
>
> But the logic will be: 
> Call RTM_GETRCLKSTATE. It will return the list of pins and their state
> for a given port. Once you read the range, you send RTM_SETRCLKSTATE
> to enable the redirection to a given RCLK output from the PHY. If your
> DPLL/EEC is configured to accept it automatically, that's all you need
> to do; then you wait for the right state of the EEC (locked/locked
> with HO).

Ha, ok, so the RANGE call goes away, it's all in the RTM_GETRCLKSTATE.

>> > +
>> > +RTM_SETRCLKSTATE
>> > +-----------------
>> > +Sets the redirection of the recovered clock for a given pin. This message
>> > +expects one attribute:
>> > +struct if_set_rclk_msg {
>> > +	__u32 ifindex; /* interface index */
>> > +	__u32 out_idx; /* output index (from a valid range) */
>> > +	__u32 flags; /* configuration flags */
>> > +};
>> > +
>> > +Supported flags are:
>> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
>> > +		     if clear - the output will be disabled.
>> 
>> OK, so here I set up the tracking. ifindex tells me which EEC to
>> configure, out_idx is the pin to track, flags tell me whether to set up
>> the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
>> I somehow know that lane 2 has the best clock.
>
> It's bound to ifindex to know which PHY port you interact with. It has nothing to
> do with the EEC yet.

It has in the sense that I'm configuring "TX CLK in", which leads from
EEC to the port.

>> If the above is broadly correct, I've got some questions.
>> 
>> First, what if more than one out_idx is set? What are drivers / HW meant
>> to do with this? What is the expected behavior?
>
> Expected behavior is deployment specific. You can use different phy recovered
> clock outputs to implement active/passive mode of clock failover.

How? Which one is primary and which one is backup? I just have two
enabled pins...

Wouldn't failover be implementable in a userspace daemon? It would get
a notification from the system that holdover was entered, and could
reconfigure tracking to another pin based on arbitrary rules.

>> Also GETRCLKSTATE and SETRCLKSTATE have a somewhat different scope: one
>> reports which pins carry a clock signal, the other influences tracking.
>> That seems wrong. There also does not seem to be a UAPI to retrieve
>> the tracking settings.
>
> They don't. GET reads the redirection state and SET sets it - nothing more,
> nothing less. In ICE we use EEC pin indexes so that the model translates
> more easily to the one we'll use once we support the DPLL subsystem.
>
>> Second, as a user-space client, how do I know that if ports 1 and 2 both
>> report pin range [A; B], that they both actually share the same
>> underlying EEC? Is there some sort of coordination among the drivers,
>> such that each pin in the system has a unique ID?
>
> For now we don't, as we don't have an EEC subsystem. But that can be
> solved by a config file temporarily.

I think it would be better to model this properly from day one.

>> Further, how do I actually know the mapping from ports to pins? E.g. as
>> a user, I might know my master is behind swp1. How do I know what pins
>> correspond to that port? As a user-space tool author, how do I help
>> users to do something like "eec set clock eec0 track swp1"?
>
> That's why the driver needs to be smart there and return indexes properly.

What do you mean, properly? Up there you have RTM_GETRCLKRANGE that just
gives me a min and a max. Is there a policy about how to correlate
numbers in that range to... ifindices, netdevice names, devlink port
numbers, I don't know, something?

How do several drivers coordinate this numbering among themselves? Is
there a core kernel authority that manages pin number de/allocations?

>> Additionally, how would things like external GPSs or 1pps be modeled? I
>> guess the driver would know about such interface, and would expose it as
>> a "pin". When the GPS signal locks, the driver starts reporting the pin
>> in the RCLK set. Then it is possible to set up tracking of that pin.
>
> That won't be enabled before we get the DPLL subsystem ready.

It might prove challenging to retrofit an existing netdev-centric
interface into a more generic model. It would be better to model this
properly from day one, and OK, if we can carve out a subset of that
model to implement now, and leave the rest for later, fine. But the
current model does not strike me as having a natural migration path to
something more generic. E.g. reporting the EEC state through the
interfaces attached to that EEC... like, that will have to stay, even at
a time when it is superseded by a better interface.

>> It seems to me it would be easier to understand, and to write user-space
>> tools and drivers for, a model that has EEC as an explicit first-class
>> object. That's where the EEC state naturally belongs, that's where the
>> pin range naturally belongs. Netdevs should have a reference to EEC and
>> pins, not present this information as if they own it. A first-class EEC
>> would also allow to later figure out how to hook up PHC and EEC.
>
> We have the userspace tool, but can't upstream it until we define the
> kernel interfaces. It's a catch-22 :(

I'm sure you do, presumably you test this somehow. Still, as a potential
consumer of that interface, I will absolutely poke at it to figure out
how to use it, what it lets me do, and what won't work.

BTW, what we've done in the past in a situation like this was, here's
the current submission, here's a pointer to a GIT with more stuff we
plan to send later on, here's a pointer to a GIT with the userspace
stuff. I doubt anybody actually looks at that code, ain't nobody got
time for that, but really there's no catch 22.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-09 14:52         ` [Intel-wired-lan] " Petr Machata
@ 2021-11-09 18:19           ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-09 18:19 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, idosch, mkubecek, saeed,
	michael.chan

> -----Original Message-----
> From: Petr Machata <petrm@nvidia.com>
> Sent: Tuesday, November 9, 2021 3:53 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> 
> Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:
> 
> >> Maciej Machnikowski <maciej.machnikowski@intel.com> writes:
> >>
> >> RTM_GETRCLKRANGE would report the number of "pins" that matches the
> >> number of lanes in the system. So e.g. a 32-port switch, where each
> >> port has 4 lanes, would give a range of [1; 128], inclusive. (Or maybe
> >> [0; 128) or whatever.)
> >>
> >> RTM_GETRCLKSTATE would then return some subset of those pins,
> >> depending on which lanes actually managed to establish a connection
> >> and carry a valid clock signal. So, say, [1, 2, 3, 4] if the first
> >> port has e.g. a 100Gbps link established.
> >>
> >
> > Those 2 will be merged into a single RTM_GETRCLKSTATE that will report
> > the state of all available pins for a given port.
> >
> > Also lanes here should really be ports - will fix in next revision.
> >
> > But the logic will be:
> > Call RTM_GETRCLKSTATE. It will return the list of pins and their state
> > for a given port. Once you read the range, you send RTM_SETRCLKSTATE
> > to enable the redirection to a given RCLK output from the PHY. If your
> > DPLL/EEC is configured to accept it automatically, that's all you need
> > to do; then you wait for the right state of the EEC (locked/locked
> > with HO).
> 
> Ha, ok, so the RANGE call goes away, it's all in the RTM_GETRCLKSTATE.

The functionality needs to be there, but the message will be gone.
 
> >> > +
> >> > +RTM_SETRCLKSTATE
> >> > +-----------------
> >> > +Sets the redirection of the recovered clock for a given pin. This
> message
> >> > +expects one attribute:
> >> > +struct if_set_rclk_msg {
> >> > +	__u32 ifindex; /* interface index */
> >> > +	__u32 out_idx; /* output index (from a valid range) */
> >> > +	__u32 flags; /* configuration flags */
> >> > +};
> >> > +
> >> > +Supported flags are:
> >> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be
> enabled,
> >> > +		     if clear - the output will be disabled.
> >>
> >> OK, so here I set up the tracking. ifindex tells me which EEC to
> >> configure, out_idx is the pin to track, flags tell me whether to set up
> >> the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
> >> I somehow know that lane 2 has the best clock.
> >
> > It's bound to ifindex to know which PHY port you interact with. It has
> nothing to
> > do with the EEC yet.
> 
> It has in the sense that I'm configuring "TX CLK in", which leads from
> EEC to the port.

At this stage we only enable the recovered clock. The EEC may or may not
use it, depending on many additional factors.

> >> If the above is broadly correct, I've got some questions.
> >>
> >> First, what if more than one out_idx is set? What are drivers / HW meant
> >> to do with this? What is the expected behavior?
> >
> > Expected behavior is deployment specific. You can use different PHY
> > recovered clock outputs to implement active/passive mode of clock
> > failover.
> 
> How? Which one is primary and which one is backup? I just have two
> enabled pins...

With this API you only have ports and pins, and you set up the redirection.
The EEC part is out of the picture and will be part of the DPLL subsystem.

> Wouldn't failover be implementable in a userspace daemon? That would get
> a notification from the system that holdover was entered, and can
> reconfigure tracking to another pin based on arbitrary rules.

Not necessarily. You can deploy the QL-disabled mode and rely on the
local DPLL configuration to manage the switching. In that mode you're
not passing the quality level downstream, so you only need to know if you
have a source.

> >> Also GETRCLKSTATE and SETRCLKSTATE have a somewhat different scope: one
> >> reports which pins carry a clock signal, the other influences tracking.
> >> That seems wrong. There also does not seem to be a UAPI to retrieve
> >> the tracking settings.
> >
> > They don't. GET reads the redirection state and SET sets it - nothing
> > more, nothing less. In ICE we use EEC pin indexes so that the model
> > translates more easily to the one we'll use once we support the DPLL
> > subsystem.
> >
> >> Second, as a user-space client, how do I know that if ports 1 and 2 both
> >> report pin range [A; B], that they both actually share the same
> >> underlying EEC? Is there some sort of coordination among the drivers,
> >> such that each pin in the system has a unique ID?
> >
> > For now we don't, as we don't have EEC subsystem. But that can be solved
> > by a config file temporarily.
> 
> I think it would be better to model this properly from day one.

I want to propose the simplest API that will work for the simplest device,
and follow that with the userspace tool that will help everyone understand
what we need in the DPLL subsystem; otherwise it'll be hard to explain the
requirements. The only change will be the addition of the DPLL index.
 
> >> Further, how do I actually know the mapping from ports to pins? E.g. as
> >> a user, I might know my master is behind swp1. How do I know what pins
> >> correspond to that port? As a user-space tool author, how do I help
> >> users to do something like "eec set clock eec0 track swp1"?
> >
> > That's why the driver needs to be smart there and return indexes properly.
> 
> What do you mean, properly? Up there you have RTM_GETRCLKRANGE that
> just gives me a min and a max. Is there a policy about how to correlate
> numbers in that range to... ifindices, netdevice names, devlink port
> numbers, I don't know, something?

The driver needs to know the underlying HW and report those ranges
correctly.

> How do several drivers coordinate this numbering among themselves? Is
> there a core kernel authority that manages pin number de/allocations?

I believe the goal is to create something similar to the ptp subsystem.
The driver will need to configure the relationship during initialization and the
OS will manage the indexes.
 
> >> Additionally, how would things like external GPSs or 1pps be modeled? I
> >> guess the driver would know about such interface, and would expose it as
> >> a "pin". When the GPS signal locks, the driver starts reporting the pin
> >> in the RCLK set. Then it is possible to set up tracking of that pin.
> >
> > That won't be enabled before we get the DPLL subsystem ready.
> 
> It might prove challenging to retrofit an existing netdev-centric
> interface into a more generic model. It would be better to model this
> properly from day one, and OK, if we can carve out a subset of that
> model to implement now, and leave the rest for later, fine. But the
> current model does not strike me as having a natural migration path to
> something more generic. E.g. reporting the EEC state through the
> interfaces attached to that EEC... like, that will have to stay, even at
> a time when it is superseded by a better interface.

The recovered clock API will not change - only EEC_STATE is in question.
We can either redirect the call to the DPLL subsystem, or just add the
DPLL index to that call and return it.

> >> It seems to me it would be easier to understand, and to write user-space
> >> tools and drivers for, a model that has EEC as an explicit first-class
> >> object. That's where the EEC state naturally belongs, that's where the
> >> pin range naturally belongs. Netdevs should have a reference to EEC and
> >> pins, not present this information as if they own it. A first-class EEC
> >> would also allow to later figure out how to hook up PHC and EEC.
> >
> > We have the userspace tool, but can't upstream it until we define the
> > kernel interfaces. It's a catch-22 :(
> 
> I'm sure you do, presumably you test this somehow. Still, as a potential
> consumer of that interface, I will absolutely poke at it to figure out
> how to use it, what it lets me do, and what won't work.

That's why I now want to enable very basic functionality that will not go
away anytime soon: mapping between a port and a recovered clock (as in
"take my clock and output it on the PHY's first recovered clock output")
and checking the state of the clock.

> BTW, what we've done in the past in a situation like this was, here's
> the current submission, here's a pointer to a GIT with more stuff we
> plan to send later on, here's a pointer to a GIT with the userspace
> stuff. I doubt anybody actually looks at that code, ain't nobody got
> time for that, but really there's no catch 22.

Unfortunately, the userspace part of this will be part of linuxptp and we
can't upstream it partially before we get those basics defined here. More
advanced functionality will grow organically, as I also have a limited
view of SyncE and am not an expert on switches.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
@ 2021-11-09 18:19           ` Machnikowski, Maciej
  0 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-09 18:19 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Petr Machata <petrm@nvidia.com>
> Sent: Tuesday, November 9, 2021 3:53 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> 
> Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:
> 
> >> Maciej Machnikowski <maciej.machnikowski@intel.com> writes:
> >>
> >> RTM_GETRCLKRANGE would report number of "pins" that matches the
> >> number
> >> of lanes in the system. So e.g. a 32-port switch, where each port has 4
> >> lanes, would give a range of [1; 128], inclusive. (Or maybe [0; 128) or
> >> whatever.)
> >>
> >> RTM_GETRCLKSTATE would then return some subset of those pins,
> >> depending
> >> on which lanes actually managed to establish a connection and carry a
> >> valid clock signal. So, say, [1, 2, 3, 4] if the first port has e.g. a
> >> 100Gbps established.
> >>
> >
> > Those 2 will be merged into a single RTM_GETRCLKSTATE that will report
> > the state of all available pins for a given port.
> >
> > Also lanes here should really be ports - will fix in next revision.
> >
> > But the logic will be:
> > Call the RTM_GETRCLKSTATE. It will return the list of pins and their state
> > for a given port. Once you read the range you will send the
> RTM_SETRCLKSTATE
> > to enable the redirection to a given RCLK output from the PHY. If your
> DPLL/EEC
> > is configured to accept it automatically - it's all you need to do and you need
> to
> > wait for the right state of the EEC (locked/locked with HO).
> 
> Ha, ok, so the RANGE call goes away, it's all in the RTM_GETRCLKSTATE.

The functionality needs to be there, but the message will be gone.
 
> >> > +
> >> > +RTM_SETRCLKSTATE
> >> > +-----------------
> >> > +Sets the redirection of the recovered clock for a given pin. This message
> >> > +expects one attribute:
> >> > +struct if_set_rclk_msg {
> >> > +	__u32 ifindex; /* interface index */
> >> > +	__u32 out_idx; /* output index (from a valid range) */
> >> > +	__u32 flags; /* configuration flags */
> >> > +};
> >> > +
> >> > +Supported flags are:
> >> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> >> > +		     if clear - the output will be disabled.
> >>
> >> OK, so here I set up the tracking. ifindex tells me which EEC to
> >> configure, out_idx is the pin to track, flags tell me whether to set up
> >> the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
> >> I somehow know that lane 2 has the best clock.
> >
> > It's bound to ifindex to know which PHY port you interact with. It has
> > nothing to do with the EEC yet.
> 
> It has in the sense that I'm configuring "TX CLK in", which leads from
> EEC to the port.

At this stage we only enable the recovered clock. EEC may or may not use it
depending on many additional factors.

> >> If the above is broadly correct, I've got some questions.
> >>
> >> First, what if more than one out_idx is set? What are drivers / HW meant
> >> to do with this? What is the expected behavior?
> >
> > Expected behavior is deployment specific. You can use different phy
> > recovered clock outputs to implement active/passive mode of clock failover.
> 
> How? Which one is primary and which one is backup? I just have two
> enabled pins...

With this API you only have ports and pins and set up the redirection.
The EEC part is out of picture and will be part of DPLL subsystem.

> Wouldn't failover be implementable in a userspace daemon? That would get
> a notification from the system that holdover was entered, and can
> reconfigure tracking to another pin based on arbitrary rules.

Not necessarily. You can deploy the QL-disabled mode and rely on the
local DPLL configuration to manage the switching. In that mode you're
not passing the quality level downstream, so you only need to know if you
have a source.

> >> Also GETRCLKSTATE and SETRCLKSTATE have a somewhat different scope:
> >> one
> >> reports which pins carry a clock signal, the other influences tracking.
> >> That seems wrong. There also does not seems to be an UAPI to retrieve
> >> the tracking settings.
> >
> > They don't. Get reads the redirection state and SET sets it - nothing more,
> > nothing less. In ICE we use EEC pin indexes so that the model translates
> > more easily to the one used once we support the DPLL subsystem.
> >
> >> Second, as a user-space client, how do I know that if ports 1 and 2 both
> >> report pin range [A; B], that they both actually share the same
> >> underlying EEC? Is there some sort of coordination among the drivers,
> >> such that each pin in the system has a unique ID?
> >
> > For now we don't, as we don't have EEC subsystem. But that can be solved
> > by a config file temporarily.
> 
> I think it would be better to model this properly from day one.

I want to propose the simplest API that will work for the simplest device,
follow that with the userspace tool that will help everyone understand
what we need in the DPLL subsystem, otherwise it'll be hard to explain the
requirements. The only change will be the addition of the DPLL index.
 
> >> Further, how do I actually know the mapping from ports to pins? E.g. as
> >> a user, I might know my master is behind swp1. How do I know what pins
> >> correspond to that port? As a user-space tool author, how do I help
> >> users to do something like "eec set clock eec0 track swp1"?
> >
> > That's why driver needs to be smart there and return indexes properly.
> 
> What do you mean, properly? Up there you have RTM_GETRCLKRANGE that just
> gives me a min and a max. Is there a policy about how to correlate
> numbers in that range to... ifindices, netdevice names, devlink port
> numbers, I don't know, something?

The driver needs to know the underlying HW and report those ranges
correctly.

> How do several drivers coordinate this numbering among themselves? Is
> there a core kernel authority that manages pin number de/allocations?

I believe the goal is to create something similar to the ptp subsystem.
The driver will need to configure the relationship during initialization and the
OS will manage the indexes.
 
> >> Additionally, how would things like external GPSs or 1pps be modeled? I
> >> guess the driver would know about such interface, and would expose it as
> >> a "pin". When the GPS signal locks, the driver starts reporting the pin
> >> in the RCLK set. Then it is possible to set up tracking of that pin.
> >
> > That won't be enabled before we get the DPLL subsystem ready.
> 
> It might prove challenging to retrofit an existing netdev-centric
> interface into a more generic model. It would be better to model this
> properly from day one, and OK, if we can carve out a subset of that
> model to implement now, and leave the rest for later, fine. But the
> current model does not strike me as having a natural migration path to
> something more generic. E.g. reporting the EEC state through the
> interfaces attached to that EEC... like, that will have to stay, even at
> a time when it is superseded by a better interface.

The recovered clock API will not change - only EEC_STATE is in question.
We can either redirect the call to the DPLL subsystem, or just add the DPLL
index into that call and return it.

> >> It seems to me it would be easier to understand, and to write user-space
> >> tools and drivers for, a model that has EEC as an explicit first-class
> >> object. That's where the EEC state naturally belongs, that's where the
> >> pin range naturally belongs. Netdevs should have a reference to EEC and
> >> pins, not present this information as if they own it. A first-class EEC
> >> would also allow to later figure out how to hook up PHC and EEC.
> >
> > We have the userspace tool, but can't upstream it until we define
> > kernel interfaces. It's a catch-22 :(
> 
> I'm sure you do, presumably you test this somehow. Still, as a potential
> consumer of that interface, I will absolutely poke at it to figure out
> how to use it, what it lets me to do, and what won't work.

That's why for now I want to enable very basic functionality that will not
go away anytime soon: mapping between a port and a recovered clock (as in
"take my clock and output it on the first PHY's recovered clock output")
and checking the state of the clock.

> BTW, what we've done in the past in a situation like this was, here's
> the current submission, here's a pointer to a GIT with more stuff we
> plan to send later on, here's a pointer to a GIT with the userspace
> stuff. I doubt anybody actually looks at that code, ain't nobody got
> time for that, but really there's no catch 22.

Unfortunately, the userspace part of it will be a part of linuxptp and we
can't upstream it partially before we get those basics defined here. More
advanced functionality will be grown organically, as I also have a limited
view of SyncE and am not an expert on switches.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-09 18:19           ` [Intel-wired-lan] " Machnikowski, Maciej
@ 2021-11-10 10:27             ` Petr Machata
  -1 siblings, 0 replies; 50+ messages in thread
From: Petr Machata @ 2021-11-10 10:27 UTC (permalink / raw)
  To: Machnikowski, Maciej
  Cc: Petr Machata, netdev, intel-wired-lan, richardcochran, abyagowi,
	Nguyen, Anthony L, davem, kuba, linux-kselftest, idosch,
	mkubecek, saeed, michael.chan


Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:

>> Ha, ok, so the RANGE call goes away, it's all in the RTM_GETRCLKSTATE.
>
> The functionality needs to be there, but the message will be gone.

Gotcha.

>> >> > +RTM_SETRCLKSTATE
>> >> > +-----------------
>> >> > +Sets the redirection of the recovered clock for a given pin. This message
>> >> > +expects one attribute:
>> >> > +struct if_set_rclk_msg {
>> >> > +	__u32 ifindex; /* interface index */
>> >> > +	__u32 out_idx; /* output index (from a valid range) */
>> >> > +	__u32 flags; /* configuration flags */
>> >> > +};
>> >> > +
>> >> > +Supported flags are:
>> >> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
>> >> > +		     if clear - the output will be disabled.
>> >>
>> >> OK, so here I set up the tracking. ifindex tells me which EEC to
>> >> configure, out_idx is the pin to track, flags tell me whether to set up
>> >> the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
>> >> I somehow know that lane 2 has the best clock.
>> >
>> > It's bound to ifindex to know which PHY port you interact with. It
>> > has nothing to do with the EEC yet.
>>
>> It has in the sense that I'm configuring "TX CLK in", which leads
>> from EEC to the port.
>
> At this stage we only enable the recovered clock. EEC may or may not
> use it depending on many additional factors.
>
>> >> If the above is broadly correct, I've got some questions.
>> >>
>> >> First, what if more than one out_idx is set? What are drivers / HW
>> >> meant to do with this? What is the expected behavior?
>> >
>> > Expected behavior is deployment specific. You can use different phy
>> > recovered clock outputs to implement active/passive mode of clock
>> > failover.
>>
>> How? Which one is primary and which one is backup? I just have two
>> enabled pins...
>
> With this API you only have ports and pins and set up the redirection.

Wait, so how do I do failover? Which of the set pins is primary and
which is backup? Should the backup be sticky, i.e. do primary and backup
switch roles after primary goes into holdover? It looks like there are a
number of policy decisions that would be best served by a userspace
tool.

> The EEC part is out of picture and will be part of DPLL subsystem.

So about that. I don't think it's contentious to claim that you need to
communicate EEC state somehow. This proposal does that through a netdev
object. After the DPLL subsystem comes along, that will necessarily
provide the same information, and the netdev interface will become
redundant, but we will need to keep it around.

That is a strong indication that a first-class DPLL object should be
part of the initial submission.

>> Wouldn't failover be implementable in a userspace daemon? That would get
>> a notification from the system that holdover was entered, and can
>> reconfigure tracking to another pin based on arbitrary rules.
>
> Not necessarily. You can deploy the QL-disabled mode and rely on the
> local DPLL configuration to manage the switching. In that mode you're
> not passing the quality level downstream, so you only need to know if you
> have a source.

The daemon can reconfigure tracking to another pin based on _arbitrary_
rules. They don't have to involve QL in any way. Can be round-robin,
FIFO, random choice... IMO it's better than just enabling a bunch of
pins and not providing any guidance as to the policy.

>> >> Second, as a user-space client, how do I know that if ports 1 and
>> >> 2 both report pin range [A; B], that they both actually share the
>> >> same underlying EEC? Is there some sort of coordination among the
>> >> drivers, such that each pin in the system has a unique ID?
>> >
>> > For now we don't, as we don't have EEC subsystem. But that can be
>> > solved by a config file temporarily.
>>
>> I think it would be better to model this properly from day one.
>
> I want to propose the simplest API that will work for the simplest
> device, follow that with the userspace tool that will help everyone
> understand what we need in the DPLL subsystem, otherwise it'll be hard
> to explain the requirements. The only change will be the addition of
> the DPLL index.

That would be fine if there were a migration path to the more complete
API. But as DPLL object is introduced, even the APIs that are superseded
by the DPLL APIs will need to stay in as a baggage.

>> >> Further, how do I actually know the mapping from ports to pins?
>> >> E.g. as a user, I might know my master is behind swp1. How do I
>> >> know what pins correspond to that port? As a user-space tool
>> >> author, how do I help users to do something like "eec set clock
>> >> eec0 track swp1"?
>> >
>> > That's why driver needs to be smart there and return indexes
>> > properly.
>>
>> What do you mean, properly? Up there you have RTM_GETRCLKRANGE that
>> just gives me a min and a max. Is there a policy about how to
>> correlate numbers in that range to... ifindices, netdevice names,
>> devlink port numbers, I don't know, something?
>
> The driver needs to know the underlying HW and report those ranges
> correctly.

How do I know _as a user_ though? As a user I want to be able to say
something like "eec set dev swp1 track dev swp2". But the "eec" tool has
no way of knowing how to set that up.

>> How do several drivers coordinate this numbering among themselves? Is
>> there a core kernel authority that manages pin number de/allocations?
>
> I believe the goal is to create something similar to the ptp
> subsystem. The driver will need to configure the relationship during
> initialization and the OS will manage the indexes.

Can you point at the index management code, please?

>> >> Additionally, how would things like external GPSs or 1pps be
>> >> modeled? I guess the driver would know about such interface, and
>> >> would expose it as a "pin". When the GPS signal locks, the driver
>> >> starts reporting the pin in the RCLK set. Then it is possible to
>> >> set up tracking of that pin.
>> >
>> > That won't be enabled before we get the DPLL subsystem ready.
>>
>> It might prove challenging to retrofit an existing netdev-centric
>> interface into a more generic model. It would be better to model this
>> properly from day one, and OK, if we can carve out a subset of that
>> model to implement now, and leave the rest for later, fine. But the
>> current model does not strike me as having a natural migration path to
>> something more generic. E.g. reporting the EEC state through the
>> interfaces attached to that EEC... like, that will have to stay, even at
>> a time when it is superseded by a better interface.
>
> The recovered clock API will not change - only EEC_STATE is in
> question. We can either redirect the call to the DPLL subsystem, or
> just add the DPLL index into that call and return it.

It would be better to have a first-class DPLL object, however vestigial,
in the initial submission.

>> >> It seems to me it would be easier to understand, and to write
>> >> user-space tools and drivers for, a model that has EEC as an
>> >> explicit first-class object. That's where the EEC state naturally
>> >> belongs, that's where the pin range naturally belongs. Netdevs
>> >> should have a reference to EEC and pins, not present this
>> >> information as if they own it. A first-class EEC would also allow
>> >> to later figure out how to hook up PHC and EEC.
>> >
>> >> > We have the userspace tool, but can't upstream it until we define
>> >> > kernel interfaces. It's a catch-22 :(
>>
>> I'm sure you do, presumably you test this somehow. Still, as a
>> potential consumer of that interface, I will absolutely poke at it to
>> figure out how to use it, what it lets me to do, and what won't work.
>
> That's why now I want to enable very basic functionality that will not
> go away anytime soon.

The issue is that the APIs won't go away any time soon either. That's
why people object to your proposal so strongly. Because we won't be able
to fix this later, and we _already_ see shortcomings now.

> Mapping between port and recovered clock (as in take my clock and
> output on the first PHY's recovered clock output) and checking the
> state of the clock.

Where is that mapping? I see a per-netdev call for a list of pins that
carry RCLK, and the state as well. I don't see a way to distinguish
which is which in any way.

>> BTW, what we've done in the past in a situation like this was, here's
>> the current submission, here's a pointer to a GIT with more stuff we
>> plan to send later on, here's a pointer to a GIT with the userspace
>> stuff. I doubt anybody actually looks at that code, ain't nobody got
>> time for that, but really there's no catch 22.
>
> Unfortunately, the userspace of it will be a part of linuxptp and we
> can't upstream it partially before we get those basics defined here.

Just push it to github or whereever?

> More advanced functionality will be grown organically, as I also have
> a limited view of SyncE and am not expert on switches.

We are growing it organically _right now_. I am strongly advocating an
organic growth in the direction of a first-class DPLL object.


* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-10 10:27             ` [Intel-wired-lan] " Petr Machata
@ 2021-11-10 11:19               ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-10 11:19 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, idosch, mkubecek, saeed,
	michael.chan

> -----Original Message-----
> From: Petr Machata <petrm@nvidia.com>
> Sent: Wednesday, November 10, 2021 11:27 AM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> 
> Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:
> 
> >> Ha, ok, so the RANGE call goes away, it's all in the RTM_GETRCLKSTATE.
> >
> > The functionality needs to be there, but the message will be gone.
> 
> Gotcha.
> 
> >> >> > +RTM_SETRCLKSTATE
> >> >> > +-----------------
> >> >> > +Sets the redirection of the recovered clock for a given pin. This message
> >> >> > +expects one attribute:
> >> >> > +struct if_set_rclk_msg {
> >> >> > +	__u32 ifindex; /* interface index */
> >> >> > +	__u32 out_idx; /* output index (from a valid range) */
> >> >> > +	__u32 flags; /* configuration flags */
> >> >> > +};
> >> >> > +
> >> >> > +Supported flags are:
> >> >> > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> >> >> > +		     if clear - the output will be disabled.
> >> >>
> >> >> OK, so here I set up the tracking. ifindex tells me which EEC to
> >> >> configure, out_idx is the pin to track, flags tell me whether to set up
> >> >> the tracking or tear it down. Thus e.g. on port 2, track pin 2, because
> >> >> I somehow know that lane 2 has the best clock.
> >> >
> >> > It's bound to ifindex to know which PHY port you interact with. It
> >> > has nothing to do with the EEC yet.
> >>
> >> It has in the sense that I'm configuring "TX CLK in", which leads
> >> from EEC to the port.
> >
> > At this stage we only enable the recovered clock. EEC may or may not
> > use it depending on many additional factors.
> >
> >> >> If the above is broadly correct, I've got some questions.
> >> >>
> >> >> First, what if more than one out_idx is set? What are drivers / HW
> >> >> meant to do with this? What is the expected behavior?
> >> >
> >> > Expected behavior is deployment specific. You can use different phy
> >> > recovered clock outputs to implement active/passive mode of clock
> >> > failover.
> >>
> >> How? Which one is primary and which one is backup? I just have two
> >> enabled pins...
> >
> > With this API you only have ports and pins and set up the redirection.
> 
> Wait, so how do I do failover? Which of the set pins in primary and
> which is backup? Should the backup be sticky, i.e. do primary and backup
> switch roles after primary goes into holdover? It looks like there are a
> number of policy decisions that would be best served by a userspace
> tool.

The clock priority is configured in the SEC/EEC/DPLL. The recovered clock API
only configures the redirections (i.e. which clocks will be available to the
DPLL as references). In some DPLLs the fallback is automatic as long as a
secondary clock is available when the primary goes away. A userspace tool
can preconfigure that before the failure occurs.

> > The EEC part is out of picture and will be part of DPLL subsystem.
> 
> So about that. I don't think it's contentious to claim that you need to
> communicate EEC state somehow. This proposal does that through a netdev
> object. After the DPLL subsystem comes along, that will necessarily
> provide the same information, and the netdev interface will become
> redundant, but we will need to keep it around.
> 
> That is a strong indication that a first-class DPLL object should be
> part of the initial submission.

That's why only a bare minimum is proposed in this patch - reading the state
and which signal is used as a reference.

> >> Wouldn't failover be implementable in a userspace daemon? That would get
> >> a notification from the system that holdover was entered, and can
> >> reconfigure tracking to another pin based on arbitrary rules.
> >
> > Not necessarily. You can deploy the QL-disabled mode and rely on the
> > local DPLL configuration to manage the switching. In that mode you're
> > not passing the quality level downstream, so you only need to know if you
> > have a source.
> 
> The daemon can reconfigure tracking to another pin based on _arbitrary_
> rules. They don't have to involve QL in any way. Can be round-robin,
> FIFO, random choice... IMO it's better than just enabling a bunch of
> pins and not providing any guidance as to the policy.

This is how the API works now. You can enable the clock on output N with
RTM_SETRCLKSTATE. It can't be random/round-robin, but it's deployment
specific. If in your setup you only have one link to a synchronous network,
you'll always use it as your frequency reference.

> >> >> Second, as a user-space client, how do I know that if ports 1 and
> >> >> 2 both report pin range [A; B], that they both actually share the
> >> >> same underlying EEC? Is there some sort of coordination among the
> >> >> drivers, such that each pin in the system has a unique ID?
> >> >
> >> > For now we don't, as we don't have EEC subsystem. But that can be
> >> > solved by a config file temporarily.
> >>
> >> I think it would be better to model this properly from day one.
> >
> > I want to propose the simplest API that will work for the simplest
> > device, follow that with the userspace tool that will help everyone
> > understand what we need in the DPLL subsystem, otherwise it'll be hard
> > to explain the requirements. The only change will be the addition of
> > the DPLL index.
> 
> That would be fine if there were a migration path to the more complete
> API. But as DPLL object is introduced, even the APIs that are superseded
> by the DPLL APIs will need to stay in as a baggage.

The migration paths are:
A) once the DPLL API is there, check in rtnl_eec_state_get whether a DPLL object
     is linked to the given netdev - if it is, get the state from that DPLL object
or
B) return the DPLL index linked to the given netdev and fail the rtnl_eec_state_get,
     so that the userspace tool has to switch to the new API

Also, rtnl_eec_state_get won't become obsolete in all cases once we get the DPLL
subsystem, as there are solutions where the SyncE DPLL is embedded in the PHY,
in which case rtnl_eec_state_get will return all the needed information without
the need to create a separate DPLL object.

The DPLL object makes sense for advanced SyncE DPLLs that provide additional
functionality, such as external reference/output pins.

> >> >> Further, how do I actually know the mapping from ports to pins?
> >> >> E.g. as a user, I might know my master is behind swp1. How do I
> >> >> know what pins correspond to that port? As a user-space tool
> >> >> author, how do I help users to do something like "eec set clock
> >> >> eec0 track swp1"?
> >> >
> >> > That's why driver needs to be smart there and return indexes
> >> > properly.
> >>
> >> What do you mean, properly? Up there you have RTM_GETRCLKRANGE
> that
> >> just gives me a min and a max. Is there a policy about how to
> >> correlate numbers in that range to... ifindices, netdevice names,
> >> devlink port numbers, I don't know, something?
> >
> > The driver needs to know the underlying HW and report those ranges
> > correctly.
> 
> How do I know _as a user_ though? As a user I want to be able to say
> something like "eec set dev swp1 track dev swp2". But the "eec" tool has
> no way of knowing how to set that up.

There's no such flexibility. It's more like the timing pins in the PTP subsystem - we
expose the API to control them, but it's up to the end user to decide how
to use them.

If we index the PHY outputs the same way the DPLL subsystem will see them
on its reference inputs, that should be sufficient to make sense of them.
 
> >> How do several drivers coordinate this numbering among themselves? Is
> >> there a core kernel authority that manages pin number de/allocations?
> >
> > I believe the goal is to create something similar to the ptp
> > subsystem. The driver will need to configure the relationship during
> > initialization and the OS will manage the indexes.
> 
> Can you point at the index management code, please?

Look at the ptp_clock_register function in the kernel - it handles the
registration of a PTP clock with the subsystem.

> >> >> Additionally, how would things like external GPSs or 1pps be
> >> >> modeled? I guess the driver would know about such interface, and
> >> >> would expose it as a "pin". When the GPS signal locks, the driver
> >> >> starts reporting the pin in the RCLK set. Then it is possible to
> >> >> set up tracking of that pin.
> >> >
> >> > That won't be enabled before we get the DPLL subsystem ready.
> >>
> >> It might prove challenging to retrofit an existing netdev-centric
> >> interface into a more generic model. It would be better to model this
> >> properly from day one, and OK, if we can carve out a subset of that
> >> model to implement now, and leave the rest for later, fine. But the
> >> current model does not strike me as having a natural migration path to
> >> something more generic. E.g. reporting the EEC state through the
> >> interfaces attached to that EEC... like, that will have to stay, even at
> >> a time when it is superseded by a better interface.
> >
> > The recovered clock API will not change - only EEC_STATE is in
> > question. We can either redirect the call to the DPLL subsystem, or
> > just add the DPLL IDX Into that call and return it.
> 
> It would be better to have a first-class DPLL object, however vestigial,
> in the initial submission.

As stated above - DPLL subsystem won't render EEC state useless.

> >> >> It seems to me it would be easier to understand, and to write
> >> >> user-space tools and drivers for, a model that has EEC as an
> >> >> explicit first-class object. That's where the EEC state naturally
> >> >> belongs, that's where the pin range naturally belongs. Netdevs
> >> >> should have a reference to EEC and pins, not present this
> >> >> information as if they own it. A first-class EEC would also allow
> >> >> to later figure out how to hook up PHC and EEC.
> >> >
> >> > We have the userspace tool, but can’t upstream it until we define
> >> > kernel interfaces. It's a catch-22 :(
> >>
> >> I'm sure you do, presumably you test this somehow. Still, as a
> >> potential consumer of that interface, I will absolutely poke at it to
> >> figure out how to use it, what it lets me to do, and what won't work.
> >
> > That's why now I want to enable very basic functionality that will not
> > go away anytime soon.
> 
> The issue is that the APIs won't go away any time soon either. That's
> why people object to your proposal so strongly. Because we won't be able
> to fix this later, and we _already_ see shortcomings now.
> 
> > Mapping between port and recovered clock (as in take my clock and
> > output on the first PHY's recovered clock output) and checking the
> > state of the clock.
> 
> Where is that mapping? I see a per-netdev call for a list of pins that
> carry RCLK, and the state as well. I don't see a way to distinguish
> which is which in any way.
> 
> >> BTW, what we've done in the past in a situation like this was, here's
> >> the current submission, here's a pointer to a GIT with more stuff we
> >> plan to send later on, here's a pointer to a GIT with the userspace
> >> stuff. I doubt anybody actually looks at that code, ain't nobody got
> >> time for that, but really there's no catch 22.
> >
> > Unfortunately, the userspace of it will be a part of linuxptp and we
> > can't upstream it partially before we get those basics defined here.
> 
> Just push it to github or whereever?
> 
> > More advanced functionality will be grown organically, as I also have
> > a limited view of SyncE and am not expert on switches.
> 
> We are growing it organically _right now_. I am strongly advocating an
> organic growth in the direction of a first-class DPLL object.

If it helps, I can separate out the PHY RCLK control patches and leave the EEC
state part under review.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-10 11:19               ` [Intel-wired-lan] " Machnikowski, Maciej
@ 2021-11-10 15:15                 ` Petr Machata
  -1 siblings, 0 replies; 50+ messages in thread
From: Petr Machata @ 2021-11-10 15:15 UTC (permalink / raw)
  To: Machnikowski, Maciej
  Cc: Petr Machata, netdev, intel-wired-lan, richardcochran, abyagowi,
	Nguyen, Anthony L, davem, kuba, linux-kselftest, idosch,
	mkubecek, saeed, michael.chan


>> >> >> First, what if more than one out_idx is set? What are drivers / HW
>> >> >> meant to do with this? What is the expected behavior?
>> >> >
>> >> > Expected behavior is deployment specific. You can use different phy
>> >> > recovered clock outputs to implement active/passive mode of clock
>> >> > failover.
>> >>
>> >> How? Which one is primary and which one is backup? I just have two
>> >> enabled pins...
>> >
>> > With this API you only have ports and pins and set up the redirection.
>> 
>> Wait, so how do I do failover? Which of the set pins is primary and
>> which is backup? Should the backup be sticky, i.e. do primary and backup
>> switch roles after primary goes into holdover? It looks like there are a
>> number of policy decisions that would be best served by a userspace
>> tool.
>
> The clock priority is configured in the SEC/EEC/DPLL. Recovered clock API
> only configures the redirections (aka. Which clocks will be available to the
> DPLL as references). In some DPLLs the fallback is automatic as long as
> secondary clock is available when the primary goes away. Userspace tool
> can preconfigure that before the failure occurs.

OK, I see. It looks like this priority list implies which pins need to
be enabled. That makes the netdev interface redundant.

>> > The EEC part is out of picture and will be part of DPLL subsystem.
>> 
>> So about that. I don't think it's contentious to claim that you need to
>> communicate EEC state somehow. This proposal does that through a netdev
>> object. After the DPLL subsystem comes along, that will necessarily
>> provide the same information, and the netdev interface will become
>> redundant, but we will need to keep it around.
>> 
>> That is a strong indication that a first-class DPLL object should be
>> part of the initial submission.
>
> That's why only a bare minimum is proposed in this patch - reading the state
> and which signal is used as a reference.

The proposal includes APIs that we know _right now_ will be historical
baggage by the time the DPLL object is added. That does not constitute
bare minimum.

>> >> >> Second, as a user-space client, how do I know that if ports 1 and
>> >> >> 2 both report pin range [A; B], that they both actually share the
>> >> >> same underlying EEC? Is there some sort of coordination among the
>> >> >> drivers, such that each pin in the system has a unique ID?
>> >> >
>> >> > For now we don't, as we don't have EEC subsystem. But that can be
>> >> > solved by a config file temporarily.
>> >>
>> >> I think it would be better to model this properly from day one.
>> >
>> > I want to propose the simplest API that will work for the simplest
>> > device, follow that with the userspace tool that will help everyone
>> > understand what we need in the DPLL subsystem, otherwise it'll be hard
>> > to explain the requirements. The only change will be the addition of
>> > the DPLL index.
>> 
>> That would be fine if there were a migration path to the more complete
>> API. But as DPLL object is introduced, even the APIs that are superseded
>> by the DPLL APIs will need to stay in as a baggage.
>
> The migration paths are:
> A) once the DPLL API is there, check in rtnl_eec_state_get whether a DPLL object
>      is linked to the given netdev - if it is, get the state from that DPLL object
> or
> B) return the DPLL index linked to the given netdev and fail the rtnl_eec_state_get,
>      so that the userspace tool has to switch to the new API

Well, we call B) an API breakage, and it won't fly. That API is there to
stay, and operate like it operates now.

That leaves us with A), where the API becomes a redundant wart that we
can never get rid of.

> Also, rtnl_eec_state_get won't become obsolete in all cases once we get the DPLL
> subsystem, as there are solutions where the SyncE DPLL is embedded in the PHY,
> in which case rtnl_eec_state_get will return all the needed information without
> the need to create a separate DPLL object.

So the NIC or PHY driver will register the object. Easy peasy.

Allowing the interface to go through a netdev sometimes, and through a
dedicated object other times, just makes everybody's life harder. It's
two cases that need to be handled in user documentation, in scripts, in
UAPI clients, when reviewing kernel code.

This is a "hysterical raisins" sort of baggage, except we see up front
that's where it goes.

> The DPLL object makes sense for advanced SyncE DPLLs that provide
> additional functionality, such as external reference/output pins.

That does not need to be the case.

>> >> >> Further, how do I actually know the mapping from ports to pins?
>> >> >> E.g. as a user, I might know my master is behind swp1. How do I
>> >> >> know what pins correspond to that port? As a user-space tool
>> >> >> author, how do I help users to do something like "eec set clock
>> >> >> eec0 track swp1"?
>> >> >
>> >> > That's why driver needs to be smart there and return indexes
>> >> > properly.
>> >>
>> >> What do you mean, properly? Up there you have RTM_GETRCLKRANGE
>> that
>> >> just gives me a min and a max. Is there a policy about how to
>> >> correlate numbers in that range to... ifindices, netdevice names,
>> >> devlink port numbers, I don't know, something?
>> >
>> > The driver needs to know the underlying HW and report those ranges
>> > correctly.
>> 
>> How do I know _as a user_ though? As a user I want to be able to say
>> something like "eec set dev swp1 track dev swp2". But the "eec" tool has
>> no way of knowing how to set that up.
>
> There's no such flexibility. It's more like timing pins in the PTP subsystem - we
> expose the API to control them, but it's up to the final user to decide how 
> to use them.

As a user, say I know the signal coming from swp1 is freqency-locked.
How can I instruct the switch ASIC to propagate that signal to the other
ports? Well, I go through swp2..swpN, and issue RTM_SETRCLKSTATE or
whatever, with flags indicating I set up tracking, and pin number...
what exactly? How do I know which pin carries clock recovered from swp1?

> If we index the PHY outputs the same way the DPLL subsystem will see them
> on its reference inputs, that should be sufficient to make sense of them.

What do you mean by indexing PHY outputs? Where are those indexed?

>> >> How do several drivers coordinate this numbering among themselves?
>> >> Is there a core kernel authority that manages pin number
>> >> de/allocations?
>> >
>> > I believe the goal is to create something similar to the ptp
>> > subsystem. The driver will need to configure the relationship
>> > during initialization and the OS will manage the indexes.
>> 
>> Can you point at the index management code, please?
>
> Look at the ptp_clock_register function in the kernel - it handles the
> registration of a PTP clock with the subsystem.

But I'm talking about the SyncE code.

>> >> >> Additionally, how would things like external GPSs or 1pps be
>> >> >> modeled? I guess the driver would know about such interface, and
>> >> >> would expose it as a "pin". When the GPS signal locks, the driver
>> >> >> starts reporting the pin in the RCLK set. Then it is possible to
>> >> >> set up tracking of that pin.
>> >> >
>> >> > That won't be enabled before we get the DPLL subsystem ready.
>> >>
>> >> It might prove challenging to retrofit an existing netdev-centric
>> >> interface into a more generic model. It would be better to model this
>> >> properly from day one, and OK, if we can carve out a subset of that
>> >> model to implement now, and leave the rest for later, fine. But the
>> >> current model does not strike me as having a natural migration path to
>> >> something more generic. E.g. reporting the EEC state through the
>> >> interfaces attached to that EEC... like, that will have to stay, even at
>> >> a time when it is superseded by a better interface.
>> >
>> > The recovered clock API will not change - only EEC_STATE is in
>> > question. We can either redirect the call to the DPLL subsystem, or
>> > just add the DPLL IDX Into that call and return it.
>> 
>> It would be better to have a first-class DPLL object, however vestigial,
>> in the initial submission.
>
> As stated above - DPLL subsystem won't render EEC state useless.

Of course not, the state is still important. But it will render the API
useless, and worse, an extra baggage everyone needs to know about and
support.

>> > More advanced functionality will be grown organically, as I also have
>> > a limited view of SyncE and am not expert on switches.
>> 
>> We are growing it organically _right now_. I am strongly advocating an
>> organic growth in the direction of a first-class DPLL object.
>
> If it helps, I can separate out the PHY RCLK control patches and leave the EEC
> state part under review.

Not sure what you mean by that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
@ 2021-11-10 15:15                 ` Petr Machata
  0 siblings, 0 replies; 50+ messages in thread
From: Petr Machata @ 2021-11-10 15:15 UTC (permalink / raw)
  To: intel-wired-lan


>> >> >> First, what if more than one out_idx is set? What are drivers / HW
>> >> >> meant to do with this? What is the expected behavior?
>> >> >
>> >> > Expected behavior is deployment specific. You can use different phy
>> >> > recovered clock outputs to implement active/passive mode of clock
>> >> > failover.
>> >>
>> >> How? Which one is primary and which one is backup? I just have two
>> >> enabled pins...
>> >
>> > With this API you only have ports and pins and set up the redirection.
>> 
>> Wait, so how do I do failover? Which of the set pins in primary and
>> which is backup? Should the backup be sticky, i.e. do primary and backup
>> switch roles after primary goes into holdover? It looks like there are a
>> number of policy decisions that would be best served by a userspace
>> tool.
>
> The clock priority is configured in the SEC/EEC/DPLL. Recovered clock API
> only configures the redirections (aka. Which clocks will be available to the
> DPLL as references). In some DPLLs the fallback is automatic as long as
> secondary clock is available when the primary goes away. Userspace tool
> can preconfigure that before the failure occurs.

OK, I see. It looks like this priority list implies which pins need to
be enabled. That makes the netdev interface redundant.

>> > The EEC part is out of picture and will be part of DPLL subsystem.
>> 
>> So about that. I don't think it's contentious to claim that you need to
>> communicate EEC state somehow. This proposal does that through a netdev
>> object. After the DPLL subsystem comes along, that will necessarily
>> provide the same information, and the netdev interface will become
>> redundant, but we will need to keep it around.
>> 
>> That is a strong indication that a first-class DPLL object should be
>> part of the initial submission.
>
> That's why only a bare minimum is proposed in this patch - reading the state
> and which signal is used as a reference.

The proposal includes APIs that we know _right now_ will be historical
baggage by the time the DPLL object is added. That does not constitute
bare minimum.

>> >> >> Second, as a user-space client, how do I know that if ports 1 and
>> >> >> 2 both report pin range [A; B], that they both actually share the
>> >> >> same underlying EEC? Is there some sort of coordination among the
>> >> >> drivers, such that each pin in the system has a unique ID?
>> >> >
>> >> > For now we don't, as we don't have EEC subsystem. But that can be
>> >> > solved by a config file temporarily.
>> >>
>> >> I think it would be better to model this properly from day one.
>> >
>> > I want to propose the simplest API that will work for the simplest
>> > device, follow that with the userspace tool that will help everyone
>> > understand what we need in the DPLL subsystem, otherwise it'll be hard
>> > to explain the requirements. The only change will be the addition of
>> > the DPLL index.
>> 
>> That would be fine if there were a migration path to the more complete
>> API. But as DPLL object is introduced, even the APIs that are superseded
>> by the DPLL APIs will need to stay in as a baggage.
>
> The migration paths are:
> A) when the DPLL API is there check if the DPLL object is linked to the given netdev
>      in the rtnl_eec_state_get - if it is - get the state from the DPLL object there
> or
> B) return the DPLL index linked to the given netdev and fail the rtnl_eec_state_get
>      so that the userspace tool will need to switch to the new API

Well, we call B) an API breakage, and it won't fly. That API is there to
stay, and operate like it operates now.

That leaves us with A), where the API becomes a redundant wart that we
can never get rid of.
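
Migration path A, as described above, amounts to a dispatch inside the legacy state handler. A minimal sketch, with hypothetical names throughout (none of these helpers exist in the proposed patches):

```python
# Sketch of migration path A (hypothetical names): once a DPLL object
# exists and is linked to a netdev, the legacy EEC state query is
# answered from the DPLL object; otherwise the netdev-local state is
# returned, so the old UAPI keeps working unchanged.

class DpllObject:
    def __init__(self, index, state):
        self.index = index
        self.state = state  # e.g. "EEC_LOCKED", "EEC_HOLDOVER"

class Netdev:
    def __init__(self, name, eec_state=None, dpll=None):
        self.name = name
        self.eec_state = eec_state  # legacy, netdev-local EEC state
        self.dpll = dpll            # set once a DPLL object is linked

def rtnl_eec_state_get(netdev):
    """Answer the legacy query, preferring a linked DPLL object."""
    if netdev.dpll is not None:
        return netdev.dpll.state
    return netdev.eec_state

legacy = Netdev("swp1", eec_state="EEC_LOCKED")
migrated = Netdev("swp2", dpll=DpllObject(0, "EEC_HOLDOVER"))
```

Path B, by contrast, would make the query fail for the migrated netdev, which is the API breakage objected to here.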

> Also the rtnl_eec_state_get won't get obsolete in all cases once we get the DPLL
> subsystem, as there are solutions where SyncE DPLL is embedded in the PHY
> in which case the rtnl_eec_state_get will return all needed information without
> the need to create a separate DPLL object.

So the NIC or PHY driver will register the object. Easy peasy.

Allowing the interface to go through a netdev sometimes, and through a
dedicated object other times, just makes everybody's life harder. It's
two cases that need to be handled in user documentation, in scripts, in
UAPI clients, when reviewing kernel code.

This is a "hysterical raisins" sort of baggage, except we see up front
that's where it goes.

> The DPLL object makes sense for advanced SyncE DPLLs that provide
> additional functionality, such as external reference/output pins.

That does not need to be the case.

>> >> >> Further, how do I actually know the mapping from ports to pins?
>> >> >> E.g. as a user, I might know my master is behind swp1. How do I
>> >> >> know what pins correspond to that port? As a user-space tool
>> >> >> author, how do I help users to do something like "eec set clock
>> >> >> eec0 track swp1"?
>> >> >
>> >> > That's why driver needs to be smart there and return indexes
>> >> > properly.
>> >>
>> >> What do you mean, properly? Up there you have RTM_GETRCLKRANGE
>> that
>> >> just gives me a min and a max. Is there a policy about how to
>> >> correlate numbers in that range to... ifindices, netdevice names,
>> >> devlink port numbers, I don't know, something?
>> >
>> > The driver needs to know the underlying HW and report those ranges
>> > correctly.
>> 
>> How do I know _as a user_ though? As a user I want to be able to say
>> something like "eec set dev swp1 track dev swp2". But the "eec" tool has
>> no way of knowing how to set that up.
>
> There's no such flexibility. It's more like timing pins in the PTP subsystem - we
> expose the API to control them, but it's up to the final user to decide how 
> to use them.

As a user, say I know the signal coming from swp1 is frequency-locked.
How can I instruct the switch ASIC to propagate that signal to the other
ports? Well, I go through swp2..swpN, and issue RTM_SETRCLKSTATE or
whatever, with flags indicating I set up tracking, and pin number...
what exactly? How do I know which pin carries clock recovered from swp1?

> If we index the PHY outputs in the same way as the DPLL subsystem will
> see them in the references part it should be sufficient to make sense
> out of them.

What do you mean by indexing PHY outputs? Where are those indexed?

>> >> How do several drivers coordinate this numbering among themselves?
>> >> Is there a core kernel authority that manages pin number
>> >> de/allocations?
>> >
>> > I believe the goal is to create something similar to the ptp
>> > subsystem. The driver will need to configure the relationship
>> > during initialization and the OS will manage the indexes.
>> 
>> Can you point at the index management code, please?
>
> Look for the ptp_clock_register function in the kernel - it owns the
> registration of the ptp clock to the subsystem.

But I'm talking about the SyncE code.

>> >> >> Additionally, how would things like external GPSs or 1pps be
>> >> >> modeled? I guess the driver would know about such interface, and
>> >> >> would expose it as a "pin". When the GPS signal locks, the driver
>> >> >> starts reporting the pin in the RCLK set. Then it is possible to
>> >> >> set up tracking of that pin.
>> >> >
>> >> > That won't be enabled before we get the DPLL subsystem ready.
>> >>
>> >> It might prove challenging to retrofit an existing netdev-centric
>> >> interface into a more generic model. It would be better to model this
>> >> properly from day one, and OK, if we can carve out a subset of that
>> >> model to implement now, and leave the rest for later, fine. But the
>> >> current model does not strike me as having a natural migration path to
>> >> something more generic. E.g. reporting the EEC state through the
>> >> interfaces attached to that EEC... like, that will have to stay, even at
>> >> a time when it is superseded by a better interface.
>> >
>> > The recovered clock API will not change - only EEC_STATE is in
>> > question. We can either redirect the call to the DPLL subsystem, or
>> > just add the DPLL IDX Into that call and return it.
>> 
>> It would be better to have a first-class DPLL object, however vestigial,
>> in the initial submission.
>
> As stated above - DPLL subsystem won't render EEC state useless.

Of course not, the state is still important. But it will render the API
useless, and worse, an extra baggage everyone needs to know about and
support.

>> > More advanced functionality will be grown organically, as I also have
>> > a limited view of SyncE and am not expert on switches.
>> 
>> We are growing it organically _right now_. I am strongly advocating an
>> organic growth in the direction of a first-class DPLL object.
>
> If it helps - I can separate the PHY RCLK control patches and leave EEC state
> under review

Not sure what you mean by that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-10 15:15                 ` [Intel-wired-lan] " Petr Machata
@ 2021-11-10 15:50                   ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-10 15:50 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, idosch, mkubecek, saeed,
	michael.chan

> -----Original Message-----
> From: Petr Machata <petrm@nvidia.com>
> Sent: Wednesday, November 10, 2021 4:15 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> 
> >> >> >> First, what if more than one out_idx is set? What are drivers / HW
> >> >> >> meant to do with this? What is the expected behavior?
> >> >> >
> >> >> > Expected behavior is deployment specific. You can use different phy
> >> >> > recovered clock outputs to implement active/passive mode of clock
> >> >> > failover.
> >> >>
> >> >> How? Which one is primary and which one is backup? I just have two
> >> >> enabled pins...
> >> >
> >> > With this API you only have ports and pins and set up the redirection.
> >>
> >> Wait, so how do I do failover? Which of the set pins in primary and
> >> which is backup? Should the backup be sticky, i.e. do primary and backup
> >> switch roles after primary goes into holdover? It looks like there are a
> >> number of policy decisions that would be best served by a userspace
> >> tool.
> >
> > The clock priority is configured in the SEC/EEC/DPLL. Recovered clock API
> > only configures the redirections (aka. Which clocks will be available to the
> > DPLL as references). In some DPLLs the fallback is automatic as long as
> > secondary clock is available when the primary goes away. Userspace tool
> > can preconfigure that before the failure occurs.
> 
> OK, I see. It looks like this priority list implies which pins need to
> be enabled. That makes the netdev interface redundant.

Netdev owns the PHY, so it needs to enable/disable the clock from a given
port/lane - other than that it's the EEC's task. Technically, those
subsystems are separate.

> >> > The EEC part is out of picture and will be part of DPLL subsystem.
> >>
> >> So about that. I don't think it's contentious to claim that you need to
> >> communicate EEC state somehow. This proposal does that through a
> netdev
> >> object. After the DPLL subsystem comes along, that will necessarily
> >> provide the same information, and the netdev interface will become
> >> redundant, but we will need to keep it around.
> >>
> >> That is a strong indication that a first-class DPLL object should be
> >> part of the initial submission.
> >
> > That's why only a bare minimum is proposed in this patch - reading the
> state
> > and which signal is used as a reference.
> 
> The proposal includes APIs that we know _right now_ will be historical
> baggage by the time the DPLL object is added. That does not constitute
> bare minimum.
> 
> >> >> >> Second, as a user-space client, how do I know that if ports 1 and
> >> >> >> 2 both report pin range [A; B], that they both actually share the
> >> >> >> same underlying EEC? Is there some sort of coordination among the
> >> >> >> drivers, such that each pin in the system has a unique ID?
> >> >> >
> >> >> > For now we don't, as we don't have EEC subsystem. But that can be
> >> >> > solved by a config file temporarily.
> >> >>
> >> >> I think it would be better to model this properly from day one.
> >> >
> >> > I want to propose the simplest API that will work for the simplest
> >> > device, follow that with the userspace tool that will help everyone
> >> > understand what we need in the DPLL subsystem, otherwise it'll be hard
> >> > to explain the requirements. The only change will be the addition of
> >> > the DPLL index.
> >>
> >> That would be fine if there were a migration path to the more complete
> >> API. But as DPLL object is introduced, even the APIs that are superseded
> >> by the DPLL APIs will need to stay in as a baggage.
> >
> > The migration paths are:
> > A) when the DPLL API is there check if the DPLL object is linked to the given
> netdev
> >      in the rtnl_eec_state_get - if it is - get the state from the DPLL object
> there
> > or
> > B) return the DPLL index linked to the given netdev and fail the
> rtnl_eec_state_get
> >      so that the userspace tool will need to switch to the new API
> 
> Well, we call B) an API breakage, and it won't fly. That API is there to
> stay, and operate like it operates now.
> 
> That leaves us with A), where the API becomes a redundant wart that we
> can never get rid of.
> 
> > Also the rtnl_eec_state_get won't get obsolete in all cases once we get the
> DPLL
> > subsystem, as there are solutions where SyncE DPLL is embedded in the
> PHY
> > in which case the rtnl_eec_state_get will return all needed information
> without
> > the need to create a separate DPLL object.
> 
> So the NIC or PHY driver will register the object. Easy peasy.
> 
> Allowing the interface to go through a netdev sometimes, and through a
> dedicated object other times, just makes everybody's life harder. It's
> two cases that need to be handled in user documentation, in scripts, in
> UAPI clients, when reviewing kernel code.
> 
> This is a "hysterical raisins" sort of baggage, except we see up front
> that's where it goes.
> 
> > The DPLL object makes sense for advanced SyncE DPLLs that provide
> > additional functionality, such as external reference/output pins.
> 
> That does not need to be the case.
> 
> >> >> >> Further, how do I actually know the mapping from ports to pins?
> >> >> >> E.g. as a user, I might know my master is behind swp1. How do I
> >> >> >> know what pins correspond to that port? As a user-space tool
> >> >> >> author, how do I help users to do something like "eec set clock
> >> >> >> eec0 track swp1"?
> >> >> >
> >> >> > That's why driver needs to be smart there and return indexes
> >> >> > properly.
> >> >>
> >> >> What do you mean, properly? Up there you have
> RTM_GETRCLKRANGE
> >> that
> >> >> just gives me a min and a max. Is there a policy about how to
> >> >> correlate numbers in that range to... ifindices, netdevice names,
> >> >> devlink port numbers, I don't know, something?
> >> >
> >> > The driver needs to know the underlying HW and report those ranges
> >> > correctly.
> >>
> >> How do I know _as a user_ though? As a user I want to be able to say
> >> something like "eec set dev swp1 track dev swp2". But the "eec" tool has
> >> no way of knowing how to set that up.
> >
> > There's no such flexibility. It's more like timing pins in the PTP subsystem -
> we
> > expose the API to control them, but it's up to the final user to decide how
> > to use them.
> 
> As a user, say I know the signal coming from swp1 is freqency-locked.
> How can I instruct the switch ASIC to propagate that signal to the other
> ports? Well, I go through swp2..swpN, and issue RTM_SETRCLKSTATE or
> whatever, with flags indicating I set up tracking, and pin number...
> what exactly? How do I know which pin carries clock recovered from swp1?

You send the RTM_SETRCLKSTATE to the port that has the best reference
clock available.
If you want to know which pin carries the clock, you simply send the
RTM_GETRCLKSTATE and it'll return the list of possible outputs with flags
saying which of them are enabled (see the newer revision).

> > If we index the PHY outputs in the same way as the DPLL subsystem will
> > see them in the references part it should be sufficient to make sense
> > out of them.
> 
> What do you mean by indexing PHY outputs? Where are those indexed?

That's what ndo_get_rclk_range does. It returns the allowed range of pins
for a given netdev.
 
> >> >> How do several drivers coordinate this numbering among themselves?
> >> >> Is there a core kernel authority that manages pin number
> >> >> de/allocations?
> >> >
> >> > I believe the goal is to create something similar to the ptp
> >> > subsystem. The driver will need to configure the relationship
> >> > during initialization and the OS will manage the indexes.
> >>
> >> Can you point at the index management code, please?
> >
> > Look for the ptp_clock_register function in the kernel - it owns the
> > registration of the ptp clock to the subsystem.
> 
> But I'm talking about the SyncE code.

PHY pins are indexed as the driver wishes, as they are board-specific.
You can index PHY pins 1,2,3 or 3,4,5 - whichever makes sense for
a given application, as they are local to a netdev.
I would suggest returning numbers that are tightly coupled to the EEC,
when it is known, to make the guessing game easier, but that's not mandatory.
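
The indexing suggestion above can be illustrated with an entirely hypothetical board layout: pin numbers are local to each netdev, but a driver that knows its EEC can hand out numbers matching the EEC reference inputs, so user space can correlate ports with references:

```python
# Hypothetical board: EEC reference input 0 takes the clock recovered
# from swp1, reference input 1 takes the clock recovered from swp2.
# The driver numbers each port's pins to match those reference inputs.
RCLK_RANGE = {
    "swp1": (0, 0),  # (min, max) pin indices this netdev may drive
    "swp2": (1, 1),
}

def ndo_get_rclk_range(dev):
    """What the proposed ndo_get_rclk_range would report per netdev."""
    return RCLK_RANGE[dev]
```

With purely driver-chosen numbering nothing forces the ranges of two ports to be disjoint or meaningful to user space; the coupling above is only a convention, which is the gap being discussed.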

> >> >> >> Additionally, how would things like external GPSs or 1pps be
> >> >> >> modeled? I guess the driver would know about such interface, and
> >> >> >> would expose it as a "pin". When the GPS signal locks, the driver
> >> >> >> starts reporting the pin in the RCLK set. Then it is possible to
> >> >> >> set up tracking of that pin.
> >> >> >
> >> >> > That won't be enabled before we get the DPLL subsystem ready.
> >> >>
> >> >> It might prove challenging to retrofit an existing netdev-centric
> >> >> interface into a more generic model. It would be better to model this
> >> >> properly from day one, and OK, if we can carve out a subset of that
> >> >> model to implement now, and leave the rest for later, fine. But the
> >> >> current model does not strike me as having a natural migration path to
> >> >> something more generic. E.g. reporting the EEC state through the
> >> >> interfaces attached to that EEC... like, that will have to stay, even at
> >> >> a time when it is superseded by a better interface.
> >> >
> >> > The recovered clock API will not change - only EEC_STATE is in
> >> > question. We can either redirect the call to the DPLL subsystem, or
> >> > just add the DPLL IDX Into that call and return it.
> >>
> >> It would be better to have a first-class DPLL object, however vestigial,
> >> in the initial submission.
> >
> > As stated above - DPLL subsystem won't render EEC state useless.
> 
> Of course not, the state is still important. But it will render the API
> useless, and worse, an extra baggage everyone needs to know about and
> support.
> 
> >> > More advanced functionality will be grown organically, as I also have
> >> > a limited view of SyncE and am not expert on switches.
> >>
> >> We are growing it organically _right now_. I am strongly advocating an
> >> organic growth in the direction of a first-class DPLL object.
> >
> > If it helps - I can separate the PHY RCLK control patches and leave EEC state
> > under review
> 
> Not sure what you mean by that.

Commit RTM_GETRCLKSTATE and RTM_SETRCLKSTATE now, and wait with
RTM_GETEECSTATE until we clarify the further direction of the DPLL subsystem.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-10 15:50                   ` [Intel-wired-lan] " Machnikowski, Maciej
@ 2021-11-10 21:05                     ` Petr Machata
  -1 siblings, 0 replies; 50+ messages in thread
From: Petr Machata @ 2021-11-10 21:05 UTC (permalink / raw)
  To: Machnikowski, Maciej
  Cc: Petr Machata, netdev, intel-wired-lan, richardcochran, abyagowi,
	Nguyen, Anthony L, davem, kuba, linux-kselftest, idosch,
	mkubecek, saeed, michael.chan


Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:

>> >> Wait, so how do I do failover? Which of the set pins in primary and
>> >> which is backup? Should the backup be sticky, i.e. do primary and backup
>> >> switch roles after primary goes into holdover? It looks like there are a
>> >> number of policy decisions that would be best served by a userspace
>> >> tool.
>> >
>> > The clock priority is configured in the SEC/EEC/DPLL. Recovered clock API
>> > only configures the redirections (aka. Which clocks will be available to the
>> > DPLL as references). In some DPLLs the fallback is automatic as long as
>> > secondary clock is available when the primary goes away. Userspace tool
>> > can preconfigure that before the failure occurs.
>> 
>> OK, I see. It looks like this priority list implies which pins need to
>> be enabled. That makes the netdev interface redundant.
>
> Netdev owns the PHY, so it needs to enable/disable clock from a given
> port/lane - other than that it's EECs task. Technically - those subsystems
> are separate.

So why is the UAPI conflating the two?

>> As a user, say I know the signal coming from swp1 is freqency-locked.
>> How can I instruct the switch ASIC to propagate that signal to the other
>> ports? Well, I go through swp2..swpN, and issue RTM_SETRCLKSTATE or
>> whatever, with flags indicating I set up tracking, and pin number...
>> what exactly? How do I know which pin carries clock recovered from swp1?
>
> You send the RTM_SETRCLKSTATE to the port that has the best reference
> clock available.
> If you want to know which pin carries the clock you simply send the
> RTM_GETRCLKSTATE and it'll return the list of possible outputs with the flags
> saying which of them are enabled (see the newer revision)

As a user I would really prefer to have a pin reference reported
somewhere at the netdev / PHY level, similarly to how a netdev can
reference a PHC. But whatever, I won't split hairs over this; this is
actually one aspect that is easy to add later.

>> >> > More advanced functionality will be grown organically, as I also have
>> >> > a limited view of SyncE and am not expert on switches.
>> >>
>> >> We are growing it organically _right now_. I am strongly advocating an
>> >> organic growth in the direction of a first-class DPLL object.
>> >
>> > If it helps - I can separate the PHY RCLK control patches and leave EEC state
>> > under review
>> 
>> Not sure what you mean by that.
>
> Commit RTM_GETRCLKSTATE and RTM_SETRCLKSTATE now, wait with 
> RTM_GETEECSTATE  till we clarify further direction of the DPLL subsystem

It's not just state though. There is another oddity that I am not sure
is intentional. The proposed UAPI allows me to set up fairly general
frequency bridging. In a device with a bunch of ports, it would allow me
to set up, say, swp1 to track RCLK from swp2, then swp3 from swp4, etc.
But what will be the EEC state in that case?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-10 21:05                     ` [Intel-wired-lan] " Petr Machata
@ 2021-11-15 10:12                       ` Machnikowski, Maciej
  -1 siblings, 0 replies; 50+ messages in thread
From: Machnikowski, Maciej @ 2021-11-15 10:12 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, intel-wired-lan, richardcochran, abyagowi, Nguyen,
	Anthony L, davem, kuba, linux-kselftest, idosch, mkubecek, saeed,
	michael.chan



> -----Original Message-----
> From: Petr Machata <petrm@nvidia.com>
> Sent: Wednesday, November 10, 2021 10:06 PM
> To: Machnikowski, Maciej <maciej.machnikowski@intel.com>
> Cc: Petr Machata <petrm@nvidia.com>; netdev@vger.kernel.org; intel-
> wired-lan@lists.osuosl.org; richardcochran@gmail.com; abyagowi@fb.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> kuba@kernel.org; linux-kselftest@vger.kernel.org; idosch@idosch.org;
> mkubecek@suse.cz; saeed@kernel.org; michael.chan@broadcom.com
> Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> interfaces
> 
> 
> Machnikowski, Maciej <maciej.machnikowski@intel.com> writes:
> 
> >> >> Wait, so how do I do failover? Which of the set pins in primary and
> >> >> which is backup? Should the backup be sticky, i.e. do primary and
> backup
> >> >> switch roles after primary goes into holdover? It looks like there are a
> >> >> number of policy decisions that would be best served by a userspace
> >> >> tool.
> >> >
> >> > The clock priority is configured in the SEC/EEC/DPLL. Recovered clock API
> >> > only configures the redirections (aka. Which clocks will be available to
> the
> >> > DPLL as references). In some DPLLs the fallback is automatic as long as
> >> > secondary clock is available when the primary goes away. Userspace
> tool
> >> > can preconfigure that before the failure occurs.
> >>
> >> OK, I see. It looks like this priority list implies which pins need to
> >> be enabled. That makes the netdev interface redundant.
> >
> > Netdev owns the PHY, so it needs to enable/disable clock from a given
> > port/lane - other than that it's EECs task. Technically - those subsystems
> > are separate.
> 
> So why is the UAPI conflating the two?

Because the EEC can be a separate external device, but it can also be
integrated inside the netdev. In the latter case it makes more sense to just
return the state from the netdev.
 
> >> As a user, say I know the signal coming from swp1 is freqency-locked.
> >> How can I instruct the switch ASIC to propagate that signal to the other
> >> ports? Well, I go through swp2..swpN, and issue RTM_SETRCLKSTATE or
> >> whatever, with flags indicating I set up tracking, and pin number...
> >> what exactly? How do I know which pin carries clock recovered from
> swp1?
> >
> > You send the RTM_SETRCLKSTATE to the port that has the best reference
> > clock available.
> > If you want to know which pin carries the clock you simply send the
> > RTM_GETRCLKSTATE and it'll return the list of possible outputs with the
> flags
> > saying which of them are enabled (see the newer revision)
> 
> As a user I would really prefer to have a pin reference reported
> somewhere at the netdev / phy / somewhere. Similarly to how a netdev can
> reference a PHC. But whatever, I won't split hairs over this, this is
> acutally one aspect that is easy to add later.

I believe the best way would be to use a sysfs entry for that (and to provide
basic control through it as well). But first we need the UAPI defined.
 
> >> >> > More advanced functionality will be grown organically, as I also have
> >> >> > a limited view of SyncE and am not expert on switches.
> >> >>
> >> >> We are growing it organically _right now_. I am strongly advocating an
> >> >> organic growth in the direction of a first-class DPLL object.
> >> >
> >> > If it helps - I can separate the PHY RCLK control patches and leave EEC
> state
> >> > under review
> >>
> >> Not sure what you mean by that.
> >
> > Commit RTM_GETRCLKSTATE and RTM_SETRCLKSTATE now, wait with
> > RTM_GETEECSTATE  till we clarify further direction of the DPLL subsystem
> 
> It's not just state though. There is another oddity that I am not sure
> is intentional. The proposed UAPI allows me to set up fairly general
> frequency bridging. In a device with a bunch of ports, it would allow me
> to set up, say, swp1 to track RCLK from swp2, then swp3 from swp4, etc.
> But what will be the EEC state in that case?

Yes. The GET/SET UAPI is exactly there to configure that bridging. All it does
is set up the recovered frequency on the physical frequency output pins
of the PHY/integrated device. In case the DPLL is embedded, the pins may be
internal to the device and not exposed externally. It doesn't allow creation
of tracking maps, as that's usually not the case in SyncE appliances:
in a typical one you recover the clock from a single port and then use that
clock on all other ports.
The EEC state will depend on the signal quality and the configuration.
When the clock is enabled and valid, the EEC will tune its internal frequency
and report the Locked/Locked with Holdover Acquired state.

I can remove the word STATE from the name and change it to RTM_{GET,SET}RCLK
if "state" is confusing there.
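As a rough illustration of what a userspace tool would do with that UAPI, the
sketch below builds such a request by hand. The message type number, flag
value, and payload layout here are placeholders, not the real constants from
the patch series; only the netlink header format is the standard one.

```python
import struct

# Netlink header constants (from <linux/netlink.h>).
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04
NLMSG_HDRLEN = 16

# Hypothetical values: the real message type and payload layout are
# defined by the patch series and will differ from these placeholders.
RTM_SETRCLKSTATE = 0x90  # placeholder message type, not the real constant
IFINDEX_SWP1 = 7          # ifindex of the port with the best reference clock
OUT_PIN = 0               # recovered-clock output pin to enable
FLAG_ENA = 0x1            # "enable redirection" flag (assumed)

def build_setrclkstate(seq):
    """Pack a sketch of an RTM_SETRCLKSTATE request: header + 3 x u32."""
    payload = struct.pack("=III", IFINDEX_SWP1, OUT_PIN, FLAG_ENA)
    nlmsg_len = NLMSG_HDRLEN + len(payload)
    hdr = struct.pack("=IHHII", nlmsg_len, RTM_SETRCLKSTATE,
                      NLM_F_REQUEST | NLM_F_ACK, seq, 0)
    return hdr + payload
```

A real tool would send this over an AF_NETLINK/NETLINK_ROUTE socket and parse
the ACK; the point here is only the shape of the request.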


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces
  2021-11-15 10:12                       ` [Intel-wired-lan] " Machnikowski, Maciej
@ 2021-11-15 21:42                         ` Jakub Kicinski
  -1 siblings, 0 replies; 50+ messages in thread
From: Jakub Kicinski @ 2021-11-15 21:42 UTC (permalink / raw)
  To: Machnikowski, Maciej, Vadim Fedorenko
  Cc: Petr Machata, netdev, intel-wired-lan, richardcochran, abyagowi,
	Nguyen, Anthony L, davem, linux-kselftest, idosch, mkubecek,
	saeed, michael.chan

On Mon, 15 Nov 2021 10:12:25 +0000 Machnikowski, Maciej wrote:
> > > Netdev owns the PHY, so it needs to enable/disable clock from a given
> > > port/lane - other than that it's EECs task. Technically - those subsystems
> > > are separate.  
> > 
> > So why is the UAPI conflating the two?  
> 
> Because EEC can be a separate external device, but also can be integrated
> inside the netdev. In the second case it makes more sense to just return
> the state from a netdev 

I mentioned that we are in need of such an API to Vadim who, among other
things, works on the OCP Timecard. He indicated interest in developing
a separate netlink interface for "DPLLs" (the timecard is just an
atomic clock + GPS, with no netdev to hang it from). Let's wait for Vadim's
work to materialize and build on top of that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2021-11-15 21:45 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-05 20:53 [PATCH v2 net-next 0/6] Add RTNL interface for SyncE Maciej Machnikowski
2021-11-05 20:53 ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-05 20:53 ` [PATCH v2 net-next 1/6] ice: add support detecting features based on netlist Maciej Machnikowski
2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-05 20:53 ` [PATCH v2 net-next 2/6] rtnetlink: Add new RTM_GETEECSTATE message to get SyncE status Maciej Machnikowski
2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-07 13:44   ` Ido Schimmel
2021-11-07 13:44     ` [Intel-wired-lan] " Ido Schimmel
2021-11-05 20:53 ` [PATCH v2 net-next 3/6] ice: add support for reading SyncE DPLL state Maciej Machnikowski
2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-05 20:53 ` [PATCH v2 net-next 4/6] rtnetlink: Add support for SyncE recovered clock configuration Maciej Machnikowski
2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-05 20:53 ` [PATCH v2 net-next 5/6] ice: add support for SyncE recovered clocks Maciej Machnikowski
2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-05 20:53 ` [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces Maciej Machnikowski
2021-11-05 20:53   ` [Intel-wired-lan] " Maciej Machnikowski
2021-11-07 14:08   ` Ido Schimmel
2021-11-07 14:08     ` [Intel-wired-lan] " Ido Schimmel
2021-11-08  8:35     ` Machnikowski, Maciej
2021-11-08  8:35       ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-08 16:29       ` Ido Schimmel
2021-11-08 16:29         ` [Intel-wired-lan] " Ido Schimmel
2021-11-08 17:03         ` Jakub Kicinski
2021-11-08 17:03           ` [Intel-wired-lan] " Jakub Kicinski
2021-11-09 10:50           ` Machnikowski, Maciej
2021-11-09 10:50             ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-09 10:32         ` Machnikowski, Maciej
2021-11-09 10:32           ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-08 18:00   ` Petr Machata
2021-11-08 18:00     ` [Intel-wired-lan] " Petr Machata
2021-11-09 10:43     ` Machnikowski, Maciej
2021-11-09 10:43       ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-09 14:52       ` Petr Machata
2021-11-09 14:52         ` [Intel-wired-lan] " Petr Machata
2021-11-09 18:19         ` Machnikowski, Maciej
2021-11-09 18:19           ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-10 10:27           ` Petr Machata
2021-11-10 10:27             ` [Intel-wired-lan] " Petr Machata
2021-11-10 11:19             ` Machnikowski, Maciej
2021-11-10 11:19               ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-10 15:15               ` Petr Machata
2021-11-10 15:15                 ` [Intel-wired-lan] " Petr Machata
2021-11-10 15:50                 ` Machnikowski, Maciej
2021-11-10 15:50                   ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-10 21:05                   ` Petr Machata
2021-11-10 21:05                     ` [Intel-wired-lan] " Petr Machata
2021-11-15 10:12                     ` Machnikowski, Maciej
2021-11-15 10:12                       ` [Intel-wired-lan] " Machnikowski, Maciej
2021-11-15 21:42                       ` Jakub Kicinski
2021-11-15 21:42                         ` [Intel-wired-lan] " Jakub Kicinski
