All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
@ 2022-08-16 22:29 Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool() Vladimir Oltean
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

Vinicius' progress on upstreaming frame preemption support for Intel I226
seemed to stall, so I decided to give it a go using my own view as well.
https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/

Please don't take this patch set too seriously; I spent only a few
days working on this, and I'm only posting it as RFC to inform others
I've started doing this, before I spend too much time to risk colliding
with someone else's active work.

Compared to Vinicius' previous patches, this is basically a new
implementation, with the following differences:

- The MAC Merge (mm) and Frame Preemption (fp) settings are split like
  they were in Vinicius' proposal to have fp as part of tc-taprio. But
  in my proposal, the fp portion is still part of ethtool, like mm.

- We have statistics, actually 2 kinds. First we have MAC merge layer
  stats, which are exposed as protocol-specific stats:

  ethtool --json --include-statistics --show-mm eno2
  [ {
          "ifname": "eno2",
          "verify-status": "SUCCEEDED",
          "verify-time": 10,
          "supported": true,
          "enabled": true,
          "active": true,
          "add-frag-size": 0,
          "statistics": {
              "MACMergeFrameAssErrorCount": 0,
              "MACMergeFrameSmdErrorCount": 0,
              "MACMergeFrameAssOkCount": 0,
              "MACMergeFragCountRx": 0,
              "MACMergeFragCountTx": 0,
              "MACMergeHoldCount": 0
          }
      } ]

  and then we also have the usual standardized statistics counters, but
  replicated for the pMAC:

  ethtool -S eno0 --groups pmac-rmon
  Standard stats for eno0:
  pmac-rmon-etherStatsUndersizePkts: 0
  pmac-rmon-etherStatsOversizePkts: 0
  pmac-rmon-etherStatsFragments: 0
  pmac-rmon-etherStatsJabbers: 0
  rx-pmac-rmon-etherStatsPkts64to64Octets: 0
  rx-pmac-rmon-etherStatsPkts65to127Octets: 0
  rx-pmac-rmon-etherStatsPkts128to255Octets: 0
  rx-pmac-rmon-etherStatsPkts256to511Octets: 0
  rx-pmac-rmon-etherStatsPkts512to1023Octets: 0
  rx-pmac-rmon-etherStatsPkts1024to1522Octets: 0
  rx-pmac-rmon-etherStatsPkts1523to9000Octets: 0
  tx-pmac-rmon-etherStatsPkts64to64Octets: 0
  tx-pmac-rmon-etherStatsPkts65to127Octets: 0
  tx-pmac-rmon-etherStatsPkts128to255Octets: 0
  tx-pmac-rmon-etherStatsPkts256to511Octets: 0
  tx-pmac-rmon-etherStatsPkts512to1023Octets: 0
  tx-pmac-rmon-etherStatsPkts1024to1522Octets: 0
  tx-pmac-rmon-etherStatsPkts1523to9000Octets: 0

  ethtool -S eno0 --groups eth-pmac-mac
  Standard stats for eno0:
  eth-pmac-mac-FramesTransmittedOK: 0
  eth-pmac-mac-SingleCollisionFrames: 0
  eth-pmac-mac-MultipleCollisionFrames: 0
  eth-pmac-mac-FramesReceivedOK: 0
  eth-pmac-mac-FrameCheckSequenceErrors: 0
  eth-pmac-mac-AlignmentErrors: 0
  eth-pmac-mac-OctetsTransmittedOK: 0
  eth-pmac-mac-FramesWithDeferredXmissions: 0
  eth-pmac-mac-LateCollisions: 0
  eth-pmac-mac-FramesAbortedDueToXSColls: 0
  eth-pmac-mac-FramesLostDueToIntMACXmitError: 0
  eth-pmac-mac-CarrierSenseErrors: 0
  eth-pmac-mac-OctetsReceivedOK: 0
  eth-pmac-mac-FramesLostDueToIntMACRcvError: 0
  eth-pmac-mac-MulticastFramesXmittedOK: 0
  eth-pmac-mac-BroadcastFramesXmittedOK: 0
  eth-pmac-mac-MulticastFramesReceivedOK: 0
  eth-pmac-mac-BroadcastFramesReceivedOK: 0

  ethtool -S eno0 --groups eth-pmac-ctrl
  Standard stats for eno0:
  eth-pmac-ctrl-MACControlFramesTransmitted: 0
  eth-pmac-ctrl-MACControlFramesReceived: 0

  What also exists but is not exported here are PAUSE stats for the
  pMAC. Since those are also protocol-specific stats, I'm not sure how
  to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend
  ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with
  the pMAC variants?

- Finally, the hardware I'm working with (here, the test vehicle is the
  NXP ENETC from LS1028A, although I have patches for the Felix switch
  as well, but those need a bit of a revolution in the driver to go in
  first). This hardware is not without its flaws, but at least allows me
  to concentrate on the UAPI portions for this series.

I also have a kselftest written, but it's for the Felix switch (covers
forwarding latency) and so it's not included here.

Are there objections in exposing the UAPI for this new feature in this way?

Also, there is no documentation associated with this patch set, other
than the code. Life is too short to write documentation for an RFC, sorry.
I may get kdoc related kernel bot warnings because I copy-pasted ethtool
structure definitions from here and there, but I didn't fill in the
descriptions of all their fields. All those fields are as truthful to
the standards as possible rather than my own variables or names, so
please refer to those specs for now.

Vladimir Oltean (7):
  net: ethtool: netlink: introduce ethnl_update_bool()
  net: ethtool: add support for Frame Preemption and MAC Merge layer
  net: ethtool: stats: make stats_put_stats() take input from multiple
    sources
  net: ethtool: stats: replicate standardized counters for the pMAC
  net: enetc: parameterize port MAC stats to also cover the pMAC
  net: enetc: expose some standardized ethtool counters
  net: enetc: add support for Frame Preemption and MAC Merge layer

 .../ethernet/freescale/enetc/enetc_ethtool.c  | 399 +++++++++++++++---
 .../net/ethernet/freescale/enetc/enetc_hw.h   | 132 +++---
 include/linux/ethtool.h                       |  68 +++
 include/uapi/linux/ethtool.h                  |  15 +
 include/uapi/linux/ethtool_netlink.h          |  86 ++++
 net/ethtool/Makefile                          |   3 +-
 net/ethtool/fp.c                              | 295 +++++++++++++
 net/ethtool/mm.c                              | 228 ++++++++++
 net/ethtool/netlink.c                         |  38 ++
 net/ethtool/netlink.h                         |  34 ++
 net/ethtool/stats.c                           | 218 +++++++---
 11 files changed, 1338 insertions(+), 178 deletions(-)
 create mode 100644 net/ethtool/fp.c
 create mode 100644 net/ethtool/mm.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool()
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-17 11:27   ` Michal Kubecek
  2022-08-16 22:29 ` [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

For a reason I can't really understand, ethnl_update_bool32() exists,
but the plain function that operates on a boolean value kept in an
actual u8 netlink attribute doesn't. Introduce it; it's needed for
things like verify-disabled for the MAC merge configuration.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/ethtool/netlink.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index c0d587611854..1653fd2cf0cf 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -111,6 +111,32 @@ static inline void ethnl_update_u8(u8 *dst, const struct nlattr *attr,
 	*mod = true;
 }
 
+/**
+ * ethnl_update_bool() - update bool from NLA_U8 attribute
+ * @dst:  value to update
+ * @attr: netlink attribute with new value or null
+ * @mod:  pointer to bool for modification tracking
+ *
+ * Use the u8 value from NLA_U8 netlink attribute @attr to set bool variable
+ * pointed to by @dst to false (if zero) or 1 (if not); do nothing if @attr is
+ * null. Bool pointed to by @mod is set to true if this function changed the
+ * logical value of *dst, otherwise it is left as is.
+ */
+static inline void ethnl_update_bool(bool *dst, const struct nlattr *attr,
+				     bool *mod)
+{
+	u8 val;
+
+	if (!attr)
+		return;
+	val = !!nla_get_u8(attr);
+	if (*dst == val)
+		return;
+
+	*dst = val;
+	*mod = true;
+}
+
 /**
  * ethnl_update_bool32() - update u32 used as bool from NLA_U8 attribute
  * @dst:  value to update
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool() Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-17  3:22   ` Jakub Kicinski
  2022-08-17 23:15   ` Vinicius Costa Gomes
  2022-08-16 22:29 ` [RFC PATCH net-next 3/7] net: ethtool: stats: make stats_put_stats() take input from multiple sources Vladimir Oltean
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

Frame preemption (IEEE 802.1Q-2018 clause 6.7.2) and the MAC merge
sublayer (IEEE 802.3-2018 clause 99) are 2 specifications, part of the
Time Sensitive Networking enhancements, which work together to minimize
latency caused by frame interference. The overall goal of TSN is for
normal traffic and traffic belonging to real-time control processes to
be able to cohabitate on the same L2 network and not bother each other
too much.

They achieve this (partly) by introducing the concept of preemptable
traffic, i.e. Ethernet frames that have an altered Start-Of-Frame
delimiter, which can be fragmented and reassembled at L2 on a link-local
basis. The non-preemptable frames are called express traffic, and they
can preempt preemptable frames, therefore having lower latency, which
can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
frames around 9K). Preemption is not recursive, i.e. a PT frame cannot
preempt another PT frame. Preemption also does not depend upon priority,
or otherwise said, an ET frame with prio 0 will still preempt a PT frame
with prio 7.

In terms of implementation, the specs talk about the fact that the
express traffic is handled by an express MAC (eMAC) and the preemptable
traffic by a preemptable MAC (pMAC), and these MACs are multiplexed on
the same MII by a MAC merge layer.

On RX, packets go to the eMAC or to the pMAC based on their SFD (the
definition of which was generalized to SMD, Start-of-mPacket-Delimiter,
where mPacket is essentially an Ethernet frame fragment, or a complete
frame). On TX, the eMAC/pMAC classification decision is taken by the
802.1Q spec, based on packet priority (each of the 8 priority values may
have an admin-status of preemptable or express).

I have modeled both the Ethernet part of the spec and the queuing part
as separate netlink messages, with separate ethtool command sets
intended for them. I am slightly flexible as to where to place the FP
settings; there were previous discussions about placing them in tc.
At the moment I haven't received a good enough argument to move them out
of ethtool, so here they are.
https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/ethtool.h              |  59 ++++++
 include/uapi/linux/ethtool.h         |  15 ++
 include/uapi/linux/ethtool_netlink.h |  82 ++++++++
 net/ethtool/Makefile                 |   3 +-
 net/ethtool/fp.c                     | 295 +++++++++++++++++++++++++++
 net/ethtool/mm.c                     | 228 +++++++++++++++++++++
 net/ethtool/netlink.c                |  38 ++++
 net/ethtool/netlink.h                |   8 +
 8 files changed, 727 insertions(+), 1 deletion(-)
 create mode 100644 net/ethtool/fp.c
 create mode 100644 net/ethtool/mm.c

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 99dc7bfbcd3c..fa504dd22bf6 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -453,6 +453,49 @@ struct ethtool_module_power_mode_params {
 	enum ethtool_module_power_mode mode;
 };
 
+/**
+ * struct ethtool_fp_param - 802.1Q Frame Preemption parameters
+ */
+struct ethtool_fp_param {
+	u8 preemptable_prios;
+	u32 hold_advance;
+	u32 release_advance;
+};
+
+/**
+ * struct ethtool_mm_state - 802.3 MAC merge layer state
+ */
+struct ethtool_mm_state {
+	u32 verify_time;
+	enum ethtool_mm_verify_status verify_status;
+	bool supported;
+	bool enabled;
+	bool active;
+	u8 add_frag_size;
+};
+
+/**
+ * struct ethtool_mm_cfg - 802.3 MAC merge layer configuration
+ */
+struct ethtool_mm_cfg {
+	u32 verify_time;
+	bool verify_disable;
+	bool enabled;
+	u8 add_frag_size;
+};
+
+/* struct ethtool_mm_stats - 802.3 MAC merge layer statistics, as defined in
+ * IEEE 802.3-2018 subclause 30.14.1 oMACMergeEntity managed object class
+ */
+struct ethtool_mm_stats {
+	u64 MACMergeFrameAssErrorCount;
+	u64 MACMergeFrameSmdErrorCount;
+	u64 MACMergeFrameAssOkCount;
+	u64 MACMergeFragCountRx;
+	u64 MACMergeFragCountTx;
+	u64 MACMergeHoldCount;
+};
+
 /**
  * struct ethtool_ops - optional netdev operations
  * @cap_link_lanes_supported: indicates if the driver supports lanes
@@ -624,6 +667,11 @@ struct ethtool_module_power_mode_params {
  *	plugged-in.
  * @set_module_power_mode: Set the power mode policy for the plug-in module
  *	used by the network device.
+ * @get_fp_param: Query the 802.1Q Frame Preemption parameters.
+ * @set_fp_param: Set the 802.1Q Frame Preemption parameters.
+ * @get_mm_state: Query the 802.3 MAC Merge layer state.
+ * @set_mm_cfg: Set the 802.3 MAC Merge layer parameters.
+ * @get_mm_stats: Query the 802.3 MAC Merge layer statistics.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -760,6 +808,17 @@ struct ethtool_ops {
 	int	(*set_module_power_mode)(struct net_device *dev,
 					 const struct ethtool_module_power_mode_params *params,
 					 struct netlink_ext_ack *extack);
+	void	(*get_fp_param)(struct net_device *dev,
+				struct ethtool_fp_param *params);
+	int	(*set_fp_param)(struct net_device *dev,
+				const struct ethtool_fp_param *params,
+				struct netlink_ext_ack *extack);
+	void	(*get_mm_state)(struct net_device *dev,
+				struct ethtool_mm_state *state);
+	int	(*set_mm_cfg)(struct net_device *dev, struct ethtool_mm_cfg *cfg,
+			      struct netlink_ext_ack *extack);
+	void	(*get_mm_stats)(struct net_device *dev,
+				struct ethtool_mm_stats *stats);
 };
 
 int ethtool_check_ops(const struct ethtool_ops *ops);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 2d5741fd44bb..7a21fcb26a43 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -736,6 +736,21 @@ enum ethtool_module_power_mode {
 	ETHTOOL_MODULE_POWER_MODE_HIGH,
 };
 
+/* Values from ieee8021FramePreemptionAdminStatus */
+enum ethtool_fp_admin_status {
+	ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_EXPRESS = 1,
+	ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_PREEMPTABLE = 2,
+};
+
+enum ethtool_mm_verify_status {
+	ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
+	ETHTOOL_MM_VERIFY_STATUS_INITIAL,
+	ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
+	ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
+	ETHTOOL_MM_VERIFY_STATUS_FAILED,
+	ETHTOOL_MM_VERIFY_STATUS_DISABLED,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d2fb4f7be61b..658810274c49 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -49,6 +49,10 @@ enum {
 	ETHTOOL_MSG_PHC_VCLOCKS_GET,
 	ETHTOOL_MSG_MODULE_GET,
 	ETHTOOL_MSG_MODULE_SET,
+	ETHTOOL_MSG_FP_GET,
+	ETHTOOL_MSG_FP_SET,
+	ETHTOOL_MSG_MM_GET,
+	ETHTOOL_MSG_MM_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -94,6 +98,10 @@ enum {
 	ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
 	ETHTOOL_MSG_MODULE_GET_REPLY,
 	ETHTOOL_MSG_MODULE_NTF,
+	ETHTOOL_MSG_FP_GET_REPLY,
+	ETHTOOL_MSG_FP_NTF,
+	ETHTOOL_MSG_MM_GET_REPLY,
+	ETHTOOL_MSG_MM_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -862,6 +870,80 @@ enum {
 	ETHTOOL_A_MODULE_MAX = (__ETHTOOL_A_MODULE_CNT - 1)
 };
 
+/* FRAME PREEMPTION (802.1Q) */
+
+enum {
+	ETHTOOL_A_FP_PARAM_ENTRY_UNSPEC,
+	ETHTOOL_A_FP_PARAM_ENTRY_PRIO,		/* u8 */
+	ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS,	/* u8 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_FP_PARAM_ENTRY_CNT,
+	ETHTOOL_A_FP_PARAM_ENTRY_MAX = (__ETHTOOL_A_FP_PARAM_ENTRY_CNT - 1)
+};
+
+enum {
+	ETHTOOL_A_FP_PARAM_UNSPEC,
+	ETHTOOL_A_FP_PARAM_ENTRY,		/* nest */
+
+	/* add new constants above here */
+	__ETHTOOL_A_FP_PARAM_CNT,
+	ETHTOOL_A_FP_PARAM_MAX = (__ETHTOOL_A_FP_PARAM_CNT - 1)
+};
+
+enum {
+	ETHTOOL_A_FP_UNSPEC,
+	ETHTOOL_A_FP_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_FP_PARAM_TABLE,		/* nest */
+	ETHTOOL_A_FP_HOLD_ADVANCE,		/* u32 */
+	ETHTOOL_A_FP_RELEASE_ADVANCE,		/* u32 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_FP_CNT,
+	ETHTOOL_A_FP_MAX = (__ETHTOOL_A_FP_CNT - 1)
+};
+
+/* MAC MERGE (802.3) */
+
+enum {
+	ETHTOOL_A_MM_STAT_UNSPEC,
+	ETHTOOL_A_MM_STAT_PAD,
+
+	/* aMACMergeFrameAssErrorCount */
+	ETHTOOL_A_MM_STAT_REASSEMBLY_ERRORS,	/* u64 */
+	/* aMACMergeFrameSmdErrorCount */
+	ETHTOOL_A_MM_STAT_SMD_ERRORS,		/* u64 */
+	/* aMACMergeFrameAssOkCount */
+	ETHTOOL_A_MM_STAT_REASSEMBLY_OK,	/* u64 */
+	/* aMACMergeFragCountRx */
+	ETHTOOL_A_MM_STAT_RX_FRAG_COUNT,	/* u64 */
+	/* aMACMergeFragCountTx */
+	ETHTOOL_A_MM_STAT_TX_FRAG_COUNT,	/* u64 */
+	/* aMACMergeHoldCount */
+	ETHTOOL_A_MM_STAT_HOLD_COUNT,		/* u64 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_MM_STAT_CNT,
+	ETHTOOL_A_MM_STAT_MAX = (__ETHTOOL_A_MM_STAT_CNT - 1)
+};
+
+enum {
+	ETHTOOL_A_MM_UNSPEC,
+	ETHTOOL_A_MM_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_MM_VERIFY_STATUS,		/* u8 */
+	ETHTOOL_A_MM_VERIFY_DISABLE,		/* u8 */
+	ETHTOOL_A_MM_VERIFY_TIME,		/* u32 */
+	ETHTOOL_A_MM_SUPPORTED,			/* u8 */
+	ETHTOOL_A_MM_ENABLED,			/* u8 */
+	ETHTOOL_A_MM_ACTIVE,			/* u8 */
+	ETHTOOL_A_MM_ADD_FRAG_SIZE,		/* u8 */
+	ETHTOOL_A_MM_STATS,			/* nest - _A_MM_STAT_* */
+
+	/* add new constants above here */
+	__ETHTOOL_A_MM_CNT,
+	ETHTOOL_A_MM_MAX = (__ETHTOOL_A_MM_CNT - 1)
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index b76432e70e6b..31e056f17856 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,5 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
+		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o \
+		   fp.o mm.o
diff --git a/net/ethtool/fp.c b/net/ethtool/fp.c
new file mode 100644
index 000000000000..20f19d8c1461
--- /dev/null
+++ b/net/ethtool/fp.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022 NXP
+ */
+#include "common.h"
+#include "netlink.h"
+
+struct fp_req_info {
+	struct ethnl_req_info		base;
+};
+
+struct fp_reply_data {
+	struct ethnl_reply_data		base;
+	struct ethtool_fp_param		params;
+};
+
+#define FP_REPDATA(__reply_base) \
+	container_of(__reply_base, struct fp_reply_data, base)
+
+const struct nla_policy ethnl_fp_get_policy[ETHTOOL_A_FP_HEADER + 1] = {
+	[ETHTOOL_A_FP_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+static int fp_prepare_data(const struct ethnl_req_info *req_base,
+			   struct ethnl_reply_data *reply_base,
+			   struct genl_info *info)
+{
+	struct fp_reply_data *data = FP_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+	int ret;
+
+	if (!dev->ethtool_ops->get_fp_param)
+		return -EOPNOTSUPP;
+
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		return ret;
+
+	dev->ethtool_ops->get_fp_param(dev, &data->params);
+	ethnl_ops_complete(dev);
+
+	return 0;
+}
+
+static int fp_reply_size(const struct ethnl_req_info *req_base,
+			 const struct ethnl_reply_data *reply_base)
+{
+	int len = 0;
+
+	len += nla_total_size(0); /* _FP_PARAM_ENTRY */
+	len += nla_total_size(sizeof(u8)); /* _FP_PARAM_ENTRY_PRIO */
+	len += nla_total_size(sizeof(u8)); /* _FP_PARAM_ENTRY_ADMIN_STATUS */
+	len *= 8; /* 8 prios */
+	len += nla_total_size(0); /* _FP_PARAM_TABLE */
+	len += nla_total_size(sizeof(u32)); /* _FP_HOLD_ADVANCE */
+	len += nla_total_size(sizeof(u32)); /* _FP_RELEASE_ADVANCE */
+
+	return len;
+}
+
+static int fp_fill_reply(struct sk_buff *skb,
+			 const struct ethnl_req_info *req_base,
+			 const struct ethnl_reply_data *reply_base)
+{
+	const struct fp_reply_data *data = FP_REPDATA(reply_base);
+	const struct ethtool_fp_param *params = &data->params;
+	enum ethtool_fp_admin_status admin_status;
+	struct nlattr *nest_table, *nest_entry;
+	int prio;
+	int ret;
+
+	if (nla_put_u32(skb, ETHTOOL_A_FP_HOLD_ADVANCE, params->hold_advance) ||
+	    nla_put_u32(skb, ETHTOOL_A_FP_RELEASE_ADVANCE, params->release_advance))
+		return -EMSGSIZE;
+
+	nest_table = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_TABLE);
+	if (!nest_table)
+		return -EMSGSIZE;
+
+	for (prio = 0; prio < 8; prio++) {
+		admin_status = (params->preemptable_prios & BIT(prio)) ?
+			ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_PREEMPTABLE :
+			ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_EXPRESS;
+
+		nest_entry = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_ENTRY);
+		if (!nest_entry)
+			goto nla_nest_cancel_table;
+
+		if (nla_put_u8(skb, ETHTOOL_A_FP_PARAM_ENTRY_PRIO, prio))
+			goto nla_nest_cancel_entry;
+
+		if (nla_put_u8(skb, ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS,
+			       admin_status))
+			goto nla_nest_cancel_entry;
+
+		nla_nest_end(skb, nest_entry);
+	}
+
+	nla_nest_end(skb, nest_table);
+
+	return 0;
+
+nla_nest_cancel_entry:
+	nla_nest_cancel(skb, nest_entry);
+nla_nest_cancel_table:
+	nla_nest_cancel(skb, nest_table);
+	return ret;
+}
+
+const struct ethnl_request_ops ethnl_fp_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_FP_GET,
+	.reply_cmd		= ETHTOOL_MSG_FP_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_FP_HEADER,
+	.req_info_size		= sizeof(struct fp_req_info),
+	.reply_data_size	= sizeof(struct fp_reply_data),
+
+	.prepare_data		= fp_prepare_data,
+	.reply_size		= fp_reply_size,
+	.fill_reply		= fp_fill_reply,
+};
+
+static const struct nla_policy
+ethnl_fp_set_param_entry_policy[ETHTOOL_A_FP_PARAM_ENTRY_MAX  + 1] = {
+	[ETHTOOL_A_FP_PARAM_ENTRY_PRIO] = { .type = NLA_U8 },
+	[ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS] = { .type = NLA_U8 },
+};
+
+static const struct nla_policy
+ethnl_fp_set_param_table_policy[ETHTOOL_A_FP_PARAM_MAX + 1] = {
+	[ETHTOOL_A_FP_PARAM_ENTRY] = NLA_POLICY_NESTED(ethnl_fp_set_param_entry_policy),
+};
+
+const struct nla_policy ethnl_fp_set_policy[ETHTOOL_A_FP_MAX + 1] = {
+	[ETHTOOL_A_FP_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_FP_PARAM_TABLE] = NLA_POLICY_NESTED(ethnl_fp_set_param_table_policy),
+};
+
+static int fp_parse_param_entry(const struct nlattr *nest,
+				struct ethtool_fp_param *params,
+				u8 *seen_prios, bool *mod,
+				struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[ARRAY_SIZE(ethnl_fp_set_param_entry_policy)];
+	u8 preemptable_prios = params->preemptable_prios;
+	u8 prio, admin_status;
+	int ret;
+
+	if (!nest)
+		return 0;
+
+	ret = nla_parse_nested(tb,
+			       ARRAY_SIZE(ethnl_fp_set_param_entry_policy) - 1,
+			       nest, ethnl_fp_set_param_entry_policy,
+			       extack);
+	if (ret)
+		return ret;
+
+	if (!tb[ETHTOOL_A_FP_PARAM_ENTRY_PRIO]) {
+		NL_SET_ERR_MSG_ATTR(extack, nest,
+				    "Missing 'prio' attribute");
+		return -EINVAL;
+	}
+
+	if (!tb[ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS]) {
+		NL_SET_ERR_MSG_ATTR(extack, nest,
+				    "Missing 'admin_status' attribute");
+		return -EINVAL;
+	}
+
+	prio = nla_get_u8(tb[ETHTOOL_A_FP_PARAM_ENTRY_PRIO]);
+	if (prio >= 8) {
+		NL_SET_ERR_MSG_ATTR(extack, nest,
+				    "Range exceeded for 'prio' attribute");
+		return -ERANGE;
+	}
+
+	if (*seen_prios & BIT(prio)) {
+		NL_SET_ERR_MSG_ATTR(extack, nest,
+				    "Duplicate 'prio' attribute");
+		return -EINVAL;
+	}
+
+	*seen_prios |= BIT(prio);
+
+	admin_status = nla_get_u8(tb[ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS]);
+	switch (admin_status) {
+	case ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_PREEMPTABLE:
+		preemptable_prios |= BIT(prio);
+		break;
+	case ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_EXPRESS:
+		preemptable_prios &= ~BIT(prio);
+		break;
+	default:
+		NL_SET_ERR_MSG_ATTR(extack, nest,
+				    "Unexpected value for 'admin_status' attribute");
+		return -EINVAL;
+	}
+
+	if (preemptable_prios != params->preemptable_prios) {
+		params->preemptable_prios = preemptable_prios;
+		*mod = true;
+	}
+
+	return 0;
+}
+
+static int fp_parse_param_table(const struct nlattr *nest,
+				struct ethtool_fp_param *params, bool *mod,
+				struct netlink_ext_ack *extack)
+{
+	u8 seen_prios = 0;
+	struct nlattr *n;
+	int rem, ret;
+
+	if (!nest)
+		return 0;
+
+	nla_for_each_nested(n, nest, rem) {
+		struct sched_entry *entry;
+
+		if (nla_type(n) != ETHTOOL_A_FP_PARAM_ENTRY) {
+			NL_SET_ERR_MSG_ATTR(extack, n,
+					    "Attribute is not of type 'entry'");
+			continue;
+		}
+
+		ret = fp_parse_param_entry(n, params, &seen_prios, mod, extack);
+		if (ret < 0) {
+			kfree(entry);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+int ethnl_set_fp_param(struct sk_buff *skb, struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct ethnl_req_info req_info = {};
+	struct ethtool_fp_param params = {};
+	struct nlattr **tb = info->attrs;
+	const struct ethtool_ops *ops;
+	struct net_device *dev;
+	bool mod = false;
+	int ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_FP_HEADER],
+					 genl_info_net(info), extack, true);
+	if (ret)
+		return ret;
+
+	dev = req_info.dev;
+	ops = dev->ethtool_ops;
+
+	if (!ops->get_fp_param || !ops->set_fp_param) {
+		ret = -EOPNOTSUPP;
+		goto out_dev;
+	}
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret)
+		goto out_rtnl;
+
+	dev->ethtool_ops->get_fp_param(dev, &params);
+
+	ethnl_update_u32(&params.hold_advance, tb[ETHTOOL_A_FP_HOLD_ADVANCE],
+			 &mod);
+	ethnl_update_u32(&params.release_advance,
+			 tb[ETHTOOL_A_FP_RELEASE_ADVANCE], &mod);
+
+	ret = fp_parse_param_table(tb[ETHTOOL_A_FP_PARAM_TABLE], &params, &mod,
+				   extack);
+	if (ret || !mod)
+		goto out_ops;
+
+	ret = dev->ethtool_ops->set_fp_param(dev, &params, extack);
+	if (ret) {
+		if (!extack->_msg)
+			NL_SET_ERR_MSG(extack,
+				       "Failed to update frame preemption parameters");
+		goto out_ops;
+	}
+
+	ethtool_notify(dev, ETHTOOL_MSG_FP_NTF, NULL);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+out_dev:
+	dev_put(dev);
+	return ret;
+}
diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c
new file mode 100644
index 000000000000..d4731fb7aee4
--- /dev/null
+++ b/net/ethtool/mm.c
@@ -0,0 +1,228 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022 NXP
+ */
+#include "common.h"
+#include "netlink.h"
+
+struct mm_req_info {
+	struct ethnl_req_info		base;
+};
+
+struct mm_reply_data {
+	struct ethnl_reply_data		base;
+	struct ethtool_mm_state		state;
+	struct ethtool_mm_stats		stats;
+};
+
+#define MM_REPDATA(__reply_base) \
+	container_of(__reply_base, struct mm_reply_data, base)
+
+#define ETHTOOL_MM_STAT_CNT \
+	(__ETHTOOL_A_MM_STAT_CNT - (ETHTOOL_A_MM_STAT_PAD + 1))
+
+const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1] = {
+	[ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy_stats),
+};
+
+static int mm_prepare_data(const struct ethnl_req_info *req_base,
+			   struct ethnl_reply_data *reply_base,
+			   struct genl_info *info)
+{
+	struct mm_reply_data *data = MM_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+	const struct ethtool_ops *ops;
+	int ret;
+
+	ops = dev->ethtool_ops;
+
+	if (!ops->get_mm_state)
+		return -EOPNOTSUPP;
+
+	ethtool_stats_init((u64 *)&data->stats,
+			   sizeof(data->stats) / sizeof(u64));
+
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		return ret;
+
+	ops->get_mm_state(dev, &data->state);
+
+	if (ops->get_mm_stats && (req_base->flags & ETHTOOL_FLAG_STATS))
+		ops->get_mm_stats(dev, &data->stats);
+
+	ethnl_ops_complete(dev);
+
+	return 0;
+}
+
+static int mm_reply_size(const struct ethnl_req_info *req_base,
+			 const struct ethnl_reply_data *reply_base)
+{
+	int len = 0;
+
+	len += nla_total_size(sizeof(u8)); /* _MM_VERIFY_STATUS */
+	len += nla_total_size(sizeof(u32)); /* _MM_VERIFY_TIME */
+	len += nla_total_size(sizeof(u8)); /* _MM_SUPPORTED */
+	len += nla_total_size(sizeof(u8)); /* _MM_ENABLED */
+	len += nla_total_size(sizeof(u8)); /* _MM_ACTIVE */
+	len += nla_total_size(sizeof(u8)); /* _MM_ADD_FRAG_SIZE */
+
+	if (req_base->flags & ETHTOOL_FLAG_STATS)
+		len += nla_total_size(0) + /* _MM_STATS */
+		       nla_total_size_64bit(sizeof(u64)) * ETHTOOL_MM_STAT_CNT;
+
+	return len;
+}
+
+static int mm_put_stat(struct sk_buff *skb, u64 val, u16 attrtype)
+{
+	if (val == ETHTOOL_STAT_NOT_SET)
+		return 0;
+	if (nla_put_u64_64bit(skb, attrtype, val, ETHTOOL_A_MM_STAT_PAD))
+		return -EMSGSIZE;
+	return 0;
+}
+
+static int mm_put_stats(struct sk_buff *skb,
+			const struct ethtool_mm_stats *stats)
+{
+	struct nlattr *nest;
+
+	nest = nla_nest_start(skb, ETHTOOL_A_MM_STATS);
+	if (!nest)
+		return -EMSGSIZE;
+
+	if (mm_put_stat(skb, stats->MACMergeFrameAssErrorCount,
+			ETHTOOL_A_MM_STAT_REASSEMBLY_ERRORS) ||
+	    mm_put_stat(skb, stats->MACMergeFrameSmdErrorCount,
+			ETHTOOL_A_MM_STAT_SMD_ERRORS) ||
+	    mm_put_stat(skb, stats->MACMergeFrameAssOkCount,
+			ETHTOOL_A_MM_STAT_REASSEMBLY_OK) ||
+	    mm_put_stat(skb, stats->MACMergeFragCountRx,
+			ETHTOOL_A_MM_STAT_RX_FRAG_COUNT) ||
+	    mm_put_stat(skb, stats->MACMergeFragCountTx,
+			ETHTOOL_A_MM_STAT_TX_FRAG_COUNT) ||
+	    mm_put_stat(skb, stats->MACMergeHoldCount,
+			ETHTOOL_A_MM_STAT_HOLD_COUNT))
+		goto err_cancel;
+
+	nla_nest_end(skb, nest);
+	return 0;
+
+err_cancel:
+	nla_nest_cancel(skb, nest);
+	return -EMSGSIZE;
+}
+
+static int mm_fill_reply(struct sk_buff *skb,
+			 const struct ethnl_req_info *req_base,
+			 const struct ethnl_reply_data *reply_base)
+{
+	const struct mm_reply_data *data = MM_REPDATA(reply_base);
+	const struct ethtool_mm_state *state = &data->state;
+
+	if (nla_put_u8(skb, ETHTOOL_A_MM_VERIFY_STATUS, state->verify_status) ||
+	    nla_put_u32(skb, ETHTOOL_A_MM_VERIFY_TIME, state->verify_time) ||
+	    nla_put_u8(skb, ETHTOOL_A_MM_SUPPORTED, state->supported) ||
+	    nla_put_u8(skb, ETHTOOL_A_MM_ENABLED, state->enabled) ||
+	    nla_put_u8(skb, ETHTOOL_A_MM_ACTIVE, state->active) ||
+	    nla_put_u8(skb, ETHTOOL_A_MM_ADD_FRAG_SIZE, state->add_frag_size))
+		return -EMSGSIZE;
+
+	if (req_base->flags & ETHTOOL_FLAG_STATS &&
+	    mm_put_stats(skb, &data->stats))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+const struct ethnl_request_ops ethnl_mm_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_MM_GET,
+	.reply_cmd		= ETHTOOL_MSG_MM_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_MM_HEADER,
+	.req_info_size		= sizeof(struct mm_req_info),
+	.reply_data_size	= sizeof(struct mm_reply_data),
+
+	.prepare_data		= mm_prepare_data,
+	.reply_size		= mm_reply_size,
+	.fill_reply		= mm_fill_reply,
+};
+
+const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1] = {
+	[ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_MM_VERIFY_DISABLE] = { .type = NLA_U8 },
+	[ETHTOOL_A_MM_VERIFY_TIME] = { .type = NLA_U32 },
+	[ETHTOOL_A_MM_ENABLED] = { .type = NLA_U8 },
+	[ETHTOOL_A_MM_ADD_FRAG_SIZE] = { .type = NLA_U8 },
+};
+
+static void mm_state_to_cfg(const struct ethtool_mm_state *state,
+			    struct ethtool_mm_cfg *cfg)
+{
+	cfg->verify_disable =
+		state->verify_status == ETHTOOL_MM_VERIFY_STATUS_DISABLED;
+	cfg->verify_time = state->verify_time;
+	cfg->enabled = state->enabled;
+	cfg->add_frag_size = state->add_frag_size;
+}
+
+int ethnl_set_mm_cfg(struct sk_buff *skb, struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct ethnl_req_info req_info = {};
+	struct ethtool_mm_state state = {};
+	struct nlattr **tb = info->attrs;
+	struct ethtool_mm_cfg cfg = {};
+	const struct ethtool_ops *ops;
+	struct net_device *dev;
+	bool mod = false;
+	int ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_MM_HEADER],
+					 genl_info_net(info), extack, true);
+	if (ret)
+		return ret;
+
+	dev = req_info.dev;
+	ops = dev->ethtool_ops;
+
+	if (!ops->get_mm_state || !ops->set_mm_cfg) {
+		ret = -EOPNOTSUPP;
+		goto out_dev;
+	}
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret)
+		goto out_rtnl;
+
+	ops->get_mm_state(dev, &state);
+
+	mm_state_to_cfg(&state, &cfg);
+
+	ethnl_update_bool(&cfg.verify_disable, tb[ETHTOOL_A_MM_VERIFY_DISABLE],
+			  &mod);
+	ethnl_update_u32(&cfg.verify_time, tb[ETHTOOL_A_MM_VERIFY_TIME], &mod);
+	ethnl_update_bool(&cfg.enabled, tb[ETHTOOL_A_MM_ENABLED], &mod);
+	ethnl_update_u8(&cfg.add_frag_size, tb[ETHTOOL_A_MM_ADD_FRAG_SIZE],
+			&mod);
+
+	ret = ops->set_mm_cfg(dev, &cfg, extack);
+	if (ret) {
+		if (!extack->_msg)
+			NL_SET_ERR_MSG(extack,
+				       "Failed to update MAC merge configuration");
+		goto out_ops;
+	}
+
+	ethtool_notify(dev, ETHTOOL_MSG_MM_NTF, NULL);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+out_dev:
+	dev_put(dev);
+	return ret;
+}
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index e26079e11835..82ad2bd92d70 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -286,6 +286,8 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
 	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
 	[ETHTOOL_MSG_PHC_VCLOCKS_GET]	= &ethnl_phc_vclocks_request_ops,
 	[ETHTOOL_MSG_MODULE_GET]	= &ethnl_module_request_ops,
+	[ETHTOOL_MSG_FP_GET]		= &ethnl_fp_request_ops,
+	[ETHTOOL_MSG_MM_GET]		= &ethnl_mm_request_ops,
 };
 
 static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -598,6 +600,8 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
 	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
 	[ETHTOOL_MSG_MODULE_NTF]	= &ethnl_module_request_ops,
+	[ETHTOOL_MSG_FP_NTF]		= &ethnl_fp_request_ops,
+	[ETHTOOL_MSG_MM_NTF]		= &ethnl_mm_request_ops,
 };
 
 /* default notification handler */
@@ -691,6 +695,8 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
 	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_MODULE_NTF]	= ethnl_default_notify,
+	[ETHTOOL_MSG_FP_NTF]		= ethnl_default_notify,
+	[ETHTOOL_MSG_MM_NTF]		= ethnl_default_notify,
 };
 
 void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
@@ -1020,6 +1026,38 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_module_set_policy,
 		.maxattr = ARRAY_SIZE(ethnl_module_set_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_FP_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_fp_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_fp_get_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_FP_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_fp_param,
+		.policy = ethnl_fp_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_fp_set_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_MM_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_mm_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_mm_get_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_MM_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_mm_cfg,
+		.policy = ethnl_mm_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_mm_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 1653fd2cf0cf..a2e56df74c85 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -371,6 +371,8 @@ extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
 extern const struct ethnl_request_ops ethnl_stats_request_ops;
 extern const struct ethnl_request_ops ethnl_phc_vclocks_request_ops;
 extern const struct ethnl_request_ops ethnl_module_request_ops;
+extern const struct ethnl_request_ops ethnl_fp_request_ops;
+extern const struct ethnl_request_ops ethnl_mm_request_ops;
 
 extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1];
 extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1];
@@ -409,6 +411,10 @@ extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1
 extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
 extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
 extern const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1];
+extern const struct nla_policy ethnl_fp_get_policy[ETHTOOL_A_FP_HEADER + 1];
+extern const struct nla_policy ethnl_fp_set_policy[ETHTOOL_A_FP_MAX + 1];
+extern const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1];
+extern const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info);
@@ -428,6 +434,8 @@ int ethnl_tunnel_info_start(struct netlink_callback *cb);
 int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_module(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_fp_param(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_mm_cfg(struct sk_buff *skb, struct genl_info *info);
 
 extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 3/7] net: ethtool: stats: make stats_put_stats() take input from multiple sources
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool() Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 4/7] net: ethtool: stats: replicate standardized counters for the pMAC Vladimir Oltean
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

IEEE 802.3 defines a MAC merge layer, for supporting the 802.1Q Frame
Preemption capability. Beyond the MAC merge layer there are 2 distinct
MACs, one is for express traffic (eMAC, aka the good old MAC that the
Linux network stack knows about) and the other is for preemptable
traffic (pMAC).

The pMAC is a MAC in its own right, and has a full set of statistics
that can be exposed through the standardized ethtool counters (RMON,
802.3 etc).

In preparation for extending struct stats_reply_data to store both the
eMAC and the pMAC stats structures, modify stats_put_stats() to not
assume that we only care about the eMAC stats. Instead, we will
introduce separate ops for pMAC stats, and put those stats into separate
netlink attributes, and take them from different instances of the same
structure types as in the case of the eMAC.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/ethtool/stats.c | 148 +++++++++++++++++++++++++-------------------
 1 file changed, 86 insertions(+), 62 deletions(-)

diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c
index a20e0a24ff61..8d4d3c70c0a4 100644
--- a/net/ethtool/stats.c
+++ b/net/ethtool/stats.c
@@ -9,6 +9,11 @@ struct stats_req_info {
 	DECLARE_BITMAP(stat_mask, __ETHTOOL_STATS_CNT);
 };
 
+struct rmon_stats_put_ctx {
+	const struct ethtool_rmon_stats *rmon_stats;
+	const struct ethtool_rmon_hist_range *rmon_ranges;
+};
+
 #define STATS_REQINFO(__req_base) \
 	container_of(__req_base, struct stats_req_info, base)
 
@@ -110,8 +115,11 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base,
 	const struct stats_req_info *req_info = STATS_REQINFO(req_base);
 	struct stats_reply_data *data = STATS_REPDATA(reply_base);
 	struct net_device *dev = reply_base->dev;
+	const struct ethtool_ops *ops;
 	int ret;
 
+	ops = dev->ethtool_ops;
+
 	ret = ethnl_ops_begin(dev);
 	if (ret < 0)
 		return ret;
@@ -122,18 +130,18 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base,
 	memset(&data->stats, 0xff, sizeof(data->stats));
 
 	if (test_bit(ETHTOOL_STATS_ETH_PHY, req_info->stat_mask) &&
-	    dev->ethtool_ops->get_eth_phy_stats)
-		dev->ethtool_ops->get_eth_phy_stats(dev, &data->phy_stats);
+	    ops->get_eth_phy_stats)
+		ops->get_eth_phy_stats(dev, &data->phy_stats);
 	if (test_bit(ETHTOOL_STATS_ETH_MAC, req_info->stat_mask) &&
-	    dev->ethtool_ops->get_eth_mac_stats)
-		dev->ethtool_ops->get_eth_mac_stats(dev, &data->mac_stats);
+	    ops->get_eth_mac_stats)
+		ops->get_eth_mac_stats(dev, &data->mac_stats);
 	if (test_bit(ETHTOOL_STATS_ETH_CTRL, req_info->stat_mask) &&
-	    dev->ethtool_ops->get_eth_ctrl_stats)
-		dev->ethtool_ops->get_eth_ctrl_stats(dev, &data->ctrl_stats);
+	    ops->get_eth_ctrl_stats)
+		ops->get_eth_ctrl_stats(dev, &data->ctrl_stats);
 	if (test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask) &&
-	    dev->ethtool_ops->get_rmon_stats)
-		dev->ethtool_ops->get_rmon_stats(dev, &data->rmon_stats,
-						 &data->rmon_ranges);
+	    ops->get_rmon_stats)
+		ops->get_rmon_stats(dev, &data->rmon_stats,
+				    &data->rmon_ranges);
 
 	ethnl_ops_complete(dev);
 	return 0;
@@ -212,76 +220,82 @@ static int stat_put(struct sk_buff *skb, u16 attrtype, u64 val)
 	return 0;
 }
 
-static int stats_put_phy_stats(struct sk_buff *skb,
-			       const struct stats_reply_data *data)
+static int stats_put_phy_stats(struct sk_buff *skb, const void *src)
 {
+	const struct ethtool_eth_phy_stats *phy_stats = src;
+
 	if (stat_put(skb, ETHTOOL_A_STATS_ETH_PHY_5_SYM_ERR,
-		     data->phy_stats.SymbolErrorDuringCarrier))
+		     phy_stats->SymbolErrorDuringCarrier))
 		return -EMSGSIZE;
+
 	return 0;
 }
 
-static int stats_put_mac_stats(struct sk_buff *skb,
-			       const struct stats_reply_data *data)
+static int stats_put_mac_stats(struct sk_buff *skb, const void *src)
 {
+	const struct ethtool_eth_mac_stats *mac_stats = src;
+
 	if (stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_2_TX_PKT,
-		     data->mac_stats.FramesTransmittedOK) ||
+		     mac_stats->FramesTransmittedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_3_SINGLE_COL,
-		     data->mac_stats.SingleCollisionFrames) ||
+		     mac_stats->SingleCollisionFrames) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_4_MULTI_COL,
-		     data->mac_stats.MultipleCollisionFrames) ||
+		     mac_stats->MultipleCollisionFrames) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_5_RX_PKT,
-		     data->mac_stats.FramesReceivedOK) ||
+		     mac_stats->FramesReceivedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_6_FCS_ERR,
-		     data->mac_stats.FrameCheckSequenceErrors) ||
+		     mac_stats->FrameCheckSequenceErrors) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_7_ALIGN_ERR,
-		     data->mac_stats.AlignmentErrors) ||
+		     mac_stats->AlignmentErrors) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_8_TX_BYTES,
-		     data->mac_stats.OctetsTransmittedOK) ||
+		     mac_stats->OctetsTransmittedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_9_TX_DEFER,
-		     data->mac_stats.FramesWithDeferredXmissions) ||
+		     mac_stats->FramesWithDeferredXmissions) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_10_LATE_COL,
-		     data->mac_stats.LateCollisions) ||
+		     mac_stats->LateCollisions) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_11_XS_COL,
-		     data->mac_stats.FramesAbortedDueToXSColls) ||
+		     mac_stats->FramesAbortedDueToXSColls) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_12_TX_INT_ERR,
-		     data->mac_stats.FramesLostDueToIntMACXmitError) ||
+		     mac_stats->FramesLostDueToIntMACXmitError) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_13_CS_ERR,
-		     data->mac_stats.CarrierSenseErrors) ||
+		     mac_stats->CarrierSenseErrors) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_14_RX_BYTES,
-		     data->mac_stats.OctetsReceivedOK) ||
+		     mac_stats->OctetsReceivedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_15_RX_INT_ERR,
-		     data->mac_stats.FramesLostDueToIntMACRcvError) ||
+		     mac_stats->FramesLostDueToIntMACRcvError) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_18_TX_MCAST,
-		     data->mac_stats.MulticastFramesXmittedOK) ||
+		     mac_stats->MulticastFramesXmittedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_19_TX_BCAST,
-		     data->mac_stats.BroadcastFramesXmittedOK) ||
+		     mac_stats->BroadcastFramesXmittedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_20_XS_DEFER,
-		     data->mac_stats.FramesWithExcessiveDeferral) ||
+		     mac_stats->FramesWithExcessiveDeferral) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_21_RX_MCAST,
-		     data->mac_stats.MulticastFramesReceivedOK) ||
+		     mac_stats->MulticastFramesReceivedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_22_RX_BCAST,
-		     data->mac_stats.BroadcastFramesReceivedOK) ||
+		     mac_stats->BroadcastFramesReceivedOK) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_23_IR_LEN_ERR,
-		     data->mac_stats.InRangeLengthErrors) ||
+		     mac_stats->InRangeLengthErrors) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_24_OOR_LEN,
-		     data->mac_stats.OutOfRangeLengthField) ||
+		     mac_stats->OutOfRangeLengthField) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_MAC_25_TOO_LONG_ERR,
-		     data->mac_stats.FrameTooLongErrors))
+		     mac_stats->FrameTooLongErrors))
 		return -EMSGSIZE;
+
 	return 0;
 }
 
-static int stats_put_ctrl_stats(struct sk_buff *skb,
-				const struct stats_reply_data *data)
+static int stats_put_ctrl_stats(struct sk_buff *skb, const void *src)
 {
+	const struct ethtool_eth_ctrl_stats *ctrl_stats = src;
+
 	if (stat_put(skb, ETHTOOL_A_STATS_ETH_CTRL_3_TX,
-		     data->ctrl_stats.MACControlFramesTransmitted) ||
+		     ctrl_stats->MACControlFramesTransmitted) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_CTRL_4_RX,
-		     data->ctrl_stats.MACControlFramesReceived) ||
+		     ctrl_stats->MACControlFramesReceived) ||
 	    stat_put(skb, ETHTOOL_A_STATS_ETH_CTRL_5_RX_UNSUP,
-		     data->ctrl_stats.UnsupportedOpcodesReceived))
+		     ctrl_stats->UnsupportedOpcodesReceived))
 		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -322,33 +336,34 @@ static int stats_put_rmon_hist(struct sk_buff *skb, u32 attr, const u64 *hist,
 	return -EMSGSIZE;
 }
 
-static int stats_put_rmon_stats(struct sk_buff *skb,
-				const struct stats_reply_data *data)
+static int stats_put_rmon_stats(struct sk_buff *skb, const void *src)
 {
+	const struct ethtool_rmon_hist_range *rmon_ranges;
+	const struct ethtool_rmon_stats *rmon_stats;
+	const struct rmon_stats_put_ctx *ctx = src;
+
+	rmon_stats = ctx->rmon_stats;
+	rmon_ranges = ctx->rmon_ranges;
+
 	if (stats_put_rmon_hist(skb, ETHTOOL_A_STATS_GRP_HIST_RX,
-				data->rmon_stats.hist, data->rmon_ranges) ||
+				rmon_stats->hist, rmon_ranges) ||
 	    stats_put_rmon_hist(skb, ETHTOOL_A_STATS_GRP_HIST_TX,
-				data->rmon_stats.hist_tx, data->rmon_ranges))
+				rmon_stats->hist_tx, rmon_ranges))
 		return -EMSGSIZE;
 
 	if (stat_put(skb, ETHTOOL_A_STATS_RMON_UNDERSIZE,
-		     data->rmon_stats.undersize_pkts) ||
+		     rmon_stats->undersize_pkts) ||
 	    stat_put(skb, ETHTOOL_A_STATS_RMON_OVERSIZE,
-		     data->rmon_stats.oversize_pkts) ||
-	    stat_put(skb, ETHTOOL_A_STATS_RMON_FRAG,
-		     data->rmon_stats.fragments) ||
-	    stat_put(skb, ETHTOOL_A_STATS_RMON_JABBER,
-		     data->rmon_stats.jabbers))
+		     rmon_stats->oversize_pkts) ||
+	    stat_put(skb, ETHTOOL_A_STATS_RMON_FRAG, rmon_stats->fragments) ||
+	    stat_put(skb, ETHTOOL_A_STATS_RMON_JABBER, rmon_stats->jabbers))
 		return -EMSGSIZE;
 
 	return 0;
 }
 
-static int stats_put_stats(struct sk_buff *skb,
-			   const struct stats_reply_data *data,
-			   u32 id, u32 ss_id,
-			   int (*cb)(struct sk_buff *skb,
-				     const struct stats_reply_data *data))
+static int stats_put_stats(struct sk_buff *skb, const void *src, u32 id, u32 ss_id,
+			   int (*cb)(struct sk_buff *skb, const void *src))
 {
 	struct nlattr *nest;
 
@@ -360,7 +375,7 @@ static int stats_put_stats(struct sk_buff *skb,
 	    nla_put_u32(skb, ETHTOOL_A_STATS_GRP_SS_ID, ss_id))
 		goto err_cancel;
 
-	if (cb(skb, data))
+	if (cb(skb, src))
 		goto err_cancel;
 
 	nla_nest_end(skb, nest);
@@ -380,20 +395,29 @@ static int stats_fill_reply(struct sk_buff *skb,
 	int ret = 0;
 
 	if (!ret && test_bit(ETHTOOL_STATS_ETH_PHY, req_info->stat_mask))
-		ret = stats_put_stats(skb, data, ETHTOOL_STATS_ETH_PHY,
+		ret = stats_put_stats(skb, &data->phy_stats,
+				      ETHTOOL_STATS_ETH_PHY,
 				      ETH_SS_STATS_ETH_PHY,
 				      stats_put_phy_stats);
 	if (!ret && test_bit(ETHTOOL_STATS_ETH_MAC, req_info->stat_mask))
-		ret = stats_put_stats(skb, data, ETHTOOL_STATS_ETH_MAC,
+		ret = stats_put_stats(skb, &data->mac_stats,
+				      ETHTOOL_STATS_ETH_MAC,
 				      ETH_SS_STATS_ETH_MAC,
 				      stats_put_mac_stats);
 	if (!ret && test_bit(ETHTOOL_STATS_ETH_CTRL, req_info->stat_mask))
-		ret = stats_put_stats(skb, data, ETHTOOL_STATS_ETH_CTRL,
+		ret = stats_put_stats(skb, &data->ctrl_stats,
+				      ETHTOOL_STATS_ETH_CTRL,
 				      ETH_SS_STATS_ETH_CTRL,
 				      stats_put_ctrl_stats);
-	if (!ret && test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask))
-		ret = stats_put_stats(skb, data, ETHTOOL_STATS_RMON,
+	if (!ret && test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask)) {
+		struct rmon_stats_put_ctx ctx = {
+			.rmon_stats = &data->rmon_stats,
+			.rmon_ranges = data->rmon_ranges,
+		};
+
+		ret = stats_put_stats(skb, &ctx, ETHTOOL_STATS_RMON,
 				      ETH_SS_STATS_RMON, stats_put_rmon_stats);
+	}
 
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 4/7] net: ethtool: stats: replicate standardized counters for the pMAC
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (2 preceding siblings ...)
  2022-08-16 22:29 ` [RFC PATCH net-next 3/7] net: ethtool: stats: make stats_put_stats() take input from multiple sources Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 5/7] net: enetc: parameterize port MAC stats to also cover " Vladimir Oltean
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

Devices supporting a MAC Merge layer will have a second Ethernet MAC per
port which handles preemptable traffic. This has all the same counters
as the regular (express) MAC, so export 4 new groups of statistics for
it, which have all the same structure as the ones for the eMAC, except
for different names.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/ethtool.h              |  9 ++++
 include/uapi/linux/ethtool_netlink.h |  4 ++
 net/ethtool/stats.c                  | 70 ++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index fa504dd22bf6..ce3b118bcf19 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -802,6 +802,15 @@ struct ethtool_ops {
 	void	(*get_rmon_stats)(struct net_device *dev,
 				  struct ethtool_rmon_stats *rmon_stats,
 				  const struct ethtool_rmon_hist_range **ranges);
+	void	(*get_eth_pmac_phy_stats)(struct net_device *dev,
+					  struct ethtool_eth_phy_stats *phy_stats);
+	void	(*get_eth_pmac_mac_stats)(struct net_device *dev,
+					  struct ethtool_eth_mac_stats *mac_stats);
+	void	(*get_eth_pmac_ctrl_stats)(struct net_device *dev,
+					   struct ethtool_eth_ctrl_stats *ctrl_stats);
+	void	(*get_pmac_rmon_stats)(struct net_device *dev,
+				       struct ethtool_rmon_stats *rmon_stats,
+				       const struct ethtool_rmon_hist_range **ranges);
 	int	(*get_module_power_mode)(struct net_device *dev,
 					 struct ethtool_module_power_mode_params *params,
 					 struct netlink_ext_ack *extack);
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 658810274c49..a2743ffb6c63 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -742,6 +742,10 @@ enum {
 	ETHTOOL_STATS_ETH_MAC,
 	ETHTOOL_STATS_ETH_CTRL,
 	ETHTOOL_STATS_RMON,
+	ETHTOOL_STATS_ETH_PMAC_PHY,
+	ETHTOOL_STATS_ETH_PMAC_MAC,
+	ETHTOOL_STATS_ETH_PMAC_CTRL,
+	ETHTOOL_STATS_PMAC_RMON,
 
 	/* add new constants above here */
 	__ETHTOOL_STATS_CNT
diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c
index 8d4d3c70c0a4..50e858b540c2 100644
--- a/net/ethtool/stats.c
+++ b/net/ethtool/stats.c
@@ -24,8 +24,13 @@ struct stats_reply_data {
 		struct ethtool_eth_mac_stats	mac_stats;
 		struct ethtool_eth_ctrl_stats	ctrl_stats;
 		struct ethtool_rmon_stats	rmon_stats;
+		struct ethtool_eth_phy_stats	pmac_phy_stats;
+		struct ethtool_eth_mac_stats	pmac_mac_stats;
+		struct ethtool_eth_ctrl_stats	pmac_ctrl_stats;
+		struct ethtool_rmon_stats	pmac_rmon_stats;
 	);
 	const struct ethtool_rmon_hist_range	*rmon_ranges;
+	const struct ethtool_rmon_hist_range	*pmac_rmon_ranges;
 };
 
 #define STATS_REPDATA(__reply_base) \
@@ -36,6 +41,10 @@ const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN] = {
 	[ETHTOOL_STATS_ETH_MAC]			= "eth-mac",
 	[ETHTOOL_STATS_ETH_CTRL]		= "eth-ctrl",
 	[ETHTOOL_STATS_RMON]			= "rmon",
+	[ETHTOOL_STATS_ETH_PMAC_PHY]		= "eth-pmac-phy",
+	[ETHTOOL_STATS_ETH_PMAC_MAC]		= "eth-pmac-mac",
+	[ETHTOOL_STATS_ETH_PMAC_CTRL]		= "eth-pmac-ctrl",
+	[ETHTOOL_STATS_PMAC_RMON]		= "pmac-rmon",
 };
 
 const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN] = {
@@ -143,6 +152,20 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base,
 		ops->get_rmon_stats(dev, &data->rmon_stats,
 				    &data->rmon_ranges);
 
+	if (test_bit(ETHTOOL_STATS_ETH_PMAC_PHY, req_info->stat_mask) &&
+	    ops->get_eth_pmac_phy_stats)
+		ops->get_eth_pmac_phy_stats(dev, &data->pmac_phy_stats);
+	if (test_bit(ETHTOOL_STATS_ETH_PMAC_MAC, req_info->stat_mask) &&
+	    ops->get_eth_pmac_mac_stats)
+		ops->get_eth_pmac_mac_stats(dev, &data->pmac_mac_stats);
+	if (test_bit(ETHTOOL_STATS_ETH_PMAC_CTRL, req_info->stat_mask) &&
+	    ops->get_eth_pmac_ctrl_stats)
+		ops->get_eth_pmac_ctrl_stats(dev, &data->pmac_ctrl_stats);
+	if (test_bit(ETHTOOL_STATS_PMAC_RMON, req_info->stat_mask) &&
+	    ops->get_pmac_rmon_stats)
+		ops->get_pmac_rmon_stats(dev, &data->pmac_rmon_stats,
+					 &data->pmac_rmon_ranges);
+
 	ethnl_ops_complete(dev);
 	return 0;
 }
@@ -176,6 +199,28 @@ static int stats_reply_size(const struct ethnl_req_info *req_base,
 			nla_total_size(4)) *	/* _A_STATS_GRP_HIST_BKT_HI */
 			ETHTOOL_RMON_HIST_MAX * 2;
 	}
+	if (test_bit(ETHTOOL_STATS_ETH_PMAC_PHY, req_info->stat_mask)) {
+		n_stats += sizeof(struct ethtool_eth_mac_stats) / sizeof(u64);
+		n_grps++;
+	}
+	if (test_bit(ETHTOOL_STATS_ETH_PMAC_MAC, req_info->stat_mask)) {
+		n_stats += sizeof(struct ethtool_eth_mac_stats) / sizeof(u64);
+		n_grps++;
+	}
+	if (test_bit(ETHTOOL_STATS_ETH_PMAC_CTRL, req_info->stat_mask)) {
+		n_stats += sizeof(struct ethtool_eth_ctrl_stats) / sizeof(u64);
+		n_grps++;
+	}
+	if (test_bit(ETHTOOL_STATS_PMAC_RMON, req_info->stat_mask)) {
+		n_stats += sizeof(struct ethtool_rmon_stats) / sizeof(u64);
+		n_grps++;
+		/* Above includes the space for _A_STATS_GRP_HIST_VALs */
+
+		len += (nla_total_size(0) +	/* _A_STATS_GRP_HIST */
+			nla_total_size(4) +	/* _A_STATS_GRP_HIST_BKT_LOW */
+			nla_total_size(4)) *	/* _A_STATS_GRP_HIST_BKT_HI */
+			ETHTOOL_RMON_HIST_MAX * 2;
+	}
 
 	len += n_grps * (nla_total_size(0) + /* _A_STATS_GRP */
 			 nla_total_size(4) + /* _A_STATS_GRP_ID */
@@ -419,6 +464,31 @@ static int stats_fill_reply(struct sk_buff *skb,
 				      ETH_SS_STATS_RMON, stats_put_rmon_stats);
 	}
 
+	if (!ret && test_bit(ETHTOOL_STATS_ETH_PMAC_PHY, req_info->stat_mask))
+		ret = stats_put_stats(skb, &data->pmac_phy_stats,
+				      ETHTOOL_STATS_ETH_PMAC_PHY,
+				      ETH_SS_STATS_ETH_PHY,
+				      stats_put_phy_stats);
+	if (!ret && test_bit(ETHTOOL_STATS_ETH_PMAC_MAC, req_info->stat_mask))
+		ret = stats_put_stats(skb, &data->pmac_mac_stats,
+				      ETHTOOL_STATS_ETH_PMAC_MAC,
+				      ETH_SS_STATS_ETH_MAC,
+				      stats_put_mac_stats);
+	if (!ret && test_bit(ETHTOOL_STATS_ETH_PMAC_CTRL, req_info->stat_mask))
+		ret = stats_put_stats(skb, &data->pmac_ctrl_stats,
+				      ETHTOOL_STATS_ETH_PMAC_CTRL,
+				      ETH_SS_STATS_ETH_CTRL,
+				      stats_put_ctrl_stats);
+	if (!ret && test_bit(ETHTOOL_STATS_PMAC_RMON, req_info->stat_mask)) {
+		struct rmon_stats_put_ctx ctx = {
+			.rmon_stats = &data->pmac_rmon_stats,
+			.rmon_ranges = data->pmac_rmon_ranges,
+		};
+
+		ret = stats_put_stats(skb, &ctx, ETHTOOL_STATS_PMAC_RMON,
+				      ETH_SS_STATS_RMON, stats_put_rmon_stats);
+	}
+
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 5/7] net: enetc: parameterize port MAC stats to also cover the pMAC
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (3 preceding siblings ...)
  2022-08-16 22:29 ` [RFC PATCH net-next 4/7] net: ethtool: stats: replicate standardized counters for the pMAC Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 6/7] net: enetc: expose some standardized ethtool counters Vladimir Oltean
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

The ENETC has counters for the eMAC and for the pMAC exactly 0x1000
apart from each other. The driver only contains definitions for PM0,
the eMAC.

Rather than duplicating everything for PM1, modify the register
definitions such that they take the MAC as argument.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../ethernet/freescale/enetc/enetc_ethtool.c  | 124 +++++++++---------
 .../net/ethernet/freescale/enetc/enetc_hw.h   | 106 +++++++--------
 2 files changed, 116 insertions(+), 114 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index ff872e40ce85..236bb24ec999 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -125,68 +125,68 @@ static const struct {
 	int reg;
 	char name[ETH_GSTRING_LEN];
 } enetc_port_counters[] = {
-	{ ENETC_PM0_REOCT,  "MAC rx ethernet octets" },
-	{ ENETC_PM0_RALN,   "MAC rx alignment errors" },
-	{ ENETC_PM0_RXPF,   "MAC rx valid pause frames" },
-	{ ENETC_PM0_RFRM,   "MAC rx valid frames" },
-	{ ENETC_PM0_RFCS,   "MAC rx fcs errors" },
-	{ ENETC_PM0_RVLAN,  "MAC rx VLAN frames" },
-	{ ENETC_PM0_RERR,   "MAC rx frame errors" },
-	{ ENETC_PM0_RUCA,   "MAC rx unicast frames" },
-	{ ENETC_PM0_RMCA,   "MAC rx multicast frames" },
-	{ ENETC_PM0_RBCA,   "MAC rx broadcast frames" },
-	{ ENETC_PM0_RDRP,   "MAC rx dropped packets" },
-	{ ENETC_PM0_RPKT,   "MAC rx packets" },
-	{ ENETC_PM0_RUND,   "MAC rx undersized packets" },
-	{ ENETC_PM0_R64,    "MAC rx 64 byte packets" },
-	{ ENETC_PM0_R127,   "MAC rx 65-127 byte packets" },
-	{ ENETC_PM0_R255,   "MAC rx 128-255 byte packets" },
-	{ ENETC_PM0_R511,   "MAC rx 256-511 byte packets" },
-	{ ENETC_PM0_R1023,  "MAC rx 512-1023 byte packets" },
-	{ ENETC_PM0_R1522,  "MAC rx 1024-1522 byte packets" },
-	{ ENETC_PM0_R1523X, "MAC rx 1523 to max-octet packets" },
-	{ ENETC_PM0_ROVR,   "MAC rx oversized packets" },
-	{ ENETC_PM0_RJBR,   "MAC rx jabber packets" },
-	{ ENETC_PM0_RFRG,   "MAC rx fragment packets" },
-	{ ENETC_PM0_RCNP,   "MAC rx control packets" },
-	{ ENETC_PM0_RDRNTP, "MAC rx fifo drop" },
-	{ ENETC_PM0_TEOCT,  "MAC tx ethernet octets" },
-	{ ENETC_PM0_TOCT,   "MAC tx octets" },
-	{ ENETC_PM0_TCRSE,  "MAC tx carrier sense errors" },
-	{ ENETC_PM0_TXPF,   "MAC tx valid pause frames" },
-	{ ENETC_PM0_TFRM,   "MAC tx frames" },
-	{ ENETC_PM0_TFCS,   "MAC tx fcs errors" },
-	{ ENETC_PM0_TVLAN,  "MAC tx VLAN frames" },
-	{ ENETC_PM0_TERR,   "MAC tx frame errors" },
-	{ ENETC_PM0_TUCA,   "MAC tx unicast frames" },
-	{ ENETC_PM0_TMCA,   "MAC tx multicast frames" },
-	{ ENETC_PM0_TBCA,   "MAC tx broadcast frames" },
-	{ ENETC_PM0_TPKT,   "MAC tx packets" },
-	{ ENETC_PM0_TUND,   "MAC tx undersized packets" },
-	{ ENETC_PM0_T64,    "MAC tx 64 byte packets" },
-	{ ENETC_PM0_T127,   "MAC tx 65-127 byte packets" },
-	{ ENETC_PM0_T255,   "MAC tx 128-255 byte packets" },
-	{ ENETC_PM0_T511,   "MAC tx 256-511 byte packets" },
-	{ ENETC_PM0_T1023,  "MAC tx 512-1023 byte packets" },
-	{ ENETC_PM0_T1522,  "MAC tx 1024-1522 byte packets" },
-	{ ENETC_PM0_T1523X, "MAC tx 1523 to max-octet packets" },
-	{ ENETC_PM0_TCNP,   "MAC tx control packets" },
-	{ ENETC_PM0_TDFR,   "MAC tx deferred packets" },
-	{ ENETC_PM0_TMCOL,  "MAC tx multiple collisions" },
-	{ ENETC_PM0_TSCOL,  "MAC tx single collisions" },
-	{ ENETC_PM0_TLCOL,  "MAC tx late collisions" },
-	{ ENETC_PM0_TECOL,  "MAC tx excessive collisions" },
-	{ ENETC_UFDMF,      "SI MAC nomatch u-cast discards" },
-	{ ENETC_MFDMF,      "SI MAC nomatch m-cast discards" },
-	{ ENETC_PBFDSIR,    "SI MAC nomatch b-cast discards" },
-	{ ENETC_PUFDVFR,    "SI VLAN nomatch u-cast discards" },
-	{ ENETC_PMFDVFR,    "SI VLAN nomatch m-cast discards" },
-	{ ENETC_PBFDVFR,    "SI VLAN nomatch b-cast discards" },
-	{ ENETC_PFDMSAPR,   "SI pruning discarded frames" },
-	{ ENETC_PICDR(0),   "ICM DR0 discarded frames" },
-	{ ENETC_PICDR(1),   "ICM DR1 discarded frames" },
-	{ ENETC_PICDR(2),   "ICM DR2 discarded frames" },
-	{ ENETC_PICDR(3),   "ICM DR3 discarded frames" },
+	{ ENETC_PM_REOCT(0),	"MAC rx ethernet octets" },
+	{ ENETC_PM_RALN(0),	"MAC rx alignment errors" },
+	{ ENETC_PM_RXPF(0),	"MAC rx valid pause frames" },
+	{ ENETC_PM_RFRM(0),	"MAC rx valid frames" },
+	{ ENETC_PM_RFCS(0),	"MAC rx fcs errors" },
+	{ ENETC_PM_RVLAN(0),	"MAC rx VLAN frames" },
+	{ ENETC_PM_RERR(0),	"MAC rx frame errors" },
+	{ ENETC_PM_RUCA(0),	"MAC rx unicast frames" },
+	{ ENETC_PM_RMCA(0),	"MAC rx multicast frames" },
+	{ ENETC_PM_RBCA(0),	"MAC rx broadcast frames" },
+	{ ENETC_PM_RDRP(0),	"MAC rx dropped packets" },
+	{ ENETC_PM_RPKT(0),	"MAC rx packets" },
+	{ ENETC_PM_RUND(0),	"MAC rx undersized packets" },
+	{ ENETC_PM_R64(0),	"MAC rx 64 byte packets" },
+	{ ENETC_PM_R127(0),	"MAC rx 65-127 byte packets" },
+	{ ENETC_PM_R255(0),	"MAC rx 128-255 byte packets" },
+	{ ENETC_PM_R511(0),	"MAC rx 256-511 byte packets" },
+	{ ENETC_PM_R1023(0),	"MAC rx 512-1023 byte packets" },
+	{ ENETC_PM_R1522(0),	"MAC rx 1024-1522 byte packets" },
+	{ ENETC_PM_R1523X(0),	"MAC rx 1523 to max-octet packets" },
+	{ ENETC_PM_ROVR(0),	"MAC rx oversized packets" },
+	{ ENETC_PM_RJBR(0),	"MAC rx jabber packets" },
+	{ ENETC_PM_RFRG(0),	"MAC rx fragment packets" },
+	{ ENETC_PM_RCNP(0),	"MAC rx control packets" },
+	{ ENETC_PM_RDRNTP(0),	"MAC rx fifo drop" },
+	{ ENETC_PM_TEOCT(0),	"MAC tx ethernet octets" },
+	{ ENETC_PM_TOCT(0),	"MAC tx octets" },
+	{ ENETC_PM_TCRSE(0),	"MAC tx carrier sense errors" },
+	{ ENETC_PM_TXPF(0),	"MAC tx valid pause frames" },
+	{ ENETC_PM_TFRM(0),	"MAC tx frames" },
+	{ ENETC_PM_TFCS(0),	"MAC tx fcs errors" },
+	{ ENETC_PM_TVLAN(0),	"MAC tx VLAN frames" },
+	{ ENETC_PM_TERR(0),	"MAC tx frame errors" },
+	{ ENETC_PM_TUCA(0),	"MAC tx unicast frames" },
+	{ ENETC_PM_TMCA(0),	"MAC tx multicast frames" },
+	{ ENETC_PM_TBCA(0),	"MAC tx broadcast frames" },
+	{ ENETC_PM_TPKT(0),	"MAC tx packets" },
+	{ ENETC_PM_TUND(0),	"MAC tx undersized packets" },
+	{ ENETC_PM_T64(0),	"MAC tx 64 byte packets" },
+	{ ENETC_PM_T127(0),	"MAC tx 65-127 byte packets" },
+	{ ENETC_PM_T255(0),	"MAC tx 128-255 byte packets" },
+	{ ENETC_PM_T511(0),	"MAC tx 256-511 byte packets" },
+	{ ENETC_PM_T1023(0),	"MAC tx 512-1023 byte packets" },
+	{ ENETC_PM_T1522(0),	"MAC tx 1024-1522 byte packets" },
+	{ ENETC_PM_T1523X(0),	"MAC tx 1523 to max-octet packets" },
+	{ ENETC_PM_TCNP(0),	"MAC tx control packets" },
+	{ ENETC_PM_TDFR(0),	"MAC tx deferred packets" },
+	{ ENETC_PM_TMCOL(0),	"MAC tx multiple collisions" },
+	{ ENETC_PM_TSCOL(0),	"MAC tx single collisions" },
+	{ ENETC_PM_TLCOL(0),	"MAC tx late collisions" },
+	{ ENETC_PM_TECOL(0),	"MAC tx excessive collisions" },
+	{ ENETC_UFDMF,		"SI MAC nomatch u-cast discards" },
+	{ ENETC_MFDMF,		"SI MAC nomatch m-cast discards" },
+	{ ENETC_PBFDSIR,	"SI MAC nomatch b-cast discards" },
+	{ ENETC_PUFDVFR,	"SI VLAN nomatch u-cast discards" },
+	{ ENETC_PMFDVFR,	"SI VLAN nomatch m-cast discards" },
+	{ ENETC_PBFDVFR,	"SI VLAN nomatch b-cast discards" },
+	{ ENETC_PFDMSAPR,	"SI pruning discarded frames" },
+	{ ENETC_PICDR(0),	"ICM DR0 discarded frames" },
+	{ ENETC_PICDR(1),	"ICM DR1 discarded frames" },
+	{ ENETC_PICDR(2),	"ICM DR2 discarded frames" },
+	{ ENETC_PICDR(3),	"ICM DR3 discarded frames" },
 };
 
 static const char rx_ring_stats[][ETH_GSTRING_LEN] = {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 647c87f73bf7..0b85e37a00eb 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -276,58 +276,60 @@ enum enetc_bdr_type {TX, RX};
 #define ENETC_PFMCAPR		0x1b38
 #define ENETC_PFMCAPR_MSK	GENMASK(15, 0)
 
-/* MAC counters */
-#define ENETC_PM0_REOCT		0x8100
-#define ENETC_PM0_RALN		0x8110
-#define ENETC_PM0_RXPF		0x8118
-#define ENETC_PM0_RFRM		0x8120
-#define ENETC_PM0_RFCS		0x8128
-#define ENETC_PM0_RVLAN		0x8130
-#define ENETC_PM0_RERR		0x8138
-#define ENETC_PM0_RUCA		0x8140
-#define ENETC_PM0_RMCA		0x8148
-#define ENETC_PM0_RBCA		0x8150
-#define ENETC_PM0_RDRP		0x8158
-#define ENETC_PM0_RPKT		0x8160
-#define ENETC_PM0_RUND		0x8168
-#define ENETC_PM0_R64		0x8170
-#define ENETC_PM0_R127		0x8178
-#define ENETC_PM0_R255		0x8180
-#define ENETC_PM0_R511		0x8188
-#define ENETC_PM0_R1023		0x8190
-#define ENETC_PM0_R1522		0x8198
-#define ENETC_PM0_R1523X	0x81A0
-#define ENETC_PM0_ROVR		0x81A8
-#define ENETC_PM0_RJBR		0x81B0
-#define ENETC_PM0_RFRG		0x81B8
-#define ENETC_PM0_RCNP		0x81C0
-#define ENETC_PM0_RDRNTP	0x81C8
-#define ENETC_PM0_TEOCT		0x8200
-#define ENETC_PM0_TOCT		0x8208
-#define ENETC_PM0_TCRSE		0x8210
-#define ENETC_PM0_TXPF		0x8218
-#define ENETC_PM0_TFRM		0x8220
-#define ENETC_PM0_TFCS		0x8228
-#define ENETC_PM0_TVLAN		0x8230
-#define ENETC_PM0_TERR		0x8238
-#define ENETC_PM0_TUCA		0x8240
-#define ENETC_PM0_TMCA		0x8248
-#define ENETC_PM0_TBCA		0x8250
-#define ENETC_PM0_TPKT		0x8260
-#define ENETC_PM0_TUND		0x8268
-#define ENETC_PM0_T64		0x8270
-#define ENETC_PM0_T127		0x8278
-#define ENETC_PM0_T255		0x8280
-#define ENETC_PM0_T511		0x8288
-#define ENETC_PM0_T1023		0x8290
-#define ENETC_PM0_T1522		0x8298
-#define ENETC_PM0_T1523X	0x82A0
-#define ENETC_PM0_TCNP		0x82C0
-#define ENETC_PM0_TDFR		0x82D0
-#define ENETC_PM0_TMCOL		0x82D8
-#define ENETC_PM0_TSCOL		0x82E0
-#define ENETC_PM0_TLCOL		0x82E8
-#define ENETC_PM0_TECOL		0x82F0
+/* Port MAC counters: Port MAC 0 corresponds to the eMAC and
+ * Port MAC 1 to the pMAC.
+ */
+#define ENETC_PM_REOCT(mac)	(0x8100 + 0x1000 * (mac))
+#define ENETC_PM_RALN(mac)	(0x8110 + 0x1000 * (mac))
+#define ENETC_PM_RXPF(mac)	(0x8118 + 0x1000 * (mac))
+#define ENETC_PM_RFRM(mac)	(0x8120 + 0x1000 * (mac))
+#define ENETC_PM_RFCS(mac)	(0x8128 + 0x1000 * (mac))
+#define ENETC_PM_RVLAN(mac)	(0x8130 + 0x1000 * (mac))
+#define ENETC_PM_RERR(mac)	(0x8138 + 0x1000 * (mac))
+#define ENETC_PM_RUCA(mac)	(0x8140 + 0x1000 * (mac))
+#define ENETC_PM_RMCA(mac)	(0x8148 + 0x1000 * (mac))
+#define ENETC_PM_RBCA(mac)	(0x8150 + 0x1000 * (mac))
+#define ENETC_PM_RDRP(mac)	(0x8158 + 0x1000 * (mac))
+#define ENETC_PM_RPKT(mac)	(0x8160 + 0x1000 * (mac))
+#define ENETC_PM_RUND(mac)	(0x8168 + 0x1000 * (mac))
+#define ENETC_PM_R64(mac)	(0x8170 + 0x1000 * (mac))
+#define ENETC_PM_R127(mac)	(0x8178 + 0x1000 * (mac))
+#define ENETC_PM_R255(mac)	(0x8180 + 0x1000 * (mac))
+#define ENETC_PM_R511(mac)	(0x8188 + 0x1000 * (mac))
+#define ENETC_PM_R1023(mac)	(0x8190 + 0x1000 * (mac))
+#define ENETC_PM_R1522(mac)	(0x8198 + 0x1000 * (mac))
+#define ENETC_PM_R1523X(mac)	(0x81A0 + 0x1000 * (mac))
+#define ENETC_PM_ROVR(mac)	(0x81A8 + 0x1000 * (mac))
+#define ENETC_PM_RJBR(mac)	(0x81B0 + 0x1000 * (mac))
+#define ENETC_PM_RFRG(mac)	(0x81B8 + 0x1000 * (mac))
+#define ENETC_PM_RCNP(mac)	(0x81C0 + 0x1000 * (mac))
+#define ENETC_PM_RDRNTP(mac)	(0x81C8 + 0x1000 * (mac))
+#define ENETC_PM_TEOCT(mac)	(0x8200 + 0x1000 * (mac))
+#define ENETC_PM_TOCT(mac)	(0x8208 + 0x1000 * (mac))
+#define ENETC_PM_TCRSE(mac)	(0x8210 + 0x1000 * (mac))
+#define ENETC_PM_TXPF(mac)	(0x8218 + 0x1000 * (mac))
+#define ENETC_PM_TFRM(mac)	(0x8220 + 0x1000 * (mac))
+#define ENETC_PM_TFCS(mac)	(0x8228 + 0x1000 * (mac))
+#define ENETC_PM_TVLAN(mac)	(0x8230 + 0x1000 * (mac))
+#define ENETC_PM_TERR(mac)	(0x8238 + 0x1000 * (mac))
+#define ENETC_PM_TUCA(mac)	(0x8240 + 0x1000 * (mac))
+#define ENETC_PM_TMCA(mac)	(0x8248 + 0x1000 * (mac))
+#define ENETC_PM_TBCA(mac)	(0x8250 + 0x1000 * (mac))
+#define ENETC_PM_TPKT(mac)	(0x8260 + 0x1000 * (mac))
+#define ENETC_PM_TUND(mac)	(0x8268 + 0x1000 * (mac))
+#define ENETC_PM_T64(mac)	(0x8270 + 0x1000 * (mac))
+#define ENETC_PM_T127(mac)	(0x8278 + 0x1000 * (mac))
+#define ENETC_PM_T255(mac)	(0x8280 + 0x1000 * (mac))
+#define ENETC_PM_T511(mac)	(0x8288 + 0x1000 * (mac))
+#define ENETC_PM_T1023(mac)	(0x8290 + 0x1000 * (mac))
+#define ENETC_PM_T1522(mac)	(0x8298 + 0x1000 * (mac))
+#define ENETC_PM_T1523X(mac)	(0x82A0 + 0x1000 * (mac))
+#define ENETC_PM_TCNP(mac)	(0x82C0 + 0x1000 * (mac))
+#define ENETC_PM_TDFR(mac)	(0x82D0 + 0x1000 * (mac))
+#define ENETC_PM_TMCOL(mac)	(0x82D8 + 0x1000 * (mac))
+#define ENETC_PM_TSCOL(mac)	(0x82E0 + 0x1000 * (mac))
+#define ENETC_PM_TLCOL(mac)	(0x82E8 + 0x1000 * (mac))
+#define ENETC_PM_TECOL(mac)	(0x82F0 + 0x1000 * (mac))
 
 /* Port counters */
 #define ENETC_PICDR(n)		(0x0700 + (n) * 8) /* n = [0..3] */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 6/7] net: enetc: expose some standardized ethtool counters
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (4 preceding siblings ...)
  2022-08-16 22:29 ` [RFC PATCH net-next 5/7] net: enetc: parameterize port MAC stats to also cover " Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-16 22:29 ` [RFC PATCH net-next 7/7] net: enetc: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

Structure the code in such a way that it can be reused later for the
pMAC statistics, by just changing the "mac" argument to 1.

Usage:
ethtool --include-statistics --show-pause eno2
ethtool -S eno0 --groups eth-mac
ethtool -S eno0 --groups eth-ctrl
ethtool -S eno0 --groups rmon

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../ethernet/freescale/enetc/enetc_ethtool.c  | 111 ++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index 236bb24ec999..25705b2c1be9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -301,6 +301,113 @@ static void enetc_get_ethtool_stats(struct net_device *ndev,
 		data[o++] = enetc_port_rd(hw, enetc_port_counters[i].reg);
 }
 
+static void enetc_get_pause_stats(struct net_device *ndev,
+				  struct ethtool_pause_stats *pause_stats)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	pause_stats->tx_pause_frames = enetc_port_rd(hw, ENETC_PM_TXPF(0));
+	pause_stats->rx_pause_frames = enetc_port_rd(hw, ENETC_PM_RXPF(0));
+}
+
+static void enetc_mac_stats(struct enetc_hw *hw, int mac,
+			    struct ethtool_eth_mac_stats *s)
+{
+	s->FramesTransmittedOK = enetc_port_rd(hw, ENETC_PM_TFRM(mac));
+	s->SingleCollisionFrames = enetc_port_rd(hw, ENETC_PM_TSCOL(mac));
+	s->MultipleCollisionFrames = enetc_port_rd(hw, ENETC_PM_TMCOL(mac));
+	s->FramesReceivedOK = enetc_port_rd(hw, ENETC_PM_RFRM(mac));
+	s->FrameCheckSequenceErrors = enetc_port_rd(hw, ENETC_PM_RFCS(mac));
+	s->AlignmentErrors = enetc_port_rd(hw, ENETC_PM_RALN(mac));
+	s->OctetsTransmittedOK = enetc_port_rd(hw, ENETC_PM_TEOCT(mac));
+	s->FramesWithDeferredXmissions = enetc_port_rd(hw, ENETC_PM_TDFR(mac));
+	s->LateCollisions = enetc_port_rd(hw, ENETC_PM_TLCOL(mac));
+	s->FramesAbortedDueToXSColls = enetc_port_rd(hw, ENETC_PM_TECOL(mac));
+	s->FramesLostDueToIntMACXmitError = enetc_port_rd(hw, ENETC_PM_TERR(mac));
+	s->CarrierSenseErrors = enetc_port_rd(hw, ENETC_PM_TCRSE(mac));
+	s->OctetsReceivedOK = enetc_port_rd(hw, ENETC_PM_REOCT(mac));
+	s->FramesLostDueToIntMACRcvError = enetc_port_rd(hw, ENETC_PM_RDRNTP(mac));
+	s->MulticastFramesXmittedOK = enetc_port_rd(hw, ENETC_PM_TMCA(mac));
+	s->BroadcastFramesXmittedOK = enetc_port_rd(hw, ENETC_PM_TBCA(mac));
+	s->MulticastFramesReceivedOK = enetc_port_rd(hw, ENETC_PM_RMCA(mac));
+	s->BroadcastFramesReceivedOK = enetc_port_rd(hw, ENETC_PM_RBCA(mac));
+}
+
+static void enetc_ctrl_stats(struct enetc_hw *hw, int mac,
+			     struct ethtool_eth_ctrl_stats *s)
+{
+	s->MACControlFramesTransmitted = enetc_port_rd(hw, ENETC_PM_TCNP(mac));
+	s->MACControlFramesReceived = enetc_port_rd(hw, ENETC_PM_RCNP(mac));
+}
+
+static const struct ethtool_rmon_hist_range enetc_rmon_ranges[] = {
+	{   64,   64 },
+	{   65,  127 },
+	{  128,  255 },
+	{  256,  511 },
+	{  512, 1023 },
+	{ 1024, 1522 },
+	{ 1523, 9000 },
+	{},
+};
+
+static void enetc_rmon_stats(struct enetc_hw *hw, int mac,
+			     struct ethtool_rmon_stats *s,
+			     const struct ethtool_rmon_hist_range **ranges)
+{
+	s->undersize_pkts = enetc_port_rd(hw, ENETC_PM_RUND(mac));
+	s->oversize_pkts = enetc_port_rd(hw, ENETC_PM_ROVR(mac));
+	s->fragments = enetc_port_rd(hw, ENETC_PM_RFRG(mac));
+	s->jabbers = enetc_port_rd(hw, ENETC_PM_RJBR(mac));
+
+	s->hist[0] = enetc_port_rd(hw, ENETC_PM_R64(mac));
+	s->hist[1] = enetc_port_rd(hw, ENETC_PM_R127(mac));
+	s->hist[2] = enetc_port_rd(hw, ENETC_PM_R255(mac));
+	s->hist[3] = enetc_port_rd(hw, ENETC_PM_R511(mac));
+	s->hist[4] = enetc_port_rd(hw, ENETC_PM_R1023(mac));
+	s->hist[5] = enetc_port_rd(hw, ENETC_PM_R1522(mac));
+	s->hist[6] = enetc_port_rd(hw, ENETC_PM_R1523X(mac));
+
+	s->hist_tx[0] = enetc_port_rd(hw, ENETC_PM_T64(mac));
+	s->hist_tx[1] = enetc_port_rd(hw, ENETC_PM_T127(mac));
+	s->hist_tx[2] = enetc_port_rd(hw, ENETC_PM_T255(mac));
+	s->hist_tx[3] = enetc_port_rd(hw, ENETC_PM_T511(mac));
+	s->hist_tx[4] = enetc_port_rd(hw, ENETC_PM_T1023(mac));
+	s->hist_tx[5] = enetc_port_rd(hw, ENETC_PM_T1522(mac));
+	s->hist_tx[6] = enetc_port_rd(hw, ENETC_PM_T1523X(mac));
+
+	*ranges = enetc_rmon_ranges;
+}
+
+static void enetc_get_eth_mac_stats(struct net_device *ndev,
+				    struct ethtool_eth_mac_stats *mac_stats)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	enetc_mac_stats(hw, 0, mac_stats);
+}
+
+static void enetc_get_eth_ctrl_stats(struct net_device *ndev,
+				     struct ethtool_eth_ctrl_stats *ctrl_stats)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	enetc_ctrl_stats(hw, 0, ctrl_stats);
+}
+
+static void enetc_get_rmon_stats(struct net_device *ndev,
+				 struct ethtool_rmon_stats *rmon_stats,
+				 const struct ethtool_rmon_hist_range **ranges)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	enetc_rmon_stats(hw, 0, rmon_stats, ranges);
+}
+
 #define ENETC_RSSHASH_L3 (RXH_L2DA | RXH_VLAN | RXH_L3_PROTO | RXH_IP_SRC | \
 			  RXH_IP_DST)
 #define ENETC_RSSHASH_L4 (ENETC_RSSHASH_L3 | RXH_L4_B_0_1 | RXH_L4_B_2_3)
@@ -766,6 +873,10 @@ static const struct ethtool_ops enetc_pf_ethtool_ops = {
 	.get_sset_count = enetc_get_sset_count,
 	.get_strings = enetc_get_strings,
 	.get_ethtool_stats = enetc_get_ethtool_stats,
+	.get_pause_stats = enetc_get_pause_stats,
+	.get_rmon_stats = enetc_get_rmon_stats,
+	.get_eth_ctrl_stats = enetc_get_eth_ctrl_stats,
+	.get_eth_mac_stats = enetc_get_eth_mac_stats,
 	.get_rxnfc = enetc_get_rxnfc,
 	.set_rxnfc = enetc_set_rxnfc,
 	.get_rxfh_key_size = enetc_get_rxfh_key_size,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC PATCH net-next 7/7] net: enetc: add support for Frame Preemption and MAC Merge layer
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (5 preceding siblings ...)
  2022-08-16 22:29 ` [RFC PATCH net-next 6/7] net: enetc: expose some standardized ethtool counters Vladimir Oltean
@ 2022-08-16 22:29 ` Vladimir Oltean
  2022-08-17  3:34 ` [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Jakub Kicinski
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-16 22:29 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

Add PF driver support for the following:

- Viewing and changing the Frame Preemption parameters. Currently only
  the admin-status (preemptable/express per priority) is exposed.
  The spec also mentions HoldAdvance and ReleaseAdvance as times in ns,
  however the ENETC treats these as bytes, and we currently lack the
  proper logic to do the bit time <-> byte count conversion as the link
  speed changes.

- Viewing and changing the MAC Merge sublayer parameters, and seeing the
  verification state machine's current state. The verification handshake
  with the link partner is driven by hardware.

- Viewing the standardized MAC Merge layer counters.

- Viewing the standardized Ethernet MAC and RMON counters associated
  with the pMAC.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 .../ethernet/freescale/enetc/enetc_ethtool.c  | 164 ++++++++++++++++++
 .../net/ethernet/freescale/enetc/enetc_hw.h   |  26 ++-
 2 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index 25705b2c1be9..9a6869c7663a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -408,6 +408,34 @@ static void enetc_get_rmon_stats(struct net_device *ndev,
 	enetc_rmon_stats(hw, 0, rmon_stats, ranges);
 }
 
+static void enetc_get_pmac_mac_stats(struct net_device *ndev,
+				     struct ethtool_eth_mac_stats *mac_stats)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	enetc_mac_stats(hw, 1, mac_stats);
+}
+
+static void enetc_get_pmac_ctrl_stats(struct net_device *ndev,
+				      struct ethtool_eth_ctrl_stats *ctrl_stats)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	enetc_ctrl_stats(hw, 1, ctrl_stats);
+}
+
+static void enetc_get_pmac_rmon_stats(struct net_device *ndev,
+				      struct ethtool_rmon_stats *rmon_stats,
+				      const struct ethtool_rmon_hist_range **ranges)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	enetc_rmon_stats(hw, 1, rmon_stats, ranges);
+}
+
 #define ENETC_RSSHASH_L3 (RXH_L2DA | RXH_VLAN | RXH_L3_PROTO | RXH_IP_SRC | \
 			  RXH_IP_DST)
 #define ENETC_RSSHASH_L4 (ENETC_RSSHASH_L3 | RXH_L4_B_0_1 | RXH_L4_B_2_3)
@@ -864,6 +892,134 @@ static int enetc_set_link_ksettings(struct net_device *dev,
 	return phylink_ethtool_ksettings_set(priv->phylink, cmd);
 }
 
+static void enetc_get_fp_param(struct net_device *ndev,
+			       struct ethtool_fp_param *params)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	u32 val;
+	int tc;
+
+	for (tc = 0; tc < 8; tc++) {
+		val = enetc_port_rd(&priv->si->hw, ENETC_PTCFPR(tc));
+
+		if (val & ENETC_PTCFPR_FPE)
+			params->preemptable_prios |= BIT(tc);
+	}
+}
+
+static int enetc_set_fp_param(struct net_device *ndev,
+			      const struct ethtool_fp_param *params,
+			      struct netlink_ext_ack *extack)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	u32 val;
+	int tc;
+
+	for (tc = 0; tc < 8; tc++) {
+		val = enetc_port_rd(&priv->si->hw, ENETC_PTCFPR(tc));
+
+		if (params->preemptable_prios & BIT(tc))
+			val |= ENETC_PTCFPR_FPE;
+		else
+			val &= ~ENETC_PTCFPR_FPE;
+
+		enetc_port_wr(&priv->si->hw, ENETC_PTCFPR(tc), val);
+	}
+
+	return 0;
+}
+
+static void enetc_get_mm_state(struct net_device *ndev,
+			       struct ethtool_mm_state *state)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	val = enetc_port_rd(&priv->si->hw, ENETC_MMCSR);
+
+	switch (ENETC_MMCSR_GET_VSTS(val)) {
+	case 0:
+		state->verify_status = ETHTOOL_MM_VERIFY_STATUS_DISABLED;
+		break;
+	case 1:
+		state->verify_status = ETHTOOL_MM_VERIFY_STATUS_INITIAL;
+		break;
+	case 2:
+		state->verify_status = ETHTOOL_MM_VERIFY_STATUS_VERIFYING;
+		break;
+	case 3:
+		state->verify_status = ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED;
+		break;
+	case 4:
+		state->verify_status = ETHTOOL_MM_VERIFY_STATUS_FAILED;
+		break;
+	case 5:
+	default:
+		state->verify_status = ETHTOOL_MM_VERIFY_STATUS_UNKNOWN;
+		break;
+	}
+
+	state->supported = val & ENETC_MMCSR_LPS;
+	state->enabled = val & ENETC_MMCSR_LPE;
+	state->active = val & ENETC_MMCSR_LPA;
+	state->add_frag_size = ENETC_MMCSR_GET_RAFS(val);
+	state->verify_time = ENETC_MMCSR_GET_VT(val);
+}
+
+static int enetc_set_mm_cfg(struct net_device *ndev,
+			    struct ethtool_mm_cfg *cfg,
+			    struct netlink_ext_ack *extack)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (cfg->add_frag_size > 3) {
+		NL_SET_ERR_MSG_MOD(extack, "addFragSize out of range 0-3");
+		return -ERANGE;
+	}
+
+	if (cfg->verify_time < 1 || cfg->verify_time > 128) {
+		NL_SET_ERR_MSG_MOD(extack, "verifyTime out of range 1-128 ms");
+		return -ERANGE;
+	}
+
+	val = enetc_port_rd(&priv->si->hw, ENETC_MMCSR);
+
+	if (cfg->verify_disable)
+		val |= ENETC_MMCSR_VDIS;
+	else
+		val &= ~ENETC_MMCSR_VDIS;
+
+	if (cfg->enabled)
+		val |= ENETC_MMCSR_ME;
+	else
+		val &= ~ENETC_MMCSR_ME;
+
+	val &= ~ENETC_MMCSR_VT_MASK;
+	val |= ENETC_MMCSR_VT(cfg->verify_time);
+
+	val &= ~ENETC_MMCSR_RAFS_MASK;
+	val |= ENETC_MMCSR_RAFS(cfg->add_frag_size);
+
+	enetc_port_wr(&priv->si->hw, ENETC_MMCSR, val);
+
+	return 0;
+}
+
+static void enetc_get_mm_stats(struct net_device *ndev,
+			       struct ethtool_mm_stats *s)
+{
+	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct enetc_hw *hw = &priv->si->hw;
+
+	s->MACMergeFrameAssErrorCount = enetc_port_rd(hw, ENETC_MMFAECR);
+	s->MACMergeFrameSmdErrorCount = enetc_port_rd(hw, ENETC_MMFSECR);
+	s->MACMergeFrameAssOkCount = enetc_port_rd(hw, ENETC_MMFAOCR);
+	s->MACMergeFragCountRx = enetc_port_rd(hw, ENETC_MMFCRXR);
+	s->MACMergeFragCountTx = enetc_port_rd(hw, ENETC_MMFCTXR);
+	s->MACMergeHoldCount = enetc_port_rd(hw, ENETC_MMHCR);
+}
+
 static const struct ethtool_ops enetc_pf_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
 				     ETHTOOL_COALESCE_MAX_FRAMES |
@@ -877,6 +1033,9 @@ static const struct ethtool_ops enetc_pf_ethtool_ops = {
 	.get_rmon_stats = enetc_get_rmon_stats,
 	.get_eth_ctrl_stats = enetc_get_eth_ctrl_stats,
 	.get_eth_mac_stats = enetc_get_eth_mac_stats,
+	.get_pmac_rmon_stats = enetc_get_pmac_rmon_stats,
+	.get_eth_pmac_ctrl_stats = enetc_get_pmac_ctrl_stats,
+	.get_eth_pmac_mac_stats = enetc_get_pmac_mac_stats,
 	.get_rxnfc = enetc_get_rxnfc,
 	.set_rxnfc = enetc_set_rxnfc,
 	.get_rxfh_key_size = enetc_get_rxfh_key_size,
@@ -894,6 +1053,11 @@ static const struct ethtool_ops enetc_pf_ethtool_ops = {
 	.set_wol = enetc_set_wol,
 	.get_pauseparam = enetc_get_pauseparam,
 	.set_pauseparam = enetc_set_pauseparam,
+	.get_fp_param = enetc_get_fp_param,
+	.set_fp_param = enetc_set_fp_param,
+	.get_mm_state = enetc_get_mm_state,
+	.set_mm_cfg = enetc_set_mm_cfg,
+	.get_mm_stats = enetc_get_mm_stats,
 };
 
 static const struct ethtool_ops enetc_vf_ethtool_ops = {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 0b85e37a00eb..b39ef6f03b3d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -222,7 +222,27 @@ enum enetc_bdr_type {TX, RX};
 #define ENETC_PSIVHFR0(n)	(0x1e00 + (n) * 8) /* n = SI index */
 #define ENETC_PSIVHFR1(n)	(0x1e04 + (n) * 8) /* n = SI index */
 #define ENETC_MMCSR		0x1f00
-#define ENETC_MMCSR_ME		BIT(16)
+#define ENETC_MMCSR_VT_MASK	GENMASK(29, 23) /* Verify Time */
+#define ENETC_MMCSR_VT(x)	(((x) << 23) & ENETC_MMCSR_VT_MASK)
+#define ENETC_MMCSR_GET_VT(x)	(((x) & ENETC_MMCSR_VT_MASK) >> 23)
+#define ENETC_MMCSR_TXSTS_MASK	GENMASK(22, 21) /* Merge Status */
+#define ENETC_MMCSR_GET_TXSTS(x) (((x) & ENETC_MMCSR_TXSTS_MASK) >> 21)
+#define ENETC_MMCSR_VSTS_MASK	GENMASK(20, 18) /* Verify Status */
+#define ENETC_MMCSR_GET_VSTS(x) (((x) & ENETC_MMCSR_VSTS_MASK) >> 18)
+#define ENETC_MMCSR_VDIS	BIT(17) /* Verify Disabled */
+#define ENETC_MMCSR_ME		BIT(16) /* Merge Enabled */
+#define ENETC_MMCSR_RAFS_MASK	GENMASK(9, 8) /* Remote Additional Fragment Size */
+#define ENETC_MMCSR_RAFS(x)	(((x) << 8) & ENETC_MMCSR_RAFS_MASK)
+#define ENETC_MMCSR_GET_RAFS(x)	(((x) & ENETC_MMCSR_RAFS_MASK) >> 8)
+#define ENETC_MMCSR_LPA		BIT(2) /* Local Preemption Active */
+#define ENETC_MMCSR_LPE		BIT(1) /* Local Preemption Enabled */
+#define ENETC_MMCSR_LPS		BIT(0) /* Local Preemption Supported */
+#define ENETC_MMFAECR		0x1f08
+#define ENETC_MMFSECR		0x1f0c
+#define ENETC_MMFAOCR		0x1f10
+#define ENETC_MMFCRXR		0x1f14
+#define ENETC_MMFCTXR		0x1f18
+#define ENETC_MMHCR		0x1f1c
 #define ENETC_PTCMSDUR(n)	(0x2020 + (n) * 4) /* n = TC index [0..7] */
 
 #define ENETC_PM0_CMD_CFG	0x8008
@@ -944,6 +964,10 @@ static inline u32 enetc_usecs_to_cycles(u32 usecs)
 	return (u32)div_u64(usecs * ENETC_CLK, 1000000ULL);
 }
 
+/* Port traffic class frame preemption register */
+#define ENETC_PTCFPR(n)			(0x1910 + (n) * 4) /* n = [0 ..7] */
+#define ENETC_PTCFPR_FPE		BIT(31)
+
 /* port time gating control register */
 #define ENETC_QBV_PTGCR_OFFSET		0x11a00
 #define ENETC_QBV_TGE			BIT(31)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-16 22:29 ` [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
@ 2022-08-17  3:22   ` Jakub Kicinski
  2022-08-17 11:41     ` Vladimir Oltean
  2022-08-17 23:15   ` Vinicius Costa Gomes
  1 sibling, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2022-08-17  3:22 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, 17 Aug 2022 01:29:15 +0300 Vladimir Oltean wrote:
> +/**
> + * struct ethtool_mm_state - 802.3 MAC merge layer state
> + */
> +struct ethtool_mm_state {
> +	u32 verify_time;
> +	enum ethtool_mm_verify_status verify_status;
> +	bool supported;
> +	bool enabled;
> +	bool active;

The enabled vs active piqued my interest. Is there some handshake /
aneg or such?

> +	nest_table = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_TABLE);
> +	if (!nest_table)
> +		return -EMSGSIZE;

Don't warp tables in nests, let the elements repeat in the parent.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (6 preceding siblings ...)
  2022-08-16 22:29 ` [RFC PATCH net-next 7/7] net: enetc: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
@ 2022-08-17  3:34 ` Jakub Kicinski
  2022-08-17 11:50   ` Vladimir Oltean
  2022-08-17 22:47 ` Vinicius Costa Gomes
  2022-08-19  8:16 ` Kurt Kanzenbach
  9 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2022-08-17  3:34 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, 17 Aug 2022 01:29:13 +0300 Vladimir Oltean wrote:
>   What also exists but is not exported here are PAUSE stats for the
>   pMAC. Since those are also protocol-specific stats, I'm not sure how
>   to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend
>   ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with
>   the pMAC variants?

I have a couple of general questions. The mm and fp are related but fp
can be implemented without mm or they must always come together? (I'd
still split patch 2 for ease of review, tho.)

When we have separate set of stats for pMAC the normal stats are sum of
all traffic, right? So normal - pMAC == eMAC, everything that's not
preemptible is express?

Did you consider adding an attribute for switching between MAC and pMAC
for stats rather than duplicating things?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool()
  2022-08-16 22:29 ` [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool() Vladimir Oltean
@ 2022-08-17 11:27   ` Michal Kubecek
  2022-08-17 11:52     ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Michal Kubecek @ 2022-08-17 11:27 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

[-- Attachment #1: Type: text/plain, Size: 2160 bytes --]

On Wed, Aug 17, 2022 at 01:29:14AM +0300, Vladimir Oltean wrote:
> For a reason I can't really understand, ethnl_update_bool32() exists,
> but the plain function that operates on a boolean value kept in an
> actual u8 netlink attribute doesn't.

I can explain that: at the moment these helpers were introduced, only
members of traditional structures shared with ioctl interface were
updated and all attributes which were booleans logically were
represented as u32 in them so that no other helper was needed back then.


>                                       Introduce it; it's needed for
> things like verify-disabled for the MAC merge configuration.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/ethtool/netlink.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
> index c0d587611854..1653fd2cf0cf 100644
> --- a/net/ethtool/netlink.h
> +++ b/net/ethtool/netlink.h
> @@ -111,6 +111,32 @@ static inline void ethnl_update_u8(u8 *dst, const struct nlattr *attr,
>  	*mod = true;
>  }
>  
> +/**
> + * ethnl_update_bool() - update bool from NLA_U8 attribute
> + * @dst:  value to update
> + * @attr: netlink attribute with new value or null
> + * @mod:  pointer to bool for modification tracking
> + *
> + * Use the u8 value from NLA_U8 netlink attribute @attr to set bool variable
> + * pointed to by @dst to false (if zero) or 1 (if not); do nothing if @attr is
                                               ^ true

Looks good otherwise.

Michal

> + * null. Bool pointed to by @mod is set to true if this function changed the
> + * logical value of *dst, otherwise it is left as is.
> + */
> +static inline void ethnl_update_bool(bool *dst, const struct nlattr *attr,
> +				     bool *mod)
> +{
> +	u8 val;
> +
> +	if (!attr)
> +		return;
> +	val = !!nla_get_u8(attr);
> +	if (*dst == val)
> +		return;
> +
> +	*dst = val;
> +	*mod = true;
> +}
> +
>  /**
>   * ethnl_update_bool32() - update u32 used as bool from NLA_U8 attribute
>   * @dst:  value to update
> -- 
> 2.34.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-17  3:22   ` Jakub Kicinski
@ 2022-08-17 11:41     ` Vladimir Oltean
  2022-08-17 18:35       ` Jakub Kicinski
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-17 11:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Tue, Aug 16, 2022 at 08:22:09PM -0700, Jakub Kicinski wrote:
> On Wed, 17 Aug 2022 01:29:15 +0300 Vladimir Oltean wrote:
> > +/**
> > + * struct ethtool_mm_state - 802.3 MAC merge layer state
> > + */
> > +struct ethtool_mm_state {
> > +	u32 verify_time;
> > +	enum ethtool_mm_verify_status verify_status;
> > +	bool supported;
> > +	bool enabled;
> > +	bool active;
> 
> The enabled vs active piqued my interest. Is there some handshake /
> aneg or such?

This is where writing the kdocs, as mentioned in the cover letter, would
have helped. Yes, the handshake is described in 802.3 clause "99.4.3
Verifying preemption operation". It says:

| Verification (see Figure 99–3) checks that the link can support the
| preemption capability.
|
| If verification is enabled, the preemption capability shall be active
| only after verification has completed
| successfully.
|
| If the preemption capability is enabled but has not been verified yet,
| the MAC Merge sublayer initiates verification. Verification relies on
| the transmission of a verify mPacket and receipt of a respond mPacket to
| confirm that the remote station supports the preemption capability.

In fact, the verify_time and verify_status fields right above
enabled/active are related exactly to this handshake process.

> > +	nest_table = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_TABLE);
> > +	if (!nest_table)
> > +		return -EMSGSIZE;
> 
> Don't warp tables in nests, let the elements repeat in the parent.

Ok, can do. I did this because 802.1Q actually specifies in the
IEEE8021-Preemption-MIB that there is a ieee8021PreemptionParameterTable
structure containing pairs of ieee8021PreemptionPriority and
ieee8021FramePreemptionAdminStatus.

Do you have actual issues with the structuring of the FP parameters
though? They look like this currently:

ethtool --json --show-fp eno0
[ {
        "ifname": "eno0",
        "parameter-table": [ {
                "prio": 0,
                "admin-status": "preemptable"
            },{
                "prio": 1,
                "admin-status": "preemptable"
            },{
                "prio": 2,
                "admin-status": "preemptable"
            },{
                "prio": 3,
                "admin-status": "preemptable"
            },{
                "prio": 4,
                "admin-status": "preemptable"
            },{
                "prio": 5,
                "admin-status": "preemptable"
            },{
                "prio": 6,
                "admin-status": "preemptable"
            },{
                "prio": 7,
                "admin-status": "preemptable"
            } ],
        "hold-advance": 0,
        "release-advance": 0
    } ]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-17  3:34 ` [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Jakub Kicinski
@ 2022-08-17 11:50   ` Vladimir Oltean
  2022-08-17 18:46     ` Jakub Kicinski
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-17 11:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Tue, Aug 16, 2022 at 08:34:17PM -0700, Jakub Kicinski wrote:
> On Wed, 17 Aug 2022 01:29:13 +0300 Vladimir Oltean wrote:
> >   What also exists but is not exported here are PAUSE stats for the
> >   pMAC. Since those are also protocol-specific stats, I'm not sure how
> >   to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend
> >   ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with
> >   the pMAC variants?
> 
> I have a couple of general questions. The mm and fp are related but fp
> can be implemented without mm or they must always come together? (I'd
> still split patch 2 for ease of review, tho.)

FP cannot be implemented without MM and MM makes limited (but some)
sense without FP. Since FP just decides which packets you TX via the
pMAC and which via the eMAC, you can configure just the MM layer such
that you interoperate with a FP-capable switch, but you don't actually
generate any preemptable traffic yourself.

In fact, the reasons why I decided to split these are:
- because they are part of different specs, which call for different
  managed objects
- because in an SoC where IPs are mixed and matched from different
  vendors, it makes perfect sense to me that the FP portion (more
  related to the queue/classification system) is provided by one vendor,
  and the MM portion is provided by another. In the future, we may find
  enough commonalities to justify introducing the concept of a dedicated
  MAC driver, independent/reusable between Ethernet controller ("net_device")
  drivers. We have this today already with the PCS layer in phylink.
  So if there is a physical split between the layers, I think keeping a
  split in terms of callbacks makes some sense too.

> When we have separate set of stats for pMAC the normal stats are sum of
> all traffic, right? So normal - pMAC == eMAC, everything that's not
> preemptible is express?

Actually not quite, or at least not for the LS1028A ENETC and Felix switch.
The normal counters report just what the eMAC sees, and the pMAC counters
just what the pMAC sees. After all, only the eMAC was enabled up until now.
Nobody does the addition currently.

> Did you consider adding an attribute for switching between MAC and pMAC
> for stats rather than duplicating things?

No. Could you expand on that idea a little? Add a netlink attribute
where, and this helps reduce duplication where, and how?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool()
  2022-08-17 11:27   ` Michal Kubecek
@ 2022-08-17 11:52     ` Vladimir Oltean
  2022-08-17 12:00       ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-17 11:52 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, Aug 17, 2022 at 01:27:29PM +0200, Michal Kubecek wrote:
> On Wed, Aug 17, 2022 at 01:29:14AM +0300, Vladimir Oltean wrote:
> > For a reason I can't really understand, ethnl_update_bool32() exists,
> > but the plain function that operates on a boolean value kept in an
> > actual u8 netlink attribute doesn't.
> 
> I can explain that: at the moment these helpers were introduced, only
> members of traditional structures shared with ioctl interface were
> updated and all attributes which were booleans logically were
> represented as u32 in them so that no other helper was needed back then.

Thanks, but the internal data structures of the kernel did not
necessitate boolean netlink attributes to be promoted to u32 just
because the ioctl interface did it that way; or did they?

Or otherwise said, is there a technical requirement that if a boolean is
passed to the kernel as u32 via ioctl, it should be passed as u32 via
netlink too?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool()
  2022-08-17 11:52     ` Vladimir Oltean
@ 2022-08-17 12:00       ` Vladimir Oltean
  2022-08-17 12:05         ` Michal Kubecek
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-17 12:00 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, Aug 17, 2022 at 02:52:56PM +0300, Vladimir Oltean wrote:
> On Wed, Aug 17, 2022 at 01:27:29PM +0200, Michal Kubecek wrote:
> > On Wed, Aug 17, 2022 at 01:29:14AM +0300, Vladimir Oltean wrote:
> > > For a reason I can't really understand, ethnl_update_bool32() exists,
> > > but the plain function that operates on a boolean value kept in an
> > > actual u8 netlink attribute doesn't.
> > 
> > I can explain that: at the moment these helpers were introduced, only
> > members of traditional structures shared with ioctl interface were
> > updated and all attributes which were booleans logically were
> > represented as u32 in them so that no other helper was needed back then.
> 
> Thanks, but the internal data structures of the kernel did not
> necessitate boolean netlink attributes to be promoted to u32 just
> because the ioctl interface did it that way; or did they?
> 
> Or otherwise said, is there a technical requirement that if a boolean is
> passed to the kernel as u32 via ioctl, it should be passed as u32 via
> netlink too?

Ah, don't mind me... By the time I wrote this email, I forgot that
ethnl_update_bool32() also calls nla_get_u8(). All clear now.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool()
  2022-08-17 12:00       ` Vladimir Oltean
@ 2022-08-17 12:05         ` Michal Kubecek
  0 siblings, 0 replies; 37+ messages in thread
From: Michal Kubecek @ 2022-08-17 12:05 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

[-- Attachment #1: Type: text/plain, Size: 1632 bytes --]

On Wed, Aug 17, 2022 at 12:00:37PM +0000, Vladimir Oltean wrote:
> On Wed, Aug 17, 2022 at 02:52:56PM +0300, Vladimir Oltean wrote:
> > On Wed, Aug 17, 2022 at 01:27:29PM +0200, Michal Kubecek wrote:
> > > On Wed, Aug 17, 2022 at 01:29:14AM +0300, Vladimir Oltean wrote:
> > > > For a reason I can't really understand, ethnl_update_bool32() exists,
> > > > but the plain function that operates on a boolean value kept in an
> > > > actual u8 netlink attribute doesn't.
> > > 
> > > I can explain that: at the moment these helpers were introduced, only
> > > members of traditional structures shared with ioctl interface were
> > > updated and all attributes which were booleans logically were
> > > represented as u32 in them so that no other helper was needed back then.
> > 
> > Thanks, but the internal data structures of the kernel did not
> > necessitate boolean netlink attributes to be promoted to u32 just
> > because the ioctl interface did it that way; or did they?
> > 
> > Or otherwise said, is there a technical requirement that if a boolean is
> > passed to the kernel as u32 via ioctl, it should be passed as u32 via
> > netlink too?
> 
> Ah, don't mind me... By the time I wrote this email, I forgot that
> ethnl_update_bool32() also calls nla_get_u8(). All clear now.

Yes, it expects a NLA_U8 attribute. But it does not really matter that
much as even NLA_U8 still occupies 4 bytes in the message because of the
padding. (Which is why I generally prefer using NLA_U32 attributes
unless the value is naturally 8- or 16-bit and we are absolutely sure we
will never need more.)

Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-17 11:41     ` Vladimir Oltean
@ 2022-08-17 18:35       ` Jakub Kicinski
  0 siblings, 0 replies; 37+ messages in thread
From: Jakub Kicinski @ 2022-08-17 18:35 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, 17 Aug 2022 11:41:01 +0000 Vladimir Oltean wrote:
> > > +	nest_table = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_TABLE);
> > > +	if (!nest_table)
> > > +		return -EMSGSIZE;  
> > 
> > Don't warp tables in nests, let the elements repeat in the parent.  
> 
> Ok, can do. I did this because 802.1Q actually specifies in the
> IEEE8021-Preemption-MIB that there is a ieee8021PreemptionParameterTable
> structure containing pairs of ieee8021PreemptionPriority and
> ieee8021FramePreemptionAdminStatus.

Yeah, netlink is a bit special in array definition. I need to document
it well in my YAML netlink patches..

> Do you have actual issues with the structuring of the FP parameters
> though? They look like this currently

IDK how to put it best, I shared my largely uninformed thoughts, you
know much more about the spec and HW so whatever you think is most
appropriate is fine by me.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-17 11:50   ` Vladimir Oltean
@ 2022-08-17 18:46     ` Jakub Kicinski
  2022-10-01 15:53       ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2022-08-17 18:46 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, 17 Aug 2022 11:50:09 +0000 Vladimir Oltean wrote:
> On Tue, Aug 16, 2022 at 08:34:17PM -0700, Jakub Kicinski wrote:
> > I have a couple of general questions. The mm and fp are related but fp
> > can be implemented without mm or they must always come together? (I'd
> > still split patch 2 for ease of review, tho.)  
> 
> FP cannot be implemented without MM and MM makes limited (but some)
> sense without FP. Since FP just decides which packets you TX via the
> pMAC and which via the eMAC, you can configure just the MM layer such
> that you interoperate with a FP-capable switch, but you don't actually
> generate any preemptable traffic yourself.
> 
> In fact, the reasons why I decided to split these are:
> - because they are part of different specs, which call for different
>   managed objects
> - because in an SoC where IPs are mixed and matched from different
>   vendors, it makes perfect sense to me that the FP portion (more
>   related to the queue/classification system) is provided by one vendor,
>   and the MM portion is provided by another. In the future, we may find
>   enough commonalities to justify introducing the concept of a dedicated
>   MAC driver, independent/reusable between Ethernet controller ("net_device")
>   drivers. We have this today already with the PCS layer in phylink.
>   So if there is a physical split between the layers, I think keeping a
>   split in terms of callbacks makes some sense too.

Hah, interesting. I was under the impression that FP can be done
without MM, if frame is preempted it just gets scrambled (bad FCS 
gets injected or a special symbol) and dropped by the receiver.
I had it completely backwards, then.

> > When we have separate set of stats for pMAC the normal stats are sum of
> > all traffic, right? So normal - pMAC == eMAC, everything that's not
> > preemptible is express?  
> 
> Actually not quite, or at least not for the LS1028A ENETC and Felix switch.
> The normal counters report just what the eMAC sees, and the pMAC counters
> just what the pMAC sees. After all, only the eMAC was enabled up until now.
> Nobody does the addition currently.

I see. And the netdev stats are the total?

> > Did you consider adding an attribute for switching between MAC and pMAC
> > for stats rather than duplicating things?  
> 
> No. Could you expand on that idea a little? Add a netlink attribute
> where, and this helps reduce duplication where, and how?

Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it
ETHTOOL_A_STATS_EXPRESS, a flag.

Plumb thru to all the stats callback an extra argument 
(a structure for future extensibility) with a bool pMAC;

Add a capability field to ethtool_ops to announce that
driver will pay attention to the bool pMAC / has support.

We can then use the existing callbacks.

Am I making sense?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (7 preceding siblings ...)
  2022-08-17  3:34 ` [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Jakub Kicinski
@ 2022-08-17 22:47 ` Vinicius Costa Gomes
  2022-08-19  8:16 ` Kurt Kanzenbach
  9 siblings, 0 replies; 37+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-17 22:47 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Xiaoliang Yang, Kurt Kanzenbach,
	Rui Sousa, Ferenc Fejes

Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> Vinicius' progress on upstreaming frame preemption support for Intel I226
> seemed to stall, so I decided to give it a go using my own view as well.
> https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/

I was stuck with some internal projects (and some other things) for some
time which left me with very little energy/time to follow up with that
series.

Just let's say that your timing was very good, a few more days I would
have sent another version. I am kind of glad that you decided to take
this torch.

>
> Please don't take this patch set too seriously; I spent only a few
> days working on this, and I'm only posting it as RFC to inform others
> I've started doing this, before I spend too much time to risk colliding
> with someone else's active work.
>
> Compared to Vinicius' previous patches, this is basically a new
> implementation, with the following differences:
>
> - The MAC Merge (mm) and Frame Preemption (fp) settings are split like
>   they were in Vinicius' proposal to have fp as part of tc-taprio. But
>   in my proposal, the fp portion is still part of ethtool, like mm.
>

I have some questions/comments about this part. Mostly related to
"prios" in this context. Will make them in the UAPI patches.

> - We have statistics, actually 2 kinds. First we have MAC merge layer
>   stats, which are exposed as protocol-specific stats:
>
>   ethtool --json --include-statistics --show-mm eno2
>   [ {
>           "ifname": "eno2",
>           "verify-status": "SUCCEEDED",
>           "verify-time": 10,
>           "supported": true,
>           "enabled": true,
>           "active": true,
>           "add-frag-size": 0,
>           "statistics": {
>               "MACMergeFrameAssErrorCount": 0,
>               "MACMergeFrameSmdErrorCount": 0,
>               "MACMergeFrameAssOkCount": 0,
>               "MACMergeFragCountRx": 0,
>               "MACMergeFragCountTx": 0,
>               "MACMergeHoldCount": 0
>           }
>       } ]
>
>   and then we also have the usual standardized statistics counters, but
>   replicated for the pMAC:
>
>   ethtool -S eno0 --groups pmac-rmon
>   Standard stats for eno0:
>   pmac-rmon-etherStatsUndersizePkts: 0
>   pmac-rmon-etherStatsOversizePkts: 0
>   pmac-rmon-etherStatsFragments: 0
>   pmac-rmon-etherStatsJabbers: 0
>   rx-pmac-rmon-etherStatsPkts64to64Octets: 0
>   rx-pmac-rmon-etherStatsPkts65to127Octets: 0
>   rx-pmac-rmon-etherStatsPkts128to255Octets: 0
>   rx-pmac-rmon-etherStatsPkts256to511Octets: 0
>   rx-pmac-rmon-etherStatsPkts512to1023Octets: 0
>   rx-pmac-rmon-etherStatsPkts1024to1522Octets: 0
>   rx-pmac-rmon-etherStatsPkts1523to9000Octets: 0
>   tx-pmac-rmon-etherStatsPkts64to64Octets: 0
>   tx-pmac-rmon-etherStatsPkts65to127Octets: 0
>   tx-pmac-rmon-etherStatsPkts128to255Octets: 0
>   tx-pmac-rmon-etherStatsPkts256to511Octets: 0
>   tx-pmac-rmon-etherStatsPkts512to1023Octets: 0
>   tx-pmac-rmon-etherStatsPkts1024to1522Octets: 0
>   tx-pmac-rmon-etherStatsPkts1523to9000Octets: 0
>
>   ethtool -S eno0 --groups eth-pmac-mac
>   Standard stats for eno0:
>   eth-pmac-mac-FramesTransmittedOK: 0
>   eth-pmac-mac-SingleCollisionFrames: 0
>   eth-pmac-mac-MultipleCollisionFrames: 0
>   eth-pmac-mac-FramesReceivedOK: 0
>   eth-pmac-mac-FrameCheckSequenceErrors: 0
>   eth-pmac-mac-AlignmentErrors: 0
>   eth-pmac-mac-OctetsTransmittedOK: 0
>   eth-pmac-mac-FramesWithDeferredXmissions: 0
>   eth-pmac-mac-LateCollisions: 0
>   eth-pmac-mac-FramesAbortedDueToXSColls: 0
>   eth-pmac-mac-FramesLostDueToIntMACXmitError: 0
>   eth-pmac-mac-CarrierSenseErrors: 0
>   eth-pmac-mac-OctetsReceivedOK: 0
>   eth-pmac-mac-FramesLostDueToIntMACRcvError: 0
>   eth-pmac-mac-MulticastFramesXmittedOK: 0
>   eth-pmac-mac-BroadcastFramesXmittedOK: 0
>   eth-pmac-mac-MulticastFramesReceivedOK: 0
>   eth-pmac-mac-BroadcastFramesReceivedOK: 0
>
>   ethtool -S eno0 --groups eth-pmac-ctrl
>   Standard stats for eno0:
>   eth-pmac-ctrl-MACControlFramesTransmitted: 0
>   eth-pmac-ctrl-MACControlFramesReceived: 0
>
>   What also exists but is not exported here are PAUSE stats for the
>   pMAC. Since those are also protocol-specific stats, I'm not sure how
>   to mix the 2 (MAC Merge layer + PAUSE). Maybe just extend
>   ETHTOOL_A_PAUSE_STAT_TX_FRAMES and ETHTOOL_A_PAUSE_STAT_RX_FRAMES with
>   the pMAC variants?
>
> - Finally, the hardware I'm working with (here, the test vehicle is the
>   NXP ENETC from LS1028A, although I have patches for the Felix switch
>   as well, but those need a bit of a revolution in the driver to go in
>   first). This hardware is not without its flaws, but at least allows me
>   to concentrate on the UAPI portions for this series.
>
> I also have a kselftest written, but it's for the Felix switch (covers
> forwarding latency) and so it's not included here.
>
> Are there objections in exposing the UAPI for this new feature in this way?
>

I really liked the statistics part, even though the hardware I am
working right now with doesn't provide all of them.

> Also, there is no documentation associated with this patch set, other
> than the code. Life is too short to write documentation for an RFC, sorry.
> I may get kdoc related kernel bot warnings because I copy-pasted ethtool
> structure definitions from here and there, but I didn't fill in the
> descriptions of all their fields. All those fields are as truthful to
> the standards as possible rather than my own variables or names, so
> please refer to those specs for now.
>
> Vladimir Oltean (7):
>   net: ethtool: netlink: introduce ethnl_update_bool()
>   net: ethtool: add support for Frame Preemption and MAC Merge layer
>   net: ethtool: stats: make stats_put_stats() take input from multiple
>     sources
>   net: ethtool: stats: replicate standardized counters for the pMAC
>   net: enetc: parameterize port MAC stats to also cover the pMAC
>   net: enetc: expose some standardized ethtool counters
>   net: enetc: add support for Frame Preemption and MAC Merge layer
>
>  .../ethernet/freescale/enetc/enetc_ethtool.c  | 399 +++++++++++++++---
>  .../net/ethernet/freescale/enetc/enetc_hw.h   | 132 +++---
>  include/linux/ethtool.h                       |  68 +++
>  include/uapi/linux/ethtool.h                  |  15 +
>  include/uapi/linux/ethtool_netlink.h          |  86 ++++
>  net/ethtool/Makefile                          |   3 +-
>  net/ethtool/fp.c                              | 295 +++++++++++++
>  net/ethtool/mm.c                              | 228 ++++++++++
>  net/ethtool/netlink.c                         |  38 ++
>  net/ethtool/netlink.h                         |  34 ++
>  net/ethtool/stats.c                           | 218 +++++++---
>  11 files changed, 1338 insertions(+), 178 deletions(-)
>  create mode 100644 net/ethtool/fp.c
>  create mode 100644 net/ethtool/mm.c
>
> -- 
> 2.34.1
>


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-16 22:29 ` [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
  2022-08-17  3:22   ` Jakub Kicinski
@ 2022-08-17 23:15   ` Vinicius Costa Gomes
  2022-08-19 16:12     ` Vladimir Oltean
  1 sibling, 1 reply; 37+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-17 23:15 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Xiaoliang Yang, Kurt Kanzenbach,
	Rui Sousa, Ferenc Fejes

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> Frame preemption (IEEE 802.1Q-2018 clause 6.7.2) and the MAC merge
> sublayer (IEEE 802.3-2018 clause 99) are 2 specifications, part of the
> Time Sensitive Networking enhancements, which work together to minimize
> latency caused by frame interference. The overall goal of TSN is for
> normal traffic and traffic belonging to real-time control processes to
> be able to cohabitate on the same L2 network and not bother each other
> too much.
>
> They achieve this (partly) by introducing the concept of preemptable
> traffic, i.e. Ethernet frames that have an altered Start-Of-Frame
> delimiter, which can be fragmented and reassembled at L2 on a link-local
> basis. The non-preemptable frames are called express traffic, and they
> can preempt preemptable frames, therefore having lower latency, which
> can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
> frames around 9K). Preemption is not recursive, i.e. a PT frame cannot
> preempt another PT frame. Preemption also does not depend upon priority,
> or otherwise said, an ET frame with prio 0 will still preempt a PT frame
> with prio 7.

I liked that in the API sense, using this "prio" concept we gain more
flexibility, and we can express better what the hardware you work with
can do, i.e. priority (for frame preemption purposes?) and queues are
orthogonal.

The problem I have is that the hardware I work with is more limited (as
are some stmmac-based NICs that I am aware of) frame preemption
capability and "priority" are attached to queues.

From the API perspective, it seems that I could say that "fp-prio" 0 is
associated with queue 0, fp-prio 1 to queue 1, and so on, and everything
will work.

The only thing that I am not happy at all is that there are exactly 8
fp-prios.

The Linux network stack is more flexible than what 802.1Q defines, think
skb->priority, number of TCs, as you said earlier, I would hate to
impose some almost artificial limits here. And in future we are going to
see TSN enabled devices with lots more queues.

In short:
 - Comment: this section of the RFC is hardware independent, this
 behavior of queues and priorities being orthogonal is only valid for
 some implementations;
 - Question: would it be against the intention of the API to have a 1:1
 map of priorities to queues?
 - Deal breaker: fixed 8 prios;

>
> In terms of implementation, the specs talk about the fact that the
> express traffic is handled by an express MAC (eMAC) and the preemptable
> traffic by a preemptable MAC (pMAC), and these MACs are multiplexed on
> the same MII by a MAC merge layer.
>
> On RX, packets go to the eMAC or to the pMAC based on their SFD (the
> definition of which was generalized to SMD, Start-of-mPacket-Delimiter,
> where mPacket is essentially an Ethernet frame fragment, or a complete
> frame). On TX, the eMAC/pMAC classification decision is taken by the
> 802.1Q spec, based on packet priority (each of the 8 priority values may
> have an admin-status of preemptable or express).
>
> I have modeled both the Ethernet part of the spec and the queuing part
> as separate netlink messages, with separate ethtool command sets
> intended for them. I am slightly flexible as to where to place the FP
> settings; there were previous discussions about placing them in tc.
> At the moment I haven't received a good enough argument to move them out
> of ethtool, so here they are.
> https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  include/linux/ethtool.h              |  59 ++++++
>  include/uapi/linux/ethtool.h         |  15 ++
>  include/uapi/linux/ethtool_netlink.h |  82 ++++++++
>  net/ethtool/Makefile                 |   3 +-
>  net/ethtool/fp.c                     | 295 +++++++++++++++++++++++++++
>  net/ethtool/mm.c                     | 228 +++++++++++++++++++++
>  net/ethtool/netlink.c                |  38 ++++
>  net/ethtool/netlink.h                |   8 +
>  8 files changed, 727 insertions(+), 1 deletion(-)
>  create mode 100644 net/ethtool/fp.c
>  create mode 100644 net/ethtool/mm.c
>
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 99dc7bfbcd3c..fa504dd22bf6 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -453,6 +453,49 @@ struct ethtool_module_power_mode_params {
>  	enum ethtool_module_power_mode mode;
>  };
>  
> +/**
> + * struct ethtool_fp_param - 802.1Q Frame Preemption parameters
> + */
> +struct ethtool_fp_param {
> +	u8 preemptable_prios;
> +	u32 hold_advance;
> +	u32 release_advance;
> +};
> +
> +/**
> + * struct ethtool_mm_state - 802.3 MAC merge layer state
> + */
> +struct ethtool_mm_state {
> +	u32 verify_time;
> +	enum ethtool_mm_verify_status verify_status;
> +	bool supported;
> +	bool enabled;
> +	bool active;
> +	u8 add_frag_size;
> +};
> +
> +/**
> + * struct ethtool_mm_cfg - 802.3 MAC merge layer configuration
> + */
> +struct ethtool_mm_cfg {
> +	u32 verify_time;
> +	bool verify_disable;
> +	bool enabled;
> +	u8 add_frag_size;
> +};
> +
> +/* struct ethtool_mm_stats - 802.3 MAC merge layer statistics, as defined in
> + * IEEE 802.3-2018 subclause 30.14.1 oMACMergeEntity managed object class
> + */
> +struct ethtool_mm_stats {
> +	u64 MACMergeFrameAssErrorCount;
> +	u64 MACMergeFrameSmdErrorCount;
> +	u64 MACMergeFrameAssOkCount;
> +	u64 MACMergeFragCountRx;
> +	u64 MACMergeFragCountTx;
> +	u64 MACMergeHoldCount;
> +};
> +
>  /**
>   * struct ethtool_ops - optional netdev operations
>   * @cap_link_lanes_supported: indicates if the driver supports lanes
> @@ -624,6 +667,11 @@ struct ethtool_module_power_mode_params {
>   *	plugged-in.
>   * @set_module_power_mode: Set the power mode policy for the plug-in module
>   *	used by the network device.
> + * @get_fp_param: Query the 802.1Q Frame Preemption parameters.
> + * @set_fp_param: Set the 802.1Q Frame Preemption parameters.
> + * @get_mm_state: Query the 802.3 MAC Merge layer state.
> + * @set_mm_cfg: Set the 802.3 MAC Merge layer parameters.
> + * @get_mm_stats: Query the 802.3 MAC Merge layer statistics.
>   *
>   * All operations are optional (i.e. the function pointer may be set
>   * to %NULL) and callers must take this into account.  Callers must
> @@ -760,6 +808,17 @@ struct ethtool_ops {
>  	int	(*set_module_power_mode)(struct net_device *dev,
>  					 const struct ethtool_module_power_mode_params *params,
>  					 struct netlink_ext_ack *extack);
> +	void	(*get_fp_param)(struct net_device *dev,
> +				struct ethtool_fp_param *params);
> +	int	(*set_fp_param)(struct net_device *dev,
> +				const struct ethtool_fp_param *params,
> +				struct netlink_ext_ack *extack);
> +	void	(*get_mm_state)(struct net_device *dev,
> +				struct ethtool_mm_state *state);
> +	int	(*set_mm_cfg)(struct net_device *dev, struct ethtool_mm_cfg *cfg,
> +			      struct netlink_ext_ack *extack);
> +	void	(*get_mm_stats)(struct net_device *dev,
> +				struct ethtool_mm_stats *stats);
>  };
>  
>  int ethtool_check_ops(const struct ethtool_ops *ops);
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 2d5741fd44bb..7a21fcb26a43 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -736,6 +736,21 @@ enum ethtool_module_power_mode {
>  	ETHTOOL_MODULE_POWER_MODE_HIGH,
>  };
>  
> +/* Values from ieee8021FramePreemptionAdminStatus */
> +enum ethtool_fp_admin_status {
> +	ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_EXPRESS = 1,
> +	ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_PREEMPTABLE = 2,
> +};
> +
> +enum ethtool_mm_verify_status {
> +	ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
> +	ETHTOOL_MM_VERIFY_STATUS_INITIAL,
> +	ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
> +	ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
> +	ETHTOOL_MM_VERIFY_STATUS_FAILED,
> +	ETHTOOL_MM_VERIFY_STATUS_DISABLED,
> +};
> +
>  /**
>   * struct ethtool_gstrings - string set for data tagging
>   * @cmd: Command number = %ETHTOOL_GSTRINGS
> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
> index d2fb4f7be61b..658810274c49 100644
> --- a/include/uapi/linux/ethtool_netlink.h
> +++ b/include/uapi/linux/ethtool_netlink.h
> @@ -49,6 +49,10 @@ enum {
>  	ETHTOOL_MSG_PHC_VCLOCKS_GET,
>  	ETHTOOL_MSG_MODULE_GET,
>  	ETHTOOL_MSG_MODULE_SET,
> +	ETHTOOL_MSG_FP_GET,
> +	ETHTOOL_MSG_FP_SET,
> +	ETHTOOL_MSG_MM_GET,
> +	ETHTOOL_MSG_MM_SET,
>  
>  	/* add new constants above here */
>  	__ETHTOOL_MSG_USER_CNT,
> @@ -94,6 +98,10 @@ enum {
>  	ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
>  	ETHTOOL_MSG_MODULE_GET_REPLY,
>  	ETHTOOL_MSG_MODULE_NTF,
> +	ETHTOOL_MSG_FP_GET_REPLY,
> +	ETHTOOL_MSG_FP_NTF,
> +	ETHTOOL_MSG_MM_GET_REPLY,
> +	ETHTOOL_MSG_MM_NTF,
>  
>  	/* add new constants above here */
>  	__ETHTOOL_MSG_KERNEL_CNT,
> @@ -862,6 +870,80 @@ enum {
>  	ETHTOOL_A_MODULE_MAX = (__ETHTOOL_A_MODULE_CNT - 1)
>  };
>  
> +/* FRAME PREEMPTION (802.1Q) */
> +
> +enum {
> +	ETHTOOL_A_FP_PARAM_ENTRY_UNSPEC,
> +	ETHTOOL_A_FP_PARAM_ENTRY_PRIO,		/* u8 */
> +	ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS,	/* u8 */
> +
> +	/* add new constants above here */
> +	__ETHTOOL_A_FP_PARAM_ENTRY_CNT,
> +	ETHTOOL_A_FP_PARAM_ENTRY_MAX = (__ETHTOOL_A_FP_PARAM_ENTRY_CNT - 1)
> +};
> +
> +enum {
> +	ETHTOOL_A_FP_PARAM_UNSPEC,
> +	ETHTOOL_A_FP_PARAM_ENTRY,		/* nest */
> +
> +	/* add new constants above here */
> +	__ETHTOOL_A_FP_PARAM_CNT,
> +	ETHTOOL_A_FP_PARAM_MAX = (__ETHTOOL_A_FP_PARAM_CNT - 1)
> +};
> +
> +enum {
> +	ETHTOOL_A_FP_UNSPEC,
> +	ETHTOOL_A_FP_HEADER,			/* nest - _A_HEADER_* */
> +	ETHTOOL_A_FP_PARAM_TABLE,		/* nest */
> +	ETHTOOL_A_FP_HOLD_ADVANCE,		/* u32 */
> +	ETHTOOL_A_FP_RELEASE_ADVANCE,		/* u32 */
> +
> +	/* add new constants above here */
> +	__ETHTOOL_A_FP_CNT,
> +	ETHTOOL_A_FP_MAX = (__ETHTOOL_A_FP_CNT - 1)
> +};
> +
> +/* MAC MERGE (802.3) */
> +
> +enum {
> +	ETHTOOL_A_MM_STAT_UNSPEC,
> +	ETHTOOL_A_MM_STAT_PAD,
> +
> +	/* aMACMergeFrameAssErrorCount */
> +	ETHTOOL_A_MM_STAT_REASSEMBLY_ERRORS,	/* u64 */
> +	/* aMACMergeFrameSmdErrorCount */
> +	ETHTOOL_A_MM_STAT_SMD_ERRORS,		/* u64 */
> +	/* aMACMergeFrameAssOkCount */
> +	ETHTOOL_A_MM_STAT_REASSEMBLY_OK,	/* u64 */
> +	/* aMACMergeFragCountRx */
> +	ETHTOOL_A_MM_STAT_RX_FRAG_COUNT,	/* u64 */
> +	/* aMACMergeFragCountTx */
> +	ETHTOOL_A_MM_STAT_TX_FRAG_COUNT,	/* u64 */
> +	/* aMACMergeHoldCount */
> +	ETHTOOL_A_MM_STAT_HOLD_COUNT,		/* u64 */
> +
> +	/* add new constants above here */
> +	__ETHTOOL_A_MM_STAT_CNT,
> +	ETHTOOL_A_MM_STAT_MAX = (__ETHTOOL_A_MM_STAT_CNT - 1)
> +};
> +
> +enum {
> +	ETHTOOL_A_MM_UNSPEC,
> +	ETHTOOL_A_MM_HEADER,			/* nest - _A_HEADER_* */
> +	ETHTOOL_A_MM_VERIFY_STATUS,		/* u8 */
> +	ETHTOOL_A_MM_VERIFY_DISABLE,		/* u8 */
> +	ETHTOOL_A_MM_VERIFY_TIME,		/* u32 */
> +	ETHTOOL_A_MM_SUPPORTED,			/* u8 */
> +	ETHTOOL_A_MM_ENABLED,			/* u8 */
> +	ETHTOOL_A_MM_ACTIVE,			/* u8 */
> +	ETHTOOL_A_MM_ADD_FRAG_SIZE,		/* u8 */
> +	ETHTOOL_A_MM_STATS,			/* nest - _A_MM_STAT_* */
> +
> +	/* add new constants above here */
> +	__ETHTOOL_A_MM_CNT,
> +	ETHTOOL_A_MM_MAX = (__ETHTOOL_A_MM_CNT - 1)
> +};
> +
>  /* generic netlink info */
>  #define ETHTOOL_GENL_NAME "ethtool"
>  #define ETHTOOL_GENL_VERSION 1
> diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
> index b76432e70e6b..31e056f17856 100644
> --- a/net/ethtool/Makefile
> +++ b/net/ethtool/Makefile
> @@ -7,4 +7,5 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
>  ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
>  		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
>  		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
> -		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
> +		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o \
> +		   fp.o mm.o
> diff --git a/net/ethtool/fp.c b/net/ethtool/fp.c
> new file mode 100644
> index 000000000000..20f19d8c1461
> --- /dev/null
> +++ b/net/ethtool/fp.c
> @@ -0,0 +1,295 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright 2022 NXP
> + */
> +#include "common.h"
> +#include "netlink.h"
> +
> +struct fp_req_info {
> +	struct ethnl_req_info		base;
> +};
> +
> +struct fp_reply_data {
> +	struct ethnl_reply_data		base;
> +	struct ethtool_fp_param		params;
> +};
> +
> +#define FP_REPDATA(__reply_base) \
> +	container_of(__reply_base, struct fp_reply_data, base)
> +
> +const struct nla_policy ethnl_fp_get_policy[ETHTOOL_A_FP_HEADER + 1] = {
> +	[ETHTOOL_A_FP_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
> +};
> +
> +static int fp_prepare_data(const struct ethnl_req_info *req_base,
> +			   struct ethnl_reply_data *reply_base,
> +			   struct genl_info *info)
> +{
> +	struct fp_reply_data *data = FP_REPDATA(reply_base);
> +	struct net_device *dev = reply_base->dev;
> +	int ret;
> +
> +	if (!dev->ethtool_ops->get_fp_param)
> +		return -EOPNOTSUPP;
> +
> +	ret = ethnl_ops_begin(dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	dev->ethtool_ops->get_fp_param(dev, &data->params);
> +	ethnl_ops_complete(dev);
> +
> +	return 0;
> +}
> +
> +static int fp_reply_size(const struct ethnl_req_info *req_base,
> +			 const struct ethnl_reply_data *reply_base)
> +{
> +	int len = 0;
> +
> +	len += nla_total_size(0); /* _FP_PARAM_ENTRY */
> +	len += nla_total_size(sizeof(u8)); /* _FP_PARAM_ENTRY_PRIO */
> +	len += nla_total_size(sizeof(u8)); /* _FP_PARAM_ENTRY_ADMIN_STATUS */
> +	len *= 8; /* 8 prios */
> +	len += nla_total_size(0); /* _FP_PARAM_TABLE */
> +	len += nla_total_size(sizeof(u32)); /* _FP_HOLD_ADVANCE */
> +	len += nla_total_size(sizeof(u32)); /* _FP_RELEASE_ADVANCE */
> +
> +	return len;
> +}
> +
> +static int fp_fill_reply(struct sk_buff *skb,
> +			 const struct ethnl_req_info *req_base,
> +			 const struct ethnl_reply_data *reply_base)
> +{
> +	const struct fp_reply_data *data = FP_REPDATA(reply_base);
> +	const struct ethtool_fp_param *params = &data->params;
> +	enum ethtool_fp_admin_status admin_status;
> +	struct nlattr *nest_table, *nest_entry;
> +	int prio;
> +	int ret;
> +
> +	if (nla_put_u32(skb, ETHTOOL_A_FP_HOLD_ADVANCE, params->hold_advance) ||
> +	    nla_put_u32(skb, ETHTOOL_A_FP_RELEASE_ADVANCE, params->release_advance))
> +		return -EMSGSIZE;
> +
> +	nest_table = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_TABLE);
> +	if (!nest_table)
> +		return -EMSGSIZE;
> +
> +	for (prio = 0; prio < 8; prio++) {
> +		admin_status = (params->preemptable_prios & BIT(prio)) ?
> +			ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_PREEMPTABLE :
> +			ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_EXPRESS;
> +
> +		nest_entry = nla_nest_start(skb, ETHTOOL_A_FP_PARAM_ENTRY);
> +		if (!nest_entry)
> +			goto nla_nest_cancel_table;
> +
> +		if (nla_put_u8(skb, ETHTOOL_A_FP_PARAM_ENTRY_PRIO, prio))
> +			goto nla_nest_cancel_entry;
> +
> +		if (nla_put_u8(skb, ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS,
> +			       admin_status))
> +			goto nla_nest_cancel_entry;
> +
> +		nla_nest_end(skb, nest_entry);
> +	}
> +
> +	nla_nest_end(skb, nest_table);
> +
> +	return 0;
> +
> +nla_nest_cancel_entry:
> +	nla_nest_cancel(skb, nest_entry);
> +nla_nest_cancel_table:
> +	nla_nest_cancel(skb, nest_table);
> +	return ret;
> +}
> +
> +const struct ethnl_request_ops ethnl_fp_request_ops = {
> +	.request_cmd		= ETHTOOL_MSG_FP_GET,
> +	.reply_cmd		= ETHTOOL_MSG_FP_GET_REPLY,
> +	.hdr_attr		= ETHTOOL_A_FP_HEADER,
> +	.req_info_size		= sizeof(struct fp_req_info),
> +	.reply_data_size	= sizeof(struct fp_reply_data),
> +
> +	.prepare_data		= fp_prepare_data,
> +	.reply_size		= fp_reply_size,
> +	.fill_reply		= fp_fill_reply,
> +};
> +
> +static const struct nla_policy
> +ethnl_fp_set_param_entry_policy[ETHTOOL_A_FP_PARAM_ENTRY_MAX  + 1] = {
> +	[ETHTOOL_A_FP_PARAM_ENTRY_PRIO] = { .type = NLA_U8 },
> +	[ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS] = { .type = NLA_U8 },
> +};
> +
> +static const struct nla_policy
> +ethnl_fp_set_param_table_policy[ETHTOOL_A_FP_PARAM_MAX + 1] = {
> +	[ETHTOOL_A_FP_PARAM_ENTRY] = NLA_POLICY_NESTED(ethnl_fp_set_param_entry_policy),
> +};
> +
> +const struct nla_policy ethnl_fp_set_policy[ETHTOOL_A_FP_MAX + 1] = {
> +	[ETHTOOL_A_FP_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
> +	[ETHTOOL_A_FP_PARAM_TABLE] = NLA_POLICY_NESTED(ethnl_fp_set_param_table_policy),
> +};
> +
> +static int fp_parse_param_entry(const struct nlattr *nest,
> +				struct ethtool_fp_param *params,
> +				u8 *seen_prios, bool *mod,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct nlattr *tb[ARRAY_SIZE(ethnl_fp_set_param_entry_policy)];
> +	u8 preemptable_prios = params->preemptable_prios;
> +	u8 prio, admin_status;
> +	int ret;
> +
> +	if (!nest)
> +		return 0;
> +
> +	ret = nla_parse_nested(tb,
> +			       ARRAY_SIZE(ethnl_fp_set_param_entry_policy) - 1,
> +			       nest, ethnl_fp_set_param_entry_policy,
> +			       extack);
> +	if (ret)
> +		return ret;
> +
> +	if (!tb[ETHTOOL_A_FP_PARAM_ENTRY_PRIO]) {
> +		NL_SET_ERR_MSG_ATTR(extack, nest,
> +				    "Missing 'prio' attribute");
> +		return -EINVAL;
> +	}
> +
> +	if (!tb[ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS]) {
> +		NL_SET_ERR_MSG_ATTR(extack, nest,
> +				    "Missing 'admin_status' attribute");
> +		return -EINVAL;
> +	}
> +
> +	prio = nla_get_u8(tb[ETHTOOL_A_FP_PARAM_ENTRY_PRIO]);
> +	if (prio >= 8) {
> +		NL_SET_ERR_MSG_ATTR(extack, nest,
> +				    "Range exceeded for 'prio' attribute");
> +		return -ERANGE;
> +	}
> +
> +	if (*seen_prios & BIT(prio)) {
> +		NL_SET_ERR_MSG_ATTR(extack, nest,
> +				    "Duplicate 'prio' attribute");
> +		return -EINVAL;
> +	}
> +
> +	*seen_prios |= BIT(prio);
> +
> +	admin_status = nla_get_u8(tb[ETHTOOL_A_FP_PARAM_ENTRY_ADMIN_STATUS]);
> +	switch (admin_status) {
> +	case ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_PREEMPTABLE:
> +		preemptable_prios |= BIT(prio);
> +		break;
> +	case ETHTOOL_FP_PARAM_ENTRY_ADMIN_STATUS_EXPRESS:
> +		preemptable_prios &= ~BIT(prio);
> +		break;
> +	default:
> +		NL_SET_ERR_MSG_ATTR(extack, nest,
> +				    "Unexpected value for 'admin_status' attribute");
> +		return -EINVAL;
> +	}
> +
> +	if (preemptable_prios != params->preemptable_prios) {
> +		params->preemptable_prios = preemptable_prios;
> +		*mod = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int fp_parse_param_table(const struct nlattr *nest,
> +				struct ethtool_fp_param *params, bool *mod,
> +				struct netlink_ext_ack *extack)
> +{
> +	u8 seen_prios = 0;
> +	struct nlattr *n;
> +	int rem, ret;
> +
> +	if (!nest)
> +		return 0;
> +
> +	nla_for_each_nested(n, nest, rem) {
> +		struct sched_entry *entry;
> +
> +		if (nla_type(n) != ETHTOOL_A_FP_PARAM_ENTRY) {
> +			NL_SET_ERR_MSG_ATTR(extack, n,
> +					    "Attribute is not of type 'entry'");
> +			continue;
> +		}
> +
> +		ret = fp_parse_param_entry(n, params, &seen_prios, mod, extack);
> +		if (ret < 0) {
> +			kfree(entry);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int ethnl_set_fp_param(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct netlink_ext_ack *extack = info->extack;
> +	struct ethnl_req_info req_info = {};
> +	struct ethtool_fp_param params = {};
> +	struct nlattr **tb = info->attrs;
> +	const struct ethtool_ops *ops;
> +	struct net_device *dev;
> +	bool mod = false;
> +	int ret;
> +
> +	ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_FP_HEADER],
> +					 genl_info_net(info), extack, true);
> +	if (ret)
> +		return ret;
> +
> +	dev = req_info.dev;
> +	ops = dev->ethtool_ops;
> +
> +	if (!ops->get_fp_param || !ops->set_fp_param) {
> +		ret = -EOPNOTSUPP;
> +		goto out_dev;
> +	}
> +
> +	rtnl_lock();
> +	ret = ethnl_ops_begin(dev);
> +	if (ret)
> +		goto out_rtnl;
> +
> +	dev->ethtool_ops->get_fp_param(dev, &params);
> +
> +	ethnl_update_u32(&params.hold_advance, tb[ETHTOOL_A_FP_HOLD_ADVANCE],
> +			 &mod);
> +	ethnl_update_u32(&params.release_advance,
> +			 tb[ETHTOOL_A_FP_RELEASE_ADVANCE], &mod);
> +
> +	ret = fp_parse_param_table(tb[ETHTOOL_A_FP_PARAM_TABLE], &params, &mod,
> +				   extack);
> +	if (ret || !mod)
> +		goto out_ops;
> +
> +	ret = dev->ethtool_ops->set_fp_param(dev, &params, extack);
> +	if (ret) {
> +		if (!extack->_msg)
> +			NL_SET_ERR_MSG(extack,
> +				       "Failed to update frame preemption parameters");
> +		goto out_ops;
> +	}
> +
> +	ethtool_notify(dev, ETHTOOL_MSG_FP_NTF, NULL);
> +
> +out_ops:
> +	ethnl_ops_complete(dev);
> +out_rtnl:
> +	rtnl_unlock();
> +out_dev:
> +	dev_put(dev);
> +	return ret;
> +}
> diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c
> new file mode 100644
> index 000000000000..d4731fb7aee4
> --- /dev/null
> +++ b/net/ethtool/mm.c
> @@ -0,0 +1,228 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright 2022 NXP
> + */
> +#include "common.h"
> +#include "netlink.h"
> +
> +struct mm_req_info {
> +	struct ethnl_req_info		base;
> +};
> +
> +struct mm_reply_data {
> +	struct ethnl_reply_data		base;
> +	struct ethtool_mm_state		state;
> +	struct ethtool_mm_stats		stats;
> +};
> +
> +#define MM_REPDATA(__reply_base) \
> +	container_of(__reply_base, struct mm_reply_data, base)
> +
> +#define ETHTOOL_MM_STAT_CNT \
> +	(__ETHTOOL_A_MM_STAT_CNT - (ETHTOOL_A_MM_STAT_PAD + 1))
> +
> +const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1] = {
> +	[ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy_stats),
> +};
> +
> +static int mm_prepare_data(const struct ethnl_req_info *req_base,
> +			   struct ethnl_reply_data *reply_base,
> +			   struct genl_info *info)
> +{
> +	struct mm_reply_data *data = MM_REPDATA(reply_base);
> +	struct net_device *dev = reply_base->dev;
> +	const struct ethtool_ops *ops;
> +	int ret;
> +
> +	ops = dev->ethtool_ops;
> +
> +	if (!ops->get_mm_state)
> +		return -EOPNOTSUPP;
> +
> +	ethtool_stats_init((u64 *)&data->stats,
> +			   sizeof(data->stats) / sizeof(u64));
> +
> +	ret = ethnl_ops_begin(dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	ops->get_mm_state(dev, &data->state);
> +
> +	if (ops->get_mm_stats && (req_base->flags & ETHTOOL_FLAG_STATS))
> +		ops->get_mm_stats(dev, &data->stats);
> +
> +	ethnl_ops_complete(dev);
> +
> +	return 0;
> +}
> +
> +static int mm_reply_size(const struct ethnl_req_info *req_base,
> +			 const struct ethnl_reply_data *reply_base)
> +{
> +	int len = 0;
> +
> +	len += nla_total_size(sizeof(u8)); /* _MM_VERIFY_STATUS */
> +	len += nla_total_size(sizeof(u32)); /* _MM_VERIFY_TIME */
> +	len += nla_total_size(sizeof(u8)); /* _MM_SUPPORTED */
> +	len += nla_total_size(sizeof(u8)); /* _MM_ENABLED */
> +	len += nla_total_size(sizeof(u8)); /* _MM_ACTIVE */
> +	len += nla_total_size(sizeof(u8)); /* _MM_ADD_FRAG_SIZE */
> +
> +	if (req_base->flags & ETHTOOL_FLAG_STATS)
> +		len += nla_total_size(0) + /* _MM_STATS */
> +		       nla_total_size_64bit(sizeof(u64)) * ETHTOOL_MM_STAT_CNT;
> +
> +	return len;
> +}
> +
> +static int mm_put_stat(struct sk_buff *skb, u64 val, u16 attrtype)
> +{
> +	if (val == ETHTOOL_STAT_NOT_SET)
> +		return 0;
> +	if (nla_put_u64_64bit(skb, attrtype, val, ETHTOOL_A_MM_STAT_PAD))
> +		return -EMSGSIZE;
> +	return 0;
> +}
> +
> +static int mm_put_stats(struct sk_buff *skb,
> +			const struct ethtool_mm_stats *stats)
> +{
> +	struct nlattr *nest;
> +
> +	nest = nla_nest_start(skb, ETHTOOL_A_MM_STATS);
> +	if (!nest)
> +		return -EMSGSIZE;
> +
> +	if (mm_put_stat(skb, stats->MACMergeFrameAssErrorCount,
> +			ETHTOOL_A_MM_STAT_REASSEMBLY_ERRORS) ||
> +	    mm_put_stat(skb, stats->MACMergeFrameSmdErrorCount,
> +			ETHTOOL_A_MM_STAT_SMD_ERRORS) ||
> +	    mm_put_stat(skb, stats->MACMergeFrameAssOkCount,
> +			ETHTOOL_A_MM_STAT_REASSEMBLY_OK) ||
> +	    mm_put_stat(skb, stats->MACMergeFragCountRx,
> +			ETHTOOL_A_MM_STAT_RX_FRAG_COUNT) ||
> +	    mm_put_stat(skb, stats->MACMergeFragCountTx,
> +			ETHTOOL_A_MM_STAT_TX_FRAG_COUNT) ||
> +	    mm_put_stat(skb, stats->MACMergeHoldCount,
> +			ETHTOOL_A_MM_STAT_HOLD_COUNT))
> +		goto err_cancel;
> +
> +	nla_nest_end(skb, nest);
> +	return 0;
> +
> +err_cancel:
> +	nla_nest_cancel(skb, nest);
> +	return -EMSGSIZE;
> +}
> +
> +static int mm_fill_reply(struct sk_buff *skb,
> +			 const struct ethnl_req_info *req_base,
> +			 const struct ethnl_reply_data *reply_base)
> +{
> +	const struct mm_reply_data *data = MM_REPDATA(reply_base);
> +	const struct ethtool_mm_state *state = &data->state;
> +
> +	if (nla_put_u8(skb, ETHTOOL_A_MM_VERIFY_STATUS, state->verify_status) ||
> +	    nla_put_u32(skb, ETHTOOL_A_MM_VERIFY_TIME, state->verify_time) ||
> +	    nla_put_u8(skb, ETHTOOL_A_MM_SUPPORTED, state->supported) ||
> +	    nla_put_u8(skb, ETHTOOL_A_MM_ENABLED, state->enabled) ||
> +	    nla_put_u8(skb, ETHTOOL_A_MM_ACTIVE, state->active) ||
> +	    nla_put_u8(skb, ETHTOOL_A_MM_ADD_FRAG_SIZE, state->add_frag_size))
> +		return -EMSGSIZE;
> +
> +	if (req_base->flags & ETHTOOL_FLAG_STATS &&
> +	    mm_put_stats(skb, &data->stats))
> +		return -EMSGSIZE;
> +
> +	return 0;
> +}
> +
> +const struct ethnl_request_ops ethnl_mm_request_ops = {
> +	.request_cmd		= ETHTOOL_MSG_MM_GET,
> +	.reply_cmd		= ETHTOOL_MSG_MM_GET_REPLY,
> +	.hdr_attr		= ETHTOOL_A_MM_HEADER,
> +	.req_info_size		= sizeof(struct mm_req_info),
> +	.reply_data_size	= sizeof(struct mm_reply_data),
> +
> +	.prepare_data		= mm_prepare_data,
> +	.reply_size		= mm_reply_size,
> +	.fill_reply		= mm_fill_reply,
> +};
> +
> +const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1] = {
> +	[ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
> +	[ETHTOOL_A_MM_VERIFY_DISABLE] = { .type = NLA_U8 },
> +	[ETHTOOL_A_MM_VERIFY_TIME] = { .type = NLA_U32 },
> +	[ETHTOOL_A_MM_ENABLED] = { .type = NLA_U8 },
> +	[ETHTOOL_A_MM_ADD_FRAG_SIZE] = { .type = NLA_U8 },
> +};
> +
> +static void mm_state_to_cfg(const struct ethtool_mm_state *state,
> +			    struct ethtool_mm_cfg *cfg)
> +{
> +	cfg->verify_disable =
> +		state->verify_status == ETHTOOL_MM_VERIFY_STATUS_DISABLED;
> +	cfg->verify_time = state->verify_time;
> +	cfg->enabled = state->enabled;
> +	cfg->add_frag_size = state->add_frag_size;
> +}
> +
> +int ethnl_set_mm_cfg(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct netlink_ext_ack *extack = info->extack;
> +	struct ethnl_req_info req_info = {};
> +	struct ethtool_mm_state state = {};
> +	struct nlattr **tb = info->attrs;
> +	struct ethtool_mm_cfg cfg = {};
> +	const struct ethtool_ops *ops;
> +	struct net_device *dev;
> +	bool mod = false;
> +	int ret;
> +
> +	ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_MM_HEADER],
> +					 genl_info_net(info), extack, true);
> +	if (ret)
> +		return ret;
> +
> +	dev = req_info.dev;
> +	ops = dev->ethtool_ops;
> +
> +	if (!ops->get_mm_state || !ops->set_mm_cfg) {
> +		ret = -EOPNOTSUPP;
> +		goto out_dev;
> +	}
> +
> +	rtnl_lock();
> +	ret = ethnl_ops_begin(dev);
> +	if (ret)
> +		goto out_rtnl;
> +
> +	ops->get_mm_state(dev, &state);
> +
> +	mm_state_to_cfg(&state, &cfg);
> +
> +	ethnl_update_bool(&cfg.verify_disable, tb[ETHTOOL_A_MM_VERIFY_DISABLE],
> +			  &mod);
> +	ethnl_update_u32(&cfg.verify_time, tb[ETHTOOL_A_MM_VERIFY_TIME], &mod);
> +	ethnl_update_bool(&cfg.enabled, tb[ETHTOOL_A_MM_ENABLED], &mod);
> +	ethnl_update_u8(&cfg.add_frag_size, tb[ETHTOOL_A_MM_ADD_FRAG_SIZE],
> +			&mod);
> +
> +	ret = ops->set_mm_cfg(dev, &cfg, extack);
> +	if (ret) {
> +		if (!extack->_msg)
> +			NL_SET_ERR_MSG(extack,
> +				       "Failed to update MAC merge configuration");
> +		goto out_ops;
> +	}
> +
> +	ethtool_notify(dev, ETHTOOL_MSG_MM_NTF, NULL);
> +
> +out_ops:
> +	ethnl_ops_complete(dev);
> +out_rtnl:
> +	rtnl_unlock();
> +out_dev:
> +	dev_put(dev);
> +	return ret;
> +}
> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
> index e26079e11835..82ad2bd92d70 100644
> --- a/net/ethtool/netlink.c
> +++ b/net/ethtool/netlink.c
> @@ -286,6 +286,8 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
>  	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
>  	[ETHTOOL_MSG_PHC_VCLOCKS_GET]	= &ethnl_phc_vclocks_request_ops,
>  	[ETHTOOL_MSG_MODULE_GET]	= &ethnl_module_request_ops,
> +	[ETHTOOL_MSG_FP_GET]		= &ethnl_fp_request_ops,
> +	[ETHTOOL_MSG_MM_GET]		= &ethnl_mm_request_ops,
>  };
>  
>  static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
> @@ -598,6 +600,8 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
>  	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
>  	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
>  	[ETHTOOL_MSG_MODULE_NTF]	= &ethnl_module_request_ops,
> +	[ETHTOOL_MSG_FP_NTF]		= &ethnl_fp_request_ops,
> +	[ETHTOOL_MSG_MM_NTF]		= &ethnl_mm_request_ops,
>  };
>  
>  /* default notification handler */
> @@ -691,6 +695,8 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
>  	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
>  	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
>  	[ETHTOOL_MSG_MODULE_NTF]	= ethnl_default_notify,
> +	[ETHTOOL_MSG_FP_NTF]		= ethnl_default_notify,
> +	[ETHTOOL_MSG_MM_NTF]		= ethnl_default_notify,
>  };
>  
>  void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
> @@ -1020,6 +1026,38 @@ static const struct genl_ops ethtool_genl_ops[] = {
>  		.policy = ethnl_module_set_policy,
>  		.maxattr = ARRAY_SIZE(ethnl_module_set_policy) - 1,
>  	},
> +	{
> +		.cmd	= ETHTOOL_MSG_FP_GET,
> +		.doit	= ethnl_default_doit,
> +		.start	= ethnl_default_start,
> +		.dumpit	= ethnl_default_dumpit,
> +		.done	= ethnl_default_done,
> +		.policy = ethnl_fp_get_policy,
> +		.maxattr = ARRAY_SIZE(ethnl_fp_get_policy) - 1,
> +	},
> +	{
> +		.cmd	= ETHTOOL_MSG_FP_SET,
> +		.flags	= GENL_UNS_ADMIN_PERM,
> +		.doit	= ethnl_set_fp_param,
> +		.policy = ethnl_fp_set_policy,
> +		.maxattr = ARRAY_SIZE(ethnl_fp_set_policy) - 1,
> +	},
> +	{
> +		.cmd	= ETHTOOL_MSG_MM_GET,
> +		.doit	= ethnl_default_doit,
> +		.start	= ethnl_default_start,
> +		.dumpit	= ethnl_default_dumpit,
> +		.done	= ethnl_default_done,
> +		.policy = ethnl_mm_get_policy,
> +		.maxattr = ARRAY_SIZE(ethnl_mm_get_policy) - 1,
> +	},
> +	{
> +		.cmd	= ETHTOOL_MSG_MM_SET,
> +		.flags	= GENL_UNS_ADMIN_PERM,
> +		.doit	= ethnl_set_mm_cfg,
> +		.policy = ethnl_mm_set_policy,
> +		.maxattr = ARRAY_SIZE(ethnl_mm_set_policy) - 1,
> +	},
>  };
>  
>  static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
> diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
> index 1653fd2cf0cf..a2e56df74c85 100644
> --- a/net/ethtool/netlink.h
> +++ b/net/ethtool/netlink.h
> @@ -371,6 +371,8 @@ extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
>  extern const struct ethnl_request_ops ethnl_stats_request_ops;
>  extern const struct ethnl_request_ops ethnl_phc_vclocks_request_ops;
>  extern const struct ethnl_request_ops ethnl_module_request_ops;
> +extern const struct ethnl_request_ops ethnl_fp_request_ops;
> +extern const struct ethnl_request_ops ethnl_mm_request_ops;
>  
>  extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1];
>  extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1];
> @@ -409,6 +411,10 @@ extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1
>  extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
>  extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
>  extern const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1];
> +extern const struct nla_policy ethnl_fp_get_policy[ETHTOOL_A_FP_HEADER + 1];
> +extern const struct nla_policy ethnl_fp_set_policy[ETHTOOL_A_FP_MAX + 1];
> +extern const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1];
> +extern const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1];
>  
>  int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
>  int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info);
> @@ -428,6 +434,8 @@ int ethnl_tunnel_info_start(struct netlink_callback *cb);
>  int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
>  int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
>  int ethnl_set_module(struct sk_buff *skb, struct genl_info *info);
> +int ethnl_set_fp_param(struct sk_buff *skb, struct genl_info *info);
> +int ethnl_set_mm_cfg(struct sk_buff *skb, struct genl_info *info);
>  
>  extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
>  extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
> -- 
> 2.34.1
>

-- 
Vinicius

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
                   ` (8 preceding siblings ...)
  2022-08-17 22:47 ` Vinicius Costa Gomes
@ 2022-08-19  8:16 ` Kurt Kanzenbach
  2022-08-19 16:59   ` Vladimir Oltean
  9 siblings, 1 reply; 37+ messages in thread
From: Kurt Kanzenbach @ 2022-08-19  8:16 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Rui Sousa, Ferenc Fejes

[-- Attachment #1: Type: text/plain, Size: 1340 bytes --]

On Wed Aug 17 2022, Vladimir Oltean wrote:
> Vinicius' progress on upstreaming frame preemption support for Intel I226
> seemed to stall, so I decided to give it a go using my own view as well.
> https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/

Great to see progress on FPE :-).

[snip]

> - Finally, the hardware I'm working with (here, the test vehicle is the
>   NXP ENETC from LS1028A, although I have patches for the Felix switch
>   as well, but those need a bit of a revolution in the driver to go in
>   first). This hardware is not without its flaws, but at least allows me
>   to concentrate on the UAPI portions for this series.
>
> I also have a kselftest written, but it's for the Felix switch (covers
> forwarding latency) and so it's not included here.

What kind of selftest did you implement? So far I've been doing this:
Using a cyclic real time application to create high priority frames and
running iperf3 in parallel to simulate low priority traffic
constantly. Afterwards, checking the NIC statistics for fragments and so
on. Also checking the latency of the RT frames with FPE on/off.

BTW, if you guys need help with testing of patches, i do have access to
i225 and stmmacs which both support FPE. Also the Hirschmann switches
should support it.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-17 23:15   ` Vinicius Costa Gomes
@ 2022-08-19 16:12     ` Vladimir Oltean
  2022-08-24  0:35       ` Vinicius Costa Gomes
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-19 16:12 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, Aug 17, 2022 at 04:15:11PM -0700, Vinicius Costa Gomes wrote:
> I liked that in the API sense, using this "prio" concept we gain more
> flexibility, and we can express better what the hardware you work with
> can do, i.e. priority (for frame preemption purposes?) and queues are
> orthogonal.
> 
> The problem I have is that the hardware I work with is more limited (as
> are some stmmac-based NICs that I am aware of) frame preemption
> capability and "priority" are attached to queues.

Don't get me wrong, enetc also has "priority" attached to TX rings
(enetc_set_bdr_prio). Then there is another register which maps a TX
priority to a traffic class. This could be altered by the "map" property
of tc-mqprio, but in practice it isn't. The only supported prio->tc map
is "map 0 1 2 3 4 5 6 7".

This is to say, enetc does not have a per-packet priority that gets
passed via BD metadata, but rather, packets inherit the configured
priority of the ring (Linux TX queue).

> From the API perspective, it seems that I could say that "fp-prio" 0 is
> associated with queue 0, fp-prio 1 to queue 1, and so on, and everything
> will work.

You have the tc-mqprio "queues" and "map" to juggle with, to end up
configuring FP per queue based on the provided per-priority settings.

> The only thing that I am not happy at all is that there are exactly 8
> fp-prios.
> 
> The Linux network stack is more flexible than what 802.1Q defines, think
> skb->priority, number of TCs, as you said earlier, I would hate to
> impose some almost artificial limits here. And in future we are going to
> see TSN enabled devices with lots more queues.

~artificial limits~

IEEE 802.1Q says:

| 12.30.1.1 framePreemptionStatusTable structure and data types
| 
| The framePreemptionStatusTable (6.7.2) consists of 8 framePreemptionAdminStatus
| values (12.30.1.1.1), one per priority.

I'm more concerned about setting things straight than about the limits
right now. I don't yet think everything is quite ok in that regard.
The netlink format proposed here is in principle extensible for
priorities > 7, if that will ever make sense.

> 
> In short:
>  - Comment: this section of the RFC is hardware independent, this
>  behavior of queues and priorities being orthogonal is only valid for
>  some implementations;

Yes, and IMO it can only be that way, if I were to be truthful to my
interpretation of the intention 802.1Q spec (please contradict me if you
have a different interpretation of it!).

Note that I do see contradictions in 802.1Q, and I don't know how to
reconcile them. I've suppressed some of them for lack of a logical
explanation. I'm mentioning this for transparency; I don't know
everything either, but I need to make something out of what I do know.

So 802.1Q says this:

| 12.30.1.1.1 framePreemptionAdminStatus
|
| This parameter is the administrative value of the preemption status for
| the priority. It takes value express if frames queued for the priority
| are to be transmitted using the express service for the Port, or
| preemptible if frames queued for the priority are to be transmitted
| using the preemptible service for the Port and preemption is enabled for
| the Port.

So far so good. In 802.1Q definitions, priority is attached to a packet
rather than to a traffic class / queue. But then the same clause continues:

| Priorities that all map to the same traffic class should be constrained
| to use the same value of preemption status.

Which seems to throw a wrench into everything.

It raises two questions:

(A) why is AdminStatus not per traffic class then?
(B) why is the constraint there, what's it trying to protect against?

I've asked around, and I got unsatisfactory answers to both questions.

A seemingly competent answer given to (A) by Rui (CCed) is that the
eMAC/pMAC selection on TX actually takes place in the MAC layer, in what
is called by 802.1AC-2016 "MA_UNITDATA.request" (and what everybody else
calls "MAC client xmit"). The parameters of this MAC service primitive
are:

MA_UNITDATA.request(destination_address,
		    source_address,
		    mac_service_data_unit,
		    priority)

So since only "priority" is passed to the MAC service (and traffic class isn't),
this means that the MAC service can only steer packets to eMAC/pMAC
based on what it's given (i.e. priority).

But upon closer investigation, this explanation doesn't appear to hold
water very well. This is because clause 6.7.1 Support of the ISS by IEEE
Std 802.3 (Ethernet) from 802.1Q says:

| If frame preemption (6.7.2) is supported on a Port, then the IEEE 802.3
| MAC provides the following two MAC service interfaces (99.4 of IEEE Std
| 802.3br™-2016 [B21]):
|
| a) A preemptible MAC (pMAC) service interface
| b) An express MAC (eMAC) service interface
|
| For priority values that are identified in the frame preemption status
| table (6.7.2) as preemptible, frames that are selected for transmission
| shall be transmitted using the pMAC service instance, and for priority
| values that are identified in the frame preemption status table as
| express, frames that are selected for transmission shall be transmitted
| using the eMAC service instance.
| In all other respects, the Port behaves as if it is supported by a
| single MAC service interface. In particular, all frames received by the
| Port are treated as if they were received on a single MAC service
| interface regardless of whether they were received on the eMAC service
| interface or the pMAC service interface, except with respect to frame
| preemption.

So there you go, the MAC has to provide *two* service interfaces, so the
802.1Q upper layer (client of both) can just decide based on an internal
criterion (like, say traffic class) into which service it sends a frame.
So this can't be the reason.

As for (B), it was suggested to me that 802.1Q that doesn't allow out of
order transmission within the same queue/traffic class. After all,
what's at play here is whether a single TX queue device can support FP
or not. Supporting FP would mean reordering PT frames relative to ET
frames.

I think this explanation is unsatisfactory too. Here's the only
reference I saw in 802.1Q to ordering guarantees. Basically those
guarantees are all *per priority*, and since PT/ET is also per priority,
I don't see why reordering there would violate this:

| 6.5.3 Frame misordering
| The MAC Service (IEEE Std 802.1AC) permits a negligible rate of
| reordering of frames with a given priority for a given combination of
| destination address, source address, and flow hash, if present,
| transmitted on a given VLAN.
| MA_UNITDATA.indication service primitives corresponding to
| MA_UNITDATA.request primitives, with the same requested priority and for
| the same combination of VLAN classification, destination address, source
| address, and flow hash, if present, are received in the same order as
| the request primitives were processed.

So for anything to make sense for me at all, I simply had to block out
that phrase from my mind. I'm posting the concern here, publicly, in
case someone can enlighten me.

>  - Question: would it be against the intention of the API to have a 1:1
>  map of priorities to queues?

No; as mentioned, going through mqprio's "map" and "queues" to resolve
the priority to a queue is something that I intended to be possible.
Sure, it's not handy either. It would have been handier if the
"admin-status" array was part of the tc-mqprio config, like you did in
your RFC.

But right now I'm trying to not close the possibility for single queue
devices (which won't implement mqprio) to support FP, since I haven't
seen anything convincing that would disprove such a hw design as
infeasible. If someone could share some compelling insight into this it
would be really appreciated.

>  - Deal breaker: fixed 8 prios;

idk, I don't think that's our biggest problem right now honestly.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-19  8:16 ` Kurt Kanzenbach
@ 2022-08-19 16:59   ` Vladimir Oltean
  2022-08-23 10:50     ` Kurt Kanzenbach
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-08-19 16:59 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil,
	Vinicius Costa Gomes, Xiaoliang Yang, Rui Sousa, Ferenc Fejes

Hi Kurt,

On Fri, Aug 19, 2022 at 10:16:20AM +0200, Kurt Kanzenbach wrote:
> On Wed Aug 17 2022, Vladimir Oltean wrote:
> > Vinicius' progress on upstreaming frame preemption support for Intel I226
> > seemed to stall, so I decided to give it a go using my own view as well.
> > https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/
> 
> Great to see progress on FPE :-).

Let's hope it lasts ;)

> > - Finally, the hardware I'm working with (here, the test vehicle is the
> >   NXP ENETC from LS1028A, although I have patches for the Felix switch
> >   as well, but those need a bit of a revolution in the driver to go in
> >   first). This hardware is not without its flaws, but at least allows me
> >   to concentrate on the UAPI portions for this series.
> >
> > I also have a kselftest written, but it's for the Felix switch (covers
> > forwarding latency) and so it's not included here.
> 
> What kind of selftest did you implement? So far I've been doing this:
> Using a cyclic real time application to create high priority frames and
> running iperf3 in parallel to simulate low priority traffic
> constantly. Afterwards, checking the NIC statistics for fragments and so
> on. Also checking the latency of the RT frames with FPE on/off.
> 
> BTW, if you guys need help with testing of patches, i do have access to
> i225 and stmmacs which both support FPE. Also the Hirschmann switches
> should support it.

Blah, I didn't want to spoil the surprise just yet. I am orchestrating 2
isochron senders at specific times, one of PT traffic and one of ET.

There are actually 2 variants of this: one is for endpoint FP and the
other is for bridge FP. I only had time to convert the bridge FP to
kselftest format; not the endpoint one (for enetc).

In the endpoint case, interference is created on the sender interface.
I compare HW TX timestamps to the expected TX times to calculate how
long it took until PT was preempted. I repeat the test millions of times
until I can plot the latencies having the PT <-> ET base time offset on
the X axis. It looks very cool.

The bridge case is similar, except for the fact that interference is
created on a bridge port going to a common receiver of 2 isochron
senders. What I plot is the path delay, and again, this shows actual
preemption times with a nanosecond resolution.

Here's a small snapshot of the main script. Not shown are the supporting
scripts for this:

#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Copyright 2022 NXP
#
# Selftest for Frame Preemption on a switch. Creates controlled packet
# collisions between a priority configured as preemptable traffic (PT) and a
# priority configured as express traffic (ET).
#
#                           +-----------------------------+
#                           |            fp.sh            |
#                           |             br0             |
#                           |     +--------+--------+     |
#                           |     |        |        |     |
#                           |   $swp1    $swp2    $swp3   |
#                           +-----------------------------+
#   +--------------------+        ^        ^        |       +----------------+
#   |                    |        |        |        |       |                |
#   | fp_node_et_send.sh |--------+        |        +------>| fp_node_rcv.sh |
#   |        $h1         |                 |                |       $h3      |
#   +--------------------+                 |                +----------------+
#                                          |
#   +--------------------+                 |
#   |                    |                 |
#   | fp_node_pt_send.sh |-----------------+
#   |        $h2         |
#   +--------------------+
#
# Normally, a packet collision plot with no frame preemption looks as follows.
# The first packet starts being transmitted by the MAC, and the second packet
# sees a latency proportional to how much transmission time of the previous
# packet there is left.
#
#        ^
#   ET   |
# frame  |        ---.
#  path  |       /    --.
# delay  |       |       --.
#        |       /          --.
#        |       |             --.
#        |      /                 --.
#        |      |                    ---.
#        |-----/                         -----------------------
#        |
#     ---+----------------------------------------------------------->
#        |      ET base time offset relative to PT
#
# Depending on the preemption point (where the express packet hits the
# preemptable packet that is undergoing transmission), the express packet will
# have a higher or lower path delay (latency).
#
# The packet collision can be broken into 3 distinct intervals:
# (a) when the packets start interfering, but not enough of the PT packet has
#     been transmitted yet so as to preempt it (MAC merge fragments must still
#     obey the minimum Ethernet frame size rule). There are no frame
#     preemptions when packets collide here.
# (b) the preemptable area, where the latency should be equal to the
#     transmission time of a MAC merge fragment rather than a full packet
#     (preemption should happen right away).
# (c) where the packets still interfere, but there are not enough bytes of the
#     PT frame left to transmit so as to construct a valid fragment out of the
#     remainder. No preemptions are expected in this region.
#
#        ^
#   ET   |      a         b            c
# frame  |   <----><---------------><---->
#  path  |
# delay  |
#        |
#        |       /|
#        |      /  -----------------.
#        |      |                    ---.
#        |-----/                         -----------------------
#        |
#     ---+----------------------------------------------------------->
#        |      ET base time offset relative to PT
#
# The selftest evaluates the preemption performance of a given switch by
# plotting the ET latency as a function of the ET base time offset.

WAIT_TIME=1
NUM_NETIFS=3
NETIF_CREATE=no
REQUIRE_MZ=no
lib_dir=$(dirname $0)/../../../net/forwarding
source $lib_dir/lib.sh
source $lib_dir/tsn_lib.sh
source $(dirname $0)/fp_topology.sh

require_command gnuplot

swp1=${NETIFS[p1]}
swp2=${NETIFS[p2]}
swp3=${NETIFS[p3]}

signal_received=false

ethtool_save_fp_admin_status()
{
	admin_status=$(ethtool --json --show-fp $1 | \
		jq "(.[].\"parameter-table\"[] | select(.prio == $2)).\"admin-status\"")
}

ethtool_restore_fp_admin_status()
{
	case $admin_status in
	express)
		ethtool --set-fp $1 admin-status $2:E
		;;
	preemptable)
		ethtool --set-fp $1 admin-status $2:P
		;;
	esac
}

avg_pdelay()
{
	local file=$1
	local tmp=$(mktemp)

	printf "print((" > $tmp
	isochron report \
		--input-file "${file}" \
		--printf-format "print(\"{} + \".format(%d - %d), end='')\n" \
		--printf-args "RT" | python3 - >> $tmp
	printf "0) / $NUM_FRAMES)" >> $tmp
	cat $tmp | python3 -
	rm -f $tmp
}

validate_coordination()
{
	local et=$1; shift
	local pt=$1; shift
	local et_start
	local pt_start

	et_start=$(($(isochron report --input-file $et \
		--printf-format "%u - %u" --printf-args "SB" --stop 1)))
	pt_start=$(($(isochron report --input-file $pt \
		--printf-format "%u - %u" --printf-args "SB" --stop 1)))
	if ! [ $et_start = $pt_start ]; then
		printf "ET test %s / PT test %s: ET starts at %s, PT starts at %s\n" \
			"$et" "$pt" \
			"$(isochron report --input-file $et \
				--printf-format "%T" --printf-args "S" --stop 1)" \
			"$(isochron report --input-file $pt \
				--printf-format "%T" --printf-args "S" --stop 1)"
		exit 1
	fi
}

plot()
{
	local pt_frame_size=$1; shift
	local num_tests=$1; shift
	local fp_label=$1; shift
	local pt_svg="plots/pt-${pt_frame_size}-${fp_label}-pt.svg"
	local pt_plot="plots/pt-${pt_frame_size}-${fp_label}-pt.plot"
	local et_svg="plots/pt-${pt_frame_size}-${fp_label}-et.svg"
	local et_plot="plots/pt-${pt_frame_size}-${fp_label}-et.plot"
	local pt_avg_pdelay
	local et_avg_pdelay
	local base_time
	local et
	local pt
	local i

	printf "Plotting charts for collision between PT@%d and ET@%d\n" \
		"${pt_frame_size}" "${ET_FRAME_SIZE}"

	mkdir -p plots
	rm -f $et_plot $pt_plot

	for ((i = 0; i < num_tests; i++)); do
		et="reports/pt-${pt_frame_size}-test-${i}-${fp_label}-et.dat"
		pt="reports/pt-${pt_frame_size}-test-${i}-${fp_label}-pt.dat"

		base_time=$(isochron report --input-file "${et}" \
				--printf-format "%u" --printf-args "B" \
				--stop 1)

		pt_avg_pdelay=$(avg_pdelay $pt)
		et_avg_pdelay=$(avg_pdelay $et)

		echo "$base_time $pt_avg_pdelay" >> $pt_plot
		echo "$base_time $et_avg_pdelay" >> $et_plot

		if [ "$signal_received" = "true" ]; then
			exit 1
		fi
	done

	gnuplot --persist <<- EOF
		set xlabel "Base time offset (ns)"
		set ylabel "Average PT path delay (ns)"
		set term svg
		set terminal svg enhanced background rgb "white"
		set output "${pt_svg}"
		plot "${pt_plot}" using 1:2 with lines title "PT latency"
	EOF

	gnuplot --persist <<- EOF
		set xlabel "Base time offset (ns)"
		set ylabel "Average ET path delay (ns)"
		set term svg
		set terminal svg enhanced background rgb "white"
		set output "${et_svg}"
		plot "${et_plot}" using 1:2 with lines title "ET latency"
	EOF

	rm -f ${et_plot} ${pt_plot}
}

test_collision()
{
	local test_num=$1; shift
	local pt_frame_size=$1; shift
	local base_time_offset=$1; shift
	local fp_label=$1; shift
	local vrf_name=$(master_name_get br0)
	local et="reports/pt-${pt_frame_size}-test-${test_num}-${fp_label}-et.dat"
	local pt="reports/pt-${pt_frame_size}-test-${test_num}-${fp_label}-pt.dat"
	local orchestration=$(mktemp)
	local et_extra_args
	local pt_extra_args

	if [ "${H1_REQUIRE_PTP4L}" = "no" ]; then
		et_extra_args="${et_extra_args} --omit-sync"
	fi

	if [ "${H2_REQUIRE_PTP4L}" = "no" ]; then
		pt_extra_args="${pt_extra_args} --omit-sync"
	fi

	if [ "${H3_REQUIRE_PTP4L}" = "no" ]; then
		et_extra_args="${et_extra_args} --omit-remote-sync"
		pt_extra_args="${pt_extra_args} --omit-remote-sync"
	fi

	mkdir -p reports

	printf "Collision between PT packets of size %d and ET packets of size %d at %s, test %d (base time offset %d)\n" \
		${pt_frame_size} ${ET_FRAME_SIZE} ${fp_label} ${test_num} ${base_time_offset}

	cat <<- EOF > ${orchestration}
	[ET]
	host = $H1_IPV4%$vrf_name
	port = $H1_PORT
	exec = isochron send \\
		--client $H3_IPV4%$vrf_name \\
		--stats-port $H3_ET_PORT \\
		--interface $ET_IF_NAME \\
		--unix-domain-socket $UDS_ADDRESS_H1 \\
		--num-frames $NUM_FRAMES \\
		--priority $ET_PRIO \\
		--vid 0 \\
		--base-time ${base_time_offset} \\
		--cycle-time ${CYCLE_TIME_NS} \\
		--frame-size $ET_FRAME_SIZE \\
		--sync-threshold $SYNC_THRESHOLD_NS \\
		--cpu-mask $((1 << ${ISOCHRON_CPU})) \\
		--sched-fifo \\
		--sched-priority 98 \\
		--etype 0xdead \\
		--txtime \\
		${et_extra_args} \\
		--output-file ${et}

	[PT]
	host = $H2_IPV4%$vrf_name
	port = $H2_PORT
	exec = isochron send \\
		--client $H3_IPV4%$vrf_name \\
		--stats-port $H3_PT_PORT \\
		--interface $PT_IF_NAME \\
		--unix-domain-socket $UDS_ADDRESS_H2 \\
		--num-frames $NUM_FRAMES \\
		--priority $PT_PRIO \\
		--vid 0 \\
		--base-time 0.000000000 \\
		--cycle-time ${CYCLE_TIME_NS} \\
		--frame-size $pt_frame_size \\
		--sync-threshold $SYNC_THRESHOLD_NS \\
		--cpu-mask $((1 << ${ISOCHRON_CPU})) \\
		--sched-fifo \\
		--sched-priority 98 \\
		--etype 0xdeaf \\
		--txtime \\
		${pt_extra_args} \\
		--output-file ${pt}
	EOF

	isochron orchestrate --input-file ${orchestration}

	rm -rf ${orchestration}

	validate_coordination $et $pt
}

test_collision_sweep_base_time()
{
	local pt_frame_size=$1; shift
	local base_time_start=$1; shift
	local base_time_stop=$1; shift
	local base_time_increment=$1; shift
	local fp_label=$1; shift
	local num_tests
	local i

	num_tests=$(((base_time_stop - base_time_start + 1) / base_time_increment))
	for ((i = 0; i < num_tests; i++)); do
		base_time=$((base_time_start + i * base_time_increment))

		test_collision "$i" "$pt_frame_size" "$base_time" "$fp_label"

		if [ "$signal_received" = "true" ]; then
			exit 1
		fi
	done

	plot $pt_frame_size $num_tests $fp_label
}

test_pt_124()
{
	local start=$1; shift
	local stop=$1; shift
	local fp_label=$1; shift

	test_collision_sweep_base_time 124 $start $stop 40 $fp_label
}

test_pt_300()
{
	local start=$1; shift
	local stop=$1; shift
	local fp_label=$1; shift

	test_collision_sweep_base_time 300 $start $stop 40 $fp_label
}

test_pt_600()
{
	local start=$1; shift
	local stop=$1; shift
	local fp_label=$1; shift

	test_collision_sweep_base_time 600 $start $stop 40 $fp_label
}

test_pt_1000()
{
	local start=$1; shift
	local stop=$1; shift
	local fp_label=$1; shift

	test_collision_sweep_base_time 1000 $start $stop 40 $fp_label
}

test_pt_1500()
{
	local start=$1; shift
	local stop=$1; shift
	local fp_label=$1

	test_collision_sweep_base_time 1500 $start $stop 40 $fp_label
}

test_no_fp_cut_through()
{
	local fp_label="no-fp"

	test_pt_124 0 1800 $fp_label
	test_pt_300 0 3400 $fp_label
	test_pt_600 0 7000 $fp_label
	test_pt_1000 1500 12000 $fp_label
	test_pt_1500 3000 17000 $fp_label
}

test_fp_quick()
{
	ethtool_save_fp_admin_status $swp3 $PT_PRIO
	ethtool --set-mm $swp3 verify-disable off enabled on add-frag-size 0
	ethtool --set-fp $swp3 admin-status $PT_PRIO:P

	test_pt_1500 12000 26000 "quick"

	ethtool_restore_fp_admin_status $swp3 $PT_PRIO
}

test_fp()
{
	local fp_label=$1; shift
	local add_frag_size=$1; shift

	ethtool_save_fp_admin_status $swp3 $PT_PRIO
	ethtool --set-mm $swp3 verify-disable off enabled on add-frag-size $add_frag_size
	ethtool --set-fp $swp3 admin-status $PT_PRIO:P

	test_pt_124 200 2000 $fp_label
	test_pt_300 1500 5000 $fp_label
	test_pt_600 4500 10000 $fp_label
	test_pt_1000 8800 17000 $fp_label
	test_pt_1500 12000 26000 $fp_label

	ethtool_restore_fp_admin_status $swp3 $PT_PRIO
}

test_fp_add_frag_size_0()
{
	test_fp "add-frag-size-0" 0
}

test_fp_add_frag_size_1()
{
	test_fp "add-frag-size-1" 1
}

test_fp_add_frag_size_2()
{
	test_fp "add-frag-size-2" 2
}

test_fp_add_frag_size_3()
{
	test_fp "add-frag-size-2" 3
}

setup_prepare()
{
	vrf_prepare

	switch_create
	ptp4l_start "$swp1 $swp2 $swp3" false ${UDS_ADDRESS_SWITCH}
	phc2sys_start ${UDS_ADDRESS_SWITCH}
}

cleanup()
{
	pre_cleanup

	ptp4l_stop "$swp1 $swp2 $swp3"
	phc2sys_stop
	switch_destroy

	vrf_cleanup
}

signal()
{
	signal_received=true
}

trap cleanup EXIT
trap signal SIGTERM
trap signal SIGINT

#	test_no_fp_cut_through
#	test_fp_add_frag_size_0
#	test_fp_add_frag_size_1
#	test_fp_add_frag_size_2
#	test_fp_add_frag_size_3
ALL_TESTS="
	test_fp_quick
"

setup_prepare
setup_wait

tests_run

exit $EXIT_STATUS

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-19 16:59   ` Vladimir Oltean
@ 2022-08-23 10:50     ` Kurt Kanzenbach
  0 siblings, 0 replies; 37+ messages in thread
From: Kurt Kanzenbach @ 2022-08-23 10:50 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil,
	Vinicius Costa Gomes, Xiaoliang Yang, Rui Sousa, Ferenc Fejes

[-- Attachment #1: Type: text/plain, Size: 2680 bytes --]

On Fri Aug 19 2022, Vladimir Oltean wrote:
> Hi Kurt,
>
> On Fri, Aug 19, 2022 at 10:16:20AM +0200, Kurt Kanzenbach wrote:
>> On Wed Aug 17 2022, Vladimir Oltean wrote:
>> > Vinicius' progress on upstreaming frame preemption support for Intel I226
>> > seemed to stall, so I decided to give it a go using my own view as well.
>> > https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/
>> 
>> Great to see progress on FPE :-).
>
> Let's hope it lasts ;)
>
>> > - Finally, the hardware I'm working with (here, the test vehicle is the
>> >   NXP ENETC from LS1028A, although I have patches for the Felix switch
>> >   as well, but those need a bit of a revolution in the driver to go in
>> >   first). This hardware is not without its flaws, but at least allows me
>> >   to concentrate on the UAPI portions for this series.
>> >
>> > I also have a kselftest written, but it's for the Felix switch (covers
>> > forwarding latency) and so it's not included here.
>> 
>> What kind of selftest did you implement? So far I've been doing this:
>> Using a cyclic real time application to create high priority frames and
>> running iperf3 in parallel to simulate low priority traffic
>> constantly. Afterwards, checking the NIC statistics for fragments and so
>> on. Also checking the latency of the RT frames with FPE on/off.
>> 
>> BTW, if you guys need help with testing of patches, i do have access to
>> i225 and stmmacs which both support FPE. Also the Hirschmann switches
>> should support it.
>
> Blah, I didn't want to spoil the surprise just yet. I am orchestrating 2
> isochron senders at specific times, one of PT traffic and one of ET.
>
> There are actually 2 variants of this: one is for endpoint FP and the
> other is for bridge FP. I only had time to convert the bridge FP to
> kselftest format; not the endpoint one (for enetc).
>
> In the endpoint case, interference is created on the sender interface.
> I compare HW TX timestamps to the expected TX times to calculate how
> long it took until PT was preempted. I repeat the test millions of times
> until I can plot the latencies having the PT <-> ET base time offset on
> the X axis. It looks very cool.
>
> The bridge case is similar, except for the fact that interference is
> created on a bridge port going to a common receiver of 2 isochron
> senders. What I plot is the path delay, and again, this shows actual
> preemption times with a nanosecond resolution.

That makes a lot of sense and this kind of test scenario should work for
other endpoints implementations such as igc and stmmac too.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-19 16:12     ` Vladimir Oltean
@ 2022-08-24  0:35       ` Vinicius Costa Gomes
  2022-09-07 20:57         ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-24  0:35 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Wed, Aug 17, 2022 at 04:15:11PM -0700, Vinicius Costa Gomes wrote:
>> I liked that in the API sense, using this "prio" concept we gain more
>> flexibility, and we can express better what the hardware you work with
>> can do, i.e. priority (for frame preemption purposes?) and queues are
>> orthogonal.
>> 
>> The problem I have is that the hardware I work with is more limited (as
>> are some stmmac-based NICs that I am aware of) frame preemption
>> capability and "priority" are attached to queues.
>
> Don't get me wrong, enetc also has "priority" attached to TX rings
> (enetc_set_bdr_prio). Then there is another register which maps a TX
> priority to a traffic class. This could be altered by the "map" property
> of tc-mqprio, but in practice it isn't. The only supported prio->tc map
> is "map 0 1 2 3 4 5 6 7".
>
> This is to say, enetc does not have a per-packet priority that gets
> passed via BD metadata, but rather, packets inherit the configured
> priority of the ring (Linux TX queue).
>

I see. Thanks for the explanation. Intel hardware (client/"not data
center", that is) is simpler, we only have one level of "ordering", and
we usually don't change the default: lowest queue index (0) has highest
priority, (1) has lower, and so on.

>> From the API perspective, it seems that I could say that "fp-prio" 0 is
>> associated with queue 0, fp-prio 1 to queue 1, and so on, and everything
>> will work.
>
> You have the tc-mqprio "queues" and "map" to juggle with, to end up
> configuring FP per queue based on the provided per-priority settings.
>
>> The only thing that I am not happy at all is that there are exactly 8
>> fp-prios.
>> 
>> The Linux network stack is more flexible than what 802.1Q defines, think
>> skb->priority, number of TCs, as you said earlier, I would hate to
>> impose some almost artificial limits here. And in future we are going to
>> see TSN enabled devices with lots more queues.
>
> ~artificial limits~
>
> IEEE 802.1Q says:
>
> | 12.30.1.1 framePreemptionStatusTable structure and data types
> | 
> | The framePreemptionStatusTable (6.7.2) consists of 8 framePreemptionAdminStatus
> | values (12.30.1.1.1), one per priority.
>
> I'm more concerned about setting things straight than about the limits
> right now. I don't yet think everything is quite ok in that regard.
> The netlink format proposed here is in principle extensible for
> priorities > 7, if that will ever make sense.

Yes, as the limits are not in the UAPI this is a minor point, agreed.

What I am concerned is about use cases not handled by IEEE 802.1Q, for
example, thinking about that other thread about PCP/DSCP: how about a
network that wants to have as many priorities as the DSCP field (6 bits)
allows, together with frame preemption (in the future we may have
hardware that supports that). I am only thinking that we should not
close that door.

Again, minor point.

>
>> 
>> In short:
>>  - Comment: this section of the RFC is hardware independent, this
>>  behavior of queues and priorities being orthogonal is only valid for
>>  some implementations;
>
> Yes, and IMO it can only be that way, if I were to be truthful to my
> interpretation of the intention 802.1Q spec (please contradict me if you
> have a different interpretation of it!).
>

Sorry if I made you go that deep in the spec.

I was only trying to say that for some implementations it's
difficult/impossible to have totally orthogonal priorities vs. queue
assignments, and that the commit message needs to take that into account
(because the API introduced in this commit is implementation
independent).

> Note that I do see contradictions in 802.1Q, and I don't know how to
> reconcile them. I've suppressed some of them for lack of a logical
> explanation. I'm mentioning this for transparency; I don't know
> everything either, but I need to make something out of what I do know.
>
> So 802.1Q says this:
>
> | 12.30.1.1.1 framePreemptionAdminStatus
> |
> | This parameter is the administrative value of the preemption status for
> | the priority. It takes value express if frames queued for the priority
> | are to be transmitted using the express service for the Port, or
> | preemptible if frames queued for the priority are to be transmitted
> | using the preemptible service for the Port and preemption is enabled for
> | the Port.
>
> So far so good. In 802.1Q definitions, priority is attached to a packet
> rather than to a traffic class / queue. But then the same clause continues:
>
> | Priorities that all map to the same traffic class should be constrained
> | to use the same value of preemption status.
>
> Which seems to throw a wrench into everything.
>
> It raises two questions:
>
> (A) why is AdminStatus not per traffic class then?
> (B) why is the constraint there, what's it trying to protect against?
>
> I've asked around, and I got unsatisfactory answers to both questions.
>
> A seemingly competent answer given to (A) by Rui (CCed) is that the
> eMAC/pMAC selection on TX actually takes place in the MAC layer, in what
> is called by 802.1AC-2016 "MA_UNITDATA.request" (and what everybody else
> calls "MAC client xmit"). The parameters of this MAC service primitive
> are:
>
> MA_UNITDATA.request(destination_address,
> 		    source_address,
> 		    mac_service_data_unit,
> 		    priority)
>
> So since only "priority" is passed to the MAC service (and traffic class isn't),
> this means that the MAC service can only steer packets to eMAC/pMAC
> based on what it's given (i.e. priority).
>
> But upon closer investigation, this explanation doesn't appear to hold
> water very well. This is because clause 6.7.1 Support of the ISS by IEEE
> Std 802.3 (Ethernet) from 802.1Q says:
>
> | If frame preemption (6.7.2) is supported on a Port, then the IEEE 802.3
> | MAC provides the following two MAC service interfaces (99.4 of IEEE Std
> | 802.3br™-2016 [B21]):
> |
> | a) A preemptible MAC (pMAC) service interface
> | b) An express MAC (eMAC) service interface
> |
> | For priority values that are identified in the frame preemption status
> | table (6.7.2) as preemptible, frames that are selected for transmission
> | shall be transmitted using the pMAC service instance, and for priority
> | values that are identified in the frame preemption status table as
> | express, frames that are selected for transmission shall be transmitted
> | using the eMAC service instance.
> | In all other respects, the Port behaves as if it is supported by a
> | single MAC service interface. In particular, all frames received by the
> | Port are treated as if they were received on a single MAC service
> | interface regardless of whether they were received on the eMAC service
> | interface or the pMAC service interface, except with respect to frame
> | preemption.
>
> So there you go, the MAC has to provide *two* service interfaces, so the
> 802.1Q upper layer (client of both) can just decide based on an internal
> criterion (like, say traffic class) into which service it sends a frame.
> So this can't be the reason.
>
> As for (B), it was suggested to me that 802.1Q that doesn't allow out of
> order transmission within the same queue/traffic class. After all,
> what's at play here is whether a single TX queue device can support FP
> or not. Supporting FP would mean reordering PT frames relative to ET
> frames.
>
> I think this explanation is unsatisfactory too. Here's the only
> reference I saw in 802.1Q to ordering guarantees. Basically those
> guarantees are all *per priority*, and since PT/ET is also per priority,
> I don't see why reordering there would violate this:
>
> | 6.5.3 Frame misordering
> | The MAC Service (IEEE Std 802.1AC) permits a negligible rate of
> | reordering of frames with a given priority for a given combination of
> | destination address, source address, and flow hash, if present,
> | transmitted on a given VLAN.
> | MA_UNITDATA.indication service primitives corresponding to
> | MA_UNITDATA.request primitives, with the same requested priority and for
> | the same combination of VLAN classification, destination address, source
> | address, and flow hash, if present, are received in the same order as
> | the request primitives were processed.
>
> So for anything to make sense for me at all, I simply had to block out
> that phrase from my mind. I'm posting the concern here, publicly, in
> case someone can enlighten me.
>

I, truly, admire your effort (and the time you took) and your
explanation. I think I understand where you come from better. Thank you.

Taking a few steps back (I don't like this expression, but anyway), I
think that strict conformance is not an objective of the Linux network
stack as a whole. What I mean by "not strict": that it doesn't try to
follow the letter of the "baggy pants" model that IEEE 802.* and friends
define. Being efficient/useful has a higher importance.

But being certifiable is a target/ideal, i.e. the "on the wire" and user
visible (statistics, etc) behavior can/"should be able" to be configured
to pass conformance tests or even better, interoperability tests.

Now back on track, in my mental model, this simplifies things to a
couple of questions that I ask myself about this RFC, in particular
about this priority idea:
 - Is this idea useful? My answer is yes.
 - Can this idea be used to configure a certifiable system? Also, yes.

So, I have no problems with it.

>>  - Question: would it be against the intention of the API to have a 1:1
>>  map of priorities to queues?
>
> No; as mentioned, going through mqprio's "map" and "queues" to resolve
> the priority to a queue is something that I intended to be possible.
> Sure, it's not handy either. It would have been handier if the
> "admin-status" array was part of the tc-mqprio config, like you did in
> your RFC.

Just to be sure, say that my plan is that I document that for the igc
driver, the mapping of priorities for the "FP" command is prio (0) ->
queue (0), prio (1) -> queue (1), and so on. Note that the mqprio
mapping could be that "skb->priority" 10 goes to TC 1 which is mapped to
queue 0. Would this be ok?

>
> But right now I'm trying to not close the possibility for single queue
> devices (which won't implement mqprio) to support FP, since I haven't
> seen anything convincing that would disprove such a hw design as
> infeasible. If someone could share some compelling insight into this it
> would be really appreciated.
>

The important part is that you didn't close that door. If someone is
brave enough to design that hw, it seems that it can be made to work,
and that's good enough.

>>  - Deal breaker: fixed 8 prios;
>
> idk, I don't think that's our biggest problem right now honestly.

Perhaps I am being too optimistic, but the way I see, your proposal is
already expressible enough to be used in a "certifiable" system. I was
too harsh with "deal breaker.

So, what is the biggest problem that you see?

Overall, no issues from my side.

Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-08-24  0:35       ` Vinicius Costa Gomes
@ 2022-09-07 20:57         ` Vladimir Oltean
  2022-09-10  0:19           ` Vinicius Costa Gomes
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-09-07 20:57 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

On Tue, Aug 23, 2022 at 05:35:11PM -0700, Vinicius Costa Gomes wrote:
> Yes, as the limits are not in the UAPI this is a minor point, agreed.
> 
> What I am concerned is about use cases not handled by IEEE 802.1Q, for
> example, thinking about that other thread about PCP/DSCP: how about a
> network that wants to have as many priorities as the DSCP field (6 bits)
> allows, together with frame preemption (in the future we may have
> hardware that supports that). I am only thinking that we should not
> close that door.
> 
> Again, minor point.

Since Linux seems to have settled on TC_PRIO_MAX, TC_QOPT_MAX_QUEUE
constants of 16 rather than the 8 you'd expect, I guess I don't have
that big of a problem to also expose 16 admin-status values for FP.
I can limit the drivers I have to only look at the first 8 entries.
But what do 16 priorities mean to ethtool? 16 priorities are a tc thing.
Does ethtool even have a history of messing with packet priorities?
What am I even doing? lol.

> Sorry if I made you go that deep in the spec.
> 
> I was only trying to say that for some implementations it's
> difficult/impossible to have totally orthogonal priorities vs. queue
> assignments, and that the commit message needs to take that into account
> (because the API introduced in this commit is implementation
> independent).

Yes, the comment is totally valid. However, between "difficult" and
"impossible" there is a big leap, and if it is really impossible, I'd
like to know precisely why.

> I, truly, admire your effort (and the time you took) and your
> explanation. I think I understand where you come from better. Thank you.
> 
> Taking a few steps back (I don't like this expression, but anyway), I
> think that strict conformance is not an objective of the Linux network
> stack as a whole. What I mean by "not strict": that it doesn't try to
> follow the letter of the "baggy pants" model that IEEE 802.* and friends
> define. Being efficient/useful has a higher importance.
> 
> But being certifiable is a target/ideal, i.e. the "on the wire" and user
> visible (statistics, etc) behavior can/"should be able" to be configured
> to pass conformance tests or even better, interoperability tests.
> 
> Now back on track, in my mental model, this simplifies things to a
> couple of questions that I ask myself about this RFC, in particular
> about this priority idea:
>  - Is this idea useful? My answer is yes.
>  - Can this idea be used to configure a certifiable system? Also, yes.
> 
> So, I have no problems with it.
> 

I've taken quite a few steps back now, unfortunately I'm still back to
where I was :)

I've talked to more people within NXP, explained to them that the
standard technically does not disallow the existence of FP for single
queue devices, for this and that reason. The only responses I got varied
from the equivalent of a frightened "brrr", to a resounding "no, that
isn't what was intended". Of course I tried to dig deeper, and I was
told that the configuration is per (PCP) priority because this is the
externally visible label carried by packets, so an external management
system that has to coordinate FP across a LAN does not need to know how
many traffic classes each node has, or how the priorities map to the
traffic classes. Only the externally visible behavior is important,
which is that packets with priority (PCP) X are sent on the wire using a
preemptable SFD or a normal SFD, because the configuration also affects
what the other nodes in the network expect. You might contrast this with
tc-taprio, where the open/closed gates really are defined per traffic
class, and you might wonder why it wasn't important there for the
configuration to also be handled using the externally-visible priority
(PCP) as input. Yes, but with tc-taprio, you can schedule within your
traffic classes all you want, but this does not change the externally
visible appearance of a frame, so it doesn't matter. The scheduling is
orthogonal to whether a packet will be sent as preemptable or not.

Is this explanation satisfactory, and where does it leave us? For me it
isn't, and it leaves me nowhere new, but it's the best I got.

So ok, single TX queue devices with FP probably are possibly a fun
physics class experiment, but practically minded vendors won't implement
them. But does the alternate justification given change my design decision
in any way (i.e. expose "preemptable priorities" in ethtool as opposed
to "preemptable queues" in tc-mqprio and tc-taprio as you did)? No.
It just becomes a matter of enforcing the recommended restriction that
preemptable and express priorities shouldn't be mixed within the same
traffic class, something which my current patch set does not do.

[ yes, "should" is a technical term in IEEE standards which means "not
  mandatory", and that's precisely how the constraint from 12.30.1.1.1
  was formulated ]

I guess when push comes to shove, somebody will have to answer the
question of why was FP exposed in Linux by something other than what the
standard defined it for (prio), and I wouldn't know how to answer that.

> >>  - Question: would it be against the intention of the API to have a 1:1
> >>  map of priorities to queues?
> >
> > No; as mentioned, going through mqprio's "map" and "queues" to resolve
> > the priority to a queue is something that I intended to be possible.
> > Sure, it's not handy either. It would have been handier if the
> > "admin-status" array was part of the tc-mqprio config, like you did in
> > your RFC.
> 
> Just to be sure, say that my plan is that I document that for the igc
> driver, the mapping of priorities for the "FP" command is prio (0) ->
> queue (0), prio (1) -> queue (1), and so on. Note that the mqprio
> mapping could be that "skb->priority" 10 goes to TC 1 which is mapped to
> queue 0. Would this be ok?

If skb->priority 10 goes to TC 1 via the prio_tc_map, and finally lands
in queue 0 where your actual FP knob is, then I'd expect the driver to
follow the reverse path between queue -> tc -> prios mapped to that tc.
And I would not state that prio 0 is queue 0 and skb->priority values
from the whole spectrum may land there, kthxbye. Also, didn't you say
earlier that "lowest queue index (0) has highest priority"? In Linux,
skb->priority 0 has lower priority that skb->priority 1. How does that
work out?

In fact, for both enetc and felix, I have to do this exact thing as you,
except that my hardware knobs are halfway in the path (per tc, not per
queue). The reason why I'm not doing it is because we only consider the
1:1 prio_tc_map, so we use "prio" and "tc" interchangeably.

It can't even be any other way given the current code, since the struct
tc_taprio_qopt_offload passed via ndo_setup_tc does not even contain the
tc_mqprio_qopt sub-structure with the prio_tc_map. Drivers are probably
expected to look, from the ndo_setup_tc hook, at their dev->prio_tc_map
as changed by netdev_set_prio_tc_map() in taprio_change(). Why this API
is different compared to struct tc_mqprio_qopt_offload, I don't know.

What prevents you from doing this? I'd like to understand as precisely
as you can explain, to see how what I'm proposing really sounds when
applied to Intel hardware.

> >>  - Deal breaker: fixed 8 prios;
> >
> > idk, I don't think that's our biggest problem right now honestly.
> 
> Perhaps I am being too optimistic, but the way I see, your proposal is
> already expressible enough to be used in a "certifiable" system. I was
> too harsh with "deal breaker.
> 
> So, what is the biggest problem that you see?
> 
> Overall, no issues from my side.

Let me recap my action items out loud for v2 (some of them weren't
discussed publicly per se).

* Enforce the constraint recommended by 12.30.1.1.1, somewhere as a
  static inline helper function that can be called by drivers, from the
  tc-taprio/tc-mqprio and ethtool-fp code paths. Note that I could very
  quickly run into a very deep rabbit hole here, if I actually bother to
  offload to hardware (enetc) the prio_tc_map communicated from tc-mqprio.
  I'll try to avoid that. I noticed that the mqprio_parse_opt() function
  doesn't validate the provided queue offsets and count if hw offload is
  requested (therefore allowing overlapping queue ranges). I really
  dislike that, because it was probably done for a reason.

* Port the isochron script I have for enetc (endpoint) FP latency
  measurements to kselftest format.

* Do something such that eMAC/pMAC counters are also summed up somewhere
  centrally by ethtool, and reported to the user either individually or
  aggregated.

* Reorganize the netlink UAPI such as to remove ETHTOOL_A_FP_PARAM_TABLE
  and just have multiple ETHTOOL_A_FP_PARAM_ENTRY nests in the parent.

* The enetc verification state machine is off by default. The felix
  verification state machine is on by default. I guess we should figure
  out a reasonable default for all drivers out there?

* For some reason, I notice that the verification state machine remains
  FAILED on felix when it links to enetc and that has verification
  enabled. I need to disable it on enetc, enable it again, and only then
  will the MAC merge interrupt on felix say that verification completed
  successfully. I think I'll need to put a quirk in enetc_pl_mac_link_up()
  to toggle verification off and on, if it was requested, in order for
  this to be seamless to the user.

If it's only these, nothing seems exactly major. However, I assume this
is only the beginning. I fully expect there to be one more round of
"why not dcbnl? why not tc?" nitpicks, for which I'm not exactly
mentally ready (if credible arguments are to be put forward, I'd rather
have them now while I haven't yet put in the extra work in the direction
I'm about to).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-09-07 20:57         ` Vladimir Oltean
@ 2022-09-10  0:19           ` Vinicius Costa Gomes
  2022-09-10 16:36             ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Vinicius Costa Gomes @ 2022-09-10  0:19 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Tue, Aug 23, 2022 at 05:35:11PM -0700, Vinicius Costa Gomes wrote:
>> Yes, as the limits are not in the UAPI this is a minor point, agreed.
>> 
>> What I am concerned is about use cases not handled by IEEE 802.1Q, for
>> example, thinking about that other thread about PCP/DSCP: how about a
>> network that wants to have as many priorities as the DSCP field (6 bits)
>> allows, together with frame preemption (in the future we may have
>> hardware that supports that). I am only thinking that we should not
>> close that door.
>> 
>> Again, minor point.
>
> Since Linux seems to have settled on TC_PRIO_MAX, TC_QOPT_MAX_QUEUE
> constants of 16 rather than the 8 you'd expect, I guess I don't have
> that big of a problem to also expose 16 admin-status values for FP.
> I can limit the drivers I have to only look at the first 8 entries.
> But what do 16 priorities mean to ethtool? 16 priorities are a tc thing.
> Does ethtool even have a history of messing with packet priorities?
> What am I even doing? lol.
>

That's a good point. ethtool doesn't has a history of knowing about
priorities (and what they even mean). But I like the flexibility that
this approach provides.

>> Sorry if I made you go that deep in the spec.
>> 
>> I was only trying to say that for some implementations it's
>> difficult/impossible to have totally orthogonal priorities vs. queue
>> assignments, and that the commit message needs to take that into account
>> (because the API introduced in this commit is implementation
>> independent).
>
> Yes, the comment is totally valid. However, between "difficult" and
> "impossible" there is a big leap, and if it is really impossible, I'd
> like to know precisely why.
>

Sometimes the enemy of good is perfect (in this case precisely). Perhaps
aiming for "good enough" understanding?

>> I, truly, admire your effort (and the time you took) and your
>> explanation. I think I understand where you come from better. Thank you.
>> 
>> Taking a few steps back (I don't like this expression, but anyway), I
>> think that strict conformance is not an objective of the Linux network
>> stack as a whole. What I mean by "not strict": that it doesn't try to
>> follow the letter of the "baggy pants" model that IEEE 802.* and friends
>> define. Being efficient/useful has a higher importance.
>> 
>> But being certifiable is a target/ideal, i.e. the "on the wire" and user
>> visible (statistics, etc) behavior can/"should be able" to be configured
>> to pass conformance tests or even better, interoperability tests.
>> 
>> Now back on track, in my mental model, this simplifies things to a
>> couple of questions that I ask myself about this RFC, in particular
>> about this priority idea:
>>  - Is this idea useful? My answer is yes.
>>  - Can this idea be used to configure a certifiable system? Also, yes.
>> 
>> So, I have no problems with it.
>> 
>
> I've taken quite a few steps back now, unfortunately I'm still back to
> where I was :)
>
> I've talked to more people within NXP, explained to them that the
> standard technically does not disallow the existence of FP for single
> queue devices, for this and that reason. The only responses I got varied
> from the equivalent of a frightened "brrr", to a resounding "no, that
> isn't what was intended". Of course I tried to dig deeper, and I was
> told that the configuration is per (PCP) priority because this is the
> externally visible label carried by packets, so an external management
> system that has to coordinate FP across a LAN does not need to know how
> many traffic classes each node has, or how the priorities map to the
> traffic classes. Only the externally visible behavior is important,
> which is that packets with priority (PCP) X are sent on the wire using a
> preemptable SFD or a normal SFD, because the configuration also affects
> what the other nodes in the network expect. You might contrast this with
> tc-taprio, where the open/closed gates really are defined per traffic
> class, and you might wonder why it wasn't important there for the
> configuration to also be handled using the externally-visible priority
> (PCP) as input. Yes, but with tc-taprio, you can schedule within your
> traffic classes all you want, but this does not change the externally
> visible appearance of a frame, so it doesn't matter. The scheduling is
> orthogonal to whether a packet will be sent as preemptable or not.
>
> Is this explanation satisfactory, and where does it leave us? For me it
> isn't, and it leaves me nowhere new, but it's the best I got.
>
> So ok, single TX queue devices with FP probably are possibly a fun
> physics class experiment, but practically minded vendors won't implement
> them. But does the alternate justification given change my design decision
> in any way (i.e. expose "preemptable priorities" in ethtool as opposed
> to "preemptable queues" in tc-mqprio and tc-taprio as you did)? No.
> It just becomes a matter of enforcing the recommended restriction that
> preemptable and express priorities shouldn't be mixed within the same
> traffic class, something which my current patch set does not do.
>
> [ yes, "should" is a technical term in IEEE standards which means "not
>   mandatory", and that's precisely how the constraint from 12.30.1.1.1
>   was formulated ]
>
> I guess when push comes to shove, somebody will have to answer the
> question of why was FP exposed in Linux by something other than what the
> standard defined it for (prio), and I wouldn't know how to answer that.
>

In my mind the answer is: because it's cool to support more use-cases
than the standard defines, Linux is still a vehicle for experimentation,
more space for our users to play.

>> >>  - Question: would it be against the intention of the API to have a 1:1
>> >>  map of priorities to queues?
>> >
>> > No; as mentioned, going through mqprio's "map" and "queues" to resolve
>> > the priority to a queue is something that I intended to be possible.
>> > Sure, it's not handy either. It would have been handier if the
>> > "admin-status" array was part of the tc-mqprio config, like you did in
>> > your RFC.
>> 
>> Just to be sure, say that my plan is that I document that for the igc
>> driver, the mapping of priorities for the "FP" command is prio (0) ->
>> queue (0), prio (1) -> queue (1), and so on. Note that the mqprio
>> mapping could be that "skb->priority" 10 goes to TC 1 which is mapped to
>> queue 0. Would this be ok?
>
> If skb->priority 10 goes to TC 1 via the prio_tc_map, and finally lands
> in queue 0 where your actual FP knob is, then I'd expect the driver to
> follow the reverse path between queue -> tc -> prios mapped to that tc.
> And I would not state that prio 0 is queue 0 and skb->priority values
> from the whole spectrum may land there, kthxbye. Also, didn't you say
> earlier that "lowest queue index (0) has highest priority"? In Linux,
> skb->priority 0 has lower priority that skb->priority 1. How does that
> work out?
>

Ah, sorry for the confusion, when I said "lowest queue index (0) has
highest priority" it was more in the sense that queue 0 has higher
precedence, packets from it are fetched first, that kind of thing.

> In fact, for both enetc and felix, I have to do this exact thing as you,
> except that my hardware knobs are halfway in the path (per tc, not per
> queue). The reason why I'm not doing it is because we only consider the
> 1:1 prio_tc_map, so we use "prio" and "tc" interchangeably.
>

Hm, that's interesting. I think that's a difference on how we are
modeling our traffic, here's one very common mqprio config for AVB use
cases:

$ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      hw 0

One priority for Class A traffic, one priority for Class B traffic and
multiple priorities mapped to a single TC, the rest is Best Effort and
it's mapped to multiple queues. The way we model traffic, TCs and
priorities are different concepts.

One important detail is that in TSN mode, we use the "priority" (meaning
precedence in this context) fetch order for the queues: queue 0 first,
then queue 1, and so on.

> It can't even be any other way given the current code, since the struct
> tc_taprio_qopt_offload passed via ndo_setup_tc does not even contain the
> tc_mqprio_qopt sub-structure with the prio_tc_map. Drivers are probably
> expected to look, from the ndo_setup_tc hook, at their dev->prio_tc_map
> as changed by netdev_set_prio_tc_map() in taprio_change(). Why this API
> is different compared to struct tc_mqprio_qopt_offload, I don't know.

At least for igb and igc I never felt it was necessary to look at the
prio_tc_map. The core netstack will already enqueue each packet to the
right queue. All I need to know is how to configure each queue.

Also note that for igb and igc I didn't ndo_setup_tc for mqprio, because
the default behavior is fine, that's also why that taprio doesn't send
that information (but this is easily fixed).

This could be a consequence that prio_tc_map (and friends) was
introduced by Intel folks (long before my time) and this doesn't work
that well for other vendors.

>
> What prevents you from doing this? I'd like to understand as precisely
> as you can explain, to see how what I'm proposing really sounds when
> applied to Intel hardware.
>

If I am understanding you correctly, nothing prevents me from doing
that. It's just that everything worked out that from the driver side I
don't need to know about priorities, traffic classes, I only need to
care about queues.

That's why I for my "ethtool" proposal I focused on setting preemption
per queue, because the drivers I work with don't need to know about
anything else. But that can change if necessary. The cost doesn't seem
too big.

>> >>  - Deal breaker: fixed 8 prios;
>> >
>> > idk, I don't think that's our biggest problem right now honestly.
>> 
>> Perhaps I am being too optimistic, but the way I see, your proposal is
>> already expressible enough to be used in a "certifiable" system. I was
>> too harsh with "deal breaker.
>> 
>> So, what is the biggest problem that you see?
>> 
>> Overall, no issues from my side.
>
> Let me recap my action items out loud for v2 (some of them weren't
> discussed publicly per se).
>
> * Enforce the constraint recommended by 12.30.1.1.1, somewhere as a
>   static inline helper function that can be called by drivers, from the
>   tc-taprio/tc-mqprio and ethtool-fp code paths. Note that I could very
>   quickly run into a very deep rabbit hole here, if I actually bother to
>   offload to hardware (enetc) the prio_tc_map communicated from tc-mqprio.
>   I'll try to avoid that. I noticed that the mqprio_parse_opt() function
>   doesn't validate the provided queue offsets and count if hw offload is
>   requested (therefore allowing overlapping queue ranges). I really
>   dislike that, because it was probably done for a reason.
>

I agree with that mqprio offload lack of validation being weird.

+1 for the helper, those kind of functions help document a lot the
expectations.

> * Port the isochron script I have for enetc (endpoint) FP latency
>   measurements to kselftest format.
>
> * Do something such that eMAC/pMAC counters are also summed up somewhere
>   centrally by ethtool, and reported to the user either individually or
>   aggregated.
>
> * Reorganize the netlink UAPI such as to remove ETHTOOL_A_FP_PARAM_TABLE
>   and just have multiple ETHTOOL_A_FP_PARAM_ENTRY nests in the parent.
>

Another +1

> * The enetc verification state machine is off by default. The felix
>   verification state machine is on by default. I guess we should figure
>   out a reasonable default for all drivers out there?
>

From the language of IEEE 802.3 ("disableVerify" and friends), it seems
that the expectation is that verification is enabled by default. *But*
from a interoperability point of view, I would only enable verification
by default after some testing in the wild. Or only send the verify
packet if frame preemption is also enabled. 

> * For some reason, I notice that the verification state machine remains
>   FAILED on felix when it links to enetc and that has verification
>   enabled. I need to disable it on enetc, enable it again, and only then
>   will the MAC merge interrupt on felix say that verification completed
>   successfully. I think I'll need to put a quirk in enetc_pl_mac_link_up()
>   to toggle verification off and on, if it was requested, in order for
>   this to be seamless to the user.
>
> If it's only these, nothing seems exactly major. However, I assume this
> is only the beginning. I fully expect there to be one more round of
> "why not dcbnl? why not tc?" nitpicks, for which I'm not exactly
> mentally ready (if credible arguments are to be put forward, I'd rather
> have them now while I haven't yet put in the extra work in the direction
> I'm about to).

I really know the feeling "not mentally ready".

I am still in the "tc" camp. But I cannot deny that this interface seems
to work for the uses that we have in mind. That's (at least) good enough
for me ;-)


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-09-10  0:19           ` Vinicius Costa Gomes
@ 2022-09-10 16:36             ` Vladimir Oltean
  2022-09-14  2:59               ` Vinicius Costa Gomes
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-09-10 16:36 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

On Fri, Sep 09, 2022 at 05:19:35PM -0700, Vinicius Costa Gomes wrote:
> Vladimir Oltean <vladimir.oltean@nxp.com> writes:
> 
> > On Tue, Aug 23, 2022 at 05:35:11PM -0700, Vinicius Costa Gomes wrote:
> >> Yes, as the limits are not in the UAPI this is a minor point, agreed.
> >> 
> >> What I am concerned is about use cases not handled by IEEE 802.1Q, for
> >> example, thinking about that other thread about PCP/DSCP: how about a
> >> network that wants to have as many priorities as the DSCP field (6 bits)
> >> allows, together with frame preemption (in the future we may have
> >> hardware that supports that). I am only thinking that we should not
> >> close that door.
> >> 
> >> Again, minor point.
> >
> > Since Linux seems to have settled on TC_PRIO_MAX, TC_QOPT_MAX_QUEUE
> > constants of 16 rather than the 8 you'd expect, I guess I don't have
> > that big of a problem to also expose 16 admin-status values for FP.
> > I can limit the drivers I have to only look at the first 8 entries.
> > But what do 16 priorities mean to ethtool? 16 priorities are a tc thing.
> > Does ethtool even have a history of messing with packet priorities?
> > What am I even doing? lol.
> 
> That's a good point. ethtool doesn't has a history of knowing about
> priorities (and what they even mean). But I like the flexibility that
> this approach provides.

Fair. On the other hand, we need to weigh the pros and cons of moving
the adminStatus to other places.

dcbnl

- pro: is a hardware-only interface, just like ethtool, which is
  suitable for a hardware-only feature like FP

- pro: knows about priorities (app table, PFC). I keep coming back to
  PFC as an example because it has the absolute exact same problems as
  FP, which is that it is defined per priority, but there is a "NOTE 2 -
  Mixing PFC and non-PFC priorities in the same queue results in non-PFC
  traffic being paused causing congestion spreading, and therefore is
  not recommended."

- con: I think the dcbnl app table has the same limitation that
  priorities go only up to 8 (the comment above struct dcb_app says
  "@priority: 3-bit unsigned integer indicating priority for IEEE")

- con: where do dcbnl's responsibilities begin, and where do they end,
  exactly? My understanding is that DCB and TSN are exactly the same
  kind of thing, i.e. extensions to 802.1Q (and 802.3, in case of TSN)
  which were made in order to align Ethernet (and bridges) with a set of
  vertical use cases which weren't quite possible before. All is well,
  except TSN was integrated quite differently into the Linux kernel,
  i.o.w. we don't have a "tsnnl" like there is a "dcbnl", but things are
  much more fine-grained, and integrated with the subsystem that they
  extend, rather than creating a subsystem aligned to a use case. So the
  question becomes, is anything we're doing here, for TSN, extending
  DCB? With Microchip's desire to add non-standard APP table selectors
  for VLAN PCP/DEI prioritization, I think the general movement is
  towards more reuse of dcbnl constructs, and towards forgetting that
  dcbnl is for DCB. But there are many things I don't understand about
  dcbnl, for example why is it possible to set an interface's prio-tc
  map using dcb, and what does it even mean in relationship to
  tc-mqprio, tc-taprio etc (or even with "priomap" from tc-ets).

  $ dcb ets set dev eth0 prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7

  Jakub had an answer which explained this in terms of Conway's law:
  https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/#24870325
  but as beautiful that answer is, as many questions it still leaves for
  me. Like if we were to accept dcbnl as part of the UAPI for TSN, how
  does this fact fit within our story that the already use the tc-taprio
  map to map priorities to traffic classes?

ethtool:

- pro: is in the same subsystem as MAC merge layer configuration (for
  which I have no doubts that ethtool is the right place)

- pro: is not aligned to any specific vertical use case, like dcbnl is

- con: knows nothing about priorities

- con: for hardware where FP adminStatus is per traffic class rather
  than per priority, we'll have to change the adminStatus not only from
  changes made via ethtool to the FP adminStatus, but also from changes
  made to the prio_tc_map (tc I guess?). But this may also be a subtle
  pro for dcbnl, since that has its own prio_tc_map, so we'd remain
  within the same subsystem?

tc (part of tc-mqprio and tc-taprio):

- open question: does the FP adminStatus influence scheduling or not?
  Normally I'd say no, adminStatus only decides whether the eMAC or pMAC
  will be used to send a frame, and this is orthogonal to scheduling.
  But 802.1Q says that "It should be noted that, other things being
  equal, designating a priority as “express” effectively increases its
  priority above that of any priority designated as “preemptible”."
  Is this phrase enough to justify a modification of the qdisc dequeue
  method in, say, taprio, which would prioritize a lower priority qdisc
  queue A over a higher one B, if A is express but B is preemptable?

- pro: assuming that in lack of tc-mqprio or tc-taprio, an interface's
  number of traffic classes will collapse to 1, then putting FP
  adminStatus here makes it trivial to support hardware with FP per
  traffic class

- con: more difficult for user space to change FP adminStatus
  independently of the current qdisc?

> >> > No; as mentioned, going through mqprio's "map" and "queues" to resolve
> >> > the priority to a queue is something that I intended to be possible.
> >> > Sure, it's not handy either. It would have been handier if the
> >> > "admin-status" array was part of the tc-mqprio config, like you did in
> >> > your RFC.
> >> 
> >> Just to be sure, say that my plan is that I document that for the igc
> >> driver, the mapping of priorities for the "FP" command is prio (0) ->
> >> queue (0), prio (1) -> queue (1), and so on. Note that the mqprio
> >> mapping could be that "skb->priority" 10 goes to TC 1 which is mapped to
> >> queue 0. Would this be ok?
> >
> > If skb->priority 10 goes to TC 1 via the prio_tc_map, and finally lands
> > in queue 0 where your actual FP knob is, then I'd expect the driver to
> > follow the reverse path between queue -> tc -> prios mapped to that tc.
> > And I would not state that prio 0 is queue 0 and skb->priority values
> > from the whole spectrum may land there, kthxbye. Also, didn't you say
> > earlier that "lowest queue index (0) has highest priority"? In Linux,
> > skb->priority 0 has lower priority that skb->priority 1. How does that
> > work out?
> >
> 
> Ah, sorry for the confusion, when I said "lowest queue index (0) has
> highest priority" it was more in the sense that queue 0 has higher
> precedence, packets from it are fetched first, that kind of thing.

Fetched first by whom?
In ENETC, the transmission selection chooses between TX BD rings whose
configured priority maps to the same traffic class based on a weighted
round robin dequeuing policy. Groups of TX BD rings of different
priorities are in strict priority dequeuing relative to each other.
And the number of the TX BD ring has nothing to do with its priority.

> > In fact, for both enetc and felix, I have to do this exact thing as you,
> > except that my hardware knobs are halfway in the path (per tc, not per
> > queue). The reason why I'm not doing it is because we only consider the
> > 1:1 prio_tc_map, so we use "prio" and "tc" interchangeably.
> >
> 
> Hm, that's interesting. I think that's a difference on how we are
> modeling our traffic, here's one very common mqprio config for AVB use
> cases:
> 
> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>       num_tc 3 \
>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>       queues 1@0 1@1 2@2 \
>       hw 0
> 
> One priority for Class A traffic, one priority for Class B traffic and
> multiple priorities mapped to a single TC, the rest is Best Effort and
> it's mapped to multiple queues. The way we model traffic, TCs and
> priorities are different concepts.

So here's what I don't understand. Why do you need 16 priority values,
when in the end you'll map them to only 3 egress traffic classes anyway?

> One important detail is that in TSN mode, we use the "priority" (meaning
> precedence in this context) fetch order for the queues: queue 0 first,
> then queue 1, and so on.

So the number of the queue is what actually contributes to the egress
scheduling algorithm of your NIC? Strange. Tell me something so I'm sure
I'm understanding you properly. netdev_pick_tx() will pick a TX queue
using skb_tx_hash(), which hashes between the netdev queues mapped to
the same traffic class as the traffic class that skb->priority is mapped to.
In your case, you have 2@2 (2 TX queues for traffic class 2), call these
queues 2 and 3. You're saying that hardware prioritizes queue 2 over 3.
But the network stack doesn't know that. It hashes equally packets that
go to tc 2 between queue 2 and queue 3. In our case, this misconfiguration
would be immediately obvious, because we could have an offloaded tc-taprio
schedule in ENETC, and Linux could be enqueuing into a ring that ultimately
has a different schedule than software knows about. How does this work
out for you?

> > * The enetc verification state machine is off by default. The felix
> >   verification state machine is on by default. I guess we should figure
> >   out a reasonable default for all drivers out there?
> >
> 
> From the language of IEEE 802.3 ("disableVerify" and friends), it seems
> that the expectation is that verification is enabled by default. *But*
> from a interoperability point of view, I would only enable verification
> by default after some testing in the wild. Or only send the verify
> packet if frame preemption is also enabled. 

I wonder whether the answer to this one is actually "Additional Ethernet
Capabilities TLV" (802.3 clause 79.3.7). It would be good if we could
all start off with FP/MM disabled, verification state machine included,
until the LLDP daemon receives a TLV that says PreemptSupported and
PreemptEnabled are true. If both local and remote preemption are
supported and enabled, we tell the kernel to enable the MAC merge layer
and kick off verification. In turn this will alter PreemptActive that
gets set in future replies. I don't know, the flow may be different.

> > * For some reason, I notice that the verification state machine remains
> >   FAILED on felix when it links to enetc and that has verification
> >   enabled. I need to disable it on enetc, enable it again, and only then
> >   will the MAC merge interrupt on felix say that verification completed
> >   successfully. I think I'll need to put a quirk in enetc_pl_mac_link_up()
> >   to toggle verification off and on, if it was requested, in order for
> >   this to be seamless to the user.
> >
> > If it's only these, nothing seems exactly major. However, I assume this
> > is only the beginning. I fully expect there to be one more round of
> > "why not dcbnl? why not tc?" nitpicks, for which I'm not exactly
> > mentally ready (if credible arguments are to be put forward, I'd rather
> > have them now while I haven't yet put in the extra work in the direction
> > I'm about to).
> 
> I really know the feeling "not mentally ready".
> 
> I am still in the "tc" camp. But I cannot deny that this interface seems
> to work for the uses that we have in mind. That's (at least) good enough
> for me ;-)

To me right now, FP adminStatus is not exactly something that the
scheduler is concerned with, and this is why tc is kind of my last
choice. But I agree that it makes certain things simpler. Plus, it seems
to be the only subsystem aware of 16 priorities and 16 traffic classes,
which you seem to care about.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-09-10 16:36             ` Vladimir Oltean
@ 2022-09-14  2:59               ` Vinicius Costa Gomes
  2022-09-15 14:14                 ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Vinicius Costa Gomes @ 2022-09-14  2:59 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Sep 09, 2022 at 05:19:35PM -0700, Vinicius Costa Gomes wrote:
>> Vladimir Oltean <vladimir.oltean@nxp.com> writes:
>> 
>> > On Tue, Aug 23, 2022 at 05:35:11PM -0700, Vinicius Costa Gomes wrote:
>> >> Yes, as the limits are not in the UAPI this is a minor point, agreed.
>> >> 
>> >> What I am concerned is about use cases not handled by IEEE 802.1Q, for
>> >> example, thinking about that other thread about PCP/DSCP: how about a
>> >> network that wants to have as many priorities as the DSCP field (6 bits)
>> >> allows, together with frame preemption (in the future we may have
>> >> hardware that supports that). I am only thinking that we should not
>> >> close that door.
>> >> 
>> >> Again, minor point.
>> >
>> > Since Linux seems to have settled on TC_PRIO_MAX, TC_QOPT_MAX_QUEUE
>> > constants of 16 rather than the 8 you'd expect, I guess I don't have
>> > that big of a problem to also expose 16 admin-status values for FP.
>> > I can limit the drivers I have to only look at the first 8 entries.
>> > But what do 16 priorities mean to ethtool? 16 priorities are a tc thing.
>> > Does ethtool even have a history of messing with packet priorities?
>> > What am I even doing? lol.
>> 
>> That's a good point. ethtool doesn't has a history of knowing about
>> priorities (and what they even mean). But I like the flexibility that
>> this approach provides.
>
> Fair. On the other hand, we need to weigh the pros and cons of moving
> the adminStatus to other places.
>
> dcbnl
>
> - pro: is a hardware-only interface, just like ethtool, which is
>   suitable for a hardware-only feature like FP
>
> - pro: knows about priorities (app table, PFC). I keep coming back to
>   PFC as an example because it has the absolute exact same problems as
>   FP, which is that it is defined per priority, but there is a "NOTE 2 -
>   Mixing PFC and non-PFC priorities in the same queue results in non-PFC
>   traffic being paused causing congestion spreading, and therefore is
>   not recommended."
>
> - con: I think the dcbnl app table has the same limitation that
>   priorities go only up to 8 (the comment above struct dcb_app says
>   "@priority: 3-bit unsigned integer indicating priority for IEEE")
>
> - con: where do dcbnl's responsibilities begin, and where do they end,
>   exactly? My understanding is that DCB and TSN are exactly the same
>   kind of thing, i.e. extensions to 802.1Q (and 802.3, in case of TSN)
>   which were made in order to align Ethernet (and bridges) with a set of
>   vertical use cases which weren't quite possible before. All is well,
>   except TSN was integrated quite differently into the Linux kernel,
>   i.o.w. we don't have a "tsnnl" like there is a "dcbnl", but things are
>   much more fine-grained, and integrated with the subsystem that they
>   extend, rather than creating a subsystem aligned to a use case. So the
>   question becomes, is anything we're doing here, for TSN, extending
>   DCB? With Microchip's desire to add non-standard APP table selectors
>   for VLAN PCP/DEI prioritization, I think the general movement is
>   towards more reuse of dcbnl constructs, and towards forgetting that
>   dcbnl is for DCB. But there are many things I don't understand about
>   dcbnl, for example why is it possible to set an interface's prio-tc
>   map using dcb, and what does it even mean in relationship to
>   tc-mqprio, tc-taprio etc (or even with "priomap" from tc-ets).
>

I don't know. I still have to learn more about DCB. I suppose HW that
has both TSN and DCB features is not too far in the future. So is it
possible that people are going to try to use them together? Does it make
sense? I don't know.

Here are some notes from a quick look at the code, anyone feel free to
correct me, or add anything.

Looking at two drivers code (mlx5 and ice) it seems that the dcbnl
prio-tc map has a similar objective to the netdev prio_tc_map (I am
looking at how that map feeds into each driver ndo_select_queue()), but
inside the driver and with more flexibility ("in the DSCP mode,
overwrite the skb->priority with this value and use this value for
netdev_tx_pick()", that kind of thing). So using tc-mqprio/tc-taprio
with 'dcb ets' 'prio-map' will send traffic to suprising traffic
classes/queues. 

tc-ets priomap feels like another layer, you use that to setup buckets
in which run what seems to be burst protection algorithms, and this
happens after ndo_select_queue() runs, so this map seems almost
independent of the dcbnl prio-tc map (?).

>   $ dcb ets set dev eth0 prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
>
>   Jakub had an answer which explained this in terms of Conway's law:
>   https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/#24870325
>   but as beautiful that answer is, as many questions it still leaves for
>   me. Like if we were to accept dcbnl as part of the UAPI for TSN, how
>   does this fact fit within our story that the already use the tc-taprio
>   map to map priorities to traffic classes?

Setting the 'dcb ets' "prio-tc" map together with tc-taprio or
tc-mqprio, doesn't make much sense, true. But if I am understanding the
code right, you could use that 'dcb' command, and configure frame
preemption via ethtool (for example) and it could work, without
tc-taprio or tc-mqprio to setup the prio:tc mapping.

Perhaps, for now, the solution is to disallow this as invalid, that you
can only have one such priority map configured, either dcb prio-map or
taprio/mqprio. Not sure.

For the long term, perhaps the solution is to have a single place that
does this mapping (from skb to traffic classes and then to queues) more
centralized. 'dcb' would populate that as would 'mqprio/taprio'. This
seems to imply that ndo_select_queue() is a escape hatch, that in the
ideal world it would not exist. Would this create worse problems?
(Reality is pointing that it doesn't seem that we can avoid making
drivers aware of this map, so this idea doesn't seem to help the
situation much)

>
> ethtool:
>
> - pro: is in the same subsystem as MAC merge layer configuration (for
>   which I have no doubts that ethtool is the right place)
>
> - pro: is not aligned to any specific vertical use case, like dcbnl is
>
> - con: knows nothing about priorities
>
> - con: for hardware where FP adminStatus is per traffic class rather
>   than per priority, we'll have to change the adminStatus not only from
>   changes made via ethtool to the FP adminStatus, but also from changes
>   made to the prio_tc_map (tc I guess?). But this may also be a subtle
>   pro for dcbnl, since that has its own prio_tc_map, so we'd remain
>   within the same subsystem?

If the driver also supports DCB, it will have the offload functions
implemented, so it be aware of changes in that map. The problem is now
mqprio, that if it isn't offloaded, the driver is not aware of any
changes. But mqprio doesn't implement .change(), which makes things a
bit easier.

'ethtool' is gaining my vote.

>
> tc (part of tc-mqprio and tc-taprio):
>
> - open question: does the FP adminStatus influence scheduling or not?
>   Normally I'd say no, adminStatus only decides whether the eMAC or pMAC
>   will be used to send a frame, and this is orthogonal to scheduling.
>   But 802.1Q says that "It should be noted that, other things being
>   equal, designating a priority as “express” effectively increases its
>   priority above that of any priority designated as “preemptible”."
>   Is this phrase enough to justify a modification of the qdisc dequeue
>   method in, say, taprio, which would prioritize a lower priority qdisc
>   queue A over a higher one B, if A is express but B is preemptable?

I understood this as: if a preemptible packet (higher priority) and
expresss packet (lower priority) arrive at the MAC at the same time, the
express packet will finish transmitting before the preemptible, causing
the express packet to have an 'effective' higher priority.

I don't think we need to change anything on the dequeueing in the qdisc
side (and trying to solve it is going to be awkward, having to check the
neighbor qdiscs for packets being enqueued/peeked, ugh!).

(as a note: for i225, traffic in queues marked as express will only
preempt traffic from lower numbered queues, this behavior doesn't seem
to be forbidden by the standard)

>
> - pro: assuming that in lack of tc-mqprio or tc-taprio, an interface's
>   number of traffic classes will collapse to 1, then putting FP
>   adminStatus here makes it trivial to support hardware with FP per
>   traffic class
>
> - con: more difficult for user space to change FP adminStatus
>   independently of the current qdisc?
>
>> >> > No; as mentioned, going through mqprio's "map" and "queues" to resolve
>> >> > the priority to a queue is something that I intended to be possible.
>> >> > Sure, it's not handy either. It would have been handier if the
>> >> > "admin-status" array was part of the tc-mqprio config, like you did in
>> >> > your RFC.
>> >> 
>> >> Just to be sure, say that my plan is that I document that for the igc
>> >> driver, the mapping of priorities for the "FP" command is prio (0) ->
>> >> queue (0), prio (1) -> queue (1), and so on. Note that the mqprio
>> >> mapping could be that "skb->priority" 10 goes to TC 1 which is mapped to
>> >> queue 0. Would this be ok?
>> >
>> > If skb->priority 10 goes to TC 1 via the prio_tc_map, and finally lands
>> > in queue 0 where your actual FP knob is, then I'd expect the driver to
>> > follow the reverse path between queue -> tc -> prios mapped to that tc.
>> > And I would not state that prio 0 is queue 0 and skb->priority values
>> > from the whole spectrum may land there, kthxbye. Also, didn't you say
>> > earlier that "lowest queue index (0) has highest priority"? In Linux,
>> > skb->priority 0 has lower priority that skb->priority 1. How does that
>> > work out?
>> >
>> 
>> Ah, sorry for the confusion, when I said "lowest queue index (0) has
>> highest priority" it was more in the sense that queue 0 has higher
>> precedence, packets from it are fetched first, that kind of thing.
>
> Fetched first by whom?
> In ENETC, the transmission selection chooses between TX BD rings whose
> configured priority maps to the same traffic class based on a weighted
> round robin dequeuing policy. Groups of TX BD rings of different
> priorities are in strict priority dequeuing relative to each other.
> And the number of the TX BD ring has nothing to do with its priority.

For the hardware I work with:
 - i210: two modes, (1) round robin, by default, or in (2) Qav-mode
   ("TSN), strict priority, fixed by the queue index.
 - i225/i226: two modes, (1) round robin, by default, or in (2) TSN
    mode, strict priority, configurable queue priority.

(to give the full picture, we can also configure how the descriptors are
going to be fetched from host memory, but as host memory is not usually
where the congestion happens, it's less important for this)

>
>> > In fact, for both enetc and felix, I have to do this exact thing as you,
>> > except that my hardware knobs are halfway in the path (per tc, not per
>> > queue). The reason why I'm not doing it is because we only consider the
>> > 1:1 prio_tc_map, so we use "prio" and "tc" interchangeably.
>> >
>> 
>> Hm, that's interesting. I think that's a difference on how we are
>> modeling our traffic, here's one very common mqprio config for AVB use
>> cases:
>> 
>> $ tc qdisc replace dev $IFACE parent root handle 100 mqprio \
>>       num_tc 3 \
>>       map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>>       queues 1@0 1@1 2@2 \
>>       hw 0
>> 
>> One priority for Class A traffic, one priority for Class B traffic and
>> multiple priorities mapped to a single TC, the rest is Best Effort and
>> it's mapped to multiple queues. The way we model traffic, TCs and
>> priorities are different concepts.
>
> So here's what I don't understand. Why do you need 16 priority values,
> when in the end you'll map them to only 3 egress traffic classes anyway?
>

Basically because of how prio_tc_map works and the fact that I want TC 0
to be mapped to queue 0. i.e. if an entry is not specified in the map,
it's going ot be mapped to TC 0. I want to avoid that, historically, in
our models, TC 0 is the "important" one.

>> One important detail is that in TSN mode, we use the "priority" (meaning
>> precedence in this context) fetch order for the queues: queue 0 first,
>> then queue 1, and so on.
>
> So the number of the queue is what actually contributes to the egress
> scheduling algorithm of your NIC? Strange. Tell me something so I'm sure
> I'm understanding you properly. netdev_pick_tx() will pick a TX queue
> using skb_tx_hash(), which hashes between the netdev queues mapped to
> the same traffic class as the traffic class that skb->priority is mapped to.
> In your case, you have 2@2 (2 TX queues for traffic class 2), call these
> queues 2 and 3. You're saying that hardware prioritizes queue 2 over 3.
> But the network stack doesn't know that. It hashes equally packets that
> go to tc 2 between queue 2 and queue 3. In our case, this misconfiguration
> would be immediately obvious, because we could have an offloaded tc-taprio
> schedule in ENETC, and Linux could be enqueuing into a ring that ultimately
> has a different schedule than software knows about. How does this work
> out for you?
>

More or less, repeating a bit what I said before, on i210 it was fixed,
on i225/i226 it can be changed, but we decided to keep the default, less
surprises for our users.

In (some of) our traffic models, the only kind of traffic class that we
allow to go through multiple queues is Best Effort.

And even that it works out because taprio "translates" from traffic
classes to queues when it sends the offload information to the driver,
i.e. the driver knows the schedule of queues, not traffic classes.

>> > * The enetc verification state machine is off by default. The felix
>> >   verification state machine is on by default. I guess we should figure
>> >   out a reasonable default for all drivers out there?
>> >
>> 
>> From the language of IEEE 802.3 ("disableVerify" and friends), it seems
>> that the expectation is that verification is enabled by default. *But*
>> from a interoperability point of view, I would only enable verification
>> by default after some testing in the wild. Or only send the verify
>> packet if frame preemption is also enabled. 
>
> I wonder whether the answer to this one is actually "Additional Ethernet
> Capabilities TLV" (802.3 clause 79.3.7). It would be good if we could
> all start off with FP/MM disabled, verification state machine included,
> until the LLDP daemon receives a TLV that says PreemptSupported and
> PreemptEnabled are true. If both local and remote preemption are
> supported and enabled, we tell the kernel to enable the MAC merge layer
> and kick off verification. In turn this will alter PreemptActive that
> gets set in future replies. I don't know, the flow may be different.
>

That's a good point (your understanding of the flow is similar to mine).
This seems a good idea. But we may have to wait for a bit until we have
a LLDP implementation that supports this.

>> > * For some reason, I notice that the verification state machine remains
>> >   FAILED on felix when it links to enetc and that has verification
>> >   enabled. I need to disable it on enetc, enable it again, and only then
>> >   will the MAC merge interrupt on felix say that verification completed
>> >   successfully. I think I'll need to put a quirk in enetc_pl_mac_link_up()
>> >   to toggle verification off and on, if it was requested, in order for
>> >   this to be seamless to the user.
>> >
>> > If it's only these, nothing seems exactly major. However, I assume this
>> > is only the beginning. I fully expect there to be one more round of
>> > "why not dcbnl? why not tc?" nitpicks, for which I'm not exactly
>> > mentally ready (if credible arguments are to be put forward, I'd rather
>> > have them now while I haven't yet put in the extra work in the direction
>> > I'm about to).
>> 
>> I really know the feeling "not mentally ready".
>> 
>> I am still in the "tc" camp. But I cannot deny that this interface seems
>> to work for the uses that we have in mind. That's (at least) good enough
>> for me ;-)
>
> To me right now, FP adminStatus is not exactly something that the
> scheduler is concerned with, and this is why tc is kind of my last
> choice. But I agree that it makes certain things simpler. Plus, it seems
> to be the only subsystem aware of 16 priorities and 16 traffic classes,
> which you seem to care about.

This is just food for thought, I am not expecting a reply, as it doesn't
help the review of the RFC.

Ideally, we wouldn't have that limit of 16 priorities or traffic
classes. I don't want to limit it even more. This "16" limit is already
going to require some creativity to handle some use cases in a medium
future, I am afraid.

I am already hearing coleagues talking about containers (think the
net_prio namespace) together with VMs, NICs with dozens of queues. All
of this with scheduled traffic. On the wire, the 8 PCP codepoints are
what we have, but internally/"inside the system", we might need more
granularity to organize all those applications.


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-09-14  2:59               ` Vinicius Costa Gomes
@ 2022-09-15 14:14                 ` Vladimir Oltean
  2022-11-10 10:33                   ` 回复: " Xiaoliang Yang
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-09-15 14:14 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Xiaoliang Yang,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

I won't reply to every point, as that would just diverge more and more
from what I'm really trying to settle on.

On Tue, Sep 13, 2022 at 07:59:29PM -0700, Vinicius Costa Gomes wrote:
> For the long term, perhaps the solution is to have a single place that
> does this mapping (from skb to traffic classes and then to queues) more
> centralized. 'dcb' would populate that as would 'mqprio/taprio'. This
> seems to imply that ndo_select_queue() is a escape hatch, that in the
> ideal world it would not exist. Would this create worse problems?
> (Reality is pointing that it doesn't seem that we can avoid making
> drivers aware of this map, so this idea doesn't seem to help the
> situation much)

That central place is called netdev_set_prio_tc_map(). It is called from
qdiscs and drivers alike. W.r.t. your "ndo_select_queue() escape hatch",
the commit message of 4f57c087de9b ("net: implement mechanism for HW
based QOS") provides some nice insight into networking history, 11 years
later.

> > ethtool:
> >
> > - pro: is in the same subsystem as MAC merge layer configuration (for
> >   which I have no doubts that ethtool is the right place)
> >
> > - pro: is not aligned to any specific vertical use case, like dcbnl is
> >
> > - con: knows nothing about priorities
> >
> > - con: for hardware where FP adminStatus is per traffic class rather
> >   than per priority, we'll have to change the adminStatus not only from
> >   changes made via ethtool to the FP adminStatus, but also from changes
> >   made to the prio_tc_map (tc I guess?). But this may also be a subtle
> >   pro for dcbnl, since that has its own prio_tc_map, so we'd remain
> >   within the same subsystem?
> 
> If the driver also supports DCB, it will have the offload functions
> implemented, so it be aware of changes in that map.

Not so simple. DCB has many ops, you don't have to implement all of them.

> The problem is now mqprio, that if it isn't offloaded, the driver is
> not aware of any changes. But mqprio doesn't implement .change(),
> which makes things a bit easier.
> 
> 'ethtool' is gaining my vote.

Well, ethtool is certainly not aware of changes to the prio:tc map
either, very much less so than the driver :)

> > tc (part of tc-mqprio and tc-taprio):
> >
> > - open question: does the FP adminStatus influence scheduling or not?
> >   Normally I'd say no, adminStatus only decides whether the eMAC or pMAC
> >   will be used to send a frame, and this is orthogonal to scheduling.
> >   But 802.1Q says that "It should be noted that, other things being
> >   equal, designating a priority as “express” effectively increases its
> >   priority above that of any priority designated as “preemptible”."
> >   Is this phrase enough to justify a modification of the qdisc dequeue
> >   method in, say, taprio, which would prioritize a lower priority qdisc
> >   queue A over a higher one B, if A is express but B is preemptable?
> 
> I understood this as: if a preemptible packet (higher priority) and
> expresss packet (lower priority) arrive at the MAC at the same time, the
> express packet will finish transmitting before the preemptible, causing
> the express packet to have an 'effective' higher priority.
> 
> I don't think we need to change anything on the dequeueing in the qdisc
> side (and trying to solve it is going to be awkward, having to check the
> neighbor qdiscs for packets being enqueued/peeked, ugh!).

Agree. So bottom line, express/preemptable does not affect the
scheduling algorithm. This is one reservation I have against FP in tc.
Maybe if we restricted this new option to only the hardware offload
modes of the respective qdiscs.

> (as a note: for i225, traffic in queues marked as express will only
> preempt traffic from lower numbered queues, this behavior doesn't seem
> to be forbidden by the standard)

So I want the frame preemption adminStatus to work hand in hand with the
prio:tc map of the netdev, no matter where/how the adminStatus lands.

Whereas your hardware implementation does not work hand in hand with it.
You'll have to add some restrictions and do some clarification work, sorry.

> For the hardware I work with:
>  - i210: two modes, (1) round robin, by default, or in (2) Qav-mode
>    ("TSN), strict priority, fixed by the queue index.
>  - i225/i226: two modes, (1) round robin, by default, or in (2) TSN
>     mode, strict priority, configurable queue priority.
> 
> (to give the full picture, we can also configure how the descriptors are
> going to be fetched from host memory, but as host memory is not usually
> where the congestion happens, it's less important for this)

So this is to say that i210 has 1 traffic class in mode (1), and a
number of traffic classes equal to the number of queues, in mode (2)?
As for i225/i226, the number of traffic classes is given by the number
of uniqueuly configured queue priorities? Or what would you call a
traffic class, exactly?

> > So here's what I don't understand. Why do you need 16 priority values,
> > when in the end you'll map them to only 3 egress traffic classes anyway?
> >
> 
> Basically because of how prio_tc_map works and the fact that I want TC 0
> to be mapped to queue 0. i.e. if an entry is not specified in the map,
> it's going ot be mapped to TC 0. I want to avoid that, historically, in
> our models, TC 0 is the "important" one.

Strange. I wonder who else expects traffic class 0 to be the important one?
This is clearly a case of 2 vendors using the exact same tooling and
commands, but a configuration script working in the exact opposite way.

> More or less, repeating a bit what I said before, on i210 it was fixed,
> on i225/i226 it can be changed, but we decided to keep the default, less
> surprises for our users.

For yours, sure, maybe. For others.. :)

> In (some of) our traffic models, the only kind of traffic class that we
> allow to go through multiple queues is Best Effort.

With the mention that half (or more) of the best effort traffic will be
Better Effort, due to your secret prioritization scheme within the same
traffic class. And netdev_pick_tx() will essentially be a Russian
roulette, because it hashes packets with the same skb->priority towards
queues of different actual priorities.

> And even that it works out because taprio "translates" from traffic
> classes to queues when it sends the offload information to the driver,
> i.e. the driver knows the schedule of queues, not traffic classes.

Which is incredibly strange to me, since the standard clearly defines
Qbv gates to be per traffic class, and in ENETC, even if we have 2 TX
queues for the same traffic class (one per CPU), the hardware schedule
is still per traffic class and not per independent TX queue (BD ring).

How does this work for i225/i226, if 2 queues are configured for the
same dequeue priority? Do the taprio gates still take effect per queue?

> That's a good point (your understanding of the flow is similar to mine).
> This seems a good idea. But we may have to wait for a bit until we have
> a LLDP implementation that supports this.

This is circular; we have a downstream patch to openlldp that adds
support for this TLV, but it can't go anywhere until there is mainline
kernel support.

What about splitting out MAC merge support from FP support, and me
concentrating on the MAC merge layer for now? They're independent up to
a pretty high level. The MAC merge layer is supposed to be controlled by
the LLDP daemon and be pretty much plug-and-play, while the FP adminStatus
is supposed to be set by some high level administrator, like a NETCONF
client.

We can argue some more about how to best expose the FP adminStatus.
Right now, I'm thinking that for all intents and purposes, the
adminStatus really only makes practical sense when we have at least 2
traffic classes: one onto which we can map the preemptable priorities,
and one onto which we can map the express ones. In turn, it means that
the 'priorities collapsing into the same traffic class' problem that we
have when we delete the taprio/mqprio qdisc can be solved if we put the
FP adminStatus into the same netlink interface that also configures the
prio:tc map and the number of traffic classes (i.e. yes, in tc).

I'll let someone more knowledgeable of our sysrepo-tsn implementation of
a NETCONF server (like Xiaoliang, maybe even Rui or Richie?)
https://github.com/real-time-edge-sw/real-time-edge-sysrepo
comment on whether this would create any undesirable side effect.
The implication is that user space, when asked to change the adminStatus,
will have to figure out whether we are using a taprio, or an mqprio
qdisc, and create different netlink attributes to talk to the kernel to
request that configuration. User space will also have to send an error
towards the NETCONF client if no such qdisc exists (meaning that the
user wants to combine priorities of different express/preemptable values
in the same traffic class, TC0). The qdiscs will also centrally validate
whether such invalid mixed configurations are requested.

I'm mostly OK with this: an UAPI where FP adminStatus is per priority,
but it gets translated into per-tc by the qdisc, and passed to the
driver through ndo_setup_tc(). I am certainly still very much against
the "preemptable queue mask" that you proposed, and IMO you'll have to
do something and clarify what a traffic class even means for Intel
hardware, especially in relationship to multiple queues per tc
(mqprio queues 2@1).

Anyone else who has an opinion of any sort, please feel free.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-08-17 18:46     ` Jakub Kicinski
@ 2022-10-01 15:53       ` Vladimir Oltean
  2022-10-03 14:36         ` Jakub Kicinski
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-10-01 15:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Wed, Aug 17, 2022 at 11:46:42AM -0700, Jakub Kicinski wrote:
> > > When we have separate set of stats for pMAC the normal stats are sum of
> > > all traffic, right? So normal - pMAC == eMAC, everything that's not
> > > preemptible is express?  
> > 
> > Actually not quite, or at least not for the LS1028A ENETC and Felix switch.
> > The normal counters report just what the eMAC sees, and the pMAC counters
> > just what the pMAC sees. After all, only the eMAC was enabled up until now.
> > Nobody does the addition currently.
> 
> I see. And the netdev stats are the total?

dev->stats reports the aggregate of express and preemptable packets seen
by software, yes. I got the hint though, I should also report the
aggregate. The summation seems like a generic problem which ethtool
should be able to do internally, yet a generic implementation is riddled
with problems that must be dealt with (RMON histograms reported by eMAC
and pMAC can be different; some counters could be implemented by the
eMAC but not the pMAC or vice versa, and that needs to be handled, etc).
Additionally, the summation of counters must also be done for ndo_get_stats64(),
when those come from hardware as well. So I'd incline to do it in the
driver rn.

> > > Did you consider adding an attribute for switching between MAC and pMAC
> > > for stats rather than duplicating things?  
> > 
> > No. Could you expand on that idea a little? Add a netlink attribute
> > where, and this helps reduce duplication where, and how?
> 
> Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it
> ETHTOOL_A_STATS_EXPRESS, a flag.

I'll add this to the UAPI and to internal data structures, ok?

enum ethtool_stats_src {
	ETHTOOL_STATS_SRC_AGGREGATE = 0,
	ETHTOOL_STATS_SRC_EMAC,
	ETHTOOL_STATS_SRC_PMAC,
};

> Plumb thru to all the stats callback an extra argument 
> (a structure for future extensibility) with a bool pMAC;
> 
> Add a capability field to ethtool_ops to announce that
> driver will pay attention to the bool pMAC / has support.

You mean capability field as in ethtool_ops::supported_coalesce_params,
right? (we discussed about this separately).
This won't fit the enetc driver very well. Some enetc ports on the NXP
LS1028A support the MM layer (port 0, port 2) and some don't (port 1,
port 3). Yet they share the same PF driver. So populating mm_supported =
true in the const struct enetc_pf_ethtool_ops isn't going to cover both.
I can, however, key on my ethtool_ops :: get_mm_state() function which
lets the driver report a "bool supported". Is this ok?

> We can then use the existing callbacks.
> 
> Am I making sense?

Yes, thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-10-01 15:53       ` Vladimir Oltean
@ 2022-10-03 14:36         ` Jakub Kicinski
  2022-10-03 14:51           ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2022-10-03 14:36 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Sat, 1 Oct 2022 15:53:38 +0000 Vladimir Oltean wrote:
> > Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it
> > ETHTOOL_A_STATS_EXPRESS, a flag.  
> 
> I'll add this to the UAPI and to internal data structures, ok?
> 
> enum ethtool_stats_src {
> 	ETHTOOL_STATS_SRC_AGGREGATE = 0,
> 	ETHTOOL_STATS_SRC_EMAC,
> 	ETHTOOL_STATS_SRC_PMAC,
> };

Yup!

> > Plumb thru to all the stats callback an extra argument 
> > (a structure for future extensibility) with a bool pMAC;
> > 
> > Add a capability field to ethtool_ops to announce that
> > driver will pay attention to the bool pMAC / has support.  
> 
> You mean capability field as in ethtool_ops::supported_coalesce_params,
> right? (we discussed about this separately).
> This won't fit the enetc driver very well. Some enetc ports on the NXP
> LS1028A support the MM layer (port 0, port 2) and some don't (port 1,
> port 3). Yet they share the same PF driver. So populating mm_supported =
> true in the const struct enetc_pf_ethtool_ops isn't going to cover both.
> I can, however, key on my ethtool_ops :: get_mm_state() function which
> lets the driver report a "bool supported". Is this ok?

That happens, I think about the capability in the ops as driver caps
rather than HW caps. The driver can still return -EOPNOTSUPP, but it
guarantees to check the field's value. 

Most (all but one) datacenter NIC vendors have uber-drivers for all
their HW generations these days, static per-driver caps can't map to 
HW caps in my world.

So weak preference for sticking to that model to avoid confusion about
the semantics of existing caps vs caps which should use a function call.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool
  2022-10-03 14:36         ` Jakub Kicinski
@ 2022-10-03 14:51           ` Vladimir Oltean
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-10-03 14:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Michal Kubecek, Claudiu Manoil, Vinicius Costa Gomes,
	Xiaoliang Yang, Kurt Kanzenbach, Rui Sousa, Ferenc Fejes

On Mon, Oct 03, 2022 at 07:36:03AM -0700, Jakub Kicinski wrote:
> On Sat, 1 Oct 2022 15:53:38 +0000 Vladimir Oltean wrote:
> > > Add a attribute to ETHTOOL_MSG_STATS_GET, let's call it
> > > ETHTOOL_A_STATS_EXPRESS, a flag.  
> > 
> > I'll add this to the UAPI and to internal data structures, ok?
> > 
> > enum ethtool_stats_src {
> > 	ETHTOOL_STATS_SRC_AGGREGATE = 0,
> > 	ETHTOOL_STATS_SRC_EMAC,
> > 	ETHTOOL_STATS_SRC_PMAC,
> > };
> 
> Yup!

Ok. I've also added enum ethtool_stats_src as the first member of struct
ethtool_eth_mac_stats, ethtool_eth_phy_stats, ethtool_eth_ctrl_stats,
ethtool_pause_stats, ethtool_rmon_stats. So I am not adding an extra
argument (another "structure for future extensibility" as you wrote
below). Hope that's ok.

> > > Plumb thru to all the stats callback an extra argument 
> > > (a structure for future extensibility) with a bool pMAC;
> > > 
> > > Add a capability field to ethtool_ops to announce that
> > > driver will pay attention to the bool pMAC / has support.  
> > 
> > You mean capability field as in ethtool_ops::supported_coalesce_params,
> > right? (we discussed about this separately).
> > This won't fit the enetc driver very well. Some enetc ports on the NXP
> > LS1028A support the MM layer (port 0, port 2) and some don't (port 1,
> > port 3). Yet they share the same PF driver. So populating mm_supported =
> > true in the const struct enetc_pf_ethtool_ops isn't going to cover both.
> > I can, however, key on my ethtool_ops :: get_mm_state() function which
> > lets the driver report a "bool supported". Is this ok?
> 
> That happens, I think about the capability in the ops as driver caps
> rather than HW caps. The driver can still return -EOPNOTSUPP, but it
> guarantees to check the field's value. 

The stats callbacks return void. We'd be relying on the ETHTOOL_STAT_NOT_SET value.

> 
> Most (all but one) datacenter NIC vendors have uber-drivers for all
> their HW generations these days, static per-driver caps can't map to 
> HW caps in my world.
> 
> So weak preference for sticking to that model to avoid confusion about
> the semantics of existing caps vs caps which should use a function call.

An even bigger uber-driver is DSA, with its own dsa_slave_ethtool_ops.
If I put "supported_mm" in ethtool_ops, and set it to true in DSA,
I become responsible for rejecting everything except ETHTOOL_STATS_SRC_AGGREGATE
for all DSA drivers, which I'd rather not do. Alternatively, I put it to
false in DSA and I won't have pMAC stats callbacks getting called even
if I do support a pMAC. Maybe DSA isn't even the only one in this situation.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* 回复: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-09-15 14:14                 ` Vladimir Oltean
@ 2022-11-10 10:33                   ` Xiaoliang Yang
  2022-11-14 15:42                     ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Xiaoliang Yang @ 2022-11-10 10:33 UTC (permalink / raw)
  To: Vladimir Oltean, Vinicius Costa Gomes
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michal Kubecek, Claudiu Manoil, Kurt Kanzenbach,
	Rui Sousa, Ferenc Fejes, Richie Pearn

On Sep 15, 2022, Vladimir wrote:
> > That's a good point (your understanding of the flow is similar to mine).
> > This seems a good idea. But we may have to wait for a bit until we
> > have a LLDP implementation that supports this.
> 
> This is circular; we have a downstream patch to openlldp that adds support for
> this TLV, but it can't go anywhere until there is mainline kernel support.
> 
> What about splitting out MAC merge support from FP support, and me
> concentrating on the MAC merge layer for now? They're independent up to a
> pretty high level. The MAC merge layer is supposed to be controlled by the
> LLDP daemon and be pretty much plug-and-play, while the FP adminStatus is
> supposed to be set by some high level administrator, like a NETCONF client.
I agree that splitting out MAC merge support from FP support, they are defined
by different spec. I'm trying to add LLDP exchange verification support. The
procedure is like following:
 1. enable preemption on local port by using "ethtool", do not active preemption.
 2. Decode the LLDP TLV of remote port and ensure the remote port supports
and enables preemption.
 3. Run SMD-v/r verify process on local port or active the preemption directly.
 4. Send updated LLDP TLV to remote port.
 5. Disable preemption on local port and repeat step 1-4 when link down/up.
The struct "ethtool_mm_cfg" seems not fit this procedure. I update it:
struct ethtool_mm_cfg {
	u32 verify_time;
	bool verify_disable;
	bool enabled;
+	bool active; // Used to active or start verify preemption by LLDP daemon.
	u8 add_frag_size;
};
If we want to enable/disable the LLDP exchange verification, maybe we need
to add more parameters like "bool lldp_verify_enable"

> 
> We can argue some more about how to best expose the FP adminStatus.
> Right now, I'm thinking that for all intents and purposes, the adminStatus
> really only makes practical sense when we have at least 2 traffic classes: one
> onto which we can map the preemptable priorities, and one onto which we
> can map the express ones. In turn, it means that the 'priorities collapsing into
> the same traffic class' problem that we have when we delete the
> taprio/mqprio qdisc can be solved if we put the FP adminStatus into the same
> netlink interface that also configures the prio:tc map and the number of traffic
> classes (i.e. yes, in tc).
> 
> I'll let someone more knowledgeable of our sysrepo-tsn implementation of a
> NETCONF server (like Xiaoliang, maybe even Rui or Richie?)
> https://github.com/real-time-edge-sw/real-time-edge-sysrepo
> comment on whether this would create any undesirable side effect.
> The implication is that user space, when asked to change the adminStatus, will
> have to figure out whether we are using a taprio, or an mqprio qdisc, and
> create different netlink attributes to talk to the kernel to request that
> configuration. User space will also have to send an error towards the NETCONF
> client if no such qdisc exists (meaning that the user wants to combine priorities
> of different express/preemptable values in the same traffic class, TC0). The
> qdiscs will also centrally validate whether such invalid mixed configurations
> are requested.
The Netconf yang model haven't defined the mac preemption feature of 802.3br,
but it defines the 802.1Qbu preemption. It uses priority to configure the
adminStatus of frame-preemption-status-table. The priority map traffic class is
independent in Netconf, so there is no effect whether we combine FP status
configuration with tc-mqprio or not, just translated it in Kernel is OK.

> 
> I'm mostly OK with this: an UAPI where FP adminStatus is per priority, but it
> gets translated into per-tc by the qdisc, and passed to the driver through
> ndo_setup_tc(). I am certainly still very much against the "preemptable queue
> mask" that you proposed, and IMO you'll have to do something and clarify
> what a traffic class even means for Intel hardware, especially in relationship to
> multiple queues per tc (mqprio queues 2@1).
> 
> Anyone else who has an opinion of any sort, please feel free.
I didn't think about it too much, but I think combined with tc-mqprio command
is more easy to implement, this eliminates the priority of translating to TC.
Example:
tc qdisc add dev eth0 root handle 1: mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	fp-prios 0xff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: 回复: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-11-10 10:33                   ` 回复: " Xiaoliang Yang
@ 2022-11-14 15:42                     ` Vladimir Oltean
  2022-11-15  3:01                       ` 回复: " Xiaoliang Yang
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2022-11-14 15:42 UTC (permalink / raw)
  To: Xiaoliang Yang
  Cc: Vinicius Costa Gomes, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Michal Kubecek, Claudiu Manoil,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

Hi Xiaoliang,

On Thu, Nov 10, 2022 at 10:33:50AM +0000, Xiaoliang Yang wrote:
> On Sep 15, 2022, Vladimir wrote:
> > > That's a good point (your understanding of the flow is similar to mine).
> > > This seems a good idea. But we may have to wait for a bit until we
> > > have a LLDP implementation that supports this.
> > 
> > This is circular; we have a downstream patch to openlldp that adds support for
> > this TLV, but it can't go anywhere until there is mainline kernel support.
> > 
> > What about splitting out MAC merge support from FP support, and me
> > concentrating on the MAC merge layer for now? They're independent up to a
> > pretty high level. The MAC merge layer is supposed to be controlled by the
> > LLDP daemon and be pretty much plug-and-play, while the FP adminStatus is
> > supposed to be set by some high level administrator, like a NETCONF client.
> 
> I agree that splitting out MAC merge support from FP support, they are defined
> by different spec. I'm trying to add LLDP exchange verification support. The
> procedure is like following:
>  1. enable preemption on local port by using "ethtool", do not active preemption.

"Enable" and "activate" are different things only if verification is used.
Otherwise, they mean the same thing.

In turn, LLDP does not control whether verification is used or not. So,
from the perspective of the LLDP daemon, you can't "enable" but "not activate"
preemption.

And I wouldn't expect the LLDP flow to require use of ethtool.

And I think that enabling preemption at step 1 is too early. Why enable
it, if we don't know if the link partner supports it?

>  2. Decode the LLDP TLV of remote port and ensure the remote port supports
> and enables preemption.
>  3. Run SMD-v/r verify process on local port or active the preemption directly.

Verification as defined by 802.3 is, as mentioned, independent of the
Additional Ethernet Capabilities TLV.
Which seems to be a good thing, considering that 2 link partners A and B
don't need to have coordinated settings for verification (verify_disable
doesn't need to be in sync). Any device with the MAC Merge layer turned on is
required to send SMD-R (response) frames in response to SMD-V (verification),
regardless of whether it sends SMD-V frames of its own.

So those who enable verification should get a response from those who
don't enable it, and those who don't enable verification don't cause the
link partners who do enable it to fail their own verification.

The only thing to agree on is this: if the link partner supports
preemption, and you support preemption, then how does the LLDP daemon
decide whether to actually *enable* preemption? My assumption is that
it's fine to unconditionally enable it as long as it's supported.
No configuration or user override is needed.

FWIW, I have my own patch on the openlldp project (yes, not lldpd) which
adds support for the MAC Merge layer. I hope I can wrap everything up
some time next week and resend everything in a state that reflects what
I've been cooking in the meantime. I don't have a solution for the
adminStatus yet, but I guess I'll try to add it via the netlink interface
of tc-taprio and tc-mqprio.

>  4. Send updated LLDP TLV to remote port.
>  5. Disable preemption on local port and repeat step 1-4 when link down/up.
> 
> The struct "ethtool_mm_cfg" seems not fit this procedure. I update it:
> struct ethtool_mm_cfg {
> 	u32 verify_time;
> 	bool verify_disable;
> 	bool enabled;
> +	bool active; // Used to active or start verify preemption by LLDP daemon.

What sense does it make to "force" preemption as active? What would this
do in terms of driving the internal state machines of the hardware, i.e.
force FP active in which way? It's active when enabled and verification
is either disabled, or enabled and complete.

> 	u8 add_frag_size;
> };
> If we want to enable/disable the LLDP exchange verification, maybe we need
> to add more parameters like "bool lldp_verify_enable"

Control the LLDP exchange via kernel UAPI? But why? What difference does
it make to the kernel whether there is an LLDP daemon or not? LLDP frames
are normal packets sent via PF_PACKET sockets, no special driver involvement
needed. Not the same thing as SMD-V/SMD-R hardware verification.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* 回复: 回复: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-11-14 15:42                     ` Vladimir Oltean
@ 2022-11-15  3:01                       ` Xiaoliang Yang
  2022-11-16 12:36                         ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Xiaoliang Yang @ 2022-11-15  3:01 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vinicius Costa Gomes, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Michal Kubecek, Claudiu Manoil,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

Hi Vladimir,

Thanks for your reply, I comment below and make a summarize at the end.

On Mon, Nov 14, 2022 Vladimir wrote:
> On Thu, Nov 10, 2022 at 10:33:50AM +0000, Xiaoliang Yang wrote:
> > On Sep 15, 2022, Vladimir wrote:
> > > > That's a good point (your understanding of the flow is similar to mine).
> > > > This seems a good idea. But we may have to wait for a bit until we
> > > > have a LLDP implementation that supports this.
> > >
> > > This is circular; we have a downstream patch to openlldp that adds
> > > support for this TLV, but it can't go anywhere until there is mainline kernel
> support.
> > >
> > > What about splitting out MAC merge support from FP support, and me
> > > concentrating on the MAC merge layer for now? They're independent up
> > > to a pretty high level. The MAC merge layer is supposed to be
> > > controlled by the LLDP daemon and be pretty much plug-and-play,
> > > while the FP adminStatus is supposed to be set by some high level
> administrator, like a NETCONF client.
> >
> > I agree that splitting out MAC merge support from FP support, they are
> > defined by different spec. I'm trying to add LLDP exchange
> > verification support. The procedure is like following:
> >  1. enable preemption on local port by using "ethtool", do not active
> preemption.
> 
> "Enable" and "activate" are different things only if verification is used.
> Otherwise, they mean the same thing.
Yes, I know that "Enable" and "activate" are different things. I mean that if we
want to enable the next lldp exchange verification process, we need to enable
it but not let the MAC merge layer working until lldp verify successfully.
There are two verification procedure, one is LLDP verification, based on chapter
"99.4.2 Determining that the link partner supports preemption" of 802.3 spec.
This need lldp exchange and ensure preemption is enabled on remote port, and
this verification doesn’t have parameter to enable/disable it. I just want to add
this feature. Another is SMD-r/v verification, this is hardware verification and
already has "verify_disable" to enable/disable it.

Or we can enable by default through LLDP. That don’t need a new "active"
parameter. Once the LLDP application has verified that remote port supports
preemption, it will enable preemption on local port, and tell remote port to
enable it. If we use ethtool to enable preemption, the preemption will be force
enabled without LLDP verification.


> 
> In turn, LLDP does not control whether verification is used or not. So, from the
> perspective of the LLDP daemon, you can't "enable" but "not activate"
> preemption.
Yes, LLDP does not control whether hardware verification is used or not. Maybe I
shouldn't use the word "active", I mean preemption can't be working before LLDP
verification, even though it's enabled by ethtool.

> 
> And I wouldn't expect the LLDP flow to require use of ethtool.
If LLDP application can’t require to use ethtool driver, how to enable the preemption
After LLDP verification?

> 
> And I think that enabling preemption at step 1 is too early. Why enable it, if we
> don't know if the link partner supports it?
Same as mentioned above. Enabling at step 1 does not actually enable preemption,
it means enable the LLDP verification.

> 
> >  2. Decode the LLDP TLV of remote port and ensure the remote port
> > supports and enables preemption.
> >  3. Run SMD-v/r verify process on local port or active the preemption
> directly.
> 
> Verification as defined by 802.3 is, as mentioned, independent of the
> Additional Ethernet Capabilities TLV.
> Which seems to be a good thing, considering that 2 link partners A and B don't
> need to have coordinated settings for verification (verify_disable doesn't need
> to be in sync). Any device with the MAC Merge layer turned on is required to
> send SMD-R (response) frames in response to SMD-V (verification), regardless
> of whether it sends SMD-V frames of its own.
> 
> So those who enable verification should get a response from those who don't
> enable it, and those who don't enable verification don't cause the link partners
> who do enable it to fail their own verification.
The hardware may not obey this rule. I have tested on LS1028a and i.mx8mp boards,
The verification needs to be enabled at the same time for link partners. So it needs
LLDP verification, and let LLDP application to enable both partner ports at the same
time(The hardware verify three times, and each verify time is 10ms in default).

> 
> The only thing to agree on is this: if the link partner supports preemption, and
> you support preemption, then how does the LLDP daemon decide whether to
> actually *enable* preemption? My assumption is that it's fine to
> unconditionally enable it as long as it's supported.
> No configuration or user override is needed.
Yes, I also consider this way. Enable the preemption once LLDP exchange ensure
that both partner support it. But how to enable preemption via LLDP daemon?
I tried to use the ioctl API of ethtool to enable preemption. Do you have any
suggestion?

> 
> FWIW, I have my own patch on the openlldp project (yes, not lldpd) which adds
> support for the MAC Merge layer. I hope I can wrap everything up some time
> next week and resend everything in a state that reflects what I've been cooking
> in the meantime. I don't have a solution for the adminStatus yet, but I guess I'll
> try to add it via the netlink interface of tc-taprio and tc-mqprio.
It's a good news, I can keep up with upstream patches after you upstream the patches.

> 
> >  4. Send updated LLDP TLV to remote port.
> >  5. Disable preemption on local port and repeat step 1-4 when link
> down/up.
> >
> > The struct "ethtool_mm_cfg" seems not fit this procedure. I update it:
> > struct ethtool_mm_cfg {
> > 	u32 verify_time;
> > 	bool verify_disable;
> > 	bool enabled;
> > +	bool active; // Used to active or start verify preemption by LLDP daemon.
> 
> What sense does it make to "force" preemption as active? What would this do
> in terms of driving the internal state machines of the hardware, i.e.
> force FP active in which way? It's active when enabled and verification is either
> disabled, or enabled and complete.
This is not mean active status of SMD-r/v verification. It means let preemption
Working.

> 
> > 	u8 add_frag_size;
> > };
> > If we want to enable/disable the LLDP exchange verification, maybe we
> > need to add more parameters like "bool lldp_verify_enable"
> 
> Control the LLDP exchange via kernel UAPI? But why? What difference does it
> make to the kernel whether there is an LLDP daemon or not? LLDP frames are
> normal packets sent via PF_PACKET sockets, no special driver involvement
> needed. Not the same thing as SMD-V/SMD-R hardware verification.
I mean that if user want to enable the preemption without LLDP. It's not needed
If we use the second way I mentioned above.

I will summarize my opinion. Enable the preemption by default through LLDP.
Once the LLDP application has verified that remote port supports preemption,
it will enable preemption on local port, and tell remote port to enable it.
If user want to force enable preemption, use ethtool to enable it directly.
The lldp daemon use the ioctl API of ethtool to enable preemption.

Thanks,
Xiaoliang

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: 回复: 回复: [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer
  2022-11-15  3:01                       ` 回复: " Xiaoliang Yang
@ 2022-11-16 12:36                         ` Vladimir Oltean
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2022-11-16 12:36 UTC (permalink / raw)
  To: Xiaoliang Yang
  Cc: Vinicius Costa Gomes, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Michal Kubecek, Claudiu Manoil,
	Kurt Kanzenbach, Rui Sousa, Ferenc Fejes, Richie Pearn

Hi Xiaoliang,

On Tue, Nov 15, 2022 at 05:01:24AM +0200, Xiaoliang Yang wrote:
> Yes, I know that "Enable" and "activate" are different things. I mean that if we
> want to enable the next lldp exchange verification process, we need to enable
> it but not let the MAC merge layer working until lldp verify successfully.
> There are two verification procedure, one is LLDP verification, based on chapter
> "99.4.2 Determining that the link partner supports preemption" of 802.3 spec.
> This need lldp exchange and ensure preemption is enabled on remote port, and
> this verification doesn’t have parameter to enable/disable it. I just want to add
> this feature.

Why? The mechanism to enable/disable the MAC Merge layer is the ETHTOOL_A_MM_ENABLED
netlink attribute which is settable.

To the kernel, it doesn't matter if ethtool (the program) or openlldp
sets ETHTOOL_A_MM_ENABLED.

> Another is SMD-r/v verification, this is hardware verification and
> already has "verify_disable" to enable/disable it.
> 
> Or we can enable by default through LLDP. That don’t need a new "active"
> parameter. Once the LLDP application has verified that remote port supports
> preemption, it will enable preemption on local port, and tell remote port to
> enable it.

IIUC, it cannot "tell" the link partner to enable preemption. It can
just advertise the following in a subsequent TLV:

Table 79–8—Additional Ethernet capabilities

Bit 0 (preemption capability support) - supported/not supported
Bit 1 (preemption capability status) - enabled/not enabled
Bit 2 (preemption capability active) - active/not active
Bits 4:3 (additional fragment size) - A 2-bit integer value indicating,
in units of 64 octets, the minimum number of octets over 64 octets
required in non-final fragments by the receiver

The only thing we can do, is if the link partner sets Bit 0 to true, to
generate a ETHTOOL_MSG_MM_SET netlink message with ETHTOOL_A_MM_ENABLED
set to true.

We _cannot_ wait for the link partner to set Bit 1 to true, because, if
it's the same implementation as us, nothing will work, we will both wait
for each other...

So either the link partner enables preemption too, as a result of us
advertising a TLV with Bit 0 set, and things eventually work, or the
link partner doesn't enable preemption even though both ends support it.
In that case, preemption doesn't work (and this is also visible in the
TLVs sent by the link partner - Bit 1 and Bit 2 remain zero).

If the SMD-V/SMD-R handshake is enabled locally (ETHTOOL_A_MM_VERIFY_DISABLE
is false), we can also detect that by the fact that the link partner
will not respond with a SMD-R to our SMD-V. So, even if we will set Bit 1
(corresponds to ETHTOOL_A_MM_ENABLED in the ETHTOOL_MSG_MM_GET message)
to true, Bit 2 (corresponding to ETHTOOL_A_MM_ACTIVE in the GET message)
will still remain unset. Otherwise, the condition will remain undetected,
and we will report Bit 2 (active) as true (which it is - verification is
disabled after all).

> If we use ethtool to enable preemption, the preemption will be force
> enabled without LLDP verification.

Yup. I consider that within the expected parameters of the game, no?

> > In turn, LLDP does not control whether verification is used or not. So, from the
> > perspective of the LLDP daemon, you can't "enable" but "not activate"
> > preemption.
> Yes, LLDP does not control whether hardware verification is used or not. Maybe I
> shouldn't use the word "active", I mean preemption can't be working before LLDP
> verification, even though it's enabled by ethtool.

I don't understand this. Why can't preemption work without LLDP
verification? My understanding is that LLDP is just an automated way of
maximizing the enabled feature set depending on what the link partner
advertises as supported. But the same feature set can be force-enabled
manually, if there is prior knowledge that the link partner can do it.
I think of it as a software autoneg, I might be wrong, I don't have too
much XP with LLDP. That's exactly what ethtool (the program, not the
kernel interface) is for - to be able to force settings.

> > And I wouldn't expect the LLDP flow to require use of ethtool.
> If LLDP application can’t require to use ethtool driver, how to enable the preemption
> after LLDP verification?

No, I mean the LLDP flow doesn't require use of ethtool the program, not
ethtool the kernel netlink interface. Of course it requires the latter.

> > And I think that enabling preemption at step 1 is too early. Why enable it, if we
> > don't know if the link partner supports it?
> Same as mentioned above. Enabling at step 1 does not actually enable preemption,
> it means enable the LLDP verification.

Nope. LLDP verification is no more than a user space program that
automates something that can also be done manually, using ethtool the
program. No special netlink messages for LLDP.

> > So those who enable verification should get a response from those who don't
> > enable it, and those who don't enable verification don't cause the link partners
> > who do enable it to fail their own verification.
> The hardware may not obey this rule. I have tested on LS1028a and i.mx8mp boards,
> The verification needs to be enabled at the same time for link partners. So it needs
> LLDP verification, and let LLDP application to enable both partner ports at the same
> time(The hardware verify three times, and each verify time is 10ms in default).

I think the imx8mp has the same EQOS block as the S32G GMAC, right?
If so, I might try experimenting a bit on the S32G linked to LS1028A,
to see exactly what you're talking about.

Keep in mind that hardware quirks don't mean much to what the user space
flow should be. The kernel should deal with quirks. For example, on enetc,
the MAC merge layer must be manually disabled and re-enabled on each
link up/down event by the kernel driver. Otherwise the verification
state machine remains FAILED, as you seem to say. But that changes
exactly nothing in terms of the LLDP flow - because as you say, the
hardware SMD-R/SMD-V verification state machine which you talk about is
independent of LLDP.

> > The only thing to agree on is this: if the link partner supports preemption, and
> > you support preemption, then how does the LLDP daemon decide whether to
> > actually *enable* preemption? My assumption is that it's fine to
> > unconditionally enable it as long as it's supported.
> > No configuration or user override is needed.
> Yes, I also consider this way. Enable the preemption once LLDP exchange ensure
> that both partner support it. But how to enable preemption via LLDP daemon?
> I tried to use the ioctl API of ethtool to enable preemption. Do you have any
> suggestion?

Yeah, see above. The ethtool ioctl is dead and it's not coming back, see ETHTOOL_A_MM_ENABLED.
I wish I could come back to this patch set sooner, but I'm really deadlocked
with some other stuff I need to do this week. Sorry.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2022-11-16 12:37 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16 22:29 [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Vladimir Oltean
2022-08-16 22:29 ` [RFC PATCH net-next 1/7] net: ethtool: netlink: introduce ethnl_update_bool() Vladimir Oltean
2022-08-17 11:27   ` Michal Kubecek
2022-08-17 11:52     ` Vladimir Oltean
2022-08-17 12:00       ` Vladimir Oltean
2022-08-17 12:05         ` Michal Kubecek
2022-08-16 22:29 ` [RFC PATCH net-next 2/7] net: ethtool: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
2022-08-17  3:22   ` Jakub Kicinski
2022-08-17 11:41     ` Vladimir Oltean
2022-08-17 18:35       ` Jakub Kicinski
2022-08-17 23:15   ` Vinicius Costa Gomes
2022-08-19 16:12     ` Vladimir Oltean
2022-08-24  0:35       ` Vinicius Costa Gomes
2022-09-07 20:57         ` Vladimir Oltean
2022-09-10  0:19           ` Vinicius Costa Gomes
2022-09-10 16:36             ` Vladimir Oltean
2022-09-14  2:59               ` Vinicius Costa Gomes
2022-09-15 14:14                 ` Vladimir Oltean
2022-11-10 10:33                   ` 回复: " Xiaoliang Yang
2022-11-14 15:42                     ` Vladimir Oltean
2022-11-15  3:01                       ` 回复: " Xiaoliang Yang
2022-11-16 12:36                         ` Vladimir Oltean
2022-08-16 22:29 ` [RFC PATCH net-next 3/7] net: ethtool: stats: make stats_put_stats() take input from multiple sources Vladimir Oltean
2022-08-16 22:29 ` [RFC PATCH net-next 4/7] net: ethtool: stats: replicate standardized counters for the pMAC Vladimir Oltean
2022-08-16 22:29 ` [RFC PATCH net-next 5/7] net: enetc: parameterize port MAC stats to also cover " Vladimir Oltean
2022-08-16 22:29 ` [RFC PATCH net-next 6/7] net: enetc: expose some standardized ethtool counters Vladimir Oltean
2022-08-16 22:29 ` [RFC PATCH net-next 7/7] net: enetc: add support for Frame Preemption and MAC Merge layer Vladimir Oltean
2022-08-17  3:34 ` [RFC PATCH net-next 0/7] 802.1Q Frame Preemption and 802.3 MAC Merge support via ethtool Jakub Kicinski
2022-08-17 11:50   ` Vladimir Oltean
2022-08-17 18:46     ` Jakub Kicinski
2022-10-01 15:53       ` Vladimir Oltean
2022-10-03 14:36         ` Jakub Kicinski
2022-10-03 14:51           ` Vladimir Oltean
2022-08-17 22:47 ` Vinicius Costa Gomes
2022-08-19  8:16 ` Kurt Kanzenbach
2022-08-19 16:59   ` Vladimir Oltean
2022-08-23 10:50     ` Kurt Kanzenbach

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.