* [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
@ 2022-05-20  1:15 ` Vinicius Costa Gomes
  0 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Hi,

Please consider this as a PATCH-like quality RFC (in short, even in
the absence of comments, please do not apply this series as is). My
aim is to get a consensus on the userspace API.

I also found some weirdness with Intel I226, that I would like to
investigate better. So, maybe it's a good use of everyone's time to
have this series out, so people can take a look at the more
controversial parts while I investigate/fix those issues.

(The checkpatch.pl warnings about the spelling of "preemptible" are
ignored because that's the way it's spelled in IEEE 802.1Q-2018, but
in IEEE 802.3-2018 it's preemptable, it's a mess)

Changes from v4:
 - Went back to exposing the per-queue frame preemption bits via
   ethtool-netlink only; doing it via taprio/mqprio was seen as too
   much trouble. (Vladimir Oltean)
 - Fixed documentation and code/patch organization changes (Vladimir
   Oltean).

Changes from v3:
 - Added early support for sending/receiving verification frames
   (Vladimir Oltean). This is a bit more than RFC-quality, but I am
   adding it so people can see how it fits together with the rest.
   The driver specific bits are interesting because the hardware does
   the absolute minimum, the driver needs to do the heavy lifting.

 - Added support for setting preemptible/express traffic classes via
   tc-mqprio (Vladimir Oltean). mqprio parsing of configuration
   options is... interesting, so comments here are going to be useful,
   I may have missed something.

Changes from v2:
 - Fixed some copy&paste mistakes, documentation formatting and
   slightly improved error reporting (Jakub Kicinski);

Changes from v1:
 - The minimum fragment size configuration was changed to be
   configured in bytes to be more future proof, in case the standard
   changes this (the previous definition was '(X + 1) * 64', X being
   [0..3]) (Michal Kubecek);
 - In taprio, frame preemption is now configured by traffic classes (was
   done by queues) (Jakub Kicinski, Vladimir Oltean);
 - Various netlink protocol validation improvements (Jakub Kicinski);
 - Dropped the IGC register dump for frame preemption registers, until a
   standardized way of exposing that is agreed (Jakub Kicinski);

Changes from RFC v2:
 - Reorganised the offload enabling/disabling on the driver side;
 - Added a few igc fixes;

Changes from RFC v1:
 - The per-queue preemptible/express setting is moved to applicable
   qdiscs (Jakub Kicinski and others);
 - "min-frag-size" now follows the 802.3br specification more closely,
   it's expressed as X in '64(1 + X) + 4' (Joergen Andreasen);

Another point that should be noted is the addition of the
TC_SETUP_PREEMPT offload type. The idea behind it is to allow other
qdiscs (I was thinking of mqprio) to also configure which traffic
classes should be marked as express/preemptible.

Original cover letter (lightly edited):

This is still an RFC for two main reasons. First, I want to confirm
that this approach (per-queue settings via qdiscs, device settings via
ethtool) looks good, even though there aren't many more options left ;-)
Second, while testing this I found some weirdness in the driver that I
would need a bit more time to investigate.

(In case these patches are not enough to give an idea of how things
work, I can send the userspace patches, of course.)

The idea of this "hybrid" approach is that applications/users would do
the following steps to configure frame preemption:

$ tc qdisc replace dev $IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time $BASE_TIME \
      sched-entry S 0f 10000000 \
      preempt 1110 \
      flags 0x2 

The "preempt" parameter is the only difference: it configures which
traffic classes are marked as preemptible. In this example, traffic
class 0 is marked as "not preemptible", so it is express; the
remaining traffic classes are preemptible.
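
To make the mask convention concrete, here is a minimal userspace sketch
in C. It is not part of the series: parse_preempt_mask() and
tc_is_preemptible() are hypothetical names, and the sketch assumes the
"preempt" string is read as a binary number whose rightmost digit is
traffic class 0, which is what the description of the example implies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: parse a string of '0'/'1' characters such as
 * "1110" into a bitmask. Assumption: the rightmost character is
 * traffic class 0. Returns false for a NULL or empty string, a
 * non-binary character, or more than 32 digits.
 */
static bool parse_preempt_mask(const char *s, uint32_t *mask)
{
	uint32_t m = 0;
	size_t n = 0;

	if (!s || !*s)
		return false;

	for (; *s; s++, n++) {
		if (n == 32 || (*s != '0' && *s != '1'))
			return false;
		m = (m << 1) | (uint32_t)(*s - '0');
	}

	*mask = m;
	return true;
}

/* A traffic class is preemptible when its bit is set, express otherwise. */
static bool tc_is_preemptible(uint32_t mask, unsigned int tc)
{
	return tc < 32 && ((mask >> tc) & 1u);
}
```

Under that assumption, "1110" parses to 0xe, so traffic class 0 is
express and the higher classes covered by the mask are preemptible.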

The next step in this example would be to enable frame preemption in
the device, via ethtool, and set the minimum fragment size to 192 bytes:

$ sudo ./ethtool --set-frame-preemption $IFACE fp on min-frag-size 192
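
For readers wondering how a size in bytes relates to the 2-bit value
the hardware deals with, the following is a rough userspace sketch in C
of the '(X + 1) * 64' mapping mentioned in the changelog. The function
names are made up for illustration, and clamping out-of-range sizes
(instead of rejecting them) is an assumption of this sketch, not
something the series mandates.

```c
#include <stdint.h>

/*
 * Sketch of the bytes-to-multiplier conversion, following the
 * '(X + 1) * 64' definition, X in [0..3].
 * Assumption: sizes below 64 bytes map to 0 and sizes above 256 bytes
 * map to 3, i.e. out-of-range values are clamped rather than rejected.
 */
static uint8_t frag_size_to_mult(uint32_t frag_size)
{
	uint32_t mult;

	if (frag_size < 64)
		return 0;

	mult = frag_size / 64 - 1;
	return mult > 3 ? 3 : (uint8_t)mult;
}

/* Inverse mapping: multiplier X back to bytes, (X + 1) * 64. */
static uint32_t mult_to_frag_size(uint8_t mult)
{
	return ((uint32_t)mult + 1) * 64;
}
```

With the 192 bytes used in the command above, frag_size_to_mult(192)
gives 2, and mult_to_frag_size(2) gives 192 again.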

Cheers,


Vinicius Costa Gomes (11):
  ethtool: Add support for configuring frame preemption
  ethtool: Add support for Frame Preemption verification
  igc: Add support for receiving frames with all zeroes address
  igc: Set the RX packet buffer size for TSN mode
  igc: Optimze TX buffer sizes for TSN
  igc: Add support for receiving errored frames
  igc: Add support for enabling frame preemption via ethtool
  igc: Add support for setting frame preemption configuration
  igc: Add support for Frame Preemption verification
  igc: Check incompatible configs for Frame Preemption
  igc: Add support for exposing frame preemption stats registers

 Documentation/networking/ethtool-netlink.rst |  55 ++++
 drivers/net/ethernet/intel/igc/igc.h         |  29 ++-
 drivers/net/ethernet/intel/igc/igc_defines.h |  22 +-
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  92 +++++++
 drivers/net/ethernet/intel/igc/igc_main.c    | 256 +++++++++++++++++++
 drivers/net/ethernet/intel/igc/igc_regs.h    |  10 +
 drivers/net/ethernet/intel/igc/igc_tsn.c     |  57 ++++-
 include/linux/ethtool.h                      |  26 ++
 include/uapi/linux/ethtool_netlink.h         |  20 ++
 net/ethtool/Makefile                         |   3 +-
 net/ethtool/common.c                         |  23 ++
 net/ethtool/netlink.c                        |  19 ++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 188 ++++++++++++++
 14 files changed, 791 insertions(+), 13 deletions(-)
 create mode 100644 net/ethtool/preempt.c

-- 
2.35.3



* [PATCH net-next v5 01/11] ethtool: Add support for configuring frame preemption
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Frame preemption (described in IEEE 802.3-2018, Clause 99 in
particular) defines the concept of preemptible and express queues. It
allows traffic from express queues to "interrupt" traffic from
preemptible queues, which are "resumed" after the express traffic has
finished transmitting.

Expose the UAPI bits so that applications can enable it via
ethtool-netlink. Also expose the kernel ethtool functions, so device
drivers can support it.

Frame preemption can only be used when both the local device and the
link partner support it.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 Documentation/networking/ethtool-netlink.rst |  52 ++++++
 include/linux/ethtool.h                      |  23 +++
 include/uapi/linux/ethtool_netlink.h         |  18 ++
 net/ethtool/Makefile                         |   3 +-
 net/ethtool/common.c                         |  23 +++
 net/ethtool/netlink.c                        |  19 ++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 177 +++++++++++++++++++
 8 files changed, 318 insertions(+), 1 deletion(-)
 create mode 100644 net/ethtool/preempt.c

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index dbca3e9ec782..15d7c025cc4e 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -220,6 +220,8 @@ Userspace to kernel:
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET``       get PHC virtual clocks info
   ``ETHTOOL_MSG_MODULE_SET``            set transceiver module parameters
   ``ETHTOOL_MSG_MODULE_GET``            get transceiver module parameters
+  ``ETHTOOL_MSG_PREEMPT_GET``           get frame preemption parameters
+  ``ETHTOOL_MSG_PREEMPT_SET``           set frame preemption parameters
   ===================================== =================================
 
 Kernel to userspace:
@@ -260,6 +262,7 @@ Kernel to userspace:
   ``ETHTOOL_MSG_STATS_GET_REPLY``          standard statistics
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY``    PHC virtual clocks info
   ``ETHTOOL_MSG_MODULE_GET_REPLY``         transceiver module parameters
+  ``ETHTOOL_MSG_PREEMPT_GET_REPLY``        frame preemption parameters
   ======================================== =================================
 
 ``GET`` requests are sent by userspace applications to retrieve device
@@ -1625,6 +1628,53 @@ For SFF-8636 modules, low power mode is forced by the host according to table
 For CMIS modules, low power mode is forced by the host according to table 6-12
 in revision 5.0 of the specification.
 
+PREEMPT_GET
+===========
+
+Get information about frame preemption state.
+
+Request contents:
+
+  ====================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``          nested  request header
+  ====================================  ======  ==========================
+
+Kernel response contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``            nested  reply header
+  ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
+  ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
+  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
+  ======================================  ======  ==========================
+
+``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
+fragment size that the receiver device supports.
+
+``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK`` configures which queues should
+be marked as preemptible. If bit X is '1' then queue X is preemptible,
+the queue is express otherwise.
+
+PREEMPT_SET
+===========
+
+Sets frame preemption parameters.
+
+Request contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``            nested  request header
+  ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
+  ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
+  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
+  ======================================  ======  ==========================
+
+``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
+fragment size that the receiver device supports.
+
+``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK`` configures which queues should be marked as
+preemptible.
+
 Request translation
 ===================
 
@@ -1726,4 +1776,6 @@ are netlink only.
   n/a                                 ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
   n/a                                 ``ETHTOOL_MSG_MODULE_GET``
   n/a                                 ``ETHTOOL_MSG_MODULE_SET``
+  n/a                                 ``ETHTOOL_MSG_PREEMPT_GET``
+  n/a                                 ``ETHTOOL_MSG_PREEMPT_SET``
   =================================== =====================================
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 99dc7bfbcd3c..42570ec8ee44 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -453,6 +453,20 @@ struct ethtool_module_power_mode_params {
 	enum ethtool_module_power_mode mode;
 };
 
+/**
+ * struct ethtool_fp - Frame Preemption information
+ *
+ * @enabled: Enable frame preemption.
+ * @preemptible_mask: Bitmask of preemptible queues (bit X: queue X).
+ * @add_frag_size: Minimum size in bytes of additional (non-final)
+ *	fragments; see ethtool_frag_size_to_mult() for the 802.3 encoding.
+ */
+struct ethtool_fp {
+	u32 enabled;
+	u32 preemptible_mask;
+	u32 add_frag_size;
+};
+
 /**
  * struct ethtool_ops - optional netdev operations
  * @cap_link_lanes_supported: indicates if the driver supports lanes
@@ -606,6 +620,8 @@ struct ethtool_module_power_mode_params {
  *	not report statistics.
  * @get_fecparam: Get the network device Forward Error Correction parameters.
  * @set_fecparam: Set the network device Forward Error Correction parameters.
+ * @get_preempt: Get the network device Frame Preemption parameters.
+ * @set_preempt: Set the network device Frame Preemption parameters.
  * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
  *	This is only useful if the device maintains PHY statistics and
  *	cannot use the standard PHY library helpers.
@@ -736,6 +752,10 @@ struct ethtool_ops {
 				      struct ethtool_fecparam *);
 	int	(*set_fecparam)(struct net_device *,
 				      struct ethtool_fecparam *);
+	int	(*get_preempt)(struct net_device *dev,
+			       struct ethtool_fp *fp);
+	int	(*set_preempt)(struct net_device *dev, struct ethtool_fp *fp,
+			       struct netlink_ext_ack *extack);
 	void	(*get_ethtool_phy_stats)(struct net_device *,
 					 struct ethtool_stats *, u64 *);
 	int	(*get_phy_tunable)(struct net_device *,
@@ -843,4 +863,7 @@ int ethtool_get_phc_vclocks(struct net_device *dev, int **vclock_index);
  * next string.
  */
 extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
+
+u8 ethtool_frag_size_to_mult(u32 frag_size);
+
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d2fb4f7be61b..651c7af76776 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -49,6 +49,8 @@ enum {
 	ETHTOOL_MSG_PHC_VCLOCKS_GET,
 	ETHTOOL_MSG_MODULE_GET,
 	ETHTOOL_MSG_MODULE_SET,
+	ETHTOOL_MSG_PREEMPT_GET,
+	ETHTOOL_MSG_PREEMPT_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -94,6 +96,8 @@ enum {
 	ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
 	ETHTOOL_MSG_MODULE_GET_REPLY,
 	ETHTOOL_MSG_MODULE_NTF,
+	ETHTOOL_MSG_PREEMPT_GET_REPLY,
+	ETHTOOL_MSG_PREEMPT_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -697,6 +701,20 @@ enum {
 	ETHTOOL_A_FEC_STAT_MAX = (__ETHTOOL_A_FEC_STAT_CNT - 1)
 };
 
+/* FRAME PREEMPTION */
+
+enum {
+	ETHTOOL_A_PREEMPT_UNSPEC,
+	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
+	ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK,		/* bitset */
+	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_PREEMPT_CNT,
+	ETHTOOL_A_PREEMPT_MAX = (__ETHTOOL_A_PREEMPT_CNT - 1)
+};
+
 /* MODULE EEPROM */
 
 enum {
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index b76432e70e6b..c0ab048b46c9 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,5 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
+		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o \
+		   preempt.o
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 566adf85e658..2232b8ef18b4 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -597,3 +597,26 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
 	link_ksettings->base.duplex = link_info->duplex;
 }
 EXPORT_SYMBOL_GPL(ethtool_params_from_link_mode);
+
+/**
+ * ethtool_frag_size_to_mult() - Convert from a Frame Preemption
+ * Additional Fragment size in bytes to a multiplier.
+ * @frag_size: minimum non-final fragment size in bytes.
+ *
+ * The multiplier is defined as:
+ *	"A 2-bit integer value used to indicate the minimum size of
+ *	non-final fragments supported by the receiver on the given port
+ *	associated with the local System. This value is expressed in units
+ *	of 64 octets of additional fragment length."
+ *	Equivalent to `30.14.1.7 aMACMergeAddFragSize` from the IEEE 802.3-2018
+ *	standard.
+ *
+ * Return: the multiplier, a number in the [0, 3] interval.
+ */
+u8 ethtool_frag_size_to_mult(u32 frag_size)
+{
+	u32 mult = max_t(u32, frag_size / 64, 1) - 1;
+
+	return min_t(u32, mult, 3);
+}
+EXPORT_SYMBOL_GPL(ethtool_frag_size_to_mult);
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 5fe8f4ae2ceb..66b35c35fcdb 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -282,6 +282,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
 	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_GET]		= &ethnl_fec_request_ops,
 	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_PREEMPT_GET]	= &ethnl_preempt_request_ops,
 	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,
 	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
 	[ETHTOOL_MSG_PHC_VCLOCKS_GET]	= &ethnl_phc_vclocks_request_ops,
@@ -598,6 +599,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
 	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
 	[ETHTOOL_MSG_MODULE_NTF]	= &ethnl_module_request_ops,
+	[ETHTOOL_MSG_PREEMPT_NTF]	= &ethnl_preempt_request_ops,
 };
 
 /* default notification handler */
@@ -691,6 +693,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
 	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_MODULE_NTF]	= ethnl_default_notify,
+	[ETHTOOL_MSG_PREEMPT_NTF]	= ethnl_default_notify,
 };
 
 void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
@@ -1020,6 +1023,22 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_module_set_policy,
 		.maxattr = ARRAY_SIZE(ethnl_module_set_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_PREEMPT_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_preempt_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_preempt_get_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_PREEMPT_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_preempt,
+		.policy = ethnl_preempt_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_preempt_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 7919ddb2371c..444799f3e91a 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -341,6 +341,7 @@ extern const struct ethnl_request_ops ethnl_pause_request_ops;
 extern const struct ethnl_request_ops ethnl_eee_request_ops;
 extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
 extern const struct ethnl_request_ops ethnl_fec_request_ops;
+extern const struct ethnl_request_ops ethnl_preempt_request_ops;
 extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
 extern const struct ethnl_request_ops ethnl_stats_request_ops;
 extern const struct ethnl_request_ops ethnl_phc_vclocks_request_ops;
@@ -379,6 +380,8 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF
 extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
+extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
+extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
 extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
@@ -402,6 +405,7 @@ int ethnl_tunnel_info_start(struct netlink_callback *cb);
 int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_module(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info);
 
 extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
new file mode 100644
index 000000000000..0000ba8cb90c
--- /dev/null
+++ b/net/ethtool/preempt.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "common.h"
+#include "netlink.h"
+#include "bitset.h"
+
+struct preempt_req_info {
+	struct ethnl_req_info		base;
+};
+
+struct preempt_reply_data {
+	struct ethnl_reply_data		base;
+	struct ethtool_fp		fp;
+};
+
+#define PREEMPT_QUEUES_COUNT \
+	(sizeof_field(struct ethtool_fp, preemptible_mask) * BITS_PER_BYTE)
+
+#define PREEMPT_REPDATA(__reply_base) \
+	container_of(__reply_base, struct preempt_reply_data, base)
+
+const struct nla_policy
+ethnl_preempt_get_policy[] = {
+	[ETHTOOL_A_PREEMPT_HEADER]		= NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+static int preempt_prepare_data(const struct ethnl_req_info *req_base,
+				struct ethnl_reply_data *reply_base,
+				struct genl_info *info)
+{
+	struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+	int ret;
+
+	if (!dev->ethtool_ops->get_preempt)
+		return -EOPNOTSUPP;
+
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		return ret;
+
+	ret = dev->ethtool_ops->get_preempt(dev, &data->fp);
+	ethnl_ops_complete(dev);
+
+	return ret;
+}
+
+static int preempt_reply_size(const struct ethnl_req_info *req_base,
+			      const struct ethnl_reply_data *reply_base)
+{
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	const struct ethtool_fp *preempt = &data->fp;
+	int len = 0;
+	int ret;
+
+	ret = ethnl_bitset32_size(&preempt->preemptible_mask, NULL,
+				  PREEMPT_QUEUES_COUNT, NULL, compact);
+	if (ret < 0)
+		return ret;
+
+	len += ret;
+
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
+	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
+
+	return len;
+}
+
+static int preempt_fill_reply(struct sk_buff *skb,
+			      const struct ethnl_req_info *req_base,
+			      const struct ethnl_reply_data *reply_base)
+{
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	const struct ethtool_fp *preempt = &data->fp;
+	int ret;
+
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_ENABLED, !!preempt->enabled))
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,
+			preempt->add_frag_size))
+		return -EMSGSIZE;
+
+	ret = ethnl_put_bitset32(skb, ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK,
+				 &preempt->preemptible_mask, NULL, PREEMPT_QUEUES_COUNT,
+				 NULL, compact);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+const struct ethnl_request_ops ethnl_preempt_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_PREEMPT_GET,
+	.reply_cmd		= ETHTOOL_MSG_PREEMPT_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_PREEMPT_HEADER,
+	.req_info_size		= sizeof(struct preempt_req_info),
+	.reply_data_size	= sizeof(struct preempt_reply_data),
+
+	.prepare_data		= preempt_prepare_data,
+	.reply_size		= preempt_reply_size,
+	.fill_reply		= preempt_fill_reply,
+};
+
+const struct nla_policy
+ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
+	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
+	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
+	[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK]		= { .type = NLA_NESTED },
+};
+
+int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
+{
+	struct ethnl_req_info req_info = {};
+	struct nlattr **tb = info->attrs;
+	struct ethtool_fp preempt = {};
+	struct net_device *dev;
+	bool mod = false;
+	int ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info,
+					 tb[ETHTOOL_A_PREEMPT_HEADER],
+					 genl_info_net(info), info->extack,
+					 true);
+	if (ret < 0)
+		return ret;
+	dev = req_info.dev;
+
+	ret = -EOPNOTSUPP;
+	if (!dev->ethtool_ops->get_preempt ||
+	    !dev->ethtool_ops->set_preempt)
+		goto out_dev;
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		goto out_rtnl;
+
+	ret = dev->ethtool_ops->get_preempt(dev, &preempt);
+	if (ret < 0) {
+		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");
+		goto out_ops;
+	}
+
+	ret = ethnl_update_bitset32(&preempt.preemptible_mask, PREEMPT_QUEUES_COUNT,
+				    tb[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK],
+				    NULL, info->extack, &mod);
+	if (ret < 0)
+		goto out_ops;
+
+	ethnl_update_bool32(&preempt.enabled,
+			    tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
+	ethnl_update_u32(&preempt.add_frag_size,
+			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
+	ret = 0;
+	if (!mod)
+		goto out_ops;
+
+	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);
+	if (ret < 0) {
+		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");
+		goto out_ops;
+	}
+
+	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+out_dev:
+	dev_put(dev);
+	return ret;
+}
-- 
2.35.3
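
As a reading aid for ethnl_set_preempt(), the read-modify-write flow
(fetch the current settings, apply only the attributes present in the
request, call the driver only if something changed) can be modelled in
plain userspace C. This is a simplified sketch, not kernel code: the
struct and function names are hypothetical, a NULL pointer stands in
for an absent netlink attribute, and the preemptible mask is replaced
wholesale instead of going through the value/mask bitset handling the
kernel uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the three fields of struct ethtool_fp. */
struct fp_settings {
	uint32_t enabled;
	uint32_t preemptible_mask;
	uint32_t add_frag_size;
};

/*
 * Hypothetical request: a NULL pointer stands in for an attribute
 * that was not present in the netlink message.
 */
struct fp_request {
	const uint32_t *enabled;
	const uint32_t *preemptible_mask;
	const uint32_t *add_frag_size;
};

/* Overwrite *dst only if the attribute is present and changes the value. */
static void update_u32(uint32_t *dst, const uint32_t *attr, bool *mod)
{
	if (!attr || *dst == *attr)
		return;
	*dst = *attr;
	*mod = true;
}

/*
 * Merge a request into the current settings. Returns true when at
 * least one field changed, which is the case in which the driver's
 * set callback would be invoked.
 */
static bool fp_apply_request(struct fp_settings *cur,
			     const struct fp_request *req)
{
	bool mod = false;
	uint32_t enabled;

	if (req->enabled) {
		enabled = !!*req->enabled; /* normalise to 0 or 1 */
		update_u32(&cur->enabled, &enabled, &mod);
	}
	update_u32(&cur->preemptible_mask, req->preemptible_mask, &mod);
	update_u32(&cur->add_frag_size, req->add_frag_size, &mod);

	return mod;
}
```

Applying the same request twice reports a modification only the first
time, which matches the "if (!mod) goto out_ops" early exit in the
patch.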


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Intel-wired-lan] [PATCH net-next v5 01/11] ethtool: Add support for configuring frame preemption
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: intel-wired-lan

Frame preemption (described in IEEE 802.3-2018, Clause 99 in
particular) defines the concept of preemptible and express queues. It
allows traffic from express queues to "interrupt" traffic from
preemptible queues, which is "resumed" after the express traffic has
finished transmitting.

Expose the UAPI bits so that applications can configure frame
preemption using ethtool-netlink. Also expose the kernel ethtool
operations, so device drivers can support it.

Frame preemption can only be used when both the local device and the
link partner support it.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
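Note on the fragment size conversion: the helper added in
net/ethtool/common.c turns a size in bytes into the 2-bit multiplier
used by the standard. Below is a minimal userspace sketch of that
arithmetic, following the patch's mapping (64 bytes -> 0, 128 -> 1,
and so on, saturating at 3). The name frag_size_to_mult() and the
handling of sizes below 64 bytes are assumptions made for
illustration; this is not the kernel helper itself.

```c
#include <stdint.h>

/*
 * Illustrative userspace sketch (hypothetical name), not the kernel helper.
 * Maps a minimum non-final fragment size in bytes to a 2-bit multiplier,
 * in units of 64 octets: 64 -> 0, 128 -> 1, 192 -> 2, 256 and above -> 3.
 */
static uint8_t frag_size_to_mult(uint32_t frag_size)
{
	uint32_t units = frag_size / 64;

	/* Assumption: sizes below 64 bytes map to 0 instead of wrapping. */
	if (units == 0)
		return 0;

	units -= 1;

	/* A 2-bit field can only hold values from 0 to 3. */
	return units > 3 ? 3 : (uint8_t)units;
}
```

Doing the subtraction in a 32-bit variable and only narrowing at the
end avoids the wrap-around that happens if the intermediate value is
stored in a u8.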
 Documentation/networking/ethtool-netlink.rst |  52 ++++++
 include/linux/ethtool.h                      |  23 +++
 include/uapi/linux/ethtool_netlink.h         |  18 ++
 net/ethtool/Makefile                         |   3 +-
 net/ethtool/common.c                         |  23 +++
 net/ethtool/netlink.c                        |  19 ++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 177 +++++++++++++++++++
 8 files changed, 318 insertions(+), 1 deletion(-)
 create mode 100644 net/ethtool/preempt.c

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index dbca3e9ec782..15d7c025cc4e 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -220,6 +220,8 @@ Userspace to kernel:
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET``       get PHC virtual clocks info
   ``ETHTOOL_MSG_MODULE_SET``            set transceiver module parameters
   ``ETHTOOL_MSG_MODULE_GET``            get transceiver module parameters
+  ``ETHTOOL_MSG_PREEMPT_GET``           get frame preemption parameters
+  ``ETHTOOL_MSG_PREEMPT_SET``           set frame preemption parameters
   ===================================== =================================
 
 Kernel to userspace:
@@ -260,6 +262,7 @@ Kernel to userspace:
   ``ETHTOOL_MSG_STATS_GET_REPLY``          standard statistics
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY``    PHC virtual clocks info
   ``ETHTOOL_MSG_MODULE_GET_REPLY``         transceiver module parameters
+  ``ETHTOOL_MSG_PREEMPT_GET_REPLY``        frame preemption parameters
   ======================================== =================================
 
 ``GET`` requests are sent by userspace applications to retrieve device
@@ -1625,6 +1628,53 @@ For SFF-8636 modules, low power mode is forced by the host according to table
 For CMIS modules, low power mode is forced by the host according to table 6-12
 in revision 5.0 of the specification.
 
+PREEMPT_GET
+===========
+
+Get information about frame preemption state.
+
+Request contents:
+
+  ====================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``          nested  request header
+  ====================================  ======  ==========================
+
+Kernel response contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``            nested  reply header
+  ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
+  ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
+  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
+  ======================================  ======  ==========================
+
+``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
+fragment size that the receiver device supports.
+
+``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK`` configures which queues should
+be marked as preemptible. If bit X is '1' then queue X is preemptible;
+otherwise the queue is express.
+
+PREEMPT_SET
+===========
+
+Sets frame preemption parameters.
+
+Request contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``            nested  request header
+  ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
+  ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
+  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
+  ======================================  ======  ==========================
+
+``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
+fragment size that the receiver device supports.
+
+``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK`` configures which queues should be marked as
+preemptible.
+
 Request translation
 ===================
 
@@ -1726,4 +1776,6 @@ are netlink only.
   n/a                                 ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
   n/a                                 ``ETHTOOL_MSG_MODULE_GET``
   n/a                                 ``ETHTOOL_MSG_MODULE_SET``
+  n/a                                 ``ETHTOOL_MSG_PREEMPT_GET``
+  n/a                                 ``ETHTOOL_MSG_PREEMPT_SET``
   =================================== =====================================
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 99dc7bfbcd3c..42570ec8ee44 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -453,6 +453,20 @@ struct ethtool_module_power_mode_params {
 	enum ethtool_module_power_mode mode;
 };
 
+/**
+ * struct ethtool_fp - Frame Preemption information
+ *
+ * @enabled: Enable frame preemption.
+ * @add_frag_size: Minimum size for additional (non-final) fragments
+ * in bytes. For the value defined in the IEEE 802.3-2018 standard, see
+ * ethtool_frag_size_to_mult().
+ */
+struct ethtool_fp {
+	u32 enabled;
+	u32 preemptible_mask;
+	u32 add_frag_size;
+};
+
 /**
  * struct ethtool_ops - optional netdev operations
  * @cap_link_lanes_supported: indicates if the driver supports lanes
@@ -606,6 +620,8 @@ struct ethtool_module_power_mode_params {
  *	not report statistics.
  * @get_fecparam: Get the network device Forward Error Correction parameters.
  * @set_fecparam: Set the network device Forward Error Correction parameters.
+ * @get_preempt: Get the network device Frame Preemption parameters.
+ * @set_preempt: Set the network device Frame Preemption parameters.
  * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
  *	This is only useful if the device maintains PHY statistics and
  *	cannot use the standard PHY library helpers.
@@ -736,6 +752,10 @@ struct ethtool_ops {
 				      struct ethtool_fecparam *);
 	int	(*set_fecparam)(struct net_device *,
 				      struct ethtool_fecparam *);
+	int	(*get_preempt)(struct net_device *dev,
+			       struct ethtool_fp *fp);
+	int	(*set_preempt)(struct net_device *dev, struct ethtool_fp *fp,
+			       struct netlink_ext_ack *extack);
 	void	(*get_ethtool_phy_stats)(struct net_device *,
 					 struct ethtool_stats *, u64 *);
 	int	(*get_phy_tunable)(struct net_device *,
@@ -843,4 +863,7 @@ int ethtool_get_phc_vclocks(struct net_device *dev, int **vclock_index);
  * next string.
  */
 extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
+
+u8 ethtool_frag_size_to_mult(u32 frag_size);
+
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d2fb4f7be61b..651c7af76776 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -49,6 +49,8 @@ enum {
 	ETHTOOL_MSG_PHC_VCLOCKS_GET,
 	ETHTOOL_MSG_MODULE_GET,
 	ETHTOOL_MSG_MODULE_SET,
+	ETHTOOL_MSG_PREEMPT_GET,
+	ETHTOOL_MSG_PREEMPT_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -94,6 +96,8 @@ enum {
 	ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
 	ETHTOOL_MSG_MODULE_GET_REPLY,
 	ETHTOOL_MSG_MODULE_NTF,
+	ETHTOOL_MSG_PREEMPT_GET_REPLY,
+	ETHTOOL_MSG_PREEMPT_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -697,6 +701,20 @@ enum {
 	ETHTOOL_A_FEC_STAT_MAX = (__ETHTOOL_A_FEC_STAT_CNT - 1)
 };
 
+/* FRAME PREEMPTION */
+
+enum {
+	ETHTOOL_A_PREEMPT_UNSPEC,
+	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
+	ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK,		/* bitset */
+	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_PREEMPT_CNT,
+	ETHTOOL_A_PREEMPT_MAX = (__ETHTOOL_A_PREEMPT_CNT - 1)
+};
+
 /* MODULE EEPROM */
 
 enum {
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index b76432e70e6b..c0ab048b46c9 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,5 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
+		   tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o \
+		   preempt.o
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 566adf85e658..2232b8ef18b4 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -597,3 +597,26 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
 	link_ksettings->base.duplex = link_info->duplex;
 }
 EXPORT_SYMBOL_GPL(ethtool_params_from_link_mode);
+
+/**
+ * ethtool_frag_size_to_mult() - Convert from a Frame Preemption
+ * Additional Fragment size in bytes to a multiplier.
+ * @frag_size: minimum non-final fragment size in bytes.
+ *
+ * The multiplier is defined as:
+ *	"A 2-bit integer value used to indicate the minimum size of
+ *	non-final fragments supported by the receiver on the given port
+ *	associated with the local System. This value is expressed in units
+ *	of 64 octets of additional fragment length."
+ *	Equivalent to `30.14.1.7 aMACMergeAddFragSize` from the IEEE 802.3-2018
+ *	standard.
+ *
+ * Return: the multiplier is a number in the [0, 3] interval.
+ */
+u8 ethtool_frag_size_to_mult(u32 frag_size)
+{
+	u32 units = frag_size / 64;
+
+	return clamp_t(u32, units, 1, 4) - 1;
+}
+EXPORT_SYMBOL_GPL(ethtool_frag_size_to_mult);
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 5fe8f4ae2ceb..66b35c35fcdb 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -282,6 +282,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
 	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_GET]		= &ethnl_fec_request_ops,
 	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_PREEMPT_GET]	= &ethnl_preempt_request_ops,
 	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,
 	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
 	[ETHTOOL_MSG_PHC_VCLOCKS_GET]	= &ethnl_phc_vclocks_request_ops,
@@ -598,6 +599,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
 	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
 	[ETHTOOL_MSG_MODULE_NTF]	= &ethnl_module_request_ops,
+	[ETHTOOL_MSG_PREEMPT_NTF]	= &ethnl_preempt_request_ops,
 };
 
 /* default notification handler */
@@ -691,6 +693,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
 	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_MODULE_NTF]	= ethnl_default_notify,
+	[ETHTOOL_MSG_PREEMPT_NTF]	= ethnl_default_notify,
 };
 
 void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
@@ -1020,6 +1023,22 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_module_set_policy,
 		.maxattr = ARRAY_SIZE(ethnl_module_set_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_PREEMPT_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_preempt_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_preempt_get_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_PREEMPT_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_preempt,
+		.policy = ethnl_preempt_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_preempt_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 7919ddb2371c..444799f3e91a 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -341,6 +341,7 @@ extern const struct ethnl_request_ops ethnl_pause_request_ops;
 extern const struct ethnl_request_ops ethnl_eee_request_ops;
 extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
 extern const struct ethnl_request_ops ethnl_fec_request_ops;
+extern const struct ethnl_request_ops ethnl_preempt_request_ops;
 extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
 extern const struct ethnl_request_ops ethnl_stats_request_ops;
 extern const struct ethnl_request_ops ethnl_phc_vclocks_request_ops;
@@ -379,6 +380,8 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF
 extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
+extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
+extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
 extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
@@ -402,6 +405,7 @@ int ethnl_tunnel_info_start(struct netlink_callback *cb);
 int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
 int ethnl_set_module(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info);
 
 extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
new file mode 100644
index 000000000000..0000ba8cb90c
--- /dev/null
+++ b/net/ethtool/preempt.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "common.h"
+#include "netlink.h"
+#include "bitset.h"
+
+struct preempt_req_info {
+	struct ethnl_req_info		base;
+};
+
+struct preempt_reply_data {
+	struct ethnl_reply_data		base;
+	struct ethtool_fp		fp;
+};
+
+#define PREEMPT_QUEUES_COUNT \
+	(sizeof_field(struct ethtool_fp, preemptible_mask) * BITS_PER_BYTE)
+
+#define PREEMPT_REPDATA(__reply_base) \
+	container_of(__reply_base, struct preempt_reply_data, base)
+
+const struct nla_policy
+ethnl_preempt_get_policy[] = {
+	[ETHTOOL_A_PREEMPT_HEADER]		= NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+static int preempt_prepare_data(const struct ethnl_req_info *req_base,
+				struct ethnl_reply_data *reply_base,
+				struct genl_info *info)
+{
+	struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+	int ret;
+
+	if (!dev->ethtool_ops->get_preempt)
+		return -EOPNOTSUPP;
+
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		return ret;
+
+	ret = dev->ethtool_ops->get_preempt(dev, &data->fp);
+	ethnl_ops_complete(dev);
+
+	return ret;
+}
+
+static int preempt_reply_size(const struct ethnl_req_info *req_base,
+			      const struct ethnl_reply_data *reply_base)
+{
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	const struct ethtool_fp *preempt = &data->fp;
+	int len = 0;
+	int ret;
+
+	ret = ethnl_bitset32_size(&preempt->preemptible_mask, NULL,
+				  PREEMPT_QUEUES_COUNT, NULL, compact);
+	if (ret < 0)
+		return ret;
+
+	len += ret;
+
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
+	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
+
+	return len;
+}
+
+static int preempt_fill_reply(struct sk_buff *skb,
+			      const struct ethnl_req_info *req_base,
+			      const struct ethnl_reply_data *reply_base)
+{
+	bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS;
+	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	const struct ethtool_fp *preempt = &data->fp;
+	int ret;
+
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_ENABLED, preempt->enabled))
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,
+			preempt->add_frag_size))
+		return -EMSGSIZE;
+
+	ret = ethnl_put_bitset32(skb, ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK,
+				 &preempt->preemptible_mask, NULL, PREEMPT_QUEUES_COUNT,
+				 NULL, compact);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+const struct ethnl_request_ops ethnl_preempt_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_PREEMPT_GET,
+	.reply_cmd		= ETHTOOL_MSG_PREEMPT_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_PREEMPT_HEADER,
+	.req_info_size		= sizeof(struct preempt_req_info),
+	.reply_data_size	= sizeof(struct preempt_reply_data),
+
+	.prepare_data		= preempt_prepare_data,
+	.reply_size		= preempt_reply_size,
+	.fill_reply		= preempt_fill_reply,
+};
+
+const struct nla_policy
+ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
+	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
+	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
+	[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK]		= { .type = NLA_NESTED },
+};
+
+int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
+{
+	struct ethnl_req_info req_info = {};
+	struct nlattr **tb = info->attrs;
+	struct ethtool_fp preempt = {};
+	struct net_device *dev;
+	bool mod = false;
+	int ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info,
+					 tb[ETHTOOL_A_PREEMPT_HEADER],
+					 genl_info_net(info), info->extack,
+					 true);
+	if (ret < 0)
+		return ret;
+	dev = req_info.dev;
+
+	ret = -EOPNOTSUPP;
+	if (!dev->ethtool_ops->get_preempt ||
+	    !dev->ethtool_ops->set_preempt)
+		goto out_dev;
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		goto out_rtnl;
+
+	ret = dev->ethtool_ops->get_preempt(dev, &preempt);
+	if (ret < 0) {
+		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");
+		goto out_ops;
+	}
+
+	ret = ethnl_update_bitset32(&preempt.preemptible_mask, PREEMPT_QUEUES_COUNT,
+				    tb[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK],
+				    NULL, info->extack, &mod);
+	if (ret < 0)
+		goto out_ops;
+
+	ethnl_update_bool32(&preempt.enabled,
+			    tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
+	ethnl_update_u32(&preempt.add_frag_size,
+			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
+	ret = 0;
+	if (!mod)
+		goto out_ops;
+
+	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);
+	if (ret < 0) {
+		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");
+		goto out_ops;
+	}
+
+	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+out_dev:
+	dev_put(dev);
+	return ret;
+}
-- 
2.35.3


* [PATCH net-next v5 02/11] ethtool: Add support for Frame Preemption verification
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Expose, in the PREEMPT_SET/_GET commands, the ethtool parameters
necessary to support the verification procedure as defined by IEEE
802.3-2018.

These include the 'verified' bit, which indicates that the verification
dialog with the link partner has concluded successfully and that frame
preemption is supported. There's also the 'disable_verify' config, which
disables initiating the verification dialog.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
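Note on how the new fields relate to each other: a minimal userspace
sketch of one plausible way a consumer could combine 'enabled',
'disable_verify' and 'verified'. The rule used here (preemption is
usable on transmit when it is enabled and verification either
succeeded or was disabled) is an assumption made for illustration and
is not something this patch implements. struct fp_state and
fp_tx_active() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the relevant struct ethtool_fp fields. */
struct fp_state {
	uint32_t enabled;
	uint32_t disable_verify;
	uint32_t verified;
};

/*
 * Assumption: frames may be preempted on transmit only when preemption is
 * enabled and the link partner was either verified or verification was
 * explicitly disabled by the administrator.
 */
static bool fp_tx_active(const struct fp_state *fp)
{
	if (!fp->enabled)
		return false;

	return fp->disable_verify || fp->verified;
}
```

With this rule, enabling preemption alone is not enough: either the
verification dialog has to succeed, or the user has to opt out of it.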
 Documentation/networking/ethtool-netlink.rst |  3 +++
 include/linux/ethtool.h                      |  3 +++
 include/uapi/linux/ethtool_netlink.h         |  2 ++
 net/ethtool/netlink.h                        |  2 +-
 net/ethtool/preempt.c                        | 11 +++++++++++
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 15d7c025cc4e..1731e7ad9ee7 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1646,6 +1646,8 @@ Kernel response contents:
   ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
   ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
   ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
+  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``    bool    disable verification
+  ``ETHTOOL_A_PREEMPT_VERIFIED``          bool    verification succeeded
   ======================================  ======  ==========================
 
 ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
@@ -1667,6 +1669,7 @@ Request contents:
   ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
   ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
   ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
+  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``    bool    disable verification
   ======================================  ======  ==========================
 
 ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 42570ec8ee44..5600a7610fa1 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -13,6 +13,7 @@
 #ifndef _LINUX_ETHTOOL_H
 #define _LINUX_ETHTOOL_H
 
+#include "asm-generic/int-ll64.h"
 #include <linux/bitmap.h>
 #include <linux/compat.h>
 #include <linux/netlink.h>
@@ -464,6 +465,8 @@ struct ethtool_module_power_mode_params {
 struct ethtool_fp {
 	u32 enabled;
 	u32 preemptible_mask;
+	u32 disable_verify;
+	u32 verified;
 	u32 add_frag_size;
 };
 
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 651c7af76776..27c9bc5bfa51 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -709,6 +709,8 @@ enum {
 	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
 	ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK,		/* bitset */
 	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
+	ETHTOOL_A_PREEMPT_DISABLE_VERIFY,		/* u8 */
+	ETHTOOL_A_PREEMPT_VERIFIED,			/* u8 */
 
 	/* add new constants above here */
 	__ETHTOOL_A_PREEMPT_CNT,
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 444799f3e91a..dfdef5b8fe5b 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -381,7 +381,7 @@ extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
 extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
-extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
+extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_VERIFIED + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
 extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
index 0000ba8cb90c..7566ffb948b2 100644
--- a/net/ethtool/preempt.c
+++ b/net/ethtool/preempt.c
@@ -63,6 +63,8 @@ static int preempt_reply_size(const struct ethnl_req_info *req_base,
 
 	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
 	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_DISABLE_VERIFY */
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_VERIFIED */
 
 	return len;
 }
@@ -89,6 +91,12 @@ static int preempt_fill_reply(struct sk_buff *skb,
 	if (ret < 0)
 		return ret;
 
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_DISABLE_VERIFY, preempt->disable_verify))
+		return -EMSGSIZE;
+
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_VERIFIED, preempt->verified))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -110,6 +118,7 @@ ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
 	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
 	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
 	[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK]		= { .type = NLA_NESTED },
+	[ETHTOOL_A_PREEMPT_DISABLE_VERIFY]		= NLA_POLICY_RANGE(NLA_U8, 0, 1),
 };
 
 int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
@@ -155,6 +164,8 @@ int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
 			    tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
 	ethnl_update_u32(&preempt.add_frag_size,
 			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
+	ethnl_update_bool32(&preempt.disable_verify,
+			    tb[ETHTOOL_A_PREEMPT_DISABLE_VERIFY], &mod);
 	ret = 0;
 	if (!mod)
 		goto out_ops;
-- 
2.35.3


* [PATCH net-next v5 03/11] igc: Add support for receiving frames with all zeroes address
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

The frame preemption verification handshake (as defined by IEEE
802.3-2018 Section 99.4.3) is done by the driver, but the default
configuration of the driver is to only receive frames addressed to
the device's own address.

So, in preparation for that, add a second address, the all zeroes
address, to the list of acceptable addresses.

Because the frame preemption "disable verify" toggle only affects the
transmission of verification frames, this needs to always be enabled.
As that address is invalid, the impact in practical scenarios should
be minimal. But still a bummer that we have to do this.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      | 2 ++
 drivers/net/ethernet/intel/igc/igc_main.c | 7 +++++++
 drivers/net/ethernet/intel/igc/igc_tsn.c  | 7 +++++++
 3 files changed, 16 insertions(+)
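
Not part of the patch: a minimal userspace sketch of what the extra
filter means for destination matching, assuming exact-match filters
only. mac_is_all_zeroes() and rx_dst_accepted() are hypothetical names
used for illustration; the driver itself programs a hardware filter
through igc_add_mac_filter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* octets in an Ethernet MAC address */

/* Hypothetical helper: true if every octet of addr is zero. */
static bool mac_is_all_zeroes(const uint8_t addr[ETH_ALEN])
{
	uint8_t acc = 0;
	size_t i;

	assert(addr != NULL);

	for (i = 0; i < ETH_ALEN; i++)
		acc |= addr[i];

	return acc == 0;
}

/* Hypothetical model of the destination filter after this patch:
 * accept the device's own address or the all zeroes address.
 */
static bool rx_dst_accepted(const uint8_t dst[ETH_ALEN],
			    const uint8_t own[ETH_ALEN])
{
	return memcmp(dst, own, ETH_ALEN) == 0 || mac_is_all_zeroes(dst);
}
```

In this model a frame sent to 00:00:00:00:00:00 is accepted whether or
not verification is enabled, which matches the "always enabled" note
in the commit message.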

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 1e7e7071f64d..31e7b4c72894 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -622,6 +622,8 @@ struct igc_nfc_rule *igc_get_nfc_rule(struct igc_adapter *adapter,
 int igc_add_nfc_rule(struct igc_adapter *adapter, struct igc_nfc_rule *rule);
 void igc_del_nfc_rule(struct igc_adapter *adapter, struct igc_nfc_rule *rule);
 
+int igc_enable_empty_addr_recv(struct igc_adapter *adapter);
+
 void igc_ptp_init(struct igc_adapter *adapter);
 void igc_ptp_reset(struct igc_adapter *adapter);
 void igc_ptp_suspend(struct igc_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ae17af44fe02..bcbf35b32ef3 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3586,6 +3586,13 @@ static int igc_uc_unsync(struct net_device *netdev, const unsigned char *addr)
 	return 0;
 }
 
+int igc_enable_empty_addr_recv(struct igc_adapter *adapter)
+{
+	u8 empty[ETH_ALEN] = { };
+
+	return igc_add_mac_filter(adapter, IGC_MAC_FILTER_TYPE_DST, empty, -1);
+}
+
 /**
  * igc_set_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set
  * @netdev: network interface device structure
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 0fce22de2ab8..270a08196f49 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -235,6 +235,13 @@ int igc_tsn_reset(struct igc_adapter *adapter)
 	unsigned int new_flags;
 	int err = 0;
 
+	/* Frame preemption verification requires that packets with
+	 * the all zeroes MAC address are allowed to be received. Add
+	 * the all zeroes destination address to the list of
+	 * acceptable addresses.
+	 */
+	igc_enable_empty_addr_recv(adapter);
+
 	new_flags = igc_tsn_new_flags(adapter);
 
 	if (!(new_flags & IGC_FLAG_TSN_ANY_ENABLED))
-- 
2.35.3


* [PATCH net-next v5 04/11] igc: Set the RX packet buffer size for TSN mode
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

In preparation for supporting frame preemption, when entering TSN mode
set the receive packet buffer to 15KB for the Express MAC, 15KB for
the Preemptible MAC and 2KB for the BMC, according to the datasheet
section 7.1.3.2.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_defines.h |  2 ++
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 13 +++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)
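
Not part of the patch: a minimal userspace sketch of the
read-modify-write both hunks apply to IGC_RXPBS. The three constants
are copied from this series; rxpbs_with_sizes() is a hypothetical name
and plain integers stand in for the rd32()/wr32() register accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from igc_defines.h as of this series. */
#define IGC_RXPBS_CFG_TS_EN	0x80000000u /* Timestamp in Rx buffer */
#define IGC_RXPBSIZE_TSN	0x0000f08fu
#define IGC_RXPBSIZE_SIZE_MASK	0x0001FFFFu

/* Hypothetical helper: compute the value to write back, given the
 * value currently in the register and the new size bits.
 */
static uint32_t rxpbs_with_sizes(uint32_t current, uint32_t sizes)
{
	/* The new size bits must fit inside the masked field. */
	assert((sizes & ~IGC_RXPBSIZE_SIZE_MASK) == 0);

	/* Clear only the size field, keep every other bit as it was. */
	return (current & ~IGC_RXPBSIZE_SIZE_MASK) | sizes;
}
```

The point of masking instead of writing the constant directly is that
bits outside the size field, such as IGC_RXPBS_CFG_TS_EN, survive the
switch in and out of TSN mode.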

diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 5c66b97c0cfa..f609b2dbbc28 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -396,6 +396,8 @@
 #define IGC_RXPBS_CFG_TS_EN	0x80000000 /* Timestamp in Rx buffer */
 
 #define IGC_TXPBSIZE_TSN	0x04145145 /* 5k bytes buffer for each queue */
+#define IGC_RXPBSIZE_TSN	0x0000f08f /* 15KB for EXP + 15KB for BE + 2KB for BMC */
+#define IGC_RXPBSIZE_SIZE_MASK	0x0001FFFF
 
 #define IGC_DTXMXPKTSZ_TSN	0x19 /* 1600 bytes of max TX DMA packet size */
 #define IGC_DTXMXPKTSZ_DEFAULT	0x98 /* 9728-byte Jumbo frames */
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 270a08196f49..40a730f8b3f3 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -54,12 +54,17 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
 static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 {
 	struct igc_hw *hw = &adapter->hw;
-	u32 tqavctrl;
+	u32 tqavctrl, rxpbs;
 	int i;
 
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
+	rxpbs = rd32(IGC_RXPBS) & ~IGC_RXPBSIZE_SIZE_MASK;
+	rxpbs |= I225_RXPBSIZE_DEFAULT;
+
+	wr32(IGC_RXPBS, rxpbs);
+
 	tqavctrl = rd32(IGC_TQAVCTRL);
 	tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN |
 		      IGC_TQAVCTRL_ENHANCED_QAV);
@@ -83,7 +88,7 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 {
 	struct igc_hw *hw = &adapter->hw;
 	u32 tqavctrl, baset_l, baset_h;
-	u32 sec, nsec, cycle;
+	u32 sec, nsec, cycle, rxpbs;
 	ktime_t base_time, systim;
 	int i;
 
@@ -94,6 +99,10 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN);
 	wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
 
+	rxpbs = rd32(IGC_RXPBS) & ~IGC_RXPBSIZE_SIZE_MASK;
+	rxpbs |= IGC_RXPBSIZE_TSN;
+	wr32(IGC_RXPBS, rxpbs);
+
 	tqavctrl = rd32(IGC_TQAVCTRL);
 	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
 	wr32(IGC_TQAVCTRL, tqavctrl);
-- 
2.35.3


* [PATCH net-next v5 05/11] igc: Optimize TX buffer sizes for TSN
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

There is 64KB of buffer space shared between TX and RX (including the
BMC). We were only reserving 22KB for TX. Increase each TX buffer (per
queue) by 2KB, so the total is now 30KB for TX.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_defines.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
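
Not part of the patch: a small userspace sketch that checks the
arithmetic above by decoding the old and new constants. It assumes
that the four per-queue sizes are packed as consecutive 6-bit fields
in units of KB, starting at bit 0; that layout is an assumption made
for this sketch, not something stated in the patch. txpb_queue_kb()
and txpb_total_queue_kb() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TXPBSIZE_OLD	0x04145145u /* value removed by this patch */
#define TXPBSIZE_NEW	0x041c71c7u /* value added by this patch */

#define NUM_TX_QUEUES	4
#define FIELD_BITS	6 /* assumed width of each per-queue field */

/* Size, in KB, of the packet buffer of one TX queue. */
static uint32_t txpb_queue_kb(uint32_t reg, unsigned int queue)
{
	assert(queue < NUM_TX_QUEUES);

	return (reg >> (FIELD_BITS * queue)) & ((1u << FIELD_BITS) - 1);
}

/* Sum, in KB, of the four per-queue packet buffers. */
static uint32_t txpb_total_queue_kb(uint32_t reg)
{
	uint32_t total = 0;
	unsigned int q;

	for (q = 0; q < NUM_TX_QUEUES; q++)
		total += txpb_queue_kb(reg, q);

	return total;
}
```

Under that assumption the queues go from 4 x 5KB = 20KB to
4 x 7KB = 28KB; adding the 2KB BMC share mentioned in the new comment
gives the 22KB and 30KB totals quoted in the commit message.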

diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index f609b2dbbc28..62fff53254dd 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -395,7 +395,7 @@
 #define I225_TXPBSIZE_DEFAULT	0x04000014 /* TXPBSIZE default */
 #define IGC_RXPBS_CFG_TS_EN	0x80000000 /* Timestamp in Rx buffer */
 
-#define IGC_TXPBSIZE_TSN	0x04145145 /* 5k bytes buffer for each queue */
+#define IGC_TXPBSIZE_TSN	0x041c71c7 /* 7KB buffer for each queue + 2KB for BMC */
 #define IGC_RXPBSIZE_TSN	0x0000f08f /* 15KB for EXP + 15KB for BE + 2KB for BMC */
 #define IGC_RXPBSIZE_SIZE_MASK	0x0001FFFF
 
-- 
2.35.3


* [PATCH net-next v5 06/11] igc: Add support for receiving errored frames
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

While developing features that require sending potentially ill-formed
frames, it is useful to be able to receive them on the other side.

The driver already had all the pieces in place to support that; all
that was missing was putting the flag in the list of supported
features.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index bcbf35b32ef3..5dd7140bac82 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6318,6 +6318,7 @@ static int igc_probe(struct pci_dev *pdev,
 
 	/* copy netdev features into list of user selectable features */
 	netdev->hw_features |= NETIF_F_NTUPLE;
+	netdev->hw_features |= NETIF_F_RXALL;
 	netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX;
 	netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX;
 	netdev->hw_features |= netdev->features;
-- 
2.35.3


* [PATCH net-next v5 07/11] igc: Add support for enabling frame preemption via ethtool
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Add support for enabling frame preemption via ethtool. All that's left
for ethtool to do is to save the settings in the adapter state and to
request that those settings be applied.

It's done this way because the TSN features (frame preemption is one
of them) interact with one another, and it's better to keep track of
them from a central place.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  3 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c | 51 ++++++++++++++++++++
 2 files changed, 54 insertions(+)
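
Not part of the patch: a minimal userspace sketch of the two
conversions the new callbacks perform, between a bitmask of
preemptible queues and per-queue flags, plus the add-frag-size range
check. The function names are hypothetical and a plain bool array
stands in for the tx_ring[] entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack per-queue flags into a mask, bit i set if queue i is
 * preemptible (what the get callback does).
 */
static uint32_t mask_from_queues(const bool preemptible[], unsigned int n)
{
	uint32_t mask = 0;
	unsigned int i;

	assert(n <= 32);

	for (i = 0; i < n; i++)
		if (preemptible[i])
			mask |= UINT32_C(1) << i;

	return mask;
}

/* Unpack a mask into per-queue flags (what the set callback does). */
static void queues_from_mask(uint32_t mask, bool preemptible[],
			     unsigned int n)
{
	unsigned int i;

	assert(n <= 32);

	for (i = 0; i < n; i++)
		preemptible[i] = (mask >> i) & 1u;
}

/* Range accepted by the set callback in this patch. */
static bool add_frag_size_valid(uint32_t size)
{
	return size >= 68 && size <= 260;
}
```

With four queues, a mask of 0x5 marks queues 0 and 2 as preemptible
and leaves queues 1 and 3 express; packing the result again gives 0x5
back.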

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 31e7b4c72894..df2fc71825a6 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -94,6 +94,7 @@ struct igc_ring {
 	u8 queue_index;                 /* logical index of the ring*/
 	u8 reg_idx;                     /* physical index of the ring */
 	bool launchtime_enable;         /* true if LaunchTime is enabled */
+	bool preemptible;               /* true if queue is preemptible */
 
 	u32 start_time;
 	u32 end_time;
@@ -182,6 +183,8 @@ struct igc_adapter {
 
 	ktime_t base_time;
 	ktime_t cycle_time;
+	bool frame_preemption_active;
+	u32 add_frag_size;
 
 	/* OS defined structs */
 	struct pci_dev *pdev;
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 8cc077b712ad..401d2cdb3e81 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -8,6 +8,7 @@
 
 #include "igc.h"
 #include "igc_diag.h"
+#include "igc_tsn.h"
 
 /* forward declaration */
 struct igc_stats {
@@ -1670,6 +1671,54 @@ static int igc_ethtool_set_eee(struct net_device *netdev,
 	return 0;
 }
 
+static int igc_ethtool_get_preempt(struct net_device *netdev,
+				   struct ethtool_fp *fpcmd)
+{
+	struct igc_adapter *adapter = netdev_priv(netdev);
+	u32 mask = 0;
+	int i;
+
+	fpcmd->enabled = adapter->frame_preemption_active;
+	fpcmd->add_frag_size = adapter->add_frag_size;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		struct igc_ring *ring = adapter->tx_ring[i];
+
+		if (ring->preemptible)
+			mask |= BIT(i);
+	}
+
+	fpcmd->preemptible_mask = mask;
+
+	return 0;
+}
+
+static int igc_ethtool_set_preempt(struct net_device *netdev,
+				   struct ethtool_fp *fpcmd,
+				   struct netlink_ext_ack *extack)
+{
+	struct igc_adapter *adapter = netdev_priv(netdev);
+	u32 mask;
+	int i;
+
+	if (fpcmd->add_frag_size < 68 || fpcmd->add_frag_size > 260) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid value for add-frag-size");
+		return -EINVAL;
+	}
+
+	adapter->frame_preemption_active = fpcmd->enabled;
+	adapter->add_frag_size = fpcmd->add_frag_size;
+	mask = fpcmd->preemptible_mask;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		struct igc_ring *ring = adapter->tx_ring[i];
+
+		ring->preemptible = (mask & BIT(i));
+	}
+
+	return igc_tsn_offload_apply(adapter);
+}
+
 static int igc_ethtool_begin(struct net_device *netdev)
 {
 	struct igc_adapter *adapter = netdev_priv(netdev);
@@ -1963,6 +2012,8 @@ static const struct ethtool_ops igc_ethtool_ops = {
 	.get_ts_info		= igc_ethtool_get_ts_info,
 	.get_channels		= igc_ethtool_get_channels,
 	.set_channels		= igc_ethtool_set_channels,
+	.get_preempt		= igc_ethtool_get_preempt,
+	.set_preempt		= igc_ethtool_set_preempt,
 	.get_priv_flags		= igc_ethtool_get_priv_flags,
 	.set_priv_flags		= igc_ethtool_set_priv_flags,
 	.get_eee		= igc_ethtool_get_eee,
-- 
2.35.3


* [PATCH net-next v5 08/11] igc: Add support for setting frame preemption configuration
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Set the hardware register that enables the frame preemption feature.

Some code is moved around because it is recommended that the
PREEMPT_ENA bit in the IGC_TQAVCTRL register be set after the
individual queue registers (IGC_TXQCTL[i]) are set.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  8 ++++-
 drivers/net/ethernet/intel/igc/igc_defines.h |  5 +++
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 37 +++++++++++++++-----
 3 files changed, 41 insertions(+), 9 deletions(-)
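
Not part of the patch: a minimal userspace sketch of how the final
IGC_TQAVCTRL value is composed in igc_tsn_enable_offload(). The bit
definitions are copied from this series; tqavctrl_for_tsn() is a
hypothetical name, and the fragment size multiplier is taken as a
parameter (0 to 3, so that it fits the two-bit MIN_FRAG field) instead
of being derived through ethtool_frag_size_to_mult().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from igc_defines.h as of this series. */
#define IGC_TQAVCTRL_TRANSMIT_MODE_TSN	0x00000001u
#define IGC_TQAVCTRL_PREEMPT_ENA	0x00000002u
#define IGC_TQAVCTRL_ENHANCED_QAV	0x00000008u
#define IGC_TQAVCTRL_MIN_FRAG_MASK	0x0000C000u
#define IGC_TQAVCTRL_MIN_FRAG_SHIFT	14

/* Hypothetical helper: value to write to TQAVCTRL, given the value
 * currently in the register.
 */
static uint32_t tqavctrl_for_tsn(uint32_t current, bool preempt_active,
				 uint32_t frag_mult)
{
	uint32_t val;

	/* MIN_FRAG is a two-bit field. */
	assert(frag_mult <= 3);

	/* Drop stale preemption state before applying the new one. */
	val = current & ~(IGC_TQAVCTRL_MIN_FRAG_MASK |
			  IGC_TQAVCTRL_PREEMPT_ENA);

	val |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;

	if (preempt_active)
		val |= IGC_TQAVCTRL_PREEMPT_ENA;

	val |= frag_mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT;

	return val;
}
```

Clearing PREEMPT_ENA and MIN_FRAG first means that turning preemption
off through ethtool also clears the enable bit, even when other TSN
features keep the adapter in TSN mode.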

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index df2fc71825a6..11da66bd9c2c 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -300,9 +300,10 @@ extern char igc_driver_name[];
 #define IGC_FLAG_RX_LEGACY		BIT(16)
 #define IGC_FLAG_TSN_QBV_ENABLED	BIT(17)
 #define IGC_FLAG_TSN_QAV_ENABLED	BIT(18)
+#define IGC_FLAG_TSN_PREEMPT_ENABLED	BIT(19)
 
 #define IGC_FLAG_TSN_ANY_ENABLED \
-	(IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_QAV_ENABLED)
+	(IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_QAV_ENABLED | IGC_FLAG_TSN_PREEMPT_ENABLED)
 
 #define IGC_FLAG_RSS_FIELD_IPV4_UDP	BIT(6)
 #define IGC_FLAG_RSS_FIELD_IPV6_UDP	BIT(7)
@@ -351,6 +352,11 @@ extern char igc_driver_name[];
 #define IGC_I225_RX_LATENCY_1000	300
 #define IGC_I225_RX_LATENCY_2500	1485
 
+/* From the datasheet section 8.12.4 Tx Qav Control TQAVCTRL,
+ * MIN_FRAG initial value.
+ */
+#define IGC_I225_MIN_FRAG_SIZE_DEFAULT	68
+
 /* RX and TX descriptor control thresholds.
  * PTHRESH - MAC will consider prefetch if it has fewer than this number of
  *           descriptors available in its onboard memory.
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 62fff53254dd..68faca584e34 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -513,6 +513,9 @@
 /* Transmit Scheduling */
 #define IGC_TQAVCTRL_TRANSMIT_MODE_TSN	0x00000001
 #define IGC_TQAVCTRL_ENHANCED_QAV	0x00000008
+#define IGC_TQAVCTRL_PREEMPT_ENA	0x00000002
+#define IGC_TQAVCTRL_MIN_FRAG_MASK	0x0000C000
+#define IGC_TQAVCTRL_MIN_FRAG_SHIFT	14
 
 #define IGC_TXQCTL_QUEUE_MODE_LAUNCHT	0x00000001
 #define IGC_TXQCTL_STRICT_CYCLE		0x00000002
@@ -526,6 +529,8 @@
 
 #define IGC_MAX_SR_QUEUES		2
 
+#define IGC_TXQCTL_PREEMPTABLE		0x00000008
+
 /* Receive Checksum Control */
 #define IGC_RXCSUM_CRCOFL	0x00000800   /* CRC32 offload enable */
 #define IGC_RXCSUM_PCSD		0x00002000   /* packet checksum disabled */
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 40a730f8b3f3..6e285bc15a6b 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -45,6 +45,9 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
 	if (is_cbs_enabled(adapter))
 		new_flags |= IGC_FLAG_TSN_QAV_ENABLED;
 
+	if (adapter->frame_preemption_active)
+		new_flags |= IGC_FLAG_TSN_PREEMPT_ENABLED;
+
 	return new_flags;
 }
 
@@ -57,6 +60,8 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl, rxpbs;
 	int i;
 
+	adapter->add_frag_size = IGC_I225_MIN_FRAG_SIZE_DEFAULT;
+
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
@@ -67,7 +72,8 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 
 	tqavctrl = rd32(IGC_TQAVCTRL);
 	tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN |
-		      IGC_TQAVCTRL_ENHANCED_QAV);
+		      IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_PREEMPT_ENA |
+		      IGC_TQAVCTRL_MIN_FRAG_MASK);
 	wr32(IGC_TQAVCTRL, tqavctrl);
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
@@ -79,7 +85,7 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	wr32(IGC_QBVCYCLET_S, 0);
 	wr32(IGC_QBVCYCLET, NSEC_PER_SEC);
 
-	adapter->flags &= ~IGC_FLAG_TSN_QBV_ENABLED;
+	adapter->flags &= ~IGC_FLAG_TSN_ANY_ENABLED;
 
 	return 0;
 }
@@ -90,11 +96,9 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl, baset_l, baset_h;
 	u32 sec, nsec, cycle, rxpbs;
 	ktime_t base_time, systim;
+	u32 frag_size_mult;
 	int i;
 
-	cycle = adapter->cycle_time;
-	base_time = adapter->base_time;
-
 	wr32(IGC_TSAUXC, 0);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN);
 	wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
@@ -103,9 +107,8 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	rxpbs |= IGC_RXPBSIZE_TSN;
 	wr32(IGC_RXPBS, rxpbs);
 
-	tqavctrl = rd32(IGC_TQAVCTRL);
-	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
-	wr32(IGC_TQAVCTRL, tqavctrl);
+	cycle = adapter->cycle_time;
+	base_time = adapter->base_time;
 
 	wr32(IGC_QBVCYCLET_S, cycle);
 	wr32(IGC_QBVCYCLET, cycle);
@@ -216,6 +219,10 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 			wr32(IGC_TQAVHC(i), 0);
 		}
 skip_cbs:
+
+		if (adapter->frame_preemption_active && ring->preemptible)
+			txqctl |= IGC_TXQCTL_PREEMPTABLE;
+
 		wr32(IGC_TXQCTL(i), txqctl);
 	}
 
@@ -236,6 +243,20 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	wr32(IGC_BASET_H, baset_h);
 	wr32(IGC_BASET_L, baset_l);
 
+	tqavctrl = rd32(IGC_TQAVCTRL) &
+		~(IGC_TQAVCTRL_MIN_FRAG_MASK | IGC_TQAVCTRL_PREEMPT_ENA);
+
+	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
+
+	if (adapter->frame_preemption_active)
+		tqavctrl |= IGC_TQAVCTRL_PREEMPT_ENA;
+
+	frag_size_mult = ethtool_frag_size_to_mult(adapter->add_frag_size);
+
+	tqavctrl |= frag_size_mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT;
+
+	wr32(IGC_TQAVCTRL, tqavctrl);
+
 	return 0;
 }
 
-- 
2.35.3



* [Intel-wired-lan] [PATCH net-next v5 08/11] igc: Add support for setting frame preemption configuration
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: intel-wired-lan

Set the hardware register that enables the frame preemption feature.

Some code is moved around because the PREEMPT_ENA bit in the
IGC_TQAVCTRL register is recommended to be set after the individual
queue registers (IGC_TXQCTL[i]) are set.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  8 ++++-
 drivers/net/ethernet/intel/igc/igc_defines.h |  5 +++
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 37 +++++++++++++++-----
 3 files changed, 41 insertions(+), 9 deletions(-)
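
Note (not part of the patch): a minimal user-space sketch of how the
final IGC_TQAVCTRL value is composed at the end of
igc_tsn_enable_offload(). The register constants are copied from this
patch. frag_size_to_mult() is a hypothetical stand-in for
ethtool_frag_size_to_mult(), which is not defined in this patch; its
64-byte step mapping is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from igc_defines.h as extended by this patch. */
#define IGC_TQAVCTRL_TRANSMIT_MODE_TSN	0x00000001u
#define IGC_TQAVCTRL_PREEMPT_ENA	0x00000002u
#define IGC_TQAVCTRL_ENHANCED_QAV	0x00000008u
#define IGC_TQAVCTRL_MIN_FRAG_MASK	0x0000C000u
#define IGC_TQAVCTRL_MIN_FRAG_SHIFT	14

/* Hypothetical stand-in for ethtool_frag_size_to_mult(). Assumption:
 * the size is expressed in 64-byte steps and clamped to 0..3.
 */
static uint32_t frag_size_to_mult(uint32_t frag_size)
{
	uint32_t steps = frag_size / 64;

	if (steps == 0)
		return 0;

	return steps - 1 > 3 ? 3 : steps - 1;
}

/* Mirror of the read-modify-write sequence added by the patch: clear
 * the preemption fields, set the TSN mode bits, then set PREEMPT_ENA
 * and the MIN_FRAG multiplier.
 */
static uint32_t compose_tqavctrl(uint32_t cur, bool preempt,
				 uint32_t frag_size)
{
	uint32_t mult = frag_size_to_mult(frag_size);
	uint32_t val;

	/* The multiplier must fit in the two-bit MIN_FRAG field. */
	assert(((mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT) &
		~IGC_TQAVCTRL_MIN_FRAG_MASK) == 0);

	val = cur & ~(IGC_TQAVCTRL_MIN_FRAG_MASK | IGC_TQAVCTRL_PREEMPT_ENA);
	val |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;

	if (preempt)
		val |= IGC_TQAVCTRL_PREEMPT_ENA;

	val |= mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT;

	return val;
}
```

In the driver this value is written only after the per-queue
IGC_TXQCTL registers, which is the ordering the commit message
describes.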

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index df2fc71825a6..11da66bd9c2c 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -300,9 +300,10 @@ extern char igc_driver_name[];
 #define IGC_FLAG_RX_LEGACY		BIT(16)
 #define IGC_FLAG_TSN_QBV_ENABLED	BIT(17)
 #define IGC_FLAG_TSN_QAV_ENABLED	BIT(18)
+#define IGC_FLAG_TSN_PREEMPT_ENABLED	BIT(19)
 
 #define IGC_FLAG_TSN_ANY_ENABLED \
-	(IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_QAV_ENABLED)
+	(IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_QAV_ENABLED | IGC_FLAG_TSN_PREEMPT_ENABLED)
 
 #define IGC_FLAG_RSS_FIELD_IPV4_UDP	BIT(6)
 #define IGC_FLAG_RSS_FIELD_IPV6_UDP	BIT(7)
@@ -351,6 +352,11 @@ extern char igc_driver_name[];
 #define IGC_I225_RX_LATENCY_1000	300
 #define IGC_I225_RX_LATENCY_2500	1485
 
+/* From the datasheet section 8.12.4 Tx Qav Control TQAVCTRL,
+ * MIN_FRAG initial value.
+ */
+#define IGC_I225_MIN_FRAG_SIZE_DEFAULT	68
+
 /* RX and TX descriptor control thresholds.
  * PTHRESH - MAC will consider prefetch if it has fewer than this number of
  *           descriptors available in its onboard memory.
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 62fff53254dd..68faca584e34 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -513,6 +513,9 @@
 /* Transmit Scheduling */
 #define IGC_TQAVCTRL_TRANSMIT_MODE_TSN	0x00000001
 #define IGC_TQAVCTRL_ENHANCED_QAV	0x00000008
+#define IGC_TQAVCTRL_PREEMPT_ENA	0x00000002
+#define IGC_TQAVCTRL_MIN_FRAG_MASK	0x0000C000
+#define IGC_TQAVCTRL_MIN_FRAG_SHIFT	14
 
 #define IGC_TXQCTL_QUEUE_MODE_LAUNCHT	0x00000001
 #define IGC_TXQCTL_STRICT_CYCLE		0x00000002
@@ -526,6 +529,8 @@
 
 #define IGC_MAX_SR_QUEUES		2
 
+#define IGC_TXQCTL_PREEMPTABLE		0x00000008
+
 /* Receive Checksum Control */
 #define IGC_RXCSUM_CRCOFL	0x00000800   /* CRC32 offload enable */
 #define IGC_RXCSUM_PCSD		0x00002000   /* packet checksum disabled */
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 40a730f8b3f3..6e285bc15a6b 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -45,6 +45,9 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
 	if (is_cbs_enabled(adapter))
 		new_flags |= IGC_FLAG_TSN_QAV_ENABLED;
 
+	if (adapter->frame_preemption_active)
+		new_flags |= IGC_FLAG_TSN_PREEMPT_ENABLED;
+
 	return new_flags;
 }
 
@@ -57,6 +60,8 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl, rxpbs;
 	int i;
 
+	adapter->add_frag_size = IGC_I225_MIN_FRAG_SIZE_DEFAULT;
+
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
@@ -67,7 +72,8 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 
 	tqavctrl = rd32(IGC_TQAVCTRL);
 	tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN |
-		      IGC_TQAVCTRL_ENHANCED_QAV);
+		      IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_PREEMPT_ENA |
+		      IGC_TQAVCTRL_MIN_FRAG_MASK);
 	wr32(IGC_TQAVCTRL, tqavctrl);
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
@@ -79,7 +85,7 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	wr32(IGC_QBVCYCLET_S, 0);
 	wr32(IGC_QBVCYCLET, NSEC_PER_SEC);
 
-	adapter->flags &= ~IGC_FLAG_TSN_QBV_ENABLED;
+	adapter->flags &= ~IGC_FLAG_TSN_ANY_ENABLED;
 
 	return 0;
 }
@@ -90,11 +96,9 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl, baset_l, baset_h;
 	u32 sec, nsec, cycle, rxpbs;
 	ktime_t base_time, systim;
+	u32 frag_size_mult;
 	int i;
 
-	cycle = adapter->cycle_time;
-	base_time = adapter->base_time;
-
 	wr32(IGC_TSAUXC, 0);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN);
 	wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
@@ -103,9 +107,8 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	rxpbs |= IGC_RXPBSIZE_TSN;
 	wr32(IGC_RXPBS, rxpbs);
 
-	tqavctrl = rd32(IGC_TQAVCTRL);
-	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
-	wr32(IGC_TQAVCTRL, tqavctrl);
+	cycle = adapter->cycle_time;
+	base_time = adapter->base_time;
 
 	wr32(IGC_QBVCYCLET_S, cycle);
 	wr32(IGC_QBVCYCLET, cycle);
@@ -216,6 +219,10 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 			wr32(IGC_TQAVHC(i), 0);
 		}
 skip_cbs:
+
+		if (adapter->frame_preemption_active && ring->preemptible)
+			txqctl |= IGC_TXQCTL_PREEMPTABLE;
+
 		wr32(IGC_TXQCTL(i), txqctl);
 	}
 
@@ -236,6 +243,20 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	wr32(IGC_BASET_H, baset_h);
 	wr32(IGC_BASET_L, baset_l);
 
+	tqavctrl = rd32(IGC_TQAVCTRL) &
+		~(IGC_TQAVCTRL_MIN_FRAG_MASK | IGC_TQAVCTRL_PREEMPT_ENA);
+
+	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
+
+	if (adapter->frame_preemption_active)
+		tqavctrl |= IGC_TQAVCTRL_PREEMPT_ENA;
+
+	frag_size_mult = ethtool_frag_size_to_mult(adapter->add_frag_size);
+
+	tqavctrl |= frag_size_mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT;
+
+	wr32(IGC_TQAVCTRL, tqavctrl);
+
 	return 0;
 }
 
-- 
2.35.3



* [PATCH net-next v5 09/11] igc: Add support for Frame Preemption verification
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Add support for sending/receiving Frame Preemption verification
frames.

The i225 hardware doesn't implement the verification process
internally; this is left to the driver.

Add a simple implementation of the state machine defined in IEEE
802.3-2018, Section 99.4.7.

For now, the state machine is started manually by the user, when
enabling verification. Example:

$ ethtool --set-frame-preemption IFACE disable-verify off

The "verified" condition is set to true once an SMD-V frame has been
sent and an SMD-R frame has been received, so it only tracks the
transmission side. This seems to be what's expected from IEEE 802.3-2018.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  16 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  13 +
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  37 ++-
 drivers/net/ethernet/intel/igc/igc_main.c    | 243 +++++++++++++++++++
 4 files changed, 307 insertions(+), 2 deletions(-)
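
Note (not part of the patch): a simplified user-space model of the
transmit-side verification state machine, following the transitions
in igc_fp_verification_work(). The names fp_verify and
fp_verify_step() are hypothetical; timers, work scheduling and the
actual frame transmission are left to the caller, so this is only a
sketch of the state transitions, not of the driver code.

```c
#include <stdbool.h>

enum fp_state {
	FP_FAILED,
	FP_DONE,
	FP_START,
	FP_SENT,
};

/* Same retry limit as IGC_MAX_VERIFY_CNT in the patch. */
#define FP_MAX_VERIFY_CNT 3

struct fp_verify {
	enum fp_state state;
	unsigned int verify_cnt;
};

/* Advance the state machine by one step.
 *
 * received_smd_r: an SMD-R frame arrived since the last step.
 * timed_out:      the response timeout expired since SMD-V was sent.
 *
 * Returns true when the caller should transmit an SMD-V frame.
 */
static bool fp_verify_step(struct fp_verify *v, bool received_smd_r,
			   bool timed_out)
{
	switch (v->state) {
	case FP_START:
		v->state = FP_SENT;
		return true;

	case FP_SENT:
		if (received_smd_r) {
			/* Verification succeeded. */
			v->state = FP_DONE;
			break;
		}

		if (timed_out) {
			v->verify_cnt++;

			if (v->verify_cnt > FP_MAX_VERIFY_CNT) {
				v->verify_cnt = 0;
				v->state = FP_FAILED;
			} else {
				/* Retry: send another SMD-V. */
				v->state = FP_START;
			}
		}
		break;

	case FP_FAILED:
	case FP_DONE:
		break;
	}

	return false;
}
```

With these transitions, a link partner that never answers causes the
model to request four SMD-V transmissions before entering the failed
state, since the counter is compared with "greater than" against the
limit.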

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 11da66bd9c2c..be4a8362d6d7 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -131,6 +131,13 @@ struct igc_ring {
 	struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;
 
+enum frame_preemption_state {
+	FRAME_PREEMPTION_STATE_FAILED,
+	FRAME_PREEMPTION_STATE_DONE,
+	FRAME_PREEMPTION_STATE_START,
+	FRAME_PREEMPTION_STATE_SENT,
+};
+
 /* Board specific private data structure */
 struct igc_adapter {
 	struct net_device *netdev;
@@ -184,6 +191,7 @@ struct igc_adapter {
 	ktime_t base_time;
 	ktime_t cycle_time;
 	bool frame_preemption_active;
+	bool frame_preemption_requested;
 	u32 add_frag_size;
 
 	/* OS defined structs */
@@ -250,6 +258,14 @@ struct igc_adapter {
 		struct timespec64 start;
 		struct timespec64 period;
 	} perout[IGC_N_PEROUT];
+
+	struct delayed_work fp_verification_work;
+	unsigned long fp_start;
+	bool fp_received_smd_v;
+	bool fp_received_smd_r;
+	unsigned int fp_verify_cnt;
+	enum frame_preemption_state fp_tx_state;
+	bool fp_disable_verify;
 };
 
 void igc_up(struct igc_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 68faca584e34..63fc76a0b72a 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -307,6 +307,8 @@
 #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
 #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
 #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
+#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
+#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
 #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
 #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
 #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
@@ -366,9 +368,20 @@
 
 #define IGC_RXDEXT_STATERR_LB	0x00040000
 
+#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
+#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
+
 /* Advanced Receive Descriptor bit definitions */
 #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
 
+#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
+#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
+
+#define IGC_SMD_TYPE_SFD		0x0
+#define IGC_SMD_TYPE_SMD_V		0x1
+#define IGC_SMD_TYPE_SMD_R		0x2
+#define IGC_SMD_TYPE_COMPLETE		0x3
+
 #define IGC_RXDEXT_STATERR_L4E		0x20000000
 #define IGC_RXDEXT_STATERR_IPE		0x40000000
 #define IGC_RXDEXT_STATERR_RXE		0x80000000
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 401d2cdb3e81..9a80e2569dc3 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1680,6 +1680,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
 
 	fpcmd->enabled = adapter->frame_preemption_active;
 	fpcmd->add_frag_size = adapter->add_frag_size;
+	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
+	fpcmd->disable_verify = adapter->fp_disable_verify;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
@@ -1698,6 +1700,7 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
 				   struct netlink_ext_ack *extack)
 {
 	struct igc_adapter *adapter = netdev_priv(netdev);
+	bool verified = false, mask_changed = false;
 	u32 mask;
 	int i;
 
@@ -1706,17 +1709,47 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	adapter->frame_preemption_active = fpcmd->enabled;
+	adapter->frame_preemption_requested = fpcmd->enabled;
 	adapter->add_frag_size = fpcmd->add_frag_size;
 	mask = fpcmd->preemptible_mask;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
+		bool preemptible = mask & BIT(i);
+
+		if (ring->preemptible != preemptible)
+			mask_changed = true;
 
 		ring->preemptible = (mask & BIT(i));
 	}
 
-	return igc_tsn_offload_apply(adapter);
+	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+		schedule_delayed_work(&adapter->fp_verification_work,
+				      msecs_to_jiffies(10));
+	}
+
+	adapter->fp_disable_verify = fpcmd->disable_verify;
+
+	verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
+
+	/* If frame preemption was requested and verification is
+	 * enabled but has not completed yet, return here: the
+	 * verification work applies the configuration once it is done.
+	 */
+	if (!verified && !adapter->fp_disable_verify && adapter->frame_preemption_requested)
+		return 0;
+
+	if (adapter->frame_preemption_active != adapter->frame_preemption_requested ||
+	    adapter->add_frag_size != fpcmd->add_frag_size ||
+	    mask_changed) {
+		adapter->frame_preemption_active = adapter->frame_preemption_requested;
+		adapter->add_frag_size = fpcmd->add_frag_size;
+
+		return igc_tsn_offload_apply(adapter);
+	}
+
+	return 0;
 }
 
 static int igc_ethtool_begin(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 5dd7140bac82..69e96e9a3ec8 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -30,6 +30,11 @@
 #define IGC_XDP_TX		BIT(1)
 #define IGC_XDP_REDIRECT	BIT(2)
 
+#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
+#define IGC_MAX_VERIFY_CNT 3
+
+#define IGC_FP_SMD_FRAME_SIZE 60
+
 static int debug = -1;
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2190,6 +2195,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
 	return 0;
 }
 
+static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
+				 struct sk_buff *skb)
+{
+	dma_addr_t dma;
+	unsigned int size;
+
+	size = skb_headlen(skb);
+
+	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(ring->dev, dma)) {
+		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
+		return -ENOMEM;
+	}
+
+	buffer->skb = skb;
+	buffer->protocol = 0;
+	buffer->bytecount = skb->len;
+	buffer->gso_segs = 1;
+	buffer->time_stamp = jiffies;
+	dma_unmap_len_set(buffer, len, skb->len);
+	dma_unmap_addr_set(buffer, dma, dma);
+
+	return 0;
+}
+
+static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
+				     struct sk_buff *skb, int type)
+{
+	struct igc_tx_buffer *buffer;
+	union igc_adv_tx_desc *desc;
+	u32 cmd_type, olinfo_status;
+	int err;
+
+	if (!igc_desc_unused(ring))
+		return -EBUSY;
+
+	buffer = &ring->tx_buffer_info[ring->next_to_use];
+	err = igc_fp_init_smd_frame(ring, buffer, skb);
+	if (err)
+		return err;
+
+	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
+		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
+		   buffer->bytecount;
+	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
+
+	switch (type) {
+	case IGC_SMD_TYPE_SMD_V:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
+		break;
+	case IGC_SMD_TYPE_SMD_R:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	desc = IGC_TX_DESC(ring, ring->next_to_use);
+	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
+
+	netdev_tx_sent_queue(txring_txq(ring), skb->len);
+
+	buffer->next_to_watch = desc;
+
+	ring->next_to_use++;
+	if (ring->next_to_use == ring->count)
+		ring->next_to_use = 0;
+
+	return 0;
+}
+
 static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
 					    int cpu)
 {
@@ -2317,6 +2395,43 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
 	q_vector->rx.total_bytes += bytes;
 }
 
+static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
+{
+	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
+
+	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
+		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
+}
+
+static bool igc_check_smd_frame(void *pktbuf, unsigned int size)
+{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+	const u32 *b;
+#else
+	const u16 *b;
+#endif
+	int i;
+
+	if (size != 60)
+		return false;
+
+	/* The SMD frames (V and R) have the preamble, the SMD tag, 60
+	 * octets of zeroes and the mCRC. At this point the hardware
+	 * already discarded most of that, so we only need to check
+	 * the "contents" of the frame.
+	 */
+	b = pktbuf;
+	for (i = 16 / sizeof(*b); i < size / sizeof(*b); i++)
+		/* FIXME: i226 seems to insert some garbage
+		 * (timestamps?) in SMD frames, ignore the first 16
+		 * bytes (4 words). Investigate better.
+		 */
+		if (b[i] != 0)
+			return false;
+
+	return true;
+}
+
 static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 {
 	unsigned int total_bytes = 0, total_packets = 0;
@@ -2333,6 +2448,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		ktime_t timestamp = 0;
 		struct xdp_buff xdp;
 		int pkt_offset = 0;
+		int smd_type;
 		void *pktbuf;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -2364,6 +2480,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 			size -= IGC_TS_HDR_LEN;
 		}
 
+		smd_type = igc_rx_desc_smd_type(rx_desc);
+
+		if (unlikely(smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R)) {
+			if (igc_check_smd_frame(pktbuf, size)) {
+				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
+				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
+				schedule_delayed_work(&adapter->fp_verification_work, 0);
+			}
+
+			/* Advance the ring next-to-clean */
+			igc_is_non_eop(rx_ring, rx_desc);
+
+			cleaned_count++;
+			continue;
+		}
+
 		if (!skb) {
 			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
 			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
@@ -6003,6 +6135,116 @@ static int igc_tsn_enable_cbs(struct igc_adapter *adapter,
 	return igc_tsn_offload_apply(adapter);
 }
 
+/* I225 doesn't send the SMD frames automatically, we need to handle
+ * them ourselves.
+ */
+static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
+{
+	int cpu = smp_processor_id();
+	struct netdev_queue *nq;
+	struct igc_ring *ring;
+	struct sk_buff *skb;
+	void *data;
+	int err;
+
+	if (!netif_running(adapter->netdev))
+		return -ENOTCONN;
+
+	/* FIXME: rename this function to something less specific, as
+	 * it can be used outside XDP.
+	 */
+	ring = igc_xdp_get_tx_ring(adapter, cpu);
+	nq = txring_txq(ring);
+
+	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
+	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
+
+	__netif_tx_lock_bh(nq);
+
+	err = igc_fp_init_tx_descriptor(ring, skb, type);
+
+	igc_flush_tx_descriptors(ring);
+
+	__netif_tx_unlock_bh(nq);
+
+	return err;
+}
+
+static void igc_fp_verification_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct igc_adapter *adapter;
+	int err;
+
+	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
+
+	if (adapter->fp_disable_verify)
+		goto done;
+
+	switch (adapter->fp_tx_state) {
+	case FRAME_PREEMPTION_STATE_START:
+		adapter->fp_received_smd_r = false;
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
+
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
+		adapter->fp_start = jiffies;
+		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		break;
+
+	case FRAME_PREEMPTION_STATE_SENT:
+		if (adapter->fp_received_smd_r) {
+			/* Verification has finished successfully, we
+			 * can enable frame preemption in the hw now
+			 */
+			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
+			adapter->fp_received_smd_r = false;
+
+			if (adapter->frame_preemption_requested) {
+				adapter->frame_preemption_active = true;
+				igc_tsn_offload_apply(adapter);
+			}
+
+			break;
+		}
+
+		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
+			adapter->fp_verify_cnt++;
+			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
+
+			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
+				adapter->fp_verify_cnt = 0;
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
+				netdev_err(adapter->netdev,
+					   "Exceeded number of attempts for frame preemption verification\n");
+			} else {
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+			}
+			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		}
+
+		break;
+
+	case FRAME_PREEMPTION_STATE_FAILED:
+	case FRAME_PREEMPTION_STATE_DONE:
+		break;
+	}
+
+done:
+	if (adapter->fp_received_smd_v) {
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
+
+		adapter->fp_received_smd_v = false;
+	}
+}
+
 static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
@@ -6369,6 +6611,7 @@ static int igc_probe(struct pci_dev *pdev,
 
 	INIT_WORK(&adapter->reset_task, igc_reset_task);
 	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
+	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
 
 	/* Initialize link properties that are user-changeable */
 	adapter->fc_autoneg = true;
-- 
2.35.3



* [Intel-wired-lan] [PATCH net-next v5 09/11] igc: Add support for Frame Preemption verification
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: intel-wired-lan

Add support for sending/receiving Frame Preemption verification
frames.

The i225 hardware doesn't implement the verification process
internally; this is left to the driver.

Add a simple implementation of the state machine defined in IEEE
802.3-2018, Section 99.4.7.

For now, the state machine is started manually by the user, when
enabling verification. Example:

$ ethtool --set-frame-preemption IFACE disable-verify off

The "verified" condition is set to true once an SMD-V frame has been
sent and an SMD-R frame has been received, so it only tracks the
transmission side. This seems to be what's expected from IEEE 802.3-2018.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  16 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  13 +
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  37 ++-
 drivers/net/ethernet/intel/igc/igc_main.c    | 243 +++++++++++++++++++
 4 files changed, 307 insertions(+), 2 deletions(-)
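
Note (not part of the patch): a small user-space sketch of the
receive-side checks, i.e. extracting the SMD type from the
descriptor status word and checking that the frame payload is all
zeroes. The mask, shift and type values are copied from this patch;
smd_type_from_status() and smd_payload_is_zero() are hypothetical
names. Assumptions: the status word is already in CPU byte order, and
the check is done byte by byte rather than word by word as in
igc_check_smd_frame().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from igc_defines.h as extended by this patch. */
#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000u
#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13

#define IGC_SMD_TYPE_SFD		0x0
#define IGC_SMD_TYPE_SMD_V		0x1
#define IGC_SMD_TYPE_SMD_R		0x2
#define IGC_SMD_TYPE_COMPLETE		0x3

#define SMD_FRAME_SIZE			60

/* Extract the two-bit SMD type field from a status word. */
static unsigned int smd_type_from_status(uint32_t status)
{
	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK) >>
		IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
}

/* Return true if the frame has the expected size and every byte
 * from offset 'skip' onwards is zero. The patch skips the first 16
 * bytes as a workaround; 'skip' makes that explicit here.
 */
static bool smd_payload_is_zero(const uint8_t *buf, size_t size,
				size_t skip)
{
	size_t i;

	if (size != SMD_FRAME_SIZE || skip > size)
		return false;

	for (i = skip; i < size; i++)
		if (buf[i] != 0)
			return false;

	return true;
}
```

A byte-wise loop avoids the alignment question that the u32/u16
selection in the patch handles, at the cost of more iterations for a
60-byte frame.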

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 11da66bd9c2c..be4a8362d6d7 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -131,6 +131,13 @@ struct igc_ring {
 	struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;
 
+enum frame_preemption_state {
+	FRAME_PREEMPTION_STATE_FAILED,
+	FRAME_PREEMPTION_STATE_DONE,
+	FRAME_PREEMPTION_STATE_START,
+	FRAME_PREEMPTION_STATE_SENT,
+};
+
 /* Board specific private data structure */
 struct igc_adapter {
 	struct net_device *netdev;
@@ -184,6 +191,7 @@ struct igc_adapter {
 	ktime_t base_time;
 	ktime_t cycle_time;
 	bool frame_preemption_active;
+	bool frame_preemption_requested;
 	u32 add_frag_size;
 
 	/* OS defined structs */
@@ -250,6 +258,14 @@ struct igc_adapter {
 		struct timespec64 start;
 		struct timespec64 period;
 	} perout[IGC_N_PEROUT];
+
+	struct delayed_work fp_verification_work;
+	unsigned long fp_start;
+	bool fp_received_smd_v;
+	bool fp_received_smd_r;
+	unsigned int fp_verify_cnt;
+	enum frame_preemption_state fp_tx_state;
+	bool fp_disable_verify;
 };
 
 void igc_up(struct igc_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 68faca584e34..63fc76a0b72a 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -307,6 +307,8 @@
 #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
 #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
 #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
+#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
+#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
 #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
 #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
 #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
@@ -366,9 +368,20 @@
 
 #define IGC_RXDEXT_STATERR_LB	0x00040000
 
+#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
+#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
+
 /* Advanced Receive Descriptor bit definitions */
 #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
 
+#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
+#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
+
+#define IGC_SMD_TYPE_SFD		0x0
+#define IGC_SMD_TYPE_SMD_V		0x1
+#define IGC_SMD_TYPE_SMD_R		0x2
+#define IGC_SMD_TYPE_COMPLETE		0x3
+
 #define IGC_RXDEXT_STATERR_L4E		0x20000000
 #define IGC_RXDEXT_STATERR_IPE		0x40000000
 #define IGC_RXDEXT_STATERR_RXE		0x80000000
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 401d2cdb3e81..9a80e2569dc3 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1680,6 +1680,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
 
 	fpcmd->enabled = adapter->frame_preemption_active;
 	fpcmd->add_frag_size = adapter->add_frag_size;
+	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
+	fpcmd->disable_verify = adapter->fp_disable_verify;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
@@ -1698,6 +1700,7 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
 				   struct netlink_ext_ack *extack)
 {
 	struct igc_adapter *adapter = netdev_priv(netdev);
+	bool verified = false, mask_changed = false;
 	u32 mask;
 	int i;
 
@@ -1706,17 +1709,47 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	adapter->frame_preemption_active = fpcmd->enabled;
+	adapter->frame_preemption_requested = fpcmd->enabled;
 	adapter->add_frag_size = fpcmd->add_frag_size;
 	mask = fpcmd->preemptible_mask;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
+		bool preemptible = mask & BIT(i);
+
+		if (ring->preemptible != preemptible)
+			mask_changed = true;
 
 		ring->preemptible = (mask & BIT(i));
 	}
 
-	return igc_tsn_offload_apply(adapter);
+	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+		schedule_delayed_work(&adapter->fp_verification_work,
+				      msecs_to_jiffies(10));
+	}
+
+	adapter->fp_disable_verify = fpcmd->disable_verify;
+
+	verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
+
+	/* If frame preemption was requested and verification is
+	 * enabled but has not completed yet, return here: the
+	 * verification work applies the configuration once it is done.
+	 */
+	if (!verified && !adapter->fp_disable_verify && adapter->frame_preemption_requested)
+		return 0;
+
+	if (adapter->frame_preemption_active != adapter->frame_preemption_requested ||
+	    adapter->add_frag_size != fpcmd->add_frag_size ||
+	    mask_changed) {
+		adapter->frame_preemption_active = adapter->frame_preemption_requested;
+		adapter->add_frag_size = fpcmd->add_frag_size;
+
+		return igc_tsn_offload_apply(adapter);
+	}
+
+	return 0;
 }
 
 static int igc_ethtool_begin(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 5dd7140bac82..69e96e9a3ec8 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -30,6 +30,11 @@
 #define IGC_XDP_TX		BIT(1)
 #define IGC_XDP_REDIRECT	BIT(2)
 
+#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
+#define IGC_MAX_VERIFY_CNT 3
+
+#define IGC_FP_SMD_FRAME_SIZE 60
+
 static int debug = -1;
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2190,6 +2195,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
 	return 0;
 }
 
+static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
+				 struct sk_buff *skb)
+{
+	dma_addr_t dma;
+	unsigned int size;
+
+	size = skb_headlen(skb);
+
+	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(ring->dev, dma)) {
+		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
+		return -ENOMEM;
+	}
+
+	buffer->skb = skb;
+	buffer->protocol = 0;
+	buffer->bytecount = skb->len;
+	buffer->gso_segs = 1;
+	buffer->time_stamp = jiffies;
+	dma_unmap_len_set(buffer, len, skb->len);
+	dma_unmap_addr_set(buffer, dma, dma);
+
+	return 0;
+}
+
+static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
+				     struct sk_buff *skb, int type)
+{
+	struct igc_tx_buffer *buffer;
+	union igc_adv_tx_desc *desc;
+	u32 cmd_type, olinfo_status;
+	int err;
+
+	if (!igc_desc_unused(ring))
+		return -EBUSY;
+
+	buffer = &ring->tx_buffer_info[ring->next_to_use];
+	err = igc_fp_init_smd_frame(ring, buffer, skb);
+	if (err)
+		return err;
+
+	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
+		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
+		   buffer->bytecount;
+	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
+
+	switch (type) {
+	case IGC_SMD_TYPE_SMD_V:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
+		break;
+	case IGC_SMD_TYPE_SMD_R:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	desc = IGC_TX_DESC(ring, ring->next_to_use);
+	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
+
+	netdev_tx_sent_queue(txring_txq(ring), skb->len);
+
+	buffer->next_to_watch = desc;
+
+	ring->next_to_use++;
+	if (ring->next_to_use == ring->count)
+		ring->next_to_use = 0;
+
+	return 0;
+}
+
 static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
 					    int cpu)
 {
@@ -2317,6 +2395,43 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
 	q_vector->rx.total_bytes += bytes;
 }
 
+static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
+{
+	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
+
+	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
+		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
+}
+
+static bool igc_check_smd_frame(void *pktbuf, unsigned int size)
+{
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+	const u32 *b;
+#else
+	const u16 *b;
+#endif
+	int i;
+
+	if (size != 60)
+		return false;
+
+	/* The SMD frames (V and R) have the preamble, the SMD tag, 60
+	 * octets of zeroes and the mCRC. At this point the hardware
+	 * already discarded most of that, so we only need to check
+	 * the "contents" of the frame.
+	 */
+	b = pktbuf;
+	for (i = 16 / sizeof(*b); i < size / sizeof(*b); i++)
+		/* FIXME: i226 seems to insert some garbage
+		 * (timestamps?) in SMD frames, ignore the first 16
+		 * bytes (4 words). Investigate better.
+		 */
+		if (b[i] != 0)
+			return false;
+
+	return true;
+}
+
 static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 {
 	unsigned int total_bytes = 0, total_packets = 0;
@@ -2333,6 +2448,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		ktime_t timestamp = 0;
 		struct xdp_buff xdp;
 		int pkt_offset = 0;
+		int smd_type;
 		void *pktbuf;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -2364,6 +2480,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 			size -= IGC_TS_HDR_LEN;
 		}
 
+		smd_type = igc_rx_desc_smd_type(rx_desc);
+
+		if (unlikely(smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R)) {
+			if (igc_check_smd_frame(pktbuf, size)) {
+				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
+				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
+				schedule_delayed_work(&adapter->fp_verification_work, 0);
+			}
+
+			/* Advance the ring next-to-clean */
+			igc_is_non_eop(rx_ring, rx_desc);
+
+			cleaned_count++;
+			continue;
+		}
+
 		if (!skb) {
 			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
 			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
@@ -6003,6 +6135,116 @@ static int igc_tsn_enable_cbs(struct igc_adapter *adapter,
 	return igc_tsn_offload_apply(adapter);
 }
 
+/* I225 doesn't send the SMD frames automatically, we need to handle
+ * them ourselves.
+ */
+static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
+{
+	int cpu = smp_processor_id();
+	struct netdev_queue *nq;
+	struct igc_ring *ring;
+	struct sk_buff *skb;
+	void *data;
+	int err;
+
+	if (!netif_running(adapter->netdev))
+		return -ENOTCONN;
+
+	/* FIXME: rename this function to something less specific, as
+	 * it can be used outside XDP.
+	 */
+	ring = igc_xdp_get_tx_ring(adapter, cpu);
+	nq = txring_txq(ring);
+
+	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
+	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
+
+	__netif_tx_lock_bh(nq);
+
+	err = igc_fp_init_tx_descriptor(ring, skb, type);
+
+	igc_flush_tx_descriptors(ring);
+
+	__netif_tx_unlock_bh(nq);
+
+	return err;
+}
+
+static void igc_fp_verification_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct igc_adapter *adapter;
+	int err;
+
+	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
+
+	if (adapter->fp_disable_verify)
+		goto done;
+
+	switch (adapter->fp_tx_state) {
+	case FRAME_PREEMPTION_STATE_START:
+		adapter->fp_received_smd_r = false;
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
+
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
+		adapter->fp_start = jiffies;
+		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		break;
+
+	case FRAME_PREEMPTION_STATE_SENT:
+		if (adapter->fp_received_smd_r) {
+			/* Verification has finished successfully, we
+			 * can enable frame preemption in the hw now
+			 */
+			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
+			adapter->fp_received_smd_r = false;
+
+			if (adapter->frame_preemption_requested) {
+				adapter->frame_preemption_active = true;
+				igc_tsn_offload_apply(adapter);
+			}
+
+			break;
+		}
+
+		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
+			adapter->fp_verify_cnt++;
+			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
+
+			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
+				adapter->fp_verify_cnt = 0;
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
+				netdev_err(adapter->netdev,
+					   "Exceeded number of attempts for frame preemption verification\n");
+			} else {
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+			}
+			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		}
+
+		break;
+
+	case FRAME_PREEMPTION_STATE_FAILED:
+	case FRAME_PREEMPTION_STATE_DONE:
+		break;
+	}
+
+done:
+	if (adapter->fp_received_smd_v) {
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
+
+		adapter->fp_received_smd_v = false;
+	}
+}
+
 static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
@@ -6369,6 +6611,7 @@ static int igc_probe(struct pci_dev *pdev,
 
 	INIT_WORK(&adapter->reset_task, igc_reset_task);
 	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
+	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
 
 	/* Initialize link properties that are user-changeable */
 	adapter->fc_autoneg = true;
-- 
2.35.3
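
The verification handshake implemented by igc_fp_verification_work() above can be read as a small state machine. The following is a minimal userspace sketch of that logic, not the driver code: the names (fp_ctx, fp_step, the FP_* states) are hypothetical, frame transmission is replaced by counters, and MAX_VERIFY_CNT is an assumed value because the definition of IGC_MAX_VERIFY_CNT is not shown in this part of the thread. The fp_disable_verify shortcut is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the FRAME_PREEMPTION_STATE_* values. */
enum fp_state { FP_START, FP_SENT, FP_DONE, FP_FAILED };

/* Assumed retry limit; the real IGC_MAX_VERIFY_CNT is not shown here. */
#define MAX_VERIFY_CNT 3

struct fp_ctx {
	enum fp_state state;
	int verify_cnt;       /* timeouts seen so far */
	bool received_smd_r;  /* link partner answered our SMD-V */
	bool received_smd_v;  /* link partner asked us to verify */
	int smd_v_sent;       /* stands in for transmitting an SMD-V frame */
	int smd_r_sent;       /* stands in for transmitting an SMD-R frame */
};

/* One pass of the work item; timed_out tells whether the reply timer expired. */
static void fp_step(struct fp_ctx *c, bool timed_out)
{
	assert(c != NULL);

	switch (c->state) {
	case FP_START:
		/* Send a verify request and wait for the response. */
		c->received_smd_r = false;
		c->smd_v_sent++;
		c->state = FP_SENT;
		break;
	case FP_SENT:
		if (c->received_smd_r) {
			/* Verification succeeded. */
			c->received_smd_r = false;
			c->state = FP_DONE;
			break;
		}
		if (timed_out) {
			c->verify_cnt++;
			if (c->verify_cnt > MAX_VERIFY_CNT) {
				c->verify_cnt = 0;
				c->state = FP_FAILED;
			} else {
				c->state = FP_START; /* retry */
			}
		}
		break;
	case FP_DONE:
	case FP_FAILED:
		break;
	}

	/* Independently of our own state, answer the partner's SMD-V. */
	if (c->received_smd_v) {
		c->smd_r_sent++;
		c->received_smd_v = false;
	}
}
```

With the assumed limit of 3, a link partner that never answers leads to four SMD-V transmissions before the sketch settles in FP_FAILED, while a single SMD-R received in FP_SENT moves it to FP_DONE.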


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Frame Preemption and LaunchTime cannot be enabled on the same queue.
If that situation happens, emit an error to the user, and log the
error.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 69e96e9a3ec8..96ad00e33f4b 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5916,6 +5916,11 @@ static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
 	if (queue < 0 || queue >= adapter->num_tx_queues)
 		return -EINVAL;
 
+	if (ring->preemptible) {
+		netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
+		return -EINVAL;
+	}
+
 	ring = adapter->tx_ring[queue];
 	ring->launchtime_enable = enable;
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH net-next v5 11/11] igc: Add support for exposing frame preemption stats registers
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  1:15   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 60+ messages in thread
From: Vinicius Costa Gomes @ 2022-05-20  1:15 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, davem,
	vladimir.oltean, po.liu, boon.leong.ong, intel-wired-lan

Expose the Frame Preemption counters, so the number of
express/preemptible packets can be monitored by userspace.

These registers are cleared when read, so the value shown is the
number of events that happened since the last read.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  8 ++++++++
 drivers/net/ethernet/intel/igc/igc_regs.h    | 10 ++++++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 9a80e2569dc3..0a84fbdd494b 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -344,6 +344,14 @@ static void igc_ethtool_get_regs(struct net_device *netdev,
 
 	regs_buff[213] = adapter->stats.tlpic;
 	regs_buff[214] = adapter->stats.rlpic;
+	regs_buff[215] = rd32(IGC_PRMPTDTCNT);
+	regs_buff[216] = rd32(IGC_PRMEVNTTCNT);
+	regs_buff[217] = rd32(IGC_PRMPTDRCNT);
+	regs_buff[218] = rd32(IGC_PRMEVNTRCNT);
+	regs_buff[219] = rd32(IGC_PRMPBLTCNT);
+	regs_buff[220] = rd32(IGC_PRMPBLRCNT);
+	regs_buff[221] = rd32(IGC_PRMEXPTCNT);
+	regs_buff[222] = rd32(IGC_PRMEXPRCNT);
 }
 
 static void igc_ethtool_get_wol(struct net_device *netdev,
diff --git a/drivers/net/ethernet/intel/igc/igc_regs.h b/drivers/net/ethernet/intel/igc/igc_regs.h
index e197a33d93a0..2b5ef1e80f5f 100644
--- a/drivers/net/ethernet/intel/igc/igc_regs.h
+++ b/drivers/net/ethernet/intel/igc/igc_regs.h
@@ -224,6 +224,16 @@
 
 #define IGC_FTQF(_n)	(0x059E0 + (4 * (_n)))  /* 5-tuple Queue Fltr */
 
+/* Time sync registers - preemption statistics */
+#define IGC_PRMPTDTCNT	0x04280  /* Good TX Preempted Packets */
+#define IGC_PRMEVNTTCNT	0x04298  /* TX Preemption event counter */
+#define IGC_PRMPTDRCNT	0x04284  /* Good RX Preempted Packets */
+#define IGC_PRMEVNTRCNT	0x0429C  /* RX Preemption event counter */
+#define IGC_PRMPBLTCNT	0x04288  /* Good TX Preemptable Packets */
+#define IGC_PRMPBLRCNT	0x0428C  /* Good RX Preemptable Packets */
+#define IGC_PRMEXPTCNT	0x04290  /* Good TX Express Packets */
+#define IGC_PRMEXPRCNT	0x042A0  /* Preemption Exception Counter */
+
 /* Transmit Scheduling Registers */
 #define IGC_TQAVCTRL		0x3570
 #define IGC_TXQCTL(_n)		(0x3344 + 0x4 * (_n))
-- 
2.35.3
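
Because the counters are cleared on read, a consumer that wants running totals has to accumulate the values itself. A minimal sketch of that idea follows, assuming a simulated register: cor_reg, cor_read(), fp_stat and fp_stat_update() are hypothetical names for illustration and are not part of the igc driver or of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated clear-on-read hardware counter (hypothetical model). */
struct cor_reg {
	uint32_t pending; /* events counted since the last read */
};

/* Reading returns the pending count and resets it, like the registers above. */
static uint32_t cor_read(struct cor_reg *r)
{
	uint32_t v;

	assert(r != NULL);
	v = r->pending;
	r->pending = 0;
	return v;
}

/* Software-side running total, wide enough not to wrap quickly. */
struct fp_stat {
	uint64_t total;
};

/* Fold the delta since the last read into the total and return it. */
static uint64_t fp_stat_update(struct fp_stat *s, struct cor_reg *r)
{
	s->total += cor_read(r);
	return s->total;
}
```

Any second reader of the same register would steal part of the delta, so in this model all reads have to go through one accumulator.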


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  6:11     ` kernel test robot
  -1 siblings, 0 replies; 60+ messages in thread
From: kernel test robot @ 2022-05-20  6:11 UTC (permalink / raw)
  To: Vinicius Costa Gomes, netdev
  Cc: llvm, kbuild-all, Vinicius Costa Gomes, jhs, xiyou.wangcong,
	jiri, davem, vladimir.oltean, po.liu, boon.leong.ong,
	intel-wired-lan

Hi Vinicius,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vinicius-Costa-Gomes/ethtool-Add-support-for-frame-preemption/20220520-092800
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git df98714e432abf5cbdac3e4c1a13f94c65ddb8d3
config: s390-buildonly-randconfig-r002-20220519 (https://download.01.org/0day-ci/archive/20220520/202205201422.84XYwlpY-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project e00cbbec06c08dc616a0d52a20f678b8fbd4e304)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/a42e940bc53c40ee4e33a1bbf022a663bb28a9c7
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vinicius-Costa-Gomes/ethtool-Add-support-for-frame-preemption/20220520-092800
        git checkout a42e940bc53c40ee4e33a1bbf022a663bb28a9c7
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash drivers/net/ethernet/intel/igc/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from drivers/net/ethernet/intel/igc/igc_main.c:6:
   In file included from include/linux/if_vlan.h:10:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:40:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from drivers/net/ethernet/intel/igc/igc_main.c:6:
   In file included from include/linux/if_vlan.h:10:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:40:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from drivers/net/ethernet/intel/igc/igc_main.c:6:
   In file included from include/linux/if_vlan.h:10:
   In file included from include/linux/netdevice.h:38:
   In file included from include/net/net_namespace.h:40:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> drivers/net/ethernet/intel/igc/igc_main.c:5919:6: warning: variable 'ring' is uninitialized when used here [-Wuninitialized]
           if (ring->preemptible) {
               ^~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5914:23: note: initialize the variable 'ring' to silence this warning
           struct igc_ring *ring;
                                ^
                                 = NULL
   13 warnings generated.


vim +/ring +5919 drivers/net/ethernet/intel/igc/igc_main.c

  5910	
  5911	static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
  5912					      bool enable)
  5913	{
  5914		struct igc_ring *ring;
  5915	
  5916		if (queue < 0 || queue >= adapter->num_tx_queues)
  5917			return -EINVAL;
  5918	
> 5919		if (ring->preemptible) {
  5920			netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
  5921			return -EINVAL;
  5922		}
  5923	
  5924		ring = adapter->tx_ring[queue];
  5925		ring->launchtime_enable = enable;
  5926	
  5927		return 0;
  5928	}
  5929	
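
The warning comes from the preemptible check being placed before the assignment to 'ring'. A minimal sketch of the reordered function is shown below, using hypothetical stand-in structures (struct ring, struct adapter) that carry only the fields involved, and omitting the netdev_err() call; it illustrates the ordering only and is not the fix that was eventually posted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for igc_ring / igc_adapter. */
struct ring {
	bool preemptible;
	bool launchtime_enable;
};

struct adapter {
	int num_tx_queues;
	struct ring *tx_ring[4];
};

static int save_launchtime_params(struct adapter *adapter, int queue,
				  bool enable)
{
	struct ring *ring;

	assert(adapter != NULL);

	if (queue < 0 || queue >= adapter->num_tx_queues)
		return -EINVAL;

	/* Look the ring up first, so the check below reads a valid pointer. */
	ring = adapter->tx_ring[queue];

	/* As in the patch, a preemptible queue is rejected outright. */
	if (ring->preemptible)
		return -EINVAL;

	ring->launchtime_enable = enable;

	return 0;
}
```

Initializing 'ring' to NULL, as the compiler note suggests, would only silence the warning and turn the uninitialized read into a NULL dereference.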

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 01/11] ethtool: Add support for configuring frame preemption
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  9:06     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20  9:06 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

Hi Vinicius,

On Thu, May 19, 2022 at 06:15:28PM -0700, Vinicius Costa Gomes wrote:
> Frame preemption (described in IEEE 802.3-2018, Section 99 in
> particular) defines the concept of preemptible and express queues. It
> allows traffic from express queues to "interrupt" traffic from
> preemptible queues, which are "resumed" after the express traffic has
> finished transmitting.
> 
> Expose the UAPI bits for applications to enable using ethtool-netlink.
> Also expose the kernel ethtool functions, so device drivers can
> support it.
> 
> Frame preemption can only be used when both the local device and the
> link partner support it.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---

This looks good to me. Just one comment below.

> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct ethnl_req_info req_info = {};
> +	struct nlattr **tb = info->attrs;
> +	struct ethtool_fp preempt = {};
> +	struct net_device *dev;
> +	bool mod = false;
> +	int ret;
> +
> +	ret = ethnl_parse_header_dev_get(&req_info,
> +					 tb[ETHTOOL_A_PREEMPT_HEADER],
> +					 genl_info_net(info), info->extack,
> +					 true);
> +	if (ret < 0)
> +		return ret;
> +	dev = req_info.dev;
> +
> +	ret = -EOPNOTSUPP;
> +	if (!dev->ethtool_ops->get_preempt ||
> +	    !dev->ethtool_ops->set_preempt)
> +		goto out_dev;
> +
> +	rtnl_lock();
> +	ret = ethnl_ops_begin(dev);
> +	if (ret < 0)
> +		goto out_rtnl;
> +
> +	ret = dev->ethtool_ops->get_preempt(dev, &preempt);
> +	if (ret < 0) {
> +		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");
> +		goto out_ops;
> +	}
> +
> +	ret = ethnl_update_bitset32(&preempt.preemptible_mask, PREEMPT_QUEUES_COUNT,
> +				    tb[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK],
> +				    NULL, info->extack, &mod);
> +	if (ret < 0)
> +		goto out_ops;
> +
> +	ethnl_update_bool32(&preempt.enabled,
> +			    tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
> +	ethnl_update_u32(&preempt.add_frag_size,
> +			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
> +	ret = 0;
> +	if (!mod)
> +		goto out_ops;
> +
> +	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);
> +	if (ret < 0) {
> +		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");

If you pass the extack to ->set_preempt, would you consider not
overwriting it immediately afterwards on error?

> +		goto out_ops;
> +	}
> +
> +	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);
> +
> +out_ops:
> +	ethnl_ops_complete(dev);
> +out_rtnl:
> +	rtnl_unlock();
> +out_dev:
> +	dev_put(dev);
> +	return ret;
> +}
> -- 
> 2.35.3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 02/11] ethtool: Add support for Frame Preemption verification
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  9:16     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20  9:16 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:29PM -0700, Vinicius Costa Gomes wrote:
> Expose the ethtool parameters to the PREEMPT_SET/_GET commands
> necessary to support the verification procedure as defined by IEEE
> 802.3-2018.
> 
> These include the 'verified' bit to indicate that the verification
> dialog has concluded successfully with the link partner and frame
> preemption is supported. There's also the 'disable_verify' config to
> disable initiating the verification dialog.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  Documentation/networking/ethtool-netlink.rst |  3 +++
>  include/linux/ethtool.h                      |  3 +++
>  include/uapi/linux/ethtool_netlink.h         |  2 ++
>  net/ethtool/netlink.h                        |  2 +-
>  net/ethtool/preempt.c                        | 11 +++++++++++
>  5 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
> index 15d7c025cc4e..1731e7ad9ee7 100644
> --- a/Documentation/networking/ethtool-netlink.rst
> +++ b/Documentation/networking/ethtool-netlink.rst
> @@ -1646,6 +1646,8 @@ Kernel response contents:
>    ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
>    ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
>    ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
> +  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``    u32     disable verification
> +  ``ETHTOOL_A_PREEMPT_VERIFIED``          u32     verification procedure
>    ======================================  ======  ==========================
>  
>  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
> @@ -1667,6 +1669,7 @@ Request contents:
>    ``ETHTOOL_A_PREEMPT_ENABLED``           bool    frame preemption enabled
>    ``ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK``  bitset  preemptible queue mask
>    ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``     u32     Min additional frag size
> +  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``    bool    disable verification
>    ======================================  ======  ==========================
>  
>  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 42570ec8ee44..5600a7610fa1 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -13,6 +13,7 @@
>  #ifndef _LINUX_ETHTOOL_H
>  #define _LINUX_ETHTOOL_H
>  
> +#include "asm-generic/int-ll64.h"

Why this header, and why now?

>  #include <linux/bitmap.h>
>  #include <linux/compat.h>
>  #include <linux/netlink.h>
> @@ -464,6 +465,8 @@ struct ethtool_module_power_mode_params {
>  struct ethtool_fp {
>  	u32 enabled;
>  	u32 preemptible_mask;
> +	u32 disable_verify;
> +	u32 verified;
>  	u32 add_frag_size;
>  };
>  
> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
> index 651c7af76776..27c9bc5bfa51 100644
> --- a/include/uapi/linux/ethtool_netlink.h
> +++ b/include/uapi/linux/ethtool_netlink.h
> @@ -709,6 +709,8 @@ enum {
>  	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
>  	ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK,		/* bitset */
>  	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
> +	ETHTOOL_A_PREEMPT_DISABLE_VERIFY,		/* u8 */
> +	ETHTOOL_A_PREEMPT_VERIFIED,			/* u8 */

Clause 30.14.1.2 aMACMergeStatusVerify talks about a state machine with
more than 2 states (verified and not verified):

An ENUMERATED VALUE that has one of the following entries:
unknown: verification status is unknown
initial: the Verify State diagram (Figure 99-8) is in the state INIT_VERIFICATION
verifying: the Verify State diagram is in the state VERIFICATION_IDLE, SEND_VERIFY or WAIT_FOR_RESPONSE
succeeded: indicates that the Verify State diagram is in the state VERIFIED
failed: the Verify State diagram is in the state VERIFY_FAIL
disabled: verification of preemption operation is disabled

BEHAVIOUR DEFINED AS:
This attribute indicates (when accessed via a GET operation) the status of the MAC Merge
sublayer verification on the given device. The SET operation shall have no effect on a device.

Could we have an enum here with all the states?
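
As a rough sketch of what I mean (all identifiers below are made up for
illustration and are not part of this series), the six entries quoted
above could become an enum, with the driver translating its Figure 99-8
state into it:

```c
/* Status values following the aMACMergeStatusVerify entries quoted above.
 * Names are hypothetical.
 */
enum preempt_verify_status {
	PREEMPT_VERIFY_STATUS_UNKNOWN,
	PREEMPT_VERIFY_STATUS_INITIAL,
	PREEMPT_VERIFY_STATUS_VERIFYING,
	PREEMPT_VERIFY_STATUS_SUCCEEDED,
	PREEMPT_VERIFY_STATUS_FAILED,
	PREEMPT_VERIFY_STATUS_DISABLED,
};

/* States of the Verify State diagram (Figure 99-8), as named above. */
enum verify_state {
	VERIFY_STATE_INIT_VERIFICATION,
	VERIFY_STATE_VERIFICATION_IDLE,
	VERIFY_STATE_SEND_VERIFY,
	VERIFY_STATE_WAIT_FOR_RESPONSE,
	VERIFY_STATE_VERIFIED,
	VERIFY_STATE_VERIFY_FAIL,
};

/* Map a state machine state to the status reported on a GET. */
enum preempt_verify_status
verify_status_from_state(enum verify_state state, int verify_disabled)
{
	if (verify_disabled)
		return PREEMPT_VERIFY_STATUS_DISABLED;

	switch (state) {
	case VERIFY_STATE_INIT_VERIFICATION:
		return PREEMPT_VERIFY_STATUS_INITIAL;
	case VERIFY_STATE_VERIFICATION_IDLE:
	case VERIFY_STATE_SEND_VERIFY:
	case VERIFY_STATE_WAIT_FOR_RESPONSE:
		return PREEMPT_VERIFY_STATUS_VERIFYING;
	case VERIFY_STATE_VERIFIED:
		return PREEMPT_VERIFY_STATUS_SUCCEEDED;
	case VERIFY_STATE_VERIFY_FAIL:
		return PREEMPT_VERIFY_STATUS_FAILED;
	}
	return PREEMPT_VERIFY_STATUS_UNKNOWN;
}
```

A read-only attribute carrying such a value would also match the "SET
operation shall have no effect" wording.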

>  
>  	/* add new constants above here */
>  	__ETHTOOL_A_PREEMPT_CNT,
> diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
> index 444799f3e91a..dfdef5b8fe5b 100644
> --- a/net/ethtool/netlink.h
> +++ b/net/ethtool/netlink.h
> @@ -381,7 +381,7 @@ extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
>  extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
>  extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
>  extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
> -extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
> +extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_VERIFIED + 1];
>  extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
>  extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
>  extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
> diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
> index 0000ba8cb90c..7566ffb948b2 100644
> --- a/net/ethtool/preempt.c
> +++ b/net/ethtool/preempt.c
> @@ -63,6 +63,8 @@ static int preempt_reply_size(const struct ethnl_req_info *req_base,
>  
>  	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
>  	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
> +	len += nla_total_size(sizeof(u8)); /* _PREEMPT_DISABLE_VERIFY */
> +	len += nla_total_size(sizeof(u8)); /* _PREEMPT_VERIFIED */
>  
>  	return len;
>  }
> @@ -89,6 +91,12 @@ static int preempt_fill_reply(struct sk_buff *skb,
>  	if (ret < 0)
>  		return ret;
>  
> +	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_DISABLE_VERIFY, preempt->disable_verify))
> +		return -EMSGSIZE;
> +
> +	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_VERIFIED, preempt->verified))
> +		return -EMSGSIZE;
> +
>  	return 0;
>  }
>  
> @@ -110,6 +118,7 @@ ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
>  	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
>  	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
>  	[ETHTOOL_A_PREEMPT_PREEMPTIBLE_MASK]		= { .type = NLA_NESTED },
> +	[ETHTOOL_A_PREEMPT_DISABLE_VERIFY]		= NLA_POLICY_RANGE(NLA_U8, 0, 1),
>  };
>  
>  int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
> @@ -155,6 +164,8 @@ int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
>  			    tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
>  	ethnl_update_u32(&preempt.add_frag_size,
>  			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
> +	ethnl_update_bool32(&preempt.disable_verify,
> +			    tb[ETHTOOL_A_PREEMPT_DISABLE_VERIFY], &mod);
>  	ret = 0;
>  	if (!mod)
>  		goto out_ops;
> -- 
> 2.35.3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 06/11] igc: Add support for receiving errored frames
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  9:19     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20  9:19 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:33PM -0700, Vinicius Costa Gomes wrote:
> While developing features that require sending potencially ill formed
> frames, it is useful being able to receive them on the other side.
> 
> The driver already had all the pieces in place to support that, all
> that was missing was put the flag in the list of supported features.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---

Is this required to run the verification state machine in software, or
just for debugging?

>  drivers/net/ethernet/intel/igc/igc_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index bcbf35b32ef3..5dd7140bac82 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -6318,6 +6318,7 @@ static int igc_probe(struct pci_dev *pdev,
>  
>  	/* copy netdev features into list of user selectable features */
>  	netdev->hw_features |= NETIF_F_NTUPLE;
> +	netdev->hw_features |= NETIF_F_RXALL;
>  	netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX;
>  	netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX;
>  	netdev->hw_features |= netdev->features;
> -- 
> 2.35.3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 08/11] igc: Add support for setting frame preemption configuration
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  9:22     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20  9:22 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:35PM -0700, Vinicius Costa Gomes wrote:
> Set the hardware register that enables the frame preemption feature.
> 
> Some code is moved around because the PREEMPT_ENA bit in the
> IGC_TQAVCTRL register is recommended to be set after the individual
> queue registers (IGC_TXQCTL[i]) are set.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---

Could you please squash this patch with the previous one, which just
copies the settings from ethtool into the adapter but doesn't do
anything with them?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 05/11] igc: Optimze TX buffer sizes for TSN
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20  9:33     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20  9:33 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:32PM -0700, Vinicius Costa Gomes wrote:
> There are 64KB buffer space shared for TX and RX (including the BMC).
> We were only reserving 22KB for TX, increase each TX buffer (per
> queue) by 2KB, the total is now 30KB for TX.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---

Typo in title: optimize

>  drivers/net/ethernet/intel/igc/igc_defines.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
> index f609b2dbbc28..62fff53254dd 100644
> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
> @@ -395,7 +395,7 @@
>  #define I225_TXPBSIZE_DEFAULT	0x04000014 /* TXPBSIZE default */
>  #define IGC_RXPBS_CFG_TS_EN	0x80000000 /* Timestamp in Rx buffer */
>  
> -#define IGC_TXPBSIZE_TSN	0x04145145 /* 5k bytes buffer for each queue */
> +#define IGC_TXPBSIZE_TSN	0x041c71c7 /* 7KB buffer for each queue + 2KB for BMC */
>  #define IGC_RXPBSIZE_TSN	0x0000f08f /* 15KB for EXP + 15KB for BE + 2KB for BMC */
>  #define IGC_RXPBSIZE_SIZE_MASK	0x0001FFFF
>  
> -- 
> 2.35.3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 09/11] igc: Add support for Frame Preemption verification
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20 10:43     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20 10:43 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:36PM -0700, Vinicius Costa Gomes wrote:
> Add support for sending/receiving Frame Preemption verification
> frames.
> 
> The i225 hardware doesn't implement the process of verification
> internally, this is left to the driver.
> 
> Add a simple implementation of the state machine defined in IEEE
> 802.3-2018, Section 99.4.7.
> 
> For now, the state machine is started manually by the user, when
> enabling verification. Example:
> 
> $ ethtool --set-frame-preemption IFACE disable-verify off
> 
> The "verified" condition is set to true when the SMD-V frame is sent,
> and the SMD-R frame is received. So, it only tracks the transmission
> side. This seems to be what's expected from IEEE 802.3-2018.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc.h         |  16 ++
>  drivers/net/ethernet/intel/igc/igc_defines.h |  13 +
>  drivers/net/ethernet/intel/igc/igc_ethtool.c |  37 ++-
>  drivers/net/ethernet/intel/igc/igc_main.c    | 243 +++++++++++++++++++
>  4 files changed, 307 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 11da66bd9c2c..be4a8362d6d7 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -131,6 +131,13 @@ struct igc_ring {
>  	struct xsk_buff_pool *xsk_pool;
>  } ____cacheline_internodealigned_in_smp;
>  
> +enum frame_preemption_state {
> +	FRAME_PREEMPTION_STATE_FAILED,
> +	FRAME_PREEMPTION_STATE_DONE,
> +	FRAME_PREEMPTION_STATE_START,
> +	FRAME_PREEMPTION_STATE_SENT,
> +};
> +
>  /* Board specific private data structure */
>  struct igc_adapter {
>  	struct net_device *netdev;
> @@ -184,6 +191,7 @@ struct igc_adapter {
>  	ktime_t base_time;
>  	ktime_t cycle_time;
>  	bool frame_preemption_active;
> +	bool frame_preemption_requested;
>  	u32 add_frag_size;
>  
>  	/* OS defined structs */
> @@ -250,6 +258,14 @@ struct igc_adapter {
>  		struct timespec64 start;
>  		struct timespec64 period;
>  	} perout[IGC_N_PEROUT];
> +
> +	struct delayed_work fp_verification_work;
> +	unsigned long fp_start;
> +	bool fp_received_smd_v;
> +	bool fp_received_smd_r;
> +	unsigned int fp_verify_cnt;
> +	enum frame_preemption_state fp_tx_state;
> +	bool fp_disable_verify;
>  };
>  
>  void igc_up(struct igc_adapter *adapter);
> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
> index 68faca584e34..63fc76a0b72a 100644
> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
> @@ -307,6 +307,8 @@
>  #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
>  #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
>  #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
> +#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
> +#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
>  #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
>  #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
>  #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
> @@ -366,9 +368,20 @@
>  
>  #define IGC_RXDEXT_STATERR_LB	0x00040000
>  
> +#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
> +#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
> +
>  /* Advanced Receive Descriptor bit definitions */
>  #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
>  
> +#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
> +#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
> +
> +#define IGC_SMD_TYPE_SFD		0x0
> +#define IGC_SMD_TYPE_SMD_V		0x1
> +#define IGC_SMD_TYPE_SMD_R		0x2
> +#define IGC_SMD_TYPE_COMPLETE		0x3
> +
>  #define IGC_RXDEXT_STATERR_L4E		0x20000000
>  #define IGC_RXDEXT_STATERR_IPE		0x40000000
>  #define IGC_RXDEXT_STATERR_RXE		0x80000000
> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> index 401d2cdb3e81..9a80e2569dc3 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> @@ -1680,6 +1680,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
>  
>  	fpcmd->enabled = adapter->frame_preemption_active;
>  	fpcmd->add_frag_size = adapter->add_frag_size;
> +	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
> +	fpcmd->disable_verify = adapter->fp_disable_verify;
>  
>  	for (i = 0; i < adapter->num_tx_queues; i++) {
>  		struct igc_ring *ring = adapter->tx_ring[i];
> @@ -1698,6 +1700,7 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
>  				   struct netlink_ext_ack *extack)
>  {
>  	struct igc_adapter *adapter = netdev_priv(netdev);
> +	bool verified = false, mask_changed = false;

"verified" is assigned unconditionally below, no need to initialize it to false.

>  	u32 mask;
>  	int i;
>  
> @@ -1706,17 +1709,47 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
>  		return -EINVAL;
>  	}
>  
> -	adapter->frame_preemption_active = fpcmd->enabled;
> +	adapter->frame_preemption_requested = fpcmd->enabled;
>  	adapter->add_frag_size = fpcmd->add_frag_size;
>  	mask = fpcmd->preemptible_mask;
>  
>  	for (i = 0; i < adapter->num_tx_queues; i++) {
>  		struct igc_ring *ring = adapter->tx_ring[i];
> +		bool preemptible = mask & BIT(i);
> +
> +		if (ring->preemptible != preemptible)
> +			mask_changed = true;
>  
>  		ring->preemptible = (mask & BIT(i));
>  	}
>  
> -	return igc_tsn_offload_apply(adapter);
> +	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +		schedule_delayed_work(&adapter->fp_verification_work,
> +				      msecs_to_jiffies(10));
> +	}
> +
> +	adapter->fp_disable_verify = fpcmd->disable_verify;

This races with the first check in the fp_verification_work, so it may
see an old fp_disable_verify value.

> +
> +	verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
> +
> +	/* If the verification was not done, we want to enable frame
> +	 * preemption and we have not finished it, wait for it to
> +	 * finish.
> +	 */
> +	if (!verified && !adapter->fp_disable_verify && adapter->frame_preemption_requested)
> +		return 0;

This is a bit hard to follow, sorry if I am misunderstanding something.
But in principle, you exit early if preemption is enabled (requested),
verification is enabled, and verification isn't complete.

So you proceed on the negated condition, i.e. preemption is disabled, or
verification is disabled, or verification is complete. Is the last
condition what you want? You race with the schedule_delayed_work()
above, and verification may become complete, in which case you go ahead
to the next check. Intuitively, this code block right here should only
deal with the case where we don't have verification enabled, but the
checks allow other conditions to pass.
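
To make the negated condition concrete, here is a small user-space sketch
(not driver code; the fp_* helper names are made up for illustration) that
enumerates all eight combinations of the three flags. Only one combination
returns early; the other seven fall through, including "verification
enabled and already complete".

```c
#include <stdbool.h>
#include <stdio.h>

/* Mirrors the early-return test quoted above: true means
 * igc_ethtool_set_preempt() would "return 0" and wait.
 */
static bool fp_returns_early(bool verified, bool disable_verify, bool requested)
{
	return !verified && !disable_verify && requested;
}

/* De Morgan's law applied to the test above: the code proceeds
 * as soon as any one of these holds.
 */
static bool fp_falls_through(bool verified, bool disable_verify, bool requested)
{
	return verified || disable_verify || !requested;
}

/* Prints every combination and returns how many of them fall through. */
static int fp_print_truth_table(void)
{
	int passed = 0;

	for (int v = 0; v < 2; v++)
		for (int d = 0; d < 2; d++)
			for (int r = 0; r < 2; r++) {
				bool early = fp_returns_early(v, d, r);

				printf("verified=%d disable_verify=%d requested=%d -> %s\n",
				       v, d, r, early ? "return 0" : "fall through");
				passed += !early;
			}

	return passed;
}
```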

So the next check here, right below:

	if (adapter->frame_preemption_active != adapter->frame_preemption_requested ||

races with the verify state machine doing this:

1			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
2			adapter->fp_received_smd_r = false;
3
4			if (adapter->frame_preemption_requested) {
5				adapter->frame_preemption_active = true;
6				igc_tsn_offload_apply(adapter);
7			}

Because "verified == true" makes us run further, this only means that line 1
above has already executed in the state machine. But it doesn't mean
that lines 2...5 have executed. If the work item running the state machine
is preempted between lines 1 and 5, then both igc_ethtool_set_preempt()
and igc_fp_verification_work() will end up calling igc_tsn_offload_apply().
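
The interleaving I have in mind can be replayed deterministically in user
space. This is only a simplified model under my own assumptions: the struct
and function names are hypothetical, a counter stands in for
igc_tsn_offload_apply(), and the add_frag_size and mask_changed terms are
left out.

```c
#include <stdbool.h>

enum fp_state { FP_FAILED, FP_DONE, FP_START, FP_SENT };

struct fp_model {
	enum fp_state tx_state;
	bool requested;
	bool active;
	bool disable_verify;
	int apply_calls;	/* stands in for igc_tsn_offload_apply() */
};

/* Line 1 of the worker snippet: publish DONE. */
static void worker_line1(struct fp_model *m)
{
	m->tx_state = FP_DONE;
}

/* Lines 4..6 of the worker snippet. */
static void worker_lines4_6(struct fp_model *m)
{
	if (m->requested) {
		m->active = true;
		m->apply_calls++;
	}
}

/* Simplified tail of igc_ethtool_set_preempt(). */
static void ethtool_tail(struct fp_model *m)
{
	bool verified = m->tx_state == FP_DONE;

	if (!verified && !m->disable_verify && m->requested)
		return;

	if (m->active != m->requested) {
		m->active = m->requested;
		m->apply_calls++;
	}
}

/* Replays the problematic ordering and returns the number of applies. */
static int fp_race_demo(void)
{
	struct fp_model m = { .tx_state = FP_SENT, .requested = true };

	worker_line1(&m);	/* worker is preempted right after line 1 */
	ethtool_tail(&m);	/* ethtool path now sees verified == true */
	worker_lines4_6(&m);	/* worker resumes and applies again */

	return m.apply_calls;
}
```

In this ordering the configuration is applied twice, once from each context.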

Have you considered just introducing a DISABLED state in your verify
state machine, and handling that case in the delayed work as well, to
reduce the potential for races?
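
Roughly what I mean, as a user-space sketch rather than a patch: the state
and event names below are hypothetical, and the idea is that the ethtool
path would only post an event while a single step function, run from the
delayed work, owns every transition.

```c
#include <stdbool.h>

enum fp_state { FP_DISABLED, FP_START, FP_SENT, FP_DONE, FP_FAILED };
enum fp_event { FP_EV_ENABLE, FP_EV_DISABLE, FP_EV_TICK, FP_EV_SMD_R, FP_EV_TIMEOUT };

#define FP_MAX_VERIFY_CNT 3

struct fp_sm {
	enum fp_state state;
	unsigned int verify_cnt;
	unsigned int smd_v_sent;	/* counts SMD-V transmissions in this model */
};

/* Single place where the verify state changes. */
static enum fp_state fp_sm_step(struct fp_sm *sm, enum fp_event ev)
{
	/* Disabling verification is accepted from any state. */
	if (ev == FP_EV_DISABLE) {
		sm->state = FP_DISABLED;
		sm->verify_cnt = 0;
		return sm->state;
	}

	switch (sm->state) {
	case FP_DISABLED:
		if (ev == FP_EV_ENABLE)
			sm->state = FP_START;
		break;
	case FP_START:
		if (ev == FP_EV_TICK) {
			sm->smd_v_sent++;	/* the driver would send SMD-V here */
			sm->state = FP_SENT;
		}
		break;
	case FP_SENT:
		if (ev == FP_EV_SMD_R) {
			sm->state = FP_DONE;
		} else if (ev == FP_EV_TIMEOUT) {
			if (++sm->verify_cnt > FP_MAX_VERIFY_CNT) {
				sm->verify_cnt = 0;
				sm->state = FP_FAILED;
			} else {
				sm->state = FP_START;
			}
		}
		break;
	case FP_DONE:
	case FP_FAILED:
		break;
	}

	return sm->state;
}
```

With this shape, a stray SMD-R while in the DISABLED state is simply ignored,
and the "is verification disabled" check no longer lives outside the state
machine.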

> +
> +	if (adapter->frame_preemption_active != adapter->frame_preemption_requested ||
> +	    adapter->add_frag_size != fpcmd->add_frag_size ||

To save some line space, could you perhaps rename "frame_preemption_" to "fp_"?

> +	    mask_changed) {
> +		adapter->frame_preemption_active = adapter->frame_preemption_requested;
> +		adapter->add_frag_size = fpcmd->add_frag_size;
> +
> +		return igc_tsn_offload_apply(adapter);
> +	}
> +
> +	return 0;
>  }
>  
>  static int igc_ethtool_begin(struct net_device *netdev)
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 5dd7140bac82..69e96e9a3ec8 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -30,6 +30,11 @@
>  #define IGC_XDP_TX		BIT(1)
>  #define IGC_XDP_REDIRECT	BIT(2)
>  
> +#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
> +#define IGC_MAX_VERIFY_CNT 3
> +
> +#define IGC_FP_SMD_FRAME_SIZE 60
> +
>  static int debug = -1;
>  
>  MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
> @@ -2190,6 +2195,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
>  	return 0;
>  }
>  
> +static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
> +				 struct sk_buff *skb)
> +{
> +	dma_addr_t dma;
> +	unsigned int size;

Variable ordering longest to shortest please. Also, "size" could be initialized inline.

> +
> +	size = skb_headlen(skb);

I think alloc_skb() doesn't create nonlinear skbs, only alloc_skb_with_frags(),
so this could be skb->len.

> +
> +	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
> +	if (dma_mapping_error(ring->dev, dma)) {
> +		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
> +		return -ENOMEM;
> +	}
> +
> +	buffer->skb = skb;
> +	buffer->protocol = 0;
> +	buffer->bytecount = skb->len;
> +	buffer->gso_segs = 1;
> +	buffer->time_stamp = jiffies;
> +	dma_unmap_len_set(buffer, len, skb->len);

And then use "size" here and in buffer->bytecount.
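
Put together, the shape I am suggesting looks roughly like the sketch below.
It is a user-space model, not kernel code: skb_model and tx_buffer_model are
made-up stand-ins that keep only the length fields discussed here, and the
headlen helper just repeats the len minus data_len arithmetic that
skb_headlen() performs.

```c
/* Minimal stand-in for the two sk_buff length fields that matter here. */
struct skb_model {
	unsigned int len;	/* total bytes: linear part plus frags */
	unsigned int data_len;	/* bytes held in frags; 0 for a linear skb */
};

/* Stand-in for the igc_tx_buffer fields touched by the helper. */
struct tx_buffer_model {
	unsigned int bytecount;
	unsigned int unmap_len;
};

/* Same arithmetic as skb_headlen(): length of the linear part. */
static unsigned int skb_model_headlen(const struct skb_model *skb)
{
	return skb->len - skb->data_len;
}

static void fp_fill_tx_buffer(struct tx_buffer_model *buffer,
			      const struct skb_model *skb)
{
	/* Initialized inline; equals the head length for a linear skb. */
	unsigned int size = skb->len;

	buffer->bytecount = size;
	buffer->unmap_len = size;
}
```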

> +	dma_unmap_addr_set(buffer, dma, dma);
> +
> +	return 0;
> +}
> +
> +static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
> +				     struct sk_buff *skb, int type)
> +{
> +	struct igc_tx_buffer *buffer;
> +	union igc_adv_tx_desc *desc;
> +	u32 cmd_type, olinfo_status;
> +	int err;
> +
> +	if (!igc_desc_unused(ring))
> +		return -EBUSY;
> +
> +	buffer = &ring->tx_buffer_info[ring->next_to_use];
> +	err = igc_fp_init_smd_frame(ring, buffer, skb);
> +	if (err)
> +		return err;
> +
> +	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
> +		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
> +		   buffer->bytecount;
> +	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
> +
> +	switch (type) {
> +	case IGC_SMD_TYPE_SMD_V:
> +		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
> +		break;
> +	case IGC_SMD_TYPE_SMD_R:
> +		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	desc = IGC_TX_DESC(ring, ring->next_to_use);
> +	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> +	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> +	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
> +
> +	netdev_tx_sent_queue(txring_txq(ring), skb->len);
> +
> +	buffer->next_to_watch = desc;
> +
> +	ring->next_to_use++;
> +	if (ring->next_to_use == ring->count)
> +		ring->next_to_use = 0;
> +
> +	return 0;
> +}
> +
>  static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
>  					    int cpu)
>  {
> @@ -2317,6 +2395,43 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
>  	q_vector->rx.total_bytes += bytes;
>  }
>  
> +static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
> +{
> +	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
> +
> +	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
> +		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
> +}
> +
> +static bool igc_check_smd_frame(void *pktbuf, unsigned int size)
> +{
> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> +	const u32 *b;
> +#else
> +	const u16 *b;
> +#endif
> +	int i;
> +
> +	if (size != 60)
> +		return false;
> +
> +	/* The SMD frames (V and R) have the preamble, the SMD tag, 60
> +	 * octects of zeroes and the mCRC. At this point the hardware

Typo: octets

> +	 * already discarded most of that, so we only need to check
> +	 * the "contents" of the frame.
> +	 */
> +	b = pktbuf;
> +	for (i = 16 / sizeof(*b); i < size / sizeof(*b); i++)
> +		/* FIXME: i226 seems to insert some garbage
> +		 * (timestamps?) in SMD frames, ignore the first 16
> +		 * bytes (4 words). Investigate better.
> +		 */
> +		if (b[i] != 0)
> +			return false;
> +
> +	return true;
> +}

I admit I'm not really following the clean_rx procedure. But do you call
igc_put_rx_buffer() for SMD frames, i.e. are you DMA unmapping the
buffer before you look at it? It seems like you have the "smd" check too
early. If you enable CONFIG_DMA_API_DEBUG, does it say anything?

> +
>  static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  {
>  	unsigned int total_bytes = 0, total_packets = 0;
> @@ -2333,6 +2448,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  		ktime_t timestamp = 0;
>  		struct xdp_buff xdp;
>  		int pkt_offset = 0;
> +		int smd_type;
>  		void *pktbuf;
>  
>  		/* return some buffers to hardware, one at a time is too slow */
> @@ -2364,6 +2480,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  			size -= IGC_TS_HDR_LEN;
>  		}
>  
> +		smd_type = igc_rx_desc_smd_type(rx_desc);
> +
> +		if (unlikely(smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R)) {
> +			if (igc_check_smd_frame(pktbuf, size)) {
> +				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
> +				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
> +				schedule_delayed_work(&adapter->fp_verification_work, 0);
> +			}
> +
> +			/* Advance the ring next-to-clean */
> +			igc_is_non_eop(rx_ring, rx_desc);
> +
> +			cleaned_count++;
> +			continue;
> +		}
> +
>  		if (!skb) {
>  			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>  			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
> @@ -6003,6 +6135,116 @@ static int igc_tsn_enable_cbs(struct igc_adapter *adapter,
>  	return igc_tsn_offload_apply(adapter);
>  }
>  
> +/* I225 doesn't send the SMD frames automatically, we need to handle
> + * them ourselves.
> + */
> +static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
> +{
> +	int cpu = smp_processor_id();
> +	struct netdev_queue *nq;
> +	struct igc_ring *ring;
> +	struct sk_buff *skb;
> +	void *data;
> +	int err;
> +
> +	if (!netif_running(adapter->netdev))
> +		return -ENOTCONN;
> +
> +	/* FIXME: rename this function to something less specific, as
> +	 * it can be used outside XDP.
> +	 */
> +	ring = igc_xdp_get_tx_ring(adapter, cpu);
> +	nq = txring_txq(ring);
> +
> +	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return -ENOMEM;
> +
> +	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
> +	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
> +
> +	__netif_tx_lock_bh(nq);
> +
> +	err = igc_fp_init_tx_descriptor(ring, skb, type);
> +
> +	igc_flush_tx_descriptors(ring);
> +
> +	__netif_tx_unlock_bh(nq);
> +
> +	return err;
> +}
> +
> +static void igc_fp_verification_work(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct igc_adapter *adapter;
> +	int err;
> +
> +	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
> +
> +	if (adapter->fp_disable_verify)
> +		goto done;
> +
> +	switch (adapter->fp_tx_state) {
> +	case FRAME_PREEMPTION_STATE_START:
> +		adapter->fp_received_smd_r = false;
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
> +
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
> +		adapter->fp_start = jiffies;
> +		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_SENT:
> +		if (adapter->fp_received_smd_r) {
> +			/* Verifcation has finished successfully, we

Typo: verification

> +			 * can enable frame preemption in the hw now
> +			 */
> +			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
> +			adapter->fp_received_smd_r = false;
> +
> +			if (adapter->frame_preemption_requested) {
> +				adapter->frame_preemption_active = true;

Maybe WRITE_ONCE(adapter->fp_active, true) here, and READ_ONCE
everywhere else, to annotate lockless accesses?
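
For illustration, a user-space approximation of what the annotations look
like. The FP_READ_ONCE/FP_WRITE_ONCE macros below are simplified stand-ins
built on volatile casts, not the kernel's definitions, and the struct and
helper names are hypothetical.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define FP_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))
#define FP_READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))

struct fp_flags {
	bool fp_requested;
	bool fp_active;
};

/* Worker side: publish the result of a successful verification. */
static void fp_mark_active(struct fp_flags *f)
{
	if (FP_READ_ONCE(f->fp_requested))
		FP_WRITE_ONCE(f->fp_active, true);
}

/* Reader side, e.g. something like the ethtool "get" path. */
static bool fp_is_active(const struct fp_flags *f)
{
	return FP_READ_ONCE(f->fp_active);
}
```

Note that these annotations only document the lockless accesses and keep the
compiler from tearing or merging them; they do not provide mutual exclusion,
so the double-apply window discussed earlier still needs its own fix.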

> +				igc_tsn_offload_apply(adapter);
> +			}
> +
> +			break;
> +		}
> +
> +		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
> +			adapter->fp_verify_cnt++;
> +			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
> +
> +			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
> +				adapter->fp_verify_cnt = 0;
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
> +				netdev_err(adapter->netdev,
> +					   "Exceeded number of attempts for frame preemption verification\n");
> +			} else {
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +			}
> +			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		}
> +
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_FAILED:
> +	case FRAME_PREEMPTION_STATE_DONE:
> +		break;
> +	}
> +
> +done:
> +	if (adapter->fp_received_smd_v) {
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
> +
> +		adapter->fp_received_smd_v = false;
> +	}
> +}
> +
>  static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			void *type_data)
>  {
> @@ -6369,6 +6611,7 @@ static int igc_probe(struct pci_dev *pdev,
>  
>  	INIT_WORK(&adapter->reset_task, igc_reset_task);
>  	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
> +	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
>  
>  	/* Initialize link properties that are user-changeable */
>  	adapter->fc_autoneg = true;
> -- 
> 2.35.3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20 11:06     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20 11:06 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:37PM -0700, Vinicius Costa Gomes wrote:
> Frame Preemption and LaunchTime cannot be enabled on the same queue.
> If that situation happens, emit an error to the user, and log the
> error.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 69e96e9a3ec8..96ad00e33f4b 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -5916,6 +5916,11 @@ static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
>  	if (queue < 0 || queue >= adapter->num_tx_queues)
>  		return -EINVAL;
>  
> +	if (adapter->tx_ring[queue]->preemptible) {
> +		netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
> +		return -EINVAL;
> +	}
> +
>  	ring = adapter->tx_ring[queue];
>  	ring->launchtime_enable = enable;
>  
> -- 
> 2.35.3
>

I'm kind of concerned about this. I was thinking of adapting some
scripts I had into some functional kselftests for frame preemption.
I am sending 2 streams, one preemptable and one express. With SO_TXTIME
I am controlling their scheduled TX times, and I am forcing collisions
in the MAC merge layer by making the express packet have a scheduled TX
time equal to the preemptable packet's scheduled TX time, then I gradually
increase the express packet's scheduled time by small amounts (8 ns or so).
I take hardware TX timestamps of both packets and I plot when the express
packet is actually sent by the MAC. That is, I measure how long it takes
for the MAC to preempt and to reschedule the express packet.
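
A minimal sketch of the sweep described above, for illustration only. The
helper names (express_txtime_ns, attach_txtime) and the step size passed
in are hypothetical; SCM_TXTIME and the CMSG_* macros are the only real
interfaces used. Socket setup (enabling SO_TXTIME, choosing the clock,
setting the priority of each stream) and the hardware timestamp readout
are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SCM_TXTIME
#define SCM_TXTIME 61	/* asm-generic value; fallback for older libc headers */
#endif

/* Scheduled TX time of the express packet at sweep iteration 'step':
 * it starts equal to the preemptible packet's scheduled time and moves
 * later by 'step_ns' on every iteration.
 */
static uint64_t express_txtime_ns(uint64_t preemptible_txtime_ns,
				  unsigned int step, uint64_t step_ns)
{
	return preemptible_txtime_ns + (uint64_t)step * step_ns;
}

/* Attach an SCM_TXTIME control message carrying 'txtime_ns' to 'msg'.
 * 'buf' must be suitably aligned and at least
 * CMSG_SPACE(sizeof(uint64_t)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int attach_txtime(struct msghdr *msg, void *buf, size_t buflen,
			 uint64_t txtime_ns)
{
	struct cmsghdr *cm;

	assert(msg && buf);
	if (buflen < CMSG_SPACE(sizeof(txtime_ns)))
		return -1;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));

	return 0;
}
```

The msghdr prepared this way would then be passed to sendmsg() on a
socket that has SO_TXTIME enabled, once per stream and per iteration.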

My point is, if the LaunchTime feature cannot be enabled on preemptable
queues, how can I know that the igc does something functionally valid with
preemptable packets on TX, other than to reassemble the mPackets on RX?

Otherwise, if there isn't any other disagreement on the UAPI, would you
mind posting the iproute2 patch as well, so we could work in parallel
(me on enetc + the selftest) until net-next reopens? I'd like to write a
selftest that covers your hardware as well, but then again, not sure how
to cover it.

Do you have any sort of counters from the list in clause 30.14
Management for MAC Merge Sublayer? I see that structured ethtool counters
are becoming more popular, see struct ethtool_eth_mac_stats for example.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 11/11] igc: Add support for exposing frame preemption stats registers
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20 12:13     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-20 12:13 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Po Liu, boon.leong.ong,
	intel-wired-lan

On Thu, May 19, 2022 at 06:15:38PM -0700, Vinicius Costa Gomes wrote:
> Expose the Frame Preemption counters, so the number of
> express/preemptible packets can be monitored by userspace.
> 
> These registers are cleared when read, so the value shown is the
> number of events that happened since the last read.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_ethtool.c |  8 ++++++++
>  drivers/net/ethernet/intel/igc/igc_regs.h    | 10 ++++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> index 9a80e2569dc3..0a84fbdd494b 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> @@ -344,6 +344,14 @@ static void igc_ethtool_get_regs(struct net_device *netdev,
>  
>  	regs_buff[213] = adapter->stats.tlpic;
>  	regs_buff[214] = adapter->stats.rlpic;
> +	regs_buff[215] = rd32(IGC_PRMPTDTCNT);
> +	regs_buff[216] = rd32(IGC_PRMEVNTTCNT);
> +	regs_buff[217] = rd32(IGC_PRMPTDRCNT);
> +	regs_buff[218] = rd32(IGC_PRMEVNTRCNT);
> +	regs_buff[219] = rd32(IGC_PRMPBLTCNT);
> +	regs_buff[220] = rd32(IGC_PRMPBLRCNT);
> +	regs_buff[221] = rd32(IGC_PRMEXPTCNT);
> +	regs_buff[222] = rd32(IGC_PRMEXPRCNT);
>  }
>  
>  static void igc_ethtool_get_wol(struct net_device *netdev,
> diff --git a/drivers/net/ethernet/intel/igc/igc_regs.h b/drivers/net/ethernet/intel/igc/igc_regs.h
> index e197a33d93a0..2b5ef1e80f5f 100644
> --- a/drivers/net/ethernet/intel/igc/igc_regs.h
> +++ b/drivers/net/ethernet/intel/igc/igc_regs.h
> @@ -224,6 +224,16 @@
>  
>  #define IGC_FTQF(_n)	(0x059E0 + (4 * (_n)))  /* 5-tuple Queue Fltr */
>  
> +/* Time sync registers - preemption statistics */
> +#define IGC_PRMPTDTCNT	0x04280  /* Good TX Preempted Packets */
> +#define IGC_PRMEVNTTCNT	0x04298  /* TX Preemption event counter */
> +#define IGC_PRMPTDRCNT	0x04284  /* Good RX Preempted Packets */
> +#define IGC_PRMEVNTRCNT	0x0429C  /* RX Preemption event counter */
> +#define IGC_PRMPBLTCNT	0x04288  /* Good TX Preemptable Packets */
> +#define IGC_PRMPBLRCNT	0x0428C  /* Good RX Preemptable Packets */
> +#define IGC_PRMEXPTCNT	0x04290  /* Good TX Express Packets */
> +#define IGC_PRMEXPRCNT	0x042A0  /* Preemption Exception Counter */
> +

Ah, I didn't notice this. FWIW, the standard talks about the following,
at the MAC merge layer:

aMACMergeFrameAssErrorCount
  A count of MAC frames with reassembly errors. The counter is
  incremented by one every time the ASSEMBLY_ERROR state in the Receive
  Processing State Diagram is entered (see Figure 99–6)

aMACMergeFrameSmdErrorCount
  A count of received MAC frames / MAC frame fragments rejected due to
  unknown SMD value or arriving with an SMD-C when no frame is in
  progress. The counter is incremented by one every time the BAD_FRAG
  state in the Receive Processing State Diagram is entered and every
  time the WAIT_FOR_DV_FALSE state is entered due to the invocation of
  the SMD_DECODE function returning the value “ERR” (see Figure 99–6)

aMACMergeFrameAssOkCount
  A count of MAC frames that were successfully reassembled and delivered
  to MAC. The counter is incremented by one every time the
  FRAME_COMPLETE state in the Receive Processing state diagram (see
  Figure 99–6) is entered if the state CHECK_FOR_RESUME was previously
  entered while processing the packet.

aMACMergeFragCountRx
  A count of the number of additional mPackets received due to preemption.
  The counter is incremented by one every time the state CHECK_FRAG_CNT
  in the Receive Processing State Diagram (see Figure 99–6) is entered.

aMACMergeFragCountTx
  A count of the number of additional mPackets transmitted due to preemption.
  This counter is incremented by one every time the SEND_SMD_C state in
  the Transmit Processing State Diagram (see Figure 99–5) is entered.

aMACMergeHoldCount
  A count of the number of times the variable hold (see 99.4.7.3)
  transitions from FALSE to TRUE.

I think we have the following correspondence:
"TX Preemption event counter" -> aMACMergeFragCountTx
"RX Preemption event counter" -> aMACMergeFragCountRx
"Preemption Exception Counter" -> aMACMergeFrameAssErrorCount + aMACMergeFrameSmdErrorCount?

Then we have the following uncovered counters:

Good TX Preempted Packets
Good RX Preempted Packets
Good TX Preemptable Packets
Good RX Preemptable Packets
Good TX Express Packets

These are at the level of individual MACs (pMAC, eMAC) rather than the MAC merge layer.
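
Since the commit message says the registers are cleared when read, any
consumer would have to accumulate them in software. A rough sketch of
that, grouped according to the correspondence above: the struct, its
field names and the fake_rd32() stand-in for rd32() are hypothetical, and
only the register offsets come from the patch. The mapping of the
exception counter is the unconfirmed guess from above.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as defined in the patch above. */
#define IGC_PRMPTDTCNT	0x04280
#define IGC_PRMPTDRCNT	0x04284
#define IGC_PRMPBLTCNT	0x04288
#define IGC_PRMPBLRCNT	0x0428C
#define IGC_PRMEXPTCNT	0x04290
#define IGC_PRMEVNTTCNT	0x04298
#define IGC_PRMEVNTRCNT	0x0429C
#define IGC_PRMEXPRCNT	0x042A0

/* Simulated clear-on-read register file, standing in for the hardware. */
struct fake_regs {
	uint32_t val[(IGC_PRMEXPRCNT - IGC_PRMPTDTCNT) / 4 + 1];
};

/* Return the current register value and reset it to zero. */
static uint32_t fake_rd32(struct fake_regs *hw, uint32_t reg)
{
	uint32_t *slot, v;

	assert(hw && reg >= IGC_PRMPTDTCNT && reg <= IGC_PRMEXPRCNT &&
	       reg % 4 == 0);
	slot = &hw->val[(reg - IGC_PRMPTDTCNT) / 4];
	v = *slot;
	*slot = 0;
	return v;
}

/* Hypothetical 64-bit software counters. */
struct fp_counters_sketch {
	/* MAC merge layer */
	uint64_t mm_frag_tx;		/* TX Preemption event counter */
	uint64_t mm_frag_rx;		/* RX Preemption event counter */
	uint64_t mm_rx_exception;	/* AssError + SmdError? unconfirmed */
	/* per-MAC (pMAC/eMAC) */
	uint64_t tx_preempted;		/* Good TX Preempted Packets */
	uint64_t rx_preempted;		/* Good RX Preempted Packets */
	uint64_t tx_preemptable;	/* Good TX Preemptable Packets */
	uint64_t rx_preemptable;	/* Good RX Preemptable Packets */
	uint64_t tx_express;		/* Good TX Express Packets */
};

/* Each read returns the events since the previous read, so add it up. */
static void fp_counters_update(struct fp_counters_sketch *c,
			       struct fake_regs *hw)
{
	c->mm_frag_tx += fake_rd32(hw, IGC_PRMEVNTTCNT);
	c->mm_frag_rx += fake_rd32(hw, IGC_PRMEVNTRCNT);
	c->mm_rx_exception += fake_rd32(hw, IGC_PRMEXPRCNT);
	c->tx_preempted += fake_rd32(hw, IGC_PRMPTDTCNT);
	c->rx_preempted += fake_rd32(hw, IGC_PRMPTDRCNT);
	c->tx_preemptable += fake_rd32(hw, IGC_PRMPBLTCNT);
	c->rx_preemptable += fake_rd32(hw, IGC_PRMPBLRCNT);
	c->tx_express += fake_rd32(hw, IGC_PRMEXPTCNT);
}
```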


FWIW, ENETC has the following counters for FP:

Port MAC Merge Frame Assembly Error Count
Port MAC Merge Frame SMD Error Count
Port MAC Merge Frame Assembly OK
Port MAC Merge Fragment Count RX
Port MAC Merge Fragment Count TX
Port MAC Merge Hold Count

Then it has a series of RMON counters replicated twice, once for the
Port MAC 0 (eMAC) and once for Port MAC 1 (pMAC).

Similarly, the Felix switch has:

c_rx_assembly_err
c_rx_smd_err
c_rx_assembly_ok
c_rx_merge_frag

plus RMON counters replicated for the regular MAC and for the pMAC
(c_rx_pmac_oct vs c_rx_oct, c_rx_pmac_uc vs c_rx_uc, c_rx_pmac_sz_65_127
vs c_rx_sz_65_127, etc etc).

I think there's a tendency here. Maybe we could have structured data for
MAC merge layer counters, pMAC counters and eMAC counters? We already
have eMAC counters in the form of ethtool_eth_mac_stats, ethtool_eth_ctrl_stats,
ethtool_pause_stats, etc etc. We just need to figure out a way of
retrieving the same thing for the preemptable MAC.
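
To make the idea concrete, here is one possible shape, purely as a
sketch: the same structured counters are requested with a selector that
names the MAC they come from. All names below (mac_stats_src_sketch,
mac_stats_sketch, port_sketch, port_get_mac_stats) are made up for
illustration and are not an existing ethtool interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector for the MAC the counters are read from. */
enum mac_stats_src_sketch {
	MAC_STATS_SRC_AGGREGATE,	/* eMAC and pMAC summed */
	MAC_STATS_SRC_EMAC,
	MAC_STATS_SRC_PMAC,
};

/* Stand-in for one set of structured per-MAC counters. */
struct mac_stats_sketch {
	uint64_t frames_tx_ok;
	uint64_t frames_rx_ok;
};

/* A port with frame preemption keeps one set per MAC. */
struct port_sketch {
	struct mac_stats_sketch emac;
	struct mac_stats_sketch pmac;
};

/* Fill 'out' with the counters of the requested MAC.
 * Returns 0 on success, -1 for an unknown source.
 */
static int port_get_mac_stats(const struct port_sketch *port,
			      enum mac_stats_src_sketch src,
			      struct mac_stats_sketch *out)
{
	assert(port && out);

	switch (src) {
	case MAC_STATS_SRC_EMAC:
		*out = port->emac;
		return 0;
	case MAC_STATS_SRC_PMAC:
		*out = port->pmac;
		return 0;
	case MAC_STATS_SRC_AGGREGATE:
		out->frames_tx_ok = port->emac.frames_tx_ok +
				    port->pmac.frames_tx_ok;
		out->frames_rx_ok = port->emac.frames_rx_ok +
				    port->pmac.frames_rx_ok;
		return 0;
	}

	return -1;
}
```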

Jakub, any ideas?

>  /* Transmit Scheduling Registers */
>  #define IGC_TQAVCTRL		0x3570
>  #define IGC_TXQCTL(_n)		(0x3344 + 0x4 * (_n))
> -- 
> 2.35.3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-20 22:34   ` Jakub Kicinski
  -1 siblings, 0 replies; 60+ messages in thread
From: Jakub Kicinski @ 2022-05-20 22:34 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, vladimir.oltean,
	po.liu, boon.leong.ong, intel-wired-lan

On Thu, 19 May 2022 18:15:27 -0700 Vinicius Costa Gomes wrote:
> Changes from v4:
>  - Went back to exposing the per-queue frame preemption bits via
>    ethtool-netlink only, via taprio/mqprio was seen as too much
>    trouble. (Vladimir Oltean)
>  - Fixed documentation and code/patch organization changes (Vladimir
>    Oltean).

First of all - could you please, please, please rev these patches more
than once a year? It's really hard to keep track of the context when
previous version was sent in Jun 2021 :/

I disagree that queue mask belongs in ethtool. It's an attribute of 
a queue and should be attached to a queue. The DCBNL parallel is flawed
IMO because pause generation is Rx, not Tx. There is no Rx queue in
Linux, much less per-prio.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-20 22:34   ` [Intel-wired-lan] " Jakub Kicinski
@ 2022-05-21 15:03     ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-21 15:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri, davem,
	Po Liu, boon.leong.ong, intel-wired-lan

Hi Jakub,

On Fri, May 20, 2022 at 03:34:13PM -0700, Jakub Kicinski wrote:
> On Thu, 19 May 2022 18:15:27 -0700 Vinicius Costa Gomes wrote:
> > Changes from v4:
> >  - Went back to exposing the per-queue frame preemption bits via
> >    ethtool-netlink only, via taprio/mqprio was seen as too much
> >    trouble. (Vladimir Oltean)
> >  - Fixed documentation and code/patch organization changes (Vladimir
> >    Oltean).
> 
> First of all - could you please, please, please rev these patches more
> than once a year? It's really hard to keep track of the context when
> previous version was sent in Jun 2021 :/

It would have been nice if Vinicius had posted these more than
once a year. But let's not throw stones at each other, I'm sure everyone
is doing their best ;) These are new specs, their usefulness in the Real
World is still being evaluated, and you shouldn't underestimate the
difficulty of exposing standard Linux interfaces for pre-standard or
first-generation hardware. Even the mistakes I may be making in my
interpretation of the spec below are in good faith (i.e. if I'm wrong
it's because I'm stupid, not because I'm interested in leaning the
implementation towards an interface that's more convenient for the
hardware I'm working with).



> I disagree that queue mask belongs in ethtool. It's an attribute of 
> a queue and should be attached to a queue.

Sure, you have very strong reasons to disagree with that statement, if
only the premise were true. But you have to understand that IEEE 802.1Q
does not talk about preemptible queues, but about preemptible priorities.
Here:

| 6.7.1 Support of the ISS by IEEE Std 802.3 (Ethernet)
| 
| For priority values that are identified in the frame preemption status table
| (6.7.2) as preemptible, frames that are selected for transmission shall be
| transmitted using the pMAC service instance, and for priority values that are
| identified in the frame preemption status table as express, frames that are
| selected for transmission shall be transmitted using the eMAC service instance.
| In all other respects, the Port behaves as if it is supported by a single MAC
| service interface. In particular, all frames received by the Port are treated
| as if they were received on a single MAC service interface regardless of
| whether they were received on the eMAC service interface or the pMAC service
| interface, except with respect to frame preemption.
| 
| 6.7.2 Frame preemption
| If the Port supports frame preemption, then a value of frame preemption status
| is assigned to each value of priority via a frame preemption status table. The
| possible values of frame preemption status are express or preemptible.
| The frame preemption status table can be changed by management as described in
| 12.30.1.1. The default value of frame preemption status is express for all
| priority values.

For context, I probably need to point out the distinction that the spec
makes between a priority and a traffic class.

A priority is a number assigned to a packet based on the VLAN PCP using
the rules in clause 6.9.3 Priority Code Point encoding. In Linux, it is
more or less equivalent to skb->priority.

A traffic class, on the other hand, is defined as basically synonymous
with a TX priority queue, as follows:

| 3.268 traffic class: A classification used to expedite transmission of frames
| generated by critical or time-sensitive services. Traffic classes are numbered
| from zero through N-1, where N is the number of outbound queues associated with
| a given Bridge Port, and 1 <= N <= 8, and each traffic class has a one-to-one
| correspondence with a specific outbound queue for that Port. Traffic class 0
| corresponds to nonexpedited traffic; nonzero traffic classes correspond to
| expedited classes of traffic. A fixed mapping determines, for a given priority
| associated with a frame and a given number of traffic classes, what traffic
| class will be assigned to the frame.

A priority is translated into a traffic class using Table 8-5:
Recommended priority to traffic class mappings, which in Linux would be
handled using the tc-mqprio "map".

Note, however, that a priority TX queue is not the same as a netdev TX
queue, but rather the same as a tc-mqprio traffic class. That is, when
you specify "queues count@offset" to mqprio, from the Linux perspective
there are "count" queues, while from the 802.1Q perspective there is
only the "offset" queue (or TC). This is because we may have per-CPU
queues, etc.
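
As a small illustration of that distinction, assume a hypothetical
mqprio configuration with 3 traffic classes and "queues 2@0 2@2 4@4".
Linux then has 8 netdev TX queues, while 802.1Q only sees 3 queues. The
struct and function names below are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* One mqprio "count@offset" pair: the netdev TX queues backing one TC. */
struct tc_txq_range {
	uint16_t count;
	uint16_t offset;
};

/* Return the traffic class (the 802.1Q "queue") that netdev TX queue
 * 'txq' belongs to, or -1 if no traffic class covers it.
 */
static int txq_to_tc(const struct tc_txq_range *ranges, unsigned int num_tc,
		     unsigned int txq)
{
	unsigned int tc;

	assert(ranges);
	for (tc = 0; tc < num_tc; tc++) {
		unsigned int first = ranges[tc].offset;
		unsigned int end = first + ranges[tc].count;

		if (txq >= first && txq < end)
			return (int)tc;
	}

	return -1;
}
```

With the configuration above, netdev TX queues 4 through 7 all resolve to
traffic class 2, i.e. to a single queue in the 802.1Q sense.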

This is even spelled out in this note:

| NOTE 3 - A queue in this context is not necessarily a single FIFO data structure.
| A queue is a record of all frames of a given traffic class awaiting
| transmission on a given Bridge Port. The structure of this record is not
| specified. The transmission selection algorithm (8.6.8) determines which
| traffic class, among those classes with frames available for transmission,
| provides the next frame for transmission. The method of determining which frame
| within a traffic class is the next available frame is not specified beyond
| conforming to the frame ordering requirements of this subclause. This allows a
| variety of queue structures such as a single FIFO, or a set of FIFOs with one
| for each pairing of ingress and egress ports (i.e., Virtual Output Queuing), or
| a set of FIFOs with one for each VLAN or priority, or hierarchical structures.

I'm not sure how much of this was already clear and how much wasn't.
I apologize if I'm not bringing new info to the table. I just want to
point out what a "queue" is, and what a "priority" is.



> The DCBNL parallel is flawed IMO because pause generation is Rx, not
> Tx. There is no Rx queue in Linux, much less per-prio.

First of all: we both know that PFC is not only about RX, right? :) Here:

| 8.6.8 Transmission selection
| In a port of a Bridge or station that supports PFC, a frame of priority
| n is not available for transmission if that priority is paused (i.e., if
| Priority_Paused[n] is TRUE (see 36.1.3.2)) on that port.
| 
| NOTE 1 - Two or more priorities can be combined in a single queue. In
| this case if one or more of the priorities in the queue are paused, it
| is possible for frames in that queue not belonging to the paused
| priority to not be scheduled for transmission.
| 
| NOTE 2 - Mixing PFC and non-PFC priorities in the same queue results in
| non-PFC traffic being paused causing congestion spreading, and therefore
| is not recommended.

And that's kind of my whole point: PFC is per _priority_, not per
"queue"/"traffic class". And so is frame preemption (right below, same
clause). So the parallel isn't flawed at all. The dcbnl-pfc isn't in tc
for a reason, and that isn't because we don't have RX netdev queues...
And the reason why dcbnl-pfc isn't in tc is the same reason why ethtool
frame preemption shouldn't be, either.

| In a port of a Bridge or station that supports frame preemption, a frame
| of priority n is not available for transmission if that priority is
| identified in the frame preemption status table (6.7.2) as preemptible
| and either the holdRequest object (12.30.1.5) is set to the value hold,
| or the transmission of a prior preemptible frame has yet to complete
| because it has been interrupted to allow the transmission of an express
| frame.

So since the managed objects for frame preemption are stipulated by IEEE
per priority:

| The framePreemptionStatusTable (6.7.2) consists of 8
| framePreemptionAdminStatus values (12.30.1.1.1), one per priority.

I think it is only reasonable for Linux to expose the same thing, and
let drivers do the priority to queue or traffic class remapping as they
see fit, when tc-mqprio or tc-taprio or other qdiscs that change this
mapping are installed (if their preemption hardware implementation is
per TC or queue rather than per priority). After all, you can have 2
priorities mapped to the same TC, but still have one express and one
preemptible. That is to say, you can implement preemption even in single
"queue" devices, and it even makes sense.
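
A sketch of what such a remapping could look like in a driver whose
hardware is configured per traffic class: the per-priority table is the
input, and a per-TC mask is derived through the prio-to-TC map. The names
are hypothetical, and rejecting a TC that mixes express and preemptible
priorities is only one possible policy, assumed here because per-TC
hardware cannot represent that case.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO 8

/* Per-priority frame preemption status; express is the default (0). */
enum fp_status {
	FP_EXPRESS = 0,
	FP_PREEMPTIBLE,
};

/* Derive a bitmask of preemptible traffic classes from the per-priority
 * status table and the priority to traffic class map.
 * Returns 0 on success, -1 if a priority maps outside 'num_tc' or if a
 * traffic class carries both express and preemptible priorities.
 */
static int fp_status_to_tc_mask(const enum fp_status status[NUM_PRIO],
				const uint8_t prio_tc_map[NUM_PRIO],
				unsigned int num_tc, uint8_t *preemptible_tcs)
{
	uint8_t seen_express = 0, seen_preemptible = 0;
	unsigned int prio;

	assert(status && prio_tc_map && preemptible_tcs);
	assert(num_tc >= 1 && num_tc <= 8);

	for (prio = 0; prio < NUM_PRIO; prio++) {
		uint8_t tc = prio_tc_map[prio];

		if (tc >= num_tc)
			return -1;

		if (status[prio] == FP_PREEMPTIBLE)
			seen_preemptible |= (uint8_t)(1u << tc);
		else
			seen_express |= (uint8_t)(1u << tc);
	}

	/* Mixed TC: not expressible with a per-TC setting. */
	if (seen_express & seen_preemptible)
		return -1;

	*preemptible_tcs = seen_preemptible;
	return 0;
}
```

Hardware that is configured per priority would not need this step at all
and could accept the mixed case directly.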

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Intel-wired-lan] [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
@ 2022-05-21 15:03     ` Vladimir Oltean
  0 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-21 15:03 UTC (permalink / raw)
  To: intel-wired-lan

Hi Jakub,

On Fri, May 20, 2022 at 03:34:13PM -0700, Jakub Kicinski wrote:
> On Thu, 19 May 2022 18:15:27 -0700 Vinicius Costa Gomes wrote:
> > Changes from v4:
> >  - Went back to exposing the per-queue frame preemption bits via
> >    ethtool-netlink only, via taprio/mqprio was seen as too much
> >    trouble. (Vladimir Oltean)
> >  - Fixed documentation and code/patch organization changes (Vladimir
> >    Oltean).
> 
> First of all - could you please, please, please rev these patches more
> than once a year? It's really hard to keep track of the context when
> previous version was sent in Jun 2021 :/

It would have been nice if Vinicius had posted these more than
once a year. But let's not throw stones at each other, I'm sure everyone
is doing their best ;) These are new specs, their usefulness in the Real
World is still being evaluated, and you shouldn't underestimate the
difficulty of exposing standard Linux interfaces for pre-standard or
first-generation hardware. Even the mistakes I may be making in my
interpretation of the spec below are in good faith (i.e. if I'm wrong
it's because I'm stupid, not because I'm interested in leaning the
implementation towards an interface that's more convenient for the
hardware I'm working with).



> I disagree that queue mask belongs in ethtool. It's an attribute of 
> a queue and should be attached to a queue.

Sure, you have very strong reasons to disagree with that statement, if
only the premise were true. But you have to understand that IEEE 802.1Q
does not talk about preemptible queues, but about preemptible priorities.
Here:

| 6.7.1 Support of the ISS by IEEE Std 802.3 (Ethernet)
| 
| For priority values that are identified in the frame preemption status table
| (6.7.2) as preemptible, frames that are selected for transmission shall be
| transmitted using the pMAC service instance, and for priority values that are
| identified in the frame preemption status table as express, frames that are
| selected for transmission shall be transmitted using the eMAC service instance.
| In all other respects, the Port behaves as if it is supported by a single MAC
| service interface. In particular, all frames received by the Port are treated
| as if they were received on a single MAC service interface regardless of
| whether they were received on the eMAC service interface or the pMAC service
| interface, except with respect to frame preemption.
| 
| 6.7.2 Frame preemption
| If the Port supports frame preemption, then a value of frame preemption status
| is assigned to each value of priority via a frame preemption status table. The
| possible values of frame preemption status are express or preemptible.
| The frame preemption status table can be changed by management as described in
| 12.30.1.1. The default value of frame preemption status is express for all
| priority values.

For context, I probably need to point out the distinction that the spec
makes between a priority and a traffic class.

A priority is a number assigned to a packet based on the VLAN PCP using
the rules in clause 6.9.3 Priority Code Point encoding. In Linux, it is
more or less equivalent to skb->priority.

A traffic class, on the other hand, is defined as basically synonymous
with a TX priority queue, as follows:

| 3.268 traffic class: A classification used to expedite transmission of frames
| generated by critical or time-sensitive services. Traffic classes are numbered
| from zero through N-1, where N is the number of outbound queues associated with
| a given Bridge Port, and 1 <= N <= 8, and each traffic class has a one-to-one
| correspondence with a specific outbound queue for that Port. Traffic class 0
| corresponds to nonexpedited traffic; nonzero traffic classes correspond to
| expedited classes of traffic. A fixed mapping determines, for a given priority
| associated with a frame and a given number of traffic classes, what traffic
| class will be assigned to the frame.

A priority is translated into a traffic class using Table 8-5:
Recommended priority to traffic class mappings, which in Linux would be
handled using the tc-mqprio "map".

Note, however, that a priority TX queue is not the same as a netdev TX queue,
but rather the same as a tc-mqprio traffic class. That is, when you specify
"queues count@offset" to mqprio, from the Linux perspective there are "count"
queues, while from the 802.1Q perspective there is only the "offset" queue (or TC).
This is because we may have per-CPU queues, etc.
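
To make the two views concrete, here is a minimal C sketch of how an
mqprio-style configuration relates priorities, traffic classes and netdev
TX queues. The struct and function names (mqprio_sketch, tc_queue_range,
prio_to_tc, tc_to_txq) are made up for illustration and are not kernel
API; picking a queue within a TC by flow hash is likewise only an
assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO 8	/* 802.1Q priorities 0..7 */
#define MAX_TC   8	/* 802.1Q allows 1 <= N <= 8 traffic classes */

/* Hypothetical mirror of one "count@offset" pair; not a kernel structure. */
struct tc_queue_range {
	uint16_t count;		/* netdev TX queues backing this TC */
	uint16_t offset;	/* index of the first netdev TX queue of this TC */
};

/* Hypothetical container for an mqprio-like configuration. */
struct mqprio_sketch {
	uint8_t num_tc;
	uint8_t prio_tc_map[NUM_PRIO];		/* the mqprio "map" */
	struct tc_queue_range tc_txq[MAX_TC];	/* the mqprio "queues" */
};

/* 802.1Q view: a priority selects exactly one traffic class ("queue"). */
static inline uint8_t prio_to_tc(const struct mqprio_sketch *cfg, uint8_t prio)
{
	assert(prio < NUM_PRIO);
	return cfg->prio_tc_map[prio];
}

/*
 * Linux view: one traffic class may be backed by several netdev TX queues.
 * Pick one of them within the TC's range, here by flow hash (assumption).
 */
static inline uint16_t tc_to_txq(const struct mqprio_sketch *cfg, uint8_t tc,
				 uint32_t hash)
{
	assert(tc < cfg->num_tc);
	assert(cfg->tc_txq[tc].count > 0);
	return (uint16_t)(cfg->tc_txq[tc].offset + hash % cfg->tc_txq[tc].count);
}
```

With a map of "0 0 0 0 1 1 1 1" and queues "2@0 2@2", priorities 4-7 all
land in TC 1, which 802.1Q sees as a single queue even though Linux
spreads it over netdev TX queues 2 and 3.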

This is even spelled out in this note:

| NOTE 3 - A queue in this context is not necessarily a single FIFO data structure.
| A queue is a record of all frames of a given traffic class awaiting
| transmission on a given Bridge Port. The structure of this record is not
| specified. The transmission selection algorithm (8.6.8) determines which
| traffic class, among those classes with frames available for transmission,
| provides the next frame for transmission. The method of determining which frame
| within a traffic class is the next available frame is not specified beyond
| conforming to the frame ordering requirements of this subclause. This allows a
| variety of queue structures such as a single FIFO, or a set of FIFOs with one
| for each pairing of ingress and egress ports (i.e., Virtual Output Queuing), or
| a set of FIFOs with one for each VLAN or priority, or hierarchical structures.

I'm not sure how much of this was already clear and how much wasn't.
I apologize if I'm not bringing new info to the table. I just want to
point out what a "queue" is, and what a "priority" is.



> The DCBNL parallel is flawed IMO because pause generation is Rx, not
> Tx. There is no Rx queue in Linux, much less per-prio.

First of all: we both know that PFC is not only about RX, right? :) Here:

| 8.6.8 Transmission selection
| In a port of a Bridge or station that supports PFC, a frame of priority
| n is not available for transmission if that priority is paused (i.e., if
| Priority_Paused[n] is TRUE (see 36.1.3.2) on that port.
| 
| NOTE 1 - Two or more priorities can be combined in a single queue. In
| this case if one or more of the priorities in the queue are paused, it
| is possible for frames in that queue not belonging to the paused
| priority to not be scheduled for transmission.
| 
| NOTE 2 - Mixing PFC and non-PFC priorities in the same queue results in
| non-PFC traffic being paused causing congestion spreading, and therefore
| is not recommended.

And that's kind of my whole point: PFC is per _priority_, not per
"queue"/"traffic class". And so is frame preemption (right below, same
clause). So the parallel isn't flawed at all. The dcbnl-pfc isn't in tc
for a reason, and that isn't because we don't have RX netdev queues...
And the reason why dcbnl-pfc isn't in tc is the same reason why ethtool
frame preemption shouldn't, either.

| In a port of a Bridge or station that supports frame preemption, a frame
| of priority n is not available for transmission if that priority is
| identified in the frame preemption status table (6.7.2) as preemptible
| and either the holdRequest object (12.30.1.5) is set to the value hold,
| or the transmission of a prior preemptible frame has yet to complete
| because it has been interrupted to allow the transmission of an express
| frame.

So since the managed objects for frame preemption are stipulated by IEEE
per priority:

| The framePreemptionStatusTable (6.7.2) consists of 8
| framePreemptionAdminStatus values (12.30.1.1.1), one per priority.

I think it is only reasonable for Linux to expose the same thing, and
let drivers do the priority to queue or traffic class remapping as they
see fit, when tc-mqprio or tc-taprio or other qdiscs that change this
mapping are installed (if their preemption hardware implementation is
per TC or queue rather than per priority). After all, you can have 2
priorities mapped to the same TC, but still have one express and one
preemptible. That is to say, you can implement preemption even in single
"queue" devices, and it even makes sense.
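
As a rough illustration of the per-priority model quoted above, the
status table can be sketched in C as an array of 8 entries indexed by
priority. The type and function names are hypothetical and are not taken
from the patch set or from any driver.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO 8	/* one table entry per 802.1Q priority */

enum fp_status {
	FP_EXPRESS = 0,		/* default for all priorities (6.7.2) */
	FP_PREEMPTIBLE = 1,
};

enum mac_instance {
	MAC_EMAC,	/* express MAC service instance */
	MAC_PMAC,	/* preemptible MAC service instance */
};

/* Zero-initializing this table yields the 802.1Q default: all express. */
struct fp_status_table {
	enum fp_status admin_status[NUM_PRIO];
};

/* Select the MAC service instance from the frame's priority alone. */
static inline enum mac_instance fp_select_mac(const struct fp_status_table *t,
					      uint8_t prio)
{
	assert(prio < NUM_PRIO);
	return t->admin_status[prio] == FP_PREEMPTIBLE ? MAC_PMAC : MAC_EMAC;
}
```

Because the lookup is keyed by priority alone, two priorities can share
a traffic class while one goes out through the pMAC and the other
through the eMAC.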

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-21 15:03     ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-05-23 19:52       ` Jakub Kicinski
  -1 siblings, 0 replies; 60+ messages in thread
From: Jakub Kicinski @ 2022-05-23 19:52 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri, davem,
	Po Liu, boon.leong.ong, intel-wired-lan

On Sat, 21 May 2022 15:03:05 +0000 Vladimir Oltean wrote:
> > I disagree that queue mask belongs in ethtool. It's an attribute of 
> > a queue and should be attached to a queue.  
> 
> Sure, you have very strong reasons to disagree with that statement, if
> only the premise were true. But you have to understand that IEEE 802.1Q
> does not talk about preemptible queues, but about preemptible priorities.
> Here:
> 
> | 6.7.1 Support of the ISS by IEEE Std 802.3 (Ethernet)
> | 
> | For priority values that are identified in the frame preemption status table
> | (6.7.2) as preemptible, frames that are selected for transmission shall be
> | transmitted using the pMAC service instance, and for priority values that are
> | identified in the frame preemption status table as express, frames that are
> | selected for transmission shall be transmitted using the eMAC service instance.
> | In all other respects, the Port behaves as if it is supported by a single MAC
> | service interface. In particular, all frames received by the Port are treated
> | as if they were received on a single MAC service interface regardless of
> | whether they were received on the eMAC service interface or the pMAC service
> | interface, except with respect to frame preemption.
> | 
> | 6.7.2 Frame preemption
> | If the Port supports frame preemption, then a value of frame preemption status
> | is assigned to each value of priority via a frame preemption status table. The
> | possible values of frame preemption status are express or preemptible.
> | The frame preemption status table can be changed by management as described in
> | 12.30.1.1. The default value of frame preemption status is express for all
> | priority values.
> 
> For context, I probably need to point out the distinction that the spec
> makes between a priority and a traffic class.
> 
> A priority is a number assigned to a packet based on the VLAN PCP using
> the rules in clause 6.9.3 Priority Code Point encoding. In Linux, it is
> more or less equivalent to skb->priority.
> 
> A traffic class, on the other hand, is defined as basically synonymous
> with a TX priority queue, as follows:
> 
> | 3.268 traffic class: A classification used to expedite transmission of frames
> | generated by critical or time-sensitive services. Traffic classes are numbered
> | from zero through N-1, where N is the number of outbound queues associated with
> | a given Bridge Port, and 1 <= N <= 8, and each traffic class has a one-to-one
> | correspondence with a specific outbound queue for that Port. Traffic class 0
> | corresponds to nonexpedited traffic; nonzero traffic classes correspond to
> | expedited classes of traffic. A fixed mapping determines, for a given priority
> | associated with a frame and a given number of traffic classes, what traffic
> | class will be assigned to the frame.
> 
> A priority is translated into a traffic class using Table 8-5:
> Recommended priority to traffic class mappings, which in Linux would be
> handled using the tc-mqprio "map".
> 
> Note, however, that a priority TX queue is not the same as a netdev TX queue,
> but rather the same as a tc-mqprio traffic class. That is, when you specify
> "queues count@offset" to mqprio, from the Linux perspective there are "count"
> queues, while from the 802.1Q perspective there is only the "offset" queue (or TC).
> This is because we may have per-CPU queues, etc.
> 
> This is even spelled out in this note:
> 
> | NOTE 3 - A queue in this context is not necessarily a single FIFO data structure.
> | A queue is a record of all frames of a given traffic class awaiting
> | transmission on a given Bridge Port. The structure of this record is not
> | specified. The transmission selection algorithm (8.6.8) determines which
> | traffic class, among those classes with frames available for transmission,
> | provides the next frame for transmission. The method of determining which frame
> | within a traffic class is the next available frame is not specified beyond
> | conforming to the frame ordering requirements of this subclause. This allows a
> | variety of queue structures such as a single FIFO, or a set of FIFOs with one
> | for each pairing of ingress and egress ports (i.e., Virtual Output Queuing), or
> | a set of FIFOs with one for each VLAN or priority, or hierarchical structures.
> 
> I'm not sure how much of this was already clear and how much wasn't.
> I apologize if I'm not bringing new info to the table. I just want to
> point out what a "queue" is, and what a "priority" is.

Very useful, thanks!

> > The DCBNL parallel is flawed IMO because pause generation is Rx, not
> > Tx. There is no Rx queue in Linux, much less per-prio.  
> 
> First of all: we both know that PFC is not only about RX, right? :) Here:
> 
> | 8.6.8 Transmission selection
> | In a port of a Bridge or station that supports PFC, a frame of priority
> | n is not available for transmission if that priority is paused (i.e., if
> | Priority_Paused[n] is TRUE (see 36.1.3.2) on that port.
> | 
> | NOTE 1 - Two or more priorities can be combined in a single queue. In
> | this case if one or more of the priorities in the queue are paused, it
> | is possible for frames in that queue not belonging to the paused
> | priority to not be scheduled for transmission.
> | 
> | NOTE 2 - Mixing PFC and non-PFC priorities in the same queue results in
> | non-PFC traffic being paused causing congestion spreading, and therefore
> | is not recommended.
> 
> And that's kind of my whole point: PFC is per _priority_, not per
> "queue"/"traffic class". And so is frame preemption (right below, same
> clause). So the parallel isn't flawed at all. The dcbnl-pfc isn't in tc
> for a reason, and that isn't because we don't have RX netdev queues...
> And the reason why dcbnl-pfc isn't in tc is the same reason why ethtool
> frame preemption shouldn't, either.

My understanding is that DCBNL is not in ethtool because it was built
primarily for converged Ethernet. ethtool being a netdev thing, it's
largely confined to coarse interface configuration in such
environments; they certainly don't use TC to control RDMA queues.

To put it differently DCBNL separates RoCE and storage queues from
netdev queues (latter being lossy). It's Conway's law at work.

Frame preemption falls entirely into netdev land. We can use the right
interface rather than building a FW shim^W "generic" interface.

> | In a port of a Bridge or station that supports frame preemption, a frame
> | of priority n is not available for transmission if that priority is
> | identified in the frame preemption status table (6.7.2) as preemptible
> | and either the holdRequest object (12.30.1.5) is set to the value hold,
> | or the transmission of a prior preemptible frame has yet to complete
> | because it has been interrupted to allow the transmission of an express
> | frame.
> 
> So since the managed objects for frame preemption are stipulated by IEEE
> per priority:
> 
> | The framePreemptionStatusTable (6.7.2) consists of 8
> | framePreemptionAdminStatus values (12.30.1.1.1), one per priority.
> 
> I think it is only reasonable for Linux to expose the same thing, and
> let drivers do the priority to queue or traffic class remapping as they
> see fit, when tc-mqprio or tc-taprio or other qdiscs that change this
> mapping are installed (if their preemption hardware implementation is
> per TC or queue rather than per priority). After all, you can have 2
> priorities mapped to the same TC, but still have one express and one
> preemptible. That is to say, you can implement preemption even in single
> "queue" devices, and it even makes sense.

Honestly I feel like I'm missing a key detail because all you wrote
sounds like an argument _against_ exposing the queue mask in ethtool.
The standard does not call for it, nor is it convenient to the user
who sets the prio->tc and queue allocation in TC.

If we wanted to expose prio mask in ethtool, that's a different story.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-23 19:52       ` [Intel-wired-lan] " Jakub Kicinski
@ 2022-05-23 20:32         ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-23 20:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri, davem,
	Po Liu, boon.leong.ong, intel-wired-lan

On Mon, May 23, 2022 at 12:52:38PM -0700, Jakub Kicinski wrote:
> > > The DCBNL parallel is flawed IMO because pause generation is Rx, not
> > > Tx. There is no Rx queue in Linux, much less per-prio.  
> > 
> > First of all: we both know that PFC is not only about RX, right? :) Here:
> > 
> > | 8.6.8 Transmission selection
> > | In a port of a Bridge or station that supports PFC, a frame of priority
> > | n is not available for transmission if that priority is paused (i.e., if
> > | Priority_Paused[n] is TRUE (see 36.1.3.2) on that port.
> > | 
> > | NOTE 1 - Two or more priorities can be combined in a single queue. In
> > | this case if one or more of the priorities in the queue are paused, it
> > | is possible for frames in that queue not belonging to the paused
> > | priority to not be scheduled for transmission.
> > | 
> > | NOTE 2 - Mixing PFC and non-PFC priorities in the same queue results in
> > | non-PFC traffic being paused causing congestion spreading, and therefore
> > | is not recommended.
> > 
> > And that's kind of my whole point: PFC is per _priority_, not per
> > "queue"/"traffic class". And so is frame preemption (right below, same
> > clause). So the parallel isn't flawed at all. The dcbnl-pfc isn't in tc
> > for a reason, and that isn't because we don't have RX netdev queues...
> > And the reason why dcbnl-pfc isn't in tc is the same reason why ethtool
> > frame preemption shouldn't, either.
> 
> My understanding is that DCBNL is not in ethtool because it was built
> primarily for converged Ethernet. ethtool being a netdev thing, it's
> largely confined to coarse interface configuration in such
> environments; they certainly don't use TC to control RDMA queues.
> 
> To put it differently DCBNL separates RoCE and storage queues from
> netdev queues (latter being lossy). It's Conway's law at work.
> 
> Frame preemption falls entirely into netdev land. We can use the right
> interface rather than building a FW shim^W "generic" interface.

Not sure what you're getting at with this, sorry. Why dcbnl is not
integrated in ethtool is a bit beside the point. What was relevant about
PFC as an analogy is that it is something configured per priority
[ and not per queue ], and it does not belong to the qdisc for that reason.

> > | In a port of a Bridge or station that supports frame preemption, a frame
> > | of priority n is not available for transmission if that priority is
> > | identified in the frame preemption status table (6.7.2) as preemptible
> > | and either the holdRequest object (12.30.1.5) is set to the value hold,
> > | or the transmission of a prior preemptible frame has yet to complete
> > | because it has been interrupted to allow the transmission of an express
> > | frame.
> > 
> > So since the managed objects for frame preemption are stipulated by IEEE
> > per priority:
> > 
> > | The framePreemptionStatusTable (6.7.2) consists of 8
> > | framePreemptionAdminStatus values (12.30.1.1.1), one per priority.
> > 
> > I think it is only reasonable for Linux to expose the same thing, and
> > let drivers do the priority to queue or traffic class remapping as they
> > see fit, when tc-mqprio or tc-taprio or other qdiscs that change this
> > mapping are installed (if their preemption hardware implementation is
> > per TC or queue rather than per priority). After all, you can have 2
> > priorities mapped to the same TC, but still have one express and one
> > preemptible. That is to say, you can implement preemption even in single
> > "queue" devices, and it even makes sense.
> 
> Honestly I feel like I'm missing a key detail because all you wrote
> sounds like an argument _against_ exposing the queue mask in ethtool.

Yeah, I guess the key detail that you're missing is that there's no such
thing as "preemptible queue mask" in 802.1Q. My feeling is that both
Vinicius and myself were confused in different ways by some spec
definitions and had slightly different things in mind, and we've
essentially ended up debating where a non-standard thing should go.

In my case, I said in my reply to the previous patch set that a priority
is essentially synonymous with a traffic class (which it isn't, as per
the definitions above), so I used the "traffic class" term incorrectly
and didn't capitalize the "priority" word, which I should have.
https://patchwork.kernel.org/project/netdevbpf/patch/20210626003314.3159402-3-vinicius.gomes@intel.com/#24812068

In Vinicius' case, part of the confusion might come from the fact that
his hardware really has preemption configurable per queue, and he
mistook it for the standard itself.

> The standard does not call for it, nor is it convenient to the user
> who sets the prio->tc and queue allocation in TC.
> 
> If we wanted to expose prio mask in ethtool, that's a different story.

Re-reading what I've said, I can't say "I was right all along"
(not by a long shot, sorry for my part in the confusion), but I guess
the conclusion is that:

(a) "preemptible queues" needs to become "preemptible priorities" in the
    UAPI. The question then becomes how to expose the mask of preemptible
    priorities: as a simple u8 bit mask where "BIT(i) == 1" means "prio i
    is preemptible", or as a nested netlink attribute scheme similar
    to DCB_PFC_UP_ATTR_0 -> DCB_PFC_UP_ATTR_7?

(b) keeping the "preemptible priorities" away from tc-qdisc is ok

(c) non-standard hardware should deal with the prio <-> queue mapping by
    itself if its queues are what is preemptible
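
For illustration only, the u8 bit mask option from (a) and the
driver-side remapping from (c) could look roughly like this in C. The
helper names are hypothetical, and rejecting a traffic class that mixes
express and preemptible priorities is just one possible policy for
per-TC hardware, not something this thread or the standard prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PRIO 8
#define BIT(n) (1u << (n))

/* Option from (a): bit i set means "priority i is preemptible". */
static inline bool prio_is_preemptible(uint8_t preemptible_prios,
				       unsigned int prio)
{
	assert(prio < NUM_PRIO);
	return preemptible_prios & BIT(prio);
}

/*
 * Point (c): hypothetical driver helper for hardware whose preemption knob
 * is per traffic class. Translates the per-priority mask into a per-TC mask
 * using the prio->tc map. Returns 0 on success, or -1 if some TC carries
 * both express and preemptible priorities, which such hardware cannot
 * represent (assumed policy).
 */
static inline int prio_mask_to_tc_mask(uint8_t preemptible_prios,
				       const uint8_t prio_tc_map[NUM_PRIO],
				       uint8_t *tc_mask)
{
	uint8_t preemptible_tcs = 0, express_tcs = 0;
	unsigned int prio;

	for (prio = 0; prio < NUM_PRIO; prio++) {
		assert(prio_tc_map[prio] < 8);
		if (prio_is_preemptible(preemptible_prios, prio))
			preemptible_tcs |= BIT(prio_tc_map[prio]);
		else
			express_tcs |= BIT(prio_tc_map[prio]);
	}

	/* A TC present in both sets mixes express and preemptible traffic. */
	if (preemptible_tcs & express_tcs)
		return -1;

	*tc_mask = preemptible_tcs;
	return 0;
}
```

Hardware that implements the table per priority would use the mask
directly and never need the second helper.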

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-23 20:32         ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-05-23 21:31           ` Jakub Kicinski
  -1 siblings, 0 replies; 60+ messages in thread
From: Jakub Kicinski @ 2022-05-23 21:31 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri, davem,
	Po Liu, boon.leong.ong, intel-wired-lan

On Mon, 23 May 2022 20:32:15 +0000 Vladimir Oltean wrote:
> > > | In a port of a Bridge or station that supports frame preemption, a frame
> > > | of priority n is not available for transmission if that priority is
> > > | identified in the frame preemption status table (6.7.2) as preemptible
> > > | and either the holdRequest object (12.30.1.5) is set to the value hold,
> > > | or the transmission of a prior preemptible frame has yet to complete
> > > | because it has been interrupted to allow the transmission of an express
> > > | frame.
> > > 
> > > So since the managed objects for frame preemption are stipulated by IEEE
> > > per priority:
> > > 
> > > | The framePreemptionStatusTable (6.7.2) consists of 8
> > > | framePreemptionAdminStatus values (12.30.1.1.1), one per priority.
> > > 
> > > I think it is only reasonable for Linux to expose the same thing, and
> > > let drivers do the priority to queue or traffic class remapping as they
> > > see fit, when tc-mqprio or tc-taprio or other qdiscs that change this
> > > mapping are installed (if their preemption hardware implementation is
> > > per TC or queue rather than per priority). After all, you can have 2
> > > priorities mapped to the same TC, but still have one express and one
> > > preemptible. That is to say, you can implement preemption even in single
> > > "queue" devices, and it even makes sense.  
> > 
> > Honestly I feel like I'm missing a key detail because all you wrote
> > sounds like an argument _against_ exposing the queue mask in ethtool.  
> 
> Yeah, I guess the key detail that you're missing is that there's no such
> thing as "preemptible queue mask" in 802.1Q. My feeling is that both
> Vinicius and myself were confused in different ways by some spec
> definitions and had slightly different things in mind, and we've
> essentially ended up debating where a non-standard thing should go.
> 
> In my case, I said in my reply to the previous patch set that a priority
> is essentially synonymous with a traffic class (which it isn't, as per
> the definitions above), so I used the "traffic class" term incorrectly
> and didn't capitalize the "priority" word, which I should have.
> https://patchwork.kernel.org/project/netdevbpf/patch/20210626003314.3159402-3-vinicius.gomes@intel.com/#24812068
> 
> In Vinicius' case, part of the confusion might come from the fact that
> his hardware really has preemption configurable per queue, and he
> mistook it for the standard itself.
> 
> > Neither the standard calls for it, nor is it convenient to the user
> > who sets the prio->tc and queue allocation in TC.
> > 
> > If we wanted to expose prio mask in ethtool, that's a different story.  
> 
> Re-reading what I've said, I can't say "I was right all along"
> (not by a long shot, sorry for my part in the confusion),

Sorry, I admit I did not go back to the archives to re-read your
feedback today. I'm purely reacting to the fact that the "preemptible
queue mask" attribute, which I have successfully fought off in the
past, has now returned.

Let me also spell out the source of my objection - high speed NICs
have a multitude of queues, queue groups and sub-interfaces. An ethtool
uAPI which uses a zero-based integer ID will lead to confusion and a lack
of portability because users will not know the mapping and vendors
will invent whatever fits their HW best.

> but I guess the conclusion is that:
> 
> (a) "preemptable queues" needs to become "preemptable priorities" in the
>     UAPI. The question becomes how to expose the mask of preemptable
>     priorities. A simple u8 bit mask where "BIT(i) == 1" means "prio i
>     is preemptable", or with a nested netlink attribute scheme similar
>     to DCB_PFC_UP_ATTR_0 -> DCB_PFC_UP_ATTR_7?

No preference there, we can also put it in DCBnl, if it fits better.
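
For illustration only, here is a minimal sketch of the first option in
(a), a plain u8 mask with one bit per priority. The helper names and
FP_NUM_PRIOS are hypothetical and are not part of any existing ethtool
or kernel API; the only thing taken from the thread is the convention
that "BIT(i) == 1" means "prio i is preemptable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q framePreemptionStatusTable: one entry per priority, 8 in total. */
#define FP_NUM_PRIOS 8

/* Hypothetical helper: return @mask with the bit for @prio set or cleared. */
static inline uint8_t fp_set_preemptible(uint8_t mask, unsigned int prio,
					 bool on)
{
	assert(prio < FP_NUM_PRIOS);
	if (on)
		return (uint8_t)(mask | (1u << prio));
	return (uint8_t)(mask & ~(1u << prio));
}

/* Hypothetical helper: true if @prio is marked preemptible in @mask. */
static inline bool fp_prio_is_preemptible(uint8_t mask, unsigned int prio)
{
	assert(prio < FP_NUM_PRIOS);
	return (mask >> prio) & 1u;
}
```

With this encoding, marking priorities 0 and 3 as preemptible yields the
mask 0x09, and every other priority stays express.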

> (b) keeping the "preemptable priorities" away from tc-qdisc is ok

Ack.

> (c) non-standard hardware should deal with prio <-> queue mapping by
>     itself if its queues are what are preemptable

I'd prefer if the core had helpers to do the mapping for drivers, 
but in principle yes - make the preemptible queues an implementation
detail if possible.
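
A core helper of that kind might look roughly like the sketch below. It
assumes an mqprio-style layout (a priority -> TC map plus a queue
offset/count per TC) and hardware whose preemption is configured per
queue. Every name here (struct fp_tc_layout,
fp_prio_mask_to_queue_mask) is hypothetical; nothing like it exists in
the posted series. The sketch rejects the case where two priorities
sharing a TC differ in preemptibility, since per-queue hardware cannot
express that.

```c
#include <assert.h>
#include <stdint.h>

#define FP_NUM_PRIOS 8
#define FP_MAX_TCS   8

/* Hypothetical description of an mqprio-like configuration. */
struct fp_tc_layout {
	uint8_t  num_tc;
	uint8_t  prio_tc[FP_NUM_PRIOS];	/* priority -> traffic class */
	uint16_t tc_offset[FP_MAX_TCS];	/* first TX queue of each TC */
	uint16_t tc_count[FP_MAX_TCS];	/* number of TX queues per TC */
};

/*
 * Translate a preemptible-priority mask into a preemptible-queue mask.
 * Returns 0 on success, -1 if the layout is invalid or if two
 * priorities mapped to the same TC disagree about preemptibility.
 */
static int fp_prio_mask_to_queue_mask(const struct fp_tc_layout *l,
				      uint8_t prio_mask, uint64_t *queue_mask)
{
	unsigned int tc_seen = 0, tc_preempt = 0;
	unsigned int prio, tc, q;
	uint64_t qmask = 0;

	assert(l && queue_mask);

	if (l->num_tc == 0 || l->num_tc > FP_MAX_TCS)
		return -1;

	/* First pass: derive one preemptible/express decision per TC. */
	for (prio = 0; prio < FP_NUM_PRIOS; prio++) {
		unsigned int preempt = (prio_mask >> prio) & 1u;

		tc = l->prio_tc[prio];
		if (tc >= l->num_tc)
			return -1;

		if (((tc_seen >> tc) & 1u) &&
		    ((tc_preempt >> tc) & 1u) != preempt)
			return -1;	/* conflicting priorities in one TC */

		tc_seen |= 1u << tc;
		if (preempt)
			tc_preempt |= 1u << tc;
	}

	/* Second pass: expand each preemptible TC to its queue range. */
	for (tc = 0; tc < l->num_tc; tc++) {
		unsigned int end = l->tc_offset[tc] + l->tc_count[tc];

		if (!((tc_preempt >> tc) & 1u))
			continue;

		for (q = l->tc_offset[tc]; q < end; q++) {
			if (q >= 64)
				return -1;	/* does not fit the u64 mask */
			qmask |= 1ULL << q;
		}
	}

	*queue_mask = qmask;
	return 0;
}
```

Hardware that implements preemption per priority would not need this
translation at all and could consume the priority mask directly.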

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-23 21:31           ` [Intel-wired-lan] " Jakub Kicinski
@ 2022-05-23 22:49             ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-23 22:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri, davem,
	Po Liu, boon.leong.ong, intel-wired-lan

On Mon, May 23, 2022 at 02:31:16PM -0700, Jakub Kicinski wrote:
> > > If we wanted to expose prio mask in ethtool, that's a different story.  
> > 
> > Re-reading what I've said, I can't say "I was right all along"
> > (not by a long shot, sorry for my part in the confusion),
> 
> Sorry, I admit I did not go back to the archives to re-read your
> feedback today. I'm purely reacting to the fact that the "preemptible
> queue mask" attribute, which I have successfully fought off in the
> past, has now returned.
> 
> Let me also spell out the source of my objection - high speed NICs
> have a multitude of queues, queue groups and sub-interfaces. An ethtool
> uAPI which uses a zero-based integer ID will lead to confusion and a lack
> of portability because users will not know the mapping and vendors
> will invent whatever fits their HW best.

I'm re-reading even further back and noticing that I really did not use
the "traffic class" term with its correct meaning. I really meant
"priority" here too, in Dec 2020:
https://patchwork.kernel.org/project/netdevbpf/cover/20201202045325.3254757-1-vinicius.gomes@intel.com/#23827347

I see you were opposed to the "preemptable queue mask" idea, and so was
I, but apparently the way in which I formulated this was not quite clear.

> > but I guess the conclusion is that:
> > 
> > (a) "preemptable queues" needs to become "preemptable priorities" in the
> >     UAPI. The question becomes how to expose the mask of preemptable
> >     priorities. A simple u8 bit mask where "BIT(i) == 1" means "prio i
> >     is preemptable", or with a nested netlink attribute scheme similar
> >     to DCB_PFC_UP_ATTR_0 -> DCB_PFC_UP_ATTR_7?
> 
> No preference there, we can also put it in DCBnl, if it fits better.

TBH I don't think I understand what exactly belongs in dcbnl and what
doesn't. My running hypothesis so far was that it's the stuff negotiable
through the DCBX protocol, documented as 802.1Q clause 38 to be
(a) Enhanced Transmission Selection (ETS)
(b) Priority-based Flow Control (PFC)
(c) Application Priority TLV
(d) Application VLAN TLV

but
(1) Frame Preemption isn't negotiated through DCBX, so we should be safe there
(2) I never quite understood why the existence of the DCBX protocol or
    any other protocol would mandate what the kernel interfaces should
    look like. Following this model results in absurdities - unless I'm
    misunderstanding something, an extreme case of this seems to be ETS
    itself. As per the spec, the ETS parameters are numTrafficClassesSupported,
    TCPriorityAssignment and TCBandwidth. What's funny, though, is that
    coincidentally they aren't ETS-specific information, and we seem to
    be able to set the number of TCs of a port both with DCB_CMD_SNUMTCS
    and with tc-mqprio. Same with the priority -> tc map (struct ieee_ets ::
    prio_tc), not to mention shapers per traffic class which are also in
    tc-mqprio, etc.

My instinct so far was to stay away from adding new code to dcbnl and I
think I will continue to do that going forward, thank you.

> > (b) keeping the "preemptable priorities" away from tc-qdisc is ok
> 
> Ack.
> 
> > (c) non-standard hardware should deal with prio <-> queue mapping by
> >     itself if its queues are what are preemptable
> 
> I'd prefer if the core had helpers to do the mapping for drivers, 
> but in principle yes - make the preemptible queues an implementation
> detail if possible.

Yeah, those are details already.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 00/11] ethtool: Add support for frame preemption
  2022-05-23 19:52       ` [Intel-wired-lan] " Jakub Kicinski
@ 2022-05-23 23:33         ` Vladimir Oltean
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Oltean @ 2022-05-23 23:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri, davem,
	Po Liu, boon.leong.ong, intel-wired-lan

On Mon, May 23, 2022 at 12:52:38PM -0700, Jakub Kicinski wrote:
> My understanding of why DCBNL is not in ethtool is that it was built
> primarily for converged Ethernet. ethtool being a netdev thing it's
> largely confined to coarse interface configuration in such
> environments, they certainly don't use TC to control RDMA queues.
> 
> To put it differently DCBNL separates RoCE and storage queues from
> netdev queues (latter being lossy). It's Conway's law at work.

I had to look up Conway's law, now I get it. Beautiful euphemism, thank you.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 09/11] igc: Add support for Frame Preemption verification
  2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-05-27  9:08     ` Zhou Furong
  -1 siblings, 0 replies; 60+ messages in thread
From: Zhou Furong @ 2022-05-27  9:08 UTC (permalink / raw)
  To: Vinicius Costa Gomes, netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, vladimir.oltean, po.liu,
	boon.leong.ong, intel-wired-lan

> +
> +	struct delayed_work fp_verification_work;
> +	unsigned long fp_start;
> +	bool fp_received_smd_v;
> +	bool fp_received_smd_r;
> +	unsigned int fp_verify_cnt;
> +	enum frame_preemption_state fp_tx_state;
> +	bool fp_disable_verify;

The struct would be smaller if these members were added in the right
place, i.e. where they fill existing padding holes.
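
To illustrate the padding point in general terms, here is a small
standalone sketch. The two structs are hypothetical and unrelated to the
real layout of struct igc_adapter; they only show that a new small
member appended at the end can grow a struct, while the same member
placed in an existing hole does not. A tool such as pahole can report
the actual holes in the driver's struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct: on LP64 ABIs, 'b' is followed by a 4-byte hole. */
struct with_flag_at_end {
	uint64_t a;
	uint32_t b;
	uint64_t c;
	bool flag;	/* appended member: typically adds tail padding */
};

struct with_flag_in_hole {
	uint64_t a;
	uint32_t b;
	bool flag;	/* same member, placed in the hole after 'b' */
	uint64_t c;
};

/* On x86-64 the sizes are typically 32 and 24 bytes respectively. */
static_assert(sizeof(struct with_flag_in_hole) <=
	      sizeof(struct with_flag_at_end),
	      "placing the flag in the hole must not grow the struct");
```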


> +	if (!netif_running(adapter->netdev))
> +		return -ENOTCONN;
> +
> +	/* FIXME: rename this function to something less specific, as
> +	 * it can be used outside XDP.
> +	 */
> +	ring = igc_xdp_get_tx_ring(adapter, cpu);
> +	nq = txring_txq(ring);
> +
> +	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return -ENOMEM;
> +
Since the allocation can fail with -ENOMEM, move it before
ring = igc_xdp_get_tx_ring(adapter, cpu);


> +static void igc_fp_verification_work(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct igc_adapter *adapter;
> +	int err;
> +
> +	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
> +
Please remove this blank line.

> +	if (adapter->fp_disable_verify)
> +		goto done;
> +
> +	switch (adapter->fp_tx_state) {
> +	case FRAME_PREEMPTION_STATE_START:
> +		adapter->fp_received_smd_r = false;
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
> +
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
Should the state become SENT even when sending failed?

> +		adapter->fp_start = jiffies;
> +		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		break;
> +
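
One possible way to address the question above, shown as a standalone
sketch rather than as a change to the driver: only move to the SENT
state when the send succeeded, and otherwise stay in START and retry up
to a limit. The type and function names are hypothetical, and
FP_MAX_ATTEMPTS is an assumed stand-in for IGC_MAX_VERIFY_CNT. The
posted code may well rely on the timeout path to recover instead.

```c
#include <assert.h>

enum fp_state { FP_START, FP_SENT, FP_FAILED, FP_DONE };

struct fp_verify {
	enum fp_state state;
	unsigned int attempts;
};

#define FP_MAX_ATTEMPTS 3	/* assumption, not the driver's value */

/*
 * Handle one pass through the START state. @send_err is the result of
 * the (hypothetical) SMD-V transmit: 0 on success, negative on error.
 * Returns nonzero if the work item should be rescheduled.
 */
static int fp_start_step(struct fp_verify *v, int send_err)
{
	assert(v && v->state == FP_START);

	if (send_err < 0) {
		if (++v->attempts > FP_MAX_ATTEMPTS) {
			v->state = FP_FAILED;
			return 0;
		}
		return 1;	/* stay in START and try again later */
	}

	v->state = FP_SENT;	/* only reached when the frame went out */
	return 1;
}
```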





> +
> +			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
> +				adapter->fp_verify_cnt = 0;
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
> +				netdev_err(adapter->netdev,
> +					   "Exceeded number of attempts for frame preemption verification\n");
> +			} else {
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +			}
> +			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		}
> +
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_FAILED:
> +	case FRAME_PREEMPTION_STATE_DONE:
Is a default case missing here?

> +		break;
> +	}
> +

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption
  2022-05-22 10:18 [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption kernel test robot
@ 2022-05-24  8:26 ` kernel test robot
  0 siblings, 0 replies; 60+ messages in thread
From: kernel test robot @ 2022-05-24  8:26 UTC (permalink / raw)
  To: Vinicius Costa Gomes, netdev; +Cc: llvm, kbuild-all

Hi Vinicius,

Thanks for your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vinicius-Costa-Gomes/ethtool-Add-support-for-frame-preemption/20220520-092800
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git df98714e432abf5cbdac3e4c1a13f94c65ddb8d3
config: arm-randconfig-c002-20220522 (https://download.01.org/0day-ci/archive/20220522/202205221852.CJ4p5boS-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1443dbaba6f0e57be066995db9164f89fb57b413)
reproduce (this is a W=1 build):
         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
         chmod +x ~/bin/make.cross
         # install arm cross compiling tool for clang build
         # apt-get install binutils-arm-linux-gnueabi
         # https://github.com/intel-lab-lkp/linux/commit/a42e940bc53c40ee4e33a1bbf022a663bb28a9c7
         git remote add linux-review https://github.com/intel-lab-lkp/linux
         git fetch --no-tags linux-review Vinicius-Costa-Gomes/ethtool-Add-support-for-frame-preemption/20220520-092800
         git checkout a42e940bc53c40ee4e33a1bbf022a663bb28a9c7
         # save the config file
         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <yujie.liu@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)

 >> drivers/net/ethernet/intel/igc/igc_main.c:5919:6: warning: Access to field 'preemptible' results in a dereference of an undefined pointer value (loaded from variable 'ring') [clang-analyzer-core.NullDereference]
            if (ring->preemptible) {
                ^

vim +5919 drivers/net/ethernet/intel/igc/igc_main.c

5f2958052c5820 Vinicius Costa Gomes 2019-12-02  5910
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5911  static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5912  				      bool enable)
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5913  {
82faa9b799500f Vinicius Costa Gomes 2020-02-14 @5914  	struct igc_ring *ring;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5915
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5916  	if (queue < 0 || queue >= adapter->num_tx_queues)
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5917  		return -EINVAL;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5918
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19 @5919  	if (ring->preemptible) {
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5920  		netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5921  		return -EINVAL;
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5922  	}
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5923
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5924  	ring = adapter->tx_ring[queue];
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5925  	ring->launchtime_enable = enable;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5926
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5927  	return 0;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5928  }
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5929
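
The warning above is about `ring` being read at line 5919 before it is
assigned at line 5924. A minimal userspace sketch of the reordering that
would avoid it follows; the struct definitions are simplified stand-ins
containing only the fields used here, and this is not the actual driver
change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for struct igc_ring and struct igc_adapter. */
struct ring {
	bool preemptible;
	bool launchtime_enable;
};

struct adapter {
	int num_tx_queues;
	struct ring *tx_ring[4];
};

static int save_launchtime_params(struct adapter *adapter, int queue,
				  bool enable)
{
	struct ring *ring;

	if (queue < 0 || queue >= adapter->num_tx_queues)
		return -EINVAL;

	/* Look up the ring before reading any of its fields. */
	ring = adapter->tx_ring[queue];

	/* LaunchTime and preemption are treated as mutually exclusive. */
	if (ring->preemptible)
		return -EINVAL;

	ring->launchtime_enable = enable;

	return 0;
}
```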

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption
@ 2022-05-22 10:18 kernel test robot
  2022-05-24  8:26 ` kernel test robot
  0 siblings, 1 reply; 60+ messages in thread
From: kernel test robot @ 2022-05-22 10:18 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 26764 bytes --]

CC: llvm@lists.linux.dev
CC: kbuild-all@lists.01.org
BCC: lkp@intel.com
In-Reply-To: <20220520011538.1098888-11-vinicius.gomes@intel.com>
References: <20220520011538.1098888-11-vinicius.gomes@intel.com>
TO: Vinicius Costa Gomes <vinicius.gomes@intel.com>
TO: netdev@vger.kernel.org
CC: Vinicius Costa Gomes <vinicius.gomes@intel.com>
CC: jhs@mojatatu.com
CC: xiyou.wangcong@gmail.com
CC: jiri@resnulli.us
CC: davem@davemloft.net
CC: vladimir.oltean@nxp.com
CC: po.liu@nxp.com
CC: boon.leong.ong@intel.com
CC: intel-wired-lan@lists.osuosl.org

Hi Vinicius,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Vinicius-Costa-Gomes/ethtool-Add-support-for-frame-preemption/20220520-092800
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git df98714e432abf5cbdac3e4c1a13f94c65ddb8d3
:::::: branch date: 2 days ago
:::::: commit date: 2 days ago
config: arm-randconfig-c002-20220522 (https://download.01.org/0day-ci/archive/20220522/202205221852.CJ4p5boS-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1443dbaba6f0e57be066995db9164f89fb57b413)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/intel-lab-lkp/linux/commit/a42e940bc53c40ee4e33a1bbf022a663bb28a9c7
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vinicius-Costa-Gomes/ethtool-Add-support-for-frame-preemption/20220520-092800
        git checkout a42e940bc53c40ee4e33a1bbf022a663bb28a9c7
        # save the config file
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer 

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:937:21: note: Value stored to 'dev' during its initialization is never read
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:962:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(hw->mac.addr, addr->sa_data, netdev->addr_len);
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:962:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(hw->mac.addr, addr->sa_data, netdev->addr_len);
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:1000:3: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                   memcpy(mta_list + (i++ * ETH_ALEN), ha->addr, ETH_ALEN);
                   ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:1000:3: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
                   memcpy(mta_list + (i++ * ETH_ALEN), ha->addr, ETH_ALEN);
                   ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:1784:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(__skb_put(skb, headlen + metasize), xdp->data_meta,
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:1784:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(__skb_put(skb, headlen + metasize), xdp->data_meta,
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:2595:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(__skb_put(skb, totalsize), xdp->data_meta,
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:2595:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(__skb_put(skb, totalsize), xdp->data_meta,
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3035:21: warning: Value stored to 'dev' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3035:21: note: Value stored to 'dev' during its initialization is never read
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3064:21: warning: Value stored to 'dev' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3064:21: note: Value stored to 'dev' during its initialization is never read
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3100:21: warning: Value stored to 'dev' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3100:21: note: Value stored to 'dev' during its initialization is never read
           struct net_device *dev = adapter->netdev;
                              ^~~   ~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3374:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(&flex->data[offset], src, len);
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:3374:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(&flex->data[offset], src, len);
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:4154:3: warning: Value stored to 'current_itr' is never read [clang-analyzer-deadcode.DeadStores]
                   current_itr = 0;
                   ^             ~
   drivers/net/ethernet/intel/igc/igc_main.c:4154:3: note: Value stored to 'current_itr' is never read
                   current_itr = 0;
                   ^             ~
   drivers/net/ethernet/intel/igc/igc_main.c:4311:6: warning: Value stored to 'new_val' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
           int new_val = q_vector->itr_val;
               ^~~~~~~   ~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:4311:6: note: Value stored to 'new_val' during its initialization is never read
           int new_val = q_vector->itr_val;
               ^~~~~~~   ~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:4493:3: warning: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                   memset(q_vector, 0, struct_size(q_vector, ring, ring_count));
                   ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:4493:3: note: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11
                   memset(q_vector, 0, struct_size(q_vector, ring, ring_count));
                   ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5071:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(stats, &adapter->stats64, sizeof(*stats));
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5071:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(stats, &adapter->stats64, sizeof(*stats));
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5316:4: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                           sprintf(q_vector->name, "%s-TxRx-%u", netdev->name,
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5316:4: note: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11
                           sprintf(q_vector->name, "%s-TxRx-%u", netdev->name,
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5319:4: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                           sprintf(q_vector->name, "%s-tx-%u", netdev->name,
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5319:4: note: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11
                           sprintf(q_vector->name, "%s-tx-%u", netdev->name,
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5322:4: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                           sprintf(q_vector->name, "%s-rx-%u", netdev->name,
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5322:4: note: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11
                           sprintf(q_vector->name, "%s-rx-%u", netdev->name,
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5325:4: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                           sprintf(q_vector->name, "%s-unused", netdev->name);
                           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5325:4: note: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11
                           sprintf(q_vector->name, "%s-unused", netdev->name);
                           ^~~~~~~
>> drivers/net/ethernet/intel/igc/igc_main.c:5919:6: warning: Access to field 'preemptible' results in a dereference of an undefined pointer value (loaded from variable 'ring') [clang-analyzer-core.NullDereference]
           if (ring->preemptible) {
               ^
   drivers/net/ethernet/intel/igc/igc_main.c:6258:2: note: Control jumps to 'case TC_SETUP_QDISC_ETF:'  at line 6262
           switch (type) {
           ^
   drivers/net/ethernet/intel/igc/igc_main.c:6263:10: note: Calling 'igc_tsn_enable_launchtime'
                   return igc_tsn_enable_launchtime(adapter, type_data);
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5989:6: note: Assuming field 'type' is equal to igc_i225
           if (hw->mac.type != igc_i225)
               ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5989:2: note: Taking false branch
           if (hw->mac.type != igc_i225)
           ^
   drivers/net/ethernet/intel/igc/igc_main.c:5992:8: note: Calling 'igc_save_launchtime_params'
           err = igc_save_launchtime_params(adapter, qopt->queue, qopt->enable);
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5914:2: note: 'ring' declared without an initial value
           struct igc_ring *ring;
           ^~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5916:6: note: Assuming 'queue' is >= 0
           if (queue < 0 || queue >= adapter->num_tx_queues)
               ^~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5916:6: note: Left side of '||' is false
   drivers/net/ethernet/intel/igc/igc_main.c:5916:19: note: Assuming 'queue' is < field 'num_tx_queues'
           if (queue < 0 || queue >= adapter->num_tx_queues)
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:5916:2: note: Taking false branch
           if (queue < 0 || queue >= adapter->num_tx_queues)
           ^
   drivers/net/ethernet/intel/igc/igc_main.c:5919:6: note: Access to field 'preemptible' results in a dereference of an undefined pointer value (loaded from variable 'ring')
           if (ring->preemptible) {
               ^~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6169:2: warning: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6169:2: note: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11
           memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6533:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(&hw->mac.ops, ei->mac_ops, sizeof(hw->mac.ops));
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6533:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(&hw->mac.ops, ei->mac_ops, sizeof(hw->mac.ops));
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6534:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(&hw->phy.ops, ei->phy_ops, sizeof(hw->phy.ops));
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6534:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(&hw->phy.ops, ei->phy_ops, sizeof(hw->phy.ops));
           ^~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6651:2: warning: Call to function 'strncpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'strncpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           strncpy(netdev->name, "eth%d", IFNAMSIZ);
           ^~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6651:2: note: Call to function 'strncpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'strncpy_s' in case of C11
           strncpy(netdev->name, "eth%d", IFNAMSIZ);
           ^~~~~~~
   Suppressed 93 warnings (91 in non-user code, 2 with check filters).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   68 warnings generated.
   drivers/net/ethernet/brocade/bna/bna_enet.c:201:3: warning: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                   memset(stats_dst, 0, sizeof(struct bfi_enet_stats_rxf));
                   ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:201:3: note: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11
                   memset(stats_dst, 0, sizeof(struct bfi_enet_stats_rxf));
                   ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:216:3: warning: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                   memset(stats_dst, 0, sizeof(struct bfi_enet_stats_txf));
                   ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:216:3: note: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11
                   memset(stats_dst, 0, sizeof(struct bfi_enet_stats_txf));
                   ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:1732:2: warning: Value stored to 'kva' is never read [clang-analyzer-deadcode.DeadStores]
           kva += bfa_msgq_meminfo();
           ^      ~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:1732:2: note: Value stored to 'kva' is never read
           kva += bfa_msgq_meminfo();
           ^      ~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:1733:2: warning: Value stored to 'dma' is never read [clang-analyzer-deadcode.DeadStores]
           dma += bfa_msgq_meminfo();
           ^      ~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/brocade/bna/bna_enet.c:1733:2: note: Value stored to 'dma' is never read
           dma += bfa_msgq_meminfo();
           ^      ~~~~~~~~~~~~~~~~~~
   Suppressed 64 warnings (64 in non-user code).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   72 warnings generated.
   drivers/net/ethernet/brocade/bna/bna_tx_rx.c:305:2: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memcpy(&req->table[0], rxf->rit, rxf->rit_size);
           ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_tx_rx.c:305:2: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
           memcpy(&req->table[0], rxf->rit, rxf->rit_size);
           ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_tx_rx.c:640:2: warning: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memset(rxf->vlan_filter_table, 0,
           ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_tx_rx.c:640:2: note: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11
           memset(rxf->vlan_filter_table, 0,
           ^~~~~~
   drivers/net/ethernet/brocade/bna/bna_tx_rx.c:912:3: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]

vim +5919 drivers/net/ethernet/intel/igc/igc_main.c

5f2958052c5820 Vinicius Costa Gomes 2019-12-02  5910  
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5911  static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5912  				      bool enable)
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5913  {
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5914  	struct igc_ring *ring;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5915  
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5916  	if (queue < 0 || queue >= adapter->num_tx_queues)
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5917  		return -EINVAL;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5918  
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19 @5919  	if (ring->preemptible) {
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5920  		netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5921  		return -EINVAL;
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5922  	}
a42e940bc53c40 Vinicius Costa Gomes 2022-05-19  5923  
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5924  	ring = adapter->tx_ring[queue];
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5925  	ring->launchtime_enable = enable;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5926  
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5927  	return 0;
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5928  }
82faa9b799500f Vinicius Costa Gomes 2020-02-14  5929  
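
The flagged path above reads ring->preemptible at line 5919 before ring is assigned at line 5924. One possible fix, sketched here as a standalone program rather than as a patch against the driver, is to do the ring lookup before the preemptible check. The struct definitions below are hypothetical, trimmed-down stand-ins (the real ones live in the igc driver headers), the fixed array size is an assumption, and the netdev_err() call from the original is left out so the sketch builds outside the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; only the fields
 * touched by this function are modelled.
 */
struct igc_ring {
	bool preemptible;
	bool launchtime_enable;
};

struct igc_adapter {
	int num_tx_queues;
	struct igc_ring *tx_ring[4];	/* size chosen for the sketch only */
};

static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
				      bool enable)
{
	struct igc_ring *ring;

	if (queue < 0 || queue >= adapter->num_tx_queues)
		return -EINVAL;

	/* Look up the ring first, so no field is read through an
	 * uninitialized pointer.
	 */
	ring = adapter->tx_ring[queue];

	/* The driver also logs an error here via netdev_err(). */
	if (ring->preemptible)
		return -EINVAL;

	ring->launchtime_enable = enable;

	return 0;
}
```

With this ordering the queue bounds check still runs first, so the index into tx_ring is validated before use, and the preemptible check operates on the ring that was actually selected.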

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


Thread overview: 60+ messages
2022-05-20  1:15 [PATCH net-next v5 00/11] ethtool: Add support for frame preemption Vinicius Costa Gomes
2022-05-20  1:15 ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  1:15 ` [PATCH net-next v5 01/11] ethtool: Add support for configuring " Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  9:06   ` Vladimir Oltean
2022-05-20  9:06     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20  1:15 ` [PATCH net-next v5 02/11] ethtool: Add support for Frame Preemption verification Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  9:16   ` Vladimir Oltean
2022-05-20  9:16     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20  1:15 ` [PATCH net-next v5 03/11] igc: Add support for receiving frames with all zeroes address Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  1:15 ` [PATCH net-next v5 04/11] igc: Set the RX packet buffer size for TSN mode Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  1:15 ` [PATCH net-next v5 05/11] igc: Optimze TX buffer sizes for TSN Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  9:33   ` Vladimir Oltean
2022-05-20  9:33     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20  1:15 ` [PATCH net-next v5 06/11] igc: Add support for receiving errored frames Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  9:19   ` Vladimir Oltean
2022-05-20  9:19     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20  1:15 ` [PATCH net-next v5 07/11] igc: Add support for enabling frame preemption via ethtool Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  1:15 ` [PATCH net-next v5 08/11] igc: Add support for setting frame preemption configuration Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  9:22   ` Vladimir Oltean
2022-05-20  9:22     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20  1:15 ` [PATCH net-next v5 09/11] igc: Add support for Frame Preemption verification Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20 10:43   ` Vladimir Oltean
2022-05-20 10:43     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-27  9:08   ` Zhou Furong
2022-05-27  9:08     ` [Intel-wired-lan] " Zhou Furong
2022-05-20  1:15 ` [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20  6:11   ` kernel test robot
2022-05-20  6:11     ` [Intel-wired-lan] " kernel test robot
2022-05-20 11:06   ` Vladimir Oltean
2022-05-20 11:06     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20  1:15 ` [PATCH net-next v5 11/11] igc: Add support for exposing frame preemption stats registers Vinicius Costa Gomes
2022-05-20  1:15   ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-05-20 12:13   ` Vladimir Oltean
2022-05-20 12:13     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-20 22:34 ` [PATCH net-next v5 00/11] ethtool: Add support for frame preemption Jakub Kicinski
2022-05-20 22:34   ` [Intel-wired-lan] " Jakub Kicinski
2022-05-21 15:03   ` Vladimir Oltean
2022-05-21 15:03     ` [Intel-wired-lan] " Vladimir Oltean
2022-05-23 19:52     ` Jakub Kicinski
2022-05-23 19:52       ` [Intel-wired-lan] " Jakub Kicinski
2022-05-23 20:32       ` Vladimir Oltean
2022-05-23 20:32         ` [Intel-wired-lan] " Vladimir Oltean
2022-05-23 21:31         ` Jakub Kicinski
2022-05-23 21:31           ` [Intel-wired-lan] " Jakub Kicinski
2022-05-23 22:49           ` Vladimir Oltean
2022-05-23 22:49             ` [Intel-wired-lan] " Vladimir Oltean
2022-05-23 23:33       ` Vladimir Oltean
2022-05-23 23:33         ` [Intel-wired-lan] " Vladimir Oltean
2022-05-22 10:18 [PATCH net-next v5 10/11] igc: Check incompatible configs for Frame Preemption kernel test robot
2022-05-24  8:26 ` kernel test robot
