* [PATCH net-next v4 00/12] ethtool: Add support for frame preemption
@ 2021-06-26  0:33 ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Hi,

Once the APIs, now including verification, are settled, I can split
this series into smaller pieces to make further review easier. I am
proposing it as a single series so it's easier to get the full picture.


Changes from v3:
 - Added early support for sending/receiving verification frames
   (Vladimir Oltean). This is a bit more than RFC quality, but I am
   including it so people can see how it fits together with the rest.
   The driver-specific bits are interesting because the hardware does
   the absolute minimum; the driver needs to do the heavy lifting.

 - Added support for setting preemptible/express traffic classes via
   tc-mqprio (Vladimir Oltean). mqprio parsing of configuration
   options is... interesting, so comments here are going to be useful;
   I may have missed something.

Changes from v2:
 - Fixed some copy&paste mistakes, documentation formatting and
   slightly improved error reporting (Jakub Kicinski);

Changes from v1:
 - The minimum fragment size is now configured in bytes, to be more
   future proof in case the standard changes (the previous definition
   was '(X + 1) * 64', X being [0..3]) (Michal Kubecek);
 - In taprio, frame preemption is now configured by traffic classes (was
   done by queues) (Jakub Kicinski, Vladimir Oltean);
 - Various netlink protocol validation improvements (Jakub Kicinski);
 - Dropped the IGC register dump for frame preemption registers, until a
   standardized way of exposing that is agreed upon (Jakub Kicinski);

Changes from RFC v2:
 - Reorganised the offload enabling/disabling on the driver side;
 - Added a few igc fixes;

Changes from RFC v1:
 - The per-queue preemptible/express setting is moved to applicable
   qdiscs (Jakub Kicinski and others);
 - "min-frag-size" now follows the 802.3br specification more closely,
   it's expressed as X in '64(1 + X) + 4' (Joergen Andreasen);
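
As a worked example of that '64(1 + X) + 4' encoding (my arithmetic,
not text from the standard), the four possible values of X map to:

   X = 0  ->  64 * (1 + 0) + 4 =  68 octets
   X = 1  ->  64 * (1 + 1) + 4 = 132 octets
   X = 2  ->  64 * (1 + 2) + 4 = 196 octets
   X = 3  ->  64 * (1 + 3) + 4 = 260 octets

With the byte-based interface used in this version of the series, the
'min-frag-size 192' from the example below is converted to the
multiplier 2 by ethtool_frag_size_to_mult() in patch 01, since
(192 / 64) - 1 = 2.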

Another point that should be noted is the addition of the
TC_SETUP_PREEMPT offload type; the idea behind it is to allow other
qdiscs (I was thinking of mqprio) to also configure which traffic
classes should be marked as express/preemptible.
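
To make that concrete, here is a minimal sketch (not part of the
series) of how a driver's ndo_setup_tc() could consume the new offload
type; the foo_* names are invented, and the structure is the
tc_preempt_qopt_offload added in patch 02:

static int foo_setup_tc(struct net_device *dev, enum tc_setup_type type,
			void *type_data)
{
	struct foo_adapter *adapter = netdev_priv(dev);

	switch (type) {
	case TC_SETUP_PREEMPT: {
		struct tc_preempt_qopt_offload *qopt = type_data;

		/* Bit N set means TX queue N should be preemptible; an
		 * all-zero mask (what taprio sends when tearing down)
		 * marks every queue as express.
		 */
		return foo_set_preemptible_queues(adapter,
						  qopt->preemptible_queues);
	}
	default:
		return -EOPNOTSUPP;
	}
}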

Original cover letter (lightly edited):

This is still an RFC for two main reasons. First, I want to confirm
that this approach (per-queue settings via qdiscs, device settings via
ethtool) looks good, even though there aren't many options left ;-)
The other reason is that while testing this I found some weirdness
in the driver that I need a bit more time to investigate.

(In case these patches are not enough to give an idea of how things
work, I can send the userspace patches, of course.)

The idea of this "hybrid" approach is that applications/users would do
the following steps to configure frame preemption:

$ tc qdisc replace dev $IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time $BASE_TIME \
      sched-entry S 0f 10000000 \
      preempt 1110 \
      flags 0x2 

The "preempt" parameter is the only difference, it configures which
traffic classes are marked as preemptible, in this example, traffic
class 0 is marked as "not preemptible", so it is express, the rest of
the four traffic classes are preemptible.

The next step of this example would be to enable frame preemption in
the device, via ethtool, and set the minimum fragment size to 192 bytes:

$ sudo ./ethtool --set-frame-preemption $IFACE fp on min-frag-size 192
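
The current state can then be read back over the new
ETHTOOL_MSG_PREEMPT_GET netlink message. The userspace patches are not
included here, so the exact command name below is an assumption, but
it would look something like:

$ ethtool --show-frame-preemption $IFACE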

Cheers,


Vinicius Costa Gomes (12):
  ethtool: Add support for configuring frame preemption
  taprio: Add support for frame preemption offload
  core: Introduce netdev_tc_map_to_queue_mask()
  taprio: Replace tc_map_to_queue_mask()
  mqprio: Add support for frame preemption offload
  igc: Add support for enabling frame preemption via ethtool
  igc: Add support for TC_SETUP_PREEMPT
  igc: Simplify TSN flags handling
  igc: Add support for setting frame preemption configuration
  ethtool: Add support for Frame Preemption verification
  igc: Check incompatible configs for Frame Preemption
  igc: Add support for Frame Preemption verification

 Documentation/networking/ethtool-netlink.rst |  41 +++
 drivers/net/ethernet/intel/igc/igc.h         |  27 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  17 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  45 ++++
 drivers/net/ethernet/intel/igc/igc_main.c    | 249 ++++++++++++++++++-
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 127 ++++++----
 drivers/net/ethernet/intel/igc/igc_tsn.h     |   1 +
 include/linux/ethtool.h                      |  24 ++
 include/linux/netdevice.h                    |   2 +
 include/net/pkt_sched.h                      |   4 +
 include/uapi/linux/ethtool_netlink.h         |  19 ++
 include/uapi/linux/pkt_sched.h               |   2 +
 net/core/dev.c                               |  20 ++
 net/ethtool/Makefile                         |   2 +-
 net/ethtool/common.c                         |  25 ++
 net/ethtool/netlink.c                        |  19 ++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 157 ++++++++++++
 net/sched/sch_mqprio.c                       |  41 ++-
 net/sched/sch_taprio.c                       |  65 +++--
 20 files changed, 815 insertions(+), 76 deletions(-)
 create mode 100644 net/ethtool/preempt.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 01/12] ethtool: Add support for configuring frame preemption
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Frame preemption (described in IEEE 802.3-2018, Section 99 in
particular) defines the concept of preemptible and express queues. It
allows traffic from express queues to "interrupt" traffic from
preemptible queues, which is "resumed" after the express traffic has
finished transmitting.

Frame preemption can only be used when both the local device and the
link partner support it.

Only parameters for enabling/disabling frame preemption and
configuring the minimum fragment size are included here. Expressing
which queues are marked as preemptible is left to mqprio/taprio, as
having that information there should be easier on the user.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 Documentation/networking/ethtool-netlink.rst |  38 +++++
 include/linux/ethtool.h                      |  22 +++
 include/uapi/linux/ethtool_netlink.h         |  17 +++
 net/ethtool/Makefile                         |   2 +-
 net/ethtool/common.c                         |  25 ++++
 net/ethtool/netlink.c                        |  19 +++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 146 +++++++++++++++++++
 8 files changed, 272 insertions(+), 1 deletion(-)
 create mode 100644 net/ethtool/preempt.c

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 6ea91e41593f..a87f1716944e 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1477,6 +1477,44 @@ Low and high bounds are inclusive, for example:
  etherStatsPkts512to1023Octets 512  1023
  ============================= ==== ====
 
+PREEMPT_GET
+===========
+
+Get information about frame preemption state.
+
+Request contents:
+
+  ====================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``          nested  request header
+  ====================================  ======  ==========================
+
+Kernel response contents:
+
+  =====================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``           nested  reply header
+  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
+  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
+  =====================================  ======  ==========================
+
+``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
+fragment size that the receiver device supports.
+
+PREEMPT_SET
+===========
+
+Sets frame preemption parameters.
+
+Request contents:
+
+  =====================================  ======  ==========================
+  ``ETHTOOL_A_PREEMPT_HEADER``           nested  request header
+  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
+  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
+  =====================================  ======  ==========================
+
+``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
+fragment size that the receiver device supports.
+
 Request translation
 ===================
 
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 29dbb603bc91..7e449be8f335 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -409,6 +409,19 @@ struct ethtool_module_eeprom {
 	u8	*data;
 };
 
+/**
+ * struct ethtool_fp - Frame Preemption information
+ *
+ * @enabled: Enable frame preemption.
+ * @add_frag_size: Minimum size for additional (non-final) fragments
+ * in bytes, for the value defined in the IEEE 802.3-2018 standard see
+ * ethtool_frag_size_to_mult().
+ */
+struct ethtool_fp {
+	u8 enabled;
+	u32 add_frag_size;
+};
+
 /**
  * struct ethtool_ops - optional netdev operations
  * @cap_link_lanes_supported: indicates if the driver supports lanes
@@ -561,6 +574,8 @@ struct ethtool_module_eeprom {
  *	not report statistics.
  * @get_fecparam: Get the network device Forward Error Correction parameters.
  * @set_fecparam: Set the network device Forward Error Correction parameters.
+ * @get_preempt: Get the network device Frame Preemption parameters.
+ * @set_preempt: Set the network device Frame Preemption parameters.
  * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
  *	This is only useful if the device maintains PHY statistics and
  *	cannot use the standard PHY library helpers.
@@ -675,6 +690,10 @@ struct ethtool_ops {
 				      struct ethtool_fecparam *);
 	int	(*set_fecparam)(struct net_device *,
 				      struct ethtool_fecparam *);
+	int	(*get_preempt)(struct net_device *,
+			       struct ethtool_fp *);
+	int	(*set_preempt)(struct net_device *, struct ethtool_fp *,
+			       struct netlink_ext_ack *);
 	void	(*get_ethtool_phy_stats)(struct net_device *,
 					 struct ethtool_stats *, u64 *);
 	int	(*get_phy_tunable)(struct net_device *,
@@ -766,4 +785,7 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
  * next string.
  */
 extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
+
+u8 ethtool_frag_size_to_mult(u32 frag_size);
+
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index c7135c9c37a5..4600aba1c693 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -44,6 +44,8 @@ enum {
 	ETHTOOL_MSG_TUNNEL_INFO_GET,
 	ETHTOOL_MSG_FEC_GET,
 	ETHTOOL_MSG_FEC_SET,
+	ETHTOOL_MSG_PREEMPT_GET,
+	ETHTOOL_MSG_PREEMPT_SET,
 	ETHTOOL_MSG_MODULE_EEPROM_GET,
 	ETHTOOL_MSG_STATS_GET,
 
@@ -86,6 +88,8 @@ enum {
 	ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
 	ETHTOOL_MSG_FEC_GET_REPLY,
 	ETHTOOL_MSG_FEC_NTF,
+	ETHTOOL_MSG_PREEMPT_GET_REPLY,
+	ETHTOOL_MSG_PREEMPT_NTF,
 	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
 	ETHTOOL_MSG_STATS_GET_REPLY,
 
@@ -664,6 +668,19 @@ enum {
 	ETHTOOL_A_FEC_STAT_MAX = (__ETHTOOL_A_FEC_STAT_CNT - 1)
 };
 
+/* FRAME PREEMPTION */
+
+enum {
+	ETHTOOL_A_PREEMPT_UNSPEC,
+	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
+	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
+	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_PREEMPT_CNT,
+	ETHTOOL_A_PREEMPT_MAX = (__ETHTOOL_A_PREEMPT_CNT - 1)
+};
+
 /* MODULE EEPROM */
 
 enum {
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 723c9a8a8cdf..4b84b2d34c7a 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
 ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		   tunnels.o fec.o eeprom.o stats.o
+		   tunnels.o fec.o preempt.o eeprom.o stats.o
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index f9dcbad84788..68d123dd500b 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -579,3 +579,28 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
 	link_ksettings->base.duplex = link_info->duplex;
 }
 EXPORT_SYMBOL_GPL(ethtool_params_from_link_mode);
+
+/**
+ * ethtool_frag_size_to_mult() - Convert from a Frame Preemption
+ * Additional Fragment size in bytes to a multiplier.
+ * @frag_size: minimum non-final fragment size in bytes.
+ *
+ * The multiplier is defined as:
+ *	"A 2-bit integer value used to indicate the minimum size of
+ *	non-final fragments supported by the receiver on the given port
+ *	associated with the local System. This value is expressed in units
+ *	of 64 octets of additional fragment length."
+ *	Equivalent to `30.14.1.7 aMACMergeAddFragSize` from the IEEE 802.3-2018
+ *	standard.
+ *
+ * Return: the multiplier is a number in the [0, 3] interval.
+ */
+u8 ethtool_frag_size_to_mult(u32 frag_size)
+{
+	u8 mult = (frag_size / 64) - 1;
+
+	mult = clamp_t(u8, mult, 0, 3);
+
+	return mult;
+}
+EXPORT_SYMBOL_GPL(ethtool_frag_size_to_mult);
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index a7346346114f..f4e07b740790 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -246,6 +246,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
 	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_GET]		= &ethnl_fec_request_ops,
 	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
+	[ETHTOOL_MSG_PREEMPT_GET]	= &ethnl_preempt_request_ops,
 	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,
 	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
 };
@@ -561,6 +562,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
 	[ETHTOOL_MSG_PAUSE_NTF]		= &ethnl_pause_request_ops,
 	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
 	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
+	[ETHTOOL_MSG_PREEMPT_NTF]	= &ethnl_preempt_request_ops,
 };
 
 /* default notification handler */
@@ -654,6 +656,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
 	[ETHTOOL_MSG_PAUSE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
 	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
+	[ETHTOOL_MSG_PREEMPT_NTF]	= ethnl_default_notify,
 };
 
 void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
@@ -958,6 +961,22 @@ static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_stats_get_policy,
 		.maxattr = ARRAY_SIZE(ethnl_stats_get_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_PREEMPT_GET,
+		.doit	= ethnl_default_doit,
+		.start	= ethnl_default_start,
+		.dumpit	= ethnl_default_dumpit,
+		.done	= ethnl_default_done,
+		.policy = ethnl_preempt_get_policy,
+		.maxattr = ARRAY_SIZE(ethnl_preempt_get_policy) - 1,
+	},
+	{
+		.cmd	= ETHTOOL_MSG_PREEMPT_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_set_preempt,
+		.policy = ethnl_preempt_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_preempt_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 3e25a47fd482..cc90a463a81c 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -345,6 +345,7 @@ extern const struct ethnl_request_ops ethnl_pause_request_ops;
 extern const struct ethnl_request_ops ethnl_eee_request_ops;
 extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
 extern const struct ethnl_request_ops ethnl_fec_request_ops;
+extern const struct ethnl_request_ops ethnl_preempt_request_ops;
 extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
 extern const struct ethnl_request_ops ethnl_stats_request_ops;
 
@@ -381,6 +382,8 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF
 extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
+extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
+extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
@@ -400,6 +403,7 @@ int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);
 int ethnl_tunnel_info_start(struct netlink_callback *cb);
 int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
+int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info);
 
 extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
new file mode 100644
index 000000000000..4f96d3c2b1d5
--- /dev/null
+++ b/net/ethtool/preempt.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "netlink.h"
+#include "common.h"
+
+struct preempt_req_info {
+	struct ethnl_req_info		base;
+};
+
+struct preempt_reply_data {
+	struct ethnl_reply_data		base;
+	struct ethtool_fp		fp;
+};
+
+#define PREEMPT_REPDATA(__reply_base) \
+	container_of(__reply_base, struct preempt_reply_data, base)
+
+const struct nla_policy
+ethnl_preempt_get_policy[] = {
+	[ETHTOOL_A_PREEMPT_HEADER]		= NLA_POLICY_NESTED(ethnl_header_policy),
+};
+
+static int preempt_prepare_data(const struct ethnl_req_info *req_base,
+				struct ethnl_reply_data *reply_base,
+				struct genl_info *info)
+{
+	struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	struct net_device *dev = reply_base->dev;
+	int ret;
+
+	if (!dev->ethtool_ops->get_preempt)
+		return -EOPNOTSUPP;
+
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		return ret;
+
+	ret = dev->ethtool_ops->get_preempt(dev, &data->fp);
+	ethnl_ops_complete(dev);
+
+	return ret;
+}
+
+static int preempt_reply_size(const struct ethnl_req_info *req_base,
+			      const struct ethnl_reply_data *reply_base)
+{
+	int len = 0;
+
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
+	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
+
+	return len;
+}
+
+static int preempt_fill_reply(struct sk_buff *skb,
+			      const struct ethnl_req_info *req_base,
+			      const struct ethnl_reply_data *reply_base)
+{
+	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
+	const struct ethtool_fp *preempt = &data->fp;
+
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_ENABLED, preempt->enabled))
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,
+			preempt->add_frag_size))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+const struct ethnl_request_ops ethnl_preempt_request_ops = {
+	.request_cmd		= ETHTOOL_MSG_PREEMPT_GET,
+	.reply_cmd		= ETHTOOL_MSG_PREEMPT_GET_REPLY,
+	.hdr_attr		= ETHTOOL_A_PREEMPT_HEADER,
+	.req_info_size		= sizeof(struct preempt_req_info),
+	.reply_data_size	= sizeof(struct preempt_reply_data),
+
+	.prepare_data		= preempt_prepare_data,
+	.reply_size		= preempt_reply_size,
+	.fill_reply		= preempt_fill_reply,
+};
+
+const struct nla_policy
+ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
+	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
+	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
+};
+
+int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
+{
+	struct ethnl_req_info req_info = {};
+	struct nlattr **tb = info->attrs;
+	struct ethtool_fp preempt = {};
+	struct net_device *dev;
+	bool mod = false;
+	int ret;
+
+	ret = ethnl_parse_header_dev_get(&req_info,
+					 tb[ETHTOOL_A_PREEMPT_HEADER],
+					 genl_info_net(info), info->extack,
+					 true);
+	if (ret < 0)
+		return ret;
+	dev = req_info.dev;
+	ret = -EOPNOTSUPP;
+	if (!dev->ethtool_ops->get_preempt ||
+	    !dev->ethtool_ops->set_preempt)
+		goto out_dev;
+
+	rtnl_lock();
+	ret = ethnl_ops_begin(dev);
+	if (ret < 0)
+		goto out_rtnl;
+
+	ret = dev->ethtool_ops->get_preempt(dev, &preempt);
+	if (ret < 0) {
+		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");
+		goto out_ops;
+	}
+
+	ethnl_update_u8(&preempt.enabled,
+			tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
+	ethnl_update_u32(&preempt.add_frag_size,
+			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
+	ret = 0;
+	if (!mod)
+		goto out_ops;
+
+	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);
+	if (ret < 0) {
+		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");
+		goto out_ops;
+	}
+
+	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);
+
+out_ops:
+	ethnl_ops_complete(dev);
+out_rtnl:
+	rtnl_unlock();
+out_dev:
+	dev_put(dev);
+	return ret;
+}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread
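
For reference, a minimal sketch of the driver glue for the new
get_preempt()/set_preempt() ops added above; everything prefixed with
foo_ is hypothetical, and the 64-256 byte range check is only an
example of driver-side validation, not something this patch mandates:

static int foo_get_preempt(struct net_device *dev, struct ethtool_fp *fp)
{
	struct foo_adapter *adapter = netdev_priv(dev);

	fp->enabled = adapter->fp_enabled;
	fp->add_frag_size = adapter->fp_add_frag_size;

	return 0;
}

static int foo_set_preempt(struct net_device *dev, struct ethtool_fp *fp,
			   struct netlink_ext_ack *extack)
{
	struct foo_adapter *adapter = netdev_priv(dev);

	if (fp->add_frag_size < 64 || fp->add_frag_size > 256) {
		NL_SET_ERR_MSG_MOD(extack, "Unsupported additional fragment size");
		return -EINVAL;
	}

	adapter->fp_enabled = fp->enabled;
	adapter->fp_add_frag_size = fp->add_frag_size;

	/* Push the new settings to the hardware (hypothetical helper). */
	return foo_apply_tsn_config(adapter);
}

static const struct ethtool_ops foo_ethtool_ops = {
	/* ... existing ops ... */
	.get_preempt	= foo_get_preempt,
	.set_preempt	= foo_set_preempt,
};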

* [PATCH net-next v4 02/12] taprio: Add support for frame preemption offload
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Adds a way to configure which traffic classes are marked as
preemptible and which are marked as express.

Even if frame preemption is not a "real" offload (it can't be
executed purely in software), having this information near where the
mapping of traffic classes to queues is specified hopefully makes it
easier to use.

taprio receives the information about which traffic classes are
marked as express/preemptible and, when offloading frame preemption to
the driver, converts it, so the driver receives which queues are
marked as express/preemptible.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/netdevice.h      |  1 +
 include/net/pkt_sched.h        |  4 ++++
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_taprio.c         | 43 ++++++++++++++++++++++++++++++----
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index be1dcceda5e4..af5d4c5b0ad5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -923,6 +923,7 @@ enum tc_setup_type {
 	TC_SETUP_QDISC_TBF,
 	TC_SETUP_QDISC_FIFO,
 	TC_SETUP_QDISC_HTB,
+	TC_SETUP_PREEMPT,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6d7b12cba015..b4cb479d1cf5 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -178,6 +178,10 @@ struct tc_taprio_qopt_offload {
 	struct tc_taprio_sched_entry entries[];
 };
 
+struct tc_preempt_qopt_offload {
+	u32 preemptible_queues;
+};
+
 /* Reference counting */
 struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
 						  *offload);
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 79a699f106b1..830ce9c9ec6f 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1241,6 +1241,7 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
 	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
+	TCA_TAPRIO_ATTR_PREEMPT_TCS, /* u32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 66fe2b82af9a..58586f98c648 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -64,6 +64,7 @@ struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
 	u32 flags;
+	u32 preemptible_tcs;
 	enum tk_offsets tk_offset;
 	int clockid;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
@@ -786,6 +787,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
 	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
 	[TCA_TAPRIO_ATTR_FLAGS]                      = { .type = NLA_U32 },
 	[TCA_TAPRIO_ATTR_TXTIME_DELAY]		     = { .type = NLA_U32 },
+	[TCA_TAPRIO_ATTR_PREEMPT_TCS]                = { .type = NLA_U32 },
 };
 
 static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
@@ -1284,6 +1286,7 @@ static int taprio_disable_offload(struct net_device *dev,
 				  struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_preempt_qopt_offload preempt = { };
 	struct tc_taprio_qopt_offload *offload;
 	int err;
 
@@ -1302,13 +1305,15 @@ static int taprio_disable_offload(struct net_device *dev,
 	offload->enable = 0;
 
 	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
-	if (err < 0) {
+	if (err < 0)
 		NL_SET_ERR_MSG(extack,
-			       "Device failed to disable offload");
-		goto out;
-	}
+			       "Device failed to disable taprio offload");
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt);
+	if (err < 0)
+		NL_SET_ERR_MSG(extack,
+			       "Device failed to disable frame preemption offload");
 
-out:
 	taprio_offload_free(offload);
 
 	return err;
@@ -1525,6 +1530,29 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 					       mqprio->prio_tc_map[i]);
 	}
 
+	/* It's valid to enable frame preemption without any kind of
+	 * offloading being enabled, so keep it separated.
+	 */
+	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
+		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
+		struct tc_preempt_qopt_offload qopt = { };
+
+		if (preempt == U32_MAX) {
+			NL_SET_ERR_MSG(extack, "At least one queue must not be preemptible");
+			err = -EINVAL;
+			goto free_sched;
+		}
+
+		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
+
+		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
+						    &qopt);
+		if (err)
+			goto free_sched;
+
+		q->preemptible_tcs = preempt;
+	}
+
 	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
 		err = taprio_enable_offload(dev, q, new_admin, extack);
 	else
@@ -1681,6 +1709,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	 */
 	q->clockid = -1;
 	q->flags = TAPRIO_FLAGS_INVALID;
+	q->preemptible_tcs = U32_MAX;
 
 	spin_lock(&taprio_list_lock);
 	list_add(&q->taprio_list, &taprio_list);
@@ -1899,6 +1928,10 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))
 		goto options_error;
 
+	if (q->preemptible_tcs != U32_MAX &&
+	    nla_put_u32(skb, TCA_TAPRIO_ATTR_PREEMPT_TCS, q->preemptible_tcs))
+		goto options_error;
+
 	if (q->txtime_delay &&
 	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
 		goto options_error;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread
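
As a worked example of the conversion above, take the taprio
configuration from the cover letter ('queues 1@0 1@1 2@2',
'preempt 1110'), and assume the rightmost digit of the preempt mask is
traffic class 0:

   TC 0 -> queue 0    (express)
   TC 1 -> queue 1    (preemptible)
   TC 2 -> queues 2-3 (preemptible)

   tc mask 0b0110  ->  queue mask BIT(1) | GENMASK(3, 2) = 0b1110

so the driver is told that queues 1, 2 and 3 are preemptible and queue
0 stays express; bit 3 of the mask has no effect here because only
three traffic classes are configured.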

* [PATCH net-next v4 03/12] core: Introduce netdev_tc_map_to_queue_mask()
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Converts from a bitmask specifying traffic classes (bit 0 for traffic
class (TC) 0, bit 1 for TC 1, and so on) to a bitmask for queues. The
conversion is done using the netdev.tc_to_txq map.

netdev_tc_map_to_queue_mask()'s first users will be the mqprio and
taprio qdiscs.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index af5d4c5b0ad5..dcff0b9a55ab 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2279,6 +2279,7 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq);
 void netdev_reset_tc(struct net_device *dev);
 int netdev_set_tc_queue(struct net_device *dev, u8 tc, u16 count, u16 offset);
 int netdev_set_num_tc(struct net_device *dev, u8 num_tc);
+u32 netdev_tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask);
 
 static inline
 int netdev_get_num_tc(struct net_device *dev)
diff --git a/net/core/dev.c b/net/core/dev.c
index 991d09b67bd9..4b25dbd26243 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2956,6 +2956,26 @@ int netdev_set_num_tc(struct net_device *dev, u8 num_tc)
 }
 EXPORT_SYMBOL(netdev_set_num_tc);
 
+u32 netdev_tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
+{
+	u32 i, queue_mask = 0;
+
+	for (i = 0; i < dev->num_tc; i++) {
+		u32 offset, count;
+
+		if (!(tc_mask & BIT(i)))
+			continue;
+
+		offset = dev->tc_to_txq[i].offset;
+		count = dev->tc_to_txq[i].count;
+
+		queue_mask |= GENMASK(offset + count - 1, offset);
+	}
+
+	return queue_mask;
+}
+EXPORT_SYMBOL(netdev_tc_map_to_queue_mask);
+
 void netdev_unbind_sb_channel(struct net_device *dev,
 			      struct net_device *sb_dev)
 {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 03/12] core: Introduce netdev_tc_map_to_queue_mask()
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Converts from a bitmask specifying traffic classes (bit 0 for traffic
class (TC) 0, bit 1 for TC 1, and so on) to a bitmask for queues. The
conversion is done using the netdev.tc_to_txq map.

The first users of netdev_tc_map_to_queue_mask() will be the mqprio
and taprio qdiscs.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index af5d4c5b0ad5..dcff0b9a55ab 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2279,6 +2279,7 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq);
 void netdev_reset_tc(struct net_device *dev);
 int netdev_set_tc_queue(struct net_device *dev, u8 tc, u16 count, u16 offset);
 int netdev_set_num_tc(struct net_device *dev, u8 num_tc);
+u32 netdev_tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask);
 
 static inline
 int netdev_get_num_tc(struct net_device *dev)
diff --git a/net/core/dev.c b/net/core/dev.c
index 991d09b67bd9..4b25dbd26243 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2956,6 +2956,26 @@ int netdev_set_num_tc(struct net_device *dev, u8 num_tc)
 }
 EXPORT_SYMBOL(netdev_set_num_tc);
 
+u32 netdev_tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
+{
+	u32 i, queue_mask = 0;
+
+	for (i = 0; i < dev->num_tc; i++) {
+		u32 offset, count;
+
+		if (!(tc_mask & BIT(i)))
+			continue;
+
+		offset = dev->tc_to_txq[i].offset;
+		count = dev->tc_to_txq[i].count;
+
+		queue_mask |= GENMASK(offset + count - 1, offset);
+	}
+
+	return queue_mask;
+}
+EXPORT_SYMBOL(netdev_tc_map_to_queue_mask);
+
 void netdev_unbind_sb_channel(struct net_device *dev,
 			      struct net_device *sb_dev)
 {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 04/12] taprio: Replace tc_map_to_queue_mask()
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Replaces tc_map_to_queue_mask() with the newly introduced
netdev_tc_map_to_queue_mask().

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 net/sched/sch_taprio.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 58586f98c648..4e411ca3a9eb 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1201,25 +1201,6 @@ static void taprio_offload_config_changed(struct taprio_sched *q)
 	spin_unlock(&q->current_entry_lock);
 }
 
-static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
-{
-	u32 i, queue_mask = 0;
-
-	for (i = 0; i < dev->num_tc; i++) {
-		u32 offset, count;
-
-		if (!(tc_mask & BIT(i)))
-			continue;
-
-		offset = dev->tc_to_txq[i].offset;
-		count = dev->tc_to_txq[i].count;
-
-		queue_mask |= GENMASK(offset + count - 1, offset);
-	}
-
-	return queue_mask;
-}
-
 static void taprio_sched_to_offload(struct net_device *dev,
 				    struct sched_gate_list *sched,
 				    struct tc_taprio_qopt_offload *offload)
@@ -1236,7 +1217,7 @@ static void taprio_sched_to_offload(struct net_device *dev,
 
 		e->command = entry->command;
 		e->interval = entry->interval;
-		e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask);
+		e->gate_mask = netdev_tc_map_to_queue_mask(dev, entry->gate_mask);
 
 		i++;
 	}
@@ -1536,14 +1517,15 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
 		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
 		struct tc_preempt_qopt_offload qopt = { };
+		u32 all_tcs_mask = GENMASK(mqprio->num_tc - 1, 0);
 
-		if (preempt == U32_MAX) {
+		if ((preempt & all_tcs_mask) == all_tcs_mask) {
 			NL_SET_ERR_MSG(extack, "At least one queue must not be preemptible");
 			err = -EINVAL;
 			goto free_sched;
 		}
 
-		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
+		qopt.preemptible_queues = netdev_tc_map_to_queue_mask(dev, preempt);
 
 		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
 						    &qopt);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 04/12] taprio: Replace tc_map_to_queue_mask()
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Replaces tc_map_to_queue_mask() with the newly introduced
netdev_tc_map_to_queue_mask().

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 net/sched/sch_taprio.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 58586f98c648..4e411ca3a9eb 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1201,25 +1201,6 @@ static void taprio_offload_config_changed(struct taprio_sched *q)
 	spin_unlock(&q->current_entry_lock);
 }
 
-static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
-{
-	u32 i, queue_mask = 0;
-
-	for (i = 0; i < dev->num_tc; i++) {
-		u32 offset, count;
-
-		if (!(tc_mask & BIT(i)))
-			continue;
-
-		offset = dev->tc_to_txq[i].offset;
-		count = dev->tc_to_txq[i].count;
-
-		queue_mask |= GENMASK(offset + count - 1, offset);
-	}
-
-	return queue_mask;
-}
-
 static void taprio_sched_to_offload(struct net_device *dev,
 				    struct sched_gate_list *sched,
 				    struct tc_taprio_qopt_offload *offload)
@@ -1236,7 +1217,7 @@ static void taprio_sched_to_offload(struct net_device *dev,
 
 		e->command = entry->command;
 		e->interval = entry->interval;
-		e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask);
+		e->gate_mask = netdev_tc_map_to_queue_mask(dev, entry->gate_mask);
 
 		i++;
 	}
@@ -1536,14 +1517,15 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
 		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
 		struct tc_preempt_qopt_offload qopt = { };
+		u32 all_tcs_mask = GENMASK(mqprio->num_tc - 1, 0);
 
-		if (preempt == U32_MAX) {
+		if ((preempt & all_tcs_mask) == all_tcs_mask) {
 			NL_SET_ERR_MSG(extack, "At least one queue must not be preemptible");
 			err = -EINVAL;
 			goto free_sched;
 		}
 
-		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
+		qopt.preemptible_queues = netdev_tc_map_to_queue_mask(dev, preempt);
 
 		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
 						    &qopt);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 05/12] mqprio: Add support for frame preemption offload
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Adds a way to configure which traffic classes are marked as
preemptible and which are marked as express.

Even though frame preemption is not a "real" offload, because it can't
be executed purely in software, keeping this information close to where
the mapping of traffic classes to queues is specified should make it
easier to use.

mqprio receives the information about which traffic classes are marked
as express/preemptible and, when offloading frame preemption to the
driver, converts it so the driver receives which queues are marked as
express/preemptible.
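
As a rough illustration of the flow (the values below are made up for
the example, not taken from the patch): with 3 traffic classes and
queues laid out as 1@0 1@1 2@2, asking for TCs 1 and 2 to be
preemptible ends up as queue mask 0xe handed to the driver, while
asking for all three TCs is rejected because at least one TC has to
stay express:

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical example: 3 TCs, the all-TCs mask covers bits 0..num_tc-1. */
  int main(void)
  {
          uint32_t num_tc = 3;
          uint32_t all_tcs_mask = (1u << num_tc) - 1;  /* 0x7 */
          uint32_t preempt = 0x6;                      /* TCs 1 and 2 preemptible */

          if ((preempt & all_tcs_mask) == all_tcs_mask) {
                  fprintf(stderr, "rejected: at least one TC must stay express\n");
                  return 1;
          }

          /* mqprio would now translate 'preempt' into a queue mask via
           * netdev_tc_map_to_queue_mask() and pass it to ndo_setup_tc()
           * with TC_SETUP_PREEMPT.
           */
          printf("preemptible TC mask accepted: 0x%x\n", (unsigned int)preempt);
          return 0;
  }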

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_mqprio.c         | 41 ++++++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 830ce9c9ec6f..06aa155e46f7 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -738,6 +738,7 @@ enum {
 	TCA_MQPRIO_SHAPER,
 	TCA_MQPRIO_MIN_RATE64,
 	TCA_MQPRIO_MAX_RATE64,
+	TCA_MQPRIO_PREEMPT_TCS,
 	__TCA_MQPRIO_MAX,
 };
 
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 8766ab5b8788..86e6012f180a 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -23,6 +23,7 @@ struct mqprio_sched {
 	u16 shaper;
 	int hw_offload;
 	u32 flags;
+	u32 preemptible_tcs;
 	u64 min_rate[TC_QOPT_MAX_QUEUE];
 	u64 max_rate[TC_QOPT_MAX_QUEUE];
 };
@@ -33,6 +34,13 @@ static void mqprio_destroy(struct Qdisc *sch)
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	unsigned int ntx;
 
+	if (priv->preemptible_tcs && dev->netdev_ops->ndo_setup_tc) {
+		struct tc_preempt_qopt_offload preempt = { };
+
+		dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
+						    &preempt);
+	}
+
 	if (priv->qdiscs) {
 		for (ntx = 0;
 		     ntx < dev->num_tx_queues && priv->qdiscs[ntx];
@@ -112,6 +120,7 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
 	[TCA_MQPRIO_MODE]	= { .len = sizeof(u16) },
 	[TCA_MQPRIO_SHAPER]	= { .len = sizeof(u16) },
+	[TCA_MQPRIO_PREEMPT_TCS] = { .type = NLA_U32 },
 	[TCA_MQPRIO_MIN_RATE64]	= { .type = NLA_NESTED },
 	[TCA_MQPRIO_MAX_RATE64]	= { .type = NLA_NESTED },
 };
@@ -171,8 +180,17 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 		if (err < 0)
 			return err;
 
-		if (!qopt->hw)
-			return -EINVAL;
+		if (tb[TCA_MQPRIO_PREEMPT_TCS]) {
+			u32 preempt = nla_get_u32(tb[TCA_MQPRIO_PREEMPT_TCS]);
+			u32 all_tcs_mask = GENMASK(qopt->num_tc - 1, 0);
+
+			if ((preempt & all_tcs_mask) == all_tcs_mask) {
+				NL_SET_ERR_MSG(extack, "At least one traffic class must not be preemptible");
+				return -EINVAL;
+			}
+
+			priv->preemptible_tcs = preempt;
+		}
 
 		if (tb[TCA_MQPRIO_MODE]) {
 			priv->flags |= TC_MQPRIO_F_MODE;
@@ -217,6 +235,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 		}
 	}
 
+	if (!qopt->hw && priv->flags)
+		return -EINVAL;
+
 	/* pre-allocate qdisc, attachment can't fail */
 	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
 			       GFP_KERNEL);
@@ -282,6 +303,18 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 	for (i = 0; i < TC_BITMASK + 1; i++)
 		netdev_set_prio_tc_map(dev, i, qopt->prio_tc_map[i]);
 
+	if (priv->preemptible_tcs) {
+		struct tc_preempt_qopt_offload preempt = { };
+
+		preempt.preemptible_queues =
+			netdev_tc_map_to_queue_mask(dev, priv->preemptible_tcs);
+
+		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
+						    &preempt);
+		if (err)
+			return err;
+	}
+
 	sch->flags |= TCQ_F_MQROOT;
 	return 0;
 }
@@ -450,6 +483,10 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    (dump_rates(priv, &opt, skb) != 0))
 		goto nla_put_failure;
 
+	if (priv->preemptible_tcs &&
+	    nla_put_u32(skb, TCA_MQPRIO_PREEMPT_TCS, priv->preemptible_tcs))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, nla);
 nla_put_failure:
 	nlmsg_trim(skb, nla);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 05/12] mqprio: Add support for frame preemption offload
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Adds a way to configure which traffic classes are marked as
preemptible and which are marked as express.

Even though frame preemption is not a "real" offload, because it can't
be executed purely in software, keeping this information close to where
the mapping of traffic classes to queues is specified should make it
easier to use.

mqprio receives the information about which traffic classes are marked
as express/preemptible and, when offloading frame preemption to the
driver, converts it so the driver receives which queues are marked as
express/preemptible.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_mqprio.c         | 41 ++++++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 830ce9c9ec6f..06aa155e46f7 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -738,6 +738,7 @@ enum {
 	TCA_MQPRIO_SHAPER,
 	TCA_MQPRIO_MIN_RATE64,
 	TCA_MQPRIO_MAX_RATE64,
+	TCA_MQPRIO_PREEMPT_TCS,
 	__TCA_MQPRIO_MAX,
 };
 
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 8766ab5b8788..86e6012f180a 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -23,6 +23,7 @@ struct mqprio_sched {
 	u16 shaper;
 	int hw_offload;
 	u32 flags;
+	u32 preemptible_tcs;
 	u64 min_rate[TC_QOPT_MAX_QUEUE];
 	u64 max_rate[TC_QOPT_MAX_QUEUE];
 };
@@ -33,6 +34,13 @@ static void mqprio_destroy(struct Qdisc *sch)
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	unsigned int ntx;
 
+	if (priv->preemptible_tcs && dev->netdev_ops->ndo_setup_tc) {
+		struct tc_preempt_qopt_offload preempt = { };
+
+		dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
+						    &preempt);
+	}
+
 	if (priv->qdiscs) {
 		for (ntx = 0;
 		     ntx < dev->num_tx_queues && priv->qdiscs[ntx];
@@ -112,6 +120,7 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
 	[TCA_MQPRIO_MODE]	= { .len = sizeof(u16) },
 	[TCA_MQPRIO_SHAPER]	= { .len = sizeof(u16) },
+	[TCA_MQPRIO_PREEMPT_TCS] = { .type = NLA_U32 },
 	[TCA_MQPRIO_MIN_RATE64]	= { .type = NLA_NESTED },
 	[TCA_MQPRIO_MAX_RATE64]	= { .type = NLA_NESTED },
 };
@@ -171,8 +180,17 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 		if (err < 0)
 			return err;
 
-		if (!qopt->hw)
-			return -EINVAL;
+		if (tb[TCA_MQPRIO_PREEMPT_TCS]) {
+			u32 preempt = nla_get_u32(tb[TCA_MQPRIO_PREEMPT_TCS]);
+			u32 all_tcs_mask = GENMASK(qopt->num_tc - 1, 0);
+
+			if ((preempt & all_tcs_mask) == all_tcs_mask) {
+				NL_SET_ERR_MSG(extack, "At least one traffic class must not be preemptible");
+				return -EINVAL;
+			}
+
+			priv->preemptible_tcs = preempt;
+		}
 
 		if (tb[TCA_MQPRIO_MODE]) {
 			priv->flags |= TC_MQPRIO_F_MODE;
@@ -217,6 +235,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 		}
 	}
 
+	if (!qopt->hw && priv->flags)
+		return -EINVAL;
+
 	/* pre-allocate qdisc, attachment can't fail */
 	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
 			       GFP_KERNEL);
@@ -282,6 +303,18 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
 	for (i = 0; i < TC_BITMASK + 1; i++)
 		netdev_set_prio_tc_map(dev, i, qopt->prio_tc_map[i]);
 
+	if (priv->preemptible_tcs) {
+		struct tc_preempt_qopt_offload preempt = { };
+
+		preempt.preemptible_queues =
+			netdev_tc_map_to_queue_mask(dev, priv->preemptible_tcs);
+
+		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
+						    &preempt);
+		if (err)
+			return err;
+	}
+
 	sch->flags |= TCQ_F_MQROOT;
 	return 0;
 }
@@ -450,6 +483,10 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    (dump_rates(priv, &opt, skb) != 0))
 		goto nla_put_failure;
 
+	if (priv->preemptible_tcs &&
+	    nla_put_u32(skb, TCA_MQPRIO_PREEMPT_TCS, priv->preemptible_tcs))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, nla);
 nla_put_failure:
 	nlmsg_trim(skb, nla);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 06/12] igc: Add support for enabling frame preemption via ethtool
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Adds support for enabling frame preemption via ethtool. All that's
left to ethtool is saving the settings in the adapter state and
requesting that those settings be applied.

It's done this way because the TSN features (frame preemption is one
of them) interact with one another, and it's better to keep track of
them from a central place.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  2 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c | 31 ++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9e0bbb2e55e3..9afee4712aeb 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -173,6 +173,8 @@ struct igc_adapter {
 
 	ktime_t base_time;
 	ktime_t cycle_time;
+	bool frame_preemption_active;
+	u32 add_frag_size;
 
 	/* OS defined structs */
 	struct pci_dev *pdev;
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index fa4171860623..84d5afe92154 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -8,6 +8,7 @@
 
 #include "igc.h"
 #include "igc_diag.h"
+#include "igc_tsn.h"
 
 /* forward declaration */
 struct igc_stats {
@@ -1641,6 +1642,34 @@ static int igc_ethtool_set_eee(struct net_device *netdev,
 	return 0;
 }
 
+static int igc_ethtool_get_preempt(struct net_device *netdev,
+				   struct ethtool_fp *fpcmd)
+{
+	struct igc_adapter *adapter = netdev_priv(netdev);
+
+	fpcmd->enabled = adapter->frame_preemption_active;
+	fpcmd->add_frag_size = adapter->add_frag_size;
+
+	return 0;
+}
+
+static int igc_ethtool_set_preempt(struct net_device *netdev,
+				   struct ethtool_fp *fpcmd,
+				   struct netlink_ext_ack *extack)
+{
+	struct igc_adapter *adapter = netdev_priv(netdev);
+
+	if (fpcmd->add_frag_size < 68 || fpcmd->add_frag_size > 260) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid value for add-frag-size");
+		return -EINVAL;
+	}
+
+	adapter->frame_preemption_active = fpcmd->enabled;
+	adapter->add_frag_size = fpcmd->add_frag_size;
+
+	return igc_tsn_offload_apply(adapter);
+}
+
 static int igc_ethtool_begin(struct net_device *netdev)
 {
 	struct igc_adapter *adapter = netdev_priv(netdev);
@@ -1934,6 +1963,8 @@ static const struct ethtool_ops igc_ethtool_ops = {
 	.get_ts_info		= igc_ethtool_get_ts_info,
 	.get_channels		= igc_ethtool_get_channels,
 	.set_channels		= igc_ethtool_set_channels,
+	.get_preempt		= igc_ethtool_get_preempt,
+	.set_preempt		= igc_ethtool_set_preempt,
 	.get_priv_flags		= igc_ethtool_get_priv_flags,
 	.set_priv_flags		= igc_ethtool_set_priv_flags,
 	.get_eee		= igc_ethtool_get_eee,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 06/12] igc: Add support for enabling frame preemption via ethtool
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Adds support for enabling frame preemption via ethtool. All that's
left to ethtool is saving the settings in the adapter state and
requesting that those settings be applied.

It's done this way because the TSN features (frame preemption is one
of them) interact with one another, and it's better to keep track of
them from a central place.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  2 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c | 31 ++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9e0bbb2e55e3..9afee4712aeb 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -173,6 +173,8 @@ struct igc_adapter {
 
 	ktime_t base_time;
 	ktime_t cycle_time;
+	bool frame_preemption_active;
+	u32 add_frag_size;
 
 	/* OS defined structs */
 	struct pci_dev *pdev;
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index fa4171860623..84d5afe92154 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -8,6 +8,7 @@
 
 #include "igc.h"
 #include "igc_diag.h"
+#include "igc_tsn.h"
 
 /* forward declaration */
 struct igc_stats {
@@ -1641,6 +1642,34 @@ static int igc_ethtool_set_eee(struct net_device *netdev,
 	return 0;
 }
 
+static int igc_ethtool_get_preempt(struct net_device *netdev,
+				   struct ethtool_fp *fpcmd)
+{
+	struct igc_adapter *adapter = netdev_priv(netdev);
+
+	fpcmd->enabled = adapter->frame_preemption_active;
+	fpcmd->add_frag_size = adapter->add_frag_size;
+
+	return 0;
+}
+
+static int igc_ethtool_set_preempt(struct net_device *netdev,
+				   struct ethtool_fp *fpcmd,
+				   struct netlink_ext_ack *extack)
+{
+	struct igc_adapter *adapter = netdev_priv(netdev);
+
+	if (fpcmd->add_frag_size < 68 || fpcmd->add_frag_size > 260) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid value for add-frag-size");
+		return -EINVAL;
+	}
+
+	adapter->frame_preemption_active = fpcmd->enabled;
+	adapter->add_frag_size = fpcmd->add_frag_size;
+
+	return igc_tsn_offload_apply(adapter);
+}
+
 static int igc_ethtool_begin(struct net_device *netdev)
 {
 	struct igc_adapter *adapter = netdev_priv(netdev);
@@ -1934,6 +1963,8 @@ static const struct ethtool_ops igc_ethtool_ops = {
 	.get_ts_info		= igc_ethtool_get_ts_info,
 	.get_channels		= igc_ethtool_get_channels,
 	.set_channels		= igc_ethtool_set_channels,
+	.get_preempt		= igc_ethtool_get_preempt,
+	.set_preempt		= igc_ethtool_set_preempt,
 	.get_priv_flags		= igc_ethtool_get_priv_flags,
 	.set_priv_flags		= igc_ethtool_set_priv_flags,
 	.get_eee		= igc_ethtool_get_eee,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 07/12] igc: Add support for TC_SETUP_PREEMPT
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Saves which queues are marked as preemptible; activating frame
preemption itself only happens when the user enables it via ethtool.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |  1 +
 drivers/net/ethernet/intel/igc/igc_main.c | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9afee4712aeb..68c7262bd172 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -92,6 +92,7 @@ struct igc_ring {
 	u8 queue_index;                 /* logical index of the ring*/
 	u8 reg_idx;                     /* physical index of the ring */
 	bool launchtime_enable;         /* true if LaunchTime is enabled */
+	bool preemptible;               /* true if not express */
 
 	u32 start_time;
 	u32 end_time;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 3f6b6d4543a8..b0981ea0ae63 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5563,6 +5563,23 @@ static int igc_save_qbv_schedule(struct igc_adapter *adapter,
 	return 0;
 }
 
+static int igc_save_frame_preemption(struct igc_adapter *adapter,
+				     struct tc_preempt_qopt_offload *qopt)
+{
+	u32 preempt;
+	int i;
+
+	preempt = qopt->preemptible_queues;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		struct igc_ring *ring = adapter->tx_ring[i];
+
+		ring->preemptible = preempt & BIT(i);
+	}
+
+	return 0;
+}
+
 static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
 					 struct tc_taprio_qopt_offload *qopt)
 {
@@ -5591,6 +5608,9 @@ static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	case TC_SETUP_QDISC_ETF:
 		return igc_tsn_enable_launchtime(adapter, type_data);
 
+	case TC_SETUP_PREEMPT:
+		return igc_save_frame_preemption(adapter, type_data);
+
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 07/12] igc: Add support for TC_SETUP_PREEMPT
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Saves which queues are marked as preemptible; activating frame
preemption itself only happens when the user enables it via ethtool.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |  1 +
 drivers/net/ethernet/intel/igc/igc_main.c | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9afee4712aeb..68c7262bd172 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -92,6 +92,7 @@ struct igc_ring {
 	u8 queue_index;                 /* logical index of the ring*/
 	u8 reg_idx;                     /* physical index of the ring */
 	bool launchtime_enable;         /* true if LaunchTime is enabled */
+	bool preemptible;               /* true if not express */
 
 	u32 start_time;
 	u32 end_time;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 3f6b6d4543a8..b0981ea0ae63 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5563,6 +5563,23 @@ static int igc_save_qbv_schedule(struct igc_adapter *adapter,
 	return 0;
 }
 
+static int igc_save_frame_preemption(struct igc_adapter *adapter,
+				     struct tc_preempt_qopt_offload *qopt)
+{
+	u32 preempt;
+	int i;
+
+	preempt = qopt->preemptible_queues;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		struct igc_ring *ring = adapter->tx_ring[i];
+
+		ring->preemptible = preempt & BIT(i);
+	}
+
+	return 0;
+}
+
 static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
 					 struct tc_taprio_qopt_offload *qopt)
 {
@@ -5591,6 +5608,9 @@ static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	case TC_SETUP_QDISC_ETF:
 		return igc_tsn_enable_launchtime(adapter, type_data);
 
+	case TC_SETUP_PREEMPT:
+		return igc_save_frame_preemption(adapter, type_data);
+
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 08/12] igc: Simplify TSN flags handling
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Separates the procedure done during reset from applying a
configuration. Knowing in which context the code is running allows us
to better separate what changes the hardware state from what changes
only the driver state.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |  4 ++
 drivers/net/ethernet/intel/igc/igc_main.c |  2 +-
 drivers/net/ethernet/intel/igc/igc_tsn.c  | 71 ++++++++++++++---------
 drivers/net/ethernet/intel/igc/igc_tsn.h  |  1 +
 4 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 68c7262bd172..ccd5f6b02e3a 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -290,6 +290,10 @@ extern char igc_driver_name[];
 #define IGC_FLAG_VLAN_PROMISC		BIT(15)
 #define IGC_FLAG_RX_LEGACY		BIT(16)
 #define IGC_FLAG_TSN_QBV_ENABLED	BIT(17)
+#define IGC_FLAG_TSN_PREEMPT_ENABLED	BIT(18)
+
+#define IGC_FLAG_TSN_ANY_ENABLED \
+	(IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_PREEMPT_ENABLED)
 
 #define IGC_FLAG_RSS_FIELD_IPV4_UDP	BIT(6)
 #define IGC_FLAG_RSS_FIELD_IPV6_UDP	BIT(7)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index b0981ea0ae63..038383519b10 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -118,7 +118,7 @@ void igc_reset(struct igc_adapter *adapter)
 	igc_ptp_reset(adapter);
 
 	/* Re-enable TSN offloading, where applicable. */
-	igc_tsn_offload_apply(adapter);
+	igc_tsn_reset(adapter);
 
 	igc_get_phy_info(hw);
 }
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 174103c4bea6..f2dfc8059847 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -18,8 +18,21 @@ static bool is_any_launchtime(struct igc_adapter *adapter)
 	return false;
 }
 
+static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
+{
+	unsigned int new_flags = adapter->flags & ~IGC_FLAG_TSN_ANY_ENABLED;
+
+	if (adapter->base_time)
+		new_flags |= IGC_FLAG_TSN_QBV_ENABLED;
+
+	if (is_any_launchtime(adapter))
+		new_flags |= IGC_FLAG_TSN_QBV_ENABLED;
+
+	return new_flags;
+}
+
 /* Returns the TSN specific registers to their default values after
- * TSN offloading is disabled.
+ * the adapter is reset.
  */
 static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 {
@@ -27,11 +40,6 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl;
 	int i;
 
-	if (!(adapter->flags & IGC_FLAG_TSN_QBV_ENABLED))
-		return 0;
-
-	adapter->cycle_time = 0;
-
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
@@ -68,9 +76,6 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	ktime_t base_time, systim;
 	int i;
 
-	if (adapter->flags & IGC_FLAG_TSN_QBV_ENABLED)
-		return 0;
-
 	cycle = adapter->cycle_time;
 	base_time = adapter->base_time;
 
@@ -125,33 +130,41 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	wr32(IGC_BASET_H, baset_h);
 	wr32(IGC_BASET_L, baset_l);
 
-	adapter->flags |= IGC_FLAG_TSN_QBV_ENABLED;
-
 	return 0;
 }
 
+int igc_tsn_reset(struct igc_adapter *adapter)
+{
+	unsigned int new_flags;
+	int err = 0;
+
+	new_flags = igc_tsn_new_flags(adapter);
+
+	if (!(new_flags & IGC_FLAG_TSN_ANY_ENABLED))
+		return igc_tsn_disable_offload(adapter);
+
+	err = igc_tsn_enable_offload(adapter);
+	if (err < 0)
+		return err;
+
+	adapter->flags = new_flags;
+
+	return err;
+}
+
 int igc_tsn_offload_apply(struct igc_adapter *adapter)
 {
-	bool is_any_enabled = adapter->base_time || is_any_launchtime(adapter);
-
-	if (!(adapter->flags & IGC_FLAG_TSN_QBV_ENABLED) && !is_any_enabled)
-		return 0;
-
-	if (!is_any_enabled) {
-		int err = igc_tsn_disable_offload(adapter);
-
-		if (err < 0)
-			return err;
-
-		/* The BASET registers aren't cleared when writing
-		 * into them, force a reset if the interface is
-		 * running.
-		 */
-		if (netif_running(adapter->netdev))
-			schedule_work(&adapter->reset_task);
+	int err;
 
+	if (netif_running(adapter->netdev)) {
+		schedule_work(&adapter->reset_task);
 		return 0;
 	}
 
-	return igc_tsn_enable_offload(adapter);
+	err = igc_tsn_enable_offload(adapter);
+	if (err < 0)
+		return err;
+
+	adapter->flags = igc_tsn_new_flags(adapter);
+	return 0;
 }
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.h b/drivers/net/ethernet/intel/igc/igc_tsn.h
index f76bc86ddccd..1512307f5a52 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.h
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.h
@@ -5,5 +5,6 @@
 #define _IGC_TSN_H_
 
 int igc_tsn_offload_apply(struct igc_adapter *adapter);
+int igc_tsn_reset(struct igc_adapter *adapter);
 
 #endif /* _IGC_BASE_H */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 08/12] igc: Simplify TSN flags handling
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Separates the procedure done during reset from applying a
configuration. Knowing in which context the code is running allows us
to better separate what changes the hardware state from what changes
only the driver state.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |  4 ++
 drivers/net/ethernet/intel/igc/igc_main.c |  2 +-
 drivers/net/ethernet/intel/igc/igc_tsn.c  | 71 ++++++++++++++---------
 drivers/net/ethernet/intel/igc/igc_tsn.h  |  1 +
 4 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 68c7262bd172..ccd5f6b02e3a 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -290,6 +290,10 @@ extern char igc_driver_name[];
 #define IGC_FLAG_VLAN_PROMISC		BIT(15)
 #define IGC_FLAG_RX_LEGACY		BIT(16)
 #define IGC_FLAG_TSN_QBV_ENABLED	BIT(17)
+#define IGC_FLAG_TSN_PREEMPT_ENABLED	BIT(18)
+
+#define IGC_FLAG_TSN_ANY_ENABLED \
+	(IGC_FLAG_TSN_QBV_ENABLED | IGC_FLAG_TSN_PREEMPT_ENABLED)
 
 #define IGC_FLAG_RSS_FIELD_IPV4_UDP	BIT(6)
 #define IGC_FLAG_RSS_FIELD_IPV6_UDP	BIT(7)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index b0981ea0ae63..038383519b10 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -118,7 +118,7 @@ void igc_reset(struct igc_adapter *adapter)
 	igc_ptp_reset(adapter);
 
 	/* Re-enable TSN offloading, where applicable. */
-	igc_tsn_offload_apply(adapter);
+	igc_tsn_reset(adapter);
 
 	igc_get_phy_info(hw);
 }
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 174103c4bea6..f2dfc8059847 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -18,8 +18,21 @@ static bool is_any_launchtime(struct igc_adapter *adapter)
 	return false;
 }
 
+static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
+{
+	unsigned int new_flags = adapter->flags & ~IGC_FLAG_TSN_ANY_ENABLED;
+
+	if (adapter->base_time)
+		new_flags |= IGC_FLAG_TSN_QBV_ENABLED;
+
+	if (is_any_launchtime(adapter))
+		new_flags |= IGC_FLAG_TSN_QBV_ENABLED;
+
+	return new_flags;
+}
+
 /* Returns the TSN specific registers to their default values after
- * TSN offloading is disabled.
+ * the adapter is reset.
  */
 static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 {
@@ -27,11 +40,6 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl;
 	int i;
 
-	if (!(adapter->flags & IGC_FLAG_TSN_QBV_ENABLED))
-		return 0;
-
-	adapter->cycle_time = 0;
-
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
@@ -68,9 +76,6 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	ktime_t base_time, systim;
 	int i;
 
-	if (adapter->flags & IGC_FLAG_TSN_QBV_ENABLED)
-		return 0;
-
 	cycle = adapter->cycle_time;
 	base_time = adapter->base_time;
 
@@ -125,33 +130,41 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	wr32(IGC_BASET_H, baset_h);
 	wr32(IGC_BASET_L, baset_l);
 
-	adapter->flags |= IGC_FLAG_TSN_QBV_ENABLED;
-
 	return 0;
 }
 
+int igc_tsn_reset(struct igc_adapter *adapter)
+{
+	unsigned int new_flags;
+	int err = 0;
+
+	new_flags = igc_tsn_new_flags(adapter);
+
+	if (!(new_flags & IGC_FLAG_TSN_ANY_ENABLED))
+		return igc_tsn_disable_offload(adapter);
+
+	err = igc_tsn_enable_offload(adapter);
+	if (err < 0)
+		return err;
+
+	adapter->flags = new_flags;
+
+	return err;
+}
+
 int igc_tsn_offload_apply(struct igc_adapter *adapter)
 {
-	bool is_any_enabled = adapter->base_time || is_any_launchtime(adapter);
-
-	if (!(adapter->flags & IGC_FLAG_TSN_QBV_ENABLED) && !is_any_enabled)
-		return 0;
-
-	if (!is_any_enabled) {
-		int err = igc_tsn_disable_offload(adapter);
-
-		if (err < 0)
-			return err;
-
-		/* The BASET registers aren't cleared when writing
-		 * into them, force a reset if the interface is
-		 * running.
-		 */
-		if (netif_running(adapter->netdev))
-			schedule_work(&adapter->reset_task);
+	int err;
 
+	if (netif_running(adapter->netdev)) {
+		schedule_work(&adapter->reset_task);
 		return 0;
 	}
 
-	return igc_tsn_enable_offload(adapter);
+	err = igc_tsn_enable_offload(adapter);
+	if (err < 0)
+		return err;
+
+	adapter->flags = igc_tsn_new_flags(adapter);
+	return 0;
 }
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.h b/drivers/net/ethernet/intel/igc/igc_tsn.h
index f76bc86ddccd..1512307f5a52 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.h
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.h
@@ -5,5 +5,6 @@
 #define _IGC_TSN_H_
 
 int igc_tsn_offload_apply(struct igc_adapter *adapter);
+int igc_tsn_reset(struct igc_adapter *adapter);
 
 #endif /* _IGC_BASE_H */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 09/12] igc: Add support for setting frame preemption configuration
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Sets the hardware register that enables the frame preemption feature.

Some code is moved around because the PREEMPT_ENA bit in the
IGC_TQAVCTRL register is recommended to be set after the individual
queue registers (IGC_TXQCTL[i]) are set.
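
For reference, the MIN_FRAG field written below is only 2 bits wide. A
rough sketch of how an additional fragment size in bytes could map to
that multiplier, assuming the 802.3br sizes 64 * (1 + X) + 4 (68, 132,
196 and 260 bytes), follows. The helper name and the rounding of
in-between values are assumptions made for illustration only; the
series relies on an ethtool helper (ethtool_frag_size_to_mult()) whose
exact definition is not part of this patch:

  #include <stdio.h>

  /* Sketch only: map add_frag_size (bytes) to a 2-bit MIN_FRAG value,
   * assuming 64 * (1 + X) + 4 with X in 0..3. Rounding behaviour for
   * in-between values is a guess, not the actual ethtool helper.
   */
  static unsigned int frag_size_to_mult_sketch(unsigned int add_frag_size)
  {
          unsigned int mult;

          if (add_frag_size <= 68)
                  return 0;

          mult = (add_frag_size - 4) / 64 - 1;
          return mult > 3 ? 3 : mult;
  }

  int main(void)
  {
          unsigned int sizes[] = { 68, 132, 196, 260 };
          int i;

          for (i = 0; i < 4; i++)
                  printf("%u bytes -> MIN_FRAG %u\n", sizes[i],
                         frag_size_to_mult_sketch(sizes[i]));
          return 0;
  }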

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  5 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  4 ++
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 58 +++++++++++++-------
 3 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index ccd5f6b02e3a..9b2ddcbf65fb 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -342,6 +342,11 @@ extern char igc_driver_name[];
 #define IGC_I225_RX_LATENCY_1000	300
 #define IGC_I225_RX_LATENCY_2500	1485
 
+/* From the datasheet section 8.12.4 Tx Qav Control TQAVCTRL,
+ * MIN_FRAG initial value.
+ */
+#define IGC_I225_MIN_FRAG_SIZE_DEFAULT	68
+
 /* RX and TX descriptor control thresholds.
  * PTHRESH - MAC will consider prefetch if it has fewer than this number of
  *           descriptors available in its onboard memory.
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index c3a5a5518790..a2ea057d8e6e 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -472,10 +472,14 @@
 /* Transmit Scheduling */
 #define IGC_TQAVCTRL_TRANSMIT_MODE_TSN	0x00000001
 #define IGC_TQAVCTRL_ENHANCED_QAV	0x00000008
+#define IGC_TQAVCTRL_PREEMPT_ENA	0x00000002
+#define IGC_TQAVCTRL_MIN_FRAG_MASK	0x0000C000
+#define IGC_TQAVCTRL_MIN_FRAG_SHIFT	14
 
 #define IGC_TXQCTL_QUEUE_MODE_LAUNCHT	0x00000001
 #define IGC_TXQCTL_STRICT_CYCLE		0x00000002
 #define IGC_TXQCTL_STRICT_END		0x00000004
+#define IGC_TXQCTL_PREEMPTABLE		0x00000008
 
 /* Receive Checksum Control */
 #define IGC_RXCSUM_CRCOFL	0x00000800   /* CRC32 offload enable */
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index f2dfc8059847..8af5b03e17ed 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -28,6 +28,9 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
 	if (is_any_launchtime(adapter))
 		new_flags |= IGC_FLAG_TSN_QBV_ENABLED;
 
+	if (adapter->frame_preemption_active)
+		new_flags |= IGC_FLAG_TSN_PREEMPT_ENABLED;
+
 	return new_flags;
 }
 
@@ -40,12 +43,15 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl;
 	int i;
 
+	adapter->add_frag_size = IGC_I225_MIN_FRAG_SIZE_DEFAULT;
+
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
 	tqavctrl = rd32(IGC_TQAVCTRL);
 	tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN |
-		      IGC_TQAVCTRL_ENHANCED_QAV);
+		      IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_PREEMPT_ENA |
+		      IGC_TQAVCTRL_MIN_FRAG_MASK);
 	wr32(IGC_TQAVCTRL, tqavctrl);
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
@@ -63,7 +69,7 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	wr32(IGC_QBVCYCLET_S, NSEC_PER_SEC);
 	wr32(IGC_QBVCYCLET, NSEC_PER_SEC);
 
-	adapter->flags &= ~IGC_FLAG_TSN_QBV_ENABLED;
+	adapter->flags &= ~IGC_FLAG_TSN_ANY_ENABLED;
 
 	return 0;
 }
@@ -74,22 +80,36 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl, baset_l, baset_h;
 	u32 sec, nsec, cycle;
 	ktime_t base_time, systim;
+	u32 frag_size_mult;
 	int i;
 
-	cycle = adapter->cycle_time;
-	base_time = adapter->base_time;
-
 	wr32(IGC_TSAUXC, 0);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN);
 	wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
 
-	tqavctrl = rd32(IGC_TQAVCTRL);
-	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
-	wr32(IGC_TQAVCTRL, tqavctrl);
+	cycle = adapter->cycle_time;
+	base_time = adapter->base_time;
 
 	wr32(IGC_QBVCYCLET_S, cycle);
 	wr32(IGC_QBVCYCLET, cycle);
 
+	nsec = rd32(IGC_SYSTIML);
+	sec = rd32(IGC_SYSTIMH);
+
+	systim = ktime_set(sec, nsec);
+
+	if (ktime_compare(systim, base_time) > 0) {
+		s64 n;
+
+		n = div64_s64(ktime_sub_ns(systim, base_time), cycle);
+		base_time = ktime_add_ns(base_time, (n + 1) * cycle);
+	}
+
+	baset_h = div_s64_rem(base_time, NSEC_PER_SEC, &baset_l);
+
+	wr32(IGC_BASET_H, baset_h);
+	wr32(IGC_BASET_L, baset_l);
+
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
 		u32 txqctl = 0;
@@ -110,25 +130,25 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 		if (ring->launchtime_enable)
 			txqctl |= IGC_TXQCTL_QUEUE_MODE_LAUNCHT;
 
+		if (adapter->frame_preemption_active && ring->preemptible)
+			txqctl |= IGC_TXQCTL_PREEMPTABLE;
+
 		wr32(IGC_TXQCTL(i), txqctl);
 	}
 
-	nsec = rd32(IGC_SYSTIML);
-	sec = rd32(IGC_SYSTIMH);
+	tqavctrl = rd32(IGC_TQAVCTRL) &
+		~(IGC_TQAVCTRL_MIN_FRAG_MASK | IGC_TQAVCTRL_PREEMPT_ENA);
 
-	systim = ktime_set(sec, nsec);
+	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
 
-	if (ktime_compare(systim, base_time) > 0) {
-		s64 n;
+	if (adapter->frame_preemption_active)
+		tqavctrl |= IGC_TQAVCTRL_PREEMPT_ENA;
 
-		n = div64_s64(ktime_sub_ns(systim, base_time), cycle);
-		base_time = ktime_add_ns(base_time, (n + 1) * cycle);
-	}
+	frag_size_mult = ethtool_frag_size_to_mult(adapter->add_frag_size);
 
-	baset_h = div_s64_rem(base_time, NSEC_PER_SEC, &baset_l);
+	tqavctrl |= frag_size_mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT;
 
-	wr32(IGC_BASET_H, baset_h);
-	wr32(IGC_BASET_L, baset_l);
+	wr32(IGC_TQAVCTRL, tqavctrl);
 
 	return 0;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 09/12] igc: Add support for setting frame preemption configuration
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Sets the hardware register that enables the frame preemption feature.

Some code is moved around because the PREEMPT_ENA bit in the
IGC_TQAVCTRL register is recommended to be set after the individual
queue registers (IGC_TXQCTL[i]) are set.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  5 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  4 ++
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 58 +++++++++++++-------
 3 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index ccd5f6b02e3a..9b2ddcbf65fb 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -342,6 +342,11 @@ extern char igc_driver_name[];
 #define IGC_I225_RX_LATENCY_1000	300
 #define IGC_I225_RX_LATENCY_2500	1485
 
+/* From the datasheet section 8.12.4 Tx Qav Control TQAVCTRL,
+ * MIN_FRAG initial value.
+ */
+#define IGC_I225_MIN_FRAG_SIZE_DEFAULT	68
+
 /* RX and TX descriptor control thresholds.
  * PTHRESH - MAC will consider prefetch if it has fewer than this number of
  *           descriptors available in its onboard memory.
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index c3a5a5518790..a2ea057d8e6e 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -472,10 +472,14 @@
 /* Transmit Scheduling */
 #define IGC_TQAVCTRL_TRANSMIT_MODE_TSN	0x00000001
 #define IGC_TQAVCTRL_ENHANCED_QAV	0x00000008
+#define IGC_TQAVCTRL_PREEMPT_ENA	0x00000002
+#define IGC_TQAVCTRL_MIN_FRAG_MASK	0x0000C000
+#define IGC_TQAVCTRL_MIN_FRAG_SHIFT	14
 
 #define IGC_TXQCTL_QUEUE_MODE_LAUNCHT	0x00000001
 #define IGC_TXQCTL_STRICT_CYCLE		0x00000002
 #define IGC_TXQCTL_STRICT_END		0x00000004
+#define IGC_TXQCTL_PREEMPTABLE		0x00000008
 
 /* Receive Checksum Control */
 #define IGC_RXCSUM_CRCOFL	0x00000800   /* CRC32 offload enable */
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index f2dfc8059847..8af5b03e17ed 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -28,6 +28,9 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter)
 	if (is_any_launchtime(adapter))
 		new_flags |= IGC_FLAG_TSN_QBV_ENABLED;
 
+	if (adapter->frame_preemption_active)
+		new_flags |= IGC_FLAG_TSN_PREEMPT_ENABLED;
+
 	return new_flags;
 }
 
@@ -40,12 +43,15 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl;
 	int i;
 
+	adapter->add_frag_size = IGC_I225_MIN_FRAG_SIZE_DEFAULT;
+
 	wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
 
 	tqavctrl = rd32(IGC_TQAVCTRL);
 	tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN |
-		      IGC_TQAVCTRL_ENHANCED_QAV);
+		      IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_PREEMPT_ENA |
+		      IGC_TQAVCTRL_MIN_FRAG_MASK);
 	wr32(IGC_TQAVCTRL, tqavctrl);
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
@@ -63,7 +69,7 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter)
 	wr32(IGC_QBVCYCLET_S, NSEC_PER_SEC);
 	wr32(IGC_QBVCYCLET, NSEC_PER_SEC);
 
-	adapter->flags &= ~IGC_FLAG_TSN_QBV_ENABLED;
+	adapter->flags &= ~IGC_FLAG_TSN_ANY_ENABLED;
 
 	return 0;
 }
@@ -74,22 +80,36 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 	u32 tqavctrl, baset_l, baset_h;
 	u32 sec, nsec, cycle;
 	ktime_t base_time, systim;
+	u32 frag_size_mult;
 	int i;
 
-	cycle = adapter->cycle_time;
-	base_time = adapter->base_time;
-
 	wr32(IGC_TSAUXC, 0);
 	wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN);
 	wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
 
-	tqavctrl = rd32(IGC_TQAVCTRL);
-	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
-	wr32(IGC_TQAVCTRL, tqavctrl);
+	cycle = adapter->cycle_time;
+	base_time = adapter->base_time;
 
 	wr32(IGC_QBVCYCLET_S, cycle);
 	wr32(IGC_QBVCYCLET, cycle);
 
+	nsec = rd32(IGC_SYSTIML);
+	sec = rd32(IGC_SYSTIMH);
+
+	systim = ktime_set(sec, nsec);
+
+	if (ktime_compare(systim, base_time) > 0) {
+		s64 n;
+
+		n = div64_s64(ktime_sub_ns(systim, base_time), cycle);
+		base_time = ktime_add_ns(base_time, (n + 1) * cycle);
+	}
+
+	baset_h = div_s64_rem(base_time, NSEC_PER_SEC, &baset_l);
+
+	wr32(IGC_BASET_H, baset_h);
+	wr32(IGC_BASET_L, baset_l);
+
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
 		u32 txqctl = 0;
@@ -110,25 +130,25 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter)
 		if (ring->launchtime_enable)
 			txqctl |= IGC_TXQCTL_QUEUE_MODE_LAUNCHT;
 
+		if (adapter->frame_preemption_active && ring->preemptible)
+			txqctl |= IGC_TXQCTL_PREEMPTABLE;
+
 		wr32(IGC_TXQCTL(i), txqctl);
 	}
 
-	nsec = rd32(IGC_SYSTIML);
-	sec = rd32(IGC_SYSTIMH);
+	tqavctrl = rd32(IGC_TQAVCTRL) &
+		~(IGC_TQAVCTRL_MIN_FRAG_MASK | IGC_TQAVCTRL_PREEMPT_ENA);
 
-	systim = ktime_set(sec, nsec);
+	tqavctrl |= IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV;
 
-	if (ktime_compare(systim, base_time) > 0) {
-		s64 n;
+	if (adapter->frame_preemption_active)
+		tqavctrl |= IGC_TQAVCTRL_PREEMPT_ENA;
 
-		n = div64_s64(ktime_sub_ns(systim, base_time), cycle);
-		base_time = ktime_add_ns(base_time, (n + 1) * cycle);
-	}
+	frag_size_mult = ethtool_frag_size_to_mult(adapter->add_frag_size);
 
-	baset_h = div_s64_rem(base_time, NSEC_PER_SEC, &baset_l);
+	tqavctrl |= frag_size_mult << IGC_TQAVCTRL_MIN_FRAG_SHIFT;
 
-	wr32(IGC_BASET_H, baset_h);
-	wr32(IGC_BASET_L, baset_l);
+	wr32(IGC_TQAVCTRL, tqavctrl);
 
 	return 0;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 10/12] ethtool: Add support for Frame Preemption verification
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

WIP WIP WIP

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 Documentation/networking/ethtool-netlink.rst |  3 +++
 include/linux/ethtool.h                      |  2 ++
 include/uapi/linux/ethtool_netlink.h         |  2 ++
 net/ethtool/netlink.h                        |  2 +-
 net/ethtool/preempt.c                        | 11 +++++++++++
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index a87f1716944e..bc44724e2cd5 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1494,6 +1494,8 @@ Request contents:
   ``ETHTOOL_A_PREEMPT_HEADER``           nested  reply header
   ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
   ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
+  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``   u8      disable verification
+  ``ETHTOOL_A_PREEMPT_VERIFIED``         u8      verification procedure
   =====================================  ======  ==========================
 
 ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
@@ -1510,6 +1512,7 @@ Request contents:
   ``ETHTOOL_A_CHANNELS_HEADER``          nested  reply header
   ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
   ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
+  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``   u8      disable verification
   =====================================  ======  ==========================
 
 ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 7e449be8f335..64c31ab75e16 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -420,6 +420,8 @@ struct ethtool_module_eeprom {
 struct ethtool_fp {
 	u8 enabled;
 	u32 add_frag_size;
+	u8 disable_verify;
+	u8 verified;
 };
 
 /**
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 4600aba1c693..c2b4d7c3ed14 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -675,6 +675,8 @@ enum {
 	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
 	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
 	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
+	ETHTOOL_A_PREEMPT_DISABLE_VERIFY,		/* u8 */
+	ETHTOOL_A_PREEMPT_VERIFIED,			/* u8 */
 
 	/* add new constants above here */
 	__ETHTOOL_A_PREEMPT_CNT,
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index cc90a463a81c..671237d31ced 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -383,7 +383,7 @@ extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
 extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
-extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
+extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_VERIFIED + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
index 4f96d3c2b1d5..5d70374d9b01 100644
--- a/net/ethtool/preempt.c
+++ b/net/ethtool/preempt.c
@@ -48,6 +48,8 @@ static int preempt_reply_size(const struct ethnl_req_info *req_base,
 
 	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
 	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_DISABLE_VERIFY */
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_VERIFIED */
 
 	return len;
 }
@@ -66,6 +68,12 @@ static int preempt_fill_reply(struct sk_buff *skb,
 			preempt->add_frag_size))
 		return -EMSGSIZE;
 
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_DISABLE_VERIFY, preempt->disable_verify))
+		return -EMSGSIZE;
+
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_VERIFIED, preempt->verified))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -86,6 +94,7 @@ ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
 	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
 	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
 	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
+	[ETHTOOL_A_PREEMPT_DISABLE_VERIFY]		= NLA_POLICY_RANGE(NLA_U8, 0, 1),
 };
 
 int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
@@ -124,6 +133,8 @@ int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
 			tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
 	ethnl_update_u32(&preempt.add_frag_size,
 			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
+	ethnl_update_u8(&preempt.disable_verify,
+			tb[ETHTOOL_A_PREEMPT_DISABLE_VERIFY], &mod);
 	ret = 0;
 	if (!mod)
 		goto out_ops;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 10/12] ethtool: Add support for Frame Preemption verification
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Allow userspace to control the frame preemption verification
procedure: add ETHTOOL_A_PREEMPT_DISABLE_VERIFY to disable the
verification handshake, and a read-only ETHTOOL_A_PREEMPT_VERIFIED
attribute that reports whether verification has completed.

Still WIP.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 Documentation/networking/ethtool-netlink.rst |  3 +++
 include/linux/ethtool.h                      |  2 ++
 include/uapi/linux/ethtool_netlink.h         |  2 ++
 net/ethtool/netlink.h                        |  2 +-
 net/ethtool/preempt.c                        | 11 +++++++++++
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index a87f1716944e..bc44724e2cd5 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1494,6 +1494,8 @@ Request contents:
   ``ETHTOOL_A_PREEMPT_HEADER``           nested  reply header
   ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
   ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
+  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``   u8      disable verification
+  ``ETHTOOL_A_PREEMPT_VERIFIED``         u8      verification succeeded
   =====================================  ======  ==========================
 
 ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
@@ -1510,6 +1512,7 @@ Request contents:
   ``ETHTOOL_A_CHANNELS_HEADER``          nested  reply header
   ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
   ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
+  ``ETHTOOL_A_PREEMPT_DISABLE_VERIFY``   u8      disable verification
   =====================================  ======  ==========================
 
 ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 7e449be8f335..64c31ab75e16 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -420,6 +420,8 @@ struct ethtool_module_eeprom {
 struct ethtool_fp {
 	u8 enabled;
 	u32 add_frag_size;
+	u8 disable_verify;
+	u8 verified;
 };
 
 /**
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 4600aba1c693..c2b4d7c3ed14 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -675,6 +675,8 @@ enum {
 	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
 	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
 	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
+	ETHTOOL_A_PREEMPT_DISABLE_VERIFY,		/* u8 */
+	ETHTOOL_A_PREEMPT_VERIFIED,			/* u8 */
 
 	/* add new constants above here */
 	__ETHTOOL_A_PREEMPT_CNT,
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index cc90a463a81c..671237d31ced 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -383,7 +383,7 @@ extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
 extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
 extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
 extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
-extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
+extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_VERIFIED + 1];
 extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
 
 int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
index 4f96d3c2b1d5..5d70374d9b01 100644
--- a/net/ethtool/preempt.c
+++ b/net/ethtool/preempt.c
@@ -48,6 +48,8 @@ static int preempt_reply_size(const struct ethnl_req_info *req_base,
 
 	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
 	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_DISABLE_VERIFY */
+	len += nla_total_size(sizeof(u8)); /* _PREEMPT_VERIFIED */
 
 	return len;
 }
@@ -66,6 +68,12 @@ static int preempt_fill_reply(struct sk_buff *skb,
 			preempt->add_frag_size))
 		return -EMSGSIZE;
 
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_DISABLE_VERIFY, preempt->disable_verify))
+		return -EMSGSIZE;
+
+	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_VERIFIED, preempt->verified))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -86,6 +94,7 @@ ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
 	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
 	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
 	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
+	[ETHTOOL_A_PREEMPT_DISABLE_VERIFY]		= NLA_POLICY_RANGE(NLA_U8, 0, 1),
 };
 
 int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
@@ -124,6 +133,8 @@ int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
 			tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
 	ethnl_update_u32(&preempt.add_frag_size,
 			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
+	ethnl_update_u8(&preempt.disable_verify,
+			tb[ETHTOOL_A_PREEMPT_DISABLE_VERIFY], &mod);
 	ret = 0;
 	if (!mod)
 		goto out_ops;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 11/12] igc: Check incompatible configs for Frame Preemption
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Frame Preemption and LaunchTime cannot be enabled on the same queue.
If that is attempted, return an error to the user and log it.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 038383519b10..20dac04a02f2 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5432,6 +5432,11 @@ static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
 	if (queue < 0 || queue >= adapter->num_tx_queues)
 		return -EINVAL;
 
 	ring = adapter->tx_ring[queue];
+
+	if (ring->preemptible) {
+		netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
+		return -EINVAL;
+	}
 	ring->launchtime_enable = enable;
 
@@ -5573,8 +5578,14 @@ static int igc_save_frame_preemption(struct igc_adapter *adapter,
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
+		bool preemptible = preempt & BIT(i);
 
-		ring->preemptible = preempt & BIT(i);
+		if (ring->launchtime_enable && preemptible) {
+			netdev_err(adapter->netdev, "Cannot set queue as preemptible if LaunchTime is enabled\n");
+			return -EINVAL;
+		}
+
+		ring->preemptible = preemptible;
 	}
 
 	return 0;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 11/12] igc: Check incompatible configs for Frame Preemption
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Frame Preemption and LaunchTime cannot be enabled on the same queue.
If that is attempted, return an error to the user and log it.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 038383519b10..20dac04a02f2 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5432,6 +5432,11 @@ static int igc_save_launchtime_params(struct igc_adapter *adapter, int queue,
 	if (queue < 0 || queue >= adapter->num_tx_queues)
 		return -EINVAL;
 
 	ring = adapter->tx_ring[queue];
+
+	if (ring->preemptible) {
+		netdev_err(adapter->netdev, "Cannot enable LaunchTime on a preemptible queue\n");
+		return -EINVAL;
+	}
 	ring->launchtime_enable = enable;
 
@@ -5573,8 +5578,14 @@ static int igc_save_frame_preemption(struct igc_adapter *adapter,
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igc_ring *ring = adapter->tx_ring[i];
+		bool preemptible = preempt & BIT(i);
 
-		ring->preemptible = preempt & BIT(i);
+		if (ring->launchtime_enable && preemptible) {
+			netdev_err(adapter->netdev, "Cannot set queue as preemptible if LaunchTime is enabled\n");
+			return -EINVAL;
+		}
+
+		ring->preemptible = preemptible;
 	}
 
 	return 0;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH net-next v4 12/12] igc: Add support for Frame Preemption verification
  2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, kuba,
	vladimir.oltean, po.liu, intel-wired-lan, anthony.l.nguyen,
	mkubecek

Add support for sending/receiving Frame Preemption verification
frames.

The i225 hardware doesn't implement the verification process
internally; that is left to the driver.

Add a simple implementation of the state machine defined in IEEE
802.3-2018, Section 99.4.7.

For now, the state machine is started manually by the user, when
enabling verification. Example:

$ ethtool --set-frame-preemption IFACE disable-verify off

The "verified" condition is set to true when the SMD-V frame is sent,
and the SMD-R frame is received. So, it only tracks the transmission
side. This seems to be what's expected from IEEE 802.3-2018.
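
Concretely, the driver transmits an SMD-V frame, waits up to 100 ms
(IGC_FP_TIMEOUT) for an SMD-R from the link partner, and retries up to
IGC_MAX_VERIFY_CNT times before declaring verification failed; on the
receive side, an incoming SMD-V is answered with an SMD-R.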

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  15 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  13 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  20 +-
 drivers/net/ethernet/intel/igc/igc_main.c    | 216 +++++++++++++++++++
 4 files changed, 261 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9b2ddcbf65fb..84234efed781 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -122,6 +122,13 @@ struct igc_ring {
 	struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;
 
+enum frame_preemption_state {
+	FRAME_PREEMPTION_STATE_FAILED,
+	FRAME_PREEMPTION_STATE_DONE,
+	FRAME_PREEMPTION_STATE_START,
+	FRAME_PREEMPTION_STATE_SENT,
+};
+
 /* Board specific private data structure */
 struct igc_adapter {
 	struct net_device *netdev;
@@ -240,6 +247,14 @@ struct igc_adapter {
 		struct timespec64 start;
 		struct timespec64 period;
 	} perout[IGC_N_PEROUT];
+
+	struct delayed_work fp_verification_work;
+	unsigned long fp_start;
+	bool fp_received_smd_v;
+	bool fp_received_smd_r;
+	unsigned int fp_verify_cnt;
+	enum frame_preemption_state fp_tx_state;
+	bool fp_disable_verify;
 };
 
 void igc_up(struct igc_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index a2ea057d8e6e..cf46f5d5a505 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -268,6 +268,8 @@
 #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
 #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
 #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
+#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
+#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
 #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
 #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
 #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
@@ -327,9 +329,20 @@
 
 #define IGC_RXDEXT_STATERR_LB	0x00040000
 
+#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
+#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
+
 /* Advanced Receive Descriptor bit definitions */
 #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
 
+#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
+#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
+
+#define IGC_SMD_TYPE_SFD		0x0
+#define IGC_SMD_TYPE_SMD_V		0x1
+#define IGC_SMD_TYPE_SMD_R		0x2
+#define IGC_SMD_TYPE_COMPLETE		0x3
+
 #define IGC_RXDEXT_STATERR_L4E		0x20000000
 #define IGC_RXDEXT_STATERR_IPE		0x40000000
 #define IGC_RXDEXT_STATERR_RXE		0x80000000
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 84d5afe92154..f52a7be3af66 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1649,6 +1649,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
 
 	fpcmd->enabled = adapter->frame_preemption_active;
 	fpcmd->add_frag_size = adapter->add_frag_size;
+	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
+	fpcmd->disable_verify = adapter->fp_disable_verify;
 
 	return 0;
 }
@@ -1664,10 +1666,22 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	adapter->frame_preemption_active = fpcmd->enabled;
-	adapter->add_frag_size = fpcmd->add_frag_size;
+	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+		schedule_delayed_work(&adapter->fp_verification_work, msecs_to_jiffies(10));
+	}
 
-	return igc_tsn_offload_apply(adapter);
+	adapter->fp_disable_verify = fpcmd->disable_verify;
+
+	if (adapter->frame_preemption_active != fpcmd->enabled ||
+	    adapter->add_frag_size != fpcmd->add_frag_size) {
+		adapter->frame_preemption_active = fpcmd->enabled;
+		adapter->add_frag_size = fpcmd->add_frag_size;
+
+		return igc_tsn_offload_apply(adapter);
+	}
+
+	return 0;
 }
 
 static int igc_ethtool_begin(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 20dac04a02f2..ed55bd13e4a1 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -28,6 +28,11 @@
 #define IGC_XDP_TX		BIT(1)
 #define IGC_XDP_REDIRECT	BIT(2)
 
+#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
+#define IGC_MAX_VERIFY_CNT 3
+
+#define IGC_FP_SMD_FRAME_SIZE 60
+
 static int debug = -1;
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2169,6 +2174,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
 	return 0;
 }
 
+static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
+				 struct sk_buff *skb)
+{
+	dma_addr_t dma;
+	unsigned int size;
+
+	size = skb_headlen(skb);
+
+	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(ring->dev, dma)) {
+		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
+		return -ENOMEM;
+	}
+
+	buffer->skb = skb;
+	buffer->protocol = 0;
+	buffer->bytecount = skb->len;
+	buffer->gso_segs = 1;
+	buffer->time_stamp = jiffies;
+	dma_unmap_len_set(buffer, len, skb->len);
+	dma_unmap_addr_set(buffer, dma, dma);
+
+	return 0;
+}
+
+static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
+				     struct sk_buff *skb, int type)
+{
+	struct igc_tx_buffer *buffer;
+	union igc_adv_tx_desc *desc;
+	u32 cmd_type, olinfo_status;
+	int err;
+
+	if (!igc_desc_unused(ring))
+		return -EBUSY;
+
+	buffer = &ring->tx_buffer_info[ring->next_to_use];
+	err = igc_fp_init_smd_frame(ring, buffer, skb);
+	if (err)
+		return err;
+
+	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
+		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
+		   buffer->bytecount;
+	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
+
+	switch (type) {
+	case IGC_SMD_TYPE_SMD_V:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
+		break;
+	case IGC_SMD_TYPE_SMD_R:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	desc = IGC_TX_DESC(ring, ring->next_to_use);
+	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
+
+	netdev_tx_sent_queue(txring_txq(ring), skb->len);
+
+	buffer->next_to_watch = desc;
+
+	ring->next_to_use++;
+	if (ring->next_to_use == ring->count)
+		ring->next_to_use = 0;
+
+	return 0;
+}
+
 static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
 					    int cpu)
 {
@@ -2299,6 +2377,19 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
 	q_vector->rx.total_bytes += bytes;
 }
 
+static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
+{
+	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
+
+	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
+		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
+}
+
+static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer, unsigned int size)
+{
+	return size == 60;
+}
+
 static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 {
 	unsigned int total_bytes = 0, total_packets = 0;
@@ -2315,6 +2406,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		ktime_t timestamp = 0;
 		struct xdp_buff xdp;
 		int pkt_offset = 0;
+		int smd_type;
 		void *pktbuf;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -2346,6 +2438,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 			size -= IGC_TS_HDR_LEN;
 		}
 
+		smd_type = igc_rx_desc_smd_type(rx_desc);
+
+		if (smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R) {
+			if (igc_check_smd_frame(rx_buffer, size)) {
+				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
+				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
+				schedule_delayed_work(&adapter->fp_verification_work, 0);
+			}
+
+			/* Advance the ring next-to-clean */
+			igc_is_non_eop(rx_ring, rx_desc);
+
+			cleaned_count++;
+			continue;
+		}
+
 		if (!skb) {
 			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
 			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
@@ -5607,6 +5715,107 @@ static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
 	return igc_tsn_offload_apply(adapter);
 }
 
+/* I225 doesn't send the SMD frames automatically, we need to handle
+ * them ourselves.
+ */
+static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
+{
+	int cpu = smp_processor_id();
+	struct netdev_queue *nq;
+	struct igc_ring *ring;
+	struct sk_buff *skb;
+	void *data;
+	int err;
+
+	if (!netif_running(adapter->netdev))
+		return -ENOTCONN;
+
+	/* FIXME: rename this function to something less specific, as
+	 * it can be used outside XDP.
+	 */
+	ring = igc_xdp_get_tx_ring(adapter, cpu);
+	nq = txring_txq(ring);
+
+	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
+	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
+
+	__netif_tx_lock(nq, cpu);
+
+	err = igc_fp_init_tx_descriptor(ring, skb, type);
+
+	igc_flush_tx_descriptors(ring);
+
+	__netif_tx_unlock(nq);
+
+	return err;
+}
+
+static void igc_fp_verification_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct igc_adapter *adapter;
+	int err;
+
+	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
+
+	if (adapter->fp_disable_verify)
+		goto done;
+
+	switch (adapter->fp_tx_state) {
+	case FRAME_PREEMPTION_STATE_START:
+		adapter->fp_received_smd_r = false;
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
+
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
+		adapter->fp_start = jiffies;
+		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		break;
+
+	case FRAME_PREEMPTION_STATE_SENT:
+		if (adapter->fp_received_smd_r) {
+			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
+			adapter->fp_received_smd_r = false;
+			break;
+		}
+
+		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
+			adapter->fp_verify_cnt++;
+			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
+
+			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
+				adapter->fp_verify_cnt = 0;
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
+				netdev_err(adapter->netdev,
+					   "Exceeded number of attempts for frame preemption verification\n");
+			} else {
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+			}
+			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		}
+
+		break;
+
+	case FRAME_PREEMPTION_STATE_FAILED:
+	case FRAME_PREEMPTION_STATE_DONE:
+		break;
+	}
+
+done:
+	if (adapter->fp_received_smd_v) {
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
+
+		adapter->fp_received_smd_v = false;
+	}
+}
+
 static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
@@ -6023,6 +6232,7 @@ static int igc_probe(struct pci_dev *pdev,
 
 	INIT_WORK(&adapter->reset_task, igc_reset_task);
 	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
+	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
 
 	/* Initialize link properties that are user-changeable */
 	adapter->fc_autoneg = true;
@@ -6044,6 +6254,12 @@ static int igc_probe(struct pci_dev *pdev,
 
 	igc_ptp_init(adapter);
 
+	/* FIXME: This sets the default to not do the verification
+	 * automatically, when we have support in multiple
+	 * controllers, this default can be changed.
+	 */
+	adapter->fp_disable_verify = true;
+
 	/* reset the hardware with the new settings */
 	igc_reset(adapter);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 12/12] igc: Add support for Frame Preemption verification
@ 2021-06-26  0:33   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2021-06-26  0:33 UTC (permalink / raw)
  To: intel-wired-lan

Add support for sending/receiving Frame Preemption verification
frames.

The i225 hardware doesn't implement the verification process
internally; that is left to the driver.

Add a simple implementation of the state machine defined in IEEE
802.3-2018, Section 99.4.7.

For now, the state machine is started manually by the user, when
enabling verification. Example:

$ ethtool --set-frame-preemption IFACE disable-verify off

The "verified" condition is set to true when the SMD-V frame is sent,
and the SMD-R frame is received. So, it only tracks the transmission
side. This seems to be what's expected from IEEE 802.3-2018.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  15 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  13 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  20 +-
 drivers/net/ethernet/intel/igc/igc_main.c    | 216 +++++++++++++++++++
 4 files changed, 261 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9b2ddcbf65fb..84234efed781 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -122,6 +122,13 @@ struct igc_ring {
 	struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;
 
+enum frame_preemption_state {
+	FRAME_PREEMPTION_STATE_FAILED,
+	FRAME_PREEMPTION_STATE_DONE,
+	FRAME_PREEMPTION_STATE_START,
+	FRAME_PREEMPTION_STATE_SENT,
+};
+
 /* Board specific private data structure */
 struct igc_adapter {
 	struct net_device *netdev;
@@ -240,6 +247,14 @@ struct igc_adapter {
 		struct timespec64 start;
 		struct timespec64 period;
 	} perout[IGC_N_PEROUT];
+
+	struct delayed_work fp_verification_work;
+	unsigned long fp_start;
+	bool fp_received_smd_v;
+	bool fp_received_smd_r;
+	unsigned int fp_verify_cnt;
+	enum frame_preemption_state fp_tx_state;
+	bool fp_disable_verify;
 };
 
 void igc_up(struct igc_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index a2ea057d8e6e..cf46f5d5a505 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -268,6 +268,8 @@
 #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
 #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
 #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
+#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
+#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
 #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
 #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
 #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
@@ -327,9 +329,20 @@
 
 #define IGC_RXDEXT_STATERR_LB	0x00040000
 
+#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
+#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
+
 /* Advanced Receive Descriptor bit definitions */
 #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
 
+#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
+#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
+
+#define IGC_SMD_TYPE_SFD		0x0
+#define IGC_SMD_TYPE_SMD_V		0x1
+#define IGC_SMD_TYPE_SMD_R		0x2
+#define IGC_SMD_TYPE_COMPLETE		0x3
+
 #define IGC_RXDEXT_STATERR_L4E		0x20000000
 #define IGC_RXDEXT_STATERR_IPE		0x40000000
 #define IGC_RXDEXT_STATERR_RXE		0x80000000
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 84d5afe92154..f52a7be3af66 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1649,6 +1649,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
 
 	fpcmd->enabled = adapter->frame_preemption_active;
 	fpcmd->add_frag_size = adapter->add_frag_size;
+	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
+	fpcmd->disable_verify = adapter->fp_disable_verify;
 
 	return 0;
 }
@@ -1664,10 +1666,22 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	adapter->frame_preemption_active = fpcmd->enabled;
-	adapter->add_frag_size = fpcmd->add_frag_size;
+	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+		schedule_delayed_work(&adapter->fp_verification_work, msecs_to_jiffies(10));
+	}
 
-	return igc_tsn_offload_apply(adapter);
+	adapter->fp_disable_verify = fpcmd->disable_verify;
+
+	if (adapter->frame_preemption_active != fpcmd->enabled ||
+	    adapter->add_frag_size != fpcmd->add_frag_size) {
+		adapter->frame_preemption_active = fpcmd->enabled;
+		adapter->add_frag_size = fpcmd->add_frag_size;
+
+		return igc_tsn_offload_apply(adapter);
+	}
+
+	return 0;
 }
 
 static int igc_ethtool_begin(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 20dac04a02f2..ed55bd13e4a1 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -28,6 +28,11 @@
 #define IGC_XDP_TX		BIT(1)
 #define IGC_XDP_REDIRECT	BIT(2)
 
+#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
+#define IGC_MAX_VERIFY_CNT 3
+
+#define IGC_FP_SMD_FRAME_SIZE 60
+
 static int debug = -1;
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2169,6 +2174,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
 	return 0;
 }
 
+static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
+				 struct sk_buff *skb)
+{
+	dma_addr_t dma;
+	unsigned int size;
+
+	size = skb_headlen(skb);
+
+	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(ring->dev, dma)) {
+		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
+		return -ENOMEM;
+	}
+
+	buffer->skb = skb;
+	buffer->protocol = 0;
+	buffer->bytecount = skb->len;
+	buffer->gso_segs = 1;
+	buffer->time_stamp = jiffies;
+	dma_unmap_len_set(buffer, len, skb->len);
+	dma_unmap_addr_set(buffer, dma, dma);
+
+	return 0;
+}
+
+static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
+				     struct sk_buff *skb, int type)
+{
+	struct igc_tx_buffer *buffer;
+	union igc_adv_tx_desc *desc;
+	u32 cmd_type, olinfo_status;
+	int err;
+
+	if (!igc_desc_unused(ring))
+		return -EBUSY;
+
+	buffer = &ring->tx_buffer_info[ring->next_to_use];
+	err = igc_fp_init_smd_frame(ring, buffer, skb);
+	if (err)
+		return err;
+
+	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
+		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
+		   buffer->bytecount;
+	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
+
+	switch (type) {
+	case IGC_SMD_TYPE_SMD_V:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
+		break;
+	case IGC_SMD_TYPE_SMD_R:
+		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	desc = IGC_TX_DESC(ring, ring->next_to_use);
+	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
+
+	netdev_tx_sent_queue(txring_txq(ring), skb->len);
+
+	buffer->next_to_watch = desc;
+
+	ring->next_to_use++;
+	if (ring->next_to_use == ring->count)
+		ring->next_to_use = 0;
+
+	return 0;
+}
+
 static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
 					    int cpu)
 {
@@ -2299,6 +2377,19 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
 	q_vector->rx.total_bytes += bytes;
 }
 
+static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
+{
+	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
+
+	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
+		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
+}
+
+static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer, unsigned int size)
+{
+	return size == 60;
+}
+
 static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 {
 	unsigned int total_bytes = 0, total_packets = 0;
@@ -2315,6 +2406,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 		ktime_t timestamp = 0;
 		struct xdp_buff xdp;
 		int pkt_offset = 0;
+		int smd_type;
 		void *pktbuf;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -2346,6 +2438,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
 			size -= IGC_TS_HDR_LEN;
 		}
 
+		smd_type = igc_rx_desc_smd_type(rx_desc);
+
+		if (smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R) {
+			if (igc_check_smd_frame(rx_buffer, size)) {
+				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
+				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
+				schedule_delayed_work(&adapter->fp_verification_work, 0);
+			}
+
+			/* Advance the ring next-to-clean */
+			igc_is_non_eop(rx_ring, rx_desc);
+
+			cleaned_count++;
+			continue;
+		}
+
 		if (!skb) {
 			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
 			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
@@ -5607,6 +5715,107 @@ static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
 	return igc_tsn_offload_apply(adapter);
 }
 
+/* I225 doesn't send the SMD frames automatically, we need to handle
+ * them ourselves.
+ */
+static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
+{
+	int cpu = smp_processor_id();
+	struct netdev_queue *nq;
+	struct igc_ring *ring;
+	struct sk_buff *skb;
+	void *data;
+	int err;
+
+	if (!netif_running(adapter->netdev))
+		return -ENOTCONN;
+
+	/* FIXME: rename this function to something less specific, as
+	 * it can be used outside XDP.
+	 */
+	ring = igc_xdp_get_tx_ring(adapter, cpu);
+	nq = txring_txq(ring);
+
+	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
+	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
+
+	__netif_tx_lock(nq, cpu);
+
+	err = igc_fp_init_tx_descriptor(ring, skb, type);
+
+	igc_flush_tx_descriptors(ring);
+
+	__netif_tx_unlock(nq);
+
+	return err;
+}
+
+static void igc_fp_verification_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct igc_adapter *adapter;
+	int err;
+
+	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
+
+	if (adapter->fp_disable_verify)
+		goto done;
+
+	switch (adapter->fp_tx_state) {
+	case FRAME_PREEMPTION_STATE_START:
+		adapter->fp_received_smd_r = false;
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
+
+		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
+		adapter->fp_start = jiffies;
+		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		break;
+
+	case FRAME_PREEMPTION_STATE_SENT:
+		if (adapter->fp_received_smd_r) {
+			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
+			adapter->fp_received_smd_r = false;
+			break;
+		}
+
+		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
+			adapter->fp_verify_cnt++;
+			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
+
+			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
+				adapter->fp_verify_cnt = 0;
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
+				netdev_err(adapter->netdev,
+					   "Exceeded number of attempts for frame preemption verification\n");
+			} else {
+				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
+			}
+			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
+		}
+
+		break;
+
+	case FRAME_PREEMPTION_STATE_FAILED:
+	case FRAME_PREEMPTION_STATE_DONE:
+		break;
+	}
+
+done:
+	if (adapter->fp_received_smd_v) {
+		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
+		if (err < 0)
+			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
+
+		adapter->fp_received_smd_v = false;
+	}
+}
+
 static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
@@ -6023,6 +6232,7 @@ static int igc_probe(struct pci_dev *pdev,
 
 	INIT_WORK(&adapter->reset_task, igc_reset_task);
 	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
+	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
 
 	/* Initialize link properties that are user-changeable */
 	adapter->fc_autoneg = true;
@@ -6044,6 +6254,12 @@ static int igc_probe(struct pci_dev *pdev,
 
 	igc_ptp_init(adapter);
 
+	/* FIXME: This sets the default to not do the verification
+	 * automatically, when we have support in multiple
+	 * controllers, this default can be changed.
+	 */
+	adapter->fp_disable_verify = true;
+
 	/* reset the hardware with the new settings */
 	igc_reset(adapter);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 01/12] ethtool: Add support for configuring frame preemption
  2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-27 19:43     ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-27 19:43 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Fri, Jun 25, 2021 at 05:33:03PM -0700, Vinicius Costa Gomes wrote:
> Frame preemption (described in IEEE 802.3-2018, Section 99 in
> particular) defines the concept of preemptible and express queues. It
> allows traffic from express queues to "interrupt" traffic from
> preemptible queues, which are "resumed" after the express traffic has
> finished transmitting.
> 
> Frame preemption can only be used when both the local device and the
> link partner support it.
> 
> Only parameters for enabling/disabling frame preemption and
> configuring the minimum fragment size are included here. Expressing
> which queues are marked as preemptible is left to mqprio/taprio, as
> having that information there should be easier on the user.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  Documentation/networking/ethtool-netlink.rst |  38 +++++
>  include/linux/ethtool.h                      |  22 +++
>  include/uapi/linux/ethtool_netlink.h         |  17 +++
>  net/ethtool/Makefile                         |   2 +-
>  net/ethtool/common.c                         |  25 ++++
>  net/ethtool/netlink.c                        |  19 +++
>  net/ethtool/netlink.h                        |   4 +
>  net/ethtool/preempt.c                        | 146 +++++++++++++++++++
>  8 files changed, 272 insertions(+), 1 deletion(-)
>  create mode 100644 net/ethtool/preempt.c
> 
> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
> index 6ea91e41593f..a87f1716944e 100644
> --- a/Documentation/networking/ethtool-netlink.rst
> +++ b/Documentation/networking/ethtool-netlink.rst
> @@ -1477,6 +1477,44 @@ Low and high bounds are inclusive, for example:
>   etherStatsPkts512to1023Octets 512  1023
>   ============================= ==== ====

I think you need to add some extra documentation bits to the

List of message types
=====================

and

Request translation
===================

sections.
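
For the message types list that would be something along these lines
(wording is just a suggestion):

  ``ETHTOOL_MSG_PREEMPT_GET``           get frame preemption parameters
  ``ETHTOOL_MSG_PREEMPT_SET``           set frame preemption parameters

plus ``ETHTOOL_MSG_PREEMPT_GET_REPLY`` and ``ETHTOOL_MSG_PREEMPT_NTF``
in the kernel-to-userspace table.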

>  
> +PREEMPT_GET
> +===========
> +
> +Get information about frame preemption state.
> +
> +Request contents:
> +
> +  ====================================  ======  ==========================
> +  ``ETHTOOL_A_PREEMPT_HEADER``          nested  request header
> +  ====================================  ======  ==========================
> +
> +Request contents:
> +
> +  =====================================  ======  ==========================
> +  ``ETHTOOL_A_PREEMPT_HEADER``           nested  reply header
> +  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
> +  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
> +  =====================================  ======  ==========================
> +
> +``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
> +fragment size that the receiver device supports.
> +
> +PREEMPT_SET
> +===========
> +
> +Sets frame preemption parameters.
> +
> +Request contents:
> +
> +  =====================================  ======  ==========================
> +  ``ETHTOOL_A_CHANNELS_HEADER``          nested  reply header
> +  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
> +  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
> +  =====================================  ======  ==========================
> +
> +``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
> +fragment size that the receiver device supports.
> +
>  Request translation
>  ===================
>  
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 29dbb603bc91..7e449be8f335 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -409,6 +409,19 @@ struct ethtool_module_eeprom {
>  	u8	*data;
>  };
>  
> +/**
> + * struct ethtool_fp - Frame Preemption information
> + *
> + * @enabled: Enable frame preemption.
> + * @add_frag_size: Minimum size for additional (non-final) fragments
> + * in bytes, for the value defined in the IEEE 802.3-2018 standard see
> + * ethtool_frag_size_to_mult().
> + */
> +struct ethtool_fp {
> +	u8 enabled;
> +	u32 add_frag_size;

Strange that the verify_disable bit is not in here? I haven't looked at
further patches in detail but I saw in the commit message that you added
support for it, maybe it needs to be squashed with this?

Can we make "enabled" a bool?

> +};
> +
>  /**
>   * struct ethtool_ops - optional netdev operations
>   * @cap_link_lanes_supported: indicates if the driver supports lanes
> @@ -561,6 +574,8 @@ struct ethtool_module_eeprom {
>   *	not report statistics.
>   * @get_fecparam: Get the network device Forward Error Correction parameters.
>   * @set_fecparam: Set the network device Forward Error Correction parameters.
> + * @get_preempt: Get the network device Frame Preemption parameters.
> + * @set_preempt: Set the network device Frame Preemption parameters.
>   * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
>   *	This is only useful if the device maintains PHY statistics and
>   *	cannot use the standard PHY library helpers.
> @@ -675,6 +690,10 @@ struct ethtool_ops {
>  				      struct ethtool_fecparam *);
>  	int	(*set_fecparam)(struct net_device *,
>  				      struct ethtool_fecparam *);
> +	int	(*get_preempt)(struct net_device *,
> +			       struct ethtool_fp *);
> +	int	(*set_preempt)(struct net_device *, struct ethtool_fp *,
> +			       struct netlink_ext_ack *);
>  	void	(*get_ethtool_phy_stats)(struct net_device *,
>  					 struct ethtool_stats *, u64 *);
>  	int	(*get_phy_tunable)(struct net_device *,
> @@ -766,4 +785,7 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
>   * next string.
>   */
>  extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
> +
> +u8 ethtool_frag_size_to_mult(u32 frag_size);
> +
>  #endif /* _LINUX_ETHTOOL_H */
> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
> index c7135c9c37a5..4600aba1c693 100644
> --- a/include/uapi/linux/ethtool_netlink.h
> +++ b/include/uapi/linux/ethtool_netlink.h
> @@ -44,6 +44,8 @@ enum {
>  	ETHTOOL_MSG_TUNNEL_INFO_GET,
>  	ETHTOOL_MSG_FEC_GET,
>  	ETHTOOL_MSG_FEC_SET,
> +	ETHTOOL_MSG_PREEMPT_GET,
> +	ETHTOOL_MSG_PREEMPT_SET,
>  	ETHTOOL_MSG_MODULE_EEPROM_GET,
>  	ETHTOOL_MSG_STATS_GET,
>  
> @@ -86,6 +88,8 @@ enum {
>  	ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
>  	ETHTOOL_MSG_FEC_GET_REPLY,
>  	ETHTOOL_MSG_FEC_NTF,
> +	ETHTOOL_MSG_PREEMPT_GET_REPLY,
> +	ETHTOOL_MSG_PREEMPT_NTF,
>  	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
>  	ETHTOOL_MSG_STATS_GET_REPLY,

Correct me if I'm wrong, but enums in uapi should always be added at the
end, otherwise you change the value of ETHTOOL_MSG_MODULE_EEPROM_GET and
break user space binaries which use it and were compiled against old
kernel headers.
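
A minimal sketch of the effect (the numeric values are illustrative,
only the relative shift matters):

enum {
	ETHTOOL_MSG_FEC_SET = 16,
	ETHTOOL_MSG_PREEMPT_GET,	/* takes 17 */
	ETHTOOL_MSG_PREEMPT_SET,	/* takes 18 */
	ETHTOOL_MSG_MODULE_EEPROM_GET,	/* was 17, is now 19 */
	ETHTOOL_MSG_STATS_GET,		/* was 18, is now 20 */
};

An old binary sending 17 and expecting ETHTOOL_MSG_MODULE_EEPROM_GET
would now be issuing ETHTOOL_MSG_PREEMPT_GET against a new kernel.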

>  
> @@ -664,6 +668,19 @@ enum {
>  	ETHTOOL_A_FEC_STAT_MAX = (__ETHTOOL_A_FEC_STAT_CNT - 1)
>  };
>  
> +/* FRAME PREEMPTION */
> +
> +enum {
> +	ETHTOOL_A_PREEMPT_UNSPEC,
> +	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
> +	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
> +	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
> +
> +	/* add new constants above here */
> +	__ETHTOOL_A_PREEMPT_CNT,
> +	ETHTOOL_A_PREEMPT_MAX = (__ETHTOOL_A_PREEMPT_CNT - 1)
> +};
> +
>  /* MODULE EEPROM */
>  
>  enum {
> diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
> index 723c9a8a8cdf..4b84b2d34c7a 100644
> --- a/net/ethtool/Makefile
> +++ b/net/ethtool/Makefile
> @@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
>  ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
>  		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
>  		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
> -		   tunnels.o fec.o eeprom.o stats.o
> +		   tunnels.o fec.o preempt.o eeprom.o stats.o
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index f9dcbad84788..68d123dd500b 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -579,3 +579,28 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
>  	link_ksettings->base.duplex = link_info->duplex;
>  }
>  EXPORT_SYMBOL_GPL(ethtool_params_from_link_mode);
> +
> +/**
> + * ethtool_frag_size_to_mult() - Convert from a Frame Preemption
> + * Additional Fragment size in bytes to a multiplier.
> + * @frag_size: minimum non-final fragment size in bytes.
> + *
> + * The multiplier is defined as:
> + *	"A 2-bit integer value used to indicate the minimum size of
> + *	non-final fragments supported by the receiver on the given port
> + *	associated with the local System. This value is expressed in units
> + *	of 64 octets of additional fragment length."
> + *	Equivalent to `30.14.1.7 aMACMergeAddFragSize` from the IEEE 802.3-2018
> + *	standard.
> + *
> + * Return: the multiplier is a number in the [0, 2] interval.
> + */
> +u8 ethtool_frag_size_to_mult(u32 frag_size)
> +{
> +	u8 mult = (frag_size / 64) - 1;
> +
> +	mult = clamp_t(u8, mult, 0, 3);
> +
> +	return mult;

I think it would look better as "return clamp_t(u8, mult, 0, 3);"
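
A worked example in the kernel-doc might also help: frag_size = 192
gives mult = 2 (64 * (1 + 2) = 192), 256 gives 3, and anything larger is
clamped to 3; that also doesn't match the "[0, 2] interval" mentioned in
the Return: description above, by the way.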

> +}
> +EXPORT_SYMBOL_GPL(ethtool_frag_size_to_mult);
> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
> index a7346346114f..f4e07b740790 100644
> --- a/net/ethtool/netlink.c
> +++ b/net/ethtool/netlink.c
> @@ -246,6 +246,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
>  	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
>  	[ETHTOOL_MSG_FEC_GET]		= &ethnl_fec_request_ops,
>  	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
> +	[ETHTOOL_MSG_PREEMPT_GET]	= &ethnl_preempt_request_ops,
>  	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,
>  	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
>  };
> @@ -561,6 +562,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
>  	[ETHTOOL_MSG_PAUSE_NTF]		= &ethnl_pause_request_ops,
>  	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
>  	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
> +	[ETHTOOL_MSG_PREEMPT_NTF]	= &ethnl_preempt_request_ops,
>  };
>  
>  /* default notification handler */
> @@ -654,6 +656,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
>  	[ETHTOOL_MSG_PAUSE_NTF]		= ethnl_default_notify,
>  	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
>  	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
> +	[ETHTOOL_MSG_PREEMPT_NTF]	= ethnl_default_notify,
>  };
>  
>  void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
> @@ -958,6 +961,22 @@ static const struct genl_ops ethtool_genl_ops[] = {
>  		.policy = ethnl_stats_get_policy,
>  		.maxattr = ARRAY_SIZE(ethnl_stats_get_policy) - 1,
>  	},
> +	{
> +		.cmd	= ETHTOOL_MSG_PREEMPT_GET,
> +		.doit	= ethnl_default_doit,
> +		.start	= ethnl_default_start,
> +		.dumpit	= ethnl_default_dumpit,
> +		.done	= ethnl_default_done,
> +		.policy = ethnl_preempt_get_policy,
> +		.maxattr = ARRAY_SIZE(ethnl_preempt_get_policy) - 1,
> +	},
> +	{
> +		.cmd	= ETHTOOL_MSG_PREEMPT_SET,
> +		.flags	= GENL_UNS_ADMIN_PERM,
> +		.doit	= ethnl_set_preempt,
> +		.policy = ethnl_preempt_set_policy,
> +		.maxattr = ARRAY_SIZE(ethnl_preempt_set_policy) - 1,
> +	},
>  };
>  
>  static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
> diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
> index 3e25a47fd482..cc90a463a81c 100644
> --- a/net/ethtool/netlink.h
> +++ b/net/ethtool/netlink.h
> @@ -345,6 +345,7 @@ extern const struct ethnl_request_ops ethnl_pause_request_ops;
>  extern const struct ethnl_request_ops ethnl_eee_request_ops;
>  extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
>  extern const struct ethnl_request_ops ethnl_fec_request_ops;
> +extern const struct ethnl_request_ops ethnl_preempt_request_ops;
>  extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
>  extern const struct ethnl_request_ops ethnl_stats_request_ops;
>  
> @@ -381,6 +382,8 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF
>  extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
>  extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
>  extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
> +extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
> +extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
>  extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
>  
>  int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
> @@ -400,6 +403,7 @@ int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);
>  int ethnl_tunnel_info_start(struct netlink_callback *cb);
>  int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
>  int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info);
>  
>  extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
>  extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
> diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
> new file mode 100644
> index 000000000000..4f96d3c2b1d5
> --- /dev/null
> +++ b/net/ethtool/preempt.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include "netlink.h"
> +#include "common.h"
> +
> +struct preempt_req_info {
> +	struct ethnl_req_info		base;
> +};
> +
> +struct preempt_reply_data {
> +	struct ethnl_reply_data		base;
> +	struct ethtool_fp		fp;
> +};
> +
> +#define PREEMPT_REPDATA(__reply_base) \
> +	container_of(__reply_base, struct preempt_reply_data, base)
> +
> +const struct nla_policy
> +ethnl_preempt_get_policy[] = {
> +	[ETHTOOL_A_PREEMPT_HEADER]		= NLA_POLICY_NESTED(ethnl_header_policy),
> +};
> +
> +static int preempt_prepare_data(const struct ethnl_req_info *req_base,
> +				struct ethnl_reply_data *reply_base,
> +				struct genl_info *info)
> +{
> +	struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
> +	struct net_device *dev = reply_base->dev;
> +	int ret;
> +
> +	if (!dev->ethtool_ops->get_preempt)
> +		return -EOPNOTSUPP;
> +
> +	ret = ethnl_ops_begin(dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = dev->ethtool_ops->get_preempt(dev, &data->fp);
> +	ethnl_ops_complete(dev);
> +
> +	return ret;
> +}
> +
> +static int preempt_reply_size(const struct ethnl_req_info *req_base,
> +			      const struct ethnl_reply_data *reply_base)
> +{
> +	int len = 0;
> +
> +	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
> +	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
> +
> +	return len;
> +}
> +
> +static int preempt_fill_reply(struct sk_buff *skb,
> +			      const struct ethnl_req_info *req_base,
> +			      const struct ethnl_reply_data *reply_base)
> +{
> +	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
> +	const struct ethtool_fp *preempt = &data->fp;
> +
> +	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_ENABLED, preempt->enabled))
> +		return -EMSGSIZE;
> +
> +	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,
> +			preempt->add_frag_size))
> +		return -EMSGSIZE;
> +
> +	return 0;
> +}
> +
> +const struct ethnl_request_ops ethnl_preempt_request_ops = {
> +	.request_cmd		= ETHTOOL_MSG_PREEMPT_GET,
> +	.reply_cmd		= ETHTOOL_MSG_PREEMPT_GET_REPLY,
> +	.hdr_attr		= ETHTOOL_A_PREEMPT_HEADER,
> +	.req_info_size		= sizeof(struct preempt_req_info),
> +	.reply_data_size	= sizeof(struct preempt_reply_data),
> +
> +	.prepare_data		= preempt_prepare_data,
> +	.reply_size		= preempt_reply_size,
> +	.fill_reply		= preempt_fill_reply,
> +};
> +
> +const struct nla_policy
> +ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
> +	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
> +	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
> +	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
> +};
> +
> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct ethnl_req_info req_info = {};
> +	struct nlattr **tb = info->attrs;
> +	struct ethtool_fp preempt = {};
> +	struct net_device *dev;
> +	bool mod = false;
> +	int ret;
> +
> +	ret = ethnl_parse_header_dev_get(&req_info,
> +					 tb[ETHTOOL_A_PREEMPT_HEADER],
> +					 genl_info_net(info), info->extack,
> +					 true);
> +	if (ret < 0)
> +		return ret;
> +	dev = req_info.dev;
> +	ret = -EOPNOTSUPP;

Some new lines around here please? And maybe it would look a bit cleaner
if you could assign "ret = -EOPNOTSUPP" in the "preempt ops not present"
if condition body?

> +	if (!dev->ethtool_ops->get_preempt ||
> +	    !dev->ethtool_ops->set_preempt)
> +		goto out_dev;
> +
> +	rtnl_lock();
> +	ret = ethnl_ops_begin(dev);
> +	if (ret < 0)
> +		goto out_rtnl;
> +
> +	ret = dev->ethtool_ops->get_preempt(dev, &preempt);

I don't know much about the background of ethtool netlink, but why does
the .doit of ETHTOOL_MSG_*_SET go through a getter first? Is it because
all the netlink attributes from the message are optional, and we need to
default to the current state?

> +	if (ret < 0) {
> +		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");
> +		goto out_ops;
> +	}
> +
> +	ethnl_update_u8(&preempt.enabled,
> +			tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
> +	ethnl_update_u32(&preempt.add_frag_size,
> +			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
> +	ret = 0;

This reinitialization of ret to zero is interesting. It implies
->get_preempt() is allowed to return > 0 as a success error code.
However ->set_preempt() below isn't? (its return value is directly
propagated to callers of ethnl_set_preempt()).

> +	if (!mod)
> +		goto out_ops;
> +
> +	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);
> +	if (ret < 0) {
> +		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");
> +		goto out_ops;
> +	}
> +
> +	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);
> +
> +out_ops:
> +	ethnl_ops_complete(dev);
> +out_rtnl:
> +	rtnl_unlock();
> +out_dev:
> +	dev_put(dev);
> +	return ret;
> +}
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 02/12] taprio: Add support for frame preemption offload
  2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-27 19:58     ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-27 19:58 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Fri, Jun 25, 2021 at 05:33:04PM -0700, Vinicius Costa Gomes wrote:
> Adds a way to configure which traffic classes are marked as
> preemptible and which are marked as express.
> 
> Even if frame preemption is not a "real" offload, because it can't be
> executed purely in software, having this information near where the
> mapping of traffic classes to queues is specified makes it, hopefully,
> easier to use.
> 
> taprio will receive the information of which traffic classes are
> marked as express/preemptible, and when offloading frame preemption to
> the driver, it will convert the information, so the driver receives
> which queues are marked as express/preemptible.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  include/linux/netdevice.h      |  1 +
>  include/net/pkt_sched.h        |  4 ++++
>  include/uapi/linux/pkt_sched.h |  1 +
>  net/sched/sch_taprio.c         | 43 ++++++++++++++++++++++++++++++----
>  4 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index be1dcceda5e4..af5d4c5b0ad5 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -923,6 +923,7 @@ enum tc_setup_type {
>  	TC_SETUP_QDISC_TBF,
>  	TC_SETUP_QDISC_FIFO,
>  	TC_SETUP_QDISC_HTB,
> +	TC_SETUP_PREEMPT,
>  };
>  
>  /* These structures hold the attributes of bpf state that are being passed
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 6d7b12cba015..b4cb479d1cf5 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -178,6 +178,10 @@ struct tc_taprio_qopt_offload {
>  	struct tc_taprio_sched_entry entries[];
>  };
>  
> +struct tc_preempt_qopt_offload {
> +	u32 preemptible_queues;
> +};
> +
>  /* Reference counting */
>  struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
>  						  *offload);
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 79a699f106b1..830ce9c9ec6f 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -1241,6 +1241,7 @@ enum {
>  	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
>  	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
>  	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
> +	TCA_TAPRIO_ATTR_PREEMPT_TCS, /* u32 */
>  	__TCA_TAPRIO_ATTR_MAX,
>  };
>  
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 66fe2b82af9a..58586f98c648 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -64,6 +64,7 @@ struct taprio_sched {
>  	struct Qdisc **qdiscs;
>  	struct Qdisc *root;
>  	u32 flags;
> +	u32 preemptible_tcs;
>  	enum tk_offsets tk_offset;
>  	int clockid;
>  	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
> @@ -786,6 +787,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
>  	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
>  	[TCA_TAPRIO_ATTR_FLAGS]                      = { .type = NLA_U32 },
>  	[TCA_TAPRIO_ATTR_TXTIME_DELAY]		     = { .type = NLA_U32 },
> +	[TCA_TAPRIO_ATTR_PREEMPT_TCS]                = { .type = NLA_U32 },
>  };
>  
>  static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
> @@ -1284,6 +1286,7 @@ static int taprio_disable_offload(struct net_device *dev,
>  				  struct netlink_ext_ack *extack)
>  {
>  	const struct net_device_ops *ops = dev->netdev_ops;
> +	struct tc_preempt_qopt_offload preempt = { };
>  	struct tc_taprio_qopt_offload *offload;
>  	int err;
>  
> @@ -1302,13 +1305,15 @@ static int taprio_disable_offload(struct net_device *dev,
>  	offload->enable = 0;
>  
>  	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
> -	if (err < 0) {
> +	if (err < 0)
>  		NL_SET_ERR_MSG(extack,
> -			       "Device failed to disable offload");
> -		goto out;
> -	}
> +			       "Device failed to disable taprio offload");
> +
> +	err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt);
> +	if (err < 0)
> +		NL_SET_ERR_MSG(extack,
> +			       "Device failed to disable frame preemption offload");

First line in taprio_disable_offload() is:

	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
		return 0;

but you said it yourself below that the preemptible queues thing is
independent of whether you have taprio offload or not (or taprio at
all). So the queues will never be reset back to the eMAC if you don't
use full offload (yes, this includes txtime offload too). In fact, it's
so independent, that I don't even know why we add them to taprio in the
first place :)
I think the argument had to do with the hold/advance commands (other
frame preemption stuff that's already in taprio), but those are really
special and only to be used in the Qbv+Qbu combination, but the pMAC
traffic classes? I don't know... Honestly I thought that me asking to
see preemptible queues implemented for mqprio as well was going to
discourage you, but oh well...
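
To make it concrete, the reset could just be factored out and called
unconditionally from the teardown path. A rough, untested sketch (the
helper name is made up, and the caller is assumed to have already
checked that ndo_setup_tc exists):

	static int taprio_reset_preempt(struct net_device *dev,
					struct netlink_ext_ack *extack)
	{
		struct tc_preempt_qopt_offload preempt = { };
		int err;

		/* an all-zero mask puts every queue back on the eMAC */
		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
						    &preempt);
		if (err < 0)
			NL_SET_ERR_MSG(extack,
				       "Device failed to disable frame preemption offload");

		return err;
	}

called before the FULL_OFFLOAD_IS_ENABLED() early return gets a chance
to skip it.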

>  
> -out:
>  	taprio_offload_free(offload);
>  
>  	return err;
> @@ -1525,6 +1530,29 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>  					       mqprio->prio_tc_map[i]);
>  	}
>  
> +	/* It's valid to enable frame preemption without any kind of
> +	 * offloading being enabled, so keep it separated.
> +	 */
> +	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
> +		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
> +		struct tc_preempt_qopt_offload qopt = { };
> +
> +		if (preempt == U32_MAX) {
> +			NL_SET_ERR_MSG(extack, "At least one queue must not be preemptible");
> +			err = -EINVAL;
> +			goto free_sched;
> +		}

Hmmm, did we somehow agree that at least one traffic class must not be
preemptible? Citation needed.

> +
> +		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
> +
> +		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
> +						    &qopt);
> +		if (err)
> +			goto free_sched;
> +
> +		q->preemptible_tcs = preempt;
> +	}
> +
>  	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
>  		err = taprio_enable_offload(dev, q, new_admin, extack);
>  	else
> @@ -1681,6 +1709,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
>  	 */
>  	q->clockid = -1;
>  	q->flags = TAPRIO_FLAGS_INVALID;
> +	q->preemptible_tcs = U32_MAX;
>  
>  	spin_lock(&taprio_list_lock);
>  	list_add(&q->taprio_list, &taprio_list);
> @@ -1899,6 +1928,10 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
>  	if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))
>  		goto options_error;
>  
> +	if (q->preemptible_tcs != U32_MAX &&
> +	    nla_put_u32(skb, TCA_TAPRIO_ATTR_PREEMPT_TCS, q->preemptible_tcs))
> +		goto options_error;
> +
>  	if (q->txtime_delay &&
>  	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
>  		goto options_error;
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 04/12] taprio: Replace tc_map_to_queue_mask()
  2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-27 20:02     ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-27 20:02 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Fri, Jun 25, 2021 at 05:33:06PM -0700, Vinicius Costa Gomes wrote:
> Replaces tc_map_to_queue_mask() by netdev_tc_map_to_queue_mask() that
> was just introduced.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  net/sched/sch_taprio.c | 26 ++++----------------------
>  1 file changed, 4 insertions(+), 22 deletions(-)
> 
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 58586f98c648..4e411ca3a9eb 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1201,25 +1201,6 @@ static void taprio_offload_config_changed(struct taprio_sched *q)
>  	spin_unlock(&q->current_entry_lock);
>  }
>  
> -static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)
> -{
> -	u32 i, queue_mask = 0;
> -
> -	for (i = 0; i < dev->num_tc; i++) {
> -		u32 offset, count;
> -
> -		if (!(tc_mask & BIT(i)))
> -			continue;
> -
> -		offset = dev->tc_to_txq[i].offset;
> -		count = dev->tc_to_txq[i].count;
> -
> -		queue_mask |= GENMASK(offset + count - 1, offset);
> -	}
> -
> -	return queue_mask;
> -}
> -
>  static void taprio_sched_to_offload(struct net_device *dev,
>  				    struct sched_gate_list *sched,
>  				    struct tc_taprio_qopt_offload *offload)
> @@ -1236,7 +1217,7 @@ static void taprio_sched_to_offload(struct net_device *dev,
>  
>  		e->command = entry->command;
>  		e->interval = entry->interval;
> -		e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask);
> +		e->gate_mask = netdev_tc_map_to_queue_mask(dev, entry->gate_mask);
>  
>  		i++;
>  	}
> @@ -1536,14 +1517,15 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>  	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
>  		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
>  		struct tc_preempt_qopt_offload qopt = { };
> +		u32 all_tcs_mask = GENMASK(mqprio->num_tc, 0);
>  
> -		if (preempt == U32_MAX) {
> +		if ((preempt & all_tcs_mask) == all_tcs_mask) {

Ouch, this patch does more than it says on the box.
If it did only what the commit message said, it could have just as well
been squashed with the previous one (and this extra change squashed with
the "preemptible queues in taprio" patch). Practically, it means that
these last two patches should go before the "preemptible queues in taprio" one.
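
For anyone following along, a quick worked example of the conversion
(the numbers are made up: say "num_tc 3" with "queues 1@0 1@1 2@2", so
TC 0 -> offset 0/count 1, TC 1 -> offset 1/count 1, TC 2 -> offset
2/count 2):

	/* tc_mask = 0b110: TCs 1 and 2 preemptible, TC 0 express       */
	/* TC 1: offset 1, count 1 -> GENMASK(1, 1) = 0b0010            */
	/* TC 2: offset 2, count 2 -> GENMASK(3, 2) = 0b1100            */
	/* queue_mask = 0b1110: queues 1-3 preemptible, queue 0 express */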

>  			NL_SET_ERR_MSG(extack, "At least one queue must not be preemptible");
>  			err = -EINVAL;
>  			goto free_sched;
>  		}
>  
> -		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
> +		qopt.preemptible_queues = netdev_tc_map_to_queue_mask(dev, preempt);
>  
>  		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
>  						    &qopt);
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 10/12] ethtool: Add support for Frame Preemption verification
  2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-28  9:17     ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-28  9:17 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Fri, Jun 25, 2021 at 05:33:12PM -0700, Vinicius Costa Gomes wrote:
> WIP WIP WIP
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---

This looks good as long as it is squashed with the first patch.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 11/12] igc: Check incompatible configs for Frame Preemption
  2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-28  9:20     ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-28  9:20 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Fri, Jun 25, 2021 at 05:33:13PM -0700, Vinicius Costa Gomes wrote:
> Frame Preemption and LaunchTime cannot be enabled on the same queue.
> If that situation happens, emit an error to the user, and log the
> error.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---

This is a very interesting limitation, considering the fact that much of
the frame preemption validation that I did was in conjunction with
tc-etf and SO_TXTIME (send packets on 2 queues, one preemptible and one
express, and compare the TX timestamps of the express packets with their
scheduled TX times). The base-time offset between the ET and the PT
packets is varied in small increments in the order of 20 ns or so.
If this is not possible with hardware driven by igc, how do you know it
works properly? :)
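
For reference, the user-space half of that kind of test is roughly the
sketch below (assuming recent enough kernel/libc headers; error handling
and the actual TX timestamp comparison are left out, and the fallback
defines mirror what the kernel selftests do):

	#include <linux/net_tstamp.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <time.h>

	#ifndef SO_TXTIME
	#define SO_TXTIME	61
	#endif
	#ifndef SCM_TXTIME
	#define SCM_TXTIME	SO_TXTIME
	#endif

	static int enable_txtime(int fd)
	{
		struct sock_txtime cfg = {
			.clockid = CLOCK_TAI,
			.flags = SOF_TXTIME_REPORT_ERRORS,
		};

		return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
	}

	/* queue one packet on a connected socket, with an absolute
	 * CLOCK_TAI launch time in nanoseconds
	 */
	static ssize_t send_at(int fd, const void *buf, size_t len, __u64 txtime)
	{
		char control[CMSG_SPACE(sizeof(txtime))] = {};
		struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
		struct msghdr msg = {
			.msg_iov = &iov,
			.msg_iovlen = 1,
			.msg_control = control,
			.msg_controllen = sizeof(control),
		};
		struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_TXTIME;
		cmsg->cmsg_len = CMSG_LEN(sizeof(txtime));
		memcpy(CMSG_DATA(cmsg), &txtime, sizeof(txtime));

		return sendmsg(fd, &msg, 0);
	}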

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 12/12] igc: Add support for Frame Preemption verification
  2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2021-06-28  9:59     ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-28  9:59 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Fri, Jun 25, 2021 at 05:33:14PM -0700, Vinicius Costa Gomes wrote:
> Add support for sending/receiving Frame Preemption verification
> frames.
> 
> The i225 hardware doesn't implement the process of verification
> internally, this is left to the driver.
> 
> Add a simple implementation of the state machine defined in IEEE
> 802.3-2018, Section 99.4.7.
> 
> For now, the state machine is started manually by the user, when
> enabling verification. Example:
> 
> $ ethtool --set-frame-preemption IFACE disable-verify off
> 
> The "verified" condition is set to true when the SMD-V frame is sent,
> and the SMD-R frame is received. So, it only tracks the transmission
> side. This seems to be what's expected from IEEE 802.3-2018.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc.h         |  15 ++
>  drivers/net/ethernet/intel/igc/igc_defines.h |  13 ++
>  drivers/net/ethernet/intel/igc/igc_ethtool.c |  20 +-
>  drivers/net/ethernet/intel/igc/igc_main.c    | 216 +++++++++++++++++++
>  4 files changed, 261 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 9b2ddcbf65fb..84234efed781 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -122,6 +122,13 @@ struct igc_ring {
>  	struct xsk_buff_pool *xsk_pool;
>  } ____cacheline_internodealigned_in_smp;
>  
> +enum frame_preemption_state {
> +	FRAME_PREEMPTION_STATE_FAILED,
> +	FRAME_PREEMPTION_STATE_DONE,
> +	FRAME_PREEMPTION_STATE_START,
> +	FRAME_PREEMPTION_STATE_SENT,
> +};
> +
>  /* Board specific private data structure */
>  struct igc_adapter {
>  	struct net_device *netdev;
> @@ -240,6 +247,14 @@ struct igc_adapter {
>  		struct timespec64 start;
>  		struct timespec64 period;
>  	} perout[IGC_N_PEROUT];
> +
> +	struct delayed_work fp_verification_work;
> +	unsigned long fp_start;
> +	bool fp_received_smd_v;
> +	bool fp_received_smd_r;
> +	unsigned int fp_verify_cnt;
> +	enum frame_preemption_state fp_tx_state;
> +	bool fp_disable_verify;
>  };
>  
>  void igc_up(struct igc_adapter *adapter);
> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
> index a2ea057d8e6e..cf46f5d5a505 100644
> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
> @@ -268,6 +268,8 @@
>  #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
>  #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
>  #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
> +#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
> +#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
>  #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
>  #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
>  #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
> @@ -327,9 +329,20 @@
>  
>  #define IGC_RXDEXT_STATERR_LB	0x00040000
>  
> +#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
> +#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
> +

So the i225 gives you the ability to select from multiple
Start-of-mPacket-Delimiter values on a per-TX descriptor basis?
And this is in addition to configuring that TX ring as preemptible, I
guess? Because I notice that you're sending on the TX ring affine to the
current CPU that the verification work item is running on (and you don't
check anywhere whether it is configured as going to the pMAC or not).
And on RX, it always gives you the kind of SMD that the packet had
(including the classic SFD for express packets)?
Cool.

It would be nice if I could connect back to back an i225 board with an
NXP LS1028A to see if the verification state machines pass both ways (on
LS1028A it is 100% hardware based, we just enable/disable the feature
and we can monitor the state changes via an interrupt).

>  /* Advanced Receive Descriptor bit definitions */
>  #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
>  
> +#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
> +#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
> +
> +#define IGC_SMD_TYPE_SFD		0x0
> +#define IGC_SMD_TYPE_SMD_V		0x1
> +#define IGC_SMD_TYPE_SMD_R		0x2
> +#define IGC_SMD_TYPE_COMPLETE		0x3
> +
>  #define IGC_RXDEXT_STATERR_L4E		0x20000000
>  #define IGC_RXDEXT_STATERR_IPE		0x40000000
>  #define IGC_RXDEXT_STATERR_RXE		0x80000000
> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> index 84d5afe92154..f52a7be3af66 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> @@ -1649,6 +1649,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
>  
>  	fpcmd->enabled = adapter->frame_preemption_active;
>  	fpcmd->add_frag_size = adapter->add_frag_size;
> +	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
> +	fpcmd->disable_verify = adapter->fp_disable_verify;
>  
>  	return 0;
>  }
> @@ -1664,10 +1666,22 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
>  		return -EINVAL;
>  	}
>  
> -	adapter->frame_preemption_active = fpcmd->enabled;
> -	adapter->add_frag_size = fpcmd->add_frag_size;
> +	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +		schedule_delayed_work(&adapter->fp_verification_work, msecs_to_jiffies(10));

Not sure how much you'd like to tune this, but the spec has a
configurable verifyTime between 1 ms and 128 ms. You chose the default
value, so we should be ok for now.
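
If that ever needs to become tunable, it would probably be as simple as
something like the below (the field name and where the value comes from
are made up; the 1..128 ms bounds are the verifyTime range from the
spec):

	adapter->fp_verify_time_ms = clamp_t(u32, verify_time_ms, 1, 128);

	schedule_delayed_work(&adapter->fp_verification_work,
			      msecs_to_jiffies(adapter->fp_verify_time_ms));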

> +	}
>  
> -	return igc_tsn_offload_apply(adapter);
> +	adapter->fp_disable_verify = fpcmd->disable_verify;
> +
> +	if (adapter->frame_preemption_active != fpcmd->enabled ||
> +	    adapter->add_frag_size != fpcmd->add_frag_size) {
> +		adapter->frame_preemption_active = fpcmd->enabled;
> +		adapter->add_frag_size = fpcmd->add_frag_size;
> +
> +		return igc_tsn_offload_apply(adapter);
> +	}
> +
> +	return 0;
>  }
>  
>  static int igc_ethtool_begin(struct net_device *netdev)
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 20dac04a02f2..ed55bd13e4a1 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -28,6 +28,11 @@
>  #define IGC_XDP_TX		BIT(1)
>  #define IGC_XDP_REDIRECT	BIT(2)
>  
> +#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
> +#define IGC_MAX_VERIFY_CNT 3
> +
> +#define IGC_FP_SMD_FRAME_SIZE 60
> +
>  static int debug = -1;
>  
>  MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
> @@ -2169,6 +2174,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
>  	return 0;
>  }
>  
> +static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
> +				 struct sk_buff *skb)
> +{
> +	dma_addr_t dma;
> +	unsigned int size;
> +
> +	size = skb_headlen(skb);
> +
> +	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
> +	if (dma_mapping_error(ring->dev, dma)) {
> +		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
> +		return -ENOMEM;
> +	}
> +
> +	buffer->skb = skb;
> +	buffer->protocol = 0;
> +	buffer->bytecount = skb->len;
> +	buffer->gso_segs = 1;
> +	buffer->time_stamp = jiffies;
> +	dma_unmap_len_set(buffer, len, skb->len);
> +	dma_unmap_addr_set(buffer, dma, dma);
> +
> +	return 0;
> +}
> +
> +static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
> +				     struct sk_buff *skb, int type)
> +{
> +	struct igc_tx_buffer *buffer;
> +	union igc_adv_tx_desc *desc;
> +	u32 cmd_type, olinfo_status;
> +	int err;
> +
> +	if (!igc_desc_unused(ring))
> +		return -EBUSY;
> +
> +	buffer = &ring->tx_buffer_info[ring->next_to_use];
> +	err = igc_fp_init_smd_frame(ring, buffer, skb);
> +	if (err)
> +		return err;
> +
> +	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
> +		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
> +		   buffer->bytecount;
> +	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
> +
> +	switch (type) {
> +	case IGC_SMD_TYPE_SMD_V:
> +		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
> +		break;
> +	case IGC_SMD_TYPE_SMD_R:
> +		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	desc = IGC_TX_DESC(ring, ring->next_to_use);
> +	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> +	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> +	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
> +
> +	netdev_tx_sent_queue(txring_txq(ring), skb->len);
> +
> +	buffer->next_to_watch = desc;
> +
> +	ring->next_to_use++;
> +	if (ring->next_to_use == ring->count)
> +		ring->next_to_use = 0;
> +
> +	return 0;
> +}
> +
>  static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
>  					    int cpu)
>  {
> @@ -2299,6 +2377,19 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
>  	q_vector->rx.total_bytes += bytes;
>  }
>  
> +static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
> +{
> +	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
> +
> +	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
> +		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
> +}
> +
> +static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer, unsigned int size)
> +{
> +	return size == 60;

You should probably also verify that the contents are 60 octets of zeroes (sans the mCRC)?
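
Something along these lines would probably do (rough sketch; it assumes
the buffer has already been synced for CPU access, the way the normal RX
path does before touching the data):

	static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer,
					unsigned int size)
	{
		void *data = page_address(rx_buffer->page) + rx_buffer->page_offset;

		/* SMD-V/SMD-R frames are 60 zero octets, mCRC already stripped */
		return size == IGC_FP_SMD_FRAME_SIZE &&
		       !memchr_inv(data, 0, size);
	}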

> +}
> +
>  static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  {
>  	unsigned int total_bytes = 0, total_packets = 0;
> @@ -2315,6 +2406,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  		ktime_t timestamp = 0;
>  		struct xdp_buff xdp;
>  		int pkt_offset = 0;
> +		int smd_type;
>  		void *pktbuf;
>  
>  		/* return some buffers to hardware, one at a time is too slow */
> @@ -2346,6 +2438,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  			size -= IGC_TS_HDR_LEN;
>  		}
>  
> +		smd_type = igc_rx_desc_smd_type(rx_desc);
> +
> +		if (smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R) {

I guess the performance people will love you for this change. You should
probably guard it by an "if (unlikely(disableVerify == false))" condition.

> +			if (igc_check_smd_frame(rx_buffer, size)) {
> +				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
> +				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
> +				schedule_delayed_work(&adapter->fp_verification_work, 0);
> +			}
> +
> +			/* Advance the ring next-to-clean */
> +			igc_is_non_eop(rx_ring, rx_desc);
> +
> +			cleaned_count++;
> +			continue;
> +		}
> +
>  		if (!skb) {
>  			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>  			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
> @@ -5607,6 +5715,107 @@ static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
>  	return igc_tsn_offload_apply(adapter);
>  }
>  
> +/* I225 doesn't send the SMD frames automatically, we need to handle
> + * them ourselves.
> + */
> +static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
> +{
> +	int cpu = smp_processor_id();
> +	struct netdev_queue *nq;
> +	struct igc_ring *ring;
> +	struct sk_buff *skb;
> +	void *data;
> +	int err;
> +
> +	if (!netif_running(adapter->netdev))
> +		return -ENOTCONN;
> +
> +	/* FIXME: rename this function to something less specific, as
> +	 * it can be used outside XDP.
> +	 */
> +	ring = igc_xdp_get_tx_ring(adapter, cpu);
> +	nq = txring_txq(ring);
> +
> +	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return -ENOMEM;
> +
> +	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
> +	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
> +
> +	__netif_tx_lock(nq, cpu);
> +
> +	err = igc_fp_init_tx_descriptor(ring, skb, type);
> +
> +	igc_flush_tx_descriptors(ring);
> +
> +	__netif_tx_unlock(nq);
> +
> +	return err;
> +}
> +
> +static void igc_fp_verification_work(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct igc_adapter *adapter;
> +	int err;
> +
> +	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
> +
> +	if (adapter->fp_disable_verify)
> +		goto done;
> +
> +	switch (adapter->fp_tx_state) {
> +	case FRAME_PREEMPTION_STATE_START:
> +		adapter->fp_received_smd_r = false;
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");

On TX error should you really advance to the STATE_SENT?
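
i.e. maybe something like (sketch, reusing only what the patch already
defines):

		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
		if (err < 0) {
			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
			/* stay in FRAME_PREEMPTION_STATE_START and retry later */
			schedule_delayed_work(&adapter->fp_verification_work,
					      IGC_FP_TIMEOUT);
			break;
		}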

> +
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
> +		adapter->fp_start = jiffies;
> +		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_SENT:
> +		if (adapter->fp_received_smd_r) {
> +			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
> +			adapter->fp_received_smd_r = false;
> +			break;
> +		}
> +
> +		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
> +			adapter->fp_verify_cnt++;
> +			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
> +
> +			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
> +				adapter->fp_verify_cnt = 0;
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
> +				netdev_err(adapter->netdev,
> +					   "Exceeded number of attempts for frame preemption verification\n");
> +			} else {
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +			}
> +			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		}
> +
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_FAILED:
> +	case FRAME_PREEMPTION_STATE_DONE:
> +		break;
> +	}
> +
> +done:
> +	if (adapter->fp_received_smd_v) {
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
> +
> +		adapter->fp_received_smd_v = false;
> +	}
> +}
> +
>  static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			void *type_data)
>  {
> @@ -6023,6 +6232,7 @@ static int igc_probe(struct pci_dev *pdev,
>  
>  	INIT_WORK(&adapter->reset_task, igc_reset_task);
>  	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
> +	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
>  
>  	/* Initialize link properties that are user-changeable */
>  	adapter->fc_autoneg = true;
> @@ -6044,6 +6254,12 @@ static int igc_probe(struct pci_dev *pdev,
>  
>  	igc_ptp_init(adapter);
>  
> +	/* FIXME: This sets the default to not do the verification
> +	 * automatically, when we have support in multiple
> +	 * controllers, this default can be changed.
> +	 */
> +	adapter->fp_disable_verify = true;
> +

Hmmmmm. So we need to instruct our users to explicitly enable
verification in their ethtool-based scripts, since the default values
will vary wildly from one vendor to another. On LS1028A I see no reason
why verification would be disabled by default.

>  	/* reset the hardware with the new settings */
>  	igc_reset(adapter);
>  
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-wired-lan] [PATCH net-next v4 12/12] igc: Add support for Frame Preemption verification
@ 2021-06-28  9:59     ` Vladimir Oltean
  0 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2021-06-28  9:59 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Jun 25, 2021 at 05:33:14PM -0700, Vinicius Costa Gomes wrote:
> Add support for sending/receiving Frame Preemption verification
> frames.
> 
> The i225 hardware doesn't implement the process of verification
> internally, this is left to the driver.
> 
> Add a simple implementation of the state machine defined in IEEE
> 802.3-2018, Section 99.4.7.
> 
> For now, the state machine is started manually by the user, when
> enabling verification. Example:
> 
> $ ethtool --set-frame-preemption IFACE disable-verify off
> 
> The "verified" condition is set to true when the SMD-V frame is sent,
> and the SMD-R frame is received. So, it only tracks the transmission
> side. This seems to be what's expected from IEEE 802.3-2018.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc.h         |  15 ++
>  drivers/net/ethernet/intel/igc/igc_defines.h |  13 ++
>  drivers/net/ethernet/intel/igc/igc_ethtool.c |  20 +-
>  drivers/net/ethernet/intel/igc/igc_main.c    | 216 +++++++++++++++++++
>  4 files changed, 261 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 9b2ddcbf65fb..84234efed781 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -122,6 +122,13 @@ struct igc_ring {
>  	struct xsk_buff_pool *xsk_pool;
>  } ____cacheline_internodealigned_in_smp;
>  
> +enum frame_preemption_state {
> +	FRAME_PREEMPTION_STATE_FAILED,
> +	FRAME_PREEMPTION_STATE_DONE,
> +	FRAME_PREEMPTION_STATE_START,
> +	FRAME_PREEMPTION_STATE_SENT,
> +};
> +
>  /* Board specific private data structure */
>  struct igc_adapter {
>  	struct net_device *netdev;
> @@ -240,6 +247,14 @@ struct igc_adapter {
>  		struct timespec64 start;
>  		struct timespec64 period;
>  	} perout[IGC_N_PEROUT];
> +
> +	struct delayed_work fp_verification_work;
> +	unsigned long fp_start;
> +	bool fp_received_smd_v;
> +	bool fp_received_smd_r;
> +	unsigned int fp_verify_cnt;
> +	enum frame_preemption_state fp_tx_state;
> +	bool fp_disable_verify;
>  };
>  
>  void igc_up(struct igc_adapter *adapter);
> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
> index a2ea057d8e6e..cf46f5d5a505 100644
> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
> @@ -268,6 +268,8 @@
>  #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
>  #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
>  #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
> +#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
> +#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
>  #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
>  #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
>  #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
> @@ -327,9 +329,20 @@
>  
>  #define IGC_RXDEXT_STATERR_LB	0x00040000
>  
> +#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
> +#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
> +

So the i225 gives you the ability to select from multiple
Start-of-mPacket-Delimiter values on a per-TX descriptor basis?
And this is in addition to configuring that TX ring as preemptible, I
guess? Because I notice that you're sending on the TX ring affine to the
CPU that the verification work item happens to be running on (and you
don't check anywhere whether that ring is configured as going to the
pMAC or not).
And on RX, it always gives you the kind of SMD that the packet had
(including the classic SFD for express packets)?
Cool.

It would be nice if I could connect back to back an i225 board with an
NXP LS1028A to see if the verification state machines pass both ways (on
LS1028A it is 100% hardware based, we just enable/disable the feature
and we can monitor the state changes via an interrupt).

>  /* Advanced Receive Descriptor bit definitions */
>  #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
>  
> +#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
> +#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
> +
> +#define IGC_SMD_TYPE_SFD		0x0
> +#define IGC_SMD_TYPE_SMD_V		0x1
> +#define IGC_SMD_TYPE_SMD_R		0x2
> +#define IGC_SMD_TYPE_COMPLETE		0x3
> +
>  #define IGC_RXDEXT_STATERR_L4E		0x20000000
>  #define IGC_RXDEXT_STATERR_IPE		0x40000000
>  #define IGC_RXDEXT_STATERR_RXE		0x80000000
> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> index 84d5afe92154..f52a7be3af66 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> @@ -1649,6 +1649,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
>  
>  	fpcmd->enabled = adapter->frame_preemption_active;
>  	fpcmd->add_frag_size = adapter->add_frag_size;
> +	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
> +	fpcmd->disable_verify = adapter->fp_disable_verify;
>  
>  	return 0;
>  }
> @@ -1664,10 +1666,22 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
>  		return -EINVAL;
>  	}
>  
> -	adapter->frame_preemption_active = fpcmd->enabled;
> -	adapter->add_frag_size = fpcmd->add_frag_size;
> +	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +		schedule_delayed_work(&adapter->fp_verification_work, msecs_to_jiffies(10));

Not sure how much you'd like to tune this, but the spec has a
configurable verifyTime between 1 ms and 128 ms. You chose the default
value, so we should be ok for now.
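
If you ever want to expose it, it would just be one more clamped value,
something like this (assuming a u32 fp_verify_time_ms field coming from
ethtool, name made up, untested):

	/* verifyTime from 802.3br is 1 ms .. 128 ms */
	unsigned int verify_ms = clamp(adapter->fp_verify_time_ms, 1u, 128u);

	schedule_delayed_work(&adapter->fp_verification_work,
			      msecs_to_jiffies(verify_ms));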

> +	}
>  
> -	return igc_tsn_offload_apply(adapter);
> +	adapter->fp_disable_verify = fpcmd->disable_verify;
> +
> +	if (adapter->frame_preemption_active != fpcmd->enabled ||
> +	    adapter->add_frag_size != fpcmd->add_frag_size) {
> +		adapter->frame_preemption_active = fpcmd->enabled;
> +		adapter->add_frag_size = fpcmd->add_frag_size;
> +
> +		return igc_tsn_offload_apply(adapter);
> +	}
> +
> +	return 0;
>  }
>  
>  static int igc_ethtool_begin(struct net_device *netdev)
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 20dac04a02f2..ed55bd13e4a1 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -28,6 +28,11 @@
>  #define IGC_XDP_TX		BIT(1)
>  #define IGC_XDP_REDIRECT	BIT(2)
>  
> +#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
> +#define IGC_MAX_VERIFY_CNT 3
> +
> +#define IGC_FP_SMD_FRAME_SIZE 60
> +
>  static int debug = -1;
>  
>  MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
> @@ -2169,6 +2174,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
>  	return 0;
>  }
>  
> +static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
> +				 struct sk_buff *skb)
> +{
> +	dma_addr_t dma;
> +	unsigned int size;
> +
> +	size = skb_headlen(skb);
> +
> +	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
> +	if (dma_mapping_error(ring->dev, dma)) {
> +		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
> +		return -ENOMEM;
> +	}
> +
> +	buffer->skb = skb;
> +	buffer->protocol = 0;
> +	buffer->bytecount = skb->len;
> +	buffer->gso_segs = 1;
> +	buffer->time_stamp = jiffies;
> +	dma_unmap_len_set(buffer, len, skb->len);
> +	dma_unmap_addr_set(buffer, dma, dma);
> +
> +	return 0;
> +}
> +
> +static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
> +				     struct sk_buff *skb, int type)
> +{
> +	struct igc_tx_buffer *buffer;
> +	union igc_adv_tx_desc *desc;
> +	u32 cmd_type, olinfo_status;
> +	int err;
> +
> +	if (!igc_desc_unused(ring))
> +		return -EBUSY;
> +
> +	buffer = &ring->tx_buffer_info[ring->next_to_use];
> +	err = igc_fp_init_smd_frame(ring, buffer, skb);
> +	if (err)
> +		return err;
> +
> +	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
> +		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
> +		   buffer->bytecount;
> +	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
> +
> +	switch (type) {
> +	case IGC_SMD_TYPE_SMD_V:
> +		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
> +		break;
> +	case IGC_SMD_TYPE_SMD_R:
> +		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	desc = IGC_TX_DESC(ring, ring->next_to_use);
> +	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> +	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> +	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
> +
> +	netdev_tx_sent_queue(txring_txq(ring), skb->len);
> +
> +	buffer->next_to_watch = desc;
> +
> +	ring->next_to_use++;
> +	if (ring->next_to_use == ring->count)
> +		ring->next_to_use = 0;
> +
> +	return 0;
> +}
> +
>  static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
>  					    int cpu)
>  {
> @@ -2299,6 +2377,19 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
>  	q_vector->rx.total_bytes += bytes;
>  }
>  
> +static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
> +{
> +	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
> +
> +	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
> +		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
> +}
> +
> +static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer, unsigned int size)
> +{
> +	return size == 60;

You should probably also verify that the contents are 60 octets of zeroes (sans the mCRC)?
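
Something along these lines is what I mean (completely untested sketch,
reusing the same buffer access that igc_clean_rx_irq() already does):

static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer, unsigned int size)
{
	/* The SMD-V/SMD-R frames generated by this driver are 60 octets
	 * of zeroes, so reject anything that doesn't look like that.
	 */
	void *buf = page_address(rx_buffer->page) + rx_buffer->page_offset;

	return size == 60 && !memchr_inv(buf, 0, size);
}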

> +}
> +
>  static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  {
>  	unsigned int total_bytes = 0, total_packets = 0;
> @@ -2315,6 +2406,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  		ktime_t timestamp = 0;
>  		struct xdp_buff xdp;
>  		int pkt_offset = 0;
> +		int smd_type;
>  		void *pktbuf;
>  
>  		/* return some buffers to hardware, one at a time is too slow */
> @@ -2346,6 +2438,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  			size -= IGC_TS_HDR_LEN;
>  		}
>  
> +		smd_type = igc_rx_desc_smd_type(rx_desc);
> +
> +		if (smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R) {

I guess the performance people will love you for this change. You should
probably guard it with an "if (unlikely(disableVerify == false))" condition.
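
Something like this is what I have in mind (sketch only):

		smd_type = igc_rx_desc_smd_type(rx_desc);

		/* Only pay for the SMD handling when verification is
		 * actually in use; the express/SFD fast path is untouched.
		 */
		if (unlikely(!adapter->fp_disable_verify) &&
		    (smd_type == IGC_SMD_TYPE_SMD_V ||
		     smd_type == IGC_SMD_TYPE_SMD_R)) {
			/* existing SMD-V/SMD-R handling from the patch */
		}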

> +			if (igc_check_smd_frame(rx_buffer, size)) {
> +				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
> +				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
> +				schedule_delayed_work(&adapter->fp_verification_work, 0);
> +			}
> +
> +			/* Advance the ring next-to-clean */
> +			igc_is_non_eop(rx_ring, rx_desc);
> +
> +			cleaned_count++;
> +			continue;
> +		}
> +
>  		if (!skb) {
>  			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>  			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
> @@ -5607,6 +5715,107 @@ static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
>  	return igc_tsn_offload_apply(adapter);
>  }
>  
> +/* I225 doesn't send the SMD frames automatically, we need to handle
> + * them ourselves.
> + */
> +static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
> +{
> +	int cpu = smp_processor_id();
> +	struct netdev_queue *nq;
> +	struct igc_ring *ring;
> +	struct sk_buff *skb;
> +	void *data;
> +	int err;
> +
> +	if (!netif_running(adapter->netdev))
> +		return -ENOTCONN;
> +
> +	/* FIXME: rename this function to something less specific, as
> +	 * it can be used outside XDP.
> +	 */
> +	ring = igc_xdp_get_tx_ring(adapter, cpu);
> +	nq = txring_txq(ring);
> +
> +	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return -ENOMEM;
> +
> +	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
> +	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
> +
> +	__netif_tx_lock(nq, cpu);
> +
> +	err = igc_fp_init_tx_descriptor(ring, skb, type);
> +
> +	igc_flush_tx_descriptors(ring);
> +
> +	__netif_tx_unlock(nq);
> +
> +	return err;
> +}
> +
> +static void igc_fp_verification_work(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct igc_adapter *adapter;
> +	int err;
> +
> +	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
> +
> +	if (adapter->fp_disable_verify)
> +		goto done;
> +
> +	switch (adapter->fp_tx_state) {
> +	case FRAME_PREEMPTION_STATE_START:
> +		adapter->fp_received_smd_r = false;
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");

On TX error should you really advance to the STATE_SENT?
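
I would have expected the START case to stay put and retry on failure,
roughly (untested sketch):

	case FRAME_PREEMPTION_STATE_START:
		adapter->fp_received_smd_r = false;
		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
		if (err < 0) {
			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
			/* stay in START, try again on the next run */
			schedule_delayed_work(&adapter->fp_verification_work,
					      IGC_FP_TIMEOUT);
			break;
		}

		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
		adapter->fp_start = jiffies;
		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
		break;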

> +
> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
> +		adapter->fp_start = jiffies;
> +		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_SENT:
> +		if (adapter->fp_received_smd_r) {
> +			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
> +			adapter->fp_received_smd_r = false;
> +			break;
> +		}
> +
> +		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
> +			adapter->fp_verify_cnt++;
> +			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
> +
> +			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
> +				adapter->fp_verify_cnt = 0;
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
> +				netdev_err(adapter->netdev,
> +					   "Exceeded number of attempts for frame preemption verification\n");
> +			} else {
> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
> +			}
> +			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
> +		}
> +
> +		break;
> +
> +	case FRAME_PREEMPTION_STATE_FAILED:
> +	case FRAME_PREEMPTION_STATE_DONE:
> +		break;
> +	}
> +
> +done:
> +	if (adapter->fp_received_smd_v) {
> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
> +		if (err < 0)
> +			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
> +
> +		adapter->fp_received_smd_v = false;
> +	}
> +}
> +
>  static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
>  			void *type_data)
>  {
> @@ -6023,6 +6232,7 @@ static int igc_probe(struct pci_dev *pdev,
>  
>  	INIT_WORK(&adapter->reset_task, igc_reset_task);
>  	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
> +	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
>  
>  	/* Initialize link properties that are user-changeable */
>  	adapter->fc_autoneg = true;
> @@ -6044,6 +6254,12 @@ static int igc_probe(struct pci_dev *pdev,
>  
>  	igc_ptp_init(adapter);
>  
> +	/* FIXME: This sets the default to not do the verification
> +	 * automatically, when we have support in multiple
> +	 * controllers, this default can be changed.
> +	 */
> +	adapter->fp_disable_verify = true;
> +

Hmmmmm. So we need to instruct our users to explicitly enable
verification in their ethtool-based scripts, since the default values
will vary wildly from one vendor to another. On LS1028A I see no reason
why verification would be disabled by default.

>  	/* reset the hardware with the new settings */
>  	igc_reset(adapter);
>  
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 01/12] ethtool: Add support for configuring frame preemption
  2021-06-27 19:43     ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-04-11 22:39       ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2022-04-11 22:39 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

Hi,

(bringing an old thread back to life)

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Jun 25, 2021 at 05:33:03PM -0700, Vinicius Costa Gomes wrote:
>> Frame preemption (described in IEEE 802.3-2018, Section 99 in
>> particular) defines the concept of preemptible and express queues. It
>> allows traffic from express queues to "interrupt" traffic from
>> preemptible queues, which are "resumed" after the express traffic has
>> finished transmitting.
>> 
>> Frame preemption can only be used when both the local device and the
>> link partner support it.
>> 
>> Only parameters for enabling/disabling frame preemption and
>> configuring the minimum fragment size are included here. Expressing
>> which queues are marked as preemptible is left to mqprio/taprio, as
>> having that information there should be easier on the user.
>> 
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> ---
>>  Documentation/networking/ethtool-netlink.rst |  38 +++++
>>  include/linux/ethtool.h                      |  22 +++
>>  include/uapi/linux/ethtool_netlink.h         |  17 +++
>>  net/ethtool/Makefile                         |   2 +-
>>  net/ethtool/common.c                         |  25 ++++
>>  net/ethtool/netlink.c                        |  19 +++
>>  net/ethtool/netlink.h                        |   4 +
>>  net/ethtool/preempt.c                        | 146 +++++++++++++++++++
>>  8 files changed, 272 insertions(+), 1 deletion(-)
>>  create mode 100644 net/ethtool/preempt.c
>> 
>> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
>> index 6ea91e41593f..a87f1716944e 100644
>> --- a/Documentation/networking/ethtool-netlink.rst
>> +++ b/Documentation/networking/ethtool-netlink.rst
>> @@ -1477,6 +1477,44 @@ Low and high bounds are inclusive, for example:
>>   etherStatsPkts512to1023Octets 512  1023
>>   ============================= ==== ====
>
> I think you need to add some extra documentation bits to the
>
> List of message types
> =====================
>
> and
>
> Request translation
> ===================
>
> sections.
>

Will add some more documentation.

>>  
>> +PREEMPT_GET
>> +===========
>> +
>> +Get information about frame preemption state.
>> +
>> +Request contents:
>> +
>> +  ====================================  ======  ==========================
>> +  ``ETHTOOL_A_PREEMPT_HEADER``          nested  request header
>> +  ====================================  ======  ==========================
>> +
>> +Request contents:
>> +
>> +  =====================================  ======  ==========================
>> +  ``ETHTOOL_A_PREEMPT_HEADER``           nested  reply header
>> +  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
>> +  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
>> +  =====================================  ======  ==========================
>> +
>> +``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
>> +fragment size that the receiver device supports.
>> +
>> +PREEMPT_SET
>> +===========
>> +
>> +Sets frame preemption parameters.
>> +
>> +Request contents:
>> +
>> +  =====================================  ======  ==========================
>> +  ``ETHTOOL_A_CHANNELS_HEADER``          nested  reply header
>> +  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled
>> +  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size
>> +  =====================================  ======  ==========================
>> +
>> +``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final
>> +fragment size that the receiver device supports.
>> +
>>  Request translation
>>  ===================
>>  
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index 29dbb603bc91..7e449be8f335 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -409,6 +409,19 @@ struct ethtool_module_eeprom {
>>  	u8	*data;
>>  };
>>  
>> +/**
>> + * struct ethtool_fp - Frame Preemption information
>> + *
>> + * @enabled: Enable frame preemption.
>> + * @add_frag_size: Minimum size for additional (non-final) fragments
>> + * in bytes, for the value defined in the IEEE 802.3-2018 standard see
>> + * ethtool_frag_size_to_mult().
>> + */
>> +struct ethtool_fp {
>> +	u8 enabled;
>> +	u32 add_frag_size;
>
> Strange that the verify_disable bit is not in here? I haven't looked at
> further patches in detail but I saw in the commit message that you added
> support for it, maybe it needs to be squashed with this?

Will squash the commit that exposes verification config via netlink into this.

>
> Can we make "enabled" a bool?

It seems that the current convention is to use u32 to represent booleans
in the ethtool/netlink API. See ethnl_update_bool32(); I will use that instead.
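
So the struct would end up looking roughly like this (exact field names
still open, just a sketch):

struct ethtool_fp {
	u32 enabled;
	u32 add_frag_size;
	u32 disable_verify;
	u32 verified;	/* read-only: verification handshake completed */
};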

>
>> +};
>> +
>>  /**
>>   * struct ethtool_ops - optional netdev operations
>>   * @cap_link_lanes_supported: indicates if the driver supports lanes
>> @@ -561,6 +574,8 @@ struct ethtool_module_eeprom {
>>   *	not report statistics.
>>   * @get_fecparam: Get the network device Forward Error Correction parameters.
>>   * @set_fecparam: Set the network device Forward Error Correction parameters.
>> + * @get_preempt: Get the network device Frame Preemption parameters.
>> + * @set_preempt: Set the network device Frame Preemption parameters.
>>   * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
>>   *	This is only useful if the device maintains PHY statistics and
>>   *	cannot use the standard PHY library helpers.
>> @@ -675,6 +690,10 @@ struct ethtool_ops {
>>  				      struct ethtool_fecparam *);
>>  	int	(*set_fecparam)(struct net_device *,
>>  				      struct ethtool_fecparam *);
>> +	int	(*get_preempt)(struct net_device *,
>> +			       struct ethtool_fp *);
>> +	int	(*set_preempt)(struct net_device *, struct ethtool_fp *,
>> +			       struct netlink_ext_ack *);
>>  	void	(*get_ethtool_phy_stats)(struct net_device *,
>>  					 struct ethtool_stats *, u64 *);
>>  	int	(*get_phy_tunable)(struct net_device *,
>> @@ -766,4 +785,7 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
>>   * next string.
>>   */
>>  extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
>> +
>> +u8 ethtool_frag_size_to_mult(u32 frag_size);
>> +
>>  #endif /* _LINUX_ETHTOOL_H */
>> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
>> index c7135c9c37a5..4600aba1c693 100644
>> --- a/include/uapi/linux/ethtool_netlink.h
>> +++ b/include/uapi/linux/ethtool_netlink.h
>> @@ -44,6 +44,8 @@ enum {
>>  	ETHTOOL_MSG_TUNNEL_INFO_GET,
>>  	ETHTOOL_MSG_FEC_GET,
>>  	ETHTOOL_MSG_FEC_SET,
>> +	ETHTOOL_MSG_PREEMPT_GET,
>> +	ETHTOOL_MSG_PREEMPT_SET,
>>  	ETHTOOL_MSG_MODULE_EEPROM_GET,
>>  	ETHTOOL_MSG_STATS_GET,
>>  
>> @@ -86,6 +88,8 @@ enum {
>>  	ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,
>>  	ETHTOOL_MSG_FEC_GET_REPLY,
>>  	ETHTOOL_MSG_FEC_NTF,
>> +	ETHTOOL_MSG_PREEMPT_GET_REPLY,
>> +	ETHTOOL_MSG_PREEMPT_NTF,
>>  	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
>>  	ETHTOOL_MSG_STATS_GET_REPLY,
>
> Correct me if I'm wrong, but enums in uapi should always be added at the
> end, otherwise you break the values for user space binaries which use
> ETHTOOL_MSG_MODULE_EEPROM_GET and are compiled against old kernel
> headers.

Fixed.
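
(For reference, the new values now sit at the tail of both enums, along
the lines of:

 	ETHTOOL_MSG_MODULE_EEPROM_GET,
 	ETHTOOL_MSG_STATS_GET,
+	ETHTOOL_MSG_PREEMPT_GET,
+	ETHTOOL_MSG_PREEMPT_SET,

and the same for ETHTOOL_MSG_PREEMPT_GET_REPLY / ETHTOOL_MSG_PREEMPT_NTF
after ETHTOOL_MSG_STATS_GET_REPLY.)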

>
>>  
>> @@ -664,6 +668,19 @@ enum {
>>  	ETHTOOL_A_FEC_STAT_MAX = (__ETHTOOL_A_FEC_STAT_CNT - 1)
>>  };
>>  
>> +/* FRAME PREEMPTION */
>> +
>> +enum {
>> +	ETHTOOL_A_PREEMPT_UNSPEC,
>> +	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */
>> +	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */
>> +	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */
>> +
>> +	/* add new constants above here */
>> +	__ETHTOOL_A_PREEMPT_CNT,
>> +	ETHTOOL_A_PREEMPT_MAX = (__ETHTOOL_A_PREEMPT_CNT - 1)
>> +};
>> +
>>  /* MODULE EEPROM */
>>  
>>  enum {
>> diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
>> index 723c9a8a8cdf..4b84b2d34c7a 100644
>> --- a/net/ethtool/Makefile
>> +++ b/net/ethtool/Makefile
>> @@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o
>>  ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
>>  		   linkstate.o debug.o wol.o features.o privflags.o rings.o \
>>  		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
>> -		   tunnels.o fec.o eeprom.o stats.o
>> +		   tunnels.o fec.o preempt.o eeprom.o stats.o
>> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
>> index f9dcbad84788..68d123dd500b 100644
>> --- a/net/ethtool/common.c
>> +++ b/net/ethtool/common.c
>> @@ -579,3 +579,28 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,
>>  	link_ksettings->base.duplex = link_info->duplex;
>>  }
>>  EXPORT_SYMBOL_GPL(ethtool_params_from_link_mode);
>> +
>> +/**
>> + * ethtool_frag_size_to_mult() - Convert from a Frame Preemption
>> + * Additional Fragment size in bytes to a multiplier.
>> + * @frag_size: minimum non-final fragment size in bytes.
>> + *
>> + * The multiplier is defined as:
>> + *	"A 2-bit integer value used to indicate the minimum size of
>> + *	non-final fragments supported by the receiver on the given port
>> + *	associated with the local System. This value is expressed in units
>> + *	of 64 octets of additional fragment length."
>> + *	Equivalent to `30.14.1.7 aMACMergeAddFragSize` from the IEEE 802.3-2018
>> + *	standard.
>> + *
>> + * Return: the multiplier is a number in the [0, 2] interval.
>> + */
>> +u8 ethtool_frag_size_to_mult(u32 frag_size)
>> +{
>> +	u8 mult = (frag_size / 64) - 1;
>> +
>> +	mult = clamp_t(u8, mult, 0, 3);
>> +
>> +	return mult;
>
> I think it would look better as "return clamp_t(u8, mult, 0, 3);"

Fixed.

>
>> +}
>> +EXPORT_SYMBOL_GPL(ethtool_frag_size_to_mult);
>> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
>> index a7346346114f..f4e07b740790 100644
>> --- a/net/ethtool/netlink.c
>> +++ b/net/ethtool/netlink.c
>> @@ -246,6 +246,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
>>  	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,
>>  	[ETHTOOL_MSG_FEC_GET]		= &ethnl_fec_request_ops,
>>  	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,
>> +	[ETHTOOL_MSG_PREEMPT_GET]	= &ethnl_preempt_request_ops,
>>  	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,
>>  	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,
>>  };
>> @@ -561,6 +562,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {
>>  	[ETHTOOL_MSG_PAUSE_NTF]		= &ethnl_pause_request_ops,
>>  	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,
>>  	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,
>> +	[ETHTOOL_MSG_PREEMPT_NTF]	= &ethnl_preempt_request_ops,
>>  };
>>  
>>  /* default notification handler */
>> @@ -654,6 +656,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
>>  	[ETHTOOL_MSG_PAUSE_NTF]		= ethnl_default_notify,
>>  	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,
>>  	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,
>> +	[ETHTOOL_MSG_PREEMPT_NTF]	= ethnl_default_notify,
>>  };
>>  
>>  void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)
>> @@ -958,6 +961,22 @@ static const struct genl_ops ethtool_genl_ops[] = {
>>  		.policy = ethnl_stats_get_policy,
>>  		.maxattr = ARRAY_SIZE(ethnl_stats_get_policy) - 1,
>>  	},
>> +	{
>> +		.cmd	= ETHTOOL_MSG_PREEMPT_GET,
>> +		.doit	= ethnl_default_doit,
>> +		.start	= ethnl_default_start,
>> +		.dumpit	= ethnl_default_dumpit,
>> +		.done	= ethnl_default_done,
>> +		.policy = ethnl_preempt_get_policy,
>> +		.maxattr = ARRAY_SIZE(ethnl_preempt_get_policy) - 1,
>> +	},
>> +	{
>> +		.cmd	= ETHTOOL_MSG_PREEMPT_SET,
>> +		.flags	= GENL_UNS_ADMIN_PERM,
>> +		.doit	= ethnl_set_preempt,
>> +		.policy = ethnl_preempt_set_policy,
>> +		.maxattr = ARRAY_SIZE(ethnl_preempt_set_policy) - 1,
>> +	},
>>  };
>>  
>>  static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
>> diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
>> index 3e25a47fd482..cc90a463a81c 100644
>> --- a/net/ethtool/netlink.h
>> +++ b/net/ethtool/netlink.h
>> @@ -345,6 +345,7 @@ extern const struct ethnl_request_ops ethnl_pause_request_ops;
>>  extern const struct ethnl_request_ops ethnl_eee_request_ops;
>>  extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;
>>  extern const struct ethnl_request_ops ethnl_fec_request_ops;
>> +extern const struct ethnl_request_ops ethnl_preempt_request_ops;
>>  extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;
>>  extern const struct ethnl_request_ops ethnl_stats_request_ops;
>>  
>> @@ -381,6 +382,8 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF
>>  extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
>>  extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
>>  extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
>> +extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];
>> +extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];
>>  extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];
>>  
>>  int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);
>> @@ -400,6 +403,7 @@ int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);
>>  int ethnl_tunnel_info_start(struct netlink_callback *cb);
>>  int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
>>  int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);
>> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info);
>>  
>>  extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
>>  extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
>> diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c
>> new file mode 100644
>> index 000000000000..4f96d3c2b1d5
>> --- /dev/null
>> +++ b/net/ethtool/preempt.c
>> @@ -0,0 +1,146 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +#include "netlink.h"
>> +#include "common.h"
>> +
>> +struct preempt_req_info {
>> +	struct ethnl_req_info		base;
>> +};
>> +
>> +struct preempt_reply_data {
>> +	struct ethnl_reply_data		base;
>> +	struct ethtool_fp		fp;
>> +};
>> +
>> +#define PREEMPT_REPDATA(__reply_base) \
>> +	container_of(__reply_base, struct preempt_reply_data, base)
>> +
>> +const struct nla_policy
>> +ethnl_preempt_get_policy[] = {
>> +	[ETHTOOL_A_PREEMPT_HEADER]		= NLA_POLICY_NESTED(ethnl_header_policy),
>> +};
>> +
>> +static int preempt_prepare_data(const struct ethnl_req_info *req_base,
>> +				struct ethnl_reply_data *reply_base,
>> +				struct genl_info *info)
>> +{
>> +	struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
>> +	struct net_device *dev = reply_base->dev;
>> +	int ret;
>> +
>> +	if (!dev->ethtool_ops->get_preempt)
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = ethnl_ops_begin(dev);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ret = dev->ethtool_ops->get_preempt(dev, &data->fp);
>> +	ethnl_ops_complete(dev);
>> +
>> +	return ret;
>> +}
>> +
>> +static int preempt_reply_size(const struct ethnl_req_info *req_base,
>> +			      const struct ethnl_reply_data *reply_base)
>> +{
>> +	int len = 0;
>> +
>> +	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */
>> +	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */
>> +
>> +	return len;
>> +}
>> +
>> +static int preempt_fill_reply(struct sk_buff *skb,
>> +			      const struct ethnl_req_info *req_base,
>> +			      const struct ethnl_reply_data *reply_base)
>> +{
>> +	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);
>> +	const struct ethtool_fp *preempt = &data->fp;
>> +
>> +	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_ENABLED, preempt->enabled))
>> +		return -EMSGSIZE;
>> +
>> +	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,
>> +			preempt->add_frag_size))
>> +		return -EMSGSIZE;
>> +
>> +	return 0;
>> +}
>> +
>> +const struct ethnl_request_ops ethnl_preempt_request_ops = {
>> +	.request_cmd		= ETHTOOL_MSG_PREEMPT_GET,
>> +	.reply_cmd		= ETHTOOL_MSG_PREEMPT_GET_REPLY,
>> +	.hdr_attr		= ETHTOOL_A_PREEMPT_HEADER,
>> +	.req_info_size		= sizeof(struct preempt_req_info),
>> +	.reply_data_size	= sizeof(struct preempt_reply_data),
>> +
>> +	.prepare_data		= preempt_prepare_data,
>> +	.reply_size		= preempt_reply_size,
>> +	.fill_reply		= preempt_fill_reply,
>> +};
>> +
>> +const struct nla_policy
>> +ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {
>> +	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),
>> +	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),
>> +	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },
>> +};
>> +
>> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +	struct ethnl_req_info req_info = {};
>> +	struct nlattr **tb = info->attrs;
>> +	struct ethtool_fp preempt = {};
>> +	struct net_device *dev;
>> +	bool mod = false;
>> +	int ret;
>> +
>> +	ret = ethnl_parse_header_dev_get(&req_info,
>> +					 tb[ETHTOOL_A_PREEMPT_HEADER],
>> +					 genl_info_net(info), info->extack,
>> +					 true);
>> +	if (ret < 0)
>> +		return ret;
>> +	dev = req_info.dev;
>> +	ret = -EOPNOTSUPP;
>
> Some new lines around here please? And maybe it would look a bit cleaner
> if you could assign "ret = -EOPNOTSUPP" in the "preempt ops not present"
> if condition body?
>

I will add more vertical whitespace. About the error-return idioms: even
if they are not my preferred style, they are consistent with the other
files in net/ethtool, so I will keep them as they are.

>> +	if (!dev->ethtool_ops->get_preempt ||
>> +	    !dev->ethtool_ops->set_preempt)
>> +		goto out_dev;
>> +
>> +	rtnl_lock();
>> +	ret = ethnl_ops_begin(dev);
>> +	if (ret < 0)
>> +		goto out_rtnl;
>> +
>> +	ret = dev->ethtool_ops->get_preempt(dev, &preempt);
>
> I don't know much about the background of ethtool netlink, but why does
> the .doit of ETHTOOL_MSG_*_SET go through a getter first? Is it because
> all the netlink attributes from the message are optional, and we need to
> default to the current state?
>

Yes, and there's the "optimization" that the setter will only be called
if there's any modification, so we need to know the current state.

>> +	if (ret < 0) {
>> +		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");
>> +		goto out_ops;
>> +	}
>> +
>> +	ethnl_update_u8(&preempt.enabled,
>> +			tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);
>> +	ethnl_update_u32(&preempt.add_frag_size,
>> +			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);
>> +	ret = 0;
>
> This reinitialization of ret to zero is interesting. It implies
> ->get_preempt() is allowed to return > 0 as a success error code.
> However ->set_preempt() below isn't? (its return value is directly
> propagated to callers of ethnl_set_preempt()).
>

It also applies to other "commands" in net/ethtool. My feeling is that
this is more like an undocumented convention than a bug (following what
the first command did). I will leave it as it is, unless there are strong
feelings.


>> +	if (!mod)
>> +		goto out_ops;
>> +
>> +	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);
>> +	if (ret < 0) {
>> +		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");
>> +		goto out_ops;
>> +	}
>> +
>> +	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);
>> +
>> +out_ops:
>> +	ethnl_ops_complete(dev);
>> +out_rtnl:
>> +	rtnl_unlock();
>> +out_dev:
>> +	dev_put(dev);
>> +	return ret;
>> +}
>> -- 
>> 2.32.0
>> 


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 02/12] taprio: Add support for frame preemption offload
  2021-06-27 19:58     ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-04-11 23:31       ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2022-04-11 23:31 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Jun 25, 2021 at 05:33:04PM -0700, Vinicius Costa Gomes wrote:
>> Adds a way to configure which traffic classes are marked as
>> preemptible and which are marked as express.
>> 
>> Even if frame preemption is not a "real" offload, because it can't be
>> executed purely in software, having this information near where the
>> mapping of traffic classes to queues is specified, makes it,
>> hopefully, easier to use.
>> 
>> taprio will receive the information of which traffic classes are
>> marked as express/preemptible, and when offloading frame preemption to
>> the driver will convert the information, so the driver receives which
>> queues are marked as express/preemptible.
>> 
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> ---
>>  include/linux/netdevice.h      |  1 +
>>  include/net/pkt_sched.h        |  4 ++++
>>  include/uapi/linux/pkt_sched.h |  1 +
>>  net/sched/sch_taprio.c         | 43 ++++++++++++++++++++++++++++++----
>>  4 files changed, 44 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index be1dcceda5e4..af5d4c5b0ad5 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -923,6 +923,7 @@ enum tc_setup_type {
>>  	TC_SETUP_QDISC_TBF,
>>  	TC_SETUP_QDISC_FIFO,
>>  	TC_SETUP_QDISC_HTB,
>> +	TC_SETUP_PREEMPT,
>>  };
>>  
>>  /* These structures hold the attributes of bpf state that are being passed
>> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
>> index 6d7b12cba015..b4cb479d1cf5 100644
>> --- a/include/net/pkt_sched.h
>> +++ b/include/net/pkt_sched.h
>> @@ -178,6 +178,10 @@ struct tc_taprio_qopt_offload {
>>  	struct tc_taprio_sched_entry entries[];
>>  };
>>  
>> +struct tc_preempt_qopt_offload {
>> +	u32 preemptible_queues;
>> +};
>> +
>>  /* Reference counting */
>>  struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
>>  						  *offload);
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 79a699f106b1..830ce9c9ec6f 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -1241,6 +1241,7 @@ enum {
>>  	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
>>  	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
>>  	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
>> +	TCA_TAPRIO_ATTR_PREEMPT_TCS, /* u32 */
>>  	__TCA_TAPRIO_ATTR_MAX,
>>  };
>>  
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index 66fe2b82af9a..58586f98c648 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -64,6 +64,7 @@ struct taprio_sched {
>>  	struct Qdisc **qdiscs;
>>  	struct Qdisc *root;
>>  	u32 flags;
>> +	u32 preemptible_tcs;
>>  	enum tk_offsets tk_offset;
>>  	int clockid;
>>  	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
>> @@ -786,6 +787,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
>>  	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
>>  	[TCA_TAPRIO_ATTR_FLAGS]                      = { .type = NLA_U32 },
>>  	[TCA_TAPRIO_ATTR_TXTIME_DELAY]		     = { .type = NLA_U32 },
>> +	[TCA_TAPRIO_ATTR_PREEMPT_TCS]                = { .type = NLA_U32 },
>>  };
>>  
>>  static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
>> @@ -1284,6 +1286,7 @@ static int taprio_disable_offload(struct net_device *dev,
>>  				  struct netlink_ext_ack *extack)
>>  {
>>  	const struct net_device_ops *ops = dev->netdev_ops;
>> +	struct tc_preempt_qopt_offload preempt = { };
>>  	struct tc_taprio_qopt_offload *offload;
>>  	int err;
>>  
>> @@ -1302,13 +1305,15 @@ static int taprio_disable_offload(struct net_device *dev,
>>  	offload->enable = 0;
>>  
>>  	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
>> -	if (err < 0) {
>> +	if (err < 0)
>>  		NL_SET_ERR_MSG(extack,
>> -			       "Device failed to disable offload");
>> -		goto out;
>> -	}
>> +			       "Device failed to disable taprio offload");
>> +
>> +	err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt);
>> +	if (err < 0)
>> +		NL_SET_ERR_MSG(extack,
>> +			       "Device failed to disable frame preemption offload");
>
> First line in taprio_disable_offload() is:
>
> 	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
> 		return 0;
>
> but you said it yourself below that the preemptible queues thing is
> independent of whether you have taprio offload or not (or taprio at
> all). So the queues will never be reset back to the eMAC if you don't
> use full offload (yes, this includes txtime offload too). In fact, it's
> so independent, that I don't even know why we add them to taprio in the
> first place :)

Not changing taprio_disable_offload() was a mistake, caused in part by
the limitations of the hardware I have (I cannot have txtime offload
and frame preemption enabled at the same time), so I didn't catch it.
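
For v5 I'm thinking of something along these lines (untested sketch, reusing
the names from this patch), so that the preemptible queues are reset
independently of whether the schedule offload was ever enabled:

static int taprio_disable_offload(struct net_device *dev,
				  struct taprio_sched *q,
				  struct netlink_ext_ack *extack)
{
	const struct net_device_ops *ops = dev->netdev_ops;
	struct tc_preempt_qopt_offload preempt = { };
	int err = 0;

	/* Frame preemption doesn't depend on the schedule offload, so
	 * reset the queues back to the eMAC before the full-offload
	 * early return below.
	 */
	if (q->preemptible_tcs != U32_MAX) {
		err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt);
		if (err < 0)
			NL_SET_ERR_MSG(extack,
				       "Device failed to disable frame preemption offload");
	}

	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
		return err;

	/* ... the existing schedule offload teardown stays as it is ... */
}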

> I think the argument had to do with the hold/advance commands (other
> frame preemption stuff that's already in taprio), but those are really
> special and only to be used in the Qbv+Qbu combination, but the pMAC
> traffic classes? I don't know... Honestly I thought that me asking to
> see preemptible queues implemented for mqprio as well was going to
> discourage you, but oh well...
>

Now, the really important part: whether this should be communicated to the
driver via taprio or via ethtool/netlink.

I don't really have strong opinions on this anymore; both options are
viable.

This is going to be a niche feature, agreed, so perhaps it's best to go
with the option that gives the user more flexibility, i.e. using
ethtool/netlink to communicate which queues should be marked as
preemptible or express.
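
To make that option concrete: on the kernel side it would mostly mean one
more attribute in the PREEMPT set handler. The attribute and field names
below are made up, just to show the shape:

	/* Hypothetical attribute, for illustration only: a bitmask of
	 * preemptible queues, updated next to the existing ENABLED and
	 * ADD_FRAG_SIZE attributes in ethnl_set_preempt().
	 */
	ethnl_update_u32(&preempt.preemptible_queues,
			 tb[ETHTOOL_A_PREEMPT_PREEMPTIBLE_QUEUES], &mod);

Drivers would then receive the mask via ->set_preempt(), the same way they
receive 'enabled' and 'add_frag_size' today.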

>>  
>> -out:
>>  	taprio_offload_free(offload);
>>  
>>  	return err;
>> @@ -1525,6 +1530,29 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>>  					       mqprio->prio_tc_map[i]);
>>  	}
>>  
>> +	/* It's valid to enable frame preemption without any kind of
>> +	 * offloading being enabled, so keep it separated.
>> +	 */
>> +	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
>> +		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
>> +		struct tc_preempt_qopt_offload qopt = { };
>> +
>> +		if (preempt == U32_MAX) {
>> +			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");
>> +			err = -EINVAL;
>> +			goto free_sched;
>> +		}
>
> Hmmm, did we somehow agree that at least one traffic class must not be
> preemptible? Citation needed.
>
>> +
>> +		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
>> +
>> +		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
>> +						    &qopt);
>> +		if (err)
>> +			goto free_sched;
>> +
>> +		q->preemptible_tcs = preempt;
>> +	}
>> +
>>  	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
>>  		err = taprio_enable_offload(dev, q, new_admin, extack);
>>  	else
>> @@ -1681,6 +1709,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
>>  	 */
>>  	q->clockid = -1;
>>  	q->flags = TAPRIO_FLAGS_INVALID;
>> +	q->preemptible_tcs = U32_MAX;
>>  
>>  	spin_lock(&taprio_list_lock);
>>  	list_add(&q->taprio_list, &taprio_list);
>> @@ -1899,6 +1928,10 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
>>  	if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))
>>  		goto options_error;
>>  
>> +	if (q->preemptible_tcs != U32_MAX &&
>> +	    nla_put_u32(skb, TCA_TAPRIO_ATTR_PREEMPT_TCS, q->preemptible_tcs))
>> +		goto options_error;
>> +
>>  	if (q->txtime_delay &&
>>  	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
>>  		goto options_error;
>> -- 
>> 2.32.0
>> 



-- 
Vinicius

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 11/12] igc: Check incompatible configs for Frame Preemption
  2021-06-28  9:20     ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-04-11 23:36       ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2022-04-11 23:36 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Jun 25, 2021 at 05:33:13PM -0700, Vinicius Costa Gomes wrote:
>> Frame Preemption and LaunchTime cannot be enabled on the same queue.
>> If that situation happens, emit an error to the user, and log the
>> error.
>> 
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> ---
>
> This is a very interesting limitation, considering the fact that much of
> the frame preemption validation that I did was in conjunction with
> tc-etf and SO_TXTIME (send packets on 2 queues, one preemptible and one
> express, and compare the TX timestamps of the express packets with their
> scheduled TX times). The base-time offset between the ET and the PT
> packets is varied in small increments in the order of 20 ns or so.
> If this is not possible with hardware driven by igc, how do you know it
> works properly? :)

Good question. My tests were much less accurate than what you were
doing: I was basically flooding the link with preemptible packets,
sending some number of express packets, and counting them using some
debug counters on the receiving side.


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 02/12] taprio: Add support for frame preemption offload
  2022-04-11 23:31       ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-04-12  0:08         ` Vladimir Oltean
  -1 siblings, 0 replies; 50+ messages in thread
From: Vladimir Oltean @ 2022-04-12  0:08 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

On Mon, Apr 11, 2022 at 04:31:03PM -0700, Vinicius Costa Gomes wrote:
> > First line in taprio_disable_offload() is:
> >
> > 	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
> > 		return 0;
> >
> > but you said it yourself below that the preemptible queues thing is
> > independent of whether you have taprio offload or not (or taprio at
> > all). So the queues will never be reset back to the eMAC if you don't
> > use full offload (yes, this includes txtime offload too). In fact, it's
> > so independent, that I don't even know why we add them to taprio in the
> > first place :)
>
> That I didn't change taprio_disable_offload() was a mistake caused in
> part by the limitations of the hardware I have (I cannot have txtime
> offload and frame preemption enabled at the same time), so I didn't
> catch that.
>
> > I think the argument had to do with the hold/advance commands (other
> > frame preemption stuff that's already in taprio), but those are really
> > special and only to be used in the Qbv+Qbu combination, but the pMAC
> > traffic classes? I don't know... Honestly I thought that me asking to
> > see preemptible queues implemented for mqprio as well was going to
> > discourage you, but oh well...
>
> Now, the real important part, if this should be communicated to the
> driver via taprio or via ethtool/netlink.
>
> I don't really have strong opinions on this anymore, the two options are
> viable/possible.
>
> This is going to be a niche feature, agreed, so thinking that going with
> the one that gives the user more flexibility perhaps is best, i.e. using
> ethtool/netlink to communicate which queues should be marked as
> preemptible or express.

So we're back at this, very well.

I just happened to be looking at clause 36 of 802.1Q (Priority Flow Control),
a feature exchanged through DCBX where flows of a certain priority can be
configured as lossless on a port and generate PAUSE frames. This is essentially
the extension of the 802.3 annex 31B MAC Control PAUSE operation with the
ability to enable/disable flow control on a per-priority basis.

The priority in PFC (essentially synonymous with "traffic class") is the same
priority as the priority in frame preemption. And you know how PFC is configured
in Linux? Not through the qdisc, but through DCB_ATTR_PFC_CFG, a nested dcbnl
netlink attribute with one nested u8 attribute per priority value
(DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_7).

Not saying we should follow the exact same model as PFC, just saying that I'm
hard pressed to find a good reason why the "preemptable traffic classes"
information should sit in a layer which is basically independent of the frame
preemption feature itself.
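
(For reference, the driver-facing side of that per-priority configuration is
roughly the shape below -- a from-memory sketch with a made-up driver name,
the real plumbing is in net/dcb/dcbnl.c:)

/* dcbnl calls the setpfccfg op once per priority when DCB_ATTR_PFC_CFG
 * is set; 'setting' is the u8 taken from the nested
 * DCB_PFC_UP_ATTR_<priority> attribute.
 */
static void example_dcbnl_set_pfc_cfg(struct net_device *netdev,
				      int priority, u8 setting)
{
	struct example_priv *priv = netdev_priv(netdev);

	if (setting)
		priv->pfc_en |= BIT(priority);
	else
		priv->pfc_en &= ~BIT(priority);
}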

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 12/12] igc: Add support for Frame Preemption verification
  2021-06-28  9:59     ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-04-12  0:13       ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2022-04-12  0:13 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Jun 25, 2021 at 05:33:14PM -0700, Vinicius Costa Gomes wrote:
>> Add support for sending/receiving Frame Preemption verification
>> frames.
>> 
>> The i225 hardware doesn't implement the process of verification
>> internally, this is left to the driver.
>> 
>> Add a simple implementation of the state machine defined in IEEE
>> 802.3-2018, Section 99.4.7.
>> 
>> For now, the state machine is started manually by the user, when
>> enabling verification. Example:
>> 
>> $ ethtool --set-frame-preemption IFACE disable-verify off
>> 
>> The "verified" condition is set to true when the SMD-V frame is sent,
>> and the SMD-R frame is received. So, it only tracks the transmission
>> side. This seems to be what's expected from IEEE 802.3-2018.
>> 
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> ---
>>  drivers/net/ethernet/intel/igc/igc.h         |  15 ++
>>  drivers/net/ethernet/intel/igc/igc_defines.h |  13 ++
>>  drivers/net/ethernet/intel/igc/igc_ethtool.c |  20 +-
>>  drivers/net/ethernet/intel/igc/igc_main.c    | 216 +++++++++++++++++++
>>  4 files changed, 261 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
>> index 9b2ddcbf65fb..84234efed781 100644
>> --- a/drivers/net/ethernet/intel/igc/igc.h
>> +++ b/drivers/net/ethernet/intel/igc/igc.h
>> @@ -122,6 +122,13 @@ struct igc_ring {
>>  	struct xsk_buff_pool *xsk_pool;
>>  } ____cacheline_internodealigned_in_smp;
>>  
>> +enum frame_preemption_state {
>> +	FRAME_PREEMPTION_STATE_FAILED,
>> +	FRAME_PREEMPTION_STATE_DONE,
>> +	FRAME_PREEMPTION_STATE_START,
>> +	FRAME_PREEMPTION_STATE_SENT,
>> +};
>> +
>>  /* Board specific private data structure */
>>  struct igc_adapter {
>>  	struct net_device *netdev;
>> @@ -240,6 +247,14 @@ struct igc_adapter {
>>  		struct timespec64 start;
>>  		struct timespec64 period;
>>  	} perout[IGC_N_PEROUT];
>> +
>> +	struct delayed_work fp_verification_work;
>> +	unsigned long fp_start;
>> +	bool fp_received_smd_v;
>> +	bool fp_received_smd_r;
>> +	unsigned int fp_verify_cnt;
>> +	enum frame_preemption_state fp_tx_state;
>> +	bool fp_disable_verify;
>>  };
>>  
>>  void igc_up(struct igc_adapter *adapter);
>> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
>> index a2ea057d8e6e..cf46f5d5a505 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
>> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
>> @@ -268,6 +268,8 @@
>>  #define IGC_TXD_DTYP_C		0x00000000 /* Context Descriptor */
>>  #define IGC_TXD_POPTS_IXSM	0x01       /* Insert IP checksum */
>>  #define IGC_TXD_POPTS_TXSM	0x02       /* Insert TCP/UDP checksum */
>> +#define IGC_TXD_POPTS_SMD_V	0x10       /* Transmitted packet is a SMD-Verify */
>> +#define IGC_TXD_POPTS_SMD_R	0x20       /* Transmitted packet is a SMD-Response */
>>  #define IGC_TXD_CMD_EOP		0x01000000 /* End of Packet */
>>  #define IGC_TXD_CMD_IC		0x04000000 /* Insert Checksum */
>>  #define IGC_TXD_CMD_DEXT	0x20000000 /* Desc extension (0 = legacy) */
>> @@ -327,9 +329,20 @@
>>  
>>  #define IGC_RXDEXT_STATERR_LB	0x00040000
>>  
>> +#define IGC_RXD_STAT_SMD_V	0x2000  /* Received packet is SMD-Verify packet */
>> +#define IGC_RXD_STAT_SMD_R	0x4000  /* Received packet is SMD-Response packet */
>> +
>
> So the i225 gives you the ability to select from multiple
> Start-of-mPacket-Delimiter values on a per-TX descriptor basis?
> And this is in addition to configuring that TX ring as preemptable I
> guess? Because I notice that you're sending on the TX ring affine to the
> current CPU that the verification work item is running on (which you
> don't check anywhere that it is configured as going to the pMAC or
> not).

Yeah, talking to the hardware folks, those descriptors are handled
differently by the hardware.

> And on RX, it always gives you the kind of SMD that the packet had
> (including the classic SFD for express packets)?
> Cool.

I would use another word, but yeah :-)

>
> It would be nice if I could connect back to back an i225 board with an
> NXP LS1028A to see if the verification state machines pass both ways (on
> LS1028A it is 100% hardware based, we just enable/disable the feature
> and we can monitor the state changes via an interrupt).
>

My life would be easier if that were the case here.

>>  /* Advanced Receive Descriptor bit definitions */
>>  #define IGC_RXDADV_STAT_TSIP	0x08000 /* timestamp in packet */
>>  
>> +#define IGC_RXDADV_STAT_SMD_TYPE_MASK	0x06000
>> +#define IGC_RXDADV_STAT_SMD_TYPE_SHIFT	13
>> +
>> +#define IGC_SMD_TYPE_SFD		0x0
>> +#define IGC_SMD_TYPE_SMD_V		0x1
>> +#define IGC_SMD_TYPE_SMD_R		0x2
>> +#define IGC_SMD_TYPE_COMPLETE		0x3
>> +
>>  #define IGC_RXDEXT_STATERR_L4E		0x20000000
>>  #define IGC_RXDEXT_STATERR_IPE		0x40000000
>>  #define IGC_RXDEXT_STATERR_RXE		0x80000000
>> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> index 84d5afe92154..f52a7be3af66 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> @@ -1649,6 +1649,8 @@ static int igc_ethtool_get_preempt(struct net_device *netdev,
>>  
>>  	fpcmd->enabled = adapter->frame_preemption_active;
>>  	fpcmd->add_frag_size = adapter->add_frag_size;
>> +	fpcmd->verified = adapter->fp_tx_state == FRAME_PREEMPTION_STATE_DONE;
>> +	fpcmd->disable_verify = adapter->fp_disable_verify;
>>  
>>  	return 0;
>>  }
>> @@ -1664,10 +1666,22 @@ static int igc_ethtool_set_preempt(struct net_device *netdev,
>>  		return -EINVAL;
>>  	}
>>  
>> -	adapter->frame_preemption_active = fpcmd->enabled;
>> -	adapter->add_frag_size = fpcmd->add_frag_size;
>> +	if (!fpcmd->disable_verify && adapter->fp_disable_verify) {
>> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
>> +		schedule_delayed_work(&adapter->fp_verification_work, msecs_to_jiffies(10));
>
> Not sure how much you'd like to tune this, but the spec has a
> configurable verifyTime between 1 ms and 128 ms. You chose the default
> value, so we should be ok for now.

We can add a configuration knob for that later, via ethtool for example.
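
If/when we add that, it's probably just a matter of clamping to the 1-128 ms
verifyTime range you mention and using the result instead of the hardcoded
IGC_FP_TIMEOUT; roughly (the field names below are made up for illustration):

	/* Hypothetical verify-time knob, in milliseconds */
	if (fpcmd->verify_time < 1 || fpcmd->verify_time > 128)
		return -ERANGE;

	adapter->fp_verify_timeout = msecs_to_jiffies(fpcmd->verify_time);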

>
>> +	}
>>  
>> -	return igc_tsn_offload_apply(adapter);
>> +	adapter->fp_disable_verify = fpcmd->disable_verify;
>> +
>> +	if (adapter->frame_preemption_active != fpcmd->enabled ||
>> +	    adapter->add_frag_size != fpcmd->add_frag_size) {
>> +		adapter->frame_preemption_active = fpcmd->enabled;
>> +		adapter->add_frag_size = fpcmd->add_frag_size;
>> +
>> +		return igc_tsn_offload_apply(adapter);
>> +	}
>> +
>> +	return 0;
>>  }
>>  
>>  static int igc_ethtool_begin(struct net_device *netdev)
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 20dac04a02f2..ed55bd13e4a1 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -28,6 +28,11 @@
>>  #define IGC_XDP_TX		BIT(1)
>>  #define IGC_XDP_REDIRECT	BIT(2)
>>  
>> +#define IGC_FP_TIMEOUT msecs_to_jiffies(100)
>> +#define IGC_MAX_VERIFY_CNT 3
>> +
>> +#define IGC_FP_SMD_FRAME_SIZE 60
>> +
>>  static int debug = -1;
>>  
>>  MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
>> @@ -2169,6 +2174,79 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
>>  	return 0;
>>  }
>>  
>> +static int igc_fp_init_smd_frame(struct igc_ring *ring, struct igc_tx_buffer *buffer,
>> +				 struct sk_buff *skb)
>> +{
>> +	dma_addr_t dma;
>> +	unsigned int size;
>> +
>> +	size = skb_headlen(skb);
>> +
>> +	dma = dma_map_single(ring->dev, skb->data, size, DMA_TO_DEVICE);
>> +	if (dma_mapping_error(ring->dev, dma)) {
>> +		netdev_err_once(ring->netdev, "Failed to map DMA for TX\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	buffer->skb = skb;
>> +	buffer->protocol = 0;
>> +	buffer->bytecount = skb->len;
>> +	buffer->gso_segs = 1;
>> +	buffer->time_stamp = jiffies;
>> +	dma_unmap_len_set(buffer, len, skb->len);
>> +	dma_unmap_addr_set(buffer, dma, dma);
>> +
>> +	return 0;
>> +}
>> +
>> +static int igc_fp_init_tx_descriptor(struct igc_ring *ring,
>> +				     struct sk_buff *skb, int type)
>> +{
>> +	struct igc_tx_buffer *buffer;
>> +	union igc_adv_tx_desc *desc;
>> +	u32 cmd_type, olinfo_status;
>> +	int err;
>> +
>> +	if (!igc_desc_unused(ring))
>> +		return -EBUSY;
>> +
>> +	buffer = &ring->tx_buffer_info[ring->next_to_use];
>> +	err = igc_fp_init_smd_frame(ring, buffer, skb);
>> +	if (err)
>> +		return err;
>> +
>> +	cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
>> +		   IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
>> +		   buffer->bytecount;
>> +	olinfo_status = buffer->bytecount << IGC_ADVTXD_PAYLEN_SHIFT;
>> +
>> +	switch (type) {
>> +	case IGC_SMD_TYPE_SMD_V:
>> +		olinfo_status |= (IGC_TXD_POPTS_SMD_V << 8);
>> +		break;
>> +	case IGC_SMD_TYPE_SMD_R:
>> +		olinfo_status |= (IGC_TXD_POPTS_SMD_R << 8);
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	desc = IGC_TX_DESC(ring, ring->next_to_use);
>> +	desc->read.cmd_type_len = cpu_to_le32(cmd_type);
>> +	desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>> +	desc->read.buffer_addr = cpu_to_le64(dma_unmap_addr(buffer, dma));
>> +
>> +	netdev_tx_sent_queue(txring_txq(ring), skb->len);
>> +
>> +	buffer->next_to_watch = desc;
>> +
>> +	ring->next_to_use++;
>> +	if (ring->next_to_use == ring->count)
>> +		ring->next_to_use = 0;
>> +
>> +	return 0;
>> +}
>> +
>>  static struct igc_ring *igc_xdp_get_tx_ring(struct igc_adapter *adapter,
>>  					    int cpu)
>>  {
>> @@ -2299,6 +2377,19 @@ static void igc_update_rx_stats(struct igc_q_vector *q_vector,
>>  	q_vector->rx.total_bytes += bytes;
>>  }
>>  
>> +static int igc_rx_desc_smd_type(union igc_adv_rx_desc *rx_desc)
>> +{
>> +	u32 status = le32_to_cpu(rx_desc->wb.upper.status_error);
>> +
>> +	return (status & IGC_RXDADV_STAT_SMD_TYPE_MASK)
>> +		>> IGC_RXDADV_STAT_SMD_TYPE_SHIFT;
>> +}
>> +
>> +static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer, unsigned int size)
>> +{
>> +	return size == 60;
>
> You should probably also verify that the contents is 60 octets of zeroes (sans the mCRC)?
>

Yeah, I will add some checks for that.
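
Something along these lines is what I have in mind (sketch; assuming the
frame data sits at the usual page offset of the rx_buffer and that the mCRC
is already stripped at this point):

static bool igc_check_smd_frame(struct igc_rx_buffer *rx_buffer,
				unsigned int size)
{
	void *data = page_address(rx_buffer->page) + rx_buffer->page_offset;

	/* A valid SMD-V/SMD-R frame is 60 octets of zeroes */
	return size == IGC_FP_SMD_FRAME_SIZE && !memchr_inv(data, 0, size);
}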

>> +}
>> +
>>  static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>>  {
>>  	unsigned int total_bytes = 0, total_packets = 0;
>> @@ -2315,6 +2406,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>>  		ktime_t timestamp = 0;
>>  		struct xdp_buff xdp;
>>  		int pkt_offset = 0;
>> +		int smd_type;
>>  		void *pktbuf;
>>  
>>  		/* return some buffers to hardware, one at a time is too slow */
>> @@ -2346,6 +2438,22 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>>  			size -= IGC_TS_HDR_LEN;
>>  		}
>>  
>> +		smd_type = igc_rx_desc_smd_type(rx_desc);
>> +
>> +		if (smd_type == IGC_SMD_TYPE_SMD_V || smd_type == IGC_SMD_TYPE_SMD_R) {
>
> I guess the performance people will love you for this change. You should
> probably guard it by an "if (unlikely(disableVerify == false))" condition.
>

Will add the unlikely().
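
I.e. something like (sketch):

		/* verification frames are rare, keep this off the hot path */
		if (unlikely(smd_type == IGC_SMD_TYPE_SMD_V ||
			     smd_type == IGC_SMD_TYPE_SMD_R)) {
			/* same handling as in the patch above */
		}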

>> +			if (igc_check_smd_frame(rx_buffer, size)) {
>> +				adapter->fp_received_smd_v = smd_type == IGC_SMD_TYPE_SMD_V;
>> +				adapter->fp_received_smd_r = smd_type == IGC_SMD_TYPE_SMD_R;
>> +				schedule_delayed_work(&adapter->fp_verification_work, 0);
>> +			}
>> +
>> +			/* Advance the ring next-to-clean */
>> +			igc_is_non_eop(rx_ring, rx_desc);
>> +
>> +			cleaned_count++;
>> +			continue;
>> +		}
>> +
>>  		if (!skb) {
>>  			xdp_init_buff(&xdp, truesize, &rx_ring->xdp_rxq);
>>  			xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
>> @@ -5607,6 +5715,107 @@ static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
>>  	return igc_tsn_offload_apply(adapter);
>>  }
>>  
>> +/* I225 doesn't send the SMD frames automatically, we need to handle
>> + * them ourselves.
>> + */
>> +static int igc_xmit_smd_frame(struct igc_adapter *adapter, int type)
>> +{
>> +	int cpu = smp_processor_id();
>> +	struct netdev_queue *nq;
>> +	struct igc_ring *ring;
>> +	struct sk_buff *skb;
>> +	void *data;
>> +	int err;
>> +
>> +	if (!netif_running(adapter->netdev))
>> +		return -ENOTCONN;
>> +
>> +	/* FIXME: rename this function to something less specific, as
>> +	 * it can be used outside XDP.
>> +	 */
>> +	ring = igc_xdp_get_tx_ring(adapter, cpu);
>> +	nq = txring_txq(ring);
>> +
>> +	skb = alloc_skb(IGC_FP_SMD_FRAME_SIZE, GFP_KERNEL);
>> +	if (!skb)
>> +		return -ENOMEM;
>> +
>> +	data = skb_put(skb, IGC_FP_SMD_FRAME_SIZE);
>> +	memset(data, 0, IGC_FP_SMD_FRAME_SIZE);
>> +
>> +	__netif_tx_lock(nq, cpu);
>> +
>> +	err = igc_fp_init_tx_descriptor(ring, skb, type);
>> +
>> +	igc_flush_tx_descriptors(ring);
>> +
>> +	__netif_tx_unlock(nq);
>> +
>> +	return err;
>> +}
>> +
>> +static void igc_fp_verification_work(struct work_struct *work)
>> +{
>> +	struct delayed_work *dwork = to_delayed_work(work);
>> +	struct igc_adapter *adapter;
>> +	int err;
>> +
>> +	adapter = container_of(dwork, struct igc_adapter, fp_verification_work);
>> +
>> +	if (adapter->fp_disable_verify)
>> +		goto done;
>> +
>> +	switch (adapter->fp_tx_state) {
>> +	case FRAME_PREEMPTION_STATE_START:
>> +		adapter->fp_received_smd_r = false;
>> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
>> +		if (err < 0)
>> +			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
>
> On TX error should you really advance to the STATE_SENT?
>

We tried to send an SMD-V frame and it failed; the error was probably
transient (unable to allocate memory) and the send will be retried later.
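
That said, if we want to avoid burning one of the verification attempts on a
transient allocation failure, the START case could stay in START and simply
reschedule; a sketch:

	case FRAME_PREEMPTION_STATE_START:
		adapter->fp_received_smd_r = false;
		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_V);
		if (err < 0) {
			netdev_err(adapter->netdev, "Error sending SMD-V frame\n");
			/* stay in START and retry shortly, without counting
			 * this as a failed verification attempt
			 */
			schedule_delayed_work(&adapter->fp_verification_work,
					      msecs_to_jiffies(10));
			break;
		}

		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
		adapter->fp_start = jiffies;
		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
		break;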

>> +
>> +		adapter->fp_tx_state = FRAME_PREEMPTION_STATE_SENT;
>> +		adapter->fp_start = jiffies;
>> +		schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
>> +		break;
>> +
>> +	case FRAME_PREEMPTION_STATE_SENT:
>> +		if (adapter->fp_received_smd_r) {
>> +			adapter->fp_tx_state = FRAME_PREEMPTION_STATE_DONE;
>> +			adapter->fp_received_smd_r = false;
>> +			break;
>> +		}
>> +
>> +		if (time_is_before_jiffies(adapter->fp_start + IGC_FP_TIMEOUT)) {
>> +			adapter->fp_verify_cnt++;
>> +			netdev_warn(adapter->netdev, "Timeout waiting for SMD-R frame\n");
>> +
>> +			if (adapter->fp_verify_cnt > IGC_MAX_VERIFY_CNT) {
>> +				adapter->fp_verify_cnt = 0;
>> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_FAILED;
>> +				netdev_err(adapter->netdev,
>> +					   "Exceeded number of attempts for frame preemption verification\n");
>> +			} else {
>> +				adapter->fp_tx_state = FRAME_PREEMPTION_STATE_START;
>> +			}
>> +			schedule_delayed_work(&adapter->fp_verification_work, IGC_FP_TIMEOUT);
>> +		}
>> +
>> +		break;
>> +
>> +	case FRAME_PREEMPTION_STATE_FAILED:
>> +	case FRAME_PREEMPTION_STATE_DONE:
>> +		break;
>> +	}
>> +
>> +done:
>> +	if (adapter->fp_received_smd_v) {
>> +		err = igc_xmit_smd_frame(adapter, IGC_SMD_TYPE_SMD_R);
>> +		if (err < 0)
>> +			netdev_err(adapter->netdev, "Error sending SMD-R frame\n");
>> +
>> +		adapter->fp_received_smd_v = false;
>> +	}
>> +}
>> +
>>  static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type,
>>  			void *type_data)
>>  {
>> @@ -6023,6 +6232,7 @@ static int igc_probe(struct pci_dev *pdev,
>>  
>>  	INIT_WORK(&adapter->reset_task, igc_reset_task);
>>  	INIT_WORK(&adapter->watchdog_task, igc_watchdog_task);
>> +	INIT_DELAYED_WORK(&adapter->fp_verification_work, igc_fp_verification_work);
>>  
>>  	/* Initialize link properties that are user-changeable */
>>  	adapter->fc_autoneg = true;
>> @@ -6044,6 +6254,12 @@ static int igc_probe(struct pci_dev *pdev,
>>  
>>  	igc_ptp_init(adapter);
>>  
>> +	/* FIXME: This sets the default to not do the verification
>> +	 * automatically, when we have support in multiple
>> +	 * controllers, this default can be changed.
>> +	 */
>> +	adapter->fp_disable_verify = true;
>> +
>
> Hmmmmm. So we need to instruct our users to explicitly enable
> verification in their ethtool-based scripts, since the default values
> will vary wildly from one vendor to another. On LS1028A I see no reason
> why verification would be disabled by default.
>

Reading 99.4.3 (IEEE 802.3-2018) again, that "Verification may be disabled"
seems to imply that it should be enabled by default.

I will change this.

>>  	/* reset the hardware with the new settings */
>>  	igc_reset(adapter);
>>  
>> -- 
>> 2.32.0
>> 

-- 
Vinicius

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH net-next v4 02/12] taprio: Add support for frame preemption offload
  2022-04-12  0:08         ` [Intel-wired-lan] " Vladimir Oltean
@ 2022-04-12  0:38           ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 50+ messages in thread
From: Vinicius Costa Gomes @ 2022-04-12  0:38 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, jhs, xiyou.wangcong, jiri, kuba, Po Liu, intel-wired-lan,
	anthony.l.nguyen, mkubecek

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Mon, Apr 11, 2022 at 04:31:03PM -0700, Vinicius Costa Gomes wrote:
>> > First line in taprio_disable_offload() is:
>> >
>> > 	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
>> > 		return 0;
>> >
>> > but you said it yourself below that the preemptible queues thing is
>> > independent of whether you have taprio offload or not (or taprio at
>> > all). So the queues will never be reset back to the eMAC if you don't
>> > use full offload (yes, this includes txtime offload too). In fact, it's
>> > so independent, that I don't even know why we add them to taprio in the
>> > first place :)
>>
>> That I didn't change taprio_disable_offload() was a mistake caused in
>> part by the limitations of the hardware I have (I cannot have txtime
>> offload and frame preemption enabled at the same time), so I didn't
>> catch that.
>>
>> > I think the argument had to do with the hold/advance commands (other
>> > frame preemption stuff that's already in taprio), but those are really
>> > special and only to be used in the Qbv+Qbu combination, but the pMAC
>> > traffic classes? I don't know... Honestly I thought that me asking to
>> > see preemptible queues implemented for mqprio as well was going to
>> > discourage you, but oh well...
>>
>> Now, the real important part, if this should be communicated to the
>> driver via taprio or via ethtool/netlink.
>>
>> I don't really have strong opinions on this anymore, the two options are
>> viable/possible.
>>
>> This is going to be a niche feature, agreed, so thinking that going with
>> the one that gives the user more flexibility perhaps is best, i.e. using
>> ethtool/netlink to communicate which queues should be marked as
>> preemptible or express.
>
> So we're back at this, very well.
>
> I was just happening to be looking at clause 36 of 802.1Q (Priority Flow Control),
> a feature exchanged through DCBX where flows of a certain priority can be
> configured as lossless on a port, and generate PAUSE frames. This is essentially
> the extension of 802.3 annex 31B MAC Control PAUSE operation with the ability to
> enable/disable flow control on a per-priority basis.
>
> The priority in PFC (essentially synonymous with "traffic class") is the same
> priority as the priority in frame preemption. And you know how PFC is configured
> in Linux? Not through the qdisc, but through DCB_ATTR_PFC_CFG, a nested dcbnl
> netlink attribute with one nested u8 attribute per priority value
> (DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_7).
>
> Not saying we should follow the exact same model as PFC, just saying that I'm
> hard pressed to find a good reason why the "preemptable traffic classes"
> information should sit in a layer which is basically independent of the frame
> preemption feature itself.

Ok, going to take this as another point in favor of going the ethtool
route.
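
For illustration, configuring which traffic classes are preemptible could
then look something like the command below; the parameter names are
placeholders for whatever the ethtool interface ends up being, not the
final syntax:

$ ethtool --set-frame-preemption $IFACE fp on preemptible-tcs 0x0e \
      min-frag-size 60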


Thank you,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-04-12  0:38 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-26  0:33 [PATCH net-next v4 00/12] ethtool: Add support for frame preemption Vinicius Costa Gomes
2021-06-26  0:33 ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 01/12] ethtool: Add support for configuring " Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-27 19:43   ` Vladimir Oltean
2021-06-27 19:43     ` [Intel-wired-lan] " Vladimir Oltean
2022-04-11 22:39     ` Vinicius Costa Gomes
2022-04-11 22:39       ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 02/12] taprio: Add support for frame preemption offload Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-27 19:58   ` Vladimir Oltean
2021-06-27 19:58     ` [Intel-wired-lan] " Vladimir Oltean
2022-04-11 23:31     ` Vinicius Costa Gomes
2022-04-11 23:31       ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-04-12  0:08       ` Vladimir Oltean
2022-04-12  0:08         ` [Intel-wired-lan] " Vladimir Oltean
2022-04-12  0:38         ` Vinicius Costa Gomes
2022-04-12  0:38           ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 03/12] core: Introduce netdev_tc_map_to_queue_mask() Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 04/12] taprio: Replace tc_map_to_queue_mask() Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-27 20:02   ` Vladimir Oltean
2021-06-27 20:02     ` [Intel-wired-lan] " Vladimir Oltean
2021-06-26  0:33 ` [PATCH net-next v4 05/12] mqprio: Add support for frame preemption offload Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 06/12] igc: Add support for enabling frame preemption via ethtool Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 07/12] igc: Add support for TC_SETUP_PREEMPT Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 08/12] igc: Simplify TSN flags handling Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 09/12] igc: Add support for setting frame preemption configuration Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 10/12] ethtool: Add support for Frame Preemption verification Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-28  9:17   ` Vladimir Oltean
2021-06-28  9:17     ` [Intel-wired-lan] " Vladimir Oltean
2021-06-26  0:33 ` [PATCH net-next v4 11/12] igc: Check incompatible configs for Frame Preemption Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-28  9:20   ` Vladimir Oltean
2021-06-28  9:20     ` [Intel-wired-lan] " Vladimir Oltean
2022-04-11 23:36     ` Vinicius Costa Gomes
2022-04-11 23:36       ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-26  0:33 ` [PATCH net-next v4 12/12] igc: Add support for Frame Preemption verification Vinicius Costa Gomes
2021-06-26  0:33   ` [Intel-wired-lan] " Vinicius Costa Gomes
2021-06-28  9:59   ` Vladimir Oltean
2021-06-28  9:59     ` [Intel-wired-lan] " Vladimir Oltean
2022-04-12  0:13     ` Vinicius Costa Gomes
2022-04-12  0:13       ` [Intel-wired-lan] " Vinicius Costa Gomes
