linux-kernel.vger.kernel.org archive mirror
* [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation
@ 2023-05-12 15:25 Larysa Zaremba
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable Larysa Zaremba
                   ` (14 more replies)
  0 siblings, 15 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

This series introduces XDP hints support into the ice driver and adds new
kfunc hints that utilize hardware capabilities.

- patches 01-04 refactor the driver's descriptor-to-skb field processing
  code, making it more reusable without changing any behavior.

- patches 05-08 add support for the existing hints (timestamp and hash)
  in the ice driver.

- patches 09-12 introduce new kfunc hints, namely 2 VLAN tag hints
  (ctag & stag separately) and "checksum level", which is basically
  a CHECKSUM_UNNECESSARY indicator. Those hints are then implemented in
  the ice driver.

- patches 13-15 adjust xdp_hw_metadata to account for new hints.

- in particular, patch 14 lifts the 32-byte limit on data_meta size,
  because all the information that needs to be passed from XDP into
  AF_XDP in xdp_hw_metadata no longer fits into 32 bytes.

Aleksander Lobakin (1):
  net, xdp: allow metadata > 32

Larysa Zaremba (14):
  ice: make RX hash reading code more reusable
  ice: make RX HW timestamp reading code more reusable
  ice: make RX checksum checking code more reusable
  ice: Make ptype internal to descriptor info processing
  ice: Introduce ice_xdp_buff
  ice: Support HW timestamp hint
  ice: Support RX hash XDP hint
  ice: Support XDP hints in AF_XDP ZC mode
  xdp: Add VLAN tag hint
  ice: Implement VLAN tag hint
  xdp: Add checksum level hint
  ice: Implement checksum level hint
  selftests/bpf: Allow VLAN packets in xdp_hw_metadata
  selftests/bpf: Add flags and new hints to xdp_hw_metadata

 Documentation/networking/xdp-rx-metadata.rst  |  14 +-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h    | 412 +++++++++---------
 drivers/net/ethernet/intel/ice/ice_main.c     |   1 +
 drivers/net/ethernet/intel/ice/ice_ptp.c      |  23 +-
 drivers/net/ethernet/intel/ice/ice_ptp.h      |  18 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  13 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  23 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 311 +++++++++++--
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  13 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c      |  16 +-
 include/linux/netdevice.h                     |   3 +
 include/linux/skbuff.h                        |  13 +-
 include/net/xdp.h                             |  16 +-
 kernel/bpf/offload.c                          |   6 +
 net/core/xdp.c                                |  36 ++
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  49 ++-
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  29 +-
 tools/testing/selftests/bpf/xdp_metadata.h    |  36 +-
 19 files changed, 738 insertions(+), 296 deletions(-)

-- 
2.35.3



* [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-19 16:46   ` Alexander Lobakin
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp " Larysa Zaremba
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Previously, we only needed the RX hash in the skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_*() callbacks.

Move the generic process of reading the RX hash from a descriptor
into a separate function.
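
The split described above can be sketched in plain userspace C. This is only
an illustration under stated assumptions: every mock_* name, the
MOCK_RXDID_FLEX_NIC value and the struct layouts are hypothetical stand-ins,
not the driver's definitions. It shows a generic reader that copies the hash
to a caller-supplied address and reports validity, plus a thin skb-style
wrapper on top of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor profile ID; not the real driver value. */
#define MOCK_RXDID_FLEX_NIC 2

/* Hypothetical, simplified descriptor: hash is already in CPU byte order. */
struct mock_hash_desc {
	uint8_t rxdid;
	uint32_t rss_hash;
};

/* Hypothetical consumer that plays the role of the skb. */
struct mock_hash_skb {
	uint32_t hash;
	bool hash_valid;
};

/* Generic part: copy the hash to dst, return true only if it is valid. */
bool mock_copy_rx_hash_from_desc(const struct mock_hash_desc *desc,
				 uint32_t *dst)
{
	assert(desc != NULL && dst != NULL);

	if (desc->rxdid != MOCK_RXDID_FLEX_NIC)
		return false;

	*dst = desc->rss_hash;
	return true;
}

/* skb-specific part: only decides what to do with a valid hash. */
void mock_rx_hash_to_skb(const struct mock_hash_desc *desc,
			 struct mock_hash_skb *skb)
{
	uint32_t hash;

	if (mock_copy_rx_hash_from_desc(desc, &hash)) {
		skb->hash = hash;
		skb->hash_valid = true;
	}
}
```

A second consumer, such as an XDP metadata callback, could call the generic
reader directly and translate a false return value into its own error code.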

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 38 +++++++++++++------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index c8322fb6f2b3..fc67bbf600af 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -63,28 +63,44 @@ static enum pkt_hash_types ice_ptype_to_htype(u16 ptype)
 }
 
 /**
- * ice_rx_hash - set the hash value in the skb
+ * ice_copy_rx_hash_from_desc - copy hash value from descriptor to address
+ * @rx_desc: specific descriptor
+ * @dst: address to copy hash value to
+ *
+ * Returns true, if valid hash has been copied into the destination address.
+ */
+static bool
+ice_copy_rx_hash_from_desc(union ice_32b_rx_flex_desc *rx_desc, u32 *dst)
+{
+	struct ice_32b_rx_flex_desc_nic *nic_mdid;
+
+	if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
+		return false;
+
+	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
+	*dst = le32_to_cpu(nic_mdid->rss_hash);
+	return true;
+}
+
+/**
+ * ice_rx_hash_to_skb - set the hash value in the skb
  * @rx_ring: descriptor ring
  * @rx_desc: specific descriptor
  * @skb: pointer to current skb
  * @rx_ptype: the ptype value from the descriptor
  */
 static void
-ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
-	    struct sk_buff *skb, u16 rx_ptype)
+ice_rx_hash_to_skb(struct ice_rx_ring *rx_ring,
+		   union ice_32b_rx_flex_desc *rx_desc,
+		   struct sk_buff *skb, u16 rx_ptype)
 {
-	struct ice_32b_rx_flex_desc_nic *nic_mdid;
 	u32 hash;
 
 	if (!(rx_ring->netdev->features & NETIF_F_RXHASH))
 		return;
 
-	if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
-		return;
-
-	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
-	hash = le32_to_cpu(nic_mdid->rss_hash);
-	skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
+	if (ice_copy_rx_hash_from_desc(rx_desc, &hash))
+		skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
 }
 
 /**
@@ -186,7 +202,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
 		       union ice_32b_rx_flex_desc *rx_desc,
 		       struct sk_buff *skb, u16 ptype)
 {
-	ice_rx_hash(rx_ring, rx_desc, skb, ptype);
+	ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype);
 
 	/* modifies the skb - consumes the enet header */
 	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
-- 
2.35.3



* [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp reading code more reusable
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-19 16:52   ` Alexander Lobakin
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking " Larysa Zaremba
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Previously, we only needed the RX HW timestamp in the skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_*() callbacks.

Put the generic process of reading the RX HW timestamp from a descriptor
into a separate function.
Move the skb-related code into another source file.
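
As a rough illustration of the generic part, here is a userspace sketch of
extending a 32-bit hardware timestamp with a cached 64-bit clock value and
reporting validity through the return value. The sketch_* names are
hypothetical, and the extension logic assumes the cached time lies within
about 2^31 ns of the packet timestamp; the driver's own helper may differ in
detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Extend a 32-bit timestamp to 64 bits around a recent cached time. */
uint64_t sketch_extend_32b_ts(uint64_t cached_time, uint32_t ts32)
{
	uint32_t cached_lo = (uint32_t)cached_time;
	uint32_t delta = ts32 - cached_lo;	/* wraps modulo 2^32 */

	if (delta > UINT32_MAX / 2) {
		/* Timestamp was taken slightly before the cached time. */
		delta = cached_lo - ts32;
		return cached_time - delta;
	}

	/* Timestamp was taken at or after the cached time. */
	return cached_time + delta;
}

/* Generic reader: true means *dst holds a valid timestamp in ns. */
bool sketch_copy_rx_hwts(bool ts_valid, uint64_t cached_time,
			 uint32_t ts32, uint64_t *dst)
{
	assert(dst != NULL);

	/* No valid bit or no cached clock value: report nothing. */
	if (!ts_valid || !cached_time)
		return false;

	*dst = sketch_extend_32b_ts(cached_time, ts32);
	return true;
}
```

The skb path would then convert the returned nanoseconds into its own
timestamp type, while an XDP callback could hand the u64 to the program
unchanged.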

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ptp.c      | 23 +++++++---------
 drivers/net/ethernet/intel/ice/ice_ptp.h      | 18 ++++++++-----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 27 ++++++++++++++++++-
 3 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index ac6f06f9a2ed..c90ce91f11ab 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -2142,30 +2142,28 @@ int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr)
 }
 
 /**
- * ice_ptp_rx_hwtstamp - Check for an Rx timestamp
+ * ice_ptp_copy_rx_hwts_from_desc - Check for an Rx timestamp
  * @rx_ring: Ring to get the VSI info
  * @rx_desc: Receive descriptor
- * @skb: Particular skb to send timestamp with
+ * @dst: Address to put RX timestamp to
  *
- * The driver receives a notification in the receive descriptor with timestamp.
- * The timestamp is in ns, so we must convert the result first.
+ * If function returns true, dst contains a valid RX timestamp in ns.
  */
-void
-ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-		    union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb)
+bool ice_ptp_copy_rx_hwts_from_desc(struct ice_rx_ring *rx_ring,
+				    union ice_32b_rx_flex_desc *rx_desc,
+				    u64 *dst)
 {
-	struct skb_shared_hwtstamps *hwtstamps;
 	u64 ts_ns, cached_time;
 	u32 ts_high;
 
 	if (!(rx_desc->wb.time_stamp_low & ICE_PTP_TS_VALID))
-		return;
+		return false;
 
 	cached_time = READ_ONCE(rx_ring->cached_phctime);
 
 	/* Do not report a timestamp if we don't have a cached PHC time */
 	if (!cached_time)
-		return;
+		return false;
 
 	/* Use ice_ptp_extend_32b_ts directly, using the ring-specific cached
 	 * PHC value, rather than accessing the PF. This also allows us to
@@ -2176,9 +2174,8 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
 	ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high);
 	ts_ns = ice_ptp_extend_32b_ts(cached_time, ts_high);
 
-	hwtstamps = skb_hwtstamps(skb);
-	memset(hwtstamps, 0, sizeof(*hwtstamps));
-	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
+	*dst = ts_ns;
+	return true;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h
index 9cda2f43e0e5..509ea9570276 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.h
@@ -259,9 +259,9 @@ int ice_get_ptp_clock_index(struct ice_pf *pf);
 s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb);
 bool ice_ptp_process_ts(struct ice_pf *pf);
 
-void
-ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-		    union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb);
+bool ice_ptp_copy_rx_hwts_from_desc(struct ice_rx_ring *rx_ring,
+				    union ice_32b_rx_flex_desc *rx_desc,
+				    u64 *dst);
 void ice_ptp_reset(struct ice_pf *pf);
 void ice_ptp_prepare_for_reset(struct ice_pf *pf);
 void ice_ptp_init(struct ice_pf *pf);
@@ -294,9 +294,15 @@ static inline bool ice_ptp_process_ts(struct ice_pf *pf)
 {
 	return true;
 }
-static inline void
-ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-		    union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) { }
+
+static inline bool
+ice_ptp_copy_rx_hwts_from_desc(struct ice_rx_ring *rx_ring,
+			       union ice_32b_rx_flex_desc *rx_desc,
+			       u64 *dst)
+{
+	return false;
+}
+
 static inline void ice_ptp_reset(struct ice_pf *pf) { }
 static inline void ice_ptp_prepare_for_reset(struct ice_pf *pf) { }
 static inline void ice_ptp_init(struct ice_pf *pf) { }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index fc67bbf600af..1aab79dc8915 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -186,6 +186,31 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
 	ring->vsi->back->hw_csum_rx_error++;
 }
 
+/**
+ * ice_ptp_rx_hwts_to_skb - Put RX timestamp into skb, if available
+ * @rx_ring: Ring to get the VSI info
+ * @rx_desc: Receive descriptor
+ * @skb: Particular skb to send timestamp with
+ *
+ * The driver receives a notification in the receive descriptor with timestamp.
+ * The timestamp is in ns, so we must convert the result first.
+ */
+static void
+ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring,
+		       union ice_32b_rx_flex_desc *rx_desc,
+		       struct sk_buff *skb)
+{
+	struct skb_shared_hwtstamps *hwtstamps;
+	u64 ts_ns;
+
+	if (!ice_ptp_copy_rx_hwts_from_desc(rx_ring, rx_desc, &ts_ns))
+		return;
+
+	hwtstamps = skb_hwtstamps(skb);
+	memset(hwtstamps, 0, sizeof(*hwtstamps));
+	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
+}
+
 /**
  * ice_process_skb_fields - Populate skb header fields from Rx descriptor
  * @rx_ring: Rx descriptor ring packet is being transacted on
@@ -210,7 +235,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
 	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
 
 	if (rx_ring->ptp_rx)
-		ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb);
+		ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);
 }
 
 /**
-- 
2.35.3



* [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking code more reusable
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable Larysa Zaremba
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp " Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-22 15:51   ` Alexander Lobakin
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 04/15] ice: Make ptype internal to descriptor info processing Larysa Zaremba
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Previously, we only needed the RX checksum flags in the skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_*() callbacks.

Put the generic process of determining the checksum status into
a separate function.
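
A minimal userspace sketch of this separation follows. All mock_* types,
fields and constants are hypothetical simplifications of what the hardware
descriptor reports. The generic function returns whether the hardware
verified the checksum and stores the checksum level, and the skb-style
wrapper keeps the feature check and the field updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum mock_ip_summed { MOCK_CHECKSUM_NONE, MOCK_CHECKSUM_UNNECESSARY };

/* Hypothetical, pre-decoded view of the descriptor status bits. */
struct mock_csum_info {
	bool hw_parsed;		/* HW decoded the L3/L4 headers */
	bool csum_error;	/* HW flagged a checksum error */
	bool tunneled;		/* inner checksum was the one validated */
	bool l4_known;		/* inner protocol is TCP, UDP or SCTP */
};

struct mock_csum_stats { unsigned long csum_errors; };
struct mock_csum_skb { enum mock_ip_summed ip_summed; uint8_t csum_level; };

/* Generic part: stats may be NULL when the caller keeps no error count. */
bool mock_rx_csum_checked(const struct mock_csum_info *info,
			  uint8_t *lvl_dst, struct mock_csum_stats *stats)
{
	assert(info != NULL && lvl_dst != NULL);

	if (!info->hw_parsed)
		return false;

	if (info->csum_error) {
		if (stats)
			stats->csum_errors++;
		return false;
	}

	if (info->tunneled)
		*lvl_dst = 1;

	return info->l4_known;
}

/* skb-specific part: feature check plus the actual field updates. */
void mock_rx_csum_into_skb(const struct mock_csum_info *info,
			   struct mock_csum_skb *skb, bool rxcsum_enabled,
			   struct mock_csum_stats *stats)
{
	uint8_t level = 0;

	skb->ip_summed = MOCK_CHECKSUM_NONE;
	skb->csum_level = 0;

	if (!rxcsum_enabled)
		return;

	if (!mock_rx_csum_checked(info, &level, stats))
		return;

	skb->ip_summed = MOCK_CHECKSUM_UNNECESSARY;
	skb->csum_level = level;
}
```

Keeping the statistics pointer optional lets a caller without a ring at hand
reuse the same check.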

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 71 ++++++++++++-------
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 1aab79dc8915..6a4fd3f3fc0a 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -104,17 +104,17 @@ ice_rx_hash_to_skb(struct ice_rx_ring *rx_ring,
 }
 
 /**
- * ice_rx_csum - Indicate in skb if checksum is good
- * @ring: the ring we care about
- * @skb: skb currently being received and modified
+ * ice_rx_csum_checked - Indicates, whether hardware has checked the checksum
  * @rx_desc: the receive descriptor
  * @ptype: the packet type decoded by hardware
+ * @csum_lvl_dst: address to put checksum level into
+ * @ring: ring for error stats, can be NULL
  *
- * skb->protocol must be set before this function is called
+ * Returns true, if hardware has checked the checksum.
  */
-static void
-ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
-	    union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
+static bool
+ice_rx_csum_checked(union ice_32b_rx_flex_desc *rx_desc, u16 ptype,
+		    u8 *csum_lvl_dst, struct ice_rx_ring *ring)
 {
 	struct ice_rx_ptype_decoded decoded;
 	u16 rx_status0, rx_status1;
@@ -125,20 +125,12 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
 
 	decoded = ice_decode_rx_desc_ptype(ptype);
 
-	/* Start with CHECKSUM_NONE and by default csum_level = 0 */
-	skb->ip_summed = CHECKSUM_NONE;
-	skb_checksum_none_assert(skb);
-
-	/* check if Rx checksum is enabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
-		return;
-
 	/* check if HW has decoded the packet and checksum */
 	if (!(rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_L3L4P_S)))
-		return;
+		return false;
 
 	if (!(decoded.known && decoded.outer_ip))
-		return;
+		return false;
 
 	ipv4 = (decoded.outer_ip == ICE_RX_PTYPE_OUTER_IP) &&
 	       (decoded.outer_ip_ver == ICE_RX_PTYPE_OUTER_IPV4);
@@ -168,22 +160,51 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
 	 * we are indicating we validated the inner checksum.
 	 */
 	if (decoded.tunnel_type >= ICE_RX_PTYPE_TUNNEL_IP_GRENAT)
-		skb->csum_level = 1;
+		*csum_lvl_dst = 1;
 
 	/* Only report checksum unnecessary for TCP, UDP, or SCTP */
 	switch (decoded.inner_prot) {
 	case ICE_RX_PTYPE_INNER_PROT_TCP:
 	case ICE_RX_PTYPE_INNER_PROT_UDP:
 	case ICE_RX_PTYPE_INNER_PROT_SCTP:
-		skb->ip_summed = CHECKSUM_UNNECESSARY;
-		break;
-	default:
-		break;
+		return true;
 	}
-	return;
+
+	return false;
 
 checksum_fail:
-	ring->vsi->back->hw_csum_rx_error++;
+	if (ring)
+		ring->vsi->back->hw_csum_rx_error++;
+
+	return false;
+}
+
+/**
+ * ice_rx_csum_into_skb - Indicate in skb if checksum is good
+ * @ring: the ring we care about
+ * @skb: skb currently being received and modified
+ * @rx_desc: the receive descriptor
+ * @ptype: the packet type decoded by hardware
+ */
+static void
+ice_rx_csum_into_skb(struct ice_rx_ring *ring, struct sk_buff *skb,
+		     union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
+{
+	u8 csum_level = 0;
+
+	/* Start with CHECKSUM_NONE and by default csum_level = 0 */
+	skb->ip_summed = CHECKSUM_NONE;
+	skb_checksum_none_assert(skb);
+
+	/* check if Rx checksum is enabled */
+	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+		return;
+
+	if (!ice_rx_csum_checked(rx_desc, ptype, &csum_level, ring))
+		return;
+
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+	skb->csum_level = csum_level;
 }
 
 /**
@@ -232,7 +253,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
 	/* modifies the skb - consumes the enet header */
 	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
-	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
+	ice_rx_csum_into_skb(rx_ring, skb, rx_desc, ptype);
 
 	if (rx_ring->ptp_rx)
 		ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);
-- 
2.35.3



* [PATCH RESEND bpf-next 04/15] ice: Make ptype internal to descriptor info processing
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (2 preceding siblings ...)
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking " Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff Larysa Zaremba
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Currently, the rx_ptype variable is used only as an argument
to ice_process_skb_fields() and is computed
just before the function call.

Therefore, there is no reason to pass this value as an argument.
Instead, remove this argument and compute the value directly inside
the ice_process_skb_fields() function.

Also, move its calculation into a short function, so the code
can later be reused in .xmo_*() callbacks.
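
The short helper amounts to a little-endian load followed by a mask. A
userspace sketch is shown below; the sketch_* names are hypothetical, and the
10-bit mask value is an assumption made for illustration rather than a value
taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: the packet type lives in the low 10 bits. */
#define SKETCH_PTYPE_MASK 0x3FFu

/* Decode a little-endian 16-bit field independently of host endianness. */
uint16_t sketch_le16_to_cpu(const uint8_t raw[2])
{
	assert(raw != NULL);

	return (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
}

/* Read the packet type from the raw descriptor field. */
uint16_t sketch_get_ptype(const uint8_t ptype_flex_flags0[2])
{
	return sketch_le16_to_cpu(ptype_flex_flags0) & SKETCH_PTYPE_MASK;
}
```

With the helper in place, every caller derives the packet type from the
descriptor in the same way instead of repeating the mask at each call site.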

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  6 +-----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 15 +++++++++++++--
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c      |  2 +-
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 4fcf2d07eb85..c9bb77da0861 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1181,7 +1181,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		unsigned int size;
 		u16 stat_err_bits;
 		u16 vlan_tag = 0;
-		u16 rx_ptype;
 
 		/* get the Rx desc from Rx ring based on 'next_to_clean' */
 		rx_desc = ICE_RX_DESC(rx_ring, ntc);
@@ -1286,10 +1285,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		total_rx_bytes += skb->len;
 
 		/* populate checksum, VLAN, and protocol */
-		rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
-			ICE_RX_FLEX_DESC_PTYPE_M;
-
-		ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
+		ice_process_skb_fields(rx_ring, rx_desc, skb);
 
 		ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb);
 		/* send completed skb up the stack */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 6a4fd3f3fc0a..2515f5f7a2b6 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -232,12 +232,21 @@ ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring,
 	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
 }
 
+/**
+ * ice_get_ptype - Read HW packet type from the descriptor
+ * @rx_desc: RX descriptor
+ */
+static u16 ice_get_ptype(union ice_32b_rx_flex_desc *rx_desc)
+{
+	return le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
+	       ICE_RX_FLEX_DESC_PTYPE_M;
+}
+
 /**
  * ice_process_skb_fields - Populate skb header fields from Rx descriptor
  * @rx_ring: Rx descriptor ring packet is being transacted on
  * @rx_desc: pointer to the EOP Rx descriptor
  * @skb: pointer to current skb being populated
- * @ptype: the packet type decoded by hardware
  *
  * This function checks the ring, descriptor, and packet information in
  * order to populate the hash, checksum, VLAN, protocol, and
@@ -246,8 +255,10 @@ ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring,
 void
 ice_process_skb_fields(struct ice_rx_ring *rx_ring,
 		       union ice_32b_rx_flex_desc *rx_desc,
-		       struct sk_buff *skb, u16 ptype)
+		       struct sk_buff *skb)
 {
+	u16 ptype = ice_get_ptype(rx_desc);
+
 	ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype);
 
 	/* modifies the skb - consumes the enet header */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index 115969ecdf7b..e1d49e1235b3 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -148,7 +148,7 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val);
 void
 ice_process_skb_fields(struct ice_rx_ring *rx_ring,
 		       union ice_32b_rx_flex_desc *rx_desc,
-		       struct sk_buff *skb, u16 ptype);
+		       struct sk_buff *skb);
 void
 ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag);
 #endif /* !_ICE_TXRX_LIB_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index d1e489da7363..3b80aed5d47a 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -857,7 +857,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 		rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
 				       ICE_RX_FLEX_DESC_PTYPE_M;
 
-		ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
+		ice_process_skb_fields(rx_ring, rx_desc, skb);
 		ice_receive_skb(rx_ring, skb, vlan_tag);
 	}
 
-- 
2.35.3



* [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (3 preceding siblings ...)
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 04/15] ice: Make ptype internal to descriptor info processing Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-22 16:46   ` Alexander Lobakin
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint Larysa Zaremba
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

In order to use XDP hints via kfuncs, we need to put the
RX descriptor and ring pointers right next to xdp_buff.
As in the hints implementations in other drivers, we achieve
this by putting xdp_buff into a child structure.

Currently, xdp_buff is stored in the ring structure,
so replace it with a union that includes the child structure.
This way, enough memory is available while the existing XDP code
remains isolated from hints.

The size of the new child structure (ice_xdp_buff) is 72 bytes,
therefore it does not fit into a single cache line.
To at least place the union at the start of a cache line, move the 'next'
field from CL1 to CL3, as it isn't used often.

Placing the union at the start of a cache line lets at least xdp_buff
and the descriptor pointer fit into a single CL;
the ring pointer is used less often, so it can spill into the next CL.
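
The child-structure technique can be illustrated with a small userspace
sketch. The mock_* types below are hypothetical and much smaller than the
real xdp_buff; only the pattern is the point. The generic buffer sits at
offset 0 of the extended structure, so a pointer to it can be cast back to
reach the extra pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for xdp_buff, the RX descriptor and the ring. */
struct mock_xdp_buff { void *data; void *data_end; };
struct mock_eop_desc { uint16_t ptype; };
struct mock_meta_ring { int id; };

/* Extended buffer: the generic part must stay the first member. */
struct mock_ext_buff {
	struct mock_xdp_buff base;
	struct mock_eop_desc *eop_desc;
	struct mock_meta_ring *rx_ring;
};

/* Guard the layout assumption the casts below rely on. */
static_assert(offsetof(struct mock_ext_buff, base) == 0,
	      "generic buffer must be the first member");

/* Fill in the metadata sources before running the program. */
void mock_set_meta_srcs(struct mock_xdp_buff *xdp,
			struct mock_eop_desc *desc,
			struct mock_meta_ring *ring)
{
	struct mock_ext_buff *ext = (struct mock_ext_buff *)xdp;

	ext->eop_desc = desc;
	ext->rx_ring = ring;
}

/* What a metadata callback would do: recover the descriptor from ctx. */
const struct mock_eop_desc *mock_desc_from_ctx(const struct mock_xdp_buff *xdp)
{
	return ((const struct mock_ext_buff *)xdp)->eop_desc;
}
```

The cast is only safe when the generic buffer really is embedded in the
extended structure, which is why the ring stores both in a union.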

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  7 ++++--
 drivers/net/ethernet/intel/ice/ice_txrx.h     | 23 ++++++++++++++++---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 11 +++++++++
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index c9bb77da0861..ca21a71749b6 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -557,13 +557,14 @@ ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, const unsigned int size)
  * @xdp_prog: XDP program to run
  * @xdp_ring: ring to be used for XDP_TX action
  * @rx_buf: Rx buffer to store the XDP action
+ * @eop_desc: Last descriptor in packet to read metadata from
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
 static void
 ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	    struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
-	    struct ice_rx_buf *rx_buf)
+	    struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
 {
 	unsigned int ret = ICE_XDP_PASS;
 	u32 act;
@@ -571,6 +572,8 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	if (!xdp_prog)
 		goto exit;
 
+	ice_xdp_set_meta_srcs(xdp, eop_desc, rx_ring);
+
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 	switch (act) {
 	case XDP_PASS:
@@ -1240,7 +1243,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		if (ice_is_non_eop(rx_ring, rx_desc))
 			continue;
 
-		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf);
+		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
 		if (rx_buf->act == ICE_XDP_PASS)
 			goto construct_skb;
 		total_rx_bytes += xdp_get_buff_len(xdp);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index fff0efe28373..f1ac2eb974f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -260,6 +260,15 @@ enum ice_rx_dtype {
 	ICE_RX_DTYPE_SPLIT_ALWAYS	= 2,
 };
 
+struct ice_xdp_buff {
+	struct xdp_buff xdp_buff;
+	union ice_32b_rx_flex_desc *eop_desc;	/* Required for all metadata */
+	/* End of the 1st cache line */
+	struct ice_rx_ring *rx_ring;
+};
+
+static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0);
+
 /* indices into GLINT_ITR registers */
 #define ICE_RX_ITR	ICE_IDX_ITR0
 #define ICE_TX_ITR	ICE_IDX_ITR1
@@ -301,7 +310,6 @@ enum ice_dynamic_itr {
 /* descriptor ring, associated with a VSI */
 struct ice_rx_ring {
 	/* CL1 - 1st cacheline starts here */
-	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */
 	void *desc;			/* Descriptor ring memory */
 	struct device *dev;		/* Used for DMA mapping */
 	struct net_device *netdev;	/* netdev ring maps to */
@@ -313,12 +321,19 @@ struct ice_rx_ring {
 	u16 count;			/* Number of descriptors */
 	u16 reg_idx;			/* HW register index of the ring */
 	u16 next_to_alloc;
-	/* CL2 - 2nd cacheline starts here */
+
 	union {
 		struct ice_rx_buf *rx_buf;
 		struct xdp_buff **xdp_buf;
 	};
-	struct xdp_buff xdp;
+	/* CL2 - 2nd cacheline starts here
+	 * Size of ice_xdp_buff is 72 bytes,
+	 * so it spills into CL3
+	 */
+	union {
+		struct ice_xdp_buff xdp_ext;
+		struct xdp_buff xdp;
+	};
 	/* CL3 - 3rd cacheline starts here */
 	struct bpf_prog *xdp_prog;
 	u16 rx_offset;
@@ -328,6 +343,8 @@ struct ice_rx_ring {
 	u16 next_to_clean;
 	u16 first_desc;
 
+	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */
+
 	/* stats structs */
 	struct ice_ring_stats *ring_stats;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index e1d49e1235b3..2835a8348237 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -151,4 +151,15 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
 		       struct sk_buff *skb);
 void
 ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag);
+
+static inline void
+ice_xdp_set_meta_srcs(struct xdp_buff *xdp,
+		      union ice_32b_rx_flex_desc *eop_desc,
+		      struct ice_rx_ring *rx_ring)
+{
+	struct ice_xdp_buff *xdp_ext = (struct ice_xdp_buff *)xdp;
+
+	xdp_ext->eop_desc = eop_desc;
+	xdp_ext->rx_ring = rx_ring;
+}
 #endif /* !_ICE_TXRX_LIB_H_ */
-- 
2.35.3



* [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (4 preceding siblings ...)
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-12 18:19   ` Stanislav Fomichev
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint Larysa Zaremba
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Use the previously refactored code to create a function
that allows XDP code to read the HW timestamp.

The HW timestamp is the first hint supported by the driver,
so also add xdp_metadata_ops.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |  2 ++
 drivers/net/ethernet/intel/ice/ice_main.c     |  1 +
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 22 +++++++++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index aa32111afd6e..ba1bb8392db1 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -962,4 +962,6 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 	set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 }
+
+extern const struct xdp_metadata_ops ice_xdp_md_ops;
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a1f7c8edc22f..cda6c4a80737 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3378,6 +3378,7 @@ static void ice_set_ops(struct ice_vsi *vsi)
 
 	netdev->netdev_ops = &ice_netdev_ops;
 	netdev->udp_tunnel_nic_info = &pf->hw.udp_tunnel_nic;
+	netdev->xdp_metadata_ops = &ice_xdp_md_ops;
 	ice_set_ethtool_ops(netdev);
 
 	if (vsi->type != ICE_VSI_PF)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 2515f5f7a2b6..e9589cadf811 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -537,3 +537,25 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res,
 			spin_unlock(&xdp_ring->tx_lock);
 	}
 }
+
+/**
+ * ice_xdp_rx_hw_ts - HW timestamp XDP hint handler
+ * @ctx: XDP buff pointer
+ * @ts_ns: destination address
+ *
+ * Copy HW timestamp (if available) to the destination address.
+ */
+static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns)
+{
+	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
+
+	if (!ice_ptp_copy_rx_hwts_from_desc(xdp_ext->rx_ring,
+					    xdp_ext->eop_desc, ts_ns))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+const struct xdp_metadata_ops ice_xdp_md_ops = {
+	.xmo_rx_timestamp		= ice_xdp_rx_hw_ts,
+};
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 54+ messages in thread
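
As an illustration of the callback-table idea in this patch, here is a small userspace sketch: one function pointer per hint, a handler that returns -EOPNOTSUPP when the hardware has nothing to report, and a caller that treats a missing callback the same way. The types and names (pkt_ctx_model, md_ops_model, read_rx_timestamp) are made up for this sketch and are not the kernel's xdp_metadata_ops API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet context visible to the hint handlers. */
struct pkt_ctx_model {
	bool ts_valid;		/* did hardware latch a timestamp? */
	uint64_t hw_ts_ns;	/* timestamp in nanoseconds, if valid */
};

/* Reduced stand-in for a metadata ops table: one callback per hint. */
struct md_ops_model {
	int (*rx_timestamp)(const struct pkt_ctx_model *ctx, uint64_t *ts_ns);
	int (*rx_hash)(const struct pkt_ctx_model *ctx, uint32_t *hash);
};

/* Timestamp handler: copy the value out, or report that there is none. */
static int model_rx_hw_ts(const struct pkt_ctx_model *ctx, uint64_t *ts_ns)
{
	if (!ctx->ts_valid)
		return -EOPNOTSUPP;

	*ts_ns = ctx->hw_ts_ns;
	return 0;
}

static const struct md_ops_model model_md_ops = {
	.rx_timestamp	= model_rx_hw_ts,
	/* .rx_hash stays NULL: that hint is not implemented in this model */
};

/* Caller side: a missing table or callback also means "not supported". */
static int read_rx_timestamp(const struct md_ops_model *ops,
			     const struct pkt_ctx_model *ctx, uint64_t *ts_ns)
{
	assert(ts_ns != NULL);

	if (!ops || !ops->rx_timestamp)
		return -EOPNOTSUPP;

	return ops->rx_timestamp(ctx, ts_ns);
}
```

Leaving unimplemented callbacks as NULL lets later patches add hints one designated initializer at a time, which matches how the ops table grows in the following patch.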

* [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (5 preceding siblings ...)
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint Larysa Zaremba
@ 2023-05-12 15:25 ` Larysa Zaremba
  2023-05-12 18:22   ` Stanislav Fomichev
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 08/15] ice: Support XDP hints in AF_XDP ZC mode Larysa Zaremba
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:25 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

The RX hash XDP hint returns both the hash value and its type.
The type is XDP-specific, so the hardware ptypes have to be mapped
to XDP hash types; add a lookup table for that.

Instead of creating a new long list, reuse the contents
of ice_ptype_lkup[] through the preprocessor.

The current hash type enum does not contain an ICMP packet type,
but ice devices support it, so also add a new type to the core code.

Then use the previously refactored code to create a function
that allows XDP code to read the RX hash.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h    | 412 +++++++++---------
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  72 +++
 include/net/xdp.h                             |   3 +
 3 files changed, 283 insertions(+), 204 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 89f986a75cc8..d384ddfcb83e 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -673,6 +673,212 @@ struct ice_tlan_ctx {
  *      Use the enum ice_rx_l2_ptype to decode the packet type
  * ENDIF
  */
+#define ICE_PTYPES								\
+	/* L2 Packet types */							\
+	ICE_PTT_UNUSED_ENTRY(0),						\
+	ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),			\
+	ICE_PTT_UNUSED_ENTRY(2),						\
+	ICE_PTT_UNUSED_ENTRY(3),						\
+	ICE_PTT_UNUSED_ENTRY(4),						\
+	ICE_PTT_UNUSED_ENTRY(5),						\
+	ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),			\
+	ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),			\
+	ICE_PTT_UNUSED_ENTRY(8),						\
+	ICE_PTT_UNUSED_ENTRY(9),						\
+	ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),		\
+	ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),		\
+	ICE_PTT_UNUSED_ENTRY(12),						\
+	ICE_PTT_UNUSED_ENTRY(13),						\
+	ICE_PTT_UNUSED_ENTRY(14),						\
+	ICE_PTT_UNUSED_ENTRY(15),						\
+	ICE_PTT_UNUSED_ENTRY(16),						\
+	ICE_PTT_UNUSED_ENTRY(17),						\
+	ICE_PTT_UNUSED_ENTRY(18),						\
+	ICE_PTT_UNUSED_ENTRY(19),						\
+	ICE_PTT_UNUSED_ENTRY(20),						\
+	ICE_PTT_UNUSED_ENTRY(21),						\
+										\
+	/* Non Tunneled IPv4 */							\
+	ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),		\
+	ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),		\
+	ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(25),						\
+	ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP,  PAY4),		\
+	ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),		\
+	ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),		\
+										\
+	/* IPv4 --> IPv4 */							\
+	ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),		\
+	ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),		\
+	ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(32),						\
+	ICE_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),		\
+	ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),		\
+	ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),		\
+										\
+	/* IPv4 --> IPv6 */							\
+	ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),		\
+	ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),		\
+	ICE_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(39),						\
+	ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),		\
+	ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),		\
+	ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),		\
+										\
+	/* IPv4 --> GRE/NAT */							\
+	ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),		\
+										\
+	/* IPv4 --> GRE/NAT --> IPv4 */						\
+	ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),		\
+	ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),		\
+	ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(47),						\
+	ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),		\
+	ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),		\
+	ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),		\
+										\
+	/* IPv4 --> GRE/NAT --> IPv6 */						\
+	ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),		\
+	ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),		\
+	ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(54),						\
+	ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),		\
+	ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),		\
+	ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),		\
+										\
+	/* IPv4 --> GRE/NAT --> MAC */						\
+	ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),	\
+										\
+	/* IPv4 --> GRE/NAT --> MAC --> IPv4 */					\
+	ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),	\
+	ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),	\
+	ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(62),						\
+	ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),	\
+	ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),	\
+	ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),	\
+										\
+	/* IPv4 --> GRE/NAT -> MAC --> IPv6 */					\
+	ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),	\
+	ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),	\
+	ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(69),						\
+	ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),	\
+	ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),	\
+	ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),	\
+										\
+	/* IPv4 --> GRE/NAT --> MAC/VLAN */					\
+	ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),	\
+										\
+	/* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */				\
+	ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),	\
+	ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),	\
+	ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(77),						\
+	ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),	\
+	ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),	\
+	ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),	\
+										\
+	/* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */				\
+	ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),	\
+	ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),	\
+	ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(84),						\
+	ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),	\
+	ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),	\
+	ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),	\
+										\
+	/* Non Tunneled IPv6 */							\
+	ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),		\
+	ICE_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),		\
+	ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(91),						\
+	ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP,  PAY4),		\
+	ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),		\
+	ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),		\
+										\
+	/* IPv6 --> IPv4 */							\
+	ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),		\
+	ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),		\
+	ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(98),						\
+	ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),		\
+	ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),		\
+	ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),		\
+										\
+	/* IPv6 --> IPv6 */							\
+	ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),		\
+	ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),		\
+	ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(105),						\
+	ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),		\
+	ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),		\
+	ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),		\
+										\
+	/* IPv6 --> GRE/NAT */							\
+	ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),		\
+										\
+	/* IPv6 --> GRE/NAT -> IPv4 */						\
+	ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),		\
+	ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),		\
+	ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(113),						\
+	ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),		\
+	ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),		\
+	ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),		\
+										\
+	/* IPv6 --> GRE/NAT -> IPv6 */						\
+	ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),		\
+	ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),		\
+	ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),		\
+	ICE_PTT_UNUSED_ENTRY(120),						\
+	ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),		\
+	ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),		\
+	ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),		\
+										\
+	/* IPv6 --> GRE/NAT -> MAC */						\
+	ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),	\
+										\
+	/* IPv6 --> GRE/NAT -> MAC -> IPv4 */					\
+	ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),	\
+	ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),	\
+	ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(128),						\
+	ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),	\
+	ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),	\
+	ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),	\
+										\
+	/* IPv6 --> GRE/NAT -> MAC -> IPv6 */					\
+	ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),	\
+	ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),	\
+	ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(135),						\
+	ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),	\
+	ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),	\
+	ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),	\
+										\
+	/* IPv6 --> GRE/NAT -> MAC/VLAN */					\
+	ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),	\
+										\
+	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */				\
+	ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),	\
+	ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),	\
+	ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(143),						\
+	ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),	\
+	ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),	\
+	ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),	\
+										\
+	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */				\
+	ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),	\
+	ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),	\
+	ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),	\
+	ICE_PTT_UNUSED_ENTRY(150),						\
+	ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),	\
+	ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),	\
+	ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
+
+#define ICE_NUM_DEFINED_PTYPES	154
 
 /* macro to make the table lines short, use explicit indexing with [PTYPE] */
 #define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
@@ -695,212 +901,10 @@ struct ice_tlan_ctx {
 
 /* Lookup table mapping in the 10-bit HW PTYPE to the bit field for decoding */
 static const struct ice_rx_ptype_decoded ice_ptype_lkup[BIT(10)] = {
-	/* L2 Packet types */
-	ICE_PTT_UNUSED_ENTRY(0),
-	ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
-	ICE_PTT_UNUSED_ENTRY(2),
-	ICE_PTT_UNUSED_ENTRY(3),
-	ICE_PTT_UNUSED_ENTRY(4),
-	ICE_PTT_UNUSED_ENTRY(5),
-	ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
-	ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
-	ICE_PTT_UNUSED_ENTRY(8),
-	ICE_PTT_UNUSED_ENTRY(9),
-	ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
-	ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
-	ICE_PTT_UNUSED_ENTRY(12),
-	ICE_PTT_UNUSED_ENTRY(13),
-	ICE_PTT_UNUSED_ENTRY(14),
-	ICE_PTT_UNUSED_ENTRY(15),
-	ICE_PTT_UNUSED_ENTRY(16),
-	ICE_PTT_UNUSED_ENTRY(17),
-	ICE_PTT_UNUSED_ENTRY(18),
-	ICE_PTT_UNUSED_ENTRY(19),
-	ICE_PTT_UNUSED_ENTRY(20),
-	ICE_PTT_UNUSED_ENTRY(21),
-
-	/* Non Tunneled IPv4 */
-	ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),
-	ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),
-	ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(25),
-	ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP,  PAY4),
-	ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),
-	ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
-	/* IPv4 --> IPv4 */
-	ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(32),
-	ICE_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv4 --> IPv6 */
-	ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(39),
-	ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
-	/* IPv4 --> GRE/NAT */
-	ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
-	/* IPv4 --> GRE/NAT --> IPv4 */
-	ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(47),
-	ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv4 --> GRE/NAT --> IPv6 */
-	ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(54),
-	ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
-	/* IPv4 --> GRE/NAT --> MAC */
-	ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
-	/* IPv4 --> GRE/NAT --> MAC --> IPv4 */
-	ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(62),
-	ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv4 --> GRE/NAT -> MAC --> IPv6 */
-	ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(69),
-	ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
-	/* IPv4 --> GRE/NAT --> MAC/VLAN */
-	ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
-	/* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */
-	ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(77),
-	ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */
-	ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(84),
-	ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
-	/* Non Tunneled IPv6 */
-	ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),
-	ICE_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),
-	ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(91),
-	ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP,  PAY4),
-	ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),
-	ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
-	/* IPv6 --> IPv4 */
-	ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(98),
-	ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv6 --> IPv6 */
-	ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(105),
-	ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
-	/* IPv6 --> GRE/NAT */
-	ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
-	/* IPv6 --> GRE/NAT -> IPv4 */
-	ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(113),
-	ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv6 --> GRE/NAT -> IPv6 */
-	ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(120),
-	ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
-	/* IPv6 --> GRE/NAT -> MAC */
-	ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
-	/* IPv6 --> GRE/NAT -> MAC -> IPv4 */
-	ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(128),
-	ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv6 --> GRE/NAT -> MAC -> IPv6 */
-	ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(135),
-	ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
-	/* IPv6 --> GRE/NAT -> MAC/VLAN */
-	ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
-	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */
-	ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
-	ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
-	ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(143),
-	ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),
-	ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
-	ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
-	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */
-	ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
-	ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
-	ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),
-	ICE_PTT_UNUSED_ENTRY(150),
-	ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),
-	ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
-	ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
+	ICE_PTYPES
 
 	/* unused entries */
-	[154 ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
+	[ICE_NUM_DEFINED_PTYPES ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
 };
 
 static inline struct ice_rx_ptype_decoded ice_decode_rx_desc_ptype(u16 ptype)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index e9589cadf811..1caa73644e7b 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -556,6 +556,78 @@ static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns)
 	return 0;
 }
 
+/* Define a ptype index -> XDP hash type lookup table.
+ * It uses the same ptype definitions as ice_ptype_lkup[],
+ * avoiding possible copy-paste errors.
+ */
+#undef ICE_PTT
+#undef ICE_PTT_UNUSED_ENTRY
+
+#define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
+	[PTYPE] = XDP_RSS_L3_##OUTER_IP_VER | XDP_RSS_L4_##I | XDP_RSS_TYPE_##PL
+
+#define ICE_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = 0
+
+/* A few supplementary definitions for when XDP hash types do not coincide
+ * with what can be generated from ptype definitions
+ * by means of preprocessor concatenation.
+ */
+#define XDP_RSS_L3_NONE		XDP_RSS_TYPE_NONE
+#define XDP_RSS_L4_NONE		XDP_RSS_TYPE_NONE
+#define XDP_RSS_TYPE_PAY2	XDP_RSS_TYPE_L2
+#define XDP_RSS_TYPE_PAY3	XDP_RSS_TYPE_NONE
+#define XDP_RSS_TYPE_PAY4	XDP_RSS_L4
+
+static const enum xdp_rss_hash_type
+ice_ptype_to_xdp_hash[ICE_NUM_DEFINED_PTYPES] = {
+	ICE_PTYPES
+};
+
+#undef XDP_RSS_L3_NONE
+#undef XDP_RSS_L4_NONE
+#undef XDP_RSS_TYPE_PAY2
+#undef XDP_RSS_TYPE_PAY3
+#undef XDP_RSS_TYPE_PAY4
+
+#undef ICE_PTT
+#undef ICE_PTT_UNUSED_ENTRY
+
+/**
+ * ice_xdp_rx_hash_type - Get XDP-specific hash type from the RX descriptor
+ * @eop_desc: End of Packet descriptor
+ */
+static enum xdp_rss_hash_type
+ice_xdp_rx_hash_type(union ice_32b_rx_flex_desc *eop_desc)
+{
+	u16 ptype = ice_get_ptype(eop_desc);
+
+	if (unlikely(ptype >= ICE_NUM_DEFINED_PTYPES))
+		return 0;
+
+	return ice_ptype_to_xdp_hash[ptype];
+}
+
+/**
+ * ice_xdp_rx_hash - RX hash XDP hint handler
+ * @ctx: XDP buff pointer
+ * @hash: hash destination address
+ * @rss_type: XDP hash type destination address
+ *
+ * Copy RX hash (if available) and its type to the destination addresses.
+ */
+static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
+			   enum xdp_rss_hash_type *rss_type)
+{
+	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
+
+	*rss_type = ice_xdp_rx_hash_type(xdp_ext->eop_desc);
+	if (!ice_copy_rx_hash_from_desc(xdp_ext->eop_desc, hash))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 const struct xdp_metadata_ops ice_xdp_md_ops = {
 	.xmo_rx_timestamp		= ice_xdp_rx_hw_ts,
+	.xmo_rx_hash			= ice_xdp_rx_hash,
 };
diff --git a/include/net/xdp.h b/include/net/xdp.h
index d1c5381fc95f..6381560efae2 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -417,6 +417,7 @@ enum xdp_rss_hash_type {
 	XDP_RSS_L4_UDP		= BIT(5),
 	XDP_RSS_L4_SCTP		= BIT(6),
 	XDP_RSS_L4_IPSEC	= BIT(7), /* L4 based hash include IPSEC SPI */
+	XDP_RSS_L4_ICMP		= BIT(8),
 
 	/* Second part: RSS hash type combinations used for driver HW mapping */
 	XDP_RSS_TYPE_NONE            = 0,
@@ -432,11 +433,13 @@ enum xdp_rss_hash_type {
 	XDP_RSS_TYPE_L4_IPV4_UDP     = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_UDP,
 	XDP_RSS_TYPE_L4_IPV4_SCTP    = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_SCTP,
 	XDP_RSS_TYPE_L4_IPV4_IPSEC   = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_IPSEC,
+	XDP_RSS_TYPE_L4_IPV4_ICMP    = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_ICMP,
 
 	XDP_RSS_TYPE_L4_IPV6_TCP     = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_TCP,
 	XDP_RSS_TYPE_L4_IPV6_UDP     = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_UDP,
 	XDP_RSS_TYPE_L4_IPV6_SCTP    = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_SCTP,
 	XDP_RSS_TYPE_L4_IPV6_IPSEC   = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_IPSEC,
+	XDP_RSS_TYPE_L4_IPV6_ICMP    = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_ICMP,
 
 	XDP_RSS_TYPE_L4_IPV6_TCP_EX  = XDP_RSS_TYPE_L4_IPV6_TCP  | XDP_RSS_L3_DYNHDR,
 	XDP_RSS_TYPE_L4_IPV6_UDP_EX  = XDP_RSS_TYPE_L4_IPV6_UDP  | XDP_RSS_L3_DYNHDR,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 54+ messages in thread
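
The preprocessor reuse described in this patch is the classic X-macro pattern: one list of entries, expanded several times with different definitions of the entry macros. The following is a minimal sketch with a made-up four-entry list; DEMO_PTYPES, PTT, PTT_UNUSED and the HASH_* bits are hypothetical and only loosely modelled on ICE_PTYPES and enum xdp_rss_hash_type.

```c
#include <assert.h>
#include <stdint.h>

/* One shared list; each consumer defines PTT() and PTT_UNUSED() itself. */
#define DEMO_PTYPES			\
	PTT_UNUSED(0),			\
	PTT(1, IPV4, UDP),		\
	PTT(2, IPV4, TCP),		\
	PTT(3, IPV6, ICMP),

#define DEMO_NUM_PTYPES	4

enum demo_l3 { L3_NONE, L3_IPV4, L3_IPV6 };
enum demo_l4 { L4_NONE, L4_UDP, L4_TCP, L4_ICMP };

struct demo_decoded {
	uint8_t known;
	uint8_t l3;
	uint8_t l4;
};

/* First expansion: full decode table. */
#define PTT(ID, L3, L4)	[ID] = { 1, L3_##L3, L4_##L4 }
#define PTT_UNUSED(ID)	[ID] = { 0, 0, 0 }

static const struct demo_decoded demo_decode[DEMO_NUM_PTYPES] = {
	DEMO_PTYPES
};

#undef PTT
#undef PTT_UNUSED

enum demo_hash_bits {
	HASH_L3_IPV4 = 1 << 0,
	HASH_L3_IPV6 = 1 << 1,
	HASH_L4_UDP  = 1 << 2,
	HASH_L4_TCP  = 1 << 3,
	HASH_L4_ICMP = 1 << 4,
};

/* Second expansion: same list, now producing hash-type bitmasks. */
#define PTT(ID, L3, L4)	[ID] = HASH_L3_##L3 | HASH_L4_##L4
#define PTT_UNUSED(ID)	[ID] = 0

static const uint32_t demo_hash_type[DEMO_NUM_PTYPES] = {
	DEMO_PTYPES
};

#undef PTT
#undef PTT_UNUSED

/* Both tables come from one list, so their sizes cannot drift apart. */
static_assert(sizeof(demo_decode) / sizeof(demo_decode[0]) ==
	      sizeof(demo_hash_type) / sizeof(demo_hash_type[0]),
	      "tables must stay in sync");

/* Bounds-checked lookups; out-of-range ptypes map to "nothing known". */
static int demo_is_known(unsigned int ptype)
{
	return ptype < DEMO_NUM_PTYPES && demo_decode[ptype].known;
}

static uint32_t demo_lookup_hash_type(unsigned int ptype)
{
	if (ptype >= DEMO_NUM_PTYPES)
		return 0;

	return demo_hash_type[ptype];
}
```

The list macro can be defined before the entry macros exist because macro bodies are only expanded at the point of use, which is why the patch can place ICE_PTYPES above the ICE_PTT definition.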

* [PATCH RESEND bpf-next 08/15] ice: Support XDP hints in AF_XDP ZC mode
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (6 preceding siblings ...)
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint Larysa Zaremba
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

In AF_XDP ZC mode, xdp_buff is not stored on the ring;
instead, it is provided by xsk_pool.
Space for metadata sources right after such buffers was already reserved
in commit 94ecc5ca4dbf ("xsk: Add cb area to struct xdp_buff_xsk").
This makes the implementation rather straightforward.

Update AF_XDP ZC packet processing to support XDP hints.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 3b80aed5d47a..7f5ce3529666 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -708,16 +708,25 @@ static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp,
  * @xdp: xdp_buff used as input to the XDP program
  * @xdp_prog: XDP program to run
  * @xdp_ring: ring to be used for XDP_TX action
+ * @rx_desc: packet descriptor
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
 static int
 ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
-	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
+	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
+	       union ice_32b_rx_flex_desc *rx_desc)
 {
 	int err, result = ICE_XDP_PASS;
 	u32 act;
 
+	/* We can safely convert xdp_buff_xsk to ice_xdp_buff,
+	 * because there are XSK_PRIV_MAX bytes reserved in xdp_buff_xsk
+	 * right after xdp_buff, for our private use.
+	 * The macro ensures we do not go above the limit.
+	 */
+	XSK_CHECK_PRIV_TYPE(struct ice_xdp_buff);
+	ice_xdp_set_meta_srcs(xdp, rx_desc, rx_ring);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	if (likely(act == XDP_REDIRECT)) {
@@ -816,7 +825,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 		xsk_buff_set_size(xdp, size);
 		xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool);
 
-		xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring);
+		xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring,
+					 rx_desc);
 		if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) {
 			xdp_xmit |= xdp_res;
 		} else if (xdp_res == ICE_XDP_EXIT) {
-- 
2.35.3



* [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (7 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 08/15] ice: Support XDP hints in AF_XDP ZC mode Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 18:28   ` Stanislav Fomichev
  2023-05-15 15:36   ` Jesper Dangaard Brouer
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 10/15] ice: Implement " Larysa Zaremba
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Implement functionality that enables drivers to expose VLAN tags
(c-tag and s-tag, via separate kfuncs) to XDP code.
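
As an illustration of what a consumer might do with the returned value, here
is a small sketch. It assumes the u16 written to vlan_tag is the full 802.1Q
Tag Control Information field in host byte order, which this patch does not
state. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The three 802.1Q TCI sub-fields. */
struct demo_vlan_tci {
	uint8_t pcp;	/* priority code point, 3 bits */
	uint8_t dei;	/* drop eligible indicator, 1 bit */
	uint16_t vid;	/* VLAN identifier, 12 bits */
};

/* Split a host-order TCI value into its sub-fields. */
static struct demo_vlan_tci demo_vlan_tci_decode(uint16_t tci)
{
	struct demo_vlan_tci f;

	f.pcp = (uint8_t)(tci >> 13);
	f.dei = (uint8_t)((tci >> 12) & 0x1);
	f.vid = (uint16_t)(tci & 0x0FFF);
	return f;
}

/* Build a host-order TCI value from its sub-fields. */
static uint16_t demo_vlan_tci_encode(uint8_t pcp, uint8_t dei, uint16_t vid)
{
	assert(pcp < 8 && dei < 2 && vid < 4096);
	return (uint16_t)((pcp << 13) | (dei << 12) | vid);
}
```

If a driver reported only the VLAN ID instead, the upper four bits would
simply decode as zero.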

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 Documentation/networking/xdp-rx-metadata.rst | 11 ++++++++-
 include/linux/netdevice.h                    |  2 ++
 include/net/xdp.h                            |  4 ++++
 kernel/bpf/offload.c                         |  4 ++++
 net/core/xdp.c                               | 24 ++++++++++++++++++++
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index 25ce72af81c2..73a78029c596 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -18,7 +18,16 @@ Currently, the following kfuncs are supported. In the future, as more
 metadata is supported, this set will grow:
 
 .. kernel-doc:: net/core/xdp.c
-   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
+   :identifiers: bpf_xdp_metadata_rx_timestamp
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_hash
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_ctag
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_stag
 
 An XDP program can use these kfuncs to read the metadata into stack
 variables for its own consumption. Or, to pass the metadata on to other
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08fbd4622ccf..fdae37fe11f5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1655,6 +1655,8 @@ struct xdp_metadata_ops {
 	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
 	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
 			       enum xdp_rss_hash_type *rss_type);
+	int	(*xmo_rx_ctag)(const struct xdp_md *ctx, u16 *vlan_tag);
+	int	(*xmo_rx_stag)(const struct xdp_md *ctx, u16 *vlan_tag);
 };
 
 /**
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 6381560efae2..2db7439fc60f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -389,6 +389,10 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
 			   bpf_xdp_metadata_rx_timestamp) \
 	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
 			   bpf_xdp_metadata_rx_hash) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CTAG, \
+			   bpf_xdp_metadata_rx_ctag) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_STAG, \
+			   bpf_xdp_metadata_rx_stag) \
 
 enum {
 #define XDP_METADATA_KFUNC(name, _) name,
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index d9c9f45e3529..2c6b6e82cfac 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -848,6 +848,10 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
 		p = ops->xmo_rx_timestamp;
 	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
 		p = ops->xmo_rx_hash;
+	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CTAG))
+		p = ops->xmo_rx_ctag;
+	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_STAG))
+		p = ops->xmo_rx_stag;
 out:
 	up_read(&bpf_devs_lock);
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 41e5ca8643ec..eff21501609f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
 	return -EOPNOTSUPP;
 }
 
+/**
+ * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
+ * @ctx: XDP context pointer.
+ * @vlan_tag: Return value pointer.
+ *
+ * Returns 0 on success or ``-errno`` on error.
+ */
+__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
+{
+	return -EOPNOTSUPP;
+}
+
+/**
+ * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
+ * @ctx: XDP context pointer.
+ * @vlan_tag: Return value pointer.
+ *
+ * Returns 0 on success or ``-errno`` on error.
+ */
+__bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
+{
+	return -EOPNOTSUPP;
+}
+
 __diag_pop();
 
 BTF_SET8_START(xdp_metadata_kfunc_ids)
-- 
2.35.3



* [PATCH RESEND bpf-next 10/15] ice: Implement VLAN tag hint
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (8 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 18:31   ` Stanislav Fomichev
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint Larysa Zaremba
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Implement .xmo_rx_ctag and .xmo_rx_stag callbacks to allow XDP code
to read the packet's VLAN tags.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 1caa73644e7b..39547feb6106 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -627,7 +627,51 @@ static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
 	return 0;
 }
 
+/**
+ * ice_xdp_rx_ctag - VLAN tag XDP hint handler
+ * @ctx: XDP buff pointer
+ * @vlan_tag: destination address
+ *
+ * Copy VLAN tag (if it was stripped) to the destination address.
+ */
+static int ice_xdp_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
+{
+	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
+	netdev_features_t features;
+
+	features = xdp_ext->rx_ring->netdev->features;
+
+	if (!(features & NETIF_F_HW_VLAN_CTAG_RX))
+		return -EINVAL;
+
+	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
+	return 0;
+}
+
+/**
+ * ice_xdp_rx_stag - VLAN s-tag XDP hint handler
+ * @ctx: XDP buff pointer
+ * @vlan_tag: destination address
+ *
+ * Copy VLAN s-tag (if it was stripped) to the destination address.
+ */
+static int ice_xdp_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
+{
+	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
+	netdev_features_t features;
+
+	features = xdp_ext->rx_ring->netdev->features;
+
+	if (!(features & NETIF_F_HW_VLAN_STAG_RX))
+		return -EINVAL;
+
+	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
+	return 0;
+}
+
 const struct xdp_metadata_ops ice_xdp_md_ops = {
 	.xmo_rx_timestamp		= ice_xdp_rx_hw_ts,
 	.xmo_rx_hash			= ice_xdp_rx_hash,
+	.xmo_rx_ctag			= ice_xdp_rx_ctag,
+	.xmo_rx_stag			= ice_xdp_rx_stag,
 };
-- 
2.35.3



* [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (9 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 10/15] ice: Implement " Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 18:34   ` Stanislav Fomichev
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 12/15] ice: Implement " Larysa Zaremba
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Implement functionality that enables drivers to expose to XDP code
whether the checksum was checked and at what level.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 Documentation/networking/xdp-rx-metadata.rst |  3 +++
 include/linux/netdevice.h                    |  1 +
 include/net/xdp.h                            |  2 ++
 kernel/bpf/offload.c                         |  2 ++
 net/core/xdp.c                               | 12 ++++++++++++
 5 files changed, 20 insertions(+)

diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index 73a78029c596..f74f0e283097 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -29,6 +29,9 @@ metadata is supported, this set will grow:
 .. kernel-doc:: net/core/xdp.c
    :identifiers: bpf_xdp_metadata_rx_stag
 
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_csum_lvl
+
 An XDP program can use these kfuncs to read the metadata into stack
 variables for its own consumption. Or, to pass the metadata on to other
 consumers, an XDP program can store it into the metadata area carried
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fdae37fe11f5..ddade3a15366 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1657,6 +1657,7 @@ struct xdp_metadata_ops {
 			       enum xdp_rss_hash_type *rss_type);
 	int	(*xmo_rx_ctag)(const struct xdp_md *ctx, u16 *vlan_tag);
 	int	(*xmo_rx_stag)(const struct xdp_md *ctx, u16 *vlan_tag);
+	int	(*xmo_rx_csum_lvl)(const struct xdp_md *ctx, u8 *csum_level);
 };
 
 /**
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 2db7439fc60f..0fbd25616241 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -393,6 +393,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
 			   bpf_xdp_metadata_rx_ctag) \
 	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_STAG, \
 			   bpf_xdp_metadata_rx_stag) \
+	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM_LVL, \
+			   bpf_xdp_metadata_rx_csum_lvl) \
 
 enum {
 #define XDP_METADATA_KFUNC(name, _) name,
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 2c6b6e82cfac..8bd54fb4ac63 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -852,6 +852,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
 		p = ops->xmo_rx_ctag;
 	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_STAG))
 		p = ops->xmo_rx_stag;
+	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM_LVL))
+		p = ops->xmo_rx_csum_lvl;
 out:
 	up_read(&bpf_devs_lock);
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index eff21501609f..7dd45fd62983 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -762,6 +762,18 @@ __bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag
 	return -EOPNOTSUPP;
 }
 
+/**
+ * bpf_xdp_metadata_rx_csum_lvl - Get depth at which HW has checked the checksum.
+ * @ctx: XDP context pointer.
+ * @csum_level: Return value pointer.
+ *
+ * Returns 0 on success (HW has checked the checksum) or ``-errno`` on error.
+ */
+__bpf_kfunc int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx, u8 *csum_level)
+{
+	return -EOPNOTSUPP;
+}
+
 __diag_pop();
 
 BTF_SET8_START(xdp_metadata_kfunc_ids)
-- 
2.35.3



* [PATCH RESEND bpf-next 12/15] ice: Implement checksum level hint
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (10 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Implement .xmo_rx_csum_lvl callback to allow XDP code to determine
whether the checksum was checked by hardware and at what level.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 24 ++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 39547feb6106..6a3ec925f20d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -161,6 +161,8 @@ ice_rx_csum_checked(union ice_32b_rx_flex_desc *rx_desc, u16 ptype,
 	 */
 	if (decoded.tunnel_type >= ICE_RX_PTYPE_TUNNEL_IP_GRENAT)
 		*csum_lvl_dst = 1;
+	else
+		*csum_lvl_dst = 0;
 
 	/* Only report checksum unnecessary for TCP, UDP, or SCTP */
 	switch (decoded.inner_prot) {
@@ -190,7 +192,7 @@ static void
 ice_rx_csum_into_skb(struct ice_rx_ring *ring, struct sk_buff *skb,
 		     union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
 {
-	u8 csum_level = 0;
+	u8 csum_level;
 
 	/* Start with CHECKSUM_NONE and by default csum_level = 0 */
 	skb->ip_summed = CHECKSUM_NONE;
@@ -669,9 +671,29 @@ static int ice_xdp_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
 	return 0;
 }
 
+/**
+ * ice_xdp_rx_csum_lvl - Get level at which HW has checked the checksum
+ * @ctx: XDP buff pointer
+ * @csum_lvl: destination address
+ *
+ * Copy HW checksum level (if it was checked) to the destination address.
+ */
+static int ice_xdp_rx_csum_lvl(const struct xdp_md *ctx, u8 *csum_lvl)
+{
+	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
+	u16 ptype = ice_get_ptype(xdp_ext->eop_desc);
+
+	if (!ice_rx_csum_checked(xdp_ext->eop_desc, ptype, csum_lvl,
+				 xdp_ext->rx_ring))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 const struct xdp_metadata_ops ice_xdp_md_ops = {
 	.xmo_rx_timestamp		= ice_xdp_rx_hw_ts,
 	.xmo_rx_hash			= ice_xdp_rx_hash,
 	.xmo_rx_ctag			= ice_xdp_rx_ctag,
 	.xmo_rx_stag			= ice_xdp_rx_stag,
+	.xmo_rx_csum_lvl		= ice_xdp_rx_csum_lvl,
 };
-- 
2.35.3



* [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (11 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 12/15] ice: Implement " Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 18:33   ` Stanislav Fomichev
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32 Larysa Zaremba
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 15/15] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Make VLAN c-tag and s-tag XDP hint testing more convenient
by not skipping VLAN-tagged packets.

Allow both 802.1ad and 802.1Q headers.
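
The header walk this implies can be sketched in plain userspace C as follows.
This is an illustration under the assumption of at most one 802.1ad tag
followed by at most one 802.1Q tag; it is not the selftest code, and the
demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN	14	/* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN	4	/* TCI + encapsulated EtherType */
#define DEMO_P_8021Q	0x8100
#define DEMO_P_8021AD	0x88A8

/* Read a big-endian 16-bit value from the wire. */
static uint16_t demo_read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Skip an optional 802.1ad tag and an optional 802.1Q tag.
 * Returns the offset of the L3 header and stores the final EtherType
 * in *proto, or returns -1 if the frame is too short.
 */
static int demo_skip_vlan_tags(const uint8_t *frame, size_t len,
			       uint16_t *proto)
{
	size_t off = DEMO_ETH_HLEN;
	uint16_t p;

	assert(frame && proto);

	if (len < DEMO_ETH_HLEN)
		return -1;
	p = demo_read_be16(frame + 12);

	if (p == DEMO_P_8021AD) {
		if (len < off + DEMO_VLAN_HLEN)
			return -1;
		/* The encapsulated EtherType follows the 2-byte TCI. */
		p = demo_read_be16(frame + off + 2);
		off += DEMO_VLAN_HLEN;
	}

	if (p == DEMO_P_8021Q) {
		if (len < off + DEMO_VLAN_HLEN)
			return -1;
		p = demo_read_be16(frame + off + 2);
		off += DEMO_VLAN_HLEN;
	}

	*proto = p;
	return (int)off;
}
```

Every read is preceded by a length check, mirroring the bounds checks an XDP
program needs before touching packet data.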

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++++-
 tools/testing/selftests/bpf/xdp_metadata.h          | 8 ++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index b2dfd7066c6e..f95f82a8b449 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -26,15 +26,22 @@ int rx(struct xdp_md *ctx)
 {
 	void *data, *data_meta, *data_end;
 	struct ipv6hdr *ip6h = NULL;
-	struct ethhdr *eth = NULL;
 	struct udphdr *udp = NULL;
 	struct iphdr *iph = NULL;
 	struct xdp_meta *meta;
+	struct ethhdr *eth;
 	int err;
 
 	data = (void *)(long)ctx->data;
 	data_end = (void *)(long)ctx->data_end;
 	eth = data;
+
+	if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021AD))
+		eth = (void *)eth + sizeof(struct vlan_hdr);
+
+	if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021Q))
+		eth = (void *)eth + sizeof(struct vlan_hdr);
+
 	if (eth + 1 < data_end) {
 		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
 			iph = (void *)(eth + 1);
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 938a729bd307..6664893c2c77 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -9,6 +9,14 @@
 #define ETH_P_IPV6 0x86DD
 #endif
 
+#ifndef ETH_P_8021Q
+#define ETH_P_8021Q 0x8100
+#endif
+
+#ifndef ETH_P_8021AD
+#define ETH_P_8021AD 0x88A8
+#endif
+
 struct xdp_meta {
 	__u64 rx_timestamp;
 	__u64 xdp_timestamp;
-- 
2.35.3



* [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (12 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-15 16:17   ` Jesper Dangaard Brouer
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 15/15] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel, Aleksander Lobakin

From: Aleksander Lobakin <aleksander.lobakin@intel.com>

When using XDP hints, metadata sometimes has to be much bigger
than 32 bytes. Relax the restriction, allow metadata larger than 32 bytes
and make __skb_metadata_differs() work with bigger lengths.

Now the size of metadata is limited only by the fact that it is stored
as a u8 in skb_shared_info, so the maximum possible value is 255. Other
important conditions, such as having enough space for xdp_frame building,
are already checked in bpf_xdp_adjust_meta().

The requirement of having its length aligned to 4 bytes is still
valid.
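
The resulting rule can be restated as a small standalone check. This is a
userspace sketch with a hypothetical demo_* name, not the kernel helper.
Combining the u8 limit with the 4-byte alignment requirement means the
largest length it accepts is 252.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The limit comes from storing the length in an 8-bit field. */
static_assert(UINT8_MAX == 255, "metadata length is kept in a u8");

/* Return true if metalen cannot be used as a metadata length. */
static bool demo_metalen_invalid(unsigned long metalen)
{
	const unsigned long meta_max = UINT8_MAX;

	/* Must be a multiple of 4 bytes and fit into a u8. */
	return (metalen % sizeof(uint32_t)) != 0 || metalen > meta_max;
}
```

Lengths such as 36 or 64, which the old "> 32" test rejected, pass this
check, while 254 and 255 still fail because they are not multiples of 4.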

Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 include/linux/skbuff.h | 13 ++++++++-----
 include/net/xdp.h      |  7 ++++++-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8ddb4af1a501..afcd372aecdf 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4219,10 +4219,13 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
 {
 	const void *a = skb_metadata_end(skb_a);
 	const void *b = skb_metadata_end(skb_b);
-	/* Using more efficient varaiant than plain call to memcmp(). */
-#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
 	u64 diffs = 0;
 
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+	    BITS_PER_LONG != 64)
+		goto slow;
+
+	/* Using more efficient variant than plain call to memcmp(). */
 	switch (meta_len) {
 #define __it(x, op) (x -= sizeof(u##op))
 #define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op))
@@ -4242,11 +4245,11 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
 		fallthrough;
 	case  4: diffs |= __it_diff(a, b, 32);
 		break;
+	default:
+slow:
+		return memcmp(a - meta_len, b - meta_len, meta_len);
 	}
 	return diffs;
-#else
-	return memcmp(a - meta_len, b - meta_len, meta_len);
-#endif
 }
 
 static inline bool skb_metadata_differs(const struct sk_buff *skb_a,
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0fbd25616241..f48723250c7c 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -370,7 +370,12 @@ xdp_data_meta_unsupported(const struct xdp_buff *xdp)
 
 static inline bool xdp_metalen_invalid(unsigned long metalen)
 {
-	return (metalen & (sizeof(__u32) - 1)) || (metalen > 32);
+	typeof(metalen) meta_max;
+
+	meta_max = type_max(typeof_member(struct skb_shared_info, meta_len));
+	BUILD_BUG_ON(!__builtin_constant_p(meta_max));
+
+	return !IS_ALIGNED(metalen, sizeof(u32)) || metalen > meta_max;
 }
 
 struct xdp_attachment_info {
-- 
2.35.3



* [PATCH RESEND bpf-next 15/15] selftests/bpf: Add flags and new hints to xdp_hw_metadata
  2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
                   ` (13 preceding siblings ...)
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32 Larysa Zaremba
@ 2023-05-12 15:26 ` Larysa Zaremba
  2023-05-12 18:37   ` Stanislav Fomichev
  14 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-12 15:26 UTC (permalink / raw)
  To: bpf
  Cc: Larysa Zaremba, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Jesper Dangaard Brouer, Alexander Lobakin,
	Magnus Karlsson, Maryam Tahhan, xdp-hints, netdev,
	intel-wired-lan, linux-kernel

Add hints added in the previous patches (VLAN tags and checksum level)
to the xdp_hw_metadata program.

Also, to make the metadata layout more straightforward, add a flags field
that reports the validity of each hint separately.
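
The pattern can be sketched in isolation as below. This is a reduced,
hypothetical layout with only two hints and demo_* names, not the selftest
structure itself. Each hint shares storage with its error code, and one bit
in the flags word says which interpretation applies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One validity bit per hint. */
enum demo_meta_field {
	DEMO_META_FIELD_CTAG	 = 1u << 0,
	DEMO_META_FIELD_CSUM_LVL = 1u << 1,
};

struct demo_meta {
	union {
		uint16_t rx_ctag;
		int32_t rx_ctag_err;
	};
	union {
		uint8_t rx_csum_lvl;
		int32_t rx_csum_err;
	};
	uint32_t hint_valid;
};

/* Clear every field, including all validity bits. */
static void demo_meta_reset(struct demo_meta *m)
{
	assert(m);
	memset(m, 0, sizeof(*m));
}

/* Store either the tag or the error that prevented reading it. */
static void demo_meta_store_ctag(struct demo_meta *m, int err, uint16_t tag)
{
	assert(m);
	if (err) {
		m->rx_ctag_err = err;
	} else {
		m->rx_ctag = tag;
		m->hint_valid |= DEMO_META_FIELD_CTAG;
	}
}

/* Store either the checksum level or the error code. */
static void demo_meta_store_csum_lvl(struct demo_meta *m, int err,
				     uint8_t lvl)
{
	assert(m);
	if (err) {
		m->rx_csum_err = err;
	} else {
		m->rx_csum_lvl = lvl;
		m->hint_valid |= DEMO_META_FIELD_CSUM_LVL;
	}
}
```

A consumer tests the relevant bit first and only then reads the value member;
otherwise it reads the error member. This avoids reserving an in-band "not
available" value such as a zero timestamp.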

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 .../selftests/bpf/progs/xdp_hw_metadata.c     | 40 ++++++++++++++++---
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 29 +++++++++++---
 tools/testing/selftests/bpf/xdp_metadata.h    | 28 ++++++++++++-
 3 files changed, 85 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index f95f82a8b449..97bad79ce4ca 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -20,6 +20,12 @@ extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
 					 __u64 *timestamp) __ksym;
 extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
 				    enum xdp_rss_hash_type *rss_type) __ksym;
+extern int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx,
+				    __u16 *vlan_tag) __ksym;
+extern int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx,
+				    __u16 *vlan_tag) __ksym;
+extern int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx,
+					__u8 *csum_level) __ksym;
 
 SEC("xdp")
 int rx(struct xdp_md *ctx)
@@ -83,15 +89,39 @@ int rx(struct xdp_md *ctx)
 		return XDP_PASS;
 	}
 
+	meta->hint_valid = 0;
+
 	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
-	if (!err)
+	if (err) {
+		meta->rx_timestamp_err = err;
+	} else {
+		meta->hint_valid |= XDP_META_FIELD_TS;
 		meta->xdp_timestamp = bpf_ktime_get_tai_ns();
-	else
-		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
+	}
 
 	err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
-	if (err < 0)
-		meta->rx_hash_err = err; /* Used by AF_XDP as no hash signal */
+	if (err)
+		meta->rx_hash_err = err;
+	else
+		meta->hint_valid |= XDP_META_FIELD_RSS;
+
+	err = bpf_xdp_metadata_rx_ctag(ctx, &meta->rx_ctag);
+	if (err)
+		meta->rx_ctag_err = err;
+	else
+		meta->hint_valid |= XDP_META_FIELD_CTAG;
+
+	err = bpf_xdp_metadata_rx_stag(ctx, &meta->rx_stag);
+	if (err)
+		meta->rx_stag_err = err;
+	else
+		meta->hint_valid |= XDP_META_FIELD_STAG;
+
+	err = bpf_xdp_metadata_rx_csum_lvl(ctx, &meta->rx_csum_lvl);
+	if (err)
+		meta->rx_csum_err = err;
+	else
+		meta->hint_valid |= XDP_META_FIELD_CSUM_LVL;
 
 	__sync_add_and_fetch(&pkts_redir, 1);
 	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 613321eb84c1..efcabe68f64b 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -156,15 +156,16 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
 
 	meta = data - sizeof(*meta);
 
-	if (meta->rx_hash_err < 0)
-		printf("No rx_hash err=%d\n", meta->rx_hash_err);
-	else
+	if (meta->hint_valid & XDP_META_FIELD_RSS)
 		printf("rx_hash: 0x%X with RSS type:0x%X\n",
 		       meta->rx_hash, meta->rx_hash_type);
+	else
+		printf("No rx_hash, err=%d\n", meta->rx_hash_err);
+
+	if (meta->hint_valid & XDP_META_FIELD_TS) {
+		printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
+		       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
 
-	printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
-	       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
-	if (meta->rx_timestamp) {
 		__u64 usr_clock = gettime(clock_id);
 		__u64 xdp_clock = meta->xdp_timestamp;
 		__s64 delta_X = xdp_clock - meta->rx_timestamp;
@@ -179,8 +180,24 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
 		       usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
 		       (double)delta_X2U / NANOSEC_PER_SEC,
 		       (double)delta_X2U / 1000);
+	} else {
+		printf("No rx_timestamp, err=%d\n", meta->rx_timestamp_err);
 	}
 
+	if (meta->hint_valid & XDP_META_FIELD_CTAG)
+		printf("rx_ctag: %u\n", meta->rx_ctag);
+	else
+		printf("No rx_ctag, err=%d\n", meta->rx_ctag_err);
+
+	if (meta->hint_valid & XDP_META_FIELD_STAG)
+		printf("rx_stag: %u\n", meta->rx_stag);
+	else
+		printf("No rx_stag, err=%d\n", meta->rx_stag_err);
+
+	if (meta->hint_valid & XDP_META_FIELD_CSUM_LVL)
+		printf("Checksum was checked at level %u\n", meta->rx_csum_lvl);
+	else
+		printf("Checksum was not checked, err=%d\n", meta->rx_csum_err);
 }
 
 static void verify_skb_metadata(int fd)
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 6664893c2c77..7c0267a8918a 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -17,12 +17,38 @@
 #define ETH_P_8021AD 0x88A8
 #endif
 
+#define BIT(nr)			(1 << (nr))
+
+enum xdp_meta_field {
+	XDP_META_FIELD_TS	= BIT(0),
+	XDP_META_FIELD_RSS	= BIT(1),
+	XDP_META_FIELD_CTAG	= BIT(2),
+	XDP_META_FIELD_STAG	= BIT(3),
+	XDP_META_FIELD_CSUM_LVL	= BIT(4),
+};
+
 struct xdp_meta {
-	__u64 rx_timestamp;
+	union {
+		__u64 rx_timestamp;
+		__s32 rx_timestamp_err;
+	};
 	__u64 xdp_timestamp;
 	__u32 rx_hash;
 	union {
 		__u32 rx_hash_type;
 		__s32 rx_hash_err;
 	};
+	union {
+		__u16 rx_ctag;
+		__s32 rx_ctag_err;
+	};
+	union {
+		__u16 rx_stag;
+		__s32 rx_stag_err;
+	};
+	union {
+		__u8 rx_csum_lvl;
+		__s32 rx_csum_err;
+	};
+	enum xdp_meta_field hint_valid;
 };
-- 
2.35.3



* Re: [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint Larysa Zaremba
@ 2023-05-12 18:19   ` Stanislav Fomichev
  2023-05-16 16:17     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:19 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> Use previously refactored code and create a function
> that allows XDP code to read HW timestamp.
> 
> HW timestamp is the first supported hint in the driver,
> so also add xdp_metadata_ops.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h          |  2 ++
>  drivers/net/ethernet/intel/ice/ice_main.c     |  1 +
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 22 +++++++++++++++++++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index aa32111afd6e..ba1bb8392db1 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -962,4 +962,6 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
>  	set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
>  	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
>  }
> +
> +extern const struct xdp_metadata_ops ice_xdp_md_ops;
>  #endif /* _ICE_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index a1f7c8edc22f..cda6c4a80737 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -3378,6 +3378,7 @@ static void ice_set_ops(struct ice_vsi *vsi)
>  
>  	netdev->netdev_ops = &ice_netdev_ops;
>  	netdev->udp_tunnel_nic_info = &pf->hw.udp_tunnel_nic;
> +	netdev->xdp_metadata_ops = &ice_xdp_md_ops;
>  	ice_set_ethtool_ops(netdev);
>  
>  	if (vsi->type != ICE_VSI_PF)
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index 2515f5f7a2b6..e9589cadf811 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -537,3 +537,25 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res,
>  			spin_unlock(&xdp_ring->tx_lock);
>  	}
>  }
> +
> +/**
> + * ice_xdp_rx_hw_ts - HW timestamp XDP hint handler
> + * @ctx: XDP buff pointer
> + * @ts_ns: destination address
> + *
> + * Copy HW timestamp (if available) to the destination address.
> + */
> +static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns)
> +{
> +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> +
> +	if (!ice_ptp_copy_rx_hwts_from_desc(xdp_ext->rx_ring,
> +					    xdp_ext->eop_desc, ts_ns))
> +		return -EOPNOTSUPP;

Per Jesper's recent update, should this be -ENODATA rather than -EOPNOTSUPP?
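
To make the distinction behind this question concrete, here is a minimal
userspace sketch (not driver code). It assumes the update referred to above
reserves -EOPNOTSUPP for "the driver does not implement this kfunc" (which is
what the default kfunc bodies in net/core/xdp.c return) and uses -ENODATA for
"implemented, but this packet carries no such hint". All demo_* names are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet descriptor state. */
struct demo_desc {
	bool ts_valid;		/* hardware attached a timestamp to this packet */
	uint64_t ts_ns;
};

/* Mirrors a default kfunc body: there is no driver implementation at all. */
static int demo_stub_rx_timestamp(const struct demo_desc *desc, uint64_t *ts_ns)
{
	(void)desc;
	(void)ts_ns;
	return -EOPNOTSUPP;
}

/* Driver-style handler: the hint is implemented but may be absent
 * for a given packet (assumed convention: report that as -ENODATA).
 */
static int demo_drv_rx_timestamp(const struct demo_desc *desc, uint64_t *ts_ns)
{
	assert(desc && ts_ns);

	if (!desc->ts_valid)
		return -ENODATA;

	*ts_ns = desc->ts_ns;
	return 0;
}
```

With this split, an XDP program could tell "this device will never give me a
timestamp" apart from "this particular packet has none", and the destination
is left untouched on failure.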

* Re: [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint Larysa Zaremba
@ 2023-05-12 18:22   ` Stanislav Fomichev
  2023-05-15 13:46     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:22 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> RX hash XDP hint requests both hash value and type.
> Type is XDP-specific, so we need a separate way to map
> these values to the hardware ptypes, so create a lookup table.
> 
> Instead of creating a new long list, reuse contents
> of ice_decode_rx_desc_ptype[] through preprocessor.
> 
> Current hash type enum does not contain ICMP packet type,
> but ice devices support it, so also add a new type into core code.
> 
> Then use previously refactored code and create a function
> that allows XDP code to read RX hash.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  .../net/ethernet/intel/ice/ice_lan_tx_rx.h    | 412 +++++++++---------
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  72 +++
>  include/net/xdp.h                             |   3 +
>  3 files changed, 283 insertions(+), 204 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> index 89f986a75cc8..d384ddfcb83e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> @@ -673,6 +673,212 @@ struct ice_tlan_ctx {
>   *      Use the enum ice_rx_l2_ptype to decode the packet type
>   * ENDIF
>   */
> +#define ICE_PTYPES								\
> +	/* L2 Packet types */							\
> +	ICE_PTT_UNUSED_ENTRY(0),						\
> +	ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),			\
> +	ICE_PTT_UNUSED_ENTRY(2),						\
> +	ICE_PTT_UNUSED_ENTRY(3),						\
> +	ICE_PTT_UNUSED_ENTRY(4),						\
> +	ICE_PTT_UNUSED_ENTRY(5),						\
> +	ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),			\
> +	ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),			\
> +	ICE_PTT_UNUSED_ENTRY(8),						\
> +	ICE_PTT_UNUSED_ENTRY(9),						\
> +	ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),		\
> +	ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),		\
> +	ICE_PTT_UNUSED_ENTRY(12),						\
> +	ICE_PTT_UNUSED_ENTRY(13),						\
> +	ICE_PTT_UNUSED_ENTRY(14),						\
> +	ICE_PTT_UNUSED_ENTRY(15),						\
> +	ICE_PTT_UNUSED_ENTRY(16),						\
> +	ICE_PTT_UNUSED_ENTRY(17),						\
> +	ICE_PTT_UNUSED_ENTRY(18),						\
> +	ICE_PTT_UNUSED_ENTRY(19),						\
> +	ICE_PTT_UNUSED_ENTRY(20),						\
> +	ICE_PTT_UNUSED_ENTRY(21),						\
> +										\
> +	/* Non Tunneled IPv4 */							\
> +	ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),		\
> +	ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),		\
> +	ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(25),						\
> +	ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP,  PAY4),		\
> +	ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),		\
> +	ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv4 --> IPv4 */							\
> +	ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),		\
> +	ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),		\
> +	ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(32),						\
> +	ICE_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),		\
> +	ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),		\
> +	ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv4 --> IPv6 */							\
> +	ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),		\
> +	ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),		\
> +	ICE_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(39),						\
> +	ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),		\
> +	ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),		\
> +	ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv4 --> GRE/NAT */							\
> +	ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),		\
> +										\
> +	/* IPv4 --> GRE/NAT --> IPv4 */						\
> +	ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),		\
> +	ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),		\
> +	ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(47),						\
> +	ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),		\
> +	ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),		\
> +	ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv4 --> GRE/NAT --> IPv6 */						\
> +	ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),		\
> +	ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),		\
> +	ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(54),						\
> +	ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),		\
> +	ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),		\
> +	ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv4 --> GRE/NAT --> MAC */						\
> +	ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),	\
> +										\
> +	/* IPv4 --> GRE/NAT --> MAC --> IPv4 */					\
> +	ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),	\
> +	ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),	\
> +	ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(62),						\
> +	ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),	\
> +	ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),	\
> +	ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),	\
> +										\
> +	/* IPv4 --> GRE/NAT -> MAC --> IPv6 */					\
> +	ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),	\
> +	ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),	\
> +	ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(69),						\
> +	ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),	\
> +	ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),	\
> +	ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),	\
> +										\
> +	/* IPv4 --> GRE/NAT --> MAC/VLAN */					\
> +	ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),	\
> +										\
> +	/* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */				\
> +	ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),	\
> +	ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),	\
> +	ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(77),						\
> +	ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),	\
> +	ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),	\
> +	ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),	\
> +										\
> +	/* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */				\
> +	ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),	\
> +	ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),	\
> +	ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(84),						\
> +	ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),	\
> +	ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),	\
> +	ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),	\
> +										\
> +	/* Non Tunneled IPv6 */							\
> +	ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),		\
> +	ICE_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),		\
> +	ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(91),						\
> +	ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP,  PAY4),		\
> +	ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),		\
> +	ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv6 --> IPv4 */							\
> +	ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),		\
> +	ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),		\
> +	ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(98),						\
> +	ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),		\
> +	ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),		\
> +	ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv6 --> IPv6 */							\
> +	ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),		\
> +	ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),		\
> +	ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(105),						\
> +	ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),		\
> +	ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),		\
> +	ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv6 --> GRE/NAT */							\
> +	ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),		\
> +										\
> +	/* IPv6 --> GRE/NAT -> IPv4 */						\
> +	ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),		\
> +	ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),		\
> +	ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(113),						\
> +	ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),		\
> +	ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),		\
> +	ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv6 --> GRE/NAT -> IPv6 */						\
> +	ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),		\
> +	ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),		\
> +	ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),		\
> +	ICE_PTT_UNUSED_ENTRY(120),						\
> +	ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),		\
> +	ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),		\
> +	ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),		\
> +										\
> +	/* IPv6 --> GRE/NAT -> MAC */						\
> +	ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),	\
> +										\
> +	/* IPv6 --> GRE/NAT -> MAC -> IPv4 */					\
> +	ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),	\
> +	ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),	\
> +	ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(128),						\
> +	ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),	\
> +	ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),	\
> +	ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),	\
> +										\
> +	/* IPv6 --> GRE/NAT -> MAC -> IPv6 */					\
> +	ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),	\
> +	ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),	\
> +	ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(135),						\
> +	ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),	\
> +	ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),	\
> +	ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),	\
> +										\
> +	/* IPv6 --> GRE/NAT -> MAC/VLAN */					\
> +	ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),	\
> +										\
> +	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */				\
> +	ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),	\
> +	ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),	\
> +	ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(143),						\
> +	ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),	\
> +	ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),	\
> +	ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),	\
> +										\
> +	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */				\
> +	ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),	\
> +	ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),	\
> +	ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),	\
> +	ICE_PTT_UNUSED_ENTRY(150),						\
> +	ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),	\
> +	ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),	\
> +	ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
> +
> +#define ICE_NUM_DEFINED_PTYPES	154
>  
>  /* macro to make the table lines short, use explicit indexing with [PTYPE] */
>  #define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
> @@ -695,212 +901,10 @@ struct ice_tlan_ctx {
>  
>  /* Lookup table mapping in the 10-bit HW PTYPE to the bit field for decoding */
>  static const struct ice_rx_ptype_decoded ice_ptype_lkup[BIT(10)] = {
> -	/* L2 Packet types */
> -	ICE_PTT_UNUSED_ENTRY(0),
> -	ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
> -	ICE_PTT_UNUSED_ENTRY(2),
> -	ICE_PTT_UNUSED_ENTRY(3),
> -	ICE_PTT_UNUSED_ENTRY(4),
> -	ICE_PTT_UNUSED_ENTRY(5),
> -	ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
> -	ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
> -	ICE_PTT_UNUSED_ENTRY(8),
> -	ICE_PTT_UNUSED_ENTRY(9),
> -	ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
> -	ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
> -	ICE_PTT_UNUSED_ENTRY(12),
> -	ICE_PTT_UNUSED_ENTRY(13),
> -	ICE_PTT_UNUSED_ENTRY(14),
> -	ICE_PTT_UNUSED_ENTRY(15),
> -	ICE_PTT_UNUSED_ENTRY(16),
> -	ICE_PTT_UNUSED_ENTRY(17),
> -	ICE_PTT_UNUSED_ENTRY(18),
> -	ICE_PTT_UNUSED_ENTRY(19),
> -	ICE_PTT_UNUSED_ENTRY(20),
> -	ICE_PTT_UNUSED_ENTRY(21),
> -
> -	/* Non Tunneled IPv4 */
> -	ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),
> -	ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),
> -	ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(25),
> -	ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP,  PAY4),
> -	ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),
> -	ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> IPv4 */
> -	ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(32),
> -	ICE_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> IPv6 */
> -	ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(39),
> -	ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> GRE/NAT */
> -	ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
> -
> -	/* IPv4 --> GRE/NAT --> IPv4 */
> -	ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(47),
> -	ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> GRE/NAT --> IPv6 */
> -	ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(54),
> -	ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> GRE/NAT --> MAC */
> -	ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
> -
> -	/* IPv4 --> GRE/NAT --> MAC --> IPv4 */
> -	ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(62),
> -	ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> GRE/NAT -> MAC --> IPv6 */
> -	ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(69),
> -	ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
> -
> -	/* IPv4 --> GRE/NAT --> MAC/VLAN */
> -	ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
> -
> -	/* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */
> -	ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(77),
> -	ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */
> -	ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(84),
> -	ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
> -
> -	/* Non Tunneled IPv6 */
> -	ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),
> -	ICE_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),
> -	ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(91),
> -	ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP,  PAY4),
> -	ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),
> -	ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> IPv4 */
> -	ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(98),
> -	ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> IPv6 */
> -	ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(105),
> -	ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> GRE/NAT */
> -	ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
> -
> -	/* IPv6 --> GRE/NAT -> IPv4 */
> -	ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(113),
> -	ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> GRE/NAT -> IPv6 */
> -	ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(120),
> -	ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> GRE/NAT -> MAC */
> -	ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
> -
> -	/* IPv6 --> GRE/NAT -> MAC -> IPv4 */
> -	ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(128),
> -	ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> GRE/NAT -> MAC -> IPv6 */
> -	ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(135),
> -	ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> GRE/NAT -> MAC/VLAN */
> -	ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
> -
> -	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */
> -	ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
> -	ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
> -	ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(143),
> -	ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP,  PAY4),
> -	ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
> -	ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
> -
> -	/* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */
> -	ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
> -	ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
> -	ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP,  PAY4),
> -	ICE_PTT_UNUSED_ENTRY(150),
> -	ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP,  PAY4),
> -	ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
> -	ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
> +	ICE_PTYPES
>  
>  	/* unused entries */
> -	[154 ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
> +	[ICE_NUM_DEFINED_PTYPES ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
>  };
>  
>  static inline struct ice_rx_ptype_decoded ice_decode_rx_desc_ptype(u16 ptype)
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index e9589cadf811..1caa73644e7b 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -556,6 +556,78 @@ static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns)
>  	return 0;
>  }
>  
> +/* Define a ptype index -> XDP hash type lookup table.
> + * It uses the same ptype definitions as ice_decode_rx_desc_ptype[],
> + * avoiding possible copy-paste errors.
> + */
> +#undef ICE_PTT
> +#undef ICE_PTT_UNUSED_ENTRY
> +
> +#define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
> +	[PTYPE] = XDP_RSS_L3_##OUTER_IP_VER | XDP_RSS_L4_##I | XDP_RSS_TYPE_##PL
> +
> +#define ICE_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = 0
> +
> +/* A few supplementary definitions for when XDP hash types do not coincide
> + * with what can be generated from ptype definitions
> + * by means of preprocessor concatenation.
> + */
> +#define XDP_RSS_L3_NONE		XDP_RSS_TYPE_NONE
> +#define XDP_RSS_L4_NONE		XDP_RSS_TYPE_NONE
> +#define XDP_RSS_TYPE_PAY2	XDP_RSS_TYPE_L2
> +#define XDP_RSS_TYPE_PAY3	XDP_RSS_TYPE_NONE
> +#define XDP_RSS_TYPE_PAY4	XDP_RSS_L4
> +
> +static const enum xdp_rss_hash_type
> +ice_ptype_to_xdp_hash[ICE_NUM_DEFINED_PTYPES] = {
> +	ICE_PTYPES
> +};
> +
> +#undef XDP_RSS_L3_NONE
> +#undef XDP_RSS_L4_NONE
> +#undef XDP_RSS_TYPE_PAY2
> +#undef XDP_RSS_TYPE_PAY3
> +#undef XDP_RSS_TYPE_PAY4
> +
> +#undef ICE_PTT
> +#undef ICE_PTT_UNUSED_ENTRY
> +
> +/**
> + * ice_xdp_rx_hash_type - Get XDP-specific hash type from the RX descriptor
> + * @eop_desc: End of Packet descriptor
> + */
> +static enum xdp_rss_hash_type
> +ice_xdp_rx_hash_type(union ice_32b_rx_flex_desc *eop_desc)
> +{
> +	u16 ptype = ice_get_ptype(eop_desc);
> +
> +	if (unlikely(ptype >= ICE_NUM_DEFINED_PTYPES))
> +		return 0;
> +
> +	return ice_ptype_to_xdp_hash[ptype];
> +}
> +
> +/**
> + * ice_xdp_rx_hash - RX hash XDP hint handler
> + * @ctx: XDP buff pointer
> + * @hash: hash destination address
> + * @rss_type: XDP hash type destination address
> + *
> + * Copy RX hash (if available) and its type to the destination address.
> + */
> +static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
> +			   enum xdp_rss_hash_type *rss_type)
> +{
> +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> +
> +	*rss_type = ice_xdp_rx_hash_type(xdp_ext->eop_desc);
> +	if (!ice_copy_rx_hash_from_desc(xdp_ext->eop_desc, hash))
> +		return -EOPNOTSUPP;

Same question here: should this be -ENODATA rather than -EOPNOTSUPP? See the
following for context:
https://lore.kernel.org/bpf/167940675120.2718408.8176058626864184420.stgit@firesoul/
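
For readers following the table-reuse trick in the quoted patch (one ICE_PTYPES
list expanded twice with different ICE_PTT definitions), the idea can be
sketched in plain userspace C. This is only an illustration of the
preprocessor technique under simplified assumptions: the DEMO_* names, the
four-entry list and the flag values are all hypothetical and much smaller than
the real ptype table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash-type flags, loosely modelled on enum xdp_rss_hash_type. */
enum demo_hash {
	DEMO_HASH_NONE = 0,
	DEMO_L3_IPV4   = 1 << 0,
	DEMO_L3_IPV6   = 1 << 1,
	DEMO_L4_UDP    = 1 << 2,
	DEMO_L4_TCP    = 1 << 3,
};

/* Single source of truth: one line per packet type. */
#define DEMO_PTYPES			\
	DEMO_PT_UNUSED(0),		\
	DEMO_PT(1, IPV4, UDP),		\
	DEMO_PT(2, IPV4, TCP),		\
	DEMO_PT(3, IPV6, NONE),

#define DEMO_NUM_PTYPES	4

/* First expansion: a human-readable name per packet type. */
#define DEMO_PT(idx, l3, l4)	[idx] = #l3 "/" #l4
#define DEMO_PT_UNUSED(idx)	[idx] = "unused"

static const char * const demo_ptype_names[DEMO_NUM_PTYPES] = {
	DEMO_PTYPES
};

#undef DEMO_PT
#undef DEMO_PT_UNUSED

/* Second expansion: hash-type flags built by token pasting. */
#define DEMO_PT(idx, l3, l4)	[idx] = DEMO_L3_##l3 | DEMO_L4_##l4
#define DEMO_PT_UNUSED(idx)	[idx] = DEMO_HASH_NONE

/* Temporary alias for a pasted name that has no enumerator of its own. */
#define DEMO_L4_NONE		DEMO_HASH_NONE

static const unsigned int demo_ptype_to_hash[DEMO_NUM_PTYPES] = {
	DEMO_PTYPES
};

#undef DEMO_L4_NONE
#undef DEMO_PT
#undef DEMO_PT_UNUSED

/* Name lookup; callers must pass a defined ptype. */
static const char *demo_ptype_name(unsigned int ptype)
{
	assert(ptype < DEMO_NUM_PTYPES);
	return demo_ptype_names[ptype];
}

/* Hash-type lookup; out-of-range ptypes report "no hash type". */
static unsigned int demo_hash_type(unsigned int ptype)
{
	if (ptype >= DEMO_NUM_PTYPES)
		return DEMO_HASH_NONE;

	return demo_ptype_to_hash[ptype];
}
```

Because both tables are generated from the same list, adding or renumbering a
packet type cannot leave them out of sync. The cost is that every pasted name
must resolve to something, hence the short-lived DEMO_L4_NONE alias, which
plays the same role as the XDP_RSS_L3_NONE and XDP_RSS_TYPE_PAY* helpers in the
patch.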

* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint Larysa Zaremba
@ 2023-05-12 18:28   ` Stanislav Fomichev
  2023-05-15 15:36   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:28 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> Implement functionality that enables drivers to expose VLAN tag
> to XDP code.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>

Acked-by: Stanislav Fomichev <sdf@google.com>

> ---
>  Documentation/networking/xdp-rx-metadata.rst | 11 ++++++++-
>  include/linux/netdevice.h                    |  2 ++
>  include/net/xdp.h                            |  4 ++++
>  kernel/bpf/offload.c                         |  4 ++++
>  net/core/xdp.c                               | 24 ++++++++++++++++++++
>  5 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> index 25ce72af81c2..73a78029c596 100644
> --- a/Documentation/networking/xdp-rx-metadata.rst
> +++ b/Documentation/networking/xdp-rx-metadata.rst
> @@ -18,7 +18,16 @@ Currently, the following kfuncs are supported. In the future, as more
>  metadata is supported, this set will grow:
>  
>  .. kernel-doc:: net/core/xdp.c
> -   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
> +   :identifiers: bpf_xdp_metadata_rx_timestamp
> +
> +.. kernel-doc:: net/core/xdp.c
> +   :identifiers: bpf_xdp_metadata_rx_hash
> +
> +.. kernel-doc:: net/core/xdp.c
> +   :identifiers: bpf_xdp_metadata_rx_ctag
> +
> +.. kernel-doc:: net/core/xdp.c
> +   :identifiers: bpf_xdp_metadata_rx_stag
>  
>  An XDP program can use these kfuncs to read the metadata into stack
>  variables for its own consumption. Or, to pass the metadata on to other
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 08fbd4622ccf..fdae37fe11f5 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1655,6 +1655,8 @@ struct xdp_metadata_ops {
>  	int	(*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
>  	int	(*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
>  			       enum xdp_rss_hash_type *rss_type);
> +	int	(*xmo_rx_ctag)(const struct xdp_md *ctx, u16 *vlan_tag);
> +	int	(*xmo_rx_stag)(const struct xdp_md *ctx, u16 *vlan_tag);
>  };
>  
>  /**
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 6381560efae2..2db7439fc60f 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -389,6 +389,10 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>  			   bpf_xdp_metadata_rx_timestamp) \
>  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \
>  			   bpf_xdp_metadata_rx_hash) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CTAG, \
> +			   bpf_xdp_metadata_rx_ctag) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_STAG, \
> +			   bpf_xdp_metadata_rx_stag) \
>  
>  enum {
>  #define XDP_METADATA_KFUNC(name, _) name,
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index d9c9f45e3529..2c6b6e82cfac 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -848,6 +848,10 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
>  		p = ops->xmo_rx_timestamp;
>  	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH))
>  		p = ops->xmo_rx_hash;
> +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CTAG))
> +		p = ops->xmo_rx_ctag;
> +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_STAG))
> +		p = ops->xmo_rx_stag;
>  out:
>  	up_read(&bpf_devs_lock);
>  
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 41e5ca8643ec..eff21501609f 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
>  	return -EOPNOTSUPP;
>  }
>  
> +/**
> + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
> + * @ctx: XDP context pointer.
> + * @vlan_tag: Return value pointer.
> + *
> + * Returns 0 on success or ``-errno`` on error.
> + */
> +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +/**
> + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
> + * @ctx: XDP context pointer.
> + * @vlan_tag: Return value pointer.
> + *
> + * Returns 0 on success or ``-errno`` on error.
> + */
> +__bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  __diag_pop();
>  
>  BTF_SET8_START(xdp_metadata_kfunc_ids)
> -- 
> 2.35.3
> 

* Re: [PATCH RESEND bpf-next 10/15] ice: Implement VLAN tag hint
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 10/15] ice: Implement " Larysa Zaremba
@ 2023-05-12 18:31   ` Stanislav Fomichev
  2023-05-15 13:41     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:31 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> Implement .xmo_rx_vlan_tag callback to allow XDP code to read
> packet's VLAN tag.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 44 +++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index 1caa73644e7b..39547feb6106 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -627,7 +627,51 @@ static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
>  	return 0;
>  }
>  
> +/**
> + * ice_xdp_rx_ctag - VLAN tag XDP hint handler
> + * @ctx: XDP buff pointer
> + * @vlan_tag: destination address
> + *
> + * Copy VLAN tag (if was stripped) to the destination address.
> + */
> +static int ice_xdp_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> +{
> +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> +	netdev_features_t features;
> +

[..]

> +	features = xdp_ext->rx_ring->netdev->features;
> +
> +	if (!(features & NETIF_F_HW_VLAN_CTAG_RX))
> +		return -EINVAL;

Passing-by comment: why do we need to check features?
ice_get_vlan_tag_from_rx_desc seems to be checking a bunch of
fields in the descriptors, so that should be enough?

> +
> +	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);

Should we also do the following:

if (!*vlan_tag)
	return -ENODATA;

?

> +	return 0;
> +}
> +
> +/**
> + * ice_xdp_rx_stag - VLAN s-tag XDP hint handler
> + * @ctx: XDP buff pointer
> + * @vlan_tag: destination address
> + *
> + * Copy VLAN s-tag (if was stripped) to the destination address.
> + */
> +static int ice_xdp_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
> +{
> +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> +	netdev_features_t features;
> +
> +	features = xdp_ext->rx_ring->netdev->features;
> +
> +	if (!(features & NETIF_F_HW_VLAN_STAG_RX))
> +		return -EINVAL;
> +
> +	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
> +	return 0;
> +}
> +
>  const struct xdp_metadata_ops ice_xdp_md_ops = {
>  	.xmo_rx_timestamp		= ice_xdp_rx_hw_ts,
>  	.xmo_rx_hash			= ice_xdp_rx_hash,
> +	.xmo_rx_ctag			= ice_xdp_rx_ctag,
> +	.xmo_rx_stag			= ice_xdp_rx_stag,
>  };
> -- 
> 2.35.3
> 

* Re: [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba
@ 2023-05-12 18:33   ` Stanislav Fomichev
  2023-05-15 14:05     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:33 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> Make VLAN c-tag and s-tag XDP hint testing more convenient
> by not skipping VLAN-ed packets.
> 
> Allow both 802.1ad and 802.1Q headers.

Can we also extend non-hw test? That should require adding metadata
handlers to veth to extract relevant parts from skb + update ip link
commands to add vlan id. Should be relatively easy to do?

> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  tools/testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++++-
>  tools/testing/selftests/bpf/xdp_metadata.h          | 8 ++++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> index b2dfd7066c6e..f95f82a8b449 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> @@ -26,15 +26,22 @@ int rx(struct xdp_md *ctx)
>  {
>  	void *data, *data_meta, *data_end;
>  	struct ipv6hdr *ip6h = NULL;
> -	struct ethhdr *eth = NULL;
>  	struct udphdr *udp = NULL;
>  	struct iphdr *iph = NULL;
>  	struct xdp_meta *meta;
> +	struct ethhdr *eth;
>  	int err;
>  
>  	data = (void *)(long)ctx->data;
>  	data_end = (void *)(long)ctx->data_end;
>  	eth = data;
> +
> +	if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021AD))
> +		eth = (void *)eth + sizeof(struct vlan_hdr);
> +
> +	if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021Q))
> +		eth = (void *)eth + sizeof(struct vlan_hdr);
> +
>  	if (eth + 1 < data_end) {
>  		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
>  			iph = (void *)(eth + 1);
> diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
> index 938a729bd307..6664893c2c77 100644
> --- a/tools/testing/selftests/bpf/xdp_metadata.h
> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> @@ -9,6 +9,14 @@
>  #define ETH_P_IPV6 0x86DD
>  #endif
>  
> +#ifndef ETH_P_8021Q
> +#define ETH_P_8021Q 0x8100
> +#endif
> +
> +#ifndef ETH_P_8021AD
> +#define ETH_P_8021AD 0x88A8
> +#endif
> +
>  struct xdp_meta {
>  	__u64 rx_timestamp;
>  	__u64 xdp_timestamp;
> -- 
> 2.35.3
> 

* Re: [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint Larysa Zaremba
@ 2023-05-12 18:34   ` Stanislav Fomichev
  2023-05-15 13:49     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:34 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> Implement functionality that enables drivers to expose to XDP code,
> whether checksums was checked and on what level.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  Documentation/networking/xdp-rx-metadata.rst |  3 +++
>  include/linux/netdevice.h                    |  1 +
>  include/net/xdp.h                            |  2 ++
>  kernel/bpf/offload.c                         |  2 ++
>  net/core/xdp.c                               | 12 ++++++++++++
>  5 files changed, 20 insertions(+)
> 
> diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> index 73a78029c596..f74f0e283097 100644
> --- a/Documentation/networking/xdp-rx-metadata.rst
> +++ b/Documentation/networking/xdp-rx-metadata.rst
> @@ -29,6 +29,9 @@ metadata is supported, this set will grow:
>  .. kernel-doc:: net/core/xdp.c
>     :identifiers: bpf_xdp_metadata_rx_stag
>  
> +.. kernel-doc:: net/core/xdp.c
> +   :identifiers: bpf_xdp_metadata_rx_csum_lvl
> +
>  An XDP program can use these kfuncs to read the metadata into stack
>  variables for its own consumption. Or, to pass the metadata on to other
>  consumers, an XDP program can store it into the metadata area carried
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index fdae37fe11f5..ddade3a15366 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1657,6 +1657,7 @@ struct xdp_metadata_ops {
>  			       enum xdp_rss_hash_type *rss_type);
>  	int	(*xmo_rx_ctag)(const struct xdp_md *ctx, u16 *vlan_tag);
>  	int	(*xmo_rx_stag)(const struct xdp_md *ctx, u16 *vlan_tag);
> +	int	(*xmo_rx_csum_lvl)(const struct xdp_md *ctx, u8 *csum_level);
>  };
>  
>  /**
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 2db7439fc60f..0fbd25616241 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -393,6 +393,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
>  			   bpf_xdp_metadata_rx_ctag) \
>  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_STAG, \
>  			   bpf_xdp_metadata_rx_stag) \
> +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM_LVL, \
> +			   bpf_xdp_metadata_rx_csum_lvl) \
>  
>  enum {
>  #define XDP_METADATA_KFUNC(name, _) name,
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index 2c6b6e82cfac..8bd54fb4ac63 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -852,6 +852,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
>  		p = ops->xmo_rx_ctag;
>  	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_STAG))
>  		p = ops->xmo_rx_stag;
> +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM_LVL))
> +		p = ops->xmo_rx_csum_lvl;
>  out:
>  	up_read(&bpf_devs_lock);
>  
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index eff21501609f..7dd45fd62983 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -762,6 +762,18 @@ __bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag
>  	return -EOPNOTSUPP;
>  }
>  
> +/**
> + * bpf_xdp_metadata_rx_csum_lvl - Get depth at which HW has checked the checksum.
> + * @ctx: XDP context pointer.
> + * @csum_level: Return value pointer.

Let's maybe clarify what the level means here? For example, do we start
counting from 0 or 1?

> + *
> + * Returns 0 on success (HW has checked the checksum) or ``-errno`` on error.
> + */
> +__bpf_kfunc int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx, u8 *csum_level)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  __diag_pop();
>  
>  BTF_SET8_START(xdp_metadata_kfunc_ids)
> -- 
> 2.35.3
> 

* Re: [PATCH RESEND bpf-next 15/15] selftests/bpf: Add flags and new hints to xdp_hw_metadata
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 15/15] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba
@ 2023-05-12 18:37   ` Stanislav Fomichev
  0 siblings, 0 replies; 54+ messages in thread
From: Stanislav Fomichev @ 2023-05-12 18:37 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On 05/12, Larysa Zaremba wrote:
> Add hints added in the previous patches (VLAN tags and checksum level)
> to the xdp_hw_metadata program.
> 
> Also, to make metadata layout more straightforward, add flags field
> to pass information about validity of every separate hint separately.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>

Acked-by: Stanislav Fomichev <sdf@google.com>

> ---
>  .../selftests/bpf/progs/xdp_hw_metadata.c     | 40 ++++++++++++++++---
>  tools/testing/selftests/bpf/xdp_hw_metadata.c | 29 +++++++++++---
>  tools/testing/selftests/bpf/xdp_metadata.h    | 28 ++++++++++++-
>  3 files changed, 85 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> index f95f82a8b449..97bad79ce4ca 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> @@ -20,6 +20,12 @@ extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
>  					 __u64 *timestamp) __ksym;
>  extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
>  				    enum xdp_rss_hash_type *rss_type) __ksym;
> +extern int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx,
> +				    __u16 *vlan_tag) __ksym;
> +extern int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx,
> +				    __u16 *vlan_tag) __ksym;
> +extern int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx,
> +					__u8 *csum_level) __ksym;
>  
>  SEC("xdp")
>  int rx(struct xdp_md *ctx)
> @@ -83,15 +89,39 @@ int rx(struct xdp_md *ctx)
>  		return XDP_PASS;
>  	}
>  
> +	meta->hint_valid = 0;
> +
>  	err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
> -	if (!err)
> +	if (err) {
> +		meta->rx_timestamp_err = err;
> +	} else {
> +		meta->hint_valid |= XDP_META_FIELD_TS;
>  		meta->xdp_timestamp = bpf_ktime_get_tai_ns();
> -	else
> -		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
> +	}
>  
>  	err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
> -	if (err < 0)
> -		meta->rx_hash_err = err; /* Used by AF_XDP as no hash signal */
> +	if (err)
> +		meta->rx_hash_err = err;
> +	else
> +		meta->hint_valid |= XDP_META_FIELD_RSS;
> +
> +	err = bpf_xdp_metadata_rx_ctag(ctx, &meta->rx_ctag);
> +	if (err)
> +		meta->rx_ctag_err = err;
> +	else
> +		meta->hint_valid |= XDP_META_FIELD_CTAG;
> +
> +	err = bpf_xdp_metadata_rx_stag(ctx, &meta->rx_stag);
> +	if (err)
> +		meta->rx_stag_err = err;
> +	else
> +		meta->hint_valid |= XDP_META_FIELD_STAG;
> +
> +	err = bpf_xdp_metadata_rx_csum_lvl(ctx, &meta->rx_csum_lvl);
> +	if (err)
> +		meta->rx_csum_err = err;
> +	else
> +		meta->hint_valid |= XDP_META_FIELD_CSUM_LVL;
>  
>  	__sync_add_and_fetch(&pkts_redir, 1);
>  	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> index 613321eb84c1..efcabe68f64b 100644
> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> @@ -156,15 +156,16 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
>  
>  	meta = data - sizeof(*meta);
>  
> -	if (meta->rx_hash_err < 0)
> -		printf("No rx_hash err=%d\n", meta->rx_hash_err);
> -	else
> +	if (meta->hint_valid & XDP_META_FIELD_RSS)
>  		printf("rx_hash: 0x%X with RSS type:0x%X\n",
>  		       meta->rx_hash, meta->rx_hash_type);
> +	else
> +		printf("No rx_hash, err=%d\n", meta->rx_hash_err);
> +
> +	if (meta->hint_valid & XDP_META_FIELD_TS) {
> +		printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
> +		       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
>  
> -	printf("rx_timestamp:  %llu (sec:%0.4f)\n", meta->rx_timestamp,
> -	       (double)meta->rx_timestamp / NANOSEC_PER_SEC);
> -	if (meta->rx_timestamp) {
>  		__u64 usr_clock = gettime(clock_id);
>  		__u64 xdp_clock = meta->xdp_timestamp;
>  		__s64 delta_X = xdp_clock - meta->rx_timestamp;
> @@ -179,8 +180,24 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
>  		       usr_clock, (double)usr_clock / NANOSEC_PER_SEC,
>  		       (double)delta_X2U / NANOSEC_PER_SEC,
>  		       (double)delta_X2U / 1000);
> +	} else {
> +		printf("No rx_timestamp, err=%d\n", meta->rx_timestamp_err);
>  	}
>  
> +	if (meta->hint_valid & XDP_META_FIELD_CTAG)
> +		printf("rx_ctag: %u\n", meta->rx_ctag);
> +	else
> +		printf("No rx_ctag, err=%d\n", meta->rx_ctag_err);
> +
> +	if (meta->hint_valid & XDP_META_FIELD_STAG)
> +		printf("rx_stag: %u\n", meta->rx_stag);
> +	else
> +		printf("No rx_stag, err=%d\n", meta->rx_stag_err);
> +
> +	if (meta->hint_valid & XDP_META_FIELD_CSUM_LVL)
> +		printf("Checksum was checked at level %u\n", meta->rx_csum_lvl);
> +	else
> +		printf("Checksum was not checked, err=%d\n", meta->rx_csum_err);
>  }
>  
>  static void verify_skb_metadata(int fd)
> diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
> index 6664893c2c77..7c0267a8918a 100644
> --- a/tools/testing/selftests/bpf/xdp_metadata.h
> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> @@ -17,12 +17,38 @@
>  #define ETH_P_8021AD 0x88A8
>  #endif
>  
> +#define BIT(nr)			(1 << (nr))
> +
> +enum xdp_meta_field {
> +	XDP_META_FIELD_TS	= BIT(0),
> +	XDP_META_FIELD_RSS	= BIT(1),
> +	XDP_META_FIELD_CTAG	= BIT(2),
> +	XDP_META_FIELD_STAG	= BIT(3),
> +	XDP_META_FIELD_CSUM_LVL	= BIT(4),
> +};
> +
>  struct xdp_meta {
> -	__u64 rx_timestamp;
> +	union {
> +		__u64 rx_timestamp;
> +		__s32 rx_timestamp_err;
> +	};
>  	__u64 xdp_timestamp;
>  	__u32 rx_hash;
>  	union {
>  		__u32 rx_hash_type;
>  		__s32 rx_hash_err;
>  	};
> +	union {
> +		__u16 rx_ctag;
> +		__s32 rx_ctag_err;
> +	};
> +	union {
> +		__u16 rx_stag;
> +		__s32 rx_stag_err;
> +	};
> +	union {
> +		__u8 rx_csum_lvl;
> +		__s32 rx_csum_err;
> +	};
> +	enum xdp_meta_field hint_valid;
>  };
> -- 
> 2.35.3
> 

* Re: [PATCH RESEND bpf-next 10/15] ice: Implement VLAN tag hint
  2023-05-12 18:31   ` Stanislav Fomichev
@ 2023-05-15 13:41     ` Larysa Zaremba
  2023-05-15 15:07       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 13:41 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On Fri, May 12, 2023 at 11:31:21AM -0700, Stanislav Fomichev wrote:
> On 05/12, Larysa Zaremba wrote:
> > Implement .xmo_rx_vlan_tag callback to allow XDP code to read
> > packet's VLAN tag.
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 44 +++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > index 1caa73644e7b..39547feb6106 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > @@ -627,7 +627,51 @@ static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
> >  	return 0;
> >  }
> >  
> > +/**
> > + * ice_xdp_rx_ctag - VLAN tag XDP hint handler
> > + * @ctx: XDP buff pointer
> > + * @vlan_tag: destination address
> > + *
> > + * Copy VLAN tag (if was stripped) to the destination address.
> > + */
> > +static int ice_xdp_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> > +{
> > +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> > +	netdev_features_t features;
> > +
> 
> [..]
> 
> > +	features = xdp_ext->rx_ring->netdev->features;
> > +
> > +	if (!(features & NETIF_F_HW_VLAN_CTAG_RX))
> > +		return -EINVAL;
> 
> Passing-by comment: why do we need to check features?
> ice_get_vlan_tag_from_rx_desc seems to be checking a bunch of
> fields in the descriptors, so that should be enough?

Unfortunately, it is not enough, because it only checks whether there is a valid
value in the descriptor, without distinguishing c-tag from s-tag. In this
hardware, c-tag and s-tag are mutually exclusive, so they can occupy the same
descriptor fields. Checking netdev features is just the easiest way to tell them
apart.

I guess storing this information in the ring structure would be more
efficient than checking netdev features. I know Piotr Raczynski intends to
review this series, so maybe he will provide some additional
feedback/suggestions.
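
To illustrate the point about the shared descriptor field, here is a minimal
userspace sketch, not driver code. All names in it are hypothetical:
mock_rx_ring stands in for the RX ring, MOCK_F_CTAG_RX and MOCK_F_STAG_RX
stand in for NETIF_F_HW_VLAN_CTAG_RX and NETIF_F_HW_VLAN_STAG_RX, and the
sketch assumes the stripping mode is cached in the ring as suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two netdev VLAN RX feature bits. */
#define MOCK_F_CTAG_RX (1u << 0)
#define MOCK_F_STAG_RX (1u << 1)

/* Hypothetical ring: one descriptor field holds whichever tag was stripped. */
struct mock_rx_ring {
	uint32_t features;	/* stripping mode, cached from netdev features */
	uint16_t desc_tag;	/* shared descriptor field: c-tag or s-tag */
};

/* Hand out the stripped tag only if it is of the requested kind. */
static int mock_rx_vlan_tag(const struct mock_rx_ring *ring,
			    uint32_t wanted, uint16_t *vlan_tag)
{
	assert(ring != NULL && vlan_tag != NULL);

	/* The descriptor alone cannot say which kind of tag it carries. */
	if (!(ring->features & wanted))
		return -EINVAL;

	*vlan_tag = ring->desc_tag;
	return 0;
}
```

With only c-tag stripping enabled, a request for the s-tag fails with -EINVAL
even though the descriptor field holds a valid value.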

> 
> > +
> > +	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
> 
> Should we also do the following:
> 
> if (!*vlan_tag)
> 	return -ENODATA;
> 
> ?

Oh, returning a VLAN tag with zero value really made sense to me at the
beginning, but after playing with different kinds of packets, I think returning
an error makes more sense. Will change.
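
A rough userspace sketch of that change, under the assumption taken from the
suggestion above that a zero value read from the descriptor means "no tag was
stripped". mock_report_vlan_tag() is a hypothetical name, not the function
planned for v2:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: desc_tag is the value read from the descriptor,
 * assumed to be 0 when nothing was stripped from the packet.
 */
static int mock_report_vlan_tag(uint16_t desc_tag, uint16_t *vlan_tag)
{
	assert(vlan_tag != NULL);

	/* Report the absence of a tag as an error, not as tag value 0. */
	if (!desc_tag)
		return -ENODATA;

	*vlan_tag = desc_tag;
	return 0;
}
```

This lets the XDP program distinguish "hint not available in this mode"
(-EINVAL in the patch) from "this packet carried no tag" (-ENODATA).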

> 
> > +	return 0;
> > +}
> > +
> > +/**
> > + * ice_xdp_rx_stag - VLAN s-tag XDP hint handler
> > + * @ctx: XDP buff pointer
> > + * @vlan_tag: destination address
> > + *
> > + * Copy VLAN s-tag (if was stripped) to the destination address.
> > + */
> > +static int ice_xdp_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
> > +{
> > +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> > +	netdev_features_t features;
> > +
> > +	features = xdp_ext->rx_ring->netdev->features;
> > +
> > +	if (!(features & NETIF_F_HW_VLAN_STAG_RX))
> > +		return -EINVAL;
> > +
> > +	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
> > +	return 0;
> > +}
> > +
> >  const struct xdp_metadata_ops ice_xdp_md_ops = {
> >  	.xmo_rx_timestamp		= ice_xdp_rx_hw_ts,
> >  	.xmo_rx_hash			= ice_xdp_rx_hash,
> > +	.xmo_rx_ctag			= ice_xdp_rx_ctag,
> > +	.xmo_rx_stag			= ice_xdp_rx_stag,
> >  };
> > -- 
> > 2.35.3
> > 

* Re: [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint
  2023-05-12 18:22   ` Stanislav Fomichev
@ 2023-05-15 13:46     ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 13:46 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On Fri, May 12, 2023 at 11:22:05AM -0700, Stanislav Fomichev wrote:
> On 05/12, Larysa Zaremba wrote:
> > RX hash XDP hint requests both hash value and type.
> > Type is XDP-specific, so we need a separate way to map
> > these values to the hardware ptypes, so create a lookup table.
> > 
> > Instead of creating a new long list, reuse contents
> > of ice_decode_rx_desc_ptype[] through preprocessor.
> > 
> > Current hash type enum does not contain ICMP packet type,
> > but ice devices support it, so also add a new type into core code.
> > 
> > Then use previously refactored code and create a function
> > that allows XDP code to read RX hash.
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  .../net/ethernet/intel/ice/ice_lan_tx_rx.h    | 412 +++++++++---------
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  72 +++
> >  include/net/xdp.h                             |   3 +
> >  3 files changed, 283 insertions(+), 204 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> > index 89f986a75cc8..d384ddfcb83e 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> > @@ -673,6 +673,212 @@ struct ice_tlan_ctx {
> >   *      Use the enum ice_rx_l2_ptype to decode the packet type
> >   * ENDIF
> >   */
> > +#define ICE_PTYPES								\
> > +	/* L2 Packet types */							\
> > +	ICE_PTT_UNUSED_ENTRY(0),						\
> > +	ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),			\
> > +	ICE_PTT_UNUSED_ENTRY(2),						\

[...]

> > + * @eop_desc: End of Packet descriptor
> > + */
> > +static enum xdp_rss_hash_type
> > +ice_xdp_rx_hash_type(union ice_32b_rx_flex_desc *eop_desc)
> > +{
> > +	u16 ptype = ice_get_ptype(eop_desc);
> > +
> > +	if (unlikely(ptype >= ICE_NUM_DEFINED_PTYPES))
> > +		return 0;
> > +
> > +	return ice_ptype_to_xdp_hash[ptype];
> > +}
> > +
> > +/**
> > + * ice_xdp_rx_hash - RX hash XDP hint handler
> > + * @ctx: XDP buff pointer
> > + * @hash: hash destination address
> > + * @rss_type: XDP hash type destination address
> > + *
> > + * Copy RX hash (if available) and its type to the destination address.
> > + */
> > +static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
> > +			   enum xdp_rss_hash_type *rss_type)
> > +{
> > +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
> > +
> > +	*rss_type = ice_xdp_rx_hash_type(xdp_ext->eop_desc);
> > +	if (!ice_copy_rx_hash_from_desc(xdp_ext->eop_desc, hash))
> > +		return -EOPNOTSUPP;
> 
> Same here? See the following for the context:
> https://lore.kernel.org/bpf/167940675120.2718408.8176058626864184420.stgit@firesoul/

Thanks! Will fix.

* Re: [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint
  2023-05-12 18:34   ` Stanislav Fomichev
@ 2023-05-15 13:49     ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 13:49 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On Fri, May 12, 2023 at 11:34:21AM -0700, Stanislav Fomichev wrote:
> On 05/12, Larysa Zaremba wrote:
> > Implement functionality that enables drivers to expose to XDP code,
> > whether checksums was checked and on what level.
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  Documentation/networking/xdp-rx-metadata.rst |  3 +++
> >  include/linux/netdevice.h                    |  1 +
> >  include/net/xdp.h                            |  2 ++
> >  kernel/bpf/offload.c                         |  2 ++
> >  net/core/xdp.c                               | 12 ++++++++++++
> >  5 files changed, 20 insertions(+)
> > 
> > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > index 73a78029c596..f74f0e283097 100644
> > --- a/Documentation/networking/xdp-rx-metadata.rst
> > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > @@ -29,6 +29,9 @@ metadata is supported, this set will grow:
> >  .. kernel-doc:: net/core/xdp.c
> >     :identifiers: bpf_xdp_metadata_rx_stag
> >  
> > +.. kernel-doc:: net/core/xdp.c
> > +   :identifiers: bpf_xdp_metadata_rx_csum_lvl
> > +
> >  An XDP program can use these kfuncs to read the metadata into stack
> >  variables for its own consumption. Or, to pass the metadata on to other
> >  consumers, an XDP program can store it into the metadata area carried
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index fdae37fe11f5..ddade3a15366 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1657,6 +1657,7 @@ struct xdp_metadata_ops {
> >  			       enum xdp_rss_hash_type *rss_type);
> >  	int	(*xmo_rx_ctag)(const struct xdp_md *ctx, u16 *vlan_tag);
> >  	int	(*xmo_rx_stag)(const struct xdp_md *ctx, u16 *vlan_tag);
> > +	int	(*xmo_rx_csum_lvl)(const struct xdp_md *ctx, u8 *csum_level);
> >  };
> >  
> >  /**
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 2db7439fc60f..0fbd25616241 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -393,6 +393,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
> >  			   bpf_xdp_metadata_rx_ctag) \
> >  	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_STAG, \
> >  			   bpf_xdp_metadata_rx_stag) \
> > +	XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_CSUM_LVL, \
> > +			   bpf_xdp_metadata_rx_csum_lvl) \
> >  
> >  enum {
> >  #define XDP_METADATA_KFUNC(name, _) name,
> > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > index 2c6b6e82cfac..8bd54fb4ac63 100644
> > --- a/kernel/bpf/offload.c
> > +++ b/kernel/bpf/offload.c
> > @@ -852,6 +852,8 @@ void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id)
> >  		p = ops->xmo_rx_ctag;
> >  	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_STAG))
> >  		p = ops->xmo_rx_stag;
> > +	else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_CSUM_LVL))
> > +		p = ops->xmo_rx_csum_lvl;
> >  out:
> >  	up_read(&bpf_devs_lock);
> >  
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index eff21501609f..7dd45fd62983 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -762,6 +762,18 @@ __bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag
> >  	return -EOPNOTSUPP;
> >  }
> >  
> > +/**
> > + * bpf_xdp_metadata_rx_csum_lvl - Get depth at which HW has checked the checksum.
> > + * @ctx: XDP context pointer.
> > + * @csum_level: Return value pointer.
> 
> Let's maybe clarify what the level means here? For example, do we start
> counting from 0 or 1?

Sure, I'll add a comment that the meaning of level is the same as in skb, 
counting from 0.
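
For reference, a small userspace sketch of how I read the 0-based level. It
assumes the skb convention, where csum_level is the number of consecutive
verified checksums minus one. mock_csums_verified() is a hypothetical helper,
and err is meant to be the return value of the kfunc:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of consecutive checksums HW has verified,
 * given the kfunc return value and the 0-based level it filled in.
 */
static unsigned int mock_csums_verified(int err, uint8_t csum_level)
{
	if (err)
		return 0;	/* hint unavailable: nothing is known to be verified */

	return (unsigned int)csum_level + 1;
}

/* Usage example for the three interesting cases. */
static void mock_csum_examples(void)
{
	/* Level 0: only the outermost checksum was verified. */
	assert(mock_csums_verified(0, 0) == 1);
	/* Level 1: the outermost checksum and one encapsulated checksum. */
	assert(mock_csums_verified(0, 1) == 2);
	/* Error from the kfunc: treat the packet as unverified. */
	assert(mock_csums_verified(-EOPNOTSUPP, 0) == 0);
}
```

So a program that only cares whether the outermost checksum is good needs to
check just the return value, not the level itself.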

> 
> > + *
> > + * Returns 0 on success (HW has checked the checksum) or ``-errno`` on error.
> > + */
> > +__bpf_kfunc int bpf_xdp_metadata_rx_csum_lvl(const struct xdp_md *ctx, u8 *csum_level)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  __diag_pop();
> >  
> >  BTF_SET8_START(xdp_metadata_kfunc_ids)
> > -- 
> > 2.35.3
> > 

* Re: [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata
  2023-05-12 18:33   ` Stanislav Fomichev
@ 2023-05-15 14:05     ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 14:05 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Jesper Dangaard Brouer,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel

On Fri, May 12, 2023 at 11:33:34AM -0700, Stanislav Fomichev wrote:
> On 05/12, Larysa Zaremba wrote:
> > Make VLAN c-tag and s-tag XDP hint testing more convenient
> > by not skipping VLAN-ed packets.
> > 
> > Allow both 802.1ad and 802.1Q headers.
> 
> Can we also extend non-hw test? That should require adding metadata
> handlers to veth to extract relevant parts from skb + update ip link
> commands to add vlan id. Should be relatively easy to do?
> 

Seems like something I can and should do. Will be in v2.

> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  tools/testing/selftests/bpf/progs/xdp_hw_metadata.c | 9 ++++++++-
> >  tools/testing/selftests/bpf/xdp_metadata.h          | 8 ++++++++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> > index b2dfd7066c6e..f95f82a8b449 100644
> > --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> > +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> > @@ -26,15 +26,22 @@ int rx(struct xdp_md *ctx)
> >  {
> >  	void *data, *data_meta, *data_end;
> >  	struct ipv6hdr *ip6h = NULL;
> > -	struct ethhdr *eth = NULL;
> >  	struct udphdr *udp = NULL;
> >  	struct iphdr *iph = NULL;
> >  	struct xdp_meta *meta;
> > +	struct ethhdr *eth;
> >  	int err;
> >  
> >  	data = (void *)(long)ctx->data;
> >  	data_end = (void *)(long)ctx->data_end;
> >  	eth = data;
> > +
> > +	if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021AD))
> > +		eth = (void *)eth + sizeof(struct vlan_hdr);
> > +
> > +	if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021Q))
> > +		eth = (void *)eth + sizeof(struct vlan_hdr);
> > +
> >  	if (eth + 1 < data_end) {
> >  		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
> >  			iph = (void *)(eth + 1);
> > diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
> > index 938a729bd307..6664893c2c77 100644
> > --- a/tools/testing/selftests/bpf/xdp_metadata.h
> > +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> > @@ -9,6 +9,14 @@
> >  #define ETH_P_IPV6 0x86DD
> >  #endif
> >  
> > +#ifndef ETH_P_8021Q
> > +#define ETH_P_8021Q 0x8100
> > +#endif
> > +
> > +#ifndef ETH_P_8021AD
> > +#define ETH_P_8021AD 0x88A8
> > +#endif
> > +
> >  struct xdp_meta {
> >  	__u64 rx_timestamp;
> >  	__u64 xdp_timestamp;
> > -- 
> > 2.35.3
> > 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 10/15] ice: Implement VLAN tag hint
  2023-05-15 13:41     ` Larysa Zaremba
@ 2023-05-15 15:07       ` Jesper Dangaard Brouer
  2023-05-15 15:45         ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-15 15:07 UTC (permalink / raw)
  To: Larysa Zaremba, Stanislav Fomichev
  Cc: brouer, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel



On 15/05/2023 15.41, Larysa Zaremba wrote:
>>> +	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
>> Should we also do the following:
>>
>> if (!*vlan_tag)
>> 	return -ENODATA;
>>
>> ?
> Oh, returning VLAN tag with zero value really made sense to me at the beginning,
> but after playing with different kinds of packets, I think returning error makes
> more sense. Will change.
> 

IIRC, VLAN tag zero is also a valid id, right?

--Jesper


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint Larysa Zaremba
  2023-05-12 18:28   ` Stanislav Fomichev
@ 2023-05-15 15:36   ` Jesper Dangaard Brouer
  2023-05-15 16:09     ` Larysa Zaremba
  1 sibling, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-15 15:36 UTC (permalink / raw)
  To: Larysa Zaremba, bpf
  Cc: brouer, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel



On 12/05/2023 17.26, Larysa Zaremba wrote:
> Implement functionality that enables drivers to expose VLAN tag
> to XDP code.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
[...]

> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 41e5ca8643ec..eff21501609f 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
>   	return -EOPNOTSUPP;
>   }
>   

Remember that the comment below becomes part of the main documentation on
HW metadata hints:
  - https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html

Hint: to compile the docs locally, I use:
  make SPHINXDIRS="networking" htmldocs

> +/**
> + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.

Is bpf_xdp_metadata_rx_ctag a good function name for the inner vlan tag?
Likewise for "stag" below.

I cannot remember whether the C-tag or the S-tag is the inner or outer
vlan tag.

When reading BPF code that uses these function names, I would have
to ask Google for help, or find and read this doc.

Can we come up with a more intuitive name, one that helps when reading
the BPF-prog code?

> + * @ctx: XDP context pointer.
> + * @vlan_tag: Return value pointer.
> + *

IMHO right here, there should be a description.

E.g. for what a VLAN "tag" means.  I assume a "tag" isn't the VLAN id,
but the raw VLAN tag that also contains the prio numbers etc.

Is this VLAN tag expected to be in network-byte-order?
IMHO this doc should define what is expected (and driver developers must
follow this).
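
To make the "raw tag vs. VLAN id" distinction concrete, here is a minimal
userspace sketch of how a raw 16-bit tag (the 802.1Q TCI) splits into
priority, DEI and VLAN id. The struct and function names are made up for
illustration, and the sketch assumes the two tag bytes arrive in wire
(network) order and are converted to host order first; it does not claim
anything about what the kfunc will finally return.

```c
#include <stdint.h>

/* 802.1Q TCI layout in host byte order:
 * bits 15-13: PCP (priority), bit 12: DEI, bits 11-0: VLAN id.
 */
#define TCI_VID_MASK  0x0fffu
#define TCI_DEI_MASK  0x1000u
#define TCI_PCP_SHIFT 13

/* Hypothetical container for the decoded fields. */
struct vlan_tci_fields {
	uint16_t vid;
	uint8_t dei;
	uint8_t pcp;
};

/* Build a host-order TCI from the two bytes as they appear on the wire. */
static uint16_t tci_from_wire(const uint8_t bytes[2])
{
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}

/* Split a host-order TCI into its three fields. */
static struct vlan_tci_fields tci_decode(uint16_t tci)
{
	struct vlan_tci_fields f;

	f.vid = tci & TCI_VID_MASK;
	f.dei = (tci & TCI_DEI_MASK) ? 1 : 0;
	f.pcp = (uint8_t)(tci >> TCI_PCP_SHIFT);
	return f;
}
```

If the doc defines the returned value as host order, a BPF program would
only need the masking step; if it defines network order, the byte swap
comes first.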

> + * Returns 0 on success or ``-errno`` on error.
> + */
> +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +/**
> + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
> + * @ctx: XDP context pointer.
> + * @vlan_tag: Return value pointer.
> + *
> + * Returns 0 on success or ``-errno`` on error.

IMHO we should provide more guidance on the expected return codes, and what
they mean.  IMHO driver developers must only return codes that are
described here, and if they invent a new one, add it as part of their patch.

See the formatting in bpf_xdp_metadata_rx_hash and check how this gets
compiled into HTML.


> + */
> +__bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
> +{
> +	return -EOPNOTSUPP;
> +}
> +


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 10/15] ice: Implement VLAN tag hint
  2023-05-15 15:07       ` Jesper Dangaard Brouer
@ 2023-05-15 15:45         ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 15:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Stanislav Fomichev, brouer, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Mon, May 15, 2023 at 05:07:19PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 15/05/2023 15.41, Larysa Zaremba wrote:
> > > > +	*vlan_tag = ice_get_vlan_tag_from_rx_desc(xdp_ext->eop_desc);
> > > Should we also do the following:
> > > 
> > > if (!*vlan_tag)
> > > 	return -ENODATA;
> > > 
> > > ?
> > Oh, returning VLAN tag with zero value really made sense to me at the beginning,
> > but after playing with different kinds of packets, I think returning error makes
> > more sense. Will change.
> > 
> 
> IIRC then VLAN tag zero is also a valid id, right?

AFAIK, 0x000 is reserved and basically means "no vlan tag". When ice hardware
returns such a value in the descriptor, it says "no vlan tag was stripped", and
this doesn't necessarily mean there is no VLAN tag in the packet.

For example, let us consider a packet:

  Ether/802.1ad(s-tag)/802.1q(c-tag)/...

Hardware does not strip the c-tag in such a case and sends 0x000 in the
descriptor, but the packet clearly does contain a c-tag, so at least in ice,
it is reasonable to not consider '0' a reliable value.

I guess a value of 0x000 should be more reliable for the s-tag, so maybe
'if (!*vlan_tag)' usage can be limited to the c-tag function.
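
A rough sketch of what that could look like for the c-tag handler, written
as plain userspace C so it can be compiled on its own. The descriptor
struct and the function name are stand-ins invented for this example, not
the ice definitions; the only part taken from the discussion is "zero in
the descriptor means nothing was stripped, so report -ENODATA".

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the tag field a driver reads from its
 * RX descriptor. Zero means the hardware stripped no tag.
 */
struct sketch_rx_desc {
	uint16_t stripped_tag;
};

/* Report the stripped tag, or -ENODATA when the descriptor has none.
 * The output is left untouched on error, so the caller never sees a
 * misleading zero.
 */
static int sketch_rx_ctag(const struct sketch_rx_desc *desc,
			  uint16_t *vlan_tag)
{
	if (!desc->stripped_tag)
		return -ENODATA;

	*vlan_tag = desc->stripped_tag;
	return 0;
}
```

A BPF program calling such a handler would then treat -ENODATA as "parse
the headers yourself" rather than "the packet is untagged".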

> 
> --Jesper
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-15 15:36   ` Jesper Dangaard Brouer
@ 2023-05-15 16:09     ` Larysa Zaremba
  2023-05-22  8:37       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 16:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, brouer, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Mon, May 15, 2023 at 05:36:12PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 12/05/2023 17.26, Larysa Zaremba wrote:
> > Implement functionality that enables drivers to expose VLAN tag
> > to XDP code.
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> [...]
> 
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 41e5ca8643ec..eff21501609f 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
> >   	return -EOPNOTSUPP;
> >   }
> 
> Remember below becomes part of main documentation on HW metadata hints:
>  - https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html
> 
> Hint compiling locally I use:
>  make SPHINXDIRS="networking" htmldocs
> 
> > +/**
> > + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
> 
> Is bpf_xdp_metadata_rx_ctag a good function name for the inner vlan tag?
> Like wise below "stag".
> 
> I cannot remember if the C-tag or S-tag is the inner or outer vlan tag.
> 
> When reading BPF code that use these function names, then I would have
> to ask Google for help, or find-and-read this doc.
> 
> Can we come-up with a more intuitive name, that e.g. helps when reading
> the BPF-prog code?

Well, my reasoning for such naming is that if someone can configure s-tag
stripping in ethtool with 'rx-vlan-stag-hw-parse', they shouldn't have any
problem understanding those function names.

One possible improvement that comes to mind is (similarly to ethtool) calling
the c-tag just 'tag' and letting the s-tag stay 'stag', because the c-tag is
the default 802.1q tag, which is supported by various hardware, while the s-tag
is significantly less widespread.

But there are many options, really.

What are your suggestions?

> 
> > + * @ctx: XDP context pointer.
> > + * @vlan_tag: Return value pointer.
> > + *
> 
> IMHO right here, there should be a description.
> 
> E.g. for what a VLAN "tag" means.  I assume a "tag" isn't the VLAN id,
> but the raw VLAN tag that also contains the prio numbers etc.
> 
> It this VLAN tag expected to be in network-byte-order ?
> IMHO this doc should define what is expected (and driver devel must
> follow this).

Will specify that.

> 
> > + * Returns 0 on success or ``-errno`` on error.
> > + */
> > +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +/**
> > + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
> > + * @ctx: XDP context pointer.
> > + * @vlan_tag: Return value pointer.
> > + *
> > + * Returns 0 on success or ``-errno`` on error.
> 
> IMHO we should provide more guidance to expected return codes, and what
> they mean.  IMHO driver developers must only return codes that are
> described here, and if they invent a new, add it as part of their patch.

That's a good suggestion, I will expand the comment to describe error codes used 
so far.

> 
> See, formatting in bpf_xdp_metadata_rx_hash and check how this gets
> compiled into HTML.
> 
> 
> > + */
> > +__bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-12 15:26 ` [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32 Larysa Zaremba
@ 2023-05-15 16:17   ` Jesper Dangaard Brouer
  2023-05-15 17:08     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-15 16:17 UTC (permalink / raw)
  To: Larysa Zaremba, bpf
  Cc: brouer, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel, Aleksander Lobakin



On 12/05/2023 17.26, Larysa Zaremba wrote:
> From: Aleksander Lobakin <aleksander.lobakin@intel.com>
> 
> When using XDP hints, metadata sometimes has to be much bigger
> than 32 bytes. Relax the restriction, allow metadata larger than 32 bytes
> and make __skb_metadata_differs() work with bigger lengths.
> 
> Now size of metadata is only limited by the fact it is stored as u8
> in skb_shared_info, so maximum possible value is 255. 

I'm confused, IIRC the metadata area isn't stored "in skb_shared_info".
The maximum possible size is limited by the XDP headroom, which is also
shared/limited with/by xdp_frame.  I must be reading the sentence wrong,
somehow.

> Other important
> conditions, such as having enough space for xdp_frame building, are already
> checked in bpf_xdp_adjust_meta().
> 
> The requirement of having its length aligned to 4 bytes is still
> valid.
> 
> Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>   include/linux/skbuff.h | 13 ++++++++-----
>   include/net/xdp.h      |  7 ++++++-
>   2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 8ddb4af1a501..afcd372aecdf 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4219,10 +4219,13 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
>   {
>   	const void *a = skb_metadata_end(skb_a);
>   	const void *b = skb_metadata_end(skb_b);
> -	/* Using more efficient varaiant than plain call to memcmp(). */
> -#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
>   	u64 diffs = 0;
>   
> +	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> +	    BITS_PER_LONG != 64)
> +		goto slow;
> +
> +	/* Using more efficient variant than plain call to memcmp(). */
>   	switch (meta_len) {
>   #define __it(x, op) (x -= sizeof(u##op))
>   #define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op))
> @@ -4242,11 +4245,11 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
>   		fallthrough;
>   	case  4: diffs |= __it_diff(a, b, 32);
>   		break;
> +	default:
> +slow:
> +		return memcmp(a - meta_len, b - meta_len, meta_len);
>   	}
>   	return diffs;
> -#else
> -	return memcmp(a - meta_len, b - meta_len, meta_len);
> -#endif
>   }
>   
>   static inline bool skb_metadata_differs(const struct sk_buff *skb_a,
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 0fbd25616241..f48723250c7c 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -370,7 +370,12 @@ xdp_data_meta_unsupported(const struct xdp_buff *xdp)
>   
>   static inline bool xdp_metalen_invalid(unsigned long metalen)
>   {
> -	return (metalen & (sizeof(__u32) - 1)) || (metalen > 32);
> +	typeof(metalen) meta_max;
> +
> +	meta_max = type_max(typeof_member(struct skb_shared_info, meta_len));
> +	BUILD_BUG_ON(!__builtin_constant_p(meta_max));
> +
> +	return !IS_ALIGNED(metalen, sizeof(u32)) || metalen > meta_max;
>   }
>   
>   struct xdp_attachment_info {


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-15 16:17   ` Jesper Dangaard Brouer
@ 2023-05-15 17:08     ` Larysa Zaremba
  2023-05-16 12:37       ` Alexander Lobakin
  0 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-15 17:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, brouer, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel,
	Aleksander Lobakin

On Mon, May 15, 2023 at 06:17:02PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 12/05/2023 17.26, Larysa Zaremba wrote:
> > From: Aleksander Lobakin <aleksander.lobakin@intel.com>
> > 
> > When using XDP hints, metadata sometimes has to be much bigger
> > than 32 bytes. Relax the restriction, allow metadata larger than 32 bytes
> > and make __skb_metadata_differs() work with bigger lengths.
> > 
> > Now size of metadata is only limited by the fact it is stored as u8
> > in skb_shared_info, so maximum possible value is 255.
> 
> I'm confused, IIRC the metadata area isn't stored "in skb_shared_info".
> The maximum possible size is limited by the XDP headroom, which is also
> shared/limited with/by xdp_frame.  I must be reading the sentence wrong,
> somehow.

It's not 'metadata is stored as u8', it's 'metadata size is stored as u8' :)
Maybe I should rephrase it better in v2.

> 
> > Other important
> > conditions, such as having enough space for xdp_frame building, are already
> > checked in bpf_xdp_adjust_meta().
> > 
> > The requirement of having its length aligned to 4 bytes is still
> > valid.
> > 
> > Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >   include/linux/skbuff.h | 13 ++++++++-----
> >   include/net/xdp.h      |  7 ++++++-
> >   2 files changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 8ddb4af1a501..afcd372aecdf 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -4219,10 +4219,13 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
> >   {
> >   	const void *a = skb_metadata_end(skb_a);
> >   	const void *b = skb_metadata_end(skb_b);
> > -	/* Using more efficient varaiant than plain call to memcmp(). */
> > -#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
> >   	u64 diffs = 0;
> > +	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> > +	    BITS_PER_LONG != 64)
> > +		goto slow;
> > +
> > +	/* Using more efficient variant than plain call to memcmp(). */
> >   	switch (meta_len) {
> >   #define __it(x, op) (x -= sizeof(u##op))
> >   #define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op))
> > @@ -4242,11 +4245,11 @@ static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
> >   		fallthrough;
> >   	case  4: diffs |= __it_diff(a, b, 32);
> >   		break;
> > +	default:
> > +slow:
> > +		return memcmp(a - meta_len, b - meta_len, meta_len);
> >   	}
> >   	return diffs;
> > -#else
> > -	return memcmp(a - meta_len, b - meta_len, meta_len);
> > -#endif
> >   }
> >   static inline bool skb_metadata_differs(const struct sk_buff *skb_a,
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 0fbd25616241..f48723250c7c 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -370,7 +370,12 @@ xdp_data_meta_unsupported(const struct xdp_buff *xdp)
> >   static inline bool xdp_metalen_invalid(unsigned long metalen)
> >   {
> > -	return (metalen & (sizeof(__u32) - 1)) || (metalen > 32);
> > +	typeof(metalen) meta_max;
> > +
> > +	meta_max = type_max(typeof_member(struct skb_shared_info, meta_len));
> > +	BUILD_BUG_ON(!__builtin_constant_p(meta_max));
> > +
> > +	return !IS_ALIGNED(metalen, sizeof(u32)) || metalen > meta_max;
> >   }
> >   struct xdp_attachment_info {
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-15 17:08     ` Larysa Zaremba
@ 2023-05-16 12:37       ` Alexander Lobakin
  2023-05-16 15:35         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-16 12:37 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: Jesper Dangaard Brouer, bpf, brouer, Stanislav Fomichev,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Mon, 15 May 2023 19:08:39 +0200

> On Mon, May 15, 2023 at 06:17:02PM +0200, Jesper Dangaard Brouer wrote:
>>
>>
>> On 12/05/2023 17.26, Larysa Zaremba wrote:
>>> From: Aleksander Lobakin <aleksander.lobakin@intel.com>
>>>
>>> When using XDP hints, metadata sometimes has to be much bigger
>>> than 32 bytes. Relax the restriction, allow metadata larger than 32 bytes
>>> and make __skb_metadata_differs() work with bigger lengths.
>>>
>>> Now size of metadata is only limited by the fact it is stored as u8
>>> in skb_shared_info, so maximum possible value is 255.
>>
>> I'm confused, IIRC the metadata area isn't stored "in skb_shared_info".
>> The maximum possible size is limited by the XDP headroom, which is also
>> shared/limited with/by xdp_frame.  I must be reading the sentence wrong,
>> somehow.

skb_shared_info::meta_len is u8. Since metadata gets carried from
xdp_buff to skb, this check is needed (it's a compile-time constant anyway).
The check for headroom is already done separately (two sentences below).

> 
> It's not 'metadata is stored as u8', it's 'metadata size is stored as u8' :)
> Maybe I should rephrase it better in v2.
> 
>>
>>> Other important
>>> conditions, such as having enough space for xdp_frame building, are already
>>> checked in bpf_xdp_adjust_meta().
>>>
>>> The requirement of having its length aligned to 4 bytes is still
>>> valid.

BTW, I decided not to expand the switch-case in __skb_metadata_differs()
with more size values because: 1) it's not a common case; 2) memcmp() is
+/- fast on x86; 3) it's gross already. But I can if needed :D I think it
can be compressed via some macro hell.

(this function is called for each skb when GROing if it carries any
 meta, so sometimes may hurt. Larysa, have you noticed any perf
 regression between meta <= 32 and > 32?)
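
For readers following along, a simplified userspace sketch of the trade-off:
XOR-accumulate word-sized chunks for the small, 4-byte-aligned lengths and
fall back to memcmp() for anything else. This is not the kernel function;
it uses a loop and memcpy() loads instead of the unrolled switch with
direct unaligned accesses, and the name meta_differs() is made up. Like the
kernel helper, it takes pointers to the end of each metadata area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two metadata areas of meta_len bytes ending at a_end and b_end.
 * Returns true if they differ.
 */
static bool meta_differs(const void *a_end, const void *b_end,
			 size_t meta_len)
{
	const uint8_t *a = (const uint8_t *)a_end - meta_len;
	const uint8_t *b = (const uint8_t *)b_end - meta_len;
	uint64_t diffs = 0;
	size_t off;

	/* Slow path: lengths the word-wise path does not cover. */
	if (meta_len > 32 || (meta_len & 3))
		return memcmp(a, b, meta_len) != 0;

	/* Fast path: OR together the XOR of each 8-byte chunk. */
	for (off = 0; off + 8 <= meta_len; off += 8) {
		uint64_t x, y;

		memcpy(&x, a + off, sizeof(x));
		memcpy(&y, b + off, sizeof(y));
		diffs |= x ^ y;
	}

	/* At most one 4-byte tail remains, since meta_len is a multiple of 4. */
	if (off < meta_len) {
		uint32_t x, y;

		memcpy(&x, a + off, sizeof(x));
		memcpy(&y, b + off, sizeof(y));
		diffs |= x ^ y;
	}

	return diffs != 0;
}
```

Lengths above 32 simply take the memcmp() branch here, which mirrors the
"default:" case added by the patch.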

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-16 12:37       ` Alexander Lobakin
@ 2023-05-16 15:35         ` Jesper Dangaard Brouer
  2023-05-19 16:35           ` Alexander Lobakin
  0 siblings, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-16 15:35 UTC (permalink / raw)
  To: Alexander Lobakin, Larysa Zaremba
  Cc: brouer, Jesper Dangaard Brouer, bpf, Stanislav Fomichev,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel



On 16/05/2023 14.37, Alexander Lobakin wrote:
> From: Larysa Zaremba<larysa.zaremba@intel.com>
> Date: Mon, 15 May 2023 19:08:39 +0200
> 
>> On Mon, May 15, 2023 at 06:17:02PM +0200, Jesper Dangaard Brouer wrote:
>>>
>>> On 12/05/2023 17.26, Larysa Zaremba wrote:
>>>> From: Aleksander Lobakin<aleksander.lobakin@intel.com>
>>>>
>>>> When using XDP hints, metadata sometimes has to be much bigger
>>>> than 32 bytes. Relax the restriction, allow metadata larger than 32 bytes
>>>> and make __skb_metadata_differs() work with bigger lengths.
>>>>
>>>> Now size of metadata is only limited by the fact it is stored as u8
>>>> in skb_shared_info, so maximum possible value is 255.
>>>
>>> I'm confused, IIRC the metadata area isn't stored "in skb_shared_info".
>>> The maximum possible size is limited by the XDP headroom, which is also
>>> shared/limited with/by xdp_frame.  I must be reading the sentence wrong,
>>> somehow.
>
> skb_shared_info::meta_size  is u8. Since metadata gets carried from
> xdp_buff to skb, this check is needed (it's compile-time constant anyway).
> Check for headroom is done separately already (two sentences below).
> 

Damn, argh, for SKBs the "meta_len" is stored in skb_shared_info, which
is located on another cacheline.
That is a sure way to KILL performance! :-(

But it is only used for SKBs that get created from XDP with metadata, right?



>> It's not 'metadata is stored as u8', it's 'metadata size is stored as u8' :)
>> Maybe I should rephrase it better in v2.

Yes, a rephrase will be good.

--Jesper



static inline u8 skb_metadata_len(const struct sk_buff *skb)
{
	return skb_shinfo(skb)->meta_len;
}


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint
  2023-05-12 18:19   ` Stanislav Fomichev
@ 2023-05-16 16:17     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-16 16:17 UTC (permalink / raw)
  To: Stanislav Fomichev, Larysa Zaremba
  Cc: brouer, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
	netdev, intel-wired-lan, linux-kernel



On 12/05/2023 20.19, Stanislav Fomichev wrote:
>> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> index 2515f5f7a2b6..e9589cadf811 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
>> @@ -537,3 +537,25 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res,
>>   			spin_unlock(&xdp_ring->tx_lock);
>>   	}
>>   }
>> +
>> +/**
>> + * ice_xdp_rx_hw_ts - HW timestamp XDP hint handler
>> + * @ctx: XDP buff pointer
>> + * @ts_ns: destination address
>> + *
>> + * Copy HW timestamp (if available) to the destination address.
>> + */
>> +static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns)
>> +{
>> +	const struct ice_xdp_buff *xdp_ext = (void *)ctx;
>> +
>> +	if (!ice_ptp_copy_rx_hwts_from_desc(xdp_ext->rx_ring,
>> +					    xdp_ext->eop_desc, ts_ns))
>> +		return -EOPNOTSUPP;
> Per Jesper's recent update, should this be ENODATA?
> 

Yes, please :-)

https://git.kernel.org/torvalds/c/915efd8a446b ("xdp: bpf_xdp_metadata 
use EOPNOTSUPP for no driver support")

--Jesper


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-16 15:35         ` Jesper Dangaard Brouer
@ 2023-05-19 16:35           ` Alexander Lobakin
  2023-05-22 11:41             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-19 16:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Larysa Zaremba, brouer, bpf, Stanislav Fomichev,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Jiri Olsa, Jesse Brandeburg,
	Tony Nguyen, Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Tue, 16 May 2023 17:35:27 +0200

> 
> 
> On 16/05/2023 14.37, Alexander Lobakin wrote:
>> From: Larysa Zaremba<larysa.zaremba@intel.com>
>> Date: Mon, 15 May 2023 19:08:39 +0200
>>
>>> On Mon, May 15, 2023 at 06:17:02PM +0200, Jesper Dangaard Brouer wrote:
>>>>
>>>> On 12/05/2023 17.26, Larysa Zaremba wrote:
>>>>> From: Aleksander Lobakin<aleksander.lobakin@intel.com>
>>>>>
>>>>> When using XDP hints, metadata sometimes has to be much bigger
>>>>> than 32 bytes. Relax the restriction, allow metadata larger than 32
>>>>> bytes
>>>>> and make __skb_metadata_differs() work with bigger lengths.
>>>>>
>>>>> Now size of metadata is only limited by the fact it is stored as u8
>>>>> in skb_shared_info, so maximum possible value is 255.
>>>>
>>>> I'm confused, IIRC the metadata area isn't stored "in skb_shared_info".
>>>> The maximum possible size is limited by the XDP headroom, which is also
>>>> shared/limited with/by xdp_frame.  I must be reading the sentence
>>>> wrong,
>>>> somehow.
>>
>> skb_shared_info::meta_size  is u8. Since metadata gets carried from
>> xdp_buff to skb, this check is needed (it's compile-time constant
>> anyway).
>> Check for headroom is done separately already (two sentences below).
>>
> 
> Damn, argh, for SKBs the "meta_len" is stored in skb_shared_info, which
> is located on another cacheline.
> That is a sure way to KILL performance! :-(

Have you read the code? I use type_max(typeof_member(shinfo, meta_len)),
what performance are you talking about?

The whole xdp_metalen_invalid() gets expanded into:

	return (metalen % 4) || metalen > 255;

at compile-time. All those typeof shenanigans are only to not open-code
meta_len's type/size/max.
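
A standalone sketch of that reduction, for anyone who wants to poke at it
outside the kernel. The typedef and macro names are invented for this
example; the only assumption carried over is that the length field is a u8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the length field is a u8, as described above for meta_len. */
typedef uint8_t meta_len_t;

/* Largest value the field can hold, derived from its type (255 for u8). */
#define META_LEN_MAX ((unsigned long)(meta_len_t)~(meta_len_t)0)

/* Invalid when not a multiple of 4 or when it does not fit the field. */
static bool metalen_invalid(unsigned long metalen)
{
	return (metalen % sizeof(uint32_t)) != 0 || metalen > META_LEN_MAX;
}
```

Note that because of the 4-byte alignment rule, the largest length that
actually passes is 252, not 255.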

> 
> But only use for SKBs that gets created from xdp with metadata, right?
> 
> 
> 
>>> It's not 'metadata is stored as u8', it's 'metadata size is stored as
>>> u8' :)
>>> Maybe I should rephrase it better in v2.
> 
> Yes, a rephrase will be good.
> 
> --Jesper
> 
> 
> 
> static inline u8 skb_metadata_len(const struct sk_buff *skb)
> {
>     return skb_shinfo(skb)->meta_len;
> }
> 

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable Larysa Zaremba
@ 2023-05-19 16:46   ` Alexander Lobakin
  2023-05-22 15:03     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-19 16:46 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Fri, 12 May 2023 17:25:53 +0200

> Previously, we only needed RX hash in skb path,
> hence all related code was written with skb in mind.
> But with the addition of XDP hints via kfuncs to the ice driver,
> the same logic will be needed in .xmo_() callbacks.
> 
> Separate generic process of reading RX hash from a descriptor
> into a separate function.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 38 +++++++++++++------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index c8322fb6f2b3..fc67bbf600af 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -63,28 +63,44 @@ static enum pkt_hash_types ice_ptype_to_htype(u16 ptype)
>  }
>  
>  /**
> - * ice_rx_hash - set the hash value in the skb
> + * ice_copy_rx_hash_from_desc - copy hash value from descriptor to address
> + * @rx_desc: specific descriptor
> + * @dst: address to copy hash value to
> + *
> + * Returns true, if valid hash has been copied into the destination address.
> + */
> +static bool
> +ice_copy_rx_hash_from_desc(union ice_32b_rx_flex_desc *rx_desc, u32 *dst)

@rx_desc can be const.

I'm also unsure about the naming. Why not name this one ice_rx_hash()
and the one which sets it in skb ice_rx_hash_skb()?

> +{
> +	struct ice_32b_rx_flex_desc_nic *nic_mdid;

Also const. I thought you'll pick most of my optimizations from the
related commit :D

> +
> +	if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
> +		return false;
> +
> +	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
> +	*dst = le32_to_cpu(nic_mdid->rss_hash);
> +	return true;

You can just return the hash. `hash == 0` means there's no hash, so it
basically means `false`, while non-zero is `true`.
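
A minimal sketch of that shape, using hypothetical stand-in types
(demo_rx_desc, DEMO_RXDID_FLEX_NIC) rather than the real ice structures, and
ignoring the le32_to_cpu() conversion the driver needs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's descriptor layout; the real
 * ice structures differ and the hash is little-endian in hardware. */
#define DEMO_RXDID_FLEX_NIC 2

struct demo_rx_desc {
	uint8_t  rxdid;
	uint32_t rss_hash;
};

/* Returns the hash, or 0 when the descriptor carries none. */
static uint32_t demo_get_rx_hash(const struct demo_rx_desc *desc)
{
	if (desc->rxdid != DEMO_RXDID_FLEX_NIC)
		return 0;

	return desc->rss_hash;
}
```

The caller can then test the return value directly instead of passing an
output pointer.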

> +}
> +
> +/**
> + * ice_rx_hash_to_skb - set the hash value in the skb
>   * @rx_ring: descriptor ring
>   * @rx_desc: specific descriptor
>   * @skb: pointer to current skb
>   * @rx_ptype: the ptype value from the descriptor
>   */
>  static void
> -ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
> -	    struct sk_buff *skb, u16 rx_ptype)
> +ice_rx_hash_to_skb(struct ice_rx_ring *rx_ring,
> +		   union ice_32b_rx_flex_desc *rx_desc,
> +		   struct sk_buff *skb, u16 rx_ptype)
>  {
> -	struct ice_32b_rx_flex_desc_nic *nic_mdid;
>  	u32 hash;
>  
>  	if (!(rx_ring->netdev->features & NETIF_F_RXHASH))
>  		return;
>  
> -	if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
> -		return;
> -
> -	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
> -	hash = le32_to_cpu(nic_mdid->rss_hash);
> -	skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
> +	if (ice_copy_rx_hash_from_desc(rx_desc, &hash))

likely()? I wouldn't care about zero-hashed frames, their perf is not
critical anyway.

> +		skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
>  }
>  
>  /**
> @@ -186,7 +202,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
>  		       union ice_32b_rx_flex_desc *rx_desc,
>  		       struct sk_buff *skb, u16 ptype)
>  {
> -	ice_rx_hash(rx_ring, rx_desc, skb, ptype);
> +	ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype);
>  
>  	/* modifies the skb - consumes the enet header */
>  	skb->protocol = eth_type_trans(skb, rx_ring->netdev);

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp reading code more reusable
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp " Larysa Zaremba
@ 2023-05-19 16:52   ` Alexander Lobakin
  2023-05-22 15:07     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-19 16:52 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Fri, 12 May 2023 17:25:54 +0200

> Previously, we only needed RX HW timestamp in skb path,
> hence all related code was written with skb in mind.
> But with the addition of XDP hints via kfuncs to the ice driver,
> the same logic will be needed in .xmo_() callbacks.

[...]

> @@ -2176,9 +2174,8 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
>  	ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high);
>  	ts_ns = ice_ptp_extend_32b_ts(cached_time, ts_high);
>  
> -	hwtstamps = skb_hwtstamps(skb);
> -	memset(hwtstamps, 0, sizeof(*hwtstamps));
> -	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
> +	*dst = ts_ns;
> +	return true;

Can't we use the same approach I described in the previous comment, i.e.
return 0 or the timestamp? I don't think ts == 0 is valid.

>  }
>  
>  /**

[...]

> + * The driver receives a notification in the receive descriptor with timestamp.
> + * The timestamp is in ns, so we must convert the result first.
> + */
> +static void
> +ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring,
> +		       union ice_32b_rx_flex_desc *rx_desc,
> +		       struct sk_buff *skb)
> +{
> +	struct skb_shared_hwtstamps *hwtstamps;
> +	u64 ts_ns;
> +
> +	if (!ice_ptp_copy_rx_hwts_from_desc(rx_ring, rx_desc, &ts_ns))
> +		return;
> +
> +	hwtstamps = skb_hwtstamps(skb);
> +	memset(hwtstamps, 0, sizeof(*hwtstamps));
> +	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);

Ok, my optimizations aren't in this series :D
If you look at the hwtimestamps in skb, you'll see all that can be
minimized to just:

	*skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){
		.hwtstamp	= ns_to_ktime(ts_ns),
	};

The compiler will probably do its job, but I wouldn't always rely on it.
Sometimes it even fails to expand memset(8 bytes) to *(u64 *) = 0.
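
As a rough illustration that the two forms leave the structure in the same
state, here is a sketch with a hypothetical two-field stand-in
(demo_hwtstamps); the real struct skb_shared_hwtstamps is laid out
differently:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; not the kernel's struct skb_shared_hwtstamps. */
struct demo_hwtstamps {
	int64_t hwtstamp;
	int64_t other;
};

/* Original form: clear everything, then set one field. */
static void demo_set_memset(struct demo_hwtstamps *h, int64_t ts)
{
	memset(h, 0, sizeof(*h));
	h->hwtstamp = ts;
}

/* Suggested form: one assignment from a compound literal. */
static void demo_set_literal(struct demo_hwtstamps *h, int64_t ts)
{
	/* Members not named in the initializer are zero-initialized. */
	*h = (struct demo_hwtstamps){ .hwtstamp = ts };
}
```

Both helpers end with hwtstamp set and every other member zero; the second
gives the compiler a single store pattern to work with.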

> +}
> +
>  /**
>   * ice_process_skb_fields - Populate skb header fields from Rx descriptor
>   * @rx_ring: Rx descriptor ring packet is being transacted on
> @@ -210,7 +235,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
>  	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
>  
>  	if (rx_ring->ptp_rx)
> -		ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb);
> +		ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);
>  }
>  
>  /**

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-15 16:09     ` Larysa Zaremba
@ 2023-05-22  8:37       ` Jesper Dangaard Brouer
  2023-05-22 15:48         ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-22  8:37 UTC (permalink / raw)
  To: Larysa Zaremba, Jesper Dangaard Brouer
  Cc: brouer, bpf, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel



On 15/05/2023 18.09, Larysa Zaremba wrote:
> On Mon, May 15, 2023 at 05:36:12PM +0200, Jesper Dangaard Brouer wrote:
>>
>>
>> On 12/05/2023 17.26, Larysa Zaremba wrote:
>>> Implement functionality that enables drivers to expose VLAN tag
>>> to XDP code.
>>>
>>> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>>> ---
>> [...]
>>
>>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>>> index 41e5ca8643ec..eff21501609f 100644
>>> --- a/net/core/xdp.c
>>> +++ b/net/core/xdp.c
>>> @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
>>>    	return -EOPNOTSUPP;
>>>    }
>>
>> Remember below becomes part of main documentation on HW metadata hints:
>>   - https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html
>>
>> Hint: to compile the docs locally I use:
>>   make SPHINXDIRS="networking" htmldocs
>>
>>> +/**
>>> + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
>>
>> Is bpf_xdp_metadata_rx_ctag a good function name for the inner vlan tag?
>> Likewise below "stag".
>>
>> I cannot remember if the C-tag or S-tag is the inner or outer vlan tag.
>>
>> When reading BPF code that use these function names, then I would have
>> to ask Google for help, or find-and-read this doc.
>>
>> Can we come-up with a more intuitive name, that e.g. helps when reading
>> the BPF-prog code?
> 
> Well, my reasoning for such naming is that if someone can configure s-tag
> stripping in ethtool with 'rx-vlan-stag-hw-parse', they shouldn't have any
> problem with understanding those function names.
> 

Naming is hard.  My perspective is conveying the meaning without having
to be knowledgeable about ethtool VLAN commands.  My perspective is a
casual BPF-programmer that reads "bpf_xdp_metadata_rx_stag()".
Hopefully we can choose a name that says "vlan" somewhere, such that the
person reading this doesn't have to look up the documentation to
deduce that this code is related to VLANs.

> One possible improvement that comes to mind is maybe (similarly to ethtool) calling
> c-tag just 'tag' and letting s-tag stay 'stag'. Because c-tag is this default
> 802.1q tag, which is supported by various hardware, while s-tag is significantly
> less widespread.
> 
> But there are many options, really.
> 
> What are your suggestions?
>

One suggestion is (the symmetrical):
  * bpf_xdp_metadata_rx_vlan_inner_tag
  * bpf_xdp_metadata_rx_vlan_outer_tag

As you say above the first "inner" VLAN tag is just the regular 802.1Q
VLAN tag.  The concept of C-tag and S-tag is from 802.1ad that
introduced the concept of double tagging.

Thus one could argue for shorter names like:
  * bpf_xdp_metadata_rx_vlan_tag
  * bpf_xdp_metadata_rx_vlan_outer_tag
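
For readers less familiar with the terminology, a small sketch of what a raw
tag carries, assuming the TCI has already been converted to host byte order.
The DEMO_* names and demo_vlan_tci are made up for illustration; the TPID
values and the 3/1/12-bit split are the standard 802.1Q/802.1ad ones:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TPID_8021Q  0x8100 /* C-tag (customer), plain 802.1Q */
#define DEMO_TPID_8021AD 0x88a8 /* S-tag (service), 802.1ad outer tag */

struct demo_vlan_tci {
	uint8_t  pcp; /* priority code point, 3 bits */
	uint8_t  dei; /* drop eligible indicator, 1 bit */
	uint16_t vid; /* VLAN id, 12 bits */
};

/* Splits a host-byte-order TCI into its three fields. */
static struct demo_vlan_tci demo_parse_tci(uint16_t tci)
{
	struct demo_vlan_tci out = {
		.pcp = (uint8_t)(tci >> 13),
		.dei = (uint8_t)((tci >> 12) & 0x1),
		.vid = (uint16_t)(tci & 0x0fff),
	};

	return out;
}
```

So a "tag" in this sense is more than the VLAN id, which is why the kfunc
documentation needs to say which of the two is returned and in what byte
order.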


>>
>>> + * @ctx: XDP context pointer.
>>> + * @vlan_tag: Return value pointer.
>>> + *
>>
>> IMHO right here, there should be a description.
>>
>> E.g. for what a VLAN "tag" means.  I assume a "tag" isn't the VLAN id,
>> but the raw VLAN tag that also contains the prio numbers etc.
>>
>> Is this VLAN tag expected to be in network-byte-order?
>> IMHO this doc should define what is expected (and driver devel must
>> follow this).
> 
> Will specify that.
> 
>>
>>> + * Returns 0 on success or ``-errno`` on error.
>>> + */
>>> +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
>>> +{
>>> +	return -EOPNOTSUPP;
>>> +}
>>> +
>>> +/**
>>> + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
>>> + * @ctx: XDP context pointer.
>>> + * @vlan_tag: Return value pointer.
>>> + *

(p.s. Googling I find multiple definitions of what the "S" in S-tag
means. The most reliable or statistically consistent seems to be
"Service tag", or "Service provider tag".)

The description for the renamed "bpf_xdp_metadata_rx_vlan_outer_tag"
should IMHO explain that the outer VLAN tag is often referred to as the
S-tag (or Service-tag) in Q-in-Q (802.1ad) terminology.  Perhaps we can
even spell out that some hardware supports extracting this stag (and must
be configured via ethtool to do so).

A dump of the ethtool rx-vlan related features:

   $ ethtool -k i40e2 | grep rx-vlan
   rx-vlan-offload: on
   rx-vlan-filter: on [fixed]
   rx-vlan-stag-hw-parse: off [fixed]
   rx-vlan-stag-filter: off [fixed]




>>> + * Returns 0 on success or ``-errno`` on error.
>>
>> IMHO we should provide more guidance to expected return codes, and what
>> they mean.  IMHO driver developers must only return codes that are
>> described here, and if they invent a new, add it as part of their patch.
> 
> That's a good suggestion, I will expand the comment to describe error codes used
> so far.
> 
>>
>> See, formatting in bpf_xdp_metadata_rx_hash and check how this gets
>> compiled into HTML.
>>
>>
>>> + */
>>> +__bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
>>> +{
>>> +	return -EOPNOTSUPP;
>>> +}
>>> +
>>
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-19 16:35           ` Alexander Lobakin
@ 2023-05-22 11:41             ` Jesper Dangaard Brouer
  2023-05-22 15:28               ` Alexander Lobakin
  0 siblings, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-22 11:41 UTC (permalink / raw)
  To: Alexander Lobakin, Jesper Dangaard Brouer, Daniel Borkmann
  Cc: brouer, Larysa Zaremba, bpf, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel



On 19/05/2023 18.35, Alexander Lobakin wrote:
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Date: Tue, 16 May 2023 17:35:27 +0200
> 
>>
>> On 16/05/2023 14.37, Alexander Lobakin wrote:
>>> From: Larysa Zaremba<larysa.zaremba@intel.com>
>>> Date: Mon, 15 May 2023 19:08:39 +0200
>>>
>>>> On Mon, May 15, 2023 at 06:17:02PM +0200, Jesper Dangaard Brouer wrote:
>>>>>
>>>>> On 12/05/2023 17.26, Larysa Zaremba wrote:
>>>>>> From: Aleksander Lobakin<aleksander.lobakin@intel.com>
>>>>>>
>>>>>> When using XDP hints, metadata sometimes has to be much bigger
>>>>>> than 32 bytes. Relax the restriction, allow metadata larger than 32
>>>>>> bytes
>>>>>> and make __skb_metadata_differs() work with bigger lengths.
>>>>>>
>>>>>> Now size of metadata is only limited by the fact it is stored as u8
>>>>>> in skb_shared_info, so maximum possible value is 255.
>>>>>
>>>>> I'm confused, IIRC the metadata area isn't stored "in skb_shared_info".
>>>>> The maximum possible size is limited by the XDP headroom, which is also
>>>>> shared/limited with/by xdp_frame.  I must be reading the sentence
>>>>> wrong,
>>>>> somehow.
>>>
>>> skb_shared_info::meta_size  is u8. Since metadata gets carried from
>>> xdp_buff to skb, this check is needed (it's compile-time constant
>>> anyway).
>>> Check for headroom is done separately already (two sentences below).
>>>
>>
>> Damn, argh, for SKBs the "meta_len" is stored in skb_shared_info, which
>> is located on another cacheline.
>> That is a sure way to KILL performance! :-(
> 
> Have you read the code? I use type_max(typeof_member(shinfo, meta_len)),
> what performance are you talking about?
> 

Not talking about your changes (in this patch).

I'm realizing that SKBs using metadata area will have a performance hit
due to accessing another cacheline (the meta_len in skb_shared_info).

IIRC Daniel complained about this performance hit (in the past), I guess
this explains it.  IIRC Cilium changed to use percpu variables/datastore
to workaround this.


> The whole xdp_metalen_invalid() gets expanded into:
> 
> 	return (metalen % 4) || metalen > 255;
> 
> at compile-time. All those typeof shenanigans are only to not open-code
> meta_len's type/size/max.
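
A user-space sketch of that expansion, under the assumption that the limit is
derived from the width of a u8 meta_len field. type_max() and typeof_member()
are kernel macros; DEMO_META_MAX below is a simpler sizeof-based stand-in and
demo_shinfo is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: only the u8 meta_len field matters here. */
struct demo_shinfo {
	uint8_t meta_len;
};

/* Largest value the meta_len field can hold, derived from its size
 * (valid for unsigned fields narrower than unsigned long). */
#define DEMO_META_MAX \
	((1UL << (8 * sizeof(((struct demo_shinfo *)0)->meta_len))) - 1)

/* True when metalen is not a multiple of 4 or does not fit the field. */
static bool demo_metalen_invalid(unsigned long metalen)
{
	return (metalen % sizeof(uint32_t)) || metalen > DEMO_META_MAX;
}
```

Everything in the condition is a compile-time constant except metalen itself,
so nothing from skb_shared_info is read at run time.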
> 
>>
>> But only use for SKBs that gets created from xdp with metadata, right?
>>

Normal netstack processing actually accesses skb_shinfo->meta_len in
gro_list_prepare().  Since the caller dev_gro_receive() later accesses other
memory in skb_shared_info, the GRO code path already takes this hit
to begin with.

--Jesper


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable
  2023-05-19 16:46   ` Alexander Lobakin
@ 2023-05-22 15:03     ` Larysa Zaremba
  2023-05-22 15:36       ` Alexander Lobakin
  0 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-22 15:03 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Fri, May 19, 2023 at 06:46:31PM +0200, Alexander Lobakin wrote:
> From: Larysa Zaremba <larysa.zaremba@intel.com>
> Date: Fri, 12 May 2023 17:25:53 +0200
> 
> > Previously, we only needed RX hash in skb path,
> > hence all related code was written with skb in mind.
> > But with the addition of XDP hints via kfuncs to the ice driver,
> > the same logic will be needed in .xmo_() callbacks.
> > 
> > Separate generic process of reading RX hash from a descriptor
> > into a separate function.
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 38 +++++++++++++------
> >  1 file changed, 27 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > index c8322fb6f2b3..fc67bbf600af 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > @@ -63,28 +63,44 @@ static enum pkt_hash_types ice_ptype_to_htype(u16 ptype)
> >  }
> >  
> >  /**
> > - * ice_rx_hash - set the hash value in the skb
> > + * ice_copy_rx_hash_from_desc - copy hash value from descriptor to address
> > + * @rx_desc: specific descriptor
> > + * @dst: address to copy hash value to
> > + *
> > + * Returns true, if valid hash has been copied into the destination address.
> > + */
> > +static bool
> > +ice_copy_rx_hash_from_desc(union ice_32b_rx_flex_desc *rx_desc, u32 *dst)
> 
> @rx_desc can be const.

Yes

> 
> I'm also unsure about the naming. Why not name this one ice_rx_hash()
> and the one which sets it in skb ice_rx_hash_skb()?

I just think that

  ice_copy_rx_hash_from_desc(desc, &hash, ...);
  ice_copy_rx_hash_from_desc(desc, hash_ptr, ...);

communicates the intention (for a person that does not see a prototype) much 
better than

  ice_rx_hash(desc, &hash, ...);
  ice_rx_hash(desc, hash_ptr, ...);

But now when I think about that, 'from_desc' part can probably be dropped
with little to no impact, if we also replace 'copy' with sth more
descriptive, like:

  ice_read_rx_hash(desc, &hash, ...);
  ice_read_rx_hash(desc, hash_ptr, ...);

Same for timestamp functions.

Probably, the main reason I started naming functions this way was 
ice_get_vlan_tag_from_rx_desc().
'_from_rx_desc' part is pretty redundant there too.

I won't change '_to_skb' part though, I think function should show the direction 
of change it applies.

> 
> > +{
> > +	struct ice_32b_rx_flex_desc_nic *nic_mdid;
> 
> Also const. I thought you'll pick most of my optimizations from the
> related commit :D

Well, at some point I kinda forgot about the patch, because it wasn't very
useful at the start of development, to be honest. Should have looked at it in
the later stages though >_<

Will make nic_mdid const.

> 
> > +
> > +	if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
> > +		return false;
> > +
> > +	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
> > +	*dst = le32_to_cpu(nic_mdid->rss_hash);
> > +	return true;
> 
> You can just return the hash. `hash == 0` means there's no hash, so it
> basically means `false`, while non-zero is `true`.

Agree about both hash and timestamp.

Taking this comment and the earlier one into account, I'll name functions like
that:

ice_get_rx_hash()
ice_get_vlan_tag()
ice_ptp_get_rx_hwts_ns()

> 
> > +}
> > +
> > +/**
> > + * ice_rx_hash_to_skb - set the hash value in the skb
> >   * @rx_ring: descriptor ring
> >   * @rx_desc: specific descriptor
> >   * @skb: pointer to current skb
> >   * @rx_ptype: the ptype value from the descriptor
> >   */
> >  static void
> > -ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
> > -	    struct sk_buff *skb, u16 rx_ptype)
> > +ice_rx_hash_to_skb(struct ice_rx_ring *rx_ring,
> > +		   union ice_32b_rx_flex_desc *rx_desc,
> > +		   struct sk_buff *skb, u16 rx_ptype)
> >  {
> > -	struct ice_32b_rx_flex_desc_nic *nic_mdid;
> >  	u32 hash;
> >  
> >  	if (!(rx_ring->netdev->features & NETIF_F_RXHASH))
> >  		return;
> >  
> > -	if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
> > -		return;
> > -
> > -	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
> > -	hash = le32_to_cpu(nic_mdid->rss_hash);
> > -	skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
> > +	if (ice_copy_rx_hash_from_desc(rx_desc, &hash))
> 
> likely()? I wouldn't care about zero-hashed frames, their perf is not
> critical anyway.

Sure.

> 
> > +		skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
> >  }
> >  
> >  /**
> > @@ -186,7 +202,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
> >  		       union ice_32b_rx_flex_desc *rx_desc,
> >  		       struct sk_buff *skb, u16 ptype)
> >  {
> > -	ice_rx_hash(rx_ring, rx_desc, skb, ptype);
> > +	ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype);
> >  
> >  	/* modifies the skb - consumes the enet header */
> >  	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
> 
> Thanks,
> Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp reading code more reusable
  2023-05-19 16:52   ` Alexander Lobakin
@ 2023-05-22 15:07     ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-22 15:07 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Fri, May 19, 2023 at 06:52:13PM +0200, Alexander Lobakin wrote:
> From: Larysa Zaremba <larysa.zaremba@intel.com>
> Date: Fri, 12 May 2023 17:25:54 +0200
> 
> > Previously, we only needed RX HW timestamp in skb path,
> > hence all related code was written with skb in mind.
> > But with the addition of XDP hints via kfuncs to the ice driver,
> > the same logic will be needed in .xmo_() callbacks.
> 
> [...]
> 
> > @@ -2176,9 +2174,8 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
> >  	ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high);
> >  	ts_ns = ice_ptp_extend_32b_ts(cached_time, ts_high);
> >  
> > -	hwtstamps = skb_hwtstamps(skb);
> > -	memset(hwtstamps, 0, sizeof(*hwtstamps));
> > -	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
> > +	*dst = ts_ns;
> > +	return true;
> 
> Can't we use the same approach I described in the previous comment, i.e.
> return 0 or the timestamp? I don't think ts == 0 is valid.
>

Agreed with this in the answer to the previous email :)
 
> >  }
> >  
> >  /**
> 
> [...]
> 
> > + * The driver receives a notification in the receive descriptor with timestamp.
> > + * The timestamp is in ns, so we must convert the result first.
> > + */
> > +static void
> > +ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring,
> > +		       union ice_32b_rx_flex_desc *rx_desc,
> > +		       struct sk_buff *skb)
> > +{
> > +	struct skb_shared_hwtstamps *hwtstamps;
> > +	u64 ts_ns;
> > +
> > +	if (!ice_ptp_copy_rx_hwts_from_desc(rx_ring, rx_desc, &ts_ns))
> > +		return;
> > +
> > +	hwtstamps = skb_hwtstamps(skb);
> > +	memset(hwtstamps, 0, sizeof(*hwtstamps));
> > +	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
> 
> Ok, my optimizations aren't in this series :D
> If you look at the hwtimestamps in skb, you'll see all that can be
> minimized to just:
> 
> 	*skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){
> 		.hwtstamp	= ns_to_ktime(ts_ns),
> 	};
> 
> The compiler will probably do its job, but I wouldn't always rely on it.
> Sometimes it even fails to expand memset(8 bytes) to *(u64 *) = 0.

Ok, will fix.

> 
> > +}
> > +
> >  /**
> >   * ice_process_skb_fields - Populate skb header fields from Rx descriptor
> >   * @rx_ring: Rx descriptor ring packet is being transacted on
> > @@ -210,7 +235,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
> >  	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
> >  
> >  	if (rx_ring->ptp_rx)
> > -		ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb);
> > +		ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);
> >  }
> >  
> >  /**
> 
> Thanks,
> Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-22 11:41             ` Jesper Dangaard Brouer
@ 2023-05-22 15:28               ` Alexander Lobakin
  2023-05-22 15:55                 ` Daniel Borkmann
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-22 15:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Daniel Borkmann
  Cc: brouer, Larysa Zaremba, bpf, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Mon, 22 May 2023 13:41:43 +0200

> 
> 
> On 19/05/2023 18.35, Alexander Lobakin wrote:
>> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
>> Date: Tue, 16 May 2023 17:35:27 +0200

[...]

> Not talking about your changes (in this patch).
> 
> I'm realizing that SKBs using metadata area will have a performance hit
> due to accessing another cacheline (the meta_len in skb_shared_info).
> 
> IIRC Daniel complained about this performance hit (in the past), I guess
> this explains it.  IIRC Cilium changed to use percpu variables/datastore
> to workaround this.

Why should we compare the metadata of skbs in GRO anyway? I was disabling it
in the old hints series (conditionally, if the driver asks). Moreover, if the
metadata contains a full checksum (or any other frame-unique field), GRO will
be broken completely due to this comparison. VLAN tags and hashes are okay.
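
A rough sketch of the problem, assuming the comparison boils down to a
byte-wise compare of the metadata area (the real __skb_metadata_differs() is
more elaborate). demo_meta is a hypothetical layout an XDP program might
write:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata layout written by an XDP program. */
struct demo_meta {
	uint32_t hash;
	uint16_t vlan_tci;
	uint16_t csum; /* full checksum: differs per frame */
};

/* Byte-wise compare, standing in for the GRO metadata check. */
static bool demo_meta_differs(const struct demo_meta *a,
			      const struct demo_meta *b)
{
	return memcmp(a, b, sizeof(*a)) != 0;
}
```

Two frames of the same flow share hash and VLAN tag, but their checksums
differ, so under this comparison they would never be merged.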

> 
> 
>> The whole xdp_metalen_invalid() gets expanded into:
>>
>>     return (metalen % 4) || metalen > 255;
>>
>> at compile-time. All those typeof shenanigans are only to not open-code
>> meta_len's type/size/max.
>>
>>>
>>> But only use for SKBs that gets created from xdp with metadata, right?
>>>
> 
> Normal netstack processing actually accesses skb_shinfo->meta_len in
> gro_list_prepare().  Since the caller dev_gro_receive() later accesses other
> memory in skb_shared_info, the GRO code path already takes this hit
> to begin with.

You access skb_shinfo() often even before running XDP program, for
example, when a frame is multi-buffer. Plus HW timestamps are also
there, and so on.

> 
> --Jesper
> 

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable
  2023-05-22 15:03     ` Larysa Zaremba
@ 2023-05-22 15:36       ` Alexander Lobakin
  0 siblings, 0 replies; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-22 15:36 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Mon, 22 May 2023 17:03:54 +0200

> On Fri, May 19, 2023 at 06:46:31PM +0200, Alexander Lobakin wrote:
>> From: Larysa Zaremba <larysa.zaremba@intel.com>
>> Date: Fri, 12 May 2023 17:25:53 +0200

[...]

>>> +	nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
>>> +	*dst = le32_to_cpu(nic_mdid->rss_hash);
>>> +	return true;
>>
>> You can just return the hash. `hash == 0` means there's no hash, so it
>> basically means `false`, while non-zero is `true`.
> 
> Agree about both hash and timestamp.
> 
> Taking this comment and the earlier one into account, I'll name functions like
> that:
> 
> ice_get_rx_hash()
> ice_get_vlan_tag()
> ice_ptp_get_rx_hwts_ns()

Sounds good to me!

[...]

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-22  8:37       ` Jesper Dangaard Brouer
@ 2023-05-22 15:48         ` Larysa Zaremba
  2023-05-23 10:16           ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-22 15:48 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: brouer, bpf, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Mon, May 22, 2023 at 10:37:33AM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 15/05/2023 18.09, Larysa Zaremba wrote:
> > On Mon, May 15, 2023 at 05:36:12PM +0200, Jesper Dangaard Brouer wrote:
> > > 
> > > 
> > > On 12/05/2023 17.26, Larysa Zaremba wrote:
> > > > Implement functionality that enables drivers to expose VLAN tag
> > > > to XDP code.
> > > > 
> > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > ---
> > > [...]
> > > 
> > > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > > index 41e5ca8643ec..eff21501609f 100644
> > > > --- a/net/core/xdp.c
> > > > +++ b/net/core/xdp.c
> > > > @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
> > > >    	return -EOPNOTSUPP;
> > > >    }
> > > 
> > > Remember below becomes part of main documentation on HW metadata hints:
> > >   - https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html
> > > 
> > > Hint: to compile the docs locally I use:
> > >   make SPHINXDIRS="networking" htmldocs
> > > 
> > > > +/**
> > > > + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
> > > 
> > > Is bpf_xdp_metadata_rx_ctag a good function name for the inner vlan tag?
> > > Likewise below "stag".
> > > 
> > > I cannot remember if the C-tag or S-tag is the inner or outer vlan tag.
> > > 
> > > When reading BPF code that use these function names, then I would have
> > > to ask Google for help, or find-and-read this doc.
> > > 
> > > Can we come-up with a more intuitive name, that e.g. helps when reading
> > > the BPF-prog code?
> > 
> > Well, my reasoning for such naming is that if someone can configure s-tag
> > stripping in ethtool with 'rx-vlan-stag-hw-parse', they shouldn't have any
> > problem with understanding those function names.
> > 
> 
> Naming is hard.  My perspective is conveying the meaning without having
> to be knowledgeable about ethtool VLAN commands.  My perspective is a
> casual BPF-programmer that reads "bpf_xdp_metadata_rx_stag()".
> Hopefully we can choose a name that says "vlan" somewhere, such that the
> person reading this doesn't have to look up the documentation to
> deduce that this code is related to VLANs.
> 
> > One possible improvement that comes to mind is maybe (similarly to ethtool) calling
> > c-tag just 'tag' and letting s-tag stay 'stag'. Because c-tag is this default
> > 802.1q tag, which is supported by various hardware, while s-tag is significantly
> > less widespread.
> > 
> > But there are many options, really.
> > 
> > What are your suggestions?
> > 
> 
> One suggestion is (the symmetrical):
>  * bpf_xdp_metadata_rx_vlan_inner_tag
>  * bpf_xdp_metadata_rx_vlan_outer_tag
> 
> As you say above the first "inner" VLAN tag is just the regular 802.1Q
> VLAN tag.  The concept of C-tag and S-tag is from 802.1ad that
> introduced the concept of double tagging.
> 
> Thus one could argue for shorter names like:
>  * bpf_xdp_metadata_rx_vlan_tag
>  * bpf_xdp_metadata_rx_vlan_outer_tag
>

AFAIK, outer tag is a broader term, it's pretty often used for stacked 802.1Q
headers. I can't find what exactly the expected behavior is for rxvlan and
rx-vlan-stag-hw-parse in ethtool, but the iavf documentation states that rxvlan
"enables outer or single 802.1Q VLAN stripping" and rx-vlan-stag-hw-parse
"enables outer or single 802.1ad VLAN stripping". This is consistent with how
ice hardware behaves. More credible sources would be welcome.

What about:
  * bpf_xdp_metadata_rx_vlan_tag
  * bpf_xdp_metadata_rx_vlan_qinq_tag

> 
> > > 
> > > > + * @ctx: XDP context pointer.
> > > > + * @vlan_tag: Return value pointer.
> > > > + *
> > > 
> > > IMHO right here, there should be a description.
> > > 
> > > E.g. for what a VLAN "tag" means.  I assume a "tag" isn't the VLAN id,
> > > but the raw VLAN tag that also contains the prio numbers etc.
> > > 
> > > Is this VLAN tag expected to be in network-byte-order?
> > > IMHO this doc should define what is expected (and driver devel must
> > > follow this).
> > 
> > Will specify that.
> > 
> > > 
> > > > + * Returns 0 on success or ``-errno`` on error.
> > > > + */
> > > > +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> > > > +{
> > > > +	return -EOPNOTSUPP;
> > > > +}
> > > > +
> > > > +/**
> > > > + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
> > > > + * @ctx: XDP context pointer.
> > > > + * @vlan_tag: Return value pointer.
> > > > + *
> 
> (p.s. Googling I find multiple definitions of what the "S" in S-tag
> means. The most reliable or statistically consistent seems to be
> "Service tag", or "Service provider tag".)
> 
> The description for the renamed "bpf_xdp_metadata_rx_vlan_outer_tag"
> should IMHO explain that the outer VLAN tag is often referred to as the S-tag
> (or Service-tag) in Q-in-Q (802.1ad) terminology.  Perhaps we can even spell
> out that some hardware supports extracting this S-tag (and must be configured
> via ethtool to do so).
> 
> A dump of the tool rx-vlan related commands:
> 
>   $ ethtool -k i40e2 | grep rx-vlan
>   rx-vlan-offload: on
>   rx-vlan-filter: on [fixed]
>   rx-vlan-stag-hw-parse: off [fixed]
>   rx-vlan-stag-filter: off [fixed]
> 
> 
> 
> 
> > > > + * Returns 0 on success or ``-errno`` on error.
> > > 
> > > IMHO we should provide more guidance to expected return codes, and what
> > > they mean.  IMHO driver developers must only return codes that are
> > > described here, and if they invent a new, add it as part of their patch.
> > 
> > That's a good suggestion, I will expand the comment to describe error codes used
> > so far.
> > 
> > > 
> > > See the formatting in bpf_xdp_metadata_rx_hash and check how this gets
> > > compiled into HTML.
> > > 
> > > 
> > > > + */
> > > > +__bpf_kfunc int bpf_xdp_metadata_rx_stag(const struct xdp_md *ctx, u16 *vlan_tag)
> > > > +{
> > > > +	return -EOPNOTSUPP;
> > > > +}
> > > > +
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking code more reusable
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking " Larysa Zaremba
@ 2023-05-22 15:51   ` Alexander Lobakin
  2023-05-22 16:05     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-22 15:51 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Fri, 12 May 2023 17:25:55 +0200

> Previously, we only needed RX checksum flags in skb path,
> hence all related code was written with skb in mind.
> But with the addition of XDP hints via kfuncs to the ice driver,
> the same logic will be needed in .xmo_() callbacks.
> 
> Put generic process of determining checksum status into
> a separate function.
> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 71 ++++++++++++-------
>  1 file changed, 46 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index 1aab79dc8915..6a4fd3f3fc0a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -104,17 +104,17 @@ ice_rx_hash_to_skb(struct ice_rx_ring *rx_ring,
>  }
>  
>  /**
> - * ice_rx_csum - Indicate in skb if checksum is good
> - * @ring: the ring we care about
> - * @skb: skb currently being received and modified
> + * ice_rx_csum_checked - Indicates, whether hardware has checked the checksum

%CHECKSUM_UNNECESSARY means that the csum is correct / frame is not
damaged. So "checked" is not enough I'd say, it's "verified" at least.
OTOH that's too long already, I'd go with classic "csum_ok" :D

>   * @rx_desc: the receive descriptor
>   * @ptype: the packet type decoded by hardware
> + * @csum_lvl_dst: address to put checksum level into
> + * @ring: ring for error stats, can be NULL
>   *
> - * skb->protocol must be set before this function is called
> + * Returns true, if hardware has checked the checksum.
>   */
> -static void
> -ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
> -	    union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
> +static bool
> +ice_rx_csum_checked(union ice_32b_rx_flex_desc *rx_desc, u16 ptype,

(also const, but I guess you'll do that either way after the previous
 mails)

> +		    u8 *csum_lvl_dst, struct ice_rx_ring *ring)
>  {
>  	struct ice_rx_ptype_decoded decoded;
>  	u16 rx_status0, rx_status1;

[...]

> +/**
> + * ice_rx_csum_into_skb - Indicate in skb if checksum is good
> + * @ring: the ring we care about
> + * @skb: skb currently being received and modified
> + * @rx_desc: the receive descriptor
> + * @ptype: the packet type decoded by hardware
> + */
> +static void
> +ice_rx_csum_into_skb(struct ice_rx_ring *ring, struct sk_buff *skb,
> +		     union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
> +{
> +	u8 csum_level = 0;

I'm not a fan of variables shorter than u32 on the stack. And since it
gets passed by a reference, I'm not sure the compiler will inline it =\

> +
> +	/* Start with CHECKSUM_NONE and by default csum_level = 0 */
> +	skb->ip_summed = CHECKSUM_NONE;
> +	skb_checksum_none_assert(skb);

Can we also remove this? Neither of these makes sense. ::ip_summed is
always zeroed after the memset() in __build_skb_around() (somewhere
there), while the assertion checks for `skb->ip_summed ==
CHECKSUM_NONE`, i.e. it's *always* true here (set and check :D). It's
some ancient pathetic rituals copied over and over again from e100
centuries or so...

...and BTW the comment is misleading, because the code doesn't zero
::csum_level as they claim :D

> +
> +	/* check if Rx checksum is enabled */
> +	if (!(ring->netdev->features & NETIF_F_RXCSUM))
> +		return;
> +
> +	if (!ice_rx_csum_checked(rx_desc, ptype, &csum_level, ring))
> +		return;
> +
> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
> +	skb->csum_level = csum_level;

Since csum_level is useless when ip_summed is set to NONE, what do you
think about making the function return -1, 0, or 1 without writing
anything by reference?

	int csum_level;

	csum_level = ice_rx_csum_ok(rx_desc, ptype, ring);
	if (csum_level < 0)
		return;

	skb->ip_summed = CHECKSUM_UNNECESSARY;
	skb->csum_level = csum_level;

I'm not saying it's better (might be a bit at codegen), just proposing.
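
A minimal standalone sketch of that convention, in case it helps compare the
two shapes. Everything prefixed with sketch_ is hypothetical (it is not from
the patch), and the descriptor parsing is reduced to two booleans:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's ip_summed values (the real ones are in skbuff.h). */
enum { SKETCH_CSUM_NONE = 0, SKETCH_CSUM_UNNECESSARY = 1 };

/* Hypothetical, reduced view of what the descriptor parser learned. */
struct sketch_desc {
	bool hw_verified;	/* hardware validated the checksum */
	bool tunneled;		/* an inner checksum was validated as well */
};

/* Only the two skb fields this path touches. */
struct sketch_skb {
	uint8_t ip_summed;
	uint8_t csum_level;
};

/* Returns -1 if the checksum was not verified, else the checksum level (0 or 1). */
static int sketch_rx_csum_ok(const struct sketch_desc *desc)
{
	assert(desc);

	if (!desc->hw_verified)
		return -1;

	return desc->tunneled ? 1 : 0;
}

static void sketch_rx_csum_into_skb(struct sketch_skb *skb,
				    const struct sketch_desc *desc)
{
	int csum_level;

	assert(skb && desc);

	csum_level = sketch_rx_csum_ok(desc);
	if (csum_level < 0)
		return;	/* ip_summed keeps its zeroed (NONE) value */

	skb->ip_summed = SKETCH_CSUM_UNNECESSARY;
	skb->csum_level = (uint8_t)csum_level;
}
```

With this shape the caller needs no u8 on the stack, and the "not verified"
case is folded into the return value instead of a separate bool.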

>  }
>  
>  /**
> @@ -232,7 +253,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
>  	/* modifies the skb - consumes the enet header */
>  	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
>  
> -	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
> +	ice_rx_csum_into_skb(rx_ring, skb, rx_desc, ptype);
>  
>  	if (rx_ring->ptp_rx)
>  		ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32
  2023-05-22 15:28               ` Alexander Lobakin
@ 2023-05-22 15:55                 ` Daniel Borkmann
  0 siblings, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2023-05-22 15:55 UTC (permalink / raw)
  To: Alexander Lobakin, Jesper Dangaard Brouer
  Cc: brouer, Larysa Zaremba, bpf, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On 5/22/23 5:28 PM, Alexander Lobakin wrote:
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Date: Mon, 22 May 2023 13:41:43 +0200
>> On 19/05/2023 18.35, Alexander Lobakin wrote:
>>> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
>>> Date: Tue, 16 May 2023 17:35:27 +0200
> 
> [...]
> 
>> Not talking about your changes (in this patch).
>>
>> I'm realizing that SKBs using metadata area will have a performance hit
>> due to accessing another cacheline (the meta_len in skb_shared_info).
>>
>> IIRC Daniel complained about this performance hit (in the past), I guess
>> this explains it.  IIRC Cilium changed to use percpu variables/datastore
>> to workaround this.
> 
> Why should we compare metadata of skbs on GRO anyway? I was disabling it in
> the old hints series (conditionally, if driver asks), moreover...
> ...if metadata contains full checksum, GRO will be broken completely due
> to this comparison (or any other frame-unique fields. VLAN tags and
> hashes are okay).

This is when a BPF prog on XDP populates metadata with custom data because it
wants to transfer information from XDP to the skb, aka tc BPF prog, side. A
percpu data store may not work here, as it is not guaranteed that the skb will
end up on the same CPU.

>>> The whole xdp_metalen_invalid() gets expanded into:
>>>
>>>      return (metalen % 4) || metalen > 255;
>>>
>>> at compile-time. All those typeof shenanigans are only to not open-code
>>> meta_len's type/size/max.
>>>
>>>>
>>>> But it is only used for SKBs that get created from XDP with metadata, right?
>>>>
>>
>> Normal netstack processing actually accesses this skb_shinfo->meta_len in
>> gro_list_prepare().  As the caller dev_gro_receive() later accesses other
>> memory in skb_shared_info, the GRO code path already takes this hit
>> to begin with.
> 
> You access skb_shinfo() often even before running XDP program, for
> example, when a frame is multi-buffer. Plus HW timestamps are also
> there, and so on.
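
For reference, the xdp_metalen_invalid() expansion quoted earlier in this
message can be reproduced in plain userspace C. This is only a sketch: the
sketch_ names are made up, and the one-byte meta_len member is an assumption
taken from the "metalen > 255" bound in the quote:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the relevant member of skb_shared_info (assumed to be one byte). */
struct sketch_shinfo {
	uint8_t meta_len;
};

/* Largest value the unsigned member can hold, derived from its size only. */
#define SKETCH_META_MAX \
	((1UL << (8 * sizeof(((struct sketch_shinfo *)0)->meta_len))) - 1)

static_assert(SKETCH_META_MAX == 255, "meta_len is assumed to be one byte");

/* Invalid when not a multiple of 4 bytes or too large to store in meta_len. */
static bool sketch_metalen_invalid(unsigned long metalen)
{
	return (metalen % sizeof(uint32_t)) || metalen > SKETCH_META_MAX;
}
```

Since SKETCH_META_MAX is a compile-time constant, the check reduces to the
two comparisons from the quote; deriving it from the member just avoids
open-coding the limit.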

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking code more reusable
  2023-05-22 15:51   ` Alexander Lobakin
@ 2023-05-22 16:05     ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-22 16:05 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Mon, May 22, 2023 at 05:51:37PM +0200, Alexander Lobakin wrote:
> From: Larysa Zaremba <larysa.zaremba@intel.com>
> Date: Fri, 12 May 2023 17:25:55 +0200
> 
> > Previously, we only needed RX checksum flags in skb path,
> > hence all related code was written with skb in mind.
> > But with the addition of XDP hints via kfuncs to the ice driver,
> > the same logic will be needed in .xmo_() callbacks.
> > 
> > Put generic process of determining checksum status into
> > a separate function.
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 71 ++++++++++++-------
> >  1 file changed, 46 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > index 1aab79dc8915..6a4fd3f3fc0a 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > @@ -104,17 +104,17 @@ ice_rx_hash_to_skb(struct ice_rx_ring *rx_ring,
> >  }
> >  
> >  /**
> > - * ice_rx_csum - Indicate in skb if checksum is good
> > - * @ring: the ring we care about
> > - * @skb: skb currently being received and modified
> > + * ice_rx_csum_checked - Indicates, whether hardware has checked the checksum
> 
> %CHECKSUM_UNNECESSARY means that the csum is correct / frame is not
> damaged. So "checked" is not enough I'd say, it's "verified" at least.
> OTOH that's too long already, I'd go with classic "csum_ok" :D

'csum_ok' sounds good :) 'csum_valid' if we want to be fancy

> 
> >   * @rx_desc: the receive descriptor
> >   * @ptype: the packet type decoded by hardware
> > + * @csum_lvl_dst: address to put checksum level into
> > + * @ring: ring for error stats, can be NULL
> >   *
> > - * skb->protocol must be set before this function is called
> > + * Returns true, if hardware has checked the checksum.
> >   */
> > -static void
> > -ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
> > -	    union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
> > +static bool
> > +ice_rx_csum_checked(union ice_32b_rx_flex_desc *rx_desc, u16 ptype,
> 
> (also const, but I guess you'll do that either way after the previous
>  mails)

OK

> 
> > +		    u8 *csum_lvl_dst, struct ice_rx_ring *ring)
> >  {
> >  	struct ice_rx_ptype_decoded decoded;
> >  	u16 rx_status0, rx_status1;
> 
> [...]
> 
> > +/**
> > + * ice_rx_csum_into_skb - Indicate in skb if checksum is good
> > + * @ring: the ring we care about
> > + * @skb: skb currently being received and modified
> > + * @rx_desc: the receive descriptor
> > + * @ptype: the packet type decoded by hardware
> > + */
> > +static void
> > +ice_rx_csum_into_skb(struct ice_rx_ring *ring, struct sk_buff *skb,
> > +		     union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
> > +{
> > +	u8 csum_level = 0;
> 
> I'm not a fan of variables shorter than u32 on the stack. And since it
> gets passed by a reference, I'm not sure the compiler will inline it =\
> 
> > +
> > +	/* Start with CHECKSUM_NONE and by default csum_level = 0 */
> > +	skb->ip_summed = CHECKSUM_NONE;
> > +	skb_checksum_none_assert(skb);
> 
> Can we also remove this? Neither of these makes sense. ::ip_summed is
> always zeroed after the memset() in __build_skb_around() (somewhere
> there), while the assertion checks for `skb->ip_summed ==
> CHECKSUM_NONE`, i.e. it's *always* true here (set and check :D). It's
> some ancient pathetic rituals copied over and over again from e100
> centuries or so...

Will fix.

> 
> ...and BTW the comment is misleading, because the code doesn't zero
> ::csum_level as they claim :D
> 
> > +
> > +	/* check if Rx checksum is enabled */
> > +	if (!(ring->netdev->features & NETIF_F_RXCSUM))
> > +		return;
> > +
> > +	if (!ice_rx_csum_checked(rx_desc, ptype, &csum_level, ring))
> > +		return;
> > +
> > +	skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +	skb->csum_level = csum_level;
> 
> Since csum_level is useless when ip_summed is set to NONE, what do you
> think about making the function return -1, 0, or 1 without writing
> anything by reference?
> 
> 	int csum_level;
> 
> 	csum_level = ice_rx_csum_ok(rx_desc, ptype, ring);
> 	if (csum_level < 0)
> 		return;
> 
> 	skb->ip_summed = CHECKSUM_UNNECESSARY;
> 	skb->csum_level = csum_level;
> 
> I'm not saying it's better (might be a bit at codegen), just proposing.

I think it's worth a try.

> 
> >  }
> >  
> >  /**
> > @@ -232,7 +253,7 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
> >  	/* modifies the skb - consumes the enet header */
> >  	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
> >  
> > -	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
> > +	ice_rx_csum_into_skb(rx_ring, skb, rx_desc, ptype);
> >  
> >  	if (rx_ring->ptp_rx)
> >  		ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);
> 
> Thanks,
> Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff
  2023-05-12 15:25 ` [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff Larysa Zaremba
@ 2023-05-22 16:46   ` Alexander Lobakin
  2023-05-23  8:02     ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-22 16:46 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Fri, 12 May 2023 17:25:57 +0200

> In order to use XDP hints via kfuncs we need to put
> RX descriptor and ring pointers just next to xdp_buff.
> Same as in hints implementations in other drivers, we achieve
> this through putting xdp_buff into a child structure.
> 
> Currently, xdp_buff is stored in the ring structure,
> so replace it with union that includes child structure.
> This way enough memory is available while existing XDP code
> remains isolated from hints.
> 
> Size of the new child structure (ice_xdp_buff) is 72 bytes,
> therefore it does not fit into a single cache line.
> To at least place union at the start of cache line, move 'next'
> field from CL3 to CL1, as it isn't used often.
> 
> Placing union at the start of cache line makes at least xdp_buff
> and descriptor fit into a single CL,
> ring pointer is used less often, so it can spill into the next CL.

Spill or span?

> 
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  7 ++++--
>  drivers/net/ethernet/intel/ice/ice_txrx.h     | 23 ++++++++++++++++---
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 11 +++++++++
>  3 files changed, 36 insertions(+), 5 deletions(-)

[...]

> --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> @@ -260,6 +260,15 @@ enum ice_rx_dtype {
>  	ICE_RX_DTYPE_SPLIT_ALWAYS	= 2,
>  };
>  
> +struct ice_xdp_buff {
> +	struct xdp_buff xdp_buff;
> +	union ice_32b_rx_flex_desc *eop_desc;	/* Required for all metadata */

Probably can be const here as well after changing all the places
appropriately -- I don't think you write to it anywhere.

> +	/* End of the 1st cache line */
> +	struct ice_rx_ring *rx_ring;

Can't we get rid of ring dependency? Maybe there's only a couple fields
that could be copied here instead of referencing the ring? I just find
it weird that our drivers often look for something in the ring structure
to parse a descriptor ._.
If not, can't it be const?

> +};
> +
> +static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0);
> +
>  /* indices into GLINT_ITR registers */
>  #define ICE_RX_ITR	ICE_IDX_ITR0
>  #define ICE_TX_ITR	ICE_IDX_ITR1
> @@ -301,7 +310,6 @@ enum ice_dynamic_itr {
>  /* descriptor ring, associated with a VSI */
>  struct ice_rx_ring {
>  	/* CL1 - 1st cacheline starts here */
> -	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */
>  	void *desc;			/* Descriptor ring memory */
>  	struct device *dev;		/* Used for DMA mapping */
>  	struct net_device *netdev;	/* netdev ring maps to */
> @@ -313,12 +321,19 @@ struct ice_rx_ring {
>  	u16 count;			/* Number of descriptors */
>  	u16 reg_idx;			/* HW register index of the ring */
>  	u16 next_to_alloc;
> -	/* CL2 - 2nd cacheline starts here */
> +
>  	union {
>  		struct ice_rx_buf *rx_buf;
>  		struct xdp_buff **xdp_buf;
>  	};
> -	struct xdp_buff xdp;
> +	/* CL2 - 2nd cacheline starts here
> +	 * Size of ice_xdp_buff is 72 bytes,
> +	 * so it spills into CL3
> +	 */
> +	union {
> +		struct ice_xdp_buff xdp_ext;
> +		struct xdp_buff xdp;
> +	};

...or you can leave just one xdp_ext (naming it just "xdp") -- for now,
this union does literally nothing, as xdp_ext contains xdp at its very
beginning.

>  	/* CL3 - 3rd cacheline starts here */
>  	struct bpf_prog *xdp_prog;
>  	u16 rx_offset;
> @@ -328,6 +343,8 @@ struct ice_rx_ring {
>  	u16 next_to_clean;
>  	u16 first_desc;
>  
> +	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */

It can be placed even farther, somewhere near rcu_head -- IIRC it's not
used anywhere on hotpath. Even ::ring_stats below is hotter.

> +
>  	/* stats structs */
>  	struct ice_ring_stats *ring_stats;
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> index e1d49e1235b3..2835a8348237 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> @@ -151,4 +151,15 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
>  		       struct sk_buff *skb);
>  void
>  ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag);
> +
> +static inline void
> +ice_xdp_set_meta_srcs(struct xdp_buff *xdp,

Not sure about the naming... But can't propose anything :clownface:
ice_xdp_init_buff()? Like xdp_init_buff(), but ice_xdp_buff :D

> +		      union ice_32b_rx_flex_desc *eop_desc,
> +		      struct ice_rx_ring *rx_ring)
> +{
> +	struct ice_xdp_buff *xdp_ext = (struct ice_xdp_buff *)xdp;

I'd use container_of(), even though it will do the same thing here.
BTW, is having &xdp_buff at offset 0 still a requirement?

> +
> +	xdp_ext->eop_desc = eop_desc;
> +	xdp_ext->rx_ring = rx_ring;
> +}
>  #endif /* !_ICE_TXRX_LIB_H_ */

Thanks,
Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff
  2023-05-22 16:46   ` Alexander Lobakin
@ 2023-05-23  8:02     ` Larysa Zaremba
  2023-05-25 11:02       ` Alexander Lobakin
  0 siblings, 1 reply; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-23  8:02 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Mon, May 22, 2023 at 06:46:40PM +0200, Alexander Lobakin wrote:
> From: Larysa Zaremba <larysa.zaremba@intel.com>
> Date: Fri, 12 May 2023 17:25:57 +0200
> 
> > In order to use XDP hints via kfuncs we need to put
> > RX descriptor and ring pointers just next to xdp_buff.
> > Same as in hints implementations in other drivers, we achieve
> > this through putting xdp_buff into a child structure.
> > 
> > Currently, xdp_buff is stored in the ring structure,
> > so replace it with union that includes child structure.
> > This way enough memory is available while existing XDP code
> > remains isolated from hints.
> > 
> > Size of the new child structure (ice_xdp_buff) is 72 bytes,
> > therefore it does not fit into a single cache line.
> > To at least place union at the start of cache line, move 'next'
> > field from CL3 to CL1, as it isn't used often.
> > 
> > Placing union at the start of cache line makes at least xdp_buff
> > and descriptor fit into a single CL,
> > ring pointer is used less often, so it can spill into the next CL.
> 
> Spill or span?

I guess 'span' is the better word.

> 
> > 
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.c     |  7 ++++--
> >  drivers/net/ethernet/intel/ice/ice_txrx.h     | 23 ++++++++++++++++---
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 11 +++++++++
> >  3 files changed, 36 insertions(+), 5 deletions(-)
> 
> [...]
> 
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> > @@ -260,6 +260,15 @@ enum ice_rx_dtype {
> >  	ICE_RX_DTYPE_SPLIT_ALWAYS	= 2,
> >  };
> >  
> > +struct ice_xdp_buff {
> > +	struct xdp_buff xdp_buff;
> > +	union ice_32b_rx_flex_desc *eop_desc;	/* Required for all metadata */
> 
> Probably can be const here as well after changing all the places
> appropriately -- I don't think you write to it anywhere.

Correct.

> 
> > +	/* End of the 1st cache line */
> > +	struct ice_rx_ring *rx_ring;
> 
> Can't we get rid of ring dependency? Maybe there's only a couple fields
> that could be copied here instead of referencing the ring? I just find
> it weird that our drivers often look for something in the ring structure
> to parse a descriptor ._.
> If not, can't it be const?

You're right, I could put just rx_ring->cached_phctime into this structure.
But I recall you saying that if we access the ring for timestamps only, this is
not a problem :)

> 
> > +};
> > +
> > +static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0);
> > +
> >  /* indices into GLINT_ITR registers */
> >  #define ICE_RX_ITR	ICE_IDX_ITR0
> >  #define ICE_TX_ITR	ICE_IDX_ITR1
> > @@ -301,7 +310,6 @@ enum ice_dynamic_itr {
> >  /* descriptor ring, associated with a VSI */
> >  struct ice_rx_ring {
> >  	/* CL1 - 1st cacheline starts here */
> > -	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */
> >  	void *desc;			/* Descriptor ring memory */
> >  	struct device *dev;		/* Used for DMA mapping */
> >  	struct net_device *netdev;	/* netdev ring maps to */
> > @@ -313,12 +321,19 @@ struct ice_rx_ring {
> >  	u16 count;			/* Number of descriptors */
> >  	u16 reg_idx;			/* HW register index of the ring */
> >  	u16 next_to_alloc;
> > -	/* CL2 - 2nd cacheline starts here */
> > +
> >  	union {
> >  		struct ice_rx_buf *rx_buf;
> >  		struct xdp_buff **xdp_buf;
> >  	};
> > -	struct xdp_buff xdp;
> > +	/* CL2 - 2nd cacheline starts here
> > +	 * Size of ice_xdp_buff is 72 bytes,
> > +	 * so it spills into CL3
> > +	 */
> > +	union {
> > +		struct ice_xdp_buff xdp_ext;
> > +		struct xdp_buff xdp;
> > +	};
> 
> ...or you can leave just one xdp_ext (naming it just "xdp") -- for now,
> this union does literally nothing, as xdp_ext contains xdp at its very
> beginning.

I would like to leave non-meta-related code unaware of the existence of
ice_xdp_buff. Why access '&ring->xdp.xdp_buff' or '(struct xdp_buff *)xdp', when
we can do just 'ring->xdp'?
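
To illustrate the aliasing argument, a small standalone sketch. The types are
simplified stand-ins with a sketch_ prefix, not the real kernel or ice
structures:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct xdp_buff. */
struct sketch_xdp_buff {
	void *data;
	void *data_end;
};

/* Stand-in for ice_xdp_buff: base buffer first, hint sources after it. */
struct sketch_ice_xdp_buff {
	struct sketch_xdp_buff xdp_buff;
	const void *eop_desc;
	const void *rx_ring;
};

/* Stand-in for the ring: both views share the same storage. */
struct sketch_ring {
	union {
		struct sketch_ice_xdp_buff xdp_ext;
		struct sketch_xdp_buff xdp;
	};
};

static_assert(offsetof(struct sketch_ice_xdp_buff, xdp_buff) == 0,
	      "base buffer must sit at the start of the extended one");
static_assert(offsetof(struct sketch_ring, xdp) ==
	      offsetof(struct sketch_ring, xdp_ext),
	      "both union members start at the same address");

/* Code that knows nothing about hints keeps using the short name. */
static struct sketch_xdp_buff *sketch_ring_xdp(struct sketch_ring *ring)
{
	return &ring->xdp;
}
```

So the union costs nothing in layout; it only buys the shorter spelling for
code that does not care about metadata.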

> 
> >  	/* CL3 - 3rd cacheline starts here */
> >  	struct bpf_prog *xdp_prog;
> >  	u16 rx_offset;
> > @@ -328,6 +343,8 @@ struct ice_rx_ring {
> >  	u16 next_to_clean;
> >  	u16 first_desc;
> >  
> > +	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */
> 
> It can be placed even farther, somewhere near rcu_head -- IIRC it's not
> used anywhere on hotpath. Even ::ring_stats below is hotter.

Ok, I'll try to put it further from the start.

> 
> > +
> >  	/* stats structs */
> >  	struct ice_ring_stats *ring_stats;
> >  
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> > index e1d49e1235b3..2835a8348237 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> > @@ -151,4 +151,15 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
> >  		       struct sk_buff *skb);
> >  void
> >  ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag);
> > +
> > +static inline void
> > +ice_xdp_set_meta_srcs(struct xdp_buff *xdp,
> 
> Not sure about the naming... But can't propose anything :clownface:
> ice_xdp_init_buff()? Like xdp_init_buff(), but ice_xdp_buff :D

ice_xdp_init_buff() sounds exactly like a custom wrapper for xdp_init_buff(), but
the usage of those functions would be quite different. I've contemplated the naming
of this one for some time and think it's good enough as it is: at least it
communicates that the function has sth to do with 'xdp' and 'meta' and doesn't sound
like it fills in metadata.
> 
> > +		      union ice_32b_rx_flex_desc *eop_desc,
> > +		      struct ice_rx_ring *rx_ring)
> > +{
> > +	struct ice_xdp_buff *xdp_ext = (struct ice_xdp_buff *)xdp;
> 
> I'd use container_of(), even though it will do the same thing here.
> BTW, is having &xdp_buff at offset 0 still a requirement?

I'd actually forgotten why it is a requirement, but I have found my older
GitHub answer to you.

"AF_XDP implementation also assumes xdp_buff is at the start".

What I meant by that is xdp_buffs from xsk_pool have only tailroom.

Maybe I should add a comment about this next to static assert.
Will change to container_of, I guess it's more future-proof.
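
A rough userspace sketch of why container_of is the more future-proof option.
The macro and types below are simplified stand-ins (the kernel's container_of
does additional type checking), and the member is deliberately NOT at offset
0 to show the case where a plain cast would silently break:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace variant of container_of, without type checking. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct xdp_buff. */
struct sketch_xdp_buff {
	void *data;
	void *data_end;
};

/* Hypothetical layout where the base buffer is no longer the first member. */
struct sketch_moved_buff {
	const void *eop_desc;
	struct sketch_xdp_buff xdp_buff;
};

static_assert(offsetof(struct sketch_moved_buff, xdp_buff) != 0,
	      "this sketch relies on a non-zero member offset");

/* Recovers the enclosing structure wherever xdp_buff sits inside it. */
static struct sketch_moved_buff *sketch_to_ext(struct sketch_xdp_buff *xdp)
{
	return sketch_container_of(xdp, struct sketch_moved_buff, xdp_buff);
}
```

With the member at offset 0 both approaches give the same pointer, so the
static assert stays useful only for whatever else depends on that layout.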

> 
> > +
> > +	xdp_ext->eop_desc = eop_desc;
> > +	xdp_ext->rx_ring = rx_ring;
> > +}
> >  #endif /* !_ICE_TXRX_LIB_H_ */
> 
> Thanks,
> Olek

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-22 15:48         ` Larysa Zaremba
@ 2023-05-23 10:16           ` Jesper Dangaard Brouer
  2023-05-23 17:35             ` Larysa Zaremba
  0 siblings, 1 reply; 54+ messages in thread
From: Jesper Dangaard Brouer @ 2023-05-23 10:16 UTC (permalink / raw)
  To: Larysa Zaremba, Jesper Dangaard Brouer
  Cc: brouer, bpf, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel



On 22/05/2023 17.48, Larysa Zaremba wrote:
> On Mon, May 22, 2023 at 10:37:33AM +0200, Jesper Dangaard Brouer wrote:
>>
>>
>> On 15/05/2023 18.09, Larysa Zaremba wrote:
>>> On Mon, May 15, 2023 at 05:36:12PM +0200, Jesper Dangaard Brouer wrote:
>>>>
>>>>
>>>> On 12/05/2023 17.26, Larysa Zaremba wrote:
>>>>> Implement functionality that enables drivers to expose VLAN tag
>>>>> to XDP code.
>>>>>
>>>>> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>>>>> ---
>>>> [...]
>>>>
>>>>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>>>>> index 41e5ca8643ec..eff21501609f 100644
>>>>> --- a/net/core/xdp.c
>>>>> +++ b/net/core/xdp.c
>>>>> @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
>>>>>     	return -EOPNOTSUPP;
>>>>>     }
>>>>
>>>> Remember below becomes part of main documentation on HW metadata hints:
>>>>    - https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html
>>>>
>>>> Hint compiling locally I use:
>>>>    make SPHINXDIRS="networking" htmldocs
>>>>
>>>>> +/**
>>>>> + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
>>>>
>>>> Is bpf_xdp_metadata_rx_ctag a good function name for the inner vlan tag?
>>>> Likewise below "stag".
>>>>
>>>> I cannot remember if the C-tag or S-tag is the inner or outer vlan tag.
>>>>
>>>> When reading BPF code that uses these function names, I would have
>>>> to ask Google for help, or find-and-read this doc.
>>>>
>>>> Can we come up with a more intuitive name, that e.g. helps when reading
>>>> the BPF-prog code?
>>>
>>> Well, my reasoning for such naming is that if someone can configure s-tag
>>> stripping in ethtool with 'rx-vlan-stag-hw-parse', they shouldn't have any
>>> problem with understanding those function names.
>>>
>>
>> Naming is hard.  My perspective is conveying the meaning without having
>> to be knowledgeable about ethtool VLAN commands.  My perspective is a
>> casual BPF-programmer that reads "bpf_xdp_metadata_rx_stag()".
>> Hopefully we can choose a name that says "vlan" somewhere, such that the
>> person reading this doesn't have to lookup and find the documentation to
>> deduce that this code is related to VLANs.
>>
>>> One possible improvement that comes to mind is maybe (similarly to ethtool) calling
>>> c-tag just 'tag' and letting s-tag stay 'stag'. Because c-tag is this default
>>> 802.1q tag, which is supported by various hardware, while s-tag is significantly
>>> less widespread.
>>>
>>> But there are many options, really.
>>>
>>> What are your suggestions?
>>>
>>
>> One suggestion is (the symmetrical):
>>   * bpf_xdp_metadata_rx_vlan_inner_tag
>>   * bpf_xdp_metadata_rx_vlan_outer_tag
>>
>> As you say above the first "inner" VLAN tag is just the regular 802.1Q
>> VLAN tag.  The concept of C-tag and S-tag is from 802.1ad that
>> introduced the concept of double tagging.
>>
>> Thus one could argue for shorter names like:
>>   * bpf_xdp_metadata_rx_vlan_tag
>>   * bpf_xdp_metadata_rx_vlan_outer_tag
>>
> 
> AFAIK, "outer tag" is a broader term; it's pretty often used for stacked 802.1Q
> headers. I can't find what exactly the expected behavior is for rxvlan and
> rx-vlan-stag-hw-parse in ethtool, but the iavf documentation states that rxvlan
> "enables outer or single 802.1Q VLAN stripping" and rx-vlan-stag-hw-parse
> "enables outer or single 802.1ad VLAN stripping". This is consistent with how
> ice hardware behaves. More credible sources would be welcome.
> 

It would be good to figure out how other hardware behaves.

The iavf doc sounds like very similar behavior from both functions, just 
802.1Q vs 802.1ad.
Sounds like both will just pop/strip the outer vlan tag.
I have seen Ethertype 802.1Q being used (in practice) for double tagged
packets, even though 802.1ad should have been used to comply with the
standard.

> What about:
>    * bpf_xdp_metadata_rx_vlan_tag
>    * bpf_xdp_metadata_rx_vlan_qinq_tag
> 

This sounds good to me.

I do wonder if we really need two functions for this?
Would one function be enough?

Given the (iavf) description, the functions basically do the same.
Looking at your ice driver implementation, they could be merged into one
function, as it is the same location in the descriptor.

>>
>>>>
>>>>> + * @ctx: XDP context pointer.
>>>>> + * @vlan_tag: Return value pointer.
>>>>> + *
>>>>
>>>> IMHO right here, there should be a description.
>>>>
>>>> E.g. for what a VLAN "tag" means.  I assume a "tag" isn't the VLAN id,
>>>> but the raw VLAN tag that also contains the prio numbers etc.
>>>>
>>>> Is this VLAN tag expected to be in network byte order?
>>>> IMHO this doc should define what is expected (and driver devel must
>>>> follow this).
>>>
>>> Will specify that.
>>>
>>>>
>>>>> + * Returns 0 on success or ``-errno`` on error.
>>>>> + */
>>>>> +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
>>>>> +{
>>>>> +	return -EOPNOTSUPP;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
>>>>> + * @ctx: XDP context pointer.
>>>>> + * @vlan_tag: Return value pointer.
>>>>> + *
>>
>> (p.s. Googling I find multiple definitions of what the "S" in S-tag
>> means. The most reliable or statistically consistent seems to be
>> "Service tag", or "Service provider tag".)
>>
>> The description for the renamed "bpf_xdp_metadata_rx_vlan_outer_tag"
>> should IMHO explain that the outer VLAN tag is often referred to as the S-tag
>> (or Service-tag) in Q-in-Q (802.1ad) terminology.  Perhaps we can even spell
>> out that some hardware supports extracting this stag (and must be configured
>> via ethtool to do so).
>>
>> A dump of the tool rx-vlan related commands:
>>
>>    $ ethtool -k i40e2 | grep rx-vlan
>>    rx-vlan-offload: on
>>    rx-vlan-filter: on [fixed]
>>    rx-vlan-stag-hw-parse: off [fixed]
>>    rx-vlan-stag-filter: off [fixed]
>>
[...]


* Re: [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint
  2023-05-23 10:16           ` Jesper Dangaard Brouer
@ 2023-05-23 17:35             ` Larysa Zaremba
  0 siblings, 0 replies; 54+ messages in thread
From: Larysa Zaremba @ 2023-05-23 17:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: brouer, bpf, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jakub Kicinski,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Jiri Olsa, Jesse Brandeburg, Tony Nguyen,
	Anatoly Burakov, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

On Tue, May 23, 2023 at 12:16:46PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 22/05/2023 17.48, Larysa Zaremba wrote:
> > On Mon, May 22, 2023 at 10:37:33AM +0200, Jesper Dangaard Brouer wrote:
> > > 
> > > 
> > > On 15/05/2023 18.09, Larysa Zaremba wrote:
> > > > On Mon, May 15, 2023 at 05:36:12PM +0200, Jesper Dangaard Brouer wrote:
> > > > > 
> > > > > 
> > > > > On 12/05/2023 17.26, Larysa Zaremba wrote:
> > > > > > Implement functionality that enables drivers to expose VLAN tag
> > > > > > to XDP code.
> > > > > > 
> > > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > > > ---
> > > > > [...]
> > > > > 
> > > > > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > > > > index 41e5ca8643ec..eff21501609f 100644
> > > > > > --- a/net/core/xdp.c
> > > > > > +++ b/net/core/xdp.c
> > > > > > @@ -738,6 +738,30 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
> > > > > >     	return -EOPNOTSUPP;
> > > > > >     }
> > > > > 
> > > > > Remember below becomes part of main documentation on HW metadata hints:
> > > > >    - https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html
> > > > > 
> > > > > Hint: to compile the docs locally I use:
> > > > >    make SPHINXDIRS="networking" htmldocs
> > > > > 
> > > > > > +/**
> > > > > > + * bpf_xdp_metadata_rx_ctag - Read XDP packet inner vlan tag.
> > > > > 
> > > > > Is bpf_xdp_metadata_rx_ctag a good function name for the inner vlan tag?
> > > > > Likewise below "stag".
> > > > > 
> > > > > I cannot remember if the C-tag or S-tag is the inner or outer vlan tag.
> > > > > 
> > > > > When reading BPF code that uses these function names, I would have
> > > > > to ask Google for help, or find-and-read this doc.
> > > > > 
> > > > > Can we come-up with a more intuitive name, that e.g. helps when reading
> > > > > the BPF-prog code?
> > > > 
> > > > Well, my reasoning for such naming is that if someone can configure s-tag
> > > > stripping in ethtool with 'rx-vlan-stag-hw-parse', they shouldn't have any
> > > > problem with understanding those function names.
> > > > 
> > > 
> > > Naming is hard.  My perspective is conveying the meaning without having
> > > to be knowledgeable about ethtool VLAN commands.  My perspective is a
> > > casual BPF-programmer that reads "bpf_xdp_metadata_rx_stag()".
> > > Hopefully we can choose a name that says "vlan" somewhere, such that the
> > > person reading this doesn't have to lookup and find the documentation to
> > > deduce that this code is related to VLANs.
> > > 
> > > > One possible improvement that comes to mind is maybe (similarly to ethtool)
> > > > calling c-tag just 'tag' and letting s-tag stay 'stag', because c-tag is the
> > > > default 802.1Q tag, which is supported by various hardware, while s-tag is
> > > > significantly less widespread.
> > > > 
> > > > But there are many options, really.
> > > > 
> > > > What are your suggestions?
> > > > 
> > > 
> > > One suggestion is (the symmetrical):
> > >   * bpf_xdp_metadata_rx_vlan_inner_tag
> > >   * bpf_xdp_metadata_rx_vlan_outer_tag
> > > 
> > > As you say above the first "inner" VLAN tag is just the regular 802.1Q
> > > VLAN tag.  The concept of C-tag and S-tag is from 802.1ad that
> > > introduced the concept of double tagging.
> > > 
> > > Thus one could argue for shorter names like:
> > >   * bpf_xdp_metadata_rx_vlan_tag
> > >   * bpf_xdp_metadata_rx_vlan_outer_tag
> > > 
> > 
> > AFAIK, outer tag is a broader term, it's pretty often used for stacked 802.1Q
> > headers. I can't find what exactly is an expected behavior for rxvlan and
> > rx-vlan-stag-hw-parse in ethtool, but iavf documentation states that rxvlan
> > "enables outer or single 802.1Q VLAN stripping" and rx-vlan-stag-hw-parse
> > "enables outer or single 802.1ad VLAN stripping". This is consistent with how
> > ice hardware behaves. More credible sources would be welcome.
> > 
> 
> It would be good to figure out how other hardware behaves.
> 
> The iavf doc sounds like very similar behavior from both functions, just
> 802.1Q vs 802.1ad.
> Sounds like both will just pop/strip the outer vlan tag.
> I have seen Ethertype 802.1Q being used (in practice) for double tagged
> packets, even though 802.1ad should have been used to comply with the
> standard.
> 
> > What about:
> >    * bpf_xdp_metadata_rx_vlan_tag
> >    * bpf_xdp_metadata_rx_vlan_qinq_tag
> > 
> 
> This sounds good to me.
> 
> I do wonder if we really need two functions for this?
> Would one function be enough?
> 
> Given the (iavf) description, the functions basically do the same.
> Looking at your ice driver implementation, they could be merged into one
> function, as it is the same location in the descriptor.
>

This design was very debatable in the first place.
I looked once more at the different in-tree driver implementations of the
NETIF_F_HW_VLAN_STAG_RX feature. Among those I could comprehend, none seems to
store c-tag and s-tag separately. Actually, there are 2 situations:

1. (e.g. mlx4) HW always strips the outer or single VLAN tag, without
distinction between 802.1Q and 802.1ad. The TPID in such a case is deduced from
the descriptor. NETIF_F_HW_VLAN_STAG_RX and NETIF_F_HW_VLAN_CTAG_RX must be
enabled together.

2. (e.g. ice) HW strips the outer or single VLAN tag with a configured TPID. In
such a case the descriptor doesn't carry info about the TPID, because it's the
same for all stripped tags. C-tag and s-tag stripping are mutually exclusive.
Example:
 - 802.1Q double VLAN: with s-tag stripping enabled, the packet arrives
   untouched; with c-tag stripping, the outermost tag gets stripped.
 - 802.1ad+802.1Q: with s-tag stripping enabled, the 802.1ad header gets
   stripped; with c-tag stripping, the packet arrives untouched.
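
A minimal sketch of situation 2, assuming the hardware simply compares the
outermost TPID with the single TPID configured for stripping. The function and
macro names are hypothetical and only model the two examples above in plain
userspace C; this is not ice driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TPID_8021Q  0x8100u /* 802.1Q (C-tag) EtherType */
#define TPID_8021AD 0x88A8u /* 802.1ad (S-tag) EtherType */

/* Hypothetical model: the outermost tag is stripped only when its TPID
 * matches the one TPID configured for stripping.
 */
static bool hw_strips_outer_tag(uint16_t outer_tpid, uint16_t strip_tpid)
{
	return outer_tpid == strip_tpid;
}

/* The two examples from the message, expressed as checks. */
static void check_examples(void)
{
	/* 802.1Q double VLAN: the outermost TPID is 0x8100. */
	assert(!hw_strips_outer_tag(TPID_8021Q, TPID_8021AD)); /* s-tag stripping: untouched */
	assert(hw_strips_outer_tag(TPID_8021Q, TPID_8021Q));   /* c-tag stripping: stripped */

	/* 802.1ad + 802.1Q: the outermost TPID is 0x88A8. */
	assert(hw_strips_outer_tag(TPID_8021AD, TPID_8021AD)); /* s-tag stripping: stripped */
	assert(!hw_strips_outer_tag(TPID_8021AD, TPID_8021Q)); /* c-tag stripping: untouched */
}
```

Because only one TPID is configured at a time in this model, a stripped tag's
TPID is implied by the configuration, which is why the descriptor would not
need to carry it.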

Obviously, I can be sure only about our hardware.

Long story short, re-inventing the wheel probably wasn't a good idea on my part.
Now I am much more inclined to just copy the logic from skb, so the function
would look like this:

  bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, __u16 *vlan_tag,
			       __u16 *tpid);

Maybe some applications would make use of just:

  bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, __u16 *vlan_tag);

Both of the above functions would return information about the outermost tag,
if it was stripped. I would have to think about the naming.
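
A rough userspace model of the two-output variant, to show what a caller could
do with the returned values. Everything here is an assumption for illustration:
the struct, the model_rx_vlan_tag() name, the -ENODATA return for "nothing was
stripped", and the tag being the 16-bit TCI in host byte order. The thread has
not settled any of these.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for what a driver knows after parsing a descriptor. */
struct rx_vlan_info {
	bool stripped;  /* did HW strip the outermost tag? */
	uint16_t tci;   /* PCP (3 bits) | DEI (1 bit) | VID (12 bits), host order assumed */
	uint16_t tpid;  /* 0x8100 (802.1Q) or 0x88A8 (802.1ad) */
};

/* Model of the proposed call: fill both outputs only if a tag was stripped.
 * The -ENODATA error code is an assumption.
 */
static int model_rx_vlan_tag(const struct rx_vlan_info *info,
			     uint16_t *vlan_tag, uint16_t *tpid)
{
	if (!info->stripped)
		return -ENODATA;

	*vlan_tag = info->tci;
	*tpid = info->tpid;
	return 0;
}

/* Split the raw tag into its 802.1Q fields. */
static uint16_t tci_vid(uint16_t tci) { return tci & 0x0FFF; }
static uint8_t tci_dei(uint16_t tci) { return (tci >> 12) & 0x1; }
static uint8_t tci_pcp(uint16_t tci) { return tci >> 13; }
```

With a raw tag of 0xA064, for instance, the helpers give priority 5, DEI 0 and
VLAN ID 100, which is the distinction between "tag" and "VLAN id" raised
earlier in the review.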

Comments are welcome!

> > > 
> > > > > 
> > > > > > + * @ctx: XDP context pointer.
> > > > > > + * @vlan_tag: Return value pointer.
> > > > > > + *
> > > > > 
> > > > > IMHO right here, there should be a description.
> > > > > 
> > > > > E.g. for what a VLAN "tag" means.  I assume a "tag" isn't the VLAN id,
> > > > > but the raw VLAN tag that also contains the prio numbers etc.
> > > > > 
> > > > > Is this VLAN tag expected to be in network byte order?
> > > > > IMHO this doc should define what is expected (and driver devel must
> > > > > follow this).
> > > > 
> > > > Will specify that.
> > > > 
> > > > > 
> > > > > > + * Returns 0 on success or ``-errno`` on error.
> > > > > > + */
> > > > > > +__bpf_kfunc int bpf_xdp_metadata_rx_ctag(const struct xdp_md *ctx, u16 *vlan_tag)
> > > > > > +{
> > > > > > +	return -EOPNOTSUPP;
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * bpf_xdp_metadata_rx_stag - Read XDP packet outer vlan tag.
> > > > > > + * @ctx: XDP context pointer.
> > > > > > + * @vlan_tag: Return value pointer.
> > > > > > + *
> > > 
> > > (p.s. Googling I find multiple definitions of what the "S" in S-tag
> > > means. The most reliable or statistically consistent seems to be
> > > "Service tag", or "Service provider tag".)
> > > 
> > > The description for the renamed "bpf_xdp_metadata_rx_vlan_outer_tag"
> > > should IMHO explain that the outer VLAN tag is often referred to as the S-tag
> > > (or Service-tag) in Q-in-Q (802.1ad) terminology.  Perhaps we can even spell
> > > out that some hardware supports extracting this stag (and must be configured
> > > via ethtool to do so).
> > > 
> > > A dump of the tool rx-vlan related commands:
> > > 
> > >    $ ethtool -k i40e2 | grep rx-vlan
> > >    rx-vlan-offload: on
> > >    rx-vlan-filter: on [fixed]
> > >    rx-vlan-stag-hw-parse: off [fixed]
> > >    rx-vlan-stag-filter: off [fixed]
> > > 
> [...]
> 

* Re: [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff
  2023-05-23  8:02     ` Larysa Zaremba
@ 2023-05-25 11:02       ` Alexander Lobakin
  0 siblings, 0 replies; 54+ messages in thread
From: Alexander Lobakin @ 2023-05-25 11:02 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: bpf, Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jakub Kicinski, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Jiri Olsa,
	Jesse Brandeburg, Tony Nguyen, Anatoly Burakov,
	Jesper Dangaard Brouer, Alexander Lobakin, Magnus Karlsson,
	Maryam Tahhan, xdp-hints, netdev, intel-wired-lan, linux-kernel

From: Larysa Zaremba <larysa.zaremba@intel.com>
Date: Tue, 23 May 2023 10:02:42 +0200

> On Mon, May 22, 2023 at 06:46:40PM +0200, Alexander Lobakin wrote:
>> From: Larysa Zaremba <larysa.zaremba@intel.com>
>> Date: Fri, 12 May 2023 17:25:57 +0200
>>
>>> In order to use XDP hints via kfuncs we need to put
>>> RX descriptor and ring pointers just next to xdp_buff.
>>> Same as in hints implementations in other drivers, we archieve

                                                          ^^^^^^^^
                                                          achieve

I missed this one initially :D

>>> this through putting xdp_buff into a child structure.
>>>
>>> Currently, xdp_buff is stored in the ring structure,
>>> so replace it with union that includes child structure.
>>> This way enough memory is available while existing XDP code
>>> remains isolated from hints.

[...]

>>> +	/* End of the 1st cache line */
>>> +	struct ice_rx_ring *rx_ring;
>>
>> Can't we get rid of ring dependency? Maybe there's only a couple fields
>> that could be copied here instead of referencing the ring? I just find
>> it weird that our drivers often look for something in the ring structure
>> to parse a descriptor ._.
>> If not, can't it be const?
> 
> You're right, I could put just rx_ring->cached_phctime into this structure.
> But I recall you saying that if we access ring for timestamps only this is not a 
> problem :)

Sure, it's not a problem, I just thought it's overkill to put a pointer
to the ring here, since it's not needed to parse descriptors.
...checked right now, the function which processes the timestamp from a
descriptor really needs only ::cached_phctime from the ring, nothing
more. Sorta overkill I think :s This phctime would be enough to put here.
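
A sketch of that alternative, with made-up stand-in types: only the
cached_phctime name comes from the thread, everything else is hypothetical.
The extended buffer keeps a copy of the one value the timestamp code needs
instead of a pointer to the whole ring.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the RX ring; real rings hold many more fields. */
struct ring_model {
	uint64_t cached_phctime;
};

/* Hypothetical stand-in for the driver-private part that follows xdp_buff. */
struct xdp_ext_model {
	uint64_t cached_phctime; /* copied value, replaces the ring pointer */
};

/* Snapshot the cached PHC time when the buffer is prepared. */
static void set_meta_srcs(struct xdp_ext_model *ext,
			  const struct ring_model *ring)
{
	ext->cached_phctime = ring->cached_phctime;
}
```

The trade-off in this sketch is that the copy is a snapshot, so it would have
to be refreshed each time a buffer is prepared, whereas a pointer always sees
the ring's current value.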

> 
>>
>>> +};
>>> +
>>> +static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0);
>>> +
>>>  /* indices into GLINT_ITR registers */
>>>  #define ICE_RX_ITR	ICE_IDX_ITR0
>>>  #define ICE_TX_ITR	ICE_IDX_ITR1

[...]

>>> +		struct ice_xdp_buff xdp_ext;
>>> +		struct xdp_buff xdp;
>>> +	};
>>
>> ...or you can leave just one xdp_ext (naming it just "xdp") -- for now,
>> this union does literally nothing, as xdp_ext contains xdp at its very
>> beginning.
> 
> I would like to leave non-meta-related code rather unaware of the existence of
> ice_xdp_buff. Why access '&ring->xdp.xdp_buff' or '(struct xdp_buff *)xdp', when
> we can do just 'ring->xdp'?

Hmm, got it. On point :D

> 
>>
>>>  	/* CL3 - 3rd cacheline starts here */
>>>  	struct bpf_prog *xdp_prog;
>>>  	u16 rx_offset;

[...]

>>> +static inline void
>>> +ice_xdp_set_meta_srcs(struct xdp_buff *xdp,
>>
>> Not sure about the naming... But can't propose anything :clownface:
>> ice_xdp_init_buff()? Like xdp_init_buff(), but ice_xdp_buff :D
> 
> ice_xdp_init_buff() sounds exactly like a custom wrapper for xdp_init_buff(), but
> usage of those functions would be quite different. I've contemplated the naming 
> of this one for some time and think it's good enough as it is, at least it 
> communicates that function has sth to do with 'xdp' and 'meta' and doesn't sound 
> like it fills in metadata.

ice_xdp_prepare_buff() :D Just kiddin, "set_meta_srcs" is fine, too.

>>
>>> +		      union ice_32b_rx_flex_desc *eop_desc,
>>> +		      struct ice_rx_ring *rx_ring)
>>> +{
>>> +	struct ice_xdp_buff *xdp_ext = (struct ice_xdp_buff *)xdp;
>>
>> I'd use container_of(), even though it will do the same thing here.
>> BTW, is having &xdp_buff at offset 0 still a requirement?
> 
> I've actually forgotten why it is a requirement, but have found my older
> github answer to you.
> 
> "AF_XDP implementation also assumes xdp_buff is at the start".
> 
> What I meant by that is xdp_buffs from xsk_pool have only tailroom.

> 
> Maybe I should add a comment about this next to static assert.
> Will change to container_of, I guess it's more future-proof.

Ah, AF_XDP programs, right. Comment near the assertion + container_of()
sounds perfect.
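
A self-contained sketch of that combination, using simplified stand-in structs
rather than the real kernel types. The container_of() below is a reduced
userspace version (the kernel macro also type-checks its argument), and the
field names other than xdp_buff are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct xdp_buff; the fields are illustrative only. */
struct xdp_buff_model {
	void *data;
	void *data_end;
};

/* Stand-in for struct ice_xdp_buff. */
struct ice_xdp_buff_model {
	struct xdp_buff_model xdp_buff; /* must stay first */
	const void *eop_desc;
};

/* xsk_pool buffers only provide tailroom after xdp_buff, so driver-private
 * fields have to follow it: keep xdp_buff at offset 0.
 */
_Static_assert(offsetof(struct ice_xdp_buff_model, xdp_buff) == 0,
	       "xdp_buff must be the first member");

/* Simplified userspace container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct ice_xdp_buff_model *to_ext(struct xdp_buff_model *xdp)
{
	return container_of(xdp, struct ice_xdp_buff_model, xdp_buff);
}
```

With the member at offset 0 this yields the same address as a plain cast, but
it would keep working if the layout ever changed, while the assertion documents
why the layout must not change for AF_XDP.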

[...]

Thanks,
Olek

end of thread, other threads:[~2023-05-25 11:03 UTC | newest]

Thread overview: 54+ messages
2023-05-12 15:25 [PATCH RESEND bpf-next 00/15] new kfunc XDP hints and ice implementation Larysa Zaremba
2023-05-12 15:25 ` [PATCH RESEND bpf-next 01/15] ice: make RX hash reading code more reusable Larysa Zaremba
2023-05-19 16:46   ` Alexander Lobakin
2023-05-22 15:03     ` Larysa Zaremba
2023-05-22 15:36       ` Alexander Lobakin
2023-05-12 15:25 ` [PATCH RESEND bpf-next 02/15] ice: make RX HW timestamp " Larysa Zaremba
2023-05-19 16:52   ` Alexander Lobakin
2023-05-22 15:07     ` Larysa Zaremba
2023-05-12 15:25 ` [PATCH RESEND bpf-next 03/15] ice: make RX checksum checking " Larysa Zaremba
2023-05-22 15:51   ` Alexander Lobakin
2023-05-22 16:05     ` Larysa Zaremba
2023-05-12 15:25 ` [PATCH RESEND bpf-next 04/15] ice: Make ptype internal to descriptor info processing Larysa Zaremba
2023-05-12 15:25 ` [PATCH RESEND bpf-next 05/15] ice: Introduce ice_xdp_buff Larysa Zaremba
2023-05-22 16:46   ` Alexander Lobakin
2023-05-23  8:02     ` Larysa Zaremba
2023-05-25 11:02       ` Alexander Lobakin
2023-05-12 15:25 ` [PATCH RESEND bpf-next 06/15] ice: Support HW timestamp hint Larysa Zaremba
2023-05-12 18:19   ` Stanislav Fomichev
2023-05-16 16:17     ` Jesper Dangaard Brouer
2023-05-12 15:25 ` [PATCH RESEND bpf-next 07/15] ice: Support RX hash XDP hint Larysa Zaremba
2023-05-12 18:22   ` Stanislav Fomichev
2023-05-15 13:46     ` Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 08/15] ice: Support XDP hints in AF_XDP ZC mode Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 09/15] xdp: Add VLAN tag hint Larysa Zaremba
2023-05-12 18:28   ` Stanislav Fomichev
2023-05-15 15:36   ` Jesper Dangaard Brouer
2023-05-15 16:09     ` Larysa Zaremba
2023-05-22  8:37       ` Jesper Dangaard Brouer
2023-05-22 15:48         ` Larysa Zaremba
2023-05-23 10:16           ` Jesper Dangaard Brouer
2023-05-23 17:35             ` Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 10/15] ice: Implement " Larysa Zaremba
2023-05-12 18:31   ` Stanislav Fomichev
2023-05-15 13:41     ` Larysa Zaremba
2023-05-15 15:07       ` Jesper Dangaard Brouer
2023-05-15 15:45         ` Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 11/15] xdp: Add checksum level hint Larysa Zaremba
2023-05-12 18:34   ` Stanislav Fomichev
2023-05-15 13:49     ` Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 12/15] ice: Implement " Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 13/15] selftests/bpf: Allow VLAN packets in xdp_hw_metadata Larysa Zaremba
2023-05-12 18:33   ` Stanislav Fomichev
2023-05-15 14:05     ` Larysa Zaremba
2023-05-12 15:26 ` [PATCH RESEND bpf-next 14/15] net, xdp: allow metadata > 32 Larysa Zaremba
2023-05-15 16:17   ` Jesper Dangaard Brouer
2023-05-15 17:08     ` Larysa Zaremba
2023-05-16 12:37       ` Alexander Lobakin
2023-05-16 15:35         ` Jesper Dangaard Brouer
2023-05-19 16:35           ` Alexander Lobakin
2023-05-22 11:41             ` Jesper Dangaard Brouer
2023-05-22 15:28               ` Alexander Lobakin
2023-05-22 15:55                 ` Daniel Borkmann
2023-05-12 15:26 ` [PATCH RESEND bpf-next 15/15] selftests/bpf: Add flags and new hints to xdp_hw_metadata Larysa Zaremba
2023-05-12 18:37   ` Stanislav Fomichev
