bpf.vger.kernel.org archive mirror
* [PATCH bpf RFC 0/4] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash
@ 2023-03-28 20:15 Jesper Dangaard Brouer
  2023-03-28 20:15 ` [PATCH bpf RFC 1/4] xdp: rss hash types representation Jesper Dangaard Brouer
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-28 20:15 UTC
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
	ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
	anthony.l.nguyen, yoong.siang.song, boon.leong.ong,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Note: this series targets the 6.3-rc kernel via the bpf git tree.

The current API for bpf_xdp_metadata_rx_hash() (part of 6.3-rc) returns the
raw RSS hash value, but provides no information about the RSS hash type.

This patchset proposes using the return value of bpf_xdp_metadata_rx_hash()
to provide the RSS hash type.

---

Jesper Dangaard Brouer (4):
      xdp: rss hash types representation
      igc: bpf_xdp_metadata_rx_hash return xdp rss hash type
      veth: bpf_xdp_metadata_rx_hash return xdp rss hash type
      mlx5: bpf_xdp_metadata_rx_hash return xdp rss hash type


 drivers/net/ethernet/intel/igc/igc_main.c     | 22 ++++++-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  | 61 ++++++++++++++++++-
 drivers/net/veth.c                            |  2 +-
 include/linux/mlx5/device.h                   | 14 ++++-
 include/net/xdp.h                             | 54 ++++++++++++++++
 net/core/xdp.c                                |  4 +-
 6 files changed, 150 insertions(+), 7 deletions(-)

--



* [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-28 20:15 [PATCH bpf RFC 0/4] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash Jesper Dangaard Brouer
@ 2023-03-28 20:15 ` Jesper Dangaard Brouer
  2023-03-28 21:58   ` Stanislav Fomichev
  2023-03-29  8:10   ` Edward Cree
  2023-03-28 20:16 ` [PATCH bpf RFC 2/4] igc: bpf_xdp_metadata_rx_hash return xdp rss hash type Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-28 20:15 UTC
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
	ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
	anthony.l.nguyen, yoong.siang.song, boon.leong.ong,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

The RSS hash type specifies which portion of the packet data the NIC
hardware used when calculating the RSS hash value. RSS types focus on
Internet traffic protocols at OSI layers L3 and L4. L2 traffic (e.g. ARP)
often gets hash value zero and no RSS type. At L3 the focus is IPv4 vs.
IPv6, and at L4 primarily TCP vs. UDP, though some hardware also supports
SCTP.

Each NIC encodes hardware RSS types differently. Most hardware represents
the RSS hash type as a number. Determining L3 vs. L4 often requires a
mapping table, as the numbers usually follow no pattern or ordering by
OSI layer.

This patch introduces an XDP RSS hash type (xdp_rss_hash_type) that can be
seen both as a number ordered by OSI layer and as a bitmask that separates
IPv4 and IPv6 types for L4 protocols. There is room to extend it later
while keeping these properties. It maps and unifies the differences
between hardware-specific hash types.
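
The layout can be sketched in user space as follows. This is a hypothetical
re-creation, not the kernel header: SK_BIT()/SK_GENMASK() stand in for the
kernel's BIT()/GENMASK(), enum values are written as plain shifts instead of
FIELD_PREP_CONST(), and only a subset of the types is copied. One assumption
differs from the posted code: the IPv6 extension-header bit is placed at
bit 10, directly above the 3-bit IPv6 L4 field, which matches the "11-bits
in total" comment; the patch defines L4_IPV6_EX_BIT as BIT(9), which falls
inside RSS_L4_IPV6 = GENMASK(9,7). The static_assert lines check that the
fields do not overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified 32-bit stand-ins for the kernel's BIT() and GENMASK(). */
#define SK_BIT(n)        (1u << (n))
#define SK_GENMASK(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

/* Field partitioning from the patch; EX bit moved to bit 10 (assumption). */
#define SK_RSS_L3         SK_GENMASK(2, 0)  /* L3 type number 1-7 */
#define SK_L4_BIT         SK_BIT(3)         /* "this is an L4 hash" */
#define SK_RSS_L4_IPV4    SK_GENMASK(6, 4)  /* L4 protocol number, IPv4 */
#define SK_RSS_L4_IPV6    SK_GENMASK(9, 7)  /* L4 protocol number, IPv6 */
#define SK_RSS_L4         SK_GENMASK(9, 3)  /* all L4 bits */
#define SK_L4_IPV6_EX_BIT SK_BIT(10)        /* IPv6 extension header */

static_assert((SK_RSS_L3 & SK_RSS_L4) == 0, "L3 and L4 parts overlap");
static_assert((SK_RSS_L4_IPV4 & SK_RSS_L4_IPV6) == 0, "IPv4/IPv6 L4 fields overlap");
static_assert((SK_RSS_L4 & SK_L4_IPV6_EX_BIT) == 0, "EX bit inside L4 field");

enum sk_rss_hash_type {
	SK_RSS_TYPE_NONE           = 0,
	SK_RSS_TYPE_L3_IPV4        = 1,
	SK_RSS_TYPE_L3_IPV6        = 2,
	SK_RSS_TYPE_L3_IPV6_EX     = 4,
	SK_RSS_TYPE_L4_IPV4_TCP    = SK_L4_BIT | (1u << 4),
	SK_RSS_TYPE_L4_IPV4_UDP    = SK_L4_BIT | (2u << 4),
	SK_RSS_TYPE_L4_IPV6_TCP    = SK_L4_BIT | (1u << 7),
	SK_RSS_TYPE_L4_IPV6_UDP    = SK_L4_BIT | (2u << 7),
	SK_RSS_TYPE_L4_IPV6_TCP_EX = SK_RSS_TYPE_L4_IPV6_TCP | SK_L4_IPV6_EX_BIT,
};

/* Bitmask view: separate L3 from L4, and IPv4 from IPv6 at L4. */
static bool sk_is_l4(unsigned int t)      { return (t & SK_RSS_L4) != 0; }
static bool sk_is_l4_ipv4(unsigned int t) { return (t & SK_RSS_L4_IPV4) != 0; }
static bool sk_is_l4_ipv6(unsigned int t) { return (t & SK_RSS_L4_IPV6) != 0; }
```

With this layout sk_is_l4(SK_RSS_TYPE_L4_IPV6_TCP_EX) is true while
sk_is_l4(SK_RSS_TYPE_L3_IPV6_EX) is false, and every L3 value sorts below
every L4 value.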

This proposal changes the kfunc API bpf_xdp_metadata_rx_hash() to return
this RSS hash type on success.
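
With this change a caller sees three cases: a negative -errno, zero (a
valid hash of type XDP_RSS_TYPE_NONE), or a positive type. A minimal
user-space sketch of that decision, under these assumptions: the mask
values are copied by hand from the proposed enum, rx_hash_classify() and
enum rx_hash_kind are hypothetical names, and in an XDP program `ret`
would be the result of bpf_xdp_metadata_rx_hash(ctx, &hash).

```c
#include <assert.h>
#include <errno.h>

/* Hand-copied values from the proposed enum xdp_rss_hash_type. */
enum {
	RSS_TYPE_L3_MASK = 0x007,  /* XDP_RSS_TYPE_L3_MASK = GENMASK(2,0) */
	RSS_TYPE_L4_MASK = 0x3F8,  /* XDP_RSS_TYPE_L4_MASK = GENMASK(9,3) */
};

static_assert((RSS_TYPE_L3_MASK & RSS_TYPE_L4_MASK) == 0,
	      "L3 and L4 masks must be disjoint");

/* Hypothetical summary of what the kfunc's return value tells the caller. */
enum rx_hash_kind {
	RX_HASH_UNSUPPORTED,  /* driver does not implement the kfunc */
	RX_HASH_ABSENT,       /* no RX-hash for this frame, or other error */
	RX_HASH_L2,           /* hash valid, type XDP_RSS_TYPE_NONE / _L2 */
	RX_HASH_L3,
	RX_HASH_L4,
};

static enum rx_hash_kind rx_hash_classify(int ret)
{
	/* Errors must be handled first: a negative -errno has its upper
	 * bits set and could otherwise pass the mask tests below. */
	if (ret == -EOPNOTSUPP)
		return RX_HASH_UNSUPPORTED;
	if (ret < 0)
		return RX_HASH_ABSENT;
	if (ret & RSS_TYPE_L4_MASK)
		return RX_HASH_L4;
	if (ret & RSS_TYPE_L3_MASK)
		return RX_HASH_L3;
	return RX_HASH_L2;
}
```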

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 net/core/xdp.c    |    4 +++-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 5393b3ebe56e..63f462f5ea7f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -8,6 +8,7 @@
 
 #include <linux/skbuff.h> /* skb_shared_info */
 #include <uapi/linux/netdev.h>
+#include <linux/bitfield.h>
 
 /**
  * DOC: XDP RX-queue information
@@ -396,6 +397,56 @@ XDP_METADATA_KFUNC_xxx
 MAX_XDP_METADATA_KFUNC,
 };
 
+/* For partitioning of xdp_rss_hash_type */
+#define RSS_L3		GENMASK(2,0) /* 3-bits = values between 1-7 */
+#define L4_BIT		BIT(3)       /* 1-bit - L4 indication */
+#define RSS_L4_IPV4	GENMASK(6,4) /* 3-bits */
+#define RSS_L4_IPV6	GENMASK(9,7) /* 3-bits */
+#define RSS_L4		GENMASK(9,3) /* = 7-bits - covering L4 IPV4+IPV6 */
+#define L4_IPV6_EX_BIT	BIT(9)       /* 1-bit - L4 IPv6 with Extension hdr */
+				     /* 11-bits in total */
+
+/* The XDP RSS hash type (xdp_rss_hash_type) can be seen both as a number
+ * ordered by OSI layer and as a bitmask that separates IPv4 and IPv6 types
+ * for L4 protocols. Room is available for extending it later while keeping
+ * these properties, as it needs to cover NIC hardware RSS types.
+ */
+enum xdp_rss_hash_type {
+	XDP_RSS_TYPE_NONE            = 0,
+	XDP_RSS_TYPE_L2              = XDP_RSS_TYPE_NONE,
+
+	XDP_RSS_TYPE_L3_MASK         = RSS_L3,
+	XDP_RSS_TYPE_L3_IPV4         = FIELD_PREP_CONST(RSS_L3, 1),
+	XDP_RSS_TYPE_L3_IPV6         = FIELD_PREP_CONST(RSS_L3, 2),
+	XDP_RSS_TYPE_L3_IPV6_EX      = FIELD_PREP_CONST(RSS_L3, 4),
+
+	XDP_RSS_TYPE_L4_MASK         = RSS_L4,
+	XDP_RSS_TYPE_L4_SHIFT        = __bf_shf(RSS_L4),
+	XDP_RSS_TYPE_L4_MASK_EX      = RSS_L4 | L4_IPV6_EX_BIT,
+
+	XDP_RSS_TYPE_L4_IPV4_MASK    = RSS_L4_IPV4,
+	XDP_RSS_TYPE_L4_BIT          = L4_BIT,
+	XDP_RSS_TYPE_L4_IPV4_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 1),
+	XDP_RSS_TYPE_L4_IPV4_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 2),
+	XDP_RSS_TYPE_L4_IPV4_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 3),
+
+	XDP_RSS_TYPE_L4_IPV6_MASK    = RSS_L4_IPV6,
+	XDP_RSS_TYPE_L4_IPV6_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 1),
+	XDP_RSS_TYPE_L4_IPV6_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 2),
+	XDP_RSS_TYPE_L4_IPV6_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 3),
+
+	XDP_RSS_TYPE_L4_IPV6_EX_MASK = L4_IPV6_EX_BIT,
+	XDP_RSS_TYPE_L4_IPV6_TCP_EX  = XDP_RSS_TYPE_L4_IPV6_TCP |L4_IPV6_EX_BIT,
+	XDP_RSS_TYPE_L4_IPV6_UDP_EX  = XDP_RSS_TYPE_L4_IPV6_UDP |L4_IPV6_EX_BIT,
+	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP|L4_IPV6_EX_BIT,
+};
+#undef RSS_L3
+#undef L4_BIT
+#undef RSS_L4_IPV4
+#undef RSS_L4_IPV6
+#undef RSS_L4
+#undef L4_IPV6_EX_BIT
+
 #ifdef CONFIG_NET
 u32 bpf_xdp_metadata_kfunc_id(int id);
 bool bpf_dev_bound_kfunc_id(u32 btf_id);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 7133017bcd74..81d41df30695 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -721,12 +721,14 @@ __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *tim
  * @hash: Return value pointer.
  *
  * Return:
- * * Returns 0 on success or ``-errno`` on error.
+ * * Returns (non-negative) RSS hash **type** on success or ``-errno`` on error.
+ * * ``enum xdp_rss_hash_type`` : RSS hash type
  * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
  * * ``-ENODATA``    : means no RX-hash available for this frame
  */
 __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
 {
+	BTF_TYPE_EMIT(enum xdp_rss_hash_type);
 	return -EOPNOTSUPP;
 }
 




* [PATCH bpf RFC 2/4] igc: bpf_xdp_metadata_rx_hash return xdp rss hash type
  2023-03-28 20:15 [PATCH bpf RFC 0/4] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash Jesper Dangaard Brouer
  2023-03-28 20:15 ` [PATCH bpf RFC 1/4] xdp: rss hash types representation Jesper Dangaard Brouer
@ 2023-03-28 20:16 ` Jesper Dangaard Brouer
  2023-03-28 20:16 ` [PATCH bpf RFC 3/4] veth: " Jesper Dangaard Brouer
  2023-03-28 20:16 ` [PATCH bpf RFC 4/4] mlx5: " Jesper Dangaard Brouer
  3 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-28 20:16 UTC
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
	ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
	anthony.l.nguyen, yoong.siang.song, boon.leong.ong,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Update the API for bpf_xdp_metadata_rx_hash() to return the XDP RSS hash
type, using a mapping table.
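
The lookup relies on masking the hardware field before indexing, so the
table needs one entry per possible masked value instead of a range check.
A hedged user-space sketch of that pattern: the 16-entry size follows the
patch's comment about keeping the array sized for a SW bit-mask, while the
hardware numbers used here, hw_to_xdp[] and hw_rss_to_xdp() are
illustrative assumptions rather than igc definitions.

```c
#include <assert.h>
#include <stdint.h>

/* A few hand-copied values from the proposed enum xdp_rss_hash_type. */
enum { T_NONE = 0x00, T_L3_IPV4 = 0x01, T_L4_IPV4_TCP = 0x18, T_L4_IPV4_UDP = 0x28 };

#define HW_RSS_TYPE_MASK  0x0Fu  /* assumed 4-bit HW RSS type field */
#define HW_RSS_TABLE_SIZE 16     /* one slot per masked value */

static_assert(HW_RSS_TABLE_SIZE == HW_RSS_TYPE_MASK + 1,
	      "table must cover every masked index");

/* Illustrative HW numbering; slots not listed (incl. reserved ones) are 0. */
static const int hw_to_xdp[HW_RSS_TABLE_SIZE] = {
	[0] = T_NONE,         /* no hash */
	[1] = T_L4_IPV4_TCP,
	[2] = T_L3_IPV4,
	[7] = T_L4_IPV4_UDP,
};

static int hw_rss_to_xdp(uint32_t desc_word)
{
	/* Mask first: the index is always 0-15, so no bounds check is needed. */
	return hw_to_xdp[desc_word & HW_RSS_TYPE_MASK];
}
```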

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c |   22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index b382476f347c..a14f0597524a 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6496,6 +6496,26 @@ static int igc_xdp_rx_timestamp(const struct xdp_md *_ctx, u64 *timestamp)
 	return -ENODATA;
 }
 
+/* Mapping HW RSS Type to enum xdp_rss_hash_type */
+static const enum xdp_rss_hash_type igc_xdp_rss_type[IGC_RSS_TYPE_MAX_TABLE] = {
+	[IGC_RSS_TYPE_NO_HASH]		= XDP_RSS_TYPE_L2,
+	[IGC_RSS_TYPE_HASH_TCP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_TCP,
+	[IGC_RSS_TYPE_HASH_IPV4]	= XDP_RSS_TYPE_L3_IPV4,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_TCP,
+	[IGC_RSS_TYPE_HASH_IPV6_EX]	= XDP_RSS_TYPE_L3_IPV6_EX,
+	[IGC_RSS_TYPE_HASH_IPV6]	= XDP_RSS_TYPE_L3_IPV6,
+	[IGC_RSS_TYPE_HASH_TCP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX,
+	[IGC_RSS_TYPE_HASH_UDP_IPV4]	= XDP_RSS_TYPE_L4_IPV4_UDP,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6]	= XDP_RSS_TYPE_L4_IPV6_UDP,
+	[IGC_RSS_TYPE_HASH_UDP_IPV6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX,
+	[10] = XDP_RSS_TYPE_NONE, /* RSS Type above 9 "Reserved" by HW  */
+	[11] = XDP_RSS_TYPE_NONE, /* keep array sized for SW bit-mask   */
+	[12] = XDP_RSS_TYPE_NONE, /* to handle future HW revisions      */
+	[13] = XDP_RSS_TYPE_NONE,
+	[14] = XDP_RSS_TYPE_NONE,
+	[15] = XDP_RSS_TYPE_NONE,
+};
+
 static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash)
 {
 	const struct igc_xdp_buff *ctx = (void *)_ctx;
@@ -6505,7 +6525,7 @@ static int igc_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash)
 
 	*hash = le32_to_cpu(ctx->rx_desc->wb.lower.hi_dword.rss);
 
-	return 0;
+	return igc_xdp_rss_type[igc_rss_type(ctx->rx_desc)];
 }
 
 const struct xdp_metadata_ops igc_xdp_metadata_ops = {




* [PATCH bpf RFC 3/4] veth: bpf_xdp_metadata_rx_hash return xdp rss hash type
  2023-03-28 20:15 [PATCH bpf RFC 0/4] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash Jesper Dangaard Brouer
  2023-03-28 20:15 ` [PATCH bpf RFC 1/4] xdp: rss hash types representation Jesper Dangaard Brouer
  2023-03-28 20:16 ` [PATCH bpf RFC 2/4] igc: bpf_xdp_metadata_rx_hash return xdp rss hash type Jesper Dangaard Brouer
@ 2023-03-28 20:16 ` Jesper Dangaard Brouer
  2023-03-28 20:16 ` [PATCH bpf RFC 4/4] mlx5: " Jesper Dangaard Brouer
  3 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-28 20:16 UTC
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
	ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
	anthony.l.nguyen, yoong.siang.song, boon.leong.ong,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Update the API for bpf_xdp_metadata_rx_hash() to return the XDP RSS hash type.

The veth driver currently only supports XDP-hints on the SKB code path.
The SKB has lost the RSS hash type information, as it is compressed down
to a single bitfield, skb->l4_hash, which only records whether this was
an L4 hash value.

In preparation for veth, xdp_rss_hash_type has an L4 indication bit that
allows us to return a meaningful L4 indication when working with
SKB-based packets.
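
Since the SKB only keeps a single L4 flag, the best the veth path can
return is the bare L4 bit or XDP_RSS_TYPE_NONE. A hedged user-space sketch
of that mapping: struct fake_skb and veth_like_rx_hash() are hypothetical
stand-ins for the real sk_buff and veth_xdp_rx_hash(), and the constants
are copied by hand from the proposed enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hand-copied values from the proposed enum xdp_rss_hash_type. */
#define XDP_RSS_TYPE_NONE    0x000
#define XDP_RSS_TYPE_L4_BIT  0x008  /* BIT(3) */
#define XDP_RSS_TYPE_L4_MASK 0x3F8  /* GENMASK(9,3) */

/* The bare L4 bit lies inside the L4 mask, so mask-based checks accept it. */
static_assert((XDP_RSS_TYPE_L4_BIT & XDP_RSS_TYPE_L4_MASK) != 0,
	      "L4 bit must be covered by the L4 mask");

/* Hypothetical stand-in for the two sk_buff fields the veth path uses. */
struct fake_skb {
	uint32_t hash;     /* value skb_get_hash() would return */
	bool     l4_hash;  /* skb->l4_hash */
};

static int veth_like_rx_hash(const struct fake_skb *skb, uint32_t *hash)
{
	*hash = skb->hash;
	/* Only "L4 or not" survives in the SKB, so report just that. */
	return skb->l4_hash ? XDP_RSS_TYPE_L4_BIT : XDP_RSS_TYPE_NONE;
}
```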

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/veth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 046461ee42ea..6b1084e39b25 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1624,7 +1624,7 @@ static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
 		return -ENODATA;
 
 	*hash = skb_get_hash(_ctx->skb);
-	return 0;
+	return _ctx->skb->l4_hash ? XDP_RSS_TYPE_L4_BIT : XDP_RSS_TYPE_NONE;
 }
 
 static const struct net_device_ops veth_netdev_ops = {




* [PATCH bpf RFC 4/4] mlx5: bpf_xdp_metadata_rx_hash return xdp rss hash type
  2023-03-28 20:15 [PATCH bpf RFC 0/4] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2023-03-28 20:16 ` [PATCH bpf RFC 3/4] veth: " Jesper Dangaard Brouer
@ 2023-03-28 20:16 ` Jesper Dangaard Brouer
  3 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-28 20:16 UTC
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Stanislav Fomichev, martin.lau,
	ast, daniel, alexandr.lobakin, larysa.zaremba, xdp-hints,
	anthony.l.nguyen, yoong.siang.song, boon.leong.ong,
	intel-wired-lan, pabeni, jesse.brandeburg, kuba, edumazet,
	john.fastabend, hawk, davem

Update the API for bpf_xdp_metadata_rx_hash() to return the XDP RSS hash
type, using a mapping table.

The mlx5 hardware can also identify IPSEC and compute the RSS hash over
it. This indicates that the hash includes the SPI (Security Parameters
Index) as part of the IPSEC hash.

Extend the XDP core enum xdp_rss_hash_type with IPSEC hash types.
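
The driver folds the two 2-bit CQE fields into a single 4-bit table index.
A hedged user-space sketch of just that folding step: the mask values
restate the CQE_RSS_HTYPE_IP/CQE_RSS_HTYPE_L4 layout documented in
include/linux/mlx5/device.h, the shift by 6 replaces FIELD_GET(), and
mlx5_like_lookup_index() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* cqe->rss_hash_type layout, as documented in mlx5/device.h. */
#define CQE_RSS_HTYPE_IP 0x0Cu  /* bits 3:2: 00 none, 01 IPv4, 10 IPv6 */
#define CQE_RSS_HTYPE_L4 0xC0u  /* bits 7:6: 00 none, 01 TCP, 10 UDP, 11 IPSEC.SPI */

/* Together the two fields fill exactly the 4 low bits of the index. */
static_assert((CQE_RSS_HTYPE_IP | (CQE_RSS_HTYPE_L4 >> 6)) == 0x0Fu,
	      "index must fit a 16-entry table");

static unsigned int mlx5_like_lookup_index(uint8_t rss_hash_type)
{
	/* IP bits stay at 3:2; L4 bits move from 7:6 down to 1:0. */
	unsigned int ip_type = rss_hash_type & CQE_RSS_HTYPE_IP;
	unsigned int l4_type = (rss_hash_type & CQE_RSS_HTYPE_L4) >> 6;

	return ip_type | l4_type;
}
```

For example, IPv4 + TCP (0x44) gives index 5 and IPv6 + IPSEC (0xC8) gives
index 11, matching RSS_TYPE_L4_IPV4_TCP and RSS_TYPE_L4_IPV6_IPSEC in the
patch's enum.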

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |   61 +++++++++++++++++++++-
 include/linux/mlx5/device.h                      |   14 ++++-
 include/net/xdp.h                                |    3 +
 3 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index c5dae48b7932..07bd70249c42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -34,6 +34,7 @@
 #include <net/xdp_sock_drv.h>
 #include "en/xdp.h"
 #include "en/params.h"
+#include <linux/bitfield.h>
 
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk)
 {
@@ -169,15 +170,71 @@ static int mlx5e_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
 	return 0;
 }
 
+/* Map HW RSS type bits CQE_RSS_HTYPE_IP + CQE_RSS_HTYPE_L4 into 4 bits */
+#define RSS_TYPE_MAX_TABLE	16 /* 4-bits max 16 entries */
+#define RSS_L4		GENMASK(1,0)
+#define RSS_L3		GENMASK(3,2) /* Same as CQE_RSS_HTYPE_IP */
+
+/* Valid combinations of CQE_RSS_HTYPE_IP + CQE_RSS_HTYPE_L4, sorted numerically */
+enum mlx5_rss_hash_type {
+	RSS_TYPE_NO_HASH	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IP_NONE) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_NONE)),
+	RSS_TYPE_L3_IPV4	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV4) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_NONE)),
+	RSS_TYPE_L4_IPV4_TCP	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV4) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_TCP)),
+	RSS_TYPE_L4_IPV4_UDP	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV4) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_UDP)),
+	RSS_TYPE_L4_IPV4_IPSEC	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV4) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_IPSEC)),
+	RSS_TYPE_L3_IPV6	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV6) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_NONE)),
+	RSS_TYPE_L4_IPV6_TCP	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV6) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_TCP)),
+	RSS_TYPE_L4_IPV6_UDP	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV6) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_UDP)),
+	RSS_TYPE_L4_IPV6_IPSEC	= (FIELD_PREP_CONST(RSS_L3, CQE_RSS_IPV6) |
+				   FIELD_PREP_CONST(RSS_L4, CQE_RSS_L4_IPSEC)),
+};
+
+/* The invalid combinations will simply return zero */
+static const enum xdp_rss_hash_type mlx5_xdp_rss_type[RSS_TYPE_MAX_TABLE] = {
+	[RSS_TYPE_NO_HASH]	= XDP_RSS_TYPE_NONE,
+	[1]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+	[2]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+	[3]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+	[RSS_TYPE_L3_IPV4]	= XDP_RSS_TYPE_L3_IPV4,
+	[RSS_TYPE_L4_IPV4_TCP]	= XDP_RSS_TYPE_L4_IPV4_TCP,
+	[RSS_TYPE_L4_IPV4_UDP]	= XDP_RSS_TYPE_L4_IPV4_UDP,
+	[RSS_TYPE_L4_IPV4_IPSEC]= XDP_RSS_TYPE_L4_IPV4_IPSEC,
+	[RSS_TYPE_L3_IPV6]	= XDP_RSS_TYPE_L3_IPV6,
+	[RSS_TYPE_L4_IPV6_TCP]	= XDP_RSS_TYPE_L4_IPV6_TCP,
+	[RSS_TYPE_L4_IPV6_UDP]  = XDP_RSS_TYPE_L4_IPV6_UDP,
+	[RSS_TYPE_L4_IPV6_IPSEC]= XDP_RSS_TYPE_L4_IPV6_IPSEC,
+	[12]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+	[13]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+	[14]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+	[15]			= XDP_RSS_TYPE_NONE, /* Implicit zero */
+};
+
 static int mlx5e_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
 {
 	const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
+	const struct mlx5_cqe64 *cqe = _ctx->cqe;
+	u32 hash_type, l4_type, ip_type, lookup;
 
 	if (unlikely(!(_ctx->xdp.rxq->dev->features & NETIF_F_RXHASH)))
 		return -ENODATA;
 
-	*hash = be32_to_cpu(_ctx->cqe->rss_hash_result);
-	return 0;
+	*hash = be32_to_cpu(cqe->rss_hash_result);
+
+	hash_type = cqe->rss_hash_type;
+	BUILD_BUG_ON(CQE_RSS_HTYPE_IP != RSS_L3); /* same mask */
+	ip_type = hash_type & CQE_RSS_HTYPE_IP;
+	l4_type = FIELD_GET(CQE_RSS_HTYPE_L4, hash_type);
+	lookup = ip_type | l4_type;
+
+	return mlx5_xdp_rss_type[lookup];
 }
 
 const struct xdp_metadata_ops mlx5e_xdp_metadata_ops = {
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 71b06ebad402..27aa9ae10996 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -36,6 +36,7 @@
 #include <linux/types.h>
 #include <rdma/ib_verbs.h>
 #include <linux/mlx5/mlx5_ifc.h>
+#include <linux/bitfield.h>
 
 #if defined(__LITTLE_ENDIAN)
 #define MLX5_SET_HOST_ENDIANNESS	0
@@ -980,14 +981,23 @@ enum {
 };
 
 enum {
-	CQE_RSS_HTYPE_IP	= 0x3 << 2,
+	CQE_RSS_HTYPE_IP	= GENMASK(3,2),
 	/* cqe->rss_hash_type[3:2] - IP destination selected for hash
 	 * (00 = none,  01 = IPv4, 10 = IPv6, 11 = Reserved)
 	 */
-	CQE_RSS_HTYPE_L4	= 0x3 << 6,
+	CQE_RSS_IP_NONE		= 0x0,
+	CQE_RSS_IPV4		= 0x1,
+	CQE_RSS_IPV6		= 0x2,
+	CQE_RSS_RESERVED	= 0x3,
+
+	CQE_RSS_HTYPE_L4	= GENMASK(7,6),
 	/* cqe->rss_hash_type[7:6] - L4 destination selected for hash
 	 * (00 = none, 01 = TCP. 10 = UDP, 11 = IPSEC.SPI
 	 */
+	CQE_RSS_L4_NONE		= 0x0,
+	CQE_RSS_L4_TCP		= 0x1,
+	CQE_RSS_L4_UDP		= 0x2,
+	CQE_RSS_L4_IPSEC	= 0x3,
 };
 
 enum {
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 63f462f5ea7f..962611d5bc02 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -429,16 +429,19 @@ enum xdp_rss_hash_type {
 	XDP_RSS_TYPE_L4_IPV4_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 1),
 	XDP_RSS_TYPE_L4_IPV4_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 2),
 	XDP_RSS_TYPE_L4_IPV4_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 3),
+	XDP_RSS_TYPE_L4_IPV4_IPSEC   = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 4),
 
 	XDP_RSS_TYPE_L4_IPV6_MASK    = RSS_L4_IPV6,
 	XDP_RSS_TYPE_L4_IPV6_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 1),
 	XDP_RSS_TYPE_L4_IPV6_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 2),
 	XDP_RSS_TYPE_L4_IPV6_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 3),
+	XDP_RSS_TYPE_L4_IPV6_IPSEC   = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 4),
 
 	XDP_RSS_TYPE_L4_IPV6_EX_MASK = L4_IPV6_EX_BIT,
 	XDP_RSS_TYPE_L4_IPV6_TCP_EX  = XDP_RSS_TYPE_L4_IPV6_TCP |L4_IPV6_EX_BIT,
 	XDP_RSS_TYPE_L4_IPV6_UDP_EX  = XDP_RSS_TYPE_L4_IPV6_UDP |L4_IPV6_EX_BIT,
 	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP|L4_IPV6_EX_BIT,
+	XDP_RSS_TYPE_L4_IPV6_IPSEC_EX= XDP_RSS_TYPE_L4_IPV6_IPSEC|L4_IPV6_EX_BIT,
 };
 #undef RSS_L3
 #undef L4_BIT




* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-28 20:15 ` [PATCH bpf RFC 1/4] xdp: rss hash types representation Jesper Dangaard Brouer
@ 2023-03-28 21:58   ` Stanislav Fomichev
  2023-03-29 11:23     ` Jesper Dangaard Brouer
  2023-03-29  8:10   ` Edward Cree
  1 sibling, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2023-03-28 21:58 UTC
  To: Jesper Dangaard Brouer
  Cc: bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem

On 03/28, Jesper Dangaard Brouer wrote:
> The RSS hash type specifies what portion of packet data NIC hardware used
> when calculating RSS hash value. The RSS types are focused on Internet
> traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
> value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
> primarily TCP vs UDP, but some hardware supports SCTP.

> Hardware RSS types are differently encoded for each hardware NIC. Most
> hardware represent RSS hash type as a number. Determining L3 vs L4 often
> requires a mapping table as there often isn't a pattern or sorting
> according to ISO layer.

> The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
> be seen as a number that is ordered according by ISO layer, and can be bit
> masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> for extending later while keeping these properties. This maps and unifies
> difference to hardware specific hashes.

Looks good overall. Any reason we're making this specific layout?
Why not simply the following?

enum {
	XDP_RSS_TYPE_NONE = 0,
	XDP_RSS_TYPE_IPV4 = BIT(0),
	XDP_RSS_TYPE_IPV6 = BIT(1),
	/* IPv6 with extension header. */
	/* let's note ^^^ it in the UAPI? */
	XDP_RSS_TYPE_IPV6_EX = BIT(2),
	XDP_RSS_TYPE_UDP = BIT(3),
	XDP_RSS_TYPE_TCP = BIT(4),
	XDP_RSS_TYPE_SCTP = BIT(5),
};

And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs XDP_RSS_TYPE_IPV6|XXX ?
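
Under this alternative, a program would test combinations by checking
individual flags. A hypothetical sketch: the ALT_* names are local copies
of the bits suggested above, and ALT_L4_ANY is an assumed helper mask
grouping the L4 protocol bits, which the suggestion itself does not define.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	ALT_NONE    = 0,
	ALT_IPV4    = 1 << 0,
	ALT_IPV6    = 1 << 1,
	ALT_IPV6_EX = 1 << 2,  /* IPv6 with extension header */
	ALT_UDP     = 1 << 3,
	ALT_TCP     = 1 << 4,
	ALT_SCTP    = 1 << 5,
	ALT_L4_ANY  = ALT_UDP | ALT_TCP | ALT_SCTP,  /* assumed helper mask */
};

static_assert((ALT_L4_ANY & (ALT_IPV4 | ALT_IPV6 | ALT_IPV6_EX)) == 0,
	      "L3 and L4 flags must not overlap");

/* Combination test: both the IP-version flag and the protocol flag set. */
static bool alt_is_ipv6_udp(unsigned int t)
{
	return (t & ALT_IPV6) && (t & ALT_UDP);
}

/* L4 test: any protocol flag set. */
static bool alt_is_l4(unsigned int t)
{
	return (t & ALT_L4_ANY) != 0;
}
```

Adding a new L4 protocol in this scheme would also mean widening
ALT_L4_ANY, so a program compiled against the old mask would not
recognize the new flag as L4.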

> This proposal change the kfunc API bpf_xdp_metadata_rx_hash() to return
> this RSS hash type on success.

> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   include/net/xdp.h |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   net/core/xdp.c    |    4 +++-
>   2 files changed, 54 insertions(+), 1 deletion(-)

> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 5393b3ebe56e..63f462f5ea7f 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -8,6 +8,7 @@

>   #include <linux/skbuff.h> /* skb_shared_info */
>   #include <uapi/linux/netdev.h>
> +#include <linux/bitfield.h>

>   /**
>    * DOC: XDP RX-queue information
> @@ -396,6 +397,56 @@ XDP_METADATA_KFUNC_xxx
>   MAX_XDP_METADATA_KFUNC,
>   };

> +/* For partitioning of xdp_rss_hash_type */
> +#define RSS_L3		GENMASK(2,0) /* 3-bits = values between 1-7 */
> +#define L4_BIT		BIT(3)       /* 1-bit - L4 indication */
> +#define RSS_L4_IPV4	GENMASK(6,4) /* 3-bits */
> +#define RSS_L4_IPV6	GENMASK(9,7) /* 3-bits */
> +#define RSS_L4		GENMASK(9,3) /* = 7-bits - covering L4 IPV4+IPV6 */
> +#define L4_IPV6_EX_BIT	BIT(9)       /* 1-bit - L4 IPv6 with Extension hdr */
> +				     /* 11-bits in total */
> +
> +/* The XDP RSS hash type (xdp_rss_hash_type) can both be seen as a number that
> + * is ordered according by ISO layer, and can be bit masked to separate IPv4 and
> + * IPv6 types for L4 protocols. Room is available for extending later while
> + * keeping above properties, as this need to cover NIC hardware RSS types.
> + */
> +enum xdp_rss_hash_type {
> +	XDP_RSS_TYPE_NONE            = 0,
> +	XDP_RSS_TYPE_L2              = XDP_RSS_TYPE_NONE,
> +
> +	XDP_RSS_TYPE_L3_MASK         = RSS_L3,
> +	XDP_RSS_TYPE_L3_IPV4         = FIELD_PREP_CONST(RSS_L3, 1),
> +	XDP_RSS_TYPE_L3_IPV6         = FIELD_PREP_CONST(RSS_L3, 2),
> +	XDP_RSS_TYPE_L3_IPV6_EX      = FIELD_PREP_CONST(RSS_L3, 4),
> +
> +	XDP_RSS_TYPE_L4_MASK         = RSS_L4,
> +	XDP_RSS_TYPE_L4_SHIFT        = __bf_shf(RSS_L4),
> +	XDP_RSS_TYPE_L4_MASK_EX      = RSS_L4 | L4_IPV6_EX_BIT,
> +
> +	XDP_RSS_TYPE_L4_IPV4_MASK    = RSS_L4_IPV4,
> +	XDP_RSS_TYPE_L4_BIT          = L4_BIT,
> +	XDP_RSS_TYPE_L4_IPV4_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 1),
> +	XDP_RSS_TYPE_L4_IPV4_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 2),
> +	XDP_RSS_TYPE_L4_IPV4_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 3),
> +
> +	XDP_RSS_TYPE_L4_IPV6_MASK    = RSS_L4_IPV6,
> +	XDP_RSS_TYPE_L4_IPV6_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 1),
> +	XDP_RSS_TYPE_L4_IPV6_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 2),
> +	XDP_RSS_TYPE_L4_IPV6_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 3),
> +
> +	XDP_RSS_TYPE_L4_IPV6_EX_MASK = L4_IPV6_EX_BIT,
> +	XDP_RSS_TYPE_L4_IPV6_TCP_EX  = XDP_RSS_TYPE_L4_IPV6_TCP |L4_IPV6_EX_BIT,
> +	XDP_RSS_TYPE_L4_IPV6_UDP_EX  = XDP_RSS_TYPE_L4_IPV6_UDP |L4_IPV6_EX_BIT,
> +	XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP|L4_IPV6_EX_BIT,
> +};
> +#undef RSS_L3
> +#undef L4_BIT
> +#undef RSS_L4_IPV4
> +#undef RSS_L4_IPV6
> +#undef RSS_L4
> +#undef L4_IPV6_EX_BIT
> +
>   #ifdef CONFIG_NET
>   u32 bpf_xdp_metadata_kfunc_id(int id);
>   bool bpf_dev_bound_kfunc_id(u32 btf_id);
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 7133017bcd74..81d41df30695 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -721,12 +721,14 @@ __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *tim
>    * @hash: Return value pointer.
>    *
>    * Return:
> - * * Returns 0 on success or ``-errno`` on error.
> + * * Returns (positive) RSS hash **type** on success or ``-errno`` on error.
> + * * ``enum xdp_rss_hash_type`` : RSS hash type
>    * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
>    * * ``-ENODATA``    : means no RX-hash available for this frame
>    */
>   __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
>   {
> +	BTF_TYPE_EMIT(enum xdp_rss_hash_type);
>   	return -EOPNOTSUPP;
>   }





* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-28 20:15 ` [PATCH bpf RFC 1/4] xdp: rss hash types representation Jesper Dangaard Brouer
  2023-03-28 21:58   ` Stanislav Fomichev
@ 2023-03-29  8:10   ` Edward Cree
  2023-03-29 12:13     ` [xdp-hints] " Jesper Dangaard Brouer
  1 sibling, 1 reply; 15+ messages in thread
From: Edward Cree @ 2023-03-29  8:10 UTC
  To: Jesper Dangaard Brouer, bpf
  Cc: netdev, Stanislav Fomichev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, anthony.l.nguyen,
	yoong.siang.song, boon.leong.ong, intel-wired-lan, pabeni,
	jesse.brandeburg, kuba, edumazet, john.fastabend, hawk, davem

On 28/03/2023 21:15, Jesper Dangaard Brouer wrote:
> Hardware RSS types are differently encoded for each hardware NIC. Most
> hardware represent RSS hash type as a number. Determining L3 vs L4 often
> requires a mapping table as there often isn't a pattern or sorting
> according to ISO layer.
> 
> The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
> be seen as a number that is ordered according by ISO layer, and can be bit
> masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> for extending later while keeping these properties. This maps and unifies
> difference to hardware specific hashes.

Would it be better to make use of the ETHTOOL_GRXFH defines (stuff
 like UDP_V6_FLOW, RXH_L4_B_0_1 etc.)?  Seems like that could allow
 for some code reuse in drivers.


* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-28 21:58   ` Stanislav Fomichev
@ 2023-03-29 11:23     ` Jesper Dangaard Brouer
  2023-03-29 17:18       ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-29 11:23 UTC
  To: Stanislav Fomichev
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem


On 28/03/2023 23.58, Stanislav Fomichev wrote:
> On 03/28, Jesper Dangaard Brouer wrote:
>> The RSS hash type specifies what portion of packet data NIC hardware used
>> when calculating RSS hash value. The RSS types are focused on Internet
>> traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
>> value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
>> primarily TCP vs UDP, but some hardware supports SCTP.
> 
>> Hardware RSS types are differently encoded for each hardware NIC. Most
>> hardware represent RSS hash type as a number. Determining L3 vs L4 often
>> requires a mapping table as there often isn't a pattern or sorting
>> according to ISO layer.
> 
>> The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
>> be seen as a number that is ordered according by ISO layer, and can be bit
>> masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
>> for extending later while keeping these properties. This maps and unifies
>> difference to hardware specific hashes.
> 
> Looks good overall. Any reason we're making this specific layout?

One important goal is to have a simple/fast way to determine L3 vs. L4,
because an L4 hash can be used for flow handling (e.g. load-balancing).

With the layout below you can:

  if (rss_type & XDP_RSS_TYPE_L4_MASK)
	hw_hash_do_LB = true;

Or, using it as a number:

  if (rss_type >= XDP_RSS_TYPE_L4_BIT)
	hw_hash_do_LB = true;

I'm very open to changes to my "specific" layout. I am unsure whether
using it as a number is the right approach and worth the trouble.
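
Both checks can be written as compilable C, assuming errors have already
been filtered out (rss_type >= 0). The helper names are hypothetical and
the constants are hand-copied from the proposed enum; the value list
includes the bare XDP_RSS_TYPE_L4_BIT returned by veth and
XDP_RSS_TYPE_L4_IPV6_TCP_EX as defined in the posted patch (0x288).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hand-copied values from the proposed enum xdp_rss_hash_type. */
enum {
	T_L3_MASK = 0x007,  /* XDP_RSS_TYPE_L3_MASK */
	T_L4_BIT  = 0x008,  /* XDP_RSS_TYPE_L4_BIT */
	T_L4_MASK = 0x3F8,  /* XDP_RSS_TYPE_L4_MASK */
};

/* Every L3 type fits below the L4 bit, which is what makes ">=" work. */
static_assert(T_L3_MASK < T_L4_BIT, "L3 values must sort below L4 values");

/* Bitmask view. */
static bool l4_by_mask(int rss_type)   { return (rss_type & T_L4_MASK) != 0; }

/* Number view; assumes rss_type >= 0 (no -errno). */
static bool l4_by_number(int rss_type) { return rss_type >= T_L4_BIT; }

/* NONE, L3 IPv4/IPv6/IPv6_EX, bare L4 bit (veth), IPv4 TCP/UDP/SCTP,
 * IPv6 TCP/UDP/SCTP, IPv6 TCP_EX with the patch's BIT(9). */
static const int proposed_types[] = {
	0x000, 0x001, 0x002, 0x004, 0x008,
	0x018, 0x028, 0x038, 0x088, 0x108, 0x188, 0x288,
};

static bool checks_agree(void)
{
	for (size_t i = 0; i < sizeof(proposed_types) / sizeof(proposed_types[0]); i++)
		if (l4_by_mask(proposed_types[i]) != l4_by_number(proposed_types[i]))
			return false;
	return true;
}
```

The numeric form compares with >= against the L4 bit rather than with >,
so the bare L4 bit from veth is still classified as L4.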

> Why not simply the following?
> 
> enum {
>      XDP_RSS_TYPE_NONE = 0,
>      XDP_RSS_TYPE_IPV4 = BIT(0),
>      XDP_RSS_TYPE_IPV6 = BIT(1),
>      /* IPv6 with extension header. */
>      /* let's note ^^^ it in the UAPI? */
>      XDP_RSS_TYPE_IPV6_EX = BIT(2),
>      XDP_RSS_TYPE_UDP = BIT(3),
>      XDP_RSS_TYPE_TCP = BIT(4),
>      XDP_RSS_TYPE_SCTP = BIT(5),

We know these bits for UDP, TCP, SCTP (and IPSEC) are mutually exclusive;
they cannot be set at the same time, as a packet cannot be both UDP and
TCP. Thus, encoding them as a number makes sense to me, and is more
compact.

This BIT() approach also has an issue with extending it later (forward
compatibility). As mentioned, a common task will be to check whether the
hash type is an L4 type. See mlx5 [patch 4/4], which needed to extend it
with IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that
new L4 types can be added in, such that existing progs doing the L4 check
will still work. It can of course be solved in the same way for the
BIT() approach, by reserving some bits upfront in a mask.
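
The forward-compatibility point can be checked with a small sketch: each
IP version gets a 3-bit protocol number (1-7), so a protocol added later,
such as IPSEC = 4 in patch 4/4, still falls inside XDP_RSS_TYPE_L4_MASK.
The constants are copied from the proposed layout; l4_type() and
old_prog_is_l4() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hand-copied from the proposed layout. */
#define L4_BIT        0x008u  /* XDP_RSS_TYPE_L4_BIT */
#define L4_MASK       0x3F8u  /* XDP_RSS_TYPE_L4_MASK = GENMASK(9,3) */
#define L4_IPV4_SHIFT 4       /* RSS_L4_IPV4 = GENMASK(6,4) */
#define L4_IPV6_SHIFT 7       /* RSS_L4_IPV6 = GENMASK(9,7) */

static_assert(((7u << L4_IPV4_SHIFT) & (7u << L4_IPV6_SHIFT)) == 0,
	      "IPv4 and IPv6 protocol fields must not overlap");

/* Encode an exclusive L4 protocol as a number 1-7 in its 3-bit field. */
static unsigned int l4_type(unsigned int proto, bool ipv6)
{
	return L4_BIT | (proto << (ipv6 ? L4_IPV6_SHIFT : L4_IPV4_SHIFT));
}

/* A check written before IPSEC existed; it only knows L4_MASK. */
static bool old_prog_is_l4(unsigned int rss_type)
{
	return (rss_type & L4_MASK) != 0;
}
```

Three bits per IP version leave room for three more L4 protocols after
IPSEC before the layout has to change.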

> }
> 
> And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs XDP_RSS_TYPE_IPV6|XXX ?

Do notice that I already do some level of OR'ing ("|") in this
proposal. The main difference is that I hide this from the driver, and
pre-combine the valid combinations (enums) that drivers can select
from. I do get the point, and I think I will come up with a combined
solution based on your input.


The RSS hashing types and combinations come from Microsoft standards:
  [1] https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations


>> This proposal change the kfunc API bpf_xdp_metadata_rx_hash() to return
>> this RSS hash type on success.
> 
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>>   include/net/xdp.h |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   net/core/xdp.c    |    4 +++-
>>   2 files changed, 54 insertions(+), 1 deletion(-)
> 
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index 5393b3ebe56e..63f462f5ea7f 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -8,6 +8,7 @@
> 
>>   #include <linux/skbuff.h> /* skb_shared_info */
>>   #include <uapi/linux/netdev.h>
>> +#include <linux/bitfield.h>
> 
>>   /**
>>    * DOC: XDP RX-queue information
>> @@ -396,6 +397,56 @@ XDP_METADATA_KFUNC_xxx
>>   MAX_XDP_METADATA_KFUNC,
>>   };
> 
>> +/* For partitioning of xdp_rss_hash_type */
>> +#define RSS_L3        GENMASK(2,0) /* 3-bits = values between 1-7 */
>> +#define L4_BIT        BIT(3)       /* 1-bit - L4 indication */
>> +#define RSS_L4_IPV4    GENMASK(6,4) /* 3-bits */
>> +#define RSS_L4_IPV6    GENMASK(9,7) /* 3-bits */
>> +#define RSS_L4        GENMASK(9,3) /* = 7-bits - covering L4 IPV4+IPV6 */
>> +#define L4_IPV6_EX_BIT    BIT(9)       /* 1-bit - L4 IPv6 with Extension hdr */
>> +                     /* 11-bits in total */
>> +
>> +/* The XDP RSS hash type (xdp_rss_hash_type) can both be seen as a number that
>> + * is ordered according by ISO layer, and can be bit masked to separate IPv4 and
>> + * IPv6 types for L4 protocols. Room is available for extending later while
>> + * keeping above properties, as this need to cover NIC hardware RSS types.
>> + */
>> +enum xdp_rss_hash_type {
>> +    XDP_RSS_TYPE_NONE            = 0,
>> +    XDP_RSS_TYPE_L2              = XDP_RSS_TYPE_NONE,
>> +
>> +    XDP_RSS_TYPE_L3_MASK         = RSS_L3,
>> +    XDP_RSS_TYPE_L3_IPV4         = FIELD_PREP_CONST(RSS_L3, 1),
>> +    XDP_RSS_TYPE_L3_IPV6         = FIELD_PREP_CONST(RSS_L3, 2),
>> +    XDP_RSS_TYPE_L3_IPV6_EX      = FIELD_PREP_CONST(RSS_L3, 4),
>> +
>> +    XDP_RSS_TYPE_L4_MASK         = RSS_L4,
>> +    XDP_RSS_TYPE_L4_SHIFT        = __bf_shf(RSS_L4),
>> +    XDP_RSS_TYPE_L4_MASK_EX      = RSS_L4 | L4_IPV6_EX_BIT,
>> +
>> +    XDP_RSS_TYPE_L4_IPV4_MASK    = RSS_L4_IPV4,
>> +    XDP_RSS_TYPE_L4_BIT          = L4_BIT,
>> +    XDP_RSS_TYPE_L4_IPV4_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 1),
>> +    XDP_RSS_TYPE_L4_IPV4_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 2),
>> +    XDP_RSS_TYPE_L4_IPV4_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 3),
>> +
>> +    XDP_RSS_TYPE_L4_IPV6_MASK    = RSS_L4_IPV6,
>> +    XDP_RSS_TYPE_L4_IPV6_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 1),
>> +    XDP_RSS_TYPE_L4_IPV6_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 2),
>> +    XDP_RSS_TYPE_L4_IPV6_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 3),
>> +
>> +    XDP_RSS_TYPE_L4_IPV6_EX_MASK = L4_IPV6_EX_BIT,
>> +    XDP_RSS_TYPE_L4_IPV6_TCP_EX  = XDP_RSS_TYPE_L4_IPV6_TCP|L4_IPV6_EX_BIT,
>> +    XDP_RSS_TYPE_L4_IPV6_UDP_EX  = XDP_RSS_TYPE_L4_IPV6_UDP|L4_IPV6_EX_BIT,
>> +    XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP|L4_IPV6_EX_BIT,
>> +};
>> +#undef RSS_L3
>> +#undef L4_BIT
>> +#undef RSS_L4_IPV4
>> +#undef RSS_L4_IPV6
>> +#undef RSS_L4
>> +#undef L4_IPV6_EX_BIT
>> +
>>   #ifdef CONFIG_NET
>>   u32 bpf_xdp_metadata_kfunc_id(int id);
>>   bool bpf_dev_bound_kfunc_id(u32 btf_id);
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 7133017bcd74..81d41df30695 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -721,12 +721,14 @@ __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *tim
>>    * @hash: Return value pointer.
>>    *
>>    * Return:
>> - * * Returns 0 on success or ``-errno`` on error.
>> + * * Returns (positive) RSS hash **type** on success or ``-errno`` on error.
>> + * * ``enum xdp_rss_hash_type`` : RSS hash type
>>    * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
>>    * * ``-ENODATA``    : means no RX-hash available for this frame
>>    */
>>   __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
>>   {
>> +    BTF_TYPE_EMIT(enum xdp_rss_hash_type);
>>       return -EOPNOTSUPP;
>>   }
> 
> 
> 
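A userspace sketch that re-derives the numeric values of the proposed layout quoted above, using local stand-ins for the kernel's BIT()/GENMASK()/FIELD_PREP_CONST() (the stand-ins and the shortened enum names are assumptions for illustration). Working through the values shows one detail worth double-checking: as written, L4_IPV6_EX_BIT = BIT(9) is also the top bit of RSS_L4_IPV6 = GENMASK(9,7), so extracting the IPv6 L4 field from an *_EX type does not give the plain IPv6 value. The "11-bits in total" comment, and XDP_RSS_TYPE_L4_MASK_EX (which as written equals RSS_L4), suggest BIT(10) may have been intended.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK() (32-bit). */
#define BIT(n)        (1u << (n))
#define GENMASK(h, l) ((~0u >> (31 - (h))) & (~0u << (l)))

/* Partitioning as defined in the RFC patch. */
#define RSS_L3         GENMASK(2, 0)
#define L4_BIT         BIT(3)
#define RSS_L4_IPV4    GENMASK(6, 4)
#define RSS_L4_IPV6    GENMASK(9, 7)
#define RSS_L4         GENMASK(9, 3)
#define L4_IPV6_EX_BIT BIT(9)

/* FIELD_PREP_CONST(mask, v) stand-in: the shift is passed explicitly. */
#define PREP(mask, shift, v) (((uint32_t)(v) << (shift)) & (mask))

/* Shortened names for a subset of enum xdp_rss_hash_type. */
enum {
	L3_IPV4        = PREP(RSS_L3, 0, 1),
	L3_IPV6_EX     = PREP(RSS_L3, 0, 4),
	L4_IPV4_TCP    = L4_BIT | PREP(RSS_L4_IPV4, 4, 1),
	L4_IPV6_TCP    = L4_BIT | PREP(RSS_L4_IPV6, 7, 1),
	L4_IPV6_TCP_EX = L4_IPV6_TCP | L4_IPV6_EX_BIT,
};

static_assert(L4_IPV4_TCP == 0x018, "IPv4 TCP");
static_assert(L4_IPV6_TCP == 0x088, "IPv6 TCP");
static_assert(L4_IPV6_TCP_EX == 0x288, "IPv6 TCP with extension header");
/* BIT(9) is also the top bit of GENMASK(9,7). */
static_assert(L4_IPV6_EX_BIT & RSS_L4_IPV6, "EX bit overlaps IPv6 L4 field");

static int is_l4(uint32_t t) { return (t & RSS_L4) != 0; }

/* Extract the IPv6 L4 field, as FIELD_GET(RSS_L4_IPV6, t) would. */
static uint32_t ipv6_l4_field(uint32_t t) { return (t & RSS_L4_IPV6) >> 7; }
```

Here `ipv6_l4_field(L4_IPV6_TCP_EX)` yields 5 rather than 1; with BIT(10) as the EX flag it would return 1, matching the non-EX type.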


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [xdp-hints] Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-29  8:10   ` Edward Cree
@ 2023-03-29 12:13     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-29 12:13 UTC (permalink / raw)
  To: Edward Cree, bpf
  Cc: brouer, netdev, Stanislav Fomichev, martin.lau, ast, daniel,
	alexandr.lobakin, larysa.zaremba, xdp-hints, anthony.l.nguyen,
	yoong.siang.song, boon.leong.ong, intel-wired-lan, pabeni,
	jesse.brandeburg, kuba, edumazet, john.fastabend, hawk, davem


On 29/03/2023 10.10, Edward Cree wrote:
> On 28/03/2023 21:15, Jesper Dangaard Brouer wrote:
>> Hardware RSS types are differently encoded for each hardware NIC. Most
>> hardware represent RSS hash type as a number. Determining L3 vs L4 often
>> requires a mapping table as there often isn't a pattern or sorting
>> according to ISO layer.
>>
>> The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
>> be seen as a number that is ordered according by ISO layer, and can be bit
>> masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
>> for extending later while keeping these properties. This maps and unifies
>> difference to hardware specific hashes.
> 
> Would it be better to make use of the ETHTOOL_GRXFH defines (stuff
>   like UDP_V6_FLOW, RXH_L4_B_0_1 etc.)?  Seems like that could allow
>   for some code reuse in drivers.

Thanks for the pointer to the ethtool defines.
I can see that these are used when configuring which hardware RSS hash the
NIC should calculate.

From: include/uapi/linux/ethtool.h
  /* L3-L4 network traffic flow hash options */
  #define	RXH_L2DA	(1 << 1)
  #define	RXH_VLAN	(1 << 2)
  #define	RXH_L3_PROTO	(1 << 3)
  #define	RXH_IP_SRC	(1 << 4)
  #define	RXH_IP_DST	(1 << 5)
  #define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
  #define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
  #define	RXH_DISCARD	(1 << 31)
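The RXH_* bits describe which header fields feed the hash, rather than which protocol was hashed. A minimal sketch (the `hash_level` enum and `rxh_level()` helper are hypothetical) of how a driver could classify such a field set as an L3 or L4 hash, using the values quoted above:

```c
#include <assert.h>
#include <stdint.h>

/* Values as quoted from include/uapi/linux/ethtool.h. */
#define RXH_L2DA     (1 << 1)
#define RXH_VLAN     (1 << 2)
#define RXH_L3_PROTO (1 << 3)
#define RXH_IP_SRC   (1 << 4)
#define RXH_IP_DST   (1 << 5)
#define RXH_L4_B_0_1 (1 << 6) /* src port in case of TCP/UDP/SCTP */
#define RXH_L4_B_2_3 (1 << 7) /* dst port in case of TCP/UDP/SCTP */

/* Hypothetical classification of a configured RXH field set. */
enum hash_level { HASH_OTHER, HASH_L3, HASH_L4 };

static enum hash_level rxh_level(uint32_t fields)
{
	const uint32_t l3 = RXH_IP_SRC | RXH_IP_DST;
	const uint32_t l4 = RXH_L4_B_0_1 | RXH_L4_B_2_3;

	if ((fields & (l3 | l4)) == (l3 | l4))
		return HASH_L4; /* full 4-tuple */
	if ((fields & l3) == l3)
		return HASH_L3; /* IP pair only */
	return HASH_OTHER;
}
```

Extra inputs such as RXH_VLAN do not change the L3/L4 classification, which is one reason they would need separate representation if exposed to XDP.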

I notice that I forgot that the VLAN tag (RXH_VLAN) can also be part of the
hash calculation in my proposed design.

It is interesting to follow the possible ethtool cmd->flow_type values:

  /* L2-L4 network traffic flow types */
  #define	TCP_V4_FLOW	0x01	/* hash or spec (tcp_ip4_spec) */
  #define	UDP_V4_FLOW	0x02	/* hash or spec (udp_ip4_spec) */
  #define	SCTP_V4_FLOW	0x03	/* hash or spec (sctp_ip4_spec) */
  #define	AH_ESP_V4_FLOW	0x04	/* hash only */
  #define	TCP_V6_FLOW	0x05	/* hash or spec (tcp_ip6_spec; nfc only) */
  #define	UDP_V6_FLOW	0x06	/* hash or spec (udp_ip6_spec; nfc only) */
  #define	SCTP_V6_FLOW	0x07	/* hash or spec (sctp_ip6_spec; nfc only) */
  #define	AH_ESP_V6_FLOW	0x08	/* hash only */
  #define	AH_V4_FLOW	0x09	/* hash or spec (ah_ip4_spec) */
  #define	ESP_V4_FLOW	0x0a	/* hash or spec (esp_ip4_spec) */
  #define	AH_V6_FLOW	0x0b	/* hash or spec (ah_ip6_spec; nfc only) */
  #define	ESP_V6_FLOW	0x0c	/* hash or spec (esp_ip6_spec; nfc only) */
  #define	IPV4_USER_FLOW	0x0d	/* spec only (usr_ip4_spec) */
  #define	IP_USER_FLOW	IPV4_USER_FLOW
  #define	IPV6_USER_FLOW	0x0e	/* spec only (usr_ip6_spec; nfc only) */
  #define	IPV4_FLOW	0x10	/* hash only */
  #define	IPV6_FLOW	0x11	/* hash only */
  #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
  /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
  #define	FLOW_EXT	0x80000000
  #define	FLOW_MAC_EXT	0x40000000
  /* Flag to enable RSS spreading of traffic matching rule (nfc only) */
  #define	FLOW_RSS	0x20000000

It is clear that we need to support TCP+UDP+SCTP.

I assume IPSEC here means AH (Authentication Header) and ESP (Encapsulating
Security Payload).  Thus (like I found with mlx5) we also need IPSEC, and
maybe a bit (or number) for each protocol, AH or ESP.
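A sketch of how the hash-capable ethtool flow types decompose into an L3 part and an L4 part, with AH/ESP grouped as "IPSEC" in the L4 slot as mlx5 does. The enums, struct and `classify_flow_type()` are hypothetical; the flow-type values are the ones quoted above, and the separate AH_*/ESP_* types are left out for brevity:

```c
#include <assert.h>

/* Hash flow types, values as quoted from include/uapi/linux/ethtool.h. */
#define TCP_V4_FLOW    0x01
#define UDP_V4_FLOW    0x02
#define SCTP_V4_FLOW   0x03
#define AH_ESP_V4_FLOW 0x04
#define TCP_V6_FLOW    0x05
#define UDP_V6_FLOW    0x06
#define SCTP_V6_FLOW   0x07
#define AH_ESP_V6_FLOW 0x08
#define IPV4_FLOW      0x10
#define IPV6_FLOW      0x11

/* Hypothetical split of a flow type into its L3 and L4 parts. */
enum l3_kind { L3_NONE, L3_IPV4, L3_IPV6 };
enum l4_kind { L4_NONE, L4_TCP, L4_UDP, L4_SCTP, L4_IPSEC };

struct flow_class {
	enum l3_kind l3;
	enum l4_kind l4;
};

static struct flow_class classify_flow_type(unsigned int flow_type)
{
	switch (flow_type) {
	case TCP_V4_FLOW:    return (struct flow_class){ L3_IPV4, L4_TCP };
	case UDP_V4_FLOW:    return (struct flow_class){ L3_IPV4, L4_UDP };
	case SCTP_V4_FLOW:   return (struct flow_class){ L3_IPV4, L4_SCTP };
	case AH_ESP_V4_FLOW: return (struct flow_class){ L3_IPV4, L4_IPSEC };
	case TCP_V6_FLOW:    return (struct flow_class){ L3_IPV6, L4_TCP };
	case UDP_V6_FLOW:    return (struct flow_class){ L3_IPV6, L4_UDP };
	case SCTP_V6_FLOW:   return (struct flow_class){ L3_IPV6, L4_SCTP };
	case AH_ESP_V6_FLOW: return (struct flow_class){ L3_IPV6, L4_IPSEC };
	case IPV4_FLOW:      return (struct flow_class){ L3_IPV4, L4_NONE };
	case IPV6_FLOW:      return (struct flow_class){ L3_IPV6, L4_NONE };
	default:             return (struct flow_class){ L3_NONE, L4_NONE };
	}
}
```

Every hash flow type maps to exactly one L3 value and at most one L4 value, which matches the "exclusive" observation made earlier in the thread.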

Both ah_ip4_spec and esp_ip4_spec point to this ethtool.h struct:

/**
  * struct ethtool_ah_espip4_spec - flow specification for IPsec/IPv4
  * @ip4src: Source host
  * @ip4dst: Destination host
  * @spi: Security parameters index
  * @tos: Type-of-service
  *
  * This can be used to specify an IPsec transport or tunnel over IPv4.
  */
  struct ethtool_ah_espip4_spec {
	__be32	ip4src;
	__be32	ip4dst;
	__be32	spi;
	__u8    tos;
  };

This confirms that the SPI is the extra part of the hash.

--Jesper


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-29 11:23     ` Jesper Dangaard Brouer
@ 2023-03-29 17:18       ` Stanislav Fomichev
  2023-03-29 18:19         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2023-03-29 17:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem

On 03/29, Jesper Dangaard Brouer wrote:

> On 28/03/2023 23.58, Stanislav Fomichev wrote:
> > On 03/28, Jesper Dangaard Brouer wrote:
> > > The RSS hash type specifies what portion of packet data NIC hardware used
> > > when calculating RSS hash value. The RSS types are focused on Internet
> > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
> > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
> > > primarily TCP vs UDP, but some hardware supports SCTP.
> >
> > > Hardware RSS types are differently encoded for each hardware NIC. Most
> > > hardware represent RSS hash type as a number. Determining L3 vs L4 often
> > > requires a mapping table as there often isn't a pattern or sorting
> > > according to ISO layer.
> >
> > > The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
> > > be seen as a number that is ordered according by ISO layer, and can be bit
> > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> > > for extending later while keeping these properties. This maps and unifies
> > > difference to hardware specific hashes.
> >
> > Looks good overall. Any reason we're making this specific layout?

> One important goal is to have a simple/fast way to determining L3 vs L4,
> because a L4 hash can be used for flow handling (e.g. load-balancing).

> We below layout you can:

>   if (rss_type & XDP_RSS_TYPE_L4_MASK)
> 	bool hw_hash_do_LB = true;

> Or using it as a number:

>   if (rss_type > XDP_RSS_TYPE_L4)
> 	bool hw_hash_do_LB = true;

Why is it strictly better than the following?

if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}

If we add some new L4 format, the bpf programs can be updated to support
it?

> I'm very open to changes to my "specific" layout.  I am in doubt if
> using it as a number is the right approach and worth the trouble.

> > Why not simply the following?
> >
> > enum {
> >     XDP_RSS_TYPE_NONE = 0,
> >     XDP_RSS_TYPE_IPV4 = BIT(0),
> >     XDP_RSS_TYPE_IPV6 = BIT(1),
> >     /* IPv6 with extension header. */
> >     /* let's note ^^^ it in the UAPI? */
> >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
> >     XDP_RSS_TYPE_UDP = BIT(3),
> >     XDP_RSS_TYPE_TCP = BIT(4),
> >     XDP_RSS_TYPE_SCTP = BIT(5),

> We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive, they
> cannot be set at the same time, e.g. as a packet cannot both be UDP and
> TCP.  Thus, using these bits as a number make sense to me, and is more
> compact.

[..]

> This BIT() approach also have the issue of extending it later (forward
> compatibility).  As mentioned a common task will be to check if
> hash-type is a L4 type.  See mlx5 [patch 4/4] needed to extend with
> IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that this
> can be extended with new L4 types, such that existing progs will still
> work checking for L4 check.  It can of-cause be solved in the same way
> for this BIT() approach by reserving some bits upfront in a mask.

We're using 6 bits out of 64, we should be good for a while? If there
is ever a forward compatibility issue, we can always come up with
a new kfunc.

One other related question I have is: should we export the type
over some additional new kfunc argument? (instead of abusing the return
type) Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?
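To make the question concrete, a userspace sketch of the two calling conventions. The functions, struct and enum are hypothetical stand-ins (the real kfunc runs in the kernel and is called from a BPF program); only the shape of the API is compared:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical type values and a fake per-packet RX descriptor. */
enum rss_type_sketch { RSS_T_NONE = 0, RSS_T_IPV4_TCP = 0x18 };

struct fake_rx {
	int has_hash;
	uint32_t hash;
	enum rss_type_sketch type;
};

/* (a) Type multiplexed into the return value: >= 0 is the type, < 0 an
 * errno. A type of RSS_T_NONE returns 0, like the old "0 on success". */
static int rx_hash_ret(const struct fake_rx *rx, uint32_t *hash)
{
	if (!rx->has_hash)
		return -ENODATA;
	*hash = rx->hash;
	return rx->type;
}

/* (b) Type via an extra out-argument: 0 on success, < 0 an errno. */
static int rx_hash_arg(const struct fake_rx *rx, uint32_t *hash,
		       enum rss_type_sketch *type)
{
	if (!rx->has_hash)
		return -ENODATA;
	*hash = rx->hash;
	*type = rx->type;
	return 0;
}
```

Variant (b) keeps the existing "0 on success" meaning of the return value, and naming the enum in the prototype would make it visible in BTF without an explicit BTF_TYPE_EMIT; the cost is the changed call signature.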

> > }
> >
> > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
> > XDP_RSS_TYPE_IPV6|XXX ?

> Do notice, that I already does some level of or'ing ("|") in this
> proposal.  The main difference is that I hide this from the driver, and
> kind of pre-combine the valid combination (enum's) drivers can select
> from. I do get the point, and I think I will come up with a combined
> solution based on your input.


> The RSS hashing types and combinations comes from M$ standards:
>   [1]  
> https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations

My main concern here is that we're over-complicating it with the masks
and the format. With the explicit bits we can easily map to that
spec you mention.

For example, for forward compat, I'm not sure we can assume that people
will do:
	"rss_type & XDP_RSS_TYPE_L4_MASK"
instead of something like:
	"rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"

> > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash() to return
> > > this RSS hash type on success.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-29 17:18       ` Stanislav Fomichev
@ 2023-03-29 18:19         ` Jesper Dangaard Brouer
  2023-03-29 23:19           ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-29 18:19 UTC (permalink / raw)
  To: Stanislav Fomichev, Jesper Dangaard Brouer
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem


On 29/03/2023 19.18, Stanislav Fomichev wrote:
> On 03/29, Jesper Dangaard Brouer wrote:
> 
>> On 28/03/2023 23.58, Stanislav Fomichev wrote:
>> > On 03/28, Jesper Dangaard Brouer wrote:
>> > > The RSS hash type specifies what portion of packet data NIC hardware used
>> > > when calculating RSS hash value. The RSS types are focused on Internet
>> > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
>> > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
>> > > primarily TCP vs UDP, but some hardware supports SCTP.
>> >
>> > > Hardware RSS types are differently encoded for each hardware NIC. Most
>> > > hardware represent RSS hash type as a number. Determining L3 vs L4 often
>> > > requires a mapping table as there often isn't a pattern or sorting
>> > > according to ISO layer.
>> >
>> > > The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
>> > > be seen as a number that is ordered according by ISO layer, and can be bit
>> > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
>> > > for extending later while keeping these properties. This maps and unifies
>> > > difference to hardware specific hashes.
>> >
>> > Looks good overall. Any reason we're making this specific layout?
> 
>> One important goal is to have a simple/fast way to determining L3 vs L4,
>> because a L4 hash can be used for flow handling (e.g. load-balancing).
> 
>> We below layout you can:
> 
>>   if (rss_type & XDP_RSS_TYPE_L4_MASK)
>>     bool hw_hash_do_LB = true;
> 
>> Or using it as a number:
> 
>>   if (rss_type > XDP_RSS_TYPE_L4)
>>     bool hw_hash_do_LB = true;
> 
> Why is it strictly better then the following?
> 
> if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}
> 

See V2: I dropped the idea of this being a number (that idea was not a
good idea).

> If we add some new L4 format, the bpf programs can be updated to support
> it?
> 
>> I'm very open to changes to my "specific" layout.  I am in doubt if
>> using it as a number is the right approach and worth the trouble.
> 
>> > Why not simply the following?
>> >
>> > enum {
>> >     XDP_RSS_TYPE_NONE = 0,
>> >     XDP_RSS_TYPE_IPV4 = BIT(0),
>> >     XDP_RSS_TYPE_IPV6 = BIT(1),
>> >     /* IPv6 with extension header. */
>> >     /* let's note ^^^ it in the UAPI? */
>> >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
>> >     XDP_RSS_TYPE_UDP = BIT(3),
>> >     XDP_RSS_TYPE_TCP = BIT(4),
>> >     XDP_RSS_TYPE_SCTP = BIT(5),
> 
>> We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive, they
>> cannot be set at the same time, e.g. as a packet cannot both be UDP and
>> TCP.  Thus, using these bits as a number make sense to me, and is more
>> compact.
> 
> [..]
> 
>> This BIT() approach also have the issue of extending it later (forward
>> compatibility).  As mentioned a common task will be to check if
>> hash-type is a L4 type.  See mlx5 [patch 4/4] needed to extend with
>> IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that this
>> can be extended with new L4 types, such that existing progs will still
>> work checking for L4 check.  It can of-cause be solved in the same way
>> for this BIT() approach by reserving some bits upfront in a mask.
> 
> We're using 6 bits out of 64, we should be good for awhile? If there
> is ever a forward compatibility issue, we can always come up with
> a new kfunc.

I want/need to store the RSS-type in the xdp_frame, for the XDP_REDIRECT and
SKB use-cases.  Thus, I don't want to use 64-bit/8-bytes, as xdp_frame
size is limited (given that it reduces the headroom available for expansion).
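A rough illustration of the size argument, using hypothetical structs rather than the real struct xdp_frame: next to a 32-bit hash, a 64-bit type field typically grows the slot because of alignment (16 bytes on x86-64), while a 16-bit type keeps it at 8 bytes on common ABIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical metadata slots; not the real struct xdp_frame layout. */
struct rx_meta_wide   { uint32_t hash; uint64_t type; }; /* 64-bit type */
struct rx_meta_narrow { uint32_t hash; uint16_t type; }; /* 16-bit type */

/* 4 + 2 bytes, padded to the 4-byte alignment of the hash field. */
static_assert(sizeof(struct rx_meta_narrow) <= 8,
	      "hash + narrow type fit in 8 bytes");
```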

> 
> One other related question I have is: should we export the type
> over some additional new kfunc argument? (instead of abusing the return
> type) 

Good question. I was also wondering if it wouldn't be better to add
another kfunc argument with the rss_hash_type?

That will change the call signature, so that will not be easy to handle
between kernel releases.


> Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?

Sure, if we define it as an argument, then it will automatically be
exported as BTF.

>> > }
>> >
>> > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
>> > XDP_RSS_TYPE_IPV6|XXX ?
> 
>> Do notice, that I already does some level of or'ing ("|") in this
>> proposal.  The main difference is that I hide this from the driver, and
>> kind of pre-combine the valid combination (enum's) drivers can select
>> from. I do get the point, and I think I will come up with a combined
>> solution based on your input.
> 
> 
>> The RSS hashing types and combinations comes from M$ standards:
>>   [1] 
>> https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations
> 
> My main concern here is that we're over-complicating it with the masks
> and the format. With the explicit bits we can easily map to that
> spec you mention.

See if you like my RFC-V2 proposal better.
It should go more in your direction.

> 
> For example, for forward compat, I'm not sure we can assume that the people
> will do:
>      "rss_type & XDP_RSS_TYPE_L4_MASK"
> instead of something like:
>      "rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"
> 

This code is allowed in V2, and should be. It is the BPF programmer's
choice in line 2 not to be forward compatible with newer L4 types.
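Under the RFC v1 numeric layout, that line-2 style check was subtler than it looks: the v1 enum values are numbers that all share L4_BIT, so OR'ing two of them produces a mask that matches every L4 type. A sketch using values computed from the v1 definitions (the short names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Values of the RFC v1 layout, computed from the patch's definitions. */
enum {
	L4_BIT       = 0x008,
	L4_IPV4_TCP  = 0x018, /* L4_BIT | (1 << 4) */
	L4_IPV4_UDP  = 0x028, /* L4_BIT | (2 << 4) */
	L4_IPV6_TCP  = 0x088, /* L4_BIT | (1 << 7) */
	L4_IPV6_SCTP = 0x188, /* L4_BIT | (3 << 7) */
};

/* Intended as "IPv4 TCP or UDP?", but the OR of the two values is 0x038,
 * which contains L4_BIT, so every L4 type matches. */
static int or_of_values(uint32_t t)
{
	return (t & (L4_IPV4_TCP | L4_IPV4_UDP)) != 0;
}

/* With numeric values, equality comparison is what works. */
static int is_ipv4_tcp_or_udp(uint32_t t)
{
	return t == L4_IPV4_TCP || t == L4_IPV4_UDP;
}
```

With one bit per protocol, as in V2, OR'ing individual protocol bits does select exactly those protocols, so the expression means what it says.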

>> > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash() to  return
>> > > this RSS hash type on success.

This is the real question (as also raised above)...
Should we use return value or add an argument for type?

--Jesper


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-29 18:19         ` Jesper Dangaard Brouer
@ 2023-03-29 23:19           ` Stanislav Fomichev
  2023-03-30  9:51             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2023-03-29 23:19 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem

On 03/29, Jesper Dangaard Brouer wrote:

> On 29/03/2023 19.18, Stanislav Fomichev wrote:
> > On 03/29, Jesper Dangaard Brouer wrote:
> >
> > > On 28/03/2023 23.58, Stanislav Fomichev wrote:
> > > > On 03/28, Jesper Dangaard Brouer wrote:
> > > > > The RSS hash type specifies what portion of packet data NIC hardware used
> > > > > when calculating RSS hash value. The RSS types are focused on Internet
> > > > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
> > > > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
> > > > > primarily TCP vs UDP, but some hardware supports SCTP.
> > > >
> > > > > Hardware RSS types are differently encoded for each hardware NIC. Most
> > > > > hardware represent RSS hash type as a number. Determining L3 vs L4 often
> > > > > requires a mapping table as there often isn't a pattern or sorting
> > > > > according to ISO layer.
> > > >
> > > > > The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
> > > > > be seen as a number that is ordered according by ISO layer, and can be bit
> > > > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> > > > > for extending later while keeping these properties. This maps and unifies
> > > > > difference to hardware specific hashes.
> > > >
> > > > Looks good overall. Any reason we're making this specific layout?
> >
> > > One important goal is to have a simple/fast way to determining L3 vs L4,
> > > because a L4 hash can be used for flow handling (e.g. load-balancing).
> >
> > > We below layout you can:
> >
> > >   if (rss_type & XDP_RSS_TYPE_L4_MASK)
> > >     bool hw_hash_do_LB = true;
> >
> > > Or using it as a number:
> >
> > >   if (rss_type > XDP_RSS_TYPE_L4)
> > >     bool hw_hash_do_LB = true;
> >
> > Why is it strictly better then the following?
> >
> > if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}
> >

> See V2 I dropped the idea of this being a number (that idea was not a
> good idea).

👍

> > If we add some new L4 format, the bpf programs can be updated to support
> > it?
> >
> > > I'm very open to changes to my "specific" layout.  I am in doubt if
> > > using it as a number is the right approach and worth the trouble.
> >
> > > > Why not simply the following?
> > > >
> > > > enum {
> > > >     XDP_RSS_TYPE_NONE = 0,
> > > >     XDP_RSS_TYPE_IPV4 = BIT(0),
> > > >     XDP_RSS_TYPE_IPV6 = BIT(1),
> > > >     /* IPv6 with extension header. */
> > > >     /* let's note ^^^ it in the UAPI? */
> > > >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
> > > >     XDP_RSS_TYPE_UDP = BIT(3),
> > > >     XDP_RSS_TYPE_TCP = BIT(4),
> > > >     XDP_RSS_TYPE_SCTP = BIT(5),
> >
> > > We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive, they
> > > cannot be set at the same time, e.g. as a packet cannot both be UDP and
> > > TCP.  Thus, using these bits as a number make sense to me, and is more
> > > compact.
> >
> > [..]
> >
> > > This BIT() approach also have the issue of extending it later (forward
> > > compatibility).  As mentioned a common task will be to check if
> > > hash-type is a L4 type.  See mlx5 [patch 4/4] needed to extend with
> > > IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that this
> > > can be extended with new L4 types, such that existing progs will still
> > > work checking for L4 check.  It can of-cause be solved in the same way
> > > for this BIT() approach by reserving some bits upfront in a mask.
> >
> > We're using 6 bits out of 64, we should be good for awhile? If there
> > is ever a forward compatibility issue, we can always come up with
> > a new kfunc.

> I want/need store the RSS-type in the xdp_frame, for XDP_REDIRECT and
> SKB use-cases.  Thus, I don't want to use 64-bit/8-bytes, as xdp_frame
> size is limited (given it reduces headroom expansion).

> >
> > One other related question I have is: should we export the type
> > over some additional new kfunc argument? (instead of abusing the return
> > type)

> Good question. I was also wondering if it wouldn't be better to add
> another kfunc argument with the rss_hash_type?

> That will change the call signature, so that will not be easy to handle
> between kernel releases.

Agree with Toke on a separate thread; might not be too late to fit it
into an rc.

> > Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?

> Sure, if we define it as an argument, then it will automatically
> exported as BTF.

> > > > }
> > > >
> > > > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
> > > > XDP_RSS_TYPE_IPV6|XXX ?
> >
> > > Do notice, that I already does some level of or'ing ("|") in this
> > > proposal.  The main difference is that I hide this from the driver, and
> > > kind of pre-combine the valid combination (enum's) drivers can select
> > > from. I do get the point, and I think I will come up with a combined
> > > solution based on your input.
> >
> >
> > > The RSS hashing types and combinations comes from M$ standards:
> > >   [1]
> > > https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations
> >
> > My main concern here is that we're over-complicating it with the masks
> > and the format. With the explicit bits we can easily map to that
> > spec you mention.

> See if you like my RFC-V2 proposal better.
> It should go more in your direction.

Yeah, I like it better. Btw, why have a separate bit for XDP_RSS_BIT_EX?
Any reason it's not a XDP_RSS_L3_IPV6_EX within XDP_RSS_L3_MASK?
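The trade-off behind that question, as a sketch with two hypothetical layouts (neither is the V2 definition; all names are illustrative). With EX as its own flag, "is this IPv6?" is one AND regardless of extension headers; with IPv6-with-EX as a third value inside the L3 field, the encoding is more compact but the same question needs two comparisons:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout A (hypothetical): EX is a flag orthogonal to the L3 bits. */
#define A_L3_IPV4 (1u << 0)
#define A_L3_IPV6 (1u << 1)
#define A_EX      (1u << 2)

/* Layout B (hypothetical): IPv6-with-EX is a third value in a 2-bit
 * L3 field. */
#define B_L3_MASK    0x3u
#define B_L3_IPV4    0x1u
#define B_L3_IPV6    0x2u
#define B_L3_IPV6_EX 0x3u

/* A: one AND answers "is this IPv6?", with or without extension headers. */
static bool a_is_ipv6(uint32_t t) { return t & A_L3_IPV6; }

/* B: more compact, but the same question needs two comparisons. */
static bool b_is_ipv6(uint32_t t)
{
	uint32_t l3 = t & B_L3_MASK;

	return l3 == B_L3_IPV6 || l3 == B_L3_IPV6_EX;
}
```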

And the following part seems like a leftover from the earlier version:

+/* For partitioning of xdp_rss_hash_type */
+#define RSS_L3		GENMASK(2,0) /* 3-bits = values between 1-7 */
+#define L4_BIT		BIT(3)       /* 1-bit - L4 indication */
+#define RSS_L4_IPV4	GENMASK(6,4) /* 3-bits */
+#define RSS_L4_IPV6	GENMASK(9,7) /* 3-bits */
+#define RSS_L4		GENMASK(9,3) /* = 7-bits - covering L4 IPV4+IPV6 */
+#define L4_IPV6_EX_BIT	BIT(9)       /* 1-bit - L4 IPv6 with Extension hdr */
+				     /* 11-bits in total */

> > For example, for forward compat, I'm not sure we can assume that the people
> > will do:
> >      "rss_type & XDP_RSS_TYPE_L4_MASK"
> > instead of something like:
> >      "rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"
> >

> This code is allowed in V2 and should be. It is a choice of
> BPF-programmer in line-2 to not be forward compatible with newer L4 types.

> > > > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash() to return
> > > > > this RSS hash type on success.

> This is the real question (as also raised above)...
> Should we use return value or add an argument for type?

Let's fix the prototype while it's still early in the rc?
Maybe also extend the tests to drop/decode/verify the mask?
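As an example of what such a test could verify, a sketch of sanity checks on a returned type, assuming a hypothetical flag layout along the lines suggested earlier in this thread (the `T_*` names and `rss_type_valid()` are illustrative, not an existing selftest):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout along the lines suggested in this thread. */
#define T_IPV4 (1u << 0)
#define T_IPV6 (1u << 1)
#define T_EX   (1u << 2)
#define T_UDP  (1u << 3)
#define T_TCP  (1u << 4)
#define T_SCTP (1u << 5)
#define T_L3   (T_IPV4 | T_IPV6)
#define T_L4   (T_UDP | T_TCP | T_SCTP)

/* Checks a test could apply to the type returned for a received frame. */
static bool rss_type_valid(uint32_t t)
{
	uint32_t l3 = t & T_L3, l4 = t & T_L4;

	if (t & ~(T_L3 | T_EX | T_L4))
		return false; /* unknown bits */
	if ((l3 & (l3 - 1)) || (l4 & (l4 - 1)))
		return false; /* more than one L3 or L4 bit */
	if ((t & T_EX) && !(t & T_IPV6))
		return false; /* EX only makes sense for IPv6 */
	if (l4 && !l3)
		return false; /* an L4 hash implies an L3 type */
	return true;
}
```

The `x & (x - 1)` test is non-zero exactly when more than one bit is set, which encodes the "protocols are exclusive" rule discussed above.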

> --Jesper


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-29 23:19           ` Stanislav Fomichev
@ 2023-03-30  9:51             ` Jesper Dangaard Brouer
  2023-03-30 17:11               ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-30  9:51 UTC (permalink / raw)
  To: Stanislav Fomichev, Jesper Dangaard Brouer
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem


On 30/03/2023 01.19, Stanislav Fomichev wrote:
> On 03/29, Jesper Dangaard Brouer wrote:
> 
>> On 29/03/2023 19.18, Stanislav Fomichev wrote:
>> > On 03/29, Jesper Dangaard Brouer wrote:
>> >
>> > > On 28/03/2023 23.58, Stanislav Fomichev wrote:
>> > > > On 03/28, Jesper Dangaard Brouer wrote:
>> > > > > The RSS hash type specifies what portion of packet data NIC hardware used
>> > > > > when calculating RSS hash value. The RSS types are focused on Internet
>> > > > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
>> > > > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
>> > > > > primarily TCP vs UDP, but some hardware supports SCTP.
>> > > >
>> > > > > Hardware RSS types are differently encoded for each hardware NIC. Most
>> > > > > hardware represent RSS hash type as a number. Determining L3  vs L4 often
>> > > > > requires a mapping table as there often isn't a pattern or sorting
>> > > > > according to ISO layer.
>> > > >
>> > > > > The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
>> > > > > be seen as a number that is ordered according by ISO layer, and can be bit
>> > > > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
>> > > > > for extending later while keeping these properties. This maps and unifies
>> > > > > difference to hardware specific hashes.
>> > > >
>> > > > Looks good overall. Any reason we're making this specific layout?
>> >
>> > > One important goal is to have a simple/fast way to determining L3 vs L4,
>> > > because a L4 hash can be used for flow handling (e.g. load-balancing).
>> >
>> > > We below layout you can:
>> >
>> > >   if (rss_type & XDP_RSS_TYPE_L4_MASK)
>> > >     bool hw_hash_do_LB = true;
>> >
>> > > Or using it as a number:
>> >
>> > >   if (rss_type > XDP_RSS_TYPE_L4)
>> > >     bool hw_hash_do_LB = true;
>> >
>> > Why is it strictly better then the following?
>> >
>> > if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}
>> >
> 
>> See V2 I dropped the idea of this being a number (that idea was not a
>> good idea).
> 
> 👍
> 
>> > If we add some new L4 format, the bpf programs can be updated to support
>> > it?
>> >
>> > > I'm very open to changes to my "specific" layout.  I am in doubt if
>> > > using it as a number is the right approach and worth the trouble.
>> >
>> > > > Why not simply the following?
>> > > >
>> > > > enum {
>> > > >     XDP_RSS_TYPE_NONE = 0,
>> > > >     XDP_RSS_TYPE_IPV4 = BIT(0),
>> > > >     XDP_RSS_TYPE_IPV6 = BIT(1),
>> > > >     /* IPv6 with extension header. */
>> > > >     /* let's note ^^^ it in the UAPI? */
>> > > >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
>> > > >     XDP_RSS_TYPE_UDP = BIT(3),
>> > > >     XDP_RSS_TYPE_TCP = BIT(4),
>> > > >     XDP_RSS_TYPE_SCTP = BIT(5),
>> >
>> > > We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive, they
>> > > cannot be set at the same time, e.g. as a packet cannot both be UDP and
>> > > TCP.  Thus, using these bits as a number make sense to me, and is more
>> > > compact.

See below for why I was wrong (about storing these as numbers).

>> >
>> > [..]
>> >
>> > > This BIT() approach also have the issue of extending it later (forward
>> > > compatibility).  As mentioned a common task will be to check if
>> > > hash-type is a L4 type.  See mlx5 [patch 4/4] needed to extend with
>> > > IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that this
>> > > can be extended with new L4 types, such that existing progs will still
>> > > work checking for L4 check.  It can of-cause be solved in the same way
>> > > for this BIT() approach by reserving some bits upfront in a mask.
>> >
>> > We're using 6 bits out of 64, we should be good for awhile? If there
>> > is ever a forward compatibility issue, we can always come up with
>> > a new kfunc.
> 
>> I want/need store the RSS-type in the xdp_frame, for XDP_REDIRECT and
>> SKB use-cases.  Thus, I don't want to use 64-bit/8-bytes, as xdp_frame
>> size is limited (given it reduces headroom expansion).
> 
>> >
>> > One other related question I have is: should we export the type
>> > over some additional new kfunc argument? (instead of abusing the return
>> > type)
> 
>> Good question. I was also wondering if it wouldn't be better to add
>> another kfunc argument with the rss_hash_type?
> 
>> That will change the call signature, so that will not be easy to handle
>> between kernel releases.
> 
> Agree with Toke on a separate thread; might not be too late to fit it
> into an rc..
> 
>> > Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?
> 
>> Sure, if we define it as an argument, then it will automatically
>> exported as BTF.
> 
>> > > > }
>> > > >
>> > > > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
>> > > > XDP_RSS_TYPE_IPV6|XXX ?
>> >
>> > > Do notice, that I already does some level of or'ing ("|") in this
>> > > proposal.  The main difference is that I hide this from the  driver, and
>> > > kind of pre-combine the valid combination (enum's) drivers can select
>> > > from. I do get the point, and I think I will come up with a combined
>> > > solution based on your input.
>> >
>> >
>> > > The RSS hashing types and combinations comes from M$ standards:
>> > >   [1] https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations
>> >
>> > My main concern here is that we're over-complicating it with the masks
>> > and the format. With the explicit bits we can easily map to that
>> > spec you mention.
> 
>> See if you like my RFC-V2 proposal better.
>> It should go more in your direction.
> 
> Yeah, I like it better. Btw, why have a separate bit for XDP_RSS_BIT_EX?

Yes, we can rename the EX bit define (which is in V2). I shortened the
name because it kept the code on one line when OR'ing.

> Any reason it's not a XDP_RSS_L3_IPV6_EX within XDP_RSS_L3_MASK?
> 

Hmm... I guess it belongs with L3.

Note that both IPv4 and IPv6 can carry variable-length headers after
their fixed header, called options (IPv4) and extension headers (IPv6).
(Mlx4 HW provides this info for IPv4, but I didn't extend
xdp_rss_hash_type in that patch.)
Thus, we could have a single BIT that is valid for both IPv4 and IPv6.
(Having this info can help speed up packet parsing.)

[...]
> 
>> > For example, for forward compat, I'm not sure we can assume that the people
>> > will do:
>> >      "rss_type & XDP_RSS_TYPE_L4_MASK"
>> > instead of something like:
>> >      "rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"
>> >
> 
>> This code is allowed in V2 and should be. It is a choice of
>> BPF-programmer in line-2 to not be forward compatible with newer L4 
>> types.
> 

The above code made me realize that I was wrong and you are right: we
should represent the L4 types as BITs (not as numbers).
Even though a single packet cannot be both UDP and TCP at the same time,
it is reasonable for a code path to want to match both UDP and TCP. If
L4 types are BITs, the code can do a single compare (by OR'ing the bits
into one mask), whereas with numbers we need more compares.
Thus, I'll change the scheme in V3 to use BITs.
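Here is a minimal sketch of a BIT-based scheme along these lines. The names and bit positions (RSS_L3_*, RSS_L4_*, RSS_L3_MASK, RSS_L4_MASK) are hypothetical and are not the V3 or kernel definitions. Bits 3..7 are reserved as the L4 mask, so a future L4 type placed in bit 7 is still caught by existing "is this L4?" checks. A program that wants TCP or UDP can test both with one AND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical BIT-based RSS type layout (illustrative names/positions). */
enum rss_type_bits {
	RSS_TYPE_NONE = 0,
	RSS_L3_IPV4   = 1u << 0,
	RSS_L3_IPV6   = 1u << 1,
	RSS_L3_EX     = 1u << 2, /* IPv4 options / IPv6 extension headers */
	RSS_L4_TCP    = 1u << 3,
	RSS_L4_UDP    = 1u << 4,
	RSS_L4_SCTP   = 1u << 5,
	RSS_L4_IPSEC  = 1u << 6,
	/* bit 7: reserved for a future L4 type */
};

#define RSS_L3_MASK 0x0007u /* bits 0..2 */
#define RSS_L4_MASK 0x00f8u /* bits 3..7, including the reserved bit */

static_assert((RSS_L3_MASK & RSS_L4_MASK) == 0,
	      "L3 and L4 fields must not overlap");

/* Forward compatible: true for any current or future L4 type. */
static bool rss_is_l4(uint16_t t)
{
	return (t & RSS_L4_MASK) != 0;
}

/* Matches TCP or UDP with a single AND and test. */
static bool rss_is_tcp_or_udp(uint16_t t)
{
	return (t & (RSS_L4_TCP | RSS_L4_UDP)) != 0;
}
```

Under this layout, a driver would report IPv4 + UDP as RSS_L3_IPV4 | RSS_L4_UDP.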


>> > > > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash() 
>> > > > > to  return this RSS hash type on success.
> 
>> This is the real question (as also raised above)...
>> Should we use return value or add an argument for type?
> 
> Let's fix the prototype while it's still early in the rc?

Okay, in V3 I will propose adding an argument for the type then.
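For comparison with the return-value approach, here is a user-space sketch of the "extra argument" style. The function returns 0 or a negative errno and hands back both the hash and its type through out-parameters. mock_rx_hash(), struct fake_rx_desc and the -ENODATA convention are assumptions for illustration, not the kernel prototype.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for what a driver reads from an RX descriptor. */
struct fake_rx_desc {
	uint32_t hash;
	uint16_t rss_type; /* already translated to generic type bits */
	int has_hash;
};

/* Sketch of the out-parameter style: 0 on success, negative errno on
 * failure; hash and type are only written on success. */
static int mock_rx_hash(const struct fake_rx_desc *d, uint32_t *hash,
			uint16_t *rss_type)
{
	if (!d->has_hash)
		return -ENODATA;
	*hash = d->hash;
	*rss_type = d->rss_type;
	return 0;
}
```

Keeping errors in the return value and data in out-parameters means the type never has to share a value range with error codes.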

> Maybe also extend the tests to drop/decode/verify the mask?

Yes, I/we obviously need to update the selftests.

One problem with the selftests is that they use veth in SKB-based mode,
and SKBs have lost the RSS hash type info, reducing it to a single BIT
telling us whether the hash was L4 based. Thus, it's hard to do e.g. UDP
type verification, but I guess we can check that an expected UDP packet
has RSS type L4.
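To illustrate the selftest limitation, the sketch below models what survives the SKB path: only a boolean "L4 hash" flag, which a veth-like driver can turn back into at most "some L4 type". RSS_L4_ANY, the bit values and all helper names are hypothetical. For an expected UDP packet, the check then degrades to "is it L4?" in SKB mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generic bits (illustrative only). */
#define RSS_L3_IPV4 (1u << 0)
#define RSS_L4_UDP  (1u << 4)
#define RSS_L4_ANY  (1u << 7) /* "some L4 type", exact protocol unknown */
#define RSS_L4_MASK 0x00f8u

/* SKB mode keeps only a single "this hash was L4" flag. */
static bool skb_keeps_l4_flag(uint16_t hw_type)
{
	return (hw_type & RSS_L4_MASK) != 0;
}

/* A veth-like driver can only report "L4, protocol unknown" or nothing. */
static uint16_t type_from_skb_flag(bool l4_hash)
{
	return l4_hash ? RSS_L4_ANY : 0;
}

/* Selftest-style check for an expected UDP packet: in SKB mode only the
 * L4 property can be asserted, because the UDP detail was lost. */
static bool expected_udp_ok(uint16_t reported, bool skb_mode)
{
	if (skb_mode)
		return (reported & RSS_L4_MASK) != 0;
	return (reported & RSS_L4_UDP) != 0;
}
```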

In xdp_hw_metadata, I will add something that uses the RSS type bits. I
was thinking of matching against the L4-UDP RSS type, as the program only
AF_XDP-redirects UDP packets, so we can verify from the HW info that it
was a UDP packet.

--Jesper



* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-30  9:51             ` Jesper Dangaard Brouer
@ 2023-03-30 17:11               ` Stanislav Fomichev
  2023-03-30 18:52                 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2023-03-30 17:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem

On 03/30, Jesper Dangaard Brouer wrote:

> On 30/03/2023 01.19, Stanislav Fomichev wrote:
> > On 03/29, Jesper Dangaard Brouer wrote:
> >
> > > On 29/03/2023 19.18, Stanislav Fomichev wrote:
> > > > On 03/29, Jesper Dangaard Brouer wrote:
> > > >
> > > > > On 28/03/2023 23.58, Stanislav Fomichev wrote:
> > > > > > On 03/28, Jesper Dangaard Brouer wrote:
> > > > > > > The RSS hash type specifies what portion of packet data NIC hardware used
> > > > > > > when calculating RSS hash value. The RSS types are focused on Internet
> > > > > > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
> > > > > > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
> > > > > > > primarily TCP vs UDP, but some hardware supports SCTP.
> > > > > >
> > > > > > > Hardware RSS types are differently encoded for each hardware NIC. Most
> > > > > > > hardware represent RSS hash type as a number. Determining L3 vs L4 often
> > > > > > > requires a mapping table as there often isn't a pattern or sorting
> > > > > > > according to ISO layer.
> > > > > >
> > > > > > > The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
> > > > > > > be seen as a number that is ordered according by ISO layer, and can be bit
> > > > > > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> > > > > > > for extending later while keeping these properties. This maps and unifies
> > > > > > > difference to hardware specific hashes.
> > > > > >
> > > > > > Looks good overall. Any reason we're making this specific layout?
> > > >
> > > > > One important goal is to have a simple/fast way to determining L3 vs L4,
> > > > > because a L4 hash can be used for flow handling (e.g. load-balancing).
> > > >
> > > > > We below layout you can:
> > > >
> > > > >   if (rss_type & XDP_RSS_TYPE_L4_MASK)
> > > > >     bool hw_hash_do_LB = true;
> > > >
> > > > > Or using it as a number:
> > > >
> > > > >   if (rss_type > XDP_RSS_TYPE_L4)
> > > > >     bool hw_hash_do_LB = true;
> > > >
> > > > Why is it strictly better then the following?
> > > >
> > > > if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}
> > > >
> >
> > > See V2 I dropped the idea of this being a number (that idea was not a
> > > good idea).
> >
> > 👍
> >
> > > > If we add some new L4 format, the bpf programs can be updated to support
> > > > it?
> > > >
> > > > > I'm very open to changes to my "specific" layout.  I am in doubt if
> > > > > using it as a number is the right approach and worth the trouble.
> > > >
> > > > > > Why not simply the following?
> > > > > >
> > > > > > enum {
> > > > > >     XDP_RSS_TYPE_NONE = 0,
> > > > > >     XDP_RSS_TYPE_IPV4 = BIT(0),
> > > > > >     XDP_RSS_TYPE_IPV6 = BIT(1),
> > > > > >     /* IPv6 with extension header. */
> > > > > >     /* let's note ^^^ it in the UAPI? */
> > > > > >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
> > > > > >     XDP_RSS_TYPE_UDP = BIT(3),
> > > > > >     XDP_RSS_TYPE_TCP = BIT(4),
> > > > > >     XDP_RSS_TYPE_SCTP = BIT(5),
> > > >
> > > > > We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive, they
> > > > > cannot be set at the same time, e.g. as a packet cannot both be UDP and
> > > > > TCP.  Thus, using these bits as a number make sense to me, and is more
> > > > > compact.

> See below, why I'm wrong (in storing this as numbers).

> > > >
> > > > [..]
> > > >
> > > > > This BIT() approach also have the issue of extending it later (forward
> > > > > compatibility).  As mentioned a common task will be to check if
> > > > > hash-type is a L4 type.  See mlx5 [patch 4/4] needed to extend with
> > > > > IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that this
> > > > > can be extended with new L4 types, such that existing progs will still
> > > > > work checking for L4 check.  It can of-cause be solved in the same way
> > > > > for this BIT() approach by reserving some bits upfront in a mask.
> > > >
> > > > We're using 6 bits out of 64, we should be good for awhile? If there
> > > > is ever a forward compatibility issue, we can always come up with
> > > > a new kfunc.
> >
> > > I want/need store the RSS-type in the xdp_frame, for XDP_REDIRECT and
> > > SKB use-cases.  Thus, I don't want to use 64-bit/8-bytes, as xdp_frame
> > > size is limited (given it reduces headroom expansion).
> >
> > > >
> > > > One other related question I have is: should we export the type
> > > > over some additional new kfunc argument? (instead of abusing the return
> > > > type)
> >
> > > Good question. I was also wondering if it wouldn't be better to add
> > > another kfunc argument with the rss_hash_type?
> >
> > > That will change the call signature, so that will not be easy to handle
> > > between kernel releases.
> >
> > Agree with Toke on a separate thread; might not be too late to fit it
> > into an rc..
> >
> > > > Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?
> >
> > > Sure, if we define it as an argument, then it will automatically
> > > exported as BTF.
> >
> > > > > > }
> > > > > >
> > > > > > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
> > > > > > XDP_RSS_TYPE_IPV6|XXX ?
> > > >
> > > > > Do notice, that I already does some level of or'ing ("|") in this
> > > > > proposal.  The main difference is that I hide this from the driver, and
> > > > > kind of pre-combine the valid combination (enum's) drivers can select
> > > > > from. I do get the point, and I think I will come up with a combined
> > > > > solution based on your input.
> > > >
> > > >
> > > > > The RSS hashing types and combinations comes from M$ standards:
> > > > >   [1] https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations
> > > >
> > > > My main concern here is that we're over-complicating it with the masks
> > > > and the format. With the explicit bits we can easily map to that
> > > > spec you mention.
> >
> > > See if you like my RFC-V2 proposal better.
> > > It should go more in your direction.
> >
> > Yeah, I like it better. Btw, why have a separate bit for XDP_RSS_BIT_EX?

> Yes, we can rename the EX bit define (which is in V2).  I reduced the
> name-length, because it allowed to keep code on-one-line when OR'ing.

> > Any reason it's not a XDP_RSS_L3_IPV6_EX within XDP_RSS_L3_MASK?
> >

> Hmm... I guess it belongs with L3.

> Do notice that both IPv4 and IPv6 have a flexible header called either
> options/extensions headers, after their fixed header. (Mlx4 HW contains this
> info for IPv4, but I didn't extend xdp_rss_hash_type in that patch).
> Thus, we could have a single BIT that is valid for both IPv4 and IPv6.
> (This can help speedup packet parsing having this info).

A separate bit for both v4/v6 sounds good. But thinking more about it,
not sure what the users are supposed to do with it. Whether the flow is
hashed over the extension header should be a config option, not a
per-packet signal?

> [...]
> >
> > > > For example, for forward compat, I'm not sure we can assume that the people
> > > > will do:
> > > >      "rss_type & XDP_RSS_TYPE_L4_MASK"
> > > > instead of something like:
> > > >      "rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"
> > > >
> >
> > > This code is allowed in V2 and should be. It is a choice of
> > > BPF-programmer in line-2 to not be forward compatible with newer L4
> > > types.
> >

> The above code made me realize, I was wrong and you are right, we should
> represent the L4 types as BITs (and not as numbers).
> Even-though a single packet cannot be both UDP and TCP at the same time,
> then it is reasonable to have a code path that want to match both UDP
> and TCP.  If L4 types are BITs then code can do a single compare (via
> ORing), while if they are numbers then we need more compares.
> Thus, I'll change scheme in V3 to use BITs.

So you are saying that the following:
	if (rss_type & (TCP|UDP))

is much faster than the following:
	proto = rss_type & L4_MASK;
	if (proto == TCP || proto == UDP)

?
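The two styles being compared can be written side by side. Both encodings below are hypothetical. In one, each L4 protocol owns a bit. In the other, the protocol is a small number stored in bits 3..5. The bit version needs one AND plus a test. The number version extracts the field and then compares it against each wanted value. The exact instruction count for each depends on the compiler and target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encoding A (hypothetical): each L4 protocol is its own bit. */
#define A_TCP  (1u << 3)
#define A_UDP  (1u << 4)
#define A_SCTP (1u << 5)

/* Encoding B (hypothetical): L4 protocol stored as a number in bits 3..5. */
#define B_L4_SHIFT 3
#define B_L4_MASK  (0x7u << B_L4_SHIFT)
#define B_TCP      (1u << B_L4_SHIFT)
#define B_UDP      (2u << B_L4_SHIFT)
#define B_SCTP     (3u << B_L4_SHIFT)

/* Bits: a single AND and test covers both protocols. */
static bool tcp_or_udp_bits(uint16_t t)
{
	return (t & (A_TCP | A_UDP)) != 0;
}

/* Numbers: extract the field, then compare against each value. */
static bool tcp_or_udp_num(uint16_t t)
{
	uint16_t proto = t & B_L4_MASK;

	return proto == B_TCP || proto == B_UDP;
}
```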

idk, as long as we have enough bits to represent everything, I'm fine
either way, up to you. (Not sure how much you want to constrain the data
to fit it into xdp_frame; assuming u16 is fine?)


> > > > > > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash()
> > > > > > > to return this RSS hash type on success.
> >
> > > This is the real question (as also raised above)...
> > > Should we use return value or add an argument for type?
> >
> > Let's fix the prototype while it's still early in the rc?

> Okay, in V3 I will propose adding an argument for the type then.

SG, thx!

> > Maybe also extend the tests to drop/decode/verify the mask?

> Yes, I/we obviously need to update the selftests.

> One problem with selftests is that it's using veth SKB-based mode, and
> SKB's have lost the RSS hash info and converted this into a single BIT
> telling us if this was L4 based.  Thus, its hard to do some e.g. UDP
> type verification, but I guess we can check if expected UDP packet is
> RSS type L4.

Yeah, sounds fair.

> In xdp_hw_metadata, I will add something that uses the RSS type bits.  I
> was thinking to match against L4-UDP RSS type as program only AF_XDP
> redirect UDP packets, so we can verify it was a UDP packet by HW info.

Or maybe just dump it, idk.
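If the tool just dumps the type, a small decoder is enough. This sketch assumes hypothetical bit names in positions 0..6 and prints any unknown bit as "bitN", so newer types stay visible without a tool update.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for bits 0..6 (illustrative only). */
static const char *const rss_bit_names[] = {
	"IPV4", "IPV6", "EX", "TCP", "UDP", "SCTP", "IPSEC",
};

/* Render the set bits as e.g. "IPV4|UDP" into buf (len must be > 0).
 * Unknown bits are printed as "bitN"; output is truncated if buf is
 * too small. Returns buf. */
static char *rss_type_str(uint16_t t, char *buf, size_t len)
{
	size_t n = sizeof(rss_bit_names) / sizeof(rss_bit_names[0]);
	size_t used = 0;

	buf[0] = '\0';
	if (!t) {
		snprintf(buf, len, "NONE");
		return buf;
	}
	for (unsigned int i = 0; i < 16; i++) {
		const char *sep = used ? "|" : "";
		int w;

		if (!(t & (1u << i)))
			continue;
		if (i < n)
			w = snprintf(buf + used, len - used, "%s%s", sep, rss_bit_names[i]);
		else
			w = snprintf(buf + used, len - used, "%sbit%u", sep, i);
		if (w < 0 || (size_t)w >= len - used)
			break; /* out of space: keep the truncated string */
		used += (size_t)w;
	}
	return buf;
}
```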

> --Jesper



* Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
  2023-03-30 17:11               ` Stanislav Fomichev
@ 2023-03-30 18:52                 ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2023-03-30 18:52 UTC (permalink / raw)
  To: Stanislav Fomichev, Jesper Dangaard Brouer
  Cc: brouer, bpf, netdev, martin.lau, ast, daniel, alexandr.lobakin,
	larysa.zaremba, xdp-hints, anthony.l.nguyen, yoong.siang.song,
	boon.leong.ong, intel-wired-lan, pabeni, jesse.brandeburg, kuba,
	edumazet, john.fastabend, hawk, davem



On 30/03/2023 19.11, Stanislav Fomichev wrote:
> On 03/30, Jesper Dangaard Brouer wrote:
> 
>> On 30/03/2023 01.19, Stanislav Fomichev wrote:
>> > On 03/29, Jesper Dangaard Brouer wrote:
>> >
>> > > On 29/03/2023 19.18, Stanislav Fomichev wrote:
>> > > > On 03/29, Jesper Dangaard Brouer wrote:
>> > > >
>> > > > > On 28/03/2023 23.58, Stanislav Fomichev wrote:
>> > > > > > On 03/28, Jesper Dangaard Brouer wrote:
>> > > > > > > The RSS hash type specifies what portion of packet data  NIC hardware used
>> > > > > > > when calculating RSS hash value. The RSS types are focused on Internet
>> > > > > > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
>> > > > > > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
>> > > > > > > primarily TCP vs UDP, but some hardware supports SCTP.
>> > > > > >
>> > > > > > > Hardware RSS types are differently encoded for each  hardware NIC. Most
>> > > > > > > hardware represent RSS hash type as a number. Determining L3 vs L4 often
>> > > > > > > requires a mapping table as there often isn't a pattern or sorting
>> > > > > > > according to ISO layer.
>> > > > > >
[...]
>> > Any reason it's not a XDP_RSS_L3_IPV6_EX within XDP_RSS_L3_MASK?
>> >
> 
>> Hmm... I guess it belongs with L3.
> 
>> Do notice that both IPv4 and IPv6 have a flexible header called either
>> options/extensions headers, after their fixed header. (Mlx4 HW contains this
>> info for IPv4, but I didn't extend xdp_rss_hash_type in that patch).
>> Thus, we could have a single BIT that is valid for both IPv4 and IPv6.
>> (This can help speedup packet parsing having this info).
> 
> A separate bit for both v4/v6 sounds good. But thinking more about it,
> not sure what the users are supposed to do with it. Whether the flow is 
> hashed over the extension header should a config option, not a per-packet signal?
> 

Microsoft defines which parts of the IPv6 extension headers replace the
source address (Home Address) and the destination address (Routing
Header Type 2) in the hash calculation, here[1]:

  [1] https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ndis_hash_ipv6_ex

The igc/i225 chip reports, per packet, RSS types that include the _EX
variants. Thus, I implemented it on a per-packet basis.
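Per-packet reporting usually means a per-driver lookup from the descriptor's RSS type number to the generic bits. The hardware codes below are hypothetical (real values are device specific, including for igc), and so are the generic bit names. The point is the shape of the table, and that unknown codes fall back to "no type".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic bits (illustrative only). */
#define RSS_L3_IPV4 (1u << 0)
#define RSS_L3_IPV6 (1u << 1)
#define RSS_L3_EX   (1u << 2)
#define RSS_L4_TCP  (1u << 3)
#define RSS_L4_UDP  (1u << 4)

/* Hypothetical per-packet hardware RSS type codes from an RX descriptor. */
enum hw_rss_code {
	HW_NONE, HW_IPV4, HW_IPV4_TCP, HW_IPV4_UDP,
	HW_IPV6, HW_IPV6_EX, HW_IPV6_TCP, HW_IPV6_TCP_EX,
	HW_IPV6_UDP, HW_IPV6_UDP_EX,
	HW_NUM_CODES
};

/* Lookup table: hardware number -> generic bit combination. */
static const uint16_t hw_to_rss[HW_NUM_CODES] = {
	[HW_NONE]        = 0,
	[HW_IPV4]        = RSS_L3_IPV4,
	[HW_IPV4_TCP]    = RSS_L3_IPV4 | RSS_L4_TCP,
	[HW_IPV4_UDP]    = RSS_L3_IPV4 | RSS_L4_UDP,
	[HW_IPV6]        = RSS_L3_IPV6,
	[HW_IPV6_EX]     = RSS_L3_IPV6 | RSS_L3_EX,
	[HW_IPV6_TCP]    = RSS_L3_IPV6 | RSS_L4_TCP,
	[HW_IPV6_TCP_EX] = RSS_L3_IPV6 | RSS_L3_EX | RSS_L4_TCP,
	[HW_IPV6_UDP]    = RSS_L3_IPV6 | RSS_L4_UDP,
	[HW_IPV6_UDP_EX] = RSS_L3_IPV6 | RSS_L3_EX | RSS_L4_UDP,
};

/* Unknown or out-of-range codes map to "no RSS type". */
static uint16_t map_hw_rss(unsigned int code)
{
	return code < HW_NUM_CODES ? hw_to_rss[code] : 0;
}
```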


>> [...]
>> >
>> > > > For example, for forward compat, I'm not sure we can assume that the people
>> > > > will do:
>> > > >      "rss_type & XDP_RSS_TYPE_L4_MASK"
>> > > > instead of something like:
>> > > >      "rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"
>> > > >
>> >
>> > > This code is allowed in V2 and should be. It is a choice of
>> > > BPF-programmer in line-2 to not be forward compatible with newer L4
>> > > types.
>> >
> 
>> The above code made me realize, I was wrong and you are right, we should
>> represent the L4 types as BITs (and not as numbers).
>> Even-though a single packet cannot be both UDP and TCP at the same time,
>> then it is reasonable to have a code path that want to match both UDP
>> and TCP.  If L4 types are BITs then code can do a single compare (via
>> ORing), while if they are numbers then we need more compares.
>> Thus, I'll change scheme in V3 to use BITs.
> 
> So you are saying that the following:
>      if (rss_type & (TCP|UDP))
> 
> is much faster than the following:
>      proto = rss_type & L4_MASK;
>      if (proto == TCP || proto == UDP)
> 
> ?

For XDP, every instruction/cycle counts.
Just to make sure, I checked it with godbolt.org: 3 vs 4 instructions.

> 
> idk, as long as we have enough bits to represent everything, I'm fine
> with either way, up to you. (not sure how much you want to constrain the
> data to fit it into xdp_frame; assuming u16 is fine?)

Yes, u16 is fine.
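This sketch checks that the type fits the agreed u16 when it is stored next to the hash in per-frame metadata. struct frame_rx_meta and RSS_TYPE_HIGHEST_BIT are hypothetical; the real xdp_frame layout is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: highest generic RSS type bit currently defined. */
#define RSS_TYPE_HIGHEST_BIT 7

/* Hypothetical compact per-frame RX metadata: 32-bit hash + u16 type. */
struct frame_rx_meta {
	uint32_t rx_hash;
	uint16_t rss_type;
};

static_assert(RSS_TYPE_HIGHEST_BIT < 16, "all RSS type bits must fit in a u16");
static_assert(sizeof(((struct frame_rx_meta *)0)->rss_type) == 2,
	      "rss_type is stored as 2 bytes");

static void meta_store(struct frame_rx_meta *m, uint32_t hash, uint16_t type)
{
	m->rx_hash = hash;
	m->rss_type = type;
}
```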

> 
> 
>> > > > > > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash()
>> > > > > > > to return this RSS hash type on success.
>> >
>> > > This is the real question (as also raised above)...
>> > > Should we use return value or add an argument for type?
>> >
>> > Let's fix the prototype while it's still early in the rc?
> 
>> Okay, in V3 I will propose adding an argument for the type then.
> 
> SG, thx!

> 
>> > Maybe also extend the tests to drop/decode/verify the mask?
> 
>> Yes, I/we obviously need to update the selftests.
> 
>> One problem with selftests is that it's using veth SKB-based mode, and
>> SKB's have lost the RSS hash info and converted this into a single BIT
>> telling us if this was L4 based.  Thus, its hard to do some e.g. UDP
>> type verification, but I guess we can check if expected UDP packet is
>> RSS type L4.
> 
> Yeah, sounds fair.
> 
>> In xdp_hw_metadata, I will add something that uses the RSS type bits.  I
>> was thinking to match against L4-UDP RSS type as program only AF_XDP
>> redirect UDP packets, so we can verify it was a UDP packet by HW info.
> 
> Or maybe just dump it, idk.




end of thread, other threads:[~2023-03-30 18:53 UTC | newest]

Thread overview: 15+ messages
2023-03-28 20:15 [PATCH bpf RFC 0/4] XDP-hints: API change for RX-hash kfunc bpf_xdp_metadata_rx_hash Jesper Dangaard Brouer
2023-03-28 20:15 ` [PATCH bpf RFC 1/4] xdp: rss hash types representation Jesper Dangaard Brouer
2023-03-28 21:58   ` Stanislav Fomichev
2023-03-29 11:23     ` Jesper Dangaard Brouer
2023-03-29 17:18       ` Stanislav Fomichev
2023-03-29 18:19         ` Jesper Dangaard Brouer
2023-03-29 23:19           ` Stanislav Fomichev
2023-03-30  9:51             ` Jesper Dangaard Brouer
2023-03-30 17:11               ` Stanislav Fomichev
2023-03-30 18:52                 ` Jesper Dangaard Brouer
2023-03-29  8:10   ` Edward Cree
2023-03-29 12:13     ` [xdp-hints] " Jesper Dangaard Brouer
2023-03-28 20:16 ` [PATCH bpf RFC 2/4] igc: bpf_xdp_metadata_rx_hash return xdp rss hash type Jesper Dangaard Brouer
2023-03-28 20:16 ` [PATCH bpf RFC 3/4] veth: " Jesper Dangaard Brouer
2023-03-28 20:16 ` [PATCH bpf RFC 4/4] mlx5: " Jesper Dangaard Brouer
