Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation

From: Stanislav Fomichev <sdf@google.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: brouer@redhat.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	martin.lau@kernel.org, ast@kernel.org, daniel@iogearbox.net,
	alexandr.lobakin@intel.com, larysa.zaremba@intel.com,
	xdp-hints@xdp-project.net, anthony.l.nguyen@intel.com,
	yoong.siang.song@intel.com, boon.leong.ong@intel.com,
	intel-wired-lan@lists.osuosl.org, pabeni@redhat.com,
	jesse.brandeburg@intel.com, kuba@kernel.org, edumazet@google.com,
	john.fastabend@gmail.com, hawk@kernel.org, davem@davemloft.net
Subject: Re: [PATCH bpf RFC 1/4] xdp: rss hash types representation
Date: Wed, 29 Mar 2023 10:18:17 -0700
Message-ID: <ZCRy2f170FQ+fXsp@google.com>
In-Reply-To: <811724e2-cdd6-15fe-b176-9dfcdbd98bad@redhat.com>

On 03/29, Jesper Dangaard Brouer wrote:

> On 28/03/2023 23.58, Stanislav Fomichev wrote:
> > On 03/28, Jesper Dangaard Brouer wrote:
> > > The RSS hash type specifies what portion of packet data NIC hardware used
> > > when calculating RSS hash value. The RSS types are focused on Internet
> > > traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
> > > value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
> > > primarily TCP vs UDP, but some hardware supports SCTP.
> >
> > > Hardware RSS types are differently encoded for each hardware NIC. Most
> > > hardware represent RSS hash type as a number. Determining L3 vs L4 often
> > > requires a mapping table as there often isn't a pattern or sorting
> > > according to ISO layer.
> >
> > > The patch introduce a XDP RSS hash type (xdp_rss_hash_type) that can both
> > > be seen as a number that is ordered according by ISO layer, and can be bit
> > > masked to separate IPv4 and IPv6 types for L4 protocols. Room is available
> > > for extending later while keeping these properties. This maps and unifies
> > > difference to hardware specific hashes.
> >
> > Looks good overall. Any reason we're making this specific layout?

> One important goal is to have a simple/fast way to determining L3 vs L4,
> because a L4 hash can be used for flow handling (e.g. load-balancing).

> With the layout below you can:

>   if (rss_type & XDP_RSS_TYPE_L4_MASK)
> 	hw_hash_do_LB = true;

> Or using it as a number:

>   if (rss_type > XDP_RSS_TYPE_L4)
> 	hw_hash_do_LB = true;

Why is it strictly better than the following?

if (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) {}

If we add some new L4 format, the bpf programs can be updated to support
it?
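
To make the comparison concrete, here is a minimal userspace sketch of the
explicit-bits check. The TYPE_* names and bit positions are assumptions that
mirror the BIT() layout suggested in this thread, not an existing UAPI, and
rss_type_is_l4() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-bit-per-protocol layout (assumption: it mirrors the
 * BIT() enum suggested in this thread, it is not a kernel header). */
enum {
	TYPE_NONE    = 0,
	TYPE_IPV4    = 1u << 0,
	TYPE_IPV6    = 1u << 1,
	TYPE_IPV6_EX = 1u << 2,
	TYPE_UDP     = 1u << 3,
	TYPE_TCP     = 1u << 4,
	TYPE_SCTP    = 1u << 5,
};

/* True when the hash covers an L4 header, i.e. it can serve as a flow key. */
static bool rss_type_is_l4(uint32_t rss_type)
{
	return (rss_type & (TYPE_UDP | TYPE_TCP | TYPE_SCTP)) != 0;
}
```

With this layout a new L4 protocol means one more bit in the OR, which is
the "update the bpf program" case mentioned above.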

> I'm very open to changes to my "specific" layout.  I am in doubt if
> using it as a number is the right approach and worth the trouble.

> > Why not simply the following?
> >
> > enum {
> >     XDP_RSS_TYPE_NONE = 0,
> >     XDP_RSS_TYPE_IPV4 = BIT(0),
> >     XDP_RSS_TYPE_IPV6 = BIT(1),
> >     /* IPv6 with extension header. */
> >     /* let's note ^^^ it in the UAPI? */
> >     XDP_RSS_TYPE_IPV6_EX = BIT(2),
> >     XDP_RSS_TYPE_UDP = BIT(3),
> >     XDP_RSS_TYPE_TCP = BIT(4),
> >     XDP_RSS_TYPE_SCTP = BIT(5),

> We know these bits for UDP, TCP, SCTP (and IPSEC) are exclusive, they
> cannot be set at the same time, e.g. as a packet cannot both be UDP and
> TCP.  Thus, using these bits as a number makes sense to me, and is more
> compact.

[..]

> This BIT() approach also has the issue of extending it later (forward
> compatibility).  As mentioned a common task will be to check if
> hash-type is a L4 type.  See mlx5 [patch 4/4] needed to extend with
> IPSEC. Notice how my XDP_RSS_TYPE_L4_MASK covers all the bits that this
> can be extended with new L4 types, such that existing progs will still
> work when checking for L4.  It can of course be solved in the same way
> for this BIT() approach by reserving some bits upfront in a mask.

We're using 6 bits out of 64, we should be good for a while? If there
is ever a forward compatibility issue, we can always come up with
a new kfunc.

One other related question I have is: should we export the type
over some additional new kfunc argument? (instead of abusing the return
type) Maybe that will let us drop the explicit BTF_TYPE_EMIT as well?
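
Roughly what I have in mind, as a userspace sketch only. Everything here is
a hypothetical stand-in (the stub context struct, rx_hash_stub() and its
third argument); it is not the current kfunc signature:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the XDP context so this builds in userspace. */
struct xdp_md_stub {
	uint32_t hw_hash;	/* hash as reported by the (imaginary) NIC */
	uint32_t hw_rss_type;	/* already translated to the XDP type bits */
	int has_hash;		/* 0 when the descriptor carries no hash */
};

/* Hypothetical shape: hash and type both come back through pointers,
 * so the return value stays a plain 0 or -errno. */
static int rx_hash_stub(const struct xdp_md_stub *ctx, uint32_t *hash,
			uint32_t *rss_type)
{
	if (!ctx->has_hash)
		return -ENODATA;

	*hash = ctx->hw_hash;
	*rss_type = ctx->hw_rss_type;
	return 0;
}
```

The caller then only tests the return value for errors and reads the type
from its own variable, instead of decoding a positive return value.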

> > }
> >
> > And then using XDP_RSS_TYPE_IPV4|XDP_RSS_TYPE_UDP vs
> > XDP_RSS_TYPE_IPV6|XXX ?

> Do notice that I already do some level of or'ing ("|") in this
> proposal.  The main difference is that I hide this from the driver, and
> kind of pre-combine the valid combination (enum's) drivers can select
> from. I do get the point, and I think I will come up with a combined
> solution based on your input.

> The RSS hashing types and combinations comes from M$ standards:
>   [1] https://learn.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types#ipv4-hash-type-combinations

My main concern here is that we're over-complicating it with the masks
and the format. With the explicit bits we can easily map to that
spec you mention.

For example, for forward compat, I'm not sure we can assume that people
will do:
	"rss_type & XDP_RSS_TYPE_L4_MASK"
instead of something like:
	"rss_type & (XDP_RSS_TYPE_L4_IPV4_TCP|XDP_RSS_TYPE_L4_IPV4_UDP)"

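As a worked example of that concern, here is a minimal userspace sketch that
re-creates a few values from the quoted patch by hand. The shortened names
and both helper functions are made up for illustration, and it assumes
drivers report exactly one enum value. If I'm reading the encoding right,
TCP/UDP/SCTP are field values rather than independent bits, so a hand-written
OR mask also matches IPv6 TCP through the shared L4 bit:

```c
#include <stdint.h>

/* Hand-expanded field layout from the quoted patch (userspace copy). */
#define L4_BIT        (1u << 3)	/* BIT(3) */
#define L4_IPV4_SHIFT 4		/* lowest bit of GENMASK(6,4) */
#define L4_IPV6_SHIFT 7		/* lowest bit of GENMASK(9,7) */
#define RSS_L4_MASK   0x3f8u	/* GENMASK(9,3) */

enum {
	L4_IPV4_TCP  = L4_BIT | (1u << L4_IPV4_SHIFT),	/* 0x18 */
	L4_IPV4_UDP  = L4_BIT | (2u << L4_IPV4_SHIFT),	/* 0x28 */
	L4_IPV4_SCTP = L4_BIT | (3u << L4_IPV4_SHIFT),	/* 0x38 */
	L4_IPV6_TCP  = L4_BIT | (1u << L4_IPV6_SHIFT),	/* 0x88 */
};

/* The check a program author might plausibly write by hand. */
static int naive_ipv4_tcp_udp(uint32_t rss_type)
{
	return (rss_type & (L4_IPV4_TCP | L4_IPV4_UDP)) != 0;
}

/* What is needed instead with a number-style encoding: exact compares. */
static int exact_ipv4_tcp_udp(uint32_t rss_type)
{
	return rss_type == L4_IPV4_TCP || rss_type == L4_IPV4_UDP;
}
```

Here naive_ipv4_tcp_udp(L4_IPV6_TCP) is true because 0x88 & 0x38 leaves the
L4 bit set, and L4_IPV4_TCP | L4_IPV4_UDP happens to equal L4_IPV4_SCTP.
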
> > > This proposal change the kfunc API bpf_xdp_metadata_rx_hash() to return
> > > this RSS hash type on success.
> >
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > ---
> > >  include/net/xdp.h |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  net/core/xdp.c    |    4 +++-
> > >  2 files changed, 54 insertions(+), 1 deletion(-)
> >
> > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > index 5393b3ebe56e..63f462f5ea7f 100644
> > > --- a/include/net/xdp.h
> > > +++ b/include/net/xdp.h
> > > @@ -8,6 +8,7 @@
> >
> > >  #include <linux/skbuff.h> /* skb_shared_info */
> > >  #include <uapi/linux/netdev.h>
> > > +#include <linux/bitfield.h>
> >
> > >  /**
> > >   * DOC: XDP RX-queue information
> > > @@ -396,6 +397,56 @@ XDP_METADATA_KFUNC_xxx
> > >  MAX_XDP_METADATA_KFUNC,
> > >  };
> >
> > > +/* For partitioning of xdp_rss_hash_type */
> > > +#define RSS_L3          GENMASK(2,0) /* 3-bits = values between 1-7 */
> > > +#define L4_BIT          BIT(3)       /* 1-bit - L4 indication */
> > > +#define RSS_L4_IPV4     GENMASK(6,4) /* 3-bits */
> > > +#define RSS_L4_IPV6     GENMASK(9,7) /* 3-bits */
> > > +#define RSS_L4          GENMASK(9,3) /* = 7-bits - covering L4 IPV4+IPV6 */
> > > +#define L4_IPV6_EX_BIT  BIT(9)       /* 1-bit - L4 IPv6 with Extension hdr */
> > > +                                     /* 11-bits in total */
> > > +
> > > +/* The XDP RSS hash type (xdp_rss_hash_type) can both be seen as a number that
> > > + * is ordered according by ISO layer, and can be bit masked to separate IPv4 and
> > > + * IPv6 types for L4 protocols. Room is available for extending later while
> > > + * keeping above properties, as this need to cover NIC hardware RSS types.
> > > + */
> > > +enum xdp_rss_hash_type {
> > > +    XDP_RSS_TYPE_NONE            = 0,
> > > +    XDP_RSS_TYPE_L2              = XDP_RSS_TYPE_NONE,
> > > +
> > > +    XDP_RSS_TYPE_L3_MASK         = RSS_L3,
> > > +    XDP_RSS_TYPE_L3_IPV4         = FIELD_PREP_CONST(RSS_L3, 1),
> > > +    XDP_RSS_TYPE_L3_IPV6         = FIELD_PREP_CONST(RSS_L3, 2),
> > > +    XDP_RSS_TYPE_L3_IPV6_EX      = FIELD_PREP_CONST(RSS_L3, 4),
> > > +
> > > +    XDP_RSS_TYPE_L4_MASK         = RSS_L4,
> > > +    XDP_RSS_TYPE_L4_SHIFT        = __bf_shf(RSS_L4),
> > > +    XDP_RSS_TYPE_L4_MASK_EX      = RSS_L4 | L4_IPV6_EX_BIT,
> > > +
> > > +    XDP_RSS_TYPE_L4_IPV4_MASK    = RSS_L4_IPV4,
> > > +    XDP_RSS_TYPE_L4_BIT          = L4_BIT,
> > > +    XDP_RSS_TYPE_L4_IPV4_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 1),
> > > +    XDP_RSS_TYPE_L4_IPV4_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 2),
> > > +    XDP_RSS_TYPE_L4_IPV4_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV4, 3),
> > > +
> > > +    XDP_RSS_TYPE_L4_IPV6_MASK    = RSS_L4_IPV6,
> > > +    XDP_RSS_TYPE_L4_IPV6_TCP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 1),
> > > +    XDP_RSS_TYPE_L4_IPV6_UDP     = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 2),
> > > +    XDP_RSS_TYPE_L4_IPV6_SCTP    = L4_BIT|FIELD_PREP_CONST(RSS_L4_IPV6, 3),
> > > +
> > > +    XDP_RSS_TYPE_L4_IPV6_EX_MASK = L4_IPV6_EX_BIT,
> > > +    XDP_RSS_TYPE_L4_IPV6_TCP_EX  = XDP_RSS_TYPE_L4_IPV6_TCP|L4_IPV6_EX_BIT,
> > > +    XDP_RSS_TYPE_L4_IPV6_UDP_EX  = XDP_RSS_TYPE_L4_IPV6_UDP|L4_IPV6_EX_BIT,
> > > +    XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP|L4_IPV6_EX_BIT,
> > > +};
> > > +#undef RSS_L3
> > > +#undef L4_BIT
> > > +#undef RSS_L4_IPV4
> > > +#undef RSS_L4_IPV6
> > > +#undef RSS_L4
> > > +#undef L4_IPV6_EX_BIT
> > > +
> > >  #ifdef CONFIG_NET
> > >  u32 bpf_xdp_metadata_kfunc_id(int id);
> > >  bool bpf_dev_bound_kfunc_id(u32 btf_id);
> > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > index 7133017bcd74..81d41df30695 100644
> > > --- a/net/core/xdp.c
> > > +++ b/net/core/xdp.c
> > > @@ -721,12 +721,14 @@ __bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *tim
> > >   * @hash: Return value pointer.
> > >   *
> > >   * Return:
> > > - * * Returns 0 on success or ``-errno`` on error.
> > > + * * Returns (positive) RSS hash **type** on success or ``-errno`` on error.
> > > + * * ``enum xdp_rss_hash_type`` : RSS hash type
> > >   * * ``-EOPNOTSUPP`` : means device driver doesn't implement kfunc
> > >   * * ``-ENODATA``    : means no RX-hash available for this frame
> > >   */
> > >  __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
> > >  {
> > > +    BTF_TYPE_EMIT(enum xdp_rss_hash_type);
> > >      return -EOPNOTSUPP;
> > >  }
> >
> >
> >