All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Stanislav Fomichev <sdf@google.com>, bpf@vger.kernel.org
Cc: brouer@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
	yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org,
	haoluo@google.com, jolsa@kernel.org,
	David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	Maryam Tahhan <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata
Date: Mon, 16 Jan 2023 17:21:08 +0100	[thread overview]
Message-ID: <a5ce7ac4-7901-6146-2c2a-5b4958c14e11@redhat.com> (raw)
In-Reply-To: <20230112003230.3779451-11-sdf@google.com>



On 12/01/2023 01.32, Stanislav Fomichev wrote:
> The goal is to enable end-to-end testing of the metadata for AF_XDP.

For me the goal with veth goes beyond *testing*.

This patch ignores the xdp_frame case.  I'm not blocking this patch, but
I'm saying we need to make sure there is a way forward for accessing
XDP-hints when handling redirected xdp_frame's.

I have two use-cases we should cover (as future work).

(#1) We have customers that want to redirect from physical NIC hardware
into containers, and then have the veth XDP-prog (selectively) redirect
into an AF_XDP socket (when matching fastpath packets).  Here they
(minimum) want access to the XDP hint info on HW checksum.

(#2) Both veth and cpumap can create SKBs based on xdp_frame's.  Here it
is essential to get HW checksum and HW hash when creating these SKBs
(else netstack have to do expensive csum calc and parsing in
flow-dissector).

> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   drivers/net/veth.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 70f50602287a..ba3e05832843 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -118,6 +118,7 @@ static struct {
>   
>   struct veth_xdp_buff {
>   	struct xdp_buff xdp;
> +	struct sk_buff *skb;
>   };
>   
>   static int veth_get_link_ksettings(struct net_device *dev,
> @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
>   
>   		xdp_convert_frame_to_buff(frame, xdp);
>   		xdp->rxq = &rq->xdp_rxq;
> +		vxbuf.skb = NULL;
>   
>   		act = bpf_prog_run_xdp(xdp_prog, xdp);
>   
> @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>   	__skb_push(skb, skb->data - skb_mac_header(skb));
>   	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>   		goto drop;
> +	vxbuf.skb = skb;
>   
>   	orig_data = xdp->data;
>   	orig_data_end = xdp->data_end;
> @@ -1602,6 +1605,28 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   	}
>   }
>   
> +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> +{
> +	struct veth_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (!_ctx->skb)
> +		return -EOPNOTSUPP;
> +
> +	*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;

The SKB stores this skb_hwtstamps() in skb_shared_info memory area.
This memory area is actually also available to xdp_frames.  Thus, we
could store the HW rx_timestamp in same location for redirected
xdp_frames.  This could make code path sharing possible between SKB vs
xdp_frame in veth.

This would also make it fast to "transfer" HW rx_timestamp when creating
an SKB from an xdp_frame, as data is already written in the correct place.

Performance wise the down-side is that skb_shared_info memory area is in
a separate cacheline.  Thus, when no HW rx_timestamp is available, then
it is very expensive for a veth XDP bpf-prog to access this, just to get
a zero back.  Having an xdp_frame->flags bit that knows if HW
rx_timestamp have been stored, can mitigate this.


> +	return 0;
> +}
> +
> +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> +{
> +	struct veth_xdp_buff *_ctx = (void *)ctx;
> +
> +	if (!_ctx->skb)
> +		return -EOPNOTSUPP;

For xdp_frame case, I'm considering simply storing the u32 RX-hash in
struct xdp_frame.  This makes it easy to extract for xdp_frame to SKB
create use-case.

As have been mentioned before, the SKB also requires knowing the RSS
hash-type.  This HW hash-type actually contains a lot of information,
that today is lost when reduced to the SKB hash-type.  Due to
standardization from Microsoft, most HW provide info on (L3) IPv4 or
IPv6, and on (L4) TCP or UDP (and often SCTP).  Often hardware
descriptor also provide info on the header length.  Future work in this
area is exciting as we can speedup parsing of packets in XDP, if we can
get are more detailed HW info on hash "packet-type".

> +
> +	*hash = skb_get_hash(_ctx->skb); > +	return 0;
> +}
> +

--Jesper


  reply	other threads:[~2023-01-16 16:34 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12  0:32 [PATCH bpf-next v7 00/17] xdp: hints via kfuncs Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata Stanislav Fomichev
2023-01-16 13:09   ` Jesper Dangaard Brouer
2023-01-17 20:33     ` Stanislav Fomichev
2023-01-18 14:28       ` Jesper Dangaard Brouer
2023-01-18 17:55         ` Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 02/17] bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 03/17] bpf: Move offload initialization into late_initcall Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 04/17] bpf: Reshuffle some parts of bpf/offload.c Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 05/17] bpf: Introduce device-bound XDP programs Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 06/17] selftests/bpf: Update expected test_offload.py messages Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 07/17] bpf: XDP metadata RX kfuncs Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 08/17] bpf: Support consuming XDP HW metadata from fext programs Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 09/17] veth: Introduce veth_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata Stanislav Fomichev
2023-01-16 16:21   ` Jesper Dangaard Brouer [this message]
2023-01-17 20:33     ` Stanislav Fomichev
2023-01-18 15:57       ` Jesper Dangaard Brouer
2023-01-12  0:32 ` [PATCH bpf-next v7 11/17] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 12/17] net/mlx4_en: Introduce wrapper for xdp_buff Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 13/17] net/mlx4_en: Support RX XDP metadata Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 14/17] xsk: Add cb area to struct xdp_buff_xsk Stanislav Fomichev
2023-01-12  0:32 ` [PATCH bpf-next v7 15/17] net/mlx5e: Introduce wrapper for xdp_buff Stanislav Fomichev
2023-01-12  8:07   ` Tariq Toukan
2023-01-12 19:10     ` Stanislav Fomichev
2023-01-12 21:09       ` [xdp-hints] " Toke Høiland-Jørgensen
2023-01-12 21:55         ` Toke Høiland-Jørgensen
2023-01-12 22:18           ` Stanislav Fomichev
2023-01-12 22:29             ` Toke Høiland-Jørgensen
2023-01-13 20:55               ` Tariq Toukan
2023-01-13 20:53           ` Tariq Toukan
2023-01-13 21:31             ` Toke Høiland-Jørgensen
2023-01-15  6:59               ` Tariq Toukan
2023-01-15 11:13                 ` Toke Høiland-Jørgensen
2023-01-12  0:32 ` [PATCH bpf-next v7 16/17] net/mlx5e: Support RX XDP metadata Stanislav Fomichev
2023-01-12  8:13   ` Tariq Toukan
2023-01-12 19:09     ` Stanislav Fomichev
2023-01-13 20:25       ` Tariq Toukan
2023-01-12  0:32 ` [PATCH bpf-next v7 17/17] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
2023-01-12  7:29 ` [PATCH bpf-next v7 00/17] xdp: hints via kfuncs Martin KaFai Lau
2023-01-12  8:19   ` Tariq Toukan
2023-01-12 18:09     ` Stanislav Fomichev
2023-01-12 18:20       ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a5ce7ac4-7901-6146-2c2a-5b4958c14e11@redhat.com \
    --to=jbrouer@redhat.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.