bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>,
	brouer@redhat.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	xdp-hints@xdp-project.net, larysa.zaremba@intel.com,
	memxor@gmail.com, Lorenzo Bianconi <lorenzo@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	dave@dtucker.co.uk, Magnus Karlsson <magnus.karlsson@intel.com>,
	bjorn@kernel.org
Subject: Re: [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options
Date: Fri, 9 Sep 2022 08:43:24 +0200	[thread overview]
Message-ID: <CAJ8uoz0CD18RUYU4SMsubB8fhv3uOwp6wi_uKsZSu_aOV5piaA@mail.gmail.com> (raw)
In-Reply-To: <9aab9ef1-446d-57ab-5789-afffe27801f4@redhat.com>

On Thu, Sep 8, 2022 at 5:04 PM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 08/09/2022 12.10, Maryam Tahhan wrote:
> > On 08/09/2022 09:06, Magnus Karlsson wrote:
> >> On Wed, Sep 7, 2022 at 5:48 PM Jesper Dangaard Brouer
> >> <brouer@redhat.com> wrote:
> >>>
> >>> From: Maryam Tahhan <mtahhan@redhat.com>
> >>>
> >>> Simply set AF_XDP descriptor options to XDP flags.
> >>>
> >>> Jesper: Will this really be acceptable by AF_XDP maintainers?
> >>
> >> Maryam, you guessed correctly that dedicating all these options bits
> >> for a single feature will not be ok :-). E.g., I want one bit for the
> >> AF_XDP multi-buffer support and who knows what other uses there might
> >> be for this options field in the future. Let us try to solve this in
> >> some other way. Here are some suggestions, all with their pros and
> >> cons.
> >>
> >
> > TBH it was Jespers question :)
>
> True. I'm generally questioning this patch...
> ... and indirectly asking Magnus.  (If you noticed, I didn't add my SoB)
>
> >> * Put this feature flag at a known place in the metadata area, for
> >> example just before the BTF ID. No need to fill this in if you are not
> >> redirecting to AF_XDP, but at a redirect to AF_XDP, the XDP flags are
> >> copied into this u32 in the metadata area so that user-space can
> >> consume it. Will cost 4 bytes of the metadata area though.
> >
> > If Jesper agrees I think this approach would make sense. Trying to
> > translate encodings into some other flags for AF_XDP I think will lead
> > to a growing set of translations as more options come along.
> > The other thing to be aware of is just making sure to clear/zero the
> > metadata space in the buffers at some point (ideally when the descriptor
> > is returned from the application) so when the buffers are used again
> > they are already in a "reset" state.
>
> I don't like this option ;-)
>
> First of all because this can give false positives, if "XDP flags copied
> into metadata area" is used for something else.  This can easily happen
> as XDP BPF-progs are free to metadata for something else.

Are XDP programs not free to overwrite the BTF id that you have last
in the md section too and you can get false positives for that as
well? Or do you protect it in some way? Sorry, but I do not understand
why a flags field would be different from a BTF id stored in the
metadata section.

> Second reason, because it would require AF_XDP to always read the
> metadata cache-line (and write, if clearing on "return").  Not a good
> optioon, given how performance sensitive AF_XDP workloads (at least
> benchmarks).

On its own, you are right, but when combined with the "bit in the
descriptor" proposal below, you would not get this performance
penalty. If the bit is zero, you do not have to read the MD cache
line. If the bit is one, you want to read the MD line to get your
metadata anyway, so one more read on the same cache line to get the
flags would not hurt performance. (There is of course a case where the
4 extra bytes of the flags could push the metadata you are interested
in to a new cache line, but this should be rare.)

But it all depends on if you need the resolution of a u32 flags field.
If not, forget this idea. If you do, then the metadata section is the
only place for it.

> >>
> >> * Instead encode this information into each metadata entry in the
> >> metadata area, in some way so that a flags field is not needed (-1
> >> signifies not valid, or whatever happens to make sense). This has the
> >> drawback that the user might have to look at a large number of entries
> >> just to find out there is nothing valid to read. To alleviate this, it
> >> could be combined with the next suggestion.
> >>
> >> * Dedicate one bit in the options field to indicate that there is at
> >> least one valid metadata entry in the metadata area. This could be
> >> combined with the two approaches above. However, depending on what
> >> metadata you have enabled, this bit might be pointless. If some
> >> metadata is always valid, then it serves no purpose. But it might if
> >> all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
> >> on one packet out of one thousand.
> >>
>
> I like this option better! Except that I have hoped to get 2 bits ;-)

I will give you two if you need it Jesper, no problem :-).

> The performance advantage is that the AF_XDP descriptor bits will
> already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
> application can avoid reading the metadata cache-line :-).

Agreed. I prefer if we can keep it simple and fast like this.

> When metadata is valid and contains valid XDP-hints can change between
> two packets.  E.g. XDP-hints can be enabled/disabled via ethtool, and
> the content can be enabled/disabled by other ethtool commands, and even
> setsockopt calls (e.g timestamping).  An XDP prog can also choose to use
> the area for something else for a subset of the packets.
>
> It is a design choice in this patchset to avoid locking down the NIC
> driver to a fixed XDP-hints layout, and avoid locking/disabling other
> ethtool config setting to keeping XDP-hints layout stable.  Originally I
> wanted this, but I realized that it would be impossible (and annoying
> for users) if we had to control every config interface to NIC hardware
> offload hints, to keep XDP-hints "always-valid".

> --Jesper
>
> >>> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> >>> ---
> >>>   include/uapi/linux/if_xdp.h |    2 +-
> >>>   net/xdp/xsk.c               |    2 +-
> >>>   net/xdp/xsk_queue.h         |    3 ++-
> >>>   3 files changed, 4 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> >>> index a78a8096f4ce..9335b56474e7 100644
> >>> --- a/include/uapi/linux/if_xdp.h
> >>> +++ b/include/uapi/linux/if_xdp.h
> >>> @@ -103,7 +103,7 @@ struct xdp_options {
> >>>   struct xdp_desc {
> >>>          __u64 addr;
> >>>          __u32 len;
> >>> -       __u32 options;
> >>> +       __u32 options; /* set to the values of xdp_hints_flags*/
> >>>   };
> >>>
> >>>   /* UMEM descriptor is __u64 */
> >>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> >>> index 5b4ce6ba1bc7..32095d78f06b 100644
> >>> --- a/net/xdp/xsk.c
> >>> +++ b/net/xdp/xsk.c
> >>> @@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
> >>> struct xdp_buff *xdp, u32 len)
> >>>          int err;
> >>>
> >>>          addr = xp_get_handle(xskb);
> >>> -       err = xskq_prod_reserve_desc(xs->rx, addr, len);
> >>> +       err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
> >>>          if (err) {
> >>>                  xs->rx_queue_full++;
> >>>                  return err;
> >>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> >>> index fb20bf7207cf..7a66f082f97e 100644
> >>> --- a/net/xdp/xsk_queue.h
> >>> +++ b/net/xdp/xsk_queue.h
> >>> @@ -368,7 +368,7 @@ static inline u32
> >>> xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
> >>>   }
> >>>
> >>>   static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
> >>> -                                        u64 addr, u32 len)
> >>> +                                        u64 addr, u32 len, u32 flags)
> >>>   {
> >>>          struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> >>>          u32 idx;
> >>> @@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct
> >>> xsk_queue *q,
> >>>          idx = q->cached_prod++ & q->ring_mask;
> >>>          ring->desc[idx].addr = addr;
> >>>          ring->desc[idx].len = len;
> >>> +       ring->desc[idx].options = flags;
> >>>
> >>>          return 0;
> >>>   }
> >>>
> >>>
> >>
> >
>

  reply	other threads:[~2022-09-09  6:43 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07 15:45 [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
2022-09-09 10:49   ` Burakov, Anatoly
2022-09-09 14:13     ` [xdp-hints] " Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
2022-09-08  8:06   ` Magnus Karlsson
2022-09-08 10:10     ` Maryam Tahhan
2022-09-08 15:04       ` Jesper Dangaard Brouer
2022-09-09  6:43         ` Magnus Karlsson [this message]
2022-09-09  8:12           ` Maryam Tahhan
2022-09-09  9:42             ` Jesper Dangaard Brouer
2022-09-09 10:14               ` Magnus Karlsson
2022-09-09 12:35                 ` [xdp-hints] " Jesper Dangaard Brouer
2022-09-09 12:44                   ` Magnus Karlsson
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc Jesper Dangaard Brouer
2022-09-08  9:30 ` [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
2022-09-09 13:48   ` Jesper Dangaard Brouer
2022-10-03 23:55 ` sdf
2022-10-04  9:29   ` Jesper Dangaard Brouer
2022-10-04 18:26     ` Stanislav Fomichev
2022-10-05  0:25       ` Martin KaFai Lau
2022-10-05  0:59         ` Jakub Kicinski
2022-10-05  1:02           ` Stanislav Fomichev
2022-10-05  1:24             ` Jakub Kicinski
2022-10-05  2:15               ` Stanislav Fomichev
2022-10-05 19:26                 ` Martin KaFai Lau
2022-10-06  9:14                   ` Magnus Karlsson
2022-10-06 15:29                     ` Jesper Dangaard Brouer
2022-10-11  6:29                       ` Martin KaFai Lau
2022-10-11 11:57                         ` Jesper Dangaard Brouer
2022-10-05 10:06             ` [xdp-hints] " Toke Høiland-Jørgensen
2022-10-05 18:47               ` sdf
2022-10-06  8:19                 ` Maryam Tahhan
2022-10-06 17:22                   ` sdf
2022-10-05 14:19             ` Jesper Dangaard Brouer
2022-10-06 14:59               ` Jakub Kicinski
2022-10-05 13:43         ` Jesper Dangaard Brouer
2022-10-05 16:29       ` Jesper Dangaard Brouer
2022-10-05 18:43         ` sdf
2022-10-06 17:47           ` Jesper Dangaard Brouer
2022-10-07 15:05             ` [xdp-hints] " David Ahern
2022-10-05 13:14     ` Burakov, Anatoly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJ8uoz0CD18RUYU4SMsubB8fhv3uOwp6wi_uKsZSu_aOV5piaA@mail.gmail.com \
    --to=magnus.karlsson@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bjorn@kernel.org \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=dave@dtucker.co.uk \
    --cc=jbrouer@redhat.com \
    --cc=larysa.zaremba@intel.com \
    --cc=lorenzo@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=memxor@gmail.com \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).