bpf.vger.kernel.org archive mirror
From: Stanislav Fomichev <sdf@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	 Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	 John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,  Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	 Network Development <netdev@vger.kernel.org>
Subject: Re: [RFC bpf-next v2 11/11] net/mlx5e: Support TX timestamp metadata
Date: Fri, 23 Jun 2023 10:24:30 -0700
Message-ID: <CAKH8qBvJjtSb+80cNEJ_3qBR-smcc5mBAH4rTiWhckxVeZWxLA@mail.gmail.com>
In-Reply-To: <CAADnVQ+611dOqVFuoffbM_cnOf62n6h+jaB1LwD2HWxS5if2CA@mail.gmail.com>

On Thu, Jun 22, 2023 at 7:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jun 22, 2023 at 3:13 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Thu, Jun 22, 2023 at 2:47 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jun 22, 2023 at 1:13 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > On Thu, Jun 22, 2023 at 12:58 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jun 21, 2023 at 10:02:44AM -0700, Stanislav Fomichev wrote:
> > > > > > WIP, not tested, only to show the overall idea.
> > > > > > Non-AF_XDP paths are marked with 'false' for now.
> > > > > >
> > > > > > Cc: netdev@vger.kernel.org
> > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > ---
> > > > > >  .../net/ethernet/mellanox/mlx5/core/en/txrx.h | 11 +++
> > > > > >  .../net/ethernet/mellanox/mlx5/core/en/xdp.c  | 96 ++++++++++++++++++-
> > > > > >  .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |  9 +-
> > > > > >  .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  3 +
> > > > > >  .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 16 ++++
> > > > > >  .../net/ethernet/mellanox/mlx5/core/main.c    | 26 ++++-
> > > > > >  6 files changed, 156 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > > > > index 879d698b6119..e4509464e0b1 100644
> > > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > > > > @@ -6,6 +6,7 @@
> > > > > >
> > > > > >  #include "en.h"
> > > > > >  #include <linux/indirect_call_wrapper.h>
> > > > > > +#include <net/devtx.h>
> > > > > >
> > > > > >  #define MLX5E_TX_WQE_EMPTY_DS_COUNT (sizeof(struct mlx5e_tx_wqe) / MLX5_SEND_WQE_DS)
> > > > > >
> > > > > > @@ -506,4 +507,14 @@ static inline struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int
> > > > > >
> > > > > >       return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
> > > > > >  }
> > > > > > +
> > > > > > +struct mlx5e_devtx_frame {
> > > > > > +     struct devtx_frame frame;
> > > > > > +     struct mlx5_cqe64 *cqe; /* tx completion */
> > > > >
> > > > > cqe is only valid at completion.
> > > > >
> > > > > > +     struct mlx5e_tx_wqe *wqe; /* tx */
> > > > >
> > > > > wqe is only valid at submission.
> > > > >
> > > > > imo that's a very clear sign that this is not a generic datastructure.
> > > > > The code is trying hard to make 'frame' part of it look common,
> > > > > but it won't help bpf prog to be 'generic'.
> > > > > It is still going to be precisely coded for completion vs submission.
> > > > > Similarly a bpf prog for completion in veth will be different than bpf prog for completion in mlx5.
> > > > > As I stated earlier this 'generalization' and 'common' datastructure only adds code complexity.
> > > >
> > > > The reason I went with this abstract context is to allow the programs
> > > > to be attached to the different devices.
> > > > For example, the xdp_hw_metadata we currently have is not really tied
> > > > down to the particular implementation.
> > > > If every hook declaration looks different, it seems impossible to
> > > > create portable programs.
> > > >
> > > > The frame part is not really needed; we can probably rename it to ctx
> > > > and pass data/frags as arguments?
> > > >
> > > > struct devtx_ctx {
> > > >   struct net_device *netdev;
> > > >   /* the devices will be able to create wrappers to stash descriptor pointers */
> > > > };
> > > > void veth_devtx_submit(struct devtx_ctx *ctx, void *data, u16 len,
> > > >                        u8 meta_len, struct skb_shared_info *sinfo);
> > > >
> > > > But striving to have a similar hook declaration seems useful for
> > > > program portability's sake?
> > >
> > > portability across what ?
> > > 'timestamp' on veth doesn't have a real use. It's testing only.
> > > Even testing is a bit dubious.
> > > I can see a need for bpf prog to run in the datacenter on mlx, brcm
> > > and whatever other nics, but they will have completely different
> > > hw descriptors. timestamp kfuncs to request/read can be common,
> > > but to read the descriptors bpf prog authors would need to write
> > > different code anyway.
> > > So kernel code going out of its way to present a somewhat common devtx_ctx
> > > just doesn't help. It adds code to the kernel, but bpf prog still
> > > has to be tailored for mlx and brcm differently.
> >
> > Isn't it the same discussion/arguments we had during the RX series?
>
> Right, but there we already have xdp_md as an abstraction.
> Extra kfuncs don't change that.
> Here is the whole new 'ctx' being proposed with assumption that
> it will be shared between completion and submission and will be
> useful in both.
>
> But there is skb at submission time and no skb at completion.
> xdp_frame is there, but it's the last record of what was sent on the wire.
> Parsing it with bpf is like examining footprints in the sand. They are gone.
> Parsing at submission makes sense, not at completion
> and the driver has a way to associate wqe with cqe.

Right, and I'm not exposing either skb or xdp_md/frame, so we're on
the same page?
Or are you suggesting we further split devtx_frame into two contexts?
One for submit and another for complete?
And not expose the payload at completion time?
Having the payload at completion might still be useful though, at least
the header, in case users want to inspect only the completions for some
marker/flow.
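A minimal sketch of what such a submit/complete split could look like, assuming hypothetical devtx_submit_ctx and devtx_complete_ctx types (not existing kernel APIs) and a driver that echoes a per-descriptor counter back in the completion entry, i.e. the "driver has a way to associate wqe with cqe" part. The submit hook sees the header and stashes a small marker in a per-ring slot; the completion hook never sees the payload but recovers that marker through the same counter. This is userspace C that only models the data flow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types: illustrative only, not existing kernel APIs. */
#define TX_RING_SIZE 8u /* power of two, so counters can be masked */

/* Submission context: headers/payload are still available here. */
struct devtx_submit_ctx {
	const uint8_t *data;
	uint16_t len;
	uint16_t wqe_counter; /* descriptor index chosen by the driver */
};

/* Completion context: no payload, only what the hw reports back. */
struct devtx_complete_ctx {
	uint16_t wqe_counter; /* echoed back in the completion entry */
	uint64_t hw_tstamp;
};

/* Per-ring scratch space, filled at submit and read at completion. */
struct tx_ring_stash {
	uint32_t marker[TX_RING_SIZE];
};

/* Map a free-running descriptor counter to its ring slot. */
static inline uint16_t tx_slot(uint16_t counter)
{
	return counter & (TX_RING_SIZE - 1);
}

/* Submit hook: look at the header and remember a small flow marker
 * (here simply the first 4 bytes) in this descriptor's slot. */
static void devtx_on_submit(struct tx_ring_stash *st,
			    const struct devtx_submit_ctx *ctx)
{
	uint32_t marker = 0;

	assert(ctx->data != NULL || ctx->len == 0);
	if (ctx->len >= sizeof(marker))
		memcpy(&marker, ctx->data, sizeof(marker));
	st->marker[tx_slot(ctx->wqe_counter)] = marker;
}

/* Completion hook: the payload is gone, but the descriptor counter
 * still leads back to whatever the submit hook stashed. */
static uint32_t devtx_on_complete(const struct tx_ring_stash *st,
				  const struct devtx_complete_ctx *ctx)
{
	return st->marker[tx_slot(ctx->wqe_counter)];
}
```

Since slots are reused once the ring wraps (with 8 entries, counter 19 lands in the same slot as counter 11), the stashed marker is only meaningful between a descriptor's submission and its completion.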

> > We want to provide common sane interfaces/abstractions via kfuncs.
> > That will make most BPF programs portable from mlx to brcm (for
> > example) without doing a rewrite.
> > We're also exposing raw (readonly) descriptors (via that get_ctx
> > helper) to the users who know what to do with them.
> > Most users don't know what to do with raw descriptors;
>
> Why do you think so?
> Who are those users?
> I see your proposal and thumbs up from onlookers.
> afaict there are zero users for rx side hw hints too.

My bias comes from the point of view of our internal use-cases where
we'd like to have rx/tx timestamps in a device-agnostic fashion.
I'm happy to incorporate other requirements as I did with exposing raw
descriptors at rx using get_ctx helper.
Regarding the usage: for the external ones I'm assuming it will take
time until it all percolates through the distros...

> > the specs are
> > not public; things can change depending on fw version/etc/etc.
> > So the progs that touch raw descriptors are not the primary use-case.
> > (that was the tl;dr for rx part, seems like it applies here?)
> >
> > Let's maybe discuss that mlx5 example? Are you proposing to do
> > something along these lines?
> >
> > void mlx5e_devtx_submit(struct mlx5e_tx_wqe *wqe);
> > void mlx5e_devtx_complete(struct mlx5_cqe64 *cqe);
> >
> > If yes, I'm missing how we define the common kfuncs in this case. The
> > kfuncs need to have some common context. We're defining them with:
> > bpf_devtx_<kfunc>(const struct devtx_frame *ctx);
>
> I'm looking at xdp_metadata and wondering who's using it.
> I haven't seen a single bug report.
> No bugs means no one is using it. There is zero chance that we managed
> to implement it bug-free on the first try.
> So new tx side things look like a feature creep to me.
> rx side is far from proven to be useful for anything.
> Yet you want to add new things.

I've been talking about both tx and rx timestamps right from the
beginning, so it's not really a new feature.
But what's the concern here? IIUC, the whole point of it being
kfunc-based is that we can wipe it all if/when it becomes dead
weight.

Regarding the users, there is also a bit of a chicken-and-egg problem:
we have some internal interest in using AF_XDP, but it lacks multibuf
(which is under review) and the offloads (which I'm trying to move
forward for both rx and tx).
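One way the common kfuncs could coexist with driver-specific descriptors, sketched under assumptions: the device-agnostic context is embedded in a driver-private wrapper that stashes the wqe/cqe pointers, the generic calls dispatch through a per-driver ops table (similar in spirit to how the RX hints go through xdp_metadata_ops), and the driver callbacks recover their wrapper with container_of. Every name here (devtx_ops, fakedrv_devtx_ctx, fake_wqe, fake_cqe, FAKE_WQE_FLAG_TSTAMP) is hypothetical, and this is plain userspace C rather than kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((const char *)(ptr) - offsetof(type, member)))

struct devtx_ctx;

/* Hypothetical per-driver callbacks behind the device-agnostic calls. */
struct devtx_ops {
	int (*request_timestamp)(const struct devtx_ctx *ctx);
	int (*read_timestamp)(const struct devtx_ctx *ctx, uint64_t *ts);
};

/* Device-agnostic part: all a portable program needs to see. */
struct devtx_ctx {
	const struct devtx_ops *ops;
};

/* Made-up descriptor layouts standing in for real wqe/cqe formats. */
struct fake_wqe { uint32_t flags; };
struct fake_cqe { uint64_t timestamp; };
#define FAKE_WQE_FLAG_TSTAMP 0x1u

/* Driver-private wrapper stashing raw descriptor pointers. */
struct fakedrv_devtx_ctx {
	struct devtx_ctx common;
	struct fake_wqe *wqe; /* set at submission only */
	struct fake_cqe *cqe; /* set at completion only */
};

static int fakedrv_request_timestamp(const struct devtx_ctx *ctx)
{
	const struct fakedrv_devtx_ctx *d =
		container_of(ctx, struct fakedrv_devtx_ctx, common);

	if (!d->wqe)
		return -EINVAL; /* not called from the submission path */
	d->wqe->flags |= FAKE_WQE_FLAG_TSTAMP;
	return 0;
}

static int fakedrv_read_timestamp(const struct devtx_ctx *ctx, uint64_t *ts)
{
	const struct fakedrv_devtx_ctx *d =
		container_of(ctx, struct fakedrv_devtx_ctx, common);

	if (!d->cqe)
		return -ENODATA; /* not called from the completion path */
	*ts = d->cqe->timestamp;
	return 0;
}

static const struct devtx_ops fakedrv_devtx_ops = {
	.request_timestamp = fakedrv_request_timestamp,
	.read_timestamp = fakedrv_read_timestamp,
};

/* Generic entry points: dispatch without knowing the driver. */
static int devtx_request_timestamp(const struct devtx_ctx *ctx)
{
	assert(ctx != NULL);
	if (!ctx->ops || !ctx->ops->request_timestamp)
		return -EOPNOTSUPP;
	return ctx->ops->request_timestamp(ctx);
}

static int devtx_read_timestamp(const struct devtx_ctx *ctx, uint64_t *ts)
{
	assert(ctx != NULL && ts != NULL);
	if (!ctx->ops || !ctx->ops->read_timestamp)
		return -EOPNOTSUPP;
	return ctx->ops->read_timestamp(ctx, ts);
}
```

A program written only against the generic request/read calls would not depend on the driver underneath; a program that wants the raw wqe/cqe would still need driver-specific code, which is the trade-off being debated in this thread.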

Thread overview: 72+ messages
2023-06-21 17:02 [RFC bpf-next v2 00/11] bpf: Netdev TX metadata Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 01/11] bpf: Rename some xdp-metadata functions into dev-bound Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 02/11] bpf: Resolve single typedef when walking structs Stanislav Fomichev
2023-06-22  5:17   ` Alexei Starovoitov
2023-06-22 17:55     ` Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 03/11] xsk: Support XDP_TX_METADATA_LEN Stanislav Fomichev
2023-06-22  9:11   ` Jesper D. Brouer
2023-06-22 17:55     ` Stanislav Fomichev
2023-06-23 10:24       ` Jesper Dangaard Brouer
2023-06-23 17:41         ` Stanislav Fomichev
2023-06-24  9:02           ` Jesper Dangaard Brouer
2023-06-26 17:00             ` Stanislav Fomichev
2023-06-28  8:09               ` Magnus Karlsson
2023-06-28 18:49                 ` Stanislav Fomichev
2023-06-29  6:15                   ` Magnus Karlsson
2023-06-29 11:30                   ` [xdp-hints] " Toke Høiland-Jørgensen
2023-06-29 11:48                     ` Magnus Karlsson
2023-06-29 12:01                       ` Toke Høiland-Jørgensen
2023-06-29 16:21                         ` Stanislav Fomichev
2023-06-29 20:58                           ` Toke Høiland-Jørgensen
2023-06-30  6:22                         ` Magnus Karlsson
2023-06-30  9:19                           ` Toke Høiland-Jørgensen
2023-06-22 15:26   ` Simon Horman
2023-06-22 17:55     ` Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 04/11] bpf: Implement devtx hook points Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 05/11] bpf: Implement devtx timestamp kfunc Stanislav Fomichev
2023-06-22 12:07   ` Jesper D. Brouer
2023-06-22 17:55     ` Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 06/11] net: veth: Implement devtx timestamp kfuncs Stanislav Fomichev
2023-06-23 23:29   ` Vinicius Costa Gomes
2023-06-26 17:00     ` Stanislav Fomichev
2023-06-26 22:00       ` Vinicius Costa Gomes
2023-06-26 23:29         ` Stanislav Fomichev
2023-06-27  1:38           ` Vinicius Costa Gomes
2023-06-21 17:02 ` [RFC bpf-next v2 07/11] selftests/xsk: Support XDP_TX_METADATA_LEN Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 08/11] selftests/bpf: Add helper to query current netns cookie Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 09/11] selftests/bpf: Extend xdp_metadata with devtx kfuncs Stanislav Fomichev
2023-06-23 11:12   ` Jesper D. Brouer
2023-06-23 17:40     ` Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 10/11] selftests/bpf: Extend xdp_hw_metadata " Stanislav Fomichev
2023-06-21 17:02 ` [RFC bpf-next v2 11/11] net/mlx5e: Support TX timestamp metadata Stanislav Fomichev
2023-06-22 19:57   ` Alexei Starovoitov
2023-06-22 20:13     ` Stanislav Fomichev
2023-06-22 21:47       ` Alexei Starovoitov
2023-06-22 22:13         ` Stanislav Fomichev
2023-06-23  2:35           ` Alexei Starovoitov
2023-06-23 10:16             ` Maryam Tahhan
2023-06-23 16:32               ` Alexei Starovoitov
2023-06-23 17:47                 ` Maryam Tahhan
2023-06-23 17:24             ` Stanislav Fomichev [this message]
2023-06-23 18:57             ` Donald Hunter
2023-06-24  0:25               ` John Fastabend
2023-06-24  2:52                 ` Alexei Starovoitov
2023-06-24 21:38                   ` Jakub Kicinski
2023-06-25  1:12                     ` Stanislav Fomichev
2023-06-26 21:36                       ` Stanislav Fomichev
2023-06-26 22:37                         ` Alexei Starovoitov
2023-06-26 23:29                           ` Stanislav Fomichev
2023-06-27 13:35                             ` Toke Høiland-Jørgensen
2023-06-27 21:43                             ` John Fastabend
2023-06-27 22:56                               ` Stanislav Fomichev
2023-06-27 23:33                                 ` John Fastabend
2023-06-27 23:50                                   ` Alexei Starovoitov
2023-06-28 18:52                               ` Jakub Kicinski
2023-06-29 11:43                                 ` Toke Høiland-Jørgensen
2023-06-30 18:54                                   ` Stanislav Fomichev
2023-07-01  0:52                                   ` John Fastabend
2023-07-01  3:11                                     ` Jakub Kicinski
2023-07-03 18:30                                       ` John Fastabend
2023-07-03 19:33                                         ` Jakub Kicinski
2023-06-22  8:41 ` [RFC bpf-next v2 00/11] bpf: Netdev TX metadata Jesper Dangaard Brouer
2023-06-22 17:55   ` Stanislav Fomichev