bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, song@kernel.org, yhs@fb.com,
	kpsingh@kernel.org, haoluo@google.com, jolsa@kernel.org,
	"David Ahern" <dsahern@gmail.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Willem de Bruijn" <willemb@google.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Anatoly Burakov" <anatoly.burakov@intel.com>,
	"Alexander Lobakin" <alexandr.lobakin@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@gmail.com>,
	"Maryam Tahhan" <mtahhan@redhat.com>,
	xdp-hints@xdp-project.net, netdev@vger.kernel.org
Subject: Re: [xdp-hints] Re: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp
Date: Wed, 16 Nov 2022 12:50:52 -0800	[thread overview]
Message-ID: <CAKH8qBs1rYXf0GGto9hPz-ELLZ9c692cFnKC9JLwAq5b7JRK-A@mail.gmail.com> (raw)
In-Reply-To: <6375340a6c284_66f16208aa@john.notmuch>

On Wed, Nov 16, 2022 at 11:03 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Toke Høiland-Jørgensen wrote:
> > Martin KaFai Lau <martin.lau@linux.dev> writes:
> >
> > > On 11/15/22 10:38 PM, John Fastabend wrote:
> > >>>>>>> +static void veth_unroll_kfunc(const struct bpf_prog *prog, u32 func_id,
> > >>>>>>> +                           struct bpf_patch *patch)
> > >>>>>>> +{
> > >>>>>>> +     if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) {
> > >>>>>>> +             /* return true; */
> > >>>>>>> +             bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1));
> > >>>>>>> +     } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) {
> > >>>>>>> +             /* return ktime_get_mono_fast_ns(); */
> > >>>>>>> +             bpf_patch_append(patch, BPF_EMIT_CALL(ktime_get_mono_fast_ns));
> > >>>>>>> +     }
> > >>>>>>> +}
> > >>>>>>
> > >>>>>> So these look reasonable enough, but would be good to see some examples
> > >>>>>> of kfunc implementations that don't just BPF_CALL to a kernel function
> > >>>>>> (with those helper wrappers we were discussing before).
> > >>>>>
> > >>>>> Let's maybe add them if/when needed as we add more metadata support?
> > >>>>> xdp_metadata_export_to_skb has an example, and rfc 1/2 have more
> > >>>>> examples, so it shouldn't be a problem to resurrect them back at some
> > >>>>> point?
> > >>>>
> > >>>> Well, the reason I asked for them is that I think having to maintain the
> > >>>> BPF code generation in the drivers is probably the biggest drawback of
> > >>>> the kfunc approach, so it would be good to be relatively sure that we
> > >>>> can manage that complexity (via helpers) before we commit to this :)
> > >>>
> > >>> Right, and I've added a bunch of examples in v2 rfc so we can judge
> > >>> whether that complexity is manageable or not :-)
> > >>> Do you want me to add those wrappers you've back without any real users?
> > >>> Because I had to remove my veth tstamp accessors due to John/Jesper
> > >>> objections; I can maybe bring some of this back gated by some
> > >>> static_branch to avoid the fastpath cost?
> > >>
> > >> I missed the context a bit what did you mean "would be good to see some
> > >> examples of kfunc implementations that don't just BPF_CALL to a kernel
> > >> function"? In this case do you mean BPF code directly without the call?
> > >>
> > >> Early on I thought we should just expose the rx_descriptor which would
> > >> be roughly the same right? (difference being code embedded in driver vs
> > >> a lib) Trouble I ran into is driver code using seqlock_t and mutexs
> > >> which wasn't as straight forward as the simpler just read it from
> > >> the descriptor. For example in mlx getting the ts would be easy from
> > >> BPF with the mlx4_cqe struct exposed
> > >>
> > >> u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
> > >> {
> > >>          u64 hi, lo;
> > >>          struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
> > >>
> > >>          lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
> > >>          hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
> > >>
> > >>          return hi | lo;
> > >> }
> > >>
> > >> but converting that to nsec is a bit annoying,
> > >>
> > >> void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
> > >>                              struct skb_shared_hwtstamps *hwts,
> > >>                              u64 timestamp)
> > >> {
> > >>          unsigned int seq;
> > >>          u64 nsec;
> > >>
> > >>          do {
> > >>                  seq = read_seqbegin(&mdev->clock_lock);
> > >>                  nsec = timecounter_cyc2time(&mdev->clock, timestamp);
> > >>          } while (read_seqretry(&mdev->clock_lock, seq));
> > >>
> > >>          memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
> > >>          hwts->hwtstamp = ns_to_ktime(nsec);
> > >> }
> > >>
> > >> I think the nsec is what you really want.
> > >>
> > >> With all the drivers doing slightly different ops we would have
> > >> to create read_seqbegin, read_seqretry, mutex_lock, ... to get
> > >> at least the mlx and ice drivers it looks like we would need some
> > >> more BPF primitives/helpers. Looks like some more work is needed
> > >> on ice driver though to get rx tstamps on all packets.
> > >>
> > >> Anyways this convinced me real devices will probably use BPF_CALL
> > >> and not BPF insns directly.
> > >
> > > Some of the mlx5 path looks like this:
> > >
> > > #define REAL_TIME_TO_NS(hi, low) (((u64)hi) * NSEC_PER_SEC + ((u64)low))
> > >
> > > static inline ktime_t mlx5_real_time_cyc2time(struct mlx5_clock *clock,
> > >                                                u64 timestamp)
> > > {
> > >          u64 time = REAL_TIME_TO_NS(timestamp >> 32, timestamp & 0xFFFFFFFF);
> > >
> > >          return ns_to_ktime(time);
> > > }
> > >
> > > If some hints are harder to get, then just doing a kfunc call is better.
> >
> > Sure, but if we end up having a full function call for every field in
> > the metadata, that will end up having a significant performance impact
> > on the XDP data path (thinking mostly about the skb metadata case here,
> > which will collect several bits of metadata).
> >
> > > csum may have a better chance to inline?
> >
> > Yup, I agree. Including that also makes it possible to benchmark this
> > series against Jesper's; which I think we should definitely be doing
> > before merging this.
>
> Good point I got sort of singularly focused on timestamp because I have
> a use case for it now.
>
> Also hash is often sitting in the rx descriptor.

Ack, let me try to add something else (that's more inline-able) on the
rx side for a v2.

> >
> > > Regardless, BPF in-lining is a well solved problem and used in many
> > > bpf helpers already, so there are many examples in the kernel. I don't
> > > think it is necessary to block this series because of missing some
> > > helper wrappers for inlining. The driver can always start with the
> > > simpler kfunc call first and optimize later if some hints from the
> > > drivers allow it.
> >
> > Well, "solved" in the sense of "there are a few handfuls of core BPF
> > people who know how to do it". My concern is that we'll end up with
> > either the BPF devs having to maintain all these bits of BPF byte code
> > in all the drivers; or drivers just punting to regular function calls
> > because the inlining is too complicated, with sub-par performance as per
> > the above. I don't think we should just hand-wave this away as "solved",
> > but rather treat this as an integral part of the initial series.
>
> This was my motivation for pushing the rx_descriptor into the xdp_buff.
> At this point if I'm going to have a kfunc call into the driver and
> have the driver rewrite the code into some BPF instructions I would
> just assume maintain this as a library code where I can hook it
> into my BPF program directly from user space. Maybe a few drivers
> will support all the things I want to read, but we run on lots of
> hardware (mlx, intel, eks, azure, gke, etc) and its been a lot of work
> to just get the basic feature parity. I also don't want to run around
> writing driver code for each vendor if I can avoid it. Having raw
> access to the rx descriptor gets me the escape hatch where I can
> just do it myself. And the last piece of info from my point of view
> (Tetragon, Cilium) I can run whatever libs I want and freely upgrade
> libbpf and cilium/ebpf but have a lot less ability to get users
> to upgrade kernels outside the LTS they picked. Meaning I can
> add new things much easier if its lifted into BPF code placed
> by user space.
>
> I appreciate that it means I import the problem of hardware detection
> and BTF CO-RE on networking codes, but we've already solved these
> problems for other reasons. For example just configuring the timestamp
> is a bit of an exercise in does my hardware support timestamping
> and does it support timestamping the packets I care about,  e.g.
> all pkts, just ptp pkts, etc.
>
> I don't think they are mutual exclusive with this series though
> because I can't see how to write these timestamping logic directly
> in BPF. But for rxhash and csum it seems doable. My preference
> is to have both the kfuncs and expose the descriptor directly.
>
> .John

  reply	other threads:[~2022-11-16 20:51 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-15  3:01 [PATCH bpf-next 00/11] xdp: hints via kfuncs Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 01/11] bpf: Document XDP RX metadata Stanislav Fomichev
2022-11-15 22:31   ` Zvi Effron
2022-11-15 22:43     ` Stanislav Fomichev
2022-11-15 23:34       ` Zvi Effron
2022-11-16  3:50         ` Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 02/11] bpf: Introduce bpf_patch Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 03/11] bpf: Support inlined/unrolled kfuncs for xdp metadata Stanislav Fomichev
2022-11-15 16:16   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37     ` Stanislav Fomichev
2022-11-16 20:42   ` Jakub Kicinski
2022-11-16 20:53     ` Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 04/11] bpf: Implement hidden BPF_PUSH64 and BPF_POP64 instructions Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15 16:17   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 18:37     ` Stanislav Fomichev
2022-11-15 22:46       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16  4:09         ` Stanislav Fomichev
2022-11-16  6:38           ` John Fastabend
2022-11-16  7:47             ` Martin KaFai Lau
2022-11-16 10:08               ` Toke Høiland-Jørgensen
2022-11-16 18:20                 ` Martin KaFai Lau
2022-11-16 19:03                 ` John Fastabend
2022-11-16 20:50                   ` Stanislav Fomichev [this message]
2022-11-16 23:47                     ` John Fastabend
2022-11-17  0:19                       ` Stanislav Fomichev
2022-11-17  2:17                         ` Alexei Starovoitov
2022-11-17  2:53                           ` Stanislav Fomichev
2022-11-17  2:59                             ` Alexei Starovoitov
2022-11-17  4:18                               ` Stanislav Fomichev
2022-11-17  6:55                                 ` John Fastabend
2022-11-17 17:51                                   ` Stanislav Fomichev
2022-11-17 19:47                                     ` John Fastabend
2022-11-17 20:17                                       ` Alexei Starovoitov
2022-11-17 11:32                             ` Toke Høiland-Jørgensen
2022-11-17 16:59                               ` Alexei Starovoitov
2022-11-17 17:52                                 ` Stanislav Fomichev
2022-11-17 23:46                                   ` Toke Høiland-Jørgensen
2022-11-18  0:02                                     ` Alexei Starovoitov
2022-11-18  0:29                                       ` Toke Høiland-Jørgensen
2022-11-17 10:27                       ` Toke Høiland-Jørgensen
2022-11-15  3:02 ` [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context Stanislav Fomichev
2022-11-15 23:20   ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16  3:49     ` Stanislav Fomichev
2022-11-16  9:30       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16  7:04   ` Martin KaFai Lau
2022-11-16  9:48     ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-16 20:51       ` Stanislav Fomichev
2022-11-16 20:51     ` Stanislav Fomichev
2022-11-16 21:12   ` Jakub Kicinski
2022-11-16 21:49     ` Martin KaFai Lau
2022-11-18 14:05   ` Jesper Dangaard Brouer
2022-11-18 18:18     ` Stanislav Fomichev
2022-11-19 12:31       ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-21 17:53         ` Stanislav Fomichev
2022-11-21 18:47           ` Jakub Kicinski
2022-11-21 19:41             ` Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 07/11] selftests/bpf: Verify xdp_metadata xdp->af_xdp path Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 08/11] selftests/bpf: Verify xdp_metadata xdp->skb path Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 09/11] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 10/11] mxl4: Support rx timestamp metadata for xdp Stanislav Fomichev
2022-11-15  3:02 ` [PATCH bpf-next 11/11] selftests/bpf: Simple program to dump XDP RX metadata Stanislav Fomichev
2022-11-15 15:54 ` [xdp-hints] [PATCH bpf-next 00/11] xdp: hints via kfuncs Toke Høiland-Jørgensen
2022-11-15 18:37   ` Stanislav Fomichev
2022-11-15 22:31     ` [xdp-hints] " Toke Høiland-Jørgensen
2022-11-15 22:54     ` [xdp-hints] " Alexei Starovoitov
2022-11-15 23:13       ` [xdp-hints] " Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKH8qBs1rYXf0GGto9hPz-ELLZ9c692cFnKC9JLwAq5b7JRK-A@mail.gmail.com \
    --to=sdf@google.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=toke@redhat.com \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).