From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Stanislav Fomichev <sdf@google.com>
Cc: "Tom Herbert" <tom@sipanda.io>,
"Jakub Kicinski" <kuba@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
anjali.singhai@intel.com, "Paolo Abeni" <pabeni@redhat.com>,
"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
deb.chatterjee@intel.com, namrata.limaye@intel.com,
"Marcelo Ricardo Leitner" <mleitner@redhat.com>,
Mahesh.Shirshyad@amd.com, Vipin.Jain@amd.com,
tomasz.osinski@intel.com, "Jiri Pirko" <jiri@resnulli.us>,
"Cong Wang" <xiyou.wangcong@gmail.com>,
davem@davemloft.net, "Eric Dumazet" <edumazet@google.com>,
"Vlad Buslov" <vladbu@nvidia.com>,
"Simon Horman" <horms@kernel.org>,
"Khalid Manaa" <khalidm@nvidia.com>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Victor Nogueira" <victor@mojatatu.com>,
pctammela@mojatatu.com, dan.daly@intel.com,
"Andy Fingerhut" <andy.fingerhut@gmail.com>,
chris.sommers@keysight.com, "Matty Kadosh" <mattyk@nvidia.com>,
bpf <bpf@vger.kernel.org>
Subject: Re: Hardware Offload discussion WAS(Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)
Date: Mon, 4 Mar 2024 16:44:46 -0500
Message-ID: <CAM0EoM=b6ymCEKs14ACanbkzscy=AdARYHSWprtexHBswD7xeg@mail.gmail.com>
In-Reply-To: <ZeY7TqCGFR3h36k-@google.com>
On Mon, Mar 4, 2024 at 4:23 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On 03/03, Jamal Hadi Salim wrote:
> > On Sun, Mar 3, 2024 at 1:11 PM Tom Herbert <tom@sipanda.io> wrote:
> > >
> > > On Sun, Mar 3, 2024 at 9:00 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > >
> > > > On Sat, Mar 2, 2024 at 10:27 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > On Sat, 2 Mar 2024 09:36:53 -0500 Jamal Hadi Salim wrote:
> > > > > > 2) Your point on: "integrate later", or at least "fill in the gaps"
> > > > > > This part i am probably going to mumble on. I am going to consider
> > > > > > more than just doing ACLs/MAT via flower/u32 for the sake of
> > > > > > discussion.
> > > > > > True, "fill the gaps" has been our model so far. It requires kernel
> > > > > > changes, user space code changes etc justifiably so because most of
> > > > > > the time such datapaths are subject to standardization via IETF, IEEE,
> > > > > > etc and new extensions come in on a regular basis. And sometimes we
> > > > > > do add features that one or two users or a single vendor has need for
> > > > > > at the cost of kernel and user/control extension. Given our work
> > > > > > process, any features added this way take a long time to make it to
> > > > > > the end user.
> > > > >
> > > > > What I had in mind was more of a DDP model. The device loads its binary
> > > > > blob FW in whatever way it does, then it tells the kernel its parser
> > > > > graph, and tables. The kernel exposes those tables to user space.
> > > > > All dynamic, no need to change the kernel for each new protocol.
> > > > >
> > > > > But that's different in two ways:
> > > > > 1. the device tells kernel the tables, no "dynamic reprogramming"
> > > > > 2. you don't need the SW side, the only use of the API is to interact
> > > > > with the device
> > > > >
> > > > > User can still do BPF kfuncs to look up in the tables (like in FIB),
> > > > > but call them from cls_bpf.
> > > > >
> > > >
> > > > This is not far off from what is envisioned today in the discussions.
> > > > The main issue is who loads the binary? We went from devlink to the
> > > > filter doing the loading. DDP is ethtool. We still need to tie a PCI
> > > > device/tc block to the "program" so we can do skip_sw and it works.
> > > > Meaning a device that is capable of handling multiple programs can
> > > > have multiple blobs loaded. A "program" is mapped to a tc filter and
> > > > MAT control works the same way as it does today (netlink/tc ndo).
> > > >
> > > > A program in P4 has a name, ID and people have been suggesting a sha1
> > > > identity (or a signature of some kind should be generated by the
> > > > compiler). So the upward propagation could be tied to discovering
> > > > these 3 tuples from the driver. Then the control plane targets a
> > > > program via those tuples via netlink (as we do currently).
> > > >
> > > > I do note, using the DDP sample space, currently whatever gets loaded
> > > > is "trusted" and really you need to have human knowledge of what the
> > > > NIC's parsing + MAT is to send the control. With P4 that is all
> > > > visible/programmable by the end user (i am not a proponent of vendors
> > > > "shipping" things or calling them for support) - so should be
> > > > sufficient to just discover what is in the binary and send the correct
> > > > control messages down.
> > > >
> > > > > I think in P4 terms that may be something more akin to only providing
> > > > > the runtime API? I seem to recall they had some distinction...
> > > >
> > > > There are several solutions out there (ex: TDI, P4runtime) - our API
> > > > is netlink and those could be written on top of netlink, there's no
> > > > controversy there.
> > > > So the starting point is defining the datapath using P4, generating
> > > > the binary blob and whatever constraints needed using the vendor
> > > > backend and for s/w equivalent generating the eBPF datapath.
> > > >
> > > > > > At the cost of this sounding controversial, i am going
> > > > > > to call things like fdb, fib, etc which have fixed datapaths in the
> > > > > > kernel "legacy". These "legacy" datapaths almost all the time have
> > > > >
> > > > > The cynic in me sometimes thinks that the biggest problem with "legacy"
> > > > > protocols is that it's hard to make money on them :)
> > > >
> > > > That's a big motivation without a doubt, but also there are people
> > > > that want to experiment with things. One of the craziest examples we
> > > > have is someone who created a P4 program for "in network calculator",
> > > > essentially a calculator in the datapath. You send it two operands and
> > > > an operator using custom headers, it does the math and responds with a
> > > > result in a new header. By itself this program is a toy but it
> > > > demonstrates that if one wanted to, they could have something custom
> > > > in hardware and/or kernel datapath.
> > >
> > > Jamal,
> > >
> > > Given how long P4 has been around it's surprising that the best
> > > publicly available code example is "the network calculator" toy.
> >
> > Come on Tom ;-> That was just an example of something "crazy" to
> > demonstrate freedom. I can run that in any of the P4 friendly NICs
> > today. You are probably being facetious - There are some serious
> > publicly available projects out there, some of which I quote on the
> > cover letter (like DASH).
>
> Shameless plug. I have a more crazy example with bpf:
>
> https://github.com/fomichev/xdp-btc-miner
>
Hrm - this looks crazy interesting ;-> Tempting. I guess to port this
to P4 we'd need sha256 in h/w (which most of these vendors already
have). Is there any other acceleration you would need? It would have
been more fun if you had invented your own headers too ;->
cheers,
jamal
> A good way to ensure all those smartnic cycles are not wasted :-D
> I wish we had more nics with xdp bpf offloads :-(