From: Stanislav Fomichev <sdf@google.com>
To: Tom Herbert <tom@sipanda.io>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
"Jamal Hadi Salim" <jhs@mojatatu.com>,
"John Fastabend" <john.fastabend@gmail.com>,
anjali.singhai@intel.com, "Paolo Abeni" <pabeni@redhat.com>,
"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
deb.chatterjee@intel.com, namrata.limaye@intel.com,
mleitner@redhat.com, Mahesh.Shirshyad@amd.com,
Vipin.Jain@amd.com, tomasz.osinski@intel.com,
"Jiri Pirko" <jiri@resnulli.us>,
"Cong Wang" <xiyou.wangcong@gmail.com>,
davem@davemloft.net, edumazet@google.com,
"Vlad Buslov" <vladbu@nvidia.com>,
horms@kernel.org, khalidm@nvidia.com,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Victor Nogueira" <victor@mojatatu.com>,
pctammela@mojatatu.com, dan.daly@intel.com,
andy.fingerhut@gmail.com, chris.sommers@keysight.com,
mattyk@nvidia.com, bpf@vger.kernel.org
Subject: Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)
Date: Mon, 4 Mar 2024 13:19:08 -0800
Message-ID: <ZeY6r9cm4pdW9WNC@google.com>
In-Reply-To: <CAOuuhY_senZbdC2cVU9kfDww_bT+a_VkNaDJYRk4_fMbJW17sQ@mail.gmail.com>
On 03/03, Tom Herbert wrote:
> On Sat, Mar 2, 2024 at 7:15 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 1 Mar 2024 18:20:36 -0800 Tom Herbert wrote:
> > > This is configurability versus programmability. The table driven
> > > approach as input (configurability) might work fine for generic
> > > match-action tables up to the point that tables are expressive enough
> > > to satisfy the requirements. But parsing doesn't fall into the table
> > > driven paradigm: parsers want to be *programmed*. This is why we
> > > removed kParser from this patch set and fell back to eBPF for parsing.
> > > But the problem we quickly hit that eBPF is not offloadable to network
> > > devices, for example when we compile P4 in an eBPF parser we've lost
> > > the declarative representation that parsers in the devices could
> > > consume (they're not CPUs running eBPF).
> > >
> > > I think the key here is what we mean by kernel offload. When we do
> > > kernel offload, is it the kernel implementation or the kernel
> > > functionality that's being offloaded? If it's the latter then we have
> > > a lot more flexibility. What we'd need is a safe and secure way to
> > > synchronize with that offload device that precisely supports the
> > > kernel functionality we'd like to offload. This can be done if both
> > > the kernel bits and programmed offload are derived from the same
> > > source (i.e. tag source code with a sha-1). For example, if someone
> > > writes a parser in P4, we can compile that into both eBPF and a P4
> > > backend using independent tool chains and program download. At
> > > runtime, the kernel can safely offload the functionality of the eBPF
> > > parser to the device if it matches the hash to that reported by the
> > > device
> >
> > Good points. If I understand you correctly you're saying that parsers
> > are more complex than just a basic parsing tree a'la u32.
>
> Yes. Parsing things like TLVs, GRE flag field, or nested protobufs
> isn't conducive to u32. We also want the advantages of compiler
> optimizations to unroll loops, squash nodes in the parse graph, etc.
>
> > Then we can take this argument further. P4 has grown to encompass a lot
> > of functionality of quite complex devices. How do we square that with
> > the kernel functionality offload model. If the entire device is modeled,
> > including f.e. TSO, an offload would mean that the user has to write
> > a TSO implementation which they then load into TC? That seems odd.
> >
> > IOW I don't quite know how to square in my head the "total
> > functionality" with being a TC-based "plugin".
>
> Hi Jakub,
>
> I believe the solution is to replace kernel code with eBPF in cases
> where we need programmability. This effectively means that we would
> ship eBPF code as part of the kernel. So in the case of TSO, the
> kernel would include a standard implementation in eBPF that could be
> compiled into the kernel by default. The restricted C source code is
> tagged with a hash, so if someone wants to offload TSO they could
> compile the source into their target and retain the hash. At runtime
> it's a matter of querying the driver to see if the device supports the
> TSO program the kernel is running by comparing hash values. Scaling
> this, a device could support a catalogue of programs: TSO, LRO,
> parser, IPtables, etc., If the kernel can match the hash of its eBPF
> code to one reported by the driver then it can assume functionality is
> offloadable. This is an elaboration of "device features", but instead
> of the device telling us they think they support an adequate GRO
> implementation by reporting NETIF_F_GRO, the device would tell the
> kernel that they not only support GRO but they provide identical
> functionality of the kernel GRO (which IMO is the first requirement of
> kernel offload).
>
> Even before considering hardware offload, I think this approach
> addresses a more fundamental problem to make the kernel programmable.
> Since the code is in eBPF, the kernel can be reprogrammed at runtime
> which could be controlled by TC. This allows local customization of
> kernel features, but also is the simplest way to "patch" the kernel
> with security and bug fixes (nobody is ever excited to do a kernel
[..]
> rebase in their datacenter!). Flow dissector is a prime candidate for
> this, and I am still planning to replace it with an all eBPF program
> (https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf).
So you're suggesting we bundle (and extend)
tools/testing/selftests/bpf/progs/bpf_flow.c? We were thinking along
similar lines here. We load this program manually right now; shipping
it with the kernel and autoloading it would be easier.
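
For reference, the hash-matching check Tom describes in the quoted text could be sketched roughly as below. This is only an illustration of the idea, not code from this series or from any kernel or driver interface: source_tag(), can_offload(), the sample sources and the device catalogue are all hypothetical names, and SHA-256 is an arbitrary choice (the thread mentions sha-1).

```python
import hashlib


def source_tag(source: bytes) -> str:
    """Return a hash identifying one exact version of a program's source.

    SHA-256 is an assumption made for this sketch; the thread only says
    the source is tagged with a hash (sha-1 is given as an example).
    """
    return hashlib.sha256(source).hexdigest()


def can_offload(kernel_tag: str, device_tags: set) -> bool:
    """Offload is allowed only if the device reports the identical tag."""
    return kernel_tag in device_tags


# Hypothetical program sources; real ones would be the restricted C
# (or P4) that both the eBPF object and the device image are built from.
parser_v1 = b"parser v1: eth -> ipv4 -> tcp"
parser_v2 = b"parser v2: eth -> ipv4 -> tcp, plus GRE"
tso_v3 = b"tso v3"

# Hypothetical catalogue of tags reported by the driver for the device.
device_catalogue = {source_tag(parser_v1), source_tag(tso_v3)}

# The kernel runs parser v1: the tags match, so offload is safe.
offload_ok = can_offload(source_tag(parser_v1), device_catalogue)

# The kernel was patched to parser v2: no match, so it stays in software.
patched_ok = can_offload(source_tag(parser_v2), device_catalogue)
```

The point of the sketch is that the comparison is on exact identity of the source, so a locally patched program simply stops matching and falls back to the software path.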