bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Tom Herbert <tom@sipanda.io>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
	"Jamal Hadi Salim" <jhs@mojatatu.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	anjali.singhai@intel.com, "Paolo Abeni" <pabeni@redhat.com>,
	"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
	deb.chatterjee@intel.com, namrata.limaye@intel.com,
	mleitner@redhat.com, Mahesh.Shirshyad@amd.com,
	Vipin.Jain@amd.com, tomasz.osinski@intel.com,
	"Jiri Pirko" <jiri@resnulli.us>,
	"Cong Wang" <xiyou.wangcong@gmail.com>,
	davem@davemloft.net, edumazet@google.com,
	"Vlad Buslov" <vladbu@nvidia.com>,
	horms@kernel.org, khalidm@nvidia.com,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Victor Nogueira" <victor@mojatatu.com>,
	pctammela@mojatatu.com, dan.daly@intel.com,
	andy.fingerhut@gmail.com, chris.sommers@keysight.com,
	mattyk@nvidia.com, bpf@vger.kernel.org
Subject: Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)
Date: Mon, 4 Mar 2024 13:19:08 -0800	[thread overview]
Message-ID: <ZeY6r9cm4pdW9WNC@google.com> (raw)
In-Reply-To: <CAOuuhY_senZbdC2cVU9kfDww_bT+a_VkNaDJYRk4_fMbJW17sQ@mail.gmail.com>

On 03/03, Tom Herbert wrote:
> On Sat, Mar 2, 2024 at 7:15 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 1 Mar 2024 18:20:36 -0800 Tom Herbert wrote:
> > > This is configurability versus programmability. The table driven
> > > approach as input (configurability) might work fine for generic
> > > match-action tables up to the point that tables are expressive enough
> > > to satisfy the requirements. But parsing doesn't fall into the table
> > > driven paradigm: parsers want to be *programmed*. This is why we
> > > removed kParser from this patch set and fell back to eBPF for parsing.
> > > But the problem we quickly hit that eBPF is not offloadable to network
> > > devices, for example when we compile P4 in an eBPF parser we've lost
> > > the declarative representation that parsers in the devices could
> > > consume (they're not CPUs running eBPF).
> > >
> > > I think the key here is what we mean by kernel offload. When we do
> > > kernel offload, is it the kernel implementation or the kernel
> > > functionality that's being offloaded? If it's the latter then we have
> > > a lot more flexibility. What we'd need is a safe and secure way to
> > > synchronize with that offload device that precisely supports the
> > > kernel functionality we'd like to offload. This can be done if both
> > > the kernel bits and programmed offload are derived from the same
> > > source (i.e. tag source code with a sha-1). For example, if someone
> > > writes a parser in P4, we can compile that into both eBPF and a P4
> > > backend using independent tool chains and program download. At
> > > runtime, the kernel can safely offload the functionality of the eBPF
> > > parser to the device if it matches the hash to that reported by the
> > > device
> >
> > Good points. If I understand you correctly you're saying that parsers
> > are more complex than just a basic parsing tree a'la u32.
> 
> Yes. Parsing things like TLVs, GRE flag field, or nested protobufs
> isn't conducive to u32. We also want the advantages of compiler
> optimizations to unroll loops, squash nodes in the parse graph, etc.
> 
> > Then we can take this argument further. P4 has grown to encompass a lot
> > of functionality of quite complex devices. How do we square that with
> > the kernel functionality offload model. If the entire device is modeled,
> > including f.e. TSO, an offload would mean that the user has to write
> > a TSO implementation which they then load into TC? That seems odd.
> >
> > IOW I don't quite know how to square in my head the "total
> > functionality" with being a TC-based "plugin".
> 
> Hi Jakub,
> 
> I believe the solution is to replace kernel code with eBPF in cases
> where we need programmability. This effectively means that we would
> ship eBPF code as part of the kernel. So in the case of TSO, the
> kernel would include a standard implementation in eBPF that could be
> compiled into the kernel by default. The restricted C source code is
> tagged with a hash, so if someone wants to offload TSO they could
> compile the source into their target and retain the hash. At runtime
> it's a matter of querying the driver to see if the device supports the
> TSO program the kernel is running by comparing hash values. Scaling
> this, a device could support a catalogue of programs: TSO, LRO,
> parser, IPtables, etc., If the kernel can match the hash of its eBPF
> code to one reported by the driver then it can assume functionality is
> offloadable. This is an elaboration of "device features", but instead
> of the device telling us they think they support an adequate GRO
> implementation by reporting NETIF_F_GRO, the device would tell the
> kernel that they not only support GRO but they provide identical
> functionality of the kernel GRO (which IMO is the first requirement of
> kernel offload).
> 
> Even before considering hardware offload, I think this approach
> addresses a more fundamental problem to make the kernel programmable.
> Since the code is in eBPF, the kernel can be reprogrammed at runtime
> which could be controlled by TC. This allows local customization of
> kernel features, but also is the simplest way to "patch" the kernel
> with security and bug fixes (nobody is ever excited to do a kernel

[..]

> rebase in their datacenter!). Flow dissector is a prime candidate for
> this, and I am still planning to replace it with an all eBPF program
> (https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf).

So you're suggesting to bundle (and extend)
tools/testing/selftests/bpf/progs/bpf_flow.c? We were thinking along
similar lines here. We load this program manually right now, shipping
and autoloading with the kernel will be easer.

  parent reply	other threads:[~2024-03-04 21:19 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-25 16:54 [PATCH net-next v12 00/15] Introducing P4TC (series 1) Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 01/15] net: sched: act_api: Introduce P4 actions list Jamal Hadi Salim
2024-02-29 15:05   ` Paolo Abeni
2024-02-29 18:21     ` Jamal Hadi Salim
2024-03-01  7:30       ` Paolo Abeni
2024-03-01 12:39         ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 02/15] net/sched: act_api: increase action kind string length Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 03/15] net/sched: act_api: Update tc_action_ops to account for P4 actions Jamal Hadi Salim
2024-02-29 16:19   ` Paolo Abeni
2024-02-29 18:30     ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 04/15] net/sched: act_api: add struct p4tc_action_ops as a parameter to lookup callback Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 05/15] net: sched: act_api: Add support for preallocated P4 action instances Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 06/15] p4tc: add P4 data types Jamal Hadi Salim
2024-02-29 15:09   ` Paolo Abeni
2024-02-29 18:31     ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 07/15] p4tc: add template API Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 08/15] p4tc: add template pipeline create, get, update, delete Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 09/15] p4tc: add template action create, update, delete, get, flush and dump Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 10/15] p4tc: add runtime action support Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 11/15] p4tc: add template table create, update, delete, get, flush and dump Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 12/15] p4tc: add runtime table entry create and update Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 13/15] p4tc: add runtime table entry get, delete, flush and dump Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 14/15] p4tc: add set of P4TC table kfuncs Jamal Hadi Salim
2024-03-01  6:53   ` Martin KaFai Lau
2024-03-01 12:31     ` Jamal Hadi Salim
2024-03-03  1:32       ` Martin KaFai Lau
2024-03-03 17:20         ` Jamal Hadi Salim
2024-03-05  7:40           ` Martin KaFai Lau
2024-03-05 12:30             ` Jamal Hadi Salim
2024-03-06  7:58               ` Martin KaFai Lau
2024-03-06 20:22                 ` Jamal Hadi Salim
2024-03-06 22:21                   ` Martin KaFai Lau
2024-03-06 23:19                     ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 15/15] p4tc: add P4 classifier Jamal Hadi Salim
2024-02-28 17:11 ` [PATCH net-next v12 00/15] Introducing P4TC (series 1) John Fastabend
2024-02-28 18:23   ` Jamal Hadi Salim
2024-02-28 21:13     ` John Fastabend
2024-03-01  7:02   ` Martin KaFai Lau
2024-03-01 12:36     ` Jamal Hadi Salim
2024-02-29 17:13 ` Paolo Abeni
2024-02-29 18:49   ` Jamal Hadi Salim
2024-02-29 20:52     ` John Fastabend
2024-02-29 21:49   ` Singhai, Anjali
2024-02-29 22:33     ` John Fastabend
2024-02-29 22:48       ` Jamal Hadi Salim
     [not found]         ` <CAOuuhY8qbsYCjdUYUZv8J3jz8HGXmtxLmTDP6LKgN5uRVZwMnQ@mail.gmail.com>
2024-03-01 17:00           ` Jakub Kicinski
2024-03-01 17:39             ` Jamal Hadi Salim
2024-03-02  1:32               ` Jakub Kicinski
2024-03-02  2:20                 ` Tom Herbert
2024-03-03  3:15                   ` Jakub Kicinski
2024-03-03 16:31                     ` Tom Herbert
2024-03-04 20:07                       ` Jakub Kicinski
2024-03-04 20:58                         ` eBPF to implement core functionility WAS " Tom Herbert
2024-03-04 21:19                       ` Stanislav Fomichev [this message]
2024-03-04 22:01                         ` Tom Herbert
2024-03-04 23:24                           ` Stanislav Fomichev
2024-03-04 23:50                             ` Tom Herbert
2024-03-02  2:59                 ` Hardware Offload discussion WAS(Re: " Jamal Hadi Salim
2024-03-02 14:36                   ` Jamal Hadi Salim
2024-03-03  3:27                     ` Jakub Kicinski
2024-03-03 17:00                       ` Jamal Hadi Salim
2024-03-03 18:10                         ` Tom Herbert
2024-03-03 19:04                           ` Jamal Hadi Salim
2024-03-04 20:18                             ` Jakub Kicinski
2024-03-04 21:02                               ` Jamal Hadi Salim
2024-03-04 21:23                             ` Stanislav Fomichev
2024-03-04 21:44                               ` Jamal Hadi Salim
2024-03-04 22:23                                 ` Stanislav Fomichev
2024-03-04 22:59                                   ` Jamal Hadi Salim
2024-03-04 23:14                                     ` Stanislav Fomichev
2024-03-01 18:53   ` Chris Sommers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZeY6r9cm4pdW9WNC@google.com \
    --to=sdf@google.com \
    --cc=Mahesh.Shirshyad@amd.com \
    --cc=Vipin.Jain@amd.com \
    --cc=andy.fingerhut@gmail.com \
    --cc=anjali.singhai@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=chris.sommers@keysight.com \
    --cc=dan.daly@intel.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=deb.chatterjee@intel.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=khalidm@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=mattyk@nvidia.com \
    --cc=mleitner@redhat.com \
    --cc=namrata.limaye@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pctammela@mojatatu.com \
    --cc=toke@redhat.com \
    --cc=tom@sipanda.io \
    --cc=tomasz.osinski@intel.com \
    --cc=victor@mojatatu.com \
    --cc=vladbu@nvidia.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).