All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	William Tu <u9012063@gmail.com>,
	xdp-hints@xdp-project.net
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Tue, 01 Jun 2021 17:22:51 -0700	[thread overview]
Message-ID: <60b6cf5b6505e_38d6d208d8@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <8735u3dv2l.fsf@toke.dk>

Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> >  struct meta {
> >    __u32 f1;
> >    __u32 f2;
> >    __u32 f3;
> >    __u32 f4;
> >    __u32 f5;
> >  }
> >
> > I'm still confused why the meta data would change just because the feature
> > is enabled or disabled. I've written drivers before and I don't want to
> > move around where I write f1 based on some combination of features f2,f3,f4,f5
> > state of enabled/disabled. If features are mutual exclusive I can build a
> > sensible union. If its possible for all fields to enabled then I just lay
> > them out like above.
> 
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
> 
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?

Correct.

> 
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
> 
> - It requires the system admin to coordinate device config with all of
>   their installed XDP applications. This is error-prone, especially as
>   the number of applications grows (say if different containers have
>   different XDP programs installed on their virtual devices).

A complete "system" will need to be choerent. If I forward into a veth
device the orchestration component needs to ensure program sending
bits there is using the same format the program installed there expects.

If I tailcall/fentry into another program that program the callee and
caller need to agree on the metadata protocol.

I don't see any way around this. Someone has to manage the network.

> 
> - It has synchronisation issues. Say I have an XDP program with optional
>   support for hardware timestamps and a software fallback. It gets
>   installed in software fallback mode; then the admin has to make sure
>   to enable the hardware timestamps before switching the application
>   into the mode where it will read that metadata field (and the opposite
>   order when disabling the hardware mode).

If you want tell the hardware to use an enable bit then check it on
ingress to the XDP program. What I like about this scheme is as a core
kernel or networking person I don't have to care how you do this. Figure
it out with your hardware and driver.

> 
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program running
> on a veth device inside a container above: if this gets packets
> redirected into it from different NICs with different layouts, it needs
> to be able to figure out which packet came from where.

Again I don't think this is the kernel problem. If you have some setup
where NICs have different metadata then you need a XDP shim to rewrite
metadata. But, as the person who setup this system maybe you should
avoid this for the fast path at least.

> 
> With this in mind I think we have to encode the metadata format into the
> packet metadata itself somehow. This could just be a matter of including
> the BTF ID as part of the struct itself, so that your example could
> essentially do:
> 
>   if (data->meta_btf_id == timestamp_id) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }

I'm not sure why you call it meta_btf_id there? Why not just
enabledTstampBit. so


  if (mymeta->enabledTstampBit) { ...} else { //fallback }

> 
> 
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
> 
>  struct normal_meta {
>   u32 rxhash;
>   u32 padding;
>   u8 vlan;
>  };
> 
> and
> 
>  struct timestamp_meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

So a union on that u32 with padding and timestamp would suffice but
sure if you want to do it at compile time with btf_id ok. I think
runtime would make more sense. Like this,

 struct meta {
   u32 rxhash;
   u32 timestamp;
    u8 vlan;
    u8 flags;
 }

 if (m->flags) { read timestamp}

> 
> This still gets us exponential growth in the number of metadata structs,
> but at least we could create tooling to auto-generate the BTF for the
> variants so the complexity is reduced to just consuming a lot of BTF
> IDs.

auto-generate the BTF? I'm not sure why its needed to be honest. I think
for most simple things you can build a single metadata struct that
will fit. For more complex pipeline updates then we should generate
the BTF with the pipeline using P4 or such IMO.

> 
> Alternatively we could have an explicit bitmap of valid fields, like:
> 
>  struct all_meta {
>    u32 _valid_field_bitmap;
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

Aha I like this as you can tell from above. Would have just jumped
here but was responding as I was reading. So I think I'll leave above
maybe it is illustrative.

> 
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to

+1 I think we agree.

> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having lots
> of different variants with separate BTF IDs. I'm not sure what it would
> take to teach CO-RE to support such a scheme, though...
> 
> WDYT?

I think we agree to the last option (bitmap field) to start with build
that out I think you cover most hardware that is out today. Then
the more complex cases come as step2. And I think its hard to
understate that the last step above mostly works today, you don't need
to go do a bunch of kernel work.

> 
> -Toke
> 



  parent reply	other threads:[~2021-06-02  0:23 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 10:58 XDP-hints: Howto support multiple BTF types per packet basis? Jesper Dangaard Brouer
2021-05-26 19:12 ` Andrii Nakryiko
2021-05-26 20:20   ` Jesper Dangaard Brouer
2021-05-26 20:37     ` John Fastabend
2021-05-26 22:39     ` Andrii Nakryiko
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko
2021-05-26 20:31   ` John Fastabend
2021-05-26 22:54     ` Andrii Nakryiko
2021-05-27  0:44       ` John Fastabend
2021-05-27 17:44         ` Andrii Nakryiko
2021-05-28  5:48           ` John Fastabend
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer
2021-06-02  0:22                       ` John Fastabend [this message]
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
2021-07-28  9:54 ` XDP-hints: how to inform driver about hints Michal Swiatkowski
2021-07-28 18:40   ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=60b6cf5b6505e_38d6d208d8@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=toke@redhat.com \
    --cc=u9012063@gmail.com \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.