From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: David Ahern <dsahern@gmail.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
Daniel Borkmann <borkmann@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
brouer@redhat.com
Subject: Re: [PATCH bpf-next RFC 2/3] bpf: devmap dynamic map-value storage area based on BTF
Date: Tue, 2 Jun 2020 10:59:08 +0200 [thread overview]
Message-ID: <20200602105908.19254e0f@carbon> (raw)
In-Reply-To: <87tuzyzodv.fsf@toke.dk>
On Fri, 29 May 2020 18:39:40 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > The devmap map-value can be read from BPF-prog side, and could be used for a
> > storage area per device. This could e.g. contain info on headers that need
> > to be added when packet egress this device.
> >
> > This patchset adds a dynamic storage member to struct bpf_devmap_val. More
> > importantly the struct bpf_devmap_val is made dynamic via leveraging and
> > requiring BTF for struct sizes above 4. The only mandatory struct member is
> > 'ifindex' with a fixed offset of zero.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > kernel/bpf/devmap.c | 216 ++++++++++++++++++++++++++++++++++++++++++++-------
> > 1 file changed, 185 insertions(+), 31 deletions(-)
> >
> > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> > index 4ab67b2d8159..9cf2dadcc0fe 100644
[...]
> > @@ -60,13 +61,30 @@ struct xdp_dev_bulk_queue {
> > unsigned int count;
> > };
> >
> > -/* DEVMAP values */
> > +/* DEVMAP map-value layout.
> > + *
> > + * The struct data-layout of map-value is a configuration interface.
> > + * BPF-prog side have read-only access to this memory.
> > + *
> > + * The layout might be different than below, because some struct members are
> > + * optional. This is made dynamic by requiring userspace provides an BTF
> > + * description of the struct layout, when creating the BPF-map. Struct names
> > + * are important and part of API, as BTF use these names to identify members.
> > + */
> > struct bpf_devmap_val {
> > - __u32 ifindex; /* device index */
> > + __u32 ifindex; /* device index - mandatory */
> > union {
> > int fd; /* prog fd on map write */
> > __u32 id; /* prog id on map read */
> > } bpf_prog;
> > + struct {
> > + /* This 'storage' member is meant as a dynamically sized area,
> > + * that BPF developer can redefine. As other members are added
> > + * overtime, this area can shrink, as size can be regained by
> > + * not using members above. Add new members above this struct.
> > + */
> > + unsigned char data[24];
> > + } storage;
>
> Why is this needed? Userspace already passes in the value_size, so why
> can't the kernel just use the BTF to pick out the values it cares about
> and let the rest be up to userspace?
The kernel cannot just ignore unknown struct members, due to forward
compatibility. An older kernel that sees a new struct member, cannot
know what this struct member is used for. Thus, later I'm rejecting
map creation if I detect members kernel doesn't know about.
This means, that I need to create a named area (e.g. named 'storage')
that users can define their own layout within.
This might be difficult to comprehend for other kernel developers,
because usually we create forward compatibility via walking the binary
struct and then assume that if an unknown area (in end-of-struct)
contains zeros, then it means end-user isn't using that unknown feature.
This doesn't work when the default value, as in this exact case, need
to be minus-1 do describe "unused" as this is a file descriptor.
Forward compatibility is different here. If the end-user include the
member in their BTF description, that means they intend to use it.
Thus, kernel need to reject map-create if it sees unknown members.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2020-06-02 8:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-29 15:59 [PATCH bpf-next RFC 0/3] bpf: dynamic map-value config layout via BTF Jesper Dangaard Brouer
2020-05-29 15:59 ` [PATCH bpf-next RFC 1/3] bpf: move struct bpf_devmap_val out of UAPI Jesper Dangaard Brouer
2020-05-29 16:06 ` David Ahern
2020-05-29 16:28 ` Jesper Dangaard Brouer
2020-05-29 15:59 ` [PATCH bpf-next RFC 2/3] bpf: devmap dynamic map-value storage area based on BTF Jesper Dangaard Brouer
2020-05-29 16:39 ` Toke Høiland-Jørgensen
2020-06-02 8:59 ` Jesper Dangaard Brouer [this message]
2020-06-02 9:23 ` Toke Høiland-Jørgensen
2020-06-02 10:01 ` Jesper Dangaard Brouer
2020-05-30 7:19 ` Andrii Nakryiko
2020-05-30 14:36 ` Jesper Dangaard Brouer
2020-06-01 21:30 ` Alexei Starovoitov
2020-06-02 7:00 ` Jesper Dangaard Brouer
2020-06-02 18:27 ` Alexei Starovoitov
2020-06-03 9:11 ` Jesper Dangaard Brouer
2020-06-03 16:20 ` Alexei Starovoitov
2020-05-29 15:59 ` [PATCH bpf-next RFC 3/3] samples/bpf: change xdp_fwd to use new BTF config interface Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200602105908.19254e0f@carbon \
--to=brouer@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=borkmann@iogearbox.net \
--cc=bpf@vger.kernel.org \
--cc=dsahern@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).