bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	Marek Majkowski <marek@cloudflare.com>,
	David Miller <davem@davemloft.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
Date: Wed, 02 Oct 2019 20:54:31 +0200	[thread overview]
Message-ID: <87zhijq8pk.fsf@toke.dk> (raw)
In-Reply-To: <CACAyw9860eDGU9meO0wQ82OgWNPv3LXAQqrJNf-mQFA0yu7rWQ@mail.gmail.com>

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
> Hi Toke,
> Thanks for posting the patch set! As I mentioned, this is a big pain
> point for us as well. Right now, all of our different XDP components
> are managed by a single daemon called (drumroll) xdpd. We'd like to
> separate this out into individual bits and pieces that operate
> separately from each other.

Yes, I'm counting on you guys to be one of the potential users and help
me iron out the API and semantics! ;)

> I've looked at the kernel side of your patch set, here are my thoughts:
>> The basic idea is to express the chain call sequence through a special map type,
>> which contains a mapping from a (program, return code) tuple to another program
>> to run in next in the sequence. Userspace can populate this map to express
>> arbitrary call sequences, and update the sequence by updating or replacing the
>> map.
> How do you imagine this would work in practice? From what I can tell,
> the map is keyed by program id, which makes it difficult to reliably
> construct the control flow that I want.

The way I've been picturing this would work, is that programs would
cooperatively manage this by updating the data structures to insert and
remove themselves as needed. I haven't actually fleshed out this in code
yet (that's next on my agenda), but I imagine it would be something like

> As an example, I'd like to split the daemon into two parts, A and B,
> which I want to be completely independent. So:
> - A runs before B, if both are active
> - If A is not active, B is called first
> - If B is not active, only A is called
> Both A and B may at any point in time replace their current XDP
> program with a new one. This means that there is no stable program ID
> I can use.

They would have to cooperate. Either by a central management daemon, or
by using the chain call map as a data structure to coordinate.

If A switches out its program, it would need to first load the new
version, then update the map to replace the old ID with the new one.

> Another problem are deletions: if I delete A (because that component
> is shut down) B will never be called, since the program ID that linked
> B into the control flow is gone. This means that B needs to know about
> A and vice versa.

Say both programs are running. If B shuts down, it just removes itself
from the map. If A shuts down, it also needs to remove itself from the
map, and move up all its "descendants" in the call sequence. So in this
case, A would look up itself in the chain call map, find the next
program in the sequence, and install that as the new program on the

The operations are kinda like linked list manipulations. I had some
examples in my slides at LPC:

- Simple updates: *linked-list like* operations (map stays the same)
# Insert after id 3
  --> id = load(prog.o);
  --> map_update(map, {3, PASS}, id) # atomic update
# Insert before id 2
  --> id = load(prog.o);
  --> map_update(map, {id, PASS}, 2); # no effect on chain sequence
  --> map_update(map, {1, PASS}, id); # atomic update

- More complex operations: /*replace the whole thing*/

# Replace ID 3 with new program
  --> id = load(prog.o); map = new_map();
  --> map_update(map, {1, PASS}, 2);
  --> map_update(map, {1, TX}, id);
  --> map_update(map, {2, PASS}, id);
  --> xdp_attach(eth0, 1, map, FORCE); # atomic replace

The tricky part is avoiding read-modify-update races. I was imagining
that we could provide "cmpxchg-like" semantics by adding an "expected
old value" to the netlink load and/or to map updates. The kernel could
then ensure that the update fails if the actual value being replaced has
changed since userspace read it, in which case userspace could retry the
whole operation.

>> The actual execution of the program sequence is done in
>> bpf_prog_run_xdp(), which will lookup the chain sequence map, and if
>> found, will loop through calls to BPF_PROG_RUN, looking up the next
>> XDP program in the sequence based on the previous program ID and
>> return code.
> I think that the tail call chain is useful for other eBPF programs as
> well. How hard is it to turn the logic in bpf_prog_run_xdp into a
> helper instead?

Heh. In principle, I guess we could move the logic *into* BPF_PROG_RUN
instead of wrapping around it. The tricky part would be figuring out
where to stash the map itself.

One way to do this, which your other question about why both prog fd and
map fd was needed made me think about, is to make programs and sequence
maps interchangeable *everywhere*. I.e., every time we attach an eBPF
program anywhere, we pass an fd to that program. So from a UAPI PoV, we
could just extend that so all attach points would accept *either* a
program fd *or* a chain call map fd, and then do the appropriate thing.
This is quite intrusive, though, so let's leave that aside from now and
maybe come back to it in the future if this turns out to be a huge
success :)


  reply	other threads:[~2019-10-02 18:54 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 1/9] hashtab: Add new bpf_map_fd_put_value op Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences Toke Høiland-Jørgensen
2019-10-02 15:50   ` Lorenz Bauer
2019-10-02 18:25     ` Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
2019-10-02 15:50   ` Lorenz Bauer
2019-10-02 18:32     ` Toke Høiland-Jørgensen
2019-10-02 18:07   ` kbuild test robot
2019-10-02 18:29   ` kbuild test robot
2019-10-02 13:30 ` [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface Toke Høiland-Jørgensen
2019-10-02 17:33   ` kbuild test robot
2019-10-02 17:53   ` kbuild test robot
2019-10-02 13:30 ` [PATCH bpf-next 5/9] tools/include/uapi: Add XDP chain map definitions Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 6/9] tools/libbpf_probes: Add support for xdp_chain map type Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 7/9] bpftool: Add definitions " Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 8/9] libbpf: Add support for setting and getting XDP chain maps Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 9/9] selftests: Add tests for XDP chain calls Toke Høiland-Jørgensen
2019-10-02 15:10 ` [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through " Alan Maguire
2019-10-02 15:33   ` Toke Høiland-Jørgensen
2019-10-02 16:34     ` John Fastabend
2019-10-02 18:33       ` Toke Høiland-Jørgensen
2019-10-02 20:34         ` John Fastabend
2019-10-03  7:48           ` Toke Høiland-Jørgensen
2019-10-03 10:09             ` Jesper Dangaard Brouer
2019-10-03 19:45               ` John Fastabend
2019-10-02 16:35 ` Lorenz Bauer
2019-10-02 18:54   ` Toke Høiland-Jørgensen [this message]
2019-10-02 16:43 ` John Fastabend
2019-10-02 19:09   ` Toke Høiland-Jørgensen
2019-10-02 19:15   ` Daniel Borkmann
2019-10-02 19:29     ` Toke Høiland-Jørgensen
2019-10-02 19:46     ` Alexei Starovoitov
2019-10-03  7:58       ` Toke Høiland-Jørgensen
2019-10-02 18:38 ` Song Liu
2019-10-02 18:54   ` Song Liu
2019-10-02 19:25     ` Toke Høiland-Jørgensen
2019-10-03  8:53       ` Jesper Dangaard Brouer
2019-10-03 14:03         ` Alexei Starovoitov
2019-10-03 14:33           ` Toke Høiland-Jørgensen
2019-10-03 14:53             ` Edward Cree
2019-10-03 18:49               ` Jesper Dangaard Brouer
2019-10-03 19:35               ` John Fastabend
2019-10-04  8:09                 ` Toke Høiland-Jørgensen
2019-10-04 10:34                   ` Edward Cree
2019-10-04 15:58                     ` Lorenz Bauer
2019-10-07 16:43                       ` Edward Cree
2019-10-07 17:12                         ` Lorenz Bauer
2019-10-07 19:21                           ` Edward Cree
2019-10-07 21:01                         ` Alexei Starovoitov
2019-10-02 19:23   ` Toke Høiland-Jørgensen
2019-10-02 19:49     ` Song Liu
2019-10-03  7:59       ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zhijq8pk.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kafai@fb.com \
    --cc=kernel-team@cloudflare.com \
    --cc=lmb@cloudflare.com \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).