bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points
Date: Mon, 20 Jul 2020 16:34:55 -0700	[thread overview]
Message-ID: <20200720233455.6ito7n2eqojlfnvk@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAEf4BzYd4Xrn4EqzqHCTuJ8TnZiTC1vWWvd=9Np+LNrgbtxOcQ@mail.gmail.com>

On Sun, Jul 19, 2020 at 10:02:48PM -0700, Andrii Nakryiko wrote:
> On Thu, Jul 16, 2020 at 7:06 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jul 16, 2020 at 12:50:05PM +0200, Toke Høiland-Jørgensen wrote:
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > >
> > > > On Wed, Jul 15, 2020 at 03:09:02PM +0200, Toke Høiland-Jørgensen wrote:
> > > >>
> > > >> +  if (tgt_prog_fd) {
> > > >> +          /* For now we only allow new targets for BPF_PROG_TYPE_EXT */
> > > >> +          if (prog->type != BPF_PROG_TYPE_EXT ||
> > > >> +              !btf_id) {
> > > >> +                  err = -EINVAL;
> > > >> +                  goto out_put_prog;
> > > >> +          }
> > > >> +          tgt_prog = bpf_prog_get(tgt_prog_fd);
> > > >> +          if (IS_ERR(tgt_prog)) {
> > > >> +                  err = PTR_ERR(tgt_prog);
> > > >> +                  tgt_prog = NULL;
> > > >> +                  goto out_put_prog;
> > > >> +          }
> > > >> +
> > > >> +  } else if (btf_id) {
> > > >> +          err = -EINVAL;
> > > >> +          goto out_put_prog;
> > > >> +  } else {
> > > >> +          btf_id = prog->aux->attach_btf_id;
> > > >> +          tgt_prog = prog->aux->linked_prog;
> > > >> +          if (tgt_prog)
> > > >> +                  bpf_prog_inc(tgt_prog); /* we call bpf_prog_put() on link release */
> > > >
> > > > so the first prog_load cmd will beholding the first target prog?
> > > > This is complete non starter.
> > > > You didn't mention such decision anywhere.
> > > > The first ext prog will attach to the first dispatcher xdp prog,
> > > > then that ext prog will multi attach to second dispatcher xdp prog and
> > > > the first dispatcher prog will live in the kernel forever.
> > >
> > > Huh, yeah, you're right that's no good. Missing that was a think-o on my
> > > part, sorry about that :/
> > >
> > > > That's not what we discussed back in April.
> > >
> > > No, you mentioned turning aux->linked_prog into a list. However once I
> > > started looking at it I figured it was better to actually have all this
> > > (the trampoline and ref) as part of the bpf_link structure, since
> > > logically they're related.
> > >
> > > But as you pointed out, the original reference sticks. So either that
> > > needs to be removed, or I need to go back to the 'aux->linked_progs as a
> > > list' idea. Any preference?
> >
> > Good question. Back then I was thinking about converting linked_prog into link
> > list, since standalone single linked_prog is quite odd, because attaching ext
> > prog to multiple tgt progs should have equivalent properties across all
> > attachments.
> > Back then bpf_link wasn't quite developed.
> > Now I feel moving into bpf_tracing_link is better.
> > I guess a link list of bpf_tracing_link-s from 'struct bpf_prog' might work.
> > At prog load time we can do bpf_link_init() only (without doing bpf_link_prime)
> > and keep this pre-populated bpf_link with target bpf prog and trampoline
> > in a link list accessed from 'struct bpf_prog'.
> > Then bpf_tracing_prog_attach() without extra tgt_prog_fd/btf_id would complete
> > that bpf_tracing_link by calling bpf_link_prime() and bpf_link_settle()
> > without allocating new one.
> > Something like:
> > struct bpf_tracing_link {
> >         struct bpf_link link;  /* ext prog pointer is hidding in there */
> >         enum bpf_attach_type attach_type;
> >         struct bpf_trampoline *tr;
> >         struct bpf_prog *tgt_prog; /* old aux->linked_prog */
> > };
> >
> > ext prog -> aux -> link list of above bpf_tracing_link-s
> >
> > It's a circular reference, obviously.
> > Need to think through the complications and locking.
> >
> > bpf_tracing_prog_attach() with tgt_prog_fd/btf_id will alloc new bpf_tracing_link
> > and will add it to a link list.
> >
> > Just a rough idea. I wonder what Andrii thinks.
> >
> 
> I need to spend more time reading existing and new code to see all the
> details, but I'll throw a slightly different proposal and let you guys
> shoot it down.
> 
> So, what if instead of having linked_prog (as bpf_prog *, refcnt'ed),
> at BPF_PROG_LOAD time we just record the target prog's ID. BPF
> verifier, when doing its target prog checks would attempt to get
> bpf_prog * reference; if by that time the target program is gone,
> fail, of course. If not, everything proceeds as is, at the end of
> verification target_prog is put until attach time.
> 
> Then at attach time, we either go with pre-recorded (in
> prog->aux->linked_prog_id) target prog's ID or we get a new one from
> RAW_TP_OPEN tgt_prog_fd. Either way, we bump refcnt on that target
> prog and keep it with bpf_tracing_link (so link on detach would put
> target_prog, that way it doesn't go away while EXT prog is attached).
> Then do all the compatibility checks, and if everything works out,
> bpf_tracing_link gets created, we record trampoline there, etc, etc.
> Basically, instead of having an EXT prog holding a reference to the
> target prog, only attachment (bpf_link) does that, which conceptually
> also seems to make more sense to me. For verification we store prog ID
> and don't hold target prog at all.
> 
> 
> Now, there will be a problem once you attach EXT prog to a new XDP
> root program and release a link against the original XDP root program.
> First, I hope I understand the desired sequence right, here's an
> example:
> 
> 1. load XDP root prog X
> 2. load EXT prog with target prog X
> 3. attach EXT prog to prog X
> 4. load XDP root prog Y
> 5. attach EXT prog to prog Y (Y and X should be "compatible")
> 6. detach prog X (close bpf_link)
> 
> Is that the right sequence?
> 
> If yes, then the problem with storing ID of prog X in EXT
> prog->aux->linked_prog_id is that you won't be able to re-attach to
> new prog Z, because there won't be anything to check compatibility
> against (prog X will be long time gone).
> 
> So we can do two things here:
> 
> 1. on attach, replace ext_prog->aux->linked_prog_id with the latest
> attached prog (prog Y ID from above example)
> 2. instead of recording target program FD/ID, capture BTF FD and/or
> enough BTF information for checking compatibility.
> 
> Approach 2) seems like conceptually the right thing to do (record type
> info we care about, not an **instance** of BPF program, compatible
> with that type info), but technically might be harder.

I've read your proposal couple times and still don't get what you're
trying to solve with either ID or BTF info recording.
So that target prog doesn't get refcnt-ed? What's a problem with it?
Currently it's being refcnt-d in aux->linked_prog.
What I'm proposing about is to convert aux->linked_prog into a link list
of bpf_tracing_links which will contain linked_prog inside.
Conceptually that's what bpf_link is doing. It links two progs.
EXT prog is recorded in 'struct bpf_link' and
the target prog is recorded in 'struct bpf_tracing_link'.
So from bpf_link perspective everything seems clean to me.
The link list of bpf_tracing_link-s in EXT_prog->aux is only to preserve
existing api of prog_load cmd.

As far as step 5: attach EXT prog to prog Y (Y and X should be "compatible")
The chance of failure there should be minimal. libxdp/libdispatcher will
prepare rootlet XDP prog. It should really make sure that Y and X are compatible.
This should be invisible to users.

In addition we still need bpf_link_update_hook() I was talking about in April.
The full sequence is:
first user process:
 1. load XDP root prog X
 1' root_link = attach X to eth0
 2. load EXT prog with target prog X
 3. app1_link_fd = attach EXT prog to prog X
second user process:
 4. load XDP root prog Y
 4'. find EXT prog of the first user process
 5. app2_link_fd = attach EXT prog to prog Y (Y and X should be "compatible")
 6. bpf_link_update(root_link, X, Y); // now packet flows into Y and into EXT
   // while EXT is attached in two places
 7. app1_link_fd' = FD in second process that points to the same tracing link
    as app1_link_fd in the first process.
   bpf_link_update_hook(app1_link_fd', app2_link_fd)
the last operation need to update bpf_tracing_link that is held by app1
(which is the first user process) from the second user process. It needs to
retarget (update_hook) inside bpf_tracing_link from X to Y.
Since the processes are more or less not aware of each other.
One firewall holds link_fd that connects EXT to X,
but the second firewall (via libxdp) is updaing that tracing link
to re-hook EXT into Y.

  reply	other threads:[~2020-07-20 23:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15 13:08 [PATCH bpf-next v2 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
2020-07-16 10:10   ` Dan Carpenter
2020-07-15 13:09 ` [PATCH bpf-next v2 2/6] bpf: verifier: refactor check_attach_btf_id() Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
2020-07-15 20:44   ` Alexei Starovoitov
2020-07-16 10:50     ` Toke Høiland-Jørgensen
2020-07-17  2:05       ` Alexei Starovoitov
2020-07-17 10:52         ` Toke Høiland-Jørgensen
2020-07-20 22:57           ` Alexei Starovoitov
2020-07-20  5:02         ` Andrii Nakryiko
2020-07-20 23:34           ` Alexei Starovoitov [this message]
2020-07-21  3:48             ` Andrii Nakryiko
2020-07-22  0:29               ` Alexei Starovoitov
2020-07-22  6:02                 ` Andrii Nakryiko
2020-07-23  0:32                   ` Alexei Starovoitov
2020-07-23  0:56                     ` Andrii Nakryiko
2020-07-16 10:14   ` Dan Carpenter
2020-07-15 13:09 ` [PATCH bpf-next v2 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open() Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 6/6] selftests: add test for multiple attachments of freplace program Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200720233455.6ito7n2eqojlfnvk@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).