bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: ast@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org,
	bjorn.topel@intel.com, magnus.karlsson@intel.com
Subject: Re: [PATCH v5 bpf-next 4/6] bpf, x64: rework pro/epilogue and tailcall handling in JIT
Date: Fri, 31 Jul 2020 00:58:52 +0200	[thread overview]
Message-ID: <20200730225852.GA28034@ranger.igk.intel.com> (raw)
In-Reply-To: <f427a00f-eaf0-d5e0-8542-2f6bc90ba3ce@iogearbox.net>

On Thu, Jul 30, 2020 at 10:16:00PM +0200, Daniel Borkmann wrote:
> On 7/29/20 11:10 PM, Maciej Fijalkowski wrote:
> > On Wed, Jul 29, 2020 at 06:10:44PM +0200, Maciej Fijalkowski wrote:
> > > On Wed, Jul 29, 2020 at 12:07:52AM +0200, Daniel Borkmann wrote:
> > > > On 7/24/20 7:35 PM, Maciej Fijalkowski wrote:
> > > > > This commit serves two things:
> > > > > 1) it optimizes BPF prologue/epilogue generation
> > > > > 2) it makes possible to have tailcalls within BPF subprogram
> > > > > 
> > > > > Both points are related to each other since without 1), 2) could not be
> > > > > achieved.
> > > > > 
> > > > [...]
> > > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > > > > index 6fe6491fa17a..e9d62a60134b 100644
> > > > > --- a/kernel/bpf/arraymap.c
> > > > > +++ b/kernel/bpf/arraymap.c
> > > > > @@ -750,6 +750,7 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
> > > > >    				    struct bpf_prog *old,
> > > > >    				    struct bpf_prog *new)
> > > > >    {
> > > > > +	u8 *old_addr, *new_addr, *old_bypass_addr;
> > > > >    	struct prog_poke_elem *elem;
> > > > >    	struct bpf_array_aux *aux;
> > > > > @@ -800,13 +801,47 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
> > > > >    			if (poke->tail_call.map != map ||
> > > > >    			    poke->tail_call.key != key)
> > > > >    				continue;
> > > > > +			/* protect against un-updated poke descriptors since
> > > > > +			 * we could fill them from subprog and the same desc
> > > > > +			 * is present on main's program poke tab
> > > > > +			 */
> > > > > +			if (!poke->tailcall_bypass || !poke->tailcall_target ||
> > > > > +			    !poke->bypass_addr)
> > > > > +				continue;
> > > > 
> > > > Thinking more about this, this check here is not sufficient. You basically need this here
> > > > given you copy all poke descs over to each of the subprogs in jit_subprogs(). So for those
> > > > that weren't handled by the subprog have the above addresses as NULL. But in jit_subprogs()
> > > > once we filled out the target addresses for the bpf-in-bpf calls we loop over each subprog
> > > > and do the extra/final pass in the JIT to complete the images. However, nothing protects
> > > > bpf_tail_call_direct_fixup() as far as I can see from patching at the NULL addr if there is
> > > > a target program loaded in the map at the given key. That will most likely blow up and hit
> > > > the BUG_ON().
> > > 
> > > Okay, I agree with this reasoning but must admit that I don't understand
> > > when exactly during fixup the target prog for a given key might be already
> > > present? Could you shed some light on it? I recall that I was hitting
> > > this case in test_verifier kselftest, so maybe I'll dig onto that, but
> > > otherwise I didn't stumble upon this.
> 
> If the tail call map as first created and some programs attached to it, then you
> would hit this in bpf_tail_call_direct_fixup() for the subprogs where not all poke
> descs in the subprog's table belong to the actual prog.

Ah, got it. Thanks!

I've cooked the v6 that includes what we agreed on here together with
embarassing bisectability fix you spotted.

Just waiting for build to be finished so that I'll be sure that there's no
surprises after rebase.

> 
> > > > Instead of these above workarounds, did you try to go the path to only copy over the poke
> > > > descs that are relevant for the individual subprog (but not all the others)?
> > > 
> > > I was able to come up with something today, but I'd like to share it here
> > > and discuss whether you think it's correct approach before rushing with
> > > another revision.
> > > 
> > > Generally in fixup_bpf_calls I store the index of tail call insn onto the
> > > generated poke descriptor, then in jit_subprogs() I check whether the
> > > given poke descriptor belongs to the current subprog by checking if that
> > > previously stored absolute index of tail call insn is in the scope of the
> > > insns of given subprog. Then the insn->imm needs to be updated with new
> > > poke descriptor slot so that while JITing we will be able to grab the
> > > proper poke desc - previously it worked because we emulated the main
> > > prog's poke tab state onto each subprog.
> > > 
> > > This way the subprogs actually get only relevant poke descs, but I have a
> 
> That sounds reasonable to me, yes, and the below code also looks good.
> 
> > > concern about the main prog's poke tab. Shouldn't we pull out the descs
> > > that have been copied to the subprog out of the main poke tab?
> > > 
> > > If yes, then shouldn't the poke tab be converted to a linked list?
> > 
> > Thinking a bit more about this, I think we can just untrack the main
> > prog's aux struct from prog array map. If there are subprograms then the
> > main prog is treated as subprog 0 and with the logic below every poke desc
> > will be propagated properly.
> > 
> > I checked that doing:
> > 
> > 	for (i = 0; i < prog->aux->size_poke_tab; i++) {
> > 		map_ptr = prog->aux->poke_tab[i].tail_call.map;
> > 
> > 		map_ptr->ops->map_poke_untrack(map_ptr, prog->aux);
> > 	}
> > 
> > after the initial JIT subprogs loop works just fine and we can drop the
> > cumbersome check from map_poke_run().
> > 
> > wdyt?
> 
> Yes, that is needed as well. Given we test on prog->aux for tracking, the subprogs
> enries will get added in prog_array_map_poke_track() individually given their aux
> pointer is different and untracking main progs aux then also works since it has no
> effect on subprogs.
> 
> > > The patch that I will merge onto the 2/6 if you would say that we can live
> > > with this approach, it's on top of this series:
> > > 
> > >  From 57baac74647a4627fe85bb3393365de906070eb1 Mon Sep 17 00:00:00 2001
> > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > Date: Wed, 29 Jul 2020 17:51:59 +0200
> > > Subject: [PATCH] bpf: propagate only those poke descs that are used in subprog
> > > 
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > >   include/linux/bpf.h   |  1 +
> > >   kernel/bpf/verifier.c | 11 ++++++++++-
> > >   2 files changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 14b796bf35de..74ab8ec2f2d3 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -664,6 +664,7 @@ struct bpf_jit_poke_descriptor {
> > >   	bool tailcall_target_stable;
> > >   	u8 adj_off;
> > >   	u16 reason;
> > > +	u32 abs_insn_idx;
> 
> tiny nit: I think just calling insn_idx is sufficient.
> 
> > >   };
> > >   /* reg_type info for ctx arguments */
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 3ea769555246..d6402dc05087 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -9971,15 +9971,23 @@ static int jit_subprogs(struct bpf_verifier_env *env)
> > >   		func[i]->aux->func_info = prog->aux->func_info;
> > >   		for (j = 0; j < prog->aux->size_poke_tab; j++) {
> > > +			u32 abs_insn_idx = prog->aux->poke_tab[j].abs_insn_idx;
> > >   			int ret;
> > > +			if (!(abs_insn_idx >= subprog_start &&
> > > +			      abs_insn_idx <= subprog_end))
> > > +				continue;
> > > +
> > >   			ret = bpf_jit_add_poke_descriptor(func[i],
> > >   							  &prog->aux->poke_tab[j]);
> > >   			if (ret < 0) {
> > >   				verbose(env, "adding tail call poke descriptor failed\n");
> > >   				goto out_free;
> > >   			}
> > > -			map_ptr = func[i]->aux->poke_tab[j].tail_call.map;
> > > +
> > > +			func[i]->insnsi[abs_insn_idx - subprog_start].imm = ret + 1;
> > > +
> > > +			map_ptr = func[i]->aux->poke_tab[ret].tail_call.map;
> > >   			ret = map_ptr->ops->map_poke_track(map_ptr, func[i]->aux);
> > >   			if (ret < 0) {
> > >   				verbose(env, "tracking tail call prog failed\n");
> > > @@ -10309,6 +10317,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
> > >   					.reason = BPF_POKE_REASON_TAIL_CALL,
> > >   					.tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
> > >   					.tail_call.key = bpf_map_key_immediate(aux),
> > > +					.abs_insn_idx = i,
> > >   				};
> > >   				ret = bpf_jit_add_poke_descriptor(prog, &desc);
> > > -- 
> > > 2.20.1
> 
> Lets ship it, thanks!

Super cool! :)

> Daniel

  reply	other threads:[~2020-07-30 23:03 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-24 17:35 [PATCH v5 bpf-next 0/6] bpf: tailcalls in BPF subprograms Maciej Fijalkowski
2020-07-24 17:35 ` [PATCH v5 bpf-next 1/6] bpf, x64: use %rcx instead of %rax for tail call retpolines Maciej Fijalkowski
2020-07-24 17:35 ` [PATCH v5 bpf-next 2/6] bpf: propagate poke descriptors to subprograms Maciej Fijalkowski
2020-07-24 17:35 ` [PATCH v5 bpf-next 3/6] bpf: rename poke descriptor's 'ip' member to 'tailcall_target' Maciej Fijalkowski
2020-07-24 17:35 ` [PATCH v5 bpf-next 4/6] bpf, x64: rework pro/epilogue and tailcall handling in JIT Maciej Fijalkowski
2020-07-28 21:33   ` Daniel Borkmann
2020-07-28 22:07   ` Daniel Borkmann
2020-07-29 16:10     ` Maciej Fijalkowski
2020-07-29 21:10       ` Maciej Fijalkowski
2020-07-30 20:16         ` Daniel Borkmann
2020-07-30 22:58           ` Maciej Fijalkowski [this message]
2020-07-24 17:35 ` [PATCH v5 bpf-next 5/6] bpf: allow for tailcalls in BPF subprograms for x64 JIT Maciej Fijalkowski
2020-07-24 17:35 ` [PATCH v5 bpf-next 6/6] selftests: bpf: add dummy prog for bpf2bpf with tailcall Maciej Fijalkowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200730225852.GA28034@ranger.igk.intel.com \
    --to=maciej.fijalkowski@intel.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).