Date: Wed, 29 Jul 2020 23:10:05 +0200
From: Maciej Fijalkowski
To: Daniel Borkmann
Cc: ast@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org,
	bjorn.topel@intel.com, magnus.karlsson@intel.com
Subject: Re: [PATCH v5 bpf-next 4/6] bpf, x64: rework pro/epilogue and
 tailcall handling in JIT
Message-ID: <20200729211005.GA2806@ranger.igk.intel.com>
References: <20200724173557.5764-1-maciej.fijalkowski@intel.com>
 <20200724173557.5764-5-maciej.fijalkowski@intel.com>
 <20200729161044.GA2961@ranger.igk.intel.com>
In-Reply-To: <20200729161044.GA2961@ranger.igk.intel.com>
List-ID: bpf@vger.kernel.org

On Wed, Jul 29, 2020 at 06:10:44PM +0200, Maciej Fijalkowski wrote:
> On Wed, Jul 29, 2020 at 12:07:52AM +0200, Daniel Borkmann wrote:
> > On 7/24/20 7:35 PM, Maciej Fijalkowski wrote:
> > > This commit serves two things:
> > > 1) it optimizes BPF prologue/epilogue generation
> > > 2) it makes it possible to have tailcalls within BPF subprograms
> > >
> > > Both points are related to each other since without 1), 2) could
> > > not be achieved.
> > >
> > [...]
> > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > > index 6fe6491fa17a..e9d62a60134b 100644
> > > --- a/kernel/bpf/arraymap.c
> > > +++ b/kernel/bpf/arraymap.c
> > > @@ -750,6 +750,7 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
> > >  				    struct bpf_prog *old,
> > >  				    struct bpf_prog *new)
> > >  {
> > > +	u8 *old_addr, *new_addr, *old_bypass_addr;
> > >  	struct prog_poke_elem *elem;
> > >  	struct bpf_array_aux *aux;
> > > @@ -800,13 +801,47 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
> > >  			if (poke->tail_call.map != map ||
> > >  			    poke->tail_call.key != key)
> > >  				continue;
> > > +			/* protect against un-updated poke descriptors since
> > > +			 * we could fill them from subprog and the same desc
> > > +			 * is present on main's program poke tab
> > > +			 */
> > > +			if (!poke->tailcall_bypass || !poke->tailcall_target ||
> > > +			    !poke->bypass_addr)
> > > +				continue;
> >
> > Thinking more about this, this check here is not sufficient. You
> > basically need it given that you copy all poke descs over to each of
> > the subprogs in jit_subprogs(). So those that weren't handled by the
> > subprog have the above addresses as NULL. But in jit_subprogs(), once
> > we have filled out the target addresses for the bpf-in-bpf calls, we
> > loop over each subprog and do the extra/final pass in the JIT to
> > complete the images. However, nothing protects
> > bpf_tail_call_direct_fixup() as far as I can see from patching at the
> > NULL addr if there is a target program loaded in the map at the given
> > key. That will most likely blow up and hit the BUG_ON().
>
> Okay, I agree with this reasoning, but I must admit that I don't
> understand when exactly during the fixup the target prog for a given
> key might already be present? Could you shed some light on it? I
> recall that I was hitting this case in the test_verifier kselftest, so
> maybe I'll dig into that, but otherwise I didn't stumble upon this.
> > Instead of these above workarounds, did you try to go the path of
> > only copying over the poke descs that are relevant for the
> > individual subprog (but not all the others)?
>
> I was able to come up with something today, but I'd like to share it
> here and discuss whether you think it's the correct approach before
> rushing out another revision.
>
> Generally, in fixup_bpf_calls() I store the index of the tail call
> insn onto the generated poke descriptor; then in jit_subprogs() I
> check whether a given poke descriptor belongs to the current subprog
> by checking if the previously stored absolute index of the tail call
> insn is within the range of the insns of the given subprog. The
> insn->imm then needs to be updated with the new poke descriptor slot
> so that while JITing we will be able to grab the proper poke desc -
> previously it worked because we emulated the main prog's poke tab
> state onto each subprog.
>
> This way the subprogs actually get only the relevant poke descs, but
> I have a concern about the main prog's poke tab. Shouldn't we pull
> the descs that have been copied to a subprog out of the main poke
> tab?
>
> If yes, then shouldn't the poke tab be converted to a linked list?

Thinking a bit more about this, I think we can just untrack the main
prog's aux struct from the prog array map. If there are subprograms,
then the main prog is treated as subprog 0 and with the logic below
every poke desc will be propagated properly.

I checked that doing:

	for (i = 0; i < prog->aux->size_poke_tab; i++) {
		map_ptr = prog->aux->poke_tab[i].tail_call.map;
		map_ptr->ops->map_poke_untrack(map_ptr, prog->aux);
	}

after the initial JIT subprogs loop works just fine, and we can drop
the cumbersome check from map_poke_run(). Wdyt?
> The patch that I will merge onto 2/6, if you say that we can live
> with this approach, is below; it's on top of this series:
>
> From 57baac74647a4627fe85bb3393365de906070eb1 Mon Sep 17 00:00:00 2001
> From: Maciej Fijalkowski
> Date: Wed, 29 Jul 2020 17:51:59 +0200
> Subject: [PATCH] bpf: propagate only those poke descs that are used in
>  subprog
>
> Signed-off-by: Maciej Fijalkowski
> ---
>  include/linux/bpf.h   |  1 +
>  kernel/bpf/verifier.c | 11 ++++++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 14b796bf35de..74ab8ec2f2d3 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -664,6 +664,7 @@ struct bpf_jit_poke_descriptor {
>  	bool tailcall_target_stable;
>  	u8 adj_off;
>  	u16 reason;
> +	u32 abs_insn_idx;
>  };
>
>  /* reg_type info for ctx arguments */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3ea769555246..d6402dc05087 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -9971,15 +9971,23 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  		func[i]->aux->func_info = prog->aux->func_info;
>
>  		for (j = 0; j < prog->aux->size_poke_tab; j++) {
> +			u32 abs_insn_idx = prog->aux->poke_tab[j].abs_insn_idx;
>  			int ret;
>
> +			if (!(abs_insn_idx >= subprog_start &&
> +			      abs_insn_idx <= subprog_end))
> +				continue;
> +
>  			ret = bpf_jit_add_poke_descriptor(func[i],
>  							  &prog->aux->poke_tab[j]);
>  			if (ret < 0) {
>  				verbose(env, "adding tail call poke descriptor failed\n");
>  				goto out_free;
>  			}
> -			map_ptr = func[i]->aux->poke_tab[j].tail_call.map;
> +
> +			func[i]->insnsi[abs_insn_idx - subprog_start].imm = ret + 1;
> +
> +			map_ptr = func[i]->aux->poke_tab[ret].tail_call.map;
>  			ret = map_ptr->ops->map_poke_track(map_ptr, func[i]->aux);
>  			if (ret < 0) {
>  				verbose(env, "tracking tail call prog failed\n");
> @@ -10309,6 +10317,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
>  				.reason = BPF_POKE_REASON_TAIL_CALL,
>  				.tail_call.map =
>  					     BPF_MAP_PTR(aux->map_ptr_state),
>  				.tail_call.key = bpf_map_key_immediate(aux),
> +				.abs_insn_idx = i,
>  			};
>
>  			ret = bpf_jit_add_poke_descriptor(prog, &desc);
> --
> 2.20.1

> > Thanks,
> > Daniel