From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexei Starovoitov Subject: Re: [PATCH v2 bpf-next 0/3] bpf/verifier: subprog/func_call simplifications Date: Mon, 2 Apr 2018 18:08:04 -0700 Message-ID: <20180403010802.jkqffxw4m75oioj7@ast-mbp> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Daniel Borkmann , netdev@vger.kernel.org To: Edward Cree Return-path: Received: from mail-pl0-f49.google.com ([209.85.160.49]:37628 "EHLO mail-pl0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754604AbeDCBIH (ORCPT ); Mon, 2 Apr 2018 21:08:07 -0400 Received: by mail-pl0-f49.google.com with SMTP id v5-v6so4567833plo.4 for ; Mon, 02 Apr 2018 18:08:07 -0700 (PDT) Content-Disposition: inline In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Mar 29, 2018 at 11:44:17PM +0100, Edward Cree wrote: > By storing subprog boundaries as a subprogno mark on each insn, rather than > a start (and implicit end) for each subprog, we collect a number of gains: > * More efficient determination of which subprog contains a given insn, and > thus of find_subprog (which subprog begins at a given insn). > * Number of verifier "full recursive walk" passes is reduced, since most of > the work is done in the main insn walk (do_check()). Leftover work in > other passes is mostly linear scans (O(insn_cnt)) or, in the case of > check_max_stack_depth(), a topological sort (O(subprog_cnt)). > > Some other changes were also included to support this: > * Per-subprog info is stored in env->subprog_info, an array of structs, > rather than several arrays with a common index. > * Call graph is now stored in the new bpf_subprog_info struct; used here > for check_max_stack_depth() but may have other uses too. > > Along with this, patch #3 puts parent pointers (used by liveness analysis) > in the registers instead of the func_state or verifier_state, so that we > don't need skip_callee() machinery. This also does the right thing for > stack slots, so they don't need their own special handling for liveness > marking either. I like patch 3 and going to play with it. How did you test it ? Do you have processed_insn numbers for cilium or selftests programs before and after? There should be no difference, right? Essentially it's trading extra run-time cost of skip_callee logic for higher memory overhead due to parent pointers in every register state. I guess that's ok, since it does cleanup things nicely. As far as patch 1 it was very difficult to review, since several logically different things clamped together. So breaking it apart: - converting two arrays of subprog_starts and subprog_stack_depth into single array of struct bpf_subprog_info is a good thing - tsort is interesting, but not sure it's correct. more below - but main change of combining subprog logic with do_check is no good. The verifier have to move towards compiler style code analysis instead of the other way around. It has to analyze the program in simple and hopefully easy to understand passes instead of combinging everything into one loop. We _have_ to get rid of do_check() approach and corresponding insn_processed counter. That was and still is the biggest and most pressing issue we need to solve in bpf verification. The verifier has to switch to compiler style code analysis to scale. The algorithms should be such that scale with thousands of instructions and thousands of functions. The only way I see reaching that goal is to do hierarchical program analysis in passes: - identify subrpograms - identify basic blocks, build control flow graph - build dom graph, ssa and so on - and do per-basic block liveness and data flow analysis that propagates the combined result further into other BBs along cfg edges. There will be no 'do_check() across whole program' walk. Combining subprog pass with do_check is going into opposite direction of this long term work. Divide and conquer. Combining more things into do_check is the opposite of this programming principle. My short term plan is to split basic instruction correctness checks out of do_check() loop into separate pass and run it early on. It's necessary for bpf libraries to work, since the verifier cannot do full data flow analysis at this point, but should build cfg and move as much as possible out of instruction by instruction walk to scale with number of bpf programs/libraries and sizes of them. As far as tsort approach for determining max stack depth... It's an interesting idea, but implementation is suffering from the same 'combine everything into one loop' coding issue. I think updating total_stack_depth math should be separate from sorting, since the same function can be part of different stacks with different max. I don't see how updating global subprog_info[i].total_stack_depth can be correct. It has to be different for every stack and the longest stack is not necessarily the deepest. May be I'm missing something, but combined sort and stack_depth math didn't make it easy to review. I find existing check_max_stack_depth() algo much easier to understand.