From: Jiong Wang 
Subject: [RFC bpf-next 00/10] initial control flow support for eBPF verifier
Date: Mon, 7 May 2018 06:22:36 -0400
Message-ID: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>
Cc: john.fastabend@gmail.com, netdev@vger.kernel.org, oss-drivers@netronome.com, Jiong Wang
To: alexei.starovoitov@gmail.com, daniel@iogearbox.net

This patch set introduces initial control flow infrastructure to the eBPF
verifier. The various pieces of control flow information are built
incrementally, so they can easily be maintained at various levels.

After this patch set, information collection on eBPF insns is centred at
check_subprogs; check_cfg and related code are replaced by the new
infrastructure. The checks and analyses offered by check_cfg are still
offered by the new infrastructure, for example loop detection, recursive
function call detection, unreachable insn detection and prune heuristic
marking.

This infrastructure comes with some memory usage and execution time cost.
Memory-pool-based allocation is used to avoid frequent kmalloc/kfree calls.
Singly linked lists and compact pointers are used to reduce the size of the
various cfg nodes (a rough sketch of this idea follows the numbers below).

data structure size
===
  basic block      edge          call-graph edge
  16 bytes         24 bytes      6 bytes

memory usage (test_l4lb_noinline, test_xdp_noinline)
====
(counted over all subprogs)

test_l4lb_noinline:
  cfg info:  597 insns, 12 subprogs,
             107 basic blocks, 278 edges, 12 call edges.
  mem usage: 9216 (9K) bytes for the memory pool
             (basic blocks/edges/call edges),
             644 bytes for dominator trees.

test_xdp_noinline:
  cfg info:  1029 insns, 21 subprogs,
             191 basic blocks, 502 edges, 23 call edges.
  mem usage: 15360 (15K) bytes for the memory pool,
             1192 bytes for dominator trees.

The existing check_cfg uses two auxiliary arrays (insn_state and
insn_stack) of insn_cnt * sizeof(int) bytes each, so it would use ~5K and
~8K for these two tests respectively. (It looks to me like insn_state
could use 1-byte elements and insn_stack 2-byte elements, so this could
probably be reduced to ~2K and ~3K.) In other words, the existing
check_cfg uses 8 bytes per insn during the cfg check, while the new
infrastructure uses roughly 2x that, 16 bytes per insn.

execution time
===
test_l4lb_noinline:
  existing check_subprog/check_cfg: ~55000 ns
  new infrastructure:               ~135000 ns

test_xdp_noinline:
  existing check_subprog/check_cfg: ~52000 ns
  new infrastructure:               ~120000 ns

The control flow infrastructure could lay the ground for further static
analysis inside the eBPF verifier, for example bounded loop detection,
path-sensitive data-flow analysis, etc.
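To make the "singly linked list + compact pointer" idea concrete, here is a
minimal user-space sketch. The struct names, field layout and pool size are
illustrative assumptions of mine, not the definitions this series adds to
kernel/bpf/cfg.c: nodes live in one pre-allocated pool and refer to each
other through 16-bit pool indices instead of full 8-byte pointers, which is
what keeps the per-node sizes small.

/*
 * Minimal user-space sketch (not the actual kernel structures from this
 * series) of singly linked lists plus 16-bit "compact pointer" indices
 * into a per-subprog memory pool. Names and layout are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

/* Nodes reference each other by pool index instead of full pointers. */
typedef uint16_t cfg_ref_t;            /* 2-byte compact "pointer"       */
#define CFG_REF_NONE ((cfg_ref_t)0xffff)

struct bb_node {                       /* basic block                    */
	cfg_ref_t next;                /* next block in subprog list     */
	cfg_ref_t e_prevs;             /* head of incoming edge list     */
	cfg_ref_t e_succs;             /* head of outgoing edge list     */
	uint16_t  head;                /* index of first insn in block   */
	uint16_t  idom;                /* immediate dominator (bb index) */
	uint16_t  flags;
};                                     /* 12 bytes here                  */

struct edge_node {                     /* CFG edge                       */
	cfg_ref_t next;                /* next edge in the same list     */
	cfg_ref_t src;                 /* source basic block             */
	cfg_ref_t dst;                 /* destination basic block        */
	uint16_t  flags;
};                                     /* 8 bytes here                   */

/* Fixed-size pool: one allocation, nodes handed out by bumping a cursor.
 * All node sizes are even, so 2-byte alignment is preserved. */
struct cfg_node_pool {
	uint8_t  mem[4096];
	uint32_t used;
};

static void *pool_alloc(struct cfg_node_pool *pool, uint32_t size)
{
	void *p;

	if (pool->used + size > sizeof(pool->mem))
		return NULL;           /* caller fails the graph build   */
	p = pool->mem + pool->used;
	pool->used += size;
	return p;
}

int main(void)
{
	struct cfg_node_pool pool = { .used = 0 };
	struct bb_node *bb = pool_alloc(&pool, sizeof(*bb));
	struct edge_node *e = pool_alloc(&pool, sizeof(*e));

	if (!bb || !e)
		return 1;
	bb->e_succs = CFG_REF_NONE;
	e->next = CFG_REF_NONE;
	printf("bb_node: %zu bytes, edge_node: %zu bytes\n",
	       sizeof(struct bb_node), sizeof(struct edge_node));
	return 0;
}

Running this prints the sketch's own node sizes; they differ from the
16/24/6 bytes quoted above since the real nodes track different state, but
the size saving comes from the same two tricks.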
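The series also uses dominator information for loop detection: an edge
whose destination dominates its source is a back edge, and a back edge
means the program loops, which the verifier rejects. Below is a small
self-contained sketch of that check given an immediate-dominator array;
the function names and the idom[] encoding are my own illustrative
choices, not the interfaces added in kernel/bpf/cfg.c.

/*
 * Illustrative sketch of dominator-based back-edge (loop) detection.
 * idom[] holds each basic block's immediate dominator index; the entry
 * block is its own idom. None of these names come from the series.
 */
#include <stdbool.h>
#include <stdio.h>

/* Does block 'a' dominate block 'b'? Walk b's idom chain up to the entry. */
static bool dominates(const int *idom, int a, int b)
{
	while (b != a && idom[b] != b)
		b = idom[b];
	return b == a;
}

/* An edge src -> dst is a back edge iff dst dominates src. */
static bool is_back_edge(const int *idom, int src, int dst)
{
	return dominates(idom, dst, src);
}

int main(void)
{
	/*
	 * Tiny CFG: 0 -> 1 -> 2 -> 1 (block 2 jumps back to block 1).
	 * Immediate dominators: idom[0] = 0, idom[1] = 0, idom[2] = 1.
	 */
	const int idom[] = { 0, 0, 1 };

	printf("edge 2->1 is back edge: %d\n", is_back_edge(idom, 2, 1)); /* 1 */
	printf("edge 1->2 is back edge: %d\n", is_back_edge(idom, 1, 2)); /* 0 */
	return 0;
}

With the dominator trees built by this series, a check along these lines
is also a natural starting point for the bounded loop detection mentioned
above.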
Jiong Wang (10):
  bpf: cfg: partition basic blocks for each subprog
  bpf: cfg: add edges between basic blocks to form CFG
  bpf: cfg: build domination tree using Tarjan algorithm
  bpf: cfg: detect loop use domination information
  bpf: cfg: detect unreachable basic blocks
  bpf: cfg: move find_subprog/add_subprog to cfg.c
  bpf: cfg: build call graph and detect unreachable/recursive call
  bpf: cfg: remove push_insn and check_cfg
  bpf: cfg: reduce k*alloc/free call by using memory pool for allocating nodes
  bpf: cfg: reduce memory usage by using singly list + compat pointer

 include/linux/bpf_verifier.h |   8 +
 kernel/bpf/Makefile          |   2 +-
 kernel/bpf/cfg.c             | 956 +++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/cfg.h             |  42 ++
 kernel/bpf/verifier.c        | 386 ++++++-----------
 5 files changed, 1131 insertions(+), 263 deletions(-)
 create mode 100644 kernel/bpf/cfg.c
 create mode 100644 kernel/bpf/cfg.h

--
2.7.4