bpf.vger.kernel.org archive mirror
* [RFC bpf-next 0/8] bpf: accelerate insn patching speed
@ 2019-07-04 21:26 Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer Jiong Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This is an RFC based on the latest bpf-next about accelerating insn patching
speed. It is now near the shape of the final PATCH set, and it shows the
changes that migrating to list patching would bring, so I am sending it out
for comments. Most of the info is in the cover letter. I split the code in a
way that shows the API migration more easily.

Test Results
===
  - Full pass on test_verifier/test_prog/test_prog_32 under all three
    modes (interpreter, JIT, JIT with blinding).

  - Benchmarking shows patching becomes 10 ~ 15x faster on medium sized progs,
    and the patching time for 1M insns drops from 5100s (nearly one and a half
    hours) to less than 0.5s.

Known Issues
===
  - The following warning is triggered when running the scale test, which
    contains 1M insns and patches:
      warning at mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330

    This is caused by existing code. It can be reproduced on bpf-next
    master with jit blinding enabled by running the scale unit test; it
    shows up after half an hour. With this set, patching is very fast, so
    it shows up quickly.

  - No line info adjustment support when doing insn deletion, and subprog
    adjustment is buggy when doing insn deletion as well. Generally, removal
    of insns could remove an entire line or subprog, in which case entries of
    prog->aux->linfo or env->subprog_info need to be deleted. I don't have a
    good idea and clean code for integrating this into the linearization code
    at the moment, will do more experimenting; ideas and suggestions on this
    are appreciated.
     
    Insn deletion doesn't happen on normal programs, for example the Cilium
    benchmarks, and happens rarely in test_progs, so the test coverage is
    not good. That's also why this RFC has a full pass on selftests despite
    this known issue.

  - A mem pool could be used to further accelerate the speed; the changes are
    trivial on top of this RFC and could give an extra 2x speedup. This is not
    included in this RFC, as reducing the algo complexity from quadratic to
    linear in insn number is the first step.

Background
===
This RFC aims to accelerate BPF insn patching speed. Patching means expanding
one bpf insn at any offset inside a bpf prog into a set of new insns, or
removing insns.

At the moment, insn patching is quadratic in insn number. This is because
branch targets of jump insns need to be adjusted, and the algo used is:

  for insn inside prog
    patch insn + regenerate bpf prog
    for insn inside new prog
      adjust jump target

This causes significant time to be spent when a bpf prog requires a large
amount of patching on different insns. Benchmarking shows it could take more
than half a minute to finish patching when the patching count is above 50K,
and more than one hour when the patching count is around 1M:

  15000   :    3s
  45000   :   29s
  95000   :  125s
  195000  :  712s
  1000000 : 5100s

This RFC introduces a new patching infrastructure. Before doing insn
patching, the insns in a bpf prog are turned into a singly linked list;
inserting insns just inserts new list nodes, and deleting insns just sets a
delete flag. Finally, the list is linearized back into an array, and branch
target adjustment is done for all jump insns during linearization. This
brings the time complexity down from quadratic to linear in insn number.
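
As an illustration of the branch target adjustment, here is a minimal sketch
of the relocation done during linearization, using the old-to-new index map
that linearization builds while copying insns out of the list (the concrete
indexes below are made up for illustration only):

   /* Two insns were inserted after old index 5, so old indexes 6 and 7
    * now live at new indexes 8 and 9.
    */
   u32 idx_map[8] = { 0, 1, 2, 3, 4, 5, 8, 9 };
   int old_pc = 3, new_pc = idx_map[old_pc];  /* jump insn sits at old index 3 */
   s16 old_off = 3;                           /* old target: 3 + 3 + 1 = 7 */
   int old_dst = old_pc + old_off + 1;        /* = 7 */
   int new_dst = idx_map[old_dst];            /* = 9 */
   s16 new_off = new_dst - new_pc - 1;        /* = 9 - 3 - 1 = 5 */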

Benchmarking shows the new patching infrastructure is 10 ~ 15x faster on
medium sized progs, and for 1M patches it reduces the time from 5100s to
less than 0.5s.

Patching API
===
Insn patching could happen at two layers inside BPF. One is the "core layer"
where only BPF insns are patched. The other is the "verification layer" where
insns have corresponding aux info as well as high level subprog info, so insn
patching means the aux info needs to be patched as well and the subprog info
needs to be adjusted. A BPF prog also has debug info associated, so line info
should always be updated after insn patching.

So, list creation, destruction, insertion and deletion are the same for both
layers, but linearization is different: "verification layer" patching requires
extra work. Therefore the patching APIs are:

   list creation:                   bpf_create_list_insn
   list patch:                      bpf_patch_list_insn
   list pre-patch:                  bpf_prepatch_list_insn
   list linearization (core layer): prog = bpf_linearize_list_insn(prog, list)
   list linearization (veri layer): env = verifier_linearize_list_insn(env, list)
   list destroy:                    bpf_destroy_list_insn

List patch may change the insn at the patch point, so it invalidates the aux
info at the patch point. List pre-patch inserts new insns before the patch
point without touching the insn and its associated aux info; it is used, for
example, in convert_ctx_accesses when generating the prologue.

Typical API sequence for one patching pass:

   struct bpf_list_insn *list = bpf_create_list_insn(prog);
   for (elem = list; elem; elem = elem->next) {
      patch_buf = gen_patch_buf_logic;
      elem = bpf_patch_list_insn(elem, patch_buf, cnt);
   }
   prog = bpf_linearize_list_insn(prog, list);
   bpf_destroy_list_insn(list);
  
Several patching passes could also share the same list:

   struct bpf_list_insn *list = bpf_create_list_insn(prog);
   for (elem = list; elem; elem = elem->next) {
      patch_buf = gen_patch_buf_logic1;
      elem = bpf_patch_list_insn(elem, patch_buf, cnt);
   }
   for (elem = list; elem; elem = elem->next) {
      patch_buf = gen_patch_buf_logic2;
      elem = bpf_patch_list_insn(elem, patch_buf, cnt);
   }
   prog = bpf_linearize_list_insn(prog, list);
   bpf_destroy_list_insn(list);

Note, however, that insns newly inserted in earlier passes won't have aux info
except zext info. So, if one patching pass requires aux info to be updated and
recalculated for all insns, including the patched ones, it should first
linearize the old list, then re-create the list. This RFC always creates and
linearizes the list for each migrated patching pass separately.
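
Insn deletion works on the same list: nodes are only marked, and linearization
skips them. A minimal sketch, using the LIST_INSN_FLAG_REMOVED flag introduced
in patch 1 (insn_is_dead() is a made-up placeholder for whatever the pass
decides):

   struct bpf_list_insn *list = bpf_create_list_insn(env->prog);
   for (elem = list; elem; elem = elem->next) {
      if (insn_is_dead(env, &elem->insn))   /* hypothetical predicate */
         elem->flag |= LIST_INSN_FLAG_REMOVED;
   }
   env = verifier_linearize_list_insn(env, list);
   bpf_destroy_list_insn(list);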

Compared with the old patching code, this new infrastructure needs much less
core code. The final code has a couple of extra lines, but that is mostly
because the list based infrastructure requires more error checking, so that
the list and the associated aux data structures can be freed when errors
happen.

Patching Restrictions
===
  - For the core layer, the linearization assumes no new jumps inside the
    patch buf. Currently, the only user of this layer is jit blinding.
  - For the verifier layer, there could be new jumps inside the patch buf, but
    they should have their branch targets resolved already, meaning new jumps
    don't jump to insns outside of the patch buf (see the sketch after this
    list). This is the case for all existing verifier layer users.
  - bpf_insn_aux_data for all patched insns, including the one at the patch
    point, is invalidated; only the 32-bit zext info will be recalculated.
    If the aux data of the insn at the patch point needs to be retained, the
    change is purely an insn insertion, so the pre-patch API should be used.
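
To illustrate the verifier layer restriction, here is a hedged sketch of a
patch buf containing an already resolved jump; the concrete insns are made up
and insn is assumed to point at the insn being patched:

   struct bpf_insn patch[] = {
      /* If R2 is non-zero, skip the next insn. The jump target stays
       * inside the patch buf, so this offset needs no relocation.
       */
      BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, 1),
      BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 0),
      /* The original insn at the patch point goes last. */
      *insn,
   };

   elem = bpf_patch_list_insn(elem, patch, ARRAY_SIZE(patch));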

I plan to send out a PATCH set once I have finished the line info adjustment
support for insn deletion. Please have a look at this RFC; feedback is
appreciated.

Jiong Wang (8):
  bpf: introducing list based insn patching infra to core layer
  bpf: extend list based insn patching infra to verification layer
  bpf: migrate jit blinding to list patching infra
  bpf: migrate convert_ctx_accesses to list patching infra
  bpf: migrate fixup_bpf_calls to list patching infra
  bpf: migrate zero extension opt to list patching infra
  bpf: migrate insn remove to list patching infra
  bpf: delete all those code around old insn patching infrastructure

 include/linux/bpf_verifier.h |   1 -
 include/linux/filter.h       |  27 +-
 kernel/bpf/core.c            | 431 +++++++++++++++++-----------
 kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
 4 files changed, 580 insertions(+), 528 deletions(-)

-- 
2.7.4



* [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-10 17:49   ` Andrii Nakryiko
  2019-07-04 21:26 ` [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer Jiong Wang
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This patch introduces the list based bpf insn patching infra to the bpf core
layer, which is lower than the verification layer.

This layer has the bpf insn sequence as its sole input, therefore the tasks
to be finished during list linearization are:
  - copy insns
  - relocate jumps
  - relocate line info.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/filter.h |  25 +++++
 kernel/bpf/core.c      | 268 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 293 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1fe53e7..1fea68c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -842,6 +842,31 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
 int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
 
+int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
+			int idx_map[]);
+
+#define LIST_INSN_FLAG_PATCHED	0x1
+#define LIST_INSN_FLAG_REMOVED	0x2
+struct bpf_list_insn {
+	struct bpf_insn insn;
+	struct bpf_list_insn *next;
+	s32 orig_idx;
+	u32 flag;
+};
+
+struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog);
+void bpf_destroy_list_insn(struct bpf_list_insn *list);
+/* Replace LIST_INSN with new list insns generated from PATCH. */
+struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
+					  const struct bpf_insn *patch,
+					  u32 len);
+/* Pre-patch list_insn with insns inside PATCH, meaning LIST_INSN is not
+ * touched. New list insns are inserted before it.
+ */
+struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
+					     const struct bpf_insn *patch,
+					     u32 len);
+
 void bpf_clear_redirect_map(struct bpf_map *map);
 
 static inline bool xdp_return_frame_no_direct(void)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e2c1b43..e60703e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -502,6 +502,274 @@ int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
 	return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
 }
 
+int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
+			s32 idx_map[])
+{
+	u8 code = insn->code;
+	s64 imm;
+	s32 off;
+
+	if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
+		return 0;
+
+	if (BPF_CLASS(code) == BPF_JMP &&
+	    (BPF_OP(code) == BPF_EXIT ||
+	     (BPF_OP(code) == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL)))
+		return 0;
+
+	/* BPF to BPF call. */
+	if (BPF_OP(code) == BPF_CALL) {
+		imm = idx_map[old_idx + insn->imm + 1] - new_idx - 1;
+		if (imm < S32_MIN || imm > S32_MAX)
+			return -ERANGE;
+		insn->imm = imm;
+		return 1;
+	}
+
+	/* Jump. */
+	off = idx_map[old_idx + insn->off + 1] - new_idx - 1;
+	if (off < S16_MIN || off > S16_MAX)
+		return -ERANGE;
+	insn->off = off;
+	return 0;
+}
+
+void bpf_destroy_list_insn(struct bpf_list_insn *list)
+{
+	struct bpf_list_insn *elem, *next;
+
+	for (elem = list; elem; elem = next) {
+		next = elem->next;
+		kvfree(elem);
+	}
+}
+
+struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog)
+{
+	unsigned int idx, len = prog->len;
+	struct bpf_list_insn *hdr, *prev;
+	struct bpf_insn *insns;
+
+	hdr = kvzalloc(sizeof(*hdr), GFP_KERNEL);
+	if (!hdr)
+		return ERR_PTR(-ENOMEM);
+
+	insns = prog->insnsi;
+	hdr->insn = insns[0];
+	hdr->orig_idx = 1;
+	prev = hdr;
+
+	for (idx = 1; idx < len; idx++) {
+		struct bpf_list_insn *node = kvzalloc(sizeof(*node),
+						      GFP_KERNEL);
+
+		if (!node) {
+			/* Destroy what has been allocated. */
+			bpf_destroy_list_insn(hdr);
+			return ERR_PTR(-ENOMEM);
+		}
+		node->insn = insns[idx];
+		node->orig_idx = idx + 1;
+		prev->next = node;
+		prev = node;
+	}
+
+	return hdr;
+}
+
+/* Linearize bpf list insn to array. */
+static struct bpf_prog *bpf_linearize_list_insn(struct bpf_prog *prog,
+						struct bpf_list_insn *list)
+{
+	u32 *idx_map, idx, prev_idx, fini_cnt = 0, orig_cnt = prog->len;
+	struct bpf_insn *insns, *insn;
+	struct bpf_list_insn *elem;
+
+	/* Calculate final size. */
+	for (elem = list; elem; elem = elem->next)
+		if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
+			fini_cnt++;
+
+	insns = prog->insnsi;
+	/* If prog length remains same, nothing else to do. */
+	if (fini_cnt == orig_cnt) {
+		for (insn = insns, elem = list; elem; elem = elem->next, insn++)
+			*insn = elem->insn;
+		return prog;
+	}
+	/* Realloc insn buffer when necessary. */
+	if (fini_cnt > orig_cnt)
+		prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
+					GFP_USER);
+	if (!prog)
+		return ERR_PTR(-ENOMEM);
+	insns = prog->insnsi;
+	prog->len = fini_cnt;
+
+	/* idx_map[OLD_IDX] = NEW_IDX */
+	idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
+	if (!idx_map)
+		return ERR_PTR(-ENOMEM);
+	memset(idx_map, 0xff, orig_cnt * sizeof(u32));
+
+	/* Copy over insn + calculate idx_map. */
+	for (idx = 0, elem = list; elem; elem = elem->next) {
+		int orig_idx = elem->orig_idx - 1;
+
+		if (orig_idx >= 0) {
+			idx_map[orig_idx] = idx;
+
+			if (elem->flag & LIST_INSN_FLAG_REMOVED)
+				continue;
+		}
+		insns[idx++] = elem->insn;
+	}
+
+	/* Relocate jumps using idx_map.
+	 *   old_dst = jmp_insn.old_target + old_pc + 1;
+	 *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
+	 *   jmp_insn.new_target = new_dst - new_pc - 1;
+	 */
+	for (idx = 0, prev_idx = 0, elem = list; elem; elem = elem->next) {
+		int ret, orig_idx;
+
+		/* A removed insn doesn't increase new_pc */
+		if (elem->flag & LIST_INSN_FLAG_REMOVED)
+			continue;
+
+		orig_idx = elem->orig_idx - 1;
+		ret = bpf_jit_adj_imm_off(&insns[idx],
+					  orig_idx >= 0 ? orig_idx : prev_idx,
+					  idx, idx_map);
+		idx++;
+		if (ret < 0) {
+			kvfree(idx_map);
+			return ERR_PTR(ret);
+		}
+		if (orig_idx >= 0)
+			/* Record prev_idx. it is used for relocating jump insn
+			 * inside patch buffer. For example, when doing jit
+			 * blinding, a jump could be moved to some other
+			 * positions inside the patch buffer, and its old_dst
+			 * could be calculated using prev_idx.
+			 */
+			prev_idx = orig_idx;
+	}
+
+	/* Adjust linfo.
+	 *
+	 * NOTE: the prog reached core layer has been adjusted to contain insns
+	 *       for single function, however linfo contains information for
+	 *       whole program, so we need to make sure linfo beyond current
+	 *       function is handled properly.
+	 */
+	if (prog->aux->nr_linfo) {
+		u32 linfo_idx, insn_start, insn_end, nr_linfo, idx, delta;
+		struct bpf_line_info *linfo;
+
+		linfo_idx = prog->aux->linfo_idx;
+		linfo = &prog->aux->linfo[linfo_idx];
+		insn_start = linfo[0].insn_off;
+		insn_end = insn_start + orig_cnt;
+		nr_linfo = prog->aux->nr_linfo - linfo_idx;
+		delta = fini_cnt - orig_cnt;
+		for (idx = 0; idx < nr_linfo; idx++) {
+			int adj_off;
+
+			if (linfo[idx].insn_off >= insn_end) {
+				linfo[idx].insn_off += delta;
+				continue;
+			}
+
+			adj_off = linfo[idx].insn_off - insn_start;
+			linfo[idx].insn_off = idx_map[adj_off] + insn_start;
+		}
+	}
+	kvfree(idx_map);
+
+	return prog;
+}
+
+struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
+					  const struct bpf_insn *patch,
+					  u32 len)
+{
+	struct bpf_list_insn *prev, *next;
+	u32 insn_delta = len - 1;
+	u32 idx;
+
+	list_insn->insn = *patch;
+	list_insn->flag |= LIST_INSN_FLAG_PATCHED;
+
+	/* Since our patchlet doesn't expand the image, we're done. */
+	if (insn_delta == 0)
+		return list_insn;
+
+	len--;
+	patch++;
+
+	prev = list_insn;
+	next = list_insn->next;
+	for (idx = 0; idx < len; idx++) {
+		struct bpf_list_insn *node = kvzalloc(sizeof(*node),
+						      GFP_KERNEL);
+
+		if (!node) {
+			/* Link what's allocated, so list destroyer could
+			 * free them.
+			 */
+			prev->next = next;
+			return ERR_PTR(-ENOMEM);
+		}
+
+		node->insn = patch[idx];
+		prev->next = node;
+		prev = node;
+	}
+
+	prev->next = next;
+	return prev;
+}
+
+struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
+					     const struct bpf_insn *patch,
+					     u32 len)
+{
+	struct bpf_list_insn *prev, *node, *begin_node;
+	u32 idx;
+
+	if (!len)
+		return list_insn;
+
+	node = kvzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+	node->insn = patch[0];
+	begin_node = node;
+	prev = node;
+
+	for (idx = 1; idx < len; idx++) {
+		node = kvzalloc(sizeof(*node), GFP_KERNEL);
+		if (!node) {
+			node = begin_node;
+			/* Release what has been allocated. */
+			while (node) {
+				struct bpf_list_insn *next = node->next;
+
+				kvfree(node);
+				node = next;
+			}
+			return ERR_PTR(-ENOMEM);
+		}
+		node->insn = patch[idx];
+		prev->next = node;
+		prev = node;
+	}
+
+	prev->next = list_insn;
+	return begin_node;
+}
+
 void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
 {
 	int i;
-- 
2.7.4



* [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-10 17:50   ` Andrii Nakryiko
  2019-07-04 21:26 ` [RFC bpf-next 3/8] bpf: migrate jit blinding to list patching infra Jiong Wang
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

The verification layer also needs to handle auxiliary info as well as
adjusting subprog starts.

At this layer, insns inside the patch buffer could be jumps, but they should
have been resolved already, meaning they shouldn't jump to insns outside of
the patch buffer. The linearization function for this layer won't touch insns
inside the patch buffer.

Subprog adjustment is done along with jump target adjustment: that walk
already covers bpf to bpf call insns, and re-registering subprog starts is
cheap. But the adjustment needed when there is insn deletion is not handled
yet.
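
For reference, a hedged sketch of how a migrated verifier pass is expected to
drive this layer (gen_patch_buf() is a made-up placeholder; the real passes
are migrated in later patches of this series):

   struct bpf_insn insn_buf[16];
   struct bpf_list_insn *list, *elem;
   int ret = 0;

   list = bpf_create_list_insn(env->prog);
   if (IS_ERR(list))
       return PTR_ERR(list);
   for (elem = list; elem; elem = elem->next) {
       /* aux info of an original insn is looked up via orig_idx */
       struct bpf_insn_aux_data *aux =
           &env->insn_aux_data[elem->orig_idx - 1];
       int cnt = gen_patch_buf(env, aux, &elem->insn, insn_buf);

       if (cnt <= 0)
           continue;
       elem = bpf_patch_list_insn(elem, insn_buf, cnt);
       if (IS_ERR(elem)) {
           ret = PTR_ERR(elem);
           goto destroy_list;
       }
   }
   env = verifier_linearize_list_insn(env, list);
   if (IS_ERR(env))
       ret = PTR_ERR(env);
destroy_list:
   bpf_destroy_list_insn(list);
   return ret;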

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a2e7637..2026d64 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
 	}
 }
 
+/* Linearize bpf list insn to array (verifier layer). */
+static struct bpf_verifier_env *
+verifier_linearize_list_insn(struct bpf_verifier_env *env,
+			     struct bpf_list_insn *list)
+{
+	u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
+	struct bpf_subprog_info *new_subinfo;
+	struct bpf_insn_aux_data *new_data;
+	struct bpf_prog *prog = env->prog;
+	struct bpf_verifier_env *ret_env;
+	struct bpf_insn *insns, *insn;
+	struct bpf_list_insn *elem;
+	int ret;
+
+	/* Calculate final size. */
+	for (elem = list; elem; elem = elem->next)
+		if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
+			fini_cnt++;
+
+	orig_cnt = prog->len;
+	insns = prog->insnsi;
+	/* If prog length remains same, nothing else to do. */
+	if (fini_cnt == orig_cnt) {
+		for (insn = insns, elem = list; elem; elem = elem->next, insn++)
+			*insn = elem->insn;
+		return env;
+	}
+	/* Realloc insn buffer when necessary. */
+	if (fini_cnt > orig_cnt)
+		prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
+					GFP_USER);
+	if (!prog)
+		return ERR_PTR(-ENOMEM);
+	insns = prog->insnsi;
+	prog->len = fini_cnt;
+	ret_env = env;
+
+	/* idx_map[OLD_IDX] = NEW_IDX */
+	idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
+	if (!idx_map)
+		return ERR_PTR(-ENOMEM);
+	memset(idx_map, 0xff, orig_cnt * sizeof(u32));
+
+	/* Use the same alloc method used when allocating env->insn_aux_data. */
+	new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
+	if (!new_data) {
+		kvfree(idx_map);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/* Copy over insn + calculate idx_map. */
+	for (idx = 0, elem = list; elem; elem = elem->next) {
+		int orig_idx = elem->orig_idx - 1;
+
+		if (orig_idx >= 0) {
+			idx_map[orig_idx] = idx;
+
+			if (elem->flag & LIST_INSN_FLAG_REMOVED)
+				continue;
+
+			new_data[idx] = env->insn_aux_data[orig_idx];
+
+			if (elem->flag & LIST_INSN_FLAG_PATCHED)
+				new_data[idx].zext_dst =
+					insn_has_def32(env, &elem->insn);
+		} else {
+			new_data[idx].seen = true;
+			new_data[idx].zext_dst = insn_has_def32(env,
+								&elem->insn);
+		}
+		insns[idx++] = elem->insn;
+	}
+
+	new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
+	if (!new_subinfo) {
+		kvfree(idx_map);
+		vfree(new_data);
+		return ERR_PTR(-ENOMEM);
+	}
+	memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
+	memset(env->subprog_info, 0, sizeof(env->subprog_info));
+	env->subprog_cnt = 0;
+	env->prog = prog;
+	ret = add_subprog(env, 0);
+	if (ret < 0) {
+		ret_env = ERR_PTR(ret);
+		goto free_all_ret;
+	}
+	/* Relocate jumps using idx_map.
+	 *   old_dst = jmp_insn.old_target + old_pc + 1;
+	 *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
+	 *   jmp_insn.new_target = new_dst - new_pc - 1;
+	 */
+	for (idx = 0, elem = list; elem; elem = elem->next) {
+		int orig_idx = elem->orig_idx;
+
+		if (elem->flag & LIST_INSN_FLAG_REMOVED)
+			continue;
+		if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
+			idx++;
+			continue;
+		}
+
+		ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
+					  idx_map);
+		if (ret < 0) {
+			ret_env = ERR_PTR(ret);
+			goto free_all_ret;
+		}
+		/* Recalculate subprog start as we are at bpf2bpf call insn. */
+		if (ret > 0) {
+			ret = add_subprog(env, idx + insns[idx].imm + 1);
+			if (ret < 0) {
+				ret_env = ERR_PTR(ret);
+				goto free_all_ret;
+			}
+		}
+		idx++;
+	}
+	if (ret < 0) {
+		ret_env = ERR_PTR(ret);
+		goto free_all_ret;
+	}
+
+	env->subprog_info[env->subprog_cnt].start = fini_cnt;
+	for (idx = 0; idx <= env->subprog_cnt; idx++)
+		new_subinfo[idx].start = env->subprog_info[idx].start;
+	memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
+
+	/* Adjust linfo.
+	 * FIXME: no support for insn removal at the moment.
+	 */
+	if (prog->aux->nr_linfo) {
+		struct bpf_line_info *linfo = prog->aux->linfo;
+		u32 nr_linfo = prog->aux->nr_linfo;
+
+		for (idx = 0; idx < nr_linfo; idx++)
+			linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
+	}
+	vfree(env->insn_aux_data);
+	env->insn_aux_data = new_data;
+	goto free_mem_list_ret;
+free_all_ret:
+	vfree(new_data);
+free_mem_list_ret:
+	kvfree(new_subinfo);
+	kvfree(idx_map);
+	return ret_env;
+}
+
 static int opt_remove_dead_code(struct bpf_verifier_env *env)
 {
 	struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
-- 
2.7.4



* [RFC bpf-next 3/8] bpf: migrate jit blinding to list patching infra
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 4/8] bpf: migrate convert_ctx_accesses " Jiong Wang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

The list linearization function will figure out the new jump destinations of
patched/blinded jumps. There is no need for destination adjustment inside
bpf_jit_blind_insn any more.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/core.c | 76 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 36 insertions(+), 40 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e60703e..c3a5f84 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1162,7 +1162,6 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 {
 	struct bpf_insn *to = to_buff;
 	u32 imm_rnd = get_random_int();
-	s16 off;
 
 	BUILD_BUG_ON(BPF_REG_AX  + 1 != MAX_BPF_JIT_REG);
 	BUILD_BUG_ON(MAX_BPF_REG + 1 != MAX_BPF_JIT_REG);
@@ -1234,13 +1233,10 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 	case BPF_JMP | BPF_JSGE | BPF_K:
 	case BPF_JMP | BPF_JSLE | BPF_K:
 	case BPF_JMP | BPF_JSET | BPF_K:
-		/* Accommodate for extra offset in case of a backjump. */
-		off = from->off;
-		if (off < 0)
-			off -= 2;
 		*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
 		*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
-		*to++ = BPF_JMP_REG(from->code, from->dst_reg, BPF_REG_AX, off);
+		*to++ = BPF_JMP_REG(from->code, from->dst_reg, BPF_REG_AX,
+				    from->off);
 		break;
 
 	case BPF_JMP32 | BPF_JEQ  | BPF_K:
@@ -1254,14 +1250,10 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
 	case BPF_JMP32 | BPF_JSGE | BPF_K:
 	case BPF_JMP32 | BPF_JSLE | BPF_K:
 	case BPF_JMP32 | BPF_JSET | BPF_K:
-		/* Accommodate for extra offset in case of a backjump. */
-		off = from->off;
-		if (off < 0)
-			off -= 2;
 		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
 		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
 		*to++ = BPF_JMP32_REG(from->code, from->dst_reg, BPF_REG_AX,
-				      off);
+				      from->off);
 		break;
 
 	case BPF_LD | BPF_IMM | BPF_DW:
@@ -1332,10 +1324,9 @@ void bpf_jit_prog_release_other(struct bpf_prog *fp, struct bpf_prog *fp_other)
 struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
 {
 	struct bpf_insn insn_buff[16], aux[2];
-	struct bpf_prog *clone, *tmp;
-	int insn_delta, insn_cnt;
-	struct bpf_insn *insn;
-	int i, rewritten;
+	struct bpf_list_insn *list, *elem;
+	struct bpf_prog *clone, *ret_prog;
+	int rewritten;
 
 	if (!bpf_jit_blinding_enabled(prog) || prog->blinded)
 		return prog;
@@ -1344,43 +1335,48 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
 	if (!clone)
 		return ERR_PTR(-ENOMEM);
 
-	insn_cnt = clone->len;
-	insn = clone->insnsi;
+	list = bpf_create_list_insn(clone);
+	if (IS_ERR(list))
+		return (struct bpf_prog *)list;
+
+	/* kill uninitialized warning on some gcc versions. */
+	memset(&aux, 0, sizeof(aux));
+
+	for (elem = list; elem; elem = elem->next) {
+		struct bpf_list_insn *next = elem->next;
+		struct bpf_insn insn = elem->insn;
 
-	for (i = 0; i < insn_cnt; i++, insn++) {
 		/* We temporarily need to hold the original ld64 insn
 		 * so that we can still access the first part in the
 		 * second blinding run.
 		 */
-		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW) &&
-		    insn[1].code == 0)
-			memcpy(aux, insn, sizeof(aux));
+		if (insn.code == (BPF_LD | BPF_IMM | BPF_DW)) {
+			struct bpf_insn next_insn = next->insn;
 
-		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff);
+			if (next_insn.code == 0) {
+				aux[0] = insn;
+				aux[1] = next_insn;
+			}
+		}
+
+		rewritten = bpf_jit_blind_insn(&insn, aux, insn_buff);
 		if (!rewritten)
 			continue;
 
-		tmp = bpf_patch_insn_single(clone, i, insn_buff, rewritten);
-		if (IS_ERR(tmp)) {
-			/* Patching may have repointed aux->prog during
-			 * realloc from the original one, so we need to
-			 * fix it up here on error.
-			 */
-			bpf_jit_prog_release_other(prog, clone);
-			return tmp;
+		elem = bpf_patch_list_insn(elem, insn_buff, rewritten);
+		if (IS_ERR(elem)) {
+			ret_prog = (struct bpf_prog *)elem;
+			goto free_list_ret;
 		}
-
-		clone = tmp;
-		insn_delta = rewritten - 1;
-
-		/* Walk new program and skip insns we just inserted. */
-		insn = clone->insnsi + i + insn_delta;
-		insn_cnt += insn_delta;
-		i        += insn_delta;
 	}
 
-	clone->blinded = 1;
-	return clone;
+	clone = bpf_linearize_list_insn(clone, list);
+	if (!IS_ERR(clone))
+		clone->blinded = 1;
+	ret_prog = clone;
+free_list_ret:
+	bpf_destroy_list_insn(list);
+	return ret_prog;
 }
 #endif /* CONFIG_BPF_JIT */
 
-- 
2.7.4



* [RFC bpf-next 4/8] bpf: migrate convert_ctx_accesses to list patching infra
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
                   ` (2 preceding siblings ...)
  2019-07-04 21:26 ` [RFC bpf-next 3/8] bpf: migrate jit blinding to list patching infra Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 5/8] bpf: migrate fixup_bpf_calls " Jiong Wang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This patch migrates convert_ctx_accesses to the new list patching
infrastructure. Pre-patch is used for generating the prologue, because what
we really want to do is insert the prologue before the prog start without
touching the first insn.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 98 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 58 insertions(+), 40 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2026d64..2d16e85 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8631,41 +8631,59 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 static int convert_ctx_accesses(struct bpf_verifier_env *env)
 {
 	const struct bpf_verifier_ops *ops = env->ops;
-	int i, cnt, size, ctx_field_size, delta = 0;
-	const int insn_cnt = env->prog->len;
 	struct bpf_insn insn_buf[16], *insn;
 	u32 target_size, size_default, off;
-	struct bpf_prog *new_prog;
+	struct bpf_list_insn *list, *elem;
+	int cnt, size, ctx_field_size;
 	enum bpf_access_type type;
 	bool is_narrower_load;
+	int ret = 0;
+
+	list = bpf_create_list_insn(env->prog);
+	if (IS_ERR(list))
+		return PTR_ERR(list);
+	elem = list;
 
 	if (ops->gen_prologue || env->seen_direct_write) {
 		if (!ops->gen_prologue) {
 			verbose(env, "bpf verifier is misconfigured\n");
-			return -EINVAL;
+			ret = -EINVAL;
+			goto free_list_ret;
 		}
 		cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
 					env->prog);
 		if (cnt >= ARRAY_SIZE(insn_buf)) {
 			verbose(env, "bpf verifier is misconfigured\n");
-			return -EINVAL;
+			ret = -EINVAL;
+			goto free_list_ret;
 		} else if (cnt) {
-			new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
-			if (!new_prog)
-				return -ENOMEM;
+			struct bpf_list_insn *new_hdr;
 
-			env->prog = new_prog;
-			delta += cnt - 1;
+			/* "gen_prologue" generates patch buffer, we want to use
+			 * pre-patch buffer because we don't want to touch the
+			 * insn/aux at start offset.
+			 */
+			new_hdr = bpf_prepatch_list_insn(list, insn_buf,
+							 cnt - 1);
+			if (IS_ERR(new_hdr)) {
+				ret = -ENOMEM;
+				goto free_list_ret;
+			}
+			/* Update list head, so new pre-patched nodes could be
+			 * freed by destroyer.
+			 */
+			list = new_hdr;
 		}
 	}
 
 	if (bpf_prog_is_dev_bound(env->prog->aux))
-		return 0;
+		goto linearize_list_ret;
 
-	insn = env->prog->insnsi + delta;
-
-	for (i = 0; i < insn_cnt; i++, insn++) {
+	for (; elem; elem = elem->next) {
 		bpf_convert_ctx_access_t convert_ctx_access;
+		struct bpf_insn_aux_data *aux;
+
+		insn = &elem->insn;
 
 		if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) ||
 		    insn->code == (BPF_LDX | BPF_MEM | BPF_H) ||
@@ -8680,8 +8698,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		else
 			continue;
 
-		if (type == BPF_WRITE &&
-		    env->insn_aux_data[i + delta].sanitize_stack_off) {
+		aux = &env->insn_aux_data[elem->orig_idx - 1];
+		if (type == BPF_WRITE && aux->sanitize_stack_off) {
 			struct bpf_insn patch[] = {
 				/* Sanitize suspicious stack slot with zero.
 				 * There are no memory dependencies for this store,
@@ -8689,8 +8707,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 				 * constant of zero
 				 */
 				BPF_ST_MEM(BPF_DW, BPF_REG_FP,
-					   env->insn_aux_data[i + delta].sanitize_stack_off,
-					   0),
+					   aux->sanitize_stack_off, 0),
 				/* the original STX instruction will immediately
 				 * overwrite the same stack slot with appropriate value
 				 */
@@ -8698,17 +8715,15 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			};
 
 			cnt = ARRAY_SIZE(patch);
-			new_prog = bpf_patch_insn_data(env, i + delta, patch, cnt);
-			if (!new_prog)
-				return -ENOMEM;
-
-			delta    += cnt - 1;
-			env->prog = new_prog;
-			insn      = new_prog->insnsi + i + delta;
+			elem = bpf_patch_list_insn(elem, patch, cnt);
+			if (IS_ERR(elem)) {
+				ret = PTR_ERR(elem);
+				goto free_list_ret;
+			}
 			continue;
 		}
 
-		switch (env->insn_aux_data[i + delta].ptr_type) {
+		switch (aux->ptr_type) {
 		case PTR_TO_CTX:
 			if (!ops->convert_ctx_access)
 				continue;
@@ -8728,7 +8743,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			continue;
 		}
 
-		ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size;
+		ctx_field_size = aux->ctx_field_size;
 		size = BPF_LDST_BYTES(insn);
 
 		/* If the read access is a narrower load of the field,
@@ -8744,7 +8759,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 
 			if (type == BPF_WRITE) {
 				verbose(env, "bpf verifier narrow ctx access misconfigured\n");
-				return -EINVAL;
+				ret = -EINVAL;
+				goto free_list_ret;
 			}
 
 			size_code = BPF_H;
@@ -8763,7 +8779,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
 		    (ctx_field_size && !target_size)) {
 			verbose(env, "bpf verifier is misconfigured\n");
-			return -EINVAL;
+			ret = -EINVAL;
+			goto free_list_ret;
 		}
 
 		if (is_narrower_load && size < target_size) {
@@ -8786,18 +8803,19 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			}
 		}
 
-		new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
-		if (!new_prog)
-			return -ENOMEM;
-
-		delta += cnt - 1;
-
-		/* keep walking new program and skip insns we just inserted */
-		env->prog = new_prog;
-		insn      = new_prog->insnsi + i + delta;
+		elem = bpf_patch_list_insn(elem, insn_buf, cnt);
+		if (IS_ERR(elem)) {
+			ret = PTR_ERR(elem);
+			goto free_list_ret;
+		}
 	}
-
-	return 0;
+linearize_list_ret:
+	env = verifier_linearize_list_insn(env, list);
+	if (IS_ERR(env))
+		ret = PTR_ERR(env);
+free_list_ret:
+	bpf_destroy_list_insn(list);
+	return ret;
 }
 
 static int jit_subprogs(struct bpf_verifier_env *env)
-- 
2.7.4



* [RFC bpf-next 5/8] bpf: migrate fixup_bpf_calls to list patching infra
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
                   ` (3 preceding siblings ...)
  2019-07-04 21:26 ` [RFC bpf-next 4/8] bpf: migrate convert_ctx_accesses " Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 6/8] bpf: migrate zero extension opt " Jiong Wang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This patch migrates fixup_bpf_calls to the new list patching
infrastructure.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 94 +++++++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 45 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2d16e85..30ed28e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9033,16 +9033,19 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 {
 	struct bpf_prog *prog = env->prog;
 	struct bpf_insn *insn = prog->insnsi;
+	struct bpf_list_insn *list, *elem;
 	const struct bpf_func_proto *fn;
-	const int insn_cnt = prog->len;
 	const struct bpf_map_ops *ops;
 	struct bpf_insn_aux_data *aux;
 	struct bpf_insn insn_buf[16];
-	struct bpf_prog *new_prog;
 	struct bpf_map *map_ptr;
-	int i, cnt, delta = 0;
+	int cnt, ret = 0;
 
-	for (i = 0; i < insn_cnt; i++, insn++) {
+	list = bpf_create_list_insn(env->prog);
+	if (IS_ERR(list))
+		return PTR_ERR(list);
+	for (elem = list; elem; elem = elem->next) {
+		insn = &elem->insn;
 		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
 		    insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
 		    insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
@@ -9073,13 +9076,11 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 				cnt = ARRAY_SIZE(mask_and_mod) - (is64 ? 1 : 0);
 			}
 
-			new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
-			if (!new_prog)
-				return -ENOMEM;
-
-			delta    += cnt - 1;
-			env->prog = prog = new_prog;
-			insn      = new_prog->insnsi + i + delta;
+			elem = bpf_patch_list_insn(elem, patchlet, cnt);
+			if (IS_ERR(elem)) {
+				ret = PTR_ERR(elem);
+				goto free_list_ret;
+			}
 			continue;
 		}
 
@@ -9089,16 +9090,15 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			cnt = env->ops->gen_ld_abs(insn, insn_buf);
 			if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
 				verbose(env, "bpf verifier is misconfigured\n");
-				return -EINVAL;
+				ret = -EINVAL;
+				goto free_list_ret;
 			}
 
-			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
-			if (!new_prog)
-				return -ENOMEM;
-
-			delta    += cnt - 1;
-			env->prog = prog = new_prog;
-			insn      = new_prog->insnsi + i + delta;
+			elem = bpf_patch_list_insn(elem, insn_buf, cnt);
+			if (IS_ERR(elem)) {
+				ret = PTR_ERR(elem);
+				goto free_list_ret;
+			}
 			continue;
 		}
 
@@ -9111,7 +9111,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			bool issrc, isneg;
 			u32 off_reg;
 
-			aux = &env->insn_aux_data[i + delta];
+			aux = &env->insn_aux_data[elem->orig_idx - 1];
 			if (!aux->alu_state ||
 			    aux->alu_state == BPF_ALU_NON_POINTER)
 				continue;
@@ -9144,13 +9144,12 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 				*patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1);
 			cnt = patch - insn_buf;
 
-			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
-			if (!new_prog)
-				return -ENOMEM;
+			elem = bpf_patch_list_insn(elem, insn_buf, cnt);
+			if (IS_ERR(elem)) {
+				ret = PTR_ERR(elem);
+				goto free_list_ret;
+			}
 
-			delta    += cnt - 1;
-			env->prog = prog = new_prog;
-			insn      = new_prog->insnsi + i + delta;
 			continue;
 		}
 
@@ -9183,7 +9182,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			insn->imm = 0;
 			insn->code = BPF_JMP | BPF_TAIL_CALL;
 
-			aux = &env->insn_aux_data[i + delta];
+			aux = &env->insn_aux_data[elem->orig_idx - 1];
 			if (!bpf_map_ptr_unpriv(aux))
 				continue;
 
@@ -9195,7 +9194,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			 */
 			if (bpf_map_ptr_poisoned(aux)) {
 				verbose(env, "tail_call abusing map_ptr\n");
-				return -EINVAL;
+				ret = -EINVAL;
+				goto free_list_ret;
 			}
 
 			map_ptr = BPF_MAP_PTR(aux->map_state);
@@ -9207,13 +9207,12 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 								 map)->index_mask);
 			insn_buf[2] = *insn;
 			cnt = 3;
-			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
-			if (!new_prog)
-				return -ENOMEM;
+			elem = bpf_patch_list_insn(elem, insn_buf, cnt);
+			if (IS_ERR(elem)) {
+				ret = PTR_ERR(elem);
+				goto free_list_ret;
+			}
 
-			delta    += cnt - 1;
-			env->prog = prog = new_prog;
-			insn      = new_prog->insnsi + i + delta;
 			continue;
 		}
 
@@ -9228,7 +9227,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 		     insn->imm == BPF_FUNC_map_push_elem   ||
 		     insn->imm == BPF_FUNC_map_pop_elem    ||
 		     insn->imm == BPF_FUNC_map_peek_elem)) {
-			aux = &env->insn_aux_data[i + delta];
+			aux = &env->insn_aux_data[elem->orig_idx - 1];
 			if (bpf_map_ptr_poisoned(aux))
 				goto patch_call_imm;
 
@@ -9239,17 +9238,16 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 				cnt = ops->map_gen_lookup(map_ptr, insn_buf);
 				if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
 					verbose(env, "bpf verifier is misconfigured\n");
-					return -EINVAL;
+					ret = -EINVAL;
+					goto free_list_ret;
 				}
 
-				new_prog = bpf_patch_insn_data(env, i + delta,
-							       insn_buf, cnt);
-				if (!new_prog)
-					return -ENOMEM;
+				elem = bpf_patch_list_insn(elem, insn_buf, cnt);
+				if (IS_ERR(elem)) {
+					ret = PTR_ERR(elem);
+					goto free_list_ret;
+				}
 
-				delta    += cnt - 1;
-				env->prog = prog = new_prog;
-				insn      = new_prog->insnsi + i + delta;
 				continue;
 			}
 
@@ -9307,12 +9305,18 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			verbose(env,
 				"kernel subsystem misconfigured func %s#%d\n",
 				func_id_name(insn->imm), insn->imm);
-			return -EFAULT;
+			ret = -EFAULT;
+			goto free_list_ret;
 		}
 		insn->imm = fn->func - __bpf_call_base;
 	}
 
-	return 0;
+	env = verifier_linearize_list_insn(env, list);
+	if (IS_ERR(env))
+		ret = PTR_ERR(env);
+free_list_ret:
+	bpf_destroy_list_insn(list);
+	return ret;
 }
 
 static void free_states(struct bpf_verifier_env *env)
-- 
2.7.4



* [RFC bpf-next 6/8] bpf: migrate zero extension opt to list patching infra
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
                   ` (4 preceding siblings ...)
  2019-07-04 21:26 ` [RFC bpf-next 5/8] bpf: migrate fixup_bpf_calls " Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 7/8] bpf: migrate insn remove " Jiong Wang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This patch migrates 32-bit zero extension insertion to the new list patching
infrastructure.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 30ed28e..58d6bbe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8549,10 +8549,9 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 					 const union bpf_attr *attr)
 {
 	struct bpf_insn *patch, zext_patch[2], rnd_hi32_patch[4];
-	struct bpf_insn_aux_data *aux = env->insn_aux_data;
-	int i, patch_len, delta = 0, len = env->prog->len;
-	struct bpf_insn *insns = env->prog->insnsi;
-	struct bpf_prog *new_prog;
+	struct bpf_list_insn *list, *elem;
+	struct bpf_insn_aux_data *aux;
+	int patch_len, ret = 0;
 	bool rnd_hi32;
 
 	rnd_hi32 = attr->prog_flags & BPF_F_TEST_RND_HI32;
@@ -8560,12 +8559,16 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 	rnd_hi32_patch[1] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, 0);
 	rnd_hi32_patch[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_AX, 32);
 	rnd_hi32_patch[3] = BPF_ALU64_REG(BPF_OR, 0, BPF_REG_AX);
-	for (i = 0; i < len; i++) {
-		int adj_idx = i + delta;
-		struct bpf_insn insn;
 
-		insn = insns[adj_idx];
-		if (!aux[adj_idx].zext_dst) {
+	list = bpf_create_list_insn(env->prog);
+	if (IS_ERR(list))
+		return PTR_ERR(list);
+
+	for (elem = list; elem; elem = elem->next) {
+		struct bpf_insn insn = elem->insn;
+
+		aux = &env->insn_aux_data[elem->orig_idx - 1];
+		if (!aux->zext_dst) {
 			u8 code, class;
 			u32 imm_rnd;
 
@@ -8584,13 +8587,13 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 			if (is_reg64(env, &insn, insn.dst_reg, NULL, DST_OP)) {
 				if (class == BPF_LD &&
 				    BPF_MODE(code) == BPF_IMM)
-					i++;
+					elem = elem->next;
 				continue;
 			}
 
 			/* ctx load could be transformed into wider load. */
 			if (class == BPF_LDX &&
-			    aux[adj_idx].ptr_type == PTR_TO_CTX)
+			    aux->ptr_type == PTR_TO_CTX)
 				continue;
 
 			imm_rnd = get_random_int();
@@ -8611,16 +8614,18 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 		patch = zext_patch;
 		patch_len = 2;
 apply_patch_buffer:
-		new_prog = bpf_patch_insn_data(env, adj_idx, patch, patch_len);
-		if (!new_prog)
-			return -ENOMEM;
-		env->prog = new_prog;
-		insns = new_prog->insnsi;
-		aux = env->insn_aux_data;
-		delta += patch_len - 1;
+		elem = bpf_patch_list_insn(elem, patch, patch_len);
+		if (IS_ERR(elem)) {
+			ret = PTR_ERR(elem);
+			goto free_list_ret;
+		}
 	}
-
-	return 0;
+	env = verifier_linearize_list_insn(env, list);
+	if (IS_ERR(env))
+		ret = PTR_ERR(env);
+free_list_ret:
+	bpf_destroy_list_insn(list);
+	return ret;
 }
 
 /* convert load instructions that access fields of a context type into a
-- 
2.7.4



* [RFC bpf-next 7/8] bpf: migrate insn remove to list patching infra
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
                   ` (5 preceding siblings ...)
  2019-07-04 21:26 ` [RFC bpf-next 6/8] bpf: migrate zero extension opt " Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-04 21:26 ` [RFC bpf-next 8/8] bpf: delete all those code around old insn patching infrastructure Jiong Wang
  2019-07-10 17:39 ` [RFC bpf-next 0/8] bpf: accelerate insn patching speed Andrii Nakryiko
  8 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This patch migrates the dead code removal pass to the new list patching
infrastructure.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 59 +++++++++++++++++----------------------------------
 1 file changed, 19 insertions(+), 40 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58d6bbe..abe11fd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8500,49 +8500,30 @@ verifier_linearize_list_insn(struct bpf_verifier_env *env,
 	return ret_env;
 }
 
-static int opt_remove_dead_code(struct bpf_verifier_env *env)
+static int opt_remove_useless_code(struct bpf_verifier_env *env)
 {
-	struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
-	int insn_cnt = env->prog->len;
-	int i, err;
-
-	for (i = 0; i < insn_cnt; i++) {
-		int j;
-
-		j = 0;
-		while (i + j < insn_cnt && !aux_data[i + j].seen)
-			j++;
-		if (!j)
-			continue;
-
-		err = verifier_remove_insns(env, i, j);
-		if (err)
-			return err;
-		insn_cnt = env->prog->len;
-	}
-
-	return 0;
-}
-
-static int opt_remove_nops(struct bpf_verifier_env *env)
-{
-	const struct bpf_insn ja = BPF_JMP_IMM(BPF_JA, 0, 0, 0);
-	struct bpf_insn *insn = env->prog->insnsi;
-	int insn_cnt = env->prog->len;
-	int i, err;
+	struct bpf_insn_aux_data *auxs = env->insn_aux_data;
+	const struct bpf_insn nop =
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0);
+	struct bpf_list_insn *list, *elem;
+	int ret = 0;
 
-	for (i = 0; i < insn_cnt; i++) {
-		if (memcmp(&insn[i], &ja, sizeof(ja)))
+	list = bpf_create_list_insn(env->prog);
+	if (IS_ERR(list))
+		return PTR_ERR(list);
+	for (elem = list; elem; elem = elem->next) {
+		if (auxs[elem->orig_idx - 1].seen &&
+		    memcmp(&elem->insn, &nop, sizeof(nop)))
 			continue;
 
-		err = verifier_remove_insns(env, i, 1);
-		if (err)
-			return err;
-		insn_cnt--;
-		i--;
+		elem->flag |= LIST_INSN_FLAG_REMOVED;
 	}
 
-	return 0;
+	env = verifier_linearize_list_insn(env, list);
+	if (IS_ERR(env))
+		ret = PTR_ERR(env);
+	bpf_destroy_list_insn(list);
+	return ret;
 }
 
 static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
@@ -9488,9 +9469,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 		if (ret == 0)
 			opt_hard_wire_dead_code_branches(env);
 		if (ret == 0)
-			ret = opt_remove_dead_code(env);
-		if (ret == 0)
-			ret = opt_remove_nops(env);
+			ret = opt_remove_useless_code(env);
 	} else {
 		if (ret == 0)
 			sanitize_dead_code(env);
-- 
2.7.4



* [RFC bpf-next 8/8] bpf: delete all those code around old insn patching infrastructure
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
                   ` (6 preceding siblings ...)
  2019-07-04 21:26 ` [RFC bpf-next 7/8] bpf: migrate insn remove " Jiong Wang
@ 2019-07-04 21:26 ` Jiong Wang
  2019-07-10 17:39 ` [RFC bpf-next 0/8] bpf: accelerate insn patching speed Andrii Nakryiko
  8 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-04 21:26 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: ecree, naveen.n.rao, andriin, jakub.kicinski, bpf, netdev,
	oss-drivers, Jiong Wang

This patch deletes all the code around the old insn patching infrastructure.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |   1 -
 include/linux/filter.h       |   4 -
 kernel/bpf/core.c            | 169 ---------------------------------
 kernel/bpf/verifier.c        | 221 +------------------------------------------
 4 files changed, 1 insertion(+), 394 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 5fe99f3..79c1733 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -305,7 +305,6 @@ struct bpf_insn_aux_data {
 	bool zext_dst; /* this insn zero extends dst reg */
 	u8 alu_state; /* used in combination with alu_limit */
 	bool prune_point;
-	unsigned int orig_idx; /* original instruction index */
 };
 
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1fea68c..fcfe0b0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -838,10 +838,6 @@ static inline bool bpf_dump_raw_ok(void)
 	return kallsyms_show_value() == 1;
 }
 
-struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
-				       const struct bpf_insn *patch, u32 len);
-int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
-
 int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
 			int idx_map[]);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c3a5f84..716220b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -333,175 +333,6 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
 	return 0;
 }
 
-static int bpf_adj_delta_to_imm(struct bpf_insn *insn, u32 pos, s32 end_old,
-				s32 end_new, s32 curr, const bool probe_pass)
-{
-	const s64 imm_min = S32_MIN, imm_max = S32_MAX;
-	s32 delta = end_new - end_old;
-	s64 imm = insn->imm;
-
-	if (curr < pos && curr + imm + 1 >= end_old)
-		imm += delta;
-	else if (curr >= end_new && curr + imm + 1 < end_new)
-		imm -= delta;
-	if (imm < imm_min || imm > imm_max)
-		return -ERANGE;
-	if (!probe_pass)
-		insn->imm = imm;
-	return 0;
-}
-
-static int bpf_adj_delta_to_off(struct bpf_insn *insn, u32 pos, s32 end_old,
-				s32 end_new, s32 curr, const bool probe_pass)
-{
-	const s32 off_min = S16_MIN, off_max = S16_MAX;
-	s32 delta = end_new - end_old;
-	s32 off = insn->off;
-
-	if (curr < pos && curr + off + 1 >= end_old)
-		off += delta;
-	else if (curr >= end_new && curr + off + 1 < end_new)
-		off -= delta;
-	if (off < off_min || off > off_max)
-		return -ERANGE;
-	if (!probe_pass)
-		insn->off = off;
-	return 0;
-}
-
-static int bpf_adj_branches(struct bpf_prog *prog, u32 pos, s32 end_old,
-			    s32 end_new, const bool probe_pass)
-{
-	u32 i, insn_cnt = prog->len + (probe_pass ? end_new - end_old : 0);
-	struct bpf_insn *insn = prog->insnsi;
-	int ret = 0;
-
-	for (i = 0; i < insn_cnt; i++, insn++) {
-		u8 code;
-
-		/* In the probing pass we still operate on the original,
-		 * unpatched image in order to check overflows before we
-		 * do any other adjustments. Therefore skip the patchlet.
-		 */
-		if (probe_pass && i == pos) {
-			i = end_new;
-			insn = prog->insnsi + end_old;
-		}
-		code = insn->code;
-		if ((BPF_CLASS(code) != BPF_JMP &&
-		     BPF_CLASS(code) != BPF_JMP32) ||
-		    BPF_OP(code) == BPF_EXIT)
-			continue;
-		/* Adjust offset of jmps if we cross patch boundaries. */
-		if (BPF_OP(code) == BPF_CALL) {
-			if (insn->src_reg != BPF_PSEUDO_CALL)
-				continue;
-			ret = bpf_adj_delta_to_imm(insn, pos, end_old,
-						   end_new, i, probe_pass);
-		} else {
-			ret = bpf_adj_delta_to_off(insn, pos, end_old,
-						   end_new, i, probe_pass);
-		}
-		if (ret)
-			break;
-	}
-
-	return ret;
-}
-
-static void bpf_adj_linfo(struct bpf_prog *prog, u32 off, u32 delta)
-{
-	struct bpf_line_info *linfo;
-	u32 i, nr_linfo;
-
-	nr_linfo = prog->aux->nr_linfo;
-	if (!nr_linfo || !delta)
-		return;
-
-	linfo = prog->aux->linfo;
-
-	for (i = 0; i < nr_linfo; i++)
-		if (off < linfo[i].insn_off)
-			break;
-
-	/* Push all off < linfo[i].insn_off by delta */
-	for (; i < nr_linfo; i++)
-		linfo[i].insn_off += delta;
-}
-
-struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
-				       const struct bpf_insn *patch, u32 len)
-{
-	u32 insn_adj_cnt, insn_rest, insn_delta = len - 1;
-	const u32 cnt_max = S16_MAX;
-	struct bpf_prog *prog_adj;
-	int err;
-
-	/* Since our patchlet doesn't expand the image, we're done. */
-	if (insn_delta == 0) {
-		memcpy(prog->insnsi + off, patch, sizeof(*patch));
-		return prog;
-	}
-
-	insn_adj_cnt = prog->len + insn_delta;
-
-	/* Reject anything that would potentially let the insn->off
-	 * target overflow when we have excessive program expansions.
-	 * We need to probe here before we do any reallocation where
-	 * we afterwards may not fail anymore.
-	 */
-	if (insn_adj_cnt > cnt_max &&
-	    (err = bpf_adj_branches(prog, off, off + 1, off + len, true)))
-		return ERR_PTR(err);
-
-	/* Several new instructions need to be inserted. Make room
-	 * for them. Likely, there's no need for a new allocation as
-	 * last page could have large enough tailroom.
-	 */
-	prog_adj = bpf_prog_realloc(prog, bpf_prog_size(insn_adj_cnt),
-				    GFP_USER);
-	if (!prog_adj)
-		return ERR_PTR(-ENOMEM);
-
-	prog_adj->len = insn_adj_cnt;
-
-	/* Patching happens in 3 steps:
-	 *
-	 * 1) Move over tail of insnsi from next instruction onwards,
-	 *    so we can patch the single target insn with one or more
-	 *    new ones (patching is always from 1 to n insns, n > 0).
-	 * 2) Inject new instructions at the target location.
-	 * 3) Adjust branch offsets if necessary.
-	 */
-	insn_rest = insn_adj_cnt - off - len;
-
-	memmove(prog_adj->insnsi + off + len, prog_adj->insnsi + off + 1,
-		sizeof(*patch) * insn_rest);
-	memcpy(prog_adj->insnsi + off, patch, sizeof(*patch) * len);
-
-	/* We are guaranteed to not fail at this point, otherwise
-	 * the ship has sailed to reverse to the original state. An
-	 * overflow cannot happen at this point.
-	 */
-	BUG_ON(bpf_adj_branches(prog_adj, off, off + 1, off + len, false));
-
-	bpf_adj_linfo(prog_adj, off, insn_delta);
-
-	return prog_adj;
-}
-
-int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
-{
-	/* Branch offsets can't overflow when program is shrinking, no need
-	 * to call bpf_adj_branches(..., true) here
-	 */
-	memmove(prog->insnsi + off, prog->insnsi + off + cnt,
-		sizeof(struct bpf_insn) * (prog->len - off - cnt));
-	prog->len -= cnt;
-
-	return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
-}
-
 int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
 			s32 idx_map[])
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index abe11fd..9e5618f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8067,223 +8067,6 @@ static void convert_pseudo_ld_imm64(struct bpf_verifier_env *env)
 			insn->src_reg = 0;
 }
 
-/* single env->prog->insni[off] instruction was replaced with the range
- * insni[off, off + cnt).  Adjust corresponding insn_aux_data by copying
- * [0, off) and [off, end) to new locations, so the patched range stays zero
- */
-static int adjust_insn_aux_data(struct bpf_verifier_env *env,
-				struct bpf_prog *new_prog, u32 off, u32 cnt)
-{
-	struct bpf_insn_aux_data *new_data, *old_data = env->insn_aux_data;
-	struct bpf_insn *insn = new_prog->insnsi;
-	u32 prog_len;
-	int i;
-
-	/* aux info at OFF always needs adjustment, no matter fast path
-	 * (cnt == 1) is taken or not. There is no guarantee INSN at OFF is the
-	 * original insn at old prog.
-	 */
-	old_data[off].zext_dst = insn_has_def32(env, insn + off + cnt - 1);
-
-	if (cnt == 1)
-		return 0;
-	prog_len = new_prog->len;
-	new_data = vzalloc(array_size(prog_len,
-				      sizeof(struct bpf_insn_aux_data)));
-	if (!new_data)
-		return -ENOMEM;
-	memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off);
-	memcpy(new_data + off + cnt - 1, old_data + off,
-	       sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1));
-	for (i = off; i < off + cnt - 1; i++) {
-		new_data[i].seen = true;
-		new_data[i].zext_dst = insn_has_def32(env, insn + i);
-	}
-	env->insn_aux_data = new_data;
-	vfree(old_data);
-	return 0;
-}
-
-static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len)
-{
-	int i;
-
-	if (len == 1)
-		return;
-	/* NOTE: fake 'exit' subprog should be updated as well. */
-	for (i = 0; i <= env->subprog_cnt; i++) {
-		if (env->subprog_info[i].start <= off)
-			continue;
-		env->subprog_info[i].start += len - 1;
-	}
-}
-
-static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 off,
-					    const struct bpf_insn *patch, u32 len)
-{
-	struct bpf_prog *new_prog;
-
-	new_prog = bpf_patch_insn_single(env->prog, off, patch, len);
-	if (IS_ERR(new_prog)) {
-		if (PTR_ERR(new_prog) == -ERANGE)
-			verbose(env,
-				"insn %d cannot be patched due to 16-bit range\n",
-				env->insn_aux_data[off].orig_idx);
-		return NULL;
-	}
-	if (adjust_insn_aux_data(env, new_prog, off, len))
-		return NULL;
-	adjust_subprog_starts(env, off, len);
-	return new_prog;
-}
-
-static int adjust_subprog_starts_after_remove(struct bpf_verifier_env *env,
-					      u32 off, u32 cnt)
-{
-	int i, j;
-
-	/* find first prog starting at or after off (first to remove) */
-	for (i = 0; i < env->subprog_cnt; i++)
-		if (env->subprog_info[i].start >= off)
-			break;
-	/* find first prog starting at or after off + cnt (first to stay) */
-	for (j = i; j < env->subprog_cnt; j++)
-		if (env->subprog_info[j].start >= off + cnt)
-			break;
-	/* if j doesn't start exactly at off + cnt, we are just removing
-	 * the front of previous prog
-	 */
-	if (env->subprog_info[j].start != off + cnt)
-		j--;
-
-	if (j > i) {
-		struct bpf_prog_aux *aux = env->prog->aux;
-		int move;
-
-		/* move fake 'exit' subprog as well */
-		move = env->subprog_cnt + 1 - j;
-
-		memmove(env->subprog_info + i,
-			env->subprog_info + j,
-			sizeof(*env->subprog_info) * move);
-		env->subprog_cnt -= j - i;
-
-		/* remove func_info */
-		if (aux->func_info) {
-			move = aux->func_info_cnt - j;
-
-			memmove(aux->func_info + i,
-				aux->func_info + j,
-				sizeof(*aux->func_info) * move);
-			aux->func_info_cnt -= j - i;
-			/* func_info->insn_off is set after all code rewrites,
-			 * in adjust_btf_func() - no need to adjust
-			 */
-		}
-	} else {
-		/* convert i from "first prog to remove" to "first to adjust" */
-		if (env->subprog_info[i].start == off)
-			i++;
-	}
-
-	/* update fake 'exit' subprog as well */
-	for (; i <= env->subprog_cnt; i++)
-		env->subprog_info[i].start -= cnt;
-
-	return 0;
-}
-
-static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
-				      u32 cnt)
-{
-	struct bpf_prog *prog = env->prog;
-	u32 i, l_off, l_cnt, nr_linfo;
-	struct bpf_line_info *linfo;
-
-	nr_linfo = prog->aux->nr_linfo;
-	if (!nr_linfo)
-		return 0;
-
-	linfo = prog->aux->linfo;
-
-	/* find first line info to remove, count lines to be removed */
-	for (i = 0; i < nr_linfo; i++)
-		if (linfo[i].insn_off >= off)
-			break;
-
-	l_off = i;
-	l_cnt = 0;
-	for (; i < nr_linfo; i++)
-		if (linfo[i].insn_off < off + cnt)
-			l_cnt++;
-		else
-			break;
-
-	/* First live insn doesn't match first live linfo, it needs to "inherit"
-	 * last removed linfo.  prog is already modified, so prog->len == off
-	 * means no live instructions after (tail of the program was removed).
-	 */
-	if (prog->len != off && l_cnt &&
-	    (i == nr_linfo || linfo[i].insn_off != off + cnt)) {
-		l_cnt--;
-		linfo[--i].insn_off = off + cnt;
-	}
-
-	/* remove the line info which refer to the removed instructions */
-	if (l_cnt) {
-		memmove(linfo + l_off, linfo + i,
-			sizeof(*linfo) * (nr_linfo - i));
-
-		prog->aux->nr_linfo -= l_cnt;
-		nr_linfo = prog->aux->nr_linfo;
-	}
-
-	/* pull all linfo[i].insn_off >= off + cnt in by cnt */
-	for (i = l_off; i < nr_linfo; i++)
-		linfo[i].insn_off -= cnt;
-
-	/* fix up all subprogs (incl. 'exit') which start >= off */
-	for (i = 0; i <= env->subprog_cnt; i++)
-		if (env->subprog_info[i].linfo_idx > l_off) {
-			/* program may have started in the removed region but
-			 * may not be fully removed
-			 */
-			if (env->subprog_info[i].linfo_idx >= l_off + l_cnt)
-				env->subprog_info[i].linfo_idx -= l_cnt;
-			else
-				env->subprog_info[i].linfo_idx = l_off;
-		}
-
-	return 0;
-}
-
-static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
-{
-	struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
-	unsigned int orig_prog_len = env->prog->len;
-	int err;
-
-	if (bpf_prog_is_dev_bound(env->prog->aux))
-		bpf_prog_offload_remove_insns(env, off, cnt);
-
-	err = bpf_remove_insns(env->prog, off, cnt);
-	if (err)
-		return err;
-
-	err = adjust_subprog_starts_after_remove(env, off, cnt);
-	if (err)
-		return err;
-
-	err = bpf_adj_linfo_after_remove(env, off, cnt);
-	if (err)
-		return err;
-
-	memmove(aux_data + off,	aux_data + off + cnt,
-		sizeof(*aux_data) * (orig_prog_len - off - cnt));
-
-	return 0;
-}
-
 /* The verifier does more data flow analysis than llvm and will not
  * explore branches that are dead at run time. Malicious programs can
  * have dead code too. Therefore replace all dead at-run-time code
@@ -9365,7 +9148,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	u64 start_time = ktime_get_ns();
 	struct bpf_verifier_env *env;
 	struct bpf_verifier_log *log;
-	int i, len, ret = -EINVAL;
+	int len, ret = -EINVAL;
 	bool is_priv;
 
 	/* no program is valid */
@@ -9386,8 +9169,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	ret = -ENOMEM;
 	if (!env->insn_aux_data)
 		goto err_free_env;
-	for (i = 0; i < len; i++)
-		env->insn_aux_data[i].orig_idx = i;
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
 	is_priv = capable(CAP_SYS_ADMIN);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
                   ` (7 preceding siblings ...)
  2019-07-04 21:26 ` [RFC bpf-next 8/8] bpf: delete all those code around old insn patching infrastructure Jiong Wang
@ 2019-07-10 17:39 ` Andrii Nakryiko
  2019-07-11 11:22   ` Jiong Wang
  8 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-10 17:39 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers

On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>
> This is an RFC based on latest bpf-next about acclerating insn patching
> speed, it is now near the shape of final PATCH set, and we could see the
> changes migrating to list patching would brings, so send out for
> comments. Most of the info are in cover letter. I splitted the code in a
> way to show API migration more easily.


Hey Jiong,


Sorry, took me a while to get to this and learn more about instruction
patching. Overall this looks good and I think it's a good direction.
I'll post high-level feedback here, and some more
implementation-specific comments in the corresponding patches.


>
> Test Results
> ===
>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
>     modes (interpreter, JIT, JIT with blinding).
>
>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
>     patching time from 5100s (nearly one and a half hour) to less than
>     0.5s for 1M insn patching.
>
> Known Issues
> ===
>   - The following warning is triggered when running scale test which
>     contains 1M insns and patching:
>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
>
>     This is caused by existing code, it can be reproduced on bpf-next
>     master with jit blinding enabled, then run scale unit test, it will
>     shown up after half an hour. After this set, patching is very fast, so
>     it shows up quickly.
>
>   - No line info adjustment support when doing insn delete, subprog adj
>     is with bug when doing insn delete as well. Generally, removal of insns
>     could possibly cause remove of entire line or subprog, therefore
>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
>     don't have good idea and clean code for integrating this into the
>     linearization code at the moment, will do more experimenting,
>     appreciate ideas and suggestions on this.

Is there any specific problem with detecting which line info to delete?
Or what am I missing besides it needing a careful implementation?

>
>     Insn delete doesn't happen on normal programs, for example Cilium
>     benchmarks, and happens rarely on test_progs, so the test coverage is
>     not good. That's also why this RFC have a full pass on selftest with
>     this known issue.

I hope you'll add a test for deletion (and w/ corresponding line info)
in the final patch set :)

>
>   - Could further use mem pool to accelerate the speed, changes are trivial
>     on top of this RFC, and could be 2x extra faster. Not included in this
>     RFC as reducing the algo complexity from quadratic to linear of insn
>     number is the first step.

Honestly, I think that would add more complexity than necessary, and I
think we can further speed up performance without that, see below.

>
> Background
> ===
> This RFC aims to accelerate BPF insn patching speed, patching means expand
> one bpf insn at any offset inside bpf prog into a set of new insns, or
> remove insns.
>
> At the moment, insn patching is quadratic of insn number, this is due to
> branch targets of jump insns needs to be adjusted, and the algo used is:
>
>   for insn inside prog
>     patch insn + regeneate bpf prog
>     for insn inside new prog
>       adjust jump target
>
> This is causing significant time spending when a bpf prog requires large
> amount of patching on different insns. Benchmarking shows it could take
> more than half minutes to finish patching when patching number is more
> than 50K, and the time spent could be more than one hour when patching
> number is around 1M.
>
>   15000   :    3s
>   45000   :   29s
>   95000   :  125s
>   195000  :  712s
>   1000000 : 5100s
>
> This RFC introduces new patching infrastructure. Before doing insn
> patching, insns in bpf prog are turned into a singly linked list, insert
> new insns just insert new list node, delete insns just set delete flag.
> And finally, the list is linearized back into array, and branch target
> adjustment is done for all jump insns during linearization. This algo
> brings the time complexity from quadratic to linear of insn number.
>
> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> to less than 0.5s.
>
> Patching API
> ===
> Insn patching could happen on two layers inside BPF. One is "core layer"
> where only BPF insns are patched. The other is "verification layer" where
> insns have corresponding aux info as well high level subprog info, so
> insn patching means aux info needs to be patched as well, and subprog info
> needs to be adjusted. BPF prog also has debug info associated, so line info
> should always be updated after insn patching.
>
> So, list creation, destroy, insert, delete is the same for both layer,
> but lineration is different. "verification layer" patching require extra
> work. Therefore the patch APIs are:
>
>    list creation:                bpf_create_list_insn
>    list patch:                   bpf_patch_list_insn
>    list pre-patch:               bpf_prepatch_list_insn

I think the pre-patch name is very confusing; until I read the full
description I couldn't understand what it's supposed to be used for.
Speaking of bpf_patch_list_insn, "patch" is also generic enough to leave
me wondering whether the instruction buffer is inserted after the
instruction, or the instruction is replaced with a bunch of instructions.

So how about two more specific names:
bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace the given
instruction with a list of patch instructions)
bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
one is pretty clear).
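
I.e., keeping exactly the signatures from patch 1/8, only renamed (a
sketch, not a definitive proposal):

struct bpf_list_insn *bpf_list_insn_replace(struct bpf_list_insn *list_insn,
					    const struct bpf_insn *patch,
					    u32 len);
struct bpf_list_insn *bpf_list_insn_prepend(struct bpf_list_insn *list_insn,
					    const struct bpf_insn *patch,
					    u32 len);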

>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)

These two functions are both quite involved and share a lot of
common code. I'd rather have one linearize function that takes env
as an optional parameter. If env is specified (which is the case for
all callers except the constant blinding pass), then adjust aux_data and
subprogs along the way.

This would keep logic less duplicated and shouldn't add complexity beyond
a few null checks in a few places.

>    list destroy:                 bpf_destroy_list_insn
>

I'd also add a macro foreach_list_insn instead of explicit for loops
in multiple places. That would also allow skipping deleted instructions
transparently.
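
Something along these lines, perhaps (only a sketch, reusing the
LIST_INSN_FLAG_REMOVED flag from patch 1/8):

/* Iterate over live (non-removed) insns; the caller's block becomes the
 * else arm, so removed insns are skipped transparently.
 */
#define for_each_list_insn(elem, list)					\
	for ((elem) = (list); (elem); (elem) = (elem)->next)		\
		if ((elem)->flag & LIST_INSN_FLAG_REMOVED) {} else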

> list patch could change the insn at patch point, it will invalid the aux

typo: invalid -> invalidate

> info at patching point. list pre-patch insert new insns before patch point
> where the insn and associated aux info are not touched, it is used for
> example in convert_ctx_access when generating prologue.
>
> Typical API sequence for one patching pass:
>
>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>    for (elem = list; elem; elem = elem->next)
>       patch_buf = gen_patch_buf_logic;
>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>    bpf_prog = bpf_linearize_list_insn(list)
>    bpf_destroy_list_insn(list)
>
> Several patching passes could also share the same list:
>
>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>    for (elem = list; elem; elem = elem->next)
>       patch_buf = gen_patch_buf_logic1;
>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>    for (elem = list; elem; elem = elem->next)
>       patch_buf = gen_patch_buf_logic2;
>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>    bpf_prog = bpf_linearize_list_insn(list)
>    bpf_destroy_list_insn(list)
>
> but note new inserted insns int early passes won't have aux info except
> zext info. So, if one patch pass requires all aux info updated and
> recalculated for all insns including those pathced, it should first
> linearize the old list, then re-create the list. The RFC always create and
> linearize the list for each migrated patching pass separately.

I think we should do just one list creation, a few passes of patching
and then linearize once. That will save quite a lot of memory
allocation and will speed up a lot of things. All the verifier
patching happens one after the other without any other functionality
in between, so there shouldn't be any problem.

As for aux_data, we can solve that even more simply and reliably by
storing a pointer inside the struct bpf_list_insn (btw, how about
calling it bpf_patchable_insn?).

Here's how I propose to represent this patchable instruction:

struct bpf_list_insn {
       struct bpf_insn insn;
       struct bpf_list_insn *next;
       struct bpf_list_insn *target;
       struct bpf_insn_aux_data *aux_data;
       s32 orig_idx; // can repurpose this to have three meanings:
                     // -2 - deleted
                     // -1 - patched/inserted insn
                     // >=0 - original idx
};

The idea would be as follows:
1. when creating the original list, the target pointer will point directly
to a patchable instruction wrapper for jumps/calls. This will allow us to
stop tracking and re-calculating jump offsets and instruction indices
until linearization.
2. aux_data is also filled in at that point. Later, at linearization time,
you'd just iterate over all the instructions in the final order and copy
the original aux_data, if it's present. And then just replace env's
aux_data array at the end, which should be very simple and fast.
3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
same list of instructions and those passes will just keep inserting
instruction buffers. Given we have the restriction that all the jumps are
only within the patch buffer, it will be trivial to construct proper
patchable instruction wrappers for newly added instructions, with NULL
for aux_data and possibly non-NULL target (if it's a JMP insn).
4. After those passes, linearize, adjust subprogs (for this you'll
probably still need to create index mapping, right?), copy or create
new aux_data.
5. Done.

What do you think? I think this should be overall simpler and faster.
But let me know if I'm missing something.
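
For illustration, jump resolution at linearization time could then look
roughly like this (sketch only; new_idx is a hypothetical field assigned
by a numbering pass over the final list right before copying insns out,
everything else is as in the struct above):

static int resolve_target(struct bpf_list_insn *elem)
{
	struct bpf_insn *insn = &elem->insn;
	s64 delta;

	/* only jumps/calls got a target wrapper at list creation time */
	if (!elem->target)
		return 0;

	delta = (s64)elem->target->new_idx - elem->new_idx - 1;
	if (BPF_OP(insn->code) == BPF_CALL) {
		if (delta < S32_MIN || delta > S32_MAX)
			return -ERANGE;
		insn->imm = delta;
	} else {
		if (delta < S16_MIN || delta > S16_MAX)
			return -ERANGE;
		insn->off = delta;
	}
	return 0;
}

So no index map would be needed for the jump part, only (maybe) for
subprog starts as in point 4.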

>
> Compared with old patching code, this new infrastructure has much less core
> code, even though the final code has a couple of extra lines but that is
> mostly due to for list based infrastructure, we need to do more error
> checks, so the list and associated aux data structure could be freed when
> errors happens.
>
> Patching Restrictions
> ===
>   - For core layer, the linearization assume no new jumps inside patch buf.
>     Currently, the only user of this layer is jit blinding.
>   - For verifier layer, there could be new jumps inside patch buf, but
>     they should have branch target resolved themselves, meaning new jumps
>     doesn't jump to insns out of the patch buf. This is the case for all
>     existing verifier layer users.
>   - bpf_insn_aux_data for all patched insns including the one at patch
>     point are invalidated, only 32-bit zext info will be recalcuated.
>     If the aux data of insn at patch point needs to be retained, it is
>     purely insn insertion, so need to use the pre-patch API.
>
> I plan to send out a PATCH set once I finished insn deletion line info adj
> support, please have a looks at this RFC, and appreciate feedbacks.
>
> Jiong Wang (8):
>   bpf: introducing list based insn patching infra to core layer
>   bpf: extend list based insn patching infra to verification layer
>   bpf: migrate jit blinding to list patching infra
>   bpf: migrate convert_ctx_accesses to list patching infra
>   bpf: migrate fixup_bpf_calls to list patching infra
>   bpf: migrate zero extension opt to list patching infra
>   bpf: migrate insn remove to list patching infra
>   bpf: delete all those code around old insn patching infrastructure
>
>  include/linux/bpf_verifier.h |   1 -
>  include/linux/filter.h       |  27 +-
>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
>  4 files changed, 580 insertions(+), 528 deletions(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer
  2019-07-04 21:26 ` [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer Jiong Wang
@ 2019-07-10 17:49   ` Andrii Nakryiko
  2019-07-11 11:53     ` Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-10 17:49 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers

On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>
> This patch introduces list based bpf insn patching infra to bpf core layer
> which is lower than verification layer.
>
> This layer has bpf insn sequence as the solo input, therefore the tasks
> to be finished during list linerization is:
>   - copy insn
>   - relocate jumps
>   - relocation line info.
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Suggested-by: Edward Cree <ecree@solarflare.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/filter.h |  25 +++++
>  kernel/bpf/core.c      | 268 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 293 insertions(+)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 1fe53e7..1fea68c 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -842,6 +842,31 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
>                                        const struct bpf_insn *patch, u32 len);
>  int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
>
> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
> +                       int idx_map[]);
> +
> +#define LIST_INSN_FLAG_PATCHED 0x1
> +#define LIST_INSN_FLAG_REMOVED 0x2
> +struct bpf_list_insn {
> +       struct bpf_insn insn;
> +       struct bpf_list_insn *next;
> +       s32 orig_idx;
> +       u32 flag;
> +};
> +
> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog);
> +void bpf_destroy_list_insn(struct bpf_list_insn *list);
> +/* Replace LIST_INSN with new list insns generated from PATCH. */
> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
> +                                         const struct bpf_insn *patch,
> +                                         u32 len);
> +/* Pre-patch list_insn with insns inside PATCH, meaning LIST_INSN is not
> + * touched. New list insns are inserted before it.
> + */
> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
> +                                            const struct bpf_insn *patch,
> +                                            u32 len);
> +
>  void bpf_clear_redirect_map(struct bpf_map *map);
>
>  static inline bool xdp_return_frame_no_direct(void)
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index e2c1b43..e60703e 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -502,6 +502,274 @@ int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
>         return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
>  }
>
> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
> +                       s32 idx_map[])
> +{
> +       u8 code = insn->code;
> +       s64 imm;
> +       s32 off;
> +
> +       if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
> +               return 0;
> +
> +       if (BPF_CLASS(code) == BPF_JMP &&
> +           (BPF_OP(code) == BPF_EXIT ||
> +            (BPF_OP(code) == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL)))
> +               return 0;
> +
> +       /* BPF to BPF call. */
> +       if (BPF_OP(code) == BPF_CALL) {
> +               imm = idx_map[old_idx + insn->imm + 1] - new_idx - 1;
> +               if (imm < S32_MIN || imm > S32_MAX)
> +                       return -ERANGE;
> +               insn->imm = imm;
> +               return 1;
> +       }
> +
> +       /* Jump. */
> +       off = idx_map[old_idx + insn->off + 1] - new_idx - 1;
> +       if (off < S16_MIN || off > S16_MAX)
> +               return -ERANGE;
> +       insn->off = off;
> +       return 0;
> +}
> +
> +void bpf_destroy_list_insn(struct bpf_list_insn *list)
> +{
> +       struct bpf_list_insn *elem, *next;
> +
> +       for (elem = list; elem; elem = next) {
> +               next = elem->next;
> +               kvfree(elem);
> +       }
> +}
> +
> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog)
> +{
> +       unsigned int idx, len = prog->len;
> +       struct bpf_list_insn *hdr, *prev;
> +       struct bpf_insn *insns;
> +
> +       hdr = kvzalloc(sizeof(*hdr), GFP_KERNEL);
> +       if (!hdr)
> +               return ERR_PTR(-ENOMEM);
> +
> +       insns = prog->insnsi;
> +       hdr->insn = insns[0];
> +       hdr->orig_idx = 1;
> +       prev = hdr;

I'm not sure why you need this "prologue" instead of handling the first
instruction uniformly in the for loop below?

> +
> +       for (idx = 1; idx < len; idx++) {
> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
> +                                                     GFP_KERNEL);
> +
> +               if (!node) {
> +                       /* Destroy what has been allocated. */
> +                       bpf_destroy_list_insn(hdr);
> +                       return ERR_PTR(-ENOMEM);
> +               }
> +               node->insn = insns[idx];
> +               node->orig_idx = idx + 1;

Why is orig_idx 1-based? It's really confusing.

> +               prev->next = node;
> +               prev = node;
> +       }
> +
> +       return hdr;
> +}
> +
> +/* Linearize bpf list insn to array. */
> +static struct bpf_prog *bpf_linearize_list_insn(struct bpf_prog *prog,
> +                                               struct bpf_list_insn *list)
> +{
> +       u32 *idx_map, idx, prev_idx, fini_cnt = 0, orig_cnt = prog->len;
> +       struct bpf_insn *insns, *insn;
> +       struct bpf_list_insn *elem;
> +
> +       /* Calculate final size. */
> +       for (elem = list; elem; elem = elem->next)
> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
> +                       fini_cnt++;
> +
> +       insns = prog->insnsi;
> +       /* If prog length remains same, nothing else to do. */
> +       if (fini_cnt == orig_cnt) {
> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
> +                       *insn = elem->insn;
> +               return prog;
> +       }
> +       /* Realloc insn buffer when necessary. */
> +       if (fini_cnt > orig_cnt)
> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
> +                                       GFP_USER);
> +       if (!prog)
> +               return ERR_PTR(-ENOMEM);
> +       insns = prog->insnsi;
> +       prog->len = fini_cnt;
> +
> +       /* idx_map[OLD_IDX] = NEW_IDX */
> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
> +       if (!idx_map)
> +               return ERR_PTR(-ENOMEM);
> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
> +
> +       /* Copy over insn + calculate idx_map. */
> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> +               int orig_idx = elem->orig_idx - 1;
> +
> +               if (orig_idx >= 0) {
> +                       idx_map[orig_idx] = idx;
> +
> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
> +                               continue;
> +               }
> +               insns[idx++] = elem->insn;
> +       }
> +
> +       /* Relocate jumps using idx_map.
> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
> +        */
> +       for (idx = 0, prev_idx = 0, elem = list; elem; elem = elem->next) {
> +               int ret, orig_idx;
> +
> +               /* A removed insn doesn't increase new_pc */
> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
> +                       continue;
> +
> +               orig_idx = elem->orig_idx - 1;
> +               ret = bpf_jit_adj_imm_off(&insns[idx],
> +                                         orig_idx >= 0 ? orig_idx : prev_idx,
> +                                         idx, idx_map);
> +               idx++;
> +               if (ret < 0) {
> +                       kvfree(idx_map);
> +                       return ERR_PTR(ret);
> +               }
> +               if (orig_idx >= 0)
> +                       /* Record prev_idx. it is used for relocating jump insn
> +                        * inside patch buffer. For example, when doing jit
> +                        * blinding, a jump could be moved to some other
> +                        * positions inside the patch buffer, and its old_dst
> +                        * could be calculated using prev_idx.
> +                        */
> +                       prev_idx = orig_idx;
> +       }
> +
> +       /* Adjust linfo.
> +        *
> +        * NOTE: the prog reached core layer has been adjusted to contain insns
> +        *       for single function, however linfo contains information for
> +        *       whole program, so we need to make sure linfo beyond current
> +        *       function is handled properly.
> +        */
> +       if (prog->aux->nr_linfo) {
> +               u32 linfo_idx, insn_start, insn_end, nr_linfo, idx, delta;
> +               struct bpf_line_info *linfo;
> +
> +               linfo_idx = prog->aux->linfo_idx;
> +               linfo = &prog->aux->linfo[linfo_idx];
> +               insn_start = linfo[0].insn_off;
> +               insn_end = insn_start + orig_cnt;
> +               nr_linfo = prog->aux->nr_linfo - linfo_idx;
> +               delta = fini_cnt - orig_cnt;
> +               for (idx = 0; idx < nr_linfo; idx++) {
> +                       int adj_off;
> +
> +                       if (linfo[idx].insn_off >= insn_end) {
> +                               linfo[idx].insn_off += delta;
> +                               continue;
> +                       }
> +
> +                       adj_off = linfo[idx].insn_off - insn_start;
> +                       linfo[idx].insn_off = idx_map[adj_off] + insn_start;
> +               }
> +       }
> +       kvfree(idx_map);
> +
> +       return prog;
> +}
> +
> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
> +                                         const struct bpf_insn *patch,
> +                                         u32 len)
> +{
> +       struct bpf_list_insn *prev, *next;
> +       u32 insn_delta = len - 1;
> +       u32 idx;
> +
> +       list_insn->insn = *patch;
> +       list_insn->flag |= LIST_INSN_FLAG_PATCHED;
> +
> +       /* Since our patchlet doesn't expand the image, we're done. */
> +       if (insn_delta == 0)
> +               return list_insn;
> +
> +       len--;
> +       patch++;
> +
> +       prev = list_insn;
> +       next = list_insn->next;
> +       for (idx = 0; idx < len; idx++) {
> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
> +                                                     GFP_KERNEL);
> +
> +               if (!node) {
> +                       /* Link what's allocated, so list destroyer could
> +                        * free them.
> +                        */
> +                       prev->next = next;

Why this special handling, if you can just insert each element so that the
list is well-formed after every insertion?

> +                       return ERR_PTR(-ENOMEM);
> +               }
> +
> +               node->insn = patch[idx];
> +               prev->next = node;
> +               prev = node;

E.g.,

node->next = next;
prev->next = node;
prev = node;

> +       }
> +
> +       prev->next = next;

And no need for this either.

> +       return prev;
> +}
> +
> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
> +                                            const struct bpf_insn *patch,
> +                                            u32 len)

The prepatch and patch functions should share the same logic.

Prepend is just that - insert all instructions from the buffer before the current insn.
Patch -> replace the current one with the first instruction in the buffer,
then prepend the remaining ones before the next instruction (so patch
should call into prepend, with an adjusted count and array pointer).
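
Roughly like this, I mean (a sketch only, using the existing names from
this patch; error handling kept to the bare minimum):

struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
					  const struct bpf_insn *patch,
					  u32 len)
{
	struct bpf_list_insn *next = list_insn->next, *head;

	/* the insn at the patch point is replaced in place */
	list_insn->insn = *patch;
	list_insn->flag |= LIST_INSN_FLAG_PATCHED;
	if (len == 1)
		return list_insn;

	/* remaining patch insns go between this node and the next one */
	head = bpf_prepatch_list_insn(next, patch + 1, len - 1);
	if (IS_ERR(head))
		return head;
	list_insn->next = head;

	/* keep returning the node of the last patch insn, as before */
	while (head->next != next)
		head = head->next;
	return head;
}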

> +{
> +       struct bpf_list_insn *prev, *node, *begin_node;
> +       u32 idx;
> +
> +       if (!len)
> +               return list_insn;
> +
> +       node = kvzalloc(sizeof(*node), GFP_KERNEL);
> +       if (!node)
> +               return ERR_PTR(-ENOMEM);
> +       node->insn = patch[0];
> +       begin_node = node;
> +       prev = node;
> +
> +       for (idx = 1; idx < len; idx++) {
> +               node = kvzalloc(sizeof(*node), GFP_KERNEL);
> +               if (!node) {
> +                       node = begin_node;
> +                       /* Release what's has been allocated. */
> +                       while (node) {
> +                               struct bpf_list_insn *next = node->next;
> +
> +                               kvfree(node);
> +                               node = next;
> +                       }
> +                       return ERR_PTR(-ENOMEM);
> +               }
> +               node->insn = patch[idx];
> +               prev->next = node;
> +               prev = node;
> +       }
> +
> +       prev->next = list_insn;
> +       return begin_node;
> +}
> +
>  void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
>  {
>         int i;
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-04 21:26 ` [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer Jiong Wang
@ 2019-07-10 17:50   ` Andrii Nakryiko
  2019-07-11 11:59     ` [oss-drivers] " Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-10 17:50 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers

On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>
> Verification layer also needs to handle auxiliar info as well as adjusting
> subprog start.
>
> At this layer, insns inside patch buffer could be jump, but they should
> have been resolved, meaning they shouldn't jump to insn outside of the
> patch buffer. Lineration function for this layer won't touch insns inside
> patch buffer.
>
> Adjusting subprog is finished along with adjusting jump target when the
> input will cover bpf to bpf call insn, re-register subprog start is cheap.
> But adjustment when there is insn deleteion is not considered yet.
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 150 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a2e7637..2026d64 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
>         }
>  }
>
> +/* Linearize bpf list insn to array (verifier layer). */
> +static struct bpf_verifier_env *
> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
> +                            struct bpf_list_insn *list)

It's unclear why this returns env back? It's not allocating a new env,
so it's weird and unnecessary. Just return an error code.

> +{
> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
> +       struct bpf_subprog_info *new_subinfo;
> +       struct bpf_insn_aux_data *new_data;
> +       struct bpf_prog *prog = env->prog;
> +       struct bpf_verifier_env *ret_env;
> +       struct bpf_insn *insns, *insn;
> +       struct bpf_list_insn *elem;
> +       int ret;
> +
> +       /* Calculate final size. */
> +       for (elem = list; elem; elem = elem->next)
> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
> +                       fini_cnt++;
> +
> +       orig_cnt = prog->len;
> +       insns = prog->insnsi;
> +       /* If prog length remains same, nothing else to do. */
> +       if (fini_cnt == orig_cnt) {
> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
> +                       *insn = elem->insn;
> +               return env;
> +       }
> +       /* Realloc insn buffer when necessary. */
> +       if (fini_cnt > orig_cnt)
> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
> +                                       GFP_USER);
> +       if (!prog)
> +               return ERR_PTR(-ENOMEM);
> +       insns = prog->insnsi;
> +       prog->len = fini_cnt;
> +       ret_env = env;
> +
> +       /* idx_map[OLD_IDX] = NEW_IDX */
> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
> +       if (!idx_map)
> +               return ERR_PTR(-ENOMEM);
> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
> +
> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
> +       if (!new_data) {
> +               kvfree(idx_map);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       /* Copy over insn + calculate idx_map. */
> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> +               int orig_idx = elem->orig_idx - 1;
> +
> +               if (orig_idx >= 0) {
> +                       idx_map[orig_idx] = idx;
> +
> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
> +                               continue;
> +
> +                       new_data[idx] = env->insn_aux_data[orig_idx];
> +
> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
> +                               new_data[idx].zext_dst =
> +                                       insn_has_def32(env, &elem->insn);
> +               } else {
> +                       new_data[idx].seen = true;
> +                       new_data[idx].zext_dst = insn_has_def32(env,
> +                                                               &elem->insn);
> +               }
> +               insns[idx++] = elem->insn;
> +       }
> +
> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
> +       if (!new_subinfo) {
> +               kvfree(idx_map);
> +               vfree(new_data);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
> +       env->subprog_cnt = 0;
> +       env->prog = prog;
> +       ret = add_subprog(env, 0);
> +       if (ret < 0) {
> +               ret_env = ERR_PTR(ret);
> +               goto free_all_ret;
> +       }
> +       /* Relocate jumps using idx_map.
> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
> +        */
> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> +               int orig_idx = elem->orig_idx;
> +
> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
> +                       continue;
> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
> +                       idx++;
> +                       continue;
> +               }
> +
> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
> +                                         idx_map);
> +               if (ret < 0) {
> +                       ret_env = ERR_PTR(ret);
> +                       goto free_all_ret;
> +               }
> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
> +               if (ret > 0) {
> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
> +                       if (ret < 0) {
> +                               ret_env = ERR_PTR(ret);
> +                               goto free_all_ret;
> +                       }
> +               }
> +               idx++;
> +       }
> +       if (ret < 0) {
> +               ret_env = ERR_PTR(ret);
> +               goto free_all_ret;
> +       }
> +
> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
> +               new_subinfo[idx].start = env->subprog_info[idx].start;
> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
> +
> +       /* Adjust linfo.
> +        * FIXME: no support for insn removal at the moment.
> +        */
> +       if (prog->aux->nr_linfo) {
> +               struct bpf_line_info *linfo = prog->aux->linfo;
> +               u32 nr_linfo = prog->aux->nr_linfo;
> +
> +               for (idx = 0; idx < nr_linfo; idx++)
> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
> +       }
> +       vfree(env->insn_aux_data);
> +       env->insn_aux_data = new_data;
> +       goto free_mem_list_ret;
> +free_all_ret:
> +       vfree(new_data);
> +free_mem_list_ret:
> +       kvfree(new_subinfo);
> +       kvfree(idx_map);
> +       return ret_env;
> +}
> +
>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
>  {
>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-10 17:39 ` [RFC bpf-next 0/8] bpf: accelerate insn patching speed Andrii Nakryiko
@ 2019-07-11 11:22   ` Jiong Wang
  2019-07-12 19:43     ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-11 11:22 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers


Andrii Nakryiko writes:

> On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>> This is an RFC based on latest bpf-next about acclerating insn patching
>> speed, it is now near the shape of final PATCH set, and we could see the
>> changes migrating to list patching would brings, so send out for
>> comments. Most of the info are in cover letter. I splitted the code in a
>> way to show API migration more easily.
>
>
> Hey Jiong,
>
>
> Sorry, took me a while to get to this and learn more about instruction
> patching. Overall this looks good and I think is a good direction.
> I'll post high-level feedback here, and some more
> implementation-specific ones in corresponding patches.

Great, thanks very much for the feedback. Most of it hits exactly the
pain points I ran into. For some of them, I considered solutions similar
to yours, but failed for various reasons. Let's go through them again, I
could have missed some important things.

Please see my replies below.

>>
>> Test Results
>> ===
>>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
>>     modes (interpreter, JIT, JIT with blinding).
>>
>>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
>>     patching time from 5100s (nearly one and a half hour) to less than
>>     0.5s for 1M insn patching.
>>
>> Known Issues
>> ===
>>   - The following warning is triggered when running scale test which
>>     contains 1M insns and patching:
>>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
>>
>>     This is caused by existing code, it can be reproduced on bpf-next
>>     master with jit blinding enabled, then run scale unit test, it will
>>     shown up after half an hour. After this set, patching is very fast, so
>>     it shows up quickly.
>>
>>   - No line info adjustment support when doing insn delete, subprog adj
>>     is with bug when doing insn delete as well. Generally, removal of insns
>>     could possibly cause remove of entire line or subprog, therefore
>>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
>>     don't have good idea and clean code for integrating this into the
>>     linearization code at the moment, will do more experimenting,
>>     appreciate ideas and suggestions on this.
>
> Is there any specific problem to detect which line info to delete? Or
> what am I missing besides careful implementation?

Mostly it is that line info and subprog info are range info which covers a
range of insns. Deleting insns could require adjusting a range or removing
one range entirely. subprog info could be fully recalculated during
linearization, while line info needs some careful implementation; I failed
to come up with clean code for this during linearization, and also, as
said, there are no unit tests to help me understand whether the code is
correct or not.

I will describe this later, I have spent too much time writing the
following reply already. Might be worth a separate discussion thread.

>>
>>     Insn delete doesn't happen on normal programs, for example Cilium
>>     benchmarks, and happens rarely on test_progs, so the test coverage is
>>     not good. That's also why this RFC have a full pass on selftest with
>>     this known issue.
>
> I hope you'll add test for deletion (and w/ corresponding line info)
> in final patch set :)

Will try. Need to spend some time on BTF format.
>
>>
>>   - Could further use mem pool to accelerate the speed, changes are trivial
>>     on top of this RFC, and could be 2x extra faster. Not included in this
>>     RFC as reducing the algo complexity from quadratic to linear of insn
>>     number is the first step.
>
> Honestly, I think that would add more complexity than necessary, and I
> think we can further speed up performance without that, see below.
>
>>
>> Background
>> ===
>> This RFC aims to accelerate BPF insn patching speed, patching means expand
>> one bpf insn at any offset inside bpf prog into a set of new insns, or
>> remove insns.
>>
>> At the moment, insn patching is quadratic of insn number, this is due to
>> branch targets of jump insns needs to be adjusted, and the algo used is:
>>
>>   for insn inside prog
>>     patch insn + regeneate bpf prog
>>     for insn inside new prog
>>       adjust jump target
>>
>> This is causing significant time spending when a bpf prog requires large
>> amount of patching on different insns. Benchmarking shows it could take
>> more than half minutes to finish patching when patching number is more
>> than 50K, and the time spent could be more than one hour when patching
>> number is around 1M.
>>
>>   15000   :    3s
>>   45000   :   29s
>>   95000   :  125s
>>   195000  :  712s
>>   1000000 : 5100s
>>
>> This RFC introduces new patching infrastructure. Before doing insn
>> patching, insns in bpf prog are turned into a singly linked list, insert
>> new insns just insert new list node, delete insns just set delete flag.
>> And finally, the list is linearized back into array, and branch target
>> adjustment is done for all jump insns during linearization. This algo
>> brings the time complexity from quadratic to linear of insn number.
>>
>> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
>> on medium sized prog, and for a 1M patching it reduce the time from 5100s
>> to less than 0.5s.
>>
>> Patching API
>> ===
>> Insn patching could happen on two layers inside BPF. One is "core layer"
>> where only BPF insns are patched. The other is "verification layer" where
>> insns have corresponding aux info as well high level subprog info, so
>> insn patching means aux info needs to be patched as well, and subprog info
>> needs to be adjusted. BPF prog also has debug info associated, so line info
>> should always be updated after insn patching.
>>
>> So, list creation, destroy, insert, delete is the same for both layer,
>> but lineration is different. "verification layer" patching require extra
>> work. Therefore the patch APIs are:
>>
>>    list creation:                bpf_create_list_insn
>>    list patch:                   bpf_patch_list_insn
>>    list pre-patch:               bpf_prepatch_list_insn
>
> I think pre-patch name is very confusing, until I read full
> description I couldn't understand what it's supposed to be used for.
> Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> me wondering whether instruction buffer is inserted after instruction,
> or instruction is replaced with a bunch of instructions.
>
> So how about two more specific names:
> bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> instruction with a list of patch instructions)
> bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> one is pretty clear).

My sense for English wording is not great, will switch to the above, which
indeed reads more clearly.

>>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
>>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
>
> These two functions are both quite involved, as well as share a lot of
> common code. I'd rather have one linearize instruction, that takes env
> as an optional parameter. If env is specified (which is the case for
> all cases except for constant blinding pass), then adjust aux_data and
> subprogs along the way.

Two versions of linearization and how to unify them was a pain point for
me. I tried to factor some of the common code out, but it doesn't amount
to much: the final size counting + insnsi resize parts are the same, then
things start to diverge from the "Copy over insn" loop onwards.

The verifier layer needs to copy and initialize aux data etc. And jump
relocation is different. At the core layer, the use case is JIT blinding,
which could expand a jump_imm insn into an alu/alu/jump_reg sequence,
where the jump_reg sits at the end of the patch buffer and still needs to
be relocated. Whereas for all the use cases in the verifier layer, no jump
in the prog will be patched, and all new jumps in the patch buffer jump
inside the buffer locally, so there is nothing to resolve.
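
(For reference, the blinded form of a conditional jump looks roughly like
the below. This is from memory of bpf_jit_blind_insn, so the details may
be slightly off; the point is that the final jump_reg lands at the end of
the patch buffer while still targeting an insn outside of it:)

	/* BPF_JMP | BPF_JEQ | BPF_K, for example, becomes: */
	*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
	*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
	*to++ = BPF_JMP_REG(BPF_JEQ, from->dst_reg, BPF_REG_AX, from->off);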

And yes, we could unify them into one and control the divergence with an
argument, but then where to place the function is an issue. My
understanding is verifier.c is designed to sit on top of core.c, and
core.c should not reference, and does not need to be aware of, any
verifier specific data structures, for example env or bpf_insn_aux_data
etc.

So, in this RFC, I chose to write separate linearization functions for the
core and verifier layers. Does this make sense?

>
> This would keep logic less duplicated and shouldn't complexity beyond
> few null checks in few places.
>
>>    list destroy:                 bpf_destroy_list_insn
>>
>
> I'd also add a macro foreach_list_insn instead of explicit for loops
> in multiple places. That would also allow to skip deleted instructions
> transparently.
>
>> list patch could change the insn at patch point, it will invalid the aux
>
> typo: invalid -> invalidate

Ack.

>
>> info at patching point. list pre-patch insert new insns before patch point
>> where the insn and associated aux info are not touched, it is used for
>> example in convert_ctx_access when generating prologue.
>>
>> Typical API sequence for one patching pass:
>>
>>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>>    for (elem = list; elem; elem = elem->next)
>>       patch_buf = gen_patch_buf_logic;
>>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>>    bpf_prog = bpf_linearize_list_insn(list)
>>    bpf_destroy_list_insn(list)
>>
>> Several patching passes could also share the same list:
>>
>>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>>    for (elem = list; elem; elem = elem->next)
>>       patch_buf = gen_patch_buf_logic1;
>>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>>    for (elem = list; elem; elem = elem->next)
>>       patch_buf = gen_patch_buf_logic2;
>>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>>    bpf_prog = bpf_linearize_list_insn(list)
>>    bpf_destroy_list_insn(list)
>>
>> but note new inserted insns int early passes won't have aux info except
>> zext info. So, if one patch pass requires all aux info updated and
>> recalculated for all insns including those pathced, it should first
>> linearize the old list, then re-create the list. The RFC always create and
>> linearize the list for each migrated patching pass separately.
>
> I think we should do just one list creation, few passes of patching
> and then linearize once. That will save quite a lot of memory
> allocation and will speed up a lot of things. All the verifier
> patching happens one after the other without any other functionality
> in between, so there shouldn't be any problem.

Yes, as mentioned above, it is possible and I had tried to do it in a very
early implementation. IIRC convert_ctx_access + fixup_bpf_calls could
share the same list, but then the 32-bit zero extension insertion pass
requires aux.zext_dst to be set properly for all instructions, including
the patched ones, which means we need to linearize the list first (as we
set zext_dst during linearization). The other choice is to do the zext_dst
initialization during bpf_patch_list_insn, but this then makes
bpf_patch_list_insn diverge between the core and verifier layers.

> As for aux_data. We can solve that even more simply and reliably by
> storing a pointer along the struct bpf_list_insn

This is exactly what I had implemented initially, but then the issue is
how to handle aux_data for patched insns. IIRC I left it as a NULL
pointer, but later found zext_dst info is required for all insns, so I
ended up duplicating zext_dst in bpf_list_insn.

This left me worrying that we would need to keep duplicating fields there
whenever a future patching pass has a similar requirement, so I thought it
might be better to just reference the aux_data inside env using orig_idx.
This avoids duplicating information, but we need to make sure the fields
used inside aux_data for patched insns are up to date during linearization
or list patching.

> (btw, how about calling it bpf_patchable_insn?).

No preference, will use this one.

> Here's how I propose to represent this patchable instruction:
>
> struct bpf_list_insn {
>        struct bpf_insn insn;
>        struct bpf_list_insn *next;
>        struct bpf_list_insn *target;
>        struct bpf_insn_aux_data *aux_data;
>        s32 orig_idx; // can repurpose this to have three meanings:
>                      // -2 - deleted
>                      // -1 - patched/inserted insn
>                      // >=0 - original idx

I actually had experimented with the -2/-1/0 trick, exactly the same
number assignment :) IIRC the code was not as clear as using a flag, the
reasons being:
  1. we still need the orig_idx of a patched insn somehow, meaning we have
     to negate the index.
  2. some code needs to know whether an insn is deleted or patched after
     the negation, so I ended up with some ugly code.

Anyway, I might not have thought hard enough about this. I will retry
using the special index instead of a flag, hopefully with cleaner code
this time.

> };
>
> The idea would be as follows:
> 1. when creating original list, target pointer will point directly to
> a patchable instruction wrapper for jumps/calls. This will allow to
> stop tracking and re-calculating jump offsets and instruction indicies
> until linearization.

Not sure I have followed the idea of the "target" pointer. At the moment
we are using an index mapping array (generated as a by-product of copying
insns).

Does the "target" pointer mean that, during list initialization, each jump
insn will have target initialized to the list node of its jump destination
insn, and all non-jump insns get NULL? Then during linearization you
assign an index to each list node (could be done as a by-product of
another pass) before copying insns, which could then relocate each jump
during the copy, as its "target" would already have the final index
calculated? Am I following correctly?
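
I.e. the creation-time part would be something like the below (purely
hypothetical, with a temporary nodes[] array indexed by original insn
index, just to check my understanding):

	/* second pass once all list nodes exist; nodes[i] wraps insnsi[i] */
	for (i = 0; i < len; i++) {
		struct bpf_insn *insn = &nodes[i]->insn;
		u8 code = insn->code;

		if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
			continue;
		if (BPF_OP(code) == BPF_EXIT)
			continue;
		if (BPF_OP(code) == BPF_CALL) {
			if (insn->src_reg == BPF_PSEUDO_CALL)
				nodes[i]->target = nodes[i + insn->imm + 1];
			continue;
		}
		nodes[i]->target = nodes[i + insn->off + 1];
	}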

> 2. aux_data is also filled at that point. Later at linearization time
> you'd just iterate over all the instructions in final order and copy
> original aux_data, if it's present. And then just repace env's
> aux_data array at the end, should be very simple and fast.

As explained, I am worried that making aux_data a pointer will mean
duplicating some fields into list_insn whenever those fields are required
for patched insns.

> 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
> same list of instructions and those passes will just keep inserting
> instruction buffers. Given we have restriction that all the jumps are
> only within patch buffer, it will be trivial to construct proper
> patchable instruction wrappers for newly added instructions, with NULL
> for aux_data and possibly non-NULL target (if it's a JMP insn).
> 4. After those passes, linearize, adjust subprogs (for this you'll
> probably still need to create index mapping, right?), copy or create
> new aux_data.
> 5. Done.
>
> What do you think? I think this should be overall simpler and faster.
> But let me know if I'm missing something.

Thanks for all these thoughts, they are very good suggestions and remind
me to revisit some points I had forgotten. I will do the following things:

  1. retry the negative index solution to eliminate the flag, if the
     resulting code can be kept clean.
  2. the "target" pointer seems to make sense; it makes list_insn bigger,
     but that is the usual space-for-time trade-off, so I will try to
     implement it and see how the code looks.
  3. I still have concerns about making aux_data a pointer, mostly because
     patched insns will have a NULL pointer, and whenever the aux info of a
     patched insn is required we need to duplicate that info inside
     list_insn. For example the 32-bit zext opt requires zext_dst.

Regards,
Jiong

>>
>> Compared with old patching code, this new infrastructure has much less core
>> code, even though the final code has a couple of extra lines but that is
>> mostly due to for list based infrastructure, we need to do more error
>> checks, so the list and associated aux data structure could be freed when
>> errors happens.
>>
>> Patching Restrictions
>> ===
>>   - For core layer, the linearization assume no new jumps inside patch buf.
>>     Currently, the only user of this layer is jit blinding.
>>   - For verifier layer, there could be new jumps inside patch buf, but
>>     they should have branch target resolved themselves, meaning new jumps
>>     doesn't jump to insns out of the patch buf. This is the case for all
>>     existing verifier layer users.
>>   - bpf_insn_aux_data for all patched insns including the one at patch
>>     point are invalidated, only 32-bit zext info will be recalcuated.
>>     If the aux data of insn at patch point needs to be retained, it is
>>     purely insn insertion, so need to use the pre-patch API.
>>
>> I plan to send out a PATCH set once I finished insn deletion line info adj
>> support, please have a looks at this RFC, and appreciate feedbacks.
>>
>> Jiong Wang (8):
>>   bpf: introducing list based insn patching infra to core layer
>>   bpf: extend list based insn patching infra to verification layer
>>   bpf: migrate jit blinding to list patching infra
>>   bpf: migrate convert_ctx_accesses to list patching infra
>>   bpf: migrate fixup_bpf_calls to list patching infra
>>   bpf: migrate zero extension opt to list patching infra
>>   bpf: migrate insn remove to list patching infra
>>   bpf: delete all those code around old insn patching infrastructure
>>
>>  include/linux/bpf_verifier.h |   1 -
>>  include/linux/filter.h       |  27 +-
>>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
>>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
>>  4 files changed, 580 insertions(+), 528 deletions(-)
>>
>> --
>> 2.7.4
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer
  2019-07-10 17:49   ` Andrii Nakryiko
@ 2019-07-11 11:53     ` Jiong Wang
  2019-07-12 19:48       ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-11 11:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers


Andrii Nakryiko writes:

> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>> This patch introduces list based bpf insn patching infra to bpf core layer
>> which is lower than verification layer.
>>
>> This layer has bpf insn sequence as the solo input, therefore the tasks
>> to be finished during list linerization is:
>>   - copy insn
>>   - relocate jumps
>>   - relocation line info.
>>
>> Suggested-by: Alexei Starovoitov <ast@kernel.org>
>> Suggested-by: Edward Cree <ecree@solarflare.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  include/linux/filter.h |  25 +++++
>>  kernel/bpf/core.c      | 268 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 293 insertions(+)
>>
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 1fe53e7..1fea68c 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -842,6 +842,31 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
>>                                        const struct bpf_insn *patch, u32 len);
>>  int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
>>
>> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
>> +                       int idx_map[]);
>> +
>> +#define LIST_INSN_FLAG_PATCHED 0x1
>> +#define LIST_INSN_FLAG_REMOVED 0x2
>> +struct bpf_list_insn {
>> +       struct bpf_insn insn;
>> +       struct bpf_list_insn *next;
>> +       s32 orig_idx;
>> +       u32 flag;
>> +};
>> +
>> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog);
>> +void bpf_destroy_list_insn(struct bpf_list_insn *list);
>> +/* Replace LIST_INSN with new list insns generated from PATCH. */
>> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
>> +                                         const struct bpf_insn *patch,
>> +                                         u32 len);
>> +/* Pre-patch list_insn with insns inside PATCH, meaning LIST_INSN is not
>> + * touched. New list insns are inserted before it.
>> + */
>> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
>> +                                            const struct bpf_insn *patch,
>> +                                            u32 len);
>> +
>>  void bpf_clear_redirect_map(struct bpf_map *map);
>>
>>  static inline bool xdp_return_frame_no_direct(void)
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index e2c1b43..e60703e 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -502,6 +502,274 @@ int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
>>         return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
>>  }
>>
>> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
>> +                       s32 idx_map[])
>> +{
>> +       u8 code = insn->code;
>> +       s64 imm;
>> +       s32 off;
>> +
>> +       if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
>> +               return 0;
>> +
>> +       if (BPF_CLASS(code) == BPF_JMP &&
>> +           (BPF_OP(code) == BPF_EXIT ||
>> +            (BPF_OP(code) == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL)))
>> +               return 0;
>> +
>> +       /* BPF to BPF call. */
>> +       if (BPF_OP(code) == BPF_CALL) {
>> +               imm = idx_map[old_idx + insn->imm + 1] - new_idx - 1;
>> +               if (imm < S32_MIN || imm > S32_MAX)
>> +                       return -ERANGE;
>> +               insn->imm = imm;
>> +               return 1;
>> +       }
>> +
>> +       /* Jump. */
>> +       off = idx_map[old_idx + insn->off + 1] - new_idx - 1;
>> +       if (off < S16_MIN || off > S16_MAX)
>> +               return -ERANGE;
>> +       insn->off = off;
>> +       return 0;
>> +}
>> +
>> +void bpf_destroy_list_insn(struct bpf_list_insn *list)
>> +{
>> +       struct bpf_list_insn *elem, *next;
>> +
>> +       for (elem = list; elem; elem = next) {
>> +               next = elem->next;
>> +               kvfree(elem);
>> +       }
>> +}
>> +
>> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog)
>> +{
>> +       unsigned int idx, len = prog->len;
>> +       struct bpf_list_insn *hdr, *prev;
>> +       struct bpf_insn *insns;
>> +
>> +       hdr = kvzalloc(sizeof(*hdr), GFP_KERNEL);
>> +       if (!hdr)
>> +               return ERR_PTR(-ENOMEM);
>> +
>> +       insns = prog->insnsi;
>> +       hdr->insn = insns[0];
>> +       hdr->orig_idx = 1;
>> +       prev = hdr;
>
> I'm not sure why you need this "prologue" instead of handling first
> instruction uniformly in for loop below?

It is because the head of the list doesn't have a predecessor, so there is
no prev->next assignment for it; otherwise we would have to do a check
inside the loop to rule the head out when doing that assignment.
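
The only alternative I can think of is an on-stack dummy head, so the loop
body is uniform for every insn, roughly like the sketch below (reusing the
same locals as the RFC code, error path simplified):

  struct bpf_list_insn dummy = {}, *prev = &dummy;

  for (idx = 0; idx < len; idx++) {
          struct bpf_list_insn *node = kvzalloc(sizeof(*node), GFP_KERNEL);

          if (!node) {
                  /* Destroy what has been allocated so far. */
                  bpf_destroy_list_insn(dummy.next);
                  return ERR_PTR(-ENOMEM);
          }
          node->insn = insns[idx];
          node->orig_idx = idx + 1;
          prev->next = node;
          prev = node;
  }

  return dummy.next;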

>> +
>> +       for (idx = 1; idx < len; idx++) {
>> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
>> +                                                     GFP_KERNEL);
>> +
>> +               if (!node) {
>> +                       /* Destroy what has been allocated. */
>> +                       bpf_destroy_list_insn(hdr);
>> +                       return ERR_PTR(-ENOMEM);
>> +               }
>> +               node->insn = insns[idx];
>> +               node->orig_idx = idx + 1;
>
> Why orig_idx is 1-based? It's really confusing.

orig_idx == 0 means an insn has no original insn, i.e. it is a new insn
generated for patching purposes.

LIST_INSN_FLAG_PATCHED in the RFC, on the other hand, means an insn in the
original prog has been patched.

I had been trying to differentiate the above two cases, but yes, they are
confusing, and differentiating them might be useless: if an insn in the
original prog is patched, all its info could be treated as clobbered and
needing recalculation, or we should make conservative assumptions.

>
>> +               prev->next = node;
>> +               prev = node;
>> +       }
>> +
>> +       return hdr;
>> +}
>> +
>> +/* Linearize bpf list insn to array. */
>> +static struct bpf_prog *bpf_linearize_list_insn(struct bpf_prog *prog,
>> +                                               struct bpf_list_insn *list)
>> +{
>> +       u32 *idx_map, idx, prev_idx, fini_cnt = 0, orig_cnt = prog->len;
>> +       struct bpf_insn *insns, *insn;
>> +       struct bpf_list_insn *elem;
>> +
>> +       /* Calculate final size. */
>> +       for (elem = list; elem; elem = elem->next)
>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
>> +                       fini_cnt++;
>> +
>> +       insns = prog->insnsi;
>> +       /* If prog length remains same, nothing else to do. */
>> +       if (fini_cnt == orig_cnt) {
>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
>> +                       *insn = elem->insn;
>> +               return prog;
>> +       }
>> +       /* Realloc insn buffer when necessary. */
>> +       if (fini_cnt > orig_cnt)
>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
>> +                                       GFP_USER);
>> +       if (!prog)
>> +               return ERR_PTR(-ENOMEM);
>> +       insns = prog->insnsi;
>> +       prog->len = fini_cnt;
>> +
>> +       /* idx_map[OLD_IDX] = NEW_IDX */
>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
>> +       if (!idx_map)
>> +               return ERR_PTR(-ENOMEM);
>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
>> +
>> +       /* Copy over insn + calculate idx_map. */
>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> +               int orig_idx = elem->orig_idx - 1;
>> +
>> +               if (orig_idx >= 0) {
>> +                       idx_map[orig_idx] = idx;
>> +
>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> +                               continue;
>> +               }
>> +               insns[idx++] = elem->insn;
>> +       }
>> +
>> +       /* Relocate jumps using idx_map.
>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
>> +        */
>> +       for (idx = 0, prev_idx = 0, elem = list; elem; elem = elem->next) {
>> +               int ret, orig_idx;
>> +
>> +               /* A removed insn doesn't increase new_pc */
>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> +                       continue;
>> +
>> +               orig_idx = elem->orig_idx - 1;
>> +               ret = bpf_jit_adj_imm_off(&insns[idx],
>> +                                         orig_idx >= 0 ? orig_idx : prev_idx,
>> +                                         idx, idx_map);
>> +               idx++;
>> +               if (ret < 0) {
>> +                       kvfree(idx_map);
>> +                       return ERR_PTR(ret);
>> +               }
>> +               if (orig_idx >= 0)
>> +                       /* Record prev_idx. it is used for relocating jump insn
>> +                        * inside patch buffer. For example, when doing jit
>> +                        * blinding, a jump could be moved to some other
>> +                        * positions inside the patch buffer, and its old_dst
>> +                        * could be calculated using prev_idx.
>> +                        */
>> +                       prev_idx = orig_idx;
>> +       }
>> +
>> +       /* Adjust linfo.
>> +        *
>> +        * NOTE: the prog reached core layer has been adjusted to contain insns
>> +        *       for single function, however linfo contains information for
>> +        *       whole program, so we need to make sure linfo beyond current
>> +        *       function is handled properly.
>> +        */
>> +       if (prog->aux->nr_linfo) {
>> +               u32 linfo_idx, insn_start, insn_end, nr_linfo, idx, delta;
>> +               struct bpf_line_info *linfo;
>> +
>> +               linfo_idx = prog->aux->linfo_idx;
>> +               linfo = &prog->aux->linfo[linfo_idx];
>> +               insn_start = linfo[0].insn_off;
>> +               insn_end = insn_start + orig_cnt;
>> +               nr_linfo = prog->aux->nr_linfo - linfo_idx;
>> +               delta = fini_cnt - orig_cnt;
>> +               for (idx = 0; idx < nr_linfo; idx++) {
>> +                       int adj_off;
>> +
>> +                       if (linfo[idx].insn_off >= insn_end) {
>> +                               linfo[idx].insn_off += delta;
>> +                               continue;
>> +                       }
>> +
>> +                       adj_off = linfo[idx].insn_off - insn_start;
>> +                       linfo[idx].insn_off = idx_map[adj_off] + insn_start;
>> +               }
>> +       }
>> +       kvfree(idx_map);
>> +
>> +       return prog;
>> +}
>> +
>> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
>> +                                         const struct bpf_insn *patch,
>> +                                         u32 len)
>> +{
>> +       struct bpf_list_insn *prev, *next;
>> +       u32 insn_delta = len - 1;
>> +       u32 idx;
>> +
>> +       list_insn->insn = *patch;
>> +       list_insn->flag |= LIST_INSN_FLAG_PATCHED;
>> +
>> +       /* Since our patchlet doesn't expand the image, we're done. */
>> +       if (insn_delta == 0)
>> +               return list_insn;
>> +
>> +       len--;
>> +       patch++;
>> +
>> +       prev = list_insn;
>> +       next = list_insn->next;
>> +       for (idx = 0; idx < len; idx++) {
>> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
>> +                                                     GFP_KERNEL);
>> +
>> +               if (!node) {
>> +                       /* Link what's allocated, so list destroyer could
>> +                        * free them.
>> +                        */
>> +                       prev->next = next;
>
> Why this special handling, if you can just insert element so that list
> is well-formed after each instruction?

Good idea, just always do "node->next = next"; the "prev->next = node" in
the next round will fix it up.

>
>> +                       return ERR_PTR(-ENOMEM);
>> +               }
>> +
>> +               node->insn = patch[idx];
>> +               prev->next = node;
>> +               prev = node;
>
> E.g.,
>
> node->next = next;
> prev->next = node;
> prev = node;
>
>> +       }
>> +
>> +       prev->next = next;
>
> And no need for this either.
>
>> +       return prev;
>> +}
>> +
>> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
>> +                                            const struct bpf_insn *patch,
>> +                                            u32 len)
>
> prepatch and patch functions should share the same logic.
>
> Prepend is just that - insert all instructions from buffer before current insns.
> Patch -> replace current one with first instriction in a buffer, then
> prepend remaining ones before the next instruction (so patch should
> call info prepend, with adjusted count and array pointer).

Ack, there are indeed quite a few things to simplify here.
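
Something along these lines is what I will try (just a rough sketch on top
of the RFC helpers, keeping the contract of returning the last insn of the
patch so callers can continue iterating after it):

  struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
                                            const struct bpf_insn *patch,
                                            u32 len)
  {
          struct bpf_list_insn *begin, *last, *old_next = list_insn->next;

          list_insn->insn = *patch;
          list_insn->flag |= LIST_INSN_FLAG_PATCHED;
          if (len <= 1)
                  return list_insn;

          /* Build the remaining patch insns as a chain ending at the old
           * successor, then splice the chain in after the patch point.
           */
          begin = bpf_prepatch_list_insn(old_next, patch + 1, len - 1);
          if (IS_ERR(begin))
                  return begin;
          list_insn->next = begin;

          /* Return the last insn of the patch buffer. */
          for (last = begin; last->next != old_next; last = last->next)
                  ;
          return last;
  }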

>
>> +{
>> +       struct bpf_list_insn *prev, *node, *begin_node;
>> +       u32 idx;
>> +
>> +       if (!len)
>> +               return list_insn;
>> +
>> +       node = kvzalloc(sizeof(*node), GFP_KERNEL);
>> +       if (!node)
>> +               return ERR_PTR(-ENOMEM);
>> +       node->insn = patch[0];
>> +       begin_node = node;
>> +       prev = node;
>> +
>> +       for (idx = 1; idx < len; idx++) {
>> +               node = kvzalloc(sizeof(*node), GFP_KERNEL);
>> +               if (!node) {
>> +                       node = begin_node;
>> +                       /* Release what's has been allocated. */
>> +                       while (node) {
>> +                               struct bpf_list_insn *next = node->next;
>> +
>> +                               kvfree(node);
>> +                               node = next;
>> +                       }
>> +                       return ERR_PTR(-ENOMEM);
>> +               }
>> +               node->insn = patch[idx];
>> +               prev->next = node;
>> +               prev = node;
>> +       }
>> +
>> +       prev->next = list_insn;
>> +       return begin_node;
>> +}
>> +
>>  void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
>>  {
>>         int i;
>> --
>> 2.7.4
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-10 17:50   ` Andrii Nakryiko
@ 2019-07-11 11:59     ` Jiong Wang
  2019-07-11 12:20       ` Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-11 11:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers


Andrii Nakryiko writes:

> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>> Verification layer also needs to handle auxiliar info as well as adjusting
>> subprog start.
>>
>> At this layer, insns inside patch buffer could be jump, but they should
>> have been resolved, meaning they shouldn't jump to insn outside of the
>> patch buffer. Lineration function for this layer won't touch insns inside
>> patch buffer.
>>
>> Adjusting subprog is finished along with adjusting jump target when the
>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
>> But adjustment when there is insn deleteion is not considered yet.
>>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 150 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a2e7637..2026d64 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
>>         }
>>  }
>>
>> +/* Linearize bpf list insn to array (verifier layer). */
>> +static struct bpf_verifier_env *
>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
>> +                            struct bpf_list_insn *list)
>
> It's unclear why this returns env back? It's not allocating a new env,
> so it's weird and unnecessary. Just return error code.

The reason is I was thinking we have two layers in BPF, the core and the
verifier.

For the core layer (the relevant file is core.c), when doing patching, the
input is the insn list and the bpf_prog; linearization should linearize the
insn list into the insn array and also update whatever else inside bpf_prog
is affected by the insn changes, for example the line info inside
prog->aux. So the return value of the core layer linearization hook is
bpf_prog.

For the verifier layer it is similar, but the context is bpf_verifier_env;
the linearization hook should linearize the insn list and also update what
is affected inside env, for example bpf_insn_aux_data. So the return value
is bpf_verifier_env, meaning it returns an updated verifier context
(bpf_verifier_env) after insn list linearization.

Make sense?

Regards,
Jiong

>
>> +{
>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
>> +       struct bpf_subprog_info *new_subinfo;
>> +       struct bpf_insn_aux_data *new_data;
>> +       struct bpf_prog *prog = env->prog;
>> +       struct bpf_verifier_env *ret_env;
>> +       struct bpf_insn *insns, *insn;
>> +       struct bpf_list_insn *elem;
>> +       int ret;
>> +
>> +       /* Calculate final size. */
>> +       for (elem = list; elem; elem = elem->next)
>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
>> +                       fini_cnt++;
>> +
>> +       orig_cnt = prog->len;
>> +       insns = prog->insnsi;
>> +       /* If prog length remains same, nothing else to do. */
>> +       if (fini_cnt == orig_cnt) {
>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
>> +                       *insn = elem->insn;
>> +               return env;
>> +       }
>> +       /* Realloc insn buffer when necessary. */
>> +       if (fini_cnt > orig_cnt)
>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
>> +                                       GFP_USER);
>> +       if (!prog)
>> +               return ERR_PTR(-ENOMEM);
>> +       insns = prog->insnsi;
>> +       prog->len = fini_cnt;
>> +       ret_env = env;
>> +
>> +       /* idx_map[OLD_IDX] = NEW_IDX */
>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
>> +       if (!idx_map)
>> +               return ERR_PTR(-ENOMEM);
>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
>> +
>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
>> +       if (!new_data) {
>> +               kvfree(idx_map);
>> +               return ERR_PTR(-ENOMEM);
>> +       }
>> +
>> +       /* Copy over insn + calculate idx_map. */
>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> +               int orig_idx = elem->orig_idx - 1;
>> +
>> +               if (orig_idx >= 0) {
>> +                       idx_map[orig_idx] = idx;
>> +
>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> +                               continue;
>> +
>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
>> +
>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
>> +                               new_data[idx].zext_dst =
>> +                                       insn_has_def32(env, &elem->insn);
>> +               } else {
>> +                       new_data[idx].seen = true;
>> +                       new_data[idx].zext_dst = insn_has_def32(env,
>> +                                                               &elem->insn);
>> +               }
>> +               insns[idx++] = elem->insn;
>> +       }
>> +
>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
>> +       if (!new_subinfo) {
>> +               kvfree(idx_map);
>> +               vfree(new_data);
>> +               return ERR_PTR(-ENOMEM);
>> +       }
>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
>> +       env->subprog_cnt = 0;
>> +       env->prog = prog;
>> +       ret = add_subprog(env, 0);
>> +       if (ret < 0) {
>> +               ret_env = ERR_PTR(ret);
>> +               goto free_all_ret;
>> +       }
>> +       /* Relocate jumps using idx_map.
>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
>> +        */
>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> +               int orig_idx = elem->orig_idx;
>> +
>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> +                       continue;
>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
>> +                       idx++;
>> +                       continue;
>> +               }
>> +
>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
>> +                                         idx_map);
>> +               if (ret < 0) {
>> +                       ret_env = ERR_PTR(ret);
>> +                       goto free_all_ret;
>> +               }
>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
>> +               if (ret > 0) {
>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
>> +                       if (ret < 0) {
>> +                               ret_env = ERR_PTR(ret);
>> +                               goto free_all_ret;
>> +                       }
>> +               }
>> +               idx++;
>> +       }
>> +       if (ret < 0) {
>> +               ret_env = ERR_PTR(ret);
>> +               goto free_all_ret;
>> +       }
>> +
>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
>> +
>> +       /* Adjust linfo.
>> +        * FIXME: no support for insn removal at the moment.
>> +        */
>> +       if (prog->aux->nr_linfo) {
>> +               struct bpf_line_info *linfo = prog->aux->linfo;
>> +               u32 nr_linfo = prog->aux->nr_linfo;
>> +
>> +               for (idx = 0; idx < nr_linfo; idx++)
>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
>> +       }
>> +       vfree(env->insn_aux_data);
>> +       env->insn_aux_data = new_data;
>> +       goto free_mem_list_ret;
>> +free_all_ret:
>> +       vfree(new_data);
>> +free_mem_list_ret:
>> +       kvfree(new_subinfo);
>> +       kvfree(idx_map);
>> +       return ret_env;
>> +}
>> +
>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
>>  {
>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>> --
>> 2.7.4
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-11 11:59     ` [oss-drivers] " Jiong Wang
@ 2019-07-11 12:20       ` Jiong Wang
  2019-07-12 19:51         ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-11 12:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Edward Cree, Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf,
	Networking, oss-drivers


Jiong Wang writes:

> Andrii Nakryiko writes:
>
>> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>>>
>>> Verification layer also needs to handle auxiliar info as well as adjusting
>>> subprog start.
>>>
>>> At this layer, insns inside patch buffer could be jump, but they should
>>> have been resolved, meaning they shouldn't jump to insn outside of the
>>> patch buffer. Lineration function for this layer won't touch insns inside
>>> patch buffer.
>>>
>>> Adjusting subprog is finished along with adjusting jump target when the
>>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
>>> But adjustment when there is insn deleteion is not considered yet.
>>>
>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>>> ---
>>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 150 insertions(+)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index a2e7637..2026d64 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
>>>         }
>>>  }
>>>
>>> +/* Linearize bpf list insn to array (verifier layer). */
>>> +static struct bpf_verifier_env *
>>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
>>> +                            struct bpf_list_insn *list)
>>
>> It's unclear why this returns env back? It's not allocating a new env,
>> so it's weird and unnecessary. Just return error code.
>
> The reason is I was thinking we have two layers in BPF, the core and the
> verifier.
>
> For core layer (the relevant file is core.c), when doing patching, the
> input is insn list and bpf_prog, the linearization should linearize the
> insn list into insn array, and also whatever others affect inside bpf_prog
> due to changing on insns, for example line info inside prog->aux. So the
> return value is bpf_prog for core layer linearization hook. 
>
> For verifier layer, it is similar, but the context if bpf_verifier_env, the
> linearization hook should linearize the insn list, and also those affected
> inside env, for example bpf_insn_aux_data, so the return value is
> bpf_verifier_env, meaning returning an updated verifier context
> (bpf_verifier_env) after insn list linearization.

Realized your point is that no new env is allocated, so just return an
error code. Yes, the env pointer is not changed, only its internal data is
updated. Returning bpf_verifier_env was mostly trying to make the hook
clearer, in that it returns the updated "context" in which the
linearization happens: for the verifier layer that is bpf_verifier_env, and
for the core layer it is bpf_prog, so the return values were designed
around these two types.

>
> Make sense?
>
> Regards,
> Jiong
>
>>
>>> +{
>>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
>>> +       struct bpf_subprog_info *new_subinfo;
>>> +       struct bpf_insn_aux_data *new_data;
>>> +       struct bpf_prog *prog = env->prog;
>>> +       struct bpf_verifier_env *ret_env;
>>> +       struct bpf_insn *insns, *insn;
>>> +       struct bpf_list_insn *elem;
>>> +       int ret;
>>> +
>>> +       /* Calculate final size. */
>>> +       for (elem = list; elem; elem = elem->next)
>>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
>>> +                       fini_cnt++;
>>> +
>>> +       orig_cnt = prog->len;
>>> +       insns = prog->insnsi;
>>> +       /* If prog length remains same, nothing else to do. */
>>> +       if (fini_cnt == orig_cnt) {
>>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
>>> +                       *insn = elem->insn;
>>> +               return env;
>>> +       }
>>> +       /* Realloc insn buffer when necessary. */
>>> +       if (fini_cnt > orig_cnt)
>>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
>>> +                                       GFP_USER);
>>> +       if (!prog)
>>> +               return ERR_PTR(-ENOMEM);
>>> +       insns = prog->insnsi;
>>> +       prog->len = fini_cnt;
>>> +       ret_env = env;
>>> +
>>> +       /* idx_map[OLD_IDX] = NEW_IDX */
>>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
>>> +       if (!idx_map)
>>> +               return ERR_PTR(-ENOMEM);
>>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
>>> +
>>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
>>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
>>> +       if (!new_data) {
>>> +               kvfree(idx_map);
>>> +               return ERR_PTR(-ENOMEM);
>>> +       }
>>> +
>>> +       /* Copy over insn + calculate idx_map. */
>>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>>> +               int orig_idx = elem->orig_idx - 1;
>>> +
>>> +               if (orig_idx >= 0) {
>>> +                       idx_map[orig_idx] = idx;
>>> +
>>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
>>> +                               continue;
>>> +
>>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
>>> +
>>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
>>> +                               new_data[idx].zext_dst =
>>> +                                       insn_has_def32(env, &elem->insn);
>>> +               } else {
>>> +                       new_data[idx].seen = true;
>>> +                       new_data[idx].zext_dst = insn_has_def32(env,
>>> +                                                               &elem->insn);
>>> +               }
>>> +               insns[idx++] = elem->insn;
>>> +       }
>>> +
>>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
>>> +       if (!new_subinfo) {
>>> +               kvfree(idx_map);
>>> +               vfree(new_data);
>>> +               return ERR_PTR(-ENOMEM);
>>> +       }
>>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
>>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
>>> +       env->subprog_cnt = 0;
>>> +       env->prog = prog;
>>> +       ret = add_subprog(env, 0);
>>> +       if (ret < 0) {
>>> +               ret_env = ERR_PTR(ret);
>>> +               goto free_all_ret;
>>> +       }
>>> +       /* Relocate jumps using idx_map.
>>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
>>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
>>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
>>> +        */
>>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>>> +               int orig_idx = elem->orig_idx;
>>> +
>>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
>>> +                       continue;
>>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
>>> +                       idx++;
>>> +                       continue;
>>> +               }
>>> +
>>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
>>> +                                         idx_map);
>>> +               if (ret < 0) {
>>> +                       ret_env = ERR_PTR(ret);
>>> +                       goto free_all_ret;
>>> +               }
>>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
>>> +               if (ret > 0) {
>>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
>>> +                       if (ret < 0) {
>>> +                               ret_env = ERR_PTR(ret);
>>> +                               goto free_all_ret;
>>> +                       }
>>> +               }
>>> +               idx++;
>>> +       }
>>> +       if (ret < 0) {
>>> +               ret_env = ERR_PTR(ret);
>>> +               goto free_all_ret;
>>> +       }
>>> +
>>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
>>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
>>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
>>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
>>> +
>>> +       /* Adjust linfo.
>>> +        * FIXME: no support for insn removal at the moment.
>>> +        */
>>> +       if (prog->aux->nr_linfo) {
>>> +               struct bpf_line_info *linfo = prog->aux->linfo;
>>> +               u32 nr_linfo = prog->aux->nr_linfo;
>>> +
>>> +               for (idx = 0; idx < nr_linfo; idx++)
>>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
>>> +       }
>>> +       vfree(env->insn_aux_data);
>>> +       env->insn_aux_data = new_data;
>>> +       goto free_mem_list_ret;
>>> +free_all_ret:
>>> +       vfree(new_data);
>>> +free_mem_list_ret:
>>> +       kvfree(new_subinfo);
>>> +       kvfree(idx_map);
>>> +       return ret_env;
>>> +}
>>> +
>>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
>>>  {
>>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>>> --
>>> 2.7.4
>>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-11 11:22   ` Jiong Wang
@ 2019-07-12 19:43     ` Andrii Nakryiko
  2019-07-15  9:21       ` Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-12 19:43 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song

On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >> This is an RFC based on latest bpf-next about acclerating insn patching
> >> speed, it is now near the shape of final PATCH set, and we could see the
> >> changes migrating to list patching would brings, so send out for
> >> comments. Most of the info are in cover letter. I splitted the code in a
> >> way to show API migration more easily.
> >
> >
> > Hey Jiong,
> >
> >
> > Sorry, took me a while to get to this and learn more about instruction
> > patching. Overall this looks good and I think is a good direction.
> > I'll post high-level feedback here, and some more
> > implementation-specific ones in corresponding patches.
>
> Great, thanks very much for the feedbacks. Most of your feedbacks are
> hitting those pain points I exactly had ran into. For some of them, I
> thought similar solutions like yours, but failed due to various
> reasons. Let's go through them again, I could have missed some important
> things.
>
> Please see my replies below.

Thanks for the thoughtful reply :)

>
> >>
> >> Test Results
> >> ===
> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
> >>     modes (interpreter, JIT, JIT with blinding).
> >>
> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> >>     patching time from 5100s (nearly one and a half hour) to less than
> >>     0.5s for 1M insn patching.
> >>
> >> Known Issues
> >> ===
> >>   - The following warning is triggered when running scale test which
> >>     contains 1M insns and patching:
> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> >>
> >>     This is caused by existing code, it can be reproduced on bpf-next
> >>     master with jit blinding enabled, then run scale unit test, it will
> >>     shown up after half an hour. After this set, patching is very fast, so
> >>     it shows up quickly.
> >>
> >>   - No line info adjustment support when doing insn delete, subprog adj
> >>     is with bug when doing insn delete as well. Generally, removal of insns
> >>     could possibly cause remove of entire line or subprog, therefore
> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
> >>     don't have good idea and clean code for integrating this into the
> >>     linearization code at the moment, will do more experimenting,
> >>     appreciate ideas and suggestions on this.
> >
> > Is there any specific problem to detect which line info to delete? Or
> > what am I missing besides careful implementation?
>
> Mostly line info and subprog info are range info which covers a range of
> insns. Deleting insns could causing you adjusting the range or removing one
> range entirely. subprog info could be fully recalcuated during
> linearization while line info I need some careful implementation and I
> failed to have clean code for this during linearization also as said no
> unit tests to help me understand whether the code is correct or not.
>

Ok, it's good that this is just a matter of clean implementation. Try to
implement it as clearly as possible, then post it here, and if it can be
improved, someone (me?) will try to help clean it up further.

Not a big expert on line info, so can't comment on that,
unfortunately. Maybe Yonghong can chime in (cc'ed)


> I will described this latter, spent too much time writing the following
> reply. Might worth an separate discussion thread.
>
> >>
> >>     Insn delete doesn't happen on normal programs, for example Cilium
> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
> >>     not good. That's also why this RFC have a full pass on selftest with
> >>     this known issue.
> >
> > I hope you'll add test for deletion (and w/ corresponding line info)
> > in final patch set :)
>
> Will try. Need to spend some time on BTF format.
> >
> >>
> >>   - Could further use mem pool to accelerate the speed, changes are trivial
> >>     on top of this RFC, and could be 2x extra faster. Not included in this
> >>     RFC as reducing the algo complexity from quadratic to linear of insn
> >>     number is the first step.
> >
> > Honestly, I think that would add more complexity than necessary, and I
> > think we can further speed up performance without that, see below.
> >
> >>
> >> Background
> >> ===
> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> >> remove insns.
> >>
> >> At the moment, insn patching is quadratic of insn number, this is due to
> >> branch targets of jump insns needs to be adjusted, and the algo used is:
> >>
> >>   for insn inside prog
> >>     patch insn + regeneate bpf prog
> >>     for insn inside new prog
> >>       adjust jump target
> >>
> >> This is causing significant time spending when a bpf prog requires large
> >> amount of patching on different insns. Benchmarking shows it could take
> >> more than half minutes to finish patching when patching number is more
> >> than 50K, and the time spent could be more than one hour when patching
> >> number is around 1M.
> >>
> >>   15000   :    3s
> >>   45000   :   29s
> >>   95000   :  125s
> >>   195000  :  712s
> >>   1000000 : 5100s
> >>
> >> This RFC introduces new patching infrastructure. Before doing insn
> >> patching, insns in bpf prog are turned into a singly linked list, insert
> >> new insns just insert new list node, delete insns just set delete flag.
> >> And finally, the list is linearized back into array, and branch target
> >> adjustment is done for all jump insns during linearization. This algo
> >> brings the time complexity from quadratic to linear of insn number.
> >>
> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> >> to less than 0.5s.
> >>
> >> Patching API
> >> ===
> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> >> where only BPF insns are patched. The other is "verification layer" where
> >> insns have corresponding aux info as well high level subprog info, so
> >> insn patching means aux info needs to be patched as well, and subprog info
> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> >> should always be updated after insn patching.
> >>
> >> So, list creation, destroy, insert, delete is the same for both layer,
> >> but lineration is different. "verification layer" patching require extra
> >> work. Therefore the patch APIs are:
> >>
> >>    list creation:                bpf_create_list_insn
> >>    list patch:                   bpf_patch_list_insn
> >>    list pre-patch:               bpf_prepatch_list_insn
> >
> > I think pre-patch name is very confusing, until I read full
> > description I couldn't understand what it's supposed to be used for.
> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> > me wondering whether instruction buffer is inserted after instruction,
> > or instruction is replaced with a bunch of instructions.
> >
> > So how about two more specific names:
> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> > instruction with a list of patch instructions)
> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> > one is pretty clear).
>
> My sense on English word is not great, will switch to above which indeed
> reads more clear.
>
> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
> >
> > These two functions are both quite involved, as well as share a lot of
> > common code. I'd rather have one linearize instruction, that takes env
> > as an optional parameter. If env is specified (which is the case for
> > all cases except for constant blinding pass), then adjust aux_data and
> > subprogs along the way.
>
> Two version of lineration and how to unify them was a painpoint to me. I
> thought to factor out some of the common code out, but it actually doesn't
> count much, the final size counting + insnsi resize parts are the same,
> then things start to diverge since the "Copy over insn" loop.
>
> verifier layer needs to copy and initialize aux data etc. And jump
> relocation is different. At core layer, the use case is JIT blinding which
> could expand an jump_imm insn into a and/or/jump_reg sequence, and the

Sorry, I didn't get what "could expand an jump_imm insn into a
and/or/jump_reg sequence", maybe you can clarify if I'm missing
something.

But from your cover letter description, core layer has no jumps at
all, while verifier has jumps inside patch buffer. So, if you support
jumps inside of patch buffer, it will automatically work for core
layer. Or what am I missing?

Just compared the two versions of linearize side by side. From what I can
see, a unified version could look like this, at a high level:

1. Count the final insn count (but see my other suggestions on how to
avoid that altogether). If not changed - exit.
2. Realloc insn buffer, copy just the instructions (not aux_data yet).
Build idx_map, if necessary.
3. (if env) then bpf_patchable_insn has aux_data, so now do another
pass and copy it into the resulting array.
4. (if env) Copy subprog info. Though I'd see if we can just reuse the old
ones and just adjust offsets. I'm not sure why we need to allocate a new
array, the subprogram count shouldn't change, right?
5. (common) Relocate jumps. Not clear why the core layer doesn't care
about PATCHED (or, alternatively, why the verifier layer cares). And
again, with target pointers it will look totally different (and
simpler).
6. (if env) Adjust subprogs.
7. (common) Adjust prog's line info.

The devil is in the details, but I think this will still be better
contained in one function with a bunch of `if (env)` checks, something
like the skeleton below. Still pretty linear.
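
(Just a skeleton to show the shape I have in mind, all the heavy lifting
elided:)

  static int bpf_linearize_list_insn(struct bpf_verifier_env *env,
                                     struct bpf_prog **prog_ptr,
                                     struct bpf_list_insn *list)
  {
          /* 1. count final insns, exit early if the count is unchanged */
          /* 2. realloc prog, copy insns over, build old->new idx map */
          if (env) {
                  /* 3. copy aux_data into the final layout */
                  /* 4. copy/adjust subprog info */
          }
          /* 5. relocate jumps and bpf2bpf calls via idx map/targets */
          if (env) {
                  /* 6. adjust subprog starts */
          }
          /* 7. adjust prog's line info */
          return 0;
  }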

> jump_reg is at the end of the patch buffer, it should be relocated. While
> all use case in verifier layer, no jump in the prog will be patched and all
> new jumps in patch buffer will jump inside the buffer locally so no need to
> resolve.
>
> And yes we could unify them into one and control the diverge using
> argument, but then where to place the function is an issue. My
> understanding is verifier.c is designed to be on top of core.c and core.c
> should not reference and no need to be aware of any verifier specific data
> structures, for example env or bpf_aux_insn_data etc.

Keep the func prototype where it is, maybe forward-declare the verifier env
struct, and put the implementation in verifier.c?

>
> So, in this RFC, I had choosed to write separate linerization function for
> core and verifier layer. Does this make sense?

See above. Let's still try to make it better.

>
> >
> > This would keep logic less duplicated and shouldn't complexity beyond
> > few null checks in few places.
> >
> >>    list destroy:                 bpf_destroy_list_insn
> >>
> >
> > I'd also add a macro foreach_list_insn instead of explicit for loops
> > in multiple places. That would also allow to skip deleted instructions
> > transparently.
> >
> >> list patch could change the insn at patch point, it will invalid the aux
> >
> > typo: invalid -> invalidate
>
> Ack.
>
> >
> >> info at patching point. list pre-patch insert new insns before patch point
> >> where the insn and associated aux info are not touched, it is used for
> >> example in convert_ctx_access when generating prologue.
> >>
> >> Typical API sequence for one patching pass:
> >>
> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> >>    for (elem = list; elem; elem = elem->next)
> >>       patch_buf = gen_patch_buf_logic;
> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >>    bpf_prog = bpf_linearize_list_insn(list)
> >>    bpf_destroy_list_insn(list)
> >>
> >> Several patching passes could also share the same list:
> >>
> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> >>    for (elem = list; elem; elem = elem->next)
> >>       patch_buf = gen_patch_buf_logic1;
> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >>    for (elem = list; elem; elem = elem->next)
> >>       patch_buf = gen_patch_buf_logic2;
> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >>    bpf_prog = bpf_linearize_list_insn(list)
> >>    bpf_destroy_list_insn(list)
> >>
> >> but note new inserted insns int early passes won't have aux info except
> >> zext info. So, if one patch pass requires all aux info updated and
> >> recalculated for all insns including those pathced, it should first
> >> linearize the old list, then re-create the list. The RFC always create and
> >> linearize the list for each migrated patching pass separately.
> >
> > I think we should do just one list creation, few passes of patching
> > and then linearize once. That will save quite a lot of memory
> > allocation and will speed up a lot of things. All the verifier
> > patching happens one after the other without any other functionality
> > in between, so there shouldn't be any problem.
>
> Yes, as mentioned above, it is possible and I had tried to do it in an very
> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
> same list, but then the 32-bit zero extension insertion pass requires
> aux.zext_dst set properly for all instructions including those patched

So zext_dst. Seems like it's easily calculatable, so doesn't seem like
it even needs to be accessed from aux_data.

But. I can see at least two ways to do this:
1. those patching passes that care about aux_data, should just do
extra check for NULL. Because when we adjust insns now, we just leave
zero-initialized aux_data, except for zext_dst and seen. So it's easy
to default to them if aux_data is NULL for patchable_insn.
2. just allocate and fill them out them when applying patch insns
buffer. It's not a duplication, we already fill them out during
patching today. So just do the same, except through malloc()'ed
pointer instead. At the end they will be copied into linear resulting
array during linearization (uniformly with non-patched insns).

> one which we need to linearize the list first (as we set zext_dst during
> linerization), or the other choice is we do the zext_dst initialization
> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
> between core and verifier layer.

List construction is much simpler, even if we have to have extra
check, similar to `if (env) { do_extra(); }`, IMO, it's fine.

>
> > As for aux_data. We can solve that even more simply and reliably by
> > storing a pointer along the struct bpf_list_insn
>
> This is exactly what I had implemented initially, but then the issue is how
> to handle aux_data for patched insn? IIRC I was leave it as a NULL pointer,
> but later found zext_dst info is required for all insns, so I end up
> duplicating zext_dst in bpf_list_insn.

See above. No duplication. You have a pointer. Whether aux_data is in
original array or was malloc()'ed, doesn't matter. But no duplication
of fields.

>
> This leads me worrying we need to keep duplicating fields there as soon as
> there is new similar requirements in future patching pass and I thought it
> might be better to just reference the aux_data inside env using orig_idx,
> this avoids duplicating information, but we need to make sure used fields
> inside aux_data for patched insn update-to-date during linearization or
> patching list.
>
> > (btw, how about calling it bpf_patchable_insn?).
>
> No preference, will use this one.
>
> > Here's how I propose to represent this patchable instruction:
> >
> > struct bpf_list_insn {
> >        struct bpf_insn insn;
> >        struct bpf_list_insn *next;
> >        struct bpf_list_insn *target;
> >        struct bpf_insn_aux_data *aux_data;
> >        s32 orig_idx; // can repurpose this to have three meanings:
> >                      // -2 - deleted
> >                      // -1 - patched/inserted insn
> >                      // >=0 - original idx
>
> I actually had experimented the -2/-1/0 trick, exactly the same number
> assignment :) IIRC the code was not clear compared with using flag, the
> reason seems to be:
>   1. we still need orig_idx of an patched insn somehow, meaning negate the
>      index.

Not following, the original index will be >= 0, no?


>   2. somehow somecode need to know whether one insn is deleted or patched
>      after the negation, so I end up with some ugly code.

So that's why you'd have constants with descriptive names for -2 and -1.

>
> Anyway, I might had not thought hard enough on this, I will retry using the
> special index instead of flag, hopefully I could have clean code this time.
>

Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
(orig_idx >= 0) { ... }` are very confusing.

> > };
> >
> > The idea would be as follows:
> > 1. when creating original list, target pointer will point directly to
> > a patchable instruction wrapper for jumps/calls. This will allow to
> > stop tracking and re-calculating jump offsets and instruction indicies
> > until linearization.
>
> Not sure I have followed the idea of "target" pointer. At the moment we are
> using index mapping array (generated as by-product during coping insn).
>
> While the "target" pointer means to during list initialization, each jump
> insn will have target initialized to the list node of the converted jump
> destination insn, and all those non-jump insns are with NULL? Then during
> linearization you assign index to each list node (could be done as
> by-product of other pass) before insn coping which could then relocate the
> insn during the coping as the "target" would have final index calculated?
> Am I following correctly?

Yes, I think you are understanding correctly what I'm saying. For the
implementation, you can do it in a few ways, through a few passes or with
some additional data - that part is less important. See what's cleanest.

>
> > 2. aux_data is also filled at that point. Later at linearization time
> > you'd just iterate over all the instructions in final order and copy
> > original aux_data, if it's present. And then just repace env's
> > aux_data array at the end, should be very simple and fast.
>
> As explained, I am worried making aux_data a pointer will causing
> duplicating some fields into list_insn if the fields are required for
> patched insns.

Addressed above, I don't think there will be any duplication, because
we pass aux_data by pointer.

>
> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
> > same list of instructions and those passes will just keep inserting
> > instruction buffers. Given we have restriction that all the jumps are
> > only within patch buffer, it will be trivial to construct proper
> > patchable instruction wrappers for newly added instructions, with NULL
> > for aux_data and possibly non-NULL target (if it's a JMP insn).
> > 4. After those passes, linearize, adjust subprogs (for this you'll
> > probably still need to create index mapping, right?), copy or create
> > new aux_data.
> > 5. Done.
> >
> > What do you think? I think this should be overall simpler and faster.
> > But let me know if I'm missing something.
>
> Thanks for all these thoughts, they are very good suggestions and remind
> me to revisit some points I had forgotten. I will do the following things:
>
>   1. retry the negative index solution to eliminate the flag, if the
>      resulting code can be clean.
>   2. the "target" pointer seems to make sense, it makes list_insn bigger but
>      that is the usual space-for-time trade, so I will try to implement it
>      and see how the code looks.
>   3. I still have concerns about making aux_data a pointer, mostly because a
>      patched insn will have a NULL pointer, and in case aux info of a patched
>      insn is required, we need to duplicate that info inside list_insn. For
>      example, the 32-bit zext opt requires zext_dst.
>


So one more thing I wanted to suggest. I'll try to keep high-level
suggestions here.

What about having a wrapper for the patchable_insn list, where you can
store some additional data, like the final count and whatever else? It
will eliminate some passes (counting) and will make list handling
easier (because you can have a dummy head pointer, so no special
handling of the first element; you had this concern in patch #1, I
believe). But it will be clearer whether it's beneficial once implemented.

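As a rough sketch of what I have in mind (all names hypothetical):

  struct bpf_list_insn_head {
  	struct bpf_list_insn dummy;	/* dummy.next is the real first insn,
  					 * so insert/patch never special-cases
  					 * the first element
  					 */
  	u32 cnt;			/* final insn count, kept up to date
  					 * by the patch/prepatch/remove
  					 * helpers
  					 */
  };

The patching helpers would take the head, bump or decrement cnt as they go,
and linearization could allocate the final insn array without a separate
counting pass.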
> Regards,
> Jiong
>
> >>
> >> Compared with old patching code, this new infrastructure has much less core
> >> code, even though the final code has a couple of extra lines but that is
> >> mostly due to for list based infrastructure, we need to do more error
> >> checks, so the list and associated aux data structure could be freed when
> >> errors happens.
> >>
> >> Patching Restrictions
> >> ===
> >>   - For core layer, the linearization assume no new jumps inside patch buf.
> >>     Currently, the only user of this layer is jit blinding.
> >>   - For verifier layer, there could be new jumps inside patch buf, but
> >>     they should have branch target resolved themselves, meaning new jumps
> >>     doesn't jump to insns out of the patch buf. This is the case for all
> >>     existing verifier layer users.
> >>   - bpf_insn_aux_data for all patched insns including the one at patch
> >>     point are invalidated, only 32-bit zext info will be recalcuated.
> >>     If the aux data of insn at patch point needs to be retained, it is
> >>     purely insn insertion, so need to use the pre-patch API.
> >>
> >> I plan to send out a PATCH set once I finished insn deletion line info adj
> >> support, please have a looks at this RFC, and appreciate feedbacks.
> >>
> >> Jiong Wang (8):
> >>   bpf: introducing list based insn patching infra to core layer
> >>   bpf: extend list based insn patching infra to verification layer
> >>   bpf: migrate jit blinding to list patching infra
> >>   bpf: migrate convert_ctx_accesses to list patching infra
> >>   bpf: migrate fixup_bpf_calls to list patching infra
> >>   bpf: migrate zero extension opt to list patching infra
> >>   bpf: migrate insn remove to list patching infra
> >>   bpf: delete all those code around old insn patching infrastructure
> >>
> >>  include/linux/bpf_verifier.h |   1 -
> >>  include/linux/filter.h       |  27 +-
> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
> >>  4 files changed, 580 insertions(+), 528 deletions(-)
> >>
> >> --
> >> 2.7.4
> >>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer
  2019-07-11 11:53     ` Jiong Wang
@ 2019-07-12 19:48       ` Andrii Nakryiko
  2019-07-15  9:58         ` Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-12 19:48 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers

On Thu, Jul 11, 2019 at 4:53 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >> This patch introduces list based bpf insn patching infra to bpf core layer
> >> which is lower than verification layer.
> >>
> >> This layer has bpf insn sequence as the solo input, therefore the tasks
> >> to be finished during list linerization is:
> >>   - copy insn
> >>   - relocate jumps
> >>   - relocation line info.
> >>
> >> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> >> Suggested-by: Edward Cree <ecree@solarflare.com>
> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> ---
> >>  include/linux/filter.h |  25 +++++
> >>  kernel/bpf/core.c      | 268 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 293 insertions(+)
> >>
> >> diff --git a/include/linux/filter.h b/include/linux/filter.h
> >> index 1fe53e7..1fea68c 100644
> >> --- a/include/linux/filter.h
> >> +++ b/include/linux/filter.h
> >> @@ -842,6 +842,31 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
> >>                                        const struct bpf_insn *patch, u32 len);
> >>  int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
> >>
> >> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
> >> +                       int idx_map[]);
> >> +
> >> +#define LIST_INSN_FLAG_PATCHED 0x1
> >> +#define LIST_INSN_FLAG_REMOVED 0x2
> >> +struct bpf_list_insn {
> >> +       struct bpf_insn insn;
> >> +       struct bpf_list_insn *next;
> >> +       s32 orig_idx;
> >> +       u32 flag;
> >> +};
> >> +
> >> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog);
> >> +void bpf_destroy_list_insn(struct bpf_list_insn *list);
> >> +/* Replace LIST_INSN with new list insns generated from PATCH. */
> >> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
> >> +                                         const struct bpf_insn *patch,
> >> +                                         u32 len);
> >> +/* Pre-patch list_insn with insns inside PATCH, meaning LIST_INSN is not
> >> + * touched. New list insns are inserted before it.
> >> + */
> >> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
> >> +                                            const struct bpf_insn *patch,
> >> +                                            u32 len);
> >> +
> >>  void bpf_clear_redirect_map(struct bpf_map *map);
> >>
> >>  static inline bool xdp_return_frame_no_direct(void)
> >> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> >> index e2c1b43..e60703e 100644
> >> --- a/kernel/bpf/core.c
> >> +++ b/kernel/bpf/core.c
> >> @@ -502,6 +502,274 @@ int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
> >>         return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
> >>  }
> >>
> >> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
> >> +                       s32 idx_map[])
> >> +{
> >> +       u8 code = insn->code;
> >> +       s64 imm;
> >> +       s32 off;
> >> +
> >> +       if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
> >> +               return 0;
> >> +
> >> +       if (BPF_CLASS(code) == BPF_JMP &&
> >> +           (BPF_OP(code) == BPF_EXIT ||
> >> +            (BPF_OP(code) == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL)))
> >> +               return 0;
> >> +
> >> +       /* BPF to BPF call. */
> >> +       if (BPF_OP(code) == BPF_CALL) {
> >> +               imm = idx_map[old_idx + insn->imm + 1] - new_idx - 1;
> >> +               if (imm < S32_MIN || imm > S32_MAX)
> >> +                       return -ERANGE;
> >> +               insn->imm = imm;
> >> +               return 1;
> >> +       }
> >> +
> >> +       /* Jump. */
> >> +       off = idx_map[old_idx + insn->off + 1] - new_idx - 1;
> >> +       if (off < S16_MIN || off > S16_MAX)
> >> +               return -ERANGE;
> >> +       insn->off = off;
> >> +       return 0;
> >> +}
> >> +
> >> +void bpf_destroy_list_insn(struct bpf_list_insn *list)
> >> +{
> >> +       struct bpf_list_insn *elem, *next;
> >> +
> >> +       for (elem = list; elem; elem = next) {
> >> +               next = elem->next;
> >> +               kvfree(elem);
> >> +       }
> >> +}
> >> +
> >> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog)
> >> +{
> >> +       unsigned int idx, len = prog->len;
> >> +       struct bpf_list_insn *hdr, *prev;
> >> +       struct bpf_insn *insns;
> >> +
> >> +       hdr = kvzalloc(sizeof(*hdr), GFP_KERNEL);
> >> +       if (!hdr)
> >> +               return ERR_PTR(-ENOMEM);
> >> +
> >> +       insns = prog->insnsi;
> >> +       hdr->insn = insns[0];
> >> +       hdr->orig_idx = 1;
> >> +       prev = hdr;
> >
> > I'm not sure why you need this "prologue" instead of handling first
> > instruction uniformly in for loop below?
>
> It is because the head of the list doesn't have a predecessor, so there is
> no need for the prev->next assignment; otherwise we could do a check inside
> the loop to rule the head out when doing it.

yeah, prev = NULL initially. Then

if (prev) prev->next = node;

Or see my suggestion about having a patchable_insns_list wrapper struct
(in the cover letter thread).

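i.e. roughly (a sketch based on the existing bpf_create_list_insn() body):

  struct bpf_list_insn *hdr = NULL, *prev = NULL;

  for (idx = 0; idx < len; idx++) {
  	struct bpf_list_insn *node = kvzalloc(sizeof(*node), GFP_KERNEL);

  	if (!node) {
  		/* destroying a partial (possibly NULL) list is fine */
  		bpf_destroy_list_insn(hdr);
  		return ERR_PTR(-ENOMEM);
  	}
  	node->insn = insns[idx];
  	node->orig_idx = idx + 1;
  	if (prev)
  		prev->next = node;
  	else
  		hdr = node;
  	prev = node;
  }
  return hdr;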
>
> >> +
> >> +       for (idx = 1; idx < len; idx++) {
> >> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
> >> +                                                     GFP_KERNEL);
> >> +
> >> +               if (!node) {
> >> +                       /* Destroy what has been allocated. */
> >> +                       bpf_destroy_list_insn(hdr);
> >> +                       return ERR_PTR(-ENOMEM);
> >> +               }
> >> +               node->insn = insns[idx];
> >> +               node->orig_idx = idx + 1;
> >
> > Why orig_idx is 1-based? It's really confusing.
>
> orig_idx == 0 means an insn has no original insn, i.e. it is a new
> insn generated for patching purposes.
>
> LIST_INSN_FLAG_PATCHED in the RFC, on the other hand, means an insn in the
> original prog has been patched.
>
> I had been trying to differentiate the above two cases, but yes, they are
> confusing and differentiating them might be useless: if an insn in the
> original prog is patched, all its info could be treated as clobbered and
> needing re-calculation, or we should make conservative assumptions.

An instruction will be new and not patched only in the patch buffer. Once you
add it to the list, it is patched, no? Not sure what distinction you are
trying to maintain here.

>
> >
> >> +               prev->next = node;
> >> +               prev = node;
> >> +       }
> >> +
> >> +       return hdr;
> >> +}
> >> +

[...]

> >> +
> >> +       len--;
> >> +       patch++;
> >> +
> >> +       prev = list_insn;
> >> +       next = list_insn->next;
> >> +       for (idx = 0; idx < len; idx++) {
> >> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
> >> +                                                     GFP_KERNEL);
> >> +
> >> +               if (!node) {
> >> +                       /* Link what's allocated, so list destroyer could
> >> +                        * free them.
> >> +                        */
> >> +                       prev->next = next;
> >
> > Why this special handling, if you can just insert element so that list
> > is well-formed after each instruction?
>
> Good idea, just always do "node->next = next", the "prev->next = node" in
> next round will fix it.
>
> >
> >> +                       return ERR_PTR(-ENOMEM);
> >> +               }
> >> +
> >> +               node->insn = patch[idx];
> >> +               prev->next = node;
> >> +               prev = node;
> >
> > E.g.,
> >
> > node->next = next;
> > prev->next = node;
> > prev = node;
> >
> >> +       }
> >> +
> >> +       prev->next = next;
> >
> > And no need for this either.
> >
> >> +       return prev;
> >> +}
> >> +
> >> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
> >> +                                            const struct bpf_insn *patch,
> >> +                                            u32 len)
> >
> > prepatch and patch functions should share the same logic.
> >
> > Prepend is just that - insert all instructions from the buffer before current insns.
> > Patch -> replace the current one with the first instruction in the buffer, then
> > prepend the remaining ones before the next instruction (so patch should
> > call into prepend, with adjusted count and array pointer).
>
> Ack, there are indeed quite a few things to simplify.
>
> >
> >> +{
> >> +       struct bpf_list_insn *prev, *node, *begin_node;
> >> +       u32 idx;
> >> +
> >> +       if (!len)
> >> +               return list_insn;
> >> +
> >> +       node = kvzalloc(sizeof(*node), GFP_KERNEL);
> >> +       if (!node)
> >> +               return ERR_PTR(-ENOMEM);
> >> +       node->insn = patch[0];
> >> +       begin_node = node;
> >> +       prev = node;
> >> +
> >> +       for (idx = 1; idx < len; idx++) {
> >> +               node = kvzalloc(sizeof(*node), GFP_KERNEL);
> >> +               if (!node) {
> >> +                       node = begin_node;
> >> +                       /* Release what's has been allocated. */
> >> +                       while (node) {
> >> +                               struct bpf_list_insn *next = node->next;
> >> +
> >> +                               kvfree(node);
> >> +                               node = next;
> >> +                       }
> >> +                       return ERR_PTR(-ENOMEM);
> >> +               }
> >> +               node->insn = patch[idx];
> >> +               prev->next = node;
> >> +               prev = node;
> >> +       }
> >> +
> >> +       prev->next = list_insn;
> >> +       return begin_node;
> >> +}
> >> +
> >>  void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
> >>  {
> >>         int i;
> >> --
> >> 2.7.4
> >>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-11 12:20       ` Jiong Wang
@ 2019-07-12 19:51         ` Andrii Nakryiko
  2019-07-15 10:02           ` Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-12 19:51 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Edward Cree, Naveen N. Rao, Jakub Kicinski, bpf, Networking,
	oss-drivers

On Thu, Jul 11, 2019 at 5:20 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Jiong Wang writes:
>
> > Andrii Nakryiko writes:
> >
> >> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>>
> >>> Verification layer also needs to handle auxiliar info as well as adjusting
> >>> subprog start.
> >>>
> >>> At this layer, insns inside patch buffer could be jump, but they should
> >>> have been resolved, meaning they shouldn't jump to insn outside of the
> >>> patch buffer. Lineration function for this layer won't touch insns inside
> >>> patch buffer.
> >>>
> >>> Adjusting subprog is finished along with adjusting jump target when the
> >>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
> >>> But adjustment when there is insn deleteion is not considered yet.
> >>>
> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >>> ---
> >>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 150 insertions(+)
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index a2e7637..2026d64 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
> >>>         }
> >>>  }
> >>>
> >>> +/* Linearize bpf list insn to array (verifier layer). */
> >>> +static struct bpf_verifier_env *
> >>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
> >>> +                            struct bpf_list_insn *list)
> >>
> >> It's unclear why this returns env back? It's not allocating a new env,
> >> so it's weird and unnecessary. Just return error code.
> >
> > The reason is I was thinking we have two layers in BPF, the core and the
> > verifier.
> >
> > For core layer (the relevant file is core.c), when doing patching, the
> > input is insn list and bpf_prog, the linearization should linearize the
> > insn list into insn array, and also whatever others affect inside bpf_prog
> > due to changing on insns, for example line info inside prog->aux. So the
> > return value is bpf_prog for core layer linearization hook.
> >
> > For verifier layer, it is similar, but the context if bpf_verifier_env, the
> > linearization hook should linearize the insn list, and also those affected
> > inside env, for example bpf_insn_aux_data, so the return value is
> > bpf_verifier_env, meaning returning an updated verifier context
> > (bpf_verifier_env) after insn list linearization.
>
> Realized your point is that no new env is allocated, so just return an error
> code. Yes, the env pointer is not changed, just the internal data is
> updated. Returning bpf_verifier_env was mostly trying to make the hook
> clearer in that it returns an updated "context" where the linearization
> happens: for the verifier layer it is bpf_verifier_env, and for the core
> layer it is bpf_prog, so the return value was designed to return these two
> types.

Oh, I missed that the core layer returns bpf_prog*. I think this is
confusing as hell and very contrary to what one would expect. If
the function doesn't allocate those objects, it shouldn't return them,
except for rare cases of some accessor functions. Reading this,
I'll always be surprised and will have to go skim the code just to check
whether those functions really return a new bpf_prog or
bpf_verifier_env, respectively.

Please change them both to just return error code.
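i.e., hypothetically, something like:

  /* core layer: may realloc the prog, hence the double pointer */
  int bpf_linearize_list_insn(struct bpf_prog **prog,
  			      struct bpf_list_insn *list);

  /* verifier layer: env->prog and env->insn_aux_data updated in place */
  int verifier_linearize_list_insn(struct bpf_verifier_env *env,
  				   struct bpf_list_insn *list);

Both return 0 or a negative error, and the caller keeps working with the
(possibly reallocated) prog through env or the prog pointer it passed in.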

>
> >
> > Make sense?
> >
> > Regards,
> > Jiong
> >
> >>
> >>> +{
> >>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
> >>> +       struct bpf_subprog_info *new_subinfo;
> >>> +       struct bpf_insn_aux_data *new_data;
> >>> +       struct bpf_prog *prog = env->prog;
> >>> +       struct bpf_verifier_env *ret_env;
> >>> +       struct bpf_insn *insns, *insn;
> >>> +       struct bpf_list_insn *elem;
> >>> +       int ret;
> >>> +
> >>> +       /* Calculate final size. */
> >>> +       for (elem = list; elem; elem = elem->next)
> >>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
> >>> +                       fini_cnt++;
> >>> +
> >>> +       orig_cnt = prog->len;
> >>> +       insns = prog->insnsi;
> >>> +       /* If prog length remains same, nothing else to do. */
> >>> +       if (fini_cnt == orig_cnt) {
> >>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
> >>> +                       *insn = elem->insn;
> >>> +               return env;
> >>> +       }
> >>> +       /* Realloc insn buffer when necessary. */
> >>> +       if (fini_cnt > orig_cnt)
> >>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
> >>> +                                       GFP_USER);
> >>> +       if (!prog)
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       insns = prog->insnsi;
> >>> +       prog->len = fini_cnt;
> >>> +       ret_env = env;
> >>> +
> >>> +       /* idx_map[OLD_IDX] = NEW_IDX */
> >>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
> >>> +       if (!idx_map)
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
> >>> +
> >>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
> >>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
> >>> +       if (!new_data) {
> >>> +               kvfree(idx_map);
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       }
> >>> +
> >>> +       /* Copy over insn + calculate idx_map. */
> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> >>> +               int orig_idx = elem->orig_idx - 1;
> >>> +
> >>> +               if (orig_idx >= 0) {
> >>> +                       idx_map[orig_idx] = idx;
> >>> +
> >>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
> >>> +                               continue;
> >>> +
> >>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
> >>> +
> >>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
> >>> +                               new_data[idx].zext_dst =
> >>> +                                       insn_has_def32(env, &elem->insn);
> >>> +               } else {
> >>> +                       new_data[idx].seen = true;
> >>> +                       new_data[idx].zext_dst = insn_has_def32(env,
> >>> +                                                               &elem->insn);
> >>> +               }
> >>> +               insns[idx++] = elem->insn;
> >>> +       }
> >>> +
> >>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
> >>> +       if (!new_subinfo) {
> >>> +               kvfree(idx_map);
> >>> +               vfree(new_data);
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       }
> >>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
> >>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
> >>> +       env->subprog_cnt = 0;
> >>> +       env->prog = prog;
> >>> +       ret = add_subprog(env, 0);
> >>> +       if (ret < 0) {
> >>> +               ret_env = ERR_PTR(ret);
> >>> +               goto free_all_ret;
> >>> +       }
> >>> +       /* Relocate jumps using idx_map.
> >>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
> >>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
> >>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
> >>> +        */
> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> >>> +               int orig_idx = elem->orig_idx;
> >>> +
> >>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
> >>> +                       continue;
> >>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
> >>> +                       idx++;
> >>> +                       continue;
> >>> +               }
> >>> +
> >>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
> >>> +                                         idx_map);
> >>> +               if (ret < 0) {
> >>> +                       ret_env = ERR_PTR(ret);
> >>> +                       goto free_all_ret;
> >>> +               }
> >>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
> >>> +               if (ret > 0) {
> >>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
> >>> +                       if (ret < 0) {
> >>> +                               ret_env = ERR_PTR(ret);
> >>> +                               goto free_all_ret;
> >>> +                       }
> >>> +               }
> >>> +               idx++;
> >>> +       }
> >>> +       if (ret < 0) {
> >>> +               ret_env = ERR_PTR(ret);
> >>> +               goto free_all_ret;
> >>> +       }
> >>> +
> >>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
> >>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
> >>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
> >>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
> >>> +
> >>> +       /* Adjust linfo.
> >>> +        * FIXME: no support for insn removal at the moment.
> >>> +        */
> >>> +       if (prog->aux->nr_linfo) {
> >>> +               struct bpf_line_info *linfo = prog->aux->linfo;
> >>> +               u32 nr_linfo = prog->aux->nr_linfo;
> >>> +
> >>> +               for (idx = 0; idx < nr_linfo; idx++)
> >>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
> >>> +       }
> >>> +       vfree(env->insn_aux_data);
> >>> +       env->insn_aux_data = new_data;
> >>> +       goto free_mem_list_ret;
> >>> +free_all_ret:
> >>> +       vfree(new_data);
> >>> +free_mem_list_ret:
> >>> +       kvfree(new_subinfo);
> >>> +       kvfree(idx_map);
> >>> +       return ret_env;
> >>> +}
> >>> +
> >>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
> >>>  {
> >>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
> >>> --
> >>> 2.7.4
> >>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-12 19:43     ` Andrii Nakryiko
@ 2019-07-15  9:21       ` Jiong Wang
  2019-07-15 22:55         ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-15  9:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers, Yonghong Song


Andrii Nakryiko writes:

> On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >> This is an RFC based on latest bpf-next about acclerating insn patching
>> >> speed, it is now near the shape of final PATCH set, and we could see the
>> >> changes migrating to list patching would brings, so send out for
>> >> comments. Most of the info are in cover letter. I splitted the code in a
>> >> way to show API migration more easily.
>> >
>> >
>> > Hey Jiong,
>> >
>> >
>> > Sorry, took me a while to get to this and learn more about instruction
>> > patching. Overall this looks good and I think is a good direction.
>> > I'll post high-level feedback here, and some more
>> > implementation-specific ones in corresponding patches.
>>
>> Great, thanks very much for the feedbacks. Most of your feedbacks are
>> hitting those pain points I exactly had ran into. For some of them, I
>> thought similar solutions like yours, but failed due to various
>> reasons. Let's go through them again, I could have missed some important
>> things.
>>
>> Please see my replies below.
>
> Thanks for thoughtful reply :)
>
>>
>> >>
>> >> Test Results
>> >> ===
>> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
>> >>     modes (interpreter, JIT, JIT with blinding).
>> >>
>> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
>> >>     patching time from 5100s (nearly one and a half hour) to less than
>> >>     0.5s for 1M insn patching.
>> >>
>> >> Known Issues
>> >> ===
>> >>   - The following warning is triggered when running scale test which
>> >>     contains 1M insns and patching:
>> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
>> >>
>> >>     This is caused by existing code, it can be reproduced on bpf-next
>> >>     master with jit blinding enabled, then run scale unit test, it will
>> >>     shown up after half an hour. After this set, patching is very fast, so
>> >>     it shows up quickly.
>> >>
>> >>   - No line info adjustment support when doing insn delete, subprog adj
>> >>     is with bug when doing insn delete as well. Generally, removal of insns
>> >>     could possibly cause remove of entire line or subprog, therefore
>> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
>> >>     don't have good idea and clean code for integrating this into the
>> >>     linearization code at the moment, will do more experimenting,
>> >>     appreciate ideas and suggestions on this.
>> >
>> > Is there any specific problem to detect which line info to delete? Or
>> > what am I missing besides careful implementation?
>>
>> Mostly, line info and subprog info are range info which covers a range of
>> insns. Deleting insns could require adjusting a range or removing one
>> range entirely. subprog info could be fully recalculated during
>> linearization, while line info needs some careful implementation, and I
>> failed to have clean code for this during linearization; also, as said,
>> there are no unit tests to help me understand whether the code is correct.
>>
>
> Ok, that's good that it's just about clean implementation. Try to
> implement it as clearly as possible. Then post it here, and if it can
> be improved someone (me?) will try to help to clean it up further.
>
> Not a big expert on line info, so can't comment on that,
> unfortunately. Maybe Yonghong can chime in (cc'ed)
>
>
>> I will described this latter, spent too much time writing the following
>> reply. Might worth an separate discussion thread.
>>
>> >>
>> >>     Insn delete doesn't happen on normal programs, for example Cilium
>> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
>> >>     not good. That's also why this RFC have a full pass on selftest with
>> >>     this known issue.
>> >
>> > I hope you'll add test for deletion (and w/ corresponding line info)
>> > in final patch set :)
>>
>> Will try. Need to spend some time on BTF format.
>> >
>> >>
>> >>   - Could further use mem pool to accelerate the speed, changes are trivial
>> >>     on top of this RFC, and could be 2x extra faster. Not included in this
>> >>     RFC as reducing the algo complexity from quadratic to linear of insn
>> >>     number is the first step.
>> >
>> > Honestly, I think that would add more complexity than necessary, and I
>> > think we can further speed up performance without that, see below.
>> >
>> >>
>> >> Background
>> >> ===
>> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
>> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
>> >> remove insns.
>> >>
>> >> At the moment, insn patching is quadratic of insn number, this is due to
>> >> branch targets of jump insns needs to be adjusted, and the algo used is:
>> >>
>> >>   for insn inside prog
>> >>     patch insn + regeneate bpf prog
>> >>     for insn inside new prog
>> >>       adjust jump target
>> >>
>> >> This is causing significant time spending when a bpf prog requires large
>> >> amount of patching on different insns. Benchmarking shows it could take
>> >> more than half minutes to finish patching when patching number is more
>> >> than 50K, and the time spent could be more than one hour when patching
>> >> number is around 1M.
>> >>
>> >>   15000   :    3s
>> >>   45000   :   29s
>> >>   95000   :  125s
>> >>   195000  :  712s
>> >>   1000000 : 5100s
>> >>
>> >> This RFC introduces new patching infrastructure. Before doing insn
>> >> patching, insns in bpf prog are turned into a singly linked list, insert
>> >> new insns just insert new list node, delete insns just set delete flag.
>> >> And finally, the list is linearized back into array, and branch target
>> >> adjustment is done for all jump insns during linearization. This algo
>> >> brings the time complexity from quadratic to linear of insn number.
>> >>
>> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
>> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
>> >> to less than 0.5s.
>> >>
>> >> Patching API
>> >> ===
>> >> Insn patching could happen on two layers inside BPF. One is "core layer"
>> >> where only BPF insns are patched. The other is "verification layer" where
>> >> insns have corresponding aux info as well high level subprog info, so
>> >> insn patching means aux info needs to be patched as well, and subprog info
>> >> needs to be adjusted. BPF prog also has debug info associated, so line info
>> >> should always be updated after insn patching.
>> >>
>> >> So, list creation, destroy, insert, delete is the same for both layer,
>> >> but lineration is different. "verification layer" patching require extra
>> >> work. Therefore the patch APIs are:
>> >>
>> >>    list creation:                bpf_create_list_insn
>> >>    list patch:                   bpf_patch_list_insn
>> >>    list pre-patch:               bpf_prepatch_list_insn
>> >
>> > I think pre-patch name is very confusing, until I read full
>> > description I couldn't understand what it's supposed to be used for.
>> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
>> > me wondering whether instruction buffer is inserted after instruction,
>> > or instruction is replaced with a bunch of instructions.
>> >
>> > So how about two more specific names:
>> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
>> > instruction with a list of patch instructions)
>> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
>> > one is pretty clear).
>>
>> My sense on English word is not great, will switch to above which indeed
>> reads more clear.
>>
>> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
>> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
>> >
>> > These two functions are both quite involved, as well as share a lot of
>> > common code. I'd rather have one linearize instruction, that takes env
>> > as an optional parameter. If env is specified (which is the case for
>> > all cases except for constant blinding pass), then adjust aux_data and
>> > subprogs along the way.
>>
>> Two versions of linearization and how to unify them were a pain point to me. I
>> thought about factoring some of the common code out, but it actually doesn't
>> amount to much; the final size counting + insnsi resize parts are the same,
>> then things start to diverge at the "Copy over insn" loop.
>>
>> verifier layer needs to copy and initialize aux data etc. And jump
>> relocation is different. At core layer, the use case is JIT blinding which
>> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
>
> Sorry, I didn't get what "could expand an jump_imm insn into a
> and/or/jump_reg sequence", maybe you can clarify if I'm missing
> something.
>
> But from your cover letter description, core layer has no jumps at
> all, while verifier has jumps inside patch buffer. So, if you support
> jumps inside of patch buffer, it will automatically work for core
> layer. Or what am I missing?

I meant that in the core layer (JIT blinding), there is the following patching:

input:
  insn 0             insn 0
  insn 1             insn 1
  jmp_imm   >>       mov_imm  \
  insn 2             xor_imm    insn seq expanded from jmp_imm
  insn 3             jmp_reg  /
                     insn 2
                     insn 3


jmp_imm is the insn that will be patched, and the actual transformation
is to expand it into a mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
at the end of the patch buffer, must jump to the same destination as the
original jmp_imm, so "jmp_reg" is an insn inside the patch buffer that still
needs to be relocated, because its jump destination is outside the patch buffer.

This means the core layer (jit blinding) needs to take care of insns
inside the patch buffer.
  
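The patch buffer for that case is generated roughly like this (simplified
from what bpf_jit_blind_insn() emits today, imm_rnd being the per-insn
random value; the real code also compensates off for backward jumps):

  /* jmp_imm dst, imm, off  ==>  blinded sequence */
  *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
  *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
  *to++ = BPF_JMP_REG(from->code, from->dst_reg, BPF_REG_AX, from->off);

The last jmp_reg keeps the original off, which points outside the patch
buffer, so it is exactly the insn that still needs relocation at
linearization time.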
> Just compared two version of linearize side by side. From what I can
> see, unified version could look like this, high-level:
>
> 1. Count final insn count (but see my other suggestions how to avoid
> that altogether). If not changed - exit.
> 2. Realloc insn buffer, copy just instructions (not aux_data yet).
> Build idx_map, if necessary.
> 3. (if env) then bpf_patchable_insn has aux_data, so now do another
> pass and copy it into resulting array.
> 4. (if env) Copy sub info. Though I'd see if we can just reuse old
> ones and just adjust offsets. I'm not sure why we need to allocate new
> array, subprogram count shouldn't change, right?

If there is no dead insn elimination opt, then we could just adjust
offsets. When there is insn deletion, I feel the logic becomes more
complex. One subprog could be completely or partially deleted, so
I feel just recalculating the whole subprog info as a side product is
much simpler.

> 5. (common) Relocate jumps. Not clear why core layer doesn't care
> about PATCHED (or, alternatively, why verifier layer cares).

See above: in this RFC, the core layer cares about PATCHED when relocating
jumps, and the verifier layer doesn't.

> And again, with targets pointer it will look totally different (and
> simpler).

Yes, will see how the code looks.

> 6. (if env) adjust subprogs
> 7. (common) Adjust prog's line info.
>
> The devil is in the details, but I think this will still be better if
> contained in one function if a bunch of `if (env)` checks. Still
> pretty linear.
>
>> jump_reg is at the end of the patch buffer, it should be relocated. While
>> all use case in verifier layer, no jump in the prog will be patched and all
>> new jumps in patch buffer will jump inside the buffer locally so no need to
>> resolve.
>>
>> And yes we could unify them into one and control the diverge using
>> argument, but then where to place the function is an issue. My
>> understanding is verifier.c is designed to be on top of core.c and core.c
>> should not reference and no need to be aware of any verifier specific data
>> structures, for example env or bpf_aux_insn_data etc.
>
> Func prototype where it is. Maybe forward-declare verifier env struct.
> Implementation in verifier.c?
>
>>
>> So, in this RFC, I had choosed to write separate linerization function for
>> core and verifier layer. Does this make sense?
>
> See above. Let's still try to make it better.
>
>>
>> >
>> > This would keep logic less duplicated and shouldn't complexity beyond
>> > few null checks in few places.
>> >
>> >>    list destroy:                 bpf_destroy_list_insn
>> >>
>> >
>> > I'd also add a macro foreach_list_insn instead of explicit for loops
>> > in multiple places. That would also allow to skip deleted instructions
>> > transparently.
>> >
>> >> list patch could change the insn at patch point, it will invalid the aux
>> >
>> > typo: invalid -> invalidate
>>
>> Ack.
>>
>> >
>> >> info at patching point. list pre-patch insert new insns before patch point
>> >> where the insn and associated aux info are not touched, it is used for
>> >> example in convert_ctx_access when generating prologue.
>> >>
>> >> Typical API sequence for one patching pass:
>> >>
>> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>> >>    for (elem = list; elem; elem = elem->next)
>> >>       patch_buf = gen_patch_buf_logic;
>> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>> >>    bpf_prog = bpf_linearize_list_insn(list)
>> >>    bpf_destroy_list_insn(list)
>> >>
>> >> Several patching passes could also share the same list:
>> >>
>> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>> >>    for (elem = list; elem; elem = elem->next)
>> >>       patch_buf = gen_patch_buf_logic1;
>> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>> >>    for (elem = list; elem; elem = elem->next)
>> >>       patch_buf = gen_patch_buf_logic2;
>> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>> >>    bpf_prog = bpf_linearize_list_insn(list)
>> >>    bpf_destroy_list_insn(list)
>> >>
>> >> but note new inserted insns int early passes won't have aux info except
>> >> zext info. So, if one patch pass requires all aux info updated and
>> >> recalculated for all insns including those pathced, it should first
>> >> linearize the old list, then re-create the list. The RFC always create and
>> >> linearize the list for each migrated patching pass separately.
>> >
>> > I think we should do just one list creation, few passes of patching
>> > and then linearize once. That will save quite a lot of memory
>> > allocation and will speed up a lot of things. All the verifier
>> > patching happens one after the other without any other functionality
>> > in between, so there shouldn't be any problem.
>>
>> Yes, as mentioned above, it is possible and I had tried to do it in an very
>> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
>> same list, but then the 32-bit zero extension insertion pass requires
>> aux.zext_dst set properly for all instructions including those patched
>
> So zext_dst. Seems like it's easily calculatable, so doesn't seem like
> it even needs to be accessed from aux_data.
>
> But. I can see at least two ways to do this:
> 1. those patching passes that care about aux_data, should just do
> extra check for NULL. Because when we adjust insns now, we just leave
> zero-initialized aux_data, except for zext_dst and seen. So it's easy
> to default to them if aux_data is NULL for patchable_insn.
> 2. just allocate and fill them out them when applying patch insns
> buffer. It's not a duplication, we already fill them out during
> patching today. So just do the same, except through malloc()'ed
> pointer instead. At the end they will be copied into linear resulting
> array during linearization (uniformly with non-patched insns).
>
>> one which we need to linearize the list first (as we set zext_dst during
>> linerization), or the other choice is we do the zext_dst initialization
>> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
>> between core and verifier layer.
>
> List construction is much simpler, even if we have to have extra
> check, similar to `if (env) { do_extra(); }`, IMO, it's fine.
>
>>
>> > As for aux_data. We can solve that even more simply and reliably by
>> > storing a pointer along the struct bpf_list_insn
>>
>> This is exactly what I had implemented initially, but then the issue is how
>> to handle aux_data for patched insns. IIRC I left it as a NULL pointer,
>> but later found zext_dst info is required for all insns, so I ended up
>> duplicating zext_dst in bpf_list_insn.
>
> See above. No duplication. You have a pointer. Whether aux_data is in
> original array or was malloc()'ed, doesn't matter. But no duplication
> of fields.
>
>>
>> This leads me to worry that we would need to keep duplicating fields there
>> as soon as similar new requirements show up in future patching passes, and I
>> thought it might be better to just reference the aux_data inside env using
>> orig_idx; this avoids duplicating information, but we need to make sure the
>> used fields inside aux_data for patched insns are up to date during
>> linearization or list patching.
>>
>> > (btw, how about calling it bpf_patchable_insn?).
>>
>> No preference, will use this one.
>>
>> > Here's how I propose to represent this patchable instruction:
>> >
>> > struct bpf_list_insn {
>> >        struct bpf_insn insn;
>> >        struct bpf_list_insn *next;
>> >        struct bpf_list_insn *target;
>> >        struct bpf_insn_aux_data *aux_data;
>> >        s32 orig_idx; // can repurpose this to have three meanings:
>> >                      // -2 - deleted
>> >                      // -1 - patched/inserted insn
>> >                      // >=0 - original idx
>>
>> I actually had experimented the -2/-1/0 trick, exactly the same number
>> assignment :) IIRC the code was not clear compared with using flag, the
>> reason seems to be:
>>   1. we still need orig_idx of an patched insn somehow, meaning negate the
>>      index.
>
> Not following, the original index will be >= 0, no?
>
>>   2. somehow somecode need to know whether one insn is deleted or patched
>>      after the negation, so I end up with some ugly code.
>
> So that's why you'll have constants with descriptive name for -2 and -1.
>
>>
>> Anyway, I might had not thought hard enough on this, I will retry using the
>> special index instead of flag, hopefully I could have clean code this time.
>>
>
> Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
> (orig_idx >= 0) { ... }` are very confusing.
>
>> > };
>> >
>> > The idea would be as follows:
>> > 1. when creating original list, target pointer will point directly to
>> > a patchable instruction wrapper for jumps/calls. This will allow to
>> > stop tracking and re-calculating jump offsets and instruction indicies
>> > until linearization.
>>
>> Not sure I have followed the idea of the "target" pointer. At the moment we are
>> using an index mapping array (generated as a by-product while copying insns).
>>
>> Does the "target" pointer mean that during list initialization, each jump
>> insn will have its target initialized to the list node of the converted jump
>> destination insn, and all the non-jump insns will have NULL? Then during
>> linearization you assign an index to each list node (could be done as a
>> by-product of another pass) before copying insns, which could then relocate
>> the insn during the copy, as the "target" would have the final index
>> calculated? Am I following correctly?
>
> Yes, I think you are understanding correctly what I'm saying. For the
> implementation, you can do it in a few ways, through a few passes or with
> some additional data; that part is less important. See what's cleanest.
>
>>
>> > 2. aux_data is also filled at that point. Later at linearization time
>> > you'd just iterate over all the instructions in final order and copy
>> > original aux_data, if it's present. And then just replace env's
>> > aux_data array at the end, should be very simple and fast.
>>
>> As explained, I am worried that making aux_data a pointer will cause
>> duplication of some fields into list_insn if those fields are required for
>> patched insns.
>
> Addressed above, I don't think there will be any duplication, because
> we pass aux_data by pointer.
>
>>
>> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
>> > same list of instructions and those passes will just keep inserting
>> > instruction buffers. Given we have restriction that all the jumps are
>> > only within patch buffer, it will be trivial to construct proper
>> > patchable instruction wrappers for newly added instructions, with NULL
>> > for aux_data and possibly non-NULL target (if it's a JMP insn).
>> > 4. After those passes, linearize, adjust subprogs (for this you'll
>> > probably still need to create index mapping, right?), copy or create
>> > new aux_data.
>> > 5. Done.
>> >
>> > What do you think? I think this should be overall simpler and faster.
>> > But let me know if I'm missing something.
>>
>> Thanks for all these thoughts, they are very good suggestions and remind
>> me to revisit some points I had forgotten. I will do the following things:
>>
>>   1. retry the negative index solution to eliminate the flag, if the
>>      resulting code can be clean.
>>   2. the "target" pointer seems to make sense, it makes list_insn bigger but
>>      that is the usual space-for-time trade, so I will try to implement it
>>      and see how the code looks.
>>   3. I still have concerns about making aux_data a pointer, mostly because a
>>      patched insn will have a NULL pointer, and in case aux info of a patched
>>      insn is required, we need to duplicate that info inside list_insn. For
>>      example, the 32-bit zext opt requires zext_dst.
>>
>
>
> So one more thing I wanted to suggest. I'll try to keep high-level
> suggestions here.
>
> What about having a wrapper for patchable_insn list, where you can
> store some additional data, like final count and whatever else. It
> will eliminate some passes (counting) and will make list handling
> easier (because you can have a dummy head pointer, so no special
> handling of first element

Will try it.

> you had this concern in patch #1, I
> believe). But it will be clear if it's beneficial once implemented.

>> Regards,
>> Jiong
>>
>> >>
>> >> Compared with old patching code, this new infrastructure has much less core
>> >> code, even though the final code has a couple of extra lines but that is
>> >> mostly due to for list based infrastructure, we need to do more error
>> >> checks, so the list and associated aux data structure could be freed when
>> >> errors happens.
>> >>
>> >> Patching Restrictions
>> >> ===
>> >>   - For core layer, the linearization assume no new jumps inside patch buf.
>> >>     Currently, the only user of this layer is jit blinding.
>> >>   - For verifier layer, there could be new jumps inside patch buf, but
>> >>     they should have branch target resolved themselves, meaning new jumps
>> >>     doesn't jump to insns out of the patch buf. This is the case for all
>> >>     existing verifier layer users.
>> >>   - bpf_insn_aux_data for all patched insns including the one at patch
>> >>     point are invalidated, only 32-bit zext info will be recalcuated.
>> >>     If the aux data of insn at patch point needs to be retained, it is
>> >>     purely insn insertion, so need to use the pre-patch API.
>> >>
>> >> I plan to send out a PATCH set once I finished insn deletion line info adj
>> >> support, please have a looks at this RFC, and appreciate feedbacks.
>> >>
>> >> Jiong Wang (8):
>> >>   bpf: introducing list based insn patching infra to core layer
>> >>   bpf: extend list based insn patching infra to verification layer
>> >>   bpf: migrate jit blinding to list patching infra
>> >>   bpf: migrate convert_ctx_accesses to list patching infra
>> >>   bpf: migrate fixup_bpf_calls to list patching infra
>> >>   bpf: migrate zero extension opt to list patching infra
>> >>   bpf: migrate insn remove to list patching infra
>> >>   bpf: delete all those code around old insn patching infrastructure
>> >>
>> >>  include/linux/bpf_verifier.h |   1 -
>> >>  include/linux/filter.h       |  27 +-
>> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
>> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
>> >>  4 files changed, 580 insertions(+), 528 deletions(-)
>> >>
>> >> --
>> >> 2.7.4
>> >>
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer
  2019-07-12 19:48       ` Andrii Nakryiko
@ 2019-07-15  9:58         ` Jiong Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-15  9:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers


Andrii Nakryiko writes:

> On Thu, Jul 11, 2019 at 4:53 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >> This patch introduces list based bpf insn patching infra to bpf core layer
>> >> which is lower than verification layer.
>> >>
>> >> This layer has bpf insn sequence as the solo input, therefore the tasks
>> >> to be finished during list linerization is:
>> >>   - copy insn
>> >>   - relocate jumps
>> >>   - relocation line info.
>> >>
>> >> Suggested-by: Alexei Starovoitov <ast@kernel.org>
>> >> Suggested-by: Edward Cree <ecree@solarflare.com>
>> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> >> ---
>> >>  include/linux/filter.h |  25 +++++
>> >>  kernel/bpf/core.c      | 268 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >>  2 files changed, 293 insertions(+)
>> >>
>> >> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> >> index 1fe53e7..1fea68c 100644
>> >> --- a/include/linux/filter.h
>> >> +++ b/include/linux/filter.h
>> >> @@ -842,6 +842,31 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
>> >>                                        const struct bpf_insn *patch, u32 len);
>> >>  int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
>> >>
>> >> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
>> >> +                       int idx_map[]);
>> >> +
>> >> +#define LIST_INSN_FLAG_PATCHED 0x1
>> >> +#define LIST_INSN_FLAG_REMOVED 0x2
>> >> +struct bpf_list_insn {
>> >> +       struct bpf_insn insn;
>> >> +       struct bpf_list_insn *next;
>> >> +       s32 orig_idx;
>> >> +       u32 flag;
>> >> +};
>> >> +
>> >> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog);
>> >> +void bpf_destroy_list_insn(struct bpf_list_insn *list);
>> >> +/* Replace LIST_INSN with new list insns generated from PATCH. */
>> >> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
>> >> +                                         const struct bpf_insn *patch,
>> >> +                                         u32 len);
>> >> +/* Pre-patch list_insn with insns inside PATCH, meaning LIST_INSN is not
>> >> + * touched. New list insns are inserted before it.
>> >> + */
>> >> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
>> >> +                                            const struct bpf_insn *patch,
>> >> +                                            u32 len);
>> >> +
>> >>  void bpf_clear_redirect_map(struct bpf_map *map);
>> >>
>> >>  static inline bool xdp_return_frame_no_direct(void)
>> >> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> >> index e2c1b43..e60703e 100644
>> >> --- a/kernel/bpf/core.c
>> >> +++ b/kernel/bpf/core.c
>> >> @@ -502,6 +502,274 @@ int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
>> >>         return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
>> >>  }
>> >>
>> >> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
>> >> +                       s32 idx_map[])
>> >> +{
>> >> +       u8 code = insn->code;
>> >> +       s64 imm;
>> >> +       s32 off;
>> >> +
>> >> +       if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
>> >> +               return 0;
>> >> +
>> >> +       if (BPF_CLASS(code) == BPF_JMP &&
>> >> +           (BPF_OP(code) == BPF_EXIT ||
>> >> +            (BPF_OP(code) == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL)))
>> >> +               return 0;
>> >> +
>> >> +       /* BPF to BPF call. */
>> >> +       if (BPF_OP(code) == BPF_CALL) {
>> >> +               imm = idx_map[old_idx + insn->imm + 1] - new_idx - 1;
>> >> +               if (imm < S32_MIN || imm > S32_MAX)
>> >> +                       return -ERANGE;
>> >> +               insn->imm = imm;
>> >> +               return 1;
>> >> +       }
>> >> +
>> >> +       /* Jump. */
>> >> +       off = idx_map[old_idx + insn->off + 1] - new_idx - 1;
>> >> +       if (off < S16_MIN || off > S16_MAX)
>> >> +               return -ERANGE;
>> >> +       insn->off = off;
>> >> +       return 0;
>> >> +}
>> >> +
>> >> +void bpf_destroy_list_insn(struct bpf_list_insn *list)
>> >> +{
>> >> +       struct bpf_list_insn *elem, *next;
>> >> +
>> >> +       for (elem = list; elem; elem = next) {
>> >> +               next = elem->next;
>> >> +               kvfree(elem);
>> >> +       }
>> >> +}
>> >> +
>> >> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog)
>> >> +{
>> >> +       unsigned int idx, len = prog->len;
>> >> +       struct bpf_list_insn *hdr, *prev;
>> >> +       struct bpf_insn *insns;
>> >> +
>> >> +       hdr = kvzalloc(sizeof(*hdr), GFP_KERNEL);
>> >> +       if (!hdr)
>> >> +               return ERR_PTR(-ENOMEM);
>> >> +
>> >> +       insns = prog->insnsi;
>> >> +       hdr->insn = insns[0];
>> >> +       hdr->orig_idx = 1;
>> >> +       prev = hdr;
>> >
>> > I'm not sure why you need this "prologue" instead of handling first
>> > instruction uniformly in for loop below?
>>
>> It is because the head of the list doesn't have a predecessor, so there is
>> no need for the prev->next assignment; otherwise we could do a check inside
>> the loop to rule the head out when doing it.
>
> yeah, prev = NULL initially. Then
>
> if (prev) prev->next = node;
>
> Or see my suggestion about having a patchable_insns_list wrapper struct
> (in the cover letter thread).
>
>>
>> >> +
>> >> +       for (idx = 1; idx < len; idx++) {
>> >> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
>> >> +                                                     GFP_KERNEL);
>> >> +
>> >> +               if (!node) {
>> >> +                       /* Destroy what has been allocated. */
>> >> +                       bpf_destroy_list_insn(hdr);
>> >> +                       return ERR_PTR(-ENOMEM);
>> >> +               }
>> >> +               node->insn = insns[idx];
>> >> +               node->orig_idx = idx + 1;
>> >
>> > Why orig_idx is 1-based? It's really confusing.
>>
>> orig_idx == 0 means an insn has no original insn, i.e. it is a new
>> insn generated for patching purposes.
>>
>> LIST_INSN_FLAG_PATCHED in the RFC, on the other hand, means an insn in the
>> original prog has been patched.
>>
>> I had been trying to differentiate the above two cases, but yes, they are
>> confusing and differentiating them might be useless: if an insn in the
>> original prog is patched, all its info could be treated as clobbered and
>> needing re-calculation, or we should make conservative assumptions.
>
> An instruction will be new and not patched only in the patch buffer. Once you
> add it to the list, it is patched, no? Not sure what distinction you are
> trying to maintain here.

Never mind; the reason I was trying to differentiate them is that I
had a certain preference regarding the patched insn.

insn 1          insn 1
insn 2   >>     insn 2.1
insn 3          insn 2.2
                insn 2.3
                insn 3

I was thinking it is better to maintain the original info of a patched
insn. That is to say, insn 2 above is patched and expanded into insns
2.1/2.2/2.3, so I slightly preferred copying the aux info of insn 2 over to
insn 2.1 and only rebuilding the fields we are sure need to be updated, for
example zext, because the insn has changed.
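
To make that concrete, a minimal sketch of the policy (seed_aux_for_expansion()
is a hypothetical helper name; insn_has_def32() is the same check the RFC's
linearization already uses):

static void seed_aux_for_expansion(struct bpf_verifier_env *env,
				   struct bpf_insn_aux_data *new_aux,
				   const struct bpf_insn_aux_data *old_aux,
				   struct bpf_insn *first_new_insn)
{
	/* Insn 2 is expanded into 2.1/2.2/2.3: seed 2.1's aux data from
	 * insn 2 and recompute only what the rewrite invalidates, here
	 * just the 32-bit zext flag.
	 */
	*new_aux = *old_aux;
	new_aux->zext_dst = insn_has_def32(env, first_new_insn);
}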

>> >
>> >> +               prev->next = node;
>> >> +               prev = node;
>> >> +       }
>> >> +
>> >> +       return hdr;
>> >> +}
>> >> +
>
> [...]
>
>> >> +
>> >> +       len--;
>> >> +       patch++;
>> >> +
>> >> +       prev = list_insn;
>> >> +       next = list_insn->next;
>> >> +       for (idx = 0; idx < len; idx++) {
>> >> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
>> >> +                                                     GFP_KERNEL);
>> >> +
>> >> +               if (!node) {
>> >> +                       /* Link what's allocated, so list destroyer could
>> >> +                        * free them.
>> >> +                        */
>> >> +                       prev->next = next;
>> >
>> > Why this special handling, if you can just insert element so that list
>> > is well-formed after each instruction?
>>
>> Good idea, just always do "node->next = next"; the "prev->next = node" in
>> the next round will fix it up.
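
(To make that concrete, the resulting loop body would look roughly like the
following; a sketch for illustration, not the posted code. Since the list
stays well-formed after every iteration, the error path no longer needs any
re-linking before bailing out.)

	for (idx = 0; idx < len; idx++) {
		struct bpf_list_insn *node = kvzalloc(sizeof(*node),
						      GFP_KERNEL);

		/* List is already well-formed, destroyer can walk it as-is. */
		if (!node)
			return ERR_PTR(-ENOMEM);

		node->insn = patch[idx];
		node->next = next;
		prev->next = node;
		prev = node;
	}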
>>
>> >
>> >> +                       return ERR_PTR(-ENOMEM);
>> >> +               }
>> >> +
>> >> +               node->insn = patch[idx];
>> >> +               prev->next = node;
>> >> +               prev = node;
>> >
>> > E.g.,
>> >
>> > node->next = next;
>> > prev->next = node;
>> > prev = node;
>> >
>> >> +       }
>> >> +
>> >> +       prev->next = next;
>> >
>> > And no need for this either.
>> >
>> >> +       return prev;
>> >> +}
>> >> +
>> >> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
>> >> +                                            const struct bpf_insn *patch,
>> >> +                                            u32 len)
>> >
>> > prepatch and patch functions should share the same logic.
>> >
>> > Prepend is just that - insert all instructions from buffer before current insns.
>> > Patch -> replace current one with first instriction in a buffer, then
>> > prepend remaining ones before the next instruction (so patch should
>> > call info prepend, with adjusted count and array pointer).
>>
>> Ack, there are indeed quite a few things to simplify.
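
For example, a rough sketch of layering the replace operation on top of the
prepend helper (the flag name follows the RFC; returning the last node of the
patch, as the RFC's version does so the caller's loop can skip over it, is
glossed over here):

struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
					  const struct bpf_insn *patch, u32 len)
{
	struct bpf_list_insn *rest;

	if (!len)
		return list_insn;

	/* Replace the current insn with patch[0] in place. */
	list_insn->insn = patch[0];
	list_insn->flag |= LIST_INSN_FLAG_PATCHED;

	if (len == 1)
		return list_insn;

	/* Prepend the remaining insns before the next node. */
	rest = bpf_prepatch_list_insn(list_insn->next, patch + 1, len - 1);
	if (IS_ERR(rest))
		return rest;

	list_insn->next = rest;
	return list_insn;
}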
>>
>> >
>> >> +{
>> >> +       struct bpf_list_insn *prev, *node, *begin_node;
>> >> +       u32 idx;
>> >> +
>> >> +       if (!len)
>> >> +               return list_insn;
>> >> +
>> >> +       node = kvzalloc(sizeof(*node), GFP_KERNEL);
>> >> +       if (!node)
>> >> +               return ERR_PTR(-ENOMEM);
>> >> +       node->insn = patch[0];
>> >> +       begin_node = node;
>> >> +       prev = node;
>> >> +
>> >> +       for (idx = 1; idx < len; idx++) {
>> >> +               node = kvzalloc(sizeof(*node), GFP_KERNEL);
>> >> +               if (!node) {
>> >> +                       node = begin_node;
>> >> +                       /* Release what's has been allocated. */
>> >> +                       while (node) {
>> >> +                               struct bpf_list_insn *next = node->next;
>> >> +
>> >> +                               kvfree(node);
>> >> +                               node = next;
>> >> +                       }
>> >> +                       return ERR_PTR(-ENOMEM);
>> >> +               }
>> >> +               node->insn = patch[idx];
>> >> +               prev->next = node;
>> >> +               prev = node;
>> >> +       }
>> >> +
>> >> +       prev->next = list_insn;
>> >> +       return begin_node;
>> >> +}
>> >> +
>> >>  void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
>> >>  {
>> >>         int i;
>> >> --
>> >> 2.7.4
>> >>
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-12 19:51         ` Andrii Nakryiko
@ 2019-07-15 10:02           ` Jiong Wang
  2019-07-15 22:29             ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Jiong Wang @ 2019-07-15 10:02 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Edward Cree, Naveen N. Rao, Jakub Kicinski, bpf, Networking,
	oss-drivers


Andrii Nakryiko writes:

> On Thu, Jul 11, 2019 at 5:20 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Jiong Wang writes:
>>
>> > Andrii Nakryiko writes:
>> >
>> >> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>>
>> >>> Verification layer also needs to handle auxiliar info as well as adjusting
>> >>> subprog start.
>> >>>
>> >>> At this layer, insns inside patch buffer could be jump, but they should
>> >>> have been resolved, meaning they shouldn't jump to insn outside of the
>> >>> patch buffer. Lineration function for this layer won't touch insns inside
>> >>> patch buffer.
>> >>>
>> >>> Adjusting subprog is finished along with adjusting jump target when the
>> >>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
>> >>> But adjustment when there is insn deleteion is not considered yet.
>> >>>
>> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> >>> ---
>> >>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>  1 file changed, 150 insertions(+)
>> >>>
>> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> >>> index a2e7637..2026d64 100644
>> >>> --- a/kernel/bpf/verifier.c
>> >>> +++ b/kernel/bpf/verifier.c
>> >>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
>> >>>         }
>> >>>  }
>> >>>
>> >>> +/* Linearize bpf list insn to array (verifier layer). */
>> >>> +static struct bpf_verifier_env *
>> >>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
>> >>> +                            struct bpf_list_insn *list)
>> >>
>> >> It's unclear why this returns env back? It's not allocating a new env,
>> >> so it's weird and unnecessary. Just return error code.
>> >
>> > The reason is I was thinking we have two layers in BPF, the core and the
>> > verifier.
>> >
>> > For core layer (the relevant file is core.c), when doing patching, the
>> > input is insn list and bpf_prog, the linearization should linearize the
>> > insn list into insn array, and also whatever others affect inside bpf_prog
>> > due to changing on insns, for example line info inside prog->aux. So the
>> > return value is bpf_prog for core layer linearization hook.
>> >
>> > For verifier layer, it is similar, but the context if bpf_verifier_env, the
>> > linearization hook should linearize the insn list, and also those affected
>> > inside env, for example bpf_insn_aux_data, so the return value is
>> > bpf_verifier_env, meaning returning an updated verifier context
>> > (bpf_verifier_env) after insn list linearization.
>>
>> Realized your point is no new env is allocated, so just return error
>> code. Yes, the env pointer is not changed, just internal data is
>> updated. Return bpf_verifier_env mostly is trying to make the hook more
>> clear that it returns an updated "context" where the linearization happens,
>> for verifier layer, it is bpf_verifier_env, and for core layer, it is
>> bpf_prog, so return value was designed to return these two types.
>
> Oh, I missed that core layer returns bpf_prog*. I think this is
> confusing as hell and is very contrary to what one would expect. If
> the function doesn't allocate those objects, it shouldn't return them,
> except for rare cases of some accessor functions. Me reading this,
> I'll always be suprised and will have to go skim code just to check
> whether those functions really return new bpf_prog or
> bpf_verifier_env, respectively.

bpf_prog_realloc does return a new bpf_prog, so we will need to return a
bpf_prog * for the core layer.
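
(Just for illustration, one way to keep an int return while still
accommodating the realloc is to pass the possibly-reallocated prog back
through an out parameter; a sketch, not what the RFC posted:)

int bpf_linearize_list_insn(struct bpf_prog **prog, struct bpf_list_insn *list)
{
	unsigned int fini_cnt = 0;
	struct bpf_list_insn *elem;
	struct bpf_prog *new_prog;

	for (elem = list; elem; elem = elem->next)
		if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
			fini_cnt++;

	if (fini_cnt > (*prog)->len) {
		new_prog = bpf_prog_realloc(*prog, bpf_prog_size(fini_cnt),
					    GFP_USER);
		if (!new_prog)
			return -ENOMEM;
		*prog = new_prog;
	}
	(*prog)->len = fini_cnt;

	/* ... copy insns over and relocate jumps as in the RFC ... */
	return 0;
}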

>
> Please change them both to just return error code.
>
>>
>> >
>> > Make sense?
>> >
>> > Regards,
>> > Jiong
>> >
>> >>
>> >>> +{
>> >>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
>> >>> +       struct bpf_subprog_info *new_subinfo;
>> >>> +       struct bpf_insn_aux_data *new_data;
>> >>> +       struct bpf_prog *prog = env->prog;
>> >>> +       struct bpf_verifier_env *ret_env;
>> >>> +       struct bpf_insn *insns, *insn;
>> >>> +       struct bpf_list_insn *elem;
>> >>> +       int ret;
>> >>> +
>> >>> +       /* Calculate final size. */
>> >>> +       for (elem = list; elem; elem = elem->next)
>> >>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
>> >>> +                       fini_cnt++;
>> >>> +
>> >>> +       orig_cnt = prog->len;
>> >>> +       insns = prog->insnsi;
>> >>> +       /* If prog length remains same, nothing else to do. */
>> >>> +       if (fini_cnt == orig_cnt) {
>> >>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
>> >>> +                       *insn = elem->insn;
>> >>> +               return env;
>> >>> +       }
>> >>> +       /* Realloc insn buffer when necessary. */
>> >>> +       if (fini_cnt > orig_cnt)
>> >>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
>> >>> +                                       GFP_USER);
>> >>> +       if (!prog)
>> >>> +               return ERR_PTR(-ENOMEM);
>> >>> +       insns = prog->insnsi;
>> >>> +       prog->len = fini_cnt;
>> >>> +       ret_env = env;
>> >>> +
>> >>> +       /* idx_map[OLD_IDX] = NEW_IDX */
>> >>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
>> >>> +       if (!idx_map)
>> >>> +               return ERR_PTR(-ENOMEM);
>> >>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
>> >>> +
>> >>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
>> >>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
>> >>> +       if (!new_data) {
>> >>> +               kvfree(idx_map);
>> >>> +               return ERR_PTR(-ENOMEM);
>> >>> +       }
>> >>> +
>> >>> +       /* Copy over insn + calculate idx_map. */
>> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> >>> +               int orig_idx = elem->orig_idx - 1;
>> >>> +
>> >>> +               if (orig_idx >= 0) {
>> >>> +                       idx_map[orig_idx] = idx;
>> >>> +
>> >>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> >>> +                               continue;
>> >>> +
>> >>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
>> >>> +
>> >>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
>> >>> +                               new_data[idx].zext_dst =
>> >>> +                                       insn_has_def32(env, &elem->insn);
>> >>> +               } else {
>> >>> +                       new_data[idx].seen = true;
>> >>> +                       new_data[idx].zext_dst = insn_has_def32(env,
>> >>> +                                                               &elem->insn);
>> >>> +               }
>> >>> +               insns[idx++] = elem->insn;
>> >>> +       }
>> >>> +
>> >>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
>> >>> +       if (!new_subinfo) {
>> >>> +               kvfree(idx_map);
>> >>> +               vfree(new_data);
>> >>> +               return ERR_PTR(-ENOMEM);
>> >>> +       }
>> >>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
>> >>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
>> >>> +       env->subprog_cnt = 0;
>> >>> +       env->prog = prog;
>> >>> +       ret = add_subprog(env, 0);
>> >>> +       if (ret < 0) {
>> >>> +               ret_env = ERR_PTR(ret);
>> >>> +               goto free_all_ret;
>> >>> +       }
>> >>> +       /* Relocate jumps using idx_map.
>> >>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
>> >>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
>> >>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
>> >>> +        */
>> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> >>> +               int orig_idx = elem->orig_idx;
>> >>> +
>> >>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> >>> +                       continue;
>> >>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
>> >>> +                       idx++;
>> >>> +                       continue;
>> >>> +               }
>> >>> +
>> >>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
>> >>> +                                         idx_map);
>> >>> +               if (ret < 0) {
>> >>> +                       ret_env = ERR_PTR(ret);
>> >>> +                       goto free_all_ret;
>> >>> +               }
>> >>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
>> >>> +               if (ret > 0) {
>> >>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
>> >>> +                       if (ret < 0) {
>> >>> +                               ret_env = ERR_PTR(ret);
>> >>> +                               goto free_all_ret;
>> >>> +                       }
>> >>> +               }
>> >>> +               idx++;
>> >>> +       }
>> >>> +       if (ret < 0) {
>> >>> +               ret_env = ERR_PTR(ret);
>> >>> +               goto free_all_ret;
>> >>> +       }
>> >>> +
>> >>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
>> >>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
>> >>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
>> >>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
>> >>> +
>> >>> +       /* Adjust linfo.
>> >>> +        * FIXME: no support for insn removal at the moment.
>> >>> +        */
>> >>> +       if (prog->aux->nr_linfo) {
>> >>> +               struct bpf_line_info *linfo = prog->aux->linfo;
>> >>> +               u32 nr_linfo = prog->aux->nr_linfo;
>> >>> +
>> >>> +               for (idx = 0; idx < nr_linfo; idx++)
>> >>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
>> >>> +       }
>> >>> +       vfree(env->insn_aux_data);
>> >>> +       env->insn_aux_data = new_data;
>> >>> +       goto free_mem_list_ret;
>> >>> +free_all_ret:
>> >>> +       vfree(new_data);
>> >>> +free_mem_list_ret:
>> >>> +       kvfree(new_subinfo);
>> >>> +       kvfree(idx_map);
>> >>> +       return ret_env;
>> >>> +}
>> >>> +
>> >>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
>> >>>  {
>> >>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>> >>> --
>> >>> 2.7.4
>> >>>
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-15 10:02           ` Jiong Wang
@ 2019-07-15 22:29             ` Andrii Nakryiko
  2019-07-16  8:12               ` Jiong Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-15 22:29 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Edward Cree, Naveen N. Rao, Jakub Kicinski, bpf, Networking,
	oss-drivers

On Mon, Jul 15, 2019 at 3:02 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Thu, Jul 11, 2019 at 5:20 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >>
> >> Jiong Wang writes:
> >>
> >> > Andrii Nakryiko writes:
> >> >
> >> >> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >> >>>
> >> >>> Verification layer also needs to handle auxiliar info as well as adjusting
> >> >>> subprog start.
> >> >>>
> >> >>> At this layer, insns inside patch buffer could be jump, but they should
> >> >>> have been resolved, meaning they shouldn't jump to insn outside of the
> >> >>> patch buffer. Lineration function for this layer won't touch insns inside
> >> >>> patch buffer.
> >> >>>
> >> >>> Adjusting subprog is finished along with adjusting jump target when the
> >> >>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
> >> >>> But adjustment when there is insn deleteion is not considered yet.
> >> >>>
> >> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> >>> ---
> >> >>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>  1 file changed, 150 insertions(+)
> >> >>>
> >> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> >>> index a2e7637..2026d64 100644
> >> >>> --- a/kernel/bpf/verifier.c
> >> >>> +++ b/kernel/bpf/verifier.c
> >> >>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
> >> >>>         }
> >> >>>  }
> >> >>>
> >> >>> +/* Linearize bpf list insn to array (verifier layer). */
> >> >>> +static struct bpf_verifier_env *
> >> >>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
> >> >>> +                            struct bpf_list_insn *list)
> >> >>
> >> >> It's unclear why this returns env back? It's not allocating a new env,
> >> >> so it's weird and unnecessary. Just return error code.
> >> >
> >> > The reason is I was thinking we have two layers in BPF, the core and the
> >> > verifier.
> >> >
> >> > For core layer (the relevant file is core.c), when doing patching, the
> >> > input is insn list and bpf_prog, the linearization should linearize the
> >> > insn list into insn array, and also whatever others affect inside bpf_prog
> >> > due to changing on insns, for example line info inside prog->aux. So the
> >> > return value is bpf_prog for core layer linearization hook.
> >> >
> >> > For verifier layer, it is similar, but the context if bpf_verifier_env, the
> >> > linearization hook should linearize the insn list, and also those affected
> >> > inside env, for example bpf_insn_aux_data, so the return value is
> >> > bpf_verifier_env, meaning returning an updated verifier context
> >> > (bpf_verifier_env) after insn list linearization.
> >>
> >> Realized your point is no new env is allocated, so just return error
> >> code. Yes, the env pointer is not changed, just internal data is
> >> updated. Return bpf_verifier_env mostly is trying to make the hook more
> >> clear that it returns an updated "context" where the linearization happens,
> >> for verifier layer, it is bpf_verifier_env, and for core layer, it is
> >> bpf_prog, so return value was designed to return these two types.
> >
> > Oh, I missed that core layer returns bpf_prog*. I think this is
> > confusing as hell and is very contrary to what one would expect. If
> > the function doesn't allocate those objects, it shouldn't return them,
> > except for rare cases of some accessor functions. Me reading this,
> > I'll always be suprised and will have to go skim code just to check
> > whether those functions really return new bpf_prog or
> > bpf_verifier_env, respectively.
>
> bpf_prog_realloc do return new bpf_prog, so we will need to return bpf_prog
> * for core layer.

Ah, I see, then it would make sense for the core layer, but it is still very
confusing for verifier_linearize_list_insn.
I still hope for a unified solution, so it shouldn't matter. But it
pointed me to a bug in your code, see below.

>
> >
> > Please change them both to just return error code.
> >
> >>
> >> >
> >> > Make sense?
> >> >
> >> > Regards,
> >> > Jiong
> >> >
> >> >>
> >> >>> +{
> >> >>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
> >> >>> +       struct bpf_subprog_info *new_subinfo;
> >> >>> +       struct bpf_insn_aux_data *new_data;
> >> >>> +       struct bpf_prog *prog = env->prog;
> >> >>> +       struct bpf_verifier_env *ret_env;
> >> >>> +       struct bpf_insn *insns, *insn;
> >> >>> +       struct bpf_list_insn *elem;
> >> >>> +       int ret;
> >> >>> +
> >> >>> +       /* Calculate final size. */
> >> >>> +       for (elem = list; elem; elem = elem->next)
> >> >>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
> >> >>> +                       fini_cnt++;
> >> >>> +
> >> >>> +       orig_cnt = prog->len;
> >> >>> +       insns = prog->insnsi;
> >> >>> +       /* If prog length remains same, nothing else to do. */
> >> >>> +       if (fini_cnt == orig_cnt) {
> >> >>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
> >> >>> +                       *insn = elem->insn;
> >> >>> +               return env;
> >> >>> +       }
> >> >>> +       /* Realloc insn buffer when necessary. */
> >> >>> +       if (fini_cnt > orig_cnt)
> >> >>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
> >> >>> +                                       GFP_USER);
> >> >>> +       if (!prog)
> >> >>> +               return ERR_PTR(-ENOMEM);

On realloc failure, prog will be non-NULL, so you need to handle error
properly (and propagate it, instead of returning -ENOMEM):

if (IS_ERR(prog))
    return ERR_PTR(prog);


> >> >>> +       insns = prog->insnsi;
> >> >>> +       prog->len = fini_cnt;
> >> >>> +       ret_env = env;
> >> >>> +
> >> >>> +       /* idx_map[OLD_IDX] = NEW_IDX */
> >> >>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
> >> >>> +       if (!idx_map)
> >> >>> +               return ERR_PTR(-ENOMEM);
> >> >>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
> >> >>> +
> >> >>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
> >> >>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
> >> >>> +       if (!new_data) {
> >> >>> +               kvfree(idx_map);
> >> >>> +               return ERR_PTR(-ENOMEM);
> >> >>> +       }
> >> >>> +
> >> >>> +       /* Copy over insn + calculate idx_map. */
> >> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> >> >>> +               int orig_idx = elem->orig_idx - 1;
> >> >>> +
> >> >>> +               if (orig_idx >= 0) {
> >> >>> +                       idx_map[orig_idx] = idx;
> >> >>> +
> >> >>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
> >> >>> +                               continue;
> >> >>> +
> >> >>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
> >> >>> +
> >> >>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
> >> >>> +                               new_data[idx].zext_dst =
> >> >>> +                                       insn_has_def32(env, &elem->insn);
> >> >>> +               } else {
> >> >>> +                       new_data[idx].seen = true;
> >> >>> +                       new_data[idx].zext_dst = insn_has_def32(env,
> >> >>> +                                                               &elem->insn);
> >> >>> +               }
> >> >>> +               insns[idx++] = elem->insn;
> >> >>> +       }
> >> >>> +
> >> >>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
> >> >>> +       if (!new_subinfo) {
> >> >>> +               kvfree(idx_map);
> >> >>> +               vfree(new_data);
> >> >>> +               return ERR_PTR(-ENOMEM);
> >> >>> +       }
> >> >>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
> >> >>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
> >> >>> +       env->subprog_cnt = 0;
> >> >>> +       env->prog = prog;
> >> >>> +       ret = add_subprog(env, 0);
> >> >>> +       if (ret < 0) {
> >> >>> +               ret_env = ERR_PTR(ret);
> >> >>> +               goto free_all_ret;
> >> >>> +       }
> >> >>> +       /* Relocate jumps using idx_map.
> >> >>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
> >> >>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
> >> >>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
> >> >>> +        */
> >> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> >> >>> +               int orig_idx = elem->orig_idx;
> >> >>> +
> >> >>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
> >> >>> +                       continue;
> >> >>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
> >> >>> +                       idx++;
> >> >>> +                       continue;
> >> >>> +               }
> >> >>> +
> >> >>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
> >> >>> +                                         idx_map);
> >> >>> +               if (ret < 0) {
> >> >>> +                       ret_env = ERR_PTR(ret);
> >> >>> +                       goto free_all_ret;
> >> >>> +               }
> >> >>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
> >> >>> +               if (ret > 0) {
> >> >>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
> >> >>> +                       if (ret < 0) {
> >> >>> +                               ret_env = ERR_PTR(ret);
> >> >>> +                               goto free_all_ret;
> >> >>> +                       }
> >> >>> +               }
> >> >>> +               idx++;
> >> >>> +       }
> >> >>> +       if (ret < 0) {
> >> >>> +               ret_env = ERR_PTR(ret);
> >> >>> +               goto free_all_ret;
> >> >>> +       }
> >> >>> +
> >> >>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
> >> >>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
> >> >>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
> >> >>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
> >> >>> +
> >> >>> +       /* Adjust linfo.
> >> >>> +        * FIXME: no support for insn removal at the moment.
> >> >>> +        */
> >> >>> +       if (prog->aux->nr_linfo) {
> >> >>> +               struct bpf_line_info *linfo = prog->aux->linfo;
> >> >>> +               u32 nr_linfo = prog->aux->nr_linfo;
> >> >>> +
> >> >>> +               for (idx = 0; idx < nr_linfo; idx++)
> >> >>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
> >> >>> +       }
> >> >>> +       vfree(env->insn_aux_data);
> >> >>> +       env->insn_aux_data = new_data;
> >> >>> +       goto free_mem_list_ret;
> >> >>> +free_all_ret:
> >> >>> +       vfree(new_data);
> >> >>> +free_mem_list_ret:
> >> >>> +       kvfree(new_subinfo);
> >> >>> +       kvfree(idx_map);
> >> >>> +       return ret_env;
> >> >>> +}
> >> >>> +
> >> >>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
> >> >>>  {
> >> >>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
> >> >>> --
> >> >>> 2.7.4
> >> >>>
> >>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-15  9:21       ` Jiong Wang
@ 2019-07-15 22:55         ` Andrii Nakryiko
  2019-07-15 23:00           ` Andrii Nakryiko
  2019-07-16  8:50           ` Jiong Wang
  0 siblings, 2 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-15 22:55 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song

On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >>
> >> Andrii Nakryiko writes:
> >>
> >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >> >>
> >> >> This is an RFC based on latest bpf-next about acclerating insn patching
> >> >> speed, it is now near the shape of final PATCH set, and we could see the
> >> >> changes migrating to list patching would brings, so send out for
> >> >> comments. Most of the info are in cover letter. I splitted the code in a
> >> >> way to show API migration more easily.
> >> >
> >> >
> >> > Hey Jiong,
> >> >
> >> >
> >> > Sorry, took me a while to get to this and learn more about instruction
> >> > patching. Overall this looks good and I think is a good direction.
> >> > I'll post high-level feedback here, and some more
> >> > implementation-specific ones in corresponding patches.
> >>
> >> Great, thanks very much for the feedbacks. Most of your feedbacks are
> >> hitting those pain points I exactly had ran into. For some of them, I
> >> thought similar solutions like yours, but failed due to various
> >> reasons. Let's go through them again, I could have missed some important
> >> things.
> >>
> >> Please see my replies below.
> >
> > Thanks for thoughtful reply :)
> >
> >>
> >> >>
> >> >> Test Results
> >> >> ===
> >> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
> >> >>     modes (interpreter, JIT, JIT with blinding).
> >> >>
> >> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> >> >>     patching time from 5100s (nearly one and a half hour) to less than
> >> >>     0.5s for 1M insn patching.
> >> >>
> >> >> Known Issues
> >> >> ===
> >> >>   - The following warning is triggered when running scale test which
> >> >>     contains 1M insns and patching:
> >> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> >> >>
> >> >>     This is caused by existing code, it can be reproduced on bpf-next
> >> >>     master with jit blinding enabled, then run scale unit test, it will
> >> >>     shown up after half an hour. After this set, patching is very fast, so
> >> >>     it shows up quickly.
> >> >>
> >> >>   - No line info adjustment support when doing insn delete, subprog adj
> >> >>     is with bug when doing insn delete as well. Generally, removal of insns
> >> >>     could possibly cause remove of entire line or subprog, therefore
> >> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
> >> >>     don't have good idea and clean code for integrating this into the
> >> >>     linearization code at the moment, will do more experimenting,
> >> >>     appreciate ideas and suggestions on this.
> >> >
> >> > Is there any specific problem to detect which line info to delete? Or
> >> > what am I missing besides careful implementation?
> >>
> >> Mostly line info and subprog info are range info which covers a range of
> >> insns. Deleting insns could causing you adjusting the range or removing one
> >> range entirely. subprog info could be fully recalcuated during
> >> linearization while line info I need some careful implementation and I
> >> failed to have clean code for this during linearization also as said no
> >> unit tests to help me understand whether the code is correct or not.
> >>
> >
> > Ok, that's good that it's just about clean implementation. Try to
> > implement it as clearly as possible. Then post it here, and if it can
> > be improved someone (me?) will try to help to clean it up further.
> >
> > Not a big expert on line info, so can't comment on that,
> > unfortunately. Maybe Yonghong can chime in (cc'ed)
> >
> >
> >> I will described this latter, spent too much time writing the following
> >> reply. Might worth an separate discussion thread.
> >>
> >> >>
> >> >>     Insn delete doesn't happen on normal programs, for example Cilium
> >> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
> >> >>     not good. That's also why this RFC have a full pass on selftest with
> >> >>     this known issue.
> >> >
> >> > I hope you'll add test for deletion (and w/ corresponding line info)
> >> > in final patch set :)
> >>
> >> Will try. Need to spend some time on BTF format.
> >> >
> >> >>
> >> >>   - Could further use mem pool to accelerate the speed, changes are trivial
> >> >>     on top of this RFC, and could be 2x extra faster. Not included in this
> >> >>     RFC as reducing the algo complexity from quadratic to linear of insn
> >> >>     number is the first step.
> >> >
> >> > Honestly, I think that would add more complexity than necessary, and I
> >> > think we can further speed up performance without that, see below.
> >> >
> >> >>
> >> >> Background
> >> >> ===
> >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> >> >> remove insns.
> >> >>
> >> >> At the moment, insn patching is quadratic of insn number, this is due to
> >> >> branch targets of jump insns needs to be adjusted, and the algo used is:
> >> >>
> >> >>   for insn inside prog
> >> >>     patch insn + regeneate bpf prog
> >> >>     for insn inside new prog
> >> >>       adjust jump target
> >> >>
> >> >> This is causing significant time spending when a bpf prog requires large
> >> >> amount of patching on different insns. Benchmarking shows it could take
> >> >> more than half minutes to finish patching when patching number is more
> >> >> than 50K, and the time spent could be more than one hour when patching
> >> >> number is around 1M.
> >> >>
> >> >>   15000   :    3s
> >> >>   45000   :   29s
> >> >>   95000   :  125s
> >> >>   195000  :  712s
> >> >>   1000000 : 5100s
> >> >>
> >> >> This RFC introduces new patching infrastructure. Before doing insn
> >> >> patching, insns in bpf prog are turned into a singly linked list, insert
> >> >> new insns just insert new list node, delete insns just set delete flag.
> >> >> And finally, the list is linearized back into array, and branch target
> >> >> adjustment is done for all jump insns during linearization. This algo
> >> >> brings the time complexity from quadratic to linear of insn number.
> >> >>
> >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> >> >> to less than 0.5s.
> >> >>
> >> >> Patching API
> >> >> ===
> >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> >> >> where only BPF insns are patched. The other is "verification layer" where
> >> >> insns have corresponding aux info as well high level subprog info, so
> >> >> insn patching means aux info needs to be patched as well, and subprog info
> >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> >> >> should always be updated after insn patching.
> >> >>
> >> >> So, list creation, destroy, insert, delete is the same for both layer,
> >> >> but lineration is different. "verification layer" patching require extra
> >> >> work. Therefore the patch APIs are:
> >> >>
> >> >>    list creation:                bpf_create_list_insn
> >> >>    list patch:                   bpf_patch_list_insn
> >> >>    list pre-patch:               bpf_prepatch_list_insn
> >> >
> >> > I think pre-patch name is very confusing, until I read full
> >> > description I couldn't understand what it's supposed to be used for.
> >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> >> > me wondering whether instruction buffer is inserted after instruction,
> >> > or instruction is replaced with a bunch of instructions.
> >> >
> >> > So how about two more specific names:
> >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> >> > instruction with a list of patch instructions)
> >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> >> > one is pretty clear).
> >>
> >> My sense on English word is not great, will switch to above which indeed
> >> reads more clear.
> >>
> >> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
> >> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
> >> >
> >> > These two functions are both quite involved, as well as share a lot of
> >> > common code. I'd rather have one linearize instruction, that takes env
> >> > as an optional parameter. If env is specified (which is the case for
> >> > all cases except for constant blinding pass), then adjust aux_data and
> >> > subprogs along the way.
> >>
> >> Two version of lineration and how to unify them was a painpoint to me. I
> >> thought to factor out some of the common code out, but it actually doesn't
> >> count much, the final size counting + insnsi resize parts are the same,
> >> then things start to diverge since the "Copy over insn" loop.
> >>
> >> verifier layer needs to copy and initialize aux data etc. And jump
> >> relocation is different. At core layer, the use case is JIT blinding which
> >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
> >
> > Sorry, I didn't get what "could expand an jump_imm insn into a
> > and/or/jump_reg sequence", maybe you can clarify if I'm missing
> > something.
> >
> > But from your cover letter description, core layer has no jumps at
> > all, while verifier has jumps inside patch buffer. So, if you support
> > jumps inside of patch buffer, it will automatically work for core
> > layer. Or what am I missing?
>
> I meant in core layer (JIT blinding), there is the following patching:
>
> input:
>   insn 0             insn 0
>   insn 1             insn 1
>   jmp_imm   >>       mov_imm  \
>   insn 2             xor_imm    insn seq expanded from jmp_imm
>   insn 3             jmp_reg  /
>                      insn 2
>                      insn 3
>
>
> jmp_imm is the insn that will be patched, and the actual transformation
> is to expand it into a mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
> at the end of the patch buffer, must jump to the same destination as the
> original jmp_imm, so "jmp_reg" is an insn inside the patch buffer but it
> should be relocated, and the jump destination is outside of the patch buffer.


Ok, great, thanks for explaining, yeah it's definitely something that
we should be able to support. BUT. It got me thinking a bit more and I
think I have a simpler and more elegant solution now, again, supporting
both core-layer and verifier-layer operations.

struct bpf_patchable_insn {
   struct bpf_patchable_insn *next;
   struct bpf_insn insn;
   int orig_idx; /* original non-patched index */
   int new_idx;  /* new index, will be filled only during linearization */
};

struct bpf_patcher {
    /* dummy head node of a chain of patchable instructions */
    struct bpf_patchable_insn insn_head;
    /* dynamic array of size(original instruction count)
     * this is a map from original instruction index to a first
     * patchable instruction that replaced that instruction (or
     * just original instruction as bpf_patchable_insn).
     */
    struct bpf_patchable_insn **orig_idx_to_patchable_insn;
    int cnt;
};

A few points, but it should be pretty clear just from the comments and definitions:
1. When you create the bpf_patcher, you create the patchable_insn list and fill
the orig_idx_to_patchable_insn map to store the proper pointers. This array is
NEVER changed after that.
2. When replacing an instruction, you re-use its struct bpf_patchable_insn
for the first patched instruction, then append after that (not prepend to
the next instruction, so as not to disrupt the orig_idx -> patchable_insn mapping).
3. During linearization, you first traverse the chain of instructions
and trivially assign new_idxs.
4. No need for patchable_insn->target anymore. All jumps use relative
instruction offsets, right? So when you need to determine the new
instruction index during linearization, you just do (after you have
calculated the new instruction indices):

static void adjust_jmp(struct bpf_patcher *patcher, struct bpf_patchable_insn *insn)
{
   int old_jmp_idx = insn->orig_idx + jmp_offset_of(&insn->insn) + 1;
   int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;

   adjust_jmp_offset(&insn->insn, new_jmp_idx - insn->new_idx - 1);
}

The idea is that we want to support quick look-up by original
instruction index. That's what orig_idx_to_patchable_insn provides. On
the other hand, no existing instruction is ever referencing newly
patched instruction by its new offset, so with careful implementation,
you can transparently support all the cases, regardless if it's in
core layer or verifier layer (so, e.g., verifier layer patched
instructions now will be able to jump out of patched buffer, if
necessary, neat, right?).

It is cleaner than everything we've discussed so far. Unless I missed
something critical (it's all quite convoluted, so I might have
forgotten some parts already). Let me know what you think.
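
To make the flow concrete, a linearization pass over such a bpf_patcher could
look roughly like the sketch below (insn_is_jmp_or_call() is a hypothetical
helper; handling of insns newly inserted by patching, which carry no valid
orig_idx, is left out):

static int patcher_linearize(struct bpf_patcher *patcher, struct bpf_insn *out)
{
	struct bpf_patchable_insn *insn;
	int idx = 0;

	/* Pass 1: assign new indices in final order. */
	for (insn = patcher->insn_head.next; insn; insn = insn->next)
		insn->new_idx = idx++;

	/* Pass 2: fix up jump/call targets via the orig_idx map, copy out. */
	for (insn = patcher->insn_head.next; insn; insn = insn->next) {
		if (insn_is_jmp_or_call(&insn->insn) && insn->orig_idx >= 0)
			adjust_jmp(patcher, insn);
		out[insn->new_idx] = insn->insn;
	}

	return idx;
}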


>
> This means for core layer (jit blinding), it needs to take care of insn
> inside patch buffer.
>
> > Just compared two version of linearize side by side. From what I can
> > see, unified version could look like this, high-level:
> >
> > 1. Count final insn count (but see my other suggestions how to avoid
> > that altogether). If not changed - exit.
> > 2. Realloc insn buffer, copy just instructions (not aux_data yet).
> > Build idx_map, if necessary.
> > 3. (if env) then bpf_patchable_insn has aux_data, so now do another
> > pass and copy it into resulting array.
> > 4. (if env) Copy sub info. Though I'd see if we can just reuse old
> > ones and just adjust offsets. I'm not sure why we need to allocate new
> > array, subprogram count shouldn't change, right?
>
> If there is no dead insn elimination opt, then we could just adjust
> offsets. When there is insn deleting, I feel the logic becomes more
> complex. One subprog could be completely deleted or partially deleted, so
> I feel just recalculate the whole subprog info as a side-product is
> much simpler.

What's the situation where the entirety of a subprog can be deleted?


>
> > 5. (common) Relocate jumps. Not clear why core layer doesn't care
> > about PATCHED (or, alternatively, why verifier layer cares).
>
> See above, in this RFC, core layer care PATCHED during relocating jumps,
> and verifier layer doesn't.
>
> > And again, with targets pointer it will look totally different (and
> > simpler).
>
> Yes, will see how the code looks.
>
> > 6. (if env) adjust subprogs
> > 7. (common) Adjust prog's line info.
> >
> > The devil is in the details, but I think this will still be better if
> > contained in one function if a bunch of `if (env)` checks. Still
> > pretty linear.
> >
> >> jump_reg is at the end of the patch buffer, it should be relocated. While
> >> all use case in verifier layer, no jump in the prog will be patched and all
> >> new jumps in patch buffer will jump inside the buffer locally so no need to
> >> resolve.
> >>
> >> And yes we could unify them into one and control the diverge using
> >> argument, but then where to place the function is an issue. My
> >> understanding is verifier.c is designed to be on top of core.c and core.c
> >> should not reference and no need to be aware of any verifier specific data
> >> structures, for example env or bpf_aux_insn_data etc.
> >
> > Func prototype where it is. Maybe forward-declare verifier env struct.
> > Implementation in verifier.c?
> >
> >>
> >> So, in this RFC, I had choosed to write separate linerization function for
> >> core and verifier layer. Does this make sense?
> >
> > See above. Let's still try to make it better.
> >
> >>
> >> >
> >> > This would keep logic less duplicated and shouldn't complexity beyond
> >> > few null checks in few places.
> >> >
> >> >>    list destroy:                 bpf_destroy_list_insn
> >> >>
> >> >
> >> > I'd also add a macro foreach_list_insn instead of explicit for loops
> >> > in multiple places. That would also allow to skip deleted instructions
> >> > transparently.
> >> >
> >> >> list patch could change the insn at patch point, it will invalid the aux
> >> >
> >> > typo: invalid -> invalidate
> >>
> >> Ack.
> >>
> >> >
> >> >> info at patching point. list pre-patch insert new insns before patch point
> >> >> where the insn and associated aux info are not touched, it is used for
> >> >> example in convert_ctx_access when generating prologue.
> >> >>
> >> >> Typical API sequence for one patching pass:
> >> >>
> >> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> >> >>    for (elem = list; elem; elem = elem->next)
> >> >>       patch_buf = gen_patch_buf_logic;
> >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >> >>    bpf_prog = bpf_linearize_list_insn(list)
> >> >>    bpf_destroy_list_insn(list)
> >> >>
> >> >> Several patching passes could also share the same list:
> >> >>
> >> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> >> >>    for (elem = list; elem; elem = elem->next)
> >> >>       patch_buf = gen_patch_buf_logic1;
> >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >> >>    for (elem = list; elem; elem = elem->next)
> >> >>       patch_buf = gen_patch_buf_logic2;
> >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >> >>    bpf_prog = bpf_linearize_list_insn(list)
> >> >>    bpf_destroy_list_insn(list)
> >> >>
> >> >> but note new inserted insns int early passes won't have aux info except
> >> >> zext info. So, if one patch pass requires all aux info updated and
> >> >> recalculated for all insns including those pathced, it should first
> >> >> linearize the old list, then re-create the list. The RFC always create and
> >> >> linearize the list for each migrated patching pass separately.
> >> >
> >> > I think we should do just one list creation, few passes of patching
> >> > and then linearize once. That will save quite a lot of memory
> >> > allocation and will speed up a lot of things. All the verifier
> >> > patching happens one after the other without any other functionality
> >> > in between, so there shouldn't be any problem.
> >>
> >> Yes, as mentioned above, it is possible and I had tried to do it in an very
> >> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
> >> same list, but then the 32-bit zero extension insertion pass requires
> >> aux.zext_dst set properly for all instructions including those patched
> >
> > So zext_dst. Seems like it's easily calculatable, so doesn't seem like
> > it even needs to be accessed from aux_data.
> >
> > But. I can see at least two ways to do this:
> > 1. those patching passes that care about aux_data, should just do
> > extra check for NULL. Because when we adjust insns now, we just leave
> > zero-initialized aux_data, except for zext_dst and seen. So it's easy
> > to default to them if aux_data is NULL for patchable_insn.
> > 2. just allocate and fill them out them when applying patch insns
> > buffer. It's not a duplication, we already fill them out during
> > patching today. So just do the same, except through malloc()'ed
> > pointer instead. At the end they will be copied into linear resulting
> > array during linearization (uniformly with non-patched insns).
> >
> >> one which we need to linearize the list first (as we set zext_dst during
> >> linerization), or the other choice is we do the zext_dst initialization
> >> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
> >> between core and verifier layer.
> >
> > List construction is much simpler, even if we have to have extra
> > check, similar to `if (env) { do_extra(); }`, IMO, it's fine.
> >
> >>
> >> > As for aux_data. We can solve that even more simply and reliably by
> >> > storing a pointer along the struct bpf_list_insn
> >>
> >> This is exactly what I had implemented initially, but then the issue is how
> >> to handle aux_data for patched insn? IIRC I was leave it as a NULL pointer,
> >> but later found zext_dst info is required for all insns, so I end up
> >> duplicating zext_dst in bpf_list_insn.
> >
> > See above. No duplication. You have a pointer. Whether aux_data is in
> > original array or was malloc()'ed, doesn't matter. But no duplication
> > of fields.
> >
> >>
> >> This leads me worrying we need to keep duplicating fields there as soon as
> >> there is new similar requirements in future patching pass and I thought it
> >> might be better to just reference the aux_data inside env using orig_idx,
> >> this avoids duplicating information, but we need to make sure used fields
> >> inside aux_data for patched insn update-to-date during linearization or
> >> patching list.
> >>
> >> > (btw, how about calling it bpf_patchable_insn?).
> >>
> >> No preference, will use this one.
> >>
> >> > Here's how I propose to represent this patchable instruction:
> >> >
> >> > struct bpf_list_insn {
> >> >        struct bpf_insn insn;
> >> >        struct bpf_list_insn *next;
> >> >        struct bpf_list_insn *target;
> >> >        struct bpf_insn_aux_data *aux_data;
> >> >        s32 orig_idx; // can repurpose this to have three meanings:
> >> >                      // -2 - deleted
> >> >                      // -1 - patched/inserted insn
> >> >                      // >=0 - original idx
> >>
> >> I actually had experimented the -2/-1/0 trick, exactly the same number
> >> assignment :) IIRC the code was not clear compared with using flag, the
> >> reason seems to be:
> >>   1. we still need orig_idx of an patched insn somehow, meaning negate the
> >>      index.
> >
> > Not following, original index with be >=0, no?
> >
> >>   2. somehow somecode need to know whether one insn is deleted or patched
> >>      after the negation, so I end up with some ugly code.
> >
> > So that's why you'll have constants with descriptive name for -2 and -1.
> >
> >>
> >> Anyway, I might had not thought hard enough on this, I will retry using the
> >> special index instead of flag, hopefully I could have clean code this time.
> >>
> >
> > Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
> > (orig_idx >= 0) { ... }` are very confusing.
> >
> >> > };
> >> >
> >> > The idea would be as follows:
> >> > 1. when creating original list, target pointer will point directly to
> >> > a patchable instruction wrapper for jumps/calls. This will allow to
> >> > stop tracking and re-calculating jump offsets and instruction indicies
> >> > until linearization.
> >>
> >> Not sure I have followed the idea of "target" pointer. At the moment we are
> >> using index mapping array (generated as by-product during coping insn).
> >>
> >> While the "target" pointer means to during list initialization, each jump
> >> insn will have target initialized to the list node of the converted jump
> >> destination insn, and all those non-jump insns are with NULL? Then during
> >> linearization you assign index to each list node (could be done as
> >> by-product of other pass) before insn coping which could then relocate the
> >> insn during the coping as the "target" would have final index calculated?
> >> Am I following correctly?
> >
> > Yes, I think you are understanding correctly what I'm saying. For
> > implementation, you can do it in few ways, through few passes or with
> > some additional data, is less important. See what's cleanest.
> >
> >>
> >> > 2. aux_data is also filled at that point. Later at linearization time
> >> > you'd just iterate over all the instructions in final order and copy
> >> > original aux_data, if it's present. And then just repace env's
> >> > aux_data array at the end, should be very simple and fast.
> >>
> >> As explained, I am worried making aux_data a pointer will causing
> >> duplicating some fields into list_insn if the fields are required for
> >> patched insns.
> >
> > Addressed above, I don't think there will be any duplication, because
> > we pass aux_data by pointer.
> >
> >>
> >> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
> >> > same list of instructions and those passes will just keep inserting
> >> > instruction buffers. Given we have restriction that all the jumps are
> >> > only within patch buffer, it will be trivial to construct proper
> >> > patchable instruction wrappers for newly added instructions, with NULL
> >> > for aux_data and possibly non-NULL target (if it's a JMP insn).
> >> > 4. After those passes, linearize, adjust subprogs (for this you'll
> >> > probably still need to create index mapping, right?), copy or create
> >> > new aux_data.
> >> > 5. Done.
> >> >
> >> > What do you think? I think this should be overall simpler and faster.
> >> > But let me know if I'm missing something.
> >>
> >> Thanks for all these thoughts, they are very good suggestions and reminds
> >> me to revisit some points I had forgotten. I will do the following things:
> >>
> >>   1. retry the negative index solution to eliminate flag if the result code
> >>      could be clean.
> >>   2. the "target" pointer seems make sense, it makes list_insn bigger but
> >>      normally space trade with time, so I will try to implement it to see
> >>      how the code looks like.
> >>   3. I still have concerns on making aux_data as pointer. Mostly due to
> >>      patched insn will have NULL pointer and in case aux info of patched
> >>      insn is required, we need to duplicate info inside list_insn. For
> >>      example 32-bit zext opt requires zext_dst.
> >>
> >
> >
> > So one more thing I wanted to suggest. I'll try to keep high-level
> > suggestions here.
> >
> > What about having a wrapper for patchable_insn list, where you can
> > store some additional data, like final count and whatever else. It
> > will eliminate some passes (counting) and will make list handling
> > easier (because you can have a dummy head pointer, so no special
> > handling of first element
>
> Will try it.
>
> > you had this concern in patch #1, I
> > believe). But it will be clear if it's beneficial once implemented.
>
> >> Regards,
> >> Jiong
> >>
> >> >>
> >> >> Compared with old patching code, this new infrastructure has much less core
> >> >> code, even though the final code has a couple of extra lines but that is
> >> >> mostly due to for list based infrastructure, we need to do more error
> >> >> checks, so the list and associated aux data structure could be freed when
> >> >> errors happens.
> >> >>
> >> >> Patching Restrictions
> >> >> ===
> >> >>   - For core layer, the linearization assume no new jumps inside patch buf.
> >> >>     Currently, the only user of this layer is jit blinding.
> >> >>   - For verifier layer, there could be new jumps inside patch buf, but
> >> >>     they should have branch target resolved themselves, meaning new jumps
> >> >>     doesn't jump to insns out of the patch buf. This is the case for all
> >> >>     existing verifier layer users.
> >> >>   - bpf_insn_aux_data for all patched insns including the one at patch
> >> >>     point are invalidated, only 32-bit zext info will be recalcuated.
> >> >>     If the aux data of insn at patch point needs to be retained, it is
> >> >>     purely insn insertion, so need to use the pre-patch API.
> >> >>
> >> >> I plan to send out a PATCH set once I finished insn deletion line info adj
> >> >> support, please have a looks at this RFC, and appreciate feedbacks.
> >> >>
> >> >> Jiong Wang (8):
> >> >>   bpf: introducing list based insn patching infra to core layer
> >> >>   bpf: extend list based insn patching infra to verification layer
> >> >>   bpf: migrate jit blinding to list patching infra
> >> >>   bpf: migrate convert_ctx_accesses to list patching infra
> >> >>   bpf: migrate fixup_bpf_calls to list patching infra
> >> >>   bpf: migrate zero extension opt to list patching infra
> >> >>   bpf: migrate insn remove to list patching infra
> >> >>   bpf: delete all those code around old insn patching infrastructure
> >> >>
> >> >>  include/linux/bpf_verifier.h |   1 -
> >> >>  include/linux/filter.h       |  27 +-
> >> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
> >> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
> >> >>  4 files changed, 580 insertions(+), 528 deletions(-)
> >> >>
> >> >> --
> >> >> 2.7.4
> >> >>
> >>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-15 22:55         ` Andrii Nakryiko
@ 2019-07-15 23:00           ` Andrii Nakryiko
  2019-07-16  8:50           ` Jiong Wang
  1 sibling, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-15 23:00 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song

On Mon, Jul 15, 2019 at 3:55 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >
> >
> > Andrii Nakryiko writes:
> >
> > > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> > >>
> > >>
> > >> Andrii Nakryiko writes:
> > >>
> > >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> > >> >>
> > >> >> This is an RFC based on latest bpf-next about acclerating insn patching
> > >> >> speed, it is now near the shape of final PATCH set, and we could see the
> > >> >> changes migrating to list patching would brings, so send out for
> > >> >> comments. Most of the info are in cover letter. I splitted the code in a
> > >> >> way to show API migration more easily.
> > >> >
> > >> >
> > >> > Hey Jiong,
> > >> >
> > >> >
> > >> > Sorry, took me a while to get to this and learn more about instruction
> > >> > patching. Overall this looks good and I think is a good direction.
> > >> > I'll post high-level feedback here, and some more
> > >> > implementation-specific ones in corresponding patches.
> > >>
> > >> Great, thanks very much for the feedbacks. Most of your feedbacks are
> > >> hitting those pain points I exactly had ran into. For some of them, I
> > >> thought similar solutions like yours, but failed due to various
> > >> reasons. Let's go through them again, I could have missed some important
> > >> things.
> > >>
> > >> Please see my replies below.
> > >
> > > Thanks for thoughtful reply :)
> > >
> > >>
> > >> >>
> > >> >> Test Results
> > >> >> ===
> > >> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
> > >> >>     modes (interpreter, JIT, JIT with blinding).
> > >> >>
> > >> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> > >> >>     patching time from 5100s (nearly one and a half hour) to less than
> > >> >>     0.5s for 1M insn patching.
> > >> >>
> > >> >> Known Issues
> > >> >> ===
> > >> >>   - The following warning is triggered when running scale test which
> > >> >>     contains 1M insns and patching:
> > >> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> > >> >>
> > >> >>     This is caused by existing code, it can be reproduced on bpf-next
> > >> >>     master with jit blinding enabled, then run scale unit test, it will
> > >> >>     shown up after half an hour. After this set, patching is very fast, so
> > >> >>     it shows up quickly.
> > >> >>
> > >> >>   - No line info adjustment support when doing insn delete, subprog adj
> > >> >>     is with bug when doing insn delete as well. Generally, removal of insns
> > >> >>     could possibly cause remove of entire line or subprog, therefore
> > >> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
> > >> >>     don't have good idea and clean code for integrating this into the
> > >> >>     linearization code at the moment, will do more experimenting,
> > >> >>     appreciate ideas and suggestions on this.
> > >> >
> > >> > Is there any specific problem to detect which line info to delete? Or
> > >> > what am I missing besides careful implementation?
> > >>
> > >> Mostly line info and subprog info are range info which covers a range of
> > >> insns. Deleting insns could causing you adjusting the range or removing one
> > >> range entirely. subprog info could be fully recalcuated during
> > >> linearization while line info I need some careful implementation and I
> > >> failed to have clean code for this during linearization also as said no
> > >> unit tests to help me understand whether the code is correct or not.
> > >>
> > >
> > > Ok, that's good that it's just about clean implementation. Try to
> > > implement it as clearly as possible. Then post it here, and if it can
> > > be improved someone (me?) will try to help to clean it up further.
> > >
> > > Not a big expert on line info, so can't comment on that,
> > > unfortunately. Maybe Yonghong can chime in (cc'ed)
> > >
> > >
> > >> I will described this latter, spent too much time writing the following
> > >> reply. Might worth an separate discussion thread.
> > >>
> > >> >>
> > >> >>     Insn delete doesn't happen on normal programs, for example Cilium
> > >> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
> > >> >>     not good. That's also why this RFC have a full pass on selftest with
> > >> >>     this known issue.
> > >> >
> > >> > I hope you'll add test for deletion (and w/ corresponding line info)
> > >> > in final patch set :)
> > >>
> > >> Will try. Need to spend some time on BTF format.
> > >> >
> > >> >>
> > >> >>   - Could further use mem pool to accelerate the speed, changes are trivial
> > >> >>     on top of this RFC, and could be 2x extra faster. Not included in this
> > >> >>     RFC as reducing the algo complexity from quadratic to linear of insn
> > >> >>     number is the first step.
> > >> >
> > >> > Honestly, I think that would add more complexity than necessary, and I
> > >> > think we can further speed up performance without that, see below.
> > >> >
> > >> >>
> > >> >> Background
> > >> >> ===
> > >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> > >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> > >> >> remove insns.
> > >> >>
> > >> >> At the moment, insn patching is quadratic of insn number, this is due to
> > >> >> branch targets of jump insns needs to be adjusted, and the algo used is:
> > >> >>
> > >> >>   for insn inside prog
> > >> >>     patch insn + regeneate bpf prog
> > >> >>     for insn inside new prog
> > >> >>       adjust jump target
> > >> >>
> > >> >> This is causing significant time spending when a bpf prog requires large
> > >> >> amount of patching on different insns. Benchmarking shows it could take
> > >> >> more than half minutes to finish patching when patching number is more
> > >> >> than 50K, and the time spent could be more than one hour when patching
> > >> >> number is around 1M.
> > >> >>
> > >> >>   15000   :    3s
> > >> >>   45000   :   29s
> > >> >>   95000   :  125s
> > >> >>   195000  :  712s
> > >> >>   1000000 : 5100s
> > >> >>
> > >> >> This RFC introduces new patching infrastructure. Before doing insn
> > >> >> patching, insns in bpf prog are turned into a singly linked list, insert
> > >> >> new insns just insert new list node, delete insns just set delete flag.
> > >> >> And finally, the list is linearized back into array, and branch target
> > >> >> adjustment is done for all jump insns during linearization. This algo
> > >> >> brings the time complexity from quadratic to linear of insn number.
> > >> >>
> > >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> > >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> > >> >> to less than 0.5s.
> > >> >>
> > >> >> Patching API
> > >> >> ===
> > >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> > >> >> where only BPF insns are patched. The other is "verification layer" where
> > >> >> insns have corresponding aux info as well high level subprog info, so
> > >> >> insn patching means aux info needs to be patched as well, and subprog info
> > >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> > >> >> should always be updated after insn patching.
> > >> >>
> > >> >> So, list creation, destroy, insert, delete is the same for both layer,
> > >> >> but lineration is different. "verification layer" patching require extra
> > >> >> work. Therefore the patch APIs are:
> > >> >>
> > >> >>    list creation:                bpf_create_list_insn
> > >> >>    list patch:                   bpf_patch_list_insn
> > >> >>    list pre-patch:               bpf_prepatch_list_insn
> > >> >
> > >> > I think pre-patch name is very confusing, until I read full
> > >> > description I couldn't understand what it's supposed to be used for.
> > >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> > >> > me wondering whether instruction buffer is inserted after instruction,
> > >> > or instruction is replaced with a bunch of instructions.
> > >> >
> > >> > So how about two more specific names:
> > >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> > >> > instruction with a list of patch instructions)
> > >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> > >> > one is pretty clear).
> > >>
> > >> My sense on English word is not great, will switch to above which indeed
> > >> reads more clear.
> > >>
> > >> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
> > >> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
> > >> >
> > >> > These two functions are both quite involved, as well as share a lot of
> > >> > common code. I'd rather have one linearize instruction, that takes env
> > >> > as an optional parameter. If env is specified (which is the case for
> > >> > all cases except for constant blinding pass), then adjust aux_data and
> > >> > subprogs along the way.
> > >>
> > >> Two version of lineration and how to unify them was a painpoint to me. I
> > >> thought to factor out some of the common code out, but it actually doesn't
> > >> count much, the final size counting + insnsi resize parts are the same,
> > >> then things start to diverge since the "Copy over insn" loop.
> > >>
> > >> verifier layer needs to copy and initialize aux data etc. And jump
> > >> relocation is different. At core layer, the use case is JIT blinding which
> > >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
> > >
> > > Sorry, I didn't get what "could expand an jump_imm insn into a
> > > and/or/jump_reg sequence", maybe you can clarify if I'm missing
> > > something.
> > >
> > > But from your cover letter description, core layer has no jumps at
> > > all, while verifier has jumps inside patch buffer. So, if you support
> > > jumps inside of patch buffer, it will automatically work for core
> > > layer. Or what am I missing?
> >
> > I meant in core layer (JIT blinding), there is the following patching:
> >
> > input:
> >   insn 0             insn 0
> >   insn 1             insn 1
> >   jmp_imm   >>       mov_imm  \
> >   insn 2             xor_imm    insn seq expanded from jmp_imm
> >   insn 3             jmp_reg  /
> >                      insn 2
> >                      insn 3
> >
> >
> > jmp_imm is the insn that will be patched, and the actually transformation
> > is to expand it into mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
> > at the end of the patch buffer, must jump to the same destination as the
> > original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
> > be relocated, and the jump destination is outside of patch buffer.
>
>
> Ok, great, thanks for explaining, yeah it's definitely something that
> we should be able to support. BUT. It got me thinking a bit more and I
> think I have simpler and more elegant solution now, again, supporting
> both core-layer and verifier-layer operations.
>
> struct bpf_patchable_insn {
>    struct bpf_patchable_insn *next;
>    struct bpf_insn insn;
>    int orig_idx; /* original non-patched index */
>    int new_idx;  /* new index, will be filled only during linearization */
> };
>
> struct bpf_patcher {
>     /* dummy head node of a chain of patchable instructions */
>     struct bpf_patchable_insn insn_head;
>     /* dynamic array of size(original instruction count)
>      * this is a map from original instruction index to a first
>      * patchable instruction that replaced that instruction (or
>      * just original instruction as bpf_patchable_insn).
>      */
>     struct bpf_patchable_insn **orig_idx_to_patchable_insn;
>     int cnt;
> };
>
> Few points, but it should be pretty clear just from comments and definitions:
> 1. When you create bpf_patcher, you create the patchable_insn list, fill
> orig_idx_to_patchable_insn map to store proper pointers. This array is
> NEVER changed after that.
> 2. When replacing instruction, you re-use struct bpf_patchable_insn
> for first patched instruction, then append after that (not prepend to
> next instruction to not disrupt orig_idx -> patchable_insn mapping).
> 3. During linearization, you first traverse the chain of instructions
> and trivially assign new_idxs.
> 4. No need for patchable_insn->target anymore. All jumps use relative
> instruction offsets, right? So when you need to determine new
> instruction index during linearization, you just do (after you
> calculated new instruction indices):
>
> func adjust_jmp(struct bpf_patcher* patcher, struct bpf_patchable_insn *insn) {
>    int old_jmp_idx = insn->orig_idx + jmp_offset_of(insn->insn);
>    int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;
>    adjust_jmp_offset(insn->insn, new_jmp_idx) - insn->orig_idx;
> }

Forgot to mention. To handle deleted insns, you can either traverse
until you find the first non-deleted instruction after that one, or,
while filling in new_idx, just make sure that the new_idx of a deleted
instruction is the same as the first non-deleted instruction's
new_idx.
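
Roughly something like this during linearization (just a sketch, names
made up; I'm assuming deleted insns are marked somehow, e.g. reusing the
orig_idx == -2 convention from earlier in the thread):

  /* Sketch: one forward pass assigns new_idx; a deleted insn inherits the
   * index the next surviving insn will get, so a jump landing on it
   * resolves to the first non-deleted insn after it.
   */
  #define PATCHABLE_INSN_DELETED  -2      /* assumed marker, not final */

  static int bpf_patcher_assign_new_idx(struct bpf_patcher *patcher)
  {
          struct bpf_patchable_insn *pinsn;
          int idx = 0;

          for (pinsn = patcher->insn_head.next; pinsn; pinsn = pinsn->next) {
                  pinsn->new_idx = idx;
                  if (pinsn->orig_idx != PATCHABLE_INSN_DELETED)
                          idx++;
          }

          return idx;     /* final insn count of the linearized prog */
  }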

For subprogs (presuming there is no case where we can just eliminate an
entire subprog), you can just do a simple look-up from original index to
a new index. No need to copy/recalculate, etc. It's just an orig_idx ->
new_idx mapping.
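
Something like this, roughly (again just a sketch with made-up names,
assuming the map stores pointers to patchable insns and new_idx has
already been assigned, and no subprog is deleted entirely):

  static void adjust_subprog_starts(struct bpf_verifier_env *env,
                                    struct bpf_patcher *patcher,
                                    u32 new_insn_cnt)
  {
          int i;

          for (i = 0; i < env->subprog_cnt; i++) {
                  u32 old = env->subprog_info[i].start;

                  /* even if the insn at the old start was deleted, its
                   * new_idx already points at the first surviving insn
                   * (see the pass above), which is what we want here
                   */
                  env->subprog_info[i].start =
                          patcher->orig_idx_to_patchable_insn[old]->new_idx;
          }
          /* the sentinel entry holds the total insn count of the new prog */
          env->subprog_info[env->subprog_cnt].start = new_insn_cnt;
  }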

>
> The idea is that we want to support quick look-up by original
> instruction index. That's what orig_idx_to_patchable_insn provides. On
> the other hand, no existing instruction is ever referencing newly
> patched instruction by its new offset, so with careful implementation,
> you can transparently support all the cases, regardless if it's in
> core layer or verifier layer (so, e.g., verifier layer patched
> instructions now will be able to jump out of patched buffer, if
> necessary, neat, right?).
>
> It is cleaner than everything we've discussed so far. Unless I missed
> something critical (it's all quite convoluted, so I might have
> forgotten some parts already). Let me know what you think.
>
>
> >
> > This means for core layer (jit blinding), it needs to take care of insn
> > inside patch buffer.
> >
> > > Just compared two version of linearize side by side. From what I can
> > > see, unified version could look like this, high-level:
> > >
> > > 1. Count final insn count (but see my other suggestions how to avoid
> > > that altogether). If not changed - exit.
> > > 2. Realloc insn buffer, copy just instructions (not aux_data yet).
> > > Build idx_map, if necessary.
> > > 3. (if env) then bpf_patchable_insn has aux_data, so now do another
> > > pass and copy it into resulting array.
> > > 4. (if env) Copy sub info. Though I'd see if we can just reuse old
> > > ones and just adjust offsets. I'm not sure why we need to allocate new
> > > array, subprogram count shouldn't change, right?
> >
> > If there is no dead insn elimination opt, then we could just adjust
> > offsets. When there is insn deleting, I feel the logic becomes more
> > complex. One subprog could be completely deleted or partially deleted, so
> > I feel just recalculate the whole subprog info as a side-product is
> > much simpler.
>
> What's the situation where entirety of subprog can be deleted?
>
>
> >
> > > 5. (common) Relocate jumps. Not clear why core layer doesn't care
> > > about PATCHED (or, alternatively, why verifier layer cares).
> >
> > See above, in this RFC, core layer care PATCHED during relocating jumps,
> > and verifier layer doesn't.
> >
> > > And again, with targets pointer it will look totally different (and
> > > simpler).
> >
> > Yes, will see how the code looks.
> >
> > > 6. (if env) adjust subprogs
> > > 7. (common) Adjust prog's line info.
> > >
> > > The devil is in the details, but I think this will still be better if
> > > contained in one function if a bunch of `if (env)` checks. Still
> > > pretty linear.
> > >
> > >> jump_reg is at the end of the patch buffer, it should be relocated. While
> > >> all use case in verifier layer, no jump in the prog will be patched and all
> > >> new jumps in patch buffer will jump inside the buffer locally so no need to
> > >> resolve.
> > >>
> > >> And yes we could unify them into one and control the diverge using
> > >> argument, but then where to place the function is an issue. My
> > >> understanding is verifier.c is designed to be on top of core.c and core.c
> > >> should not reference and no need to be aware of any verifier specific data
> > >> structures, for example env or bpf_aux_insn_data etc.
> > >
> > > Func prototype where it is. Maybe forward-declare verifier env struct.
> > > Implementation in verifier.c?
> > >
> > >>
> > >> So, in this RFC, I had choosed to write separate linerization function for
> > >> core and verifier layer. Does this make sense?
> > >
> > > See above. Let's still try to make it better.
> > >
> > >>
> > >> >
> > >> > This would keep logic less duplicated and shouldn't complexity beyond
> > >> > few null checks in few places.
> > >> >
> > >> >>    list destroy:                 bpf_destroy_list_insn
> > >> >>
> > >> >
> > >> > I'd also add a macro foreach_list_insn instead of explicit for loops
> > >> > in multiple places. That would also allow to skip deleted instructions
> > >> > transparently.
> > >> >
> > >> >> list patch could change the insn at patch point, it will invalid the aux
> > >> >
> > >> > typo: invalid -> invalidate
> > >>
> > >> Ack.
> > >>
> > >> >
> > >> >> info at patching point. list pre-patch insert new insns before patch point
> > >> >> where the insn and associated aux info are not touched, it is used for
> > >> >> example in convert_ctx_access when generating prologue.
> > >> >>
> > >> >> Typical API sequence for one patching pass:
> > >> >>
> > >> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> > >> >>    for (elem = list; elem; elem = elem->next)
> > >> >>       patch_buf = gen_patch_buf_logic;
> > >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> > >> >>    bpf_prog = bpf_linearize_list_insn(list)
> > >> >>    bpf_destroy_list_insn(list)
> > >> >>
> > >> >> Several patching passes could also share the same list:
> > >> >>
> > >> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> > >> >>    for (elem = list; elem; elem = elem->next)
> > >> >>       patch_buf = gen_patch_buf_logic1;
> > >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> > >> >>    for (elem = list; elem; elem = elem->next)
> > >> >>       patch_buf = gen_patch_buf_logic2;
> > >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> > >> >>    bpf_prog = bpf_linearize_list_insn(list)
> > >> >>    bpf_destroy_list_insn(list)
> > >> >>
> > >> >> but note new inserted insns int early passes won't have aux info except
> > >> >> zext info. So, if one patch pass requires all aux info updated and
> > >> >> recalculated for all insns including those pathced, it should first
> > >> >> linearize the old list, then re-create the list. The RFC always create and
> > >> >> linearize the list for each migrated patching pass separately.
> > >> >
> > >> > I think we should do just one list creation, few passes of patching
> > >> > and then linearize once. That will save quite a lot of memory
> > >> > allocation and will speed up a lot of things. All the verifier
> > >> > patching happens one after the other without any other functionality
> > >> > in between, so there shouldn't be any problem.
> > >>
> > >> Yes, as mentioned above, it is possible and I had tried to do it in an very
> > >> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
> > >> same list, but then the 32-bit zero extension insertion pass requires
> > >> aux.zext_dst set properly for all instructions including those patched
> > >
> > > So zext_dst. Seems like it's easily calculatable, so doesn't seem like
> > > it even needs to be accessed from aux_data.
> > >
> > > But. I can see at least two ways to do this:
> > > 1. those patching passes that care about aux_data, should just do
> > > extra check for NULL. Because when we adjust insns now, we just leave
> > > zero-initialized aux_data, except for zext_dst and seen. So it's easy
> > > to default to them if aux_data is NULL for patchable_insn.
> > > 2. just allocate and fill them out them when applying patch insns
> > > buffer. It's not a duplication, we already fill them out during
> > > patching today. So just do the same, except through malloc()'ed
> > > pointer instead. At the end they will be copied into linear resulting
> > > array during linearization (uniformly with non-patched insns).
> > >
> > >> one which we need to linearize the list first (as we set zext_dst during
> > >> linerization), or the other choice is we do the zext_dst initialization
> > >> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
> > >> between core and verifier layer.
> > >
> > > List construction is much simpler, even if we have to have extra
> > > check, similar to `if (env) { do_extra(); }`, IMO, it's fine.
> > >
> > >>
> > >> > As for aux_data. We can solve that even more simply and reliably by
> > >> > storing a pointer along the struct bpf_list_insn
> > >>
> > >> This is exactly what I had implemented initially, but then the issue is how
> > >> to handle aux_data for patched insn? IIRC I was leave it as a NULL pointer,
> > >> but later found zext_dst info is required for all insns, so I end up
> > >> duplicating zext_dst in bpf_list_insn.
> > >
> > > See above. No duplication. You have a pointer. Whether aux_data is in
> > > original array or was malloc()'ed, doesn't matter. But no duplication
> > > of fields.
> > >
> > >>
> > >> This leads me worrying we need to keep duplicating fields there as soon as
> > >> there is new similar requirements in future patching pass and I thought it
> > >> might be better to just reference the aux_data inside env using orig_idx,
> > >> this avoids duplicating information, but we need to make sure used fields
> > >> inside aux_data for patched insn update-to-date during linearization or
> > >> patching list.
> > >>
> > >> > (btw, how about calling it bpf_patchable_insn?).
> > >>
> > >> No preference, will use this one.
> > >>
> > >> > Here's how I propose to represent this patchable instruction:
> > >> >
> > >> > struct bpf_list_insn {
> > >> >        struct bpf_insn insn;
> > >> >        struct bpf_list_insn *next;
> > >> >        struct bpf_list_insn *target;
> > >> >        struct bpf_insn_aux_data *aux_data;
> > >> >        s32 orig_idx; // can repurpose this to have three meanings:
> > >> >                      // -2 - deleted
> > >> >                      // -1 - patched/inserted insn
> > >> >                      // >=0 - original idx
> > >>
> > >> I actually had experimented the -2/-1/0 trick, exactly the same number
> > >> assignment :) IIRC the code was not clear compared with using flag, the
> > >> reason seems to be:
> > >>   1. we still need orig_idx of an patched insn somehow, meaning negate the
> > >>      index.
> > >
> > > Not following, original index with be >=0, no?
> > >
> > >>   2. somehow somecode need to know whether one insn is deleted or patched
> > >>      after the negation, so I end up with some ugly code.
> > >
> > > So that's why you'll have constants with descriptive name for -2 and -1.
> > >
> > >>
> > >> Anyway, I might had not thought hard enough on this, I will retry using the
> > >> special index instead of flag, hopefully I could have clean code this time.
> > >>
> > >
> > > Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
> > > (orig_idx >= 0) { ... }` are very confusing.
> > >
> > >> > };
> > >> >
> > >> > The idea would be as follows:
> > >> > 1. when creating original list, target pointer will point directly to
> > >> > a patchable instruction wrapper for jumps/calls. This will allow to
> > >> > stop tracking and re-calculating jump offsets and instruction indicies
> > >> > until linearization.
> > >>
> > >> Not sure I have followed the idea of "target" pointer. At the moment we are
> > >> using index mapping array (generated as by-product during coping insn).
> > >>
> > >> While the "target" pointer means to during list initialization, each jump
> > >> insn will have target initialized to the list node of the converted jump
> > >> destination insn, and all those non-jump insns are with NULL? Then during
> > >> linearization you assign index to each list node (could be done as
> > >> by-product of other pass) before insn coping which could then relocate the
> > >> insn during the coping as the "target" would have final index calculated?
> > >> Am I following correctly?
> > >
> > > Yes, I think you are understanding correctly what I'm saying. For
> > > implementation, you can do it in few ways, through few passes or with
> > > some additional data, is less important. See what's cleanest.
> > >
> > >>
> > >> > 2. aux_data is also filled at that point. Later at linearization time
> > >> > you'd just iterate over all the instructions in final order and copy
> > >> > original aux_data, if it's present. And then just repace env's
> > >> > aux_data array at the end, should be very simple and fast.
> > >>
> > >> As explained, I am worried making aux_data a pointer will causing
> > >> duplicating some fields into list_insn if the fields are required for
> > >> patched insns.
> > >
> > > Addressed above, I don't think there will be any duplication, because
> > > we pass aux_data by pointer.
> > >
> > >>
> > >> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
> > >> > same list of instructions and those passes will just keep inserting
> > >> > instruction buffers. Given we have restriction that all the jumps are
> > >> > only within patch buffer, it will be trivial to construct proper
> > >> > patchable instruction wrappers for newly added instructions, with NULL
> > >> > for aux_data and possibly non-NULL target (if it's a JMP insn).
> > >> > 4. After those passes, linearize, adjust subprogs (for this you'll
> > >> > probably still need to create index mapping, right?), copy or create
> > >> > new aux_data.
> > >> > 5. Done.
> > >> >
> > >> > What do you think? I think this should be overall simpler and faster.
> > >> > But let me know if I'm missing something.
> > >>
> > >> Thanks for all these thoughts, they are very good suggestions and reminds
> > >> me to revisit some points I had forgotten. I will do the following things:
> > >>
> > >>   1. retry the negative index solution to eliminate flag if the result code
> > >>      could be clean.
> > >>   2. the "target" pointer seems make sense, it makes list_insn bigger but
> > >>      normally space trade with time, so I will try to implement it to see
> > >>      how the code looks like.
> > >>   3. I still have concerns on making aux_data as pointer. Mostly due to
> > >>      patched insn will have NULL pointer and in case aux info of patched
> > >>      insn is required, we need to duplicate info inside list_insn. For
> > >>      example 32-bit zext opt requires zext_dst.
> > >>
> > >
> > >
> > > So one more thing I wanted to suggest. I'll try to keep high-level
> > > suggestions here.
> > >
> > > What about having a wrapper for patchable_insn list, where you can
> > > store some additional data, like final count and whatever else. It
> > > will eliminate some passes (counting) and will make list handling
> > > easier (because you can have a dummy head pointer, so no special
> > > handling of first element
> >
> > Will try it.
> >
> > > you had this concern in patch #1, I
> > > believe). But it will be clear if it's beneficial once implemented.
> >
> > >> Regards,
> > >> Jiong
> > >>
> > >> >>
> > >> >> Compared with old patching code, this new infrastructure has much less core
> > >> >> code, even though the final code has a couple of extra lines but that is
> > >> >> mostly due to for list based infrastructure, we need to do more error
> > >> >> checks, so the list and associated aux data structure could be freed when
> > >> >> errors happens.
> > >> >>
> > >> >> Patching Restrictions
> > >> >> ===
> > >> >>   - For core layer, the linearization assume no new jumps inside patch buf.
> > >> >>     Currently, the only user of this layer is jit blinding.
> > >> >>   - For verifier layer, there could be new jumps inside patch buf, but
> > >> >>     they should have branch target resolved themselves, meaning new jumps
> > >> >>     doesn't jump to insns out of the patch buf. This is the case for all
> > >> >>     existing verifier layer users.
> > >> >>   - bpf_insn_aux_data for all patched insns including the one at patch
> > >> >>     point are invalidated, only 32-bit zext info will be recalcuated.
> > >> >>     If the aux data of insn at patch point needs to be retained, it is
> > >> >>     purely insn insertion, so need to use the pre-patch API.
> > >> >>
> > >> >> I plan to send out a PATCH set once I finished insn deletion line info adj
> > >> >> support, please have a looks at this RFC, and appreciate feedbacks.
> > >> >>
> > >> >> Jiong Wang (8):
> > >> >>   bpf: introducing list based insn patching infra to core layer
> > >> >>   bpf: extend list based insn patching infra to verification layer
> > >> >>   bpf: migrate jit blinding to list patching infra
> > >> >>   bpf: migrate convert_ctx_accesses to list patching infra
> > >> >>   bpf: migrate fixup_bpf_calls to list patching infra
> > >> >>   bpf: migrate zero extension opt to list patching infra
> > >> >>   bpf: migrate insn remove to list patching infra
> > >> >>   bpf: delete all those code around old insn patching infrastructure
> > >> >>
> > >> >>  include/linux/bpf_verifier.h |   1 -
> > >> >>  include/linux/filter.h       |  27 +-
> > >> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
> > >> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
> > >> >>  4 files changed, 580 insertions(+), 528 deletions(-)
> > >> >>
> > >> >> --
> > >> >> 2.7.4
> > >> >>
> > >>
> >


* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
  2019-07-15 22:29             ` Andrii Nakryiko
@ 2019-07-16  8:12               ` Jiong Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-16  8:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Edward Cree, Naveen N. Rao, Jakub Kicinski, bpf, Networking,
	oss-drivers


Andrii Nakryiko writes:

> On Mon, Jul 15, 2019 at 3:02 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 11, 2019 at 5:20 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >>
>> >> Jiong Wang writes:
>> >>
>> >> > Andrii Nakryiko writes:
>> >> >
>> >> >> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >> >>>
>> >> >>> Verification layer also needs to handle auxiliar info as well as adjusting
>> >> >>> subprog start.
>> >> >>>
>> >> >>> At this layer, insns inside patch buffer could be jump, but they should
>> >> >>> have been resolved, meaning they shouldn't jump to insn outside of the
>> >> >>> patch buffer. Lineration function for this layer won't touch insns inside
>> >> >>> patch buffer.
>> >> >>>
>> >> >>> Adjusting subprog is finished along with adjusting jump target when the
>> >> >>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
>> >> >>> But adjustment when there is insn deleteion is not considered yet.
>> >> >>>
>> >> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> >> >>> ---
>> >> >>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >>>  1 file changed, 150 insertions(+)
>> >> >>>
>> >> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> >> >>> index a2e7637..2026d64 100644
>> >> >>> --- a/kernel/bpf/verifier.c
>> >> >>> +++ b/kernel/bpf/verifier.c
>> >> >>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
>> >> >>>         }
>> >> >>>  }
>> >> >>>
>> >> >>> +/* Linearize bpf list insn to array (verifier layer). */
>> >> >>> +static struct bpf_verifier_env *
>> >> >>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
>> >> >>> +                            struct bpf_list_insn *list)
>> >> >>
>> >> >> It's unclear why this returns env back? It's not allocating a new env,
>> >> >> so it's weird and unnecessary. Just return error code.
>> >> >
>> >> > The reason is I was thinking we have two layers in BPF, the core and the
>> >> > verifier.
>> >> >
>> >> > For core layer (the relevant file is core.c), when doing patching, the
>> >> > input is insn list and bpf_prog, the linearization should linearize the
>> >> > insn list into insn array, and also whatever others affect inside bpf_prog
>> >> > due to changing on insns, for example line info inside prog->aux. So the
>> >> > return value is bpf_prog for core layer linearization hook.
>> >> >
>> >> > For verifier layer, it is similar, but the context if bpf_verifier_env, the
>> >> > linearization hook should linearize the insn list, and also those affected
>> >> > inside env, for example bpf_insn_aux_data, so the return value is
>> >> > bpf_verifier_env, meaning returning an updated verifier context
>> >> > (bpf_verifier_env) after insn list linearization.
>> >>
>> >> Realized your point is no new env is allocated, so just return error
>> >> code. Yes, the env pointer is not changed, just internal data is
>> >> updated. Return bpf_verifier_env mostly is trying to make the hook more
>> >> clear that it returns an updated "context" where the linearization happens,
>> >> for verifier layer, it is bpf_verifier_env, and for core layer, it is
>> >> bpf_prog, so return value was designed to return these two types.
>> >
>> > Oh, I missed that core layer returns bpf_prog*. I think this is
>> > confusing as hell and is very contrary to what one would expect. If
>> > the function doesn't allocate those objects, it shouldn't return them,
>> > except for rare cases of some accessor functions. Me reading this,
>> > I'll always be suprised and will have to go skim code just to check
>> > whether those functions really return new bpf_prog or
>> > bpf_verifier_env, respectively.
>>
>> bpf_prog_realloc do return new bpf_prog, so we will need to return bpf_prog
>> * for core layer.
>
> Ah, I see, then it would make sense for core layer, but still is very
> confusing for verifier_linearize_list_insn.
> I still hope for unified solution, so it shouldn't matter. But it
> pointed me to a bug in your code, see below.

Yeah, thanks!

>
>>
>> >
>> > Please change them both to just return error code.
>> >
>> >>
>> >> >
>> >> > Make sense?
>> >> >
>> >> > Regards,
>> >> > Jiong
>> >> >
>> >> >>
>> >> >>> +{
>> >> >>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
>> >> >>> +       struct bpf_subprog_info *new_subinfo;
>> >> >>> +       struct bpf_insn_aux_data *new_data;
>> >> >>> +       struct bpf_prog *prog = env->prog;
>> >> >>> +       struct bpf_verifier_env *ret_env;
>> >> >>> +       struct bpf_insn *insns, *insn;
>> >> >>> +       struct bpf_list_insn *elem;
>> >> >>> +       int ret;
>> >> >>> +
>> >> >>> +       /* Calculate final size. */
>> >> >>> +       for (elem = list; elem; elem = elem->next)
>> >> >>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
>> >> >>> +                       fini_cnt++;
>> >> >>> +
>> >> >>> +       orig_cnt = prog->len;
>> >> >>> +       insns = prog->insnsi;
>> >> >>> +       /* If prog length remains same, nothing else to do. */
>> >> >>> +       if (fini_cnt == orig_cnt) {
>> >> >>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
>> >> >>> +                       *insn = elem->insn;
>> >> >>> +               return env;
>> >> >>> +       }
>> >> >>> +       /* Realloc insn buffer when necessary. */
>> >> >>> +       if (fini_cnt > orig_cnt)
>> >> >>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
>> >> >>> +                                       GFP_USER);
>> >> >>> +       if (!prog)
>> >> >>> +               return ERR_PTR(-ENOMEM);
>
> On realloc failure, prog will be non-NULL, so you need to handle error
> properly (and propagate it, instead of returning -ENOMEM):
>
> if (IS_ERR(prog))
>     return ERR_PTR(prog);
>
>
>> >> >>> +       insns = prog->insnsi;
>> >> >>> +       prog->len = fini_cnt;
>> >> >>> +       ret_env = env;
>> >> >>> +
>> >> >>> +       /* idx_map[OLD_IDX] = NEW_IDX */
>> >> >>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
>> >> >>> +       if (!idx_map)
>> >> >>> +               return ERR_PTR(-ENOMEM);
>> >> >>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
>> >> >>> +
>> >> >>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
>> >> >>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
>> >> >>> +       if (!new_data) {
>> >> >>> +               kvfree(idx_map);
>> >> >>> +               return ERR_PTR(-ENOMEM);
>> >> >>> +       }
>> >> >>> +
>> >> >>> +       /* Copy over insn + calculate idx_map. */
>> >> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> >> >>> +               int orig_idx = elem->orig_idx - 1;
>> >> >>> +
>> >> >>> +               if (orig_idx >= 0) {
>> >> >>> +                       idx_map[orig_idx] = idx;
>> >> >>> +
>> >> >>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> >> >>> +                               continue;
>> >> >>> +
>> >> >>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
>> >> >>> +
>> >> >>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
>> >> >>> +                               new_data[idx].zext_dst =
>> >> >>> +                                       insn_has_def32(env, &elem->insn);
>> >> >>> +               } else {
>> >> >>> +                       new_data[idx].seen = true;
>> >> >>> +                       new_data[idx].zext_dst = insn_has_def32(env,
>> >> >>> +                                                               &elem->insn);
>> >> >>> +               }
>> >> >>> +               insns[idx++] = elem->insn;
>> >> >>> +       }
>> >> >>> +
>> >> >>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
>> >> >>> +       if (!new_subinfo) {
>> >> >>> +               kvfree(idx_map);
>> >> >>> +               vfree(new_data);
>> >> >>> +               return ERR_PTR(-ENOMEM);
>> >> >>> +       }
>> >> >>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
>> >> >>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
>> >> >>> +       env->subprog_cnt = 0;
>> >> >>> +       env->prog = prog;
>> >> >>> +       ret = add_subprog(env, 0);
>> >> >>> +       if (ret < 0) {
>> >> >>> +               ret_env = ERR_PTR(ret);
>> >> >>> +               goto free_all_ret;
>> >> >>> +       }
>> >> >>> +       /* Relocate jumps using idx_map.
>> >> >>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
>> >> >>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
>> >> >>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
>> >> >>> +        */
>> >> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> >> >>> +               int orig_idx = elem->orig_idx;
>> >> >>> +
>> >> >>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> >> >>> +                       continue;
>> >> >>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
>> >> >>> +                       idx++;
>> >> >>> +                       continue;
>> >> >>> +               }
>> >> >>> +
>> >> >>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
>> >> >>> +                                         idx_map);
>> >> >>> +               if (ret < 0) {
>> >> >>> +                       ret_env = ERR_PTR(ret);
>> >> >>> +                       goto free_all_ret;
>> >> >>> +               }
>> >> >>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
>> >> >>> +               if (ret > 0) {
>> >> >>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
>> >> >>> +                       if (ret < 0) {
>> >> >>> +                               ret_env = ERR_PTR(ret);
>> >> >>> +                               goto free_all_ret;
>> >> >>> +                       }
>> >> >>> +               }
>> >> >>> +               idx++;
>> >> >>> +       }
>> >> >>> +       if (ret < 0) {
>> >> >>> +               ret_env = ERR_PTR(ret);
>> >> >>> +               goto free_all_ret;
>> >> >>> +       }
>> >> >>> +
>> >> >>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
>> >> >>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
>> >> >>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
>> >> >>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
>> >> >>> +
>> >> >>> +       /* Adjust linfo.
>> >> >>> +        * FIXME: no support for insn removal at the moment.
>> >> >>> +        */
>> >> >>> +       if (prog->aux->nr_linfo) {
>> >> >>> +               struct bpf_line_info *linfo = prog->aux->linfo;
>> >> >>> +               u32 nr_linfo = prog->aux->nr_linfo;
>> >> >>> +
>> >> >>> +               for (idx = 0; idx < nr_linfo; idx++)
>> >> >>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
>> >> >>> +       }
>> >> >>> +       vfree(env->insn_aux_data);
>> >> >>> +       env->insn_aux_data = new_data;
>> >> >>> +       goto free_mem_list_ret;
>> >> >>> +free_all_ret:
>> >> >>> +       vfree(new_data);
>> >> >>> +free_mem_list_ret:
>> >> >>> +       kvfree(new_subinfo);
>> >> >>> +       kvfree(idx_map);
>> >> >>> +       return ret_env;
>> >> >>> +}
>> >> >>> +
>> >> >>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
>> >> >>>  {
>> >> >>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>> >> >>> --
>> >> >>> 2.7.4
>> >> >>>
>> >>
>>



* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-15 22:55         ` Andrii Nakryiko
  2019-07-15 23:00           ` Andrii Nakryiko
@ 2019-07-16  8:50           ` Jiong Wang
  2019-07-16 16:17             ` Alexei Starovoitov
  2019-07-16 17:49             ` Andrii Nakryiko
  1 sibling, 2 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-16  8:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers, Yonghong Song


Andrii Nakryiko writes:

> On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >>
>> >> Andrii Nakryiko writes:
>> >>
>> >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >> >>
>> >> >> This is an RFC based on latest bpf-next about acclerating insn patching
>> >> >> speed, it is now near the shape of final PATCH set, and we could see the
>> >> >> changes migrating to list patching would brings, so send out for
>> >> >> comments. Most of the info are in cover letter. I splitted the code in a
>> >> >> way to show API migration more easily.
>> >> >
>> >> >
>> >> > Hey Jiong,
>> >> >
>> >> >
>> >> > Sorry, took me a while to get to this and learn more about instruction
>> >> > patching. Overall this looks good and I think is a good direction.
>> >> > I'll post high-level feedback here, and some more
>> >> > implementation-specific ones in corresponding patches.
>> >>
>> >> Great, thanks very much for the feedbacks. Most of your feedbacks are
>> >> hitting those pain points I exactly had ran into. For some of them, I
>> >> thought similar solutions like yours, but failed due to various
>> >> reasons. Let's go through them again, I could have missed some important
>> >> things.
>> >>
>> >> Please see my replies below.
>> >
>> > Thanks for thoughtful reply :)
>> >
>> >>
>> >> >>
>> >> >> Test Results
>> >> >> ===
>> >> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
>> >> >>     modes (interpreter, JIT, JIT with blinding).
>> >> >>
>> >> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
>> >> >>     patching time from 5100s (nearly one and a half hour) to less than
>> >> >>     0.5s for 1M insn patching.
>> >> >>
>> >> >> Known Issues
>> >> >> ===
>> >> >>   - The following warning is triggered when running scale test which
>> >> >>     contains 1M insns and patching:
>> >> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
>> >> >>
>> >> >>     This is caused by existing code, it can be reproduced on bpf-next
>> >> >>     master with jit blinding enabled, then run scale unit test, it will
>> >> >>     shown up after half an hour. After this set, patching is very fast, so
>> >> >>     it shows up quickly.
>> >> >>
>> >> >>   - No line info adjustment support when doing insn delete, subprog adj
>> >> >>     is with bug when doing insn delete as well. Generally, removal of insns
>> >> >>     could possibly cause remove of entire line or subprog, therefore
>> >> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
>> >> >>     don't have good idea and clean code for integrating this into the
>> >> >>     linearization code at the moment, will do more experimenting,
>> >> >>     appreciate ideas and suggestions on this.
>> >> >
>> >> > Is there any specific problem to detect which line info to delete? Or
>> >> > what am I missing besides careful implementation?
>> >>
>> >> Mostly line info and subprog info are range info which covers a range of
>> >> insns. Deleting insns could causing you adjusting the range or removing one
>> >> range entirely. subprog info could be fully recalcuated during
>> >> linearization while line info I need some careful implementation and I
>> >> failed to have clean code for this during linearization also as said no
>> >> unit tests to help me understand whether the code is correct or not.
>> >>
>> >
>> > Ok, that's good that it's just about clean implementation. Try to
>> > implement it as clearly as possible. Then post it here, and if it can
>> > be improved someone (me?) will try to help to clean it up further.
>> >
>> > Not a big expert on line info, so can't comment on that,
>> > unfortunately. Maybe Yonghong can chime in (cc'ed)
>> >
>> >
>> >> I will described this latter, spent too much time writing the following
>> >> reply. Might worth an separate discussion thread.
>> >>
>> >> >>
>> >> >>     Insn delete doesn't happen on normal programs, for example Cilium
>> >> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
>> >> >>     not good. That's also why this RFC have a full pass on selftest with
>> >> >>     this known issue.
>> >> >
>> >> > I hope you'll add test for deletion (and w/ corresponding line info)
>> >> > in final patch set :)
>> >>
>> >> Will try. Need to spend some time on BTF format.
>> >> >
>> >> >>
>> >> >>   - Could further use mem pool to accelerate the speed, changes are trivial
>> >> >>     on top of this RFC, and could be 2x extra faster. Not included in this
>> >> >>     RFC as reducing the algo complexity from quadratic to linear of insn
>> >> >>     number is the first step.
>> >> >
>> >> > Honestly, I think that would add more complexity than necessary, and I
>> >> > think we can further speed up performance without that, see below.
>> >> >
>> >> >>
>> >> >> Background
>> >> >> ===
>> >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
>> >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
>> >> >> remove insns.
>> >> >>
>> >> >> At the moment, insn patching is quadratic of insn number, this is due to
>> >> >> branch targets of jump insns needs to be adjusted, and the algo used is:
>> >> >>
>> >> >>   for insn inside prog
>> >> >>     patch insn + regeneate bpf prog
>> >> >>     for insn inside new prog
>> >> >>       adjust jump target
>> >> >>
>> >> >> This is causing significant time spending when a bpf prog requires large
>> >> >> amount of patching on different insns. Benchmarking shows it could take
>> >> >> more than half minutes to finish patching when patching number is more
>> >> >> than 50K, and the time spent could be more than one hour when patching
>> >> >> number is around 1M.
>> >> >>
>> >> >>   15000   :    3s
>> >> >>   45000   :   29s
>> >> >>   95000   :  125s
>> >> >>   195000  :  712s
>> >> >>   1000000 : 5100s
>> >> >>
>> >> >> This RFC introduces new patching infrastructure. Before doing insn
>> >> >> patching, insns in bpf prog are turned into a singly linked list, insert
>> >> >> new insns just insert new list node, delete insns just set delete flag.
>> >> >> And finally, the list is linearized back into array, and branch target
>> >> >> adjustment is done for all jump insns during linearization. This algo
>> >> >> brings the time complexity from quadratic to linear of insn number.
>> >> >>
>> >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
>> >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
>> >> >> to less than 0.5s.
>> >> >>
>> >> >> Patching API
>> >> >> ===
>> >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
>> >> >> where only BPF insns are patched. The other is "verification layer" where
>> >> >> insns have corresponding aux info as well high level subprog info, so
>> >> >> insn patching means aux info needs to be patched as well, and subprog info
>> >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
>> >> >> should always be updated after insn patching.
>> >> >>
>> >> >> So, list creation, destroy, insert, delete is the same for both layer,
>> >> >> but lineration is different. "verification layer" patching require extra
>> >> >> work. Therefore the patch APIs are:
>> >> >>
>> >> >>    list creation:                bpf_create_list_insn
>> >> >>    list patch:                   bpf_patch_list_insn
>> >> >>    list pre-patch:               bpf_prepatch_list_insn
>> >> >
>> >> > I think pre-patch name is very confusing, until I read full
>> >> > description I couldn't understand what it's supposed to be used for.
>> >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
>> >> > me wondering whether instruction buffer is inserted after instruction,
>> >> > or instruction is replaced with a bunch of instructions.
>> >> >
>> >> > So how about two more specific names:
>> >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
>> >> > instruction with a list of patch instructions)
>> >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
>> >> > one is pretty clear).
>> >>
>> >> My sense on English word is not great, will switch to above which indeed
>> >> reads more clear.
>> >>
>> >> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
>> >> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
>> >> >
>> >> > These two functions are both quite involved, as well as share a lot of
>> >> > common code. I'd rather have one linearize instruction, that takes env
>> >> > as an optional parameter. If env is specified (which is the case for
>> >> > all cases except for constant blinding pass), then adjust aux_data and
>> >> > subprogs along the way.
>> >>
>> >> Two version of lineration and how to unify them was a painpoint to me. I
>> >> thought to factor out some of the common code out, but it actually doesn't
>> >> count much, the final size counting + insnsi resize parts are the same,
>> >> then things start to diverge since the "Copy over insn" loop.
>> >>
>> >> verifier layer needs to copy and initialize aux data etc. And jump
>> >> relocation is different. At core layer, the use case is JIT blinding which
>> >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
>> >
>> > Sorry, I didn't get what "could expand an jump_imm insn into a
>> > and/or/jump_reg sequence", maybe you can clarify if I'm missing
>> > something.
>> >
>> > But from your cover letter description, core layer has no jumps at
>> > all, while verifier has jumps inside patch buffer. So, if you support
>> > jumps inside of patch buffer, it will automatically work for core
>> > layer. Or what am I missing?
>>
>> I meant in core layer (JIT blinding), there is the following patching:
>>
>> input:
>>   insn 0             insn 0
>>   insn 1             insn 1
>>   jmp_imm   >>       mov_imm  \
>>   insn 2             xor_imm    insn seq expanded from jmp_imm
>>   insn 3             jmp_reg  /
>>                      insn 2
>>                      insn 3
>>
>>
>> jmp_imm is the insn that will be patched, and the actually transformation
>> is to expand it into mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
>> at the end of the patch buffer, must jump to the same destination as the
>> original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
>> be relocated, and the jump destination is outside of patch buffer.
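
(For reference, that expansion is roughly the following, a simplified
sketch of what bpf_jit_blind_insn() generates, with the off adjustment for
backjumps omitted; imm_rnd is the per-insn random blinding value:

   to[0] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
   to[1] = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
   /* same condition, now tested against AX; its offset still encodes the
    * original destination, so this is the insn that needs relocating
    * during linearization.
    */
   to[2] = BPF_JMP_REG(BPF_OP(from->code), from->dst_reg, BPF_REG_AX,
                       from->off);
)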
>
>
> Ok, great, thanks for explaining, yeah it's definitely something that
> we should be able to support. BUT. It got me thinking a bit more and I
> think I have simpler and more elegant solution now, again, supporting
> both core-layer and verifier-layer operations.
>
> struct bpf_patchable_insn {
>    struct bpf_patchable_insn *next;
>    struct bpf_insn insn;
>    int orig_idx; /* original non-patched index */
>    int new_idx;  /* new index, will be filled only during linearization */
> };
>
> struct bpf_patcher {
>     /* dummy head node of a chain of patchable instructions */
>     struct bpf_patchable_insn insn_head;
>     /* dynamic array of size(original instruction count)
>      * this is a map from original instruction index to a first
>      * patchable instruction that replaced that instruction (or
>      * just original instruction as bpf_patchable_insn).
>      */
>     struct bpf_patchable_insn **orig_idx_to_patchable_insn;
>     int cnt;
> };
>
> Few points, but it should be pretty clear just from comments and definitions:
> 1. When you create bpf_patcher, you create the patchable_insn list, fill
> orig_idx_to_patchable_insn map to store proper pointers. This array is
> NEVER changed after that.
> 2. When replacing instruction, you re-use struct bpf_patchable_insn
> for first patched instruction, then append after that (not prepend to
> next instruction to not disrupt orig_idx -> patchable_insn mapping).
> 3. During linearization, you first traverse the chain of instructions
> and trivially assign new_idxs.
> 4. No need for patchable_insn->target anymore. All jumps use relative
> instruction offsets, right?

Yes, all jumps are pc-relative.

> So when you need to determine new
> instruction index during linearization, you just do (after you
> calculated new instruction indices):
>
> func adjust_jmp(struct bpf_patcher* patcher, struct bpf_patchable_insn *insn) {
>    int old_jmp_idx = insn->orig_idx + jmp_offset_of(insn->insn);
>    int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;
>    adjust_jmp_offset(insn->insn, new_jmp_idx) - insn->orig_idx;
> }

Hmm, this algo is kind of the same as this RFC, just that we have organized
"new_idx" as the "idx_map" array. And in this RFC, only the new_idx of
original insns matters, no space is allocated for patched insns. (As
mentioned, JIT blinding requires the last insn inside the patch buffer to be
relocated to the original jump destination, so there is a little special
handling in the relocation loop of the core layer linearization code.)
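
For reference, the relocation in this RFC boils down to something like the
following (simplified sketch only; it covers ->off based jumps, while the
bpf2bpf call ->imm case and the blinding jmp_reg special case are left
out):

  /* old_dst = old_pc + old_off + 1
   * new_off = idx_map[old_dst] - new_pc - 1
   */
  static int relocate_jmp_off(struct bpf_insn *insn, u32 old_pc, u32 new_pc,
                              const u32 *idx_map)
  {
          s64 new_off = (s64)idx_map[old_pc + insn->off + 1] - new_pc - 1;

          if (new_off < S16_MIN || new_off > S16_MAX)
                  return -ERANGE;

          insn->off = new_off;
          return 0;
  }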

> The idea is that we want to support quick look-up by original
> instruction index. That's what orig_idx_to_patchable_insn provides. On
> the other hand, no existing instruction is ever referencing newly
> patched instruction by its new offset, so with careful implementation,
> you can transparently support all the cases, regardless if it's in
> core layer or verifier layer (so, e.g., verifier layer patched
> instructions now will be able to jump out of patched buffer, if
> necessary, neat, right?).
>
> It is cleaner than everything we've discussed so far. Unless I missed
> something critical (it's all quite convoluted, so I might have
> forgotten some parts already). Let me know what you think.

Let me digest this a little and do some coding, then I will come back. Some
issues only show up during in-depth coding. I kind of feel handling the
aux data references in the verifier layer is the part that will still
introduce some unclean code.
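
Something like the following is what I have in mind for the creation pass,
just a rough sketch with made-up names (and assuming the map is an array of
patchable insn pointers, as above):

  static int bpf_patcher_init(struct bpf_patcher *patcher,
                              struct bpf_prog *prog)
  {
          struct bpf_patchable_insn *prev = &patcher->insn_head, *pinsn;
          u32 i;

          patcher->orig_idx_to_patchable_insn =
                  kvcalloc(prog->len,
                           sizeof(*patcher->orig_idx_to_patchable_insn),
                           GFP_KERNEL);
          if (!patcher->orig_idx_to_patchable_insn)
                  return -ENOMEM;

          for (i = 0; i < prog->len; i++) {
                  pinsn = kzalloc(sizeof(*pinsn), GFP_KERNEL);
                  if (!pinsn)
                          return -ENOMEM; /* caller destroys the patcher */
                  pinsn->insn = prog->insnsi[i];
                  pinsn->orig_idx = i;
                  pinsn->new_idx = -1;
                  patcher->orig_idx_to_patchable_insn[i] = pinsn;
                  prev->next = pinsn;
                  prev = pinsn;
          }
          patcher->cnt = prog->len;

          return 0;
  }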

<snip>
>> If there is no dead insn elimination opt, then we could just adjust
>> offsets. When there is insn deleting, I feel the logic becomes more
>> complex. One subprog could be completely deleted or partially deleted, so
>> I feel just recalculate the whole subprog info as a side-product is
>> much simpler.
>
> What's the situation where entirety of subprog can be deleted?

Suppose you have a conditional jmp_imm where the true path calls one subprog
and the false path calls the other. If the insn walker later finds the
condition is always true, then the subprog on the false path won't be marked
as "seen", so it is entirely deleted.

So I think that, at least in theory, one subprog could be deleted entirely,
and if we support insn deletion inside the verifier, then range info like
line_info/subprog_info needs to handle an entire range being deleted.

Thanks.
Regards,
Jiong

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-16  8:50           ` Jiong Wang
@ 2019-07-16 16:17             ` Alexei Starovoitov
  2019-07-16 19:39               ` Jiong Wang
  2019-07-16 22:12               ` Jakub Kicinski
  2019-07-16 17:49             ` Andrii Nakryiko
  1 sibling, 2 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2019-07-16 16:17 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Andrii Nakryiko, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song

On Tue, Jul 16, 2019 at 09:50:25AM +0100, Jiong Wang wrote:
> 
> Let me digest a little bit and do some coding, then I will come back. Some
> issues only show up during in-depth coding. I kind of feel handling
> aux references in the verifier layer is the part that will still introduce some
> unclean code.

I'm still internalizing this discussion. I only want to point out
that I think it's better to have a simpler algorithm that consumes more
memory and is slower than a more complex algorithm that is more cpu/memory
efficient. Here we're aiming at a 10x improvement anyway, so some extra cpu
and memory here and there is a good trade-off to make.

> >> If there is no dead insn elimination opt, then we could just adjust
> >> offsets. When there is insn deleting, I feel the logic becomes more
> >> complex. One subprog could be completely deleted or partially deleted, so
> >> I feel just recalculate the whole subprog info as a side-product is
> >> much simpler.
> >
> > What's the situation where entirety of subprog can be deleted?
> 
> Suppose you have a conditional jmp_imm: the true path calls one subprog, the
> false path calls the other. If the insn walker later finds it is always true,
> then the subprog on the false path won't be marked as "seen", so it is
> entirely deleted.
>
> I actually thought that, in theory, one subprog could be deleted entirely,
> so if we support insn deletion inside the verifier, then range info like
> line_info/subprog_info needs to handle an entire range being deleted.

I don't think dead code elim can remove subprogs.
cfg check rejects code with dead progs.
I don't think we have a test for such 'dead prog only due to verifier walk'
situation. I wonder what happens :)
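
For context, the static check referred to here is, as I understand it, the
reachability pass at the end of check_cfg(); a simplified sketch of that final
loop, not the verbatim kernel code:

  /* after the DFS over the CFG, reject any insn that was never explored */
  static int check_all_insns_reached(struct bpf_verifier_env *env,
                                     const int *insn_state, int insn_cnt)
  {
          int i;

          for (i = 0; i < insn_cnt; i++) {
                  if (insn_state[i] != EXPLORED) {
                          verbose(env, "unreachable insn %d\n", i);
                          return -EINVAL;
                  }
          }
          return 0;
  }

Whether the later value-tracking walk can leave a whole subprog unseen is the
open question in the replies below.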


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-16  8:50           ` Jiong Wang
  2019-07-16 16:17             ` Alexei Starovoitov
@ 2019-07-16 17:49             ` Andrii Nakryiko
  1 sibling, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2019-07-16 17:49 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song

On Tue, Jul 16, 2019 at 1:50 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >>
> >> Andrii Nakryiko writes:
> >>
> >> > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >> >>
> >> >>
> >> >> Andrii Nakryiko writes:
> >> >>
> >> >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >> >> >>
> >> >> >> This is an RFC based on latest bpf-next about acclerating insn patching
> >> >> >> speed, it is now near the shape of final PATCH set, and we could see the
> >> >> >> changes migrating to list patching would brings, so send out for
> >> >> >> comments. Most of the info are in cover letter. I splitted the code in a
> >> >> >> way to show API migration more easily.
> >> >> >
> >> >> >
> >> >> > Hey Jiong,
> >> >> >
> >> >> >
> >> >> > Sorry, took me a while to get to this and learn more about instruction
> >> >> > patching. Overall this looks good and I think is a good direction.
> >> >> > I'll post high-level feedback here, and some more
> >> >> > implementation-specific ones in corresponding patches.
> >> >>
> >> >> Great, thanks very much for the feedbacks. Most of your feedbacks are
> >> >> hitting those pain points I exactly had ran into. For some of them, I
> >> >> thought similar solutions like yours, but failed due to various
> >> >> reasons. Let's go through them again, I could have missed some important
> >> >> things.
> >> >>
> >> >> Please see my replies below.
> >> >
> >> > Thanks for thoughtful reply :)
> >> >
> >> >>
> >> >> >>
> >> >> >> Test Results
> >> >> >> ===
> >> >> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
> >> >> >>     modes (interpreter, JIT, JIT with blinding).
> >> >> >>
> >> >> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> >> >> >>     patching time from 5100s (nearly one and a half hour) to less than
> >> >> >>     0.5s for 1M insn patching.
> >> >> >>
> >> >> >> Known Issues
> >> >> >> ===
> >> >> >>   - The following warning is triggered when running scale test which
> >> >> >>     contains 1M insns and patching:
> >> >> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> >> >> >>
> >> >> >>     This is caused by existing code, it can be reproduced on bpf-next
> >> >> >>     master with jit blinding enabled, then run scale unit test, it will
> >> >> >>     shown up after half an hour. After this set, patching is very fast, so
> >> >> >>     it shows up quickly.
> >> >> >>
> >> >> >>   - No line info adjustment support when doing insn delete, subprog adj
> >> >> >>     is with bug when doing insn delete as well. Generally, removal of insns
> >> >> >>     could possibly cause remove of entire line or subprog, therefore
> >> >> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
> >> >> >>     don't have good idea and clean code for integrating this into the
> >> >> >>     linearization code at the moment, will do more experimenting,
> >> >> >>     appreciate ideas and suggestions on this.
> >> >> >
> >> >> > Is there any specific problem to detect which line info to delete? Or
> >> >> > what am I missing besides careful implementation?
> >> >>
> >> >> Mostly line info and subprog info are range info which covers a range of
> >> >> insns. Deleting insns could causing you adjusting the range or removing one
> >> >> range entirely. subprog info could be fully recalcuated during
> >> >> linearization while line info I need some careful implementation and I
> >> >> failed to have clean code for this during linearization also as said no
> >> >> unit tests to help me understand whether the code is correct or not.
> >> >>
> >> >
> >> > Ok, that's good that it's just about clean implementation. Try to
> >> > implement it as clearly as possible. Then post it here, and if it can
> >> > be improved someone (me?) will try to help to clean it up further.
> >> >
> >> > Not a big expert on line info, so can't comment on that,
> >> > unfortunately. Maybe Yonghong can chime in (cc'ed)
> >> >
> >> >
> >> >> I will described this latter, spent too much time writing the following
> >> >> reply. Might worth an separate discussion thread.
> >> >>
> >> >> >>
> >> >> >>     Insn delete doesn't happen on normal programs, for example Cilium
> >> >> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
> >> >> >>     not good. That's also why this RFC have a full pass on selftest with
> >> >> >>     this known issue.
> >> >> >
> >> >> > I hope you'll add test for deletion (and w/ corresponding line info)
> >> >> > in final patch set :)
> >> >>
> >> >> Will try. Need to spend some time on BTF format.
> >> >> >
> >> >> >>
> >> >> >>   - Could further use mem pool to accelerate the speed, changes are trivial
> >> >> >>     on top of this RFC, and could be 2x extra faster. Not included in this
> >> >> >>     RFC as reducing the algo complexity from quadratic to linear of insn
> >> >> >>     number is the first step.
> >> >> >
> >> >> > Honestly, I think that would add more complexity than necessary, and I
> >> >> > think we can further speed up performance without that, see below.
> >> >> >
> >> >> >>
> >> >> >> Background
> >> >> >> ===
> >> >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> >> >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> >> >> >> remove insns.
> >> >> >>
> >> >> >> At the moment, insn patching is quadratic of insn number, this is due to
> >> >> >> branch targets of jump insns needs to be adjusted, and the algo used is:
> >> >> >>
> >> >> >>   for insn inside prog
> >> >> >>     patch insn + regeneate bpf prog
> >> >> >>     for insn inside new prog
> >> >> >>       adjust jump target
> >> >> >>
> >> >> >> This is causing significant time spending when a bpf prog requires large
> >> >> >> amount of patching on different insns. Benchmarking shows it could take
> >> >> >> more than half minutes to finish patching when patching number is more
> >> >> >> than 50K, and the time spent could be more than one hour when patching
> >> >> >> number is around 1M.
> >> >> >>
> >> >> >>   15000   :    3s
> >> >> >>   45000   :   29s
> >> >> >>   95000   :  125s
> >> >> >>   195000  :  712s
> >> >> >>   1000000 : 5100s
> >> >> >>
> >> >> >> This RFC introduces new patching infrastructure. Before doing insn
> >> >> >> patching, insns in bpf prog are turned into a singly linked list, insert
> >> >> >> new insns just insert new list node, delete insns just set delete flag.
> >> >> >> And finally, the list is linearized back into array, and branch target
> >> >> >> adjustment is done for all jump insns during linearization. This algo
> >> >> >> brings the time complexity from quadratic to linear of insn number.
> >> >> >>
> >> >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> >> >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> >> >> >> to less than 0.5s.
> >> >> >>
> >> >> >> Patching API
> >> >> >> ===
> >> >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> >> >> >> where only BPF insns are patched. The other is "verification layer" where
> >> >> >> insns have corresponding aux info as well high level subprog info, so
> >> >> >> insn patching means aux info needs to be patched as well, and subprog info
> >> >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> >> >> >> should always be updated after insn patching.
> >> >> >>
> >> >> >> So, list creation, destroy, insert, delete is the same for both layer,
> >> >> >> but lineration is different. "verification layer" patching require extra
> >> >> >> work. Therefore the patch APIs are:
> >> >> >>
> >> >> >>    list creation:                bpf_create_list_insn
> >> >> >>    list patch:                   bpf_patch_list_insn
> >> >> >>    list pre-patch:               bpf_prepatch_list_insn
> >> >> >
> >> >> > I think pre-patch name is very confusing, until I read full
> >> >> > description I couldn't understand what it's supposed to be used for.
> >> >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> >> >> > me wondering whether instruction buffer is inserted after instruction,
> >> >> > or instruction is replaced with a bunch of instructions.
> >> >> >
> >> >> > So how about two more specific names:
> >> >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> >> >> > instruction with a list of patch instructions)
> >> >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> >> >> > one is pretty clear).
> >> >>
> >> >> My sense on English word is not great, will switch to above which indeed
> >> >> reads more clear.
> >> >>
> >> >> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
> >> >> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
> >> >> >
> >> >> > These two functions are both quite involved, as well as share a lot of
> >> >> > common code. I'd rather have one linearize instruction, that takes env
> >> >> > as an optional parameter. If env is specified (which is the case for
> >> >> > all cases except for constant blinding pass), then adjust aux_data and
> >> >> > subprogs along the way.
> >> >>
> >> >> Two version of lineration and how to unify them was a painpoint to me. I
> >> >> thought to factor out some of the common code out, but it actually doesn't
> >> >> count much, the final size counting + insnsi resize parts are the same,
> >> >> then things start to diverge since the "Copy over insn" loop.
> >> >>
> >> >> verifier layer needs to copy and initialize aux data etc. And jump
> >> >> relocation is different. At core layer, the use case is JIT blinding which
> >> >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
> >> >
> >> > Sorry, I didn't get what "could expand an jump_imm insn into a
> >> > and/or/jump_reg sequence", maybe you can clarify if I'm missing
> >> > something.
> >> >
> >> > But from your cover letter description, core layer has no jumps at
> >> > all, while verifier has jumps inside patch buffer. So, if you support
> >> > jumps inside of patch buffer, it will automatically work for core
> >> > layer. Or what am I missing?
> >>
> >> I meant in core layer (JIT blinding), there is the following patching:
> >>
> >> input:
> >>   insn 0             insn 0
> >>   insn 1             insn 1
> >>   jmp_imm   >>       mov_imm  \
> >>   insn 2             xor_imm    insn seq expanded from jmp_imm
> >>   insn 3             jmp_reg  /
> >>                      insn 2
> >>                      insn 3
> >>
> >>
> >> jmp_imm is the insn that will be patched, and the actually transformation
> >> is to expand it into mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
> >> at the end of the patch buffer, must jump to the same destination as the
> >> original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
> >> be relocated, and the jump destination is outside of patch buffer.
> >
> >
> > Ok, great, thanks for explaining, yeah it's definitely something that
> > we should be able to support. BUT. It got me thinking a bit more and I
> > think I have simpler and more elegant solution now, again, supporting
> > both core-layer and verifier-layer operations.
> >
> > struct bpf_patchable_insn {
> >    struct bpf_patchable_insn *next;
> >    struct bpf_insn insn;
> >    int orig_idx; /* original non-patched index */
> >    int new_idx;  /* new index, will be filled only during linearization */
> > };
> >
> > struct bpf_patcher {
> >     /* dummy head node of a chain of patchable instructions */
> >     struct bpf_patchable_insn insn_head;
> >     /* dynamic array of size(original instruction count)
> >      * this is a map from original instruction index to a first
> >      * patchable instruction that replaced that instruction (or
> >      * just original instruction as bpf_patchable_insn).
> >      */
> >     struct bpf_patchable_insn **orig_idx_to_patchable_insn;
> >     int cnt;
> > };
> >
> > Few points, but it should be pretty clear just from comments and definitions:
> > 1. When you create the bpf_patcher, you create the patchable_insn list and fill
> > the orig_idx_to_patchable_insn map to store the proper pointers. This array is
> > NEVER changed after that.
> > 2. When replacing an instruction, you re-use the struct bpf_patchable_insn
> > for the first patched instruction, then append after it (not prepend to the
> > next instruction, so as not to disrupt the orig_idx -> patchable_insn mapping).
> > 3. During linearization, you first traverse the chain of instructions
> > and trivially assign new_idxs.
> > 4. No need for patchable_insn->target anymore. All jumps use relative
> > instruction offsets, right?
>
> Yes, all jumps are pc-relative.
>
> > So when you need to determine new
> > instruction index during linearization, you just do (after you
> > calculated new instruction indices):
> >
> > func adjust_jmp(struct bpf_patcher* patcher, struct bpf_patchable_insn *insn) {
> >    int old_jmp_idx = insn->orig_idx + jmp_offset_of(insn->insn);
> >    int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;
> >    adjust_jmp_offset(insn->insn, new_jmp_idx - insn->new_idx);
> > }
>
> Hmm, this algo is kind of the same as in this RFC, we just organized "new_idx"
> as an "idx_map". And in this RFC, only the new_idx of an original insn matters,
> no space is allocated for patched insns. (As mentioned, JIT blinding

It's not really about saving space. It's about having a mapping from
original index to a new one (in this case, through struct
bpf_patchable_insn *), which stays correct at all times, thus allowing
us to avoid linearizing between patching passes.
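
A hypothetical sketch of point 2 above (my code, not Andrii's, with error
handling trimmed): replacing an insn reuses its existing node for the first
patch insn and appends the rest after it, so the pointer stored in
orig_idx_to_patchable_insn[] stays valid across any number of patching passes.

  static int patcher_replace(struct bpf_patcher *p, int orig_idx,
                             const struct bpf_insn *patch, int len)
  {
          struct bpf_patchable_insn *node = p->orig_idx_to_patchable_insn[orig_idx];
          int i;

          node->insn = patch[0];          /* reuse the original node */
          for (i = len - 1; i >= 1; i--) {
                  struct bpf_patchable_insn *n;

                  n = kzalloc(sizeof(*n), GFP_KERNEL);
                  if (!n)
                          return -ENOMEM;
                  n->insn = patch[i];
                  n->orig_idx = -1;       /* no original index for new insns */
                  n->next = node->next;   /* append after, never prepend */
                  node->next = n;
          }
          return 0;
  }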


> requires the last insn inside the patch buffer to be relocated to the original
> jump offset, so there was a little special handling in the relocation loop in
> the core layer linearization code.)
>
> > The idea is that we want to support quick look-up by original
> > instruction index. That's what orig_idx_to_patchable_insn provides. On
> > the other hand, no existing instruction is ever referencing newly
> > patched instruction by its new offset, so with careful implementation,
> > you can transparently support all the cases, regardless if it's in
> > core layer or verifier layer (so, e.g., verifier layer patched
> > instructions now will be able to jump out of patched buffer, if
> > necessary, neat, right?).
> >
> > It is cleaner than everything we've discussed so far. Unless I missed
> > something critical (it's all quite convoluted, so I might have
> > forgotten some parts already). Let me know what you think.
>
> Let me digest a little bit and do some coding, then I will come back. Some

Sure, give it some thought and give the coding a go; I bet overall
it will turn out more succinct and simpler. Please post an updated
version when you are done. Thanks!

> issues only show up during in-depth coding. I kind of feel handling
> aux references in the verifier layer is the part that will still introduce some
> unclean code.
>
> <snip>
> >> If there is no dead insn elimination opt, then we could just adjust
> >> offsets. When there is insn deleting, I feel the logic becomes more
> >> complex. One subprog could be completely deleted or partially deleted, so
> >> I feel just recalculate the whole subprog info as a side-product is
> >> much simpler.
> >
> > What's the situation where entirety of subprog can be deleted?
>
> Suppose you have a conditional jmp_imm: the true path calls one subprog, the
> false path calls the other. If the insn walker later finds it is always true,
> then the subprog on the false path won't be marked as "seen", so it is
> entirely deleted.
>
> I actually thought that, in theory, one subprog could be deleted entirely,
> so if we support insn deletion inside the verifier, then range info like
> line_info/subprog_info needs to handle an entire range being deleted.

Seems like this is not a problem, according to Alexei. But in the
worst case, it's now simple to re-calculate all this, given that we
have this simple operation to get the new insn idx from the old insn idx.
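
For instance, a speculative sketch of re-deriving subprog starts from the
old->new mapping during linearization (field names taken from the current
verifier, but the helper itself is invented for illustration):

  /* assumes idx_map[] has old_prog_len + 1 entries, so that the
   * end-of-prog sentinel at subprog_info[subprog_cnt] remaps too
   */
  static void remap_subprog_starts(struct bpf_verifier_env *env,
                                   const int *idx_map)
  {
          int i;

          for (i = 0; i <= env->subprog_cnt; i++)
                  env->subprog_info[i].start = idx_map[env->subprog_info[i].start];
  }

A fully deleted subprog is the awkward case: its entry would have to be
dropped rather than remapped.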

>
> Thanks.
> Regards,
> Jiong

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-16 16:17             ` Alexei Starovoitov
@ 2019-07-16 19:39               ` Jiong Wang
  2019-07-16 22:12               ` Jakub Kicinski
  1 sibling, 0 replies; 32+ messages in thread
From: Jiong Wang @ 2019-07-16 19:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiong Wang, Andrii Nakryiko, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers, Yonghong Song


Alexei Starovoitov writes:

> On Tue, Jul 16, 2019 at 09:50:25AM +0100, Jiong Wang wrote:
>> 
>> Let me digest a little bit and do some coding, then I will come back. Some
>> issues only show up during in-depth coding. I kind of feel handling
>> aux references in the verifier layer is the part that will still introduce some
>> unclean code.
>
> I'm still internalizing this discussion. I only want to point out
> that I think it's better to have a simpler algorithm that consumes more
> memory and is slower than a more complex algorithm that is more cpu/memory
> efficient. Here we're aiming at a 10x improvement anyway, so some extra cpu
> and memory here and there is a good trade-off to make.
>
>> >> If there is no dead insn elimination opt, then we could just adjust
>> >> offsets. When there is insn deleting, I feel the logic becomes more
>> >> complex. One subprog could be completely deleted or partially deleted, so
>> >> I feel just recalculate the whole subprog info as a side-product is
>> >> much simpler.
>> >
>> > What's the situation where entirety of subprog can be deleted?
>> 
>> Suppose you have a conditional jmp_imm: the true path calls one subprog, the
>> false path calls the other. If the insn walker later finds it is always true,
>> then the subprog on the false path won't be marked as "seen", so it is
>> entirely deleted.
>>
>> I actually thought that, in theory, one subprog could be deleted entirely,
>> so if we support insn deletion inside the verifier, then range info like
>> line_info/subprog_info needs to handle an entire range being deleted.
>
> I don't think dead code elim can remove subprogs.
> cfg check rejects code with dead progs.

The cfg check rejects unreachable code based on static analysis, while a
subprog that passed the cfg check could be identified as dead later, after
runtime value tracking, when check_cond_jmp_op prunes the subprog call on the
false path and makes the subprog dead?

For example:

  static void subprog1(void);
  static void subprog2(void);

  void foo(int mask)
  {
    if (mask & 0x1)
      subprog1();
    else
      subprog2();
    /* ... */
  }

foo's incoming arg is a mask, and depending on whether the LSB is set, it
calls different init functions, subprog1 or subprog2.

foo might be called with a constant as the mask, for example 0x8000. Then,
if foo is not called from anywhere else with a different mask, subprog1 is
dead, assuming there is no other caller of it.

LLVM is smart enough to optimize out such dead functions if they are only
visible in the same compilation unit, and people might only write code in
such a shape when it is encapsulated in a lib. But if a case like the above
happens, I think it is possible for one subprog to be deleted by the verifier
entirely.
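
To make the mechanism concrete, a rough sketch of my understanding (not the
actual verifier code): dead code removal drops every insn the walker never
marked as seen, so a subprog reachable only through the pruned branch would
go away wholesale.

  static bool insn_is_dead(const struct bpf_verifier_env *env, int idx)
  {
          /* seen is set during the main verification walk; a pruned
           * branch, and anything reachable only from it, is never visited
           */
          return !env->insn_aux_data[idx].seen;
  }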

> I don't think we have a test for such 'dead prog only due to verifier walk'
> situation. I wonder what happens :)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-16 16:17             ` Alexei Starovoitov
  2019-07-16 19:39               ` Jiong Wang
@ 2019-07-16 22:12               ` Jakub Kicinski
  2019-07-17  1:17                 ` Alexei Starovoitov
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2019-07-16 22:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiong Wang, Andrii Nakryiko, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, bpf, Networking, oss-drivers,
	Yonghong Song

On Tue, 16 Jul 2019 09:17:03 -0700, Alexei Starovoitov wrote:
> I don't think we have a test for such 'dead prog only due to verifier walk'
> situation. I wonder what happens :)

FWIW we do have verifier and BTF self tests for dead code removal
of entire subprogs! :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
  2019-07-16 22:12               ` Jakub Kicinski
@ 2019-07-17  1:17                 ` Alexei Starovoitov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2019-07-17  1:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiong Wang, Andrii Nakryiko, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, bpf, Networking, oss-drivers,
	Yonghong Song

On Tue, Jul 16, 2019 at 3:12 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Tue, 16 Jul 2019 09:17:03 -0700, Alexei Starovoitov wrote:
> > I don't think we have a test for such 'dead prog only due to verifier walk'
> > situation. I wonder what happens :)
>
> FWIW we do have verifier and BTF self tests for dead code removal
> of entire subprogs! :)

Thanks! Indeed.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-07-17  1:18 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-04 21:26 [RFC bpf-next 0/8] bpf: accelerate insn patching speed Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer Jiong Wang
2019-07-10 17:49   ` Andrii Nakryiko
2019-07-11 11:53     ` Jiong Wang
2019-07-12 19:48       ` Andrii Nakryiko
2019-07-15  9:58         ` Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer Jiong Wang
2019-07-10 17:50   ` Andrii Nakryiko
2019-07-11 11:59     ` [oss-drivers] " Jiong Wang
2019-07-11 12:20       ` Jiong Wang
2019-07-12 19:51         ` Andrii Nakryiko
2019-07-15 10:02           ` Jiong Wang
2019-07-15 22:29             ` Andrii Nakryiko
2019-07-16  8:12               ` Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 3/8] bpf: migrate jit blinding to list patching infra Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 4/8] bpf: migrate convert_ctx_accesses " Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 5/8] bpf: migrate fixup_bpf_calls " Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 6/8] bpf: migrate zero extension opt " Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 7/8] bpf: migrate insn remove " Jiong Wang
2019-07-04 21:26 ` [RFC bpf-next 8/8] bpf: delete all those code around old insn patching infrastructure Jiong Wang
2019-07-10 17:39 ` [RFC bpf-next 0/8] bpf: accelerate insn patching speed Andrii Nakryiko
2019-07-11 11:22   ` Jiong Wang
2019-07-12 19:43     ` Andrii Nakryiko
2019-07-15  9:21       ` Jiong Wang
2019-07-15 22:55         ` Andrii Nakryiko
2019-07-15 23:00           ` Andrii Nakryiko
2019-07-16  8:50           ` Jiong Wang
2019-07-16 16:17             ` Alexei Starovoitov
2019-07-16 19:39               ` Jiong Wang
2019-07-16 22:12               ` Jakub Kicinski
2019-07-17  1:17                 ` Alexei Starovoitov
2019-07-16 17:49             ` Andrii Nakryiko
