* [RFC PATCH 0/4] Initial support for allocating BPF JITs in vmalloc for x86
From: Rick Edgecombe @ 2019-02-05 22:50 UTC
  To: daniel, ast; +Cc: netdev, ard.biesheuvel, dave.hansen, kristen, Rick Edgecombe

This introduces a new capability for BPF program JITs to be located in vmalloc
space on x86_64. This can serve as a backup area for CONFIG_BPF_JIT_ALWAYS_ON in
case an unprivileged app uses all of the module space allowed by bpf_jit_limit.

In order to allow calls across the increased distance between vmalloc and
kernel/module space, relative calls are emitted as full indirect calls when the
maximum relative call distance is exceeded. The resulting performance of BPF
call instructions in this case is therefore similar to the BPF interpreter.
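
To illustrate the two forms (a sketch of the emitted instruction sequences, not
the exact JIT source; the 10-byte movabs assumes a full 64-bit target address):

	/* Near target: a 5-byte relative call */
	e8 xx xx xx xx		call   rel32

	/* Far target (e.g. vmalloc -> kernel text): indirect through RBX */
	53			push   %rbx
	48 bb xx .. xx		movabs $target,%rbx
	ff d3			call   *%rbx	/* or a call to the retpoline thunk */
	5b			pop    %rbx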

Below are the results for the BPF unit test "BPF_MAXINSNS: Call heavy
transformations":

Config                  Duration 1      Duration 2
--------------------------------------------------
JIT Modules             13304           14224
JIT Vmalloc             33711           26877
Interpreter             31931           28180 
Retpoline JIT Modules   14517           13375
Retpoline JIT Vmalloc   65739           60281
Retpoline Interpreter   63097           61141

Since this benchmark is made up of only calls into kernel text, it suggests
that real-world performance should never be worse than the interpreter's.

The logic for when to use vmalloc is: use module space unless it is full, or
unless the caller lacks CAP_SYS_ADMIN and bpf_jit_limit is exceeded. So vmalloc
is only used when either the insertion would otherwise fail or BPF would fall
back to the interpreter, meaning there should be no runtime regression in these
scenarios.
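
In code, the intended policy is the allocator added in patch 4 (abridged):

	if (bpf_jit_charge_modmem(pages))	/* !CAP_SYS_ADMIN, over bpf_jit_limit */
		return vmalloc_exec(size);
	ret = module_alloc(size);
	if (!ret) {				/* module space is full */
		bpf_jit_uncharge_modmem(pages);
		return vmalloc_exec(size);
	}
	return ret;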

Todo:
 - This passes all the unit tests, but could use further verification that the
   compiler will still always converge.
 - Fix ARM after the "bpf: Charge bpf jit limit in bpf_jit_alloc_exec" change
 - Update documentation for bpf_jit_limit

Possible future optimizations:
 - Look at re-mapping some text in vmalloc so that calls can be in relative jump
   range. For example, a BPF library program could be re-mapped multiple times
   so that a copy is always near the caller, allowing the faster relative
   calls.


Rick Edgecombe (4):
  bpf, x64: Implement BPF call retpoline
  bpf, x64: Increase distance for bpf calls
  bpf: Charge bpf jit limit in bpf_jit_alloc_exec
  bpf, x64: Enable unprivileged jit in vmalloc

 arch/x86/include/asm/nospec-branch.h |  45 +++++++-
 arch/x86/net/bpf_jit_comp.c          | 149 ++++++++++++++++++++++-----
 include/linux/filter.h               |   3 +
 kernel/bpf/core.c                    |  20 ++--
 4 files changed, 183 insertions(+), 34 deletions(-)

-- 
2.17.1



* [RFC PATCH 1/4] bpf, x64: Implement BPF call retpoline
From: Rick Edgecombe @ 2019-02-05 22:51 UTC
  To: daniel, ast; +Cc: netdev, ard.biesheuvel, dave.hansen, kristen, Rick Edgecombe

Add the x86 call retpoline sequence from the "Intel Retpoline: A Branch
Target Injection Mitigation White Paper" to the BPF JIT compiler. Unlike the
paper's sequence, it uses RBX instead of RAX, since RAX is part of the BPF
calling convention.
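
A caller is expected to load the target into RBX first. A minimal usage sketch
(mirroring what patch 2 emits, using the JIT's existing emit_mov_imm64() helper
and the BPF_REG_6 -> RBX register mapping from bpf_jit_comp.c):

	EMIT1(0x53);				/* push %rbx */
	emit_mov_imm64(&prog, BPF_REG_6,	/* BPF_REG_6 maps to %rbx */
		       (u32)(func_addr >> 32), (u32)func_addr);
	RETPOLINE_RBX_BPF_JIT_CALL();		/* speculation-safe call *%rbx */
	EMIT1(0x5B);				/* pop %rbx */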

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/nospec-branch.h | 45 ++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index dad12b767ba0..70b3f6534134 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -320,6 +320,8 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
 #endif /* __ASSEMBLY__ */
 
+#ifdef CONFIG_RETPOLINE
+# ifdef CONFIG_X86_64
 /*
  * Below is used in the eBPF JIT compiler and emits the byte sequence
  * for the following assembly:
@@ -341,8 +343,6 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
  *    jmp *%rax for x86_64
  *    jmp *%edx for x86_32
  */
-#ifdef CONFIG_RETPOLINE
-# ifdef CONFIG_X86_64
 #  define RETPOLINE_RAX_BPF_JIT_SIZE	17
 #  define RETPOLINE_RAX_BPF_JIT()				\
 do {								\
@@ -355,6 +355,44 @@ do {								\
 	EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */	\
 	EMIT1(0xC3);             /* retq */			\
 } while (0)
+
+/* Modified from "Intel Retpoline: A Branch Target Injection Mitigation White
+ * Paper" to use RBX register in order to not intefere with BPF calling
+ * conventions:
+ *
+ *   jmp label2
+ *
+ * label0:
+ *   call label1
+ *
+ * capture_ret_spec:
+ *   pause
+ *   lfence
+ *   jmp capture_ret_spec
+ *
+ * label1:
+ *   mov %rbx,(%rsp)
+ *   ret
+ *
+ * label2:
+ *   call label0
+ */
+#  define RETPOLINE_RBX_BPF_JIT_CALL_SIZE	24
+#  define RETPOLINE_RBX_BPF_JIT_CALL()				\
+do {								\
+	EMIT2(0xEB, 0x11);	 /* jmp label2 */		\
+	/* label0: */						\
+	EMIT1_off32(0xE8, 7);	 /* call label1 */		\
+	/* capture_ret_spec: */					\
+	EMIT2(0xF3, 0x90);	 /* pause */			\
+	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
+	EMIT2(0xEB, 0xF9);	 /* jmp capture_ret_spec */	\
+	/* label1: */						\
+	EMIT4(0x48, 0x89, 0x1c, 0x24); /* mov %rbx,(%rsp) */	\
+	EMIT1(0xC3);		 /* ret */			\
+	/* label2: */						\
+	EMIT1_off32(0xE8, -22);	 /* call label0 */		\
+} while (0)
 # else /* !CONFIG_X86_64 */
 #  define RETPOLINE_EDX_BPF_JIT()				\
 do {								\
@@ -373,6 +411,9 @@ do {								\
 #  define RETPOLINE_RAX_BPF_JIT_SIZE	2
 #  define RETPOLINE_RAX_BPF_JIT()				\
 	EMIT2(0xFF, 0xE0);       /* jmp *%rax */
+#  define RETPOLINE_RBX_BPF_JIT_CALL_SIZE	2
+#  define RETPOLINE_RBX_BPF_JIT_CALL()				\
+	EMIT2(0xff, 0xD3)        /* call *%rbx */
 # else /* !CONFIG_X86_64 */
 #  define RETPOLINE_EDX_BPF_JIT()				\
 	EMIT2(0xFF, 0xE2)        /* jmp *%edx */
-- 
2.17.1



* [RFC PATCH 2/4] bpf, x64: Increase distance for bpf calls
From: Rick Edgecombe @ 2019-02-05 22:51 UTC
  To: daniel, ast; +Cc: netdev, ard.biesheuvel, dave.hansen, kristen, Rick Edgecombe

This allows BPF calls to be emitted that reach beyond the relative call
range. When a call cannot be emitted as a relative call, it is emitted as a
full indirect call.

The image has to be allocated before the call distance can be computed, and
the distance determines which type of call to emit. The two types of calls
have different sizes (a relative call is 5 bytes; with a full 64-bit target
the indirect form is 14 or 17 bytes: push %rbx, a 10-byte mov of the target
into %rbx, the call itself, and pop %rbx), so the image must be allowed to
shrink further after it is created. After the image is allocated, two more
passes are needed: one to pick up the new sizes and another to set the final
offsets.

So the algorithm in bpf_int_jit_compile is changed to always do at least two
more passes after the program first converges. The old check inside the loop,
which verified that the program length did not change after the image was
created, now catches the same scenario by checking, after the loop, whether
the image ever converged a second time before the maximum number of passes
was reached.
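
In outline, the resulting loop (a simplified sketch of the code below, error
handling omitted):

	converged_pass = 1;
	for (pass = 0; pass < 22 && pass < converged_pass + 2; pass++) {
		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
		if (proglen != oldproglen)		/* still changing */
			converged_pass = pass + 1;
		if (proglen == oldproglen && !image)	/* first convergence */
			header = bpf_jit_binary_alloc(proglen, &image, 1,
						      jit_fill_hole);
		oldproglen = proglen;
	}
	/* the image converged iff proglen == oldproglen at this point */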

In the case of retpoline, the call needs to be made through a retpoline
sequence. This sequence is emitted at the end of the program so that it
can be re-used by multiple calls.

When the module space is used, this change is intended to leave the emitted
code that is actually executed unchanged; however, with retpoline the
allocation may end up larger due to the thunk emitted at the end of the
program.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/net/bpf_jit_comp.c | 117 +++++++++++++++++++++++++++++-------
 1 file changed, 94 insertions(+), 23 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 5542303c43d9..c9781d471e31 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -13,6 +13,7 @@
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
 #include <linux/bpf.h>
+#include <linux/moduleloader.h>
 
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
@@ -408,13 +409,17 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
 	*pprog = prog;
 }
 
+#define RETPOL_THUNK_SIZE ((IS_ENABLED(CONFIG_RETPOLINE) \
+				* RETPOLINE_RBX_BPF_JIT_CALL_SIZE) + 1)
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		  int oldproglen, struct jit_context *ctx)
 {
 	struct bpf_insn *insn = bpf_prog->insnsi;
 	int insn_cnt = bpf_prog->len;
 	bool seen_exit = false;
-	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
+	const int retpol_thunk = oldproglen - RETPOL_THUNK_SIZE;
+	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY + RETPOL_THUNK_SIZE];
 	int i, cnt = 0;
 	int proglen = 0;
 	u8 *prog = temp;
@@ -430,7 +435,6 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		s64 jmp_offset;
 		u8 jmp_cond;
 		int ilen;
-		u8 *func;
 
 		switch (insn->code) {
 			/* ALU */
@@ -856,16 +860,49 @@ xadd:			if (is_imm8(insn->off))
 
 			/* call */
 		case BPF_JMP | BPF_CALL:
-			func = (u8 *) __bpf_call_base + imm32;
-			jmp_offset = func - (image + addrs[i]);
-			if (!imm32 || !is_simm32(jmp_offset)) {
-				pr_err("unsupported BPF func %d addr %p image %p\n",
-				       imm32, func, image);
-				return -EINVAL;
+		{
+			bool func_addr_fixed;
+			u64 func_addr;
+			int ret = bpf_jit_get_func_addr(bpf_prog, insn,
+				!image, &func_addr, &func_addr_fixed);
+			if (ret < 0)
+				return ret;
+
+			jmp_offset = func_addr - (u64)(image + addrs[i]);
+
+			/*
+			 * Need to know the allocation location before we can
+			 * know whether we can use a relative call or not.
+			 *
+			 * Always emit indirect version until we have the image
+			 * so the length will shrink.
+			 */
+			if (image && (imm32 && is_simm32(jmp_offset))) {
+				/* Emit relative call */
+
+				EMIT1_off32(0xE8, jmp_offset);
+			} else {
+				/* Emit indirect call */
+
+				EMIT1(0x53); /* push rbx */
+				emit_mov_imm64(&prog, BPF_REG_6,
+					(u32)(((u64)func_addr) >> 32),
+					(u32)(u64)func_addr);
+				/* -1 to account for pop rbx below */
+				jmp_offset = retpol_thunk - (addrs[i] - 1);
+
+				/*
+				 * If retpoline, call the retpoline thunk, or
+				 * else emit the default indirect call version.
+				 */
+				if (IS_ENABLED(CONFIG_RETPOLINE))
+					EMIT1_off32(0xE8, jmp_offset);
+				else
+					RETPOLINE_RBX_BPF_JIT_CALL();
+				EMIT1(0x5B); /* pop rbx */
 			}
-			EMIT1_off32(0xE8, jmp_offset);
 			break;
-
+		}
 		case BPF_JMP | BPF_TAIL_CALL:
 			emit_bpf_tail_call(&prog);
 			break;
@@ -1049,6 +1086,27 @@ xadd:			if (is_imm8(insn->off))
 		addrs[i] = proglen;
 		prog = temp;
 	}
+
+	/*
+	 * If this allocation is far from the kernel text, we may need
+	 * to do a retpoline call. Embed the chunk here so it can be
+	 * re-used by multiple calls.
+	 */
+	if (IS_ENABLED(CONFIG_RETPOLINE)) {
+		int new_len = RETPOLINE_RBX_BPF_JIT_CALL_SIZE + 1;
+
+		RETPOLINE_RBX_BPF_JIT_CALL();
+		EMIT1(0xC3); /* ret */
+
+		if (image) {
+			if (unlikely(proglen + new_len > oldproglen)) {
+				pr_err("bpf_jit: fatal error\n");
+				return -EFAULT;
+			}
+			memcpy(image + proglen, temp, new_len);
+		}
+		proglen += new_len;
+	}
 	return proglen;
 }
 
@@ -1073,6 +1131,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	int *addrs;
 	int pass;
 	int i;
+	int converged_pass;
 
 	if (!prog->jit_requested)
 		return orig_prog;
@@ -1127,10 +1186,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	/*
 	 * JITed image shrinks with every pass and the loop iterates
 	 * until the image stops shrinking. Very large BPF programs
-	 * may converge on the last pass. In such case do one more
-	 * pass to emit the final image.
+	 * may converge on the last pass. In such case one more
+	 * pass is needed to emit the final image.
+	 *
+	 * After the image memory is allocated there needs to be at least 2 more
+	 * passes as the emitted calls can shrink if they are in relative range.
+	 * The logic is, we always do at least 2 more passes after the image
+	 * converges once.
 	 */
-	for (pass = 0; pass < 20 || image; pass++) {
+	converged_pass = 1;
+	for (pass = 0; pass < 22 && pass < converged_pass + 2; pass++) {
 		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
 		if (proglen <= 0) {
 out_image:
@@ -1140,26 +1205,32 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			prog = orig_prog;
 			goto out_addrs;
 		}
-		if (image) {
-			if (proglen != oldproglen) {
-				pr_err("bpf_jit: proglen=%d != oldproglen=%d\n",
-				       proglen, oldproglen);
-				goto out_image;
-			}
-			break;
-		}
-		if (proglen == oldproglen) {
+
+		if (proglen != oldproglen)
+			converged_pass = pass + 1;
+
+		if (proglen == oldproglen && !image) {
 			header = bpf_jit_binary_alloc(proglen, &image,
-						      1, jit_fill_hole);
+						1, jit_fill_hole);
 			if (!header) {
 				prog = orig_prog;
 				goto out_addrs;
 			}
 		}
+
 		oldproglen = proglen;
 		cond_resched();
 	}
 
+	/* See if the image ever converged */
+	if (image) {
+		if (proglen != oldproglen) {
+			pr_err("bpf_jit: proglen=%d != oldproglen=%d\n",
+				proglen, oldproglen);
+			goto out_image;
+		}
+	}
+
 	if (bpf_jit_enable > 1)
 		bpf_jit_dump(prog->len, proglen, pass + 1, image);
 
-- 
2.17.1



* [RFC PATCH 3/4] bpf: Charge bpf jit limit in bpf_jit_alloc_exec
From: Rick Edgecombe @ 2019-02-05 22:51 UTC
  To: daniel, ast; +Cc: netdev, ard.biesheuvel, dave.hansen, kristen, Rick Edgecombe

This moves the calls to bpf_jit_charge_modmem and
bpf_jit_uncharge_modmem into bpf_jit_alloc_exec so the behavior can be
overridden for arch-specific capabilities, for example if the arch
supports allocating outside the module space.
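
As a sketch of what this enables (the arch hook my_arch_alloc_exec_area() is
hypothetical, not part of this series), an arch could serve some allocations
from its own region and skip the module-space charge for those:

	void *bpf_jit_alloc_exec(unsigned long size)
	{
		/* Hypothetical arch-specific executable region */
		void *ret = my_arch_alloc_exec_area(size);

		if (ret)	/* not charged against bpf_jit_limit */
			return ret;

		/* Fall back to module space, charged as before */
		if (bpf_jit_charge_modmem(size / PAGE_SIZE))
			return NULL;
		ret = module_alloc(size);
		if (!ret)
			bpf_jit_uncharge_modmem(size / PAGE_SIZE);
		return ret;
	}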

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 include/linux/filter.h |  3 +++
 kernel/bpf/core.c      | 20 +++++++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ad106d845b22..33c0ae5990e1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -859,6 +859,9 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 		     unsigned int alignment,
 		     bpf_jit_fill_hole_t bpf_fill_ill_insns);
 void bpf_jit_binary_free(struct bpf_binary_header *hdr);
+int bpf_jit_charge_modmem(u32 pages);
+void bpf_jit_uncharge_modmem(u32 pages);
+
 
 void bpf_jit_free(struct bpf_prog *fp);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index f908b9356025..ce08a09839c2 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -722,7 +722,7 @@ static int __init bpf_jit_charge_init(void)
 }
 pure_initcall(bpf_jit_charge_init);
 
-static int bpf_jit_charge_modmem(u32 pages)
+int bpf_jit_charge_modmem(u32 pages)
 {
 	if (atomic_long_add_return(pages, &bpf_jit_current) >
 	    (bpf_jit_limit >> PAGE_SHIFT)) {
@@ -735,14 +735,22 @@ static int bpf_jit_charge_modmem(u32 pages)
 	return 0;
 }
 
-static void bpf_jit_uncharge_modmem(u32 pages)
+void bpf_jit_uncharge_modmem(u32 pages)
 {
 	atomic_long_sub(pages, &bpf_jit_current);
 }
 
 void *__weak bpf_jit_alloc_exec(unsigned long size)
 {
-	return module_alloc(size);
+	void *ret;
+	u32 pages = size / PAGE_SIZE;
+
+	if (bpf_jit_charge_modmem(pages))
+		return NULL;
+	ret = module_alloc(size);
+	if (!ret)
+		bpf_jit_uncharge_modmem(pages);
+	return ret;
 }
 
 void __weak bpf_jit_free_exec(void *addr)
@@ -765,13 +773,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
 	pages = size / PAGE_SIZE;
 
-	if (bpf_jit_charge_modmem(pages))
-		return NULL;
 	hdr = bpf_jit_alloc_exec(size);
-	if (!hdr) {
-		bpf_jit_uncharge_modmem(pages);
-		return NULL;
-	}
 
 	/* Fill space with illegal/arch-dep instructions. */
 	bpf_fill_ill_insns(hdr, size);
-- 
2.17.1



* [RFC PATCH 4/4] bpf, x64: Enable unprivileged jit in vmalloc
From: Rick Edgecombe @ 2019-02-05 22:51 UTC
  To: daniel, ast; +Cc: netdev, ard.biesheuvel, dave.hansen, kristen, Rick Edgecombe

This enables unprivileged JIT allocations to be made in vmalloc space
when the bpf jit limit is exceeded.

The logic is: use module space unless it is full, or unless the caller
lacks CAP_SYS_ADMIN and bpf_jit_limit is exceeded, in which case use
vmalloc space. So vmalloc is only used when either the insertion would
otherwise fail or BPF would fall back to the interpreter.

When vmalloc is used, the allocation is not charged against bpf_jit_limit.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c9781d471e31..66d2b32a1db1 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1118,6 +1118,38 @@ struct x64_jit_data {
 	struct jit_context ctx;
 };
 
+void *bpf_jit_alloc_exec(unsigned long size)
+{
+	void *ret;
+	u32 pages = size / PAGE_SIZE;
+
+	/*
+	 * The logic is we use module space unless it is full or we are not
+	 * CAP_SYS_ADMIN and bpf_jit_limit is exceeded, in which case we use
+	 * vmalloc space.
+	 */
+	if (bpf_jit_charge_modmem(pages))
+		return vmalloc_exec(size);
+
+	ret = module_alloc(size);
+
+	if (!ret) {
+		bpf_jit_uncharge_modmem(pages);
+		/* If module space is full, try vmalloc */
+		return vmalloc_exec(size);
+	}
+
+	return ret;
+}
+
+void bpf_jit_free_exec(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		module_memfree(addr);
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_binary_header *header = NULL;
-- 
2.17.1



* Re: [RFC PATCH 0/4] Initial support for allocating BPF JITs in vmalloc for x86
From: Alexei Starovoitov @ 2019-02-06  0:35 UTC
  To: Rick Edgecombe, daniel; +Cc: netdev, ard.biesheuvel, dave.hansen, kristen

On 2/5/19 2:50 PM, Rick Edgecombe wrote:
> This introduces a new capability for BPF program JIT's to be located in vmalloc
> space on x86_64. This can serve as a backup area for CONFIG_BPF_JIT_ALWAYS_ON in
> case an unprivileged app uses all of the module space allowed by bpf_jit_limit.
> 
> In order to allow for calls from the increased distance of vmalloc from
> kernel/module space, relative calls are emitted as full indirect calls if the
> maximum relative call distance is exceeded. So the resulting performance of call
> BPF instructions in this case is similar to the BPF interpreter.

If I read this correctly, the patches introduce retpoline overhead
to direct function calls because JITed progs are more than 32 bits apart,
and they're far away only because of a dubious security concern?
Nack.



* Re: [RFC PATCH 0/4] Initial support for allocating BPF JITs in vmalloc for x86
From: Edgecombe, Rick P @ 2019-02-06  1:11 UTC
  To: daniel, ast; +Cc: kristen, netdev, ard.biesheuvel, Hansen, Dave

On Wed, 2019-02-06 at 00:35 +0000, Alexei Starovoitov wrote:
> On 2/5/19 2:50 PM, Rick Edgecombe wrote:
> > This introduces a new capability for BPF program JIT's to be located in
> > vmalloc
> > space on x86_64. This can serve as a backup area for
> > CONFIG_BPF_JIT_ALWAYS_ON in
> > case an unprivileged app uses all of the module space allowed by
> > bpf_jit_limit.
> > 
> > In order to allow for calls from the increased distance of vmalloc from
> > kernel/module space, relative calls are emitted as full indirect calls if
> > the
> > maximum relative call distance is exceeded. So the resulting performance of
> > call
> > BPF instructions in this case is similar to the BPF interpreter.
> 
> If I read this correctly the patches introduce retpoline overhead
> to direct function call because JITed progs are more than 32-bit apart
> and they're far away only because of dubious security concern ?
> Nack.
> 
There really isn't any overhead, because they are only far away if the module
space is full or the bpf_jit_limit is exceeded for non-admin. So in cases where
insertion would succeed today, it emits the same code; in cases where the
insertion would fail due to lack of space, it now at least works, with the
described performance.



* Re: [RFC PATCH 0/4] Initial support for allocating BPF JITs in vmalloc for x86
From: Alexei Starovoitov @ 2019-02-06  1:40 UTC
  To: Edgecombe, Rick P, daniel; +Cc: kristen, netdev, ard.biesheuvel, Hansen, Dave

On 2/5/19 5:11 PM, Edgecombe, Rick P wrote:
> On Wed, 2019-02-06 at 00:35 +0000, Alexei Starovoitov wrote:
>> On 2/5/19 2:50 PM, Rick Edgecombe wrote:
>>> This introduces a new capability for BPF program JIT's to be located in
>>> vmalloc
>>> space on x86_64. This can serve as a backup area for
>>> CONFIG_BPF_JIT_ALWAYS_ON in
>>> case an unprivileged app uses all of the module space allowed by
>>> bpf_jit_limit.
>>>
>>> In order to allow for calls from the increased distance of vmalloc from
>>> kernel/module space, relative calls are emitted as full indirect calls if
>>> the
>>> maximum relative call distance is exceeded. So the resulting performance of
>>> call
>>> BPF instructions in this case is similar to the BPF interpreter.
>>
>> If I read this correctly the patches introduce retpoline overhead
>> to direct function call because JITed progs are more than 32-bit apart
>> and they're far away only because of dubious security concern ?
>> Nack.
>>
> There really isn't any overhead, because they are only far away if the module
> space is full, or the bpf_jit_limit is exceeded for non-admin. So cases today
> when insertions would succeed it emits the same code, but cases where the
> insertion would fail due to lack of space, it now at least works with the
> described performance.

I disagree with the problem statement.
x86 classic BPF jit has been around forever and no one
complained that _unprivileged_ bpf progs exhaust module space.
With bpf_jit_limit we got an extra knob to close this
remote possibility of an attack.
It's more than enough.


* Re: [RFC PATCH 0/4] Initial support for allocating BPF JITs in vmalloc for x86
From: Daniel Borkmann @ 2019-02-06 15:38 UTC
  To: Alexei Starovoitov, Edgecombe, Rick P
  Cc: kristen, netdev, ard.biesheuvel, Hansen, Dave

On 02/06/2019 02:40 AM, Alexei Starovoitov wrote:
> On 2/5/19 5:11 PM, Edgecombe, Rick P wrote:
>> On Wed, 2019-02-06 at 00:35 +0000, Alexei Starovoitov wrote:
>>> On 2/5/19 2:50 PM, Rick Edgecombe wrote:
>>>> This introduces a new capability for BPF program JIT's to be located in
>>>> vmalloc
>>>> space on x86_64. This can serve as a backup area for
>>>> CONFIG_BPF_JIT_ALWAYS_ON in
>>>> case an unprivileged app uses all of the module space allowed by
>>>> bpf_jit_limit.
>>>>
>>>> In order to allow for calls from the increased distance of vmalloc from
>>>> kernel/module space, relative calls are emitted as full indirect calls if
>>>> the
>>>> maximum relative call distance is exceeded. So the resulting performance of
>>>> call
>>>> BPF instructions in this case is similar to the BPF interpreter.
>>>
>>> If I read this correctly the patches introduce retpoline overhead
>>> to direct function call because JITed progs are more than 32-bit apart
>>> and they're far away only because of dubious security concern ?
>>> Nack.
>>>
>> There really isn't any overhead, because they are only far away if the module
>> space is full, or the bpf_jit_limit is exceeded for non-admin. So cases today
>> when insertions would succeed it emits the same code, but cases where the
>> insertion would fail due to lack of space, it now at least works with the
>> described performance.
> 
> I disagree with the problem statement.
> x86 classic BPF jit has been around forever and no one
> complained that _unprivileged_ bpf progs exhaust module space.
> With bpf_jit_limit we got an extra knob to close this
> remote possibility of an attack.
> It's more than enough.

Agree here; also for cap_sys_admin programs there is a kind of unpredictability
when we cross the condition that module space is full and thus fall back to
having to emit retpolines for every call, which *is* overhead and hard-to-debug
behavior wrt performance regression from the user's pov. So I'd rather have it
fail and enlarge the module space instead, for example.

Thanks,
Daniel

