linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator
@ 2023-08-25 15:18 Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 15:18 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Patches 1 & 2 add the arch specific functions needed to support this
feature. Patch 3 enables the support for powerpc and ensures cleanup
is handled gracefully. Patch 4 introduces patch_instructions() that
optimizes some calls while patching more than one instruction. Patch 5
leverages this new function to improve time taken for JIT'ing BPF
programs.

Note that the first 3 patches are sufficient to enable the support
for bpf_prog_pack on powerpc. Patches 4 & 5 are to improve the JIT
compilation time of BPF programs on powerpc.

Thanks to Christophe and Song for reviewing v2 [2].

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
  calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
  with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
  enabling bpf_prog_pack support.
* Added few comments to improve code readability.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/all/20230309180028.180200-1-hbathini@linux.ibm.com/


Hari Bathini (5):
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
  powerpc/code-patching: introduce patch_instructions()
  powerpc/bpf: use patch_instructions()

 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c         |  94 ++++++++++++---
 arch/powerpc/net/bpf_jit.h               |  12 +-
 arch/powerpc/net/bpf_jit_comp.c          | 138 ++++++++++++++++++-----
 arch/powerpc/net/bpf_jit_comp32.c        |  13 ++-
 arch/powerpc/net/bpf_jit_comp64.c        |  10 +-
 6 files changed, 205 insertions(+), 63 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy
  2023-08-25 15:18 [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator Hari Bathini
@ 2023-08-25 15:18 ` Hari Bathini
  2023-08-25 15:29   ` Christophe Leroy
  2023-08-25 15:18 ` [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 15:18 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
multiple BPF programs to share the same page. Use patch_instruction()
to implement it.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 40 ++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..170ebf8ac0f2 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -13,9 +13,12 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
-#include <asm/kprobes.h>
+#include <linux/memory.h>
 #include <linux/bpf.h>
 
+#include <asm/kprobes.h>
+#include <asm/code-patching.h>
+
 #include "bpf_jit.h"
 
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
@@ -23,6 +26,27 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+/*
+ * Patch 'len' bytes of instructions from opcode to addr, one instruction
+ * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
+ */
+static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
+{
+	while (len > 0) {
+		ppc_inst_t insn = ppc_inst_read(opcode);
+		int ilen = ppc_inst_len(insn);
+
+		if (patch_instruction(addr, insn))
+			return ERR_PTR(-EINVAL);
+
+		len -= ilen;
+		addr = addr + ilen;
+		opcode = opcode + ilen;
+	}
+
+	return addr;
+}
+
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
 {
 	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
@@ -274,3 +298,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 	ctx->exentry_idx++;
 	return 0;
 }
+
+void *bpf_arch_text_copy(void *dst, void *src, size_t len)
+{
+	void *ret;
+
+	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+		return ERR_PTR(-EINVAL);
+
+	mutex_lock(&text_mutex);
+	ret = bpf_patch_instructions(dst, src, len);
+	mutex_unlock(&text_mutex);
+
+	return ret;
+}
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  2023-08-25 15:18 [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
@ 2023-08-25 15:18 ` Hari Bathini
  2023-08-25 15:33   ` Christophe Leroy
  2023-08-25 15:18 ` [PATCH v3 3/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 15:18 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Implement bpf_arch_text_invalidate and use it to fill unused part of
the bpf_prog_pack with trap instructions when a BPF program is freed.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 170ebf8ac0f2..7cd4cf53d61c 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -30,7 +30,7 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
  * Patch 'len' bytes of instructions from opcode to addr, one instruction
  * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
  */
-static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
+static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool fill_insn)
 {
 	while (len > 0) {
 		ppc_inst_t insn = ppc_inst_read(opcode);
@@ -41,7 +41,8 @@ static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
 
 		len -= ilen;
 		addr = addr + ilen;
-		opcode = opcode + ilen;
+		if (!fill_insn)
+			opcode = opcode + ilen;
 	}
 
 	return addr;
@@ -307,7 +308,22 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 		return ERR_PTR(-EINVAL);
 
 	mutex_lock(&text_mutex);
-	ret = bpf_patch_instructions(dst, src, len);
+	ret = bpf_patch_instructions(dst, src, len, false);
+	mutex_unlock(&text_mutex);
+
+	return ret;
+}
+
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+	u32 insn = BREAKPOINT_INSTRUCTION;
+	int ret;
+
+	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+		return -EINVAL;
+
+	mutex_lock(&text_mutex);
+	ret = IS_ERR(bpf_patch_instructions(dst, &insn, len, true));
 	mutex_unlock(&text_mutex);
 
 	return ret;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
  2023-08-25 15:18 [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
@ 2023-08-25 15:18 ` Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 4/5] powerpc/code-patching: introduce patch_instructions() Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 5/5] powerpc/bpf: use patch_instructions() Hari Bathini
  4 siblings, 0 replies; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 15:18 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
writes the program to the rw buffer. When the jit is done, the program
is copied to the final location with bpf_jit_binary_pack_finalize.
With multiple jit_subprogs, bpf_jit_free is called on some subprograms
that haven't got bpf_jit_binary_pack_finalize() yet. Implement custom
bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
if necessary. While here, correct the misnomer powerpc64_jit_data to
powerpc_jit_data as it is meant for both ppc32 and ppc64.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit.h        |  12 ++--
 arch/powerpc/net/bpf_jit_comp.c   | 110 ++++++++++++++++++++++--------
 arch/powerpc/net/bpf_jit_comp32.c |  13 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  10 +--
 4 files changed, 98 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 72b7bb34fade..02bd89aa8ea5 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -36,9 +36,6 @@
 		EMIT(PPC_RAW_BRANCH(offset));				      \
 	} while (0)
 
-/* bl (unconditional 'branch' with link) */
-#define PPC_BL(dest)	EMIT(PPC_RAW_BL((dest) - (unsigned long)(image + ctx->idx)))
-
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)					      \
 	do {								      \
@@ -169,16 +166,17 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
 		       u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
 
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
-			  int insn_idx, int jmp_off, int dst_reg);
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int pass,
+			  struct codegen_context *ctx, int insn_idx,
+			  int jmp_off, int dst_reg);
 
 #endif
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 7cd4cf53d61c..c60d7570e05d 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -64,10 +64,13 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
 	return 0;
 }
 
-struct powerpc64_jit_data {
-	struct bpf_binary_header *header;
+struct powerpc_jit_data {
+	/* address of rw header */
+	struct bpf_binary_header *hdr;
+	/* address of ro final header */
+	struct bpf_binary_header *fhdr;
 	u32 *addrs;
-	u8 *image;
+	u8 *fimage;
 	u32 proglen;
 	struct codegen_context ctx;
 };
@@ -84,15 +87,18 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	u8 *image = NULL;
 	u32 *code_base;
 	u32 *addrs;
-	struct powerpc64_jit_data *jit_data;
+	struct powerpc_jit_data *jit_data;
 	struct codegen_context cgctx;
 	int pass;
 	int flen;
-	struct bpf_binary_header *bpf_hdr;
+	struct bpf_binary_header *fhdr = NULL;
+	struct bpf_binary_header *hdr = NULL;
 	struct bpf_prog *org_fp = fp;
 	struct bpf_prog *tmp_fp;
 	bool bpf_blinded = false;
 	bool extra_pass = false;
+	u8 *fimage = NULL;
+	u32 *fcode_base;
 	u32 extable_len;
 	u32 fixup_len;
 
@@ -122,9 +128,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	addrs = jit_data->addrs;
 	if (addrs) {
 		cgctx = jit_data->ctx;
-		image = jit_data->image;
-		bpf_hdr = jit_data->header;
+		/*
+		 * JIT compiled to a writable location (image/code_base) first.
+		 * It is then moved to the readonly final location (fimage/fcode_base)
+		 * using instruction patching.
+		 */
+		fimage = jit_data->fimage;
+		fhdr = jit_data->fhdr;
 		proglen = jit_data->proglen;
+		hdr = jit_data->hdr;
+		image = (void *)hdr + ((void *)fimage - (void *)fhdr);
 		extra_pass = true;
 		/* During extra pass, ensure index is reset before repopulating extable entries */
 		cgctx.exentry_idx = 0;
@@ -144,7 +157,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0, false)) {
+	if (bpf_jit_build_body(fp, NULL, NULL, &cgctx, addrs, 0, false)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
 		goto out_addrs;
@@ -159,7 +172,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL || !is_offset_in_branch_range((long)cgctx.idx * 4)) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0, false)) {
+		if (bpf_jit_build_body(fp, NULL, NULL, &cgctx, addrs, 0, false)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -181,17 +194,19 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	proglen = cgctx.idx * 4;
 	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
 
-	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
-	if (!bpf_hdr) {
+	fhdr = bpf_jit_binary_pack_alloc(alloclen, &fimage, 4, &hdr, &image,
+					      bpf_jit_fill_ill_insns);
+	if (!fhdr) {
 		fp = org_fp;
 		goto out_addrs;
 	}
 
 	if (extable_len)
-		fp->aux->extable = (void *)image + FUNCTION_DESCR_SIZE + proglen + fixup_len;
+		fp->aux->extable = (void *)fimage + FUNCTION_DESCR_SIZE + proglen + fixup_len;
 
 skip_init_ctx:
 	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
+	fcode_base = (u32 *)(fimage + FUNCTION_DESCR_SIZE);
 
 	/* Code generation passes 1-2 */
 	for (pass = 1; pass < 3; pass++) {
@@ -199,8 +214,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		cgctx.idx = 0;
 		cgctx.alt_exit_addr = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, pass, extra_pass)) {
-			bpf_jit_binary_free(bpf_hdr);
+		if (bpf_jit_build_body(fp, code_base, fcode_base, &cgctx, addrs, pass,
+				       extra_pass)) {
+			bpf_arch_text_copy(&fhdr->size, &hdr->size, sizeof(hdr->size));
+			bpf_jit_binary_pack_free(fhdr, hdr);
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -220,17 +237,19 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 	/* Function descriptor nastiness: Address + TOC */
-	((u64 *)image)[0] = (u64)code_base;
+	((u64 *)image)[0] = (u64)fcode_base;
 	((u64 *)image)[1] = local_paca->kernel_toc;
 #endif
 
-	fp->bpf_func = (void *)image;
+	fp->bpf_func = (void *)fimage;
 	fp->jited = 1;
 	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
 
-	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + bpf_hdr->size);
 	if (!fp->is_func || extra_pass) {
-		bpf_jit_binary_lock_ro(bpf_hdr);
+		if (bpf_jit_binary_pack_finalize(fp, fhdr, hdr)) {
+			fp = org_fp;
+			goto out_addrs;
+		}
 		bpf_prog_fill_jited_linfo(fp, addrs);
 out_addrs:
 		kfree(addrs);
@@ -240,8 +259,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		jit_data->addrs = addrs;
 		jit_data->ctx = cgctx;
 		jit_data->proglen = proglen;
-		jit_data->image = image;
-		jit_data->header = bpf_hdr;
+		jit_data->fimage = fimage;
+		jit_data->fhdr = fhdr;
+		jit_data->hdr = hdr;
 	}
 
 out:
@@ -255,12 +275,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
  * The caller should check for (BPF_MODE(code) == BPF_PROBE_MEM) before calling
  * this function, as this only applies to BPF_PROBE_MEM, for now.
  */
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
-			  int insn_idx, int jmp_off, int dst_reg)
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int pass,
+			  struct codegen_context *ctx, int insn_idx, int jmp_off,
+			  int dst_reg)
 {
 	off_t offset;
 	unsigned long pc;
-	struct exception_table_entry *ex;
+	struct exception_table_entry *ex, *ex_entry;
 	u32 *fixup;
 
 	/* Populate extable entries only in the last pass */
@@ -271,9 +292,16 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 	    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
 		return -EINVAL;
 
+	/*
+	 * Program is first written to image before copying to the
+	 * final location (fimage). Accordingly, update in the image first.
+	 * As all offsets used are relative, copying as is to the
+	 * final location should be alright.
+	 */
 	pc = (unsigned long)&image[insn_idx];
+	ex = (void *)fp->aux->extable - (void *)fimage + (void *)image;
 
-	fixup = (void *)fp->aux->extable -
+	fixup = (void *)ex -
 		(fp->aux->num_exentries * BPF_FIXUP_LEN * 4) +
 		(ctx->exentry_idx * BPF_FIXUP_LEN * 4);
 
@@ -284,17 +312,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 	fixup[BPF_FIXUP_LEN - 1] =
 		PPC_RAW_BRANCH((long)(pc + jmp_off) - (long)&fixup[BPF_FIXUP_LEN - 1]);
 
-	ex = &fp->aux->extable[ctx->exentry_idx];
+	ex_entry = &ex[ctx->exentry_idx];
 
-	offset = pc - (long)&ex->insn;
+	offset = pc - (long)&ex_entry->insn;
 	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
 		return -ERANGE;
-	ex->insn = offset;
+	ex_entry->insn = offset;
 
-	offset = (long)fixup - (long)&ex->fixup;
+	offset = (long)fixup - (long)&ex_entry->fixup;
 	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
 		return -ERANGE;
-	ex->fixup = offset;
+	ex_entry->fixup = offset;
 
 	ctx->exentry_idx++;
 	return 0;
@@ -328,3 +356,27 @@ int bpf_arch_text_invalidate(void *dst, size_t len)
 
 	return ret;
 }
+
+void bpf_jit_free(struct bpf_prog *fp)
+{
+	if (fp->jited) {
+		struct powerpc_jit_data *jit_data = fp->aux->jit_data;
+		struct bpf_binary_header *hdr;
+
+		/*
+		 * If we fail the final pass of JIT (from jit_subprogs),
+		 * the program may not be finalized yet. Call finalize here
+		 * before freeing it.
+		 */
+		if (jit_data) {
+			bpf_jit_binary_pack_finalize(fp, jit_data->fhdr, jit_data->hdr);
+			kvfree(jit_data->addrs);
+			kfree(jit_data);
+		}
+		hdr = bpf_jit_binary_pack_hdr(fp);
+		bpf_jit_binary_pack_free(hdr, NULL);
+		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
+	}
+
+	bpf_prog_unlock_free(fp);
+}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 7f91ea064c08..434417c755fd 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -200,12 +200,13 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 	EMIT(PPC_RAW_BLR());
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)
+/* Relative offset needs to be calculated based on final image location */
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
 {
-	s32 rel = (s32)func - (s32)(image + ctx->idx);
+	s32 rel = (s32)func - (s32)(fimage + ctx->idx);
 
 	if (image && rel < 0x2000000 && rel >= -0x2000000) {
-		PPC_BL(func);
+		EMIT(PPC_RAW_BL(rel));
 	} else {
 		/* Load function address into r0 */
 		EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
@@ -278,7 +279,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
 }
 
 /* Assemble the body code between the prologue & epilogue */
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
 		       u32 *addrs, int pass, bool extra_pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
@@ -997,7 +998,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 					jmp_off += 4;
 				}
 
-				ret = bpf_add_extable_entry(fp, image, pass, ctx, insn_idx,
+				ret = bpf_add_extable_entry(fp, image, fimage, pass, ctx, insn_idx,
 							    jmp_off, dst_reg);
 				if (ret)
 					return ret;
@@ -1053,7 +1054,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 				EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_5), _R1, 12));
 			}
 
-			ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+			ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
 			if (ret)
 				return ret;
 
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 0f8048f6dad6..79f23974a320 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -240,7 +240,7 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u
 	return 0;
 }
 
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
 {
 	unsigned int i, ctx_idx = ctx->idx;
 
@@ -361,7 +361,7 @@ asm (
 );
 
 /* Assemble the body code between the prologue & epilogue */
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
 		       u32 *addrs, int pass, bool extra_pass)
 {
 	enum stf_barrier_type stf_barrier = stf_barrier_type_get();
@@ -940,8 +940,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 				addrs[++i] = ctx->idx * 4;
 
 			if (BPF_MODE(code) == BPF_PROBE_MEM) {
-				ret = bpf_add_extable_entry(fp, image, pass, ctx, ctx->idx - 1,
-							    4, dst_reg);
+				ret = bpf_add_extable_entry(fp, image, fimage, pass, ctx,
+							    ctx->idx - 1, 4, dst_reg);
 				if (ret)
 					return ret;
 			}
@@ -995,7 +995,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			if (func_addr_fixed)
 				ret = bpf_jit_emit_func_call_hlp(image, ctx, func_addr);
 			else
-				ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+				ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
 
 			if (ret)
 				return ret;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/5] powerpc/code-patching: introduce patch_instructions()
  2023-08-25 15:18 [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator Hari Bathini
                   ` (2 preceding siblings ...)
  2023-08-25 15:18 ` [PATCH v3 3/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
@ 2023-08-25 15:18 ` Hari Bathini
  2023-08-25 15:18 ` [PATCH v3 5/5] powerpc/bpf: use patch_instructions() Hari Bathini
  4 siblings, 0 replies; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 15:18 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/lib/code-patching.c         | 94 ++++++++++++++++++++----
 2 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..4f5f0c091586 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(void *addr, void *code, size_t len, bool fill_insn);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..60d446e16b42 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -278,20 +278,25 @@ static void unmap_patch_area(unsigned long addr)
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool fill_insn)
 {
-	int err;
-	u32 *patch_addr;
 	unsigned long text_poke_addr;
 	pte_t *pte;
 	unsigned long pfn = get_patch_pfn(addr);
 	struct mm_struct *patching_mm;
 	struct mm_struct *orig_mm;
+	ppc_inst_t instr;
+	void *patch_addr;
 	spinlock_t *ptl;
+	int ilen, err;
 
 	patching_mm = __this_cpu_read(cpu_patching_context.mm);
 	text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
-	patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+	patch_addr = (void *)(text_poke_addr + offset_in_page(addr));
 
 	pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
 	if (!pte)
@@ -307,11 +312,22 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
 
 	orig_mm = start_using_temp_mm(patching_mm);
 
-	err = __patch_instruction(addr, instr, patch_addr);
+	while (len > 0) {
+		instr = ppc_inst_read(code);
+		ilen = ppc_inst_len(instr);
+		err = __patch_instruction(addr, instr, patch_addr);
+		/* hwsync performed by __patch_instruction (sync) if successful */
+		if (err) {
+			mb();  /* sync */
+			break;
+		}
 
-	/* hwsync performed by __patch_instruction (sync) if successful */
-	if (err)
-		mb();  /* sync */
+		len -= ilen;
+		patch_addr = patch_addr + ilen;
+		addr = (void *)addr + ilen;
+		if (!fill_insn)
+			code = code + ilen;
+	}
 
 	/* context synchronisation performed by __patch_instruction (isync or exception) */
 	stop_using_temp_mm(patching_mm, orig_mm);
@@ -328,16 +344,21 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
 	return err;
 }
 
-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool fill_insn)
 {
-	int err;
-	u32 *patch_addr;
 	unsigned long text_poke_addr;
 	pte_t *pte;
 	unsigned long pfn = get_patch_pfn(addr);
+	void *patch_addr;
+	ppc_inst_t instr;
+	int ilen, err;
 
 	text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
-	patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+	patch_addr = (void *)(text_poke_addr + offset_in_page(addr));
 
 	pte = __this_cpu_read(cpu_patching_context.pte);
 	__set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
@@ -345,7 +366,19 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
 
-	err = __patch_instruction(addr, instr, patch_addr);
+	while (len > 0) {
+		instr = ppc_inst_read(code);
+		ilen = ppc_inst_len(instr);
+		err = __patch_instruction(addr, instr, patch_addr);
+		if (err)
+			break;
+
+		len -= ilen;
+		patch_addr = patch_addr + ilen;
+		addr = (void *)addr + ilen;
+		if (!fill_insn)
+			code = code + ilen;
+	}
 
 	pte_clear(&init_mm, text_poke_addr, pte);
 	flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
@@ -369,15 +402,46 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 
 	local_irq_save(flags);
 	if (mm_patch_enabled())
-		err = __do_patch_instruction_mm(addr, instr);
+		err = __do_patch_instructions_mm(addr, &instr, ppc_inst_len(instr), false);
 	else
-		err = __do_patch_instruction(addr, instr);
+		err = __do_patch_instructions(addr, &instr, ppc_inst_len(instr), false);
 	local_irq_restore(flags);
 
 	return err;
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+/*
+ * Patch 'addr' with 'len' bytes of instructions from 'code'.
+ */
+int patch_instructions(void *addr, void *code, size_t len, bool fill_insn)
+{
+	unsigned long flags;
+	size_t plen;
+	int err;
+
+	while (len > 0) {
+		plen = min_t(size_t, PAGE_SIZE - offset_in_page(addr), len);
+
+		local_irq_save(flags);
+		if (mm_patch_enabled())
+			err = __do_patch_instructions_mm(addr, code, plen, fill_insn);
+		else
+			err = __do_patch_instructions(addr, code, plen, fill_insn);
+		local_irq_restore(flags);
+		if (err)
+			break;
+
+		len -= plen;
+		addr = addr + plen;
+		if (!fill_insn)
+			code = code + plen;
+	}
+
+	return err;
+}
+NOKPROBE_SYMBOL(patch_instructions);
+
 int patch_branch(u32 *addr, unsigned long target, int flags)
 {
 	ppc_inst_t instr;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 5/5] powerpc/bpf: use patch_instructions()
  2023-08-25 15:18 [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator Hari Bathini
                   ` (3 preceding siblings ...)
  2023-08-25 15:18 ` [PATCH v3 4/5] powerpc/code-patching: introduce patch_instructions() Hari Bathini
@ 2023-08-25 15:18 ` Hari Bathini
  2023-08-25 15:46   ` Christophe Leroy
  4 siblings, 1 reply; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 15:18 UTC (permalink / raw)
  To: linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao

Use the newly introduced patch_instructions() that handles patching
multiple instructions with one call. This improves speed of exectution
for JIT'ing bpf programs.

Without this patch (on a POWER9 lpar):

  # time modprobe test_bpf
  real    2m59.681s
  user    0m0.000s
  sys     1m44.160s
  #

With this patch (on a POWER9 lpar):

  # time modprobe test_bpf
  real    0m5.013s
  user    0m0.000s
  sys     0m4.216s
  #

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c60d7570e05d..1e5000d18321 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -26,28 +26,6 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
 	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
-/*
- * Patch 'len' bytes of instructions from opcode to addr, one instruction
- * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
- */
-static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool fill_insn)
-{
-	while (len > 0) {
-		ppc_inst_t insn = ppc_inst_read(opcode);
-		int ilen = ppc_inst_len(insn);
-
-		if (patch_instruction(addr, insn))
-			return ERR_PTR(-EINVAL);
-
-		len -= ilen;
-		addr = addr + ilen;
-		if (!fill_insn)
-			opcode = opcode + ilen;
-	}
-
-	return addr;
-}
-
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
 {
 	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
@@ -330,16 +308,16 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int pass
 
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
-	void *ret;
+	int err;
 
 	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
 		return ERR_PTR(-EINVAL);
 
 	mutex_lock(&text_mutex);
-	ret = bpf_patch_instructions(dst, src, len, false);
+	err = patch_instructions(dst, src, len, false);
 	mutex_unlock(&text_mutex);
 
-	return ret;
+	return err ? ERR_PTR(err) : dst;
 }
 
 int bpf_arch_text_invalidate(void *dst, size_t len)
@@ -351,7 +329,7 @@ int bpf_arch_text_invalidate(void *dst, size_t len)
 		return -EINVAL;
 
 	mutex_lock(&text_mutex);
-	ret = IS_ERR(bpf_patch_instructions(dst, &insn, len, true));
+	ret = patch_instructions(dst, &insn, len, true);
 	mutex_unlock(&text_mutex);
 
 	return ret;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy
  2023-08-25 15:18 ` [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
@ 2023-08-25 15:29   ` Christophe Leroy
  0 siblings, 0 replies; 11+ messages in thread
From: Christophe Leroy @ 2023-08-25 15:29 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao



Le 25/08/2023 à 17:18, Hari Bathini a écrit :
> bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
> multiple BPF programs to share the same page. Use patch_instruction()
> to implement it.

By using patch_instruction() for doing that you are mapping and 
unmapping the same page for each instruction you copy. You should 
implement something to map a page, copy all instruction going into that 
page, unmap the page and map next page, etc ....
And then perform the icache flush at the end.

> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>   arch/powerpc/net/bpf_jit_comp.c | 40 ++++++++++++++++++++++++++++++++-
>   1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 37043dfc1add..170ebf8ac0f2 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -13,9 +13,12 @@
>   #include <linux/netdevice.h>
>   #include <linux/filter.h>
>   #include <linux/if_vlan.h>
> -#include <asm/kprobes.h>
> +#include <linux/memory.h>
>   #include <linux/bpf.h>
>   
> +#include <asm/kprobes.h>
> +#include <asm/code-patching.h>
> +
>   #include "bpf_jit.h"
>   
>   static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> @@ -23,6 +26,27 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>   	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
>   }
>   
> +/*
> + * Patch 'len' bytes of instructions from opcode to addr, one instruction
> + * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
> + */
> +static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
> +{
> +	while (len > 0) {
> +		ppc_inst_t insn = ppc_inst_read(opcode);
> +		int ilen = ppc_inst_len(insn);
> +
> +		if (patch_instruction(addr, insn))
> +			return ERR_PTR(-EINVAL);
> +
> +		len -= ilen;
> +		addr = addr + ilen;
> +		opcode = opcode + ilen;
> +	}
> +
> +	return addr;
> +}
> +
>   int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
>   {
>   	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
> @@ -274,3 +298,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
>   	ctx->exentry_idx++;
>   	return 0;
>   }
> +
> +void *bpf_arch_text_copy(void *dst, void *src, size_t len)
> +{
> +	void *ret;
> +
> +	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
> +		return ERR_PTR(-EINVAL);
> +
> +	mutex_lock(&text_mutex);
> +	ret = bpf_patch_instructions(dst, src, len);
> +	mutex_unlock(&text_mutex);
> +
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  2023-08-25 15:18 ` [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
@ 2023-08-25 15:33   ` Christophe Leroy
  2023-08-25 17:37     ` Hari Bathini
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2023-08-25 15:33 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao



Le 25/08/2023 à 17:18, Hari Bathini a écrit :
> Implement bpf_arch_text_invalidate and use it to fill unused part of
> the bpf_prog_pack with trap instructions when a BPF program is freed.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>   arch/powerpc/net/bpf_jit_comp.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 170ebf8ac0f2..7cd4cf53d61c 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -30,7 +30,7 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>    * Patch 'len' bytes of instructions from opcode to addr, one instruction
>    * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
>    */
> -static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
> +static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool fill_insn)

It's a pitty that you have to modify in patch 2 a function you have 
added in patch 1 of the same series. Can't you have it right from the 
begining ?

>   {
>   	while (len > 0) {
>   		ppc_inst_t insn = ppc_inst_read(opcode);
> @@ -41,7 +41,8 @@ static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
>   
>   		len -= ilen;
>   		addr = addr + ilen;
> -		opcode = opcode + ilen;
> +		if (!fill_insn)
> +			opcode = opcode + ilen;
>   	}
>   
>   	return addr;
> @@ -307,7 +308,22 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
>   		return ERR_PTR(-EINVAL);
>   
>   	mutex_lock(&text_mutex);
> -	ret = bpf_patch_instructions(dst, src, len);
> +	ret = bpf_patch_instructions(dst, src, len, false);
> +	mutex_unlock(&text_mutex);
> +
> +	return ret;
> +}
> +
> +int bpf_arch_text_invalidate(void *dst, size_t len)
> +{
> +	u32 insn = BREAKPOINT_INSTRUCTION;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
> +		return -EINVAL;
> +
> +	mutex_lock(&text_mutex);
> +	ret = IS_ERR(bpf_patch_instructions(dst, &insn, len, true));

Why IS_ERR ?

As far as I understand from the weak definition in kernel/bpf/core.c, 
this function is supposed to return an error, not a bool.

>   	mutex_unlock(&text_mutex);
>   
>   	return ret;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 5/5] powerpc/bpf: use patch_instructions()
  2023-08-25 15:18 ` [PATCH v3 5/5] powerpc/bpf: use patch_instructions() Hari Bathini
@ 2023-08-25 15:46   ` Christophe Leroy
  2023-08-25 17:43     ` Hari Bathini
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2023-08-25 15:46 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao



Le 25/08/2023 à 17:18, Hari Bathini a écrit :
> Use the newly introduced patch_instructions() that handles patching
> multiple instructions with one call. This improves speed of exectution
> for JIT'ing bpf programs.
> 
> Without this patch (on a POWER9 lpar):
> 
>    # time modprobe test_bpf
>    real    2m59.681s
>    user    0m0.000s
>    sys     1m44.160s
>    #
> 
> With this patch (on a POWER9 lpar):
> 
>    # time modprobe test_bpf
>    real    0m5.013s
>    user    0m0.000s
>    sys     0m4.216s
>    #

Right, significant improvement. Forget by comment to patch 1, I should 
have read the series up to the end. Just wondering why you don't just 
put patch 4 up front ?

Christophe

> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>   arch/powerpc/net/bpf_jit_comp.c | 30 ++++--------------------------
>   1 file changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index c60d7570e05d..1e5000d18321 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -26,28 +26,6 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>   	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
>   }
>   
> -/*
> - * Patch 'len' bytes of instructions from opcode to addr, one instruction
> - * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
> - */
> -static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool fill_insn)
> -{
> -	while (len > 0) {
> -		ppc_inst_t insn = ppc_inst_read(opcode);
> -		int ilen = ppc_inst_len(insn);
> -
> -		if (patch_instruction(addr, insn))
> -			return ERR_PTR(-EINVAL);
> -
> -		len -= ilen;
> -		addr = addr + ilen;
> -		if (!fill_insn)
> -			opcode = opcode + ilen;
> -	}
> -
> -	return addr;
> -}
> -
>   int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
>   {
>   	if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
> @@ -330,16 +308,16 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int pass
>   
>   void *bpf_arch_text_copy(void *dst, void *src, size_t len)
>   {
> -	void *ret;
> +	int err;
>   
>   	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
>   		return ERR_PTR(-EINVAL);
>   
>   	mutex_lock(&text_mutex);
> -	ret = bpf_patch_instructions(dst, src, len, false);
> +	err = patch_instructions(dst, src, len, false);
>   	mutex_unlock(&text_mutex);
>   
> -	return ret;
> +	return err ? ERR_PTR(err) : dst;
>   }
>   
>   int bpf_arch_text_invalidate(void *dst, size_t len)
> @@ -351,7 +329,7 @@ int bpf_arch_text_invalidate(void *dst, size_t len)
>   		return -EINVAL;
>   
>   	mutex_lock(&text_mutex);
> -	ret = IS_ERR(bpf_patch_instructions(dst, &insn, len, true));
> +	ret = patch_instructions(dst, &insn, len, true);
>   	mutex_unlock(&text_mutex);
>   
>   	return ret;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  2023-08-25 15:33   ` Christophe Leroy
@ 2023-08-25 17:37     ` Hari Bathini
  0 siblings, 0 replies; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 17:37 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf
  Cc: Naveen N. Rao, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko



On 25/08/23 9:03 pm, Christophe Leroy wrote:
> 
> 
> Le 25/08/2023 à 17:18, Hari Bathini a écrit :
>> Implement bpf_arch_text_invalidate and use it to fill unused part of
>> the bpf_prog_pack with trap instructions when a BPF program is freed.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>    arch/powerpc/net/bpf_jit_comp.c | 22 +++++++++++++++++++---
>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
>> index 170ebf8ac0f2..7cd4cf53d61c 100644
>> --- a/arch/powerpc/net/bpf_jit_comp.c
>> +++ b/arch/powerpc/net/bpf_jit_comp.c
>> @@ -30,7 +30,7 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
>>     * Patch 'len' bytes of instructions from opcode to addr, one instruction
>>     * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
>>     */
>> -static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
>> +static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool fill_insn)
> 
> It's a pitty that you have to modify in patch 2 a function you have
> added in patch 1 of the same series. Can't you have it right from the
> begining ?
> 
>>    {
>>    	while (len > 0) {
>>    		ppc_inst_t insn = ppc_inst_read(opcode);
>> @@ -41,7 +41,8 @@ static void *bpf_patch_instructions(void *addr, void *opcode, size_t len)
>>    
>>    		len -= ilen;
>>    		addr = addr + ilen;
>> -		opcode = opcode + ilen;
>> +		if (!fill_insn)
>> +			opcode = opcode + ilen;
>>    	}
>>    
>>    	return addr;
>> @@ -307,7 +308,22 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
>>    		return ERR_PTR(-EINVAL);
>>    
>>    	mutex_lock(&text_mutex);
>> -	ret = bpf_patch_instructions(dst, src, len);
>> +	ret = bpf_patch_instructions(dst, src, len, false);
>> +	mutex_unlock(&text_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +int bpf_arch_text_invalidate(void *dst, size_t len)
>> +{
>> +	u32 insn = BREAKPOINT_INSTRUCTION;
>> +	int ret;
>> +
>> +	if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&text_mutex);
>> +	ret = IS_ERR(bpf_patch_instructions(dst, &insn, len, true));
> 
> Why IS_ERR ?
> 
> As far as I understand from the weak definition in kernel/bpf/core.c,
> this function is supposed to return an error, not a bool.

My bad! Will fix that in the next revision.

- Hari

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 5/5] powerpc/bpf: use patch_instructions()
  2023-08-25 15:46   ` Christophe Leroy
@ 2023-08-25 17:43     ` Hari Bathini
  0 siblings, 0 replies; 11+ messages in thread
From: Hari Bathini @ 2023-08-25 17:43 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, bpf
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Naveen N. Rao



On 25/08/23 9:16 pm, Christophe Leroy wrote:
> 
> 
> Le 25/08/2023 à 17:18, Hari Bathini a écrit :
>> Use the newly introduced patch_instructions() that handles patching
>> multiple instructions with one call. This improves speed of exectution
>> for JIT'ing bpf programs.
>>
>> Without this patch (on a POWER9 lpar):
>>
>>     # time modprobe test_bpf
>>     real    2m59.681s
>>     user    0m0.000s
>>     sys     1m44.160s
>>     #
>>
>> With this patch (on a POWER9 lpar):
>>
>>     # time modprobe test_bpf
>>     real    0m5.013s
>>     user    0m0.000s
>>     sys     0m4.216s
>>     #
> 
> Right, significant improvement. Forget by comment to patch 1, I should
> have read the series up to the end. Just wondering why you don't just
> put patch 4 up front ?

I wanted to remove the dependency for bpf_prog_pack enablement
patches with this improvement, just in case..

- Hari

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-08-25 17:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-25 15:18 [PATCH 0/5] powerpc/bpf: use BPF prog pack allocator Hari Bathini
2023-08-25 15:18 ` [PATCH v3 1/5] powerpc/bpf: implement bpf_arch_text_copy Hari Bathini
2023-08-25 15:29   ` Christophe Leroy
2023-08-25 15:18 ` [PATCH v3 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack Hari Bathini
2023-08-25 15:33   ` Christophe Leroy
2023-08-25 17:37     ` Hari Bathini
2023-08-25 15:18 ` [PATCH v3 3/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] Hari Bathini
2023-08-25 15:18 ` [PATCH v3 4/5] powerpc/code-patching: introduce patch_instructions() Hari Bathini
2023-08-25 15:18 ` [PATCH v3 5/5] powerpc/bpf: use patch_instructions() Hari Bathini
2023-08-25 15:46   ` Christophe Leroy
2023-08-25 17:43     ` Hari Bathini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).