* [RFC PATCH 0/3] powerpc64/bpf: Add support for BPF Trampolines
@ 2022-02-07  7:07 ` Naveen N. Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-07  7:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Michael Ellerman, Steven Rostedt
  Cc: Yauheni Kaliuta, Jordan Niethe, linuxppc-dev, bpf, Jiri Olsa,
	Hari Bathini

This is an early RFC series that adds support for BPF trampolines on
powerpc64. Some of the selftests are passing for me, but this needs more
testing and I've likely missed a few things as well. A review of the
patches and feedback on the overall approach would be great.

This series depends on some of the BPF JIT fixes and enhancements
posted previously, as well as on the ftrace direct call enablement for
powerpc, which has also been posted earlier.


- Naveen


Naveen N. Rao (3):
  ftrace: Add ftrace_location_lookup() to lookup address of ftrace
    location
  powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  powerpc64/bpf: Add support for bpf trampolines

 arch/powerpc/kernel/kprobes.c      |   8 +-
 arch/powerpc/kernel/trace/ftrace.c |  11 +
 arch/powerpc/net/bpf_jit.h         |   8 +
 arch/powerpc/net/bpf_jit_comp.c    |   5 +-
 arch/powerpc/net/bpf_jit_comp64.c  | 619 ++++++++++++++++++++++++++++-
 include/linux/ftrace.h             |   5 +
 kernel/bpf/trampoline.c            |  27 +-
 kernel/trace/ftrace.c              |  14 +
 8 files changed, 670 insertions(+), 27 deletions(-)


base-commit: 33ecb3e590194051dc57eee1125c1d372b14c946
-- 
2.34.1


* [RFC PATCH 1/3] ftrace: Add ftrace_location_lookup() to lookup address of ftrace location
  2022-02-07  7:07 ` Naveen N. Rao
@ 2022-02-07  7:07   ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-07  7:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Michael Ellerman, Steven Rostedt
  Cc: Yauheni Kaliuta, Jordan Niethe, linuxppc-dev, bpf, Jiri Olsa,
	Hari Bathini

Add a new function ftrace_location_lookup() that can be used to
determine the exact ftrace location around function entry. This is
useful on architectures where the ftrace location is not the very first
instruction in a function. Such architectures can override this function
to search for the ftrace location and return its exact address.

Convert some uses of ftrace_location() in the BPF infrastructure to the
new function.
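
As an illustrative aid, here is a hedged sketch of the intended calling
convention. The helper below is hypothetical and only mirrors the
register_fentry() conversion in the diff; callers use whatever address
the lookup returns and fall back to the original ip when the function is
not ftrace-managed.

  static void *resolve_attach_addr(void *ip, bool *ftrace_managed)
  {
          unsigned long faddr = ftrace_location_lookup((unsigned long)ip);

          /* non-zero means an ftrace location was found near function entry */
          *ftrace_managed = faddr != 0;
          return faddr ? (void *)faddr : ip;
  }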

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 include/linux/ftrace.h  |  5 +++++
 kernel/bpf/trampoline.c | 27 +++++++++------------------
 kernel/trace/ftrace.c   | 14 ++++++++++++++
 3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 708e9d610f1337..59791f2aa0b356 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -582,6 +582,7 @@ int ftrace_test_record(struct dyn_ftrace *rec, bool enable);
 void ftrace_run_stop_machine(int command);
 unsigned long ftrace_location(unsigned long ip);
 unsigned long ftrace_location_range(unsigned long start, unsigned long end);
+unsigned long ftrace_location_lookup(unsigned long ip);
 unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec);
 unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec);
 
@@ -795,6 +796,10 @@ static inline unsigned long ftrace_location(unsigned long ip)
 {
 	return 0;
 }
+static inline unsigned long ftrace_location_lookup(unsigned long ip)
+{
+	return 0;
+}
 
 /*
  * Again users of functions that have ftrace_ops may not
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 4b6974a195c138..5da9d332cd0e10 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -117,25 +117,14 @@ static void bpf_trampoline_module_put(struct bpf_trampoline *tr)
 	tr->mod = NULL;
 }
 
-static int is_ftrace_location(void *ip)
-{
-	long addr;
-
-	addr = ftrace_location((long)ip);
-	if (!addr)
-		return 0;
-	if (WARN_ON_ONCE(addr != (long)ip))
-		return -EFAULT;
-	return 1;
-}
-
 static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
 {
 	void *ip = tr->func.addr;
 	int ret;
 
 	if (tr->func.ftrace_managed)
-		ret = unregister_ftrace_direct((long)ip, (long)old_addr);
+		ret = unregister_ftrace_direct(ftrace_location_lookup((unsigned long)ip),
+									(long)old_addr);
 	else
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
 
@@ -150,7 +139,8 @@ static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_ad
 	int ret;
 
 	if (tr->func.ftrace_managed)
-		ret = modify_ftrace_direct((long)ip, (long)old_addr, (long)new_addr);
+		ret = modify_ftrace_direct(ftrace_location_lookup((unsigned long)ip),
+						(long)old_addr, (long)new_addr);
 	else
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr);
 	return ret;
@@ -162,10 +152,11 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
 	void *ip = tr->func.addr;
 	int ret;
 
-	ret = is_ftrace_location(ip);
-	if (ret < 0)
-		return ret;
-	tr->func.ftrace_managed = ret;
+	ip = (void *)ftrace_location_lookup((unsigned long)ip);
+	tr->func.ftrace_managed = !!ip;
+
+	if (!ip)
+		ip = tr->func.addr;
 
 	if (bpf_trampoline_module_get(tr))
 		return -ENOENT;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index ff57a842fbebcd..6a68b86b2b6ac6 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1581,6 +1581,20 @@ unsigned long ftrace_location(unsigned long ip)
 	return ftrace_location_range(ip, ip);
 }
 
+/**
+ * ftrace_location_lookup - return exact address of traced location
+ * @ip: the instruction pointer to check
+ *
+ * Used to lookup traced location around function entry. This is
+ * especially useful on architectures where the traced location is
+ * not the very first instruction in a function. Such architectures
+ * should provide an implementation of this function.
+ */
+unsigned long __weak ftrace_location_lookup(unsigned long ip)
+{
+	return ftrace_location_range(ip, ip);
+}
+
 /**
  * ftrace_text_reserved - return true if range contains an ftrace location
  * @start: start of range to search
-- 
2.34.1


* [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-07  7:07 ` Naveen N. Rao
@ 2022-02-07  7:07   ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-07  7:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Michael Ellerman, Steven Rostedt
  Cc: Yauheni Kaliuta, Jordan Niethe, linuxppc-dev, bpf, Jiri Olsa,
	Hari Bathini

With CONFIG_MPROFILE_KERNEL, the ftrace location is within the first 5
instructions of a function. Override ftrace_location_lookup() to search
this range for the exact ftrace location.

Also convert kprobe_lookup_name() to utilize this function.
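
For illustration, a hedged sketch of how a caller can now resolve the
exact traced address once this override is in place. The helper below is
hypothetical and simply mirrors the kprobe_lookup_name() hunk in the
diff; it is not additional code in this patch.

  static kprobe_opcode_t *exact_ftrace_site(kprobe_opcode_t *addr)
  {
          unsigned long faddr = ftrace_location_lookup((unsigned long)addr);

          /* NULL indicates that the function entry is not ftrace-managed */
          return faddr ? (kprobe_opcode_t *)faddr : NULL;
  }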

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/kprobes.c      |  8 +-------
 arch/powerpc/kernel/trace/ftrace.c | 11 +++++++++++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9a492fdec1dfbe..03cb50e4c8c3e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -50,13 +50,7 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
 	if (addr && !offset) {
 #ifdef CONFIG_KPROBES_ON_FTRACE
-		unsigned long faddr;
-		/*
-		 * Per livepatch.h, ftrace location is always within the first
-		 * 16 bytes of a function on powerpc with -mprofile-kernel.
-		 */
-		faddr = ftrace_location_range((unsigned long)addr,
-					      (unsigned long)addr + 16);
+		unsigned long faddr = ftrace_location_lookup((unsigned long)addr);
 		if (faddr)
 			addr = (kprobe_opcode_t *)faddr;
 		else
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index b65ca87a2eacb1..5127eb65c299af 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
 		return str;
 }
 #endif /* PPC64_ELF_ABI_v1 */
+
+#ifdef CONFIG_MPROFILE_KERNEL
+unsigned long ftrace_location_lookup(unsigned long ip)
+{
+	/*
+	 * Per livepatch.h, ftrace location is always within the first
+	 * 16 bytes of a function on powerpc with -mprofile-kernel.
+	 */
+	return ftrace_location_range(ip, ip + 16);
+}
+#endif
-- 
2.34.1


* [RFC PATCH 3/3] powerpc64/bpf: Add support for bpf trampolines
  2022-02-07  7:07 ` Naveen N. Rao
@ 2022-02-07  7:07   ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-07  7:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Michael Ellerman, Steven Rostedt
  Cc: Yauheni Kaliuta, Jordan Niethe, linuxppc-dev, bpf, Jiri Olsa,
	Hari Bathini

Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for powerpc64 -mprofile-kernel.

We set aside space for two stubs at the beginning of each bpf program.
These stubs are used when we need to branch to locations that are
outside the range of a direct branch instruction.

BPF trampolines adhere to the powerpc64 -mprofile-kernel ABI since they
need to attach to ftrace locations using ftrace direct attach. Due to
this, for BPF_MOD_CALL, bpf_arch_text_poke() patches two instructions:
'mflr r0' followed by 'bl'. The trampoline code itself closely follows
the x86 implementation.
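
To make the patching scheme easier to follow, here is a hedged sketch of
where the pieces described above live relative to the entry of a JITed
64-bit BPF program. The helper below is hypothetical and only restates
what the diff sets up (the stub slots, the magic word and the two nops);
the constants used are the ones introduced by this patch.

  static void bpf_tramp_layout_sketch(u32 *entry)
  {
          /* two 32-byte stub slots precede the program image */
          u32 *stub1 = entry - BPF_TRAMP_STUB_SIZE / sizeof(u32);  /* entry - 32 bytes */
          u32 *stub2 = stub1 - BPF_TRAMP_STUB_SIZE / sizeof(u32);  /* entry - 64 bytes */
          /* the word just before the entry holds PPC_BPF_MAGIC() for validation */
          u32 *magic = entry - 1;

          /*
           * entry[0] and entry[1] are emitted as nops by the JIT prologue. For
           * BPF_MOD_CALL they are patched to 'mflr r0' and 'bl <target>'. If
           * the target is outside direct branch range, one of the two stubs is
           * rewritten to load the 64-bit address into r12 and mtctr/bctr, and
           * the 'bl' then targets that stub instead.
           */
          (void)stub1; (void)stub2; (void)magic;
  }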

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit.h        |   8 +
 arch/powerpc/net/bpf_jit_comp.c   |   5 +-
 arch/powerpc/net/bpf_jit_comp64.c | 619 +++++++++++++++++++++++++++++-
 3 files changed, 630 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 0832235a274983..777b10650678af 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -19,6 +19,14 @@
 #define FUNCTION_DESCR_SIZE	0
 #endif
 
+#ifdef PPC64_ELF_ABI_v2
+#define BPF_TRAMP_STUB_SIZE	32
+#else
+#define BPF_TRAMP_STUB_SIZE	0
+#endif
+
+#define PPC_BPF_MAGIC()			(0xeB9FC0DE)
+
 #define PLANT_INSTR(d, idx, instr)					      \
 	do { if (d) { (d)[idx] = instr; } idx++; } while (0)
 #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 635f7448ff7952..5df2f15bfe4d75 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -220,7 +220,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
 
 	proglen = cgctx.idx * 4;
-	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
+	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len + BPF_TRAMP_STUB_SIZE * 2;
 
 	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
@@ -228,6 +228,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		goto out_addrs;
 	}
 
+	image += BPF_TRAMP_STUB_SIZE * 2;
+
 	if (extable_len)
 		fp->aux->extable = (void *)image + FUNCTION_DESCR_SIZE + proglen + fixup_len;
 
@@ -251,6 +253,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	}
 
 	/* Code generation passes 1-2 */
+	*(code_base - 1) = PPC_BPF_MAGIC();
 	for (pass = 1; pass < 3; pass++) {
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index c3cfe1f4338fca..20d8f6e3cc9bb0 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -13,6 +13,7 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
+#include <linux/memory.h>
 #include <asm/kprobes.h>
 #include <linux/bpf.h>
 #include <asm/security_features.h>
@@ -73,6 +74,10 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
+	/* two nops for trampoline attach */
+	EMIT(PPC_RAW_NOP());
+	EMIT(PPC_RAW_NOP());
+
 #ifdef PPC64_ELF_ABI_v2
 	PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
 #else
@@ -93,7 +98,7 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 		EMIT(PPC_RAW_NOP());
 	}
 
-#define BPF_TAILCALL_PROLOGUE_SIZE	12
+#define BPF_TAILCALL_PROLOGUE_SIZE	20
 
 	if (bpf_has_stack_frame(ctx)) {
 		/*
@@ -1133,3 +1138,615 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 
 	return 0;
 }
+
+#ifdef PPC64_ELF_ABI_v2
+
+static __always_inline int bpf_check_and_patch(u32 *ip, ppc_inst_t old_inst, ppc_inst_t new_inst)
+{
+	ppc_inst_t org_inst = ppc_inst_read(ip);
+	if (!ppc_inst_equal(org_inst, old_inst)) {
+		pr_info("bpf_check_and_patch: ip: 0x%lx, org_inst(0x%x) != old_inst (0x%x)\n",
+				(unsigned long)ip, ppc_inst_val(org_inst), ppc_inst_val(old_inst));
+		return -EBUSY;
+	}
+	if (ppc_inst_equal(org_inst, new_inst))
+		return 1;
+	return patch_instruction(ip, new_inst);
+}
+
+static u32 *bpf_find_existing_stub(u32 *ip, enum bpf_text_poke_type t, void *old_addr)
+{
+	int branch_flags = t == BPF_MOD_JUMP ? 0 : BRANCH_SET_LINK;
+	u32 *stub_addr = 0, *stub1, *stub2;
+	ppc_inst_t org_inst, old_inst;
+
+	if (!old_addr)
+		return 0;
+
+	stub1 = ip - (BPF_TRAMP_STUB_SIZE / sizeof(u32)) - (t == BPF_MOD_CALL ? 1 : 0);
+	stub2 = stub1 - (BPF_TRAMP_STUB_SIZE / sizeof(u32));
+	org_inst = ppc_inst_read(ip);
+	if (!create_branch(&old_inst, ip, (unsigned long)stub1, branch_flags) &&
+	    ppc_inst_equal(org_inst, old_inst))
+		stub_addr = stub1;
+	if (!create_branch(&old_inst, ip, (unsigned long)stub2, branch_flags) &&
+	    ppc_inst_equal(org_inst, old_inst))
+		stub_addr = stub2;
+
+	return stub_addr;
+}
+
+static u32 *bpf_setup_stub(u32 *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr)
+{
+	u32 *stub_addr, *stub1, *stub2;
+	ppc_inst_t org_inst, old_inst;
+	int i, ret;
+	u32 stub[] = {
+		PPC_RAW_LIS(12, 0),
+		PPC_RAW_ORI(12, 12, 0),
+		PPC_RAW_SLDI(12, 12, 32),
+		PPC_RAW_ORIS(12, 12, 0),
+		PPC_RAW_ORI(12, 12, 0),
+		PPC_RAW_MTCTR(12),
+		PPC_RAW_BCTR(),
+	};
+
+	/* verify we are patching the right location */
+	if (t == BPF_MOD_JUMP)
+		org_inst = ppc_inst_read(ip - 1);
+	else
+		org_inst = ppc_inst_read(ip - 2);
+	old_inst = ppc_inst(PPC_BPF_MAGIC());
+	if (!ppc_inst_equal(org_inst, old_inst))
+		return 0;
+
+	/* verify existing branch and note down the stub to use */
+	stub1 = ip - (BPF_TRAMP_STUB_SIZE / sizeof(u32)) - (t == BPF_MOD_CALL ? 1 : 0);
+	stub2 = stub1 - (BPF_TRAMP_STUB_SIZE / sizeof(u32));
+	stub_addr = 0;
+	org_inst = ppc_inst_read(ip);
+	if (old_addr) {
+		stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+		/* existing instruction should branch to one of the two stubs */
+		if (!stub_addr)
+			return 0;
+	} else {
+		old_inst = ppc_inst(PPC_RAW_NOP());
+		if (!ppc_inst_equal(org_inst, old_inst))
+			return 0;
+	}
+	if (stub_addr == stub1)
+		stub_addr = stub2;
+	else
+		stub_addr = stub1;
+
+	/* setup stub */
+	stub[0] |= IMM_L((unsigned long)new_addr >> 48);
+	stub[1] |= IMM_L((unsigned long)new_addr >> 32);
+	stub[3] |= IMM_L((unsigned long)new_addr >> 16);
+	stub[4] |= IMM_L((unsigned long)new_addr);
+	for (i = 0; i < sizeof(stub) / sizeof(u32); i++) {
+		ret = patch_instruction(stub_addr + i, ppc_inst(stub[i]));
+		if (ret) {
+			pr_err("bpf: patch_instruction() error while setting up stub: ret %d\n", ret);
+			return 0;
+		}
+	}
+
+	return stub_addr;
+}
+
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr)
+{
+	ppc_inst_t org_inst, old_inst, new_inst;
+	int ret = -EINVAL;
+	u32 *stub_addr;
+
+	/* We currently only support poking bpf programs */
+	if (!is_bpf_text_address((long)ip)) {
+		pr_info("bpf_arch_text_poke (0x%lx): kernel/modules are not supported\n", (unsigned long)ip);
+		return -EINVAL;
+	}
+
+	mutex_lock(&text_mutex);
+	if (t == BPF_MOD_JUMP) {
+		/*
+		 * This can point to the beginning of a bpf program, or to certain locations
+		 * within a bpf program. We operate on a single instruction at ip here,
+		 * converting among a nop and an unconditional branch. Depending on branch
+		 * target, we may use the stub area at the beginning of the bpf program and
+		 * we assume that BPF_MOD_JUMP and BPF_MOD_CALL are never used without
+		 * transitioning to a nop.
+		 */
+		if (!old_addr && new_addr) {
+			/* nop -> b */
+			old_inst = ppc_inst(PPC_RAW_NOP());
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, 0)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		} else if (old_addr && !new_addr) {
+			/* b -> nop */
+			new_inst = ppc_inst(PPC_RAW_NOP());
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, 0)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		} else if (old_addr && new_addr) {
+			/* b -> b */
+			stub_addr = 0;
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, 0)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, 0)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch((u32 *)ip, old_inst, new_inst);
+		}
+	} else if (t == BPF_MOD_CALL) {
+		/*
+		 * For a BPF_MOD_CALL, we expect ip to point at the start of a bpf program.
+		 * We will have to patch two instructions to mimic -mprofile-kernel: a 'mflr r0'
+		 * followed by a 'bl'. Instruction patching order matters: we always patch-in
+		 * the 'mflr r0' first and patch it out the last.
+		 */
+		if (!old_addr && new_addr) {
+			/* nop -> bl */
+
+			/* confirm that we have two nops */
+			old_inst = ppc_inst(PPC_RAW_NOP());
+			org_inst = ppc_inst_read(ip);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			org_inst = ppc_inst_read((u32 *)ip + 1);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* patch in the mflr */
+			new_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+			if (ret)
+				goto out;
+
+			/* prep the stub if needed */
+			ip = (u32 *)ip + 1;
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+
+			synchronize_rcu();
+
+			/* patch in the bl */
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		} else if (old_addr && !new_addr) {
+			/* bl -> nop */
+
+			/* confirm the expected instruction sequence */
+			old_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			org_inst = ppc_inst_read(ip);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			ip = (u32 *)ip + 1;
+			org_inst = ppc_inst_read(ip);
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* patch out the branch first */
+			new_inst = ppc_inst(PPC_RAW_NOP());
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+			if (ret)
+				goto out;
+
+			synchronize_rcu();
+
+			/* then, the mflr */
+			old_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			ret = bpf_check_and_patch((u32 *)ip - 1, old_inst, new_inst);
+		} else if (old_addr && new_addr) {
+			/* bl -> bl */
+
+			/* confirm the expected instruction sequence */
+			old_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			org_inst = ppc_inst_read(ip);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			ip = (u32 *)ip + 1;
+			org_inst = ppc_inst_read(ip);
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* setup the new branch */
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		}
+	}
+
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
+/*
+ * BPF Trampoline stack frame layout:
+ *
+ *		[	prev sp		] <-----
+ *		[   BPF_TRAMP_R26_SAVE	] 8	|
+ *		[   BPF_TRAMP_R25_SAVE	] 8	|
+ *		[   BPF_TRAMP_LR_SAVE	] 8	|
+ *		[       ret val		] 8	|
+ *		[   BPF_TRAMP_PROG_CTX	] 8 * 8	|
+ *		[ BPF_TRAMP_FUNC_ARG_CNT] 8	|
+ *		[   BPF_TRAMP_FUNC_IP	] 8	|
+ * sp (r1) --->	[   stack frame header	] ------
+ */
+
+/* stack frame header + data, quadword aligned */
+#define BPF_TRAMP_FRAME_SIZE	(STACK_FRAME_MIN_SIZE + (14 * 8))
+
+/* The below are offsets from r1 */
+/* up to 8 dword func parameters, as bpf prog ctx */
+#define BPF_TRAMP_PROG_CTX	(STACK_FRAME_MIN_SIZE + 16)
+/* bpf_get_func_arg_cnt() needs this before prog ctx */
+#define BPF_TRAMP_FUNC_ARG_CNT	(BPF_TRAMP_PROG_CTX - 8)
+/* bpf_get_func_ip() needs this here */
+#define BPF_TRAMP_FUNC_IP	(BPF_TRAMP_PROG_CTX - 16)
+/* lr save area, after space for up to 8 args followed by retval of orig_call/fentry progs */
+#define BPF_TRAMP_LR_SAVE	(BPF_TRAMP_PROG_CTX + (8 * 8) + 8)
+#define BPF_TRAMP_R25_SAVE	(BPF_TRAMP_LR_SAVE + 8)
+#define BPF_TRAMP_R26_SAVE	(BPF_TRAMP_R25_SAVE + 8)
+
+#define BPF_INSN_SAFETY		64
+
+static int invoke_bpf_prog(const struct btf_func_model *m, u32 *image, struct codegen_context *ctx,
+			   struct bpf_prog *p, bool save_ret)
+{
+	ppc_inst_t branch_insn;
+	u32 jmp_idx;
+	int ret;
+
+	/* __bpf_prog_enter(p) */
+	PPC_LI64(_R3, (unsigned long)p);
+	EMIT(PPC_RAW_MR(_R25, _R3));
+	ret = bpf_jit_emit_func_call_hlp(image, ctx,
+			p->aux->sleepable ? (u64)__bpf_prog_enter_sleepable : (u64)__bpf_prog_enter);
+	if (ret)
+		return ret;
+
+	/* remember prog start time returned by __bpf_prog_enter */
+	EMIT(PPC_RAW_MR(_R26, _R3));
+
+	/*
+	 * if (__bpf_prog_enter(p) == 0)
+	 *	goto skip_exec_of_prog;
+	 *
+	 * emit a nop to be later patched with conditional branch, once offset is known
+	 */
+	EMIT(PPC_RAW_CMPDI(_R3, 0));
+	jmp_idx = ctx->idx;
+	EMIT(PPC_RAW_NOP());
+
+	/* p->bpf_func() */
+	EMIT(PPC_RAW_ADDI(_R3, _R1, BPF_TRAMP_PROG_CTX));
+	if (!p->jited)
+		PPC_LI64(_R4, (unsigned long)p->insnsi);
+	if (is_offset_in_branch_range((unsigned long)p->bpf_func - (unsigned long)&image[ctx->idx])) {
+		PPC_BL((unsigned long)p->bpf_func);
+	} else {
+		PPC_BPF_LL(_R12, _R25, offsetof(struct bpf_prog, bpf_func));
+		EMIT(PPC_RAW_MTCTR(_R12));
+		EMIT(PPC_RAW_BCTRL());
+	}
+
+	if (save_ret)
+		PPC_BPF_STL(_R3, _R1, BPF_TRAMP_PROG_CTX + (m->nr_args * 8));
+
+	/* fix up branch */
+	if (create_cond_branch(&branch_insn, &image[jmp_idx], (unsigned long)&image[ctx->idx], COND_EQ << 16))
+		return -EINVAL;
+	image[jmp_idx] = ppc_inst_val(branch_insn);
+
+	/* __bpf_prog_exit(p, start_time) */
+	EMIT(PPC_RAW_MR(_R3, _R25));
+	EMIT(PPC_RAW_MR(_R4, _R26));
+	ret = bpf_jit_emit_func_call_hlp(image, ctx,
+			p->aux->sleepable ? (u64)__bpf_prog_exit_sleepable : (u64)__bpf_prog_exit);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int invoke_bpf(const struct btf_func_model *m, u32 *image, struct codegen_context *ctx,
+		      struct bpf_tramp_progs *tp, bool save_ret)
+{
+	int i;
+
+	for (i = 0; i < tp->nr_progs; i++) {
+		if (invoke_bpf_prog(m, image, ctx, tp->progs[i], save_ret))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int invoke_bpf_mod_ret(const struct btf_func_model *m, u32 *image, struct codegen_context *ctx,
+			      struct bpf_tramp_progs *tp, u32 *branches)
+{
+	int i;
+
+	/*
+	 * The first fmod_ret program will receive a garbage return value.
+	 * Set this to 0 to avoid confusing the program.
+	 */
+	EMIT(PPC_RAW_LI(_R3, 0));
+	PPC_BPF_STL(_R3, _R1, BPF_TRAMP_PROG_CTX + (m->nr_args * 8));
+	for (i = 0; i < tp->nr_progs; i++) {
+		if (invoke_bpf_prog(m, image, ctx, tp->progs[i], true))
+			return -EINVAL;
+
+		/*
+		 * mod_ret prog stored return value after prog ctx. Emit:
+		 * if (*(u64 *)(ret_val) !=  0)
+		 *	goto do_fexit;
+		 */
+		PPC_BPF_LL(_R3, _R1, BPF_TRAMP_PROG_CTX + (m->nr_args * 8));
+		EMIT(PPC_RAW_CMPDI(_R3, 0));
+
+		/*
+		 * Save the location of the branch and generate a nop, which is
+		 * replaced with a conditional jump once do_fexit (i.e. the
+		 * start of the fexit invocation) is finalized.
+		 */
+		branches[i] = ctx->idx;
+		EMIT(PPC_RAW_NOP());
+	}
+
+	return 0;
+}
+
+static bool is_valid_bpf_tramp_flags(unsigned int flags)
+{
+	if ((flags & BPF_TRAMP_F_RESTORE_REGS) && (flags & BPF_TRAMP_F_SKIP_FRAME))
+		return false;
+
+	/* We only support attaching to function entry */
+	if ((flags & BPF_TRAMP_F_CALL_ORIG) && !(flags & BPF_TRAMP_F_SKIP_FRAME))
+		return false;
+
+	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops, and it must be used alone */
+	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) && (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
+		return false;
+
+	return true;
+}
+
+/*
+ * We assume that orig_call is what this trampoline is being attached to and we use the link
+ * register for BPF_TRAMP_F_CALL_ORIG -- see is_valid_bpf_tramp_flags() for validating this.
+ */
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image_start, void *image_end,
+				const struct btf_func_model *m, u32 flags,
+				struct bpf_tramp_progs *tprogs,
+				void *orig_call __maybe_unused)
+{
+	bool save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+	struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY];
+	struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT];
+	struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];
+	struct codegen_context codegen_ctx, *ctx;
+	int i, ret, nr_args = m->nr_args;
+	u32 *image = (u32 *)image_start;
+	ppc_inst_t branch_insn;
+	u32 *branches = NULL;
+
+	if (nr_args > 8 || !is_valid_bpf_tramp_flags(flags))
+		return -EINVAL;
+
+	ctx = &codegen_ctx;
+	memset(ctx, 0, sizeof(*ctx));
+
+	/*
+	 * Prologue for the trampoline follows ftrace -mprofile-kernel ABI.
+	 * On entry, LR has our return address while r0 has original return address.
+	 *	std	r0, 16(r1)
+	 *	stdu	r1, -144(r1)
+	 *	mflr	r0
+	 *	std	r0, 112(r1)
+	 *	std	r2, 24(r1)
+	 *	ld	r2, PACATOC(r13)
+	 *	std	r3, 40(r1)
+	 *	std	r4, 48(r2)
+	 *	...
+	 *	std	r25, 120(r1)
+	 *	std	r26, 128(r1)
+	 */
+	PPC_BPF_STL(_R0, _R1, PPC_LR_STKOFF);
+	PPC_BPF_STLU(_R1, _R1, -BPF_TRAMP_FRAME_SIZE);
+	EMIT(PPC_RAW_MFLR(_R0));
+	PPC_BPF_STL(_R0, _R1, BPF_TRAMP_LR_SAVE);
+	PPC_BPF_STL(_R2, _R1, 24);
+	PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+	for (i = 0; i < nr_args; i++)
+		PPC_BPF_STL(_R3 + i, _R1, BPF_TRAMP_PROG_CTX + (i * 8));
+	PPC_BPF_STL(_R25, _R1, BPF_TRAMP_R25_SAVE);
+	PPC_BPF_STL(_R26, _R1, BPF_TRAMP_R26_SAVE);
+
+	/* save function arg count -- see bpf_get_func_arg_cnt() */
+	EMIT(PPC_RAW_LI(_R3, nr_args));
+	PPC_BPF_STL(_R3, _R1, BPF_TRAMP_FUNC_ARG_CNT);
+
+	/* save nip of the traced function before bpf prog ctx -- see bpf_get_func_ip() */
+	if (flags & BPF_TRAMP_F_IP_ARG) {
+		/* TODO: should this be GEP? */
+		EMIT(PPC_RAW_ADDI(_R3, _R0, -8));
+		PPC_BPF_STL(_R3, _R1, BPF_TRAMP_FUNC_IP);
+	}
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		PPC_LI64(_R3, (unsigned long)im);
+		ret = bpf_jit_emit_func_call_hlp(image, ctx, (u64)__bpf_tramp_enter);
+		if (ret)
+			return ret;
+	}
+
+	if (fentry->nr_progs)
+		if (invoke_bpf(m, image, ctx, fentry, flags & BPF_TRAMP_F_RET_FENTRY_RET))
+			return -EINVAL;
+
+	if (fmod_ret->nr_progs) {
+		branches = kcalloc(fmod_ret->nr_progs, sizeof(u32), GFP_KERNEL);
+		if (!branches)
+			return -ENOMEM;
+
+		if (invoke_bpf_mod_ret(m, image, ctx, fmod_ret, branches)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+	}
+
+	/* call original function */
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		PPC_BPF_LL(_R3, _R1, BPF_TRAMP_LR_SAVE);
+		EMIT(PPC_RAW_MTCTR(_R3));
+
+		/* restore args */
+		for (i = 0; i < nr_args; i++)
+			PPC_BPF_LL(_R3 + i, _R1, BPF_TRAMP_PROG_CTX + (i * 8));
+
+		PPC_BPF_LL(_R2, _R1, 24);
+		EMIT(PPC_RAW_BCTRL());
+		PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+
+		/* remember return value in a stack for bpf prog to access */
+		PPC_BPF_STL(_R3, _R1, BPF_TRAMP_PROG_CTX + (nr_args * 8));
+
+		/* reserve space to patch branch instruction to skip fexit progs */
+		im->ip_after_call = &image[ctx->idx];
+		EMIT(PPC_RAW_NOP());
+	}
+
+	if (fmod_ret->nr_progs) {
+		/* update branches saved in invoke_bpf_mod_ret with aligned address of do_fexit */
+		for (i = 0; i < fmod_ret->nr_progs; i++) {
+			if (create_cond_branch(&branch_insn, &image[branches[i]],
+					       (unsigned long)&image[ctx->idx], COND_NE << 16)) {
+				ret = -EINVAL;
+				goto cleanup;
+			}
+
+			image[branches[i]] = ppc_inst_val(branch_insn);
+		}
+	}
+
+	if (fexit->nr_progs)
+		if (invoke_bpf(m, image, ctx, fexit, false)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+
+	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+		for (i = 0; i < nr_args; i++)
+			PPC_BPF_LL(_R3 + i, _R1, BPF_TRAMP_PROG_CTX + (i * 8));
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		im->ip_epilogue = &image[ctx->idx];
+		PPC_LI64(_R3, (unsigned long)im);
+		ret = bpf_jit_emit_func_call_hlp(image, ctx, (u64)__bpf_tramp_exit);
+		if (ret)
+			goto cleanup;
+	}
+
+	/* restore return value of orig_call or fentry prog */
+	if (save_ret)
+		PPC_BPF_LL(_R3, _R1, BPF_TRAMP_PROG_CTX + (nr_args * 8));
+
+	/* epilogue */
+	PPC_BPF_LL(_R26, _R1, BPF_TRAMP_R26_SAVE);
+	PPC_BPF_LL(_R25, _R1, BPF_TRAMP_R25_SAVE);
+	PPC_BPF_LL(_R2, _R1, 24);
+	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+		/* skip our return address and return to parent */
+		EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_TRAMP_FRAME_SIZE));
+		PPC_BPF_LL(_R0, _R1, PPC_LR_STKOFF);
+		EMIT(PPC_RAW_MTCTR(_R0));
+	} else {
+		PPC_BPF_LL(_R0, _R1, BPF_TRAMP_LR_SAVE);
+		EMIT(PPC_RAW_MTCTR(_R0));
+		EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_TRAMP_FRAME_SIZE));
+		PPC_BPF_LL(_R0, _R1, PPC_LR_STKOFF);
+		EMIT(PPC_RAW_MTLR(_R0));
+	}
+	EMIT(PPC_RAW_BCTR());
+
+	/* make sure the trampoline generation logic doesn't overflow */
+	if (WARN_ON_ONCE(&image[ctx->idx] > (u32 *)image_end - BPF_INSN_SAFETY)) {
+		ret = -EFAULT;
+		goto cleanup;
+	}
+	ret = (u8 *)&image[ctx->idx] - (u8 *)image;
+
+cleanup:
+	kfree(branches);
+	return ret;
+}
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 3/3] powerpc64/bpf: Add support for bpf trampolines
@ 2022-02-07  7:07   ` Naveen N. Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-07  7:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Michael Ellerman, Steven Rostedt
  Cc: bpf, linuxppc-dev, Yauheni Kaliuta, Hari Bathini, Jordan Niethe,
	Jiri Olsa

Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for powerpc64 -mprofile-kernel.

We set aside space for two stubs at the beginning of each bpf program.
These stubs are used if having to branch to locations outside the range
of a branch instruction.

BPF Trampolines adhere to the powerpc64 -mprofile-kernel ABI since these
need to attach to ftrace locations using ftrace direct attach. Due to
this, bpf_arch_text_poke() patches two instructions: 'mflr r0' and 'bl'
for BPF_MOD_CALL. The trampoline code itself closely follows the x86
implementation.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit.h        |   8 +
 arch/powerpc/net/bpf_jit_comp.c   |   5 +-
 arch/powerpc/net/bpf_jit_comp64.c | 619 +++++++++++++++++++++++++++++-
 3 files changed, 630 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 0832235a274983..777b10650678af 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -19,6 +19,14 @@
 #define FUNCTION_DESCR_SIZE	0
 #endif
 
+#ifdef PPC64_ELF_ABI_v2
+#define BPF_TRAMP_STUB_SIZE	32
+#else
+#define BPF_TRAMP_STUB_SIZE	0
+#endif
+
+#define PPC_BPF_MAGIC()			(0xeB9FC0DE)
+
 #define PLANT_INSTR(d, idx, instr)					      \
 	do { if (d) { (d)[idx] = instr; } idx++; } while (0)
 #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 635f7448ff7952..5df2f15bfe4d75 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -220,7 +220,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
 
 	proglen = cgctx.idx * 4;
-	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
+	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len + BPF_TRAMP_STUB_SIZE * 2;
 
 	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
@@ -228,6 +228,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		goto out_addrs;
 	}
 
+	image += BPF_TRAMP_STUB_SIZE * 2;
+
 	if (extable_len)
 		fp->aux->extable = (void *)image + FUNCTION_DESCR_SIZE + proglen + fixup_len;
 
@@ -251,6 +253,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	}
 
 	/* Code generation passes 1-2 */
+	*(code_base - 1) = PPC_BPF_MAGIC();
 	for (pass = 1; pass < 3; pass++) {
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index c3cfe1f4338fca..20d8f6e3cc9bb0 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -13,6 +13,7 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
+#include <linux/memory.h>
 #include <asm/kprobes.h>
 #include <linux/bpf.h>
 #include <asm/security_features.h>
@@ -73,6 +74,10 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
+	/* two nops for trampoline attach */
+	EMIT(PPC_RAW_NOP());
+	EMIT(PPC_RAW_NOP());
+
 #ifdef PPC64_ELF_ABI_v2
 	PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
 #else
@@ -93,7 +98,7 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 		EMIT(PPC_RAW_NOP());
 	}
 
-#define BPF_TAILCALL_PROLOGUE_SIZE	12
+#define BPF_TAILCALL_PROLOGUE_SIZE	20
 
 	if (bpf_has_stack_frame(ctx)) {
 		/*
@@ -1133,3 +1138,615 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 
 	return 0;
 }
+
+#ifdef PPC64_ELF_ABI_v2
+
+static __always_inline int bpf_check_and_patch(u32 *ip, ppc_inst_t old_inst, ppc_inst_t new_inst)
+{
+	ppc_inst_t org_inst = ppc_inst_read(ip);
+	if (!ppc_inst_equal(org_inst, old_inst)) {
+		pr_info("bpf_check_and_patch: ip: 0x%lx, org_inst(0x%x) != old_inst (0x%x)\n",
+				(unsigned long)ip, ppc_inst_val(org_inst), ppc_inst_val(old_inst));
+		return -EBUSY;
+	}
+	if (ppc_inst_equal(org_inst, new_inst))
+		return 1;
+	return patch_instruction(ip, new_inst);
+}
+
+static u32 *bpf_find_existing_stub(u32 *ip, enum bpf_text_poke_type t, void *old_addr)
+{
+	int branch_flags = t == BPF_MOD_JUMP ? 0 : BRANCH_SET_LINK;
+	u32 *stub_addr = 0, *stub1, *stub2;
+	ppc_inst_t org_inst, old_inst;
+
+	if (!old_addr)
+		return 0;
+
+	stub1 = ip - (BPF_TRAMP_STUB_SIZE / sizeof(u32)) - (t == BPF_MOD_CALL ? 1 : 0);
+	stub2 = stub1 - (BPF_TRAMP_STUB_SIZE / sizeof(u32));
+	org_inst = ppc_inst_read(ip);
+	if (!create_branch(&old_inst, ip, (unsigned long)stub1, branch_flags) &&
+	    ppc_inst_equal(org_inst, old_inst))
+		stub_addr = stub1;
+	if (!create_branch(&old_inst, ip, (unsigned long)stub2, branch_flags) &&
+	    ppc_inst_equal(org_inst, old_inst))
+		stub_addr = stub2;
+
+	return stub_addr;
+}
+
+static u32 *bpf_setup_stub(u32 *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr)
+{
+	u32 *stub_addr, *stub1, *stub2;
+	ppc_inst_t org_inst, old_inst;
+	int i, ret;
+	u32 stub[] = {
+		PPC_RAW_LIS(12, 0),
+		PPC_RAW_ORI(12, 12, 0),
+		PPC_RAW_SLDI(12, 12, 32),
+		PPC_RAW_ORIS(12, 12, 0),
+		PPC_RAW_ORI(12, 12, 0),
+		PPC_RAW_MTCTR(12),
+		PPC_RAW_BCTR(),
+	};
+
+	/* verify we are patching the right location */
+	if (t == BPF_MOD_JUMP)
+		org_inst = ppc_inst_read(ip - 1);
+	else
+		org_inst = ppc_inst_read(ip - 2);
+	old_inst = ppc_inst(PPC_BPF_MAGIC());
+	if (!ppc_inst_equal(org_inst, old_inst))
+		return 0;
+
+	/* verify existing branch and note down the stub to use */
+	stub1 = ip - (BPF_TRAMP_STUB_SIZE / sizeof(u32)) - (t == BPF_MOD_CALL ? 1 : 0);
+	stub2 = stub1 - (BPF_TRAMP_STUB_SIZE / sizeof(u32));
+	stub_addr = 0;
+	org_inst = ppc_inst_read(ip);
+	if (old_addr) {
+		stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+		/* existing instruction should branch to one of the two stubs */
+		if (!stub_addr)
+			return 0;
+	} else {
+		old_inst = ppc_inst(PPC_RAW_NOP());
+		if (!ppc_inst_equal(org_inst, old_inst))
+			return 0;
+	}
+	if (stub_addr == stub1)
+		stub_addr = stub2;
+	else
+		stub_addr = stub1;
+
+	/* setup stub */
+	stub[0] |= IMM_L((unsigned long)new_addr >> 48);
+	stub[1] |= IMM_L((unsigned long)new_addr >> 32);
+	stub[3] |= IMM_L((unsigned long)new_addr >> 16);
+	stub[4] |= IMM_L((unsigned long)new_addr);
+	for (i = 0; i < sizeof(stub) / sizeof(u32); i++) {
+		ret = patch_instruction(stub_addr + i, ppc_inst(stub[i]));
+		if (ret) {
+			pr_err("bpf: patch_instruction() error while setting up stub: ret %d\n", ret);
+			return 0;
+		}
+	}
+
+	return stub_addr;
+}
+
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr)
+{
+	ppc_inst_t org_inst, old_inst, new_inst;
+	int ret = -EINVAL;
+	u32 *stub_addr;
+
+	/* We currently only support poking bpf programs */
+	if (!is_bpf_text_address((long)ip)) {
+		pr_info("bpf_arch_text_poke (0x%lx): kernel/modules are not supported\n", (unsigned long)ip);
+		return -EINVAL;
+	}
+
+	mutex_lock(&text_mutex);
+	if (t == BPF_MOD_JUMP) {
+		/*
+		 * This can point to the beginning of a bpf program, or to certain locations
+		 * within a bpf program. We operate on a single instruction at ip here,
+		 * converting among a nop and an unconditional branch. Depending on branch
+		 * target, we may use the stub area at the beginning of the bpf program and
+		 * we assume that BPF_MOD_JUMP and BPF_MOD_CALL are never used without
+		 * transitioning to a nop.
+		 */
+		if (!old_addr && new_addr) {
+			/* nop -> b */
+			old_inst = ppc_inst(PPC_RAW_NOP());
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, 0)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		} else if (old_addr && !new_addr) {
+			/* b -> nop */
+			new_inst = ppc_inst(PPC_RAW_NOP());
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, 0)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		} else if (old_addr && new_addr) {
+			/* b -> b */
+			stub_addr = 0;
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, 0)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, 0)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, 0)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch((u32 *)ip, old_inst, new_inst);
+		}
+	} else if (t == BPF_MOD_CALL) {
+		/*
+		 * For a BPF_MOD_CALL, we expect ip to point at the start of a bpf program.
+		 * We will have to patch two instructions to mimic -mprofile-kernel: a 'mflr r0'
+		 * followed by a 'bl'. Instruction patching order matters: we always patch-in
+		 * the 'mflr r0' first and patch it out the last.
+		 */
+		if (!old_addr && new_addr) {
+			/* nop -> bl */
+
+			/* confirm that we have two nops */
+			old_inst = ppc_inst(PPC_RAW_NOP());
+			org_inst = ppc_inst_read(ip);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			org_inst = ppc_inst_read((u32 *)ip + 1);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* patch in the mflr */
+			new_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+			if (ret)
+				goto out;
+
+			/* prep the stub if needed */
+			ip = (u32 *)ip + 1;
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+
+			synchronize_rcu();
+
+			/* patch in the bl */
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		} else if (old_addr && !new_addr) {
+			/* bl -> nop */
+
+			/* confirm the expected instruction sequence */
+			old_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			org_inst = ppc_inst_read(ip);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			ip = (u32 *)ip + 1;
+			org_inst = ppc_inst_read(ip);
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* patch out the branch first */
+			new_inst = ppc_inst(PPC_RAW_NOP());
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+			if (ret)
+				goto out;
+
+			synchronize_rcu();
+
+			/* then, the mflr */
+			old_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			ret = bpf_check_and_patch((u32 *)ip - 1, old_inst, new_inst);
+		} else if (old_addr && new_addr) {
+			/* bl -> bl */
+
+			/* confirm the expected instruction sequence */
+			old_inst = ppc_inst(PPC_RAW_MFLR(_R0));
+			org_inst = ppc_inst_read(ip);
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			ip = (u32 *)ip + 1;
+			org_inst = ppc_inst_read(ip);
+			if (create_branch(&old_inst, (u32 *)ip, (unsigned long)old_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_find_existing_stub(ip, t, old_addr);
+				if (!stub_addr ||
+				    create_branch(&old_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			if (!ppc_inst_equal(org_inst, old_inst)) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* setup the new branch */
+			if (create_branch(&new_inst, (u32 *)ip, (unsigned long)new_addr, BRANCH_SET_LINK)) {
+				stub_addr = bpf_setup_stub(ip, t, old_addr, new_addr);
+				if (!stub_addr ||
+				    create_branch(&new_inst, (u32 *)ip, (unsigned long)stub_addr, BRANCH_SET_LINK)) {
+					ret = -EINVAL;
+					goto out;
+				}
+			}
+			ret = bpf_check_and_patch(ip, old_inst, new_inst);
+		}
+	}
+
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
+/*
+ * BPF Trampoline stack frame layout:
+ *
+ *		[	prev sp		] <-----
+ *		[   BPF_TRAMP_R26_SAVE	] 8	|
+ *		[   BPF_TRAMP_R25_SAVE	] 8	|
+ *		[   BPF_TRAMP_LR_SAVE	] 8	|
+ *		[       ret val		] 8	|
+ *		[   BPF_TRAMP_PROG_CTX	] 8 * 8	|
+ *		[ BPF_TRAMP_FUNC_ARG_CNT] 8	|
+ *		[   BPF_TRAMP_FUNC_IP	] 8	|
+ * sp (r1) --->	[   stack frame header	] ------
+ */
+
+/* stack frame header + data, quadword aligned */
+#define BPF_TRAMP_FRAME_SIZE	(STACK_FRAME_MIN_SIZE + (14 * 8))
+
+/* The below are offsets from r1 */
+/* upto 8 dword func parameters, as bpf prog ctx */
+#define BPF_TRAMP_PROG_CTX	(STACK_FRAME_MIN_SIZE + 16)
+/* bpf_get_func_arg_cnt() needs this before prog ctx */
+#define BPF_TRAMP_FUNC_ARG_CNT	(BPF_TRAMP_PROG_CTX - 8)
+/* bpf_get_func_ip() needs this here */
+#define BPF_TRAMP_FUNC_IP	(BPF_TRAMP_PROG_CTX - 16)
+/* lr save area, after space for upto 8 args followed by retval of orig_call/fentry progs */
+#define BPF_TRAMP_LR_SAVE	(BPF_TRAMP_PROG_CTX + (8 * 8) + 8)
+#define BPF_TRAMP_R25_SAVE	(BPF_TRAMP_LR_SAVE + 8)
+#define BPF_TRAMP_R26_SAVE	(BPF_TRAMP_R25_SAVE + 8)
+
+#define BPF_INSN_SAFETY		64
+
+static int invoke_bpf_prog(const struct btf_func_model *m, u32 *image, struct codegen_context *ctx,
+			   struct bpf_prog *p, bool save_ret)
+{
+	ppc_inst_t branch_insn;
+	u32 jmp_idx;
+	int ret;
+
+	/* __bpf_prog_enter(p) */
+	PPC_LI64(_R3, (unsigned long)p);
+	EMIT(PPC_RAW_MR(_R25, _R3));
+	ret = bpf_jit_emit_func_call_hlp(image, ctx,
+			p->aux->sleepable ? (u64)__bpf_prog_enter_sleepable : (u64)__bpf_prog_enter);
+	if (ret)
+		return ret;
+
+	/* remember prog start time returned by __bpf_prog_enter */
+	EMIT(PPC_RAW_MR(_R26, _R3));
+
+	/*
+	 * if (__bpf_prog_enter(p) == 0)
+	 *	goto skip_exec_of_prog;
+	 *
+	 * emit a nop to be later patched with conditional branch, once offset is known
+	 */
+	EMIT(PPC_RAW_CMPDI(_R3, 0));
+	jmp_idx = ctx->idx;
+	EMIT(PPC_RAW_NOP());
+
+	/* p->bpf_func() */
+	EMIT(PPC_RAW_ADDI(_R3, _R1, BPF_TRAMP_PROG_CTX));
+	if (!p->jited)
+		PPC_LI64(_R4, (unsigned long)p->insnsi);
+	if (is_offset_in_branch_range((unsigned long)p->bpf_func - (unsigned long)&image[ctx->idx])) {
+		PPC_BL((unsigned long)p->bpf_func);
+	} else {
+		PPC_BPF_LL(_R12, _R25, offsetof(struct bpf_prog, bpf_func));
+		EMIT(PPC_RAW_MTCTR(_R12));
+		EMIT(PPC_RAW_BCTRL());
+	}
+
+	if (save_ret)
+		PPC_BPF_STL(_R3, _R1, BPF_TRAMP_PROG_CTX + (m->nr_args * 8));
+
+	/* fix up branch */
+	if (create_cond_branch(&branch_insn, &image[jmp_idx], (unsigned long)&image[ctx->idx], COND_EQ << 16))
+		return -EINVAL;
+	image[jmp_idx] = ppc_inst_val(branch_insn);
+
+	/* __bpf_prog_exit(p, start_time) */
+	EMIT(PPC_RAW_MR(_R3, _R25));
+	EMIT(PPC_RAW_MR(_R4, _R26));
+	ret = bpf_jit_emit_func_call_hlp(image, ctx,
+			p->aux->sleepable ? (u64)__bpf_prog_exit_sleepable : (u64)__bpf_prog_exit);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int invoke_bpf(const struct btf_func_model *m, u32 *image, struct codegen_context *ctx,
+		      struct bpf_tramp_progs *tp, bool save_ret)
+{
+	int i;
+
+	for (i = 0; i < tp->nr_progs; i++) {
+		if (invoke_bpf_prog(m, image, ctx, tp->progs[i], save_ret))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int invoke_bpf_mod_ret(const struct btf_func_model *m, u32 *image, struct codegen_context *ctx,
+			      struct bpf_tramp_progs *tp, u32 *branches)
+{
+	int i;
+
+	/*
+	 * The first fmod_ret program will receive a garbage return value.
+	 * Set this to 0 to avoid confusing the program.
+	 */
+	EMIT(PPC_RAW_LI(_R3, 0));
+	PPC_BPF_STL(_R3, _R1, BPF_TRAMP_PROG_CTX + (m->nr_args * 8));
+	for (i = 0; i < tp->nr_progs; i++) {
+		if (invoke_bpf_prog(m, image, ctx, tp->progs[i], true))
+			return -EINVAL;
+
+		/*
+		 * mod_ret prog stored return value after prog ctx. Emit:
+		 * if (*(u64 *)(ret_val) !=  0)
+		 *	goto do_fexit;
+		 */
+		PPC_BPF_LL(_R3, _R1, BPF_TRAMP_PROG_CTX + (m->nr_args * 8));
+		EMIT(PPC_RAW_CMPDI(_R3, 0));
+
+		/*
+		 * Save the location of the branch and generate a nop, which is
+		 * replaced with a conditional jump once do_fexit (i.e. the
+		 * start of the fexit invocation) is finalized.
+		 */
+		branches[i] = ctx->idx;
+		EMIT(PPC_RAW_NOP());
+	}
+
+	return 0;
+}
+
+static bool is_valid_bpf_tramp_flags(unsigned int flags)
+{
+	if ((flags & BPF_TRAMP_F_RESTORE_REGS) && (flags & BPF_TRAMP_F_SKIP_FRAME))
+		return false;
+
+	/* We only support attaching to function entry */
+	if ((flags & BPF_TRAMP_F_CALL_ORIG) && !(flags & BPF_TRAMP_F_SKIP_FRAME))
+		return false;
+
+	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops, and it must be used alone */
+	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) && (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
+		return false;
+
+	return true;
+}
+
+/*
+ * We assume that orig_call is what this trampoline is being attached to and we use the link
+ * register for BPF_TRAMP_F_CALL_ORIG -- see is_valid_bpf_tramp_flags() for validating this.
+ */
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image_start, void *image_end,
+				const struct btf_func_model *m, u32 flags,
+				struct bpf_tramp_progs *tprogs,
+				void *orig_call __maybe_unused)
+{
+	bool save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+	struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY];
+	struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT];
+	struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];
+	struct codegen_context codegen_ctx, *ctx;
+	int i, ret, nr_args = m->nr_args;
+	u32 *image = (u32 *)image_start;
+	ppc_inst_t branch_insn;
+	u32 *branches = NULL;
+
+	if (nr_args > 8 || !is_valid_bpf_tramp_flags(flags))
+		return -EINVAL;
+
+	ctx = &codegen_ctx;
+	memset(ctx, 0, sizeof(*ctx));
+
+	/*
+	 * Prologue for the trampoline follows ftrace -mprofile-kernel ABI.
+	 * On entry, LR has our return address while r0 has original return address.
+	 *	std	r0, 16(r1)
+	 *	stdu	r1, -144(r1)
+	 *	mflr	r0
+	 *	std	r0, 120(r1)
+	 *	std	r2, 24(r1)
+	 *	ld	r2, PACATOC(r13)
+	 *	std	r3, 48(r1)
+	 *	std	r4, 56(r1)
+	 *	...
+	 *	std	r25, 128(r1)
+	 *	std	r26, 136(r1)
+	 */
+	PPC_BPF_STL(_R0, _R1, PPC_LR_STKOFF);
+	PPC_BPF_STLU(_R1, _R1, -BPF_TRAMP_FRAME_SIZE);
+	EMIT(PPC_RAW_MFLR(_R0));
+	PPC_BPF_STL(_R0, _R1, BPF_TRAMP_LR_SAVE);
+	PPC_BPF_STL(_R2, _R1, 24);
+	PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+	for (i = 0; i < nr_args; i++)
+		PPC_BPF_STL(_R3 + i, _R1, BPF_TRAMP_PROG_CTX + (i * 8));
+	PPC_BPF_STL(_R25, _R1, BPF_TRAMP_R25_SAVE);
+	PPC_BPF_STL(_R26, _R1, BPF_TRAMP_R26_SAVE);
+
+	/* save function arg count -- see bpf_get_func_arg_cnt() */
+	EMIT(PPC_RAW_LI(_R3, nr_args));
+	PPC_BPF_STL(_R3, _R1, BPF_TRAMP_FUNC_ARG_CNT);
+
+	/* save nip of the traced function before bpf prog ctx -- see bpf_get_func_ip() */
+	if (flags & BPF_TRAMP_F_IP_ARG) {
+		/* TODO: should this be GEP? */
+		EMIT(PPC_RAW_ADDI(_R3, _R0, -8));
+		PPC_BPF_STL(_R3, _R1, BPF_TRAMP_FUNC_IP);
+	}
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		PPC_LI64(_R3, (unsigned long)im);
+		ret = bpf_jit_emit_func_call_hlp(image, ctx, (u64)__bpf_tramp_enter);
+		if (ret)
+			return ret;
+	}
+
+	if (fentry->nr_progs)
+		if (invoke_bpf(m, image, ctx, fentry, flags & BPF_TRAMP_F_RET_FENTRY_RET))
+			return -EINVAL;
+
+	if (fmod_ret->nr_progs) {
+		branches = kcalloc(fmod_ret->nr_progs, sizeof(u32), GFP_KERNEL);
+		if (!branches)
+			return -ENOMEM;
+
+		if (invoke_bpf_mod_ret(m, image, ctx, fmod_ret, branches)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+	}
+
+	/* call original function */
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		PPC_BPF_LL(_R3, _R1, BPF_TRAMP_LR_SAVE);
+		EMIT(PPC_RAW_MTCTR(_R3));
+
+		/* restore args */
+		for (i = 0; i < nr_args; i++)
+			PPC_BPF_LL(_R3 + i, _R1, BPF_TRAMP_PROG_CTX + (i * 8));
+
+		PPC_BPF_LL(_R2, _R1, 24);
+		EMIT(PPC_RAW_BCTRL());
+		PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+
+		/* remember return value in a stack for bpf prog to access */
+		PPC_BPF_STL(_R3, _R1, BPF_TRAMP_PROG_CTX + (nr_args * 8));
+
+		/* reserve space to patch branch instruction to skip fexit progs */
+		im->ip_after_call = &image[ctx->idx];
+		EMIT(PPC_RAW_NOP());
+	}
+
+	if (fmod_ret->nr_progs) {
+		/* update branches saved in invoke_bpf_mod_ret with aligned address of do_fexit */
+		for (i = 0; i < fmod_ret->nr_progs; i++) {
+			if (create_cond_branch(&branch_insn, &image[branches[i]],
+					       (unsigned long)&image[ctx->idx], COND_NE << 16)) {
+				ret = -EINVAL;
+				goto cleanup;
+			}
+
+			image[branches[i]] = ppc_inst_val(branch_insn);
+		}
+	}
+
+	if (fexit->nr_progs)
+		if (invoke_bpf(m, image, ctx, fexit, false)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+
+	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+		for (i = 0; i < nr_args; i++)
+			PPC_BPF_LL(_R3 + i, _R1, BPF_TRAMP_PROG_CTX + (i * 8));
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		im->ip_epilogue = &image[ctx->idx];
+		PPC_LI64(_R3, (unsigned long)im);
+		ret = bpf_jit_emit_func_call_hlp(image, ctx, (u64)__bpf_tramp_exit);
+		if (ret)
+			goto cleanup;
+	}
+
+	/* restore return value of orig_call or fentry prog */
+	if (save_ret)
+		PPC_BPF_LL(_R3, _R1, BPF_TRAMP_PROG_CTX + (nr_args * 8));
+
+	/* epilogue */
+	PPC_BPF_LL(_R26, _R1, BPF_TRAMP_R26_SAVE);
+	PPC_BPF_LL(_R25, _R1, BPF_TRAMP_R25_SAVE);
+	PPC_BPF_LL(_R2, _R1, 24);
+	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+		/* skip our return address and return to parent */
+		EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_TRAMP_FRAME_SIZE));
+		PPC_BPF_LL(_R0, _R1, PPC_LR_STKOFF);
+		EMIT(PPC_RAW_MTCTR(_R0));
+	} else {
+		PPC_BPF_LL(_R0, _R1, BPF_TRAMP_LR_SAVE);
+		EMIT(PPC_RAW_MTCTR(_R0));
+		EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_TRAMP_FRAME_SIZE));
+		PPC_BPF_LL(_R0, _R1, PPC_LR_STKOFF);
+		EMIT(PPC_RAW_MTLR(_R0));
+	}
+	EMIT(PPC_RAW_BCTR());
+
+	/* make sure the trampoline generation logic doesn't overflow */
+	if (WARN_ON_ONCE(&image[ctx->idx] > (u32 *)image_end - BPF_INSN_SAFETY)) {
+		ret = -EFAULT;
+		goto cleanup;
+	}
+	ret = (u8 *)&image[ctx->idx] - (u8 *)image;
+
+cleanup:
+	kfree(branches);
+	return ret;
+}
+#endif
-- 
2.34.1
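
The control flow the generated trampoline implements can be modelled in
plain C. The sketch below is stand-alone and illustrative only: the names
are stubs rather than kernel symbols, it assumes a single traced argument,
and it omits the __bpf_prog_enter()/__bpf_prog_exit() bookkeeping emitted
around each program.

#include <stdio.h>

/* stub "BPF programs" and traced function; stand-ins, not kernel APIs */
static int fentry_prog(unsigned long *ctx)   { (void)ctx; puts("fentry");   return 0; }
static int fmod_ret_prog(unsigned long *ctx) { (void)ctx; puts("fmod_ret"); return 0; }
static int fexit_prog(unsigned long *ctx)    { (void)ctx; puts("fexit");    return 0; }
static unsigned long traced_func(unsigned long a) { puts("orig"); return a + 1; }

/* models a trampoline attached to traced_func(), nr_args == 1 */
static unsigned long trampoline_model(unsigned long a)
{
	unsigned long ctx[1 + 1] = { a };	/* saved arg(s) plus a return value slot */

	fentry_prog(ctx);

	ctx[1] = 0;				/* fmod_ret progs may read this slot: no garbage */
	ctx[1] = fmod_ret_prog(ctx);
	if (ctx[1] == 0)			/* a non-zero fmod_ret return skips the original */
		ctx[1] = traced_func(ctx[0]);

	fexit_prog(ctx);			/* fexit progs see the args and the return value */

	return ctx[1];
}

int main(void)
{
	printf("ret = %lu\n", trampoline_model(41));
	return 0;
}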


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-07  7:07   ` Naveen N. Rao
@ 2022-02-07 15:24     ` Steven Rostedt
  -1 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2022-02-07 15:24 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Daniel Borkmann, Yauheni Kaliuta, Jiri Olsa, Jordan Niethe, bpf,
	linuxppc-dev, Alexei Starovoitov, Hari Bathini

On Mon,  7 Feb 2022 12:37:21 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
>  		return str;
>  }
>  #endif /* PPC64_ELF_ABI_v1 */
> +
> +#ifdef CONFIG_MPROFILE_KERNEL
> +unsigned long ftrace_location_lookup(unsigned long ip)
> +{
> +	/*
> +	 * Per livepatch.h, ftrace location is always within the first
> +	 * 16 bytes of a function on powerpc with -mprofile-kernel.
> +	 */
> +	return ftrace_location_range(ip, ip + 16);

I think this is the wrong approach for the implementation, and it is error prone.

> +}
> +#endif
> -- 

What I believe is a safer approach is to use the record address and add the
range to it.

That is, you know that the ftrace modification site is a range (multiple
instructions), while in the ftrace infrastructure only one ip represents
that range. What you want is: if you pass in an ip that falls within that
range, return the ip that represents that range to ftrace.

It looks like we need to change the compare function in the bsearch.

Perhaps add a new macro to define the size of the range to be searched,
instead of just using MCOUNT_INSN_SIZE? Then we may not even need this new
lookup function?

static int ftrace_cmp_recs(const void *a, const void *b)
{
	const struct dyn_ftrace *key = a;
	const struct dyn_ftrace *rec = b;

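	/* key->ip .. key->flags holds the address range being looked up */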
	if (key->flags < rec->ip)
		return -1;
	if (key->ip >= rec->ip + ARCH_IP_SIZE)
		return 1;
	return 0;
}

Where ARCH_IP_SIZE is defined to MCOUNT_INSN_SIZE by default, but an arch
could define it to something else, like 16.
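
As a sketch of that default (ARCH_IP_SIZE is only a name proposed here,
not an existing macro), the usual kernel pattern would be something like:

/* kernel/trace/ftrace.c, or a generic ftrace header (sketch only) */
#ifndef ARCH_IP_SIZE
#define ARCH_IP_SIZE	MCOUNT_INSN_SIZE
#endif

with an architecture that needs a wider range defining ARCH_IP_SIZE in
its asm/ftrace.h.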

Would that work for you, or am I missing something?

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-07 15:24     ` Steven Rostedt
@ 2022-02-09 17:50       ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-09 17:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

Steven Rostedt wrote:
> On Mon,  7 Feb 2022 12:37:21 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
>>  		return str;
>>  }
>>  #endif /* PPC64_ELF_ABI_v1 */
>> +
>> +#ifdef CONFIG_MPROFILE_KERNEL
>> +unsigned long ftrace_location_lookup(unsigned long ip)
>> +{
>> +	/*
>> +	 * Per livepatch.h, ftrace location is always within the first
>> +	 * 16 bytes of a function on powerpc with -mprofile-kernel.
>> +	 */
>> +	return ftrace_location_range(ip, ip + 16);
> 
> I think this is the wrong approach for the implementation and error prone.
> 
>> +}
>> +#endif
>> -- 
> 
> What I believe is a safer approach is to use the record address and add the
> range to it.
> 
> That is, you know that the ftrace modification site is a range (multiple
> instructions), where in the ftrace infrastructure, only one ip represents
> that range. What you want is if you pass in an ip, and that ip is within
> that range, you return the ip that represents that range to ftrace.
> 
> It looks like we need to change the compare function in the bsearch.
> 
> Perhaps add a new macro to define the size of the range to be searched,
> instead of just using MCOUNT_INSN_SIZE? Then we may not even need this new
> lookup function?
> 
> static int ftrace_cmp_recs(const void *a, const void *b)
> {
> 	const struct dyn_ftrace *key = a;
> 	const struct dyn_ftrace *rec = b;
> 
> 	if (key->flags < rec->ip)
> 		return -1;
> 	if (key->ip >= rec->ip + ARCH_IP_SIZE)
> 		return 1;
> 	return 0;
> }
> 
> Where ARCH_IP_SIZE is defined to MCOUNT_INSN_SIZE by default, but an arch
> could define it to something else, like 16.
> 
> Would that work for you, or am I missing something?

Yes, I hadn't realized that [un]register_ftrace_direct() and 
modify_ftrace_direct() internally look up the correct ftrace location 
and act on that. So, changing ftrace_cmp_recs() does look like it will 
work well for powerpc. Thanks for this suggestion.

However, I think we will not be able to use a fixed range. I would like 
to reserve the instructions from function entry up to the branch to 
_mcount(), which can be two or four instructions depending on whether a 
function has a global entry point. For this, I am considering adding a 
field in 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() 
to initialize it. I may need to override ftrace_cmp_recs().
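
Concretely (a sketch, assuming the usual ELFv2 -mprofile-kernel code
generation; the N@ha/N@l TOC-setup operands are placeholders):

	/*
	 * With a global entry point:
	 *	addis	r2, r12, N@ha
	 *	addi	r2, r2, N@l
	 *	mflr	r0
	 *	bl	_mcount
	 *
	 * With only a local entry point:
	 *	mflr	r0
	 *	bl	_mcount
	 *
	 * i.e. the region from function entry through the 'bl _mcount'
	 * ftrace site spans four or two instructions.
	 */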


Thanks,
- Naveen


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-09 17:50       ` Naveen N. Rao
@ 2022-02-09 21:10         ` Steven Rostedt
  -1 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2022-02-09 21:10 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

On Wed, 09 Feb 2022 17:50:09 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> However, I think we will not be able to use a fixed range.  I would like 
> to reserve instructions from function entry till the branch to 
> _mcount(), and it can be two or four instructions depending on whether a 
> function has a global entry point. For this, I am considering adding a 
> field in 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() 
> to initialize the same. I may need to override ftrace_cmp_recs().

Be careful about adding anything to dyn_arch_ftrace. powerpc already adds
a pointer to the module there. Anything you add to that structure gets
multiplied thousands of times over (which takes up memory).

At boot up you may see something like:

  ftrace: allocating 45363 entries in 178 pages

That's 45,363 dyn_arch_ftrace structures. And each module loads its own
as well. To see how many total you have after boot up:


  # cat /sys/kernel/tracing/dyn_ftrace_total_info 
55974 pages:295 groups: 89

That's from the same kernel. Another 10,000 entries were created by modules.
(This was for x86_64)

What you may be able to do is add a way to look at the already saved
kallsyms, which keeps track of each function's start and end, and use
that to map an address back to its function.

   kallsyms_lookup(addr, NULL, &offset, NULL, NULL);

Should give you the offset of addr from the start of the function.
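
For example, a minimal sketch for this case (a fragment; with
-mprofile-kernel the 'bl _mcount' sits at offset 12 when the function has
a global entry point and at offset 4 otherwise; the follow-up patch in
this thread does essentially this):

	bool has_gep = false;
	unsigned long offset = 0;

	/* addr is the ftrace site, i.e. the 'bl _mcount' instruction */
	if (kallsyms_lookup_size_offset(addr, NULL, &offset))
		has_gep = (offset == 12);	/* offset == 4: local entry point only */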

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-09 21:10         ` Steven Rostedt
@ 2022-02-10 13:58           ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-10 13:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

Steven Rostedt wrote:
> On Wed, 09 Feb 2022 17:50:09 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> However, I think we will not be able to use a fixed range.  I would like 
>> to reserve instructions from function entry till the branch to 
>> _mcount(), and it can be two or four instructions depending on whether a 
>> function has a global entry point. For this, I am considering adding a 
>> field in 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() 
>> to initialize the same. I may need to override ftrace_cmp_recs().
> 
> Be careful about adding anything to dyn_arch_ftrace. powerpc already adds
> the pointer to the module. Anything you add to that gets multiplied by
> thousands of times (which takes up memory).
> 
> At boot up you may see something like:
> 
>   ftrace: allocating 45363 entries in 178 pages
> 
> That's 45,363 dyn_arch_ftrace structures. And each module loads their own
> as well. To see how many total you have after boot up:
> 
> 
>   # cat /sys/kernel/tracing/dyn_ftrace_total_info 
> 55974 pages:295 groups: 89
> 
> That's from the same kernel. Another 10,000 entries were created by modules.
> (This was for x86_64)
> 
> What you may be able to do, is to add a way to look at the already saved
> kallsyms, which keeps track of the function entry and exit to know how to
> map an address back to the function.
> 
>    kallsyms_lookup(addr, NULL, &offset, NULL, NULL);
> 
> Should give you the offset of addr from the start of the function.

Good point. I should be able to overload the existing field for this 
purpose. Is something like the below ok?

---
 arch/powerpc/include/asm/ftrace.h  | 13 ++++++
 arch/powerpc/kernel/trace/ftrace.c | 73 ++++++++++++++++++++++++++----
 kernel/trace/ftrace.c              |  2 +
 3 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index debe8c4f706260..96d6e26cee86af 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -59,6 +59,19 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 struct dyn_arch_ftrace {
 	struct module *mod;
 };
+
+struct dyn_ftrace;
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec);
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod);
+
+#ifdef CONFIG_MPROFILE_KERNEL
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
+#define ftrace_init_nop ftrace_init_nop
+
+int ftrace_cmp_recs(const void *a, const void *b);
+#define ftrace_cmp_recs ftrace_cmp_recs
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 80b6285769f27c..d9b6faa4c98a8c 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -428,21 +428,21 @@ int ftrace_make_nop(struct module *mod,
 	 * We should either already have a pointer to the module
 	 * or it has been passed in.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		if (!mod) {
 			pr_err("No module loaded addr=%lx\n", addr);
 			return -EFAULT;
 		}
-		rec->arch.mod = mod;
+		ftrace_mod_addr_set(rec, mod);
 	} else if (mod) {
-		if (mod != rec->arch.mod) {
+		if (mod != ftrace_mod_addr_get(rec)) {
 			pr_err("Record mod %p not equal to passed in mod %p\n",
-			       rec->arch.mod, mod);
+			       ftrace_mod_addr_get(rec), mod);
 			return -EINVAL;
 		}
 		/* nothing to do if mod == rec->arch.mod */
 	} else
-		mod = rec->arch.mod;
+		mod = ftrace_mod_addr_get(rec);
 
 	return __ftrace_make_nop(mod, rec, addr);
 #else
@@ -451,6 +451,59 @@ int ftrace_make_nop(struct module *mod,
 #endif /* CONFIG_MODULES */
 }
 
+#ifdef CONFIG_MPROFILE_KERNEL
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec)
+{
+	return (struct module *)((unsigned long)rec->arch.mod & ~0x1);
+}
+
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod)
+{
+	rec->arch.mod = (struct module *)(((unsigned long)rec->arch.mod & 0x1) | (unsigned long)mod);
+}
+
+bool ftrace_location_has_gep(const struct dyn_ftrace *rec)
+{
+	return !!((unsigned long)rec->arch.mod & 0x1);
+}
+
+int ftrace_cmp_recs(const void *a, const void *b)
+{
+	const struct dyn_ftrace *key = a;
+	const struct dyn_ftrace *rec = b;
+	int offset = ftrace_location_has_gep(rec) ? 12 : 4;
+
+	if (key->flags < rec->ip - offset)
+		return -1;
+	if (key->ip >= rec->ip + MCOUNT_INSN_SIZE)
+		return 1;
+	return 0;
+}
+
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
+{
+	unsigned long offset;
+
+	if (!kallsyms_lookup_size_offset(rec->ip, NULL, &offset) || (offset != 12 && offset != 4)) {
+		/* TODO: implement logic to deduce lep/gep from code */
+	} else if (offset == 12) {
+		ftrace_mod_addr_set(rec, (struct module *)1);
+	}
+
+	return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+}
+#else
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec)
+{
+	return rec->arch.mod;
+}
+
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module * mod)
+{
+	rec->arch.mod = mod;
+}
+#endif /* CONFIG_MPROFILE_KERNEL */
+
 #ifdef CONFIG_MODULES
 #ifdef CONFIG_PPC64
 /*
@@ -494,7 +547,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	ppc_inst_t instr;
 	void *ip = (void *)rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(op, ip))
@@ -561,7 +614,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	int err;
 	ppc_inst_t op;
 	u32 *ip = (u32 *)rec->ip;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 	unsigned long tramp;
 
 	/* read where this goes */
@@ -678,7 +731,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	 * Being that we are converting from nop, it had better
 	 * already have a module defined.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		pr_err("No module loaded\n");
 		return -EINVAL;
 	}
@@ -699,7 +752,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	ppc_inst_t op;
 	unsigned long ip = rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 
 	/* If we never set up ftrace trampolines, then bail */
 	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
@@ -814,7 +867,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	/*
 	 * Out of range jumps are called from modules.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		pr_err("No module loaded\n");
 		return -EINVAL;
 	}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f9feb197b2daaf..68f20cf34b0c47 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 	}
 
 
+#ifndef ftrace_cmp_recs
 static int ftrace_cmp_recs(const void *a, const void *b)
 {
 	const struct dyn_ftrace *key = a;
@@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
 		return 1;
 	return 0;
 }
+#endif
 
 static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
 {



Thanks,
Naveen

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-10 13:58           ` Naveen N. Rao
@ 2022-02-10 14:59             ` Steven Rostedt
  -1 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2022-02-10 14:59 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

On Thu, 10 Feb 2022 13:58:29 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index f9feb197b2daaf..68f20cf34b0c47 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
>  	}
>  
>  
> +#ifndef ftrace_cmp_recs
>  static int ftrace_cmp_recs(const void *a, const void *b)
>  {
>  	const struct dyn_ftrace *key = a;
> @@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
>  		return 1;
>  	return 0;
>  }
> +#endif
>  

I don't really care for this part, as it seems somewhat ugly. But this
patch does appear to solve your issue, and I can't think of a prettier way
to do this.

So, I will reluctantly ack it.

If anything, please add a comment above saying that architectures may need
to override this function, and when doing so, they will define their own
ftrace_cmp_recs.
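
One possible wording for such a comment, placed just above the
#ifndef ftrace_cmp_recs guard (illustrative only):

/*
 * Architectures where the ftrace location spans more than a single
 * MCOUNT_INSN_SIZE instruction can override this function by providing
 * their own ftrace_cmp_recs (and a matching #define) in <asm/ftrace.h>.
 */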

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-10 14:59             ` Steven Rostedt
@ 2022-02-10 16:40               ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-10 16:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

Steven Rostedt wrote:
> On Thu, 10 Feb 2022 13:58:29 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index f9feb197b2daaf..68f20cf34b0c47 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
>>  	}
>>  
>>  
>> +#ifndef ftrace_cmp_recs
>>  static int ftrace_cmp_recs(const void *a, const void *b)
>>  {
>>  	const struct dyn_ftrace *key = a;
>> @@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
>>  		return 1;
>>  	return 0;
>>  }
>> +#endif
>>  
> 
> I don't really care for this part, as it seems somewhat ugly. But this
> patch does appear to solve your issue, and I can't think of a prettier way
> to do this.

Yes, this approach looks like it will solve a few different issues for 
us. We would also like to nop-out the instruction before the branch to 
_mcount(), so this helps ensure that kprobes won't also overwrite that 
instruction.

> 
> So, I will reluctantly ack it.

Since we don't want to change struct dyn_ftrace, we will need to contain 
changes within the architecture code, so I don't see a way to do this in 
a generic manner.

The other option is to mark ftrace_cmp_recs() as a __weak function, but 
I have a vague recollection of you suggesting #ifdef rather than a 
__weak function in the past. I might be mis-remembering, so if you think 
making this a __weak function is better, I can do that.
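
For reference, a sketch of that __weak alternative (not what was settled
on; the default would simply lose 'static' and gain __weak, and powerpc
would then provide a strong definition instead of the #ifndef override):

/* kernel/trace/ftrace.c */
int __weak ftrace_cmp_recs(const void *a, const void *b)
{
	const struct dyn_ftrace *key = a;
	const struct dyn_ftrace *rec = b;

	if (key->flags < rec->ip)
		return -1;
	if (key->ip >= rec->ip + MCOUNT_INSN_SIZE)
		return 1;
	return 0;
}

/*
 * arch/powerpc/kernel/trace/ftrace.c would then provide a non-weak
 * ftrace_cmp_recs(), as in the earlier patch, with no #ifndef needed.
 */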

> 
> If anything, please add a comment above saying that architectures may need
> to override this function, and when doing so, they will define their own
> ftrace_cmp_recs.

Sure.


- Naveen


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-10 16:40               ` Naveen N. Rao
@ 2022-02-10 17:01                 ` Steven Rostedt
  -1 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2022-02-10 17:01 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

On Thu, 10 Feb 2022 16:40:28 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> The other option is to mark ftrace_cmp_recs() as a __weak function, but 
> I have a vague recollection of you suggesting #ifdef rather than a 
> __weak function in the past. I might be mis-remembering, so if you think 
> making this a __weak function is better, I can do that.

No. If I wanted that I would have suggested it. I think this is the
prettiest of the ugly solutions out there ;-)

As I said, I can't think of a better solution, and we can go with this
until something else comes along.

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL
  2022-02-10 17:01                 ` Steven Rostedt
@ 2022-02-11 11:36                   ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-11 11:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Hari Bathini,
	Jordan Niethe, Jiri Olsa, linuxppc-dev, Michael Ellerman,
	Yauheni Kaliuta

Steven Rostedt wrote:
> On Thu, 10 Feb 2022 16:40:28 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> The other option is to mark ftrace_cmp_recs() as a __weak function, but 
>> I have a vague recollection of you suggesting #ifdef rather than a 
>> __weak function in the past. I might be mis-remembering, so if you think 
>> making this a __weak function is better, I can do that.
> 
> No. If I wanted that I would have suggested it. I think this is the
> prettiest of the ugly solutions out there ;-)

Understood :)

> 
> As I said, I can't think of a better solution, and we can go with this
> until something else comes along.

Thanks,
- Naveen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/3] powerpc64/bpf: Add support for BPF Trampolines
  2022-02-07  7:07 ` Naveen N. Rao
@ 2022-02-11 14:40   ` Christophe Leroy
  -1 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2022-02-11 14:40 UTC (permalink / raw)
  To: Naveen N. Rao, Daniel Borkmann, Alexei Starovoitov,
	Michael Ellerman, Steven Rostedt
  Cc: Yauheni Kaliuta, Jordan Niethe, linuxppc-dev, bpf, Jiri Olsa,
	Hari Bathini



On 07/02/2022 at 08:07, Naveen N. Rao wrote:
> This is an early RFC series that adds support for BPF Trampolines on
> powerpc64. Some of the selftests are passing for me, but this needs more
> testing and I've likely missed a few things as well. A review of the
> patches and feedback about the overall approach will be great.
> 
> This series depends on some of the other BPF JIT fixes and enhancements
> posted previously, as well as on ftrace direct enablement on powerpc
> which has also been posted in the past.

Is there any reason to limit this to powerpc64 ?

Christophe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/3] powerpc64/bpf: Add support for BPF Trampolines
  2022-02-11 14:40   ` Christophe Leroy
@ 2022-02-14 10:47     ` Naveen N. Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Naveen N. Rao @ 2022-02-14 10:47 UTC (permalink / raw)
  To: Alexei Starovoitov, Christophe Leroy, Daniel Borkmann,
	Michael Ellerman, Steven Rostedt
  Cc: Yauheni Kaliuta, Jordan Niethe, linuxppc-dev, bpf, Jiri Olsa,
	Hari Bathini

Christophe Leroy wrote:
> 
> 
> On 07/02/2022 at 08:07, Naveen N. Rao wrote:
>> This is an early RFC series that adds support for BPF Trampolines on
>> powerpc64. Some of the selftests are passing for me, but this needs more
>> testing and I've likely missed a few things as well. A review of the
>> patches and feedback about the overall approach will be great.
>> 
>> This series depends on some of the other BPF JIT fixes and enhancements
>> posted previously, as well as on ftrace direct enablement on powerpc
>> which has also been posted in the past.
> 
> Is there any reason to limit this to powerpc64 ?

I have limited this to elf v2, and we won't be able to get this working 
on elf v1, since we don't have DYNAMIC_FTRACE_WITH_REGS supported there. 
We should be able to get this working on ppc32 though.


- Naveen


^ permalink raw reply	[flat|nested] 28+ messages in thread
