linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2
@ 2022-10-05  5:32 Benjamin Gray
  2022-10-05  5:32 ` [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function Benjamin Gray
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

Implementation of out-of-line static calls for the PowerPC 64-bit ELF V2 ABI.
Static calls patch an indirect branch into a direct branch at runtime.
With the out-of-line variant, the caller directly calls a trampoline, and
the trampoline is patched to directly call the target.

Previous versions here:
V2: https://lore.kernel.org/all/20220926064316.765967-1-bgray@linux.ibm.com/
V1: https://lore.kernel.org/all/20220916062330.430468-1-bgray@linux.ibm.com/

Changed for V3:

[general]
* Rebased on top of
  https://lore.kernel.org/all/0df84a2eea551c1d000c34c36d0c1d23cbe26a97.1664289176.git.christophe.leroy@csgroup.eu/
  for removing the separate CONFIG_STRICT_KERNEL_RWX cases. Can rebase back onto next if necessary.
* Some minor refactoring/style fixes throughout

[1/6]
* Code patching rewritten again
* This time it only adds support for what is needed:
        * int or long sized writes only
        * assumed within a cacheline (static call pointers are aligned
          for atomic updates, instructions are aligned anyway)
        * unconditional instruction syncing because non-instruction
          patching is not used in any performance sensitive paths
        * similarly, the dword case is marked unlikely. ftrace activation is the
          biggest performance concern, and it only uses non-prefixed instructions.
* Should be zero difference on 32-bit, minor differences on 64-bit
* The design doesn't need to be revisited unless 1 or 2 byte patching is
  specifically needed. Most such patches can be emulated by a read-update-store
  of the containing 4 bytes. Patches that cross a cacheline can be split
  similarly (they cannot have atomicity requirements if they are not aligned).
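The containment and read-update-store ideas above can be sketched in plain C. This is a standalone userspace sketch, not code from the series: the `L1_CACHE_BYTES` value and the helper names are illustrative, and the final store stands in for what would really be a `patch_uint()` call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L1_CACHE_BYTES 128 /* illustrative; common ppc64 cacheline size */

/* True if a patch of 'len' bytes at 'addr' stays within one cacheline. */
static int within_cacheline(uintptr_t addr, size_t len)
{
	uintptr_t mask = ~(uintptr_t)(L1_CACHE_BYTES - 1);

	return (addr & mask) == ((addr + len - 1) & mask);
}

/* Emulate a 2-byte patch by read-update-store of the aligned 4 bytes. */
static void patch_u16_emulated(uint16_t *addr, uint16_t val)
{
	uint32_t *word = (uint32_t *)((uintptr_t)addr & ~(uintptr_t)3);
	uint32_t tmp;

	memcpy(&tmp, word, sizeof(tmp));
	memcpy((char *)&tmp + ((uintptr_t)addr & 3), &val, sizeof(val));
	memcpy(word, &tmp, sizeof(tmp)); /* real code: a single patch_uint() */
}
```

The aligned 4-byte store is what keeps the update atomic from the CPU's point of view, which is why sub-word support isn't needed in the patcher itself.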

[3/6]
* Refactored to use patch_branch (thx Christophe)

[5/6]
* Guarded the required `.localentry NAME, 1` directive with a toolchain
  version check
* Removed #ifdefs from the static call implementation. Added sign_extend_long
  to support this.
* Fixed a bug in ppc_function_toc's handling of the lis case & made it more
  verbose so that such errors stand out more. The new layout splits the logic
  into two steps: calculating the required values, then applying them.

[6/6]
* Replaced SAVE_REGS/RESTORE_REGS macros with functions
* Reduced global register usage of tests
* Support running on 32-bit as well


Benjamin Gray (6):
  powerpc/code-patching: Implement generic text patching function
  powerpc/module: Handle caller-saved TOC in module linker
  powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  static_call: Move static call selftest to static_call_selftest.c
  powerpc/64: Add support for out-of-line static calls
  powerpc: Add tests for out-of-line static calls

 arch/powerpc/Kconfig                     |  26 ++-
 arch/powerpc/include/asm/code-patching.h |  30 +++
 arch/powerpc/include/asm/static_call.h   |  80 ++++++-
 arch/powerpc/kernel/Makefile             |   4 +-
 arch/powerpc/kernel/module_64.c          |  26 ++-
 arch/powerpc/kernel/static_call.c        | 183 +++++++++++++++-
 arch/powerpc/kernel/static_call_test.c   | 263 +++++++++++++++++++++++
 arch/powerpc/kernel/static_call_test.h   |  56 +++++
 arch/powerpc/lib/code-patching.c         |  73 +++++--
 kernel/Makefile                          |   1 +
 kernel/static_call_inline.c              |  43 ----
 kernel/static_call_selftest.c            |  41 ++++
 12 files changed, 753 insertions(+), 73 deletions(-)
 create mode 100644 arch/powerpc/kernel/static_call_test.c
 create mode 100644 arch/powerpc/kernel/static_call_test.h
 create mode 100644 kernel/static_call_selftest.c


base-commit: 9a5e80596e50f1ab19fecb2d337e7ea3287ee083
--
2.37.3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function
  2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
@ 2022-10-05  5:32 ` Benjamin Gray
  2022-10-05 17:55   ` Christophe Leroy
  2022-10-05  5:32 ` [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker Benjamin Gray
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

Adds a generic text patching mechanism for patches of int or long size.

The patch_instruction function is reimplemented in terms of this more
generic function. The generic implementation allows patching of
arbitrary long-sized data, such as pointers on 64-bit.

On 32-bit, patch_uint is marked noinline to prevent a mis-optimisation.
Without noinline, the compiler may inline all the way from patch_branch
down to do_patch_memory, which prevents do_patch_memory from being
inlined into patch_uint itself. This would needlessly turn patch_uint
into a branch to do_patch_memory.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h | 29 ++++++++++
 arch/powerpc/lib/code-patching.c         | 73 ++++++++++++++++++------
 2 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..170bfa848c7c 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -72,7 +72,36 @@ static inline int create_branch(ppc_inst_t *instr, const u32 *addr,
 int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 		       unsigned long target, int flags);
 int patch_branch(u32 *addr, unsigned long target, int flags);
+
+/* patch_uint and patch_ulong must only be called on addresses where the patch
+ * does not cross a cacheline, otherwise it may not be flushed properly and
+ * mixes of new and stale data may be observed.
+ *
+ * patch_instruction and other instruction patchers automatically satisfy this
+ * requirement due to instruction alignment requirements.
+ */
+
+int patch_uint(void *addr, unsigned int val);
+
+#ifdef CONFIG_PPC64
+
+int patch_ulong(void *addr, unsigned long val);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
+
+#else
+
+static inline int patch_ulong(void *addr, unsigned long val)
+{
+	return patch_uint(addr, val);
+}
+
+static inline int patch_instruction(u32 *addr, ppc_inst_t instr)
+{
+	return patch_uint(addr, ppc_inst_val(instr));
+}
+
+#endif
+
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
 
 static inline unsigned long patch_site_addr(s32 *site)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 125c55e3e148..ecdd2e523d9a 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -15,20 +15,24 @@
 #include <asm/code-patching.h>
 #include <asm/inst.h>
 
-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
+static int __patch_memory(void *patch_addr, unsigned long val, void *exec_addr,
+			   bool is_dword)
 {
-	if (!ppc_inst_prefixed(instr)) {
-		u32 val = ppc_inst_val(instr);
-
-		__put_kernel_nofault(patch_addr, &val, u32, failed);
-	} else {
-		u64 val = ppc_inst_as_ulong(instr);
+	/* Prefixed instruction may cross cacheline if cacheline smaller than 64 bytes */
+	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64);
 
+	if (unlikely(is_dword))
 		__put_kernel_nofault(patch_addr, &val, u64, failed);
-	}
+	else
+		__put_kernel_nofault(patch_addr, &val, u32, failed);
 
-	asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
-							    "r" (exec_addr));
+	/* Assume data is inside a single cacheline */
+	dcbst(patch_addr);
+	mb(); /* sync */
+	/* Flush on the EA that may be executed in case of a non-coherent icache */
+	icbi(exec_addr);
+	mb(); /* sync */
+	isync();
 
 	return 0;
 
@@ -38,7 +42,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr
 
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-	return __patch_instruction(addr, instr, addr);
+	if (ppc_inst_prefixed(instr))
+		return __patch_memory(addr, ppc_inst_as_ulong(instr), addr, true);
+	else
+		return __patch_memory(addr, ppc_inst_val(instr), addr, false);
 }
 
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
@@ -149,7 +156,7 @@ static void unmap_patch_area(unsigned long addr)
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
+static int __do_patch_memory(void *addr, unsigned long val, bool is_dword)
 {
 	int err;
 	u32 *patch_addr;
@@ -166,7 +173,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
 
-	err = __patch_instruction(addr, instr, patch_addr);
+	err = __patch_memory(patch_addr, val, addr, is_dword);
 
 	pte_clear(&init_mm, text_poke_addr, pte);
 	flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
@@ -174,7 +181,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 	return err;
 }
 
-int patch_instruction(u32 *addr, ppc_inst_t instr)
+static int do_patch_memory(void *addr, unsigned long val, bool is_dword)
 {
 	int err;
 	unsigned long flags;
@@ -186,15 +193,47 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 	 */
 	if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) ||
 	    !static_branch_likely(&poking_init_done))
-		return raw_patch_instruction(addr, instr);
+		return __patch_memory(addr, val, addr, is_dword);
 
 	local_irq_save(flags);
-	err = __do_patch_instruction(addr, instr);
+	err = __do_patch_memory(addr, val, is_dword);
 	local_irq_restore(flags);
 
 	return err;
 }
-NOKPROBE_SYMBOL(patch_instruction);
+
+#ifdef CONFIG_PPC64
+
+int patch_uint(void *addr, unsigned int val)
+{
+	return do_patch_memory(addr, val, false);
+}
+NOKPROBE_SYMBOL(patch_uint)
+
+int patch_ulong(void *addr, unsigned long val)
+{
+	return do_patch_memory(addr, val, true);
+}
+NOKPROBE_SYMBOL(patch_ulong)
+
+int patch_instruction(u32 *addr, ppc_inst_t instr)
+{
+	if (ppc_inst_prefixed(instr))
+		return patch_ulong(addr, ppc_inst_as_ulong(instr));
+	else
+		return patch_uint(addr, ppc_inst_val(instr));
+}
+NOKPROBE_SYMBOL(patch_instruction)
+
+#else
+
+noinline int patch_uint(void *addr, unsigned int val)
+{
+	return do_patch_memory(addr, val, false);
+}
+NOKPROBE_SYMBOL(patch_uint)
+
+#endif
 
 int patch_branch(u32 *addr, unsigned long target, int flags)
 {
-- 
2.37.3



* [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker
  2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
  2022-10-05  5:32 ` [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function Benjamin Gray
@ 2022-10-05  5:32 ` Benjamin Gray
  2022-10-05 19:18   ` Christophe Leroy
  2022-10-05  5:32 ` [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub Benjamin Gray
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

A callee's st_other field may have its local entry encoding set to 1 to
indicate that r2 should be treated as caller-saved. This means a
trampoline must be used to save the current TOC before the call and
restore it afterwards, much like for external calls.

This is necessary for supporting V2 ABI static calls that do not
preserve the TOC.
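The st_other encoding involved can be sketched as follows. This is a standalone sketch: the constants match the ELF definitions, need_r2save_stub() mirrors the helper this patch adds, and local_entry_offset() mirrors what the PPC64_LOCAL_ENTRY_OFFSET() macro computes.

```c
#include <stdbool.h>

/* ELF v2 ABI: bits 5-7 of st_other encode the local entry point offset. */
#define STO_PPC64_LOCAL_BIT	5
#define STO_PPC64_LOCAL_MASK	(7 << STO_PPC64_LOCAL_BIT)

/* Decode the local entry offset in bytes (0, 4, 8, 16, ...). */
static unsigned int local_entry_offset(unsigned char st_other)
{
	unsigned int val = (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT;

	return ((1u << val) >> 2) << 2;
}

/* Encoding 1 means the entry points coincide but r2 is caller-saved. */
static bool need_r2save_stub(unsigned char st_other)
{
	return ((st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT) == 1;
}
```

Note that encodings 0 and 1 both decode to a zero-byte local entry offset, which is why the caller-saved-r2 case needs its own check rather than falling out of the offset calculation.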

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
---
 arch/powerpc/kernel/module_64.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7e45dc98df8a..4d816f7785b4 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -55,6 +55,12 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
 	 * of function and try to derive r2 from it). */
 	return PPC64_LOCAL_ENTRY_OFFSET(sym->st_other);
 }
+
+static bool need_r2save_stub(unsigned char st_other)
+{
+	return ((st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT) == 1;
+}
+
 #else
 
 static func_desc_t func_desc(unsigned long addr)
@@ -66,6 +72,11 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
 	return 0;
 }
 
+static bool need_r2save_stub(unsigned char st_other)
+{
+	return false;
+}
+
 void *dereference_module_function_descriptor(struct module *mod, void *ptr)
 {
 	if (ptr < (void *)mod->arch.start_opd ||
@@ -632,7 +643,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		case R_PPC_REL24:
 			/* FIXME: Handle weak symbols here --RR */
 			if (sym->st_shndx == SHN_UNDEF ||
-			    sym->st_shndx == SHN_LIVEPATCH) {
+			    sym->st_shndx == SHN_LIVEPATCH ||
+			    need_r2save_stub(sym->st_other)) {
 				/* External: go via stub */
 				value = stub_for_addr(sechdrs, value, me,
 						strtab + sym->st_name);
-- 
2.37.3



* [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
  2022-10-05  5:32 ` [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function Benjamin Gray
  2022-10-05  5:32 ` [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker Benjamin Gray
@ 2022-10-05  5:32 ` Benjamin Gray
  2022-10-05 19:21   ` Christophe Leroy
  2022-10-06  8:24   ` Andrew Donnellan
  2022-10-05  5:32 ` [PATCH v3 4/6] static_call: Move static call selftest to static_call_selftest.c Benjamin Gray
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

Inserts a direct branch to the stub target when possible, replacing the
mtctr/bctr sequence.

The load into r12 could potentially be skipped too, but that change
would require refactoring the arguments to indicate that the address
does not have a separate local entry point.

This helps the static call implementation, where modules calling their
own trampolines are called through this stub and the trampoline is
easily within range of a direct branch.
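The "within range of a direct branch" test used here can be sketched as below. It mirrors the kernel's is_offset_in_branch_range() for an unconditional I-form branch, which has a 26-bit signed, word-aligned displacement; this is a standalone sketch of that check.

```c
#include <stdbool.h>

/*
 * A ppc 'b' instruction encodes a 26-bit signed displacement whose low
 * two bits must be zero, giving a reach of -32 MiB to +32 MiB - 4
 * relative to the branch instruction itself.
 */
static bool is_offset_in_branch_range(long offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}
```

Module stubs and a module's own trampolines live in the same allocation, so in practice the offset between them comfortably passes this check, and patch_branch() only fails with -ERANGE in the genuinely out-of-range cases.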

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
---
 arch/powerpc/kernel/module_64.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 4d816f7785b4..13ce7a4d8a8d 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = {
 	PPC_RAW_BCTR(),
 };
 
+#ifdef CONFIG_PPC64_ELF_ABI_V1
+#define PPC64_STUB_MTCTR_OFFSET 5
+#else
+#define PPC64_STUB_MTCTR_OFFSET 4
+#endif
+
 /* Count how many different 24-bit relocations (different symbol,
    different addend) */
 static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num)
@@ -426,6 +432,7 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
 			      struct module *me,
 			      const char *name)
 {
+	int err;
 	long reladdr;
 	func_desc_t desc;
 	int i;
@@ -439,6 +446,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
 			return 0;
 	}
 
+	/* Replace indirect branch sequence with direct branch where possible */
+	err = patch_branch(&entry->jump[PPC64_STUB_MTCTR_OFFSET], addr, 0);
+	if (err && err != -ERANGE)
+		return 0;
+
 	/* Stub uses address relative to r2. */
 	reladdr = (unsigned long)entry - my_r2(sechdrs, me);
 	if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {
-- 
2.37.3



* [PATCH v3 4/6] static_call: Move static call selftest to static_call_selftest.c
  2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
                   ` (2 preceding siblings ...)
  2022-10-05  5:32 ` [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub Benjamin Gray
@ 2022-10-05  5:32 ` Benjamin Gray
  2022-10-05 19:22   ` Christophe Leroy
  2022-10-05  5:32 ` [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls Benjamin Gray
  2022-10-05  5:32 ` [PATCH v3 6/6] powerpc: Add tests " Benjamin Gray
  5 siblings, 1 reply; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

These tests are out-of-line only, so moving them to their own file
allows them to be run when an arch does not implement inline static
calls.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 kernel/Makefile               |  1 +
 kernel/static_call_inline.c   | 43 -----------------------------------
 kernel/static_call_selftest.c | 41 +++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 43 deletions(-)
 create mode 100644 kernel/static_call_selftest.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 318789c728d3..8ce8beaa3cc0 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -113,6 +113,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o
+obj-$(CONFIG_STATIC_CALL_SELFTEST) += static_call_selftest.o
 obj-$(CONFIG_CFI_CLANG) += cfi.o
 
 obj-$(CONFIG_PERF_EVENTS) += events/
diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
index dc5665b62814..64d04d054698 100644
--- a/kernel/static_call_inline.c
+++ b/kernel/static_call_inline.c
@@ -498,46 +498,3 @@ int __init static_call_init(void)
 	return 0;
 }
 early_initcall(static_call_init);
-
-#ifdef CONFIG_STATIC_CALL_SELFTEST
-
-static int func_a(int x)
-{
-	return x+1;
-}
-
-static int func_b(int x)
-{
-	return x+2;
-}
-
-DEFINE_STATIC_CALL(sc_selftest, func_a);
-
-static struct static_call_data {
-      int (*func)(int);
-      int val;
-      int expect;
-} static_call_data [] __initdata = {
-      { NULL,   2, 3 },
-      { func_b, 2, 4 },
-      { func_a, 2, 3 }
-};
-
-static int __init test_static_call_init(void)
-{
-      int i;
-
-      for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
-	      struct static_call_data *scd = &static_call_data[i];
-
-              if (scd->func)
-                      static_call_update(sc_selftest, scd->func);
-
-              WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
-      }
-
-      return 0;
-}
-early_initcall(test_static_call_init);
-
-#endif /* CONFIG_STATIC_CALL_SELFTEST */
diff --git a/kernel/static_call_selftest.c b/kernel/static_call_selftest.c
new file mode 100644
index 000000000000..246ad89f64eb
--- /dev/null
+++ b/kernel/static_call_selftest.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/static_call.h>
+
+static int func_a(int x)
+{
+	return x+1;
+}
+
+static int func_b(int x)
+{
+	return x+2;
+}
+
+DEFINE_STATIC_CALL(sc_selftest, func_a);
+
+static struct static_call_data {
+	int (*func)(int);
+	int val;
+	int expect;
+} static_call_data [] __initdata = {
+	{ NULL,   2, 3 },
+	{ func_b, 2, 4 },
+	{ func_a, 2, 3 }
+};
+
+static int __init test_static_call_init(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
+		struct static_call_data *scd = &static_call_data[i];
+
+		if (scd->func)
+			static_call_update(sc_selftest, scd->func);
+
+		WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
+	}
+
+	return 0;
+}
+early_initcall(test_static_call_init);
-- 
2.37.3



* [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
                   ` (3 preceding siblings ...)
  2022-10-05  5:32 ` [PATCH v3 4/6] static_call: Move static call selftest to static_call_selftest.c Benjamin Gray
@ 2022-10-05  5:32 ` Benjamin Gray
  2022-10-05 19:38   ` Christophe Leroy
  2022-10-05  5:32 ` [PATCH v3 6/6] powerpc: Add tests " Benjamin Gray
  5 siblings, 1 reply; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

Implement static call support for the 64-bit ELF V2 ABI. This requires
making sure the TOC is kept correct across kernel-module boundaries. As
a secondary concern, it tries to use the local entry point of a target
wherever possible. It does so by checking whether both the trampoline &
target are kernel code, falling back to detecting the common global
entry point patterns if modules are involved. Detecting the global
entry point is also required for setting the local entry point as the
trampoline target: if we cannot detect the local entry point, then we
need to conservatively initialise r12 and use the global entry point.

The trampolines are marked with `.localentry NAME, 1` to make the
linker save and restore the TOC on each call to the trampoline. This
allows the trampoline to safely target functions with different TOC
values.

However this directive also implies the TOC is not initialised on entry
to the trampoline. The kernel TOC is easily found in the PACA, but not
an arbitrary module TOC. Therefore the trampoline implementation depends
on whether it's in the kernel or not. If in the kernel, we initialise
the TOC using the PACA. If in a module, we have to initialise the TOC
with zero context, so it's quite expensive.
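The global entry point arithmetic that this patch's ppc_function_toc() simulates can be sketched as plain integer math. This is a standalone sketch: the helper names are illustrative, and sign_extend() takes a bit count where the kernel's sign_extend64() takes the sign-bit index.

```c
#include <stdint.h>

/* Sign-extend the low 'bits' bits of v. */
static int64_t sign_extend(uint64_t v, unsigned int bits)
{
	unsigned int shift = 64 - bits;

	return (int64_t)(v << shift) >> shift;
}

/*
 * Global entry sequence:  addis r2, r12, si1 ; addi r2, r2, si2
 * where r12 holds the global entry address, so the computed TOC is
 * entry + (si1 << 16) + si2, with both immediates sign-extended.
 */
static uint64_t toc_from_global_entry(uint64_t entry, uint16_t si1, uint16_t si2)
{
	return entry + sign_extend((uint64_t)si1 << 16, 32) + sign_extend(si2, 16);
}
```

Two functions share a TOC exactly when this computation (or the kernel's single TOC, for core kernel text) yields the same value for both, which is the condition the series uses to decide whether the short direct-branch form of the trampoline is safe.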

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
---
 arch/powerpc/Kconfig                     |  14 ++-
 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/include/asm/static_call.h   |  80 +++++++++++++-
 arch/powerpc/kernel/Makefile             |   3 +-
 arch/powerpc/kernel/static_call.c        | 130 +++++++++++++++++++++--
 5 files changed, 216 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4c466acdc70d..962e36ec34ec 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -102,6 +102,18 @@ config GENERIC_HWEIGHT
 	bool
 	default y
 
+config TOOLCHAIN_SUPPORTS_LOCALENTRY1
+	bool
+	depends on PPC64_ELF_ABI_V2
+	default y if LD_VERSION >= 23200 || LLD_VERSION >= 110000
+	help
+	  A section of the ELF symbol st_other field can be given the value 1
+	  using the directive '.localentry NAME, 1' to mean the local and global
+	  entry points are the same, and r2 should be treated as caller-saved.
+
+	  Older versions of Clang and binutils do not recognise this form of the
+	  directive and will error if it is used.
+
 config PPC
 	bool
 	default y
@@ -248,7 +260,7 @@ config PPC
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
 	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
-	select HAVE_STATIC_CALL			if PPC32
+	select HAVE_STATIC_CALL			if PPC32 || (PPC64_ELF_ABI_V2 && TOOLCHAIN_SUPPORTS_LOCALENTRY1)
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select HUGETLB_PAGE_SIZE_VARIABLE	if PPC_BOOK3S_64 && HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 170bfa848c7c..cb4629e55e57 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -152,6 +152,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, const u32 *src);
 bool is_conditional_branch(ppc_inst_t instr);
 
 #define OP_RT_RA_MASK	0xffff0000UL
+#define OP_SI_MASK	0x0000ffffUL
 #define LIS_R2		(PPC_RAW_LIS(_R2, 0))
 #define ADDIS_R2_R12	(PPC_RAW_ADDIS(_R2, _R12, 0))
 #define ADDI_R2_R2	(PPC_RAW_ADDI(_R2, _R2, 0))
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
index de1018cc522b..3d6e82200cb7 100644
--- a/arch/powerpc/include/asm/static_call.h
+++ b/arch/powerpc/include/asm/static_call.h
@@ -2,12 +2,75 @@
 #ifndef _ASM_POWERPC_STATIC_CALL_H
 #define _ASM_POWERPC_STATIC_CALL_H
 
+#ifdef CONFIG_PPC64_ELF_ABI_V2
+
+#ifdef MODULE
+
+#define __PPC_SCT(name, inst)					\
+	asm(".pushsection .text, \"ax\"				\n"	\
+	    ".align 6						\n"	\
+	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
+	    ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1	\n"	\
+	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
+	    "	mflr	11					\n"	\
+	    "	bcl	20, 31, $+4				\n"	\
+	    "0:	mflr	12					\n"	\
+	    "	mtlr	11					\n"	\
+	    "	addi	12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b)	\n"	\
+	    "	addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha	\n"	\
+	    "	addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l	\n"	\
+	    "	" inst "					\n"	\
+	    "	ld	12, (2f - " STATIC_CALL_TRAMP_STR(name) ")(12)	\n"	\
+	    "	mtctr	12					\n"	\
+	    "	bctr						\n"	\
+	    "1:	li	3, 0					\n"	\
+	    "	blr						\n"	\
+	    ".balign 8						\n"	\
+	    "2:	.8byte 0					\n"	\
+	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
+	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
+	    ".popsection					\n")
+
+#else /* KERNEL */
+
+#define __PPC_SCT(name, inst)					\
+	asm(".pushsection .text, \"ax\"				\n"	\
+	    ".align 5						\n"	\
+	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
+	    ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1	\n"	\
+	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
+	    "	ld	2, 16(13)				\n"	\
+	    "	" inst "					\n"	\
+	    "	addis	12, 2, 2f@toc@ha			\n"	\
+	    "	ld	12, 2f@toc@l(12)			\n"	\
+	    "	mtctr	12					\n"	\
+	    "	bctr						\n"	\
+	    "1:	li	3, 0					\n"	\
+	    "	blr						\n"	\
+	    ".balign 8						\n"	\
+	    "2:	.8byte 0					\n"	\
+	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
+	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
+	    ".popsection					\n")
+
+#endif /* MODULE */
+
+#define PPC_SCT_INST_MODULE		28		/* Offset of instruction to update */
+#define PPC_SCT_RET0_MODULE		44		/* Offset of label 1 */
+#define PPC_SCT_DATA_MODULE		56		/* Offset of label 2 (aligned) */
+
+#define PPC_SCT_INST_KERNEL		4		/* Offset of instruction to update */
+#define PPC_SCT_RET0_KERNEL		24		/* Offset of label 1 */
+#define PPC_SCT_DATA_KERNEL		32		/* Offset of label 2 (aligned) */
+
+#elif defined(CONFIG_PPC32)
+
 #define __PPC_SCT(name, inst)					\
 	asm(".pushsection .text, \"ax\"				\n"	\
 	    ".align 5						\n"	\
 	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
 	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
-	    inst "						\n"	\
+	    "	" inst "					\n"	\
 	    "	lis	12,2f@ha				\n"	\
 	    "	lwz	12,2f@l(12)				\n"	\
 	    "	mtctr	12					\n"	\
@@ -19,11 +82,20 @@
 	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
 	    ".popsection					\n")
 
-#define PPC_SCT_RET0		20		/* Offset of label 1 */
-#define PPC_SCT_DATA		28		/* Offset of label 2 */
+#define PPC_SCT_INST_MODULE		0		/* Offset of instruction to update */
+#define PPC_SCT_RET0_MODULE		20		/* Offset of label 1 */
+#define PPC_SCT_DATA_MODULE		28		/* Offset of label 2 */
+
+#define PPC_SCT_INST_KERNEL		PPC_SCT_INST_MODULE
+#define PPC_SCT_RET0_KERNEL		PPC_SCT_RET0_MODULE
+#define PPC_SCT_DATA_KERNEL		PPC_SCT_DATA_MODULE
+
+#else /* !CONFIG_PPC64_ELF_ABI_V2 && !CONFIG_PPC32 */
+#error "Unsupported ABI"
+#endif /* CONFIG_PPC64_ELF_ABI_V2 */
 
 #define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)	__PPC_SCT(name, "b " #func)
 #define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)	__PPC_SCT(name, "blr")
-#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)	__PPC_SCT(name, "b .+20")
+#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)	__PPC_SCT(name, "b 1f")
 
 #endif /* _ASM_POWERPC_STATIC_CALL_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 06d2d1f78f71..a30d0d0f5499 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -128,8 +128,9 @@ extra-y				+= vmlinux.lds
 
 obj-$(CONFIG_RELOCATABLE)	+= reloc_$(BITS).o
 
-obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o static_call.o
+obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o
 obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
+obj-$(CONFIG_HAVE_STATIC_CALL)	+= static_call.o
 obj-$(CONFIG_KGDB)		+= kgdb.o
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
 obj-$(CONFIG_SMP)		+= smp.o
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..9211b2e189bb 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -1,33 +1,151 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/bitops.h>
 #include <linux/memory.h>
 #include <linux/static_call.h>
 
 #include <asm/code-patching.h>
 
+static long sign_extend_long(unsigned long value, int index)
+{
+	if (sizeof(long) == 8)
+		return sign_extend64(value, index);
+	else
+		return sign_extend32(value, index);
+}
+
+static void *ppc_function_toc(u32 *func)
+{
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) {
+		/* There are two common global entry sequences we handle below
+		 *
+		 * 1. addis r2, r12, SI1
+		 *    addi r2, SI2
+		 *
+		 * 2. lis r2, SI1
+		 *    addi r2, SI2
+		 *
+		 * Where r12 contains the global entry point address (it is otherwise
+		 * uninitialised, so doesn't matter what value we use if this is not
+		 * a separate global entry point).
+		 *
+		 * Here we simulate running the given sequence and return the result it
+		 * would calculate. If the sequence is not recognised we return NULL.
+		 */
+		u32 insn1 = *func;
+		u32 insn2 = *(func + 1);
+		unsigned long op_regs1 = insn1 & OP_RT_RA_MASK;
+		unsigned long op_regs2 = insn2 & OP_RT_RA_MASK;
+		unsigned long si1 = insn1 & OP_SI_MASK;
+		unsigned long si2 = insn2 & OP_SI_MASK;
+		unsigned long imm1 = sign_extend_long(si1 << 16, 31);
+		unsigned long imm2 = sign_extend_long(si2, 15);
+		unsigned long addr = 0;
+
+		/* Simulate the first instruction */
+		if (op_regs1 == ADDIS_R2_R12)
+			addr += (unsigned long)func + imm1;
+		else if (op_regs1 == LIS_R2)
+			addr += imm1;
+		else
+			return NULL;
+
+		/* Simulate the second instruction */
+		if (op_regs2 == ADDI_R2_R2)
+			addr += imm2;
+		else
+			return NULL;
+
+		return (void *)addr;
+	}
+
+	return NULL;
+}
+
+static bool shares_toc(void *func1, void *func2)
+{
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) {
+		void *func1_toc;
+		void *func2_toc;
+
+		if (func1 == NULL || func2 == NULL)
+			return false;
+
+		/* Assume the kernel only uses a single TOC */
+		if (core_kernel_text((unsigned long)func1) &&
+		    core_kernel_text((unsigned long)func2))
+			return true;
+
+		/* Fall back to calculating the TOC from common patterns
+		 * if modules are involved
+		 */
+		func1_toc = ppc_function_toc(func1);
+		func2_toc = ppc_function_toc(func2);
+		return func1_toc != NULL && func2_toc != NULL && func1_toc == func2_toc;
+	}
+
+	return true;
+}
+
+static void *get_inst_addr(void *tramp)
+{
+	return tramp + (core_kernel_text((unsigned long)tramp)
+				? PPC_SCT_INST_KERNEL
+				: PPC_SCT_INST_MODULE);
+}
+
+static void *get_ret0_addr(void *tramp)
+{
+	return tramp + (core_kernel_text((unsigned long)tramp)
+				? PPC_SCT_RET0_KERNEL
+				: PPC_SCT_RET0_MODULE);
+}
+
+static void *get_data_addr(void *tramp)
+{
+	return tramp + (core_kernel_text((unsigned long) tramp)
+				? PPC_SCT_DATA_KERNEL
+				: PPC_SCT_DATA_MODULE);
+}
+
 void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
 {
 	int err;
 	bool is_ret0 = (func == __static_call_return0);
-	unsigned long target = (unsigned long)(is_ret0 ? tramp + PPC_SCT_RET0 : func);
-	bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
+	bool is_short;
+	void *target = is_ret0 ? get_ret0_addr(tramp) : func;
+	void *tramp_inst = get_inst_addr(tramp);
 
 	if (!tramp)
 		return;
 
+	if (is_ret0)
+		is_short = true;
+	else if (shares_toc(tramp, target))
+		is_short = is_offset_in_branch_range(
+			(long)ppc_function_entry(target) - (long)tramp_inst);
+	else
+		/* Combine out-of-range with not sharing a TOC. Though it's possible
+		 * an out-of-range target shares a TOC, handling this separately
+		 * complicates the trampoline. It's simpler to always use the global
+		 * entry point in this case.
+		 */
+		is_short = false;
+
 	mutex_lock(&text_mutex);
 
 	if (func && !is_short) {
-		err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+		err = patch_ulong(get_data_addr(tramp), (unsigned long)target);
 		if (err)
 			goto out;
 	}
 
 	if (!func)
-		err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
+		err = patch_instruction(tramp_inst, ppc_inst(PPC_RAW_BLR()));
 	else if (is_short)
-		err = patch_branch(tramp, target, 0);
+		err = patch_branch(tramp_inst, ppc_function_entry(target), 0);
 	else
-		err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
+		err = patch_instruction(tramp_inst, ppc_inst(PPC_RAW_NOP()));
+
 out:
 	mutex_unlock(&text_mutex);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 6/6] powerpc: Add tests for out-of-line static calls
  2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
                   ` (4 preceding siblings ...)
  2022-10-05  5:32 ` [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls Benjamin Gray
@ 2022-10-05  5:32 ` Benjamin Gray
  5 siblings, 0 replies; 25+ messages in thread
From: Benjamin Gray @ 2022-10-05  5:32 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: ajd, peterz, Benjamin Gray, npiggin, ardb, jbaron, rostedt, jpoimboe

KUnit tests for the various combinations of caller/trampoline/target and
kernel/module. They must be run from a module loaded at runtime to
guarantee they have a different TOC to the kernel (64-bit ELF V2) and
increase the chance of testing the non-direct branch path of the
trampoline.

For 64-bit ELF V2 ABI the tests try to mitigate the chance of panicking
by restoring the TOC after every static call. Not all possible errors
can be caught by this (we can't stop a trampoline from using a bad TOC
itself), but it makes certain errors easier to debug.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
---
 arch/powerpc/Kconfig                   |  12 ++
 arch/powerpc/kernel/Makefile           |   1 +
 arch/powerpc/kernel/static_call.c      |  53 +++++
 arch/powerpc/kernel/static_call_test.c | 263 +++++++++++++++++++++++++
 arch/powerpc/kernel/static_call_test.h |  56 ++++++
 5 files changed, 385 insertions(+)
 create mode 100644 arch/powerpc/kernel/static_call_test.c
 create mode 100644 arch/powerpc/kernel/static_call_test.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 962e36ec34ec..5b9d5fa96a9e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1035,6 +1035,18 @@ config PPC_RTAS_FILTER
 	  Say Y unless you know what you are doing and the filter is causing
 	  problems for you.
 
+config PPC_STATIC_CALL_KUNIT_TEST
+	tristate "KUnit tests for static calls"
+	default KUNIT_ALL_TESTS
+	depends on HAVE_STATIC_CALL && KUNIT && m
+	help
+	  Tests for static calls across all combinations of caller/trampoline/target
+	  being kernel/module. On ELF ABI V2 the tests check the TOC is kept consistent.
+
+	  Must be built as a module and loaded at runtime to ensure the module has
+	  a different TOC to the kernel and make it likely that the non-direct
+	  branch path of the trampoline is tested.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a30d0d0f5499..22c07e3d34df 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -131,6 +131,7 @@ obj-$(CONFIG_RELOCATABLE)	+= reloc_$(BITS).o
 obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o
 obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
 obj-$(CONFIG_HAVE_STATIC_CALL)	+= static_call.o
+obj-$(CONFIG_PPC_STATIC_CALL_KUNIT_TEST)	+= static_call_test.o
 obj-$(CONFIG_KGDB)		+= kgdb.o
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
 obj-$(CONFIG_SMP)		+= smp.o
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 9211b2e189bb..44957ba91e3f 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -153,3 +153,56 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
 		panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
 }
 EXPORT_SYMBOL_GPL(arch_static_call_transform);
+
+
+#if IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST)
+
+#include "static_call_test.h"
+
+int ppc_sc_kernel_target_1(struct kunit *test)
+{
+	toc_fixup(test);
+	return 1;
+}
+
+int ppc_sc_kernel_target_2(struct kunit *test)
+{
+	toc_fixup(test);
+	return 2;
+}
+
+DEFINE_STATIC_CALL(ppc_sc_kernel, ppc_sc_kernel_target_1);
+
+int ppc_sc_kernel_call(struct kunit *test)
+{
+	return PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+}
+
+int ppc_sc_kernel_call_indirect(struct kunit *test, int (*fn)(struct kunit *test))
+{
+	return PROTECTED_SC(test, int, fn(test));
+}
+
+long ppc_sc_kernel_target_big(struct kunit *test, long a, long b, long c, long d,
+			      long e, long f, long g, long h, long i)
+{
+	toc_fixup(test);
+	KUNIT_EXPECT_EQ(test, a, b);
+	KUNIT_EXPECT_EQ(test, a, c);
+	KUNIT_EXPECT_EQ(test, a, d);
+	KUNIT_EXPECT_EQ(test, a, e);
+	KUNIT_EXPECT_EQ(test, a, f);
+	KUNIT_EXPECT_EQ(test, a, g);
+	KUNIT_EXPECT_EQ(test, a, h);
+	KUNIT_EXPECT_EQ(test, a, i);
+	return ~a;
+}
+
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_1);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_2);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_big);
+EXPORT_STATIC_CALL_GPL(ppc_sc_kernel);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_call);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_call_indirect);
+
+#endif /* IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) */
diff --git a/arch/powerpc/kernel/static_call_test.c b/arch/powerpc/kernel/static_call_test.c
new file mode 100644
index 000000000000..10a09ef455cf
--- /dev/null
+++ b/arch/powerpc/kernel/static_call_test.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "static_call_test.h"
+
+#include <linux/kconfig.h>
+#include <linux/module.h>
+#include <linux/static_call.h>
+
+/* Tests to ensure correctness in a variety of cases for static calls.
+ *
+ * The tests focus on ensuring the TOC is kept consistent across the
+ * module-kernel boundary, as compilers can't see that a trampoline
+ * defined locally to a caller might be jumping to a function with a
+ * different TOC. So it's important that these tests are compiled as
+ * a module to ensure the TOC will be different to the kernel's.
+ */
+
+#ifdef CONFIG_PPC64_ELF_ABI_V2
+
+/* Utils to hold a copy of the old register values while we test.
+ *
+ * We can't use the KUnit init/exit hooks because when the hooks and
+ * test cases return they will be in the KUnit context that doesn't know
+ * we've reserved and modified some non-volatile registers.
+ */
+
+static void *saved_registers[2];
+
+static void init_testcase(struct kunit *test)
+{
+	saved_registers[0] = actual_toc;
+	saved_registers[1] = module_toc;
+	module_toc = current_toc;
+	KUNIT_ASSERT_PTR_NE(test, module_toc, (void *)get_paca()->kernel_toc);
+}
+
+static void exit_testcase(void)
+{
+	actual_toc = saved_registers[0];
+	module_toc = saved_registers[1];
+}
+
+#else
+
+static void init_testcase(struct kunit *test) {}
+static void exit_testcase(void) {}
+
+#endif  /* CONFIG_PPC64_ELF_ABI_V2 */
+
+static int module_target_11(struct kunit *test)
+{
+	toc_fixup(test);
+	return 11;
+}
+
+static int module_target_12(struct kunit *test)
+{
+	toc_fixup(test);
+	return 12;
+}
+
+DEFINE_STATIC_CALL(module_sc, module_target_11);
+
+DEFINE_STATIC_CALL_RET0(module_sc_ret0, int(void));
+DEFINE_STATIC_CALL_NULL(module_sc_null, int(int));
+
+static int add_one(int *val)
+{
+	return (*val)++;
+}
+
+static void null_function_test(struct kunit *test)
+{
+	int val = 0;
+
+	init_testcase(test);
+
+	/* Check argument unconditionally evaluated */
+	static_call_cond(module_sc_null)(add_one(&val));
+	KUNIT_ASSERT_EQ(test, 1, val);
+
+	exit_testcase();
+}
+
+static void return_zero_test(struct kunit *test)
+{
+	int ret;
+
+	init_testcase(test);
+
+	ret = PROTECTED_SC(test, int, static_call(module_sc_ret0)());
+	KUNIT_ASSERT_EQ(test, 0, ret);
+
+	static_call_update(ppc_sc_kernel, (void *)__static_call_return0);
+	ret = PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+	KUNIT_ASSERT_EQ(test, 0, ret);
+
+	static_call_update(module_sc, (void *)__static_call_return0);
+	ret = PROTECTED_SC(test, int, static_call(module_sc)(test));
+	KUNIT_ASSERT_EQ(test, 0, ret);
+
+	exit_testcase();
+}
+
+static void kernel_kernel_kernel_test(struct kunit *test)
+{
+	init_testcase(test);
+
+	static_call_update(ppc_sc_kernel, ppc_sc_kernel_target_1);
+	KUNIT_ASSERT_EQ(test, 1, ppc_sc_kernel_call(test));
+
+	static_call_update(ppc_sc_kernel, ppc_sc_kernel_target_2);
+	KUNIT_ASSERT_EQ(test, 2, ppc_sc_kernel_call(test));
+
+	exit_testcase();
+}
+
+static void kernel_kernel_module_test(struct kunit *test)
+{
+	init_testcase(test);
+
+	static_call_update(ppc_sc_kernel, module_target_11);
+	KUNIT_ASSERT_EQ(test, 11, ppc_sc_kernel_call(test));
+
+	static_call_update(ppc_sc_kernel, module_target_12);
+	KUNIT_ASSERT_EQ(test, 12, ppc_sc_kernel_call(test));
+
+	exit_testcase();
+}
+
+static void kernel_module_kernel_test(struct kunit *test)
+{
+	init_testcase(test);
+
+	static_call_update(module_sc, ppc_sc_kernel_target_1);
+	KUNIT_ASSERT_EQ(test, 1, ppc_sc_kernel_call_indirect(test, static_call(module_sc)));
+
+	static_call_update(module_sc, ppc_sc_kernel_target_2);
+	KUNIT_ASSERT_EQ(test, 2, ppc_sc_kernel_call_indirect(test, static_call(module_sc)));
+
+	exit_testcase();
+}
+
+static void kernel_module_module_test(struct kunit *test)
+{
+	init_testcase(test);
+
+	static_call_update(module_sc, module_target_11);
+	KUNIT_ASSERT_EQ(test, 11, ppc_sc_kernel_call_indirect(test, static_call(module_sc)));
+
+	static_call_update(module_sc, module_target_12);
+	KUNIT_ASSERT_EQ(test, 12, ppc_sc_kernel_call_indirect(test, static_call(module_sc)));
+
+	exit_testcase();
+}
+
+static void module_kernel_kernel_test(struct kunit *test)
+{
+	int ret;
+
+	init_testcase(test);
+
+	static_call_update(ppc_sc_kernel, ppc_sc_kernel_target_1);
+	ret = PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+	KUNIT_ASSERT_EQ(test, 1, ret);
+
+	static_call_update(ppc_sc_kernel, ppc_sc_kernel_target_2);
+	ret = PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+	KUNIT_ASSERT_EQ(test, 2, ret);
+
+	exit_testcase();
+}
+
+static void module_kernel_module_test(struct kunit *test)
+{
+	int ret;
+
+	init_testcase(test);
+
+	static_call_update(ppc_sc_kernel, module_target_11);
+	ret = PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+	KUNIT_ASSERT_EQ(test, 11, ret);
+
+	static_call_update(ppc_sc_kernel, module_target_12);
+	ret = PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+	KUNIT_ASSERT_EQ(test, 12, ret);
+
+	exit_testcase();
+}
+
+static void module_module_kernel_test(struct kunit *test)
+{
+	int ret;
+
+	init_testcase(test);
+
+	static_call_update(module_sc, ppc_sc_kernel_target_1);
+	ret = PROTECTED_SC(test, int, static_call(module_sc)(test));
+	KUNIT_ASSERT_EQ(test, 1, ret);
+
+	static_call_update(module_sc, ppc_sc_kernel_target_2);
+	ret = PROTECTED_SC(test, int, static_call(module_sc)(test));
+	KUNIT_ASSERT_EQ(test, 2, ret);
+
+	exit_testcase();
+}
+
+static void module_module_module_test(struct kunit *test)
+{
+	int ret;
+
+	init_testcase(test);
+
+	static_call_update(module_sc, module_target_11);
+	ret = PROTECTED_SC(test, int, static_call(module_sc)(test));
+	KUNIT_ASSERT_EQ(test, 11, ret);
+
+	static_call_update(module_sc, module_target_12);
+	ret = PROTECTED_SC(test, int, static_call(module_sc)(test));
+	KUNIT_ASSERT_EQ(test, 12, ret);
+
+	exit_testcase();
+}
+
+DEFINE_STATIC_CALL(module_sc_stack_params, ppc_sc_kernel_target_big);
+
+static void stack_parameters_test(struct kunit *test)
+{
+	long m = -0x87654321;
+	long ret;
+
+	init_testcase(test);
+
+	ret = PROTECTED_SC(test, long,
+		static_call(module_sc_stack_params)(test, m, m, m, m, m, m, m, m, m));
+	KUNIT_ASSERT_EQ(test, ~m, ret);
+
+	exit_testcase();
+}
+
+static struct kunit_case static_call_test_cases[] = {
+	KUNIT_CASE(null_function_test),
+	KUNIT_CASE(return_zero_test),
+	KUNIT_CASE(stack_parameters_test),
+	KUNIT_CASE(kernel_kernel_kernel_test),
+	KUNIT_CASE(kernel_kernel_module_test),
+	KUNIT_CASE(kernel_module_kernel_test),
+	KUNIT_CASE(kernel_module_module_test),
+	KUNIT_CASE(module_kernel_kernel_test),
+	KUNIT_CASE(module_kernel_module_test),
+	KUNIT_CASE(module_module_kernel_test),
+	KUNIT_CASE(module_module_module_test),
+	{}
+};
+
+static struct kunit_suite ppc_static_call_test_suite = {
+	.name = "ppc-static-call",
+	.test_cases = static_call_test_cases,
+};
+kunit_test_suite(ppc_static_call_test_suite);
+
+MODULE_AUTHOR("Benjamin Gray <bgray@linux.ibm.com>");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/kernel/static_call_test.h b/arch/powerpc/kernel/static_call_test.h
new file mode 100644
index 000000000000..71b5bc52c099
--- /dev/null
+++ b/arch/powerpc/kernel/static_call_test.h
@@ -0,0 +1,56 @@
+#ifndef _POWERPC_STATIC_CALL_TEST_
+#define _POWERPC_STATIC_CALL_TEST_
+
+#include <kunit/test.h>
+
+DECLARE_STATIC_CALL(ppc_sc_kernel, int(struct kunit *test));
+int ppc_sc_kernel_target_1(struct kunit *test);
+int ppc_sc_kernel_target_2(struct kunit *test);
+long ppc_sc_kernel_target_big(struct kunit *test, long a, long b, long c, long d,
+			      long e, long f, long g, long h, long i);
+int ppc_sc_kernel_call(struct kunit *test);
+int ppc_sc_kernel_call_indirect(struct kunit *test, int(*fn)(struct kunit *test));
+
+#ifdef CONFIG_PPC64_ELF_ABI_V2
+
+/* Reserve these registers for testing so that a TOC error
+ * doesn't necessarily crash the whole kernel.
+ *
+ * The tests ensure the contents are restored before returning.
+ */
+register void *current_toc asm ("r2");
+register void *actual_toc asm ("r14");  /* Copy of TOC before fixup */
+register void *module_toc asm ("r15");
+
+#ifdef MODULE
+
+#define toc_fixup(test) \
+	actual_toc = current_toc; \
+	current_toc = module_toc; \
+	KUNIT_EXPECT_PTR_EQ(test, module_toc, actual_toc)
+
+#else /* KERNEL */
+
+#define toc_fixup(test) \
+	actual_toc = current_toc; \
+	current_toc = (void *)get_paca()->kernel_toc; \
+	KUNIT_EXPECT_PTR_EQ(test, (void *)get_paca()->kernel_toc, actual_toc)
+
+#endif /* MODULE */
+
+#define PROTECTED_SC(test, ret_type, call) \
+({ \
+	ret_type ret; \
+	ret = call; \
+	toc_fixup(test); \
+	ret; \
+})
+
+#else
+
+#define toc_fixup(test) {}
+#define PROTECTED_SC(test, ret_type, call) call
+
+#endif /* CONFIG_PPC64_ELF_ABI_V2 */
+
+#endif /* _POWERPC_STATIC_CALL_TEST_ */
-- 
2.37.3



* Re: [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function
  2022-10-05  5:32 ` [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function Benjamin Gray
@ 2022-10-05 17:55   ` Christophe Leroy
  2022-10-06  3:36     ` Benjamin Gray
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2022-10-05 17:55 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe

Hi,

On 05/10/2022 at 07:32, Benjamin Gray wrote:
> Adds a generic text patching mechanism for patches of size int or long
> bytes.
> 
> The patch_instruction function is reimplemented in terms of this
> more generic function. This generic implementation allows patching of
> arbitrary long data, such as pointers on 64-bit.
> 
> On 32-bit patch_uint is marked noinline to prevent a mis-optimisation.
> Without noinline, inside patch_branch the compiler may inline all the
> way to do_patch_memory, preventing the compiler from inlining
> do_patch_memory into patch_uint. This would needlessly force patch_uint
> to be a branch to do_patch_memory.

I'm on a business trip this week so I can't test it on hardware, but the 
generated code looks horrid and sub-optimal, with a stack frame and so 
many registers saved into it. That's mpc885_ads_defconfig built with GCC 
12, without modules, without stack protector, with 4k pages.

00000168 <__patch_memory.constprop.0>:
  168:	90 83 00 00 	stw     r4,0(r3)
  16c:	7c 00 18 6c 	dcbst   0,r3
  170:	7c 00 04 ac 	hwsync
  174:	7c 00 2f ac 	icbi    0,r5
  178:	7c 00 04 ac 	hwsync
  17c:	4c 00 01 2c 	isync
  180:	38 60 00 00 	li      r3,0
  184:	4e 80 00 20 	blr
  188:	38 60 ff ff 	li      r3,-1
  18c:	4e 80 00 20 	blr

00000190 <raw_patch_instruction>:
  190:	90 83 00 00 	stw     r4,0(r3)
  194:	7c 00 18 6c 	dcbst   0,r3
  198:	7c 00 04 ac 	hwsync
  19c:	7c 00 1f ac 	icbi    0,r3
  1a0:	7c 00 04 ac 	hwsync
  1a4:	4c 00 01 2c 	isync
  1a8:	38 60 00 00 	li      r3,0
  1ac:	4e 80 00 20 	blr
  1b0:	38 60 ff ff 	li      r3,-1
  1b4:	4e 80 00 20 	blr

000001b8 <patch_uint>:
  1b8:	7c 65 1b 78 	mr      r5,r3
  1bc:	48 00 00 a4 	b       260 <patch_uint+0xa8>
  1c0:	94 21 ff e0 	stwu    r1,-32(r1)
  1c4:	7c 08 02 a6 	mflr    r0
  1c8:	90 01 00 24 	stw     r0,36(r1)
  1cc:	93 81 00 10 	stw     r28,16(r1)
  1d0:	93 a1 00 14 	stw     r29,20(r1)
  1d4:	93 c1 00 18 	stw     r30,24(r1)
  1d8:	93 e1 00 1c 	stw     r31,28(r1)
  1dc:	7f 80 00 a6 	mfmsr   r28
  1e0:	7c 51 13 a6 	mtspr   81,r2
  1e4:	3d 20 00 00 	lis     r9,0
			1e6: R_PPC_ADDR16_HA	.data
  1e8:	81 49 00 00 	lwz     r10,0(r9)
			1ea: R_PPC_ADDR16_LO	.data
  1ec:	3d 20 00 00 	lis     r9,0
			1ee: R_PPC_ADDR16_HA	init_mm+0x24
  1f0:	83 ea 00 04 	lwz     r31,4(r10)
  1f4:	80 e9 00 00 	lwz     r7,0(r9)
			1f6: R_PPC_ADDR16_LO	init_mm+0x24
  1f8:	57 e8 65 3a 	rlwinm  r8,r31,12,20,29
  1fc:	7f a7 40 2e 	lwzx    r29,r7,r8
  200:	7c 69 1b 78 	mr      r9,r3
  204:	3d 29 40 00 	addis   r9,r9,16384
  208:	57 fe b5 3a 	rlwinm  r30,r31,22,20,29
  20c:	55 29 00 26 	clrrwi  r9,r9,12
  210:	61 29 01 25 	ori     r9,r9,293
  214:	57 bd 00 26 	clrrwi  r29,r29,12
  218:	3f de c0 00 	addis   r30,r30,-16384
  21c:	7d 3d f1 2e 	stwx    r9,r29,r30
  220:	53 e3 00 26 	rlwimi  r3,r31,0,0,19
  224:	4b ff ff 45 	bl      168 <__patch_memory.constprop.0>
  228:	39 20 00 00 	li      r9,0
  22c:	7d 3d f1 2e 	stwx    r9,r29,r30
  230:	57 ff 00 26 	clrrwi  r31,r31,12
  234:	7c 00 fa 64 	tlbie   r31,r0
  238:	7c 00 04 ac 	hwsync
  23c:	7f 80 01 24 	mtmsr   r28
  240:	80 01 00 24 	lwz     r0,36(r1)
  244:	83 81 00 10 	lwz     r28,16(r1)
  248:	83 a1 00 14 	lwz     r29,20(r1)
  24c:	83 c1 00 18 	lwz     r30,24(r1)
  250:	83 e1 00 1c 	lwz     r31,28(r1)
  254:	7c 08 03 a6 	mtlr    r0
  258:	38 21 00 20 	addi    r1,r1,32
  25c:	4e 80 00 20 	blr
  260:	4b ff ff 08 	b       168 <__patch_memory.constprop.0>


Christophe


> 
> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/code-patching.h | 29 ++++++++++
>   arch/powerpc/lib/code-patching.c         | 73 ++++++++++++++++++------
>   2 files changed, 85 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 3f881548fb61..170bfa848c7c 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -72,7 +72,36 @@ static inline int create_branch(ppc_inst_t *instr, const u32 *addr,
>   int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
>   		       unsigned long target, int flags);
>   int patch_branch(u32 *addr, unsigned long target, int flags);
> +
> +/* patch_uint and patch_ulong must only be called on addresses where the patch
> + * does not cross a cacheline, otherwise it may not be flushed properly and
> + * mixes of new and stale data may be observed.
> + *
> + * patch_instruction and other instruction patchers automatically satisfy this
> + * requirement due to instruction alignment requirements.
> + */
> +
> +int patch_uint(void *addr, unsigned int val);
> +
> +#ifdef CONFIG_PPC64
> +
> +int patch_ulong(void *addr, unsigned long val);
>   int patch_instruction(u32 *addr, ppc_inst_t instr);
> +
> +#else
> +
> +static inline int patch_ulong(void *addr, unsigned long val)
> +{
> +	return patch_uint(addr, val);
> +}
> +
> +static inline int patch_instruction(u32 *addr, ppc_inst_t instr)
> +{
> +	return patch_uint(addr, ppc_inst_val(instr));
> +}
> +
> +#endif
> +
>   int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
>   
>   static inline unsigned long patch_site_addr(s32 *site)
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 125c55e3e148..ecdd2e523d9a 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -15,20 +15,24 @@
>   #include <asm/code-patching.h>
>   #include <asm/inst.h>
>   
> -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
> +static int __patch_memory(void *patch_addr, unsigned long val, void *exec_addr,
> +			   bool is_dword)
>   {
> -	if (!ppc_inst_prefixed(instr)) {
> -		u32 val = ppc_inst_val(instr);
> -
> -		__put_kernel_nofault(patch_addr, &val, u32, failed);
> -	} else {
> -		u64 val = ppc_inst_as_ulong(instr);
> +	/* Prefixed instruction may cross cacheline if cacheline smaller than 64 bytes */
> +	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64);
>   
> +	if (unlikely(is_dword))
>   		__put_kernel_nofault(patch_addr, &val, u64, failed);
> -	}
> +	else
> +		__put_kernel_nofault(patch_addr, &val, u32, failed);
>   
> -	asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> -							    "r" (exec_addr));
> +	/* Assume data is inside a single cacheline */
> +	dcbst(patch_addr);
> +	mb(); /* sync */
> +	/* Flush on the EA that may be executed in case of a non-coherent icache */
> +	icbi(exec_addr);
> +	mb(); /* sync */
> +	isync();
>   
>   	return 0;
>   
> @@ -38,7 +42,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr
>   
>   int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
>   {
> -	return __patch_instruction(addr, instr, addr);
> +	if (ppc_inst_prefixed(instr))
> +		return __patch_memory(addr, ppc_inst_as_ulong(instr), addr, true);
> +	else
> +		return __patch_memory(addr, ppc_inst_val(instr), addr, false);
>   }
>   
>   static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> @@ -149,7 +156,7 @@ static void unmap_patch_area(unsigned long addr)
>   	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>   }
>   
> -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
> +static int __do_patch_memory(void *addr, unsigned long val, bool is_dword)
>   {
>   	int err;
>   	u32 *patch_addr;
> @@ -166,7 +173,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>   	if (radix_enabled())
>   		asm volatile("ptesync": : :"memory");
>   
> -	err = __patch_instruction(addr, instr, patch_addr);
> +	err = __patch_memory(patch_addr, val, addr, is_dword);
>   
>   	pte_clear(&init_mm, text_poke_addr, pte);
>   	flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
> @@ -174,7 +181,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>   	return err;
>   }
>   
> -int patch_instruction(u32 *addr, ppc_inst_t instr)
> +static int do_patch_memory(void *addr, unsigned long val, bool is_dword)
>   {
>   	int err;
>   	unsigned long flags;
> @@ -186,15 +193,47 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
>   	 */
>   	if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) ||
>   	    !static_branch_likely(&poking_init_done))
> -		return raw_patch_instruction(addr, instr);
> +		return __patch_memory(addr, val, addr, is_dword);
>   
>   	local_irq_save(flags);
> -	err = __do_patch_instruction(addr, instr);
> +	err = __do_patch_memory(addr, val, is_dword);
>   	local_irq_restore(flags);
>   
>   	return err;
>   }
> -NOKPROBE_SYMBOL(patch_instruction);
> +
> +#ifdef CONFIG_PPC64
> +
> +int patch_uint(void *addr, unsigned int val)
> +{
> +	return do_patch_memory(addr, val, false);
> +}
> +NOKPROBE_SYMBOL(patch_uint)
> +
> +int patch_ulong(void *addr, unsigned long val)
> +{
> +	return do_patch_memory(addr, val, true);
> +}
> +NOKPROBE_SYMBOL(patch_ulong)
> +
> +int patch_instruction(u32 *addr, ppc_inst_t instr)
> +{
> +	if (ppc_inst_prefixed(instr))
> +		return patch_ulong(addr, ppc_inst_as_ulong(instr));
> +	else
> +		return patch_uint(addr, ppc_inst_val(instr));
> +}
> +NOKPROBE_SYMBOL(patch_instruction)
> +
> +#else
> +
> +noinline int patch_uint(void *addr, unsigned int val)
> +{
> +	return do_patch_memory(addr, val, false);
> +}
> +NOKPROBE_SYMBOL(patch_uint)
> +
> +#endif
>   
>   int patch_branch(u32 *addr, unsigned long target, int flags)
>   {


* Re: [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker
  2022-10-05  5:32 ` [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker Benjamin Gray
@ 2022-10-05 19:18   ` Christophe Leroy
  2022-10-06  3:51     ` Andrew Donnellan
  2022-10-06  4:39     ` Benjamin Gray
  0 siblings, 2 replies; 25+ messages in thread
From: Christophe Leroy @ 2022-10-05 19:18 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe



On 05/10/2022 at 07:32, Benjamin Gray wrote:
> The callee may set a field in st_other to 1 to indicate r2 should be
> treated as caller-saved. This means a trampoline must be used to save
> the current TOC before calling it and restore it afterwards, much like
> external calls.

What is the 'callee' in this context?

Don't you mean the 'linker' instead?

> 
> This is necessary for supporting V2 ABI static calls that do not
> preserve the TOC.
> 
> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
> ---
>   arch/powerpc/kernel/module_64.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 7e45dc98df8a..4d816f7785b4 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -55,6 +55,12 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
>   	 * of function and try to derive r2 from it). */
>   	return PPC64_LOCAL_ENTRY_OFFSET(sym->st_other);
>   }
> +
> +static bool need_r2save_stub(unsigned char st_other)
> +{
> +	return ((st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT) == 1;

I would have written:

	return (st_other & STO_PPC64_LOCAL_MASK) == (1 << STO_PPC64_LOCAL_BIT);

But it is just a matter of preference, so up to you.

> +}
> +
>   #else
>   
>   static func_desc_t func_desc(unsigned long addr)
> @@ -66,6 +72,11 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
>   	return 0;
>   }
>   
> +static bool need_r2save_stub(unsigned char st_other)
> +{
> +	return false;
> +}
> +
>   void *dereference_module_function_descriptor(struct module *mod, void *ptr)
>   {
>   	if (ptr < (void *)mod->arch.start_opd ||
> @@ -632,7 +643,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>   		case R_PPC_REL24:
>   			/* FIXME: Handle weak symbols here --RR */
>   			if (sym->st_shndx == SHN_UNDEF ||
> -			    sym->st_shndx == SHN_LIVEPATCH) {
> +			    sym->st_shndx == SHN_LIVEPATCH ||
> +			    need_r2save_stub(sym->st_other)) {
>   				/* External: go via stub */
>   				value = stub_for_addr(sechdrs, value, me,
>   						strtab + sym->st_name);


* Re: [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  2022-10-05  5:32 ` [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub Benjamin Gray
@ 2022-10-05 19:21   ` Christophe Leroy
  2022-10-06  8:24   ` Andrew Donnellan
  1 sibling, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2022-10-05 19:21 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe



On 05/10/2022 at 07:32, Benjamin Gray wrote:
> Inserts a direct branch to the stub target when possible, replacing the
> mtctr/btctr sequence.
> 
> The load into r12 could potentially be skipped too, but that change
> would need to refactor the arguments to indicate that the address
> does not have a separate local entry point.
> 
> This helps the static call implementation, where modules calling their
> own trampolines are called through this stub and the trampoline is
> easily within range of a direct branch.
> 
> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>

> ---
>   arch/powerpc/kernel/module_64.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 4d816f7785b4..13ce7a4d8a8d 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = {
>   	PPC_RAW_BCTR(),
>   };
>   
> +#ifdef CONFIG_PPC64_ELF_ABI_V1
> +#define PPC64_STUB_MTCTR_OFFSET 5
> +#else
> +#define PPC64_STUB_MTCTR_OFFSET 4
> +#endif
> +
>   /* Count how many different 24-bit relocations (different symbol,
>      different addend) */
>   static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num)
> @@ -426,6 +432,7 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>   			      struct module *me,
>   			      const char *name)
>   {
> +	int err;
>   	long reladdr;
>   	func_desc_t desc;
>   	int i;
> @@ -439,6 +446,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>   			return 0;
>   	}
>   
> +	/* Replace indirect branch sequence with direct branch where possible */
> +	err = patch_branch(&entry->jump[PPC64_STUB_MTCTR_OFFSET], addr, 0);
> +	if (err && err != -ERANGE)
> +		return 0;
> +
>   	/* Stub uses address relative to r2. */
>   	reladdr = (unsigned long)entry - my_r2(sechdrs, me);
>   	if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {


* Re: [PATCH v3 4/6] static_call: Move static call selftest to static_call_selftest.c
  2022-10-05  5:32 ` [PATCH v3 4/6] static_call: Move static call selftest to static_call_selftest.c Benjamin Gray
@ 2022-10-05 19:22   ` Christophe Leroy
  0 siblings, 0 replies; 25+ messages in thread
From: Christophe Leroy @ 2022-10-05 19:22 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe



On 05/10/2022 at 07:32, Benjamin Gray wrote:
> These tests are out-of-line only, so moving them to
> their own file allows them to be run when an arch does
> not implement inline static calls.
> 
> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>

> ---
>   kernel/Makefile               |  1 +
>   kernel/static_call_inline.c   | 43 -----------------------------------
>   kernel/static_call_selftest.c | 41 +++++++++++++++++++++++++++++++++
>   3 files changed, 42 insertions(+), 43 deletions(-)
>   create mode 100644 kernel/static_call_selftest.c
> 
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 318789c728d3..8ce8beaa3cc0 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -113,6 +113,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
>   obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>   obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
>   obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o
> +obj-$(CONFIG_STATIC_CALL_SELFTEST) += static_call_selftest.o
>   obj-$(CONFIG_CFI_CLANG) += cfi.o
>   
>   obj-$(CONFIG_PERF_EVENTS) += events/
> diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
> index dc5665b62814..64d04d054698 100644
> --- a/kernel/static_call_inline.c
> +++ b/kernel/static_call_inline.c
> @@ -498,46 +498,3 @@ int __init static_call_init(void)
>   	return 0;
>   }
>   early_initcall(static_call_init);
> -
> -#ifdef CONFIG_STATIC_CALL_SELFTEST
> -
> -static int func_a(int x)
> -{
> -	return x+1;
> -}
> -
> -static int func_b(int x)
> -{
> -	return x+2;
> -}
> -
> -DEFINE_STATIC_CALL(sc_selftest, func_a);
> -
> -static struct static_call_data {
> -      int (*func)(int);
> -      int val;
> -      int expect;
> -} static_call_data [] __initdata = {
> -      { NULL,   2, 3 },
> -      { func_b, 2, 4 },
> -      { func_a, 2, 3 }
> -};
> -
> -static int __init test_static_call_init(void)
> -{
> -      int i;
> -
> -      for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
> -	      struct static_call_data *scd = &static_call_data[i];
> -
> -              if (scd->func)
> -                      static_call_update(sc_selftest, scd->func);
> -
> -              WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
> -      }
> -
> -      return 0;
> -}
> -early_initcall(test_static_call_init);
> -
> -#endif /* CONFIG_STATIC_CALL_SELFTEST */
> diff --git a/kernel/static_call_selftest.c b/kernel/static_call_selftest.c
> new file mode 100644
> index 000000000000..246ad89f64eb
> --- /dev/null
> +++ b/kernel/static_call_selftest.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/static_call.h>
> +
> +static int func_a(int x)
> +{
> +	return x+1;
> +}
> +
> +static int func_b(int x)
> +{
> +	return x+2;
> +}
> +
> +DEFINE_STATIC_CALL(sc_selftest, func_a);
> +
> +static struct static_call_data {
> +	int (*func)(int);
> +	int val;
> +	int expect;
> +} static_call_data [] __initdata = {
> +	{ NULL,   2, 3 },
> +	{ func_b, 2, 4 },
> +	{ func_a, 2, 3 }
> +};
> +
> +static int __init test_static_call_init(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
> +		struct static_call_data *scd = &static_call_data[i];
> +
> +		if (scd->func)
> +			static_call_update(sc_selftest, scd->func);
> +
> +		WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
> +	}
> +
> +	return 0;
> +}
> +early_initcall(test_static_call_init);


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-05  5:32 ` [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls Benjamin Gray
@ 2022-10-05 19:38   ` Christophe Leroy
  2022-10-06  0:39     ` Michael Ellerman
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2022-10-05 19:38 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe



Le 05/10/2022 à 07:32, Benjamin Gray a écrit :
> Implement static call support for 64 bit V2 ABI. This requires making
> sure the TOC is kept correct across kernel-module boundaries. As a
> secondary concern, it tries to use the local entry point of a target
> wherever possible. It does so by checking if both tramp & target are
> kernel code, and falls back to detecting the common global entry point
> patterns if modules are involved. Detecting the global entry point is
> also required for setting the local entry point as the trampoline
> target: if we cannot detect the local entry point, then we need to
> conservatively initialise r12 and use the global entry point.
> 
> The trampolines are marked with `.localentry NAME, 1` to make the
> linker save and restore the TOC on each call to the trampoline. This
> allows the trampoline to safely target functions with different TOC
> values.
> 
> However this directive also implies the TOC is not initialised on entry
> to the trampoline. The kernel TOC is easily found in the PACA, but not
> an arbitrary module TOC. Therefore the trampoline implementation depends
> on whether it's in the kernel or not. If in the kernel, we initialise
> the TOC using the PACA. If in a module, we have to initialise the TOC
> with zero context, so it's quite expensive.
> 
> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>

This looks good to me

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>

However, thinking out loud, I'm wondering: could we make things any 
simpler when CONFIG_MODULES is not selected, or is that too much of a 
corner case on PPC64?

I'm asking because on embedded PPC32 it is common to not have 
CONFIG_MODULES set.

> ---
>   arch/powerpc/Kconfig                     |  14 ++-
>   arch/powerpc/include/asm/code-patching.h |   1 +
>   arch/powerpc/include/asm/static_call.h   |  80 +++++++++++++-
>   arch/powerpc/kernel/Makefile             |   3 +-
>   arch/powerpc/kernel/static_call.c        | 130 +++++++++++++++++++++--
>   5 files changed, 216 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 4c466acdc70d..962e36ec34ec 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -102,6 +102,18 @@ config GENERIC_HWEIGHT
>   	bool
>   	default y
>   
> +config TOOLCHAIN_SUPPORTS_LOCALENTRY1
> +	bool
> +	depends on PPC64_ELF_ABI_V2
> +	default y if LD_VERSION >= 23200 || LLD_VERSION >= 110000
> +	help
> +	  A section of the ELF symbol st_other field can be given the value 1
> +	  using the directive '.localentry NAME, 1' to mean the local and global
> +	  entry points are the same, and r2 should be treated as caller-saved.
> +
> +	  Older versions of Clang and binutils do not recognise this form of the
> +	  directive and will error if it is used.
> +
>   config PPC
>   	bool
>   	default y
> @@ -248,7 +260,7 @@ config PPC
>   	select HAVE_SOFTIRQ_ON_OWN_STACK
>   	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>   	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
> -	select HAVE_STATIC_CALL			if PPC32
> +	select HAVE_STATIC_CALL			if PPC32 || (PPC64_ELF_ABI_V2 && TOOLCHAIN_SUPPORTS_LOCALENTRY1)
>   	select HAVE_SYSCALL_TRACEPOINTS
>   	select HAVE_VIRT_CPU_ACCOUNTING
>   	select HUGETLB_PAGE_SIZE_VARIABLE	if PPC_BOOK3S_64 && HUGETLB_PAGE
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 170bfa848c7c..cb4629e55e57 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -152,6 +152,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, const u32 *src);
>   bool is_conditional_branch(ppc_inst_t instr);
>   
>   #define OP_RT_RA_MASK	0xffff0000UL
> +#define OP_SI_MASK	0x0000ffffUL
>   #define LIS_R2		(PPC_RAW_LIS(_R2, 0))
>   #define ADDIS_R2_R12	(PPC_RAW_ADDIS(_R2, _R12, 0))
>   #define ADDI_R2_R2	(PPC_RAW_ADDI(_R2, _R2, 0))
> diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
> index de1018cc522b..3d6e82200cb7 100644
> --- a/arch/powerpc/include/asm/static_call.h
> +++ b/arch/powerpc/include/asm/static_call.h
> @@ -2,12 +2,75 @@
>   #ifndef _ASM_POWERPC_STATIC_CALL_H
>   #define _ASM_POWERPC_STATIC_CALL_H
>   
> +#ifdef CONFIG_PPC64_ELF_ABI_V2
> +
> +#ifdef MODULE
> +
> +#define __PPC_SCT(name, inst)					\
> +	asm(".pushsection .text, \"ax\"				\n"	\
> +	    ".align 6						\n"	\
> +	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
> +	    ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1	\n"	\
> +	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
> +	    "	mflr	11					\n"	\
> +	    "	bcl	20, 31, $+4				\n"	\
> +	    "0:	mflr	12					\n"	\
> +	    "	mtlr	11					\n"	\
> +	    "	addi	12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b)	\n"	\
> +	    "	addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha	\n"	\
> +	    "	addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l	\n"	\
> +	    "	" inst "					\n"	\
> +	    "	ld	12, (2f - " STATIC_CALL_TRAMP_STR(name) ")(12)	\n"	\
> +	    "	mtctr	12					\n"	\
> +	    "	bctr						\n"	\
> +	    "1:	li	3, 0					\n"	\
> +	    "	blr						\n"	\
> +	    ".balign 8						\n"	\
> +	    "2:	.8byte 0					\n"	\
> +	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
> +	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
> +	    ".popsection					\n")
> +
> +#else /* KERNEL */
> +
> +#define __PPC_SCT(name, inst)					\
> +	asm(".pushsection .text, \"ax\"				\n"	\
> +	    ".align 5						\n"	\
> +	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
> +	    ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1	\n"	\
> +	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
> +	    "	ld	2, 16(13)				\n"	\
> +	    "	" inst "					\n"	\
> +	    "	addis	12, 2, 2f@toc@ha			\n"	\
> +	    "	ld	12, 2f@toc@l(12)			\n"	\
> +	    "	mtctr	12					\n"	\
> +	    "	bctr						\n"	\
> +	    "1:	li	3, 0					\n"	\
> +	    "	blr						\n"	\
> +	    ".balign 8						\n"	\
> +	    "2:	.8byte 0					\n"	\
> +	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
> +	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
> +	    ".popsection					\n")
> +
> +#endif /* MODULE */
> +
> +#define PPC_SCT_INST_MODULE		28		/* Offset of instruction to update */
> +#define PPC_SCT_RET0_MODULE		44		/* Offset of label 1 */
> +#define PPC_SCT_DATA_MODULE		56		/* Offset of label 2 (aligned) */
> +
> +#define PPC_SCT_INST_KERNEL		4		/* Offset of instruction to update */
> +#define PPC_SCT_RET0_KERNEL		24		/* Offset of label 1 */
> +#define PPC_SCT_DATA_KERNEL		32		/* Offset of label 2 (aligned) */
> +
> +#elif defined(CONFIG_PPC32)
> +
>   #define __PPC_SCT(name, inst)					\
>   	asm(".pushsection .text, \"ax\"				\n"	\
>   	    ".align 5						\n"	\
>   	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
>   	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
> -	    inst "						\n"	\
> +	    "	" inst "					\n"	\
>   	    "	lis	12,2f@ha				\n"	\
>   	    "	lwz	12,2f@l(12)				\n"	\
>   	    "	mtctr	12					\n"	\
> @@ -19,11 +82,20 @@
>   	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
>   	    ".popsection					\n")
>   
> -#define PPC_SCT_RET0		20		/* Offset of label 1 */
> -#define PPC_SCT_DATA		28		/* Offset of label 2 */
> +#define PPC_SCT_INST_MODULE		0		/* Offset of instruction to update */
> +#define PPC_SCT_RET0_MODULE		20		/* Offset of label 1 */
> +#define PPC_SCT_DATA_MODULE		28		/* Offset of label 2 */
> +
> +#define PPC_SCT_INST_KERNEL		PPC_SCT_INST_MODULE
> +#define PPC_SCT_RET0_KERNEL		PPC_SCT_RET0_MODULE
> +#define PPC_SCT_DATA_KERNEL		PPC_SCT_DATA_MODULE
> +
> +#else /* !CONFIG_PPC64_ELF_ABI_V2 && !CONFIG_PPC32 */
> +#error "Unsupported ABI"
> +#endif /* CONFIG_PPC64_ELF_ABI_V2 */
>   
>   #define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)	__PPC_SCT(name, "b " #func)
>   #define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)	__PPC_SCT(name, "blr")
> -#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)	__PPC_SCT(name, "b .+20")
> +#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)	__PPC_SCT(name, "b 1f")
>   
>   #endif /* _ASM_POWERPC_STATIC_CALL_H */
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 06d2d1f78f71..a30d0d0f5499 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -128,8 +128,9 @@ extra-y				+= vmlinux.lds
>   
>   obj-$(CONFIG_RELOCATABLE)	+= reloc_$(BITS).o
>   
> -obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o static_call.o
> +obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o early_32.o
>   obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
> +obj-$(CONFIG_HAVE_STATIC_CALL)	+= static_call.o
>   obj-$(CONFIG_KGDB)		+= kgdb.o
>   obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
>   obj-$(CONFIG_SMP)		+= smp.o
> diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
> index 863a7aa24650..9211b2e189bb 100644
> --- a/arch/powerpc/kernel/static_call.c
> +++ b/arch/powerpc/kernel/static_call.c
> @@ -1,33 +1,151 @@
>   // SPDX-License-Identifier: GPL-2.0
> +#include <linux/bitops.h>
>   #include <linux/memory.h>
>   #include <linux/static_call.h>
>   
>   #include <asm/code-patching.h>
>   
> +static long sign_extend_long(unsigned long value, int index)
> +{
> +	if (sizeof(long) == 8)
> +		return sign_extend64(value, index);
> +	else
> +		return sign_extend32(value, index);
> +}
> +
> +static void *ppc_function_toc(u32 *func)
> +{
> +	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) {
> +		/* There are two common global entry sequences we handle below
> +		 *
> +		 * 1. addis r2, r12, SI1
> +		 *    addi r2, r2, SI2
> +		 *
> +		 * 2. lis r2, SI1
> +		 *    addi r2, r2, SI2
> +		 *
> +		 * Where r12 contains the global entry point address (it is otherwise
> +		 * uninitialised, so doesn't matter what value we use if this is not
> +		 * a separate global entry point).
> +		 *
> +		 * Here we simulate running the given sequence and return the result it
> +		 * would calculate. If the sequence is not recognised we return NULL.
> +		 */
> +		u32 insn1 = *func;
> +		u32 insn2 = *(func + 1);
> +		unsigned long op_regs1 = insn1 & OP_RT_RA_MASK;
> +		unsigned long op_regs2 = insn2 & OP_RT_RA_MASK;
> +		unsigned long si1 = insn1 & OP_SI_MASK;
> +		unsigned long si2 = insn2 & OP_SI_MASK;
> +		unsigned long imm1 = sign_extend_long(si1 << 16, 31);
> +		unsigned long imm2 = sign_extend_long(si2, 15);
> +		unsigned long addr = 0;
> +
> +		/* Simulate the first instruction */
> +		if (op_regs1 == ADDIS_R2_R12)
> +			addr += (unsigned long)func + imm1;
> +		else if (op_regs1 == LIS_R2)
> +			addr += imm1;
> +		else
> +			return NULL;
> +
> +		/* Simulate the second instruction */
> +		if (op_regs2 == ADDI_R2_R2)
> +			addr += imm2;
> +		else
> +			return NULL;
> +
> +		return (void *)addr;
> +	}
> +
> +	return NULL;
> +}
> +
> +static bool shares_toc(void *func1, void *func2)
> +{
> +	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)) {
> +		void *func1_toc;
> +		void *func2_toc;
> +
> +		if (func1 == NULL || func2 == NULL)
> +			return false;
> +
> +		/* Assume the kernel only uses a single TOC */
> +		if (core_kernel_text((unsigned long)func1) &&
> +		    core_kernel_text((unsigned long)func2))
> +			return true;
> +
> +		/* Fall back to calculating the TOC from common patterns
> +		 * if modules are involved
> +		 */
> +		func1_toc = ppc_function_toc(func1);
> +		func2_toc = ppc_function_toc(func2);
> +		return func1_toc != NULL && func2_toc != NULL && func1_toc == func2_toc;
> +	}
> +
> +	return true;
> +}
> +
> +static void *get_inst_addr(void *tramp)
> +{
> +	return tramp + (core_kernel_text((unsigned long)tramp)
> +				? PPC_SCT_INST_KERNEL
> +				: PPC_SCT_INST_MODULE);
> +}
> +
> +static void *get_ret0_addr(void *tramp)
> +{
> +	return tramp + (core_kernel_text((unsigned long)tramp)
> +				? PPC_SCT_RET0_KERNEL
> +				: PPC_SCT_RET0_MODULE);
> +}
> +
> +static void *get_data_addr(void *tramp)
> +{
> +	return tramp + (core_kernel_text((unsigned long) tramp)
> +				? PPC_SCT_DATA_KERNEL
> +				: PPC_SCT_DATA_MODULE);
> +}
> +
>   void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
>   {
>   	int err;
>   	bool is_ret0 = (func == __static_call_return0);
> -	unsigned long target = (unsigned long)(is_ret0 ? tramp + PPC_SCT_RET0 : func);
> -	bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
> +	bool is_short;
> +	void *target = is_ret0 ? get_ret0_addr(tramp) : func;
> +	void *tramp_inst = get_inst_addr(tramp);
>   
>   	if (!tramp)
>   		return;
>   
> +	if (is_ret0)
> +		is_short = true;
> +	else if (shares_toc(tramp, target))
> +		is_short = is_offset_in_branch_range(
> +			(long)ppc_function_entry(target) - (long)tramp_inst);
> +	else
> +		/* Combine out-of-range with not sharing a TOC. Though it's possible
> +		 * an out-of-range target shares a TOC, handling this separately
> +		 * complicates the trampoline. It's simpler to always use the global
> +		 * entry point in this case.
> +		 */
> +		is_short = false;
> +
>   	mutex_lock(&text_mutex);
>   
>   	if (func && !is_short) {
> -		err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
> +		err = patch_ulong(get_data_addr(tramp), (unsigned long)target);
>   		if (err)
>   			goto out;
>   	}
>   
>   	if (!func)
> -		err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
> +		err = patch_instruction(tramp_inst, ppc_inst(PPC_RAW_BLR()));
>   	else if (is_short)
> -		err = patch_branch(tramp, target, 0);
> +		err = patch_branch(tramp_inst, ppc_function_entry(target), 0);
>   	else
> -		err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
> +		err = patch_instruction(tramp_inst, ppc_inst(PPC_RAW_NOP()));
> +
>   out:
>   	mutex_unlock(&text_mutex);
>   


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-05 19:38   ` Christophe Leroy
@ 2022-10-06  0:39     ` Michael Ellerman
  2022-10-06  5:01       ` Benjamin Gray
  2022-10-06 18:22       ` Segher Boessenkool
  0 siblings, 2 replies; 25+ messages in thread
From: Michael Ellerman @ 2022-10-06  0:39 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, rostedt, jpoimboe, jbaron, npiggin, ardb

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 05/10/2022 à 07:32, Benjamin Gray a écrit :
>> Implement static call support for 64 bit V2 ABI. This requires making
>> sure the TOC is kept correct across kernel-module boundaries. As a
>> secondary concern, it tries to use the local entry point of a target
>> wherever possible. It does so by checking if both tramp & target are
>> kernel code, and falls back to detecting the common global entry point
>> patterns if modules are involved. Detecting the global entry point is
>> also required for setting the local entry point as the trampoline
>> target: if we cannot detect the local entry point, then we need to
>> conservatively initialise r12 and use the global entry point.
>> 
>> The trampolines are marked with `.localentry NAME, 1` to make the
>> linker save and restore the TOC on each call to the trampoline. This
>> allows the trampoline to safely target functions with different TOC
>> values.
>> 
>> However this directive also implies the TOC is not initialised on entry
>> to the trampoline. The kernel TOC is easily found in the PACA, but not
>> an arbitrary module TOC. Therefore the trampoline implementation depends
>> on whether it's in the kernel or not. If in the kernel, we initialise
>> the TOC using the PACA. If in a module, we have to initialise the TOC
>> with zero context, so it's quite expensive.
>> 
>> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
>
> This looks good to me
>
> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>
> However, thinking out loudly, I'm wondering, could we make things any 
> simpler when CONFIG_MODULES is not selected, or is that a too much 
> corner case on PPC64 ?

I'd say it's mostly a corner case.

Obviously no distros ship with modules disabled. 

AFAIK even the stripped down kernels we use in CPU bringup have modules
enabled.

So I think it's probably not worth worrying about, unless there's an
obvious and fairly simple optimisation.

cheers


* Re: [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function
  2022-10-05 17:55   ` Christophe Leroy
@ 2022-10-06  3:36     ` Benjamin Gray
  2022-10-06  9:19       ` Christophe Leroy
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Gray @ 2022-10-06  3:36 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe

On Wed, 2022-10-05 at 17:55 +0000, Christophe Leroy wrote:
> I'm on a business trip this week so I can't test it on hardware, but
> the 
> generated code looks horrid and sub-optimal, with a stack frame and
> so 
> many registers saved into it. That's mpc885_ads_defconfig built with
> GCC 
> 12, without modules without stackprotector with 4k pages.

Yeah, that definitely wasn't supposed to happen. I was looking at the
32 and 64 bit machine code closely while actively writing it, but I
must have refactored it badly when cleaning up for submission. It was
supposed to automatically be inlined, leaving it identical to the
original on 32-bit.

Given inlining is desirable here, and 64-bit inlines anyway, I think I
should just mark __patch_memory with __always_inline.


* Re: [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker
  2022-10-05 19:18   ` Christophe Leroy
@ 2022-10-06  3:51     ` Andrew Donnellan
  2022-10-06  4:39     ` Benjamin Gray
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Donnellan @ 2022-10-06  3:51 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Gray, linuxppc-dev
  Cc: peterz, rostedt, ardb, jbaron, npiggin, jpoimboe

On Wed, 2022-10-05 at 19:18 +0000, Christophe Leroy wrote:
> 
> 
> Le 05/10/2022 à 07:32, Benjamin Gray a écrit :
> > The callee may set a field in st_other to 1 to indicate r2 should
> > be
> > treated as caller-saved. This means a trampoline must be used to
> > save
> > the current TOC before calling it and restore it afterwards, much
> > like
> > external calls.
> 
> The 'callee', what is that here in that context ?
> 
> Don't you mean the 'linker' instead ?

Should probably explain explicitly that this is the .localentry NAME, 1
directive.

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited



* Re: [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker
  2022-10-05 19:18   ` Christophe Leroy
  2022-10-06  3:51     ` Andrew Donnellan
@ 2022-10-06  4:39     ` Benjamin Gray
  1 sibling, 0 replies; 25+ messages in thread
From: Benjamin Gray @ 2022-10-06  4:39 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe

On Wed, 2022-10-05 at 19:18 +0000, Christophe Leroy wrote:
> 
> 
> Le 05/10/2022 à 07:32, Benjamin Gray a écrit :
> > The callee may set a field in st_other to 1 to indicate r2 should
> > be
> > treated as caller-saved. This means a trampoline must be used to
> > save
> > the current TOC before calling it and restore it afterwards, much
> > like
> > external calls.
> 
> The 'callee', what is that here in that context ?
> 
> Don't you mean the 'linker' instead ?

Callee in the sense of the function body, as opposed to the call-site
that the R_PPC64_REL24, etc., relocations apply to (which I was thinking
of as the caller). I can see how the comment needs more context though.


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-06  0:39     ` Michael Ellerman
@ 2022-10-06  5:01       ` Benjamin Gray
  2022-10-06 18:22       ` Segher Boessenkool
  1 sibling, 0 replies; 25+ messages in thread
From: Benjamin Gray @ 2022-10-06  5:01 UTC (permalink / raw)
  To: Michael Ellerman, Christophe Leroy, linuxppc-dev
  Cc: ajd, peterz, rostedt, jpoimboe, jbaron, npiggin, ardb

On Thu, 2022-10-06 at 11:39 +1100, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > However, thinking out loudly, I'm wondering, could we make things
> > any 
> > simpler when CONFIG_MODULES is not selected, or is that a too much 
> > corner case on PPC64 ?
> 
> I'd say it's mostly a corner case.
> 
> Obviously no distros ship with modules disabled. 
> 
> AFAIK even the stripped down kernels we use in CPU bringup have
> modules
> enabled.
> 
> So I think it's probably not worth worrying about, unless there's an
> obvious and fairly simple optimisation.
> 
> cheers

Yeah, I think supporting this case would amount to a whole new
trampoline implementation. Something like the original RFC
implementation would be best here as there would only be one TOC to
worry about.

Instead, I expect this can all be optimised better once we can apply
patches to call-sites. If so, we can patch the call-site NOPs ourselves
without marking the trampoline as caller-saved TOC, which would remove
the suboptimal save-r2 trampolines. Then we could use separate local &
global entry points like the RFC. This would unify the kernel/module
trampolines again.

The benefit of this series is that it works with just what the ABI
offers, without extra kernel tools / linker magic. But I would
definitely revisit it once call-site patching is possible, especially
when working on inline static calls.


* Re: [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  2022-10-05  5:32 ` [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub Benjamin Gray
  2022-10-05 19:21   ` Christophe Leroy
@ 2022-10-06  8:24   ` Andrew Donnellan
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Donnellan @ 2022-10-06  8:24 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: peterz, npiggin, ardb, jbaron, rostedt, jpoimboe

On Wed, 2022-10-05 at 16:32 +1100, Benjamin Gray wrote:
> Inserts a direct branch to the stub target when possible, replacing
> the
> mtctr/btctr sequence.
> 
> The load into r12 could potentially be skipped too, but that change
> would need to refactor the arguments to indicate that the address
> does not have a separate local entry point.
> 
> This helps the static call implementation, where modules calling
> their
> own trampolines are called through this stub and the trampoline is
> easily within range of a direct branch.
> 
> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>

I'm not well versed in this code but nothing stands out as problematic
and it makes sense.

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

> ---
>  arch/powerpc/kernel/module_64.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 4d816f7785b4..13ce7a4d8a8d 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = {
>         PPC_RAW_BCTR(),
>  };
>  
> +#ifdef CONFIG_PPC64_ELF_ABI_V1
> +#define PPC64_STUB_MTCTR_OFFSET 5
> +#else
> +#define PPC64_STUB_MTCTR_OFFSET 4
> +#endif
> +
>  /* Count how many different 24-bit relocations (different symbol,
>     different addend) */
>  static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num)
> @@ -426,6 +432,7 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>                               struct module *me,
>                               const char *name)
>  {
> +       int err;
>         long reladdr;
>         func_desc_t desc;
>         int i;
> @@ -439,6 +446,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>                         return 0;
>         }
>  
> +       /* Replace indirect branch sequence with direct branch where possible */
> +       err = patch_branch(&entry->jump[PPC64_STUB_MTCTR_OFFSET], addr, 0);
> +       if (err && err != -ERANGE)
> +               return 0;
> +
>         /* Stub uses address relative to r2. */
>         reladdr = (unsigned long)entry - my_r2(sechdrs, me);
>         if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited



* Re: [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function
  2022-10-06  3:36     ` Benjamin Gray
@ 2022-10-06  9:19       ` Christophe Leroy
  2022-10-06 21:53         ` Benjamin Gray
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2022-10-06  9:19 UTC (permalink / raw)
  To: Benjamin Gray, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe



Le 06/10/2022 à 05:36, Benjamin Gray a écrit :
> On Wed, 2022-10-05 at 17:55 +0000, Christophe Leroy wrote:
>> I'm on business trip this week so I can't test it on hardware, but
>> the
>> generated code looks horrid and sub-optimal, with a stack frame and
>> so
>> many registers saved into it. That's mpc885_ads_defconfig built with
>> GCC
>> 12, without modules without stackprotector with 4k pages.
> 
> Yeah, that definitely wasn't supposed to happen. I was looking at the
> 32 and 64 bit machine code closely while actively writing it, but I
> must have refactored it badly when cleaning up for submission. It was
> supposed to automatically be inlined, leaving it identical to the
> original on 32-bit.
> 
> Given inlining is desirable here, and 64-bit inlines anyway, I think I
> should just mark __patch_memory with __always_inline.

FWIW, I get a far better result with:

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ba00c550d9d2..447b8de6e427 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -21,7 +21,7 @@ static int __patch_memory(void *patch_addr, unsigned long val, void *exec_addr,
 	/* Prefixed instruction may cross cacheline if cacheline smaller than 64 bytes */
 	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64);
 
-	if (unlikely(is_dword))
+	if (IS_ENABLED(CONFIG_PPC64) && unlikely(is_dword))
 		__put_kernel_nofault(patch_addr, &val, u64, failed);
 	else
 		__put_kernel_nofault(patch_addr, &val, u32, failed);


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-06  0:39     ` Michael Ellerman
  2022-10-06  5:01       ` Benjamin Gray
@ 2022-10-06 18:22       ` Segher Boessenkool
  2022-10-06 18:38         ` Christophe Leroy
  1 sibling, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2022-10-06 18:22 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: ajd, peterz, npiggin, rostedt, ardb, jbaron, Benjamin Gray,
	linuxppc-dev, jpoimboe

On Thu, Oct 06, 2022 at 11:39:50AM +1100, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > However, thinking out loudly, I'm wondering, could we make things any 
> > simpler when CONFIG_MODULES is not selected, or is that a too much 
> > corner case on PPC64 ?
> 
> I'd say it's mostly a corner case.
> 
> Obviously no distros ship with modules disabled. 
> 
> AFAIK even the stripped down kernels we use in CPU bringup have modules
> enabled.
> 
> So I think it's probably not worth worrying about, unless there's an
> obvious and fairly simple optimisation.

Long ago I built kernels that fit together with the boot firmware and a
root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
close to that at all these days :-)

What is the overhead if you enable modules but do not use them, these
days?


Segher


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-06 18:22       ` Segher Boessenkool
@ 2022-10-06 18:38         ` Christophe Leroy
  2022-10-06 20:45           ` Segher Boessenkool
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2022-10-06 18:38 UTC (permalink / raw)
  To: Segher Boessenkool, Michael Ellerman
  Cc: ajd, peterz, rostedt, ardb, jbaron, npiggin, Benjamin Gray,
	linuxppc-dev, jpoimboe



Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
> On Thu, Oct 06, 2022 at 11:39:50AM +1100, Michael Ellerman wrote:
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> However, thinking out loudly, I'm wondering, could we make things any
>>> simpler when CONFIG_MODULES is not selected, or is that a too much
>>> corner case on PPC64 ?
>>
>> I'd say it's mostly a corner case.
>>
>> Obviously no distros ship with modules disabled.
>>
>> AFAIK even the stripped down kernels we use in CPU bringup have modules
>> enabled.
>>
>> So I think it's probably not worth worrying about, unless there's an
>> obvious and fairly simple optimisation.
> 
> Long ago I built kernels that fit together with the boot firmware and a
> root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
> close to that at all these days :-)

4MB, not easy. But 8M is still achievable. Well, our smaller board has 
32M; we have thousands of them spread all over Europe and have to keep 
them up to date ....

> 
> What is the overhead if you enable modules but do not use them, these
> days?
> 

On the 8xx it is mainly the instruction TLB miss handler:

#ifdef CONFIG_MODULES
	mfcr	r11
	not.	r10, r10
#endif
	mfspr	r10, SPRN_M_TWB	/* Get level 1 table */
#ifdef CONFIG_MODULES
	blt+	3f
	rlwinm	r10, r10, 0, 20, 31
	oris	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
3:
	mtcr	r11
#endif


There are also some patches with an interesting impact, like commit 
cb3ac45214c0 ("powerpc/code-patching: Don't call 
is_vmalloc_or_module_addr() without CONFIG_MODULES").


On the other hand, if we want one day to replace nftables by BPF-jitted 
iptables, CONFIG_MODULES will be required. What the trade-off will be, I 
don't know yet, because BPF is not yet cross-compile friendly.

Christophe


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-06 18:38         ` Christophe Leroy
@ 2022-10-06 20:45           ` Segher Boessenkool
  2022-10-06 20:50             ` Christophe Leroy
  0 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2022-10-06 20:45 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: ajd, peterz, rostedt, ardb, jbaron, npiggin, Benjamin Gray,
	linuxppc-dev, jpoimboe

On Thu, Oct 06, 2022 at 06:38:00PM +0000, Christophe Leroy wrote:
> Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
> > Long ago I built kernels that fit together with the boot firmware and a
> > root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
> > close to that at all these days :-)
> 
> 4MB, not easy. But 8M is still achievable. Well, our smaller board has 
> 32M; we have thousands of them spread all over Europe and have to keep 
> them up to date ....

The smallest of these systems had 256MB RAM.  This 4MB is flash ROM :-)

> > What is the overhead if you enable modules but do not use them, these
> > days?
> 
> On the 8xx it is mainly the instruction TLB miss handler:

I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
or something like that :-)  And, on 64 bit, which is what the question
was about!


Segher


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-06 20:45           ` Segher Boessenkool
@ 2022-10-06 20:50             ` Christophe Leroy
  2022-10-06 21:04               ` Segher Boessenkool
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Leroy @ 2022-10-06 20:50 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: ajd, peterz, rostedt, ardb, jbaron, npiggin, Benjamin Gray,
	linuxppc-dev, jpoimboe



Le 06/10/2022 à 22:45, Segher Boessenkool a écrit :
> On Thu, Oct 06, 2022 at 06:38:00PM +0000, Christophe Leroy wrote:
>> Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
>>> Long ago I built kernels that fit together with the boot firmware and a
>>> root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
>>> close to that at all these days :-)
>>
>> 4MB, not easy. But 8M is still achievable. Well, our smaller board has
>> 32M; we have thousands of them spread all over Europe and have to keep
>> them up to date....
> 
> The smallest of these systems had 256MB RAM.  This 4MB is flash ROM :-)

I fit U-Boot + DTB + kernel + initramfs with klibc and mtdutils in a 2MB 
flash ROM.

> 
>>> What is the overhead if you enable modules but do not use them, these
>>> days?
>>
>> On the 8xx it is mainly the instruction TLB miss handler:
> 
> I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
> or something like that :-)  And, on 64 bit, which is what the question
> was about!

Ah, does the size really matter here? I was thinking more in terms of 
performance when I made the comment.

Christophe


* Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls
  2022-10-06 20:50             ` Christophe Leroy
@ 2022-10-06 21:04               ` Segher Boessenkool
  0 siblings, 0 replies; 25+ messages in thread
From: Segher Boessenkool @ 2022-10-06 21:04 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: ajd, peterz, rostedt, ardb, jbaron, npiggin, Benjamin Gray,
	linuxppc-dev, jpoimboe

On Thu, Oct 06, 2022 at 08:50:31PM +0000, Christophe Leroy wrote:
> Le 06/10/2022 à 22:45, Segher Boessenkool a écrit :
> > I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
> > or something like that :-)  And, on 64 bit, which is what the question
> > was about!
> 
> Ah, does the size really matter here? I was thinking more in terms of 
> performance when I made the comment.

Other than some unused code there should not be much performance impact
at all from enabling modules when not needed, on 64 bit.  Security (in
depth) is a very different kettle of fish of course ;-)


Segher


* Re: [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function
  2022-10-06  9:19       ` Christophe Leroy
@ 2022-10-06 21:53         ` Benjamin Gray
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Gray @ 2022-10-06 21:53 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
  Cc: ajd, peterz, npiggin, ardb, jbaron, rostedt, jpoimboe

On Thu, 2022-10-06 at 09:19 +0000, Christophe Leroy wrote:
> 
> 
> Le 06/10/2022 à 05:36, Benjamin Gray a écrit :
> > On Wed, 2022-10-05 at 17:55 +0000, Christophe Leroy wrote:
> > > I'm on a business trip this week so I can't test it on hardware,
> > > but the generated code looks horrid and sub-optimal, with a stack
> > > frame and so many registers saved into it. That's
> > > mpc885_ads_defconfig built with GCC 12, without modules, without
> > > stackprotector, with 4k pages.
> > 
> > Yeah, that definitely wasn't supposed to happen. I was looking at the
> > 32- and 64-bit machine code closely while actively writing it, but I
> > must have refactored it badly when cleaning up for submission. It was
> > supposed to automatically be inlined, leaving it identical to the
> > original on 32-bit.
> > 
> > Given inlining is desirable here, and 64-bit inlines anyway, I think I
> > should just mark __patch_memory with __always_inline.
> 
> FWIW, I get a far better result with :
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index ba00c550d9d2..447b8de6e427 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -21,7 +21,7 @@ static int __patch_memory(void *patch_addr, unsigned long val, void *exec_addr,
>  	/* Prefixed instruction may cross cacheline if cacheline smaller than 64 bytes */
>  	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64);
>  
> -	if (unlikely(is_dword))
> +	if (IS_ENABLED(CONFIG_PPC64) && unlikely(is_dword))
>  		__put_kernel_nofault(patch_addr, &val, u64, failed);
>  	else
>  		__put_kernel_nofault(patch_addr, &val, u32, failed);

That's very interesting, as that's what I had originally. I removed the
IS_ENABLED(CONFIG_PPC64) part of the if condition as part of submission
cleanup refactoring, because it should be redundant after constant
propagation. The weird thing is that GCC constant-propagates either way;
it's just changing its mind about inlining.

I can add it back, but the optimisation here seems very fragile. It
feels more reliable just to specify it explicitly.


end of thread, other threads:[~2022-10-06 21:55 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-05  5:32 [PATCH v3 0/6] Out-of-line static calls for powerpc64 ELF V2 Benjamin Gray
2022-10-05  5:32 ` [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function Benjamin Gray
2022-10-05 17:55   ` Christophe Leroy
2022-10-06  3:36     ` Benjamin Gray
2022-10-06  9:19       ` Christophe Leroy
2022-10-06 21:53         ` Benjamin Gray
2022-10-05  5:32 ` [PATCH v3 2/6] powerpc/module: Handle caller-saved TOC in module linker Benjamin Gray
2022-10-05 19:18   ` Christophe Leroy
2022-10-06  3:51     ` Andrew Donnellan
2022-10-06  4:39     ` Benjamin Gray
2022-10-05  5:32 ` [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub Benjamin Gray
2022-10-05 19:21   ` Christophe Leroy
2022-10-06  8:24   ` Andrew Donnellan
2022-10-05  5:32 ` [PATCH v3 4/6] static_call: Move static call selftest to static_call_selftest.c Benjamin Gray
2022-10-05 19:22   ` Christophe Leroy
2022-10-05  5:32 ` [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls Benjamin Gray
2022-10-05 19:38   ` Christophe Leroy
2022-10-06  0:39     ` Michael Ellerman
2022-10-06  5:01       ` Benjamin Gray
2022-10-06 18:22       ` Segher Boessenkool
2022-10-06 18:38         ` Christophe Leroy
2022-10-06 20:45           ` Segher Boessenkool
2022-10-06 20:50             ` Christophe Leroy
2022-10-06 21:04               ` Segher Boessenkool
2022-10-05  5:32 ` [PATCH v3 6/6] powerpc: Add tests " Benjamin Gray
