* [PATCH v2 0/7] Implement inline static calls on PPC32 - v2
@ 2022-07-08 17:31 ` Christophe Leroy
0 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
This series applies on top of v3 of the series "objtool: Enable and
implement --mcount option on powerpc" [1], rebased on the powerpc-next
branch.
A few modifications are made to core parts to enable the powerpc
implementation:
- R_X86_64_PC32 is abstracted to R_REL32 so that it can then be
redefined as R_PPC_REL32.
- A call to static_call_init() is added to start_kernel() so that each
architecture does not have to call it itself.
- The trampoline address is provided to arch_static_call_transform()
even when setting a site, so that the site can fall back to a call to
the trampoline when the target is out of range.
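As a rough illustration of the last point: a PPC32 direct branch (`bl`)
encodes a signed 26-bit byte offset, so a call site can only reach targets
within about +/-32 MB; anything farther must go through the out-of-line
trampoline. A minimal sketch of the range check, modeled on the kernel's
is_offset_in_branch_range() helper (illustrative, not the series' exact
code):

```c
#include <stdbool.h>
#include <stdint.h>

/* A ppc32 "bl" stores a signed 26-bit byte offset (the 24-bit LI field
 * shifted left by 2), so direct calls reach roughly +/-32 MB and the
 * offset must be word aligned. */
static bool offset_in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}
```

When the check fails for a static call site, the arch code needs the
trampoline address to emit a reachable `bl trampoline` instead, which is
why it is now always passed to arch_static_call_transform().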
[1] https://lore.kernel.org/lkml/70b6d08d-aced-7f4e-b958-a3c7ae1a9319@csgroup.eu/T/#rb3a073c54aba563a135fba891e0c34c46e47beef
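The R_REL32 abstraction mentioned above can be pictured as each objtool
arch header mapping the generic name onto its native 32-bit PC-relative
relocation type, so common objtool code stays arch-neutral. A hedged
sketch (the constants are the standard ELF values; only the R_REL32 macro
name is from the series, the scaffolding is illustrative):

```c
/* Standard ELF relocation numbers for 32-bit PC-relative references */
#define R_X86_64_PC32 2
#define R_PPC_REL32   26

/* Each tools/objtool/arch/<arch>/include/arch/elf.h supplies one of
 * these, so common objtool code can simply test reloc->type == R_REL32
 * instead of hardcoding the x86 type. */
#ifdef __powerpc__
#define R_REL32 R_PPC_REL32
#else
#define R_REL32 R_X86_64_PC32
#endif
```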
Christophe Leroy (7):
powerpc: Add missing asm/asm.h for objtool
objtool/powerpc: Activate objtool on PPC32
objtool: Add architecture specific R_REL32 macro
objtool/powerpc: Add necessary support for inline static calls
init: Call static_call_init() from start_kernel()
static_call_inline: Provide trampoline address when updating sites
powerpc/static_call: Implement inline static calls
arch/powerpc/Kconfig | 3 +-
arch/powerpc/include/asm/asm.h | 7 +++
arch/powerpc/include/asm/static_call.h | 2 +
arch/powerpc/kernel/cpu_setup_6xx.S | 26 ++++++---
arch/powerpc/kernel/cpu_setup_fsl_booke.S | 8 ++-
arch/powerpc/kernel/entry_32.S | 8 ++-
arch/powerpc/kernel/head_40x.S | 5 +-
arch/powerpc/kernel/head_8xx.S | 5 +-
arch/powerpc/kernel/head_book3s_32.S | 29 +++++++---
arch/powerpc/kernel/head_fsl_booke.S | 5 +-
arch/powerpc/kernel/static_call.c | 56 ++++++++++++++-----
arch/powerpc/kernel/swsusp_32.S | 5 +-
arch/powerpc/kvm/fpu.S | 17 ++++--
arch/powerpc/platforms/52xx/lite5200_sleep.S | 15 +++--
arch/x86/kernel/static_call.c | 2 +-
init/main.c | 1 +
kernel/static_call_inline.c | 2 +-
tools/objtool/arch/powerpc/decode.c | 16 ++++--
tools/objtool/arch/powerpc/include/arch/elf.h | 1 +
tools/objtool/arch/x86/include/arch/elf.h | 1 +
tools/objtool/check.c | 10 ++--
tools/objtool/orc_gen.c | 2 +-
22 files changed, 162 insertions(+), 64 deletions(-)
create mode 100644 arch/powerpc/include/asm/asm.h
--
2.36.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v2 1/7] powerpc: Add missing asm/asm.h for objtool
@ 2022-07-08 17:31 ` Christophe Leroy
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
Since commit e2ef115813c3 ("objtool: Fix STACK_FRAME_NON_STANDARD
reloc type"), powerpc needs asm/asm.h to enable objtool.
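For context, the asm form of the STACK_FRAME_NON_STANDARD annotation
pastes _ASM_PTR into an inline-asm string to emit a pointer-sized
reference in a .discard section that objtool reads; on 32-bit powerpc a
pointer is 4 bytes, hence ".long". A simplified, illustrative sketch
(macro body abbreviated, not the exact objtool.h code):

```c
/* What the new powerpc header provides on 32-bit kernels */
#define _ASM_PTR " .long "

/* Simplified shape of the annotation: emit a pointer-sized reference
 * to the named function in a section objtool inspects and the linker
 * discards. */
#define FRAME_ANNOTATION(sym) \
	".pushsection .discard.func_stack_frame_non_standard, \"aw\"\n" \
	_ASM_PTR sym "\n" \
	".popsection\n"
```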
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/asm.h | 7 +++++++
1 file changed, 7 insertions(+)
create mode 100644 arch/powerpc/include/asm/asm.h
diff --git a/arch/powerpc/include/asm/asm.h b/arch/powerpc/include/asm/asm.h
new file mode 100644
index 000000000000..86f46b604e9a
--- /dev/null
+++ b/arch/powerpc/include/asm/asm.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_ASM_H
+#define _ASM_POWERPC_ASM_H
+
+#define _ASM_PTR " .long "
+
+#endif /* _ASM_POWERPC_ASM_H */
--
2.36.1
* [PATCH v2 2/7] objtool/powerpc: Activate objtool on PPC32
@ 2022-07-08 17:31 ` Christophe Leroy
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
Fix several annotations in assembly files and enable objtool on PPC32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/Kconfig | 2 +-
arch/powerpc/kernel/cpu_setup_6xx.S | 26 ++++++++++++------
arch/powerpc/kernel/cpu_setup_fsl_booke.S | 8 ++++--
arch/powerpc/kernel/entry_32.S | 8 ++++--
arch/powerpc/kernel/head_40x.S | 5 +++-
arch/powerpc/kernel/head_8xx.S | 5 +++-
arch/powerpc/kernel/head_book3s_32.S | 29 ++++++++++++++------
arch/powerpc/kernel/head_fsl_booke.S | 5 +++-
arch/powerpc/kernel/swsusp_32.S | 5 +++-
arch/powerpc/kvm/fpu.S | 17 ++++++++----
arch/powerpc/platforms/52xx/lite5200_sleep.S | 15 +++++++---
11 files changed, 90 insertions(+), 35 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 96263d78aec9..00a43eb26418 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -237,7 +237,7 @@ config PPC
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S)
select HAVE_OPTPROBES
- select HAVE_OBJTOOL if PPC64
+ select HAVE_OBJTOOL
select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL
select HAVE_PERF_EVENTS
select HAVE_PERF_EVENTS_NMI if PPC64
diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S
index f8b5ff64b604..f29ce3dd6140 100644
--- a/arch/powerpc/kernel/cpu_setup_6xx.S
+++ b/arch/powerpc/kernel/cpu_setup_6xx.S
@@ -4,6 +4,8 @@
* Copyright (C) 2003 Benjamin Herrenschmidt (benh@kernel.crashing.org)
*/
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/cputable.h>
@@ -81,7 +83,7 @@ _GLOBAL(__setup_cpu_745x)
blr
/* Enable caches for 603's, 604, 750 & 7400 */
-setup_common_caches:
+SYM_FUNC_START_LOCAL(setup_common_caches)
mfspr r11,SPRN_HID0
andi. r0,r11,HID0_DCE
ori r11,r11,HID0_ICE|HID0_DCE
@@ -95,11 +97,12 @@ setup_common_caches:
sync
isync
blr
+SYM_FUNC_END(setup_common_caches)
/* 604, 604e, 604ev, ...
* Enable superscalar execution & branch history table
*/
-setup_604_hid0:
+SYM_FUNC_START_LOCAL(setup_604_hid0)
mfspr r11,SPRN_HID0
ori r11,r11,HID0_SIED|HID0_BHTE
ori r8,r11,HID0_BTCD
@@ -110,6 +113,7 @@ setup_604_hid0:
sync
isync
blr
+SYM_FUNC_END(setup_604_hid0)
/* 7400 <= rev 2.7 and 7410 rev = 1.0 suffer from some
* erratas we work around here.
@@ -125,13 +129,14 @@ setup_604_hid0:
* needed once we have applied workaround #5 (though it's
* not set by Apple's firmware at least).
*/
-setup_7400_workarounds:
+SYM_FUNC_START_LOCAL(setup_7400_workarounds)
mfpvr r3
rlwinm r3,r3,0,20,31
cmpwi 0,r3,0x0207
ble 1f
blr
-setup_7410_workarounds:
+SYM_FUNC_END(setup_7400_workarounds)
+SYM_FUNC_START_LOCAL(setup_7410_workarounds)
mfpvr r3
rlwinm r3,r3,0,20,31
cmpwi 0,r3,0x0100
@@ -151,6 +156,7 @@ setup_7410_workarounds:
sync
isync
blr
+SYM_FUNC_END(setup_7410_workarounds)
/* 740/750/7400/7410
* Enable Store Gathering (SGE), Address Broadcast (ABE),
@@ -158,7 +164,7 @@ setup_7410_workarounds:
* Dynamic Power Management (DPM), Speculative (SPD)
* Clear Instruction cache throttling (ICTC)
*/
-setup_750_7400_hid0:
+SYM_FUNC_START_LOCAL(setup_750_7400_hid0)
mfspr r11,SPRN_HID0
ori r11,r11,HID0_SGE | HID0_ABE | HID0_BHTE | HID0_BTIC
oris r11,r11,HID0_DPM@h
@@ -177,12 +183,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_NO_DPM)
sync
isync
blr
+SYM_FUNC_END(setup_750_7400_hid0)
/* 750cx specific
* Looks like we have to disable NAP feature for some PLL settings...
* (waiting for confirmation)
*/
-setup_750cx:
+SYM_FUNC_START_LOCAL(setup_750cx)
mfspr r10, SPRN_HID1
rlwinm r10,r10,4,28,31
cmpwi cr0,r10,7
@@ -196,11 +203,13 @@ setup_750cx:
andc r6,r6,r7
stw r6,CPU_SPEC_FEATURES(r4)
blr
+SYM_FUNC_END(setup_750cx)
/* 750fx specific
*/
-setup_750fx:
+SYM_FUNC_START_LOCAL(setup_750fx)
blr
+SYM_FUNC_END(setup_750fx)
/* MPC 745x
* Enable Store Gathering (SGE), Branch Folding (FOLD)
@@ -212,7 +221,7 @@ setup_750fx:
* Clear Instruction cache throttling (ICTC)
* Enable L2 HW prefetch
*/
-setup_745x_specifics:
+SYM_FUNC_START_LOCAL(setup_745x_specifics)
/* We check for the presence of an L3 cache setup by
* the firmware. If any, we disable NAP capability as
* it's known to be bogus on rev 2.1 and earlier
@@ -270,6 +279,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NO_DPM)
sync
isync
blr
+SYM_FUNC_END(setup_745x_specifics)
/*
* Initialize the FPU registers. This is needed to work around an errata
diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 4bf33f1b4193..f573a4f3bbe6 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -8,6 +8,8 @@
* Benjamin Herrenschmidt <benh@kernel.crashing.org>
*/
+#include <linux/linkage.h>
+
#include <asm/page.h>
#include <asm/processor.h>
#include <asm/cputable.h>
@@ -274,7 +276,7 @@ _GLOBAL(flush_dcache_L1)
blr
-has_L2_cache:
+SYM_FUNC_START_LOCAL(has_L2_cache)
/* skip L2 cache on P2040/P2040E as they have no L2 cache */
mfspr r3, SPRN_SVR
/* shift right by 8 bits and clear E bit of SVR */
@@ -290,9 +292,10 @@ has_L2_cache:
1:
li r3, 0
blr
+SYM_FUNC_END(has_L2_cache)
/* flush backside L2 cache */
-flush_backside_L2_cache:
+SYM_FUNC_START_LOCAL(flush_backside_L2_cache)
mflr r10
bl has_L2_cache
mtlr r10
@@ -313,6 +316,7 @@ flush_backside_L2_cache:
bne 1b
2:
blr
+SYM_FUNC_END(flush_backside_L2_cache)
_GLOBAL(cpu_down_flush_e500v2)
mflr r0
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1d599df6f169..f47b682d4667 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -18,6 +18,8 @@
#include <linux/err.h>
#include <linux/sys.h>
#include <linux/threads.h>
+#include <linux/linkage.h>
+
#include <asm/reg.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -74,17 +76,19 @@ _ASM_NOKPROBE_SYMBOL(prepare_transfer_to_handler)
#endif /* CONFIG_PPC_BOOK3S_32 || CONFIG_E500 */
#if defined(CONFIG_PPC_KUEP) && defined(CONFIG_PPC_BOOK3S_32)
- .globl __kuep_lock
+SYM_FUNC_START(__kuep_lock)
__kuep_lock:
lwz r9, THREAD+THSR0(r2)
update_user_segments_by_4 r9, r10, r11, r12
blr
+SYM_FUNC_END(__kuep_lock)
-__kuep_unlock:
+SYM_FUNC_START_LOCAL(__kuep_unlock)
lwz r9, THREAD+THSR0(r2)
rlwinm r9,r9,0,~SR_NX
update_user_segments_by_4 r9, r10, r11, r12
blr
+SYM_FUNC_END(__kuep_unlock)
.macro kuep_lock
bl __kuep_lock
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 088f500896c7..9110fe9d6747 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -28,6 +28,8 @@
#include <linux/init.h>
#include <linux/pgtable.h>
#include <linux/sizes.h>
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -662,7 +664,7 @@ start_here:
* kernel initialization. This maps the first 32 MBytes of memory 1:1
* virtual to physical and more importantly sets the cache mode.
*/
-initial_mmu:
+SYM_FUNC_START_LOCAL(initial_mmu)
tlbia /* Invalidate all TLB entries */
isync
@@ -711,6 +713,7 @@ initial_mmu:
mtspr SPRN_EVPR,r0
blr
+SYM_FUNC_END(initial_mmu)
_GLOBAL(abort)
mfspr r13,SPRN_DBCR0
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 0b05f2be66b9..c94ed5a08c93 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -18,6 +18,8 @@
#include <linux/magic.h>
#include <linux/pgtable.h>
#include <linux/sizes.h>
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -625,7 +627,7 @@ start_here:
* 24 Mbytes of data, and the 512k IMMR space. Anything not covered by
* these mappings is mapped by page tables.
*/
-initial_mmu:
+SYM_FUNC_START_LOCAL(initial_mmu)
li r8, 0
mtspr SPRN_MI_CTR, r8 /* remove PINNED ITLB entries */
lis r10, MD_TWAM@h
@@ -686,6 +688,7 @@ initial_mmu:
#endif
mtspr SPRN_DER, r8
blr
+SYM_FUNC_END(initial_mmu)
_GLOBAL(mmu_pin_tlb)
lis r9, (1f - PAGE_OFFSET)@h
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 6c739beb938c..c0e0868ba01a 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -18,6 +18,8 @@
#include <linux/init.h>
#include <linux/pgtable.h>
+#include <linux/linkage.h>
+
#include <asm/reg.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -877,7 +879,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
* Load stuff into the MMU. Intended to be called with
* IR=0 and DR=0.
*/
-early_hash_table:
+SYM_FUNC_START_LOCAL(early_hash_table)
sync /* Force all PTE updates to finish */
isync
tlbia /* Clear all TLB entries */
@@ -888,8 +890,9 @@ early_hash_table:
ori r6, r6, 3 /* 256kB table */
mtspr SPRN_SDR1, r6
blr
+SYM_FUNC_END(early_hash_table)
-load_up_mmu:
+SYM_FUNC_START_LOCAL(load_up_mmu)
sync /* Force all PTE updates to finish */
isync
tlbia /* Clear all TLB entries */
@@ -918,6 +921,7 @@ BEGIN_MMU_FTR_SECTION
LOAD_BAT(7,r3,r4,r5)
END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
blr
+SYM_FUNC_END(load_up_mmu)
_GLOBAL(load_segment_registers)
li r0, NUM_USER_SEGMENTS /* load up user segment register values */
@@ -1028,7 +1032,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_HPTE_TABLE)
* this makes sure it's done.
* -- Cort
*/
-clear_bats:
+SYM_FUNC_START_LOCAL(clear_bats)
li r10,0
mtspr SPRN_DBAT0U,r10
@@ -1072,6 +1076,7 @@ BEGIN_MMU_FTR_SECTION
mtspr SPRN_IBAT7L,r10
END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
blr
+SYM_FUNC_END(clear_bats)
_GLOBAL(update_bats)
lis r4, 1f@h
@@ -1108,15 +1113,16 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
mtspr SPRN_SRR1, r6
rfi
-flush_tlbs:
+SYM_FUNC_START_LOCAL(flush_tlbs)
lis r10, 0x40
1: addic. r10, r10, -0x1000
tlbie r10
bgt 1b
sync
blr
+SYM_FUNC_END(flush_tlbs)
-mmu_off:
+SYM_FUNC_START_LOCAL(mmu_off)
addi r4, r3, __after_mmu_off - _start
mfmsr r3
andi. r0,r3,MSR_DR|MSR_IR /* MMU enabled? */
@@ -1128,9 +1134,10 @@ mmu_off:
mtspr SPRN_SRR1,r3
sync
rfi
+SYM_FUNC_END(mmu_off)
/* We use one BAT to map up to 256M of RAM at _PAGE_OFFSET */
-initial_bats:
+SYM_FUNC_START_LOCAL(initial_bats)
lis r11,PAGE_OFFSET@h
tophys(r8,r11)
#ifdef CONFIG_SMP
@@ -1146,9 +1153,10 @@ initial_bats:
mtspr SPRN_IBAT0U,r11
isync
blr
+SYM_FUNC_END(initial_bats)
#ifdef CONFIG_BOOTX_TEXT
-setup_disp_bat:
+SYM_FUNC_START_LOCAL(setup_disp_bat)
/*
* setup the display bat prepared for us in prom.c
*/
@@ -1164,10 +1172,11 @@ setup_disp_bat:
mtspr SPRN_DBAT3L,r8
mtspr SPRN_DBAT3U,r11
blr
+SYM_FUNC_END(setup_disp_bat)
#endif /* CONFIG_BOOTX_TEXT */
#ifdef CONFIG_PPC_EARLY_DEBUG_CPM
-setup_cpm_bat:
+SYM_FUNC_START_LOCAL(setup_cpm_bat)
lis r8, 0xf000
ori r8, r8, 0x002a
mtspr SPRN_DBAT1L, r8
@@ -1177,10 +1186,11 @@ setup_cpm_bat:
mtspr SPRN_DBAT1U, r11
blr
+SYM_FUNC_END(setup_cpm_bat)
#endif
#ifdef CONFIG_PPC_EARLY_DEBUG_USBGECKO
-setup_usbgecko_bat:
+SYM_FUNC_START_LOCAL(setup_usbgecko_bat)
/* prepare a BAT for early io */
#if defined(CONFIG_GAMECUBE)
lis r8, 0x0c00
@@ -1199,6 +1209,7 @@ setup_usbgecko_bat:
mtspr SPRN_DBAT1L, r8
mtspr SPRN_DBAT1U, r11
blr
+SYM_FUNC_END(setup_usbgecko_bat)
#endif
.data
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index f0db4f52bc00..744b096857a1 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -29,6 +29,8 @@
#include <linux/init.h>
#include <linux/threads.h>
#include <linux/pgtable.h>
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -885,7 +887,7 @@ KernelSPE:
* Translate the effec addr in r3 to phys addr. The phys addr will be put
* into r3(higher 32bit) and r4(lower 32bit)
*/
-get_phys_addr:
+SYM_FUNC_START_LOCAL(get_phys_addr)
mfmsr r8
mfspr r9,SPRN_PID
rlwinm r9,r9,16,0x3fff0000 /* turn PID into MAS6[SPID] */
@@ -907,6 +909,7 @@ get_phys_addr:
mfspr r3,SPRN_MAS7
#endif
blr
+SYM_FUNC_END(get_phys_addr)
/*
* Global functions
diff --git a/arch/powerpc/kernel/swsusp_32.S b/arch/powerpc/kernel/swsusp_32.S
index e0cbd63007f2..ffb79326483c 100644
--- a/arch/powerpc/kernel/swsusp_32.S
+++ b/arch/powerpc/kernel/swsusp_32.S
@@ -1,5 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/threads.h>
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/cputable.h>
@@ -400,7 +402,7 @@ _ASM_NOKPROBE_SYMBOL(swsusp_arch_resume)
/* FIXME:This construct is actually not useful since we don't shut
* down the instruction MMU, we could just flip back MSR-DR on.
*/
-turn_on_mmu:
+SYM_FUNC_START_LOCAL(turn_on_mmu)
mflr r4
mtsrr0 r4
mtsrr1 r3
@@ -408,4 +410,5 @@ turn_on_mmu:
isync
rfi
_ASM_NOKPROBE_SYMBOL(turn_on_mmu)
+SYM_FUNC_END(turn_on_mmu)
diff --git a/arch/powerpc/kvm/fpu.S b/arch/powerpc/kvm/fpu.S
index 315c94946bad..b68e7f26a81f 100644
--- a/arch/powerpc/kvm/fpu.S
+++ b/arch/powerpc/kvm/fpu.S
@@ -6,6 +6,8 @@
*/
#include <linux/pgtable.h>
+#include <linux/linkage.h>
+
#include <asm/reg.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -110,18 +112,22 @@ FPS_THREE_IN(fsel)
* R8 = (double*)¶m3 [load_three]
* LR = instruction call function
*/
-fpd_load_three:
+SYM_FUNC_START_LOCAL(fpd_load_three)
lfd 2,0(r8) /* load param3 */
-fpd_load_two:
+SYM_FUNC_START_LOCAL(fpd_load_two)
lfd 1,0(r7) /* load param2 */
-fpd_load_one:
+SYM_FUNC_START_LOCAL(fpd_load_one)
lfd 0,0(r6) /* load param1 */
-fpd_load_none:
+SYM_FUNC_START_LOCAL(fpd_load_none)
lfd 3,0(r3) /* load up fpscr value */
MTFSF_L(3)
lwz r6, 0(r4) /* load cr */
mtcr r6
blr
+SYM_FUNC_END(fpd_load_none)
+SYM_FUNC_END(fpd_load_one)
+SYM_FUNC_END(fpd_load_two)
+SYM_FUNC_END(fpd_load_three)
/*
* End of double instruction processing
@@ -131,13 +137,14 @@ fpd_load_none:
* R5 = (double*)&result
* LR = caller of instruction call function
*/
-fpd_return:
+SYM_FUNC_START_LOCAL(fpd_return)
mfcr r6
stfd 0,0(r5) /* save result */
mffs 0
stfd 0,0(r3) /* save new fpscr value */
stw r6,0(r4) /* save new cr value */
blr
+SYM_FUNC_END(fpd_return)
/*
* Double operation with no input operand
diff --git a/arch/powerpc/platforms/52xx/lite5200_sleep.S b/arch/powerpc/platforms/52xx/lite5200_sleep.S
index afee8b1515a8..0b12647e7b42 100644
--- a/arch/powerpc/platforms/52xx/lite5200_sleep.S
+++ b/arch/powerpc/platforms/52xx/lite5200_sleep.S
@@ -1,4 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/linkage.h>
+
#include <asm/reg.h>
#include <asm/ppc_asm.h>
#include <asm/processor.h>
@@ -178,7 +180,8 @@ sram_code:
/* local udelay in sram is needed */
- udelay: /* r11 - tb_ticks_per_usec, r12 - usecs, overwrites r13 */
+SYM_FUNC_START_LOCAL(udelay)
+ /* r11 - tb_ticks_per_usec, r12 - usecs, overwrites r13 */
mullw r12, r12, r11
mftb r13 /* start */
add r12, r13, r12 /* end */
@@ -187,6 +190,7 @@ sram_code:
cmp cr0, r13, r12
blt 1b
blr
+SYM_FUNC_END(udelay)
sram_code_end:
@@ -271,7 +275,7 @@ _ASM_NOKPROBE_SYMBOL(lite5200_wakeup)
SAVE_SR(n+2, addr+2); \
SAVE_SR(n+3, addr+3);
-save_regs:
+SYM_FUNC_START_LOCAL(save_regs)
stw r0, 0(r4)
stw r1, 0x4(r4)
stw r2, 0x8(r4)
@@ -317,6 +321,7 @@ save_regs:
SAVE_SPRN(TBRU, 0x5b)
blr
+SYM_FUNC_END(save_regs)
/* restore registers */
@@ -336,7 +341,7 @@ save_regs:
LOAD_SR(n+2, addr+2); \
LOAD_SR(n+3, addr+3);
-restore_regs:
+SYM_FUNC_START_LOCAL(restore_regs)
lis r4, registers@h
ori r4, r4, registers@l
@@ -393,6 +398,7 @@ restore_regs:
blr
_ASM_NOKPROBE_SYMBOL(restore_regs)
+SYM_FUNC_END(restore_regs)
@@ -403,7 +409,7 @@ _ASM_NOKPROBE_SYMBOL(restore_regs)
* Flush data cache
* Do this by just reading lots of stuff into the cache.
*/
-flush_data_cache:
+SYM_FUNC_START_LOCAL(flush_data_cache)
lis r3,CONFIG_KERNEL_START@h
ori r3,r3,CONFIG_KERNEL_START@l
li r4,NUM_CACHE_LINES
@@ -413,3 +419,4 @@ flush_data_cache:
addi r3,r3,L1_CACHE_BYTES /* Next line, please */
bdnz 1b
blr
+SYM_FUNC_END(flush_data_cache)
--
2.36.1
*/
@@ -1164,10 +1172,11 @@ setup_disp_bat:
mtspr SPRN_DBAT3L,r8
mtspr SPRN_DBAT3U,r11
blr
+SYM_FUNC_END(setup_disp_bat)
#endif /* CONFIG_BOOTX_TEXT */
#ifdef CONFIG_PPC_EARLY_DEBUG_CPM
-setup_cpm_bat:
+SYM_FUNC_START_LOCAL(setup_cpm_bat)
lis r8, 0xf000
ori r8, r8, 0x002a
mtspr SPRN_DBAT1L, r8
@@ -1177,10 +1186,11 @@ setup_cpm_bat:
mtspr SPRN_DBAT1U, r11
blr
+SYM_FUNC_END(setup_cpm_bat)
#endif
#ifdef CONFIG_PPC_EARLY_DEBUG_USBGECKO
-setup_usbgecko_bat:
+SYM_FUNC_START_LOCAL(setup_usbgecko_bat)
/* prepare a BAT for early io */
#if defined(CONFIG_GAMECUBE)
lis r8, 0x0c00
@@ -1199,6 +1209,7 @@ setup_usbgecko_bat:
mtspr SPRN_DBAT1L, r8
mtspr SPRN_DBAT1U, r11
blr
+SYM_FUNC_END(setup_usbgecko_bat)
#endif
.data
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index f0db4f52bc00..744b096857a1 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -29,6 +29,8 @@
#include <linux/init.h>
#include <linux/threads.h>
#include <linux/pgtable.h>
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -885,7 +887,7 @@ KernelSPE:
* Translate the effec addr in r3 to phys addr. The phys addr will be put
* into r3(higher 32bit) and r4(lower 32bit)
*/
-get_phys_addr:
+SYM_FUNC_START_LOCAL(get_phys_addr)
mfmsr r8
mfspr r9,SPRN_PID
rlwinm r9,r9,16,0x3fff0000 /* turn PID into MAS6[SPID] */
@@ -907,6 +909,7 @@ get_phys_addr:
mfspr r3,SPRN_MAS7
#endif
blr
+SYM_FUNC_END(get_phys_addr)
/*
* Global functions
diff --git a/arch/powerpc/kernel/swsusp_32.S b/arch/powerpc/kernel/swsusp_32.S
index e0cbd63007f2..ffb79326483c 100644
--- a/arch/powerpc/kernel/swsusp_32.S
+++ b/arch/powerpc/kernel/swsusp_32.S
@@ -1,5 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/threads.h>
+#include <linux/linkage.h>
+
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/cputable.h>
@@ -400,7 +402,7 @@ _ASM_NOKPROBE_SYMBOL(swsusp_arch_resume)
/* FIXME:This construct is actually not useful since we don't shut
* down the instruction MMU, we could just flip back MSR-DR on.
*/
-turn_on_mmu:
+SYM_FUNC_START_LOCAL(turn_on_mmu)
mflr r4
mtsrr0 r4
mtsrr1 r3
@@ -408,4 +410,5 @@ turn_on_mmu:
isync
rfi
_ASM_NOKPROBE_SYMBOL(turn_on_mmu)
+SYM_FUNC_END(turn_on_mmu)
diff --git a/arch/powerpc/kvm/fpu.S b/arch/powerpc/kvm/fpu.S
index 315c94946bad..b68e7f26a81f 100644
--- a/arch/powerpc/kvm/fpu.S
+++ b/arch/powerpc/kvm/fpu.S
@@ -6,6 +6,8 @@
*/
#include <linux/pgtable.h>
+#include <linux/linkage.h>
+
#include <asm/reg.h>
#include <asm/page.h>
#include <asm/mmu.h>
@@ -110,18 +112,22 @@ FPS_THREE_IN(fsel)
* R8 = (double*)¶m3 [load_three]
* LR = instruction call function
*/
-fpd_load_three:
+SYM_FUNC_START_LOCAL(fpd_load_three)
lfd 2,0(r8) /* load param3 */
-fpd_load_two:
+SYM_FUNC_START_LOCAL(fpd_load_two)
lfd 1,0(r7) /* load param2 */
-fpd_load_one:
+SYM_FUNC_START_LOCAL(fpd_load_one)
lfd 0,0(r6) /* load param1 */
-fpd_load_none:
+SYM_FUNC_START_LOCAL(fpd_load_none)
lfd 3,0(r3) /* load up fpscr value */
MTFSF_L(3)
lwz r6, 0(r4) /* load cr */
mtcr r6
blr
+SYM_FUNC_END(fpd_load_none)
+SYM_FUNC_END(fpd_load_one)
+SYM_FUNC_END(fpd_load_two)
+SYM_FUNC_END(fpd_load_three)
/*
* End of double instruction processing
@@ -131,13 +137,14 @@ fpd_load_none:
* R5 = (double*)&result
* LR = caller of instruction call function
*/
-fpd_return:
+SYM_FUNC_START_LOCAL(fpd_return)
mfcr r6
stfd 0,0(r5) /* save result */
mffs 0
stfd 0,0(r3) /* save new fpscr value */
stw r6,0(r4) /* save new cr value */
blr
+SYM_FUNC_END(fpd_return)
/*
* Double operation with no input operand
diff --git a/arch/powerpc/platforms/52xx/lite5200_sleep.S b/arch/powerpc/platforms/52xx/lite5200_sleep.S
index afee8b1515a8..0b12647e7b42 100644
--- a/arch/powerpc/platforms/52xx/lite5200_sleep.S
+++ b/arch/powerpc/platforms/52xx/lite5200_sleep.S
@@ -1,4 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/linkage.h>
+
#include <asm/reg.h>
#include <asm/ppc_asm.h>
#include <asm/processor.h>
@@ -178,7 +180,8 @@ sram_code:
/* local udelay in sram is needed */
- udelay: /* r11 - tb_ticks_per_usec, r12 - usecs, overwrites r13 */
+SYM_FUNC_START_LOCAL(udelay)
+ /* r11 - tb_ticks_per_usec, r12 - usecs, overwrites r13 */
mullw r12, r12, r11
mftb r13 /* start */
add r12, r13, r12 /* end */
@@ -187,6 +190,7 @@ sram_code:
cmp cr0, r13, r12
blt 1b
blr
+SYM_FUNC_END(udelay)
sram_code_end:
@@ -271,7 +275,7 @@ _ASM_NOKPROBE_SYMBOL(lite5200_wakeup)
SAVE_SR(n+2, addr+2); \
SAVE_SR(n+3, addr+3);
-save_regs:
+SYM_FUNC_START_LOCAL(save_regs)
stw r0, 0(r4)
stw r1, 0x4(r4)
stw r2, 0x8(r4)
@@ -317,6 +321,7 @@ save_regs:
SAVE_SPRN(TBRU, 0x5b)
blr
+SYM_FUNC_END(save_regs)
/* restore registers */
@@ -336,7 +341,7 @@ save_regs:
LOAD_SR(n+2, addr+2); \
LOAD_SR(n+3, addr+3);
-restore_regs:
+SYM_FUNC_START_LOCAL(restore_regs)
lis r4, registers@h
ori r4, r4, registers@l
@@ -393,6 +398,7 @@ restore_regs:
blr
_ASM_NOKPROBE_SYMBOL(restore_regs)
+SYM_FUNC_END(restore_regs)
@@ -403,7 +409,7 @@ _ASM_NOKPROBE_SYMBOL(restore_regs)
* Flush data cache
* Do this by just reading lots of stuff into the cache.
*/
-flush_data_cache:
+SYM_FUNC_START_LOCAL(flush_data_cache)
lis r3,CONFIG_KERNEL_START@h
ori r3,r3,CONFIG_KERNEL_START@l
li r4,NUM_CACHE_LINES
@@ -413,3 +419,4 @@ flush_data_cache:
addi r3,r3,L1_CACHE_BYTES /* Next line, please */
bdnz 1b
blr
+SYM_FUNC_END(flush_data_cache)
--
2.36.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v2 3/7] objtool: Add architecture specific R_REL32 macro
2022-07-08 17:31 ` Christophe Leroy
@ 2022-07-08 17:31 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
In order to allow architectures other than x86 to use 32-bit
PC-relative relocations (S+A-P), define an R_REL32 macro that each
architecture provides, in the same way as is already done for
R_NONE, R_ABS32 and R_ABS64.
For x86 it corresponds to R_X86_64_PC32.
For powerpc it will be R_PPC_REL32/R_PPC64_REL32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Improved commit message based on feedback from Segher
---
tools/objtool/arch/x86/include/arch/elf.h | 1 +
tools/objtool/check.c | 10 +++++-----
tools/objtool/orc_gen.c | 2 +-
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/tools/objtool/arch/x86/include/arch/elf.h b/tools/objtool/arch/x86/include/arch/elf.h
index ac14987cf687..e7d228c686db 100644
--- a/tools/objtool/arch/x86/include/arch/elf.h
+++ b/tools/objtool/arch/x86/include/arch/elf.h
@@ -4,5 +4,6 @@
#define R_NONE R_X86_64_NONE
#define R_ABS64 R_X86_64_64
#define R_ABS32 R_X86_64_32
+#define R_REL32 R_X86_64_PC32
#endif /* _OBJTOOL_ARCH_ELF */
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index dec42a226048..ba8fd313372c 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -652,7 +652,7 @@ static int create_static_call_sections(struct objtool_file *file)
/* populate reloc for 'addr' */
if (elf_add_reloc_to_insn(file->elf, sec,
idx * sizeof(struct static_call_site),
- R_X86_64_PC32,
+ R_REL32,
insn->sec, insn->offset))
return -1;
@@ -693,7 +693,7 @@ static int create_static_call_sections(struct objtool_file *file)
/* populate reloc for 'key' */
if (elf_add_reloc(file->elf, sec,
idx * sizeof(struct static_call_site) + 4,
- R_X86_64_PC32, key_sym,
+ R_REL32, key_sym,
is_sibling_call(insn) * STATIC_CALL_SITE_TAIL))
return -1;
@@ -737,7 +737,7 @@ static int create_retpoline_sites_sections(struct objtool_file *file)
if (elf_add_reloc_to_insn(file->elf, sec,
idx * sizeof(int),
- R_X86_64_PC32,
+ R_REL32,
insn->sec, insn->offset)) {
WARN("elf_add_reloc_to_insn: .retpoline_sites");
return -1;
@@ -789,7 +789,7 @@ static int create_ibt_endbr_seal_sections(struct objtool_file *file)
if (elf_add_reloc_to_insn(file->elf, sec,
idx * sizeof(int),
- R_X86_64_PC32,
+ R_REL32,
insn->sec, insn->offset)) {
WARN("elf_add_reloc_to_insn: .ibt_endbr_seal");
return -1;
@@ -3718,7 +3718,7 @@ static int validate_ibt_insn(struct objtool_file *file, struct instruction *insn
continue;
off = reloc->sym->offset;
- if (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32)
+ if (reloc->type == R_REL32 || reloc->type == R_X86_64_PLT32)
off += arch_dest_reloc_offset(reloc->addend);
else
off += reloc->addend;
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
index 1f22b7ebae58..49a877b9c879 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -101,7 +101,7 @@ static int write_orc_entry(struct elf *elf, struct section *orc_sec,
orc->bp_offset = bswap_if_needed(elf, orc->bp_offset);
/* populate reloc for ip */
- if (elf_add_reloc_to_insn(elf, ip_sec, idx * sizeof(int), R_X86_64_PC32,
+ if (elf_add_reloc_to_insn(elf, ip_sec, idx * sizeof(int), R_REL32,
insn_sec, insn_off))
return -1;
--
2.36.1
* [PATCH v2 4/7] objtool/powerpc: Add necessary support for inline static calls
2022-07-08 17:31 ` Christophe Leroy
@ 2022-07-08 17:31 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
In order to support inline static calls on powerpc, objtool needs
the following additions:
- an R_REL32 macro
- support for the jump instruction used for tail calls
Add support for decoding the branch instruction 'b', which is the
jump instruction used for tail calls, since a static call site can
be a tail call.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
tools/objtool/arch/powerpc/decode.c | 16 ++++++++++------
tools/objtool/arch/powerpc/include/arch/elf.h | 1 +
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/tools/objtool/arch/powerpc/decode.c b/tools/objtool/arch/powerpc/decode.c
index 06fc0206bf8e..ba84869cd134 100644
--- a/tools/objtool/arch/powerpc/decode.c
+++ b/tools/objtool/arch/powerpc/decode.c
@@ -59,13 +59,17 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
opcode = insn >> 26;
switch (opcode) {
- case 18: /* bl */
- if ((insn & 3) == 1) {
+ case 18: /* bl/b */
+ if ((insn & 3) == 1)
*type = INSN_CALL;
- *immediate = insn & 0x3fffffc;
- if (*immediate & 0x2000000)
- *immediate -= 0x4000000;
- }
+ else if ((insn & 3) == 0)
+ *type = INSN_JUMP_UNCONDITIONAL;
+ else
+ break;
+
+ *immediate = insn & 0x3fffffc;
+ if (*immediate & 0x2000000)
+ *immediate -= 0x4000000;
break;
}
diff --git a/tools/objtool/arch/powerpc/include/arch/elf.h b/tools/objtool/arch/powerpc/include/arch/elf.h
index 73f9ae172fe5..befc2e30d38b 100644
--- a/tools/objtool/arch/powerpc/include/arch/elf.h
+++ b/tools/objtool/arch/powerpc/include/arch/elf.h
@@ -6,5 +6,6 @@
#define R_NONE R_PPC_NONE
#define R_ABS64 R_PPC64_ADDR64
#define R_ABS32 R_PPC_ADDR32
+#define R_REL32 R_PPC_REL32 /* R_PPC64_REL32 is identical */
#endif /* _OBJTOOL_ARCH_ELF */
--
2.36.1
* [PATCH v2 5/7] init: Call static_call_init() from start_kernel()
2022-07-08 17:31 ` Christophe Leroy
@ 2022-07-08 17:31 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
Call static_call_init() just after jump_label_init().
x86 already calls it from setup_arch(). This is not a
problem, as static_call_init() is guarded against being called twice.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
init/main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/init/main.c b/init/main.c
index 0ee39cdcfcac..7b8e9608f091 100644
--- a/init/main.c
+++ b/init/main.c
@@ -962,6 +962,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
pr_notice("Kernel command line: %s\n", saved_command_line);
/* parameters may set static keys */
jump_label_init();
+ static_call_init();
parse_early_param();
after_dashes = parse_args("Booting kernel",
static_command_line, __start___param,
--
2.36.1
* [PATCH v2 6/7] static_call_inline: Provide trampoline address when updating sites
2022-07-08 17:31 ` Christophe Leroy
@ 2022-07-08 17:31 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
In preparation for supporting inline static calls on powerpc, provide
the trampoline address when updating sites, so that if the destination
function is too far for a direct function call, the call site is
patched with a call to the trampoline instead.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/x86/kernel/static_call.c | 2 +-
kernel/static_call_inline.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index aa72cefdd5be..4db30b0ea71c 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -102,7 +102,7 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
{
mutex_lock(&text_mutex);
- if (tramp) {
+ if (tramp && !site) {
__static_call_validate(tramp, true, true);
__static_call_transform(tramp, __sc_insn(!func, true), func);
}
diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
index dc5665b62814..b5de9d92fa4e 100644
--- a/kernel/static_call_inline.c
+++ b/kernel/static_call_inline.c
@@ -195,7 +195,7 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func)
continue;
}
- arch_static_call_transform(site_addr, NULL, func,
+ arch_static_call_transform(site_addr, tramp, func,
static_call_is_tail(site));
}
}
--
2.36.1
* [PATCH v2 7/7] powerpc/static_call: Implement inline static calls
2022-07-08 17:31 ` Christophe Leroy
@ 2022-07-08 17:31 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-07-08 17:31 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, sv, agust, jpoimboe, peterz,
jbaron, rostedt, ardb, tglx, mingo, bp, dave.hansen, hpa
Cc: Christophe Leroy, linux-kernel, linuxppc-dev, x86, chenzhongjin
Implement inline static calls:
- Put a 'bl' to the destination function ('b' if tail call)
- Put a 'nop' when the destination function is NULL ('blr' if tail call)
- Put a 'li r3,0' when the destination is the RET0 function and not
a tail call.
If the destination is too far away (beyond the 32MB direct-branch
limit), go via the trampoline.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/static_call.h | 2 +
arch/powerpc/kernel/static_call.c | 56 +++++++++++++++++++-------
3 files changed, 44 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 00a43eb26418..cb92887acc3f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -251,6 +251,7 @@ config PPC
select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
select HAVE_STATIC_CALL if PPC32
+ select HAVE_STATIC_CALL_INLINE if PPC32
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HUGETLB_PAGE_SIZE_VARIABLE if PPC_BOOK3S_64 && HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
index de1018cc522b..e3d5d3823dac 100644
--- a/arch/powerpc/include/asm/static_call.h
+++ b/arch/powerpc/include/asm/static_call.h
@@ -26,4 +26,6 @@
#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) __PPC_SCT(name, "blr")
#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name) __PPC_SCT(name, "b .+20")
+#define CALL_INSN_SIZE 4
+
#endif /* _ASM_POWERPC_STATIC_CALL_H */
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..0093b471186d 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -8,26 +8,52 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
{
int err;
bool is_ret0 = (func == __static_call_return0);
- unsigned long target = (unsigned long)(is_ret0 ? tramp + PPC_SCT_RET0 : func);
- bool is_short = is_offset_in_branch_range((long)target - (long)tramp);
-
- if (!tramp)
- return;
+ unsigned long _tramp = (unsigned long)tramp;
+ unsigned long _func = (unsigned long)func;
+ unsigned long _ret0 = _tramp + PPC_SCT_RET0;
+ bool is_short = is_offset_in_branch_range((long)func - (long)(site ? : tramp));
mutex_lock(&text_mutex);
- if (func && !is_short) {
- err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
- if (err)
- goto out;
+ if (site && !tail) {
+ if (!func)
+ err = patch_instruction(site, ppc_inst(PPC_RAW_NOP()));
+ else if (is_ret0)
+ err = patch_instruction(site, ppc_inst(PPC_RAW_LI(_R3, 0)));
+ else if (is_short)
+ err = patch_branch(site, _func, BRANCH_SET_LINK);
+ else if (tramp)
+ err = patch_branch(site, _tramp, BRANCH_SET_LINK);
+ else
+ err = 0;
+ } else if (site) {
+ if (!func)
+ err = patch_instruction(site, ppc_inst(PPC_RAW_BLR()));
+ else if (is_ret0)
+ err = patch_branch(site, _ret0, 0);
+ else if (is_short)
+ err = patch_branch(site, _func, 0);
+ else if (tramp)
+ err = patch_branch(site, _tramp, 0);
+ else
+ err = 0;
+ } else if (tramp) {
+ if (func && !is_short) {
+ err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(_func));
+ if (err)
+ goto out;
+ }
+
+ if (!func)
+ err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
+ else if (is_ret0)
+ err = patch_branch(tramp, _ret0, 0);
+ else if (is_short)
+ err = patch_branch(tramp, _func, 0);
+ else
+ err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
}
- if (!func)
- err = patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
- else if (is_short)
- err = patch_branch(tramp, target, 0);
- else
- err = patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));
out:
mutex_unlock(&text_mutex);
--
2.36.1
* Re: [PATCH v2 0/7] Implement inline static calls on PPC32 - v2
2022-07-08 17:31 ` Christophe Leroy
@ 2022-07-09 6:52 ` Ard Biesheuvel
-1 siblings, 0 replies; 24+ messages in thread
From: Ard Biesheuvel @ 2022-07-09 6:52 UTC (permalink / raw)
To: Christophe Leroy
Cc: Michael Ellerman, Nicholas Piggin, sv, agust, Josh Poimboeuf,
Peter Zijlstra, Jason Baron, Steven Rostedt (VMware),
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Linux Kernel Mailing List,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
X86 ML, Chen Zhongjin
Hello Christophe,
On Fri, 8 Jul 2022 at 19:32, Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
> This series applies on top of the series v3 "objtool: Enable and
> implement --mcount option on powerpc" [1] rebased on powerpc-next branch
>
> A few modifications are done to core parts to enable powerpc
> implementation:
> - R_X86_64_PC32 is abstracted to R_REL32 so that it can then be
> redefined as R_PPC_REL32.
> - A call to static_call_init() is added to start_kernel() to avoid
> every architecture to have to call it
> - Trampoline address is provided to arch_static_call_transform() even
> when setting a site to fallback on a call to the trampoline when the
> target is too far.
>
> [1] https://lore.kernel.org/lkml/70b6d08d-aced-7f4e-b958-a3c7ae1a9319@csgroup.eu/T/#rb3a073c54aba563a135fba891e0c34c46e47beef
>
> Christophe Leroy (7):
> powerpc: Add missing asm/asm.h for objtool
> objtool/powerpc: Activate objtool on PPC32
> objtool: Add architecture specific R_REL32 macro
> objtool/powerpc: Add necessary support for inline static calls
> init: Call static_call_init() from start_kernel()
> static_call_inline: Provide trampoline address when updating sites
> powerpc/static_call: Implement inline static calls
>
Could you quantify the performance gains of moving from out-of-line,
patched tail-call branch instructions to full-fledged inline static
calls? On x86, the retpoline problem makes this glaringly obvious, but
on other architectures, the complexity of supporting this model may
outweigh the performance advantages.
* Re: [PATCH v2 0/7] Implement inline static calls on PPC32 - v2
2022-07-09 6:52 ` Ard Biesheuvel
@ 2022-09-01 16:46 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-09-01 16:46 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Michael Ellerman, Nicholas Piggin, sv, agust, Josh Poimboeuf,
Peter Zijlstra, Jason Baron, Steven Rostedt (VMware),
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Linux Kernel Mailing List,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
X86 ML, Chen Zhongjin
Le 09/07/2022 à 08:52, Ard Biesheuvel a écrit :
> Hello Christophe,
>
> On Fri, 8 Jul 2022 at 19:32, Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>> This series applies on top of the series v3 "objtool: Enable and
>> implement --mcount option on powerpc" [1] rebased on powerpc-next branch
>>
>> A few modifications are done to core parts to enable powerpc
>> implementation:
>> - R_X86_64_PC32 is abstracted to R_REL32 so that it can then be
>> redefined as R_PPC_REL32.
>> - A call to static_call_init() is added to start_kernel() to avoid
>> every architecture to have to call it
>> - Trampoline address is provided to arch_static_call_transform() even
>> when setting a site to fallback on a call to the trampoline when the
>> target is too far.
>>
>> [1] https://lore.kernel.org/lkml/70b6d08d-aced-7f4e-b958-a3c7ae1a9319@csgroup.eu/T/#rb3a073c54aba563a135fba891e0c34c46e47beef
>>
>> Christophe Leroy (7):
>> powerpc: Add missing asm/asm.h for objtool
>> objtool/powerpc: Activate objtool on PPC32
>> objtool: Add architecture specific R_REL32 macro
>> objtool/powerpc: Add necessary support for inline static calls
>> init: Call static_call_init() from start_kernel()
>> static_call_inline: Provide trampoline address when updating sites
>> powerpc/static_call: Implement inline static calls
>>
>
> Could you quantify the performance gains of moving from out-of-line,
> patched tail-call branch instructions to full-fledged inline static
> calls? On x86, the retpoline problem makes this glaringly obvious, but
> on other architectures, the complexity of supporting this model may
> outweigh the performance advantages.
Surprisingly, I get worse performance with inline static calls than with
out-of-line static calls:
No static call:
root@vgoip:~# perf stat -r 10 ./hackbench 1
Running with 1*40 (== 40) tasks.
Time: 17.186
Running with 1*40 (== 40) tasks.
Time: 16.738
Running with 1*40 (== 40) tasks.
Time: 16.579
Running with 1*40 (== 40) tasks.
Time: 16.838
Running with 1*40 (== 40) tasks.
Time: 16.652
Running with 1*40 (== 40) tasks.
Time: 17.380
Running with 1*40 (== 40) tasks.
Time: 16.630
Running with 1*40 (== 40) tasks.
Time: 16.850
Running with 1*40 (== 40) tasks.
Time: 17.161
Running with 1*40 (== 40) tasks.
Time: 16.722
Performance counter stats for './hackbench 1' (10 runs):

          17019.55 msec task-clock        #  0.980 CPUs utilized    ( +- 0.51% )
              4847      context-switches  #  282.280 /sec           ( +- 6.32% )
                 0      cpu-migrations    #  0.000 /sec
              1249      page-faults       #  72.739 /sec            ( +- 0.49% )
        2245344976      cycles            #  0.131 GHz              ( +- 0.51% )
         727437072      instructions      #  0.32 insn per cycle    ( +- 0.40% )
   <not supported>      branches
   <not supported>      branch-misses

           17.3585 +- 0.0909 seconds time elapsed  ( +- 0.52% )
Out-of-line static call:
root@vgoip:~# perf stat -r 10 ./hackbench 1
Running with 1*40 (== 40) tasks.
Time: 15.892
Running with 1*40 (== 40) tasks.
Time: 15.731
Running with 1*40 (== 40) tasks.
Time: 15.507
Running with 1*40 (== 40) tasks.
Time: 16.269
Running with 1*40 (== 40) tasks.
Time: 15.934
Running with 1*40 (== 40) tasks.
Time: 16.048
Running with 1*40 (== 40) tasks.
Time: 15.700
Running with 1*40 (== 40) tasks.
Time: 16.063
Running with 1*40 (== 40) tasks.
Time: 15.852
Running with 1*40 (== 40) tasks.
Time: 15.941
Performance counter stats for './hackbench 1' (10 runs):

          16227.32 msec task-clock        #  0.992 CPUs utilized    ( +- 0.42% )
              3732      context-switches  #  230.525 /sec           ( +- 6.42% )
                 0      cpu-migrations    #  0.000 /sec
              1244      page-faults       #  76.842 /sec            ( +- 0.11% )
        2141094288      cycles            #  0.132 GHz              ( +- 0.42% )
         712598441      instructions      #  0.33 insn per cycle    ( +- 0.29% )
   <not supported>      branches
   <not supported>      branch-misses

           16.3539 +- 0.0675 seconds time elapsed  ( +- 0.41% )
Inline static call:
root@vgoip:~# perf stat -r 10 ./hackbench 1
Running with 1*40 (== 40) tasks.
Time: 17.512
Running with 1*40 (== 40) tasks.
Time: 17.240
Running with 1*40 (== 40) tasks.
Time: 16.901
Running with 1*40 (== 40) tasks.
Time: 17.125
Running with 1*40 (== 40) tasks.
Time: 17.262
Running with 1*40 (== 40) tasks.
Time: 17.298
Running with 1*40 (== 40) tasks.
Time: 17.182
Running with 1*40 (== 40) tasks.
Time: 16.988
Running with 1*40 (== 40) tasks.
Time: 17.102
Running with 1*40 (== 40) tasks.
Time: 16.669
Performance counter stats for './hackbench 1' (10 runs):

          16976.76 msec task-clock        #  0.964 CPUs utilized    ( +- 0.44% )
              4760      context-switches  #  273.007 /sec           ( +- 4.93% )
                 0      cpu-migrations    #  0.000 /sec
              1252      page-faults       #  71.808 /sec            ( +- 0.35% )
        2239986112      cycles            #  0.128 GHz              ( +- 0.44% )
         721540184      instructions      #  0.31 insn per cycle    ( +- 0.31% )
   <not supported>      branches
   <not supported>      branch-misses

           17.6126 +- 0.0762 seconds time elapsed  ( +- 0.43% )
Summary:
No static calls:
17.3585 +- 0.0909 seconds time elapsed ( +- 0.52% )
Out-of-line static calls:
16.3539 +- 0.0675 seconds time elapsed ( +- 0.41% )
Inline static calls:
17.6126 +- 0.0762 seconds time elapsed ( +- 0.43% )
Is there anything wrong with inline static calls?
Christophe
* Re: [PATCH v2 0/7] Implement inline static calls on PPC32 - v2
2022-09-01 16:46 ` Christophe Leroy
@ 2022-09-08 0:13 ` Benjamin Gray
-1 siblings, 0 replies; 24+ messages in thread
From: Benjamin Gray @ 2022-09-08 0:13 UTC (permalink / raw)
To: Christophe Leroy, Ard Biesheuvel
Cc: X86 ML, Peter Zijlstra, Chen Zhongjin, Dave Hansen,
Linux Kernel Mailing List, Nicholas Piggin, Jason Baron,
Ingo Molnar, sv, Steven Rostedt (VMware),
H. Peter Anvin, Borislav Petkov, Thomas Gleixner, agust,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
Josh Poimboeuf
On Thu, 2022-09-01 at 16:46 +0000, Christophe Leroy wrote:
> Surprisingly, I get worse performance with inline static calls than
> with out-of-line static calls:
I'm not sure what hackbench is doing, but when microbenchmarking 64 bit
out-of-line calls in a loop I saw a similar thing where adding more
indirection improved the performance despite doing more work. The cause
seemed to be a combination of using older hardware and the target being
too short (just an integer increment). Moving to a newer machine and
adding a lot of NOPs to the target made the performance make sense.
* Re: [PATCH v2 0/7] Implement inline static calls on PPC32 - v2
2022-09-08 0:13 ` Benjamin Gray
@ 2022-09-08 6:11 ` Christophe Leroy
-1 siblings, 0 replies; 24+ messages in thread
From: Christophe Leroy @ 2022-09-08 6:11 UTC (permalink / raw)
To: Benjamin Gray, Ard Biesheuvel
Cc: X86 ML, Peter Zijlstra, Chen Zhongjin, Dave Hansen,
Linux Kernel Mailing List, Nicholas Piggin, Jason Baron,
Ingo Molnar, sv, Steven Rostedt (VMware),
H. Peter Anvin, Borislav Petkov, Thomas Gleixner, agust,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
Josh Poimboeuf
Le 08/09/2022 à 02:13, Benjamin Gray a écrit :
> On Thu, 2022-09-01 at 16:46 +0000, Christophe Leroy wrote:
>> Surprisingly, I get worse performance with inline static calls than
>> with out-of-line static calls:
>
> I'm not sure what hackbench is doing, but when microbenchmarking 64 bit
> out-of-line calls in a loop I saw a similar thing where adding more
> indirection improved the performance despite doing more work. The cause
> seemed to be a combination of using older hardware and the target being
> too short (just an integer increment). Moving to a newer machine and
> adding a lot of NOPs to the target made the performance make sense.
Yes, might be.
I think I'll first do new tests with CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B
Christophe
end of thread, other threads:[~2022-09-08 6:12 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-08 17:31 [PATCH v2 0/7] Implement inline static calls on PPC32 - v2 Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 1/7] powerpc: Add missing asm/asm.h for objtool Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 2/7] objtool/powerpc: Activate objtool on PPC32 Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 3/7] objtool: Add architecture specific R_REL32 macro Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 4/7] objtool/powerpc: Add necessary support for inline static calls Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 5/7] init: Call static_call_init() from start_kernel() Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 6/7] static_call_inline: Provide trampoline address when updating sites Christophe Leroy
2022-07-08 17:31 ` [PATCH v2 7/7] powerpc/static_call: Implement inline static calls Christophe Leroy
2022-07-09  6:52 ` [PATCH v2 0/7] Implement inline static calls on PPC32 - v2 Ard Biesheuvel
2022-09-01 16:46   ` Christophe Leroy
2022-09-08  0:13     ` Benjamin Gray
2022-09-08  6:11       ` Christophe Leroy