linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 0/8] ARM kernel size fixes
@ 2015-03-13 12:07 Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset Ard Biesheuvel
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC
  To: linux-arm-kernel

This series suggests an approach to preventing linker failures on large
kernels. It is somewhat unpolished and is posted primarily for comments
and testing.

The issues were found and reported by Arnd Bergmann, and these patches are
loosely based on his initial approach to work around them.

Changes since v1:
- Updated PROCINFO patch (#1) to refer to the base of the struct by name, and
  simplify the calling code (rmk)
- Updated b_far/bl_far patch (#3) to remove ARM/THUMB alternatives and use a
  conditionally defined PC_BIAS instead. Also added b_abs/bl_abs versions,
  which can only be used for absolute branches but can be implemented in fewer
  instructions. Added conditional branch support as well.
- Introduced (#6) and used (#7) the .text.fixup input section, which gets
  emitted after each .text section for each .o
- Added patch #8, which allows the kallsyms data to be moved to .data

Ard Biesheuvel (8):
  ARM: replace PROCINFO embedded branch with relative offset
  ARM: move HYP text to end of .text section
  ARM: add macro to perform far branches (b/bl)
  ARM: use bl_far to call __hyp_stub_install_secondary from the .data
    section
  ARM: move the .idmap.text section closer to .head.text
  asm-generic: introduce .text.fixup input section
  ARM: keep .text and .fixup regions together
  kallsyms: allow kallsyms data to reside in the .data section

 arch/arm/Kconfig                      |  1 +
 arch/arm/include/asm/assembler.h      | 83 +++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/futex.h          |  2 +-
 arch/arm/include/asm/uaccess.h        | 10 ++---
 arch/arm/include/asm/word-at-a-time.h |  2 +-
 arch/arm/kernel/entry-armv.S          |  2 +-
 arch/arm/kernel/head.S                | 14 +++---
 arch/arm/kernel/sleep.S               |  2 +-
 arch/arm/kernel/swp_emulate.c         |  2 +-
 arch/arm/kernel/vmlinux.lds.S         | 15 ++++---
 arch/arm/kvm/init.S                   |  5 +--
 arch/arm/kvm/interrupts.S             |  4 +-
 arch/arm/lib/clear_user.S             |  2 +-
 arch/arm/lib/copy_to_user.S           |  2 +-
 arch/arm/lib/csumpartialcopyuser.S    |  2 +-
 arch/arm/mm/alignment.c               |  6 +--
 arch/arm/mm/proc-arm1020.S            |  4 +-
 arch/arm/mm/proc-arm1020e.S           |  4 +-
 arch/arm/mm/proc-arm1022.S            |  4 +-
 arch/arm/mm/proc-arm1026.S            |  4 +-
 arch/arm/mm/proc-arm720.S             |  4 +-
 arch/arm/mm/proc-arm740.S             |  4 +-
 arch/arm/mm/proc-arm7tdmi.S           |  4 +-
 arch/arm/mm/proc-arm920.S             |  4 +-
 arch/arm/mm/proc-arm922.S             |  4 +-
 arch/arm/mm/proc-arm925.S             |  4 +-
 arch/arm/mm/proc-arm926.S             |  4 +-
 arch/arm/mm/proc-arm940.S             |  4 +-
 arch/arm/mm/proc-arm946.S             |  4 +-
 arch/arm/mm/proc-arm9tdmi.S           |  4 +-
 arch/arm/mm/proc-fa526.S              |  4 +-
 arch/arm/mm/proc-feroceon.S           |  5 ++-
 arch/arm/mm/proc-macros.S             |  4 ++
 arch/arm/mm/proc-mohawk.S             |  4 +-
 arch/arm/mm/proc-sa110.S              |  4 +-
 arch/arm/mm/proc-sa1100.S             |  4 +-
 arch/arm/mm/proc-v6.S                 |  4 +-
 arch/arm/mm/proc-v7.S                 | 28 ++++++------
 arch/arm/mm/proc-v7m.S                |  4 +-
 arch/arm/mm/proc-xsc3.S               |  4 +-
 arch/arm/mm/proc-xscale.S             |  4 +-
 arch/arm/nwfpe/entry.S                |  2 +-
 include/asm-generic/vmlinux.lds.h     | 14 +++++-
 init/Kconfig                          |  4 ++
 scripts/kallsyms.c                    |  2 +-
 45 files changed, 200 insertions(+), 101 deletions(-)

-- 
1.8.3.2

* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-04-19 16:59   ` Joachim Eastwood
  2015-03-13 12:07 ` [PATCH v2 2/8] ARM: move HYP text to end of .text section Ard Biesheuvel
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC
  To: linux-arm-kernel

This patch replaces the 'branch to setup()' instructions embedded
in the PROCINFO structs with the offset to that setup function
relative to the base of the struct. This preserves the
position-independent nature of that field, but uses a data item
rather than an instruction.

This is mainly done to prevent linker failures on large kernels,
where the setup function is out of reach for the branch.
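
As a rough illustration of why a base-relative offset stays position
independent, here is a small C model of the arithmetic. It is only a
sketch: the function names and the addresses are made up for this
example, and uint32_t stands in for 32-bit ARM addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Link time: the value the initfn macro stores, i.e. func - base.
 * uint32_t models 32-bit ARM addresses, so the subtraction wraps. */
static uint32_t initfn_offset(uint32_t func, uint32_t base)
{
	return func - base;
}

/* Run time: ldr r12, [r10, #PROCINFO_INITFUNC]; add r12, r12, r10 */
static uint32_t initfn_resolve(uint32_t base, uint32_t offset)
{
	return base + offset;
}

/* Hypothetical addresses, chosen only for illustration. */
static void initfn_demo(void)
{
	uint32_t link_base = 0xc0a00000u;	/* virtual address of the struct */
	uint32_t link_func = 0xc0100000u;	/* virtual address of setup()    */
	uint32_t phys_delta = 0x40000000u;	/* virt - phys, MMU still off    */
	uint32_t off = initfn_offset(link_func, link_base);

	/* The same stored offset resolves correctly at the physical alias. */
	assert(initfn_resolve(link_base - phys_delta, off) ==
	       link_func - phys_delta);
}
```

Because only the difference between the two addresses is stored, the
result is correct wherever the image happens to be running from, and
the offset is a full 32-bit quantity rather than a branch immediate.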

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/head.S      | 14 +++++++-------
 arch/arm/mm/proc-arm1020.S  |  4 ++--
 arch/arm/mm/proc-arm1020e.S |  4 ++--
 arch/arm/mm/proc-arm1022.S  |  4 ++--
 arch/arm/mm/proc-arm1026.S  |  4 ++--
 arch/arm/mm/proc-arm720.S   |  4 ++--
 arch/arm/mm/proc-arm740.S   |  4 ++--
 arch/arm/mm/proc-arm7tdmi.S |  4 ++--
 arch/arm/mm/proc-arm920.S   |  4 ++--
 arch/arm/mm/proc-arm922.S   |  4 ++--
 arch/arm/mm/proc-arm925.S   |  4 ++--
 arch/arm/mm/proc-arm926.S   |  4 ++--
 arch/arm/mm/proc-arm940.S   |  4 ++--
 arch/arm/mm/proc-arm946.S   |  4 ++--
 arch/arm/mm/proc-arm9tdmi.S |  4 ++--
 arch/arm/mm/proc-fa526.S    |  4 ++--
 arch/arm/mm/proc-feroceon.S |  5 +++--
 arch/arm/mm/proc-macros.S   |  4 ++++
 arch/arm/mm/proc-mohawk.S   |  4 ++--
 arch/arm/mm/proc-sa110.S    |  4 ++--
 arch/arm/mm/proc-sa1100.S   |  4 ++--
 arch/arm/mm/proc-v6.S       |  4 ++--
 arch/arm/mm/proc-v7.S       | 28 ++++++++++++++--------------
 arch/arm/mm/proc-v7m.S      |  4 ++--
 arch/arm/mm/proc-xsc3.S     |  4 ++--
 arch/arm/mm/proc-xscale.S   |  4 ++--
 26 files changed, 72 insertions(+), 67 deletions(-)

diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 01963273c07a..3637973a9708 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -138,9 +138,9 @@ ENTRY(stext)
 						@ mmu has been enabled
 	adr	lr, BSYM(1f)			@ return (PIC) address
 	mov	r8, r4				@ set TTBR1 to swapper_pg_dir
- ARM(	add	pc, r10, #PROCINFO_INITFUNC	)
- THUMB(	add	r12, r10, #PROCINFO_INITFUNC	)
- THUMB(	ret	r12				)
+	ldr	r12, [r10, #PROCINFO_INITFUNC]
+	add	r12, r12, r10
+	ret	r12
 1:	b	__enable_mmu
 ENDPROC(stext)
 	.ltorg
@@ -386,10 +386,10 @@ ENTRY(secondary_startup)
 	ldr	r8, [r7, lr]			@ get secondary_data.swapper_pg_dir
 	adr	lr, BSYM(__enable_mmu)		@ return address
 	mov	r13, r12			@ __secondary_switched address
- ARM(	add	pc, r10, #PROCINFO_INITFUNC	) @ initialise processor
-						  @ (return control reg)
- THUMB(	add	r12, r10, #PROCINFO_INITFUNC	)
- THUMB(	ret	r12				)
+	ldr	r12, [r10, #PROCINFO_INITFUNC]
+	add	r12, r12, r10			@ initialise processor
+						@ (return control reg)
+	ret	r12
 ENDPROC(secondary_startup)
 ENDPROC(secondary_startup_arm)
 
diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S
index 86ee5d47ce3c..aa0519eed698 100644
--- a/arch/arm/mm/proc-arm1020.S
+++ b/arch/arm/mm/proc-arm1020.S
@@ -507,7 +507,7 @@ cpu_arm1020_name:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm1020_proc_info,#object
 __arm1020_proc_info:
@@ -519,7 +519,7 @@ __arm1020_proc_info:
 	.long   PMD_TYPE_SECT | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm1020_setup
+	initfn	__arm1020_setup, __arm1020_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S
index a6331d78601f..bff4c7f70fd6 100644
--- a/arch/arm/mm/proc-arm1020e.S
+++ b/arch/arm/mm/proc-arm1020e.S
@@ -465,7 +465,7 @@ arm1020e_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm1020e_proc_info,#object
 __arm1020e_proc_info:
@@ -479,7 +479,7 @@ __arm1020e_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm1020e_setup
+	initfn	__arm1020e_setup, __arm1020e_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB | HWCAP_EDSP
diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S
index a126b7a59928..dbb2413fe04d 100644
--- a/arch/arm/mm/proc-arm1022.S
+++ b/arch/arm/mm/proc-arm1022.S
@@ -448,7 +448,7 @@ arm1022_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm1022_proc_info,#object
 __arm1022_proc_info:
@@ -462,7 +462,7 @@ __arm1022_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm1022_setup
+	initfn	__arm1022_setup, __arm1022_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB | HWCAP_EDSP
diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S
index fc294067e977..0b37b2cef9d3 100644
--- a/arch/arm/mm/proc-arm1026.S
+++ b/arch/arm/mm/proc-arm1026.S
@@ -442,7 +442,7 @@ arm1026_crval:
 	string	cpu_arm1026_name, "ARM1026EJ-S"
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm1026_proc_info,#object
 __arm1026_proc_info:
@@ -456,7 +456,7 @@ __arm1026_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm1026_setup
+	initfn	__arm1026_setup, __arm1026_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP|HWCAP_JAVA
diff --git a/arch/arm/mm/proc-arm720.S b/arch/arm/mm/proc-arm720.S
index 2baa66b3ac9b..3651cd70e418 100644
--- a/arch/arm/mm/proc-arm720.S
+++ b/arch/arm/mm/proc-arm720.S
@@ -186,7 +186,7 @@ arm720_crval:
  * See <asm/procinfo.h> for a definition of this structure.
  */
 	
-		.section ".proc.info.init", #alloc, #execinstr
+		.section ".proc.info.init", #alloc
 
 .macro arm720_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req, cpu_flush:req
 		.type	__\name\()_proc_info,#object
@@ -203,7 +203,7 @@ __\name\()_proc_info:
 			PMD_BIT4 | \
 			PMD_SECT_AP_WRITE | \
 			PMD_SECT_AP_READ
-		b	\cpu_flush				@ cpu_flush
+		initfn	\cpu_flush, __\name\()_proc_info	@ cpu_flush
 		.long	cpu_arch_name				@ arch_name
 		.long	cpu_elf_name				@ elf_name
 		.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB	@ elf_hwcap
diff --git a/arch/arm/mm/proc-arm740.S b/arch/arm/mm/proc-arm740.S
index ac1ea6b3bce4..024fb7732407 100644
--- a/arch/arm/mm/proc-arm740.S
+++ b/arch/arm/mm/proc-arm740.S
@@ -132,14 +132,14 @@ __arm740_setup:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 	.type	__arm740_proc_info,#object
 __arm740_proc_info:
 	.long	0x41807400
 	.long	0xfffffff0
 	.long	0
 	.long	0
-	b	__arm740_setup
+	initfn	__arm740_setup, __arm740_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB | HWCAP_26BIT
diff --git a/arch/arm/mm/proc-arm7tdmi.S b/arch/arm/mm/proc-arm7tdmi.S
index bf6ba4bc30ff..25472d94426d 100644
--- a/arch/arm/mm/proc-arm7tdmi.S
+++ b/arch/arm/mm/proc-arm7tdmi.S
@@ -76,7 +76,7 @@ __arm7tdmi_setup:
 
 		.align
 
-		.section ".proc.info.init", #alloc, #execinstr
+		.section ".proc.info.init", #alloc
 
 .macro arm7tdmi_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req, \
 	extra_hwcaps=0
@@ -86,7 +86,7 @@ __\name\()_proc_info:
 		.long	\cpu_mask
 		.long	0
 		.long	0
-		b	__arm7tdmi_setup
+		initfn	__arm7tdmi_setup, __\name\()_proc_info
 		.long	cpu_arch_name
 		.long	cpu_elf_name
 		.long	HWCAP_SWP | HWCAP_26BIT | ( \extra_hwcaps )
diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
index 22bf8dde4f84..7a14bd4414c9 100644
--- a/arch/arm/mm/proc-arm920.S
+++ b/arch/arm/mm/proc-arm920.S
@@ -448,7 +448,7 @@ arm920_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm920_proc_info,#object
 __arm920_proc_info:
@@ -464,7 +464,7 @@ __arm920_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm920_setup
+	initfn	__arm920_setup, __arm920_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
index 0c6d5ac5a6d4..edccfcdcd551 100644
--- a/arch/arm/mm/proc-arm922.S
+++ b/arch/arm/mm/proc-arm922.S
@@ -426,7 +426,7 @@ arm922_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm922_proc_info,#object
 __arm922_proc_info:
@@ -442,7 +442,7 @@ __arm922_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm922_setup
+	initfn	__arm922_setup, __arm922_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
index c32d073282ea..ede8c54ab4aa 100644
--- a/arch/arm/mm/proc-arm925.S
+++ b/arch/arm/mm/proc-arm925.S
@@ -494,7 +494,7 @@ arm925_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 .macro arm925_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req, cache
 	.type	__\name\()_proc_info,#object
@@ -510,7 +510,7 @@ __\name\()_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm925_setup
+	initfn	__arm925_setup, __\name\()_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
index 252b2503038d..fb827c633693 100644
--- a/arch/arm/mm/proc-arm926.S
+++ b/arch/arm/mm/proc-arm926.S
@@ -474,7 +474,7 @@ arm926_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm926_proc_info,#object
 __arm926_proc_info:
@@ -490,7 +490,7 @@ __arm926_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__arm926_setup
+	initfn	__arm926_setup, __arm926_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP|HWCAP_JAVA
diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S
index e5212d489377..0a0b7a9167b6 100644
--- a/arch/arm/mm/proc-arm940.S
+++ b/arch/arm/mm/proc-arm940.S
@@ -354,14 +354,14 @@ __arm940_setup:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__arm940_proc_info,#object
 __arm940_proc_info:
 	.long	0x41009400
 	.long	0xff00fff0
 	.long	0
-	b	__arm940_setup
+	initfn	__arm940_setup, __arm940_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S
index b3dd9b2d0b8e..c85b40d2117e 100644
--- a/arch/arm/mm/proc-arm946.S
+++ b/arch/arm/mm/proc-arm946.S
@@ -409,14 +409,14 @@ __arm946_setup:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 	.type	__arm946_proc_info,#object
 __arm946_proc_info:
 	.long	0x41009460
 	.long	0xff00fff0
 	.long	0
 	.long	0
-	b	__arm946_setup
+	initfn	__arm946_setup, __arm946_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB
diff --git a/arch/arm/mm/proc-arm9tdmi.S b/arch/arm/mm/proc-arm9tdmi.S
index 8227322bbb8f..7fac8c612134 100644
--- a/arch/arm/mm/proc-arm9tdmi.S
+++ b/arch/arm/mm/proc-arm9tdmi.S
@@ -70,7 +70,7 @@ __arm9tdmi_setup:
 
 		.align
 
-		.section ".proc.info.init", #alloc, #execinstr
+		.section ".proc.info.init", #alloc
 
 .macro arm9tdmi_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req
 		.type	__\name\()_proc_info, #object
@@ -79,7 +79,7 @@ __\name\()_proc_info:
 		.long	\cpu_mask
 		.long	0
 		.long	0
-		b	__arm9tdmi_setup
+		initfn	__arm9tdmi_setup, __\name\()_proc_info
 		.long	cpu_arch_name
 		.long	cpu_elf_name
 		.long	HWCAP_SWP | HWCAP_THUMB | HWCAP_26BIT
diff --git a/arch/arm/mm/proc-fa526.S b/arch/arm/mm/proc-fa526.S
index c494886892ba..4001b73af4ee 100644
--- a/arch/arm/mm/proc-fa526.S
+++ b/arch/arm/mm/proc-fa526.S
@@ -190,7 +190,7 @@ fa526_cr1_set:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__fa526_proc_info,#object
 __fa526_proc_info:
@@ -206,7 +206,7 @@ __fa526_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__fa526_setup
+	initfn	__fa526_setup, __fa526_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF
diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
index 03a1b75f2e16..e494d6d6acbe 100644
--- a/arch/arm/mm/proc-feroceon.S
+++ b/arch/arm/mm/proc-feroceon.S
@@ -584,7 +584,7 @@ feroceon_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 .macro feroceon_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req, cache:req
 	.type	__\name\()_proc_info,#object
@@ -601,7 +601,8 @@ __\name\()_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__feroceon_setup
+	initfn	__feroceon_setup, __\name\()_proc_info
+	.long __feroceon_setup
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 082b9f2f7e90..0f13b5f9281e 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -331,3 +331,7 @@ ENTRY(\name\()_tlb_fns)
 	.globl	\x
 	.equ	\x, \y
 .endm
+
+.macro	initfn, func, base
+	.long	\func - \base
+.endm
diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S
index 53d393455f13..d65edf717bf7 100644
--- a/arch/arm/mm/proc-mohawk.S
+++ b/arch/arm/mm/proc-mohawk.S
@@ -427,7 +427,7 @@ mohawk_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__88sv331x_proc_info,#object
 __88sv331x_proc_info:
@@ -443,7 +443,7 @@ __88sv331x_proc_info:
 		PMD_BIT4 | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__mohawk_setup
+	initfn	__mohawk_setup, __88sv331x_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
diff --git a/arch/arm/mm/proc-sa110.S b/arch/arm/mm/proc-sa110.S
index 8008a0461cf5..ee2ce496239f 100644
--- a/arch/arm/mm/proc-sa110.S
+++ b/arch/arm/mm/proc-sa110.S
@@ -199,7 +199,7 @@ sa110_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	.type	__sa110_proc_info,#object
 __sa110_proc_info:
@@ -213,7 +213,7 @@ __sa110_proc_info:
 	.long   PMD_TYPE_SECT | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__sa110_setup
+	initfn	__sa110_setup, __sa110_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_26BIT | HWCAP_FAST_MULT
diff --git a/arch/arm/mm/proc-sa1100.S b/arch/arm/mm/proc-sa1100.S
index 89f97ac648a9..222d5836f666 100644
--- a/arch/arm/mm/proc-sa1100.S
+++ b/arch/arm/mm/proc-sa1100.S
@@ -242,7 +242,7 @@ sa1100_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 .macro sa1100_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req
 	.type	__\name\()_proc_info,#object
@@ -257,7 +257,7 @@ __\name\()_proc_info:
 	.long   PMD_TYPE_SECT | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__sa1100_setup
+	initfn	__sa1100_setup, __\name\()_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_26BIT | HWCAP_FAST_MULT
diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
index d0390f4b3f18..06d890a2342b 100644
--- a/arch/arm/mm/proc-v6.S
+++ b/arch/arm/mm/proc-v6.S
@@ -264,7 +264,7 @@ v6_crval:
 	string	cpu_elf_name, "v6"
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	/*
 	 * Match any ARMv6 processor core.
@@ -287,7 +287,7 @@ __v6_proc_info:
 		PMD_SECT_XN | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__v6_setup
+	initfn	__v6_setup, __v6_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	/* See also feat_v6_fixup() for HWCAP_TLS */
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 8b4ee5e81c14..6bdaa4cc1784 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -462,19 +462,19 @@ __v7_setup_stack:
 	string	cpu_elf_name, "v7"
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	/*
 	 * Standard v7 proc info content
 	 */
-.macro __v7_proc initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions
+.macro __v7_proc name, initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions
 	ALT_SMP(.long	PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
 			PMD_SECT_AF | PMD_FLAGS_SMP | \mm_mmuflags)
 	ALT_UP(.long	PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AP_READ | \
 			PMD_SECT_AF | PMD_FLAGS_UP | \mm_mmuflags)
 	.long	PMD_TYPE_SECT | PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ | PMD_SECT_AF | \io_mmuflags
-	W(b)	\initfunc
+	initfn	\initfunc, \name
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP | HWCAP_HALF | HWCAP_THUMB | HWCAP_FAST_MULT | \
@@ -494,7 +494,7 @@ __v7_setup_stack:
 __v7_ca5mp_proc_info:
 	.long	0x410fc050
 	.long	0xff0ffff0
-	__v7_proc __v7_ca5mp_setup
+	__v7_proc __v7_ca5mp_proc_info, __v7_ca5mp_setup
 	.size	__v7_ca5mp_proc_info, . - __v7_ca5mp_proc_info
 
 	/*
@@ -504,7 +504,7 @@ __v7_ca5mp_proc_info:
 __v7_ca9mp_proc_info:
 	.long	0x410fc090
 	.long	0xff0ffff0
-	__v7_proc __v7_ca9mp_setup, proc_fns = ca9mp_processor_functions
+	__v7_proc __v7_ca9mp_proc_info, __v7_ca9mp_setup, proc_fns = ca9mp_processor_functions
 	.size	__v7_ca9mp_proc_info, . - __v7_ca9mp_proc_info
 
 #endif	/* CONFIG_ARM_LPAE */
@@ -517,7 +517,7 @@ __v7_ca9mp_proc_info:
 __v7_pj4b_proc_info:
 	.long	0x560f5800
 	.long	0xff0fff00
-	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
+	__v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions
 	.size	__v7_pj4b_proc_info, . - __v7_pj4b_proc_info
 #endif
 
@@ -528,7 +528,7 @@ __v7_pj4b_proc_info:
 __v7_cr7mp_proc_info:
 	.long	0x410fc170
 	.long	0xff0ffff0
-	__v7_proc __v7_cr7mp_setup
+	__v7_proc __v7_cr7mp_proc_info, __v7_cr7mp_setup
 	.size	__v7_cr7mp_proc_info, . - __v7_cr7mp_proc_info
 
 	/*
@@ -538,7 +538,7 @@ __v7_cr7mp_proc_info:
 __v7_ca7mp_proc_info:
 	.long	0x410fc070
 	.long	0xff0ffff0
-	__v7_proc __v7_ca7mp_setup
+	__v7_proc __v7_ca7mp_proc_info, __v7_ca7mp_setup
 	.size	__v7_ca7mp_proc_info, . - __v7_ca7mp_proc_info
 
 	/*
@@ -548,7 +548,7 @@ __v7_ca7mp_proc_info:
 __v7_ca12mp_proc_info:
 	.long	0x410fc0d0
 	.long	0xff0ffff0
-	__v7_proc __v7_ca12mp_setup
+	__v7_proc __v7_ca12mp_proc_info, __v7_ca12mp_setup
 	.size	__v7_ca12mp_proc_info, . - __v7_ca12mp_proc_info
 
 	/*
@@ -558,7 +558,7 @@ __v7_ca12mp_proc_info:
 __v7_ca15mp_proc_info:
 	.long	0x410fc0f0
 	.long	0xff0ffff0
-	__v7_proc __v7_ca15mp_setup
+	__v7_proc __v7_ca15mp_proc_info, __v7_ca15mp_setup
 	.size	__v7_ca15mp_proc_info, . - __v7_ca15mp_proc_info
 
 	/*
@@ -568,7 +568,7 @@ __v7_ca15mp_proc_info:
 __v7_b15mp_proc_info:
 	.long	0x420f00f0
 	.long	0xff0ffff0
-	__v7_proc __v7_b15mp_setup
+	__v7_proc __v7_b15mp_proc_info, __v7_b15mp_setup
 	.size	__v7_b15mp_proc_info, . - __v7_b15mp_proc_info
 
 	/*
@@ -578,7 +578,7 @@ __v7_b15mp_proc_info:
 __v7_ca17mp_proc_info:
 	.long	0x410fc0e0
 	.long	0xff0ffff0
-	__v7_proc __v7_ca17mp_setup
+	__v7_proc __v7_ca17mp_proc_info, __v7_ca17mp_setup
 	.size	__v7_ca17mp_proc_info, . - __v7_ca17mp_proc_info
 
 	/*
@@ -594,7 +594,7 @@ __krait_proc_info:
 	 * do support them. They also don't indicate support for fused multiply
 	 * instructions even though they actually do support them.
 	 */
-	__v7_proc __v7_setup, hwcaps = HWCAP_IDIV | HWCAP_VFPv4
+	__v7_proc __krait_proc_info, __v7_setup, hwcaps = HWCAP_IDIV | HWCAP_VFPv4
 	.size	__krait_proc_info, . - __krait_proc_info
 
 	/*
@@ -604,5 +604,5 @@ __krait_proc_info:
 __v7_proc_info:
 	.long	0x000f0000		@ Required ID value
 	.long	0x000f0000		@ Mask for ID
-	__v7_proc __v7_setup
+	__v7_proc __v7_proc_info, __v7_setup
 	.size	__v7_proc_info, . - __v7_proc_info
diff --git a/arch/arm/mm/proc-v7m.S b/arch/arm/mm/proc-v7m.S
index d1e68b553d3b..e08e1f2bab76 100644
--- a/arch/arm/mm/proc-v7m.S
+++ b/arch/arm/mm/proc-v7m.S
@@ -135,7 +135,7 @@ __v7m_setup_stack_top:
 	string cpu_elf_name "v7m"
 	string cpu_v7m_name "ARMv7-M"
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 	/*
 	 * Match any ARMv7-M processor core.
@@ -146,7 +146,7 @@ __v7m_proc_info:
 	.long	0x000f0000		@ Mask for ID
 	.long   0			@ proc_info_list.__cpu_mm_mmu_flags
 	.long   0			@ proc_info_list.__cpu_io_mmu_flags
-	b	__v7m_setup		@ proc_info_list.__cpu_flush
+	initfn	__v7m_setup, __v7m_proc_info	@ proc_info_list.__cpu_flush
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT
diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S
index f8acdfece036..293dcc2c441f 100644
--- a/arch/arm/mm/proc-xsc3.S
+++ b/arch/arm/mm/proc-xsc3.S
@@ -499,7 +499,7 @@ xsc3_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 .macro xsc3_proc_info name:req, cpu_val:req, cpu_mask:req
 	.type	__\name\()_proc_info,#object
@@ -514,7 +514,7 @@ __\name\()_proc_info:
 	.long	PMD_TYPE_SECT | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__xsc3_setup
+	initfn	__xsc3_setup, __\name\()_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S
index afa2b3c4df4a..b6bbfdb6dfdc 100644
--- a/arch/arm/mm/proc-xscale.S
+++ b/arch/arm/mm/proc-xscale.S
@@ -612,7 +612,7 @@ xscale_crval:
 
 	.align
 
-	.section ".proc.info.init", #alloc, #execinstr
+	.section ".proc.info.init", #alloc
 
 .macro xscale_proc_info name:req, cpu_val:req, cpu_mask:req, cpu_name:req, cache
 	.type	__\name\()_proc_info,#object
@@ -627,7 +627,7 @@ __\name\()_proc_info:
 	.long	PMD_TYPE_SECT | \
 		PMD_SECT_AP_WRITE | \
 		PMD_SECT_AP_READ
-	b	__xscale_setup
+	initfn	__xscale_setup, __\name\()_proc_info
 	.long	cpu_arch_name
 	.long	cpu_elf_name
 	.long	HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
-- 
1.8.3.2

* [PATCH v2 2/8] ARM: move HYP text to end of .text section
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl) Ard Biesheuvel
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC
  To: linux-arm-kernel

The HYP text is essentially a separate binary from the kernel proper,
so it can be moved away from the rest of the kernel. This helps prevent
link failures due to branch relocations exceeding their range.
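
For reference, the range limit in question can be modelled as below.
This is a sketch that assumes the ARM (A32) b/bl encoding, whose signed
24-bit word offset is applied relative to the instruction address plus
8; Thumb-2 branches have a different range and are not covered. The
helper names and the call-site address are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARM_PC_BIAS	8	/* A32: pc reads as instruction address + 8 */

/* True if an A32 b/bl at insn_addr can reach target directly. */
static bool arm_branch_in_range(uint32_t insn_addr, uint32_t target)
{
	int64_t disp = (int64_t)target - ((int64_t)insn_addr + ARM_PC_BIAS);

	assert(disp % 4 == 0);	/* A32 branch targets are word aligned */
	/* imm24 << 2, sign extended: -32 MiB .. +32 MiB - 4 */
	return disp >= -(INT64_C(1) << 25) && disp <= (INT64_C(1) << 25) - 4;
}

static void branch_range_demo(void)
{
	uint32_t insn = 0xc0008000u;	/* hypothetical call site */

	assert(arm_branch_in_range(insn, insn + 0x01000000u));	/* 16 MiB ahead */
	assert(!arm_branch_in_range(insn, insn + 0x03000000u));	/* 48 MiB ahead */
}
```

Any input section placed between a caller and its callee adds to this
displacement, which is why the placement of sections matters.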

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S | 8 ++++++--
 arch/arm/kvm/init.S           | 5 +----
 arch/arm/kvm/interrupts.S     | 4 +---
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index b31aa73e8076..e3b9403bd2d6 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -22,11 +22,14 @@
 	ALIGN_FUNCTION();						\
 	VMLINUX_SYMBOL(__idmap_text_start) = .;				\
 	*(.idmap.text)							\
-	VMLINUX_SYMBOL(__idmap_text_end) = .;				\
+	VMLINUX_SYMBOL(__idmap_text_end) = .;
+
+#define HYP_TEXT							\
 	. = ALIGN(32);							\
 	VMLINUX_SYMBOL(__hyp_idmap_text_start) = .;			\
 	*(.hyp.idmap.text)						\
-	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
+	VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;			\
+	*(.hyp.text)
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
@@ -118,6 +121,7 @@ SECTIONS
 		. = ALIGN(4);
 		*(.got)			/* Global offset table		*/
 			ARM_CPU_KEEP(PROC_INFO)
+			HYP_TEXT
 	}
 
 #ifdef CONFIG_DEBUG_RODATA
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72d16ff..7a377d36de5d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -51,8 +51,7 @@
  *   Switches to the runtime PGD, set stack and vectors.
  */
 
-	.text
-	.pushsection    .hyp.idmap.text,"ax"
+	.section    ".hyp.idmap.text", #alloc
 	.align 5
 __kvm_hyp_init:
 	.globl __kvm_hyp_init
@@ -155,5 +154,3 @@ target:	@ We're now in the trampoline code, switch page tables
 
 	.globl __kvm_hyp_init_end
 __kvm_hyp_init_end:
-
-	.popsection
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 79caf79b304a..db22e9bedfcd 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -27,7 +27,7 @@
 #include <asm/vfpmacros.h>
 #include "interrupts_head.S"
 
-	.text
+	.section	".hyp.text", #alloc
 
 __kvm_hyp_code_start:
 	.globl __kvm_hyp_code_start
@@ -316,8 +316,6 @@ THUMB(	orr	r2, r2, #PSR_T_BIT	)
 	eret
 .endm
 
-	.text
-
 	.align 5
 __kvm_hyp_vector:
 	.globl __kvm_hyp_vector
-- 
1.8.3.2

* [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl)
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 2/8] ARM: move HYP text to end of .text section Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-13 16:40   ` Russell King - ARM Linux
  2015-03-18 10:07   ` Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 4/8] ARM: use bl_far to call __hyp_stub_install_secondary from the .data section Ard Biesheuvel
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC
  To: linux-arm-kernel

These macros execute PC-relative branches, but with a larger
reach than the 24 bits that are available in the b and bl opcodes.
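
The arithmetic behind a far branch can be sketched in C as follows.
This is a model under stated assumptions, not the macros themselves:
the helper names and addresses are invented for illustration, the bias
values 8 (ARM) and 4 (Thumb-2) are the PC_BIAS values this patch
defines, and movw()/movt() mimic the architectural behaviour of the
instructions (MOVW zero-extends its 16-bit immediate, MOVT replaces
only the upper half, so MOVW has to execute first).

```c
#include <assert.h>
#include <stdint.h>

/* MOVW: write imm16 zero-extended; the old upper half is lost. */
static uint32_t movw(uint32_t reg, uint16_t imm)
{
	(void)reg;
	return imm;
}

/* MOVT: replace the upper half, keep the lower half. */
static uint32_t movt(uint32_t reg, uint16_t imm)
{
	return (reg & 0xffffu) | ((uint32_t)imm << 16);
}

/* Offset encoded at build time: target - (label + PC_BIAS). */
static uint32_t far_offset(uint32_t target, uint32_t label, uint32_t pc_bias)
{
	return target - (label + pc_bias);
}

/* 'add pc, pc, tmp' located at 'label': pc reads as label + pc_bias. */
static uint32_t far_dest(uint32_t label, uint32_t pc_bias, uint32_t tmp)
{
	return label + pc_bias + tmp;
}

static void far_branch_demo(void)
{
	/* Hypothetical addresses, about 65 MiB apart. */
	uint32_t label = 0xc0008010u, target = 0xc4123458u;
	uint32_t off = far_offset(target, label, 8);
	uint32_t tmp = 0;

	tmp = movw(tmp, (uint16_t)(off & 0xffffu));
	tmp = movt(tmp, (uint16_t)(off >> 16));
	assert(far_dest(label, 8, tmp) == target);
}
```

The bias cancels out because it is subtracted when the offset is
computed and implicitly added back when pc is read, so the full 32-bit
address space is reachable.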

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/assembler.h | 83 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index f67fd3afebdf..2e7f55194782 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -88,6 +88,17 @@
 #endif
 
 /*
+ * The program counter is always ahead of the address of the currently
+ * executing instruction by PC_BIAS bytes, whose value differs depending
+ * on the execution mode.
+ */
+#ifdef CONFIG_THUMB2_KERNEL
+#define PC_BIAS		4
+#else
+#define PC_BIAS		8
+#endif
+
+/*
  * Enable and disable interrupts
  */
 #if __LINUX_ARM_ARCH__ >= 6
@@ -108,6 +119,78 @@
 	.endm
 #endif
 
+	/*
+	 * Macros to emit relative conditional branches that may exceed the
+	 * range of the 24-bit immediate of the ordinary b/bl instructions.
+	 * NOTE: this doesn't work with locally defined symbols, as they
+	 * lack the ARM/Thumb annotation (even if they are annotated as
+	 * functions)
+	 */
+	.macro  b_far, target, tmpreg, c=
+#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
+	movw\c	\tmpreg, #:lower16:(\target - (8888f + PC_BIAS))
+	movt\c	\tmpreg, #:upper16:(\target - (8888f + PC_BIAS))
+8888:	add\c	pc, pc, \tmpreg
+#else
+	ldr\c	\tmpreg, 8889f
+8888:	add\c	pc, pc, \tmpreg
+	.ifnb	\c
+	b	8890f
+	.endif
+8889:	.long	\target - (8888b + PC_BIAS)
+8890:
+#endif
+	.endm
+
+	.macro	bl_far, target, c=
+#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
+	movw\c	ip, #:lower16:(\target - (8887f + PC_BIAS))
+	movt\c	ip, #:upper16:(\target - (8887f + PC_BIAS))
+8887:	add\c	ip, ip, pc
+	blx\c	ip
+#else
+	adr\c	lr, 8887f
+	b_far	\target, ip, \c
+8887:
+#endif
+	.endm
+
+	/*
+	 * Macros to emit absolute conditional branches: these are preferred
+	 * over the far variants above because they use fewer instructions
+	 * and/or use implicit literals that the assembler can group together
+	 * to optimize cache utilization. However, they can only be used to
+	 * call functions at their link time address, which rules out early boot
+	 * code that executes with the MMU off.
+	 * The v7 variant uses a movt/movw pair to prevent potential D-cache
+	 * stalls on the literal, so using these macros is preferred over using
+	 * 'ldr pc, =XXX' directly (unless no scratch register is available)
+	 * NOTE: this doesn't work with locally defined symbols, as they
+	 * lack the ARM/Thumb annotation (even if they are annotated as
+	 * functions)
+	 */
+	.macro	b_abs, target, tmpreg, c=
+#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
+	movw\c	\tmpreg, #:lower16:\target
+	movt\c	\tmpreg, #:upper16:\target
+	bx\c	\tmpreg
+#else
+	ldr\c	pc, =\target
+#endif
+	.endm
+
+	.macro	bl_abs, target, c=
+#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
+	movw\c	lr, #:lower16:\target
+	movt\c	lr, #:upper16:\target
+	blx\c	lr
+#else
+	adr\c	lr, BSYM(8886f)
+	ldr\c	pc, =\target
+8886:
+#endif
+	.endm
+
 	.macro asm_trace_hardirqs_off
 #if defined(CONFIG_TRACE_IRQFLAGS)
 	stmdb   sp!, {r0-r3, ip, lr}
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/8] ARM: use bl_far to call __hyp_stub_install_secondary from the .data section
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2015-03-13 12:07 ` [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl) Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 5/8] ARM: move the .idmap.text section closer to .head.text Ard Biesheuvel
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/sleep.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/sleep.S b/arch/arm/kernel/sleep.S
index e1e60e5a7a27..0ea3813fedce 100644
--- a/arch/arm/kernel/sleep.S
+++ b/arch/arm/kernel/sleep.S
@@ -128,7 +128,7 @@ ENDPROC(cpu_resume_after_mmu)
 ENTRY(cpu_resume)
 ARM_BE8(setend be)			@ ensure we are in BE mode
 #ifdef CONFIG_ARM_VIRT_EXT
-	bl	__hyp_stub_install_secondary
+	bl_far	__hyp_stub_install_secondary
 #endif
 	safe_svcmode_maskall r1
 	mov	r1, #0
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/8] ARM: move the .idmap.text section closer to .head.text
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2015-03-13 12:07 ` [PATCH v2 4/8] ARM: use bl_far to call __hyp_stub_install_secondary from the .data section Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 6/8] asm-generic: introduce .text.fixup input section Ard Biesheuvel
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

This moves the .idmap.text section closer to .head.text, so that
relative branches are less likely to go out of range if the kernel
text gets bigger.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index e3b9403bd2d6..2e7b2220ef5f 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -103,6 +103,7 @@ SECTIONS
 
 	.text : {			/* Real text segment		*/
 		_stext = .;		/* Text and read-only data	*/
+			IDMAP_TEXT
 			__exception_text_start = .;
 			*(.exception.text)
 			__exception_text_end = .;
@@ -111,7 +112,6 @@ SECTIONS
 			SCHED_TEXT
 			LOCK_TEXT
 			KPROBES_TEXT
-			IDMAP_TEXT
 #ifdef CONFIG_MMU
 			*(.fixup)
 #endif
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 6/8] asm-generic: introduce .text.fixup input section
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2015-03-13 12:07 ` [PATCH v2 5/8] ARM: move the .idmap.text section closer to .head.text Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-18 18:58   ` Arnd Bergmann
  2015-03-13 12:07 ` [PATCH v2 7/8] ARM: keep .text and .fixup regions together Ard Biesheuvel
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

This introduces a new .text.fixup input section that gets emitted
together with the .text section for each input object file.

Note that

  *(.text)
  *(.text.fixup)

is not the same as

  *(.text .text.fixup)

and we are looking for the latter, to ensure that fixup snippets that
are assembled into a separate section in the object file do not end
up out of range for the relative branch instructions they contain if
the .text section itself grows very large.

This helps prevent linker failures on large ARM kernels.
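
To make the difference between the two patterns concrete, here is a
small C model (purely illustrative, not how ld is implemented) of the
order in which input sections land in the output. It assumes each
object file carries one .text and one .text.fixup section, in that
order; the function names are invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "obj(section)" to the space-separated layout string in 'out'. */
static void append(char *out, size_t len, const char *obj, const char *sec)
{
	size_t used = strlen(out);

	snprintf(out + used, len - used, "%s%s(%s)", used ? " " : "", obj, sec);
}

/* Models "*(.text) *(.text.fixup)": all .text first, then all fixups. */
static void layout_separate(const char *const objs[], int n,
			    char *out, size_t len)
{
	assert(len > 0);
	out[0] = '\0';
	for (int i = 0; i < n; i++)
		append(out, len, objs[i], ".text");
	for (int i = 0; i < n; i++)
		append(out, len, objs[i], ".text.fixup");
}

/* Models "*(.text .text.fixup)": both sections of a file stay adjacent. */
static void layout_grouped(const char *const objs[], int n,
			   char *out, size_t len)
{
	assert(len > 0);
	out[0] = '\0';
	for (int i = 0; i < n; i++) {
		append(out, len, objs[i], ".text");
		append(out, len, objs[i], ".text.fixup");
	}
}
```

In the first layout the distance from a function to its fixup grows
with the total size of .text; in the second it is bounded by the size
of a single object's .text.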

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 include/asm-generic/vmlinux.lds.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ac78910d7416..463231d5bfc7 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -401,7 +401,7 @@
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
 		*(.text.hot)						\
-		*(.text)						\
+		*(.text .text.fixup)					\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
 	MEM_KEEP(exit.text)						\
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 7/8] ARM: keep .text and .fixup regions together
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2015-03-13 12:07 ` [PATCH v2 6/8] asm-generic: introduce .text.fixup input section Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-13 12:07 ` [PATCH v2 8/8] kallsyms: allow kallsyms data to reside in the .data section Ard Biesheuvel
  2015-03-18  7:54 ` [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
  8 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

This moves all fixup snippets to the .text.fixup section, which is
a special section that gets emitted along with the .text section
for each input object file, i.e., the snippets are kept much closer
to the code they refer to, which helps prevent linker failure on
large kernels.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/futex.h          |  2 +-
 arch/arm/include/asm/uaccess.h        | 10 +++++-----
 arch/arm/include/asm/word-at-a-time.h |  2 +-
 arch/arm/kernel/entry-armv.S          |  2 +-
 arch/arm/kernel/swp_emulate.c         |  2 +-
 arch/arm/kernel/vmlinux.lds.S         |  5 +----
 arch/arm/lib/clear_user.S             |  2 +-
 arch/arm/lib/copy_to_user.S           |  2 +-
 arch/arm/lib/csumpartialcopyuser.S    |  2 +-
 arch/arm/mm/alignment.c               |  6 +++---
 arch/arm/nwfpe/entry.S                |  2 +-
 11 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 53e69dae796f..4e78065a16aa 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -13,7 +13,7 @@
 	"	.align	3\n"					\
 	"	.long	1b, 4f, 2b, 4f\n"			\
 	"	.popsection\n"					\
-	"	.pushsection .fixup,\"ax\"\n"			\
+	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
 	"4:	mov	%0, " err_reg "\n"			\
 	"	b	3b\n"					\
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index ce0786efd26c..74b17d09ef7a 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -315,7 +315,7 @@ do {									\
 	__asm__ __volatile__(					\
 	"1:	" TUSER(ldrb) "	%1,[%2],#0\n"			\
 	"2:\n"							\
-	"	.pushsection .fixup,\"ax\"\n"			\
+	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
 	"3:	mov	%0, %3\n"				\
 	"	mov	%1, #0\n"				\
@@ -351,7 +351,7 @@ do {									\
 	__asm__ __volatile__(					\
 	"1:	" TUSER(ldr) "	%1,[%2],#0\n"			\
 	"2:\n"							\
-	"	.pushsection .fixup,\"ax\"\n"			\
+	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
 	"3:	mov	%0, %3\n"				\
 	"	mov	%1, #0\n"				\
@@ -397,7 +397,7 @@ do {									\
 	__asm__ __volatile__(					\
 	"1:	" TUSER(strb) "	%1,[%2],#0\n"			\
 	"2:\n"							\
-	"	.pushsection .fixup,\"ax\"\n"			\
+	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
 	"3:	mov	%0, %3\n"				\
 	"	b	2b\n"					\
@@ -430,7 +430,7 @@ do {									\
 	__asm__ __volatile__(					\
 	"1:	" TUSER(str) "	%1,[%2],#0\n"			\
 	"2:\n"							\
-	"	.pushsection .fixup,\"ax\"\n"			\
+	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
 	"3:	mov	%0, %3\n"				\
 	"	b	2b\n"					\
@@ -458,7 +458,7 @@ do {									\
  THUMB(	"1:	" TUSER(str) "	" __reg_oper1 ", [%1]\n"	) \
  THUMB(	"2:	" TUSER(str) "	" __reg_oper0 ", [%1, #4]\n"	) \
 	"3:\n"							\
-	"	.pushsection .fixup,\"ax\"\n"			\
+	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
 	"4:	mov	%0, %3\n"				\
 	"	b	3b\n"					\
diff --git a/arch/arm/include/asm/word-at-a-time.h b/arch/arm/include/asm/word-at-a-time.h
index a6d0a29861e7..5831dce4b51c 100644
--- a/arch/arm/include/asm/word-at-a-time.h
+++ b/arch/arm/include/asm/word-at-a-time.h
@@ -71,7 +71,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 	asm(
 	"1:	ldr	%0, [%2]\n"
 	"2:\n"
-	"	.pushsection .fixup,\"ax\"\n"
+	"	.pushsection .text.fixup,\"ax\"\n"
 	"	.align 2\n"
 	"3:	and	%1, %2, #0x3\n"
 	"	bic	%2, %2, #0x3\n"
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 672b21942fff..570306c49406 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -545,7 +545,7 @@ ENDPROC(__und_usr)
 /*
  * The out of line fixup for the ldrt instructions above.
  */
-	.pushsection .fixup, "ax"
+	.pushsection .text.fixup, "ax"
 	.align	2
 4:	str     r4, [sp, #S_PC]			@ retry current instruction
 	ret	r9
diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index afdd51e30bec..1361756782c7 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -42,7 +42,7 @@
 	"	cmp		%0, #0\n"			\
 	"	movne		%0, %4\n"			\
 	"2:\n"							\
-	"	.section	 .fixup,\"ax\"\n"		\
+	"	.section	 .text.fixup,\"ax\"\n"		\
 	"	.align		2\n"				\
 	"3:	mov		%0, %5\n"			\
 	"	b		2b\n"				\
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 2e7b2220ef5f..82846f60e31e 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -77,7 +77,7 @@ SECTIONS
 		ARM_EXIT_DISCARD(EXIT_DATA)
 		EXIT_CALL
 #ifndef CONFIG_MMU
-		*(.fixup)
+		*(.text.fixup)
 		*(__ex_table)
 #endif
 #ifndef CONFIG_SMP_ON_UP
@@ -112,9 +112,6 @@ SECTIONS
 			SCHED_TEXT
 			LOCK_TEXT
 			KPROBES_TEXT
-#ifdef CONFIG_MMU
-			*(.fixup)
-#endif
 			*(.gnu.warning)
 			*(.glue_7)
 			*(.glue_7t)
diff --git a/arch/arm/lib/clear_user.S b/arch/arm/lib/clear_user.S
index 14a0d988c82c..1710fd7db2d5 100644
--- a/arch/arm/lib/clear_user.S
+++ b/arch/arm/lib/clear_user.S
@@ -47,7 +47,7 @@ USER(		strnebt	r2, [r0])
 ENDPROC(__clear_user)
 ENDPROC(__clear_user_std)
 
-		.pushsection .fixup,"ax"
+		.pushsection .text.fixup,"ax"
 		.align	0
 9001:		ldmfd	sp!, {r0, pc}
 		.popsection
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index a9d3db16ecb5..9648b0675a3e 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -100,7 +100,7 @@ WEAK(__copy_to_user)
 ENDPROC(__copy_to_user)
 ENDPROC(__copy_to_user_std)
 
-	.pushsection .fixup,"ax"
+	.pushsection .text.fixup,"ax"
 	.align 0
 	copy_abort_preamble
 	ldmfd	sp!, {r1, r2, r3}
diff --git a/arch/arm/lib/csumpartialcopyuser.S b/arch/arm/lib/csumpartialcopyuser.S
index 7d08b43d2c0e..1d0957e61f89 100644
--- a/arch/arm/lib/csumpartialcopyuser.S
+++ b/arch/arm/lib/csumpartialcopyuser.S
@@ -68,7 +68,7 @@
  * so properly, we would have to add in whatever registers were loaded before
  * the fault, which, with the current asm above is not predictable.
  */
-		.pushsection .fixup,"ax"
+		.pushsection .text.fixup,"ax"
 		.align	4
 9001:		mov	r4, #-EFAULT
 		ldr	r5, [sp, #8*4]		@ *err_ptr
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 2c0c541c60ca..9769f1eefe3b 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -201,7 +201,7 @@ union offset_union {
  THUMB(	"1:	"ins"	%1, [%2]\n"	)		\
  THUMB(	"	add	%2, %2, #1\n"	)		\
 	"2:\n"						\
-	"	.pushsection .fixup,\"ax\"\n"		\
+	"	.pushsection .text.fixup,\"ax\"\n"	\
 	"	.align	2\n"				\
 	"3:	mov	%0, #1\n"			\
 	"	b	2b\n"				\
@@ -261,7 +261,7 @@ union offset_union {
 		"	mov	%1, %1, "NEXT_BYTE"\n"		\
 		"2:	"ins"	%1, [%2]\n"			\
 		"3:\n"						\
-		"	.pushsection .fixup,\"ax\"\n"		\
+		"	.pushsection .text.fixup,\"ax\"\n"	\
 		"	.align	2\n"				\
 		"4:	mov	%0, #1\n"			\
 		"	b	3b\n"				\
@@ -301,7 +301,7 @@ union offset_union {
 		"	mov	%1, %1, "NEXT_BYTE"\n"		\
 		"4:	"ins"	%1, [%2]\n"			\
 		"5:\n"						\
-		"	.pushsection .fixup,\"ax\"\n"		\
+		"	.pushsection .text.fixup,\"ax\"\n"	\
 		"	.align	2\n"				\
 		"6:	mov	%0, #1\n"			\
 		"	b	5b\n"				\
diff --git a/arch/arm/nwfpe/entry.S b/arch/arm/nwfpe/entry.S
index 5d65be1f1e8a..71df43547659 100644
--- a/arch/arm/nwfpe/entry.S
+++ b/arch/arm/nwfpe/entry.S
@@ -113,7 +113,7 @@ next:
 	@ to fault.  Emit the appropriate exception gunk to fix things up.
 	@ ??? For some reason, faults can happen at .Lx2 even with a
 	@ plain LDR instruction.  Weird, but it seems harmless.
-	.pushsection .fixup,"ax"
+	.pushsection .text.fixup,"ax"
 	.align	2
 .Lfix:	ret	r9			@ let the user eat segfaults
 	.popsection
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 8/8] kallsyms: allow kallsyms data to reside in the .data section
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2015-03-13 12:07 ` [PATCH v2 7/8] ARM: keep .text and .fixup regions together Ard Biesheuvel
@ 2015-03-13 12:07 ` Ard Biesheuvel
  2015-03-18  7:54 ` [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
  8 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-13 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

On architectures such as ARM, the default location of the kallsyms
data in the rodata section may be problematic, as it then sits right
between the .text and .init.text/.exit.text sections. This is usually
not a problem, but as soon as the code size exceeds a certain threshold,
the linker will start adding trampolines to ensure the two code regions
can reach each other through ordinary relative branches. This causes
inconsistencies between subsequent versions of the kallsyms data,
causing the build to fail.

This adds a Kconfig symbol that, when set, causes the kallsyms data
regions to be moved to the .data section instead, which works around
this problem.

Cc: linux-arch@vger.kernel.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/Kconfig                  |  1 +
 include/asm-generic/vmlinux.lds.h | 12 +++++++++++-
 init/Kconfig                      |  4 ++++
 scripts/kallsyms.c                |  2 +-
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9f1f09a2bc9b..639e215bd9a1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -11,6 +11,7 @@ config ARM
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_HAVE_KALLSYMS_IN_DATA_SECTION
 	select BUILDTIME_EXTABLE_SORT if MMU
 	select CLONE_BACKWARDS
 	select CPU_PM if (SUSPEND || CPU_IDLE)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 463231d5bfc7..09f93bfaad0e 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -150,6 +150,14 @@
 #define TRACE_SYSCALLS()
 #endif
 
+#ifdef	CONFIG_ARCH_HAVE_KALLSYMS_IN_DATA_SECTION
+#define	KALLSYMS_RODATA
+#define	KALLSYMS_DATA		*(.kallsyms_data)
+#else
+#define	KALLSYMS_RODATA		*(.kallsyms_data)
+#define	KALLSYMS_DATA
+#endif
+
 
 #define ___OF_TABLE(cfg, name)	_OF_TABLE_##cfg(name)
 #define __OF_TABLE(cfg, name)	___OF_TABLE(cfg, name)
@@ -197,7 +205,8 @@
 	LIKELY_PROFILE()		       				\
 	BRANCH_PROFILE()						\
 	TRACE_PRINTKS()							\
-	TRACEPOINT_STR()
+	TRACEPOINT_STR()						\
+	KALLSYMS_DATA
 
 /*
  * Data section helpers
@@ -234,6 +243,7 @@
 	.rodata           : AT(ADDR(.rodata) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start_rodata) = .;			\
 		*(.rodata) *(.rodata.*)					\
+		KALLSYMS_RODATA						\
 		*(__vermagic)		/* Kernel version magic */	\
 		. = ALIGN(8);						\
 		VMLINUX_SYMBOL(__start___tracepoints_ptrs) = .;		\
diff --git a/init/Kconfig b/init/Kconfig
index 058e3671fa11..d6f4920f3487 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1410,6 +1410,10 @@ config KALLSYMS_ALL
 
 	   Say N unless you really need all symbols.
 
+config ARCH_HAVE_KALLSYMS_IN_DATA_SECTION
+	bool
+	depends on KALLSYMS
+
 config PRINTK
 	default y
 	bool "Enable support for printk" if EXPERT
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index c6d33bd15b04..b23682a967e0 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -333,7 +333,7 @@ static void write_src(void)
 	printf("#define ALGN .align 4\n");
 	printf("#endif\n");
 
-	printf("\t.section .rodata, \"a\"\n");
+	printf("\t.section .kallsyms_data, \"a\"\n");
 
 	/* Provide proper symbols relocatability by their '_text'
 	 * relativeness.  The symbol names cannot be used to construct
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl)
  2015-03-13 12:07 ` [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl) Ard Biesheuvel
@ 2015-03-13 16:40   ` Russell King - ARM Linux
  2015-03-17 20:35     ` Ard Biesheuvel
  2015-03-18 10:07   ` Ard Biesheuvel
  1 sibling, 1 reply; 22+ messages in thread
From: Russell King - ARM Linux @ 2015-03-13 16:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Mar 13, 2015 at 01:07:27PM +0100, Ard Biesheuvel wrote:
> +	.macro	bl_abs, target, c=
> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> +	movw\c	lr, #:lower16:\target
> +	movt\c	lr, #:upper16:\target
> +	blx\c	lr

So I've looked this up, and it's valid, which is surprising because BLX
itself writes to LR - the read from LR must happen before BLX itself
writes to LR.  Thankfully, because of the pipelining, this is probably
guaranteed.

I wonder whether there will be any errata on this... maybe on non-ARM
CPUs?  It'll be interesting to find out what happens once we merge
this... :)
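
For reference, a toy C model of the ordering in question, following the
ARM ARM pseudocode for BLX (register) as far as I read it: the operand
register is read before LR is written. The struct and function names
are invented, and only ARM state is modelled.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_LR = 14, REG_PC = 15 };

struct cpu {
	uint32_t r[16];
};

/* Models an ARM-state "blx <Rm>" located at address 'insn'. */
static void blx_reg(struct cpu *c, uint32_t insn, int rm)
{
	uint32_t target;

	/* Using pc as Rm is not modelled here. */
	assert(rm >= 0 && rm < REG_PC);

	target = c->r[rm];		/* 1. read the branch target */
	c->r[REG_LR] = insn + 4;	/* 2. return address: next instruction */
	c->r[REG_PC] = target & ~1u;	/* 3. branch; bit 0 only selects the
					 *    instruction set and is dropped */
}
```

So with rm == lr, the branch goes to the old value of LR and LR then
holds the return address, which is what bl_abs relies on.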

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl)
  2015-03-13 16:40   ` Russell King - ARM Linux
@ 2015-03-17 20:35     ` Ard Biesheuvel
  0 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-17 20:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 March 2015 at 17:40, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Mar 13, 2015 at 01:07:27PM +0100, Ard Biesheuvel wrote:
>> +     .macro  bl_abs, target, c=
>> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
>> +     movw\c  lr, #:lower16:\target
>> +     movt\c  lr, #:upper16:\target
>> +     blx\c   lr
>
> So I've looked this up, and it's valid, which is surprising because BLX
> itself writes to LR - the read from LR must happen before BLX itself
> writes to LR.  Thankfully, because of the pipelining, this is probably
> guaranteed.
>

I hadn't given it another thought, to be honest, as arithmetic
instructions can also use the same register as input and output.
But I suppose branch instructions don't go through all the ordinary
pipeline stages.

> I wonder whether there will be any errata on this... maybe on non-ARM
> CPUs?  It'll be interesting to find out what happens once we merge
> this... :)
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/8] ARM kernel size fixes
  2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2015-03-13 12:07 ` [PATCH v2 8/8] kallsyms: allow kallsyms data to reside in the .data section Ard Biesheuvel
@ 2015-03-18  7:54 ` Ard Biesheuvel
  8 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-18  7:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 March 2015 at 13:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> This series is a suggested approach to preventing linker failures on large
> kernels. It is somewhat unpolished, and posted for comments/testing primarily.
>
> The issues were found and reported by Arnd Bergmann, and these patches are
> loosely based on his initial approach to work around them.
>
> Changes since v1:
> - Updated PROCINFO patch (#1) to refer to the base of the struct by name, and
>   simplify the calling code (rmk)
> - Updated b_far/bl_far patch (#3) to remove ARM/THUMB alternatives and use a
>   conditionally defined PC_BIAS instead. Also added b_abs/bl_abs versions,
>   which can only be used for absolute branches but can be implemented in fewer
>   instructions. Added conditional branch support as well.
> - introduce (#6) and use (#7) the .text.fixup input section which gets emitted
>   after each .text section for each .o
> - added patch #8 that allows the kallsyms data to be moved to .data
>

I put the following ones in the patch tracker:

>   ARM: replace PROCINFO embedded branch with relative offset
>   ARM: add macro to perform far branches (b/bl)
>   ARM: use bl_far to call __hyp_stub_install_secondary from the .data
>     section
>   ARM: move the .idmap.text section closer to .head.text

Arnd, may I have your ack on these if you think this approach is ok?

>   asm-generic: introduce .text.fixup input section
>   ARM: keep .text and .fixup regions together

This one can be dropped and/or deferred. I don't need it to build
Arnd's dotconfig-from-hell successfully, and there is a KVM patch
under review that touches the same part of the linker script.

>   ARM: move HYP text to end of .text section

This one has not been discussed at all, so let's defer it for now.

>   kallsyms: allow kallsyms data to reside in the .data section
>

Regards,
Ard.


>  arch/arm/Kconfig                      |  1 +
>  arch/arm/include/asm/assembler.h      | 83 +++++++++++++++++++++++++++++++++++
>  arch/arm/include/asm/futex.h          |  2 +-
>  arch/arm/include/asm/uaccess.h        | 10 ++---
>  arch/arm/include/asm/word-at-a-time.h |  2 +-
>  arch/arm/kernel/entry-armv.S          |  2 +-
>  arch/arm/kernel/head.S                | 14 +++---
>  arch/arm/kernel/sleep.S               |  2 +-
>  arch/arm/kernel/swp_emulate.c         |  2 +-
>  arch/arm/kernel/vmlinux.lds.S         | 15 ++++---
>  arch/arm/kvm/init.S                   |  5 +--
>  arch/arm/kvm/interrupts.S             |  4 +-
>  arch/arm/lib/clear_user.S             |  2 +-
>  arch/arm/lib/copy_to_user.S           |  2 +-
>  arch/arm/lib/csumpartialcopyuser.S    |  2 +-
>  arch/arm/mm/alignment.c               |  6 +--
>  arch/arm/mm/proc-arm1020.S            |  4 +-
>  arch/arm/mm/proc-arm1020e.S           |  4 +-
>  arch/arm/mm/proc-arm1022.S            |  4 +-
>  arch/arm/mm/proc-arm1026.S            |  4 +-
>  arch/arm/mm/proc-arm720.S             |  4 +-
>  arch/arm/mm/proc-arm740.S             |  4 +-
>  arch/arm/mm/proc-arm7tdmi.S           |  4 +-
>  arch/arm/mm/proc-arm920.S             |  4 +-
>  arch/arm/mm/proc-arm922.S             |  4 +-
>  arch/arm/mm/proc-arm925.S             |  4 +-
>  arch/arm/mm/proc-arm926.S             |  4 +-
>  arch/arm/mm/proc-arm940.S             |  4 +-
>  arch/arm/mm/proc-arm946.S             |  4 +-
>  arch/arm/mm/proc-arm9tdmi.S           |  4 +-
>  arch/arm/mm/proc-fa526.S              |  4 +-
>  arch/arm/mm/proc-feroceon.S           |  5 ++-
>  arch/arm/mm/proc-macros.S             |  4 ++
>  arch/arm/mm/proc-mohawk.S             |  4 +-
>  arch/arm/mm/proc-sa110.S              |  4 +-
>  arch/arm/mm/proc-sa1100.S             |  4 +-
>  arch/arm/mm/proc-v6.S                 |  4 +-
>  arch/arm/mm/proc-v7.S                 | 28 ++++++------
>  arch/arm/mm/proc-v7m.S                |  4 +-
>  arch/arm/mm/proc-xsc3.S               |  4 +-
>  arch/arm/mm/proc-xscale.S             |  4 +-
>  arch/arm/nwfpe/entry.S                |  2 +-
>  include/asm-generic/vmlinux.lds.h     | 14 +++++-
>  init/Kconfig                          |  4 ++
>  scripts/kallsyms.c                    |  2 +-
>  45 files changed, 200 insertions(+), 101 deletions(-)
>
> --
> 1.8.3.2
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl)
  2015-03-13 12:07 ` [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl) Ard Biesheuvel
  2015-03-13 16:40   ` Russell King - ARM Linux
@ 2015-03-18 10:07   ` Ard Biesheuvel
  2015-03-19  9:01     ` [PATCH] " Ard Biesheuvel
  1 sibling, 1 reply; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-18 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 March 2015 at 13:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> These macros execute PC-relative branches, but with a larger
> reach than the 24 bits that are available in the b and bl opcodes.
>
> Acked-by: Nicolas Pitre <nico@linaro.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm/include/asm/assembler.h | 83 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)
>
> diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
> index f67fd3afebdf..2e7f55194782 100644
> --- a/arch/arm/include/asm/assembler.h
> +++ b/arch/arm/include/asm/assembler.h
> @@ -88,6 +88,17 @@
>  #endif
>
>  /*
> + * The program counter is always ahead of the address of the currently
> + * executing instruction by PC_BIAS bytes, whose value differs depending
> + * on the execution mode.
> + */
> +#ifdef CONFIG_THUMB2_KERNEL
> +#define PC_BIAS                4
> +#else
> +#define PC_BIAS                8
> +#endif
> +
> +/*
>   * Enable and disable interrupts
>   */
>  #if __LINUX_ARM_ARCH__ >= 6
> @@ -108,6 +119,78 @@
>         .endm
>  #endif
>
> +       /*
> +        * Macros to emit relative conditional branches that may exceed the
> +        * range of the 24-bit immediate of the ordinary b/bl instructions.
> +        * NOTE: this doesn't work with locally defined symbols, as they
> +        * lack the ARM/Thumb annotation (even if they are annotated as
> +        * functions)
> +        */
> +       .macro  b_far, target, tmpreg, c=
> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> +       movw\c  \tmpreg, #:lower16:(\target - (8888f + PC_BIAS))
> +       movt\c  \tmpreg, #:upper16:(\target - (8888f + PC_BIAS))
> +8888:  add\c   pc, pc, \tmpreg
> +#else
> +       ldr\c   \tmpreg, 8889f
> +8888:  add\c   pc, pc, \tmpreg
> +       .ifnb   \c
> +       b       8890f
> +       .endif
> +8889:  .long   \target - (8888b + PC_BIAS)
> +8890:
> +#endif
> +       .endm

Actually, I have found something better:

add\c \tmpreg, pc, #:pc_g0_nc:\target - PC_BIAS
add\c \tmpreg, \tmpreg, #:pc_g1_nc:\target - PC_BIAS + 4
add\c pc, \tmpreg, #:pc_g2:\target - PC_BIAS + 8

This uses PC-relative group relocations to split the offset into
chunks that fit the 12-bit immediate field of each add instruction.
This way, we don't need the literal at all.

Note that add with pc as destination is ARM-only, so we should
probably retain the v7 movw/movt regardless.
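
A rough C sketch of how such a split could work, based on my reading of
the group relocation rules in the ARM ELF ABI: each group takes the
most significant 8-bit window of what is left, aligned to an even bit
position so that it can be encoded as a rotated immediate. The function
names are invented, only the magnitude of the offset is handled, and
the PC_BIAS adjustments from the snippet above are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit of a non-zero value. */
static int msb_index(uint32_t x)
{
	int i = 0;

	assert(x != 0);
	while (x >>= 1)
		i++;
	return i;
}

/* Top part of x that fits an 8-bit immediate at an even shift. */
static uint32_t top_chunk(uint32_t x)
{
	int shift;

	if (x == 0)
		return 0;
	shift = msb_index(x) - 7;
	if (shift < 0)
		shift = 0;
	if (shift & 1)
		shift++;	/* rotations come in steps of two bits */
	return x & ((uint32_t)0xff << shift);
}

/*
 * Split x into three chunks (G0, G1, G2) and return what is left over.
 * A non-zero result means three add instructions cannot express x.
 */
static uint32_t split_groups(uint32_t x, uint32_t chunk[3])
{
	for (int g = 0; g < 3; g++) {
		chunk[g] = top_chunk(x);
		x -= chunk[g];
	}
	return x;
}
```

For example, 0x03fffff8 splits cleanly into 0x03fc0000, 0x0003fc00 and
0x000003f8, whereas an arbitrary value such as 0x12345678 leaves a
remainder after three chunks.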


> +
> +       .macro  bl_far, target, c=
> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> +       movw\c  ip, #:lower16:(\target - (8887f + PC_BIAS))
> +       movt\c  ip, #:upper16:(\target - (8887f + PC_BIAS))
> +8887:  add\c   ip, ip, pc
> +       blx\c   ip
> +#else
> +       adr\c   lr, 8887f
> +       b_far   \target, ip, \c
> +8887:
> +#endif
> +       .endm
> +
> +       /*
> +        * Macros to emit absolute conditional branches: these are preferred
> +        * over the far variants above because they use fewer instructions
> +        * and/or use implicit literals that the assembler can group together
> +        * to optimize cache utilization. However, they can only be used to
> +        * call functions at their link time address, which rules out early boot
> +        * code that executes with the MMU off.
> +        * The v7 variant uses a movt/movw pair to prevent potential D-cache
> +        * stalls on the literal, so using these macros is preferred over using
> +        * 'ldr pc, =XXX' directly (unless no scratch register is available)
> +        * NOTE: this doesn't work with locally defined symbols, as they
> +        * lack the ARM/Thumb annotation (even if they are annotated as
> +        * functions)
> +        */
> +       .macro  b_abs, target, tmpreg, c=
> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> +       movw\c  \tmpreg, #:lower16:\target
> +       movt\c  \tmpreg, #:upper16:\target
> +       bx\c    \tmpreg
> +#else
> +       ldr\c   pc, =\target
> +#endif
> +       .endm
> +
> +       .macro  bl_abs, target, c=
> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> +       movw\c  lr, #:lower16:\target
> +       movt\c  lr, #:upper16:\target
> +       blx\c   lr
> +#else
> +       adr\c   lr, BSYM(8886f)
> +       ldr\c   pc, =\target
> +8886:
> +#endif
> +       .endm
> +
>         .macro asm_trace_hardirqs_off
>  #if defined(CONFIG_TRACE_IRQFLAGS)
>         stmdb   sp!, {r0-r3, ip, lr}
> --
> 1.8.3.2
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 6/8] asm-generic: introduce .text.fixup input section
  2015-03-13 12:07 ` [PATCH v2 6/8] asm-generic: introduce .text.fixup input section Ard Biesheuvel
@ 2015-03-18 18:58   ` Arnd Bergmann
  0 siblings, 0 replies; 22+ messages in thread
From: Arnd Bergmann @ 2015-03-18 18:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 13 March 2015, Ard Biesheuvel wrote:
> This introduces a new .text.fixup input section that gets emitted
> together with the .text section for each input object file.
> 
> Note that
> 
>   *(.text)
>   *(.text.fixup)
> 
> is not the same as
> 
>   *(.text .text.fixup)
> 
> and we are looking for the latter, to ensure that fixup snippets that
> are assembled into a separate section in the object file do not end
> up out of range for the relative branch instructions they contain if
> the .text section itself grows very large.
> 
> This helps prevent linker failures on large ARM kernels.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Acked-by: Arnd Bergmann <arnd@arndb.de>

Let's merge this together with the other patches rather than using 
the asm-generic git.
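
To illustrate the ordering difference described in the quoted commit
message, here is a rough model in Python of how the two linker script
patterns lay out input sections. It is only a sketch: the object file
names (a.o, b.o) and the helper functions are made up for illustration,
and it models nothing beyond the placement order.

```python
# Rough model of input section ordering for the two linker script patterns.
# The object files and their section lists are hypothetical.
objects = [
    ("a.o", [".text", ".text.fixup"]),
    ("b.o", [".text", ".text.fixup"]),
]


def place_separately(objs, patterns):
    """Model '*(.text)' followed by '*(.text.fixup)': one pass per pattern."""
    layout = []
    for pattern in patterns:
        for name, sections in objs:
            for sec in sections:
                if sec == pattern:
                    layout.append((name, sec))
    return layout


def place_together(objs, patterns):
    """Model '*(.text .text.fixup)': one pass, file by file."""
    layout = []
    for name, sections in objs:
        for sec in sections:
            if sec in patterns:
                layout.append((name, sec))
    return layout


separate = place_separately(objects, [".text", ".text.fixup"])
together = place_together(objects, [".text", ".text.fixup"])

for label, layout in (("separate", separate), ("together", together)):
    print(label, layout)
```

In the "together" layout each object's fixup code sits right after its
own .text, so the distance a relative branch has to cover does not grow
with the total size of .text.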

> ---
>  include/asm-generic/vmlinux.lds.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index ac78910d7416..463231d5bfc7 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -401,7 +401,7 @@
>  #define TEXT_TEXT							\
>  		ALIGN_FUNCTION();					\
>  		*(.text.hot)						\
> -		*(.text)						\
> +		*(.text .text.fixup)					\
>  		*(.ref.text)						\
>  	MEM_KEEP(init.text)						\
>  	MEM_KEEP(exit.text)						\
> -- 
> 1.8.3.2
> 
> 


* [PATCH] ARM: add macro to perform far branches (b/bl)
  2015-03-18 10:07   ` Ard Biesheuvel
@ 2015-03-19  9:01     ` Ard Biesheuvel
  0 siblings, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-03-19  9:01 UTC (permalink / raw)
  To: linux-arm-kernel

OK, so this is what I came up with in the end. I dropped b_abs/bl_abs as
they are not needed anymore, now that b_far/bl_far are emitted without
any explicit or implicit literals.

I updated the ARCH check so that movw/movt really only get used on
v7-targeted builds. I also updated the v7 variant to use bx instead
of adding with the PC as destination register, as this is deprecated
by the ARM ARM.
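
As a rough illustration of the arithmetic behind the v7 variant, the
Python sketch below models a movw/movt pair followed by an add of the
PC. The addresses are hypothetical, the function names are mine, and
the model ignores the Thumb bit and any PC alignment effects; it only
shows that adding the biased PC to (target - (label + PC_BIAS)) yields
the target, and that movw zero-extends while movt preserves the low
half.

```python
MASK32 = 0xFFFFFFFF


def movw(reg, imm16):
    """MOVW: write imm16 to the low half and zero the high half."""
    return imm16 & 0xFFFF


def movt(reg, imm16):
    """MOVT: write imm16 to the high half and keep the low half."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


def far_branch_target(label, target, pc_bias):
    """Model 'movw; movt; add r, r, pc' where the add sits at 'label'."""
    offset = (target - (label + pc_bias)) & MASK32
    r = movw(0, offset & 0xFFFF)
    r = movt(r, offset >> 16)
    pc_as_read = (label + pc_bias) & MASK32  # PC value seen by the add
    return (r + pc_as_read) & MASK32


# Hypothetical addresses, 64 MiB apart (beyond the reach of a plain ARM b/bl).
print(hex(far_branch_target(0xC0008000, 0xC4000000, 8)))  # ARM: PC_BIAS == 8
print(hex(far_branch_target(0xC4000000, 0xC0008000, 4)))  # Thumb2: PC_BIAS == 4
```

Because movw clears the upper half, the order matters in this model:
the low half has to be written before the high half.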

--------------------8<-----------------------

These macros execute PC-relative branches, but with a larger
reach than the 24 bits that are available in the b and bl opcodes.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/assembler.h | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index f67fd3afebdf..1b9a630f93e0 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -88,6 +88,17 @@
 #endif
 
 /*
+ * The program counter is always ahead of the address of the currently
+ * executing instruction by PC_BIAS bytes, whose value differs depending
+ * on the execution mode.
+ */
+#ifdef CONFIG_THUMB2_KERNEL
+#define PC_BIAS		4
+#else
+#define PC_BIAS		8
+#endif
+
+/*
  * Enable and disable interrupts
  */
 #if __LINUX_ARM_ARCH__ >= 6
@@ -108,6 +119,39 @@
 	.endm
 #endif
 
+	/*
+	 * Macros to emit relative conditional branches that may exceed the
+	 * range of the 24-bit immediate of the ordinary b/bl instructions.
+	 * NOTE: this doesn't work with locally defined symbols, as they
+	 * lack the ARM/Thumb annotation (even if they are annotated as
+	 * functions)
+	 */
+	.macro  b_far, target, r, c=, b=bx
+#if __LINUX_ARM_ARCH__ >= 7
+	movw\c	\r, #:lower16:(\target - (8888f + PC_BIAS))
+	movt\c	\r, #:upper16:(\target - (8888f + PC_BIAS))
+8888:	add\c	\r, \r, pc
+	\b\c	\r
+#else
+	/*
+	 * Compute the PC-relative offset of \target. We need to correct for
+	 * the bias when reading the PC at label 8888, and for the offset
+	 * between the place of the read and the place of the relocation.
+	 */
+8888:	add\c	\r, pc, #:pc_g0_nc:(\target - PC_BIAS + (. - 8888b))
+	add\c	\r, \r, #:pc_g1_nc:(\target - PC_BIAS + (. - 8888b))
+	add\c	pc, \r, #:pc_g2:(\target - PC_BIAS + (. - 8888b))
+#endif
+	.endm
+
+	.macro	bl_far, target, c=
+#if __LINUX_ARM_ARCH__ < 7
+	adr\c	lr, 8887f
+#endif
+	b_far	\target, ip, \c, blx
+8887:
+	.endm
+
 	.macro asm_trace_hardirqs_off
 #if defined(CONFIG_TRACE_IRQFLAGS)
 	stmdb   sp!, {r0-r3, ip, lr}
-- 
1.8.3.2


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-03-13 12:07 ` [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset Ard Biesheuvel
@ 2015-04-19 16:59   ` Joachim Eastwood
  2015-04-19 17:08     ` Russell King - ARM Linux
  0 siblings, 1 reply; 22+ messages in thread
From: Joachim Eastwood @ 2015-04-19 16:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,
On 13 March 2015 at 13:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> This patch replaces the 'branch to setup()' instructions embedded
> in the PROCINFO structs with the offset to that setup function
> relative to the base of the struct. This preserves the position
> independent nature of that field, but uses a data item rather
> than an instruction.
>
> This is mainly done to prevent linker failures on large kernels,
> where the setup function is out of reach for the branch.

This commit (bf35706f3d09 in Linus master) breaks booting on ARMv7-M.

When I try to boot Linus master now on my NXP LPC4357 (Cortex-M4) dev
kit I get the following message from u-boot.
## Booting kernel from Legacy Image at 29000000 ...
Image Name: Linux
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1412318 Bytes = 1.3 MB
Load Address: 28008000
Entry Point: 28008001
Verifying Checksum ... OK
Loading Kernel Image ... OK
OK

Starting kernel ...

UNHANDLED EXCEPTION: HARD FAULT
R0 = ffffffff R1 = 00001038
R2 = 281d8711 R3 = 00000000
R12 = 2822092c LR = 28008023
PC = 2822092e PSR = 21000000

Reverting bf35706f3d09 (plus fixing a small conflict) makes Linus
master boot again.

I am using the following compiler:
gcc version 4.9.2 20140904 (prerelease) (crosstool-NG
linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09

The ARMv7-M machine that I am using is not upstream yet, but you can
find the patch set on the mailing list.

regards,
Joachim Eastwood


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-04-19 16:59   ` Joachim Eastwood
@ 2015-04-19 17:08     ` Russell King - ARM Linux
  2015-04-19 17:41       ` Ard Biesheuvel
  2015-04-19 19:24       ` Joachim Eastwood
  0 siblings, 2 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2015-04-19 17:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Apr 19, 2015 at 06:59:45PM +0200, Joachim Eastwood wrote:
> Hi Ard,
> On 13 March 2015 at 13:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > This patch replaces the 'branch to setup()' instructions embedded
> > in the PROCINFO structs with the offset to that setup function
> > relative to the base of the struct. This preserves the position
> > independent nature of that field, but uses a data item rather
> > than an instruction.
> >
> > This is mainly done to prevent linker failures on large kernels,
> > where the setup function is out of reach for the branch.
> 
> This commit (bf35706f3d09 in Linus master) breaks booting on ARMv7-M.
> 
> When I try to boot Linus master now on my NXP LPC4357 (Cortex-M4) dev
> kit I get the following message from u-boot.
> ## Booting kernel from Legacy Image at 29000000 ...
> Image Name: Linux
> Image Type: ARM Linux Kernel Image (uncompressed)
> Data Size: 1412318 Bytes = 1.3 MB
> Load Address: 28008000
> Entry Point: 28008001
> Verifying Checksum ... OK
> Loading Kernel Image ... OK
> OK
> 
> Starting kernel ...
> 
> UNHANDLED EXCEPTION: HARD FAULT
> R0 = ffffffff R1 = 00001038
> R2 = 281d8711 R3 = 00000000
> R12 = 2822092c LR = 28008023
> PC = 2822092e PSR = 21000000
> 
> Reverting bf35706f3d09 (plus fixing a small conflict) makes Linus
> master boot again.
> 
> I am using the following compiler:
> gcc version 4.9.2 20140904 (prerelease) (crosstool-NG
> linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09
> 
> The ARMv7-M machine that I am using is not upstream yet, but you can
> find the patch set on the mailing list.

Interesting... it works here with stock gcc 4.9.2.  Maybe it's a bug in
the Linaro gcc?

Could you mail me (privately) your vmlinux file (the one in the root
directory) for analysis please?

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-04-19 17:08     ` Russell King - ARM Linux
@ 2015-04-19 17:41       ` Ard Biesheuvel
  2015-04-19 19:28         ` Russell King - ARM Linux
  2015-04-19 19:24       ` Joachim Eastwood
  1 sibling, 1 reply; 22+ messages in thread
From: Ard Biesheuvel @ 2015-04-19 17:41 UTC (permalink / raw)
  To: linux-arm-kernel


> On 19 apr. 2015, at 19:08, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
>> On Sun, Apr 19, 2015 at 06:59:45PM +0200, Joachim Eastwood wrote:
>> Hi Ard,
>>> On 13 March 2015 at 13:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> This patch replaces the 'branch to setup()' instructions embedded
>>> in the PROCINFO structs with the offset to that setup function
>>> relative to the base of the struct. This preserves the position
>>> independent nature of that field, but uses a data item rather
>>> than an instruction.
>>> 
>>> This is mainly done to prevent linker failures on large kernels,
>>> where the setup function is out of reach for the branch.
>> 
>> This commit (bf35706f3d09 in Linus master) breaks booting on ARMv7-M.
>> 
>> When I try to boot Linus master now on my NXP LPC4357 (Cortex-M4) dev
>> kit I get the following message from u-boot.
>> ## Booting kernel from Legacy Image at 29000000 ...
>> Image Name: Linux
>> Image Type: ARM Linux Kernel Image (uncompressed)
>> Data Size: 1412318 Bytes = 1.3 MB
>> Load Address: 28008000
>> Entry Point: 28008001
>> Verifying Checksum ... OK
>> Loading Kernel Image ... OK
>> OK
>> 
>> Starting kernel ...
>> 
>> UNHANDLED EXCEPTION: HARD FAULT
>> R0 = ffffffff R1 = 00001038
>> R2 = 281d8711 R3 = 00000000
>> R12 = 2822092c LR = 28008023
>> PC = 2822092e PSR = 21000000
>> 
>> Reverting bf35706f3d09 (plus fixing a small conflict) makes Linus
>> master boot again.
>> 
>> I am using the following compiler:
>> gcc version 4.9.2 20140904 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09
>> 
>> The ARMv7-M machine that I am using is not upstream yet, but you can
>> find the patch set on the mailing list.
> 
> Interesting... it works here with stock gcc 4.9.2.  Maybe it's a bug in
> the Linaro gcc?
> 
> Could you mail me (privately) your vmlinux file (the one in the root
> directory) for analysis please?

I am away from my work PC so I can't check, but I wonder whether all
setup functions are correctly annotated as Thumb2 when built in Thumb2
mode. If not, it would explain why a plain branch works but doing
arithmetic on the address doesn't.

Perhaps a BSYM() around the setup function in question is sufficient
to solve this?

Ard.


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-04-19 17:08     ` Russell King - ARM Linux
  2015-04-19 17:41       ` Ard Biesheuvel
@ 2015-04-19 19:24       ` Joachim Eastwood
  1 sibling, 0 replies; 22+ messages in thread
From: Joachim Eastwood @ 2015-04-19 19:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 April 2015 at 19:08, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Apr 19, 2015 at 06:59:45PM +0200, Joachim Eastwood wrote:
>> Hi Ard,
>> On 13 March 2015 at 13:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > This patch replaces the 'branch to setup()' instructions embedded
>> > in the PROCINFO structs with the offset to that setup function
>> > relative to the base of the struct. This preserves the position
>> > independent nature of that field, but uses a data item rather
>> > than an instruction.
>> >
>> > This is mainly done to prevent linker failures on large kernels,
>> > where the setup function is out of reach for the branch.
>>
>> This commit (bf35706f3d09 in Linus master) breaks booting on ARMv7-M.
>>
>> When I try to boot Linus master now on my NXP LPC4357 (Cortex-M4) dev
>> kit I get the following message from u-boot.
>> ## Booting kernel from Legacy Image at 29000000 ...
>> Image Name: Linux
>> Image Type: ARM Linux Kernel Image (uncompressed)
>> Data Size: 1412318 Bytes = 1.3 MB
>> Load Address: 28008000
>> Entry Point: 28008001
>> Verifying Checksum ... OK
>> Loading Kernel Image ... OK
>> OK
>>
>> Starting kernel ...
>>
>> UNHANDLED EXCEPTION: HARD FAULT
>> R0 = ffffffff R1 = 00001038
>> R2 = 281d8711 R3 = 00000000
>> R12 = 2822092c LR = 28008023
>> PC = 2822092e PSR = 21000000
>>
>> Reverting bf35706f3d09 (plus fixing a small conflict) makes Linus
>> master boot again.
>>
>> I am using the following compiler:
>> gcc version 4.9.2 20140904 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09
>>
>> The ARMv7-M machine that I am using is not upstream yet, but you can
>> find the patch set on the mailing list.
>
> Interesting... it works here with stock gcc 4.9.2.  Maybe it's a bug in
> the Linaro gcc?

I tried the ARM crosscompiler from
https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/
and it gives me the same result as the Linaro one.

gcc version 4.9.0 (GCC)

> Could you mail me (privately) your vmlinux file (the one in the root
> directory) for analysis please?

Sure (mail already sent).

regards,
Joachim Eastwood


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-04-19 17:41       ` Ard Biesheuvel
@ 2015-04-19 19:28         ` Russell King - ARM Linux
  2015-04-19 19:45           ` Joachim Eastwood
  2015-04-19 21:52           ` Ard Biesheuvel
  0 siblings, 2 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2015-04-19 19:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Apr 19, 2015 at 07:41:08PM +0200, Ard Biesheuvel wrote:
> I am away from my work pc so i can't check but i wonder if all setup
> functions are correctly annotated as thumb2 when built in thumb2 mode.
> If not, it would explain why a plain branch works but doing arithmetic
> on the address doesn't.

Yes, it's a Thumb2 kernel, but more importantly, it's a nommu kernel,
and the nommu code wasn't touched.

So, the entry code looks like this:

28008000:       f8df 9024       ldr.w   r9, [pc, #36]   ; 28008028 <__after_proc_init+0x4>
28008004:       f8d9 9000       ldr.w   r9, [r9]
28008008:       f001 f926       bl      28009258 <__lookup_processor_type>
2800800c:       ea5f 0a05       movs.w  sl, r5
28008010:       f001 8164       beq.w   280092dc <__error_p>
28008014:       f8df d014       ldr.w   sp, [pc, #20]   ; 2800802c <__after_proc_init+0x8>
28008018:       f20f 0e07       addw    lr, pc, #7
2800801c:       f10a 0c10       add.w   ip, sl, #16
28008020:       46e7            mov     pc, ip
28008022:       e7ff            b.n     28008024 <__after_proc_init>

which results in us jumping to:

2822091c <__proc_info_begin>:
2822091c:       000f0000        andeq   r0, pc, r0
28220920:       000f0000        andeq   r0, pc, r0
        ...
2822092c:       fff5ce6d                        ; <UNDEFINED> instruction: 0xfff5ce6d

^^^ here.  That's an offset from the beginning of the structure, which
gives us an address of 0x2817d789, which would be correct:

2817d788 <__v7m_setup>:
2817d788:       4829            ldr     r0, [pc, #164]  ; (2817d830 <v7m_processor_functions+0x30>)
2817d78a:       f8df c0a8       ldr.w   ip, [pc, #168]  ; 2817d834 <v7m_processor_functions+0x34>
2817d78e:       f8c0 c008       str.w   ip, [r0, #8]
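
The address arithmetic above can be checked with a few lines of Python.
This is only a sketch using the values from the disassembly; the
variable names are mine, and PROCINFO_INITFUNC is taken to be 16 based
on the 'add.w ip, sl, #16' in the entry code.

```python
MASK32 = 0xFFFFFFFF
PROCINFO_INITFUNC = 16       # assumed from 'add.w ip, sl, #16'

procinfo_base = 0x2822091C   # __proc_info_begin
initfunc_word = 0xFFF5CE6D   # word stored at 0x2822092c

# Unpatched nommu entry code: branch to the field itself (data, not code).
old_target = (procinfo_base + PROCINFO_INITFUNC) & MASK32

# Patched entry code: load the word and add the base of the struct.
new_target = (initfunc_word + procinfo_base) & MASK32

print(hex(old_target))       # 0x2822092c, the R12 value in the fault dump
print(hex(new_target))       # 0x2817d789, i.e. __v7m_setup with bit 0 set
```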

The patch below should resolve it - Joachim, please confirm:

diff --git a/arch/arm/kernel/head-nommu.S b/arch/arm/kernel/head-nommu.S
index 455033110078..5925449f6f04 100644
--- a/arch/arm/kernel/head-nommu.S
+++ b/arch/arm/kernel/head-nommu.S
@@ -80,9 +80,9 @@ ENTRY(stext)
 	ldr	r13, =__mmap_switched		@ address to jump to after
 						@ initialising sctlr
 	adr	lr, BSYM(1f)			@ return (PIC) address
- ARM(	add	pc, r10, #PROCINFO_INITFUNC	)
- THUMB(	add	r12, r10, #PROCINFO_INITFUNC	)
- THUMB(	ret	r12				)
+	ldr	r12, [r10, #PROCINFO_INITFUNC]
+	add	r12, r12, r10
+	ret	r12
  1:	b	__after_proc_init
 ENDPROC(stext)
 
@@ -117,9 +117,9 @@ ENTRY(secondary_startup)
 
 	adr	lr, BSYM(__after_proc_init)	@ return address
 	mov	r13, r12			@ __secondary_switched address
- ARM(	add	pc, r10, #PROCINFO_INITFUNC	)
- THUMB(	add	r12, r10, #PROCINFO_INITFUNC	)
- THUMB(	ret	r12				)
+	ldr	r12, [r10, #PROCINFO_INITFUNC]
+	add	r12, r12, r10
+	ret	r12
 ENDPROC(secondary_startup)
 
 ENTRY(__secondary_switched)


-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-04-19 19:28         ` Russell King - ARM Linux
@ 2015-04-19 19:45           ` Joachim Eastwood
  2015-04-19 21:52           ` Ard Biesheuvel
  1 sibling, 0 replies; 22+ messages in thread
From: Joachim Eastwood @ 2015-04-19 19:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 April 2015 at 21:28, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Apr 19, 2015 at 07:41:08PM +0200, Ard Biesheuvel wrote:
>> I am away from my work pc so i can't check but i wonder if all setup
>> functions are correctly annotated as thumb2 when built in thumb2 mode.
>> If not, it would explain why a plain branch works but doing arithmetic
>> on the address doesn't.
>
> Yes, it's a Thumb2 kernel, but more importantly, it's a nommu kernel,
> and the nommu code wasn't touched.
>
> So, the entry code looks like this:
>
> 28008000:       f8df 9024       ldr.w   r9, [pc, #36]   ; 28008028 <__after_proc_init+0x4>
> 28008004:       f8d9 9000       ldr.w   r9, [r9]
> 28008008:       f001 f926       bl      28009258 <__lookup_processor_type>
> 2800800c:       ea5f 0a05       movs.w  sl, r5
> 28008010:       f001 8164       beq.w   280092dc <__error_p>
> 28008014:       f8df d014       ldr.w   sp, [pc, #20]   ; 2800802c <__after_proc_init+0x8>
> 28008018:       f20f 0e07       addw    lr, pc, #7
> 2800801c:       f10a 0c10       add.w   ip, sl, #16
> 28008020:       46e7            mov     pc, ip
> 28008022:       e7ff            b.n     28008024 <__after_proc_init>
>
> which results in us jumping to:
>
> 2822091c <__proc_info_begin>:
> 2822091c:       000f0000        andeq   r0, pc, r0
> 28220920:       000f0000        andeq   r0, pc, r0
>         ...
> 2822092c:       fff5ce6d                        ; <UNDEFINED> instruction: 0xfff5ce6d
>
> ^^^ here.  That's an offset from the beginning of the structure, which
> gives us an address of 0x2817d789, which would be correct:
>
> 2817d788 <__v7m_setup>:
> 2817d788:       4829            ldr     r0, [pc, #164]  ; (2817d830 <v7m_processor_functions+0x30>)
> 2817d78a:       f8df c0a8       ldr.w   ip, [pc, #168]  ; 2817d834 <v7m_processor_functions+0x34>
> 2817d78e:       f8c0 c008       str.w   ip, [r0, #8]
>
> The patch below should resolve it - Joachim, please confirm:

Yep, patch below makes Linus master boot again on my Cortex-M4 board.
Tested-by: Joachim Eastwood <manabian@gmail.com>


Thanks for debugging and fixing the problem Russell.

regards,
Joachim Eastwood


> diff --git a/arch/arm/kernel/head-nommu.S b/arch/arm/kernel/head-nommu.S
> index 455033110078..5925449f6f04 100644
> --- a/arch/arm/kernel/head-nommu.S
> +++ b/arch/arm/kernel/head-nommu.S
> @@ -80,9 +80,9 @@ ENTRY(stext)
>         ldr     r13, =__mmap_switched           @ address to jump to after
>                                                 @ initialising sctlr
>         adr     lr, BSYM(1f)                    @ return (PIC) address
> - ARM(  add     pc, r10, #PROCINFO_INITFUNC     )
> - THUMB(        add     r12, r10, #PROCINFO_INITFUNC    )
> - THUMB(        ret     r12                             )
> +       ldr     r12, [r10, #PROCINFO_INITFUNC]
> +       add     r12, r12, r10
> +       ret     r12
>   1:    b       __after_proc_init
>  ENDPROC(stext)
>
> @@ -117,9 +117,9 @@ ENTRY(secondary_startup)
>
>         adr     lr, BSYM(__after_proc_init)     @ return address
>         mov     r13, r12                        @ __secondary_switched address
> - ARM(  add     pc, r10, #PROCINFO_INITFUNC     )
> - THUMB(        add     r12, r10, #PROCINFO_INITFUNC    )
> - THUMB(        ret     r12                             )
> +       ldr     r12, [r10, #PROCINFO_INITFUNC]
> +       add     r12, r12, r10
> +       ret     r12
>  ENDPROC(secondary_startup)
>
>  ENTRY(__secondary_switched)
>
>
> --
> FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
> according to speedtest.net.


* [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset
  2015-04-19 19:28         ` Russell King - ARM Linux
  2015-04-19 19:45           ` Joachim Eastwood
@ 2015-04-19 21:52           ` Ard Biesheuvel
  1 sibling, 0 replies; 22+ messages in thread
From: Ard Biesheuvel @ 2015-04-19 21:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 April 2015 at 21:28, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Apr 19, 2015 at 07:41:08PM +0200, Ard Biesheuvel wrote:
>> I am away from my work pc so i can't check but i wonder if all setup
>> functions are correctly annotated as thumb2 when built in thumb2 mode.
>> If not, it would explain why a plain branch works but doing arithmetic
>> on the address doesn't.
>
> Yes, it's a Thumb2 kernel, but more importantly, it's a nommu kernel,
> and the nommu code wasn't touched.
>

Ah, my bad. I had no idea that code was duplicated elsewhere, but I
guess grepping for PROCINFO_INITFUNC would have given me a strong
hint.

> So, the entry code looks like this:
>
> 28008000:       f8df 9024       ldr.w   r9, [pc, #36]   ; 28008028 <__after_proc_init+0x4>
> 28008004:       f8d9 9000       ldr.w   r9, [r9]
> 28008008:       f001 f926       bl      28009258 <__lookup_processor_type>
> 2800800c:       ea5f 0a05       movs.w  sl, r5
> 28008010:       f001 8164       beq.w   280092dc <__error_p>
> 28008014:       f8df d014       ldr.w   sp, [pc, #20]   ; 2800802c <__after_proc_init+0x8>
> 28008018:       f20f 0e07       addw    lr, pc, #7
> 2800801c:       f10a 0c10       add.w   ip, sl, #16
> 28008020:       46e7            mov     pc, ip

OK, there's still a dodgy bit here. The issue I pointed out in my
previous email actually does exist, i.e., the setup functions are not
always annotated as thumb2 so the offset from the base of the struct
may lack the thumb bit even if the function is coded in thumb2. This
is caused by the fact that local labels lack this annotation, even if
the function is emitted into a separate section in the same object
file and references to it are resolved by the linker through
relocations.

Looking at a couple of procinfo entries from proc-v7.S, it turns out
that the offset field (the 1st word on the 2nd line) indeed contains
even values in some cases in a Thumb2 kernel:

c0771364 <__v7_ca9mp_proc_info>:
c0771364:       410fc090 ff0ffff0 00011c0e 00000c02     ...A............
c0771374:       ffaab634 c0773934 c077393a 00008097     4...49w.:9w.....
...

c0771398 <__v7_ca8_proc_info>:
c0771398:       410fc080 ff0ffff0 00011c0e 00000c02     ...A............
c07713a8:       ffaab667 c0773934 c077393a 00008097     g...49w.:9w.....
...

c07713cc <__v7_pj4b_proc_info>:
c07713cc:       560f5800 ff0fff00 00011c0e 00000c02     .X.V............
c07713dc:       ffaab5ee c0773934 c077393a 00008097     ....49w.:9w.....
...

c0771400 <__v7_cr7mp_proc_info>:
c0771400:       410fc170 ff0ffff0 00011c0e 00000c02     p..A............
c0771410:       ffaab598 c0773934 c077393a 00008097     ....49w.:9w.....
...

c0771434 <__v7_ca7mp_proc_info>:
c0771434:       410fc070 ff0ffff0 00011c0e 00000c02     p..A............
c0771444:       ffaab56a c0773934 c077393a 00008097     j...49w.:9w.....
...

but we are getting lucky because the 'ret r12' instruction from
head{-nommu}.S is emitted as 'mov pc, ip', which is a [for v7]
deprecated method of performing a branch-to-register which doesn't
incur a mode switch. In other words, if we'd use the architecturally
correct 'bx ip' here, the code breaks.
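
A quick way to see which entries are affected is to add each offset to
its struct base and test bit 0 of the result. The Python sketch below
does that for the five entries in the hex dump above; the helper names
are mine, and it assumes the offset is relative to the symbol address
shown for each struct.

```python
MASK32 = 0xFFFFFFFF

# (struct base, offset word) pairs copied from the hex dump.
entries = {
    "__v7_ca9mp_proc_info": (0xC0771364, 0xFFAAB634),
    "__v7_ca8_proc_info":   (0xC0771398, 0xFFAAB667),
    "__v7_pj4b_proc_info":  (0xC07713CC, 0xFFAAB5EE),
    "__v7_cr7mp_proc_info": (0xC0771400, 0xFFAAB598),
    "__v7_ca7mp_proc_info": (0xC0771434, 0xFFAAB56A),
}


def setup_address(base, offset):
    """Address the entry code computes: struct base plus stored offset."""
    return (base + offset) & MASK32


def has_thumb_bit(addr):
    """A 'bx' to this address stays in Thumb state only if bit 0 is set."""
    return bool(addr & 1)


missing = [name for name, (base, off) in entries.items()
           if not has_thumb_bit(setup_address(base, off))]

for name in missing:
    print(name, "lacks the Thumb bit")
```

With these values only the ca8 entry ends up with bit 0 set; the other
four would switch to ARM state if reached via 'bx'.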

As far as I can tell, there are no such setup functions that could run
on a Thumb2 capable CPU but are emitted in ARM code explicitly, so I
think the fix could be as simple as

diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index c671f345266a..a4f6d74e9e21 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -333,7 +333,7 @@ ENTRY(\name\()_tlb_fns)
 .endm

 .macro initfn, func, base
-       .long   \func - \base
+       .long   BSYM(\func) - \base
 .endm

        /*


Thread overview: 22+ messages
2015-03-13 12:07 [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
2015-03-13 12:07 ` [PATCH v2 1/8] ARM: replace PROCINFO embedded branch with relative offset Ard Biesheuvel
2015-04-19 16:59   ` Joachim Eastwood
2015-04-19 17:08     ` Russell King - ARM Linux
2015-04-19 17:41       ` Ard Biesheuvel
2015-04-19 19:28         ` Russell King - ARM Linux
2015-04-19 19:45           ` Joachim Eastwood
2015-04-19 21:52           ` Ard Biesheuvel
2015-04-19 19:24       ` Joachim Eastwood
2015-03-13 12:07 ` [PATCH v2 2/8] ARM: move HYP text to end of .text section Ard Biesheuvel
2015-03-13 12:07 ` [PATCH v2 3/8] ARM: add macro to perform far branches (b/bl) Ard Biesheuvel
2015-03-13 16:40   ` Russell King - ARM Linux
2015-03-17 20:35     ` Ard Biesheuvel
2015-03-18 10:07   ` Ard Biesheuvel
2015-03-19  9:01     ` [PATCH] " Ard Biesheuvel
2015-03-13 12:07 ` [PATCH v2 4/8] ARM: use bl_far to call __hyp_stub_install_secondary from the .data section Ard Biesheuvel
2015-03-13 12:07 ` [PATCH v2 5/8] ARM: move the .idmap.text section closer to .head.text Ard Biesheuvel
2015-03-13 12:07 ` [PATCH v2 6/8] asm-generic: introduce .text.fixup input section Ard Biesheuvel
2015-03-18 18:58   ` Arnd Bergmann
2015-03-13 12:07 ` [PATCH v2 7/8] ARM: keep .text and .fixup regions together Ard Biesheuvel
2015-03-13 12:07 ` [PATCH v2 8/8] kallsyms: allow kallsyms data to reside in the .data section Ard Biesheuvel
2015-03-18  7:54 ` [PATCH v2 0/8] ARM kernel size fixes Ard Biesheuvel
