* [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup
@ 2022-01-24 17:47 Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 01/32] ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups Ard Biesheuvel
                   ` (32 more replies)
  0 siblings, 33 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

This v5 series is a combined follow-up to

- IRQ stacks support for v7 SMP systems [0],
- vmap'ed stacks support for v7 SMP systems [1],
- extending support for both IRQ stacks and vmap'ed stacks to all
  remaining configurations, including v6/v7 SMP multiplatform kernels
  and uniprocessor configurations such as v7-M [2]

[0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/
[1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/
[2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/

This work was queued up in the ARM tree for a while, but due to problems
with the vmap'ed stacks code, which was difficult to revert in
isolation, the whole stack was dropped again.

In order to prevent similar problems from occurring this time around,
the series was reorganized so that the vmap'ed stacks changes appear at
the very end, which also results in a more natural progression of the
changes.

Changes since v4:
- incorporate fixups to avoid Clang build failures related to
  literals in subsections,
- switch from the ID map to swapper_pg_dir as early as possible when
  onlining a CPU on !LPAE, to ensure that the stack is mapped,
- use SMP_ON_UP patching to elide HWCAP_TLS tests on SMP+v6,
- clean up __switch_to() for Thumb2 a bit more,
- add a patch to make the vmalloc_seq counter SMP safe,
- use the enter_lazy_tlb() hook on !LPAE to ensure that the active_mm
  used by a kernel thread has a mapping for its vmap'ed stack.

Code can be found under the arm-vmap-stacks-v5 tag at
git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git

Cc: Russell King <linux@armlinux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Keith Packard <keithpac@amazon.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Jesse Taube <mr.bossman075@gmail.com>

Ard Biesheuvel (26):
  ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups
  ARM: decompressor: disable stack protector
  ARM: stackprotector: prefer compiler for TLS based per-task protector
  ARM: entry: preserve thread_info pointer in switch_to
  ARM: module: implement support for PC-relative group relocations
  ARM: assembler: add optimized ldr/str macros to load variables from
    memory
  ARM: percpu: add SMP_ON_UP support
  ARM: use TLS register for 'current' on !SMP as well
  ARM: smp: defer TPIDRURO update for SMP v6 configurations too
  ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems
  ARM: assembler: introduce bl_r macro
  ARM: unwind: support unwinding across multiple stacks
  ARM: export dump_mem() to other objects
  ARM: unwind: dump exception stack from calling frame
  ARM: backtrace-clang: avoid crash on bogus frame pointer
  ARM: implement IRQ stacks
  ARM: call_with_stack: add unwind support
  ARM: run softirqs on the per-CPU IRQ stack
  ARM: memcpy: use frame pointer as unwind anchor
  ARM: memmove: use frame pointer as unwind anchor
  ARM: memset: clean up unwind annotations
  ARM: unwind: disregard unwind info before stack frame is set up
  ARM: entry: rework stack realignment code in svc_entry
  ARM: switch_to: clean up Thumb2 code path
  ARM: mm: prepare vmalloc_seq handling for use under SMP
  ARM: implement support for vmap'ed stacks

Arnd Bergmann (5):
  ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER
  ARM: footbridge: use GENERIC_IRQ_MULTI_HANDLER
  ARM: iop32x: offset IRQ numbers by 1
  ARM: iop32x: use GENERIC_IRQ_MULTI_HANDLER
  ARM: remove old-style irq entry

Vladimir Murzin (1):
  irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER

 arch/arm/Kconfig                                    |  39 ++--
 arch/arm/Makefile                                   |   9 +
 arch/arm/boot/compressed/Makefile                   |   6 +-
 arch/arm/boot/compressed/misc.c                     |   7 -
 arch/arm/include/asm/assembler.h                    | 209 ++++++++++++++++----
 arch/arm/include/asm/current.h                      |  47 +++--
 arch/arm/include/asm/elf.h                          |   3 +
 arch/arm/include/asm/entry-macro-multi.S            |  40 ----
 arch/arm/include/asm/hardware/entry-macro-iomd.S    | 131 ------------
 arch/arm/include/asm/insn.h                         |  17 ++
 arch/arm/include/asm/irq.h                          |   1 -
 arch/arm/include/asm/mach/arch.h                    |   2 -
 arch/arm/include/asm/mmu.h                          |   2 +-
 arch/arm/include/asm/mmu_context.h                  |  22 ++-
 arch/arm/include/asm/page.h                         |   3 +
 arch/arm/include/asm/percpu.h                       |  36 +++-
 arch/arm/include/asm/smp.h                          |   5 -
 arch/arm/include/asm/stacktrace.h                   |  12 ++
 arch/arm/include/asm/switch_to.h                    |   3 +-
 arch/arm/include/asm/thread_info.h                  |  35 +---
 arch/arm/include/asm/tls.h                          |  30 ++-
 arch/arm/include/asm/v7m.h                          |   3 +-
 arch/arm/kernel/asm-offsets.c                       |   3 -
 arch/arm/kernel/entry-armv.S                        | 208 +++++++++++++++----
 arch/arm/kernel/entry-common.S                      |  16 +-
 arch/arm/kernel/entry-header.S                      |  47 ++++-
 arch/arm/kernel/entry-v7m.S                         |  39 ++--
 arch/arm/kernel/head-common.S                       |   4 +-
 arch/arm/kernel/head.S                              |   7 +
 arch/arm/kernel/irq.c                               |  55 ++++--
 arch/arm/kernel/module.c                            |  90 +++++++++
 arch/arm/kernel/process.c                           |   7 +-
 arch/arm/kernel/setup.c                             |   8 +-
 arch/arm/kernel/sleep.S                             |  13 ++
 arch/arm/kernel/smp.c                               |  11 +-
 arch/arm/kernel/traps.c                             |  92 ++++++++-
 arch/arm/kernel/unwind.c                            |  50 +++--
 arch/arm/kernel/vmlinux.lds.S                       |   4 +-
 arch/arm/lib/backtrace-clang.S                      |  13 +-
 arch/arm/lib/backtrace.S                            |   7 +
 arch/arm/lib/call_with_stack.S                      |  33 +++-
 arch/arm/lib/copy_from_user.S                       |  13 +-
 arch/arm/lib/copy_template.S                        |  67 +++----
 arch/arm/lib/copy_to_user.S                         |  13 +-
 arch/arm/lib/memcpy.S                               |  13 +-
 arch/arm/lib/memmove.S                              |  60 ++----
 arch/arm/lib/memset.S                               |   7 +-
 arch/arm/mach-footbridge/common.c                   |  87 ++++++++
 arch/arm/mach-footbridge/include/mach/entry-macro.S | 107 ----------
 arch/arm/mach-iop32x/cp6.c                          |  10 +-
 arch/arm/mach-iop32x/include/mach/entry-macro.S     |  31 ---
 arch/arm/mach-iop32x/include/mach/irqs.h            |   2 +-
 arch/arm/mach-iop32x/iop3xx.h                       |   1 +
 arch/arm/mach-iop32x/irq.c                          |  29 ++-
 arch/arm/mach-iop32x/irqs.h                         |  60 +++---
 arch/arm/mach-rpc/fiq.S                             |   5 +-
 arch/arm/mach-rpc/include/mach/entry-macro.S        |  13 --
 arch/arm/mach-rpc/irq.c                             |  95 +++++++++
 arch/arm/mm/Kconfig                                 |   1 +
 arch/arm/mm/context.c                               |   3 +-
 arch/arm/mm/ioremap.c                               |  18 +-
 drivers/irqchip/Kconfig                             |   1 +
 drivers/irqchip/irq-nvic.c                          |  22 +--
 63 files changed, 1270 insertions(+), 757 deletions(-)
 delete mode 100644 arch/arm/include/asm/entry-macro-multi.S
 delete mode 100644 arch/arm/include/asm/hardware/entry-macro-iomd.S
 delete mode 100644 arch/arm/mach-footbridge/include/mach/entry-macro.S
 delete mode 100644 arch/arm/mach-iop32x/include/mach/entry-macro.S
 delete mode 100644 arch/arm/mach-rpc/include/mach/entry-macro.S

-- 
2.30.2



* [PATCH v5 01/32] ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 02/32] ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Neither IOMD_IRQREQC nor IOMD_IRQREQD is ever defined, so any
conditionally compiled code that depends on them is dead code and can
be removed.

Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/hardware/entry-macro-iomd.S | 47 --------------------
 1 file changed, 47 deletions(-)

diff --git a/arch/arm/include/asm/hardware/entry-macro-iomd.S b/arch/arm/include/asm/hardware/entry-macro-iomd.S
index f7692731e514..81441dfa5282 100644
--- a/arch/arm/include/asm/hardware/entry-macro-iomd.S
+++ b/arch/arm/include/asm/hardware/entry-macro-iomd.S
@@ -24,16 +24,6 @@
 		ldrbeq	\irqstat, [\base, #IOMD_IRQREQA]	@ get low priority
 		addeq	\tmp, \tmp, #256		@ irq_prio_d table size
 		teqeq	\irqstat, #0
-#ifdef IOMD_IRQREQC
-		ldrbeq	\irqstat, [\base, #IOMD_IRQREQC]
-		addeq	\tmp, \tmp, #256		@ irq_prio_l table size
-		teqeq	\irqstat, #0
-#endif
-#ifdef IOMD_IRQREQD
-		ldrbeq	\irqstat, [\base, #IOMD_IRQREQD]
-		addeq	\tmp, \tmp, #256		@ irq_prio_lc table size
-		teqeq	\irqstat, #0
-#endif
 2406:		ldrbne	\irqnr, [\tmp, \irqstat]	@ get IRQ number
 		.endm
 
@@ -92,40 +82,3 @@ irq_prio_l:	.byte	 0, 0, 1, 0, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3
 		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
 		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
 		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-#ifdef IOMD_IRQREQC
-irq_prio_lc:	.byte	24,24,25,24,26,26,26,26,27,27,27,27,27,27,27,27
-		.byte	28,24,25,24,26,26,26,26,27,27,27,27,27,27,27,27
-		.byte	29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29
-		.byte	29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29
-		.byte	30,30,30,30,30,30,30,30,27,27,27,27,27,27,27,27
-		.byte	30,30,30,30,30,30,30,30,27,27,27,27,27,27,27,27
-		.byte	29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29
-		.byte	29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-		.byte	31,31,31,31,31,31,31,31,31,31,31,31,31,31,31,31
-#endif
-#ifdef IOMD_IRQREQD
-irq_prio_ld:	.byte	40,40,41,40,42,42,42,42,43,43,43,43,43,43,43,43
-		.byte	44,40,41,40,42,42,42,42,43,43,43,43,43,43,43,43
-		.byte	45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45
-		.byte	45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45
-		.byte	46,46,46,46,46,46,46,46,43,43,43,43,43,43,43,43
-		.byte	46,46,46,46,46,46,46,46,43,43,43,43,43,43,43,43
-		.byte	45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45
-		.byte	45,45,45,45,45,45,45,45,45,45,45,45,45,45,45,45
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-		.byte	47,47,47,47,47,47,47,47,47,47,47,47,47,47,47,47
-#endif
-
-- 
2.30.2



* [PATCH v5 02/32] ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 01/32] ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 03/32] ARM: footbridge: " Ard Biesheuvel
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

From: Arnd Bergmann <arnd@arndb.de>

This is one of the last platforms using the old entry path.
While this code path is spread over a few files, it is fairly
straightforward to convert it into an equivalent C version,
leaving the existing algorithm and all the priority handling
the same.

Unlike most irqchip drivers, this means reading the status
register(s) in a loop and always handling the highest-priority
irq first.

The IOMD_IRQREQC and IOMD_IRQREQD registers are not actually
used here, but I left the code in place for the time being,
to keep the conversion as direct as possible. It could be
removed in a cleanup on top.
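
For reference, stripped to their essentials, this and the following
GENERIC_IRQ_MULTI_HANDLER conversions all take the same shape (a
minimal sketch, not the actual patch; platform_get_irq_nr() stands in
for the platform-specific demux of the status registers and returns 0
when nothing is pending):

	#include <linux/init.h>
	#include <linux/irq.h>	/* set_handle_irq(), generic_handle_irq() */

	static void platform_handle_irq(struct pt_regs *regs)
	{
		int irq;

		/* keep demuxing until the controller reports nothing pending */
		do {
			irq = platform_get_irq_nr();
			if (irq)
				generic_handle_irq(irq);
		} while (irq);
	}

	static void __init platform_init_irq(void)
	{
		/* replaces the old mach/entry-macro.S assembly entry */
		set_handle_irq(platform_handle_irq);
	}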

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: drop obsolete IOMD_IRQREQC/IOMD_IRQREQD handling]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig                                 |  1 +
 arch/arm/include/asm/hardware/entry-macro-iomd.S | 84 -----------------
 arch/arm/mach-rpc/fiq.S                          |  5 +-
 arch/arm/mach-rpc/include/mach/entry-macro.S     | 13 ---
 arch/arm/mach-rpc/irq.c                          | 95 ++++++++++++++++++++
 5 files changed, 99 insertions(+), 99 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fabe39169b12..40193ec76f1a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -443,6 +443,7 @@ config ARCH_RPC
 	select ARM_HAS_SG_CHAIN
 	select CPU_SA110
 	select FIQ
+	select GENERIC_IRQ_MULTI_HANDLER
 	select HAVE_PATA_PLATFORM
 	select ISA_DMA_API
 	select LEGACY_TIMER_TICK
diff --git a/arch/arm/include/asm/hardware/entry-macro-iomd.S b/arch/arm/include/asm/hardware/entry-macro-iomd.S
deleted file mode 100644
index 81441dfa5282..000000000000
--- a/arch/arm/include/asm/hardware/entry-macro-iomd.S
+++ /dev/null
@@ -1,84 +0,0 @@
-/*
- * arch/arm/include/asm/hardware/entry-macro-iomd.S
- *
- * Low-level IRQ helper macros for IOC/IOMD based platforms
- *
- * This file is licensed under  the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
- */
-
-/* IOC / IOMD based hardware */
-#include <asm/hardware/iomd.h>
-
-		.macro	get_irqnr_and_base, irqnr, irqstat, base, tmp
-		ldrb	\irqstat, [\base, #IOMD_IRQREQB]	@ get high priority first
-		ldr	\tmp, =irq_prio_h
-		teq	\irqstat, #0
-#ifdef IOMD_BASE
-		ldrbeq	\irqstat, [\base, #IOMD_DMAREQ]	@ get dma
-		addeq	\tmp, \tmp, #256		@ irq_prio_h table size
-		teqeq	\irqstat, #0
-		bne	2406f
-#endif
-		ldrbeq	\irqstat, [\base, #IOMD_IRQREQA]	@ get low priority
-		addeq	\tmp, \tmp, #256		@ irq_prio_d table size
-		teqeq	\irqstat, #0
-2406:		ldrbne	\irqnr, [\tmp, \irqstat]	@ get IRQ number
-		.endm
-
-/*
- * Interrupt table (incorporates priority).  Please note that we
- * rely on the order of these tables (see above code).
- */
-		.align	5
-irq_prio_h:	.byte	 0, 8, 9, 8,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	12, 8, 9, 8,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	14,14,14,14,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	14,14,14,14,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-		.byte	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10
-#ifdef IOMD_BASE
-irq_prio_d:	.byte	 0,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	20,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	23,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	23,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-		.byte	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16
-#endif
-irq_prio_l:	.byte	 0, 0, 1, 0, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3
-		.byte	 4, 0, 1, 0, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3
-		.byte	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
-		.byte	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
-		.byte	 6, 6, 6, 6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3, 3, 3
-		.byte	 6, 6, 6, 6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3, 3, 3
-		.byte	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
-		.byte	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
-		.byte	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
diff --git a/arch/arm/mach-rpc/fiq.S b/arch/arm/mach-rpc/fiq.S
index 0de83e9b0b39..087bdf4bc093 100644
--- a/arch/arm/mach-rpc/fiq.S
+++ b/arch/arm/mach-rpc/fiq.S
@@ -2,10 +2,11 @@
 #include <linux/linkage.h>
 #include <asm/assembler.h>
 #include <mach/hardware.h>
-#include <mach/entry-macro.S>
 
-	.text
+	.equ	ioc_base_high, IOC_BASE & 0xff000000
+	.equ	ioc_base_low, IOC_BASE & 0x00ff0000
 
+	.text
 	.global	rpc_default_fiq_end
 ENTRY(rpc_default_fiq_start)
 	mov	r12, #ioc_base_high
diff --git a/arch/arm/mach-rpc/include/mach/entry-macro.S b/arch/arm/mach-rpc/include/mach/entry-macro.S
deleted file mode 100644
index a6d1a9f4bb79..000000000000
--- a/arch/arm/mach-rpc/include/mach/entry-macro.S
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include <mach/hardware.h>
-#include <asm/hardware/entry-macro-iomd.S>
-
-	.equ	ioc_base_high, IOC_BASE & 0xff000000
-	.equ	ioc_base_low, IOC_BASE & 0x00ff0000
-
-	.macro  get_irqnr_preamble, base, tmp
-	mov	\base, #ioc_base_high		@ point at IOC
-	.if	ioc_base_low
-	orr	\base, \base, #ioc_base_low
-	.endif
-	.endm
diff --git a/arch/arm/mach-rpc/irq.c b/arch/arm/mach-rpc/irq.c
index 803aeb126f0e..dc29384b6ef8 100644
--- a/arch/arm/mach-rpc/irq.c
+++ b/arch/arm/mach-rpc/irq.c
@@ -14,6 +14,99 @@
 #define CLR	0x04
 #define MASK	0x08
 
+static const u8 irq_prio_h[256] = {
+	 0, 8, 9, 8,10,10,10,10,11,11,11,11,10,10,10,10,
+	12, 8, 9, 8,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	14,14,14,14,10,10,10,10,11,11,11,11,10,10,10,10,
+	14,14,14,14,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10,
+	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10,
+	15,15,15,15,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+	13,13,13,13,10,10,10,10,11,11,11,11,10,10,10,10,
+};
+
+static const u8 irq_prio_d[256] = {
+	 0,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	20,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	23,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	23,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	22,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+	21,16,17,16,18,16,17,16,19,16,17,16,18,16,17,16,
+};
+
+static const u8 irq_prio_l[256] = {
+	 0, 0, 1, 0, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
+	 4, 0, 1, 0, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
+	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+	 6, 6, 6, 6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3, 3, 3,
+	 6, 6, 6, 6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3, 3, 3,
+	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+	 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+	 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+};
+
+static int iomd_get_irq_nr(void)
+{
+	int irq;
+	u8 reg;
+
+	/* get highest priority first */
+	reg = readb(IOC_BASE + IOMD_IRQREQB);
+	irq = irq_prio_h[reg];
+	if (irq)
+		return irq;
+
+	/* get DMA  */
+	reg = readb(IOC_BASE + IOMD_DMAREQ);
+	irq = irq_prio_d[reg];
+	if (irq)
+		return irq;
+
+	/* get low priority */
+	reg = readb(IOC_BASE + IOMD_IRQREQA);
+	irq = irq_prio_l[reg];
+	if (irq)
+		return irq;
+	return 0;
+}
+
+static void iomd_handle_irq(struct pt_regs *regs)
+{
+	int irq;
+
+	do {
+		irq = iomd_get_irq_nr();
+		if (irq)
+			generic_handle_irq(irq);
+	} while (irq);
+}
+
 static void __iomem *iomd_get_base(struct irq_data *d)
 {
 	void *cd = irq_data_get_irq_chip_data(d);
@@ -82,6 +175,8 @@ void __init rpc_init_irq(void)
 	set_fiq_handler(&rpc_default_fiq_start,
 		&rpc_default_fiq_end - &rpc_default_fiq_start);
 
+	set_handle_irq(iomd_handle_irq);
+
 	for (irq = 0; irq < NR_IRQS; irq++) {
 		clr = IRQ_NOREQUEST;
 		set = 0;
-- 
2.30.2



* [PATCH v5 03/32] ARM: footbridge: use GENERIC_IRQ_MULTI_HANDLER
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 01/32] ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 02/32] ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 04/32] ARM: iop32x: offset IRQ numbers by 1 Ard Biesheuvel
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

From: Arnd Bergmann <arnd@arndb.de>

Footbridge still uses the classic IRQ entry path in assembler,
but this is easily converted into an equivalent C version.

In this case, the correlation between IRQ numbers and bits in
the status register is non-obvious, and the priorities are
handled by manually checking each bit in a static order,
re-reading the status register after each handled event.

I moved the code into the new file and edited the syntax without
changing this sequence to keep the behavior as close as possible
to what it traditionally did.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
---
 arch/arm/Kconfig                                    |   1 +
 arch/arm/mach-footbridge/common.c                   |  87 ++++++++++++++++
 arch/arm/mach-footbridge/include/mach/entry-macro.S | 107 --------------------
 3 files changed, 88 insertions(+), 107 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 40193ec76f1a..bef5085f2ce7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -361,6 +361,7 @@ config ARCH_FOOTBRIDGE
 	select FOOTBRIDGE
 	select NEED_MACH_IO_H if !MMU
 	select NEED_MACH_MEMORY_H
+	select GENERIC_IRQ_MULTI_HANDLER
 	help
 	  Support for systems based on the DC21285 companion chip
 	  ("FootBridge"), such as the Simtec CATS and the Rebel NetWinder.
diff --git a/arch/arm/mach-footbridge/common.c b/arch/arm/mach-footbridge/common.c
index eee095f0e2f6..322495df271d 100644
--- a/arch/arm/mach-footbridge/common.c
+++ b/arch/arm/mach-footbridge/common.c
@@ -27,6 +27,91 @@
 
 #include "common.h"
 
+#include <mach/hardware.h>
+#include <mach/irqs.h>
+#include <asm/hardware/dec21285.h>
+
+static int dc21285_get_irq(void)
+{
+	void __iomem *irqstatus = (void __iomem *)CSR_IRQ_STATUS;
+	u32 mask = readl(irqstatus);
+
+	if (mask & IRQ_MASK_SDRAMPARITY)
+		return IRQ_SDRAMPARITY;
+
+	if (mask & IRQ_MASK_UART_RX)
+		return IRQ_CONRX;
+
+	if (mask & IRQ_MASK_DMA1)
+		return IRQ_DMA1;
+
+	if (mask & IRQ_MASK_DMA2)
+		return IRQ_DMA2;
+
+	if (mask & IRQ_MASK_IN0)
+		return IRQ_IN0;
+
+	if (mask & IRQ_MASK_IN1)
+		return IRQ_IN1;
+
+	if (mask & IRQ_MASK_IN2)
+		return IRQ_IN2;
+
+	if (mask & IRQ_MASK_IN3)
+		return IRQ_IN3;
+
+	if (mask & IRQ_MASK_PCI)
+		return IRQ_PCI;
+
+	if (mask & IRQ_MASK_DOORBELLHOST)
+		return IRQ_DOORBELLHOST;
+
+	if (mask & IRQ_MASK_I2OINPOST)
+		return IRQ_I2OINPOST;
+
+	if (mask & IRQ_MASK_TIMER1)
+		return IRQ_TIMER1;
+
+	if (mask & IRQ_MASK_TIMER2)
+		return IRQ_TIMER2;
+
+	if (mask & IRQ_MASK_TIMER3)
+		return IRQ_TIMER3;
+
+	if (mask & IRQ_MASK_UART_TX)
+		return IRQ_CONTX;
+
+	if (mask & IRQ_MASK_PCI_ABORT)
+		return IRQ_PCI_ABORT;
+
+	if (mask & IRQ_MASK_PCI_SERR)
+		return IRQ_PCI_SERR;
+
+	if (mask & IRQ_MASK_DISCARD_TIMER)
+		return IRQ_DISCARD_TIMER;
+
+	if (mask & IRQ_MASK_PCI_DPERR)
+		return IRQ_PCI_DPERR;
+
+	if (mask & IRQ_MASK_PCI_PERR)
+		return IRQ_PCI_PERR;
+
+	return 0;
+}
+
+static void dc21285_handle_irq(struct pt_regs *regs)
+{
+	int irq;
+	do {
+		irq = dc21285_get_irq();
+		if (!irq)
+			break;
+
+		generic_handle_irq(irq);
+	} while (1);
+}
+
+
 unsigned int mem_fclk_21285 = 50000000;
 
 EXPORT_SYMBOL(mem_fclk_21285);
@@ -108,6 +193,8 @@ static void __init __fb_init_irq(void)
 
 void __init footbridge_init_irq(void)
 {
+	set_handle_irq(dc21285_handle_irq);
+
 	__fb_init_irq();
 
 	if (!footbridge_cfn_mode())
diff --git a/arch/arm/mach-footbridge/include/mach/entry-macro.S b/arch/arm/mach-footbridge/include/mach/entry-macro.S
deleted file mode 100644
index dabbd5c54a78..000000000000
--- a/arch/arm/mach-footbridge/include/mach/entry-macro.S
+++ /dev/null
@@ -1,107 +0,0 @@
-/*
- * arch/arm/mach-footbridge/include/mach/entry-macro.S
- *
- * Low-level IRQ helper macros for footbridge-based platforms
- *
- * This file is licensed under  the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
- */
-#include <mach/hardware.h>
-#include <mach/irqs.h>
-#include <asm/hardware/dec21285.h>
-
-		.equ	dc21285_high, ARMCSR_BASE & 0xff000000
-		.equ	dc21285_low, ARMCSR_BASE & 0x00ffffff
-
-		.macro  get_irqnr_preamble, base, tmp
-		mov	\base, #dc21285_high
-		.if	dc21285_low
-		orr	\base, \base, #dc21285_low
-		.endif
-		.endm
-
-		.macro	get_irqnr_and_base, irqnr, irqstat, base, tmp
-		ldr	\irqstat, [\base, #0x180]	@ get interrupts
-
-		mov	\irqnr, #IRQ_SDRAMPARITY
-		tst	\irqstat, #IRQ_MASK_SDRAMPARITY
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_UART_RX
-		movne	\irqnr, #IRQ_CONRX
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_DMA1
-		movne	\irqnr, #IRQ_DMA1
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_DMA2
-		movne	\irqnr, #IRQ_DMA2
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_IN0
-		movne	\irqnr, #IRQ_IN0
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_IN1
-		movne	\irqnr, #IRQ_IN1
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_IN2
-		movne	\irqnr, #IRQ_IN2
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_IN3
-		movne	\irqnr, #IRQ_IN3
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_PCI
-		movne	\irqnr, #IRQ_PCI
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_DOORBELLHOST
-		movne	\irqnr, #IRQ_DOORBELLHOST
-		bne     1001f
-
-		tst	\irqstat, #IRQ_MASK_I2OINPOST
-		movne	\irqnr, #IRQ_I2OINPOST
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_TIMER1
-		movne	\irqnr, #IRQ_TIMER1
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_TIMER2
-		movne	\irqnr, #IRQ_TIMER2
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_TIMER3
-		movne	\irqnr, #IRQ_TIMER3
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_UART_TX
-		movne	\irqnr, #IRQ_CONTX
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_PCI_ABORT
-		movne	\irqnr, #IRQ_PCI_ABORT
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_PCI_SERR
-		movne	\irqnr, #IRQ_PCI_SERR
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_DISCARD_TIMER
-		movne	\irqnr, #IRQ_DISCARD_TIMER
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_PCI_DPERR
-		movne	\irqnr, #IRQ_PCI_DPERR
-		bne	1001f
-
-		tst	\irqstat, #IRQ_MASK_PCI_PERR
-		movne	\irqnr, #IRQ_PCI_PERR
-1001:
-		.endm
-
-- 
2.30.2



* [PATCH v5 04/32] ARM: iop32x: offset IRQ numbers by 1
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 03/32] ARM: footbridge: " Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 05/32] ARM: iop32x: use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

From: Arnd Bergmann <arnd@arndb.de>

iop32x is one of the last platforms to use IRQ 0, and this has apparently
stopped working in a 2014 cleanup without anyone noticing. This interrupt
is used for the DMA engine, so most likely this has not actually worked
in the past 7 years, but it's also not essential for using this board.

I'm splitting out this change from my GENERIC_IRQ_MULTI_HANDLER
conversion so it can be backported if anyone cares.
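
Concretely, with the IOP_IRQ() macro this patch introduces, hardware
bit n in IINTSRC maps to Linux IRQ n + 1 (sketch of the resulting
numbering, values taken from the hunks below):

	#define IOP_IRQ(x)	((x) + 1)

	#define IRQ_IOP32X_DMA0_EOT	IOP_IRQ(0)	/* Linux IRQ 1, was 0 */
	#define IRQ_IOP32X_HPI		IOP_IRQ(31)	/* Linux IRQ 32, was 31 */

and NR_IRQS grows from 32 to 33 so that the highest IRQ stays
addressable.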

Fixes: a71b092a9c68 ("ARM: Convert handle_IRQ to use __handle_domain_irq")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: take +1 offset into account in mask/unmask and init as well]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
---
 arch/arm/mach-iop32x/include/mach/entry-macro.S |  2 +-
 arch/arm/mach-iop32x/include/mach/irqs.h        |  2 +-
 arch/arm/mach-iop32x/irq.c                      |  6 +-
 arch/arm/mach-iop32x/irqs.h                     | 60 +++++++++++---------
 4 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/arch/arm/mach-iop32x/include/mach/entry-macro.S b/arch/arm/mach-iop32x/include/mach/entry-macro.S
index 8e6766d4621e..341e5d9a6616 100644
--- a/arch/arm/mach-iop32x/include/mach/entry-macro.S
+++ b/arch/arm/mach-iop32x/include/mach/entry-macro.S
@@ -20,7 +20,7 @@
 	mrc     p6, 0, \irqstat, c8, c0, 0	@ Read IINTSRC
 	cmp     \irqstat, #0
 	clzne   \irqnr, \irqstat
-	rsbne   \irqnr, \irqnr, #31
+	rsbne   \irqnr, \irqnr, #32
 	.endm
 
 	.macro arch_ret_to_user, tmp1, tmp2
diff --git a/arch/arm/mach-iop32x/include/mach/irqs.h b/arch/arm/mach-iop32x/include/mach/irqs.h
index c4e78df428e8..e09ae5f48aec 100644
--- a/arch/arm/mach-iop32x/include/mach/irqs.h
+++ b/arch/arm/mach-iop32x/include/mach/irqs.h
@@ -9,6 +9,6 @@
 #ifndef __IRQS_H
 #define __IRQS_H
 
-#define NR_IRQS			32
+#define NR_IRQS			33
 
 #endif
diff --git a/arch/arm/mach-iop32x/irq.c b/arch/arm/mach-iop32x/irq.c
index 2d48bf1398c1..d1e8824cbd82 100644
--- a/arch/arm/mach-iop32x/irq.c
+++ b/arch/arm/mach-iop32x/irq.c
@@ -32,14 +32,14 @@ static void intstr_write(u32 val)
 static void
 iop32x_irq_mask(struct irq_data *d)
 {
-	iop32x_mask &= ~(1 << d->irq);
+	iop32x_mask &= ~(1 << (d->irq - 1));
 	intctl_write(iop32x_mask);
 }
 
 static void
 iop32x_irq_unmask(struct irq_data *d)
 {
-	iop32x_mask |= 1 << d->irq;
+	iop32x_mask |= 1 << (d->irq - 1);
 	intctl_write(iop32x_mask);
 }
 
@@ -65,7 +65,7 @@ void __init iop32x_init_irq(void)
 	    machine_is_em7210())
 		*IOP3XX_PCIIRSR = 0x0f;
 
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 1; i < NR_IRQS; i++) {
 		irq_set_chip_and_handler(i, &ext_chip, handle_level_irq);
 		irq_clear_status_flags(i, IRQ_NOREQUEST | IRQ_NOPROBE);
 	}
diff --git a/arch/arm/mach-iop32x/irqs.h b/arch/arm/mach-iop32x/irqs.h
index 69858e4e905d..e1dfc8b4e7d7 100644
--- a/arch/arm/mach-iop32x/irqs.h
+++ b/arch/arm/mach-iop32x/irqs.h
@@ -7,36 +7,40 @@
 #ifndef __IOP32X_IRQS_H
 #define __IOP32X_IRQS_H
 
+/* Interrupts in Linux start at 1, hardware starts at 0 */
+
+#define IOP_IRQ(x) ((x) + 1)
+
 /*
  * IOP80321 chipset interrupts
  */
-#define IRQ_IOP32X_DMA0_EOT	0
-#define IRQ_IOP32X_DMA0_EOC	1
-#define IRQ_IOP32X_DMA1_EOT	2
-#define IRQ_IOP32X_DMA1_EOC	3
-#define IRQ_IOP32X_AA_EOT	6
-#define IRQ_IOP32X_AA_EOC	7
-#define IRQ_IOP32X_CORE_PMON	8
-#define IRQ_IOP32X_TIMER0	9
-#define IRQ_IOP32X_TIMER1	10
-#define IRQ_IOP32X_I2C_0	11
-#define IRQ_IOP32X_I2C_1	12
-#define IRQ_IOP32X_MESSAGING	13
-#define IRQ_IOP32X_ATU_BIST	14
-#define IRQ_IOP32X_PERFMON	15
-#define IRQ_IOP32X_CORE_PMU	16
-#define IRQ_IOP32X_BIU_ERR	17
-#define IRQ_IOP32X_ATU_ERR	18
-#define IRQ_IOP32X_MCU_ERR	19
-#define IRQ_IOP32X_DMA0_ERR	20
-#define IRQ_IOP32X_DMA1_ERR	21
-#define IRQ_IOP32X_AA_ERR	23
-#define IRQ_IOP32X_MSG_ERR	24
-#define IRQ_IOP32X_SSP		25
-#define IRQ_IOP32X_XINT0	27
-#define IRQ_IOP32X_XINT1	28
-#define IRQ_IOP32X_XINT2	29
-#define IRQ_IOP32X_XINT3	30
-#define IRQ_IOP32X_HPI		31
+#define IRQ_IOP32X_DMA0_EOT	IOP_IRQ(0)
+#define IRQ_IOP32X_DMA0_EOC	IOP_IRQ(1)
+#define IRQ_IOP32X_DMA1_EOT	IOP_IRQ(2)
+#define IRQ_IOP32X_DMA1_EOC	IOP_IRQ(3)
+#define IRQ_IOP32X_AA_EOT	IOP_IRQ(6)
+#define IRQ_IOP32X_AA_EOC	IOP_IRQ(7)
+#define IRQ_IOP32X_CORE_PMON	IOP_IRQ(8)
+#define IRQ_IOP32X_TIMER0	IOP_IRQ(9)
+#define IRQ_IOP32X_TIMER1	IOP_IRQ(10)
+#define IRQ_IOP32X_I2C_0	IOP_IRQ(11)
+#define IRQ_IOP32X_I2C_1	IOP_IRQ(12)
+#define IRQ_IOP32X_MESSAGING	IOP_IRQ(13)
+#define IRQ_IOP32X_ATU_BIST	IOP_IRQ(14)
+#define IRQ_IOP32X_PERFMON	IOP_IRQ(15)
+#define IRQ_IOP32X_CORE_PMU	IOP_IRQ(16)
+#define IRQ_IOP32X_BIU_ERR	IOP_IRQ(17)
+#define IRQ_IOP32X_ATU_ERR	IOP_IRQ(18)
+#define IRQ_IOP32X_MCU_ERR	IOP_IRQ(19)
+#define IRQ_IOP32X_DMA0_ERR	IOP_IRQ(20)
+#define IRQ_IOP32X_DMA1_ERR	IOP_IRQ(21)
+#define IRQ_IOP32X_AA_ERR	IOP_IRQ(23)
+#define IRQ_IOP32X_MSG_ERR	IOP_IRQ(24)
+#define IRQ_IOP32X_SSP		IOP_IRQ(25)
+#define IRQ_IOP32X_XINT0	IOP_IRQ(27)
+#define IRQ_IOP32X_XINT1	IOP_IRQ(28)
+#define IRQ_IOP32X_XINT2	IOP_IRQ(29)
+#define IRQ_IOP32X_XINT3	IOP_IRQ(30)
+#define IRQ_IOP32X_HPI		IOP_IRQ(31)
 
 #endif
-- 
2.30.2



* [PATCH v5 05/32] ARM: iop32x: use GENERIC_IRQ_MULTI_HANDLER
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 04/32] ARM: iop32x: offset IRQ numbers by 1 Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 06/32] ARM: remove old-style irq entry Ard Biesheuvel
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

From: Arnd Bergmann <arnd@arndb.de>

iop32x uses the entry-macro.S file for both the IRQ entry and for
hooking into the arch_ret_to_user code path. This is done because the
cp6 registers have to be enabled before accessing any of the interrupt
controller registers but have to be disabled when running in user space.

There is also lazy-enable logic in cp6.c, but during a hardirq, we
know it has to be enabled.

Both the cp6-enable code and the code to read the IRQ status can be
lifted into the normal generic_handle_arch_irq() path, but the
cp6-disable code has to remain in the user return code. As nothing
other than iop32x uses this hook, just open-code it there with an
ifdef for the platform that can eventually be removed when iop32x
has reached the end of its life.

The cp6-enable path in the IRQ entry has an extra cp_wait barrier that
the trap version does not have, but doing it in both cases is harmless
and simplifies the logic here, at the cost of a few extra cycles for
the trap.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig                                |  5 +---
 arch/arm/kernel/entry-common.S                  | 16 +++++-----
 arch/arm/mach-iop32x/cp6.c                      | 10 ++++++-
 arch/arm/mach-iop32x/include/mach/entry-macro.S | 31 --------------------
 arch/arm/mach-iop32x/iop3xx.h                   |  1 +
 arch/arm/mach-iop32x/irq.c                      | 23 +++++++++++++++
 6 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index bef5085f2ce7..ac2f88ce0b9a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -226,9 +226,6 @@ config GENERIC_ISA_DMA
 config FIQ
 	bool
 
-config NEED_RET_TO_USER
-	bool
-
 config ARCH_MTD_XIP
 	bool
 
@@ -370,9 +367,9 @@ config ARCH_IOP32X
 	bool "IOP32x-based"
 	depends on MMU
 	select CPU_XSCALE
+	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIO_IOP
 	select GPIOLIB
-	select NEED_RET_TO_USER
 	select FORCE_PCI
 	select PLAT_IOP
 	help
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index ac86c34682bb..c928d6b04cce 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -16,12 +16,14 @@
 
 	.equ	NR_syscalls, __NR_syscalls
 
-#ifdef CONFIG_NEED_RET_TO_USER
-#include <mach/entry-macro.S>
-#else
-	.macro  arch_ret_to_user, tmp1, tmp2
-	.endm
+	.macro  arch_ret_to_user, tmp
+#ifdef CONFIG_ARCH_IOP32X
+	mrc	p15, 0, \tmp, c15, c1, 0
+	tst	\tmp, #(1 << 6)
+	bicne	\tmp, \tmp, #(1 << 6)
+	mcrne	p15, 0, \tmp, c15, c1, 0	@ Disable cp6 access
 #endif
+	.endm
 
 #include "entry-header.S"
 
@@ -55,7 +57,7 @@ __ret_fast_syscall:
 
 
 	/* perform architecture specific actions before user return */
-	arch_ret_to_user r1, lr
+	arch_ret_to_user r1
 
 	restore_user_regs fast = 1, offset = S_OFF
  UNWIND(.fnend		)
@@ -128,7 +130,7 @@ no_work_pending:
 	asm_trace_hardirqs_on save = 0
 
 	/* perform architecture specific actions before user return */
-	arch_ret_to_user r1, lr
+	arch_ret_to_user r1
 	ct_user_enter save = 0
 
 	restore_user_regs fast = 0, offset = 0
diff --git a/arch/arm/mach-iop32x/cp6.c b/arch/arm/mach-iop32x/cp6.c
index ec74b07fb7e3..2882674a1c39 100644
--- a/arch/arm/mach-iop32x/cp6.c
+++ b/arch/arm/mach-iop32x/cp6.c
@@ -7,7 +7,7 @@
 #include <asm/traps.h>
 #include <asm/ptrace.h>
 
-static int cp6_trap(struct pt_regs *regs, unsigned int instr)
+void iop_enable_cp6(void)
 {
 	u32 temp;
 
@@ -16,7 +16,15 @@ static int cp6_trap(struct pt_regs *regs, unsigned int instr)
 		"mrc	p15, 0, %0, c15, c1, 0\n\t"
 		"orr	%0, %0, #(1 << 6)\n\t"
 		"mcr	p15, 0, %0, c15, c1, 0\n\t"
+		"mrc	p15, 0, %0, c15, c1, 0\n\t"
+		"mov	%0, %0\n\t"
+		"sub	pc, pc, #4  @ cp_wait\n\t"
 		: "=r"(temp));
+}
+
+static int cp6_trap(struct pt_regs *regs, unsigned int instr)
+{
+	iop_enable_cp6();
 
 	return 0;
 }
diff --git a/arch/arm/mach-iop32x/include/mach/entry-macro.S b/arch/arm/mach-iop32x/include/mach/entry-macro.S
deleted file mode 100644
index 341e5d9a6616..000000000000
--- a/arch/arm/mach-iop32x/include/mach/entry-macro.S
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- * arch/arm/mach-iop32x/include/mach/entry-macro.S
- *
- * Low-level IRQ helper macros for IOP32x-based platforms
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
- */
-	.macro get_irqnr_preamble, base, tmp
-	mrc	p15, 0, \tmp, c15, c1, 0
-	orr	\tmp, \tmp, #(1 << 6)
-	mcr	p15, 0, \tmp, c15, c1, 0	@ Enable cp6 access
-	mrc	p15, 0, \tmp, c15, c1, 0
-	mov	\tmp, \tmp
-	sub	pc, pc, #4			@ cp_wait
-	.endm
-
-	.macro  get_irqnr_and_base, irqnr, irqstat, base, tmp
-	mrc     p6, 0, \irqstat, c8, c0, 0	@ Read IINTSRC
-	cmp     \irqstat, #0
-	clzne   \irqnr, \irqstat
-	rsbne   \irqnr, \irqnr, #32
-	.endm
-
-	.macro arch_ret_to_user, tmp1, tmp2
-	mrc	p15, 0, \tmp1, c15, c1, 0
-	ands	\tmp2, \tmp1, #(1 << 6)
-	bicne	\tmp1, \tmp1, #(1 << 6)
-	mcrne	p15, 0, \tmp1, c15, c1, 0	@ Disable cp6 access
-	.endm
diff --git a/arch/arm/mach-iop32x/iop3xx.h b/arch/arm/mach-iop32x/iop3xx.h
index 46b4b34a4ad2..a6ec7ebadb35 100644
--- a/arch/arm/mach-iop32x/iop3xx.h
+++ b/arch/arm/mach-iop32x/iop3xx.h
@@ -225,6 +225,7 @@ extern int iop3xx_get_init_atu(void);
 #include <linux/reboot.h>
 
 void iop3xx_map_io(void);
+void iop_enable_cp6(void);
 void iop_init_cp6_handler(void);
 void iop_init_time(unsigned long tickrate);
 void iop3xx_restart(enum reboot_mode, const char *);
diff --git a/arch/arm/mach-iop32x/irq.c b/arch/arm/mach-iop32x/irq.c
index d1e8824cbd82..6dca7e97d81f 100644
--- a/arch/arm/mach-iop32x/irq.c
+++ b/arch/arm/mach-iop32x/irq.c
@@ -29,6 +29,15 @@ static void intstr_write(u32 val)
 	asm volatile("mcr p6, 0, %0, c4, c0, 0" : : "r" (val));
 }
 
+static u32 iintsrc_read(void)
+{
+	int irq;
+
+	asm volatile("mrc p6, 0, %0, c8, c0, 0" : "=r" (irq));
+
+	return irq;
+}
+
 static void
 iop32x_irq_mask(struct irq_data *d)
 {
@@ -50,11 +59,25 @@ struct irq_chip ext_chip = {
 	.irq_unmask	= iop32x_irq_unmask,
 };
 
+static void iop_handle_irq(struct pt_regs *regs)
+{
+	u32 mask;
+
+	iop_enable_cp6();
+
+	do {
+		mask = iintsrc_read();
+		if (mask)
+			generic_handle_irq(fls(mask));
+	} while (mask);
+}
+
 void __init iop32x_init_irq(void)
 {
 	int i;
 
 	iop_init_cp6_handler();
+	set_handle_irq(iop_handle_irq);
 
 	intctl_write(0);
 	intstr_write(0);
-- 
2.30.2



* [PATCH v5 06/32] ARM: remove old-style irq entry
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 05/32] ARM: iop32x: use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 07/32] irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

From: Arnd Bergmann <arnd@arndb.de>

The last user of arch_irq_handler_default is gone now, so the
entry-macro-multi.S file and all references to mach/entry-macro.S can
be removed, as well as the asm_do_IRQ() entrypoint into the interrupt
handling routines implemented in C.

Note: The ARMv7-M entry still uses its own top-level IRQ entry, calling
nvic_handle_irq() from assembly. This could be changed to go through
generic_handle_arch_irq() as well, but it's unclear to me if there are
any benefits.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: keep irq_handler macro as it will carry the IRQ stack handling]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
---
 arch/arm/Kconfig                         | 12 +-----
 arch/arm/include/asm/entry-macro-multi.S | 40 --------------------
 arch/arm/include/asm/irq.h               |  1 -
 arch/arm/include/asm/mach/arch.h         |  2 -
 arch/arm/include/asm/smp.h               |  5 ---
 arch/arm/kernel/entry-armv.S             |  8 ----
 arch/arm/kernel/irq.c                    | 17 ---------
 arch/arm/kernel/smp.c                    |  5 ---
 8 files changed, 1 insertion(+), 89 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ac2f88ce0b9a..7528cbdb90a1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
 	select GENERIC_CPU_AUTOPROBE
 	select GENERIC_EARLY_IOREMAP
 	select GENERIC_IDLE_POLL_SETUP
+	select GENERIC_IRQ_MULTI_HANDLER if MMU
 	select GENERIC_IRQ_PROBE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
@@ -319,7 +320,6 @@ config ARCH_MULTIPLATFORM
 	select AUTO_ZRELADDR
 	select TIMER_OF
 	select COMMON_CLK
-	select GENERIC_IRQ_MULTI_HANDLER
 	select HAVE_PCI
 	select PCI_DOMAINS_GENERIC if PCI
 	select SPARSE_IRQ
@@ -343,7 +343,6 @@ config ARCH_EP93XX
 	select ARM_AMBA
 	imply ARM_PATCH_PHYS_VIRT
 	select ARM_VIC
-	select GENERIC_IRQ_MULTI_HANDLER
 	select AUTO_ZRELADDR
 	select CLKSRC_MMIO
 	select CPU_ARM920T
@@ -358,7 +357,6 @@ config ARCH_FOOTBRIDGE
 	select FOOTBRIDGE
 	select NEED_MACH_IO_H if !MMU
 	select NEED_MACH_MEMORY_H
-	select GENERIC_IRQ_MULTI_HANDLER
 	help
 	  Support for systems based on the DC21285 companion chip
 	  ("FootBridge"), such as the Simtec CATS and the Rebel NetWinder.
@@ -367,7 +365,6 @@ config ARCH_IOP32X
 	bool "IOP32x-based"
 	depends on MMU
 	select CPU_XSCALE
-	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIO_IOP
 	select GPIOLIB
 	select FORCE_PCI
@@ -383,7 +380,6 @@ config ARCH_IXP4XX
 	select ARCH_SUPPORTS_BIG_ENDIAN
 	select CPU_XSCALE
 	select DMABOUNCE if PCI
-	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIO_IXP4XX
 	select GPIOLIB
 	select HAVE_PCI
@@ -399,7 +395,6 @@ config ARCH_IXP4XX
 config ARCH_DOVE
 	bool "Marvell Dove"
 	select CPU_PJ4
-	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
 	select HAVE_PCI
 	select MVEBU_MBUS
@@ -422,7 +417,6 @@ config ARCH_PXA
 	select CLKSRC_MMIO
 	select TIMER_OF
 	select CPU_XSCALE if !CPU_XSC3
-	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIO_PXA
 	select GPIOLIB
 	select IRQ_DOMAIN
@@ -441,7 +435,6 @@ config ARCH_RPC
 	select ARM_HAS_SG_CHAIN
 	select CPU_SA110
 	select FIQ
-	select GENERIC_IRQ_MULTI_HANDLER
 	select HAVE_PATA_PLATFORM
 	select ISA_DMA_API
 	select LEGACY_TIMER_TICK
@@ -462,7 +455,6 @@ config ARCH_SA1100
 	select COMMON_CLK
 	select CPU_FREQ
 	select CPU_SA1100
-	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
 	select IRQ_DOMAIN
 	select ISA
@@ -477,7 +469,6 @@ config ARCH_S3C24XX
 	select CLKSRC_SAMSUNG_PWM
 	select GPIO_SAMSUNG
 	select GPIOLIB
-	select GENERIC_IRQ_MULTI_HANDLER
 	select NEED_MACH_IO_H
 	select S3C2410_WATCHDOG
 	select SAMSUNG_ATAGS
@@ -495,7 +486,6 @@ config ARCH_OMAP1
 	select ARCH_OMAP
 	select CLKSRC_MMIO
 	select GENERIC_IRQ_CHIP
-	select GENERIC_IRQ_MULTI_HANDLER
 	select GPIOLIB
 	select HAVE_LEGACY_CLK
 	select IRQ_DOMAIN
diff --git a/arch/arm/include/asm/entry-macro-multi.S b/arch/arm/include/asm/entry-macro-multi.S
deleted file mode 100644
index dfc6bfa43012..000000000000
--- a/arch/arm/include/asm/entry-macro-multi.S
+++ /dev/null
@@ -1,40 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include <asm/assembler.h>
-
-/*
- * Interrupt handling.  Preserves r7, r8, r9
- */
-	.macro	arch_irq_handler_default
-	get_irqnr_preamble r6, lr
-1:	get_irqnr_and_base r0, r2, r6, lr
-	movne	r1, sp
-	@
-	@ routine called with r0 = irq number, r1 = struct pt_regs *
-	@
-	badrne	lr, 1b
-	bne	asm_do_IRQ
-
-#ifdef CONFIG_SMP
-	/*
-	 * XXX
-	 *
-	 * this macro assumes that irqstat (r2) and base (r6) are
-	 * preserved from get_irqnr_and_base above
-	 */
-	ALT_SMP(test_for_ipi r0, r2, r6, lr)
-	ALT_UP_B(9997f)
-	movne	r1, sp
-	badrne	lr, 1b
-	bne	do_IPI
-#endif
-9997:
-	.endm
-
-	.macro	arch_irq_handler, symbol_name
-	.align	5
-	.global \symbol_name
-\symbol_name:
-	mov	r8, lr
-	arch_irq_handler_default
-	ret	r8
-	.endm
diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1cbcc462b07e..a7c2337b0c7d 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -26,7 +26,6 @@
 struct irqaction;
 struct pt_regs;
 
-extern void asm_do_IRQ(unsigned int, struct pt_regs *);
 void handle_IRQ(unsigned int, struct pt_regs *);
 void init_IRQ(void);
 
diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index eec0c0bda766..9349e7a82c9c 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -56,9 +56,7 @@ struct machine_desc {
 	void			(*init_time)(void);
 	void			(*init_machine)(void);
 	void			(*init_late)(void);
-#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
 	void			(*handle_irq)(struct pt_regs *);
-#endif
 	void			(*restart)(enum reboot_mode, const char *);
 };
 
diff --git a/arch/arm/include/asm/smp.h b/arch/arm/include/asm/smp.h
index f16cbbd5cda4..7c1c90d9f582 100644
--- a/arch/arm/include/asm/smp.h
+++ b/arch/arm/include/asm/smp.h
@@ -24,11 +24,6 @@ struct seq_file;
  */
 extern void show_ipi_list(struct seq_file *, int);
 
-/*
- * Called from assembly code, this handles an IPI.
- */
-asmlinkage void do_IPI(int ipinr, struct pt_regs *regs);
-
 /*
  * Called from C code, this handles an IPI.
  */
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 5cd057859fe9..9d9372781408 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -19,9 +19,6 @@
 #include <asm/glue-df.h>
 #include <asm/glue-pf.h>
 #include <asm/vfpmacros.h>
-#ifndef CONFIG_GENERIC_IRQ_MULTI_HANDLER
-#include <mach/entry-macro.S>
-#endif
 #include <asm/thread_notify.h>
 #include <asm/unwind.h>
 #include <asm/unistd.h>
@@ -30,19 +27,14 @@
 #include <asm/uaccess-asm.h>
 
 #include "entry-header.S"
-#include <asm/entry-macro-multi.S>
 #include <asm/probes.h>
 
 /*
  * Interrupt handling.
  */
 	.macro	irq_handler
-#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
 	mov	r0, sp
 	bl	generic_handle_arch_irq
-#else
-	arch_irq_handler_default
-#endif
 	.endm
 
 	.macro	pabt_helper
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index b79975bd988c..5a1e52a4ee11 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -80,23 +80,6 @@ void handle_IRQ(unsigned int irq, struct pt_regs *regs)
 		ack_bad_irq(irq);
 }
 
-/*
- * asm_do_IRQ is the interface to be used from assembly code.
- */
-asmlinkage void __exception_irq_entry
-asm_do_IRQ(unsigned int irq, struct pt_regs *regs)
-{
-	struct pt_regs *old_regs;
-
-	irq_enter();
-	old_regs = set_irq_regs(regs);
-
-	handle_IRQ(irq, regs);
-
-	set_irq_regs(old_regs);
-	irq_exit();
-}
-
 void __init init_IRQ(void)
 {
 	int ret;
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 97ee6b1567e9..ed2b168ff46c 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -628,11 +628,6 @@ static void ipi_complete(unsigned int cpu)
 /*
  * Main handler for inter-processor interrupts
  */
-asmlinkage void __exception_irq_entry do_IPI(int ipinr, struct pt_regs *regs)
-{
-	handle_IPI(ipinr, regs);
-}
-
 static void do_handle_IPI(int ipinr)
 {
 	unsigned int cpu = smp_processor_id();
-- 
2.30.2



* [PATCH v5 07/32] irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 06/32] ARM: remove old-style irq entry Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 08/32] ARM: decompressor: disable stack protector Ard Biesheuvel
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube,
	Mark Rutland

From: Vladimir Murzin <vladimir.murzin@arm.com>

Rather than restructuring the ARMv7M entry logic per the TODO, just
move NVIC to GENERIC_IRQ_MULTI_HANDLER.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/v7m.h  |  3 ++-
 arch/arm/kernel/entry-v7m.S | 10 +++------
 drivers/irqchip/Kconfig     |  1 +
 drivers/irqchip/irq-nvic.c  | 22 +++++---------------
 4 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/arch/arm/include/asm/v7m.h b/arch/arm/include/asm/v7m.h
index 2cb00d15831b..4512f7e1918f 100644
--- a/arch/arm/include/asm/v7m.h
+++ b/arch/arm/include/asm/v7m.h
@@ -13,6 +13,7 @@
 #define V7M_SCB_ICSR_PENDSVSET			(1 << 28)
 #define V7M_SCB_ICSR_PENDSVCLR			(1 << 27)
 #define V7M_SCB_ICSR_RETTOBASE			(1 << 11)
+#define V7M_SCB_ICSR_VECTACTIVE			0x000001ff
 
 #define V7M_SCB_VTOR			0x08
 
@@ -38,7 +39,7 @@
 #define V7M_SCB_SHCSR_MEMFAULTENA		(1 << 16)
 
 #define V7M_xPSR_FRAMEPTRALIGN			0x00000200
-#define V7M_xPSR_EXCEPTIONNO			0x000001ff
+#define V7M_xPSR_EXCEPTIONNO			V7M_SCB_ICSR_VECTACTIVE
 
 /*
  * When branching to an address that has bits [31:28] == 0xf an exception return
diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
index 7bde93c10962..520dd43e7e08 100644
--- a/arch/arm/kernel/entry-v7m.S
+++ b/arch/arm/kernel/entry-v7m.S
@@ -39,14 +39,10 @@ __irq_entry:
 	@
 	@ Invoke the IRQ handler
 	@
-	mrs	r0, ipsr
-	ldr	r1, =V7M_xPSR_EXCEPTIONNO
-	and	r0, r1
-	sub	r0, #16
-	mov	r1, sp
+	mov	r0, sp
 	stmdb	sp!, {lr}
-	@ routine called with r0 = irq number, r1 = struct pt_regs *
-	bl	nvic_handle_irq
+	@ routine called with r0 = struct pt_regs *
+	bl	generic_handle_arch_irq
 
 	pop	{lr}
 	@
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 7038957f4a77..488eaa14d3a7 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -58,6 +58,7 @@ config ARM_NVIC
 	bool
 	select IRQ_DOMAIN_HIERARCHY
 	select GENERIC_IRQ_CHIP
+	select GENERIC_IRQ_MULTI_HANDLER
 
 config ARM_VIC
 	bool
diff --git a/drivers/irqchip/irq-nvic.c b/drivers/irqchip/irq-nvic.c
index ba4759b3e269..125f9c1cf0c3 100644
--- a/drivers/irqchip/irq-nvic.c
+++ b/drivers/irqchip/irq-nvic.c
@@ -37,25 +37,12 @@
 
 static struct irq_domain *nvic_irq_domain;
 
-static void __nvic_handle_irq(irq_hw_number_t hwirq)
+static void __irq_entry nvic_handle_irq(struct pt_regs *regs)
 {
-	generic_handle_domain_irq(nvic_irq_domain, hwirq);
-}
+	unsigned long icsr = readl_relaxed(BASEADDR_V7M_SCB + V7M_SCB_ICSR);
+	irq_hw_number_t hwirq = (icsr & V7M_SCB_ICSR_VECTACTIVE) - 16;
 
-/*
- * TODO: restructure the ARMv7M entry logic so that this entry logic can live
- * in arch code.
- */
-asmlinkage void __exception_irq_entry
-nvic_handle_irq(irq_hw_number_t hwirq, struct pt_regs *regs)
-{
-	struct pt_regs *old_regs;
-
-	irq_enter();
-	old_regs = set_irq_regs(regs);
-	__nvic_handle_irq(hwirq);
-	set_irq_regs(old_regs);
-	irq_exit();
+	generic_handle_domain_irq(nvic_irq_domain, hwirq);
 }
 
 static int nvic_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
@@ -141,6 +128,7 @@ static int __init nvic_of_init(struct device_node *node,
 	for (i = 0; i < irqs; i += 4)
 		writel_relaxed(0, nvic_base + NVIC_IPR + i);
 
+	set_handle_irq(nvic_handle_irq);
 	return 0;
 }
 IRQCHIP_DECLARE(armv7m_nvic, "arm,armv7m-nvic", nvic_of_init);
-- 
2.30.2



* [PATCH v5 08/32] ARM: decompressor: disable stack protector
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 07/32] irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 09/32] ARM: stackprotector: prefer compiler for TLS based per-task protector Ard Biesheuvel
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Enabling the stack protector in the decompressor is of dubious value,
given that it uses a fixed value for the canary, cannot print any output
unless CONFIG_DEBUG_LL is enabled (which relies on board specific build
time settings), and is already disabled for a good chunk of the code
(libfdt).

So let's just disable it in the decompressor. This will make it easier
in the future to manage the command line options that would need to be
removed again in this context for the TLS register based stack
protector.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/boot/compressed/Makefile | 6 +-----
 arch/arm/boot/compressed/misc.c   | 7 -------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 954eee8a785a..187a187706cb 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -92,17 +92,13 @@ ifeq ($(CONFIG_USE_OF),y)
 OBJS	+= $(libfdt_objs) fdt_check_mem_start.o
 endif
 
-# -fstack-protector-strong triggers protection checks in this code,
-# but it is being used too early to link to meaningful stack_chk logic.
-$(foreach o, $(libfdt_objs) atags_to_fdt.o fdt_check_mem_start.o, \
-	$(eval CFLAGS_$(o) := -I $(srctree)/scripts/dtc/libfdt -fno-stack-protector))
-
 targets       := vmlinux vmlinux.lds piggy_data piggy.o \
 		 head.o $(OBJS)
 
 KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
 
 ccflags-y := -fpic $(call cc-option,-mno-single-pic-base,) -fno-builtin \
+	     -I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
 	     -I$(obj) $(DISABLE_ARM_SSP_PER_TASK_PLUGIN)
 ccflags-remove-$(CONFIG_FUNCTION_TRACER) += -pg
 asflags-y := -DZIMAGE
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index e1e9a5dde853..c3c66ff2d696 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -128,13 +128,6 @@ asmlinkage void __div0(void)
 	error("Attempting division by 0!");
 }
 
-const unsigned long __stack_chk_guard = 0x000a0dff;
-
-void __stack_chk_fail(void)
-{
-	error("stack-protector: Kernel stack is corrupted\n");
-}
-
 extern int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x));
 
 
-- 
2.30.2



* [PATCH v5 09/32] ARM: stackprotector: prefer compiler for TLS based per-task protector
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 08/32] ARM: decompressor: disable stack protector Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 10/32] ARM: entry: preserve thread_info pointer in switch_to Ard Biesheuvel
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Currently, we implement the per-task stack protector for ARM using a GCC
plugin, due to lack of native compiler support. However, work is
underway to get this implemented in the compiler, which means we will be
able to deprecate the GCC plugin at some point.

In the meantime, we will need to support both, where the native compiler
implementation is obviously preferred. So let's wire this up in Kconfig
and the Makefile.
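
With the native support, the canary is no longer addressed via a global
symbol at all: -mstack-protector-guard=tls together with
-mstack-protector-guard-offset makes the compiler load the guard from a
fixed offset off the thread pointer (read via CP15 due to -mtp=cp15).
Illustrative codegen for a protected function follows (exact
instructions vary by compiler version; TSK_STACK_CANARY is the offset
extracted from asm-offsets.h above):

	@ prologue: load the per-task canary and stash it in the frame
	mrc	p15, 0, r3, c13, c0, 3		@ thread pointer ('current')
	ldr	r3, [r3, #TSK_STACK_CANARY]
	str	r3, [sp, #4]
	...
	@ epilogue: reload the canary and verify the stack slot
	ldr	r2, [sp, #4]
	mrc	p15, 0, r3, c13, c0, 3
	ldr	r3, [r3, #TSK_STACK_CANARY]
	cmp	r2, r3
	bne	.Lfail				@ .Lfail calls __stack_chk_fail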

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig  | 8 ++++++--
 arch/arm/Makefile | 9 +++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7528cbdb90a1..99ac5d75dcec 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1596,10 +1596,14 @@ config XEN
 	help
 	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
 
+config CC_HAVE_STACKPROTECTOR_TLS
+	def_bool $(cc-option,-mtp=cp15 -mstack-protector-guard=tls -mstack-protector-guard-offset=0)
+
 config STACKPROTECTOR_PER_TASK
 	bool "Use a unique stack canary value for each task"
-	depends on GCC_PLUGINS && STACKPROTECTOR && THREAD_INFO_IN_TASK && !XIP_DEFLATED_DATA
-	select GCC_PLUGIN_ARM_SSP_PER_TASK
+	depends on STACKPROTECTOR && THREAD_INFO_IN_TASK && !XIP_DEFLATED_DATA
+	depends on GCC_PLUGINS || CC_HAVE_STACKPROTECTOR_TLS
+	select GCC_PLUGIN_ARM_SSP_PER_TASK if !CC_HAVE_STACKPROTECTOR_TLS
 	default y
 	help
 	  Due to the fact that GCC uses an ordinary symbol reference from
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 77172d555c7e..e943624cbf87 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -275,6 +275,14 @@ endif
 
 ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
 prepare: stack_protector_prepare
+ifeq ($(CONFIG_CC_HAVE_STACKPROTECTOR_TLS),y)
+stack_protector_prepare: prepare0
+	$(eval KBUILD_CFLAGS += \
+		-mstack-protector-guard=tls \
+		-mstack-protector-guard-offset=$(shell	\
+			awk '{if ($$2 == "TSK_STACK_CANARY") print $$3;}'\
+				include/generated/asm-offsets.h))
+else
 stack_protector_prepare: prepare0
 	$(eval SSP_PLUGIN_CFLAGS := \
 		-fplugin-arg-arm_ssp_per_task_plugin-offset=$(shell	\
@@ -283,6 +291,7 @@ stack_protector_prepare: prepare0
 	$(eval KBUILD_CFLAGS += $(SSP_PLUGIN_CFLAGS))
 	$(eval GCC_PLUGINS_CFLAGS += $(SSP_PLUGIN_CFLAGS))
 endif
+endif
 
 all:	$(notdir $(KBUILD_IMAGE))
 
-- 
2.30.2



* [PATCH v5 10/32] ARM: entry: preserve thread_info pointer in switch_to
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 09/32] ARM: stackprotector: prefer compiler for TLS based per-task protector Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 11/32] ARM: module: implement support for PC-relative group relocations Ard Biesheuvel
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Tweak the UP stack protector handling code so that the thread info
pointer is preserved in R7 until set_current is called. This is needed
for a subsequent patch that implements THREAD_INFO_IN_TASK and
set_current for UP as well.

This also means we will prefer the per-task protector on UP systems that
implement the thread ID registers, so tweak the preprocessor
conditionals to reflect this.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/kernel/entry-armv.S | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 9d9372781408..5e01a34369a0 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -744,16 +744,16 @@ ENTRY(__switch_to)
 	ldr	r6, [r2, #TI_CPU_DOMAIN]
 #endif
 	switch_tls r1, r4, r5, r3, r7
-#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP)
-	ldr	r7, [r2, #TI_TASK]
+#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP) && \
+    !defined(CONFIG_STACKPROTECTOR_PER_TASK)
+	ldr	r9, [r2, #TI_TASK]
 	ldr	r8, =__stack_chk_guard
 	.if (TSK_STACK_CANARY > IMM12_MASK)
-	add	r7, r7, #TSK_STACK_CANARY & ~IMM12_MASK
+	add	r9, r9, #TSK_STACK_CANARY & ~IMM12_MASK
 	.endif
-	ldr	r7, [r7, #TSK_STACK_CANARY & IMM12_MASK]
-#elif defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO)
-	mov	r7, r2				@ Preserve 'next'
+	ldr	r9, [r9, #TSK_STACK_CANARY & IMM12_MASK]
 #endif
+	mov	r7, r2				@ Preserve 'next'
 #ifdef CONFIG_CPU_USE_DOMAINS
 	mcr	p15, 0, r6, c3, c0, 0		@ Set domain register
 #endif
@@ -762,8 +762,9 @@ ENTRY(__switch_to)
 	ldr	r0, =thread_notify_head
 	mov	r1, #THREAD_NOTIFY_SWITCH
 	bl	atomic_notifier_call_chain
-#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP)
-	str	r7, [r8]
+#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP) && \
+    !defined(CONFIG_STACKPROTECTOR_PER_TASK)
+	str	r9, [r8]
 #endif
  THUMB(	mov	ip, r4			   )
 	mov	r0, r5
-- 
2.30.2



* [PATCH v5 11/32] ARM: module: implement support for PC-relative group relocations
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 10/32] ARM: entry: preserve thread_info pointer in switch_to Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 12/32] ARM: assembler: add optimized ldr/str macros to load variables from memory Ard Biesheuvel
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Add support for the R_ARM_ALU_PC_Gn_NC and R_ARM_LDR_PC_G2 group
relocations [0] so we can use them in modules. These will be used to
load the current task pointer from a global variable without having to
rely on a literal pool entry to carry the address of this variable,
which may have a significant negative impact on cache utilization for
variables that are used often and in many different places, as each
occurrence will result in a literal pool entry and therefore a line in
the D-cache.

[0] 'ELF for the ARM architecture'
    https://github.com/ARM-software/abi-aa/releases
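
For illustration, a reference to a kernel variable 'sym' using these
relocations takes the following shape (hypothetical example, mirroring
the asm helpers added later in this series); the module loader splits
the PC-relative offset into 8+8+12 bit chunks and patches the
immediates:

	.reloc	0f, R_ARM_ALU_PC_G0_NC, sym
	.reloc	1f, R_ARM_ALU_PC_G1_NC, sym
	.reloc	2f, R_ARM_LDR_PC_G2, sym
0:	sub	r0, pc, #8		@ fixed up with the top 8 bits
1:	sub	r0, r0, #4		@ fixed up with the next 8 bits
2:	ldr	r0, [r0, #0]		@ fixed up with the final 12 bits

The fixup code flips between ADD and SUB encodings based on the sign of
the offset, which is why the _NC (no overflow check) variants are used
for the first two instructions.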

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/elf.h |  3 +
 arch/arm/kernel/module.c   | 90 ++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index b8102a6ddf16..d68101655b74 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -61,6 +61,9 @@ typedef struct user_fp elf_fpregset_t;
 #define R_ARM_MOVT_ABS		44
 #define R_ARM_MOVW_PREL_NC	45
 #define R_ARM_MOVT_PREL		46
+#define R_ARM_ALU_PC_G0_NC	57
+#define R_ARM_ALU_PC_G1_NC	59
+#define R_ARM_LDR_PC_G2		63
 
 #define R_ARM_THM_CALL		10
 #define R_ARM_THM_JUMP24	30
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index beac45e89ba6..49ff7fd18f0c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -68,6 +68,44 @@ bool module_exit_section(const char *name)
 		strstarts(name, ".ARM.exidx.exit");
 }
 
+#ifdef CONFIG_ARM_HAS_GROUP_RELOCS
+/*
+ * This implements the partitioning algorithm for group relocations as
+ * documented in the ARM AArch32 ELF psABI (IHI 0044).
+ *
+ * A single PC-relative symbol reference is divided in up to 3 add or subtract
+ * operations, where the final one could be incorporated into a load/store
+ * instruction with immediate offset. E.g.,
+ *
+ *   ADD	Rd, PC, #...		or	ADD	Rd, PC, #...
+ *   ADD	Rd, Rd, #...			ADD	Rd, Rd, #...
+ *   LDR	Rd, [Rd, #...]			ADD	Rd, Rd, #...
+ *
+ * The latter has a guaranteed range of only 16 MiB (3x8 == 24 bits), so it is
+ * of limited use in the kernel. However, the ADD/ADD/LDR combo has a range of
+ * -/+ 256 MiB, (2x8 + 12 == 28 bits), which means it has sufficient range for
+ * any in-kernel symbol reference (unless module PLTs are being used).
+ *
+ * The main advantage of this approach over the typical pattern using a literal
+ * load is that literal loads may miss in the D-cache, and generally lead to
+ * lower cache efficiency for variables that are referenced often from many
+ * different places in the code.
+ */
+static u32 get_group_rem(u32 group, u32 *offset)
+{
+	u32 val = *offset;
+	u32 shift;
+	do {
+		shift = val ? (31 - __fls(val)) & ~1 : 32;
+		*offset = val;
+		if (!val)
+			break;
+		val &= 0xffffff >> shift;
+	} while (group--);
+	return shift;
+}
+#endif
+
 int
 apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 	       unsigned int relindex, struct module *module)
@@ -87,6 +125,9 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 #ifdef CONFIG_THUMB2_KERNEL
 		u32 upper, lower, sign, j1, j2;
 #endif
+#ifdef CONFIG_ARM_HAS_GROUP_RELOCS
+		u32 shift, group = 1;
+#endif
 
 		offset = ELF32_R_SYM(rel->r_info);
 		if (offset < 0 || offset > (symsec->sh_size / sizeof(Elf32_Sym))) {
@@ -331,6 +372,55 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 			*(u16 *)(loc + 2) = __opcode_to_mem_thumb16(lower);
 			break;
 #endif
+#ifdef CONFIG_ARM_HAS_GROUP_RELOCS
+		case R_ARM_ALU_PC_G0_NC:
+			group = 0;
+			fallthrough;
+		case R_ARM_ALU_PC_G1_NC:
+			tmp = __mem_to_opcode_arm(*(u32 *)loc);
+			offset = ror32(tmp & 0xff, (tmp & 0xf00) >> 7);
+			if (tmp & BIT(22))
+				offset = -offset;
+			offset += sym->st_value - loc;
+			if (offset < 0) {
+				offset = -offset;
+				tmp = (tmp & ~BIT(23)) | BIT(22); // SUB opcode
+			} else {
+				tmp = (tmp & ~BIT(22)) | BIT(23); // ADD opcode
+			}
+
+			shift = get_group_rem(group, &offset);
+			if (shift < 24) {
+				offset >>= 24 - shift;
+				offset |= (shift + 8) << 7;
+			}
+			*(u32 *)loc = __opcode_to_mem_arm((tmp & ~0xfff) | offset);
+			break;
+
+		case R_ARM_LDR_PC_G2:
+			tmp = __mem_to_opcode_arm(*(u32 *)loc);
+			offset = tmp & 0xfff;
+			if (~tmp & BIT(23))		// U bit cleared?
+				offset = -offset;
+			offset += sym->st_value - loc;
+			if (offset < 0) {
+				offset = -offset;
+				tmp &= ~BIT(23);	// clear U bit
+			} else {
+				tmp |= BIT(23);		// set U bit
+			}
+			get_group_rem(2, &offset);
+
+			if (offset > 0xfff) {
+				pr_err("%s: section %u reloc %u sym '%s': relocation %u out of range (%#lx -> %#x)\n",
+				       module->name, relindex, i, symname,
+				       ELF32_R_TYPE(rel->r_info), loc,
+				       sym->st_value);
+				return -ENOEXEC;
+			}
+			*(u32 *)loc = __opcode_to_mem_arm((tmp & ~0xfff) | offset);
+			break;
+#endif
 
 		default:
 			pr_err("%s: unknown relocation: %u\n",
-- 
2.30.2



* [PATCH v5 12/32] ARM: assembler: add optimized ldr/str macros to load variables from memory
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 11/32] ARM: module: implement support for PC-relative group relocations Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 13/32] ARM: percpu: add SMP_ON_UP support Ard Biesheuvel
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

We will be adding variable loads to various hot paths, so it makes sense
to add a helper macro that can load variables from asm code without the
use of literal pool entries. On v7 or later, we can simply use MOVW/MOVT
pairs, but on earlier cores, this requires a bit of hackery to emit an
equivalent sequence of ADD/LDR instructions.
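
For example (illustrative usage only, not taken from this patch;
'some_global' is a hypothetical symbol), a global can be loaded and
stored without a literal pool entry:

	ldr_va	r0, jiffies			@ r0 = jiffies
	ldr_va	r0, jiffies, eq			@ conditional variant
	str_va	r0, some_global, r3		@ store, r3 as temp

On v7 and later this expands to MOVW/MOVT plus a single LDR/STR; on
earlier cores it becomes the SUB/SUB/LDR group relocation sequence
introduced in the previous patch.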

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig                 | 11 +++++
 arch/arm/include/asm/assembler.h | 48 ++++++++++++++++++--
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 99ac5d75dcec..9586636289d2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -139,6 +139,17 @@ config ARM
 	  Europe.  There is an ARM Linux project with a web page at
 	  <http://www.arm.linux.org.uk/>.
 
+config ARM_HAS_GROUP_RELOCS
+	def_bool y
+	depends on !LD_IS_LLD || LLD_VERSION >= 140000
+	depends on !COMPILE_TEST
+	help
+	  Whether or not to use R_ARM_ALU_PC_Gn or R_ARM_LDR_PC_Gn group
+	  relocations, which have been around for a long time, but were not
+	  supported in LLD until version 14. The combined range is -/+ 256 MiB,
+	  which is usually sufficient, but not for allyesconfig, so we disable
+	  this feature when doing compile testing.
+
 config ARM_HAS_SG_CHAIN
 	bool
 
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 7d23d4bb2168..7a4e292b68e4 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -564,12 +564,12 @@ THUMB(	orr	\reg , \reg , #PSR_T_BIT	)
 	/*
 	 * mov_l - move a constant value or [relocated] address into a register
 	 */
-	.macro		mov_l, dst:req, imm:req
+	.macro		mov_l, dst:req, imm:req, cond
 	.if		__LINUX_ARM_ARCH__ < 7
-	ldr		\dst, =\imm
+	ldr\cond	\dst, =\imm
 	.else
-	movw		\dst, #:lower16:\imm
-	movt		\dst, #:upper16:\imm
+	movw\cond	\dst, #:lower16:\imm
+	movt\cond	\dst, #:upper16:\imm
 	.endif
 	.endm
 
@@ -607,6 +607,46 @@ THUMB(	orr	\reg , \reg , #PSR_T_BIT	)
 	__adldst_l	str, \src, \sym, \tmp, \cond
 	.endm
 
+	.macro		__ldst_va, op, reg, tmp, sym, offset=0, cond
+#if __LINUX_ARM_ARCH__ >= 7 || \
+    !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \
+    (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
+	mov_l		\tmp, \sym, \cond
+#else
+	/*
+	 * Avoid a literal load, by emitting a sequence of ADD/LDR instructions
+	 * with the appropriate relocations. The combined sequence has a range
+	 * of -/+ 256 MiB, which should be sufficient for the core kernel and
+	 * for modules loaded into the module region.
+	 */
+	.globl		\sym
+	.reloc		.L0_\@, R_ARM_ALU_PC_G0_NC, \sym
+	.reloc		.L1_\@, R_ARM_ALU_PC_G1_NC, \sym
+	.reloc		.L2_\@, R_ARM_LDR_PC_G2, \sym
+.L0_\@: sub\cond	\tmp, pc, #8 - \offset
+.L1_\@: sub\cond	\tmp, \tmp, #4 - \offset
+#endif
+.L2_\@: \op\cond	\reg, [\tmp, #\offset]
+	.endm
+
+	/*
+	 * ldr_va - load a 32-bit word from the virtual address of \sym
+	 */
+	.macro		ldr_va, rd:req, sym:req, cond, tmp, offset
+	.ifb		\tmp
+	__ldst_va	ldr, \rd, \rd, \sym, \offset, \cond
+	.else
+	__ldst_va	ldr, \rd, \tmp, \sym, \offset, \cond
+	.endif
+	.endm
+
+	/*
+	 * str_va - store a 32-bit word to the virtual address of \sym
+	 */
+	.macro		str_va, rn:req, sym:req, tmp:req
+	__ldst_va	str, \rn, \tmp, \sym
+	.endm
+
 	/*
 	 * rev_l - byte-swap a 32-bit value
 	 *
-- 
2.30.2



* [PATCH v5 13/32] ARM: percpu: add SMP_ON_UP support
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 12/32] ARM: assembler: add optimized ldr/str macros to load variables from memory Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 14/32] ARM: use TLS register for 'current' on !SMP as well Ard Biesheuvel
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Permit the use of the TPIDRPRW system register for carrying the per-CPU
offset in generic SMP configurations that also target non-SMP capable
ARMv6 cores. This uses the SMP_ON_UP code patching framework to turn all
TPIDRPRW accesses into reads/writes of entry #0 in the __per_cpu_offset
array.

While at it, switch over some existing direct TPIDRPRW accesses in asm
code to invocations of a new helper that is patched in the same way when
necessary.

Note that CPU_V6+SMP without SMP_ON_UP results in a kernel that does not
boot on v6 CPUs without SMP extensions, so add this dependency to
Kconfig as well.
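
Conceptually, the patched per-CPU offset access behaves as follows
(C-level sketch only; the real implementation is the SMP_ON_UP asm
patching in this patch, and 'cpu_is_smp_capable' is an illustrative
stand-in for the patching decision made once at boot):

	static inline unsigned long my_cpu_offset(void)
	{
		extern unsigned long __per_cpu_offset[];
		unsigned long off;

		if (cpu_is_smp_capable()) {		/* patched at boot */
			/* read TPIDRPRW */
			asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off));
			return off;
		}
		return __per_cpu_offset[0];		/* UP: entry #0 */
	}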

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/assembler.h | 61 +++++++++++++++++++-
 arch/arm/include/asm/insn.h      | 17 ++++++
 arch/arm/include/asm/percpu.h    | 36 ++++++++++--
 arch/arm/mm/Kconfig              |  1 +
 4 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 7a4e292b68e4..30752c4427d4 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -216,9 +216,7 @@
 
 	.macro	reload_current, t1:req, t2:req
 #ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
-	adr_l	\t1, __entry_task		@ get __entry_task base address
-	mrc	p15, 0, \t2, c13, c0, 4		@ get per-CPU offset
-	ldr	\t1, [\t1, \t2]			@ load variable
+	ldr_this_cpu \t1, __entry_task, \t1, \t2
 	mcr	p15, 0, \t1, c13, c0, 3		@ store in TPIDRURO
 #endif
 	.endm
@@ -308,6 +306,26 @@
 #define ALT_UP_B(label) b label
 #endif
 
+	/*
+	 * this_cpu_offset - load the per-CPU offset of this CPU into
+	 * 		     register 'rd'
+	 */
+	.macro		this_cpu_offset, rd:req
+#ifdef CONFIG_SMP
+ALT_SMP(mrc		p15, 0, \rd, c13, c0, 4)
+#ifdef CONFIG_CPU_V6
+ALT_UP_B(.L0_\@)
+	.subsection	1
+.L0_\@: ldr_va		\rd, __per_cpu_offset
+	b		.L1_\@
+	.previous
+.L1_\@:
+#endif
+#else
+	mov		\rd, #0
+#endif
+	.endm
+
 /*
  * Instruction barrier
  */
@@ -647,6 +665,43 @@ THUMB(	orr	\reg , \reg , #PSR_T_BIT	)
 	__ldst_va	str, \rn, \tmp, \sym
 	.endm
 
+	/*
+	 * ldr_this_cpu_armv6 - Load a 32-bit word from the per-CPU variable 'sym',
+	 *			without using a temp register. Supported in ARM mode
+	 *			only.
+	 */
+	.macro		ldr_this_cpu_armv6, rd:req, sym:req
+	this_cpu_offset	\rd
+	.globl		\sym
+	.reloc		.L0_\@, R_ARM_ALU_PC_G0_NC, \sym
+	.reloc		.L1_\@, R_ARM_ALU_PC_G1_NC, \sym
+	.reloc		.L2_\@, R_ARM_LDR_PC_G2, \sym
+	add		\rd, \rd, pc
+.L0_\@: sub		\rd, \rd, #4
+.L1_\@: sub		\rd, \rd, #0
+.L2_\@: ldr		\rd, [\rd, #4]
+	.endm
+
+	/*
+	 * ldr_this_cpu - Load a 32-bit word from the per-CPU variable 'sym'
+	 *		  into register 'rd', which may be the stack pointer,
+	 *		  using 't1' and 't2' as general temp registers. These
+	 *		  are permitted to overlap with 'rd' if != sp
+	 */
+	.macro		ldr_this_cpu, rd:req, sym:req, t1:req, t2:req
+#ifndef CONFIG_SMP
+	ldr_va		\rd, \sym,, \t1			@ CPU offset == 0x0
+#elif __LINUX_ARM_ARCH__ >= 7 || \
+      !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \
+      (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
+	this_cpu_offset	\t1
+	mov_l		\t2, \sym
+	ldr		\rd, [\t1, \t2]
+#else
+	ldr_this_cpu_armv6 \rd, \sym
+#endif
+	.endm
+
 	/*
 	 * rev_l - byte-swap a 32-bit value
 	 *
diff --git a/arch/arm/include/asm/insn.h b/arch/arm/include/asm/insn.h
index 5475cbf9fb6b..faf3d1c28368 100644
--- a/arch/arm/include/asm/insn.h
+++ b/arch/arm/include/asm/insn.h
@@ -2,6 +2,23 @@
 #ifndef __ASM_ARM_INSN_H
 #define __ASM_ARM_INSN_H
 
+#include <linux/types.h>
+
+/*
+ * Avoid a literal load by emitting a sequence of ADD/LDR instructions with the
+ * appropriate relocations. The combined sequence has a range of -/+ 256 MiB,
+ * which should be sufficient for the core kernel as well as modules loaded
+ * into the module region. (Not supported by LLD before release 14)
+ */
+#define LOAD_SYM_ARMV6(reg, sym)					\
+	"	.globl	" #sym "				\n\t"	\
+	"	.reloc	10f, R_ARM_ALU_PC_G0_NC, " #sym "	\n\t"	\
+	"	.reloc	11f, R_ARM_ALU_PC_G1_NC, " #sym "	\n\t"	\
+	"	.reloc	12f, R_ARM_LDR_PC_G2, " #sym "		\n\t"	\
+	"10:	sub	" #reg ", pc, #8			\n\t"	\
+	"11:	sub	" #reg ", " #reg ", #4			\n\t"	\
+	"12:	ldr	" #reg ", [" #reg ", #0]		\n\t"
+
 static inline unsigned long
 arm_gen_nop(void)
 {
diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
index e2fcb3cfd3de..7feba9d65e85 100644
--- a/arch/arm/include/asm/percpu.h
+++ b/arch/arm/include/asm/percpu.h
@@ -5,20 +5,27 @@
 #ifndef _ASM_ARM_PERCPU_H_
 #define _ASM_ARM_PERCPU_H_
 
+#include <asm/insn.h>
+
 register unsigned long current_stack_pointer asm ("sp");
 
 /*
  * Same as asm-generic/percpu.h, except that we store the per cpu offset
  * in the TPIDRPRW. TPIDRPRW only exists on V6K and V7
  */
-#if defined(CONFIG_SMP) && !defined(CONFIG_CPU_V6)
+#ifdef CONFIG_SMP
 static inline void set_my_cpu_offset(unsigned long off)
 {
+	extern unsigned int smp_on_up;
+
+	if (IS_ENABLED(CONFIG_CPU_V6) && !smp_on_up)
+		return;
+
 	/* Set TPIDRPRW */
 	asm volatile("mcr p15, 0, %0, c13, c0, 4" : : "r" (off) : "memory");
 }
 
-static inline unsigned long __my_cpu_offset(void)
+static __always_inline unsigned long __my_cpu_offset(void)
 {
 	unsigned long off;
 
@@ -27,8 +34,29 @@ static inline unsigned long __my_cpu_offset(void)
 	 * We want to allow caching the value, so avoid using volatile and
 	 * instead use a fake stack read to hazard against barrier().
 	 */
-	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off)
-		: "Q" (*(const unsigned long *)current_stack_pointer));
+	asm("0:	mrc p15, 0, %0, c13, c0, 4			\n\t"
+#ifdef CONFIG_CPU_V6
+	    "1:							\n\t"
+	    "	.subsection 1					\n\t"
+#if defined(CONFIG_ARM_HAS_GROUP_RELOCS) && \
+    !(defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
+	    "2: " LOAD_SYM_ARMV6(%0, __per_cpu_offset) "	\n\t"
+	    "	b	1b					\n\t"
+#else
+	    "2: ldr	%0, 3f					\n\t"
+	    "	ldr	%0, [%0]				\n\t"
+	    "	b	1b					\n\t"
+	    "3:	.long	__per_cpu_offset			\n\t"
+#endif
+	    "	.previous					\n\t"
+	    "	.pushsection \".alt.smp.init\", \"a\"		\n\t"
+	    "	.align	2					\n\t"
+	    "	.long	0b - .					\n\t"
+	    "	b	. + (2b - 0b)				\n\t"
+	    "	.popsection					\n\t"
+#endif
+	    : "=r" (off)
+	    : "Q" (*(const unsigned long *)current_stack_pointer));
 
 	return off;
 }
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 58afba346729..a91ff22c6c2e 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -386,6 +386,7 @@ config CPU_V6
 	select CPU_PABRT_V6
 	select CPU_THUMB_CAPABLE
 	select CPU_TLB_V6 if MMU
+	select SMP_ON_UP if SMP
 
 # ARMv6k
 config CPU_V6K
-- 
2.30.2



* [PATCH v5 14/32] ARM: use TLS register for 'current' on !SMP as well
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 13/32] ARM: percpu: add SMP_ON_UP support Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 15/32] ARM: smp: defer TPIDRURO update for SMP v6 configurations too Ard Biesheuvel
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Enable the use of the TLS register to hold the 'current' pointer also on
non-SMP configurations that target v6k or later CPUs. This will permit
the use of THREAD_INFO_IN_TASK as well as IRQ stacks and vmap'ed stacks
for such configurations.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9586636289d2..0e1b93de10b4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1163,7 +1163,7 @@ config SMP_ON_UP
 
 config CURRENT_POINTER_IN_TPIDRURO
 	def_bool y
-	depends on SMP && CPU_32v6K && !CPU_V6
+	depends on CPU_32v6K && !CPU_V6
 
 config ARM_CPU_TOPOLOGY
 	bool "Support cpu topology definition"
-- 
2.30.2



* [PATCH v5 15/32] ARM: smp: defer TPIDRURO update for SMP v6 configurations too
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 14/32] ARM: use TLS register for 'current' on !SMP as well Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 16/32] ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems Ard Biesheuvel
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Defer TPIDRURO updates for user space until exit also for CPU_V6+SMP
configurations so that we can decide at runtime whether to use it to
carry the current pointer, provided that we are running on a CPU that
actually implements this register. This is needed for
THREAD_INFO_IN_TASK support for UP systems, which requires that all SMP
capable systems use the TPIDRURO based access to 'current', as the only
remaining alternative is a global variable, which only works on UP.

Given that SMP implies support for HWCAP_TLS, we can patch away the
hwcap test entirely from the context switch path rather than just the
TPIDRURO assignment when running on SMP hardware.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/tls.h     | 30 +++++++++++++-------
 arch/arm/kernel/entry-header.S |  8 +++++-
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/tls.h b/arch/arm/include/asm/tls.h
index c3296499176c..de254347acf6 100644
--- a/arch/arm/include/asm/tls.h
+++ b/arch/arm/include/asm/tls.h
@@ -18,21 +18,31 @@
 	.endm
 
 	.macro switch_tls_v6, base, tp, tpuser, tmp1, tmp2
-	ldr	\tmp1, =elf_hwcap
-	ldr	\tmp1, [\tmp1, #0]
+#ifdef CONFIG_SMP
+ALT_SMP(nop)
+ALT_UP_B(.L0_\@)
+	.subsection 1
+#endif
+.L0_\@: ldr_va	\tmp1, elf_hwcap
 	mov	\tmp2, #0xffff0fff
 	tst	\tmp1, #HWCAP_TLS		@ hardware TLS available?
 	streq	\tp, [\tmp2, #-15]		@ set TLS value at 0xffff0ff0
-	mrcne	p15, 0, \tmp2, c13, c0, 2	@ get the user r/w register
-	mcrne	p15, 0, \tp, c13, c0, 3		@ yes, set TLS register
-	mcrne	p15, 0, \tpuser, c13, c0, 2	@ set user r/w register
-	strne	\tmp2, [\base, #TI_TP_VALUE + 4] @ save it
+	beq	.L2_\@
+	mcr	p15, 0, \tp, c13, c0, 3		@ yes, set TLS register
+#ifdef CONFIG_SMP
+	b	.L1_\@
+	.previous
+#endif
+.L1_\@: switch_tls_v6k \base, \tp, \tpuser, \tmp1, \tmp2
+.L2_\@:
 	.endm
 
 	.macro switch_tls_software, base, tp, tpuser, tmp1, tmp2
 	mov	\tmp1, #0xffff0fff
 	str	\tp, [\tmp1, #-15]		@ set TLS value at 0xffff0ff0
 	.endm
+#else
+#include <asm/smp_plat.h>
 #endif
 
 #ifdef CONFIG_TLS_REG_EMUL
@@ -43,7 +53,7 @@
 #elif defined(CONFIG_CPU_V6)
 #define tls_emu		0
 #define has_tls_reg		(elf_hwcap & HWCAP_TLS)
-#define defer_tls_reg_update	0
+#define defer_tls_reg_update	is_smp()
 #define switch_tls	switch_tls_v6
 #elif defined(CONFIG_CPU_32v6K)
 #define tls_emu		0
@@ -81,11 +91,11 @@ static inline void set_tls(unsigned long val)
 	 */
 	barrier();
 
-	if (!tls_emu && !defer_tls_reg_update) {
-		if (has_tls_reg) {
+	if (!tls_emu) {
+		if (has_tls_reg && !defer_tls_reg_update) {
 			asm("mcr p15, 0, %0, c13, c0, 3"
 			    : : "r" (val));
-		} else {
+		} else if (!has_tls_reg) {
 #ifdef CONFIG_KUSER_HELPERS
 			/*
 			 * User space must never try to access this
diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index ae24dd54e9ef..da206bd4f194 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -292,12 +292,18 @@
 
 
 	.macro	restore_user_regs, fast = 0, offset = 0
-#if defined(CONFIG_CPU_32v6K) && !defined(CONFIG_CPU_V6)
+#if defined(CONFIG_CPU_32v6K) && \
+    (!defined(CONFIG_CPU_V6) || defined(CONFIG_SMP))
+#ifdef CONFIG_CPU_V6
+ALT_SMP(nop)
+ALT_UP_B(.L1_\@)
+#endif
 	@ The TLS register update is deferred until return to user space so we
 	@ can use it for other things while running in the kernel
 	get_thread_info r1
 	ldr	r1, [r1, #TI_TP_VALUE]
 	mcr	p15, 0, r1, c13, c0, 3		@ set TLS register
+.L1_\@:
 #endif
 
 	uaccess_enable r1, isb=0
-- 
2.30.2



* [PATCH v5 16/32] ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 15/32] ARM: smp: defer TPIDRURO update for SMP v6 configurations too Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 17/32] ARM: assembler: introduce bl_r macro Ard Biesheuvel
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

On UP systems, only a single task can be 'current' at the same time,
which means we can use a global variable to track it. This means we can
also enable THREAD_INFO_IN_TASK for those systems, as in that case,
thread_info is accessed via current rather than the other way around,
removing the need to store thread_info at the base of the task stack.
This, in turn, permits us to enable IRQ stacks and vmap'ed stacks on UP
systems as well.

To partially mitigate the performance overhead of this arrangement, use
an ADD/ADD/LDR sequence with the appropriate PC-relative group
relocations to load the value of current when needed. This means that
accessing current will still only require a single load as before,
avoiding the need for a literal to carry the address of the global
variable in each function. However, accessing thread_info will now
require this load as well.
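
In other words, on UP configurations without the TLS register, the
accessors reduce to the following (simplified sketch; the actual patch
open-codes the load in asm so the group relocations can be used):

	struct task_struct *__current;	/* written by __switch_to */

	static inline struct task_struct *get_current(void)
	{
		return __current;	/* one PC-relative ADD/ADD/LDR */
	}

	#define current get_current()

and since thread_info is the first member of the task struct,
current_thread_info() becomes a cast of the same pointer.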

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig                   |  4 +-
 arch/arm/include/asm/assembler.h   | 83 +++++++++++++-------
 arch/arm/include/asm/current.h     | 47 +++++++----
 arch/arm/include/asm/switch_to.h   |  3 +-
 arch/arm/include/asm/thread_info.h | 27 -------
 arch/arm/kernel/asm-offsets.c      |  3 -
 arch/arm/kernel/entry-armv.S       |  9 ++-
 arch/arm/kernel/entry-header.S     |  2 +-
 arch/arm/kernel/entry-v7m.S        | 10 ++-
 arch/arm/kernel/head-common.S      |  4 +-
 arch/arm/kernel/process.c          |  7 +-
 arch/arm/kernel/smp.c              |  6 ++
 12 files changed, 115 insertions(+), 90 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0e1b93de10b4..108a7a872084 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -127,7 +127,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
-	select THREAD_INFO_IN_TASK if CURRENT_POINTER_IN_TPIDRURO
+	select THREAD_INFO_IN_TASK
 	select TRACE_IRQFLAGS_SUPPORT if !CPU_V7M
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
@@ -1612,7 +1612,7 @@ config CC_HAVE_STACKPROTECTOR_TLS
 
 config STACKPROTECTOR_PER_TASK
 	bool "Use a unique stack canary value for each task"
-	depends on STACKPROTECTOR && THREAD_INFO_IN_TASK && !XIP_DEFLATED_DATA
+	depends on STACKPROTECTOR && CURRENT_POINTER_IN_TPIDRURO && !XIP_DEFLATED_DATA
 	depends on GCC_PLUGINS || CC_HAVE_STACKPROTECTOR_TLS
 	select GCC_PLUGIN_ARM_SSP_PER_TASK if !CC_HAVE_STACKPROTECTOR_TLS
 	default y
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 30752c4427d4..bf304596f87e 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -199,41 +199,12 @@
 	.endm
 	.endr
 
-	.macro	get_current, rd
-#ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
-	mrc	p15, 0, \rd, c13, c0, 3		@ get TPIDRURO register
-#else
-	get_thread_info \rd
-	ldr	\rd, [\rd, #TI_TASK]
-#endif
-	.endm
-
-	.macro	set_current, rn
-#ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
-	mcr	p15, 0, \rn, c13, c0, 3		@ set TPIDRURO register
-#endif
-	.endm
-
-	.macro	reload_current, t1:req, t2:req
-#ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
-	ldr_this_cpu \t1, __entry_task, \t1, \t2
-	mcr	p15, 0, \t1, c13, c0, 3		@ store in TPIDRURO
-#endif
-	.endm
-
 /*
  * Get current thread_info.
  */
 	.macro	get_thread_info, rd
-#ifdef CONFIG_THREAD_INFO_IN_TASK
 	/* thread_info is the first member of struct task_struct */
 	get_current \rd
-#else
- ARM(	mov	\rd, sp, lsr #THREAD_SIZE_ORDER + PAGE_SHIFT	)
- THUMB(	mov	\rd, sp			)
- THUMB(	lsr	\rd, \rd, #THREAD_SIZE_ORDER + PAGE_SHIFT	)
-	mov	\rd, \rd, lsl #THREAD_SIZE_ORDER + PAGE_SHIFT
-#endif
 	.endm
 
 /*
@@ -326,6 +297,60 @@ ALT_UP_B(.L0_\@)
 #endif
 	.endm
 
+	/*
+	 * set_current - store the task pointer of this CPU's current task
+	 */
+	.macro		set_current, rn:req, tmp:req
+#if defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
+9998:	mcr		p15, 0, \rn, c13, c0, 3		@ set TPIDRURO register
+#ifdef CONFIG_CPU_V6
+ALT_UP_B(.L0_\@)
+	.subsection	1
+.L0_\@: str_va		\rn, __current, \tmp
+	b		.L1_\@
+	.previous
+.L1_\@:
+#endif
+#else
+	str_va		\rn, __current, \tmp
+#endif
+	.endm
+
+	/*
+	 * get_current - load the task pointer of this CPU's current task
+	 */
+	.macro		get_current, rd:req
+#if defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
+9998:	mrc		p15, 0, \rd, c13, c0, 3		@ get TPIDRURO register
+#ifdef CONFIG_CPU_V6
+ALT_UP_B(.L0_\@)
+	.subsection	1
+.L0_\@: ldr_va		\rd, __current
+	b		.L1_\@
+	.previous
+.L1_\@:
+#endif
+#else
+	ldr_va		\rd, __current
+#endif
+	.endm
+
+	/*
+	 * reload_current - reload the task pointer of this CPU's current task
+	 *		    into the TLS register
+	 */
+	.macro		reload_current, t1:req, t2:req
+#if defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
+#ifdef CONFIG_CPU_V6
+ALT_SMP(nop)
+ALT_UP_B(.L0_\@)
+#endif
+	ldr_this_cpu	\t1, __entry_task, \t1, \t2
+	mcr		p15, 0, \t1, c13, c0, 3		@ store in TPIDRURO
+.L0_\@:
+#endif
+	.endm
+
 /*
  * Instruction barrier
  */
diff --git a/arch/arm/include/asm/current.h b/arch/arm/include/asm/current.h
index 6bf0aad672c3..c03706869384 100644
--- a/arch/arm/include/asm/current.h
+++ b/arch/arm/include/asm/current.h
@@ -8,25 +8,18 @@
 #define _ASM_ARM_CURRENT_H
 
 #ifndef __ASSEMBLY__
+#include <asm/insn.h>
 
 struct task_struct;
 
-static inline void set_current(struct task_struct *cur)
-{
-	if (!IS_ENABLED(CONFIG_CURRENT_POINTER_IN_TPIDRURO))
-		return;
-
-	/* Set TPIDRURO */
-	asm("mcr p15, 0, %0, c13, c0, 3" :: "r"(cur) : "memory");
-}
-
-#ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
+extern struct task_struct *__current;
 
-static inline struct task_struct *get_current(void)
+static __always_inline __attribute_const__ struct task_struct *get_current(void)
 {
 	struct task_struct *cur;
 
 #if __has_builtin(__builtin_thread_pointer) && \
+    defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) && \
     !(defined(CONFIG_THUMB2_KERNEL) && \
       defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 130001)
 	/*
@@ -39,16 +32,40 @@ static inline struct task_struct *get_current(void)
 	 * https://github.com/ClangBuiltLinux/linux/issues/1485
 	 */
 	cur = __builtin_thread_pointer();
+#elif defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
+	asm("0:	mrc p15, 0, %0, c13, c0, 3			\n\t"
+#ifdef CONFIG_CPU_V6
+	    "1:							\n\t"
+	    "	.subsection 1					\n\t"
+#if defined(CONFIG_ARM_HAS_GROUP_RELOCS) && \
+    !(defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
+	    "2: " LOAD_SYM_ARMV6(%0, __current) "		\n\t"
+	    "	b	1b					\n\t"
 #else
-	asm("mrc p15, 0, %0, c13, c0, 3" : "=r"(cur));
+	    "2:	ldr	%0, 3f					\n\t"
+	    "	ldr	%0, [%0]				\n\t"
+	    "	b	1b					\n\t"
+	    "3:	.long	__current				\n\t"
+#endif
+	    "	.previous					\n\t"
+	    "	.pushsection \".alt.smp.init\", \"a\"		\n\t"
+	    "	.align	2					\n\t"
+	    "	.long	0b - .					\n\t"
+	    "	b	. + (2b - 0b)				\n\t"
+	    "	.popsection					\n\t"
+#endif
+	    : "=r"(cur));
+#elif __LINUX_ARM_ARCH__ >= 7 || \
+      !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \
+      (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
+	cur = __current;
+#else
+	asm(LOAD_SYM_ARMV6(%0, __current) : "=r"(cur));
 #endif
 	return cur;
 }
 
 #define current get_current()
-#else
-#include <asm-generic/current.h>
-#endif /* CONFIG_CURRENT_POINTER_IN_TPIDRURO */
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/arm/include/asm/switch_to.h b/arch/arm/include/asm/switch_to.h
index 61e4a3c4ca6e..9372348516ce 100644
--- a/arch/arm/include/asm/switch_to.h
+++ b/arch/arm/include/asm/switch_to.h
@@ -3,6 +3,7 @@
 #define __ASM_ARM_SWITCH_TO_H
 
 #include <linux/thread_info.h>
+#include <asm/smp_plat.h>
 
 /*
  * For v7 SMP cores running a preemptible kernel we may be pre-empted
@@ -26,7 +27,7 @@ extern struct task_struct *__switch_to(struct task_struct *, struct thread_info
 #define switch_to(prev,next,last)					\
 do {									\
 	__complete_pending_tlbi();					\
-	if (IS_ENABLED(CONFIG_CURRENT_POINTER_IN_TPIDRURO))		\
+	if (IS_ENABLED(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || is_smp())	\
 		__this_cpu_write(__entry_task, next);			\
 	last = __switch_to(prev,task_thread_info(prev), task_thread_info(next));	\
 } while (0)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 164e15f26485..e039d8f12d9b 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -54,9 +54,6 @@ struct cpu_context_save {
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	int			preempt_count;	/* 0 => preemptable, <0 => bug */
-#ifndef CONFIG_THREAD_INFO_IN_TASK
-	struct task_struct	*task;		/* main task structure */
-#endif
 	__u32			cpu;		/* cpu */
 	__u32			cpu_domain;	/* cpu domain */
 	struct cpu_context_save	cpu_context;	/* cpu context */
@@ -72,39 +69,15 @@ struct thread_info {
 
 #define INIT_THREAD_INFO(tsk)						\
 {									\
-	INIT_THREAD_INFO_TASK(tsk)					\
 	.flags		= 0,						\
 	.preempt_count	= INIT_PREEMPT_COUNT,				\
 }
 
-#ifdef CONFIG_THREAD_INFO_IN_TASK
-#define INIT_THREAD_INFO_TASK(tsk)
-
 static inline struct task_struct *thread_task(struct thread_info* ti)
 {
 	return (struct task_struct *)ti;
 }
 
-#else
-#define INIT_THREAD_INFO_TASK(tsk)	.task = &(tsk),
-
-static inline struct task_struct *thread_task(struct thread_info* ti)
-{
-	return ti->task;
-}
-
-/*
- * how to get the thread information struct from C
- */
-static inline struct thread_info *current_thread_info(void) __attribute_const__;
-
-static inline struct thread_info *current_thread_info(void)
-{
-	return (struct thread_info *)
-		(current_stack_pointer & ~(THREAD_SIZE - 1));
-}
-#endif
-
 #define thread_saved_pc(tsk)	\
 	((unsigned long)(task_thread_info(tsk)->cpu_context.pc))
 #define thread_saved_sp(tsk)	\
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 645845e4982a..2c8d76fd7c66 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -43,9 +43,6 @@ int main(void)
   BLANK();
   DEFINE(TI_FLAGS,		offsetof(struct thread_info, flags));
   DEFINE(TI_PREEMPT,		offsetof(struct thread_info, preempt_count));
-#ifndef CONFIG_THREAD_INFO_IN_TASK
-  DEFINE(TI_TASK,		offsetof(struct thread_info, task));
-#endif
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 5e01a34369a0..2f912c509e0d 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -746,12 +746,13 @@ ENTRY(__switch_to)
 	switch_tls r1, r4, r5, r3, r7
 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP) && \
     !defined(CONFIG_STACKPROTECTOR_PER_TASK)
-	ldr	r9, [r2, #TI_TASK]
 	ldr	r8, =__stack_chk_guard
 	.if (TSK_STACK_CANARY > IMM12_MASK)
-	add	r9, r9, #TSK_STACK_CANARY & ~IMM12_MASK
-	.endif
+	add	r9, r2, #TSK_STACK_CANARY & ~IMM12_MASK
 	ldr	r9, [r9, #TSK_STACK_CANARY & IMM12_MASK]
+	.else
+	ldr	r9, [r2, #TSK_STACK_CANARY & IMM12_MASK]
+	.endif
 #endif
 	mov	r7, r2				@ Preserve 'next'
 #ifdef CONFIG_CPU_USE_DOMAINS
@@ -768,7 +769,7 @@ ENTRY(__switch_to)
 #endif
  THUMB(	mov	ip, r4			   )
 	mov	r0, r5
-	set_current r7
+	set_current r7, r8
  ARM(	ldmia	r4, {r4 - sl, fp, sp, pc}  )	@ Load all regs saved previously
  THUMB(	ldmia	ip!, {r4 - sl, fp}	   )	@ Load all regs saved previously
  THUMB(	ldr	sp, [ip], #4		   )
diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index da206bd4f194..9f01b229841a 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -300,7 +300,7 @@ ALT_UP_B(.L1_\@)
 #endif
 	@ The TLS register update is deferred until return to user space so we
 	@ can use it for other things while running in the kernel
-	get_thread_info r1
+	mrc	p15, 0, r1, c13, c0, 3		@ get current_thread_info
 	ldr	r1, [r1, #TI_TP_VALUE]
 	mcr	p15, 0, r1, c13, c0, 3		@ set TLS register
 .L1_\@:
diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
index 520dd43e7e08..4e0d318b67c6 100644
--- a/arch/arm/kernel/entry-v7m.S
+++ b/arch/arm/kernel/entry-v7m.S
@@ -97,15 +97,17 @@ ENTRY(__switch_to)
 	str	sp, [ip], #4
 	str	lr, [ip], #4
 	mov	r5, r0
+	mov	r6, r2			@ Preserve 'next'
 	add	r4, r2, #TI_CPU_SAVE
 	ldr	r0, =thread_notify_head
 	mov	r1, #THREAD_NOTIFY_SWITCH
 	bl	atomic_notifier_call_chain
-	mov	ip, r4
 	mov	r0, r5
-	ldmia	ip!, {r4 - r11}		@ Load all regs saved previously
-	ldr	sp, [ip]
-	ldr	pc, [ip, #4]!
+	mov	r1, r6
+	ldmia	r4, {r4 - r12, lr}	@ Load all regs saved previously
+	set_current r1, r2
+	mov	sp, ip
+	bx	lr
 	.fnend
 ENDPROC(__switch_to)
 
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index da18e0a17dc2..42cae73fcc19 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -105,10 +105,8 @@ __mmap_switched:
 	mov	r1, #0
 	bl	__memset			@ clear .bss
 
-#ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
 	adr_l	r0, init_task			@ get swapper task_struct
-	set_current r0
-#endif
+	set_current r0, r1
 
 	ldmia	r4, {r0, r1, r2, r3}
 	str	r9, [r0]			@ Save processor ID
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index d47159f3791c..0617af11377f 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -36,7 +36,7 @@
 
 #include "signal.h"
 
-#ifdef CONFIG_CURRENT_POINTER_IN_TPIDRURO
+#if defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
 DEFINE_PER_CPU(struct task_struct *, __entry_task);
 #endif
 
@@ -46,6 +46,11 @@ unsigned long __stack_chk_guard __read_mostly;
 EXPORT_SYMBOL(__stack_chk_guard);
 #endif
 
+#ifndef CONFIG_CURRENT_POINTER_IN_TPIDRURO
+asmlinkage struct task_struct *__current;
+EXPORT_SYMBOL(__current);
+#endif
+
 static const char *processor_modes[] __maybe_unused = {
   "USER_26", "FIQ_26" , "IRQ_26" , "SVC_26" , "UK4_26" , "UK5_26" , "UK6_26" , "UK7_26" ,
   "UK8_26" , "UK9_26" , "UK10_26", "UK11_26", "UK12_26", "UK13_26", "UK14_26", "UK15_26",
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index ed2b168ff46c..73fc645fc4c7 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -400,6 +400,12 @@ static void smp_store_cpu_info(unsigned int cpuid)
 	check_cpu_icache_size(cpuid);
 }
 
+static void set_current(struct task_struct *cur)
+{
+	/* Set TPIDRURO */
+	asm("mcr p15, 0, %0, c13, c0, 3" :: "r"(cur) : "memory");
+}
+
 /*
  * This is the secondary CPU boot entry.  We're using this CPUs
  * idle thread stack, but a set of temporary page tables.
-- 
2.30.2



* [PATCH v5 17/32] ARM: assembler: introduce bl_r macro
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 16/32] ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 18/32] ARM: unwind: support unwinding across multiple stacks Ard Biesheuvel
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Add a bl_r macro that abstracts the difference between the ways indirect
calls are performed on older and newer ARM architecture revisions.

The main difference is to prefer blx instructions over explicit LR
assignments when possible, as the latter tend to confuse the prediction
logic in out-of-order cores when speculating across a function return.
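
For example, an indirect call through r1 (illustrative):

	bl_r	r1		@ v6+:    blx	r1
				@ pre-v6: mov	lr, pc; mov pc, r1

Note that the pre-v6 fallback relies on ARM mode's 'PC reads as current
instruction + 8' semantics to set up the return address correctly.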

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/assembler.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index bf304596f87e..7242e9a56650 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -744,4 +744,19 @@ THUMB(	orr	\reg , \reg , #PSR_T_BIT	)
 	.endif
 	.endm
 
+	/*
+	 * bl_r - branch and link to register
+	 *
+	 * @dst: target to branch to
+	 * @c: conditional opcode suffix
+	 */
+	.macro		bl_r, dst:req, c
+	.if		__LINUX_ARM_ARCH__ < 6
+	mov\c		lr, pc
+	mov\c		pc, \dst
+	.else
+	blx\c		\dst
+	.endif
+	.endm
+
 #endif /* __ASM_ASSEMBLER_H__ */
-- 
2.30.2



* [PATCH v5 18/32] ARM: unwind: support unwinding across multiple stacks
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 17/32] ARM: assembler: introduce bl_r macro Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 19/32] ARM: export dump_mem() to other objects Ard Biesheuvel
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Implement support in the unwinder for dealing with multiple stacks.
This will be needed once we add support for IRQ stacks, or for the
overflow stack used by the vmap'ed stacks code.

This involves tracking the unwind opcodes that either update the virtual
stack pointer from another virtual register, or perform an explicit
subtract on the virtual stack pointer, and updating the low and high
bounds that we use to sanitize the stack pointer accordingly.
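
For instance, when the unwinder executes an opcode that assigns the
virtual SP from another register (which is how an exception handler
unwinds from the IRQ stack back onto the interrupted task stack), both
bounds are re-derived from the new value instead of being kept from the
starting frame (sketch of the logic this patch adds to the 0x90 opcode
case):

	/* 'vsp = r<N>': we may have switched stacks */
	ctrl->vrs[SP] = ctrl->vrs[insn & 0x0f];
	ctrl->sp_low  = ctrl->vrs[SP];
	ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);

so the upper bound is simply the next THREAD_SIZE boundary above the
new stack pointer value.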

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/kernel/unwind.c | 25 +++++++++++++-------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 59fdf257bf8b..9cb9af3fc433 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -52,6 +52,7 @@ EXPORT_SYMBOL(__aeabi_unwind_cpp_pr2);
 struct unwind_ctrl_block {
 	unsigned long vrs[16];		/* virtual register set */
 	const unsigned long *insn;	/* pointer to the current instructions word */
+	unsigned long sp_low;		/* lowest value of sp allowed */
 	unsigned long sp_high;		/* highest value of sp allowed */
 	/*
 	 * 1 : check for stack overflow for each register pop.
@@ -256,8 +257,12 @@ static int unwind_exec_pop_subset_r4_to_r13(struct unwind_ctrl_block *ctrl,
 		mask >>= 1;
 		reg++;
 	}
-	if (!load_sp)
+	if (!load_sp) {
 		ctrl->vrs[SP] = (unsigned long)vsp;
+	} else {
+		ctrl->sp_low = ctrl->vrs[SP];
+		ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);
+	}
 
 	return URC_OK;
 }
@@ -313,9 +318,10 @@ static int unwind_exec_insn(struct unwind_ctrl_block *ctrl)
 
 	if ((insn & 0xc0) == 0x00)
 		ctrl->vrs[SP] += ((insn & 0x3f) << 2) + 4;
-	else if ((insn & 0xc0) == 0x40)
+	else if ((insn & 0xc0) == 0x40) {
 		ctrl->vrs[SP] -= ((insn & 0x3f) << 2) + 4;
-	else if ((insn & 0xf0) == 0x80) {
+		ctrl->sp_low = ctrl->vrs[SP];
+	} else if ((insn & 0xf0) == 0x80) {
 		unsigned long mask;
 
 		insn = (insn << 8) | unwind_get_byte(ctrl);
@@ -330,9 +336,11 @@ static int unwind_exec_insn(struct unwind_ctrl_block *ctrl)
 		if (ret)
 			goto error;
 	} else if ((insn & 0xf0) == 0x90 &&
-		   (insn & 0x0d) != 0x0d)
+		   (insn & 0x0d) != 0x0d) {
 		ctrl->vrs[SP] = ctrl->vrs[insn & 0x0f];
-	else if ((insn & 0xf0) == 0xa0) {
+		ctrl->sp_low = ctrl->vrs[SP];
+		ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);
+	} else if ((insn & 0xf0) == 0xa0) {
 		ret = unwind_exec_pop_r4_to_rN(ctrl, insn);
 		if (ret)
 			goto error;
@@ -375,13 +383,12 @@ static int unwind_exec_insn(struct unwind_ctrl_block *ctrl)
  */
 int unwind_frame(struct stackframe *frame)
 {
-	unsigned long low;
 	const struct unwind_idx *idx;
 	struct unwind_ctrl_block ctrl;
 
 	/* store the highest address on the stack to avoid crossing it*/
-	low = frame->sp;
-	ctrl.sp_high = ALIGN(low, THREAD_SIZE);
+	ctrl.sp_low = frame->sp;
+	ctrl.sp_high = ALIGN(ctrl.sp_low, THREAD_SIZE);
 
 	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
 		 frame->pc, frame->lr, frame->sp);
@@ -437,7 +444,7 @@ int unwind_frame(struct stackframe *frame)
 		urc = unwind_exec_insn(&ctrl);
 		if (urc < 0)
 			return urc;
-		if (ctrl.vrs[SP] < low || ctrl.vrs[SP] >= ctrl.sp_high)
+		if (ctrl.vrs[SP] < ctrl.sp_low || ctrl.vrs[SP] > ctrl.sp_high)
 			return -URC_FAILURE;
 	}
 
-- 
2.30.2


* [PATCH v5 19/32] ARM: export dump_mem() to other objects
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 18/32] ARM: unwind: support unwinding across multiple stacks Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 20/32] ARM: unwind: dump exception stack from calling frame Ard Biesheuvel
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The unwind info based stack unwinder will make its own call to
dump_mem() to dump the exception stack, so give it external linkage.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/stacktrace.h | 2 ++
 arch/arm/kernel/traps.c           | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
index 8f54f9ad8a9b..33ee1aa4b8c0 100644
--- a/arch/arm/include/asm/stacktrace.h
+++ b/arch/arm/include/asm/stacktrace.h
@@ -36,5 +36,7 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame)
 extern int unwind_frame(struct stackframe *frame);
 extern void walk_stackframe(struct stackframe *frame,
 			    int (*fn)(struct stackframe *, void *), void *data);
+extern void dump_mem(const char *lvl, const char *str, unsigned long bottom,
+		     unsigned long top);
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index da04ed85855a..710306eac71f 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -35,6 +35,7 @@
 #include <asm/ptrace.h>
 #include <asm/unwind.h>
 #include <asm/tls.h>
+#include <asm/stacktrace.h>
 #include <asm/system_misc.h>
 #include <asm/opcodes.h>
 
@@ -60,8 +61,6 @@ static int __init user_debug_setup(char *str)
 __setup("user_debug=", user_debug_setup);
 #endif
 
-static void dump_mem(const char *, const char *, unsigned long, unsigned long);
-
 void dump_backtrace_entry(unsigned long where, unsigned long from,
 			  unsigned long frame, const char *loglvl)
 {
@@ -120,8 +119,8 @@ static int verify_stack(unsigned long sp)
 /*
  * Dump out the contents of some memory nicely...
  */
-static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
-		     unsigned long top)
+void dump_mem(const char *lvl, const char *str, unsigned long bottom,
+	      unsigned long top)
 {
 	unsigned long first;
 	int i;
-- 
2.30.2


* [PATCH v5 20/32] ARM: unwind: dump exception stack from calling frame
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 19/32] ARM: export dump_mem() to other objects Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 21/32] ARM: backtrace-clang: avoid crash on bogus frame pointer Ard Biesheuvel
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The existing code that dumps the contents of the pt_regs structure
passed to __entry routines does so while unwinding the callee frame, and
dereferences the stack pointer as a struct pt_regs*. This will no longer
work when we enable support for IRQ or overflow stacks, because the
struct pt_regs may live on the task stack, while we are executing from
another stack.

The unwinder has access to this information, but only while unwinding
the calling frame. So let's combine the exception stack dumping code
with the handling of the calling frame as well. By printing it before
dumping the caller/callee addresses, the output order is preserved.
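
Put differently, once the unwinder has processed the frame that called
into the __entry code, the saved exception state can be located at the
lowest SP value recorded for that frame; roughly (a sketch, assuming
the unwinder has filled in frame->sp_low as done below):

	/* the pt_regs pushed by the __entry code live at the bottom of
	 * the SP range of the calling frame, whichever stack that is */
	struct pt_regs *regs = (struct pt_regs *)frame->sp_low;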

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/stacktrace.h | 10 ++++++++++
 arch/arm/kernel/traps.c           |  3 ++-
 arch/arm/kernel/unwind.c          |  8 +++++++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
index 33ee1aa4b8c0..d87d60532b86 100644
--- a/arch/arm/include/asm/stacktrace.h
+++ b/arch/arm/include/asm/stacktrace.h
@@ -18,6 +18,16 @@ struct stackframe {
 	struct llist_node *kr_cur;
 	struct task_struct *tsk;
 #endif
+#ifdef CONFIG_ARM_UNWIND
+	/*
+	 * This field is used to track the stack pointer value when calling
+	 * __entry routines. This is needed when IRQ stacks and overflow stacks
+	 * are used, because in that case, the struct pt_regs passed to these
+	 * __entry routines may be at the top of the task stack, while we are
+	 * executing from another stack.
+	 */
+	unsigned long sp_low;
+#endif
 };
 
 static __always_inline
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 710306eac71f..c51b87f6fc3e 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -76,7 +76,8 @@ void dump_backtrace_entry(unsigned long where, unsigned long from,
 	printk("%s %ps from %pS\n", loglvl, (void *)where, (void *)from);
 #endif
 
-	if (in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE))
+	if (!IS_ENABLED(CONFIG_UNWINDER_ARM) &&
+	    in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE))
 		dump_mem(loglvl, "Exception stack", frame + 4, end);
 }
 
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 9cb9af3fc433..b7a6141c342f 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -29,6 +29,7 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 
+#include <asm/sections.h>
 #include <asm/stacktrace.h>
 #include <asm/traps.h>
 #include <asm/unwind.h>
@@ -459,6 +460,7 @@ int unwind_frame(struct stackframe *frame)
 	frame->sp = ctrl.vrs[SP];
 	frame->lr = ctrl.vrs[LR];
 	frame->pc = ctrl.vrs[PC];
+	frame->sp_low = ctrl.sp_low;
 
 	return URC_OK;
 }
@@ -502,7 +504,11 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 		urc = unwind_frame(&frame);
 		if (urc < 0)
 			break;
-		dump_backtrace_entry(where, frame.pc, frame.sp - 4, loglvl);
+		if (in_entry_text(where))
+			dump_mem(loglvl, "Exception stack", frame.sp_low,
+				 frame.sp_low + sizeof(struct pt_regs));
+
+		dump_backtrace_entry(where, frame.pc, 0, loglvl);
 	}
 }
 
-- 
2.30.2


* [PATCH v5 21/32] ARM: backtrace-clang: avoid crash on bogus frame pointer
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (19 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 20/32] ARM: unwind: dump exception stack from calling frame Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 22/32] ARM: implement IRQ stacks Ard Biesheuvel
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The Clang backtrace code dereferences the link register value pulled
from the stack to decide whether the caller was a branch-and-link
instruction, in order to subsequently decode the offset to find the
start of the calling function. Unlike other loads in this routine, this
one is not protected by a fixup, and may therefore cause a crash if the
address in question is bogus.

So let's fix this by treating the fault as a failure to decode the 'bl'
instruction. To avoid renumbering the local labels, reuse a fixup label
that guards an instruction that cannot fault to begin with.
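
For reference, the decode that consumes the loaded value amounts to the
following (a C sketch of the plain ARM-mode BL case only; bl_target()
is a hypothetical helper, not a kernel function):

	/* decide whether the word at lr - 4 is a BL, and if so recover
	 * the address it branches to (PC reads as insn address + 8) */
	static unsigned long bl_target(unsigned long lr)
	{
		unsigned int insn = *(unsigned int *)(lr - 4);
		int off;

		if ((insn & 0x0f000000) != 0x0b000000)
			return 0;		/* not a BL */
		off = (int)(insn << 8) >> 6;	/* sign-extend imm24, << 2 */
		return lr + 4 + off;		/* (lr - 4) + 8 + offset */
	}

It is the initial load of the suspected BL word that may fault when the
LR value is bogus, which is what the new fixup entry covers.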

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/lib/backtrace-clang.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
index 5b2cdb1003e3..5b4bca85d06d 100644
--- a/arch/arm/lib/backtrace-clang.S
+++ b/arch/arm/lib/backtrace-clang.S
@@ -144,7 +144,7 @@ for_each_frame:	tst	frame, mask		@ Check for address exceptions
  */
 1003:		ldr	sv_lr, [sv_fp, #4]	@ get saved lr from next frame
 
-		ldr	r0, [sv_lr, #-4]	@ get call instruction
+1004:		ldr	r0, [sv_lr, #-4]	@ get call instruction
 		ldr	r3, .Lopcode+4
 		and	r2, r3, r0		@ is this a bl call
 		teq	r2, r3
@@ -164,7 +164,7 @@ finished_setup:
 /*
  * Print the function (sv_pc) and where it was called from (sv_lr).
  */
-1004:		mov	r0, sv_pc
+		mov	r0, sv_pc
 
 		mov	r1, sv_lr
 		mov	r2, frame
@@ -210,7 +210,7 @@ ENDPROC(c_backtrace)
 		.long	1001b, 1006b
 		.long	1002b, 1006b
 		.long	1003b, 1006b
-		.long	1004b, 1006b
+		.long	1004b, finished_setup
 		.long   1005b, 1006b
 		.popsection
 
-- 
2.30.2


* [PATCH v5 22/32] ARM: implement IRQ stacks
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (20 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 21/32] ARM: backtrace-clang: avoid crash on bogus frame pointer Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 23/32] ARM: call_with_stack: add unwind support Ard Biesheuvel
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Now that we no longer rely on the stack pointer to access the current
task struct or thread info, we can implement support for IRQ stacks
cleanly as well.

Define a per-CPU IRQ stack and switch to this stack when taking an IRQ,
provided that we were not already using that stack in the interrupted
context. This is never the case for IRQs taken from user space, but an
IRQ taken while running in the kernel may fire before a previous one,
taken from user space, has completed.
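
The "already on the IRQ stack" test in the assembly boils down to a
single unsigned range check; in C it would read roughly as follows (a
sketch with a hypothetical helper; the exact boundary handling in the
assembly differs slightly):

	/* stack_top is this CPU's irq_stack_ptr, i.e. the highest
	 * address of the IRQ stack */
	static int on_irq_stack(unsigned long sp, unsigned long stack_top)
	{
		return stack_top - sp < THREAD_SIZE;	/* one unsigned compare */
	}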

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/include/asm/assembler.h |  4 ++
 arch/arm/kernel/entry-armv.S     | 48 ++++++++++++++++++--
 arch/arm/kernel/entry-v7m.S      | 17 ++++++-
 arch/arm/kernel/irq.c            | 17 +++++++
 arch/arm/kernel/traps.c          | 15 +++++-
 arch/arm/lib/backtrace-clang.S   |  7 +++
 arch/arm/lib/backtrace.S         |  7 +++
 7 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 7242e9a56650..f961f99721dd 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -86,6 +86,10 @@
 
 #define IMM12_MASK 0xfff
 
+/* the frame pointer used for stack unwinding */
+ARM(	fpreg	.req	r11	)
+THUMB(	fpreg	.req	r7	)
+
 /*
  * Enable and disable interrupts
  */
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 2f912c509e0d..38e3978a50a9 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -32,9 +32,51 @@
 /*
  * Interrupt handling.
  */
-	.macro	irq_handler
+	.macro	irq_handler, from_user:req
 	mov	r0, sp
+#ifdef CONFIG_UNWINDER_ARM
+	mov	fpreg, sp		@ Preserve original SP
+#else
+	mov	r7, fp			@ Preserve original FP
+	mov	r8, sp			@ Preserve original SP
+#endif
+	ldr_this_cpu sp, irq_stack_ptr, r2, r3
+	.if	\from_user == 0
+UNWIND( .setfp	fpreg, sp		)
+	@
+	@ If we took the interrupt while running in the kernel, we may already
+	@ be using the IRQ stack, so revert to the original value in that case.
+	@
+	subs	r2, sp, r0		@ SP above bottom of IRQ stack?
+	rsbscs	r2, r2, #THREAD_SIZE	@ ... and below the top?
+	movcs	sp, r0			@ If so, revert to incoming SP
+
+#ifndef CONFIG_UNWINDER_ARM
+	@
+	@ Inform the frame pointer unwinder where the next frame lives
+	@
+	movcc	lr, pc			@ Make LR point into .entry.text so
+					@ that we will get a dump of the
+					@ exception stack for this frame.
+#ifdef CONFIG_CC_IS_GCC
+	movcc	ip, r0			@ Store the old SP in the frame record.
+	stmdbcc	sp!, {fp, ip, lr, pc}	@ Push frame record
+	addcc	fp, sp, #12
+#else
+	stmdbcc	sp!, {fp, lr}		@ Push frame record
+	movcc	fp, sp
+#endif // CONFIG_CC_IS_GCC
+#endif // CONFIG_UNWINDER_ARM
+	.endif
+
 	bl	generic_handle_arch_irq
+
+#ifdef CONFIG_UNWINDER_ARM
+	mov	sp, fpreg		@ Restore original SP
+#else
+	mov	fp, r7			@ Restore original FP
+	mov	sp, r8			@ Restore original SP
+#endif // CONFIG_UNWINDER_ARM
 	.endm
 
 	.macro	pabt_helper
@@ -191,7 +233,7 @@ ENDPROC(__dabt_svc)
 	.align	5
 __irq_svc:
 	svc_entry
-	irq_handler
+	irq_handler from_user=0
 
 #ifdef CONFIG_PREEMPTION
 	ldr	r8, [tsk, #TI_PREEMPT]		@ get preempt count
@@ -418,7 +460,7 @@ ENDPROC(__dabt_usr)
 __irq_usr:
 	usr_entry
 	kuser_cmpxchg_check
-	irq_handler
+	irq_handler from_user=1
 	get_thread_info tsk
 	mov	why, #0
 	b	ret_to_user_from_irq
diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
index 4e0d318b67c6..de8a60363c85 100644
--- a/arch/arm/kernel/entry-v7m.S
+++ b/arch/arm/kernel/entry-v7m.S
@@ -40,11 +40,24 @@ __irq_entry:
 	@ Invoke the IRQ handler
 	@
 	mov	r0, sp
-	stmdb	sp!, {lr}
+	ldr_this_cpu sp, irq_stack_ptr, r1, r2
+
+	@
+	@ If we took the interrupt while running in the kernel, we may already
+	@ be using the IRQ stack, so revert to the original value in that case.
+	@
+	subs	r2, sp, r0		@ SP above bottom of IRQ stack?
+	rsbscs	r2, r2, #THREAD_SIZE	@ ... and below the top?
+	movcs	sp, r0
+
+	push	{r0, lr}		@ preserve LR and original SP
+
 	@ routine called with r0 = struct pt_regs *
 	bl	generic_handle_arch_irq
 
-	pop	{lr}
+	pop	{r0, lr}
+	mov	sp, r0
+
 	@
 	@ Check for any pending work if returning to user
 	@
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 5a1e52a4ee11..92ae80a8e5b4 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -43,6 +43,21 @@
 
 unsigned long irq_err_count;
 
+asmlinkage DEFINE_PER_CPU_READ_MOSTLY(u8 *, irq_stack_ptr);
+
+static void __init init_irq_stacks(void)
+{
+	u8 *stack;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		stack = (u8 *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
+		if (WARN_ON(!stack))
+			break;
+		per_cpu(irq_stack_ptr, cpu) = &stack[THREAD_SIZE];
+	}
+}
+
 int arch_show_interrupts(struct seq_file *p, int prec)
 {
 #ifdef CONFIG_FIQ
@@ -84,6 +99,8 @@ void __init init_IRQ(void)
 {
 	int ret;
 
+	init_irq_stacks();
+
 	if (IS_ENABLED(CONFIG_OF) && !machine_desc->init_irq)
 		irqchip_init();
 	else
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index c51b87f6fc3e..1b8bef286fbc 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -66,6 +66,19 @@ void dump_backtrace_entry(unsigned long where, unsigned long from,
 {
 	unsigned long end = frame + 4 + sizeof(struct pt_regs);
 
+	if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER) &&
+	    IS_ENABLED(CONFIG_CC_IS_GCC) &&
+	    end > ALIGN(frame, THREAD_SIZE)) {
+		/*
+		 * If we are walking past the end of the stack, it may be due
+		 * to the fact that we are on an IRQ or overflow stack. In this
+		 * case, we can load the address of the other stack from the
+		 * frame record.
+		 */
+		frame = ((unsigned long *)frame)[-2] - 4;
+		end = frame + 4 + sizeof(struct pt_regs);
+	}
+
 #ifndef CONFIG_KALLSYMS
 	printk("%sFunction entered at [<%08lx>] from [<%08lx>]\n",
 		loglvl, where, from);
@@ -280,7 +293,7 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 
 	if (!user_mode(regs) || in_interrupt()) {
 		dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
-			 THREAD_SIZE + (unsigned long)task_stack_page(tsk));
+			 ALIGN(regs->ARM_sp, THREAD_SIZE));
 		dump_backtrace(regs, tsk, KERN_EMERG);
 		dump_instr(KERN_EMERG, regs);
 	}
diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
index 5b4bca85d06d..1f0814b41bcf 100644
--- a/arch/arm/lib/backtrace-clang.S
+++ b/arch/arm/lib/backtrace-clang.S
@@ -197,6 +197,13 @@ finished_setup:
 
 		cmp	sv_fp, frame		@ next frame must be
 		mov	frame, sv_fp		@ above the current frame
+
+		@
+		@ Kernel stacks may be discontiguous in memory. If the next
+		@ frame is below the previous frame, accept it as long as it
+		@ lives in kernel memory.
+		@
+		cmpls	sv_fp, #PAGE_OFFSET
 		bhi	for_each_frame
 
 1006:		adr	r0, .Lbad
diff --git a/arch/arm/lib/backtrace.S b/arch/arm/lib/backtrace.S
index e8408f22d4dc..e6e8451c5cb3 100644
--- a/arch/arm/lib/backtrace.S
+++ b/arch/arm/lib/backtrace.S
@@ -98,6 +98,13 @@ for_each_frame:	tst	frame, mask		@ Check for address exceptions
 
 		cmp	sv_fp, frame		@ next frame must be
 		mov	frame, sv_fp		@ above the current frame
+
+		@
+		@ Kernel stacks may be discontiguous in memory. If the next
+		@ frame is below the previous frame, accept it as long as it
+		@ lives in kernel memory.
+		@
+		cmpls	sv_fp, #PAGE_OFFSET
 		bhi	for_each_frame
 
 1006:		adr	r0, .Lbad
-- 
2.30.2


* [PATCH v5 23/32] ARM: call_with_stack: add unwind support
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (21 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 22/32] ARM: implement IRQ stacks Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack Ard Biesheuvel
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Restructure the code and add the unwind annotations so that both the
frame pointer unwinder and the EHABI unwind info based unwinder will be
able to follow the call stack through call_with_stack().

Since GCC and Clang use different formats for the stack frame, two
methods are implemented: a GCC version that pushes fp, sp, lr and pc for
compatibility with the frame pointer unwinder, and a second version that
works with Clang, as well as with the EHABI unwinder both in ARM and
Thumb2 modes.
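
For the GCC/APCS variant, the frame record pushed on entry has the
layout below, with FP left pointing at the saved PC slot (expressed as
a hypothetical C struct for illustration):

	struct apcs_frame {		/* lower addresses first */
		unsigned long fp;	/* caller's frame pointer */
		unsigned long sp;	/* SP value before the push */
		unsigned long lr;	/* return address */
		unsigned long pc;	/* address inside this function */
	};

The epilogue's 'ldmdb fp, {fp, sp, pc}' then restores the caller's FP
and SP and loads the saved LR into PC in a single instruction.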

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/lib/call_with_stack.S | 33 +++++++++++++++-----
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/arm/lib/call_with_stack.S b/arch/arm/lib/call_with_stack.S
index 28b0341ae786..0a268a6c513c 100644
--- a/arch/arm/lib/call_with_stack.S
+++ b/arch/arm/lib/call_with_stack.S
@@ -8,25 +8,42 @@
 
 #include <linux/linkage.h>
 #include <asm/assembler.h>
+#include <asm/unwind.h>
 
 /*
  * void call_with_stack(void (*fn)(void *), void *arg, void *sp)
  *
  * Change the stack to that pointed at by sp, then invoke fn(arg) with
  * the new stack.
+ *
+ * The sequence below follows the APCS frame convention for frame pointer
+ * unwinding, and implements the unwinder annotations needed by the EABI
+ * unwinder.
  */
-ENTRY(call_with_stack)
-	str	sp, [r2, #-4]!
-	str	lr, [r2, #-4]!
 
+ENTRY(call_with_stack)
+#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
+	mov	ip, sp
+	push	{fp, ip, lr, pc}
+	sub	fp, ip, #4
+#else
+UNWIND( .fnstart		)
+UNWIND( .save	{fpreg, lr}	)
+	push	{fpreg, lr}
+UNWIND( .setfp	fpreg, sp	)
+	mov	fpreg, sp
+#endif
 	mov	sp, r2
 	mov	r2, r0
 	mov	r0, r1
 
-	badr	lr, 1f
-	ret	r2
+	bl_r	r2
 
-1:	ldr	lr, [sp]
-	ldr	sp, [sp, #4]
-	ret	lr
+#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
+	ldmdb	fp, {fp, sp, pc}
+#else
+	mov	sp, fpreg
+	pop	{fpreg, pc}
+UNWIND( .fnend			)
+#endif
 ENDPROC(call_with_stack)
-- 
2.30.2


* [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (22 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 23/32] ARM: call_with_stack: add unwind support Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-03-22  9:04   ` Sebastian Andrzej Siewior
  2022-01-24 17:47 ` [PATCH v5 25/32] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Now that we have enabled IRQ stacks, any softIRQs that are handled over
the back of a hard IRQ will run from the IRQ stack as well. However, any
synchronous softirq processing that happens when re-enabling softIRQs
from task context will still execute on that task's stack.

Since any call to local_bh_enable() at any level in the task's call
stack may trigger a softIRQ processing run, which could potentially
cause a task stack overflow if the combined stack footprints exceed the
stack's size, let's run these synchronous invocations of do_softirq() on
the IRQ stack as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig      |  2 ++
 arch/arm/kernel/irq.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 108a7a872084..b959249dd716 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -128,6 +128,8 @@ config ARM
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
 	select THREAD_INFO_IN_TASK
+	select HAVE_IRQ_EXIT_ON_IRQ_STACK
+	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select TRACE_IRQFLAGS_SUPPORT if !CPU_V7M
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 92ae80a8e5b4..380376f55554 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -36,11 +36,14 @@
 #include <asm/hardware/cache-l2x0.h>
 #include <asm/hardware/cache-uniphier.h>
 #include <asm/outercache.h>
+#include <asm/softirq_stack.h>
 #include <asm/exception.h>
 #include <asm/mach/arch.h>
 #include <asm/mach/irq.h>
 #include <asm/mach/time.h>
 
+#include "reboot.h"
+
 unsigned long irq_err_count;
 
 asmlinkage DEFINE_PER_CPU_READ_MOSTLY(u8 *, irq_stack_ptr);
@@ -58,6 +61,17 @@ static void __init init_irq_stacks(void)
 	}
 }
 
+static void ____do_softirq(void *arg)
+{
+	__do_softirq();
+}
+
+void do_softirq_own_stack(void)
+{
+	call_with_stack(____do_softirq, NULL,
+			__this_cpu_read(irq_stack_ptr));
+}
+
 int arch_show_interrupts(struct seq_file *p, int prec)
 {
 #ifdef CONFIG_FIQ
-- 
2.30.2


* [PATCH v5 25/32] ARM: memcpy: use frame pointer as unwind anchor
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (23 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 26/32] ARM: memmove: " Ard Biesheuvel
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The memcpy template is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction accurate, and does not allow
tracking the SP value as it changes.

Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory
copy functions") addressed this by carving up the function in different
chunks as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding with the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.

Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.

Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the template, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/lib/copy_from_user.S | 13 ++--
 arch/arm/lib/copy_template.S  | 67 +++++++-------------
 arch/arm/lib/copy_to_user.S   | 13 ++--
 arch/arm/lib/memcpy.S         | 13 ++--
 4 files changed, 38 insertions(+), 68 deletions(-)

diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index 480a20766137..270de7debd0f 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -91,18 +91,15 @@
 	strb\cond \reg, [\ptr], #1
 	.endm
 
-	.macro enter reg1 reg2
+	.macro enter regs:vararg
 	mov	r3, #0
-	stmdb	sp!, {r0, r2, r3, \reg1, \reg2}
+UNWIND( .save	{r0, r2, r3, \regs}		)
+	stmdb	sp!, {r0, r2, r3, \regs}
 	.endm
 
-	.macro usave reg1 reg2
-	UNWIND(	.save {r0, r2, r3, \reg1, \reg2}	)
-	.endm
-
-	.macro exit reg1 reg2
+	.macro exit regs:vararg
 	add	sp, sp, #8
-	ldmfd	sp!, {r0, \reg1, \reg2}
+	ldmfd	sp!, {r0, \regs}
 	.endm
 
 	.text
diff --git a/arch/arm/lib/copy_template.S b/arch/arm/lib/copy_template.S
index 810a805d36dc..8fbafb074fe9 100644
--- a/arch/arm/lib/copy_template.S
+++ b/arch/arm/lib/copy_template.S
@@ -69,13 +69,10 @@
  *	than one 32bit instruction in Thumb-2)
  */
 
-
-	UNWIND(	.fnstart			)
-		enter	r4, lr
-	UNWIND(	.fnend				)
-
 	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ in first stmdb block
+		enter	r4, UNWIND(fpreg,) lr
+	UNWIND(	.setfp	fpreg, sp		)
+	UNWIND(	mov	fpreg, sp		)
 
 		subs	r2, r2, #4
 		blt	8f
@@ -86,12 +83,7 @@
 		bne	10f
 
 1:		subs	r2, r2, #(28)
-		stmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				)
-
-	UNWIND(	.fnstart			)
-		usave	r4, lr
-	UNWIND(	.save	{r5 - r8}		) @ in second stmfd block
+		stmfd	sp!, {r5, r6, r8, r9}
 		blt	5f
 
 	CALGN(	ands	ip, r0, #31		)
@@ -110,9 +102,9 @@
 	PLD(	pld	[r1, #92]		)
 
 3:	PLD(	pld	[r1, #124]		)
-4:		ldr8w	r1, r3, r4, r5, r6, r7, r8, ip, lr, abort=20f
+4:		ldr8w	r1, r3, r4, r5, r6, r8, r9, ip, lr, abort=20f
 		subs	r2, r2, #32
-		str8w	r0, r3, r4, r5, r6, r7, r8, ip, lr, abort=20f
+		str8w	r0, r3, r4, r5, r6, r8, r9, ip, lr, abort=20f
 		bge	3b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	4b			)
@@ -132,8 +124,8 @@
 		ldr1w	r1, r4, abort=20f
 		ldr1w	r1, r5, abort=20f
 		ldr1w	r1, r6, abort=20f
-		ldr1w	r1, r7, abort=20f
 		ldr1w	r1, r8, abort=20f
+		ldr1w	r1, r9, abort=20f
 		ldr1w	r1, lr, abort=20f
 
 #if LDR1W_SHIFT < STR1W_SHIFT
@@ -150,17 +142,14 @@
 		str1w	r0, r4, abort=20f
 		str1w	r0, r5, abort=20f
 		str1w	r0, r6, abort=20f
-		str1w	r0, r7, abort=20f
 		str1w	r0, r8, abort=20f
+		str1w	r0, r9, abort=20f
 		str1w	r0, lr, abort=20f
 
 	CALGN(	bcs	2b			)
 
-7:		ldmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				) @ end of second stmfd block
+7:		ldmfd	sp!, {r5, r6, r8, r9}
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ still in first stmdb block
 8:		movs	r2, r2, lsl #31
 		ldr1b	r1, r3, ne, abort=21f
 		ldr1b	r1, r4, cs, abort=21f
@@ -169,7 +158,7 @@
 		str1b	r0, r4, cs, abort=21f
 		str1b	r0, ip, cs, abort=21f
 
-		exit	r4, pc
+		exit	r4, UNWIND(fpreg,) pc
 
 9:		rsb	ip, ip, #4
 		cmp	ip, #2
@@ -189,13 +178,10 @@
 		ldr1w	r1, lr, abort=21f
 		beq	17f
 		bgt	18f
-	UNWIND(	.fnend				)
 
 
 		.macro	forward_copy_shift pull push
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ still in first stmdb block
 		subs	r2, r2, #28
 		blt	14f
 
@@ -205,12 +191,8 @@
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
 
-11:		stmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				)
+11:		stmfd	sp!, {r5, r6, r8 - r10}
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr
-	UNWIND(	.save	{r5 - r9}		) @ in new second stmfd block
 	PLD(	pld	[r1, #0]		)
 	PLD(	subs	r2, r2, #96		)
 	PLD(	pld	[r1, #28]		)
@@ -219,35 +201,32 @@
 	PLD(	pld	[r1, #92]		)
 
 12:	PLD(	pld	[r1, #124]		)
-13:		ldr4w	r1, r4, r5, r6, r7, abort=19f
+13:		ldr4w	r1, r4, r5, r6, r8, abort=19f
 		mov	r3, lr, lspull #\pull
 		subs	r2, r2, #32
-		ldr4w	r1, r8, r9, ip, lr, abort=19f
+		ldr4w	r1, r9, r10, ip, lr, abort=19f
 		orr	r3, r3, r4, lspush #\push
 		mov	r4, r4, lspull #\pull
 		orr	r4, r4, r5, lspush #\push
 		mov	r5, r5, lspull #\pull
 		orr	r5, r5, r6, lspush #\push
 		mov	r6, r6, lspull #\pull
-		orr	r6, r6, r7, lspush #\push
-		mov	r7, r7, lspull #\pull
-		orr	r7, r7, r8, lspush #\push
+		orr	r6, r6, r8, lspush #\push
 		mov	r8, r8, lspull #\pull
 		orr	r8, r8, r9, lspush #\push
 		mov	r9, r9, lspull #\pull
-		orr	r9, r9, ip, lspush #\push
+		orr	r9, r9, r10, lspush #\push
+		mov	r10, r10, lspull #\pull
+		orr	r10, r10, ip, lspush #\push
 		mov	ip, ip, lspull #\pull
 		orr	ip, ip, lr, lspush #\push
-		str8w	r0, r3, r4, r5, r6, r7, r8, r9, ip, abort=19f
+		str8w	r0, r3, r4, r5, r6, r8, r9, r10, ip, abort=19f
 		bge	12b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	13b			)
 
-		ldmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				) @ end of the second stmfd block
+		ldmfd	sp!, {r5, r6, r8 - r10}
 
-	UNWIND(	.fnstart			)
-		usave	r4, lr			  @ still in first stmdb block
 14:		ands	ip, r2, #28
 		beq	16f
 
@@ -262,7 +241,6 @@
 
 16:		sub	r1, r1, #(\push / 8)
 		b	8b
-	UNWIND(	.fnend				)
 
 		.endm
 
@@ -273,6 +251,7 @@
 
 18:		forward_copy_shift	pull=24	push=8
 
+	UNWIND(	.fnend				)
 
 /*
  * Abort preamble and completion macros.
@@ -282,13 +261,13 @@
  */
 
 	.macro	copy_abort_preamble
-19:	ldmfd	sp!, {r5 - r9}
+19:	ldmfd	sp!, {r5, r6, r8 - r10}
 	b	21f
-20:	ldmfd	sp!, {r5 - r8}
+20:	ldmfd	sp!, {r5, r6, r8, r9}
 21:
 	.endm
 
 	.macro	copy_abort_end
-	ldmfd	sp!, {r4, pc}
+	ldmfd	sp!, {r4, UNWIND(fpreg,) pc}
 	.endm
 
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index 842ea5ede485..fac49e57cc0b 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -90,18 +90,15 @@
 	strusr	\reg, \ptr, 1, \cond, abort=\abort
 	.endm
 
-	.macro enter reg1 reg2
+	.macro enter regs:vararg
 	mov	r3, #0
-	stmdb	sp!, {r0, r2, r3, \reg1, \reg2}
+UNWIND( .save	{r0, r2, r3, \regs}		)
+	stmdb	sp!, {r0, r2, r3, \regs}
 	.endm
 
-	.macro usave reg1 reg2
-	UNWIND(	.save {r0, r2, r3, \reg1, \reg2}	)
-	.endm
-
-	.macro exit reg1 reg2
+	.macro exit regs:vararg
 	add	sp, sp, #8
-	ldmfd	sp!, {r0, \reg1, \reg2}
+	ldmfd	sp!, {r0, \regs}
 	.endm
 
 	.text
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S
index e4caf48c089f..90f2b645aa0d 100644
--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -42,16 +42,13 @@
 	strb\cond \reg, [\ptr], #1
 	.endm
 
-	.macro enter reg1 reg2
-	stmdb sp!, {r0, \reg1, \reg2}
+	.macro enter regs:vararg
+UNWIND( .save	{r0, \regs}		)
+	stmdb sp!, {r0, \regs}
 	.endm
 
-	.macro usave reg1 reg2
-	UNWIND(	.save	{r0, \reg1, \reg2}	)
-	.endm
-
-	.macro exit reg1 reg2
-	ldmfd sp!, {r0, \reg1, \reg2}
+	.macro exit regs:vararg
+	ldmfd sp!, {r0, \regs}
 	.endm
 
 	.text
-- 
2.30.2


* [PATCH v5 26/32] ARM: memmove: use frame pointer as unwind anchor
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (24 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 25/32] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 27/32] ARM: memset: clean up unwind annotations Ard Biesheuvel
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The memmove routine is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction accurate, and does not allow
tracking the SP value as it changes.

Commit 207a6cb06990c ("ARM: 8224/1: Add unwinding support for memmove
function") addressed this by carving up the function in different chunks
as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding with the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.

Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.

Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the function, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/lib/memmove.S | 60 +++++++-------------
 1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index 6fecc12a1f51..6410554039fd 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -31,12 +31,13 @@ WEAK(memmove)
 		subs	ip, r0, r1
 		cmphi	r2, ip
 		bls	__memcpy
-
-		stmfd	sp!, {r0, r4, lr}
 	UNWIND(	.fnend				)
 
 	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		) @ in first stmfd block
+	UNWIND(	.save	{r0, r4, fpreg, lr}	)
+		stmfd	sp!, {r0, r4, UNWIND(fpreg,) lr}
+	UNWIND(	.setfp	fpreg, sp		)
+	UNWIND(	mov	fpreg, sp		)
 		add	r1, r1, r2
 		add	r0, r0, r2
 		subs	r2, r2, #4
@@ -48,12 +49,7 @@ WEAK(memmove)
 		bne	10f
 
 1:		subs	r2, r2, #(28)
-		stmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				)
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		)
-	UNWIND(	.save	{r5 - r8}		) @ in second stmfd block
+		stmfd	sp!, {r5, r6, r8, r9}
 		blt	5f
 
 	CALGN(	ands	ip, r0, #31		)
@@ -72,9 +68,9 @@ WEAK(memmove)
 	PLD(	pld	[r1, #-96]		)
 
 3:	PLD(	pld	[r1, #-128]		)
-4:		ldmdb	r1!, {r3, r4, r5, r6, r7, r8, ip, lr}
+4:		ldmdb	r1!, {r3, r4, r5, r6, r8, r9, ip, lr}
 		subs	r2, r2, #32
-		stmdb	r0!, {r3, r4, r5, r6, r7, r8, ip, lr}
+		stmdb	r0!, {r3, r4, r5, r6, r8, r9, ip, lr}
 		bge	3b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	4b			)
@@ -88,8 +84,8 @@ WEAK(memmove)
 		W(ldr)	r4, [r1, #-4]!
 		W(ldr)	r5, [r1, #-4]!
 		W(ldr)	r6, [r1, #-4]!
-		W(ldr)	r7, [r1, #-4]!
 		W(ldr)	r8, [r1, #-4]!
+		W(ldr)	r9, [r1, #-4]!
 		W(ldr)	lr, [r1, #-4]!
 
 		add	pc, pc, ip
@@ -99,17 +95,13 @@ WEAK(memmove)
 		W(str)	r4, [r0, #-4]!
 		W(str)	r5, [r0, #-4]!
 		W(str)	r6, [r0, #-4]!
-		W(str)	r7, [r0, #-4]!
 		W(str)	r8, [r0, #-4]!
+		W(str)	r9, [r0, #-4]!
 		W(str)	lr, [r0, #-4]!
 
 	CALGN(	bcs	2b			)
 
-7:		ldmfd	sp!, {r5 - r8}
-	UNWIND(	.fnend				) @ end of second stmfd block
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		) @ still in first stmfd block
+7:		ldmfd	sp!, {r5, r6, r8, r9}
 
 8:		movs	r2, r2, lsl #31
 		ldrbne	r3, [r1, #-1]!
@@ -118,7 +110,7 @@ WEAK(memmove)
 		strbne	r3, [r0, #-1]!
 		strbcs	r4, [r0, #-1]!
 		strbcs	ip, [r0, #-1]
-		ldmfd	sp!, {r0, r4, pc}
+		ldmfd	sp!, {r0, r4, UNWIND(fpreg,) pc}
 
 9:		cmp	ip, #2
 		ldrbgt	r3, [r1, #-1]!
@@ -137,13 +129,10 @@ WEAK(memmove)
 		ldr	r3, [r1, #0]
 		beq	17f
 		blt	18f
-	UNWIND(	.fnend				)
 
 
 		.macro	backward_copy_shift push pull
 
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		) @ still in first stmfd block
 		subs	r2, r2, #28
 		blt	14f
 
@@ -152,12 +141,7 @@ WEAK(memmove)
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
 
-11:		stmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				)
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save	{r0, r4, lr}		)
-	UNWIND(	.save	{r5 - r9}		) @ in new second stmfd block
+11:		stmfd	sp!, {r5, r6, r8 - r10}
 
 	PLD(	pld	[r1, #-4]		)
 	PLD(	subs	r2, r2, #96		)
@@ -167,35 +151,31 @@ WEAK(memmove)
 	PLD(	pld	[r1, #-96]		)
 
 12:	PLD(	pld	[r1, #-128]		)
-13:		ldmdb   r1!, {r7, r8, r9, ip}
+13:		ldmdb   r1!, {r8, r9, r10, ip}
 		mov     lr, r3, lspush #\push
 		subs    r2, r2, #32
 		ldmdb   r1!, {r3, r4, r5, r6}
 		orr     lr, lr, ip, lspull #\pull
 		mov     ip, ip, lspush #\push
-		orr     ip, ip, r9, lspull #\pull
+		orr     ip, ip, r10, lspull #\pull
+		mov     r10, r10, lspush #\push
+		orr     r10, r10, r9, lspull #\pull
 		mov     r9, r9, lspush #\push
 		orr     r9, r9, r8, lspull #\pull
 		mov     r8, r8, lspush #\push
-		orr     r8, r8, r7, lspull #\pull
-		mov     r7, r7, lspush #\push
-		orr     r7, r7, r6, lspull #\pull
+		orr     r8, r8, r6, lspull #\pull
 		mov     r6, r6, lspush #\push
 		orr     r6, r6, r5, lspull #\pull
 		mov     r5, r5, lspush #\push
 		orr     r5, r5, r4, lspull #\pull
 		mov     r4, r4, lspush #\push
 		orr     r4, r4, r3, lspull #\pull
-		stmdb   r0!, {r4 - r9, ip, lr}
+		stmdb   r0!, {r4 - r6, r8 - r10, ip, lr}
 		bge	12b
 	PLD(	cmn	r2, #96			)
 	PLD(	bge	13b			)
 
-		ldmfd	sp!, {r5 - r9}
-	UNWIND(	.fnend				) @ end of the second stmfd block
-
-	UNWIND(	.fnstart			)
-	UNWIND(	.save {r0, r4, lr}		) @ still in first stmfd block
+		ldmfd	sp!, {r5, r6, r8 - r10}
 
 14:		ands	ip, r2, #28
 		beq	16f
@@ -211,7 +191,6 @@ WEAK(memmove)
 
 16:		add	r1, r1, #(\pull / 8)
 		b	8b
-	UNWIND(	.fnend				)
 
 		.endm
 
@@ -222,5 +201,6 @@ WEAK(memmove)
 
 18:		backward_copy_shift	push=24	pull=8
 
+	UNWIND(	.fnend				)
 ENDPROC(memmove)
 ENDPROC(__memmove)
-- 
2.30.2


* [PATCH v5 27/32] ARM: memset: clean up unwind annotations
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (25 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 26/32] ARM: memmove: " Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 28/32] ARM: unwind: disregard unwind info before stack frame is set up Ard Biesheuvel
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The memset implementation carves up the code into different sections,
each covered by its own unwind info. In this case, it is done in a way
similar to how the compiler might do it, to disambiguate between parts
where the return address is in LR and the SP is unmodified, and parts
where a stack frame is live, and the unwinder needs to know the size of
the stack frame and the location of the return address within it.

Only the placement of the unwind directives is slightly odd: the stack
pushes are placed in the wrong sections, which may confuse the unwinder
when attempting to unwind with PC pointing at the stack push in
question.

So let's fix this up, by reordering the directives and instructions as
appropriate.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/lib/memset.S | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index 9817cb258c1a..d71ab61430b2 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -28,16 +28,16 @@ UNWIND( .fnstart         )
 	mov	r3, r1
 7:	cmp	r2, #16
 	blt	4f
+UNWIND( .fnend              )
 
 #if ! CALGN(1)+0
 
 /*
  * We need 2 extra registers for this loop - use r8 and the LR
  */
-	stmfd	sp!, {r8, lr}
-UNWIND( .fnend              )
 UNWIND( .fnstart            )
 UNWIND( .save {r8, lr}      )
+	stmfd	sp!, {r8, lr}
 	mov	r8, r1
 	mov	lr, r3
 
@@ -66,10 +66,9 @@ UNWIND( .fnend              )
  * whole cache lines at once.
  */
 
-	stmfd	sp!, {r4-r8, lr}
-UNWIND( .fnend                 )
 UNWIND( .fnstart               )
 UNWIND( .save {r4-r8, lr}      )
+	stmfd	sp!, {r4-r8, lr}
 	mov	r4, r1
 	mov	r5, r3
 	mov	r6, r1
-- 
2.30.2


* [PATCH v5 28/32] ARM: unwind: disregard unwind info before stack frame is set up
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (26 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 27/32] ARM: memset: clean up unwind annotations Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 29/32] ARM: entry: rework stack realignment code in svc_entry Ard Biesheuvel
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

When unwinding the stack from a stack overflow, we are likely to start
from a stack push instruction, given that this is the most common way to
grow the stack for compiler emitted code. This push instruction rarely
appears anywhere else than at offset 0x0 of the function, and if it
doesn't, the compiler tends to split up the unwind annotations, given
that the stack frame layout is apparently not the same throughout the
function.

This means that, in the general case, if the frame's PC points at the
first instruction covered by a certain unwind entry, there is no way the
stack frame that the unwind entry describes could have been created yet,
and so we are still on the stack frame of the caller in that case. So
treat this as a special case, and return with the new PC taken from the
frame's LR, without applying the unwind transformations to the virtual
register set.

This permits us to unwind the call stack on stack overflow when the
overflow was caused by a stack push on function entry.
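
The comparison added below resolves the function's start address from
the index entry, which is stored as a prel31 (31-bit place-relative)
offset; for reference, the conversion amounts to this sketch (the
helper name is illustrative):

	/* bit 30 is the sign bit; the offset is relative to the
	 * address of the field itself */
	static unsigned long prel31_to_addr_sketch(const long *ptr)
	{
		long offset = (*ptr << 1) >> 1;	/* sign-extend bit 30 */

		return (unsigned long)ptr + offset;
	}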

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/kernel/unwind.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index b7a6141c342f..e8d729975f12 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -411,7 +411,21 @@ int unwind_frame(struct stackframe *frame)
 	if (idx->insn == 1)
 		/* can't unwind */
 		return -URC_FAILURE;
-	else if ((idx->insn & 0x80000000) == 0)
+	else if (frame->pc == prel31_to_addr(&idx->addr_offset)) {
+		/*
+		 * Unwinding is tricky when we're halfway through the prologue,
+		 * since the stack frame that the unwinder expects may not be
+		 * fully set up yet. However, one thing we do know for sure is
+		 * that if we are unwinding from the very first instruction of
+		 * a function, we are still effectively in the stack frame of
+		 * the caller, and the unwind info has no relevance yet.
+		 */
+		if (frame->pc == frame->lr)
+			return -URC_FAILURE;
+		frame->sp_low = frame->sp;
+		frame->pc = frame->lr;
+		return URC_OK;
+	} else if ((idx->insn & 0x80000000) == 0)
 		/* prel31 to the unwind table */
 		ctrl.insn = (unsigned long *)prel31_to_addr(&idx->insn);
 	else if ((idx->insn & 0xff000000) == 0x80000000)
-- 
2.30.2


* [PATCH v5 29/32] ARM: entry: rework stack realignment code in svc_entry
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (27 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 28/32] ARM: unwind: disregard unwind info before stack frame is set up Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 30/32] ARM: switch_to: clean up Thumb2 code path Ard Biesheuvel
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The original Thumb-2 enablement patches updated the stack realignment
code in svc_entry to work around the lack of a STMIB instruction in
Thumb-2, by subtracting 4 from the frame size, inverting the sense of
the misalignment check, and changing to a STMIA instruction and a final
stack push of a 4 byte quantity that results in the stack becoming
aligned at the end of the sequence. It also pushes and pops R0 to and
from the stack in order to have a temp register that Thumb-2 allows in
general purpose ALU instructions, as TST using SP is not permitted.

Both are a bit problematic for vmap'ed stacks, as using the stack is
only permitted after we decide that we did not overflow the stack, or
have already switched to the overflow stack.

As for the alignment check: the current approach creates a corner case
where, if the initial SUB of SP ends up right at the start of the stack,
we will end up subtracting another 8 bytes and overflowing it.  This
means we would need to add the overflow check *after* the SUB that
deliberately misaligns the stack. However, this would require us to keep
local state (i.e., whether we performed the subtract or not) across the
overflow check, but without any GPRs or stack available.

So let's switch to an approach where we don't use the stack, and where
the alignment check of the stack pointer occurs in the usual way, as
this is guaranteed not to result in overflow. This means we will be able
to do the overflow check first.

While at it, switch to R1 so the mode stack pointer in R0 remains
accessible.
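
The resulting Thumb-2 sequence tests the alignment without touching
memory or losing the value in R1; its register algebra, modelled in C
(a sketch, with sp0 and r1_0 standing for the incoming values):

	static unsigned long sp_misaligned(unsigned long sp0, unsigned long r1_0)
	{
		unsigned long sp = sp0, r1 = r1_0, flags;

		sp += r1;		/* add sp, r1			*/
		r1 = sp - r1;		/* sub r1, sp, r1 : r1 == sp0	*/
		flags = r1 & 4;		/* tst r1, #4			*/
		r1 = sp - r1;		/* sub r1, sp, r1 : r1 == r1_0	*/
		sp -= r1;		/* sub sp, r1	  : sp == sp0	*/
		return flags;		/* both registers restored	*/
	}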

Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/kernel/entry-armv.S | 25 +++++++++++---------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 38e3978a50a9..a4009e4302bb 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -177,24 +177,27 @@ ENDPROC(__und_invalid)
 	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1
  UNWIND(.fnstart		)
  UNWIND(.save {r0 - pc}		)
-	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole - 4)
+	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole)
 #ifdef CONFIG_THUMB2_KERNEL
- SPFIX(	str	r0, [sp]	)	@ temporarily saved
- SPFIX(	mov	r0, sp		)
- SPFIX(	tst	r0, #4		)	@ test original stack alignment
- SPFIX(	ldr	r0, [sp]	)	@ restored
+	add	sp, r1			@ get SP in a GPR without
+	sub	r1, sp, r1		@ using a temp register
+	tst	r1, #4			@ test stack pointer alignment
+	sub	r1, sp, r1		@ restore original R1
+	sub	sp, r1			@ restore original SP
 #else
  SPFIX(	tst	sp, #4		)
 #endif
- SPFIX(	subeq	sp, sp, #4	)
-	stmia	sp, {r1 - r12}
+ SPFIX(	subne	sp, sp, #4	)
+
+ ARM(	stmib	sp, {r1 - r12}	)
+ THUMB(	stmia	sp, {r0 - r12}	)	@ No STMIB in Thumb-2
 
 	ldmia	r0, {r3 - r5}
-	add	r7, sp, #S_SP - 4	@ here for interlock avoidance
+	add	r7, sp, #S_SP		@ here for interlock avoidance
 	mov	r6, #-1			@  ""  ""      ""       ""
-	add	r2, sp, #(SVC_REGS_SIZE + \stack_hole - 4)
- SPFIX(	addeq	r2, r2, #4	)
-	str	r3, [sp, #-4]!		@ save the "real" r0 copied
+	add	r2, sp, #(SVC_REGS_SIZE + \stack_hole)
+ SPFIX(	addne	r2, r2, #4	)
+	str	r3, [sp]		@ save the "real" r0 copied
 					@ from the exception stack
 
 	mov	r3, lr
-- 
2.30.2


* [PATCH v5 30/32] ARM: switch_to: clean up Thumb2 code path
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (28 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 29/32] ARM: entry: rework stack realignment code in svc_entry Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 31/32] ARM: mm: prepare vmalloc_seq handling for use under SMP Ard Biesheuvel
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

The load/store-multiple instructions that essentially perform the
switch_to operation in ARM mode, by loading/storing all callee save
registers as well as the stack pointer and the link register or program
counter, are split into 3 separate loads or stores for Thumb-2, with the
IP register used as a temporary to capture the target address.

We can clean this up a bit, by sticking with a single STMIA or LDMIA
instruction, but one that uses IP instead of SP.

While at it, switch to a MOVW/MOVT pair to load thread_notify_head.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/kernel/entry-armv.S | 24 +++++++++++---------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index a4009e4302bb..86be80159c14 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -773,14 +773,14 @@ ENDPROC(__fiq_usr)
  * r0 = previous task_struct, r1 = previous thread_info, r2 = next thread_info
  * previous and next are guaranteed not to be the same.
  */
+	.align	5
 ENTRY(__switch_to)
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
-	add	ip, r1, #TI_CPU_SAVE
- ARM(	stmia	ip!, {r4 - sl, fp, sp, lr} )	@ Store most regs on stack
- THUMB(	stmia	ip!, {r4 - sl, fp}	   )	@ Store most regs on stack
- THUMB(	str	sp, [ip], #4		   )
- THUMB(	str	lr, [ip], #4		   )
+	add	r3, r1, #TI_CPU_SAVE
+ ARM(	stmia	r3, {r4 - sl, fp, sp, lr}  )	@ Store most regs on stack
+ THUMB( mov	ip, sp			   )
+ THUMB( stmia	r3, {r4 - sl, fp, ip, lr}  )	@ Thumb2 does not permit SP here
 	ldr	r4, [r2, #TI_TP_VALUE]
 	ldr	r5, [r2, #TI_TP_VALUE + 4]
 #ifdef CONFIG_CPU_USE_DOMAINS
@@ -805,20 +805,22 @@ ENTRY(__switch_to)
 #endif
 	mov	r5, r0
 	add	r4, r2, #TI_CPU_SAVE
-	ldr	r0, =thread_notify_head
+	mov_l	r0, thread_notify_head
 	mov	r1, #THREAD_NOTIFY_SWITCH
 	bl	atomic_notifier_call_chain
 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_SMP) && \
     !defined(CONFIG_STACKPROTECTOR_PER_TASK)
 	str	r9, [r8]
 #endif
- THUMB(	mov	ip, r4			   )
 	mov	r0, r5
 	set_current r7, r8
- ARM(	ldmia	r4, {r4 - sl, fp, sp, pc}  )	@ Load all regs saved previously
- THUMB(	ldmia	ip!, {r4 - sl, fp}	   )	@ Load all regs saved previously
- THUMB(	ldr	sp, [ip], #4		   )
- THUMB(	ldr	pc, [ip]		   )
+#if !defined(CONFIG_THUMB2_KERNEL)
+	ldmia	r4, {r4 - sl, fp, sp, pc}	@ Load all regs saved previously
+#else
+	ldmia	r4, {r4 - sl, fp, ip, lr}	@ Thumb2 does not permit SP here
+	mov	sp, ip
+	ret	lr
+#endif
  UNWIND(.fnend		)
 ENDPROC(__switch_to)
 
-- 
2.30.2


* [PATCH v5 31/32] ARM: mm: prepare vmalloc_seq handling for use under SMP
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (29 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 30/32] ARM: switch_to: clean up Thumb2 code path Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:47 ` [PATCH v5 32/32] ARM: implement support for vmap'ed stacks Ard Biesheuvel
  2022-01-24 17:56 ` [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Russell King (Oracle)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Currently, the vmalloc_seq counter is only used to keep track of changes
in the vmalloc region on !SMP builds, which means there is no need to
deal with concurrency explicitly.

However, in a subsequent patch, we will wire up this same mechanism for
ensuring that vmap'ed stacks are guaranteed to be mapped by the active
mm before switching to a task, and here we need to ensure that changes
to the page tables are visible to other CPUs when they observe a change
in the sequence count.
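
The required pairing is the classic release/acquire publication
pattern; as a generic C11 illustration (not the kernel's atomic_t API,
and with placeholder names):

	#include <stdatomic.h>

	static atomic_int vseq;
	static int pgd_copy;	/* stands in for the copied PGD entries */

	static void publish(int val)
	{
		pgd_copy = val;
		/* release: orders the data store before the bump */
		atomic_fetch_add_explicit(&vseq, 1, memory_order_release);
	}

	static int read_if_changed(int *last, int *out)
	{
		/* acquire: observing the bump implies observing the data */
		int s = atomic_load_explicit(&vseq, memory_order_acquire);

		if (s == *last)
			return 0;
		*last = s;
		*out = pgd_copy;
		return 1;
	}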

Since LPAE needs none of this, fold a check against it into the
vmalloc_seq counter check after breaking it out into a separate static
inline helper.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/asm/mmu.h         |  2 +-
 arch/arm/include/asm/mmu_context.h | 13 +++++++++++--
 arch/arm/mm/context.c              |  3 +--
 arch/arm/mm/ioremap.c              | 18 +++++++++++-------
 4 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/mmu.h b/arch/arm/include/asm/mmu.h
index 1592a4264488..e049723840d3 100644
--- a/arch/arm/include/asm/mmu.h
+++ b/arch/arm/include/asm/mmu.h
@@ -10,7 +10,7 @@ typedef struct {
 #else
 	int		switch_pending;
 #endif
-	unsigned int	vmalloc_seq;
+	atomic_t	vmalloc_seq;
 	unsigned long	sigpage;
 #ifdef CONFIG_VDSO
 	unsigned long	vdso;
diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index 84e58956fcab..71a26986efb9 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -23,6 +23,16 @@
 
 void __check_vmalloc_seq(struct mm_struct *mm);
 
+#ifdef CONFIG_MMU
+static inline void check_vmalloc_seq(struct mm_struct *mm)
+{
+	if (!IS_ENABLED(CONFIG_ARM_LPAE) &&
+	    unlikely(atomic_read(&mm->context.vmalloc_seq) !=
+		     atomic_read(&init_mm.context.vmalloc_seq)))
+		__check_vmalloc_seq(mm);
+}
+#endif
+
 #ifdef CONFIG_CPU_HAS_ASID
 
 void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk);
@@ -52,8 +62,7 @@ static inline void a15_erratum_get_cpumask(int this_cpu, struct mm_struct *mm,
 static inline void check_and_switch_context(struct mm_struct *mm,
 					    struct task_struct *tsk)
 {
-	if (unlikely(mm->context.vmalloc_seq != init_mm.context.vmalloc_seq))
-		__check_vmalloc_seq(mm);
+	check_vmalloc_seq(mm);
 
 	if (irqs_disabled())
 		/*
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 48091870db89..4204ffa2d104 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -240,8 +240,7 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
 	unsigned int cpu = smp_processor_id();
 	u64 asid;
 
-	if (unlikely(mm->context.vmalloc_seq != init_mm.context.vmalloc_seq))
-		__check_vmalloc_seq(mm);
+	check_vmalloc_seq(mm);
 
 	/*
 	 * We cannot update the pgd and the ASID atomicly with classic
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 197f8eb3a775..aa08bcb72db9 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -117,16 +117,21 @@ EXPORT_SYMBOL(ioremap_page);
 
 void __check_vmalloc_seq(struct mm_struct *mm)
 {
-	unsigned int seq;
+	int seq;
 
 	do {
-		seq = init_mm.context.vmalloc_seq;
+		seq = atomic_read(&init_mm.context.vmalloc_seq);
 		memcpy(pgd_offset(mm, VMALLOC_START),
 		       pgd_offset_k(VMALLOC_START),
 		       sizeof(pgd_t) * (pgd_index(VMALLOC_END) -
 					pgd_index(VMALLOC_START)));
-		mm->context.vmalloc_seq = seq;
-	} while (seq != init_mm.context.vmalloc_seq);
+		/*
+		 * Use a store-release so that other CPUs that observe the
+		 * counter's new value are guaranteed to see the results of the
+		 * memcpy as well.
+		 */
+		atomic_set_release(&mm->context.vmalloc_seq, seq);
+	} while (seq != atomic_read(&init_mm.context.vmalloc_seq));
 }
 
 #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
@@ -157,7 +162,7 @@ static void unmap_area_sections(unsigned long virt, unsigned long size)
 			 * Note: this is still racy on SMP machines.
 			 */
 			pmd_clear(pmdp);
-			init_mm.context.vmalloc_seq++;
+			atomic_inc_return_release(&init_mm.context.vmalloc_seq);
 
 			/*
 			 * Free the page table, if there was one.
@@ -174,8 +179,7 @@ static void unmap_area_sections(unsigned long virt, unsigned long size)
 	 * Ensure that the active_mm is up to date - we want to
 	 * catch any use-after-iounmap cases.
 	 */
-	if (current->active_mm->context.vmalloc_seq != init_mm.context.vmalloc_seq)
-		__check_vmalloc_seq(current->active_mm);
+	check_vmalloc_seq(current->active_mm);
 
 	flush_tlb_kernel_range(virt, end);
 }
-- 
2.30.2



* [PATCH v5 32/32] ARM: implement support for vmap'ed stacks
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (30 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 31/32] ARM: mm: prepare vmalloc_seq handling for use under SMP Ard Biesheuvel
@ 2022-01-24 17:47 ` Ard Biesheuvel
  2022-01-24 17:56 ` [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Russell King (Oracle)
  32 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:47 UTC (permalink / raw)
  To: linux, linux-arm-kernel
  Cc: linux-hardening, Ard Biesheuvel, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

Wire up the generic support for managing task stack allocations via
vmalloc, and implement the entry code that detects whether we faulted
because of a stack overrun (or an imminent overrun caused by pushing
the pt_regs array).

While this adds a fair amount of tricky entry asm code, it should be
noted that it only adds a TST + branch to the svc_entry path. The code
implementing the non-trivial handling of the overflow stack is emitted
out-of-line into the .text section.
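
To make the test concrete: task and IRQ stacks are THREAD_SIZE in size
but aligned to twice that, so the THREAD_SIZE bit of any valid stack
pointer is always clear, and running off the bottom of the stack sets
it. A hypothetical C rendition of what the TST computes (the actual
check is the do_overflow_check macro added below):

  /* Sketch only: BIT(THREAD_SIZE_ORDER + PAGE_SHIFT) equals THREAD_SIZE */
  static inline bool sp_overflowed(unsigned long sp)
  {
          return sp & (1UL << (THREAD_SIZE_ORDER + PAGE_SHIFT));
  }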

Since on !LPAE, we rely on do_translation_fault() to keep PMD level page
table entries that cover the vmalloc region up to date, we need to
ensure that we don't hit such a stale PMD entry when accessing the
stack, as the fault handler itself needs a stack to run as well. So
let's bump the vmalloc_seq counter when PMD level entries in the vmalloc
range are modified, so that the MM switch fetches the latest version of
the entries. To ensure that kernel threads executing with an active_mm
other than init_mm are up to date, add an implementation of
enter_lazy_tlb() to handle this case.

Note that the page table walker is not an ordinary observer in terms of
concurrency, which means that memory barriers alone are not sufficient
to prevent spurious translation faults from occurring when accessing the
stack after a context switch. For this reason, a dummy read from the new
stack is added to __switch_to() right before switching to it, so that
any faults can be dealt with by do_translation_fault() while the old
stack is still active.

Also note that we need to increase each per-mode stack by one word, to
gain some space to stash a GPR until we know it is safe to touch the
stack. However, due to the cacheline alignment of the struct, this does
not actually increase the memory footprint of the struct stack array at
all.
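
The arithmetic, for the record (assuming L1_CACHE_BYTES is 32 or 64,
as on current ARM configurations):

  /*
   * Illustration only, not part of the patch:
   *   before: 4 stacks x 3 words x 4 bytes = 48 bytes, padded up to
   *           a cacheline multiple (64) by ____cacheline_aligned
   *   after:  4 stacks x 4 words x 4 bytes = 64 bytes, which still
   *           fits in the same padded footprint
   */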

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
---
 arch/arm/Kconfig                   |  1 +
 arch/arm/include/asm/mmu_context.h |  9 ++
 arch/arm/include/asm/page.h        |  3 +
 arch/arm/include/asm/thread_info.h |  8 ++
 arch/arm/kernel/entry-armv.S       | 91 ++++++++++++++++++--
 arch/arm/kernel/entry-header.S     | 37 ++++++++
 arch/arm/kernel/head.S             |  7 ++
 arch/arm/kernel/irq.c              |  9 +-
 arch/arm/kernel/setup.c            |  8 +-
 arch/arm/kernel/sleep.S            | 13 +++
 arch/arm/kernel/traps.c            | 69 ++++++++++++++-
 arch/arm/kernel/unwind.c           |  3 +-
 arch/arm/kernel/vmlinux.lds.S      |  4 +-
 13 files changed, 247 insertions(+), 15 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b959249dd716..cbbe38f55088 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -130,6 +130,7 @@ config ARM
 	select THREAD_INFO_IN_TASK
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK
 	select HAVE_SOFTIRQ_ON_OWN_STACK
+	select HAVE_ARCH_VMAP_STACK if MMU && ARM_HAS_GROUP_RELOCS
 	select TRACE_IRQFLAGS_SUPPORT if !CPU_V7M
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index 71a26986efb9..db2cb06aa8cf 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -138,6 +138,15 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 #endif
 }
 
+#ifdef CONFIG_VMAP_STACK
+static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{
+	if (mm != &init_mm)
+		check_vmalloc_seq(mm);
+}
+#define enter_lazy_tlb enter_lazy_tlb
+#endif
+
 #include <asm-generic/mmu_context.h>
 
 #endif
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 11b058a72a5b..5fcc8a600e36 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -147,6 +147,9 @@ extern void copy_page(void *to, const void *from);
 #include <asm/pgtable-3level-types.h>
 #else
 #include <asm/pgtable-2level-types.h>
+#ifdef CONFIG_VMAP_STACK
+#define ARCH_PAGE_TABLE_SYNC_MASK	PGTBL_PMD_MODIFIED
+#endif
 #endif
 
 #endif /* CONFIG_MMU */
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index e039d8f12d9b..aecc403b2880 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -25,6 +25,14 @@
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 #define THREAD_START_SP		(THREAD_SIZE - 8)
 
+#ifdef CONFIG_VMAP_STACK
+#define THREAD_ALIGN		(2 * THREAD_SIZE)
+#else
+#define THREAD_ALIGN		THREAD_SIZE
+#endif
+
+#define OVERFLOW_STACK_SIZE	SZ_4K
+
 #ifndef __ASSEMBLY__
 
 struct task_struct;
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 86be80159c14..e098cc4de426 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -49,6 +49,10 @@ UNWIND( .setfp	fpreg, sp		)
 	@
 	subs	r2, sp, r0		@ SP above bottom of IRQ stack?
 	rsbscs	r2, r2, #THREAD_SIZE	@ ... and below the top?
+#ifdef CONFIG_VMAP_STACK
+	ldr_va	r2, high_memory, cc	@ End of the linear region
+	cmpcc	r2, r0			@ Stack pointer was below it?
+#endif
 	movcs	sp, r0			@ If so, revert to incoming SP
 
 #ifndef CONFIG_UNWINDER_ARM
@@ -174,13 +178,18 @@ ENDPROC(__und_invalid)
 #define SPFIX(code...)
 #endif
 
-	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1
+	.macro	svc_entry, stack_hole=0, trace=1, uaccess=1, overflow_check=1
  UNWIND(.fnstart		)
- UNWIND(.save {r0 - pc}		)
 	sub	sp, sp, #(SVC_REGS_SIZE + \stack_hole)
+ THUMB( add	sp, r1		)	@ get SP in a GPR without
+ THUMB( sub	r1, sp, r1	)	@ using a temp register
+
+	.if	\overflow_check
+ UNWIND(.save	{r0 - pc}	)
+	do_overflow_check (SVC_REGS_SIZE + \stack_hole)
+	.endif
+
 #ifdef CONFIG_THUMB2_KERNEL
-	add	sp, r1			@ get SP in a GPR without
-	sub	r1, sp, r1		@ using a temp register
 	tst	r1, #4			@ test stack pointer alignment
 	sub	r1, sp, r1		@ restore original R1
 	sub	sp, r1			@ restore original SP
@@ -814,16 +823,88 @@ ENTRY(__switch_to)
 #endif
 	mov	r0, r5
 	set_current r7, r8
-#if !defined(CONFIG_THUMB2_KERNEL)
+#if !defined(CONFIG_THUMB2_KERNEL) && \
+    !(defined(CONFIG_VMAP_STACK) && !defined(CONFIG_ARM_LPAE))
 	ldmia	r4, {r4 - sl, fp, sp, pc}	@ Load all regs saved previously
 #else
 	ldmia	r4, {r4 - sl, fp, ip, lr}	@ Thumb2 does not permit SP here
+#if defined(CONFIG_VMAP_STACK) && !defined(CONFIG_ARM_LPAE)
+	@ Even though we take care to ensure that the previous task's active_mm
+	@ has the correct translation for next's task stack, the architecture
+	@ permits that a translation fault caused by a speculative access is
+	@ taken once the result of the access should become architecturally
+	@ visible. Usually, we rely on do_translation_fault() to fix this up
+	@ transparently, but that only works for the stack if we are not using
+	@ it when taking the fault. So do a dummy read from next's stack while
+	@ still running from prev's stack, so that any faults get taken here.
+	ldr	r2, [ip]
+#endif
 	mov	sp, ip
 	ret	lr
 #endif
  UNWIND(.fnend		)
 ENDPROC(__switch_to)
 
+#ifdef CONFIG_VMAP_STACK
+	.text
+	.align	2
+__bad_stack:
+	@
+	@ We've just detected an overflow. We need to load the address of this
+	@ CPU's overflow stack into the stack pointer register. We have only one
+	@ register available so let's switch to ARM mode and use the per-CPU
+	@ variable accessor that does not require any scratch registers.
+	@
+	@ We enter here with IP clobbered and its value stashed on the mode
+	@ stack.
+	@
+THUMB(	bx	pc		)
+THUMB(	nop			)
+THUMB(	.arm			)
+	ldr_this_cpu_armv6 ip, overflow_stack_ptr
+
+	str	sp, [ip, #-4]!			@ Preserve original SP value
+	mov	sp, ip				@ Switch to overflow stack
+	pop	{ip}				@ Original SP in IP
+
+#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
+	mov	ip, ip				@ mov expected by unwinder
+	push	{fp, ip, lr, pc}		@ GCC flavor frame record
+#else
+	str	ip, [sp, #-8]!			@ store original SP
+	push	{fpreg, lr}			@ Clang flavor frame record
+#endif
+UNWIND( ldr	ip, [r0, #4]	)		@ load exception LR
+UNWIND( str	ip, [sp, #12]	)		@ store in the frame record
+	ldr	ip, [r0, #12]			@ reload IP
+
+	@ Store the original GPRs to the new stack.
+	svc_entry uaccess=0, overflow_check=0
+
+UNWIND( .save   {sp, pc}	)
+UNWIND( .save   {fpreg, lr}	)
+UNWIND( .setfp  fpreg, sp	)
+
+	ldr	fpreg, [sp, #S_SP]		@ Add our frame record
+						@ to the linked list
+#if defined(CONFIG_UNWINDER_FRAME_POINTER) && defined(CONFIG_CC_IS_GCC)
+	ldr	r1, [fp, #4]			@ reload SP at entry
+	add	fp, fp, #12
+#else
+	ldr	r1, [fpreg, #8]
+#endif
+	str	r1, [sp, #S_SP]			@ store in pt_regs
+
+	@ Stash the regs for handle_bad_stack
+	mov	r0, sp
+
+	@ Time to die
+	bl	handle_bad_stack
+	nop
+UNWIND( .fnend			)
+ENDPROC(__bad_stack)
+#endif
+
 	__INIT
 
 /*
diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index 9f01b229841a..347c975c5d9d 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -429,3 +429,40 @@ scno	.req	r7		@ syscall number
 tbl	.req	r8		@ syscall table pointer
 why	.req	r8		@ Linux syscall (!= 0)
 tsk	.req	r9		@ current thread_info
+
+	.macro	do_overflow_check, frame_size:req
+#ifdef CONFIG_VMAP_STACK
+	@
+	@ Test whether the SP has overflowed. Task and IRQ stacks are aligned
+	@ so that SP & BIT(THREAD_SIZE_ORDER + PAGE_SHIFT) should always be
+	@ zero.
+	@
+ARM(	tst	sp, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)	)
+THUMB(	tst	r1, #1 << (THREAD_SIZE_ORDER + PAGE_SHIFT)	)
+THUMB(	it	ne						)
+	bne	.Lstack_overflow_check\@
+
+	.pushsection	.text
+.Lstack_overflow_check\@:
+	@
+	@ The stack pointer is not pointing to a valid vmap'ed stack, but it
+	@ may be pointing into the linear map instead, which may happen if we
+	@ are already running from the overflow stack. We cannot detect overflow
+	@ in such cases so just carry on.
+	@
+	str	ip, [r0, #12]			@ Stash IP on the mode stack
+	ldr_va	ip, high_memory			@ Start of VMALLOC space
+ARM(	cmp	sp, ip			)	@ SP in vmalloc space?
+THUMB(	cmp	r1, ip			)
+THUMB(	itt	lo			)
+	ldrlo	ip, [r0, #12]			@ Restore IP
+	blo	.Lout\@				@ Carry on
+
+THUMB(	sub	r1, sp, r1		)	@ Restore original R1
+THUMB(	sub	sp, r1			)	@ Restore original SP
+	add	sp, sp, #\frame_size		@ Undo svc_entry's SP change
+	b	__bad_stack			@ Handle VMAP stack overflow
+	.popsection
+.Lout\@:
+#endif
+	.endm
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index c04dd94630c7..500612d3da2e 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -424,6 +424,13 @@ ENDPROC(secondary_startup)
 ENDPROC(secondary_startup_arm)
 
 ENTRY(__secondary_switched)
+#if defined(CONFIG_VMAP_STACK) && !defined(CONFIG_ARM_LPAE)
+	@ Before using the vmap'ed stack, we have to switch to swapper_pg_dir
+	@ as the ID map does not cover the vmalloc region.
+	mrc	p15, 0, ip, c2, c0, 1	@ read TTBR1
+	mcr	p15, 0, ip, c2, c0, 0	@ set TTBR0
+	instr_sync
+#endif
 	adr_l	r7, secondary_data + 12		@ get secondary_data.stack
 	ldr	sp, [r7]
 	ldr	r0, [r7, #4]			@ get secondary_data.task
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 380376f55554..74a1c878bc7a 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -54,7 +54,14 @@ static void __init init_irq_stacks(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		stack = (u8 *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
+		if (!IS_ENABLED(CONFIG_VMAP_STACK))
+			stack = (u8 *)__get_free_pages(GFP_KERNEL,
+						       THREAD_SIZE_ORDER);
+		else
+			stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN,
+					       THREADINFO_GFP, NUMA_NO_NODE,
+					       __builtin_return_address(0));
+
 		if (WARN_ON(!stack))
 			break;
 		per_cpu(irq_stack_ptr, cpu) = &stack[THREAD_SIZE];
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 284a80c0b6e1..039feb7cd590 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -141,10 +141,10 @@ EXPORT_SYMBOL(outer_cache);
 int __cpu_architecture __read_mostly = CPU_ARCH_UNKNOWN;
 
 struct stack {
-	u32 irq[3];
-	u32 abt[3];
-	u32 und[3];
-	u32 fiq[3];
+	u32 irq[4];
+	u32 abt[4];
+	u32 und[4];
+	u32 fiq[4];
 } ____cacheline_aligned;
 
 #ifndef CONFIG_CPU_V7M
diff --git a/arch/arm/kernel/sleep.S b/arch/arm/kernel/sleep.S
index 43077e11dafd..a86a1d4f3461 100644
--- a/arch/arm/kernel/sleep.S
+++ b/arch/arm/kernel/sleep.S
@@ -67,6 +67,12 @@ ENTRY(__cpu_suspend)
 	ldr	r4, =cpu_suspend_size
 #endif
 	mov	r5, sp			@ current virtual SP
+#ifdef CONFIG_VMAP_STACK
+	@ Run the suspend code from the overflow stack so we don't have to rely
+	@ on vmalloc-to-phys conversions anywhere in the arch suspend code.
+	@ The original SP value captured in R5 will be restored on the way out.
+	ldr_this_cpu sp, overflow_stack_ptr, r6, r7
+#endif
 	add	r4, r4, #12		@ Space for pgd, virt sp, phys resume fn
 	sub	sp, sp, r4		@ allocate CPU state on stack
 	ldr	r3, =sleep_save_sp
@@ -113,6 +119,13 @@ ENTRY(cpu_resume_mmu)
 ENDPROC(cpu_resume_mmu)
 	.popsection
 cpu_resume_after_mmu:
+#if defined(CONFIG_VMAP_STACK) && !defined(CONFIG_ARM_LPAE)
+	@ Before using the vmap'ed stack, we have to switch to swapper_pg_dir
+	@ as the ID map does not cover the vmalloc region.
+	mrc	p15, 0, ip, c2, c0, 1	@ read TTBR1
+	mcr	p15, 0, ip, c2, c0, 0	@ set TTBR0
+	instr_sync
+#endif
 	bl	cpu_init		@ restore the und/abt/irq banked regs
 	mov	r0, #0			@ return zero on success
 	ldmfd	sp!, {r4 - r11, pc}
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1b8bef286fbc..8b076eaeaf61 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -123,7 +123,8 @@ void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl)
 static int verify_stack(unsigned long sp)
 {
 	if (sp < PAGE_OFFSET ||
-	    (sp > (unsigned long)high_memory && high_memory != NULL))
+	    (!IS_ENABLED(CONFIG_VMAP_STACK) &&
+	     sp > (unsigned long)high_memory && high_memory != NULL))
 		return -EFAULT;
 
 	return 0;
@@ -293,7 +294,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 
 	if (!user_mode(regs) || in_interrupt()) {
 		dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
-			 ALIGN(regs->ARM_sp, THREAD_SIZE));
+			 ALIGN(regs->ARM_sp - THREAD_SIZE, THREAD_ALIGN)
+			 + THREAD_SIZE);
 		dump_backtrace(regs, tsk, KERN_EMERG);
 		dump_instr(KERN_EMERG, regs);
 	}
@@ -840,3 +842,66 @@ void __init early_trap_init(void *vectors_base)
 	 */
 #endif
 }
+
+#ifdef CONFIG_VMAP_STACK
+
+DECLARE_PER_CPU(u8 *, irq_stack_ptr);
+
+asmlinkage DEFINE_PER_CPU(u8 *, overflow_stack_ptr);
+
+static int __init allocate_overflow_stacks(void)
+{
+	u8 *stack;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		stack = (u8 *)__get_free_page(GFP_KERNEL);
+		if (WARN_ON(!stack))
+			return -ENOMEM;
+		per_cpu(overflow_stack_ptr, cpu) = &stack[OVERFLOW_STACK_SIZE];
+	}
+	return 0;
+}
+early_initcall(allocate_overflow_stacks);
+
+asmlinkage void handle_bad_stack(struct pt_regs *regs)
+{
+	unsigned long tsk_stk = (unsigned long)current->stack;
+	unsigned long irq_stk = (unsigned long)this_cpu_read(irq_stack_ptr);
+	unsigned long ovf_stk = (unsigned long)this_cpu_read(overflow_stack_ptr);
+
+	console_verbose();
+	pr_emerg("Insufficient stack space to handle exception!");
+
+	pr_emerg("Task stack:     [0x%08lx..0x%08lx]\n",
+		 tsk_stk, tsk_stk + THREAD_SIZE);
+	pr_emerg("IRQ stack:      [0x%08lx..0x%08lx]\n",
+		 irq_stk - THREAD_SIZE, irq_stk);
+	pr_emerg("Overflow stack: [0x%08lx..0x%08lx]\n",
+		 ovf_stk - OVERFLOW_STACK_SIZE, ovf_stk);
+
+	die("kernel stack overflow", regs, 0);
+}
+
+#ifndef CONFIG_ARM_LPAE
+/*
+ * Normally, we rely on the logic in do_translation_fault() to update stale PMD
+ * entries covering the vmalloc space in a task's page tables when it first
+ * accesses the region in question. Unfortunately, this is not sufficient when
+ * the task stack resides in the vmalloc region, as do_translation_fault() is a
+ * C function that needs a stack to run.
+ *
+ * So we need to ensure that these PMD entries are up to date *before* the MM
+ * switch. As we already have some logic in the MM switch path that takes care
+ * of this, let's trigger it by bumping the counter every time the core vmalloc
+ * code modifies a PMD entry in the vmalloc region. Use release semantics on
+ * the store so that other CPUs observing the counter's new value are
+ * guaranteed to see the updated page table entries as well.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+	if (start < VMALLOC_END && end > VMALLOC_START)
+		atomic_inc_return_release(&init_mm.context.vmalloc_seq);
+}
+#endif
+#endif
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index e8d729975f12..c5ea328c428d 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -389,7 +389,8 @@ int unwind_frame(struct stackframe *frame)
 
 	/* store the highest address on the stack to avoid crossing it*/
 	ctrl.sp_low = frame->sp;
-	ctrl.sp_high = ALIGN(ctrl.sp_low, THREAD_SIZE);
+	ctrl.sp_high = ALIGN(ctrl.sp_low - THREAD_SIZE, THREAD_ALIGN)
+		       + THREAD_SIZE;
 
 	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
 		 frame->pc, frame->lr, frame->sp);
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index f02d617e3359..aa12b65a7fd6 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -138,12 +138,12 @@ SECTIONS
 #ifdef CONFIG_STRICT_KERNEL_RWX
 	. = ALIGN(1<<SECTION_SHIFT);
 #else
-	. = ALIGN(THREAD_SIZE);
+	. = ALIGN(THREAD_ALIGN);
 #endif
 	__init_end = .;
 
 	_sdata = .;
-	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
+	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
 	_edata = .;
 
 	BSS_SECTION(0, 0, 0)
-- 
2.30.2



* Re: [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup
  2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
                   ` (31 preceding siblings ...)
  2022-01-24 17:47 ` [PATCH v5 32/32] ARM: implement support for vmap'ed stacks Ard Biesheuvel
@ 2022-01-24 17:56 ` Russell King (Oracle)
  2022-01-24 17:57   ` Ard Biesheuvel
  32 siblings, 1 reply; 38+ messages in thread
From: Russell King (Oracle) @ 2022-01-24 17:56 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-hardening, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

On Mon, Jan 24, 2022 at 06:47:12PM +0100, Ard Biesheuvel wrote:
> This v5 series is a combined followup to
> 
> - IRQ stacks support for v7 SMP systems [0],
> - vmap'ed stacks support for v7 SMP systems[1],
> - extending support for both IRQ stacks and vmap'ed stacks for all
>   remaining configurations, including v6/v7 SMP multiplatform kernels
>   and uniprocessor configurations including v7-M [2]
> 
> [0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/
> [1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/
> [2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/
> 
> This work was queued up in the ARM tree for a while, but due to problems
> with the vmap'ed stacks code, which was difficult to revert in
> isolation, the whole stack was dropped again.
> 
> In order to prevent similar problems from occurring this time around,
> the series was reorganized so that the vmap'ed stacks changes appear at
> the very end, which also results in a more natural progression of the
> changes.
> 
> Changes since v4:
> - incorporate fixups to avoid build failures on Clang related to
>   literals in subsections,
> - switch from the ID map to swapper_pg_dir as early as possible when
>   onlining a CPU on !LPAE, to ensure that the stack is mapped,
> - use SMP_ON_UP patching to elide HWCAP_TLS tests on SMP+v6,
> - clean up __switch_to() for Thumb2 a bit more,
> - add patch to make the vmalloc_seq counter SMP safe,
> - use enter_lazy_tlb() hook on !LPAE to ensure that the active_mm used
>   by a kernel thread has a mapping for its vmap'ed stack,

Hi Ard,

I still have the original code in devel-stable, and as that is a
guaranteed stable branch, it's not something I'll be dropping... Please
can I have fixes on top of what is already there?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup
  2022-01-24 17:56 ` [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Russell King (Oracle)
@ 2022-01-24 17:57   ` Ard Biesheuvel
  0 siblings, 0 replies; 38+ messages in thread
From: Ard Biesheuvel @ 2022-01-24 17:57 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Linux ARM, linux-hardening, Nicolas Pitre, Arnd Bergmann,
	Kees Cook, Keith Packard, Linus Walleij, Nick Desaulniers,
	Tony Lindgren, Marc Zyngier, Vladimir Murzin, Jesse Taube

On Mon, 24 Jan 2022 at 18:57, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Mon, Jan 24, 2022 at 06:47:12PM +0100, Ard Biesheuvel wrote:
> > This v5 series is a combined followup to
> >
> > - IRQ stacks support for v7 SMP systems [0],
> > - vmap'ed stacks support for v7 SMP systems[1],
> > - extending support for both IRQ stacks and vmap'ed stacks for all
> >   remaining configurations, including v6/v7 SMP multiplatform kernels
> >   and uniprocessor configurations including v7-M [2]
> >
> > [0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/
> > [1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/
> > [2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/
> >
> > This work was queued up in the ARM tree for a while, but due to problems
> > with the vmap'ed stacks code, which was difficult to revert in
> > isolation, the whole stack was dropped again.
> >
> > In order to prevent similar problems from occurring this time around,
> > the series was reorganized so that the vmap'ed stacks changes appear at
> > the very end, which also results in a more natural progression of the
> > changes.
> >
> > Changes since v4:
> > - incorporate fixups to avoid build failures on Clang related to
> >   literals in subsections,
> > - switch from the ID map to swapper_pg_dir as early as possible when
> >   onlining a CPU on !LPAE, to ensure that the stack is mapped,
> > - use SMP_ON_UP patching to elide HWCAP_TLS tests on SMP+v6,
> > - clean up __switch_to() for Thumb2 a bit more,
> > - add patch to make the vmalloc_seq counter SMP safe,
> > - use enter_lazy_tlb() hook on !LPAE to ensure that the active_mm used
> >   by a kernel thread has a mapping for its vmap'ed stack,
>
> Hi Ard,
>
> I still have the original code in devel-stable, and as that is a
> guaranteed stable branch, it's not something I'll be dropping... Please
> can I have fixes on top of what is already there?
>

Sure.


* Re: [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack
  2022-01-24 17:47 ` [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack Ard Biesheuvel
@ 2022-03-22  9:04   ` Sebastian Andrzej Siewior
  2022-03-22  9:35     ` Ard Biesheuvel
  0 siblings, 1 reply; 38+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-22  9:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux, linux-arm-kernel, linux-hardening, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, Tony Lindgren, Marc Zyngier, Vladimir Murzin,
	Jesse Taube, Thomas Gleixner

On 2022-01-24 18:47:36 [+0100], Ard Biesheuvel wrote:
> @@ -58,6 +61,17 @@ static void __init init_irq_stacks(void)
>  	}
>  }
>  
> +static void ____do_softirq(void *arg)
> +{
> +	__do_softirq();
> +}
> +
> +void do_softirq_own_stack(void)
> +{
> +	call_with_stack(____do_softirq, NULL,
> +			__this_cpu_read(irq_stack_ptr));
> +}
> +

Am I missing the obvious here, or is irq_stack_ptr only used for
softirqs and not for hardirqs?

>  int arch_show_interrupts(struct seq_file *p, int prec)
>  {
>  #ifdef CONFIG_FIQ

Sebastian


* Re: [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack
  2022-03-22  9:04   ` Sebastian Andrzej Siewior
@ 2022-03-22  9:35     ` Ard Biesheuvel
  2022-03-22 11:29       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 38+ messages in thread
From: Ard Biesheuvel @ 2022-03-22  9:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Russell King, Linux ARM, linux-hardening, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, Tony Lindgren, Marc Zyngier, Vladimir Murzin,
	Jesse Taube, Thomas Gleixner

On Tue, 22 Mar 2022 at 10:04, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-01-24 18:47:36 [+0100], Ard Biesheuvel wrote:
> > @@ -58,6 +61,17 @@ static void __init init_irq_stacks(void)
> >       }
> >  }
> >
> > +static void ____do_softirq(void *arg)
> > +{
> > +     __do_softirq();
> > +}
> > +
> > +void do_softirq_own_stack(void)
> > +{
> > +     call_with_stack(____do_softirq, NULL,
> > +                     __this_cpu_read(irq_stack_ptr));
> > +}
> > +
>
> Am I missing the obvious here, or is irq_stack_ptr only used for
> softirqs and not for hardirqs?
>
> >  int arch_show_interrupts(struct seq_file *p, int prec)
> >  {
> >  #ifdef CONFIG_FIQ
>

It is also used for hard IRQs - please refer to the irq_handler macro
in arch/arm/kernel/entry-armv.S


* Re: [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack
  2022-03-22  9:35     ` Ard Biesheuvel
@ 2022-03-22 11:29       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 38+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-22 11:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King, Linux ARM, linux-hardening, Nicolas Pitre,
	Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
	Nick Desaulniers, Tony Lindgren, Marc Zyngier, Vladimir Murzin,
	Jesse Taube, Thomas Gleixner

On 2022-03-22 10:35:18 [+0100], Ard Biesheuvel wrote:
> It is also used for hard IRQs - please refer to the irq_handler macro
> in arch/arm/kernel/entry-armv.S

Thank you, this makes sense.

Sebastian


Thread overview: 38+ messages
2022-01-24 17:47 [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 01/32] ARM: riscpc: drop support for IOMD_IRQREQC/IOMD_IRQREQD IRQ groups Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 02/32] ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 03/32] ARM: footbridge: " Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 04/32] ARM: iop32x: offset IRQ numbers by 1 Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 05/32] ARM: iop32x: use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 06/32] ARM: remove old-style irq entry Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 07/32] irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 08/32] ARM: decompressor: disable stack protector Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 09/32] ARM: stackprotector: prefer compiler for TLS based per-task protector Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 10/32] ARM: entry: preserve thread_info pointer in switch_to Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 11/32] ARM: module: implement support for PC-relative group relocations Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 12/32] ARM: assembler: add optimized ldr/str macros to load variables from memory Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 13/32] ARM: percpu: add SMP_ON_UP support Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 14/32] ARM: use TLS register for 'current' on !SMP as well Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 15/32] ARM: smp: defer TPIDRURO update for SMP v6 configurations too Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 16/32] ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 17/32] ARM: assembler: introduce bl_r macro Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 18/32] ARM: unwind: support unwinding across multiple stacks Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 19/32] ARM: export dump_mem() to other objects Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 20/32] ARM: unwind: dump exception stack from calling frame Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 21/32] ARM: backtrace-clang: avoid crash on bogus frame pointer Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 22/32] ARM: implement IRQ stacks Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 23/32] ARM: call_with_stack: add unwind support Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 24/32] ARM: run softirqs on the per-CPU IRQ stack Ard Biesheuvel
2022-03-22  9:04   ` Sebastian Andrzej Siewior
2022-03-22  9:35     ` Ard Biesheuvel
2022-03-22 11:29       ` Sebastian Andrzej Siewior
2022-01-24 17:47 ` [PATCH v5 25/32] ARM: memcpy: use frame pointer as unwind anchor Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 26/32] ARM: memmove: " Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 27/32] ARM: memset: clean up unwind annotations Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 28/32] ARM: unwind: disregard unwind info before stack frame is set up Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 29/32] ARM: entry: rework stack realignment code in svc_entry Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 30/32] ARM: switch_to: clean up Thumb2 code path Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 31/32] ARM: mm: prepare vmalloc_seq handling for use under SMP Ard Biesheuvel
2022-01-24 17:47 ` [PATCH v5 32/32] ARM: implement support for vmap'ed stacks Ard Biesheuvel
2022-01-24 17:56 ` [PATCH v5 00/32] ARM vmap'ed and IRQ stacks roundup Russell King (Oracle)
2022-01-24 17:57   ` Ard Biesheuvel
