* [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Mark Rutland @ 2020-10-05 16:43 UTC
  To: will, linux-arm-kernel; +Cc: Mark Rutland, Catalin Marinas, James Morse

The current initialization of the per-cpu offset register is difficult
to follow and this initialization is not always early enough for
upcoming instrumentation with KCSAN, where the instrumentation callbacks
use the per-cpu offset.

To make it possible to support KCSAN, and to simplify reasoning about
early bringup code, let's initialize the per-cpu offset earlier, before
we run any C code that may consume it. To do so, this patch adds a new
init_this_cpu_offset() helper that's called before the usual
primary/secondary start functions. For consistency, this is also used to
re-initialize the per-cpu offset after the runtime per-cpu areas have
been allocated (which can change CPU0's offset).

So that init_this_cpu_offset() isn't subject to any instrumentation that
might consume the per-cpu offset, it is marked with noinstr, preventing
instrumentation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cpu.h |  2 ++
 arch/arm64/kernel/head.S     |  3 +++
 arch/arm64/kernel/setup.c    | 12 ++++++------
 arch/arm64/kernel/smp.c      | 13 ++++++++-----
 4 files changed, 19 insertions(+), 11 deletions(-)

Since v1[1]:

* Fix typos
* Rebase atop v5.9-rc4

Mark.

[1] https://lore.kernel.org/r/20200730163806.23053-1-mark.rutland@arm.com
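
As a reading aid, the ordering this patch establishes looks roughly like
the following, seen from C (a sketch only: the two *_view() functions are
hypothetical stand-ins for the __primary_switched/__secondary_switched
assembly paths in head.S, not real code):

	/* boot CPU, from __primary_switched */
	void primary_boot_view(void)		/* hypothetical */
	{
		init_this_cpu_offset();	/* boot-time area: __per_cpu_offset[]
					 * is not populated yet */
		start_kernel();		/* C code may now use per-cpu ops;
					 * setup_per_cpu_areas() later
					 * allocates the runtime areas, and
					 * smp_prepare_boot_cpu() re-runs
					 * init_this_cpu_offset() */
	}

	/* secondary CPUs, from __secondary_switched */
	void secondary_boot_view(void)		/* hypothetical */
	{
		init_this_cpu_offset();	/* runtime area, set up by CPU0 */
		secondary_start_kernel();
	}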

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 7faae6ff3ab4d..d9d60b18e8116 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -68,4 +68,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info);
 void update_cpu_features(int cpu, struct cpuinfo_arm64 *info,
 				 struct cpuinfo_arm64 *boot);
 
+void init_this_cpu_offset(void);
+
 #endif /* __ASM_CPU_H */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 037421c66b147..2720e6ec68140 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -452,6 +452,8 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	bl	__pi_memset
 	dsb	ishst				// Make zero page visible to PTW
 
+	bl	init_this_cpu_offset
+
 #ifdef CONFIG_KASAN
 	bl	kasan_early_init
 #endif
@@ -758,6 +760,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	ptrauth_keys_init_cpu x2, x3, x4, x5
 #endif
 
+	bl	init_this_cpu_offset
 	b	secondary_start_kernel
 SYM_FUNC_END(__secondary_switched)
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 53acbeca4f574..fde4396418add 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -87,12 +87,6 @@ void __init smp_setup_processor_id(void)
 	u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
 	set_cpu_logical_map(0, mpidr);
 
-	/*
-	 * clear __my_cpu_offset on boot CPU to avoid hang caused by
-	 * using percpu variable early, for example, lockdep will
-	 * access percpu variable inside lock_release
-	 */
-	set_my_cpu_offset(0);
 	pr_info("Booting Linux on physical CPU 0x%010lx [0x%08x]\n",
 		(unsigned long)mpidr, read_cpuid_id());
 }
@@ -281,6 +275,12 @@ u64 cpu_logical_map(int cpu)
 	return __cpu_logical_map[cpu];
 }
 
+void noinstr init_this_cpu_offset(void)
+{
+	unsigned int cpu = task_cpu(current);
+	set_my_cpu_offset(per_cpu_offset(cpu));
+}
+
 void __init __no_sanitize_address setup_arch(char **cmdline_p)
 {
 	init_mm.start_code = (unsigned long) _text;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 355ee9eed4dde..7714310fba226 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -192,10 +192,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
 	struct mm_struct *mm = &init_mm;
 	const struct cpu_operations *ops;
-	unsigned int cpu;
-
-	cpu = task_cpu(current);
-	set_my_cpu_offset(per_cpu_offset(cpu));
+	unsigned int cpu = smp_processor_id();
 
 	/*
 	 * All kernel threads share the same mm context; grab a
@@ -435,7 +432,13 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 void __init smp_prepare_boot_cpu(void)
 {
-	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
+	/*
+	 * Now that setup_per_cpu_areas() has allocated the runtime per-cpu
+	 * areas it is only safe to read the CPU0 boot-time area, and we must
+	 * reinitialize the offset to point to the runtime area.
+	 */
+	init_this_cpu_offset();
+
 	cpuinfo_store_boot_cpu();
 
 	/*
-- 
2.11.0



* Re: [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Will Deacon @ 2020-10-05 18:08 UTC
  To: Mark Rutland, linux-arm-kernel
  Cc: catalin.marinas, kernel-team, James Morse, Will Deacon

On Mon, 5 Oct 2020 17:43:03 +0100, Mark Rutland wrote:
> The current initialization of the per-cpu offset register is difficult
> to follow and this initialization is not always early enough for
> upcoming instrumentation with KCSAN, where the instrumentation callbacks
> use the per-cpu offset.
> 
> To make it possible to support KCSAN, and to simplify reasoning about
> early bringup code, let's initialize the per-cpu offset earlier, before
> we run any C code that may consume it. To do so, this patch adds a new
> init_this_cpu_offset() helper that's called before the usual
> primary/secondary start functions. For consistency, this is also used to
> re-initialize the per-cpu offset after the runtime per-cpu areas have
> been allocated (which can change CPU0's offset).
> 
> [...]

Applied to arm64 (for-next/late-arrivals), thanks!

[1/1] arm64: initialize per-cpu offsets earlier
      https://git.kernel.org/arm64/c/353e228eb355

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


* Re: [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Qian Cai @ 2020-10-09  1:18 UTC
  To: Mark Rutland, will, linux-arm-kernel
  Cc: Catalin Marinas, James Morse, Stephen Rothwell,
	Linux Next Mailing List, Linux Kernel Mailing List

On Mon, 2020-10-05 at 17:43 +0100, Mark Rutland wrote:
> The current initialization of the per-cpu offset register is difficult
> to follow and this initialization is not always early enough for
> upcoming instrumentation with KCSAN, where the instrumentation callbacks
> use the per-cpu offset.
> 
> To make it possible to support KCSAN, and to simplify reasoning about
> early bringup code, let's initialize the per-cpu offset earlier, before
> we run any C code that may consume it. To do so, this patch adds a new
> init_this_cpu_offset() helper that's called before the usual
> primary/secondary start functions. For consistency, this is also used to
> re-initialize the per-cpu offset after the runtime per-cpu areas have
> been allocated (which can change CPU0's offset).
> 
> So that init_this_cpu_offset() isn't subject to any instrumentation that
> might consume the per-cpu offset, it is marked with noinstr, preventing
> instrumentation.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Will Deacon <will@kernel.org>

Reverting this commit on top of today's linux-next fixed an issue where
ThunderX2 is unable to boot:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...

It hangs here for more than 10 minutes even with "earlycon" before I gave up.
Reverting makes it boot again, with those lines following almost immediately.

[    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x431f0af1]
[    0.000000][    T0] Linux version 5.9.0-rc8-next-20201008+ (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8) #6 SMP Thu Oct 8 20:57:40 EDT 2020
[    0.000000][    T0] efi: EFI v2.70 by American Megatrends
[    0.000000][    T0] efi: ESRT=0xf9224418 SMBIOS=0xfcca0000 SMBIOS 3.0=0xfcc90000 ACPI 2.0=0xf9720000 MEMRESERVE=0xfc965918 
[    0.000000][    T0] esrt: Reserving ESRT space from 0x00000000f9224418 to 0x00000000f9224450.
[    0.000000][    T0] ACPI: Early table checksum verification disabled
[    0.000000][    T0] ACPI: RSDP 0x00000000F9720000 000024 (v02 HPE   )
[    0.000000][    T0] ACPI: XSDT 0x00000000F9720028 0000DC (v01 HPE    ServerCL 01072009 AMI  00010013)
[    0.000000][    T0] ACPI: FACP 0x00000000F9720108 000114 (v06 HPE    ServerCL 01072009 AMI  00010013)
[    0.000000][    T0] ACPI: DSDT 0x00000000F9720220 000714 (v02 HPE    ServerCL 20150406 INTL 20170831)
[    0.000000][    T0] ACPI: FIDT 0x00000000F9720938 00009C (v01 HPE    ServerCL 01072009 AMI  00010013)
...

# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              224
On-line CPU(s) list: 0-223
Thread(s) per core:  4
Core(s) per socket:  28
Socket(s):           2
NUMA node(s):        2
Vendor ID:           Cavium
Model:               1
Model name:          ThunderX2 99xx
Stepping:            0x1
BogoMIPS:            400.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            32768K
NUMA node0 CPU(s):   0-111
NUMA node1 CPU(s):   112-223
Flags:               fp asimd aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm

> ---
>  arch/arm64/include/asm/cpu.h |  2 ++
>  arch/arm64/kernel/head.S     |  3 +++
>  arch/arm64/kernel/setup.c    | 12 ++++++------
>  arch/arm64/kernel/smp.c      | 13 ++++++++-----
>  4 files changed, 19 insertions(+), 11 deletions(-)
> 
> Since v1[1]:
> 
> * Fix typos
> * Rebase atop v5.9-rc4
> 
> Mark.
> 
> [1] https://lore.kernel.org/r/20200730163806.23053-1-mark.rutland@arm.com
> 
> diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
> index 7faae6ff3ab4d..d9d60b18e8116 100644
> --- a/arch/arm64/include/asm/cpu.h
> +++ b/arch/arm64/include/asm/cpu.h
> @@ -68,4 +68,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info);
>  void update_cpu_features(int cpu, struct cpuinfo_arm64 *info,
>  				 struct cpuinfo_arm64 *boot);
>  
> +void init_this_cpu_offset(void);
> +
>  #endif /* __ASM_CPU_H */
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 037421c66b147..2720e6ec68140 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -452,6 +452,8 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>  	bl	__pi_memset
> 	dsb	ishst				// Make zero page visible to PTW
>  
> +	bl	init_this_cpu_offset
> +
>  #ifdef CONFIG_KASAN
>  	bl	kasan_early_init
>  #endif
> @@ -758,6 +760,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
>  	ptrauth_keys_init_cpu x2, x3, x4, x5
>  #endif
>  
> +	bl	init_this_cpu_offset
>  	b	secondary_start_kernel
>  SYM_FUNC_END(__secondary_switched)
>  
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 53acbeca4f574..fde4396418add 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -87,12 +87,6 @@ void __init smp_setup_processor_id(void)
>  	u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
>  	set_cpu_logical_map(0, mpidr);
>  
> -	/*
> -	 * clear __my_cpu_offset on boot CPU to avoid hang caused by
> -	 * using percpu variable early, for example, lockdep will
> -	 * access percpu variable inside lock_release
> -	 */
> -	set_my_cpu_offset(0);
>  	pr_info("Booting Linux on physical CPU 0x%010lx [0x%08x]\n",
>  		(unsigned long)mpidr, read_cpuid_id());
>  }
> @@ -281,6 +275,12 @@ u64 cpu_logical_map(int cpu)
>  	return __cpu_logical_map[cpu];
>  }
>  
> +void noinstr init_this_cpu_offset(void)
> +{
> +	unsigned int cpu = task_cpu(current);
> +	set_my_cpu_offset(per_cpu_offset(cpu));
> +}
> +
>  void __init __no_sanitize_address setup_arch(char **cmdline_p)
>  {
>  	init_mm.start_code = (unsigned long) _text;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 355ee9eed4dde..7714310fba226 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -192,10 +192,7 @@ asmlinkage notrace void secondary_start_kernel(void)
>  	u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
>  	struct mm_struct *mm = &init_mm;
>  	const struct cpu_operations *ops;
> -	unsigned int cpu;
> -
> -	cpu = task_cpu(current);
> -	set_my_cpu_offset(per_cpu_offset(cpu));
> +	unsigned int cpu = smp_processor_id();
>  
>  	/*
>  	 * All kernel threads share the same mm context; grab a
> @@ -435,7 +432,13 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  
>  void __init smp_prepare_boot_cpu(void)
>  {
> -	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
> +	/*
> +	 * Now that setup_per_cpu_areas() has allocated the runtime per-cpu
> +	 * areas it is only safe to read the CPU0 boot-time area, and we must
> +	 * reinitialize the offset to point to the runtime area.
> +	 */
> +	init_this_cpu_offset();
> +
>  	cpuinfo_store_boot_cpu();
>  
>  	/*


* Re: [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Will Deacon @ 2020-10-09  8:51 UTC
  To: Qian Cai
  Cc: Mark Rutland, linux-arm-kernel, Catalin Marinas, James Morse,
	Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List

On Thu, Oct 08, 2020 at 09:18:24PM -0400, Qian Cai wrote:
> On Mon, 2020-10-05 at 17:43 +0100, Mark Rutland wrote:
> > The current initialization of the per-cpu offset register is difficult
> > to follow and this initialization is not always early enough for
> > upcoming instrumentation with KCSAN, where the instrumentation callbacks
> > use the per-cpu offset.
> > 
> > To make it possible to support KCSAN, and to simplify reasoning about
> > early bringup code, let's initialize the per-cpu offset earlier, before
> > we run any C code that may consume it. To do so, this patch adds a new
> > init_this_cpu_offset() helper that's called before the usual
> > primary/secondary start functions. For consistency, this is also used to
> > re-initialize the per-cpu offset after the runtime per-cpu areas have
> > been allocated (which can change CPU0's offset).
> > 
> > So that init_this_cpu_offset() isn't subject to any instrumentation that
> > might consume the per-cpu offset, it is marked with noinstr, preventing
> > instrumentation.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> 
> Reverting this commit on top of today's linux-next fixed an issue where
> ThunderX2 is unable to boot:
> 
> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
> 
> EFI stub: Booting Linux Kernel...
> EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
> 
> It hangs here for more than 10 minutes even with "earlycon" before I gave up.
> Reverting makes it boot again, with those lines following almost immediately.
> 
> [    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x431f0af1]
> [    0.000000][    T0] Linux version 5.9.0-rc8-next-20201008+ (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8) #6 SMP Thu Oct 8 20:57:40 EDT 2020
> [    0.000000][    T0] efi: EFI v2.70 by American Megatrends
> [    0.000000][    T0] efi: ESRT=0xf9224418 SMBIOS=0xfcca0000 SMBIOS 3.0=0xfcc90000 ACPI 2.0=0xf9720000 MEMRESERVE=0xfc965918 
> [    0.000000][    T0] esrt: Reserving ESRT space from 0x00000000f9224418 to 0x00000000f9224450.
> [    0.000000][    T0] ACPI: Early table checksum verification disabled
> [    0.000000][    T0] ACPI: RSDP 0x00000000F9720000 000024 (v02 HPE   )
> [    0.000000][    T0] ACPI: XSDT 0x00000000F9720028 0000DC (v01 HPE    ServerCL 01072009 AMI  00010013)
> [    0.000000][    T0] ACPI: FACP 0x00000000F9720108 000114 (v06 HPE    ServerCL 01072009 AMI  00010013)
> [    0.000000][    T0] ACPI: DSDT 0x00000000F9720220 000714 (v02 HPE    ServerCL 20150406 INTL 20170831)
> [    0.000000][    T0] ACPI: FIDT 0x00000000F9720938 00009C (v01 HPE    ServerCL 01072009 AMI  00010013)

Interesting...

Could you provide a disassembly of init_this_cpu_offset() please?

I was hoping to send the 5.10 pull request today, so I'll probably have
to revert this change for now.

Will


* Re: [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Mark Rutland @ 2020-10-09  9:43 UTC
  To: Will Deacon
  Cc: Qian Cai, linux-arm-kernel, Catalin Marinas, James Morse,
	Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List

Hi Qian,

On Fri, Oct 09, 2020 at 09:51:15AM +0100, Will Deacon wrote:
> On Thu, Oct 08, 2020 at 09:18:24PM -0400, Qian Cai wrote:
> > On Mon, 2020-10-05 at 17:43 +0100, Mark Rutland wrote:
> > > The current initialization of the per-cpu offset register is difficult
> > > to follow and this initialization is not always early enough for
> > > upcoming instrumentation with KCSAN, where the instrumentation callbacks
> > > use the per-cpu offset.
> > > 
> > > To make it possible to support KCSAN, and to simplify reasoning about
> > > early bringup code, let's initialize the per-cpu offset earlier, before
> > > we run any C code that may consume it. To do so, this patch adds a new
> > > init_this_cpu_offset() helper that's called before the usual
> > > primary/secondary start functions. For consistency, this is also used to
> > > re-initialize the per-cpu offset after the runtime per-cpu areas have
> > > been allocated (which can change CPU0's offset).
> > > 
> > > So that init_this_cpu_offset() isn't subject to any instrumentation that
> > > might consume the per-cpu offset, it is marked with noinstr, preventing
> > > instrumentation.
> > > 
> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: James Morse <james.morse@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > 
> > Reverting this commit on top of today's linux-next fixed an issue where
> > ThunderX2 is unable to boot:
> > 
> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config

Sorry about this. :/

Will, to save you reading all the below, I think the right thing to do
for now is to revert this.

Building with that config, I see a boot time-hang on the primary CPU
under QEMU TCG, with or without VHE (in both cases, a stuck in a
recursive synchronous exception). With the patch reverted, the kernel
boots.

Looking at the assembly, task_cpu() gets instrumented (which puts this
patch on dodgy ground generally and I think warrants the revert), but as
it's instrumented with KASAN_INLINE that doesn't immediately explain the
issue since the shadow should be up and so we shouldn't call the report
function. I'll dig into this some more.

Assembly dumps below; for init_this_cpu_offset the reference to
page_wait_table+0x4d00 is generating the address for __per_cpu_offset,
but objdump doesn't have enough info to resolve that nicely.

| ffffa00011143bc0 <init_this_cpu_offset>:
| ffffa00011143bc0:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
| ffffa00011143bc4:       d5384100        mrs     x0, sp_el0
| ffffa00011143bc8:       910003fd        mov     x29, sp
| ffffa00011143bcc:       97bb29b9        bl      ffffa0001000e2b0 <task_cpu>
| ffffa00011143bd0:       f0004ea1        adrp    x1, ffffa00011b1a000 <page_wait_table+0x4d00>
| ffffa00011143bd4:       913c6021        add     x1, x1, #0xf18
| ffffa00011143bd8:       f8605820        ldr     x0, [x1, w0, uxtw #3]
| ffffa00011143bdc:       97bb29b1        bl      ffffa0001000e2a0 <set_my_cpu_offset>
| ffffa00011143be0:       a8c17bfd        ldp     x29, x30, [sp], #16
| ffffa00011143be4:       d65f03c0        ret

| ffffa0001000e2b0 <task_cpu>:
| ffffa0001000e2b0:       a9be7bfd        stp     x29, x30, [sp, #-32]!
| ffffa0001000e2b4:       d2d40001        mov     x1, #0xa00000000000             // #175921860444160
| ffffa0001000e2b8:       f2fbffe1        movk    x1, #0xdfff, lsl #48
| ffffa0001000e2bc:       910003fd        mov     x29, sp
| ffffa0001000e2c0:       f9000bf3        str     x19, [sp, #16]
| ffffa0001000e2c4:       aa0003f3        mov     x19, x0
| ffffa0001000e2c8:       91012000        add     x0, x0, #0x48
| ffffa0001000e2cc:       52800062        mov     w2, #0x3                        // #3
| ffffa0001000e2d0:       d343fc03        lsr     x3, x0, #3
| ffffa0001000e2d4:       38e16861        ldrsb   w1, [x3, x1]
| ffffa0001000e2d8:       7100003f        cmp     w1, #0x0
| ffffa0001000e2dc:       7a411041        ccmp    w2, w1, #0x1, ne  // ne = any
| ffffa0001000e2e0:       540000aa        b.ge    ffffa0001000e2f4 <task_cpu+0x44>  // b.tcont
| ffffa0001000e2e4:       b9404a60        ldr     w0, [x19, #72]
| ffffa0001000e2e8:       f9400bf3        ldr     x19, [sp, #16]
| ffffa0001000e2ec:       a8c27bfd        ldp     x29, x30, [sp], #32
| ffffa0001000e2f0:       d65f03c0        ret
| ffffa0001000e2f4:       9415af1f        bl      ffffa00010579f70 <__asan_report_load4_noabort>
| ffffa0001000e2f8:       17fffffb        b       ffffa0001000e2e4 <task_cpu+0x34>
| ffffa0001000e2fc:       d503201f        nop

| ffffa0001000e2a0 <set_my_cpu_offset>:
| ffffa0001000e2a0:       d518d080        msr     tpidr_el1, x0
| ffffa0001000e2a4:       d65f03c0        ret
| ffffa0001000e2a8:       d503201f        nop
| ffffa0001000e2ac:       d503201f        nop
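
To connect the dumps to the source: the 'ldr w0, [x19, #72]' is the
actual read of current->cpu, and everything from the 'mov x1' constant up
to the 'ccmp'/'b.ge' is KASAN's inline shadow check. In C the two helpers
look roughly like this (a simplified sketch, modulo READ_ONCE() and
ALTERNATIVE() details; 0x48 is just where this config happens to place
task_struct::cpu):

	/* include/linux/sched.h, with CONFIG_THREAD_INFO_IN_TASK */
	static inline unsigned int task_cpu(const struct task_struct *p)
	{
		return p->cpu;		/* the 4-byte load at p + 0x48 */
	}

	/* arch/arm64/include/asm/percpu.h; the real version is an
	 * ALTERNATIVE() that patches in tpidr_el2 for VHE */
	static inline void set_my_cpu_offset(unsigned long off)
	{
		asm volatile("msr tpidr_el1, %0" :: "r" (off));
	}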

Thanks,
Mark.


* Re: [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Mark Rutland @ 2020-10-09 10:24 UTC
  To: Will Deacon
  Cc: Qian Cai, linux-arm-kernel, Catalin Marinas, James Morse,
	Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List

On Fri, Oct 09, 2020 at 10:43:18AM +0100, Mark Rutland wrote:
> Hi Qian,
> 
> On Fri, Oct 09, 2020 at 09:51:15AM +0100, Will Deacon wrote:
> > On Thu, Oct 08, 2020 at 09:18:24PM -0400, Qian Cai wrote:
> > > On Mon, 2020-10-05 at 17:43 +0100, Mark Rutland wrote:
> > > > The current initialization of the per-cpu offset register is difficult
> > > > to follow and this initialization is not always early enough for
> > > > upcoming instrumentation with KCSAN, where the instrumentation callbacks
> > > > use the per-cpu offset.
> > > > 
> > > > To make it possible to support KCSAN, and to simplify reasoning about
> > > > early bringup code, let's initialize the per-cpu offset earlier, before
> > > > we run any C code that may consume it. To do so, this patch adds a new
> > > > init_this_cpu_offset() helper that's called before the usual
> > > > primary/secondary start functions. For consistency, this is also used to
> > > > re-initialize the per-cpu offset after the runtime per-cpu areas have
> > > > been allocated (which can change CPU0's offset).
> > > > 
> > > > So that init_this_cpu_offset() isn't subject to any instrumentation that
> > > > might consume the per-cpu offset, it is marked with noinstr, preventing
> > > > instrumentation.
> > > > 
> > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: James Morse <james.morse@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > 
> > > Reverting this commit on top of today's linux-next fixed an issue where
> > > ThunderX2 is unable to boot:
> > > 
> > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
> 
> Sorry about this. :/
> 
> Will, to save you reading all the below, I think the right thing to do
> for now is to revert this.


> Looking at the assembly, task_cpu() gets instrumented (which puts this
> patch on dodgy ground generally and I think warrants the revert), but as
> it's instrumented with KASAN_INLINE that doesn't immediately explain the
> issue since the shadow should be up and so we shouldn't call the report
> function. I'll dig into this some more.

Ok; that's my fault due to trying to do this before kasan_early_init.

I see what's going on now. If you're happy to take a fixup instead of a
revert, patch below. Otherwise I'll send a complete patch atop the revert
after rc1.
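
For reference, the check KASAN_INLINE emits ahead of the 4-byte load in
task_cpu() is roughly the following (a sketch; the macro names are the
real generic-KASAN ones, but the exact expression is approximate, and in
the dump from my last mail the (addr & 7) term has been constant-folded
away because task_struct::cpu is 8-byte aligned there):

	s8 shadow = *(s8 *)((addr >> KASAN_SHADOW_SCALE_SHIFT) +
			    KASAN_SHADOW_OFFSET);
	if (shadow != 0 && (s8)((addr & 7) + 3) >= shadow)
		__asan_report_load4_noabort(addr);	/* the bl in the dump */
	val = *(u32 *)addr;				/* the real load */

Before kasan_early_init() has set up the shadow page tables, the shadow
load itself presumably faults, and the fault path can recurse, which
would match the hang seen here.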

Thanks,
Mark.

---->8----
From e93fcb9649c9ccfbea9a6f17b68280420685ddc5 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Fri, 9 Oct 2020 11:06:32 +0100
Subject: [PATCH] arm64: fix per-cpu offset initialization

Qian sees a boot-time hang introduced by commit:

  353e228eb355be5a ("arm64: initialize per-cpu offsets earlier")

... which happens because task_cpu() can be instrumented by KASAN, and
we call init_this_cpu_offset() before we've performed the early KASAN
initialization.

We don't need to initialize the per-cpu offset before the early KASAN
initialization runs (and we didn't prior to the patch in question), so
we can avoid bothering with that.

However, were task_cpu() instrumented with something else, this could
cause similar issues, so let's also open-code that within
init_this_cpu_offset() to avoid that possibility.

It's also possible that set_my_cpu_offset() gets instrumented in
future, so let's avoid that by marking it __always_inline. It's only
used by init_this_cpu_offset(), so this doesn't matter for any other
code.

Finally, per_cpu_offset(x) is a macro expanding to __per_cpu_offset[x],
which is inlined and not instrumented.

Fixes: 353e228eb355be5a ("arm64: initialize per-cpu offsets earlier")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Qian Cai <cai@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/percpu.h | 2 +-
 arch/arm64/kernel/head.S        | 2 --
 arch/arm64/kernel/setup.c       | 2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 0b6409b89e5e0..0c347d3faf55c 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -11,7 +11,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/stack_pointer.h>
 
-static inline void set_my_cpu_offset(unsigned long off)
+static __always_inline void set_my_cpu_offset(unsigned long off)
 {
 	asm volatile(ALTERNATIVE("msr tpidr_el1, %0",
 				 "msr tpidr_el2, %0",
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index e28c9d4e5278c..9bbea14a9ca3f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -448,8 +448,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	bl	__pi_memset
 	dsb	ishst				// Make zero page visible to PTW
 
-	bl	init_this_cpu_offset
-
 #ifdef CONFIG_KASAN
 	bl	kasan_early_init
 #endif
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 005171972764b..161eaa83264ea 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -278,7 +278,7 @@ EXPORT_SYMBOL_GPL(cpu_logical_map);
 
 void noinstr init_this_cpu_offset(void)
 {
-	unsigned int cpu = task_cpu(current);
+	unsigned int cpu = current->cpu;
 	set_my_cpu_offset(per_cpu_offset(cpu));
 }
 
-- 
2.11.0



* Re: [PATCHv2] arm64: initialize per-cpu offsets earlier
From: Will Deacon @ 2020-10-09 10:46 UTC
  To: Mark Rutland
  Cc: Qian Cai, linux-arm-kernel, Catalin Marinas, James Morse,
	Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List

On Fri, Oct 09, 2020 at 11:24:38AM +0100, Mark Rutland wrote:
> On Fri, Oct 09, 2020 at 10:43:18AM +0100, Mark Rutland wrote:
> > Hi Qian,
> > 
> > On Fri, Oct 09, 2020 at 09:51:15AM +0100, Will Deacon wrote:
> > > On Thu, Oct 08, 2020 at 09:18:24PM -0400, Qian Cai wrote:
> > > > On Mon, 2020-10-05 at 17:43 +0100, Mark Rutland wrote:
> > > > > The current initialization of the per-cpu offset register is difficult
> > > > > to follow and this initialization is not always early enough for
> > > > > upcoming instrumentation with KCSAN, where the instrumentation callbacks
> > > > > use the per-cpu offset.
> > > > > 
> > > > > To make it possible to support KCSAN, and to simplify reasoning about
> > > > > early bringup code, let's initialize the per-cpu offset earlier, before
> > > > > we run any C code that may consume it. To do so, this patch adds a new
> > > > > init_this_cpu_offset() helper that's called before the usual
> > > > > primary/secondary start functions. For consistency, this is also used to
> > > > > re-initialize the per-cpu offset after the runtime per-cpu areas have
> > > > > been allocated (which can change CPU0's offset).
> > > > > 
> > > > > So that init_this_cpu_offset() isn't subject to any instrumentation that
> > > > > might consume the per-cpu offset, it is marked with noinstr, preventing
> > > > > instrumentation.
> > > > > 
> > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > > Cc: James Morse <james.morse@arm.com>
> > > > > Cc: Will Deacon <will@kernel.org>
> > > > 
> > > > Reverting this commit on top of today's linux-next fixed an issue where
> > > > ThunderX2 is unable to boot:
> > > > 
> > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
> > 
> > Sorry about this. :/
> > 
> > Will, to save you reading all the below, I think the right thing to do
> > for now is to revert this.
> 
> 
> > Looking at the assembly, task_cpu() gets instrumented (which puts this
> > patch on dodgy ground generally and I think warrants the revert), but as
> > it's instrumented with KASAN_INLINE that doesn't immediately explain the
> > issue since the shadow should be up and so we shouldn't call the report
> > function. I'll dig into this some more.
> 
> Ok; that's my fault due to trying to do this before kasan_early_init.
> 
> I see what's going on now. If you're happy to take a fixup instead of a
> revert, patch below. Otherwise I'll send a complete patch atop the revert
> after rc1.

For now, I've reverted the patch on for-next/core and redone the tag.

Will
