* [RFC PATCH] arch: ARM64: add isb before enable pan
@ 2021-10-08  6:07 ` Huangzhaoyang
  0 siblings, 0 replies; 20+ messages in thread
From: Huangzhaoyang @ 2021-10-08  6:07 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Suzuki K Poulose,
	Ionela Voinescu, Quentin Perret, Vladimir Murzin,
	linux-arm-kernel, Zhaoyang Huang, linux-kernel, ke.wang

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

A set_pstate_pan failure is occasionally observed on an ARM64 system
during a reboot test; it can be worked around by adding an msleep in
the sw context. We suspect that the preceding instruction, which
disables SW_PAN, is executed out of order, so add an isb here.

PS:
The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
which looks like a race between on-chip PAN and SW_PAN.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 arch/arm64/kernel/cpufeature.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index efed283..3c0de0d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1663,6 +1663,7 @@ static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
 	WARN_ON_ONCE(in_interrupt());
 
 	sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPAN, 0);
+	isb();
 	set_pstate_pan(1);
 }
 #endif /* CONFIG_ARM64_PAN */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  6:07 ` Huangzhaoyang
@ 2021-10-08  8:01   ` Will Deacon
  -1 siblings, 0 replies; 20+ messages in thread
From: Will Deacon @ 2021-10-08  8:01 UTC (permalink / raw)
  To: Huangzhaoyang
  Cc: Catalin Marinas, Mark Rutland, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, linux-kernel, ke.wang

Hi,

On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> A set_pstate_pan failure is occasionally observed on an ARM64 system
> during a reboot test; it can be worked around by adding an msleep in
> the sw context. We suspect that the preceding instruction, which
> disables SW_PAN, is executed out of order, so add an isb here.
> 
> PS:
> The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
> which looks like a race between on-chip PAN and SW_PAN.

Sorry, but I'm struggling to understand the problem here. Please could you
explain it in more detail?

  - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
  - Can you explain the race that you think might be occurring?
  - Why does an ISB prevent the race?

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  arch/arm64/kernel/cpufeature.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index efed283..3c0de0d 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1663,6 +1663,7 @@ static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
>  	WARN_ON_ONCE(in_interrupt());
>  
>  	sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPAN, 0);
> +	isb();
>  	set_pstate_pan(1);

SCTLR_EL1.SPAN only affects the PAN behaviour on taking an exception, which
is itself a context-synchronizing event, so I can't see why the ISB makes
any difference here (at least, for the purposes of PAN).
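
For readers following the point above, a toy model in C of how
SCTLR_EL1.SPAN acts may help. It is only a sketch, not kernel code: the
function name is hypothetical, and it assumes the Armv8.1-PAN behaviour
where SPAN (bit 23) is consulted only when an exception is taken to EL1.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCTLR_EL1.SPAN is bit 23 (Armv8.1-PAN). */
#define SCTLR_EL1_SPAN (UINT64_C(1) << 23)

/*
 * Toy model (hypothetical helper, not kernel code): the value of
 * PSTATE.PAN right after an exception is taken to EL1, given SCTLR_EL1
 * and the PAN value that was in effect before the exception.
 */
static bool pan_after_exception_to_el1(uint64_t sctlr_el1, bool pan_before)
{
	if (sctlr_el1 & SCTLR_EL1_SPAN)
		return pan_before;	/* SPAN == 1: PAN is left unchanged */
	return true;			/* SPAN == 0: PAN is set to 1 */
}
```

In this model SPAN has no effect outside the exception-entry path, which
is why an ISB placed between the SCTLR_EL1 write and set_pstate_pan(1)
would not be expected to change the PAN state seen by the following code.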

Thanks,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  8:01   ` Will Deacon
@ 2021-10-08  8:34     ` Zhaoyang Huang
  -1 siblings, 0 replies; 20+ messages in thread
From: Zhaoyang Huang @ 2021-10-08  8:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
>
> Hi,
>
> On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > A set_pstate_pan failure is occasionally observed on an ARM64 system
> > during a reboot test; it can be worked around by adding an msleep in
> > the sw context. We suspect that the preceding instruction, which
> > disables SW_PAN, is executed out of order, so add an isb here.
> >
> > PS:
> > The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
> > which looks like a race between on-chip PAN and SW_PAN.
>
> Sorry, but I'm struggling to understand the problem here. Please could you
> explain it in more detail?
>
>   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
>   - Can you explain the race that you think might be occurring?
>   - Why does an ISB prevent the race?
Please find the panic log [1], the related code [2], and a sample of
the debug patch [3] below. TTBR1_EL1 equals 0x34000000 at the time of
the panic, but the debug patch did NOT catch it during the retest,
although it should work (every path that writes ttbr1_el1 via msr is
under watch). The ISB is added here to prevent a race on TTBR1 with a
previous sysreg access that could affect the msr result (the test is
still ongoing). Could the race be between ARM64_HAS_PAN (automated by
the core) and SW_PAN?

[1]
[    0.348000]  [0:    migration/0:   11] Synchronous External Abort: level 1 (translation table walk) (0x96000055) at 0xffffffc000e06004
[    0.352000]  [0:    migration/0:   11] Internal error: : 96000055 [#1] PREEMPT SMP
[    0.352000]  [0:    migration/0:   11] Modules linked in:
[    0.352000]  [0:    migration/0:   11] Process migration/0 (pid: 11, stack limit = 0x        (ptrval))
[    0.352000]  [0:    migration/0:   11] CPU: 0 PID: 11 Comm: migration/0 Tainted: G S 4.14.199-22631304-abA035FXXU0AUJ4_T4 #2
[    0.352000]  [0:    migration/0:   11] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
[    0.352000]  [0:    migration/0:   11] task:         (ptrval) task.stack:         (ptrval)
[    0.352000]  [0:    migration/0:   11] pc : patch_alternative+0x68/0x27c
[    0.352000]  [0:    migration/0:   11] lr : __apply_alternatives.llvm.7450387295891320208+0x60/0x160

[2]
__apply_alternatives
   for()
       patch_alternative    <---- panic here in the 2nd round of the loop, after invoking flush_icache_range
       flush_icache_range

[3]
sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE
+ tst     \tmp1, #0xffff80000000 // check ttbr1_el1 valid
+    b.le    .
msr ttbr1_el1, \tmp1 // set reserved ASID
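
As a reading aid, the check in [3] can be restated as a C predicate. This
is a sketch under stated assumptions: the helper name and the macro are
hypothetical, and the flag reasoning assumes the usual A64 semantics of
tst (an ANDS whose result is discarded).

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask used by the tst instruction in [3]: bits 47..31. */
#define TTBR1_CHECK_MASK UINT64_C(0xffff80000000)

/*
 * Hypothetical helper: true when the debug check would spin at "b.le .".
 * tst clears V and, because bit 63 of the mask is 0, N is 0 as well, so
 * the "le" condition (Z == 1 || N != V) reduces to Z == 1, i.e. none of
 * the masked bits are set in the value about to be written.
 */
static bool debug_check_would_spin(uint64_t ttbr1_candidate)
{
	return (ttbr1_candidate & TTBR1_CHECK_MASK) == 0;
}
```

Under this reading, a value such as 0x34000000 passed through a watched
msr site would make the CPU spin there, which matches the expectation
that the debug patch "should work" if the bad value came from software.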

>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >  arch/arm64/kernel/cpufeature.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index efed283..3c0de0d 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1663,6 +1663,7 @@ static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> >       WARN_ON_ONCE(in_interrupt());
> >
> >       sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPAN, 0);
> > +     isb();
> >       set_pstate_pan(1);
>
> SCTLR_EL1.SPAN only affects the PAN behaviour on taking an exception, which
> is itself a context-synchronizing event, so I can't see why the ISB makes
> any difference here (at least, for the purposes of PAN).
>
> Thanks,
>
> Will

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  8:34     ` Zhaoyang Huang
@ 2021-10-08  8:45       ` Catalin Marinas
  -1 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2021-10-08  8:45 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Will Deacon, Mark Rutland, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > A set_pstate_pan failure is occasionally observed on an ARM64 system
> > > during a reboot test; it can be worked around by adding an msleep in
> > > the sw context. We suspect that the preceding instruction, which
> > > disables SW_PAN, is executed out of order, so add an isb here.
> > >
> > > PS:
> > > The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
> > > which looks like a race between on-chip PAN and SW_PAN.
> >
> > Sorry, but I'm struggling to understand the problem here. Please could you
> > explain it in more detail?
> >
> >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> >   - Can you explain the race that you think might be occurring?
> >   - Why does an ISB prevent the race?
> Please find the panic log [1], the related code [2], and a sample of
> the debug patch [3] below. TTBR1_EL1 equals 0x34000000 at the time of
> the panic, but the debug patch did NOT catch it during the retest,
> although it should work (every path that writes ttbr1_el1 via msr is
> under watch). The ISB is added here to prevent a race on TTBR1 with a
> previous sysreg access that could affect the msr result (the test is
> still ongoing). Could the race be between ARM64_HAS_PAN (automated by
> the core) and SW_PAN?

Can you please change the ARM64_HAS_PAN type to
ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE? I wonder whether
system_uses_ttbr0_pan() changes its output when all CPUs had been
brought up and system_uses_hw_pan() returns true.
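
The hypothesis above can be illustrated with a small user-space model.
It is a sketch only: every name prefixed with model_ is hypothetical, and
it assumes that system_uses_ttbr0_pan() is true only when
CONFIG_ARM64_SW_TTBR0_PAN is enabled and the hardware PAN capability has
not been detected, which is how the helper is recalled to be defined.

```c
#include <stdbool.h>

/* Stand-ins for kernel state; both variables are hypothetical. */
static bool model_sw_ttbr0_pan_configured = true; /* CONFIG_ARM64_SW_TTBR0_PAN=y */
static bool model_hw_pan_cap_set;                 /* ARM64_HAS_PAN capability */

/* Model of system_uses_hw_pan(): true once the capability is set. */
static bool model_system_uses_hw_pan(void)
{
	return model_hw_pan_cap_set;
}

/* Model of system_uses_ttbr0_pan(): SW PAN is used only without HW PAN. */
static bool model_system_uses_ttbr0_pan(void)
{
	return model_sw_ttbr0_pan_configured && !model_system_uses_hw_pan();
}
```

In this model the answer flips from true to false at the moment the
capability is set, so code that sampled it earlier would have taken the
SW PAN path. Making ARM64_HAS_PAN a strict boot CPU feature is meant to
settle the capability earlier and shrink that window.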

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  8:45       ` Catalin Marinas
@ 2021-10-08  8:55         ` Zhaoyang Huang
  -1 siblings, 0 replies; 20+ messages in thread
From: Zhaoyang Huang @ 2021-10-08  8:55 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Mark Rutland, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Fri, Oct 8, 2021 at 4:45 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> > On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > >
> > > > A set_pstate_pan failure is occasionally observed on an ARM64 system
> > > > during a reboot test; it can be worked around by adding an msleep in
> > > > the sw context. We suspect that the preceding instruction, which
> > > > disables SW_PAN, is executed out of order, so add an isb here.
> > > >
> > > > PS:
> > > > The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
> > > > which looks like a race between on-chip PAN and SW_PAN.
> > >
> > > Sorry, but I'm struggling to understand the problem here. Please could you
> > > explain it in more detail?
> > >
> > >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> > >   - Can you explain the race that you think might be occurring?
> > >   - Why does an ISB prevent the race?
> > Please find the panic log [1], the related code [2], and a sample of
> > the debug patch [3] below. TTBR1_EL1 equals 0x34000000 at the time of
> > the panic, but the debug patch did NOT catch it during the retest,
> > although it should work (every path that writes ttbr1_el1 via msr is
> > under watch). The ISB is added here to prevent a race on TTBR1 with a
> > previous sysreg access that could affect the msr result (the test is
> > still ongoing). Could the race be between ARM64_HAS_PAN (automated by
> > the core) and SW_PAN?
>
> Can you please change the ARM64_HAS_PAN type to
> ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE? I wonder whether
> system_uses_ttbr0_pan() changes its output when all CPUs had been
> brought up and system_uses_hw_pan() returns true.
OK, thanks. We will try it. Is it a workaround for a known defect?
>
> --
> Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  8:55         ` Zhaoyang Huang
@ 2021-10-08  9:07           ` Catalin Marinas
  -1 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2021-10-08  9:07 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Will Deacon, Mark Rutland, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Fri, Oct 08, 2021 at 04:55:05PM +0800, Zhaoyang Huang wrote:
> On Fri, Oct 8, 2021 at 4:45 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> > > On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > > > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > >
> > > > > A set_pstate_pan failure is occasionally observed on an ARM64 system
> > > > > during a reboot test; it can be worked around by adding an msleep in
> > > > > the sw context. We suspect that the preceding instruction, which
> > > > > disables SW_PAN, is executed out of order, so add an isb here.
> > > > >
> > > > > PS:
> > > > > The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
> > > > > which looks like a race between on-chip PAN and SW_PAN.
> > > >
> > > > Sorry, but I'm struggling to understand the problem here. Please could you
> > > > explain it in more detail?
> > > >
> > > >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> > > >   - Can you explain the race that you think might be occurring?
> > > >   - Why does an ISB prevent the race?
> > > Please find the panic log [1], the related code [2], and a sample of
> > > the debug patch [3] below. TTBR1_EL1 equals 0x34000000 at the time of
> > > the panic, but the debug patch did NOT catch it during the retest,
> > > although it should work (every path that writes ttbr1_el1 via msr is
> > > under watch). The ISB is added here to prevent a race on TTBR1 with a
> > > previous sysreg access that could affect the msr result (the test is
> > > still ongoing). Could the race be between ARM64_HAS_PAN (automated by
> > > the core) and SW_PAN?
> >
> > Can you please change the ARM64_HAS_PAN type to
> > ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE? I wonder whether
> > system_uses_ttbr0_pan() changes its output when all CPUs had been
> > brought up and system_uses_hw_pan() returns true.
> 
> OK, thanks. We will try it. Is it a workaround for a known defect?

No, other than the potential kernel bug you reported.

-- 
Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  9:07           ` Catalin Marinas
@ 2021-10-11  2:49             ` Zhaoyang Huang
  -1 siblings, 0 replies; 20+ messages in thread
From: Zhaoyang Huang @ 2021-10-11  2:49 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Mark Rutland, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Fri, Oct 8, 2021 at 5:07 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Fri, Oct 08, 2021 at 04:55:05PM +0800, Zhaoyang Huang wrote:
> > On Fri, Oct 8, 2021 at 4:45 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> > > > On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > > > > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > >
> > > > > > A set_pstate_pan failure is occasionally observed on an ARM64 system
> > > > > > during a reboot test; it can be worked around by adding an msleep in
> > > > > > the sw context. We suspect that the preceding instruction, which
> > > > > > disables SW_PAN, is executed out of order, so add an isb here.
> > > > > >
> > > > > > PS:
> > > > > > The bootup test failed with an invalid TTBR1_EL1 value of 0x34000000,
> > > > > > which looks like a race between on-chip PAN and SW_PAN.
> > > > >
> > > > > Sorry, but I'm struggling to understand the problem here. Please could you
> > > > > explain it in more detail?
> > > > >
> > > > >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> > > > >   - Can you explain the race that you think might be occurring?
> > > > >   - Why does an ISB prevent the race?
> > > > Please find the panic log [1], the related code [2], and a sample of
> > > > the debug patch [3] below. TTBR1_EL1 equals 0x34000000 at the time of
> > > > the panic, but the debug patch did NOT catch it during the retest,
> > > > although it should work (every path that writes ttbr1_el1 via msr is
> > > > under watch). The ISB is added here to prevent a race on TTBR1 with a
> > > > previous sysreg access that could affect the msr result (the test is
> > > > still ongoing). Could the race be between ARM64_HAS_PAN (automated by
> > > > the core) and SW_PAN?
> > >
> > > Can you please change the ARM64_HAS_PAN type to
> > > ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE? I wonder whether
> > > system_uses_ttbr0_pan() changes its output when all CPUs had been
> > > brought up and system_uses_hw_pan() returns true.
> >
> > ok, thanks. We will try. Is it a workaround for known defect?
>
> No, other than the potential kernel bug you reported.
Changing the type to ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE doesn't fix
this issue.
>
> --
> Catalin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-08  8:34     ` Zhaoyang Huang
@ 2021-10-11  9:38       ` Mark Rutland
  -1 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2021-10-11  9:38 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Will Deacon, Catalin Marinas, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

Hi,

On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > set_pstate_pan failure is observed in an ARM64 system occasionally on a reboot
> > > test, which can be worked around by a msleep on the sw context. We assume
> > > suspicious on disorder of previous instr of disabling SW_PAN and add an isb here.
> > >
> > > PS:
> > > The bootup test failed with an invalid TTBR1_EL1 that equals 0x34000000, which is
> > > alike racing between on chip PAN and SW_PAN.
> >
> > Sorry, but I'm struggling to understand the problem here. Please could you
> > explain it in more detail?
> >
> >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> >   - Can you explain the race that you think might be occurring?
> >   - Why does an ISB prevent the race?
> Please find panic logs[1], related codes[2], sample of debug patch[3]
> below. TTBR1_EL1 equals 0x34000000 when panic

Just to check, how do you know the value of TTBR1_EL1 was 0x34000000?
That isn't in the log sample below -- was that from the output of
show_pte(), an external debugger, or something else?

I'm assuming from the "(ptrval)" bits below that can't have been from
show_pte().

> and can NOT be captured
> by the debug patch during retest (all entrances that msr ttbr1_el1 are
> under watch) which should work. Adding ISB here to prevent race on
> TTBR1 from previous access of sysregs which can affect the msr
> result(the test is still ongoing). Could the race be
> ARM64_HAS_PAN(automated by core) and SW_PAN.
> 
> [1]
> [    0.348000]  [0:    migration/0:   11] Synchronous External Abort:
> level 1 (translation table walk) (0x96000055) at 0xffffffc000e06004
> [    0.352000]  [0:    migration/0:   11] Internal error: : 96000055
> [#1] PREEMPT SMP
> [    0.352000]  [0:    migration/0:   11] Modules linked in:
> [    0.352000]  [0:    migration/0:   11] Process migration/0 (pid:
> 11, stack limit = 0x        (ptrval))
> [    0.352000]  [0:    migration/0:   11] CPU: 0 PID: 11 Comm:
> migration/0 Tainted: G S

Assuming I've read the `taint_flags` table correctly, that 'S' is
`TAINT_CPU_OUT_OF_SPEC`, for which we should dump warnings at boot
time. The 'G' indicates the absence of proprietary modules.

Can you provide a full dmesg for a failed boot, please?

Have you made any changes to arch/arm64/kernel/cpufeature.c?

Are you able to test with a mainline kernel?

> 4.14.199-22631304-abA035FXXU0AUJ4_T4 #2
>
> [    0.352000]  [0:    migration/0:   11] Hardware name: Spreadtrum
> UMS9230 1H10 SoC (DT)
> [    0.352000]  [0:    migration/0:   11] task:         (ptrval)
> task.stack:         (ptrval)
> [    0.352000]  [0:    migration/0:   11] pc : patch_alternative+0x68/0x27c
> [    0.352000]  [0:    migration/0:   11] lr :
> __apply_alternatives.llvm.7450387295891320208+0x60/0x160
> 
> [2]
> __apply_alternatives
>    for()
>        patch_alternative    <----panic here in the 2nd round of loop
> after invoking flush_icache_range
>        flush_icache_range
> 
> [3]
> sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE
> + tst     \tmp1, #0xffff80000000 // check ttbr1_el1 valid
> +    b.le    .

What are you trying to detect here? This is testing both the ASID
and BADDR[47] bits, so I don't understand the rationale.

Thanks,
Mark.

> msr ttbr1_el1, \tmp1 // set reserved ASID
> 
> >
> > > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > ---
> > >  arch/arm64/kernel/cpufeature.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > index efed283..3c0de0d 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -1663,6 +1663,7 @@ static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> > >       WARN_ON_ONCE(in_interrupt());
> > >
> > >       sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPAN, 0);
> > > +     isb();
> > >       set_pstate_pan(1);
> >
> > SCTLR_EL1.SPAN only affects the PAN behaviour on taking an exception, which
> > is itself a context-synchronizing event, so I can't see why the ISB makes
> > any difference here (at least, for the purposes of PAN).
> >
> > Thanks,
> >
> > Will
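
A note on the `tst` mask queried in the message above. Read literally,
the 12-hex-digit constant `0xffff80000000` in debug patch [3] covers
bits [47:31], whereas a mask covering the ASID field (TTBR1_EL1[63:48])
plus bit 47 would be `0xffff800000000000`. The thread does not say which
was intended. The sketch below is a user-space C model, not kernel code:
the helper names are made up for illustration, and `would_spin()` assumes
the A64 behaviour that TST sets N and Z and clears V, so that `b.le` is
taken when Z or N is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the lowest set bit of a non-zero mask. */
static int mask_low_bit(uint64_t m)
{
	int bit = 0;

	assert(m != 0);
	while (!(m & 1)) {
		m >>= 1;
		bit++;
	}
	return bit;
}

/* Index of the highest set bit of a non-zero mask. */
static int mask_high_bit(uint64_t m)
{
	int bit = 0;

	assert(m != 0);
	while (m >>= 1)
		bit++;
	return bit;
}

/*
 * Model of "tst value, #mask; b.le .": returns true if the branch-to-self
 * would be taken. With V cleared by TST, LE reduces to Z || N.
 */
static bool would_spin(uint64_t value, uint64_t mask)
{
	uint64_t r = value & mask;
	bool z = (r == 0);
	bool n = (r >> 63) != 0;

	return z || n;
}
```

With these definitions, `would_spin(0x34000000ULL, 0xffff80000000ULL)` is
true, so under the literal reading the debug loop would hang on the value
reported in the panic, while a value with any of bits [47:31] set (for
example `0x80000000`) would fall through to the `msr`.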

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-11  9:38       ` Mark Rutland
@ 2021-10-11 11:08         ` Zhaoyang Huang
  -1 siblings, 0 replies; 20+ messages in thread
From: Zhaoyang Huang @ 2021-10-11 11:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Catalin Marinas, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Mon, Oct 11, 2021 at 5:38 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi,
>
> On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> > On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > >
> > > > set_pstate_pan failure is observed in an ARM64 system occasionally on a reboot
> > > > test, which can be worked around by a msleep on the sw context. We assume
> > > > suspicious on disorder of previous instr of disabling SW_PAN and add an isb here.
> > > >
> > > > PS:
> > > > The bootup test failed with an invalid TTBR1_EL1 that equals 0x34000000, which is
> > > > alike racing between on chip PAN and SW_PAN.
> > >
> > > Sorry, but I'm struggling to understand the problem here. Please could you
> > > explain it in more detail?
> > >
> > >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> > >   - Can you explain the race that you think might be occurring?
> > >   - Why does an ISB prevent the race?
> > Please find panic logs[1], related codes[2], sample of debug patch[3]
> > below. TTBR1_EL1 equals 0x34000000 when panic
>
> Just to check, how do you know the value of TTBR1_EL1 was 0x34000000?
> That isn't in the log sample below -- was that from the output of
> show_pte(), an external debugger, or something else?
>
> I'm assuming from the "(ptrval)" bits below that can't have been from
> show_pte().
>
> > and can NOT be captured
> > by the debug patch during retest (all entrances that msr ttbr1_el1 are
> > under watch) which should work. Adding ISB here to prevent race on
> > TTBR1 from previous access of sysregs which can affect the msr
> > result(the test is still ongoing). Could the race be
> > ARM64_HAS_PAN(automated by core) and SW_PAN.
> >
> > [1]
> > [    0.348000]  [0:    migration/0:   11] Synchronous External Abort:
> > level 1 (translation table walk) (0x96000055) at 0xffffffc000e06004
> > [    0.352000]  [0:    migration/0:   11] Internal error: : 96000055
> > [#1] PREEMPT SMP
> > [    0.352000]  [0:    migration/0:   11] Modules linked in:
> > [    0.352000]  [0:    migration/0:   11] Process migration/0 (pid:
> > 11, stack limit = 0x        (ptrval))
> > [    0.352000]  [0:    migration/0:   11] CPU: 0 PID: 11 Comm:
> > migration/0 Tainted: G S
>
> Assuming I've read the `taint_flags` table correctly, that 'S' is
> `TAINT_CPU_OUT_OF_SPEC`, for which we should dump warnings at boot
> time. The 'G' indicates the absence of proprietary modules.
>
> Can you provide a full dmesg for a failed boot, please?
>
> Have you made any changes to arch/arm64/kernel/cpufeature.c?
>
> Are you able to test with a mainline kernel?
>
> > 4.14.199-22631304-abA035FXXU0AUJ4_T4 #2
> >
> > [    0.352000]  [0:    migration/0:   11] Hardware name: Spreadtrum
> > UMS9230 1H10 SoC (DT)
> > [    0.352000]  [0:    migration/0:   11] task:         (ptrval)
> > task.stack:         (ptrval)
> > [    0.352000]  [0:    migration/0:   11] pc : patch_alternative+0x68/0x27c
> > [    0.352000]  [0:    migration/0:   11] lr :
> > __apply_alternatives.llvm.7450387295891320208+0x60/0x160
> >
> > [2]
> > __apply_alternatives
> >    for()
> >        patch_alternative    <----panic here in the 2nd round of loop
> > after invoking flush_icache_range
> >        flush_icache_range
> >
> > [3]
> > sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE
> > + tst     \tmp1, #0xffff80000000 // check ttbr1_el1 valid
> > +    b.le    .
>
> What are you trying to detect here? This is testing both the ASID
> and BADDR[47] bits, so I don't understand the rationale.
>
> Thanks,
> Mark.
This issue is fixed by the patch 'arm64: Avoid flush_icache_range() in
alternatives patching code' (429388682dc266e7a693f9c27e3aabd341d55343).
Thanks
>
> > msr ttbr1_el1, \tmp1 // set reserved ASID
> >
> > >
> > > > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > ---
> > > >  arch/arm64/kernel/cpufeature.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > index efed283..3c0de0d 100644
> > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > @@ -1663,6 +1663,7 @@ static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> > > >       WARN_ON_ONCE(in_interrupt());
> > > >
> > > >       sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPAN, 0);
> > > > +     isb();
> > > >       set_pstate_pan(1);
> > >
> > > SCTLR_EL1.SPAN only affects the PAN behaviour on taking an exception, which
> > > is itself a context-synchronizing event, so I can't see why the ISB makes
> > > any difference here (at least, for the purposes of PAN).
> > >
> > > Thanks,
> > >
> > > Will

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
  2021-10-11 11:08         ` Zhaoyang Huang
@ 2021-10-11 12:15           ` Mark Rutland
  -1 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2021-10-11 12:15 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Will Deacon, Catalin Marinas, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Mon, Oct 11, 2021 at 07:08:00PM +0800, Zhaoyang Huang wrote:
> On Mon, Oct 11, 2021 at 5:38 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> > > On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > > > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > >
> > > > > set_pstate_pan failure is observed in an ARM64 system occasionally on a reboot
> > > > > test, which can be worked around by a msleep on the sw context. We assume
> > > > > suspicious on disorder of previous instr of disabling SW_PAN and add an isb here.
> > > > >
> > > > > PS:
> > > > > The bootup test failed with an invalid TTBR1_EL1 that equals 0x34000000, which is
> > > > > alike racing between on chip PAN and SW_PAN.
> > > >
> > > > Sorry, but I'm struggling to understand the problem here. Please could you
> > > > explain it in more detail?
> > > >
> > > >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> > > >   - Can you explain the race that you think might be occurring?
> > > >   - Why does an ISB prevent the race?
> > > Please find panic logs[1], related codes[2], sample of debug patch[3]
> > > below. TTBR1_EL1 equals 0x34000000 when panic
> >
> > Just to check, how do you know the value of TTBR1_EL1 was 0x34000000?
> > That isn't in the log sample below -- was that from the output of
> > show_pte(), an external debugger, or something else?
> >
> > I'm assuming from the "(ptrval)" bits below that can't have been from
> > show_pte().
> >
> > > and can NOT be captured
> > > by the debug patch during retest (all entrances that msr ttbr1_el1 are
> > > under watch) which should work. Adding ISB here to prevent race on
> > > TTBR1 from previous access of sysregs which can affect the msr
> > > result(the test is still ongoing). Could the race be
> > > ARM64_HAS_PAN(automated by core) and SW_PAN.
> > >
> > > [1]
> > > [    0.348000]  [0:    migration/0:   11] Synchronous External Abort:
> > > level 1 (translation table walk) (0x96000055) at 0xffffffc000e06004
> > > [    0.352000]  [0:    migration/0:   11] Internal error: : 96000055
> > > [#1] PREEMPT SMP
> > > [    0.352000]  [0:    migration/0:   11] Modules linked in:
> > > [    0.352000]  [0:    migration/0:   11] Process migration/0 (pid:
> > > 11, stack limit = 0x        (ptrval))
> > > [    0.352000]  [0:    migration/0:   11] CPU: 0 PID: 11 Comm:
> > > migration/0 Tainted: G S
> >
> > Assuming I've read the `taint_flags` table correctly, that 'S' is
> > `TAINT_CPU_OUT_OF_SPEC`, for which we should dump warnings at boot
> > time. The 'G' indicates the absence of proprietary modules.
> >
> > Can you provide a full dmesg for a failed boot, please?
> >
> > Have you made any changes to arch/arm64/kernel/cpufeature.c?
> >
> > Are you able to test with a mainline kernel?
> >
> > > 4.14.199-22631304-abA035FXXU0AUJ4_T4 #2
> > >
> > > [    0.352000]  [0:    migration/0:   11] Hardware name: Spreadtrum
> > > UMS9230 1H10 SoC (DT)
> > > [    0.352000]  [0:    migration/0:   11] task:         (ptrval)
> > > task.stack:         (ptrval)
> > > [    0.352000]  [0:    migration/0:   11] pc : patch_alternative+0x68/0x27c
> > > [    0.352000]  [0:    migration/0:   11] lr :
> > > __apply_alternatives.llvm.7450387295891320208+0x60/0x160
> > >
> > > [2]
> > > __apply_alternatives
> > >    for()
> > >        patch_alternative    <----panic here in the 2nd round of loop
> > > after invoking flush_icache_range
> > >        flush_icache_range
> > >
> > > [3]
> > > sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE
> > > + tst     \tmp1, #0xffff80000000 // check ttbr1_el1 valid
> > > +    b.le    .
> >
> > What are you trying to detect here? This is testing both the ASID
> > and BADDR[47] bits, so I don't understand the rationale.
> >
> > Thanks,
> > Mark.
> This issue is fixed by the patch 'arm64: Avoid flush_icache_range() in
> alternatives patching code' (429388682dc266e7a693f9c27e3aabd341d55343).
> Thanks

Ah, thanks for this.

So this is because in arch/arm64/mm/cache.S we do (abbreviated):

| ENTRY(flush_icache_range)
| 	/* FALLTHROUGH */
| ENTRY(__flush_cache_user_range)
| 	uaccess_ttbr0_enable x2, x3, x4
| 
| 	...
| 
| 	uaccess_ttbr0_disable x1, x2
| 	ret
| ENDPROC(flush_icache_range)

... and even if I-caches are coherent and we don't execute junk for
something we've patched but not invalidated, since we patch each
alternative site in turn we can end up with unbalanced calls to
enable/disable across a number of patches.

It looks like 429388682dc266e7a693f9c27e3aabd341d55343 isn't in the
current linux-4.14.y, so we should post a backported version. From local
testing, that applies trivially.

Who wants to post that to stable? I can do if people want.

Thanks,
Mark.
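
The imbalance described in the message above can be illustrated with a
toy user-space model in C. It is not kernel code: `toy_flush()`,
`toy_patch_all()` and the two-site layout are hypothetical stand-ins. The
only assumption taken from the thread is that the enable and disable
sequences inside the flush routine are separate alternative sites, patched
one at a time, with the flush routine called after each patch. Which
state the sites start in is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

enum { SITE_ENABLE, SITE_DISABLE, NUM_SITES };

struct toy_state {
	bool live[NUM_SITES];	/* true: site runs its sequence; false: NOPs */
	int enables;		/* times the enable sequence ran */
	int disables;		/* times the disable sequence ran */
};

/* Toy flush routine: executes whichever of its two sites are live. */
static void toy_flush(struct toy_state *s)
{
	if (s->live[SITE_ENABLE])
		s->enables++;
	/* the cache maintenance itself is omitted */
	if (s->live[SITE_DISABLE])
		s->disables++;
}

/*
 * Flip each site in the given order, calling the flush routine after
 * every flip, and return enables minus disables (0 means balanced).
 */
static int toy_patch_all(struct toy_state *s, const int *order, int n)
{
	for (int i = 0; i < n; i++) {
		assert(order[i] >= 0 && order[i] < NUM_SITES);
		s->live[order[i]] = !s->live[order[i]];
		toy_flush(s);
	}
	return s->enables - s->disables;
}
```

Starting with both sites live and patching the enable site first leaves
one disable with no matching enable (the function returns -1); starting
from NOPs with the same order returns +1. Either way the counts stop
matching partway through patching, which is the hazard that avoiding
flush_icache_range() in the patching code removes.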

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH] arch: ARM64: add isb before enable pan
@ 2021-10-11 12:15           ` Mark Rutland
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Rutland @ 2021-10-11 12:15 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Will Deacon, Catalin Marinas, Suzuki K Poulose, Ionela Voinescu,
	Quentin Perret, Vladimir Murzin, linux-arm-kernel,
	Zhaoyang Huang, LKML, Ke Wang, ping.zhou1

On Mon, Oct 11, 2021 at 07:08:00PM +0800, Zhaoyang Huang wrote:
> On Mon, Oct 11, 2021 at 5:38 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > On Fri, Oct 08, 2021 at 04:34:12PM +0800, Zhaoyang Huang wrote:
> > > On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@kernel.org> wrote:
> > > > On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > >
> > > > > set_pstate_pan failure is observed in an ARM64 system occasionaly on a reboot
> > > > > test, which can be work around by a msleep on the sw context. We assume
> > > > > suspicious on disorder of previous instr of disabling SW_PAN and add an isb here.
> > > > >
> > > > > PS:
> > > > > The bootup test failed with a invalid TTBR1_EL1 that equals 0x34000000, which is
> > > > > alike racing between on chip PAN and SW_PAN.
> > > >
> > > > Sorry, but I'm struggling to understand the problem here. Please could you
> > > > explain it in more detail?
> > > >
> > > >   - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> > > >   - Can you explain the race that you think might be occurring?
> > > >   - Why does an ISB prevent the race?
> > > Please find panic logs[1], related codes[2], sample of debug patch[3]
> > > below. TTBR1_EL1 equals 0x34000000 when panic
> >
> > Just to check, how do you know the value of TTBR1_EL1 was 0x34000000?
> > That isn't in the log sample below -- was that from the output of
> > show_pte(), an external debugger, or something else?
> >
> > I'm assuming from the "(ptrval)" bits below that can't have been from
> > show_pte().
> >
> > > and can NOT be captured
> > > by the debug patch during retest (all entrances that msr ttbr1_el1 are
> > > under watch) which should work. Adding ISB here to prevent race on
> > > TTBR1 from previous access of sysregs which can affect the msr
> > > result(the test is still ongoing). Could the race be
> > > ARM64_HAS_PAN(automated by core) and SW_PAN.
> > >
> > > [1]
> > > [    0.348000]  [0:    migration/0:   11] Synchronous External Abort:
> > > level 1 (translation table walk) (0x96000055) at 0xffffffc000e06004
> > > [    0.352000]  [0:    migration/0:   11] Internal error: : 96000055
> > > [#1] PREEMPT SMP
> > > [    0.352000]  [0:    migration/0:   11] Modules linked in:
> > > [    0.352000]  [0:    migration/0:   11] Process migration/0 (pid:
> > > 11, stack limit = 0x        (ptrval))
> > > [    0.352000]  [0:    migration/0:   11] CPU: 0 PID: 11 Comm:
> > > migration/0 Tainted: G S
> >
> > Assuming I've read the `taint_flags` table correctly, that 'S' is
> > `TAINT_CPU_OUT_OF_SPEC`, for which we should dump warnings at boot
> > time. The 'G' indicates the absence of proprietary modules.
> >
> > Can you provide a full dmesg for a failed boot, please?
> >
> > Have you made any changes to arch/arm64/kernel/cpufeature.c?
> >
> > Are you able to test with a mainline kernel?
> >
> > > 4.14.199-22631304-abA035FXXU0AUJ4_T4 #2
> > >
> > > [    0.352000]  [0:    migration/0:   11] Hardware name: Spreadtrum
> > > UMS9230 1H10 SoC (DT)
> > > [    0.352000]  [0:    migration/0:   11] task:         (ptrval)
> > > task.stack:         (ptrval)
> > > [    0.352000]  [0:    migration/0:   11] pc : patch_alternative+0x68/0x27c
> > > [    0.352000]  [0:    migration/0:   11] lr :
> > > __apply_alternatives.llvm.7450387295891320208+0x60/0x160
> > >
> > > [2]
> > > __apply_alternatives
> > >    for()
> > >        patch_alternative    <----panic here in the 2nd round of loop
> > > after invoking flush_icache_range
> > >        flush_icache_range
> > >
> > > [3]
> > > sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE
> > > + tst     \tmp1, #0xffff80000000 // check ttbr1_el1 valid
> > > +    b.le    .
> >
> > What are you trying to detect here? This is testing both the ASID
> > and BADDR[47] bits, so I don't understand the rationale.
> >
> > Thanks,
> > Mark.
> This issue is fixed by the patch 'arm64: Avoid flush_icache_range() in
> alternatives patching code' (429388682dc266e7a693f9c27e3aabd341d55343).
> Thanks.

Ah, thanks for this.

So this is because in arch/arm64/mm/cache.S we do (abbreviated):

| ENTRY(flush_icache_range)
| 	/* FALLTHROUGH */
| ENTRY(__flush_cache_user_range)
| 	uaccess_ttbr0_enable x2, x3, x4
| 
| 	...
| 
| 	uaccess_ttbr0_disable x1, x2
| 	ret
| ENDPROC(flush_icache_range)

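... and even if the I-caches are coherent, so that we never execute junk
for something we've patched but not yet invalidated, we patch each
alternative site in turn and call flush_icache_range() in between. That
routine can therefore run with only one of its own enable/disable sites
patched, so we can end up with unbalanced calls to enable/disable across
a number of patches.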
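
As a rough illustration of that imbalance, here is a toy model in C. It
is not kernel code: the struct, the function names, the assumption that a
patched site turns into NOPs, and the assumption that the enable site is
patched before the disable site are all hypothetical, chosen only to make
the ordering problem visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel symbol. */
struct toy_cpu {
	bool enable_patched;	/* enable site already replaced by NOPs? */
	bool disable_patched;	/* disable site already replaced by NOPs? */
	int ttbr0_balance;	/* +1 per executed enable, -1 per executed disable */
};

/*
 * Stand-in for a flush routine whose body is bracketed by enable/disable
 * sequences that are themselves alternatives.
 */
void toy_flush(struct toy_cpu *cpu)
{
	if (!cpu->enable_patched)
		cpu->ttbr0_balance++;	/* unpatched enable sequence runs */

	/* ... cache maintenance would happen here ... */

	if (!cpu->disable_patched)
		cpu->ttbr0_balance--;	/* unpatched disable sequence runs */
}

/*
 * Patch the two sites in turn and call the flush routine after each one.
 * Assumption: the enable site is reached first. Returns the final balance.
 */
int toy_patch_with_flush_calls(void)
{
	struct toy_cpu cpu = { false, false, 0 };

	cpu.enable_patched = true;	/* first site patched */
	toy_flush(&cpu);		/* NOP enable, real disable */
	cpu.disable_patched = true;	/* second site patched */
	toy_flush(&cpu);		/* both NOPs, no effect */

	return cpu.ttbr0_balance;
}

/*
 * Same patching order, but the routine being patched is not called until
 * both of its sites have been updated.
 */
int toy_patch_without_flush_calls(void)
{
	struct toy_cpu cpu = { false, false, 0 };

	cpu.enable_patched = true;
	cpu.disable_patched = true;
	toy_flush(&cpu);		/* consistent: both NOPs */

	return cpu.ttbr0_balance;
}
```

In this model the first variant ends with a non-zero balance (a disable
with no matching enable), while the second stays at zero, which is the
reason for not calling flush_icache_range() from the patching loop.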

It looks like 429388682dc266e7a693f9c27e3aabd341d55343 isn't in the
current linux-4.14.y, so we should post a backported version. From local
testing, that applies trivially.

Who wants to post that to stable? I can do that if people want.

Thanks,
Mark.


end of thread, other threads:[~2021-10-11 12:17 UTC | newest]

Thread overview: 20+ messages
2021-10-08  6:07 [RFC PATCH] arch: ARM64: add isb before enable pan Huangzhaoyang
2021-10-08  6:07 ` Huangzhaoyang
2021-10-08  8:01 ` Will Deacon
2021-10-08  8:01   ` Will Deacon
2021-10-08  8:34   ` Zhaoyang Huang
2021-10-08  8:34     ` Zhaoyang Huang
2021-10-08  8:45     ` Catalin Marinas
2021-10-08  8:45       ` Catalin Marinas
2021-10-08  8:55       ` Zhaoyang Huang
2021-10-08  8:55         ` Zhaoyang Huang
2021-10-08  9:07         ` Catalin Marinas
2021-10-08  9:07           ` Catalin Marinas
2021-10-11  2:49           ` Zhaoyang Huang
2021-10-11  2:49             ` Zhaoyang Huang
2021-10-11  9:38     ` Mark Rutland
2021-10-11  9:38       ` Mark Rutland
2021-10-11 11:08       ` Zhaoyang Huang
2021-10-11 11:08         ` Zhaoyang Huang
2021-10-11 12:15         ` Mark Rutland
2021-10-11 12:15           ` Mark Rutland
