linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: instruction storage exception handling
       [not found] <uqZVxyE3l9oalZp_hyXFJvdH-ADNTvpOuQeoNGyqrUcoNgh9afea1-FzfZKMgiaF5WxY4kdMQlJYzmjvdQ2E2joF86-mEcaxdifht9z8NA0=@protonmail.com>
@ 2021-10-27  4:10 ` Nicholas Piggin
  2021-10-27  5:00   ` Christophe Leroy
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2021-10-27  4:10 UTC (permalink / raw)
  To: Jacques de Laval, To: Scott Wood; +Cc: linuxppc-dev

Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
> Hi,
> 
> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
> 
> 	CONFIG_PPC32=y
> 	# CONFIG_PPC64 is not set
> 
> 	#
> 	# Processor support
> 	#
> 	# CONFIG_PPC_BOOK3S_32 is not set
> 	CONFIG_PPC_85xx=y
> 	# CONFIG_PPC_8xx is not set
> 	# CONFIG_40x is not set
> 	# CONFIG_44x is not set
> 	CONFIG_GENERIC_CPU=y
> 	# CONFIG_E5500_CPU is not set
> 	# CONFIG_E6500_CPU is not set
> 	CONFIG_E500=y
> 	CONFIG_PPC_E500MC=y
> 	CONFIG_PPC_FPU_REGS=y
> 	CONFIG_PPC_FPU=y
> 	CONFIG_FSL_EMB_PERFMON=y
> 	CONFIG_FSL_EMB_PERF_EVENT=y
> 	CONFIG_FSL_EMB_PERF_EVENT_E500=y
> 	CONFIG_BOOKE=y
> 	CONFIG_FSL_BOOKE=y
> 	CONFIG_PPC_FSL_BOOK3E=y
> 	CONFIG_PTE_64BIT=y
> 	CONFIG_PHYS_64BIT=y
> 	CONFIG_PPC_MMU_NOHASH=y
> 	CONFIG_PPC_BOOK3E_MMU=y
> 	# CONFIG_PMU_SYSFS is not set
> 	CONFIG_SMP=y
> 	CONFIG_NR_CPUS=2
> 	CONFIG_PPC_DOORBELL=y
> 	# end of Processor support
> 
> We compile using 32-bit Bootlin PPC toolchain:
> 
> 	powerpc-e500mc glibc bleeding-edge 2020.08-1.
> 
> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
> process is running, and for debugging purposes our init currently looks like
> this:
> 
> 	int main(int argc, char *argv[]){
> 		for (int i = 0; ; i++) {
> 			FILE *fp = fopen("/dev/kmsg", "w");
> 			if (fp) {
> 				fprintf(fp, "%d\n", i);
> 				fclose(fp);
> 			}
> 			sleep(1);
> 		}
> 		return 0;
> 	}
> 
> When the hangup occur we don't get any output at all from our PID 1.
> The last output is from the kernel:
> 
> 	Run /sbin/init as init process
> 	  with arguments:
> 	    /sbin/init
> 	  with environment:
> 	    HOME=/
> 	    TERM=linux
> 	    kgdboc=ttyS0,115200
> 
> When issuing a backtrace on all active cpus we can see that the kernel is
> handling an instruction storage exception:
> 
> 	sysrq: Show backtrace of all active CPUs
> 	sysrq: CPU0:
> 	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
> 	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
> 	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
> 	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
> 	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
> 	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
> 	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
> 	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
> 	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
> 	Call Trace:
> 	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
> 	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
> 	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
> 	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
> 	--- interrupt: 400 at 0x65a238
> 	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
> 	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
> 	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
> 	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
> 	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
> 	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
> 	NIP [0065a238] 0x65a238
> 	LR [0052f26c] 0x52f26c
> 	--- interrupt: 400
> 	Instruction dump:
> 	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
> 	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
> 
> We have also observed that the CPU is continuously servicing the same interrupt
> (north of 140k times per sec), it is not deadlocked.
> 
> We have not yet been able to reproduce this behavior under QEMU system
> emulation.
> 
> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
> 
> 	powerpc: remove arguments from fault handler functions

Thank you for the excellent work to investigate and report this.

> 
> Our best guess that the instruction storage exception is not properly handled
> and the kernel is never able to recover from the page fault, but we don't
> really know how to proceed. Does anyone have any suggestions or insights?

Before my patch, zero was passed to the page fault error code, after
my patch it passes the contents of ESR SPR.

It looks like the BookE instruction access interrupt does not set ESR 
(except for BO interrupts maybe?) so you're getting what was in the ESR
register from a previous interrupt, and maybe if that was a store then
access_error won't cause a segfault because is_exec is true so that
test bails out early, then it might just keep retrying the interrupt.

That could explain why you don't always see the same thing.

Now previous code still saved ESR in regs->esr/dsisr for some reason.
I can't quite see why that should have been necessary though. Does
this patch solve it for you?

Thanks,
Nick
--


diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index e5503420b6c6..0e7cdc8716eb 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -467,10 +467,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 
 #define INSTRUCTION_STORAGE_EXCEPTION					      \
 	START_EXCEPTION(InstructionStorage)				      \
-	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
-	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
+	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
+	li	r5,0;							      \
+	mfspr	r5,SPRN_ESR;		/* Store 0 in regs->esr (dsisr) */    \
 	stw	r5,_ESR(r11);						      \
-	stw	r12, _DEAR(r11);	/* Pass SRR0 as arg2 */		      \
+	stw	r12, _DEAR(r11);	/* Set regs->dear (dar) */	      \
 	prepare_transfer_to_handler;					      \
 	bl	do_page_fault;						      \
 	b	interrupt_return

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27  4:10 ` instruction storage exception handling Nicholas Piggin
@ 2021-10-27  5:00   ` Christophe Leroy
  2021-10-27  5:25     ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Leroy @ 2021-10-27  5:00 UTC (permalink / raw)
  To: Nicholas Piggin, Jacques de Laval, Scott Wood; +Cc: linuxppc-dev



Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>> Hi,
>>
>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>
>> 	CONFIG_PPC32=y
>> 	# CONFIG_PPC64 is not set
>>
>> 	#
>> 	# Processor support
>> 	#
>> 	# CONFIG_PPC_BOOK3S_32 is not set
>> 	CONFIG_PPC_85xx=y
>> 	# CONFIG_PPC_8xx is not set
>> 	# CONFIG_40x is not set
>> 	# CONFIG_44x is not set
>> 	CONFIG_GENERIC_CPU=y
>> 	# CONFIG_E5500_CPU is not set
>> 	# CONFIG_E6500_CPU is not set
>> 	CONFIG_E500=y
>> 	CONFIG_PPC_E500MC=y
>> 	CONFIG_PPC_FPU_REGS=y
>> 	CONFIG_PPC_FPU=y
>> 	CONFIG_FSL_EMB_PERFMON=y
>> 	CONFIG_FSL_EMB_PERF_EVENT=y
>> 	CONFIG_FSL_EMB_PERF_EVENT_E500=y
>> 	CONFIG_BOOKE=y
>> 	CONFIG_FSL_BOOKE=y
>> 	CONFIG_PPC_FSL_BOOK3E=y
>> 	CONFIG_PTE_64BIT=y
>> 	CONFIG_PHYS_64BIT=y
>> 	CONFIG_PPC_MMU_NOHASH=y
>> 	CONFIG_PPC_BOOK3E_MMU=y
>> 	# CONFIG_PMU_SYSFS is not set
>> 	CONFIG_SMP=y
>> 	CONFIG_NR_CPUS=2
>> 	CONFIG_PPC_DOORBELL=y
>> 	# end of Processor support
>>
>> We compile using 32-bit Bootlin PPC toolchain:
>>
>> 	powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>
>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
>> process is running, and for debugging purposes our init currently looks like
>> this:
>>
>> 	int main(int argc, char *argv[]){
>> 		for (int i = 0; ; i++) {
>> 			FILE *fp = fopen("/dev/kmsg", "w");
>> 			if (fp) {
>> 				fprintf(fp, "%d\n", i);
>> 				fclose(fp);
>> 			}
>> 			sleep(1);
>> 		}
>> 		return 0;
>> 	}
>>
>> When the hangup occur we don't get any output at all from our PID 1.
>> The last output is from the kernel:
>>
>> 	Run /sbin/init as init process
>> 	  with arguments:
>> 	    /sbin/init
>> 	  with environment:
>> 	    HOME=/
>> 	    TERM=linux
>> 	    kgdboc=ttyS0,115200
>>
>> When issuing a backtrace on all active cpus we can see that the kernel is
>> handling an instruction storage exception:
>>
>> 	sysrq: Show backtrace of all active CPUs
>> 	sysrq: CPU0:
>> 	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>> 	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
>> 	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
>> 	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
>> 	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
>> 	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>> 	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
>> 	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>> 	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>> 	Call Trace:
>> 	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>> 	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>> 	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>> 	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
>> 	--- interrupt: 400 at 0x65a238
>> 	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
>> 	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
>> 	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
>> 	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
>> 	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>> 	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
>> 	NIP [0065a238] 0x65a238
>> 	LR [0052f26c] 0x52f26c
>> 	--- interrupt: 400
>> 	Instruction dump:
>> 	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>> 	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
>>
>> We have also observed that the CPU is continuously servicing the same interrupt
>> (north of 140k times per sec), it is not deadlocked.
>>
>> We have not yet been able to reproduce this behavior under QEMU system
>> emulation.
>>
>> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>
>> 	powerpc: remove arguments from fault handler functions
> 
> Thank you for the excellent work to investigate and report this.
> 
>>
>> Our best guess that the instruction storage exception is not properly handled
>> and the kernel is never able to recover from the page fault, but we don't
>> really know how to proceed. Does anyone have any suggestions or insights?
> 
> Before my patch, zero was passed to the page fault error code, after
> my patch it passes the contents of ESR SPR.
> 
> It looks like the BookE instruction access interrupt does not set ESR
> (except for BO interrupts maybe?) so you're getting what was in the ESR
> register from a previous interrupt, and maybe if that was a store then
> access_error won't cause a segfault because is_exec is true so that
> test bails out early, then it might just keep retrying the interrupt.
> 
> That could explain why you don't always see the same thing.
> 
> Now previous code still saved ESR in regs->esr/dsisr for some reason.
> I can't quite see why that should have been necessary though. Does
> this patch solve it for you?
> 
> Thanks,
> Nick
> --
> 
> 
> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
> index e5503420b6c6..0e7cdc8716eb 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -467,10 +467,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>   
>   #define INSTRUCTION_STORAGE_EXCEPTION					      \
>   	START_EXCEPTION(InstructionStorage)				      \
> -	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
> -	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
> +	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
> +	li	r5,0;							      \
> +	mfspr	r5,SPRN_ESR;		/* Store 0 in regs->esr (dsisr) */    \

I can't see how that can help, you set r5 to 0 and immediately after you 
reload ESR into r5 so you are still saving garbage into _ESR(r11)

>   	stw	r5,_ESR(r11);						      \
> -	stw	r12, _DEAR(r11);	/* Pass SRR0 as arg2 */		      \
> +	stw	r12, _DEAR(r11);	/* Set regs->dear (dar) */	      \
>   	prepare_transfer_to_handler;					      \
>   	bl	do_page_fault;						      \
>   	b	interrupt_return
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27  5:00   ` Christophe Leroy
@ 2021-10-27  5:25     ` Nicholas Piggin
  2021-10-27  5:51       ` Christophe Leroy
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2021-10-27  5:25 UTC (permalink / raw)
  To: Christophe Leroy, Jacques de Laval, Scott Wood; +Cc: linuxppc-dev

Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
> 
> 
> Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
>> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>>> Hi,
>>>
>>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
>>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>>
>>> 	CONFIG_PPC32=y
>>> 	# CONFIG_PPC64 is not set
>>>
>>> 	#
>>> 	# Processor support
>>> 	#
>>> 	# CONFIG_PPC_BOOK3S_32 is not set
>>> 	CONFIG_PPC_85xx=y
>>> 	# CONFIG_PPC_8xx is not set
>>> 	# CONFIG_40x is not set
>>> 	# CONFIG_44x is not set
>>> 	CONFIG_GENERIC_CPU=y
>>> 	# CONFIG_E5500_CPU is not set
>>> 	# CONFIG_E6500_CPU is not set
>>> 	CONFIG_E500=y
>>> 	CONFIG_PPC_E500MC=y
>>> 	CONFIG_PPC_FPU_REGS=y
>>> 	CONFIG_PPC_FPU=y
>>> 	CONFIG_FSL_EMB_PERFMON=y
>>> 	CONFIG_FSL_EMB_PERF_EVENT=y
>>> 	CONFIG_FSL_EMB_PERF_EVENT_E500=y
>>> 	CONFIG_BOOKE=y
>>> 	CONFIG_FSL_BOOKE=y
>>> 	CONFIG_PPC_FSL_BOOK3E=y
>>> 	CONFIG_PTE_64BIT=y
>>> 	CONFIG_PHYS_64BIT=y
>>> 	CONFIG_PPC_MMU_NOHASH=y
>>> 	CONFIG_PPC_BOOK3E_MMU=y
>>> 	# CONFIG_PMU_SYSFS is not set
>>> 	CONFIG_SMP=y
>>> 	CONFIG_NR_CPUS=2
>>> 	CONFIG_PPC_DOORBELL=y
>>> 	# end of Processor support
>>>
>>> We compile using 32-bit Bootlin PPC toolchain:
>>>
>>> 	powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>>
>>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
>>> process is running, and for debugging purposes our init currently looks like
>>> this:
>>>
>>> 	int main(int argc, char *argv[]){
>>> 		for (int i = 0; ; i++) {
>>> 			FILE *fp = fopen("/dev/kmsg", "w");
>>> 			if (fp) {
>>> 				fprintf(fp, "%d\n", i);
>>> 				fclose(fp);
>>> 			}
>>> 			sleep(1);
>>> 		}
>>> 		return 0;
>>> 	}
>>>
>>> When the hangup occur we don't get any output at all from our PID 1.
>>> The last output is from the kernel:
>>>
>>> 	Run /sbin/init as init process
>>> 	  with arguments:
>>> 	    /sbin/init
>>> 	  with environment:
>>> 	    HOME=/
>>> 	    TERM=linux
>>> 	    kgdboc=ttyS0,115200
>>>
>>> When issuing a backtrace on all active cpus we can see that the kernel is
>>> handling an instruction storage exception:
>>>
>>> 	sysrq: Show backtrace of all active CPUs
>>> 	sysrq: CPU0:
>>> 	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>>> 	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
>>> 	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
>>> 	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
>>> 	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
>>> 	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>> 	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
>>> 	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>>> 	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>>> 	Call Trace:
>>> 	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>>> 	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>>> 	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>>> 	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
>>> 	--- interrupt: 400 at 0x65a238
>>> 	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
>>> 	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
>>> 	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
>>> 	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
>>> 	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>> 	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
>>> 	NIP [0065a238] 0x65a238
>>> 	LR [0052f26c] 0x52f26c
>>> 	--- interrupt: 400
>>> 	Instruction dump:
>>> 	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>>> 	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
>>>
>>> We have also observed that the CPU is continuously servicing the same interrupt
>>> (north of 140k times per sec), it is not deadlocked.
>>>
>>> We have not yet been able to reproduce this behavior under QEMU system
>>> emulation.
>>>
>>> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
>>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>>
>>> 	powerpc: remove arguments from fault handler functions
>> 
>> Thank you for the excellent work to investigate and report this.
>> 
>>>
>>> Our best guess that the instruction storage exception is not properly handled
>>> and the kernel is never able to recover from the page fault, but we don't
>>> really know how to proceed. Does anyone have any suggestions or insights?
>> 
>> Before my patch, zero was passed to the page fault error code, after
>> my patch it passes the contents of ESR SPR.
>> 
>> It looks like the BookE instruction access interrupt does not set ESR
>> (except for BO interrupts maybe?) so you're getting what was in the ESR
>> register from a previous interrupt, and maybe if that was a store then
>> access_error won't cause a segfault because is_exec is true so that
>> test bails out early, then it might just keep retrying the interrupt.
>> 
>> That could explain why you don't always see the same thing.
>> 
>> Now previous code still saved ESR in regs->esr/dsisr for some reason.
>> I can't quite see why that should have been necessary though. Does
>> this patch solve it for you?
>> 
>> Thanks,
>> Nick
>> --
>> 
>> 
>> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
>> index e5503420b6c6..0e7cdc8716eb 100644
>> --- a/arch/powerpc/kernel/head_booke.h
>> +++ b/arch/powerpc/kernel/head_booke.h
>> @@ -467,10 +467,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>>   
>>   #define INSTRUCTION_STORAGE_EXCEPTION					      \
>>   	START_EXCEPTION(InstructionStorage)				      \
>> -	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
>> -	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
>> +	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
>> +	li	r5,0;							      \
>> +	mfspr	r5,SPRN_ESR;		/* Store 0 in regs->esr (dsisr) */    \
> 
> I can't see how that can help, you set r5 to 0 and immediately after you 
> reload ESR into r5 so you are still saving garbage into _ESR(r11)
> 

Oops, stupid mistake.

Thanks,
Nick

diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index e5503420b6c6..094c9790a490 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -467,10 +467,10 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 
 #define INSTRUCTION_STORAGE_EXCEPTION					      \
 	START_EXCEPTION(InstructionStorage)				      \
-	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
-	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
+	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
+	li	r5,0;			/* Store 0 in regs->esr (dsisr) */    \
 	stw	r5,_ESR(r11);						      \
-	stw	r12, _DEAR(r11);	/* Pass SRR0 as arg2 */		      \
+	stw	r12, _DEAR(r11);	/* Set regs->dear (dar) */	      \
 	prepare_transfer_to_handler;					      \
 	bl	do_page_fault;						      \
 	b	interrupt_return

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27  5:25     ` Nicholas Piggin
@ 2021-10-27  5:51       ` Christophe Leroy
  2021-10-27  7:47         ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Leroy @ 2021-10-27  5:51 UTC (permalink / raw)
  To: Nicholas Piggin, Jacques de Laval, Scott Wood; +Cc: linuxppc-dev



Le 27/10/2021 à 07:25, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
>>
>>
>> Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
>>> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>>>> Hi,
>>>>
>>>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
>>>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>>>
>>>> 	CONFIG_PPC32=y
>>>> 	# CONFIG_PPC64 is not set
>>>>
>>>> 	#
>>>> 	# Processor support
>>>> 	#
>>>> 	# CONFIG_PPC_BOOK3S_32 is not set
>>>> 	CONFIG_PPC_85xx=y
>>>> 	# CONFIG_PPC_8xx is not set
>>>> 	# CONFIG_40x is not set
>>>> 	# CONFIG_44x is not set
>>>> 	CONFIG_GENERIC_CPU=y
>>>> 	# CONFIG_E5500_CPU is not set
>>>> 	# CONFIG_E6500_CPU is not set
>>>> 	CONFIG_E500=y
>>>> 	CONFIG_PPC_E500MC=y
>>>> 	CONFIG_PPC_FPU_REGS=y
>>>> 	CONFIG_PPC_FPU=y
>>>> 	CONFIG_FSL_EMB_PERFMON=y
>>>> 	CONFIG_FSL_EMB_PERF_EVENT=y
>>>> 	CONFIG_FSL_EMB_PERF_EVENT_E500=y
>>>> 	CONFIG_BOOKE=y
>>>> 	CONFIG_FSL_BOOKE=y
>>>> 	CONFIG_PPC_FSL_BOOK3E=y
>>>> 	CONFIG_PTE_64BIT=y
>>>> 	CONFIG_PHYS_64BIT=y
>>>> 	CONFIG_PPC_MMU_NOHASH=y
>>>> 	CONFIG_PPC_BOOK3E_MMU=y
>>>> 	# CONFIG_PMU_SYSFS is not set
>>>> 	CONFIG_SMP=y
>>>> 	CONFIG_NR_CPUS=2
>>>> 	CONFIG_PPC_DOORBELL=y
>>>> 	# end of Processor support
>>>>
>>>> We compile using 32-bit Bootlin PPC toolchain:
>>>>
>>>> 	powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>>>
>>>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
>>>> process is running, and for debugging purposes our init currently looks like
>>>> this:
>>>>
>>>> 	int main(int argc, char *argv[]){
>>>> 		for (int i = 0; ; i++) {
>>>> 			FILE *fp = fopen("/dev/kmsg", "w");
>>>> 			if (fp) {
>>>> 				fprintf(fp, "%d\n", i);
>>>> 				fclose(fp);
>>>> 			}
>>>> 			sleep(1);
>>>> 		}
>>>> 		return 0;
>>>> 	}
>>>>
>>>> When the hangup occur we don't get any output at all from our PID 1.
>>>> The last output is from the kernel:
>>>>
>>>> 	Run /sbin/init as init process
>>>> 	  with arguments:
>>>> 	    /sbin/init
>>>> 	  with environment:
>>>> 	    HOME=/
>>>> 	    TERM=linux
>>>> 	    kgdboc=ttyS0,115200
>>>>
>>>> When issuing a backtrace on all active cpus we can see that the kernel is
>>>> handling an instruction storage exception:
>>>>
>>>> 	sysrq: Show backtrace of all active CPUs
>>>> 	sysrq: CPU0:
>>>> 	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>>>> 	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
>>>> 	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
>>>> 	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
>>>> 	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
>>>> 	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
>>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>> 	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
>>>> 	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>>>> 	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>>>> 	Call Trace:
>>>> 	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>>>> 	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>>>> 	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>>>> 	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
>>>> 	--- interrupt: 400 at 0x65a238
>>>> 	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
>>>> 	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
>>>> 	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
>>>> 	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
>>>> 	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
>>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>> 	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
>>>> 	NIP [0065a238] 0x65a238
>>>> 	LR [0052f26c] 0x52f26c
>>>> 	--- interrupt: 400
>>>> 	Instruction dump:
>>>> 	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>>>> 	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
>>>>
>>>> We have also observed that the CPU is continuously servicing the same interrupt
>>>> (north of 140k times per sec), it is not deadlocked.
>>>>
>>>> We have not yet been able to reproduce this behavior under QEMU system
>>>> emulation.
>>>>
>>>> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
>>>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>>>
>>>> 	powerpc: remove arguments from fault handler functions
>>>
>>> Thank you for the excellent work to investigate and report this.
>>>
>>>>
>>>> Our best guess that the instruction storage exception is not properly handled
>>>> and the kernel is never able to recover from the page fault, but we don't
>>>> really know how to proceed. Does anyone have any suggestions or insights?
>>>
>>> Before my patch, zero was passed to the page fault error code, after
>>> my patch it passes the contents of ESR SPR.
>>>
>>> It looks like the BookE instruction access interrupt does not set ESR
>>> (except for BO interrupts maybe?) so you're getting what was in the ESR
>>> register from a previous interrupt, and maybe if that was a store then
>>> access_error won't cause a segfault because is_exec is true so that
>>> test bails out early, then it might just keep retrying the interrupt.
>>>
>>> That could explain why you don't always see the same thing.
>>>
>>> Now previous code still saved ESR in regs->esr/dsisr for some reason.
>>> I can't quite see why that should have been necessary though. Does

According to the e500 Reference Manual, on ISI:

BO is set if the instruction fetch caused a byte-ordering exception; 
otherwise cleared. *All* other defined ESR bits are *cleared*.

Christophe

>>> this patch solve it for you?
>>>
>>> Thanks,
>>> Nick
>>> --
>>>
>>>
>>> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
>>> index e5503420b6c6..0e7cdc8716eb 100644
>>> --- a/arch/powerpc/kernel/head_booke.h
>>> +++ b/arch/powerpc/kernel/head_booke.h
>>> @@ -467,10 +467,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>>>    
>>>    #define INSTRUCTION_STORAGE_EXCEPTION					      \
>>>    	START_EXCEPTION(InstructionStorage)				      \
>>> -	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
>>> -	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
>>> +	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
>>> +	li	r5,0;							      \
>>> +	mfspr	r5,SPRN_ESR;		/* Store 0 in regs->esr (dsisr) */    \
>>
>> I can't see how that can help, you set r5 to 0 and immediately after you
>> reload ESR into r5 so you are still saving garbage into _ESR(r11)
>>
> 
> Oops, stupid mistake.
> 
> Thanks,
> Nick
> 
> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
> index e5503420b6c6..094c9790a490 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -467,10 +467,10 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>   
>   #define INSTRUCTION_STORAGE_EXCEPTION					      \
>   	START_EXCEPTION(InstructionStorage)				      \
> -	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);		      \
> -	mfspr	r5,SPRN_ESR;		/* Grab the ESR and save it */	      \
> +	NORMAL_EXCEPTION_PROLOG(0x400, INST_STORAGE);			      \
> +	li	r5,0;			/* Store 0 in regs->esr (dsisr) */    \
>   	stw	r5,_ESR(r11);						      \
> -	stw	r12, _DEAR(r11);	/* Pass SRR0 as arg2 */		      \
> +	stw	r12, _DEAR(r11);	/* Set regs->dear (dar) */	      \
>   	prepare_transfer_to_handler;					      \
>   	bl	do_page_fault;						      \
>   	b	interrupt_return
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27  5:51       ` Christophe Leroy
@ 2021-10-27  7:47         ` Nicholas Piggin
  2021-10-27  7:52           ` Christophe Leroy
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2021-10-27  7:47 UTC (permalink / raw)
  To: Christophe Leroy, Jacques de Laval, Scott Wood; +Cc: linuxppc-dev

Excerpts from Christophe Leroy's message of October 27, 2021 3:51 pm:
> 
> 
> Le 27/10/2021 à 07:25, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
>>>
>>>
>>> Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
>>>> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>>>>> Hi,
>>>>>
>>>>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
>>>>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>>>>
>>>>> 	CONFIG_PPC32=y
>>>>> 	# CONFIG_PPC64 is not set
>>>>>
>>>>> 	#
>>>>> 	# Processor support
>>>>> 	#
>>>>> 	# CONFIG_PPC_BOOK3S_32 is not set
>>>>> 	CONFIG_PPC_85xx=y
>>>>> 	# CONFIG_PPC_8xx is not set
>>>>> 	# CONFIG_40x is not set
>>>>> 	# CONFIG_44x is not set
>>>>> 	CONFIG_GENERIC_CPU=y
>>>>> 	# CONFIG_E5500_CPU is not set
>>>>> 	# CONFIG_E6500_CPU is not set
>>>>> 	CONFIG_E500=y
>>>>> 	CONFIG_PPC_E500MC=y
>>>>> 	CONFIG_PPC_FPU_REGS=y
>>>>> 	CONFIG_PPC_FPU=y
>>>>> 	CONFIG_FSL_EMB_PERFMON=y
>>>>> 	CONFIG_FSL_EMB_PERF_EVENT=y
>>>>> 	CONFIG_FSL_EMB_PERF_EVENT_E500=y
>>>>> 	CONFIG_BOOKE=y
>>>>> 	CONFIG_FSL_BOOKE=y
>>>>> 	CONFIG_PPC_FSL_BOOK3E=y
>>>>> 	CONFIG_PTE_64BIT=y
>>>>> 	CONFIG_PHYS_64BIT=y
>>>>> 	CONFIG_PPC_MMU_NOHASH=y
>>>>> 	CONFIG_PPC_BOOK3E_MMU=y
>>>>> 	# CONFIG_PMU_SYSFS is not set
>>>>> 	CONFIG_SMP=y
>>>>> 	CONFIG_NR_CPUS=2
>>>>> 	CONFIG_PPC_DOORBELL=y
>>>>> 	# end of Processor support
>>>>>
>>>>> We compile using 32-bit Bootlin PPC toolchain:
>>>>>
>>>>> 	powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>>>>
>>>>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
>>>>> process is running, and for debugging purposes our init currently looks like
>>>>> this:
>>>>>
>>>>> 	int main(int argc, char *argv[]){
>>>>> 		for (int i = 0; ; i++) {
>>>>> 			FILE *fp = fopen("/dev/kmsg", "w");
>>>>> 			if (fp) {
>>>>> 				fprintf(fp, "%d\n", i);
>>>>> 				fclose(fp);
>>>>> 			}
>>>>> 			sleep(1);
>>>>> 		}
>>>>> 		return 0;
>>>>> 	}
>>>>>
>>>>> When the hangup occur we don't get any output at all from our PID 1.
>>>>> The last output is from the kernel:
>>>>>
>>>>> 	Run /sbin/init as init process
>>>>> 	  with arguments:
>>>>> 	    /sbin/init
>>>>> 	  with environment:
>>>>> 	    HOME=/
>>>>> 	    TERM=linux
>>>>> 	    kgdboc=ttyS0,115200
>>>>>
>>>>> When issuing a backtrace on all active cpus we can see that the kernel is
>>>>> handling an instruction storage exception:
>>>>>
>>>>> 	sysrq: Show backtrace of all active CPUs
>>>>> 	sysrq: CPU0:
>>>>> 	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>>>>> 	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
>>>>> 	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
>>>>> 	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
>>>>> 	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
>>>>> 	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
>>>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>>> 	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
>>>>> 	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>>>>> 	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>>>>> 	Call Trace:
>>>>> 	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>>>>> 	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>>>>> 	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>>>>> 	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
>>>>> 	--- interrupt: 400 at 0x65a238
>>>>> 	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
>>>>> 	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
>>>>> 	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
>>>>> 	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
>>>>> 	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
>>>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>>> 	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
>>>>> 	NIP [0065a238] 0x65a238
>>>>> 	LR [0052f26c] 0x52f26c
>>>>> 	--- interrupt: 400
>>>>> 	Instruction dump:
>>>>> 	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>>>>> 	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
>>>>>
>>>>> We have also observed that the CPU is continuously servicing the same interrupt
>>>>> (north of 140k times per sec), it is not deadlocked.
>>>>>
>>>>> We have not yet been able to reproduce this behavior under QEMU system
>>>>> emulation.
>>>>>
>>>>> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
>>>>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>>>>
>>>>> 	powerpc: remove arguments from fault handler functions
>>>>
>>>> Thank you for the excellent work to investigate and report this.
>>>>
>>>>>
>>>>> Our best guess that the instruction storage exception is not properly handled
>>>>> and the kernel is never able to recover from the page fault, but we don't
>>>>> really know how to proceed. Does anyone have any suggestions or insights?
>>>>
>>>> Before my patch, zero was passed to the page fault error code, after
>>>> my patch it passes the contents of ESR SPR.
>>>>
>>>> It looks like the BookE instruction access interrupt does not set ESR
>>>> (except for BO interrupts maybe?) so you're getting what was in the ESR
>>>> register from a previous interrupt, and maybe if that was a store then
>>>> access_error won't cause a segfault because is_exec is true so that
>>>> test bails out early, then it might just keep retrying the interrupt.
>>>>
>>>> That could explain why you don't always see the same thing.
>>>>
>>>> Now previous code still saved ESR in regs->esr/dsisr for some reason.
>>>> I can't quite see why that should have been necessary though. Does
> 
> According to the e500 Reference Manual, on ISI:
> 
> BO is set if the instruction fetch caused a byte-ordering exception; 
> otherwise cleared. *All* other defined ESR bits are *cleared*.

You're right. In that case it shouldn't change anything unless there
was a BO fault. I'm not sure what the problem is then. Guessing based
on the NIP and instructions, it looks like it's probably got the correct 
user address that it's storing into vmf on the stack, so it has got past 
the access checks so my theory would be wrong anyway.

Must be something simple but I can't see it yet. 

Thanks,
Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27  7:47         ` Nicholas Piggin
@ 2021-10-27  7:52           ` Christophe Leroy
  2021-10-27 12:03             ` Jacques de Laval
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Leroy @ 2021-10-27  7:52 UTC (permalink / raw)
  To: Nicholas Piggin, Jacques de Laval, Scott Wood; +Cc: linuxppc-dev



Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of October 27, 2021 3:51 pm:
>>
>>
>> Le 27/10/2021 à 07:25, Nicholas Piggin a écrit :
>>> Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
>>>>
>>>>
>>>> Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
>>>>> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>>>>>> Hi,
>>>>>>
>>>>>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
>>>>>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>>>>>
>>>>>> 	CONFIG_PPC32=y
>>>>>> 	# CONFIG_PPC64 is not set
>>>>>>
>>>>>> 	#
>>>>>> 	# Processor support
>>>>>> 	#
>>>>>> 	# CONFIG_PPC_BOOK3S_32 is not set
>>>>>> 	CONFIG_PPC_85xx=y
>>>>>> 	# CONFIG_PPC_8xx is not set
>>>>>> 	# CONFIG_40x is not set
>>>>>> 	# CONFIG_44x is not set
>>>>>> 	CONFIG_GENERIC_CPU=y
>>>>>> 	# CONFIG_E5500_CPU is not set
>>>>>> 	# CONFIG_E6500_CPU is not set
>>>>>> 	CONFIG_E500=y
>>>>>> 	CONFIG_PPC_E500MC=y
>>>>>> 	CONFIG_PPC_FPU_REGS=y
>>>>>> 	CONFIG_PPC_FPU=y
>>>>>> 	CONFIG_FSL_EMB_PERFMON=y
>>>>>> 	CONFIG_FSL_EMB_PERF_EVENT=y
>>>>>> 	CONFIG_FSL_EMB_PERF_EVENT_E500=y
>>>>>> 	CONFIG_BOOKE=y
>>>>>> 	CONFIG_FSL_BOOKE=y
>>>>>> 	CONFIG_PPC_FSL_BOOK3E=y
>>>>>> 	CONFIG_PTE_64BIT=y
>>>>>> 	CONFIG_PHYS_64BIT=y
>>>>>> 	CONFIG_PPC_MMU_NOHASH=y
>>>>>> 	CONFIG_PPC_BOOK3E_MMU=y
>>>>>> 	# CONFIG_PMU_SYSFS is not set
>>>>>> 	CONFIG_SMP=y
>>>>>> 	CONFIG_NR_CPUS=2
>>>>>> 	CONFIG_PPC_DOORBELL=y
>>>>>> 	# end of Processor support
>>>>>>
>>>>>> We compile using 32-bit Bootlin PPC toolchain:
>>>>>>
>>>>>> 	powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>>>>>
>>>>>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
>>>>>> process is running, and for debugging purposes our init currently looks like
>>>>>> this:
>>>>>>
>>>>>> 	int main(int argc, char *argv[]){
>>>>>> 		for (int i = 0; ; i++) {
>>>>>> 			FILE *fp = fopen("/dev/kmsg", "w");
>>>>>> 			if (fp) {
>>>>>> 				fprintf(fp, "%d\n", i);
>>>>>> 				fclose(fp);
>>>>>> 			}
>>>>>> 			sleep(1);
>>>>>> 		}
>>>>>> 		return 0;
>>>>>> 	}
>>>>>>
>>>>>> When the hangup occur we don't get any output at all from our PID 1.
>>>>>> The last output is from the kernel:
>>>>>>
>>>>>> 	Run /sbin/init as init process
>>>>>> 	  with arguments:
>>>>>> 	    /sbin/init
>>>>>> 	  with environment:
>>>>>> 	    HOME=/
>>>>>> 	    TERM=linux
>>>>>> 	    kgdboc=ttyS0,115200
>>>>>>
>>>>>> When issuing a backtrace on all active cpus we can see that the kernel is
>>>>>> handling an instruction storage exception:
>>>>>>
>>>>>> 	sysrq: Show backtrace of all active CPUs
>>>>>> 	sysrq: CPU0:
>>>>>> 	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>>>>>> 	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
>>>>>> 	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
>>>>>> 	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
>>>>>> 	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
>>>>>> 	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
>>>>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>>>> 	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
>>>>>> 	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>>>>>> 	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>>>>>> 	Call Trace:
>>>>>> 	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>>>>>> 	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>>>>>> 	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>>>>>> 	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
>>>>>> 	--- interrupt: 400 at 0x65a238
>>>>>> 	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
>>>>>> 	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
>>>>>> 	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
>>>>>> 	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
>>>>>> 	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
>>>>>> 	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>>>> 	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
>>>>>> 	NIP [0065a238] 0x65a238
>>>>>> 	LR [0052f26c] 0x52f26c
>>>>>> 	--- interrupt: 400
>>>>>> 	Instruction dump:
>>>>>> 	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>>>>>> 	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
>>>>>>
>>>>>> We have also observed that the CPU is continuously servicing the same interrupt
>>>>>> (north of 140k times per sec), it is not deadlocked.
>>>>>>
>>>>>> We have not yet been able to reproduce this behavior under QEMU system
>>>>>> emulation.
>>>>>>
>>>>>> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
>>>>>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>>>>>
>>>>>> 	powerpc: remove arguments from fault handler functions
>>>>>
>>>>> Thank you for the excellent work to investigate and report this.
>>>>>
>>>>>>
>>>>>> Our best guess that the instruction storage exception is not properly handled
>>>>>> and the kernel is never able to recover from the page fault, but we don't
>>>>>> really know how to proceed. Does anyone have any suggestions or insights?
>>>>>
>>>>> Before my patch, zero was passed to the page fault error code, after
>>>>> my patch it passes the contents of ESR SPR.
>>>>>
>>>>> It looks like the BookE instruction access interrupt does not set ESR
>>>>> (except for BO interrupts maybe?) so you're getting what was in the ESR
>>>>> register from a previous interrupt, and maybe if that was a store then
>>>>> access_error won't cause a segfault because is_exec is true so that
>>>>> test bails out early, then it might just keep retrying the interrupt.
>>>>>
>>>>> That could explain why you don't always see the same thing.
>>>>>
>>>>> Now previous code still saved ESR in regs->esr/dsisr for some reason.
>>>>> I can't quite see why that should have been necessary though. Does
>>
>> According to the e500 Reference Manual, on ISI:
>>
>> BO is set if the instruction fetch caused a byte-ordering exception;
>> otherwise cleared. *All* other defined ESR bits are *cleared*.
> 
> You're right. In that case it shouldn't change anything unless there
> was a BO fault. I'm not sure what the problem is then. Guessing based
> on the NIP and instructions, it looks like it's probably got the correct
> user address that it's storing into vmf on the stack, so it has got past
> the access checks so my theory would be wrong anyway.
> 
> Must be something simple but I can't see it yet.
> 

Anyway, I think it is still worth doing the check with setting 0 in 
_ESR(r11), maybe the Reference Manual is wrong.

So Jacques, please do the test anyway if you can.

Thanks
Christophe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27  7:52           ` Christophe Leroy
@ 2021-10-27 12:03             ` Jacques de Laval
  2021-10-28  3:01               ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Jacques de Laval @ 2021-10-27 12:03 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Scott Wood, linuxppc-dev, Nicholas Piggin

On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:

> Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>
> > Excerpts from Christophe Leroy's message of October 27, 2021 3:51 pm:
> >
> > > Le 27/10/2021 à 07:25, Nicholas Piggin a écrit :
> > >
> > > > Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
> > > >
> > > > > Le 27/10/2021 à 06:10, Nicholas Piggin a écrit :
> > > > >
> > > > > > Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
> > > > > > >
> > > > > > > T1023 SOC with two e5500 cores, and are running in 32-bit mode:
> > > > > > >
> > > > > > > CONFIG_PPC32=y
> > > > > > >
> > > > > > > CONFIG_PPC64 is not set
> > > > > > > =======================
> > > > > > >
> > > > > > > Processor support
> > > > > > > =================
> > > > > > >
> > > > > > > CONFIG_PPC_BOOK3S_32 is not set
> > > > > > > ===============================
> > > > > > >
> > > > > > > CONFIG_PPC_85xx=y
> > > > > > >
> > > > > > > CONFIG_PPC_8xx is not set
> > > > > > > =========================
> > > > > > >
> > > > > > > CONFIG_40x is not set
> > > > > > > =====================
> > > > > > >
> > > > > > > CONFIG_44x is not set
> > > > > > > =====================
> > > > > > >
> > > > > > > CONFIG_GENERIC_CPU=y
> > > > > > >
> > > > > > > CONFIG_E5500_CPU is not set
> > > > > > > ===========================
> > > > > > >
> > > > > > > CONFIG_E6500_CPU is not set
> > > > > > > ===========================
> > > > > > >
> > > > > > > CONFIG_E500=y
> > > > > > >
> > > > > > > CONFIG_PPC_E500MC=y
> > > > > > >
> > > > > > > CONFIG_PPC_FPU_REGS=y
> > > > > > >
> > > > > > > CONFIG_PPC_FPU=y
> > > > > > >
> > > > > > > CONFIG_FSL_EMB_PERFMON=y
> > > > > > >
> > > > > > > CONFIG_FSL_EMB_PERF_EVENT=y
> > > > > > >
> > > > > > > CONFIG_FSL_EMB_PERF_EVENT_E500=y
> > > > > > >
> > > > > > > CONFIG_BOOKE=y
> > > > > > >
> > > > > > > CONFIG_FSL_BOOKE=y
> > > > > > >
> > > > > > > CONFIG_PPC_FSL_BOOK3E=y
> > > > > > >
> > > > > > > CONFIG_PTE_64BIT=y
> > > > > > >
> > > > > > > CONFIG_PHYS_64BIT=y
> > > > > > >
> > > > > > > CONFIG_PPC_MMU_NOHASH=y
> > > > > > >
> > > > > > > CONFIG_PPC_BOOK3E_MMU=y
> > > > > > >
> > > > > > > CONFIG_PMU_SYSFS is not set
> > > > > > > ===========================
> > > > > > >
> > > > > > > CONFIG_SMP=y
> > > > > > >
> > > > > > > CONFIG_NR_CPUS=2
> > > > > > >
> > > > > > > CONFIG_PPC_DOORBELL=y
> > > > > > >
> > > > > > > end of Processor support
> > > > > > > ========================
> > > > > > >
> > > > > > > We compile using 32-bit Bootlin PPC toolchain:
> > > > > > >
> > > > > > > powerpc-e500mc glibc bleeding-edge 2020.08-1.
> > > > > > >
> > > > > > > When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
> > > > > > >
> > > > > > > process is running, and for debugging purposes our init currently looks like
> > > > > > >
> > > > > > > this:
> > > > > > >
> > > > > > > int main(int argc, char *argv[]){
> > > > > > >
> > > > > > > for (int i = 0; ; i++) {
> > > > > > >
> > > > > > > FILE *fp = fopen("/dev/kmsg", "w");
> > > > > > >
> > > > > > > if (fp) {
> > > > > > >
> > > > > > > fprintf(fp, "%d\n", i);
> > > > > > >
> > > > > > > fclose(fp);
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > sleep(1);
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > return 0;
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > When the hangup occur we don't get any output at all from our PID 1.
> > > > > > >
> > > > > > > The last output is from the kernel:
> > > > > > >
> > > > > > > Run /sbin/init as init process
> > > > > > >
> > > > > > > with arguments:
> > > > > > >
> > > > > > > /sbin/init
> > > > > > >
> > > > > > > with environment:
> > > > > > >
> > > > > > > HOME=/
> > > > > > >
> > > > > > > TERM=linux
> > > > > > >
> > > > > > > kgdboc=ttyS0,115200
> > > > > > >
> > > > > > > When issuing a backtrace on all active cpus we can see that the kernel is
> > > > > > >
> > > > > > > handling an instruction storage exception:
> > > > > > >
> > > > > > > sysrq: Show backtrace of all active CPUs
> > > > > > >
> > > > > > > sysrq: CPU0:
> > > > > > >
> > > > > > > CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
> > > > > > >
> > > > > > > NIP: c02aac78 LR: c02aac2c CTR: 00000000
> > > > > > >
> > > > > > > REGS: c1907d40 TRAP: 0500 Not tainted (5.14.11)
> > > > > > >
> > > > > > > MSR: 00029002 <CE,EE,ME> CR: 82244284 XER: 20000000
> > > > > > >
> > > > > > > GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
> > > > > > >
> > > > > > > GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
> > > > > > >
> > > > > > > GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
> > > > > > >
> > > > > > > GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
> > > > > > >
> > > > > > > NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
> > > > > > >
> > > > > > > LR [c02aac2c] handle_mm_fault+0xac/0x11f0
> > > > > > >
> > > > > > > Call Trace:
> > > > > > >
> > > > > > > [c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
> > > > > > >
> > > > > > > [c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
> > > > > > >
> > > > > > > [c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
> > > > > > >
> > > > > > > [c1907f10] [c0000988] InstructionStorage+0x108/0x120
> > > > > > >
> > > > > > > --- interrupt: 400 at 0x65a238
> > > > > > >
> > > > > > > NIP: 0065a238 LR: 0052f26c CTR: 0052f260
> > > > > > >
> > > > > > > REGS: c1907f20 TRAP: 0400 Not tainted (5.14.11)
> > > > > > >
> > > > > > > MSR: 0002d002 <CE,EE,PR,ME> CR: 42242284 XER: 00000000
> > > > > > >
> > > > > > > GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
> > > > > > >
> > > > > > > GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
> > > > > > >
> > > > > > > GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
> > > > > > >
> > > > > > > GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
> > > > > > >
> > > > > > > NIP [0065a238] 0x65a238
> > > > > > >
> > > > > > > LR [0052f26c] 0x52f26c
> > > > > > >
> > > > > > > --- interrupt: 400
> > > > > > >
> > > > > > > Instruction dump:
> > > > > > >
> > > > > > > 60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
> > > > > > >
> > > > > > > 90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
> > > > > > >
> > > > > > > We have also observed that the CPU is continuously servicing the same interrupt
> > > > > > >
> > > > > > > (north of 140k times per sec), it is not deadlocked.
> > > > > > >
> > > > > > > We have not yet been able to reproduce this behavior under QEMU system
> > > > > > >
> > > > > > > emulation.
> > > > > > >
> > > > > > > When bisecting between 5.10 and 5.14.11 we can see that this behavior started
> > > > > > >
> > > > > > > with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
> > > > > > >
> > > > > > > powerpc: remove arguments from fault handler functions
> > > > > >
> > > > > > Thank you for the excellent work to investigate and report this.
> > > > > >
> > > > > > > Our best guess that the instruction storage exception is not properly handled
> > > > > > >
> > > > > > > and the kernel is never able to recover from the page fault, but we don't
> > > > > > >
> > > > > > > really know how to proceed. Does anyone have any suggestions or insights?
> > > > > >
> > > > > > Before my patch, zero was passed to the page fault error code, after
> > > > > >
> > > > > > my patch it passes the contents of ESR SPR.
> > > > > >
> > > > > > It looks like the BookE instruction access interrupt does not set ESR
> > > > > >
> > > > > > (except for BO interrupts maybe?) so you're getting what was in the ESR
> > > > > >
> > > > > > register from a previous interrupt, and maybe if that was a store then
> > > > > >
> > > > > > access_error won't cause a segfault because is_exec is true so that
> > > > > >
> > > > > > test bails out early, then it might just keep retrying the interrupt.
> > > > > >
> > > > > > That could explain why you don't always see the same thing.
> > > > > >
> > > > > > Now previous code still saved ESR in regs->esr/dsisr for some reason.
> > > > > >
> > > > > > I can't quite see why that should have been necessary though. Does
> > >
> > > According to the e500 Reference Manual, on ISI:
> > >
> > > BO is set if the instruction fetch caused a byte-ordering exception;
> > >
> > > otherwise cleared. All other defined ESR bits are cleared.
> >
> > You're right. In that case it shouldn't change anything unless there
> >
> > was a BO fault. I'm not sure what the problem is then. Guessing based
> >
> > on the NIP and instructions, it looks like it's probably got the correct
> >
> > user address that it's storing into vmf on the stack, so it has got past
> >
> > the access checks so my theory would be wrong anyway.
> >
> > Must be something simple but I can't see it yet.
>
> Anyway, I think it is still worth doing the check with setting 0 in
>
> _ESR(r11), maybe the Reference Manual is wrong.
>
> So Jacques, please do the test anyway if you can.
>
> Thanks
>
> Christophe

I tested with the last patch from Nicholas, and with that I can not
reproduce the issue, so it seems like that solves it for my case and setup
at least.

Big thanks Christophe and Nicholas for looking in to this!

Thanks,
Jacques

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-27 12:03             ` Jacques de Laval
@ 2021-10-28  3:01               ` Nicholas Piggin
  2021-10-28  9:08                 ` Jacques de Laval
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2021-10-28  3:01 UTC (permalink / raw)
  To: Christophe Leroy, Jacques de Laval; +Cc: Scott Wood, linuxppc-dev

Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
> On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
>> Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>>
>> > You're right. In that case it shouldn't change anything unless there
>> >
>> > was a BO fault. I'm not sure what the problem is then. Guessing based
>> >
>> > on the NIP and instructions, it looks like it's probably got the correct
>> >
>> > user address that it's storing into vmf on the stack, so it has got past
>> >
>> > the access checks so my theory would be wrong anyway.
>> >
>> > Must be something simple but I can't see it yet.
>>
>> Anyway, I think it is still worth doing the check with setting 0 in
>>
>> _ESR(r11), maybe the Reference Manual is wrong.
>>
>> So Jacques, please do the test anyway if you can.
>>
>> Thanks
>>
>> Christophe
> 
> I tested with the last patch from Nicholas, and with that I can not
> reproduce the issue, so it seems like that solves it for my case and setup
> at least.
> 
> Big thanks Christophe and Nicholas for looking in to this!

Thanks for reporting and testing. We can certainly send this patch 
upstream to fix the regression, but I'm still not exactly sure what is 
going on. If it is an errata or part of specification we missed that 
could explain it but it would be good to understand and comment it.

If you have time to test again with only the following patch applied,
it might give a better clue. This patch should keep running but it
would print some dmesg output.

Thanks,
Nick

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..cf56f23ff90a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
 
 DEFINE_INTERRUPT_HANDLER(do_page_fault)
 {
+	if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
+		if (regs->dsisr != 0) {
+			printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
+			regs->dsisr = 0; // fix?
+		}
+	}
 	__do_page_fault(regs);
 }
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-28  3:01               ` Nicholas Piggin
@ 2021-10-28  9:08                 ` Jacques de Laval
  2021-10-28  9:35                   ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Jacques de Laval @ 2021-10-28  9:08 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Scott Wood, linuxppc-dev

On Thursday, October 28th, 2021 at 5:01 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
>
> > On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
> >
> > > Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
> > >
> > > > You're right. In that case it shouldn't change anything unless there
> > > >
> > > > was a BO fault. I'm not sure what the problem is then. Guessing based
> > > >
> > > > on the NIP and instructions, it looks like it's probably got the correct
> > > >
> > > > user address that it's storing into vmf on the stack, so it has got past
> > > >
> > > > the access checks so my theory would be wrong anyway.
> > > >
> > > > Must be something simple but I can't see it yet.
> > >
> > > Anyway, I think it is still worth doing the check with setting 0 in
> > >
> > > _ESR(r11), maybe the Reference Manual is wrong.
> > >
> > > So Jacques, please do the test anyway if you can.
> > >
> > > Thanks
> > >
> > > Christophe
> >
> > I tested with the last patch from Nicholas, and with that I can not
> >
> > reproduce the issue, so it seems like that solves it for my case and setup
> >
> > at least.
> >
> > Big thanks Christophe and Nicholas for looking in to this!
>
> Thanks for reporting and testing. We can certainly send this patch
>
> upstream to fix the regression, but I'm still not exactly sure what is
>
> going on. If it is an errata or part of specification we missed that
>
> could explain it but it would be good to understand and comment it.
>
> If you have time to test again with only the following patch applied,
>
> it might give a better clue. This patch should keep running but it
>
> would print some dmesg output.
>
> Thanks,
>
> Nick
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>
> index a8d0ce85d39a..cf56f23ff90a 100644
>
> --- a/arch/powerpc/mm/fault.c
>
> +++ b/arch/powerpc/mm/fault.c
>
> @@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>
> DEFINE_INTERRUPT_HANDLER(do_page_fault)
>
> {
>
> -   if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
>
> -         if (regs->dsisr != 0) {
>
>
> -         	printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
>
>
> -         	regs->dsisr = 0; // fix?
>
>
> -         }
>
>
> -   }
>
>     __do_page_fault(regs);
>
>     }
>

As expected it keeps running. The output, and number of prints is naturally
a bit different from time to time, but dsisr/ESR is always 0x800000.

Here's a representative output from one run:

	ISI pc:b789e6c0 msr:2d002 dsisr:800000 ESR:800000
	ISI pc:b7884220 msr:2d002 dsisr:800000 ESR:800000
	ISI pc:b78c18a4 msr:2d002 dsisr:800000 ESR:800000
	ISI pc:55a238 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:412380 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:3aabe0 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:47a0e0 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:443290 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:43b350 msr:2d002 dsisr:800000 ESR:800000

Thanks,
Jacques

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-28  9:08                 ` Jacques de Laval
@ 2021-10-28  9:35                   ` Nicholas Piggin
  2021-10-28 12:42                     ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2021-10-28  9:35 UTC (permalink / raw)
  To: Jacques de Laval; +Cc: Scott Wood, linuxppc-dev

Excerpts from Jacques de Laval's message of October 28, 2021 7:08 pm:
> On Thursday, October 28th, 2021 at 5:01 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
>>
>> > On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
>> >
>> > > Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>> > >
>> > > > You're right. In that case it shouldn't change anything unless there
>> > > >
>> > > > was a BO fault. I'm not sure what the problem is then. Guessing based
>> > > >
>> > > > on the NIP and instructions, it looks like it's probably got the correct
>> > > >
>> > > > user address that it's storing into vmf on the stack, so it has got past
>> > > >
>> > > > the access checks so my theory would be wrong anyway.
>> > > >
>> > > > Must be something simple but I can't see it yet.
>> > >
>> > > Anyway, I think it is still worth doing the check with setting 0 in
>> > >
>> > > _ESR(r11), maybe the Reference Manual is wrong.
>> > >
>> > > So Jacques, please do the test anyway if you can.
>> > >
>> > > Thanks
>> > >
>> > > Christophe
>> >
>> > I tested with the last patch from Nicholas, and with that I can not
>> >
>> > reproduce the issue, so it seems like that solves it for my case and setup
>> >
>> > at least.
>> >
>> > Big thanks Christophe and Nicholas for looking in to this!
>>
>> Thanks for reporting and testing. We can certainly send this patch
>>
>> upstream to fix the regression, but I'm still not exactly sure what is
>>
>> going on. If it is an errata or part of specification we missed that
>>
>> could explain it but it would be good to understand and comment it.
>>
>> If you have time to test again with only the following patch applied,
>>
>> it might give a better clue. This patch should keep running but it
>>
>> would print some dmesg output.
>>
>> Thanks,
>>
>> Nick
>>
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>>
>> index a8d0ce85d39a..cf56f23ff90a 100644
>>
>> --- a/arch/powerpc/mm/fault.c
>>
>> +++ b/arch/powerpc/mm/fault.c
>>
>> @@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>>
>> DEFINE_INTERRUPT_HANDLER(do_page_fault)
>>
>> {
>>
>> -   if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
>>
>> -         if (regs->dsisr != 0) {
>>
>>
>> -         	printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
>>
>>
>> -         	regs->dsisr = 0; // fix?
>>
>>
>> -         }
>>
>>
>> -   }
>>
>>     __do_page_fault(regs);
>>
>>     }
>>
> 
> As expected it keeps running. The output, and number of prints is naturally
> a bit different from time to time, but dsisr/ESR is always 0x800000.
> 
> Here's a representative output from one run:
> 
> 	ISI pc:b789e6c0 msr:2d002 dsisr:800000 ESR:800000
> 	ISI pc:b7884220 msr:2d002 dsisr:800000 ESR:800000
> 	ISI pc:b78c18a4 msr:2d002 dsisr:800000 ESR:800000
> 	ISI pc:55a238 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:412380 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:3aabe0 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:47a0e0 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:443290 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:43b350 msr:2d002 dsisr:800000 ESR:800000

Great, thanks for testing that is interesting.

Looking a bit more,

https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf

This is the manual for e500 family including e5500.

Table 8-4. Interrupt Summary by IVOR shows ISI interrupt as affecting 
ESR register with [PT], this means the PT bit may be set. There is no
BO bit specified here.

The architecture (and this manual) says that if an interrupt type 
affects one of the ESR bits then all others are cleared. However if
we look at the 4.8.7 Exception Syndrome Register (ESR) definition,
the PT bit is specified with <E.PT>. This means it is implemented if
the processor supports the E.PT extension.

According to this table, the e5500 does not support E.PT.
https://www.linux-kvm.org/page/E500_virtual_CPU_specification

So it seems possible that a processor which does not support E.PT and 
therefore ISI will never set any bits in ESR, will not zero the ESR
when it takes an ISI intrrupt, without violating the specification.

It looks like that is what is happening here, ESR is being left from
a previous store DSI.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: instruction storage exception handling
  2021-10-28  9:35                   ` Nicholas Piggin
@ 2021-10-28 12:42                     ` Nicholas Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2021-10-28 12:42 UTC (permalink / raw)
  To: Jacques de Laval; +Cc: Scott Wood, linuxppc-dev

Excerpts from Nicholas Piggin's message of October 28, 2021 7:35 pm:
> Excerpts from Jacques de Laval's message of October 28, 2021 7:08 pm:
>> On Thursday, October 28th, 2021 at 5:01 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
>>> Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
>>>
>>> > On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
>>> >
>>> > > Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>>> > >
>>> > > > You're right. In that case it shouldn't change anything unless there
>>> > > >
>>> > > > was a BO fault. I'm not sure what the problem is then. Guessing based
>>> > > >
>>> > > > on the NIP and instructions, it looks like it's probably got the correct
>>> > > >
>>> > > > user address that it's storing into vmf on the stack, so it has got past
>>> > > >
>>> > > > the access checks so my theory would be wrong anyway.
>>> > > >
>>> > > > Must be something simple but I can't see it yet.
>>> > >
>>> > > Anyway, I think it is still worth doing the check with setting 0 in
>>> > >
>>> > > _ESR(r11), maybe the Reference Manual is wrong.
>>> > >
>>> > > So Jacques, please do the test anyway if you can.
>>> > >
>>> > > Thanks
>>> > >
>>> > > Christophe
>>> >
>>> > I tested with the last patch from Nicholas, and with that I can not
>>> >
>>> > reproduce the issue, so it seems like that solves it for my case and setup
>>> >
>>> > at least.
>>> >
>>> > Big thanks Christophe and Nicholas for looking in to this!
>>>
>>> Thanks for reporting and testing. We can certainly send this patch
>>>
>>> upstream to fix the regression, but I'm still not exactly sure what is
>>>
>>> going on. If it is an errata or part of specification we missed that
>>>
>>> could explain it but it would be good to understand and comment it.
>>>
>>> If you have time to test again with only the following patch applied,
>>>
>>> it might give a better clue. This patch should keep running but it
>>>
>>> would print some dmesg output.
>>>
>>> Thanks,
>>>
>>> Nick
>>>
>>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>>>
>>> index a8d0ce85d39a..cf56f23ff90a 100644
>>>
>>> --- a/arch/powerpc/mm/fault.c
>>>
>>> +++ b/arch/powerpc/mm/fault.c
>>>
>>> @@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>>>
>>> DEFINE_INTERRUPT_HANDLER(do_page_fault)
>>>
>>> {
>>>
>>> -   if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
>>>
>>> -         if (regs->dsisr != 0) {
>>>
>>>
>>> -         	printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
>>>
>>>
>>> -         	regs->dsisr = 0; // fix?
>>>
>>>
>>> -         }
>>>
>>>
>>> -   }
>>>
>>>     __do_page_fault(regs);
>>>
>>>     }
>>>
>> 
>> As expected it keeps running. The output, and number of prints is naturally
>> a bit different from time to time, but dsisr/ESR is always 0x800000.
>> 
>> Here's a representative output from one run:
>> 
>> 	ISI pc:b789e6c0 msr:2d002 dsisr:800000 ESR:800000
>> 	ISI pc:b7884220 msr:2d002 dsisr:800000 ESR:800000
>> 	ISI pc:b78c18a4 msr:2d002 dsisr:800000 ESR:800000
>> 	ISI pc:55a238 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:412380 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:3aabe0 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:47a0e0 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:443290 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:43b350 msr:2d002 dsisr:800000 ESR:800000
> 
> Great, thanks for testing that is interesting.
> 
> Looking a bit more,
> 
> https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf
> 
> This is the manual for e500 family including e5500.
> 
> Table 8-4. Interrupt Summary by IVOR shows ISI interrupt as affecting 
> ESR register with [PT], this means the PT bit may be set. There is no
> BO bit specified here.
> 
> The architecture (and this manual) says that if an interrupt type 
> affects one of the ESR bits then all others are cleared. However if
> we look at the 4.8.7 Exception Syndrome Register (ESR) definition,
> the PT bit is specified with <E.PT>. This means it is implemented if
> the processor supports the E.PT extension.
> 
> According to this table, the e5500 does not support E.PT.
> https://www.linux-kvm.org/page/E500_virtual_CPU_specification
> 
> So it seems possible that a processor which does not support E.PT and 
> therefore ISI will never set any bits in ESR, will not zero the ESR
> when it takes an ISI intrrupt, without violating the specification.
> 
> It looks like that is what is happening here, ESR is being left from
> a previous store DSI.

Actually it could be another explanation, it takes a instruction 
TLB error interrupt first which always sets no ESR bits, and then
it jumps to the ISI handler without clearing ESR.

I have not actually worked out why it is causing an infinite loop,
it doesn't seem to be the main fault handler, maybe it is some
low level TLB or PTE code, I can't get a kernel to boot under
quemu due to some bug, but a 64-bit kernel does not cause the problem if 
you manually add ESR_DST into dsisr on instruction faults. Don't really
need to know that exactly though, it's clearly a bug.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* instruction storage exception handling
@ 2021-10-26 22:44 Jacques de Laval
  0 siblings, 0 replies; 12+ messages in thread
From: Jacques de Laval @ 2021-10-26 22:44 UTC (permalink / raw)
  To: linuxppc-dev

Hi,

We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
T1023 SOC with two e5500 cores, and are running in 32-bit mode:

	CONFIG_PPC32=y
	# CONFIG_PPC64 is not set

	#
	# Processor support
	#
	# CONFIG_PPC_BOOK3S_32 is not set
	CONFIG_PPC_85xx=y
	# CONFIG_PPC_8xx is not set
	# CONFIG_40x is not set
	# CONFIG_44x is not set
	CONFIG_GENERIC_CPU=y
	# CONFIG_E5500_CPU is not set
	# CONFIG_E6500_CPU is not set
	CONFIG_E500=y
	CONFIG_PPC_E500MC=y
	CONFIG_PPC_FPU_REGS=y
	CONFIG_PPC_FPU=y
	CONFIG_FSL_EMB_PERFMON=y
	CONFIG_FSL_EMB_PERF_EVENT=y
	CONFIG_FSL_EMB_PERF_EVENT_E500=y
	CONFIG_BOOKE=y
	CONFIG_FSL_BOOKE=y
	CONFIG_PPC_FSL_BOOK3E=y
	CONFIG_PTE_64BIT=y
	CONFIG_PHYS_64BIT=y
	CONFIG_PPC_MMU_NOHASH=y
	CONFIG_PPC_BOOK3E_MMU=y
	# CONFIG_PMU_SYSFS is not set
	CONFIG_SMP=y
	CONFIG_NR_CPUS=2
	CONFIG_PPC_DOORBELL=y
	# end of Processor support

We compile using 32-bit Bootlin PPC toolchain:

	powerpc-e500mc glibc bleeding-edge 2020.08-1.

When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
process is running, and for debugging purposes our init currently looks like
this:

	int main(int argc, char *argv[]){
		for (int i = 0; ; i++) {
			FILE *fp = fopen("/dev/kmsg", "w");
			if (fp) {
				fprintf(fp, "%d\n", i);
				fclose(fp);
			}
			sleep(1);
		}
		return 0;
	}

When the hangup occur we don't get any output at all from our PID 1.
The last output is from the kernel:

	Run /sbin/init as init process
	  with arguments:
	    /sbin/init
	  with environment:
	    HOME=/
	    TERM=linux
	    kgdboc=ttyS0,115200

When issuing a backtrace on all active cpus we can see that the kernel is
handling an instruction storage exception:

	sysrq: Show backtrace of all active CPUs
	sysrq: CPU0:
	CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
	NIP:  c02aac78 LR: c02aac2c CTR: 00000000
	REGS: c1907d40 TRAP: 0500   Not tainted  (5.14.11)
	MSR:  00029002 <CE,EE,ME>  CR: 82244284  XER: 20000000
	GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
	GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
	GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
	NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
	LR [c02aac2c] handle_mm_fault+0xac/0x11f0
	Call Trace:
	[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
	[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
	[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
	[c1907f10] [c0000988] InstructionStorage+0x108/0x120
	--- interrupt: 400 at 0x65a238
	NIP:  0065a238 LR: 0052f26c CTR: 0052f260
	REGS: c1907f20 TRAP: 0400   Not tainted  (5.14.11)
	MSR:  0002d002 <CE,EE,PR,ME>  CR: 42242284  XER: 00000000
	GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
	GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
	GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
	GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
	NIP [0065a238] 0x65a238
	LR [0052f26c] 0x52f26c
	--- interrupt: 400
	Instruction dump:
	60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
	90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc

We have also observed that the CPU is continuously servicing the same interrupt
(north of 140k times per sec), it is not deadlocked.

We have not yet been able to reproduce this behavior under QEMU system
emulation.

When bisecting between 5.10 and 5.14.11 we can see that this behavior started
with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:

	powerpc: remove arguments from fault handler functions

Our best guess that the instruction storage exception is not properly handled
and the kernel is never able to recover from the page fault, but we don't
really know how to proceed. Does anyone have any suggestions or insights?

Regards,
Jacques de Laval

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-10-28 12:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <uqZVxyE3l9oalZp_hyXFJvdH-ADNTvpOuQeoNGyqrUcoNgh9afea1-FzfZKMgiaF5WxY4kdMQlJYzmjvdQ2E2joF86-mEcaxdifht9z8NA0=@protonmail.com>
2021-10-27  4:10 ` instruction storage exception handling Nicholas Piggin
2021-10-27  5:00   ` Christophe Leroy
2021-10-27  5:25     ` Nicholas Piggin
2021-10-27  5:51       ` Christophe Leroy
2021-10-27  7:47         ` Nicholas Piggin
2021-10-27  7:52           ` Christophe Leroy
2021-10-27 12:03             ` Jacques de Laval
2021-10-28  3:01               ` Nicholas Piggin
2021-10-28  9:08                 ` Jacques de Laval
2021-10-28  9:35                   ` Nicholas Piggin
2021-10-28 12:42                     ` Nicholas Piggin
2021-10-26 22:44 Jacques de Laval

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).