All of lore.kernel.org
 help / color / mirror / Atom feed
* Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
@ 2021-07-04 21:38 Radu Rendec
  2021-07-06 10:43 ` Christophe Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Radu Rendec @ 2021-07-04 21:38 UTC (permalink / raw)
  To: linuxppc-dev

Hi Everyone,

I'm trying to set up my (virtual) environment to test an old bug in the
PPC32 ptrace() code. I came across a completely different problem,
which seems to make gdb pretty much unusable on PPC32. I'm not sure if
this is a real kernel bug or maybe something wrong with my
configuration.

I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
native gdb (inside the VM) and setting a breakpoint in main() in a test
"hello world" program. Upon running the test program, I am hitting the
BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
included below at the end of the email.

FWIW, gdb says:
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.

I also added a pr_info() to do_notify_resume() just to see how much
different 'regs' and 'current->thread.regs' are. Surprisingly, they are
just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
be OK (pid and comm are consistent with the test program).

If I'm not missing something, the 'regs' pointer that is eventually
passed to do_notify_resume() is calculated in interrupt_return() by
adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That
means all registers are saved on the stack before entering the
interrupt handler and, upon returning, a pointer to the register
structure is calculated from the stack pointer. Either the offset
itself is wrong, or the stack pointer is off by 0x30.

This is as far as I have gone. Hopefully this rings a bell to someone
who is familiar with that part of the code and has a better
understanding of PPC32 interrupt handling in general.

Last but not least, my configuration is fairly standard. I'm using the
powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin
to compile everything (kernel and user-space). The qemu version is
5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs
that I built using Buildroot 2021.05. The gdb version is 9.2.

regs=c7955f10 cur=c7955f40 pid=138 comm=test
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/signal.c:296!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
Modules linked in:
CPU: 0 PID: 138 Comm: test Not tainted 5.13.0-dirty #3
NIP:  c0009694 LR: c0009694 CTR: c065f540
REGS: c7955dc0 TRAP: 0700   Not tainted  (5.13.0-dirty)
MSR:  00028002 <CE,EE>  CR: 28000282  XER: 20000000

GPR00: c0009694 c7955e80 c7145100 0000002c dfbdc3d4 dfbe5d24 00000027 dfbdc3d8 
GPR08: c0ffe988 00000000 00000000 00000000 22000282 00000000 00000000 b7fe17b4 
GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 
GPR24: bffff970 b7fff130 00000001 bffff960 c7955f10 00000800 c7145100 00000102 
NIP [c0009694] do_notify_resume+0x314/0x320
LR [c0009694] do_notify_resume+0x314/0x320
Call Trace:
[c7955e80] [c0009694] do_notify_resume+0x314/0x320 (unreliable)
[c7955ee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
[c7955f00] [c00151e8] interrupt_return+0x14/0x13c
--- interrupt: 7d8 at 0xb7fc3714
NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
REGS: c7955f10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000

GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 
GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 
GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 
GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 
NIP [b7fc3714] 0xb7fc3714
LR [b7fc3714] 0xb7fc3714
--- interrupt: 7d8
Instruction dump:
4bffff04 7c0802a6 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 
7f84e378 38e204b0 3863ce30 4809d819 <0fe00000> 60000000 60000000 3d20c0ff 
---[ end trace 065671519ba3d526 ]---

Note: the BUG() line is slightly different because I had made this
small change to print the pointers:

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 9ded046edb0e..57ea6e500a6f 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -289,7 +289,12 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 		klp_update_patch_state(current);
 
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
-		BUG_ON(regs != current->thread.regs);
+		if (regs != current->thread.regs) {
+			pr_info("regs=%px cur=%px pid=%d comm=%s\n",
+				regs, current->thread.regs,
+				current->pid, current->comm);
+			BUG();
+		}
 		do_signal(current);
 	}
 



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-04 21:38 Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP Radu Rendec
@ 2021-07-06 10:43 ` Christophe Leroy
  2021-07-06 11:56   ` Radu Rendec
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2021-07-06 10:43 UTC (permalink / raw)
  To: Radu Rendec, linuxppc-dev



Le 04/07/2021 à 23:38, Radu Rendec a écrit :
> Hi Everyone,
> 
> I'm trying to set up my (virtual) environment to test an old bug in the
> PPC32 ptrace() code. I came across a completely different problem,
> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
> this is a real kernel bug or maybe something wrong with my
> configuration.
> 
> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
> native gdb (inside the VM) and setting a breakpoint in main() in a test
> "hello world" program. Upon running the test program, I am hitting the
> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
> included below at the end of the email.
> 
> FWIW, gdb says:
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> The program no longer exists.
> 
> I also added a pr_info() to do_notify_resume() just to see how much
> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
> be OK (pid and comm are consistent with the test program).

The TRAP = 0x7d8 is obviously wrong.

Need to know which 'TRAP' it is exactly.
Could you try to dump what we have at the correct regs ?
Something like 'show_regs(current->thread.regs)' should do it.


> 
> If I'm not missing something, the 'regs' pointer that is eventually
> passed to do_notify_resume() is calculated in interrupt_return() by
> adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That
> means all registers are saved on the stack before entering the
> interrupt handler and, upon returning, a pointer to the register
> structure is calculated from the stack pointer. Either the offset
> itself is wrong, or the stack pointer is off by 0x30.
> 
> This is as far as I have gone. Hopefully this rings a bell to someone
> who is familiar with that part of the code and has a better
> understanding of PPC32 interrupt handling in general.
> 
> Last but not least, my configuration is fairly standard. I'm using the
> powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin
> to compile everything (kernel and user-space). The qemu version is
> 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs
> that I built using Buildroot 2021.05. The gdb version is 9.2.
> 
> regs=c7955f10 cur=c7955f40 pid=138 comm=test
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/kernel/signal.c:296!
> Oops: Exception in kernel mode, sig: 5 [#1]
> BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
> Modules linked in:
> CPU: 0 PID: 138 Comm: test Not tainted 5.13.0-dirty #3
> NIP:  c0009694 LR: c0009694 CTR: c065f540
> REGS: c7955dc0 TRAP: 0700   Not tainted  (5.13.0-dirty)
> MSR:  00028002 <CE,EE>  CR: 28000282  XER: 20000000
> 
> GPR00: c0009694 c7955e80 c7145100 0000002c dfbdc3d4 dfbe5d24 00000027 dfbdc3d8
> GPR08: c0ffe988 00000000 00000000 00000000 22000282 00000000 00000000 b7fe17b4
> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
> GPR24: bffff970 b7fff130 00000001 bffff960 c7955f10 00000800 c7145100 00000102
> NIP [c0009694] do_notify_resume+0x314/0x320
> LR [c0009694] do_notify_resume+0x314/0x320
> Call Trace:
> [c7955e80] [c0009694] do_notify_resume+0x314/0x320 (unreliable)
> [c7955ee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
> [c7955f00] [c00151e8] interrupt_return+0x14/0x13c
> --- interrupt: 7d8 at 0xb7fc3714
> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
> REGS: c7955f10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
> 
> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
> NIP [b7fc3714] 0xb7fc3714
> LR [b7fc3714] 0xb7fc3714
> --- interrupt: 7d8
> Instruction dump:
> 4bffff04 7c0802a6 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc
> 7f84e378 38e204b0 3863ce30 4809d819 <0fe00000> 60000000 60000000 3d20c0ff
> ---[ end trace 065671519ba3d526 ]---
> 
> Note: the BUG() line is slightly different because I had made this
> small change to print the pointers:
> 
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 9ded046edb0e..57ea6e500a6f 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -289,7 +289,12 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
>   		klp_update_patch_state(current);
>   
>   	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
> -		BUG_ON(regs != current->thread.regs);
> +		if (regs != current->thread.regs) {
> +			pr_info("regs=%px cur=%px pid=%d comm=%s\n",
> +				regs, current->thread.regs,
> +				current->pid, current->comm);
> +			BUG();
> +		}
>   		do_signal(current);
>   	}
>   
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-06 10:43 ` Christophe Leroy
@ 2021-07-06 11:56   ` Radu Rendec
  2021-07-06 13:16     ` Christophe Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Radu Rendec @ 2021-07-06 11:56 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
> > I'm trying to set up my (virtual) environment to test an old bug in the
> > PPC32 ptrace() code. I came across a completely different problem,
> > which seems to make gdb pretty much unusable on PPC32. I'm not sure if
> > this is a real kernel bug or maybe something wrong with my
> > configuration.
> >
> > I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
> > native gdb (inside the VM) and setting a breakpoint in main() in a test
> > "hello world" program. Upon running the test program, I am hitting the
> > BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
> > included below at the end of the email.
> >
> > FWIW, gdb says:
> > Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> > The program no longer exists.
> >
> > I also added a pr_info() to do_notify_resume() just to see how much
> > different 'regs' and 'current->thread.regs' are. Surprisingly, they are
> > just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
> > be OK (pid and comm are consistent with the test program).
>
> The TRAP = 0x7d8 is obviously wrong.
>
> Need to know which 'TRAP' it is exactly.
> Could you try to dump what we have at the correct regs ?
> Something like 'show_regs(current->thread.regs)' should do it.

Sure, please see the output below. It looks to me like the "correct"
regs are just garbage. Either they are overwritten or current->thread.regs
is wrong. But in any case, r1 = 0 doesn't look good.

regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test
CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
NIP:  10000338 LR: 00030000 CTR: 00000003
REGS: c7a0ff40 TRAP: 670000   Not tainted  (5.13.0-dirty)
MSR:  1002d202 <CE,EE,PR,ME,DE>  CR: 10000004  XER: 80670100

GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10 
GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960 
GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003 
GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850 
NIP [10000338] 0x10000338
LR [00030000] 0x30000
Call Trace:
[c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable)
[c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320
[c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
[c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
--- interrupt: 7d8 at 0xb7fc3714
NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000

GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 
GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 
GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 
GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 
NIP [b7fc3714] 0xb7fc3714
LR [b7fc3714] 0xb7fc3714
--- interrupt: 7d8
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/signal.c:298!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
Modules linked in:
CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
NIP:  c000969c LR: c000969c CTR: c065f540
REGS: c7a0fdc0 TRAP: 0700   Not tainted  (5.13.0-dirty)
MSR:  00028002 <CE,EE>  CR: 28000282  XER: 20000000

GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002 
GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4 
GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 
GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102 
NIP [c000969c] do_notify_resume+0x31c/0x320
LR [c000969c] do_notify_resume+0x31c/0x320
Call Trace:
[c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable)
[c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
[c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
--- interrupt: 7d8 at 0xb7fc3714
NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000

GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 
GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 
GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 
GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 
NIP [b7fc3714] 0xb7fc3714
LR [b7fc3714] 0xb7fc3714
--- interrupt: 7d8
Instruction dump:
93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0 
3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000 
---[ end trace 1cbf27cd19ae39a0 ]---

> > If I'm not missing something, the 'regs' pointer that is eventually
> > passed to do_notify_resume() is calculated in interrupt_return() by
> > adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That
> > means all registers are saved on the stack before entering the
> > interrupt handler and, upon returning, a pointer to the register
> > structure is calculated from the stack pointer. Either the offset
> > itself is wrong, or the stack pointer is off by 0x30.
> >
> > This is as far as I have gone. Hopefully this rings a bell to someone
> > who is familiar with that part of the code and has a better
> > understanding of PPC32 interrupt handling in general.
> >
> > Last but not least, my configuration is fairly standard. I'm using the
> > powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin
> > to compile everything (kernel and user-space). The qemu version is
> > 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs
> > that I built using Buildroot 2021.05. The gdb version is 9.2.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-06 11:56   ` Radu Rendec
@ 2021-07-06 13:16     ` Christophe Leroy
  2021-07-06 13:50       ` Radu Rendec
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2021-07-06 13:16 UTC (permalink / raw)
  To: Radu Rendec, linuxppc-dev, Nicholas Piggin



Le 06/07/2021 à 13:56, Radu Rendec a écrit :
> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>> PPC32 ptrace() code. I came across a completely different problem,
>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>> this is a real kernel bug or maybe something wrong with my
>>> configuration.
>>>
>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>> "hello world" program. Upon running the test program, I am hitting the
>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>> included below at the end of the email.
>>>
>>> FWIW, gdb says:
>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>> The program no longer exists.
>>>
>>> I also added a pr_info() to do_notify_resume() just to see how much
>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>> be OK (pid and comm are consistent with the test program).
>>
>> The TRAP = 0x7d8 is obviously wrong.
>>
>> Need to know which 'TRAP' it is exactly.
>> Could you try to dump what we have at the correct regs ?
>> Something like 'show_regs(current->thread.regs)' should do it.
> 
> Sure, please see the output below. It looks to me like the "correct"
> regs are just garbage. Either they are overwritten or current->thread.regs
> is wrong. But in any case, r1 = 0 doesn't look good.

Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct 
exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.

Commit 
https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26

Need to find the best solution now to fix that.

> 
> regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test
> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
> NIP:  10000338 LR: 00030000 CTR: 00000003
> REGS: c7a0ff40 TRAP: 670000   Not tainted  (5.13.0-dirty)
> MSR:  1002d202 <CE,EE,PR,ME,DE>  CR: 10000004  XER: 80670100
> 
> GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10
> GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960
> GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003
> GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850
> NIP [10000338] 0x10000338
> LR [00030000] 0x30000
> Call Trace:
> [c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable)
> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320
> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
> --- interrupt: 7d8 at 0xb7fc3714
> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
> REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
> 
> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
> NIP [b7fc3714] 0xb7fc3714
> LR [b7fc3714] 0xb7fc3714
> --- interrupt: 7d8
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/kernel/signal.c:298!
> Oops: Exception in kernel mode, sig: 5 [#1]
> BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
> Modules linked in:
> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
> NIP:  c000969c LR: c000969c CTR: c065f540
> REGS: c7a0fdc0 TRAP: 0700   Not tainted  (5.13.0-dirty)
> MSR:  00028002 <CE,EE>  CR: 28000282  XER: 20000000
> 
> GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002
> GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4
> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
> GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102
> NIP [c000969c] do_notify_resume+0x31c/0x320
> LR [c000969c] do_notify_resume+0x31c/0x320
> Call Trace:
> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable)
> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
> --- interrupt: 7d8 at 0xb7fc3714
> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
> REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
> 
> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
> NIP [b7fc3714] 0xb7fc3714
> LR [b7fc3714] 0xb7fc3714
> --- interrupt: 7d8
> Instruction dump:
> 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0
> 3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000
> ---[ end trace 1cbf27cd19ae39a0 ]---
> 
>>> If I'm not missing something, the 'regs' pointer that is eventually
>>> passed to do_notify_resume() is calculated in interrupt_return() by
>>> adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That
>>> means all registers are saved on the stack before entering the
>>> interrupt handler and, upon returning, a pointer to the register
>>> structure is calculated from the stack pointer. Either the offset
>>> itself is wrong, or the stack pointer is off by 0x30.
>>>
>>> This is as far as I have gone. Hopefully this rings a bell to someone
>>> who is familiar with that part of the code and has a better
>>> understanding of PPC32 interrupt handling in general.
>>>
>>> Last but not least, my configuration is fairly standard. I'm using the
>>> powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin
>>> to compile everything (kernel and user-space). The qemu version is
>>> 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs
>>> that I built using Buildroot 2021.05. The gdb version is 9.2.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-06 13:16     ` Christophe Leroy
@ 2021-07-06 13:50       ` Radu Rendec
  2021-07-06 13:53         ` Christophe Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Radu Rendec @ 2021-07-06 13:50 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, Nicholas Piggin

On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>Le 06/07/2021 à 13:56, Radu Rendec a écrit :
>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>> this is a real kernel bug or maybe something wrong with my
>>>> configuration.
>>>>
>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>> "hello world" program. Upon running the test program, I am hitting the
>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>> included below at the end of the email.
>>>>
>>>> FWIW, gdb says:
>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>> The program no longer exists.
>>>>
>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>> be OK (pid and comm are consistent with the test program).
>>>
>>> The TRAP = 0x7d8 is obviously wrong.
>>>
>>> Need to know which 'TRAP' it is exactly.
>>> Could you try to dump what we have at the correct regs ?
>>> Something like 'show_regs(current->thread.regs)' should do it.
>> 
>> Sure, please see the output below. It looks to me like the "correct"
>> regs are just garbage. Either they are overwritten or current->thread.regs
>> is wrong. But in any case, r1 = 0 doesn't look good.
>
>Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct 
>exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.
>
>Commit 
>https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>
>Need to find the best solution now to fix that.

Awesome, happy to see you figured it out so quickly.

I'm not sure if it makes any sense, but one thing that comes to mind is
to put struct exception_regs before struct pt_regs when the frame is
saved. Unless of course other parts of the code expect the opposite.

>> regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test
>> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
>> NIP:  10000338 LR: 00030000 CTR: 00000003
>> REGS: c7a0ff40 TRAP: 670000   Not tainted  (5.13.0-dirty)
>> MSR:  1002d202 <CE,EE,PR,ME,DE>  CR: 10000004  XER: 80670100
>> 
>> GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10
>> GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960
>> GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003
>> GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850
>> NIP [10000338] 0x10000338
>> LR [00030000] 0x30000
>> Call Trace:
>> [c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable)
>> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320
>> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
>> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
>> --- interrupt: 7d8 at 0xb7fc3714
>> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
>> REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
>> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
>> 
>> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
>> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
>> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
>> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
>> NIP [b7fc3714] 0xb7fc3714
>> LR [b7fc3714] 0xb7fc3714
>> --- interrupt: 7d8
>> ------------[ cut here ]------------
>> kernel BUG at arch/powerpc/kernel/signal.c:298!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
>> Modules linked in:
>> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
>> NIP:  c000969c LR: c000969c CTR: c065f540
>> REGS: c7a0fdc0 TRAP: 0700   Not tainted  (5.13.0-dirty)
>> MSR:  00028002 <CE,EE>  CR: 28000282  XER: 20000000
>> 
>> GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002
>> GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4
>> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
>> GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102
>> NIP [c000969c] do_notify_resume+0x31c/0x320
>> LR [c000969c] do_notify_resume+0x31c/0x320
>> Call Trace:
>> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable)
>> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
>> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
>> --- interrupt: 7d8 at 0xb7fc3714
>> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
>> REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
>> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
>> 
>> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
>> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
>> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
>> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
>> NIP [b7fc3714] 0xb7fc3714
>> LR [b7fc3714] 0xb7fc3714
>> --- interrupt: 7d8
>> Instruction dump:
>> 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0
>> 3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000
>> ---[ end trace 1cbf27cd19ae39a0 ]---
>> 
>>>> If I'm not missing something, the 'regs' pointer that is eventually
>>>> passed to do_notify_resume() is calculated in interrupt_return() by
>>>> adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That
>>>> means all registers are saved on the stack before entering the
>>>> interrupt handler and, upon returning, a pointer to the register
>>>> structure is calculated from the stack pointer. Either the offset
>>>> itself is wrong, or the stack pointer is off by 0x30.
>>>>
>>>> This is as far as I have gone. Hopefully this rings a bell to someone
>>>> who is familiar with that part of the code and has a better
>>>> understanding of PPC32 interrupt handling in general.
>>>>
>>>> Last but not least, my configuration is fairly standard. I'm using the
>>>> powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin
>>>> to compile everything (kernel and user-space). The qemu version is
>>>> 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs
>>>> that I built using Buildroot 2021.05. The gdb version is 9.2.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-06 13:50       ` Radu Rendec
@ 2021-07-06 13:53         ` Christophe Leroy
  2021-07-06 14:05           ` Radu Rendec
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2021-07-06 13:53 UTC (permalink / raw)
  To: Radu Rendec, linuxppc-dev, Nicholas Piggin



Le 06/07/2021 à 15:50, Radu Rendec a écrit :
> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>> Le 06/07/2021 à 13:56, Radu Rendec a écrit :
>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>>> this is a real kernel bug or maybe something wrong with my
>>>>> configuration.
>>>>>
>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>>> "hello world" program. Upon running the test program, I am hitting the
>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>>> included below at the end of the email.
>>>>>
>>>>> FWIW, gdb says:
>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>>> The program no longer exists.
>>>>>
>>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>>> be OK (pid and comm are consistent with the test program).
>>>>
>>>> The TRAP = 0x7d8 is obviously wrong.
>>>>
>>>> Need to know which 'TRAP' it is exactly.
>>>> Could you try to dump what we have at the correct regs ?
>>>> Something like 'show_regs(current->thread.regs)' should do it.
>>>
>>> Sure, please see the output below. It looks to me like the "correct"
>>> regs are just garbage. Either they are overwritten or current->thread.regs
>>> is wrong. But in any case, r1 = 0 doesn't look good.
>>
>> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct
>> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.
>>
>> Commit
>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>>
>> Need to find the best solution now to fix that.
> 
> Awesome, happy to see you figured it out so quickly.
> 
> I'm not sure if it makes any sense, but one thing that comes to mind is
> to put struct exception_regs before struct pt_regs when the frame is
> saved. Unless of course other parts of the code expect the opposite.
> 

Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-06 13:53         ` Christophe Leroy
@ 2021-07-06 14:05           ` Radu Rendec
  2021-07-06 14:11             ` Christophe Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Radu Rendec @ 2021-07-06 14:05 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, Nicholas Piggin

On Tue, 2021-07-06 at 15:53 +0200, Christophe Leroy wrote:
>Le 06/07/2021 à 15:50, Radu Rendec a écrit :
>> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>>> Le 06/07/2021 à 13:56, Radu Rendec a écrit :
>>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>>>> this is a real kernel bug or maybe something wrong with my
>>>>>> configuration.
>>>>>>
>>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>>>> "hello world" program. Upon running the test program, I am hitting the
>>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>>>> included below at the end of the email.
>>>>>>
>>>>>> FWIW, gdb says:
>>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>>>> The program no longer exists.
>>>>>>
>>>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>>>> be OK (pid and comm are consistent with the test program).
>>>>>
>>>>> The TRAP = 0x7d8 is obviously wrong.
>>>>>
>>>>> Need to know which 'TRAP' it is exactly.
>>>>> Could you try to dump what we have at the correct regs ?
>>>>> Something like 'show_regs(current->thread.regs)' should do it.
>>>>
>>>> Sure, please see the output below. It looks to me like the "correct"
>>>> regs are just garbage. Either they are overwritten or current->thread.regs
>>>> is wrong. But in any case, r1 = 0 doesn't look good.
>>>
>>> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct
>>> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.
>>>
>>> Commit
>>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>>>
>>> Need to find the best solution now to fix that.
>> 
>> Awesome, happy to see you figured it out so quickly.
>> 
>> I'm not sure if it makes any sense, but one thing that comes to mind is
>> to put struct exception_regs before struct pt_regs when the frame is
>> saved. Unless of course other parts of the code expect the opposite.
>
>Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though.

I can take a stab at it. I'm not familiar with that part of the code,
but the best way to learn is to get your hands dirty :) In the worst
case, I won't fix it.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP
  2021-07-06 14:05           ` Radu Rendec
@ 2021-07-06 14:11             ` Christophe Leroy
  0 siblings, 0 replies; 8+ messages in thread
From: Christophe Leroy @ 2021-07-06 14:11 UTC (permalink / raw)
  To: Radu Rendec, linuxppc-dev, Nicholas Piggin



Le 06/07/2021 à 16:05, Radu Rendec a écrit :
> On Tue, 2021-07-06 at 15:53 +0200, Christophe Leroy wrote:
>> Le 06/07/2021 à 15:50, Radu Rendec a écrit :
>>> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>>>> Le 06/07/2021 à 13:56, Radu Rendec a écrit :
>>>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>>>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>>>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>>>>> this is a real kernel bug or maybe something wrong with my
>>>>>>> configuration.
>>>>>>>
>>>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>>>>> "hello world" program. Upon running the test program, I am hitting the
>>>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>>>>> included below at the end of the email.
>>>>>>>
>>>>>>> FWIW, gdb says:
>>>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>>>>> The program no longer exists.
>>>>>>>
>>>>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>>>>> be OK (pid and comm are consistent with the test program).
>>>>>>
>>>>>> The TRAP = 0x7d8 is obviously wrong.
>>>>>>
>>>>>> Need to know which 'TRAP' it is exactly.
>>>>>> Could you try to dump what we have at the correct regs ?
>>>>>> Something like 'show_regs(current->thread.regs)' should do it.
>>>>>
>>>>> Sure, please see the output below. It looks to me like the "correct"
>>>>> regs are just garbage. Either they are overwritten or current->thread.regs
>>>>> is wrong. But in any case, r1 = 0 doesn't look good.
>>>>
>>>> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct
>>>> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.
>>>>
>>>> Commit
>>>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>>>>
>>>> Need to find the best solution now to fix that.
>>>
>>> Awesome, happy to see you figured it out so quickly.
>>>
>>> I'm not sure if it makes any sense, but one thing that comes to mind is
>>> to put struct exception_regs before struct pt_regs when the frame is
>>> saved. Unless of course other parts of the code expect the opposite.
>>
>> Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though.
> 
> I can take a stab at it. I'm not familiar with that part of the code,
> but the best way to learn is to get your hands dirty :) In the worst
> case, I won't fix it.
> 

Not that easy in fact.
After looking once more, the best solution I see now would be to move the content of struct 
exception_regs into the second part of struct pt_regs (the kernel one in asm/ptrace.h).

Changes should be limited to head_booke.h and asm-offsets.c
struct exception_regs and STACK_EXC_LVL_FRAME_SIZE should go away.

Christophe

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-07-06 14:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-04 21:38 Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP Radu Rendec
2021-07-06 10:43 ` Christophe Leroy
2021-07-06 11:56   ` Radu Rendec
2021-07-06 13:16     ` Christophe Leroy
2021-07-06 13:50       ` Radu Rendec
2021-07-06 13:53         ` Christophe Leroy
2021-07-06 14:05           ` Radu Rendec
2021-07-06 14:11             ` Christophe Leroy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.