* [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
@ 2023-03-27 11:56 ` Yunhui Cui
0 siblings, 0 replies; 10+ messages in thread
From: Yunhui Cui @ 2023-03-27 11:56 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, cuiyunhui, linux-riscv, linux-kernel
We encountered such a problem(logs are below). We are more curious about
which instruction execution caused the exception. After dumping it
through show_opcodes(), we found that it was caused by a floating-point
instruction.
In this way, we found the problem: in the system bringup , it is
precisely that we have not enabled the floating point function.
So when an exception occurs, it is necessary to dump the instruction
that caused the exception, like x86/fault (ba54d856a9d8).
Logs:
[ 0.822481] Run /init as init process
[ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
[ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
[ 0.932949] Hardware name: , BIOS
[ 0.933399] epc : 000000000005e028 ra : 000000000007c7c4 sp : 0000003fffd45da0
[ 0.933855] gp : ffffffff816ea2d8 tp : 0000000000000000 t0 : 0000000000000000
[ 0.934303] t1 : 0000003fffd35df0 t2 : 0000000000000000 s0 : 0000000000000000
[ 0.934734] s1 : 0000000000000000 a0 : 0000003fffd36190 a1 : 0000003fffd45e18
[ 0.935200] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[ 0.935622] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
[ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
[ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
[ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/bug.h | 1 +
arch/riscv/kernel/process.c | 30 ++++++++++++++++++++++++++++++
arch/riscv/kernel/traps.c | 1 +
3 files changed, 32 insertions(+)
diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
index d3804a2f9aad..77655dd10a2c 100644
--- a/arch/riscv/include/asm/bug.h
+++ b/arch/riscv/include/asm/bug.h
@@ -86,6 +86,7 @@ struct pt_regs;
struct task_struct;
void __show_regs(struct pt_regs *regs);
+void show_opcodes(struct pt_regs *regs);
void die(struct pt_regs *regs, const char *str);
void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 03ac3aa611f5..759dc74fe160 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -83,6 +83,36 @@ void show_regs(struct pt_regs *regs)
dump_backtrace(regs, NULL, KERN_DEFAULT);
}
+static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
+ unsigned int nbytes)
+{
+ if (!user_mode(regs))
+ return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
+
+ /* The user space code from other tasks cannot be accessed. */
+ if (regs != task_pt_regs(current))
+ return -EPERM;
+
+ return copy_from_user_nofault(buf, (void __user *)src, nbytes);
+}
+
+void show_opcodes(struct pt_regs *regs)
+{
+ u8 opcodes[4];
+
+ switch (copy_code(regs, opcodes, regs->epc, sizeof(opcodes))) {
+ case 0:
+ pr_info("Opcode: %4ph", opcodes);
+ break;
+ case -EPERM:
+ pr_err("Unable to access userspace of other tasks");
+ break;
+ default:
+ pr_err("Failed to access opcode");
+ break;
+ }
+}
+
void start_thread(struct pt_regs *regs, unsigned long pc,
unsigned long sp)
{
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 0a98fd0ddfe9..a6f6851e4e87 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -68,6 +68,7 @@ void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr)
print_vma_addr(KERN_CONT " in ", instruction_pointer(regs));
pr_cont("\n");
__show_regs(regs);
+ show_opcodes(regs);
}
force_sig_fault(signo, code, (void __user *)addr);
--
2.20.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
@ 2023-03-27 11:56 ` Yunhui Cui
0 siblings, 0 replies; 10+ messages in thread
From: Yunhui Cui @ 2023-03-27 11:56 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, cuiyunhui, linux-riscv, linux-kernel
We encountered such a problem(logs are below). We are more curious about
which instruction execution caused the exception. After dumping it
through show_opcodes(), we found that it was caused by a floating-point
instruction.
In this way, we found the problem: in the system bringup , it is
precisely that we have not enabled the floating point function.
So when an exception occurs, it is necessary to dump the instruction
that caused the exception, like x86/fault (ba54d856a9d8).
Logs:
[ 0.822481] Run /init as init process
[ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
[ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
[ 0.932949] Hardware name: , BIOS
[ 0.933399] epc : 000000000005e028 ra : 000000000007c7c4 sp : 0000003fffd45da0
[ 0.933855] gp : ffffffff816ea2d8 tp : 0000000000000000 t0 : 0000000000000000
[ 0.934303] t1 : 0000003fffd35df0 t2 : 0000000000000000 s0 : 0000000000000000
[ 0.934734] s1 : 0000000000000000 a0 : 0000003fffd36190 a1 : 0000003fffd45e18
[ 0.935200] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[ 0.935622] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
[ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
[ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
[ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/bug.h | 1 +
arch/riscv/kernel/process.c | 30 ++++++++++++++++++++++++++++++
arch/riscv/kernel/traps.c | 1 +
3 files changed, 32 insertions(+)
diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
index d3804a2f9aad..77655dd10a2c 100644
--- a/arch/riscv/include/asm/bug.h
+++ b/arch/riscv/include/asm/bug.h
@@ -86,6 +86,7 @@ struct pt_regs;
struct task_struct;
void __show_regs(struct pt_regs *regs);
+void show_opcodes(struct pt_regs *regs);
void die(struct pt_regs *regs, const char *str);
void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 03ac3aa611f5..759dc74fe160 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -83,6 +83,36 @@ void show_regs(struct pt_regs *regs)
dump_backtrace(regs, NULL, KERN_DEFAULT);
}
+static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
+ unsigned int nbytes)
+{
+ if (!user_mode(regs))
+ return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
+
+ /* The user space code from other tasks cannot be accessed. */
+ if (regs != task_pt_regs(current))
+ return -EPERM;
+
+ return copy_from_user_nofault(buf, (void __user *)src, nbytes);
+}
+
+void show_opcodes(struct pt_regs *regs)
+{
+ u8 opcodes[4];
+
+ switch (copy_code(regs, opcodes, regs->epc, sizeof(opcodes))) {
+ case 0:
+ pr_info("Opcode: %4ph", opcodes);
+ break;
+ case -EPERM:
+ pr_err("Unable to access userspace of other tasks");
+ break;
+ default:
+ pr_err("Failed to access opcode");
+ break;
+ }
+}
+
void start_thread(struct pt_regs *regs, unsigned long pc,
unsigned long sp)
{
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 0a98fd0ddfe9..a6f6851e4e87 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -68,6 +68,7 @@ void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr)
print_vma_addr(KERN_CONT " in ", instruction_pointer(regs));
pr_cont("\n");
__show_regs(regs);
+ show_opcodes(regs);
}
force_sig_fault(signo, code, (void __user *)addr);
--
2.20.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
2023-03-27 11:56 ` Yunhui Cui
@ 2023-03-28 17:03 ` Conor Dooley
-1 siblings, 0 replies; 10+ messages in thread
From: Conor Dooley @ 2023-03-28 17:03 UTC (permalink / raw)
To: Yunhui Cui; +Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5088 bytes --]
Hey,
riscv/fault: Dump user opcode bytes on fatal faults
I think you can drop the /fault, we don't usually use prefixes that like
for RISC-V.
On Mon, Mar 27, 2023 at 07:56:42PM +0800, Yunhui Cui wrote:
> We encountered such a problem(logs are below). We are more curious about
> which instruction execution caused the exception. After dumping it
> through show_opcodes(), we found that it was caused by a floating-point
> instruction.
>
> In this way, we found the problem: in the system bringup , it is
> precisely that we have not enabled the floating point function.
What do you mean by that "have not enabled the floating point function"?
> So when an exception occurs, it is necessary to dump the instruction
> that caused the exception, like x86/fault (ba54d856a9d8).
That's not the usual format for referring to commits, checkpatch should
complain about that.
>
> Logs:
> [ 0.822481] Run /init as init process
> [ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
> [ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
5.14-rc4?, oof! Need to get yourself onto a released, LTS kernel I
think!
Anyway, this patch doesn't apply to either riscv/for-next, riscv/fixes
or v6.3-rc1. What is the appropriate base to apply this patch?
> [ 0.932949] Hardware name: , BIOS
> [ 0.933399] epc : 000000000005e028 ra : 000000000007c7c4 sp : 0000003fffd45da0
> [ 0.933855] gp : ffffffff816ea2d8 tp : 0000000000000000 t0 : 0000000000000000
> [ 0.934303] t1 : 0000003fffd35df0 t2 : 0000000000000000 s0 : 0000000000000000
> [ 0.934734] s1 : 0000000000000000 a0 : 0000003fffd36190 a1 : 0000003fffd45e18
> [ 0.935200] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [ 0.935622] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> [ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
> [ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> [ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> [ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
> [ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
> [ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
I have no idea what the significance of this particular backtrace is,
could you elaborate on what this is demonstrating? (and drop the leading
[###] too as it doesn't exactly add anything!
Thanks,
Conor.
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> ---
> arch/riscv/include/asm/bug.h | 1 +
> arch/riscv/kernel/process.c | 30 ++++++++++++++++++++++++++++++
> arch/riscv/kernel/traps.c | 1 +
> 3 files changed, 32 insertions(+)
>
> diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
> index d3804a2f9aad..77655dd10a2c 100644
> --- a/arch/riscv/include/asm/bug.h
> +++ b/arch/riscv/include/asm/bug.h
> @@ -86,6 +86,7 @@ struct pt_regs;
> struct task_struct;
>
> void __show_regs(struct pt_regs *regs);
> +void show_opcodes(struct pt_regs *regs);
> void die(struct pt_regs *regs, const char *str);
> void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr);
>
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 03ac3aa611f5..759dc74fe160 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -83,6 +83,36 @@ void show_regs(struct pt_regs *regs)
> dump_backtrace(regs, NULL, KERN_DEFAULT);
> }
>
> +static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
> + unsigned int nbytes)
> +{
> + if (!user_mode(regs))
> + return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
> +
> + /* The user space code from other tasks cannot be accessed. */
> + if (regs != task_pt_regs(current))
> + return -EPERM;
> +
> + return copy_from_user_nofault(buf, (void __user *)src, nbytes);
> +}
> +
> +void show_opcodes(struct pt_regs *regs)
> +{
> + u8 opcodes[4];
> +
> + switch (copy_code(regs, opcodes, regs->epc, sizeof(opcodes))) {
> + case 0:
> + pr_info("Opcode: %4ph", opcodes);
> + break;
> + case -EPERM:
> + pr_err("Unable to access userspace of other tasks");
> + break;
> + default:
> + pr_err("Failed to access opcode");
> + break;
> + }
> +}
> +
> void start_thread(struct pt_regs *regs, unsigned long pc,
> unsigned long sp)
> {
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 0a98fd0ddfe9..a6f6851e4e87 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -68,6 +68,7 @@ void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr)
> print_vma_addr(KERN_CONT " in ", instruction_pointer(regs));
> pr_cont("\n");
> __show_regs(regs);
> + show_opcodes(regs);
> }
>
> force_sig_fault(signo, code, (void __user *)addr);
> --
> 2.20.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
@ 2023-03-28 17:03 ` Conor Dooley
0 siblings, 0 replies; 10+ messages in thread
From: Conor Dooley @ 2023-03-28 17:03 UTC (permalink / raw)
To: Yunhui Cui; +Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel
[-- Attachment #1.1: Type: text/plain, Size: 5088 bytes --]
Hey,
riscv/fault: Dump user opcode bytes on fatal faults
I think you can drop the /fault, we don't usually use prefixes that like
for RISC-V.
On Mon, Mar 27, 2023 at 07:56:42PM +0800, Yunhui Cui wrote:
> We encountered such a problem(logs are below). We are more curious about
> which instruction execution caused the exception. After dumping it
> through show_opcodes(), we found that it was caused by a floating-point
> instruction.
>
> In this way, we found the problem: in the system bringup , it is
> precisely that we have not enabled the floating point function.
What do you mean by that "have not enabled the floating point function"?
> So when an exception occurs, it is necessary to dump the instruction
> that caused the exception, like x86/fault (ba54d856a9d8).
That's not the usual format for referring to commits, checkpatch should
complain about that.
>
> Logs:
> [ 0.822481] Run /init as init process
> [ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
> [ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
5.14-rc4?, oof! Need to get yourself onto a released, LTS kernel I
think!
Anyway, this patch doesn't apply to either riscv/for-next, riscv/fixes
or v6.3-rc1. What is the appropriate base to apply this patch?
> [ 0.932949] Hardware name: , BIOS
> [ 0.933399] epc : 000000000005e028 ra : 000000000007c7c4 sp : 0000003fffd45da0
> [ 0.933855] gp : ffffffff816ea2d8 tp : 0000000000000000 t0 : 0000000000000000
> [ 0.934303] t1 : 0000003fffd35df0 t2 : 0000000000000000 s0 : 0000000000000000
> [ 0.934734] s1 : 0000000000000000 a0 : 0000003fffd36190 a1 : 0000003fffd45e18
> [ 0.935200] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> [ 0.935622] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
> [ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
> [ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> [ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> [ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
> [ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
> [ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
I have no idea what the significance of this particular backtrace is,
could you elaborate on what this is demonstrating? (and drop the leading
[###] too as it doesn't exactly add anything!
Thanks,
Conor.
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> ---
> arch/riscv/include/asm/bug.h | 1 +
> arch/riscv/kernel/process.c | 30 ++++++++++++++++++++++++++++++
> arch/riscv/kernel/traps.c | 1 +
> 3 files changed, 32 insertions(+)
>
> diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
> index d3804a2f9aad..77655dd10a2c 100644
> --- a/arch/riscv/include/asm/bug.h
> +++ b/arch/riscv/include/asm/bug.h
> @@ -86,6 +86,7 @@ struct pt_regs;
> struct task_struct;
>
> void __show_regs(struct pt_regs *regs);
> +void show_opcodes(struct pt_regs *regs);
> void die(struct pt_regs *regs, const char *str);
> void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr);
>
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 03ac3aa611f5..759dc74fe160 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -83,6 +83,36 @@ void show_regs(struct pt_regs *regs)
> dump_backtrace(regs, NULL, KERN_DEFAULT);
> }
>
> +static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
> + unsigned int nbytes)
> +{
> + if (!user_mode(regs))
> + return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
> +
> + /* The user space code from other tasks cannot be accessed. */
> + if (regs != task_pt_regs(current))
> + return -EPERM;
> +
> + return copy_from_user_nofault(buf, (void __user *)src, nbytes);
> +}
> +
> +void show_opcodes(struct pt_regs *regs)
> +{
> + u8 opcodes[4];
> +
> + switch (copy_code(regs, opcodes, regs->epc, sizeof(opcodes))) {
> + case 0:
> + pr_info("Opcode: %4ph", opcodes);
> + break;
> + case -EPERM:
> + pr_err("Unable to access userspace of other tasks");
> + break;
> + default:
> + pr_err("Failed to access opcode");
> + break;
> + }
> +}
> +
> void start_thread(struct pt_regs *regs, unsigned long pc,
> unsigned long sp)
> {
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 0a98fd0ddfe9..a6f6851e4e87 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -68,6 +68,7 @@ void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr)
> print_vma_addr(KERN_CONT " in ", instruction_pointer(regs));
> pr_cont("\n");
> __show_regs(regs);
> + show_opcodes(regs);
> }
>
> force_sig_fault(signo, code, (void __user *)addr);
> --
> 2.20.1
>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
[-- Attachment #2: Type: text/plain, Size: 161 bytes --]
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
2023-03-28 17:03 ` Conor Dooley
@ 2023-03-29 3:52 ` 运辉崔
-1 siblings, 0 replies; 10+ messages in thread
From: 运辉崔 @ 2023-03-29 3:52 UTC (permalink / raw)
To: Conor Dooley; +Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel
Hi Conor,
On Wed, Mar 29, 2023 at 1:03 AM Conor Dooley <conor@kernel.org> wrote:
> riscv/fault: Dump user opcode bytes on fatal faults
>
> I think you can drop the /fault, we don't usually use prefixes that like
> for RISC-V.
>
ok, i'll update it on v2
> > In this way, we found the problem: in the system bringup , it is
> > precisely that we have not enabled the floating point function.
>
> What do you mean by that "have not enabled the floating point function"?
The related cpu feature(COMPAT_HWCAP_ISA_F) is not enabled in the
riscv_fill_hwcap function interface.
> > So when an exception occurs, it is necessary to dump the instruction
> > that caused the exception, like x86/fault (ba54d856a9d8).
>
> That's not the usual format for referring to commits, checkpatch should
> complain about that.
ok, i'll update it on v2.
> >
> > Logs:
> > [ 0.822481] Run /init as init process
> > [ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
> > [ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
>
> 5.14-rc4?, oof! Need to get yourself onto a released, LTS kernel I
> think!
Just a print,v6.3-rc1 also has this problem.
>
> Anyway, this patch doesn't apply to either riscv/for-next, riscv/fixes
> or v6.3-rc1. What is the appropriate base to apply this patch?
ok, i'll update it on v2.
> > [ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
> > [ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> > [ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> > [ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
> > [ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
> > [ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
>
> I have no idea what the significance of this particular backtrace is,
> could you elaborate on what this is demonstrating? (and drop the leading
> [###] too as it doesn't exactly add anything!
The current call trace does not show the instruction that caused the
exception. ok, I'll remove it on v2.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
@ 2023-03-29 3:52 ` 运辉崔
0 siblings, 0 replies; 10+ messages in thread
From: 运辉崔 @ 2023-03-29 3:52 UTC (permalink / raw)
To: Conor Dooley; +Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel
Hi Conor,
On Wed, Mar 29, 2023 at 1:03 AM Conor Dooley <conor@kernel.org> wrote:
> riscv/fault: Dump user opcode bytes on fatal faults
>
> I think you can drop the /fault, we don't usually use prefixes that like
> for RISC-V.
>
ok, i'll update it on v2
> > In this way, we found the problem: in the system bringup , it is
> > precisely that we have not enabled the floating point function.
>
> What do you mean by that "have not enabled the floating point function"?
The related cpu feature(COMPAT_HWCAP_ISA_F) is not enabled in the
riscv_fill_hwcap function interface.
> > So when an exception occurs, it is necessary to dump the instruction
> > that caused the exception, like x86/fault (ba54d856a9d8).
>
> That's not the usual format for referring to commits, checkpatch should
> complain about that.
ok, i'll update it on v2.
> >
> > Logs:
> > [ 0.822481] Run /init as init process
> > [ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
> > [ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
>
> 5.14-rc4?, oof! Need to get yourself onto a released, LTS kernel I
> think!
Just a print,v6.3-rc1 also has this problem.
>
> Anyway, this patch doesn't apply to either riscv/for-next, riscv/fixes
> or v6.3-rc1. What is the appropriate base to apply this patch?
ok, i'll update it on v2.
> > [ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
> > [ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> > [ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> > [ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
> > [ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
> > [ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
>
> I have no idea what the significance of this particular backtrace is,
> could you elaborate on what this is demonstrating? (and drop the leading
> [###] too as it doesn't exactly add anything!
The current call trace does not show the instruction that caused the
exception. ok, I'll remove it on v2.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
2023-03-29 3:52 ` 运辉崔
@ 2023-03-29 6:24 ` Conor Dooley
-1 siblings, 0 replies; 10+ messages in thread
From: Conor Dooley @ 2023-03-29 6:24 UTC (permalink / raw)
To: 运辉崔
Cc: Conor Dooley, paul.walmsley, palmer, aou, linux-riscv, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2903 bytes --]
Hey 运辉崔,
On Wed, Mar 29, 2023 at 11:52:21AM +0800, 运辉崔 wrote:
> On Wed, Mar 29, 2023 at 1:03 AM Conor Dooley <conor@kernel.org> wrote:
>
> > riscv/fault: Dump user opcode bytes on fatal faults
> >
> > I think you can drop the /fault, we don't usually use prefixes that like
> > for RISC-V.
> >
> ok, i'll update it on v2
>
>
> > > In this way, we found the problem: in the system bringup , it is
> > > precisely that we have not enabled the floating point function.
> >
> > What do you mean by that "have not enabled the floating point function"?
>
> The related cpu feature(COMPAT_HWCAP_ISA_F) is not enabled in the
> riscv_fill_hwcap function interface.
Right, I'm trying to figure out of this is another bug in the kernel -
if you don't have "fd" in riscv,isa in your devicetree then, even if
CONFIG_FPU is set, none of the FPU code is meant to run, right?
> > > So when an exception occurs, it is necessary to dump the instruction
> > > that caused the exception, like x86/fault (ba54d856a9d8).
> >
> > That's not the usual format for referring to commits, checkpatch should
> > complain about that.
>
> ok, i'll update it on v2.
>
> > >
> > > Logs:
> > > [ 0.822481] Run /init as init process
> > > [ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
> > > [ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
> >
> > 5.14-rc4?, oof! Need to get yourself onto a released, LTS kernel I
> > think!
>
> Just a print,v6.3-rc1 also has this problem.
Right, but that's a pretty old kernel to be using!
Seeing 5.14 and the patch not applying made me wonder if the patch was
generated against an old kernel.
> > Anyway, this patch doesn't apply to either riscv/for-next, riscv/fixes
> > or v6.3-rc1. What is the appropriate base to apply this patch?
>
> ok, i'll update it on v2.
>
>
> > > [ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
> > > [ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> > > [ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> > > [ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
> > > [ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
> > > [ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
> >
> > I have no idea what the significance of this particular backtrace is,
> > could you elaborate on what this is demonstrating? (and drop the leading
> > [###] too as it doesn't exactly add anything!
>
> The current call trace does not show the instruction that caused the
> exception. ok, I'll remove it on v2.
What would be nice to have is what the new show_opcodes() function will
look like ;)
Thanks,
Conor.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
@ 2023-03-29 6:24 ` Conor Dooley
0 siblings, 0 replies; 10+ messages in thread
From: Conor Dooley @ 2023-03-29 6:24 UTC (permalink / raw)
To: 运辉崔
Cc: Conor Dooley, paul.walmsley, palmer, aou, linux-riscv, linux-kernel
[-- Attachment #1.1: Type: text/plain, Size: 2903 bytes --]
Hey 运辉崔,
On Wed, Mar 29, 2023 at 11:52:21AM +0800, 运辉崔 wrote:
> On Wed, Mar 29, 2023 at 1:03 AM Conor Dooley <conor@kernel.org> wrote:
>
> > riscv/fault: Dump user opcode bytes on fatal faults
> >
> > I think you can drop the /fault, we don't usually use prefixes that like
> > for RISC-V.
> >
> ok, i'll update it on v2
>
>
> > > In this way, we found the problem: in the system bringup , it is
> > > precisely that we have not enabled the floating point function.
> >
> > What do you mean by that "have not enabled the floating point function"?
>
> The related cpu feature(COMPAT_HWCAP_ISA_F) is not enabled in the
> riscv_fill_hwcap function interface.
Right, I'm trying to figure out of this is another bug in the kernel -
if you don't have "fd" in riscv,isa in your devicetree then, even if
CONFIG_FPU is set, none of the FPU code is meant to run, right?
> > > So when an exception occurs, it is necessary to dump the instruction
> > > that caused the exception, like x86/fault (ba54d856a9d8).
> >
> > That's not the usual format for referring to commits, checkpatch should
> > complain about that.
>
> ok, i'll update it on v2.
>
> > >
> > > Logs:
> > > [ 0.822481] Run /init as init process
> > > [ 0.837569] init[1]: unhandled signal 4 code 0x1 at 0x000000000005e028 in bb[10000+5fe000]
> > > [ 0.932292] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00048-g4a843c9043e8-dirty #138
> >
> > 5.14-rc4?, oof! Need to get yourself onto a released, LTS kernel I
> > think!
>
> Just a print,v6.3-rc1 also has this problem.
Right, but that's a pretty old kernel to be using!
Seeing 5.14 and the patch not applying made me wonder if the patch was
generated against an old kernel.
> > Anyway, this patch doesn't apply to either riscv/for-next, riscv/fixes
> > or v6.3-rc1. What is the appropriate base to apply this patch?
>
> ok, i'll update it on v2.
>
>
> > > [ 0.936073] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
> > > [ 0.936495] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
> > > [ 0.936947] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
> > > [ 0.937487] s11: 0000000000d14980 t3 : 0000000000000000 t4 : 0000000000000000
> > > [ 0.937954] t5 : 0000000000000000 t6 : 0000000000000000
> > > [ 0.938510] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
> >
> > I have no idea what the significance of this particular backtrace is,
> > could you elaborate on what this is demonstrating? (and drop the leading
> > [###] too as it doesn't exactly add anything!
>
> The current call trace does not show the instruction that caused the
> exception. ok, I'll remove it on v2.
What would be nice to have is what the new show_opcodes() function will
look like ;)
Thanks,
Conor.
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
[-- Attachment #2: Type: text/plain, Size: 161 bytes --]
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
2023-03-29 6:24 ` Conor Dooley
@ 2023-03-29 7:35 ` 运辉崔
-1 siblings, 0 replies; 10+ messages in thread
From: 运辉崔 @ 2023-03-29 7:35 UTC (permalink / raw)
To: Conor Dooley
Cc: Conor Dooley, paul.walmsley, palmer, aou, linux-riscv, linux-kernel
Hi Conor,
On Wed, Mar 29, 2023 at 2:25 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> Right, I'm trying to figure out of this is another bug in the kernel -
> if you don't have "fd" in riscv,isa in your devicetree then, even if
> CONFIG_FPU is set, none of the FPU code is meant to run, right?
yeah, CONFIG_FPU is set.
In the problem I encountered, the init to be executed in user mode
contained floating-point instructions, which caused an exception.
> What would be nice to have is what the new show_opcodes() function will
> look like ;)
After printing the contents of __show_regs(), this line will continue
to be printed:
Opcode: 53 80 02 f0
It is not just the problem I encountered. When the process exits
unexpected, we all want to know what the instruction that caused the
process exception is.
Thanks,
Yunhui
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] Re: [PATCH] riscv/fault: Dump user opcode bytes on fatal faults
@ 2023-03-29 7:35 ` 运辉崔
0 siblings, 0 replies; 10+ messages in thread
From: 运辉崔 @ 2023-03-29 7:35 UTC (permalink / raw)
To: Conor Dooley
Cc: Conor Dooley, paul.walmsley, palmer, aou, linux-riscv, linux-kernel
Hi Conor,
On Wed, Mar 29, 2023 at 2:25 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> Right, I'm trying to figure out of this is another bug in the kernel -
> if you don't have "fd" in riscv,isa in your devicetree then, even if
> CONFIG_FPU is set, none of the FPU code is meant to run, right?
yeah, CONFIG_FPU is set.
In the problem I encountered, the init to be executed in user mode
contained floating-point instructions, which caused an exception.
> What would be nice to have is what the new show_opcodes() function will
> look like ;)
After printing the contents of __show_regs(), this line will continue
to be printed:
Opcode: 53 80 02 f0
It is not just the problem I encountered. When the process exits
unexpected, we all want to know what the instruction that caused the
process exception is.
Thanks,
Yunhui
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-03-29 7:38 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-27 11:56 [PATCH] riscv/fault: Dump user opcode bytes on fatal faults Yunhui Cui
2023-03-27 11:56 ` Yunhui Cui
2023-03-28 17:03 ` Conor Dooley
2023-03-28 17:03 ` Conor Dooley
2023-03-29 3:52 ` [External] " 运辉崔
2023-03-29 3:52 ` 运辉崔
2023-03-29 6:24 ` Conor Dooley
2023-03-29 6:24 ` Conor Dooley
2023-03-29 7:35 ` 运辉崔
2023-03-29 7:35 ` 运辉崔
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.