All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2019-01-21 17:07 ` Eugeniy Paltsev
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2019-01-21 17:07 UTC (permalink / raw)
  To: linux-snps-arc, Vineet Gupta
  Cc: linux-kernel, Alexey Brodkin, Eugeniy Paltsev

We process signals in the end of syscall/exception handler.
If the signal is fatal we print register's content using
show_regs function. show_regs() also prints information about
last exception happened.

In case of multicore system we can catch the situation when we
will print wrong information about exception. See the example:
______________________________
CPU-0: started to handle page fault
CPU-1: sent signal to process, which is executed on CPU-0
CPU-0: ended page fault handle. Started to process signal before
       returnig to userspace. Process signal, which is send from
       CPU-0. As th signal is fatal we call show_regs().
       show_regs() will show information about last exception
       which is *page fault* (instead of "trap" which is used for
       signals and happened on CPU-0)

So we will get message like this:
   # ./waitpid02
  potentially unexpected fatal signal 8.
  Path: /home/waitpid02
  CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
  task: 9f11c200 task.stack: 9f3ae000

  [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
  [EFA   ]: 0x00000000
  [BLINK ]: 0x123ea
  [ERET  ]: 0x123ec
    @off 0x123ec in [/home/waitpid02]
    VMA: 0x00010000 to 0x00016000
  [STAT32]: 0x80080882 : IE U
  BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
  LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
  [-----other-info-----]

This message is confusing because it show information about page fault
( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
to signal.

This situation was reproduced with waitpid02 LTP test.
_____________________________

So remove printing information about exceptions from show_regs()
to avoid confusing messages. Print information about exceptions
only in required places instead of show_regs()

Now we don't print information about exceptions if signal is simply
send by another userspace app. So in case of waitpid02 we will print
next message:
_____________________________
   # ./waitpid02
  potentially unexpected fatal signal 8.
  Path: /root/waitpid02
  CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
  [ECR   ]: 0x00050100
  [EFA   ]: 0x00000000
  [BLINK ]: 0x20001486
  [-----other-info-----]
_____________________________

This patch fix
STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec

NOTE:
To be more clear I give examples of different faults (signal-based,
userspace/kernelspace exception-based) with different values of
"/proc/sys/kernel/print-fatal-signals" option.

0) NULL pointer access from user space, print-fatal-signals == 1:
------------>8---------------
 # ./arc_hell
Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
  ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
[STAT32]: 0x80080882 : IE U
BTA: 0x2003a358  SP: 0x5fa27dc4  FP: 0x5fa27de8
LPS: 0x2003a628 LPE: 0x2003a62c LPC: 0x00000000
r00: 0x00000000 r01: 0x200740b0 r02: 0x00000001
r03: 0x00000007 r04: 0x80808080 r05: 0x2f2f2f2f
r06: 0x7c7a2f43 r07: 0x00000000 r08: 0x1a131100
r09: 0x2008b1e0 r10: 0x20003a5c r11: 0x20004038
r12: 0x20039ef8 r13: 0x200740b0 r14: 0x00000000
r15: 0x200740b0 r16: 0x00000000 r17: 0x0007d468
r18: 0x0009313a r19: 0x00000000 r20: 0x0009c22c
r21: 0x0009c23c r22: 0x0009ab64 r23: 0x00000000
r24: 0x0009dfc5 r25: 0x20004b70

Segmentation fault
------------>8---------------

1) NULL pointer access from user space, print-fatal-signals == 0:
------------>8---------------
 # ./arc_hell
Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
  ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
Segmentation fault
------------>8---------------

2) Process killed by signal (waitpid02 test), print-fatal-signals == 1:
------------>8---------------
 # ./waitpid02
potentially unexpected fatal signal 8.
Path: /root/waitpid02
CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20001486
[ERET  ]: 0x2000146c
[STAT32]: 0x80080082 : IE U
BTA: 0x20000fc4  SP: 0x5fa21d64  FP: 0x00000000
LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x00000006
r00: 0x2000c0dc r01: 0x00000018 r02: 0x0001159a
r03: 0x00000001 r04: 0x00000000 r05: 0x00000045
r06: 0x0000004e r07: 0x01010101 r08: 0x000000dc
r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038
r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0
r15: 0x20004078 r16: 0x00000000 r17: 0x20004038
r18: 0x00000001 r19: 0x00000000 r20: 0x0001159a
r21: 0x00000001 r22: 0x00000000 r23: 0x00000004
r24: 0x2000d1fc r25: 0x20004cd0
------------>8---------------

3) Process killed by signal (waitpid02 test), print-fatal-signals == 0:
------------>8---------------
 # ./waitpid02
------------>8---------------

4) NULL pointer access from kernel space, regardless print-fatal-signals
value:
------------>8---------------
Oops
Exception: at stmmac_dvr_probe+0x40/0x7e0:
  ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x90575254
Path: (null)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #11
[EFA   ]: 0x00000000
[BLINK ]: stmmac_dvr_probe+0x2a/0x7e0
[ERET  ]: stmmac_dvr_probe+0x40/0x7e0
[STAT32]: 0x80080402 : IE K
BTA: 0x9057523e  SP: 0xbf02be40  FP: 0x00000000
LPS: 0x9073dbcc LPE: 0x9073dbe0 LPC: 0x00000000
r00: 0xbf1f0000 r01: 0x00000000 r02: 0x00000008
r03: 0x00000000 r04: 0x00000064 r05: 0x00000000
r06: 0x00000003 r07: 0x00000004 r08: 0x01010101
r09: 0x00000031 r10: 0xbf1f0006 r11: 0x00000000
r12: 0x9057523e

Stack Trace:
  stmmac_dvr_probe+0x40/0x7e0
  dwmac_generic_probe+0x50/0xbc
  platform_drv_probe+0x26/0x5c
  really_probe+0x3b0/0x464
  __driver_attach+0xd4/0xd8
  bus_for_each_dev+0x3a/0x70
  bus_add_driver+0x12a/0x18c
  driver_register+0x56/0xe8
  do_one_initcall+0x2e/0x118
  kernel_init_freeable+0x102/0x194
------------>8---------------

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
---
Changes v3->v4:
 * Rebase onto last ARC changes.

Changes v2->v3:
 * Don't show exception message if the corresponding signal is
   handled by application (inspired by x86 implementation)
 * Rebase onto v4.19-rc8

Changes v1->v2:
 * Change format of message about exception.

 arch/arc/include/asm/bug.h     |  1 +
 arch/arc/kernel/traps.c        |  3 +++
 arch/arc/kernel/troubleshoot.c | 46 ++++++++++++++++++++++++++++++++----------
 arch/arc/mm/fault.c            |  9 +++++++++
 4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
index 21ec82466d62..b68f7f82f2d8 100644
--- a/arch/arc/include/asm/bug.h
+++ b/arch/arc/include/asm/bug.h
@@ -16,6 +16,7 @@
 struct task_struct;
 
 void show_regs(struct pt_regs *regs);
+void show_exception_mesg(struct pt_regs *regs);
 void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
 void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 			    unsigned long address);
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index a7fcbc0d3943..b94b13120bff 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -50,6 +50,9 @@ unhandled_exception(const char *str, struct pt_regs *regs,
 
 		tsk->thread.fault_address = (__force unsigned int)addr;
 
+		if (unhandled_signal(tsk, signo))
+			show_exception_mesg(regs);
+
 		force_sig_fault(signo, si_code, addr, tsk);
 
 	} else {
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index 215f515442e0..64e9867a1a65 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -106,13 +106,13 @@ static void show_faulting_vma(unsigned long address)
 			if (IS_ERR(nm))
 				nm = "?";
 		}
-		pr_info("    @off 0x%lx in [%s]\n"
-			"    VMA: 0x%08lx to 0x%08lx\n",
+
+		pr_cont("[off 0x%lx in %s, VMA: %08lx:%08lx] ",
 			vma->vm_start < TASK_UNMAPPED_BASE ?
 				address : address - vma->vm_start,
 			nm, vma->vm_start, vma->vm_end);
 	} else
-		pr_info("    @No matching VMA found\n");
+		pr_cont("[No matching VMA found] ");
 
 	up_read(&active_mm->mmap_sem);
 }
@@ -122,7 +122,7 @@ static void show_ecr_verbose(struct pt_regs *regs)
 	unsigned int vec, cause_code;
 	unsigned long address;
 
-	pr_info("\n[ECR   ]: 0x%08lx => ", regs->event);
+	pr_cont("\n  ECR: 0x%08lx => ", regs->event);
 
 	/* For Data fault, this is data address not instruction addr */
 	address = current->thread.fault_address;
@@ -170,10 +170,37 @@ static void show_ecr_verbose(struct pt_regs *regs)
 	}
 }
 
+static inline void show_exception_mesg_u(struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+
+	pr_info("Exception: %s[%d]: at %pS ",
+		tsk->comm, task_pid_nr(tsk), (void *)regs->ret);
+
+	show_faulting_vma(regs->ret);
+
+	show_ecr_verbose(regs);
+}
+
+static inline void show_exception_mesg_k(struct pt_regs *regs)
+{
+	pr_info("Exception: at %pS:", (void *)regs->ret);
+
+	show_ecr_verbose(regs);
+}
+
 /************************************************************************
  *  API called by rest of kernel
  ***********************************************************************/
 
+void show_exception_mesg(struct pt_regs *regs)
+{
+	if (user_mode(regs))
+		show_exception_mesg_u(regs);
+	else
+		show_exception_mesg_k(regs);
+}
+
 void show_regs(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
@@ -188,15 +215,10 @@ void show_regs(struct pt_regs *regs)
 	print_task_path_n_nm(tsk);
 	show_regs_print_info(KERN_INFO);
 
-	show_ecr_verbose(regs);
-
-	pr_info("[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
-		current->thread.fault_address,
+	pr_info("[ECR   ]: 0x%08lx\n[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
+		regs->event, current->thread.fault_address,
 		(void *)regs->blink, (void *)regs->ret);
 
-	if (user_mode(regs))
-		show_faulting_vma(regs->ret); /* faulting code, not data */
-
 	pr_info("[STAT32]: 0x%08lx", regs->status32);
 
 #define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
@@ -239,6 +261,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 	/* Show fault description */
 	pr_info("\n%s\n", str);
 
+	show_exception_mesg(regs);
+
 	/* Caller and Callee regs */
 	show_regs(regs);
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 8df1638259f3..e44b64107adb 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -202,7 +202,12 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	/* User mode accesses just cause a SIGSEGV */
 	if (user_mode(regs)) {
 		tsk->thread.fault_address = address;
+
+		if (unhandled_signal(tsk, SIGSEGV))
+			show_exception_mesg(regs);
+
 		force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
+
 		return;
 	}
 
@@ -237,5 +242,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 		goto no_context;
 
 	tsk->thread.fault_address = address;
+
+	if (unhandled_signal(tsk, SIGBUS))
+		show_exception_mesg(regs);
+
 	force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address, tsk);
 }
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2019-01-21 17:07 ` Eugeniy Paltsev
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2019-01-21 17:07 UTC (permalink / raw)
  To: linux-snps-arc

We process signals in the end of syscall/exception handler.
If the signal is fatal we print register's content using
show_regs function. show_regs() also prints information about
last exception happened.

In case of multicore system we can catch the situation when we
will print wrong information about exception. See the example:
______________________________
CPU-0: started to handle page fault
CPU-1: sent signal to process, which is executed on CPU-0
CPU-0: ended page fault handle. Started to process signal before
       returnig to userspace. Process signal, which is send from
       CPU-0. As th signal is fatal we call show_regs().
       show_regs() will show information about last exception
       which is *page fault* (instead of "trap" which is used for
       signals and happened on CPU-0)

So we will get message like this:
   # ./waitpid02
  potentially unexpected fatal signal 8.
  Path: /home/waitpid02
  CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
  task: 9f11c200 task.stack: 9f3ae000

  [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
  [EFA   ]: 0x00000000
  [BLINK ]: 0x123ea
  [ERET  ]: 0x123ec
    @off 0x123ec in [/home/waitpid02]
    VMA: 0x00010000 to 0x00016000
  [STAT32]: 0x80080882 : IE U
  BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
  LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
  [-----other-info-----]

This message is confusing because it show information about page fault
( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
to signal.

This situation was reproduced with waitpid02 LTP test.
_____________________________

So remove printing information about exceptions from show_regs()
to avoid confusing messages. Print information about exceptions
only in required places instead of show_regs()

Now we don't print information about exceptions if signal is simply
send by another userspace app. So in case of waitpid02 we will print
next message:
_____________________________
   # ./waitpid02
  potentially unexpected fatal signal 8.
  Path: /root/waitpid02
  CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
  [ECR   ]: 0x00050100
  [EFA   ]: 0x00000000
  [BLINK ]: 0x20001486
  [-----other-info-----]
_____________________________

This patch fix
STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec

NOTE:
To be more clear I give examples of different faults (signal-based,
userspace/kernelspace exception-based) with different values of
"/proc/sys/kernel/print-fatal-signals" option.

0) NULL pointer access from user space, print-fatal-signals == 1:
------------>8---------------
 # ./arc_hell
Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
  ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
[STAT32]: 0x80080882 : IE U
BTA: 0x2003a358  SP: 0x5fa27dc4  FP: 0x5fa27de8
LPS: 0x2003a628 LPE: 0x2003a62c LPC: 0x00000000
r00: 0x00000000 r01: 0x200740b0 r02: 0x00000001
r03: 0x00000007 r04: 0x80808080 r05: 0x2f2f2f2f
r06: 0x7c7a2f43 r07: 0x00000000 r08: 0x1a131100
r09: 0x2008b1e0 r10: 0x20003a5c r11: 0x20004038
r12: 0x20039ef8 r13: 0x200740b0 r14: 0x00000000
r15: 0x200740b0 r16: 0x00000000 r17: 0x0007d468
r18: 0x0009313a r19: 0x00000000 r20: 0x0009c22c
r21: 0x0009c23c r22: 0x0009ab64 r23: 0x00000000
r24: 0x0009dfc5 r25: 0x20004b70

Segmentation fault
------------>8---------------

1) NULL pointer access from user space, print-fatal-signals == 0:
------------>8---------------
 # ./arc_hell
Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
  ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
Segmentation fault
------------>8---------------

2) Process killed by signal (waitpid02 test), print-fatal-signals == 1:
------------>8---------------
 # ./waitpid02
potentially unexpected fatal signal 8.
Path: /root/waitpid02
CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20001486
[ERET  ]: 0x2000146c
[STAT32]: 0x80080082 : IE U
BTA: 0x20000fc4  SP: 0x5fa21d64  FP: 0x00000000
LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x00000006
r00: 0x2000c0dc r01: 0x00000018 r02: 0x0001159a
r03: 0x00000001 r04: 0x00000000 r05: 0x00000045
r06: 0x0000004e r07: 0x01010101 r08: 0x000000dc
r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038
r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0
r15: 0x20004078 r16: 0x00000000 r17: 0x20004038
r18: 0x00000001 r19: 0x00000000 r20: 0x0001159a
r21: 0x00000001 r22: 0x00000000 r23: 0x00000004
r24: 0x2000d1fc r25: 0x20004cd0
------------>8---------------

3) Process killed by signal (waitpid02 test), print-fatal-signals == 0:
------------>8---------------
 # ./waitpid02
------------>8---------------

4) NULL pointer access from kernel space, regardless print-fatal-signals
value:
------------>8---------------
Oops
Exception: at stmmac_dvr_probe+0x40/0x7e0:
  ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x90575254
Path: (null)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #11
[EFA   ]: 0x00000000
[BLINK ]: stmmac_dvr_probe+0x2a/0x7e0
[ERET  ]: stmmac_dvr_probe+0x40/0x7e0
[STAT32]: 0x80080402 : IE K
BTA: 0x9057523e  SP: 0xbf02be40  FP: 0x00000000
LPS: 0x9073dbcc LPE: 0x9073dbe0 LPC: 0x00000000
r00: 0xbf1f0000 r01: 0x00000000 r02: 0x00000008
r03: 0x00000000 r04: 0x00000064 r05: 0x00000000
r06: 0x00000003 r07: 0x00000004 r08: 0x01010101
r09: 0x00000031 r10: 0xbf1f0006 r11: 0x00000000
r12: 0x9057523e

Stack Trace:
  stmmac_dvr_probe+0x40/0x7e0
  dwmac_generic_probe+0x50/0xbc
  platform_drv_probe+0x26/0x5c
  really_probe+0x3b0/0x464
  __driver_attach+0xd4/0xd8
  bus_for_each_dev+0x3a/0x70
  bus_add_driver+0x12a/0x18c
  driver_register+0x56/0xe8
  do_one_initcall+0x2e/0x118
  kernel_init_freeable+0x102/0x194
------------>8---------------

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev at synopsys.com>
---
Changes v3->v4:
 * Rebase onto last ARC changes.

Changes v2->v3:
 * Don't show exception message if the corresponding signal is
   handled by application (inspired by x86 implementation)
 * Rebase onto v4.19-rc8

Changes v1->v2:
 * Change format of message about exception.

 arch/arc/include/asm/bug.h     |  1 +
 arch/arc/kernel/traps.c        |  3 +++
 arch/arc/kernel/troubleshoot.c | 46 ++++++++++++++++++++++++++++++++----------
 arch/arc/mm/fault.c            |  9 +++++++++
 4 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
index 21ec82466d62..b68f7f82f2d8 100644
--- a/arch/arc/include/asm/bug.h
+++ b/arch/arc/include/asm/bug.h
@@ -16,6 +16,7 @@
 struct task_struct;
 
 void show_regs(struct pt_regs *regs);
+void show_exception_mesg(struct pt_regs *regs);
 void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
 void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 			    unsigned long address);
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index a7fcbc0d3943..b94b13120bff 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -50,6 +50,9 @@ unhandled_exception(const char *str, struct pt_regs *regs,
 
 		tsk->thread.fault_address = (__force unsigned int)addr;
 
+		if (unhandled_signal(tsk, signo))
+			show_exception_mesg(regs);
+
 		force_sig_fault(signo, si_code, addr, tsk);
 
 	} else {
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index 215f515442e0..64e9867a1a65 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -106,13 +106,13 @@ static void show_faulting_vma(unsigned long address)
 			if (IS_ERR(nm))
 				nm = "?";
 		}
-		pr_info("    @off 0x%lx in [%s]\n"
-			"    VMA: 0x%08lx to 0x%08lx\n",
+
+		pr_cont("[off 0x%lx in %s, VMA: %08lx:%08lx] ",
 			vma->vm_start < TASK_UNMAPPED_BASE ?
 				address : address - vma->vm_start,
 			nm, vma->vm_start, vma->vm_end);
 	} else
-		pr_info("    @No matching VMA found\n");
+		pr_cont("[No matching VMA found] ");
 
 	up_read(&active_mm->mmap_sem);
 }
@@ -122,7 +122,7 @@ static void show_ecr_verbose(struct pt_regs *regs)
 	unsigned int vec, cause_code;
 	unsigned long address;
 
-	pr_info("\n[ECR   ]: 0x%08lx => ", regs->event);
+	pr_cont("\n  ECR: 0x%08lx => ", regs->event);
 
 	/* For Data fault, this is data address not instruction addr */
 	address = current->thread.fault_address;
@@ -170,10 +170,37 @@ static void show_ecr_verbose(struct pt_regs *regs)
 	}
 }
 
+static inline void show_exception_mesg_u(struct pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+
+	pr_info("Exception: %s[%d]: at %pS ",
+		tsk->comm, task_pid_nr(tsk), (void *)regs->ret);
+
+	show_faulting_vma(regs->ret);
+
+	show_ecr_verbose(regs);
+}
+
+static inline void show_exception_mesg_k(struct pt_regs *regs)
+{
+	pr_info("Exception: at %pS:", (void *)regs->ret);
+
+	show_ecr_verbose(regs);
+}
+
 /************************************************************************
  *  API called by rest of kernel
  ***********************************************************************/
 
+void show_exception_mesg(struct pt_regs *regs)
+{
+	if (user_mode(regs))
+		show_exception_mesg_u(regs);
+	else
+		show_exception_mesg_k(regs);
+}
+
 void show_regs(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
@@ -188,15 +215,10 @@ void show_regs(struct pt_regs *regs)
 	print_task_path_n_nm(tsk);
 	show_regs_print_info(KERN_INFO);
 
-	show_ecr_verbose(regs);
-
-	pr_info("[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
-		current->thread.fault_address,
+	pr_info("[ECR   ]: 0x%08lx\n[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
+		regs->event, current->thread.fault_address,
 		(void *)regs->blink, (void *)regs->ret);
 
-	if (user_mode(regs))
-		show_faulting_vma(regs->ret); /* faulting code, not data */
-
 	pr_info("[STAT32]: 0x%08lx", regs->status32);
 
 #define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
@@ -239,6 +261,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 	/* Show fault description */
 	pr_info("\n%s\n", str);
 
+	show_exception_mesg(regs);
+
 	/* Caller and Callee regs */
 	show_regs(regs);
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 8df1638259f3..e44b64107adb 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -202,7 +202,12 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	/* User mode accesses just cause a SIGSEGV */
 	if (user_mode(regs)) {
 		tsk->thread.fault_address = address;
+
+		if (unhandled_signal(tsk, SIGSEGV))
+			show_exception_mesg(regs);
+
 		force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
+
 		return;
 	}
 
@@ -237,5 +242,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 		goto no_context;
 
 	tsk->thread.fault_address = address;
+
+	if (unhandled_signal(tsk, SIGBUS))
+		show_exception_mesg(regs);
+
 	force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address, tsk);
 }
-- 
2.14.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message
  2019-01-21 17:07 ` Eugeniy Paltsev
@ 2019-01-21 21:56   ` Vineet Gupta
  -1 siblings, 0 replies; 14+ messages in thread
From: Vineet Gupta @ 2019-01-21 21:56 UTC (permalink / raw)
  To: Eugeniy Paltsev, linux-snps-arc; +Cc: linux-kernel, Alexey Brodkin

On 1/21/19 9:07 AM, Eugeniy Paltsev wrote:
> We process signals in the end of syscall/exception handler.
> If the signal is fatal we print register's content using
> show_regs function. show_regs() also prints information about
> last exception happened.
> 
> In case of multicore system we can catch the situation when we
> will print wrong information about exception. See the example:
> ______________________________
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handle. Started to process signal before
>        returnig to userspace. Process signal, which is send from
>        CPU-0. As th signal is fatal we call show_regs().
>        show_regs() will show information about last exception
>        which is *page fault* (instead of "trap" which is used for
>        signals and happened on CPU-0)
> 
> So we will get message like this:
>    # ./waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
> 
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>   [EFA   ]: 0x00000000
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
>     @off 0x123ec in [/home/waitpid02]
>     VMA: 0x00010000 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>   [-----other-info-----]
> 
> This message is confusing because it show information about page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to signal.
> 
> This situation was reproduced with waitpid02 LTP test.
> _____________________________
> 
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()

That is fine, but as I mentioned in your last posting, this is still not complete.
If printing reg file confuses us in case of termination by signal from some other
task, I don't see how just leaving out the exception regs, but still printing rest
of the reg file is the complete solution. It is still bogus and any fixes to that
effect are band aids.

> 
> Now we don't print information about exceptions if signal is simply
> send by another userspace app. So in case of waitpid02 we will print
> next message:

So all we are skipping is the decoding of ECR as you seem to be printing the raw
value anyways.

> _____________________________
>    # ./waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /root/waitpid02
>   CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
>   [ECR   ]: 0x00050100
>   [EFA   ]: 0x00000000
>   [BLINK ]: 0x20001486
>   [-----other-info-----]
> _____________________________
> 
> This patch fix
> STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec
> 
> NOTE:
> To be more clear I give examples of different faults (signal-based,
> userspace/kernelspace exception-based) with different values of
> "/proc/sys/kernel/print-fatal-signals" option.
> 
> 0) NULL pointer access from user space, print-fatal-signals == 1:
> ------------>8---------------
>  # ./arc_hell
> Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
>   ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
> potentially unexpected fatal signal 11.
> Path: /root/arc_hell
> CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
> [ECR   ]: 0x00050100

So we are printing the ECR twice. Sorry this approach is not going to work.


> [EFA   ]: 0x00000000
> [BLINK ]: 0x20039ef8
> [ERET  ]: 0x2003a35c
...
> 
> Segmentation fault
> ------------>8---------------
> 
> 1) NULL pointer access from user space, print-fatal-signals == 0:
> ------------>8---------------
>  # ./arc_hell
> Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
>   ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
> Segmentation fault
> ------------>8---------------
> 
> 2) Process killed by signal (waitpid02 test), print-fatal-signals == 1:
> ------------>8---------------
>  # ./waitpid02
> potentially unexpected fatal signal 8.
> Path: /root/waitpid02
> CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
> [ECR   ]: 0x00050100
> [EFA   ]: 0x00000000
> [BLINK ]: 0x20001486
> [ERET  ]: 0x2000146c
> [STAT32]: 0x80080082 : IE U
> BTA: 0x20000fc4  SP: 0x5fa21d64  FP: 0x00000000
> LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x00000006
> r00: 0x2000c0dc r01: 0x00000018 r02: 0x0001159a
> r03: 0x00000001 r04: 0x00000000 r05: 0x00000045
> r06: 0x0000004e r07: 0x01010101 r08: 0x000000dc
> r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038
> r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0
> r15: 0x20004078 r16: 0x00000000 r17: 0x20004038
> r18: 0x00000001 r19: 0x00000000 r20: 0x0001159a
> r21: 0x00000001 r22: 0x00000000 r23: 0x00000004
> r24: 0x2000d1fc r25: 0x20004cd0

This is the part I really object to. We should not print any register when they
are not relevant. Not decoding the ECR is not enough IMO.


> ------------>8---------------
> 
> 3) Process killed by signal (waitpid02 test), print-fatal-signals == 0:
> ------------>8---------------
>  # ./waitpid02

...
> 
> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
> ---
> Changes v3->v4:
>  * Rebase onto last ARC changes.
> 
> Changes v2->v3:
>  * Don't show exception message if the corresponding signal is
>    handled by application (inspired by x86 implementation)
>  * Rebase onto v4.19-rc8
> 
> Changes v1->v2:
>  * Change format of message about exception.
> 
>  arch/arc/include/asm/bug.h     |  1 +
>  arch/arc/kernel/traps.c        |  3 +++
>  arch/arc/kernel/troubleshoot.c | 46 ++++++++++++++++++++++++++++++++----------
>  arch/arc/mm/fault.c            |  9 +++++++++
>  4 files changed, 48 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
> index 21ec82466d62..b68f7f82f2d8 100644
> --- a/arch/arc/include/asm/bug.h
> +++ b/arch/arc/include/asm/bug.h
> @@ -16,6 +16,7 @@
>  struct task_struct;
>  
>  void show_regs(struct pt_regs *regs);
> +void show_exception_mesg(struct pt_regs *regs);
>  void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
>  void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
>  			    unsigned long address);
> diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
> index a7fcbc0d3943..b94b13120bff 100644
> --- a/arch/arc/kernel/traps.c
> +++ b/arch/arc/kernel/traps.c
> @@ -50,6 +50,9 @@ unhandled_exception(const char *str, struct pt_regs *regs,
>  
>  		tsk->thread.fault_address = (__force unsigned int)addr;
>  
> +		if (unhandled_signal(tsk, signo))
> +			show_exception_mesg(regs);
> +
>  		force_sig_fault(signo, si_code, addr, tsk);
>  
>  	} else {
> diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
> index 215f515442e0..64e9867a1a65 100644
> --- a/arch/arc/kernel/troubleshoot.c
> +++ b/arch/arc/kernel/troubleshoot.c
> @@ -106,13 +106,13 @@ static void show_faulting_vma(unsigned long address)
>  			if (IS_ERR(nm))
>  				nm = "?";
>  		}
> -		pr_info("    @off 0x%lx in [%s]\n"
> -			"    VMA: 0x%08lx to 0x%08lx\n",
> +
> +		pr_cont("[off 0x%lx in %s, VMA: %08lx:%08lx] ",
>  			vma->vm_start < TASK_UNMAPPED_BASE ?
>  				address : address - vma->vm_start,
>  			nm, vma->vm_start, vma->vm_end);
>  	} else
> -		pr_info("    @No matching VMA found\n");
> +		pr_cont("[No matching VMA found] ");
>  
>  	up_read(&active_mm->mmap_sem);
>  }
> @@ -122,7 +122,7 @@ static void show_ecr_verbose(struct pt_regs *regs)
>  	unsigned int vec, cause_code;
>  	unsigned long address;
>  
> -	pr_info("\n[ECR   ]: 0x%08lx => ", regs->event);
> +	pr_cont("\n  ECR: 0x%08lx => ", regs->event);
>  
>  	/* For Data fault, this is data address not instruction addr */
>  	address = current->thread.fault_address;
> @@ -170,10 +170,37 @@ static void show_ecr_verbose(struct pt_regs *regs)
>  	}
>  }
>  
> +static inline void show_exception_mesg_u(struct pt_regs *regs)
> +{
> +	struct task_struct *tsk = current;
> +
> +	pr_info("Exception: %s[%d]: at %pS ",
> +		tsk->comm, task_pid_nr(tsk), (void *)regs->ret);
> +
> +	show_faulting_vma(regs->ret);
> +
> +	show_ecr_verbose(regs);
> +}
> +
> +static inline void show_exception_mesg_k(struct pt_regs *regs)
> +{
> +	pr_info("Exception: at %pS:", (void *)regs->ret);
> +
> +	show_ecr_verbose(regs);
> +}
> +
>  /************************************************************************
>   *  API called by rest of kernel
>   ***********************************************************************/
>  
> +void show_exception_mesg(struct pt_regs *regs)
> +{
> +	if (user_mode(regs))
> +		show_exception_mesg_u(regs);
> +	else
> +		show_exception_mesg_k(regs);
> +}
> +
>  void show_regs(struct pt_regs *regs)
>  {
>  	struct task_struct *tsk = current;
> @@ -188,15 +215,10 @@ void show_regs(struct pt_regs *regs)
>  	print_task_path_n_nm(tsk);
>  	show_regs_print_info(KERN_INFO);
>  
> -	show_ecr_verbose(regs);
> -
> -	pr_info("[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
> -		current->thread.fault_address,
> +	pr_info("[ECR   ]: 0x%08lx\n[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
> +		regs->event, current->thread.fault_address,
>  		(void *)regs->blink, (void *)regs->ret);
>  
> -	if (user_mode(regs))
> -		show_faulting_vma(regs->ret); /* faulting code, not data */
> -
>  	pr_info("[STAT32]: 0x%08lx", regs->status32);
>  
>  #define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
> @@ -239,6 +261,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
>  	/* Show fault description */
>  	pr_info("\n%s\n", str);
>  
> +	show_exception_mesg(regs);
> +
>  	/* Caller and Callee regs */
>  	show_regs(regs);
>  
> diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
> index 8df1638259f3..e44b64107adb 100644
> --- a/arch/arc/mm/fault.c
> +++ b/arch/arc/mm/fault.c
> @@ -202,7 +202,12 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
>  	/* User mode accesses just cause a SIGSEGV */
>  	if (user_mode(regs)) {
>  		tsk->thread.fault_address = address;
> +
> +		if (unhandled_signal(tsk, SIGSEGV))
> +			show_exception_mesg(regs);
> +
>  		force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
> +
>  		return;
>  	}
>  
> @@ -237,5 +242,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
>  		goto no_context;
>  
>  	tsk->thread.fault_address = address;
> +
> +	if (unhandled_signal(tsk, SIGBUS))
> +		show_exception_mesg(regs);
> +
>  	force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address, tsk);
>  }
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2019-01-21 21:56   ` Vineet Gupta
  0 siblings, 0 replies; 14+ messages in thread
From: Vineet Gupta @ 2019-01-21 21:56 UTC (permalink / raw)
  To: linux-snps-arc

On 1/21/19 9:07 AM, Eugeniy Paltsev wrote:
> We process signals in the end of syscall/exception handler.
> If the signal is fatal we print register's content using
> show_regs function. show_regs() also prints information about
> last exception happened.
> 
> In case of multicore system we can catch the situation when we
> will print wrong information about exception. See the example:
> ______________________________
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handle. Started to process signal before
>        returnig to userspace. Process signal, which is send from
>        CPU-0. As th signal is fatal we call show_regs().
>        show_regs() will show information about last exception
>        which is *page fault* (instead of "trap" which is used for
>        signals and happened on CPU-0)
> 
> So we will get message like this:
>    # ./waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
> 
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>   [EFA   ]: 0x00000000
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
>     @off 0x123ec in [/home/waitpid02]
>     VMA: 0x00010000 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>   [-----other-info-----]
> 
> This message is confusing because it show information about page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to signal.
> 
> This situation was reproduced with waitpid02 LTP test.
> _____________________________
> 
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()

That is fine, but as I mentioned in your last posting, this is still not complete.
If printing reg file confuses us in case of termination by signal from some other
task, I don't see how just leaving out the exception regs, but still printing rest
of the reg file is the complete solution. It is still bogus and any fixes to that
effect are band aids.

> 
> Now we don't print information about exceptions if signal is simply
> send by another userspace app. So in case of waitpid02 we will print
> next message:

So all we are skipping is the decoding of ECR as you seem to be printing the raw
value anyways.

> _____________________________
>    # ./waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /root/waitpid02
>   CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
>   [ECR   ]: 0x00050100
>   [EFA   ]: 0x00000000
>   [BLINK ]: 0x20001486
>   [-----other-info-----]
> _____________________________
> 
> This patch fix
> STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec
> 
> NOTE:
> To be more clear I give examples of different faults (signal-based,
> userspace/kernelspace exception-based) with different values of
> "/proc/sys/kernel/print-fatal-signals" option.
> 
> 0) NULL pointer access from user space, print-fatal-signals == 1:
> ------------>8---------------
>  # ./arc_hell
> Exception: arc_hell[103]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
>   ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
> potentially unexpected fatal signal 11.
> Path: /root/arc_hell
> CPU: 1 PID: 103 Comm: arc_hell Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
> [ECR   ]: 0x00050100

So we are printing the ECR twice. Sorry this approach is not going to work.


> [EFA   ]: 0x00000000
> [BLINK ]: 0x20039ef8
> [ERET  ]: 0x2003a35c
...
> 
> Segmentation fault
> ------------>8---------------
> 
> 1) NULL pointer access from user space, print-fatal-signals == 0:
> ------------>8---------------
>  # ./arc_hell
> Exception: arc_hell[107]: at 0x2003a35c [off 0x2e35c in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
>   ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
> Segmentation fault
> ------------>8---------------
> 
> 2) Process killed by signal (waitpid02 test), print-fatal-signals == 1:
> ------------>8---------------
>  # ./waitpid02
> potentially unexpected fatal signal 8.
> Path: /root/waitpid02
> CPU: 2 PID: 105 Comm: waitpid02 Not tainted 4.18.0-rc8-00002-gde0f6d6aeb53-dirty #17
> [ECR   ]: 0x00050100
> [EFA   ]: 0x00000000
> [BLINK ]: 0x20001486
> [ERET  ]: 0x2000146c
> [STAT32]: 0x80080082 : IE U
> BTA: 0x20000fc4  SP: 0x5fa21d64  FP: 0x00000000
> LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x00000006
> r00: 0x2000c0dc r01: 0x00000018 r02: 0x0001159a
> r03: 0x00000001 r04: 0x00000000 r05: 0x00000045
> r06: 0x0000004e r07: 0x01010101 r08: 0x000000dc
> r09: 0x200a31e0 r10: 0x20003a5c r11: 0x20004038
> r12: 0x20001486 r13: 0x20004174 r14: 0x07ca2bc0
> r15: 0x20004078 r16: 0x00000000 r17: 0x20004038
> r18: 0x00000001 r19: 0x00000000 r20: 0x0001159a
> r21: 0x00000001 r22: 0x00000000 r23: 0x00000004
> r24: 0x2000d1fc r25: 0x20004cd0

This is the part I really object to. We should not print any register when they
are not relevant. Not decoding the ECR is not enough IMO.


> ------------>8---------------
> 
> 3) Process killed by signal (waitpid02 test), print-fatal-signals == 0:
> ------------>8---------------
>  # ./waitpid02

...
> 
> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev at synopsys.com>
> ---
> Changes v3->v4:
>  * Rebase onto last ARC changes.
> 
> Changes v2->v3:
>  * Don't show exception message if the corresponding signal is
>    handled by application (inspired by x86 implementation)
>  * Rebase onto v4.19-rc8
> 
> Changes v1->v2:
>  * Change format of message about exception.
> 
>  arch/arc/include/asm/bug.h     |  1 +
>  arch/arc/kernel/traps.c        |  3 +++
>  arch/arc/kernel/troubleshoot.c | 46 ++++++++++++++++++++++++++++++++----------
>  arch/arc/mm/fault.c            |  9 +++++++++
>  4 files changed, 48 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
> index 21ec82466d62..b68f7f82f2d8 100644
> --- a/arch/arc/include/asm/bug.h
> +++ b/arch/arc/include/asm/bug.h
> @@ -16,6 +16,7 @@
>  struct task_struct;
>  
>  void show_regs(struct pt_regs *regs);
> +void show_exception_mesg(struct pt_regs *regs);
>  void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
>  void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
>  			    unsigned long address);
> diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
> index a7fcbc0d3943..b94b13120bff 100644
> --- a/arch/arc/kernel/traps.c
> +++ b/arch/arc/kernel/traps.c
> @@ -50,6 +50,9 @@ unhandled_exception(const char *str, struct pt_regs *regs,
>  
>  		tsk->thread.fault_address = (__force unsigned int)addr;
>  
> +		if (unhandled_signal(tsk, signo))
> +			show_exception_mesg(regs);
> +
>  		force_sig_fault(signo, si_code, addr, tsk);
>  
>  	} else {
> diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
> index 215f515442e0..64e9867a1a65 100644
> --- a/arch/arc/kernel/troubleshoot.c
> +++ b/arch/arc/kernel/troubleshoot.c
> @@ -106,13 +106,13 @@ static void show_faulting_vma(unsigned long address)
>  			if (IS_ERR(nm))
>  				nm = "?";
>  		}
> -		pr_info("    @off 0x%lx in [%s]\n"
> -			"    VMA: 0x%08lx to 0x%08lx\n",
> +
> +		pr_cont("[off 0x%lx in %s, VMA: %08lx:%08lx] ",
>  			vma->vm_start < TASK_UNMAPPED_BASE ?
>  				address : address - vma->vm_start,
>  			nm, vma->vm_start, vma->vm_end);
>  	} else
> -		pr_info("    @No matching VMA found\n");
> +		pr_cont("[No matching VMA found] ");
>  
>  	up_read(&active_mm->mmap_sem);
>  }
> @@ -122,7 +122,7 @@ static void show_ecr_verbose(struct pt_regs *regs)
>  	unsigned int vec, cause_code;
>  	unsigned long address;
>  
> -	pr_info("\n[ECR   ]: 0x%08lx => ", regs->event);
> +	pr_cont("\n  ECR: 0x%08lx => ", regs->event);
>  
>  	/* For Data fault, this is data address not instruction addr */
>  	address = current->thread.fault_address;
> @@ -170,10 +170,37 @@ static void show_ecr_verbose(struct pt_regs *regs)
>  	}
>  }
>  
> +static inline void show_exception_mesg_u(struct pt_regs *regs)
> +{
> +	struct task_struct *tsk = current;
> +
> +	pr_info("Exception: %s[%d]: at %pS ",
> +		tsk->comm, task_pid_nr(tsk), (void *)regs->ret);
> +
> +	show_faulting_vma(regs->ret);
> +
> +	show_ecr_verbose(regs);
> +}
> +
> +static inline void show_exception_mesg_k(struct pt_regs *regs)
> +{
> +	pr_info("Exception: at %pS:", (void *)regs->ret);
> +
> +	show_ecr_verbose(regs);
> +}
> +
>  /************************************************************************
>   *  API called by rest of kernel
>   ***********************************************************************/
>  
> +void show_exception_mesg(struct pt_regs *regs)
> +{
> +	if (user_mode(regs))
> +		show_exception_mesg_u(regs);
> +	else
> +		show_exception_mesg_k(regs);
> +}
> +
>  void show_regs(struct pt_regs *regs)
>  {
>  	struct task_struct *tsk = current;
> @@ -188,15 +215,10 @@ void show_regs(struct pt_regs *regs)
>  	print_task_path_n_nm(tsk);
>  	show_regs_print_info(KERN_INFO);
>  
> -	show_ecr_verbose(regs);
> -
> -	pr_info("[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
> -		current->thread.fault_address,
> +	pr_info("[ECR   ]: 0x%08lx\n[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
> +		regs->event, current->thread.fault_address,
>  		(void *)regs->blink, (void *)regs->ret);
>  
> -	if (user_mode(regs))
> -		show_faulting_vma(regs->ret); /* faulting code, not data */
> -
>  	pr_info("[STAT32]: 0x%08lx", regs->status32);
>  
>  #define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
> @@ -239,6 +261,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
>  	/* Show fault description */
>  	pr_info("\n%s\n", str);
>  
> +	show_exception_mesg(regs);
> +
>  	/* Caller and Callee regs */
>  	show_regs(regs);
>  
> diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
> index 8df1638259f3..e44b64107adb 100644
> --- a/arch/arc/mm/fault.c
> +++ b/arch/arc/mm/fault.c
> @@ -202,7 +202,12 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
>  	/* User mode accesses just cause a SIGSEGV */
>  	if (user_mode(regs)) {
>  		tsk->thread.fault_address = address;
> +
> +		if (unhandled_signal(tsk, SIGSEGV))
> +			show_exception_mesg(regs);
> +
>  		force_sig_fault(SIGSEGV, si_code, (void __user *)address, tsk);
> +
>  		return;
>  	}
>  
> @@ -237,5 +242,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
>  		goto no_context;
>  
>  	tsk->thread.fault_address = address;
> +
> +	if (unhandled_signal(tsk, SIGBUS))
> +		show_exception_mesg(regs);
> +
>  	force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address, tsk);
>  }
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message
  2018-07-05 21:26       ` Vineet Gupta
@ 2018-07-06 13:05         ` Eugeniy Paltsev
  -1 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2018-07-06 13:05 UTC (permalink / raw)
  To: Eugeniy.Paltsev, Vineet Gupta, linux-snps-arc
  Cc: linux-kernel, viro, Alexey Brodkin

Hi Vineet,

On Thu, 2018-07-05 at 14:26 -0700, Vineet Gupta wrote:
> On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> > On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
> > > +CC Al
> > > 
> > > On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > > > We process signals in the end of syscall/exception handler.
> > > > It the signal is fatal we print register's content using
> > > > show_regs function. show_regs() also prints information about
> > > > last exception happened.
> > > > 
> > > > In case of multicore system we can catch the situation when we
> > > > will print wrong information about exception. See the example:
> > > > ______________________________
> > > > CPU-0: started to handle page fault
> > > > CPU-1: sent signal to process, which is executed on CPU-0
> > > > CPU-0: ended page fault handle. Started to process signal before
> > > >        returnig to userspace. Process signal, which is send from
> > > >        CPU-0. As th signal is fatal we call show_regs().
> > > >        show_regs() will show information about last exception
> > > >        which is *page fault* (instead of "trap" which is used for
> > > >        signals and happened on CPU-0)
> > > > 
> > > > So we will get message like this:
> > > >     /home/waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   Path: /home/waitpid02
> > > >   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> > > >   task: 9f11c200 task.stack: 9f3ae000
> > > > 
> > > >   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
> > > >   [EFA   ]: 0x00000000
> > > >   [BLINK ]: 0x123ea
> > > >   [ERET  ]: 0x123ec
> > > >     @off 0x123ec in [/home/waitpid02]
> > > >     VMA: 0x00010000 to 0x00016000
> > > >   [STAT32]: 0x80080882 : IE U
> > > >   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
> > > >   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
> > > >   [-----other-info-----]
> > > > 
> > > > This message is confusing because it show information about page fault
> > > > ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > > > to signal.
> > > 
> > > Agreed this is misleading. @Al, is there a way to identify process termination
> > > from signals because it did something wrong vs. say unhandled signal. For former,
> > > we want to dump additional info in show_regs() such as PC / Fault addres etc and
> > > not in other scenario.
> > > 
> > > > This situation was reproduced with waitpid02 LTP test.
> > > > _____________________________
> > > > 
> > > > So remove printing information about exceptions from show_regs()
> > > > to avoid confusing messages. Print information about exceptions
> > > > only in required places instead of show_regs()
> > > > 
> > > > Now we don't print information about exceptions if signal is simply
> > > > send by another userspace app. So in case of waitpid02 we will print
> > > > next message:
> > > > _____________________________
> > > >     ./waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   [STAT32]: 0x80080082 : IE U
> > > >   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
> > > >   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
> > > >   [-----other-info-----]
> > > > _____________________________
> > > 
> > > The prints I'm seeing now, for a segv from NULL pointer access is even more
> > > confusing !
> > > There's a mixup of prints....
> > > 
> > > -------------------->8--------------------
> > > Path: /segv
> > > CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> > > 
> > > [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
> > > [EFA   ]: 0x00000000
> > > [BLINK ]: 0x20047bb0
> > > [ERET  ]: 0x103ac
> > >     @off 0x103ac in [/segv]
> > >     VMA: 0x00010000 to 0x00012000
> > > 
> > > potentially unexpected fatal signal 11.
> > > [STAT32]: 0x80080882 : IE U    
> > > BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> > > r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
> > > r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
> > > ...
> > > -------------------->8--------------------
> > > 
> > > and for the process killed by signal 8, we get below.
> > > 
> > > -------------------->8--------------------
> > > [ARCLinux]# kill -8 71
> > > [ARCLinux]# potentially unexpected fatal signal 8.
> > > [STAT32]: 0x80080882 : IE U    
> > > BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> > > r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
> > > r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
> > > r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
> > > ...
> > > 
> > > 
> > > [1]+  Floating point exception   ./sleep
> > > -------------------->8--------------------
> > > I'm not sure whats the improvement here vs. the status quo.
> > 
> > Why do you think this is confusing?
> > The main change is that we don't print exception registers for signal based kill.
> 
> For the pure signal based termination, what is the point of printing the rest of
> registers. If you say "it is to give a feel of what the userspace was doing at the
> time...." then we are lacking the most crucial piece which is the PC at the time
> (i.e. ERET placeholder).

We can disable printing the rest of registers by setting "print-fatal-signals" to 0
by default. In that case we will print rest of registers only for exceptions
happened in kernel space (when die() function is called)

So in case of setting "print-fatal-signals" to 0 we will get following messages:

NULL pointer access from user space:
---------->8-------------
# ./arc_hell &
# Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10

[ECR   ]: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
    @off 0x2e35c in [/lib/libuClibc-1.0.18.so]
    VMA: 0x2000c000 to 0x20072000

[1]+  Segmentation fault         ./arc_hell
---------->8-------------

Process killed by signal 11 via 'kill -s 11':
---------->8-------------
# sleep 1000 &
[1]+  Segmentation fault         sleep 1000
---------->8-------------



Probably the best variant is to print only information about unexpected exception
from exception handler and leave printing ECR/EFA/BLINK/ERET in show_regs().

So in case of exception based signal we will print from exception handler
something like that:
---------->8-------------
Unexpected exception: Invalid Read @ 0x00000000 by insn @ 0x2003a35c
---------->8-------------

Other information will be print in show_regs (only if "print-fatal-signals"
is enabled):
---------->8-------------
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
    @off 0x2e35c in [/lib/libuClibc-1.0.18.so]
    VMA: 0x2000c000 to 0x20072000
[STAT32]: 0x80080882 : IE U
BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000
r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f
...
---------->8-------------


> > Moreover, new behavior is more like x86-64 behavior. See the example:
> 
> For a null pointer based termination, we now have a ugly looking "potentially
> unexpected..." print in the middle of reg file dump, that is not what x86 does.
> Anyways, that print It is undesirable, but not a deal breaker.  The issue is point
> above, can we remedy it.
> 
> BTW in your original patch, for a null pointer access, the printing code now
> allocates 2 pages, in each of show_xxx routines and the one in show_regs() is now
> pointless, as it is not used there at all there - so please fix that as well.

Thanks for pointing, will fix in v2 patch.

> -Vineet
> 
-- 
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2018-07-06 13:05         ` Eugeniy Paltsev
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2018-07-06 13:05 UTC (permalink / raw)
  To: linux-snps-arc

Hi Vineet,

On Thu, 2018-07-05@14:26 -0700, Vineet Gupta wrote:
> On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> > On Mon, 2018-07-02@10:57 -0700, Vineet Gupta wrote:
> > > +CC Al
> > > 
> > > On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > > > We process signals in the end of syscall/exception handler.
> > > > It the signal is fatal we print register's content using
> > > > show_regs function. show_regs() also prints information about
> > > > last exception happened.
> > > > 
> > > > In case of multicore system we can catch the situation when we
> > > > will print wrong information about exception. See the example:
> > > > ______________________________
> > > > CPU-0: started to handle page fault
> > > > CPU-1: sent signal to process, which is executed on CPU-0
> > > > CPU-0: ended page fault handle. Started to process signal before
> > > >        returnig to userspace. Process signal, which is send from
> > > >        CPU-0. As th signal is fatal we call show_regs().
> > > >        show_regs() will show information about last exception
> > > >        which is *page fault* (instead of "trap" which is used for
> > > >        signals and happened on CPU-0)
> > > > 
> > > > So we will get message like this:
> > > >     /home/waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   Path: /home/waitpid02
> > > >   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> > > >   task: 9f11c200 task.stack: 9f3ae000
> > > > 
> > > >   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
> > > >   [EFA   ]: 0x00000000
> > > >   [BLINK ]: 0x123ea
> > > >   [ERET  ]: 0x123ec
> > > >     @off 0x123ec in [/home/waitpid02]
> > > >     VMA: 0x00010000 to 0x00016000
> > > >   [STAT32]: 0x80080882 : IE U
> > > >   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
> > > >   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
> > > >   [-----other-info-----]
> > > > 
> > > > This message is confusing because it show information about page fault
> > > > ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > > > to signal.
> > > 
> > > Agreed this is misleading. @Al, is there a way to identify process termination
> > > from signals because it did something wrong vs. say unhandled signal. For former,
> > > we want to dump additional info in show_regs() such as PC / Fault addres etc and
> > > not in other scenario.
> > > 
> > > > This situation was reproduced with waitpid02 LTP test.
> > > > _____________________________
> > > > 
> > > > So remove printing information about exceptions from show_regs()
> > > > to avoid confusing messages. Print information about exceptions
> > > > only in required places instead of show_regs()
> > > > 
> > > > Now we don't print information about exceptions if signal is simply
> > > > send by another userspace app. So in case of waitpid02 we will print
> > > > next message:
> > > > _____________________________
> > > >     ./waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   [STAT32]: 0x80080082 : IE U
> > > >   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
> > > >   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
> > > >   [-----other-info-----]
> > > > _____________________________
> > > 
> > > The prints I'm seeing now, for a segv from NULL pointer access is even more
> > > confusing !
> > > There's a mixup of prints....
> > > 
> > > -------------------->8--------------------
> > > Path: /segv
> > > CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> > > 
> > > [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
> > > [EFA   ]: 0x00000000
> > > [BLINK ]: 0x20047bb0
> > > [ERET  ]: 0x103ac
> > >     @off 0x103ac in [/segv]
> > >     VMA: 0x00010000 to 0x00012000
> > > 
> > > potentially unexpected fatal signal 11.
> > > [STAT32]: 0x80080882 : IE U    
> > > BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> > > r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
> > > r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
> > > ...
> > > -------------------->8--------------------
> > > 
> > > and for the process killed by signal 8, we get below.
> > > 
> > > -------------------->8--------------------
> > > [ARCLinux]# kill -8 71
> > > [ARCLinux]# potentially unexpected fatal signal 8.
> > > [STAT32]: 0x80080882 : IE U    
> > > BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> > > r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
> > > r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
> > > r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
> > > ...
> > > 
> > > 
> > > [1]+  Floating point exception   ./sleep
> > > -------------------->8--------------------
> > > I'm not sure whats the improvement here vs. the status quo.
> > 
> > Why do you think this is confusing?
> > The main change is that we don't print exception registers for signal based kill.
> 
> For the pure signal based termination, what is the point of printing the rest of
> registers. If you say "it is to give a feel of what the userspace was doing at the
> time...." then we are lacking the most crucial piece which is the PC at the time
> (i.e. ERET placeholder).

We can disable printing the rest of registers by setting "print-fatal-signals" to 0
by default. In that case we will print rest of registers only for exceptions
happened in kernel space (when die() function is called)

So in case of setting "print-fatal-signals" to 0 we will get following messages:

NULL pointer access from user space:
---------->8-------------
# ./arc_hell &
# Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10

[ECR   ]: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
    @off 0x2e35c in [/lib/libuClibc-1.0.18.so]
    VMA: 0x2000c000 to 0x20072000

[1]+  Segmentation fault         ./arc_hell
---------->8-------------

Process killed by signal 11 via 'kill -s 11':
---------->8-------------
# sleep 1000 &
[1]+  Segmentation fault         sleep 1000
---------->8-------------



Probably the best variant is to print only information about unexpected exception
from exception handler and leave printing ECR/EFA/BLINK/ERET in show_regs().

So in case of exception based signal we will print from exception handler
something like that:
---------->8-------------
Unexpected exception: Invalid Read @ 0x00000000 by insn @ 0x2003a35c
---------->8-------------

Other information will be print in show_regs (only if "print-fatal-signals"
is enabled):
---------->8-------------
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
    @off 0x2e35c in [/lib/libuClibc-1.0.18.so]
    VMA: 0x2000c000 to 0x20072000
[STAT32]: 0x80080882 : IE U
BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000
r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f
...
---------->8-------------


> > Moreover, new behavior is more like x86-64 behavior. See the example:
> 
> For a null pointer based termination, we now have a ugly looking "potentially
> unexpected..." print in the middle of reg file dump, that is not what x86 does.
> Anyways, that print It is undesirable, but not a deal breaker.  The issue is point
> above, can we remedy it.
> 
> BTW in your original patch, for a null pointer access, the printing code now
> allocates 2 pages, in each of show_xxx routines and the one in show_regs() is now
> pointless, as it is not used there at all there - so please fix that as well.

Thanks for pointing, will fix in v2 patch.

> -Vineet
> 
-- 
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message
  2018-07-03 10:57     ` Eugeniy Paltsev
@ 2018-07-05 21:26       ` Vineet Gupta
  -1 siblings, 0 replies; 14+ messages in thread
From: Vineet Gupta @ 2018-07-05 21:26 UTC (permalink / raw)
  To: Eugeniy Paltsev, Eugeniy.Paltsev, linux-snps-arc
  Cc: linux-kernel, viro, Alexey Brodkin

On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
>> +CC Al
>>
>> On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
>>> We process signals in the end of syscall/exception handler.
>>> It the signal is fatal we print register's content using
>>> show_regs function. show_regs() also prints information about
>>> last exception happened.
>>>
>>> In case of multicore system we can catch the situation when we
>>> will print wrong information about exception. See the example:
>>> ______________________________
>>> CPU-0: started to handle page fault
>>> CPU-1: sent signal to process, which is executed on CPU-0
>>> CPU-0: ended page fault handle. Started to process signal before
>>>        returnig to userspace. Process signal, which is send from
>>>        CPU-0. As th signal is fatal we call show_regs().
>>>        show_regs() will show information about last exception
>>>        which is *page fault* (instead of "trap" which is used for
>>>        signals and happened on CPU-0)
>>>
>>> So we will get message like this:
>>>     /home/waitpid02
>>>   potentially unexpected fatal signal 8.
>>>   Path: /home/waitpid02
>>>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>>>   task: 9f11c200 task.stack: 9f3ae000
>>>
>>>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>>>   [EFA   ]: 0x00000000
>>>   [BLINK ]: 0x123ea
>>>   [ERET  ]: 0x123ec
>>>     @off 0x123ec in [/home/waitpid02]
>>>     VMA: 0x00010000 to 0x00016000
>>>   [STAT32]: 0x80080882 : IE U
>>>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>>>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>>>   [-----other-info-----]
>>>
>>> This message is confusing because it show information about page fault
>>> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
>>> to signal.
>> Agreed this is misleading. @Al, is there a way to identify process termination
>> from signals because it did something wrong vs. say unhandled signal. For former,
>> we want to dump additional info in show_regs() such as PC / Fault addres etc and
>> not in other scenario.
>>
>>> This situation was reproduced with waitpid02 LTP test.
>>> _____________________________
>>>
>>> So remove printing information about exceptions from show_regs()
>>> to avoid confusing messages. Print information about exceptions
>>> only in required places instead of show_regs()
>>>
>>> Now we don't print information about exceptions if signal is simply
>>> send by another userspace app. So in case of waitpid02 we will print
>>> next message:
>>> _____________________________
>>>     ./waitpid02
>>>   potentially unexpected fatal signal 8.
>>>   [STAT32]: 0x80080082 : IE U
>>>   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
>>>   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
>>>   [-----other-info-----]
>>> _____________________________
>> The prints I'm seeing now, for a segv from NULL pointer access is even more
>> confusing !
>> There's a mixup of prints....
>>
>> -------------------->8--------------------
>> Path: /segv
>> CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
>>
>> [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
>> [EFA   ]: 0x00000000
>> [BLINK ]: 0x20047bb0
>> [ERET  ]: 0x103ac
>>     @off 0x103ac in [/segv]
>>     VMA: 0x00010000 to 0x00012000
>>
>> potentially unexpected fatal signal 11.
>> [STAT32]: 0x80080882 : IE U    
>> BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
>> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
>> r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
>> r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
>> ...
>> -------------------->8--------------------
>>
>> and for the process killed by signal 8, we get below.
>>
>> -------------------->8--------------------
>> [ARCLinux]# kill -8 71
>> [ARCLinux]# potentially unexpected fatal signal 8.
>> [STAT32]: 0x80080882 : IE U    
>> BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
>> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
>> r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
>> r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
>> r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
>> ...
>>
>>
>> [1]+  Floating point exception   ./sleep
>> -------------------->8--------------------
>> I'm not sure whats the improvement here vs. the status quo.
> Why do you think this is confusing?
> The main change is that we don't print exception registers for signal based kill.

For the pure signal based termination, what is the point of printing the rest of
registers. If you say "it is to give a feel of what the userspace was doing at the
time...." then we are lacking the most crucial piece which is the PC at the time
(i.e. ERET placeholder).

> Moreover, new behavior is more like x86-64 behavior. See the example:

For a null pointer based termination, we now have a ugly looking "potentially
unexpected..." print in the middle of reg file dump, that is not what x86 does.
Anyways, that print It is undesirable, but not a deal breaker.  The issue is point
above, can we remedy it.

BTW in your original patch, for a null pointer access, the printing code now
allocates 2 pages, in each of show_xxx routines and the one in show_regs() is now
pointless, as it is not used there at all there - so please fix that as well.

-Vineet

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2018-07-05 21:26       ` Vineet Gupta
  0 siblings, 0 replies; 14+ messages in thread
From: Vineet Gupta @ 2018-07-05 21:26 UTC (permalink / raw)
  To: linux-snps-arc

On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> On Mon, 2018-07-02@10:57 -0700, Vineet Gupta wrote:
>> +CC Al
>>
>> On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
>>> We process signals in the end of syscall/exception handler.
>>> It the signal is fatal we print register's content using
>>> show_regs function. show_regs() also prints information about
>>> last exception happened.
>>>
>>> In case of multicore system we can catch the situation when we
>>> will print wrong information about exception. See the example:
>>> ______________________________
>>> CPU-0: started to handle page fault
>>> CPU-1: sent signal to process, which is executed on CPU-0
>>> CPU-0: ended page fault handle. Started to process signal before
>>>        returnig to userspace. Process signal, which is send from
>>>        CPU-0. As th signal is fatal we call show_regs().
>>>        show_regs() will show information about last exception
>>>        which is *page fault* (instead of "trap" which is used for
>>>        signals and happened on CPU-0)
>>>
>>> So we will get message like this:
>>>     /home/waitpid02
>>>   potentially unexpected fatal signal 8.
>>>   Path: /home/waitpid02
>>>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>>>   task: 9f11c200 task.stack: 9f3ae000
>>>
>>>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>>>   [EFA   ]: 0x00000000
>>>   [BLINK ]: 0x123ea
>>>   [ERET  ]: 0x123ec
>>>     @off 0x123ec in [/home/waitpid02]
>>>     VMA: 0x00010000 to 0x00016000
>>>   [STAT32]: 0x80080882 : IE U
>>>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>>>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>>>   [-----other-info-----]
>>>
>>> This message is confusing because it show information about page fault
>>> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
>>> to signal.
>> Agreed this is misleading. @Al, is there a way to identify process termination
>> from signals because it did something wrong vs. say unhandled signal. For former,
>> we want to dump additional info in show_regs() such as PC / Fault addres etc and
>> not in other scenario.
>>
>>> This situation was reproduced with waitpid02 LTP test.
>>> _____________________________
>>>
>>> So remove printing information about exceptions from show_regs()
>>> to avoid confusing messages. Print information about exceptions
>>> only in required places instead of show_regs()
>>>
>>> Now we don't print information about exceptions if signal is simply
>>> send by another userspace app. So in case of waitpid02 we will print
>>> next message:
>>> _____________________________
>>>     ./waitpid02
>>>   potentially unexpected fatal signal 8.
>>>   [STAT32]: 0x80080082 : IE U
>>>   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
>>>   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
>>>   [-----other-info-----]
>>> _____________________________
>> The prints I'm seeing now, for a segv from NULL pointer access is even more
>> confusing !
>> There's a mixup of prints....
>>
>> -------------------->8--------------------
>> Path: /segv
>> CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
>>
>> [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
>> [EFA   ]: 0x00000000
>> [BLINK ]: 0x20047bb0
>> [ERET  ]: 0x103ac
>>     @off 0x103ac in [/segv]
>>     VMA: 0x00010000 to 0x00012000
>>
>> potentially unexpected fatal signal 11.
>> [STAT32]: 0x80080882 : IE U    
>> BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
>> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
>> r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
>> r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
>> ...
>> -------------------->8--------------------
>>
>> and for the process killed by signal 8, we get below.
>>
>> -------------------->8--------------------
>> [ARCLinux]# kill -8 71
>> [ARCLinux]# potentially unexpected fatal signal 8.
>> [STAT32]: 0x80080882 : IE U    
>> BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
>> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
>> r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
>> r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
>> r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
>> ...
>>
>>
>> [1]+  Floating point exception   ./sleep
>> -------------------->8--------------------
>> I'm not sure whats the improvement here vs. the status quo.
> Why do you think this is confusing?
> The main change is that we don't print exception registers for signal based kill.

For the pure signal based termination, what is the point of printing the rest of
registers. If you say "it is to give a feel of what the userspace was doing at the
time...." then we are lacking the most crucial piece which is the PC at the time
(i.e. ERET placeholder).

> Moreover, new behavior is more like x86-64 behavior. See the example:

For a null pointer based termination, we now have a ugly looking "potentially
unexpected..." print in the middle of reg file dump, that is not what x86 does.
Anyways, that print It is undesirable, but not a deal breaker.  The issue is point
above, can we remedy it.

BTW in your original patch, for a null pointer access, the printing code now
allocates 2 pages, in each of show_xxx routines and the one in show_regs() is now
pointless, as it is not used there at all there - so please fix that as well.

-Vineet

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message
  2018-07-02 17:57   ` Vineet Gupta
@ 2018-07-03 10:57     ` Eugeniy Paltsev
  -1 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2018-07-03 10:57 UTC (permalink / raw)
  To: Eugeniy.Paltsev, Vineet Gupta, linux-snps-arc
  Cc: linux-kernel, viro, Alexey Brodkin

On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
> +CC Al
> 
> On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > We process signals in the end of syscall/exception handler.
> > It the signal is fatal we print register's content using
> > show_regs function. show_regs() also prints information about
> > last exception happened.
> > 
> > In case of multicore system we can catch the situation when we
> > will print wrong information about exception. See the example:
> > ______________________________
> > CPU-0: started to handle page fault
> > CPU-1: sent signal to process, which is executed on CPU-0
> > CPU-0: ended page fault handle. Started to process signal before
> >        returnig to userspace. Process signal, which is send from
> >        CPU-0. As th signal is fatal we call show_regs().
> >        show_regs() will show information about last exception
> >        which is *page fault* (instead of "trap" which is used for
> >        signals and happened on CPU-0)
> > 
> > So we will get message like this:
> >     /home/waitpid02
> >   potentially unexpected fatal signal 8.
> >   Path: /home/waitpid02
> >   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> >   task: 9f11c200 task.stack: 9f3ae000
> > 
> >   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
> >   [EFA   ]: 0x00000000
> >   [BLINK ]: 0x123ea
> >   [ERET  ]: 0x123ec
> >     @off 0x123ec in [/home/waitpid02]
> >     VMA: 0x00010000 to 0x00016000
> >   [STAT32]: 0x80080882 : IE U
> >   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
> >   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
> >   [-----other-info-----]
> > 
> > This message is confusing because it show information about page fault
> > ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > to signal.
> 
> Agreed this is misleading. @Al, is there a way to identify process termination
> from signals because it did something wrong vs. say unhandled signal. For former,
> we want to dump additional info in show_regs() such as PC / Fault addres etc and
> not in other scenario.
> 
> > This situation was reproduced with waitpid02 LTP test.
> > _____________________________
> > 
> > So remove printing information about exceptions from show_regs()
> > to avoid confusing messages. Print information about exceptions
> > only in required places instead of show_regs()
> > 
> > Now we don't print information about exceptions if signal is simply
> > send by another userspace app. So in case of waitpid02 we will print
> > next message:
> > _____________________________
> >     ./waitpid02
> >   potentially unexpected fatal signal 8.
> >   [STAT32]: 0x80080082 : IE U
> >   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
> >   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
> >   [-----other-info-----]
> > _____________________________
> 
> The prints I'm seeing now, for a segv from NULL pointer access is even more
> confusing !
> There's a mixup of prints....
> 
> -------------------->8--------------------
> Path: /segv
> CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> 
> [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
> [EFA   ]: 0x00000000
> [BLINK ]: 0x20047bb0
> [ERET  ]: 0x103ac
>     @off 0x103ac in [/segv]
>     VMA: 0x00010000 to 0x00012000
> 
> potentially unexpected fatal signal 11.
> [STAT32]: 0x80080882 : IE U    
> BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
> r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
> ...
> -------------------->8--------------------
> 
> and for the process killed by signal 8, we get below.
> 
> -------------------->8--------------------
> [ARCLinux]# kill -8 71
> [ARCLinux]# potentially unexpected fatal signal 8.
> [STAT32]: 0x80080882 : IE U    
> BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
> r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
> r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
> ...
> 
> 
> [1]+  Floating point exception   ./sleep
> -------------------->8--------------------
> I'm not sure whats the improvement here vs. the status quo.

Why do you think this is confusing?
The main change is that we don't print exception registers for signal based kill.

Moreover, new behavior is more like x86-64 behavior. See the example:

NULL pointer access on x86-64:
-------------------->8--------------------
hell[3925]: segfault at 0 ip 00007f621eb56c41 sp 00007ffcaa5951b8 error 4 in libc-2.26.so[7f621ea01000+1ac000]
potentially unexpected fatal signal 11.
CPU: 1 PID: 3925 Comm: hell Not tainted 4.16.13-200.fc27.x86_64 #1
Hardware name: Dell Inc. Latitude 7480/00F6D3, BIOS 1.9.3 03/09/2018
RIP: 0033:0x7f621eb56c41
RSP: 002b:00007ffcaa5951b8 EFLAGS: 00010283
RAX: 000000000000000e RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000016eb260 RDI: 0000000000000000
RBP: 00007ffcaa5951f0 R08: 0000000000000003 R09: 0000000000000077
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffcaa5952d0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f621efbf4c0(0000) GS:ffff8f6abe480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000004556a4005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
-------------------->8--------------------

Process killed by signal 11 on x86-64:
-------------------->8--------------------
potentially unexpected fatal signal 11.
CPU: 0 PID: 3998 Comm: sh Not tainted 4.16.13-200.fc27.x86_64 #1
Hardware name: Dell Inc. Latitude 7480/00F6D3, BIOS 1.9.3 03/09/2018
RIP: 0033:0x7fc0b13d744e
RSP: 002b:00007fffba851e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: 0000000000000003 RBX: 000055b3e21256a0 RCX: 00007fc0b13d744e
RDX: 0000000000000241 RSI: 000055b3e2125540 RDI: ffffffffffffff9c
RBP: 00007fffba851f40 R08: 0000000000000020 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000a
R13: 0000000000000003 R14: 0000000000000001 R15: 000055b3e2125540
FS:  00007fc0b1cdb740(0000) GS:ffff8f6abe400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff33d403000 CR3: 00000003dc7ea004 CR4: 00000000003606f0
-------------------->8--------------------


NOTE: in both cases (ARC and x86-64) "/proc/sys/kernel/print-fatal-signals" was
set to "1"

-- 
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2018-07-03 10:57     ` Eugeniy Paltsev
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2018-07-03 10:57 UTC (permalink / raw)
  To: linux-snps-arc

On Mon, 2018-07-02@10:57 -0700, Vineet Gupta wrote:
> +CC Al
> 
> On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > We process signals in the end of syscall/exception handler.
> > It the signal is fatal we print register's content using
> > show_regs function. show_regs() also prints information about
> > last exception happened.
> > 
> > In case of multicore system we can catch the situation when we
> > will print wrong information about exception. See the example:
> > ______________________________
> > CPU-0: started to handle page fault
> > CPU-1: sent signal to process, which is executed on CPU-0
> > CPU-0: ended page fault handle. Started to process signal before
> >        returnig to userspace. Process signal, which is send from
> >        CPU-0. As th signal is fatal we call show_regs().
> >        show_regs() will show information about last exception
> >        which is *page fault* (instead of "trap" which is used for
> >        signals and happened on CPU-0)
> > 
> > So we will get message like this:
> >     /home/waitpid02
> >   potentially unexpected fatal signal 8.
> >   Path: /home/waitpid02
> >   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> >   task: 9f11c200 task.stack: 9f3ae000
> > 
> >   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
> >   [EFA   ]: 0x00000000
> >   [BLINK ]: 0x123ea
> >   [ERET  ]: 0x123ec
> >     @off 0x123ec in [/home/waitpid02]
> >     VMA: 0x00010000 to 0x00016000
> >   [STAT32]: 0x80080882 : IE U
> >   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
> >   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
> >   [-----other-info-----]
> > 
> > This message is confusing because it show information about page fault
> > ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > to signal.
> 
> Agreed this is misleading. @Al, is there a way to identify process termination
> from signals because it did something wrong vs. say unhandled signal. For former,
> we want to dump additional info in show_regs() such as PC / Fault addres etc and
> not in other scenario.
> 
> > This situation was reproduced with waitpid02 LTP test.
> > _____________________________
> > 
> > So remove printing information about exceptions from show_regs()
> > to avoid confusing messages. Print information about exceptions
> > only in required places instead of show_regs()
> > 
> > Now we don't print information about exceptions if signal is simply
> > send by another userspace app. So in case of waitpid02 we will print
> > next message:
> > _____________________________
> >     ./waitpid02
> >   potentially unexpected fatal signal 8.
> >   [STAT32]: 0x80080082 : IE U
> >   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
> >   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
> >   [-----other-info-----]
> > _____________________________
> 
> The prints I'm seeing now, for a segv from NULL pointer access is even more
> confusing !
> There's a mixup of prints....
> 
> -------------------->8--------------------
> Path: /segv
> CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> 
> [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
> [EFA   ]: 0x00000000
> [BLINK ]: 0x20047bb0
> [ERET  ]: 0x103ac
>     @off 0x103ac in [/segv]
>     VMA: 0x00010000 to 0x00012000
> 
> potentially unexpected fatal signal 11.
> [STAT32]: 0x80080882 : IE U    
> BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
> r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
> ...
> -------------------->8--------------------
> 
> and for the process killed by signal 8, we get below.
> 
> -------------------->8--------------------
> [ARCLinux]# kill -8 71
> [ARCLinux]# potentially unexpected fatal signal 8.
> [STAT32]: 0x80080882 : IE U    
> BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
> LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
> r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
> r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
> ...
> 
> 
> [1]+  Floating point exception   ./sleep
> -------------------->8--------------------
> I'm not sure whats the improvement here vs. the status quo.

Why do you think this is confusing?
The main change is that we don't print exception registers for signal based kill.

Moreover, new behavior is more like x86-64 behavior. See the example:

NULL pointer access on x86-64:
-------------------->8--------------------
hell[3925]: segfault at 0 ip 00007f621eb56c41 sp 00007ffcaa5951b8 error 4 in libc-2.26.so[7f621ea01000+1ac000]
potentially unexpected fatal signal 11.
CPU: 1 PID: 3925 Comm: hell Not tainted 4.16.13-200.fc27.x86_64 #1
Hardware name: Dell Inc. Latitude 7480/00F6D3, BIOS 1.9.3 03/09/2018
RIP: 0033:0x7f621eb56c41
RSP: 002b:00007ffcaa5951b8 EFLAGS: 00010283
RAX: 000000000000000e RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000016eb260 RDI: 0000000000000000
RBP: 00007ffcaa5951f0 R08: 0000000000000003 R09: 0000000000000077
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffcaa5952d0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f621efbf4c0(0000) GS:ffff8f6abe480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000004556a4005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
-------------------->8--------------------

Process killed by signal 11 on x86-64:
-------------------->8--------------------
potentially unexpected fatal signal 11.
CPU: 0 PID: 3998 Comm: sh Not tainted 4.16.13-200.fc27.x86_64 #1
Hardware name: Dell Inc. Latitude 7480/00F6D3, BIOS 1.9.3 03/09/2018
RIP: 0033:0x7fc0b13d744e
RSP: 002b:00007fffba851e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: 0000000000000003 RBX: 000055b3e21256a0 RCX: 00007fc0b13d744e
RDX: 0000000000000241 RSI: 000055b3e2125540 RDI: ffffffffffffff9c
RBP: 00007fffba851f40 R08: 0000000000000020 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000a
R13: 0000000000000003 R14: 0000000000000001 R15: 000055b3e2125540
FS:  00007fc0b1cdb740(0000) GS:ffff8f6abe400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff33d403000 CR3: 00000003dc7ea004 CR4: 00000000003606f0
-------------------->8--------------------


NOTE: in both cases (ARC and x86-64) "/proc/sys/kernel/print-fatal-signals" was
set to "1"

-- 
 Eugeniy Paltsev

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message
  2018-06-29 19:39 ` Eugeniy Paltsev
@ 2018-07-02 17:57   ` Vineet Gupta
  -1 siblings, 0 replies; 14+ messages in thread
From: Vineet Gupta @ 2018-07-02 17:57 UTC (permalink / raw)
  To: Eugeniy Paltsev, linux-snps-arc; +Cc: linux-kernel, Alexey Brodkin, Al Viro

+CC Al

On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> We process signals in the end of syscall/exception handler.
> It the signal is fatal we print register's content using
> show_regs function. show_regs() also prints information about
> last exception happened.
>
> In case of multicore system we can catch the situation when we
> will print wrong information about exception. See the example:
> ______________________________
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handle. Started to process signal before
>        returnig to userspace. Process signal, which is send from
>        CPU-0. As th signal is fatal we call show_regs().
>        show_regs() will show information about last exception
>        which is *page fault* (instead of "trap" which is used for
>        signals and happened on CPU-0)
>
> So we will get message like this:
>     /home/waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
>
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>   [EFA   ]: 0x00000000
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
>     @off 0x123ec in [/home/waitpid02]
>     VMA: 0x00010000 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>   [-----other-info-----]
>
> This message is confusing because it show information about page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to signal.

Agreed this is misleading. @Al, is there a way to identify process termination
from signals because it did something wrong vs. say unhandled signal. For former,
we want to dump additional info in show_regs() such as PC / Fault addres etc and
not in other scenario.

> This situation was reproduced with waitpid02 LTP test.
> _____________________________
>
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()
>
> Now we don't print information about exceptions if signal is simply
> send by another userspace app. So in case of waitpid02 we will print
> next message:
> _____________________________
>     ./waitpid02
>   potentially unexpected fatal signal 8.
>   [STAT32]: 0x80080082 : IE U
>   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
>   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
>   [-----other-info-----]
> _____________________________

The prints I'm seeing now, for a segv from NULL pointer access is even more
confusing !
There's a mixup of prints....

-------------------->8--------------------
Path: /segv
CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412

[ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
[EFA   ]: 0x00000000
[BLINK ]: 0x20047bb0
[ERET  ]: 0x103ac
    @off 0x103ac in [/segv]
    VMA: 0x00010000 to 0x00012000

potentially unexpected fatal signal 11.
[STAT32]: 0x80080882 : IE U    
BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
...
-------------------->8--------------------

and for the process killed by signal 8, we get below.

-------------------->8--------------------
[ARCLinux]# kill -8 71
[ARCLinux]# potentially unexpected fatal signal 8.
[STAT32]: 0x80080882 : IE U    
BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
...


[1]+  Floating point exception   ./sleep
-------------------->8--------------------

I'm not sure whats the improvement here vs. the status quo.

For signal based kill, we don't want to dump the extra registers and if any, we
might still want to print the PC where the process was last seen in user mode to
give user of idea what it was doing at the time.

-Vineet

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2018-07-02 17:57   ` Vineet Gupta
  0 siblings, 0 replies; 14+ messages in thread
From: Vineet Gupta @ 2018-07-02 17:57 UTC (permalink / raw)
  To: linux-snps-arc

+CC Al

On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> We process signals in the end of syscall/exception handler.
> It the signal is fatal we print register's content using
> show_regs function. show_regs() also prints information about
> last exception happened.
>
> In case of multicore system we can catch the situation when we
> will print wrong information about exception. See the example:
> ______________________________
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handle. Started to process signal before
>        returnig to userspace. Process signal, which is send from
>        CPU-0. As th signal is fatal we call show_regs().
>        show_regs() will show information about last exception
>        which is *page fault* (instead of "trap" which is used for
>        signals and happened on CPU-0)
>
> So we will get message like this:
>     /home/waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
>
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
>   [EFA   ]: 0x00000000
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
>     @off 0x123ec in [/home/waitpid02]
>     VMA: 0x00010000 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
>   [-----other-info-----]
>
> This message is confusing because it show information about page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to signal.

Agreed this is misleading. @Al, is there a way to identify process termination
from signals because it did something wrong vs. say unhandled signal. For former,
we want to dump additional info in show_regs() such as PC / Fault addres etc and
not in other scenario.

> This situation was reproduced with waitpid02 LTP test.
> _____________________________
>
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()
>
> Now we don't print information about exceptions if signal is simply
> send by another userspace app. So in case of waitpid02 we will print
> next message:
> _____________________________
>     ./waitpid02
>   potentially unexpected fatal signal 8.
>   [STAT32]: 0x80080082 : IE U
>   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
>   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
>   [-----other-info-----]
> _____________________________

The prints I'm seeing now, for a segv from NULL pointer access is even more
confusing !
There's a mixup of prints....

-------------------->8--------------------
Path: /segv
CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412

[ECR?? ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
[EFA?? ]: 0x00000000
[BLINK ]: 0x20047bb0
[ERET? ]: 0x103ac
??? @off 0x103ac in [/segv]
??? VMA: 0x00010000 to 0x00012000

potentially unexpected fatal signal 11.
[STAT32]: 0x80080882 : IE U????
BTA: 0x00010398??? ?SP: 0x5fc95e1c??? ?FP: 0x5fc95e20
LPS: 0x20039ffc??? LPE: 0x2003a000??? LPC: 0x00000000
r00: 0x00000001??? r01: 0x5fc95e94??? r02: 0x00000000???
r03: 0x00000064??? r04: 0x80808080??? r05: 0x2f2f2f2f???
...
-------------------->8--------------------

and for the process killed by signal 8, we get below.

-------------------->8--------------------
[ARCLinux]# kill -8 71
[ARCLinux]# potentially unexpected fatal signal 8.
[STAT32]: 0x80080882 : IE U????
BTA: 0x20020660??? ?SP: 0x5fbcddec??? ?FP: 0x5fbcde1c
LPS: 0x20039ffc??? LPE: 0x2003a000??? LPC: 0x00000000
r00: 0xfffffdfc??? r01: 0x5fbcddf0??? r02: 0x00000000???
r03: 0x00000008??? r04: 0x80808080??? r05: 0x2f2f2f2f???
r06: 0x7a2f5f4a??? r07: 0x00000000??? r08: 0x00000065???
...


[1]+? Floating point exception?? ./sleep
-------------------->8--------------------

I'm not sure whats the improvement here vs. the status quo.

For signal based kill, we don't want to dump the extra registers and if any, we
might still want to print the PC where the process was last seen in user mode to
give user of idea what it was doing at the time.

-Vineet

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2018-06-29 19:39 ` Eugeniy Paltsev
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2018-06-29 19:39 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: linux-kernel, Vineet Gupta, Alexey Brodkin, Eugeniy Paltsev

We process signals in the end of syscall/exception handler.
It the signal is fatal we print register's content using
show_regs function. show_regs() also prints information about
last exception happened.

In case of multicore system we can catch the situation when we
will print wrong information about exception. See the example:
______________________________
CPU-0: started to handle page fault
CPU-1: sent signal to process, which is executed on CPU-0
CPU-0: ended page fault handle. Started to process signal before
       returnig to userspace. Process signal, which is send from
       CPU-0. As th signal is fatal we call show_regs().
       show_regs() will show information about last exception
       which is *page fault* (instead of "trap" which is used for
       signals and happened on CPU-0)

So we will get message like this:
    /home/waitpid02
  potentially unexpected fatal signal 8.
  Path: /home/waitpid02
  CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
  task: 9f11c200 task.stack: 9f3ae000

  [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
  [EFA   ]: 0x00000000
  [BLINK ]: 0x123ea
  [ERET  ]: 0x123ec
    @off 0x123ec in [/home/waitpid02]
    VMA: 0x00010000 to 0x00016000
  [STAT32]: 0x80080882 : IE U
  BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
  LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
  [-----other-info-----]

This message is confusing because it show information about page fault
( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
to signal.

This situation was reproduced with waitpid02 LTP test.
_____________________________

So remove printing information about exceptions from show_regs()
to avoid confusing messages. Print information about exceptions
only in required places instead of show_regs()

Now we don't print information about exceptions if signal is simply
send by another userspace app. So in case of waitpid02 we will print
next message:
_____________________________
    ./waitpid02
  potentially unexpected fatal signal 8.
  [STAT32]: 0x80080082 : IE U
  BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
  LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
  [-----other-info-----]
_____________________________

This patch fix
STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
---
 arch/arc/include/asm/bug.h     |  1 +
 arch/arc/kernel/traps.c        |  1 +
 arch/arc/kernel/troubleshoot.c | 27 ++++++++++++++++++++-------
 arch/arc/mm/fault.c            |  2 ++
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
index 21ec82466d62..b9fd60f7d36b 100644
--- a/arch/arc/include/asm/bug.h
+++ b/arch/arc/include/asm/bug.h
@@ -16,6 +16,7 @@
 struct task_struct;
 
 void show_regs(struct pt_regs *regs);
+void show_exception_regs(struct pt_regs *regs);
 void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
 void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 			    unsigned long address);
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index b123558bf0bb..fe41f0d9488f 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -50,6 +50,7 @@ unhandled_exception(const char *str, struct pt_regs *regs, siginfo_t *info)
 		tsk->thread.fault_address = (__force unsigned int)info->si_addr;
 
 		force_sig_info(info->si_signo, info, tsk);
+		show_exception_regs(regs);
 
 	} else {
 		/* If not due to copy_(to|from)_user, we are doomed */
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index 783b20354f8b..313aea25e638 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -170,14 +170,9 @@ static void show_ecr_verbose(struct pt_regs *regs)
 	}
 }
 
-/************************************************************************
- *  API called by rest of kernel
- ***********************************************************************/
-
-void show_regs(struct pt_regs *regs)
+void show_exception_regs(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
-	struct callee_regs *cregs;
 	char *buf;
 
 	buf = (char *)__get_free_page(GFP_KERNEL);
@@ -190,12 +185,28 @@ void show_regs(struct pt_regs *regs)
 	show_ecr_verbose(regs);
 
 	pr_info("[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
-		current->thread.fault_address,
+		tsk->thread.fault_address,
 		(void *)regs->blink, (void *)regs->ret);
 
 	if (user_mode(regs))
 		show_faulting_vma(regs->ret, buf); /* faulting code, not data */
 
+	free_page((unsigned long)buf);
+}
+
+/************************************************************************
+ *  API called by rest of kernel
+ ***********************************************************************/
+
+void show_regs(struct pt_regs *regs)
+{
+	struct callee_regs *cregs;
+	char *buf;
+
+	buf = (char *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		return;
+
 	pr_info("[STAT32]: 0x%08lx", regs->status32);
 
 #define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
@@ -238,6 +249,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 	/* Show fault description */
 	pr_info("\n%s\n", str);
 
+	show_exception_regs(regs);
+
 	/* Caller and Callee regs */
 	show_regs(regs);
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6d030d..7e0e2591bb27 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -200,6 +200,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 		/* info.si_code has been set above */
 		info.si_addr = (void __user *)address;
 		force_sig_info(SIGSEGV, &info, tsk);
+		show_exception_regs(regs);
 		return;
 	}
 
@@ -239,4 +240,5 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	info.si_code = BUS_ADRERR;
 	info.si_addr = (void __user *)address;
 	force_sig_info(SIGBUS, &info, tsk);
+	show_exception_regs(regs);
 }
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] ARC: prevent showing irrelevant exception info in signal message
@ 2018-06-29 19:39 ` Eugeniy Paltsev
  0 siblings, 0 replies; 14+ messages in thread
From: Eugeniy Paltsev @ 2018-06-29 19:39 UTC (permalink / raw)
  To: linux-snps-arc

We process signals in the end of syscall/exception handler.
It the signal is fatal we print register's content using
show_regs function. show_regs() also prints information about
last exception happened.

In case of multicore system we can catch the situation when we
will print wrong information about exception. See the example:
______________________________
CPU-0: started to handle page fault
CPU-1: sent signal to process, which is executed on CPU-0
CPU-0: ended page fault handle. Started to process signal before
       returnig to userspace. Process signal, which is send from
       CPU-0. As th signal is fatal we call show_regs().
       show_regs() will show information about last exception
       which is *page fault* (instead of "trap" which is used for
       signals and happened on CPU-0)

So we will get message like this:
    /home/waitpid02
  potentially unexpected fatal signal 8.
  Path: /home/waitpid02
  CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
  task: 9f11c200 task.stack: 9f3ae000

  [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
  [EFA   ]: 0x00000000
  [BLINK ]: 0x123ea
  [ERET  ]: 0x123ec
    @off 0x123ec in [/home/waitpid02]
    VMA: 0x00010000 to 0x00016000
  [STAT32]: 0x80080882 : IE U
  BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
  LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
  [-----other-info-----]

This message is confusing because it show information about page fault
( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
to signal.

This situation was reproduced with waitpid02 LTP test.
_____________________________

So remove printing information about exceptions from show_regs()
to avoid confusing messages. Print information about exceptions
only in required places instead of show_regs()

Now we don't print information about exceptions if signal is simply
send by another userspace app. So in case of waitpid02 we will print
next message:
_____________________________
    ./waitpid02
  potentially unexpected fatal signal 8.
  [STAT32]: 0x80080082 : IE U
  BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
  LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
  [-----other-info-----]
_____________________________

This patch fix
STAR 9001146055: waitpid02: Invalid Write @ 0x00000000 by insn @ 0x000123ec

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev at synopsys.com>
---
 arch/arc/include/asm/bug.h     |  1 +
 arch/arc/kernel/traps.c        |  1 +
 arch/arc/kernel/troubleshoot.c | 27 ++++++++++++++++++++-------
 arch/arc/mm/fault.c            |  2 ++
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
index 21ec82466d62..b9fd60f7d36b 100644
--- a/arch/arc/include/asm/bug.h
+++ b/arch/arc/include/asm/bug.h
@@ -16,6 +16,7 @@
 struct task_struct;
 
 void show_regs(struct pt_regs *regs);
+void show_exception_regs(struct pt_regs *regs);
 void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
 void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 			    unsigned long address);
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index b123558bf0bb..fe41f0d9488f 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -50,6 +50,7 @@ unhandled_exception(const char *str, struct pt_regs *regs, siginfo_t *info)
 		tsk->thread.fault_address = (__force unsigned int)info->si_addr;
 
 		force_sig_info(info->si_signo, info, tsk);
+		show_exception_regs(regs);
 
 	} else {
 		/* If not due to copy_(to|from)_user, we are doomed */
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index 783b20354f8b..313aea25e638 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -170,14 +170,9 @@ static void show_ecr_verbose(struct pt_regs *regs)
 	}
 }
 
-/************************************************************************
- *  API called by rest of kernel
- ***********************************************************************/
-
-void show_regs(struct pt_regs *regs)
+void show_exception_regs(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
-	struct callee_regs *cregs;
 	char *buf;
 
 	buf = (char *)__get_free_page(GFP_KERNEL);
@@ -190,12 +185,28 @@ void show_regs(struct pt_regs *regs)
 	show_ecr_verbose(regs);
 
 	pr_info("[EFA   ]: 0x%08lx\n[BLINK ]: %pS\n[ERET  ]: %pS\n",
-		current->thread.fault_address,
+		tsk->thread.fault_address,
 		(void *)regs->blink, (void *)regs->ret);
 
 	if (user_mode(regs))
 		show_faulting_vma(regs->ret, buf); /* faulting code, not data */
 
+	free_page((unsigned long)buf);
+}
+
+/************************************************************************
+ *  API called by rest of kernel
+ ***********************************************************************/
+
+void show_regs(struct pt_regs *regs)
+{
+	struct callee_regs *cregs;
+	char *buf;
+
+	buf = (char *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		return;
+
 	pr_info("[STAT32]: 0x%08lx", regs->status32);
 
 #define STS_BIT(r, bit)	r->status32 & STATUS_##bit##_MASK ? #bit" " : ""
@@ -238,6 +249,8 @@ void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
 	/* Show fault description */
 	pr_info("\n%s\n", str);
 
+	show_exception_regs(regs);
+
 	/* Caller and Callee regs */
 	show_regs(regs);
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6d030d..7e0e2591bb27 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -200,6 +200,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 		/* info.si_code has been set above */
 		info.si_addr = (void __user *)address;
 		force_sig_info(SIGSEGV, &info, tsk);
+		show_exception_regs(regs);
 		return;
 	}
 
@@ -239,4 +240,5 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	info.si_code = BUS_ADRERR;
 	info.si_addr = (void __user *)address;
 	force_sig_info(SIGBUS, &info, tsk);
+	show_exception_regs(regs);
 }
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-01-21 21:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21 17:07 [PATCH] ARC: prevent showing irrelevant exception info in signal message Eugeniy Paltsev
2019-01-21 17:07 ` Eugeniy Paltsev
2019-01-21 21:56 ` Vineet Gupta
2019-01-21 21:56   ` Vineet Gupta
  -- strict thread matches above, loose matches on Subject: below --
2018-06-29 19:39 Eugeniy Paltsev
2018-06-29 19:39 ` Eugeniy Paltsev
2018-07-02 17:57 ` Vineet Gupta
2018-07-02 17:57   ` Vineet Gupta
2018-07-03 10:57   ` Eugeniy Paltsev
2018-07-03 10:57     ` Eugeniy Paltsev
2018-07-05 21:26     ` Vineet Gupta
2018-07-05 21:26       ` Vineet Gupta
2018-07-06 13:05       ` Eugeniy Paltsev
2018-07-06 13:05         ` Eugeniy Paltsev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.