All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1
@ 2018-03-06  9:49 Borislav Petkov
  2018-03-06  9:49 ` [PATCH 1/9] panic: Add closing panic marker parenthesis Borislav Petkov
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

Hi,

here's v2 of the dumpstack cleanups.

I've split them into more fine-grained pieces to show each change. The
relevant parts are the saving of the executive registers of the first
time we oops and dumping them in the end + opcode bytes for user faults.
I've tested splats in a 80x25 screen and the registers, RIP and opcode
bytes fit all in.

I'm adding exemplary dumps from 32-bit and 64-bit at the end of this mail.

I still have on my TODO list to experiment with console log levels and
see whether we can do a best-of-both-worlds thing there.

Thx.

Borislav Petkov (9):
  panic: Add closing panic marker parenthesis
  x86/fault: Do not print IP in show_fault_oops()
  x86/dumpstack: Unify show_regs()
  x86/dumpstack: Carve out Code: dumping into a function
  x86/dumpstack: Improve opcodes dumping in the Code: section
  x86/dumpstack: Add loglevel argument to show_opcodes()
  x86/fault: Dump user opcode bytes on fatal faults
  x86/dumpstack: Add a show_ip() function
  x86/dumpstack: Save first regs set for the executive summary

 arch/x86/include/asm/stacktrace.h |   4 +-
 arch/x86/kernel/dumpstack.c       | 106 ++++++++++++++++++++++++++++++--------
 arch/x86/kernel/dumpstack_32.c    |  42 ---------------
 arch/x86/kernel/dumpstack_64.c    |  42 ---------------
 arch/x86/kernel/process_32.c      |   4 +-
 arch/x86/mm/fault.c               |   8 +--
 kernel/panic.c                    |   2 +-
 7 files changed, 93 insertions(+), 115 deletions(-)



Changelog:

v0:

Hi,

so I've been thinking about doing this for a while now: be able to dump
the opcode bytes around the user rIP just like we do for kernel faults.

Why?

See patch 5's commit message. That's why I've marked it RFC.

The rest is cleanups: we're copying the opcodes byte-by-byte and that's
just wasteful.

Also, we're using probe_kernel_read() underneath and it does
__copy_from_user_inatomic() which makes copying user opcode bytes
trivial.

With that, it looks like this:

[  696.837457] strsep[1733]: segfault at 40066b ip 00007fad558fccf8 sp 00007ffc5e662520 error 7 in libc-2.26.so[7fad55876000+1ad000]
[  696.837538] Code: 1b 48 89 fd 48 89 df e8 77 99 f9 ff 48 01 d8 80 38 00 75 17 48 c7 45 00 00 00 00 00 48 83 c4 08 48 89 d8 5b 5d c3 0f 1f 44 00 00 <c6> 00 00 48 83 c0 01 48 89 45 00 48 83 c4 08 48 89 d8 5b 5d c3

and the code matches, as expected:

0000000000086cc0 <__strsep_g@@GLIBC_2.2.5>:
   86cc0:       55                      push   %rbp
   86cc1:       53                      push   %rbx
   86cc2:       48 83 ec 08             sub    $0x8,%rsp
   86cc6:       48 8b 1f                mov    (%rdi),%rbx
   86cc9:       48 85 db                test   %rbx,%rbx
   86ccc:       74 1b                   je     86ce9 <__strsep_g@@GLIBC_2.2.5+0x29>
   86cce:       48 89 fd                mov    %rdi,%rbp
   86cd1:       48 89 df                mov    %rbx,%rdi
   86cd4:       e8 77 99 f9 ff          callq  20650 <*ABS*+0x854e0@plt>
   86cd9:       48 01 d8                add    %rbx,%rax
   86cdc:       80 38 00                cmpb   $0x0,(%rax)
   86cdf:       75 17                   jne    86cf8 <__strsep_g@@GLIBC_2.2.5+0x38>
   86ce1:       48 c7 45 00 00 00 00    movq   $0x0,0x0(%rbp)
   86ce8:       00 
   86ce9:       48 83 c4 08             add    $0x8,%rsp
   86ced:       48 89 d8                mov    %rbx,%rax
   86cf0:       5b                      pop    %rbx
   86cf1:       5d                      pop    %rbp
   86cf2:       c3                      retq   
   86cf3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   86cf8:       c6 00 00                movb   $0x0,(%rax)
   86cfb:       48 83 c0 01             add    $0x1,%rax
   86cff:       48 89 45 00             mov    %rax,0x0(%rbp)
   86d03:       48 83 c4 08             add    $0x8,%rsp
   86d07:       48 89 d8                mov    %rbx,%rax
   86d0a:       5b                      pop    %rbx
   86d0b:       5d                      pop    %rbp
   86d0c:       c3                      retq

Comments and suggestions are welcome!

Thx.

Example dumps:

64-bit:

[   34.099980] sysrq: SysRq : Trigger a crash
[   34.101275] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[   34.102843] PGD 799c8067 P4D 799c8067 PUD 7a1a1067 PMD 0 
[   34.103946] Oops: 0002 [#1] PREEMPT SMP
[   34.104828] CPU: 7 PID: 3674 Comm: bash Not tainted 4.16.0-rc3+ #4
[   34.105252] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   34.105252] RIP: 0010:sysrq_handle_crash+0x17/0x20
[   34.105252] Code: eb d1 e8 dd 15 b7 ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 e8 d6 24 bd ff c7 05 a4 20 19 01 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c3 0f 1f 44 00 00 e8 56 22 c2 ff fb e9 
[   34.105252] RSP: 0018:ffffc90001b8fdf0 EFLAGS: 00010246
[   34.105252] RAX: 0000000000000000 RBX: 0000000000000063 RCX: 0000000000000000
[   34.105252] RDX: 0000000000000000 RSI: ffffffff8110156a RDI: 0000000000000063
[   34.105252] RBP: ffffffff822714c0 R08: 0000000000000183 R09: 000000000001868c
[   34.105252] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a
[   34.105252] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   34.105252] FS:  00007ffff7fdb700(0000) GS:ffff88007edc0000(0000) knlGS:0000000000000000
[   34.105252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.105252] CR2: 0000000000000000 CR3: 000000007a1e4000 CR4: 00000000000406e0
[   34.105252] Call Trace:
[   34.105252]  __handle_sysrq+0x9e/0x160
[   34.105252]  write_sysrq_trigger+0x2b/0x30
[   34.105252]  proc_reg_write+0x38/0x70
[   34.105252]  __vfs_write+0x36/0x160
[   34.105252]  ? __fd_install+0x69/0x110
[   34.105252]  ? preempt_count_add+0x74/0xb0
[   34.105252]  ? _raw_spin_lock+0x13/0x30
[   34.105252]  ? set_close_on_exec+0x41/0x80
[   34.105252]  ? preempt_count_sub+0xa8/0x100
[   34.105252]  vfs_write+0xc0/0x190
[   34.105252]  SyS_write+0x64/0xe0
[   34.105252]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   34.105252]  do_syscall_64+0x70/0x130
[   34.105252]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[   34.105252] RIP: 0033:0x7ffff74b9620
[   34.105252] Code: ff 73 01 c3 48 8b 0d 68 98 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d bd f1 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 
[   34.105252] RSP: 002b:00007fffffffe638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   34.105252] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ffff74b9620
[   34.105252] RDX: 0000000000000002 RSI: 0000000000705408 RDI: 0000000000000001
[   34.105252] RBP: 0000000000705408 R08: 000000000000000a R09: 00007ffff7fdb700
[   34.105252] R10: 00007fffffffe490 R11: 0000000000000246 R12: 00007ffff77842a0
[   34.105252] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[   34.105252] Modules linked in:
[   34.105252] CR2: 0000000000000000
[   34.139401] ---[ end trace 6f07cfbc39b079a5 ]---
[   34.140257] RIP: 0010:sysrq_handle_crash+0x17/0x20
[   34.141092] Code: eb d1 e8 dd 15 b7 ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 e8 d6 24 bd ff c7 05 a4 20 19 01 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c3 0f 1f 44 00 00 e8 56 22 c2 ff fb e9 
[   34.143853] RSP: 0018:ffffc90001b8fdf0 EFLAGS: 00010246
[   34.144786] RAX: 0000000000000000 RBX: 0000000000000063 RCX: 0000000000000000
[   34.145890] RDX: 0000000000000000 RSI: ffffffff8110156a RDI: 0000000000000063
[   34.146875] RBP: ffffffff822714c0 R08: 0000000000000183 R09: 000000000001868c
[   34.147841] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a
[   34.149001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   34.150089] Kernel panic - not syncing: Fatal exception
[   34.151214] Kernel Offset: disabled
[   34.151727] ---[ end Kernel panic - not syncing: Fatal exception ]---

32-bit:

[   14.915957] sysrq: SysRq : Trigger a crash
[   14.918576] BUG: unable to handle kernel NULL pointer dereference at 00000000
[   14.922275] *pde = 00000000 
[   14.922512] Oops: 0002 [#1] PREEMPT SMP
[   14.922512] CPU: 2 PID: 2070 Comm: bash Not tainted 4.16.0-rc3+ #5
[   14.922512] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   14.922512] EIP: sysrq_handle_crash+0x1d/0x30
[   14.922512] Code: bf ff eb d6 e8 45 1c ba ff 90 8d 74 26 00 0f 1f 44 00 00 55 89 e5 e8 d3 0a c0 ff c7 05 94 72 c1 c1 01 00 00 00 0f ae f8 0f 1f 00 <c6> 05 00 00 00 00 01 5d c3 8d 76 00 8d bc 27 00 00 00 00 0f 1f 
[   14.922512] EAX: 80000000 EBX: 0000000a ECX: 00000000 EDX: c110a13d
[   14.922512] ESI: 00000063 EDI: 00000000 EBP: f3edfe8c ESP: f3edfe8c
[   14.922512]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   14.922512] CR0: 80050033 CR2: 00000000 CR3: 33710000 CR4: 000406d0
[   14.922512] Call Trace:
[   14.922512]  __handle_sysrq+0x93/0x130
[   14.922512]  ? sysrq_filter+0x3c0/0x3c0
[   14.922512]  write_sysrq_trigger+0x27/0x40
[   14.922512]  proc_reg_write+0x4d/0x80
[   14.922512]  ? proc_reg_poll+0x70/0x70
[   14.922512]  __vfs_write+0x38/0x160
[   14.922512]  ? preempt_count_sub+0xa0/0x110
[   14.922512]  ? __fd_install+0x51/0xd0
[   14.922512]  ? __sb_start_write+0x4c/0xc0
[   14.922512]  ? preempt_count_sub+0xa0/0x110
[   14.922512]  vfs_write+0x98/0x180
[   14.922512]  SyS_write+0x4f/0xb0
[   14.922512]  do_fast_syscall_32+0x99/0x200
[   14.922512]  entry_SYSENTER_32+0x53/0x86
[   14.922512] EIP: 0xb7f33b35
[   14.922512] Code: ff 89 e5 8b 55 08 8b 80 64 cd ff ff 85 d2 74 02 89 02 5d c3 8b 04 24 c3 8b 0c 24 c3 8b 1c 24 c3 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 
[   14.922512] EAX: ffffffda EBX: 00000001 ECX: 08bbba08 EDX: 00000002
[   14.990702] ESI: 00000002 EDI: b7efed80 EBP: 08bbba08 ESP: bfd09980
[   14.990702]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[   14.990702] Modules linked in:
[   14.990702] CR2: 0000000000000000
[   14.991604] ---[ end trace 2355d48bbc4bb91b ]---
[   14.991607] EIP: sysrq_handle_crash+0x1d/0x30
[   14.991608] Code: bf ff eb d6 e8 45 1c ba ff 90 8d 74 26 00 0f 1f 44 00 00 55 89 e5 e8 d3 0a c0 ff c7 05 94 72 c1 c1 01 00 00 00 0f ae f8 0f 1f 00 <c6> 05 00 00 00 00 01 5d c3 8d 76 00 8d bc 27 00 00 00 00 0f 1f 
[   15.007713] EAX: 80000000 EBX: 0000000a ECX: 00000000 EDX: c110a13d
[   15.009676] ESI: 00000063 EDI: 00000000 EBP: f3edfe8c ESP: c1c0d87c
[   15.011644]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   15.017438] Kernel panic - not syncing: Fatal exception
[   15.021398] Kernel Offset: disabled
[   15.021398] ---[ end Kernel panic - not syncing: Fatal exception ]---

-- 
2.13.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/9] panic: Add closing panic marker parenthesis
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-08 11:03   ` [tip:core/core] " tip-bot for Borislav Petkov
  2018-03-06  9:49 ` [PATCH 2/9] x86/fault: Do not print IP in show_fault_oops() Borislav Petkov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

Otherwise it looks unbalanced.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 kernel/panic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..9fb023d0cae1 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -289,7 +289,7 @@ void panic(const char *fmt, ...)
 		disabled_wait(caller);
 	}
 #endif
-	pr_emerg("---[ end Kernel panic - not syncing: %s\n", buf);
+	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/9] x86/fault: Do not print IP in show_fault_oops()
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
  2018-03-06  9:49 ` [PATCH 1/9] panic: Add closing panic marker parenthesis Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-08 11:09   ` [tip:x86/cleanups] " tip-bot for Borislav Petkov
  2018-03-06  9:49 ` [PATCH 3/9] x86/dumpstack: Unify show_regs() Borislav Petkov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

... because __show_regs() already does that.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/fault.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e6af2b464c3d..26865147a507 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -699,7 +699,6 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 		printk(KERN_CONT "paging request");
 
 	printk(KERN_CONT " at %px\n", (void *) address);
-	printk(KERN_ALERT "IP: %pS\n", (void *)regs->ip);
 
 	dump_pagetable(address);
 }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/9] x86/dumpstack: Unify show_regs()
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
  2018-03-06  9:49 ` [PATCH 1/9] panic: Add closing panic marker parenthesis Borislav Petkov
  2018-03-06  9:49 ` [PATCH 2/9] x86/fault: Do not print IP in show_fault_oops() Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-08 11:10   ` [tip:x86/cleanups] " tip-bot for Borislav Petkov
  2018-03-06  9:49 ` [PATCH 4/9] x86/dumpstack: Carve out Code: dumping into a function Borislav Petkov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

The 32-bit version uses KERN_EMERG and commit

  b0f4c4b32c8e ("bugs, x86: Fix printk levels for panic, softlockups and stack dumps")

changed the 64-bit version to KERN_DEFAULT. The same justification in
that commit that those messages do not belong in the terminal, holds
true for 32-bit also, so make it so.

Make code_bytes static, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/stacktrace.h |  2 --
 arch/x86/kernel/dumpstack.c       | 49 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/dumpstack_32.c    | 42 ---------------------------------
 arch/x86/kernel/dumpstack_64.c    | 42 ---------------------------------
 4 files changed, 48 insertions(+), 87 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index f73706878772..133d9425fced 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -87,8 +87,6 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, char *log_lvl);
 
-extern unsigned int code_bytes;
-
 /* The form of the top of the frame on the stack */
 struct stack_frame {
 	struct stack_frame *next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index a2d8a3908670..18fa9d74c182 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -24,7 +24,7 @@
 
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
-unsigned int code_bytes = 64;
+static unsigned int code_bytes = 64;
 static int die_counter;
 
 bool in_task_stack(unsigned long *stack, struct task_struct *task,
@@ -375,3 +375,50 @@ static int __init code_bytes_setup(char *s)
 	return 1;
 }
 __setup("code_bytes=", code_bytes_setup);
+
+void show_regs(struct pt_regs *regs)
+{
+	bool all = true;
+	int i;
+
+	show_regs_print_info(KERN_DEFAULT);
+
+	if (IS_ENABLED(CONFIG_X86_32))
+		all = !user_mode(regs);
+
+	__show_regs(regs, all);
+
+	/*
+	 * When in-kernel, we also print out the stack and code at the
+	 * time of the fault..
+	 */
+	if (!user_mode(regs)) {
+		unsigned int code_prologue = code_bytes * 43 / 64;
+		unsigned int code_len = code_bytes;
+		unsigned char c;
+		u8 *ip;
+
+		show_trace_log_lvl(current, regs, NULL, KERN_DEFAULT);
+
+		printk(KERN_DEFAULT "Code: ");
+
+		ip = (u8 *)regs->ip - code_prologue;
+		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
+			/* try starting at IP */
+			ip = (u8 *)regs->ip;
+			code_len = code_len - code_prologue + 1;
+		}
+		for (i = 0; i < code_len; i++, ip++) {
+			if (ip < (u8 *)PAGE_OFFSET ||
+					probe_kernel_address(ip, c)) {
+				pr_cont(" Bad RIP value.");
+				break;
+			}
+			if (ip == (u8 *)regs->ip)
+				pr_cont("<%02x> ", c);
+			else
+				pr_cont("%02x ", c);
+		}
+	}
+	pr_cont("\n");
+}
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 04170f63e3a1..cd53f3030e40 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -127,45 +127,3 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	info->type = STACK_TYPE_UNKNOWN;
 	return -EINVAL;
 }
-
-void show_regs(struct pt_regs *regs)
-{
-	int i;
-
-	show_regs_print_info(KERN_EMERG);
-	__show_regs(regs, !user_mode(regs));
-
-	/*
-	 * When in-kernel, we also print out the stack and code at the
-	 * time of the fault..
-	 */
-	if (!user_mode(regs)) {
-		unsigned int code_prologue = code_bytes * 43 / 64;
-		unsigned int code_len = code_bytes;
-		unsigned char c;
-		u8 *ip;
-
-		show_trace_log_lvl(current, regs, NULL, KERN_EMERG);
-
-		pr_emerg("Code:");
-
-		ip = (u8 *)regs->ip - code_prologue;
-		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-			/* try starting at IP */
-			ip = (u8 *)regs->ip;
-			code_len = code_len - code_prologue + 1;
-		}
-		for (i = 0; i < code_len; i++, ip++) {
-			if (ip < (u8 *)PAGE_OFFSET ||
-					probe_kernel_address(ip, c)) {
-				pr_cont("  Bad EIP value.");
-				break;
-			}
-			if (ip == (u8 *)regs->ip)
-				pr_cont(" <%02x>", c);
-			else
-				pr_cont(" %02x", c);
-		}
-	}
-	pr_cont("\n");
-}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 563e28d14f2c..5cdb9e84da57 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -149,45 +149,3 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
 	info->type = STACK_TYPE_UNKNOWN;
 	return -EINVAL;
 }
-
-void show_regs(struct pt_regs *regs)
-{
-	int i;
-
-	show_regs_print_info(KERN_DEFAULT);
-	__show_regs(regs, 1);
-
-	/*
-	 * When in-kernel, we also print out the stack and code at the
-	 * time of the fault..
-	 */
-	if (!user_mode(regs)) {
-		unsigned int code_prologue = code_bytes * 43 / 64;
-		unsigned int code_len = code_bytes;
-		unsigned char c;
-		u8 *ip;
-
-		show_trace_log_lvl(current, regs, NULL, KERN_DEFAULT);
-
-		printk(KERN_DEFAULT "Code: ");
-
-		ip = (u8 *)regs->ip - code_prologue;
-		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-			/* try starting at IP */
-			ip = (u8 *)regs->ip;
-			code_len = code_len - code_prologue + 1;
-		}
-		for (i = 0; i < code_len; i++, ip++) {
-			if (ip < (u8 *)PAGE_OFFSET ||
-					probe_kernel_address(ip, c)) {
-				pr_cont(" Bad RIP value.");
-				break;
-			}
-			if (ip == (u8 *)regs->ip)
-				pr_cont("<%02x> ", c);
-			else
-				pr_cont("%02x ", c);
-		}
-	}
-	pr_cont("\n");
-}
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/9] x86/dumpstack: Carve out Code: dumping into a function
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
                   ` (2 preceding siblings ...)
  2018-03-06  9:49 ` [PATCH 3/9] x86/dumpstack: Unify show_regs() Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-06  9:49 ` [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section Borislav Petkov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

No functionality change, carve it out into a separate function for later
changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/dumpstack.c | 57 ++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 18fa9d74c182..19a5860b62c8 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -69,6 +69,35 @@ static void printk_stack_address(unsigned long address, int reliable,
 	printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+static void show_opcodes(u8 *rip)
+{
+	unsigned int code_prologue = code_bytes * 43 / 64;
+	unsigned int code_len = code_bytes;
+	unsigned char c;
+	u8 *ip;
+	int i;
+
+	printk(KERN_DEFAULT "Code: ");
+
+	ip = (u8 *)rip - code_prologue;
+	if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
+		/* try starting at IP */
+		ip = (u8 *)rip;
+		code_len = code_len - code_prologue + 1;
+	}
+	for (i = 0; i < code_len; i++, ip++) {
+		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
+			pr_cont(" Bad RIP value.");
+			break;
+		}
+		if (ip == (u8 *)rip)
+			pr_cont("<%02x> ", c);
+		else
+			pr_cont("%02x ", c);
+	}
+	pr_cont("\n");
+}
+
 void show_iret_regs(struct pt_regs *regs)
 {
 	printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip);
@@ -379,7 +408,6 @@ __setup("code_bytes=", code_bytes_setup);
 void show_regs(struct pt_regs *regs)
 {
 	bool all = true;
-	int i;
 
 	show_regs_print_info(KERN_DEFAULT);
 
@@ -393,32 +421,7 @@ void show_regs(struct pt_regs *regs)
 	 * time of the fault..
 	 */
 	if (!user_mode(regs)) {
-		unsigned int code_prologue = code_bytes * 43 / 64;
-		unsigned int code_len = code_bytes;
-		unsigned char c;
-		u8 *ip;
-
 		show_trace_log_lvl(current, regs, NULL, KERN_DEFAULT);
-
-		printk(KERN_DEFAULT "Code: ");
-
-		ip = (u8 *)regs->ip - code_prologue;
-		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-			/* try starting at IP */
-			ip = (u8 *)regs->ip;
-			code_len = code_len - code_prologue + 1;
-		}
-		for (i = 0; i < code_len; i++, ip++) {
-			if (ip < (u8 *)PAGE_OFFSET ||
-					probe_kernel_address(ip, c)) {
-				pr_cont(" Bad RIP value.");
-				break;
-			}
-			if (ip == (u8 *)regs->ip)
-				pr_cont("<%02x> ", c);
-			else
-				pr_cont("%02x ", c);
-		}
+		show_opcodes((u8 *)regs->ip);
 	}
-	pr_cont("\n");
 }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
                   ` (3 preceding siblings ...)
  2018-03-06  9:49 ` [PATCH 4/9] x86/dumpstack: Carve out Code: dumping into a function Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-06 18:47   ` Linus Torvalds
  2018-03-06  9:49 ` [PATCH 6/9] x86/dumpstack: Add loglevel argument to show_opcodes() Borislav Petkov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

The code used to iterate byte-by-byte over the bytes around RIP and that
is expensive: disabling pagefaults around it, copy_from_user, etc...

Make it read the whole buffer of code_bytes size in one go. By default
use a statically allocated 64 bytes buffer. If "code_bytes=" is supplied
on the cmdline a new buffer gets allocated.

Also, do the PAGE_OFFSET check outside of the function because latter
will be reused in other context.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/dumpstack.c | 44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 19a5860b62c8..12ddfc9dcb01 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -22,9 +22,13 @@
 #include <asm/stacktrace.h>
 #include <asm/unwind.h>
 
+#define OPCODE_BUFSIZE 64
+
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
-static unsigned int code_bytes = 64;
+static unsigned int code_bytes = OPCODE_BUFSIZE;
+static u8 __opc[OPCODE_BUFSIZE];
+static u8 *opcodes = __opc;
 static int die_counter;
 
 bool in_task_stack(unsigned long *stack, struct task_struct *task,
@@ -71,29 +75,23 @@ static void printk_stack_address(unsigned long address, int reliable,
 
 static void show_opcodes(u8 *rip)
 {
-	unsigned int code_prologue = code_bytes * 43 / 64;
-	unsigned int code_len = code_bytes;
-	unsigned char c;
+	unsigned int code_prologue = code_bytes * 43 / OPCODE_BUFSIZE;
 	u8 *ip;
 	int i;
 
 	printk(KERN_DEFAULT "Code: ");
 
 	ip = (u8 *)rip - code_prologue;
-	if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-		/* try starting at IP */
-		ip = (u8 *)rip;
-		code_len = code_len - code_prologue + 1;
+	if (probe_kernel_read(opcodes, ip, code_bytes)) {
+		pr_cont(" Bad RIP value.\n");
+		return;
 	}
-	for (i = 0; i < code_len; i++, ip++) {
-		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-			pr_cont(" Bad RIP value.");
-			break;
-		}
+
+	for (i = 0; i < code_bytes; i++, ip++) {
 		if (ip == (u8 *)rip)
-			pr_cont("<%02x> ", c);
+			pr_cont("<%02x> ", opcodes[i]);
 		else
-			pr_cont("%02x ", c);
+			pr_cont("%02x ", opcodes[i]);
 	}
 	pr_cont("\n");
 }
@@ -387,8 +385,8 @@ void die(const char *str, struct pt_regs *regs, long err)
 
 static int __init code_bytes_setup(char *s)
 {
-	ssize_t ret;
 	unsigned long val;
+	ssize_t ret;
 
 	if (!s)
 		return -EINVAL;
@@ -401,6 +399,14 @@ static int __init code_bytes_setup(char *s)
 	if (code_bytes > 8192)
 		code_bytes = 8192;
 
+	if (code_bytes > OPCODE_BUFSIZE) {
+		u8 *new_buf = kzalloc(code_bytes, GFP_KERNEL);
+		if (!new_buf)
+			return -ENOMEM;
+
+		opcodes = new_buf;
+	}
+
 	return 1;
 }
 __setup("code_bytes=", code_bytes_setup);
@@ -422,6 +428,10 @@ void show_regs(struct pt_regs *regs)
 	 */
 	if (!user_mode(regs)) {
 		show_trace_log_lvl(current, regs, NULL, KERN_DEFAULT);
-		show_opcodes((u8 *)regs->ip);
+
+		if (regs->ip < PAGE_OFFSET)
+			pr_cont(" Bad RIP value.\n");
+		else
+			show_opcodes((u8 *)regs->ip);
 	}
 }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/9] x86/dumpstack: Add loglevel argument to show_opcodes()
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
                   ` (4 preceding siblings ...)
  2018-03-06  9:49 ` [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-06  9:49 ` [PATCH 7/9] x86/fault: Dump user opcode bytes on fatal faults Borislav Petkov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

Will be used in the next patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/stacktrace.h | 1 +
 arch/x86/kernel/dumpstack.c       | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 133d9425fced..0630eeb18bbc 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -111,4 +111,5 @@ static inline unsigned long caller_frame_pointer(void)
 	return (unsigned long)frame;
 }
 
+void show_opcodes(u8 *rip, const char *loglvl);
 #endif /* _ASM_X86_STACKTRACE_H */
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 12ddfc9dcb01..3625f79fbb42 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -73,13 +73,13 @@ static void printk_stack_address(unsigned long address, int reliable,
 	printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
-static void show_opcodes(u8 *rip)
+void show_opcodes(u8 *rip, const char *loglvl)
 {
 	unsigned int code_prologue = code_bytes * 43 / OPCODE_BUFSIZE;
 	u8 *ip;
 	int i;
 
-	printk(KERN_DEFAULT "Code: ");
+	printk("%sCode: ", loglvl);
 
 	ip = (u8 *)rip - code_prologue;
 	if (probe_kernel_read(opcodes, ip, code_bytes)) {
@@ -432,6 +432,6 @@ void show_regs(struct pt_regs *regs)
 		if (regs->ip < PAGE_OFFSET)
 			pr_cont(" Bad RIP value.\n");
 		else
-			show_opcodes((u8 *)regs->ip);
+			show_opcodes((u8 *)regs->ip, KERN_DEFAULT);
 	}
 }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/9] x86/fault: Dump user opcode bytes on fatal faults
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
                   ` (5 preceding siblings ...)
  2018-03-06  9:49 ` [PATCH 6/9] x86/dumpstack: Add loglevel argument to show_opcodes() Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-06  9:49 ` [PATCH 8/9] x86/dumpstack: Add a show_ip() function Borislav Petkov
  2018-03-06  9:49 ` [PATCH 9/9] x86/dumpstack: Save first regs set for the executive summary Borislav Petkov
  8 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

Sometimes it is useful to see which user opcode bytes RIP points to
when a fault happens: be it to rule out RIP corruption, to dump info
early during boot, when doing core dumps is impossible due to not having
writable fs yet.

Sometimes it is useful if debugging an issue and one doesn't have access
to the executable which caused the fault in order to disassemble it.

That last aspect might have some security implications so
show_unhandled_signals could be revisited for that or a new config
option added.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/fault.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 26865147a507..b3c19f734442 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -850,6 +850,8 @@ static inline void
 show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
+	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
+
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
 
@@ -857,13 +859,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		return;
 
 	printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
-		task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
-		tsk->comm, task_pid_nr(tsk), address,
+		loglvl, tsk->comm, task_pid_nr(tsk), address,
 		(void *)regs->ip, (void *)regs->sp, error_code);
 
 	print_vma_addr(KERN_CONT " in ", regs->ip);
 
 	printk(KERN_CONT "\n");
+
+	show_opcodes((u8 *)regs->ip, loglvl);
 }
 
 static void
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 8/9] x86/dumpstack: Add a show_ip() function
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
                   ` (6 preceding siblings ...)
  2018-03-06  9:49 ` [PATCH 7/9] x86/fault: Dump user opcode bytes on fatal faults Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  2018-03-06  9:49 ` [PATCH 9/9] x86/dumpstack: Save first regs set for the executive summary Borislav Petkov
  8 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

... which shows the Istruction Pointer along with the insn bytes around it.
Use it whenever we print rIP.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/stacktrace.h |  1 +
 arch/x86/kernel/dumpstack.c       | 13 ++++++++++++-
 arch/x86/kernel/process_32.c      |  4 +---
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0630eeb18bbc..b6dc698f992a 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -112,4 +112,5 @@ static inline unsigned long caller_frame_pointer(void)
 }
 
 void show_opcodes(u8 *rip, const char *loglvl);
+void show_ip(struct pt_regs *regs, const char *loglvl);
 #endif /* _ASM_X86_STACKTRACE_H */
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 3625f79fbb42..e855815b00d4 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -96,9 +96,20 @@ void show_opcodes(u8 *rip, const char *loglvl)
 	pr_cont("\n");
 }
 
+void show_ip(struct pt_regs *regs, const char *loglvl)
+{
+#ifdef CONFIG_X86_32
+	printk("%sEIP: %pS\n", loglvl, (void *)regs->ip);
+#else
+	printk("%sRIP: %04x:%pS\n", loglvl, (int)regs->cs, (void *)regs->ip);
+#endif
+
+	show_opcodes((u8 *)regs->ip, loglvl);
+}
+
 void show_iret_regs(struct pt_regs *regs)
 {
-	printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip);
+	show_ip(regs, KERN_DEFAULT);
 	printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss,
 		regs->sp, regs->flags);
 }
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5224c6099184..3d1f1226b972 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -76,9 +76,7 @@ void __show_regs(struct pt_regs *regs, int all)
 		savesegment(gs, gs);
 	}
 
-	printk(KERN_DEFAULT "EIP: %pS\n", (void *)regs->ip);
-	printk(KERN_DEFAULT "EFLAGS: %08lx CPU: %d\n", regs->flags,
-		raw_smp_processor_id());
+	show_ip(regs, KERN_DEFAULT);
 
 	printk(KERN_DEFAULT "EAX: %08lx EBX: %08lx ECX: %08lx EDX: %08lx\n",
 		regs->ax, regs->bx, regs->cx, regs->dx);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 9/9] x86/dumpstack: Save first regs set for the executive summary
  2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
                   ` (7 preceding siblings ...)
  2018-03-06  9:49 ` [PATCH 8/9] x86/dumpstack: Add a show_ip() function Borislav Petkov
@ 2018-03-06  9:49 ` Borislav Petkov
  8 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-06  9:49 UTC (permalink / raw)
  To: X86 ML
  Cc: Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, LKML

From: Borislav Petkov <bp@suse.de>

Save the regs set when we call __die() for the first time and print it
in oops_end().

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/dumpstack.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index e855815b00d4..409a5bd02a18 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -31,6 +31,8 @@ static u8 __opc[OPCODE_BUFSIZE];
 static u8 *opcodes = __opc;
 static int die_counter;
 
+static struct pt_regs exec_summary_regs;
+
 bool in_task_stack(unsigned long *stack, struct task_struct *task,
 		   struct stack_info *info)
 {
@@ -325,6 +327,9 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
 	raw_local_irq_restore(flags);
 	oops_exit();
 
+	/* Executive summary in case the oops scrolled away */
+	__show_regs(&exec_summary_regs, false);
+
 	if (!signr)
 		return;
 	if (in_interrupt())
@@ -343,10 +348,10 @@ NOKPROBE_SYMBOL(oops_end);
 
 int __die(const char *str, struct pt_regs *regs, long err)
 {
-#ifdef CONFIG_X86_32
-	unsigned short ss;
-	unsigned long sp;
-#endif
+	/* Save the regs of the first oops for the executive summary later. */
+	if (!die_counter)
+		exec_summary_regs = *regs;
+
 	printk(KERN_DEFAULT
 	       "%s: %04lx [#%d]%s%s%s%s%s\n", str, err & 0xffff, ++die_counter,
 	       IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT"         : "",
@@ -356,26 +361,14 @@ int __die(const char *str, struct pt_regs *regs, long err)
 	       IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION) ?
 	       (boot_cpu_has(X86_FEATURE_PTI) ? " PTI" : " NOPTI") : "");
 
+	show_regs(regs);
+
 	if (notify_die(DIE_OOPS, str, regs, err,
 			current->thread.trap_nr, SIGSEGV) == NOTIFY_STOP)
 		return 1;
 
 	print_modules();
-	show_regs(regs);
-#ifdef CONFIG_X86_32
-	if (user_mode(regs)) {
-		sp = regs->sp;
-		ss = regs->ss;
-	} else {
-		sp = kernel_stack_pointer(regs);
-		savesegment(ss, ss);
-	}
-	printk(KERN_EMERG "EIP: %pS SS:ESP: %04x:%08lx\n",
-	       (void *)regs->ip, ss, sp);
-#else
-	/* Executive summary in case the oops scrolled away */
-	printk(KERN_ALERT "RIP: %pS RSP: %016lx\n", (void *)regs->ip, regs->sp);
-#endif
+
 	return 0;
 }
 NOKPROBE_SYMBOL(__die);
@@ -442,7 +435,5 @@ void show_regs(struct pt_regs *regs)
 
 		if (regs->ip < PAGE_OFFSET)
 			pr_cont(" Bad RIP value.\n");
-		else
-			show_opcodes((u8 *)regs->ip, KERN_DEFAULT);
 	}
 }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-06  9:49 ` [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section Borislav Petkov
@ 2018-03-06 18:47   ` Linus Torvalds
  2018-03-07 10:13     ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2018-03-06 18:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra, LKML

On Tue, Mar 6, 2018 at 1:49 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> Make it read the whole buffer of code_bytes size in one go. By default
> use a statically allocated 64 bytes buffer. If "code_bytes=" is supplied
> on the cmdline a new buffer gets allocated.

Are these always serialized? For oopses, I think we end up serializing
with die_lock, but is that always the case?

Maybe at least a comment about why a static allocation is ok?

             Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-06 18:47   ` Linus Torvalds
@ 2018-03-07 10:13     ` Borislav Petkov
  2018-03-07 13:25       ` Josh Poimboeuf
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-07 10:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: X86 ML, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra, LKML

On Tue, Mar 06, 2018 at 10:47:56AM -0800, Linus Torvalds wrote:
> Are these always serialized? For oopses, I think we end up serializing
> with die_lock, but is that always the case?

Hmm, good question.

> Maybe at least a comment about why a static allocation is ok?

Well, I'm afraid it is not ok but let me show what I'm seeing - maybe
I'm wrong somewhere:

Normally, when something calls die() we do this:

die
|-> oops_begin
|-> arch_spin_lock(&die_lock)		<-- grab die_lock
|-> __die
|-> show_regs
|-> __show_regs
|-> show_iret_regs
|-> show_ip
|-> show_opcodes

and we dump fine here.

But, if, for example, a #PF happens while we die(), we could do this:

do_page_fault
|-> __do_page_fault
|-> bad_area_nosemaphore
|-> __bad_area_nosemaphore
|-> show_signal_msg
|-> show_opcodes

that's the catch-all case in:

        if (unlikely(fault_in_kernel_space(address))) {

and that doesn't sync with the die_lock, AFAICT, and we're walking all
over the opcodes buffer.

Unless I'm missing something, that is.

If I'm not, then I guess I need to think about a better way to solve
this. Because I like the improvement of not having to probe_kernel_read()
byte-by-byte but read it all at once.

And that is fine if I do a 64-byte default, on-stack buffer but that
code_bytes= thing can be 2 pages max which is yuck. No way I'm doing
on-stack buffers then.

Hmm, I need to think about it.

Thanks!

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-07 10:13     ` Borislav Petkov
@ 2018-03-07 13:25       ` Josh Poimboeuf
  2018-03-07 14:16         ` Borislav Petkov
  2018-03-07 21:08         ` Linus Torvalds
  0 siblings, 2 replies; 23+ messages in thread
From: Josh Poimboeuf @ 2018-03-07 13:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Wed, Mar 07, 2018 at 11:13:14AM +0100, Borislav Petkov wrote:
> And that is fine if I do a 64-byte default, on-stack buffer but that
> code_bytes= thing can be 2 pages max which is yuck. No way I'm doing
> on-stack buffers then.

How about we just remove the 'code_bytes=' option?  (Or at the very
least, reduce its possible range to a reasonable max?)

I doubt anybody actually uses it.  I'd never heard of it before, nor
have I ever seen an oops with a long code dump.  I can't fathom why
somebody would even need it.  64 bytes is plenty, and an 8k code dump
just sounds insane.

It comes from the following commit:

commit 86c418374223be3f328b5522545196db02c8ceda
Author: Chuck Ebbert <cebbert@redhat.com>
Date:   Tue Feb 13 13:26:25 2007 +0100

    [PATCH] i386: add option to show more code in oops reports

    Sometimes developers need to see more object code in an oops report,
    e.g. when kernel may be corrupted at runtime.

    Add the "code_bytes" option for this.

    Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
    Signed-off-by: Andi Kleen <ak@suse.de>
    Cc: Andi Kleen <ak@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

But I've never seen a case where somebody needed to use it.

-- 
Josh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-07 13:25       ` Josh Poimboeuf
@ 2018-03-07 14:16         ` Borislav Petkov
  2018-03-07 21:08         ` Linus Torvalds
  1 sibling, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-07 14:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML,
	Chuck Ebbert

On Wed, Mar 07, 2018 at 07:25:35AM -0600, Josh Poimboeuf wrote:
> How about we just remove the 'code_bytes=' option?

Haha, removing stuff is my usual solution :-)

I'd love to.

> (Or at the very
> least, reduce its possible range to a reasonable max?)
> 
> I doubt anybody actually uses it.  I'd never heard of it before, nor
> have I ever seen an oops with a long code dump.  I can't fathom why
> somebody would even need it.  64 bytes is plenty, and an 8k code dump
> just sounds insane.

Yeah, I see a Chuck Ebbert in git log output with a gmail account, maybe
that's the same person. (I've assumed he's not at RH anymore, otherwise
you would've CCed him :-)).

Let's ask him. CCed.

> It comes from the following commit:
> 
> commit 86c418374223be3f328b5522545196db02c8ceda
> Author: Chuck Ebbert <cebbert@redhat.com>
> Date:   Tue Feb 13 13:26:25 2007 +0100
> 
>     [PATCH] i386: add option to show more code in oops reports
> 
>     Sometimes developers need to see more object code in an oops report,
>     e.g. when kernel may be corrupted at runtime.
> 
>     Add the "code_bytes" option for this.
> 
>     Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
>     Signed-off-by: Andi Kleen <ak@suse.de>
>     Cc: Andi Kleen <ak@suse.de>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> 
> But I've never seen a case where somebody needed to use it.
> 
> -- 
> Josh

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-07 13:25       ` Josh Poimboeuf
  2018-03-07 14:16         ` Borislav Petkov
@ 2018-03-07 21:08         ` Linus Torvalds
  2018-03-08 10:16           ` Borislav Petkov
  1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2018-03-07 21:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Borislav Petkov, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Wed, Mar 7, 2018 at 5:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> How about we just remove the 'code_bytes=' option?  (Or at the very
> least, reduce its possible range to a reasonable max?)

Ack. Just limit it to 64 bytes max sounds plenty.

             Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-07 21:08         ` Linus Torvalds
@ 2018-03-08 10:16           ` Borislav Petkov
  2018-03-08 18:00             ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-08 10:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Wed, Mar 07, 2018 at 01:08:32PM -0800, Linus Torvalds wrote:
> On Wed, Mar 7, 2018 at 5:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > How about we just remove the 'code_bytes=' option?  (Or at the very
> > least, reduce its possible range to a reasonable max?)
> 
> Ack. Just limit it to 64 bytes max sounds plenty.

Done, combined diff ontop. With a 64-byte on-stack opcodes buffer:

---
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b37c1c30c16f..2c74c1694d9d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -587,11 +587,6 @@
 			Sets the size of memory pool for coherent, atomic dma
 			allocations, by default set to 256K.
 
-	code_bytes	[X86] How many bytes of object code to print
-			in an oops report.
-			Range: 0 - 8192
-			Default: 64
-
 	com20020=	[HW,NET] ARCnet - COM20020 chipset
 			Format:
 			<io>[,<irq>[,<nodeID>[,<backplane>[,<ckp>[,<timeout>]]]]]
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 409a5bd02a18..2fc009a1824e 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -22,13 +22,9 @@
 #include <asm/stacktrace.h>
 #include <asm/unwind.h>
 
-#define OPCODE_BUFSIZE 64
-
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
-static unsigned int code_bytes = OPCODE_BUFSIZE;
-static u8 __opc[OPCODE_BUFSIZE];
-static u8 *opcodes = __opc;
+
 static int die_counter;
 
 static struct pt_regs exec_summary_regs;
@@ -77,19 +73,21 @@ static void printk_stack_address(unsigned long address, int reliable,
 
 void show_opcodes(u8 *rip, const char *loglvl)
 {
-	unsigned int code_prologue = code_bytes * 43 / OPCODE_BUFSIZE;
+#define OPCODE_BUFSIZE 64
+	unsigned int code_prologue = OPCODE_BUFSIZE * 43 / OPCODE_BUFSIZE;
+	u8 opcodes[OPCODE_BUFSIZE];
 	u8 *ip;
 	int i;
 
 	printk("%sCode: ", loglvl);
 
 	ip = (u8 *)rip - code_prologue;
-	if (probe_kernel_read(opcodes, ip, code_bytes)) {
+	if (probe_kernel_read(opcodes, ip, OPCODE_BUFSIZE)) {
 		pr_cont(" Bad RIP value.\n");
 		return;
 	}
 
-	for (i = 0; i < code_bytes; i++, ip++) {
+	for (i = 0; i < OPCODE_BUFSIZE; i++, ip++) {
 		if (ip == (u8 *)rip)
 			pr_cont("<%02x> ", opcodes[i]);
 		else
@@ -387,34 +385,6 @@ void die(const char *str, struct pt_regs *regs, long err)
 	oops_end(flags, regs, sig);
 }
 
-static int __init code_bytes_setup(char *s)
-{
-	unsigned long val;
-	ssize_t ret;
-
-	if (!s)
-		return -EINVAL;
-
-	ret = kstrtoul(s, 0, &val);
-	if (ret)
-		return ret;
-
-	code_bytes = val;
-	if (code_bytes > 8192)
-		code_bytes = 8192;
-
-	if (code_bytes > OPCODE_BUFSIZE) {
-		u8 *new_buf = kzalloc(code_bytes, GFP_KERNEL);
-		if (!new_buf)
-			return -ENOMEM;
-
-		opcodes = new_buf;
-	}
-
-	return 1;
-}
-__setup("code_bytes=", code_bytes_setup);
-
 void show_regs(struct pt_regs *regs)
 {
 	bool all = true;

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip:core/core] panic: Add closing panic marker parenthesis
  2018-03-06  9:49 ` [PATCH 1/9] panic: Add closing panic marker parenthesis Borislav Petkov
@ 2018-03-08 11:03   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-03-08 11:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, bp, tglx, jpoimboe, torvalds, peterz, luto, linux-kernel

Commit-ID:  5ad751053704df3f00d2bb2dc9345c697c212150
Gitweb:     https://git.kernel.org/tip/5ad751053704df3f00d2bb2dc9345c697c212150
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 6 Mar 2018 10:49:12 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 8 Mar 2018 12:01:10 +0100

panic: Add closing panic marker parenthesis

Otherwise it looks unbalanced.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180306094920.16917-2-bp@alien8.de

---
 kernel/panic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..9fb023d0cae1 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -289,7 +289,7 @@ void panic(const char *fmt, ...)
 		disabled_wait(caller);
 	}
 #endif
-	pr_emerg("---[ end Kernel panic - not syncing: %s\n", buf);
+	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip:x86/cleanups] x86/fault: Do not print IP in show_fault_oops()
  2018-03-06  9:49 ` [PATCH 2/9] x86/fault: Do not print IP in show_fault_oops() Borislav Petkov
@ 2018-03-08 11:09   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-03-08 11:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, linux-kernel, luto, tglx, torvalds, peterz, jpoimboe, mingo, bp

Commit-ID:  9558080935e0bd744d68a7e1747a7117310623cf
Gitweb:     https://git.kernel.org/tip/9558080935e0bd744d68a7e1747a7117310623cf
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 6 Mar 2018 10:49:13 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 8 Mar 2018 12:04:59 +0100

x86/fault: Do not print IP in show_fault_oops()

... because __show_regs() already does that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180306094920.16917-3-bp@alien8.de

---
 arch/x86/mm/fault.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c88573d90f3e..93505990df10 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -699,7 +699,6 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 		printk(KERN_CONT "paging request");
 
 	printk(KERN_CONT " at %px\n", (void *) address);
-	printk(KERN_ALERT "IP: %pS\n", (void *)regs->ip);
 
 	dump_pagetable(address);
 }

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip:x86/cleanups] x86/dumpstack: Unify show_regs()
  2018-03-06  9:49 ` [PATCH 3/9] x86/dumpstack: Unify show_regs() Borislav Petkov
@ 2018-03-08 11:10   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Borislav Petkov @ 2018-03-08 11:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, linux-kernel, tglx, jpoimboe, hpa, mingo, bp, peterz, torvalds

Commit-ID:  16d1cb0bc43642a4d934631a73c5210ad2499e2f
Gitweb:     https://git.kernel.org/tip/16d1cb0bc43642a4d934631a73c5210ad2499e2f
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 6 Mar 2018 10:49:14 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 8 Mar 2018 12:04:59 +0100

x86/dumpstack: Unify show_regs()

The 32-bit version uses KERN_EMERG and commit

  b0f4c4b32c8e ("bugs, x86: Fix printk levels for panic, softlockups and stack dumps")

changed the 64-bit version to KERN_DEFAULT. The same justification in
that commit that those messages do not belong in the terminal, holds
true for 32-bit also, so make it so.

Make code_bytes static, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180306094920.16917-4-bp@alien8.de

---
 arch/x86/include/asm/stacktrace.h |  2 --
 arch/x86/kernel/dumpstack.c       | 49 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/dumpstack_32.c    | 42 ---------------------------------
 arch/x86/kernel/dumpstack_64.c    | 42 ---------------------------------
 4 files changed, 48 insertions(+), 87 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index f73706878772..133d9425fced 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -87,8 +87,6 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, char *log_lvl);
 
-extern unsigned int code_bytes;
-
 /* The form of the top of the frame on the stack */
 struct stack_frame {
 	struct stack_frame *next_frame;
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index a2d8a3908670..18fa9d74c182 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -24,7 +24,7 @@
 
 int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
-unsigned int code_bytes = 64;
+static unsigned int code_bytes = 64;
 static int die_counter;
 
 bool in_task_stack(unsigned long *stack, struct task_struct *task,
@@ -375,3 +375,50 @@ static int __init code_bytes_setup(char *s)
 	return 1;
 }
 __setup("code_bytes=", code_bytes_setup);
+
+void show_regs(struct pt_regs *regs)
+{
+	bool all = true;
+	int i;
+
+	show_regs_print_info(KERN_DEFAULT);
+
+	if (IS_ENABLED(CONFIG_X86_32))
+		all = !user_mode(regs);
+
+	__show_regs(regs, all);
+
+	/*
+	 * When in-kernel, we also print out the stack and code at the
+	 * time of the fault..
+	 */
+	if (!user_mode(regs)) {
+		unsigned int code_prologue = code_bytes * 43 / 64;
+		unsigned int code_len = code_bytes;
+		unsigned char c;
+		u8 *ip;
+
+		show_trace_log_lvl(current, regs, NULL, KERN_DEFAULT);
+
+		printk(KERN_DEFAULT "Code: ");
+
+		ip = (u8 *)regs->ip - code_prologue;
+		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
+			/* try starting at IP */
+			ip = (u8 *)regs->ip;
+			code_len = code_len - code_prologue + 1;
+		}
+		for (i = 0; i < code_len; i++, ip++) {
+			if (ip < (u8 *)PAGE_OFFSET ||
+					probe_kernel_address(ip, c)) {
+				pr_cont(" Bad RIP value.");
+				break;
+			}
+			if (ip == (u8 *)regs->ip)
+				pr_cont("<%02x> ", c);
+			else
+				pr_cont("%02x ", c);
+		}
+	}
+	pr_cont("\n");
+}
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 04170f63e3a1..cd53f3030e40 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -127,45 +127,3 @@ unknown:
 	info->type = STACK_TYPE_UNKNOWN;
 	return -EINVAL;
 }
-
-void show_regs(struct pt_regs *regs)
-{
-	int i;
-
-	show_regs_print_info(KERN_EMERG);
-	__show_regs(regs, !user_mode(regs));
-
-	/*
-	 * When in-kernel, we also print out the stack and code at the
-	 * time of the fault..
-	 */
-	if (!user_mode(regs)) {
-		unsigned int code_prologue = code_bytes * 43 / 64;
-		unsigned int code_len = code_bytes;
-		unsigned char c;
-		u8 *ip;
-
-		show_trace_log_lvl(current, regs, NULL, KERN_EMERG);
-
-		pr_emerg("Code:");
-
-		ip = (u8 *)regs->ip - code_prologue;
-		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-			/* try starting at IP */
-			ip = (u8 *)regs->ip;
-			code_len = code_len - code_prologue + 1;
-		}
-		for (i = 0; i < code_len; i++, ip++) {
-			if (ip < (u8 *)PAGE_OFFSET ||
-					probe_kernel_address(ip, c)) {
-				pr_cont("  Bad EIP value.");
-				break;
-			}
-			if (ip == (u8 *)regs->ip)
-				pr_cont(" <%02x>", c);
-			else
-				pr_cont(" %02x", c);
-		}
-	}
-	pr_cont("\n");
-}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 563e28d14f2c..5cdb9e84da57 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -149,45 +149,3 @@ unknown:
 	info->type = STACK_TYPE_UNKNOWN;
 	return -EINVAL;
 }
-
-void show_regs(struct pt_regs *regs)
-{
-	int i;
-
-	show_regs_print_info(KERN_DEFAULT);
-	__show_regs(regs, 1);
-
-	/*
-	 * When in-kernel, we also print out the stack and code at the
-	 * time of the fault..
-	 */
-	if (!user_mode(regs)) {
-		unsigned int code_prologue = code_bytes * 43 / 64;
-		unsigned int code_len = code_bytes;
-		unsigned char c;
-		u8 *ip;
-
-		show_trace_log_lvl(current, regs, NULL, KERN_DEFAULT);
-
-		printk(KERN_DEFAULT "Code: ");
-
-		ip = (u8 *)regs->ip - code_prologue;
-		if (ip < (u8 *)PAGE_OFFSET || probe_kernel_address(ip, c)) {
-			/* try starting at IP */
-			ip = (u8 *)regs->ip;
-			code_len = code_len - code_prologue + 1;
-		}
-		for (i = 0; i < code_len; i++, ip++) {
-			if (ip < (u8 *)PAGE_OFFSET ||
-					probe_kernel_address(ip, c)) {
-				pr_cont(" Bad RIP value.");
-				break;
-			}
-			if (ip == (u8 *)regs->ip)
-				pr_cont("<%02x> ", c);
-			else
-				pr_cont("%02x ", c);
-		}
-	}
-	pr_cont("\n");
-}

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-08 10:16           ` Borislav Petkov
@ 2018-03-08 18:00             ` Linus Torvalds
  2018-03-08 22:36               ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2018-03-08 18:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Thu, Mar 8, 2018 at 2:16 AM, Borislav Petkov <bp@alien8.de> wrote:
> +#define OPCODE_BUFSIZE 64
> +       unsigned int code_prologue = OPCODE_BUFSIZE * 43 / OPCODE_BUFSIZE;

Heh.

That's a very odd way of writing "43".

Honestly, the "43" is just "two thirds" rounded to closest, and it's
not important anyway, so I think you should just write it as

        unsigned int code_prologue = OPCODE_BUFSIZE * 2 / 3;

and never mind that it will now be 42.

42 is obviously the right answer anyway, which makes me think we got
it wrong earlier.

         Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-08 18:00             ` Linus Torvalds
@ 2018-03-08 22:36               ` Borislav Petkov
  2018-03-08 23:20                 ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2018-03-08 22:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Thu, Mar 08, 2018 at 10:00:09AM -0800, Linus Torvalds wrote:
> On Thu, Mar 8, 2018 at 2:16 AM, Borislav Petkov <bp@alien8.de> wrote:
> > +#define OPCODE_BUFSIZE 64
> > +       unsigned int code_prologue = OPCODE_BUFSIZE * 43 / OPCODE_BUFSIZE;
> 
> Heh.
> 
> That's a very odd way of writing "43".

I was simply search-replacing code_bytes :-)

> Honestly, the "43" is just "two thirds" rounded to closest, and it's
> not important anyway, so I think you should just write it as
> 
>         unsigned int code_prologue = OPCODE_BUFSIZE * 2 / 3;

Btw, do we have any explanation for the two-thirds prologue? I dug it
out to the patch below but it doesn't say why the prologue being bigger
is more important than the epilogue.

I would've made it half and half but I guess it is more important to see
the opcode bytes leading to rip... Oh well.

> and never mind that it will now be 42.
>
> 42 is obviously the right answer anyway, which makes me think we got
> it wrong earlier.

Doh, of course! What was I thinking?!? :-)

Done.

---
>From 313c2652ce75899d7801b021ed40e5b8ef233aca Mon Sep 17 00:00:00 2001
From: Keith Owens <kaos@ocs.com.au>
Date: Sun, 22 Aug 2004 22:36:42 -0700
Subject: [PATCH] [PATCH] i386 oops output: dump preceding code

This teaches the i386 oops dumper to dump opcodes preceding and after the
offending EIP.  Supporting code against ksymoops has been tested and produces
output like the below.

Support for this was added to ksymoops-2.4.9.

Note that ksymoops will guarantee that the disassembly after the <eip> value
is always in sync - if the disassembly from the start of the Code: line does
not sync up with the EIP address ksymoops will perform the resync.


Warning (merge_maps): no symbols in merged map
Mar 18 23:47:36 vmm kernel: kernel BUG at fs/open.c:802!
Mar 18 23:47:36 vmm kernel: invalid operand: 0000 [#1]
Mar 18 23:47:36 vmm kernel: CPU:    0
Mar 18 23:47:36 vmm kernel: EIP:    0060:[<c014fedf>] VLI    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Mar 18 23:47:36 vmm kernel: EFLAGS: 00010246
Mar 18 23:47:36 vmm kernel: eax: ccdfb900   ebx: 4001020d   ecx: 00000000   edx: 0000007b
Mar 18 23:47:36 vmm kernel: esi: 00000000   edi: bfffdd70   ebp: ccdfdfbc   esp: ccdfdfb0
Mar 18 23:47:36 vmm kernel: ds: 007b   es: 007b   ss: 0068
Mar 18 23:47:36 vmm kernel: Stack: 4001020d 00000000 bfffdd70 ccdfc000 c0109213 4001020d 00000000 00000003
Mar 18 23:47:36 vmm kernel:        00000000 bfffdd70 bfffdc88 00000005 0000007b 0000007b 00000005 4000ef94
Mar 18 23:47:36 vmm kernel:        00000073 00000206 bfffdbd8 0000007b
Mar 18 23:47:36 vmm kernel: Call Trace:
Mar 18 23:47:36 vmm kernel:  [<c0109213>] syscall_call+0x7/0xb
Mar 18 23:47:36 vmm kernel: Code: 14 98 f0 81 41 04 00 00 00 01 5b 89 ec 5d c3 90 b8 00 e0 ff ff 21 e0 55 89 e5 57 56 53 8b 00 81 b8 e4 01 00 00 0f 27 00 00 75 08 <0f> 0b 22 03 85 18 2f c0 8b 45 08 50 e8 30 d4 00 00 89 c7 83 c4


>>EIP; c014fedf No symbols available   <=====

Trace; c0109213 No symbols available

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code;  c014feb4 No symbols available
00000000 <_EIP>:
Code;  c014feb4 No symbols available
   0:   14 98                     adc    $0x98,%al
Code;  c014feb6 No symbols available
   2:   f0 81 41 04 00 00 00      lock addl $0x1000000,0x4(%ecx)
Code;  c014febd No symbols available
   9:   01
Code;  c014febe No symbols available
   a:   5b                        pop    %ebx
Code;  c014febf No symbols available
   b:   89 ec                     mov    %ebp,%esp
Code;  c014fec1 No symbols available
   d:   5d                        pop    %ebp
Code;  c014fec2 No symbols available
   e:   c3                        ret
Code;  c014fec3 No symbols available
   f:   90                        nop
Code;  c014fec4 No symbols available
  10:   b8 00 e0 ff ff            mov    $0xffffe000,%eax
Code;  c014fec9 No symbols available
  15:   21 e0                     and    %esp,%eax
Code;  c014fecb No symbols available
  17:   55                        push   %ebp
Code;  c014fecc No symbols available
  18:   89 e5                     mov    %esp,%ebp
Code;  c014fece No symbols available
  1a:   57                        push   %edi
Code;  c014fecf No symbols available
  1b:   56                        push   %esi
Code;  c014fed0 No symbols available
  1c:   53                        push   %ebx
Code;  c014fed1 No symbols available
  1d:   8b 00                     mov    (%eax),%eax
Code;  c014fed3 No symbols available
  1f:   81 b8 e4 01 00 00 0f      cmpl   $0x270f,0x1e4(%eax)
Code;  c014feda No symbols available
  26:   27 00 00
Code;  c014fedd No symbols available
  29:   75 08                     jne    33 <_EIP+0x33> c014fee7 No symbols available

This decode from eip onwards should be reliable

Code;  c014fedf No symbols available
00000000 <_EIP>:
Code;  c014fedf No symbols available   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014fee1 No symbols available
   2:   22 03                     and    (%ebx),%al
Code;  c014fee3 No symbols available
   4:   85 18                     test   %ebx,(%eax)
Code;  c014fee5 No symbols available
   6:   2f                        das
Code;  c014fee6 No symbols available
   7:   c0 8b 45 08 50 e8 30      rorb   $0x30,0xe8500845(%ebx)
Code;  c014feed No symbols available
   e:   d4 00                     aam    $0x0
Code;  c014feef No symbols available
  10:   00                        .byte 0x0
Code;  c014fef0 No symbols available
  11:   89 c7                     mov    %eax,%edi
Code;  c014fef2 No symbols available
  13:   83                        .byte 0x83
Code;  c014fef3 No symbols available
  14:   c4                        .byte 0xc4



Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
 arch/i386/kernel/traps.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 12e9ce87f46c..ae69bf846a40 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -216,7 +216,7 @@ void show_registers(struct pt_regs *regs)
 		ss = regs->xss & 0xffff;
 	}
 	print_modules();
-	printk("CPU:    %d\nEIP:    %04x:[<%08lx>]    %s\nEFLAGS: %08lx"
+	printk("CPU:    %d\nEIP:    %04x:[<%08lx>]    %s VLI\nEFLAGS: %08lx"
 			"   (%s) \n",
 		smp_processor_id(), 0xffff & regs->xcs, regs->eip,
 		print_tainted(), regs->eflags, UTS_RELEASE);
@@ -234,23 +234,25 @@ void show_registers(struct pt_regs *regs)
 	 * time of the fault..
 	 */
 	if (in_kernel) {
+		u8 *eip;
 
 		printk("\nStack: ");
 		show_stack(NULL, (unsigned long*)esp);
 
 		printk("Code: ");
-		if(regs->eip < PAGE_OFFSET)
-			goto bad;
 
-		for(i=0;i<20;i++)
-		{
+		eip = (u8 *)regs->eip - 43;
+		for (i = 0; i < 64; i++, eip++) {
 			unsigned char c;
-			if(__get_user(c, &((unsigned char*)regs->eip)[i])) {
-bad:
+
+			if (eip < (u8 *)PAGE_OFFSET || __get_user(c, eip)) {
 				printk(" Bad EIP value.");
 				break;
 			}
-			printk("%02x ", c);
+			if (eip == (u8 *)regs->eip)
+				printk("<%02x> ", c);
+			else
+				printk("%02x ", c);
 		}
 	}
 	printk("\n");
-- 
2.13.0


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-08 22:36               ` Borislav Petkov
@ 2018-03-08 23:20                 ` Linus Torvalds
  2018-03-09 10:15                   ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2018-03-08 23:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Thu, Mar 8, 2018 at 2:36 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> Btw, do we have any explanation for the two-thirds prologue? I dug it
> out to the patch below but it doesn't say why the prologue being bigger
> is more important than the epilogue.

Yes, so the reason the prologue is more important is that there's
really two cases for the "Code:" line:

 (a) it doesn't matter at all, because you have the kernel image
anyway, and can just get the code from there based on the faulting
instruction pointer address.

 (b) it ends up helping you match things up roughly, because maybe you
have a different compiler version and different code generation
(and/or maybe just different config options)

In the (a) case, the code doesn't matter at all, and you might as well
just look at the whole function by just doing "gdb vmlinux" and then
disassemble it. You have everything just from %rip.

In the (b) case, there are two reasons to give code "around" the
faulting instruction pointer:

 - to help match up when you compare your not-identical vmlinux sequence

 - to actually figure out what the register allocation in the faulting
kernel was, so that you can make sense of the register dump, because
your local kernel might have completely different register allocaiton.

And for that *second* case, the instructions that _precede_ the
faulting instructions are generally much more important than the
instructions that follow the faulting instruction.

The instructions that *follow* the faulting instructions generally
have little or no relevance for the register dump (obviously loops etc
_can_ make them  matter, but you see the rough point).

So the instructions that precede the faulting instruction are in many
ways much more relevant than the instructions that follow it. So you
want more of those.

Also, the instructions that precede the faulting instruction have
another issue - with variable-length instructions, it's hard to sync
up things. Even if we were to use a disassembler (which we almost
certainly don't at fault time), walking backwards can easily be
ambiguous.

So you want extra "slop" in the bytes that get dumped before the
instruction, because you don't know quite where the instruction
boundaries may be.  In contrast, the faulting instruction itself - and
the instrucvtions that follow it - you can tell where the instruction
boundaries are (unless you had an exception due to a corrupted
indirect branch, which does happen but is quite rare).

So there's two reasons for why the bytes _before_ the instruction are
more important than the bytes _after_ the instruction: the register
state is more relevant for them, and the slop for decoding.

The "two thirds" then just comes from that. It's just a random
estimate of "instruction bytes that precede the faulting instructions
are more important than later ones".

In fact, to some degree the "one third" can be seen as "the faulting
instruction itself needs some space".

An x86 instruction can be at most 15 bytes, and you do likely want at
least that. Any bytes after the first instruction aren't _useless_
(because they definitely can help line things up when you have
slightly different code generation due to compiler versions etc), so
"at least 15 bytes, and maybe a bit more" again makes that 2/3rds
thing work out well when you print out 64 bytes.

                      Linus

PS. Historical side note:

We haven't always printed out that much. This is Linux-0.01:

        for(i=0;i<10;i++)
                printk("%02x ",0xff & get_seg_byte(esp[1],(i+(char *)esp[0])));

so it just printed out 10 bytes, basically just the faulting
instruction (and not even all of it if it has a lot of immediates).

We eventually extended the 10 bytes into 20 bytes, again starting at
the existing instruction, because if you don't print out a lot, you
need to start at the faulting instruction itself because of slop.

The expansion from 20 to 64 bytes happened in 2004 in v2.6.9. And
that's also when we started dumping preceding instructions, because
once you print out a fair chunk of bytes, that's when "preceding
instructions are more important" comes in. Because before that, you
want to guarantee that you at least print out the faulting
instruction.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section
  2018-03-08 23:20                 ` Linus Torvalds
@ 2018-03-09 10:15                   ` Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2018-03-09 10:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, X86 ML, Andy Lutomirski, Peter Zijlstra, LKML

On Thu, Mar 08, 2018 at 03:20:58PM -0800, Linus Torvalds wrote:
> Yes, so the reason the prologue is more important is that there's
> really two cases for the "Code:" line:

Yap, I had a hunch it must be about some of those but thanks for taking
the time and writing it down!

I've tried to summarize the reasoning and slap it as a comment above it
so that it is clear why we do it this way:

---
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index c3aa0e29513e..7a21098c9128 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -71,6 +71,25 @@ static void printk_stack_address(unsigned long address, int reliable,
 	printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+/*
+ * The are a couple of reasons for the 2/3rd prologue, courtesy of Linus:
+ *
+ * In case where we don't have the exact kernel image (which, if we did, we can
+ * simply disassemble and navigate to the RIP), the purpose of the bigger
+ * prologue is to have more context and to be able to correlate the code from
+ * the different toolchains better.
+ *
+ * In addition, it helps in recreating the register allocation of the failing
+ * kernel and thus make sense of the register dump.
+ *
+ * What is more, the additional complication of a variable length insn arch like
+ * x86 warrants having longer byte sequence before rIP so that the disassembler
+ * can "sync" up properly and find instruction boundaries when decoding the
+ * opcode bytes.
+ *
+ * Thus, the 2/3rds prologue and 64 byte OPCODE_BUFSIZE is just a random
+ * guesstimate in attempt to achieve all of the above.
+ */
 void show_opcodes(u8 *rip, const char *loglvl)
 {
 #define OPCODE_BUFSIZE 64

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-03-09 10:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-06  9:49 [PATCH 0/9] x86/dumpstack: Cleanups and user opcode bytes Code: section, v1 Borislav Petkov
2018-03-06  9:49 ` [PATCH 1/9] panic: Add closing panic marker parenthesis Borislav Petkov
2018-03-08 11:03   ` [tip:core/core] " tip-bot for Borislav Petkov
2018-03-06  9:49 ` [PATCH 2/9] x86/fault: Do not print IP in show_fault_oops() Borislav Petkov
2018-03-08 11:09   ` [tip:x86/cleanups] " tip-bot for Borislav Petkov
2018-03-06  9:49 ` [PATCH 3/9] x86/dumpstack: Unify show_regs() Borislav Petkov
2018-03-08 11:10   ` [tip:x86/cleanups] " tip-bot for Borislav Petkov
2018-03-06  9:49 ` [PATCH 4/9] x86/dumpstack: Carve out Code: dumping into a function Borislav Petkov
2018-03-06  9:49 ` [PATCH 5/9] x86/dumpstack: Improve opcodes dumping in the Code: section Borislav Petkov
2018-03-06 18:47   ` Linus Torvalds
2018-03-07 10:13     ` Borislav Petkov
2018-03-07 13:25       ` Josh Poimboeuf
2018-03-07 14:16         ` Borislav Petkov
2018-03-07 21:08         ` Linus Torvalds
2018-03-08 10:16           ` Borislav Petkov
2018-03-08 18:00             ` Linus Torvalds
2018-03-08 22:36               ` Borislav Petkov
2018-03-08 23:20                 ` Linus Torvalds
2018-03-09 10:15                   ` Borislav Petkov
2018-03-06  9:49 ` [PATCH 6/9] x86/dumpstack: Add loglevel argument to show_opcodes() Borislav Petkov
2018-03-06  9:49 ` [PATCH 7/9] x86/fault: Dump user opcode bytes on fatal faults Borislav Petkov
2018-03-06  9:49 ` [PATCH 8/9] x86/dumpstack: Add a show_ip() function Borislav Petkov
2018-03-06  9:49 ` [PATCH 9/9] x86/dumpstack: Save first regs set for the executive summary Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.