linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/3] x86/insn-eval: Add support for 64-bit kernel mode
@ 2019-11-15 19:17 Jann Horn
  2019-11-15 19:17 ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Jann Horn
  2019-11-15 19:17 ` [PATCH v2 3/3] x86/kasan: Print original " Jann Horn
  0 siblings, 2 replies; 21+ messages in thread
From: Jann Horn @ 2019-11-15 19:17 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	kasan-dev, jannh
  Cc: linux-kernel, Andrey Konovalov, Andy Lutomirski, Sean Christopherson

To support evaluating 64-bit kernel mode instructions:

Replace existing checks for user_64bit_mode() with a new helper that
checks whether code is being executed in either 64-bit kernel mode or
64-bit user mode.

Select the GS base depending on whether the instruction is being
evaluated in kernel mode.

Signed-off-by: Jann Horn <jannh@google.com>
---

Notes:
    v2:
      no changes

 arch/x86/include/asm/ptrace.h | 13 +++++++++++++
 arch/x86/lib/insn-eval.c      | 26 +++++++++++++++-----------
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 5057a8ed100b..ac45b06941a5 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -159,6 +159,19 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 #endif
 }
 
+/*
+ * Determine whether the register set came from any context that is running in
+ * 64-bit mode.
+ */
+static inline bool any_64bit_mode(struct pt_regs *regs)
+{
+#ifdef CONFIG_X86_64
+	return !user_mode(regs) || user_64bit_mode(regs);
+#else
+	return false;
+#endif
+}
+
 #ifdef CONFIG_X86_64
 #define current_user_stack_pointer()	current_pt_regs()->sp
 #define compat_user_stack_pointer()	current_pt_regs()->sp
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 306c3a0902ba..31600d851fd8 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -155,7 +155,7 @@ static bool check_seg_overrides(struct insn *insn, int regoff)
  */
 static int resolve_default_seg(struct insn *insn, struct pt_regs *regs, int off)
 {
-	if (user_64bit_mode(regs))
+	if (any_64bit_mode(regs))
 		return INAT_SEG_REG_IGNORE;
 	/*
 	 * Resolve the default segment register as described in Section 3.7.4
@@ -266,7 +266,7 @@ static int resolve_seg_reg(struct insn *insn, struct pt_regs *regs, int regoff)
 	 * which may be invalid at this point.
 	 */
 	if (regoff == offsetof(struct pt_regs, ip)) {
-		if (user_64bit_mode(regs))
+		if (any_64bit_mode(regs))
 			return INAT_SEG_REG_IGNORE;
 		else
 			return INAT_SEG_REG_CS;
@@ -289,7 +289,7 @@ static int resolve_seg_reg(struct insn *insn, struct pt_regs *regs, int regoff)
 	 * In long mode, segment override prefixes are ignored, except for
 	 * overrides for FS and GS.
 	 */
-	if (user_64bit_mode(regs)) {
+	if (any_64bit_mode(regs)) {
 		if (idx != INAT_SEG_REG_FS &&
 		    idx != INAT_SEG_REG_GS)
 			idx = INAT_SEG_REG_IGNORE;
@@ -646,23 +646,27 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
 		 */
 		return (unsigned long)(sel << 4);
 
-	if (user_64bit_mode(regs)) {
+	if (any_64bit_mode(regs)) {
 		/*
 		 * Only FS or GS will have a base address, the rest of
 		 * the segments' bases are forced to 0.
 		 */
 		unsigned long base;
 
-		if (seg_reg_idx == INAT_SEG_REG_FS)
+		if (seg_reg_idx == INAT_SEG_REG_FS) {
 			rdmsrl(MSR_FS_BASE, base);
-		else if (seg_reg_idx == INAT_SEG_REG_GS)
+		} else if (seg_reg_idx == INAT_SEG_REG_GS) {
 			/*
 			 * swapgs was called at the kernel entry point. Thus,
 			 * MSR_KERNEL_GS_BASE will have the user-space GS base.
 			 */
-			rdmsrl(MSR_KERNEL_GS_BASE, base);
-		else
+			if (user_mode(regs))
+				rdmsrl(MSR_KERNEL_GS_BASE, base);
+			else
+				rdmsrl(MSR_GS_BASE, base);
+		} else {
 			base = 0;
+		}
 		return base;
 	}
 
@@ -703,7 +707,7 @@ static unsigned long get_seg_limit(struct pt_regs *regs, int seg_reg_idx)
 	if (sel < 0)
 		return 0;
 
-	if (user_64bit_mode(regs) || v8086_mode(regs))
+	if (any_64bit_mode(regs) || v8086_mode(regs))
 		return -1L;
 
 	if (!sel)
@@ -948,7 +952,7 @@ static int get_eff_addr_modrm(struct insn *insn, struct pt_regs *regs,
 	 * following instruction.
 	 */
 	if (*regoff == -EDOM) {
-		if (user_64bit_mode(regs))
+		if (any_64bit_mode(regs))
 			tmp = regs->ip + insn->length;
 		else
 			tmp = 0;
@@ -1250,7 +1254,7 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
 	 * After computed, the effective address is treated as an unsigned
 	 * quantity.
 	 */
-	if (!user_64bit_mode(regs) && ((unsigned int)eff_addr > seg_limit))
+	if (!any_64bit_mode(regs) && ((unsigned int)eff_addr > seg_limit))
 		goto out;
 
 	/*
-- 
2.24.0.432.g9d3f5f5b63-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-15 19:17 [PATCH v2 1/3] x86/insn-eval: Add support for 64-bit kernel mode Jann Horn
@ 2019-11-15 19:17 ` Jann Horn
  2019-11-18 14:21   ` Borislav Petkov
                     ` (2 more replies)
  2019-11-15 19:17 ` [PATCH v2 3/3] x86/kasan: Print original " Jann Horn
  1 sibling, 3 replies; 21+ messages in thread
From: Jann Horn @ 2019-11-15 19:17 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	kasan-dev, jannh
  Cc: linux-kernel, Andrey Konovalov, Andy Lutomirski, Sean Christopherson

A frequent cause of #GP exceptions are memory accesses to non-canonical
addresses. Unlike #PF, #GP doesn't come with a fault address in CR2, so
the kernel doesn't currently print the fault address for #GP.
Luckily, we already have the necessary infrastructure for decoding X86
instructions and computing the memory address that is being accessed;
hook it up to the #GP handler so that we can figure out whether the #GP
looks like it was caused by a non-canonical address, and if so, print
that address.

While it is already possible to compute the faulting address manually by
disassembling the opcode dump and evaluating the instruction against the
register dump, this should make it slightly easier to identify crashes
at a glance.

Signed-off-by: Jann Horn <jannh@google.com>
---

Notes:
    v2:
     - print different message for segment-related GP (Borislav)
     - rewrite check for non-canonical address (Sean)
     - make it clear we don't know for sure why the GP happened (Andy)

 arch/x86/kernel/traps.c | 45 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c90312146da0..12d42697a18e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -56,6 +56,8 @@
 #include <asm/mpx.h>
 #include <asm/vm86.h>
 #include <asm/umip.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -509,6 +511,38 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
 }
 
+/*
+ * On 64-bit, if an uncaught #GP occurs while dereferencing a non-canonical
+ * address, print that address.
+ */
+static void print_kernel_gp_address(struct pt_regs *regs)
+{
+#ifdef CONFIG_X86_64
+	u8 insn_bytes[MAX_INSN_SIZE];
+	struct insn insn;
+	unsigned long addr_ref;
+
+	if (probe_kernel_read(insn_bytes, (void *)regs->ip, MAX_INSN_SIZE))
+		return;
+
+	kernel_insn_init(&insn, insn_bytes, MAX_INSN_SIZE);
+	insn_get_modrm(&insn);
+	insn_get_sib(&insn);
+	addr_ref = (unsigned long)insn_get_addr_ref(&insn, regs);
+
+	/* Bail out if insn_get_addr_ref() failed or we got a kernel address. */
+	if (addr_ref >= ~__VIRTUAL_MASK)
+		return;
+
+	/* Bail out if the entire operand is in the canonical user half. */
+	if (addr_ref + insn.opnd_bytes - 1 <= __VIRTUAL_MASK)
+		return;
+
+	pr_alert("probably dereferencing non-canonical address 0x%016lx\n",
+		 addr_ref);
+#endif
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
@@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
 			return;
 
 		if (notify_die(DIE_GPF, desc, regs, error_code,
-			       X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
-			die(desc, regs, error_code);
+			       X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
+			return;
+
+		if (error_code)
+			pr_alert("GPF is segment-related (see error code)\n");
+		else
+			print_kernel_gp_address(regs);
+
+		die(desc, regs, error_code);
 		return;
 	}
 
-- 
2.24.0.432.g9d3f5f5b63-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/3] x86/kasan: Print original address on #GP
  2019-11-15 19:17 [PATCH v2 1/3] x86/insn-eval: Add support for 64-bit kernel mode Jann Horn
  2019-11-15 19:17 ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Jann Horn
@ 2019-11-15 19:17 ` Jann Horn
  2019-11-18  8:36   ` Dmitry Vyukov
  1 sibling, 1 reply; 21+ messages in thread
From: Jann Horn @ 2019-11-15 19:17 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	kasan-dev, jannh
  Cc: linux-kernel, Andrey Konovalov, Andy Lutomirski, Sean Christopherson

Make #GP exceptions caused by out-of-bounds KASAN shadow accesses easier
to understand by computing the address of the original access and
printing that. More details are in the comments in the patch.

This turns an error like this:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    traps: probably dereferencing non-canonical address 0xe017577ddf75b7dd
    general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI

into this:

    traps: dereferencing non-canonical address 0xe017577ddf75b7dd
    traps: probably dereferencing non-canonical address 0xe017577ddf75b7dd
    KASAN: maybe wild-memory-access in range
            [0x00badbeefbadbee8-0x00badbeefbadbeef]
    general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI

The hook is placed in architecture-independent code, but is currently
only wired up to the X86 exception handler because I'm not sufficiently
familiar with the address space layout and exception handling mechanisms
on other architectures.

Signed-off-by: Jann Horn <jannh@google.com>
---

Notes:
    v2:
     - move to mm/kasan/report.c (Dmitry)
     - change hook name to be more generic
     - use TASK_SIZE instead of TASK_SIZE_MAX for compiling on non-x86
     - don't open-code KASAN_SHADOW_MASK (Dmitry)
     - add "KASAN: " prefix, but not "BUG: " (Andrey, Dmitry)
     - use same naming scheme as get_wild_bug_type (Andrey)

 arch/x86/kernel/traps.c     |  2 ++
 arch/x86/mm/kasan_init_64.c | 21 -------------------
 include/linux/kasan.h       |  6 ++++++
 mm/kasan/report.c           | 40 +++++++++++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 12d42697a18e..87b52682a37a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -37,6 +37,7 @@
 #include <linux/mm.h>
 #include <linux/smp.h>
 #include <linux/io.h>
+#include <linux/kasan.h>
 #include <asm/stacktrace.h>
 #include <asm/processor.h>
 #include <asm/debugreg.h>
@@ -540,6 +541,7 @@ static void print_kernel_gp_address(struct pt_regs *regs)
 
 	pr_alert("probably dereferencing non-canonical address 0x%016lx\n",
 		 addr_ref);
+	kasan_non_canonical_hook(addr_ref);
 #endif
 }
 
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 296da58f3013..69c437fb21cc 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -245,23 +245,6 @@ static void __init kasan_map_early_shadow(pgd_t *pgd)
 	} while (pgd++, addr = next, addr != end);
 }
 
-#ifdef CONFIG_KASAN_INLINE
-static int kasan_die_handler(struct notifier_block *self,
-			     unsigned long val,
-			     void *data)
-{
-	if (val == DIE_GPF) {
-		pr_emerg("CONFIG_KASAN_INLINE enabled\n");
-		pr_emerg("GPF could be caused by NULL-ptr deref or user memory access\n");
-	}
-	return NOTIFY_OK;
-}
-
-static struct notifier_block kasan_die_notifier = {
-	.notifier_call = kasan_die_handler,
-};
-#endif
-
 void __init kasan_early_init(void)
 {
 	int i;
@@ -298,10 +281,6 @@ void __init kasan_init(void)
 	int i;
 	void *shadow_cpu_entry_begin, *shadow_cpu_entry_end;
 
-#ifdef CONFIG_KASAN_INLINE
-	register_die_notifier(&kasan_die_notifier);
-#endif
-
 	memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
 
 	/*
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index cc8a03cc9674..7305024b44e3 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -194,4 +194,10 @@ static inline void *kasan_reset_tag(const void *addr)
 
 #endif /* CONFIG_KASAN_SW_TAGS */
 
+#ifdef CONFIG_KASAN_INLINE
+void kasan_non_canonical_hook(unsigned long addr);
+#else /* CONFIG_KASAN_INLINE */
+static inline void kasan_non_canonical_hook(unsigned long addr) { }
+#endif /* CONFIG_KASAN_INLINE */
+
 #endif /* LINUX_KASAN_H */
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 621782100eaa..5ef9f24f566b 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -512,3 +512,43 @@ void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned lon
 
 	end_report(&flags);
 }
+
+#ifdef CONFIG_KASAN_INLINE
+/*
+ * With CONFIG_KASAN_INLINE, accesses to bogus pointers (outside the high
+ * canonical half of the address space) cause out-of-bounds shadow memory reads
+ * before the actual access. For addresses in the low canonical half of the
+ * address space, as well as most non-canonical addresses, that out-of-bounds
+ * shadow memory access lands in the non-canonical part of the address space.
+ * Help the user figure out what the original bogus pointer was.
+ */
+void kasan_non_canonical_hook(unsigned long addr)
+{
+	unsigned long orig_addr;
+	const char *bug_type;
+
+	if (addr < KASAN_SHADOW_OFFSET)
+		return;
+
+	orig_addr = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
+	/*
+	 * For faults near the shadow address for NULL, we can be fairly certain
+	 * that this is a KASAN shadow memory access.
+	 * For faults that correspond to shadow for low canonical addresses, we
+	 * can still be pretty sure - that shadow region is a fairly narrow
+	 * chunk of the non-canonical address space.
+	 * But faults that look like shadow for non-canonical addresses are a
+	 * really large chunk of the address space. In that case, we still
+	 * print the decoded address, but make it clear that this is not
+	 * necessarily what's actually going on.
+	 */
+	if (orig_addr < PAGE_SIZE)
+		bug_type = "null-ptr-deref";
+	else if (orig_addr < TASK_SIZE)
+		bug_type = "probably user-memory-access";
+	else
+		bug_type = "maybe wild-memory-access";
+	pr_alert("KASAN: %s in range [0x%016lx-0x%016lx]\n", bug_type,
+		 orig_addr, orig_addr + KASAN_SHADOW_MASK);
+}
+#endif
-- 
2.24.0.432.g9d3f5f5b63-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] x86/kasan: Print original address on #GP
  2019-11-15 19:17 ` [PATCH v2 3/3] x86/kasan: Print original " Jann Horn
@ 2019-11-18  8:36   ` Dmitry Vyukov
  0 siblings, 0 replies; 21+ messages in thread
From: Dmitry Vyukov @ 2019-11-18  8:36 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Fri, Nov 15, 2019 at 8:17 PM Jann Horn <jannh@google.com> wrote:
>
> Make #GP exceptions caused by out-of-bounds KASAN shadow accesses easier
> to understand by computing the address of the original access and
> printing that. More details are in the comments in the patch.
>
> This turns an error like this:
>
>     kasan: CONFIG_KASAN_INLINE enabled
>     kasan: GPF could be caused by NULL-ptr deref or user memory access
>     traps: probably dereferencing non-canonical address 0xe017577ddf75b7dd
>     general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
>
> into this:
>
>     traps: dereferencing non-canonical address 0xe017577ddf75b7dd
>     traps: probably dereferencing non-canonical address 0xe017577ddf75b7dd
>     KASAN: maybe wild-memory-access in range
>             [0x00badbeefbadbee8-0x00badbeefbadbeef]
>     general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
>
> The hook is placed in architecture-independent code, but is currently
> only wired up to the X86 exception handler because I'm not sufficiently
> familiar with the address space layout and exception handling mechanisms
> on other architectures.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>
> Notes:
>     v2:
>      - move to mm/kasan/report.c (Dmitry)
>      - change hook name to be more generic
>      - use TASK_SIZE instead of TASK_SIZE_MAX for compiling on non-x86
>      - don't open-code KASAN_SHADOW_MASK (Dmitry)
>      - add "KASAN: " prefix, but not "BUG: " (Andrey, Dmitry)
>      - use same naming scheme as get_wild_bug_type (Andrey)
>
>  arch/x86/kernel/traps.c     |  2 ++
>  arch/x86/mm/kasan_init_64.c | 21 -------------------
>  include/linux/kasan.h       |  6 ++++++
>  mm/kasan/report.c           | 40 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 48 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 12d42697a18e..87b52682a37a 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -37,6 +37,7 @@
>  #include <linux/mm.h>
>  #include <linux/smp.h>
>  #include <linux/io.h>
> +#include <linux/kasan.h>
>  #include <asm/stacktrace.h>
>  #include <asm/processor.h>
>  #include <asm/debugreg.h>
> @@ -540,6 +541,7 @@ static void print_kernel_gp_address(struct pt_regs *regs)
>
>         pr_alert("probably dereferencing non-canonical address 0x%016lx\n",
>                  addr_ref);
> +       kasan_non_canonical_hook(addr_ref);
>  #endif
>  }
>
> diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
> index 296da58f3013..69c437fb21cc 100644
> --- a/arch/x86/mm/kasan_init_64.c
> +++ b/arch/x86/mm/kasan_init_64.c
> @@ -245,23 +245,6 @@ static void __init kasan_map_early_shadow(pgd_t *pgd)
>         } while (pgd++, addr = next, addr != end);
>  }
>
> -#ifdef CONFIG_KASAN_INLINE
> -static int kasan_die_handler(struct notifier_block *self,
> -                            unsigned long val,
> -                            void *data)
> -{
> -       if (val == DIE_GPF) {
> -               pr_emerg("CONFIG_KASAN_INLINE enabled\n");
> -               pr_emerg("GPF could be caused by NULL-ptr deref or user memory access\n");
> -       }
> -       return NOTIFY_OK;
> -}
> -
> -static struct notifier_block kasan_die_notifier = {
> -       .notifier_call = kasan_die_handler,
> -};
> -#endif
> -
>  void __init kasan_early_init(void)
>  {
>         int i;
> @@ -298,10 +281,6 @@ void __init kasan_init(void)
>         int i;
>         void *shadow_cpu_entry_begin, *shadow_cpu_entry_end;
>
> -#ifdef CONFIG_KASAN_INLINE
> -       register_die_notifier(&kasan_die_notifier);
> -#endif
> -
>         memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
>
>         /*
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index cc8a03cc9674..7305024b44e3 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -194,4 +194,10 @@ static inline void *kasan_reset_tag(const void *addr)
>
>  #endif /* CONFIG_KASAN_SW_TAGS */
>
> +#ifdef CONFIG_KASAN_INLINE
> +void kasan_non_canonical_hook(unsigned long addr);
> +#else /* CONFIG_KASAN_INLINE */
> +static inline void kasan_non_canonical_hook(unsigned long addr) { }
> +#endif /* CONFIG_KASAN_INLINE */
> +
>  #endif /* LINUX_KASAN_H */
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 621782100eaa..5ef9f24f566b 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -512,3 +512,43 @@ void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned lon
>
>         end_report(&flags);
>  }
> +
> +#ifdef CONFIG_KASAN_INLINE
> +/*
> + * With CONFIG_KASAN_INLINE, accesses to bogus pointers (outside the high
> + * canonical half of the address space) cause out-of-bounds shadow memory reads
> + * before the actual access. For addresses in the low canonical half of the
> + * address space, as well as most non-canonical addresses, that out-of-bounds
> + * shadow memory access lands in the non-canonical part of the address space.
> + * Help the user figure out what the original bogus pointer was.
> + */
> +void kasan_non_canonical_hook(unsigned long addr)
> +{
> +       unsigned long orig_addr;
> +       const char *bug_type;
> +
> +       if (addr < KASAN_SHADOW_OFFSET)
> +               return;
> +
> +       orig_addr = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
> +       /*
> +        * For faults near the shadow address for NULL, we can be fairly certain
> +        * that this is a KASAN shadow memory access.
> +        * For faults that correspond to shadow for low canonical addresses, we
> +        * can still be pretty sure - that shadow region is a fairly narrow
> +        * chunk of the non-canonical address space.
> +        * But faults that look like shadow for non-canonical addresses are a
> +        * really large chunk of the address space. In that case, we still
> +        * print the decoded address, but make it clear that this is not
> +        * necessarily what's actually going on.
> +        */
> +       if (orig_addr < PAGE_SIZE)
> +               bug_type = "null-ptr-deref";
> +       else if (orig_addr < TASK_SIZE)
> +               bug_type = "probably user-memory-access";
> +       else
> +               bug_type = "maybe wild-memory-access";
> +       pr_alert("KASAN: %s in range [0x%016lx-0x%016lx]\n", bug_type,
> +                orig_addr, orig_addr + KASAN_SHADOW_MASK);
> +}
> +#endif
> --
> 2.24.0.432.g9d3f5f5b63-goog


Reviewed-by: Dmitry Vyukov <dvyukov@google.com>

Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-15 19:17 ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Jann Horn
@ 2019-11-18 14:21   ` Borislav Petkov
  2019-11-18 16:02     ` Dmitry Vyukov
  2019-11-20  4:25   ` Andi Kleen
  2019-11-23 23:07   ` Andy Lutomirski
  2 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2019-11-18 14:21 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasan-dev,
	linux-kernel, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Fri, Nov 15, 2019 at 08:17:27PM +0100, Jann Horn wrote:
>  dotraplinkage void
>  do_general_protection(struct pt_regs *regs, long error_code)
>  {
> @@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
>  			return;
>  
>  		if (notify_die(DIE_GPF, desc, regs, error_code,
> -			       X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
> -			die(desc, regs, error_code);
> +			       X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
> +			return;
> +
> +		if (error_code)
> +			pr_alert("GPF is segment-related (see error code)\n");
> +		else
> +			print_kernel_gp_address(regs);
> +
> +		die(desc, regs, error_code);

Right, this way, those messages appear before the main "general
protection ..." message:

[    2.434372] traps: probably dereferencing non-canonical address 0xdfff000000000001
[    2.442492] general protection fault: 0000 [#1] PREEMPT SMP

Can we glue/merge them together? Or is this going to confuse tools too much:

[    2.542218] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP

(and that sentence could be shorter too:

 	"general protection fault for non-canonical address 0xdfff000000000001"

looks ok to me too.)

Here's a dirty diff together with a reproducer ontop of yours:

---
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bf796f8c9998..dab702ba28a6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -515,7 +515,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
  * On 64-bit, if an uncaught #GP occurs while dereferencing a non-canonical
  * address, print that address.
  */
-static void print_kernel_gp_address(struct pt_regs *regs)
+static unsigned long get_kernel_gp_address(struct pt_regs *regs)
 {
 #ifdef CONFIG_X86_64
 	u8 insn_bytes[MAX_INSN_SIZE];
@@ -523,7 +523,7 @@ static void print_kernel_gp_address(struct pt_regs *regs)
 	unsigned long addr_ref;
 
 	if (probe_kernel_read(insn_bytes, (void *)regs->ip, MAX_INSN_SIZE))
-		return;
+		return 0;
 
 	kernel_insn_init(&insn, insn_bytes, MAX_INSN_SIZE);
 	insn_get_modrm(&insn);
@@ -532,22 +532,22 @@ static void print_kernel_gp_address(struct pt_regs *regs)
 
 	/* Bail out if insn_get_addr_ref() failed or we got a kernel address. */
 	if (addr_ref >= ~__VIRTUAL_MASK)
-		return;
+		return 0;
 
 	/* Bail out if the entire operand is in the canonical user half. */
 	if (addr_ref + insn.opnd_bytes - 1 <= __VIRTUAL_MASK)
-		return;
+		return 0;
 
-	pr_alert("probably dereferencing non-canonical address 0x%016lx\n",
-		 addr_ref);
+	return addr_ref;
 #endif
 }
 
+#define GPFSTR "general protection fault"
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
-	const char *desc = "general protection fault";
 	struct task_struct *tsk;
+	char desc[90];
 
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 	cond_local_irq_enable(regs);
@@ -584,12 +584,18 @@ do_general_protection(struct pt_regs *regs, long error_code)
 			       X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
 			return;
 
-		if (error_code)
-			pr_alert("GPF is segment-related (see error code)\n");
-		else
-			print_kernel_gp_address(regs);
+		if (error_code) {
+			snprintf(desc, 90, "segment-related " GPFSTR);
+		} else {
+			unsigned long addr_ref = get_kernel_gp_address(regs);
+
+			if (addr_ref)
+				snprintf(desc, 90, GPFSTR " while derefing a non-canonical address 0x%lx", addr_ref);
+			else
+				snprintf(desc, 90, GPFSTR);
+		}
 
-		die(desc, regs, error_code);
+		die((const char *)desc, regs, error_code);
 		return;
 	}
 
diff --git a/init/main.c b/init/main.c
index 91f6ebb30ef0..7acc7e660be9 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1124,6 +1124,9 @@ static int __ref kernel_init(void *unused)
 
 	rcu_end_inkernel_boot();
 
+	asm volatile("mov $0xdfff000000000001, %rax\n\t"
+		     "jmpq *%rax\n\t");
+
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
 		if (!ret)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 14:21   ` Borislav Petkov
@ 2019-11-18 16:02     ` Dmitry Vyukov
  2019-11-18 16:19       ` Jann Horn
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Vyukov @ 2019-11-18 16:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jann Horn, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Mon, Nov 18, 2019 at 3:21 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Nov 15, 2019 at 08:17:27PM +0100, Jann Horn wrote:
> >  dotraplinkage void
> >  do_general_protection(struct pt_regs *regs, long error_code)
> >  {
> > @@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
> >                       return;
> >
> >               if (notify_die(DIE_GPF, desc, regs, error_code,
> > -                            X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
> > -                     die(desc, regs, error_code);
> > +                            X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
> > +                     return;
> > +
> > +             if (error_code)
> > +                     pr_alert("GPF is segment-related (see error code)\n");
> > +             else
> > +                     print_kernel_gp_address(regs);
> > +
> > +             die(desc, regs, error_code);
>
> Right, this way, those messages appear before the main "general
> protection ..." message:
>
> [    2.434372] traps: probably dereferencing non-canonical address 0xdfff000000000001
> [    2.442492] general protection fault: 0000 [#1] PREEMPT SMP
>
> Can we glue/merge them together? Or is this going to confuse tools too much:
>
> [    2.542218] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
>
> (and that sentence could be shorter too:
>
>         "general protection fault for non-canonical address 0xdfff000000000001"
>
> looks ok to me too.)

This exact form will confuse syzkaller crash parsing for Linux kernel:
https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L1347
It expects a "general protection fault:" line for these crashes.

A graceful way to update kernel crash messages would be to add more
tests with the new format here:
https://github.com/google/syzkaller/tree/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/testdata/linux/report
Update parsing code. Roll out new version. Update all other testing
systems that detect and parse kernel crashes. Then commit kernel
changes.

An unfortunate consequence of offloading testing to third-party systems...



> Here's a dirty diff together with a reproducer ontop of yours:
>
> ---
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index bf796f8c9998..dab702ba28a6 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -515,7 +515,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
>   * On 64-bit, if an uncaught #GP occurs while dereferencing a non-canonical
>   * address, print that address.
>   */
> -static void print_kernel_gp_address(struct pt_regs *regs)
> +static unsigned long get_kernel_gp_address(struct pt_regs *regs)
>  {
>  #ifdef CONFIG_X86_64
>         u8 insn_bytes[MAX_INSN_SIZE];
> @@ -523,7 +523,7 @@ static void print_kernel_gp_address(struct pt_regs *regs)
>         unsigned long addr_ref;
>
>         if (probe_kernel_read(insn_bytes, (void *)regs->ip, MAX_INSN_SIZE))
> -               return;
> +               return 0;
>
>         kernel_insn_init(&insn, insn_bytes, MAX_INSN_SIZE);
>         insn_get_modrm(&insn);
> @@ -532,22 +532,22 @@ static void print_kernel_gp_address(struct pt_regs *regs)
>
>         /* Bail out if insn_get_addr_ref() failed or we got a kernel address. */
>         if (addr_ref >= ~__VIRTUAL_MASK)
> -               return;
> +               return 0;
>
>         /* Bail out if the entire operand is in the canonical user half. */
>         if (addr_ref + insn.opnd_bytes - 1 <= __VIRTUAL_MASK)
> -               return;
> +               return 0;
>
> -       pr_alert("probably dereferencing non-canonical address 0x%016lx\n",
> -                addr_ref);
> +       return addr_ref;
>  #endif
>  }
>
> +#define GPFSTR "general protection fault"
>  dotraplinkage void
>  do_general_protection(struct pt_regs *regs, long error_code)
>  {
> -       const char *desc = "general protection fault";
>         struct task_struct *tsk;
> +       char desc[90];
>
>         RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
>         cond_local_irq_enable(regs);
> @@ -584,12 +584,18 @@ do_general_protection(struct pt_regs *regs, long error_code)
>                                X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
>                         return;
>
> -               if (error_code)
> -                       pr_alert("GPF is segment-related (see error code)\n");
> -               else
> -                       print_kernel_gp_address(regs);
> +               if (error_code) {
> +                       snprintf(desc, 90, "segment-related " GPFSTR);
> +               } else {
> +                       unsigned long addr_ref = get_kernel_gp_address(regs);
> +
> +                       if (addr_ref)
> +                               snprintf(desc, 90, GPFSTR " while derefing a non-canonical address 0x%lx", addr_ref);
> +                       else
> +                               snprintf(desc, 90, GPFSTR);
> +               }
>
> -               die(desc, regs, error_code);
> +               die((const char *)desc, regs, error_code);
>                 return;
>         }
>
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..7acc7e660be9 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1124,6 +1124,9 @@ static int __ref kernel_init(void *unused)
>
>         rcu_end_inkernel_boot();
>
> +       asm volatile("mov $0xdfff000000000001, %rax\n\t"
> +                    "jmpq *%rax\n\t");
> +
>         if (ramdisk_execute_command) {
>                 ret = run_init_process(ramdisk_execute_command);
>                 if (!ret)
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20191118142144.GC6363%40zn.tnic.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 16:02     ` Dmitry Vyukov
@ 2019-11-18 16:19       ` Jann Horn
  2019-11-18 16:29         ` Dmitry Vyukov
  0 siblings, 1 reply; 21+ messages in thread
From: Jann Horn @ 2019-11-18 16:19 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Mon, Nov 18, 2019 at 5:03 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Nov 18, 2019 at 3:21 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Nov 15, 2019 at 08:17:27PM +0100, Jann Horn wrote:
> > >  dotraplinkage void
> > >  do_general_protection(struct pt_regs *regs, long error_code)
> > >  {
> > > @@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
> > >                       return;
> > >
> > >               if (notify_die(DIE_GPF, desc, regs, error_code,
> > > -                            X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
> > > -                     die(desc, regs, error_code);
> > > +                            X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
> > > +                     return;
> > > +
> > > +             if (error_code)
> > > +                     pr_alert("GPF is segment-related (see error code)\n");
> > > +             else
> > > +                     print_kernel_gp_address(regs);
> > > +
> > > +             die(desc, regs, error_code);
> >
> > Right, this way, those messages appear before the main "general
> > protection ..." message:
> >
> > [    2.434372] traps: probably dereferencing non-canonical address 0xdfff000000000001
> > [    2.442492] general protection fault: 0000 [#1] PREEMPT SMP
> >
> > Can we glue/merge them together? Or is this going to confuse tools too much:
> >
> > [    2.542218] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> >
> > (and that sentence could be shorter too:
> >
> >         "general protection fault for non-canonical address 0xdfff000000000001"
> >
> > looks ok to me too.)
>
> This exact form will confuse syzkaller crash parsing for Linux kernel:
> https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L1347
> It expects a "general protection fault:" line for these crashes.
>
> A graceful way to update kernel crash messages would be to add more
> tests with the new format here:
> https://github.com/google/syzkaller/tree/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/testdata/linux/report
> Update parsing code. Roll out new version. Update all other testing
> systems that detect and parse kernel crashes. Then commit kernel
> changes.

So for syzkaller, it'd be fine as long as we keep the colon there?
Something like:

general protection fault: derefing non-canonical address
0xdfff000000000001: 0000 [#1] PREEMPT SMP

And it looks like the 0day test bot doesn't have any specific pattern
for #GP, it seems to just look for the panic triggered by
panic-on-oops as far as I can tell (oops=panic in lkp-exec/qemu, no
"general protection fault" in etc/dmesg-kill-pattern).

> An unfortunate consequence of offloading testing to third-party systems...

And of not having a standard way to signal "this line starts something
that should be reported as a bug"? Maybe as a longer-term idea, it'd
help to have some sort of extra prefix byte that the kernel can print
to say "here comes a bug report, first line should be the subject", or
something like that, similar to how we have loglevels...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 16:19       ` Jann Horn
@ 2019-11-18 16:29         ` Dmitry Vyukov
  2019-11-18 16:40           ` error attribution for stalls [was: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP] Jann Horn
  2019-11-18 16:44           ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Borislav Petkov
  0 siblings, 2 replies; 21+ messages in thread
From: Dmitry Vyukov @ 2019-11-18 16:29 UTC (permalink / raw)
  To: Jann Horn
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Mon, Nov 18, 2019 at 5:20 PM 'Jann Horn' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> On Mon, Nov 18, 2019 at 5:03 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Mon, Nov 18, 2019 at 3:21 PM Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Fri, Nov 15, 2019 at 08:17:27PM +0100, Jann Horn wrote:
> > > >  dotraplinkage void
> > > >  do_general_protection(struct pt_regs *regs, long error_code)
> > > >  {
> > > > @@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
> > > >                       return;
> > > >
> > > >               if (notify_die(DIE_GPF, desc, regs, error_code,
> > > > -                            X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
> > > > -                     die(desc, regs, error_code);
> > > > +                            X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
> > > > +                     return;
> > > > +
> > > > +             if (error_code)
> > > > +                     pr_alert("GPF is segment-related (see error code)\n");
> > > > +             else
> > > > +                     print_kernel_gp_address(regs);
> > > > +
> > > > +             die(desc, regs, error_code);
> > >
> > > Right, this way, those messages appear before the main "general
> > > protection ..." message:
> > >
> > > [    2.434372] traps: probably dereferencing non-canonical address 0xdfff000000000001
> > > [    2.442492] general protection fault: 0000 [#1] PREEMPT SMP
> > >
> > > Can we glue/merge them together? Or is this going to confuse tools too much:
> > >
> > > [    2.542218] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> > >
> > > (and that sentence could be shorter too:
> > >
> > >         "general protection fault for non-canonical address 0xdfff000000000001"
> > >
> > > looks ok to me too.)
> >
> > This exact form will confuse syzkaller crash parsing for Linux kernel:
> > https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L1347
> > It expects a "general protection fault:" line for these crashes.
> >
> > A graceful way to update kernel crash messages would be to add more
> > tests with the new format here:
> > https://github.com/google/syzkaller/tree/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/testdata/linux/report
> > Update parsing code. Roll out new version. Update all other testing
> > systems that detect and parse kernel crashes. Then commit kernel
> > changes.
>
> So for syzkaller, it'd be fine as long as we keep the colon there?
> Something like:
>
> general protection fault: derefing non-canonical address
> 0xdfff000000000001: 0000 [#1] PREEMPT SMP

Probably. Tests help a lot to answer such questions ;) But presumably
it should break parsing.

> And it looks like the 0day test bot doesn't have any specific pattern
> for #GP, it seems to just look for the panic triggered by
> panic-on-oops as far as I can tell (oops=panic in lkp-exec/qemu, no
> "general protection fault" in etc/dmesg-kill-pattern).
>
> > An unfortunate consequence of offloading testing to third-party systems...
>
> And of not having a standard way to signal "this line starts something
> that should be reported as a bug"? Maybe as a longer-term idea, it'd
> help to have some sort of extra prefix byte that the kernel can print
> to say "here comes a bug report, first line should be the subject", or
> something like that, similar to how we have loglevels...

This would be great.
Also a way to denote crash end.
However we have lots of special logic for subjects, not sure if kernel
could provide good subject:
https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L537-L1588
Probably it could, but it won't be completely trivial. E.g. if there
is a stall inside of a timer function, it should give the name of the
actual timer callback as identity ("stall in timer_subsystem_foo"). Or
for syscalls we use more disambiguation b/c "in sys_ioclt" is not much
different than saying "there is a bug in kernel" :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* error attribution for stalls [was: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP]
  2019-11-18 16:29         ` Dmitry Vyukov
@ 2019-11-18 16:40           ` Jann Horn
  2019-11-18 16:44           ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Borislav Petkov
  1 sibling, 0 replies; 21+ messages in thread
From: Jann Horn @ 2019-11-18 16:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Mon, Nov 18, 2019 at 5:29 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Nov 18, 2019 at 5:20 PM 'Jann Horn' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
> > On Mon, Nov 18, 2019 at 5:03 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > This exact form will confuse syzkaller crash parsing for Linux kernel:
> > > https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L1347
> > > It expects a "general protection fault:" line for these crashes.
> > >
> > > A graceful way to update kernel crash messages would be to add more
> > > tests with the new format here:
> > > https://github.com/google/syzkaller/tree/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/testdata/linux/report
> > > Update parsing code. Roll out new version. Update all other testing
> > > systems that detect and parse kernel crashes. Then commit kernel
> > > changes.
[...]
> > > An unfortunate consequence of offloading testing to third-party systems...
> >
> > And of not having a standard way to signal "this line starts something
> > that should be reported as a bug"? Maybe as a longer-term idea, it'd
> > help to have some sort of extra prefix byte that the kernel can print
> > to say "here comes a bug report, first line should be the subject", or
> > something like that, similar to how we have loglevels...
>
> This would be great.
> Also a way to denote crash end.
> However we have lots of special logic for subjects, not sure if kernel
> could provide good subject:
> https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L537-L1588
> Probably it could, but it won't be completely trivial. E.g. if there
> is a stall inside of a timer function, it should give the name of the
> actual timer callback as identity ("stall in timer_subsystem_foo"). Or
> for syscalls we use more disambiguation b/c "in sys_ioclt" is not much
> different than saying "there is a bug in kernel" :)

Maybe I'm overthinking things, and maybe this is too much effort
relative to the benefit it brings, but here's a crazy idea:

For the specific case of stalls, it might help if the kernel could put
markers on the stack on the first stall warning (e.g. assuming that
ORC is enabled, by walking the stack and replacing all saved
instruction pointers with a pointer to some magic trampoline that
jumps back to the original caller using some sort of shadow stack),
then wait a few seconds, and then check how far on the stack the
markers have been cleared. Then hopefully you'd know exactly in which
function you're looping.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 16:29         ` Dmitry Vyukov
  2019-11-18 16:40           ` error attribution for stalls [was: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP] Jann Horn
@ 2019-11-18 16:44           ` Borislav Petkov
  2019-11-18 17:38             ` Borislav Petkov
  2019-11-20 11:40             ` Ingo Molnar
  1 sibling, 2 replies; 21+ messages in thread
From: Borislav Petkov @ 2019-11-18 16:44 UTC (permalink / raw)
  To: Dmitry Vyukov, Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Mon, Nov 18, 2019 at 05:29:42PM +0100, Dmitry Vyukov wrote:
> > And of not having a standard way to signal "this line starts something
> > that should be reported as a bug"? Maybe as a longer-term idea, it'd
> > help to have some sort of extra prefix byte that the kernel can print
> > to say "here comes a bug report, first line should be the subject", or
> > something like that, similar to how we have loglevels...
> 
> This would be great.
> Also a way to denote crash end.
> However we have lots of special logic for subjects, not sure if kernel
> could provide good subject:
> https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L537-L1588
> Probably it could, but it won't be completely trivial. E.g. if there
> is a stall inside of a timer function, it should give the name of the
> actual timer callback as identity ("stall in timer_subsystem_foo"). Or
> for syscalls we use more disambiguation b/c "in sys_ioclt" is not much
> different than saying "there is a bug in kernel" :)

While external tools are fine and cool, they can't really block kernel
development and printk strings format is not an ABI. And yeah, we have
this discussion each time someone proposes changes to those "magic"
strings but I guess it is about time to fix this in a way that any
future changes don't break tools.

And so I like the idea of marking *only* the first splat with some small
prefix char as that first splat is the special and very important one.
I.e., the one where die_counter is 0.

So I could very well imagine something like:

...
[    2.523708] Write protecting the kernel read-only data: 16384k
[    2.524729] Freeing unused kernel image (text/rodata gap) memory: 2040K
[    2.525594] Freeing unused kernel image (rodata/data gap) memory: 368K
[    2.541414] x86/mm: Checked W+X mappings: passed, no W+X pages found.

<--- important first splat starts here:

[    2.542218] [*] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
[    2.543343] [*] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
[    2.544138] [*] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[    2.545120] [*] RIP: 0010:kernel_init+0x58/0x107
[    2.546055] [*] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89
[    2.550242] [*] RSP: 0018:ffffc90000013f50 EFLAGS: 00010246
[    2.551691] [*] RAX: dfff000000000001 RBX: ffffffff817b7ac9 RCX: 0000000000000040
[    2.553435] [*] RDX: 0000000000000030 RSI: ffff88807da2f170 RDI: 000000000002f170
[    2.555169] [*] RBP: 0000000000000000 R08: 00000000000001a6 R09: 00000000ad55ad55
[    2.556393] [*] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
[    2.557268] [*] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    2.558417] [*] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[    2.559370] [*] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.560138] [*] CR2: 0000000000000000 CR3: 0000000002009000 CR4: 00000000003406f0
[    2.561013] [*] Call Trace:
[    2.561506] [*]  ret_from_fork+0x22/0x40
[    2.562080] [*] Modules linked in:
[    2.583706] [*] ---[ end trace 8ceb5a62d3ebbfa6 ]---
[    2.584384] [*] RIP: 0010:kernel_init+0x58/0x107
[    2.584999] [*] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89
[    2.591746] [*] RSP: 0018:ffffc90000013f50 EFLAGS: 00010246
[    2.593175] [*] RAX: dfff000000000001 RBX: ffffffff817b7ac9 RCX: 0000000000000040
[    2.594892] [*] RDX: 0000000000000030 RSI: ffff88807da2f170 RDI: 000000000002f170
[    2.599706] [*] RBP: 0000000000000000 R08: 00000000000001a6 R09: 00000000ad55ad55
[    2.600585] [*] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
[    2.601433] [*] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    2.602283] [*] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[    2.603207] [*] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.607706] [*] CR2: 0000000000000000 CR3: 0000000002009000 CR4: 00000000003406f0
[    2.608565] [*] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.609600] [*] Kernel Offset: disabled
[    2.610168] [*] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

<--- and ends here.

to denote that first splat. And the '*' thing is just an example - it
can be any char - whatever's easier to grep for.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 16:44           ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Borislav Petkov
@ 2019-11-18 17:38             ` Borislav Petkov
  2019-11-20 11:41               ` Ingo Molnar
  2019-11-20 11:40             ` Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2019-11-18 17:38 UTC (permalink / raw)
  To: Dmitry Vyukov, Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Mon, Nov 18, 2019 at 05:44:07PM +0100, Borislav Petkov wrote:
> [    2.523708] Write protecting the kernel read-only data: 16384k
> [    2.524729] Freeing unused kernel image (text/rodata gap) memory: 2040K
> [    2.525594] Freeing unused kernel image (rodata/data gap) memory: 368K
> [    2.541414] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> 
> <--- important first splat starts here:
> 
> [    2.542218] [*] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
		  ^

Btw, tglx just suggested on IRC to simply slap the die_counter number here so
that you have

[    2.543343] [1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
[    2.544138] [1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
...

which also tells you to which splat the line belongs to.

Also useful.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-15 19:17 ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Jann Horn
  2019-11-18 14:21   ` Borislav Petkov
@ 2019-11-20  4:25   ` Andi Kleen
  2019-11-20 10:31     ` Jann Horn
  2019-11-23 23:07   ` Andy Lutomirski
  2 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2019-11-20  4:25 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	kasan-dev, linux-kernel, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

Jann Horn <jannh@google.com> writes:

> +
> +		if (error_code)
> +			pr_alert("GPF is segment-related (see error code)\n");
> +		else
> +			print_kernel_gp_address(regs);

Is this really correct? There are a lot of instructions that can do #GP
(it's the CPU's equivalent of EINVAL) and I'm pretty sure many of them
don't set an error code, and many don't have operands either.

You would need to make sure the instruction decoder handles these
cases correctly, and ideally that you detect it instead of printing
a bogus address.

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-20  4:25   ` Andi Kleen
@ 2019-11-20 10:31     ` Jann Horn
  2019-11-20 13:56       ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Jann Horn @ 2019-11-20 10:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	Dmitry Vyukov, kasan-dev, kernel list, Andrey Konovalov,
	Andy Lutomirski, Sean Christopherson

On Wed, Nov 20, 2019 at 5:25 AM Andi Kleen <ak@linux.intel.com> wrote:
> Jann Horn <jannh@google.com> writes:
> > +             if (error_code)
> > +                     pr_alert("GPF is segment-related (see error code)\n");
> > +             else
> > +                     print_kernel_gp_address(regs);
>
> Is this really correct? There are a lot of instructions that can do #GP
> (it's the CPU's equivalent of EINVAL) and I'm pretty sure many of them
> don't set an error code, and many don't have operands either.
>
> You would need to make sure the instruction decoder handles these
> cases correctly, and ideally that you detect it instead of printing
> a bogus address.

Is there a specific concern you have about the instruction decoder? As
far as I can tell, all the paths of insn_get_addr_ref() only work if
the instruction has a mod R/M byte according to the instruction
tables, and then figures out the address based on that. While that
means that there's a wide variety of cases in which we won't be able
to figure out the address, I'm not aware of anything specific that is
likely to lead to false positives.

But Andy did suggest that we hedge a bit in the error message because
even if the address passed to the instruction is non-canonical, we
don't know for sure whether that's actually the reason why things
failed, and that's why it says "probably" in the message about the
address now.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 16:44           ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Borislav Petkov
  2019-11-18 17:38             ` Borislav Petkov
@ 2019-11-20 11:40             ` Ingo Molnar
  2019-11-20 11:52               ` Borislav Petkov
  1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2019-11-20 11:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dmitry Vyukov, Jann Horn, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrey Ryabinin,
	Alexander Potapenko, kasan-dev, LKML, Andrey Konovalov,
	Andy Lutomirski, Sean Christopherson, Linus Torvalds


* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Nov 18, 2019 at 05:29:42PM +0100, Dmitry Vyukov wrote:
> > > And of not having a standard way to signal "this line starts something
> > > that should be reported as a bug"? Maybe as a longer-term idea, it'd
> > > help to have some sort of extra prefix byte that the kernel can print
> > > to say "here comes a bug report, first line should be the subject", or
> > > something like that, similar to how we have loglevels...
> > 
> > This would be great.
> > Also a way to denote crash end.
> > However we have lots of special logic for subjects, not sure if kernel
> > could provide good subject:
> > https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L537-L1588
> > Probably it could, but it won't be completely trivial. E.g. if there
> > is a stall inside of a timer function, it should give the name of the
> > actual timer callback as identity ("stall in timer_subsystem_foo"). Or
> > for syscalls we use more disambiguation b/c "in sys_ioclt" is not much
> > different than saying "there is a bug in kernel" :)
> 
> While external tools are fine and cool, they can't really block kernel
> development and printk strings format is not an ABI. And yeah, we have
> this discussion each time someone proposes changes to those "magic"
> strings but I guess it is about time to fix this in a way that any
> future changes don't break tools.
> 
> And so I like the idea of marking *only* the first splat with some small
> prefix char as that first splat is the special and very important one.
> I.e., the one where die_counter is 0.
> 
> So I could very well imagine something like:
> 
> ...
> [    2.523708] Write protecting the kernel read-only data: 16384k
> [    2.524729] Freeing unused kernel image (text/rodata gap) memory: 2040K
> [    2.525594] Freeing unused kernel image (rodata/data gap) memory: 368K
> [    2.541414] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> 
> <--- important first splat starts here:
> 
> [    2.542218] [*] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> [    2.543343] [*] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> [    2.544138] [*] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> [    2.545120] [*] RIP: 0010:kernel_init+0x58/0x107
> [    2.546055] [*] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89

> <--- and ends here.
> 
> to denote that first splat. And the '*' thing is just an example - it
> can be any char - whatever's easier to grep for.

Well, this would break various pieces of tooling I'm sure.

Maybe it would be nicer to tooling to embedd the splat-counter in the 
timestamp in a way:

> [    2.542218-#1] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> [    2.543343-#1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> [    2.544138-#1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> [    2.545120-#1] RIP: 0010:kernel_init+0x58/0x107
> [    2.546055-#1] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89

That way we'd not only know that it's the first splat, but we'd know it 
from all the *other* splats as well where they are in the splat-rank ;-)

(Also Cc:-ed Linus.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-18 17:38             ` Borislav Petkov
@ 2019-11-20 11:41               ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2019-11-20 11:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dmitry Vyukov, Jann Horn, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrey Ryabinin,
	Alexander Potapenko, kasan-dev, LKML, Andrey Konovalov,
	Andy Lutomirski, Sean Christopherson


* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Nov 18, 2019 at 05:44:07PM +0100, Borislav Petkov wrote:
> > [    2.523708] Write protecting the kernel read-only data: 16384k
> > [    2.524729] Freeing unused kernel image (text/rodata gap) memory: 2040K
> > [    2.525594] Freeing unused kernel image (rodata/data gap) memory: 368K
> > [    2.541414] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> > 
> > <--- important first splat starts here:
> > 
> > [    2.542218] [*] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> 		  ^
> 
> Btw, tglx just suggested on IRC to simply slap the die_counter number here so
> that you have
> 
> [    2.543343] [1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> [    2.544138] [1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> ...
> 
> which also tells you to which splat the line belongs to.
>
> Also useful.

Yeah - but I think it would be even better to make it part of the 
timestamp - most tools will already discard the [] bit, so why not merge 
the two:

> [    2.543343-#1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> [    2.544138-#1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-20 11:40             ` Ingo Molnar
@ 2019-11-20 11:52               ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2019-11-20 11:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dmitry Vyukov, Jann Horn, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andrey Ryabinin,
	Alexander Potapenko, kasan-dev, LKML, Andrey Konovalov,
	Andy Lutomirski, Sean Christopherson, Linus Torvalds

On Wed, Nov 20, 2019 at 12:40:31PM +0100, Ingo Molnar wrote:
> Well, this would break various pieces of tooling I'm sure.

Well, if at all, this will break them one last time. Ironically, the
intent here is to have a markup which doesn't break them anymore, once
that markup is agreed upon by all parties.

Because each time we touch those printk formats, tools people complain
about us breaking their tools. So we should get the best of both worlds
by marking those splats in a way that tools can grep for and we won't
touch the markers anymore, once established.

Also, "[]" was only an example. It can be anything we want, as in "<>"
or "!" or whatever is a short prefix that prepends those lines.

> Maybe it would be nicer to tooling to embedd the splat-counter in the 
> timestamp in a way:

Or that. Whatever we agree, as long as it is a unique marker for splats.
And it should say which splat it is, as that is also very useful
information to have it in each line.

> > [    2.542218-#1] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> > [    2.543343-#1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8+ #8
> > [    2.544138-#1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> > [    2.545120-#1] RIP: 0010:kernel_init+0x58/0x107
> > [    2.546055-#1] Code: 48 c7 c7 e0 5c e7 81 e8 eb d2 90 ff c7 05 77 d6 95 00 02 00 00 00 e8 4e 1d a2 ff e8 69 b7 91 ff 48 b8 01 00 00 00 00 00 ff df <ff> e0 48 8b 3d fe 54 d7 00 48 85 ff 74 22 e8 76 93 84 ff 85 c0 89
> 
> That way we'd not only know that it's the first splat, but we'd know it 
> from all the *other* splats as well where they are in the splat-rank ;-)

That's exactly why I'd want the number in there.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-20 10:31     ` Jann Horn
@ 2019-11-20 13:56       ` Andi Kleen
  2019-11-20 14:24         ` Jann Horn
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2019-11-20 13:56 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	Dmitry Vyukov, kasan-dev, kernel list, Andrey Konovalov,
	Andy Lutomirski, Sean Christopherson

> Is there a specific concern you have about the instruction decoder? As
> far as I can tell, all the paths of insn_get_addr_ref() only work if
> the instruction has a mod R/M byte according to the instruction
> tables, and then figures out the address based on that. While that
> means that there's a wide variety of cases in which we won't be able
> to figure out the address, I'm not aware of anything specific that is
> likely to lead to false positives.

First there will be a lot of cases you'll just print 0, even
though 0 is canonical if there is no operand.

Then it might be that the address is canonical, but triggers
#GP anyways (e.g. unaligned SSE)

Or it might be the wrong address if there is an operand,
there are many complex instructions that reference something
in memory, and usually do canonical checking there.

And some other odd cases. For example when the instruction length
exceeds 15 bytes. I know there is fuzzing for the instruction
decoder, but it might be worth double checking it handles
all of that correctly. I'm not sure how good the fuzzer's coverage
is.

At a minimum you should probably check if the address is
actually non canonical. Maybe that's simple enough and weeds out
most cases.

-Andi


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-20 13:56       ` Andi Kleen
@ 2019-11-20 14:24         ` Jann Horn
  0 siblings, 0 replies; 21+ messages in thread
From: Jann Horn @ 2019-11-20 14:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, Andrey Ryabinin, Alexander Potapenko,
	Dmitry Vyukov, kasan-dev, kernel list, Andrey Konovalov,
	Andy Lutomirski, Sean Christopherson

On Wed, Nov 20, 2019 at 2:56 PM Andi Kleen <ak@linux.intel.com> wrote:
> > Is there a specific concern you have about the instruction decoder? As
> > far as I can tell, all the paths of insn_get_addr_ref() only work if
> > the instruction has a mod R/M byte according to the instruction
> > tables, and then figures out the address based on that. While that
> > means that there's a wide variety of cases in which we won't be able
> > to figure out the address, I'm not aware of anything specific that is
> > likely to lead to false positives.
>
> First there will be a lot of cases you'll just print 0, even
> though 0 is canonical if there is no operand.

Why would I print zeroes if there is no operand? The decoder logic
returns a -1 if it can't find a mod r/m byte, which causes the #GP
handler to not print any address at all. Or are you talking about some
weird instruction that takes an operand that is actually ignored, or
something weird like that?

> Then it might be that the address is canonical, but triggers
> #GP anyways (e.g. unaligned SSE)

Which is an argument for printing the address even if it is canonical,
as Ingo suggested, I guess.

> Or it might be the wrong address if there is an operand,
> there are many complex instructions that reference something
> in memory, and usually do canonical checking there.

In which case you'd probably usually see a canonical address in the
instruction's argument, which causes the error message to not appear
(in patch v2/v3) / to be different (in my current draft for patch v4).
And as Ingo said over in the other thread, even if the argument is not
directly the faulting address at all, it might still help with
figuring out what's going on.

> And some other odd cases. For example when the instruction length
> exceeds 15 bytes.

But this is the #GP handler. Don't overlong instructions give you #UD instead?

> I know there is fuzzing for the instruction
> decoder, but it might be worth double checking it handles
> all of that correctly. I'm not sure how good the fuzzer's coverage
> is.
>
> At a minimum you should probably check if the address is
> actually non canonical. Maybe that's simple enough and weeds out
> most cases.

The patch you're commenting on does that already; quoting the patch:

+       /* Bail out if insn_get_addr_ref() failed or we got a kernel address. */
+       if (addr_ref >= ~__VIRTUAL_MASK)
+               return;
+
+       /* Bail out if the entire operand is in the canonical user half. */
+       if (addr_ref + insn.opnd_bytes - 1 <= __VIRTUAL_MASK)
+               return;

But at Ingo's request, I'm planning to change that in the v4 patch;
see <https://lore.kernel.org/lkml/20191120111859.GA115930@gmail.com/>
and <https://lore.kernel.org/lkml/CAG48ez0Frp4-+xHZ=UhbHh0hC_h-1VtJfwHw=kDo6NahyMv1ig@mail.gmail.com/>.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-15 19:17 ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Jann Horn
  2019-11-18 14:21   ` Borislav Petkov
  2019-11-20  4:25   ` Andi Kleen
@ 2019-11-23 23:07   ` Andy Lutomirski
  2019-11-27 20:27     ` Jann Horn
  2 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2019-11-23 23:07 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	X86 ML, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	kasan-dev, LKML, Andrey Konovalov, Andy Lutomirski,
	Sean Christopherson

On Fri, Nov 15, 2019 at 11:17 AM Jann Horn <jannh@google.com> wrote:
>
> A frequent cause of #GP exceptions are memory accesses to non-canonical
> addresses. Unlike #PF, #GP doesn't come with a fault address in CR2, so
> the kernel doesn't currently print the fault address for #GP.
> Luckily, we already have the necessary infrastructure for decoding X86
> instructions and computing the memory address that is being accessed;
> hook it up to the #GP handler so that we can figure out whether the #GP
> looks like it was caused by a non-canonical address, and if so, print
> that address.
>
> While it is already possible to compute the faulting address manually by
> disassembling the opcode dump and evaluating the instruction against the
> register dump, this should make it slightly easier to identify crashes
> at a glance.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>
> Notes:
>     v2:
>      - print different message for segment-related GP (Borislav)
>      - rewrite check for non-canonical address (Sean)
>      - make it clear we don't know for sure why the GP happened (Andy)
>
>  arch/x86/kernel/traps.c | 45 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index c90312146da0..12d42697a18e 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -56,6 +56,8 @@
>  #include <asm/mpx.h>
>  #include <asm/vm86.h>
>  #include <asm/umip.h>
> +#include <asm/insn.h>
> +#include <asm/insn-eval.h>
>
>  #ifdef CONFIG_X86_64
>  #include <asm/x86_init.h>
> @@ -509,6 +511,38 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
>         do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
>  }
>
> +/*
> + * On 64-bit, if an uncaught #GP occurs while dereferencing a non-canonical
> + * address, print that address.
> + */
> +static void print_kernel_gp_address(struct pt_regs *regs)
> +{
> +#ifdef CONFIG_X86_64
> +       u8 insn_bytes[MAX_INSN_SIZE];
> +       struct insn insn;
> +       unsigned long addr_ref;
> +
> +       if (probe_kernel_read(insn_bytes, (void *)regs->ip, MAX_INSN_SIZE))
> +               return;
> +
> +       kernel_insn_init(&insn, insn_bytes, MAX_INSN_SIZE);
> +       insn_get_modrm(&insn);
> +       insn_get_sib(&insn);
> +       addr_ref = (unsigned long)insn_get_addr_ref(&insn, regs);
> +
> +       /* Bail out if insn_get_addr_ref() failed or we got a kernel address. */
> +       if (addr_ref >= ~__VIRTUAL_MASK)
> +               return;
> +
> +       /* Bail out if the entire operand is in the canonical user half. */
> +       if (addr_ref + insn.opnd_bytes - 1 <= __VIRTUAL_MASK)
> +               return;
> +
> +       pr_alert("probably dereferencing non-canonical address 0x%016lx\n",
> +                addr_ref);
> +#endif
> +}

Could you refactor this a little bit so that we end up with a helper
that does the computation?  Something like:

int probe_insn_get_memory_ref(void **addr, size_t *len, void *insn_addr);

returns 1 if there was a memory operand and fills in addr and len,
returns 0 if there was no memory operand, and returns a negative error
on error.

I think we're going to want this for #AC handling, too :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-23 23:07   ` Andy Lutomirski
@ 2019-11-27 20:27     ` Jann Horn
  2019-11-28  5:23       ` Andy Lutomirski
  0 siblings, 1 reply; 21+ messages in thread
From: Jann Horn @ 2019-11-27 20:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	X86 ML, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	kasan-dev, LKML, Andrey Konovalov, Sean Christopherson

On Sun, Nov 24, 2019 at 12:08 AM Andy Lutomirski <luto@kernel.org> wrote:
> On Fri, Nov 15, 2019 at 11:17 AM Jann Horn <jannh@google.com> wrote:
> > A frequent cause of #GP exceptions are memory accesses to non-canonical
> > addresses. Unlike #PF, #GP doesn't come with a fault address in CR2, so
> > the kernel doesn't currently print the fault address for #GP.
> > Luckily, we already have the necessary infrastructure for decoding X86
> > instructions and computing the memory address that is being accessed;
> > hook it up to the #GP handler so that we can figure out whether the #GP
> > looks like it was caused by a non-canonical address, and if so, print
> > that address.
[...]
> > +static void print_kernel_gp_address(struct pt_regs *regs)
> > +{
> > +#ifdef CONFIG_X86_64
> > +       u8 insn_bytes[MAX_INSN_SIZE];
> > +       struct insn insn;
> > +       unsigned long addr_ref;
> > +
> > +       if (probe_kernel_read(insn_bytes, (void *)regs->ip, MAX_INSN_SIZE))
> > +               return;
> > +
> > +       kernel_insn_init(&insn, insn_bytes, MAX_INSN_SIZE);
> > +       insn_get_modrm(&insn);
> > +       insn_get_sib(&insn);
> > +       addr_ref = (unsigned long)insn_get_addr_ref(&insn, regs);
[...]
> > +}
>
> Could you refactor this a little bit so that we end up with a helper
> that does the computation?  Something like:
>
> int probe_insn_get_memory_ref(void **addr, size_t *len, void *insn_addr);
>
> returns 1 if there was a memory operand and fills in addr and len,
> returns 0 if there was no memory operand, and returns a negative error
> on error.
>
> I think we're going to want this for #AC handling, too :)

Mmmh... the instruction decoder doesn't currently give us a reliable
access size though. (I know, I'm using it here regardless, but it
doesn't really matter here if the decoded size is too big from time to
time... whereas I imagine that that'd matter quite a bit for #AC
handling.) IIRC e.g. a MOVZX that loads 1 byte into a 4-byte register
is decoded as having .opnd_bytes==4; and if you look through
arch/x86/lib/insn.c, there isn't even anything that would ever set
->opnd_bytes to 1. You'd have to add some plumbing to get reliable
access sizes. I don't want to add a helper for this before the
underlying infrastructure actually works properly.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
  2019-11-27 20:27     ` Jann Horn
@ 2019-11-28  5:23       ` Andy Lutomirski
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Lutomirski @ 2019-11-28  5:23 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, X86 ML, Andrey Ryabinin, Alexander Potapenko,
	Dmitry Vyukov, kasan-dev, LKML, Andrey Konovalov,
	Sean Christopherson

On Wed, Nov 27, 2019 at 12:27 PM Jann Horn <jannh@google.com> wrote:
>
> On Sun, Nov 24, 2019 at 12:08 AM Andy Lutomirski <luto@kernel.org> wrote:
> > On Fri, Nov 15, 2019 at 11:17 AM Jann Horn <jannh@google.com> wrote:
> > > A frequent cause of #GP exceptions are memory accesses to non-canonical
> > > addresses. Unlike #PF, #GP doesn't come with a fault address in CR2, so
> > > the kernel doesn't currently print the fault address for #GP.
> > > Luckily, we already have the necessary infrastructure for decoding X86
> > > instructions and computing the memory address that is being accessed;
> > > hook it up to the #GP handler so that we can figure out whether the #GP
> > > looks like it was caused by a non-canonical address, and if so, print
> > > that address.
> [...]
> > > +static void print_kernel_gp_address(struct pt_regs *regs)
> > > +{
> > > +#ifdef CONFIG_X86_64
> > > +       u8 insn_bytes[MAX_INSN_SIZE];
> > > +       struct insn insn;
> > > +       unsigned long addr_ref;
> > > +
> > > +       if (probe_kernel_read(insn_bytes, (void *)regs->ip, MAX_INSN_SIZE))
> > > +               return;
> > > +
> > > +       kernel_insn_init(&insn, insn_bytes, MAX_INSN_SIZE);
> > > +       insn_get_modrm(&insn);
> > > +       insn_get_sib(&insn);
> > > +       addr_ref = (unsigned long)insn_get_addr_ref(&insn, regs);
> [...]
> > > +}
> >
> > Could you refactor this a little bit so that we end up with a helper
> > that does the computation?  Something like:
> >
> > int probe_insn_get_memory_ref(void **addr, size_t *len, void *insn_addr);
> >
> > returns 1 if there was a memory operand and fills in addr and len,
> > returns 0 if there was no memory operand, and returns a negative error
> > on error.
> >
> > I think we're going to want this for #AC handling, too :)
>
> Mmmh... the instruction decoder doesn't currently give us a reliable
> access size though. (I know, I'm using it here regardless, but it
> doesn't really matter here if the decoded size is too big from time to
> time... whereas I imagine that that'd matter quite a bit for #AC
> handling.) IIRC e.g. a MOVZX that loads 1 byte into a 4-byte register
> is decoded as having .opnd_bytes==4; and if you look through
> arch/x86/lib/insn.c, there isn't even anything that would ever set
> ->opnd_bytes to 1. You'd have to add some plumbing to get reliable
> access sizes. I don't want to add a helper for this before the
> underlying infrastructure actually works properly.

Fair enough.  Although, with #AC, we know a priori that the address is
unaligned, so we could at least print "Unaligned access at 0x%lx\n".
But we can certainly leave these details to someone else.

(For context, there are patches floating around to enable a formerly
secret CPU feature to generate #AC on a LOCK instruction that spans a
cache line.)

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-11-28  5:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-15 19:17 [PATCH v2 1/3] x86/insn-eval: Add support for 64-bit kernel mode Jann Horn
2019-11-15 19:17 ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Jann Horn
2019-11-18 14:21   ` Borislav Petkov
2019-11-18 16:02     ` Dmitry Vyukov
2019-11-18 16:19       ` Jann Horn
2019-11-18 16:29         ` Dmitry Vyukov
2019-11-18 16:40           ` error attribution for stalls [was: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP] Jann Horn
2019-11-18 16:44           ` [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP Borislav Petkov
2019-11-18 17:38             ` Borislav Petkov
2019-11-20 11:41               ` Ingo Molnar
2019-11-20 11:40             ` Ingo Molnar
2019-11-20 11:52               ` Borislav Petkov
2019-11-20  4:25   ` Andi Kleen
2019-11-20 10:31     ` Jann Horn
2019-11-20 13:56       ` Andi Kleen
2019-11-20 14:24         ` Jann Horn
2019-11-23 23:07   ` Andy Lutomirski
2019-11-27 20:27     ` Jann Horn
2019-11-28  5:23       ` Andy Lutomirski
2019-11-15 19:17 ` [PATCH v2 3/3] x86/kasan: Print original " Jann Horn
2019-11-18  8:36   ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).