All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] x86/fault: Further improve #PF oops messages
@ 2018-12-21 21:36 Sean Christopherson
  2018-12-21 21:36 ` [PATCH v3 1/2] x86/fault: Reword initial BUG message for unhandled page faults Sean Christopherson
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Sean Christopherson @ 2018-12-21 21:36 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Rik van Riel,
	Yu-cheng Yu, Ingo Molnar

Rework the messages printed for #PF oopses to display more detailed
information about the fault in human readable form and to avoid
conflicting messages and/or statements that may not always be accurate.

v3:
  - Prepend a patch to reword the initial BUG message
  - Add sample output in the changelogs
  - Swap the order of the #PF lines.  For most cases the three main lines
    show up in reverse fir-tree ordering and the cause of the fault is
    easy to pick out since it's the last thing highlighted by pr_alert
    (excepting when dumping the IDT, GDT, etc...).

v2:
  - Explicitly call out protection keys violations
  - "Slightly" reword the changelog

v1: https://lkml.kernel.org/r/20181207195223.23968-1-sean.j.christopherson@intel.com
v2: https://lkml.kernel.org/r/20181207184423.1962-1-sean.j.christopherson@intel.com

Sean Christopherson (2):
  x86/fault: Reword initial BUG message for unhandled page faults
  x86/fault: Decode and print #PF oops in human readable form

 arch/x86/mm/fault.c | 51 +++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 34 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/2] x86/fault: Reword initial BUG message for unhandled page faults
  2018-12-21 21:36 [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson
@ 2018-12-21 21:36 ` Sean Christopherson
  2019-04-19 18:35   ` [tip:x86/mm] " tip-bot for Sean Christopherson
  2018-12-21 21:36 ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Sean Christopherson
  2019-02-28 15:44 ` [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson
  2 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2018-12-21 21:36 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Rik van Riel,
	Yu-cheng Yu, Ingo Molnar

Reword the NULL pointer dereference case to simply state that a NULL
pointer was dereferenced, i.e. drop "unable to handle" as that implies
that there are instances where the kernel actual does handle NULL
pointer dereferences, which is not true barring funky exception fixup.

For the non-NULL case, replace "kernel paging request" with "page fault"
as the kernel can technically oops on faults that originated in user
code.  Dropping "kernel" also allows future patches to provide detailed
information on where the fault occurred, e.g. user vs. kernel, without
conflicting with the initial BUG message.

In both cases, replace "at address=" with wording more appropriate to
the oops, as "at" may be interpreted as stating that the address is the
RIP of the instruction that faulted.

Last, and probably least, further qualify the NULL-pointer path by
checking that the fault actually originated in kernel code.  It's
technically possible for userspace to map address 0, and not printing
a super specific message is the least of our worries if the kernel does
manage to oops on an actual NULL pointer dereference from userspace.

Before:
    BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000
    BUG: unable to handle kernel paging request at ffffbeef00000000

After:
    BUG: kernel NULL pointer dereference, address = 0000000000000008
    BUG: unable to handle page fault for address = ffffbeef00000000

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/mm/fault.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..39dccdfef496 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -644,9 +644,12 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 				from_kuid(&init_user_ns, current_uid()));
 	}
 
-	pr_alert("BUG: unable to handle kernel %s at %px\n",
-		 address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
-		 (void *)address);
+	if (address < PAGE_SIZE && !user_mode(regs))
+		pr_alert("BUG: kernel NULL pointer dereference, address = %px\n",
+			(void *)address);
+	else
+		pr_alert("BUG: unable to handle page fault for address = %px\n",
+			(void *)address);
 
 	err_txt[0] = 0;
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form
  2018-12-21 21:36 [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson
  2018-12-21 21:36 ` [PATCH v3 1/2] x86/fault: Reword initial BUG message for unhandled page faults Sean Christopherson
@ 2018-12-21 21:36 ` Sean Christopherson
  2019-04-19 18:35   ` [tip:x86/mm] " tip-bot for Sean Christopherson
  2019-04-29 13:58   ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Dmitry Vyukov
  2019-02-28 15:44 ` [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson
  2 siblings, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2018-12-21 21:36 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Rik van Riel,
	Yu-cheng Yu, Ingo Molnar

Linus pointed out that deciphering the raw #PF error code and printing
a more human readable message are two different things, and also that
printing the negative cases is mostly just noise[1].  For example, the
USER bit doesn't mean the fault originated in user code and stating
that an oops wasn't due to a protection keys violation isn't interesting
since an oops on a keys violation is a one-in-a-million scenario.

Remove the per-bit decoding of the error code and instead print:
  - the raw error code
  - why the fault occurred
  - the effective privilege level of the access
  - the type of access
  - whether the fault originated in user code or kernel code

This provides the user with the information needed to triage 99.9% of
oopses without polluting the log with useless information or conflating
the error_code with the CPL.

Sample output:

    BUG: kernel NULL pointer dereference, address = 0000000000000008
    #PF: supervisor-privileged instruction fetch from kernel code
    #PF: error_code(0x0010) - not-present page

    BUG: unable to handle page fault for address = ffffbeef00000000
    #PF: supervisor-privileged instruction fetch from kernel code
    #PF: error_code(0x0010) - not-present page

    BUG: unable to handle page fault for address = ffffc90000230000
    #PF: supervisor-privileged write access from kernel code
    #PF: error_code(0x000b) - reserved bit violation

[1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/mm/fault.c | 42 +++++++++++-------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 39dccdfef496..a4421cbd230b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 		 name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
 }
 
-/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
- */
-static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
-{
-	if (error_code & mask) {
-		if (buf[0])
-			strcat(buf, " ");
-		strcat(buf, txt);
-	}
-}
-
 static void
 show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	char err_txt[64];
-
 	if (!oops_may_print())
 		return;
 
@@ -651,27 +636,22 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		pr_alert("BUG: unable to handle page fault for address = %px\n",
 			(void *)address);
 
-	err_txt[0] = 0;
-
-	/*
-	 * Note: length of these appended strings including the separation space and the
-	 * zero delimiter must fit into err_txt[].
-	 */
-	err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
-	err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
-	err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
-	err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
-	err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
-	err_str_append(error_code, err_txt, X86_PF_PK,    "[PK]"   );
-
-	pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+	pr_alert("#PF: %s-privileged %s from %s code\n",
+		 (error_code & X86_PF_USER)  ? "user" : "supervisor",
+		 (error_code & X86_PF_INSTR) ? "instruction fetch" :
+		 (error_code & X86_PF_WRITE) ? "write access" :
+					       "read access",
+			     user_mode(regs) ? "user" : "kernel");
+	pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
+		 !(error_code & X86_PF_PROT) ? "not-present page" :
+		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
+		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
 
-		pr_alert("This was a system access from user code\n");
-
 		/*
 		 * This can happen for quite a few reasons.  The more obvious
 		 * ones are faults accessing the GDT, or LDT.  Perhaps
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/2] x86/fault: Further improve #PF oops messages
  2018-12-21 21:36 [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson
  2018-12-21 21:36 ` [PATCH v3 1/2] x86/fault: Reword initial BUG message for unhandled page faults Sean Christopherson
  2018-12-21 21:36 ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Sean Christopherson
@ 2019-02-28 15:44 ` Sean Christopherson
  2 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2019-02-28 15:44 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Rik van Riel, Yu,
	Yu-cheng, Ingo Molnar

High latency ping.  These still apply as-is.

On Fri, Dec 21, 2018 at 01:36:55PM -0800, Sean Christopherson wrote:
> Rework the messages printed for #PF oopses to display more detailed
> information about the fault in human readable form and to avoid
> conflicting messages and/or statements that may not always be accurate.
> 
> v3:
>   - Prepend a patch to reword the initial BUG message
>   - Add sample output in the changelogs
>   - Swap the order of the #PF lines.  For most cases the three main lines
>     show up in reverse fir-tree ordering and the cause of the fault is
>     easy to pick out since it's the last thing highlighted by pr_alert
>     (excepting when dumping the IDT, GDT, etc...).
> 
> v2:
>   - Explicitly call out protection keys violations
>   - "Slightly" reword the changelog
> 
> v1: https://lkml.kernel.org/r/20181207195223.23968-1-sean.j.christopherson@intel.com
> v2: https://lkml.kernel.org/r/20181207184423.1962-1-sean.j.christopherson@intel.com
> 
> Sean Christopherson (2):
>   x86/fault: Reword initial BUG message for unhandled page faults
>   x86/fault: Decode and print #PF oops in human readable form
> 
>  arch/x86/mm/fault.c | 51 +++++++++++++++------------------------------
>  1 file changed, 17 insertions(+), 34 deletions(-)
> 
> -- 
> 2.19.2
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86/fault: Reword initial BUG message for unhandled page faults
  2018-12-21 21:36 ` [PATCH v3 1/2] x86/fault: Reword initial BUG message for unhandled page faults Sean Christopherson
@ 2019-04-19 18:35   ` tip-bot for Sean Christopherson
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Sean Christopherson @ 2019-04-19 18:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, riel, torvalds, sean.j.christopherson, luto,
	linux-kernel, peterz, hpa, yu-cheng.yu, tglx, bp, mingo

Commit-ID:  f28b11a2abd9cf5535baa03e28fc63ad0e14f52a
Gitweb:     https://git.kernel.org/tip/f28b11a2abd9cf5535baa03e28fc63ad0e14f52a
Author:     Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate: Fri, 21 Dec 2018 13:36:56 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Apr 2019 19:31:15 +0200

x86/fault: Reword initial BUG message for unhandled page faults

Reword the NULL pointer dereference case to simply state that a NULL
pointer was dereferenced, i.e. drop "unable to handle" as that implies
that there are instances where the kernel actual does handle NULL
pointer dereferences, which is not true barring funky exception fixup.

For the non-NULL case, replace "kernel paging request" with "page fault"
as the kernel can technically oops on faults that originated in user
code.  Dropping "kernel" also allows future patches to provide detailed
information on where the fault occurred, e.g. user vs. kernel, without
conflicting with the initial BUG message.

In both cases, replace "at address=" with wording more appropriate to
the oops, as "at" may be interpreted as stating that the address is the
RIP of the instruction that faulted.

Last, and probably least, further qualify the NULL-pointer path by
checking that the fault actually originated in kernel code.  It's
technically possible for userspace to map address 0, and not printing
a super specific message is the least of our worries if the kernel does
manage to oops on an actual NULL pointer dereference from userspace.

Before:
    BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000
    BUG: unable to handle kernel paging request at ffffbeef00000000

After:
    BUG: kernel NULL pointer dereference, address = 0000000000000008
    BUG: unable to handle page fault for address = ffffbeef00000000

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/fault.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 667f1da36208..df2c5cdef5c4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -644,9 +644,12 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 				from_kuid(&init_user_ns, current_uid()));
 	}
 
-	pr_alert("BUG: unable to handle kernel %s at %px\n",
-		 address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
-		 (void *)address);
+	if (address < PAGE_SIZE && !user_mode(regs))
+		pr_alert("BUG: kernel NULL pointer dereference, address = %px\n",
+			(void *)address);
+	else
+		pr_alert("BUG: unable to handle page fault for address = %px\n",
+			(void *)address);
 
 	err_txt[0] = 0;
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86/fault: Decode and print #PF oops in human readable form
  2018-12-21 21:36 ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Sean Christopherson
@ 2019-04-19 18:35   ` tip-bot for Sean Christopherson
  2019-04-21 18:35     ` Borislav Petkov
  2019-04-29 13:58   ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Dmitry Vyukov
  1 sibling, 1 reply; 10+ messages in thread
From: tip-bot for Sean Christopherson @ 2019-04-19 18:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, bp, luto, sean.j.christopherson, peterz, tglx,
	mingo, linux-kernel, riel, dave.hansen, yu-cheng.yu

Commit-ID:  18ea35c5ed9921867194a8efc2a0ac2d5a3c7d2a
Gitweb:     https://git.kernel.org/tip/18ea35c5ed9921867194a8efc2a0ac2d5a3c7d2a
Author:     Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate: Fri, 21 Dec 2018 13:36:57 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Apr 2019 19:31:16 +0200

x86/fault: Decode and print #PF oops in human readable form

Linus pointed out that deciphering the raw #PF error code and printing
a more human readable message are two different things, and also that
printing the negative cases is mostly just noise[1].  For example, the
USER bit doesn't mean the fault originated in user code and stating
that an oops wasn't due to a protection keys violation isn't interesting
since an oops on a keys violation is a one-in-a-million scenario.

Remove the per-bit decoding of the error code and instead print:
  - the raw error code
  - why the fault occurred
  - the effective privilege level of the access
  - the type of access
  - whether the fault originated in user code or kernel code

This provides the user with the information needed to triage 99.9% of
oopses without polluting the log with useless information or conflating
the error_code with the CPL.

Sample output:

    BUG: kernel NULL pointer dereference, address = 0000000000000008
    #PF: supervisor-privileged instruction fetch from kernel code
    #PF: error_code(0x0010) - not-present page

    BUG: unable to handle page fault for address = ffffbeef00000000
    #PF: supervisor-privileged instruction fetch from kernel code
    #PF: error_code(0x0010) - not-present page

    BUG: unable to handle page fault for address = ffffc90000230000
    #PF: supervisor-privileged write access from kernel code
    #PF: error_code(0x000b) - reserved bit violation

[1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/fault.c | 42 +++++++++++-------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index df2c5cdef5c4..74c9204c5751 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 		 name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
 }
 
-/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
- */
-static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
-{
-	if (error_code & mask) {
-		if (buf[0])
-			strcat(buf, " ");
-		strcat(buf, txt);
-	}
-}
-
 static void
 show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	char err_txt[64];
-
 	if (!oops_may_print())
 		return;
 
@@ -651,27 +636,22 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		pr_alert("BUG: unable to handle page fault for address = %px\n",
 			(void *)address);
 
-	err_txt[0] = 0;
-
-	/*
-	 * Note: length of these appended strings including the separation space and the
-	 * zero delimiter must fit into err_txt[].
-	 */
-	err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
-	err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
-	err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
-	err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
-	err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
-	err_str_append(error_code, err_txt, X86_PF_PK,    "[PK]"   );
-
-	pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+	pr_alert("#PF: %s-privileged %s from %s code\n",
+		 (error_code & X86_PF_USER)  ? "user" : "supervisor",
+		 (error_code & X86_PF_INSTR) ? "instruction fetch" :
+		 (error_code & X86_PF_WRITE) ? "write access" :
+					       "read access",
+			     user_mode(regs) ? "user" : "kernel");
+	pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
+		 !(error_code & X86_PF_PROT) ? "not-present page" :
+		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
+		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
 
-		pr_alert("This was a system access from user code\n");
-
 		/*
 		 * This can happen for quite a few reasons.  The more obvious
 		 * ones are faults accessing the GDT, or LDT.  Perhaps

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [tip:x86/mm] x86/fault: Decode and print #PF oops in human readable form
  2019-04-19 18:35   ` [tip:x86/mm] " tip-bot for Sean Christopherson
@ 2019-04-21 18:35     ` Borislav Petkov
  2019-04-21 18:52       ` [tip:x86/mm] x86/fault: Make fault messages more succinct tip-bot for Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-04-21 18:35 UTC (permalink / raw)
  To: hpa, torvalds, luto, sean.j.christopherson, peterz, tglx, mingo,
	linux-kernel, riel, dave.hansen, yu-cheng.yu
  Cc: linux-tip-commits

On Fri, Apr 19, 2019 at 11:35:51AM -0700, tip-bot for Sean Christopherson wrote:
>     BUG: kernel NULL pointer dereference, address = 0000000000000008
>     #PF: supervisor-privileged instruction fetch from kernel code
>     #PF: error_code(0x0010) - not-present page
> 
>     BUG: unable to handle page fault for address = ffffbeef00000000
>     #PF: supervisor-privileged instruction fetch from kernel code
>     #PF: error_code(0x0010) - not-present page
> 
>     BUG: unable to handle page fault for address = ffffc90000230000
>     #PF: supervisor-privileged write access from kernel code
>     #PF: error_code(0x000b) - reserved bit violation

Writing those in human-readable form is nice. May I suggest making those
messages more succinct, though - we'll be staring at them for years,
after all.

---
From: Borislav Petkov <bp@suse.de>
Date: Sun, 21 Apr 2019 20:24:08 +0200
Subject: [PATCH] x86/fault: Make fault messages more succinct

So we are going to be staring at those in the next years, let's make
them more succinct. In particular:

 - change "address = " to "address: "
 - "-privileged" reads funny. It should be simply "kernel" or "user"
 - "from kernel code" reads funny too. "kernel mode" or "user mode" is
   more natural.

An actual example says more than 1000 words, of course:

  [    0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8
  [    0.249120] #PF: supervisor write access in kernel mode
  [    0.249717] #PF: error_code(0x0002) - not-present page

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/fault.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 74c9204c5751..a0df19b0897d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -630,13 +630,13 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	}
 
 	if (address < PAGE_SIZE && !user_mode(regs))
-		pr_alert("BUG: kernel NULL pointer dereference, address = %px\n",
+		pr_alert("BUG: kernel NULL pointer dereference, address: %px\n",
 			(void *)address);
 	else
-		pr_alert("BUG: unable to handle page fault for address = %px\n",
+		pr_alert("BUG: unable to handle page fault for address: %px\n",
 			(void *)address);
 
-	pr_alert("#PF: %s-privileged %s from %s code\n",
+	pr_alert("#PF: %s %s in %s mode\n",
 		 (error_code & X86_PF_USER)  ? "user" : "supervisor",
 		 (error_code & X86_PF_INSTR) ? "instruction fetch" :
 		 (error_code & X86_PF_WRITE) ? "write access" :
-- 
2.21.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86/fault: Make fault messages more succinct
  2019-04-21 18:35     ` Borislav Petkov
@ 2019-04-21 18:52       ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Borislav Petkov @ 2019-04-21 18:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: torvalds, mingo, hpa, peterz, linux-kernel, tglx, bp

Commit-ID:  ea2f8d60603efbd1cb4e193a593945a2fe24d264
Gitweb:     https://git.kernel.org/tip/ea2f8d60603efbd1cb4e193a593945a2fe24d264
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Sun, 21 Apr 2019 20:35:24 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 21 Apr 2019 20:48:51 +0200

x86/fault: Make fault messages more succinct

So we are going to be staring at those in the next years, let's make
them more succinct. In particular:

 - change "address = " to "address: "

 - "-privileged" reads funny. It should be simply "kernel" or "user"

 - "from kernel code" reads funny too. "kernel mode" or "user mode" is
   more natural.

An actual example says more than 1000 words, of course:

  [    0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8
  [    0.249120] #PF: supervisor write access in kernel mode
  [    0.249717] #PF: error_code(0x0002) - not-present page

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: riel@surriel.com
Cc: sean.j.christopherson@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20190421183524.GC6048@zn.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/fault.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 74c9204c5751..a0df19b0897d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -630,13 +630,13 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	}
 
 	if (address < PAGE_SIZE && !user_mode(regs))
-		pr_alert("BUG: kernel NULL pointer dereference, address = %px\n",
+		pr_alert("BUG: kernel NULL pointer dereference, address: %px\n",
 			(void *)address);
 	else
-		pr_alert("BUG: unable to handle page fault for address = %px\n",
+		pr_alert("BUG: unable to handle page fault for address: %px\n",
 			(void *)address);
 
-	pr_alert("#PF: %s-privileged %s from %s code\n",
+	pr_alert("#PF: %s %s in %s mode\n",
 		 (error_code & X86_PF_USER)  ? "user" : "supervisor",
 		 (error_code & X86_PF_INSTR) ? "instruction fetch" :
 		 (error_code & X86_PF_WRITE) ? "write access" :

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form
  2018-12-21 21:36 ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Sean Christopherson
  2019-04-19 18:35   ` [tip:x86/mm] " tip-bot for Sean Christopherson
@ 2019-04-29 13:58   ` Dmitry Vyukov
  2019-04-29 14:14     ` Borislav Petkov
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2019-04-29 13:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers,
	H. Peter Anvin, LKML, Linus Torvalds, Rik van Riel, Yu-cheng Yu,
	Ingo Molnar, syzkaller

On Fri, Dec 21, 2018 at 10:37 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Linus pointed out that deciphering the raw #PF error code and printing
> a more human readable message are two different things, and also that
> printing the negative cases is mostly just noise[1].  For example, the
> USER bit doesn't mean the fault originated in user code and stating
> that an oops wasn't due to a protection keys violation isn't interesting
> since an oops on a keys violation is a one-in-a-million scenario.
>
> Remove the per-bit decoding of the error code and instead print:
>   - the raw error code
>   - why the fault occurred
>   - the effective privilege level of the access
>   - the type of access
>   - whether the fault originated in user code or kernel code
>
> This provides the user with the information needed to triage 99.9% of
> oopses without polluting the log with useless information or conflating
> the error_code with the CPL.
>
> Sample output:
>
>     BUG: kernel NULL pointer dereference, address = 0000000000000008
>     #PF: supervisor-privileged instruction fetch from kernel code
>     #PF: error_code(0x0010) - not-present page
>
>     BUG: unable to handle page fault for address = ffffbeef00000000
>     #PF: supervisor-privileged instruction fetch from kernel code
>     #PF: error_code(0x0010) - not-present page
>
>     BUG: unable to handle page fault for address = ffffc90000230000
>     #PF: supervisor-privileged write access from kernel code
>     #PF: error_code(0x000b) - reserved bit violation



As a note for future.
I see 3 recent changes that shuffle crash format forth and back:

ea2f8d60603efbd1cb4e193a593945a2fe24d264 x86/fault: Make fault
messages more succinct
f28b11a2abd9cf5535baa03e28fc63ad0e14f52a x86/fault: Reword initial BUG
message for unhandled page faults
18ea35c5ed9921867194a8efc2a0ac2d5a3c7d2a x86/fault: Decode and print
#PF oops in human readable form

Ideally, such changes are coordinated with kernel testing for gradual
rollout. Since kernel does not provide an official facility for crash
parsing, the actual output effectively becomes part of public API.
Such changes in format may lead to glues bugs, not duped bugs and
missed bugs (including incorrect bisection, when something stopped
being detected as a crash at all).
Ideally-ideally, kernel provides some kind official output parsing
facility (that can understand when kernel has bugged, when the crash
starts/end and some kind of stable identity for a crash) and includes
output parsing tests for all types of crashes because this types
kernel sabotaging testing happens periodically.

We already have samples for "address =" format, but not for "address:"
yet. So I still can't update parsing for the final format and we have
massively glued crash reports.






> [1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/mm/fault.c | 42 +++++++++++-------------------------------
>  1 file changed, 11 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 39dccdfef496..a4421cbd230b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
>                  name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
>  }
>
> -/*
> - * This helper function transforms the #PF error_code bits into
> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
> - */
> -static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
> -{
> -       if (error_code & mask) {
> -               if (buf[0])
> -                       strcat(buf, " ");
> -               strcat(buf, txt);
> -       }
> -}
> -
>  static void
>  show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
>  {
> -       char err_txt[64];
> -
>         if (!oops_may_print())
>                 return;
>
> @@ -651,27 +636,22 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
>                 pr_alert("BUG: unable to handle page fault for address = %px\n",
>                         (void *)address);
>
> -       err_txt[0] = 0;
> -
> -       /*
> -        * Note: length of these appended strings including the separation space and the
> -        * zero delimiter must fit into err_txt[].
> -        */
> -       err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
> -       err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
> -       err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
> -       err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
> -       err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
> -       err_str_append(error_code, err_txt, X86_PF_PK,    "[PK]"   );
> -
> -       pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
> +       pr_alert("#PF: %s-privileged %s from %s code\n",
> +                (error_code & X86_PF_USER)  ? "user" : "supervisor",
> +                (error_code & X86_PF_INSTR) ? "instruction fetch" :
> +                (error_code & X86_PF_WRITE) ? "write access" :
> +                                              "read access",
> +                            user_mode(regs) ? "user" : "kernel");
> +       pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
> +                !(error_code & X86_PF_PROT) ? "not-present page" :
> +                (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
> +                (error_code & X86_PF_PK)    ? "protection keys violation" :
> +                                              "permissions violation");
>
>         if (!(error_code & X86_PF_USER) && user_mode(regs)) {
>                 struct desc_ptr idt, gdt;
>                 u16 ldtr, tr;
>
> -               pr_alert("This was a system access from user code\n");
> -
>                 /*
>                  * This can happen for quite a few reasons.  The more obvious
>                  * ones are faults accessing the GDT, or LDT.  Perhaps
> --
> 2.19.2
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form
  2019-04-29 13:58   ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Dmitry Vyukov
@ 2019-04-29 14:14     ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2019-04-29 14:14 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Sean Christopherson, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	the arch/x86 maintainers, H. Peter Anvin, LKML, Linus Torvalds,
	Rik van Riel, Yu-cheng Yu, Ingo Molnar, syzkaller

On Mon, Apr 29, 2019 at 03:58:20PM +0200, Dmitry Vyukov wrote:
> Ideally, such changes are coordinated with kernel testing for gradual
> rollout. Since kernel does not provide an official facility for crash
> parsing, the actual output effectively becomes part of public API.

Well, printk message formats are not an API. I agree, though, that if
this is how kernel testing is going to get told about crashes, then we'd
need some sort of parsing specification which gets changed together with
when the actual messages are changed, so that tools don't break.

Or something slicker and more robust than tools parsing dmesg output...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-04-29 14:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-21 21:36 [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson
2018-12-21 21:36 ` [PATCH v3 1/2] x86/fault: Reword initial BUG message for unhandled page faults Sean Christopherson
2019-04-19 18:35   ` [tip:x86/mm] " tip-bot for Sean Christopherson
2018-12-21 21:36 ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Sean Christopherson
2019-04-19 18:35   ` [tip:x86/mm] " tip-bot for Sean Christopherson
2019-04-21 18:35     ` Borislav Petkov
2019-04-21 18:52       ` [tip:x86/mm] x86/fault: Make fault messages more succinct tip-bot for Borislav Petkov
2019-04-29 13:58   ` [PATCH v3 2/2] x86/fault: Decode and print #PF oops in human readable form Dmitry Vyukov
2019-04-29 14:14     ` Borislav Petkov
2019-02-28 15:44 ` [PATCH v3 0/2] x86/fault: Further improve #PF oops messages Sean Christopherson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.