* [PATCH 1/2] x86/MCE/AMD: Always give panic severity for UC errors in kernel context
@ 2017-11-06 17:46 Borislav Petkov
  2017-11-06 17:46 ` [PATCH 2/2] x86/MCE/AMD: Fix mce_severity_amd_smca() signature Borislav Petkov
  2017-11-07 10:15   ` tip-bot for Borislav Petkov
  0 siblings, 2 replies; 6+ messages in thread
From: Borislav Petkov @ 2017-11-06 17:46 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Yazen Ghannam <yazen.ghannam@amd.com>

The AMD severity grading function was introduced in kernel 4.1. The
current logic can possibly give MCE_AR_SEVERITY for uncorrectable
errors in kernel context. The system may then get stuck in a loop as
memory_failure() will try to handle the bad kernel memory and find it
busy.

Return MCE_PANIC_SEVERITY for all UC errors IN_KERNEL context on AMD
systems.

After:

  b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")

was accepted in v4.6, this issue was masked because of the tail-end attempt
at kernel mode recovery in the #MC handler.

However, uncorrectable errors IN_KERNEL context should always be considered
unrecoverable and cause a panic.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1509562746-6313-1-git-send-email-Yazen.Ghannam@amd.com
Fixes: bf80bbd7dcf5 (x86/mce: Add an AMD severities-grading function)
Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 87cc9ab7a13c..4b8187639c2d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -245,6 +245,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 
 	if (m->status & MCI_STATUS_UC) {
 
+		if (ctx == IN_KERNEL)
+			return MCE_PANIC_SEVERITY;
+
 		/*
 		 * On older systems where overflow_recov flag is not present, we
 		 * should simply panic if an error overflow occurs. If
@@ -255,10 +258,6 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 			if (mce_flags.smca)
 				return mce_severity_amd_smca(m, ctx);
 
-			/* software can try to contain */
-			if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL))
-				return MCE_PANIC_SEVERITY;
-
 			/* kill current process */
 			return MCE_AR_SEVERITY;
 		} else {
-- 
2.13.0


* [PATCH 2/2] x86/MCE/AMD: Fix mce_severity_amd_smca() signature
  2017-11-06 17:46 [PATCH 1/2] x86/MCE/AMD: Always give panic severity for UC errors in kernel context Borislav Petkov
@ 2017-11-06 17:46 ` Borislav Petkov
  2017-11-07 10:16     ` tip-bot for Borislav Petkov
  2017-11-07 10:15   ` tip-bot for Borislav Petkov
  1 sibling, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2017-11-06 17:46 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Yazen Ghannam <yazen.ghannam@amd.com>

Change the err_ctx type to "enum context" to match the type passed in.

No functionality change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1509563052-10039-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 4b8187639c2d..4ca632a06e0b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -204,7 +204,7 @@ static int error_context(struct mce *m)
 	return IN_KERNEL;
 }
 
-static int mce_severity_amd_smca(struct mce *m, int err_ctx)
+static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
 {
 	u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
 	u32 low, high;
-- 
2.13.0


* [tip:ras/core] x86/MCE/AMD: Always give panic severity for UC errors in kernel context
@ 2017-11-07 10:15   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Yazen Ghannam @ 2017-11-07 10:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tony.luck, bp, yazen.ghannam, peterz, linux-kernel, hpa,
	torvalds, linux-edac, tglx, mingo

Commit-ID:  d65dfc81bb3894fdb68cbc74bbf5fb48d2354071
Gitweb:     https://git.kernel.org/tip/d65dfc81bb3894fdb68cbc74bbf5fb48d2354071
Author:     Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate: Mon, 6 Nov 2017 18:46:32 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 7 Nov 2017 11:07:50 +0100

x86/MCE/AMD: Always give panic severity for UC errors in kernel context

The AMD severity grading function was introduced in kernel 4.1. The
current logic can possibly give MCE_AR_SEVERITY for uncorrectable
errors in kernel context. The system may then get stuck in a loop as
memory_failure() will try to handle the bad kernel memory and find it
busy.

Return MCE_PANIC_SEVERITY for all UC errors IN_KERNEL context on AMD
systems.

After:

  b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")

was accepted in v4.6, this issue was masked because of the tail-end attempt
at kernel mode recovery in the #MC handler.

However, uncorrectable errors IN_KERNEL context should always be considered
unrecoverable and cause a panic.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: bf80bbd7dcf5 (x86/mce: Add an AMD severities-grading function)
Link: http://lkml.kernel.org/r/20171106174633.13576-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 87cc9ab..4b81876 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -245,6 +245,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 
 	if (m->status & MCI_STATUS_UC) {
 
+		if (ctx == IN_KERNEL)
+			return MCE_PANIC_SEVERITY;
+
 		/*
 		 * On older systems where overflow_recov flag is not present, we
 		 * should simply panic if an error overflow occurs. If
@@ -255,10 +258,6 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 			if (mce_flags.smca)
 				return mce_severity_amd_smca(m, ctx);
 
-			/* software can try to contain */
-			if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL))
-				return MCE_PANIC_SEVERITY;
-
 			/* kill current process */
 			return MCE_AR_SEVERITY;
 		} else {


* [tip:ras/core] x86/MCE/AMD: Fix mce_severity_amd_smca() signature
@ 2017-11-07 10:16     ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Yazen Ghannam @ 2017-11-07 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, peterz, tglx, bp, tony.luck, linux-edac, linux-kernel,
	yazen.ghannam, hpa, mingo

Commit-ID:  783ca517bfd62ca516178712775e4b273292d5b1
Gitweb:     https://git.kernel.org/tip/783ca517bfd62ca516178712775e4b273292d5b1
Author:     Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate: Mon, 6 Nov 2017 18:46:33 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 7 Nov 2017 11:07:50 +0100

x86/MCE/AMD: Fix mce_severity_amd_smca() signature

Change the err_ctx type to "enum context" to match the type passed in.

No functionality change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171106174633.13576-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 4b81876..4ca632a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -204,7 +204,7 @@ static int error_context(struct mce *m)
 	return IN_KERNEL;
 }
 
-static int mce_severity_amd_smca(struct mce *m, int err_ctx)
+static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
 {
 	u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
 	u32 low, high;
