linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] x86/MCE: some minor fixes
@ 2020-11-27 16:18 Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Gabriele Paoloni
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Gabriele Paoloni @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

During the safety analysis carried out by the safety architecture
working group of the ELISA project, some incorrect behaviors were
spotted in the x86 MCE handler.
This patchset proposes fixes for them.

Changes since v1:
- fixed grammar
- improved readability of patch 1 and Cc'd stable
- kill_it flag renamed to kill_current_task

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>

Gabriele Paoloni (5):
  x86/mce: do not overwrite no_way_out if mce_end() fails
  x86/mce: move the mce_panic() call and 'kill_it' assignments to the
    right places
  x86/mce: for LMCE panic only if mca_cfg.tolerant < 3
  x86/mce: remove redundant call to irq_work_queue()
  x86/mce: rename kill_it as kill_current_task

 arch/x86/kernel/cpu/mce/core.c | 39 +++++++++++++++-------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

-- 
2.20.1

---------------------------------------------------------------------
INTEL CORPORATION ITALIA S.p.A. con unico socio
Sede: Milanofiori Palazzo E 4 
CAP 20094 Assago (MI)
Capitale Sociale Euro 104.000,00 interamente versato
Partita I.V.A. e Codice Fiscale  04236760155
Repertorio Economico Amministrativo n. 997124 
Registro delle Imprese di Milano nr. 183983/5281/33
Soggetta ad attivita' di direzione e coordinamento di 
INTEL CORPORATION, USA

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



* [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails
  2020-11-27 16:18 [PATCH v2 0/5] x86/MCE: some minor fixes Gabriele Paoloni
@ 2020-11-27 16:18 ` Gabriele Paoloni
  2020-11-27 16:43   ` [tip: x86/urgent] x86/mce: Do " tip-bot2 for Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places Gabriele Paoloni
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Gabriele Paoloni @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Currently, if mce_end() fails, 'no_way_out' is overwritten with
'worst >= MCE_PANIC_SEVERITY'. 'worst' is the worst severity found
across the MCA banks associated with the current CPU; however, at
this point 'no_way_out' could already have been set by mce_start()
after looking at the severities of all CPUs that entered the MCE
handler.
If mce_end() fails, first check whether no_way_out is already set
and, if so, stick to it; otherwise use the local worst value.

Cc: <stable@vger.kernel.org>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b866e7c0..32b7099e3511 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * When there's any problem use only local no_way_out state.
 	 */
 	if (!lmce) {
-		if (mce_end(order) < 0)
-			no_way_out = worst >= MCE_PANIC_SEVERITY;
+		if (mce_end(order) < 0) {
+			if (!no_way_out)
+				no_way_out = worst >= MCE_PANIC_SEVERITY;
+		}
 	} else {
 		/*
 		 * If there was a fatal machine check we should have
-- 
2.20.1
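
The effect of the patch above can be modelled outside the kernel. The
following is only a user-space sketch: DEMO_PANIC_SEVERITY and both
helper names are hypothetical stand-ins, and the threshold value is
arbitrary rather than the kernel's MCE_PANIC_SEVERITY.

```c
#include <stdbool.h>

/* Hypothetical threshold; NOT the kernel's MCE_PANIC_SEVERITY value. */
enum { DEMO_PANIC_SEVERITY = 6 };

/* Pre-patch: on mce_end() failure the local view always wins. */
static bool no_way_out_before(bool no_way_out, int worst)
{
	(void)no_way_out;	/* the earlier verdict is discarded */
	return worst >= DEMO_PANIC_SEVERITY;
}

/* Post-patch: a verdict that is already set is kept. */
static bool no_way_out_after(bool no_way_out, int worst)
{
	if (!no_way_out)
		no_way_out = worst >= DEMO_PANIC_SEVERITY;
	return no_way_out;
}
```

With no_way_out already set and a low local 'worst', the pre-patch
helper clears the flag while the post-patch helper preserves it.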


* [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places
  2020-11-27 16:18 [PATCH v2 0/5] x86/MCE: some minor fixes Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Gabriele Paoloni
@ 2020-11-27 16:18 ` Gabriele Paoloni
  2020-12-01 18:05   ` [tip: ras/core] x86/mce: Move " tip-bot2 for Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Gabriele Paoloni
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Gabriele Paoloni @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Right now, for local MCEs we panic(), if needed, right after lmce is
set. For global MCEs, mce_reign() takes care of calling mce_panic().
Hence:
- improve readability by moving the conditional evaluation of
tolerant up to where kill_it is first set;
- move the mce_panic() call up into the statement where mce_end()
fails.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 32b7099e3511..50e9b0893a92 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1350,8 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = 1;
-
+		kill_it = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1387,6 +1386,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		if (mce_end(order) < 0) {
 			if (!no_way_out)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
+			/*
+			 * mce_reign() has probably failed hence evaluate if we need
+			 * to panic
+			 */
+			if (no_way_out && mca_cfg.tolerant < 3)
+				mce_panic("Fatal machine check on current CPU", &m, msg);
 		}
 	} else {
 		/*
@@ -1403,15 +1408,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	/*
-	 * If tolerant is at an insane level we drop requests to kill
-	 * processes and continue even when there is no way out.
-	 */
-	if (cfg->tolerant == 3)
-		kill_it = 0;
-	else if (no_way_out)
-		mce_panic("Fatal machine check on current CPU", &m, msg);
-
 	if (worst > 0)
 		irq_work_queue(&mce_irq_work);
 
-- 
2.20.1
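
As a sanity check on the kill_it part of the patch above, the old and
new assignment orders can be compared in isolation. This is a
user-space sketch only: the helper names are hypothetical, 'ripv'
stands for the MCG_STATUS_RIPV bit being set, and the range 0..3 for
tolerant is assumed for the loop.

```c
#include <stdbool.h>

/* Old flow: set the flag first, clear it later if tolerant == 3. */
static int kill_it_old(bool ripv, int tolerant)
{
	int kill_it = 0;

	if (!ripv)
		kill_it = 1;
	/* ... rest of the handler runs here ... */
	if (tolerant == 3)
		kill_it = 0;
	return kill_it;
}

/* New flow: fold the tolerant check into the first assignment. */
static int kill_it_new(bool ripv, int tolerant)
{
	int kill_it = 0;

	if (!ripv)
		kill_it = (tolerant == 3) ? 0 : 1;
	return kill_it;
}

/* Returns true when both flows agree for tolerant levels 0..3. */
static bool flows_agree(void)
{
	for (int tolerant = 0; tolerant <= 3; tolerant++) {
		if (kill_it_old(true, tolerant) != kill_it_new(true, tolerant))
			return false;
		if (kill_it_old(false, tolerant) != kill_it_new(false, tolerant))
			return false;
	}
	return true;
}
```

The sketch covers only the final value of the flag; it does not model
the mce_panic() call that the patch relocates.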


* [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3
  2020-11-27 16:18 [PATCH v2 0/5] x86/MCE: some minor fixes Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places Gabriele Paoloni
@ 2020-11-27 16:18 ` Gabriele Paoloni
  2020-12-01 18:05   ` [tip: ras/core] x86/mce: Panic for LMCE " tip-bot2 for Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue() Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task Gabriele Paoloni
  4 siblings, 1 reply; 11+ messages in thread
From: Gabriele Paoloni @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Right now for LMCE, if no_way_out is set, mce_panic() is called
regardless of mca_cfg.tolerant. This is not correct because, if
mca_cfg.tolerant = 3, the code should never panic.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 50e9b0893a92..d766a3f6a343 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1367,7 +1367,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * to see it will clear it.
 	 */
 	if (lmce) {
-		if (no_way_out)
+		if (no_way_out && mca_cfg.tolerant < 3)
 			mce_panic("Fatal local machine check", &m, msg);
 	} else {
 		order = mce_start(&no_way_out);
-- 
2.20.1
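
The changed condition in the patch above can be written as a pair of
predicates. This is a user-space sketch with hypothetical helper
names; it models only the decision, not mce_panic() itself.

```c
#include <stdbool.h>

/* Pre-patch LMCE check: tolerant is not consulted. */
static bool lmce_panics_before(bool no_way_out, int tolerant)
{
	(void)tolerant;
	return no_way_out;
}

/* Post-patch LMCE check: tolerant == 3 suppresses the panic. */
static bool lmce_panics_after(bool no_way_out, int tolerant)
{
	return no_way_out && tolerant < 3;
}
```

The two predicates differ only when no_way_out is set and tolerant
is 3.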


* [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue()
  2020-11-27 16:18 [PATCH v2 0/5] x86/MCE: some minor fixes Gabriele Paoloni
                   ` (2 preceding siblings ...)
  2020-11-27 16:18 ` [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Gabriele Paoloni
@ 2020-11-27 16:18 ` Gabriele Paoloni
  2020-12-01 18:05   ` [tip: ras/core] x86/mce: Remove " tip-bot2 for Gabriele Paoloni
  2020-11-27 16:18 ` [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task Gabriele Paoloni
  4 siblings, 1 reply; 11+ messages in thread
From: Gabriele Paoloni @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Right now, __mc_scan_banks() in do_machine_check() triggers
the following call chain:
__mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work).

Hence, the later call to irq_work_queue() after __mc_scan_banks()
seems redundant. Just remove it.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d766a3f6a343..802302c54762 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1408,9 +1408,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst > 0)
-		irq_work_queue(&mce_irq_work);
-
 	if (worst != MCE_AR_SEVERITY && !kill_it)
 		goto out;
 
-- 
2.20.1
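
One way to picture the redundancy argued in the patch above is a
simplified user-space model of a work item with a pending flag. The
struct and both functions are hypothetical and are not the kernel's
irq_work implementation; the model assumes that queueing an item that
is already pending has no further effect.

```c
#include <stdbool.h>

/* Hypothetical model of a deferred work item. */
struct demo_work {
	bool pending;	/* queued but not yet run */
	int runs;	/* how many times the callback has run */
};

/* Queue the work; returns false if it was already pending. */
static bool demo_work_queue(struct demo_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Run the work once if it is pending. */
static void demo_work_run(struct demo_work *w)
{
	if (w->pending) {
		w->pending = false;
		w->runs++;
	}
}
```

In this model a second queue request issued before the work runs
changes nothing, so the callback still runs once.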


* [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task
  2020-11-27 16:18 [PATCH v2 0/5] x86/MCE: some minor fixes Gabriele Paoloni
                   ` (3 preceding siblings ...)
  2020-11-27 16:18 ` [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue() Gabriele Paoloni
@ 2020-11-27 16:18 ` Gabriele Paoloni
  2020-12-01 18:05   ` [tip: ras/core] x86/mce: Rename kill_it to kill_current_task tip-bot2 for Gabriele Paoloni
  4 siblings, 1 reply; 11+ messages in thread
From: Gabriele Paoloni @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Currently, if an MCE happens in user mode or while the kernel
is copying data from user space, 'kill_it' is used to check
whether execution of the interrupted task can be recovered;
the flag name, however, is not very meaningful, hence
rename it to match its purpose.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 802302c54762..740a4fcc1e90 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1320,10 +1320,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int no_way_out = 0;
 
 	/*
-	 * If kill_it gets set, there might be a way to recover from this
+	 * If kill_current_task is not set, there might be a way to recover from this
 	 * error.
 	 */
-	int kill_it = 0;
+	int kill_current_task = 0;
 
 	/*
 	 * MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
@@ -1350,7 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = (cfg->tolerant == 3) ? 0 : 1;
+		kill_current_task = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1408,7 +1408,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst != MCE_AR_SEVERITY && !kill_it)
+	if (worst != MCE_AR_SEVERITY && !kill_current_task)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
@@ -1416,7 +1416,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		queue_task_work(&m, kill_it);
+		queue_task_work(&m, kill_current_task);
 
 	} else {
 		/*
@@ -1434,7 +1434,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, kill_it);
+			queue_task_work(&m, kill_current_task);
 	}
 out:
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-- 
2.20.1


* [tip: x86/urgent] x86/mce: Do not overwrite no_way_out if mce_end() fails
  2020-11-27 16:18 ` [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Gabriele Paoloni
@ 2020-11-27 16:43   ` tip-bot2 for Gabriele Paoloni
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Gabriele Paoloni @ 2020-11-27 16:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Gabriele Paoloni, Borislav Petkov, Tony Luck, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3
Gitweb:        https://git.kernel.org/tip/25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3
Author:        Gabriele Paoloni <gabriele.paoloni@intel.com>
AuthorDate:    Fri, 27 Nov 2020 16:18:15 
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 27 Nov 2020 17:38:36 +01:00

x86/mce: Do not overwrite no_way_out if mce_end() fails

Currently, if mce_end() fails, no_way_out - the variable denoting
whether the machine can recover from this MCE - is determined by whether
the worst severity that was found across the MCA banks associated with
the current CPU, is of panic severity.

However, at this point no_way_out could have been already set by
mce_start() after looking at all severities of all CPUs that entered the
MCE handler. If mce_end() fails, check first if no_way_out is already
set and, if so, stick to it, otherwise use the local worst value.

 [ bp: Massage. ]

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b86..32b7099 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * When there's any problem use only local no_way_out state.
 	 */
 	if (!lmce) {
-		if (mce_end(order) < 0)
-			no_way_out = worst >= MCE_PANIC_SEVERITY;
+		if (mce_end(order) < 0) {
+			if (!no_way_out)
+				no_way_out = worst >= MCE_PANIC_SEVERITY;
+		}
 	} else {
 		/*
 		 * If there was a fatal machine check we should have


* [tip: ras/core] x86/mce: Rename kill_it to kill_current_task
  2020-11-27 16:18 ` [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task Gabriele Paoloni
@ 2020-12-01 18:05   ` tip-bot2 for Gabriele Paoloni
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Gabriele Paoloni @ 2020-12-01 18:05 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Gabriele Paoloni, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     e1c06d2366e743475b91045ef0c2ce1bbd028cb6
Gitweb:        https://git.kernel.org/tip/e1c06d2366e743475b91045ef0c2ce1bbd028cb6
Author:        Gabriele Paoloni <gabriele.paoloni@intel.com>
AuthorDate:    Fri, 27 Nov 2020 16:18:19 
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 01 Dec 2020 18:58:50 +01:00

x86/mce: Rename kill_it to kill_current_task

Currently, if an MCE happens in user-mode or while the kernel is copying
data from user space, 'kill_it' is used to check if execution of the
interrupted task can be recovered or not; the flag name however is not
very meaningful, hence rename it to match its goal.

 [ bp: Massage commit message, rename the queue_task_work() arg too. ]

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201127161819.3106432-6-gabriele.paoloni@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index a9991a9..6af6a3c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1266,14 +1266,14 @@ static void kill_me_maybe(struct callback_head *cb)
 	}
 }
 
-static void queue_task_work(struct mce *m, int kill_it)
+static void queue_task_work(struct mce *m, int kill_current_task)
 {
 	current->mce_addr = m->addr;
 	current->mce_kflags = m->kflags;
 	current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
 	current->mce_whole_page = whole_page(m);
 
-	if (kill_it)
+	if (kill_current_task)
 		current->mce_kill_me.func = kill_me_now;
 	else
 		current->mce_kill_me.func = kill_me_maybe;
@@ -1321,10 +1321,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int no_way_out = 0;
 
 	/*
-	 * If kill_it gets set, there might be a way to recover from this
+	 * If kill_current_task is not set, there might be a way to recover from this
 	 * error.
 	 */
-	int kill_it = 0;
+	int kill_current_task = 0;
 
 	/*
 	 * MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
@@ -1351,7 +1351,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = (cfg->tolerant == 3) ? 0 : 1;
+		kill_current_task = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1406,7 +1406,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst != MCE_AR_SEVERITY && !kill_it)
+	if (worst != MCE_AR_SEVERITY && !kill_current_task)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
@@ -1414,7 +1414,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		queue_task_work(&m, kill_it);
+		queue_task_work(&m, kill_current_task);
 
 	} else {
 		/*
@@ -1432,7 +1432,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, kill_it);
+			queue_task_work(&m, kill_current_task);
 	}
 out:
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);


* [tip: ras/core] x86/mce: Move the mce_panic() call and 'kill_it' assignments to the right places
  2020-11-27 16:18 ` [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places Gabriele Paoloni
@ 2020-12-01 18:05   ` tip-bot2 for Gabriele Paoloni
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Gabriele Paoloni @ 2020-12-01 18:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Gabriele Paoloni, Borislav Petkov, Tony Luck, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     e273e6e12ab1db3eb57712bd60655744d0091fa3
Gitweb:        https://git.kernel.org/tip/e273e6e12ab1db3eb57712bd60655744d0091fa3
Author:        Gabriele Paoloni <gabriele.paoloni@intel.com>
AuthorDate:    Fri, 27 Nov 2020 16:18:16 
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 01 Dec 2020 18:45:56 +01:00

x86/mce: Move the mce_panic() call and 'kill_it' assignments to the right places

Right now, for local MCEs the machine calls panic(), if needed, right
after lmce is set. For MCE broadcasting, mce_reign() takes care of
calling mce_panic().

Hence:
- improve readability by moving the conditional evaluation of
tolerant up to when kill_it is set first;
- move the mce_panic() call up into the statement where mce_end()
fails.

 [ bp: Massage, remove comment in the mce_end() failure case because it
   is superfluous; use local ptr 'cfg' in both tests. ]

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201127161819.3106432-3-gabriele.paoloni@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f319bed..ebaa52a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1351,8 +1351,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = 1;
-
+		kill_it = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1388,6 +1387,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		if (mce_end(order) < 0) {
 			if (!no_way_out)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
+
+			if (no_way_out && cfg->tolerant < 3)
+				mce_panic("Fatal machine check on current CPU", &m, msg);
 		}
 	} else {
 		/*
@@ -1404,15 +1406,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	/*
-	 * If tolerant is at an insane level we drop requests to kill
-	 * processes and continue even when there is no way out.
-	 */
-	if (cfg->tolerant == 3)
-		kill_it = 0;
-	else if (no_way_out)
-		mce_panic("Fatal machine check on current CPU", &m, msg);
-
 	if (worst > 0)
 		irq_work_queue(&mce_irq_work);
 


* [tip: ras/core] x86/mce: Remove redundant call to irq_work_queue()
  2020-11-27 16:18 ` [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue() Gabriele Paoloni
@ 2020-12-01 18:05   ` tip-bot2 for Gabriele Paoloni
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Gabriele Paoloni @ 2020-12-01 18:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Gabriele Paoloni, Borislav Petkov, Tony Luck, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     d5b38e3d0fdb1a16994b449bc338fb8b26816b07
Gitweb:        https://git.kernel.org/tip/d5b38e3d0fdb1a16994b449bc338fb8b26816b07
Author:        Gabriele Paoloni <gabriele.paoloni@intel.com>
AuthorDate:    Fri, 27 Nov 2020 16:18:18 
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 01 Dec 2020 18:54:32 +01:00

x86/mce: Remove redundant call to irq_work_queue()

Currently, __mc_scan_banks() in do_machine_check() does the following
callchain:

  __mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work).

Hence, the call to irq_work_queue() below after __mc_scan_banks()
seems redundant. Just remove it.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201127161819.3106432-5-gabriele.paoloni@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 99da2e0..a9991a9 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1406,9 +1406,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst > 0)
-		irq_work_queue(&mce_irq_work);
-
 	if (worst != MCE_AR_SEVERITY && !kill_it)
 		goto out;
 


* [tip: ras/core] x86/mce: Panic for LMCE only if mca_cfg.tolerant < 3
  2020-11-27 16:18 ` [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Gabriele Paoloni
@ 2020-12-01 18:05   ` tip-bot2 for Gabriele Paoloni
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Gabriele Paoloni @ 2020-12-01 18:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Gabriele Paoloni, Borislav Petkov, Tony Luck, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     3a866b16fd2360a9c4ebf71cfbf7ebfe968c1409
Gitweb:        https://git.kernel.org/tip/3a866b16fd2360a9c4ebf71cfbf7ebfe968c1409
Author:        Gabriele Paoloni <gabriele.paoloni@intel.com>
AuthorDate:    Fri, 27 Nov 2020 16:18:17 
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 01 Dec 2020 18:49:29 +01:00

x86/mce: Panic for LMCE only if mca_cfg.tolerant < 3

Right now for LMCE, if no_way_out is set, mce_panic() is called
regardless of mca_cfg.tolerant. This is not correct as, if
mca_cfg.tolerant = 3, the code should never panic.

Add that check.

 [ bp: use local ptr 'cfg'. ]

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201127161819.3106432-4-gabriele.paoloni@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ebaa52a..99da2e0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1368,7 +1368,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * to see it will clear it.
 	 */
 	if (lmce) {
-		if (no_way_out)
+		if (no_way_out && cfg->tolerant < 3)
 			mce_panic("Fatal local machine check", &m, msg);
 	} else {
 		order = mce_start(&no_way_out);


end of thread, other threads:[~2020-12-01 18:07 UTC | newest]

Thread overview: 11+ messages
2020-11-27 16:18 [PATCH v2 0/5] x86/MCE: some minor fixes Gabriele Paoloni
2020-11-27 16:18 ` [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Gabriele Paoloni
2020-11-27 16:43   ` [tip: x86/urgent] x86/mce: Do " tip-bot2 for Gabriele Paoloni
2020-11-27 16:18 ` [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places Gabriele Paoloni
2020-12-01 18:05   ` [tip: ras/core] x86/mce: Move " tip-bot2 for Gabriele Paoloni
2020-11-27 16:18 ` [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Gabriele Paoloni
2020-12-01 18:05   ` [tip: ras/core] x86/mce: Panic for LMCE " tip-bot2 for Gabriele Paoloni
2020-11-27 16:18 ` [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue() Gabriele Paoloni
2020-12-01 18:05   ` [tip: ras/core] x86/mce: Remove " tip-bot2 for Gabriele Paoloni
2020-11-27 16:18 ` [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task Gabriele Paoloni
2020-12-01 18:05   ` [tip: ras/core] x86/mce: Rename kill_it to kill_current_task tip-bot2 for Gabriele Paoloni
