linux-safety.lists.elisa.tech archive mirror
* [linux-safety] [PATCH v2 0/5] x86/MCE: some minor fixes
@ 2020-11-27 16:18 Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Paoloni, Gabriele
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Paoloni, Gabriele @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

During the safety analysis carried out by the safety architecture
working group of the ELISA project, some issues were spotted in the
x86 MCE handling code.
This patchset proposes fixes for them.

Changes since v1:
- fixed grammar
- improved readability of patch 1 and Cc'd stable
- kill_it flag renamed to kill_current_task

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>

Gabriele Paoloni (5):
  x86/mce: do not overwrite no_way_out if mce_end() fails
  x86/mce: move the mce_panic() call and 'kill_it' assignments to the
    right places
  x86/mce: for LMCE panic only if mca_cfg.tolerant < 3
  x86/mce: remove redundant call to irq_work_queue()
  x86/mce: rename kill_it as kill_current_task

 arch/x86/kernel/cpu/mce/core.c | 39 +++++++++++++++-------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

-- 
2.20.1

---------------------------------------------------------------------
INTEL CORPORATION ITALIA S.p.A. con unico socio
Sede: Milanofiori Palazzo E 4 
CAP 20094 Assago (MI)
Capitale Sociale Euro 104.000,00 interamente versato
Partita I.V.A. e Codice Fiscale  04236760155
Repertorio Economico Amministrativo n. 997124 
Registro delle Imprese di Milano nr. 183983/5281/33
Soggetta ad attivita' di direzione e coordinamento di 
INTEL CORPORATION, USA

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#207): https://lists.elisa.tech/g/linux-safety/message/207
Mute This Topic: https://lists.elisa.tech/mt/78549943/5278000
Group Owner: linux-safety+owner@lists.elisa.tech
Unsubscribe: https://lists.elisa.tech/g/linux-safety/unsub [linux-safety@archiver.kernel.org]
-=-=-=-=-=-=-=-=-=-=-=-




* [linux-safety] [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails
  2020-11-27 16:18 [linux-safety] [PATCH v2 0/5] x86/MCE: some minor fixes Paoloni, Gabriele
@ 2020-11-27 16:18 ` Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places Paoloni, Gabriele
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Paoloni, Gabriele @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Currently, if mce_end() fails, 'no_way_out' is recomputed from 'worst'
(no_way_out = worst >= MCE_PANIC_SEVERITY).
'worst' is the worst severity found across the MCA banks associated
with the current CPU; however, at this point 'no_way_out' could
already have been set by mce_start(), which looks at the severities
of all CPUs that entered the MCE handler.
If mce_end() fails, first check whether no_way_out is already set and,
if so, keep it; otherwise fall back to the local 'worst' value.

Cc: <stable@vger.kernel.org>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
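Note (not part of the commit log): the decision described above can
be modelled in isolation. The following is only a minimal sketch and
not the kernel code: sketch_resolve_no_way_out() and
SKETCH_PANIC_SEVERITY are hypothetical names, and the threshold value
is an arbitrary stand-in for MCE_PANIC_SEVERITY.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MCE_PANIC_SEVERITY; the value is arbitrary. */
enum { SKETCH_PANIC_SEVERITY = 6 };

/*
 * Model of the !lmce branch after this patch.
 *
 * no_way_out:  state possibly already set by mce_start() from all CPUs
 * worst:       worst severity found on the local CPU's banks
 * mce_end_ret: return value of mce_end(); negative means it failed
 */
bool sketch_resolve_no_way_out(bool no_way_out, int worst, int mce_end_ret)
{
	if (mce_end_ret < 0) {
		/* Keep a global "no way out" verdict; only fill it in if unset. */
		if (!no_way_out)
			no_way_out = worst >= SKETCH_PANIC_SEVERITY;
	}
	return no_way_out;
}
```

With the pre-patch logic, the first argument would have been ignored
whenever mce_end() failed, so a verdict set by mce_start() could be
downgraded by a lower local severity.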
 arch/x86/kernel/cpu/mce/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b866e7c0..32b7099e3511 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * When there's any problem use only local no_way_out state.
 	 */
 	if (!lmce) {
-		if (mce_end(order) < 0)
-			no_way_out = worst >= MCE_PANIC_SEVERITY;
+		if (mce_end(order) < 0) {
+			if (!no_way_out)
+				no_way_out = worst >= MCE_PANIC_SEVERITY;
+		}
 	} else {
 		/*
 		 * If there was a fatal machine check we should have
-- 
2.20.1


* [linux-safety] [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places
  2020-11-27 16:18 [linux-safety] [PATCH v2 0/5] x86/MCE: some minor fixes Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Paoloni, Gabriele
@ 2020-11-27 16:18 ` Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Paoloni, Gabriele
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Paoloni, Gabriele @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Right now, for local MCEs, we panic(), if needed, right after lmce is
set. For global MCEs, mce_reign() takes care of calling mce_panic().
Hence:
- improve readability by moving the conditional evaluation of
  tolerant up to where kill_it is first set;
- move the mce_panic() call up into the branch where mce_end()
  fails.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
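Note (not part of the commit log): a minimal sketch of the two
decisions after this patch, written as pure functions so they can be
checked in isolation. The function names are hypothetical and the
kernel state (m.mcgstatus, cfg->tolerant, no_way_out) is passed in as
plain parameters; this is an illustration, not the kernel code.

```c
#include <stdbool.h>

/*
 * kill_it as computed at the point where MCG_STATUS_RIPV is checked:
 * without a valid return IP the task cannot be resumed, unless
 * tolerant == 3 asks us to drop kill requests.
 */
bool sketch_kill_it(bool ripv_set, int tolerant)
{
	if (!ripv_set)
		return tolerant != 3;
	return false;
}

/*
 * Whether the mce_end() failure branch would call mce_panic():
 * only when there is no way out and tolerant is below 3.
 */
bool sketch_panic_after_mce_end_failure(bool no_way_out, int tolerant)
{
	return no_way_out && tolerant < 3;
}
```

Folding the tolerant check into the first assignment gives the same
final kill_it value as the removed "if (cfg->tolerant == 3)" block,
which only ever reset it to 0.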
 arch/x86/kernel/cpu/mce/core.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 32b7099e3511..50e9b0893a92 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1350,8 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = 1;
-
+		kill_it = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1387,6 +1386,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		if (mce_end(order) < 0) {
 			if (!no_way_out)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
+			/*
+			 * mce_reign() has probably failed hence evaluate if we need
+			 * to panic
+			 */
+			if (no_way_out && mca_cfg.tolerant < 3)
+				mce_panic("Fatal machine check on current CPU", &m, msg);
 		}
 	} else {
 		/*
@@ -1403,15 +1408,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	/*
-	 * If tolerant is at an insane level we drop requests to kill
-	 * processes and continue even when there is no way out.
-	 */
-	if (cfg->tolerant == 3)
-		kill_it = 0;
-	else if (no_way_out)
-		mce_panic("Fatal machine check on current CPU", &m, msg);
-
 	if (worst > 0)
 		irq_work_queue(&mce_irq_work);
 
-- 
2.20.1


* [linux-safety] [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3
  2020-11-27 16:18 [linux-safety] [PATCH v2 0/5] x86/MCE: some minor fixes Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places Paoloni, Gabriele
@ 2020-11-27 16:18 ` Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue() Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task Paoloni, Gabriele
  4 siblings, 0 replies; 6+ messages in thread
From: Paoloni, Gabriele @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Right now, for LMCE, mce_panic() is called whenever no_way_out is set,
regardless of mca_cfg.tolerant. This is not correct: if
mca_cfg.tolerant == 3, the code should never panic.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
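Note (not part of the commit log): the behavioural change is confined
to one input combination. A minimal before/after sketch, with
hypothetical function names and the kernel state passed in as plain
parameters:

```c
#include <stdbool.h>

/* LMCE path before this patch: tolerant is not consulted at all. */
bool sketch_lmce_panics_before(bool no_way_out)
{
	return no_way_out;
}

/* LMCE path after this patch: tolerant == 3 suppresses the panic. */
bool sketch_lmce_panics_after(bool no_way_out, int tolerant)
{
	return no_way_out && tolerant < 3;
}
```

The two predicates differ only when no_way_out is set and tolerant
is 3; for every other input the outcome is unchanged.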
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 50e9b0893a92..d766a3f6a343 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1367,7 +1367,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * to see it will clear it.
 	 */
 	if (lmce) {
-		if (no_way_out)
+		if (no_way_out && mca_cfg.tolerant < 3)
 			mce_panic("Fatal local machine check", &m, msg);
 	} else {
 		order = mce_start(&no_way_out);
-- 
2.20.1


* [linux-safety] [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue()
  2020-11-27 16:18 [linux-safety] [PATCH v2 0/5] x86/MCE: some minor fixes Paoloni, Gabriele
                   ` (2 preceding siblings ...)
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Paoloni, Gabriele
@ 2020-11-27 16:18 ` Paoloni, Gabriele
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task Paoloni, Gabriele
  4 siblings, 0 replies; 6+ messages in thread
From: Paoloni, Gabriele @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Right now in do_machine_check() __mc_scan_banks() triggers
the following call tree:
__mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work).

Hence the call of irq_work_queue() below after __mc_scan_banks()
seems redundant. Just remove it.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
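Note (not part of the commit log): a toy model of why a second queue
request adds nothing once the work is already pending. Everything
here is hypothetical (struct sketch_irq_work and the sketch_*
functions are not kernel APIs); it assumes, as irq_work_queue() does,
that queueing an already pending work item is rejected, and it
assumes the call tree described above.

```c
#include <stdbool.h>

/* Toy stand-in for struct irq_work: a pending flag plus a counter. */
struct sketch_irq_work {
	bool pending;
	int times_queued;
};

/* Queue the work unless it is already pending; report what happened. */
bool sketch_irq_work_queue(struct sketch_irq_work *work)
{
	if (work->pending)
		return false;
	work->pending = true;
	work->times_queued++;
	return true;
}

/* Stand-in for mce_log(): logging an error queues the work. */
void sketch_mce_log(struct sketch_irq_work *work)
{
	sketch_irq_work_queue(work);
}

/* Stand-in for __mc_scan_banks(): logs each error it finds. */
void sketch_scan_banks(struct sketch_irq_work *work, int nr_errors)
{
	for (int i = 0; i < nr_errors; i++)
		sketch_mce_log(work);
}
```

In this model, any scan that logs at least one error leaves the work
pending, so an extra queue call afterwards returns false and changes
nothing.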
 arch/x86/kernel/cpu/mce/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d766a3f6a343..802302c54762 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1408,9 +1408,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst > 0)
-		irq_work_queue(&mce_irq_work);
-
 	if (worst != MCE_AR_SEVERITY && !kill_it)
 		goto out;
 
-- 
2.20.1


* [linux-safety] [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task
  2020-11-27 16:18 [linux-safety] [PATCH v2 0/5] x86/MCE: some minor fixes Paoloni, Gabriele
                   ` (3 preceding siblings ...)
  2020-11-27 16:18 ` [linux-safety] [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue() Paoloni, Gabriele
@ 2020-11-27 16:18 ` Paoloni, Gabriele
  4 siblings, 0 replies; 6+ messages in thread
From: Paoloni, Gabriele @ 2020-11-27 16:18 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, x86, hpa, linux-edac, linux-kernel
  Cc: gabriele.paoloni, linux-safety

Currently, if an MCE happens in user mode or while the kernel
is copying data from user space, 'kill_it' is used to check
whether we can recover the execution of the interrupted task;
the flag name, however, is not very meaningful, hence
rename it to match its purpose.

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 802302c54762..740a4fcc1e90 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1320,10 +1320,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int no_way_out = 0;
 
 	/*
-	 * If kill_it gets set, there might be a way to recover from this
+	 * If kill_current_task is not set, there might be a way to recover from this
 	 * error.
 	 */
-	int kill_it = 0;
+	int kill_current_task = 0;
 
 	/*
 	 * MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
@@ -1350,7 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = (cfg->tolerant == 3) ? 0 : 1;
+		kill_current_task = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1408,7 +1408,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst != MCE_AR_SEVERITY && !kill_it)
+	if (worst != MCE_AR_SEVERITY && !kill_current_task)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
@@ -1416,7 +1416,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		queue_task_work(&m, kill_it);
+		queue_task_work(&m, kill_current_task);
 
 	} else {
 		/*
@@ -1434,7 +1434,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, kill_it);
+			queue_task_work(&m, kill_current_task);
 	}
 out:
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-- 
2.20.1

