linux-edac.vger.kernel.org archive mirror
* [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module
@ 2021-09-15 23:27 Smita Koralahalli
  2021-09-15 23:27 ` [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation Smita Koralahalli
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Smita Koralahalli @ 2021-09-15 23:27 UTC (permalink / raw)
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, H . Peter Anvin, yazen.ghannam,
	Smita.KoralahalliChannabasappa

This series of patches handles the scenarios where error simulation
fails silently in the mce-inject module. It also cleans up the code by
replacing MCx_{STATUS, ADDR, MISC} macros with msr_ops and finally returns
an error code to userspace when injection fails.

Error simulation fails if the bank is unpopulated (MCA_IPID register reads
zero) or if the platform enforces write ignored behavior on status
registers.

The first patch checks for an unpopulated bank by reading the MCA_IPID
register, and the fourth patch checks whether writes to MCA_STATUS and
MCA_DESTAT are ignored.

The second patch sets the valid bit before doing error injection.

The third patch does some cleanup by replacing MCx_{STATUS, ADDR, MISC}
macros with msr_ops.

The final patch returns an error code to userspace from the mce-inject
module.

Smita Koralahalli (5):
  x86/mce/inject: Check if a bank is unpopulated before error simulation
  x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  x86/mce: Use msr_ops in prepare_msrs()
  x86/mce/inject: Check for writes ignored in status registers
  x86/mce/mce-inject: Return error code to userspace from mce-inject
    module

 arch/x86/kernel/cpu/mce/core.c   |  1 +
 arch/x86/kernel/cpu/mce/inject.c | 80 ++++++++++++++++++++++++--------
 2 files changed, 62 insertions(+), 19 deletions(-)

-- 
2.17.1


* [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-15 23:27 [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module Smita Koralahalli
@ 2021-09-15 23:27 ` Smita Koralahalli
  2021-09-24  8:26   ` Borislav Petkov
  2021-09-15 23:27 ` [PATCH 2/5] x86/mce/inject: Set the valid bit in MCA_STATUS before error injection Smita Koralahalli
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Smita Koralahalli @ 2021-09-15 23:27 UTC (permalink / raw)
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, H . Peter Anvin, yazen.ghannam,
	Smita.KoralahalliChannabasappa

The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
(SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
will read as zero and writes to it will be ignored. Check the value of
this register before trying to simulate the error.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
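
The check described above can be modelled in userspace. What follows is
a minimal sketch, not the kernel code: struct fake_cpu and
check_bank_populated() are hypothetical names, and the per-bank IPID
values come from an array instead of rdmsrl_on_cpu(). The out-of-range
case returning -EINVAL is assumed to mirror the existing bank check in
inj_bank_set().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's MCA_IPID registers, indexed by bank. */
struct fake_cpu {
	const uint64_t *ipid;	/* simulated MCA_IPID value per bank */
	unsigned int nr_banks;
};

/*
 * Return 0 if the bank may be used for injection, -EINVAL if the bank
 * number is out of range, and -ENODEV if MCA_IPID reads as zero
 * (unpopulated bank). A non-zero user_ipid skips the register read,
 * as the patch does for a user-provided IPID.
 */
int check_bank_populated(const struct fake_cpu *cpu, unsigned int bank,
			 uint64_t user_ipid)
{
	uint64_t ipid = user_ipid;

	if (bank >= cpu->nr_banks)
		return -EINVAL;

	if (!ipid)
		ipid = cpu->ipid[bank];

	return ipid ? 0 : -ENODEV;
}
```

In this model a bank whose simulated IPID is zero is rejected unless the
caller supplies its own non-zero IPID.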

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 0bfc14041bbb..51ac575c4605 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
 	}
 
 	m->bank = val;
+
+	/* Read IPID value to determine if a bank is unpopulated on the target
+	 * CPU.
+	 */
+	if (boot_cpu_has(X86_FEATURE_SMCA)) {
+
+		/* Check for user provided IPID value. */
+		if (!m->ipid) {
+			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
+				      &m->ipid);
+			if (!m->ipid) {
+				pr_err("Error simulation not possible: Bank %llu unpopulated\n",
+					val);
+				return -ENODEV;
+			}
+		}
+	}
+
 	do_inject();
 
 	/* Reset injection struct */
-- 
2.17.1


* [PATCH 2/5] x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  2021-09-15 23:27 [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module Smita Koralahalli
  2021-09-15 23:27 ` [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation Smita Koralahalli
@ 2021-09-15 23:27 ` Smita Koralahalli
  2021-09-24  8:26   ` Borislav Petkov
  2021-09-15 23:27 ` [PATCH 3/5] x86/mce: Use msr_ops in prepare_msrs() Smita Koralahalli
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Smita Koralahalli @ 2021-09-15 23:27 UTC (permalink / raw)
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, H . Peter Anvin, yazen.ghannam,
	Smita.KoralahalliChannabasappa

MCA handlers check the valid bit in each status register (MCA_STATUS[Val])
and examine the remainder of the status register only if the valid bit is
set.

Set the valid bit in the corresponding MCA_STATUS register if the user
forgets to set it while doing error simulation.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/cpu/mce/inject.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 51ac575c4605..8de709b049fc 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -490,6 +490,8 @@ static void do_inject(void)
 
 	i_mce.tsc = rdtsc_ordered();
 
+	i_mce.status |= MCI_STATUS_VAL;
+
 	if (i_mce.misc)
 		i_mce.status |= MCI_STATUS_MISCV;
 
-- 
2.17.1


* [PATCH 3/5] x86/mce: Use msr_ops in prepare_msrs()
  2021-09-15 23:27 [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module Smita Koralahalli
  2021-09-15 23:27 ` [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation Smita Koralahalli
  2021-09-15 23:27 ` [PATCH 2/5] x86/mce/inject: Set the valid bit in MCA_STATUS before error injection Smita Koralahalli
@ 2021-09-15 23:27 ` Smita Koralahalli
  2021-09-24  8:26   ` Borislav Petkov
  2021-09-15 23:27 ` [PATCH 4/5] x86/mce/inject: Check for writes ignored in status registers Smita Koralahalli
  2021-09-15 23:27 ` [PATCH 5/5] x86/mce/mce-inject: Return error code to userspace from mce-inject module Smita Koralahalli
  4 siblings, 1 reply; 18+ messages in thread
From: Smita Koralahalli @ 2021-09-15 23:27 UTC (permalink / raw)
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, H . Peter Anvin, yazen.ghannam,
	Smita.KoralahalliChannabasappa

Replace MCx_{STATUS, ADDR, MISC} macros with msr_ops.

Also, restructure the code to avoid multiple initializations for MCA
registers. SMCA machines define a different set of MSRs for MCA registers
and msr_ops initializes appropriate MSRs for SMCA and legacy processors.

Initialize MCA_MISC and MCA_SYND registers at the end after initializing
MCx_{STATUS, DESTAT} which is further explained in the next patch.

Make msr_ops exportable in order to be accessible from mce-inject module.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c   |  1 +
 arch/x86/kernel/cpu/mce/inject.c | 27 +++++++++++++--------------
 2 files changed, 14 insertions(+), 14 deletions(-)
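
To illustrate why msr_ops removes the SMCA/legacy branching, here is a
small userspace sketch of the idea. The struct and function names are
hypothetical; only the MSR numbering is taken from the kernel headers
as I understand them (legacy banks start at 0x400 with four MSRs per
bank, SMCA banks start at 0xc0002000 with sixteen MSRs per bank), so
please verify against msr-index.h before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical equivalent of the msr_ops table: one lookup per register. */
struct msr_ops_sketch {
	uint32_t (*status)(unsigned int bank);
	uint32_t (*addr)(unsigned int bank);
	uint32_t (*misc)(unsigned int bank);
};

/* Legacy MCA: MCx_STATUS/ADDR/MISC at 0x401/0x402/0x403 + 4 * bank. */
uint32_t legacy_status(unsigned int bank) { return 0x401u + 4 * bank; }
uint32_t legacy_addr(unsigned int bank)   { return 0x402u + 4 * bank; }
uint32_t legacy_misc(unsigned int bank)   { return 0x403u + 4 * bank; }

/* SMCA: MCx_STATUS/ADDR/MISC0 at 0xc0002001/2/3 + 0x10 * bank. */
uint32_t smca_status(unsigned int bank) { return 0xc0002001u + 0x10 * bank; }
uint32_t smca_addr(unsigned int bank)   { return 0xc0002002u + 0x10 * bank; }
uint32_t smca_misc(unsigned int bank)   { return 0xc0002003u + 0x10 * bank; }

/* Pick the table once; callers then never test for SMCA themselves. */
struct msr_ops_sketch select_msr_ops(bool smca)
{
	struct msr_ops_sketch ops = { legacy_status, legacy_addr, legacy_misc };

	if (smca) {
		ops.status = smca_status;
		ops.addr = smca_addr;
		ops.misc = smca_misc;
	}

	return ops;
}
```

The caller selects the table once and then uses ops.status(b),
ops.addr(b) and ops.misc(b) on either kind of machine.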

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 193204aee880..9af910acb930 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -222,6 +222,7 @@ struct mca_msr_regs msr_ops = {
 	.addr	= addr_reg,
 	.misc	= misc_reg
 };
+EXPORT_SYMBOL_GPL(msr_ops);
 
 static void __print_mce(struct mce *m)
 {
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 8de709b049fc..8af4c9845f96 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -464,22 +464,21 @@ static void prepare_msrs(void *info)
 
 	wrmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
 
-	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		if (m.inject_flags == DFR_INT_INJ) {
-			wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(b), m.status);
-			wrmsrl(MSR_AMD64_SMCA_MCx_DEADDR(b), m.addr);
-		} else {
-			wrmsrl(MSR_AMD64_SMCA_MCx_STATUS(b), m.status);
-			wrmsrl(MSR_AMD64_SMCA_MCx_ADDR(b), m.addr);
-		}
+	if (boot_cpu_has(X86_FEATURE_SMCA) &&
+	    m.inject_flags == DFR_INT_INJ) {
+		wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(b), m.status);
+		wrmsrl(MSR_AMD64_SMCA_MCx_DEADDR(b), m.addr);
+		goto out;
+	}
+
+	wrmsrl(msr_ops.status(b), m.status);
+	wrmsrl(msr_ops.addr(b), m.addr);
 
-		wrmsrl(MSR_AMD64_SMCA_MCx_MISC(b), m.misc);
+out:
+	wrmsrl(msr_ops.misc(b), m.misc);
+
+	if (boot_cpu_has(X86_FEATURE_SMCA))
 		wrmsrl(MSR_AMD64_SMCA_MCx_SYND(b), m.synd);
-	} else {
-		wrmsrl(MSR_IA32_MCx_STATUS(b), m.status);
-		wrmsrl(MSR_IA32_MCx_ADDR(b), m.addr);
-		wrmsrl(MSR_IA32_MCx_MISC(b), m.misc);
-	}
 }
 
 static void do_inject(void)
-- 
2.17.1


* [PATCH 4/5] x86/mce/inject: Check for writes ignored in status registers
  2021-09-15 23:27 [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module Smita Koralahalli
                   ` (2 preceding siblings ...)
  2021-09-15 23:27 ` [PATCH 3/5] x86/mce: Use msr_ops in prepare_msrs() Smita Koralahalli
@ 2021-09-15 23:27 ` Smita Koralahalli
  2021-09-15 23:27 ` [PATCH 5/5] x86/mce/mce-inject: Return error code to userspace from mce-inject module Smita Koralahalli
  4 siblings, 0 replies; 18+ messages in thread
From: Smita Koralahalli @ 2021-09-15 23:27 UTC (permalink / raw)
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, H . Peter Anvin, yazen.ghannam,
	Smita.KoralahalliChannabasappa

According to Section 2.1.16.3 under HWCR[McStatusWrEn] in "PPR for AMD
Family 19h, Model 01h, Revision B1 Processors - 55898 Rev 0.35 - Feb 5,
2021", the status register may sometimes enforce write ignored behavior
independent of the value of HWCR[McStatusWrEn] depending on the platform
settings.

Hence, check MCA_STATUS and MCA_DESTAT separately for write-ignored
behavior before doing error simulation. If writes are ignored, return
with an error code.

Deferred errors on an SMCA platform are logged in a separate MSR,
MCA_DESTAT. Hence, check MCA_DESTAT instead of MCA_STATUS for deferred
errors, so that the existing value in MCA_STATUS is not modified by the
write and read-back.

Rearrange the calls and write to registers MCx_{ADDR, MISC, SYND} and
MCG_STATUS only if error simulation is available.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/cpu/mce/inject.c | 39 ++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 9 deletions(-)
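
The write-then-read-back test can be sketched outside the kernel as
follows. This is an illustration under stated assumptions: struct
fake_msr, fake_wrmsr(), fake_rdmsr() and try_write_status() are
hypothetical, and the register is modelled as a plain variable with a
flag that makes it drop writes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a status MSR that may silently drop writes. */
struct fake_msr {
	uint64_t value;
	bool write_ignored;
};

void fake_wrmsr(struct fake_msr *msr, uint64_t val)
{
	if (!msr->write_ignored)
		msr->value = val;
}

uint64_t fake_rdmsr(const struct fake_msr *msr)
{
	return msr->value;
}

/*
 * Write the status value and read it back. A zero read-back is taken
 * to mean the write was ignored, matching the !m.status test in the
 * patch, and is reported as -EINVAL.
 */
int try_write_status(struct fake_msr *status_reg, uint64_t status)
{
	fake_wrmsr(status_reg, status);

	if (!fake_rdmsr(status_reg))
		return -EINVAL;

	return 0;
}
```

Note that a zero read-back only detects ignored writes when the
register held zero beforehand; a stale non-zero value would pass this
test in the model.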

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 8af4c9845f96..c7d1564f244b 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -457,24 +457,39 @@ static void toggle_nb_mca_mst_cpu(u16 nid)
 		       __func__, PCI_FUNC(F3->devfn), NBCFG);
 }
 
+struct mce_err_handler {
+	struct mce *mce;
+	int err;
+};
+
+static struct mce_err_handler mce_err;
+
 static void prepare_msrs(void *info)
 {
-	struct mce m = *(struct mce *)info;
+	struct mce_err_handler *i_mce_err = ((struct mce_err_handler *)info);
+	struct mce m = *i_mce_err->mce;
 	u8 b = m.bank;
 
-	wrmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
+	u32 status_reg = msr_ops.status(b);
+	u32 addr_reg = msr_ops.addr(b);
 
 	if (boot_cpu_has(X86_FEATURE_SMCA) &&
 	    m.inject_flags == DFR_INT_INJ) {
-		wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(b), m.status);
-		wrmsrl(MSR_AMD64_SMCA_MCx_DEADDR(b), m.addr);
-		goto out;
+		status_reg = MSR_AMD64_SMCA_MCx_DESTAT(b);
+		addr_reg = MSR_AMD64_SMCA_MCx_DEADDR(b);
 	}
 
-	wrmsrl(msr_ops.status(b), m.status);
-	wrmsrl(msr_ops.addr(b), m.addr);
+	wrmsrl(status_reg, m.status);
+	rdmsrl(status_reg, m.status);
+
+	if (!m.status) {
+		pr_info("Error simulation is not available\n");
+		i_mce_err->err = -EINVAL;
+		return;
+	}
 
-out:
+	wrmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
+	wrmsrl(addr_reg, m.addr);
 	wrmsrl(msr_ops.misc(b), m.misc);
 
 	if (boot_cpu_has(X86_FEATURE_SMCA))
@@ -487,6 +502,9 @@ static void do_inject(void)
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
 
+	mce_err.mce = &i_mce;
+	mce_err.err = 0;
+
 	i_mce.tsc = rdtsc_ordered();
 
 	i_mce.status |= MCI_STATUS_VAL;
@@ -538,10 +556,13 @@ static void do_inject(void)
 
 	i_mce.mcgstatus = mcg_status;
 	i_mce.inject_flags = inj_type;
-	smp_call_function_single(cpu, prepare_msrs, &i_mce, 0);
+	smp_call_function_single(cpu, prepare_msrs, &mce_err, 0);
 
 	toggle_hw_mce_inject(cpu, false);
 
+	if (mce_err.err)
+		goto err;
+
 	switch (inj_type) {
 	case DFR_INT_INJ:
 		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
-- 
2.17.1


* [PATCH 5/5] x86/mce/mce-inject: Return error code to userspace from mce-inject module
  2021-09-15 23:27 [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module Smita Koralahalli
                   ` (3 preceding siblings ...)
  2021-09-15 23:27 ` [PATCH 4/5] x86/mce/inject: Check for writes ignored in status registers Smita Koralahalli
@ 2021-09-15 23:27 ` Smita Koralahalli
  2021-09-24  8:26   ` Borislav Petkov
  4 siblings, 1 reply; 18+ messages in thread
From: Smita Koralahalli @ 2021-09-15 23:27 UTC (permalink / raw)
  To: x86, linux-edac, linux-kernel
  Cc: Tony Luck, H . Peter Anvin, yazen.ghannam,
	Smita.KoralahalliChannabasappa

Currently, the mce-inject module fails silently and the user must check
the kernel logs to determine whether the injection succeeded.

Save time for the user and return an error code from the module if error
injection fails.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/kernel/cpu/mce/inject.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index c7d1564f244b..0ef9ff921c6a 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -549,8 +549,10 @@ static void do_inject(void)
 	}
 
 	cpus_read_lock();
-	if (!cpu_online(cpu))
+	if (!cpu_online(cpu)) {
+		mce_err.err = -ENODEV;
 		goto err;
+	}
 
 	toggle_hw_mce_inject(cpu, true);
 
@@ -622,7 +624,7 @@ static int inj_bank_set(void *data, u64 val)
 	/* Reset injection struct */
 	setup_inj_struct(&i_mce);
 
-	return 0;
+	return mce_err.err;
 }
 
 MCE_INJECT_GET(bank);
-- 
2.17.1


* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-15 23:27 ` [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation Smita Koralahalli
@ 2021-09-24  8:26   ` Borislav Petkov
  2021-09-27 19:51     ` Smita Koralahalli Channabasappa
  2021-10-11 21:12     ` Koralahalli Channabasappa, Smita
  0 siblings, 2 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-09-24  8:26 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: x86, linux-edac, linux-kernel, Tony Luck, H . Peter Anvin, yazen.ghannam

On Wed, Sep 15, 2021 at 06:27:35PM -0500, Smita Koralahalli wrote:
> The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
> (SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
> will read as zero and writes to it will be ignored. Check the value of
> this register before trying to simulate the error.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
> index 0bfc14041bbb..51ac575c4605 100644
> --- a/arch/x86/kernel/cpu/mce/inject.c
> +++ b/arch/x86/kernel/cpu/mce/inject.c
> @@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
>  	}
>  
>  	m->bank = val;
> +
> +	/* Read IPID value to determine if a bank is unpopulated on the target
> +	 * CPU.
> +	 */

Kernel comments style format is:

	/*
	 * A sentence ending with a full-stop.
	 * Another sentence. ...
	 * More sentences. ...
	 */

> +	if (boot_cpu_has(X86_FEATURE_SMCA)) {

This whole thing belongs into inj_ipid_set() where you should verify
whether the bank is set when you try to set the IPID for that bank.

> +
> +		/* Check for user provided IPID value. */
> +		if (!m->ipid) {
> +			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
> +				      &m->ipid);

Oh well, one IPI per ipid write. We're doing injection so we can't be on
a production machine so who cares about IPIs there.

> +			if (!m->ipid) {
> +				pr_err("Error simulation not possible: Bank %llu unpopulated\n",


"Cannot set IPID for bank... - bank %d unpopulated\n"

Also, in all your text, use "injection" instead of "simulation" so that
there's no confusion.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 2/5] x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  2021-09-15 23:27 ` [PATCH 2/5] x86/mce/inject: Set the valid bit in MCA_STATUS before error injection Smita Koralahalli
@ 2021-09-24  8:26   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-09-24  8:26 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: x86, linux-edac, linux-kernel, Tony Luck, H . Peter Anvin, yazen.ghannam

On Wed, Sep 15, 2021 at 06:27:36PM -0500, Smita Koralahalli wrote:
> MCA handlers check the valid bit in each status register (MCA_STATUS[Val])
> and examine the remainder of the status register only if the valid bit is
> set.
> 
> Set the valid bit in the corresponding MCA_STATUS register if the user
> forgets to set it while doing error simulation.

Why, maybe the user wants to inject with Val not set. You could warn
here instead and state that handlers will likely ignore signatures with
Val=0.
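
A warn-only variant of this suggestion might look like the userspace
sketch below. The helper name warn_if_val_clear() and the message text
are hypothetical; the only fact relied on is that MCA_STATUS[Val] is
bit 63. The status value is left untouched so that an injection with
Val=0 remains possible.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define STATUS_VAL (1ULL << 63)	/* MCA_STATUS[Val], bit 63 */

/*
 * Warn instead of forcing the valid bit. Returns true if a warning
 * was printed; the caller's status value is never modified.
 */
bool warn_if_val_clear(uint64_t status)
{
	if (status & STATUS_VAL)
		return false;

	fprintf(stderr,
		"Val=0 in status 0x%016llx, handlers will likely ignore it\n",
		(unsigned long long)status);
	return true;
}
```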

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 3/5] x86/mce: Use msr_ops in prepare_msrs()
  2021-09-15 23:27 ` [PATCH 3/5] x86/mce: Use msr_ops in prepare_msrs() Smita Koralahalli
@ 2021-09-24  8:26   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-09-24  8:26 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: x86, linux-edac, linux-kernel, Tony Luck, H . Peter Anvin, yazen.ghannam

On Wed, Sep 15, 2021 at 06:27:37PM -0500, Smita Koralahalli wrote:
> Replace MCx_{STATUS, ADDR, MISC} macros with msr_ops.
> 
> Also, restructure the code to avoid multiple initializations for MCA
> registers. SMCA machines define a different set of MSRs for MCA registers
> and msr_ops initializes appropriate MSRs for SMCA and legacy processors.
> 
> Initialize MCA_MISC and MCA_SYND registers at the end after initializing
> MCx_{STATUS, DESTAT} which is further explained in the next patch.
> 
> Make msr_ops exportable in order to be accessible from mce-inject module.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c   |  1 +
>  arch/x86/kernel/cpu/mce/inject.c | 27 +++++++++++++--------------
>  2 files changed, 14 insertions(+), 14 deletions(-)

https://git.kernel.org/tip/8121b8f947be0033f567619be204639a50cad298

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 5/5] x86/mce/mce-inject: Return error code to userspace from mce-inject module
  2021-09-15 23:27 ` [PATCH 5/5] x86/mce/mce-inject: Return error code to userspace from mce-inject module Smita Koralahalli
@ 2021-09-24  8:26   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-09-24  8:26 UTC (permalink / raw)
  To: Smita Koralahalli
  Cc: x86, linux-edac, linux-kernel, Tony Luck, H . Peter Anvin, yazen.ghannam

On Wed, Sep 15, 2021 at 06:27:39PM -0500, Smita Koralahalli wrote:
> Currently, the mce-inject module fails silently and the user must check
> the kernel logs to determine whether the injection succeeded.
> 
> Save time for the user and return an error code from the module if error
> injection fails.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  arch/x86/kernel/cpu/mce/inject.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
> index c7d1564f244b..0ef9ff921c6a 100644
> --- a/arch/x86/kernel/cpu/mce/inject.c
> +++ b/arch/x86/kernel/cpu/mce/inject.c
> @@ -549,8 +549,10 @@ static void do_inject(void)
>  	}
>  
>  	cpus_read_lock();
> -	if (!cpu_online(cpu))
> +	if (!cpu_online(cpu)) {
> +		mce_err.err = -ENODEV;

You could issue a pr_err() here too. That ENODEV probably turns into
"write failed" but that doesn't say why.
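
From the userspace side, the error code returned by the setter shows up
as a failed write(). The following sketch shows how a test tool could
capture it; write_inject_file() is a hypothetical helper, and the path
in the usage note assumes debugfs is mounted at /sys/kernel/debug.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write a value as hex text to a debugfs-style file.
 * Returns 0 on success or a negative errno value on failure, so the
 * caller can tell -ENODEV from -EINVAL and so on.
 */
int write_inject_file(const char *path, unsigned long long val)
{
	char buf[32];
	ssize_t n;
	int len, fd, ret = 0;

	len = snprintf(buf, sizeof(buf), "0x%llx\n", val);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, (size_t)len);
	if (n < 0)
		ret = -errno;		/* error code from the kernel setter */
	else if (n != len)
		ret = -EIO;		/* short write, treated as a failure */

	close(fd);
	return ret;
}
```

A caller could pass "/sys/kernel/debug/mce-inject/bank" and print
strerror(-ret) on failure. That still only names the error code, which
is why an additional pr_err() with the reason is useful.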

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-24  8:26   ` Borislav Petkov
@ 2021-09-27 19:51     ` Smita Koralahalli Channabasappa
  2021-09-27 20:15       ` Borislav Petkov
  2021-10-11 21:12     ` Koralahalli Channabasappa, Smita
  1 sibling, 1 reply; 18+ messages in thread
From: Smita Koralahalli Channabasappa @ 2021-09-27 19:51 UTC (permalink / raw)
  To: Borislav Petkov, Smita Koralahalli
  Cc: x86, linux-edac, linux-kernel, Tony Luck, H . Peter Anvin, yazen.ghannam

Hi Boris,

On 9/24/21 3:26 AM, Borislav Petkov wrote:

> On Wed, Sep 15, 2021 at 06:27:35PM -0500, Smita Koralahalli wrote:
>> The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
>> (SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
>> will read as zero and writes to it will be ignored. Check the value of
>> this register before trying to simulate the error.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
>> index 0bfc14041bbb..51ac575c4605 100644
>> --- a/arch/x86/kernel/cpu/mce/inject.c
>> +++ b/arch/x86/kernel/cpu/mce/inject.c
>> @@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
>>   	}
>>
>> +	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> This whole thing belongs into inj_ipid_set() where you should verify
> whether the bank is set when you try to set the IPID for that bank.

Can you please elaborate on this? I'm not sure if I understood this
right. Should I read the ipid file to verify that the user has input
proper ipid? If ipid file reads zero then do rdmsrl_on_cpu?

Thanks,
Smita

>> +
>> +		/* Check for user provided IPID value. */
>> +		if (!m->ipid) {
>> +			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
>> +				      &m->ipid);
> Oh well, one IPI per ipid write. We're doing injection so we can't be on
> a production machine so who cares about IPIs there.
>
>> +			if (!m->ipid) {
>> +				pr_err("Error simulation not possible: Bank %llu unpopulated\n",

* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-27 19:51     ` Smita Koralahalli Channabasappa
@ 2021-09-27 20:15       ` Borislav Petkov
  2021-09-27 21:56         ` Smita Koralahalli Channabasappa
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2021-09-27 20:15 UTC (permalink / raw)
  To: Smita Koralahalli Channabasappa
  Cc: Smita Koralahalli, x86, linux-edac, linux-kernel, Tony Luck,
	H . Peter Anvin, yazen.ghannam

On Mon, Sep 27, 2021 at 02:51:56PM -0500, Smita Koralahalli Channabasappa wrote:
> Can you please elaborate on this? I'm not sure if I understood this
> right. Should I read the ipid file to verify that the user has input
> proper ipid? If ipid file reads zero then do rdmsrl_on_cpu?

No, on a write to the ipid file you should do that checking and write if
the bank is populated or fail the write otherwise. And you should put
all that code in inj_bank_set() - that's why I say "on a write to the
ipid file".

And instead of boot_cpu_has() you should use cpu_feature_enabled().

Makes sense?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-27 20:15       ` Borislav Petkov
@ 2021-09-27 21:56         ` Smita Koralahalli Channabasappa
  2021-09-27 22:05           ` Borislav Petkov
  0 siblings, 1 reply; 18+ messages in thread
From: Smita Koralahalli Channabasappa @ 2021-09-27 21:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Smita Koralahalli, x86, linux-edac, linux-kernel, Tony Luck,
	H . Peter Anvin, yazen.ghannam

On 9/27/21 3:15 PM, Borislav Petkov wrote:

> On Mon, Sep 27, 2021 at 02:51:56PM -0500, Smita Koralahalli Channabasappa wrote:
>> Can you please elaborate on this? I'm not sure if I understood this
>> right. Should I read the ipid file to verify that the user has input
>> proper ipid? If ipid file reads zero then do rdmsrl_on_cpu?
> No, on a write to the ipid file you should do that checking and write if
> the bank is populated or fail the write otherwise. And you should put
> all that code in inj_bank_set() - that's why I say "on a write to the
> ipid file".
>
> And instead of boot_cpu_has() you should use cpu_feature_enabled().
>
> Makes sense?

Yes, this makes sense to me now. But you meant to say inj_ipid_set()
instead of inj_bank_set()..?

Something like this:

-MCE_INJECT_SET(ipid)

+static int inj_ipid_set(void *data, u64 val)
+{
+	struct mce *m = (struct mce*)data;

+	if (cpu_feature_enabled(X86_FEATURE_SMCA)) {

+		rdmsrl_on_cpu(..
		..
		..
+	m->ipid = val;
+	..
+}

Thanks,


* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-27 21:56         ` Smita Koralahalli Channabasappa
@ 2021-09-27 22:05           ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-09-27 22:05 UTC (permalink / raw)
  To: Smita Koralahalli Channabasappa
  Cc: Smita Koralahalli, x86, linux-edac, linux-kernel, Tony Luck,
	H . Peter Anvin, yazen.ghannam

On Mon, Sep 27, 2021 at 04:56:17PM -0500, Smita Koralahalli Channabasappa wrote:
> Yes, this makes sense to me now. But you meant to say inj_ipid_set()
> instead of inj_bank_set()..?

Yeah, I had it correct before:

"This whole thing belongs into inj_ipid_set() where you should verify... "

> 
> Something like this:
> 
> -MCE_INJECT_SET(ipid)
> 
> +static int inj_ipid_set(void *data, u64 val)
> +{
> +	struct mce *m = (struct mce*)data;
> 
> +	if (cpu_feature_enabled(X86_FEATURE_SMCA)) {
> 
> +		rdmsrl_on_cpu(..
> 		..
> 		..
> +	m->ipid = val;
> +	..
> +}

Yes, and return proper error codes.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-09-24  8:26   ` Borislav Petkov
  2021-09-27 19:51     ` Smita Koralahalli Channabasappa
@ 2021-10-11 21:12     ` Koralahalli Channabasappa, Smita
  2021-10-14 18:22       ` Borislav Petkov
  1 sibling, 1 reply; 18+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2021-10-11 21:12 UTC (permalink / raw)
  To: Borislav Petkov, Smita Koralahalli
  Cc: x86, linux-edac, linux-kernel, Tony Luck, H . Peter Anvin, yazen.ghannam

Hi Boris,

Sorry for the delayed response.

When I was coding this up, I came across a few issues, which I describe below.

On 9/24/21 3:26 AM, Borislav Petkov wrote:

> On Wed, Sep 15, 2021 at 06:27:35PM -0500, Smita Koralahalli wrote:
>> The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
>> (SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
>> will read as zero and writes to it will be ignored. Check the value of
>> this register before trying to simulate the error.
>>
>> Signed-off-by: Smita Koralahalli<Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   arch/x86/kernel/cpu/mce/inject.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
>> index 0bfc14041bbb..51ac575c4605 100644
>> --- a/arch/x86/kernel/cpu/mce/inject.c
>> +++ b/arch/x86/kernel/cpu/mce/inject.c
>> @@ -577,6 +577,24 @@ static int inj_bank_set(void *data, u64 val)
>>   	}
>>   
>>   	m->bank = val;
>> +
>> +	/* Read IPID value to determine if a bank is unpopulated on the target
>> +	 * CPU.
>> +	 */
> Kernel comments style format is:
>
> 	/*
> 	 * A sentence ending with a full-stop.
> 	 * Another sentence. ...
> 	 * More sentences. ...
> 	 */
>
>> +	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> This whole thing belongs into inj_ipid_set() where you should verify
> whether the bank is set when you try to set the IPID for that bank.

I do not have the bank number in order to look up the IPID for that bank.
I couldn't know the bank number because mce-inject files are synchronized
in a way that once the bank number is written the injection starts.
Can you please suggest what needs to be done here?
  
Also, the IPID register is read only from the OS, hence the user provided
IPID values could be useful for "sw" error injection types. For "hw" error
injection types we need to read from the registers to determine the IPID
value.

Should there be two cases where on a "sw" injection use the user provided
IPID value whereas on "hw" injection read from registers?

Here is the code snippet, reworked according to your comments.

static int inj_ipid_set(void *data, u64 val)
{
	struct mce *m = (struct mce *)data;

	if (cpu_feature_enabled(X86_FEATURE_SMCA)) {
		if (val && inj_type == SW_INJ)
			m->ipid = val;
		else {
			/* Requires the bank number here. */
			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(?),
				      &m->ipid);
			if (!m->ipid) {
				pr_err("Cannot set IPID - unpopulated bank\n");
				return -ENODEV;
			}
		}
	}

	return 0;
}

Please let me know what you think.
Thanks,

>> +
>> +		/* Check for user provided IPID value. */
>> +		if (!m->ipid) {
>> +			rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val),
>> +				      &m->ipid);
> Oh well, one IPI per ipid write. We're doing injection so we can't be on
> a production machine so who cares about IPIs there.
>
>> +			if (!m->ipid) {
>> +				pr_err("Error simulation not possible: Bank %llu unpopulated\n",
> "Cannot set IPID for bank... - bank %d unpopulated\n"
>
> Also, in all your text, use "injection" instead of "simulation" so that
> there's no confusion.
>
> Thx.
>

* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-10-11 21:12     ` Koralahalli Channabasappa, Smita
@ 2021-10-14 18:22       ` Borislav Petkov
  2021-10-14 20:26         ` Koralahalli Channabasappa, Smita
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2021-10-14 18:22 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita
  Cc: Smita Koralahalli, x86, linux-edac, linux-kernel, Tony Luck,
	H . Peter Anvin, yazen.ghannam

On Mon, Oct 11, 2021 at 04:12:14PM -0500, Koralahalli Channabasappa, Smita wrote:
> I do not have the bank number in order to look up the IPID for that bank.
> I couldn't know the bank number because mce-inject files are synchronized
> in a way that once the bank number is written the injection starts.
> Can you please suggest what needs to be done here?
>
> Also, the IPID register is read only from the OS, hence the user provided
> IPID values could be useful for "sw" error injection types. For "hw" error
> injection types we need to read from the registers to determine the IPID
> value.
> 
> Should there be two cases where on a "sw" injection use the user provided
> IPID value whereas on "hw" injection read from registers?

Right, that's a good point. So the way I see it is, we need to decide
what is allowed for sw injection and what for hw injection, wrt the IPID
value.

I think for sw injection, we probably should say that since this is
sw only and its purpose is to test the code only, there should not be
any limitations imposed by the underlying machine. Like using the bank
number, for example.

So what you do now for sw injection:

		if (val && inj_type == SW_INJ)
			m->ipid = val;

should be good enough. User simply sets some IPID value and that value
will be used for the bank which is written when injecting.

Now, for hw injection, you have two cases:

1. The bank is unpopulated so setting the IPID there doesn't make any sense.

2. The bank *is* populated and the respective IPID MSR has a value
describing what that bank is.

And in that case, does it even make sense to set the IPID? I don't think
so because that IP block's type - aka IPID - has been set already by
hardware/firmware.

So the way I see it, it makes no sense whatsoever to set the IPID of a
bank during hw injection.

Right?
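
[Editor's illustration, not part of the original message.] The policy described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct mce_model`, `enum injection_type` and `model_ipid_set()` are hypothetical stand-ins rather than kernel code, and rejecting the write with -EINVAL during hw injection is an assumption, since the thread only says that setting the IPID makes no sense in that case.

```c
#include <errno.h>
#include <stdint.h>

/* Injection types, modelled on the names used in this thread. */
enum injection_type { SW_INJ, HW_INJ };

/* Hypothetical stand-in for the one struct mce field that matters here. */
struct mce_model {
	uint64_t ipid;
};

/*
 * Model of the agreed policy: a user-supplied IPID is only meaningful
 * for sw injection. Returning -EINVAL for hw injection is an assumption
 * made for this sketch; the thread does not specify the behaviour.
 */
static int model_ipid_set(struct mce_model *m, enum injection_type type,
			  uint64_t val)
{
	if (type != SW_INJ)
		return -EINVAL;

	m->ipid = val;
	return 0;
}
```

With this model, a sw injection stores whatever value the user wrote, while a hw injection leaves the stored IPID untouched.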

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-10-14 18:22       ` Borislav Petkov
@ 2021-10-14 20:26         ` Koralahalli Channabasappa, Smita
  2021-10-14 20:57           ` Borislav Petkov
  0 siblings, 1 reply; 18+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2021-10-14 20:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Smita Koralahalli, x86, linux-edac, linux-kernel, Tony Luck,
	H . Peter Anvin, yazen.ghannam

On 10/14/21 1:22 PM, Borislav Petkov wrote:

> On Mon, Oct 11, 2021 at 04:12:14PM -0500, Koralahalli Channabasappa, Smita wrote:
>> I do not have the bank number in order to look up the IPID for that bank.
>> I couldn't know the bank number because mce-inject files are synchronized
>> in a way that once the bank number is written the injection starts.
>> Can you please suggest what needs to be done here?
>>
>> Also, the IPID register is read only from the OS, hence the user provided
>> IPID values could be useful for "sw" error injection types. For "hw" error
>> injection types we need to read from the registers to determine the IPID
>> value.
>>
>> Should there be two cases where on a "sw" injection use the user provided
>> IPID value whereas on "hw" injection read from registers?
> Right, that's a good point. So the way I see it is, we need to decide
> what is allowed for sw injection and what for hw injection, wrt the IPID
> value.
>
> I think for sw injection, we probably should say that since this is
> sw only and its purpose is to test the code only, there should not be
> any limitations imposed by the underlying machine. Like using the bank
> number, for example.
>
> So what you do now for sw injection:
>
> 		if (val && inj_type == SW_INJ)
> 			m->ipid = val;
>
> should be good enough. User simply sets some IPID value and that value
> will be used for the bank which is written when injecting.
>
> Now, for hw injection, you have two cases:
>
> 1. The bank is unpopulated so setting the IPID there doesn't make any sense.
>
> 2. The bank *is* populated and the respective IPID MSR has a value
> describing what that bank is.
>
> And in that case, does it even make sense to set the IPID? I don't think
> so because that IP block's type - aka IPID - has been set already by
> hardware/firmware.
>
> So the way I see it, it makes no sense whatsoever to set the IPID of a
> bank during hw injection.
>
> Right?

Yes, I agree. inj_ipid_set() can be used to serve the purpose of setting
user provided IPID on a sw injection only.

My concern was, we need to determine whether the bank is unpopulated or
populated before trying to inject the errors on a hw injection, for which
we need to read the IPID MSR of that bank.

We cannot do that inside inj_ipid_set() as we do not know the bank number
until inj_bank_set() executes which is called after inj_ipid_set().
mce-inject files are synchronized in a way that once the bank number is
written in inj_bank_set(), injection starts.

So this snippet of code:

if (inj_type != SW_INJ) {
	rdmsrl_on_cpu(m->extcpu, MSR_AMD64_SMCA_MCx_IPID(val), &m->ipid);
	if (!m->ipid) {
		pr_err("Error injection not possible - bank %llu unpopulated\n",
		       val);
		return -ENODEV;
	}
}

should be retained inside inj_bank_set() ?

And inj_ipid_set() should just set m->ipid = val on a SW_INJ as you mentioned
above?
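
[Editor's illustration, not part of the original message.] The ordering problem and the proposed placement of the check can be modelled in userspace C. This is a rough sketch, assuming a fake table in place of the real MCA_IPID MSR read: `model_ipid_msr[]`, `model_bank_set()` and the register values in the table are all made up for the example. Only the rule "a zero IPID means an unpopulated bank, so a hw injection must fail with -ENODEV" comes from the thread.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_NR_BANKS 4

enum injection_type { SW_INJ, HW_INJ };

/* Hypothetical stand-in for the struct mce fields used here. */
struct mce_model {
	unsigned int bank;
	uint64_t ipid;
};

/* Fake MCA_IPID contents per bank (made-up values); zero means unpopulated. */
static const uint64_t model_ipid_msr[MODEL_NR_BANKS] = {
	0x000000b000000000ULL, 0, 0x0000009600050f00ULL, 0,
};

/*
 * Model of the bank setter: the bank number is only known at this point,
 * so the populated check for hw injection has to happen here.
 */
static int model_bank_set(struct mce_model *m, enum injection_type type,
			  uint64_t val)
{
	if (val >= MODEL_NR_BANKS)
		return -EINVAL;

	if (type != SW_INJ) {
		uint64_t ipid = model_ipid_msr[val];

		if (!ipid) {
			fprintf(stderr,
				"Cannot inject into bank %llu - unpopulated\n",
				(unsigned long long)val);
			return -ENODEV;
		}
		m->ipid = ipid;
	}

	m->bank = (unsigned int)val;
	return 0;
}
```

In this model, a sw injection accepts any in-range bank and keeps the IPID the user set earlier, while a hw injection into bank 1 or 3 fails with -ENODEV.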

Thanks,




* Re: [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation
  2021-10-14 20:26         ` Koralahalli Channabasappa, Smita
@ 2021-10-14 20:57           ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-10-14 20:57 UTC (permalink / raw)
  To: Koralahalli Channabasappa, Smita
  Cc: Smita Koralahalli, x86, linux-edac, linux-kernel, Tony Luck,
	H . Peter Anvin, yazen.ghannam

On Thu, Oct 14, 2021 at 03:26:13PM -0500, Koralahalli Channabasappa, Smita wrote:
> My concern was, we need to determine whether the bank is unpopulated or
> populated before trying to inject the errors on a hw injection, for which
> we need to read the IPID MSR of that bank.

Ah, that. Look at the smca_banks[] array in .../mce/amd.c and how
smca_configure() prepares all banks in there. You could use that array
to query which SMCA bank on which CPU is initialized, before injecting
into it.
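
[Editor's illustration, not part of the original message.] The idea of consulting a table filled in at bank-configuration time, instead of reading the MSR on every injection, might look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: `struct bank_model`, `model_banks[][]`, the bank names and `model_bank_is_configured()` are hypothetical and do not reflect the actual layout of smca_banks[] in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS  2
#define MODEL_NR_BANKS 4

/* Hypothetical per-bank record; a NULL name marks an unconfigured bank. */
struct bank_model {
	const char *name;
};

/* Made-up table standing in for whatever bank setup recorded per CPU. */
static const struct bank_model model_banks[MODEL_NR_CPUS][MODEL_NR_BANKS] = {
	{ { "ls" }, { NULL }, { "umc" }, { NULL } },
	{ { "ls" }, { "if" }, { NULL },  { NULL } },
};

/* Return true only if the given bank was configured on the given CPU. */
static bool model_bank_is_configured(unsigned int cpu, unsigned int bank)
{
	if (cpu >= MODEL_NR_CPUS || bank >= MODEL_NR_BANKS)
		return false;

	return model_banks[cpu][bank].name != NULL;
}
```

A bank setter could call such a helper before a hw injection and bail out early when it returns false.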

> should be retained inside inj_bank_set() ?

And yes, I guess you'll have to do it there because then you know which
bank and which CPU the hw injection is supposed to happen on.

> And inj_ipid_set() should just set m->ipid = val on a SW_INJ as you mentioned
> above?

Yap.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Thread overview: 18+ messages
2021-09-15 23:27 [PATCH 0/5] x86/mce: Handle error simulation failures in mce-inject module Smita Koralahalli
2021-09-15 23:27 ` [PATCH 1/5] x86/mce/inject: Check if a bank is unpopulated before error simulation Smita Koralahalli
2021-09-24  8:26   ` Borislav Petkov
2021-09-27 19:51     ` Smita Koralahalli Channabasappa
2021-09-27 20:15       ` Borislav Petkov
2021-09-27 21:56         ` Smita Koralahalli Channabasappa
2021-09-27 22:05           ` Borislav Petkov
2021-10-11 21:12     ` Koralahalli Channabasappa, Smita
2021-10-14 18:22       ` Borislav Petkov
2021-10-14 20:26         ` Koralahalli Channabasappa, Smita
2021-10-14 20:57           ` Borislav Petkov
2021-09-15 23:27 ` [PATCH 2/5] x86/mce/inject: Set the valid bit in MCA_STATUS before error injection Smita Koralahalli
2021-09-24  8:26   ` Borislav Petkov
2021-09-15 23:27 ` [PATCH 3/5] x86/mce: Use msr_ops in prepare_msrs() Smita Koralahalli
2021-09-24  8:26   ` Borislav Petkov
2021-09-15 23:27 ` [PATCH 4/5] x86/mce/inject: Check for writes ignored in status registers Smita Koralahalli
2021-09-15 23:27 ` [PATCH 5/5] x86/mce/mce-inject: Return error code to userspace from mce-inject module Smita Koralahalli
2021-09-24  8:26   ` Borislav Petkov
