linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors
@ 2019-03-21 20:25 Ghannam, Yazen
  2019-03-21 20:25 ` [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models Ghannam, Yazen
  2019-03-22 17:24 ` [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Borislav Petkov
  0 siblings, 2 replies; 8+ messages in thread
From: Ghannam, Yazen @ 2019-03-21 20:25 UTC (permalink / raw)
  To: linux-edac
  Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86, rafal, clemej

From: Yazen Ghannam <yazen.ghannam@amd.com>

Some systems may report spurious MCA errors. In general, spurious MCA
errors may be disabled by clearing a particular bit in MCA_CTL. However,
clearing a bit in MCA_CTL may not be recommended for some errors, so the
only option is to ignore them.

An MCA error is printed and handled after it has been added to the MCE
event pool. So an MCA error can be ignored by not adding it to the pool.

Create a function pointer to filter MCA errors and use this when adding
an error to the MCE event pool.

Install a default function that does not filter any errors.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190307212552.8865-1-Yazen.Ghannam@amd.com

v1->v2:
* This is a new patch replacing V1 Patch 1 which is no longer needed.

 arch/x86/include/asm/mce.h        | 3 +++
 arch/x86/kernel/cpu/mce/core.c    | 6 ++++++
 arch/x86/kernel/cpu/mce/genpool.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 22d05e3835f0..0b0b797a959c 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -253,6 +253,9 @@ extern void mce_disable_bank(int bank);
 extern void (*machine_check_vector)(struct pt_regs *, long error_code);
 void do_machine_check(struct pt_regs *, long);
 
+/* Filter MCEs from the decoder chain. */
+extern bool (*filter_mce)(struct mce *m);
+
 /*
  * Threshold handler
  */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b7fb541a4873..effb40581f08 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1771,6 +1771,12 @@ static void __mcheck_cpu_init_timer(void)
 	mce_start_timer(t);
 }
 
+/*
+ * Don't filter MCEs by default. Install a system-specific function, if needed.
+ */
+static bool default_filter_mce(struct mce *m) { return false; }
+bool (*filter_mce)(struct mce *) = default_filter_mce;
+
 /* Handle unconfigured int18 (should never happen) */
 static void unexpected_machine_check(struct pt_regs *regs, long error_code)
 {
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 3395549c51d3..64d1d5a00f39 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -99,6 +99,9 @@ int mce_gen_pool_add(struct mce *mce)
 {
 	struct mce_evt_llist *node;
 
+	if (filter_mce(mce))
+		return -EINVAL;
+
 	if (!mce_evt_pool)
 		return -EINVAL;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
  2019-03-21 20:25 [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Ghannam, Yazen
@ 2019-03-21 20:25 ` Ghannam, Yazen
  2019-03-22 17:34   ` Borislav Petkov
  2019-03-22 17:24 ` [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Borislav Petkov
  1 sibling, 1 reply; 8+ messages in thread
From: Ghannam, Yazen @ 2019-03-21 20:25 UTC (permalink / raw)
  To: linux-edac
  Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86, rafal, clemej

From: Yazen Ghannam <yazen.ghannam@amd.com>

AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
errors under certain conditions. The errors are benign and can safely be
ignored. However, the high error rate may cause the MCA threshold
counter to overflow causing a high rate of thresholding interrupts. In
addition, users may see the errors reported through the AMD MCE decoder
module, even with the interrupt disabled, due to MCA polling.

This error is reported through the Instruction Fetch bank.

Clear the "Counter Present" bit in the Instruction Fetch bank's
MCA_MISC0 register. This will prevent enabling MCA thresholding on this
bank which will prevent the high interrupt rate due to this error.

Define a function to filter these errors from the MCE event pool.
Install this function during AMD vendor init. The MCA banks are enabled
after vendor init, so the filter function will be installed before the
spurious errors will be reported.

Cc: <stable@vger.kernel.org> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
Cc: <stable@vger.kernel.org> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190307212552.8865-2-Yazen.Ghannam@amd.com

v1->v2:
* Filter out the error earlier in MCE code rather than later in EDAC.

 arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index e64de5149e50..2db85f65b41e 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 	return offset;
 }
 
+bool filter_mce_rv(struct mce *m)
+{
+	enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
+	u8 xec = (m->status >> 16) & 0x3F;
+
+	/*
+	 * Spurious errors of this type may be reported.
+	 * See Family 17h Models 10h-2Fh Erratum #1114.
+	 */
+	if (bank_type == SMCA_IF && xec == 10)
+		return true;
+
+	return false;
+}
+
+static void filter_mce_check(struct cpuinfo_x86 *c)
+{
+	if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
+		filter_mce = filter_mce_rv;
+}
+
 /*
- * Turn off MC4_MISC thresholding banks on all family 0x15 models since
- * they're not supported there.
+ * Turn off thresholding banks for the following conditions:
+ * - MC4_MISC thresholding is not support on Family 0x15.
+ * - Prevent possible spurious interrupts from the IF bank on Family 0x17
+ *   Models 0x10-0x2F due to Erratum #1114.
  */
-void disable_err_thresholding(struct cpuinfo_x86 *c)
+void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank)
 {
-	int i;
+	int i, num_msrs;
 	u64 hwcr;
 	bool need_toggle;
-	u32 msrs[] = {
-		0x00000413, /* MC4_MISC0 */
-		0xc0000408, /* MC4_MISC1 */
-	};
+	u32 msrs[NR_BLOCKS];
 
-	if (c->x86 != 0x15)
+	if (c->x86 == 0x15 && bank == 4) {
+		msrs[0] = 0x00000413; /* MC4_MISC0 */
+		msrs[1] = 0xc0000408; /* MC4_MISC1 */
+		num_msrs = 2;
+	} else if (c->x86 == 0x17 &&
+		   (c->x86_model >= 0x10 && c->x86_model <= 0x2F)) {
+
+		if (smca_get_bank_type(bank) != SMCA_IF)
+			return;
+
+		msrs[0] = MSR_AMD64_SMCA_MCx_MISC(bank);
+		num_msrs = 1;
+	} else {
 		return;
+	}
 
 	rdmsrl(MSR_K7_HWCR, hwcr);
 
@@ -589,7 +622,7 @@ void disable_err_thresholding(struct cpuinfo_x86 *c)
 		wrmsrl(MSR_K7_HWCR, hwcr | BIT(18));
 
 	/* Clear CntP bit safely */
-	for (i = 0; i < ARRAY_SIZE(msrs); i++)
+	for (i = 0; i < num_msrs; i++)
 		msr_clear_bit(msrs[i], 62);
 
 	/* restore old settings */
@@ -604,12 +637,14 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	unsigned int bank, block, cpu = smp_processor_id();
 	int offset = -1;
 
-	disable_err_thresholding(c);
+	filter_mce_check(c);
 
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		if (mce_flags.smca)
 			smca_configure(bank, cpu);
 
+		disable_err_thresholding(c, bank);
+
 		for (block = 0; block < NR_BLOCKS; ++block) {
 			address = get_block_address(address, low, high, bank, block);
 			if (!address)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors
  2019-03-21 20:25 [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Ghannam, Yazen
  2019-03-21 20:25 ` [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models Ghannam, Yazen
@ 2019-03-22 17:24 ` Borislav Petkov
  2019-03-22 19:05   ` Ghannam, Yazen
  1 sibling, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2019-03-22 17:24 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86, rafal, clemej

On Thu, Mar 21, 2019 at 08:25:17PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Some systems may report spurious MCA errors. In general, spurious MCA
> errors may be disabled by clearing a particular bit in MCA_CTL. However,
> clearing a bit in MCA_CTL may not be recommended for some errors, so the
> only option is to ignore them.
> 
> An MCA error is printed and handled after it has been added to the MCE
> event pool. So an MCA error can be ignored by not adding it to the pool.
> 
> Create a function pointer to filter MCA errors and use this when adding
> an error to the MCE event pool.
> 
> Install a default function that does not filter any errors.
> 
> Cc: <stable@vger.kernel.org> # 4.14.x
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20190307212552.8865-1-Yazen.Ghannam@amd.com
> 
> v1->v2:
> * This is a new patch replacing V1 Patch 1 which is no longer needed.
> 
>  arch/x86/include/asm/mce.h        | 3 +++
>  arch/x86/kernel/cpu/mce/core.c    | 6 ++++++
>  arch/x86/kernel/cpu/mce/genpool.c | 3 +++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 22d05e3835f0..0b0b797a959c 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -253,6 +253,9 @@ extern void mce_disable_bank(int bank);
>  extern void (*machine_check_vector)(struct pt_regs *, long error_code);
>  void do_machine_check(struct pt_regs *, long);
>  
> +/* Filter MCEs from the decoder chain. */

That should be something like:

/* Decides whether to add MCE records to the decoder chain or filter them out. */

> +extern bool (*filter_mce)(struct mce *m);
> +
>  /*
>   * Threshold handler
>   */
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index b7fb541a4873..effb40581f08 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1771,6 +1771,12 @@ static void __mcheck_cpu_init_timer(void)
>  	mce_start_timer(t);
>  }
>  
> +/*
> + * Don't filter MCEs by default. Install a system-specific function, if needed.
> + */

That comment is kinda obvious.

> +static bool default_filter_mce(struct mce *m) { return false; }
> +bool (*filter_mce)(struct mce *) = default_filter_mce;
> +
>  /* Handle unconfigured int18 (should never happen) */
>  static void unexpected_machine_check(struct pt_regs *regs, long error_code)
>  {
> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> index 3395549c51d3..64d1d5a00f39 100644
> --- a/arch/x86/kernel/cpu/mce/genpool.c
> +++ b/arch/x86/kernel/cpu/mce/genpool.c
> @@ -99,6 +99,9 @@ int mce_gen_pool_add(struct mce *mce)
>  {
>  	struct mce_evt_llist *node;
>  
> +	if (filter_mce(mce))
> +		return -EINVAL;
> +
>  	if (!mce_evt_pool)
>  		return -EINVAL;
>  
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
  2019-03-21 20:25 ` [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models Ghannam, Yazen
@ 2019-03-22 17:34   ` Borislav Petkov
  2019-03-22 19:24     ` Ghannam, Yazen
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2019-03-22 17:34 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86, rafal, clemej

On Thu, Mar 21, 2019 at 08:25:18PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
> errors under certain conditions. The errors are benign and can safely be
> ignored. However, the high error rate may cause the MCA threshold
> counter to overflow causing a high rate of thresholding interrupts. In
> addition, users may see the errors reported through the AMD MCE decoder
> module, even with the interrupt disabled, due to MCA polling.
> 
> This error is reported through the Instruction Fetch bank.
> 
> Clear the "Counter Present" bit in the Instruction Fetch bank's
> MCA_MISC0 register. This will prevent enabling MCA thresholding on this
> bank which will prevent the high interrupt rate due to this error.
> 
> Define a function to filter these errors from the MCE event pool.
> Install this function during AMD vendor init. The MCA banks are enabled
> after vendor init, so the filter function will be installed before the
> spurious errors will be reported.
> 
> Cc: <stable@vger.kernel.org> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
> Cc: <stable@vger.kernel.org> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
> Cc: <stable@vger.kernel.org> # 4.14.x
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20190307212552.8865-2-Yazen.Ghannam@amd.com
> 
> v1->v2:
> * Filter out the error earlier in MCE code rather than later in EDAC.
> 
>  arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
>  1 file changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e64de5149e50..2db85f65b41e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
>  	return offset;
>  }
>  
> +bool filter_mce_rv(struct mce *m)
> +{
> +	enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
> +	u8 xec = (m->status >> 16) & 0x3F;
> +
> +	/*
> +	 * Spurious errors of this type may be reported.
> +	 * See Family 17h Models 10h-2Fh Erratum #1114.
> +	 */
> +	if (bank_type == SMCA_IF && xec == 10)
> +		return true;
> +
> +	return false;
> +}
> +
> +static void filter_mce_check(struct cpuinfo_x86 *c)
> +{
> +	if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
> +		filter_mce = filter_mce_rv;
> +}

Why all the noodling here with a check function which assigns a
filter_mce_rv (btw, that "rv" means nothing outside of AMD) and a
generic default_filter_mce?

Why not a simple filter_mce() in mce/core.c which calls amd_filter_mce()
based on x86_vendor and amd_filter_mce() is defined in mce/amd.c?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors
  2019-03-22 17:24 ` [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Borislav Petkov
@ 2019-03-22 19:05   ` Ghannam, Yazen
  0 siblings, 0 replies; 8+ messages in thread
From: Ghannam, Yazen @ 2019-03-22 19:05 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86, rafal, clemej

On 3/22/2019 12:24 PM, Borislav Petkov wrote:
> On Thu, Mar 21, 2019 at 08:25:17PM +0000, Ghannam, Yazen wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> Some systems may report spurious MCA errors. In general, spurious MCA
>> errors may be disabled by clearing a particular bit in MCA_CTL. However,
>> clearing a bit in MCA_CTL may not be recommended for some errors, so the
>> only option is to ignore them.
>>
>> An MCA error is printed and handled after it has been added to the MCE
>> event pool. So an MCA error can be ignored by not adding it to the pool.
>>
>> Create a function pointer to filter MCA errors and use this when adding
>> an error to the MCE event pool.
>>
>> Install a default function that does not filter any errors.
>>
>> Cc: <stable@vger.kernel.org> # 4.14.x
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> ---
>> Link:
>> https://lkml.kernel.org/r/20190307212552.8865-1-Yazen.Ghannam@amd.com
>>
>> v1->v2:
>> * This is a new patch replacing V1 Patch 1 which is no longer needed.
>>
>>  arch/x86/include/asm/mce.h        | 3 +++
>>  arch/x86/kernel/cpu/mce/core.c    | 6 ++++++
>>  arch/x86/kernel/cpu/mce/genpool.c | 3 +++
>>  3 files changed, 12 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index 22d05e3835f0..0b0b797a959c 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -253,6 +253,9 @@ extern void mce_disable_bank(int bank);
>>  extern void (*machine_check_vector)(struct pt_regs *, long error_code);
>>  void do_machine_check(struct pt_regs *, long);
>>  
>> +/* Filter MCEs from the decoder chain. */
> 
> That should be something like:
> 
> /* Decides whether to add MCE records to the decoder chain or filter them out. */
> 

Okay. I'll make a change.

>> +extern bool (*filter_mce)(struct mce *m);
>> +
>>  /*
>>   * Threshold handler
>>   */
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index b7fb541a4873..effb40581f08 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1771,6 +1771,12 @@ static void __mcheck_cpu_init_timer(void)
>>  	mce_start_timer(t);
>>  }
>>  
>> +/*
>> + * Don't filter MCEs by default. Install a system-specific function, if needed.
>> + */
> 
> That comment is kinda obvious.
> 

Okay. I'll drop it.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
  2019-03-22 17:34   ` Borislav Petkov
@ 2019-03-22 19:24     ` Ghannam, Yazen
  2019-03-22 19:32       ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Ghannam, Yazen @ 2019-03-22 19:24 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86, rafal, clemej

On 3/22/2019 12:34 PM, Borislav Petkov wrote:
> On Thu, Mar 21, 2019 at 08:25:18PM +0000, Ghannam, Yazen wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
>> errors under certain conditions. The errors are benign and can safely be
>> ignored. However, the high error rate may cause the MCA threshold
>> counter to overflow causing a high rate of thresholding interrupts. In
>> addition, users may see the errors reported through the AMD MCE decoder
>> module, even with the interrupt disabled, due to MCA polling.
>>
>> This error is reported through the Instruction Fetch bank.
>>
>> Clear the "Counter Present" bit in the Instruction Fetch bank's
>> MCA_MISC0 register. This will prevent enabling MCA thresholding on this
>> bank which will prevent the high interrupt rate due to this error.
>>
>> Define a function to filter these errors from the MCE event pool.
>> Install this function during AMD vendor init. The MCA banks are enabled
>> after vendor init, so the filter function will be installed before the
>> spurious errors will be reported.
>>
>> Cc: <stable@vger.kernel.org> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
>> Cc: <stable@vger.kernel.org> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
>> Cc: <stable@vger.kernel.org> # 4.14.x
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> ---
>> Link:
>> https://lkml.kernel.org/r/20190307212552.8865-2-Yazen.Ghannam@amd.com
>>
>> v1->v2:
>> * Filter out the error earlier in MCE code rather than later in EDAC.
>>
>>  arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
>>  1 file changed, 46 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index e64de5149e50..2db85f65b41e 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
>>  	return offset;
>>  }
>>  
>> +bool filter_mce_rv(struct mce *m)
>> +{
>> +	enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
>> +	u8 xec = (m->status >> 16) & 0x3F;
>> +
>> +	/*
>> +	 * Spurious errors of this type may be reported.
>> +	 * See Family 17h Models 10h-2Fh Erratum #1114.
>> +	 */
>> +	if (bank_type == SMCA_IF && xec == 10)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static void filter_mce_check(struct cpuinfo_x86 *c)
>> +{
>> +	if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
>> +		filter_mce = filter_mce_rv;
>> +}
> 
> Why all the noodling here with a check function which assigns a
> filter_mce_rv (btw, that "rv" means nothing outside of AMD) and a
> generic default_filter_mce?
> 

Okay. It could be named something else. This model group is Raven Ridge, so I thought "rv" would be a good. Maybe spelling it out would be better? Or should it just be something family/model?

> Why not a simple filter_mce() in mce/core.c which calls amd_filter_mce()
> based on x86_vendor and amd_filter_mce() is defined in mce/amd.c?
> 

Generally, the model groups share the same hardware design and so the same quirks. So I'm thinking that it'd be more efficient to have a filter function that targets a specific group of models rather than one that checks all known quirks on all models.

Most of the quirks are dealt with at init time, but this needs be to done during run time for each MCE that is logged. So I didn't want to add unnecessary checks to the MCE handlers. We have quirk_no_way_out() that does something similar.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
  2019-03-22 19:24     ` Ghannam, Yazen
@ 2019-03-22 19:32       ` Borislav Petkov
  2019-03-22 19:33         ` Ghannam, Yazen
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2019-03-22 19:32 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86, rafal, clemej

On Fri, Mar 22, 2019 at 07:24:01PM +0000, Ghannam, Yazen wrote:
> Generally, the model groups share the same hardware design and so the
> same quirks. So I'm thinking that it'd be more efficient to have a
> filter function that targets a specific group of models rather than
> one that checks all known quirks on all models.

Or simply start with a amd_filter_mce() function and when it grows big
and unwieldy, only *then* start thinking about splitting it into models
and families. For now, you're fine with a single AMD-specific function.

> Most of the quirks are dealt with at init time, but this needs be to
> done during run time for each MCE that is logged. So I didn't want to
> add unnecessary checks to the MCE handlers. We have quirk_no_way_out()
> that does something similar.

I don't think a couple of instructions checking vendor and family would
be at all noticeable so let's start simple.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
  2019-03-22 19:32       ` Borislav Petkov
@ 2019-03-22 19:33         ` Ghannam, Yazen
  0 siblings, 0 replies; 8+ messages in thread
From: Ghannam, Yazen @ 2019-03-22 19:33 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86, rafal, clemej

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Friday, March 22, 2019 2:32 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; x86@kernel.org; rafal@milecki.pl;
> clemej@gmail.com
> Subject: Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models
> 
> On Fri, Mar 22, 2019 at 07:24:01PM +0000, Ghannam, Yazen wrote:
> > Generally, the model groups share the same hardware design and so the
> > same quirks. So I'm thinking that it'd be more efficient to have a
> > filter function that targets a specific group of models rather than
> > one that checks all known quirks on all models.
> 
> Or simply start with a amd_filter_mce() function and when it grows big
> and unwieldy, only *then* start thinking about splitting it into models
> and families. For now, you're fine with a single AMD-specific function.
> 

Understood.

> > Most of the quirks are dealt with at init time, but this needs be to
> > done during run time for each MCE that is logged. So I didn't want to
> > add unnecessary checks to the MCE handlers. We have quirk_no_way_out()
> > that does something similar.
> 
> I don't think a couple of instructions checking vendor and family would
> be at all noticeable so let's start simple.
> 

Okay, will do.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-03-22 19:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-21 20:25 [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Ghannam, Yazen
2019-03-21 20:25 ` [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on some Family 17h models Ghannam, Yazen
2019-03-22 17:34   ` Borislav Petkov
2019-03-22 19:24     ` Ghannam, Yazen
2019-03-22 19:32       ` Borislav Petkov
2019-03-22 19:33         ` Ghannam, Yazen
2019-03-22 17:24 ` [PATCH v2 1/2] x86/MCE: Add function to allow filtering of MCA errors Borislav Petkov
2019-03-22 19:05   ` Ghannam, Yazen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).