* x86/MCE/AMD: Fix the thresholding machinery initialization order
@ 2018-11-28  9:09 Borislav Petkov
From: Borislav Petkov @ 2018-11-28  9:09 UTC
  To: John Clemens
  Cc: Rafał Miłecki, Ghannam, Yazen, Tony Luck, linux-edac,
	Aravind Gopalakrishnan, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, Rafał Miłecki

On Tue, Nov 27, 2018 at 10:06:55PM -0500, John Clemens wrote:
> Not that anyone needs further verification at this point, but tested working
> on top of 4.19.5, on the HP EliteBook 745 G5, BIOS Q81 v01.03.01 (same as
> bugzilla report).
> 
> Tested-by: John Clemens <john@deater.net>

Thanks for testing!


* x86/MCE/AMD: Fix the thresholding machinery initialization order
@ 2018-11-28  3:06 John Clemens
From: John Clemens @ 2018-11-28  3:06 UTC
  To: Rafał Miłecki, Borislav Petkov, Ghannam, Yazen
  Cc: Tony Luck, linux-edac, Aravind Gopalakrishnan, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, x86, Rafał Miłecki

On 11/27/18 3:38 PM, Rafał Miłecki wrote:
> On 27.11.2018 20:25, Borislav Petkov wrote:
>> Currently, the code sets up the thresholding interrupt vector and only
>> then goes about initializing the thresholding banks. This is wrong,
>> because an early thresholding interrupt would cause a NULL pointer
>> dereference when accessing those banks and prevent the machine from
>> booting.
>>
>> Therefore, set the thresholding interrupt vector only *after* having
>> initialized the banks successfully.
>>
>> Fixes: 18807ddb7f88 ("x86/mce/AMD: Reset Threshold Limit after logging error")
>> Reported-by: Rafał Miłecki <rafal@milecki.pl>
>> Reported-by: John Clemens <clemej@gmail.com>
>> Signed-off-by: Borislav Petkov <bp@suse.de>
>> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
>> Cc: linux-edac@vger.kernel.org
>> Cc: stable@vger.kernel.org
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: x86@kernel.org
>> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
>> Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291
> 
> Tested on top of 4.14.83 and 4.20-rc4. It fixed both kernels for me!
> 
> Tested-by: Rafał Miłecki <rafal@milecki.pl>

Not that anyone needs further verification at this point, but tested 
working on top of 4.19.5, on the HP EliteBook 745 G5, BIOS Q81 v01.03.01 
(same as bugzilla report).

Tested-by: John Clemens <john@deater.net>

> John: thanks a lot for bisecting this issue down and finding a
> workaround (mce=off). It allowed me to boot recent Linux & debug it
> further.

Thank you for taking it the rest of the way; life got in the way after
the bisect.  I'm glad you were able to pick it up.

> Borislav: thank you for looking at my patch and coming up with a nicer
> fix SO quickly!

My thanks as well, to all.

john.c


* x86/MCE/AMD: Fix the thresholding machinery initialization order
@ 2018-11-27 20:38 Rafał Miłecki
From: Rafał Miłecki @ 2018-11-27 20:38 UTC
  To: Borislav Petkov, Ghannam, Yazen, John Clemens
  Cc: Tony Luck, linux-edac, Aravind Gopalakrishnan, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, x86, Rafał Miłecki

On 27.11.2018 20:25, Borislav Petkov wrote:
> Currently, the code sets up the thresholding interrupt vector and only
> then goes about initializing the thresholding banks. This is wrong,
> because an early thresholding interrupt would cause a NULL pointer
> dereference when accessing those banks and prevent the machine from
> booting.
> 
> Therefore, set the thresholding interrupt vector only *after* having
> initialized the banks successfully.
> 
> Fixes: 18807ddb7f88 ("x86/mce/AMD: Reset Threshold Limit after logging error")
> Reported-by: Rafał Miłecki <rafal@milecki.pl>
> Reported-by: John Clemens <clemej@gmail.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
> Cc: linux-edac@vger.kernel.org
> Cc: stable@vger.kernel.org
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: x86@kernel.org
> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
> Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291

Tested on top of 4.14.83 and 4.20-rc4. It fixed both kernels for me!

Tested-by: Rafał Miłecki <rafal@milecki.pl>


John: thanks a lot for bisecting this issue down and finding a
workaround (mce=off). It allowed me to boot recent Linux & debug it
further.

Borislav: thank you for looking at my patch and coming up with a nicer
fix SO quickly!


* x86/MCE/AMD: Fix the thresholding machinery initialization order
@ 2018-11-27 19:25 Borislav Petkov
From: Borislav Petkov @ 2018-11-27 19:25 UTC
  To: Ghannam, Yazen, Rafał Miłecki, John Clemens
  Cc: Tony Luck, linux-edac, Aravind Gopalakrishnan, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, x86, Rafał Miłecki

Currently, the code sets up the thresholding interrupt vector and only
then goes about initializing the thresholding banks. This is wrong,
because an early thresholding interrupt would cause a NULL pointer
dereference when accessing those banks and prevent the machine from
booting.

Therefore, set the thresholding interrupt vector only *after* having
initialized the banks successfully.

Fixes: 18807ddb7f88 ("x86/mce/AMD: Reset Threshold Limit after logging error")
Reported-by: Rafał Miłecki <rafal@milecki.pl>
Reported-by: John Clemens <clemej@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index dd33c357548f..e12454e21b8a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -56,7 +56,7 @@
 /* Threshold LVT offset is at MSR0xC0000410[15:12] */
 #define SMCA_THR_LVT_OFF	0xF000
 
-static bool thresholding_en;
+static bool thresholding_irq_en;
 
 static const char * const th_names[] = {
 	"load_store",
@@ -534,9 +534,8 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 
 set_offset:
 	offset = setup_APIC_mce_threshold(offset, new);
-
-	if ((offset == new) && (mce_threshold_vector != amd_threshold_interrupt))
-		mce_threshold_vector = amd_threshold_interrupt;
+	if (offset == new)
+		thresholding_irq_en = true;
 
 done:
 	mce_threshold_block_init(&b, offset);
@@ -1357,9 +1356,6 @@ int mce_threshold_remove_device(unsigned int cpu)
 {
 	unsigned int bank;
 
-	if (!thresholding_en)
-		return 0;
-
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
@@ -1377,9 +1373,6 @@ int mce_threshold_create_device(unsigned int cpu)
 	struct threshold_bank **bp;
 	int err = 0;
 
-	if (!thresholding_en)
-		return 0;
-
 	bp = per_cpu(threshold_banks, cpu);
 	if (bp)
 		return 0;
@@ -1408,9 +1401,6 @@ static __init int threshold_init_device(void)
 {
 	unsigned lcpu = 0;
 
-	if (mce_threshold_vector == amd_threshold_interrupt)
-		thresholding_en = true;
-
 	/* to hit CPUs online before the notifier is up */
 	for_each_online_cpu(lcpu) {
 		int err = mce_threshold_create_device(lcpu);
@@ -1419,6 +1409,9 @@ static __init int threshold_init_device(void)
 			return err;
 	}
 
+	if (thresholding_irq_en)
+		mce_threshold_vector = amd_threshold_interrupt;
+
 	return 0;
 }
 /*

