linux-edac.vger.kernel.org archive mirror
* [PATCH v6 0/4] x86/mce: Support extended MCA_ADDR address on SMCA systems
@ 2022-12-06 17:36 Yazen Ghannam
  2022-12-06 17:36 ` [PATCH v6 1/4] x86/mce: Cleanup bank processing on init Yazen Ghannam
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Yazen Ghannam @ 2022-12-06 17:36 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Smita.KoralahalliChannabasappa,
	Yazen Ghannam

Hi all,

This patch series adds support for extended physical addresses on newer
AMD CPUs.

Patch 1 simplifies part of the MCA init path by reordering and combining
some helper functions. This was shared by Boris during the discussion of v5
of this set.

Patch 2 removes another MCA init helper function and merges it with the AMD
MCA init code. Also, some other MCA init functions are reordered so that
MCA is enabled after vendor init. I shared this during the discussion of
v5 of this set.

Patch 3 moves the SMCA-specific parsing of the MCA_ADDR register into a
separate helper function. This was originally submitted by Smita and
reworked by Boris. The current version is unmodified from her v5
submission.

Patch 4 adds support for the new location of the "LSB" field used when
parsing MCA_ADDR. This was originally submitted by Smita, and I've rebased
it on the first three patches.

Thanks,
Yazen 

Link:
https://lore.kernel.org/r/20220412154038.261750-1-Smita.KoralahalliChannabasappa@amd.com

Borislav Petkov (1):
  x86/mce: Cleanup bank processing on init

Smita Koralahalli (2):
  x86/mce: Define function to extract ErrorAddr from MCA_ADDR
  x86/mce: Add support for Extended Physical Address MCA changes

Yazen Ghannam (1):
  x86/mce: Remove __mcheck_cpu_init_early()

 arch/x86/include/asm/mce.h         |   3 +-
 arch/x86/kernel/cpu/mce/amd.c      |  16 ++---
 arch/x86/kernel/cpu/mce/core.c     | 100 +++++++----------------------
 arch/x86/kernel/cpu/mce/internal.h |  44 +++++++++++++
 4 files changed, 74 insertions(+), 89 deletions(-)

-- 
2.34.1



* [PATCH v6 1/4] x86/mce: Cleanup bank processing on init
  2022-12-06 17:36 [PATCH v6 0/4] x86/mce: Support extended MCA_ADDR address on SMCA systems Yazen Ghannam
@ 2022-12-06 17:36 ` Yazen Ghannam
  2022-12-23 13:23   ` Borislav Petkov
  2022-12-06 17:36 ` [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Yazen Ghannam @ 2022-12-06 17:36 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Smita.KoralahalliChannabasappa,
	Borislav Petkov, Yazen Ghannam

From: Borislav Petkov <bp@suse.de>

Unify the bank preparation into __mcheck_cpu_init_clear_banks() and rename
that function to describe what it now does: prepare banks. Do this so that
generic and vendor bank init run first, letting settings made during that
init take effect before the first bank polling takes place.

Move __mcheck_cpu_check_banks() into __mcheck_cpu_init_prepare_banks()
as it already loops over the banks.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Link:
https://lore.kernel.org/r/Ylb3/4oi6KAjdsJW@zn.tnic

v6:
	New. Added Yazen's Reviewed-by.
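
For illustration only, the merged per-bank loop can be modelled in
userspace as below. This is a minimal sketch under stated assumptions, not
kernel code: NR_BANKS, the hw_ctl/hw_status/hw_raz arrays and the helper
names are hypothetical stand-ins for the MCA_CTL/MCA_STATUS MSR accesses.
It shows why the read-back check can share the loop: a bank whose control
register reads as zero after being written is marked unused.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS 4	/* hypothetical bank count for this model */

struct bank {
	uint64_t ctl;	/* subevents to enable */
	bool init;	/* initialise bank? */
};

/* Fake hardware state; bank 2 is modelled as read-as-zero (RAZ). */
static uint64_t hw_ctl[NR_BANKS];
static uint64_t hw_status[NR_BANKS];
static const bool hw_raz[NR_BANKS] = { false, false, true, false };

/* Writes to a RAZ bank are silently dropped. */
static void write_ctl(int i, uint64_t val)
{
	if (!hw_raz[i])
		hw_ctl[i] = val;
}

static uint64_t read_ctl(int i)
{
	return hw_ctl[i];
}

/* One pass: program the bank, clear its status, then check the read-back. */
static void prepare_banks(struct bank *banks)
{
	for (int i = 0; i < NR_BANKS; i++) {
		struct bank *b = &banks[i];

		if (!b->init)
			continue;

		write_ctl(i, b->ctl);
		hw_status[i] = 0;

		/* A zero read-back means the bank is unused/RAZ. */
		b->init = !!read_ctl(i);
	}
}
```

In this model, a bank that starts with init cleared is skipped entirely,
and a RAZ bank ends up with init cleared after the pass.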

 arch/x86/include/asm/mce.h     |  3 +-
 arch/x86/kernel/cpu/mce/core.c | 64 ++++++++++------------------------
 2 files changed, 19 insertions(+), 48 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6e986088817d..0dd7752345ec 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -253,8 +253,7 @@ DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
 enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
-	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
-	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
+	MCP_QUEUE_LOG	= BIT(2),	/* only queue to genpool */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..5f406d135d32 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -738,9 +738,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 log_it:
 		error_seen = true;
 
-		if (flags & MCP_DONTLOG)
-			goto clear_it;
-
 		mce_read_aux(&m, i);
 		m.severity = mce_severity(&m, NULL, NULL, false);
 		/*
@@ -1707,7 +1704,7 @@ static void __mcheck_cpu_mce_banks_init(void)
 		/*
 		 * Init them all, __mcheck_cpu_apply_quirks() is going to apply
 		 * the required vendor quirks before
-		 * __mcheck_cpu_init_clear_banks() does the final bank setup.
+		 * __mcheck_cpu_init_prepare_banks() does the final bank setup.
 		 */
 		b->ctl = -1ULL;
 		b->init = true;
@@ -1746,21 +1743,8 @@ static void __mcheck_cpu_cap_init(void)
 
 static void __mcheck_cpu_init_generic(void)
 {
-	enum mcp_flags m_fl = 0;
-	mce_banks_t all_banks;
 	u64 cap;
 
-	if (!mca_cfg.bootlog)
-		m_fl = MCP_DONTLOG;
-
-	/*
-	 * Log the machine checks left over from the previous reset. Log them
-	 * only, do not start processing them. That will happen in mcheck_late_init()
-	 * when all consumers have been registered on the notifier chain.
-	 */
-	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks);
-
 	cr4_set_bits(X86_CR4_MCE);
 
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
@@ -1768,36 +1752,22 @@ static void __mcheck_cpu_init_generic(void)
 		wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
 }
 
-static void __mcheck_cpu_init_clear_banks(void)
+static void __mcheck_cpu_init_prepare_banks(void)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+	mce_banks_t all_banks;
+	u64 msrval;
 	int i;
 
-	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
-		struct mce_bank *b = &mce_banks[i];
-
-		if (!b->init)
-			continue;
-		wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl);
-		wrmsrl(mca_msr_reg(i, MCA_STATUS), 0);
+	/*
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
+	 */
+	if (mca_cfg.bootlog) {
+		bitmap_fill(all_banks, MAX_NR_BANKS);
+		machine_check_poll(MCP_UC | MCP_QUEUE_LOG, &all_banks);
 	}
-}
-
-/*
- * Do a final check to see if there are any unused/RAZ banks.
- *
- * This must be done after the banks have been initialized and any quirks have
- * been applied.
- *
- * Do not call this from any user-initiated flows, e.g. CPU hotplug or sysfs.
- * Otherwise, a user who disables a bank will not be able to re-enable it
- * without a system reboot.
- */
-static void __mcheck_cpu_check_banks(void)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	u64 msrval;
-	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		struct mce_bank *b = &mce_banks[i];
@@ -1805,6 +1775,9 @@ static void __mcheck_cpu_check_banks(void)
 		if (!b->init)
 			continue;
 
+		wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl);
+		wrmsrl(mca_msr_reg(i, MCA_STATUS), 0);
+
 		rdmsrl(mca_msr_reg(i, MCA_CTL), msrval);
 		b->init = !!msrval;
 	}
@@ -2169,8 +2142,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 	__mcheck_cpu_init_early(c);
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
-	__mcheck_cpu_init_clear_banks();
-	__mcheck_cpu_check_banks();
+	__mcheck_cpu_init_prepare_banks();
 	__mcheck_cpu_setup_timer();
 }
 
@@ -2338,7 +2310,7 @@ static void mce_syscore_resume(void)
 {
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
-	__mcheck_cpu_init_clear_banks();
+	__mcheck_cpu_init_prepare_banks();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2356,7 +2328,7 @@ static void mce_cpu_restart(void *data)
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 	__mcheck_cpu_init_generic();
-	__mcheck_cpu_init_clear_banks();
+	__mcheck_cpu_init_prepare_banks();
 	__mcheck_cpu_init_timer();
 }
 
-- 
2.34.1



* [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early()
  2022-12-06 17:36 [PATCH v6 0/4] x86/mce: Support extended MCA_ADDR address on SMCA systems Yazen Ghannam
  2022-12-06 17:36 ` [PATCH v6 1/4] x86/mce: Cleanup bank processing on init Yazen Ghannam
@ 2022-12-06 17:36 ` Yazen Ghannam
  2022-12-28 18:53   ` Borislav Petkov
  2022-12-06 17:36 ` [PATCH v6 3/4] x86/mce: Define function to extract ErrorAddr from MCA_ADDR Yazen Ghannam
  2022-12-06 17:36 ` [PATCH v6 4/4] x86/mce: Add support for Extended Physical Address MCA changes Yazen Ghannam
  3 siblings, 1 reply; 8+ messages in thread
From: Yazen Ghannam @ 2022-12-06 17:36 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Smita.KoralahalliChannabasappa,
	Yazen Ghannam

The __mcheck_cpu_init_early() function was introduced so that some
vendor-specific features are detected before the first MCA polling event
done in __mcheck_cpu_init_generic().

Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and
additional code will be needed to support various system configurations.

However, the current and future vendor-specific code should be done during
vendor init. This keeps all the vendor code in a common location and
simplifies the generic init flow.

Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
Also, move __mcheck_cpu_init_generic() after
__mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
MCA polling event.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Link:
https://lore.kernel.org/r/YqJHwXkg3Ny9fI3s@yaz-fattaah

v6:
	New.

 arch/x86/kernel/cpu/mce/amd.c  |  4 ++++
 arch/x86/kernel/cpu/mce/core.c | 20 +++-----------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 10fb5b5c9efa..b80472a52ad8 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -681,6 +681,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
+	mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
+	mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
+	mce_flags.amd_threshold	 = 1;
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5f406d135d32..9efd6d010e2d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1906,19 +1906,6 @@ static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
 	return 0;
 }
 
-/*
- * Init basic CPU features needed for early decoding of MCEs.
- */
-static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
-{
-	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
-		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
-		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
-		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
-		mce_flags.amd_threshold	 = 1;
-	}
-}
-
 static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
 {
 	struct mca_config *cfg = &mca_cfg;
@@ -2139,10 +2126,9 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	mca_cfg.initialized = 1;
 
-	__mcheck_cpu_init_early(c);
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_setup_timer();
 }
 
@@ -2308,9 +2294,9 @@ static void mce_syscore_shutdown(void)
  */
 static void mce_syscore_resume(void)
 {
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2327,8 +2313,8 @@ static void mce_cpu_restart(void *data)
 {
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_timer();
 }
 
-- 
2.34.1



* [PATCH v6 3/4] x86/mce: Define function to extract ErrorAddr from MCA_ADDR
  2022-12-06 17:36 [PATCH v6 0/4] x86/mce: Support extended MCA_ADDR address on SMCA systems Yazen Ghannam
  2022-12-06 17:36 ` [PATCH v6 1/4] x86/mce: Cleanup bank processing on init Yazen Ghannam
  2022-12-06 17:36 ` [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
@ 2022-12-06 17:36 ` Yazen Ghannam
  2022-12-06 17:36 ` [PATCH v6 4/4] x86/mce: Add support for Extended Physical Address MCA changes Yazen Ghannam
  3 siblings, 0 replies; 8+ messages in thread
From: Yazen Ghannam @ 2022-12-06 17:36 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Smita.KoralahalliChannabasappa,
	Yazen Ghannam

From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Move MCA_ADDR[ErrorAddr] extraction into a separate helper function. This
will be further refactored to support extended ErrorAddr bits in MCA_ADDR
in newer AMD CPUs.

  [ bp: Massage. ]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Link:
https://lore.kernel.org/r/20220412154038.261750-2-Smita.KoralahalliChannabasappa@amd.com

v2:
	No change.
v3:
	Rebased on the latest tip tree. No functional changes.
v4:
	Commit description change to be void of the patch linearity.
v5:
	Extract entire function including comments.
	Define smca_extract_err_addr() in mce/internal.h
v6:
	No functional change. Removed old link.

 arch/x86/kernel/cpu/mce/amd.c      | 10 +---------
 arch/x86/kernel/cpu/mce/core.c     | 10 +---------
 arch/x86/kernel/cpu/mce/internal.h | 15 +++++++++++++++
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index b80472a52ad8..85977ca07825 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -740,15 +740,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (m.status & MCI_STATUS_ADDRV) {
 		m.addr = addr;
 
-		/*
-		 * Extract [55:<lsb>] where lsb is the least significant
-		 * *valid* bit of the address bits.
-		 */
-		if (mce_flags.smca) {
-			u8 lsb = (m.addr >> 56) & 0x3f;
-
-			m.addr &= GENMASK_ULL(55, lsb);
-		}
+		smca_extract_err_addr(&m);
 	}
 
 	if (mce_flags.smca) {
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 9efd6d010e2d..757cc46298d3 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -633,15 +633,7 @@ static noinstr void mce_read_aux(struct mce *m, int i)
 			m->addr <<= shift;
 		}
 
-		/*
-		 * Extract [55:<lsb>] where lsb is the least significant
-		 * *valid* bit of the address bits.
-		 */
-		if (mce_flags.smca) {
-			u8 lsb = (m->addr >> 56) & 0x3f;
-
-			m->addr &= GENMASK_ULL(55, lsb);
-		}
+		smca_extract_err_addr(m);
 	}
 
 	if (mce_flags.smca) {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 7e03f5b7f6bd..6dcb94fe0f65 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -189,8 +189,23 @@ extern bool filter_mce(struct mce *m);
 
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
+
+/* Extract [55:<lsb>] where lsb is the LS-*valid* bit of the address bits. */
+static __always_inline void smca_extract_err_addr(struct mce *m)
+{
+	u8 lsb;
+
+	if (!mce_flags.smca)
+		return;
+
+	lsb = (m->addr >> 56) & 0x3f;
+
+	m->addr &= GENMASK_ULL(55, lsb);
+}
+
 #else
 static inline bool amd_filter_mce(struct mce *m) { return false; }
+static inline void smca_extract_err_addr(struct mce *m) { }
 #endif
 
 #ifdef CONFIG_X86_ANCIENT_MCE
-- 
2.34.1



* [PATCH v6 4/4] x86/mce: Add support for Extended Physical Address MCA changes
  2022-12-06 17:36 [PATCH v6 0/4] x86/mce: Support extended MCA_ADDR address on SMCA systems Yazen Ghannam
                   ` (2 preceding siblings ...)
  2022-12-06 17:36 ` [PATCH v6 3/4] x86/mce: Define function to extract ErrorAddr from MCA_ADDR Yazen Ghannam
@ 2022-12-06 17:36 ` Yazen Ghannam
  3 siblings, 0 replies; 8+ messages in thread
From: Yazen Ghannam @ 2022-12-06 17:36 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Smita.KoralahalliChannabasappa,
	Yazen Ghannam

From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Newer AMD CPUs support more physical address bits.

That is, the MCA_ADDR registers on Scalable MCA systems contain the
ErrorAddr in bits [56:0] instead of [55:0]. Hence the existing LSB field
in bits [61:56] of MCA_ADDR must be moved to accommodate the larger
ErrorAddr size.

MCA_CONFIG[McaLsbInStatusSupported] indicates this change. If set, the
LSB field will be found in MCA_STATUS rather than MCA_ADDR.

Each logical CPU has its own MCA banks in hardware; they are not shared
with other logical CPUs. Additionally, on SMCA systems, each feature bit
may differ between banks within the same logical CPU.

Check for MCA_CONFIG[McaLsbInStatusSupported] for each MCA bank and for
each CPU.

Additionally, not all MCA banks support the maximum number of ErrorAddr
bits in MCA_ADDR. Some banks might support fewer bits, with the remaining
bits marked as reserved.

[Yazen: Rebased and fixed up formatting.]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Link:
https://lore.kernel.org/r/20220412154038.261750-3-Smita.KoralahalliChannabasappa@amd.com

v2:
	Declared lsb_in_status in existing mce_bank[] struct.
	Moved struct mce_bank[] declaration from core.c -> internal.h
v3:
	Rebased on the latest tip tree. No functional changes.
v4:
	No change.
v5:
	Extend comment for smca_extract_err_addr if AddrLsb is found in
	MCA_STATUS registers.
v6:
	Rebase and fix up formatting.
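
For illustration only, the two MCA_ADDR layouts can be exercised in
userspace with the sketch below. It is a minimal model under stated
assumptions, not the kernel helper: extract_err_addr() is a hypothetical
function that takes the per-bank flag as a parameter instead of reading
mce_banks_array, and GENMASK_ULL() is redefined locally. The bit positions
follow the diff: the LSB field is read from MCA_ADDR[61:56] in the old
layout and from MCA_STATUS[29:24] when McaLsbInStatusSupported is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/*
 * Hypothetical helper, not part of the patch: same masking logic as
 * smca_extract_err_addr(), with the bank flag passed in explicitly.
 */
static uint64_t extract_err_addr(uint64_t addr, uint64_t status,
				 bool lsb_in_status)
{
	uint8_t lsb;

	if (lsb_in_status) {
		/* New layout: LSB in MCA_STATUS, ErrorAddr in MCA_ADDR[56:0]. */
		lsb = (status >> 24) & 0x3f;
		return addr & GENMASK_ULL(56, lsb);
	}

	/* Old layout: LSB in MCA_ADDR[61:56], ErrorAddr in MCA_ADDR[55:0]. */
	lsb = (addr >> 56) & 0x3f;
	return addr & GENMASK_ULL(55, lsb);
}
```

With the old layout, an MCA_ADDR value carrying LSB = 6 has its low six
address bits and the LSB field itself masked off. With the new layout,
bit 56 is kept as part of the address and the LSB comes from MCA_STATUS.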

 arch/x86/kernel/cpu/mce/amd.c      |  2 ++
 arch/x86/kernel/cpu/mce/core.c     |  8 +-------
 arch/x86/kernel/cpu/mce/internal.h | 31 +++++++++++++++++++++++++++++-
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 85977ca07825..d4ec9b3481b8 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -306,6 +306,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 		if ((low & BIT(5)) && !((high >> 5) & 0x3))
 			high |= BIT(5);
 
+		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
+
 		wrmsr(smca_config, low, high);
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 757cc46298d3..8b67e0284564 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -67,13 +67,7 @@ DEFINE_PER_CPU(unsigned, mce_exception_count);
 
 DEFINE_PER_CPU_READ_MOSTLY(unsigned int, mce_num_banks);
 
-struct mce_bank {
-	u64			ctl;			/* subevents to enable */
-
-	__u64 init			: 1,		/* initialise bank? */
-	      __reserved_1		: 63;
-};
-static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
+DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
 
 #define ATTR_LEN               16
 /* One object for each MCE bank, shared by all CPUs */
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 6dcb94fe0f65..867bcf9ee424 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -177,6 +177,24 @@ struct mce_vendor_flags {
 
 extern struct mce_vendor_flags mce_flags;
 
+struct mce_bank {
+	/* subevents to enable */
+	u64			ctl;
+
+	/* initialise bank? */
+	__u64 init		: 1,
+
+	/*
+	 * (AMD) MCA_CONFIG[McaLsbInStatusSupported]: This bit indicates
+	 * the LSB field is found in MCA_STATUS, when set.
+	 */
+	lsb_in_status		: 1,
+
+	__reserved_1		: 62;
+};
+
+DECLARE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
+
 enum mca_msr {
 	MCA_CTL,
 	MCA_STATUS,
@@ -190,7 +208,10 @@ extern bool filter_mce(struct mce *m);
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
 
-/* Extract [55:<lsb>] where lsb is the LS-*valid* bit of the address bits. */
+/*
+ * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
+ * [56:0], else in bits [55:0] of MCA_ADDR.
+ */
 static __always_inline void smca_extract_err_addr(struct mce *m)
 {
 	u8 lsb;
@@ -198,6 +219,14 @@ static __always_inline void smca_extract_err_addr(struct mce *m)
 	if (!mce_flags.smca)
 		return;
 
+	if (this_cpu_ptr(mce_banks_array)[m->bank].lsb_in_status) {
+		lsb = (m->status >> 24) & 0x3f;
+
+		m->addr &= GENMASK_ULL(56, lsb);
+
+		return;
+	}
+
 	lsb = (m->addr >> 56) & 0x3f;
 
 	m->addr &= GENMASK_ULL(55, lsb);
-- 
2.34.1



* Re: [PATCH v6 1/4] x86/mce: Cleanup bank processing on init
  2022-12-06 17:36 ` [PATCH v6 1/4] x86/mce: Cleanup bank processing on init Yazen Ghannam
@ 2022-12-23 13:23   ` Borislav Petkov
  0 siblings, 0 replies; 8+ messages in thread
From: Borislav Petkov @ 2022-12-23 13:23 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, tony.luck, x86,
	Smita.KoralahalliChannabasappa, Borislav Petkov

On Tue, Dec 06, 2022 at 11:36:04AM -0600, Yazen Ghannam wrote:
> -static void __mcheck_cpu_init_clear_banks(void)
> +static void __mcheck_cpu_init_prepare_banks(void)
>  {
>  	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
> +	mce_banks_t all_banks;
> +	u64 msrval;
>  	int i;
>  
> -	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> -		struct mce_bank *b = &mce_banks[i];
> -
> -		if (!b->init)
> -			continue;
> -		wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl);
> -		wrmsrl(mca_msr_reg(i, MCA_STATUS), 0);
> +	/*
> +	 * Log the machine checks left over from the previous reset. Log them
> +	 * only, do not start processing them. That will happen in mcheck_late_init()
> +	 * when all consumers have been registered on the notifier chain.
> +	 */
> +	if (mca_cfg.bootlog) {
> +		bitmap_fill(all_banks, MAX_NR_BANKS);
> +		machine_check_poll(MCP_UC | MCP_QUEUE_LOG, &all_banks);
>  	}

Yeah, just a nit ontop - lemme move all_banks into the if branch so that
it is closer and obvious:

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5f406d135d32..a90d3eb6fcd8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1755,7 +1755,6 @@ static void __mcheck_cpu_init_generic(void)
 static void __mcheck_cpu_init_prepare_banks(void)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	mce_banks_t all_banks;
 	u64 msrval;
 	int i;
 
@@ -1765,6 +1764,8 @@ static void __mcheck_cpu_init_prepare_banks(void)
 	 * when all consumers have been registered on the notifier chain.
 	 */
 	if (mca_cfg.bootlog) {
+		mce_banks_t all_banks;
+
 		bitmap_fill(all_banks, MAX_NR_BANKS);
 		machine_check_poll(MCP_UC | MCP_QUEUE_LOG, &all_banks);
 	}

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early()
  2022-12-06 17:36 ` [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
@ 2022-12-28 18:53   ` Borislav Petkov
  2023-01-03 20:54     ` Luck, Tony
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2022-12-28 18:53 UTC (permalink / raw)
  To: Yazen Ghannam, tony.luck
  Cc: linux-edac, linux-kernel, x86, Smita.KoralahalliChannabasappa

On Tue, Dec 06, 2022 at 11:36:05AM -0600, Yazen Ghannam wrote:
> +	mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
> +	mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
> +	mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
> +	mce_flags.amd_threshold	 = 1;
>  
>  	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
>  		if (mce_flags.smca)
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5f406d135d32..9efd6d010e2d 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1906,19 +1906,6 @@ static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
>  	return 0;
>  }
>  
> -/*
> - * Init basic CPU features needed for early decoding of MCEs.
> - */
> -static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
> -{
> -	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
> -		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
> -		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
> -		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
> -		mce_flags.amd_threshold	 = 1;
> -	}

Yeah, looking at this, before and after the change, what we are and were
doing here is silly. Those flags are global for the whole system but we
do set them on each CPU - unnecessarily, ofc ;-\ - because we don't have
a BSP MCE init call.

That above happens on the mcheck_cpu_init() path which is per-CPU.

However, if we had to be precise and correct, this flags setup should
happen in a function called

	mcheck_bsp_init()

or so which gets called at the end of identify_boot_cpu() and which does
all the *once* actions there like allocate the gen pool, run the quirks
which need to run only once on the BSP and so on.

So that we don't have to do unnecessary work on every CPU.

Tony, thoughts?

I think we should start working towards this - doesn't have to be done
immediately but I think a proper separation of what runs where - once
on the BSP or on every CPU - is needed here. Unless I'm missing an
important angle, which is entirely possible.

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early()
  2022-12-28 18:53   ` Borislav Petkov
@ 2023-01-03 20:54     ` Luck, Tony
  0 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2023-01-03 20:54 UTC (permalink / raw)
  To: Borislav Petkov, Yazen Ghannam
  Cc: linux-edac, linux-kernel, x86, Smita.KoralahalliChannabasappa

> Yeah, looking at this, before and after the change, what we are and were
> doing here is silly. Those flags are global for the whole system but we
> do set them on each CPU - unnecessarily, ofc ;-\ - because we don't have
> a BSP MCE init call.
>
> That above happens on the mcheck_cpu_init() path which is per-CPU.
>
> However, if we had to be precise and correct, this flags setup should
> happen in a function called
>
>       mcheck_bsp_init()
>
> or so which gets called at the end of identify_boot_cpu() and which does
> all the *once* actions there like allocate the gen pool, run the quirks
> which need to run only once on the BSP and so on.
>
> So that we don't have to do unnecessary work on every CPU.
>
> Tony, thoughts?
>
> I think we should start working towards this - doesn't have to be done
> immediately but I think a proper separation of what runs where - once
> on the BSP or on every CPU - is needed here. Unless I'm missing an
> important angle, which is entirely possible.

Cleanup sounds good. But do we need a new mcheck_bsp_init() function?

Can the "only once" stuff be done from mcheck_init()? Or does it rely on
things that aren't set up that early?

-Tony
