Linux-EDAC Archive on lore.kernel.org
* [PATCH] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]
@ 2019-11-21 14:15 Yazen Ghannam
  2019-12-10 10:07 ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
  2019-12-17 10:01 ` [tip: ras/urgent] x86/MCE/AMD: " tip-bot2 for Yazen Ghannam
  0 siblings, 2 replies; 5+ messages in thread
From: Yazen Ghannam @ 2019-11-21 14:15 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system. These banks are not shared between CPUs. The bank
types and ordering will be the same across CPUs on currently available
systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU. In general, this occurs when the hardware represented by the MCA
bank is disabled, e.g. disabled memory controllers on certain models,
etc. The MCA bank is disabled in the hardware, so there is no
possibility of getting an MCA/MCE from it even if it is assumed to have
a known type.

For example:

Full system:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          UMC
	 2    |         CS          |          CS

System with hardware disabled:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          RAZ
	 2    |         CS          |          CS

For this reason, there is a single, global smca_banks[] array that
records the type of each bank. Each CPU fills in the array as it comes
online. However, an entry will not be updated if it already exists.

This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.

This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.

For example:

	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         RAZ         |          UMC
	 2    |         CS          |          CS

Extend the smca_banks[] entry check to return if the entry is a
non-reserved type. Otherwise, continue so that CPUs that encounter a
known bank type can update smca_banks[].

Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 4f790c375580..ee0f211b5074 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -266,7 +266,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	smca_set_misc_banks_map(bank, cpu);
 
 	/* Return early if this bank was already initialized. */
-	if (smca_banks[bank].hwid)
+	if (smca_banks[bank].hwid && smca_banks[bank].hwid->hwid_mcatype != 0)
 		return;
 
 	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
-- 
2.17.1
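
The effect of the one-line change above can be sketched outside the
kernel. What follows is a minimal standalone model, not the kernel code:
the model_* names and the enum are hypothetical stand-ins for
smca_banks[], its hwid pointer and hwid_mcatype, and the read of the
IPID MSR is replaced by a plain `seen` argument. It assumes, as the
patch does, that a Reserved/RAZ type is represented by the value 0.

```c
/* Hypothetical model types; the kernel's real structures differ. */
enum model_bank_type {
	MODEL_RESERVED = 0,	/* Reserved/RAZ, mirrors hwid_mcatype == 0 */
	MODEL_LS,
	MODEL_UMC,
	MODEL_CS,
};

struct model_hwid {
	enum model_bank_type type;
};

struct model_bank {
	const struct model_hwid *hwid;	/* null until some CPU records a type */
};

/* One descriptor per type, indexed by the enum value. */
static const struct model_hwid model_hwids[] = {
	{ MODEL_RESERVED }, { MODEL_LS }, { MODEL_UMC }, { MODEL_CS },
};

/* Old check: any existing entry, even a Reserved one, blocks later updates. */
static void model_configure_old(struct model_bank *banks, unsigned int bank,
				enum model_bank_type seen)
{
	if (banks[bank].hwid)
		return;

	banks[bank].hwid = &model_hwids[seen];
}

/* New check: only an entry with a known (non-Reserved) type blocks updates. */
static void model_configure_new(struct model_bank *banks, unsigned int bank,
				enum model_bank_type seen)
{
	if (banks[bank].hwid && banks[bank].hwid->type != MODEL_RESERVED)
		return;

	banks[bank].hwid = &model_hwids[seen];
}
```

In this model, calling model_configure_old() for bank 1 first with
MODEL_RESERVED (CPU0 sees RAZ) and then with MODEL_UMC (CPU1 sees UMC)
leaves the entry Reserved. The same sequence with model_configure_new()
leaves the entry as UMC, and a later RAZ observation on another CPU does
not overwrite it.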


* [tip: ras/core] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]
  2019-11-21 14:15 [PATCH] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[] Yazen Ghannam
@ 2019-12-10 10:07 ` " tip-bot2 for Yazen Ghannam
  2019-12-17  1:49   ` Ghannam, Yazen
  2019-12-17 10:01 ` [tip: ras/urgent] x86/MCE/AMD: " tip-bot2 for Yazen Ghannam
  1 sibling, 1 reply; 5+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2019-12-10 10:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yazen Ghannam, Borislav Petkov, H. Peter Anvin, Ingo Molnar,
	linux-edac, Thomas Gleixner, Tony Luck, x86-ml, LKML

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     a0cac35c1d83184151be4851ea90b5f920957967
Gitweb:        https://git.kernel.org/tip/a0cac35c1d83184151be4851ea90b5f920957967
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Thu, 21 Nov 2019 08:15:08 -06:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 10 Dec 2019 09:27:59 +01:00

x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]

Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system. These banks are not shared between CPUs. The bank
types and ordering will be the same across CPUs on currently available
systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU.

In general, this occurs when the hardware represented by the MCA bank
is disabled, e.g. disabled memory controllers on certain models, etc.
The MCA bank is disabled in the hardware, so there is no possibility of
getting an MCA/MCE from it even if it is assumed to have a known type.

For example:

Full system:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          UMC
	 2    |         CS          |          CS

System with hardware disabled:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          RAZ
	 2    |         CS          |          CS

For this reason, there is a single, global smca_banks[] array that
records the type of each bank. Each CPU fills in the array as it comes
online. However, an entry will not be updated if it already exists.

This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.

This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.

For example:

	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         RAZ         |          UMC
	 2    |         CS          |          CS

Extend the smca_banks[] entry check to return if the entry is a
non-reserved type. Otherwise, continue so that CPUs that encounter a
known bank type can update smca_banks[].

Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
---
 arch/x86/kernel/cpu/mce/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index e41e3b4..d6cf5c1 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -266,7 +266,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	smca_set_misc_banks_map(bank, cpu);
 
 	/* Return early if this bank was already initialized. */
-	if (smca_banks[bank].hwid)
+	if (smca_banks[bank].hwid && smca_banks[bank].hwid->hwid_mcatype != 0)
 		return;
 
 	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {

* RE: [tip: ras/core] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]
  2019-12-10 10:07 ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
@ 2019-12-17  1:49   ` Ghannam, Yazen
  2019-12-17 10:01     ` Borislav Petkov
  0 siblings, 1 reply; 5+ messages in thread
From: Ghannam, Yazen @ 2019-12-17  1:49 UTC (permalink / raw)
  To: linux-kernel, linux-tip-commits, Borislav Petkov
  Cc: H. Peter Anvin, Ingo Molnar, linux-edac, Thomas Gleixner,
	Tony Luck, x86-ml

> -----Original Message-----
> From: tip-bot2@linutronix.de <tip-bot2@linutronix.de>
> Sent: Tuesday, December 10, 2019 5:07 AM
> To: linux-tip-commits@vger.kernel.org
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; Borislav Petkov <bp@suse.de>; H. Peter Anvin <hpa@zytor.com>; Ingo Molnar
> <mingo@kernel.org>; linux-edac <linux-edac@vger.kernel.org>; Thomas Gleixner <tglx@linutronix.de>; Tony Luck
> <tony.luck@intel.com>; x86-ml <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>
> Subject: [tip: ras/core] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]
> 
> The following commit has been merged into the ras/core branch of tip:
> 
> Commit-ID:     a0cac35c1d83184151be4851ea90b5f920957967
> Gitweb:
...
> Author:        Yazen Ghannam <yazen.ghannam@amd.com>
> AuthorDate:    Thu, 21 Nov 2019 08:15:08 -06:00
> Committer:     Borislav Petkov <bp@suse.de>
> CommitterDate: Tue, 10 Dec 2019 09:27:59 +01:00
> 
> x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]
> 

Boris,
Can this please be applied to ras/urgent? It fixes a boot issue on some
recently released AMD systems.

I had the Fixes tag, but I forgot to include CC:<stable>. Sorry about that.

Thanks,
Yazen

* Re: [tip: ras/core] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[]
  2019-12-17  1:49   ` Ghannam, Yazen
@ 2019-12-17 10:01     ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2019-12-17 10:01 UTC (permalink / raw)
  To: Ghannam, Yazen
  Cc: linux-kernel, linux-tip-commits, Borislav Petkov, H. Peter Anvin,
	Ingo Molnar, linux-edac, Thomas Gleixner, Tony Luck, x86-ml

On Tue, Dec 17, 2019 at 01:49:13AM +0000, Ghannam, Yazen wrote:
> Can this please be applied to ras/urgent? It fixes a boot issue on some
> recently released AMD systems.
> 
> I had the Fixes tag, but I forgot to include CC:<stable>. Sorry about that.

Ok, I've reshuffled ras/core and ras/urgent and pushed them out. I'd
appreciate it if you ran the urgent branch too, on your end. I tested on
everything I have here but my hw doesn't trigger any boot hang issues
anyway. If you have people reporting issues, now would be a good time
for them to test it too.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* [tip: ras/urgent] x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
  2019-11-21 14:15 [PATCH] x86/mce/AMD: Allow Reserved types to be overwritten in smca_banks[] Yazen Ghannam
  2019-12-10 10:07 ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
@ 2019-12-17 10:01 ` " tip-bot2 for Yazen Ghannam
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2019-12-17 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yazen Ghannam, Borislav Petkov, H. Peter Anvin, Ingo Molnar,
	linux-edac, stable, Thomas Gleixner, Tony Luck, x86-ml, LKML

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     966af20929ac24360ba3fac5533eb2ab003747da
Gitweb:        https://git.kernel.org/tip/966af20929ac24360ba3fac5533eb2ab003747da
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Thu, 21 Nov 2019 08:15:08 -06:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 17 Dec 2019 09:39:53 +01:00

x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]

Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system. These banks are not shared between CPUs. The bank
types and ordering will be the same across CPUs on currently available
systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU.

In general, this occurs when the hardware represented by the MCA bank
is disabled, e.g. disabled memory controllers on certain models, etc.
The MCA bank is disabled in the hardware, so there is no possibility of
getting an MCA/MCE from it even if it is assumed to have a known type.

For example:

Full system:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          UMC
	 2    |         CS          |          CS

System with hardware disabled:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          RAZ
	 2    |         CS          |          CS

For this reason, there is a single, global smca_banks[] array that
records the type of each bank. Each CPU fills in the array as it comes
online. However, an entry will not be updated if it already exists.

This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.

This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.

For example:

	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         RAZ         |          UMC
	 2    |         CS          |          CS

Extend the smca_banks[] entry check to return if the entry is a
non-reserved type. Otherwise, continue so that CPUs that encounter a
known bank type can update smca_banks[].

Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
---
 arch/x86/kernel/cpu/mce/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index e41e3b4..d6cf5c1 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -266,7 +266,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	smca_set_misc_banks_map(bank, cpu);
 
 	/* Return early if this bank was already initialized. */
-	if (smca_banks[bank].hwid)
+	if (smca_banks[bank].hwid && smca_banks[bank].hwid->hwid_mcatype != 0)
 		return;
 
 	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
