linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Set and use cpu_die_id on AMD-based systems
@ 2020-11-09 21:06 Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 1/4] x86/CPU/AMD: Save AMD NodeId as cpu_die_id Yazen Ghannam
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Yazen Ghannam @ 2020-11-09 21:06 UTC (permalink / raw)
  To: x86
  Cc: Yazen Ghannam, linux-kernel, Smita.KoralahalliChannabasappa,
	linux-edac, puwen, kim.phillips, thomas.lendacky, wei.huang2

From: Yazen Ghannam <yazen.ghannam@amd.com>

AMD-based systems currently use a "NodeId" when referencing a
software-visible hardware structure. This may be referred to as a "Die"
in x86 documentation, "Node" in some AMD documentation, and "Package" in
Linux documentation.

Recently a cpu_die_id value was added to struct cpuinfo_x86. This value
can be used on AMD-based systems rather than using an AMD-specific value
throughout the kernel.

This set is based on patches 1-3 from the following set.
https://lkml.kernel.org/r/20200903200144.310991-1-Yazen.Ghannam@amd.com

Thanks,
Yazen

Yazen Ghannam (4):
  x86/CPU/AMD: Save AMD NodeId as cpu_die_id
  x86/CPU/AMD: Remove amd_get_nb_id()
  EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
  x86/topology: Set cpu_die_id only if DIE_TYPE found

 Documentation/x86/topology.rst   |  9 +++++++++
 arch/x86/events/amd/core.c       |  2 +-
 arch/x86/include/asm/cacheinfo.h |  4 ++--
 arch/x86/include/asm/processor.h |  2 --
 arch/x86/kernel/amd_nb.c         |  4 ++--
 arch/x86/kernel/cpu/amd.c        | 17 +++++------------
 arch/x86/kernel/cpu/cacheinfo.c  |  8 ++++----
 arch/x86/kernel/cpu/hygon.c      | 11 +++++------
 arch/x86/kernel/cpu/mce/amd.c    |  4 ++--
 arch/x86/kernel/cpu/mce/inject.c |  4 ++--
 arch/x86/kernel/cpu/topology.c   | 10 ++++++++--
 drivers/edac/amd64_edac.c        |  4 ++--
 drivers/edac/mce_amd.c           |  4 ++--
 13 files changed, 44 insertions(+), 39 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] x86/CPU/AMD: Save AMD NodeId as cpu_die_id
  2020-11-09 21:06 [PATCH 0/4] Set and use cpu_die_id on AMD-based systems Yazen Ghannam
@ 2020-11-09 21:06 ` Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 2/4] x86/CPU/AMD: Remove amd_get_nb_id() Yazen Ghannam
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Yazen Ghannam @ 2020-11-09 21:06 UTC (permalink / raw)
  To: x86
  Cc: Yazen Ghannam, linux-kernel, Smita.KoralahalliChannabasappa,
	linux-edac, puwen, kim.phillips, thomas.lendacky, wei.huang2

From: Yazen Ghannam <yazen.ghannam@amd.com>

AMD systems provide a "NodeId" value that represents a global ID
indicating to which "Node" a logical CPU belongs. The "Node" is a
physical structure equivalent to a Die, and it should not be confused
with logical structures like NUMA nodes. Logical nodes can be adjusted
based on firmware or other settings whereas the physical nodes/dies are
fixed based on hardware topology.

The NodeId value can be used when a physical ID is needed by software.

Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value
from CPUID or MSR as appropriate. Default to phys_proc_id otherwise.
Do so for both AMD and Hygon systems.

Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no
longer needed.

Update the x86 topology documentation.

[ Use cpu_die_id. ]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 Documentation/x86/topology.rst   |  9 +++++++++
 arch/x86/include/asm/cacheinfo.h |  4 ++--
 arch/x86/kernel/cpu/amd.c        | 11 +++++------
 arch/x86/kernel/cpu/cacheinfo.c  |  6 +++---
 arch/x86/kernel/cpu/hygon.c      | 11 +++++------
 5 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst
index e29739904e37..7f58010ea86a 100644
--- a/Documentation/x86/topology.rst
+++ b/Documentation/x86/topology.rst
@@ -41,6 +41,8 @@ Package
 Packages contain a number of cores plus shared resources, e.g. DRAM
 controller, shared caches etc.
 
+Modern systems may also use the term 'Die' for package.
+
 AMD nomenclature for package is 'Node'.
 
 Package-related topology information in the kernel:
@@ -53,11 +55,18 @@ Package-related topology information in the kernel:
 
     The number of dies in a package. This information is retrieved via CPUID.
 
+  - cpuinfo_x86.cpu_die_id:
+
+    The physical ID of the die. This information is retrieved via CPUID.
+
   - cpuinfo_x86.phys_proc_id:
 
     The physical ID of the package. This information is retrieved via CPUID
     and deduced from the APIC IDs of the cores in the package.
 
+    Modern systems use this value for the socket. There may be multiple
+    packages within a socket. This value may differ from cpu_die_id.
+
   - cpuinfo_x86.logical_proc_id:
 
     The logical ID of the package. As we do not trust BIOSes to enumerate the
diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 86b63c7feab7..86b2e0dcc4bf 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_X86_CACHEINFO_H
 #define _ASM_X86_CACHEINFO_H
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id);
-void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id);
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
+void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
 
 #endif /* _ASM_X86_CACHEINFO_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6062ce586b95..2f1fbd8150af 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -330,7 +330,6 @@ static void legacy_fixup_core_id(struct cpuinfo_x86 *c)
  */
 static void amd_get_topology(struct cpuinfo_x86 *c)
 {
-	u8 node_id;
 	int cpu = smp_processor_id();
 
 	/* get information required for multi-node processors */
@@ -340,7 +339,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 
 		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
 
-		node_id  = ecx & 0xff;
+		c->cpu_die_id  = ecx & 0xff;
 
 		if (c->x86 == 0x15)
 			c->cu_id = ebx & 0xff;
@@ -360,15 +359,15 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
 		if (!err)
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
-		cacheinfo_amd_init_llc_id(c, cpu, node_id);
+		cacheinfo_amd_init_llc_id(c, cpu);
 
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
 		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		node_id = value & 7;
+		c->cpu_die_id = value & 7;
 
-		per_cpu(cpu_llc_id, cpu) = node_id;
+		per_cpu(cpu_llc_id, cpu) = c->cpu_die_id;
 	} else
 		return;
 
@@ -393,7 +392,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
 	/* Convert the initial APIC ID into the socket ID */
 	c->phys_proc_id = c->initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
+	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->phys_proc_id;
 }
 
 static void amd_detect_ppin(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 57074cf3ad7c..f9ac682e75e7 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -646,7 +646,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c)
 	return i;
 }
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id)
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu)
 {
 	/*
 	 * We may have multiple LLCs if L3 caches exist, so check if we
@@ -657,7 +657,7 @@ void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id)
 
 	if (c->x86 < 0x17) {
 		/* LLC is at the node level. */
-		per_cpu(cpu_llc_id, cpu) = node_id;
+		per_cpu(cpu_llc_id, cpu) = c->cpu_die_id;
 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
 		/*
 		 * LLC is at the core complex level.
@@ -684,7 +684,7 @@ void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id)
 	}
 }
 
-void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id)
+void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu)
 {
 	/*
 	 * We may have multiple LLCs if L3 caches exist, so check if we
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index ac6c30e5801d..dc0840aae26c 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -65,7 +65,6 @@ static void hygon_get_topology_early(struct cpuinfo_x86 *c)
  */
 static void hygon_get_topology(struct cpuinfo_x86 *c)
 {
-	u8 node_id;
 	int cpu = smp_processor_id();
 
 	/* get information required for multi-node processors */
@@ -75,7 +74,7 @@ static void hygon_get_topology(struct cpuinfo_x86 *c)
 
 		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
 
-		node_id  = ecx & 0xff;
+		c->cpu_die_id  = ecx & 0xff;
 
 		c->cpu_core_id = ebx & 0xff;
 
@@ -93,14 +92,14 @@ static void hygon_get_topology(struct cpuinfo_x86 *c)
 		/* Socket ID is ApicId[6] for these processors. */
 		c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
 
-		cacheinfo_hygon_init_llc_id(c, cpu, node_id);
+		cacheinfo_hygon_init_llc_id(c, cpu);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
 		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		node_id = value & 7;
+		c->cpu_die_id = value & 7;
 
-		per_cpu(cpu_llc_id, cpu) = node_id;
+		per_cpu(cpu_llc_id, cpu) = c->cpu_die_id;
 	} else
 		return;
 
@@ -123,7 +122,7 @@ static void hygon_detect_cmp(struct cpuinfo_x86 *c)
 	/* Convert the initial APIC ID into the socket ID */
 	c->phys_proc_id = c->initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
+	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->phys_proc_id;
 }
 
 static void srat_detect_node(struct cpuinfo_x86 *c)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] x86/CPU/AMD: Remove amd_get_nb_id()
  2020-11-09 21:06 [PATCH 0/4] Set and use cpu_die_id on AMD-based systems Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 1/4] x86/CPU/AMD: Save AMD NodeId as cpu_die_id Yazen Ghannam
@ 2020-11-09 21:06 ` Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 3/4] EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 4/4] x86/topology: Set cpu_die_id only if DIE_TYPE found Yazen Ghannam
  3 siblings, 0 replies; 5+ messages in thread
From: Yazen Ghannam @ 2020-11-09 21:06 UTC (permalink / raw)
  To: x86
  Cc: Yazen Ghannam, linux-kernel, Smita.KoralahalliChannabasappa,
	linux-edac, puwen, kim.phillips, thomas.lendacky, wei.huang2

From: Yazen Ghannam <yazen.ghannam@amd.com>

The Last Level Cache ID is returned by amd_get_nb_id(). In practice,
this value is the same as the AMD NodeId for callers of this function.
The NodeId is saved in struct cpuinfo_x86.cpu_die_id.

Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and
remove the function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/events/amd/core.c       | 2 +-
 arch/x86/include/asm/processor.h | 2 --
 arch/x86/kernel/amd_nb.c         | 4 ++--
 arch/x86/kernel/cpu/amd.c        | 6 ------
 arch/x86/kernel/cpu/cacheinfo.c  | 2 +-
 arch/x86/kernel/cpu/mce/amd.c    | 4 ++--
 arch/x86/kernel/cpu/mce/inject.c | 4 ++--
 drivers/edac/amd64_edac.c        | 4 ++--
 drivers/edac/mce_amd.c           | 2 +-
 9 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 39eb276d0277..2c1791c4a518 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -538,7 +538,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	if (!x86_pmu.amd_nb_constraints)
 		return;
 
-	nb_id = amd_get_nb_id(cpu);
+	nb_id = topology_die_id(cpu);
 	WARN_ON_ONCE(nb_id == BAD_APICID);
 
 	for_each_online_cpu(i) {
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 60dbcdcb833f..a411466a6e74 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -815,10 +815,8 @@ extern int set_tsc_mode(unsigned int val);
 DECLARE_PER_CPU(u64, msr_misc_features_shadow);
 
 #ifdef CONFIG_CPU_SUP_AMD
-extern u16 amd_get_nb_id(int cpu);
 extern u32 amd_get_nodes_per_socket(void);
 #else
-static inline u16 amd_get_nb_id(int cpu)		{ return 0; }
 static inline u32 amd_get_nodes_per_socket(void)	{ return 0; }
 #endif
 
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 18f6b7c4bd79..b4396952c9a6 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -384,7 +384,7 @@ struct resource *amd_get_mmconfig_range(struct resource *res)
 
 int amd_get_subcaches(int cpu)
 {
-	struct pci_dev *link = node_to_amd_nb(amd_get_nb_id(cpu))->link;
+	struct pci_dev *link = node_to_amd_nb(topology_die_id(cpu))->link;
 	unsigned int mask;
 
 	if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
@@ -398,7 +398,7 @@ int amd_get_subcaches(int cpu)
 int amd_set_subcaches(int cpu, unsigned long mask)
 {
 	static unsigned int reset, ban;
-	struct amd_northbridge *nb = node_to_amd_nb(amd_get_nb_id(cpu));
+	struct amd_northbridge *nb = node_to_amd_nb(topology_die_id(cpu));
 	unsigned int reg;
 	int cuid;
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 2f1fbd8150af..1f71c7616917 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -424,12 +424,6 @@ static void amd_detect_ppin(struct cpuinfo_x86 *c)
 	clear_cpu_cap(c, X86_FEATURE_AMD_PPIN);
 }
 
-u16 amd_get_nb_id(int cpu)
-{
-	return per_cpu(cpu_llc_id, cpu);
-}
-EXPORT_SYMBOL_GPL(amd_get_nb_id);
-
 u32 amd_get_nodes_per_socket(void)
 {
 	return nodes_per_socket;
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index f9ac682e75e7..3ca9be482a9e 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -580,7 +580,7 @@ static void amd_init_l3_cache(struct _cpuid4_info_regs *this_leaf, int index)
 	if (index < 3)
 		return;
 
-	node = amd_get_nb_id(smp_processor_id());
+	node = topology_die_id(smp_processor_id());
 	this_leaf->nb = node_to_amd_nb(node);
 	if (this_leaf->nb && !this_leaf->nb->l3_cache.indices)
 		amd_calc_l3_indices(this_leaf->nb);
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 0c6b02dd744c..e486f96b3cb3 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1341,7 +1341,7 @@ static int threshold_create_bank(struct threshold_bank **bp, unsigned int cpu,
 		return -ENODEV;
 
 	if (is_shared_bank(bank)) {
-		nb = node_to_amd_nb(amd_get_nb_id(cpu));
+		nb = node_to_amd_nb(topology_die_id(cpu));
 
 		/* threshold descriptor already initialized on this node? */
 		if (nb && nb->bank4) {
@@ -1445,7 +1445,7 @@ static void threshold_remove_bank(struct threshold_bank *bank)
 		 * The last CPU on this node using the shared bank is going
 		 * away, remove that bank now.
 		 */
-		nb = node_to_amd_nb(amd_get_nb_id(smp_processor_id()));
+		nb = node_to_amd_nb(topology_die_id(smp_processor_id()));
 		nb->bank4 = NULL;
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 3a44346f2276..7b360731fc2d 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -522,8 +522,8 @@ static void do_inject(void)
 	if (boot_cpu_has(X86_FEATURE_AMD_DCM) &&
 	    b == 4 &&
 	    boot_cpu_data.x86 < 0x17) {
-		toggle_nb_mca_mst_cpu(amd_get_nb_id(cpu));
-		cpu = get_nbc_for_node(amd_get_nb_id(cpu));
+		toggle_nb_mca_mst_cpu(topology_die_id(cpu));
+		cpu = get_nbc_for_node(topology_die_id(cpu));
 	}
 
 	get_online_cpus();
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3bac76efd3f6..4e36d8494563 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1133,7 +1133,7 @@ static int k8_early_channel_count(struct amd64_pvt *pvt)
 /* On F10h and later ErrAddr is MC4_ADDR[47:1] */
 static u64 get_error_address(struct amd64_pvt *pvt, struct mce *m)
 {
-	u16 mce_nid = amd_get_nb_id(m->extcpu);
+	u16 mce_nid = topology_die_id(m->extcpu);
 	struct mem_ctl_info *mci;
 	u8 start_bit = 1;
 	u8 end_bit   = 47;
@@ -3038,7 +3038,7 @@ static void get_cpus_on_this_dct_cpumask(struct cpumask *mask, u16 nid)
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		if (amd_get_nb_id(cpu) == nid)
+		if (topology_die_id(cpu) == nid)
 			cpumask_set_cpu(cpu, mask);
 }
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 7f28edb070bd..85095e3902ec 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -869,7 +869,7 @@ static void decode_mc3_mce(struct mce *m)
 static void decode_mc4_mce(struct mce *m)
 {
 	unsigned int fam = x86_family(m->cpuid);
-	int node_id = amd_get_nb_id(m->extcpu);
+	int node_id = topology_die_id(m->extcpu);
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, 0x1f);
 	u8 offset = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
  2020-11-09 21:06 [PATCH 0/4] Set and use cpu_die_id on AMD-based systems Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 1/4] x86/CPU/AMD: Save AMD NodeId as cpu_die_id Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 2/4] x86/CPU/AMD: Remove amd_get_nb_id() Yazen Ghannam
@ 2020-11-09 21:06 ` Yazen Ghannam
  2020-11-09 21:06 ` [PATCH 4/4] x86/topology: Set cpu_die_id only if DIE_TYPE found Yazen Ghannam
  3 siblings, 0 replies; 5+ messages in thread
From: Yazen Ghannam @ 2020-11-09 21:06 UTC (permalink / raw)
  To: x86
  Cc: Yazen Ghannam, linux-kernel, Smita.KoralahalliChannabasappa,
	linux-edac, puwen, kim.phillips, thomas.lendacky, wei.huang2

From: Yazen Ghannam <yazen.ghannam@amd.com>

The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
later systems. This function is used in amd64_edac_mod to do
system-specific decoding for DRAM ECC errors. The function takes a
"NodeId" as a parameter.

In AMD documentation, NodeId is used to identify a physical die in a
system. This can be used to identify a node in the AMD_NB code and also
it is used with umc_normaddr_to_sysaddr().

However, the input used for decode_dram_ecc() is currently the NUMA node
of a logical CPU. In the default configuration, the NUMA node and
physical die will be equivalent, so this doesn't have an impact. But the
NUMA node configuration can be adjusted with optional memory
interleaving modes. This will cause the NUMA node enumeration to not
match the physical die enumeration. The mismatch will cause the address
translation function to fail or report incorrect results.

Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the
physical ID is used.

Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/mce_amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 85095e3902ec..5dd905a3f30c 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1003,7 +1003,7 @@ static void decode_smca_error(struct mce *m)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
 	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(cpu_to_node(m->extcpu), m);
+		decode_dram_ecc(topology_die_id(m->extcpu), m);
 }
 
 static inline void amd_decode_err_code(u16 ec)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] x86/topology: Set cpu_die_id only if DIE_TYPE found
  2020-11-09 21:06 [PATCH 0/4] Set and use cpu_die_id on AMD-based systems Yazen Ghannam
                   ` (2 preceding siblings ...)
  2020-11-09 21:06 ` [PATCH 3/4] EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId Yazen Ghannam
@ 2020-11-09 21:06 ` Yazen Ghannam
  3 siblings, 0 replies; 5+ messages in thread
From: Yazen Ghannam @ 2020-11-09 21:06 UTC (permalink / raw)
  To: x86
  Cc: Yazen Ghannam, linux-kernel, Smita.KoralahalliChannabasappa,
	linux-edac, puwen, kim.phillips, thomas.lendacky, wei.huang2

From: Yazen Ghannam <yazen.ghannam@amd.com>

CPUID Leaf 0x1F defines a DIE_TYPE level, but CPUID Leaf 0xB does not.
However, detect_extended_topology() will set struct
cpuinfo_x86.cpu_die_id regardless of whether a valid Die ID was found.

Only set cpu_die_id if a DIE_TYPE level is found. CPU topology code may
use another value for cpu_die_id, e.g. the AMD NodeId on AMD-based
systems. Code ordering should be maintained so that the CPUID Leaf 0x1F
Die ID value will take precedence on systems that may use another value.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/topology.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index d3a0791bc052..1068002c8532 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -96,6 +96,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
 	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
 	unsigned int core_select_mask, core_level_siblings;
 	unsigned int die_select_mask, die_level_siblings;
+	bool die_level_present = false;
 	int leaf;
 
 	leaf = detect_extended_topology_leaf(c);
@@ -126,6 +127,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
 			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 		}
 		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
+			die_level_present = true;
 			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
 			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 		}
@@ -139,8 +141,12 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
 
 	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
 				ht_mask_width) & core_select_mask;
-	c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
-				core_plus_mask_width) & die_select_mask;
+
+	if (die_level_present) {
+		c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
+					core_plus_mask_width) & die_select_mask;
+	}
+
 	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
 				die_plus_mask_width);
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-11-09 21:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-09 21:06 [PATCH 0/4] Set and use cpu_die_id on AMD-based systems Yazen Ghannam
2020-11-09 21:06 ` [PATCH 1/4] x86/CPU/AMD: Save AMD NodeId as cpu_die_id Yazen Ghannam
2020-11-09 21:06 ` [PATCH 2/4] x86/CPU/AMD: Remove amd_get_nb_id() Yazen Ghannam
2020-11-09 21:06 ` [PATCH 3/4] EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId Yazen Ghannam
2020-11-09 21:06 ` [PATCH 4/4] x86/topology: Set cpu_die_id only if DIE_TYPE found Yazen Ghannam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).