All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/18] AMD64 EDAC Cleanup and Refactor
@ 2022-05-09 14:55 Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 01/18] EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ Yazen Ghannam
                   ` (17 more replies)
  0 siblings, 18 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

Hi Boris,

Please apply this set of cleanup and refactor patches.

Patches 1-4 remove unusable and useless code for Family 17h and later
systems.

Patches 5-18 refactor a good amount of the module code to prep for
adding GPU support. Initial patches and review here:

https://lkml.kernel.org/r/20220228161354.54923-1-nchatrad@amd.com

After reflecting on things more, I think the best approach is to split
the code between modern and legacy systems. It's possible to redo some
of the legacy code, but I think it's better to just leave it as-is. I'd
like to avoid introducing new bugs or changing existing behavior for
legacy systems.

Thanks!

-Yazen

Muralidhara M K (14):
  EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  EDAC/amd64: Add prep_chip_selects() into pvt->ops
  EDAC/amd64: Add read_base_mask() into pvt->ops
  EDAC/amd64: Add determine_memory_type() into pvt->ops
  EDAC/amd64: Add get_ecc_sym_sz() into pvt->ops
  EDAC/amd64: Add read_mc_regs() into pvt->ops
  EDAC/amd64: Add ecc_enabled() into pvt->ops
  EDAC/amd64: Add determine_edac_cap() into pvt->ops
  EDAC/amd64: Add determine_edac_ctl_cap() into pvt->ops
  EDAC/amd64: Add setup_mci_misc_attrs() into pvt->ops
  EDAC/amd64: Add init_csrows() into pvt->ops
  EDAC/amd64: Add dump_misc_regs() into pvt->ops
  EDAC/amd64: Add get_cs_mode() into pvt->ops
  EDAC/amd64: Add get_err_info() into pvt->ops

Yazen Ghannam (4):
  EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
  EDAC/amd64: Remove scrub rate control for Family 17h and later
  EDAC/amd64: Remove PCI Function 6
  EDAC/amd64: Remove PCI Function 0

 drivers/edac/amd64_edac.c | 756 +++++++++++++++-----------------------
 drivers/edac/amd64_edac.h |  97 ++---
 2 files changed, 331 insertions(+), 522 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/18] EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later Yazen Ghannam
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

EDAC PCI control is used to detect/report legacy PCI errors like
"Parity" and "SERROR". Modern AMD systems use PCIe Advanced Error
Reporting (AER), and legacy PCI errors should not be reported.

Remove EDAC PCI control setup on AMD Family 17h and later systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2f854feeeb23..04fa96592317 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4367,12 +4367,12 @@ static int __init amd64_edac_init(void)
 	}
 
 	/* register stuff with EDAC MCE */
-	if (boot_cpu_data.x86 >= 0x17)
+	if (boot_cpu_data.x86 >= 0x17) {
 		amd_register_ecc_decoder(decode_umc_error);
-	else
+	} else {
 		amd_register_ecc_decoder(decode_bus_error);
-
-	setup_pci_device();
+		setup_pci_device();
+	}
 
 #ifdef CONFIG_X86_32
 	amd64_err("%s on 32-bit is unsupported. USE AT YOUR OWN RISK!\n", EDAC_MOD_STR);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 01/18] EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-11 10:26   ` Borislav Petkov
  2022-05-09 14:55 ` [PATCH 03/18] EDAC/amd64: Remove PCI Function 6 Yazen Ghannam
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

The scrub registers on AMD Family 17h and later may be inaccessible to
the OS. Furthermore, hardware designers recommend that the scrubbing
feature is managed by the firmware.

Remove support for the sdram_scrub_rate interface for AMD Family 17h
systems and later. Also, return an -EPERM code for these systems. This
matches the description in Documentation/ABI for this file. Also, this
matches the behavior that the OS is not permitted to modify the scrub
registers.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 31 +++++--------------------------
 drivers/edac/amd64_edac.h |  2 --
 2 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 04fa96592317..3ec7eb4ceb4e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -182,21 +182,6 @@ static inline int amd64_read_dct_pci_cfg(struct amd64_pvt *pvt, u8 dct,
  * other archs, we might not have access to the caches directly.
  */
 
-static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval)
-{
-	/*
-	 * Fam17h supports scrub values between 0x5 and 0x14. Also, the values
-	 * are shifted down by 0x5, so scrubval 0x5 is written to the register
-	 * as 0x0, scrubval 0x6 as 0x1, etc.
-	 */
-	if (scrubval >= 0x5 && scrubval <= 0x14) {
-		scrubval -= 0x5;
-		pci_write_bits32(pvt->F6, F17H_SCR_LIMIT_ADDR, scrubval, 0xF);
-		pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 1, 0x1);
-	} else {
-		pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 0, 0x1);
-	}
-}
 /*
  * Scan the scrub rate mapping table for a close or matching bandwidth value to
  * issue. If requested is too big, then use last maximum value found.
@@ -229,9 +214,7 @@ static int __set_scrub_rate(struct amd64_pvt *pvt, u32 new_bw, u32 min_rate)
 
 	scrubval = scrubrates[i].scrubval;
 
-	if (pvt->umc) {
-		__f17h_set_scrubval(pvt, scrubval);
-	} else if (pvt->fam == 0x15 && pvt->model == 0x60) {
+	if (pvt->fam == 0x15 && pvt->model == 0x60) {
 		f15h_select_dct(pvt, 0);
 		pci_write_bits32(pvt->F2, F15H_M60H_SCRCTRL, scrubval, 0x001F);
 		f15h_select_dct(pvt, 1);
@@ -251,6 +234,9 @@ static int set_scrub_rate(struct mem_ctl_info *mci, u32 bw)
 	struct amd64_pvt *pvt = mci->pvt_info;
 	u32 min_scrubrate = 0x5;
 
+	if (pvt->umc)
+		return -EPERM;
+
 	if (pvt->fam == 0xf)
 		min_scrubrate = 0x0;
 
@@ -272,14 +258,7 @@ static int get_scrub_rate(struct mem_ctl_info *mci)
 	u32 scrubval = 0;
 
 	if (pvt->umc) {
-		amd64_read_pci_cfg(pvt->F6, F17H_SCR_BASE_ADDR, &scrubval);
-		if (scrubval & BIT(0)) {
-			amd64_read_pci_cfg(pvt->F6, F17H_SCR_LIMIT_ADDR, &scrubval);
-			scrubval &= 0xF;
-			scrubval += 0x5;
-		} else {
-			scrubval = 0;
-		}
+		return -EPERM;
 	} else if (pvt->fam == 0x15) {
 		/* Erratum #505 */
 		if (pvt->model < 0x10)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 38e5ad95d010..48f1d97e1274 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -215,8 +215,6 @@
 #define DCT_SEL_HI			0x114
 
 #define F15H_M60H_SCRCTRL		0x1C8
-#define F17H_SCR_BASE_ADDR		0x48
-#define F17H_SCR_LIMIT_ADDR		0x4C
 
 /*
  * Function 3 - Misc Control
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/18] EDAC/amd64: Remove PCI Function 6
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 01/18] EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 04/18] EDAC/amd64: Remove PCI Function 0 Yazen Ghannam
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

PCI Function 6 is used on Family 17h and later to access scrub
registers. With scrub access removed, this function has no other use.

Remove all Function 6 PCI IDs and related code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 22 +---------------------
 drivers/edac/amd64_edac.h | 10 +---------
 2 files changed, 2 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3ec7eb4ceb4e..b2f7c14a287c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2911,7 +2911,6 @@ static struct amd64_family_type family_types[] = {
 	[F17_CPUS] = {
 		.ctl_name = "F17h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2921,7 +2920,6 @@ static struct amd64_family_type family_types[] = {
 	[F17_M10H_CPUS] = {
 		.ctl_name = "F17h_M10h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2931,7 +2929,6 @@ static struct amd64_family_type family_types[] = {
 	[F17_M30H_CPUS] = {
 		.ctl_name = "F17h_M30h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
 		.max_mcs = 8,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2941,7 +2938,6 @@ static struct amd64_family_type family_types[] = {
 	[F17_M60H_CPUS] = {
 		.ctl_name = "F17h_M60h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2951,7 +2947,6 @@ static struct amd64_family_type family_types[] = {
 	[F17_M70H_CPUS] = {
 		.ctl_name = "F17h_M70h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2961,7 +2956,6 @@ static struct amd64_family_type family_types[] = {
 	[F19_CPUS] = {
 		.ctl_name = "F19h",
 		.f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6,
 		.max_mcs = 8,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2971,7 +2965,6 @@ static struct amd64_family_type family_types[] = {
 	[F19_M10H_CPUS] = {
 		.ctl_name = "F19h_M10h",
 		.f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6,
 		.max_mcs = 12,
 		.flags.zn_regs_v2 = 1,
 		.ops = {
@@ -2982,7 +2975,6 @@ static struct amd64_family_type family_types[] = {
 	[F19_M50H_CPUS] = {
 		.ctl_name = "F19h_M50h",
 		.f0_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F6,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -3295,7 +3287,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 /*
  * Use pvt->F3 which contains the F3 CPU PCI device to get the related
  * F1 (AddrMap) and F2 (Dct) devices. Return negative value on error.
- * Reserve F0 and F6 on systems with a UMC.
+ * Reserve F0 on systems with a UMC.
  */
 static int
 reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
@@ -3307,21 +3299,11 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
 			return -ENODEV;
 		}
 
-		pvt->F6 = pci_get_related_function(pvt->F3->vendor, pci_id2, pvt->F3);
-		if (!pvt->F6) {
-			pci_dev_put(pvt->F0);
-			pvt->F0 = NULL;
-
-			edac_dbg(1, "F6 not found: device 0x%x\n", pci_id2);
-			return -ENODEV;
-		}
-
 		if (!pci_ctl_dev)
 			pci_ctl_dev = &pvt->F0->dev;
 
 		edac_dbg(1, "F0: %s\n", pci_name(pvt->F0));
 		edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));
-		edac_dbg(1, "F6: %s\n", pci_name(pvt->F6));
 
 		return 0;
 	}
@@ -3357,7 +3339,6 @@ static void free_mc_sibling_devs(struct amd64_pvt *pvt)
 {
 	if (pvt->umc) {
 		pci_dev_put(pvt->F0);
-		pci_dev_put(pvt->F6);
 	} else {
 		pci_dev_put(pvt->F1);
 		pci_dev_put(pvt->F2);
@@ -4080,7 +4061,6 @@ static int hw_info_get(struct amd64_pvt *pvt)
 			return -ENOMEM;
 
 		pci_id1 = fam_type->f0_id;
-		pci_id2 = fam_type->f6_id;
 	} else {
 		pci_id1 = fam_type->f1_id;
 		pci_id2 = fam_type->f2_id;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 48f1d97e1274..2c7b49479aa9 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -115,21 +115,13 @@
 #define PCI_DEVICE_ID_AMD_16H_M30H_NB_F1 0x1581
 #define PCI_DEVICE_ID_AMD_16H_M30H_NB_F2 0x1582
 #define PCI_DEVICE_ID_AMD_17H_DF_F0	0x1460
-#define PCI_DEVICE_ID_AMD_17H_DF_F6	0x1466
 #define PCI_DEVICE_ID_AMD_17H_M10H_DF_F0 0x15e8
-#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F6 0x15ee
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F0 0x1490
-#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F6 0x1496
 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F0 0x1448
-#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F6 0x144e
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F0 0x1440
-#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
 #define PCI_DEVICE_ID_AMD_19H_DF_F0	0x1650
-#define PCI_DEVICE_ID_AMD_19H_DF_F6	0x1656
 #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F0 0x14ad
-#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F6 0x14b3
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F0 0x166a
-#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F6 0x1670
 
 /*
  * Function 1 - Address Map
@@ -501,7 +493,7 @@ struct amd64_family_flags {
 
 struct amd64_family_type {
 	const char *ctl_name;
-	u16 f0_id, f1_id, f2_id, f6_id;
+	u16 f0_id, f1_id, f2_id;
 	/* Maximum number of memory controllers per die/node. */
 	u8 max_mcs;
 	struct amd64_family_flags flags;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/18] EDAC/amd64: Remove PCI Function 0
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (2 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 03/18] EDAC/amd64: Remove PCI Function 6 Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-11 10:34   ` Borislav Petkov
  2022-05-09 14:55 ` [PATCH 05/18] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt Yazen Ghannam
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

PCI Function 0 is used on Family 17h and later only to read the "dhar"
value. This value is printed and provided through a module-specific
debug sysfs file. The value is not used for any Family 17h and later
code, and it does not have any apparent debug value on these systems.

Remove "dhar", Function 0 PCI IDs, and all related code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 38 +++++---------------------------------
 drivers/edac/amd64_edac.h | 10 +---------
 2 files changed, 6 insertions(+), 42 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b2f7c14a287c..e0f99b8edc97 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1433,9 +1433,6 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 
 		debug_display_dimm_sizes_df(pvt, i);
 	}
-
-	edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n",
-		 pvt->dhar, dhar_base(pvt));
 }
 
 /* Display and decode various NB registers for debug purposes. */
@@ -1470,6 +1467,8 @@ static void __dump_misc_regs(struct amd64_pvt *pvt)
 	/* Only if NOT ganged does dclr1 have valid info */
 	if (!dct_ganging_enabled(pvt))
 		debug_dump_dramcfg_low(pvt, pvt->dclr1, 1);
+
+	edac_dbg(1, "  DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no");
 }
 
 /* Display and decode various NB registers for debug purposes. */
@@ -1480,8 +1479,6 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
 	else
 		__dump_misc_regs(pvt);
 
-	edac_dbg(1, "  DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no");
-
 	amd64_info("using x%u syndromes.\n", pvt->ecc_sym_sz);
 }
 
@@ -2910,7 +2907,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F17_CPUS] = {
 		.ctl_name = "F17h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2919,7 +2915,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F17_M10H_CPUS] = {
 		.ctl_name = "F17h_M10h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2928,7 +2923,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F17_M30H_CPUS] = {
 		.ctl_name = "F17h_M30h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
 		.max_mcs = 8,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2937,7 +2931,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F17_M60H_CPUS] = {
 		.ctl_name = "F17h_M60h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2946,7 +2939,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F17_M70H_CPUS] = {
 		.ctl_name = "F17h_M70h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2955,7 +2947,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F19_CPUS] = {
 		.ctl_name = "F19h",
-		.f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
 		.max_mcs = 8,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -2964,7 +2955,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F19_M10H_CPUS] = {
 		.ctl_name = "F19h_M10h",
-		.f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0,
 		.max_mcs = 12,
 		.flags.zn_regs_v2 = 1,
 		.ops = {
@@ -2974,7 +2964,6 @@ static struct amd64_family_type family_types[] = {
 	},
 	[F19_M50H_CPUS] = {
 		.ctl_name = "F19h_M50h",
-		.f0_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F0,
 		.max_mcs = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
@@ -3287,26 +3276,12 @@ static void decode_umc_error(int node_id, struct mce *m)
 /*
  * Use pvt->F3 which contains the F3 CPU PCI device to get the related
  * F1 (AddrMap) and F2 (Dct) devices. Return negative value on error.
- * Reserve F0 on systems with a UMC.
  */
 static int
 reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
 {
-	if (pvt->umc) {
-		pvt->F0 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3);
-		if (!pvt->F0) {
-			edac_dbg(1, "F0 not found, device 0x%x\n", pci_id1);
-			return -ENODEV;
-		}
-
-		if (!pci_ctl_dev)
-			pci_ctl_dev = &pvt->F0->dev;
-
-		edac_dbg(1, "F0: %s\n", pci_name(pvt->F0));
-		edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));
-
+	if (pvt->umc)
 		return 0;
-	}
 
 	/* Reserve the ADDRESS MAP Device */
 	pvt->F1 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3);
@@ -3338,7 +3313,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
 static void free_mc_sibling_devs(struct amd64_pvt *pvt)
 {
 	if (pvt->umc) {
-		pci_dev_put(pvt->F0);
+		return;
 	} else {
 		pci_dev_put(pvt->F1);
 		pci_dev_put(pvt->F2);
@@ -3428,7 +3403,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 
 	if (pvt->umc) {
 		__read_mc_regs_df(pvt);
-		amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
 
 		goto skip;
 	}
@@ -4059,8 +4033,6 @@ static int hw_info_get(struct amd64_pvt *pvt)
 		pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
 		if (!pvt->umc)
 			return -ENOMEM;
-
-		pci_id1 = fam_type->f0_id;
 	} else {
 		pci_id1 = fam_type->f1_id;
 		pci_id2 = fam_type->f2_id;
@@ -4077,7 +4049,7 @@ static int hw_info_get(struct amd64_pvt *pvt)
 
 static void hw_info_put(struct amd64_pvt *pvt)
 {
-	if (pvt->F0 || pvt->F1)
+	if (pvt->F1)
 		free_mc_sibling_devs(pvt);
 
 	kfree(pvt->umc);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 2c7b49479aa9..7cfac50d6558 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -114,14 +114,6 @@
 #define PCI_DEVICE_ID_AMD_16H_NB_F2	0x1532
 #define PCI_DEVICE_ID_AMD_16H_M30H_NB_F1 0x1581
 #define PCI_DEVICE_ID_AMD_16H_M30H_NB_F2 0x1582
-#define PCI_DEVICE_ID_AMD_17H_DF_F0	0x1460
-#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F0 0x15e8
-#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F0 0x1490
-#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F0 0x1448
-#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F0 0x1440
-#define PCI_DEVICE_ID_AMD_19H_DF_F0	0x1650
-#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F0 0x14ad
-#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F0 0x166a
 
 /*
  * Function 1 - Address Map
@@ -493,7 +485,7 @@ struct amd64_family_flags {
 
 struct amd64_family_type {
 	const char *ctl_name;
-	u16 f0_id, f1_id, f2_id;
+	u16 f1_id, f2_id;
 	/* Maximum number of memory controllers per die/node. */
 	u8 max_mcs;
 	struct amd64_family_flags flags;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/18] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (3 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 04/18] EDAC/amd64: Remove PCI Function 0 Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-13 15:21   ` Borislav Petkov
  2022-05-09 14:55 ` [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops Yazen Ghannam
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

Future AMD systems will support heterogeneous "AMD Node" types, e.g.
CPU+GPU. Therefore, a global "family type" shared across all "AMD Nodes"
is no longer appropriate.

Move struct low_ops routines and members of struct amd64_family_type
to struct amd64_pvt.

Currently, there are many code branches that split between "modern" and
"legacy" systems. Another code branch will be needed in order to cover
"GPU" cases. However, rather than introduce another branching case in
multiple functions, the current branching code should be switched to a
set of function pointers. This change makes the code more readable and
simplifies adding support for new families/models.

In order to reuse code, define two sets of function pointers. Use one
for modern systems (Family 17h and later). This will not change between
current CPU families. Use another set of function pointers for legacy
systems (before Family 17h). Use the Family 16h versions as default
for the legacy ops since these are the latest, and adjust the function
pointers as needed for older families.

Rename the Family 17h functions to use a "umc" prefix. This is to
indicate that the functions apply to all modern CPU familes (17h, 18h,
and 19h) which all have Unified Memory Controllers (UMCs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 353 +++++++++++++-------------------------
 drivers/edac/amd64_edac.h |  66 +++----
 2 files changed, 140 insertions(+), 279 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index e0f99b8edc97..3d569290d4cf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -13,11 +13,9 @@ module_param(ecc_enable_override, int, 0644);
 
 static struct msr __percpu *msrs;
 
-static struct amd64_family_type *fam_type;
-
-static inline u32 get_umc_reg(u32 reg)
+static inline u32 get_umc_reg(struct amd64_pvt *pvt, u32 reg)
 {
-	if (!fam_type->flags.zn_regs_v2)
+	if (!pvt->flags.zn_regs_v2)
 		return reg;
 
 	switch (reg) {
@@ -442,7 +440,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
 	for (i = 0; i < pvt->csels[dct].m_cnt; i++)
 
 #define for_each_umc(i) \
-	for (i = 0; i < fam_type->max_mcs; i++)
+	for (i = 0; i < pvt->max_mcs; i++)
 
 /*
  * @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -1425,7 +1423,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 
 		if (umc->dram_type == MEM_LRDDR4 || umc->dram_type == MEM_LRDDR5) {
 			amd_smn_read(pvt->mc_node_id,
-				     umc_base + get_umc_reg(UMCCH_ADDR_CFG),
+				     umc_base + get_umc_reg(pvt, UMCCH_ADDR_CFG),
 				     &tmp);
 			edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
 					i, 1 << ((tmp >> 4) & 0x3));
@@ -1498,7 +1496,7 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 		for_each_umc(umc) {
 			pvt->csels[umc].b_cnt = 4;
-			pvt->csels[umc].m_cnt = fam_type->flags.zn_regs_v2 ? 4 : 2;
+			pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2;
 		}
 
 	} else {
@@ -1538,7 +1536,7 @@ static void read_umc_base_mask(struct amd64_pvt *pvt)
 		}
 
 		umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
-		umc_mask_reg_sec = get_umc_base(umc) + get_umc_reg(UMCCH_ADDR_MASK_SEC);
+		umc_mask_reg_sec = get_umc_base(umc) + get_umc_reg(pvt, UMCCH_ADDR_MASK_SEC);
 
 		for_each_chip_select_mask(cs, umc, pvt) {
 			mask = &pvt->csels[umc].csmasks[cs];
@@ -1626,7 +1624,7 @@ static void determine_memory_type_df(struct amd64_pvt *pvt)
 		 * Check if the system supports the "DDR Type" field in UMC Config
 		 * and has DDR5 DIMMs in use.
 		 */
-		if (fam_type->flags.zn_regs_v2 && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) {
+		if (pvt->flags.zn_regs_v2 && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) {
 			if (umc->dimm_cfg & BIT(5))
 				umc->dram_type = MEM_LRDDR5;
 			else if (umc->dimm_cfg & BIT(4))
@@ -2027,7 +2025,7 @@ static int f1x_early_channel_count(struct amd64_pvt *pvt)
 	return channels;
 }
 
-static int f17_early_channel_count(struct amd64_pvt *pvt)
+static int umc_early_channel_count(struct amd64_pvt *pvt)
 {
 	int i, channels = 0;
 
@@ -2167,7 +2165,7 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
 		return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 				    unsigned int cs_mode, int csrow_nr)
 {
 	u32 addr_mask_orig, addr_mask_deinterleaved;
@@ -2207,7 +2205,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	 */
 	dimm = csrow_nr >> 1;
 
-	if (!fam_type->flags.zn_regs_v2)
+	if (!pvt->flags.zn_regs_v2)
 		cs_mask_nr >>= 1;
 
 	/* Asymmetric dual-rank DIMM support. */
@@ -2827,151 +2825,6 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
 	}
 }
 
-static struct amd64_family_type family_types[] = {
-	[K8_CPUS] = {
-		.ctl_name = "K8",
-		.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
-		.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= k8_early_channel_count,
-			.map_sysaddr_to_csrow	= k8_map_sysaddr_to_csrow,
-			.dbam_to_cs		= k8_dbam_to_chip_select,
-		}
-	},
-	[F10_CPUS] = {
-		.ctl_name = "F10h",
-		.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
-		.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f1x_early_channel_count,
-			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs		= f10_dbam_to_chip_select,
-		}
-	},
-	[F15_CPUS] = {
-		.ctl_name = "F15h",
-		.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f1x_early_channel_count,
-			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs		= f15_dbam_to_chip_select,
-		}
-	},
-	[F15_M30H_CPUS] = {
-		.ctl_name = "F15h_M30h",
-		.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f1x_early_channel_count,
-			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs		= f16_dbam_to_chip_select,
-		}
-	},
-	[F15_M60H_CPUS] = {
-		.ctl_name = "F15h_M60h",
-		.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f1x_early_channel_count,
-			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs		= f15_m60h_dbam_to_chip_select,
-		}
-	},
-	[F16_CPUS] = {
-		.ctl_name = "F16h",
-		.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f1x_early_channel_count,
-			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs		= f16_dbam_to_chip_select,
-		}
-	},
-	[F16_M30H_CPUS] = {
-		.ctl_name = "F16h_M30h",
-		.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f1x_early_channel_count,
-			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs		= f16_dbam_to_chip_select,
-		}
-	},
-	[F17_CPUS] = {
-		.ctl_name = "F17h",
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M10H_CPUS] = {
-		.ctl_name = "F17h_M10h",
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M30H_CPUS] = {
-		.ctl_name = "F17h_M30h",
-		.max_mcs = 8,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M60H_CPUS] = {
-		.ctl_name = "F17h_M60h",
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M70H_CPUS] = {
-		.ctl_name = "F17h_M70h",
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F19_CPUS] = {
-		.ctl_name = "F19h",
-		.max_mcs = 8,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F19_M10H_CPUS] = {
-		.ctl_name = "F19h_M10h",
-		.max_mcs = 12,
-		.flags.zn_regs_v2 = 1,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-	[F19_M50H_CPUS] = {
-		.ctl_name = "F19h_M50h",
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_addr_mask_to_cs_size,
-		}
-	},
-};
-
 /*
  * These are tables of eigenvectors (one per line) which can be used for the
  * construction of the syndrome tables. The modified syndrome search algorithm
@@ -3368,7 +3221,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
 		umc_base = get_umc_base(i);
 		umc = &pvt->umc[i];
 
-		amd_smn_read(nid, umc_base + get_umc_reg(UMCCH_DIMM_CFG), &umc->dimm_cfg);
+		amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &umc->dimm_cfg);
 		amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
 		amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
 		amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
@@ -3897,7 +3750,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 
 	mci->edac_cap		= determine_edac_cap(pvt);
 	mci->mod_name		= EDAC_MOD_STR;
-	mci->ctl_name		= fam_type->ctl_name;
+	mci->ctl_name		= pvt->ctl_name;
 	mci->dev_name		= pci_name(pvt->F3);
 	mci->ctl_page_to_phys	= NULL;
 
@@ -3906,114 +3759,149 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 	mci->get_sdram_scrub_rate = get_scrub_rate;
 }
 
-/*
- * returns a pointer to the family descriptor on success, NULL otherwise.
- */
-static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
+static struct low_ops umc_ops = {
+	.early_channel_count		= umc_early_channel_count,
+	.dbam_to_cs			= umc_addr_mask_to_cs_size,
+};
+
+/* Use Family 16h versions for defaults and adjust as needed below. */
+static struct low_ops dct_ops = {
+	.early_channel_count		= f1x_early_channel_count,
+	.map_sysaddr_to_csrow		= f1x_map_sysaddr_to_csrow,
+	.dbam_to_cs			= f16_dbam_to_chip_select,
+};
+
+static int per_family_init(struct amd64_pvt *pvt)
 {
 	pvt->ext_model  = boot_cpu_data.x86_model >> 4;
 	pvt->stepping	= boot_cpu_data.x86_stepping;
 	pvt->model	= boot_cpu_data.x86_model;
 	pvt->fam	= boot_cpu_data.x86;
+	pvt->max_mcs	= 2;
+
+	/*
+	 * Decide on which ops group to use here and do any family/model
+	 * overrides below.
+	 */
+	if (pvt->fam >= 0x17)
+		pvt->ops = &umc_ops;
+	else
+		pvt->ops = &dct_ops;
 
 	switch (pvt->fam) {
 	case 0xf:
-		fam_type	= &family_types[K8_CPUS];
-		pvt->ops	= &family_types[K8_CPUS].ops;
+		pvt->ctl_name				= (pvt->ext_model > K8_REV_F) ?
+							  "K8 revF or later" : "K8 revE or earlier";
+		pvt->f1_id				= PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
+		pvt->f2_id				= PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
+		pvt->ops->early_channel_count		= k8_early_channel_count;
+		pvt->ops->map_sysaddr_to_csrow		= k8_map_sysaddr_to_csrow;
+		pvt->ops->dbam_to_cs			= k8_dbam_to_chip_select;
 		break;
 
 	case 0x10:
-		fam_type	= &family_types[F10_CPUS];
-		pvt->ops	= &family_types[F10_CPUS].ops;
+		pvt->ctl_name				= "F10h";
+		pvt->f1_id				= PCI_DEVICE_ID_AMD_10H_NB_MAP;
+		pvt->f2_id				= PCI_DEVICE_ID_AMD_10H_NB_DRAM;
+		pvt->ops->dbam_to_cs			= f10_dbam_to_chip_select;
 		break;
 
 	case 0x15:
-		if (pvt->model == 0x30) {
-			fam_type = &family_types[F15_M30H_CPUS];
-			pvt->ops = &family_types[F15_M30H_CPUS].ops;
+		switch (pvt->model) {
+		case 0x30:
+			pvt->ctl_name			= "F15h_M30h";
+			pvt->f1_id			= PCI_DEVICE_ID_AMD_15H_M30H_NB_F1;
+			pvt->f2_id			= PCI_DEVICE_ID_AMD_15H_M30H_NB_F2;
 			break;
-		} else if (pvt->model == 0x60) {
-			fam_type = &family_types[F15_M60H_CPUS];
-			pvt->ops = &family_types[F15_M60H_CPUS].ops;
+		case 0x60:
+			pvt->ctl_name			= "F15h_M60h";
+			pvt->f1_id			= PCI_DEVICE_ID_AMD_15H_M60H_NB_F1;
+			pvt->f2_id			= PCI_DEVICE_ID_AMD_15H_M60H_NB_F2;
+			pvt->ops->dbam_to_cs		= f15_m60h_dbam_to_chip_select;
+			break;
+		case 0x13:
+			/* Richland is only client */
+			return -ENODEV;
+		default:
+			pvt->ctl_name			= "F15h";
+			pvt->f1_id			= PCI_DEVICE_ID_AMD_15H_NB_F1;
+			pvt->f2_id			= PCI_DEVICE_ID_AMD_15H_NB_F2;
+			pvt->ops->dbam_to_cs		= f15_dbam_to_chip_select;
 			break;
-		/* Richland is only client */
-		} else if (pvt->model == 0x13) {
-			return NULL;
-		} else {
-			fam_type	= &family_types[F15_CPUS];
-			pvt->ops	= &family_types[F15_CPUS].ops;
 		}
 		break;
 
 	case 0x16:
-		if (pvt->model == 0x30) {
-			fam_type = &family_types[F16_M30H_CPUS];
-			pvt->ops = &family_types[F16_M30H_CPUS].ops;
+		switch (pvt->model) {
+		case 0x30:
+			pvt->ctl_name			= "F16h_M30h";
+			pvt->f1_id			= PCI_DEVICE_ID_AMD_16H_M30H_NB_F1;
+			pvt->f2_id			= PCI_DEVICE_ID_AMD_16H_M30H_NB_F2;
+			break;
+		default:
+			pvt->ctl_name			= "F16h";
+			pvt->f1_id			= PCI_DEVICE_ID_AMD_16H_NB_F1;
+			pvt->f2_id			= PCI_DEVICE_ID_AMD_16H_NB_F2;
 			break;
 		}
-		fam_type	= &family_types[F16_CPUS];
-		pvt->ops	= &family_types[F16_CPUS].ops;
 		break;
 
 	case 0x17:
-		if (pvt->model >= 0x10 && pvt->model <= 0x2f) {
-			fam_type = &family_types[F17_M10H_CPUS];
-			pvt->ops = &family_types[F17_M10H_CPUS].ops;
+		switch (pvt->model) {
+		case 0x10 ... 0x2f:
+			pvt->ctl_name			= "F17h_M10h";
 			break;
-		} else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
-			fam_type = &family_types[F17_M30H_CPUS];
-			pvt->ops = &family_types[F17_M30H_CPUS].ops;
+		case 0x30 ... 0x3f:
+			pvt->ctl_name			= "F17h_M30h";
+			pvt->max_mcs			= 8;
 			break;
-		} else if (pvt->model >= 0x60 && pvt->model <= 0x6f) {
-			fam_type = &family_types[F17_M60H_CPUS];
-			pvt->ops = &family_types[F17_M60H_CPUS].ops;
+		case 0x60 ... 0x6f:
+			pvt->ctl_name			= "F17h_M60h";
 			break;
-		} else if (pvt->model >= 0x70 && pvt->model <= 0x7f) {
-			fam_type = &family_types[F17_M70H_CPUS];
-			pvt->ops = &family_types[F17_M70H_CPUS].ops;
+		case 0x70 ... 0x7f:
+			pvt->ctl_name			= "F17h_M70h";
+			break;
+		default:
+			pvt->ctl_name			= "F17h";
 			break;
 		}
-		fallthrough;
-	case 0x18:
-		fam_type	= &family_types[F17_CPUS];
-		pvt->ops	= &family_types[F17_CPUS].ops;
+		break;
 
-		if (pvt->fam == 0x18)
-			family_types[F17_CPUS].ctl_name = "F18h";
+	case 0x18:
+		pvt->ctl_name				= "F18h";
 		break;
 
 	case 0x19:
-		if (pvt->model >= 0x10 && pvt->model <= 0x1f) {
-			fam_type = &family_types[F19_M10H_CPUS];
-			pvt->ops = &family_types[F19_M10H_CPUS].ops;
+		switch (pvt->model) {
+		case 0x00 ... 0x0f:
+			pvt->ctl_name			= "F19h";
+			pvt->max_mcs			= 8;
+			break;
+		case 0x10 ... 0x1f:
+			pvt->ctl_name			= "F19h_M10h";
+			pvt->max_mcs			= 12;
+			pvt->flags.zn_regs_v2		= 1;
 			break;
-		} else if (pvt->model >= 0x20 && pvt->model <= 0x2f) {
-			fam_type = &family_types[F17_M70H_CPUS];
-			pvt->ops = &family_types[F17_M70H_CPUS].ops;
-			fam_type->ctl_name = "F19h_M20h";
+		case 0x20 ... 0x2f:
+			pvt->ctl_name			= "F19h_M20h";
 			break;
-		} else if (pvt->model >= 0x50 && pvt->model <= 0x5f) {
-			fam_type = &family_types[F19_M50H_CPUS];
-			pvt->ops = &family_types[F19_M50H_CPUS].ops;
-			fam_type->ctl_name = "F19h_M50h";
+		case 0x50 ... 0x5f:
+			pvt->ctl_name			= "F19h_M50h";
 			break;
-		} else if (pvt->model >= 0xa0 && pvt->model <= 0xaf) {
-			fam_type = &family_types[F19_M10H_CPUS];
-			pvt->ops = &family_types[F19_M10H_CPUS].ops;
-			fam_type->ctl_name = "F19h_MA0h";
+		case 0xa0 ... 0xaf:
+			pvt->ctl_name			= "F19h_MA0h";
+			pvt->max_mcs			= 12;
+			pvt->flags.zn_regs_v2		= 1;
 			break;
 		}
-		fam_type	= &family_types[F19_CPUS];
-		pvt->ops	= &family_types[F19_CPUS].ops;
-		family_types[F19_CPUS].ctl_name = "F19h";
 		break;
 
 	default:
 		amd64_err("Unsupported family!\n");
-		return NULL;
+		return -ENODEV;
 	}
 
-	return fam_type;
+	return 0;
 }
 
 static const struct attribute_group *amd64_edac_attr_groups[] = {
@@ -4030,12 +3918,12 @@ static int hw_info_get(struct amd64_pvt *pvt)
 	int ret;
 
 	if (pvt->fam >= 0x17) {
-		pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
+		pvt->umc = kcalloc(pvt->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
 		if (!pvt->umc)
 			return -ENOMEM;
 	} else {
-		pci_id1 = fam_type->f1_id;
-		pci_id2 = fam_type->f2_id;
+		pci_id1 = pvt->f1_id;
+		pci_id2 = pvt->f2_id;
 	}
 
 	ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
@@ -4081,7 +3969,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
 	 */
-	layers[1].size = fam_type->max_mcs;
+	layers[1].size = pvt->max_mcs;
 	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -4111,7 +3999,7 @@ static bool instance_has_memory(struct amd64_pvt *pvt)
 	bool cs_enabled = false;
 	int cs = 0, dct = 0;
 
-	for (dct = 0; dct < fam_type->max_mcs; dct++) {
+	for (dct = 0; dct < pvt->max_mcs; dct++) {
 		for_each_chip_select(cs, dct, pvt)
 			cs_enabled |= csrow_enabled(cs, dct, pvt);
 	}
@@ -4140,9 +4028,8 @@ static int probe_one_instance(unsigned int nid)
 	pvt->mc_node_id	= nid;
 	pvt->F3 = F3;
 
-	ret = -ENODEV;
-	fam_type = per_family_init(pvt);
-	if (!fam_type)
+	ret = per_family_init(pvt);
+	if (ret < 0)
 		goto err_enable;
 
 	ret = hw_info_get(pvt);
@@ -4181,11 +4068,7 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
-	amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name,
-		     (pvt->fam == 0xf ?
-				(pvt->ext_model >= K8_REV_F  ? "revF or later "
-							     : "revE or earlier ")
-				 : ""), pvt->mc_node_id);
+	amd64_info("%s detected (node %d).\n", pvt->ctl_name, pvt->mc_node_id);
 
 	dump_misc_regs(pvt);
 
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 7cfac50d6558..a4a27208532c 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -273,25 +273,6 @@
 
 #define UMC_SDP_INIT			BIT(31)
 
-enum amd_families {
-	K8_CPUS = 0,
-	F10_CPUS,
-	F15_CPUS,
-	F15_M30H_CPUS,
-	F15_M60H_CPUS,
-	F16_CPUS,
-	F16_M30H_CPUS,
-	F17_CPUS,
-	F17_M10H_CPUS,
-	F17_M30H_CPUS,
-	F17_M60H_CPUS,
-	F17_M70H_CPUS,
-	F19_CPUS,
-	F19_M10H_CPUS,
-	F19_M50H_CPUS,
-	NUM_FAMILIES,
-};
-
 /* Error injection control structure */
 struct error_injection {
 	u32	 section;
@@ -334,11 +315,21 @@ struct amd64_umc {
 	enum mem_type dram_type;
 };
 
+struct amd64_family_flags {
+	/*
+	 * Indicates that the system supports the new register offsets, etc.
+	 * first introduced with Family 19h Model 10h.
+	 */
+	__u64 zn_regs_v2	: 1,
+
+	      __reserved	: 63;
+};
+
 struct amd64_pvt {
 	struct low_ops *ops;
 
 	/* pci_device handles which we utilize */
-	struct pci_dev *F0, *F1, *F2, *F3, *F6;
+	struct pci_dev *F1, *F2, *F3;
 
 	u16 mc_node_id;		/* MC index of this MC node */
 	u8 fam;			/* CPU family */
@@ -376,6 +367,12 @@ struct amd64_pvt {
 	/* x4, x8, or x16 syndromes in use */
 	u8 ecc_sym_sz;
 
+	const char *ctl_name;
+	u16 f1_id, f2_id;
+	/* Maximum number of memory controllers per die/node. */
+	u8 max_mcs;
+
+	struct amd64_family_flags flags;
 	/* place to store error injection parameters prior to issue */
 	struct error_injection injection;
 
@@ -466,30 +463,11 @@ struct ecc_settings {
  * functions and per device encoding/decoding logic.
  */
 struct low_ops {
-	int (*early_channel_count)	(struct amd64_pvt *pvt);
-	void (*map_sysaddr_to_csrow)	(struct mem_ctl_info *mci, u64 sys_addr,
-					 struct err_info *);
-	int (*dbam_to_cs)		(struct amd64_pvt *pvt, u8 dct,
-					 unsigned cs_mode, int cs_mask_nr);
-};
-
-struct amd64_family_flags {
-	/*
-	 * Indicates that the system supports the new register offsets, etc.
-	 * first introduced with Family 19h Model 10h.
-	 */
-	__u64 zn_regs_v2	: 1,
-
-	      __reserved	: 63;
-};
-
-struct amd64_family_type {
-	const char *ctl_name;
-	u16 f1_id, f2_id;
-	/* Maximum number of memory controllers per die/node. */
-	u8 max_mcs;
-	struct amd64_family_flags flags;
-	struct low_ops ops;
+	int  (*early_channel_count)(struct amd64_pvt *pvt);
+	void (*map_sysaddr_to_csrow)(struct mem_ctl_info *mci, u64 sys_addr,
+				     struct err_info *err);
+	int  (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct,
+			   unsigned int cs_mode, int cs_mask_nr);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (4 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 05/18] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-18  8:10   ` Borislav Petkov
  2022-05-09 14:55 ` [PATCH 07/18] EDAC/amd64: Add read_base_mask() " Yazen Ghannam
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will need to set the number of available Chip Selects, i.e.
Base and Mask values, differently than existing systems. A function
pointer should be used rather than introduce another branching condition.

Prepare for this by adding prep_chip_selects() to pvt->ops and set it
as needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 26 +++++++++++++++-----------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3d569290d4cf..f8cd89278753 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1483,7 +1483,7 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
 /*
  * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
  */
-static void prep_chip_selects(struct amd64_pvt *pvt)
+static void dct_prep_chip_selects(struct amd64_pvt *pvt)
 {
 	if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
@@ -1491,20 +1491,22 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 	} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
 		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
-	} else if (pvt->fam >= 0x17) {
-		int umc;
-
-		for_each_umc(umc) {
-			pvt->csels[umc].b_cnt = 4;
-			pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2;
-		}
-
 	} else {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
 		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
 	}
 }
 
+static void umc_prep_chip_selects(struct amd64_pvt *pvt)
+{
+	int umc;
+
+	for_each_umc(umc) {
+		pvt->csels[umc].b_cnt = 4;
+		pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2;
+	}
+}
+
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
 	u32 umc_base_reg, umc_base_reg_sec;
@@ -1563,8 +1565,6 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
 {
 	int cs;
 
-	prep_chip_selects(pvt);
-
 	if (pvt->umc)
 		return read_umc_base_mask(pvt);
 
@@ -3301,6 +3301,8 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 	}
 
 skip:
+	pvt->ops->prep_chip_selects(pvt);
+
 	read_dct_base_mask(pvt);
 
 	determine_memory_type(pvt);
@@ -3762,6 +3764,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 static struct low_ops umc_ops = {
 	.early_channel_count		= umc_early_channel_count,
 	.dbam_to_cs			= umc_addr_mask_to_cs_size,
+	.prep_chip_selects		= umc_prep_chip_selects,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3769,6 +3772,7 @@ static struct low_ops dct_ops = {
 	.early_channel_count		= f1x_early_channel_count,
 	.map_sysaddr_to_csrow		= f1x_map_sysaddr_to_csrow,
 	.dbam_to_cs			= f16_dbam_to_chip_select,
+	.prep_chip_selects		= dct_prep_chip_selects,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index a4a27208532c..0a7738df396f 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -468,6 +468,7 @@ struct low_ops {
 				     struct err_info *err);
 	int  (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct,
 			   unsigned int cs_mode, int cs_mask_nr);
+	void (*prep_chip_selects)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/18] EDAC/amd64: Add read_base_mask() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (5 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 08/18] EDAC/amd64: Add determine_memory_type() " Yazen Ghannam
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will need to set the read the Base and Mask values differently
than existing systems. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding read_base_mask() to pvt->ops and set it as
needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 11 +++++------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f8cd89278753..6e26bbb73f81 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1507,7 +1507,7 @@ static void umc_prep_chip_selects(struct amd64_pvt *pvt)
 	}
 }
 
-static void read_umc_base_mask(struct amd64_pvt *pvt)
+static void umc_read_base_mask(struct amd64_pvt *pvt)
 {
 	u32 umc_base_reg, umc_base_reg_sec;
 	u32 umc_mask_reg, umc_mask_reg_sec;
@@ -1561,13 +1561,10 @@ static void read_umc_base_mask(struct amd64_pvt *pvt)
 /*
  * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
  */
-static void read_dct_base_mask(struct amd64_pvt *pvt)
+static void dct_read_base_mask(struct amd64_pvt *pvt)
 {
 	int cs;
 
-	if (pvt->umc)
-		return read_umc_base_mask(pvt);
-
 	for_each_chip_select(cs, 0, pvt) {
 		int reg0   = DCSB0 + (cs * 4);
 		int reg1   = DCSB1 + (cs * 4);
@@ -3303,7 +3300,7 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 skip:
 	pvt->ops->prep_chip_selects(pvt);
 
-	read_dct_base_mask(pvt);
+	pvt->ops->read_base_mask(pvt);
 
 	determine_memory_type(pvt);
 
@@ -3765,6 +3762,7 @@ static struct low_ops umc_ops = {
 	.early_channel_count		= umc_early_channel_count,
 	.dbam_to_cs			= umc_addr_mask_to_cs_size,
 	.prep_chip_selects		= umc_prep_chip_selects,
+	.read_base_mask			= umc_read_base_mask,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3773,6 +3771,7 @@ static struct low_ops dct_ops = {
 	.map_sysaddr_to_csrow		= f1x_map_sysaddr_to_csrow,
 	.dbam_to_cs			= f16_dbam_to_chip_select,
 	.prep_chip_selects		= dct_prep_chip_selects,
+	.read_base_mask			= dct_read_base_mask,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 0a7738df396f..c81cc7f5fc96 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -469,6 +469,7 @@ struct low_ops {
 	int  (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct,
 			   unsigned int cs_mode, int cs_mask_nr);
 	void (*prep_chip_selects)(struct amd64_pvt *pvt);
+	void (*read_base_mask)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/18] EDAC/amd64: Add determine_memory_type() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (6 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 07/18] EDAC/amd64: Add read_base_mask() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 09/18] EDAC/amd64: Add get_ecc_sym_sz() " Yazen Ghannam
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will need to set the EDAC memory type differently than
existing systems. A function pointer should be used rather than introduce
another branching condition.

Prepare for this by adding determine_memory_type() to pvt->ops and set it
as needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 16 +++++++---------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 6e26bbb73f81..a767d8a6bfe8 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1604,7 +1604,7 @@ static void dct_read_base_mask(struct amd64_pvt *pvt)
 	}
 }
 
-static void determine_memory_type_df(struct amd64_pvt *pvt)
+static void umc_determine_memory_type(struct amd64_pvt *pvt)
 {
 	struct amd64_umc *umc;
 	u32 i;
@@ -1641,13 +1641,10 @@ static void determine_memory_type_df(struct amd64_pvt *pvt)
 	}
 }
 
-static void determine_memory_type(struct amd64_pvt *pvt)
+static void dct_determine_memory_type(struct amd64_pvt *pvt)
 {
 	u32 dram_ctrl, dcsm;
 
-	if (pvt->umc)
-		return determine_memory_type_df(pvt);
-
 	switch (pvt->fam) {
 	case 0xf:
 		if (pvt->ext_model >= K8_REV_F)
@@ -1697,6 +1694,8 @@ static void determine_memory_type(struct amd64_pvt *pvt)
 		WARN(1, KERN_ERR "%s: Family??? 0x%x\n", __func__, pvt->fam);
 		pvt->dram_type = MEM_EMPTY;
 	}
+
+	edac_dbg(1, "  DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
 	return;
 
 ddr3:
@@ -3302,10 +3301,7 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 
 	pvt->ops->read_base_mask(pvt);
 
-	determine_memory_type(pvt);
-
-	if (!pvt->umc)
-		edac_dbg(1, "  DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
+	pvt->ops->determine_memory_type(pvt);
 
 	determine_ecc_sym_sz(pvt);
 }
@@ -3763,6 +3759,7 @@ static struct low_ops umc_ops = {
 	.dbam_to_cs			= umc_addr_mask_to_cs_size,
 	.prep_chip_selects		= umc_prep_chip_selects,
 	.read_base_mask			= umc_read_base_mask,
+	.determine_memory_type		= umc_determine_memory_type,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3772,6 +3769,7 @@ static struct low_ops dct_ops = {
 	.dbam_to_cs			= f16_dbam_to_chip_select,
 	.prep_chip_selects		= dct_prep_chip_selects,
 	.read_base_mask			= dct_read_base_mask,
+	.determine_memory_type		= dct_determine_memory_type,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index c81cc7f5fc96..da3db0f4f59b 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -470,6 +470,7 @@ struct low_ops {
 			   unsigned int cs_mode, int cs_mask_nr);
 	void (*prep_chip_selects)(struct amd64_pvt *pvt);
 	void (*read_base_mask)(struct amd64_pvt *pvt);
+	void (*determine_memory_type)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/18] EDAC/amd64: Add get_ecc_sym_sz() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (7 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 08/18] EDAC/amd64: Add determine_memory_type() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 10/18] EDAC/amd64: Add read_mc_regs() " Yazen Ghannam
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will need to determine ECC symbol size differently than
existing systems. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding get_ecc_sym_sz() to pvt->ops and set it as
needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 43 +++++++++++++++++++++++----------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index a767d8a6bfe8..2a3205f1205e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3169,26 +3169,11 @@ static void free_mc_sibling_devs(struct amd64_pvt *pvt)
 	}
 }
 
-static void determine_ecc_sym_sz(struct amd64_pvt *pvt)
+static void dct_determine_ecc_sym_sz(struct amd64_pvt *pvt)
 {
 	pvt->ecc_sym_sz = 4;
 
-	if (pvt->umc) {
-		u8 i;
-
-		for_each_umc(i) {
-			/* Check enabled channels only: */
-			if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
-				if (pvt->umc[i].ecc_ctrl & BIT(9)) {
-					pvt->ecc_sym_sz = 16;
-					return;
-				} else if (pvt->umc[i].ecc_ctrl & BIT(7)) {
-					pvt->ecc_sym_sz = 8;
-					return;
-				}
-			}
-		}
-	} else if (pvt->fam >= 0x10) {
+	if (pvt->fam >= 0x10) {
 		u32 tmp;
 
 		amd64_read_pci_cfg(pvt->F3, EXT_NB_MCA_CFG, &tmp);
@@ -3202,6 +3187,26 @@ static void determine_ecc_sym_sz(struct amd64_pvt *pvt)
 	}
 }
 
+static void umc_determine_ecc_sym_sz(struct amd64_pvt *pvt)
+{
+	u8 i;
+
+	pvt->ecc_sym_sz = 4;
+
+	for_each_umc(i) {
+		/* Check enabled channels only: */
+		if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
+			if (pvt->umc[i].ecc_ctrl & BIT(9)) {
+				pvt->ecc_sym_sz = 16;
+				return;
+			} else if (pvt->umc[i].ecc_ctrl & BIT(7)) {
+				pvt->ecc_sym_sz = 8;
+				return;
+			}
+		}
+	}
+}
+
 /*
  * Retrieve the hardware registers of the memory controller.
  */
@@ -3303,7 +3308,7 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 
 	pvt->ops->determine_memory_type(pvt);
 
-	determine_ecc_sym_sz(pvt);
+	pvt->ops->determine_ecc_sym_sz(pvt);
 }
 
 /*
@@ -3760,6 +3765,7 @@ static struct low_ops umc_ops = {
 	.prep_chip_selects		= umc_prep_chip_selects,
 	.read_base_mask			= umc_read_base_mask,
 	.determine_memory_type		= umc_determine_memory_type,
+	.determine_ecc_sym_sz		= umc_determine_ecc_sym_sz,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3770,6 +3776,7 @@ static struct low_ops dct_ops = {
 	.prep_chip_selects		= dct_prep_chip_selects,
 	.read_base_mask			= dct_read_base_mask,
 	.determine_memory_type		= dct_determine_memory_type,
+	.determine_ecc_sym_sz		= dct_determine_ecc_sym_sz,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index da3db0f4f59b..5e9ff6ea7ebc 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -471,6 +471,7 @@ struct low_ops {
 	void (*prep_chip_selects)(struct amd64_pvt *pvt);
 	void (*read_base_mask)(struct amd64_pvt *pvt);
 	void (*determine_memory_type)(struct amd64_pvt *pvt);
+	void (*determine_ecc_sym_sz)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/18] EDAC/amd64: Add read_mc_regs() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (8 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 09/18] EDAC/amd64: Add get_ecc_sym_sz() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-18 11:02   ` Borislav Petkov
  2022-05-09 14:55 ` [PATCH 11/18] EDAC/amd64: Add ecc_enabled() " Yazen Ghannam
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will read a different set of memory controller values
compared than existing systems. A function pointer should be used
rather than introduce another branching condition.

Prepare for this by adding read_mc_regs() to pvt->ops and set it as
needed based on currently supported systems.

Reading the TOM/TOM2 registers is preserved in the legacy function and
removed for the modern function. The values don't seem useful for the
EDAC module on modern systems. But they are kept for legacy systems in
case there was a need for them.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 31 +++++++++++++------------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2a3205f1205e..1bf1660fe8f3 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3210,7 +3210,7 @@ static void umc_determine_ecc_sym_sz(struct amd64_pvt *pvt)
 /*
  * Retrieve the hardware registers of the memory controller.
  */
-static void __read_mc_regs_df(struct amd64_pvt *pvt)
+static void umc_read_mc_regs(struct amd64_pvt *pvt)
 {
 	u8 nid = pvt->mc_node_id;
 	struct amd64_umc *umc;
@@ -3234,7 +3234,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
  * Retrieve the hardware registers of the memory controller (this includes the
  * 'Address Map' and 'Misc' device regs)
  */
-static void read_mc_regs(struct amd64_pvt *pvt)
+static void dct_read_mc_regs(struct amd64_pvt *pvt)
 {
 	unsigned int range;
 	u64 msr_val;
@@ -3255,12 +3255,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 		edac_dbg(0, "  TOP_MEM2 disabled\n");
 	}
 
-	if (pvt->umc) {
-		__read_mc_regs_df(pvt);
-
-		goto skip;
-	}
-
 	amd64_read_pci_cfg(pvt->F3, NBCAP, &pvt->nbcap);
 
 	read_dram_ctl_register(pvt);
@@ -3300,15 +3294,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 		amd64_read_dct_pci_cfg(pvt, 1, DCLR0, &pvt->dclr1);
 		amd64_read_dct_pci_cfg(pvt, 1, DCHR0, &pvt->dchr1);
 	}
-
-skip:
-	pvt->ops->prep_chip_selects(pvt);
-
-	pvt->ops->read_base_mask(pvt);
-
-	pvt->ops->determine_memory_type(pvt);
-
-	pvt->ops->determine_ecc_sym_sz(pvt);
 }
 
 /*
@@ -3766,6 +3751,7 @@ static struct low_ops umc_ops = {
 	.read_base_mask			= umc_read_base_mask,
 	.determine_memory_type		= umc_determine_memory_type,
 	.determine_ecc_sym_sz		= umc_determine_ecc_sym_sz,
+	.read_mc_regs			= umc_read_mc_regs,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3777,6 +3763,7 @@ static struct low_ops dct_ops = {
 	.read_base_mask			= dct_read_base_mask,
 	.determine_memory_type		= dct_determine_memory_type,
 	.determine_ecc_sym_sz		= dct_determine_ecc_sym_sz,
+	.read_mc_regs			= dct_read_mc_regs,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
@@ -3938,7 +3925,15 @@ static int hw_info_get(struct amd64_pvt *pvt)
 	if (ret)
 		return ret;
 
-	read_mc_regs(pvt);
+	pvt->ops->read_mc_regs(pvt);
+
+	pvt->ops->prep_chip_selects(pvt);
+
+	pvt->ops->read_base_mask(pvt);
+
+	pvt->ops->determine_memory_type(pvt);
+
+	pvt->ops->determine_ecc_sym_sz(pvt);
 
 	return 0;
 }
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 5e9ff6ea7ebc..25d0dcc5c480 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -472,6 +472,7 @@ struct low_ops {
 	void (*read_base_mask)(struct amd64_pvt *pvt);
 	void (*determine_memory_type)(struct amd64_pvt *pvt);
 	void (*determine_ecc_sym_sz)(struct amd64_pvt *pvt);
+	void (*read_mc_regs)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/18] EDAC/amd64: Add ecc_enabled() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (9 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 10/18] EDAC/amd64: Add read_mc_regs() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() " Yazen Ghannam
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will have different criteria for checking if ECC is enabled.
A function pointer should be used rather than introduce another
branching condition.

Prepare for this by adding ecc_enabled() to pvt->ops and set it as
needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 69 ++++++++++++++++++++++-----------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1bf1660fe8f3..136f2454a502 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3634,52 +3634,59 @@ static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
 		amd64_warn("Error restoring NB MCGCTL settings!\n");
 }
 
-static bool ecc_enabled(struct amd64_pvt *pvt)
+static bool dct_ecc_enabled(struct amd64_pvt *pvt)
 {
 	u16 nid = pvt->mc_node_id;
 	bool nb_mce_en = false;
-	u8 ecc_en = 0, i;
+	u8 ecc_en = 0;
 	u32 value;
 
-	if (boot_cpu_data.x86 >= 0x17) {
-		u8 umc_en_mask = 0, ecc_en_mask = 0;
-		struct amd64_umc *umc;
+	amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
 
-		for_each_umc(i) {
-			umc = &pvt->umc[i];
+	ecc_en = !!(value & NBCFG_ECC_ENABLE);
 
-			/* Only check enabled UMCs. */
-			if (!(umc->sdp_ctrl & UMC_SDP_INIT))
-				continue;
+	nb_mce_en = nb_mce_bank_enabled_on_node(nid);
+	if (!nb_mce_en)
+		edac_dbg(0, "NB MCE bank disabled, set MSR 0x%08x[4] on node %d to enable.\n",
+			 MSR_IA32_MCG_CTL, nid);
 
-			umc_en_mask |= BIT(i);
+	edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? "enabled" : "disabled"));
 
-			if (umc->umc_cap_hi & UMC_ECC_ENABLED)
-				ecc_en_mask |= BIT(i);
-		}
+	if (!ecc_en || !nb_mce_en)
+		return false;
+	else
+		return true;
+}
 
-		/* Check whether at least one UMC is enabled: */
-		if (umc_en_mask)
-			ecc_en = umc_en_mask == ecc_en_mask;
-		else
-			edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);
+static bool umc_ecc_enabled(struct amd64_pvt *pvt)
+{
+	u8 umc_en_mask = 0, ecc_en_mask = 0;
+	u16 nid = pvt->mc_node_id;
+	struct amd64_umc *umc;
+	u8 ecc_en = 0, i;
 
-		/* Assume UMC MCA banks are enabled. */
-		nb_mce_en = true;
-	} else {
-		amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
+	for_each_umc(i) {
+		umc = &pvt->umc[i];
+
+		/* Only check enabled UMCs. */
+		if (!(umc->sdp_ctrl & UMC_SDP_INIT))
+			continue;
 
-		ecc_en = !!(value & NBCFG_ECC_ENABLE);
+		umc_en_mask |= BIT(i);
 
-		nb_mce_en = nb_mce_bank_enabled_on_node(nid);
-		if (!nb_mce_en)
-			edac_dbg(0, "NB MCE bank disabled, set MSR 0x%08x[4] on node %d to enable.\n",
-				     MSR_IA32_MCG_CTL, nid);
+		if (umc->umc_cap_hi & UMC_ECC_ENABLED)
+			ecc_en_mask |= BIT(i);
 	}
 
+	/* Check whether at least one UMC is enabled: */
+	if (umc_en_mask)
+		ecc_en = umc_en_mask == ecc_en_mask;
+	else
+		edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);
+
 	edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? "enabled" : "disabled"));
 
-	if (!ecc_en || !nb_mce_en)
+	if (!ecc_en)
 		return false;
 	else
 		return true;
@@ -3752,6 +3759,7 @@ static struct low_ops umc_ops = {
 	.determine_memory_type		= umc_determine_memory_type,
 	.determine_ecc_sym_sz		= umc_determine_ecc_sym_sz,
 	.read_mc_regs			= umc_read_mc_regs,
+	.ecc_enabled			= umc_ecc_enabled,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3764,6 +3772,7 @@ static struct low_ops dct_ops = {
 	.determine_memory_type		= dct_determine_memory_type,
 	.determine_ecc_sym_sz		= dct_determine_ecc_sym_sz,
 	.read_mc_regs			= dct_read_mc_regs,
+	.ecc_enabled			= dct_ecc_enabled,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
@@ -4045,7 +4054,7 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
-	if (!ecc_enabled(pvt)) {
+	if (!pvt->ops->ecc_enabled(pvt)) {
 		ret = -ENODEV;
 
 		if (!ecc_enable_override)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 25d0dcc5c480..99b6ffa21ba5 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -473,6 +473,7 @@ struct low_ops {
 	void (*determine_memory_type)(struct amd64_pvt *pvt);
 	void (*determine_ecc_sym_sz)(struct amd64_pvt *pvt);
 	void (*read_mc_regs)(struct amd64_pvt *pvt);
+	bool (*ecc_enabled)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (10 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 11/18] EDAC/amd64: Add ecc_enabled() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-06-20 16:21   ` Borislav Petkov
  2022-05-09 14:55 ` [PATCH 13/18] EDAC/amd64: Add determine_edac_ctl_cap() " Yazen Ghannam
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will have different criteria for checking the EDAC
capabilities of a controller. A function pointer should be used rather
than introduce another branching condition.

Prepare for this by adding determine_edac_cap() to pvt->ops and set it
as needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 30 ++++++++++++++++++------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 136f2454a502..0bc9a3846773 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1261,13 +1261,25 @@ static int get_channel_from_ecc_syndrome(struct mem_ctl_info *, u16);
  * Determine if the DIMMs have ECC enabled. ECC is enabled ONLY if all the DIMMs
  * are ECC capable.
  */
-static unsigned long determine_edac_cap(struct amd64_pvt *pvt)
+static unsigned long dct_determine_edac_cap(struct amd64_pvt *pvt)
 {
 	unsigned long edac_cap = EDAC_FLAG_NONE;
 	u8 bit;
 
-	if (pvt->umc) {
-		u8 i, umc_en_mask = 0, dimm_ecc_en_mask = 0;
+	bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
+		? 19
+		: 17;
+
+	if (pvt->dclr0 & BIT(bit))
+		edac_cap = EDAC_FLAG_SECDED;
+
+	return edac_cap;
+}
+
+static unsigned long umc_determine_edac_cap(struct amd64_pvt *pvt)
+{
+	u8 i, umc_en_mask = 0, dimm_ecc_en_mask = 0;
+	unsigned long edac_cap = EDAC_FLAG_NONE;
 
 		for_each_umc(i) {
 			if (!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT))
@@ -1282,14 +1294,6 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)
 
 		if (umc_en_mask == dimm_ecc_en_mask)
 			edac_cap = EDAC_FLAG_SECDED;
-	} else {
-		bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
-			? 19
-			: 17;
-
-		if (pvt->dclr0 & BIT(bit))
-			edac_cap = EDAC_FLAG_SECDED;
-	}
 
 	return edac_cap;
 }
@@ -3740,7 +3744,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 			mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
 	}
 
-	mci->edac_cap		= determine_edac_cap(pvt);
+	mci->edac_cap		= pvt->ops->determine_edac_cap(pvt);
 	mci->mod_name		= EDAC_MOD_STR;
 	mci->ctl_name		= pvt->ctl_name;
 	mci->dev_name		= pci_name(pvt->F3);
@@ -3760,6 +3764,7 @@ static struct low_ops umc_ops = {
 	.determine_ecc_sym_sz		= umc_determine_ecc_sym_sz,
 	.read_mc_regs			= umc_read_mc_regs,
 	.ecc_enabled			= umc_ecc_enabled,
+	.determine_edac_cap		= umc_determine_edac_cap,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3773,6 +3778,7 @@ static struct low_ops dct_ops = {
 	.determine_ecc_sym_sz		= dct_determine_ecc_sym_sz,
 	.read_mc_regs			= dct_read_mc_regs,
 	.ecc_enabled			= dct_ecc_enabled,
+	.determine_edac_cap		= dct_determine_edac_cap,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 99b6ffa21ba5..bfe48492a0ba 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -474,6 +474,7 @@ struct low_ops {
 	void (*determine_ecc_sym_sz)(struct amd64_pvt *pvt);
 	void (*read_mc_regs)(struct amd64_pvt *pvt);
 	bool (*ecc_enabled)(struct amd64_pvt *pvt);
+	unsigned long (*determine_edac_cap)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 13/18] EDAC/amd64: Add determine_edac_ctl_cap() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (11 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 14/18] EDAC/amd64: Add setup_mci_misc_attrs() " Yazen Ghannam
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will have different criteria for checking the EDAC
capabilities of a controller. A function pointer should be used rather
than introduce another branching condition.

Prepare for this by adding determine_edac_ctl_cap() to pvt->ops and set it
as needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 24 ++++++++++++++----------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 0bc9a3846773..b99eaa73131e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3697,7 +3697,17 @@ static bool umc_ecc_enabled(struct amd64_pvt *pvt)
 }
 
 static inline void
-f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
+dct_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
+{
+	if (pvt->nbcap & NBCAP_SECDED)
+		mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
+
+	if (pvt->nbcap & NBCAP_CHIPKILL)
+		mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+}
+
+static inline void
+umc_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
 	u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
 
@@ -3734,15 +3744,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 	mci->mtype_cap		= MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
 	mci->edac_ctl_cap	= EDAC_FLAG_NONE;
 
-	if (pvt->umc) {
-		f17h_determine_edac_ctl_cap(mci, pvt);
-	} else {
-		if (pvt->nbcap & NBCAP_SECDED)
-			mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
-
-		if (pvt->nbcap & NBCAP_CHIPKILL)
-			mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
-	}
+	pvt->ops->determine_edac_ctl_cap(mci, pvt);
 
 	mci->edac_cap		= pvt->ops->determine_edac_cap(pvt);
 	mci->mod_name		= EDAC_MOD_STR;
@@ -3765,6 +3767,7 @@ static struct low_ops umc_ops = {
 	.read_mc_regs			= umc_read_mc_regs,
 	.ecc_enabled			= umc_ecc_enabled,
 	.determine_edac_cap		= umc_determine_edac_cap,
+	.determine_edac_ctl_cap		= umc_determine_edac_ctl_cap,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3779,6 +3782,7 @@ static struct low_ops dct_ops = {
 	.read_mc_regs			= dct_read_mc_regs,
 	.ecc_enabled			= dct_ecc_enabled,
 	.determine_edac_cap		= dct_determine_edac_cap,
+	.determine_edac_ctl_cap		= dct_determine_edac_ctl_cap,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index bfe48492a0ba..15521adec9b5 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -475,6 +475,7 @@ struct low_ops {
 	void (*read_mc_regs)(struct amd64_pvt *pvt);
 	bool (*ecc_enabled)(struct amd64_pvt *pvt);
 	unsigned long (*determine_edac_cap)(struct amd64_pvt *pvt);
+	void (*determine_edac_ctl_cap)(struct mem_ctl_info *mci, struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 14/18] EDAC/amd64: Add setup_mci_misc_attrs() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (12 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 13/18] EDAC/amd64: Add determine_edac_ctl_cap() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 15/18] EDAC/amd64: Add init_csrows() " Yazen Ghannam
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will expose the memory controller attributes differently than
current systems. A function pointer should be used rather than introduce
another branching condition.

Prepare for this by adding setup_mci_misc_attrs() to pvt->ops. Legacy and
modern systems use the same function. A new function will be added for
GPU Nodes.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 4 +++-
 drivers/edac/amd64_edac.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b99eaa73131e..f1346416e64d 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3768,6 +3768,7 @@ static struct low_ops umc_ops = {
 	.ecc_enabled			= umc_ecc_enabled,
 	.determine_edac_cap		= umc_determine_edac_cap,
 	.determine_edac_ctl_cap		= umc_determine_edac_ctl_cap,
+	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
 /* Use Family 16h versions for defaults and adjust as needed below. */
@@ -3783,6 +3784,7 @@ static struct low_ops dct_ops = {
 	.ecc_enabled			= dct_ecc_enabled,
 	.determine_edac_cap		= dct_determine_edac_cap,
 	.determine_edac_ctl_cap		= dct_determine_edac_ctl_cap,
+	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
 static int per_family_init(struct amd64_pvt *pvt)
@@ -4001,7 +4003,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	mci->pvt_info = pvt;
 	mci->pdev = &pvt->F3->dev;
 
-	setup_mci_misc_attrs(mci);
+	pvt->ops->setup_mci_misc_attrs(mci);
 
 	if (init_csrows(mci))
 		mci->edac_cap = EDAC_FLAG_NONE;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 15521adec9b5..93de7dea516a 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -476,6 +476,7 @@ struct low_ops {
 	bool (*ecc_enabled)(struct amd64_pvt *pvt);
 	unsigned long (*determine_edac_cap)(struct amd64_pvt *pvt);
 	void (*determine_edac_ctl_cap)(struct mem_ctl_info *mci, struct amd64_pvt *pvt);
+	void (*setup_mci_misc_attrs)(struct mem_ctl_info *mci);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 15/18] EDAC/amd64: Add init_csrows() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (13 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 14/18] EDAC/amd64: Add setup_mci_misc_attrs() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 16/18] EDAC/amd64: Add dump_misc_regs() " Yazen Ghannam
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will use a different method to enumerate the chip select
(struct dimm_info) values. A function pointer should be used rather
than introduce another branching condition.

Prepare for this by adding init_csrows() to pvt->ops and set it as
needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 11 +++++------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f1346416e64d..5beeeb2fd6a8 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3357,7 +3357,7 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
 	return nr_pages;
 }
 
-static int init_csrows_df(struct mem_ctl_info *mci)
+static int umc_init_csrows(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	enum edac_type edac_mode = EDAC_NONE;
@@ -3405,7 +3405,7 @@ static int init_csrows_df(struct mem_ctl_info *mci)
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
  */
-static int init_csrows(struct mem_ctl_info *mci)
+static int dct_init_csrows(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 	enum edac_type edac_mode = EDAC_NONE;
@@ -3415,9 +3415,6 @@ static int init_csrows(struct mem_ctl_info *mci)
 	int nr_pages = 0;
 	u32 val;
 
-	if (pvt->umc)
-		return init_csrows_df(mci);
-
 	amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
 	pvt->nbcfg = val;
@@ -3768,6 +3765,7 @@ static struct low_ops umc_ops = {
 	.ecc_enabled			= umc_ecc_enabled,
 	.determine_edac_cap		= umc_determine_edac_cap,
 	.determine_edac_ctl_cap		= umc_determine_edac_ctl_cap,
+	.init_csrows			= umc_init_csrows,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
@@ -3784,6 +3782,7 @@ static struct low_ops dct_ops = {
 	.ecc_enabled			= dct_ecc_enabled,
 	.determine_edac_cap		= dct_determine_edac_cap,
 	.determine_edac_ctl_cap		= dct_determine_edac_ctl_cap,
+	.init_csrows			= dct_init_csrows,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
@@ -4005,7 +4004,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 
 	pvt->ops->setup_mci_misc_attrs(mci);
 
-	if (init_csrows(mci))
+	if (pvt->ops->init_csrows(mci))
 		mci->edac_cap = EDAC_FLAG_NONE;
 
 	ret = -ENODEV;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 93de7dea516a..1b879c3cfb36 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -477,6 +477,7 @@ struct low_ops {
 	unsigned long (*determine_edac_cap)(struct amd64_pvt *pvt);
 	void (*determine_edac_ctl_cap)(struct mem_ctl_info *mci, struct amd64_pvt *pvt);
 	void (*setup_mci_misc_attrs)(struct mem_ctl_info *mci);
+	int  (*init_csrows)(struct mem_ctl_info *mci);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 16/18] EDAC/amd64: Add dump_misc_regs() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (14 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 15/18] EDAC/amd64: Add init_csrows() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 17/18] EDAC/amd64: Add get_cs_mode() " Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 18/18] EDAC/amd64: Add get_err_info() " Yazen Ghannam
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will print out different information and registers compared to
existing systems. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding dump_misc_regs() to pvt->ops and set it as
needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 21 ++++++---------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 5beeeb2fd6a8..b4c9d224f564 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1394,7 +1394,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 	}
 }
 
-static void __dump_misc_regs_df(struct amd64_pvt *pvt)
+static void umc_dump_misc_regs(struct amd64_pvt *pvt)
 {
 	struct amd64_umc *umc;
 	u32 i, tmp, umc_base;
@@ -1437,8 +1437,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 	}
 }
 
-/* Display and decode various NB registers for debug purposes. */
-static void __dump_misc_regs(struct amd64_pvt *pvt)
+static void dct_dump_misc_regs(struct amd64_pvt *pvt)
 {
 	edac_dbg(1, "F3xE8 (NB Cap): 0x%08x\n", pvt->nbcap);
 
@@ -1473,17 +1472,6 @@ static void __dump_misc_regs(struct amd64_pvt *pvt)
 	edac_dbg(1, "  DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no");
 }
 
-/* Display and decode various NB registers for debug purposes. */
-static void dump_misc_regs(struct amd64_pvt *pvt)
-{
-	if (pvt->umc)
-		__dump_misc_regs_df(pvt);
-	else
-		__dump_misc_regs(pvt);
-
-	amd64_info("using x%u syndromes.\n", pvt->ecc_sym_sz);
-}
-
 /*
  * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
  */
@@ -3766,6 +3754,7 @@ static struct low_ops umc_ops = {
 	.determine_edac_cap		= umc_determine_edac_cap,
 	.determine_edac_ctl_cap		= umc_determine_edac_ctl_cap,
 	.init_csrows			= umc_init_csrows,
+	.dump_misc_regs			= umc_dump_misc_regs,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
@@ -3783,6 +3772,7 @@ static struct low_ops dct_ops = {
 	.determine_edac_cap		= dct_determine_edac_cap,
 	.determine_edac_ctl_cap		= dct_determine_edac_ctl_cap,
 	.init_csrows			= dct_init_csrows,
+	.dump_misc_regs			= dct_dump_misc_regs,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
@@ -4093,7 +4083,8 @@ static int probe_one_instance(unsigned int nid)
 
 	amd64_info("%s detected (node %d).\n", pvt->ctl_name, pvt->mc_node_id);
 
-	dump_misc_regs(pvt);
+	/* Display and decode various registers for debug purposes. */
+	pvt->ops->dump_misc_regs(pvt);
 
 	return ret;
 
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 1b879c3cfb36..4e7467c285b9 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -478,6 +478,7 @@ struct low_ops {
 	void (*determine_edac_ctl_cap)(struct mem_ctl_info *mci, struct amd64_pvt *pvt);
 	void (*setup_mci_misc_attrs)(struct mem_ctl_info *mci);
 	int  (*init_csrows)(struct mem_ctl_info *mci);
+	void (*dump_misc_regs)(struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 17/18] EDAC/amd64: Add get_cs_mode() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (15 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 16/18] EDAC/amd64: Add dump_misc_regs() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  2022-05-09 14:55 ` [PATCH 18/18] EDAC/amd64: Add get_err_info() " Yazen Ghannam
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will use a different method to determine the chip select
mode used on a controller. A function pointer should be used rather
than introduce another branching condition.

Prepare for this by adding get_cs_mode() to pvt->ops and set it as
needed based on currently supported systems.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Use a "dct" prefix for newly-defined legacy functions, since these
systems use DRAM Controllers (DCTs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 21 ++++++++++++---------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b4c9d224f564..248d1082736e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1341,7 +1341,14 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 #define CS_EVEN			(CS_EVEN_PRIMARY | CS_EVEN_SECONDARY)
 #define CS_ODD			(CS_ODD_PRIMARY | CS_ODD_SECONDARY)
 
-static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
+static int dct_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
+{
+	u32 dbam = ctrl ? pvt->dbam1 : pvt->dbam0;
+
+	return DBAM_DIMM(dimm, dbam);
+}
+
+static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
 	u8 base, count = 0;
 	int cs_mode = 0;
@@ -1383,7 +1390,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 		cs0 = dimm * 2;
 		cs1 = dimm * 2 + 1;
 
-		cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+		cs_mode = umc_get_cs_mode(dimm, ctrl, pvt);
 
 		size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
 		size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
@@ -3324,16 +3331,10 @@ static void dct_read_mc_regs(struct amd64_pvt *pvt)
  */
 static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
 {
-	u32 dbam = dct ? pvt->dbam1 : pvt->dbam0;
 	int csrow_nr = csrow_nr_orig;
 	u32 cs_mode, nr_pages;
 
-	if (!pvt->umc) {
-		csrow_nr >>= 1;
-		cs_mode = DBAM_DIMM(csrow_nr, dbam);
-	} else {
-		cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
-	}
+	cs_mode = pvt->ops->get_cs_mode(csrow_nr >> 1, dct, pvt);
 
 	nr_pages   = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
 	nr_pages <<= 20 - PAGE_SHIFT;
@@ -3755,6 +3756,7 @@ static struct low_ops umc_ops = {
 	.determine_edac_ctl_cap		= umc_determine_edac_ctl_cap,
 	.init_csrows			= umc_init_csrows,
 	.dump_misc_regs			= umc_dump_misc_regs,
+	.get_cs_mode			= umc_get_cs_mode,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
@@ -3773,6 +3775,7 @@ static struct low_ops dct_ops = {
 	.determine_edac_ctl_cap		= dct_determine_edac_ctl_cap,
 	.init_csrows			= dct_init_csrows,
 	.dump_misc_regs			= dct_dump_misc_regs,
+	.get_cs_mode			= dct_get_cs_mode,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 4e7467c285b9..1f64c08ae0ce 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -479,6 +479,7 @@ struct low_ops {
 	void (*setup_mci_misc_attrs)(struct mem_ctl_info *mci);
 	int  (*init_csrows)(struct mem_ctl_info *mci);
 	void (*dump_misc_regs)(struct amd64_pvt *pvt);
+	int  (*get_cs_mode)(int dimm, u8 ctrl, struct amd64_pvt *pvt);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 18/18] EDAC/amd64: Add get_err_info() into pvt->ops
  2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
                   ` (16 preceding siblings ...)
  2022-05-09 14:55 ` [PATCH 17/18] EDAC/amd64: Add get_cs_mode() " Yazen Ghannam
@ 2022-05-09 14:55 ` Yazen Ghannam
  17 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-09 14:55 UTC (permalink / raw)
  To: bp, linux-edac
  Cc: linux-kernel, Smita.KoralahalliChannabasappa, muralidhara.mk,
	naveenkrishna.chatradhi, Yazen Ghannam

From: Muralidhara M K <muralidhara.mk@amd.com>

GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather
than introduce another branching condition.

Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.

Use a "umc" prefix for modern systems, since these use Unified Memory
Controllers (UMCs).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
[Rebased/reworked patch and reworded commit message]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 15 ++++++++++-----
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 248d1082736e..81d165bcd252 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3067,10 +3067,16 @@ static inline void decode_bus_error(int node_id, struct mce *m)
  * Currently, we can derive the channel number by looking at the 6th nibble in
  * the instance_id. For example, instance_id=0xYXXXXX where Y is the channel
  * number.
+ *
+ * csrow can be derived from the lower 3 bits of MCA_SYND value.
+ *
+ * For DRAM ECC errors, the Chip Select number is given in bits [2:0] of
+ * the MCA_SYND[ErrorInformation] field.
  */
-static int find_umc_channel(struct mce *m)
+static void umc_get_err_info(struct mce *m, struct err_info *err)
 {
-	return (m->ipid & GENMASK(31, 0)) >> 20;
+	err->channel = (m->ipid & GENMASK(31, 0)) >> 20;
+	err->csrow = m->synd & 0x7;
 }
 
 static void decode_umc_error(int node_id, struct mce *m)
@@ -3092,8 +3098,6 @@ static void decode_umc_error(int node_id, struct mce *m)
 	if (m->status & MCI_STATUS_DEFERRED)
 		ecc_type = 3;
 
-	err.channel = find_umc_channel(m);
-
 	if (!(m->status & MCI_STATUS_SYNDV)) {
 		err.err_code = ERR_SYND;
 		goto log_error;
@@ -3108,7 +3112,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 			err.err_code = ERR_CHANNEL;
 	}
 
-	err.csrow = m->synd & 0x7;
+	pvt->ops->get_err_info(m, &err);
 
 	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
 		err.err_code = ERR_NORM_ADDR;
@@ -3757,6 +3761,7 @@ static struct low_ops umc_ops = {
 	.init_csrows			= umc_init_csrows,
 	.dump_misc_regs			= umc_dump_misc_regs,
 	.get_cs_mode			= umc_get_cs_mode,
+	.get_err_info			= umc_get_err_info,
 	.setup_mci_misc_attrs		= setup_mci_misc_attrs,
 };
 
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 1f64c08ae0ce..d5a64b0639bb 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -480,6 +480,7 @@ struct low_ops {
 	int  (*init_csrows)(struct mem_ctl_info *mci);
 	void (*dump_misc_regs)(struct amd64_pvt *pvt);
 	int  (*get_cs_mode)(int dimm, u8 ctrl, struct amd64_pvt *pvt);
+	void (*get_err_info)(struct mce *m, struct err_info *err);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later
  2022-05-09 14:55 ` [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later Yazen Ghannam
@ 2022-05-11 10:26   ` Borislav Petkov
  2022-05-12 14:19     ` Yazen Ghannam
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-05-11 10:26 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Mon, May 09, 2022 at 02:55:18PM +0000, Yazen Ghannam wrote:
> @@ -251,6 +234,9 @@ static int set_scrub_rate(struct mem_ctl_info *mci, u32 bw)
>  	struct amd64_pvt *pvt = mci->pvt_info;
>  	u32 min_scrubrate = 0x5;
>  
> +	if (pvt->umc)
> +		return -EPERM;

Since this function is testing families, it might be more
straightforward for the check to be:

	if (pvt->fam >= 0x17)

> +
>  	if (pvt->fam == 0xf)
>  		min_scrubrate = 0x0;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/18] EDAC/amd64: Remove PCI Function 0
  2022-05-09 14:55 ` [PATCH 04/18] EDAC/amd64: Remove PCI Function 0 Yazen Ghannam
@ 2022-05-11 10:34   ` Borislav Petkov
  2022-05-12 14:34     ` Yazen Ghannam
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-05-11 10:34 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Mon, May 09, 2022 at 02:55:20PM +0000, Yazen Ghannam wrote:
> @@ -3287,26 +3276,12 @@ static void decode_umc_error(int node_id, struct mce *m)
>  /*
>   * Use pvt->F3 which contains the F3 CPU PCI device to get the related
>   * F1 (AddrMap) and F2 (Dct) devices. Return negative value on error.
> - * Reserve F0 on systems with a UMC.
>   */
>  static int
>  reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
>  {
> -	if (pvt->umc) {
> -		pvt->F0 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3);
> -		if (!pvt->F0) {
> -			edac_dbg(1, "F0 not found, device 0x%x\n", pci_id1);
> -			return -ENODEV;
> -		}
> -
> -		if (!pci_ctl_dev)
> -			pci_ctl_dev = &pvt->F0->dev;
> -
> -		edac_dbg(1, "F0: %s\n", pci_name(pvt->F0));
> -		edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));
> -
> +	if (pvt->umc)

I don't like the sprinkling of those checks everywhere. And
hw_info_get() has those checks too. I think it would be cleaner if
hw_info_get() would call a df-specific function for fam 0x17 and later
and do the setup there cleanly:

hw_info_get:

	if (pvt->fam >= 0x17)
		return hw_info_get_df(pvt);

and so on.

Btw, I completely agree with leaving the old code as it is.

And I obviously like the code removal, ofc.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later
  2022-05-11 10:26   ` Borislav Petkov
@ 2022-05-12 14:19     ` Yazen Ghannam
  0 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-12 14:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Wed, May 11, 2022 at 12:26:27PM +0200, Borislav Petkov wrote:
> On Mon, May 09, 2022 at 02:55:18PM +0000, Yazen Ghannam wrote:
> > @@ -251,6 +234,9 @@ static int set_scrub_rate(struct mem_ctl_info *mci, u32 bw)
> >  	struct amd64_pvt *pvt = mci->pvt_info;
> >  	u32 min_scrubrate = 0x5;
> >  
> > +	if (pvt->umc)
> > +		return -EPERM;
> 
> Since this function is testing families, it might be more
> straightforward for the check to be:
> 
> 	if (pvt->fam >= 0x17)
>

Yes, I'll make that change.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/18] EDAC/amd64: Remove PCI Function 0
  2022-05-11 10:34   ` Borislav Petkov
@ 2022-05-12 14:34     ` Yazen Ghannam
  2022-05-13  9:56       ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-12 14:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Wed, May 11, 2022 at 12:34:58PM +0200, Borislav Petkov wrote:
> On Mon, May 09, 2022 at 02:55:20PM +0000, Yazen Ghannam wrote:
> > @@ -3287,26 +3276,12 @@ static void decode_umc_error(int node_id, struct mce *m)
> >  /*
> >   * Use pvt->F3 which contains the F3 CPU PCI device to get the related
> >   * F1 (AddrMap) and F2 (Dct) devices. Return negative value on error.
> > - * Reserve F0 on systems with a UMC.
> >   */
> >  static int
> >  reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
> >  {
> > -	if (pvt->umc) {
> > -		pvt->F0 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3);
> > -		if (!pvt->F0) {
> > -			edac_dbg(1, "F0 not found, device 0x%x\n", pci_id1);
> > -			return -ENODEV;
> > -		}
> > -
> > -		if (!pci_ctl_dev)
> > -			pci_ctl_dev = &pvt->F0->dev;
> > -
> > -		edac_dbg(1, "F0: %s\n", pci_name(pvt->F0));
> > -		edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));
> > -
> > +	if (pvt->umc)
> 
> I don't like the sprinkling of those checks everywhere. And
> hw_info_get() has those checks too. I think it would be cleaner if
> hw_info_get() would call a df-specific function for fam 0x17 and later
> and do the setup there cleanly:
> 
> hw_info_get:
> 
> 	if (pvt->fam >= 0x17)
> 		return hw_info_get_df(pvt);
> 
> and so on.
> 
> Btw, I completely agree with leaving the old code as it is.
> 
> And I obviously like the code removal, ofc.
> 
> :-)
>

Okay, will do.

Also, there are five function pointers that are created in this patchset and
called from hw_info_get(). I think those pointers can be dropped and the
helper functions called from hw_info_get(). So I think it'd be good to make
hw_info_get() into a function pointer which gets set to a functoin that calls
the right collection of legacy, modern, and GPU helper functions. How does
that sound?

Thanks!

-Yazen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/18] EDAC/amd64: Remove PCI Function 0
  2022-05-12 14:34     ` Yazen Ghannam
@ 2022-05-13  9:56       ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-05-13  9:56 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Thu, May 12, 2022 at 02:34:29PM +0000, Yazen Ghannam wrote:
> Also, there are five function pointers that are created in this patchset and
> called from hw_info_get(). I think those pointers can be dropped and the
> helper functions called from hw_info_get(). So I think it'd be good to make
> hw_info_get() into a function pointer which gets set to a functoin that calls
> the right collection of legacy, modern, and GPU helper functions. How does
> that sound?

Makes sense.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/18] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  2022-05-09 14:55 ` [PATCH 05/18] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt Yazen Ghannam
@ 2022-05-13 15:21   ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-05-13 15:21 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Mon, May 09, 2022 at 02:55:21PM +0000, Yazen Ghannam wrote:
> From: Muralidhara M K <muralidhara.mk@amd.com>
> 
> Future AMD systems will support heterogeneous "AMD Node" types, e.g.
> CPU+GPU. Therefore, a global "family type" shared across all "AMD Nodes"
> is no longer appropriate.
> 
> Move struct low_ops routines and members of struct amd64_family_type
> to struct amd64_pvt.
> 
> Currently, there are many code branches that split between "modern" and
> "legacy" systems. Another code branch will be needed in order to cover
> "GPU" cases. However, rather than introduce another branching case in
> multiple functions, the current branching code should be switched to a
> set of function pointers. This change makes the code more readable and
> simplifies adding support for new families/models.
> 
> In order to reuse code, define two sets of function pointers. Use one
> for modern systems (Family 17h and later). This will not change between
> current CPU families. Use another set of function pointers for legacy
> systems (before Family 17h). Use the Family 16h versions as default
> for the legacy ops since these are the latest, and adjust the function
> pointers as needed for older families.
> 
> Rename the Family 17h functions to use a "umc" prefix. This is to
> indicate that the functions apply to all modern CPU familes (17h, 18h,

Unknown word [familes] in commit message.
Suggestions: ['families',

> and 19h) which all have Unified Memory Controllers (UMCs).
> 
> Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
> [Rebased/reworked patch and reworded commit message]
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  drivers/edac/amd64_edac.c | 353 +++++++++++++-------------------------
>  drivers/edac/amd64_edac.h |  66 +++----
>  2 files changed, 140 insertions(+), 279 deletions(-)

That diffstat looks good.

And it simplifies and streamlines the code real nicely.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops
  2022-05-09 14:55 ` [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops Yazen Ghannam
@ 2022-05-18  8:10   ` Borislav Petkov
  2022-05-19 14:42     ` Yazen Ghannam
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-05-18  8:10 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Mon, May 09, 2022 at 02:55:22PM +0000, Yazen Ghannam wrote:
> From: Muralidhara M K <muralidhara.mk@amd.com>
> 
> GPU Nodes will need to set the number of available Chip Selects, i.e.
> Base and Mask values, differently than existing systems. A function
> pointer should be used rather than introduce another branching condition.

Yeah, it looks to me like all that detection logic should be split
eventually. Looking at read_mc_regs(), it has

        if (pvt->umc) {
                __read_mc_regs_df(pvt);

                goto skip;
        }

at the top, then a whole bunch of legacy stuff and then at the skip
label some common stuff...

Another thing you could consider is to have common functionality carved
out into helpers with "common" in the name and then call those from both
UMC and DCT paths. Perhaps that'll help keep the init paths sane. That
is, short of splitting this driver.

We did the splitting for Intel and there we have a common, librarized
code which gets linked into a couple of drivers. You don't have to do
this too - just putting it out there as an alternative.

The per-family function pointers design could be good too, if done
right. The advantage being, you have a single driver for all, yadda
yadda...

> Prepare for this by adding prep_chip_selects() to pvt->ops and set it
> as needed based on currently supported systems.
> 
> Use a "umc" prefix for modern systems, since these use Unified Memory
> Controllers (UMCs).
> 
> Use a "dct" prefix for newly-defined legacy functions, since these
> systems use DRAM Controllers (DCTs).
> 
> Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>

What does Naveen's SOB mean here? Co-developed-by perhaps?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/18] EDAC/amd64: Add read_mc_regs() into pvt->ops
  2022-05-09 14:55 ` [PATCH 10/18] EDAC/amd64: Add read_mc_regs() " Yazen Ghannam
@ 2022-05-18 11:02   ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-05-18 11:02 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Mon, May 09, 2022 at 02:55:26PM +0000, Yazen Ghannam wrote:
> @@ -3766,6 +3751,7 @@ static struct low_ops umc_ops = {
>  	.read_base_mask			= umc_read_base_mask,
>  	.determine_memory_type		= umc_determine_memory_type,
>  	.determine_ecc_sym_sz		= umc_determine_ecc_sym_sz,
> +	.read_mc_regs			= umc_read_mc_regs,
>  };
>  
>  /* Use Family 16h versions for defaults and adjust as needed below. */
> @@ -3777,6 +3763,7 @@ static struct low_ops dct_ops = {
>  	.read_base_mask			= dct_read_base_mask,
>  	.determine_memory_type		= dct_determine_memory_type,
>  	.determine_ecc_sym_sz		= dct_determine_ecc_sym_sz,
> +	.read_mc_regs			= dct_read_mc_regs,

Aha, good, here you do that split I mentioned earlier.

>  static int per_family_init(struct amd64_pvt *pvt)
> @@ -3938,7 +3925,15 @@ static int hw_info_get(struct amd64_pvt *pvt)
>  	if (ret)
>  		return ret;
>  
> -	read_mc_regs(pvt);
> +	pvt->ops->read_mc_regs(pvt);
> +
> +	pvt->ops->prep_chip_selects(pvt);
> +
> +	pvt->ops->read_base_mask(pvt);
> +
> +	pvt->ops->determine_memory_type(pvt);
> +
> +	pvt->ops->determine_ecc_sym_sz(pvt);
>  

Yeah, no need for the spaces.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops
  2022-05-18  8:10   ` Borislav Petkov
@ 2022-05-19 14:42     ` Yazen Ghannam
  0 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-05-19 14:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Wed, May 18, 2022 at 10:10:05AM +0200, Borislav Petkov wrote:
> On Mon, May 09, 2022 at 02:55:22PM +0000, Yazen Ghannam wrote:
> > From: Muralidhara M K <muralidhara.mk@amd.com>
> > 
> > GPU Nodes will need to set the number of available Chip Selects, i.e.
> > Base and Mask values, differently than existing systems. A function
> > pointer should be used rather than introduce another branching condition.
> 
> Yeah, it looks to me like all that detection logic should be split
> eventually. Looking at read_mc_regs(), it has
> 
>         if (pvt->umc) {
>                 __read_mc_regs_df(pvt);
> 
>                 goto skip;
>         }
> 
> at the top, then a whole bunch of legacy stuff and then at the skip
> label some common stuff...
> 
> Another thing you could consider is to have common functionality carved
> out into helpers with "common" in the name and then call those from both
> UMC and DCT paths. Perhaps that'll help keep the init paths sane. That
> is, short of splitting this driver.
> 
> We did the splitting for Intel and there we have a common, librarized
> code which gets linked into a couple of drivers. You don't have to do
> this too - just putting it out there as an alternative.
> 
> The per-family function pointers design could be good too, if done
> right. The advantage being, you have a single driver for all, yadda
> yadda...
>

Yep, I've actually considered splitting this driver. But I think at this point
it'd be best to keep what we have, and then write a new driver if and when a
major change happens in future platforms.

> > Prepare for this by adding prep_chip_selects() to pvt->ops and set it
> > as needed based on currently supported systems.
> > 
> > Use a "umc" prefix for modern systems, since these use Unified Memory
> > Controllers (UMCs).
> > 
> > Use a "dct" prefix for newly-defined legacy functions, since these
> > systems use DRAM Controllers (DCTs).
> > 
> > Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
> > Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
> 
> What does Naveen's SOB mean here? Co-developed-by perhaps?
>

Yes, that's right. Sorry I missed it. I'll include it in the next revision.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() into pvt->ops
  2022-05-09 14:55 ` [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() " Yazen Ghannam
@ 2022-06-20 16:21   ` Borislav Petkov
  2022-06-22 16:10     ` Yazen Ghannam
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-06-20 16:21 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

Back to those...

On Mon, May 09, 2022 at 02:55:28PM +0000, Yazen Ghannam wrote:
> From: Muralidhara M K <muralidhara.mk@amd.com>
> 
> GPU Nodes will have different criteria for checking the EDAC
> capabilities of a controller. A function pointer should be used rather
> than introduce another branching condition.
> 
> Prepare for this by adding determine_edac_cap() to pvt->ops and set it
> as needed based on currently supported systems.
> 
> Use a "umc" prefix for modern systems, since these use Unified Memory
> Controllers (UMCs).
> 
> Use a "dct" prefix for newly-defined legacy functions, since these
> systems use DRAM Controllers (DCTs).

Please refrain from adding those boilerplates to each commit message. Do
it once for the first patch and then no need anymore. It is clear what's
going on.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() into pvt->ops
  2022-06-20 16:21   ` Borislav Petkov
@ 2022-06-22 16:10     ` Yazen Ghannam
  0 siblings, 0 replies; 30+ messages in thread
From: Yazen Ghannam @ 2022-06-22 16:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, linux-kernel, Smita.KoralahalliChannabasappa,
	muralidhara.mk, naveenkrishna.chatradhi

On Mon, Jun 20, 2022 at 06:21:11PM +0200, Borislav Petkov wrote:
> Back to those...
> 
> On Mon, May 09, 2022 at 02:55:28PM +0000, Yazen Ghannam wrote:
> > From: Muralidhara M K <muralidhara.mk@amd.com>
> > 
> > GPU Nodes will have different criteria for checking the EDAC
> > capabilities of a controller. A function pointer should be used rather
> > than introduce another branching condition.
> > 
> > Prepare for this by adding determine_edac_cap() to pvt->ops and set it
> > as needed based on currently supported systems.
> > 
> > Use a "umc" prefix for modern systems, since these use Unified Memory
> > Controllers (UMCs).
> > 
> > Use a "dct" prefix for newly-defined legacy functions, since these
> > systems use DRAM Controllers (DCTs).
> 
> Please refrain from adding those boilerplates to each commit message. Do
> it once for the first patch and then no need anymore. It is clear what's
> going on.
>

Understood, will do.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-06-22 16:10 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09 14:55 [PATCH 00/18] AMD64 EDAC Cleanup and Refactor Yazen Ghannam
2022-05-09 14:55 ` [PATCH 01/18] EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ Yazen Ghannam
2022-05-09 14:55 ` [PATCH 02/18] EDAC/amd64: Remove scrub rate control for Family 17h and later Yazen Ghannam
2022-05-11 10:26   ` Borislav Petkov
2022-05-12 14:19     ` Yazen Ghannam
2022-05-09 14:55 ` [PATCH 03/18] EDAC/amd64: Remove PCI Function 6 Yazen Ghannam
2022-05-09 14:55 ` [PATCH 04/18] EDAC/amd64: Remove PCI Function 0 Yazen Ghannam
2022-05-11 10:34   ` Borislav Petkov
2022-05-12 14:34     ` Yazen Ghannam
2022-05-13  9:56       ` Borislav Petkov
2022-05-09 14:55 ` [PATCH 05/18] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt Yazen Ghannam
2022-05-13 15:21   ` Borislav Petkov
2022-05-09 14:55 ` [PATCH 06/18] EDAC/amd64: Add prep_chip_selects() into pvt->ops Yazen Ghannam
2022-05-18  8:10   ` Borislav Petkov
2022-05-19 14:42     ` Yazen Ghannam
2022-05-09 14:55 ` [PATCH 07/18] EDAC/amd64: Add read_base_mask() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 08/18] EDAC/amd64: Add determine_memory_type() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 09/18] EDAC/amd64: Add get_ecc_sym_sz() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 10/18] EDAC/amd64: Add read_mc_regs() " Yazen Ghannam
2022-05-18 11:02   ` Borislav Petkov
2022-05-09 14:55 ` [PATCH 11/18] EDAC/amd64: Add ecc_enabled() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 12/18] EDAC/amd64: Add determine_edac_cap() " Yazen Ghannam
2022-06-20 16:21   ` Borislav Petkov
2022-06-22 16:10     ` Yazen Ghannam
2022-05-09 14:55 ` [PATCH 13/18] EDAC/amd64: Add determine_edac_ctl_cap() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 14/18] EDAC/amd64: Add setup_mci_misc_attrs() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 15/18] EDAC/amd64: Add init_csrows() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 16/18] EDAC/amd64: Add dump_misc_regs() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 17/18] EDAC/amd64: Add get_cs_mode() " Yazen Ghannam
2022-05-09 14:55 ` [PATCH 18/18] EDAC/amd64: Add get_err_info() " Yazen Ghannam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.