linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] AMD64 EDAC fixes
@ 2019-07-09 21:56 Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling Ghannam, Yazen
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues. In addition, there are a
couple of patches to add support for Asymmetric Dual-Rank DIMMs.

Thanks,
Yazen

Link:
https://lkml.kernel.org/r/20190531234501.32826-1-Yazen.Ghannam@amd.com

v1->v2:
* Squash patches 1 and 2 together.

Yazen Ghannam (7):
  EDAC/amd64: Support more than two controllers for chip selects
    handling
  EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  EDAC/amd64: Initialize DIMM info for systems with more than two
    channels
  EDAC/amd64: Find Chip Select memory size using Address Mask
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Cache secondary Chip Select registers
  EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

 drivers/edac/amd64_edac.c | 348 ++++++++++++++++++++++++--------------
 drivers/edac/amd64_edac.h |   9 +-
 2 files changed, 232 insertions(+), 125 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-07-10 16:54   ` Phillips, Kim
  2019-08-02  6:49   ` Borislav Petkov
  2019-07-09 21:56 ` [PATCH v2 3/7] EDAC/amd64: Initialize DIMM info for systems with more than two channels Ghannam, Yazen
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

This is a second version of a commit that was reverted.

Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-2-Yazen.Ghannam@amd.com
https://lkml.kernel.org/r/20190531234501.32826-3-Yazen.Ghannam@amd.com

v1->v2:
* Patches 1 and 2 squashed together.

 drivers/edac/amd64_edac.c | 123 +++++++++++++++++++++-----------------
 drivers/edac/amd64_edac.h |   5 +-
 2 files changed, 71 insertions(+), 57 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..dd60cf5a3d96 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 
 	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
-	for (dimm = 0; dimm < 4; dimm++) {
+	for (dimm = 0; dimm < 2; dimm++) {
 		size0 = 0;
 		cs0 = dimm * 2;
 
@@ -942,89 +942,102 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 	} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
 		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+	} else if (pvt->fam >= 0x17) {
+		int umc;
+
+		for_each_umc(umc) {
+			pvt->csels[umc].b_cnt = 4;
+			pvt->csels[umc].m_cnt = 2;
+		}
+
 	} else {
 		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
 		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
 	}
 }
 
+static void read_umc_base_mask(struct amd64_pvt *pvt)
+{
+	u32 umc_base_reg, umc_mask_reg;
+	u32 base_reg, mask_reg;
+	u32 *base, *mask;
+	int cs, umc;
+
+	for_each_umc(umc) {
+		umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+
+		for_each_chip_select(cs, umc, pvt) {
+			base = &pvt->csels[umc].csbases[cs];
+
+			base_reg = umc_base_reg + (cs * 4);
+
+			if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+				edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *base, base_reg);
+		}
+
+		umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+
+		for_each_chip_select_mask(cs, umc, pvt) {
+			mask = &pvt->csels[umc].csmasks[cs];
+
+			mask_reg = umc_mask_reg + (cs * 4);
+
+			if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+				edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *mask, mask_reg);
+		}
+	}
+}
+
 /*
  * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
  */
 static void read_dct_base_mask(struct amd64_pvt *pvt)
 {
-	int base_reg0, base_reg1, mask_reg0, mask_reg1, cs;
+	int cs;
 
 	prep_chip_selects(pvt);
 
-	if (pvt->umc) {
-		base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR;
-		base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR;
-		mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK;
-		mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK;
-	} else {
-		base_reg0 = DCSB0;
-		base_reg1 = DCSB1;
-		mask_reg0 = DCSM0;
-		mask_reg1 = DCSM1;
-	}
+	if (pvt->umc)
+		return read_umc_base_mask(pvt);
 
 	for_each_chip_select(cs, 0, pvt) {
-		int reg0   = base_reg0 + (cs * 4);
-		int reg1   = base_reg1 + (cs * 4);
+		int reg0   = DCSB0 + (cs * 4);
+		int reg1   = DCSB1 + (cs * 4);
 		u32 *base0 = &pvt->csels[0].csbases[cs];
 		u32 *base1 = &pvt->csels[1].csbases[cs];
 
-		if (pvt->umc) {
-			if (!amd_smn_read(pvt->mc_node_id, reg0, base0))
-				edac_dbg(0, "  DCSB0[%d]=0x%08x reg: 0x%x\n",
-					 cs, *base0, reg0);
-
-			if (!amd_smn_read(pvt->mc_node_id, reg1, base1))
-				edac_dbg(0, "  DCSB1[%d]=0x%08x reg: 0x%x\n",
-					 cs, *base1, reg1);
-		} else {
-			if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
-				edac_dbg(0, "  DCSB0[%d]=0x%08x reg: F2x%x\n",
-					 cs, *base0, reg0);
+		if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
+			edac_dbg(0, "  DCSB0[%d]=0x%08x reg: F2x%x\n",
+				 cs, *base0, reg0);
 
-			if (pvt->fam == 0xf)
-				continue;
+		if (pvt->fam == 0xf)
+			continue;
 
-			if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, base1))
-				edac_dbg(0, "  DCSB1[%d]=0x%08x reg: F2x%x\n",
-					 cs, *base1, (pvt->fam == 0x10) ? reg1
-								: reg0);
-		}
+		if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, base1))
+			edac_dbg(0, "  DCSB1[%d]=0x%08x reg: F2x%x\n",
+				 cs, *base1, (pvt->fam == 0x10) ? reg1
+							: reg0);
 	}
 
 	for_each_chip_select_mask(cs, 0, pvt) {
-		int reg0   = mask_reg0 + (cs * 4);
-		int reg1   = mask_reg1 + (cs * 4);
+		int reg0   = DCSM0 + (cs * 4);
+		int reg1   = DCSM1 + (cs * 4);
 		u32 *mask0 = &pvt->csels[0].csmasks[cs];
 		u32 *mask1 = &pvt->csels[1].csmasks[cs];
 
-		if (pvt->umc) {
-			if (!amd_smn_read(pvt->mc_node_id, reg0, mask0))
-				edac_dbg(0, "    DCSM0[%d]=0x%08x reg: 0x%x\n",
-					 cs, *mask0, reg0);
-
-			if (!amd_smn_read(pvt->mc_node_id, reg1, mask1))
-				edac_dbg(0, "    DCSM1[%d]=0x%08x reg: 0x%x\n",
-					 cs, *mask1, reg1);
-		} else {
-			if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, mask0))
-				edac_dbg(0, "    DCSM0[%d]=0x%08x reg: F2x%x\n",
-					 cs, *mask0, reg0);
+		if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, mask0))
+			edac_dbg(0, "    DCSM0[%d]=0x%08x reg: F2x%x\n",
+				 cs, *mask0, reg0);
 
-			if (pvt->fam == 0xf)
-				continue;
+		if (pvt->fam == 0xf)
+			continue;
 
-			if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, mask1))
-				edac_dbg(0, "    DCSM1[%d]=0x%08x reg: F2x%x\n",
-					 cs, *mask1, (pvt->fam == 0x10) ? reg1
-								: reg0);
-		}
+		if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, mask1))
+			edac_dbg(0, "    DCSM1[%d]=0x%08x reg: F2x%x\n",
+				 cs, *mask1, (pvt->fam == 0x10) ? reg1
+							: reg0);
 	}
 }
 
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 8f66472f7adc..4dce6a2ac75f 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -96,6 +96,7 @@
 /* Hardware limit on ChipSelect rows per MC and processors per system */
 #define NUM_CHIPSELECTS			8
 #define DRAM_RANGES			8
+#define NUM_CONTROLLERS			8
 
 #define ON true
 #define OFF false
@@ -351,8 +352,8 @@ struct amd64_pvt {
 	u32 dbam0;		/* DRAM Base Address Mapping reg for DCT0 */
 	u32 dbam1;		/* DRAM Base Address Mapping reg for DCT1 */
 
-	/* one for each DCT */
-	struct chip_select csels[2];
+	/* one for each DCT/UMC */
+	struct chip_select csels[NUM_CONTROLLERS];
 
 	/* DRAM base and limit pairs F1x[78,70,68,60,58,50,48,40] */
 	struct dram_range ranges[DRAM_RANGES];
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 3/7] EDAC/amd64: Initialize DIMM info for systems with more than two channels Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-08-02  7:42   ` Borislav Petkov
  2019-07-09 21:56 ` [PATCH v2 4/7] EDAC/amd64: Find Chip Select memory size using Address Mask Ghannam, Yazen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting EDAC_CTL_CAP.

Set the appropriate EDAC_CTL_CAP flag based on the device type.

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-4-Yazen.Ghannam@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dd60cf5a3d96..125d6e2a828e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 static inline void
 f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
-	u8 i, ecc_en = 1, cpk_en = 1;
+	u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
 
 	for_each_umc(i) {
 		if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
 			ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
 			cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
+
+			dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
+			dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
 		}
 	}
 
@@ -3163,8 +3166,12 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 	if (ecc_en) {
 		mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
 
-		if (cpk_en)
-			mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+		if (cpk_en) {
+			if (dev_x4)
+				mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+			else if (dev_x16)
+				mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
+		}
 	}
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/7] EDAC/amd64: Initialize DIMM info for systems with more than two channels
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP Ghannam, Yazen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channel support.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Fixes: bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-5-Yazen.Ghannam@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 63 ++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 125d6e2a828e..d0926b181c7c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2837,6 +2837,46 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
 	return nr_pages;
 }
 
+static int init_csrows_df(struct mem_ctl_info *mci)
+{
+	struct amd64_pvt *pvt = mci->pvt_info;
+	enum edac_type edac_mode = EDAC_NONE;
+	enum dev_type dev_type = DEV_UNKNOWN;
+	struct dimm_info *dimm;
+	int empty = 1;
+	u8 umc, cs;
+
+	if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
+		edac_mode = EDAC_S16ECD16ED;
+		dev_type = DEV_X16;
+	} else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) {
+		edac_mode = EDAC_S4ECD4ED;
+		dev_type = DEV_X4;
+	} else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) {
+		edac_mode = EDAC_SECDED;
+	}
+
+	for_each_umc(umc) {
+		for_each_chip_select(cs, umc, pvt) {
+			if (!csrow_enabled(cs, umc, pvt))
+				continue;
+
+			empty = 0;
+			dimm = mci->csrows[cs]->channels[umc]->dimm;
+
+			edac_dbg(1, "MC node: %d, csrow: %d\n",
+					pvt->mc_node_id, cs);
+
+			dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+			dimm->mtype = pvt->dram_type;
+			dimm->edac_mode = edac_mode;
+			dimm->dtype = dev_type;
+		}
+	}
+
+	return empty;
+}
+
 /*
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
@@ -2851,15 +2891,16 @@ static int init_csrows(struct mem_ctl_info *mci)
 	int nr_pages = 0;
 	u32 val;
 
-	if (!pvt->umc) {
-		amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
+	if (pvt->umc)
+		return init_csrows_df(mci);
+
+	amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
-		pvt->nbcfg = val;
+	pvt->nbcfg = val;
 
-		edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
-			 pvt->mc_node_id, val,
-			 !!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
-	}
+	edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
+		 pvt->mc_node_id, val,
+		 !!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
 
 	/*
 	 * We iterate over DCT0 here but we look at DCT1 in parallel, if needed.
@@ -2896,13 +2937,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 		edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages);
 
 		/* Determine DIMM ECC mode: */
-		if (pvt->umc) {
-			if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED)
-				edac_mode = EDAC_S4ECD4ED;
-			else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED)
-				edac_mode = EDAC_SECDED;
-
-		} else if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
+		if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
 			edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL)
 					? EDAC_S4ECD4ED
 					: EDAC_SECDED;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/7] EDAC/amd64: Find Chip Select memory size using Address Mask
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
                   ` (2 preceding siblings ...)
  2019-07-09 21:56 ` [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 6/7] EDAC/amd64: Cache secondary Chip Select registers Ghannam, Yazen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a few cases.

1) For single-rank, use the address mask as the size.

2) For dual-rank non-interleaved, use the address mask divided by 2 as
the size.

3) For dual-rank interleaved, do #2 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead do the de-interleaving. In the non-interleaved case, the
original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0xFFFF9FE, De-interleaved=0x3FFFFFE

Fixes: fc00c6a41638 ("EDAC/amd64: Adjust printed chip select sizes when interleaved")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-6-Yazen.Ghannam@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 107 ++++++++++++++++++++++----------------
 1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index d0926b181c7c..f0424c10cac0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,36 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 		 (dclr & BIT(15)) ?  "yes" : "no");
 }
 
-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARY		BIT(0)
+#define CS_ODD_PRIMARY		BIT(1)
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
-	u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
-	u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
-	u32 test_mask = GENMASK(msb, lsb);
+	int cs_mode = 0;
+
+	if (csrow_enabled(2 * dimm, ctrl, pvt))
+		cs_mode |= CS_EVEN_PRIMARY;
 
-	edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+	if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+		cs_mode |= CS_ODD_PRIMARY;
 
-	return mask ^ test_mask;
+	return cs_mode;
 }
 
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
-	int dimm, size0, size1, cs0, cs1;
+	int dimm, size0, size1, cs0, cs1, cs_mode;
 
 	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
 	for (dimm = 0; dimm < 2; dimm++) {
-		size0 = 0;
 		cs0 = dimm * 2;
-
-		if (csrow_enabled(cs0, ctrl, pvt))
-			size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
-		size1 = 0;
 		cs1 = dimm * 2 + 1;
 
-		if (csrow_enabled(cs1, ctrl, pvt)) {
-			/*
-			 * CS interleaving is only supported if both CSes have
-			 * the same amount of memory. Because they are
-			 * interleaved, it will look like both CSes have the
-			 * full amount of memory. Save the size for both as
-			 * half the amount we found on CS0, if interleaved.
-			 */
-			if (f17_cs_interleaved(pvt, ctrl, cs1))
-				size1 = size0 = (size0 >> 1);
-			else
-				size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
-		}
+		cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+		size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+		size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
 
 		amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
 				cs0,	size0,
@@ -1569,18 +1554,50 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
 		return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 				    unsigned int cs_mode, int csrow_nr)
 {
-	u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+	u32 addr_mask_orig, addr_mask_deinterleaved;
+	u32 msb, weight, num_zero_bits;
+	int dimm, dual_rank, size = 0;
 
-	/*  Each mask is used for every two base addresses. */
-	u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+	if (!cs_mode)
+		return size;
 
-	/*  Register [31:1] = Address [39:9]. Size is in kBs here. */
-	u32 size = ((addr_mask >> 1) - (base_addr >> 1) + 1) >> 1;
+	dual_rank = !!(cs_mode & (CS_EVEN_PRIMARY | CS_ODD_PRIMARY));
 
-	edac_dbg(1, "BaseAddr: 0x%x, AddrMask: 0x%x\n", base_addr, addr_mask);
+	/*
+	 * There is one mask per DIMM, and two Chip Selects per DIMM.
+	 *	CS0 and CS1 -> DIMM0
+	 *	CS2 and CS3 -> DIMM1
+	 */
+	dimm = csrow_nr >> 1;
+
+	addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+
+	/*
+	 * The number of zero bits in the mask is equal to the number of bits
+	 * in a full mask minus the number of bits in the current mask.
+	 *
+	 * The MSB is the number of bits in the full mask because BIT[0] is
+	 * always 0.
+	 */
+	msb = fls(addr_mask_orig) - 1;
+	weight = hweight_long(addr_mask_orig);
+	num_zero_bits = msb - weight;
+
+	/* Take the number of zero bits off from the top of the mask. */
+	addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+	edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+	edac_dbg(1, "  Original AddrMask: 0x%x\n", addr_mask_orig);
+	edac_dbg(1, "  Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
+	size = addr_mask_deinterleaved >> 1;
+
+	/* Divide size by two if dual rank. */
+	size >>= dual_rank;
 
 	/* Return size in MBs. */
 	return size >> 10;
@@ -2245,7 +2262,7 @@ static struct amd64_family_type family_types[] = {
 		.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_base_addr_to_cs_size,
+			.dbam_to_cs		= f17_addr_mask_to_cs_size,
 		}
 	},
 	[F17_M10H_CPUS] = {
@@ -2254,7 +2271,7 @@ static struct amd64_family_type family_types[] = {
 		.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_base_addr_to_cs_size,
+			.dbam_to_cs		= f17_addr_mask_to_cs_size,
 		}
 	},
 	[F17_M30H_CPUS] = {
@@ -2263,7 +2280,7 @@ static struct amd64_family_type family_types[] = {
 		.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
-			.dbam_to_cs		= f17_base_addr_to_cs_size,
+			.dbam_to_cs		= f17_addr_mask_to_cs_size,
 		}
 	},
 };
@@ -2822,10 +2839,12 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
 	int csrow_nr = csrow_nr_orig;
 	u32 cs_mode, nr_pages;
 
-	if (!pvt->umc)
+	if (!pvt->umc) {
 		csrow_nr >>= 1;
-
-	cs_mode = DBAM_DIMM(csrow_nr, dbam);
+		cs_mode = DBAM_DIMM(csrow_nr, dbam);
+	} else {
+		cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
+	}
 
 	nr_pages   = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
 	nr_pages <<= 20 - PAGE_SHIFT;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 6/7] EDAC/amd64: Cache secondary Chip Select registers
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
                   ` (3 preceding siblings ...)
  2019-07-09 21:56 ` [PATCH v2 4/7] EDAC/amd64: Find Chip Select memory size using Address Mask Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 5/7] EDAC/amd64: Decode syndrome before translating address Ghannam, Yazen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects, rather they are used in conjunction with the primary
Chip Select registers in certain use cases.

Cache these secondary Chip Select registers for future use.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-8-Yazen.Ghannam@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 23 ++++++++++++++++++++---
 drivers/edac/amd64_edac.h |  4 ++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4058b24b8e04..006417cb79dc 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -943,34 +943,51 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
-	u32 umc_base_reg, umc_mask_reg;
-	u32 base_reg, mask_reg;
-	u32 *base, *mask;
+	u32 umc_base_reg, umc_base_reg_sec;
+	u32 umc_mask_reg, umc_mask_reg_sec;
+	u32 base_reg, base_reg_sec;
+	u32 mask_reg, mask_reg_sec;
+	u32 *base, *base_sec;
+	u32 *mask, *mask_sec;
 	int cs, umc;
 
 	for_each_umc(umc) {
 		umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+		umc_base_reg_sec = get_umc_base(umc) + UMCCH_BASE_ADDR_SEC;
 
 		for_each_chip_select(cs, umc, pvt) {
 			base = &pvt->csels[umc].csbases[cs];
+			base_sec = &pvt->csels[umc].csbases_sec[cs];
 
 			base_reg = umc_base_reg + (cs * 4);
+			base_reg_sec = umc_base_reg_sec + (cs * 4);
 
 			if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
 				edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
 					 umc, cs, *base, base_reg);
+
+			if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec))
+				edac_dbg(0, "    DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *base_sec, base_reg_sec);
 		}
 
 		umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+		umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
 
 		for_each_chip_select_mask(cs, umc, pvt) {
 			mask = &pvt->csels[umc].csmasks[cs];
+			mask_sec = &pvt->csels[umc].csmasks_sec[cs];
 
 			mask_reg = umc_mask_reg + (cs * 4);
+			mask_reg_sec = umc_mask_reg_sec + (cs * 4);
 
 			if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
 				edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
 					 umc, cs, *mask, mask_reg);
+
+			if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec))
+				edac_dbg(0, "    DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *mask_sec, mask_reg_sec);
 		}
 	}
 }
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 4dce6a2ac75f..68f12de6e654 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -259,7 +259,9 @@
 
 /* UMC CH register offsets */
 #define UMCCH_BASE_ADDR			0x0
+#define UMCCH_BASE_ADDR_SEC		0x10
 #define UMCCH_ADDR_MASK			0x20
+#define UMCCH_ADDR_MASK_SEC		0x28
 #define UMCCH_ADDR_CFG			0x30
 #define UMCCH_DIMM_CFG			0x80
 #define UMCCH_UMC_CFG			0x100
@@ -312,9 +314,11 @@ struct dram_range {
 /* A DCT chip selects collection */
 struct chip_select {
 	u32 csbases[NUM_CHIPSELECTS];
+	u32 csbases_sec[NUM_CHIPSELECTS];
 	u8 b_cnt;
 
 	u32 csmasks[NUM_CHIPSELECTS];
+	u32 csmasks_sec[NUM_CHIPSELECTS];
 	u8 m_cnt;
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/7] EDAC/amd64: Decode syndrome before translating address
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
                   ` (4 preceding siblings ...)
  2019-07-09 21:56 ` [PATCH v2 6/7] EDAC/amd64: Cache secondary Chip Select registers Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-07-09 21:56 ` [PATCH v2 7/7] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs Ghannam, Yazen
  2019-08-02 14:46 ` [PATCH v2 0/7] AMD64 EDAC fixes Borislav Petkov
  7 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-7-Yazen.Ghannam@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f0424c10cac0..4058b24b8e04 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2567,13 +2567,6 @@ static void decode_umc_error(int node_id, struct mce *m)
 
 	err.channel = find_umc_channel(m);
 
-	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
-		err.err_code = ERR_NORM_ADDR;
-		goto log_error;
-	}
-
-	error_address_to_page_and_offset(sys_addr, &err);
-
 	if (!(m->status & MCI_STATUS_SYNDV)) {
 		err.err_code = ERR_SYND;
 		goto log_error;
@@ -2590,6 +2583,13 @@ static void decode_umc_error(int node_id, struct mce *m)
 
 	err.csrow = m->synd & 0x7;
 
+	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+		err.err_code = ERR_NORM_ADDR;
+		goto log_error;
+	}
+
+	error_address_to_page_and_offset(sys_addr, &err);
+
 log_error:
 	__log_ecc_error(mci, &err, ecc_type);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 7/7] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
                   ` (5 preceding siblings ...)
  2019-07-09 21:56 ` [PATCH v2 5/7] EDAC/amd64: Decode syndrome before translating address Ghannam, Yazen
@ 2019-07-09 21:56 ` Ghannam, Yazen
  2019-08-02 14:46 ` [PATCH v2 0/7] AMD64 EDAC fixes Borislav Petkov
  7 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-07-09 21:56 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
DIMMs were the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-9-Yazen.Ghannam@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 006417cb79dc..6c284a4f980c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -790,6 +790,9 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 
 #define CS_EVEN_PRIMARY		BIT(0)
 #define CS_ODD_PRIMARY		BIT(1)
+#define CS_ODD_SECONDARY	BIT(2)
+
+#define csrow_sec_enabled(i, dct, pvt)	((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE)
 
 static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
@@ -801,6 +804,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 	if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
 		cs_mode |= CS_ODD_PRIMARY;
 
+	/* Asymmetric Dual-Rank DIMM support. */
+	if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
+		cs_mode |= CS_ODD_SECONDARY;
+
 	return cs_mode;
 }
 
@@ -1590,7 +1597,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	 */
 	dimm = csrow_nr >> 1;
 
-	addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+	/* Asymmetric Dual-Rank DIMM support. */
+	if (cs_mode & CS_ODD_SECONDARY)
+		addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+	else
+		addr_mask_orig = pvt->csels[umc].csmasks[dimm];
 
 	/*
 	 * The number of zero bits in the mask is equal to the number of bits
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling
  2019-07-09 21:56 ` [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling Ghannam, Yazen
@ 2019-07-10 16:54   ` Phillips, Kim
  2019-08-02  6:49   ` Borislav Petkov
  1 sibling, 0 replies; 16+ messages in thread
From: Phillips, Kim @ 2019-07-10 16:54 UTC (permalink / raw)
  To: Ghannam, Yazen, linux-edac; +Cc: linux-kernel, bp

On 7/9/19 4:56 PM, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> The struct chip_select array that's used for saving chip select bases
> and masks is fixed at length of two. There should be one struct
> chip_select for each controller, so this array should be increased to
> support systems that may have more than two controllers.
> 
> Increase the size of the struct chip_select array to eight, which is the
> largest number of controllers per die currently supported on AMD
> systems.
> 
> Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> channel.
> 
> Also, carve out the Family 17h+ reading of the bases/masks into a
> separate function. This effectively reverts the original bases/masks
> reading code to before Family 17h support was added.
> 
> This is a second version of a commit that was reverted.
> 
> Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---

For this and the rest of the series:

Tested-by: Kim Phillips <kim.phillips@amd.com>

Thanks,

Kim

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling
  2019-07-09 21:56 ` [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling Ghannam, Yazen
  2019-07-10 16:54   ` Phillips, Kim
@ 2019-08-02  6:49   ` Borislav Petkov
  2019-08-19 19:55     ` Ghannam, Yazen
  1 sibling, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2019-08-02  6:49 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Tue, Jul 09, 2019 at 09:56:54PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> The struct chip_select array that's used for saving chip select bases
> and masks is fixed at length of two. There should be one struct
> chip_select for each controller, so this array should be increased to
> support systems that may have more than two controllers.
> 
> Increase the size of the struct chip_select array to eight, which is the
> largest number of controllers per die currently supported on AMD
> systems.
> 
> Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> channel.
> 
> Also, carve out the Family 17h+ reading of the bases/masks into a
> separate function. This effectively reverts the original bases/masks
> reading code to before Family 17h support was added.
> 
> This is a second version of a commit that was reverted.
> 
> Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")

I'm not sure about those Fixes: tags you're slapping everywhere. First
of all, 8de9930a4618 is a revert so how can this be fixing a revert? If
anything, it should be fixing the original commit

  0a227af521d6 ("EDAC/amd64: Support more than two controllers for chip select handling")

which tried the more-than-2-memory-controllers thing.

But, it is not really a fix for that commit but a second attempt at it.
Which is not really a fix but hw enablement.

So I'm dropping those tags here. If you want them in stable, pls
backport them properly and test them on the respective stable kernels
before sending them to stable.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  2019-07-09 21:56 ` [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP Ghannam, Yazen
@ 2019-08-02  7:42   ` Borislav Petkov
  2019-08-19 20:19     ` Ghannam, Yazen
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2019-08-02  7:42 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Tue, Jul 09, 2019 at 09:56:55PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> AMD Family 17h systems support x4 and x16 DRAM devices. However, the
> device type is not checked when setting EDAC_CTL_CAP.
> 
> Set the appropriate EDAC_CTL_CAP flag based on the device type.
> 
> Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")

This is better: a patch which fixes a previous patch and is simple,
small and clear. That you can tag with Fixes: just fine.

> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20190531234501.32826-4-Yazen.Ghannam@amd.com
> 
> v1->v2:
> * No change.
> 
>  drivers/edac/amd64_edac.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index dd60cf5a3d96..125d6e2a828e 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
>  static inline void
>  f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
>  {
> -	u8 i, ecc_en = 1, cpk_en = 1;
> +	u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
>  
>  	for_each_umc(i) {
>  		if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
>  			ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
>  			cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
> +
> +			dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
> +			dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));

Are those bits mutually exclusive?

I.e., so that you can do:

	if (dev_x4)
		mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
	else
		mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;

?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/7] AMD64 EDAC fixes
  2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
                   ` (6 preceding siblings ...)
  2019-07-09 21:56 ` [PATCH v2 7/7] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs Ghannam, Yazen
@ 2019-08-02 14:46 ` Borislav Petkov
  2019-08-15 20:08   ` Ghannam, Yazen
  7 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2019-08-02 14:46 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Tue, Jul 09, 2019 at 09:56:54PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Hi Boris,
> 
> This set contains a few fixes for some changes merged in v5.2. There
> are also a couple of fixes for older issues. In addition, there are a
> couple of patches to add support for Asymmetric Dual-Rank DIMMs.
> 
> Thanks,
> Yazen
> 
> Link:
> https://lkml.kernel.org/r/20190531234501.32826-1-Yazen.Ghannam@amd.com
> 
> v1->v2:
> * Squash patches 1 and 2 together.
> 
> Yazen Ghannam (7):
>   EDAC/amd64: Support more than two controllers for chip selects
>     handling
>   EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
>   EDAC/amd64: Initialize DIMM info for systems with more than two
>     channels
>   EDAC/amd64: Find Chip Select memory size using Address Mask
>   EDAC/amd64: Decode syndrome before translating address
>   EDAC/amd64: Cache secondary Chip Select registers
>   EDAC/amd64: Support Asymmetric Dual-Rank DIMMs
> 
>  drivers/edac/amd64_edac.c | 348 ++++++++++++++++++++++++--------------
>  drivers/edac/amd64_edac.h |   9 +-
>  2 files changed, 232 insertions(+), 125 deletions(-)

So this still has this confusing reporting of unpopulated nodes:

[    4.291774] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
[    4.292021] EDAC DEBUG: ecc_enabled: Node 2: No enabled UMCs.
[    4.292231] EDAC amd64: Node 2: DRAM ECC disabled.
[    4.292405] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[    4.292859] EDAC DEBUG: ecc_enabled: Node 3: No enabled UMCs.
[    4.292963] EDAC amd64: Node 3: DRAM ECC disabled.
[    4.293063] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[    4.293347] AMD64 EDAC driver v3.5.0

which needs fixing.

Regardless, still not good enough. The snowy owl box I have here has 16
GB:

$ head -n1 /proc/meminfo
MemTotal:       15715328 kB

and yet

[    4.282251] EDAC MC: UMC0 chip selects:
[    4.282348] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
[    4.282455] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
[    4.282592] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
[    4.282732] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
[    4.282839] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
[    4.283060] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
[    4.283286] EDAC amd64: MC: 0:  8191MB 1:  8191MB
				   ^^^^^^^^^^^^^^^^^

[    4.283456] EDAC amd64: MC: 2:     0MB 3:     0MB

...

[    4.285379] EDAC MC: UMC1 chip selects:
[    4.285476] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
[    4.285583] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
[    4.285721] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
[    4.285860] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
[    4.285967] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
[    4.286105] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
[    4.286244] EDAC amd64: MC: 0:  8191MB 1:  8191MB
				   ^^^^^^^^^^^^^^^^^

[    4.286345] EDAC amd64: MC: 2:     0MB 3:     0MB

which shows 4 chip selects x 8Gb = 32G.

So something's still wrong. Before the patchset it says:

EDAC MC: UMC0 chip selects:
EDAC amd64: MC: 0:  8192MB 1:     0MB
...
EDAC MC: UMC1 chip selects:
EDAC amd64: MC: 0:  8192MB 1:     0MB

which is the correct output.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2 0/7] AMD64 EDAC fixes
  2019-08-02 14:46 ` [PATCH v2 0/7] AMD64 EDAC fixes Borislav Petkov
@ 2019-08-15 20:08   ` Ghannam, Yazen
  2019-08-16  6:47     ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Ghannam, Yazen @ 2019-08-15 20:08 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 9:46 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 0/7] AMD64 EDAC fixes
> 
...
> 
> So this still has this confusing reporting of unpopulated nodes:
> 
> [    4.291774] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
> [    4.292021] EDAC DEBUG: ecc_enabled: Node 2: No enabled UMCs.
> [    4.292231] EDAC amd64: Node 2: DRAM ECC disabled.
> [    4.292405] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> [    4.292859] EDAC DEBUG: ecc_enabled: Node 3: No enabled UMCs.
> [    4.292963] EDAC amd64: Node 3: DRAM ECC disabled.
> [    4.293063] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> [    4.293347] AMD64 EDAC driver v3.5.0
> 
> which needs fixing.
> 

Yes, I agree. I was planning to do a fix in a separate set. Is that okay? Or should I add it here?

> Regardless, still not good enough. The snowy owl box I have here has 16
> GB:
> 
> $ head -n1 /proc/meminfo
> MemTotal:       15715328 kB
> 
> and yet
> 
> [    4.282251] EDAC MC: UMC0 chip selects:
> [    4.282348] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [    4.282455] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
> [    4.282592] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
> [    4.282732] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [    4.282839] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
> [    4.283060] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
> [    4.283286] EDAC amd64: MC: 0:  8191MB 1:  8191MB
> 				   ^^^^^^^^^^^^^^^^^
> 
> [    4.283456] EDAC amd64: MC: 2:     0MB 3:     0MB
> 
> ...
> 
> [    4.285379] EDAC MC: UMC1 chip selects:
> [    4.285476] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [    4.285583] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
> [    4.285721] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
> [    4.285860] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [    4.285967] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 0x1fffffe
> [    4.286105] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved AddrMask: 0x1fffffe
> [    4.286244] EDAC amd64: MC: 0:  8191MB 1:  8191MB
> 				   ^^^^^^^^^^^^^^^^^
> 
> [    4.286345] EDAC amd64: MC: 2:     0MB 3:     0MB
> 
> which shows 4 chip selects x 8Gb = 32G.
> 
> So something's still wrong. Before the patchset it says:
> 
> EDAC MC: UMC0 chip selects:
> EDAC amd64: MC: 0:  8192MB 1:     0MB
> ...
> EDAC MC: UMC1 chip selects:
> EDAC amd64: MC: 0:  8192MB 1:     0MB
> 
> which is the correct output.
> 

Can you please send me the full kernel log and dmidecode output?

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/7] AMD64 EDAC fixes
  2019-08-15 20:08   ` Ghannam, Yazen
@ 2019-08-16  6:47     ` Borislav Petkov
  0 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2019-08-16  6:47 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Thu, Aug 15, 2019 at 08:08:39PM +0000, Ghannam, Yazen wrote:
> Yes, I agree. I was planning to do a fix in a separate set. Is that
> okay? Or should I add it here?

If you have it, send it on, sure.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling
  2019-08-02  6:49   ` Borislav Petkov
@ 2019-08-19 19:55     ` Ghannam, Yazen
  0 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-08-19 19:55 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Friday, August 2, 2019 1:50 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling
> 
> On Tue, Jul 09, 2019 at 09:56:54PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > The struct chip_select array that's used for saving chip select bases
> > and masks is fixed at length of two. There should be one struct
> > chip_select for each controller, so this array should be increased to
> > support systems that may have more than two controllers.
> >
> > Increase the size of the struct chip_select array to eight, which is the
> > largest number of controllers per die currently supported on AMD
> > systems.
> >
> > Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> > Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> > channel.
> >
> > Also, carve out the Family 17h+ reading of the bases/masks into a
> > separate function. This effectively reverts the original bases/masks
> > reading code to before Family 17h support was added.
> >
> > This is a second version of a commit that was reverted.
> >
> > Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> > Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
> 
> I'm not sure about those Fixes: tags you're slapping everywhere. First
> of all, 8de9930a4618 is a revert so how can this be fixing a revert? If
> anything, it should be fixing the original commit
> 
>   0a227af521d6 ("EDAC/amd64: Support more than two controllers for chip select handling")
> 
> which tried the more-than-2-memory-controllers thing.
> 
> But, it is not really a fix for that commit but a second attempt at it.
> Which is not really a fix but hw enablement.
> 
> So I'm dropping those tags here. If you want them in stable, pls
> backport them properly and test them on the respective stable kernels
> before sending them to stable.
> 

Okay, no problem.

Should I drop the Fixes tags on any other of the patches in this set?

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  2019-08-02  7:42   ` Borislav Petkov
@ 2019-08-19 20:19     ` Ghannam, Yazen
  0 siblings, 0 replies; 16+ messages in thread
From: Ghannam, Yazen @ 2019-08-19 20:19 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 2:42 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
> 
> On Tue, Jul 09, 2019 at 09:56:55PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > AMD Family 17h systems support x4 and x16 DRAM devices. However, the
> > device type is not checked when setting EDAC_CTL_CAP.
> >
> > Set the appropriate EDAC_CTL_CAP flag based on the device type.
> >
> > Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
> 
> This is better: a patch which fixes a previous patch and is simple,
> small and clear. That you can tag with Fixes: just fine.
> 
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > ---
> > Link:
> > https://lkml.kernel.org/r/20190531234501.32826-4-Yazen.Ghannam@amd.com
> >
> > v1->v2:
> > * No change.
> >
> >  drivers/edac/amd64_edac.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index dd60cf5a3d96..125d6e2a828e 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
> >  static inline void
> >  f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> >  {
> > -	u8 i, ecc_en = 1, cpk_en = 1;
> > +	u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
> >
> >  	for_each_umc(i) {
> >  		if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> >  			ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> >  			cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
> > +
> > +			dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
> > +			dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
> 
> Are those bits mutually exclusive?
> 
> I.e., so that you can do:
> 
> 	if (dev_x4)
> 		mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
> 	else
> 		mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
> 
> ?
> 

I don't think so. I believe they can both be zero. I'll verify and make the change if they are mutually exclusive.

Thanks,
Yazen


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-08-19 20:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-09 21:56 [PATCH v2 0/7] AMD64 EDAC fixes Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling Ghannam, Yazen
2019-07-10 16:54   ` Phillips, Kim
2019-08-02  6:49   ` Borislav Petkov
2019-08-19 19:55     ` Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 3/7] EDAC/amd64: Initialize DIMM info for systems with more than two channels Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP Ghannam, Yazen
2019-08-02  7:42   ` Borislav Petkov
2019-08-19 20:19     ` Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 4/7] EDAC/amd64: Find Chip Select memory size using Address Mask Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 6/7] EDAC/amd64: Cache secondary Chip Select registers Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 5/7] EDAC/amd64: Decode syndrome before translating address Ghannam, Yazen
2019-07-09 21:56 ` [PATCH v2 7/7] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs Ghannam, Yazen
2019-08-02 14:46 ` [PATCH v2 0/7] AMD64 EDAC fixes Borislav Petkov
2019-08-15 20:08   ` Ghannam, Yazen
2019-08-16  6:47     ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).