linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/6] EDAC/amd64: Make struct amd64_family_type global
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
@ 2019-10-18 15:31 ` Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 3/6] EDAC/amd64: Save max number of controllers to family type Ghannam, Yazen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

The struct amd64_family_type doesn't change between multiple nodes and
instances of the modules, so make it global.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com

rfc -> v1:
* New patch based on suggestion from Boris.

 drivers/edac/amd64_edac.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c1d4536ae466..b9a712819c68 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -16,6 +16,8 @@ module_param(ecc_enable_override, int, 0644);
 
 static struct msr __percpu *msrs;
 
+static struct amd64_family_type *fam_type;
+
 /* Per-node stuff */
 static struct ecc_settings **ecc_stngs;
 
@@ -3278,8 +3280,7 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 	}
 }
 
-static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
-				 struct amd64_family_type *fam)
+static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
 
@@ -3298,7 +3299,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
 
 	mci->edac_cap		= determine_edac_cap(pvt);
 	mci->mod_name		= EDAC_MOD_STR;
-	mci->ctl_name		= fam->ctl_name;
+	mci->ctl_name		= fam_type->ctl_name;
 	mci->dev_name		= pci_name(pvt->F3);
 	mci->ctl_page_to_phys	= NULL;
 
@@ -3312,8 +3313,6 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
  */
 static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
 {
-	struct amd64_family_type *fam_type = NULL;
-
 	pvt->ext_model  = boot_cpu_data.x86_model >> 4;
 	pvt->stepping	= boot_cpu_data.x86_stepping;
 	pvt->model	= boot_cpu_data.x86_model;
@@ -3420,7 +3419,6 @@ static void compute_num_umcs(void)
 static int init_one_instance(unsigned int nid)
 {
 	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-	struct amd64_family_type *fam_type = NULL;
 	struct mem_ctl_info *mci = NULL;
 	struct edac_mc_layer layers[2];
 	struct amd64_pvt *pvt = NULL;
@@ -3497,7 +3495,7 @@ static int init_one_instance(unsigned int nid)
 	mci->pvt_info = pvt;
 	mci->pdev = &pvt->F3->dev;
 
-	setup_mci_misc_attrs(mci, fam_type);
+	setup_mci_misc_attrs(mci);
 
 	if (init_csrows(mci))
 		mci->edac_cap = EDAC_FLAG_NONE;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.
@ 2019-10-18 15:31 Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 1/6] EDAC/amd64: Make struct amd64_family_type global Ghannam, Yazen
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Hi Boris,

This set contains the next revision of the RFC patches I included with
the last AMD64 EDAC updates. I dropped the RFC tags, and I added a
couple of new patches.

Most of these patches address the issue where the module check and
complains about DRAM ECC on nodes without memory.

Patch 3 is new and came out of looking at the family type structs and
the boot flow.

Patch 6 fixes the "grain not set" warning that was recently introduced.

Thanks,
Yazen

Links:
https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com
https://lkml.kernel.org/r/20190821235938.118710-10-Yazen.Ghannam@amd.com
https://lkml.kernel.org/r/20190821235938.118710-11-Yazen.Ghannam@amd.com

Yazen Ghannam (6):
  EDAC/amd64: Make struct amd64_family_type global
  EDAC/amd64: Gather hardware information early
  EDAC/amd64: Save max number of controllers to family type
  EDAC/amd64: Use cached data when checking for ECC
  EDAC/amd64: Check for memory before fully initializing an instance
  EDAC/amd64: Set grain per DIMM

 drivers/edac/amd64_edac.c | 174 ++++++++++++++++++++------------------
 drivers/edac/amd64_edac.h |   1 +
 2 files changed, 94 insertions(+), 81 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 3/6] EDAC/amd64: Save max number of controllers to family type
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 1/6] EDAC/amd64: Make struct amd64_family_type global Ghannam, Yazen
@ 2019-10-18 15:31 ` Ghannam, Yazen
  2019-10-21 14:40   ` Borislav Petkov
  2019-10-18 15:31 ` [PATCH 2/6] EDAC/amd64: Gather hardware information early Ghannam, Yazen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

The maximum number of memory controllers is fixed within a family/model
group. In most cases, this has been fixed at 2, but some systems may
have up to 8.

The struct amd64_family_type already contains family/model-specific
information, and this can be used rather than adding model checks to
various functions.

Create a new field in struct amd64_family_type for max_num_controllers.
Set this when setting other family type information, and use this when
needing the maximum number of memory controllers possible for a system.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com

rfc -> v1:
* New patch.
* Idea came up from Boris' comment about compute_num_umcs().

 drivers/edac/amd64_edac.c | 45 +++++++++++++--------------------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4410da7c3a25..0fde5ad2fdcd 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -21,9 +21,6 @@ static struct amd64_family_type *fam_type;
 /* Per-node stuff */
 static struct ecc_settings **ecc_stngs;
 
-/* Number of Unified Memory Controllers */
-static u8 num_umcs;
-
 /*
  * Valid scrub rates for the K8 hardware memory scrubber. We map the scrubbing
  * bandwidth to a valid bit pattern. The 'set' operation finds the 'matching-
@@ -456,7 +453,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
 	for (i = 0; i < pvt->csels[dct].m_cnt; i++)
 
 #define for_each_umc(i) \
-	for (i = 0; i < num_umcs; i++)
+	for (i = 0; i < fam_type->max_num_controllers; i++)
 
 /*
  * @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -2226,6 +2223,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "K8",
 		.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
 		.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= k8_early_channel_count,
 			.map_sysaddr_to_csrow	= k8_map_sysaddr_to_csrow,
@@ -2236,6 +2234,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F10h",
 		.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
 		.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f1x_early_channel_count,
 			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
@@ -2246,6 +2245,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F15h",
 		.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
 		.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f1x_early_channel_count,
 			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
@@ -2256,6 +2256,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F15h_M30h",
 		.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
 		.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f1x_early_channel_count,
 			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
@@ -2266,6 +2267,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F15h_M60h",
 		.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
 		.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f1x_early_channel_count,
 			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
@@ -2276,6 +2278,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F16h",
 		.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
 		.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f1x_early_channel_count,
 			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
@@ -2286,6 +2289,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F16h_M30h",
 		.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
 		.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f1x_early_channel_count,
 			.map_sysaddr_to_csrow	= f1x_map_sysaddr_to_csrow,
@@ -2296,6 +2300,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F17h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
 		.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
 			.dbam_to_cs		= f17_addr_mask_to_cs_size,
@@ -2305,6 +2310,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F17h_M10h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
 		.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
 			.dbam_to_cs		= f17_addr_mask_to_cs_size,
@@ -2314,6 +2320,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F17h_M30h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
 		.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
+		.max_num_controllers = 8,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
 			.dbam_to_cs		= f17_addr_mask_to_cs_size,
@@ -2323,6 +2330,7 @@ static struct amd64_family_type family_types[] = {
 		.ctl_name = "F17h_M70h",
 		.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
 		.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
+		.max_num_controllers = 2,
 		.ops = {
 			.early_channel_count	= f17_early_channel_count,
 			.dbam_to_cs		= f17_addr_mask_to_cs_size,
@@ -3400,29 +3408,14 @@ static const struct attribute_group *amd64_edac_attr_groups[] = {
 	NULL
 };
 
-/* Set the number of Unified Memory Controllers in the system. */
-static void compute_num_umcs(void)
-{
-	u8 model = boot_cpu_data.x86_model;
-
-	if (boot_cpu_data.x86 < 0x17)
-		return;
-
-	if (model >= 0x30 && model <= 0x3f)
-		num_umcs = 8;
-	else
-		num_umcs = 2;
-
-	edac_dbg(1, "Number of UMCs: %x", num_umcs);
-}
-
 static int get_hardware_info(struct amd64_pvt *pvt)
 {
 	u16 pci_id1, pci_id2;
 	int ret = -EINVAL;
 
 	if (pvt->fam >= 0x17) {
-		pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL);
+		pvt->umc = kcalloc(fam_type->max_num_controllers,
+				   sizeof(struct amd64_umc), GFP_KERNEL);
 		if (!pvt->umc) {
 			ret = -ENOMEM;
 			goto err_ret;
@@ -3476,14 +3469,8 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	 * Always allocate two channels since we can have setups with DIMMs on
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
-	 *
-	 * On Fam17h+, the number of controllers may be greater than two. So set
-	 * the size equal to the maximum number of UMCs.
 	 */
-	if (pvt->fam >= 0x17)
-		layers[1].size = num_umcs;
-	else
-		layers[1].size = 2;
+	layers[1].size = fam_type->max_num_controllers;
 	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -3678,8 +3665,6 @@ static int __init amd64_edac_init(void)
 	if (!msrs)
 		goto err_free;
 
-	compute_num_umcs();
-
 	for (i = 0; i < amd_nb_num(); i++) {
 		err = probe_one_instance(i);
 		if (err) {
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 8c3cda81e619..0d5a9bc4d6de 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -479,6 +479,7 @@ struct low_ops {
 struct amd64_family_type {
 	const char *ctl_name;
 	u16 f0_id, f1_id, f2_id, f6_id;
+	u8 max_num_controllers;
 	struct low_ops ops;
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/6] EDAC/amd64: Gather hardware information early
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 1/6] EDAC/amd64: Make struct amd64_family_type global Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 3/6] EDAC/amd64: Save max number of controllers to family type Ghannam, Yazen
@ 2019-10-18 15:31 ` Ghannam, Yazen
  2019-10-21  8:42   ` Borislav Petkov
  2019-10-18 15:31 ` [PATCH 5/6] EDAC/amd64: Check for memory before fully initializing an instance Ghannam, Yazen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Split out gathering hardware information from init_one_instance() into a
separate function get_hardware_info().

This is necessary so that the information can be cached earlier and used
to check if memory is populated and if ECC is enabled on a node.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com

rfc -> v1:
* Fixup after making struct amd64_family_type fam_type global.

 drivers/edac/amd64_edac.c | 72 +++++++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b9a712819c68..4410da7c3a25 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3416,33 +3416,16 @@ static void compute_num_umcs(void)
 	edac_dbg(1, "Number of UMCs: %x", num_umcs);
 }
 
-static int init_one_instance(unsigned int nid)
+static int get_hardware_info(struct amd64_pvt *pvt)
 {
-	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-	struct mem_ctl_info *mci = NULL;
-	struct edac_mc_layer layers[2];
-	struct amd64_pvt *pvt = NULL;
 	u16 pci_id1, pci_id2;
-	int err = 0, ret;
-
-	ret = -ENOMEM;
-	pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
-	if (!pvt)
-		goto err_ret;
-
-	pvt->mc_node_id	= nid;
-	pvt->F3 = F3;
-
-	ret = -EINVAL;
-	fam_type = per_family_init(pvt);
-	if (!fam_type)
-		goto err_free;
+	int ret = -EINVAL;
 
 	if (pvt->fam >= 0x17) {
 		pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL);
 		if (!pvt->umc) {
 			ret = -ENOMEM;
-			goto err_free;
+			goto err_ret;
 		}
 
 		pci_id1 = fam_type->f0_id;
@@ -3452,18 +3435,33 @@ static int init_one_instance(unsigned int nid)
 		pci_id2 = fam_type->f2_id;
 	}
 
-	err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
-	if (err)
+	ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
+	if (ret)
 		goto err_post_init;
 
 	read_mc_regs(pvt);
 
+	return 0;
+
+err_post_init:
+	if (pvt->fam >= 0x17)
+		kfree(pvt->umc);
+
+err_ret:
+	return ret;
+}
+
+static int init_one_instance(struct amd64_pvt *pvt)
+{
+	struct mem_ctl_info *mci = NULL;
+	struct edac_mc_layer layers[2];
+	int ret = -EINVAL;
+
 	/*
 	 * We need to determine how many memory channels there are. Then use
 	 * that information for calculating the size of the dynamic instance
 	 * tables in the 'mci' structure.
 	 */
-	ret = -EINVAL;
 	pvt->channel_count = pvt->ops->early_channel_count(pvt);
 	if (pvt->channel_count < 0)
 		goto err_siblings;
@@ -3488,7 +3486,7 @@ static int init_one_instance(unsigned int nid)
 		layers[1].size = 2;
 	layers[1].is_virt_csrow = false;
 
-	mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
+	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
 	if (!mci)
 		goto err_siblings;
 
@@ -3514,20 +3512,16 @@ static int init_one_instance(unsigned int nid)
 err_siblings:
 	free_mc_sibling_devs(pvt);
 
-err_post_init:
 	if (pvt->fam >= 0x17)
 		kfree(pvt->umc);
 
-err_free:
-	kfree(pvt);
-
-err_ret:
 	return ret;
 }
 
 static int probe_one_instance(unsigned int nid)
 {
 	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
+	struct amd64_pvt *pvt = NULL;
 	struct ecc_settings *s;
 	int ret;
 
@@ -3538,6 +3532,21 @@ static int probe_one_instance(unsigned int nid)
 
 	ecc_stngs[nid] = s;
 
+	pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
+	if (!pvt)
+		goto err_settings;
+
+	pvt->mc_node_id	= nid;
+	pvt->F3 = F3;
+
+	fam_type = per_family_init(pvt);
+	if (!fam_type)
+		goto err_enable;
+
+	ret = get_hardware_info(pvt);
+	if (ret < 0)
+		goto err_enable;
+
 	if (!ecc_enabled(F3, nid)) {
 		ret = 0;
 
@@ -3554,7 +3563,7 @@ static int probe_one_instance(unsigned int nid)
 			goto err_enable;
 	}
 
-	ret = init_one_instance(nid);
+	ret = init_one_instance(pvt);
 	if (ret < 0) {
 		amd64_err("Error probing instance: %d\n", nid);
 
@@ -3567,6 +3576,9 @@ static int probe_one_instance(unsigned int nid)
 	return ret;
 
 err_enable:
+	kfree(pvt);
+
+err_settings:
 	kfree(s);
 	ecc_stngs[nid] = NULL;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/6] EDAC/amd64: Use cached data when checking for ECC
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
                   ` (3 preceding siblings ...)
  2019-10-18 15:31 ` [PATCH 5/6] EDAC/amd64: Check for memory before fully initializing an instance Ghannam, Yazen
@ 2019-10-18 15:31 ` Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 6/6] EDAC/amd64: Set grain per DIMM Ghannam, Yazen
  2019-10-21 14:48 ` [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Borislav Petkov
  6 siblings, 0 replies; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

...now that the data is available earlier.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-10-Yazen.Ghannam@amd.com

rfc -> v1:
* No change.

 drivers/edac/amd64_edac.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 0fde5ad2fdcd..feb68c0a3217 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3200,31 +3200,27 @@ static const char *ecc_msg =
 	"'ecc_enable_override'.\n"
 	" (Note that use of the override may cause unknown side effects.)\n";
 
-static bool ecc_enabled(struct pci_dev *F3, u16 nid)
+static bool ecc_enabled(struct amd64_pvt *pvt)
 {
+	u16 nid = pvt->mc_node_id;
 	bool nb_mce_en = false;
 	u8 ecc_en = 0, i;
 	u32 value;
 
 	if (boot_cpu_data.x86 >= 0x17) {
 		u8 umc_en_mask = 0, ecc_en_mask = 0;
+		struct amd64_umc *umc;
 
 		for_each_umc(i) {
-			u32 base = get_umc_base(i);
+			umc = &pvt->umc[i];
 
 			/* Only check enabled UMCs. */
-			if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value))
-				continue;
-
-			if (!(value & UMC_SDP_INIT))
+			if (!(umc->sdp_ctrl & UMC_SDP_INIT))
 				continue;
 
 			umc_en_mask |= BIT(i);
 
-			if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value))
-				continue;
-
-			if (value & UMC_ECC_ENABLED)
+			if (umc->umc_cap_hi & UMC_ECC_ENABLED)
 				ecc_en_mask |= BIT(i);
 		}
 
@@ -3237,7 +3233,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 		/* Assume UMC MCA banks are enabled. */
 		nb_mce_en = true;
 	} else {
-		amd64_read_pci_cfg(F3, NBCFG, &value);
+		amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
 
 		ecc_en = !!(value & NBCFG_ECC_ENABLE);
 
@@ -3534,7 +3530,7 @@ static int probe_one_instance(unsigned int nid)
 	if (ret < 0)
 		goto err_enable;
 
-	if (!ecc_enabled(F3, nid)) {
+	if (!ecc_enabled(pvt)) {
 		ret = 0;
 
 		if (!ecc_enable_override)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/6] EDAC/amd64: Check for memory before fully initializing an instance
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
                   ` (2 preceding siblings ...)
  2019-10-18 15:31 ` [PATCH 2/6] EDAC/amd64: Gather hardware information early Ghannam, Yazen
@ 2019-10-18 15:31 ` Ghannam, Yazen
  2019-10-18 15:31 ` [PATCH 4/6] EDAC/amd64: Use cached data when checking for ECC Ghannam, Yazen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Return early before checking for ECC if the node does not have any
populated memory.

Free any cached hardware data before returning. Also, return 0 in this
case since this is not a failure. Other nodes may have memory and the
module should attempt to load an instance for them.

Move printing of hardware information to after the instance is
initialized, so that the information is only printed for nodes with
memory.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-11-Yazen.Ghannam@amd.com

rfc -> v1:
* Change message severity to "info".
  * Nodes without memory is a valid configuration. The user doesn't
    need to be warned.
* Drop "DRAM ECC disabled" from message.
  * The message is given when no memory was detected on a node.
  * The state of DRAM ECC is not checked here.

 drivers/edac/amd64_edac.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index feb68c0a3217..2a0a8be8f767 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2848,8 +2848,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 	edac_dbg(1, "  DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
 
 	determine_ecc_sym_sz(pvt);
-
-	dump_misc_regs(pvt);
 }
 
 /*
@@ -3501,6 +3499,19 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	return ret;
 }
 
+static bool instance_has_memory(struct amd64_pvt *pvt)
+{
+	bool cs_enabled = false;
+	int cs = 0, dct = 0;
+
+	for (dct = 0; dct < fam_type->max_num_controllers; dct++) {
+		for_each_chip_select(cs, dct, pvt)
+			cs_enabled |= csrow_enabled(cs, dct, pvt);
+	}
+
+	return cs_enabled;
+}
+
 static int probe_one_instance(unsigned int nid)
 {
 	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
@@ -3530,6 +3541,12 @@ static int probe_one_instance(unsigned int nid)
 	if (ret < 0)
 		goto err_enable;
 
+	ret = 0;
+	if (!instance_has_memory(pvt)) {
+		amd64_info("Node %d: No DIMMs detected.\n", nid);
+		goto err_enable;
+	}
+
 	if (!ecc_enabled(pvt)) {
 		ret = 0;
 
@@ -3556,6 +3573,8 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
+	dump_misc_regs(pvt);
+
 	return ret;
 
 err_enable:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/6] EDAC/amd64: Set grain per DIMM
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
                   ` (4 preceding siblings ...)
  2019-10-18 15:31 ` [PATCH 4/6] EDAC/amd64: Use cached data when checking for ECC Ghannam, Yazen
@ 2019-10-18 15:31 ` Ghannam, Yazen
  2019-10-21 14:48 ` [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Borislav Petkov
  6 siblings, 0 replies; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-18 15:31 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/724d6f97-61f2-94bd-3f4b-793a55b6ac15@amd.com

rfc -> v1:
* New patch.

 drivers/edac/amd64_edac.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2a0a8be8f767..b5c0accfefcf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2944,6 +2944,7 @@ static int init_csrows_df(struct mem_ctl_info *mci)
 			dimm->mtype = pvt->dram_type;
 			dimm->edac_mode = edac_mode;
 			dimm->dtype = dev_type;
+			dimm->grain = 64;
 		}
 	}
 
@@ -3020,6 +3021,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 			dimm = csrow->channels[j]->dimm;
 			dimm->mtype = pvt->dram_type;
 			dimm->edac_mode = edac_mode;
+			dimm->grain = 64;
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/6] EDAC/amd64: Gather hardware information early
  2019-10-18 15:31 ` [PATCH 2/6] EDAC/amd64: Gather hardware information early Ghannam, Yazen
@ 2019-10-21  8:42   ` Borislav Petkov
  0 siblings, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2019-10-21  8:42 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Fri, Oct 18, 2019 at 03:31:26PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Split out gathering hardware information from init_one_instance() into a
> separate function get_hardware_info().
> 
> This is necessary so that the information can be cached earlier and used
> to check if memory is populated and if ECC is enabled on a node.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com
> 
> rfc -> v1:
> * Fixup after making struct amd64_family_type fam_type global.
> 
>  drivers/edac/amd64_edac.c | 72 +++++++++++++++++++++++----------------
>  1 file changed, 42 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index b9a712819c68..4410da7c3a25 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3416,33 +3416,16 @@ static void compute_num_umcs(void)
>  	edac_dbg(1, "Number of UMCs: %x", num_umcs);
>  }
>  
> -static int init_one_instance(unsigned int nid)
> +static int get_hardware_info(struct amd64_pvt *pvt)
>  {
> -	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
> -	struct mem_ctl_info *mci = NULL;
> -	struct edac_mc_layer layers[2];
> -	struct amd64_pvt *pvt = NULL;
>  	u16 pci_id1, pci_id2;
> -	int err = 0, ret;
> -
> -	ret = -ENOMEM;
> -	pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
> -	if (!pvt)
> -		goto err_ret;
> -
> -	pvt->mc_node_id	= nid;
> -	pvt->F3 = F3;
> -
> -	ret = -EINVAL;
> -	fam_type = per_family_init(pvt);
> -	if (!fam_type)
> -		goto err_free;
> +	int ret = -EINVAL;
>  
>  	if (pvt->fam >= 0x17) {
>  		pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL);
>  		if (!pvt->umc) {
>  			ret = -ENOMEM;
> -			goto err_free;
> +			goto err_ret;
>  		}
>  
>  		pci_id1 = fam_type->f0_id;
> @@ -3452,18 +3435,33 @@ static int init_one_instance(unsigned int nid)
>  		pci_id2 = fam_type->f2_id;
>  	}
>  
> -	err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
> -	if (err)
> +	ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
> +	if (ret)
>  		goto err_post_init;
>  
>  	read_mc_regs(pvt);
>  
> +	return 0;
> +
> +err_post_init:
> +	if (pvt->fam >= 0x17)
> +		kfree(pvt->umc);

So you're freeing pvt->umc here but nothing in that function allocated
it. get_hardware_info() in probe_one_instance() did but if you do it
this way, it is kinda hard to follow and the layering is a bit iffy.

So what I'd suggest is:

* Rename get_hardware_info() to something like hw_info_get() so that
you can have a counterpart hw_info_put() which does any cleanup after
hw_info_get(), including the freeing of the ->umc.

* In probe_one_instance(), if init_one_instance() fails, call
hw_info_put() on the error path so that all your flow in the probe/init
functions is nicely ballanced and easily followed.

Makes sense?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/6] EDAC/amd64: Save max number of controllers to family type
  2019-10-18 15:31 ` [PATCH 3/6] EDAC/amd64: Save max number of controllers to family type Ghannam, Yazen
@ 2019-10-21 14:40   ` Borislav Petkov
  0 siblings, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2019-10-21 14:40 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Fri, Oct 18, 2019 at 03:31:26PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> The maximum number of memory controllers is fixed within a family/model
> group. In most cases, this has been fixed at 2, but some systems may
> have up to 8.
> 
> The struct amd64_family_type already contains family/model-specific
> information, and this can be used rather than adding model checks to
> various functions.
> 
> Create a new field in struct amd64_family_type for max_num_controllers.
> Set this when setting other family type information, and use this when
> needing the maximum number of memory controllers possible for a system.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com

...

> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 8c3cda81e619..0d5a9bc4d6de 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -479,6 +479,7 @@ struct low_ops {
>  struct amd64_family_type {
>  	const char *ctl_name;
>  	u16 f0_id, f1_id, f2_id, f6_id;
> +	u8 max_num_controllers;

Sure but call that max_mcs or so, so that the code which mentions it,
doesn't stick out too much. You can put a comment above it here to
explain what it is:

	/* Maximum number of memory controllers per node */
	u8 max_mcs;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.
  2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
                   ` (5 preceding siblings ...)
  2019-10-18 15:31 ` [PATCH 6/6] EDAC/amd64: Set grain per DIMM Ghannam, Yazen
@ 2019-10-21 14:48 ` Borislav Petkov
  2019-10-21 16:40   ` Ghannam, Yazen
  6 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2019-10-21 14:48 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Fri, Oct 18, 2019 at 03:31:25PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Hi Boris,
> 
> This set contains the next revision of the RFC patches I included with
> the last AMD64 EDAC updates. I dropped the RFC tags, and I added a
> couple of new patches.

Yah, looks pretty much good, modulo the minor things I commented on
earlier.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.
  2019-10-21 14:48 ` [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Borislav Petkov
@ 2019-10-21 16:40   ` Ghannam, Yazen
  0 siblings, 0 replies; 11+ messages in thread
From: Ghannam, Yazen @ 2019-10-21 16:40 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, October 21, 2019 10:48 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.
> 
> On Fri, Oct 18, 2019 at 03:31:25PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > Hi Boris,
> >
> > This set contains the next revision of the RFC patches I included with
> > the last AMD64 EDAC updates. I dropped the RFC tags, and I added a
> > couple of new patches.
> 
> Yah, looks pretty much good, modulo the minor things I commented on
> earlier.
> 

Thank you. I'll send another revision soon.

-Yazen

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-10-21 16:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-18 15:31 [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Ghannam, Yazen
2019-10-18 15:31 ` [PATCH 1/6] EDAC/amd64: Make struct amd64_family_type global Ghannam, Yazen
2019-10-18 15:31 ` [PATCH 3/6] EDAC/amd64: Save max number of controllers to family type Ghannam, Yazen
2019-10-21 14:40   ` Borislav Petkov
2019-10-18 15:31 ` [PATCH 2/6] EDAC/amd64: Gather hardware information early Ghannam, Yazen
2019-10-21  8:42   ` Borislav Petkov
2019-10-18 15:31 ` [PATCH 5/6] EDAC/amd64: Check for memory before fully initializing an instance Ghannam, Yazen
2019-10-18 15:31 ` [PATCH 4/6] EDAC/amd64: Use cached data when checking for ECC Ghannam, Yazen
2019-10-18 15:31 ` [PATCH 6/6] EDAC/amd64: Set grain per DIMM Ghannam, Yazen
2019-10-21 14:48 ` [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc Borislav Petkov
2019-10-21 16:40   ` Ghannam, Yazen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).