All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-21 19:13 ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-21 19:13 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

It's possible that a system can be used without any DRAM populated on
one or more physical Dies on multi-die systems. Firmware will not
enable DRAM ECC on Dies without DRAM. Users will then see a message
about DRAM ECC disabled on those nodes without DRAM. However, DRAM ECC
may, in fact, be enabled on the other Dies that have DRAM.

Only print ECC enabled/disabled information for nodes that have at least
one enabled memory channel. A memory channel that is unused, i.e. has no
DRAM, should be seen as disabled. DRAM ECC information is not relevant
on nodes without DRAM.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 329cb96f886f..af0ce9aa8d24 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3035,6 +3035,7 @@ static const char *ecc_msg =
 static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 {
 	bool nb_mce_en = false;
+	bool mc_en = true;
 	u8 ecc_en = 0, i;
 	u32 value;
 
@@ -3060,6 +3061,8 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 				ecc_en_mask |= BIT(i);
 		}
 
+		mc_en = !!umc_en_mask;
+
 		/* Check whether at least one UMC is enabled: */
 		if (umc_en_mask)
 			ecc_en = umc_en_mask == ecc_en_mask;
@@ -3079,14 +3082,19 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 				     MSR_IA32_MCG_CTL, nid);
 	}
 
-	amd64_info("Node %d: DRAM ECC %s.\n",
-		   nid, (ecc_en ? "enabled" : "disabled"));
+	/*
+	 * Only print ECC enabled/disabled messages for nodes with enabled
+	 * memory controllers.
+	 */
+	if (mc_en) {
+		amd64_info("Node %d: DRAM ECC %s.\n",
+			   nid, (ecc_en ? "enabled" : "disabled"));
 
-	if (!ecc_en || !nb_mce_en) {
-		amd64_info("%s", ecc_msg);
-		return false;
+		if (!ecc_en || !nb_mce_en)
+			amd64_info("%s", ecc_msg);
 	}
-	return true;
+
+	return ecc_en && nb_mce_en;
 }
 
 static inline void
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-21 19:13 ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-21 19:13 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

It's possible that a system can be used without any DRAM populated on
one or more physical Dies on multi-die systems. Firmware will not
enable DRAM ECC on Dies without DRAM. Users will then see a message
about DRAM ECC disabled on those nodes without DRAM. However, DRAM ECC
may, in fact, be enabled on the other Dies that have DRAM.

Only print ECC enabled/disabled information for nodes that have at least
one enabled memory channel. A memory channel that is unused, i.e. has no
DRAM, should be seen as disabled. DRAM ECC information is not relevant
on nodes without DRAM.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 329cb96f886f..af0ce9aa8d24 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3035,6 +3035,7 @@ static const char *ecc_msg =
 static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 {
 	bool nb_mce_en = false;
+	bool mc_en = true;
 	u8 ecc_en = 0, i;
 	u32 value;
 
@@ -3060,6 +3061,8 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 				ecc_en_mask |= BIT(i);
 		}
 
+		mc_en = !!umc_en_mask;
+
 		/* Check whether at least one UMC is enabled: */
 		if (umc_en_mask)
 			ecc_en = umc_en_mask == ecc_en_mask;
@@ -3079,14 +3082,19 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 				     MSR_IA32_MCG_CTL, nid);
 	}
 
-	amd64_info("Node %d: DRAM ECC %s.\n",
-		   nid, (ecc_en ? "enabled" : "disabled"));
+	/*
+	 * Only print ECC enabled/disabled messages for nodes with enabled
+	 * memory controllers.
+	 */
+	if (mc_en) {
+		amd64_info("Node %d: DRAM ECC %s.\n",
+			   nid, (ecc_en ? "enabled" : "disabled"));
 
-	if (!ecc_en || !nb_mce_en) {
-		amd64_info("%s", ecc_msg);
-		return false;
+		if (!ecc_en || !nb_mce_en)
+			amd64_info("%s", ecc_msg);
 	}
-	return true;
+
+	return ecc_en && nb_mce_en;
 }
 
 static inline void

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] EDAC/amd64: Only remove instances that exist
@ 2018-03-21 19:13   ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-21 19:13 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

An instance may have failed probing because the probed node did not have
DRAM installed. When the module is unloaded we'll get a WARNing when
we try to remove a non-existent instance.

Save a bitmask of enabled instances with a bit for each node.

Only try to remove an instance if it was successfully enabled in the
first place.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index af0ce9aa8d24..054086b19c6c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -13,6 +13,8 @@ module_param(report_gart_errors, int, 0644);
 static int ecc_enable_override;
 module_param(ecc_enable_override, int, 0644);
 
+static unsigned long int enabled_instances;
+
 static struct msr __percpu *msrs;
 
 /* Per-node stuff */
@@ -3366,6 +3368,7 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
+	set_bit(nid, &enabled_instances);
 	return ret;
 
 err_enable:
@@ -3383,6 +3386,9 @@ static void remove_one_instance(unsigned int nid)
 	struct mem_ctl_info *mci;
 	struct amd64_pvt *pvt;
 
+	if (!test_bit(nid, &enabled_instances))
+		return;
+
 	mci = find_mci_by_dev(&F3->dev);
 	WARN_ON(!mci);
 
@@ -3405,6 +3411,8 @@ static void remove_one_instance(unsigned int nid)
 
 	kfree(pvt);
 	edac_mc_free(mci);
+
+	clear_bit(nid, &enabled_instances);
 }
 
 static void setup_pci_device(void)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [2/3] EDAC/amd64: Only remove instances that exist
@ 2018-03-21 19:13   ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-21 19:13 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

An instance may have failed probing because the probed node did not have
DRAM installed. When the module is unloaded we'll get a WARNing when
we try to remove a non-existent instance.

Save a bitmask of enabled instances with a bit for each node.

Only try to remove an instance if it was successfully enabled in the
first place.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index af0ce9aa8d24..054086b19c6c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -13,6 +13,8 @@ module_param(report_gart_errors, int, 0644);
 static int ecc_enable_override;
 module_param(ecc_enable_override, int, 0644);
 
+static unsigned long int enabled_instances;
+
 static struct msr __percpu *msrs;
 
 /* Per-node stuff */
@@ -3366,6 +3368,7 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
+	set_bit(nid, &enabled_instances);
 	return ret;
 
 err_enable:
@@ -3383,6 +3386,9 @@ static void remove_one_instance(unsigned int nid)
 	struct mem_ctl_info *mci;
 	struct amd64_pvt *pvt;
 
+	if (!test_bit(nid, &enabled_instances))
+		return;
+
 	mci = find_mci_by_dev(&F3->dev);
 	WARN_ON(!mci);
 
@@ -3405,6 +3411,8 @@ static void remove_one_instance(unsigned int nid)
 
 	kfree(pvt);
 	edac_mc_free(mci);
+
+	clear_bit(nid, &enabled_instances);
 }
 
 static void setup_pci_device(void)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] EDAC/amd64: Add DIMM device type for Fam17h
@ 2018-03-21 19:13   ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-21 19:13 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Set the DIMM device type for Fam17h.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 17 ++++++++++++++---
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 054086b19c6c..59d0085e4ffa 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -998,7 +998,7 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
 
 static void determine_memory_type(struct amd64_pvt *pvt)
 {
-	u32 dram_ctrl, dcsm;
+	u32 dram_ctrl, dcsm, dimm_cfg;
 
 	switch (pvt->fam) {
 	case 0xf:
@@ -1046,12 +1046,22 @@ static void determine_memory_type(struct amd64_pvt *pvt)
 		goto ddr3;
 
 	case 0x17:
-		if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
+		dimm_cfg = pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg;
+
+		if (dimm_cfg & BIT(5))
 			pvt->dram_type = MEM_LRDDR4;
-		else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
+		else if (dimm_cfg & BIT(4))
 			pvt->dram_type = MEM_RDDR4;
 		else
 			pvt->dram_type = MEM_DDR4;
+
+		if (dimm_cfg & BIT(7))
+			pvt->dev_type = DEV_X16;
+		else if (dimm_cfg & BIT(6))
+			pvt->dev_type = DEV_X4;
+		else
+			pvt->dev_type = DEV_UNKNOWN;
+
 		return;
 
 	default:
@@ -2855,6 +2865,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 		for (j = 0; j < pvt->channel_count; j++) {
 			dimm = csrow->channels[j]->dimm;
 			dimm->mtype = pvt->dram_type;
+			dimm->dtype = pvt->dev_type;
 			dimm->edac_mode = edac_mode;
 		}
 	}
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 1d4b74e9a037..47696d3ce487 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -368,6 +368,7 @@ struct amd64_pvt {
 
 	/* cache the dram_type */
 	enum mem_type dram_type;
+	enum dev_type dev_type;
 
 	struct amd64_umc *umc;	/* UMC registers */
 };
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [3/3] EDAC/amd64: Add DIMM device type for Fam17h
@ 2018-03-21 19:13   ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-21 19:13 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp

From: Yazen Ghannam <yazen.ghannam@amd.com>

Set the DIMM device type for Fam17h.

Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 17 ++++++++++++++---
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 054086b19c6c..59d0085e4ffa 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -998,7 +998,7 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
 
 static void determine_memory_type(struct amd64_pvt *pvt)
 {
-	u32 dram_ctrl, dcsm;
+	u32 dram_ctrl, dcsm, dimm_cfg;
 
 	switch (pvt->fam) {
 	case 0xf:
@@ -1046,12 +1046,22 @@ static void determine_memory_type(struct amd64_pvt *pvt)
 		goto ddr3;
 
 	case 0x17:
-		if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
+		dimm_cfg = pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg;
+
+		if (dimm_cfg & BIT(5))
 			pvt->dram_type = MEM_LRDDR4;
-		else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
+		else if (dimm_cfg & BIT(4))
 			pvt->dram_type = MEM_RDDR4;
 		else
 			pvt->dram_type = MEM_DDR4;
+
+		if (dimm_cfg & BIT(7))
+			pvt->dev_type = DEV_X16;
+		else if (dimm_cfg & BIT(6))
+			pvt->dev_type = DEV_X4;
+		else
+			pvt->dev_type = DEV_UNKNOWN;
+
 		return;
 
 	default:
@@ -2855,6 +2865,7 @@ static int init_csrows(struct mem_ctl_info *mci)
 		for (j = 0; j < pvt->channel_count; j++) {
 			dimm = csrow->channels[j]->dimm;
 			dimm->mtype = pvt->dram_type;
+			dimm->dtype = pvt->dev_type;
 			dimm->edac_mode = edac_mode;
 		}
 	}
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 1d4b74e9a037..47696d3ce487 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -368,6 +368,7 @@ struct amd64_pvt {
 
 	/* cache the dram_type */
 	enum mem_type dram_type;
+	enum dev_type dev_type;
 
 	struct amd64_umc *umc;	/* UMC registers */
 };

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 13:00   ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-03-28 13:00 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, linux-kernel

On Wed, Mar 21, 2018 at 02:13:33PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> It's possible that a system can be used without any DRAM populated on
> one or more physical Dies on multi-die systems. Firmware will not
> enable DRAM ECC on Dies without DRAM. Users will then see a message
> about DRAM ECC disabled on those nodes without DRAM. However, DRAM ECC
> may, in fact, be enabled on the other Dies that have DRAM.
> 
> Only print ECC enabled/disabled information for nodes that have at least
> one enabled memory channel.

So if the only reason for this is make the error messages more precise,
then let's not make it uglier than it is.

The right way to do it would be to push those checks down to
debug_display_dimm_sizes* which looks at the CS rows and the chip select
enable bits and there to differentiate between

* memory controller doesn't have DIMMs

and

* memory controller has DIMMs but ECC is disabled in the BIOS

and then print the respective informative error message. But not with a
yet another boolean which kinda takes care of F17h only and leaves the
old families as they were.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 13:00   ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-03-28 13:00 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, linux-kernel

On Wed, Mar 21, 2018 at 02:13:33PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> It's possible that a system can be used without any DRAM populated on
> one or more physical Dies on multi-die systems. Firmware will not
> enable DRAM ECC on Dies without DRAM. Users will then see a message
> about DRAM ECC disabled on those nodes without DRAM. However, DRAM ECC
> may, in fact, be enabled on the other Dies that have DRAM.
> 
> Only print ECC enabled/disabled information for nodes that have at least
> one enabled memory channel.

So if the only reason for this is make the error messages more precise,
then let's not make it uglier than it is.

The right way to do it would be to push those checks down to
debug_display_dimm_sizes* which looks at the CS rows and the chip select
enable bits and there to differentiate between

* memory controller doesn't have DIMMs

and

* memory controller has DIMMs but ECC is disabled in the BIOS

and then print the respective informative error message. But not with a
yet another boolean which kinda takes care of F17h only and leaves the
old families as they were.

Thx.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 14:38     ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Ghannam, Yazen @ 2018-03-28 14:38 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Wednesday, March 28, 2018 9:00 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes
> with enabled MCs
> 
> On Wed, Mar 21, 2018 at 02:13:33PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > It's possible that a system can be used without any DRAM populated on
> > one or more physical Dies on multi-die systems. Firmware will not
> > enable DRAM ECC on Dies without DRAM. Users will then see a message
> > about DRAM ECC disabled on those nodes without DRAM. However, DRAM
> ECC
> > may, in fact, be enabled on the other Dies that have DRAM.
> >
> > Only print ECC enabled/disabled information for nodes that have at least
> > one enabled memory channel.
> 
> So if the only reason for this is make the error messages more precise,
> then let's not make it uglier than it is.
> 
> The right way to do it would be to push those checks down to
> debug_display_dimm_sizes* which looks at the CS rows and the chip select
> enable bits and there to differentiate between
> 
> * memory controller doesn't have DIMMs
> 
> and
> 
> * memory controller has DIMMs but ECC is disabled in the BIOS
> 
> and then print the respective informative error message. But not with a
> yet another boolean which kinda takes care of F17h only and leaves the
> old families as they were.
> 

In either of those cases we won't get to debug_display_dimm_sizes*
because we won't initialize the instance.

We have a message for "memory controller doesn't have DIMMs".
edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);

We can change the wording of this to make it more clear. Also, we can
make this a pr_info and return from here.

So something like this:

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 329cb96f886f..6e211ed6d419 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3061,10 +3061,12 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
                }
 
                /* Check whether at least one UMC is enabled: */
-               if (umc_en_mask)
+               if (umc_en_mask) {
                        ecc_en = umc_en_mask == ecc_en_mask;
-               else
-                       edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);
+               } else {
+                       amd64_info("Node %d: No DIMMs populated on memory controller.\n", nid);
+                       return false;
+               }
 
                /* Assume UMC MCA banks are enabled. */
                nb_mce_en = true;

This would work for Fam17h. For older systems I think we can look at
D18F2x94_dct[1:0][DisDramInterface]

Or maybe we have a separate function to check for enabled memory controllers
before we check for ECC?

Thanks,
Yazen

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 14:38     ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-28 14:38 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Wednesday, March 28, 2018 9:00 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes
> with enabled MCs
> 
> On Wed, Mar 21, 2018 at 02:13:33PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > It's possible that a system can be used without any DRAM populated on
> > one or more physical Dies on multi-die systems. Firmware will not
> > enable DRAM ECC on Dies without DRAM. Users will then see a message
> > about DRAM ECC disabled on those nodes without DRAM. However, DRAM
> ECC
> > may, in fact, be enabled on the other Dies that have DRAM.
> >
> > Only print ECC enabled/disabled information for nodes that have at least
> > one enabled memory channel.
> 
> So if the only reason for this is make the error messages more precise,
> then let's not make it uglier than it is.
> 
> The right way to do it would be to push those checks down to
> debug_display_dimm_sizes* which looks at the CS rows and the chip select
> enable bits and there to differentiate between
> 
> * memory controller doesn't have DIMMs
> 
> and
> 
> * memory controller has DIMMs but ECC is disabled in the BIOS
> 
> and then print the respective informative error message. But not with a
> yet another boolean which kinda takes care of F17h only and leaves the
> old families as they were.
> 

In either of those cases we won't get to debug_display_dimm_sizes*
because we won't initialize the instance.

We have a message for "memory controller doesn't have DIMMs".
edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);

We can change the wording of this to make it more clear. Also, we can
make this a pr_info and return from here.

So something like this:


This would work for Fam17h. For older systems I think we can look at
D18F2x94_dct[1:0][DisDramInterface]

Or maybe we have a separate function to check for enabled memory controllers
before we check for ECC?

Thanks,
Yazen

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 329cb96f886f..6e211ed6d419 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3061,10 +3061,12 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
                }
 
                /* Check whether at least one UMC is enabled: */
-               if (umc_en_mask)
+               if (umc_en_mask) {
                        ecc_en = umc_en_mask == ecc_en_mask;
-               else
-                       edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);
+               } else {
+                       amd64_info("Node %d: No DIMMs populated on memory controller.\n", nid);
+                       return false;
+               }
 
                /* Assume UMC MCA banks are enabled. */
                nb_mce_en = true;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 15:43       ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-03-28 15:43 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Wed, Mar 28, 2018 at 02:38:11PM +0000, Ghannam, Yazen wrote:
> In either of those cases we won't get to debug_display_dimm_sizes*
> because we won't initialize the instance.

So you move that code which accesses csrows up so that it has the
required information to query DIMM state/presence.

>                 /* Assume UMC MCA banks are enabled. */
>                 nb_mce_en = true;

Also, I don't like that assumption.

> This would work for Fam17h. For older systems I think we can look at
> D18F2x94_dct[1:0][DisDramInterface]
> 
> Or maybe we have a separate function to check for enabled memory controllers
> before we check for ECC?

The best would be to have a function which checks whether DIMMs are
present on the node and act accordingly. You can pull up some of the
work of caching registers which are used in debug_display_dimm_sizes*
and use that info to get the required DIMM state upfront.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 15:43       ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-03-28 15:43 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel

On Wed, Mar 28, 2018 at 02:38:11PM +0000, Ghannam, Yazen wrote:
> In either of those cases we won't get to debug_display_dimm_sizes*
> because we won't initialize the instance.

So you move that code which accesses csrows up so that it has the
required information to query DIMM state/presence.

>                 /* Assume UMC MCA banks are enabled. */
>                 nb_mce_en = true;

Also, I don't like that assumption.

> This would work for Fam17h. For older systems I think we can look at
> D18F2x94_dct[1:0][DisDramInterface]
> 
> Or maybe we have a separate function to check for enabled memory controllers
> before we check for ECC?

The best would be to have a function which checks whether DIMMs are
present on the node and act accordingly. You can pull up some of the
work of caching registers which are used in debug_display_dimm_sizes*
and use that info to get the required DIMM state upfront.

Thx.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 15:59         ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Ghannam, Yazen @ 2018-03-28 15:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Wednesday, March 28, 2018 11:44 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes
> with enabled MCs
> 
> On Wed, Mar 28, 2018 at 02:38:11PM +0000, Ghannam, Yazen wrote:
> > In either of those cases we won't get to debug_display_dimm_sizes*
> > because we won't initialize the instance.
> 
> So you move that code which accesses csrows up so that it has the
> required information to query DIMM state/presence.
> 

I don't think we need to look at the csrows. We just need to check if the
controller is enabled or not. A controller is enabled if there is a DIMM
behind it and disabled if there isn't. 

> >                 /* Assume UMC MCA banks are enabled. */
> >                 nb_mce_en = true;
> 
> Also, I don't like that assumption.
> 

I'll look into it.

> > This would work for Fam17h. For older systems I think we can look at
> > D18F2x94_dct[1:0][DisDramInterface]
> >
> > Or maybe we have a separate function to check for enabled memory
> controllers
> > before we check for ECC?
> 
> The best would be to have a function which checks whether DIMMs are
> present on the node and act accordingly. You can pull up some of the
> work of caching registers which are used in debug_display_dimm_sizes*
> and use that info to get the required DIMM state upfront.
> 

Okay, so I'll do a separate function.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
@ 2018-03-28 15:59         ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2018-03-28 15:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Wednesday, March 28, 2018 11:44 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes
> with enabled MCs
> 
> On Wed, Mar 28, 2018 at 02:38:11PM +0000, Ghannam, Yazen wrote:
> > In either of those cases we won't get to debug_display_dimm_sizes*
> > because we won't initialize the instance.
> 
> So you move that code which accesses csrows up so that it has the
> required information to query DIMM state/presence.
> 

I don't think we need to look at the csrows. We just need to check if the
controller is enabled or not. A controller is enabled if there is a DIMM
behind it and disabled if there isn't. 

> >                 /* Assume UMC MCA banks are enabled. */
> >                 nb_mce_en = true;
> 
> Also, I don't like that assumption.
> 

I'll look into it.

> > This would work for Fam17h. For older systems I think we can look at
> > D18F2x94_dct[1:0][DisDramInterface]
> >
> > Or maybe we have a separate function to check for enabled memory
> controllers
> > before we check for ECC?
> 
> The best would be to have a function which checks whether DIMMs are
> present on the node and act accordingly. You can pull up some of the
> work of caching registers which are used in debug_display_dimm_sizes*
> and use that info to get the required DIMM state upfront.
> 

Okay, so I'll do a separate function.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-03-28 15:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-21 19:13 [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs Yazen Ghannam
2018-03-21 19:13 ` [1/3] " Yazen Ghannam
2018-03-21 19:13 ` [PATCH 2/3] EDAC/amd64: Only remove instances that exist Yazen Ghannam
2018-03-21 19:13   ` [2/3] " Yazen Ghannam
2018-03-21 19:13 ` [PATCH 3/3] EDAC/amd64: Add DIMM device type for Fam17h Yazen Ghannam
2018-03-21 19:13   ` [3/3] " Yazen Ghannam
2018-03-28 13:00 ` [PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs Borislav Petkov
2018-03-28 13:00   ` [1/3] " Borislav Petkov
2018-03-28 14:38   ` [PATCH 1/3] " Ghannam, Yazen
2018-03-28 14:38     ` [1/3] " Yazen Ghannam
2018-03-28 15:43     ` [PATCH 1/3] " Borislav Petkov
2018-03-28 15:43       ` [1/3] " Borislav Petkov
2018-03-28 15:59       ` [PATCH 1/3] " Ghannam, Yazen
2018-03-28 15:59         ` [1/3] " Yazen Ghannam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.