* [v2,1/1] EDAC, sb_edac: Don't create EDAC second memory controller if the HA1 is not present
From: Qiuxu Zhuo @ 2017-09-13 10:42 UTC
  To: bp, mchehab; +Cc: tony.luck, arozansk, patrickg, yizhan, linux-edac, Qiuxu Zhuo

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603 v3) server
(DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the EDAC IMC1 (the 2nd memory controller)
if any IMC1 device is present. In this case only HA1_TA of IMC1 was present,
but the EDAC driver also expected to find the HA1/HA1_TM/HA1_TAD[0-3] devices,
so the driver reported the above failure.

According to [1], the 'E5-2603 v3' CPU has a maximum of 4 memory channels. Yi Zhang
installed one DIMM per channel for each CPU and ran a random error address injection
test with this patch applied:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all 4 channels belong to IMC0 on each CPU, so the server
really has only one IMC per CPU.

The 1st page of chapter 2 in datasheet [2] also says that 'E5-2600 v3' implements
either one IMC or two. For a CPU with one IMC, IMC1 is not used and should be ignored.

Thus, it's reasonable not to create the EDAC 2nd memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Fixes: e2f747b1f42a ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
---
v1->v2 changelog:
 Add a 'Fixes:' tag at the end of commit message.

 drivers/edac/sb_edac.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 80d860cb0746..7a3b201d51df 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -455,6 +455,7 @@ static const struct pci_id_table pci_dev_descr_sbridge_table[] = {
 static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 		/* Processor Home Agent */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0,        0, IMC0) },
+	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 
 		/* Memory controller */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA,     0, IMC0) },
@@ -465,7 +466,6 @@ static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3,   0, IMC0) },
 
 		/* Optional, mode 2HA */
-	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA,     1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS,    1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0,   1, IMC1) },
@@ -2260,6 +2260,13 @@ static int sbridge_get_onedevice(struct pci_dev **prev,
 next_imc:
 	sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
 	if (!sbridge_dev) {
+		/* If the HA1 wasn't found, don't create EDAC second memory controller */
+		if (dev_descr->dom == IMC1 && devno != 1) {
+			edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+				 PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+			pci_dev_put(pdev);
+			return 0;
+		}
 
 		if (dev_descr->dom == SOCK)
 			goto out_imc;


* [v2,1/1] EDAC, sb_edac: Don't create EDAC second memory controller if the HA1 is not present
From: Borislav Petkov @ 2017-09-27 10:21 UTC
  To: Qiuxu Zhuo; +Cc: mchehab, tony.luck, arozansk, patrickg, yizhan, linux-edac

On Wed, Sep 13, 2017 at 06:42:14PM +0800, Qiuxu Zhuo wrote:
> Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603 v3) server
> (DELL PowerEdge 730xd):
> 
>   EDAC sbridge: Some needed devices are missing
>   EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
>   EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
>   EDAC sbridge: Couldn't find mci handler
>   EDAC sbridge: Couldn't find mci handler
>   EDAC sbridge: Failed to register device with error -19.
> 
> The refactored sb_edac driver creates the EDAC IMC1 (the 2nd memory controller)
> if any IMC1 device is present. In this case only HA1_TA of IMC1 was present,
> but the EDAC driver also expected to find the HA1/HA1_TM/HA1_TAD[0-3] devices,
> so the driver reported the above failure.
> 
> According to [1], the 'E5-2603 v3' CPU has a maximum of 4 memory channels. Yi Zhang
> installed one DIMM per channel for each CPU and ran a random error address injection
> test with this patch applied:
> 
>       4024  addresses fell in TOLM hole area
>      12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
>      12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
>      12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
>      12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
>      12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
>      12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
>      12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
>      12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
>     106400  addresses were injected totally.
> 
> The test result shows that all 4 channels belong to IMC0 on each CPU, so the server
> really has only one IMC per CPU.
> 
> The 1st page of chapter 2 in datasheet [2] also says that 'E5-2600 v3' implements
> either one IMC or two. For a CPU with one IMC, IMC1 is not used and should be ignored.
> 
> Thus, it's reasonable not to create the EDAC 2nd memory controller if the key HA1 is absent.
> 
> [1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
> [2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
> 
> Fixes: e2f747b1f42a ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
> 
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
> ---
> v1->v2 changelog:
>  Add a 'Fixes:' tag at the end of commit message.
> 
>  drivers/edac/sb_edac.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Applied, thanks.
