linux-edac.vger.kernel.org archive mirror
* ie31200_edac missing PCI ID for i3-4370
@ 2021-02-01  0:07 Paul Marks
  2021-02-04 22:59 ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Marks @ 2021-02-01  0:07 UTC (permalink / raw)
  To: linux-edac; +Cc: jbaron, bp, m.chehab

I have an ASRock C226M WS with an i3-4370 CPU.

# lspci -vnn
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
            DRAM Controller [8086:0c00] (rev 06)
        Subsystem: ASRock Incorporation 4th Gen Core Processor
            DRAM Controller [1849:0c00]
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>
        Kernel driver in use: hsw_uncore

But edac-util doesn't work:

# edac-util -v
edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs

I tried this ham-fisted patch:

# diff -u ./drivers/edac/ie31200_edac.c{.old,}
--- ./drivers/edac/ie31200_edac.c.old
+++ ./drivers/edac/ie31200_edac.c
@@ -58,7 +58,7 @@
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
-#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
+#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918

And it seems happy now:

# lspci -vnn
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
            DRAM Controller [8086:0c00] (rev 06)
        Subsystem: ASRock Incorporation 4th Gen Core Processor
            DRAM Controller [1849:0c00]
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>
        Kernel driver in use: hsw_uncore
        Kernel modules: ie31200_edac

# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
edac-util: No errors to report.

I don't know if it's truly working because I can't overclock the RAM
to induce ECC errors, but still I think adding 8086:0c00 to this
driver could be useful.
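
The binding failure described above comes down to a table lookup: a driver attaches only to devices whose (vendor, device) pair appears in its ID table. The following is a minimal user-space sketch of that idea, not the kernel's actual matching code. The table holds only the IDs visible in the diff above, and the names `known_ids`, `patched_ids`, and `id_table_match` are hypothetical. The second table illustrates an additive fix that keeps 0x0c04 and appends 0x0c00.

```c
#include <stddef.h>
#include <stdint.h>

#define VENDOR_INTEL 0x8086
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Device IDs visible in the diff above (the real driver lists more). */
static const uint16_t known_ids[] = {
    0x0150, 0x0158, 0x015c, 0x0c04, 0x0c08, 0x1918, 0x5918,
};

/* Hypothetical additive fix: every existing ID is kept, 0x0c00 is appended. */
static const uint16_t patched_ids[] = {
    0x0150, 0x0158, 0x015c, 0x0c04, 0x0c08, 0x1918, 0x5918,
    0x0c00,
};

/* Return 1 if (vendor, device) is listed in the table, otherwise 0. */
int id_table_match(const uint16_t *table, size_t n,
                   uint16_t vendor, uint16_t device)
{
    if (vendor != VENDOR_INTEL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i] == device)
            return 1;
    }
    return 0;
}
```

With `known_ids`, a lookup for 8086:0c00 fails, which is consistent with ie31200_edac not loading on this board. With `patched_ids`, both 0x0c00 and 0x0c04 match, so existing hardware keeps working.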


* Re: ie31200_edac missing PCI ID for i3-4370
  2021-02-01  0:07 ie31200_edac missing PCI ID for i3-4370 Paul Marks
@ 2021-02-04 22:59 ` Jason Baron
  2021-02-04 23:22   ` Paul Marks
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2021-02-04 22:59 UTC (permalink / raw)
  To: Paul Marks, linux-edac; +Cc: bp, m.chehab



On 1/31/21 7:07 PM, Paul Marks wrote:
> I have an ASRock C226M WS with an i3-4370 CPU.
> 
> # lspci -vnn
> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>             DRAM Controller [8086:0c00] (rev 06)
>         Subsystem: ASRock Incorporation 4th Gen Core Processor
>             DRAM Controller [1849:0c00]
>         Flags: bus master, fast devsel, latency 0
>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>         Kernel driver in use: hsw_uncore
> 
> But edac-util doesn't work:
> 
> # edac-util -v
> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
> 
> I tried this ham-fisted patch:
> 
> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> --- ./drivers/edac/ie31200_edac.c.old
> +++ ./drivers/edac/ie31200_edac.c
> @@ -58,7 +58,7 @@
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918

Just curious: why did you replace the existing ID here instead of adding a new one?

> 
> And it seems happy now:
> 
> # lspci -vnn
> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>             DRAM Controller [8086:0c00] (rev 06)
>         Subsystem: ASRock Incorporation 4th Gen Core Processor
>             DRAM Controller [1849:0c00]
>         Flags: bus master, fast devsel, latency 0
>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>         Kernel driver in use: hsw_uncore
>         Kernel modules: ie31200_edac
> 
> # edac-util -v
> mc0: 0 Uncorrected Errors with no DIMM info
> mc0: 0 Corrected Errors with no DIMM info
> mc0: csrow0: 0 Uncorrected Errors
> mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
> mc0: csrow1: 0 Uncorrected Errors
> mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
> edac-util: No errors to report.
> 
> I don't know if it's truly working because I can't overclock the RAM
> to induce ECC errors, but still I think adding 8086:0c00 to this
> driver could be useful.
> 

Cool yeah - I think it makes sense to add if you can confirm
that the Intel datasheet says that this CPU uses the same
registers to read errors from as the others. I can certainly
confirm that the other PCI IDs do increment CE counts...

Thanks,

-Jason


* Re: ie31200_edac missing PCI ID for i3-4370
  2021-02-04 22:59 ` Jason Baron
@ 2021-02-04 23:22   ` Paul Marks
  2021-02-09 22:25     ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Marks @ 2021-02-04 23:22 UTC (permalink / raw)
  To: Jason Baron; +Cc: linux-edac, bp, m.chehab

On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>
> On 1/31/21 7:07 PM, Paul Marks wrote:
> > I have an ASRock C226M WS with an i3-4370 CPU.
> >
> > # lspci -vnn
> > 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
> >             DRAM Controller [8086:0c00] (rev 06)
> >         Subsystem: ASRock Incorporation 4th Gen Core Processor
> >             DRAM Controller [1849:0c00]
> >         Flags: bus master, fast devsel, latency 0
> >         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> >         Kernel driver in use: hsw_uncore
> >
> > But edac-util doesn't work:
> >
> > # edac-util -v
> > edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
> >
> > I tried this ham-fisted patch:
> >
> > # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> > --- ./drivers/edac/ie31200_edac.c.old
> > +++ ./drivers/edac/ie31200_edac.c
> > @@ -58,7 +58,7 @@
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> > -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> > +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>
> just curious why you removed here and didn't just add?

This is not a serious patch, just a one-liner to demonstrate the problem.

>
> >
> > And it seems happy now:
> >
> > # lspci -vnn
> > 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
> >             DRAM Controller [8086:0c00] (rev 06)
> >         Subsystem: ASRock Incorporation 4th Gen Core Processor
> >             DRAM Controller [1849:0c00]
> >         Flags: bus master, fast devsel, latency 0
> >         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> >         Kernel driver in use: hsw_uncore
> >         Kernel modules: ie31200_edac
> >
> > # edac-util -v
> > mc0: 0 Uncorrected Errors with no DIMM info
> > mc0: 0 Corrected Errors with no DIMM info
> > mc0: csrow0: 0 Uncorrected Errors
> > mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
> > mc0: csrow1: 0 Uncorrected Errors
> > mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
> > edac-util: No errors to report.
> >
> > I don't know if it's truly working because I can't overclock the RAM
> > to induce ECC errors, but still I think adding 8086:0c00 to this
> > driver could be useful.
> >
>
> Cool yeah - I think it makes sense to add if can confirm
> that the Intel datasheet says that this cpu uses the same
> registers to read errors from as the others. I can certainly
> confirm that the other pci ids do increment ce counts...
>
> Thanks,
>
> -Jason


* Re: ie31200_edac missing PCI ID for i3-4370
  2021-02-04 23:22   ` Paul Marks
@ 2021-02-09 22:25     ` Jason Baron
  2021-02-09 23:58       ` Paul Marks
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2021-02-09 22:25 UTC (permalink / raw)
  To: Paul Marks; +Cc: linux-edac, bp, m.chehab



On 2/4/21 6:22 PM, Paul Marks wrote:
> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>
>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>
>>> # lspci -vnn
>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>             DRAM Controller [8086:0c00] (rev 06)
>>>         Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>             DRAM Controller [1849:0c00]
>>>         Flags: bus master, fast devsel, latency 0
>>>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>         Kernel driver in use: hsw_uncore
>>>
>>> But edac-util doesn't work:
>>>
>>> # edac-util -v
>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>
>>> I tried this ham-fisted patch:
>>>
>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>> --- ./drivers/edac/ie31200_edac.c.old
>>> +++ ./drivers/edac/ie31200_edac.c
>>> @@ -58,7 +58,7 @@
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>
>> just curious why you removed here and didn't just add?
> 
> This is not a serious patch, just a one-liner to demonstrate the problem.

Ok. Any chance you can find the datasheet that shows that this
driver is using the appropriate registers for this hw? I didn't
find it with a quick look...

Thanks,

-Jason



* Re: ie31200_edac missing PCI ID for i3-4370
  2021-02-09 22:25     ` Jason Baron
@ 2021-02-09 23:58       ` Paul Marks
  2021-02-10  3:27         ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Marks @ 2021-02-09 23:58 UTC (permalink / raw)
  To: Jason Baron; +Cc: linux-edac, bp

On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>
> On 2/4/21 6:22 PM, Paul Marks wrote:
> > On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
> >>
> >> On 1/31/21 7:07 PM, Paul Marks wrote:
> >>> I have an ASRock C226M WS with an i3-4370 CPU.
> >>>
> >>> # lspci -vnn
> >>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
> >>>             DRAM Controller [8086:0c00] (rev 06)
> >>>         Subsystem: ASRock Incorporation 4th Gen Core Processor
> >>>             DRAM Controller [1849:0c00]
> >>>         Flags: bus master, fast devsel, latency 0
> >>>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> >>>         Kernel driver in use: hsw_uncore
> >>>
> >>> But edac-util doesn't work:
> >>>
> >>> # edac-util -v
> >>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
> >>>
> >>> I tried this ham-fisted patch:
> >>>
> >>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> >>> --- ./drivers/edac/ie31200_edac.c.old
> >>> +++ ./drivers/edac/ie31200_edac.c
> >>> @@ -58,7 +58,7 @@
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> >>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> >>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
> >>
> >> just curious why you removed here and didn't just add?
> >
> > This is not a serious patch, just a one-liner to demonstrate the problem.
>
> Ok. Any chance you can find the datasheet that shows that this
> driver is using the appropriate registers for this hw? I didn't
> find it quickly looking...
>

I wouldn't know where to begin.  Do you have an example of a similar
datasheet from one of the known-good devices?

I left "memtester" running on this machine, because it might increase
the odds of generating an ECC error someday.


* Re: ie31200_edac missing PCI ID for i3-4370
  2021-02-09 23:58       ` Paul Marks
@ 2021-02-10  3:27         ` Jason Baron
  2021-02-10 15:31           ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2021-02-10  3:27 UTC (permalink / raw)
  To: Paul Marks; +Cc: linux-edac, bp



On 2/9/21 6:58 PM, Paul Marks wrote:
> On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>> On 2/4/21 6:22 PM, Paul Marks wrote:
>>> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>>>
>>>>> # lspci -vnn
>>>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>>>              DRAM Controller [8086:0c00] (rev 06)
>>>>>          Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>>>              DRAM Controller [1849:0c00]
>>>>>          Flags: bus master, fast devsel, latency 0
>>>>>          Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>>>          Kernel driver in use: hsw_uncore
>>>>>
>>>>> But edac-util doesn't work:
>>>>>
>>>>> # edac-util -v
>>>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>>>
>>>>> I tried this ham-fisted patch:
>>>>>
>>>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>>>> --- ./drivers/edac/ie31200_edac.c.old
>>>>> +++ ./drivers/edac/ie31200_edac.c
>>>>> @@ -58,7 +58,7 @@
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>>> just curious why you removed here and didn't just add?
>>> This is not a serious patch, just a one-liner to demonstrate the problem.
>> Ok. Any chance you can find the datasheet that shows that this
>> driver is using the appropriate registers for this hw? I didn't
>> find it quickly looking...
>>
> I wouldn't know where to begin.  Do you have an example of a similar
> datasheet from one of the known-good devices?
>
> I left "memtester" running on this machine, because it might increase
> the odds of generating an ECC error someday.
Hi Paul,

I have a list of them at the top of:
drivers/edac/ie31200_edac.c

According to the following intel link it looks
like '0xc[0-f]' is valid (page 52):
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf

So I'm fine with this patch (assuming it just
becomes an addition).

Thanks,

-Jason



* Re: ie31200_edac missing PCI ID for i3-4370
  2021-02-10  3:27         ` Jason Baron
@ 2021-02-10 15:31           ` Jason Baron
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Baron @ 2021-02-10 15:31 UTC (permalink / raw)
  To: Paul Marks; +Cc: linux-edac, bp



On 2/9/21 10:27 PM, Jason Baron wrote:
> 
> 
> On 2/9/21 6:58 PM, Paul Marks wrote:
>> On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>>> On 2/4/21 6:22 PM, Paul Marks wrote:
>>>> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>>>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>>>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>>>>
>>>>>> # lspci -vnn
>>>>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>>>>              DRAM Controller [8086:0c00] (rev 06)
>>>>>>          Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>>>>              DRAM Controller [1849:0c00]
>>>>>>          Flags: bus master, fast devsel, latency 0
>>>>>>          Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>>>>          Kernel driver in use: hsw_uncore
>>>>>>
>>>>>> But edac-util doesn't work:
>>>>>>
>>>>>> # edac-util -v
>>>>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>>>>
>>>>>> I tried this ham-fisted patch:
>>>>>>
>>>>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>>>>> --- ./drivers/edac/ie31200_edac.c.old
>>>>>> +++ ./drivers/edac/ie31200_edac.c
>>>>>> @@ -58,7 +58,7 @@
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>>>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>>>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>>>> just curious why you removed here and didn't just add?
>>>> This is not a serious patch, just a one-liner to demonstrate the problem.
>>> Ok. Any chance you can find the datasheet that shows that this
>>> driver is using the appropriate registers for this hw? I didn't
>>> find it quickly looking...
>>>
>> I wouldn't know where to begin.  Do you have an example of a similar
>> datasheet from one of the known-good devices?
>>
>> I left "memtester" running on this machine, because it might increase
>> the odds of generating an ECC error someday.
> Hi Paul,
> 
> I have a list of them at the top of:
> drivers/edac/ie31200_edac.c
> 
> According to the following intel link it looks
> like '0xc[0-f]' is valid (page 52):

Sorry, I meant to write that as: '0x0c0[0-f]'.


> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf
> 
> So I'm fine with this patch (assuming it just
> becomes an addition).
> 
> Thanks,
> 
> -Jason
> 
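
The corrected pattern '0x0c0[0-f]' covers device IDs 0x0c00 through 0x0c0f. As a small illustration, the check below masks off the low nibble and compares the rest. The function name `in_0c0x_range` is hypothetical, and falling inside this range says nothing by itself about whether a given part exposes the same error registers.

```c
#include <stdint.h>

/* Return 1 if device_id lies in 0x0c00..0x0c0f, the range written as
 * '0x0c0[0-f]' above; otherwise return 0. */
int in_0c0x_range(uint16_t device_id)
{
    return (device_id & 0xfff0) == 0x0c00;
}
```

Under this check, 0x0c00 (the i3-4370 host bridge reported by lspci) and the already supported 0x0c04 and 0x0c08 all fall in the range, while 0x1918 and 0x5918 do not.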


* Re: ie31200_edac missing PCI ID for i3-4370
       [not found] ` <CAC0EBY8piMhd=wTKb94_cEP9FHWax_79V+MTt4_cY7jZYdoRkg@mail.gmail.com>
@ 2021-05-15 11:10   ` Rick Moritz
  0 siblings, 0 replies; 8+ messages in thread
From: Rick Moritz @ 2021-05-15 11:10 UTC (permalink / raw)
  To: linux-edac

Hi List, Hi Jason, Hi Paul,

Sorry for reviving this relatively old thread, but it brought back
memories from 2016, when I tried to add the Skylake i3's (PCI-ID
190f).

I've been running a modified kernel since then, with no issues (but no
proof that any error reporting is being done).

Output from edac-util is merely:

edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.

without any of the DIMM information I would expect.
The /sys tree for EDAC also contains Unknown for e.g. dimm_edac_mode,
which isn't promising either.
The doc should be here:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/desktop-6th-gen-core-family-datasheet-vol-2.pdf
Comparing the contents of the DID register does show a different
layout, something to keep in mind for the Haswell i3's as well.
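
For checking the sysfs tree mentioned above without edac-util, a counter file can be read directly. This is only a sketch: it assumes the usual EDAC layout, where a path such as /sys/devices/system/edac/mc/mc0/ce_count holds a single decimal number, and the helper name `read_counter` is hypothetical.

```c
#include <stdio.h>

/* Read one unsigned decimal counter from a sysfs-style file.
 * On success, store the value in *out and return 0.
 * Return -1 if the file cannot be opened or holds no number. */
int read_counter(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling `read_counter("/sys/devices/system/edac/mc/mc0/ce_count", &n)` on a machine where the driver has registered mc0 should yield the corrected-error count. A return value of -1 suggests that no memory controller was registered under that path.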

Also, I'm not sure how this change will behave on same-generation
desktop i5s/i7s - they may just report nothing. Even the i3s may not
report any EDAC data, depending on how deep the ECC implementation
really goes.

Finally: did the original thread ever end in a kernel patch? I
couldn't find anything that looks related.

As I am back in kernel compiling, I will try and get my 6th gen i3 to
work - but I guess first I will need to dive into the code and data
sheet some more. I'd be glad for any pointers.

I would really like to hear from Paul about how far he managed to get
with Haswell.

Cheers,


Rick


Full quote of last post in thread for reference:

On 2/9/21 10:27 PM, Jason Baron wrote:
>
>
> On 2/9/21 6:58 PM, Paul Marks wrote:
>> On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>>> On 2/4/21 6:22 PM, Paul Marks wrote:
>>>> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>>>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>>>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>>>>
>>>>>> # lspci -vnn
>>>>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>>>>              DRAM Controller [8086:0c00] (rev 06)
>>>>>>          Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>>>>              DRAM Controller [1849:0c00]
>>>>>>          Flags: bus master, fast devsel, latency 0
>>>>>>          Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>>>>          Kernel driver in use: hsw_uncore
>>>>>>
>>>>>> But edac-util doesn't work:
>>>>>>
>>>>>> # edac-util -v
>>>>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>>>>
>>>>>> I tried this ham-fisted patch:
>>>>>>
>>>>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>>>>> --- ./drivers/edac/ie31200_edac.c.old
>>>>>> +++ ./drivers/edac/ie31200_edac.c
>>>>>> @@ -58,7 +58,7 @@
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>>>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>>>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>>>> just curious why you removed here and didn't just add?
>>>> This is not a serious patch, just a one-liner to demonstrate the problem.
>>> Ok. Any chance you can find the datasheet that shows that this
>>> driver is using the appropriate registers for this hw? I didn't
>>> find it quickly looking...
>>>
>> I wouldn't know where to begin.  Do you have an example of a similar
>> datasheet from one of the known-good devices?
>>
>> I left "memtester" running on this machine, because it might increase
>> the odds of generating an ECC error someday.
> Hi Paul,
>
> I have a list of them at the top of:
> drivers/edac/ie31200_edac.c
>
> According to the following intel link it looks
> like '0xc[0-f]' is valid (page 52):

Sorry meant to write that as: '0x0c0[0-f]'.


> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf
>
> So I'm fine with this patch (assuming it just
> becomes an addition).
>
> Thanks,
>
> -Jason
>
