linux-edac.vger.kernel.org archive mirror
* [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh
@ 2023-04-25 20:12 Hristo Venev
  2023-05-09 14:53 ` Yazen Ghannam
  0 siblings, 1 reply; 9+ messages in thread
From: Hristo Venev @ 2023-04-25 20:12 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: Borislav Petkov, linux-edac, Hristo Venev

Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
instead of 12.

I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
reported correctly:

    [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
    [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
    [    2.122754] EDAC MC: UMC0 chip selects:
    [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
    [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
    [    2.122757] EDAC MC: UMC1 chip selects:
    [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
    [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
    [    2.122759] AMD64 EDAC driver v3.5.0

ECC errors can also be detected:

    [  313.747594] mce: [Hardware Error]: Machine check events logged
    [  313.747597] [Hardware Error]: Corrected error, no action required.
    [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
    [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
    [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
    [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
    [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
    [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

Signed-off-by: Hristo Venev <hristo@venev.name>
---
 drivers/edac/amd64_edac.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b55129425c81..1080784e2784 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3816,6 +3816,10 @@ static int per_family_init(struct amd64_pvt *pvt)
 		case 0x50 ... 0x5f:
 			pvt->ctl_name			= "F19h_M50h";
 			break;
+		case 0x60 ... 0x6f:
+			pvt->ctl_name			= "F19h_M60h";
+			pvt->flags.zn_regs_v2		= 1;
+			break;
 		case 0xa0 ... 0xaf:
 			pvt->ctl_name			= "F19h_MA0h";
 			pvt->max_mcs			= 12;
-- 
2.40.0


* Re: [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh
  2023-04-25 20:12 [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh Hristo Venev
@ 2023-05-09 14:53 ` Yazen Ghannam
  2023-05-10 23:42   ` Limonciello, Mario
  0 siblings, 1 reply; 9+ messages in thread
From: Yazen Ghannam @ 2023-05-09 14:53 UTC (permalink / raw)
  To: Hristo Venev, Limonciello, Mario
  Cc: yazen.ghannam, Borislav Petkov, linux-edac

On 4/25/23 4:12 PM, Hristo Venev wrote:
> Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
> instead of 12.
> 
> I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
> reported correctly:
> 
>     [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
>     [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
>     [    2.122754] EDAC MC: UMC0 chip selects:
>     [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
>     [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>     [    2.122757] EDAC MC: UMC1 chip selects:
>     [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
>     [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>     [    2.122759] AMD64 EDAC driver v3.5.0
> 
> ECC errors can also be detected:
> 
>     [  313.747594] mce: [Hardware Error]: Machine check events logged
>     [  313.747597] [Hardware Error]: Corrected error, no action required.
>     [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
>     [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
>     [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
>     [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
>     [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
>     [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
> 
> Signed-off-by: Hristo Venev <hristo@venev.name>

Hi Hristo,

Thank you for the patch. It looks good to me.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>

> ---
>  drivers/edac/amd64_edac.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index b55129425c81..1080784e2784 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3816,6 +3816,10 @@ static int per_family_init(struct amd64_pvt *pvt)
>  		case 0x50 ... 0x5f:
>  			pvt->ctl_name			= "F19h_M50h";
>  			break;
> +		case 0x60 ... 0x6f:
> +			pvt->ctl_name			= "F19h_M60h";
> +			pvt->flags.zn_regs_v2		= 1;
> +			break;

Mario,

Are there other Client models that can leverage this change?

Thanks,
Yazen

* RE: [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh
  2023-05-09 14:53 ` Yazen Ghannam
@ 2023-05-10 23:42   ` Limonciello, Mario
  2023-05-11 13:02     ` Yazen Ghannam
  0 siblings, 1 reply; 9+ messages in thread
From: Limonciello, Mario @ 2023-05-10 23:42 UTC (permalink / raw)
  To: Ghannam, Yazen, Hristo Venev; +Cc: Borislav Petkov, linux-edac

[AMD Official Use Only - General]

> -----Original Message-----
> From: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Sent: Tuesday, May 9, 2023 9:53 AM
> To: Hristo Venev <hristo@venev.name>; Limonciello, Mario
> <Mario.Limonciello@amd.com>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; Borislav Petkov
> <bp@alien8.de>; linux-edac@vger.kernel.org
> Subject: Re: [PATCH] EDAC/amd64: Add support for ECC on family 19h model
> 60h-6Fh
>
> On 4/25/23 4:12 PM, Hristo Venev wrote:
> > Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
> > instead of 12.
> >
> > I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
> > reported correctly:
> >
> >     [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
> >     [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
> >     [    2.122754] EDAC MC: UMC0 chip selects:
> >     [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
> >     [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
> >     [    2.122757] EDAC MC: UMC1 chip selects:
> >     [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
> >     [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
> >     [    2.122759] AMD64 EDAC driver v3.5.0
> >
> > ECC errors can also be detected:
> >
> >     [  313.747594] mce: [Hardware Error]: Machine check events logged
> >     [  313.747597] [Hardware Error]: Corrected error, no action required.
> >     [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
> >     [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
> >     [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
> >     [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
> >     [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
> >     [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
> >
> > Signed-off-by: Hristo Venev <hristo@venev.name>
>
> Hi Hristo,
>
> Thank you for the patch. It looks good to me.
>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>
> > ---
> >  drivers/edac/amd64_edac.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index b55129425c81..1080784e2784 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3816,6 +3816,10 @@ static int per_family_init(struct amd64_pvt *pvt)
> >             case 0x50 ... 0x5f:
> >                     pvt->ctl_name                   = "F19h_M50h";
> >                     break;
> > +           case 0x60 ... 0x6f:
> > +                   pvt->ctl_name                   = "F19h_M60h";
> > +                   pvt->flags.zn_regs_v2           = 1;
> > +                   break;
>
> Mario,
>
> Are there other Client models that can leverage this change?

Yes, family 0x19 models 0x70 ... 0x7f can too, thanks!

>
> Thanks,
> Yazen

* Re: [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh
  2023-05-10 23:42   ` Limonciello, Mario
@ 2023-05-11 13:02     ` Yazen Ghannam
  2023-05-11 17:45       ` Hristo Venev
  2023-05-11 17:45       ` [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh Hristo Venev
  0 siblings, 2 replies; 9+ messages in thread
From: Yazen Ghannam @ 2023-05-11 13:02 UTC (permalink / raw)
  To: Limonciello, Mario, Hristo Venev
  Cc: yazen.ghannam, Borislav Petkov, linux-edac

On 5/10/23 7:42 PM, Limonciello, Mario wrote:
> [AMD Official Use Only - General]
> 
>> -----Original Message-----
>> From: Ghannam, Yazen <Yazen.Ghannam@amd.com>
>> Sent: Tuesday, May 9, 2023 9:53 AM
>> To: Hristo Venev <hristo@venev.name>; Limonciello, Mario
>> <Mario.Limonciello@amd.com>
>> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; Borislav Petkov
>> <bp@alien8.de>; linux-edac@vger.kernel.org
>> Subject: Re: [PATCH] EDAC/amd64: Add support for ECC on family 19h model
>> 60h-6Fh
>>
>> On 4/25/23 4:12 PM, Hristo Venev wrote:
>>> Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
>>> instead of 12.
>>>
>>> I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
>>> reported correctly:
>>>
>>>     [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
>>>     [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
>>>     [    2.122754] EDAC MC: UMC0 chip selects:
>>>     [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
>>>     [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>>>     [    2.122757] EDAC MC: UMC1 chip selects:
>>>     [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
>>>     [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>>>     [    2.122759] AMD64 EDAC driver v3.5.0
>>>
>>> ECC errors can also be detected:
>>>
>>>     [  313.747594] mce: [Hardware Error]: Machine check events logged
>>>     [  313.747597] [Hardware Error]: Corrected error, no action required.
>>>     [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
>>>     [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
>>>     [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
>>>     [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
>>>     [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
>>>     [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
>>>
>>> Signed-off-by: Hristo Venev <hristo@venev.name>
>>
>> Hi Hristo,
>>
>> Thank you for the patch. It looks good to me.
>>
>> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>>> ---
>>>  drivers/edac/amd64_edac.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
>>> index b55129425c81..1080784e2784 100644
>>> --- a/drivers/edac/amd64_edac.c
>>> +++ b/drivers/edac/amd64_edac.c
>>> @@ -3816,6 +3816,10 @@ static int per_family_init(struct amd64_pvt *pvt)
>>>             case 0x50 ... 0x5f:
>>>                     pvt->ctl_name                   = "F19h_M50h";
>>>                     break;
>>> +           case 0x60 ... 0x6f:
>>> +                   pvt->ctl_name                   = "F19h_M60h";
>>> +                   pvt->flags.zn_regs_v2           = 1;
>>> +                   break;
>>
>> Mario,
>>
>> Are there other Client models that can leverage this change?
> 
> Yes, family 0x19 models 0x70 ... 0x7f can too, thanks!
>

Thanks Mario.

Hristo,
Can you please also add those models?

Thanks,
Yazen

* Re: [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh
  2023-05-11 13:02     ` Yazen Ghannam
@ 2023-05-11 17:45       ` Hristo Venev
  2023-05-15 14:27         ` Borislav Petkov
  2023-05-11 17:45       ` [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh Hristo Venev
  1 sibling, 1 reply; 9+ messages in thread
From: Hristo Venev @ 2023-05-11 17:45 UTC (permalink / raw)
  To: Yazen Ghannam, Limonciello, Mario; +Cc: Borislav Petkov, linux-edac

I'll send the updated patch.

One thing I noticed: in the ECC error I observed, the address was not
decoded successfully ("Cannot decode normalized address" in the log). I
don't have good test infrastructure (getting the error involved tuning
voltages over several reboots), so do you think you could look into it?


* [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
  2023-05-11 13:02     ` Yazen Ghannam
  2023-05-11 17:45       ` Hristo Venev
@ 2023-05-11 17:45       ` Hristo Venev
  2023-05-11 17:58         ` Limonciello, Mario
  2023-05-15 14:39         ` Borislav Petkov
  1 sibling, 2 replies; 9+ messages in thread
From: Hristo Venev @ 2023-05-11 17:45 UTC (permalink / raw)
  To: Yazen Ghannam, Limonciello, Mario
  Cc: Borislav Petkov, linux-edac, Hristo Venev

Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
instead of 12.

I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
reported correctly:

    [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
    [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
    [    2.122754] EDAC MC: UMC0 chip selects:
    [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
    [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
    [    2.122757] EDAC MC: UMC1 chip selects:
    [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
    [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
    [    2.122759] AMD64 EDAC driver v3.5.0

ECC errors can also be detected:

    [  313.747594] mce: [Hardware Error]: Machine check events logged
    [  313.747597] [Hardware Error]: Corrected error, no action required.
    [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
    [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
    [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
    [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
    [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
    [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

According to Mario Limonciello, the same code should also work for
models 70h-7Fh [1].

Link: https://lore.kernel.org/linux-edac/d619252e-35c7-814b-acdb-74714619d62a@amd.com/T/#m9fc20d5dc36074048ec5f1c0a5b01b7f972a1cc7 [1]
Signed-off-by: Hristo Venev <hristo@venev.name>
---
 drivers/edac/amd64_edac.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b55129425c81..c00f7e4ef366 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3816,6 +3816,14 @@ static int per_family_init(struct amd64_pvt *pvt)
 		case 0x50 ... 0x5f:
 			pvt->ctl_name			= "F19h_M50h";
 			break;
+		case 0x60 ... 0x6f:
+			pvt->ctl_name			= "F19h_M60h";
+			pvt->flags.zn_regs_v2		= 1;
+			break;
+		case 0x70 ... 0x7f:
+			pvt->ctl_name			= "F19h_M70h";
+			pvt->flags.zn_regs_v2		= 1;
+			break;
 		case 0xa0 ... 0xaf:
 			pvt->ctl_name			= "F19h_MA0h";
 			pvt->max_mcs			= 12;
-- 
2.40.1


* RE: [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
  2023-05-11 17:45       ` [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh Hristo Venev
@ 2023-05-11 17:58         ` Limonciello, Mario
  2023-05-15 14:39         ` Borislav Petkov
  1 sibling, 0 replies; 9+ messages in thread
From: Limonciello, Mario @ 2023-05-11 17:58 UTC (permalink / raw)
  To: Hristo Venev, Ghannam, Yazen; +Cc: Borislav Petkov, linux-edac

[AMD Official Use Only - General]

> Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
> instead of 12.
>
> I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
> reported correctly:
>
>     [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
>     [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
>     [    2.122754] EDAC MC: UMC0 chip selects:
>     [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
>     [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>     [    2.122757] EDAC MC: UMC1 chip selects:
>     [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
>     [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>     [    2.122759] AMD64 EDAC driver v3.5.0
>
> ECC errors can also be detected:
>
>     [  313.747594] mce: [Hardware Error]: Machine check events logged
>     [  313.747597] [Hardware Error]: Corrected error, no action required.
>     [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
>     [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
>     [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
>     [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
>     [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
>     [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
>
> According to Mario Limonciello, the same code should also work for
> models 70h-7Fh [1].
>
> Link: https://lore.kernel.org/linux-edac/d619252e-35c7-814b-acdb-74714619d62a@amd.com/T/#m9fc20d5dc36074048ec5f1c0a5b01b7f972a1cc7 [1]
> Signed-off-by: Hristo Venev <hristo@venev.name>

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>

> ---
>  drivers/edac/amd64_edac.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index b55129425c81..c00f7e4ef366 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3816,6 +3816,14 @@ static int per_family_init(struct amd64_pvt *pvt)
>               case 0x50 ... 0x5f:
>                       pvt->ctl_name                   = "F19h_M50h";
>                       break;
> +             case 0x60 ... 0x6f:
> +                     pvt->ctl_name                   = "F19h_M60h";
> +                     pvt->flags.zn_regs_v2           = 1;
> +                     break;
> +             case 0x70 ... 0x7f:
> +                     pvt->ctl_name                   = "F19h_M70h";
> +                     pvt->flags.zn_regs_v2           = 1;
> +                     break;
>               case 0xa0 ... 0xaf:
>                       pvt->ctl_name                   = "F19h_MA0h";
>                       pvt->max_mcs                    = 12;
> --
> 2.40.1


* Re: [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh
  2023-05-11 17:45       ` Hristo Venev
@ 2023-05-15 14:27         ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2023-05-15 14:27 UTC (permalink / raw)
  To: Hristo Venev; +Cc: Yazen Ghannam, Limonciello, Mario, linux-edac

On Thu, May 11, 2023 at 08:45:06PM +0300, Hristo Venev wrote:
> do you think you could look into it?

Yeah, that's being worked on but it'll take a while longer.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh
  2023-05-11 17:45       ` [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh Hristo Venev
  2023-05-11 17:58         ` Limonciello, Mario
@ 2023-05-15 14:39         ` Borislav Petkov
  1 sibling, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2023-05-15 14:39 UTC (permalink / raw)
  To: Hristo Venev; +Cc: Yazen Ghannam, Limonciello, Mario, linux-edac

On Thu, May 11, 2023 at 08:45:07PM +0300, Hristo Venev wrote:
> Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
> instead of 12.
> 
> I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
> reported correctly:
> 
>     [    2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
>     [    2.122751] EDAC amd64: F19h_M60h detected (node 0).
>     [    2.122754] EDAC MC: UMC0 chip selects:
>     [    2.122754] EDAC amd64: MC: 0:     0MB 1:     0MB
>     [    2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>     [    2.122757] EDAC MC: UMC1 chip selects:
>     [    2.122757] EDAC amd64: MC: 0:     0MB 1:     0MB
>     [    2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
>     [    2.122759] AMD64 EDAC driver v3.5.0
> 
> ECC errors can also be detected:
> 
>     [  313.747594] mce: [Hardware Error]: Machine check events logged
>     [  313.747597] [Hardware Error]: Corrected error, no action required.
>     [  313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
>     [  313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
>     [  313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
>     [  313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
>     [  313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
>     [  313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
> 
> According to Mario Limonciello, the same code should also work for
> models 70h-7Fh [1].
> 
> Link: https://lore.kernel.org/linux-edac/d619252e-35c7-814b-acdb-74714619d62a@amd.com/T/#m9fc20d5dc36074048ec5f1c0a5b01b7f972a1cc7 [1]
> Signed-off-by: Hristo Venev <hristo@venev.name>
> ---
>  drivers/edac/amd64_edac.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Thread overview: 9+ messages
2023-04-25 20:12 [PATCH] EDAC/amd64: Add support for ECC on family 19h model 60h-6Fh Hristo Venev
2023-05-09 14:53 ` Yazen Ghannam
2023-05-10 23:42   ` Limonciello, Mario
2023-05-11 13:02     ` Yazen Ghannam
2023-05-11 17:45       ` Hristo Venev
2023-05-15 14:27         ` Borislav Petkov
2023-05-11 17:45       ` [PATCH v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh Hristo Venev
2023-05-11 17:58         ` Limonciello, Mario
2023-05-15 14:39         ` Borislav Petkov
