EDAC/AMD64: Update scrub register addresses for newer models
diff mbox series

Message ID 20210116143353.7576-1-Yazen.Ghannam@amd.com
State New, archived
Headers show
Series
  • EDAC/AMD64: Update scrub register addresses for newer models
Related show

Commit Message

Yazen Ghannam Jan. 16, 2021, 2:33 p.m. UTC
From: Yazen Ghannam <yazen.ghannam@amd.com>

The Family 17h scrubber registers moved to different offset starting
with Model 30h. The new register offsets are used for all currently
available models since then.

Use the new register addresses as the defaults.

Set the proper scrub register addresses during module init for older
models.

Reported-by: WGH <wgh@torlan.ru>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c | 23 ++++++++++++++++++-----
 drivers/edac/amd64_edac.h |  2 ++
 2 files changed, 20 insertions(+), 5 deletions(-)

Comments

WGH Jan. 18, 2021, 1:30 a.m. UTC | #1
On 16/01/2021 17:33, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
>
> The Family 17h scrubber registers moved to different offset starting
> with Model 30h. The new register offsets are used for all currently
> available models since then.
>
> Use the new register addresses as the defaults.
>
> Set the proper scrub register addresses during module init for older
> models.

So I tested the patch on my machine (AMD Ryzen 9 3900XT on ASRock B550 Extreme4 motherboard, Linux 5.10.7).

The /sys/devices/system/edac/mc/mc0/sdram_scrub_rate value seems to be stuck at 12284069 right after the boot, and does not change.
Writes to the file do not report any errors.

dmesg:

[    0.549451] EDAC MC: Ver: 3.0.0
[    0.817576] EDAC amd64: F17h_M70h detected (node 0).
[    0.818159] EDAC amd64: Node 0: DRAM ECC enabled.
[    0.818717] EDAC amd64: MCT channel count: 2
[    0.819324] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
[    0.819909] EDAC MC: UMC0 chip selects:
[    0.819910] EDAC amd64: MC: 0: 16384MB 1: 16384MB
[    0.820488] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[    0.821067] EDAC MC: UMC1 chip selects:
[    0.821067] EDAC amd64: MC: 0: 16384MB 1: 16384MB
[    0.821630] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[    0.822187] EDAC amd64: using x16 syndromes.
[    0.822739] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[    0.823314] AMD64 EDAC driver v3.5.0
Borislav Petkov Jan. 18, 2021, 7:31 p.m. UTC | #2
On Sat, Jan 16, 2021 at 02:33:53PM +0000, Yazen Ghannam wrote:
> +static struct {
> +	u32 base, limit;
> +} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR};

Why not make this part of struct amd64_umc so that you can access them
through pvt->umc?
Yazen Ghannam Jan. 20, 2021, 2:35 p.m. UTC | #3
On Mon, Jan 18, 2021 at 04:30:58AM +0300, WGH wrote:
> On 16/01/2021 17:33, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > The Family 17h scrubber registers moved to different offset starting
> > with Model 30h. The new register offsets are used for all currently
> > available models since then.
> >
> > Use the new register addresses as the defaults.
> >
> > Set the proper scrub register addresses during module init for older
> > models.
> 
> So I tested the patch on my machine (AMD Ryzen 9 3900XT on ASRock B550 Extreme4 motherboard, Linux 5.10.7).
> 
> The /sys/devices/system/edac/mc/mc0/sdram_scrub_rate value seems to be stuck at 12284069 right after the boot, and does not change.
> Writes to the file do not report any errors.
> 
> dmesg:
> 
> [    0.549451] EDAC MC: Ver: 3.0.0
> [    0.817576] EDAC amd64: F17h_M70h detected (node 0).
> [    0.818159] EDAC amd64: Node 0: DRAM ECC enabled.
> [    0.818717] EDAC amd64: MCT channel count: 2
> [    0.819324] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
> [    0.819909] EDAC MC: UMC0 chip selects:
> [    0.819910] EDAC amd64: MC: 0: 16384MB 1: 16384MB
> [    0.820488] EDAC amd64: MC: 2: 16384MB 3: 16384MB
> [    0.821067] EDAC MC: UMC1 chip selects:
> [    0.821067] EDAC amd64: MC: 0: 16384MB 1: 16384MB
> [    0.821630] EDAC amd64: MC: 2: 16384MB 3: 16384MB
> [    0.822187] EDAC amd64: using x16 syndromes.
> [    0.822739] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
> [    0.823314] AMD64 EDAC driver v3.5.0
> 
>

Thanks for testing. I'll try to find a similar system and check it out.

Thanks,
Yazen
Yazen Ghannam Jan. 20, 2021, 2:41 p.m. UTC | #4
On Mon, Jan 18, 2021 at 08:31:12PM +0100, Borislav Petkov wrote:
> On Sat, Jan 16, 2021 at 02:33:53PM +0000, Yazen Ghannam wrote:
> > +static struct {
> > +	u32 base, limit;
> > +} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR};
> 
> Why not make this part of struct amd64_umc so that you can access them
> through pvt->umc?
>

We have a struct amd64_umc per channel, so putting these fixed values
there seemed redundant. Would you mind if we put this in struct
amd64_family_type? We can then set the values per family/model group
like we do with the max_mcs.

Thanks,
Yazen
Borislav Petkov Jan. 20, 2021, 2:50 p.m. UTC | #5
On Wed, Jan 20, 2021 at 08:41:11AM -0600, Yazen Ghannam wrote:
> We have a struct amd64_umc per channel, so putting these fixed values
> there seemed redundant. Would you mind if we put this in struct
> amd64_family_type? We can then set the values per family/model group
> like we do with the max_mcs.

Sure, that's better.

Thx.

Patch
diff mbox series

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9868f95a5622..b324b1589e5a 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -167,6 +167,10 @@  static inline int amd64_read_dct_pci_cfg(struct amd64_pvt *pvt, u8 dct,
  * other archs, we might not have access to the caches directly.
  */
 
+static struct {
+	u32 base, limit;
+} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR};
+
 static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval)
 {
 	/*
@@ -176,10 +180,10 @@  static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval)
 	 */
 	if (scrubval >= 0x5 && scrubval <= 0x14) {
 		scrubval -= 0x5;
-		pci_write_bits32(pvt->F6, F17H_SCR_LIMIT_ADDR, scrubval, 0xF);
-		pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 1, 0x1);
+		pci_write_bits32(pvt->F6, f17h_scrub_regs.limit, scrubval, 0xF);
+		pci_write_bits32(pvt->F6, f17h_scrub_regs.base, 1, 0x1);
 	} else {
-		pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 0, 0x1);
+		pci_write_bits32(pvt->F6, f17h_scrub_regs.base, 0, 0x1);
 	}
 }
 /*
@@ -257,9 +261,9 @@  static int get_scrub_rate(struct mem_ctl_info *mci)
 	u32 scrubval = 0;
 
 	if (pvt->umc) {
-		amd64_read_pci_cfg(pvt->F6, F17H_SCR_BASE_ADDR, &scrubval);
+		amd64_read_pci_cfg(pvt->F6, f17h_scrub_regs.base, &scrubval);
 		if (scrubval & BIT(0)) {
-			amd64_read_pci_cfg(pvt->F6, F17H_SCR_LIMIT_ADDR, &scrubval);
+			amd64_read_pci_cfg(pvt->F6, f17h_scrub_regs.limit, &scrubval);
 			scrubval &= 0xF;
 			scrubval += 0x5;
 		} else {
@@ -3568,6 +3572,14 @@  f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 	}
 }
 
+static void f17h_set_scrub_regs(struct amd64_pvt *pvt)
+{
+	if ((pvt->fam == 0x17 && pvt->model < 0x30) || pvt->fam == 0x18) {
+		f17h_scrub_regs.base = F17H_SCR_BASE_ADDR;
+		f17h_scrub_regs.limit = F17H_SCR_LIMIT_ADDR;
+	}
+}
+
 static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
@@ -3577,6 +3589,7 @@  static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 
 	if (pvt->umc) {
 		f17h_determine_edac_ctl_cap(mci, pvt);
+		f17h_set_scrub_regs(pvt);
 	} else {
 		if (pvt->nbcap & NBCAP_SECDED)
 			mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..4606f72f4258 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -213,6 +213,8 @@ 
 #define F15H_M60H_SCRCTRL		0x1C8
 #define F17H_SCR_BASE_ADDR		0x48
 #define F17H_SCR_LIMIT_ADDR		0x4C
+#define F17H_M30H_SCR_BASE_ADDR		0x40
+#define F17H_M30H_SCR_LIMIT_ADDR	0x44
 
 /*
  * Function 3 - Misc Control