From: Yazen Ghannam <yazen.ghannam@amd.com> The struct amd64_family_type doesn't change between multiple nodes and instances of the modules, so make it global. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> --- Link: https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com rfc -> v1: * New patch based on suggestion from Boris. drivers/edac/amd64_edac.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index c1d4536ae466..b9a712819c68 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -16,6 +16,8 @@ module_param(ecc_enable_override, int, 0644); static struct msr __percpu *msrs; +static struct amd64_family_type *fam_type; + /* Per-node stuff */ static struct ecc_settings **ecc_stngs; @@ -3278,8 +3280,7 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) } } -static void setup_mci_misc_attrs(struct mem_ctl_info *mci, - struct amd64_family_type *fam) +static void setup_mci_misc_attrs(struct mem_ctl_info *mci) { struct amd64_pvt *pvt = mci->pvt_info; @@ -3298,7 +3299,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci, mci->edac_cap = determine_edac_cap(pvt); mci->mod_name = EDAC_MOD_STR; - mci->ctl_name = fam->ctl_name; + mci->ctl_name = fam_type->ctl_name; mci->dev_name = pci_name(pvt->F3); mci->ctl_page_to_phys = NULL; @@ -3312,8 +3313,6 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci, */ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) { - struct amd64_family_type *fam_type = NULL; - pvt->ext_model = boot_cpu_data.x86_model >> 4; pvt->stepping = boot_cpu_data.x86_stepping; pvt->model = boot_cpu_data.x86_model; @@ -3420,7 +3419,6 @@ static void compute_num_umcs(void) static int init_one_instance(unsigned int nid) { struct pci_dev *F3 = node_to_amd_nb(nid)->misc; - struct amd64_family_type *fam_type = NULL; struct mem_ctl_info *mci = NULL; struct edac_mc_layer layers[2]; struct amd64_pvt *pvt = NULL; @@ -3497,7 +3495,7 @@ static int init_one_instance(unsigned int nid) mci->pvt_info = pvt; mci->pdev = &pvt->F3->dev; - setup_mci_misc_attrs(mci, fam_type); + setup_mci_misc_attrs(mci); if (init_csrows(mci)) mci->edac_cap = EDAC_FLAG_NONE; -- 2.17.1
From: Yazen Ghannam <yazen.ghannam@amd.com> Hi Boris, This set contains the next revision of the RFC patches I included with the last AMD64 EDAC updates. I dropped the RFC tags, and I added a couple of new patches. Most of these patches address the issue where the module check and complains about DRAM ECC on nodes without memory. Patch 3 is new and came out of looking at the family type structs and the boot flow. Patch 6 fixes the "grain not set" warning that was recently introduced. Thanks, Yazen Links: https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com https://lkml.kernel.org/r/20190821235938.118710-10-Yazen.Ghannam@amd.com https://lkml.kernel.org/r/20190821235938.118710-11-Yazen.Ghannam@amd.com Yazen Ghannam (6): EDAC/amd64: Make struct amd64_family_type global EDAC/amd64: Gather hardware information early EDAC/amd64: Save max number of controllers to family type EDAC/amd64: Use cached data when checking for ECC EDAC/amd64: Check for memory before fully initializing an instance EDAC/amd64: Set grain per DIMM drivers/edac/amd64_edac.c | 174 ++++++++++++++++++++------------------ drivers/edac/amd64_edac.h | 1 + 2 files changed, 94 insertions(+), 81 deletions(-) -- 2.17.1
From: Yazen Ghannam <yazen.ghannam@amd.com> The maximum number of memory controllers is fixed within a family/model group. In most cases, this has been fixed at 2, but some systems may have up to 8. The struct amd64_family_type already contains family/model-specific information, and this can be used rather than adding model checks to various functions. Create a new field in struct amd64_family_type for max_num_controllers. Set this when setting other family type information, and use this when needing the maximum number of memory controllers possible for a system. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> --- Link: https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com rfc -> v1: * New patch. * Idea came up from Boris' comment about compute_num_umcs(). drivers/edac/amd64_edac.c | 45 +++++++++++++-------------------------- drivers/edac/amd64_edac.h | 1 + 2 files changed, 16 insertions(+), 30 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 4410da7c3a25..0fde5ad2fdcd 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -21,9 +21,6 @@ static struct amd64_family_type *fam_type; /* Per-node stuff */ static struct ecc_settings **ecc_stngs; -/* Number of Unified Memory Controllers */ -static u8 num_umcs; - /* * Valid scrub rates for the K8 hardware memory scrubber. We map the scrubbing * bandwidth to a valid bit pattern. The 'set' operation finds the 'matching- @@ -456,7 +453,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct, for (i = 0; i < pvt->csels[dct].m_cnt; i++) #define for_each_umc(i) \ - for (i = 0; i < num_umcs; i++) + for (i = 0; i < fam_type->max_num_controllers; i++) /* * @input_addr is an InputAddr associated with the node given by mci. Return the @@ -2226,6 +2223,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "K8", .f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP, .f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL, + .max_num_controllers = 2, .ops = { .early_channel_count = k8_early_channel_count, .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow, @@ -2236,6 +2234,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F10h", .f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP, .f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM, + .max_num_controllers = 2, .ops = { .early_channel_count = f1x_early_channel_count, .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, @@ -2246,6 +2245,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F15h", .f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2, + .max_num_controllers = 2, .ops = { .early_channel_count = f1x_early_channel_count, .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, @@ -2256,6 +2256,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F15h_M30h", .f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2, + .max_num_controllers = 2, .ops = { .early_channel_count = f1x_early_channel_count, .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, @@ -2266,6 +2267,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F15h_M60h", .f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2, + .max_num_controllers = 2, .ops = { .early_channel_count = f1x_early_channel_count, .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, @@ -2276,6 +2278,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F16h", .f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2, + .max_num_controllers = 2, .ops = { .early_channel_count = f1x_early_channel_count, .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, @@ -2286,6 +2289,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F16h_M30h", .f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2, + .max_num_controllers = 2, .ops = { .early_channel_count = f1x_early_channel_count, .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, @@ -2296,6 +2300,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F17h", .f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6, + .max_num_controllers = 2, .ops = { .early_channel_count = f17_early_channel_count, .dbam_to_cs = f17_addr_mask_to_cs_size, @@ -2305,6 +2310,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F17h_M10h", .f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6, + .max_num_controllers = 2, .ops = { .early_channel_count = f17_early_channel_count, .dbam_to_cs = f17_addr_mask_to_cs_size, @@ -2314,6 +2320,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F17h_M30h", .f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6, + .max_num_controllers = 8, .ops = { .early_channel_count = f17_early_channel_count, .dbam_to_cs = f17_addr_mask_to_cs_size, @@ -2323,6 +2330,7 @@ static struct amd64_family_type family_types[] = { .ctl_name = "F17h_M70h", .f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6, + .max_num_controllers = 2, .ops = { .early_channel_count = f17_early_channel_count, .dbam_to_cs = f17_addr_mask_to_cs_size, @@ -3400,29 +3408,14 @@ static const struct attribute_group *amd64_edac_attr_groups[] = { NULL }; -/* Set the number of Unified Memory Controllers in the system. */ -static void compute_num_umcs(void) -{ - u8 model = boot_cpu_data.x86_model; - - if (boot_cpu_data.x86 < 0x17) - return; - - if (model >= 0x30 && model <= 0x3f) - num_umcs = 8; - else - num_umcs = 2; - - edac_dbg(1, "Number of UMCs: %x", num_umcs); -} - static int get_hardware_info(struct amd64_pvt *pvt) { u16 pci_id1, pci_id2; int ret = -EINVAL; if (pvt->fam >= 0x17) { - pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL); + pvt->umc = kcalloc(fam_type->max_num_controllers, + sizeof(struct amd64_umc), GFP_KERNEL); if (!pvt->umc) { ret = -ENOMEM; goto err_ret; @@ -3476,14 +3469,8 @@ static int init_one_instance(struct amd64_pvt *pvt) * Always allocate two channels since we can have setups with DIMMs on * only one channel. Also, this simplifies handling later for the price * of a couple of KBs tops. - * - * On Fam17h+, the number of controllers may be greater than two. So set - * the size equal to the maximum number of UMCs. */ - if (pvt->fam >= 0x17) - layers[1].size = num_umcs; - else - layers[1].size = 2; + layers[1].size = fam_type->max_num_controllers; layers[1].is_virt_csrow = false; mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0); @@ -3678,8 +3665,6 @@ static int __init amd64_edac_init(void) if (!msrs) goto err_free; - compute_num_umcs(); - for (i = 0; i < amd_nb_num(); i++) { err = probe_one_instance(i); if (err) { diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 8c3cda81e619..0d5a9bc4d6de 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -479,6 +479,7 @@ struct low_ops { struct amd64_family_type { const char *ctl_name; u16 f0_id, f1_id, f2_id, f6_id; + u8 max_num_controllers; struct low_ops ops; }; -- 2.17.1
From: Yazen Ghannam <yazen.ghannam@amd.com> Split out gathering hardware information from init_one_instance() into a separate function get_hardware_info(). This is necessary so that the information can be cached earlier and used to check if memory is populated and if ECC is enabled on a node. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> --- Link: https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com rfc -> v1: * Fixup after making struct amd64_family_type fam_type global. drivers/edac/amd64_edac.c | 72 +++++++++++++++++++++++---------------- 1 file changed, 42 insertions(+), 30 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index b9a712819c68..4410da7c3a25 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3416,33 +3416,16 @@ static void compute_num_umcs(void) edac_dbg(1, "Number of UMCs: %x", num_umcs); } -static int init_one_instance(unsigned int nid) +static int get_hardware_info(struct amd64_pvt *pvt) { - struct pci_dev *F3 = node_to_amd_nb(nid)->misc; - struct mem_ctl_info *mci = NULL; - struct edac_mc_layer layers[2]; - struct amd64_pvt *pvt = NULL; u16 pci_id1, pci_id2; - int err = 0, ret; - - ret = -ENOMEM; - pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL); - if (!pvt) - goto err_ret; - - pvt->mc_node_id = nid; - pvt->F3 = F3; - - ret = -EINVAL; - fam_type = per_family_init(pvt); - if (!fam_type) - goto err_free; + int ret = -EINVAL; if (pvt->fam >= 0x17) { pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL); if (!pvt->umc) { ret = -ENOMEM; - goto err_free; + goto err_ret; } pci_id1 = fam_type->f0_id; @@ -3452,18 +3435,33 @@ static int init_one_instance(unsigned int nid) pci_id2 = fam_type->f2_id; } - err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2); - if (err) + ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2); + if (ret) goto err_post_init; read_mc_regs(pvt); + return 0; + +err_post_init: + if (pvt->fam >= 0x17) + kfree(pvt->umc); + +err_ret: + return ret; +} + +static int init_one_instance(struct amd64_pvt *pvt) +{ + struct mem_ctl_info *mci = NULL; + struct edac_mc_layer layers[2]; + int ret = -EINVAL; + /* * We need to determine how many memory channels there are. Then use * that information for calculating the size of the dynamic instance * tables in the 'mci' structure. */ - ret = -EINVAL; pvt->channel_count = pvt->ops->early_channel_count(pvt); if (pvt->channel_count < 0) goto err_siblings; @@ -3488,7 +3486,7 @@ static int init_one_instance(unsigned int nid) layers[1].size = 2; layers[1].is_virt_csrow = false; - mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0); + mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0); if (!mci) goto err_siblings; @@ -3514,20 +3512,16 @@ static int init_one_instance(unsigned int nid) err_siblings: free_mc_sibling_devs(pvt); -err_post_init: if (pvt->fam >= 0x17) kfree(pvt->umc); -err_free: - kfree(pvt); - -err_ret: return ret; } static int probe_one_instance(unsigned int nid) { struct pci_dev *F3 = node_to_amd_nb(nid)->misc; + struct amd64_pvt *pvt = NULL; struct ecc_settings *s; int ret; @@ -3538,6 +3532,21 @@ static int probe_one_instance(unsigned int nid) ecc_stngs[nid] = s; + pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL); + if (!pvt) + goto err_settings; + + pvt->mc_node_id = nid; + pvt->F3 = F3; + + fam_type = per_family_init(pvt); + if (!fam_type) + goto err_enable; + + ret = get_hardware_info(pvt); + if (ret < 0) + goto err_enable; + if (!ecc_enabled(F3, nid)) { ret = 0; @@ -3554,7 +3563,7 @@ static int probe_one_instance(unsigned int nid) goto err_enable; } - ret = init_one_instance(nid); + ret = init_one_instance(pvt); if (ret < 0) { amd64_err("Error probing instance: %d\n", nid); @@ -3567,6 +3576,9 @@ static int probe_one_instance(unsigned int nid) return ret; err_enable: + kfree(pvt); + +err_settings: kfree(s); ecc_stngs[nid] = NULL; -- 2.17.1
From: Yazen Ghannam <yazen.ghannam@amd.com> ...now that the data is available earlier. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> --- Link: https://lkml.kernel.org/r/20190821235938.118710-10-Yazen.Ghannam@amd.com rfc -> v1: * No change. drivers/edac/amd64_edac.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 0fde5ad2fdcd..feb68c0a3217 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3200,31 +3200,27 @@ static const char *ecc_msg = "'ecc_enable_override'.\n" " (Note that use of the override may cause unknown side effects.)\n"; -static bool ecc_enabled(struct pci_dev *F3, u16 nid) +static bool ecc_enabled(struct amd64_pvt *pvt) { + u16 nid = pvt->mc_node_id; bool nb_mce_en = false; u8 ecc_en = 0, i; u32 value; if (boot_cpu_data.x86 >= 0x17) { u8 umc_en_mask = 0, ecc_en_mask = 0; + struct amd64_umc *umc; for_each_umc(i) { - u32 base = get_umc_base(i); + umc = &pvt->umc[i]; /* Only check enabled UMCs. */ - if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value)) - continue; - - if (!(value & UMC_SDP_INIT)) + if (!(umc->sdp_ctrl & UMC_SDP_INIT)) continue; umc_en_mask |= BIT(i); - if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value)) - continue; - - if (value & UMC_ECC_ENABLED) + if (umc->umc_cap_hi & UMC_ECC_ENABLED) ecc_en_mask |= BIT(i); } @@ -3237,7 +3233,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid) /* Assume UMC MCA banks are enabled. */ nb_mce_en = true; } else { - amd64_read_pci_cfg(F3, NBCFG, &value); + amd64_read_pci_cfg(pvt->F3, NBCFG, &value); ecc_en = !!(value & NBCFG_ECC_ENABLE); @@ -3534,7 +3530,7 @@ static int probe_one_instance(unsigned int nid) if (ret < 0) goto err_enable; - if (!ecc_enabled(F3, nid)) { + if (!ecc_enabled(pvt)) { ret = 0; if (!ecc_enable_override) -- 2.17.1
From: Yazen Ghannam <yazen.ghannam@amd.com> Return early before checking for ECC if the node does not have any populated memory. Free any cached hardware data before returning. Also, return 0 in this case since this is not a failure. Other nodes may have memory and the module should attempt to load an instance for them. Move printing of hardware information to after the instance is initialized, so that the information is only printed for nodes with memory. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> --- Link: https://lkml.kernel.org/r/20190821235938.118710-11-Yazen.Ghannam@amd.com rfc -> v1: * Change message severity to "info". * Nodes without memory is a valid configuration. The user doesn't need to be warned. * Drop "DRAM ECC disabled" from message. * The message is given when no memory was detected on a node. * The state of DRAM ECC is not checked here. drivers/edac/amd64_edac.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index feb68c0a3217..2a0a8be8f767 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2848,8 +2848,6 @@ static void read_mc_regs(struct amd64_pvt *pvt) edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]); determine_ecc_sym_sz(pvt); - - dump_misc_regs(pvt); } /* @@ -3501,6 +3499,19 @@ static int init_one_instance(struct amd64_pvt *pvt) return ret; } +static bool instance_has_memory(struct amd64_pvt *pvt) +{ + bool cs_enabled = false; + int cs = 0, dct = 0; + + for (dct = 0; dct < fam_type->max_num_controllers; dct++) { + for_each_chip_select(cs, dct, pvt) + cs_enabled |= csrow_enabled(cs, dct, pvt); + } + + return cs_enabled; +} + static int probe_one_instance(unsigned int nid) { struct pci_dev *F3 = node_to_amd_nb(nid)->misc; @@ -3530,6 +3541,12 @@ static int probe_one_instance(unsigned int nid) if (ret < 0) goto err_enable; + ret = 0; + if (!instance_has_memory(pvt)) { + amd64_info("Node %d: No DIMMs detected.\n", nid); + goto err_enable; + } + if (!ecc_enabled(pvt)) { ret = 0; @@ -3556,6 +3573,8 @@ static int probe_one_instance(unsigned int nid) goto err_enable; } + dump_misc_regs(pvt); + return ret; err_enable: -- 2.17.1
From: Yazen Ghannam <yazen.ghannam@amd.com> The following commit introduced a warning on error reports without a non-zero grain value. 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") The amd64_edac_mod module does not provide a value, so the warning will be given on the first reported memory error. Set the grain per DIMM to cacheline size (64 bytes). This is the current recommendation. Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> --- Link: https://lkml.kernel.org/r/724d6f97-61f2-94bd-3f4b-793a55b6ac15@amd.com rfc -> v1: * New patch. drivers/edac/amd64_edac.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 2a0a8be8f767..b5c0accfefcf 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2944,6 +2944,7 @@ static int init_csrows_df(struct mem_ctl_info *mci) dimm->mtype = pvt->dram_type; dimm->edac_mode = edac_mode; dimm->dtype = dev_type; + dimm->grain = 64; } } @@ -3020,6 +3021,7 @@ static int init_csrows(struct mem_ctl_info *mci) dimm = csrow->channels[j]->dimm; dimm->mtype = pvt->dram_type; dimm->edac_mode = edac_mode; + dimm->grain = 64; } } -- 2.17.1
On Fri, Oct 18, 2019 at 03:31:26PM +0000, Ghannam, Yazen wrote: > From: Yazen Ghannam <yazen.ghannam@amd.com> > > Split out gathering hardware information from init_one_instance() into a > separate function get_hardware_info(). > > This is necessary so that the information can be cached earlier and used > to check if memory is populated and if ECC is enabled on a node. > > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> > --- > Link: > https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com > > rfc -> v1: > * Fixup after making struct amd64_family_type fam_type global. > > drivers/edac/amd64_edac.c | 72 +++++++++++++++++++++++---------------- > 1 file changed, 42 insertions(+), 30 deletions(-) > > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c > index b9a712819c68..4410da7c3a25 100644 > --- a/drivers/edac/amd64_edac.c > +++ b/drivers/edac/amd64_edac.c > @@ -3416,33 +3416,16 @@ static void compute_num_umcs(void) > edac_dbg(1, "Number of UMCs: %x", num_umcs); > } > > -static int init_one_instance(unsigned int nid) > +static int get_hardware_info(struct amd64_pvt *pvt) > { > - struct pci_dev *F3 = node_to_amd_nb(nid)->misc; > - struct mem_ctl_info *mci = NULL; > - struct edac_mc_layer layers[2]; > - struct amd64_pvt *pvt = NULL; > u16 pci_id1, pci_id2; > - int err = 0, ret; > - > - ret = -ENOMEM; > - pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL); > - if (!pvt) > - goto err_ret; > - > - pvt->mc_node_id = nid; > - pvt->F3 = F3; > - > - ret = -EINVAL; > - fam_type = per_family_init(pvt); > - if (!fam_type) > - goto err_free; > + int ret = -EINVAL; > > if (pvt->fam >= 0x17) { > pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL); > if (!pvt->umc) { > ret = -ENOMEM; > - goto err_free; > + goto err_ret; > } > > pci_id1 = fam_type->f0_id; > @@ -3452,18 +3435,33 @@ static int init_one_instance(unsigned int nid) > pci_id2 = fam_type->f2_id; > } > > - err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2); > - if (err) > + ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2); > + if (ret) > goto err_post_init; > > read_mc_regs(pvt); > > + return 0; > + > +err_post_init: > + if (pvt->fam >= 0x17) > + kfree(pvt->umc); So you're freeing pvt->umc here but nothing in that function allocated it. get_hardware_info() in probe_one_instance() did but if you do it this way, it is kinda hard to follow and the layering is a bit iffy. So what I'd suggest is: * Rename get_hardware_info() to something like hw_info_get() so that you can have a counterpart hw_info_put() which does any cleanup after hw_info_get(), including the freeing of the ->umc. * In probe_one_instance(), if init_one_instance() fails, call hw_info_put() on the error path so that all your flow in the probe/init functions is nicely ballanced and easily followed. Makes sense? Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 18, 2019 at 03:31:26PM +0000, Ghannam, Yazen wrote: > From: Yazen Ghannam <yazen.ghannam@amd.com> > > The maximum number of memory controllers is fixed within a family/model > group. In most cases, this has been fixed at 2, but some systems may > have up to 8. > > The struct amd64_family_type already contains family/model-specific > information, and this can be used rather than adding model checks to > various functions. > > Create a new field in struct amd64_family_type for max_num_controllers. > Set this when setting other family type information, and use this when > needing the maximum number of memory controllers possible for a system. > > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> > --- > Link: > https://lkml.kernel.org/r/20190821235938.118710-9-Yazen.Ghannam@amd.com ... > diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h > index 8c3cda81e619..0d5a9bc4d6de 100644 > --- a/drivers/edac/amd64_edac.h > +++ b/drivers/edac/amd64_edac.h > @@ -479,6 +479,7 @@ struct low_ops { > struct amd64_family_type { > const char *ctl_name; > u16 f0_id, f1_id, f2_id, f6_id; > + u8 max_num_controllers; Sure but call that max_mcs or so, so that the code which mentions it, doesn't stick out too much. You can put a comment above it here to explain what it is: /* Maximum number of memory controllers per node */ u8 max_mcs; -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 18, 2019 at 03:31:25PM +0000, Ghannam, Yazen wrote: > From: Yazen Ghannam <yazen.ghannam@amd.com> > > Hi Boris, > > This set contains the next revision of the RFC patches I included with > the last AMD64 EDAC updates. I dropped the RFC tags, and I added a > couple of new patches. Yah, looks pretty much good, modulo the minor things I commented on earlier. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, October 21, 2019 10:48 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.
>
> On Fri, Oct 18, 2019 at 03:31:25PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > Hi Boris,
> >
> > This set contains the next revision of the RFC patches I included with
> > the last AMD64 EDAC updates. I dropped the RFC tags, and I added a
> > couple of new patches.
>
> Yah, looks pretty much good, modulo the minor things I commented on
> earlier.
>
Thank you. I'll send another revision soon.
-Yazen