* Re: FW: edac driver initialization, interrupt, & debug
From: Tracy Smith @ 2018-11-16 17:07 UTC
To: linux-edac
Cc: backports, linux-newbie, util-linux, linux-mmc

I’m attempting to insmod/modprobe the layerscape_edac_mod.ko driver.
It seems the driver enters layerscape_edac.c fsl_ddr_mc_init() and
completes successfully. But there are no EDAC boot messages and no
/proc/interrupts entry for the EDAC. I’m backporting the EDAC from the
LSDK to the SDK 2.0.

I have set CONFIG_EDAC_DEBUG and set edac_debug_level to 4, but I
don’t see any debug messages other than printk()s that I add to
fsl_ddr_mc_init() in layerscape_edac.c. No debug messages appear in
any logs from fsl_ddr_edac.c.

1. How can I enable debug information? Is debugfs required to print
the debug messages for the edac_debug_level and CONFIG_EDAC_DEBUG in
the 4.1.35-rt41 kernel for drivers/edac?

2. The default is EDAC_OPSTATE_INT in fsl_ddr_mc_init() and the
platform_driver_register() is successful. But I don’t see any printk()
messages in fsl_mc_err_probe() within fsl_ddr_edac.c. No errors appear
in any /var/log/*.

3. I don’t see any interrupts, so why would there not be an edac
interrupt in /proc/interrupts?

Do I need to inject an error before seeing an edac interrupt in
/proc/interrupts?

lsmod
module: layerscape_edac_mod 12594 0

4. To inject an error I can use the fsl_mc_inject … routines in
fsl_ddr_edac.c and write to the registers. But is there a utility that
already uses these routines that can be used to inject an error
(FSL_MC_ECC_ERR_INJECT, FSL_MC_DATA_ERR_INJECT_LO,

Thank you for any assistance. I am instrumenting the driver now to see
if I can trace through it.

thx,
Tracy Smith
--
To unsubscribe from this list: send the line "unsubscribe backports" in

* Re: FW: edac driver initialization, interrupt, & debug
From: Borislav Petkov @ 2018-11-17 14:05 UTC
To: Tracy Smith, York Sun
Cc: linux-edac, backports, linux-newbie, util-linux, linux-mmc

+ York.

On Fri, Nov 16, 2018 at 11:07:50AM -0600, Tracy Smith wrote:
> I’m attempting to insmod/modprobe the layerscape_edac_mod.ko driver.
> It seems the driver enters layerscape_edac.c fsl_ddr_mc_init() and
> completes successfully. But there are no EDAC boot messages and no
> /proc/interrupts entry for the EDAC. I’m backporting the EDAC from the
> LSDK to the SDK 2.0.
>
> I have set CONFIG_EDAC_DEBUG and set edac_debug_level to 4, but I
> don’t see any debug messages other than printk()s that I add to
> fsl_ddr_mc_init() in layerscape_edac.c. No debug messages appear in
> any logs from fsl_ddr_edac.c.
>
> 1. How can I enable debug information? Is debugfs required to print
> the debug messages for the edac_debug_level and CONFIG_EDAC_DEBUG in
> the 4.1.35-rt41 kernel for drivers/edac?

No, just slap printks before every return statement, like:

        if (!devres_open_group(&op->dev, fsl_mc_err_probe, GFP_KERNEL)) {
                pr_err("%s: Error devres_open_group()\n", __func__);
                return -ENOMEM;
        }

so that you can get closer to the place where it fails.

> 2. The default is EDAC_OPSTATE_INT in fsl_ddr_mc_init() and the
> platform_driver_register() is successful. But I don’t see any printk()
> messages in fsl_mc_err_probe() within fsl_ddr_edac.c. No errors appear
> in any /var/log/*.

Yeah, see if it even gets called at all:

int fsl_mc_err_probe(struct platform_device *op)
{
        struct mem_ctl_info *mci;
        struct edac_mc_layer layers[2];
        struct fsl_mc_pdata *pdata;
        struct resource r;
        u32 sdram_ctl;
        int res;

        pr_err("%s: entry\n", __func__);

> 3. I don’t see any interrupts, so why would there not be an edac
> interrupt in /proc/interrupts?

Probably because it doesn't reach the point where it registers an IRQ
handler...

> Do I need to inject an error before seeing an edac interrupt in
> /proc/interrupts?

You should, AFAICT, if it loads and registers stuff properly.

> lsmod
> module: layerscape_edac_mod 12594 0
>
> 4. To inject an error I can use the fsl_mc_inject … routines in
> fsl_ddr_edac.c and write to the registers. But is there a utility that
> already uses these routines that can be used to inject an error
> (FSL_MC_ECC_ERR_INJECT, FSL_MC_DATA_ERR_INJECT_LO,

You should be able to simply write to *sysfs*. Somewhere under
/sys/devices/system/edac/...

fsl_mc_inject_data_{lo,hi}_store simply writes the low and high inject
register.

Btw, looking at it, York, this whole injection functionality needs to
be behind CONFIG_EDAC_DEBUG because a production driver shouldn't have
injection capability.

Hmmm.

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

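A note on the sysfs suggestion above: writing to an injection attribute can be done from a small user-space program as well as from a shell. The following is only a minimal sketch under stated assumptions. `write_hex_attr` is a hypothetical helper name, and the attribute path is left to the caller because the thread does not give the exact location under /sys/devices/system/edac/... The value is written with a 0x prefix on the assumption that the driver's store routine parses the string with automatic base detection.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write `value` as a 0x-prefixed hex string to the file at `path`.
 * Returns 0 on success, -1 on failure.
 *
 * Assumption: `path` names a writable sysfs attribute chosen by the
 * caller; no particular attribute name or directory is assumed here.
 */
int write_hex_attr(const char *path, unsigned long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "0x%lx\n", value) > 0;

	/* fclose() flushes the buffer, so a failed write surfaces here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Checking the return value matters here: if the driver never probed, the attribute will not exist and the open fails, which is itself a quick hint that registration did not complete.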
* Re: FW: edac driver initialization, interrupt, & debug
From: Tracy Smith @ 2018-11-17 23:22 UTC
To: bp
Cc: york.sun, linux-edac, backports, linux-newbie, util-linux, linux-mmc

Thank you Boris for the information. It is helpful.

>> 2. The default is EDAC_OPSTATE_INT in fsl_ddr_mc_init() and the
>> platform_driver_register() is successful. But I don’t see any printk()
>> messages in fsl_mc_err_probe() within fsl_ddr_edac.c. No errors appear
>> in any /var/log/*.

> Yeah, see if it even gets called at all:

I did a grep on /var/log/* and don't see any printk's from
fsl_mc_err_probe(). So, it's not being called.

1) What would cause the probe function not to be called?

2) Were changes made in how .probe functions were called between
different kernel releases of the edac?

3) How should I go about root-causing why the .probe fails, since I
may have to backport any changes made?

4) Is there possibly a patch for .probe changes after 4.1.35-rt41?

static struct platform_driver fsl_ddr_mc_err_driver = {
        .probe = fsl_mc_err_probe,
        .remove = fsl_mc_err_remove,
        .driver = {
                .name = "fsl_ddr_mc_err",
                .of_match_table = fsl_ddr_mc_err_of_match,
        },
};

int fsl_mc_err_probe(struct platform_device *op)
{
        struct mem_ctl_info *mci;
        struct edac_mc_layer layers[2];
        struct fsl_mc_pdata *pdata;
        struct resource r;
        u32 sdram_ctl;
        int res;

        pr_err("%s: entry\n", __func__);
        printk("entered fsl_mc_err_probe!\n");

Any assistance greatly appreciated.

--
Confidentiality notice: This e-mail message, including any
attachments, may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), please
immediately notify the sender and delete this e-mail message.

* Re: edac driver initialization, interrupt, & debug
From: Steve Inkpen @ 2018-11-18 1:05 UTC
To: Tracy Smith
Cc: bp, york.sun, linux-edac, backports, linux-newbie, util-linux, linux-mmc

Are you using a device tree? If yes, make sure there is an entry for
this device.

From your snippet of code, there appears to be a match entry in
of_match_table?

Steve

* Re: edac driver initialization, interrupt, & debug
From: Tracy Smith @ 2018-11-19 16:37 UTC
To: steve
Cc: bp, york.sun, linux-edac, backports, linux-newbie, util-linux, linux-mmc

Steve, you were correct, there wasn't a device tree entry for the
qoriq memory controller in
arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi. I added it, making it
identical to the one in fsl-ls1046a.dtsi, which should have the same
memory controller and entry as the ls1043a. I added this but it didn't
make a difference as far as being able to call the probe function. I'm
now checking the mpc85xx_edac.c dtsi entry for comparison since York
used the mpc85xx as the basis for the layerscape, but there is
something else missing preventing the probe function from being called.

@York
What is your entry for
/proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible?

@York
The cat /proc/device-tree/compatible entry is this; is this correct?
fsl,ls1043a-rdbfsl,ls1043a

ddr: memory-controller@1080000 {
        compatible = "fsl,qoriq-memory-controller";
        reg = <0x0 0x1080000 0x0 0x1000>;
        interrupts = <0 144 0x4>;
        big-endian;
};

ifc: ifc@1530000 {
        compatible = "fsl,ifc", "simple-bus";
        reg = <0x0 0x1530000 0x0 0x10000>;
        interrupts = <0 43 0x4>;
};

I haven't had to change the edac code to compile it, so it is what is
in drivers/edac. The ECC is enabled in U-Boot and supported by the
DDR4 memory controller. I have attached the generated
layerscape_edac_mod.mod.c file at the end.

I see the fsl_ddr_mc_err_of_match[] for the memory controller and it
is associated with the .of_match_table = fsl_ddr_mc_err_of_match

static const struct of_device_id fsl_ddr_mc_err_of_match[] = {
        { .compatible = "fsl,qoriq-memory-controller", },
        {},
};
MODULE_DEVICE_TABLE(of, fsl_ddr_mc_err_of_match);

static struct platform_driver fsl_ddr_mc_err_driver = {
        .probe = fsl_mc_err_probe,
        .remove = fsl_mc_err_remove,
        .driver = {
                .name = "fsl_ddr_mc_err",
                .of_match_table = fsl_ddr_mc_err_of_match,
        },
};

Beyond this, I only see an "of_match_device" in the altera_edac.c
driver and the highbank drivers below, but not in any other drivers.

cd drivers/edac
grep of_match_device * | more

altera_edac.c: id = of_match_device(altr_sdram_ctrl_of_match, &pdev->dev);
highbank_l2_edac.c: id = of_match_device(hb_l2_err_of_match, &pdev->dev);
highbank_mc_edac.c: id = of_match_device(hb_ddr_ctrl_of_match, &pdev->dev);

The .of_match_table entry appears correct for the layerscape_edac.c.
York took this entry from the mpc85xx_edac.c.

layerscape_edac.c: .of_match_table = fsl_ddr_mc_err_of_match,
mpc85xx_edac.c: .of_match_table = mpc85xx_l2_err_of_match,
mpc85xx_edac.c: .of_match_table = mpc85xx_mc_err_of_match,
ppc4xx_edac.c: .of_match_table = ppc4xx_edac_match,
synopsys_edac.c: .of_match_table = synps_edac_match,
xgene_edac.c: .of_match_table = xgene_edac_of_match,

-- layerscape_edac_mod.mod.c

#include <linux/vermagic.h>
#include <linux/compiler.h>

MODULE_INFO(vermagic, VERMAGIC_STRING);

__visible struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
        .name = KBUILD_MODNAME,
        .init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
        .exit = cleanup_module,
#endif
        .arch = MODULE_ARCH_INIT,
};

MODULE_INFO(intree, "Y");

static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
        { 0xf41fc8a9, __VMLINUX_SYMBOL_STR(module_layout) },
        { 0x8294e3fc, __VMLINUX_SYMBOL_STR(edac_mc_add_mc_with_groups) },
        { 0x1fdc7df2, __VMLINUX_SYMBOL_STR(_mcount) },
        { 0x51eafc8e, __VMLINUX_SYMBOL_STR(param_ops_int) },
        { 0x69a358a6, __VMLINUX_SYMBOL_STR(iomem_resource) },
        { 0x91715312, __VMLINUX_SYMBOL_STR(sprintf) },
        { 0x26d622f, __VMLINUX_SYMBOL_STR(__platform_driver_register) },
        { 0x60ea2d6, __VMLINUX_SYMBOL_STR(kstrtoull) },
        { 0x11089ac7, __VMLINUX_SYMBOL_STR(_ctype) },
        { 0xb51fbd64, __VMLINUX_SYMBOL_STR(edac_op_state) },
        { 0x27e1a049, __VMLINUX_SYMBOL_STR(printk) },
        { 0x1e614c08, __VMLINUX_SYMBOL_STR(of_find_property) },
        { 0x1215bb3b, __VMLINUX_SYMBOL_STR(devres_open_group) },
        { 0x91f27fc0, __VMLINUX_SYMBOL_STR(edac_mc_handle_error) },
        { 0x643ce492, __VMLINUX_SYMBOL_STR(edac_mc_free) },
        { 0x9b69ee39, __VMLINUX_SYMBOL_STR(edac_debug_level) },
        { 0x9c26e7ff, __VMLINUX_SYMBOL_STR(edac_mc_alloc) },
        { 0x3c5bfeaa, __VMLINUX_SYMBOL_STR(__devm_request_region) },
        { 0x8229211c, __VMLINUX_SYMBOL_STR(devm_ioremap) },
        { 0x159dd96a, __VMLINUX_SYMBOL_STR(edac_mc_del_mc) },
        { 0xb3d3cbde, __VMLINUX_SYMBOL_STR(devres_remove_group) },
        { 0x48fb5a26, __VMLINUX_SYMBOL_STR(of_address_to_resource) },
        { 0xb2904f0, __VMLINUX_SYMBOL_STR(platform_get_irq) },
        { 0xfb3ca8db, __VMLINUX_SYMBOL_STR(platform_driver_unregister) },
        { 0x69e0f942, __VMLINUX_SYMBOL_STR(devm_request_threaded_irq) },
        { 0xec4e8f9b, __VMLINUX_SYMBOL_STR(devres_release_group) },
};

static const char __module_depends[]
__used
__attribute__((section(".modinfo"))) =
"depends=";

Question: is this MODULE_ALIAS "of:N*T*Cfsl" correct?

MODULE_ALIAS("of:N*T*Cfsl,qoriq-memory-controller*");

thx,
Tracy

* Re: edac driver initialization, interrupt, & debug
From: York Sun @ 2018-11-19 16:48 UTC
To: Tracy Smith, steve
Cc: bp, linux-edac, backports, linux-newbie, util-linux, linux-mmc

On 11/19/18 8:38 AM, Tracy Smith wrote:
> Steve, you were correct, there wasn't a device tree entry for the
> qoriq memory controller in
> arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi. I added it making it
> identical to the fsl-ls1046a.dtsi, which should have the same memory
> controller and entry as the ls1043a. I added this but it didn't make
> a difference as far as being able to call the probe function. I'm now
> checking the mpc85xx_edac.c dtsi entry for comparison since York used
> the mpc85xx as the basis for the layerscape, but there is something
> else missing preventing the probe function from being called.
>
> @York
> What is your entry for
> /proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible

EDAC driver doesn't check IFC. Are you debugging EDAC for memory controller?

>
> @York
> cat /proc/device-tree/compatible entry is this, is this correct?
> fsl,ls1043a-rdbfsl,ls1043a

Once again, you are using your modified code on your own board. So it is
not ls1043ardb. This compatible has nothing to do with EDAC driver.

I cannot help you with ls1043ardb because the real ls1043ardb board
doesn't support ECC. The closest board I have is ls1046ardb.

>
> ddr: memory-controller@1080000 {
>         compatible = "fsl,qoriq-memory-controller";
>         reg = <0x0 0x1080000 0x0 0x1000>;
>         interrupts = <0 144 0x4>;
>         big-endian;
> };

This is your source code, not your final device tree. Please learn to
use "fdt" command under U-Boot to dump your device tree before booting
Linux, or check after Linux is up. For your reference, on my ls1046ardb,
I have

# cat /proc/device-tree/soc/memory-controller@1080000/compatible
fsl,qoriq-memory-controller

York

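A note on reading `compatible` from the live tree: the property is a packed list of NUL-terminated strings, which is why `cat` prints entries run together, as in "fsl,ls1043a-rdbfsl,ls1043a". The sketch below shows one way to test such a buffer for a given string in user space. It is a simplified illustration of the idea, not the kernel's matching code, and `compatible_has` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `want` equals one of the NUL-terminated strings packed
 * into the property buffer `prop` of `len` bytes, 0 otherwise.
 *
 * Assumption: `prop` holds the raw bytes of a device-tree `compatible`
 * property, e.g. as read from a file under /proc/device-tree.
 */
int compatible_has(const char *prop, size_t len, const char *want)
{
	size_t want_len;
	size_t pos = 0;

	assert(prop != NULL && want != NULL);
	want_len = strlen(want);

	while (pos < len) {
		/* Bound the scan so a missing terminator cannot overrun. */
		const char *end = memchr(prop + pos, '\0', len - pos);
		size_t n = end ? (size_t)(end - (prop + pos)) : len - pos;

		if (n == want_len && memcmp(prop + pos, want, n) == 0)
			return 1;

		pos += n + 1;	/* skip this string and its terminator */
	}

	return 0;
}
```

Applied to the memory-controller node York shows, the interesting check is whether its buffer contains "fsl,qoriq-memory-controller"; the board-level /proc/device-tree/compatible value plays no part in that.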
* Re: edac driver initialization, interrupt, & debug
  2018-11-19 16:48 ` York Sun
@ 2018-11-21 17:01 ` Tracy Smith
  2018-11-21 22:02 ` Tracy Smith
  0 siblings, 1 reply; 10+ messages in thread
From: Tracy Smith @ 2018-11-21 17:01 UTC
To: york.sun
Cc: steve, bp, linux-edac, backports, linux-newbie, util-linux, linux-mmc

Not probing the edac driver turned out to be a device tree issue as
Steve suspected. Thanks to both Steve and York, this has been resolved
and the backport is now logging ECC errors after injection. I added the
ddr qoriq-memory-controller entry, since we used a different .dtsi
file.

arch/arm64/boot/dts/freescale/...ls1043a.dtsi

    ddr: memory-controller@1080000 {
        compatible = "fsl,qoriq-memory-controller";
        reg = <0x0 0x1080000 0x0 0x1000>;
        interrupts = <0 144 0x4>;
        big-endian;
    };

I now need to collect and report CE and UE ECC errors and extend the
existing logging and reporting function that I currently see. After
reviewing the following document, the system logging appears different
from that given in the kernel EDAC document. I need the level of
granularity described in the edac.txt file.

https://www.mjmwired.net/kernel/Documentation/edac.txt#173 same as
kernel/Documentation/edac.txt

1) Can I gather the system logging described below in the edac.txt
file for layerscape?

2) Is there anything similar to the edac-utils but for ARM, or does
sysfs replace the edac-utils, or something else?

3) What is currently used for collecting and reporting ECC errors for
ARM/EDAC beyond the kernel log and messages?
https://github.com/grondo/edac-utils

4) How is RAS reporting integrated into EDAC for error collection and
reporting?

5) Has there been a patch to prevent EDAC sysfs API from reporting
bogus values?
See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html

- The EDAC sysfs API will still report bogus values. So, userspace
tools like edac-utils will still use the bogus data;

- Add a new tracepoint-based way to get the binary information about
the errors.

This is the logging I currently see with layerscape EDAC. I need
something explaining these fields.

[ 407.612311] EDAC FSL_DDR MC0: Err Detect Register: 0x80000004
[ 407.618182] EDAC FSL_DDR MC0: Faulty Data bit: 0
[ 407.622793] EDAC FSL_DDR MC0: Expected Data / ECC: 0x40c50901_40c50900 / 0x800000f0
[ 407.630443] EDAC FSL_DDR MC0: Captured Data / ECC: 0x40c50900_40c50901 / 0xf0
[ 407.637571] EDAC FSL_DDR MC0: Err addr: 0x3e0bfff50
[ 407.642440] EDAC FSL_DDR MC0: PFN: 0x003e0bff

This is the level of detail I need:

SYSTEM LOGGING
--------------

If logging for UEs and CEs is enabled, then system logs will contain
information indicating that errors have been detected:

EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
channel 1 "DIMM_B1": amd76x_edac

EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
channel 1 "DIMM_B1": amd76x_edac

The structure of the message is:
    the memory controller           (MC0)
    Error type                      (CE)
    memory page                     (0x283)
    offset in the page              (0xce0)
    the byte granularity            (grain 8)
        or resolution of the error
    the error syndrome              (0xb741)
    memory row                      (row 0)
    memory channel                  (channel 1)
    DIMM label, if set prior        (DIMM B1)
    and then an optional, driver-specific message that may
        have additional information.

Both UEs and CEs with no info will lack all but memory controller, error
type, a notice of "no info" and then an optional, driver-specific error
message.

On Mon, Nov 19, 2018 at 10:48 AM York Sun <york.sun@nxp.com> wrote:
>
> On 11/19/18 8:38 AM, Tracy Smith wrote:
> > Steve, you were correct, there wasn't a device tree entry for the
> > qoriq memory controller in
> > arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi. I added it making it
> > identical to the fsl-ls1046a.dtsi, which should have the same memory
> > controller and entry as the ls1043a. I added this but it didn't make
> > a difference as far as being able to call the probe function. I'm now
> > checking the mpc85xx_edac.c dtsi entry for comparison since York used
> > the mpc85xx as the basis for the layerscape, but there is something
> > else missing preventing the probe function from being called.
> >
> > @York
> > What is your entry for
> > /proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible
>
> EDAC driver doesn't check IFC. Are you debugging EDAC for memory controller?
>
> >
> > @York
> > cat /proc/device-tree/compatible entry is this, is this correct?
> > fsl,ls1043a-rdbfsl,ls1043a
>
> Once again, you are using your modified code on your own board. So it is
> not ls1043ardb. This compatible has nothing to do with EDAC driver.
>
> I cannot help you with ls1043ardb because the real ls1043ardb board
> doesn't support ECC. The closest board I have is ls1046ardb.
>
> >
> > ddr: memory-controller@1080000 {
> >     compatible = "fsl,qoriq-memory-controller";
> >     reg = <0x0 0x1080000 0x0 0x1000>;
> >     interrupts = <0 144 0x4>;
> >     big-endian;
> > };
>
> This is your source code, not your final device tree. Please learn to
> use "fdt" command under U-Boot to dump your device tree before booting
> Linux, or check after Linux is up. For your reference, on my ls1046ardb,
> I have
>
> # cat /proc/device-tree/soc/memory-controller@1080000/compatible
> fsl,qoriq-memory-controller
>
> York

--
Confidentiality notice: This e-mail message, including any
attachments, may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), please
immediately notify the sender and delete this e-mail message.
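For readers trying to relate the FSL_DDR lines in the message above to the page/offset fields of the generic EDAC format, here is a minimal sketch of how the "Err addr" and "PFN" values appear to be related. It assumes 4 KiB pages (a page shift of 12), which is consistent with the sample log, where 0x3e0bfff50 shifted right by 12 gives 0x3e0bff. The function name and the PAGE_SHIFT constant are illustrative and are not taken from the driver.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as suggested by the sample log


def split_error_address(err_addr, page_shift=PAGE_SHIFT):
    """Split a physical error address into (page frame number, offset in page)."""
    pfn = err_addr >> page_shift
    offset = err_addr & ((1 << page_shift) - 1)
    return pfn, offset


if __name__ == "__main__":
    # "Err addr" value taken from the log quoted in the thread.
    pfn, offset = split_error_address(0x3E0BFFF50)
    print(f"PFN: {pfn:#010x}, offset: {offset:#x}")
```

Under this assumption, the PFN line in the log carries the same information as the "page" field in the edac.txt format, and the low 12 bits of the address correspond to the "offset" field.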
* Re: edac driver initialization, interrupt, & debug
  2018-11-21 17:01 ` Tracy Smith
@ 2018-11-21 22:02 ` Tracy Smith
  0 siblings, 0 replies; 10+ messages in thread
From: Tracy Smith @ 2018-11-21 22:02 UTC
To: york.sun
Cc: steve, bp, linux-edac, backports, linux-newbie, util-linux, linux-mmc

Please ignore the first question. I now see the expected EDAC message
in the kernel log:

EDAC MC0: 1 CE fsl_mc_err on mc#0csrow#0channel#0 (csrow:0 channel:0
page:0x5df1f offset:0xe40 grain:8 syndrome:0xe0e0)

1) Is there anything similar to the edac-utils but for ARM instead of
x86, or does sysfs replace the edac-utils, or is there something else
for ARM?

2) What is currently used for collecting and reporting ECC errors for
ARM/EDAC beyond the kernel log and messages?
https://github.com/grondo/edac-utils

3) How is RAS/rasdaemon reporting integrated into EDAC for error
collection and reporting?

4) Has there been a patch to prevent EDAC sysfs API from reporting
bogus values?
See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html
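As a rough illustration of collecting errors from the kernel log alone, the sketch below parses a line shaped like the "EDAC MC0: 1 CE ..." message quoted in the message above. The regular expression is an assumption derived from that single sample line; other kernel versions and drivers may print additional or differently ordered fields, and the dictionary keys are illustrative only. It is not a substitute for edac-utils or rasdaemon.

```python
import re

# Pattern inferred from the one sample line in the thread (an assumption,
# not a documented kernel format guarantee).
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<type>CE|UE) (?P<msg>.*?) on "
    r"(?P<location>\S+) \((?P<details>[^)]*)\)"
)


def parse_edac_line(line):
    """Return a dict of fields from one EDAC log line, or None if it does not match."""
    match = EDAC_LINE.search(line)
    if match is None:
        return None
    event = {
        "mc": int(match.group("mc")),
        "count": int(match.group("count")),
        "type": match.group("type"),
        "msg": match.group("msg"),
        "location": match.group("location"),
    }
    # The parenthesised part is a space-separated list of key:value pairs.
    for token in match.group("details").split():
        key, sep, value = token.partition(":")
        if not sep:
            continue
        try:
            event[key] = int(value, 0)  # accepts both decimal and 0x-prefixed hex
        except ValueError:
            event[key] = value
    return event


if __name__ == "__main__":
    sample = (
        "EDAC MC0: 1 CE fsl_mc_err on mc#0csrow#0channel#0 "
        "(csrow:0 channel:0 page:0x5df1f offset:0xe40 grain:8 syndrome:0xe0e0)"
    )
    print(parse_edac_line(sample))
```

Keeping the key:value pairs generic, rather than hard-coding each field, means the sketch tolerates extra pairs without changes, at the cost of not validating which fields are present.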
* Re: FW: edac driver initialization, interrupt, & debug
  2018-11-17 23:22 ` Tracy Smith
  2018-11-18  1:05 ` Steve Inkpen
@ 2018-11-19 16:24 ` York Sun
  1 sibling, 0 replies; 10+ messages in thread
From: York Sun @ 2018-11-19 16:24 UTC
To: Tracy Smith, bp
Cc: linux-edac, backports, linux-newbie, util-linux, linux-mmc

On 11/17/18 3:22 PM, Tracy Smith wrote:
> Thank you Boris for the information. It is helpful.
>
>>> 2. The default EDAC_OPSTATE_INT in fsl_ddr_mc_init() and the
>>> platform_driver_register() is successful. But I don’t see any printk()
>>> messages in fsl_mc_err_probe() within fsl_ddr_edac.c. No errors appear
>>> in any /var/log/*.
>
>> Yeah, see if it even gets called at all:
>
> I did a grep on /var/log/* and don't see any printk's from
> fsl_mc_err_probe(). So, it's not being called.
>
> 1) What would cause the probe function not to be called?
>
> 2) Were changes made in how .probe functions were called between
> different kernel releases of the edac?
>
> 3) How should I go about root causing the reason for the .probe to
> fail since I may have to backport any changes made?
>
> 4) Possibly a patch exists for .probe changes after 4.1.35-rt41?
>
> static struct platform_driver fsl_ddr_mc_err_driver = {
>
>     .probe = fsl_mc_err_probe,
>     .remove = fsl_mc_err_remove,
>     .driver = {
>         .name = "fsl_ddr_mc_err",
>         .of_match_table = fsl_ddr_mc_err_of_match,
>     },
> };
>
> int fsl_mc_err_probe(struct platform_device *op)
>
> {
>     struct mem_ctl_info *mci;
>     struct edac_mc_layer layers[2];
>     struct fsl_mc_pdata *pdata;
>     struct resource r;
>     u32 sdram_ctl;
>     int res;
>
>     pr_err("%s: entry\n", __func__);
>     printk("entered fsl_mc_err_probe!\n");
>
> Any assistance greatly appreciated.
>
>
> On Sat, Nov 17, 2018 at 8:05 AM Borislav Petkov <bp@alien8.de> wrote:
>>
>> + York.
>>
>> On Fri, Nov 16, 2018 at 11:07:50AM -0600, Tracy Smith wrote:
>>> I’m attempting to insmod/modprobe the layerscape_edac_mod.ko driver.
>>> It seems the driver enters layerscape_edac.c fsl_ddr_mc_init() and
>>> completes successfully. But there is no EDAC boot messages and no
>>> /proc/interrupts entry for the EDAC. I’m backporting the EDAC from the
>>> LSDK to the SDK 2.0.

Tracy,

You said you were backporting to an older release. Please double check
your device tree _after_ Linux reaches prompt. You can find it at
/proc/device-tree. You can also check the device tree in U-Boot using
bootm command step-by-step, after the fixup.

As long as you have correct "compatible", the probe function should be
called. It may quit if ECC is not detected, or not register interrupt if
not found in device tree.

York
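York's suggestion to check the live device tree can be sketched as a small userspace script. The example below walks /proc/device-tree and lists nodes whose "compatible" property contains a given string. A "compatible" property is a list of NUL-terminated strings, which is why `cat` prints several entries run together (as in "fsl,ls1043a-rdbfsl,ls1043a"). The function name is hypothetical, and the sketch only confirms that a node is present; it cannot tell whether the probe later bails out because ECC is not enabled.

```python
import os


def find_compatible_nodes(compatible, root="/proc/device-tree"):
    """Return sorted paths of device tree nodes listing `compatible` in their compatible property."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "compatible" not in filenames:
            continue
        with open(os.path.join(dirpath, "compatible"), "rb") as prop:
            raw = prop.read()
        # The property is a sequence of NUL-terminated strings.
        strings = [s.decode("ascii", "replace") for s in raw.split(b"\0") if s]
        if compatible in strings:
            matches.append(dirpath)
    return sorted(matches)


if __name__ == "__main__":
    # Compatible string taken from the thread; run this on the target board.
    for node in find_compatible_nodes("fsl,qoriq-memory-controller"):
        print(node)
```

If the script prints nothing on the target, the node is missing from the final device tree that Linux booted with, regardless of what the .dtsi source contains.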
* Re: FW: edac driver initialization, interrupt, & debug
  2018-11-17 14:05 ` Borislav Petkov
  2018-11-17 23:22 ` Tracy Smith
@ 2018-11-19 15:55 ` York Sun
  1 sibling, 0 replies; 10+ messages in thread
From: York Sun @ 2018-11-19 15:55 UTC
To: Borislav Petkov, Tracy Smith
Cc: linux-edac, backports, linux-newbie, util-linux, linux-mmc

On 11/17/18 6:05 AM, Borislav Petkov wrote:
>
> Btw, looking at it, York, this whole injection functionality needs to
> be behind CONFIG_EDAC_DEBUG because a production driver shouldn't have
> injection capability.

I can make the change.

York