From: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> To: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com>, Andy Gross <agross@kernel.org>, Bjorn Andersson <bjorn.andersson@linaro.org>, Mark Rutland <mark.rutland@arm.com>, Rob Herring <robh+dt@kernel.org>, devicetree@vger.kernel.org, Mauro Carvalho Chehab <mchehab@kernel.org>, Tony Luck <tony.luck@intel.com>, Robert Richter <rrichter@marvell.com>, linux-edac@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org, Stephen Boyd <swboyd@chromium.org>, Evan Green <evgreen@chromium.org>, tsoni@codeaurora.org, psodagud@codeaurora.org Subject: Re: [PATCH 2/2] drivers: edac: Add EDAC support for Kryo CPU caches Date: Mon, 13 Jan 2020 11:14:29 +0530 [thread overview] Message-ID: <585db411bc542bf3f326627b7390e0ca@codeaurora.org> (raw) In-Reply-To: <20191230115030.GA30767@zn.tnic> Hi Boris, Thanks for the review comments. On 2019-12-30 17:20, Borislav Petkov wrote: > On Thu, Dec 05, 2019 at 09:53:18AM +0000, Sai Prakash Ranjan wrote: >> Kryo{3,4}XX CPU cores implement RAS extensions to support >> Error Correcting Code(ECC). Currently all Kryo{3,4}XX CPU >> cores (gold/silver a.k.a big/LITTLE) support ECC via RAS. > > via RAS what? ARM64_RAS_EXTN? > > In any case, this needs James to look at and especially if there's some > ARM-generic functionality in there which should be shared, of course. > Yes it is ARM64_RAS_EXTN and I have been hoping if James can provide the feedback, it has been some time now since I posted this out. >> This adds an interrupt based driver for those CPUs and > > s/This adds/Add/ > Will correct. >> + >> +config EDAC_QCOM_KRYO_POLL >> + depends on EDAC_QCOM_KRYO >> + bool "Poll on Kryo ECC registers" >> + help >> + This option chooses whether or not you want to poll on the Kryo >> ECC >> + registers. When this is enabled, the polling rate can be set as a >> + module parameter. By default, it will call the polling function >> every >> + second. > > Why is this a separate option and why should people use that? > > Can the polling/irq be switched automatically? > No it cannot be switched automatically. It is used in case some SoCs do not support an irq based mechanism for EDAC. But I am contradicting myself because I am telling that atleast one interrupt should be specified in bindings, so it is best if I drop this polling option for now. >> + >> config EDAC_ASPEED >> tristate "Aspeed AST 2500 SoC" >> depends on MACH_ASPEED_G5 >> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile >> index d77200c9680b..29edcfa6ec0e 100644 >> --- a/drivers/edac/Makefile >> +++ b/drivers/edac/Makefile >> @@ -85,5 +85,6 @@ obj-$(CONFIG_EDAC_SYNOPSYS) += synopsys_edac.o >> obj-$(CONFIG_EDAC_XGENE) += xgene_edac.o >> obj-$(CONFIG_EDAC_TI) += ti_edac.o >> obj-$(CONFIG_EDAC_QCOM) += qcom_edac.o >> +obj-$(CONFIG_EDAC_QCOM_KRYO) += qcom_kryo_edac.o > > What is the difference between this new driver and the qcom_edac one? > Can > functionality be shared? > > Should this new one be called simply kryo_edac instead? > qcom_edac driver is for QCOM system cache(last level cache), it should be renamed to qcom_llcc_edac.c. This new driver is for QCOM Kryo CPU core caches(L1,L2,L3). Functionality cannot be shared as these two are different IP blocks and best kept separate. >> + >> +#define DRV_NAME "qcom_kryo_edac" >> + >> +/* >> + * ARM Cortex-A55, Cortex-A75, Cortex-A76 TRM Chapter B3.3 > > Chapter? Where? URL? > I chose this because these TRMs are openly available and if you search for these above terms like "Cortex-A76 TRM Chapter B3.3" in google, then the first search result will be the TRM pdf, otherwise I would have to specify the long URL for the pdf and we do not know how long that URL link will be active. >> + >> +static const struct error_type err_type[] = { >> + { edac_device_handle_ce, "Kryo L1 Corrected Error" }, >> + { edac_device_handle_ue, "Kryo L1 Uncorrected Error" }, >> + { edac_device_handle_ue, "Kryo L1 Deferred Error" }, >> + { edac_device_handle_ce, "Kryo L2 Corrected Error" }, >> + { edac_device_handle_ue, "Kryo L2 Uncorrected Error" }, >> + { edac_device_handle_ue, "Kryo L2 Deferred Error" }, >> + { edac_device_handle_ce, "L3 Corrected Error" }, >> + { edac_device_handle_ue, "L3 Uncorrected Error" }, >> + { edac_device_handle_ue, "L3 Deferred Error" }, >> +}; >> + > > All that is not really needed - you can put the whole error type > detection and dumping in kryo_check_err_type() in nicely readable > switch-case statement. No need for the function pointers and special > structs. > How is this not easily readable? If I put this in kryo_check_err_type, then there will be nested switch which I think is not so great in terms of readability since it will not fit in one screen and is just more lines of code. >> +static struct edac_device_ctl_info __percpu *edac_dev; >> +static struct edac_device_ctl_info *drv_edev_ctl; >> + >> +static const char *get_error_msg(u64 errxstatus) >> +{ >> + const struct error_record *rec; >> + u32 errxstatus_serr; >> + >> + errxstatus_serr = FIELD_GET(KRYO_ERRXSTATUS_SERR, errxstatus); >> + >> + for (rec = serror_record; rec->error_code; rec++) { >> + if (errxstatus_serr == rec->error_code) >> + return rec->error_msg; >> + } >> + >> + return NULL; >> +} >> + >> +static void dump_syndrome_reg(int error_type, int level, >> + u64 errxstatus, u64 errxmisc, >> + struct edac_device_ctl_info *edev_ctl) >> +{ >> + char msg[KRYO_EDAC_MSG_MAX]; >> + const char *error_msg; >> + int cpu; >> + >> + cpu = raw_smp_processor_id(); > > Why raw_? > Because we will be calling smp_processor_id in preemptible context and if we enable CONFIG_DEBUG_PREEMPT, we would get a nice backtrace. [ 3.747468] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 3.755527] caller is qcom_kryo_edac_probe+0x138/0x2b8 [ 3.760819] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S 5.4.0-rc7-next-20191113-00009-g8666855d6a5b-dirty #107 [ 3.772323] Hardware name: Qualcomm Technologies, Inc. SM8150 MTP (DT) [ 3.779030] Call trace: [ 3.781556] dump_backtrace+0x0/0x158 [ 3.785331] show_stack+0x14/0x20 [ 3.788741] dump_stack+0xb0/0xf4 [ 3.792164] debug_smp_processor_id+0xd8/0xe0 [ 3.796639] qcom_kryo_edac_probe+0x138/0x2b8 [ 3.801116] platform_drv_probe+0x50/0xa8 [ 3.805236] really_probe+0x108/0x360 [ 3.808999] driver_probe_device+0x58/0x100 [ 3.813304] device_driver_attach+0x6c/0x78 [ 3.817606] __driver_attach+0xb0/0xf0 [ 3.821459] bus_for_each_dev+0x68/0xc8 [ 3.825407] driver_attach+0x20/0x28 [ 3.829083] bus_add_driver+0x160/0x1f0 [ 3.833030] driver_register+0x60/0x110 [ 3.836976] __platform_driver_register+0x40/0x48 [ 3.841813] qcom_kryo_edac_driver_init+0x18/0x20 [ 3.846645] do_one_initcall+0x58/0x1a0 [ 3.850596] kernel_init_freeable+0x19c/0x240 [ 3.855075] kernel_init+0x10/0x108 [ 3.858665] ret_from_fork+0x10/0x1c >> +static int kryo_l1_l2_setup_irq(struct platform_device *pdev, >> + struct edac_device_ctl_info *edev_ctl) >> +{ >> + int cpu, errirq, faultirq, ret; >> + >> + edac_dev = devm_alloc_percpu(&pdev->dev, *edac_dev); >> + if (!edac_dev) >> + return -ENOMEM; >> + >> + for_each_possible_cpu(cpu) { >> + preempt_disable(); >> + per_cpu(edac_dev, cpu) = edev_ctl; >> + preempt_enable(); >> + } > > That sillyness doesn't belong here, if at all. > Sorry but I do not understand the sillyness here. Could you please explain? > ... > >> +static void kryo_poll_cache_error(struct edac_device_ctl_info >> *edev_ctl) >> +{ >> + if (!edev_ctl) >> + edev_ctl = drv_edev_ctl; > > That's silly. > Actually its not silly. In case, polling is enabled and on PM exit edev_ctl could be NULL. >> + >> + on_each_cpu(kryo_check_l1_l2_ecc, edev_ctl, 1); >> + kryo_check_l3_scu_ecc(edev_ctl); >> +} > > ... > >> +static int qcom_kryo_edac_probe(struct platform_device *pdev) >> +{ >> + struct edac_device_ctl_info *edev_ctl; >> + struct device *dev = &pdev->dev; >> + int ret; >> + >> + qcom_kryo_edac_setup(); > > This function needs to have a return value saying whether it did setup > the hw properly or not and the probe function needs to return here if > not. Ok will add a return check. Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
WARNING: multiple messages have this Message-ID (diff)
From: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> To: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com>, devicetree@vger.kernel.org, Tony Luck <tony.luck@intel.com>, psodagud@codeaurora.org, linux-arm-msm@vger.kernel.org, Stephen Boyd <swboyd@chromium.org>, tsoni@codeaurora.org, linux-kernel@vger.kernel.org, Rob Herring <robh+dt@kernel.org>, Bjorn Andersson <bjorn.andersson@linaro.org>, Andy Gross <agross@kernel.org>, James Morse <james.morse@arm.com>, Evan Green <evgreen@chromium.org>, Robert Richter <rrichter@marvell.com>, Mauro Carvalho Chehab <mchehab@kernel.org>, linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org Subject: Re: [PATCH 2/2] drivers: edac: Add EDAC support for Kryo CPU caches Date: Mon, 13 Jan 2020 11:14:29 +0530 [thread overview] Message-ID: <585db411bc542bf3f326627b7390e0ca@codeaurora.org> (raw) In-Reply-To: <20191230115030.GA30767@zn.tnic> Hi Boris, Thanks for the review comments. On 2019-12-30 17:20, Borislav Petkov wrote: > On Thu, Dec 05, 2019 at 09:53:18AM +0000, Sai Prakash Ranjan wrote: >> Kryo{3,4}XX CPU cores implement RAS extensions to support >> Error Correcting Code(ECC). Currently all Kryo{3,4}XX CPU >> cores (gold/silver a.k.a big/LITTLE) support ECC via RAS. > > via RAS what? ARM64_RAS_EXTN? > > In any case, this needs James to look at and especially if there's some > ARM-generic functionality in there which should be shared, of course. > Yes it is ARM64_RAS_EXTN and I have been hoping if James can provide the feedback, it has been some time now since I posted this out. >> This adds an interrupt based driver for those CPUs and > > s/This adds/Add/ > Will correct. >> + >> +config EDAC_QCOM_KRYO_POLL >> + depends on EDAC_QCOM_KRYO >> + bool "Poll on Kryo ECC registers" >> + help >> + This option chooses whether or not you want to poll on the Kryo >> ECC >> + registers. When this is enabled, the polling rate can be set as a >> + module parameter. By default, it will call the polling function >> every >> + second. > > Why is this a separate option and why should people use that? > > Can the polling/irq be switched automatically? > No it cannot be switched automatically. It is used in case some SoCs do not support an irq based mechanism for EDAC. But I am contradicting myself because I am telling that atleast one interrupt should be specified in bindings, so it is best if I drop this polling option for now. >> + >> config EDAC_ASPEED >> tristate "Aspeed AST 2500 SoC" >> depends on MACH_ASPEED_G5 >> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile >> index d77200c9680b..29edcfa6ec0e 100644 >> --- a/drivers/edac/Makefile >> +++ b/drivers/edac/Makefile >> @@ -85,5 +85,6 @@ obj-$(CONFIG_EDAC_SYNOPSYS) += synopsys_edac.o >> obj-$(CONFIG_EDAC_XGENE) += xgene_edac.o >> obj-$(CONFIG_EDAC_TI) += ti_edac.o >> obj-$(CONFIG_EDAC_QCOM) += qcom_edac.o >> +obj-$(CONFIG_EDAC_QCOM_KRYO) += qcom_kryo_edac.o > > What is the difference between this new driver and the qcom_edac one? > Can > functionality be shared? > > Should this new one be called simply kryo_edac instead? > qcom_edac driver is for QCOM system cache(last level cache), it should be renamed to qcom_llcc_edac.c. This new driver is for QCOM Kryo CPU core caches(L1,L2,L3). Functionality cannot be shared as these two are different IP blocks and best kept separate. >> + >> +#define DRV_NAME "qcom_kryo_edac" >> + >> +/* >> + * ARM Cortex-A55, Cortex-A75, Cortex-A76 TRM Chapter B3.3 > > Chapter? Where? URL? > I chose this because these TRMs are openly available and if you search for these above terms like "Cortex-A76 TRM Chapter B3.3" in google, then the first search result will be the TRM pdf, otherwise I would have to specify the long URL for the pdf and we do not know how long that URL link will be active. >> + >> +static const struct error_type err_type[] = { >> + { edac_device_handle_ce, "Kryo L1 Corrected Error" }, >> + { edac_device_handle_ue, "Kryo L1 Uncorrected Error" }, >> + { edac_device_handle_ue, "Kryo L1 Deferred Error" }, >> + { edac_device_handle_ce, "Kryo L2 Corrected Error" }, >> + { edac_device_handle_ue, "Kryo L2 Uncorrected Error" }, >> + { edac_device_handle_ue, "Kryo L2 Deferred Error" }, >> + { edac_device_handle_ce, "L3 Corrected Error" }, >> + { edac_device_handle_ue, "L3 Uncorrected Error" }, >> + { edac_device_handle_ue, "L3 Deferred Error" }, >> +}; >> + > > All that is not really needed - you can put the whole error type > detection and dumping in kryo_check_err_type() in nicely readable > switch-case statement. No need for the function pointers and special > structs. > How is this not easily readable? If I put this in kryo_check_err_type, then there will be nested switch which I think is not so great in terms of readability since it will not fit in one screen and is just more lines of code. >> +static struct edac_device_ctl_info __percpu *edac_dev; >> +static struct edac_device_ctl_info *drv_edev_ctl; >> + >> +static const char *get_error_msg(u64 errxstatus) >> +{ >> + const struct error_record *rec; >> + u32 errxstatus_serr; >> + >> + errxstatus_serr = FIELD_GET(KRYO_ERRXSTATUS_SERR, errxstatus); >> + >> + for (rec = serror_record; rec->error_code; rec++) { >> + if (errxstatus_serr == rec->error_code) >> + return rec->error_msg; >> + } >> + >> + return NULL; >> +} >> + >> +static void dump_syndrome_reg(int error_type, int level, >> + u64 errxstatus, u64 errxmisc, >> + struct edac_device_ctl_info *edev_ctl) >> +{ >> + char msg[KRYO_EDAC_MSG_MAX]; >> + const char *error_msg; >> + int cpu; >> + >> + cpu = raw_smp_processor_id(); > > Why raw_? > Because we will be calling smp_processor_id in preemptible context and if we enable CONFIG_DEBUG_PREEMPT, we would get a nice backtrace. [ 3.747468] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 3.755527] caller is qcom_kryo_edac_probe+0x138/0x2b8 [ 3.760819] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S 5.4.0-rc7-next-20191113-00009-g8666855d6a5b-dirty #107 [ 3.772323] Hardware name: Qualcomm Technologies, Inc. SM8150 MTP (DT) [ 3.779030] Call trace: [ 3.781556] dump_backtrace+0x0/0x158 [ 3.785331] show_stack+0x14/0x20 [ 3.788741] dump_stack+0xb0/0xf4 [ 3.792164] debug_smp_processor_id+0xd8/0xe0 [ 3.796639] qcom_kryo_edac_probe+0x138/0x2b8 [ 3.801116] platform_drv_probe+0x50/0xa8 [ 3.805236] really_probe+0x108/0x360 [ 3.808999] driver_probe_device+0x58/0x100 [ 3.813304] device_driver_attach+0x6c/0x78 [ 3.817606] __driver_attach+0xb0/0xf0 [ 3.821459] bus_for_each_dev+0x68/0xc8 [ 3.825407] driver_attach+0x20/0x28 [ 3.829083] bus_add_driver+0x160/0x1f0 [ 3.833030] driver_register+0x60/0x110 [ 3.836976] __platform_driver_register+0x40/0x48 [ 3.841813] qcom_kryo_edac_driver_init+0x18/0x20 [ 3.846645] do_one_initcall+0x58/0x1a0 [ 3.850596] kernel_init_freeable+0x19c/0x240 [ 3.855075] kernel_init+0x10/0x108 [ 3.858665] ret_from_fork+0x10/0x1c >> +static int kryo_l1_l2_setup_irq(struct platform_device *pdev, >> + struct edac_device_ctl_info *edev_ctl) >> +{ >> + int cpu, errirq, faultirq, ret; >> + >> + edac_dev = devm_alloc_percpu(&pdev->dev, *edac_dev); >> + if (!edac_dev) >> + return -ENOMEM; >> + >> + for_each_possible_cpu(cpu) { >> + preempt_disable(); >> + per_cpu(edac_dev, cpu) = edev_ctl; >> + preempt_enable(); >> + } > > That sillyness doesn't belong here, if at all. > Sorry but I do not understand the sillyness here. Could you please explain? > ... > >> +static void kryo_poll_cache_error(struct edac_device_ctl_info >> *edev_ctl) >> +{ >> + if (!edev_ctl) >> + edev_ctl = drv_edev_ctl; > > That's silly. > Actually its not silly. In case, polling is enabled and on PM exit edev_ctl could be NULL. >> + >> + on_each_cpu(kryo_check_l1_l2_ecc, edev_ctl, 1); >> + kryo_check_l3_scu_ecc(edev_ctl); >> +} > > ... > >> +static int qcom_kryo_edac_probe(struct platform_device *pdev) >> +{ >> + struct edac_device_ctl_info *edev_ctl; >> + struct device *dev = &pdev->dev; >> + int ret; >> + >> + qcom_kryo_edac_setup(); > > This function needs to have a return value saying whether it did setup > the hw properly or not and the probe function needs to return here if > not. Ok will add a return check. Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-01-13 5:44 UTC|newest] Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <cover.1575529553.git.saiprakash.ranjan@codeaurora.org> 2019-12-05 9:53 ` [PATCH 1/2] dt-bindings: edac: Add DT bindings for Kryo EDAC Sai Prakash Ranjan 2019-12-05 9:53 ` Sai Prakash Ranjan 2019-12-18 23:37 ` Rob Herring 2019-12-18 23:37 ` Rob Herring 2019-12-19 6:50 ` Sai Prakash Ranjan 2019-12-19 6:50 ` Sai Prakash Ranjan 2019-12-19 13:58 ` Rob Herring 2019-12-19 13:58 ` Rob Herring 2019-12-19 14:48 ` Sai Prakash Ranjan 2019-12-19 14:48 ` Sai Prakash Ranjan 2020-01-15 18:48 ` James Morse 2020-01-15 18:48 ` James Morse 2020-01-24 14:21 ` Sai Prakash Ranjan 2020-01-24 14:21 ` Sai Prakash Ranjan 2020-02-26 17:12 ` James Morse 2020-02-26 17:12 ` James Morse 2019-12-05 9:53 ` [PATCH 2/2] drivers: edac: Add EDAC support for Kryo CPU caches Sai Prakash Ranjan 2019-12-05 9:53 ` Sai Prakash Ranjan [not found] ` <0101016ed57a6311-e815485c-4b77-4342-a3de-203673941602-000000@us-west-2.amazonses.com> 2019-12-11 19:32 ` Evan Green 2019-12-11 19:32 ` Evan Green 2019-12-11 22:33 ` Stephen Boyd 2019-12-13 5:31 ` Sai Prakash Ranjan 2019-12-13 5:31 ` Sai Prakash Ranjan [not found] ` <0101016ed57a6559-46c6c649-db28-4945-a11c-7441b8e9ac5b-000000@us-west-2.amazonses.com> 2019-12-30 11:50 ` Borislav Petkov 2019-12-30 11:50 ` Borislav Petkov 2020-01-13 5:44 ` Sai Prakash Ranjan [this message] 2020-01-13 5:44 ` Sai Prakash Ranjan 2020-01-15 18:49 ` James Morse 2020-01-15 18:49 ` James Morse 2020-01-24 14:52 ` Sai Prakash Ranjan 2020-01-24 14:52 ` Sai Prakash Ranjan 2019-12-13 9:05 Sai Prakash Ranjan
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=585db411bc542bf3f326627b7390e0ca@codeaurora.org \ --to=saiprakash.ranjan@codeaurora.org \ --cc=agross@kernel.org \ --cc=bjorn.andersson@linaro.org \ --cc=bp@alien8.de \ --cc=devicetree@vger.kernel.org \ --cc=evgreen@chromium.org \ --cc=james.morse@arm.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-arm-msm@vger.kernel.org \ --cc=linux-edac@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=mark.rutland@arm.com \ --cc=mchehab@kernel.org \ --cc=psodagud@codeaurora.org \ --cc=robh+dt@kernel.org \ --cc=rrichter@marvell.com \ --cc=swboyd@chromium.org \ --cc=tony.luck@intel.com \ --cc=tsoni@codeaurora.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.