From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> To: unlisted-recipients:; (no To-header on input) Cc: linuxarm@huawei.com, mauro.chehab@huawei.com, Mauro Carvalho Chehab <mchehab+huawei@kernel.org>, Andy Gross <agross@kernel.org>, Bjorn Andersson <bjorn.andersson@linaro.org>, Hans Verkuil <hans.verkuil@cisco.com>, Mauro Carvalho Chehab <mchehab@kernel.org>, Stanimir Varbanov <stanimir.varbanov@linaro.org>, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org Subject: [PATCH 03/25] media: venus: Rework error fail recover logic Date: Wed, 5 May 2021 11:41:53 +0200 [thread overview] Message-ID: <419e346f01af5423485202d624fc144756bd2b11.1620207353.git.mchehab+huawei@kernel.org> (raw) In-Reply-To: <cover.1620207353.git.mchehab+huawei@kernel.org> The Venus code has a sort of watchdog that attempts to recover from IP errors, implemented as a delayed work job, which calls venus_sys_error_handler(). Right now, it has several issues: 1. It assumes that PM runtime resume never fails 2. It internally runs two while() loops that also assume that PM runtime will never fail to go idle: while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc)) msleep(10); ... while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0])) usleep_range(1000, 1500); 3. It uses an OR to merge all return codes and then report to the user 4. If the hardware never recovers, it keeps running on every 10ms, flooding the syslog with 2 messages (so, up to 200 messages per second). Rework the code, in order to prevent that, by: 1. check the return code from PM runtime resume; 2. don't let the while() loops run forever; 3. store the failed event; 4. use warn ratelimited when it fails to recover. Fixes: af2c3834c8ca ("[media] media: venus: adding core part and helper functions") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> --- drivers/media/platform/qcom/venus/core.c | 59 +++++++++++++++++++----- 1 file changed, 47 insertions(+), 12 deletions(-) diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c index 54bac7ec14c5..4d0482743c0a 100644 --- a/drivers/media/platform/qcom/venus/core.c +++ b/drivers/media/platform/qcom/venus/core.c @@ -78,22 +78,32 @@ static const struct hfi_core_ops venus_core_ops = { .event_notify = venus_event_notify, }; +#define RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS 10 + static void venus_sys_error_handler(struct work_struct *work) { struct venus_core *core = container_of(work, struct venus_core, work.work); - int ret = 0; + int ret, i, max_attempts = RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS; + bool failed = false; + const char *err_msg = ""; - pm_runtime_get_sync(core->dev); + ret = pm_runtime_get_sync(core->dev); + if (ret < 0) { + err_msg = "resume runtime PM\n"; + max_attempts = 0; + failed = true; + } hfi_core_deinit(core, true); - dev_warn(core->dev, "system error has occurred, starting recovery!\n"); - mutex_lock(&core->lock); - while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc)) + for (i = 0; i < max_attempts; i++) { + if (!pm_runtime_active(core->dev_dec) && !pm_runtime_active(core->dev_enc)) + break; msleep(10); + } venus_shutdown(core); @@ -101,31 +111,56 @@ static void venus_sys_error_handler(struct work_struct *work) pm_runtime_put_sync(core->dev); - while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0])) + for (i = 0; i < max_attempts; i++) { + if (!core->pmdomains[0] || !pm_runtime_active(core->pmdomains[0])) + break; usleep_range(1000, 1500); + } hfi_reinit(core); - pm_runtime_get_sync(core->dev); + ret = pm_runtime_get_sync(core->dev); + if (ret < 0) { + err_msg = "resume runtime PM\n"; + max_attempts = 0; + failed = true; + } - ret |= venus_boot(core); - ret |= hfi_core_resume(core, true); + ret = venus_boot(core); + if (ret && !failed) { + err_msg = "boot Venus\n"; + failed = true; + } + + ret = hfi_core_resume(core, true); + if (ret && !failed) { + err_msg = "resume HFI\n"; + failed = true; + } enable_irq(core->irq); mutex_unlock(&core->lock); - ret |= hfi_core_init(core); + ret = hfi_core_init(core); + if (ret && !failed) { + err_msg = "init HFI\n"; + failed = true; + } pm_runtime_put_sync(core->dev); - if (ret) { + if (failed) { disable_irq_nosync(core->irq); - dev_warn(core->dev, "recovery failed (%d)\n", ret); + dev_warn_ratelimited(core->dev, + "System error has occurred, recovery failed to %s\n", + err_msg); schedule_delayed_work(&core->work, msecs_to_jiffies(10)); return; } + dev_warn(core->dev, "system error has occurred (recovered)\n"); + mutex_lock(&core->lock); core->sys_error = false; mutex_unlock(&core->lock); -- 2.30.2
next prev parent reply other threads:[~2021-05-05 9:42 UTC|newest] Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-05 9:41 [PATCH 00/25] Fix some PM runtime issues at the media subsystem Mauro Carvalho Chehab 2021-05-05 9:41 ` Mauro Carvalho Chehab [this message] 2021-05-05 11:05 ` [PATCH 03/25] media: venus: Rework error fail recover logic Jonathan Cameron 2021-05-06 15:11 ` [PATCH 00/25] Fix some PM runtime issues at the media subsystem Mauro Carvalho Chehab
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=419e346f01af5423485202d624fc144756bd2b11.1620207353.git.mchehab+huawei@kernel.org \ --to=mchehab+huawei@kernel.org \ --cc=agross@kernel.org \ --cc=bjorn.andersson@linaro.org \ --cc=hans.verkuil@cisco.com \ --cc=linux-arm-msm@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-media@vger.kernel.org \ --cc=linuxarm@huawei.com \ --cc=mauro.chehab@huawei.com \ --cc=mchehab@kernel.org \ --cc=stanimir.varbanov@linaro.org \ --subject='Re: [PATCH 03/25] media: venus: Rework error fail recover logic' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).