From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: unlisted-recipients:; (no To-header on input)
Cc: linuxarm@huawei.com, mauro.chehab@huawei.com,
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
Andy Gross <agross@kernel.org>,
Bjorn Andersson <bjorn.andersson@linaro.org>,
Hans Verkuil <hans.verkuil@cisco.com>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Stanimir Varbanov <stanimir.varbanov@linaro.org>,
linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-media@vger.kernel.org
Subject: [PATCH 03/25] media: venus: Rework error fail recover logic
Date: Wed, 5 May 2021 11:41:53 +0200 [thread overview]
Message-ID: <419e346f01af5423485202d624fc144756bd2b11.1620207353.git.mchehab+huawei@kernel.org> (raw)
In-Reply-To: <cover.1620207353.git.mchehab+huawei@kernel.org>
The Venus code has a sort of watchdog that attempts to recover
from IP errors, implemented as a delayed work job, which
calls venus_sys_error_handler().
Right now, it has several issues:
1. It assumes that PM runtime resume never fails
2. It internally runs two while() loops that also assume that
PM runtime will never fail to go idle:
while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
msleep(10);
...
while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
usleep_range(1000, 1500);
3. It uses an OR to merge all return codes and then report to the user
4. If the hardware never recovers, it keeps running on every 10ms,
flooding the syslog with 2 messages (so, up to 200 messages
per second).
Rework the code, in order to prevent that, by:
1. check the return code from PM runtime resume;
2. don't let the while() loops run forever;
3. store the failed event;
4. use warn ratelimited when it fails to recover.
Fixes: af2c3834c8ca ("[media] media: venus: adding core part and helper functions")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
drivers/media/platform/qcom/venus/core.c | 59 +++++++++++++++++++-----
1 file changed, 47 insertions(+), 12 deletions(-)
diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 54bac7ec14c5..4d0482743c0a 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -78,22 +78,32 @@ static const struct hfi_core_ops venus_core_ops = {
.event_notify = venus_event_notify,
};
+#define RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS 10
+
static void venus_sys_error_handler(struct work_struct *work)
{
struct venus_core *core =
container_of(work, struct venus_core, work.work);
- int ret = 0;
+ int ret, i, max_attempts = RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS;
+ bool failed = false;
+ const char *err_msg = "";
- pm_runtime_get_sync(core->dev);
+ ret = pm_runtime_get_sync(core->dev);
+ if (ret < 0) {
+ err_msg = "resume runtime PM\n";
+ max_attempts = 0;
+ failed = true;
+ }
hfi_core_deinit(core, true);
- dev_warn(core->dev, "system error has occurred, starting recovery!\n");
-
mutex_lock(&core->lock);
- while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
+ for (i = 0; i < max_attempts; i++) {
+ if (!pm_runtime_active(core->dev_dec) && !pm_runtime_active(core->dev_enc))
+ break;
msleep(10);
+ }
venus_shutdown(core);
@@ -101,31 +111,56 @@ static void venus_sys_error_handler(struct work_struct *work)
pm_runtime_put_sync(core->dev);
- while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
+ for (i = 0; i < max_attempts; i++) {
+ if (!core->pmdomains[0] || !pm_runtime_active(core->pmdomains[0]))
+ break;
usleep_range(1000, 1500);
+ }
hfi_reinit(core);
- pm_runtime_get_sync(core->dev);
+ ret = pm_runtime_get_sync(core->dev);
+ if (ret < 0) {
+ err_msg = "resume runtime PM\n";
+ max_attempts = 0;
+ failed = true;
+ }
- ret |= venus_boot(core);
- ret |= hfi_core_resume(core, true);
+ ret = venus_boot(core);
+ if (ret && !failed) {
+ err_msg = "boot Venus\n";
+ failed = true;
+ }
+
+ ret = hfi_core_resume(core, true);
+ if (ret && !failed) {
+ err_msg = "resume HFI\n";
+ failed = true;
+ }
enable_irq(core->irq);
mutex_unlock(&core->lock);
- ret |= hfi_core_init(core);
+ ret = hfi_core_init(core);
+ if (ret && !failed) {
+ err_msg = "init HFI\n";
+ failed = true;
+ }
pm_runtime_put_sync(core->dev);
- if (ret) {
+ if (failed) {
disable_irq_nosync(core->irq);
- dev_warn(core->dev, "recovery failed (%d)\n", ret);
+ dev_warn_ratelimited(core->dev,
+ "System error has occurred, recovery failed to %s\n",
+ err_msg);
schedule_delayed_work(&core->work, msecs_to_jiffies(10));
return;
}
+ dev_warn(core->dev, "system error has occurred (recovered)\n");
+
mutex_lock(&core->lock);
core->sys_error = false;
mutex_unlock(&core->lock);
--
2.30.2
next prev parent reply other threads:[~2021-05-05 9:42 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-05 9:41 [PATCH 00/25] Fix some PM runtime issues at the media subsystem Mauro Carvalho Chehab
2021-05-05 9:41 ` Mauro Carvalho Chehab [this message]
2021-05-05 11:05 ` [PATCH 03/25] media: venus: Rework error fail recover logic Jonathan Cameron
2021-05-06 15:11 ` [PATCH 00/25] Fix some PM runtime issues at the media subsystem Mauro Carvalho Chehab
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=419e346f01af5423485202d624fc144756bd2b11.1620207353.git.mchehab+huawei@kernel.org \
--to=mchehab+huawei@kernel.org \
--cc=agross@kernel.org \
--cc=bjorn.andersson@linaro.org \
--cc=hans.verkuil@cisco.com \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-media@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=mauro.chehab@huawei.com \
--cc=mchehab@kernel.org \
--cc=stanimir.varbanov@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).