From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: unlisted-recipients:; (no To-header on input)
Cc: linuxarm@huawei.com, mauro.chehab@huawei.com,
	Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	Andy Gross <agross@kernel.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>,
	Hans Verkuil <hans.verkuil@cisco.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Stanimir Varbanov <stanimir.varbanov@linaro.org>,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org
Subject: [PATCH 03/25] media: venus: Rework error fail recover logic
Date: Wed,  5 May 2021 11:41:53 +0200
Message-ID: <419e346f01af5423485202d624fc144756bd2b11.1620207353.git.mchehab+huawei@kernel.org>
In-Reply-To: <cover.1620207353.git.mchehab+huawei@kernel.org>

The Venus code has a sort of watchdog that attempts to recover
from IP errors, implemented as a delayed work job, which
calls venus_sys_error_handler().

Right now, it has several issues:

1. It assumes that PM runtime resume never fails

2. It internally runs two while() loops that also assume that
   PM runtime will never fail to go idle:

	while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
		msleep(10);

...

	while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
		usleep_range(1000, 1500);

3. It merges all return codes with a bitwise OR and reports the result
   to the user, so the reported value may match none of the real errors

4. If the hardware never recovers, it reschedules itself every 10 ms,
   flooding the syslog with 2 messages per run (so, up to 200 messages
   per second).
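
As a userspace illustration of issue 3 (a minimal sketch, not driver
code; merged_or() and merge_is_lossy() are hypothetical names), OR-ing
two negative error codes can yield a value that matches neither of them:

```c
#include <assert.h>

/* Hypothetical helper: merges return codes the way "ret |= f()" does. */
int merged_or(int a, int b)
{
	return a | b;
}

/*
 * Hypothetical self-check: returns 1 when the merged value equals
 * neither input, i.e. the original error codes can no longer be
 * told apart. Only meaningful for negative errno-style codes.
 */
int merge_is_lossy(int a, int b)
{
	int ret;

	assert(a < 0 && b < 0);
	ret = merged_or(a, b);

	return ret != a && ret != b;
}
```

With -22 and -12 (the usual Linux values of -EINVAL and -ENOMEM),
merged_or() returns -2, which would read as -ENOENT even though no
step returned it.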

Rework the code to prevent that by:

1. checking the return code from PM runtime resume;
2. bounding the while() loops so that they cannot run forever;
3. recording the first step that failed;
4. using dev_warn_ratelimited() when recovery fails.
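
The loop bound and the "first failure wins" bookkeeping can be sketched
in userspace C as follows. This is an illustration under assumptions,
not the driver code: wait_for_idle(), record_step(), struct
recovery_status and stub_active() are hypothetical names, the sleep
between polls is omitted, and unlike the patch the sketch also reports
whether the device went idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTEMPTS 10	/* same value as RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS */

/*
 * Poll is_active(ctx) at most max_attempts times.
 * Returns true if the device was seen idle, false if the attempts ran
 * out (or max_attempts is 0, in which case nothing is polled).
 * A real driver would sleep between polls.
 */
bool wait_for_idle(bool (*is_active)(void *ctx), void *ctx, int max_attempts)
{
	int i;

	assert(is_active != NULL);

	for (i = 0; i < max_attempts; i++) {
		if (!is_active(ctx))
			return true;
	}
	return false;
}

/* Remembers only the first failing step. */
struct recovery_status {
	const char *first_failure;	/* NULL while every step succeeded */
};

void record_step(struct recovery_status *st, int ret, const char *what)
{
	if (ret && !st->first_failure)
		st->first_failure = what;
}

/* Stub for experiments: reports "active" until the counter hits zero. */
bool stub_active(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

A stub that stays active for 3 polls is seen idle on the 4th one, while
a stub that stays active for 100 polls makes wait_for_idle() give up
after MAX_ATTEMPTS polls instead of spinning forever.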

Fixes: af2c3834c8ca ("[media] media: venus: adding core part and helper functions")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/media/platform/qcom/venus/core.c | 59 +++++++++++++++++++-----
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 54bac7ec14c5..4d0482743c0a 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -78,22 +78,32 @@ static const struct hfi_core_ops venus_core_ops = {
 	.event_notify = venus_event_notify,
 };
 
+#define RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS 10
+
 static void venus_sys_error_handler(struct work_struct *work)
 {
 	struct venus_core *core =
 			container_of(work, struct venus_core, work.work);
-	int ret = 0;
+	int ret, i, max_attempts = RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS;
+	bool failed = false;
+	const char *err_msg = "";
 
-	pm_runtime_get_sync(core->dev);
+	ret = pm_runtime_get_sync(core->dev);
+	if (ret < 0) {
+		err_msg = "resume runtime PM";
+		max_attempts = 0;
+		failed = true;
+	}
 
 	hfi_core_deinit(core, true);
 
-	dev_warn(core->dev, "system error has occurred, starting recovery!\n");
-
 	mutex_lock(&core->lock);
 
-	while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
+	for (i = 0; i < max_attempts; i++) {
+		if (!pm_runtime_active(core->dev_dec) && !pm_runtime_active(core->dev_enc))
+			break;
 		msleep(10);
+	}
 
 	venus_shutdown(core);
 
@@ -101,31 +111,56 @@ static void venus_sys_error_handler(struct work_struct *work)
 
 	pm_runtime_put_sync(core->dev);
 
-	while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
+	for (i = 0; i < max_attempts; i++) {
+		if (!core->pmdomains[0] || !pm_runtime_active(core->pmdomains[0]))
+			break;
 		usleep_range(1000, 1500);
+	}
 
 	hfi_reinit(core);
 
-	pm_runtime_get_sync(core->dev);
+	ret = pm_runtime_get_sync(core->dev);
+	if (ret < 0) {
+		err_msg = "resume runtime PM";
+		max_attempts = 0;
+		failed = true;
+	}
 
-	ret |= venus_boot(core);
-	ret |= hfi_core_resume(core, true);
+	ret = venus_boot(core);
+	if (ret && !failed) {
+		err_msg = "boot Venus";
+		failed = true;
+	}
+
+	ret = hfi_core_resume(core, true);
+	if (ret && !failed) {
+		err_msg = "resume HFI";
+		failed = true;
+	}
 
 	enable_irq(core->irq);
 
 	mutex_unlock(&core->lock);
 
-	ret |= hfi_core_init(core);
+	ret = hfi_core_init(core);
+	if (ret && !failed) {
+		err_msg = "init HFI";
+		failed = true;
+	}
 
 	pm_runtime_put_sync(core->dev);
 
-	if (ret) {
+	if (failed) {
 		disable_irq_nosync(core->irq);
-		dev_warn(core->dev, "recovery failed (%d)\n", ret);
+		dev_warn_ratelimited(core->dev,
+				     "System error has occurred, recovery failed to %s\n",
+				     err_msg);
 		schedule_delayed_work(&core->work, msecs_to_jiffies(10));
 		return;
 	}
 
+	dev_warn(core->dev, "system error has occurred (recovered)\n");
+
 	mutex_lock(&core->lock);
 	core->sys_error = false;
 	mutex_unlock(&core->lock);
-- 
2.30.2

