linux-pm.vger.kernel.org archive mirror
* [RFC PATCH 0/2] clk: Avoid potential deadlock when disabling unused clocks
@ 2022-09-22 15:43 Douglas Anderson
  2022-09-22 15:43 ` [RFC PATCH 1/2] PM: runtime: Add pm_runtime_try_put_sync() and pm_runtime_try_get_sync() Douglas Anderson
  2022-09-22 15:43 ` [RFC PATCH 2/2] clk: core: Avoid potential deadlock when disabling unused clocks Douglas Anderson
  0 siblings, 2 replies; 5+ messages in thread
From: Douglas Anderson @ 2022-09-22 15:43 UTC (permalink / raw)
  To: rafael, sboyd
  Cc: pavel, linux-pm, len.brown, linux-clk, gregkh, Douglas Anderson,
	Michael Turquette, linux-kernel

These two patches together fix a deadlock that I seem to be hitting
every time with my particular setup (a sc7280-herobrine board running
the downstream chromeos-5.15 kernel).

FWIW:
* Others tried with a very similar setup to mine and didn't
  reproduce the hang, which makes sense after the analysis in the
  commit message of the clock patch. To hit this you just need to be
  lucky.
* I bisected this in our tree and landed on a cherry-pick of commit
  1b771839de05 ("clk: qcom: gdsc: enable optional power domain
  support") as the first one where I saw the hang (and the "fix" of
  that commit upstream didn't change things). Nothing about that patch
  was directly touching the code paths that my analysis showed so I
  can only assume it changed the timing in a way to make this happen
  reliably for me.

While the solution proposed in this patch series doesn't seem too
terrible to me, it's still not exactly elegant. This is why I'm
marking it as an RFC. If someone wants to send out patches that solve
this in a better way then I'd be more than happy.

NOTE also that at the same time I started getting the deadlock I also
started getting a lockdep warning about similar locks.
Unfortunately, the lockdep warning seems to be unrelated: this fix
handles the deadlock but the warning still shows up, and I can also
fix the lockdep warning without fixing the deadlock. I'll send a
separate patch for the lockdep issue.


Douglas Anderson (2):
  PM: runtime: Add pm_runtime_try_put_sync() and
    pm_runtime_try_get_sync()
  clk: core: Avoid potential deadlock when disabling unused clocks

 drivers/base/power/runtime.c |   7 +-
 drivers/clk/clk.c            | 137 ++++++++++++++++++++++++++++-------
 include/linux/pm_runtime.h   |  28 +++++++
 3 files changed, 143 insertions(+), 29 deletions(-)

-- 
2.37.3.968.ga6b4b080e4-goog



* [RFC PATCH 1/2] PM: runtime: Add pm_runtime_try_put_sync() and pm_runtime_try_get_sync()
  2022-09-22 15:43 [RFC PATCH 0/2] clk: Avoid potential deadlock when disabling unused clocks Douglas Anderson
@ 2022-09-22 15:43 ` Douglas Anderson
  2022-09-22 17:37   ` Rafael J. Wysocki
  2022-09-22 15:43 ` [RFC PATCH 2/2] clk: core: Avoid potential deadlock when disabling unused clocks Douglas Anderson
  1 sibling, 1 reply; 5+ messages in thread
From: Douglas Anderson @ 2022-09-22 15:43 UTC (permalink / raw)
  To: rafael, sboyd
  Cc: pavel, linux-pm, len.brown, linux-clk, gregkh, Douglas Anderson,
	linux-kernel

In some cases, a caller may wish to synchronously get or put the PM
Runtime state of a device but the caller may also be holding a
resource that the runtime suspend or runtime resume of the device
needs. Obviously this can lead to deadlock.

A case in point is the clock framework. While running
clk_disable_unused() the clock framework holds the global clock
"prepare" lock. The clock framework then goes through and does PM
Runtime actions. It should be no surprise to anyone that some devices
need to prepare or unprepare clocks as part of their PM Runtime
actions. Things generally work OK because of the "recursive" nature of
the global clock "prepare" lock, but if we get unlucky and the PM
Runtime action is happening in another task then we can end up
deadlocking.

Let's add a "try" version of the synchronous PM Runtime routines.
This version will return -EINPROGRESS rather than waiting. To
implement this we'll add a new flag: RPM_TRY.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/base/power/runtime.c |  7 +++++--
 include/linux/pm_runtime.h   | 28 ++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 997be3ac20a7..67cc6a620b12 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -609,7 +609,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 	if (dev->power.runtime_status == RPM_SUSPENDING) {
 		DEFINE_WAIT(wait);
 
-		if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
+		if (rpmflags & (RPM_ASYNC | RPM_NOWAIT | RPM_TRY)) {
 			retval = -EINPROGRESS;
 			goto out;
 		}
@@ -791,7 +791,10 @@ static int rpm_resume(struct device *dev, int rpmflags)
 	    || dev->power.runtime_status == RPM_SUSPENDING) {
 		DEFINE_WAIT(wait);
 
-		if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
+		if (rpmflags & RPM_TRY) {
+			retval = -EINPROGRESS;
+			goto out;
+		} else if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
 			if (dev->power.runtime_status == RPM_SUSPENDING)
 				dev->power.deferred_resume = true;
 			else
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 0a41b2dcccad..c68baa63f0e7 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -21,6 +21,8 @@
 #define RPM_GET_PUT		0x04	/* Increment/decrement the
 					    usage_count */
 #define RPM_AUTO		0x08	/* Use autosuspend_delay */
+#define RPM_TRY			0x10	/* Try to be synchronous but fail
+					    with an error if we can't. */
 
 /*
  * Use this for defining a set of PM operations to be used in all situations
@@ -425,6 +427,19 @@ static inline int pm_runtime_get_sync(struct device *dev)
 	return __pm_runtime_resume(dev, RPM_GET_PUT);
 }
 
+/**
+ * pm_runtime_try_get_sync - Like pm_runtime_get_sync() but err if blocking
+ * @dev: Target device.
+ *
+ * This function works just like pm_runtime_get_sync() except that if the
+ * device in question is currently in the process of suspending or resuming
+ * that it will return with -EINPROGRESS instead of blocking.
+ */
+static inline int pm_runtime_try_get_sync(struct device *dev)
+{
+	return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_TRY);
+}
+
 /**
  * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
  * @dev: Target device.
@@ -489,6 +504,19 @@ static inline int pm_runtime_put_sync(struct device *dev)
 	return __pm_runtime_idle(dev, RPM_GET_PUT);
 }
 
+/**
+ * pm_runtime_try_put_sync - Like pm_runtime_put_sync() but err if blocking
+ * @dev: Target device.
+ *
+ * This function works just like pm_runtime_put_sync() except that if the
+ * device in question is currently in the process of suspending that it will
+ * return with -EINPROGRESS instead of blocking.
+ */
+static inline int pm_runtime_try_put_sync(struct device *dev)
+{
+	return __pm_runtime_idle(dev, RPM_GET_PUT | RPM_TRY);
+}
+
 /**
  * pm_runtime_put_sync_suspend - Drop device usage counter and suspend if 0.
  * @dev: Target device.
-- 
2.37.3.968.ga6b4b080e4-goog



* [RFC PATCH 2/2] clk: core: Avoid potential deadlock when disabling unused clocks
  2022-09-22 15:43 [RFC PATCH 0/2] clk: Avoid potential deadlock when disabling unused clocks Douglas Anderson
  2022-09-22 15:43 ` [RFC PATCH 1/2] PM: runtime: Add pm_runtime_try_put_sync() and pm_runtime_try_get_sync() Douglas Anderson
@ 2022-09-22 15:43 ` Douglas Anderson
  1 sibling, 0 replies; 5+ messages in thread
From: Douglas Anderson @ 2022-09-22 15:43 UTC (permalink / raw)
  To: rafael, sboyd
  Cc: pavel, linux-pm, len.brown, linux-clk, gregkh, Douglas Anderson,
	Michael Turquette, linux-kernel

While booting up my system, I seem to have hit the lucky jackpot and
my system was consistently deadlocking from an issue that seems to
have been possible for a long time. Analysis via kgdb made it obvious
what was happening.

The quick summary is here (gory details below):
* Task A:
  - running clk_disable_unused() which holds the prepare lock.
  - doing a synchronous runtime resume on a device; blocked waiting
    for the device which is marked as midway through suspending.
* Task B:
  - midway through suspending the same device on a work thread.
  - trying to unprepare a clock and grab the prepare lock.

That's a pretty clear deadlock.

Fixing the deadlock isn't entirely straightforward. It should be
pretty clear that a random device's PM Runtime callbacks should be
able to prepare/unprepare clocks, so pretty much the only option would
be to drop the "prepare" lock while disabling unused clocks. That's
not super safe, though.

Instead of rejiggering the locking design of the whole clock
framework, let's use the following observations to fix this:
1. Disabling unused clocks is not terribly urgent. It can be delayed
   for a bit.
2. Disabling unused clocks can be retried. In other words, at any
   point in time we can stop, drop the prepare lock, and start all
   over again from the beginning.
This means that we can "fix" the problem by just backing off, delaying
a bit, and trying again.

At the moment we'll do an exponential type backoff (start at 1 ms and
double each time) and try at most 10 times. These numbers were picked
arbitrarily but seem like they'll work.

Gory details of the analysis follow. This was from the chromeos-5.15
kernel, not pure upstream. The race hits as part of a lucky jackpot of
timings so I had to analyze it on the kernel I was on, but as far as I
know everything about this analysis applies to upstream:

Task A stack crawl (doing the clk_disable_unused()):
  task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
  Call trace:
   schedule()
   rpm_resume()
   __pm_runtime_resume()
   clk_pm_runtime_get()
   clk_disable_unused_subtree()
   clk_disable_unused_subtree()
   clk_disable_unused_subtree()
   clk_disable_unused_subtree()
   clk_disable_unused()
   do_one_initcall()

In kgdb you can see the "dev" being resumed:
(gdb) frame 4
    at .../drivers/base/power/runtime.c:819
819                             schedule();
(gdb) print dev->driver
$2 = (struct device_driver *) 0x... <lpass_aon_cc_sc7280_driver+40>

Task B stack crawl
   schedule()
   schedule_preempt_disabled()
   __mutex_lock_common()
   mutex_lock_nested()
   clk_prepare_lock()
   clk_unprepare()
   pm_clk_suspend()
   pm_generic_runtime_suspend()
   __rpm_callback()
   rpm_callback()
   rpm_suspend()
   pm_runtime_work()
   process_one_work()
   worker_thread()
   kthread()

In kgdb you can see the "dev" being suspended:
(gdb) frame 15
    at .../drivers/base/power/runtime.c:522
522                     retval = __rpm_callback(cb, dev);
(gdb) print dev->driver
$3 = (struct device_driver *) 0x... <lpass_aon_cc_sc7280_driver+40>

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/clk/clk.c | 137 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 110 insertions(+), 27 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index bd0b35cac83e..723e57a9d60d 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -9,6 +9,7 @@
 #include <linux/clk.h>
 #include <linux/clk-provider.h>
 #include <linux/clk/clk-conf.h>
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
@@ -114,6 +115,22 @@ static int clk_pm_runtime_get(struct clk_core *core)
 	return pm_runtime_resume_and_get(core->dev);
 }
 
+static int clk_pm_runtime_try_get(struct clk_core *core)
+{
+	int ret;
+
+	if (!core->rpm_enabled)
+		return 0;
+
+	ret = pm_runtime_try_get_sync(core->dev);
+	if (ret < 0) {
+		pm_runtime_put_noidle(core->dev);
+		return ret;
+	}
+
+	return 0;
+}
+
 static void clk_pm_runtime_put(struct clk_core *core)
 {
 	if (!core->rpm_enabled)
@@ -122,6 +139,14 @@ static void clk_pm_runtime_put(struct clk_core *core)
 	pm_runtime_put_sync(core->dev);
 }
 
+static void clk_pm_runtime_put_async(struct clk_core *core)
+{
+	if (!core->rpm_enabled)
+		return;
+
+	pm_runtime_put(core->dev);
+}
+
 /***           locking             ***/
 static void clk_prepare_lock(void)
 {
@@ -1217,23 +1242,31 @@ static void clk_core_disable_unprepare(struct clk_core *core)
 	clk_core_unprepare_lock(core);
 }
 
-static void __init clk_unprepare_unused_subtree(struct clk_core *core)
+static int __init clk_unprepare_unused_subtree(struct clk_core *core)
 {
 	struct clk_core *child;
+	int ret;
 
 	lockdep_assert_held(&prepare_lock);
 
-	hlist_for_each_entry(child, &core->children, child_node)
-		clk_unprepare_unused_subtree(child);
+	hlist_for_each_entry(child, &core->children, child_node) {
+		ret = clk_unprepare_unused_subtree(child);
+		if (ret)
+			return ret;
+	}
 
 	if (core->prepare_count)
-		return;
+		return 0;
 
 	if (core->flags & CLK_IGNORE_UNUSED)
-		return;
+		return 0;
 
-	if (clk_pm_runtime_get(core))
-		return;
+	/* Backoff if the device is busy; see clk_disable_unused_subtree() */
+	ret = clk_pm_runtime_try_get(core);
+	if (ret == -EINPROGRESS)
+		return -EAGAIN;
+	else if (ret)
+		return ret;
 
 	if (clk_core_is_prepared(core)) {
 		trace_clk_unprepare(core);
@@ -1244,23 +1277,39 @@ static void __init clk_unprepare_unused_subtree(struct clk_core *core)
 		trace_clk_unprepare_complete(core);
 	}
 
-	clk_pm_runtime_put(core);
+	clk_pm_runtime_put_async(core);
+
+	return 0;
 }
 
-static void __init clk_disable_unused_subtree(struct clk_core *core)
+static int __init clk_disable_unused_subtree(struct clk_core *core)
 {
 	struct clk_core *child;
 	unsigned long flags;
+	int ret;
 
 	lockdep_assert_held(&prepare_lock);
 
-	hlist_for_each_entry(child, &core->children, child_node)
-		clk_disable_unused_subtree(child);
+	hlist_for_each_entry(child, &core->children, child_node) {
+		ret = clk_disable_unused_subtree(child);
+		if (ret)
+			return ret;
+	}
 
 	if (core->flags & CLK_OPS_PARENT_ENABLE)
 		clk_core_prepare_enable(core->parent);
 
-	if (clk_pm_runtime_get(core))
+	/*
+	 * If the device is already busy resuming / suspending then we need
+	 * to back off and try the whole subtree disable again. This is because
+	 * the resume / suspend may be happening on another CPU. The resume /
+	 * suspend code on the other CPU might be trying to prepare a clock, but
+	 * we're already holding the lock. That's deadlock unless we stand down.
+	 */
+	ret = clk_pm_runtime_try_get(core);
+	if (ret == -EINPROGRESS)
+		ret = -EAGAIN;
+	if (ret)
 		goto unprepare_out;
 
 	flags = clk_enable_lock();
@@ -1287,10 +1336,12 @@ static void __init clk_disable_unused_subtree(struct clk_core *core)
 
 unlock_out:
 	clk_enable_unlock(flags);
-	clk_pm_runtime_put(core);
+	clk_pm_runtime_put_async(core);
 unprepare_out:
 	if (core->flags & CLK_OPS_PARENT_ENABLE)
 		clk_core_disable_unprepare(core->parent);
+
+	return ret;
 }
 
 static bool clk_ignore_unused __initdata;
@@ -1301,32 +1352,64 @@ static int __init clk_ignore_unused_setup(char *__unused)
 }
 __setup("clk_ignore_unused", clk_ignore_unused_setup);
 
-static int __init clk_disable_unused(void)
+static int __init _clk_disable_unused(void)
 {
 	struct clk_core *core;
+	int ret;
+
+	hlist_for_each_entry(core, &clk_root_list, child_node) {
+		ret = clk_disable_unused_subtree(core);
+		if (ret)
+			return ret;
+	}
+
+	hlist_for_each_entry(core, &clk_orphan_list, child_node) {
+		ret = clk_disable_unused_subtree(core);
+		if (ret)
+			return ret;
+	}
+
+	hlist_for_each_entry(core, &clk_root_list, child_node) {
+		ret = clk_unprepare_unused_subtree(core);
+		if (ret)
+			return ret;
+	}
+
+	hlist_for_each_entry(core, &clk_orphan_list, child_node) {
+		ret = clk_unprepare_unused_subtree(core);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int __init clk_disable_unused(void)
+{
+	int ret;
+	int backoff_ms = 1;
+	int tries_left;
 
 	if (clk_ignore_unused) {
 		pr_warn("clk: Not disabling unused clocks\n");
 		return 0;
 	}
 
-	clk_prepare_lock();
+	for (tries_left = 10; tries_left; tries_left--) {
+		clk_prepare_lock();
+		ret = _clk_disable_unused();
+		clk_prepare_unlock();
 
-	hlist_for_each_entry(core, &clk_root_list, child_node)
-		clk_disable_unused_subtree(core);
-
-	hlist_for_each_entry(core, &clk_orphan_list, child_node)
-		clk_disable_unused_subtree(core);
-
-	hlist_for_each_entry(core, &clk_root_list, child_node)
-		clk_unprepare_unused_subtree(core);
+		if (ret != -EAGAIN)
+			return ret;
 
-	hlist_for_each_entry(core, &clk_orphan_list, child_node)
-		clk_unprepare_unused_subtree(core);
+		msleep(backoff_ms);
+		backoff_ms *= 2;
+	}
 
-	clk_prepare_unlock();
+	pr_warn("clk: Failed to disable unused clocks\n");
 
-	return 0;
+	return ret;
 }
 late_initcall_sync(clk_disable_unused);
 
-- 
2.37.3.968.ga6b4b080e4-goog



* Re: [RFC PATCH 1/2] PM: runtime: Add pm_runtime_try_put_sync() and pm_runtime_try_get_sync()
  2022-09-22 15:43 ` [RFC PATCH 1/2] PM: runtime: Add pm_runtime_try_put_sync() and pm_runtime_try_get_sync() Douglas Anderson
@ 2022-09-22 17:37   ` Rafael J. Wysocki
  2022-09-22 19:46     ` Doug Anderson
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2022-09-22 17:37 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Rafael J. Wysocki, Stephen Boyd, Pavel Machek, Linux PM,
	Len Brown, linux-clk, Greg Kroah-Hartman,
	Linux Kernel Mailing List

On Thu, Sep 22, 2022 at 5:44 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> In some cases, a caller may wish to synchronously get or put the PM
> Runtime state of a device but the caller may also be holding a
> resource that the runtime suspend or runtime resume of the device
> needs. Obviously this can lead to deadlock.
>
> A case in point is the clock framework. While running
> clk_disable_unused() the clock framework holds the global clock
> "prepare" lock. The clock framework then goes through and does PM
> Runtime actions. It should be no surprise to anyone that some devices
> need to prepare or unprepare clocks are part of their PM Runtime
> actions. Things generally work OK because of the "recursive" nature of
> the global clock "prepare" lock, but if we get unlucky and the PM
> Runtime action is happening in another task then we can end up
> deadlocking.
>
> Let's add a "try" version of the synchronous PM Runtime routines.
> This version will return -EINPROGRESS rather than waiting. To
> implement this we'll add a new flag: RPM_TRY.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  drivers/base/power/runtime.c |  7 +++++--
>  include/linux/pm_runtime.h   | 28 ++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 997be3ac20a7..67cc6a620b12 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -609,7 +609,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
>         if (dev->power.runtime_status == RPM_SUSPENDING) {
>                 DEFINE_WAIT(wait);
>
> -               if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
> +               if (rpmflags & (RPM_ASYNC | RPM_NOWAIT | RPM_TRY)) {
>                         retval = -EINPROGRESS;
>                         goto out;
>                 }
> @@ -791,7 +791,10 @@ static int rpm_resume(struct device *dev, int rpmflags)
>             || dev->power.runtime_status == RPM_SUSPENDING) {
>                 DEFINE_WAIT(wait);
>
> -               if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
> +               if (rpmflags & RPM_TRY) {
> +                       retval = -EINPROGRESS;

Returning -EINPROGRESS from here may be misleading, because the device
may not be resuming.

Besides, I'm not sure why a new flag is needed.  What about using
RPM_NOWAIT instead?



> +                       goto out;
> +               } else if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
>                         if (dev->power.runtime_status == RPM_SUSPENDING)
>                                 dev->power.deferred_resume = true;
>                         else
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index 0a41b2dcccad..c68baa63f0e7 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -21,6 +21,8 @@
>  #define RPM_GET_PUT            0x04    /* Increment/decrement the
>                                             usage_count */
>  #define RPM_AUTO               0x08    /* Use autosuspend_delay */
> +#define RPM_TRY                        0x10    /* Try to be synchronous but fail
> +                                           with an error if we can't. */
>
>  /*
>   * Use this for defining a set of PM operations to be used in all situations
> @@ -425,6 +427,19 @@ static inline int pm_runtime_get_sync(struct device *dev)
>         return __pm_runtime_resume(dev, RPM_GET_PUT);
>  }
>
> +/**
> + * pm_runtime_try_get_sync - Like pm_runtime_get_sync() but err if blocking
> + * @dev: Target device.
> + *
> + * This function works just like pm_runtime_get_sync() except that if the
> + * device in question is currently in the process of suspending or resuming
> + * that it will return with -EINPROGRESS instead of blocking.
> + */
> +static inline int pm_runtime_try_get_sync(struct device *dev)
> +{
> +       return __pm_runtime_resume(dev, RPM_GET_PUT | RPM_TRY);
> +}
> +
>  /**
>   * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
>   * @dev: Target device.
> @@ -489,6 +504,19 @@ static inline int pm_runtime_put_sync(struct device *dev)
>         return __pm_runtime_idle(dev, RPM_GET_PUT);
>  }
>
> +/**
> + * pm_runtime_try_put_sync - Like pm_runtime_put_sync() but err if blocking
> + * @dev: Target device.
> + *
> + * This function works just like pm_runtime_put_sync() except that if the
> + * device in question is currently in the process of suspending that it will
> + * return with -EINPROGRESS instead of blocking.
> + */
> +static inline int pm_runtime_try_put_sync(struct device *dev)
> +{
> +       return __pm_runtime_idle(dev, RPM_GET_PUT | RPM_TRY);
> +}
> +
>  /**
>   * pm_runtime_put_sync_suspend - Drop device usage counter and suspend if 0.
>   * @dev: Target device.
> --
> 2.37.3.968.ga6b4b080e4-goog
>


* Re: [RFC PATCH 1/2] PM: runtime: Add pm_runtime_try_put_sync() and pm_runtime_try_get_sync()
  2022-09-22 17:37   ` Rafael J. Wysocki
@ 2022-09-22 19:46     ` Doug Anderson
  0 siblings, 0 replies; 5+ messages in thread
From: Doug Anderson @ 2022-09-22 19:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Boyd, Pavel Machek, Linux PM, Len Brown, linux-clk,
	Greg Kroah-Hartman, Linux Kernel Mailing List

Hi,

On Thu, Sep 22, 2022 at 10:38 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Sep 22, 2022 at 5:44 PM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > In some cases, a caller may wish to synchronously get or put the PM
> > Runtime state of a device but the caller may also be holding a
> > resource that the runtime suspend or runtime resume of the device
> > needs. Obviously this can lead to deadlock.
> >
> > A case in point is the clock framework. While running
> > clk_disable_unused() the clock framework holds the global clock
> > "prepare" lock. The clock framework then goes through and does PM
> > Runtime actions. It should be no surprise to anyone that some devices
> > need to prepare or unprepare clocks are part of their PM Runtime
> > actions. Things generally work OK because of the "recursive" nature of
> > the global clock "prepare" lock, but if we get unlucky and the PM
> > Runtime action is happening in another task then we can end up
> > deadlocking.
> >
> > Let's add a "try" version of the synchronous PM Runtime routines.
> > This version will return -EINPROGRESS rather than waiting. To
> > implement this we'll add a new flag: RPM_TRY.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  drivers/base/power/runtime.c |  7 +++++--
> >  include/linux/pm_runtime.h   | 28 ++++++++++++++++++++++++++++
> >  2 files changed, 33 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 997be3ac20a7..67cc6a620b12 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -609,7 +609,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
> >         if (dev->power.runtime_status == RPM_SUSPENDING) {
> >                 DEFINE_WAIT(wait);
> >
> > -               if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
> > +               if (rpmflags & (RPM_ASYNC | RPM_NOWAIT | RPM_TRY)) {
> >                         retval = -EINPROGRESS;
> >                         goto out;
> >                 }
> > @@ -791,7 +791,10 @@ static int rpm_resume(struct device *dev, int rpmflags)
> >             || dev->power.runtime_status == RPM_SUSPENDING) {
> >                 DEFINE_WAIT(wait);
> >
> > -               if (rpmflags & (RPM_ASYNC | RPM_NOWAIT)) {
> > +               if (rpmflags & RPM_TRY) {
> > +                       retval = -EINPROGRESS;
>
> Returning -EINPROGRESS from here may be misleading, because the device
> may not be resuming.
>
> Besides, I'm not sure why a new flag is needed.  What about using
> RPM_NOWAIT instead?

Yeah, we can use RPM_NOWAIT if we land your patch [1]. I'll spin with
that if folks agree that the overall approach taken in this series
makes sense.

[1] https://lore.kernel.org/r/12079576.O9o76ZdvQC@kreacher

-Doug

