linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v2 1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow()
@ 2023-03-29 21:33 Douglas Anderson
  2023-03-29 21:33 ` [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies Douglas Anderson
  2023-04-06 15:03 ` [RFC PATCH v2 1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow() Mark Brown
  0 siblings, 2 replies; 7+ messages in thread
From: Douglas Anderson @ 2023-03-29 21:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Dmitry Osipenko, Lucas Stach, David Collins, Stephen Boyd,
	Douglas Anderson, Liam Girdwood, linux-kernel

When a codepath locks a rdev using ww_mutex_lock_slow() directly then
that codepath is responsible for incrementing the "ref_cnt" and also
setting the "mutex_owner" to "current".

The regulator core consistently got that right for "ref_cnt" but
didn't always get it right for "mutex_owner". Let's fix this.

It's unlikely that this truly matters because the "mutex_owner" is
only needed if we're going to do subsequent locking of the same
rdev. However, even though it's not truly needed, it seems less
surprising if we consistently set "mutex_owner" properly.
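
Concretely, the convention this patch makes consistent is roughly (a
minimal sketch of the bookkeeping, not the exact driver code):

    ww_mutex_lock_slow(&rdev->mutex, ww_ctx);
    rdev->ref_cnt++;                /* what regulator_lock_nested() would have done */
    rdev->mutex_owner = current;    /* ...including recording the owner */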

Fixes: f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- "Consistently set mutex_owner when using ww_mutex_lock_slow()" new for v2.

 drivers/regulator/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 1490eb40c973..9a13240f3084 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -334,6 +334,7 @@ static void regulator_lock_dependent(struct regulator_dev *rdev,
 			ww_mutex_lock_slow(&new_contended_rdev->mutex, ww_ctx);
 			old_contended_rdev = new_contended_rdev;
 			old_contended_rdev->ref_cnt++;
+			old_contended_rdev->mutex_owner = current;
 		}
 
 		err = regulator_lock_recursive(rdev,
@@ -6048,6 +6049,7 @@ static void regulator_summary_lock(struct ww_acquire_ctx *ww_ctx)
 			ww_mutex_lock_slow(&new_contended_rdev->mutex, ww_ctx);
 			old_contended_rdev = new_contended_rdev;
 			old_contended_rdev->ref_cnt++;
+			old_contended_rdev->mutex_owner = current;
 		}
 
 		err = regulator_summary_lock_all(ww_ctx,
-- 
2.40.0.348.gf938b09366-goog



* [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies
  2023-03-29 21:33 [RFC PATCH v2 1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow() Douglas Anderson
@ 2023-03-29 21:33 ` Douglas Anderson
  2023-04-07 21:46   ` Stephen Boyd
  2023-04-06 15:03 ` [RFC PATCH v2 1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow() Mark Brown
  1 sibling, 1 reply; 7+ messages in thread
From: Douglas Anderson @ 2023-03-29 21:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Dmitry Osipenko, Lucas Stach, David Collins, Stephen Boyd,
	Douglas Anderson, David Collins, Liam Girdwood, linux-kernel

An automated bot told me that there was a potential lockdep problem
with regulators. This was on the chromeos-5.15 kernel, but I see
nothing that would be different downstream compared to upstream. The
bot said:
  ============================================
  WARNING: possible recursive locking detected
  5.15.104-lockdep-17461-gc1e499ed6604 #1 Not tainted
  --------------------------------------------
  kworker/u16:4/115 is trying to acquire lock:
  ffffff8083110170 (regulator_ww_class_mutex){+.+.}-{3:3}, at: create_regulator+0x398/0x7ec

  but task is already holding lock:
  ffffff808378e170 (regulator_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x3c/0x7b8

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(regulator_ww_class_mutex);
    lock(regulator_ww_class_mutex);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  4 locks held by kworker/u16:4/115:
   #0: ffffff808006a948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x520/0x1348
   #1: ffffffc00e0a7cc0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x55c/0x1348
   #2: ffffff80828a2260 (&dev->mutex){....}-{3:3}, at: __device_attach_async_helper+0xd0/0x2a4
   #3: ffffff808378e170 (regulator_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x3c/0x7b8

  stack backtrace:
  CPU: 2 PID: 115 Comm: kworker/u16:4 Not tainted 5.15.104-lockdep-17461-gc1e499ed6604 #1 9292e52fa83c0e23762b2b3aa1bacf5787a4d5da
  Hardware name: Google Quackingstick (rev0+) (DT)
  Workqueue: events_unbound async_run_entry_fn
  Call trace:
   dump_backtrace+0x0/0x4ec
   show_stack+0x34/0x50
   dump_stack_lvl+0xdc/0x11c
   dump_stack+0x1c/0x48
   __lock_acquire+0x16d4/0x6c74
   lock_acquire+0x208/0x750
   __mutex_lock_common+0x11c/0x11f8
   ww_mutex_lock+0xc0/0x440
   create_regulator+0x398/0x7ec
   regulator_resolve_supply+0x654/0x7c4
   regulator_register_resolve_supply+0x30/0x120
   class_for_each_device+0x1b8/0x230
   regulator_register+0x17a4/0x1f40
   devm_regulator_register+0x60/0xd0
   reg_fixed_voltage_probe+0x728/0xaec
   platform_probe+0x150/0x1c8
   really_probe+0x274/0xa20
   __driver_probe_device+0x1dc/0x3f4
   driver_probe_device+0x78/0x1c0
   __device_attach_driver+0x1ac/0x2c8
   bus_for_each_drv+0x11c/0x190
   __device_attach_async_helper+0x1e4/0x2a4
   async_run_entry_fn+0xa0/0x3ac
   process_one_work+0x638/0x1348
   worker_thread+0x4a8/0x9c4
   kthread+0x2e4/0x3a0
   ret_from_fork+0x10/0x20

The problem was first reported soon after we made many of the
regulators probe asynchronously, though nothing I've seen suggests the
problem couldn't also have happened without that change.

I haven't personally been able to reproduce the lockdep issue, but the
issue does look somewhat legitimate. Specifically, it looks like in
regulator_resolve_supply() we are holding a "rdev" lock while calling
set_supply() -> create_regulator() which grabs the lock of a
_different_ "rdev" (the one for our supply). This is not necessarily
safe from a lockdep perspective since there is no documented ordering
between these two locks.
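
Roughly, the problematic chain (simplified from the backtrace above,
as the code stands before this patch) is:

    regulator_resolve_supply(rdev)
      regulator_lock(rdev)              /* consumer rdev's ww mutex */
        set_supply(rdev, r)
          create_regulator(r, ...)
            regulator_lock(r)           /* supply rdev's ww mutex: same lock
                                           class, no nesting annotation */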

In reality, we should always be locking a regulator before the
supplying regulator, so I don't expect there to be any real deadlocks
in practice. However, the regulator framework in general doesn't
express this to lockdep.

Let's fix the issue by simply grabbing the two locks involved in the
same way we grab multiple locks elsewhere in the regulator framework:
using the "wound/wait" mechanisms.
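
For reference, the general shape of that pattern (a simplified sketch of
the wait/wound backoff that the new regulator_lock_two() below follows;
"a" and "b" stand in for the two rdevs) is:

    ww_acquire_init(&ctx, &regulator_ww_class);
    ret = ww_mutex_lock(&a->mutex, &ctx);           /* first lock: expected to succeed */
    if (!ret)
        ret = ww_mutex_lock(&b->mutex, &ctx);
    while (ret == -EDEADLK) {
        ww_mutex_unlock(&a->mutex);                 /* back off: drop what we hold */
        ww_mutex_lock_slow(&b->mutex, &ctx);        /* wait for the contended lock */
        swap(a, b);                                 /* the lock we now hold is "a" */
        ret = ww_mutex_lock(&b->mutex, &ctx);       /* retry the other one */
    }
    ww_acquire_done(&ctx);
    /* ... critical section with both held; unlock both + ww_acquire_fini() when done */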

Fixes: eaa7995c529b ("regulator: core: avoid regulator_resolve_supply() race condition")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
My knowledge of lockdep and of wait-wound locks is not as strong as it
should be. That, combined with the fact that I can't actually reproduce
the issue, has led me to label this as RFC.

I can at least confirm that my system still boots with this patch
applied, but I can't say 100% for sure that this addresses the issue
that the bot reported to me. Hopefully others can review and make sure
that this seems sensible to them.

If this looks reasonable, I can land it and see if that prevents the
bot from hitting this again.

Changes in v2:
- Added "core" to the subject line.
- Fixed slowpath bug where we missed updating the ref_cnt / mutex_owner.
- Added Fixes since I realized the problem is somewhat "recent".

 drivers/regulator/core.c | 91 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 83 insertions(+), 8 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 9a13240f3084..08726bc0da9d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -207,6 +207,78 @@ static void regulator_unlock(struct regulator_dev *rdev)
 	mutex_unlock(&regulator_nesting_mutex);
 }
 
+/**
+ * regulator_lock_two - lock two regulators
+ * @rdev1:		first regulator
+ * @rdev2:		second regulator
+ * @ww_ctx:		w/w mutex acquire context
+ *
+ * Locks both rdevs using the regulator_ww_class.
+ */
+static void regulator_lock_two(struct regulator_dev *rdev1,
+			       struct regulator_dev *rdev2,
+			       struct ww_acquire_ctx *ww_ctx)
+{
+	struct regulator_dev *tmp;
+	int ret;
+
+	ww_acquire_init(ww_ctx, &regulator_ww_class);
+
+	/* Try to just grab both of them */
+	ret = regulator_lock_nested(rdev1, ww_ctx);
+	WARN_ON(ret);
+	ret = regulator_lock_nested(rdev2, ww_ctx);
+	if (ret != -EDEADLOCK) {
+		WARN_ON(ret);
+		goto exit;
+	}
+
+	while (true) {
+		/*
+		 * Start of loop: rdev1 was locked and rdev2 was contended.
+		 * Need to unlock rdev1, slowly lock rdev2, then try rdev1
+		 * again.
+		 */
+		regulator_unlock(rdev1);
+
+		ww_mutex_lock_slow(&rdev2->mutex, ww_ctx);
+		rdev2->ref_cnt++;
+		rdev2->mutex_owner = current;
+		ret = regulator_lock_nested(rdev1, ww_ctx);
+
+		if (ret == -EDEADLOCK) {
+			/* More contention; swap which needs to be slow */
+			tmp = rdev1;
+			rdev1 = rdev2;
+			rdev2 = tmp;
+		} else {
+			WARN_ON(ret);
+			break;
+		}
+	}
+
+exit:
+	ww_acquire_done(ww_ctx);
+}
+
+/**
+ * regulator_unlock_two - unlock two regulators
+ * @rdev1:		first regulator
+ * @rdev2:		second regulator
+ * @ww_ctx:		w/w mutex acquire context
+ *
+ * The inverse of regulator_lock_two().
+ */
+
+static void regulator_unlock_two(struct regulator_dev *rdev1,
+				 struct regulator_dev *rdev2,
+				 struct ww_acquire_ctx *ww_ctx)
+{
+	regulator_unlock(rdev2);
+	regulator_unlock(rdev1);
+	ww_acquire_fini(ww_ctx);
+}
+
 static bool regulator_supply_is_couple(struct regulator_dev *rdev)
 {
 	struct regulator_dev *c_rdev;
@@ -1627,8 +1699,8 @@ static int set_machine_constraints(struct regulator_dev *rdev)
 
 /**
  * set_supply - set regulator supply regulator
- * @rdev: regulator name
- * @supply_rdev: supply regulator name
+ * @rdev: regulator (locked)
+ * @supply_rdev: supply regulator (locked)
  *
  * Called by platform initialisation code to set the supply regulator for this
  * regulator. This ensures that a regulators supply will also be enabled by the
@@ -1800,6 +1872,8 @@ static struct regulator *create_regulator(struct regulator_dev *rdev,
 	struct regulator *regulator;
 	int err = 0;
 
+	lockdep_assert_held_once(&rdev->mutex.base);
+
 	if (dev) {
 		char buf[REG_STR_SIZE];
 		int size;
@@ -1827,9 +1901,7 @@ static struct regulator *create_regulator(struct regulator_dev *rdev,
 	regulator->rdev = rdev;
 	regulator->supply_name = supply_name;
 
-	regulator_lock(rdev);
 	list_add(&regulator->list, &rdev->consumer_list);
-	regulator_unlock(rdev);
 
 	if (dev) {
 		regulator->dev = dev;
@@ -1995,6 +2067,7 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 {
 	struct regulator_dev *r;
 	struct device *dev = rdev->dev.parent;
+	struct ww_acquire_ctx ww_ctx;
 	int ret = 0;
 
 	/* No supply to resolve? */
@@ -2061,23 +2134,23 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	 * between rdev->supply null check and setting rdev->supply in
 	 * set_supply() from concurrent tasks.
 	 */
-	regulator_lock(rdev);
+	regulator_lock_two(rdev, r, &ww_ctx);
 
 	/* Supply just resolved by a concurrent task? */
 	if (rdev->supply) {
-		regulator_unlock(rdev);
+		regulator_unlock_two(rdev, r, &ww_ctx);
 		put_device(&r->dev);
 		goto out;
 	}
 
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
-		regulator_unlock(rdev);
+		regulator_unlock_two(rdev, r, &ww_ctx);
 		put_device(&r->dev);
 		goto out;
 	}
 
-	regulator_unlock(rdev);
+	regulator_unlock_two(rdev, r, &ww_ctx);
 
 	/*
 	 * In set_machine_constraints() we may have turned this regulator on
@@ -2190,7 +2263,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
 		return regulator;
 	}
 
+	regulator_lock(rdev);
 	regulator = create_regulator(rdev, dev, id);
+	regulator_unlock(rdev);
 	if (regulator == NULL) {
 		regulator = ERR_PTR(-ENOMEM);
 		module_put(rdev->owner);
-- 
2.40.0.348.gf938b09366-goog



* Re: [RFC PATCH v2 1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow()
  2023-03-29 21:33 [RFC PATCH v2 1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow() Douglas Anderson
  2023-03-29 21:33 ` [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies Douglas Anderson
@ 2023-04-06 15:03 ` Mark Brown
  1 sibling, 0 replies; 7+ messages in thread
From: Mark Brown @ 2023-04-06 15:03 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Dmitry Osipenko, Lucas Stach, David Collins, Stephen Boyd,
	Liam Girdwood, linux-kernel

On Wed, 29 Mar 2023 14:33:53 -0700, Douglas Anderson wrote:
> When a codepath locks a rdev using ww_mutex_lock_slow() directly then
> that codepath is responsible for incrementing the "ref_cnt" and also
> setting the "mutex_owner" to "current".
> 
> The regulator core consistently got that right for "ref_cnt" but
> didn't always get it right for "mutex_owner". Let's fix this.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next

Thanks!

[1/2] regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow()
      commit: b83a1772be854f87602de14726737d3e5b06e1f4
[2/2] regulator: core: Avoid lockdep reports when resolving supplies
      commit: cba6cfdc7c3f1516f0d08ddfb24e689af0932573

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



* Re: [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies
  2023-03-29 21:33 ` [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies Douglas Anderson
@ 2023-04-07 21:46   ` Stephen Boyd
  2023-04-14  0:36     ` Doug Anderson
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2023-04-07 21:46 UTC (permalink / raw)
  To: Douglas Anderson, Mark Brown
  Cc: Dmitry Osipenko, Lucas Stach, David Collins, David Collins,
	Liam Girdwood, linux-kernel

Quoting Douglas Anderson (2023-03-29 14:33:54)
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 9a13240f3084..08726bc0da9d 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -207,6 +207,78 @@ static void regulator_unlock(struct regulator_dev *rdev)
>         mutex_unlock(&regulator_nesting_mutex);
>  }
>
> +/**
> + * regulator_lock_two - lock two regulators
> + * @rdev1:             first regulator
> + * @rdev2:             second regulator
> + * @ww_ctx:            w/w mutex acquire context
> + *
> + * Locks both rdevs using the regulator_ww_class.
> + */
> +static void regulator_lock_two(struct regulator_dev *rdev1,
> +                              struct regulator_dev *rdev2,
> +                              struct ww_acquire_ctx *ww_ctx)
> +{
> +       struct regulator_dev *tmp;
> +       int ret;
> +
> +       ww_acquire_init(ww_ctx, &regulator_ww_class);
> +
> +       /* Try to just grab both of them */
> +       ret = regulator_lock_nested(rdev1, ww_ctx);
> +       WARN_ON(ret);
> +       ret = regulator_lock_nested(rdev2, ww_ctx);
> +       if (ret != -EDEADLOCK) {
> +               WARN_ON(ret);
> +               goto exit;
> +       }

I think this would be clearer if we had two local variable pointers

	struct regulator_dev *held, *contended;

	held = rdev1;
	contended = rdev2;

> +
> +       while (true) {
> +               /*
> +                * Start of loop: rdev1 was locked and rdev2 was contended.
> +                * Need to unlock rdev1, slowly lock rdev2, then try rdev1
> +                * again.
> +                */
> +               regulator_unlock(rdev1);

		regulator_unlock(held);

> +
> +               ww_mutex_lock_slow(&rdev2->mutex, ww_ctx);
> +               rdev2->ref_cnt++;
> +               rdev2->mutex_owner = current;
> +               ret = regulator_lock_nested(rdev1, ww_ctx);

		ww_mutex_lock_slow(&contended->mutex, ww_ctx);
		contended->ref_cnt++;
		contended->mutex_owner = current;
		swap(held, contended);
		ret = regulator_lock_nested(contended, ww_ctx);
		if (ret != -EDEADLOCK) {

> +                       WARN_ON(ret);
> +                       break;
> +               }
> +       }
> +
> +exit:
> +       ww_acquire_done(ww_ctx);
> +}
> +
> @@ -1627,8 +1699,8 @@ static int set_machine_constraints(struct regulator_dev *rdev)
>
>  /**
>   * set_supply - set regulator supply regulator
> - * @rdev: regulator name

It certainly wasn't the name :)

> - * @supply_rdev: supply regulator name
> + * @rdev: regulator (locked)
> + * @supply_rdev: supply regulator (locked)
>   *
>   * Called by platform initialisation code to set the supply regulator for this
>   * regulator. This ensures that a regulators supply will also be enabled by the
[...]
> @@ -2190,7 +2263,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
>                 return regulator;
>         }
>
> +       regulator_lock(rdev);
>         regulator = create_regulator(rdev, dev, id);
> +       regulator_unlock(rdev);

I'm sad that we're now locking the entire time create_regulator() is
called. Can that be avoided? I see that create_regulator() publishes the
consumer on the consumer_list, but otherwise I don't think it needs to
hold the regulator lock. It goes on to call debugfs code after
allocating memory. After this patch, we're going to be holding the lock
for that regulator across debugfs APIs. I suspect that may lead to more
problems later on because the time we hold the lock is extremely wide
now.

Of course, we were already holding the child regulator's lock for the
supply, because that's what this patch is fixing in
regulator_resolve_supply(). I'm just nervous that we're holding the lock
for a much wider time now. Maybe we can have create_regulator() return
the regulator and add a new function like add_regulator_consumer() that
does the list modification? Then we can make create_regulator() do
everything without holding a lock and have a very short time where the
new function locks two regulator locks and does the linkage.
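
A rough sketch of that shape (add_regulator_consumer() here is only the
hypothetical helper named above, not an existing function):

    regulator = create_regulator(rdev, dev, id);    /* no rdev lock held for alloc/debugfs */
    ...
    regulator_lock(rdev);                           /* or lock both rdevs in the supply case */
    add_regulator_consumer(rdev, regulator);        /* just the list_add() onto consumer_list */
    regulator_unlock(rdev);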


* Re: [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies
  2023-04-07 21:46   ` Stephen Boyd
@ 2023-04-14  0:36     ` Doug Anderson
  2023-04-29  1:57       ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: Doug Anderson @ 2023-04-14  0:36 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mark Brown, Dmitry Osipenko, Lucas Stach, David Collins,
	David Collins, Liam Girdwood, linux-kernel

Hi,

On Fri, Apr 7, 2023 at 2:46 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Douglas Anderson (2023-03-29 14:33:54)
> > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> > index 9a13240f3084..08726bc0da9d 100644
> > --- a/drivers/regulator/core.c
> > +++ b/drivers/regulator/core.c
> > @@ -207,6 +207,78 @@ static void regulator_unlock(struct regulator_dev *rdev)
> >         mutex_unlock(&regulator_nesting_mutex);
> >  }
> >
> > +/**
> > + * regulator_lock_two - lock two regulators
> > + * @rdev1:             first regulator
> > + * @rdev2:             second regulator
> > + * @ww_ctx:            w/w mutex acquire context
> > + *
> > + * Locks both rdevs using the regulator_ww_class.
> > + */
> > +static void regulator_lock_two(struct regulator_dev *rdev1,
> > +                              struct regulator_dev *rdev2,
> > +                              struct ww_acquire_ctx *ww_ctx)
> > +{
> > +       struct regulator_dev *tmp;
> > +       int ret;
> > +
> > +       ww_acquire_init(ww_ctx, &regulator_ww_class);
> > +
> > +       /* Try to just grab both of them */
> > +       ret = regulator_lock_nested(rdev1, ww_ctx);
> > +       WARN_ON(ret);
> > +       ret = regulator_lock_nested(rdev2, ww_ctx);
> > +       if (ret != -EDEADLOCK) {
> > +               WARN_ON(ret);
> > +               goto exit;
> > +       }
>
> I think this would be clearer if we had two local variable pointers
>
>         struct regulator_dev *held, *contended;
>
>         held = rdev1;
>         contended = rdev2;
>
> > +
> > +       while (true) {
> > +               /*
> > +                * Start of loop: rdev1 was locked and rdev2 was contended.
> > +                * Need to unlock rdev1, slowly lock rdev2, then try rdev1
> > +                * again.
> > +                */
> > +               regulator_unlock(rdev1);
>
>                 regulator_unlock(held);
>
> > +
> > +               ww_mutex_lock_slow(&rdev2->mutex, ww_ctx);
> > +               rdev2->ref_cnt++;
> > +               rdev2->mutex_owner = current;
> > +               ret = regulator_lock_nested(rdev1, ww_ctx);
>
>                 ww_mutex_lock_slow(&contended->mutex, ww_ctx);
>                 contended->ref_cnt++;
>                 contended->mutex_owner = current;
>                 swap(held, contended);
>                 ret = regulator_lock_nested(contended, ww_ctx);
>                 if (ret != -EDEADLOCK) {

Sure, I can do the rename to make it clearer. OK, sent out as
("regulator: core: Make regulator_lock_two() logic easier to follow")
[1]

[1] https://lore.kernel.org/r/20230413173359.1.I1ae92b25689bd6579952e6d458b79f5f8054a0c9@changeid


> > @@ -2190,7 +2263,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
> >                 return regulator;
> >         }
> >
> > +       regulator_lock(rdev);
> >         regulator = create_regulator(rdev, dev, id);
> > +       regulator_unlock(rdev);
>
> I'm sad that we're now locking the entire time create_regulator() is
> called. Can that be avoided? I see that create_regulator() publishes the
> consumer on the consumer_list, but otherwise I don't think it needs to
> hold the regulator lock. It goes on to call debugfs code after
> allocating memory. After this patch, we're going to be holding the lock
> for that regulator across debugfs APIs. I suspect that may lead to more
> problems later on because the time we hold the lock is extremely wide
> now.
>
> Of course, we were already holding the child regulator's lock for the
> supply, because that's what this patch is fixing in
> regulator_resolve_supply(). I'm just nervous that we're holding the lock
> for a much wider time now. Maybe we can have create_regulator() return
> the regulator and add a new function like add_regulator_consumer() that
> does the list modification? Then we can make create_regulator() do
> everything without holding a lock and have a very short time where the
> new function locks two regulator locks and does the linkage.

While we could try to come up with something fancier like this, I'm
not convinced it's worth the complexity. There are already cases where
we hold multiple regulator locks for quite long periods of time.
Specifically you can look at regulator_enable(). There, we'll grab the
lock for the regulator and the locks for all of the regulators parents
up the chain. Then we'll enable the regulator (and maybe the parents)
which might even include a long delay/sleep while holding the mutexes
for the whole chain.
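
Roughly (this is the existing behavior, not something new):

    regulator_enable(regulator)
      regulator_lock_dependent(rdev, &ww_ctx)   /* locks rdev plus every supply up the chain */
      _regulator_enable(regulator)              /* may sleep/delay with all of those held */
      regulator_unlock_dependent(rdev, &ww_ctx)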

Mark: do you have any opinion / intuition here? Is holding the rdev
lock for this larger scope a problem?

-Doug


* Re: [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies
  2023-04-14  0:36     ` Doug Anderson
@ 2023-04-29  1:57       ` Stephen Boyd
  2023-04-30 16:14         ` Mark Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2023-04-29  1:57 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Mark Brown, Dmitry Osipenko, Lucas Stach, David Collins,
	Liam Girdwood, linux-kernel

Quoting Doug Anderson (2023-04-13 17:36:09)
> Hi,
>
> On Fri, Apr 7, 2023 at 2:46 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Quoting Douglas Anderson (2023-03-29 14:33:54)
> > > @@ -2190,7 +2263,9 @@ struct regulator *_regulator_get(struct device *dev, const char *id,
> > >                 return regulator;
> > >         }
> > >
> > > +       regulator_lock(rdev);
> > >         regulator = create_regulator(rdev, dev, id);
> > > +       regulator_unlock(rdev);
> >
> > I'm sad that we're now locking the entire time create_regulator() is
> > called. Can that be avoided? I see that create_regulator() publishes the
> > consumer on the consumer_list, but otherwise I don't think it needs to
> > hold the regulator lock. It goes on to call debugfs code after
> > allocating memory. After this patch, we're going to be holding the lock
> > for that regulator across debugfs APIs. I suspect that may lead to more
> > problems later on because the time we hold the lock is extremely wide
> > now.
> >
> > Of course, we were already holding the child regulator's lock for the
> > supply, because that's what this patch is fixing in
> > regulator_resolve_supply(). I'm just nervous that we're holding the lock
> > for a much wider time now. Maybe we can have create_regulator() return
> > the regulator and add a new function like add_regulator_consumer() that
> > does the list modification? Then we can make create_regulator() do
> > everything without holding a lock and have a very short time where the
> > new function locks two regulator locks and does the linkage.
>
> While we could try to come up with something fancier like this, I'm
> not convinced it's worth the complexity. There are already cases where
> we hold multiple regulator locks for quite long periods of time.
> Specifically you can look at regulator_enable(). There, we'll grab the
> lock for the regulator and the locks for all of the regulators parents
> up the chain. Then we'll enable the regulator (and maybe the parents)
> which might even include a long delay/sleep while holding the mutexes
> for the whole chain.

Does that long period of time cover a large section of code? I'm not so
much concerned about holding the lock for a long time. I'm mostly
concerned about holding the lock across the debugfs APIs, and
potentially anything else that debugfs might lock against. If the lock
is only really needed to modify the list in a safe manner then it may be
better to make it a little more complex so that we don't exit the
regulator framework with some rdev locks held.

>
> Mark: do you have any opinion / intuition here? Is holding the rdev
> lock for this larger scope a problem?


* Re: [RFC PATCH v2 2/2] regulator: core: Avoid lockdep reports when resolving supplies
  2023-04-29  1:57       ` Stephen Boyd
@ 2023-04-30 16:14         ` Mark Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Brown @ 2023-04-30 16:14 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Doug Anderson, Dmitry Osipenko, Lucas Stach, David Collins,
	Liam Girdwood, linux-kernel


On Fri, Apr 28, 2023 at 06:57:36PM -0700, Stephen Boyd wrote:

> Does that long period of time cover a large section of code? I'm not so
> much concerned about holding the lock for a long time. I'm mostly
> concerned about holding the lock across the debugfs APIs, and
> potentially anything else that debugfs might lock against. If the lock
> is only really needed to modify the list in a safe manner then it may be
> better to make it a little more complex so that we don't exit the
> regulator framework with some rdev locks held.

I'm not sure that I understand the specific concerns with the debugfs
APIs?



