All of lore.kernel.org
 help / color / mirror / Atom feed
* Regulator regression in next-20180305
@ 2018-03-05 23:12 ` Tony Lindgren
  0 siblings, 0 replies; 26+ messages in thread
From: Tony Lindgren @ 2018-03-05 23:12 UTC (permalink / raw)
  To: Mark Brown, Maciej Purski; +Cc: linux-kernel, linux-arm-kernel, linux-omap

Hi Mark and Maciej,

Looks like with next-20180305 there's a regulator regression
where mmc0 won't show any cards or produces errors:

mmcblk0: error -110 requesting status
mmc1: new high speed SDIO card at address 0001
mmcblk0: error -110 requesting status
mmcblk0: recovery failed!
print_req_error: I/O error, dev mmcblk0, sector 0
Buffer I/O error on dev mmcblk0, logical block 0, async page read
mmcblk0: error -110 requesting status
mmcblk0: recovery failed!

I bisected it down to the following commits:

broken		71c33626a802 ("regulator: core: Change voltage setting path")
fails to build	a7348f502ab7 ("regulator: core: Add voltage balancing mechanism")
works		cf6fc8064766 ("regulator: core: Resolve coupled regulators")

Any ideas?

Regards,

Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-05 23:12 ` Tony Lindgren
  0 siblings, 0 replies; 26+ messages in thread
From: Tony Lindgren @ 2018-03-05 23:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark and Maciej,

Looks like with next-20180305 there's a regulator regression
where mmc0 won't show any cards or produces errors:

mmcblk0: error -110 requesting status
mmc1: new high speed SDIO card at address 0001
mmcblk0: error -110 requesting status
mmcblk0: recovery failed!
print_req_error: I/O error, dev mmcblk0, sector 0
Buffer I/O error on dev mmcblk0, logical block 0, async page read
mmcblk0: error -110 requesting status
mmcblk0: recovery failed!

I bisected it down to the following commits:

broken		71c33626a802 ("regulator: core: Change voltage setting path")
fails to build	a7348f502ab7 ("regulator: core: Add voltage balancing mechanism")
works		cf6fc8064766 ("regulator: core: Resolve coupled regulators")

Any ideas?

Regards,

Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-05 23:12 ` Tony Lindgren
@ 2018-03-05 23:22   ` Fabio Estevam
  -1 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-05 23:22 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Mark Brown, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

Hi Tony,

On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote:
> Hi Mark and Maciej,
>
> Looks like with next-20180305 there's a regulator regression
> where mmc0 won't show any cards or produces errors:
>
> mmcblk0: error -110 requesting status
> mmc1: new high speed SDIO card at address 0001
> mmcblk0: error -110 requesting status
> mmcblk0: recovery failed!
> print_req_error: I/O error, dev mmcblk0, sector 0
> Buffer I/O error on dev mmcblk0, logical block 0, async page read
> mmcblk0: error -110 requesting status
> mmcblk0: recovery failed!
>
> I bisected it down to the following commits:
>
> broken          71c33626a802 ("regulator: core: Change voltage setting path")
> fails to build  a7348f502ab7 ("regulator: core: Add voltage balancing mechanism")
> works           cf6fc8064766 ("regulator: core: Resolve coupled regulators")
>
> Any ideas?

I have also seen regulator issues due to this series:
https://lkml.org/lkml/2018/3/5/731

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-05 23:22   ` Fabio Estevam
  0 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-05 23:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Tony,

On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote:
> Hi Mark and Maciej,
>
> Looks like with next-20180305 there's a regulator regression
> where mmc0 won't show any cards or produces errors:
>
> mmcblk0: error -110 requesting status
> mmc1: new high speed SDIO card at address 0001
> mmcblk0: error -110 requesting status
> mmcblk0: recovery failed!
> print_req_error: I/O error, dev mmcblk0, sector 0
> Buffer I/O error on dev mmcblk0, logical block 0, async page read
> mmcblk0: error -110 requesting status
> mmcblk0: recovery failed!
>
> I bisected it down to the following commits:
>
> broken          71c33626a802 ("regulator: core: Change voltage setting path")
> fails to build  a7348f502ab7 ("regulator: core: Add voltage balancing mechanism")
> works           cf6fc8064766 ("regulator: core: Resolve coupled regulators")
>
> Any ideas?

I have also seen regulator issues due to this series:
https://lkml.org/lkml/2018/3/5/731

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-05 23:22   ` Fabio Estevam
@ 2018-03-06 16:30     ` Mark Brown
  -1 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-06 16:30 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

[-- Attachment #1: Type: text/plain, Size: 2594 bytes --]

On Mon, Mar 05, 2018 at 08:22:26PM -0300, Fabio Estevam wrote:
> On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote:

> > Looks like with next-20180305 there's a regulator regression
> > where mmc0 won't show any cards or produces errors:

> > mmcblk0: error -110 requesting status
> > mmc1: new high speed SDIO card at address 0001
> > mmcblk0: error -110 requesting status
> > mmcblk0: recovery failed!
> > print_req_error: I/O error, dev mmcblk0, sector 0
> > Buffer I/O error on dev mmcblk0, logical block 0, async page read
> > mmcblk0: error -110 requesting status
> > mmcblk0: recovery failed!

No other error messages?  That seems like there's something going on
that's very different to what Fabio was reporting...  I'm guessing some
voltage application didn't go through but it's hard to tell with so
little data.  dra7 does seem to have what Fabio had though so there's
definitely some effect on the OMAP platforms.

> I have also seen regulator issues due to this series:
> https://lkml.org/lkml/2018/3/5/731

Looking at your stuff I'm having trouble figuring out what's going on -
we're getting double locking of a parent regulator during enable
according to your backtraces but it's not clear to me what took that
lock already.  regulator_enable() walks the supplies before it takes
the lock on the regulator it's immediately being called on, not holding
any locks on supplies while enabling.  regulator_balance_voltage() then
tries to lock the supplies again but lockdep says the lock is already
held by regulator_enable().  It's also weird that this doesn't seem to
be showing up on other boards in kernelci, the regulator setup on those
i.MX boards looks to be quite simple so I'd expect a much wider impact.

I'm wondering if your case is more pain from mutex_lock_nested(), both
regulator_lock_coupled() and regulator_lock_supply() will end up using 
indexes starting at 0 for the locking classes.  That doesn't smell right
though, but in case my straw clutching works:

If we can't figure it out I'll just drop the series but I'd prefer to at
least understand what's going on.

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index e685f8b94acf..2c5b20a97f51 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
 {
 	int i;
 
-	for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
+	for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
 		mutex_lock_nested(&rdev->mutex, i);
 }
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 16:30     ` Mark Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-06 16:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 05, 2018 at 08:22:26PM -0300, Fabio Estevam wrote:
> On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote:

> > Looks like with next-20180305 there's a regulator regression
> > where mmc0 won't show any cards or produces errors:

> > mmcblk0: error -110 requesting status
> > mmc1: new high speed SDIO card at address 0001
> > mmcblk0: error -110 requesting status
> > mmcblk0: recovery failed!
> > print_req_error: I/O error, dev mmcblk0, sector 0
> > Buffer I/O error on dev mmcblk0, logical block 0, async page read
> > mmcblk0: error -110 requesting status
> > mmcblk0: recovery failed!

No other error messages?  That seems like there's something going on
that's very different to what Fabio was reporting...  I'm guessing some
voltage application didn't go through but it's hard to tell with so
little data.  dra7 does seem to have what Fabio had though so there's
definitely some effect on the OMAP platforms.

> I have also seen regulator issues due to this series:
> https://lkml.org/lkml/2018/3/5/731

Looking at your stuff I'm having trouble figuring out what's going on -
we're getting double locking of a parent regulator during enable
according to your backtraces but it's not clear to me what took that
lock already.  regulator_enable() walks the supplies before it takes
the lock on the regulator it's immediately being called on, not holding
any locks on supplies while enabling.  regulator_balance_voltage() then
tries to lock the supplies again but lockdep says the lock is already
held by regulator_enable().  It's also weird that this doesn't seem to
be showing up on other boards in kernelci, the regulator setup on those
i.MX boards looks to be quite simple so I'd expect a much wider impact.

I'm wondering if your case is more pain from mutex_lock_nested(), both
regulator_lock_coupled() and regulator_lock_supply() will end up using 
indexes starting at 0 for the locking classes.  That doesn't smell right
though, but in case my straw clutching works:

If we can't figure it out I'll just drop the series but I'd prefer to at
least understand what's going on.

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index e685f8b94acf..2c5b20a97f51 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
 {
 	int i;
 
-	for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
+	for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
 		mutex_lock_nested(&rdev->mutex, i);
 }
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180306/8ae451fd/attachment-0001.sig>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 16:30     ` Mark Brown
@ 2018-03-06 16:56       ` Fabio Estevam
  -1 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-06 16:56 UTC (permalink / raw)
  To: Mark Brown
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

Hi Mark,

On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote:

> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index e685f8b94acf..2c5b20a97f51 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
>  {
>         int i;
>
> -       for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> +       for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
>                 mutex_lock_nested(&rdev->mutex, i);

With this change the log is a bit different, but still get a kernel hang:
https://pastebin.com/eF08TnuT

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 16:56       ` Fabio Estevam
  0 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-06 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote:

> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index e685f8b94acf..2c5b20a97f51 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
>  {
>         int i;
>
> -       for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> +       for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
>                 mutex_lock_nested(&rdev->mutex, i);

With this change the log is a bit different, but still get a kernel hang:
https://pastebin.com/eF08TnuT

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 16:56       ` Fabio Estevam
@ 2018-03-06 17:18         ` Tony Lindgren
  -1 siblings, 0 replies; 26+ messages in thread
From: Tony Lindgren @ 2018-03-06 17:18 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Mark Brown, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

* Fabio Estevam <festevam@gmail.com> [180306 16:57]:
> Hi Mark,
> 
> On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote:
> 
> > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> > index e685f8b94acf..2c5b20a97f51 100644
> > --- a/drivers/regulator/core.c
> > +++ b/drivers/regulator/core.c
> > @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
> >  {
> >         int i;
> >
> > -       for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> > +       for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
> >                 mutex_lock_nested(&rdev->mutex, i);
> 
> With this change the log is a bit different, but still get a kernel hang:
> https://pastebin.com/eF08TnuT

That patch does not seem to change anything for me. The errors
I posted yesterday for mmc0 only happen on duovero, other omap
variants just fail silently with no cards showing.

Regards,

Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 17:18         ` Tony Lindgren
  0 siblings, 0 replies; 26+ messages in thread
From: Tony Lindgren @ 2018-03-06 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

* Fabio Estevam <festevam@gmail.com> [180306 16:57]:
> Hi Mark,
> 
> On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote:
> 
> > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> > index e685f8b94acf..2c5b20a97f51 100644
> > --- a/drivers/regulator/core.c
> > +++ b/drivers/regulator/core.c
> > @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
> >  {
> >         int i;
> >
> > -       for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> > +       for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
> >                 mutex_lock_nested(&rdev->mutex, i);
> 
> With this change the log is a bit different, but still get a kernel hang:
> https://pastebin.com/eF08TnuT

That patch does not seem to change anything for me. The errors
I posted yesterday for mmc0 only happen on duovero, other omap
variants just fail silently with no cards showing.

Regards,

Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 16:56       ` Fabio Estevam
@ 2018-03-06 19:12         ` Mark Brown
  -1 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-06 19:12 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

[-- Attachment #1: Type: text/plain, Size: 666 bytes --]

On Tue, Mar 06, 2018 at 01:56:42PM -0300, Fabio Estevam wrote:
> On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote:

> > -       for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> > +       for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
> >                 mutex_lock_nested(&rdev->mutex, i);

> With this change the log is a bit different, but still get a kernel hang:
> https://pastebin.com/eF08TnuT

Yeah, that's what I expected.  What we really need to figure out I think
is what exactly is taking the lock that we end up colliding with, I
don't seem to be able to reproduce so I'm having to just go on staring
at the code here.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 19:12         ` Mark Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-06 19:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 06, 2018 at 01:56:42PM -0300, Fabio Estevam wrote:
> On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote:

> > -       for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> > +       for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
> >                 mutex_lock_nested(&rdev->mutex, i);

> With this change the log is a bit different, but still get a kernel hang:
> https://pastebin.com/eF08TnuT

Yeah, that's what I expected.  What we really need to figure out I think
is what exactly is taking the lock that we end up colliding with, I
don't seem to be able to reproduce so I'm having to just go on staring
at the code here.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180306/7015dd76/attachment.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 19:12         ` Mark Brown
@ 2018-03-06 20:06           ` Fabio Estevam
  -1 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-06 20:06 UTC (permalink / raw)
  To: Mark Brown
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

Hi Mark,

On Tue, Mar 6, 2018 at 4:12 PM, Mark Brown <broonie@kernel.org> wrote:

> Yeah, that's what I expected.  What we really need to figure out I think
> is what exactly is taking the lock that we end up colliding with, I
> don't seem to be able to reproduce so I'm having to just go on staring
> at the code here.

Any debug options I should turn on in order to help us to understand
more on the lock issue?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 20:06           ` Fabio Estevam
  0 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-06 20:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Tue, Mar 6, 2018 at 4:12 PM, Mark Brown <broonie@kernel.org> wrote:

> Yeah, that's what I expected.  What we really need to figure out I think
> is what exactly is taking the lock that we end up colliding with, I
> don't seem to be able to reproduce so I'm having to just go on staring
> at the code here.

Any debug options I should turn on in order to help us to understand
more on the lock issue?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 20:06           ` Fabio Estevam
@ 2018-03-06 21:33             ` Mark Brown
  -1 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-06 21:33 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

[-- Attachment #1: Type: text/plain, Size: 663 bytes --]

On Tue, Mar 06, 2018 at 05:06:48PM -0300, Fabio Estevam wrote:
> On Tue, Mar 6, 2018 at 4:12 PM, Mark Brown <broonie@kernel.org> wrote:

> > Yeah, that's what I expected.  What we really need to figure out I think
> > is what exactly is taking the lock that we end up colliding with, I
> > don't seem to be able to reproduce so I'm having to just go on staring
> > at the code here.

> Any debug options I should turn on in order to help us to understand
> more on the lock issue?

No, unfortunately, apart from the generic kernel ones (like lockdep).
I'll try to have a poke at adding some tomorrow, though I'm also
considering just dropping the series for now.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 21:33             ` Mark Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-06 21:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 06, 2018 at 05:06:48PM -0300, Fabio Estevam wrote:
> On Tue, Mar 6, 2018 at 4:12 PM, Mark Brown <broonie@kernel.org> wrote:

> > Yeah, that's what I expected.  What we really need to figure out I think
> > is what exactly is taking the lock that we end up colliding with, I
> > don't seem to be able to reproduce so I'm having to just go on staring
> > at the code here.

> Any debug options I should turn on in order to help us to understand
> more on the lock issue?

No, unfortunately, apart from the generic kernel ones (like lockdep).
I'll try to have a poke at adding some tomorrow, though I'm also
considering just dropping the series for now.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180306/591e2d86/attachment.sig>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 21:33             ` Mark Brown
@ 2018-03-06 22:10               ` Fabio Estevam
  -1 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-06 22:10 UTC (permalink / raw)
  To: Mark Brown
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Tue, Mar 6, 2018 at 6:33 PM, Mark Brown <broonie@kernel.org> wrote:

> No, unfortunately, apart from the generic kernel ones (like lockdep).
> I'll try to have a poke at adding some tomorrow, though I'm also
> considering just dropping the series for now.

Ok, lockdep is already enabled by default in the defconfig I am using
(imx_v6_v7_defconfig).

Yesterday I spent a few hours trying to understand this locking issue,
but didn't get any success on this task.

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-06 22:10               ` Fabio Estevam
  0 siblings, 0 replies; 26+ messages in thread
From: Fabio Estevam @ 2018-03-06 22:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 6, 2018 at 6:33 PM, Mark Brown <broonie@kernel.org> wrote:

> No, unfortunately, apart from the generic kernel ones (like lockdep).
> I'll try to have a poke at adding some tomorrow, though I'm also
> considering just dropping the series for now.

Ok, lockdep is already enabled by default in the defconfig I am using
(imx_v6_v7_defconfig).

Yesterday I spent a few hours trying to understand this locking issue,
but didn't get any success on this task.

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 16:30     ` Mark Brown
@ 2018-03-07 12:57       ` Maciej Purski
  -1 siblings, 0 replies; 26+ messages in thread
From: Maciej Purski @ 2018-03-07 12:57 UTC (permalink / raw)
  To: Mark Brown, Fabio Estevam
  Cc: Tony Lindgren, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Marek Szyprowski, Bartlomiej Zolnierkiewicz

Hi all,
sorry it took me so long to answer.

On 03/06/2018 05:30 PM, Mark Brown wrote:
> On Mon, Mar 05, 2018 at 08:22:26PM -0300, Fabio Estevam wrote:
>> On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote:
> 
>>> Looks like with next-20180305 there's a regulator regression
>>> where mmc0 won't show any cards or produces errors:
> 
>>> mmcblk0: error -110 requesting status
>>> mmc1: new high speed SDIO card at address 0001
>>> mmcblk0: error -110 requesting status
>>> mmcblk0: recovery failed!
>>> print_req_error: I/O error, dev mmcblk0, sector 0
>>> Buffer I/O error on dev mmcblk0, logical block 0, async page read
>>> mmcblk0: error -110 requesting status
>>> mmcblk0: recovery failed!
> 
> No other error messages?  That seems like there's something going on
> that's very different to what Fabio was reporting...  I'm guessing some
> voltage application didn't go through but it's hard to tell with so
> little data.  dra7 does seem to have what Fabio had though so there's
> definitely some effect on the OMAP platforms.
>  >> I have also seen regulator issues due to this series:
>> https://lkml.org/lkml/2018/3/5/731
> 
> Looking at your stuff I'm having trouble figuring out what's going on -
> we're getting double locking of a parent regulator during enable
> according to your backtraces but it's not clear to me what took that
> lock already.  regulator_enable() walks the supplies before it takes
> the lock on the regulator it's immediately being called on, not holding
> any locks on supplies while enabling.  regulator_balance_voltage() then
> tries to lock the supplies again but lockdep says the lock is already
> held by regulator_enable().  It's also weird that this doesn't seem to
> be showing up on other boards in kernelci, the regulator setup on those
> i.MX boards looks to be quite simple so I'd expect a much wider impact.
>

I'm trying to figure out what is so special about these boards. The only strange 
thing, that I haven't noticed at first, is that all regulators share a common 
supply - dummy regulator. It is defined in anatop_regulator.c.


> I'm wondering if your case is more pain from mutex_lock_nested(), both
> regulator_lock_coupled() and regulator_lock_supply() will end up using
> indexes starting at 0 for the locking classes.  That doesn't smell right
> though, but in case my straw clutching works:
> 
> If we can't figure it out I'll just drop the series but I'd prefer to at
> least understand what's going on.
>

I have been struggling to reproduce the issue on my exynos boards, but all I 
have achieved is getting the same lockdep warning, but everything else works 
fine. I think it was a false positive caused by using the same indices in 
lock_coupled() and lock_supply().

> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index e685f8b94acf..2c5b20a97f51 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
>   {
>   	int i;
>   
> -	for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> +	for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
>   		mutex_lock_nested(&rdev->mutex, i);
>   }
>   
> 

Best regards,
Maciej Purski

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-07 12:57       ` Maciej Purski
  0 siblings, 0 replies; 26+ messages in thread
From: Maciej Purski @ 2018-03-07 12:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,
sorry it took me so long to answer.

On 03/06/2018 05:30 PM, Mark Brown wrote:
> On Mon, Mar 05, 2018 at 08:22:26PM -0300, Fabio Estevam wrote:
>> On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote:
> 
>>> Looks like with next-20180305 there's a regulator regression
>>> where mmc0 won't show any cards or produces errors:
> 
>>> mmcblk0: error -110 requesting status
>>> mmc1: new high speed SDIO card at address 0001
>>> mmcblk0: error -110 requesting status
>>> mmcblk0: recovery failed!
>>> print_req_error: I/O error, dev mmcblk0, sector 0
>>> Buffer I/O error on dev mmcblk0, logical block 0, async page read
>>> mmcblk0: error -110 requesting status
>>> mmcblk0: recovery failed!
> 
> No other error messages?  That seems like there's something going on
> that's very different to what Fabio was reporting...  I'm guessing some
> voltage application didn't go through but it's hard to tell with so
> little data.  dra7 does seem to have what Fabio had though so there's
> definitely some effect on the OMAP platforms.
>  >> I have also seen regulator issues due to this series:
>> https://lkml.org/lkml/2018/3/5/731
> 
> Looking at your stuff I'm having trouble figuring out what's going on -
> we're getting double locking of a parent regulator during enable
> according to your backtraces but it's not clear to me what took that
> lock already.  regulator_enable() walks the supplies before it takes
> the lock on the regulator it's immediately being called on, not holding
> any locks on supplies while enabling.  regulator_balance_voltage() then
> tries to lock the supplies again but lockdep says the lock is already
> held by regulator_enable().  It's also weird that this doesn't seem to
> be showing up on other boards in kernelci, the regulator setup on those
> i.MX boards looks to be quite simple so I'd expect a much wider impact.
>

I'm trying to figure out what is so special about these boards. The only strange 
thing, that I haven't noticed at first, is that all regulators share a common 
supply - dummy regulator. It is defined in anatop_regulator.c.


> I'm wondering if your case is more pain from mutex_lock_nested(), both
> regulator_lock_coupled() and regulator_lock_supply() will end up using
> indexes starting at 0 for the locking classes.  That doesn't smell right
> though, but in case my straw clutching works:
> 
> If we can't figure it out I'll just drop the series but I'd prefer to at
> least understand what's going on.
>

I have been struggling to reproduce the issue on my exynos boards, but all I 
have achieved is getting the same lockdep warning, but everything else works 
fine. I think it was a false positive caused by using the same indices in 
lock_coupled() and lock_supply().

> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index e685f8b94acf..2c5b20a97f51 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev)
>   {
>   	int i;
>   
> -	for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++)
> +	for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++)
>   		mutex_lock_nested(&rdev->mutex, i);
>   }
>   
> 

Best regards,
Maciej Purski

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-07 12:57       ` Maciej Purski
  (?)
@ 2018-03-07 14:10       ` Mark Brown
  2018-03-07 14:37           ` Maciej Purski
  -1 siblings, 1 reply; 26+ messages in thread
From: Mark Brown @ 2018-03-07 14:10 UTC (permalink / raw)
  To: Maciej Purski
  Cc: Fabio Estevam, Tony Lindgren, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Marek Szyprowski, Bartlomiej Zolnierkiewicz

[-- Attachment #1: Type: text/plain, Size: 524 bytes --]

On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote:

> I'm trying to figure out what is so special about these boards. The only
> strange thing, that I haven't noticed at first, is that all regulators share
> a common supply - dummy regulator. It is defined in anatop_regulator.c.

No, that's a regulator framework thing - the regulator framework will
use the dummy regulator as a supply when there's nothing described in
the DT so long as the client doesn't explicitly tell it that the supply
might be optional.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-07 14:10       ` Mark Brown
@ 2018-03-07 14:37           ` Maciej Purski
  0 siblings, 0 replies; 26+ messages in thread
From: Maciej Purski @ 2018-03-07 14:37 UTC (permalink / raw)
  To: Mark Brown
  Cc: Fabio Estevam, Tony Lindgren, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Marek Szyprowski, Bartlomiej Zolnierkiewicz


On 03/07/2018 03:10 PM, Mark Brown wrote:
> On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote:
> 
>> I'm trying to figure out what is so special about these boards. The only
>> strange thing, that I haven't noticed at first, is that all regulators share
>> a common supply - dummy regulator. It is defined in anatop_regulator.c.
> 
> No, that's a regulator framework thing - the regulator framework will
> use the dummy regulator as a supply when there's nothing described in
> the DT so long as the client doesn't explicitly tell it that the supply
> might be optional.
> 

Ok, thanks for explanation. I think I have found a possibly dangerous scenario, 
but I can't see this situation possible in Fabio's case.

Assume, that we have a chain of supplies, consisting of at least 3. Say: A->B->C.

When we're setting voltage on A, we lock it, call balance_voltage(), lock 
suppliers and call set_voltage_rdev(). So we have regulators A, B, C locked. 
Then set_voltage_rdev() is trying to set voltage of its supply by calling 
set_voltage_unlocked().

Now we're on the regulator B. Set_voltage_unlocked() calls balance_voltage(), 
which again locks its supplies, if they exist. B's supply is C, so we end up 
with having a deadlock on regulator C.

Tony and Fabio, do you find this scenario possible on your boards?

Best regards
Maciej Purski

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-07 14:37           ` Maciej Purski
  0 siblings, 0 replies; 26+ messages in thread
From: Maciej Purski @ 2018-03-07 14:37 UTC (permalink / raw)
  To: linux-arm-kernel


On 03/07/2018 03:10 PM, Mark Brown wrote:
> On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote:
> 
>> I'm trying to figure out what is so special about these boards. The only
>> strange thing, that I haven't noticed at first, is that all regulators share
>> a common supply - dummy regulator. It is defined in anatop_regulator.c.
> 
> No, that's a regulator framework thing - the regulator framework will
> use the dummy regulator as a supply when there's nothing described in
> the DT so long as the client doesn't explicitly tell it that the supply
> might be optional.
> 

Ok, thanks for explanation. I think I have found a possibly dangerous scenario, 
but I can't see this situation possible in Fabio's case.

Assume, that we have a chain of supplies, consisting of at least 3. Say: A->B->C.

When we're setting voltage on A, we lock it, call balance_voltage(), lock 
suppliers and call set_voltage_rdev(). So we have regulators A, B, C locked. 
Then set_voltage_rdev() is trying to set voltage of its supply by calling 
set_voltage_unlocked().

Now we're on the regulator B. Set_voltage_unlocked() calls balance_voltage(), 
which again locks its supplies, if they exist. B's supply is C, so we end up 
with having a deadlock on regulator C.

Tony and Fabio, do you find this scenario possible on your boards?

Best regards
Maciej Purski

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-06 22:10               ` Fabio Estevam
  (?)
@ 2018-03-07 14:45               ` Mark Brown
  -1 siblings, 0 replies; 26+ messages in thread
From: Mark Brown @ 2018-03-07 14:45 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Tony Lindgren, Maciej Purski, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

On Tue, Mar 06, 2018 at 07:10:01PM -0300, Fabio Estevam wrote:

> Yesterday I spent a few hours trying to understand this locking issue,
> but didn't get any success on this task.

Right, so I'm thinking that if it's taking this much effort to think
about the problem we should just drop the patch set for now.  I've moved
the code to a branch test/coupled in my git so it's not getting in
people's way in -next.  Maciej, I suspect that you might be able to
reproduce this if you set supply names in the regulator definitons for
the PMICs on your boards.  I think that's the difference here.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Regulator regression in next-20180305
  2018-03-07 14:37           ` Maciej Purski
@ 2018-03-07 15:17             ` Tony Lindgren
  -1 siblings, 0 replies; 26+ messages in thread
From: Tony Lindgren @ 2018-03-07 15:17 UTC (permalink / raw)
  To: Maciej Purski
  Cc: Mark Brown, Fabio Estevam, linux-omap, linux-kernel,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Marek Szyprowski, Bartlomiej Zolnierkiewicz

* Maciej Purski <m.purski@samsung.com> [180307 14:38]:
> 
> On 03/07/2018 03:10 PM, Mark Brown wrote:
> > On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote:
> > 
> > > I'm trying to figure out what is so special about these boards. The only
> > > strange thing, that I haven't noticed at first, is that all regulators share
> > > a common supply - dummy regulator. It is defined in anatop_regulator.c.
> > 
> > No, that's a regulator framework thing - the regulator framework will
> > use the dummy regulator as a supply when there's nothing described in
> > the DT so long as the client doesn't explicitly tell it that the supply
> > might be optional.
> > 
> 
> Ok, thanks for explanation. I think I have found a possibly dangerous
> scenario, but I can't see this situation possible in Fabio's case.
> 
> Assume, that we have a chain of supplies, consisting of at least 3. Say: A->B->C.
> 
> When we're setting voltage on A, we lock it, call balance_voltage(), lock
> suppliers and call set_voltage_rdev(). So we have regulators A, B, C locked.
> Then set_voltage_rdev() is trying to set voltage of its supply by calling
> set_voltage_unlocked().
> 
> Now we're on the regulator B. Set_voltage_unlocked() calls
> balance_voltage(), which again locks its supplies, if they exist. B's supply
> is C, so we end up with having a deadlock on regulator C.
> 
> Tony and Fabio, do you find this scenario possible on your boards?

For mmc0 at least typically the supply is on the PMIC for omap
variants and it's controlled over i2c or spi bus like most other
regulators. Then boards some have additional GPIO controlled
regulators. So yeah some kind of locking issue between two or
more regulators on i2c or spi bus could be the reason.

Regards,

Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Regulator regression in next-20180305
@ 2018-03-07 15:17             ` Tony Lindgren
  0 siblings, 0 replies; 26+ messages in thread
From: Tony Lindgren @ 2018-03-07 15:17 UTC (permalink / raw)
  To: linux-arm-kernel

* Maciej Purski <m.purski@samsung.com> [180307 14:38]:
> 
> On 03/07/2018 03:10 PM, Mark Brown wrote:
> > On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote:
> > 
> > > I'm trying to figure out what is so special about these boards. The only
> > > strange thing, that I haven't noticed at first, is that all regulators share
> > > a common supply - dummy regulator. It is defined in anatop_regulator.c.
> > 
> > No, that's a regulator framework thing - the regulator framework will
> > use the dummy regulator as a supply when there's nothing described in
> > the DT so long as the client doesn't explicitly tell it that the supply
> > might be optional.
> > 
> 
> Ok, thanks for explanation. I think I have found a possibly dangerous
> scenario, but I can't see this situation possible in Fabio's case.
> 
> Assume, that we have a chain of supplies, consisting of at least 3. Say: A->B->C.
> 
> When we're setting voltage on A, we lock it, call balance_voltage(), lock
> suppliers and call set_voltage_rdev(). So we have regulators A, B, C locked.
> Then set_voltage_rdev() is trying to set voltage of its supply by calling
> set_voltage_unlocked().
> 
> Now we're on the regulator B. Set_voltage_unlocked() calls
> balance_voltage(), which again locks its supplies, if they exist. B's supply
> is C, so we end up with having a deadlock on regulator C.
> 
> Tony and Fabio, do you find this scenario possible on your boards?

For mmc0 at least typically the supply is on the PMIC for omap
variants and it's controlled over i2c or spi bus like most other
regulators. Then boards some have additional GPIO controlled
regulators. So yeah some kind of locking issue between two or
more regulators on i2c or spi bus could be the reason.

Regards,

Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-03-07 15:17 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-05 23:12 Regulator regression in next-20180305 Tony Lindgren
2018-03-05 23:12 ` Tony Lindgren
2018-03-05 23:22 ` Fabio Estevam
2018-03-05 23:22   ` Fabio Estevam
2018-03-06 16:30   ` Mark Brown
2018-03-06 16:30     ` Mark Brown
2018-03-06 16:56     ` Fabio Estevam
2018-03-06 16:56       ` Fabio Estevam
2018-03-06 17:18       ` Tony Lindgren
2018-03-06 17:18         ` Tony Lindgren
2018-03-06 19:12       ` Mark Brown
2018-03-06 19:12         ` Mark Brown
2018-03-06 20:06         ` Fabio Estevam
2018-03-06 20:06           ` Fabio Estevam
2018-03-06 21:33           ` Mark Brown
2018-03-06 21:33             ` Mark Brown
2018-03-06 22:10             ` Fabio Estevam
2018-03-06 22:10               ` Fabio Estevam
2018-03-07 14:45               ` Mark Brown
2018-03-07 12:57     ` Maciej Purski
2018-03-07 12:57       ` Maciej Purski
2018-03-07 14:10       ` Mark Brown
2018-03-07 14:37         ` Maciej Purski
2018-03-07 14:37           ` Maciej Purski
2018-03-07 15:17           ` Tony Lindgren
2018-03-07 15:17             ` Tony Lindgren

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.