All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
@ 2021-02-18 11:02 Wolfram Sang
  2021-03-02 10:38 ` Ulf Hansson
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfram Sang @ 2021-02-18 11:02 UTC (permalink / raw)
  To: linux-mmc
  Cc: linux-renesas-soc, Yoshihiro Shimoda, Wolfram Sang,
	Geert Uytterhoeven, Niklas Söderlund

RPM handling has been improved twice since this comment, and also SCC
handling has been improved a lot. All the testing we did (Geert's and
Niklas' and Wolfram's board farms) with the workaround removed did not
lead to problems, so it is time to get rid of it to the best of our
knowledge.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/mmc/host/tmio_mmc_core.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index 49c2d406c48e..2478a91e84b2 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1160,15 +1160,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 				  mmc->caps & MMC_CAP_NEEDS_POLL ||
 				  !mmc_card_is_removable(mmc));
 
-	/*
-	 * On Gen2+, eMMC with NONREMOVABLE currently fails because native
-	 * hotplug gets disabled. It seems RuntimePM related yet we need further
-	 * research. Since we are planning a PM overhaul anyway, let's enforce
-	 * for now the device being active by enabling native hotplug always.
-	 */
-	if (pdata->flags & TMIO_MMC_MIN_RCAR2)
-		_host->native_hotplug = true;
-
 	/*
 	 * While using internal tmio hardware logic for card detection, we need
 	 * to ensure it stays powered for it to work.
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2021-02-18 11:02 [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE Wolfram Sang
@ 2021-03-02 10:38 ` Ulf Hansson
  0 siblings, 0 replies; 16+ messages in thread
From: Ulf Hansson @ 2021-03-02 10:38 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: linux-mmc, Linux-Renesas, Yoshihiro Shimoda, Geert Uytterhoeven,
	Niklas Söderlund

On Thu, 18 Feb 2021 at 13:01, Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
>
> RPM handling has been improved twice since this comment, and also SCC
> handling has been improved a lot. All the testing we did (Geert's and
> Niklas' and Wolfram's board farms) with the workaround removed did not
> lead to problems, so it is time to get rid of it to the best of our
> knowledge.
>
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

Applied for next, thanks!

Kind regards
Uffe


> ---
>  drivers/mmc/host/tmio_mmc_core.c | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
> index 49c2d406c48e..2478a91e84b2 100644
> --- a/drivers/mmc/host/tmio_mmc_core.c
> +++ b/drivers/mmc/host/tmio_mmc_core.c
> @@ -1160,15 +1160,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
>                                   mmc->caps & MMC_CAP_NEEDS_POLL ||
>                                   !mmc_card_is_removable(mmc));
>
> -       /*
> -        * On Gen2+, eMMC with NONREMOVABLE currently fails because native
> -        * hotplug gets disabled. It seems RuntimePM related yet we need further
> -        * research. Since we are planning a PM overhaul anyway, let's enforce
> -        * for now the device being active by enabling native hotplug always.
> -        */
> -       if (pdata->flags & TMIO_MMC_MIN_RCAR2)
> -               _host->native_hotplug = true;
> -
>         /*
>          * While using internal tmio hardware logic for card detection, we need
>          * to ensure it stays powered for it to work.
> --
> 2.30.0
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-12-02  8:31               ` Geert Uytterhoeven
@ 2019-12-02  8:50                 ` Wolfram Sang
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfram Sang @ 2019-12-02  8:50 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 298 bytes --]


> How do you reboot in between tests?
> I usually use /sbin/reboot if the target booted fine, and the (remote
> controlled)
> reset button if the target locked up.

I use mostly the reset button. As I recall, last week the issue happened
even after a cold boot... but I can retry using 'reboot'.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-12-02  8:20             ` Wolfram Sang
@ 2019-12-02  8:31               ` Geert Uytterhoeven
  2019-12-02  8:50                 ` Wolfram Sang
  0 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2019-12-02  8:31 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

Hi Wolfram,

On Mon, Dec 2, 2019 at 9:20 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > anymore. However, I know have the issue highly reproducible with
> > renesas_defconfig and renesas-drivers. Good!
>
> Bummer, it is not that reproducible :(
>
> Yesterday, I tried latest linus tree which includes Ulf's changes to
> genpd and it worked, even with the NON_REMOVABLE workaround removed
> again. Then I reverted Ulf's changes to double check it made a
> difference, but the SCC still worked. So, I switched back to the
> renesas-drivers tree which used to fail last week, and it sadly works,
> too. Sigh...

How do you reboot in between tests?
I usually use /sbin/reboot if the target booted fine, and the (remote
controlled)
reset button if the target locked up.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-21 11:10           ` Wolfram Sang
@ 2019-12-02  8:20             ` Wolfram Sang
  2019-12-02  8:31               ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfram Sang @ 2019-12-02  8:20 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 624 bytes --]


> anymore. However, I know have the issue highly reproducible with
> renesas_defconfig and renesas-drivers. Good!

Bummer, it is not that reproducible :(

Yesterday, I tried latest linus tree which includes Ulf's changes to
genpd and it worked, even with the NON_REMOVABLE workaround removed
again. Then I reverted Ulf's changes to double check it made a
difference, but the SCC still worked. So, I switched back to the
renesas-drivers tree which used to fail last week, and it sadly works,
too. Sigh...

I'll move over now to upport the manual calibration mechanism and will
keep an eye on if/when the SCC fails again...


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-21 10:29           ` Ulf Hansson
  2019-11-21 10:52             ` Geert Uytterhoeven
@ 2019-11-21 11:12             ` Wolfram Sang
  1 sibling, 0 replies; 16+ messages in thread
From: Wolfram Sang @ 2019-11-21 11:12 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Geert Uytterhoeven, Wolfram Sang, Linux MMC List, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 509 bytes --]


> If you haven't noticed, we have also managed to replace the call with
> pm_runtime_get_sync() with a call to dev_pm_domain_start(), to
> simplify the code. The point is, these changes are queued via Rafael's
> pm-tree for v5.5.

I'll have a look at those already. I had them in mind, too, once we
discovered that it is still RPM related.

> So, perhaps at this point we should simply drop the offending commit
> and revisit this once v5.5-rc1 is out?

Yes, this was my suggestion, too. I'll send a patch.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-21  9:35         ` Geert Uytterhoeven
  2019-11-21 10:29           ` Ulf Hansson
@ 2019-11-21 11:10           ` Wolfram Sang
  2019-12-02  8:20             ` Wolfram Sang
  1 sibling, 1 reply; 16+ messages in thread
From: Wolfram Sang @ 2019-11-21 11:10 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]


> As I managed to bisect it, it was fairly reproducible for me. Just checkout
> commit 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE"),
> or use renesas-drivers.

That didn't work for me until I used renesas_defconfig with
renesas-drivers. Obviously (by now) the issue depends on the kernel
config. I don't know what I changed so I couldn't trigger the bug
anymore. However, I know have the issue highly reproducible with
renesas_defconfig and renesas-drivers. Good!

> But as it looks to be timing-related, and E3 has different/less CPU cores,
> it may still be affected.

I'll check my Ebisu later, too. Couldn't get the kernel to run on the
first try. Probably PEBKAC.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-21 10:29           ` Ulf Hansson
@ 2019-11-21 10:52             ` Geert Uytterhoeven
  2019-11-21 11:12             ` Wolfram Sang
  1 sibling, 0 replies; 16+ messages in thread
From: Geert Uytterhoeven @ 2019-11-21 10:52 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: Wolfram Sang, Wolfram Sang, Linux MMC List, Linux-Renesas

Hi Ulf,

On Thu, Nov 21, 2019 at 11:30 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Thu, 21 Nov 2019 at 10:35, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Thu, Nov 21, 2019 at 9:57 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > > > So some of my local code on top must have impacted the behavior.
> > >
> > > Any change in temperature? Niklas and I wonder if it is thermal related.
> >
> > Nope. I tried an old "known" good kernel again yesterday, and it worked.
> > That was BTW the one which had the additional debug prints:
> >
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -199,6 +199,7 @@ static struct generic_pm_domain
> > *dev_to_genpd(struct device *dev)
> >  static int genpd_stop_dev(const struct generic_pm_domain *genpd,
> >                           struct device *dev)
> >  {
> > +pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
> >         WARN(device_may_wakeup(dev),
> >              "Domain %s must be active_wakeup for wakeup source %s\n",
> >              genpd->name, dev_name(dev));
> > @@ -208,6 +209,7 @@ static int genpd_stop_dev(const struct
> > generic_pm_domain *genpd,
> >  static int genpd_start_dev(const struct generic_pm_domain *genpd,
> >                            struct device *dev)
> >  {
> > +pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
> >         return GENPD_DEV_CALLBACK(genpd, int, start, dev);
> >  }
> >
> > Removing those prints made the old kernel fail, too, so this is why I think
> > it is a race with Runtime PM.
> >
> > With a tree based on latest renesas-drivers, it happens regardless of those
> > debug prints.
> >
> > > > > I am working on an issue where the SCC hangs, but this has to do with
> > > > > always providing the SCC clock (SDnH). I don't really see the connection
> > > > > of that to RuntimePM yet, though :/

> A few comments around your runtime PM concerns, not sure if this
> matters to you issues.
>
> So, only by looking at the code for probing of the tmio variant
> drivers, it seems like there are accesses (both reads and writes) for
> the device being done, without first making sure that the clock
> managed by the PM domain has been enabled. Is that really okay? Note,
> this isn't a new thing, but it has been there as long as can remember.

No, that is not OK.
On R-Car Gen2, my debug code that disables all non-critical module
clocks during early boot should have caught them.
On R-Car Gen3, it's different, unfortunately, as apparently not all
module clocks can be disabled (protected by secure code?).
So my debug code has limited use on those platforms.

Note that the SCC seems to be clocked by a normal clock (SDnH), not by a
module clock, so it's not controlled by Runtime PM.
In fact, without "[PATCH] WIP: clk: renesas: rcar-gen3: enable SDnH clk
for HS modes", Linux doesn't enable it at all.

BTW, perhaps U-Boot leaves the SCC in a different state on R-Car E3
than on M3-N?

> For example, in renesas_sdhi_probe() there are calls made to
> sd_ctrl_write16|read16() before calling tmio_mmc_host_probe().
>
> Additionally in tmio_mmc_host_probe(), there are calls made to
> sd_ctrl_write16|read16(), before calling pm_runtime_get_sync().

Ugh.

> If you haven't noticed, we have also managed to replace the call with
> pm_runtime_get_sync() with a call to dev_pm_domain_start(), to
> simplify the code. The point is, these changes are queued via Rafael's
> pm-tree for v5.5.

Thanks, I hadn't noticed that.
I do have pm/linux-next in renesas-drivers, so that code has been exercised.

> So, perhaps at this point we should simply drop the offending commit
> and revisit this once v5.5-rc1 is out?

Yes, that looks like the best short-term solution.
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-21  9:35         ` Geert Uytterhoeven
@ 2019-11-21 10:29           ` Ulf Hansson
  2019-11-21 10:52             ` Geert Uytterhoeven
  2019-11-21 11:12             ` Wolfram Sang
  2019-11-21 11:10           ` Wolfram Sang
  1 sibling, 2 replies; 16+ messages in thread
From: Ulf Hansson @ 2019-11-21 10:29 UTC (permalink / raw)
  To: Geert Uytterhoeven, Wolfram Sang, Wolfram Sang
  Cc: Linux MMC List, Linux-Renesas

Geert, Wolfram

On Thu, 21 Nov 2019 at 10:35, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Wolfram,
>
> On Thu, Nov 21, 2019 at 9:57 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > > So some of my local code on top must have impacted the behavior.
> >
> > Any change in temperature? Niklas and I wonder if it is thermal related.
>
> Nope. I tried an old "known" good kernel again yesterday, and it worked.
> That was BTW the one which had the additional debug prints:
>
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -199,6 +199,7 @@ static struct generic_pm_domain
> *dev_to_genpd(struct device *dev)
>  static int genpd_stop_dev(const struct generic_pm_domain *genpd,
>                           struct device *dev)
>  {
> +pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
>         WARN(device_may_wakeup(dev),
>              "Domain %s must be active_wakeup for wakeup source %s\n",
>              genpd->name, dev_name(dev));
> @@ -208,6 +209,7 @@ static int genpd_stop_dev(const struct
> generic_pm_domain *genpd,
>  static int genpd_start_dev(const struct generic_pm_domain *genpd,
>                            struct device *dev)
>  {
> +pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
>         return GENPD_DEV_CALLBACK(genpd, int, start, dev);
>  }
>
> Removing those prints made the old kernel fail, too, so this is why I think
> it is a race with Runtime PM.
>
> With a tree based on latest renesas-drivers, it happens regardless of those
> debug prints.
>
> > > > I am working on an issue where the SCC hangs, but this has to do with
> > > > always providing the SCC clock (SDnH). I don't really see the connection
> > > > of that to RuntimePM yet, though :/
> > >
> > > Makes sense: this is consistent with the behavior when accessing
> > > registers without enabling the corresponding module clock: it hangs.
> > > So this can happen with other clocks, too.
> > > One more reason not to delegate clock handling to a guest, as doing it
> > > wrong can take down the host, too...
> >
> > You mean when it comes to virtualization?
>
> Exactly.
>
> > > > Can you test this simple workaround patch instead of the revert just so
> > > > we get an idea if these issues are related?
> > >
> > > Thanks, applying your workaround on top of
> > > renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.
> >
> > Ok, good to know thanks for testing. Currently, I wonder why reverting
> > the NON_REMOVABLE workaround makes a difference. Maybe it is not
> > temperature related but a some race with RPM? I am debugging in this
> > direction now. But the lockup is still hard to trigger for me. Tried
> > v5.4-rc8 + NON_REMOVABLE patch with no luck. Will try renesas-drivers
> > next.
>
> As I managed to bisect it, it was fairly reproducible for me. Just checkout
> commit 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE"),
> or use renesas-drivers.
>
> Oh, if it's a race, it may be affected by the compiler, too.
> gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)
>
> > > This fix is part of renesas/topic/sdhi-manual-calib, right?
> >
> > Yes.
> >
> > > And thus has been present in some renesas-drivers release, but was
> > > dropped _before_ the 2019-10-15-v5.4-rc3 release.
> >
> > That would explain why it didn't show up before, right? And don't you
>
> Not exactly. That branch was dropped before Ulf reverted the
> NON_REMOVABLE workaround.
>
> > have a Ebisu in your board farm, too? Luckily, I have one, too, now. It
> > should be affected.
>
> Haven't seen the issue on Ebisu (yet?).
> To be sure, I have just retried again with the exact same kernel image
> and userland: m3n-salvator-xs hangs, ebisu boots fine (and I can read
> /dev/mmcblk2).
>
> But as it looks to be timing-related, and E3 has different/less CPU cores,
> it may still be affected.

A few comments around your runtime PM concerns, not sure if this
matters to you issues.

So, only by looking at the code for probing of the tmio variant
drivers, it seems like there are accesses (both reads and writes) for
the device being done, without first making sure that the clock
managed by the PM domain has been enabled. Is that really okay? Note,
this isn't a new thing, but it has been there as long as can remember.

For example, in renesas_sdhi_probe() there are calls made to
sd_ctrl_write16|read16() before calling tmio_mmc_host_probe().

Additionally in tmio_mmc_host_probe(), there are calls made to
sd_ctrl_write16|read16(), before calling pm_runtime_get_sync().

If you haven't noticed, we have also managed to replace the call with
pm_runtime_get_sync() with a call to dev_pm_domain_start(), to
simplify the code. The point is, these changes are queued via Rafael's
pm-tree for v5.5.

So, perhaps at this point we should simply drop the offending commit
and revisit this once v5.5-rc1 is out?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-21  8:57       ` Wolfram Sang
@ 2019-11-21  9:35         ` Geert Uytterhoeven
  2019-11-21 10:29           ` Ulf Hansson
  2019-11-21 11:10           ` Wolfram Sang
  0 siblings, 2 replies; 16+ messages in thread
From: Geert Uytterhoeven @ 2019-11-21  9:35 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

Hi Wolfram,

On Thu, Nov 21, 2019 at 9:57 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > So some of my local code on top must have impacted the behavior.
>
> Any change in temperature? Niklas and I wonder if it is thermal related.

Nope. I tried an old "known" good kernel again yesterday, and it worked.
That was BTW the one which had the additional debug prints:

--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -199,6 +199,7 @@ static struct generic_pm_domain
*dev_to_genpd(struct device *dev)
 static int genpd_stop_dev(const struct generic_pm_domain *genpd,
                          struct device *dev)
 {
+pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
        WARN(device_may_wakeup(dev),
             "Domain %s must be active_wakeup for wakeup source %s\n",
             genpd->name, dev_name(dev));
@@ -208,6 +209,7 @@ static int genpd_stop_dev(const struct
generic_pm_domain *genpd,
 static int genpd_start_dev(const struct generic_pm_domain *genpd,
                           struct device *dev)
 {
+pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
        return GENPD_DEV_CALLBACK(genpd, int, start, dev);
 }

Removing those prints made the old kernel fail, too, so this is why I think
it is a race with Runtime PM.

With a tree based on latest renesas-drivers, it happens regardless of those
debug prints.

> > > I am working on an issue where the SCC hangs, but this has to do with
> > > always providing the SCC clock (SDnH). I don't really see the connection
> > > of that to RuntimePM yet, though :/
> >
> > Makes sense: this is consistent with the behavior when accessing
> > registers without enabling the corresponding module clock: it hangs.
> > So this can happen with other clocks, too.
> > One more reason not to delegate clock handling to a guest, as doing it
> > wrong can take down the host, too...
>
> You mean when it comes to virtualization?

Exactly.

> > > Can you test this simple workaround patch instead of the revert just so
> > > we get an idea if these issues are related?
> >
> > Thanks, applying your workaround on top of
> > renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.
>
> Ok, good to know thanks for testing. Currently, I wonder why reverting
> the NON_REMOVABLE workaround makes a difference. Maybe it is not
> temperature related but a some race with RPM? I am debugging in this
> direction now. But the lockup is still hard to trigger for me. Tried
> v5.4-rc8 + NON_REMOVABLE patch with no luck. Will try renesas-drivers
> next.

As I managed to bisect it, it was fairly reproducible for me. Just checkout
commit 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE"),
or use renesas-drivers.

Oh, if it's a race, it may be affected by the compiler, too.
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)

> > This fix is part of renesas/topic/sdhi-manual-calib, right?
>
> Yes.
>
> > And thus has been present in some renesas-drivers release, but was
> > dropped _before_ the 2019-10-15-v5.4-rc3 release.
>
> That would explain why it didn't show up before, right? And don't you

Not exactly. That branch was dropped before Ulf reverted the
NON_REMOVABLE workaround.

> have a Ebisu in your board farm, too? Luckily, I have one, too, now. It
> should be affected.

Haven't seen the issue on Ebisu (yet?).
To be sure, I have just retried again with the exact same kernel image
and userland: m3n-salvator-xs hangs, ebisu boots fine (and I can read
/dev/mmcblk2).

But as it looks to be timing-related, and E3 has different/less CPU cores,
it may still be affected.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-20  7:46     ` Geert Uytterhoeven
@ 2019-11-21  8:57       ` Wolfram Sang
  2019-11-21  9:35         ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfram Sang @ 2019-11-21  8:57 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 1703 bytes --]

Hi Geert,

> So some of my local code on top must have impacted the behavior.

Any change in temperature? Niklas and I wonder if it is thermal related.

> > I am working on an issue where the SCC hangs, but this has to do with
> > always providing the SCC clock (SDnH). I don't really see the connection
> > of that to RuntimePM yet, though :/
> 
> Makes sense: this is consistent with the behavior when accessing
> registers without enabling the corresponding module clock: it hangs.
> So this can happen with other clocks, too.
> One more reason not to delegate clock handling to a guest, as doing it
> wrong can take down the host, too...

You mean when it comes to virtualization?

> > Can you test this simple workaround patch instead of the revert just so
> > we get an idea if these issues are related?
> 
> Thanks, applying your workaround on top of
> renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.

Ok, good to know thanks for testing. Currently, I wonder why reverting
the NON_REMOVABLE workaround makes a difference. Maybe it is not
temperature related but a some race with RPM? I am debugging in this
direction now. But the lockup is still hard to trigger for me. Tried
v5.4-rc8 + NON_REMOVABLE patch with no luck. Will try renesas-drivers
next.

> This fix is part of renesas/topic/sdhi-manual-calib, right?

Yes.

> And thus has been present in some renesas-drivers release, but was
> dropped _before_ the 2019-10-15-v5.4-rc3 release.

That would explain why it didn't show up before, right? And don't you
have a Ebisu in your board farm, too? Luckily, I have one, too, now. It
should be affected.

Thanks for the pointers,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-19 20:47   ` Wolfram Sang
@ 2019-11-20  7:46     ` Geert Uytterhoeven
  2019-11-21  8:57       ` Wolfram Sang
  0 siblings, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2019-11-20  7:46 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

Hi Wolfram,

On Tue, Nov 19, 2019 at 9:47 PM Wolfram Sang <wsa@the-dreams.de> wrote:
> > Interestingly, this patch has been part of renesas-drivers since the
> > 2019-10-15-v5.4-rc3 release, without real issues.
>
> Huh, interesting. With which branch does this appear then? linux-next?

renesas-drivers

I can reproduce this with renesas-drivers-2019-11-19-v5.4-rc8 and
renesas_defconfig by booting a ramdisk, and reading from /dev/mmcblk1.

Also with renesas-drivers-2019-10-15-v5.4-rc3.
So some of my local code on top must have impacted the behavior.

> > Today, it was fairly reproducible, so I managed to bisect it to commit
> > 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE") in
> > mmc/next.  Reverting this commit fixes the issue.
>
> Hmm, probably we should do the revert despite our discusstion here. And
> then resend the original patch after we figured the cause of this hang.
>
> > The issue can also be fixed by:
> >   1. enabling the hs400_4taps and/or hs400_disabled quirks in
> >      sdhi_quirks_match[], OR
> >   2. forcing use_4tap = true in renesas_sdhi_check_scc_error().
> >
> > Salvator-X(S) with R-Car H3 ES1.0 & ES2.0, or M3-W ES1.0, the issue
> > does not show up (probably because of sdhi_quirks_match[]).
> >
> > Do you have a clue?
>
> Not very clear. M3-N is not a 4tap-device, so this can't be a fix.
> However, both disabling HS400 as well as using 4tap will prevent the SCC
> error checking in renesas_sdhi_check_scc_error(). I'd assume the SCC
> hangs.

Makes sense.

> I am working on an issue where the SCC hangs, but this has to do with
> always providing the SCC clock (SDnH). I don't really see the connection
> of that to RuntimePM yet, though :/

Makes sense: this is consistent with the behavior when accessing
registers without enabling the corresponding module clock: it hangs.
So this can happen with other clocks, too.
One more reason not to delegate clock handling to a guest, as doing it
wrong can take down the host, too...

> Can you test this simple workaround patch instead of the revert just so
> we get an idea if these issues are related?

Thanks, applying your workaround on top of
renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.

This fix is part of renesas/topic/sdhi-manual-calib, right?
And thus has been present in some renesas-drivers release, but was
dropped _before_ the 2019-10-15-v5.4-rc3 release.


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-11-19 15:51 ` Geert Uytterhoeven
@ 2019-11-19 20:47   ` Wolfram Sang
  2019-11-20  7:46     ` Geert Uytterhoeven
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfram Sang @ 2019-11-19 20:47 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Wolfram Sang, Linux MMC List, Linux-Renesas

[-- Attachment #1: Type: text/plain, Size: 2957 bytes --]

Hi Geert,

thanks for the report!

> Interestingly, this patch has been part of renesas-drivers since the
> 2019-10-15-v5.4-rc3 release, without real issues.

Huh, interesting. With which branch does this appear then? linux-next?

> Today, it was fairly reproducible, so I managed to bisect it to commit
> 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE") in
> mmc/next.  Reverting this commit fixes the issue.

Hmm, probably we should do the revert despite our discusstion here. And
then resend the original patch after we figured the cause of this hang.

> The issue can also be fixed by:
>   1. enabling the hs400_4taps and/or hs400_disabled quirks in
>      sdhi_quirks_match[], OR
>   2. forcing use_4tap = true in renesas_sdhi_check_scc_error().
> 
> Salvator-X(S) with R-Car H3 ES1.0 & ES2.0, or M3-W ES1.0, the issue
> does not show up (probably because of sdhi_quirks_match[]).
> 
> Do you have a clue?

Not very clear. M3-N is not a 4tap-device, so this can't be a fix.
However, both disabling HS400 as well as using 4tap will prevent the SCC
error checking in renesas_sdhi_check_scc_error(). I'd assume the SCC
hangs.

I am working on an issue where the SCC hangs, but this has to do with
always providing the SCC clock (SDnH). I don't really see the connection
of that to RuntimePM yet, though :/

Can you test this simple workaround patch instead of the revert just so
we get an idea if these issues are related?

Thanks,

   Wolfram

From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Thu, 27 Jun 2019 11:05:06 +0200
Subject: [PATCH] WIP: clk: renesas: rcar-gen3: enable SDnH clk for HS modes

When switching to HS400, we shortly need to switch back to plain HS
mode, but we still need the SDnH clock, so the SCC of SDHI can work.
So, make sure SDnH is still active, then.

FIXME: needs verification from the BSP/HW team!

Not-yet-Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/clk/renesas/rcar-gen3-cpg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/renesas/rcar-gen3-cpg.c b/drivers/clk/renesas/rcar-gen3-cpg.c
index d25c8ba00a65..043ab6ed9d55 100644
--- a/drivers/clk/renesas/rcar-gen3-cpg.c
+++ b/drivers/clk/renesas/rcar-gen3-cpg.c
@@ -263,7 +263,7 @@ static const struct sd_div_table cpg_sd_div_table[] = {
 /*	CPG_SD_DIV_TABLE_DATA(stp_hck,  stp_ck,   sd_srcfc,   sd_fc,  sd_div) */
 	CPG_SD_DIV_TABLE_DATA(0,        0,        0,          1,        4),
 	CPG_SD_DIV_TABLE_DATA(0,        0,        1,          1,        8),
-	CPG_SD_DIV_TABLE_DATA(1,        0,        2,          1,       16),
+	CPG_SD_DIV_TABLE_DATA(0,        0,        2,          1,       16),
 	CPG_SD_DIV_TABLE_DATA(1,        0,        3,          1,       32),
 	CPG_SD_DIV_TABLE_DATA(1,        0,        4,          1,       64),
 	CPG_SD_DIV_TABLE_DATA(0,        0,        0,          0,        2),


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-09-17 18:36 Wolfram Sang
  2019-10-03 10:01 ` Ulf Hansson
@ 2019-11-19 15:51 ` Geert Uytterhoeven
  2019-11-19 20:47   ` Wolfram Sang
  1 sibling, 1 reply; 16+ messages in thread
From: Geert Uytterhoeven @ 2019-11-19 15:51 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: Linux MMC List, Linux-Renesas

Hi Wolfram,

On Tue, Sep 17, 2019 at 9:46 PM Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
> PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
> 2.0 as well as M3-N without the workaround. Card detect and write
> protect also still work. Remove the workaround.
>
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

> --- a/drivers/mmc/host/tmio_mmc_core.c
> +++ b/drivers/mmc/host/tmio_mmc_core.c
> @@ -1208,15 +1208,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
>         if (!_host->reset)
>                 _host->reset = tmio_mmc_reset;
>
> -       /*
> -        * On Gen2+, eMMC with NONREMOVABLE currently fails because native
> -        * hotplug gets disabled. It seems RuntimePM related yet we need further
> -        * research. Since we are planning a PM overhaul anyway, let's enforce
> -        * for now the device being active by enabling native hotplug always.
> -        */
> -       if (pdata->flags & TMIO_MMC_MIN_RCAR2)
> -               _host->native_hotplug = true;
> -
>         /*
>          * While using internal tmio hardware logic for card detection, we need
>          * to ensure it stays powered for it to work.

On Salvator-XS with R-Car M3-N, this causes lock ups during early userspace
boot up (Debian 9 nfsroot), usually after:

    [  OK  ] Started D-Bus System Message Bus.

At first I suspected systemd.  Then I noticed the kernel had locked up
completely, and no longer responded to ping requests.

Interestingly, this patch has been part of renesas-drivers since the
2019-10-15-v5.4-rc3 release, without real issues.
The first time I saw the issue was last week. Then it happened only after I
had disabled a debug print, which probably caused a timing difference in
Runtime PM disablement (and was fully reproducible).

Today, it was fairly reproducible, so I managed to bisect it to commit
7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE") in
mmc/next.  Reverting this commit fixes the issue.

The issue can also be fixed by:
  1. enabling the hs400_4taps and/or hs400_disabled quirks in
     sdhi_quirks_match[], OR
  2. forcing use_4tap = true in renesas_sdhi_check_scc_error().

Salvator-X(S) with R-Car H3 ES1.0 & ES2.0, or M3-W ES1.0, the issue
does not show up (probably because of sdhi_quirks_match[]).

Do you have a clue?
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
  2019-09-17 18:36 Wolfram Sang
@ 2019-10-03 10:01 ` Ulf Hansson
  2019-11-19 15:51 ` Geert Uytterhoeven
  1 sibling, 0 replies; 16+ messages in thread
From: Ulf Hansson @ 2019-10-03 10:01 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linux-mmc, Linux-Renesas

On Tue, 17 Sep 2019 at 20:36, Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
>
> PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
> 2.0 as well as M3-N without the workaround. Card detect and write
> protect also still work. Remove the workaround.
>
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Applied for next, thanks!

Kind regards
Uffe


> ---
>  drivers/mmc/host/tmio_mmc_core.c | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
> index 9b6e1001e77c..63dc37481fba 100644
> --- a/drivers/mmc/host/tmio_mmc_core.c
> +++ b/drivers/mmc/host/tmio_mmc_core.c
> @@ -1208,15 +1208,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
>         if (!_host->reset)
>                 _host->reset = tmio_mmc_reset;
>
> -       /*
> -        * On Gen2+, eMMC with NONREMOVABLE currently fails because native
> -        * hotplug gets disabled. It seems RuntimePM related yet we need further
> -        * research. Since we are planning a PM overhaul anyway, let's enforce
> -        * for now the device being active by enabling native hotplug always.
> -        */
> -       if (pdata->flags & TMIO_MMC_MIN_RCAR2)
> -               _host->native_hotplug = true;
> -
>         /*
>          * While using internal tmio hardware logic for card detection, we need
>          * to ensure it stays powered for it to work.
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE
@ 2019-09-17 18:36 Wolfram Sang
  2019-10-03 10:01 ` Ulf Hansson
  2019-11-19 15:51 ` Geert Uytterhoeven
  0 siblings, 2 replies; 16+ messages in thread
From: Wolfram Sang @ 2019-09-17 18:36 UTC (permalink / raw)
  To: linux-mmc; +Cc: linux-renesas-soc, Wolfram Sang

PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
2.0 as well as M3-N without the workaround. Card detect and write
protect also still work. Remove the workaround.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/mmc/host/tmio_mmc_core.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index 9b6e1001e77c..63dc37481fba 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1208,15 +1208,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 	if (!_host->reset)
 		_host->reset = tmio_mmc_reset;
 
-	/*
-	 * On Gen2+, eMMC with NONREMOVABLE currently fails because native
-	 * hotplug gets disabled. It seems RuntimePM related yet we need further
-	 * research. Since we are planning a PM overhaul anyway, let's enforce
-	 * for now the device being active by enabling native hotplug always.
-	 */
-	if (pdata->flags & TMIO_MMC_MIN_RCAR2)
-		_host->native_hotplug = true;
-
 	/*
 	 * While using internal tmio hardware logic for card detection, we need
 	 * to ensure it stays powered for it to work.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-03-02 14:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-18 11:02 [PATCH] mmc: tmio: remove workaround for NON_REMOVABLE Wolfram Sang
2021-03-02 10:38 ` Ulf Hansson
  -- strict thread matches above, loose matches on Subject: below --
2019-09-17 18:36 Wolfram Sang
2019-10-03 10:01 ` Ulf Hansson
2019-11-19 15:51 ` Geert Uytterhoeven
2019-11-19 20:47   ` Wolfram Sang
2019-11-20  7:46     ` Geert Uytterhoeven
2019-11-21  8:57       ` Wolfram Sang
2019-11-21  9:35         ` Geert Uytterhoeven
2019-11-21 10:29           ` Ulf Hansson
2019-11-21 10:52             ` Geert Uytterhoeven
2019-11-21 11:12             ` Wolfram Sang
2019-11-21 11:10           ` Wolfram Sang
2019-12-02  8:20             ` Wolfram Sang
2019-12-02  8:31               ` Geert Uytterhoeven
2019-12-02  8:50                 ` Wolfram Sang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.