linux-mmc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] renesas_sdhi: fix hang when SCC loses its clock
@ 2020-06-04 11:20 Wolfram Sang
  2020-06-04 11:20 ` [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier Wolfram Sang
  2020-06-04 11:20 ` [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning Wolfram Sang
  0 siblings, 2 replies; 13+ messages in thread
From: Wolfram Sang @ 2020-06-04 11:20 UTC (permalink / raw)
  To: linux-mmc; +Cc: linux-renesas-soc, Yoshihiro Shimoda, Wolfram Sang

This was a nasty one because it wasn't reproducible for a long time.
Recent work on the manual calibration mechanism made it show up again
for me, so I could finally tackle it. The reason is that there is more
SCC handling now, so we are more likely to stumble again over the
problem that it may have no clock at that time.

There is a patch in the BSP handling this issue, too, but it didn't work
for me on at least v5.6+ kernels. Also, I thought it is way simpler to
keep the last working external frequency instead of defining a default
one per SoC generation.

Patches are based on mmc/next as of yesterday or so. You need the
'manual calibration' patches for the issue to show up. They are not
fully tested yet, so I will send them as RFC in a minute. Or just fetch
this branch:

git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git renesas/sdhi/new_manual_calib

With that branch, reading a file from eMMC works for me(tm). If you
prevent 'keep_scc_freq' from being 'true', then reading a file should
stall the machine. Happened here on a R-Car M3-N to me.

Looking forward to comments and more tests.

Thanks,

   Wolfram

Wolfram Sang (2):
  mmc: core: when downgrading HS400, callback into drivers earlier
  mmc: renesas_sdhi: keep SCC clock active when tuning

 drivers/mmc/core/mmc.c               | 14 +++++++-------
 drivers/mmc/host/renesas_sdhi.h      |  1 +
 drivers/mmc/host/renesas_sdhi_core.c | 25 ++++++++++++++++++++++---
 3 files changed, 30 insertions(+), 10 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier
  2020-06-04 11:20 [PATCH 0/2] renesas_sdhi: fix hang when SCC loses its clock Wolfram Sang
@ 2020-06-04 11:20 ` Wolfram Sang
  2020-06-08  6:02   ` Yoshihiro Shimoda
  2020-06-04 11:20 ` [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning Wolfram Sang
  1 sibling, 1 reply; 13+ messages in thread
From: Wolfram Sang @ 2020-06-04 11:20 UTC (permalink / raw)
  To: linux-mmc; +Cc: linux-renesas-soc, Yoshihiro Shimoda, Wolfram Sang

The driver specific downgrade function makes more sense if we run it
before we switch anything, not after we already switched. Otherwise some
non-HS400 communication has already happened.

No need to convert users. There is only one currenty which needs this
change in a later patchset.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/mmc/core/mmc.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 4203303f946a..f97994eace3b 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1156,6 +1156,10 @@ static int mmc_select_hs400(struct mmc_card *card)
 	      host->ios.bus_width == MMC_BUS_WIDTH_8))
 		return 0;
 
+	/* Prepare host to downgrade to HS timing */
+	if (host->ops->hs400_downgrade)
+		host->ops->hs400_downgrade(host);
+
 	/* Switch card to HS mode */
 	val = EXT_CSD_TIMING_HS;
 	err = __mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
@@ -1171,10 +1175,6 @@ static int mmc_select_hs400(struct mmc_card *card)
 	/* Set host controller to HS timing */
 	mmc_set_timing(card->host, MMC_TIMING_MMC_HS);
 
-	/* Prepare host to downgrade to HS timing */
-	if (host->ops->hs400_downgrade)
-		host->ops->hs400_downgrade(host);
-
 	/* Reduce frequency to HS frequency */
 	max_dtr = card->ext_csd.hs_max_dtr;
 	mmc_set_clock(host, max_dtr);
@@ -1241,6 +1241,9 @@ int mmc_hs400_to_hs200(struct mmc_card *card)
 	int err;
 	u8 val;
 
+	if (host->ops->hs400_downgrade)
+		host->ops->hs400_downgrade(host);
+
 	/* Reduce frequency to HS */
 	max_dtr = card->ext_csd.hs_max_dtr;
 	mmc_set_clock(host, max_dtr);
@@ -1268,9 +1271,6 @@ int mmc_hs400_to_hs200(struct mmc_card *card)
 
 	mmc_set_timing(host, MMC_TIMING_MMC_HS);
 
-	if (host->ops->hs400_downgrade)
-		host->ops->hs400_downgrade(host);
-
 	err = mmc_switch_status(card, true);
 	if (err)
 		goto out_err;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-06-04 11:20 [PATCH 0/2] renesas_sdhi: fix hang when SCC loses its clock Wolfram Sang
  2020-06-04 11:20 ` [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier Wolfram Sang
@ 2020-06-04 11:20 ` Wolfram Sang
  2020-06-08  6:35   ` Yoshihiro Shimoda
  1 sibling, 1 reply; 13+ messages in thread
From: Wolfram Sang @ 2020-06-04 11:20 UTC (permalink / raw)
  To: linux-mmc; +Cc: linux-renesas-soc, Yoshihiro Shimoda, Wolfram Sang

Tuning procedure switches to lower frequencies but that will turn the
SCC off and accessing its register then will hang. So, flag when we are
tuning and keep the current setup of the external clock if we are doing
so. Note that we still switch to the lower frequency because of the
internal divider. We just make sure to not modify the external clock.
This patch depends on a MMC core patch calling the downgrade function
earlier.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/mmc/host/renesas_sdhi.h      |  1 +
 drivers/mmc/host/renesas_sdhi_core.c | 25 ++++++++++++++++++++++---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/renesas_sdhi.h b/drivers/mmc/host/renesas_sdhi.h
index 14c64caefc64..58a8c9133ba4 100644
--- a/drivers/mmc/host/renesas_sdhi.h
+++ b/drivers/mmc/host/renesas_sdhi.h
@@ -59,6 +59,7 @@ struct renesas_sdhi {
 	u32 scc_tappos;
 	u32 scc_tappos_hs400;
 	bool doing_tune;
+	bool keep_scc_freq;
 
 	/* Tuning values: 1 for success, 0 for failure */
 	DECLARE_BITMAP(taps, BITS_PER_LONG);
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index 15e21894bd44..589a59fb70eb 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -117,8 +117,12 @@ static unsigned int renesas_sdhi_clk_update(struct tmio_mmc_host *host,
 	unsigned int freq, diff, best_freq = 0, diff_min = ~0;
 	int i;
 
-	/* tested only on R-Car Gen2+ currently; may work for others */
-	if (!(host->pdata->flags & TMIO_MMC_MIN_RCAR2))
+	/*
+	 * We simply return the current rate if a) we are not on a R-Car Gen2+
+	 * SoC (may work for others, but untested) or b) if the SCC needs its
+	 * clock during tuning, so we don't change the external clock setup.
+	 */
+	if (!(host->pdata->flags & TMIO_MMC_MIN_RCAR2) || priv->keep_scc_freq)
 		return clk_get_rate(priv->clk);
 
 	/*
@@ -323,6 +327,9 @@ static void renesas_sdhi_hs400_complete(struct mmc_host *mmc)
 	u32 bad_taps = priv->quirks ? priv->quirks->hs400_bad_taps : 0;
 	bool use_4tap = priv->quirks && priv->quirks->hs400_4taps;
 
+	/* Tuning done, no special handling for SCC clock needed anymore */
+	priv->keep_scc_freq = false;
+
 	sd_ctrl_write16(host, CTL_SD_CARD_CLK_CTL, ~CLK_CTL_SCLKEN &
 		sd_ctrl_read16(host, CTL_SD_CARD_CLK_CTL));
 
@@ -401,6 +408,14 @@ static void renesas_sdhi_disable_scc(struct mmc_host *mmc)
 
 	sd_ctrl_write16(host, CTL_SD_CARD_CLK_CTL, CLK_CTL_SCLKEN |
 			sd_ctrl_read16(host, CTL_SD_CARD_CLK_CTL));
+
+	/*
+	 * During retune, we still access SCC registers, so keep clock on.
+	 * We have it here again because for retuning it is too late when we
+	 * only enable this flag when HS400 tuning gets initialized.
+	 */
+	if (mmc_doing_retune(mmc))
+		priv->keep_scc_freq = true;
 }
 
 static void renesas_sdhi_reset_hs400_mode(struct tmio_mmc_host *host,
@@ -427,8 +442,12 @@ static void renesas_sdhi_reset_hs400_mode(struct tmio_mmc_host *host,
 static int renesas_sdhi_prepare_hs400_tuning(struct mmc_host *mmc, struct mmc_ios *ios)
 {
 	struct tmio_mmc_host *host = mmc_priv(mmc);
+	struct renesas_sdhi *priv = host_to_priv(host);
+
+	/* During tuning, we still access SCC registers, so keep clock on */
+	priv->keep_scc_freq = true;
+	renesas_sdhi_reset_hs400_mode(host, priv);
 
-	renesas_sdhi_reset_hs400_mode(host, host_to_priv(host));
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier
  2020-06-04 11:20 ` [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier Wolfram Sang
@ 2020-06-08  6:02   ` Yoshihiro Shimoda
  2020-06-08 20:41     ` Wolfram Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Yoshihiro Shimoda @ 2020-06-08  6:02 UTC (permalink / raw)
  To: Wolfram Sang, linux-mmc; +Cc: linux-renesas-soc

Hi Wolfram-san,

Thank you for the patch!

> From: Wolfram Sang, Sent: Thursday, June 4, 2020 8:21 PM
> 
> The driver specific downgrade function makes more sense if we run it
> before we switch anything, not after we already switched. Otherwise some
> non-HS400 communication has already happened.
> 
> No need to convert users. There is only one currenty which needs this
> change in a later patchset.

Perhaps, should we add Fixes tag like below?

Fixes: ba6c7ac3a2f4 ("mmc: core: more fine-grained hooks for HS400 tuning")

> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
>  drivers/mmc/core/mmc.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> index 4203303f946a..f97994eace3b 100644
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1156,6 +1156,10 @@ static int mmc_select_hs400(struct mmc_card *card)
>  	      host->ios.bus_width == MMC_BUS_WIDTH_8))
>  		return 0;
> 
> +	/* Prepare host to downgrade to HS timing */
> +	if (host->ops->hs400_downgrade)
> +		host->ops->hs400_downgrade(host);
> +

IICU, we should call hs400_downgrade() between the __mmc_switch("EXT_CSD_TIMING_HS")
and mmc_set_timing(card->host, MMC_TIMING_MMC_HS) because the switch command should
be issued in HS400 mode.

>  	/* Switch card to HS mode */
>  	val = EXT_CSD_TIMING_HS;
>  	err = __mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
> @@ -1171,10 +1175,6 @@ static int mmc_select_hs400(struct mmc_card *card)
>  	/* Set host controller to HS timing */
>  	mmc_set_timing(card->host, MMC_TIMING_MMC_HS);
> 
> -	/* Prepare host to downgrade to HS timing */
> -	if (host->ops->hs400_downgrade)
> -		host->ops->hs400_downgrade(host);
> -
>  	/* Reduce frequency to HS frequency */
>  	max_dtr = card->ext_csd.hs_max_dtr;
>  	mmc_set_clock(host, max_dtr);
> @@ -1241,6 +1241,9 @@ int mmc_hs400_to_hs200(struct mmc_card *card)
>  	int err;
>  	u8 val;
> 
> +	if (host->ops->hs400_downgrade)
> +		host->ops->hs400_downgrade(host);
> +

IIUC, this also should be called between
__mmc_switch("EXT_CSD_TIMING_HS") to mmc_set_timing(MMC_TIMING_MMC_DDR52).

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-06-04 11:20 ` [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning Wolfram Sang
@ 2020-06-08  6:35   ` Yoshihiro Shimoda
  2020-06-08 21:27     ` Wolfram Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Yoshihiro Shimoda @ 2020-06-08  6:35 UTC (permalink / raw)
  To: Wolfram Sang, linux-mmc; +Cc: linux-renesas-soc

Hi Wolfram-san,

Thank you for the patch!

> From: Wolfram Sang, Sent: Thursday, June 4, 2020 8:21 PM
> 
> Tuning procedure switches to lower frequencies but that will turn the
> SCC off and accessing its register then will hang. So, flag when we are
> tuning and keep the current setup of the external clock if we are doing
> so. Note that we still switch to the lower frequency because of the
> internal divider. We just make sure to not modify the external clock.
> This patch depends on a MMC core patch calling the downgrade function
> earlier.
> 
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
>  drivers/mmc/host/renesas_sdhi.h      |  1 +
>  drivers/mmc/host/renesas_sdhi_core.c | 25 ++++++++++++++++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mmc/host/renesas_sdhi.h b/drivers/mmc/host/renesas_sdhi.h
> index 14c64caefc64..58a8c9133ba4 100644
> --- a/drivers/mmc/host/renesas_sdhi.h
> +++ b/drivers/mmc/host/renesas_sdhi.h
> @@ -59,6 +59,7 @@ struct renesas_sdhi {
>  	u32 scc_tappos;
>  	u32 scc_tappos_hs400;
>  	bool doing_tune;
> +	bool keep_scc_freq;
> 
>  	/* Tuning values: 1 for success, 0 for failure */
>  	DECLARE_BITMAP(taps, BITS_PER_LONG);
> diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
> index 15e21894bd44..589a59fb70eb 100644
> --- a/drivers/mmc/host/renesas_sdhi_core.c
> +++ b/drivers/mmc/host/renesas_sdhi_core.c
> @@ -117,8 +117,12 @@ static unsigned int renesas_sdhi_clk_update(struct tmio_mmc_host *host,
>  	unsigned int freq, diff, best_freq = 0, diff_min = ~0;
>  	int i;
> 
> -	/* tested only on R-Car Gen2+ currently; may work for others */
> -	if (!(host->pdata->flags & TMIO_MMC_MIN_RCAR2))
> +	/*
> +	 * We simply return the current rate if a) we are not on a R-Car Gen2+
> +	 * SoC (may work for others, but untested) or b) if the SCC needs its
> +	 * clock during tuning, so we don't change the external clock setup.
> +	 */
> +	if (!(host->pdata->flags & TMIO_MMC_MIN_RCAR2) || priv->keep_scc_freq)
>  		return clk_get_rate(priv->clk);
> 
>  	/*
> @@ -323,6 +327,9 @@ static void renesas_sdhi_hs400_complete(struct mmc_host *mmc)
>  	u32 bad_taps = priv->quirks ? priv->quirks->hs400_bad_taps : 0;
>  	bool use_4tap = priv->quirks && priv->quirks->hs400_4taps;
> 
> +	/* Tuning done, no special handling for SCC clock needed anymore */
> +	priv->keep_scc_freq = false;
> +

Setting keep_scc_freq to false is only here. But, I'm thinking
we should set it in some error paths like below somehow too:
 - error paths before hs400_complete() in mmc_select_hs400().
 - error path of mmc_execute_tuning() in mmc_retune().

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier
  2020-06-08  6:02   ` Yoshihiro Shimoda
@ 2020-06-08 20:41     ` Wolfram Sang
  2020-06-09 10:35       ` Yoshihiro Shimoda
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfram Sang @ 2020-06-08 20:41 UTC (permalink / raw)
  To: Yoshihiro Shimoda; +Cc: linux-mmc, linux-renesas-soc

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

Hi Shimoda-san,

thank you very much for the review!

> > The driver specific downgrade function makes more sense if we run it
> > before we switch anything, not after we already switched. Otherwise some
> > non-HS400 communication has already happened.
> > 
> > No need to convert users. There is only one currenty which needs this
> > change in a later patchset.
> 
> Perhaps, should we add Fixes tag like below?
> 
> Fixes: ba6c7ac3a2f4 ("mmc: core: more fine-grained hooks for HS400 tuning")

I am not sure. While it is more correct to move the call to
hs400_downgrade upwards, it does not really fix a bug on its own.
Without patch 2/2 of this series, there is not really a huge difference
when we disable the SCC the old way. For the new way, this patch is a
prerequisite.


> > +	/* Prepare host to downgrade to HS timing */
> > +	if (host->ops->hs400_downgrade)
> > +		host->ops->hs400_downgrade(host);
> > +
> 
> IICU, we should call hs400_downgrade() between the __mmc_switch("EXT_CSD_TIMING_HS")
> and mmc_set_timing(card->host, MMC_TIMING_MMC_HS) because the switch command should
> be issued in HS400 mode.

Yes, of course, you are right. Will fix!

Kind regards,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-06-08  6:35   ` Yoshihiro Shimoda
@ 2020-06-08 21:27     ` Wolfram Sang
  2020-06-09 10:41       ` Yoshihiro Shimoda
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfram Sang @ 2020-06-08 21:27 UTC (permalink / raw)
  To: Yoshihiro Shimoda; +Cc: linux-mmc, linux-renesas-soc

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]


> > +	/* Tuning done, no special handling for SCC clock needed anymore */
> > +	priv->keep_scc_freq = false;
> > +
> 
> Setting keep_scc_freq to false is only here. But, I'm thinking
> we should set it in some error paths like below somehow too:
>  - error paths before hs400_complete() in mmc_select_hs400().
>  - error path of mmc_execute_tuning() in mmc_retune().

Hmm, I guess you are right. That would kind of spoil my approach taken
here. Maybe we need another flag in the core like 'doing_tune' to
supplement 'doing_retune', so or driver knows when any kind of tuning is
going on?


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier
  2020-06-08 20:41     ` Wolfram Sang
@ 2020-06-09 10:35       ` Yoshihiro Shimoda
  0 siblings, 0 replies; 13+ messages in thread
From: Yoshihiro Shimoda @ 2020-06-09 10:35 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linux-mmc, linux-renesas-soc

Hi Wolfram-san,

> From: Wolfram Sang, Sent: Tuesday, June 9, 2020 5:41 AM
> 
> Hi Shimoda-san,
> 
> thank you very much for the review!
> 
> > > The driver specific downgrade function makes more sense if we run it
> > > before we switch anything, not after we already switched. Otherwise some
> > > non-HS400 communication has already happened.
> > >
> > > No need to convert users. There is only one currenty which needs this
> > > change in a later patchset.
> >
> > Perhaps, should we add Fixes tag like below?
> >
> > Fixes: ba6c7ac3a2f4 ("mmc: core: more fine-grained hooks for HS400 tuning")
> 
> I am not sure. While it is more correct to move the call to
> hs400_downgrade upwards, it does not really fix a bug on its own.
> Without patch 2/2 of this series, there is not really a huge difference
> when we disable the SCC the old way. For the new way, this patch is a
> prerequisite.

I got it. So, we should not add this Fixes tag.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-06-08 21:27     ` Wolfram Sang
@ 2020-06-09 10:41       ` Yoshihiro Shimoda
  2020-08-14  7:15         ` Wolfram Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Yoshihiro Shimoda @ 2020-06-09 10:41 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linux-mmc, linux-renesas-soc

Hi Wolfram-san,

> From: Wolfram Sang, Sent: Tuesday, June 9, 2020 6:27 AM
> 
> > > +	/* Tuning done, no special handling for SCC clock needed anymore */
> > > +	priv->keep_scc_freq = false;
> > > +
> >
> > Setting keep_scc_freq to false is only here. But, I'm thinking
> > we should set it in some error paths like below somehow too:
> >  - error paths before hs400_complete() in mmc_select_hs400().
> >  - error path of mmc_execute_tuning() in mmc_retune().
> 
> Hmm, I guess you are right. That would kind of spoil my approach taken
> here. Maybe we need another flag in the core like 'doing_tune' to
> supplement 'doing_retune', so or driver knows when any kind of tuning is
> going on?

Adding such a new flag is better, I think.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-06-09 10:41       ` Yoshihiro Shimoda
@ 2020-08-14  7:15         ` Wolfram Sang
  2020-08-28  0:51           ` Yoshihiro Shimoda
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfram Sang @ 2020-08-14  7:15 UTC (permalink / raw)
  To: Yoshihiro Shimoda; +Cc: linux-mmc, linux-renesas-soc

[-- Attachment #1: Type: text/plain, Size: 1658 bytes --]

Hi Shimoda-san,tftp 0x58000000 r8a77965-salvator-xs.dtb; tftp 0x50000000 Image-m3n-wsa; booti 0x50000000 - 0x58000000

> > > > +	/* Tuning done, no special handling for SCC clock needed anymore */
> > > > +	priv->keep_scc_freq = false;
> > > > +
> > >
> > > Setting keep_scc_freq to false is only here. But, I'm thinking
> > > we should set it in some error paths like below somehow too:
> > >  - error paths before hs400_complete() in mmc_select_hs400().
> > >  - error path of mmc_execute_tuning() in mmc_retune().
> > 
> > Hmm, I guess you are right. That would kind of spoil my approach taken
> > here. Maybe we need another flag in the core like 'doing_tune' to
> > supplement 'doing_retune', so or driver knows when any kind of tuning is
> > going on?
> 
> Adding such a new flag is better, I think.

So, I added a flag to the MMC core and I think it should work. However,
I can't test it currently because, sadly, the issue disappeared again :(

I even can't reproduce the issue with the same codebase and config which
I used when I was working last time on it. And back then, the issue was
happening. I am at a loss currently what really triggers this hang.

I added some code to enforce reading something from the SCC with the
hclk disabled. However, that reading works fine today here, no hang.

So, it seems that keeping hclk enabled will fix the hang. However, it
doesn't look like it will hang just when we allow to disable it. Seems
something else is part of the equation, too...

I kept trying to figure this out for the last two days, but no success
so far. Will keep you updated.

Thanks,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-08-14  7:15         ` Wolfram Sang
@ 2020-08-28  0:51           ` Yoshihiro Shimoda
  2020-09-01 10:24             ` Wolfram Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Yoshihiro Shimoda @ 2020-08-28  0:51 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linux-mmc, linux-renesas-soc

Hi Wolfram-san,

> From: Wolfram Sang, Sent: Friday, August 14, 2020 4:15 PM
> 
> > > > > +	/* Tuning done, no special handling for SCC clock needed anymore */
> > > > > +	priv->keep_scc_freq = false;
> > > > > +
> > > >
> > > > Setting keep_scc_freq to false is only here. But, I'm thinking
> > > > we should set it in some error paths like below somehow too:
> > > >  - error paths before hs400_complete() in mmc_select_hs400().
> > > >  - error path of mmc_execute_tuning() in mmc_retune().
> > >
> > > Hmm, I guess you are right. That would kind of spoil my approach taken
> > > here. Maybe we need another flag in the core like 'doing_tune' to
> > > supplement 'doing_retune', so or driver knows when any kind of tuning is
> > > going on?
> >
> > Adding such a new flag is better, I think.
> 
> So, I added a flag to the MMC core and I think it should work. However,
> I can't test it currently because, sadly, the issue disappeared again :(

I got a report from a colleague about this issue. According to the report,
this issue is related to retuning. When retuning happens, the mmc core
calls mmc_hs400_to_hs200() and then mmc_hs400_to_hs200() will set the clock
as 52MHz at first. So, it's possible to cause the issue.

It's difficult to cause retuning in normal situation. But, according to
the report, if we add a code which the sdhi driver reports an error
at the first CMD18 once, we can cause retuning and then the issue happens.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-08-28  0:51           ` Yoshihiro Shimoda
@ 2020-09-01 10:24             ` Wolfram Sang
  2020-09-01 13:39               ` Wolfram Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Wolfram Sang @ 2020-09-01 10:24 UTC (permalink / raw)
  To: Yoshihiro Shimoda; +Cc: linux-mmc, linux-renesas-soc

[-- Attachment #1: Type: text/plain, Size: 2672 bytes --]

Hi Shimoda-san,

> I got a report from a colleague about this issue. According to the report,
> this issue is related to retuning. When retuning happens, the mmc core
> calls mmc_hs400_to_hs200() and then mmc_hs400_to_hs200() will set the clock
> as 52MHz at first. So, it's possible to cause the issue.
> 
> It's difficult to cause retuning in normal situation. But, according to
> the report, if we add a code which the sdhi driver reports an error
> at the first CMD18 once, we can cause retuning and then the issue happens.

I took the liberty of a different approach because I wanted to reproduce
the issue when doing the initial tuning and not a retune. Because my new
series adds (and checks) a flag for doing_initial_tune, so I really
wanted to excercise this code path. This is a real problem, too, because
I saw this with my boards earlier back then.

And halleluja, today I saw it again, once. I switched to my H3-ES2.0
board which I haven't used for weeks. And when booting that for the
first time, I got a failure including logs. Later boots just went fine.

And because of the logs, I could finally inject an error which will
reproducibly cause the boot to hang because of a stalled SCC. Tada, here
is the patch:

From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Subject: [PATCH] GOLD: simulate stalled SCC

Geez, this took ages to find...
---
 drivers/mmc/core/mmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 216bd1aed373..6b3056437b37 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1218,6 +1218,7 @@ static int mmc_select_hs400(struct mmc_card *card)
 		host->ops->hs400_complete(host);
 
 	err = mmc_switch_status(card, true);
+	err = -EILSEQ;
 	if (err)
 		goto out_err;

Interestingly, the other mmc_switch_status() in mmc_select_hs400() was
not stalling the SCC. Anyhow, after this failute, the MMC core switches
back to 300kHz and the SCC clock is off but for some reason SCC is still
accessed. I will investigate why. The good news is that my new patch set
fixes the hang as expected. The board will continue to boot so we
probably want to have this series. However, I have the feeling that this
SCC access which hangs the board might be a bug because of an unintended
code path. I mean, this is also one reason why the bug triggers so
rarely these days. We have been fixing a lot of things and the SCC is
only accessed when it should be accessed. We will see. I also need to
test other boards, too.

So much for now, I hope I can report more later.

Happy hacking and kind regards,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning
  2020-09-01 10:24             ` Wolfram Sang
@ 2020-09-01 13:39               ` Wolfram Sang
  0 siblings, 0 replies; 13+ messages in thread
From: Wolfram Sang @ 2020-09-01 13:39 UTC (permalink / raw)
  To: Yoshihiro Shimoda; +Cc: linux-mmc, linux-renesas-soc

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]


> not stalling the SCC. Anyhow, after this failute, the MMC core switches
> back to 300kHz and the SCC clock is off but for some reason SCC is still
> accessed. I will investigate why. The good news is that my new patch set
> fixes the hang as expected. The board will continue to boot so we
> probably want to have this series. However, I have the feeling that this
> SCC access which hangs the board might be a bug because of an unintended
> code path. I mean, this is also one reason why the bug triggers so
> rarely these days. We have been fixing a lot of things and the SCC is
> only accessed when it should be accessed. We will see. I also need to
> test other boards, too.

Some more good news: I can reproduce the issue now not only with
H3-ES2.0 but also with my M3-N.

Interesting news: The hang comes from a code path I would have not
expected. It is not because of accessing an SCC register, it is this
line from renesas_sdhi_set_clock() which causes the issue:

186         sd_ctrl_write16(host, CTL_SD_CARD_CLK_CTL, clk & CLK_CTL_DIV_MASK);

I mean I can guess that the clock setting has something to do with the
SCC, but I can't see the direct connection with the documentation I
have.

I will stop that research here and will prepare now my series to leave
the SCC clock enabled as long as some tuning is in progress.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-01 13:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-04 11:20 [PATCH 0/2] renesas_sdhi: fix hang when SCC loses its clock Wolfram Sang
2020-06-04 11:20 ` [PATCH 1/2] mmc: core: when downgrading HS400, callback into drivers earlier Wolfram Sang
2020-06-08  6:02   ` Yoshihiro Shimoda
2020-06-08 20:41     ` Wolfram Sang
2020-06-09 10:35       ` Yoshihiro Shimoda
2020-06-04 11:20 ` [PATCH 2/2] mmc: renesas_sdhi: keep SCC clock active when tuning Wolfram Sang
2020-06-08  6:35   ` Yoshihiro Shimoda
2020-06-08 21:27     ` Wolfram Sang
2020-06-09 10:41       ` Yoshihiro Shimoda
2020-08-14  7:15         ` Wolfram Sang
2020-08-28  0:51           ` Yoshihiro Shimoda
2020-09-01 10:24             ` Wolfram Sang
2020-09-01 13:39               ` Wolfram Sang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).