From: Lucas Stach <l.stach@pengutronix.de>
To: BOUGH CHEN <haibo.chen@nxp.com>,
	Fabio Estevam <festevam@gmail.com>,
	Angus Ainslie <angus@akkea.ca>,
	Leonard Crestez <leonard.crestez@nxp.com>,
	Peng Fan <peng.fan@nxp.com>, Abel Vesa <abel.vesa@nxp.com>,
	Stephen Boyd <sboyd@kernel.org>,
	Michael Turquette <mturquette@baylibre.com>
Cc: "Ulf Hansson" <ulf.hansson@linaro.org>,
	"Guido Günther" <agx@sigxcpu.org>,
	linux-mmc <linux-mmc@vger.kernel.org>,
	"Adrian Hunter" <adrian.hunter@intel.com>,
	dl-linux-imx <linux-imx@nxp.com>,
	"Sascha Hauer" <kernel@pengutronix.de>,
	"moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: sdhci timeout on imx8mq
Date: Tue, 05 Jan 2021 16:06:49 +0100
Message-ID: <cd99776c0107833d69c9c7fc4c8d6ba1a41ea3d7.camel@pengutronix.de>
In-Reply-To: <VI1PR04MB52942233A0BA6BCB692F281E90670@VI1PR04MB5294.eurprd04.prod.outlook.com>

Hi all,

On Wednesday, July 8, 2020 at 01:32 +0000, BOUGH CHEN wrote:
> > -----Original Message-----
> > From: Fabio Estevam [mailto:festevam@gmail.com]
> > Sent: July 7, 2020 20:45
> > To: Angus Ainslie <angus@akkea.ca>
> > Cc: BOUGH CHEN <haibo.chen@nxp.com>;
> > Ulf Hansson <ulf.hansson@linaro.org>;
> > Guido Günther <agx@sigxcpu.org>;
> > linux-mmc <linux-mmc@vger.kernel.org>;
> > Adrian Hunter <adrian.hunter@intel.com>;
> > dl-linux-imx <linux-imx@nxp.com>;
> > Sascha Hauer <kernel@pengutronix.de>;
> > moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
> > <linux-arm-kernel@lists.infradead.org>
> > Subject: Re: sdhci timeout on imx8mq
> > 
> > Hi Angus,
> > 
> > On Tue, Jun 30, 2020 at 4:39 PM Angus Ainslie <angus@akkea.ca>
> > wrote:
> > 
> > > Has there been any progress with this? I'm getting this on about
> > > 50% of
> > 
> > Not from my side, sorry.
> > 
> > Bough,
> > 
> > Do you know why this problem affects the imx8mq-evk versions that
> > are populated with the Micron eMMC and not the ones with the
> > Sandisk eMMC?
> 
> Hi Angus,
> 
> Can you show me the full failure log? I don't see this issue on my
> side. Also, which U-Boot do you use?

I was finally able to bisect this issue, which wasn't much fun because
the issue isn't 100% reproducible. :/ It turns out that the issue is
even more interesting than I thought and likely has nothing to do with
SDHCI or the bootloader version in use. Here's my current debugging
state:

I've bisected the issue down to b04383b6a558 (clk: imx8mq: Define gates
for pll1/2 fixed dividers). The change itself looks fine to me, but I've
still CC'ed Leonard for good measure.

In my testing the following partial revert fixes the issue:

--- a/drivers/clk/imx/clk-imx8mq.c
+++ b/drivers/clk/imx/clk-imx8mq.c
@@ -365,7 +365,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
        hws[IMX8MQ_SYS1_PLL_133M_CG] = imx_clk_hw_gate("sys1_pll_133m_cg", "sys1_pll_out", base + 0x30, 15);
        hws[IMX8MQ_SYS1_PLL_160M_CG] = imx_clk_hw_gate("sys1_pll_160m_cg", "sys1_pll_out", base + 0x30, 17);
        hws[IMX8MQ_SYS1_PLL_200M_CG] = imx_clk_hw_gate("sys1_pll_200m_cg", "sys1_pll_out", base + 0x30, 19);
-       hws[IMX8MQ_SYS1_PLL_266M_CG] = imx_clk_hw_gate("sys1_pll_266m_cg", "sys1_pll_out", base + 0x30, 21);
        hws[IMX8MQ_SYS1_PLL_400M_CG] = imx_clk_hw_gate("sys1_pll_400m_cg", "sys1_pll_out", base + 0x30, 23);
        hws[IMX8MQ_SYS1_PLL_800M_CG] = imx_clk_hw_gate("sys1_pll_800m_cg", "sys1_pll_out", base + 0x30, 25);
 
@@ -375,7 +375,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
        hws[IMX8MQ_SYS1_PLL_133M] = imx_clk_hw_fixed_factor("sys1_pll_133m", "sys1_pll_133m_cg", 1, 6);
        hws[IMX8MQ_SYS1_PLL_160M] = imx_clk_hw_fixed_factor("sys1_pll_160m", "sys1_pll_160m_cg", 1, 5);
        hws[IMX8MQ_SYS1_PLL_200M] = imx_clk_hw_fixed_factor("sys1_pll_200m", "sys1_pll_200m_cg", 1, 4);
-       hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_266m_cg", 1, 3);
+       hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_out", 1, 3);
        hws[IMX8MQ_SYS1_PLL_400M] = imx_clk_hw_fixed_factor("sys1_pll_400m", "sys1_pll_400m_cg", 1, 2);
        hws[IMX8MQ_SYS1_PLL_800M] = imx_clk_hw_fixed_factor("sys1_pll_800m", "sys1_pll_800m_cg", 1, 1);

The sys1_pll_266m is the parent of nand_usdhc_bus. I've validated that
the SDHCI driver properly enables this bus clock across the problematic
card access. So what I think is happening here is that both
nand_usdhc_bus and sys1_pll_266m are initially enabled. Sometime during
boot sys1_pll_266m gets disabled due to runtime PM on the enet_axi
clock, which is a direct child of sys1_pll_266m. At this point
nand_usdhc_bus is still enabled, but no consumer has claimed the clock
yet, so the parent clock gets disabled while this branch of the clock
tree is still active.
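
To make the suspected sequence concrete, here is a toy model in C. It
is only a sketch under my own assumptions: struct toy_clk and the
toy_clk_* helpers are hypothetical and are not the kernel's common
clock framework. The only idea it borrows is that a gate is switched
off once its software enable count drops to zero, regardless of
whether an unclaimed sibling is still running in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; this is NOT the kernel's struct clk_core. */
struct toy_clk {
	const char *name;
	struct toy_clk *parent;
	unsigned int enable_count;	/* references held by consumers */
	bool hw_on;			/* actual gate state in hardware */
};

/* First enable of a clock also enables its parent, then opens the gate. */
static void toy_clk_enable(struct toy_clk *c)
{
	if (!c)
		return;
	if (c->enable_count++ == 0) {
		toy_clk_enable(c->parent);
		c->hw_on = true;
	}
}

/* Last disable closes the gate and drops the reference on the parent. */
static void toy_clk_disable(struct toy_clk *c)
{
	if (!c || c->enable_count == 0)
		return;
	if (--c->enable_count == 0) {
		c->hw_on = false;
		toy_clk_disable(c->parent);
	}
}

/* True if the clock runs in hardware while its parent is gated. */
static bool toy_clk_violates_rule(const struct toy_clk *c)
{
	return c->hw_on && c->parent && !c->parent->hw_on;
}

/* Replays the suspected boot sequence; returns true if the rule is broken. */
bool toy_orphan_scenario(void)
{
	/* Everything is left running by the bootloader, nothing is claimed. */
	struct toy_clk pll_266m = { "sys1_pll_266m", NULL, 0, true };
	struct toy_clk usdhc_bus = { "nand_usdhc_bus", &pll_266m, 0, true };
	struct toy_clk enet_axi = { "enet_axi", &pll_266m, 0, true };

	assert(!toy_clk_violates_rule(&usdhc_bus));

	/* Ethernet probes and then runtime-suspends before SDHCI claims its clock. */
	toy_clk_enable(&enet_axi);
	toy_clk_disable(&enet_axi);

	/* The parent is now gated, but nand_usdhc_bus is still on in hardware. */
	return toy_clk_violates_rule(&usdhc_bus);
}
```

In this model toy_orphan_scenario() reports a violation: the parent's
enable count drops to zero when enet_axi is released, so the gate is
closed even though nand_usdhc_bus was never turned off.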

The reference manual states about this situation: "For any clock, its
source must be left on when it is kept on. Behavior is undefined if
this rule is violated."
And it seems this is exactly what's happening here: some kind of glitch
is introduced in the nand_usdhc_bus clock, which prevents the SDHCI
controller from working, even though the clock branch is properly
enabled later on. On my system, the SDHCI timeout and the subsequent
runtime suspend/resume cycle on the nand_usdhc_bus clock seem to get it
back into a working state.

So I think we need some solution at the clock driver/framework level to
prevent shutting down parent clocks that have active branches, even if
those branches aren't claimed by a consumer (yet).
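
As a rough illustration of that rule, here is a toy model in C that
refuses to gate a parent while any child is still running in hardware.
This is only a sketch under my own assumptions, not a proposal for the
actual implementation: struct toy_node and the toy_* helpers are
hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Toy model only; names and layout are made up for illustration. */
struct toy_node {
	const char *name;
	struct toy_node *parent;
	struct toy_node *children[TOY_MAX_CHILDREN];
	size_t num_children;
	unsigned int enable_count;	/* references held by consumers */
	bool hw_on;			/* actual gate state in hardware */
};

static void toy_add_child(struct toy_node *parent, struct toy_node *child)
{
	if (parent->num_children < TOY_MAX_CHILDREN) {
		parent->children[parent->num_children++] = child;
		child->parent = parent;
	}
}

/* True if any direct child is running in hardware, claimed or not. */
static bool toy_has_running_child(const struct toy_node *n)
{
	for (size_t i = 0; i < n->num_children; i++)
		if (n->children[i]->hw_on)
			return true;
	return false;
}

static void toy_enable(struct toy_node *n)
{
	if (!n)
		return;
	if (n->enable_count++ == 0) {
		toy_enable(n->parent);
		n->hw_on = true;
	}
}

static void toy_disable(struct toy_node *n)
{
	if (!n || n->enable_count == 0)
		return;
	if (--n->enable_count > 0)
		return;
	/* Keep the gate open while an unclaimed child still depends on it. */
	if (!toy_has_running_child(n))
		n->hw_on = false;
	toy_disable(n->parent);
}

/* Same boot sequence as described above; true if both clocks stay on. */
bool toy_guarded_scenario(void)
{
	struct toy_node pll_266m = { .name = "sys1_pll_266m", .hw_on = true };
	struct toy_node usdhc_bus = { .name = "nand_usdhc_bus", .hw_on = true };
	struct toy_node enet_axi = { .name = "enet_axi", .hw_on = true };

	toy_add_child(&pll_266m, &usdhc_bus);
	toy_add_child(&pll_266m, &enet_axi);

	toy_enable(&enet_axi);
	toy_disable(&enet_axi);

	return pll_266m.hw_on && usdhc_bus.hw_on;
}
```

With the extra check, releasing enet_axi gates only enet_axi itself;
sys1_pll_266m stays on because nand_usdhc_bus is still running.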

Regards,
Lucas

