linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] mmc: core: adjust polling interval for CMD1
@ 2022-02-17 20:11 H. Nikolaus Schaller
  2022-03-01 13:38 ` Ulf Hansson
  0 siblings, 1 reply; 9+ messages in thread
From: H. Nikolaus Schaller @ 2022-02-17 20:11 UTC (permalink / raw)
  To: Huijin Park, Ulf Hansson
  Cc: Jean Rene Dawin, linux-kernel,
	Discussions about the Letux Kernel, Linux-OMAP, linux-mmc,
	Tony Lindgren

Hi,
Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
with our disto kernel based on v5.17-rc1:

>> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
>> Black, but didn't check the logs.
>> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
>> traffic there were messages like
>> 
>> [  662.529584] mmc1: error -110 doing runtime resume
>> [  669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
>> 
>> [  739.076072] mmc1: Timeout waiting for hardware interrupt.
>> [  739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
>> [  739.231053] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101
>> [  739.316472] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
>> [  739.401937] mmc1: sdhci: Argument:  0x00342d30 | Trn mode: 0x00000023
>> [  739.487439] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000
>> [  739.573007] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
>> [  739.658609] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00003c07
>> [  739.744224] mmc1: sdhci: Timeout:   0x00000007 | Int stat: 0x00000002
>> [  739.829896] mmc1: sdhci: Int enab:  0x027f000b | Sig enab: 0x027f000b
>> [  739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>> [  740.001394] mmc1: sdhci: Caps:      0x07e10080 | Caps_1:   0x00000000
>> [  740.087208] mmc1: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
>> [  740.173051] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
>> [  740.258928] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>> [  740.344854] mmc1: sdhci: Host ctl2: 0x00000000
>> [  740.402796] mmc1: sdhci: ============================================
>> 
>> and finally IO errors and a corrupted filesystem.
>> 
>> 5.17.0-rc4-letux+ shows the same behaviour.

I checked with my BeagleBoard Black (am3358) and can confirm this observation.
It happens only with the integrated eMMC but not with the µSD connected to
the other mmc interface.

A git bisect found:

76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
Author: Huijin Park <huijin.park@samsung.com>
Date:   Thu Nov 4 15:32:31 2021 +0900

  mmc: core: adjust polling interval for CMD1

  In mmc_send_op_cond(), loops are continuously performed at the same
  interval of 10 ms.  However the behaviour is not good for some eMMC
  which can be out from a busy state earlier than 10 ms if normal.

  Rather than fixing about the interval time in mmc_send_op_cond(),
  let's instead convert into using the common __mmc_poll_for_busy().

  The reason for adjusting the interval time is that it is important
  to reduce the eMMC initialization time, especially in devices that
  use eMMC as rootfs.

  Test log(eMMC:KLM8G1GETF-B041):

  before: 12 ms (0.311016 - 0.298729)
  [    0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
  [    0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
  [    0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
  [    0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007

  after: 2 ms (0.301270 - 0.298762)
  [    0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
  [    0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
  [    0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
  [    0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
  [    0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
  [    0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
  [    0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
  [    0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007

  Signed-off-by: Huijin Park <huijin.park@samsung.com>
  Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@samsung.com
  Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Reverting this makes v5.17-rc[1-4] work.

Any suggestions or fixes?

BR and thanks,
Nikolaus Schaller

Reported-by: jdawin@math.uni-bielefeld.de


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-02-17 20:11 [BUG] mmc: core: adjust polling interval for CMD1 H. Nikolaus Schaller
@ 2022-03-01 13:38 ` Ulf Hansson
  2022-03-02  8:20   ` Jean Rene Dawin
  0 siblings, 1 reply; 9+ messages in thread
From: Ulf Hansson @ 2022-03-01 13:38 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Huijin Park, Jean Rene Dawin, linux-kernel,
	Discussions about the Letux Kernel, Linux-OMAP, linux-mmc,
	Tony Lindgren

On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
> Hi,
> Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
> with our disto kernel based on v5.17-rc1:
>
> >> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
> >> Black, but didn't check the logs.
> >> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
> >> traffic there were messages like
> >>
> >> [  662.529584] mmc1: error -110 doing runtime resume
> >> [  669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >>
> >> [  739.076072] mmc1: Timeout waiting for hardware interrupt.
> >> [  739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> >> [  739.231053] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101
> >> [  739.316472] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
> >> [  739.401937] mmc1: sdhci: Argument:  0x00342d30 | Trn mode: 0x00000023
> >> [  739.487439] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000
> >> [  739.573007] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> >> [  739.658609] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00003c07
> >> [  739.744224] mmc1: sdhci: Timeout:   0x00000007 | Int stat: 0x00000002
> >> [  739.829896] mmc1: sdhci: Int enab:  0x027f000b | Sig enab: 0x027f000b
> >> [  739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> >> [  740.001394] mmc1: sdhci: Caps:      0x07e10080 | Caps_1:   0x00000000
> >> [  740.087208] mmc1: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
> >> [  740.173051] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> >> [  740.258928] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> >> [  740.344854] mmc1: sdhci: Host ctl2: 0x00000000
> >> [  740.402796] mmc1: sdhci: ============================================
> >>
> >> and finally IO errors and a corrupted filesystem.
> >>
> >> 5.17.0-rc4-letux+ shows the same behaviour.
>
> I checked with my BeagleBoard Black (am3358) and can confirm this observation.
> It happens only with the integrated eMMC but not with the µSD connected to
> the other mmc interface.
>
> A git bisect found:
>
> 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
> commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> Author: Huijin Park <huijin.park@samsung.com>
> Date:   Thu Nov 4 15:32:31 2021 +0900
>
>   mmc: core: adjust polling interval for CMD1
>
>   In mmc_send_op_cond(), loops are continuously performed at the same
>   interval of 10 ms.  However the behaviour is not good for some eMMC
>   which can be out from a busy state earlier than 10 ms if normal.
>
>   Rather than fixing about the interval time in mmc_send_op_cond(),
>   let's instead convert into using the common __mmc_poll_for_busy().
>
>   The reason for adjusting the interval time is that it is important
>   to reduce the eMMC initialization time, especially in devices that
>   use eMMC as rootfs.
>
>   Test log(eMMC:KLM8G1GETF-B041):
>
>   before: 12 ms (0.311016 - 0.298729)
>   [    0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
>   [    0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
>   [    0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
>   [    0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007
>
>   after: 2 ms (0.301270 - 0.298762)
>   [    0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
>   [    0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
>   [    0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
>   [    0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
>   [    0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007
>
>   Signed-off-by: Huijin Park <huijin.park@samsung.com>
>   Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@samsung.com
>   Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> Reverting this makes v5.17-rc[1-4] work.
>
> Any suggestions or fixes?
>
> BR and thanks,
> Nikolaus Schaller
>
> Reported-by: jdawin@math.uni-bielefeld.de
>

Thanks for reporting and bisecting!

What changed with the offending commit is two things:

1) We are sending the CMD1 more frequently, initially in the loop in
the __mmc_poll_for_busy. Step by step, we increase the polling period.
2) We may end up using a slightly shorter total timeout for polling
time, compared to what we used before the offending commit.

Hopefully the problem is related to 2), in which case I think the
below patch should help. Can you please give it a try?

Kind regards
Uffe

From: Ulf Hansson <ulf.hansson@linaro.org>
Date: Tue, 1 Mar 2022 14:24:21 +0100
Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND

It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
a bit too small due to recent changes. Therefore, let's extend it to 2s,
which is probably more inline with its previous value, to fix the reported
timeout problems.

While at it, let's add a define for the timeout value, rather than using
a hard-coded value for it.

Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Huijin Park <huijin.park@samsung.com>
Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/mmc_ops.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index d63d1c735335..1f57174b3cf3 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -21,6 +21,7 @@

 #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
 #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
+#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */

 static const u8 tuning_blk_pattern_4bit[] = {
        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
@@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

-       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
+       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
+                                 &__mmc_send_op_cond_cb, &cb_data);
        if (err)
                return err;

-- 
2.25.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-03-01 13:38 ` Ulf Hansson
@ 2022-03-02  8:20   ` Jean Rene Dawin
  2022-03-02  9:40     ` H. Nikolaus Schaller
  0 siblings, 1 reply; 9+ messages in thread
From: Jean Rene Dawin @ 2022-03-02  8:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: H. Nikolaus Schaller, Huijin Park, linux-kernel,
	Discussions about the Letux Kernel, Linux-OMAP, linux-mmc,
	Tony Lindgren

Ulf Hansson wrote on Tue  1/03/22 14:38:
> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >
> > Hi,
> > Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
> > with our disto kernel based on v5.17-rc1:
> >
> > >> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
> > >> Black, but didn't check the logs.
> > >> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
> > >> traffic there were messages like
> > >>
> > >> [  662.529584] mmc1: error -110 doing runtime resume
> > >> [  669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
> > >>
> > >> [  739.076072] mmc1: Timeout waiting for hardware interrupt.
> > >> [  739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> > >> [  739.231053] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101
> > >> [  739.316472] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
> > >> [  739.401937] mmc1: sdhci: Argument:  0x00342d30 | Trn mode: 0x00000023
> > >> [  739.487439] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000
> > >> [  739.573007] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> > >> [  739.658609] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00003c07
> > >> [  739.744224] mmc1: sdhci: Timeout:   0x00000007 | Int stat: 0x00000002
> > >> [  739.829896] mmc1: sdhci: Int enab:  0x027f000b | Sig enab: 0x027f000b
> > >> [  739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> > >> [  740.001394] mmc1: sdhci: Caps:      0x07e10080 | Caps_1:   0x00000000
> > >> [  740.087208] mmc1: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
> > >> [  740.173051] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> > >> [  740.258928] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> > >> [  740.344854] mmc1: sdhci: Host ctl2: 0x00000000
> > >> [  740.402796] mmc1: sdhci: ============================================
> > >>
> > >> and finally IO errors and a corrupted filesystem.
> > >>
> > >> 5.17.0-rc4-letux+ shows the same behaviour.
> >
> > I checked with my BeagleBoard Black (am3358) and can confirm this observation.
> > It happens only with the integrated eMMC but not with the µSD connected to
> > the other mmc interface.
> >
> > A git bisect found:
> >
> > 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
> > commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> > Author: Huijin Park <huijin.park@samsung.com>
> > Date:   Thu Nov 4 15:32:31 2021 +0900
> >
> >   mmc: core: adjust polling interval for CMD1
> >
> >   In mmc_send_op_cond(), loops are continuously performed at the same
> >   interval of 10 ms.  However the behaviour is not good for some eMMC
> >   which can be out from a busy state earlier than 10 ms if normal.
> >
> >   Rather than fixing about the interval time in mmc_send_op_cond(),
> >   let's instead convert into using the common __mmc_poll_for_busy().
> >
> >   The reason for adjusting the interval time is that it is important
> >   to reduce the eMMC initialization time, especially in devices that
> >   use eMMC as rootfs.
> >
> >   Test log(eMMC:KLM8G1GETF-B041):
> >
> >   before: 12 ms (0.311016 - 0.298729)
> >   [    0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
> >   [    0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> >   [    0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> >   [    0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007
> >
> >   after: 2 ms (0.301270 - 0.298762)
> >   [    0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
> >   [    0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> >   [    0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> >   [    0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007
> >
> >   Signed-off-by: Huijin Park <huijin.park@samsung.com>
> >   Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@samsung.com
> >   Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> >
> > Reverting this makes v5.17-rc[1-4] work.
> >
> > Any suggestions or fixes?
> >
> > BR and thanks,
> > Nikolaus Schaller
> >
> > Reported-by: jdawin@math.uni-bielefeld.de
> >
> 
> Thanks for reporting and bisecting!
> 
> What changed with the offending commit is two things:
> 
> 1) We are sending the CMD1 more frequently, initially in the loop in
> the __mmc_poll_for_busy. Step by step, we increase the polling period.
> 2) We may end up using a slightly shorter total timeout for polling
> time, compared to what we used before the offending commit.
> 
> Hopefully the problem is related to 2), in which case I think the
> below patch should help. Can you please give it a try?
> 
> Kind regards
> Uffe
> 
> From: Ulf Hansson <ulf.hansson@linaro.org>
> Date: Tue, 1 Mar 2022 14:24:21 +0100
> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> 
> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> which is probably more inline with its previous value, to fix the reported
> timeout problems.
> 
> While at it, let's add a define for the timeout value, rather than using
> a hard-coded value for it.
> 
> Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
> Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
> Cc: Huijin Park <huijin.park@samsung.com>
> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/mmc/core/mmc_ops.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..1f57174b3cf3 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
> 
>  #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
>  #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
> 
>  static const u8 tuning_blk_pattern_4bit[] = {
>         0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
>         cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>         cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> 
> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> +                                 &__mmc_send_op_cond_cb, &cb_data);
>         if (err)
>                 return err;
> 
> -- 
> 2.25.1

Hi,

thanks. But testing with this patch still gives the same errors:

[   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
[   52.273380] mmc1: error -110 doing runtime resume

and the system gets stuck eventually.

Regards,
Jean Rene Dawin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-03-02  8:20   ` Jean Rene Dawin
@ 2022-03-02  9:40     ` H. Nikolaus Schaller
  2022-03-03 10:40       ` Ulf Hansson
  0 siblings, 1 reply; 9+ messages in thread
From: H. Nikolaus Schaller @ 2022-03-02  9:40 UTC (permalink / raw)
  To: Jean Rene Dawin, Ulf Hansson
  Cc: Huijin Park, linux-kernel, Discussions about the Letux Kernel,
	Linux-OMAP, linux-mmc, Tony Lindgren

Hi,

> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@math.uni-bielefeld.de>:
> 
> Ulf Hansson wrote on Tue  1/03/22 14:38:
>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>> 
>> 
>> From: Ulf Hansson <ulf.hansson@linaro.org>
>> Date: Tue, 1 Mar 2022 14:24:21 +0100
>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
>> 
>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
>> which is probably more inline with its previous value, to fix the reported
>> timeout problems.
>> 
>> While at it, let's add a define for the timeout value, rather than using
>> a hard-coded value for it.
>> 
>> Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
>> Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
>> Cc: Huijin Park <huijin.park@samsung.com>
>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>> ---
>> drivers/mmc/core/mmc_ops.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>> index d63d1c735335..1f57174b3cf3 100644
>> --- a/drivers/mmc/core/mmc_ops.c
>> +++ b/drivers/mmc/core/mmc_ops.c
>> @@ -21,6 +21,7 @@
>> 
>> #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
>> #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
>> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
>> 
>> static const u8 tuning_blk_pattern_4bit[] = {
>>        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
>> ocr, u32 *rocr)
>>        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>>        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>> 
>> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
>> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
>> +                                 &__mmc_send_op_cond_cb, &cb_data);
>>        if (err)
>>                return err;
>> 
>> -- 
>> 2.25.1
> 
> Hi,
> 
> thanks. But testing with this patch still gives the same errors:
> 
> [   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> [   52.273380] mmc1: error -110 doing runtime resume
> 
> and the system gets stuck eventually.

Same result from my tests.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-03-02  9:40     ` H. Nikolaus Schaller
@ 2022-03-03 10:40       ` Ulf Hansson
  2022-03-04  9:28         ` Jean Rene Dawin
  0 siblings, 1 reply; 9+ messages in thread
From: Ulf Hansson @ 2022-03-03 10:40 UTC (permalink / raw)
  To: Jean Rene Dawin, H. Nikolaus Schaller
  Cc: Huijin Park, linux-kernel, Discussions about the Letux Kernel,
	Linux-OMAP, linux-mmc, Tony Lindgren

On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
> Hi,
>
> > Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@math.uni-bielefeld.de>:
> >
> > Ulf Hansson wrote on Tue  1/03/22 14:38:
> >> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >>>
> >>
> >> From: Ulf Hansson <ulf.hansson@linaro.org>
> >> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>
> >> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >> which is probably more inline with its previous value, to fix the reported
> >> timeout problems.
> >>
> >> While at it, let's add a define for the timeout value, rather than using
> >> a hard-coded value for it.
> >>
> >> Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
> >> Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
> >> Cc: Huijin Park <huijin.park@samsung.com>
> >> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> >> ---
> >> drivers/mmc/core/mmc_ops.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >> index d63d1c735335..1f57174b3cf3 100644
> >> --- a/drivers/mmc/core/mmc_ops.c
> >> +++ b/drivers/mmc/core/mmc_ops.c
> >> @@ -21,6 +21,7 @@
> >>
> >> #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
> >> #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
> >> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
> >>
> >> static const u8 tuning_blk_pattern_4bit[] = {
> >>        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >> ocr, u32 *rocr)
> >>        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >>        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>
> >> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >> +                                 &__mmc_send_op_cond_cb, &cb_data);
> >>        if (err)
> >>                return err;
> >>
> >> --
> >> 2.25.1
> >
> > Hi,
> >
> > thanks. But testing with this patch still gives the same errors:
> >
> > [   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> > [   52.273380] mmc1: error -110 doing runtime resume
> >
> > and the system gets stuck eventually.
>
> Same result from my tests.
>
> BR and thanks,
> Nikolaus

Alright, it starts to sound like we might need a revert (or at least a
way to restore the previous behaviour) - even if that would just paper
over the real problem. The real problem is more likely related to how
the host driver manages restoring of the power to the card, which
happens when runtime resuming it.

In any case, just to make sure the loop timeout itself isn't the
problem, can you run the below debug patch please? The intent is to
figure out how long the worst case timeout is, when it's working with
CMD1. As soon as the timeout exceeds the worst case, there is a
message printed to the log.

The below patch is based upon that the offending commit has been reverted.

Kind regards
Uffe

From: Ulf Hansson <ulf.hansson@linaro.org>
Date: Thu, 3 Mar 2022 11:00:04 +0100
Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/mmc_ops.c | 11 +++++++++++
 include/linux/mmc/host.h   |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 9946733a34c6..3f03d9e8a3a4 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
 {
        struct mmc_command cmd = {};
        int i, err = 0;
+       s64 cmd1_ms;
+       ktime_t time_start;

        cmd.opcode = MMC_SEND_OP_COND;
        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

+       time_start = ktime_get();
+
        for (i = 100; i; i--) {
                err = mmc_wait_for_cmd(host, &cmd, 0);
                if (err)
@@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
                        cmd.arg = cmd.resp[0] | BIT(30);
        }

+       cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
+       if (!err && cmd1_ms > host->cmd1_ms) {
+               pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
+                       mmc_hostname(host), cmd1_ms, i);
+               host->cmd1_ms = cmd1_ms;
+       }
+
        if (rocr && !mmc_host_is_spi(host))
                *rocr = cmd.resp[0];

diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 7afb57cab00b..c2ca3bb05620 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -301,6 +301,7 @@ struct mmc_host {
        unsigned int            f_min;
        unsigned int            f_max;
        unsigned int            f_init;
+       s64                     cmd1_ms;
        u32                     ocr_avail;
        u32                     ocr_avail_sdio; /* SDIO-specific OCR */
        u32                     ocr_avail_sd;   /* SD-specific OCR */
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-03-03 10:40       ` Ulf Hansson
@ 2022-03-04  9:28         ` Jean Rene Dawin
  2022-03-04 10:08           ` Ulf Hansson
  0 siblings, 1 reply; 9+ messages in thread
From: Jean Rene Dawin @ 2022-03-04  9:28 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: H. Nikolaus Schaller, Huijin Park, linux-kernel,
	Discussions about the Letux Kernel, Linux-OMAP, linux-mmc,
	Tony Lindgren

Ulf Hansson wrote on Thu  3/03/22 11:40:
> On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <hns@goldelico.com> wrote:
> 
> Alright, it starts to sound like we might need a revert (or at least a
> way to restore the previous behaviour) - even if that would just paper
> over the real problem. The real problem is more likely related to how
> the host driver manages restoring of the power to the card, which
> happens when runtime resuming it.
> 
> In any case, just to make sure the loop timeout itself isn't the
> problem, can you run the below debug patch please? The intent is to
> figure out how long the worst case timeout is, when it's working with
> CMD1. As soon as the timeout exceeds the worst case, there is a
> message printed to the log.
> 
> The below patch is based upon that the offending commit has been reverted.
> 
> Kind regards
> Uffe
> 
> From: Ulf Hansson <ulf.hansson@linaro.org>
> Date: Thu, 3 Mar 2022 11:00:04 +0100
> Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop
> 
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/mmc/core/mmc_ops.c | 11 +++++++++++
>  include/linux/mmc/host.h   |  1 +
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index 9946733a34c6..3f03d9e8a3a4 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
>  {
>         struct mmc_command cmd = {};
>         int i, err = 0;
> +       s64 cmd1_ms;
> +       ktime_t time_start;
> 
>         cmd.opcode = MMC_SEND_OP_COND;
>         cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>         cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> 
> +       time_start = ktime_get();
> +
>         for (i = 100; i; i--) {
>                 err = mmc_wait_for_cmd(host, &cmd, 0);
>                 if (err)
> @@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
>                         cmd.arg = cmd.resp[0] | BIT(30);
>         }
> 
> +       cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
> +       if (!err && cmd1_ms > host->cmd1_ms) {
> +               pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
> +                       mmc_hostname(host), cmd1_ms, i);
> +               host->cmd1_ms = cmd1_ms;
> +       }
> +
>         if (rocr && !mmc_host_is_spi(host))
>                 *rocr = cmd.resp[0];
> 
> diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
> index 7afb57cab00b..c2ca3bb05620 100644
> --- a/include/linux/mmc/host.h
> +++ b/include/linux/mmc/host.h
> @@ -301,6 +301,7 @@ struct mmc_host {
>         unsigned int            f_min;
>         unsigned int            f_max;
>         unsigned int            f_init;
> +       s64                     cmd1_ms;
>         u32                     ocr_avail;
>         u32                     ocr_avail_sdio; /* SDIO-specific OCR */
>         u32                     ocr_avail_sd;   /* SD-specific OCR */
> -- 
> 2.25.1

Hi,

thanks. With this patch applied with reverted 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
I don't get the "stuck" errors in __mmc_poll_for_busy and only one message of

[    1.095194] mmc1: CMD1 timeout increased to 64 ms, loop=98

Regards,
Jean Rene Dawin



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-03-04  9:28         ` Jean Rene Dawin
@ 2022-03-04 10:08           ` Ulf Hansson
  0 siblings, 0 replies; 9+ messages in thread
From: Ulf Hansson @ 2022-03-04 10:08 UTC (permalink / raw)
  To: Jean Rene Dawin
  Cc: H. Nikolaus Schaller, Huijin Park, linux-kernel,
	Discussions about the Letux Kernel, Linux-OMAP, linux-mmc,
	Tony Lindgren

On Fri, 4 Mar 2022 at 10:28, Jean Rene Dawin
<jdawin@math.uni-bielefeld.de> wrote:
>
> Ulf Hansson wrote on Thu  3/03/22 11:40:
> > On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >
> > Alright, it starts to sound like we might need a revert (or at least a
> > way to restore the previous behaviour) - even if that would just paper
> > over the real problem. The real problem is more likely related to how
> > the host driver manages restoring of the power to the card, which
> > happens when runtime resuming it.
> >
> > In any case, just to make sure the loop timeout itself isn't the
> > problem, can you run the below debug patch please? The intent is to
> > figure out how long the worst case timeout is, when it's working with
> > CMD1. As soon as the timeout exceeds the worst case, there is a
> > message printed to the log.
> >
> > The below patch is based upon that the offending commit has been reverted.
> >
> > Kind regards
> > Uffe
> >
> > From: Ulf Hansson <ulf.hansson@linaro.org>
> > Date: Thu, 3 Mar 2022 11:00:04 +0100
> > Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  drivers/mmc/core/mmc_ops.c | 11 +++++++++++
> >  include/linux/mmc/host.h   |  1 +
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> > index 9946733a34c6..3f03d9e8a3a4 100644
> > --- a/drivers/mmc/core/mmc_ops.c
> > +++ b/drivers/mmc/core/mmc_ops.c
> > @@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> > ocr, u32 *rocr)
> >  {
> >         struct mmc_command cmd = {};
> >         int i, err = 0;
> > +       s64 cmd1_ms;
> > +       ktime_t time_start;
> >
> >         cmd.opcode = MMC_SEND_OP_COND;
> >         cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >         cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >
> > +       time_start = ktime_get();
> > +
> >         for (i = 100; i; i--) {
> >                 err = mmc_wait_for_cmd(host, &cmd, 0);
> >                 if (err)
> > @@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> > ocr, u32 *rocr)
> >                         cmd.arg = cmd.resp[0] | BIT(30);
> >         }
> >
> > +       cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
> > +       if (!err && cmd1_ms > host->cmd1_ms) {
> > +               pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
> > +                       mmc_hostname(host), cmd1_ms, i);
> > +               host->cmd1_ms = cmd1_ms;
> > +       }
> > +
> >         if (rocr && !mmc_host_is_spi(host))
> >                 *rocr = cmd.resp[0];
> >
> > diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
> > index 7afb57cab00b..c2ca3bb05620 100644
> > --- a/include/linux/mmc/host.h
> > +++ b/include/linux/mmc/host.h
> > @@ -301,6 +301,7 @@ struct mmc_host {
> >         unsigned int            f_min;
> >         unsigned int            f_max;
> >         unsigned int            f_init;
> > +       s64                     cmd1_ms;
> >         u32                     ocr_avail;
> >         u32                     ocr_avail_sdio; /* SDIO-specific OCR */
> >         u32                     ocr_avail_sd;   /* SD-specific OCR */
> > --
> > 2.25.1
>
> Hi,
>
> thanks. With this patch applied with reverted 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> I don't get the "stuck" errors in __mmc_poll_for_busy and only one message of
>
> [    1.095194] mmc1: CMD1 timeout increased to 64 ms, loop=98

Okay, so for some reason polling doesn't work, if we poll too
frequently. Clearly the 1s total timeout should be sufficient.

I will prepare a patch that restores the old behaviour and post it
within a few minutes.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mmc: core: adjust polling interval for CMD1
  2022-03-03 12:16 ` Huijin Park
@ 2022-03-04  9:42   ` Jean Rene Dawin
  0 siblings, 0 replies; 9+ messages in thread
From: Jean Rene Dawin @ 2022-03-04  9:42 UTC (permalink / raw)
  To: Huijin Park
  Cc: H . Nikolaus Schaller, linux-kernel, linux-omap, linux-mmc,
	letux-kernel, Ulf Hansson, tony, bbanghj.park

Huijin Park wrote on Thu  3/03/22 21:16:
> Hi, sorry for the late reply.
> I guess the problem occurs depending on the eMMC model.
> Because I tested again and there was no problem.
> The eMMC model used for the test are as follows.
> (THGBMJG6C1LBAIL, KLM8G1GETF-B041)
> Anyway I guess the cause is interval time.
> I wrote a debug patch assuming that the reason was that some mmc models
> could not process CMD1 delivered at short intervals.
> I copied the polling function and adds interval minimum time parameter.
> I set the minimum time to 1ms. You can adjust it.
> Please test if there is no problem with the debug patch.
> 
> >Hi,
> >
> >> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@math.uni-bielefeld.de>:
> >> 
> >> Ulf Hansson wrote on Tue  1/03/22 14:38:
> >>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >>>> 
> >>> 
> >>> From: Ulf Hansson <ulf.hansson@linaro.org>
> >>> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>> 
> >>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >>> which is probably more inline with its previous value, to fix the reported
> >>> timeout problems.
> >>> 
> >>> While at it, let's add a define for the timeout value, rather than using
> >>> a hard-coded value for it.
> >>> 
> >>> Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
> >>> Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
> >>> Cc: Huijin Park <huijin.park@samsung.com>
> >>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> >>> ---
> >>> drivers/mmc/core/mmc_ops.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >>> index d63d1c735335..1f57174b3cf3 100644
> >>> --- a/drivers/mmc/core/mmc_ops.c
> >>> +++ b/drivers/mmc/core/mmc_ops.c
> >>> @@ -21,6 +21,7 @@
> >>> 
> >>> #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
> >>> #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
> >>> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
> >>> 
> >>> static const u8 tuning_blk_pattern_4bit[] = {
> >>>        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >>> ocr, u32 *rocr)
> >>>        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >>>        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>> 
> >>> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >>> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >>> +                                 &__mmc_send_op_cond_cb, &cb_data);
> >>>        if (err)
> >>>                return err;
> >>> 
> >>> -- 
> >>> 2.25.1
> >> 
> >> Hi,
> >> 
> >> thanks. But testing with this patch still gives the same errors:
> >> 
> >> [   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >> [   52.273380] mmc1: error -110 doing runtime resume
> >> 
> >> and the system gets stuck eventually.
> >
> >Same result from my tests.
> >
> >BR and thanks,
> >Nikolaus
> 
> 
> From c2458cb998dd8e275fefba52dd2532beb153c82d Mon Sep 17 00:00:00 2001
> From: Huijin Park <huijin.park@samsung.com>
> Date: Thu, 3 Mar 2022 20:43:22 +0900
> Subject: [PATCH] mmc: core: extend timeout and set min time for op_cond
> 
> This patch extends timeout to 2s and sets minimal interval time for
> checking op_cond stuck problem.
> 
> Signed-off-by: Huijin Park <huijin.park@samsung.com>
> 
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..ccad6379d183 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
>  
>  #define MMC_BKOPS_TIMEOUT_MS		(120 * 1000) /* 120s */
>  #define MMC_SANITIZE_TIMEOUT_MS		(240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS		(  2 * 1000) /*   2s */
>  
>  static const u8 tuning_blk_pattern_4bit[] = {
>  	0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -179,6 +180,47 @@ int mmc_go_idle(struct mmc_host *host)
>  	return err;
>  }
>  
> +static int ____mmc_poll_for_busy(struct mmc_host *host, unsigned int udelay_min,
> +				 unsigned int timeout_ms,
> +				 int (*busy_cb)(void *cb_data, bool *busy),
> +				 void *cb_data)
> +{
> +	int err;
> +	unsigned long timeout;
> +	unsigned int udelay = udelay_min, udelay_max = 32768;
> +	bool expired = false;
> +	bool busy = false;
> +
> +	timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;
> +	do {
> +		/*
> +		 * Due to the possibility of being preempted while polling,
> +		 * check the expiration time first.
> +		 */
> +		expired = time_after(jiffies, timeout);
> +
> +		err = (*busy_cb)(cb_data, &busy);
> +		if (err)
> +			return err;
> +
> +		/* Timeout if the device still remains busy. */
> +		if (expired && busy) {
> +			pr_err("%s: Card stuck being busy! %s\n",
> +				mmc_hostname(host), __func__);
> +			return -ETIMEDOUT;
> +		}
> +
> +		/* Throttle the polling rate to avoid hogging the CPU. */
> +		if (busy) {
> +			usleep_range(udelay, udelay * 2);
> +			if (udelay < udelay_max)
> +				udelay *= 2;
> +		}
> +	} while (busy);
> +
> +	return 0;
> +}
> +
>  static int __mmc_send_op_cond_cb(void *cb_data, bool *busy)
>  {
>  	struct mmc_op_cond_busy_data *data = cb_data;
> @@ -232,7 +274,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32 ocr, u32 *rocr)
>  	cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>  	cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>  
> -	err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> +	err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
> +				    &__mmc_send_op_cond_cb, &cb_data);
>  	if (err)
>  		return err;
>  
> -- 
> 2.17.1
> 

Hi,

thanks for the patch. 
I started with a value of 10 ms :
       err = ____mmc_poll_for_busy(host, 10, MMC_OP_COND_TIMEOUT_MS,
                                    &__mmc_send_op_cond_cb, &cb_data);

and the result was agian

[   23.349932] mmc1: Card stuck being busy! __mmc_poll_for_busy
[   23.355936] mmc1: error -110 doing runtime resume

Same with 100 ms and 500 ms.

The messages seem to come from __mmc_poll_for_busy and not ____mmc_poll_for_busy.
Yet, changing the value in the call of ____mmc_poll_for_busy to 1000 ms changed the behaviour:

       err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
                                   &__mmc_send_op_cond_cb, &cb_data);

With this I didn't get any "stuck" errors and the board seemed stable.

Regards,
Jean Rene Dawin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [BUG] mmc: core: adjust polling interval for CMD1
       [not found] <CGME20220303121624epcas1p3781f58f2f04b0a26c35311aa0f4defcf@epcas1p3.samsung.com>
@ 2022-03-03 12:16 ` Huijin Park
  2022-03-04  9:42   ` Jean Rene Dawin
  0 siblings, 1 reply; 9+ messages in thread
From: Huijin Park @ 2022-03-03 12:16 UTC (permalink / raw)
  To: H . Nikolaus Schaller, Jean Rene Dawin
  Cc: linux-kernel, linux-omap, linux-mmc, letux-kernel, Ulf Hansson,
	tony, huijin.park, bbanghj.park

Hi, sorry for the late reply.
I guess the problem occurs depending on the eMMC model.
Because I tested again and there was no problem.
The eMMC model used for the test are as follows.
(THGBMJG6C1LBAIL, KLM8G1GETF-B041)
Anyway I guess the cause is interval time.
I wrote a debug patch assuming that the reason was that some mmc models
could not process CMD1 delivered at short intervals.
I copied the polling function and adds interval minimum time parameter.
I set the minimum time to 1ms. You can adjust it.
Please test if there is no problem with the debug patch.

>Hi,
>
>> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@math.uni-bielefeld.de>:
>> 
>> Ulf Hansson wrote on Tue  1/03/22 14:38:
>>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>>> 
>>> 
>>> From: Ulf Hansson <ulf.hansson@linaro.org>
>>> Date: Tue, 1 Mar 2022 14:24:21 +0100
>>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
>>> 
>>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
>>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
>>> which is probably more inline with its previous value, to fix the reported
>>> timeout problems.
>>> 
>>> While at it, let's add a define for the timeout value, rather than using
>>> a hard-coded value for it.
>>> 
>>> Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
>>> Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
>>> Cc: Huijin Park <huijin.park@samsung.com>
>>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
>>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>>> ---
>>> drivers/mmc/core/mmc_ops.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>>> index d63d1c735335..1f57174b3cf3 100644
>>> --- a/drivers/mmc/core/mmc_ops.c
>>> +++ b/drivers/mmc/core/mmc_ops.c
>>> @@ -21,6 +21,7 @@
>>> 
>>> #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
>>> #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
>>> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
>>> 
>>> static const u8 tuning_blk_pattern_4bit[] = {
>>>        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
>>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
>>> ocr, u32 *rocr)
>>>        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>>>        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>>> 
>>> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
>>> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
>>> +                                 &__mmc_send_op_cond_cb, &cb_data);
>>>        if (err)
>>>                return err;
>>> 
>>> -- 
>>> 2.25.1
>> 
>> Hi,
>> 
>> thanks. But testing with this patch still gives the same errors:
>> 
>> [   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
>> [   52.273380] mmc1: error -110 doing runtime resume
>> 
>> and the system gets stuck eventually.
>
>Same result from my tests.
>
>BR and thanks,
>Nikolaus


From c2458cb998dd8e275fefba52dd2532beb153c82d Mon Sep 17 00:00:00 2001
From: Huijin Park <huijin.park@samsung.com>
Date: Thu, 3 Mar 2022 20:43:22 +0900
Subject: [PATCH] mmc: core: extend timeout and set min time for op_cond

This patch extends timeout to 2s and sets minimal interval time for
checking op_cond stuck problem.

Signed-off-by: Huijin Park <huijin.park@samsung.com>

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index d63d1c735335..ccad6379d183 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -21,6 +21,7 @@
 
 #define MMC_BKOPS_TIMEOUT_MS		(120 * 1000) /* 120s */
 #define MMC_SANITIZE_TIMEOUT_MS		(240 * 1000) /* 240s */
+#define MMC_OP_COND_TIMEOUT_MS		(  2 * 1000) /*   2s */
 
 static const u8 tuning_blk_pattern_4bit[] = {
 	0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
@@ -179,6 +180,47 @@ int mmc_go_idle(struct mmc_host *host)
 	return err;
 }
 
+static int ____mmc_poll_for_busy(struct mmc_host *host, unsigned int udelay_min,
+				 unsigned int timeout_ms,
+				 int (*busy_cb)(void *cb_data, bool *busy),
+				 void *cb_data)
+{
+	int err;
+	unsigned long timeout;
+	unsigned int udelay = udelay_min, udelay_max = 32768;
+	bool expired = false;
+	bool busy = false;
+
+	timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;
+	do {
+		/*
+		 * Due to the possibility of being preempted while polling,
+		 * check the expiration time first.
+		 */
+		expired = time_after(jiffies, timeout);
+
+		err = (*busy_cb)(cb_data, &busy);
+		if (err)
+			return err;
+
+		/* Timeout if the device still remains busy. */
+		if (expired && busy) {
+			pr_err("%s: Card stuck being busy! %s\n",
+				mmc_hostname(host), __func__);
+			return -ETIMEDOUT;
+		}
+
+		/* Throttle the polling rate to avoid hogging the CPU. */
+		if (busy) {
+			usleep_range(udelay, udelay * 2);
+			if (udelay < udelay_max)
+				udelay *= 2;
+		}
+	} while (busy);
+
+	return 0;
+}
+
 static int __mmc_send_op_cond_cb(void *cb_data, bool *busy)
 {
 	struct mmc_op_cond_busy_data *data = cb_data;
@@ -232,7 +274,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32 ocr, u32 *rocr)
 	cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
 	cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
 
-	err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
+	err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
+				    &__mmc_send_op_cond_cb, &cb_data);
 	if (err)
 		return err;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-04 10:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-17 20:11 [BUG] mmc: core: adjust polling interval for CMD1 H. Nikolaus Schaller
2022-03-01 13:38 ` Ulf Hansson
2022-03-02  8:20   ` Jean Rene Dawin
2022-03-02  9:40     ` H. Nikolaus Schaller
2022-03-03 10:40       ` Ulf Hansson
2022-03-04  9:28         ` Jean Rene Dawin
2022-03-04 10:08           ` Ulf Hansson
     [not found] <CGME20220303121624epcas1p3781f58f2f04b0a26c35311aa0f4defcf@epcas1p3.samsung.com>
2022-03-03 12:16 ` Huijin Park
2022-03-04  9:42   ` Jean Rene Dawin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).