linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
@ 2016-04-25 15:18 Enric Balletbo i Serra
  2016-04-25 15:18 ` [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
  2016-04-25 15:29 ` [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo Serra
  0 siblings, 2 replies; 6+ messages in thread
From: Enric Balletbo i Serra @ 2016-04-25 15:18 UTC (permalink / raw)
  To: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel

Hi,

I introduced the cover letter to give some background about this.

I have been investigating a problem related to at least one specific sdcard when
UHS-I is set. The card is not detected due the tuning phase reports a
failure. Since the problem is only reproduced with a single model of a single
brand of card, it is probably a card firmware issue, but the card works fine
on my laptop.

The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
that it's easy to reproduce the hung task, either, with cold boots or suspend to
ram tests.

I tried to fix both problems (the original issue and the one introduced by the
patch) in different ways, but I ended thinking that this second proposal is the
most simple that solves both issues. So let's try to fix this by handling the
response CRC error slightly differently when tuning command is happening.

I tested the patch on both platforms, on exynos and on rockhip. I did lots of
tests and at the moment the patch seems to fix the rockchip issue and don't
hung on exynos. I'll continue testing meanwhile we discuss about it.

I think the patch, at least, needs the Doug's approval (as he dig into the issue
before) and the Tested-by Alim. So will be good if you have a slot of time to
look a bit into this.

Thanks in advance.
 Enric

[1] https://lkml.org/lkml/2015/5/18/495

Doug Anderson (1):
  mmc: dw_mmc: Wait for data transfer after response errors.

 drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors.
  2016-04-25 15:18 [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo i Serra
@ 2016-04-25 15:18 ` Enric Balletbo i Serra
  2016-04-25 15:29   ` Enric Balletbo Serra
  2016-04-25 15:29 ` [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo Serra
  1 sibling, 1 reply; 6+ messages in thread
From: Enric Balletbo i Serra @ 2016-04-25 15:18 UTC (permalink / raw)
  To: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel
  Cc: Doug Anderson, Alim Akhtar

From: Doug Anderson <dianders@chromium.org>

According to the DesignWare state machine description, after we get a
"response error" or "response CRC error" we move into data transfer
mode. That means that we don't necessarily need to special case
trying to deal with the failure right away. We can wait until we are
notified that the data transfer is complete (with or without errors)
and then we can deal with the failure.

It may sound strange to defer dealing with a command that we know will
fail anyway, but this appears to fix a bug. During tuning (CMD19) on
a specific card on an rk3288-based system, we found that we could get
a "response CRC error". Sending the stop command after the "response
CRC error" would then throw the system into a confused state causing
all future tuning phases to report failure.

When in the confused state, the controller would show these (hex codes
are interrupt status register):
 CMD ERR: 0x00000046 (cmd=19)
 CMD ERR: 0x0000004e (cmd=12)
 DATA ERR: 0x00000208
 DATA ERR: 0x0000020c
 CMD ERR: 0x00000104 (cmd=19)
 CMD ERR: 0x00000104 (cmd=12)
 DATA ERR: 0x00000208
 DATA ERR: 0x0000020c
 ...
 ...

It is inherently difficult to deal with the complexity of trying to
correctly send a stop command while a data transfer is taking place
since you need to deal with different corner cases caused by the fact
that the data transfer could complete (with errors or without errors)
during various places in sending the stop command (dw_mci_stop_dma,
send_stop_abort, etc)

Instead of adding a bunch of extra complexity to deal with this, it
seems much simpler to just use the more straightforward (and less
error-prone) path of letting the data transfer finish. There
shouldn't be any huge benefit to sending the stop command slightly
earlier, anyway.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: Alim Akhtar <alim.akhtar@gmail.com>
---
 drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 242f9a0..2ebeea8 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1761,6 +1761,33 @@ static void dw_mci_tasklet_func(unsigned long priv)
 			}
 
 			if (cmd->data && err) {
+				/*
+				 * During UHS tuning sequence, sending the stop
+				 * command after the response CRC error would
+				 * throw the system into a confused state
+				 * causing all future tuning phases to report
+				 * failure.
+				 *
+				 * In such case controller will move into a data
+				 * transfer state after a response error or
+				 * response CRC error. Let's let that finish
+				 * before trying to send a stop, so we'll go to
+				 * STATE_SENDING_DATA.
+				 *
+				 * Although letting the data transfer take place
+				 * will waste a bit of time (we already know
+				 * the command was bad), it can't cause any
+				 * errors since it's possible it would have
+				 * taken place anyway if this tasklet got
+				 * delayed. Allowing the transfer to take place
+				 * avoids races and keeps things simple.
+				 */
+				if ((err != -ETIMEDOUT) &&
+				    (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
+					state = STATE_SENDING_DATA;
+					continue;
+				}
+
 				dw_mci_stop_dma(host);
 				send_stop_abort(host, data);
 				state = STATE_SENDING_STOP;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors.
  2016-04-25 15:18 ` [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
@ 2016-04-25 15:29   ` Enric Balletbo Serra
  0 siblings, 0 replies; 6+ messages in thread
From: Enric Balletbo Serra @ 2016-04-25 15:29 UTC (permalink / raw)
  To: Enric Balletbo i Serra
  Cc: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel,
	Doug Anderson, Alim Akhtar

Oh, damn, I didn't include the proper tags,

[PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors.
[PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards

should be

[PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards
[PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.

2016-04-25 17:18 GMT+02:00 Enric Balletbo i Serra
<enric.balletbo@collabora.com>:
> From: Doug Anderson <dianders@chromium.org>
>
> According to the DesignWare state machine description, after we get a
> "response error" or "response CRC error" we move into data transfer
> mode. That means that we don't necessarily need to special case
> trying to deal with the failure right away. We can wait until we are
> notified that the data transfer is complete (with or without errors)
> and then we can deal with the failure.
>
> It may sound strange to defer dealing with a command that we know will
> fail anyway, but this appears to fix a bug. During tuning (CMD19) on
> a specific card on an rk3288-based system, we found that we could get
> a "response CRC error". Sending the stop command after the "response
> CRC error" would then throw the system into a confused state causing
> all future tuning phases to report failure.
>
> When in the confused state, the controller would show these (hex codes
> are interrupt status register):
>  CMD ERR: 0x00000046 (cmd=19)
>  CMD ERR: 0x0000004e (cmd=12)
>  DATA ERR: 0x00000208
>  DATA ERR: 0x0000020c
>  CMD ERR: 0x00000104 (cmd=19)
>  CMD ERR: 0x00000104 (cmd=12)
>  DATA ERR: 0x00000208
>  DATA ERR: 0x0000020c
>  ...
>  ...
>
> It is inherently difficult to deal with the complexity of trying to
> correctly send a stop command while a data transfer is taking place
> since you need to deal with different corner cases caused by the fact
> that the data transfer could complete (with errors or without errors)
> during various places in sending the stop command (dw_mci_stop_dma,
> send_stop_abort, etc)
>
> Instead of adding a bunch of extra complexity to deal with this, it
> seems much simpler to just use the more straightforward (and less
> error-prone) path of letting the data transfer finish. There
> shouldn't be any huge benefit to sending the stop command slightly
> earlier, anyway.
>
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> Cc: Alim Akhtar <alim.akhtar@gmail.com>
> ---
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 242f9a0..2ebeea8 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1761,6 +1761,33 @@ static void dw_mci_tasklet_func(unsigned long priv)
>                         }
>
>                         if (cmd->data && err) {
> +                               /*
> +                                * During UHS tuning sequence, sending the stop
> +                                * command after the response CRC error would
> +                                * throw the system into a confused state
> +                                * causing all future tuning phases to report
> +                                * failure.
> +                                *
> +                                * In such case controller will move into a data
> +                                * transfer state after a response error or
> +                                * response CRC error. Let's let that finish
> +                                * before trying to send a stop, so we'll go to
> +                                * STATE_SENDING_DATA.
> +                                *
> +                                * Although letting the data transfer take place
> +                                * will waste a bit of time (we already know
> +                                * the command was bad), it can't cause any
> +                                * errors since it's possible it would have
> +                                * taken place anyway if this tasklet got
> +                                * delayed. Allowing the transfer to take place
> +                                * avoids races and keeps things simple.
> +                                */
> +                               if ((err != -ETIMEDOUT) &&
> +                                   (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
> +                                       state = STATE_SENDING_DATA;
> +                                       continue;
> +                               }
> +
>                                 dw_mci_stop_dma(host);
>                                 send_stop_abort(host, data);
>                                 state = STATE_SENDING_STOP;
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-04-25 15:18 [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo i Serra
  2016-04-25 15:18 ` [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
@ 2016-04-25 15:29 ` Enric Balletbo Serra
  2016-04-26  7:44   ` Jaehoon Chung
  1 sibling, 1 reply; 6+ messages in thread
From: Enric Balletbo Serra @ 2016-04-25 15:29 UTC (permalink / raw)
  To: Enric Balletbo i Serra
  Cc: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel

Oh, damn, I didn't include the proper tags,

[PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors.
[PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards

should be

[PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards
[PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.

2016-04-25 17:18 GMT+02:00 Enric Balletbo i Serra
<enric.balletbo@collabora.com>:
> Hi,
>
> I introduced the cover letter to give some background about this.
>
> I have been investigating a problem related to at least one specific sdcard when
> UHS-I is set. The card is not detected due the tuning phase reports a
> failure. Since the problem is only reproduced with a single model of a single
> brand of card, it is probably a card firmware issue, but the card works fine
> on my laptop.
>
> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
> that it's easy to reproduce the hung task, either, with cold boots or suspend to
> ram tests.
>
> I tried to fix both problems (the original issue and the one introduced by the
> patch) in different ways, but I ended thinking that this second proposal is the
> most simple that solves both issues. So let's try to fix this by handling the
> response CRC error slightly differently when tuning command is happening.
>
> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
> tests and at the moment the patch seems to fix the rockchip issue and don't
> hung on exynos. I'll continue testing meanwhile we discuss about it.
>
> I think the patch, at least, needs the Doug's approval (as he dig into the issue
> before) and the Tested-by Alim. So will be good if you have a slot of time to
> look a bit into this.
>
> Thanks in advance.
>  Enric
>
> [1] https://lkml.org/lkml/2015/5/18/495
>
> Doug Anderson (1):
>   mmc: dw_mmc: Wait for data transfer after response errors.
>
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> --
> 2.1.0
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-04-25 15:29 ` [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo Serra
@ 2016-04-26  7:44   ` Jaehoon Chung
  2016-04-26  8:03     ` Enric Balletbo i Serra
  0 siblings, 1 reply; 6+ messages in thread
From: Jaehoon Chung @ 2016-04-26  7:44 UTC (permalink / raw)
  To: Enric Balletbo Serra, Enric Balletbo i Serra
  Cc: Ulf Hansson, linux-mmc, linux-kernel

Hi Enric,

On 04/26/2016 12:29 AM, Enric Balletbo Serra wrote:
> Oh, damn, I didn't include the proper tags,
> 
> [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors.
> [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards
> 
> should be
> 
> [PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards
> [PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.

What changed from patch V1?
Anyway, thanks for reminding! :)

Best Regards,
Jaehoon Chung

> 
> 2016-04-25 17:18 GMT+02:00 Enric Balletbo i Serra
> <enric.balletbo@collabora.com>:
>> Hi,
>>
>> I introduced the cover letter to give some background about this.
>>
>> I have been investigating a problem related to at least one specific sdcard when
>> UHS-I is set. The card is not detected due the tuning phase reports a
>> failure. Since the problem is only reproduced with a single model of a single
>> brand of card, it is probably a card firmware issue, but the card works fine
>> on my laptop.
>>
>> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
>> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
>> that it's easy to reproduce the hung task, either, with cold boots or suspend to
>> ram tests.
>>
>> I tried to fix both problems (the original issue and the one introduced by the
>> patch) in different ways, but I ended thinking that this second proposal is the
>> most simple that solves both issues. So let's try to fix this by handling the
>> response CRC error slightly differently when tuning command is happening.
>>
>> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
>> tests and at the moment the patch seems to fix the rockchip issue and don't
>> hung on exynos. I'll continue testing meanwhile we discuss about it.
>>
>> I think the patch, at least, needs the Doug's approval (as he dig into the issue
>> before) and the Tested-by Alim. So will be good if you have a slot of time to
>> look a bit into this.
>>
>> Thanks in advance.
>>  Enric
>>
>> [1] https://lkml.org/lkml/2015/5/18/495
>>
>> Doug Anderson (1):
>>   mmc: dw_mmc: Wait for data transfer after response errors.
>>
>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>> --
>> 2.1.0
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-04-26  7:44   ` Jaehoon Chung
@ 2016-04-26  8:03     ` Enric Balletbo i Serra
  0 siblings, 0 replies; 6+ messages in thread
From: Enric Balletbo i Serra @ 2016-04-26  8:03 UTC (permalink / raw)
  To: Jaehoon Chung, Enric Balletbo Serra; +Cc: Ulf Hansson, linux-mmc, linux-kernel

Hi Jaehoon,

On 26/04/16 09:44, Jaehoon Chung wrote:
> Hi Enric,
>
> On 04/26/2016 12:29 AM, Enric Balletbo Serra wrote:
>> Oh, damn, I didn't include the proper tags,
>>
>> [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors.
>> [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards
>>
>> should be
>>
>> [PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards
>> [PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.
>
> What changed from patch V1?
> Anyway, thanks for reminding! :)
>

Argh! also the changelog didn't go with the email. Ok, so let me resend 
this version with the proper tags and proper Changelog. I think will be 
more easy to follow the discussion then. Ignore these and sorry for the 
noise.

Best regards,
Enric

> Best Regards,
> Jaehoon Chung
>
>>
>> 2016-04-25 17:18 GMT+02:00 Enric Balletbo i Serra
>> <enric.balletbo@collabora.com>:
>>> Hi,
>>>
>>> I introduced the cover letter to give some background about this.
>>>
>>> I have been investigating a problem related to at least one specific sdcard when
>>> UHS-I is set. The card is not detected due the tuning phase reports a
>>> failure. Since the problem is only reproduced with a single model of a single
>>> brand of card, it is probably a card firmware issue, but the card works fine
>>> on my laptop.
>>>
>>> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
>>> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
>>> that it's easy to reproduce the hung task, either, with cold boots or suspend to
>>> ram tests.
>>>
>>> I tried to fix both problems (the original issue and the one introduced by the
>>> patch) in different ways, but I ended thinking that this second proposal is the
>>> most simple that solves both issues. So let's try to fix this by handling the
>>> response CRC error slightly differently when tuning command is happening.
>>>
>>> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
>>> tests and at the moment the patch seems to fix the rockchip issue and don't
>>> hung on exynos. I'll continue testing meanwhile we discuss about it.
>>>
>>> I think the patch, at least, needs the Doug's approval (as he dig into the issue
>>> before) and the Tested-by Alim. So will be good if you have a slot of time to
>>> look a bit into this.
>>>
>>> Thanks in advance.
>>>   Enric
>>>
>>> [1] https://lkml.org/lkml/2015/5/18/495
>>>
>>> Doug Anderson (1):
>>>    mmc: dw_mmc: Wait for data transfer after response errors.
>>>
>>>   drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>>   1 file changed, 27 insertions(+)
>>>
>>> --
>>> 2.1.0
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-04-26  8:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-25 15:18 [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo i Serra
2016-04-25 15:18 ` [PATCH v2] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
2016-04-25 15:29   ` Enric Balletbo Serra
2016-04-25 15:29 ` [PATCH v2] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo Serra
2016-04-26  7:44   ` Jaehoon Chung
2016-04-26  8:03     ` Enric Balletbo i Serra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).