linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
@ 2016-04-26  8:03 Enric Balletbo i Serra
  2016-04-26  8:03 ` [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
  2016-04-27  8:35 ` [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Jaehoon Chung
  0 siblings, 2 replies; 9+ messages in thread
From: Enric Balletbo i Serra @ 2016-04-26  8:03 UTC (permalink / raw)
  To: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel

Hi,

I introduced the cover letter to give some background about this.

I have been investigating a problem related to at least one specific sdcard when
UHS-I is set. The card is not detected due the tuning phase reports a
failure. Since the problem is only reproduced with a single model of a single
brand of card, it is probably a card firmware issue, but the card works fine
on my laptop.

The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
that it's easy to reproduce the hung task, either, with cold boots or suspend to
ram tests.

I tried to fix both problems (the original issue and the one introduced by the
patch) in different ways, but I ended thinking that this second proposal is the
most simple that solves both issues. So let's try to fix this by handling the
response CRC error slightly differently when tuning command is happening.

I tested the patch on both platforms, on exynos and on rockhip. I did lots of
tests and at the moment the patch seems to fix the rockchip issue and don't
hung on exynos. I'll continue testing meanwhile we discuss about it.

I think the patch, at least, needs the Doug's approval (as he dig into the issue
before) and the Tested-by Alim. So will be good if you have a slot of time to
look a bit into this.

Thanks in advance.
 Enric

[1] https://lkml.org/lkml/2015/5/18/495

Changelog since v1:
- Fix the issue found by Alim with exynos letting the data transfer
  take place only when MMC_SEND_TUNING_BLOCK is issued.

Doug Anderson (1):
  mmc: dw_mmc: Wait for data transfer after response errors.

 drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.
  2016-04-26  8:03 [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo i Serra
@ 2016-04-26  8:03 ` Enric Balletbo i Serra
  2016-06-20 10:14   ` Jaehoon Chung
  2016-04-27  8:35 ` [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Jaehoon Chung
  1 sibling, 1 reply; 9+ messages in thread
From: Enric Balletbo i Serra @ 2016-04-26  8:03 UTC (permalink / raw)
  To: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel
  Cc: Doug Anderson, Alim Akhtar

From: Doug Anderson <dianders@chromium.org>

According to the DesignWare state machine description, after we get a
"response error" or "response CRC error" we move into data transfer
mode. That means that we don't necessarily need to special case
trying to deal with the failure right away. We can wait until we are
notified that the data transfer is complete (with or without errors)
and then we can deal with the failure.

It may sound strange to defer dealing with a command that we know will
fail anyway, but this appears to fix a bug. During tuning (CMD19) on
a specific card on an rk3288-based system, we found that we could get
a "response CRC error". Sending the stop command after the "response
CRC error" would then throw the system into a confused state causing
all future tuning phases to report failure.

When in the confused state, the controller would show these (hex codes
are interrupt status register):
 CMD ERR: 0x00000046 (cmd=19)
 CMD ERR: 0x0000004e (cmd=12)
 DATA ERR: 0x00000208
 DATA ERR: 0x0000020c
 CMD ERR: 0x00000104 (cmd=19)
 CMD ERR: 0x00000104 (cmd=12)
 DATA ERR: 0x00000208
 DATA ERR: 0x0000020c
 ...
 ...

It is inherently difficult to deal with the complexity of trying to
correctly send a stop command while a data transfer is taking place
since you need to deal with different corner cases caused by the fact
that the data transfer could complete (with errors or without errors)
during various places in sending the stop command (dw_mci_stop_dma,
send_stop_abort, etc)

Instead of adding a bunch of extra complexity to deal with this, it
seems much simpler to just use the more straightforward (and less
error-prone) path of letting the data transfer finish. There
shouldn't be any huge benefit to sending the stop command slightly
earlier, anyway.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Cc: Alim Akhtar <alim.akhtar@gmail.com>
---
Changelog since v1:
- Fix the issue found by Alim with exynos letting the data transfer
  take place only when MMC_SEND_TUNING_BLOCK is issued.

 drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 242f9a0..2ebeea8 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1761,6 +1761,33 @@ static void dw_mci_tasklet_func(unsigned long priv)
 			}
 
 			if (cmd->data && err) {
+				/*
+				 * During UHS tuning sequence, sending the stop
+				 * command after the response CRC error would
+				 * throw the system into a confused state
+				 * causing all future tuning phases to report
+				 * failure.
+				 *
+				 * In such case controller will move into a data
+				 * transfer state after a response error or
+				 * response CRC error. Let's let that finish
+				 * before trying to send a stop, so we'll go to
+				 * STATE_SENDING_DATA.
+				 *
+				 * Although letting the data transfer take place
+				 * will waste a bit of time (we already know
+				 * the command was bad), it can't cause any
+				 * errors since it's possible it would have
+				 * taken place anyway if this tasklet got
+				 * delayed. Allowing the transfer to take place
+				 * avoids races and keeps things simple.
+				 */
+				if ((err != -ETIMEDOUT) &&
+				    (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
+					state = STATE_SENDING_DATA;
+					continue;
+				}
+
 				dw_mci_stop_dma(host);
 				send_stop_abort(host, data);
 				state = STATE_SENDING_STOP;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-04-26  8:03 [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo i Serra
  2016-04-26  8:03 ` [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
@ 2016-04-27  8:35 ` Jaehoon Chung
  2016-04-27  8:53   ` Enric Balletbo i Serra
  1 sibling, 1 reply; 9+ messages in thread
From: Jaehoon Chung @ 2016-04-27  8:35 UTC (permalink / raw)
  To: Enric Balletbo i Serra, Ulf Hansson, linux-mmc, linux-kernel

On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
> Hi,
> 
> I introduced the cover letter to give some background about this.
> 
> I have been investigating a problem related to at least one specific sdcard when
> UHS-I is set. The card is not detected due the tuning phase reports a
> failure. Since the problem is only reproduced with a single model of a single
> brand of card, it is probably a card firmware issue, but the card works fine
> on my laptop.

I think you have analyzed many case..of course..it was successful to switch voltage, right?
Maybe this patch too old..so can you remember which specific sdcard is produced?

> 
> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
> that it's easy to reproduce the hung task, either, with cold boots or suspend to
> ram tests.

Yep..I have already tested and checked for this.

> 
> I tried to fix both problems (the original issue and the one introduced by the
> patch) in different ways, but I ended thinking that this second proposal is the
> most simple that solves both issues. So let's try to fix this by handling the
> response CRC error slightly differently when tuning command is happening.
> 
> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
> tests and at the moment the patch seems to fix the rockchip issue and don't
> hung on exynos. I'll continue testing meanwhile we discuss about it.
> 
> I think the patch, at least, needs the Doug's approval (as he dig into the issue
> before) and the Tested-by Alim. So will be good if you have a slot of time to
> look a bit into this.
> 
> Thanks in advance.
>  Enric
> 
> [1] https://lkml.org/lkml/2015/5/18/495
> 
> Changelog since v1:
> - Fix the issue found by Alim with exynos letting the data transfer
>   take place only when MMC_SEND_TUNING_BLOCK is issued.
> 
> Doug Anderson (1):
>   mmc: dw_mmc: Wait for data transfer after response errors.
> 
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-04-27  8:35 ` [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Jaehoon Chung
@ 2016-04-27  8:53   ` Enric Balletbo i Serra
  2016-06-20  7:59     ` Enric Balletbo Serra
  0 siblings, 1 reply; 9+ messages in thread
From: Enric Balletbo i Serra @ 2016-04-27  8:53 UTC (permalink / raw)
  To: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel



On 27/04/16 10:35, Jaehoon Chung wrote:
> On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
>> Hi,
>>
>> I introduced the cover letter to give some background about this.
>>
>> I have been investigating a problem related to at least one specific sdcard when
>> UHS-I is set. The card is not detected due the tuning phase reports a
>> failure. Since the problem is only reproduced with a single model of a single
>> brand of card, it is probably a card firmware issue, but the card works fine
>> on my laptop.
> 
> I think you have analyzed many case..of course..it was successful to switch voltage, right?
> Maybe this patch too old..so can you remember which specific sdcard is produced?
> 

Yes it was successful to switch voltage. The specific card is an UNIREX 16GB Class 10
SD card (Compatible with UHS-1)

>>
>> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
>> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
>> that it's easy to reproduce the hung task, either, with cold boots or suspend to
>> ram tests.
> 
> Yep..I have already tested and checked for this.
> 
>>
>> I tried to fix both problems (the original issue and the one introduced by the
>> patch) in different ways, but I ended thinking that this second proposal is the
>> most simple that solves both issues. So let's try to fix this by handling the
>> response CRC error slightly differently when tuning command is happening.
>>
>> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
>> tests and at the moment the patch seems to fix the rockchip issue and don't
>> hung on exynos. I'll continue testing meanwhile we discuss about it.
>>
>> I think the patch, at least, needs the Doug's approval (as he dig into the issue
>> before) and the Tested-by Alim. So will be good if you have a slot of time to
>> look a bit into this.
>>
>> Thanks in advance.
>>  Enric
>>
>> [1] https://lkml.org/lkml/2015/5/18/495
>>
>> Changelog since v1:
>> - Fix the issue found by Alim with exynos letting the data transfer
>>   take place only when MMC_SEND_TUNING_BLOCK is issued.
>>
>> Doug Anderson (1):
>>   mmc: dw_mmc: Wait for data transfer after response errors.
>>
>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-04-27  8:53   ` Enric Balletbo i Serra
@ 2016-06-20  7:59     ` Enric Balletbo Serra
  2016-06-20  8:02       ` Jaehoon Chung
  0 siblings, 1 reply; 9+ messages in thread
From: Enric Balletbo Serra @ 2016-06-20  7:59 UTC (permalink / raw)
  To: Enric Balletbo i Serra
  Cc: Jaehoon Chung, Ulf Hansson, linux-mmc, linux-kernel

Hi Jaehoon,

2016-04-27 10:53 GMT+02:00 Enric Balletbo i Serra
<enric.balletbo@collabora.com>:
>
>
> On 27/04/16 10:35, Jaehoon Chung wrote:
>> On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
>>> Hi,
>>>
>>> I introduced the cover letter to give some background about this.
>>>
>>> I have been investigating a problem related to at least one specific sdcard when
>>> UHS-I is set. The card is not detected due the tuning phase reports a
>>> failure. Since the problem is only reproduced with a single model of a single
>>> brand of card, it is probably a card firmware issue, but the card works fine
>>> on my laptop.
>>
>> I think you have analyzed many case..of course..it was successful to switch voltage, right?
>> Maybe this patch too old..so can you remember which specific sdcard is produced?
>>
>
> Yes it was successful to switch voltage. The specific card is an UNIREX 16GB Class 10
> SD card (Compatible with UHS-1)
>

Any feedback for this patch?


>>>
>>> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
>>> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
>>> that it's easy to reproduce the hung task, either, with cold boots or suspend to
>>> ram tests.
>>
>> Yep..I have already tested and checked for this.
>>
>>>
>>> I tried to fix both problems (the original issue and the one introduced by the
>>> patch) in different ways, but I ended thinking that this second proposal is the
>>> most simple that solves both issues. So let's try to fix this by handling the
>>> response CRC error slightly differently when tuning command is happening.
>>>
>>> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
>>> tests and at the moment the patch seems to fix the rockchip issue and don't
>>> hung on exynos. I'll continue testing meanwhile we discuss about it.
>>>
>>> I think the patch, at least, needs the Doug's approval (as he dig into the issue
>>> before) and the Tested-by Alim. So will be good if you have a slot of time to
>>> look a bit into this.
>>>
>>> Thanks in advance.
>>>  Enric
>>>
>>> [1] https://lkml.org/lkml/2015/5/18/495
>>>
>>> Changelog since v1:
>>> - Fix the issue found by Alim with exynos letting the data transfer
>>>   take place only when MMC_SEND_TUNING_BLOCK is issued.
>>>
>>> Doug Anderson (1):
>>>   mmc: dw_mmc: Wait for data transfer after response errors.
>>>
>>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>>  1 file changed, 27 insertions(+)
>>>
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards.
  2016-06-20  7:59     ` Enric Balletbo Serra
@ 2016-06-20  8:02       ` Jaehoon Chung
  0 siblings, 0 replies; 9+ messages in thread
From: Jaehoon Chung @ 2016-06-20  8:02 UTC (permalink / raw)
  To: Enric Balletbo Serra, Enric Balletbo i Serra
  Cc: Ulf Hansson, linux-mmc, linux-kernel

Hi Enric,

On 06/20/2016 04:59 PM, Enric Balletbo Serra wrote:
> Hi Jaehoon,
> 
> 2016-04-27 10:53 GMT+02:00 Enric Balletbo i Serra
> <enric.balletbo@collabora.com>:
>>
>>
>> On 27/04/16 10:35, Jaehoon Chung wrote:
>>> On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
>>>> Hi,
>>>>
>>>> I introduced the cover letter to give some background about this.
>>>>
>>>> I have been investigating a problem related to at least one specific sdcard when
>>>> UHS-I is set. The card is not detected due the tuning phase reports a
>>>> failure. Since the problem is only reproduced with a single model of a single
>>>> brand of card, it is probably a card firmware issue, but the card works fine
>>>> on my laptop.
>>>
>>> I think you have analyzed many case..of course..it was successful to switch voltage, right?
>>> Maybe this patch too old..so can you remember which specific sdcard is produced?
>>>
>>
>> Yes it was successful to switch voltage. The specific card is an UNIREX 16GB Class 10
>> SD card (Compatible with UHS-1)
>>
> 
> Any feedback for this patch?

Sorry..I missed this patch..Thanks for reminding!
I will check this patch as soon as possible.

Best Regards,
Jaehoon Chung

> 
> 
>>>>
>>>> The first attempt to fix this was a patch sent by Doug Anderson [1], but Alim
>>>> Akhtar found that this produced randomly a hung task on Peach-pi. I can confirm
>>>> that it's easy to reproduce the hung task, either, with cold boots or suspend to
>>>> ram tests.
>>>
>>> Yep..I have already tested and checked for this.
>>>
>>>>
>>>> I tried to fix both problems (the original issue and the one introduced by the
>>>> patch) in different ways, but I ended thinking that this second proposal is the
>>>> most simple that solves both issues. So let's try to fix this by handling the
>>>> response CRC error slightly differently when tuning command is happening.
>>>>
>>>> I tested the patch on both platforms, on exynos and on rockhip. I did lots of
>>>> tests and at the moment the patch seems to fix the rockchip issue and don't
>>>> hung on exynos. I'll continue testing meanwhile we discuss about it.
>>>>
>>>> I think the patch, at least, needs the Doug's approval (as he dig into the issue
>>>> before) and the Tested-by Alim. So will be good if you have a slot of time to
>>>> look a bit into this.
>>>>
>>>> Thanks in advance.
>>>>  Enric
>>>>
>>>> [1] https://lkml.org/lkml/2015/5/18/495
>>>>
>>>> Changelog since v1:
>>>> - Fix the issue found by Alim with exynos letting the data transfer
>>>>   take place only when MMC_SEND_TUNING_BLOCK is issued.
>>>>
>>>> Doug Anderson (1):
>>>>   mmc: dw_mmc: Wait for data transfer after response errors.
>>>>
>>>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>>>  1 file changed, 27 insertions(+)
>>>>
>>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.
  2016-04-26  8:03 ` [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
@ 2016-06-20 10:14   ` Jaehoon Chung
  2016-06-21 16:30     ` Enric Balletbo Serra
  0 siblings, 1 reply; 9+ messages in thread
From: Jaehoon Chung @ 2016-06-20 10:14 UTC (permalink / raw)
  To: Enric Balletbo i Serra, Ulf Hansson, linux-mmc, linux-kernel
  Cc: Doug Anderson, Alim Akhtar

Hi Enric,

On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
> From: Doug Anderson <dianders@chromium.org>
> 
> According to the DesignWare state machine description, after we get a
> "response error" or "response CRC error" we move into data transfer
> mode. That means that we don't necessarily need to special case
> trying to deal with the failure right away. We can wait until we are
> notified that the data transfer is complete (with or without errors)
> and then we can deal with the failure.
> 
> It may sound strange to defer dealing with a command that we know will
> fail anyway, but this appears to fix a bug. During tuning (CMD19) on
> a specific card on an rk3288-based system, we found that we could get
> a "response CRC error". Sending the stop command after the "response
> CRC error" would then throw the system into a confused state causing
> all future tuning phases to report failure.

I understood this patch what purpose has.
Does it need to consider for other tuning cases?

Best Regards,
Jaehoon Chung

> 
> When in the confused state, the controller would show these (hex codes
> are interrupt status register):
>  CMD ERR: 0x00000046 (cmd=19)
>  CMD ERR: 0x0000004e (cmd=12)
>  DATA ERR: 0x00000208
>  DATA ERR: 0x0000020c
>  CMD ERR: 0x00000104 (cmd=19)
>  CMD ERR: 0x00000104 (cmd=12)
>  DATA ERR: 0x00000208
>  DATA ERR: 0x0000020c
>  ...
>  ...
> 
> It is inherently difficult to deal with the complexity of trying to
> correctly send a stop command while a data transfer is taking place
> since you need to deal with different corner cases caused by the fact
> that the data transfer could complete (with errors or without errors)
> during various places in sending the stop command (dw_mci_stop_dma,
> send_stop_abort, etc)
> 
> Instead of adding a bunch of extra complexity to deal with this, it
> seems much simpler to just use the more straightforward (and less
> error-prone) path of letting the data transfer finish. There
> shouldn't be any huge benefit to sending the stop command slightly
> earlier, anyway.
> 
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> Cc: Alim Akhtar <alim.akhtar@gmail.com>
> ---
> Changelog since v1:
> - Fix the issue found by Alim with exynos letting the data transfer
>   take place only when MMC_SEND_TUNING_BLOCK is issued.
> 
>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 242f9a0..2ebeea8 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1761,6 +1761,33 @@ static void dw_mci_tasklet_func(unsigned long priv)
>  			}
>  
>  			if (cmd->data && err) {
> +				/*
> +				 * During UHS tuning sequence, sending the stop
> +				 * command after the response CRC error would
> +				 * throw the system into a confused state
> +				 * causing all future tuning phases to report
> +				 * failure.
> +				 *
> +				 * In such case controller will move into a data
> +				 * transfer state after a response error or
> +				 * response CRC error. Let's let that finish
> +				 * before trying to send a stop, so we'll go to
> +				 * STATE_SENDING_DATA.
> +				 *
> +				 * Although letting the data transfer take place
> +				 * will waste a bit of time (we already know
> +				 * the command was bad), it can't cause any
> +				 * errors since it's possible it would have
> +				 * taken place anyway if this tasklet got
> +				 * delayed. Allowing the transfer to take place
> +				 * avoids races and keeps things simple.
> +				 */
> +				if ((err != -ETIMEDOUT) &&
> +				    (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
> +					state = STATE_SENDING_DATA;
> +					continue;
> +				}
> +
>  				dw_mci_stop_dma(host);
>  				send_stop_abort(host, data);
>  				state = STATE_SENDING_STOP;
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.
  2016-06-20 10:14   ` Jaehoon Chung
@ 2016-06-21 16:30     ` Enric Balletbo Serra
  2016-06-22  1:26       ` Jaehoon Chung
  0 siblings, 1 reply; 9+ messages in thread
From: Enric Balletbo Serra @ 2016-06-21 16:30 UTC (permalink / raw)
  To: Jaehoon Chung
  Cc: Enric Balletbo i Serra, Ulf Hansson, linux-mmc, linux-kernel,
	Doug Anderson, Alim Akhtar

Hi Jaehoon,

2016-06-20 12:14 GMT+02:00 Jaehoon Chung <jh80.chung@samsung.com>:
> Hi Enric,
>
> On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
>> From: Doug Anderson <dianders@chromium.org>
>>
>> According to the DesignWare state machine description, after we get a
>> "response error" or "response CRC error" we move into data transfer
>> mode. That means that we don't necessarily need to special case
>> trying to deal with the failure right away. We can wait until we are
>> notified that the data transfer is complete (with or without errors)
>> and then we can deal with the failure.
>>
>> It may sound strange to defer dealing with a command that we know will
>> fail anyway, but this appears to fix a bug. During tuning (CMD19) on
>> a specific card on an rk3288-based system, we found that we could get
>> a "response CRC error". Sending the stop command after the "response
>> CRC error" would then throw the system into a confused state causing
>> all future tuning phases to report failure.
>
> I understood this patch what purpose has.
> Does it need to consider for other tuning cases?
>

Not sure, to be honest I only saw this during UHS tuning sequence on
this brand of cards I never reproduced the issue on other tuning
cases.

> Best Regards,
> Jaehoon Chung
>
>>
>> When in the confused state, the controller would show these (hex codes
>> are interrupt status register):
>>  CMD ERR: 0x00000046 (cmd=19)
>>  CMD ERR: 0x0000004e (cmd=12)
>>  DATA ERR: 0x00000208
>>  DATA ERR: 0x0000020c
>>  CMD ERR: 0x00000104 (cmd=19)
>>  CMD ERR: 0x00000104 (cmd=12)
>>  DATA ERR: 0x00000208
>>  DATA ERR: 0x0000020c
>>  ...
>>  ...
>>
>> It is inherently difficult to deal with the complexity of trying to
>> correctly send a stop command while a data transfer is taking place
>> since you need to deal with different corner cases caused by the fact
>> that the data transfer could complete (with errors or without errors)
>> during various places in sending the stop command (dw_mci_stop_dma,
>> send_stop_abort, etc)
>>
>> Instead of adding a bunch of extra complexity to deal with this, it
>> seems much simpler to just use the more straightforward (and less
>> error-prone) path of letting the data transfer finish. There
>> shouldn't be any huge benefit to sending the stop command slightly
>> earlier, anyway.
>>
>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>> Cc: Alim Akhtar <alim.akhtar@gmail.com>
>> ---
>> Changelog since v1:
>> - Fix the issue found by Alim with exynos letting the data transfer
>>   take place only when MMC_SEND_TUNING_BLOCK is issued.
>>
>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>> index 242f9a0..2ebeea8 100644
>> --- a/drivers/mmc/host/dw_mmc.c
>> +++ b/drivers/mmc/host/dw_mmc.c
>> @@ -1761,6 +1761,33 @@ static void dw_mci_tasklet_func(unsigned long priv)
>>                       }
>>
>>                       if (cmd->data && err) {
>> +                             /*
>> +                              * During UHS tuning sequence, sending the stop
>> +                              * command after the response CRC error would
>> +                              * throw the system into a confused state
>> +                              * causing all future tuning phases to report
>> +                              * failure.
>> +                              *
>> +                              * In such case controller will move into a data
>> +                              * transfer state after a response error or
>> +                              * response CRC error. Let's let that finish
>> +                              * before trying to send a stop, so we'll go to
>> +                              * STATE_SENDING_DATA.
>> +                              *
>> +                              * Although letting the data transfer take place
>> +                              * will waste a bit of time (we already know
>> +                              * the command was bad), it can't cause any
>> +                              * errors since it's possible it would have
>> +                              * taken place anyway if this tasklet got
>> +                              * delayed. Allowing the transfer to take place
>> +                              * avoids races and keeps things simple.
>> +                              */
>> +                             if ((err != -ETIMEDOUT) &&
>> +                                 (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
>> +                                     state = STATE_SENDING_DATA;
>> +                                     continue;
>> +                             }
>> +
>>                               dw_mci_stop_dma(host);
>>                               send_stop_abort(host, data);
>>                               state = STATE_SENDING_STOP;
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors.
  2016-06-21 16:30     ` Enric Balletbo Serra
@ 2016-06-22  1:26       ` Jaehoon Chung
  0 siblings, 0 replies; 9+ messages in thread
From: Jaehoon Chung @ 2016-06-22  1:26 UTC (permalink / raw)
  To: Enric Balletbo Serra
  Cc: Enric Balletbo i Serra, Ulf Hansson, linux-mmc, linux-kernel,
	Doug Anderson, Alim Akhtar

Hi Enric,

On 06/22/2016 01:30 AM, Enric Balletbo Serra wrote:
> Hi Jaehoon,
> 
> 2016-06-20 12:14 GMT+02:00 Jaehoon Chung <jh80.chung@samsung.com>:
>> Hi Enric,
>>
>> On 04/26/2016 05:03 PM, Enric Balletbo i Serra wrote:
>>> From: Doug Anderson <dianders@chromium.org>
>>>
>>> According to the DesignWare state machine description, after we get a
>>> "response error" or "response CRC error" we move into data transfer
>>> mode. That means that we don't necessarily need to special case
>>> trying to deal with the failure right away. We can wait until we are
>>> notified that the data transfer is complete (with or without errors)
>>> and then we can deal with the failure.
>>>
>>> It may sound strange to defer dealing with a command that we know will
>>> fail anyway, but this appears to fix a bug. During tuning (CMD19) on
>>> a specific card on an rk3288-based system, we found that we could get
>>> a "response CRC error". Sending the stop command after the "response
>>> CRC error" would then throw the system into a confused state causing
>>> all future tuning phases to report failure.
>>
>> I understood this patch what purpose has.
>> Does it need to consider for other tuning cases?
>>
> 
> Not sure, to be honest I only saw this during UHS tuning sequence on
> this brand of cards I never reproduced the issue on other tuning
> cases.
> 
>> Best Regards,
>> Jaehoon Chung
>>
>>>
>>> When in the confused state, the controller would show these (hex codes
>>> are interrupt status register):
>>>  CMD ERR: 0x00000046 (cmd=19)
>>>  CMD ERR: 0x0000004e (cmd=12)
>>>  DATA ERR: 0x00000208
>>>  DATA ERR: 0x0000020c
>>>  CMD ERR: 0x00000104 (cmd=19)
>>>  CMD ERR: 0x00000104 (cmd=12)
>>>  DATA ERR: 0x00000208
>>>  DATA ERR: 0x0000020c
>>>  ...
>>>  ...
>>>
>>> It is inherently difficult to deal with the complexity of trying to
>>> correctly send a stop command while a data transfer is taking place
>>> since you need to deal with different corner cases caused by the fact
>>> that the data transfer could complete (with errors or without errors)
>>> during various places in sending the stop command (dw_mci_stop_dma,
>>> send_stop_abort, etc)
>>>
>>> Instead of adding a bunch of extra complexity to deal with this, it
>>> seems much simpler to just use the more straightforward (and less
>>> error-prone) path of letting the data transfer finish. There
>>> shouldn't be any huge benefit to sending the stop command slightly
>>> earlier, anyway.
>>>
>>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
>>> Cc: Alim Akhtar <alim.akhtar@gmail.com>

Applied this patch on my repository. Thanks!

Best Regards,
Jaehoon Chung

>>> ---
>>> Changelog since v1:
>>> - Fix the issue found by Alim with exynos letting the data transfer
>>>   take place only when MMC_SEND_TUNING_BLOCK is issued.
>>>
>>>  drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++++++
>>>  1 file changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>>> index 242f9a0..2ebeea8 100644
>>> --- a/drivers/mmc/host/dw_mmc.c
>>> +++ b/drivers/mmc/host/dw_mmc.c
>>> @@ -1761,6 +1761,33 @@ static void dw_mci_tasklet_func(unsigned long priv)
>>>                       }
>>>
>>>                       if (cmd->data && err) {
>>> +                             /*
>>> +                              * During UHS tuning sequence, sending the stop
>>> +                              * command after the response CRC error would
>>> +                              * throw the system into a confused state
>>> +                              * causing all future tuning phases to report
>>> +                              * failure.
>>> +                              *
>>> +                              * In such case controller will move into a data
>>> +                              * transfer state after a response error or
>>> +                              * response CRC error. Let's let that finish
>>> +                              * before trying to send a stop, so we'll go to
>>> +                              * STATE_SENDING_DATA.
>>> +                              *
>>> +                              * Although letting the data transfer take place
>>> +                              * will waste a bit of time (we already know
>>> +                              * the command was bad), it can't cause any
>>> +                              * errors since it's possible it would have
>>> +                              * taken place anyway if this tasklet got
>>> +                              * delayed. Allowing the transfer to take place
>>> +                              * avoids races and keeps things simple.
>>> +                              */
>>> +                             if ((err != -ETIMEDOUT) &&
>>> +                                 (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
>>> +                                     state = STATE_SENDING_DATA;
>>> +                                     continue;
>>> +                             }
>>> +
>>>                               dw_mci_stop_dma(host);
>>>                               send_stop_abort(host, data);
>>>                               state = STATE_SENDING_STOP;
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-22  1:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-26  8:03 [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Enric Balletbo i Serra
2016-04-26  8:03 ` [RESEND PATCH v2 1/1] mmc: dw_mmc: Wait for data transfer after response errors Enric Balletbo i Serra
2016-06-20 10:14   ` Jaehoon Chung
2016-06-21 16:30     ` Enric Balletbo Serra
2016-06-22  1:26       ` Jaehoon Chung
2016-04-27  8:35 ` [RESEND PATCH v2 0/1] mmc: dw_mmc: Fix UHS tuning on some brand of cards Jaehoon Chung
2016-04-27  8:53   ` Enric Balletbo i Serra
2016-06-20  7:59     ` Enric Balletbo Serra
2016-06-20  8:02       ` Jaehoon Chung

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).