All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mmc: dw_mmc: Wait for data transfer after response errors
@ 2015-05-18 15:53 Doug Anderson
  2015-05-26 18:02 ` Alim Akhtar
  0 siblings, 1 reply; 19+ messages in thread
From: Doug Anderson @ 2015-05-18 15:53 UTC (permalink / raw)
  To: Jaehoon Chung, Seungwon Jeon, Ulf Hansson
  Cc: Alim Akhtar, Sonny Rao, Andrew Bresticker, Heiko Stuebner,
	Addy Ke, Alexandru Stan, javier.martinez, Chris Zhong,
	Caesar Wang, Doug Anderson, linux-mmc, linux-kernel

According to the DesignWare state machine description, after we get a
"response error" or "response CRC error" we move into data transfer
mode.  That means that we don't necessarily need to special case
trying to deal with the failure right away.  We can wait until we are
notified that the data transfer is complete (with or without errors)
and then we can deal with the failure.

It may sound strange to defer dealing with a command that we know will
fail anyway, but this appears to fix a bug.  During tuning (CMD19) on
a specific card on an rk3288-based system, we found that we could get
a "response CRC error".  Sending the stop command after the "response
CRC error" would then throw the system into a confused state causing
all future tuning phases to report failure.

When in the confused state, the controller would show these (hex codes
are interrupt status register):
 CMD ERR: 0x00000046 (cmd=19)
 CMD ERR: 0x0000004e (cmd=12)
 DATA ERR: 0x00000208
 DATA ERR: 0x0000020c
 CMD ERR: 0x00000104 (cmd=19)
 CMD ERR: 0x00000104 (cmd=12)
 DATA ERR: 0x00000208
 DATA ERR: 0x0000020c
 ...
 ...

It is inherently difficult to deal with the complexity of trying to
correctly send a stop command while a data transfer is taking place
since you need to deal with different corner cases caused by the fact
that the data transfer could complete (with errors or without errors)
during various places in sending the stop command (dw_mci_stop_dma,
send_stop_abort, etc)

Instead of adding a bunch of extra complexity to deal with this, it
seems much simpler to just use the more straightforward (and less
error-prone) path of letting the data transfer finish.  There
shouldn't be any huge benefit to sending the stop command slightly
earlier, anyway.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/mmc/host/dw_mmc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 5f5adaf..c081ce2 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1574,6 +1574,26 @@ static void dw_mci_tasklet_func(unsigned long priv)
 			}
 
 			if (cmd->data && err) {
+				/*
+				 * Controller will move into a data transfer
+				 * state after a response error or response CRC
+				 * error.  Let's let that finish before trying
+				 * to send a stop, so we'll go to
+				 * STATE_SENDING_DATA.
+				 *
+				 * Although letting the data transfer take place
+				 * will waste a bit of time (we already know
+				 * the command was bad), it can't cause any
+				 * errors since it's possible it would have
+				 * taken place anyway if this tasklet got
+				 * delayed.  Allowing the transfer to take place
+				 * avoids races and keeps things simple.
+				 */
+				if (err != -ETIMEDOUT) {
+					state = STATE_SENDING_DATA;
+					continue;
+				}
+
 				dw_mci_stop_dma(host);
 				send_stop_abort(host, data);
 				state = STATE_SENDING_STOP;
-- 
2.2.0.rc0.207.ga3a616c


^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH] mmc: dw_mmc: Wait for data transfer after response errors
@ 2016-03-17 12:12 Enric Balletbo Serra
  2016-03-21 22:38 ` Doug Anderson
  0 siblings, 1 reply; 19+ messages in thread
From: Enric Balletbo Serra @ 2016-03-17 12:12 UTC (permalink / raw)
  To: linux-mmc, linux-kernel
  Cc: Doug Anderson, Alim Akhtar, Jaehoon Chung, Seungwon Jeon,
	Ulf Hansson, Alim Akhtar, Sonny Rao, Andrew Bresticker,
	Heiko Stuebner, Addy Ke, Alexandru Stan,
	Javier Martinez Canillas, Chris Zhong, Caesar Wang

Dear all,

Seems the following thread[1] didn't go anywhere. I'd like to continue
the discussion and share some tests that I did regarding the issue
that the patch is trying to fix.

First I reproduced the issue on my rockchip board and I tested the
patch intensively, I can confirm that the patch made by Doug fixes the
issue.But, as reported by Alim, seems that this patch has the side
effect that breaks mmc on peach-pi board [2], specially on
suspend/resume, I ran lots of tests on peach-pi and, although is a bit
random, I can also confirm the breakage.

Looks like that on peach-pi, when the patch is applied the controller
moves into a data transfer and the interrupt does not come, then the
task blocks. The reason why I think the dw_mmc-rockchip driver works
is because it has the DW_MCI_QUIRK_BROKEN_DTO quirk [3].

So I did lots of tests on peach-pi with dto quirk, suspend/resume
started to work again. But I guess this is not the proper solution or
it is? Thoughts?

[1] https://lkml.org/lkml/2015/5/18/495
[2] https://lava.collabora.co.uk/scheduler/job/169384/log_file#L_195_5
[3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/mmc/host/dw_mmc-rockchip.c?id=57e104864bc4874a36796fd222d8d084dbf90b9b

Cheers,
Enric

>
> Alim,
>
> On Tue, May 26, 2015 at 11:02 AM, Alim Akhtar <alim.akhtar@gmail.com> wrote:
>> Hi Doug,
>> On peach-pi, I got a hung task once in 4 cold boot as [1].
>
> OK, I'll have to get my peach-pi or peach-pit up and running again.  I
> ran out of desk space and I haven't gotten it set back up.  :(
>
> I've been testing on an rk3288-based device.  Past experience has
> taught me that the rk3288 dw_mmc works differently than the exynos
> one, so perhaps this is a difference.
>
> Could you possibly patch in something like
> <https://chromium-review.googlesource.com/#/c/244347/> and provide the
> console for the failure?  I'll put it on my list to try this myself,
> too
>
>
>> I was checking on v4.1-rc5, git hash as below:
>>
>> 862e58a mmc: dw_mmc: Wait for data transfer after response errors
>> ba155e2 Linux 4.1-rc5
>> 5b13966
>>
>> Not sure if I missed any dependent patch??
>
> I'm currently testing out of tree, but my dw_mmc is very close to mainline.
>
>
>> Have not checked the dw TRM for this change, will do that as soon as I
>> get access to it.
>
> OK, sounds good.  I have some old version of the DesignWare TRM, so
> possibly something is different in the newer one...
>
>
> -Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-03-31 18:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-18 15:53 [PATCH] mmc: dw_mmc: Wait for data transfer after response errors Doug Anderson
2015-05-26 18:02 ` Alim Akhtar
2015-05-26 20:44   ` Doug Anderson
2015-05-27  1:53     ` Jaehoon Chung
2015-05-27 16:52       ` Doug Anderson
2016-03-17 12:12 Enric Balletbo Serra
2016-03-21 22:38 ` Doug Anderson
2016-03-24 11:26   ` Enric Balletbo Serra
2016-03-24 15:16     ` Doug Anderson
2016-03-24 15:30     ` Russell King - ARM Linux
2016-03-24 16:06       ` Doug Anderson
2016-03-24 16:22         ` Russell King - ARM Linux
2016-03-30 17:16           ` Enric Balletbo Serra
2016-03-30 17:26             ` Russell King - ARM Linux
     [not found]               ` <CAFqH_51sMLbURO4n7OTEuC8S-6w0Q4aRv47nEoCAmK8-MJ+Jbw@mail.gmail.com>
2016-03-30 20:39                 ` Russell King - ARM Linux
2016-03-31  1:56               ` Shawn Lin
2016-03-31  2:03             ` Jaehoon Chung
2016-03-31  6:39               ` Enric Balletbo Serra
2016-03-31 18:12           ` Doug Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.