From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chaotian Jing
Subject: Re: Regression after "do not use CMD13 to get status after speed mode switch"
Date: Thu, 3 Nov 2016 11:39:34 +0800
Message-ID: <1478144374.4039.1.camel@mhfsdcap03>
References: <1476930167.11050.4.camel@mhfsdcap03> <1477964625.4162.3.camel@mhfsdcap03> <17bdfdf2-5c58-7b09-6d4d-4f5173f6058e@intel.com> <1478082516.3862.3.camel@mhfsdcap03>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mailgw02.mediatek.com ([218.249.47.111]:56396 "EHLO mailgw02.mediatek.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1750970AbcKCDjo (ORCPT ); Wed, 2 Nov 2016 23:39:44 -0400
In-Reply-To:
Sender: linux-arm-msm-owner@vger.kernel.org
List-Id: linux-arm-msm@vger.kernel.org
To: Adrian Hunter
Cc: Ulf Hansson, Linus Walleij, "linux-mmc@vger.kernel.org", "linux-arm-msm@vger.kernel.org", Bjorn Andersson, Stephen Boyd, Andy Gross

On Wed, 2016-11-02 at 14:51 +0200, Adrian Hunter wrote:
> On 02/11/16 12:28, Chaotian Jing wrote:
> > On Wed, 2016-11-02 at 10:19 +0200, Adrian Hunter wrote:
> >> On 01/11/16 03:43, Chaotian Jing wrote:
> >>> On Mon, 2016-10-31 at 15:09 +0200, Adrian Hunter wrote:
> >>>> On 27/10/16 13:04, Ulf Hansson wrote:
> >>>>> On 20 October 2016 at 09:06, Ulf Hansson wrote:
> >>>>>> On 20 October 2016 at 04:22, Chaotian Jing wrote:
> >>>>>>> On Wed, 2016-10-19 at 18:41 +0200, Ulf Hansson wrote:
> >>>>>>>> Adrian, Linus,
> >>>>>>>>
> >>>>>>>> Thanks for looking into this and reporting!
> >>>>>>>>
> >>>>>>>> On 18 October 2016 at 15:23, Adrian Hunter wrote:
> >>>>>>>>> On 18/10/16 11:36, Linus Walleij wrote:
> >>>>>>>>>> On Mon, Oct 17, 2016 at 4:32 PM, Linus Walleij wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Before this patch the eMMC is detected and all partitions enumerated
> >>>>>>>>>>> immediately, but after the patch it doesn't come up at all, except
> >>>>>>>>>>> sometimes, when it appears minutes (!) after boot, all of a sudden.
> >>>>>>>>>>
> >>>>>>>>>> FYI this is what it looks like when it eventually happens:
> >>>>>>>>>> root@msm8660:/ [ 627.710175] mmc0: new high speed MMC card at address 0001
> >>>>>>>>>> [ 627.711641] mmcblk0: mmc0:0001 SEM04G 3.69 GiB
> >>>>>>>>>> [ 627.715485] mmcblk0boot0: mmc0:0001 SEM04G partition 1 1.00 MiB
> >>>>>>>>>> [ 627.736654] mmcblk0boot1: mmc0:0001 SEM04G partition 2 1.00 MiB
> >>>>>>>>>> [ 627.747397] mmcblk0rpmb: mmc0:0001 SEM04G partition 3 128 KiB
> >>>>>>>>>> [ 627.756326] mmcblk0: p1 p2 p3 p4 < p5 p6 p7 p8 p9 p10 p11 p12 p13
> >>>>>>>>>> p14 p15 p16 p17 p18 p19 p20 p21 >
> >>>>>>>>>>
> >>>>>>>>>> So after 627 seconds, a bit hard for users to wait this long for their
> >>>>>>>>>> root filesystem.
> >>>>>>>>>
> >>>>>>>>> If the driver does not support busy detection and the eMMC card provides
> >>>>>>>>> zero as the cmd6 generic timeout (which it may especially as cmd6 generic
> >>>>>>>>> timeout wasn't added until eMMCv4.5), then __mmc_switch() defaults to
> >>>>>>>>> waiting 10 minutes i.e.
> >>>>>>>>>
> >>>>>>>>> #define MMC_OPS_TIMEOUT_MS (10 * 60 * 1000) /* 10 minute timeout */
> >>>>>>>>
> >>>>>>>> Urgh! Yes, I have verified that this is exactly what happens.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So removal of CMD13 polling for HS mode (as per commit
> >>>>>>>>> 08573eaf1a70104f83fdbee9b84e5be03480e9ed) is going to be a problem for some
> >>>>>>>>> combinations of eMMC cards and host drivers.
> >>>>>>>>
> >>>>>>>> I was looking in the __mmc_switch() function, it's just a pain to walk
> >>>>>>>> trough it :-) So first out I decided to clean it up and factor out the
> >>>>>>>> polling parts. I will post the patches first out tomorrow morning,
> >>>>>>>> running some final test right now.
> >>>>>>>>
> >>>>>>>> Although, that of course doesn't solve our problem. As I see it we
> >>>>>>>> only have a few options here.
> >>>>>>>>
> >>>>>>>> 1) In case when cmd6 generic timeout isn't available, let's assign
> >>>>>>>> another empirically selected value.
> >>>>>>>> 2) Use a specific timeout when switching to HS mode.
> >>>>>>>> 3) Even if we deploy 1 (and 2), perhaps we still should allow polling
> >>>>>>>> with CMD13 for switching to HS mode - unless it causes issues for some
> >>>>>>>> cards/drivers combination?
> >>>>>>>>
> >>>>>>>> BTW, I already tried 2) and it indeed solves the problem, although
> >>>>>>>> depending on the selected timeout, it might delay the card detection
> >>>>>>>> to process.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> I just have a try of switching to HS mode with Hynix EMMC, the first
> >>>>>>> CMD13 gets response of 0x900, but the EMMC is still pull-low DAT0. so
> >>>>>>> that CMD13 cannot indicate current card status in this case.
> >>>>>>
> >>>>>> Thanks for sharing that. Okay, so clearly we have some cards that
> >>>>>> don't supports polling with CMD13 when switching to HS mode.
> >>>>>> One could of course add quirks for these kind of cards and do a fixed
> >>>>>> delay for them, but then to find out which these cards are is going to
> >>>>>> be hard.
> >>>>>>
> >>>>>> It seems like we are left with using a fixed delay. Any ideas of what
> >>>>>> such delay should be? And should we have one specific for switch to
> >>>>>> the various speed modes and a different one that overrides the CMD6
> >>>>>> generic timout, when it doesn't exist?
> >>>>>>
> >>>>>
> >>>>> Replying to my own earlier response, as I believe the problem could
> >>>>> also be related to another old commit, see below.
> >>>>>
> >>>>> commit a27fbf2f067b0cd6f172c8b696b9a44c58bfaa7a
> >>>>> Author: Seungwon Jeon
> >>>>> Date: Wed Sep 4 21:21:05 2013 +0900
> >>>>>
> >>>>> mmc: add ignorance case for CMD13 CRC error
> >>>>>
> >>>>> While speed mode is changed, CMD13 cannot be guaranteed.
> >>>>> According to the spec., it is not recommended to use CMD13
> >>>>> to check the busy completion of the timing change.
> >>>>> If CMD13 is used in this case, CRC error must be ignored.
> >>>>>
> >>>>> Signed-off-by: Seungwon Jeon
> >>>>> Acked-by: Ulf Hansson
> >>>>> Signed-off-by: Chris Ball
> >>>>>
> >>>>>
> >>>>> The intent with this commit was not really correct. We don't want to
> >>>>> ignore CRC errors, but instead we should *re-try* sending CMD13 once
> >>>>> we get a CRC error.
> >>>>>
> >>>>> Unfortunate since this commit, instead we tell the host driver to
> >>>>> *ignore* CRC errors and instead reads the status and returns 0
> >>>>> (indicating success). In the mmc core, in __mmc_switch(), it will thus
> >>>>> parse the status reply, even for a reply that might have been received
> >>>>> with a CRC error. Not good!
> >>>>
> >>>> I agree: ignoring CRC errors and then expecting the status in the response
> >>>> to be correct doesn't make sense.
> >>>>
> >>>> However, it raises the question of what to do if there are always CRC errors
> >>>> e.g. if it only works without CRC errors once the mode and frequency are
> >>>> changed in the host controller.
> >>>>
> >>>>> I am wondering whether this actually is the main problem to why we
> >>>>> think polling isn't working for some cases. And perhaps that was the
> >>>>> original problem Chaotian was trying to solve?
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> Does Chaotian have a real problem since his driver has busy detection anyway?
> >>>
> >>> In fact, I have not encounter CRC errors of CMD13, I have tried several
> >>> eMMC cards, after mode switch, CMD13 will only gets 0x800 response and
> >>> we don't know if card is busy by 0x800 response.
> >>
> >> Does it change to 0x900 when it is not busy?
> >>
> > No, it will not change to 0x900 when it is not busy.
> >
> >> But anyway the question was: do you have busy detection in your driver?
> >>
> > driver has busy detection ops->card_busy() but seems it's MMC core
> > layer's responsibility to ensure that card is not busy when driver
> > starts to issue commands.
>
> I tried a card here. The time between HS switch response and busy
> de-assertion was only 58us i.e. practically instant. The CMD6 response was
> 0x800 but the subsequent CMD13 response was 0x900.
>
> How long does it take your failing card to switch to HS?
>
It depends on the eMMC chip type: some are very fast and some take
several ms. I just tested a Sandisk-SDIN9D-S2, and CMD13 also gets a 0x800
response after busy de-assertion.
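
For reference, a minimal sketch of how the two CMD13 responses discussed
above decode. It assumes the R1 status bit layout from
include/linux/mmc/mmc.h (READY_FOR_DATA in bit 8, CURRENT_STATE in bits
12:9); r1_ready_in_tran() is a hypothetical helper for illustration only,
not an MMC core function.

```c
#include <stdint.h>

/* Assumed R1 status layout, mirroring include/linux/mmc/mmc.h */
#define R1_READY_FOR_DATA   (1u << 8)
#define R1_CURRENT_STATE(x) (((x) & 0x00001E00u) >> 9)
#define R1_STATE_TRAN       4
#define R1_STATE_PRG        7

/*
 * Hypothetical helper: returns 1 only when the status reports the
 * transfer state AND has READY_FOR_DATA set, otherwise 0.
 */
static int r1_ready_in_tran(uint32_t status)
{
	return R1_CURRENT_STATE(status) == R1_STATE_TRAN &&
	       (status & R1_READY_FOR_DATA) != 0;
}
```

With these definitions 0x900 decodes as tran state with READY_FOR_DATA
set, and 0x800 as tran state with READY_FOR_DATA clear. A check that only
looks at the state field cannot tell the two apart, and a check on
READY_FOR_DATA would never pass on a card that keeps answering 0x800.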