From: Yann Gautier <yann.gautier@foss.st.com>
To: Linus Walleij <linus.walleij@linaro.org>,
<linux-mmc@vger.kernel.org>,
"Ulf Hansson" <ulf.hansson@linaro.org>
Cc: <stable@vger.kernel.org>, <phone-devel@vger.kernel.org>,
Ludovic Barre <ludovic.barre@st.com>,
Stephan Gerhold <stephan@gerhold.net>, <newbyte@disroot.org>
Subject: Re: [PATCH v3] mmc: core: Add a card quirk for non-hw busy detection
Date: Mon, 16 Aug 2021 16:03:31 +0200 [thread overview]
Message-ID: <2f449f6e-bca0-3c70-4255-26619e957d44@foss.st.com> (raw)
In-Reply-To: <20210720144115.1525257-1-linus.walleij@linaro.org>
On 7/20/21 4:41 PM, Linus Walleij wrote:
> Some boot partitions on the Samsung 4GB KLM4G1YE4C "4YMD1R" and "M4G1YC"
> cards appear broken when accessed randomly. CMD6 to switch back to the main
> partition randomly stalls after CMD18 access to the boot partition 1, and
> the card never comes back online. The accesses to the boot partitions work
> several times before this happens, but eventually the card access hangs
> while initializing the card.
>
> Some problematic eMMC cards are found in the Samsung GT-S7710 (Skomer)
> and SGH-I407 (Kyle) mobile phones.
>
> I tried using only single blocks with CMD17 on the boot partitions with the
> result that it crashed even faster.
>
> After a bit of root cause analysis it turns out that these old eMMC cards
> probably cannot do hardware busy detection (monitoring DAT0) properly.
>
> The card survives on older kernels, but this is because recent kernels have
> added busy detection handling for the SoC used in these phones, exposing
> the issue.
>
> Construct a quirk that makes the MMC cord avoid using the ->card_busy()
> callback if the card is listed with MMC_QUIRK_BROKEN_HW_BUSY_DETECT and
> register the known problematic cards. The core changes are pretty
> straight-forward with a helper inline to check of we can use hardware
> busy detection.
>
> On the MMCI host we have to counter the fact that if the host was able to
> use busy detect, it would be used unsolicited in the command IRQ callback.
> Rewrite this so that MMCI will not attempt to use hardware busy detection
> in the command IRQ until:
> - A card is attached to the host and
> - We know that the card can handle this busy detection
>
> I have glanced over the ->card_busy() callbacks on some other hosts and
> they seem to mostly read a register reflecting the value of DAT0 for this
> which works fine with the quirk in this patch. However if the error appear
> on other hosts they might need additional fixes.
>
> After applying this patch, the main partition can be accessed and mounted
> without problems on Samsung GT-S7710 and SGH-I407.
>
> Fixes: cb0335b778c7 ("mmc: mmci: add busy_complete callback")
> Cc: stable@vger.kernel.org
> Cc: phone-devel@vger.kernel.org
> Cc: Ludovic Barre <ludovic.barre@st.com>
> Cc: Stephan Gerhold <stephan@gerhold.net>
> Reported-by: newbyte@disroot.org
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Hi Linus,
I was just testing your patch on top of mmc/next.
Whereas mmc/next is fine, with your patch I fail to pass MMC test 5
(Multi-block write).
I've got this error on STM32MP157C-EV1 board:
[ 108.956218] mmc0: Starting tests of card mmc0:aaaa...
[ 108.959862] mmc0: Test case 5. Multi-block write...
[ 108.995615] mmc0: Warning: Host did not wait for busy state to end.
[ 109.000483] mmc0: Result: ERROR (-110)
Then nothing more happens.
The test was done on an SD-card Sandisk Extreme Pro SDXC UHS-I mark 3,
in DDR50 mode.
I'll try to add more traces to see what happens.
Best regards,
Yann
> ---
> ChangeLog v2->v3:
> - Rebase on v5.14-rc1
> - Reword the commit message slightly.
> ChangeLog v1->v2:
> - Rewrite to reflect the actual problem of broken busy detection.
> ---
> drivers/mmc/core/core.c | 8 ++++----
> drivers/mmc/core/core.h | 17 +++++++++++++++++
> drivers/mmc/core/mmc_ops.c | 4 ++--
> drivers/mmc/core/quirks.h | 21 +++++++++++++++++++++
> drivers/mmc/host/mmci.c | 22 ++++++++++++++++++++--
> include/linux/mmc/card.h | 1 +
> 6 files changed, 65 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 95fedcf56e4a..e08dd9ea3d46 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -232,7 +232,7 @@ static void __mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
> * And bypass I/O abort, reset and bus suspend operations.
> */
> if (sdio_is_io_busy(mrq->cmd->opcode, mrq->cmd->arg) &&
> - host->ops->card_busy) {
> + mmc_hw_busy_detect(host)) {
> int tries = 500; /* Wait aprox 500ms at maximum */
>
> while (host->ops->card_busy(host) && --tries)
> @@ -1200,7 +1200,7 @@ int mmc_set_uhs_voltage(struct mmc_host *host, u32 ocr)
> */
> if (!host->ops->start_signal_voltage_switch)
> return -EPERM;
> - if (!host->ops->card_busy)
> + if (!mmc_hw_busy_detect(host))
> pr_warn("%s: cannot verify signal voltage switch\n",
> mmc_hostname(host));
>
> @@ -1220,7 +1220,7 @@ int mmc_set_uhs_voltage(struct mmc_host *host, u32 ocr)
> * after the response of cmd11, but wait 1 ms to be sure
> */
> mmc_delay(1);
> - if (host->ops->card_busy && !host->ops->card_busy(host)) {
> + if (mmc_hw_busy_detect(host) && !host->ops->card_busy(host)) {
> err = -EAGAIN;
> goto power_cycle;
> }
> @@ -1241,7 +1241,7 @@ int mmc_set_uhs_voltage(struct mmc_host *host, u32 ocr)
> * Failure to switch is indicated by the card holding
> * dat[0:3] low
> */
> - if (host->ops->card_busy && host->ops->card_busy(host))
> + if (mmc_hw_busy_detect(host) && host->ops->card_busy(host))
> err = -EAGAIN;
>
> power_cycle:
> diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> index 0c4de2030b3f..6a5619eed4a6 100644
> --- a/drivers/mmc/core/core.h
> +++ b/drivers/mmc/core/core.h
> @@ -181,4 +181,21 @@ static inline int mmc_flush_cache(struct mmc_host *host)
> return 0;
> }
>
> +/**
> + * mmc_hw_busy_detect() - Can we use hw busy detection?
> + * @host: the host in question
> + */
> +static inline bool mmc_hw_busy_detect(struct mmc_host *host)
> +{
> + struct mmc_card *card = host->card;
> + bool has_ops;
> + bool able = true;
> +
> + has_ops = (host->ops->card_busy != NULL);
> + if (card)
> + able = !(card->quirks & MMC_QUIRK_BROKEN_HW_BUSY_DETECT);
> +
> + return (has_ops && able);
> +}
> +
> #endif
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index 973756ed4016..546fc799a8e5 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -435,7 +435,7 @@ static int mmc_busy_cb(void *cb_data, bool *busy)
> u32 status = 0;
> int err;
>
> - if (host->ops->card_busy) {
> + if (mmc_hw_busy_detect(host)) {
> *busy = host->ops->card_busy(host);
> return 0;
> }
> @@ -597,7 +597,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> * when it's not allowed to poll by using CMD13, then we need to rely on
> * waiting the stated timeout to be sufficient.
> */
> - if (!send_status && !host->ops->card_busy) {
> + if (!send_status && !mmc_hw_busy_detect(host)) {
> mmc_delay(timeout_ms);
> goto out_tim;
> }
> diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h
> index d68e6e513a4f..8da6526f0eb0 100644
> --- a/drivers/mmc/core/quirks.h
> +++ b/drivers/mmc/core/quirks.h
> @@ -99,6 +99,27 @@ static const struct mmc_fixup __maybe_unused mmc_blk_fixups[] = {
> MMC_FIXUP("V10016", CID_MANFID_KINGSTON, CID_OEMID_ANY, add_quirk_mmc,
> MMC_QUIRK_TRIM_BROKEN),
>
> + /*
> + * Some older Samsung eMMCs have broken hardware busy detection.
> + * Enabling this feature in the host controller can make the card
> + * accesses lock up completely.
> + */
> + MMC_FIXUP("4YMD1R", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> + /* Samsung KLMxGxxE4x eMMCs from 2012: 4, 8, 16, 32 and 64 GB */
> + MMC_FIXUP("M4G1YC", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> + MMC_FIXUP("M8G1WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> + MMC_FIXUP("MAG2WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> + MMC_FIXUP("MBG4WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> + MMC_FIXUP("MAG2WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> + MMC_FIXUP("MCG8WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
> + MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
> +
> END_FIXUP
> };
>
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index 984d35055156..3046917b2b67 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -347,6 +347,24 @@ static int mmci_card_busy(struct mmc_host *mmc)
> return busy;
> }
>
> +/* Use this if the MMCI variant AND the card supports it */
> +static bool mmci_use_busy_detect(struct mmci_host *host)
> +{
> + struct mmc_card *card = host->mmc->card;
> +
> + if (!host->variant->busy_detect)
> + return false;
> +
> + /* We don't allow this until we know that the card can handle it */
> + if (!card)
> + return false;
> +
> + if (card->quirks & MMC_QUIRK_BROKEN_HW_BUSY_DETECT)
> + return false;
> +
> + return true;
> +}
> +
> static void mmci_reg_delay(struct mmci_host *host)
> {
> /*
> @@ -1381,7 +1399,7 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd,
> return;
>
> /* Handle busy detection on DAT0 if the variant supports it. */
> - if (busy_resp && host->variant->busy_detect)
> + if (busy_resp && mmci_use_busy_detect(host))
> if (!host->ops->busy_complete(host, status, err_msk))
> return;
>
> @@ -1725,7 +1743,7 @@ static void mmci_set_max_busy_timeout(struct mmc_host *mmc)
> struct mmci_host *host = mmc_priv(mmc);
> u32 max_busy_timeout = 0;
>
> - if (!host->variant->busy_detect)
> + if (!mmci_use_busy_detect(host))
> return;
>
> if (host->variant->busy_timeout && mmc->actual_clock)
> diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
> index 74e6c0624d27..525a39951c6d 100644
> --- a/include/linux/mmc/card.h
> +++ b/include/linux/mmc/card.h
> @@ -280,6 +280,7 @@ struct mmc_card {
> /* for byte mode */
> #define MMC_QUIRK_NONSTD_SDIO (1<<2) /* non-standard SDIO card attached */
> /* (missing CIA registers) */
> +#define MMC_QUIRK_BROKEN_HW_BUSY_DETECT (1<<3) /* Disable hardware busy detection on DAT0 */
> #define MMC_QUIRK_NONSTD_FUNC_IF (1<<4) /* SDIO card has nonstd function interfaces */
> #define MMC_QUIRK_DISABLE_CD (1<<5) /* disconnect CD/DAT[3] resistor */
> #define MMC_QUIRK_INAND_CMD38 (1<<6) /* iNAND devices have broken CMD38 */
>
next prev parent reply other threads:[~2021-08-16 14:03 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-20 14:41 [PATCH v3] mmc: core: Add a card quirk for non-hw busy detection Linus Walleij
2021-08-16 13:31 ` Ulf Hansson
2021-08-16 14:03 ` Yann Gautier [this message]
2021-08-16 22:37 ` Linus Walleij
2021-08-24 15:50 ` Yann Gautier
2022-02-13 23:56 ` Linus Walleij
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2f449f6e-bca0-3c70-4255-26619e957d44@foss.st.com \
--to=yann.gautier@foss.st.com \
--cc=linus.walleij@linaro.org \
--cc=linux-mmc@vger.kernel.org \
--cc=ludovic.barre@st.com \
--cc=newbyte@disroot.org \
--cc=phone-devel@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=stephan@gerhold.net \
--cc=ulf.hansson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).