[PATCH 0/2] mmc: add unstuck function if host is in deadlock state

linux-mmc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] mmc: add unstuck function if host is in deadlock state
@ 2019-10-11 13:15 Ludovic Barre
  2019-10-11 13:15 ` [PATCH 1/2] " Ludovic Barre
  2019-10-11 13:15 ` [PATCH 2/2] mmc: mmci: add unstuck feature Ludovic Barre
  0 siblings, 2 replies; 6+ messages in thread
From: Ludovic Barre @ 2019-10-11 13:15 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring
  Cc: srinivas.kandagatla, Maxime Coquelin, Alexandre Torgue,
	linux-arm-kernel, linux-kernel, devicetree, linux-mmc,
	linux-stm32, Ludovic Barre

From: Ludovic Barre <ludovic.barre@st.com>

As discussed in this thread: https://patchwork.kernel.org/patch/10786421/
After a request, the host could be in deadlock state, and waiting
for a specific action to unstuck the hardware block before
resending a new command. This series adds mmc_hw_unstuck
callback (structure mmc_host_ops) before resending a new command
(call in mmc_blk_mq_rw_recovery, mmc_wait_for_req_done).

Ludovic Barre (2):
  mmc: add unstuck function if host is in deadlock state
  mmc: mmci: add unstuck feature

 drivers/mmc/core/block.c | 11 +++++++++++
 drivers/mmc/core/core.c  | 35 +++++++++++++++++++++++++++++++++--
 drivers/mmc/host/mmci.c  | 23 +++++++++++++++++++++--
 include/linux/mmc/core.h |  1 +
 include/linux/mmc/host.h |  7 +++++++
 5 files changed, 73 insertions(+), 4 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] mmc: add unstuck function if host is in deadlock state
  2019-10-11 13:15 [PATCH 0/2] mmc: add unstuck function if host is in deadlock state Ludovic Barre
@ 2019-10-11 13:15 ` Ludovic Barre
  2019-10-21 13:35   ` Ulf Hansson
  2019-10-11 13:15 ` [PATCH 2/2] mmc: mmci: add unstuck feature Ludovic Barre
  1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Barre @ 2019-10-11 13:15 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring
  Cc: srinivas.kandagatla, Maxime Coquelin, Alexandre Torgue,
	linux-arm-kernel, linux-kernel, devicetree, linux-mmc,
	linux-stm32, Ludovic Barre

From: Ludovic Barre <ludovic.barre@st.com>

After a request a host may be in deadlock state, and wait
a specific action to unstuck the hardware block before
re-sending a new command.

This patch adds an optional callback mmc_hw_unstuck which
allows the host to unstuck the controller. In order to avoid
a critical context, this callback must be called when the
request is completed. Depending the mmc request, the completion
function is defined by mrq->done and could be in block.c or core.c.

mmc_hw_unstuck is called if the host returns an cmd/sbc/stop/data
DEADLK error.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
---
 drivers/mmc/core/block.c | 11 +++++++++++
 drivers/mmc/core/core.c  | 35 +++++++++++++++++++++++++++++++++--
 include/linux/mmc/core.h |  1 +
 include/linux/mmc/host.h |  7 +++++++
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 2c71a434c915..2f723e2f5fde 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1799,6 +1799,17 @@ static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
 	u32 blocks;
 	int err;
 
+	/*
+	 * if the host return a deadlock, it needs to be unstuck
+	 * before to send a new command.
+	 */
+	if (brq->sbc.error == -EDEADLK || brq->cmd.error == -EDEADLK ||
+	    brq->stop.error == -EDEADLK || brq->data.error == -EDEADLK) {
+		pr_err("%s: host is in bad state, must be unstuck\n",
+		       req->rq_disk->disk_name);
+		mmc_hw_unstuck(card->host);
+	}
+
 	/*
 	 * Some errors the host driver might not have seen. Set the number of
 	 * bytes transferred to zero in that case.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 221127324709..43fe59a7403b 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -397,6 +397,7 @@ static int __mmc_start_req(struct mmc_host *host, struct mmc_request *mrq)
 void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
 {
 	struct mmc_command *cmd;
+	int sbc_err, stop_err, data_err;
 
 	while (1) {
 		wait_for_completion(&mrq->completion);
@@ -420,8 +421,24 @@ void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
 				       mmc_hostname(host), __func__);
 			}
 		}
-		if (!cmd->error || !cmd->retries ||
-		    mmc_card_removed(host->card))
+
+		sbc_err =  mrq->sbc ? mrq->sbc->error : 0;
+		stop_err = mrq->stop ? mrq->stop->error : 0;
+		data_err =  mrq->data ? mrq->data->error : 0;
+
+		/*
+		 * if the host return a deadlock, it needs to be unstuck
+		 * before to send a new command.
+		 */
+		if (cmd->error == -EDEADLK || sbc_err == -EDEADLK ||
+		    stop_err == -EDEADLK || data_err == -EDEADLK) {
+			pr_debug("%s: host is in bad state, must be unstuck\n",
+				 mmc_hostname(host));
+			mmc_hw_unstuck(host);
+		}
+
+		if ((!cmd->error && !sbc_err && !stop_err && !data_err) ||
+		    !cmd->retries || mmc_card_removed(host->card))
 			break;
 
 		mmc_retune_recheck(host);
@@ -430,6 +447,12 @@ void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
 			 mmc_hostname(host), cmd->opcode, cmd->error);
 		cmd->retries--;
 		cmd->error = 0;
+		if (mrq->sbc)
+			mrq->sbc->error = 0;
+		if (mrq->stop)
+			mrq->stop->error = 0;
+		if (mrq->data)
+			mrq->data->error = 0;
 		__mmc_start_request(host, mrq);
 	}
 
@@ -2161,6 +2184,14 @@ int mmc_sw_reset(struct mmc_host *host)
 }
 EXPORT_SYMBOL(mmc_sw_reset);
 
+void mmc_hw_unstuck(struct mmc_host *host)
+{
+	if (!host->ops->hw_unstuck)
+		return;
+	host->ops->hw_unstuck(host);
+}
+EXPORT_SYMBOL(mmc_hw_unstuck);
+
 static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
 {
 	host->f_init = freq;
diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index b7ba8810a3b5..eb10b8194073 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -173,6 +173,7 @@ void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq);
 int mmc_wait_for_cmd(struct mmc_host *host, struct mmc_command *cmd,
 		int retries);
 
+void mmc_hw_unstuck(struct mmc_host *host);
 int mmc_hw_reset(struct mmc_host *host);
 int mmc_sw_reset(struct mmc_host *host);
 void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index ba703384bea0..8b52cafcd1eb 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -163,6 +163,13 @@ struct mmc_host_ops {
 	void	(*hw_reset)(struct mmc_host *host);
 	void	(*card_event)(struct mmc_host *host);
 
+	/*
+	 * Optional callback, if your host could be in deadlock after a command
+	 * and need a specific action to unstuck the controller before sending
+	 * new command.
+	 */
+	void	(*hw_unstuck)(struct mmc_host *host);
+
 	/*
 	 * Optional callback to support controllers with HW issues for multiple
 	 * I/O. Returns the number of supported blocks for the request.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] mmc: mmci: add unstuck feature
  2019-10-11 13:15 [PATCH 0/2] mmc: add unstuck function if host is in deadlock state Ludovic Barre
  2019-10-11 13:15 ` [PATCH 1/2] " Ludovic Barre
@ 2019-10-11 13:15 ` Ludovic Barre
  1 sibling, 0 replies; 6+ messages in thread
From: Ludovic Barre @ 2019-10-11 13:15 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring
  Cc: srinivas.kandagatla, Maxime Coquelin, Alexandre Torgue,
	linux-arm-kernel, linux-kernel, devicetree, linux-mmc,
	linux-stm32, Ludovic Barre

From: Ludovic Barre <ludovic.barre@st.com>

On busy_timeout feature if busy is too long on R1B command
a datatimeout occurs and a specific actions is needed to clear
the DPSM bit:
-reset the controller to clear the DPSM bit.
-restore registers: clk, pwr, datactrl.

Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
---
 drivers/mmc/host/mmci.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 40e72c30ea84..dafba4e0afc5 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -1320,7 +1320,7 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd,
 		cmd->error = -EILSEQ;
 	} else if (host->variant->busy_timeout && busy_resp &&
 		   status & MCI_DATATIMEOUT) {
-		cmd->error = -ETIMEDOUT;
+		cmd->error = -EDEADLK;
 	} else {
 		cmd->resp[0] = readl(base + MMCIRESPONSE0);
 		cmd->resp[1] = readl(base + MMCIRESPONSE1);
@@ -1332,7 +1332,6 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd,
 		if (host->data) {
 			/* Terminate the DMA transfer */
 			mmci_dma_error(host);
-
 			mmci_stop_data(host);
 			if (host->variant->cmdreg_stop && cmd->error) {
 				mmci_stop_command(host);
@@ -1787,6 +1786,25 @@ static int mmci_sig_volt_switch(struct mmc_host *mmc, struct mmc_ios *ios)
 	return ret;
 }
 
+static void mmci_hw_unstuck(struct mmc_host *mmc)
+{
+	struct mmci_host *host = mmc_priv(mmc);
+	unsigned long flags;
+
+	if (host->rst) {
+		reset_control_assert(host->rst);
+		udelay(2);
+		reset_control_deassert(host->rst);
+	}
+
+	spin_lock_irqsave(&host->lock, flags);
+	writel(host->clk_reg, host->base + MMCICLOCK);
+	writel(host->pwr_reg, host->base + MMCIPOWER);
+	writel(MCI_IRQENABLE | host->variant->start_err,
+	       host->base + MMCIMASK0);
+	spin_unlock_irqrestore(&host->lock, flags);
+}
+
 static struct mmc_host_ops mmci_ops = {
 	.request	= mmci_request,
 	.pre_req	= mmci_pre_request,
@@ -1795,6 +1813,7 @@ static struct mmc_host_ops mmci_ops = {
 	.get_ro		= mmc_gpio_get_ro,
 	.get_cd		= mmci_get_cd,
 	.start_signal_voltage_switch = mmci_sig_volt_switch,
+	.hw_unstuck	= mmci_hw_unstuck,
 };
 
 static int mmci_of_parse(struct device_node *np, struct mmc_host *mmc)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mmc: add unstuck function if host is in deadlock state
  2019-10-11 13:15 ` [PATCH 1/2] " Ludovic Barre
@ 2019-10-21 13:35   ` Ulf Hansson
  2019-11-13 16:54     ` Ludovic BARRE
  0 siblings, 1 reply; 6+ messages in thread
From: Ulf Hansson @ 2019-10-21 13:35 UTC (permalink / raw)
  To: Ludovic Barre
  Cc: Rob Herring, Srinivas Kandagatla, Maxime Coquelin,
	Alexandre Torgue, Linux ARM, Linux Kernel Mailing List, DTML,
	linux-mmc, linux-stm32

On Fri, 11 Oct 2019 at 15:15, Ludovic Barre <ludovic.Barre@st.com> wrote:
>
> From: Ludovic Barre <ludovic.barre@st.com>
>
> After a request a host may be in deadlock state, and wait
> a specific action to unstuck the hardware block before
> re-sending a new command.

Rather than talking about "unstuck" and "deadlock", how about instead
describing that an MMC controller, may end up in an non-functional
state hanging on something. Then to allow it to serve new requests it
needs to be reset.

>
> This patch adds an optional callback mmc_hw_unstuck which
> allows the host to unstuck the controller. In order to avoid
> a critical context, this callback must be called when the
> request is completed. Depending the mmc request, the completion
> function is defined by mrq->done and could be in block.c or core.c.

I think it's important to state exactly what is expected from the core
perspective, by the mmc host driver when it calls this new host ops.
We need to clarify that.

>
> mmc_hw_unstuck is called if the host returns an cmd/sbc/stop/data
> DEADLK error.

To me, this approach seems a bit upside-down. Although, I have to
admit that I haven't thought through this completely yet.

The thing is, to make this useful for host drivers in general, I
instead think we need to add timeout to each request that the core
sends to the host driver. In other words, rather than waiting forever
in the core for the completion variable to be set, via calling
wait_for_completion() we could call wait_for_completion_timeout(). The
tricky part is to figure out what timeout to use for each request.
Perhaps that is even why you picked the approach as implemented in
@subject patch instead?

Anyway, the typical scenario I see, is that the host driver is
hanging, likely waiting for an IRQ that never get raised. So, unless
it implements it own variant of a "request timeout" mechanism, it
simple isn't able to call mmc_request_done() to inform the core about
that the request has failed.

For comments to the code, I defer that to the next step, when we have
agreed on the way forward.

Kind regards
Uffe

>
> Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
> ---
>  drivers/mmc/core/block.c | 11 +++++++++++
>  drivers/mmc/core/core.c  | 35 +++++++++++++++++++++++++++++++++--
>  include/linux/mmc/core.h |  1 +
>  include/linux/mmc/host.h |  7 +++++++
>  4 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 2c71a434c915..2f723e2f5fde 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1799,6 +1799,17 @@ static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
>         u32 blocks;
>         int err;
>
> +       /*
> +        * if the host return a deadlock, it needs to be unstuck
> +        * before to send a new command.
> +        */
> +       if (brq->sbc.error == -EDEADLK || brq->cmd.error == -EDEADLK ||
> +           brq->stop.error == -EDEADLK || brq->data.error == -EDEADLK) {
> +               pr_err("%s: host is in bad state, must be unstuck\n",
> +                      req->rq_disk->disk_name);
> +               mmc_hw_unstuck(card->host);
> +       }
> +
>         /*
>          * Some errors the host driver might not have seen. Set the number of
>          * bytes transferred to zero in that case.
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 221127324709..43fe59a7403b 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -397,6 +397,7 @@ static int __mmc_start_req(struct mmc_host *host, struct mmc_request *mrq)
>  void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
>  {
>         struct mmc_command *cmd;
> +       int sbc_err, stop_err, data_err;
>
>         while (1) {
>                 wait_for_completion(&mrq->completion);
> @@ -420,8 +421,24 @@ void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
>                                        mmc_hostname(host), __func__);
>                         }
>                 }
> -               if (!cmd->error || !cmd->retries ||
> -                   mmc_card_removed(host->card))
> +
> +               sbc_err =  mrq->sbc ? mrq->sbc->error : 0;
> +               stop_err = mrq->stop ? mrq->stop->error : 0;
> +               data_err =  mrq->data ? mrq->data->error : 0;
> +
> +               /*
> +                * if the host return a deadlock, it needs to be unstuck
> +                * before to send a new command.
> +                */
> +               if (cmd->error == -EDEADLK || sbc_err == -EDEADLK ||
> +                   stop_err == -EDEADLK || data_err == -EDEADLK) {
> +                       pr_debug("%s: host is in bad state, must be unstuck\n",
> +                                mmc_hostname(host));
> +                       mmc_hw_unstuck(host);
> +               }
> +
> +               if ((!cmd->error && !sbc_err && !stop_err && !data_err) ||
> +                   !cmd->retries || mmc_card_removed(host->card))
>                         break;
>
>                 mmc_retune_recheck(host);
> @@ -430,6 +447,12 @@ void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
>                          mmc_hostname(host), cmd->opcode, cmd->error);
>                 cmd->retries--;
>                 cmd->error = 0;
> +               if (mrq->sbc)
> +                       mrq->sbc->error = 0;
> +               if (mrq->stop)
> +                       mrq->stop->error = 0;
> +               if (mrq->data)
> +                       mrq->data->error = 0;
>                 __mmc_start_request(host, mrq);
>         }
>
> @@ -2161,6 +2184,14 @@ int mmc_sw_reset(struct mmc_host *host)
>  }
>  EXPORT_SYMBOL(mmc_sw_reset);
>
> +void mmc_hw_unstuck(struct mmc_host *host)
> +{
> +       if (!host->ops->hw_unstuck)
> +               return;
> +       host->ops->hw_unstuck(host);
> +}
> +EXPORT_SYMBOL(mmc_hw_unstuck);
> +
>  static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
>  {
>         host->f_init = freq;
> diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
> index b7ba8810a3b5..eb10b8194073 100644
> --- a/include/linux/mmc/core.h
> +++ b/include/linux/mmc/core.h
> @@ -173,6 +173,7 @@ void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq);
>  int mmc_wait_for_cmd(struct mmc_host *host, struct mmc_command *cmd,
>                 int retries);
>
> +void mmc_hw_unstuck(struct mmc_host *host);
>  int mmc_hw_reset(struct mmc_host *host);
>  int mmc_sw_reset(struct mmc_host *host);
>  void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card);
> diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
> index ba703384bea0..8b52cafcd1eb 100644
> --- a/include/linux/mmc/host.h
> +++ b/include/linux/mmc/host.h
> @@ -163,6 +163,13 @@ struct mmc_host_ops {
>         void    (*hw_reset)(struct mmc_host *host);
>         void    (*card_event)(struct mmc_host *host);
>
> +       /*
> +        * Optional callback, if your host could be in deadlock after a command
> +        * and need a specific action to unstuck the controller before sending
> +        * new command.
> +        */
> +       void    (*hw_unstuck)(struct mmc_host *host);
> +
>         /*
>          * Optional callback to support controllers with HW issues for multiple
>          * I/O. Returns the number of supported blocks for the request.
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mmc: add unstuck function if host is in deadlock state
  2019-10-21 13:35   ` Ulf Hansson
@ 2019-11-13 16:54     ` Ludovic BARRE
  2019-11-14 14:49       ` Ulf Hansson
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic BARRE @ 2019-11-13 16:54 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: DTML, Alexandre Torgue, linux-mmc, Linux Kernel Mailing List,
	Rob Herring, Srinivas Kandagatla, Maxime Coquelin, linux-stm32,
	Linux ARM



Le 10/21/19 à 3:35 PM, Ulf Hansson a écrit :
> On Fri, 11 Oct 2019 at 15:15, Ludovic Barre <ludovic.Barre@st.com> wrote:
>>
>> From: Ludovic Barre <ludovic.barre@st.com>
>>
>> After a request a host may be in deadlock state, and wait
>> a specific action to unstuck the hardware block before
>> re-sending a new command.
> 
> Rather than talking about "unstuck" and "deadlock", how about instead
> describing that an MMC controller, may end up in an non-functional
> state hanging on something. Then to allow it to serve new requests it
> needs to be reset.
> 

Ok, deadlock naming is perhaps too stronght and scary.

>>
>> This patch adds an optional callback mmc_hw_unstuck which
>> allows the host to unstuck the controller. In order to avoid
>> a critical context, this callback must be called when the
>> request is completed. Depending the mmc request, the completion
>> function is defined by mrq->done and could be in block.c or core.c.
> 
> I think it's important to state exactly what is expected from the core
> perspective, by the mmc host driver when it calls this new host ops.
> We need to clarify that.
> 
>>
>> mmc_hw_unstuck is called if the host returns an cmd/sbc/stop/data
>> DEADLK error.
> 
> To me, this approach seems a bit upside-down. Although, I have to
> admit that I haven't thought through this completely yet.
> 
> The thing is, to make this useful for host drivers in general, I
> instead think we need to add timeout to each request that the core
> sends to the host driver. In other words, rather than waiting forever
> in the core for the completion variable to be set, via calling
> wait_for_completion() we could call wait_for_completion_timeout(). The
> tricky part is to figure out what timeout to use for each request.
> Perhaps that is even why you picked the approach as implemented in
> @subject patch instead?

On STM32 SDMMC variant, If datatimeout occurs on R1B request the Data
Path State Machine stays in busy and only the DPSM is non-functional.
The hardware block waits a software action to abort the DPSM.

Like the CPSM stay alive, the framework can sent some requests
(without data, example cmd13:status) before to had this
timeout issue.

POV framework I understand the possibility to have a completion_timeout,
for more safety. But for this specific sdmmc case, I'm not fan, because 
the completion timeout error will occur several requests after the real 
issue (which put the DPSM non-functional). when the completion timeout
occurs we can't know if it's due to R1B timeout or an other issue.

To resolve the SDMMC's specificity, I can proposed you to add a threaded
irq in mmci drivers to abort the DPSM and terminate the request.

> 
> Anyway, the typical scenario I see, is that the host driver is
> hanging, likely waiting for an IRQ that never get raised. So, unless
> it implements it own variant of a "request timeout" mechanism, it
> simple isn't able to call mmc_request_done() to inform the core about
> that the request has failed.
> 
> For comments to the code, I defer that to the next step, when we have
> agreed on the way forward.
> 
> Kind regards
> Uffe
> 
>>
>> Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
>> ---
>>   drivers/mmc/core/block.c | 11 +++++++++++
>>   drivers/mmc/core/core.c  | 35 +++++++++++++++++++++++++++++++++--
>>   include/linux/mmc/core.h |  1 +
>>   include/linux/mmc/host.h |  7 +++++++
>>   4 files changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index 2c71a434c915..2f723e2f5fde 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -1799,6 +1799,17 @@ static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
>>          u32 blocks;
>>          int err;
>>
>> +       /*
>> +        * if the host return a deadlock, it needs to be unstuck
>> +        * before to send a new command.
>> +        */
>> +       if (brq->sbc.error == -EDEADLK || brq->cmd.error == -EDEADLK ||
>> +           brq->stop.error == -EDEADLK || brq->data.error == -EDEADLK) {
>> +               pr_err("%s: host is in bad state, must be unstuck\n",
>> +                      req->rq_disk->disk_name);
>> +               mmc_hw_unstuck(card->host);
>> +       }
>> +
>>          /*
>>           * Some errors the host driver might not have seen. Set the number of
>>           * bytes transferred to zero in that case.
>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>> index 221127324709..43fe59a7403b 100644
>> --- a/drivers/mmc/core/core.c
>> +++ b/drivers/mmc/core/core.c
>> @@ -397,6 +397,7 @@ static int __mmc_start_req(struct mmc_host *host, struct mmc_request *mrq)
>>   void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
>>   {
>>          struct mmc_command *cmd;
>> +       int sbc_err, stop_err, data_err;
>>
>>          while (1) {
>>                  wait_for_completion(&mrq->completion);
>> @@ -420,8 +421,24 @@ void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
>>                                         mmc_hostname(host), __func__);
>>                          }
>>                  }
>> -               if (!cmd->error || !cmd->retries ||
>> -                   mmc_card_removed(host->card))
>> +
>> +               sbc_err =  mrq->sbc ? mrq->sbc->error : 0;
>> +               stop_err = mrq->stop ? mrq->stop->error : 0;
>> +               data_err =  mrq->data ? mrq->data->error : 0;
>> +
>> +               /*
>> +                * if the host return a deadlock, it needs to be unstuck
>> +                * before to send a new command.
>> +                */
>> +               if (cmd->error == -EDEADLK || sbc_err == -EDEADLK ||
>> +                   stop_err == -EDEADLK || data_err == -EDEADLK) {
>> +                       pr_debug("%s: host is in bad state, must be unstuck\n",
>> +                                mmc_hostname(host));
>> +                       mmc_hw_unstuck(host);
>> +               }
>> +
>> +               if ((!cmd->error && !sbc_err && !stop_err && !data_err) ||
>> +                   !cmd->retries || mmc_card_removed(host->card))
>>                          break;
>>
>>                  mmc_retune_recheck(host);
>> @@ -430,6 +447,12 @@ void mmc_wait_for_req_done(struct mmc_host *host, struct mmc_request *mrq)
>>                           mmc_hostname(host), cmd->opcode, cmd->error);
>>                  cmd->retries--;
>>                  cmd->error = 0;
>> +               if (mrq->sbc)
>> +                       mrq->sbc->error = 0;
>> +               if (mrq->stop)
>> +                       mrq->stop->error = 0;
>> +               if (mrq->data)
>> +                       mrq->data->error = 0;
>>                  __mmc_start_request(host, mrq);
>>          }
>>
>> @@ -2161,6 +2184,14 @@ int mmc_sw_reset(struct mmc_host *host)
>>   }
>>   EXPORT_SYMBOL(mmc_sw_reset);
>>
>> +void mmc_hw_unstuck(struct mmc_host *host)
>> +{
>> +       if (!host->ops->hw_unstuck)
>> +               return;
>> +       host->ops->hw_unstuck(host);
>> +}
>> +EXPORT_SYMBOL(mmc_hw_unstuck);
>> +
>>   static int mmc_rescan_try_freq(struct mmc_host *host, unsigned freq)
>>   {
>>          host->f_init = freq;
>> diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
>> index b7ba8810a3b5..eb10b8194073 100644
>> --- a/include/linux/mmc/core.h
>> +++ b/include/linux/mmc/core.h
>> @@ -173,6 +173,7 @@ void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq);
>>   int mmc_wait_for_cmd(struct mmc_host *host, struct mmc_command *cmd,
>>                  int retries);
>>
>> +void mmc_hw_unstuck(struct mmc_host *host);
>>   int mmc_hw_reset(struct mmc_host *host);
>>   int mmc_sw_reset(struct mmc_host *host);
>>   void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card);
>> diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
>> index ba703384bea0..8b52cafcd1eb 100644
>> --- a/include/linux/mmc/host.h
>> +++ b/include/linux/mmc/host.h
>> @@ -163,6 +163,13 @@ struct mmc_host_ops {
>>          void    (*hw_reset)(struct mmc_host *host);
>>          void    (*card_event)(struct mmc_host *host);
>>
>> +       /*
>> +        * Optional callback, if your host could be in deadlock after a command
>> +        * and need a specific action to unstuck the controller before sending
>> +        * new command.
>> +        */
>> +       void    (*hw_unstuck)(struct mmc_host *host);
>> +
>>          /*
>>           * Optional callback to support controllers with HW issues for multiple
>>           * I/O. Returns the number of supported blocks for the request.
>> --
>> 2.17.1
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mmc: add unstuck function if host is in deadlock state
  2019-11-13 16:54     ` Ludovic BARRE
@ 2019-11-14 14:49       ` Ulf Hansson
  0 siblings, 0 replies; 6+ messages in thread
From: Ulf Hansson @ 2019-11-14 14:49 UTC (permalink / raw)
  To: Ludovic BARRE
  Cc: Rob Herring, Srinivas Kandagatla, Maxime Coquelin,
	Alexandre Torgue, Linux ARM, Linux Kernel Mailing List, DTML,
	linux-mmc, linux-stm32

On Wed, 13 Nov 2019 at 17:54, Ludovic BARRE <ludovic.barre@st.com> wrote:
>
>
>
> Le 10/21/19 à 3:35 PM, Ulf Hansson a écrit :
> > On Fri, 11 Oct 2019 at 15:15, Ludovic Barre <ludovic.Barre@st.com> wrote:
> >>
> >> From: Ludovic Barre <ludovic.barre@st.com>
> >>
> >> After a request a host may be in deadlock state, and wait
> >> a specific action to unstuck the hardware block before
> >> re-sending a new command.
> >
> > Rather than talking about "unstuck" and "deadlock", how about instead
> > describing that an MMC controller, may end up in an non-functional
> > state hanging on something. Then to allow it to serve new requests it
> > needs to be reset.
> >
>
> Ok, deadlock naming is perhaps too stronght and scary.
>
> >>
> >> This patch adds an optional callback mmc_hw_unstuck which
> >> allows the host to unstuck the controller. In order to avoid
> >> a critical context, this callback must be called when the
> >> request is completed. Depending the mmc request, the completion
> >> function is defined by mrq->done and could be in block.c or core.c.
> >
> > I think it's important to state exactly what is expected from the core
> > perspective, by the mmc host driver when it calls this new host ops.
> > We need to clarify that.
> >
> >>
> >> mmc_hw_unstuck is called if the host returns an cmd/sbc/stop/data
> >> DEADLK error.
> >
> > To me, this approach seems a bit upside-down. Although, I have to
> > admit that I haven't thought through this completely yet.
> >
> > The thing is, to make this useful for host drivers in general, I
> > instead think we need to add timeout to each request that the core
> > sends to the host driver. In other words, rather than waiting forever
> > in the core for the completion variable to be set, via calling
> > wait_for_completion() we could call wait_for_completion_timeout(). The
> > tricky part is to figure out what timeout to use for each request.
> > Perhaps that is even why you picked the approach as implemented in
> > @subject patch instead?
>
> On STM32 SDMMC variant, If datatimeout occurs on R1B request the Data
> Path State Machine stays in busy and only the DPSM is non-functional.
> The hardware block waits a software action to abort the DPSM.
>
> Like the CPSM stay alive, the framework can sent some requests
> (without data, example cmd13:status) before to had this
> timeout issue.
>
> POV framework I understand the possibility to have a completion_timeout,
> for more safety. But for this specific sdmmc case, I'm not fan, because
> the completion timeout error will occur several requests after the real
> issue (which put the DPSM non-functional). when the completion timeout
> occurs we can't know if it's due to R1B timeout or an other issue.

Right, I see what you are saying. So let's drop the approach suggested
in $subject series.

>
> To resolve the SDMMC's specificity, I can proposed you to add a threaded
> irq in mmci drivers to abort the DPSM and terminate the request.

Okay, so the threaded IRQ handler is needed, because the reset
operation may sleep (can't be executed in atomic context). Right?

That should work, but... let's move the discussion to that patch instead.

>
> >
> > Anyway, the typical scenario I see, is that the host driver is
> > hanging, likely waiting for an IRQ that never get raised. So, unless
> > it implements it own variant of a "request timeout" mechanism, it
> > simple isn't able to call mmc_request_done() to inform the core about
> > that the request has failed.
> >
> > For comments to the code, I defer that to the next step, when we have
> > agreed on the way forward.
> >
> > Kind regards
> > Uffe
> >

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-14 14:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-11 13:15 [PATCH 0/2] mmc: add unstuck function if host is in deadlock state Ludovic Barre
2019-10-11 13:15 ` [PATCH 1/2] " Ludovic Barre
2019-10-21 13:35   ` Ulf Hansson
2019-11-13 16:54     ` Ludovic BARRE
2019-11-14 14:49       ` Ulf Hansson
2019-10-11 13:15 ` [PATCH 2/2] mmc: mmci: add unstuck feature Ludovic Barre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).