linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V1 0/2] CQE fixes
@ 2020-05-06 14:34 Veerabhadrarao Badiganti
  2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti
  2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
  0 siblings, 2 replies; 13+ messages in thread
From: Veerabhadrarao Badiganti @ 2020-05-06 14:34 UTC (permalink / raw)
  To: adrian.hunter, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm,
	Veerabhadrarao Badiganti

Fixes for a couple of issues observed with CQE. One with CQE completion
path and the other one is with CQE recovery path.

Sarthak Garg (1):
  mmc: core: Fix recursive locking issue in CQE recovery path

Veerabhadrarao Badiganti (1):
  mmc: core: Check request type before completing the request

 drivers/mmc/core/block.c |  3 ++-
 drivers/mmc/core/queue.c | 11 ++++++-----
 2 files changed, 8 insertions(+), 6 deletions(-)

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V1 1/2] mmc: core: Check request type before completing the request
  2020-05-06 14:34 [PATCH V1 0/2] CQE fixes Veerabhadrarao Badiganti
@ 2020-05-06 14:34 ` Veerabhadrarao Badiganti
  2020-05-06 17:06   ` Adrian Hunter
  2020-05-08  8:12   ` Ulf Hansson
  2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
  1 sibling, 2 replies; 13+ messages in thread
From: Veerabhadrarao Badiganti @ 2020-05-06 14:34 UTC (permalink / raw)
  To: adrian.hunter, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm,
	Veerabhadrarao Badiganti, stable, Baolin Wang, Avri Altman,
	Chaotian Jing, Arnd Bergmann

In the request completion path with CQE, request type is being checked
after the request is getting completed. This is resulting in returning
the wrong request type and leading to the IO hang issue.

ASYNC request type is getting returned for DCMD type requests.
Because of this mismatch, mq->cqe_busy flag is never getting cleared
and the driver is not invoking blk_mq_hw_run_queue. So requests are not
getting dispatched to the LLD from the block layer.

All these eventually leading to IO hang issues.
So, get the request type before completing the request.

Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
---
 drivers/mmc/core/block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8499b56..c5367e2 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
 	struct mmc_request *mrq = &mqrq->brq.mrq;
 	struct request_queue *q = req->q;
 	struct mmc_host *host = mq->card->host;
+	enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
 	unsigned long flags;
 	bool put_card;
 	int err;
@@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
 
 	spin_lock_irqsave(&mq->lock, flags);
 
-	mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+	mq->in_flight[issue_type] -= 1;
 
 	put_card = (mmc_tot_in_flight(mq) == 0);
 
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path
  2020-05-06 14:34 [PATCH V1 0/2] CQE fixes Veerabhadrarao Badiganti
  2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti
@ 2020-05-06 14:34 ` Veerabhadrarao Badiganti
  2020-05-07 11:48   ` Adrian Hunter
  2020-05-07 16:15   ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
  1 sibling, 2 replies; 13+ messages in thread
From: Veerabhadrarao Badiganti @ 2020-05-06 14:34 UTC (permalink / raw)
  To: adrian.hunter, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg,
	stable, Baolin Wang, Yoshihiro Shimoda, Andreas Koop,
	Thomas Gleixner

From: Sarthak Garg <sartgarg@codeaurora.org>

Consider the following stack trace

-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out

mmc_mq_timed_out acquires the queue_lock for the first
time. The mmc_blk_cqe_complete_rq function also tries to acquire
the same queue lock resulting in recursive locking where the task
is spinning for the same lock which it has already acquired leading
to watchdog bark.

Fix this issue with the lock only for the required critical section.

Cc: <stable@vger.kernel.org> # v4.19+
Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
---
 drivers/mmc/core/queue.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..72bef39 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 	case MMC_ISSUE_DCMD:
 		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
 			if (recovery_needed)
-				__mmc_cqe_recovery_notifier(mq);
+				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
 		/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 
 	spin_lock_irqsave(&mq->lock, flags);
 
-	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
+	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
 		ret = BLK_EH_RESET_TIMER;
-	else
+		spin_unlock_irqrestore(&mq->lock, flags);
+	} else {
+		spin_unlock_irqrestore(&mq->lock, flags);
 		ret = mmc_cqe_timed_out(req);
-
-	spin_unlock_irqrestore(&mq->lock, flags);
+	}
 
 	return ret;
 }
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V1 1/2] mmc: core: Check request type before completing the request
  2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti
@ 2020-05-06 17:06   ` Adrian Hunter
  2020-05-08  8:12   ` Ulf Hansson
  1 sibling, 0 replies; 13+ messages in thread
From: Adrian Hunter @ 2020-05-06 17:06 UTC (permalink / raw)
  To: Veerabhadrarao Badiganti, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, stable,
	Baolin Wang, Avri Altman, Chaotian Jing, Arnd Bergmann

On 6/05/20 5:34 pm, Veerabhadrarao Badiganti wrote:
> In the request completion path with CQE, request type is being checked
> after the request is getting completed. This is resulting in returning
> the wrong request type and leading to the IO hang issue.
> 
> ASYNC request type is getting returned for DCMD type requests.
> Because of this mismatch, mq->cqe_busy flag is never getting cleared
> and the driver is not invoking blk_mq_hw_run_queue. So requests are not
> getting dispatched to the LLD from the block layer.
> 
> All these eventually leading to IO hang issues.
> So, get the request type before completing the request.
> 
> Cc: <stable@vger.kernel.org> # v4.19+

The fixed commit was in 4.16

> Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>

Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>

Thank you for finding this!

> ---
>  drivers/mmc/core/block.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 8499b56..c5367e2 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
>  	struct mmc_request *mrq = &mqrq->brq.mrq;
>  	struct request_queue *q = req->q;
>  	struct mmc_host *host = mq->card->host;
> +	enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
>  	unsigned long flags;
>  	bool put_card;
>  	int err;
> @@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
>  
>  	spin_lock_irqsave(&mq->lock, flags);
>  
> -	mq->in_flight[mmc_issue_type(mq, req)] -= 1;
> +	mq->in_flight[issue_type] -= 1;
>  
>  	put_card = (mmc_tot_in_flight(mq) == 0);
>  
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path
  2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
@ 2020-05-07 11:48   ` Adrian Hunter
  2020-05-07 14:06     ` [PATCH] mmc: block: Fix request completion in the CQE timeout path Adrian Hunter
  2020-05-07 16:15   ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
  1 sibling, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2020-05-07 11:48 UTC (permalink / raw)
  To: Veerabhadrarao Badiganti, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg,
	stable, Baolin Wang, Yoshihiro Shimoda, Andreas Koop,
	Thomas Gleixner

On 6/05/20 5:34 pm, Veerabhadrarao Badiganti wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first
> time. The mmc_blk_cqe_complete_rq function also tries to acquire
> the same queue lock resulting in recursive locking where the task
> is spinning for the same lock which it has already acquired leading
> to watchdog bark.
> 
> Fix this issue with the lock only for the required critical section.
> 
> Cc: <stable@vger.kernel.org> # v4.19+
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
> ---
>  drivers/mmc/core/queue.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..72bef39 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  
>  	spin_lock_irqsave(&mq->lock, flags);
>  
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> +	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
>  		ret = BLK_EH_RESET_TIMER;
> -	else
> +		spin_unlock_irqrestore(&mq->lock, flags);
> +	} else {
> +		spin_unlock_irqrestore(&mq->lock, flags);
>  		ret = mmc_cqe_timed_out(req);
> -
> -	spin_unlock_irqrestore(&mq->lock, flags);
> +	}

This looks good, but I think there needs to be another change also.  I will
send a patch for that, but in the meantime maybe you could straighten up the
code flow through the spinlock e.g.

	spin_lock_irqsave(&mq->lock, flags);
	ignore = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
	spin_unlock_irqrestore(&mq->lock, flags);

	return ignore ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);

And add a fixes tag.

>  
>  	return ret;
>  }
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] mmc: block: Fix request completion in the CQE timeout path
  2020-05-07 11:48   ` Adrian Hunter
@ 2020-05-07 14:06     ` Adrian Hunter
  2020-05-08  5:25       ` Ulf Hansson
  0 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2020-05-07 14:06 UTC (permalink / raw)
  To: Veerabhadrarao Badiganti, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg,
	stable, Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner,
	Christoph Hellwig

First, it should be noted that the CQE timeout (60 seconds) is substantial
so a CQE request that times out is really stuck, and the race between
timeout and completion is extremely unlikely. Nevertheless this patch
fixes an issue with it.

Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout")
preserved the existing functionality, to complete the request.
However that had only been necessary because the block layer
timeout handler had been marking the request to prevent it from being
completed normally. That restriction was removed at the same time, the
result being that a request that has gone will have been completed anyway.
That is, the completion in the timeout handler became unnecessary.

At the time, the unnecessary completion was harmless because the block
layer would ignore it, although that changed in kernel v5.0.

Note for stable, this patch will not apply cleanly without patch "mmc:
core: Fix recursive locking issue in CQE recovery path"

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
Cc: stable@vger.kernel.org
---


This is the patch I alluded to when replying to "mmc: core: Fix recursive
locking issue in CQE recovery path"


 drivers/mmc/core/queue.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 72bef39d7011..10ea67892b5f 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct
request *req)
 				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
-		/* No timeout (XXX: huh? comment doesn't make much sense) */
-		blk_mq_complete_request(req);
+		/* The request has gone already */
 		return BLK_EH_DONE;
 	default:
 		/* Timeout is handled by mmc core */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path
  2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
  2020-05-07 11:48   ` Adrian Hunter
@ 2020-05-07 16:15   ` Veerabhadrarao Badiganti
  2020-05-07 17:21     ` Adrian Hunter
  2020-05-08  8:12     ` Ulf Hansson
  1 sibling, 2 replies; 13+ messages in thread
From: Veerabhadrarao Badiganti @ 2020-05-07 16:15 UTC (permalink / raw)
  To: adrian.hunter, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg,
	stable, Yoshihiro Shimoda, Baolin Wang, Kate Stewart,
	Allison Randal, Andreas Koop, Thomas Gleixner, Linus Walleij

From: Sarthak Garg <sartgarg@codeaurora.org>

Consider the following stack trace

-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out

mmc_mq_timed_out acquires the queue_lock for the first
time. The mmc_blk_cqe_complete_rq function also tries to acquire
the same queue lock resulting in recursive locking where the task
is spinning for the same lock which it has already acquired leading
to watchdog bark.

Fix this issue with the lock only for the required critical section.

Cc: <stable@vger.kernel.org>
Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
---
 drivers/mmc/core/queue.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..b5fd3bc 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 	case MMC_ISSUE_DCMD:
 		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
 			if (recovery_needed)
-				__mmc_cqe_recovery_notifier(mq);
+				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
 		/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 	struct mmc_card *card = mq->card;
 	struct mmc_host *host = card->host;
 	unsigned long flags;
-	int ret;
+	bool ignore_tout;
 
 	spin_lock_irqsave(&mq->lock, flags);
-
-	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
-		ret = BLK_EH_RESET_TIMER;
-	else
-		ret = mmc_cqe_timed_out(req);
-
+	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
 	spin_unlock_irqrestore(&mq->lock, flags);
 
-	return ret;
+	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
 }
 
 static void mmc_mq_recovery_handler(struct work_struct *work)
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path
  2020-05-07 16:15   ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
@ 2020-05-07 17:21     ` Adrian Hunter
  2020-05-08  8:12     ` Ulf Hansson
  1 sibling, 0 replies; 13+ messages in thread
From: Adrian Hunter @ 2020-05-07 17:21 UTC (permalink / raw)
  To: Veerabhadrarao Badiganti, ulf.hansson
  Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg,
	stable, Yoshihiro Shimoda, Baolin Wang, Kate Stewart,
	Allison Randal, Thomas Gleixner, Linus Walleij

On 7/05/20 7:15 pm, Veerabhadrarao Badiganti wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first
> time. The mmc_blk_cqe_complete_rq function also tries to acquire
> the same queue lock resulting in recursive locking where the task
> is spinning for the same lock which it has already acquired leading
> to watchdog bark.
> 
> Fix this issue with the lock only for the required critical section.
> 
> Cc: <stable@vger.kernel.org>
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  drivers/mmc/core/queue.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..b5fd3bc 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  	struct mmc_card *card = mq->card;
>  	struct mmc_host *host = card->host;
>  	unsigned long flags;
> -	int ret;
> +	bool ignore_tout;
>  
>  	spin_lock_irqsave(&mq->lock, flags);
> -
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> -		ret = BLK_EH_RESET_TIMER;
> -	else
> -		ret = mmc_cqe_timed_out(req);
> -
> +	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
>  	spin_unlock_irqrestore(&mq->lock, flags);
>  
> -	return ret;
> +	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
>  }
>  
>  static void mmc_mq_recovery_handler(struct work_struct *work)
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mmc: block: Fix request completion in the CQE timeout path
  2020-05-07 14:06     ` [PATCH] mmc: block: Fix request completion in the CQE timeout path Adrian Hunter
@ 2020-05-08  5:25       ` Ulf Hansson
  2020-05-08  6:22         ` [PATCH RESEND] " Adrian Hunter
  0 siblings, 1 reply; 13+ messages in thread
From: Ulf Hansson @ 2020-05-08  5:25 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Veerabhadrarao Badiganti, Sahitya Tummala, linux-mmc,
	Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg, # 4.0+,
	Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner,
	Christoph Hellwig

On Thu, 7 May 2020 at 16:06, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> First, it should be noted that the CQE timeout (60 seconds) is substantial
> so a CQE request that times out is really stuck, and the race between
> timeout and completion is extremely unlikely. Nevertheless this patch
> fixes an issue with it.
>
> Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout")
> preserved the existing functionality, to complete the request.
> However that had only been necessary because the block layer
> timeout handler had been marking the request to prevent it from being
> completed normally. That restriction was removed at the same time, the
> result being that a request that has gone will have been completed anyway.
> That is, the completion in the timeout handler became unnecessary.
>
> At the time, the unnecessary completion was harmless because the block
> layer would ignore it, although that changed in kernel v5.0.
>
> Note for stable, this patch will not apply cleanly without patch "mmc:
> core: Fix recursive locking issue in CQE recovery path"
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
> Cc: stable@vger.kernel.org
> ---
>
>
> This is the patch I alluded to when replying to "mmc: core: Fix recursive
> locking issue in CQE recovery path"

Looks like the patch got corrupted, I was trying to fix it, but just
couldn't figure it out.

Can you please re-format and do a repost?

Kind regards
Uffe

>
>
>  drivers/mmc/core/queue.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 72bef39d7011..10ea67892b5f 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct
> request *req)
>                                 mmc_cqe_recovery_notifier(mrq);
>                         return BLK_EH_RESET_TIMER;
>                 }
> -               /* No timeout (XXX: huh? comment doesn't make much sense) */
> -               blk_mq_complete_request(req);
> +               /* The request has gone already */
>                 return BLK_EH_DONE;
>         default:
>                 /* Timeout is handled by mmc core */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RESEND] mmc: block: Fix request completion in the CQE timeout path
  2020-05-08  5:25       ` Ulf Hansson
@ 2020-05-08  6:22         ` Adrian Hunter
  2020-05-08  8:17           ` Ulf Hansson
  0 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2020-05-08  6:22 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-mmc, Veerabhadrarao Badiganti, Sahitya Tummala,
	Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg,
	Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner,
	Christoph Hellwig

First, it should be noted that the CQE timeout (60 seconds) is substantial
so a CQE request that times out is really stuck, and the race between
timeout and completion is extremely unlikely. Nevertheless this patch
fixes an issue with it.

Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout")
preserved the existing functionality, to complete the request.
However that had only been necessary because the block layer
timeout handler had been marking the request to prevent it from being
completed normally. That restriction was removed at the same time, the
result being that a request that has gone will have been completed anyway.
That is, the completion was unnecessary.

At the time, the unnecessary completion was harmless because the block
layer would ignore it, although that changed in kernel v5.0.

Note for stable, this patch will not apply cleanly without patch "mmc:
core: Fix recursive locking issue in CQE recovery path"

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
Cc: stable@vger.kernel.org
---
 drivers/mmc/core/queue.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 72bef39d7011..10ea67892b5f 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
-		/* No timeout (XXX: huh? comment doesn't make much sense) */
-		blk_mq_complete_request(req);
+		/* The request has gone already */
 		return BLK_EH_DONE;
 	default:
 		/* Timeout is handled by mmc core */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V1 1/2] mmc: core: Check request type before completing the request
  2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti
  2020-05-06 17:06   ` Adrian Hunter
@ 2020-05-08  8:12   ` Ulf Hansson
  1 sibling, 0 replies; 13+ messages in thread
From: Ulf Hansson @ 2020-05-08  8:12 UTC (permalink / raw)
  To: Veerabhadrarao Badiganti
  Cc: Adrian Hunter, Sahitya Tummala, linux-mmc,
	Linux Kernel Mailing List, linux-arm-msm, # 4.0+,
	Baolin Wang, Avri Altman, Chaotian Jing, Arnd Bergmann

On Wed, 6 May 2020 at 16:34, Veerabhadrarao Badiganti
<vbadigan@codeaurora.org> wrote:
>
> In the request completion path with CQE, request type is being checked
> after the request is getting completed. This is resulting in returning
> the wrong request type and leading to the IO hang issue.
>
> ASYNC request type is getting returned for DCMD type requests.
> Because of this mismatch, mq->cqe_busy flag is never getting cleared
> and the driver is not invoking blk_mq_hw_run_queue. So requests are not
> getting dispatched to the LLD from the block layer.
>
> All these eventually leading to IO hang issues.
> So, get the request type before completing the request.
>
> Cc: <stable@vger.kernel.org> # v4.19+
> Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>

Applied for fixes, and by updating the tags that were provided by
Adrian, thanks!

Kind regards
Uffe


> ---
>  drivers/mmc/core/block.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 8499b56..c5367e2 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
>         struct mmc_request *mrq = &mqrq->brq.mrq;
>         struct request_queue *q = req->q;
>         struct mmc_host *host = mq->card->host;
> +       enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
>         unsigned long flags;
>         bool put_card;
>         int err;
> @@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
>
>         spin_lock_irqsave(&mq->lock, flags);
>
> -       mq->in_flight[mmc_issue_type(mq, req)] -= 1;
> +       mq->in_flight[issue_type] -= 1;
>
>         put_card = (mmc_tot_in_flight(mq) == 0);
>
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path
  2020-05-07 16:15   ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
  2020-05-07 17:21     ` Adrian Hunter
@ 2020-05-08  8:12     ` Ulf Hansson
  1 sibling, 0 replies; 13+ messages in thread
From: Ulf Hansson @ 2020-05-08  8:12 UTC (permalink / raw)
  To: Veerabhadrarao Badiganti
  Cc: Adrian Hunter, Sahitya Tummala, linux-mmc,
	Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg, # 4.0+,
	Yoshihiro Shimoda, Baolin Wang, Kate Stewart, Allison Randal,
	Andreas Koop, Thomas Gleixner, Linus Walleij

On Thu, 7 May 2020 at 18:15, Veerabhadrarao Badiganti
<vbadigan@codeaurora.org> wrote:
>
> From: Sarthak Garg <sartgarg@codeaurora.org>
>
> Consider the following stack trace
>
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
>
> mmc_mq_timed_out acquires the queue_lock for the first
> time. The mmc_blk_cqe_complete_rq function also tries to acquire
> the same queue lock resulting in recursive locking where the task
> is spinning for the same lock which it has already acquired leading
> to watchdog bark.
>
> Fix this issue with the lock only for the required critical section.
>
> Cc: <stable@vger.kernel.org>
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>

Applied for fixes, thanks!

Kind regards
Uffe


> ---
>  drivers/mmc/core/queue.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..b5fd3bc 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>         case MMC_ISSUE_DCMD:
>                 if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>                         if (recovery_needed)
> -                               __mmc_cqe_recovery_notifier(mq);
> +                               mmc_cqe_recovery_notifier(mrq);
>                         return BLK_EH_RESET_TIMER;
>                 }
>                 /* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>         struct mmc_card *card = mq->card;
>         struct mmc_host *host = card->host;
>         unsigned long flags;
> -       int ret;
> +       bool ignore_tout;
>
>         spin_lock_irqsave(&mq->lock, flags);
> -
> -       if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> -               ret = BLK_EH_RESET_TIMER;
> -       else
> -               ret = mmc_cqe_timed_out(req);
> -
> +       ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
>         spin_unlock_irqrestore(&mq->lock, flags);
>
> -       return ret;
> +       return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
>  }
>
>  static void mmc_mq_recovery_handler(struct work_struct *work)
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RESEND] mmc: block: Fix request completion in the CQE timeout path
  2020-05-08  6:22         ` [PATCH RESEND] " Adrian Hunter
@ 2020-05-08  8:17           ` Ulf Hansson
  0 siblings, 0 replies; 13+ messages in thread
From: Ulf Hansson @ 2020-05-08  8:17 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-mmc, Veerabhadrarao Badiganti, Sahitya Tummala,
	Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg,
	Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner,
	Christoph Hellwig

On Fri, 8 May 2020 at 08:22, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> First, it should be noted that the CQE timeout (60 seconds) is substantial
> so a CQE request that times out is really stuck, and the race between
> timeout and completion is extremely unlikely. Nevertheless this patch
> fixes an issue with it.
>
> Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout")
> preserved the existing functionality, to complete the request.
> However that had only been necessary because the block layer
> timeout handler had been marking the request to prevent it from being
> completed normally. That restriction was removed at the same time, the
> result being that a request that has gone will have been completed anyway.
> That is, the completion was unnecessary.
>
> At the time, the unnecessary completion was harmless because the block
> layer would ignore it, although that changed in kernel v5.0.
>
> Note for stable, this patch will not apply cleanly without patch "mmc:
> core: Fix recursive locking issue in CQE recovery path"
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
> Cc: stable@vger.kernel.org

Applied for fixes, thanks!

Kind regards
Uffe

> ---
>  drivers/mmc/core/queue.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 72bef39d7011..10ea67892b5f 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>                                 mmc_cqe_recovery_notifier(mrq);
>                         return BLK_EH_RESET_TIMER;
>                 }
> -               /* No timeout (XXX: huh? comment doesn't make much sense) */
> -               blk_mq_complete_request(req);
> +               /* The request has gone already */
>                 return BLK_EH_DONE;
>         default:
>                 /* Timeout is handled by mmc core */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-05-08  8:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-06 14:34 [PATCH V1 0/2] CQE fixes Veerabhadrarao Badiganti
2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti
2020-05-06 17:06   ` Adrian Hunter
2020-05-08  8:12   ` Ulf Hansson
2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
2020-05-07 11:48   ` Adrian Hunter
2020-05-07 14:06     ` [PATCH] mmc: block: Fix request completion in the CQE timeout path Adrian Hunter
2020-05-08  5:25       ` Ulf Hansson
2020-05-08  6:22         ` [PATCH RESEND] " Adrian Hunter
2020-05-08  8:17           ` Ulf Hansson
2020-05-07 16:15   ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti
2020-05-07 17:21     ` Adrian Hunter
2020-05-08  8:12     ` Ulf Hansson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).