* [PATCH V1 0/2] CQE fixes @ 2020-05-06 14:34 Veerabhadrarao Badiganti 2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti 2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 0 siblings, 2 replies; 13+ messages in thread From: Veerabhadrarao Badiganti @ 2020-05-06 14:34 UTC (permalink / raw) To: adrian.hunter, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Veerabhadrarao Badiganti Fixes for a couple of issues observed with CQE. One with CQE completion path and the other one is with CQE recovery path. Sarthak Garg (1): mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti (1): mmc: core: Check request type before completing the request drivers/mmc/core/block.c | 3 ++- drivers/mmc/core/queue.c | 11 ++++++----- 2 files changed, 8 insertions(+), 6 deletions(-) -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH V1 1/2] mmc: core: Check request type before completing the request 2020-05-06 14:34 [PATCH V1 0/2] CQE fixes Veerabhadrarao Badiganti @ 2020-05-06 14:34 ` Veerabhadrarao Badiganti 2020-05-06 17:06 ` Adrian Hunter 2020-05-08 8:12 ` Ulf Hansson 2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 1 sibling, 2 replies; 13+ messages in thread From: Veerabhadrarao Badiganti @ 2020-05-06 14:34 UTC (permalink / raw) To: adrian.hunter, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Veerabhadrarao Badiganti, stable, Baolin Wang, Avri Altman, Chaotian Jing, Arnd Bergmann In the request completion path with CQE, request type is being checked after the request is getting completed. This is resulting in returning the wrong request type and leading to the IO hang issue. ASYNC request type is getting returned for DCMD type requests. Because of this mismatch, mq->cqe_busy flag is never getting cleared and the driver is not invoking blk_mq_hw_run_queue. So requests are not getting dispatched to the LLD from the block layer. All these eventually leading to IO hang issues. So, get the request type before completing the request. Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> --- drivers/mmc/core/block.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 8499b56..c5367e2 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) struct mmc_request *mrq = &mqrq->brq.mrq; struct request_queue *q = req->q; struct mmc_host *host = mq->card->host; + enum mmc_issue_type issue_type = mmc_issue_type(mq, req); unsigned long flags; bool put_card; int err; @@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) spin_lock_irqsave(&mq->lock, flags); - mq->in_flight[mmc_issue_type(mq, req)] -= 1; + mq->in_flight[issue_type] -= 1; put_card = (mmc_tot_in_flight(mq) == 0); -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH V1 1/2] mmc: core: Check request type before completing the request 2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti @ 2020-05-06 17:06 ` Adrian Hunter 2020-05-08 8:12 ` Ulf Hansson 1 sibling, 0 replies; 13+ messages in thread From: Adrian Hunter @ 2020-05-06 17:06 UTC (permalink / raw) To: Veerabhadrarao Badiganti, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, stable, Baolin Wang, Avri Altman, Chaotian Jing, Arnd Bergmann On 6/05/20 5:34 pm, Veerabhadrarao Badiganti wrote: > In the request completion path with CQE, request type is being checked > after the request is getting completed. This is resulting in returning > the wrong request type and leading to the IO hang issue. > > ASYNC request type is getting returned for DCMD type requests. > Because of this mismatch, mq->cqe_busy flag is never getting cleared > and the driver is not invoking blk_mq_hw_run_queue. So requests are not > getting dispatched to the LLD from the block layer. > > All these eventually leading to IO hang issues. > So, get the request type before completing the request. > > Cc: <stable@vger.kernel.org> # v4.19+ The fixed commit was in 4.16 > Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support") Acked-by: Adrian Hunter <adrian.hunter@intel.com> Thank you for finding this! > --- > drivers/mmc/core/block.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > index 8499b56..c5367e2 100644 > --- a/drivers/mmc/core/block.c > +++ b/drivers/mmc/core/block.c > @@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) > struct mmc_request *mrq = &mqrq->brq.mrq; > struct request_queue *q = req->q; > struct mmc_host *host = mq->card->host; > + enum mmc_issue_type issue_type = mmc_issue_type(mq, req); > unsigned long flags; > bool put_card; > int err; > @@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) > > spin_lock_irqsave(&mq->lock, flags); > > - mq->in_flight[mmc_issue_type(mq, req)] -= 1; > + mq->in_flight[issue_type] -= 1; > > put_card = (mmc_tot_in_flight(mq) == 0); > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V1 1/2] mmc: core: Check request type before completing the request 2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti 2020-05-06 17:06 ` Adrian Hunter @ 2020-05-08 8:12 ` Ulf Hansson 1 sibling, 0 replies; 13+ messages in thread From: Ulf Hansson @ 2020-05-08 8:12 UTC (permalink / raw) To: Veerabhadrarao Badiganti Cc: Adrian Hunter, Sahitya Tummala, linux-mmc, Linux Kernel Mailing List, linux-arm-msm, # 4.0+, Baolin Wang, Avri Altman, Chaotian Jing, Arnd Bergmann On Wed, 6 May 2020 at 16:34, Veerabhadrarao Badiganti <vbadigan@codeaurora.org> wrote: > > In the request completion path with CQE, request type is being checked > after the request is getting completed. This is resulting in returning > the wrong request type and leading to the IO hang issue. > > ASYNC request type is getting returned for DCMD type requests. > Because of this mismatch, mq->cqe_busy flag is never getting cleared > and the driver is not invoking blk_mq_hw_run_queue. So requests are not > getting dispatched to the LLD from the block layer. > > All these eventually leading to IO hang issues. > So, get the request type before completing the request. > > Cc: <stable@vger.kernel.org> # v4.19+ > Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Applied for fixes, and by updating the tags that were provided by Adrian, thanks! Kind regards Uffe > --- > drivers/mmc/core/block.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > index 8499b56..c5367e2 100644 > --- a/drivers/mmc/core/block.c > +++ b/drivers/mmc/core/block.c > @@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) > struct mmc_request *mrq = &mqrq->brq.mrq; > struct request_queue *q = req->q; > struct mmc_host *host = mq->card->host; > + enum mmc_issue_type issue_type = mmc_issue_type(mq, req); > unsigned long flags; > bool put_card; > int err; > @@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) > > spin_lock_irqsave(&mq->lock, flags); > > - mq->in_flight[mmc_issue_type(mq, req)] -= 1; > + mq->in_flight[issue_type] -= 1; > > put_card = (mmc_tot_in_flight(mq) == 0); > > -- > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path 2020-05-06 14:34 [PATCH V1 0/2] CQE fixes Veerabhadrarao Badiganti 2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti @ 2020-05-06 14:34 ` Veerabhadrarao Badiganti 2020-05-07 11:48 ` Adrian Hunter 2020-05-07 16:15 ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 1 sibling, 2 replies; 13+ messages in thread From: Veerabhadrarao Badiganti @ 2020-05-06 14:34 UTC (permalink / raw) To: adrian.hunter, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg, stable, Baolin Wang, Yoshihiro Shimoda, Andreas Koop, Thomas Gleixner From: Sarthak Garg <sartgarg@codeaurora.org> Consider the following stack trace -001|raw_spin_lock_irqsave -002|mmc_blk_cqe_complete_rq -003|__blk_mq_complete_request(inline) -003|blk_mq_complete_request(rq) -004|mmc_cqe_timed_out(inline) -004|mmc_mq_timed_out mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function also tries to acquire the same queue lock resulting in recursive locking where the task is spinning for the same lock which it has already acquired leading to watchdog bark. Fix this issue with the lock only for the required critical section. Cc: <stable@vger.kernel.org> # v4.19+ Suggested-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org> --- drivers/mmc/core/queue.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 25bee3d..72bef39 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) case MMC_ISSUE_DCMD: if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) { if (recovery_needed) - __mmc_cqe_recovery_notifier(mq); + mmc_cqe_recovery_notifier(mrq); return BLK_EH_RESET_TIMER; } /* No timeout (XXX: huh? comment doesn't make much sense) */ @@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req, spin_lock_irqsave(&mq->lock, flags); - if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) + if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) { ret = BLK_EH_RESET_TIMER; - else + spin_unlock_irqrestore(&mq->lock, flags); + } else { + spin_unlock_irqrestore(&mq->lock, flags); ret = mmc_cqe_timed_out(req); - - spin_unlock_irqrestore(&mq->lock, flags); + } return ret; } -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path 2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti @ 2020-05-07 11:48 ` Adrian Hunter 2020-05-07 14:06 ` [PATCH] mmc: block: Fix request completion in the CQE timeout path Adrian Hunter 2020-05-07 16:15 ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 1 sibling, 1 reply; 13+ messages in thread From: Adrian Hunter @ 2020-05-07 11:48 UTC (permalink / raw) To: Veerabhadrarao Badiganti, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg, stable, Baolin Wang, Yoshihiro Shimoda, Andreas Koop, Thomas Gleixner On 6/05/20 5:34 pm, Veerabhadrarao Badiganti wrote: > From: Sarthak Garg <sartgarg@codeaurora.org> > > Consider the following stack trace > > -001|raw_spin_lock_irqsave > -002|mmc_blk_cqe_complete_rq > -003|__blk_mq_complete_request(inline) > -003|blk_mq_complete_request(rq) > -004|mmc_cqe_timed_out(inline) > -004|mmc_mq_timed_out > > mmc_mq_timed_out acquires the queue_lock for the first > time. The mmc_blk_cqe_complete_rq function also tries to acquire > the same queue lock resulting in recursive locking where the task > is spinning for the same lock which it has already acquired leading > to watchdog bark. > > Fix this issue with the lock only for the required critical section. > > Cc: <stable@vger.kernel.org> # v4.19+ > Suggested-by: Sahitya Tummala <stummala@codeaurora.org> > Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org> > --- > drivers/mmc/core/queue.c | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c > index 25bee3d..72bef39 100644 > --- a/drivers/mmc/core/queue.c > +++ b/drivers/mmc/core/queue.c > @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) > case MMC_ISSUE_DCMD: > if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) { > if (recovery_needed) > - __mmc_cqe_recovery_notifier(mq); > + mmc_cqe_recovery_notifier(mrq); > return BLK_EH_RESET_TIMER; > } > /* No timeout (XXX: huh? comment doesn't make much sense) */ > @@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req, > > spin_lock_irqsave(&mq->lock, flags); > > - if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) > + if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) { > ret = BLK_EH_RESET_TIMER; > - else > + spin_unlock_irqrestore(&mq->lock, flags); > + } else { > + spin_unlock_irqrestore(&mq->lock, flags); > ret = mmc_cqe_timed_out(req); > - > - spin_unlock_irqrestore(&mq->lock, flags); > + } This looks good, but I think there needs to be another change also. I will send a patch for that, but in the meantime maybe you could straighten up the code flow through the spinlock e.g. spin_lock_irqsave(&mq->lock, flags); ignore = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled; spin_unlock_irqrestore(&mq->lock, flags); return ignore ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req); And add a fixes tag. > > return ret; > } > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] mmc: block: Fix request completion in the CQE timeout path 2020-05-07 11:48 ` Adrian Hunter @ 2020-05-07 14:06 ` Adrian Hunter 2020-05-08 5:25 ` Ulf Hansson 0 siblings, 1 reply; 13+ messages in thread From: Adrian Hunter @ 2020-05-07 14:06 UTC (permalink / raw) To: Veerabhadrarao Badiganti, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg, stable, Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner, Christoph Hellwig First, it should be noted that the CQE timeout (60 seconds) is substantial so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it. Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality, to complete the request. However that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, the result being that a request that has gone will have been completed anyway. That is, the completion in the timeout handler became unnecessary. At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0. Note for stable, this patch will not apply cleanly without patch "mmc: core: Fix recursive locking issue in CQE recovery path" Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout") Cc: stable@vger.kernel.org --- This is the patch I alluded to when replying to "mmc: core: Fix recursive locking issue in CQE recovery path" drivers/mmc/core/queue.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 72bef39d7011..10ea67892b5f 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) mmc_cqe_recovery_notifier(mrq); return BLK_EH_RESET_TIMER; } - /* No timeout (XXX: huh? comment doesn't make much sense) */ - blk_mq_complete_request(req); + /* The request has gone already */ return BLK_EH_DONE; default: /* Timeout is handled by mmc core */ -- 2.17.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mmc: block: Fix request completion in the CQE timeout path 2020-05-07 14:06 ` [PATCH] mmc: block: Fix request completion in the CQE timeout path Adrian Hunter @ 2020-05-08 5:25 ` Ulf Hansson 2020-05-08 6:22 ` [PATCH RESEND] " Adrian Hunter 0 siblings, 1 reply; 13+ messages in thread From: Ulf Hansson @ 2020-05-08 5:25 UTC (permalink / raw) To: Adrian Hunter Cc: Veerabhadrarao Badiganti, Sahitya Tummala, linux-mmc, Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg, # 4.0+, Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner, Christoph Hellwig On Thu, 7 May 2020 at 16:06, Adrian Hunter <adrian.hunter@intel.com> wrote: > > First, it should be noted that the CQE timeout (60 seconds) is substantial > so a CQE request that times out is really stuck, and the race between > timeout and completion is extremely unlikely. Nevertheless this patch > fixes an issue with it. > > Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") > preserved the existing functionality, to complete the request. > However that had only been necessary because the block layer > timeout handler had been marking the request to prevent it from being > completed normally. That restriction was removed at the same time, the > result being that a request that has gone will have been completed anyway. > That is, the completion in the timeout handler became unnecessary. > > At the time, the unnecessary completion was harmless because the block > layer would ignore it, although that changed in kernel v5.0. > > Note for stable, this patch will not apply cleanly without patch "mmc: > core: Fix recursive locking issue in CQE recovery path" > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout") > Cc: stable@vger.kernel.org > --- > > > This is the patch I alluded to when replying to "mmc: core: Fix recursive > locking issue in CQE recovery path" Looks like the patch got corrupted, I was trying to fix it, but just couldn't figure it out. Can you please re-format and do a repost? Kind regards Uffe > > > drivers/mmc/core/queue.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c > index 72bef39d7011..10ea67892b5f 100644 > --- a/drivers/mmc/core/queue.c > +++ b/drivers/mmc/core/queue.c > @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct > request *req) > mmc_cqe_recovery_notifier(mrq); > return BLK_EH_RESET_TIMER; > } > - /* No timeout (XXX: huh? comment doesn't make much sense) */ > - blk_mq_complete_request(req); > + /* The request has gone already */ > return BLK_EH_DONE; > default: > /* Timeout is handled by mmc core */ > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RESEND] mmc: block: Fix request completion in the CQE timeout path 2020-05-08 5:25 ` Ulf Hansson @ 2020-05-08 6:22 ` Adrian Hunter 2020-05-08 8:17 ` Ulf Hansson 0 siblings, 1 reply; 13+ messages in thread From: Adrian Hunter @ 2020-05-08 6:22 UTC (permalink / raw) To: Ulf Hansson Cc: linux-mmc, Veerabhadrarao Badiganti, Sahitya Tummala, Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg, Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner, Christoph Hellwig First, it should be noted that the CQE timeout (60 seconds) is substantial so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it. Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality, to complete the request. However that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, the result being that a request that has gone will have been completed anyway. That is, the completion was unnecessary. At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0. Note for stable, this patch will not apply cleanly without patch "mmc: core: Fix recursive locking issue in CQE recovery path" Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout") Cc: stable@vger.kernel.org --- drivers/mmc/core/queue.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 72bef39d7011..10ea67892b5f 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) mmc_cqe_recovery_notifier(mrq); return BLK_EH_RESET_TIMER; } - /* No timeout (XXX: huh? comment doesn't make much sense) */ - blk_mq_complete_request(req); + /* The request has gone already */ return BLK_EH_DONE; default: /* Timeout is handled by mmc core */ -- 2.17.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH RESEND] mmc: block: Fix request completion in the CQE timeout path 2020-05-08 6:22 ` [PATCH RESEND] " Adrian Hunter @ 2020-05-08 8:17 ` Ulf Hansson 0 siblings, 0 replies; 13+ messages in thread From: Ulf Hansson @ 2020-05-08 8:17 UTC (permalink / raw) To: Adrian Hunter Cc: linux-mmc, Veerabhadrarao Badiganti, Sahitya Tummala, Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg, Baolin Wang, Yoshihiro Shimoda, Thomas Gleixner, Christoph Hellwig On Fri, 8 May 2020 at 08:22, Adrian Hunter <adrian.hunter@intel.com> wrote: > > First, it should be noted that the CQE timeout (60 seconds) is substantial > so a CQE request that times out is really stuck, and the race between > timeout and completion is extremely unlikely. Nevertheless this patch > fixes an issue with it. > > Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") > preserved the existing functionality, to complete the request. > However that had only been necessary because the block layer > timeout handler had been marking the request to prevent it from being > completed normally. That restriction was removed at the same time, the > result being that a request that has gone will have been completed anyway. > That is, the completion was unnecessary. > > At the time, the unnecessary completion was harmless because the block > layer would ignore it, although that changed in kernel v5.0. > > Note for stable, this patch will not apply cleanly without patch "mmc: > core: Fix recursive locking issue in CQE recovery path" > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout") > Cc: stable@vger.kernel.org Applied for fixes, thanks! Kind regards Uffe > --- > drivers/mmc/core/queue.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c > index 72bef39d7011..10ea67892b5f 100644 > --- a/drivers/mmc/core/queue.c > +++ b/drivers/mmc/core/queue.c > @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) > mmc_cqe_recovery_notifier(mrq); > return BLK_EH_RESET_TIMER; > } > - /* No timeout (XXX: huh? comment doesn't make much sense) */ > - blk_mq_complete_request(req); > + /* The request has gone already */ > return BLK_EH_DONE; > default: > /* Timeout is handled by mmc core */ > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path 2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 2020-05-07 11:48 ` Adrian Hunter @ 2020-05-07 16:15 ` Veerabhadrarao Badiganti 2020-05-07 17:21 ` Adrian Hunter 2020-05-08 8:12 ` Ulf Hansson 1 sibling, 2 replies; 13+ messages in thread From: Veerabhadrarao Badiganti @ 2020-05-07 16:15 UTC (permalink / raw) To: adrian.hunter, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg, stable, Yoshihiro Shimoda, Baolin Wang, Kate Stewart, Allison Randal, Andreas Koop, Thomas Gleixner, Linus Walleij From: Sarthak Garg <sartgarg@codeaurora.org> Consider the following stack trace -001|raw_spin_lock_irqsave -002|mmc_blk_cqe_complete_rq -003|__blk_mq_complete_request(inline) -003|blk_mq_complete_request(rq) -004|mmc_cqe_timed_out(inline) -004|mmc_mq_timed_out mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function also tries to acquire the same queue lock resulting in recursive locking where the task is spinning for the same lock which it has already acquired leading to watchdog bark. Fix this issue with the lock only for the required critical section. Cc: <stable@vger.kernel.org> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support") Suggested-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org> --- drivers/mmc/core/queue.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 25bee3d..b5fd3bc 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) case MMC_ISSUE_DCMD: if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) { if (recovery_needed) - __mmc_cqe_recovery_notifier(mq); + mmc_cqe_recovery_notifier(mrq); return BLK_EH_RESET_TIMER; } /* No timeout (XXX: huh? comment doesn't make much sense) */ @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req, struct mmc_card *card = mq->card; struct mmc_host *host = card->host; unsigned long flags; - int ret; + bool ignore_tout; spin_lock_irqsave(&mq->lock, flags); - - if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) - ret = BLK_EH_RESET_TIMER; - else - ret = mmc_cqe_timed_out(req); - + ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled; spin_unlock_irqrestore(&mq->lock, flags); - return ret; + return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req); } static void mmc_mq_recovery_handler(struct work_struct *work) -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path 2020-05-07 16:15 ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti @ 2020-05-07 17:21 ` Adrian Hunter 2020-05-08 8:12 ` Ulf Hansson 1 sibling, 0 replies; 13+ messages in thread From: Adrian Hunter @ 2020-05-07 17:21 UTC (permalink / raw) To: Veerabhadrarao Badiganti, ulf.hansson Cc: stummala, linux-mmc, linux-kernel, linux-arm-msm, Sarthak Garg, stable, Yoshihiro Shimoda, Baolin Wang, Kate Stewart, Allison Randal, Thomas Gleixner, Linus Walleij On 7/05/20 7:15 pm, Veerabhadrarao Badiganti wrote: > From: Sarthak Garg <sartgarg@codeaurora.org> > > Consider the following stack trace > > -001|raw_spin_lock_irqsave > -002|mmc_blk_cqe_complete_rq > -003|__blk_mq_complete_request(inline) > -003|blk_mq_complete_request(rq) > -004|mmc_cqe_timed_out(inline) > -004|mmc_mq_timed_out > > mmc_mq_timed_out acquires the queue_lock for the first > time. The mmc_blk_cqe_complete_rq function also tries to acquire > the same queue lock resulting in recursive locking where the task > is spinning for the same lock which it has already acquired leading > to watchdog bark. > > Fix this issue with the lock only for the required critical section. > > Cc: <stable@vger.kernel.org> > Fixes: 1e8e55b67030 ("mmc: block: Add CQE support") > Suggested-by: Sahitya Tummala <stummala@codeaurora.org> > Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> > --- > drivers/mmc/core/queue.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c > index 25bee3d..b5fd3bc 100644 > --- a/drivers/mmc/core/queue.c > +++ b/drivers/mmc/core/queue.c > @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) > case MMC_ISSUE_DCMD: > if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) { > if (recovery_needed) > - __mmc_cqe_recovery_notifier(mq); > + mmc_cqe_recovery_notifier(mrq); > return BLK_EH_RESET_TIMER; > } > /* No timeout (XXX: huh? comment doesn't make much sense) */ > @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req, > struct mmc_card *card = mq->card; > struct mmc_host *host = card->host; > unsigned long flags; > - int ret; > + bool ignore_tout; > > spin_lock_irqsave(&mq->lock, flags); > - > - if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) > - ret = BLK_EH_RESET_TIMER; > - else > - ret = mmc_cqe_timed_out(req); > - > + ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled; > spin_unlock_irqrestore(&mq->lock, flags); > > - return ret; > + return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req); > } > > static void mmc_mq_recovery_handler(struct work_struct *work) > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path 2020-05-07 16:15 ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 2020-05-07 17:21 ` Adrian Hunter @ 2020-05-08 8:12 ` Ulf Hansson 1 sibling, 0 replies; 13+ messages in thread From: Ulf Hansson @ 2020-05-08 8:12 UTC (permalink / raw) To: Veerabhadrarao Badiganti Cc: Adrian Hunter, Sahitya Tummala, linux-mmc, Linux Kernel Mailing List, linux-arm-msm, Sarthak Garg, # 4.0+, Yoshihiro Shimoda, Baolin Wang, Kate Stewart, Allison Randal, Andreas Koop, Thomas Gleixner, Linus Walleij On Thu, 7 May 2020 at 18:15, Veerabhadrarao Badiganti <vbadigan@codeaurora.org> wrote: > > From: Sarthak Garg <sartgarg@codeaurora.org> > > Consider the following stack trace > > -001|raw_spin_lock_irqsave > -002|mmc_blk_cqe_complete_rq > -003|__blk_mq_complete_request(inline) > -003|blk_mq_complete_request(rq) > -004|mmc_cqe_timed_out(inline) > -004|mmc_mq_timed_out > > mmc_mq_timed_out acquires the queue_lock for the first > time. The mmc_blk_cqe_complete_rq function also tries to acquire > the same queue lock resulting in recursive locking where the task > is spinning for the same lock which it has already acquired leading > to watchdog bark. > > Fix this issue with the lock only for the required critical section. > > Cc: <stable@vger.kernel.org> > Fixes: 1e8e55b67030 ("mmc: block: Add CQE support") > Suggested-by: Sahitya Tummala <stummala@codeaurora.org> > Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org> Applied for fixes, thanks! Kind regards Uffe > --- > drivers/mmc/core/queue.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c > index 25bee3d..b5fd3bc 100644 > --- a/drivers/mmc/core/queue.c > +++ b/drivers/mmc/core/queue.c > @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req) > case MMC_ISSUE_DCMD: > if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) { > if (recovery_needed) > - __mmc_cqe_recovery_notifier(mq); > + mmc_cqe_recovery_notifier(mrq); > return BLK_EH_RESET_TIMER; > } > /* No timeout (XXX: huh? comment doesn't make much sense) */ > @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req, > struct mmc_card *card = mq->card; > struct mmc_host *host = card->host; > unsigned long flags; > - int ret; > + bool ignore_tout; > > spin_lock_irqsave(&mq->lock, flags); > - > - if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) > - ret = BLK_EH_RESET_TIMER; > - else > - ret = mmc_cqe_timed_out(req); > - > + ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled; > spin_unlock_irqrestore(&mq->lock, flags); > > - return ret; > + return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req); > } > > static void mmc_mq_recovery_handler(struct work_struct *work) > -- > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-05-08 8:18 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-06 14:34 [PATCH V1 0/2] CQE fixes Veerabhadrarao Badiganti 2020-05-06 14:34 ` [PATCH V1 1/2] mmc: core: Check request type before completing the request Veerabhadrarao Badiganti 2020-05-06 17:06 ` Adrian Hunter 2020-05-08 8:12 ` Ulf Hansson 2020-05-06 14:34 ` [PATCH V1 2/2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 2020-05-07 11:48 ` Adrian Hunter 2020-05-07 14:06 ` [PATCH] mmc: block: Fix request completion in the CQE timeout path Adrian Hunter 2020-05-08 5:25 ` Ulf Hansson 2020-05-08 6:22 ` [PATCH RESEND] " Adrian Hunter 2020-05-08 8:17 ` Ulf Hansson 2020-05-07 16:15 ` [PATCH V2] mmc: core: Fix recursive locking issue in CQE recovery path Veerabhadrarao Badiganti 2020-05-07 17:21 ` Adrian Hunter 2020-05-08 8:12 ` Ulf Hansson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).