* Re: [PATCH] mmc: block: handle complete_work on separate workqueue
[not found] <1549385523-20521-1-git-send-email-zhays@lexmark.com>
@ 2019-02-06 12:00 ` Ulf Hansson
2019-02-06 12:42 ` Adrian Hunter
0 siblings, 1 reply; 5+ messages in thread
From: Ulf Hansson @ 2019-02-06 12:00 UTC (permalink / raw)
To: Zachary Hays, Adrian Hunter
Cc: Christoph Hellwig, linux-mmc, linux-block, Linus Walleij
+ Adrian, Linus
On Tue, 5 Feb 2019 at 17:53, Zachary Hays <zhays@lexmark.com> wrote:
>
> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
> This generates a rescuer thread for that queue that will trigger when
> the CPU is under heavy load and collect the uncompleted work.
>
> In the case of mmc, this creates the possibility of a deadlock as
> other blk-mq work is also run on the same queue. For example:
>
> - worker 0 claims the mmc host
> - worker 1 attempts to claim the host
> - worker 0 schedules complete_work to release the host
> - rescuer thread is triggered after time-out and collects the dangling
> work
> - rescuer thread attempts to complete the work in order starting with
> claim host
> - the task to release host is now blocked by a task to claim it and
> will never be called
>
Adrian, I need your help to understand this better. The above
description doesn't add up for me.
In principle, when "worker 1 attempts to claim the host" as described
above, it should already succeed and should *not* need to wait for the
host to be released. Right?
The whole point of commit 6c0cedd1ef95 ("mmc: core: Introduce
host claiming by context") was to allow the mmc host to be
re-claimable for blk I/O requests, no matter which worker/thread
the claim/release is done from.
Do you think it is not working as expected? What am I missing here?
> The above results in multiple hung tasks that lead to failures to boot.
Of course, during boot there is also a card detect work running in
parallel with blk I/O requests. This work is being punted to the
"system_freezable_wq" and it claims/releases the host as well.
Perhaps, that could have something to do with the problem....
>
> Handling complete_work on a separate workqueue avoids this by keeping
> the work completion tasks separate from the other blk-mq work. This
> allows the host to be released without getting blocked by other tasks
> attempting to claim the host.
>
> Signed-off-by: Zachary Hays <zhays@lexmark.com>
> ---
> drivers/mmc/core/block.c | 10 +++++++++-
> include/linux/mmc/card.h | 1 +
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index aef1185f383d..14f3fdb8c6bb 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
> if (waiting)
> wake_up(&mq->wait);
> else
> - kblockd_schedule_work(&mq->complete_work);
> + queue_work(mq->card->complete_wq, &mq->complete_work);
>
> return;
> }
> @@ -2924,6 +2924,13 @@ static int mmc_blk_probe(struct mmc_card *card)
>
> mmc_fixup_device(card, mmc_blk_fixups);
>
> + card->complete_wq = alloc_workqueue("mmc_complete",
> + WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
> + if (unlikely(!card->complete_wq)) {
> + pr_err("Failed to create mmc completion workqueue");
> + return -ENOMEM;
> + }
> +
> md = mmc_blk_alloc(card);
> if (IS_ERR(md))
> return PTR_ERR(md);
> @@ -2987,6 +2994,7 @@ static void mmc_blk_remove(struct mmc_card *card)
> pm_runtime_put_noidle(&card->dev);
> mmc_blk_remove_req(md);
> dev_set_drvdata(&card->dev, NULL);
> + destroy_workqueue(card->complete_wq);
> }
>
> static int _mmc_blk_suspend(struct mmc_card *card)
> diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
> index de7377815b6b..8ef330027b13 100644
> --- a/include/linux/mmc/card.h
> +++ b/include/linux/mmc/card.h
> @@ -308,6 +308,7 @@ struct mmc_card {
> unsigned int nr_parts;
>
> unsigned int bouncesz; /* Bounce buffer size */
> + struct workqueue_struct *complete_wq; /* Private workqueue */
> };
>
> static inline bool mmc_large_sector(struct mmc_card *card)
> --
> 2.7.4
>
So this change seems to solve the problem, which is really great.
However, I think we really need to understand what goes wrong before
we apply this as a fix.
Unfortunately, I am also in a busy period, so I need a couple more
days before I can help out with running tests.
Kind regards
Uffe
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mmc: block: handle complete_work on separate workqueue
2019-02-06 12:00 ` [PATCH] mmc: block: handle complete_work on separate workqueue Ulf Hansson
@ 2019-02-06 12:42 ` Adrian Hunter
2019-02-06 14:08 ` Ulf Hansson
0 siblings, 1 reply; 5+ messages in thread
From: Adrian Hunter @ 2019-02-06 12:42 UTC (permalink / raw)
To: Ulf Hansson, Zachary Hays
Cc: Christoph Hellwig, linux-mmc, linux-block, Linus Walleij
On 6/02/19 2:00 PM, Ulf Hansson wrote:
> + Adrian, Linus
>
> On Tue, 5 Feb 2019 at 17:53, Zachary Hays <zhays@lexmark.com> wrote:
>>
>> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
>> This generates a rescuer thread for that queue that will trigger when
>> the CPU is under heavy load and collect the uncompleted work.
>>
>> In the case of mmc, this creates the possibility of a deadlock as
>> other blk-mq work is also run on the same queue. For example:
>>
>> - worker 0 claims the mmc host
>> - worker 1 attempts to claim the host
>> - worker 0 schedules complete_work to release the host
>> - rescuer thread is triggered after time-out and collects the dangling
>> work
>> - rescuer thread attempts to complete the work in order starting with
>> claim host
>> - the task to release host is now blocked by a task to claim it and
>> will never be called
>>
>
> Adrian, I need your help to understand this better. The above
> description doesn't add up for me.
>
> In principle, when "worker 1 attempts to claim the host" as described
> above, it should already succeed and should *not* need to wait for the
> host to be released. Right?
If it is the same queue, then yes, although in that case there is only
one work item for the hw queue, so there cannot be another worker. There
could be another attempt to send a request directly, but that will not
block: if the host controller is busy, BLK_STS_RESOURCE will be returned
from ->queue_rq().
>
> The whole point of commit 6c0cedd1ef95 ("mmc: core: Introduce
> host claiming by context") was to allow the mmc host to be
> re-claimable for blk I/O requests, no matter which worker/thread
> the claim/release is done from.
>
> Do you think it is not working as expected? What am I missing here?
I assumed we were talking about a situation where there are multiple
internal eMMC partitions, each with its own disk and queue. In that
case, a queue waits if another queue is using the eMMC.
We should clarify whether that is the scenario we are looking at. Zachary?
>
>> The above results in multiple hung tasks that lead to failures to boot.
>
> Of course, during boot there is also a card detect work running in
> parallel with blk I/O requests. This work is being punted to the
> "system_freezable_wq" and it claims/releases the host as well.
>
> Perhaps, that could have something to do with the problem....
>
>>
>> Handling complete_work on a separate workqueue avoids this by keeping
>> the work completion tasks separate from the other blk-mq work. This
>> allows the host to be released without getting blocked by other tasks
>> attempting to claim the host.
>>
>> Signed-off-by: Zachary Hays <zhays@lexmark.com>
>> ---
>> drivers/mmc/core/block.c | 10 +++++++++-
>> include/linux/mmc/card.h | 1 +
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index aef1185f383d..14f3fdb8c6bb 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
>> if (waiting)
>> wake_up(&mq->wait);
>> else
>> - kblockd_schedule_work(&mq->complete_work);
>> + queue_work(mq->card->complete_wq, &mq->complete_work);
>>
>> return;
>> }
>> @@ -2924,6 +2924,13 @@ static int mmc_blk_probe(struct mmc_card *card)
>>
>> mmc_fixup_device(card, mmc_blk_fixups);
>>
>> + card->complete_wq = alloc_workqueue("mmc_complete",
>> + WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
>> + if (unlikely(!card->complete_wq)) {
>> + pr_err("Failed to create mmc completion workqueue");
>> + return -ENOMEM;
>> + }
>> +
>> md = mmc_blk_alloc(card);
>> if (IS_ERR(md))
>> return PTR_ERR(md);
>> @@ -2987,6 +2994,7 @@ static void mmc_blk_remove(struct mmc_card *card)
>> pm_runtime_put_noidle(&card->dev);
>> mmc_blk_remove_req(md);
>> dev_set_drvdata(&card->dev, NULL);
>> + destroy_workqueue(card->complete_wq);
>> }
>>
>> static int _mmc_blk_suspend(struct mmc_card *card)
>> diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
>> index de7377815b6b..8ef330027b13 100644
>> --- a/include/linux/mmc/card.h
>> +++ b/include/linux/mmc/card.h
>> @@ -308,6 +308,7 @@ struct mmc_card {
>> unsigned int nr_parts;
>>
>> unsigned int bouncesz; /* Bounce buffer size */
>> + struct workqueue_struct *complete_wq; /* Private workqueue */
>> };
>>
>> static inline bool mmc_large_sector(struct mmc_card *card)
>> --
>> 2.7.4
>>
>
> So this change seems to solve the problem, which is really great.
> However, I think we really need to understand what goes wrong before
> we apply this as a fix.
>
> Unfortunately, I am also in a busy period, so I need a couple more
> days before I can help out with running tests.
>
> Kind regards
> Uffe
>
* Re: [PATCH] mmc: block: handle complete_work on separate workqueue
2019-02-06 12:42 ` Adrian Hunter
@ 2019-02-06 14:08 ` Ulf Hansson
2019-02-06 21:08 ` Zak Hays
[not found] ` <1549551788-2572-1-git-send-email-zhays@lexmark.com>
0 siblings, 2 replies; 5+ messages in thread
From: Ulf Hansson @ 2019-02-06 14:08 UTC (permalink / raw)
To: Adrian Hunter
Cc: Zachary Hays, Christoph Hellwig, linux-mmc, linux-block, Linus Walleij
On Wed, 6 Feb 2019 at 13:43, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 6/02/19 2:00 PM, Ulf Hansson wrote:
> > + Adrian, Linus
> >
> > On Tue, 5 Feb 2019 at 17:53, Zachary Hays <zhays@lexmark.com> wrote:
> >>
> >> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
> >> This generates a rescuer thread for that queue that will trigger when
> >> the CPU is under heavy load and collect the uncompleted work.
> >>
> >> In the case of mmc, this creates the possibility of a deadlock as
> >> other blk-mq work is also run on the same queue. For example:
> >>
> >> - worker 0 claims the mmc host
> >> - worker 1 attempts to claim the host
> >> - worker 0 schedules complete_work to release the host
> >> - rescuer thread is triggered after time-out and collects the dangling
> >> work
> >> - rescuer thread attempts to complete the work in order starting with
> >> claim host
> >> - the task to release host is now blocked by a task to claim it and
> >> will never be called
> >>
> >
> > Adrian, I need your help to understand this better. The above
> > description doesn't add up for me.
> >
> > In principle, when "worker 1 attempts to claim the host" as described
> > above, it should already succeed and should *not* need to wait for the
> > host to be released. Right?
>
> If it is the same queue, then yes, although in that case there is only
> one work item for the hw queue, so there cannot be another worker. There
> could be another attempt to send a request directly, but that will not
> block: if the host controller is busy, BLK_STS_RESOURCE will be returned
> from ->queue_rq().
>
Right.
> >
> > The whole point of commit 6c0cedd1ef95 ("mmc: core: Introduce
> > host claiming by context") was to allow the mmc host to be
> > re-claimable for blk I/O requests, no matter which worker/thread
> > the claim/release is done from.
> >
> > Do you think it is not working as expected? What am I missing here?
>
> I assumed we were talking about a situation where there are multiple
> internal eMMC partitions, each with its own disk and queue. In that
> case, a queue waits if another queue is using the eMMC.
Of course! I totally forgot about this case; that is most certainly
what must be happening!
>
> We should clarify whether that is the scenario we are looking at. Zachary?
Yes, please.
Assuming this is the case, I would also prefer an updated changelog
that describes this scenario.
Adrian, thanks a lot for your help!
[...]
Kind regards
Uffe
* RE: [PATCH] mmc: block: handle complete_work on separate workqueue
2019-02-06 14:08 ` Ulf Hansson
@ 2019-02-06 21:08 ` Zak Hays
[not found] ` <1549551788-2572-1-git-send-email-zhays@lexmark.com>
1 sibling, 0 replies; 5+ messages in thread
From: Zak Hays @ 2019-02-06 21:08 UTC (permalink / raw)
To: Ulf Hansson, Adrian Hunter
Cc: Christoph Hellwig, linux-mmc, linux-block, Linus Walleij
> -----Original Message-----
> From: Ulf Hansson <ulf.hansson@linaro.org>
> Sent: Wednesday, February 6, 2019 9:08 AM
> To: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Zak Hays <zak.hays@lexmark.com>; Christoph Hellwig <hch@lst.de>;
> linux-mmc@vger.kernel.org; linux-block <linux-block@vger.kernel.org>;
> Linus Walleij <linus.walleij@linaro.org>
> Subject: Re: [PATCH] mmc: block: handle complete_work on separate
> workqueue
>
> On Wed, 6 Feb 2019 at 13:43, Adrian Hunter <adrian.hunter@intel.com>
> wrote:
> >
> > On 6/02/19 2:00 PM, Ulf Hansson wrote:
> > > + Adrian, Linus
> > >
> > > On Tue, 5 Feb 2019 at 17:53, Zachary Hays <zhays@lexmark.com> wrote:
> > >>
> > >> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
> > >> This generates a rescuer thread for that queue that will trigger when
> > >> the CPU is under heavy load and collect the uncompleted work.
> > >>
> > >> In the case of mmc, this creates the possibility of a deadlock as
> > >> other blk-mq work is also run on the same queue. For example:
> > >>
> > >> - worker 0 claims the mmc host
> > >> - worker 1 attempts to claim the host
> > >> - worker 0 schedules complete_work to release the host
> > >> - rescuer thread is triggered after time-out and collects the dangling
> > >> work
> > >> - rescuer thread attempts to complete the work in order starting with
> > >> claim host
> > >> - the task to release host is now blocked by a task to claim it and
> > >> will never be called
> > >>
> > >
> > > Adrian, I need your help to understand this better. The above
> > > description doesn't add up for me.
> > >
> > > In principle, when "worker 1 attempts to claim the host" as described
> > > above, it should already succeed and should *not* need to wait for the
> > > host to be released. Right?
> >
> > If it is the same queue, then yes, although in that case there is only
> > one work item for the hw queue, so there cannot be another worker. There
> > could be another attempt to send a request directly, but that will not
> > block: if the host controller is busy, BLK_STS_RESOURCE will be returned
> > from ->queue_rq().
> >
>
> Right.
>
> > >
> > > The whole point of commit 6c0cedd1ef95 ("mmc: core: Introduce
> > > host claiming by context") was to allow the mmc host to be
> > > re-claimable for blk I/O requests, no matter which worker/thread
> > > the claim/release is done from.
> > >
> > > Do you think it is not working as expected? What am I missing here?
> >
> > I assumed we were talking about a situation where there are multiple
> > internal eMMC partitions, each with its own disk and queue. In that
> > case, a queue waits if another queue is using the eMMC.
>
> Of course! I totally forgot about this case; that is most certainly
> what must be happening!
>
> >
> > We should clarify whether that is the scenario we are looking at. Zachary?
>
> Yes, please.
Yes, this is the case. There are multiple partitions on the eMMC.
>
> Assuming this is the case, I would also prefer an updated changelog
> that describes this scenario.
I will update the commit message to clarify this and resubmit.
> Adrian, thanks a lot for your help!
>
> [...]
>
> Kind regards
> Uffe
* Re: [PATCH v2] mmc: block: handle complete_work on separate workqueue
[not found] ` <1549551788-2572-1-git-send-email-zhays@lexmark.com>
@ 2019-02-08 11:52 ` Ulf Hansson
0 siblings, 0 replies; 5+ messages in thread
From: Ulf Hansson @ 2019-02-08 11:52 UTC (permalink / raw)
To: Zachary Hays
Cc: Adrian Hunter, Christoph Hellwig, linux-mmc, linux-block, Linus Walleij
On Thu, 7 Feb 2019 at 16:55, Zachary Hays <zhays@lexmark.com> wrote:
>
> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
> This generates a rescuer thread for that queue that will trigger when
> the CPU is under heavy load and collect the uncompleted work.
>
> In the case of mmc, this creates the possibility of a deadlock when
> there are multiple partitions on the device as other blk-mq work is
> also run on the same queue. For example:
>
> - worker 0 claims the mmc host to work on partition 1
> - worker 1 attempts to claim the host for partition 2 but has to wait
> for worker 0 to finish
> - worker 0 schedules complete_work to release the host
> - rescuer thread is triggered after time-out and collects the dangling
> work
> - rescuer thread attempts to complete the work in order starting with
> claim host
> - the task to release host is now blocked by a task to claim it and
> will never be called
>
> The above results in multiple hung tasks that lead to failures to
> mount partitions.
>
> Handling complete_work on a separate workqueue avoids this by keeping
> the work completion tasks separate from the other blk-mq work. This
> allows the host to be released without getting blocked by other tasks
> attempting to claim the host.
>
> Signed-off-by: Zachary Hays <zhays@lexmark.com>
Applied for fixes, with a fixes tag and a stable tag added.
Thanks Zachary for fixing this non-trivial problem! And thanks also to
Adrian/Christoph for your valuable input on this (feel free to reply
with your reviewed-by tags).
Kind regards
Uffe
> ---
> drivers/mmc/core/block.c | 10 +++++++++-
> include/linux/mmc/card.h | 1 +
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index aef1185f383d..14f3fdb8c6bb 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
> if (waiting)
> wake_up(&mq->wait);
> else
> - kblockd_schedule_work(&mq->complete_work);
> + queue_work(mq->card->complete_wq, &mq->complete_work);
>
> return;
> }
> @@ -2924,6 +2924,13 @@ static int mmc_blk_probe(struct mmc_card *card)
>
> mmc_fixup_device(card, mmc_blk_fixups);
>
> + card->complete_wq = alloc_workqueue("mmc_complete",
> + WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
> + if (unlikely(!card->complete_wq)) {
> + pr_err("Failed to create mmc completion workqueue");
> + return -ENOMEM;
> + }
> +
> md = mmc_blk_alloc(card);
> if (IS_ERR(md))
> return PTR_ERR(md);
> @@ -2987,6 +2994,7 @@ static void mmc_blk_remove(struct mmc_card *card)
> pm_runtime_put_noidle(&card->dev);
> mmc_blk_remove_req(md);
> dev_set_drvdata(&card->dev, NULL);
> + destroy_workqueue(card->complete_wq);
> }
>
> static int _mmc_blk_suspend(struct mmc_card *card)
> diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
> index de7377815b6b..8ef330027b13 100644
> --- a/include/linux/mmc/card.h
> +++ b/include/linux/mmc/card.h
> @@ -308,6 +308,7 @@ struct mmc_card {
> unsigned int nr_parts;
>
> unsigned int bouncesz; /* Bounce buffer size */
> + struct workqueue_struct *complete_wq; /* Private workqueue */
> };
>
> static inline bool mmc_large_sector(struct mmc_card *card)
> --
> 2.7.4
>